Monday, October 17, 2011

Configuration Automation with Gradle

For a while I've been following the Gradle ecosystem, seeing it grow by leaps and bounds.  As a build system and automation platform, Gradle provides a strong value proposition around a pluggable lifecycle model combined with easy task definition and the Groovy programming language.  One of the areas where Java still has some pain points is in configuration management.  You can bootstrap your development environment easily enough with Eclipse template projects, Grails, Spring Roo, Maven Archetypes and the like.  However, what about your deployment environment?  With the large number of Java app servers, message brokers, cache servers, and other interesting things being developed - an automation system around continuous deployment looks like the next logical step.  Gradle is positioned to take that step in my opinion.  What follows is a collision of ideas - Gradle, continuous integration, continuous deployment, cloud computing, and where we can find the next evolution in Java automation.

Today this space is largely filled by solutions like Puppet, Chef, and a handful of other tools that tackle server administration automation generically - usually following a concept of cookbooks and repeatable dependency management for the platform.  While they do support the various Java environments (Tomcat, ActiveMQ, etc.), the lack of pure Java integration in the automation stack means you cannot exploit Java's capabilities directly without jumping through an interop layer.  Here are some examples of what it would be nice to do:

  •  Have a configuration management automation system that integrates with JMX, feeding information back to a management console
  •  Share or re-use Java assets in the automation workflow - e.g. using your Spring Batch beans as part of automating the setup of your database
  •  Leverage Java APIs in the automation system to distribute capabilities - e.g. start Tomcat, start ActiveMQ, generate test JMS messages to validate connectivity, or perhaps use JMX to interrogate server status to validate sanity
  •  Build configuration artifacts shared directly into integration and production server environments (e.g. properties files, Spring bean files, etc.)
  •  Provide a platform for next-generation Java platforms (OSGi, Cloud, etc.)


The more I think about this, the more it makes sense to introduce some Gradle plugins that expand current task models into the continuous deployment and configuration management space.  Here is a sample list of tasks that could be executed in a Gradle build:
  1.  Compile, test, package
  2.  Jetty/Tomcat integration tests, bootstrapping their configuration
  3.  Deploy to Tomcat cluster, update local config - updating the software artifacts with local software config (e.g. hostname) as necessary
  4.  Validate tomcat cluster sanity
  5.  Initialize database - not with bash scripts running SQL commands, but Groovy SQL, your data access layer jar being invoked, etc

Steps 1 and 2 are what you do today with Gradle in your own dev environment.  When you think about steps 3-5 across the various environments out there - from home-grown environments, to larger app servers, to virtual machines - a task framework around server management within Gradle looks more and more attractive.
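
To make this concrete, here is a rough sketch of what tasks 3 and 4 might look like in a build.gradle.  To be clear, this is hypothetical - the task names, hosts, ports, and the Tomcat MBean attribute are all made up for illustration, since no such plugin exists yet:

    // Hypothetical continuous-deployment tasks - illustration only
    task deployToTomcat(dependsOn: war) << {
        ['node1', 'node2'].each { host ->
            // push the war to each cluster node (assumes the ssh/scp Ant tasks are on the classpath)
            ant.scp(file: war.archivePath, todir: "tomcat@${host}:/opt/tomcat/webapps",
                    keyfile: "${System.getProperty('user.home')}/.ssh/id_rsa")
        }
    }

    task validateCluster(dependsOn: deployToTomcat) << {
        // use JMX to interrogate server status and validate sanity
        def url = new javax.management.remote.JMXServiceURL(
            'service:jmx:rmi:///jndi/rmi://node1:9004/jmxrmi')
        def connector = javax.management.remote.JMXConnectorFactory.connect(url)
        def state = connector.getMBeanServerConnection().getAttribute(
            new javax.management.ObjectName('Catalina:type=Server'), 'stateName')
        println "node1 Tomcat state: ${state}"
        connector.close()
    }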

Sunday, August 14, 2011

Stardog and Spring Framework

Last week, Clark & Parsia released an initial integration between Stardog and Spring. To quote the Stardog site, Stardog is "a commercial RDF database: insanely fast SPARQL query, transactions, and world-class OWL reasoning support." Spring, of course, provides a leading technology stack for rapid development of Java applications. Almost all projects support Spring integration in one form or another - with the exception of the Semantic Web technology stacks. So, working with C&P, we came up with an initial integration of Stardog and Spring.

Stardog-Spring 0.0.1 provides the initial groundwork for Spring developers to get started with Stardog, and in general, Semantic Web technology. Over time, the Stardog Spring integration will be expanded to support some of the larger enterprise capabilities from Spring, such as Spring Batch. Stardog-Spring is open source, available on Github, and licensed under the Apache 2.0 license.

For 0.0.1, there are three fundamental capabilities:
  1. DataSource and DataSourceFactoryBean for managing Stardog connections
  2. SnarlTemplate for transaction- and connection-pool safe Stardog programming
  3. DataImporter for easy bootstrapping of input data into Stardog
The implementations follow the standard design patterns used across the Spring Framework, so if you are familiar with JdbcTemplate, JmsTemplate, etc. you will be right at home with the SnarlTemplate.  The SnarlTemplate provides interface callbacks for querying, adding, and removing data - abstracting away the boilerplate connection and transaction handling for you.  Likewise, the DataSource and the FactoryBean look and feel very much like SQL DataSources and factory beans within Spring.
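
Here is a quick sketch of the callback style.  The method and callback names below are from my reading of the 0.0.1 API, so double-check them against the documentation:

    // Hedged sketch of SnarlTemplate usage - verify names against the 0.0.1 docs
    def factory = new DataSourceFactoryBean(url: 'snarl://localhost:5820/testdb')
    def snarlTemplate = new SnarlTemplate(dataSource: factory.object)

    // Query with a row-mapping callback, JdbcTemplate-style; connection and
    // transaction handling happen inside the template
    def names = snarlTemplate.query(
        'SELECT ?name WHERE { ?s <http://xmlns.com/foaf/0.1/name> ?name }',
        { bindingSet -> bindingSet.getValue('name').stringValue() } as RowMapper)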

You can read the documentation here and get the source here.  There is also a downloadable jar on Github.

This implementation was built with Gradle, and you need to edit the build.gradle file to point at your Stardog release for it to build.  Of course, Stardog-Spring works well with Spring Jena and Groovy SPARQL.

Last but not least, you will have to sign up with the Stardog testers to get the current version.  Eventually there will be community-style and enterprise-style editions of Stardog.

Saturday, July 30, 2011

Linked Data Microframework: Linked Ratpack

The other day I ran across some of the Sinatra-inspired web microframeworks available in various languages, including Ratpack for Groovy.  Given the RDF builder DSL in Groovy SPARQL, I thought it would be a nice thought experiment to create a microframework for linked data and RDF.  After an afternoon of coding and testing, the results look quite promising.  So here it is - Linked Ratpack, a microframework for Linked Data.

Linked RP works the same way Ratpack does - you provide a single Domain Specific Language (DSL) script where you write your methods to perform some function on a URL, and it weaves those into a Jetty container.  In this case, I've added some capabilities to Ratpack to work with linked data:

  • RDFBuilder from Groovy SPARQL is automatically available to the DSL script under the 'rdf' variable
  • link(String endpoint) is available as a function to get an instance of the Groovy SPARQL Sparql class for performing SPARQL queries
  • resolve(String uri) is a new piece of functionality that uses Groovy's HTTPBuilder DSL and Jena to retrieve a URL and read it into RDF.  It should work across the various RDF serialization types, and will likely bomb out on HTML or anything else if you feed it an incorrect URI
The following Gist illustrates everything fairly nicely:
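
The gist itself no longer renders here, so below is a reconstructed sketch of a Linked RP script.  The get/set handler style follows Ratpack's DSL of the time, the rdf/link/resolve helpers are as described above, and the RDFBuilder syntax is approximate:

    // Reconstructed Linked Ratpack DSL script - approximate syntax
    set 'port', 4999

    get('/') {
        // Build and return a small model with the RDFBuilder bound to 'rdf';
        // returned Jena models are serialized back out automatically
        rdf.model {
            defaultNamespace 'http://example.org/people#'
            subject('tim') {
                predicate 'http://xmlns.com/foaf/0.1/name': 'Tim'
            }
        }
    }

    get('/tim') {
        // Dereference a remote URI into a Jena model via HTTPBuilder + Jena
        resolve 'http://dbpedia.org/resource/Tim_Berners-Lee'
    }

    get('/groovy') {
        // Query a remote endpoint through Groovy SPARQL's Sparql class
        def sparql = link('http://dbpedia.org/sparql')
        sparql.eachRow('SELECT ?s WHERE { ?s ?p ?o } LIMIT 5') { row -> println row.s }
        null  // output handled inline, so skip automatic serialization
    }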



You can now browse to the following URLs:
  • localhost:4999/
  • localhost:4999/tim
  • localhost:4999/groovy

Note: Jena models returned by those functions get automatically serialized back out - if you do your serialization inline instead, return null.

To get started with Linked Ratpack, you must do the following:
  1. Get Groovy SPARQL from Github, and build/install it with Gradle
  2. Get Linked Ratpack from Github, and build it
  3. Create simple groovy scripts, like the above gist, and run "ratpack path/to/whatever.groovy"
This will start an HTTP server on whatever port you define in the DSL.  After that, you can start browsing to your URLs, hooking up SPARQL endpoints, and generating RDF.

For me, this is one of the missing pieces in building linked data applications - an easy way to stand up little RDF servers to test walking RDF graphs hop-by-hop, perform URI de-referencing, and experiment with generating derivative RDF sites from other RDF data sources (e.g. SPARQL Construct).

Many thanks to Justin Voss ( @ github ) for creating Ratpack in the first place; it was a solid foundation to build off of.

Enjoy!

Wednesday, July 13, 2011

Groovy SPARQL 0.2 Available

Version 0.2 of Groovy SPARQL is now available.  This minor release includes a Groovy DSL for RDF: now you can build RDF and then query it.  The Groovy DSL is fairly flexible and takes advantage of a number of Groovy features, including:
  1. Optional syntax in Groovy 1.8 for more fluid DSLs
  2. GPars, aka the Groovy Parallelizer for asynchronous output hooks
  3. Usage of the BuilderSupport class
Per previous posts on this blog, if you want to use it in Grails, the GroovyConsole, or other apps, I recommend downloading it, running the Gradle build, and installing into your local Maven repo; then you can include it easily enough in whatever build environment you are using.

Here is a GIST showing the RDFBuilder DSL in action, with comments noting all of the 'features' available so far.  This is still a work in progress, and as I attempt to use it to build FOAF and other vocabularies, I'm sure I'll be shaking some bugs out (not the least of which is the wonderful world of URI fragments).
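
Since the gist no longer renders here, this short sketch gives the flavor of the builder.  The method names (model, defaultNamespace, namespace, subject, predicate) are approximations of the 0.2 DSL - see the project README for the real thing:

    // Approximate RDFBuilder usage - names from memory, not gospel
    import org.codehaus.groovy.sparql.RDFBuilder

    def builder = new RDFBuilder()
    def model = builder.model {
        defaultNamespace 'http://example.org/test#'
        namespace foaf: 'http://xmlns.com/foaf/0.1/'
        subject('#me') {
            predicate 'foaf:name': 'Al'
            predicate 'foaf:knows': '#tim'   // URI fragments - here be dragons
        }
    }
    model.write(System.out, 'TURTLE')   // the result is a plain Jena model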



Enjoy!

Thursday, July 7, 2011

Gradle, Maven, and Grapes Working Together

For Groovy SPARQL and Spring Jena, I wanted to start leveraging these libraries in little test Groovy scripts running in the console or command line.  At first, I assumed the Maven install that happens in their Gradle builds would immediately be picked up by Grape.  However, Grape uses Ant+Ivy, and Ivy does not look in your Maven repo by default (doesn't it seem like it should?).

So here are the missing pieces of the puzzle:
  1. Set up your Gradle build to create a POM and install your jars into your local Maven repo
  2. Add Maven repo support to your Grape configuration.  Grape configuration lives in ~/.groovy/grapeConfig.xml - and it's an Ivy file in disguise.  See [*] below for an example, which comes from near the bottom of the Grape documentation.
  3. Install your POM artifacts into Grape.  For Groovy SPARQL, the command is:    grape install org.codehaus.groovy.sparql groovy-sparql 0.1
  4. Now you can use Grape, e.g. @Grab('org.codehaus.groovy.sparql:groovy-sparql:0.1')

You can also use grape list to see what jars are now available.

All in all, this makes tools like the Groovy Console an excellent REPL for Java, Groovy, and presumably other JVM polyglot programming.

[*] Here is the sample grapeConfig.xml file from the Groovy documentation.
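
As best I can reproduce it from the Grape documentation, the file looks like this; the key addition over the default is the localm2 resolver pointing Ivy at your local Maven repository:

    <ivysettings>
      <settings defaultResolver="downloadGrapes"/>
      <resolvers>
        <chain name="downloadGrapes">
          <filesystem name="cachedGrapes">
            <ivy pattern="${user.home}/.groovy/grapes/[organisation]/[module]/ivy-[revision].xml"/>
            <artifact pattern="${user.home}/.groovy/grapes/[organisation]/[module]/[type]s/[artifact]-[revision].[ext]"/>
          </filesystem>
          <!-- the important part: make Ivy/Grape search your local .m2 repository -->
          <ibiblio name="localm2" root="file:${user.home}/.m2/repository/"
                   checkmodified="true" changingPattern=".*" changingMatcher="regexp"
                   m2compatible="true"/>
          <ibiblio name="codehaus" root="http://repository.codehaus.org/" m2compatible="true"/>
          <ibiblio name="ibiblio" m2compatible="true"/>
        </chain>
      </resolvers>
    </ivysettings>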


Tuesday, July 5, 2011

Announcing Spring Jena

On the heels of Groovy SPARQL, here is an initial code base for standard Java and Spring applications -- Spring Jena!

The Spring folks have been putting together an impressive portfolio of data-oriented capabilities for NoSQL data stores.  To complement those capabilities, here is Spring Jena - a project I hope to propose back to the Spring community to provide direct Jena API support and direct SPARQL.

Much like Groovy SPARQL, this is a relatively simple code base that applies the template design pattern to Jena and ARQ to simplify everyday needs for creating, modifying, and querying RDF data.  There is a lot more work to do here, most notably parameterized queries.

Get Spring Jena @ Github here.

The roadmap includes:

• Spring datastore/mapping support for object relational mapping, once those projects reach 1.0
• Spring Transaction support - wrap Jena native transactions or provide app-level transaction management via Spring
• Abstraction for triple stores - likely aligned against the Datastore interface in Spring Data
• QuerySolutionMap overloading of the methods in the SparqlTemplate
• Web / MVC capabilities, such as a taglib

Here is a GIST to get you going:
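
Since the gist is gone from this page, here is a stand-in that illustrates the template design pattern the project applies.  Note the class and method names below are illustrative, not necessarily the project's exact API - see the Github README:

    // Illustrative template-pattern sketch (2011-era Jena/ARQ packages)
    import com.hp.hpl.jena.query.*

    class SparqlTemplate {
        String endpoint
        void eachSolution(String sparql, Closure closure) {
            QueryExecution qe = QueryExecutionFactory.sparqlService(endpoint, sparql)
            try {
                ResultSet results = qe.execSelect()
                while (results.hasNext()) {
                    closure(results.nextSolution())   // hand each QuerySolution to the caller
                }
            } finally {
                qe.close()   // the boilerplate the template hides from you
            }
        }
    }

    def template = new SparqlTemplate(endpoint: 'http://dbpedia.org/sparql')
    template.eachSolution('SELECT * WHERE { ?s ?p ?o } LIMIT 5') { println it }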



Enjoy!

Tuesday, June 28, 2011

Gradle Build to Create a POM

Any Java or Java-compatible language build these days is going to want to create a POM file - POMs to install locally, POMs to upload remotely, a POM for all seasons.

For Groovy SPARQL, I wanted to create a simple POM so folks could add dependencies to their project and start using it.  I'm using the latest and greatest Gradle to build the project, and wanted to simply add 'apply plugin' to get a POM created.  Unfortunately, that errored out with a missing groupId.  This wasn't surprising, since it wasn't clear whether Gradle was going to somehow extract a groupId automatically or expect you to specify it.

The Gradle docs have the example, but it's down at example 36.8, and since it doesn't show groupId, it didn't jump out as the correct way to do it.  So here it is, a quick gist on how to get a simple Gradle artifact into POM format and installed.
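
The gist is no longer embedded, but the essence was the following (the group and version values are from Groovy SPARQL; adjust for your project):

    // build.gradle - the maven plugin plus an explicit group fixes the error
    apply plugin: 'groovy'
    apply plugin: 'maven'   // contributes POM generation and the install task

    group = 'org.codehaus.groovy.sparql'   // the missing groupId
    version = '0.1'

    // "gradle install" now generates the POM and copies the artifact
    // into your local Maven repository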

This created a ./build/poms/default-pom.xml with correct groupId, artifactId, and dependencies (properly scoped).

Sunday, June 26, 2011

Announcing Groovy SPARQL

As part of my ongoing weekend hobby of looking at Groovy and the Semantic Web technology stack, I've created a quick port of Groovy SQL-style operations into the land of SPARQL.  So, just like you can do Sql.eachRow(closure), you can now do Sparql.eachRow(closure).

Below is a Gist that shows the features of the relatively small API.  Note that this was also a dive into some Groovy 1.8 and Gradle, and was coded up this weekend in between BBQs using SpringSource Tool Suite 2.7.0M2 for the Groovy 1.8/Gradle support.
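
That gist has also gone missing, so here is a small sketch of the style.  The constructor form and row property access are assumptions from memory - see the README for the real API:

    // Sketch of the Groovy SQL-style API
    import org.codehaus.groovy.sparql.Sparql

    def sparql = new Sparql(endpoint: 'http://dbpedia.org/sparql')
    sparql.eachRow('SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5') { row ->
        // bindings come back addressable by variable name, like Groovy SQL rows
        println "${row.s} ${row.p} ${row.o}"
    }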

Other features to come include:
• Fluent DSL, leveraging Groovy 1.8 features
• Pure Java "templates" for Jena/SPARQL, similar to JdbcTemplate/JmsTemplate in Spring
• Object marshalling and GORM / Spring Data support
• SPARQL / RDF builder -- still deciding if this is necessary or not, or if it'll fall naturally into the DSL
• Grails plug-in for the above
• Testing with triple stores - Jena TDB, Stardog, and AllegroGraph being the first three

The code is up on Github here.




Enjoy!

Sunday, June 12, 2011

Spring Controller with Date Object Bindings

UPDATE (28 Dec 2011): The new Spring 3.1 release includes a new DateFormatter, which provides this flexibility for passing in date fields in URLs.  Below is the original post for those still interested in the WebDataBinder API.

I guess this is turning into a series on time handling, but I wanted to share how to use the data binders in Spring to pass around date classes.  After all, you may want to have URLs that end in some representation of a "day" to page through time-sensitive data, filter your data based on time, etc.

For example, if you wanted to have a URL parameter of the form "yyyy-MM-dd", i.e. the XML Schema format, and marshal it into a java.util.Date, how would you go about doing this?

Well, Spring provides:
• an @InitBinder method to initialize WebDataBinders on a controller - i.e. set up the binding mechanism on a Spring controller automatically
• WebDataBinder to register custom property editors that convert certain types.  In this case the type is java.util.Date, and the property editor is also provided by Spring - CustomDateEditor

With this, you can now use a date class with:
• @RequestParam(value="date", required=false) Date date
• @RequestMapping(value="/date/{date}", method=RequestMethod.GET) ...
• @PathVariable("date") Date date

Here's the code:
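
The original gist no longer renders, so here is a reconstruction along the lines described above (the controller and method names are examples):

    // Spring MVC controller with WebDataBinder-based date binding
    import java.text.SimpleDateFormat
    import org.springframework.beans.propertyeditors.CustomDateEditor
    import org.springframework.stereotype.Controller
    import org.springframework.web.bind.WebDataBinder
    import org.springframework.web.bind.annotation.*

    @Controller
    class EventController {

        @InitBinder
        void initBinder(WebDataBinder binder) {
            def format = new SimpleDateFormat('yyyy-MM-dd')
            format.lenient = false
            // marshal "yyyy-MM-dd" request values into java.util.Date
            binder.registerCustomEditor(Date, new CustomDateEditor(format, true))
        }

        @RequestMapping(value = '/date/{date}', method = RequestMethod.GET)
        @ResponseBody
        String eventsByDay(@PathVariable('date') Date date) {
            "events for ${date}"
        }
    }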



Enjoy!

Saturday, June 4, 2011

Jena and DateTime

Before you start storing timestamp literals in your triple store, consider the value of typed literals.  Typed literals can be easily converted to their native language type - in this case, java.util.Calendar.  Furthermore, SPARQL lets you do things like compare and filter on date values.

Here is a quick JUnit test to illustrate using Jena to create/read triples with dateTime typed literals, as well as using ARQ to query for dateTime literals.
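
The embedded test is missing now, so here is a compact stand-in showing the round trip (using the 2011-era com.hp.hpl.jena packages; the URIs are examples):

    // Create an xsd:dateTime typed literal and read it back as a Calendar
    import com.hp.hpl.jena.rdf.model.ModelFactory

    def model = ModelFactory.createDefaultModel()
    def event = model.createResource('http://example.org/event/1')
    def when = model.createProperty('http://example.org/vocab#when')

    // createTypedLiteral(Calendar) yields an xsd:dateTime literal
    model.add(event, when, model.createTypedLiteral(Calendar.instance))

    def literal = model.getProperty(event, when).literal
    // the typed value is an XSDDateTime, which converts straight back to Calendar
    Calendar cal = literal.value.asCalendar()
    assert cal != null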



Some notes on the SPARQL (not shown above):
• A dateTime literal can be used in sort expressions, e.g. ORDER BY ASC(?date)
• A dateTime literal can be used in FILTER expressions with comparisons, such as FILTER ( ?date >= "$someDateTime"^^xsd:dateTime )
• Note that in SPARQL, you explicitly type the literal.  You should define a PREFIX for xsd: so you can use the shorthand

Using xsd:dateTime is a good choice because the API translates it to java.util.Calendar, from which you can easily do things like get your hands on a java.util.Date, automatically handle time zones, etc.

Friday, June 3, 2011

Two Ways to Query Linked Data with ARQ

ARQ provides two ways to query Linked Data, i.e. remote SPARQL endpoints.  The first is to use the "sparqlService" method on the QueryExecutionFactory.

E.g.:

    QueryExecution qe = QueryExecutionFactory.sparqlService("http://dbpedia.org/sparql", query);

The other way is to create a QueryExecution that is attached to a Jena Model, and specify the "service" in the SPARQL directly.

E.g.:

    QueryExecution qe = QueryExecutionFactory.create(query, model);

SPARQL:

    SELECT ?s ?p WHERE {
        SERVICE <http://dbpedia.org/sparql> {
            ?s ?p <http://en.wikipedia.org/wiki/Sparql>
        }
    }

Enjoy traversing the linked data!

Saturday, May 28, 2011

Groovy 1.8 Logging

One of the great features in Groovy 1.8 is the introduction of logging capability.  You get the following annotations:
• @Log for java.util.logging
• @Commons for Commons Logging
• @Log4j for Log4j
• @Slf4j for SLF4J

You can add one of these annotations to any class, and your class will automatically pick up a log property to use.  In Groovy 1.7/Grails 1.3.x, adding logging to a non-Grails-managed class is straightforward as well; you just add a simple statement:

    private static final log = LogFactory.getLog(this)
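
With 1.8, the annotation form is simply (a minimal sketch):

    import groovy.util.logging.Slf4j

    @Slf4j
    class GreetingService {
        def greet(String name) {
            // compiles to roughly: if (log.isInfoEnabled()) log.info(...)
            log.info "greeting ${name}"
            "hello, ${name}"
        }
    }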

An annotation that eliminates one line isn't a big deal on its own - what is a big deal is what the Groovy log.X methods compile into.  Per the release notes, log.info 'whatever your message was' turns into:

    if (log.isLoggable(Level.INFO)) {
        log.info 'whatever your message was'
    }

This subtle change is a great performance booster - you get a quick boolean check, avoiding a bunch of StringBuffer/String concatenation operations followed by the message being sent down into the logging system itself, only to be filtered out by the level.

I've seen very verbose and 'noisy' logging by Java applications reduce performance by 10-20%, depending on how enthusiastic you are about logging and how resource constrained your platform is.  Likely not an issue for bigger setups - but being a little performance conscious here or there can go a long way.

Edit (7-6-11): Used the @Slf4j annotation in Groovy SPARQL; it appears to pick up the dependencies automatically, which is also nice.

When doing this, I just had to make sure I had the latest Groovy-Eclipse for Groovy 1.8 source editing.  Otherwise, uninitialized "log" variables are errors in Groovy 1.7.x.

Friday, May 27, 2011

How to Launch Spring Batch from Quartz in Grails

Here is a quick tip on how to launch Spring Batch from a Quartz job in Grails.  Why would you want to do this?  To periodically run batch jobs in your Grails application.

You will likely want this to be a non-concurrent Quartz job, so that you don't have concurrent batch processing going on.  Other than that, the Quartz job needs the job bean to run and a job launcher.  Those can be configured in your standard Spring Batch way in the resources.xml file in Grails.
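
The gist is missing here, so below is a hedged reconstruction.  The injected bean names (jobLauncher, myBatchJob) are assumptions - they would be wired up in resources.xml as described:

    // grails-app/jobs/BatchLauncherJob.groovy (Grails Quartz plugin conventions)
    class BatchLauncherJob {
        static triggers = {
            simple repeatInterval: 60000l   // run every minute
        }
        def concurrent = false              // avoid overlapping batch runs

        def jobLauncher                     // Spring Batch JobLauncher from resources.xml
        def myBatchJob                      // Spring Batch Job bean from resources.xml

        def execute() {
            // each launch needs unique JobParameters, or Spring Batch will
            // treat it as a restart of the same job instance
            def builder = new org.springframework.batch.core.JobParametersBuilder()
            builder.addLong('run.timestamp', System.currentTimeMillis())
            jobLauncher.run(myBatchJob, builder.toJobParameters())
        }
    }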




Other notes:
• TaskExecutor configuration works great, and I would love to see GPars as a mechanism to define task executors for Spring Batch
• There were no issues using Groovy classes as implementors of ItemReader, ItemProcessor, or ItemWriter in Spring Batch
• There were no issues having Spring Batch beans, step scoped, injected with Grails-managed beans
• The Grails transaction manager did not appear to be picked up by Spring Batch, so I declared a ResourcelessTransactionManager
• Do not forget to use StepScope, so you can re-use other beans in your Grails configuration directly in your configured jobs; simply declare it with:

    <bean class="org.springframework.batch.core.scope.StepScope" />