Sunday, December 22, 2013

Build your own PaaS with Pallet, Ubuntu, and Java

Lately I've been using Pallet, a Clojure library for building your own Platform as a Service (PaaS). It's great to see the building blocks of cloud computing coming together in libraries, primed for use in creating new, innovative packaging and deployment capabilities.  You can think of Pallet as Chef recipes or Puppet, but instead of configuration files or Ruby, you write Clojure.  One of the main reasons I like Pallet is that it is a rich library for building your own PaaS capability, and it requires very little from the nodes you build.  In fact, the only dependency is being able to SSH into a node and execute commands.  No agents, no servers, no repositories out of the box.  Copying jar files, executing apt-get, etc. are all possible, and you have the flexibility to build what you want.

While the Pallet site has comprehensive API documentation, there is only one quick start, which gets you going with EC2.  While I enjoy EC2 as much as the next person, the use case I was working on called for a more local development environment.  As luck would have it, Pallet has a library called VMFest, which is an abstraction over Oracle's VirtualBox.  Later, we would also pick up their Docker support, but that's a story for another day.  The nice thing about VMFest is that you can use VirtualBox as a cloud provider.  On a reasonable piece of hardware, this means you can spin up virtual machines in 5-7 minutes, and they're full VMs that you can control.  Last but not least, I wanted Ubuntu and Java on these VMs as a solid base example to work from.  The following is a walkthrough of how to get started with Pallet, use VMFest as a good starter compute service, and install your first package - Java.

The tutorial repo, pallet-java-example, is on GitHub.


Prerequisites:

  1. Oracle VirtualBox, version 4.2 or later
  2. Leiningen, e.g. brew install leiningen
  3. Something to edit Clojure with; I recommend Light Table
  4. Follow the VirtualBox setup in the README (refer to Pallet-VMFEST Readme for issues)

With these requirements in hand, let's look at how you create this environment.  I started with the Pallet lein plugin for creating an example project.

lein new pallet example

With the environment in place, I then proceeded to modify the project.clj file to include the VMFest dependencies and the Pallet Java crate.  In Pallet, a crate is a collection of functions grouped together as a reusable unit, much like a Chef recipe.


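A minimal project.clj along these lines captures the idea; the artifact names and versions here are my best 2013-era assumptions, so check Clojars for current coordinates:

```clojure
(defproject example "0.1.0-SNAPSHOT"
  :description "Pallet + VMFest + Java crate example"
  :dependencies [[org.clojure/clojure "1.5.1"]
                 [com.palletops/pallet "0.8.0-RC.1"]
                 [com.palletops/java-crate "0.8.0-beta.4"]
                 [com.palletops/pallet-vmfest "0.3.0-alpha.5"]
                 ;; VirtualBox *web service* bindings -- do NOT also add the
                 ;; COM bindings; the two clash on the classpath (see below)
                 [org.clojars.tbatchelli/vboxjws "4.2.4"]])
```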

You will notice that we use the VirtualBox web service.  There is also a local COM interface, which I presume is higher performing, but it is not portable across all environments. For this setup, a few more milliseconds to talk to VirtualBox isn't a big deal, so we'll trade a little performance for ease of setup on different environments.  One important note: you cannot have both vbox dependencies in your configuration, as they use classpath loading and clash with each other.

The next step is to look at what the lein pallet plugin generated for us.  The good news is that this is almost everything we need.  Let's take a look at the edited file:


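A sketch of that file - the spec names follow the lein template, while the crate options are assumptions:

```clojure
(ns example.groups.example
  (:require [pallet.api :refer [group-spec server-spec node-spec plan-fn]]
            [pallet.crate.automated-admin-user :refer [automated-admin-user]]
            [pallet.crate.java :as java]))

;; node-spec: machine-level parameters for each VM
(def default-node-spec
  (node-spec
   :image {:os-family :ubuntu :os-64-bit true}
   :hardware {:min-cores 1 :min-ram 512}))

;; base server-spec: create the admin user during the bootstrap phase
(def base-server
  (server-spec
   :phases {:bootstrap (plan-fn (automated-admin-user))}))

;; group-spec: the cluster, extending the base server and the Java crate
(def example-group
  (group-spec "example"
    :extends [base-server (java/server-spec {})]
    :node-spec default-node-spec))
```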

In Pallet terms, a node is an instance of a software stack running on a compute service - in our case, a VM with all of the software installed.  You'll see that we have a node-spec, some server-specs, and a group-spec.  Pallet provides flexibility in defining profiles for what you want on the node (e.g. machine-level parameters), the server (most of your software), and the group (your cluster).  These are all then converged or lifted together (i.e. deployed).  These layers of configuration are applied in order of phases: bootstrap, install, configure.

Some important notes here:

  • In the base server, we define {:bootstrap (plan-fn (automated-admin-user))}, which tells Pallet to execute the automated-admin-user function during the bootstrap phase of the node.  For the EC2 tutorial, you provide Pallet with your EC2 credentials to solve the chicken-and-egg problem of how you log in to create the first user.  Pallet-vmfest solves this with a sudo user and sudo password, stored in the .meta file in ~/.vmfest/models.  If this information is not present, the SSH connection fails and therefore Pallet executions fail.
  • In the group-spec, you see that we extend java/server-spec.  This is where the crate is defined and used. In the end, it's just more functions, and if you configure them into a group-spec, they will run.

Now we're just about ready to create our cluster, but first we need to get our Ubuntu base image, and then pull it all together.


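In sketch form, assuming the example-group defined in the generated project and a placeholder image URL, the REPL session looks roughly like this (the exact namespaces for add-image and nodes may differ; check the pallet-vmfest README):

```clojure
;; at the REPL, with the group-spec from the generated project loaded
(require '[pallet.api :refer [converge]]
         '[pallet.compute :refer [instantiate-provider nodes]]
         '[pallet.compute.vmfest :refer [add-image]])

(def vmfest (instantiate-provider :vmfest))

;; one-time step: fetch an Ubuntu base image and register it
;; (this drops the image plus its .meta file into ~/.vmfest/models)
;; the URL below is a placeholder -- use a real vmfest image URL
(add-image vmfest "https://example.com/images/ubuntu-13.04.vdi.gz")

;; bring the cluster up with one node
(converge {example-group 1} :compute vmfest)

;; list what's currently running
(nodes vmfest)

;; converge to zero nodes to destroy the cluster
(converge {example-group 0} :compute vmfest)
```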

There are a couple things going on here:
  1. add-image is from vmfest, and adds an image and a .meta file to your ~/.vmfest/models directory
  2. converge is the main function you'll use to bring up and down a cluster - converge to a server count, and converge to 0 to destroy it
  3. nodes is a function that prints out your current nodes, also from vmfest

This should help folks get started with Pallet, give you an introduction to a PaaS running on your local server, and show a fun way to apply Clojure to a new domain.  Big thanks go out to Hugo Duncan and Antoni Batchelli, who are both very helpful to everyone who joins #pallet to get started.


To recap the links:

  1. Pallet-java-example repo, i.e. the source for this article
  2. Pallet official website
  3. VMFest and Pallet-VMFest
  4. Oracle VirtualBox
  5. LightTable

Enjoy!


Sunday, February 10, 2013

GroovySparql 0.6 Released

Groovy Sparql 0.6 is now released and available via Maven Central.  In addition to upgrading to the latest Apache Jena versions, the build now publishes under the groupId com.github.albaker, so you can do an easy @Grab on the artifact.

Maven details are:
  • groupId: com.github.albaker
  • artifactId: GroovySparql
  • version: 0.6
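Those Maven coordinates translate to a one-line @Grab in any Groovy script.  The groovy.sparql.Sparql class name and the DBpedia endpoint below are illustrative assumptions:

```groovy
@Grab('com.github.albaker:GroovySparql:0.6')
import groovy.sparql.Sparql

// point at any public SPARQL endpoint (DBpedia here, just as an example)
def sparql = new Sparql(endpoint: 'http://dbpedia.org/sparql')
sparql.each('SELECT ?s WHERE { ?s ?p ?o } LIMIT 5') {
    println s   // result variables are bound into the closure by name
}
```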

This should give folks wanting to try Linked Data a very easy way in, using all of the usual Groovy tools (GroovyConsole, groovysh (the REPL), groovy <script>, etc.).  A Grails plugin is also in the works.


Saturday, June 23, 2012

Stardog and Spring Framework Example

Ever since the initial release of Stardog Spring support, I wanted to create a simple example application for people to learn how to use the Spring beans and introduce semantic web technology using Stardog into traditional Spring technology stack implementations.  So I'm happy to announce that a small sample application is now available on Github: the Stardog PetStore.  Following the Spring PetClinic example, the Pet Store uses the metaphor of a store where you go to buy a pet - in this case, dogs!

What we'll see in the sample application is:
  • A Gradle build that references the Stardog library folder, one of the easiest ways to pull Stardog into a build
  • A basic Spring 3.1 Web MVC application, with one controller, and a couple views.
  • The usage of traditional Spring beans files, and some stereotype annotations for autowiring of the Data Access Object
  • Usage of the SnarlTemplate class to create a data access object for a POJO
  • Usage of the DataImporter capability to load some triples at initialization time

There are, of course, lots of fun things going on with Spring 3.1: Java configuration, javax.inject annotations, environment abstraction, etc.  This sample was kept simple by following the standard 'Simple MVC Template' project available in SpringSource Tool Suite, so anyone learning Spring can follow the parallels with those samples.  Also, SnarlTemplate is analogous to JdbcTemplate, so it uses the traditional Data Access Object design pattern as opposed to the newer rich domain models and Spring Data.  Like JdbcTemplate, this gives full control over the native APIs and lets the mappings manipulate or use as much of the underlying system as necessary.  In our case, mapping to triples and naming our properties with URIs (i.e. RDF predicates) is a good reason to have this access.  Using Stardog's Empire support for RDF-backed JPA and Spring Data JPA repositories is certainly on the TODO list and a good evolutionary next step.

The prerequisite steps for this tutorial are:
  1. Obtain Stardog from stardog.com and unzip it somewhere; make sure the license is in the Stardog home directory
  2. Create a database called 'petstore', e.g. "stardog-admin create -n petstore"
  3. Make sure you have Gradle 1.0 available (command line, Eclipse, etc.)

Let's kick things off with the build.  After checking out the project from Github, update the build.gradle file and set the location where the Stardog/lib folder exists.


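The relevant piece of build.gradle is a fileTree dependency pointing at the Stardog lib folder; the path below is a placeholder you'd change to your install location:

```groovy
// build.gradle fragment -- point stardogLib at your local Stardog install
def stardogLib = '/path/to/stardog/lib'   // <-- edit me

dependencies {
    // pull every Stardog jar straight off the file system,
    // alongside the usual Maven Central dependencies
    compile fileTree(dir: stardogLib, include: '*.jar')
}
```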

The next step is to run the gradle build.  I recommend the recently released Gradle 1.0 and SpringSource Toolsuite 2.9.2, since it lets you specify the gradle folder directly.  Command line builds with Gradle also work nicely.

You'll see the Stardog Spring beans in the root application context, where we are connecting to Stardog via their SNARL protocol.  The only requisite step here is to follow the Stardog documentation and create a database called 'petstore', i.e. "stardog-admin create -n petstore."


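The root application context wires the DataSource, SnarlTemplate, and DataImporter beans roughly like this.  This is a sketch of a fragment inside the <beans> element; the class names and property names are my recollection of the stardog-spring API, so verify them against the Stardog docs:

```xml
<!-- root-context.xml fragment (sketch) -->
<bean id="dataSource"
      class="com.clarkparsia.stardog.ext.spring.DataSourceFactoryBean">
    <property name="to" value="petstore"/>
    <property name="username" value="admin"/>
    <property name="password" value="admin"/>
</bean>

<bean id="template" class="com.clarkparsia.stardog.ext.spring.SnarlTemplate">
    <property name="dataSource" ref="dataSource"/>
</bean>

<!-- loads dogs.n3 into the database at startup -->
<bean id="importer" class="com.clarkparsia.stardog.ext.spring.DataImporter">
    <property name="snarlTemplate" ref="template"/>
    <property name="inputFiles" value="classpath:dogs.n3"/>
</bean>
```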

The spring/appServlet/servlet-context.xml contains the standard Spring MVC related beans, and instructs Spring to component scan the com.example.stardog package.  From there it will find the HomeController and DogDAO and wire them together with the corresponding SnarlTemplate and DataSource defined above.

Looking at the HomeController, we see a run of the mill Spring controller with the following URLs:
  • "/" - the default list of dogs in the store
  • "/create" - the form for adding a new dog
  • "/delete/id" - the GET URL to remove a dog
In all cases, the controller operates on a DAO to retrieve and manipulate the Dog POJO.  The Dog POJO itself has three properties:
  • Name (String, will be referenced inside a URL for the RDF subject)
  • Wiki URL (e.g. the Wikipedia page for the breed)
  • Photo URL (e.g. an image from Wikipedia)
The DogDAO is where the interesting intersection of Stardog and Spring happens.  Each of the usual persistence methods (list, get, add, update, remove) is backed by operations on the SnarlTemplate.  The URL properties used for RDF creation are referenced in a Constants enum. In this case I reused pieces of the FOAF vocabulary - it's not hard to imagine a linked data query helping to fill out more related information about the subject in question.

Design note: As an aside, creating the DogDAO for pure domain object abstraction highlighted the need to enhance the SnarlTemplate with more of the equivalent methods like JdbcTemplate (e.g. queryForObject), and support Spring TX management.

If everything works out correctly for you, you'll be able to browse and see a couple of entries listed on the web page.

Default List of Dogs in the Store

The list of dogs gets loaded from a dogs.n3 file, found in src/main/resources.

Lessons Learned from this sample:
  1.  Enterprise-grade applications built with semantic technology are certainly possible, and with proper encapsulation the RDF and SPARQL can live in the data layer, leaving the business logic the usual bundle of POJO joy
  2. Stardog provides a rich development experience, friendly to Java developers
  3. Gradle saves a tremendous amount of time by being able to weave file system trees in alongside Maven-style dependencies
  4. There is an opportunity for balancing the encapsulation of persistence information (i.e. predicate URIs), business logic, and semantic web exposure (i.e. generating RDFa, RDF representations of POJO resources, or REST services)
My objective is to maintain this example project as a showcase of Semantic Web capability and add to it:
  • Linked Data de-reference (e.g. using Spring Jena to query DBPedia)
  • Spring Content Negotiators to expose an RDF resource or RESTful RDF service maybe with JSON-LD
  • Add Empire and a JPA-style example so SnarlTemplate and JPA can be compared and contrasted
  • Add a few more relationships and triples to showcase the power of SPARQL

Enjoy!


------------------------------------------
References:
  1. PetStore Sample on Github
  2. Stardog - Download and obtain license here
  3. Stardog Docs - Includes chapters on Stardog Spring
  4. Clark & Parsia - creators of Stardog
  5. Spring 3.1 Documentation
  6. Gradle and SpringSource Toolsuite Tutorial

Tuesday, May 22, 2012

Executable Wars with Gradle and Jetty

One of the things I recently wanted to do was create a set of Java-based utility components that could be easily packaged (aka one delivery file), run together, and all leverage assets created inside a webapp.  Normally this would involve creating a 'fat jar', which takes all existing library classes and flattens them into the fat jar.

The upside of this is that you don't need any special class loaders; all the classes required by the application are packaged directly in the jar.  The downside is that anything that used to live in the META-INF folders of the third-party libraries now gets clobbered together in a single META-INF.  Of course, anything that used to rub up against servlet APIs, web application contexts, and the like will also seemingly break.

After poking around for a bit, the executable war file seemed like the way to go since it avoided some of these pitfalls and has the following benefits:
  • All third party jars can be packaged in a WEB-INF/lib
  • Tried and true Jetty 6 provides a stable foundation for running a quick embedded container to run the war (i.e. itself)
  • All code written for a webapp can be immediately consumed
  • Remotability for the tool is immediately available
  • Once you open this Pandora's box, wild-eyed ideas sprout up, like the first executable war taking a list of wars as command-line arguments and deploying them all in itself... "it's war files all the way down!"
Riding the Gradle bandwagon, I wanted to try doing this in a straight-up Gradle script without re-using existing Ant tasks.  The following was done with Gradle 1.0-rc5.

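A sketch of the Gradle wiring: unpack the Jetty 6 jars into the root of the war so that "java -jar example.war" can bootstrap itself.  The Main-Class name below is invented for illustration:

```groovy
// build.gradle (sketch) -- Gradle 1.0-era syntax
apply plugin: 'war'

repositories { mavenCentral() }

dependencies {
    compile 'org.mortbay.jetty:jetty:6.1.26'
    compile 'org.mortbay.jetty:jetty-util:6.1.26'
    providedCompile 'javax.servlet:servlet-api:2.5'
}

war {
    // unzip the compile-scope jars into the war root, next to WEB-INF,
    // so the Main-Class can see the org.mortbay.* classes at launch
    from { configurations.compile.collect { zipTree(it) } }
    manifest { attributes 'Main-Class': 'com.example.JettyWarRunner' }
}
```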

Some notes on this evening's experiment:
  • Jetty 8 has an 'orbit' file that Gradle doesn't yet handle gracefully.  There were some workarounds online, but I wanted to keep this to a one-hour research task, so Jetty 6 it was.
  • Tomcat 7 has a simple API for instantiating and running an embedded Tomcat, I just haven't gotten around to trying that out yet
  • "Gradle as Jetty Runner", "Gradle as Tomcat Runner", or plain old Groovy command lines are all valid options for doing the same thing, but in this case I wanted 1 file, 1 command
  • After so many years of Maven relaxation, it was kinda fun having direct control over build configuration primitives in the build file again and be able to use them in simple one liners
This already has me thinking of another little experiment -- self deployable agent wars using the above + MBeans, but that'll be for another night!


Monday, October 17, 2011

Configuration Automation with Gradle

For a while I've been following the Gradle ecosystem, seeing it grow by leaps and bounds.  As a build system and automation platform, Gradle provides a strong value proposition around a pluggable lifecycle model combined with easy task definition and the Groovy programming language.  One of the areas where Java still has some pain points is in configuration management.  You can bootstrap your development environment easily enough with Eclipse template projects, Grails, Spring Roo, Maven Archetypes and the like.  However, what about your deployment environment?  With the large number of Java app servers, message brokers, cache servers, and other interesting things being developed - an automation system around continuous deployment looks like the next logical step.  Gradle is positioned to take that step in my opinion.  What follows is a collision of ideas - Gradle, continuous integration, continuous deployment, cloud computing, and where we can find the next evolution in Java automation.

Today this space is largely filled by solutions like Puppet, Chef, and a handful of other tools that tackle server administration automation generically - usually following a concept of cookbooks and repeatable dependency management for the platform.  While they do support the various Java environments (Tomcat, ActiveMQ, etc), the lack of pure Java integration in the automation stack means you cannot exploit Java's capabilities directly without jumping through an interop layer.  Here are some example ideas on what it would be nice to do:

  •  Have a configuration management automation system that integrated with JMX, feeding information back to a management console
  •  Share or re-use Java assets in the automation workflow - e.g. using your Spring Batch beans as part of automating the setup of your database
  •  Leverage Java APIs in the automation system to distribute capabilities - e.g. start tomcat, start activeMQ, generate test JMS messages to validate connectivity, or perhaps use JMX to interrogate server status to validate sanity
  •  Build configuration artifacts shared directly into integration and production server environments (e.g. properties files, Spring bean files, etc)
  •  Provide a platform for next-generation Java platforms (OSGi, Cloud, etc)


The more I think about this, the more it makes sense to introduce some Gradle plugins that expand current task models into the continuous deployment and configuration management space.  Here is a sample list of tasks that could be executed in a Gradle build:
  1.  Compile, test, package
  2.  Jetty/tomcat integration tests, bootstrapping their configuration
  3.  Deploy to Tomcat cluster, update local config - updating the software artifacts with local software config (e.g. hostname) as necessary
  4.  Validate tomcat cluster sanity
  5.  Initialize the database - not with bash scripts running SQL commands, but with Groovy SQL, your data access layer jar being invoked, etc.
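Steps 3-5 could be sketched as Gradle tasks like the following; every task name, host name, and URL here is invented to illustrate the idea:

```groovy
// build.gradle sketch -- hypothetical tasks for steps 3-5
apply plugin: 'war'

def cluster = ['tc-node-1', 'tc-node-2']   // hypothetical hosts

// step 3: push the war plus host-specific config to each node
task deployToCluster(dependsOn: war) << {
    cluster.each { host ->
        println "deploying ${war.archivePath.name} to ${host}"
        // e.g. scp/ssh here, or a Tomcat manager HTTP call
    }
}

// step 4: sanity-check each node, e.g. via a health URL or JMX
task validateCluster(dependsOn: deployToCluster) << {
    cluster.each { host ->
        assert new URL("http://${host}:8080/app/health").text.contains('OK')
    }
}

// step 5: initialize the database with Groovy SQL, not shell scripts
task initDatabase << {
    def sql = groovy.sql.Sql.newInstance('jdbc:h2:mem:demo', 'sa', '', 'org.h2.Driver')
    sql.execute('CREATE TABLE IF NOT EXISTS audit (id INT)')
}
```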

Steps 1 and 2 are what you do today with Gradle in your own dev environment.  When you think about steps 3-5 across the various environments out there - from home-grown environments, to larger app servers, to virtual machines - a task framework around server management within Gradle looks more and more attractive.

Sunday, August 14, 2011

Stardog and Spring Framework

Last week, Clark & Parsia released an initial integration between Stardog and Spring. To quote the Stardog site, Stardog is a commercial RDF database: insanely fast SPARQL query, transactions, and world-class OWL reasoning support. Of course, Spring provides a leading technology stack for rapid development of Java applications. Almost all projects support Spring integration in one form or another - with the exception of the Semantic Web technology stacks. So, working with C&P, we came up with an initial integration of Stardog and Spring.

Stardog-Spring 0.0.1 provides the initial groundwork for Spring developers to get started with Stardog, and in general, Semantic Web technology. Over time, the Stardog Spring integration will be expanded to support some of the larger enterprise capabilities from Spring, such as Spring Batch. Stardog-Spring is open source, available on Github, and licensed under the Apache 2.0 license.

For 0.0.1, there are three fundamental capabilities:
  1. DataSource and DataSourceFactoryBean for managing Stardog connections
  2. SnarlTemplate for transaction- and connection-pool safe Stardog programming
  3. DataImporter for easy bootstrapping of input data into Stardog
The implementations follow the standard design patterns used across the Spring Framework, so if you are familiar with JdbcTemplate, JmsTemplate, etc you will be right at home with the SnarlTemplate.  The SnarlTemplate provides interface callbacks for querying, adding, and removing data - abstracting away the boilerplate connection handling and transaction handling for you.  Likewise, the DataSource and the FactoryBean look and feel very much like SQL dataSource's and factory beans within Spring.  

You can read the documentation here and get the source here.  There is also a downloadable jar from Github.

This implementation was built with Gradle, and you need to edit the build.gradle file to point at your Stardog release for it to build.  Of course, Stardog-Spring works well with Spring Jena and Groovy SPARQL.

Last but not least, you will have to sign up with the Stardog testers to get the current version.  Eventually there will be a community-style edition and an enterprise-style edition of Stardog.

Saturday, July 30, 2011

Linked Data Microframework: Linked Ratpack

The other day I ran across some of the Sinatra-inspired web microframeworks available in various languages, including Ratpack for Groovy.  Given the RDF builder DSL in Groovy SPARQL, I thought it would be a nice thought experiment to create a microframework for linked data and RDF.  After an afternoon of coding and testing, the results look quite promising.  So here it is - Linked Ratpack, a microframework for Linked Data.

Linked RP works the same way Ratpack does - you provide a single Domain Specific Language (DSL) script where you write your methods to perform some function on a URL, and it weaves those into a Jetty container.  In this case, I've added some capabilities to Ratpack to work with linked data:

  • RDFBuilder from Groovy SPARQL is automatically available to the DSL script under the 'rdf' variable
  • link(String endpoint) is available as a function to get an instance of the Groovy SPARQL Sparql class for performing SPARQL queries.
  • resolve(String uri) is a new piece of functionality that uses Groovy's HTTPBuilder DSL and Jena to retrieve a URL and read it into RDF.  It should work across various RDF serialization types, and likely bomb out on HTML or anything else if you feed it an incorrect URI
The following Gist illustrates everything fairly nicely:


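The original Gist is a script along these lines.  This is a sketch: the route bodies, the RDFBuilder DSL syntax, and the DBpedia URLs are illustrative, with only the rdf/link/resolve hooks taken from the description above:

```groovy
// linked.groovy -- run with: ratpack linked.groovy
set 'port', 4999

get("/") {
    // build a small model with the RDFBuilder bound to 'rdf';
    // returned Jena models are serialized back out automatically
    rdf.model {
        namespace foaf: "http://xmlns.com/foaf/0.1/"
        subject("http://localhost:4999/tim") {
            property "foaf:name", "Tim Berners-Lee"
        }
    }
}

get("/tim") {
    // de-reference a URI on the web and return the resulting model
    resolve("http://dbpedia.org/resource/Tim_Berners-Lee")
}

get("/groovy") {
    // query a SPARQL endpoint via Groovy SPARQL
    def sparql = link("http://dbpedia.org/sparql")
    sparql.each("SELECT ?s WHERE { ?s ?p ?o } LIMIT 3") { println s }
    null   // output handled inline, so return null (see note below)
}
```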

You can now browse to the following URLs:
  • localhost:4999/
  • localhost:4999/tim
  • localhost:4999/groovy

Note: since Jena models returned by those functions get automatically serialized back out, return null if you want to handle serialization inline yourself.

To get started with Linked Ratpack, you must do the following:
  1. Get Groovy SPARQL from Github, and build/install it with Gradle
  2. Get Linked Ratpack from Github, and build it
  3. Create simple groovy scripts, like the above gist, and run "ratpack path/to/whatever.groovy"
This will start an HTTP server on whatever port you define in the DSL.  After that, you can start browsing your URLs, hooking up SPARQL endpoints, and generating RDF.

For me, this addresses one of the missing pieces in building linked data applications - an easy way to stand up little RDF servers for testing graph walks hop by hop, performing URI de-referencing, and experimenting with generating derivative RDF sites from other RDF data sources (e.g. SPARQL CONSTRUCT).

Many thanks to Justin Voss (on GitHub) for creating Ratpack in the first place; it was a solid foundation to build on.

Enjoy!