Ravello and Hadoop, using Brooklyn

Originally Posted by Sam Corbett

I was recently asked to demonstrate the use of Brooklyn to deploy enterprise applications to Ravello.

Ravello specialises in taking existing on-premise applications to the cloud, for development and testing. They provide blueprints for typical three-tier Java web applications to help new users get started (nginx + Tomcat + Postgres or MySQL).

This was an opportunity to deploy a much more complex application – I chose Cloudera’s CDH distribution of Hadoop.

Brooklyn: manages and deploys complex applications

Brooklyn is an open source management plane which automatically manages and deploys applications. It has previously been used to create CDH certification clusters, and multi-provider, multi-region cloud services.

Integrating with Ravello

Brooklyn deploys applications to Locations (instances in public or private clouds, or a group of fixed machines), often using the jclouds library to interface with cloud providers.

However, Ravello is not one of jclouds’ supported providers, and Ravello and jclouds have slightly different views of the world, so some work was required before I could use Ravello as a Location.

jclouds views the world as a collection of individual VMs, which are assembled to create an application. Ravello views the application as the central entity of management, where the application contains multiple VMs. When using Ravello we must ask the application for a new machine, rather than following the standard Brooklyn workflow of creating VMs and informing the application of their existence.

I created RavelloLocation, a new Location for Brooklyn that translates between these two views of the world, turning requests for machines into calls to Ravello’s REST API.
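
As a rough illustration of that translation (this is not the actual RavelloLocation code), the Java sketch below adapts a VM-first interface onto an application-first one. Every name in it is hypothetical: MachineProvisioner stands in for the Brooklyn side, ApplicationClient stands in for a wrapper around Ravello’s REST API, and the in-memory client exists only so the sketch runs without a network. Creating the application lazily on the first machine request is also an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical VM-first view: the caller asks for one machine at a time. */
interface MachineProvisioner {
    String obtainMachine(String nameHint);
    void releaseMachine(String machineId);
}

/** Hypothetical application-first view: VMs only exist inside an application. */
interface ApplicationClient {
    String createApplication(String name);
    String addVm(String applicationId, String vmName);
    void removeVm(String applicationId, String vmId);
}

/** In-memory fake client, so the sketch runs without any real REST calls. */
class InMemoryApplicationClient implements ApplicationClient {
    final List<String> vms = new ArrayList<>();
    private int counter = 0;

    public String createApplication(String name) {
        return "app-" + name;
    }

    public String addVm(String applicationId, String vmName) {
        String id = applicationId + "/vm-" + (++counter) + "-" + vmName;
        vms.add(id);
        return id;
    }

    public void removeVm(String applicationId, String vmId) {
        vms.remove(vmId);
    }
}

/** Adapter: turns "give me a machine" into "ask the application for a VM". */
class ApplicationBackedProvisioner implements MachineProvisioner {
    private final ApplicationClient client;
    private final String applicationName;
    private String applicationId; // assumption: created lazily on first request

    ApplicationBackedProvisioner(ApplicationClient client, String applicationName) {
        this.client = client;
        this.applicationName = applicationName;
    }

    public String obtainMachine(String nameHint) {
        if (applicationId == null) {
            applicationId = client.createApplication(applicationName);
        }
        return client.addVm(applicationId, nameHint);
    }

    public void releaseMachine(String machineId) {
        if (applicationId != null) {
            client.removeVm(applicationId, machineId);
        }
    }
}

class RavelloSketchDemo {
    public static void main(String[] args) {
        InMemoryApplicationClient client = new InMemoryApplicationClient();
        MachineProvisioner provisioner = new ApplicationBackedProvisioner(client, "cdh");
        String master = provisioner.obtainMachine("master");
        provisioner.obtainMachine("worker");
        System.out.println("VMs in application: " + client.vms);
        provisioner.releaseMachine(master);
        System.out.println("After release: " + client.vms);
    }
}
```

The caller only ever sees the VM-first interface, which is the point of the adapter: the deployment logic does not need to know that each machine is really a member of a Ravello-style application.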

Deploying Cloudera Hadoop

With RavelloLocation in place, deploying CDH was simply a matter of specifying cloud credentials as Brooklyn properties and targeting a deployment of the Cloudera application (brooklyn-cdh) at the new location.

Ravello created machines in the given cloud and set up the networking between machines. Brooklyn customised the machines and installed the applications.

Results

The management tree in the Brooklyn console, showing a four-node Hadoop cluster, up and running.

The Ravello management console showing their representation of the cluster’s VMs. The open ports on the machines are shown, but the Ravello GUI does not show how these ports are used by other machines in the application. (This is handled by Brooklyn.)

The Cloudera management console, in ravcloud.com.

Next Steps

The deployment to Ravello shows that it is simple to use Brooklyn to deploy complex applications to clouds, even those that are not supported by jclouds and do not share jclouds’ VM-first mindset.

Now that RavelloLocation links Brooklyn and Ravello, we can deploy any of the applications in Brooklyn’s catalog to Ravello, including MapR, MarkLogic clusters and the OpenGamma cloud services.