Cassandra training

I was recently working on a project that was considering Cassandra as a solution for a big data problem we were having. We quickly discovered that although the resources online were helpful, they weren't as exhaustive as we had hoped. The best resource was the O'Reilly Cassandra book, which was still being written and hadn't been released yet. We decided that the best course of action would be to attend the Cassandra training class offered by Riptano. Not only are they the most authoritative source for Cassandra information, they are based here in Austin, TX.

After teaching so many training classes back in the JBoss days (until Red Hat took over, I was doing at least one class a month), being on the other side of the tables was a bit surreal. At several points during the lab times, I found myself starting to stand up to go check on how everyone was doing, and I had to remind myself that I wasn't in charge of the show.

The Riptano class is a single-day class that is split between basic usage and admin/ops topics. As long as a day is, it's really hard to get from zero to anywhere interesting in a single day. If I were redesigning the Riptano class, I would suggest that they record the morning introduction to Cassandra concepts as a video and make it a prerequisite for attending the class. You'd definitely have some people who show up without doing their homework, but I think you have to target the people who are there to learn first. Getting straight into Cassandra would have really let us go a lot deeper.

After the introduction to Cassandra, we got started directly on some basic usage. Riptano smartly provides a VMware virtual machine to standardize the learning environment for students. (I really wish I had the luxury of doing that back when I was leading JBoss training classes.) The lessons and labs largely focused on using cassandra-cli, the Cassandra shell. The shell is very limited. You can only do the most basic operations with it; anything more requires writing code. That's not an easy task for a datastore that aims to serve clients in a variety of languages, each with its own API abstractions. A fully functional query language is high on my list of future Cassandra improvements.

The labs were helpful, but we only had time to explore the data model of the Twissandra demo app. It was enough to get a taste of how to navigate related column families. With Cassandra you have to maintain your own indexes (no joins for you!) and carefully design your data model around your access patterns. It would have been very enlightening to try to design a usable data model ourselves, but that was clearly beyond the scope of the class.
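To make the "maintain your own indexes" point a bit more concrete, here's a rough sketch in plain Python. It is not the Cassandra client API: the column families are simulated with nested dicts (row key to {column name: value}), and the names `tweets`, `userline`, `post_tweet` and `recent_tweets` are hypothetical, only loosely inspired by a Twissandra-style schema. The point is that every write has to update each index row you plan to read from later, because there's no join to do it for you at query time.

```python
import itertools
import uuid
from collections import defaultdict

# Hypothetical in-memory stand-ins for column families:
# row key -> {column name: value}. NOT a real Cassandra client API.
tweets = defaultdict(dict)    # tweet_id -> {"username": ..., "body": ...}
userline = defaultdict(dict)  # username -> {sequence number: tweet_id}, a hand-maintained index

# Hypothetical logical clock used as the index column name, so columns
# sort chronologically (standing in for a time-based column comparator).
_clock = itertools.count()

def post_tweet(username, body):
    """Write the tweet row, then update the index row ourselves."""
    tweet_id = str(uuid.uuid4())
    tweets[tweet_id] = {"username": username, "body": body}
    # No joins: the index has to be updated on the write path.
    userline[username][next(_clock)] = tweet_id
    return tweet_id

def recent_tweets(username, limit=10):
    """Read path: scan one index row newest-first, then fetch each tweet by key."""
    row = userline[username]
    newest_first = sorted(row, reverse=True)[:limit]
    return [tweets[row[col]]["body"] for col in newest_first]
```

The read path only ever touches one index row plus direct key lookups, which is the kind of query Cassandra handles well; the cost is that the write path grows with every new way you want to read the data.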

The rest of the class focused largely on ops issues. We played with different replication factors on a multi-node cluster, learning how Cassandra balances data between nodes and how it deals with nodes entering and leaving. This was really practical information. Since I've only prototyped Cassandra code, I can't speak to how well the class prepares you for deploying Cassandra in production, but it was very helpful to me. The slow part of the afternoon was the hour or so that we spent talking about the JMX MBeans and Java memory management (Cassandra is written in Java). I assume this was helpful to those in the class without a Java background, but I found myself nodding off a bit here.
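For anyone who hasn't seen it, the replication-factor exercise boils down to how keys are placed on the token ring. Below is a simplified sketch, not Cassandra's actual code: it assumes MD5 hashing of row keys (in the spirit of RandomPartitioner) and a simple, rack-unaware placement where the first replica is the node owning the key's token and the remaining replicas are the next nodes clockwise. The ring, node names and the `replicas` helper are hypothetical, and capping the replica count at the number of nodes is a simplification of mine.

```python
import bisect
import hashlib

def token_for(key):
    """Hash a row key to a ring position (MD5, loosely like RandomPartitioner)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

def replicas(key, ring, replication_factor):
    """Return the nodes holding `key`, walking the ring clockwise.

    `ring` is a list of (token, node) pairs. The first replica is the first
    node whose token is >= the key's token (wrapping around past the end);
    the rest are the following nodes on the ring. Simplification: the count
    is capped at the number of nodes.
    """
    ring = sorted(ring)
    tokens = [token for token, _ in ring]
    start = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]
```

With a hypothetical four-node ring such as `[(i * 2**126, f"node{i}") for i in range(4)]`, raising the replication factor from 1 to 3 simply adds the next two nodes clockwise to each key's replica list, which is also why a node joining or leaving only affects its neighbors' ranges.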

The conclusion?  I think the class is a good value for the money, especially since the trainer was an actual Cassandra developer who could answer technical questions without resorting to hand waving.  I would recommend it to anyone new to Cassandra.