Speaking at a local high school CS class

I attended PyTexas earlier this year, and although I no longer do enough Python to really get excited about much that I saw there, I was greatly excited by a lightning talk where someone was trying to recruit people to volunteer in Austin CS classrooms. Finding out about this program made the entire conference worth it. Although I'm mostly interested in working with kids on a programming team or who are working on programming projects where they could benefit from an experienced programmer, I'm pretty willing to do whatever I can.  I didn't have many CS resources available to me when I was in school, and I'd hate to think I was doing nothing if there were students now in the same situation.

Last week I had the great opportunity to speak to some local high school students taking their first CS class at Westlake High School in Austin.  Sadly I didn't get to do any coding or teaching, but I did get the chance to come in and say whatever I wanted to.

I'd never done a talk like this before, but my plan was pretty straightforward:

  1. Introduce myself and talk about what motivated me to get into computer science. I showed some pictures of my first computer, and since my first interest was to become a game developer, I showed a few games that motivated me early on. I hoped the students would maybe be able to find some parallels to things they were interested in.  I think this material was ok, but I didn't count on the low quality classroom projector not being able to display my materials, and so some of this wasn't very clear.
  2. I talked about studying CS in college.  What are classes like? What kind of classes do you take, and what were some of the most fun projects I worked on while getting my degree?  I think most of the students found this moderately interesting.  Most of the students seemed to be college-bound, and so anything about college life is probably worth at least half listening to.
  3. Next I talked about what you can do with a CS degree.  I mentioned all the kinds of companies you can work for and the different kinds of jobs you might do with a CS degree. I think I spent way too much time on this trying to be thorough.  I wanted to impress on the students that there's so much variety, but I guess I probably just went on and on and lost their attention.
  4. Finally, I wanted to make it personal and show specifically what I do on a day-to-day basis.  I tried to emphasize the team nature of software development and show how we communicate and get things done. I didn't have internet access, so I tried to put together some screenshots showing things like our company chat room and issue tracker, but sadly the fuzzy projector also made this difficult.
  5. I had a little extra material with some ideas of things students could do if they were motivated to dig a little deeper into computer science, but I didn't actually have time to get to this in any of the classes.  This was good because I think this content was not really appropriate to the audience given the resources they have available at school. 

As might be obvious, I don't think I did all that great of a job. My material didn't quite hit the mark, and I did not test my presentation on lower resolution and lower power projectors. Still, I hope that maybe some of the students found my talk if not helpful at least amusing.

I don't regret the talks at all.  I had fun, and although I'm disappointed in what I produced, if I'm lucky enough to get another opportunity to do this, I have plenty of specific things I can do to improve. If you have the chance to do a talk like this, I highly recommend it. 

Two more Commodares

Commodares - Ahoy! 38 

I've been continuing to pore through back issues of Ahoy! and have found two more of my unpublished but acknowledged Commodares submissions.  I don't know which problems these were for - maybe multiple problems, since I have memories of working on many of them.

Commodares - Ahoy! 41

These were also during my early high school years.  At this time the only programming resources I had were practice problems from the computer programming team competitions and these magazines. I did manage to get my hands on at least one programming book during this time, and I have vague recollections of some programming tutorials downloaded from BBSes, but there really wasn't that much.


My copy of On Lisp arrived

The Clojure Cup has finished, and my team, What The Fn, won the public favorite award.  This was really more a function of us begging for votes in the chat feature of the app early on, which gave us a lot of visibility throughout the competition.

One of the team prizes was a copy of Paul Graham's "On Lisp", and my team is letting me be the guardian of this book.  I'm thrilled because this book was my first introduction to Lisp beyond an intro class I took in school.  At the time, the University Co-Op at UT had a great selection of computer science books and I bought a number of gems like this.

Sadly, I don't think I ever finished the book, though I did make some semi-serious attempts.  I clearly remember section 2.6 on closures having a big impact, and section 5.3 on memoization was (I think) my first introduction to the idea.  But, seeing as how I never actually learned macros until I became a Clojure developer, I'm guessing that's about how far I got with the book.
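
For anyone who hasn't run into those two ideas, here is a minimal sketch of how they fit together. It is in Python rather than the book's Common Lisp, and the names (memoize, slow_square, calls) are made up for illustration, not taken from the book.

```python
def memoize(fn):
    """Return a version of fn that remembers results it has already computed."""
    cache = {}  # captured by the closure below, so it survives between calls

    def memoized(*args):
        # Only call the real function the first time we see these arguments.
        if args not in cache:
            cache[args] = fn(*args)
        return cache[args]

    return memoized


calls = []  # records every real (uncached) invocation, for demonstration


def slow_square(n):
    calls.append(n)
    return n * n


fast_square = memoize(slow_square)
```

The closure is the inner function holding on to its own private cache; memoization is just one of the handy things you can build once you have that.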

I actually sold my original copy on Amazon marketplace years ago. I didn't think I'd ever have much use for a Lisp book, and the prices the book fetches are just too compelling given that the book is available for free in PDF form if I ever did want to revisit the material. When I started doing Clojure, I regretted that decision greatly, but not so much I felt strongly motivated to spend $150 to replace my copy.  Thanks to the Clojure Cup, I don't have to. No, it's not the same copy I purchased as an eager CS undergrad and then neglected, but it's good enough for me.  I can't wait to dig into it again.


One of my first publications

Commodares 45-2 from Ahoy! 49

In high school, I had to learn to program mostly on my own.  One of the resources I used to challenge me was the Commodares programming challenges from Ahoy! magazine.  Every month there would be a few small programming challenges, and readers would be challenged to send in solutions.  

I always worked on these and occasionally sent in solutions.  Since I found the Commodore magazine archives, I've been digging through them trying to find some of my old solutions, and I think this may have been the only one of mine to make it to print.  I've found at least one other mention, but none in print.

Commodares from Ahoy! 60

It's not beautiful code, nor is it very impressive code.  But these little programming challenges kept me on the path of learning and exploring that let me get to where I am now.

I guess this is why I still love doing things like 4clojure and Project Euler problems.


Book Review "Gödel's Proof"

Gödel's Proof
By Ernest Nagel, James Newman

I've wanted to understand Gödel's incompleteness theorems for quite some time, but I've always found this kind of math to be a bit of a stretch for my background.  That is, until I found "Gödel's Proof", which took me from only a passing misunderstanding of incompleteness to a basic working understanding in just a bit over 100 pages.

The first half of the book mainly covers the notion of consistency and introduces all the background material needed to understand both the motivation of the proof and the proof itself.  The second half of the book focuses on the proof itself, starting with Gödel numbers and mapping mathematical statements onto numbers and quickly moving to the crux of the proof.

While I can't claim to be an expert on Gödel's work, I do feel that I have a decent understanding of the ideas and can move on to my next goal of reading and understanding Turing's original "On Computable Numbers, with an Application to the Entscheidungsproblem", which makes extensive use of Gödel's ideas.
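
To give a feel for the numbering trick, here is a small Python sketch of the general idea: give each symbol a code, then encode a formula as a product of successive primes raised to those codes. The four-symbol table below is a toy assumption of mine, not the numbering used in the book, and the function names are made up.

```python
def primes():
    """Yield 2, 3, 5, ... by trial division (fine for short formulas)."""
    found = []
    candidate = 2
    while True:
        if all(candidate % p for p in found):
            found.append(candidate)
            yield candidate
        candidate += 1


# Hypothetical symbol codes, chosen only for this illustration.
SYMBOL_CODES = {"0": 1, "s": 2, "=": 3, "+": 4}


def godel_number(formula):
    """Encode a string of symbols as 2**c1 * 3**c2 * 5**c3 * ..."""
    n = 1
    for prime, symbol in zip(primes(), formula):
        n *= prime ** SYMBOL_CODES[symbol]
    return n


def decode(n):
    """Recover the formula by reading off the exponent of each prime in turn."""
    symbols = {code: sym for sym, code in SYMBOL_CODES.items()}
    out = []
    for prime in primes():
        if n == 1:
            break
        exponent = 0
        while n % prime == 0:
            n //= prime
            exponent += 1
        out.append(symbols[exponent])
    return "".join(out)
```

With this table, "0=0" becomes 2^1 * 3^3 * 5^1 = 270, and because prime factorizations are unique, 270 decodes back to "0=0" and nothing else. That is what lets statements about formulas be restated as statements about numbers.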


fixing ssh host key verification failed

When doing dev work on EC2, I find myself destroying and creating new EC2 instances many times in a day, and I find myself hitting the "Host key verification failed" message from SSH a bit.

You could turn off the check with -o "StrictHostKeyChecking no" on the command line or in your ~/.ssh/config file, but the warning ssh gives is a good one and not one you really want to turn off.  My approach has been to edit the ~/.ssh/known_hosts file and remove the old key.  SSH is nice enough to tell you the exact line number.  It's inelegant, but it's what I've done for years.

What I didn't realize is that ssh has a better solution.

ssh-keygen -R followed by the hostname will update the known_hosts file, removing the old key for that host.  The next connection will prompt you to verify and add the new key.  Much nicer.
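
If you tear down and recreate instances from a script, the same cleanup can be folded into it. Here is a rough Python sketch that just shells out to ssh-keygen; the function names are made up, and the optional -f argument is only there for pointing at a known_hosts file other than the default.

```python
import subprocess


def forget_host_command(hostname, known_hosts=None):
    """Build the ssh-keygen command that removes hostname's old key."""
    cmd = ["ssh-keygen", "-R", hostname]
    if known_hosts is not None:
        # -f selects a known_hosts file other than the default one
        cmd += ["-f", known_hosts]
    return cmd


def forget_host(hostname, known_hosts=None):
    """Run the command, raising CalledProcessError if ssh-keygen fails."""
    subprocess.run(forget_host_command(hostname, known_hosts), check=True)
```

Calling forget_host with the instance's hostname right after destroying it means the next ssh connection goes straight to the "verify this new key" prompt.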


Reflecting on a year with OSGi

I attended a recent AustinJUG on OSGi.  I think OSGi is one of those technologies that is hard to really grasp and have a solid opinion of until you've used it.  I spent most of last year developing a set of applications on OSGi and learning the ins and outs of OSGi while tech editing OSGi in Action, so many people came to ask my thoughts.  I am most certainly not an expert, but I have used it enough to have opinions.  I thought I'd summarize here:


OSGi does a really good job of what it does.  It does work.  You can break up your app into very independent modular components and manage them.  It's not a half-baked solution, and given the complexities of this kind of class loading, that says a lot.  There are still gotchas with things like the context class loader (OSGi ignores it).

OSGi is a very good fit for servers and services.  If you truly have lots of mostly-independent, dynamic services that need to work together, you will like what OSGi gives you.  It can be used  for webapps, but it is not a great fit.  After working with it, the developers I was working with wanted to move the web server and WAR files out of the OSGi environment.

OSGi does not feel like modern Java.  OSGi was designed before the Java 5 revolution.  The APIs are cumbersome and feel outdated.

You will want to use some sort of framework with it.  We used the Spring OSGi integration.  I didn't enjoy that, but it seems like the best choice when keeping in mind the big picture.

Working with your own code is wonderful.  Working with other people's code is not so easy.  If someone isn't managing an OSGi bundle for the libraries you want to use, you will feel pain.  Even if they are, you may still have headaches.

Summary: OSGi works, but don't use it unless you really need it.



No love for the monkey patch

Working mostly in Python at the day job has been an interesting experience.  For the most part integrating the new language and libraries has been fairly straightforward.  The dynamic nature leads to efficiencies on one end, but it also introduces a whole host of annoyances.  I spend a lot more time tracking down subtle errors that would have been immediately obvious in Java, for example.

Of all the things I've encountered so far, the thing that's caused me the most headache has been a dumb monkey patch made by the M2Crypto library.  M2Crypto is a replacement SSL library required by one of our tools.  It apparently provides some functionality not in the standard Python SSL code.  That's fine, except that in order to provide its functionality, M2Crypto does this brilliant patch of urllib, replacing its open_https method.

This apparently works for most urllib use, but urllib.urlretrieve, which we were using, fails when retrieving https URLs.  It was a bear to track down, but once I realized that the function worked fine in a bare Python and failed in the virtualenv Python with all our third party libraries, it just took some time single-stepping through in the debugger to figure out where the two environments differed.

Since the open_https method is completely destroyed by the patch, the only option was to switch to urllib2.  However, since urllib2 doesn't seem to have kept the urlretrieve function around, I had to rewrite the urlretrieve functionality on top of it.

This is exactly the kind of thing I was most afraid of when it came to working with Python - having to deal with other developers who decided it would be great to change the way base system libraries work.  Looking at the big picture, it was really only a minor annoyance - a few lost hours at most.  The real harm is in the feeling of instability it brings.  I want to know exactly what the system is doing and why, and it's hard to do when one library decides to muck around with another library's internals.
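
The replacement doesn't need to be fancy. As a rough sketch of the shape of it, here is a version written against Python 3's urllib.request, the successor to urllib2, so it can be run today. It is not the Python 2 code from the project, and the chunk size is an arbitrary choice of mine.

```python
import shutil
import urllib.request


def urlretrieve(url, filename, chunk_size=64 * 1024):
    """Download url to filename and return (filename, headers)."""
    with urllib.request.urlopen(url) as response, open(filename, "wb") as out:
        # Stream the body in chunks so large downloads don't sit in memory.
        shutil.copyfileobj(response, out, chunk_size)
        return filename, response.info()
```

This skips extras like a progress callback, but it covers the common "fetch this URL into that file" case without depending on the patched code path.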

An automatic primary key in django that works with inline model admin

I wanted to have a UUID primary field in a django model.  It isn't hard to create a CharField with a value that defaults to a UUID.  The django_extensions package has a UUIDField that is a nice all-in-one solution.  It works well as a primary key, except for a messy case with InlineModelAdmin, which only allows non-editable primary keys if they are an AutoField.

The workaround I found, after digging through the code a bit, was to mimic the behavior of AutoField to trick InlineModelAdmin into doing the right thing.  I did this in a subclass of UUIDField.

After a bit more searching, I later found this was a known bug.  More interestingly, one of the commenters on the bug had considered the same solution and rejected it.  Take a look at the bug report for his reasoning, but the error condition he points out only occurs when the default value is None.  If the field has a default value, as is the case with this CharField-extended model, the potentially problematic logic doesn't kick in.  I haven't battle-tested the solution yet, but initial testing looks like it performs correctly. 


What exactly is open stack?

I've been struggling for a while to get a handle on OpenStack.  Since Rackspace, the company that initially pushed the idea, is based nearby, there's been quite a lot of noise about OpenStack locally.  I've attended several talks on OpenStack, but unfortunately I still haven't really gotten the big picture.

As a developer, my first question is why I should care about OpenStack.  I'm intrigued, for sure, and my background should leave no doubt that I'm a huge open source proponent.  However, I don't ever expect to run my own cloud.  As a developer, if I run on Amazon EC2, I see a set of Linux boxes (open source) and a public API I can use to manage my instances.  Whether the cloud infrastructure is open source or not doesn't seem to matter much.  I should be able to spin up a Linux box on the Rackspace cloud just as easily and have nearly the same view of the world.

The management of instances seems to be the key difference.  Services like Rightscale go a long way to bridging the gaps between the different cloud offerings, but I can certainly see how this would all be easier if the various cloud services had a common management API.  Unfortunately, I don't see much talk about management APIs from OpenStack.  Maybe I've missed it, or maybe it's just too early.  It's not really clear. What exactly is the OpenStack view of management APIs?

More important than the infrastructure APIs are the platform service APIs.  I'm a huge fan of S3 and SQS, for example, both of which are big reasons I go with EC2.  Though technically both are quite accessible outside of EC2, I would only seriously consider moving if equivalent services were available, preferably ones with standard APIs that would give me true cloud service portability.  OpenStack has the start of this with Swift, their version of S3.  However, I'm still a little unclear on the big picture.  Will OpenStack create a core platform of AWS-like cloud services based around standard APIs?  Or is the goal to create a Linux-like environment for the creation of (hopefully compatible) platform packages that can be installed into an OpenStack cloud ad hoc?  It seems to me like the latter is closer to the approach, and although that's quite an interesting possibility, I'm not sure how OpenStack then will lead to a simpler, more standardized cloud than we have now.

I'm not down on OpenStack.  I just don't really understand the project.   As an app developer, I'm clearly not the primary audience of OpenStack, yet.  However, cloud platforms exist to serve the developers, and if you are designing an open platform from the ground up, I would hope to have been able to get a little bit better of a perspective on what the world of an OpenStack cloud will look like to a developer.

The half-life of a software engineer

As a software developer, staying at the same company too long can be fatal.  It's easy for your skills to stagnate working with the same people and technologies, year after year.  On the other hand, a track record as a job hopper can be a red flag indicating a developer who can't fit well on teams and who doesn't have any experience maintaining the systems they worked on.   No company wants to invest in a new hire who will jump ship in 6 months.

From my own experience, I find that it takes about 2 years to complete a full cycle on a project.  At first, you come in and are just learning the domain and the code base.  Then there's a period of developing fluency until you've completely mastered the codebase.  At that point, you'll be in a position to contribute to the project at an architectural level, leaving plenty of time to hang around and learn the lessons from these larger-scale efforts.

Sometimes this cycle is faster, sometimes it's slower, but a couple years seems to be about right.  After that, something needs to change.  In a larger company, there may be opportunities on teams that are very different from what you are working on.  I've always worked for smaller companies, and oftentimes this change comes in the form of an acquisition, which brings new challenges.  At one company, I transferred to the advanced research division of the acquiring organization.  In another, I tried my hand at evangelism/product management, making a transition back into engineering when I felt I really needed to spend more time coding.  In both of these cases, I ended up staying for 4-5 years.

I'm curious how this compares to other people's experience.  

SQL Antipatterns

Despite the fact that I've actually increased the number of technical books I've been reading, I've gotten out of the habit of reviewing them.  I thought I should try and reverse that trend, starting with one of the most helpful books I've read this year, SQL Antipatterns.

I've never been a fan of relational databases.  I started my career working with LDAP, and acceptance of the rigidity of the relational model didn't come easily.  Over time, I learned to respect relational databases more and became moderately proficient with them.  However, my policy was always that the database is nothing more than an implementation detail.  If I ever actually noticed the database when I was writing code, something was seriously wrong with my dev model.

I won't debate the merits of that attitude, but it led me down a path of accepting many half-baked solutions that ended up causing me nothing but pain.  Fortunately, I was able to sweep those under the rug of my db abstractions and ORM tools, directing the curses under my breath at that evil SQL database that I had to constantly interrupt my "real" dev work to deal with.

I saw many of those "solutions" in the SQL Antipatterns book, along with a discussion of the problems with each pattern and a pragmatic discussion of the alternatives.  (Sometimes the antipattern might actually be better than the alternatives.)  I won't discuss the specific antipatterns here.  I'll just repeat that I saw many of my previous design decisions in here along with alternatives I wish I had considered.  If you find relational databases a pain, some of that pain might be caused by the way you are doing things.  SQL Antipatterns may just set you right.  Remember, hating relational databases is not an excuse for misusing them.  And who knows, when you start using them better you might even find they aren't nearly as ugly as you thought.


Cassandra training

I was recently working on a project that was considering Cassandra as a solution for a big data problem we were having.  We quickly discovered that although the resources online were helpful, they weren't as exhaustive as we had hoped.  The best resource was the unreleased O'Reilly Cassandra book that was being written.  We decided that the best course of action would be to attend the Cassandra training class offered by Riptano.  Not only are they the most authoritative source for Cassandra information, they are based here in Austin, TX.

After teaching so many training classes back in the JBoss days (until Red Hat took over I was doing at least 1 class a month), being on the other side of the tables was a bit surreal.  At several points during the lab times, I found myself starting to stand up to go check on how everyone was doing and needing to remind myself that I wasn't in charge of the show.

The Riptano class is a single day class that is split between basic usage and admin/ops topics.  As long as a day is, it's really hard to get from 0 to anywhere interesting in a single day.  If I were redesigning the Riptano class, I would suggest that they record the morning introduction to Riptano concepts as a video and make that a pre-requisite for attending the class.  You'd definitely have some people who show up without doing their homework, but I think you have to target the people who are there to learn first.  Getting straight into Cassandra would have really let us get a lot deeper.

After the introduction to Cassandra, we got started directly on some basic usage.  Riptano smartly provides a VMware virtual machine to standardize the learning environment for students.  (I really wish I had the luxury of doing that back when I was leading JBoss training classes.)  The lessons and labs largely focussed on using cassandra-cli, the Cassandra shell.  The shell is very limited.  You can only do the most basic operations with it, and anything more requires writing code.  That's not an easy task for a datastore that aims to serve clients in a variety of languages, each with its own different API abstractions.  Having a fully-functional query language should definitely be high on the list of future Cassandra improvements I'd like to see.

The labs were helpful, but we only had time to explore the data model of the Twissandra demo app.  It was enough to get a taste of how to navigate related column families.  With Cassandra you have to maintain your own indexing (no joins for you!) and carefully design your model around your access model.  It would have been very enlightening to try and design a usable data model, but that was clearly beyond the scope of the class.

The rest of the class focussed largely on ops issues.  We played with different replication factors with a multi-node cluster, learning how Cassandra balances data between nodes and how it deals with nodes entering and leaving.  This was really practical information.  As I've only prototyped Cassandra code, I can't speak to how well the class prepares you for deploying Cassandra in production, but it was very helpful to me.  The slow part of the afternoon was the hour or so that we spent talking about the JMX MBeans and Java memory management.  (Cassandra is written in Java.)  I assume this was helpful to those in the class without a Java background, but I found myself nodding off a bit here.
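
To make "maintain your own indexing" concrete, here is a toy Python sketch that uses plain dicts to stand in for two column families. It is only loosely inspired by the Twissandra layout; the names and structure are my own simplification and there is no real Cassandra client involved.

```python
# "Column families" modeled as dicts: row key -> columns.
tweets = {}    # tweet_id -> {"user": ..., "body": ...}
userline = {}  # username -> [tweet_id, ...], an index we maintain by hand


def post_tweet(tweet_id, user, body):
    """Write the tweet AND update the index; nothing does the second part for us."""
    tweets[tweet_id] = {"user": user, "body": body}
    userline.setdefault(user, []).append(tweet_id)


def tweets_by(user):
    """Answer 'all tweets by user' from the index instead of a join or scan."""
    return [tweets[tweet_id]["body"] for tweet_id in userline.get(user, [])]
```

Every query you want to run later needs its own structure like userline written at insert time, which is what designing the model around your access model boils down to.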

The conclusion?  I think the class is a good value for the money, especially since the trainer was an actual Cassandra developer who could answer technical questions without resorting to hand waving.  I would recommend it to anyone new to Cassandra.  



Saved by the log

I have a tendency to obsess a bit about logging.  Sometimes I'm so comfortable with my code that I let my laziness override my judgement, but I generally try to log my code well enough that I can go back and answer just about any question I want to about a system.

One of my dev tasks earlier this year was to write a system that asynchronously archived user data by email.  I'm no stranger to SMTP, but I'm not exactly a mail guru either.  When I wrote the code, I knew we were going to have integration issues.  In a moment of paranoia I decided that instead of just logging the work being done and the important result codes, I would log every outgoing SMTP session in a special log directory, indexed by time and event id to make sure I could quickly answer any questions about a transaction.  Dealing with these logs is a bit of a hassle.  I periodically tar them up and ship them out to S3, but as I watch the data grow I've often wondered if maybe I wasn't just overdoing it a bit.

The logs have come in handy from time to time, but over the last few weeks having them around has saved me in some really big ways.  I've had three big issues in that time, ranging from authentication to MIME-type issues to "are we sure we are actually sending anything at all" problems.  In each case, the problem has been on the receiving party's side.  With the extensive logging, it's been trivial to dig up the exact bits we sent out in seconds, while we wait and wait (often for days) for the other guys to figure out what is going on in their own system.

This all has made me wonder if perhaps I'm not being paranoid enough.  There are a lot of integration points in the code I'm working on that really have very poor log coverage.  If a problem did crop up, I wouldn't be able to directly answer a question about what exactly the system did or didn't do, and until now I probably wouldn't have thought much of it.  That's a pretty scary thought, and it's making me rethink my whole approach to logging.
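
The layout itself is simple. As a rough Python sketch of the idea (not the actual system, and with a directory scheme, file naming and function names that are all my own assumptions), each session transcript gets its own file under a per-day directory, named by time and event id:

```python
import os
from datetime import datetime, timezone


def session_log_path(log_dir, event_id, when):
    """One file per SMTP session: <log_dir>/<YYYY-MM-DD>/<HHMMSS>-<event_id>.log"""
    day = when.strftime("%Y-%m-%d")
    stamp = when.strftime("%H%M%S")
    return os.path.join(log_dir, day, f"{stamp}-{event_id}.log")


def write_session_log(log_dir, event_id, transcript, when=None):
    """Write the captured lines of one session and return the file's path."""
    when = when or datetime.now(timezone.utc)
    path = session_log_path(log_dir, event_id, when)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(transcript) + "\n")
    return path
```

With something like this, answering "what exactly did we send for event X on Tuesday?" is a directory listing and a grep, and the per-day directories are convenient units to tar up and ship off.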

How to respond to bad tech support calls

Today's xkcd on unhelpful tech support reminded me of a dilemma I had recently determining the proper response to a tech support agent at Time Warner who clearly didn't seem to have much experience.  Our cable modem died, and it was clear I needed them to come replace it.  I understand that they might need to do some verification of my assessment first, as it does make sense that you don't send a truck out or send new hardware out just because a customer thinks his cable box is bad.  I have no issue with walking through standard troubleshooting steps just to be sure.

What got me going was, after going through a number of "this might possibly be interesting troubleshooting steps if the lights on the cable modem were on" checks on my computer, the tech support person asked:

Have you optimized your web browser?

Huh?  I paused for a while lost in thought, trying to figure out what that means.  Optimized my web browser?  Is he asking if I've installed some Time Warner browser plugin to serve ads to me?  Have I updated the latest version of my browser?  What in the world could that mean?  (feel free to pause for a second and take a guess)

It turns out the Time Warner agent was trying to ask if I had cleared my browser cache and cookies. Silly me - I thought a cache was supposed to be a performance optimization that reduces the need for reloading pages over the network.  Of course, in some cases that could be a useful troubleshooting step, but would it be possible to actually say that rather than make up a nonsense term?

It's funny, but my real dilemma was trying to figure out how to respond.  Should I just hold back and politely say yes?  Should I come up with a snarky response like "No, do I need to charge my flux capacitor first?" I ended up going halfway with the simple "I'm not sure that means anything", but I can't help but feel I missed the chance to give a truly quality response.  


Let's build Babbage's Analytical Engine


We like to think that computers are a 20th century invention, but nearly 200 years ago Charles Babbage designed the Analytical Engine which, if built, would have been the first programmable computing machine.  With the potential of 20k of memory (much more than my first computer, a Commodore VIC-20), this punch card-driven, steam-powered beast could have launched the computer revolution 100 years earlier, had it actually been built.

Well, at long last, it finally can be.  There's a project underway to build this historic device.  It's not a cheap endeavor, but if you are a geek, the value of building and seeing in action the first computer ever designed is just too much to pass up.  If you agree, please consider making a $10 donation via Pledge Bank to get the project kicked off.  

In the meantime, you might want to read more about the analytical engine to see what this is all about.  I'd also encourage you to listen to TWIT #269, which features a discussion with John Graham-Cumming about the project.


Trying out disqus for comments

I've not been happy with the commenting and community functions here at squarespace.  While I guess they would work well for a small company or for a site that wanted to build a managed community, I don't really want people to have to manage an identity just for my blog.  If a site doesn't allow facebook connect or openid login of some sort, I'm very unlikely to want to participate.  So, I've been very uncomfortable with forcing that on people here.

On top of that, I had hoped that by moving to a solution like squarespace, I'd find it easier to integrate with my content across the net, particularly facebook and twitter.  Although the tools squarespace provides are quite nice, I have been a little disappointed at the slow pace of development of integration features.

The search came down to two primary candidates: disqus and echo.  Both feature commenting systems that can be plugged into an existing site.  They both have a wide variety of identity/login options, and they both look to have at least some ability to integrate content with social networks.  My impression was that echo had a little bit more of what I wanted, but there doesn't seem to be a true free/cheap level.  The product looks very focussed towards small/medium company sites and even their cheapest plan prices me out of the market.  I have absolutely no problem paying for a service, but what they offered was just too pricey for my needs.

So, I've added disqus here on the blog.  One side effect is that old comments are gone.  I'm hoping there will be an easy way to import them into disqus, but for now I'm just going to settle for the blank slate.  I hope that if I go with this long term that there will be an easy solution for bringing those existing comments in.  And, of course, I'm assuming I'll have an easy way to get comments out of disqus later if need be.  That'll all be part of due diligence.  In the meantime, I'm considering all comments as transient.

Sending multipart mail in an OSGi container

Sending mail in OSGi shouldn't be hard.  And, it wasn't, at least not at first.  My code was working perfectly until it came time to work on some enhancements which required attachments.  Suddenly my nice, simple mail program was falling over.

UnsupportedDataTypeException: no object DCH for MIME type multipart/mixed

The problem is a typical OSGi classloading mess.  In this case, javax.activation was unable to read a configuration file, META-INF/mailcap, that is normally a part of javax.mail.  This file isn't needed for simple mail messages, but when you start adding attachments, you need it around.  META-INF directories aren't something that can really be shared in OSGi, so there didn't seem like much hope of making things work correctly, at least short of creating (and maintaining) custom bundles.  

I wasn't able to find any adequate solutions after a bit of searching.  I tried using the springsource mail/activation bundles (making sure that I imported classes from those bundles rather than the system bundle) assuming they might have solved the problem, but javax.activation still had no visibility into javax.mail.

Digging through the code, I found that I could recreate the effects of META-INF/mailcap by directly calling CommandMap.setDefaultCommandMap().  That worked, but then the question became where to put that code.  Do I create a service in my bundle to set these "global" values?  What if some other clever bundle did the same thing?  Maybe I should create a new bundle that did this setup, just to make sure I can turn it on and off easily.  No matter what, this seemed ugly.
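For anyone who wants to try the CommandMap route, here's a minimal sketch of what that setup can look like.  The class and method names (MailcapSetup, registerMailHandlers) are made up for illustration, and the handler entries are assumed to mirror the ones in the META-INF/mailcap file shipped with javax.mail, so check them against the mailcap in your own version.  It needs the activation jar on the classpath to compile, and the mail jar at runtime so the handler classes can be loaded.

```java
import javax.activation.CommandMap;
import javax.activation.MailcapCommandMap;

// Hypothetical helper: registers the JavaMail content handlers in code
// instead of relying on META-INF/mailcap being visible to javax.activation.
final class MailcapSetup {

    public static void registerMailHandlers() {
        MailcapCommandMap map = new MailcapCommandMap();
        // Entries assumed to match the mailcap file bundled with javax.mail.
        map.addMailcap("text/plain;; x-java-content-handler=com.sun.mail.handlers.text_plain");
        map.addMailcap("text/html;; x-java-content-handler=com.sun.mail.handlers.text_html");
        map.addMailcap("text/xml;; x-java-content-handler=com.sun.mail.handlers.text_xml");
        map.addMailcap("multipart/*;; x-java-content-handler=com.sun.mail.handlers.multipart_mixed");
        map.addMailcap("message/rfc822;; x-java-content-handler=com.sun.mail.handlers.message_rfc822");
        // This is a JVM-wide setting, which is exactly the "global" side effect
        // that makes this approach uncomfortable in a shared container.
        CommandMap.setDefaultCommandMap(map);
    }
}
```

The call would have to run once before the first multipart message is sent, for example from a bundle activator.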

The other option was to embed javax.mail and javax.activation into my bundle (Embed-Dependency using bnd).  This physically puts the mail/activation jars inside the bundle and uses those classes before any system ones, making them private to the bundle in much the same way as a JAR in WEB-INF/lib or in some other Java EE archive.  This would be a bad thing if I needed to share objects from those libraries with the outside world or if they provided OSGi services, but that's clearly not the case here.
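Embed-Dependency is an instruction of the Maven bundle plugin, which is built on bnd.  A configuration fragment for embedding the two jars might look something like the following.  This is only a sketch: it assumes the dependencies are declared in the POM with the artifactIds mail and activation, and a real build will likely need other instructions alongside it.

```xml
<plugin>
  <groupId>org.apache.felix</groupId>
  <artifactId>maven-bundle-plugin</artifactId>
  <extensions>true</extensions>
  <configuration>
    <instructions>
      <!-- Copy the matching dependency jars into the bundle and add them
           to Bundle-ClassPath. The artifactIds here are assumptions. -->
      <Embed-Dependency>mail,activation</Embed-Dependency>
    </instructions>
  </configuration>
</plugin>
```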

In the end, I've gone with the embedded JAR approach.  It appears to solve my problems, and I'm able to fully use the libraries, at least as far as I can tell.  Making the JARs private does bloat the bundles slightly, but I'm fully isolated from other JARs and from global side effects.  

There's nothing magic about this solution, but since I wasn't able to find an adequate solution documented elsewhere, I thought it would be worthwhile to share what works.  If anyone has a better solution, please post in the comments.

Using git in an svn world

I've been proposing that we move from svn to git at work because I think it would be a really good fit for our current process.  We are using svn along with a tool called savana which does user branches in svn.  It's not a terrible solution when it works, but savana feels like a hack and we spend quite a bit of time managing branches and cleaning up after savana failures.  

Git would solve a lot of problems, but it's also not without risk.  Transitioning takes time, existing processes would need to be reworked, and people would need to learn the new tools.  There's no doubt that git would be better in the long run, but it's hard to justify a transition today.

As a lower risk step towards git, I've been using the git svn functionality.  With git, I can check out our master svn repository into a local git repository on my machine.  Git is aware of all of our branches and tags.  I can create a local branch from any remote svn branch and push any changes I commit locally back into svn at the right point.  

A local git branch can completely replace the need for savana user branches; however, the issue of sharing does come up.  It's not uncommon for two developers to collaborate on a shared user branch related to a new feature.  With user branches on the local machine, collaboration becomes a little more difficult.  It's possible for those users to collaborate on a shared user branch in svn, but then git is really not helping out much.  Git does have many options for developer collaboration, but I don't think sending patch files around is really the kind of developer process that would work well for us.  Perhaps using "git daemon" would work for us.  It's really hard to say.

The bigger advantage of introducing git svn as a replacement for savana is that the other developers can experiment with git on their terms and hopefully become familiar enough with git that later on switching out svn for git as the central repository would hardly even be noticed.

Getting to google docs via code

Over the last few weeks, I've had the need to share data from applications I'm working on with interested parties.  In an ideal world, this access would be baked into the application through some sort of administrative UI.  However, anything non-trivial would certainly take a good bit of effort, not only to create that UI but also to administer it (user management and access control, for example).  When I better understand the long-term data reporting needs of the app, I'm sure I'll go down that road.  But for now, I just needed to share some basic data out from the app.

Spreadsheets are great for this type of data, and google docs has very sophisticated sharing mechanisms.  For one-time data sharing tasks, it's trivial to upload the results of a database query as CSV and share it with others.  However, I needed to keep this data current, and I really didn't want to be manually uploading reports every day.

Google provides programmatic access via the Google Data Protocol.  With GData I can write code to push data to a shared document.  This is probably less work than a custom UI, but it's not exactly free.  It's a generic enough task that I think I might write a JDBC -> GData exporter at some point just for kicks, but I'm not there yet.  
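The JDBC half of such an exporter is the easy part.  Here's a minimal sketch of turning a query result into CSV text, using only the standard library.  CsvExport and its methods are hypothetical names, every column is read as a string, and the GData upload half is left out entirely.

```java
import java.io.IOException;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library: writes a JDBC ResultSet as CSV.
final class CsvExport {

    // Quote a field only when it contains a comma, a double quote, or a line break.
    public static String escape(String field) {
        if (field == null) {
            return ""; // SQL NULL becomes an empty field
        }
        String doubled = field.replace("\"", "\"\"");
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        return needsQuotes ? "\"" + doubled + "\"" : doubled;
    }

    // Join already-extracted field values into one CSV line (no line terminator).
    public static String toCsvLine(List<String> fields) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                line.append(',');
            }
            line.append(escape(fields.get(i)));
        }
        return line.toString();
    }

    // Write a header row of column labels, then one line per result row.
    public static void write(ResultSet rs, Appendable out) throws SQLException, IOException {
        ResultSetMetaData meta = rs.getMetaData();
        int columns = meta.getColumnCount();
        List<String> row = new ArrayList<String>();
        for (int c = 1; c <= columns; c++) {
            row.add(meta.getColumnLabel(c));
        }
        out.append(toCsvLine(row)).append('\n');
        while (rs.next()) {
            row.clear();
            for (int c = 1; c <= columns; c++) {
                row.add(rs.getString(c));
            }
            out.append(toCsvLine(row)).append('\n');
        }
    }
}
```

Pointing write at a FileWriter or a StringBuilder gives CSV that is ready to hand to whatever does the upload.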

Last week I stumbled across a scriptable Google docs upload tool.  With this tool, I was able to write a shell script that would run queries against the database and upload the output (as CSV) to google docs in a shared directory.  Cron makes sure that the script is run at regular intervals.  The net result is that I have an automated way to share data with interested parties in a format that they can work with.  I can trivially add new queries or reports, and managing users and access is simple.  

It's not a perfect solution, but it solved the immediate problem very quickly.  Aside from the hassle of trying to get a valid AuthSub token (I didn't want to put my username and password in the script), it took very little time to implement.