Repeatable builds and maven?

A repeatable build is a build which you can re-run multiple times using the same source and the same commands, and that then results in exactly the same build output time and time again. The capability to do repeatable builds is an important cornerstone of every mature release management setup.
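To make "exactly the same build output" a bit more concrete, here is a minimal sketch of how you might compare two build output directories byte for byte. The function names (digest_tree, same_output) are made up for illustration, and it is written for a current Python rather than anything from the maven toolchain.

```python
import hashlib
import os


def digest_tree(root):
    """Return one SHA-256 hex digest covering every file under root."""
    total = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subdirectories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            relative = os.path.relpath(path, root).replace(os.sep, "/")
            # Hash each file on its own, then fold path + file hash into the total.
            file_hash = hashlib.sha256()
            with open(path, "rb") as handle:
                for chunk in iter(lambda: handle.read(65536), b""):
                    file_hash.update(chunk)
            line = "%s\0%s\n" % (relative, file_hash.hexdigest())
            total.update(line.encode("utf-8"))
    return total.hexdigest()


def same_output(first_build_dir, second_build_dir):
    """True when both directories hold the same files with the same bytes."""
    return digest_tree(first_build_dir) == digest_tree(second_build_dir)
```

Run the build twice from the same source into two directories and call same_output on them. Embedded timestamps in archives are a common reason for this check to fail even when "all tests pass".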

Of course, lots of people use a much more limited (and frankly silly) definition of a repeatable build, and are happy as long as “all tests pass”.

Getting to repeatable builds is nearly impossible for mere mortals using maven 1 (heck, maven 1 out of the box doesn’t even work anymore, since the ibiblio repository was changed in a way that causes maven 1 to break), and it is still prohibitively difficult with maven 2.

Of course, the people that do repeatable builds really well tend to create big all-encompassing solutions that are really hard to use with the tools used in real life, and they only really help you when you either do not have a gazillion dependencies, or you do the SCM for all your dependencies, too. For the average java developer, that all breaks down when you find out you can’t quite bootstrap the sun JDK very well, or are missing some other bit of important ‘source’.

Don’t get me wrong. Maven can be a very useful tool. Moreover, in practice, if you do large-scale java, you simply tend to run into maven at some point, and as a release engineer you cannot always do much about that. You must simply realize that doing release engineering based around maven is only sensible if you still really, really pay lots of attention to what you’re doing. Like running maven in offline mode for official builds. And wiping your local repository before you build releases. And keeping archived local repositories around with your distributions. And so on and so forth.
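As a rough sketch of what that attention could look like in a release script: the helper below builds an offline maven command line pinned to one specific local repository, and a second helper packs that repository into a tarball to keep with the distribution. The function names, the default "clean install" phases and the paths are assumptions for illustration. The --offline flag and the maven.repo.local property are standard maven options, but the sketch assumes the repository directory was already populated from a known-good archive, since an offline build against an empty repository will simply fail.

```python
import os
import tarfile


def official_build_command(local_repo, goals=("clean", "install")):
    """Argument list for an offline maven run against one pinned repository."""
    return ["mvn", "--offline", "-Dmaven.repo.local=" + local_repo] + list(goals)


def archive_local_repository(local_repo, archive_path):
    """Pack the local repository into a gzipped tarball to ship with a release."""
    top_level = os.path.basename(os.path.normpath(local_repo))
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(local_repo, arcname=top_level)
    return archive_path
```

The command list can be handed to subprocess.check_call as is. Keeping the tarball next to the release artifacts is what lets you re-run the build years later without depending on whatever the public repositories look like by then.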

Not paying attention or not thinking these kinds of tricky release engineering things through just isn’t very sensible, not when you’re doing so-called enterprise stuff where you might have to re-run a build 3 years after the fact. You cannot afford to count on maven to just magically do the right thing for you. Historically and typically, it doesn’t, at least not quite.

Our choices for python web applications

So, at work, we’re doing some “next generation” versions of a bunch of our backoffice tooling. That involves producing a bunch of cute little web applications that often control not so cute and not so little processes (like transcoding and publishing and whatnot). The coarse-grained architecture pattern is pretty simple and familiar: database with information about files, jobs, tasks and metadata, some common libraries for interacting with the database, some web application middleware using those libraries, and a web server frontend serving up the middleware.

Pretty much normal bread-and-butter stuff. It’s not quite like document-based CMS work (you don’t really want to store many-gigabyte video in a JCR repo), but a lot of the technology choices are still similar.

Customizing a snake

Based on the various tech we have deployed today, and the skills of the people working on this kind of thing, we’re trying to standardize around two main server-side technologies: java and python. This post explains the choices we made for the python universe. At the moment those choices are actually not so easy, since there’s so much happening and so many projects are moving so fast. We scouted the web quite a bit to figure out what to do.

Lower layers

  • OS: ubuntu 7.10 (still some nodes on 6.10)
  • Database: MySQL 5.0.45 (comes with ubuntu, a bit reconfigured of course) with some little bits of replication
  • Python: Python 2.4.4 (2.5 is not on ubuntu 6.10 and not on all our developer workstations, but we’re testing with it and will upgrade eventually)
  • WSGI server: right now we have a slightly customized cherrypy wsgi server (so that it accepts signals, restarts itself, runs from /etc/init.d, logs in all the right places, etc) behind an apache httpd 2.2 ProxyPass, which also handles SSL/AAA. We want to try and move to mod_wsgi but first we need its mac install to suck a bit less, and so far, cherry is not quite falling over on us. If mod_wsgi doesn’t work out it’ll probably be back to twisted, probably also behind apache for SSL reasons.

Application glue

  • Database access layer: storm, our own slightly modified version. We really like storm, and every now and then we find we are pushing it a bit beyond its limits, which leads to some bits of patching (by people smarter than me!). Fortunately it seems the guys working on it are quite responsive on IRC. I expect there’ll be a few (more) patches from us that flow back upstream. I really hope someone implements support for forking out reads and writes to different nodes (like you get for free with MySQL Connector/J), either in MySQLdb or inside storm.
  • Python web glue: We’re trying to do everything completely WSGI-based, though most everything at the moment is actually inside CherryPy 3.1b1 handlers. The WSGI pattern works just fine and scales nicely enough in our tests.
  • Templating: Genshi 0.4.4 (we had to pick one, there’s a few good choices here)
  • XML bits and pieces: lxml 1.3.6. It’s the best XML support in python so far, but it still isn’t quite as good as what you get in java. All the various bits and pieces just aren’t quite as mature, and the underlying libxml2 doesn’t quite do XML schema support as well, and I also miss something like XMLBeans for python.
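
For anyone who hasn't seen it, "doing everything the WSGI way" boils down to writing callables like the one below. This is a minimal sketch, not our actual code: hello_app and call_app are made-up names, it uses only the standard library (no CherryPy, storm or Genshi), and it is written for a current Python rather than the 2.4 we deploy on.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """A complete WSGI application: take the environ, start a response, return the body."""
    body = ("Hello from %s\n" % environ.get("PATH_INFO", "/")).encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(app, path="/"):
    """Call a WSGI app in-process, without a server, and collect what it produced."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], dict(captured["headers"]), body
```

Because the application is just a callable, the same hello_app can sit behind the cherrypy wsgi server, mod_wsgi or twisted without changes, which is most of the appeal.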

Out of the box?

We took a look at a bunch of the web frameworks out there. We didn’t seriously consider zope, but we took a long stare at pylons, turbogears and django before deciding not to bother with them. We’re not using much of paste either. Basically we missed one or more of

  • good support for storm out of the box
  • doing everything the WSGI way
  • good and correct documentation
  • easy to scale / make efficient
  • stable core with excellent compatibility and bugfixing

And perhaps a few other things. On balance we guessed it would be easier to roll our own and integrate components than to strip something else down and maintain lots of vendor branches.

Key point: standardization good

Two years ago I would’ve picked twisted without blinking and invented another fancy wheel on top of it, but I’m happy I don’t have to do that anymore. Twisted has quite a learning curve, not just for app developers, but also for the people that need to deploy and scale the beast.

Two good things happened to the python webapp world: competition and standardization. Now things are progressing rapidly.

Progress is good, but it can result in various kinds of chaos that don’t help the application developer who likes to plan ahead a bit. The new scripting-language-based mega frameworks seem to attract a certain kind of developer and they probably work for a certain set of use cases, but standardizing on patterns and interfaces is much more useful for (opinionated!) people like us (with subtly deviating use cases). So framework authors: please do keep working on bridging the gap between all of them by cutting ’em down into tiny little WSGI middleware bits and pieces, and turn frameworks into libraries where you can.
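
To show what I mean by a tiny little WSGI middleware bit, here is a sketch of about the smallest useful one: a wrapper that adds one response header to whatever application it is given. The names add_header and plain_app and the header value are invented for illustration, and this is plain standard WSGI rather than code lifted from any framework.

```python
def add_header(app, name, value):
    """Wrap a WSGI app so every response carries one extra header."""

    def middleware(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Copy the header list so the wrapped app's own list is not mutated.
            extended = list(headers) + [(name, value)]
            return start_response(status, extended, exc_info)

        return app(environ, patched_start_response)

    return middleware


def plain_app(environ, start_response):
    """Stand-in application; any WSGI callable would do here."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]


# The wrapped result is itself a WSGI app, so wrappers can be stacked freely.
wrapped = add_header(plain_app, "X-Served-By", "backoffice")
```

The middleware knows nothing about the application and the application knows nothing about the middleware. That is exactly the property that lets you mix pieces from different projects instead of buying into one framework.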