PUT *heart* MySQL

MySQL has a non-standard extension called INSERT ... ON DUPLICATE KEY UPDATE. When you’re building a RESTful application, it really rocks, because you can use it quite easily to have just the right PUT semantics. For example, you can have queries such as

INSERT INTO Book
           (id, deleted, a, b, c)
    VALUES (#id#, 0, #a#, #b#, #c#)
    ON DUPLICATE KEY UPDATE
        a = #a#, b = #b#, c=#c#,
        deleted = 0,
        revision = revision + 1

which you might call from java (example uses iBatis) using something like

    /* PUT a book, return an HTTP status code or throw an exception */
    public int putBookPart(final BookPart bookPart) throws SQLException {
        final int affected = sqlMapClient.update("putBook", bookPart);
        assert affected == 1 || affected == 2;
        return 202 - affected;
    }

Someone at MySQL was clever enough to hijack the affected value – it will be 1 if this was an insert (resulting in 201 Created) or 2 if it was an update (resulting in 200 OK). I don’t know about you, but to me that looks pretty sweet!

In this simple case, there is only one SQL query that happens to do this PUT, and so I don’t even have to bother with transactions.

Oh, just realized there’s one other trick in this example: the revision column is int(11) UNSIGNED NOT NULL DEFAULT 1 which is the basis of my ETag.
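Here is a minimal sketch of how that revision column could become an ETag and drive a conditional GET. The class and method names are made up for illustration, and it assumes the ETag is nothing more than the quoted revision number. It also ignores `*` and comma-separated lists in If-None-Match.

```java
/** Hypothetical helper: derives an ETag from the Book.revision column. */
final class RevisionEtag {
    private RevisionEtag() {}

    /** ETag values are quoted strings, so revision 3 becomes "3" including the quotes. */
    static String etagFor(final long revision) {
        return "\"" + revision + "\"";
    }

    /**
     * Status for a conditional GET: 304 if the client's If-None-Match
     * header already names the current revision, otherwise 200.
     */
    static int statusForGet(final String ifNoneMatch, final long revision) {
        if (ifNoneMatch != null && ifNoneMatch.trim().equals(etagFor(revision))) {
            return 304; // Not Modified, the client copy is current
        }
        return 200; // OK, send the full representation
    }
}
```

Because every PUT bumps revision by one, any write invalidates the ETag that clients hold.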

On the JDBC API incompatibility in Java 6 / JDBC 4.0

I updated the venerable avalon logkit for JDBC 4.0 / Java 6 today. I had to do this because that API breaks backwards compatibility for implementers.

Note that the incompatibility is very subtle – it is a source incompatibility only. As long as you don’t recompile your DataSource against the java 6 codebase, you won’t notice any problem.
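To make that concrete: in JDBC 4.0, javax.sql.DataSource started extending java.sql.Wrapper, so an implementation written for the older API no longer compiles until it gains isWrapperFor and unwrap. Below is a rough sketch of what such a patched implementation might look like. DelegatingDataSource is a hypothetical class, not the logkit code, and getParentLogger is there only because later JDKs (Java 7 onwards) require it as well.

```java
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.logging.Logger;
import javax.sql.DataSource;

/** Hypothetical DataSource that forwards everything to another DataSource. */
class DelegatingDataSource implements DataSource {
    private final DataSource delegate;

    DelegatingDataSource(final DataSource delegate) {
        this.delegate = delegate;
    }

    // --- methods that already existed before JDBC 4.0 ---
    public Connection getConnection() throws SQLException {
        return delegate.getConnection();
    }

    public Connection getConnection(final String user, final String password)
            throws SQLException {
        return delegate.getConnection(user, password);
    }

    public PrintWriter getLogWriter() throws SQLException {
        return delegate.getLogWriter();
    }

    public void setLogWriter(final PrintWriter out) throws SQLException {
        delegate.setLogWriter(out);
    }

    public void setLoginTimeout(final int seconds) throws SQLException {
        delegate.setLoginTimeout(seconds);
    }

    public int getLoginTimeout() throws SQLException {
        return delegate.getLoginTimeout();
    }

    // --- new in JDBC 4.0, inherited from java.sql.Wrapper ---
    public boolean isWrapperFor(final Class<?> iface) throws SQLException {
        return iface.isInstance(this) || delegate.isWrapperFor(iface);
    }

    public <T> T unwrap(final Class<T> iface) throws SQLException {
        if (iface.isInstance(this)) {
            return iface.cast(this);
        }
        return delegate.unwrap(iface);
    }

    // --- added later still (Java 7), needed to compile on current JDKs ---
    public Logger getParentLogger() throws SQLFeatureNotSupportedException {
        return delegate.getParentLogger();
    }
}
```

A class compiled before the two Wrapper methods existed still loads and runs; it only fails if something actually calls one of the missing methods.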

The cost of API incompatibility

Spring and DBCP had to make roughly the same change 2 years or so ago, as no doubt 100s of other projects have done over the years – everyone that implements DataSource and wanted it to compile against java 6 / JDBC 4.0. Of course the pain is not over, since all those projects still want to maintain backward compatibility even when the JDBC spec did not. A recent thread about commons-dbcp is a good example.

The majority of the API incompatibilities between java 1.4 and java 6 are in fact due to the new JDBC 4.0 package. (Other significant breakages include changes to javax.net.ssl.SSLSession and org.w3c.dom.) Of all the API breakage, the changes in JDBC 4.0 probably have had by far the biggest impact, since the JDBC interfaces are explicitly designed for implementation by third parties, and are implemented in many different places.

Incompatibility by accident

This slipped through even though the relevant JSR explicitly states in its proposal:

Ensure JDBC backward compatibility

Many applications and deployments have significant investments in the JDBC technology and any improvements to the API, the provision of utility class methods and the ability to utilize meta data facilities and generics will maintain backward compatibility to all previous JDBC specifications.

And the final draft spec says something similar:

Maintain backward compatibility with existing applications and drivers

Existing JDBC technology-enabled drivers ( JDBC drivers) and the applications that use them must continue to work in an implementation of the Java virtual machine that supports the JDBC 4.0 API. Applications that use only features defined in earlier releases of the JDBC API will not require changes to continue running. It should be straightforward for existing applications to migrate to JDBC 4.0 technology.

The spec doesn’t mention the incompatibility in its revision history or in the overview of the new features.

The expert group was led by someone from Sun, and there was participation from IBM, BEA, Oracle, MySQL, and many others. The executive committee for the JCP (which includes Apache, Google, HP, and others) unanimously approved the JDBC 4.0 API several times.

Even though JSRs have to have extensive TCKs (Technology Compatibility Kits), and even though the spec was co-authored and reviewed by a large sampling of vendors of JDBC technology, and even though Sun has extensive QA processes, this change still slipped through.

I wish I could read through the mailing list archives or the bug tracker for the JSR expert group’s work, to see if it’s possible to figure out if anyone found this problem before they released the spec. It’s obvious (to me) now that they made a mistake here, but I wonder if they were aware of the impact of this change back then. Alas, that data is not available.

Preventing incompatibility by automation

Something like this could have trivially been avoided if someone had bothered to run JAPI, the API comparison tool that Kaffe uses, or if someone had put the JDBC 4.0 draft API into a large-scale integration tool like gump.

It seems pretty obvious to me that doing spec development and API evolution in the open is a really good way to increase quality of the specification. This is a good concrete example of why doing open development matters.

Web application platform technology choices

The hardest bit in the web application platform challenge is making reasonable choices. Here’s a stab at some of them…

Hosting models

I see these basic choices:

  1. LAMP virtual hosting. If you can build everything you need with mysql+php and you have few enough users that you need only one database server, by far the easiest and cheapest.
  2. Application hosting. Code on github, project management with basecamp or hosted jira, build on AppEngine or Heroku or force.com. You don’t have to do your own infrastructure but you’re limited in what you can build. Also comes with a large chance of lock-in.
  3. Managed hosting. Rent (virtual) servers with pre-installed operating systems and managed networking. Expensive for large deployments but you don’t need all web operations team skills and you have a lot of flexibility (famously, twitter do this).
  4. Dedicated hosting. Buy or rent servers, rent rackspace or build your own data center. You need network engineers and people that can handle hardware. Usually the only cost-effective option beyond a certain size.

Given our stated requirements, we are really only talking about option #4, but I wanted to mention the alternatives because they will make sense for a lot of people. Oh, and I think all the other options are these days called cloud computing 🙂

Hardware platform

I’m not really a hardware guy, normally I leave this kind of stuff to others. Anyone have any good hardware evaluation guides? Some things I do know:

  • Get at least two of everything.
  • Get quality switches. Many of the worst outages have something to do with blown-up switches, and since you usually have only a few, losing one during a traffic spike is uncool.
  • Get beefy database boxes. Scaling databases out is hard, but they scale up nicely without wasting resources.
  • Get beefy (hardware) load balancers. Going to more than 2 load balancers is complicated, and while the load balancers have spare capacity they can help with SSL, caching, etc.
  • Get beefy boxes to run your monitoring systems (remember, two of everything). In my experience most monitoring systems suffer from pretty crappy architectures, and so are real resource hogs.
  • Get hardware RAID (RAID 5 seems common) with a battery-backed write-back cache, for all storage systems. That is, unless you have some other redundancy architecture and you don’t need RAID for redundancy.
  • Don’t forget about hardware for backups. Do you need tape?

Other thoughts:

  • Appliances. I really like the idea. Things like the schooner appliances for mysql and memcache, or the kickfire appliance for mysql analytics. I have no firsthand experience with them (yet) though. I’m guessing oracle+sun is going to be big in this space.
  • SSD. It is obviously the future, but right now they seem to come with limited warranties, and they’re still expensive enough that you should only use them for data that will actually get hot.

Operating system

Choice #1: unix-ish or windows or both. The Microsoft Web Platform actually looks pretty impressive to me these days but I don’t know much about it. So I’ll go for unix-ish.

Choice #2: ubuntu or red hat or freebsd or opensolaris.

I think Ubuntu is currently the best of the debian-based linuxes. I somewhat prefer ubuntu to red hat, primarily because I really don’t like RPM. Unfortunately red hat comes with better training and certification programs, better hardware vendor support and better available support options.

FreeBSD and solaris have a whole bunch of advantages (zfs, zones/jails, smf, network stack, many-core, …) over linux that make linux seem like a useless toy, if it wasn’t for the fact that linux sees so much more use. This is important: linux has the largest array of pre-packaged software that works on it out of the box, linux runs on more hardware (like laptops…), and many more developers are used to linux.

One approach would be solaris for database (ZFS) and media (ZFS!) hosting, and linux for application hosting. The cost of that, of course, would be the complexity in having to manage two platforms. The question then is whether the gain in manageability offsets the price paid in complexity.

And so, red hat gains another (reluctant) customer.

Database

As much sympathy as I have for the NoSQL movement, the relational database is not dead, and it sure as hell is easier to manage. When dealing with a wide variety of applications by a wide variety of developers, and a lot of legacy software, I think a SQL database is still the default model to go with. There’s a large range of options there.

Choice #1: clustered or sharded. At some point some application will have more data than fits on one server, and it will have to be split. Either you use a fancy database that supports clustering (like Oracle or SQL Server), or you use some fancy clustering middleware (like continuent), or you teach your application to split up the data (using horizontal partitioning or sharding) and you use a more no-frills open source database (mysql or postgres).
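For the "teach your application" option, the simplest possible version is a function from a key to a database. A minimal sketch, assuming numeric keys and a fixed list of shards; ShardRouter and the JDBC URLs in the usage note are made up for illustration.

```java
import java.util.List;

/** Hypothetical application-level shard router: key modulo number of shards. */
final class ShardRouter {
    private final List<String> shardUrls;

    ShardRouter(final List<String> shardUrls) {
        if (shardUrls.isEmpty()) {
            throw new IllegalArgumentException("need at least one shard");
        }
        this.shardUrls = shardUrls;
    }

    /** Maps a numeric key (say, a user id) to the JDBC URL of the shard that owns it. */
    String shardFor(final long key) {
        // floorMod keeps the index non-negative even for negative keys
        final int index = (int) Math.floorMod(key, (long) shardUrls.size());
        return shardUrls.get(index);
    }
}
```

With two shards, `new ShardRouter(Arrays.asList("jdbc:mysql://db0/app", "jdbc:mysql://db1/app")).shardFor(7)` picks the second URL. Plain modulo is easy to reason about, but adding a shard moves most keys, so real deployments often put a lookup table or consistent hashing in this spot instead.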

I suspect that the additional cost of operating an oracle cluster may very well be worth paying for – besides not having to do application level clustering, the excellent management and analysis tools are worth it. I wish someone did a model/spreadsheet to prove it. Anyone?

However, it is much easier to find developers skilled with open source databases, and it is much easier for developers to run a local copy of their database for development. Again there’s a tradeoff.

The choice between mysql and postgres has a similar tradeoff. Postgres has a much more complete feature set, but mysql is slightly easier to get started with and has significantly easier-to-use replication features.

And so, mysql gains another (reluctant) customer.

With that choice made, I think it’s important to invest early on in providing some higher-level APIs so that while the storage engine might be InnoDB and the access to that storage engine might be MySQL, many applications are coded to talk to a more constrained API. Things like Amazon’s S3, SimpleDB and the Google AppEngine data store provide good examples of constrained APIs that are worth emulating.
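As an illustration of how small such a constrained API can be, here is a sketch of a bucket/key store with only get, put and delete. BlobStore and InMemoryBlobStore are hypothetical names; the in-memory map is just a stand-in for whatever MySQL-backed implementation would sit behind the interface.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical constrained storage API: look up by bucket and key, nothing else. */
interface BlobStore {
    Optional<byte[]> get(String bucket, String key);
    void put(String bucket, String key, byte[] value);
    void delete(String bucket, String key);
}

/** In-memory stand-in; assumes bucket names never contain '/'. */
final class InMemoryBlobStore implements BlobStore {
    private final Map<String, byte[]> data = new ConcurrentHashMap<>();

    private static String id(final String bucket, final String key) {
        return bucket + "/" + key;
    }

    public Optional<byte[]> get(final String bucket, final String key) {
        final byte[] value = data.get(id(bucket, key));
        // hand out a copy so callers cannot modify stored data
        return value == null ? Optional.empty() : Optional.of(value.clone());
    }

    public void put(final String bucket, final String key, final byte[] value) {
        data.put(id(bucket, key), value.clone());
    }

    public void delete(final String bucket, final String key) {
        data.remove(id(bucket, key));
    }
}
```

Applications coded against an interface like this never see SQL, which is what leaves room to shard or swap the storage underneath later.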

HTTP architecture

Apache HTTPD. Easiest choice so far. Its swiss army knife characteristic is quite important. It’s what everyone knows. Things like nginx are pretty cool and can be used as the main web server, but I suspect most people that switch to them should’ve spent some time tuning httpd instead. Since I know how to do that…I’ll stick with what I know.

As easy as that choice is, the choice of what to put between HTTPD and the web seems to be harder than ever. The basic sanctioned architecture these days seems to use BGP load sharing to have the switches direct traffic at some fancy layer 7 load balancers where you terminate SSL and KeepAlive. Those fancy load balancers then may point at a layer of caching reverse proxies, which then point at the (httpd) app servers.

I’m going to assume we can afford a pair of F5 Big-IPs per datacenter. Since they can do caching, too, we might avoid building that reverse proxy layer until we need it (at which point we can evaluate squid, varnish, HAProxy, nginx and perlbal, with that evaluation showing we should go with Varnish 🙂 ).

Application architecture

Memcache is nearly everywhere, obviously. Or is it? If you’re starting mostly from scratch and most stuff can be AJAX, http caching in front of the frontends (see above) might be nearly enough.

Assuming a 3-tier (web, middleware, db) system, reasonable choices for the front-end layer might include PHP, WSGI+Django, and mod_perl. I still can’t see myself rolling out Ruby on Rails on a large scale. Reasonable middleware choices might include java servlets, unix daemons written in C/C++ and more mod_perl. I’d say Twisted would be an unreasonable but feasible choice 🙂

Communication between the layers could be REST/HTTP (probably going through the reverse proxy caches) but I’d like to try and make use of thrift. Latency is a bitch, and HTTP doesn’t help.

I’m not sure whether considering a 2-tier system (i.e. PHP direct to database, or perhaps PHP link against C/C++ modules that talk to the database) makes sense these days. I think the layered architecture is usually worth it, mostly for organizational reasons: you can have specialized backend teams and frontend teams.

If it was me personally doing the development, I’m pretty sure I would go 3-tier, with (mostly) mod_wsgi/python frontends using (mostly) thrift to connect to (mostly) daemonized python backends (to be re-written in faster/more concurrent languages as usage patterns dictate) that connect to a farm of (mostly) mysql databases using raw _mysql, with just about all caching in front of the frontend layer. I’m not so sure it’s easy to teach a large community of people that pattern; it’d be interesting to try 🙂

As for the more boring choice…PHP frontends with java and/or C/C++ backends with REST in the middle seems easier to teach and evangelize, and it’s also easier to patch up bad apps by sticking custom caching stuff (and, shudder, mod_rewrite) in the middle.

Messaging

If there’s anything obvious in today’s web architecture it is that deferred processing is absolutely key to low-latency user experiences.

The obvious way to do asynchronous work is by pushing jobs on queues. One hard choice at the moment is what messaging stack to use. Obvious contenders include:

  • Websphere MQ (the expensive incumbent)
  • ActiveMQ (the best-known open source system with stability issues)
  • OpenAMQ (AMQP backed by interesting startup)
  • 0MQ (AMQP bought up by same startup)
  • RabbitMQ (AMQP by another startup; erlang yuck)
  • MRG (or QPid, AMQP by red hat which is not exactly a startup).
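Whichever broker wins, the shape of the pattern is the same: the request path enqueues a job and returns, and a worker picks it up later. A minimal in-process sketch using a java.util.concurrent queue as a stand-in for a real broker; DeferredWork is a hypothetical class, and nothing here survives a process restart the way a proper messaging stack would.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical in-process job queue with a single background worker. */
final class DeferredWork {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private volatile boolean running = true;
    private final Thread worker = new Thread(this::drain, "deferred-worker");

    DeferredWork() {
        worker.start();
    }

    /** Called from the request path: enqueue the job and return immediately. */
    void submit(final Runnable job) {
        queue.add(job);
    }

    /** Lets the worker finish whatever is already queued, then waits for it to exit. */
    void shutdown() {
        running = false;
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void drain() {
        try {
            while (running || !queue.isEmpty()) {
                final Runnable job = queue.poll(100, TimeUnit.MILLISECONDS);
                if (job == null) {
                    continue; // nothing queued, check the running flag again
                }
                try {
                    job.run();
                } catch (RuntimeException e) {
                    // one failing job should not kill the worker
                    System.err.println("job failed: " + e);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Swapping the in-memory queue for a JMS or STOMP destination changes the plumbing, not the calling code: producers still just submit and move on.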

A less obvious way to do asynchronous work is through a job architecture such as gearman, app engine cron or quartz, where the queue is not explicit but rather exists as a “pending connections” set of work.

I’m not sure what I would pick right now. I’d probably still stay safe and use AMQ with JMS and/or STOMP with JMS semantics. 2 months from now I might choose differently.

First look at open source IntelliJ

IntelliJ IDEA was open sourced yesterday!

Codebase overview

  • over 20k java source files, totalling just over 2M lines
  • over 150 jar files
  • over 500 xml files
  • build system based on ant, gant, and a library called jps for running intellij builds for which the source apparently is not available yet (see IDEA-25160)
  • Apache license header applied to most of the files, copyrights both jetbrains and a variety of individuals, license data not quite complete, no NOTICE.txt (see IDEA-25161)
  • ./platform is the core system
  • ./plugins plug into the core platform
  • ./java and ./xml are bigger plugin-collection-ish subsystems

Building…

  • Install ant (there is an ant in ./lib/ant)
  • Run ant
  • Build takes about 7 minutes on my macbook

Running…

On Mac OS X I run into 64 bit problems. Falling back to a 32-bit version of JDK 5.0 works for me…seems like jetbrains may have just fixed it.

cd /System/Library/Frameworks/JavaVM.framework/Versions/1.5.0/Home/bin
sudo bash
mv java java.orig
lipo java.orig -remove x86_64 -output java_32
ln -s java_32 java
cd -
export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.5.0/Home
export PATH=$JAVA_HOME/bin:$PATH
rm -Rf out
ant
cd out/artifacts
unzip ideaIC-90.SNAPSHOT.mac.zip
open ./Maia-IC-90.SNAPSHOT.app

Loading the idea source code into your just-built ide works seamlessly (just navigate to your git repo, an intellij project is already set up in the .idea directory).

Reading the code

com.intellij.idea.Main uses Bootstrap and MainImpl to invoke IdeaApplication.run(). We’re in IntelliJ OpenAPI land now. Somewhere further down the call stack something creates an ApplicationImpl which uses PicoContainer. w00t! That makes much more sense to me than the heavyweight OSGi/equinox that’s underpinning eclipse. It’s where plugins and extensions get loaded, after which things become very fluid and multi-threaded and harder to follow.

So now I’m thinking I should find a way to hook up IntelliJ into a debugger inside another IntelliJ…though it’d be cool if intellij was somehow “self-hosting” in that sense. Here’s hoping the intellij devs will write some how-to-hack docs soon!

java 1.6 exposes the system load average

See the javadoc.

Example usage:

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class LoadAverage {
    public static void main(String[] args) {
        final OperatingSystemMXBean osStats =
                ManagementFactory.getOperatingSystemMXBean();
        final double loadAverage = osStats.getSystemLoadAverage();
        System.out.println(String.format("load average: %f", loadAverage));
    }
}

This is a rather useful feature if you are writing software that should do less when the overall system load is high.

For example, if you’re me, you might be working on a java daemon that is instructing some CouchDB instances on the same box to do database compactions and/or replications, and you could use this to tune down the concurrency or the frequency if the load average is above a threshold.
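A rough sketch of that kind of throttle. LoadThrottle, the threshold and the worker counts are all made up; the one real detail to watch is that getSystemLoadAverage() returns a negative value on platforms where the load average is not available.

```java
import java.lang.management.ManagementFactory;

/** Hypothetical helper: picks how many workers to run based on system load. */
final class LoadThrottle {
    private LoadThrottle() {}

    /** Pure decision function, so it can be tested without touching the OS. */
    static int concurrencyFor(final double loadAverage,
                              final double threshold,
                              final int maxWorkers) {
        if (loadAverage < 0) {
            return maxWorkers; // load average not available on this platform
        }
        if (loadAverage >= threshold) {
            return 1; // box is busy: do the bare minimum
        }
        return maxWorkers;
    }

    /** Reads the current load average and applies the rule above. */
    static int currentConcurrency(final double threshold, final int maxWorkers) {
        final double load =
                ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
        return concurrencyFor(load, threshold, maxWorkers);
    }
}
```

A daemon could call `LoadThrottle.currentConcurrency(4.0, 8)` before each batch of compactions and size its thread pool accordingly.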

instanceof vs ClassCastException performance, which is faster?

Say you are writing a for loop for a java server application. This for loop is iterating over something that in practice (i.e. in production) is always of a specific subtype. But in theory it might not be, so being an average java developer, you add some defensive checking.

Now since the for loop will be run at least once for every request your platform receives, and you receive many millions of requests a week (perhaps you’re working at a place like the BBC?), you decide to spend 5 minutes making it “fast enough”.

So, should you keep that  if(!(r instanceof HttpServletRequest)) that you put in, replace it with try { ... } catch(ClassCastException e) { ... }, or remove the check completely?

Just to be clear: this is a really silly question and you should really just not bother spending your brain cycles thinking about it: java is fast and well-optimized over many years, it will be fast enough and many times faster than all your I/O handling and whatnot.

But, you now asked yourself this question and you really really want to know. You google for it and find no good answer. So you write a microbenchmark:

class InstanceOfBenchmark {
    static long globalCounter = 0;
    static long noCheck = 0;
    static long classCast = 0;
    static long instanceOf = 0;
    
    public static void main(final String[] args) {
        final int loops = 10000;
        final int encounterB = 1000;

        final A[] instances = new A[loops];
        for(int i = 0; i < loops; i++) {
            //if(i % encounterB == 0) {
            //    instances[i] = new B();
            //} else {
            instances[i] = new C();
            //}
        }
        
        for(int i = 0; i < 1000; i++) {
            testNoop(instances);
        }

        for (int i = 0; i < 1000; i++) {
            testClassCast(instances);
        }

        for (int i = 0; i < 1000; i++) {
            testInstanceOf(instances);
        }

        System.out.println("globalCounter = " + globalCounter);
        System.out.println(String.format(
                "Time for no-check test: %d", noCheck));
        System.out.println(String.format(
                "Time for ClassCastException test: %d", classCast));
        System.out.println(String.format(
                "Time for instanceof test: %d", instanceOf));
        System.out.println(String.format(
                "instanceof is slower than classCast by %f ms",
                1.0 * (instanceOf - classCast) / 1000 / 1000 / loops));
        System.out.println(String.format(
                "instanceof is slower than no check by %f ms",
                1.0 * (instanceOf - noCheck) / 1000 / 1000 / loops));
    }
    
    static void testNoop(final A[] instances) {
        long start, end;

        start = System.nanoTime();
        for (final A instance : instances) {
            final C c = (C)instance;
            c.otherDoNothing();
        }
        end = System.nanoTime();
        noCheck += end - start;
    }

    static void testClassCast(final A[] instances) {
        long start, end;

        start = System.nanoTime();
        for (final A instance : instances) {
            final C c;
            try {
                c = (C) instance;
                c.otherDoNothing();
            } catch (ClassCastException e) {
                // ignore
            }
        }
        end = System.nanoTime();
        classCast += end - start;
    }

    static void testInstanceOf(final A[] instances) {
        long start, end;

        start = System.nanoTime();
        for (final A instance : instances) {
            final C c;
            if (!(instance instanceof C)) {
                continue;
            }
            c = (C) instance;
            c.otherDoNothing();
        }
        end = System.nanoTime();
        instanceOf += end - start;
    }
    
    static class A {
        public void doNothing() {
            // does nothing
        }
    }

    static class B extends A {
    }

    static class C extends A {

        public void otherDoNothing() {
            // does _almost_ nothing, but not enough to be a no-op
            globalCounter++;
        }
    }
}

Here’s some sample results on my machine:

globalCounter = 30000000
Time for no-check test: 23981000
Time for ClassCastException test: 21082000
Time for instanceof test: 21551000
instanceof is slower than classCast by 0.000047 ms
instanceof is slower than no check by -0.000243 ms

globalCounter = 30000000
Time for no-check test: 27874000
Time for ClassCastException test: 21886000
Time for instanceof test: 35174000
instanceof is slower than classCast by 0.001329 ms
instanceof is slower than no check by 0.000730 ms

globalCounter = 30000000
Time for no-check test: 26983000
Time for ClassCastException test: 22007000
Time for instanceof test: 22568000
instanceof is slower than classCast by 0.000056 ms
instanceof is slower than no check by -0.000442 ms

So for performance it really doesn’t matter. It’s safe to assume that the JVM JITs and Hotspots its way to the most optimal code path in all cases.

If you want to fiddle some more, note that the above test has some code to allow you to vary the success/failure path; just uncomment/comment a few lines and tweak encounterB. The result is the same in all cases: performance is the same.
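If you do mix in some B instances, it helps to first convince yourself that the two defensive variants behave the same, that is, both skip exactly the non-C elements (the no-check variant would simply blow up with a ClassCastException). A small sketch that checks behaviour only, not timing; MixedTypes and its method names are made up for this example.

```java
/** Hypothetical check that instanceof and catch-ClassCastException skip the same elements. */
final class MixedTypes {
    static class A {}
    static class B extends A {}
    static class C extends A {
        int calls;
        void otherDoNothing() { calls++; }
    }

    /** Builds n instances where every k-th one (index 0, k, 2k, ...) is a B, the rest are C. */
    static A[] build(final int n, final int k) {
        final A[] instances = new A[n];
        for (int i = 0; i < n; i++) {
            instances[i] = (i % k == 0) ? new B() : new C();
        }
        return instances;
    }

    /** Returns how many elements were handled when guarding with instanceof. */
    static int viaInstanceOf(final A[] instances) {
        int handled = 0;
        for (final A instance : instances) {
            if (!(instance instanceof C)) {
                continue;
            }
            ((C) instance).otherDoNothing();
            handled++;
        }
        return handled;
    }

    /** Returns how many elements were handled when catching ClassCastException. */
    static int viaClassCast(final A[] instances) {
        int handled = 0;
        for (final A instance : instances) {
            try {
                ((C) instance).otherDoNothing();
                handled++;
            } catch (ClassCastException e) {
                // not a C: skip it
            }
        }
        return handled;
    }
}
```

With `MixedTypes.build(10000, 1000)` there are 10 B instances, so both methods should report 9990 handled elements.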

Now with that settled once and for all, I can keep my instanceof and get back to work!

Google AppEngine for java, first impressions

Bottom line, it’s impressive. A lot of obvious things work in obvious ways. The google team also picked a lot of the right open source libraries to reuse, so I’m sure a lot of things will work well 🙂

Not allowing the creation of threads is not unexpected, but a bummer. At the same time I’m a bit surprised there are no (documented!) restrictions on memory consumption. API coverage is pretty complete and includes java.net and JavaMail. Missing bits include AWT, ImageIO, Swing, RMI, CORBA.

The most interesting question is “how to do bigtable from java”, and the answer is interesting too. Besides a low-level API, BigTable access is provided through DataNucleus which is the successor to JPOX. I was always fond of JPOX, shame it is gone. But DataNucleus does look reasonable enough.

Google’s written a DataNucleus adapter for BigTable, which has led to the claim that they support JDO and JPA. However, the JPA support today really seems too limited to be of any practical use. The JDO support does seem reasonable, though they don’t exactly fully implement the standard (yet).

I’m sure a lot of people were hoping for Hibernate support. I will guess that the google engineers tried and failed on that one. I will also guess the community will go and succeed, but that the result will often be horrible performance (since no matter what you do, joins will remain expensive).