On the JDBC API incompatibility in Java 6 / JDBC 4.0

I updated the venerable Avalon LogKit for JDBC 4.0 / Java 6 today. I had to, because the new API breaks backwards compatibility for implementers.

Note that the incompatibility is very subtle – it is a source incompatibility only. As long as you don’t recompile your DataSource implementation against the Java 6 class libraries, you won’t notice any problem.
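The mechanism is easy to see with a pair of hypothetical mini-interfaces (not the real javax.sql classes, and `LogKitDataSource` is a made-up name): JDBC 4.0 made DataSource extend the new java.sql.Wrapper interface, which declares two abstract methods. Already-compiled implementations keep running, but their source no longer compiles until those methods are added. A sketch:

```java
// Hypothetical mini-interfaces mirroring the shape of the JDBC 4.0 change
// to javax.sql.DataSource; these are NOT the real classes.
interface Wrapper {                       // stands in for java.sql.Wrapper (new in Java 6)
    <T> T unwrap(Class<T> iface);
    boolean isWrapperFor(Class<?> iface);
}

interface DataSource extends Wrapper {    // the JDBC 4.0 shape of the interface
    Object getConnection();
}

// A pre-Java-6 implementation only had getConnection(). Its old .class file
// still runs (the change is binary compatible), but recompiling the source
// fails unless the two Wrapper methods are added -- that is the source
// incompatibility:
class LogKitDataSource implements DataSource {
    public Object getConnection() { return new Object(); }

    // Required additions to compile against the JDBC 4.0 shape:
    public <T> T unwrap(Class<T> iface) {
        throw new UnsupportedOperationException("not a wrapper");
    }
    public boolean isWrapperFor(Class<?> iface) { return false; }
}

public class Jdbc4Break {
    public static void main(String[] args) {
        DataSource ds = new LogKitDataSource();
        System.out.println(ds.isWrapperFor(Object.class)); // prints "false"
    }
}
```

Delete the two added methods and `LogKitDataSource` no longer compiles, which is exactly what happened to every third-party DataSource out there.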

The cost of API incompatibility

Spring and DBCP had to make roughly the same change two years or so ago, as no doubt hundreds of other projects have done over the years – every project that implements DataSource and wants it to compile against Java 6 / JDBC 4.0. Of course the pain is not over, since all those projects still want to maintain backward compatibility even though the JDBC spec did not. A recent thread about commons-dbcp is a good example.

The majority of the API incompatibilities between Java 1.4 and Java 6 are in fact due to the new JDBC 4.0 package. (Other significant breakages include changes to javax.net.ssl.SSLSession and org.w3c.dom.) Of all the API breakage, the changes in JDBC 4.0 have probably had by far the biggest impact, since the JDBC interfaces are explicitly designed for implementation by third parties, and are implemented in many different places.

Incompatibility by accident

This slipped through even though the relevant JSR explicitly states in its proposal:

Ensure JDBC backward compatibility

Many applications and deployments have significant investments in the JDBC technology and any improvements to the API, the provision of utility class methods and the ability to utilize meta data facilities and generics will maintain backward compatibility to all previous JDBC specifications.

And the final draft spec says something similar:

Maintain backward compatibility with existing applications and drivers

Existing JDBC technology-enabled drivers (JDBC drivers) and the applications that use them must continue to work in an implementation of the Java virtual machine that supports the JDBC 4.0 API. Applications that use only features defined in earlier releases of the JDBC API will not require changes to continue running. It should be straightforward for existing applications to migrate to JDBC 4.0 technology.

The spec doesn’t mention the incompatibility in its revision history or in the overview of the new features.

The expert group was led by someone from Sun, and there was participation from IBM, BEA, Oracle, MySQL, and many others. The executive committee for the JCP (which includes Apache, Google, HP, and others) unanimously approved the JDBC 4.0 API several times.

Even though JSRs have to have extensive TCKs (Technology Compatibility Kits), and even though the spec was co-authored and reviewed by a large sampling of vendors of JDBC technology, and even though Sun has extensive QA processes, this change still slipped through.

I wish I could read through the mailing list archives or the bug tracker for the JSR expert group’s work, to see if it’s possible to figure out whether anyone found this problem before the spec was released. It’s obvious (to me) now that they made a mistake here, but I wonder if they were aware of the impact of this change back then. Alas, that data is not available.

Preventing incompatibility by automation

Something like this could trivially have been avoided if someone had bothered to run JAPI, the API comparison tool that Kaffe uses, or if someone had put the JDBC 4.0 draft API into a large-scale integration tool like Gump.
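What such a comparison tool has to detect is simple in principle: abstract methods that appear in a new version of a published interface. Here is a toy sketch of that check using reflection over two hypothetical interface versions (this is my illustration of the idea, not how JAPI actually works, and it compares by method name only):

```java
import java.lang.reflect.Method;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class ApiDiff {
    // Two hypothetical versions of an interface, mirroring what JDBC 4.0
    // did to javax.sql.DataSource.
    interface V1 { Object getConnection(); }
    interface V2 { Object getConnection(); boolean isWrapperFor(Class<?> c); }

    // Report methods present in 'next' but absent from 'prev'. For an
    // interface, each such method is a source-compatibility break for
    // every implementer. (Simplified: compares names, not full signatures.)
    static Set<String> addedMethods(Class<?> prev, Class<?> next) {
        Set<String> old = new HashSet<String>();
        for (Method m : prev.getMethods()) old.add(m.getName());
        Set<String> added = new TreeSet<String>();
        for (Method m : next.getMethods())
            if (!old.contains(m.getName())) added.add(m.getName());
        return added;
    }

    public static void main(String[] args) {
        // prints "[isWrapperFor]" -- the kind of flag that should have
        // gone up during spec review
        System.out.println(addedMethods(V1.class, V2.class));
    }
}
```

Run this over the published 1.4 rt.jar and a draft Java 6 build and the JDBC breakage falls straight out.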

It seems pretty obvious to me that doing spec development and API evolution in the open is a really good way to increase the quality of a specification. This is a good concrete example of why open development matters.

Open source product-centric business model

  1. Build a community around a specific free thing.
  2. With that community’s help, design some products that people want, and return the favor by making the products free in raw form (source code).
  3. Let those with more money than time/skill/risk-tolerance buy the more polished version of those products. (That may turn out to be almost everyone.)
  4. Do it again and again, building a 40% margin into the products to pay the bills.

(Adapted slightly from The time/money formula of free).

That pretty much summarizes the business model for Sleepycat, MySQL, etc. You could also call it the “provide free lunch for volume, with a fractional but large up-sell” model.

Leo is…

I have always been jealous of Sam for having an idea of what Ruby is. No more.

Leo is…:

  • A general data management environment. Leo shows user-created relationships among any kind of data: computer programs, web sites, etc.
  • An outlining editor for programmers. Leo embeds the noweb and CWEB markup languages in an outline context.
  • A flexible browser for projects, programs, classes or any other data.
  • A project manager. Leo provides multiple views of a project within a single outline. Leo naturally represents tasks that remain up-to-date.
  • Portable. Leo runs on Windows, Linux and MacOS X.
  • 100% pure Python. Leo uses Tk/tcl to draw the screen.
  • Fully scriptable using Python. Leo’s outline files are XML format.
  • Open Software, distributed under the Python License.

Leo is certainly a lot like me. Deals with lots of data and their interconnections, very flexible, heavily focussed on python, heavily focussed on people, open source, pretty smart, and a lot of fun to play with.

Repeatable builds and maven?

A repeatable build is a build you can re-run multiple times from the same source with the same commands, getting exactly the same build output time and time again. The capability to do repeatable builds is an important cornerstone of every mature release management setup.
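The litmus test for this stronger definition is byte-for-byte identity: run the build twice and compare digests of the artifacts. A minimal sketch in Java (the inputs here are stand-in byte arrays; in real use they would be streams over the two jars):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class RepeatableCheck {
    // Hash a stream with SHA-256; identical bytes yield an identical digest.
    static String sha256(InputStream in)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins for the artifacts of two runs of the same build;
        // in real use these would be FileInputStreams over the two jars.
        byte[] run1 = "identical build output".getBytes("UTF-8");
        byte[] run2 = "identical build output".getBytes("UTF-8");
        String d1 = sha256(new ByteArrayInputStream(run1));
        String d2 = sha256(new ByteArrayInputStream(run2));
        System.out.println(d1.equals(d2) ? "repeatable" : "NOT repeatable");
    }
}
```

In practice jars rarely pass this test unmodified (they embed timestamps), which is exactly the kind of detail that separates the strict definition from the lazy one.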

Of course, lots of people use a much more limited (and rather silly) definition of a repeatable build, and are happy as long as “all tests pass”.

Getting to repeatable builds is nearly impossible for mere mortals using Maven 1 (heck, Maven 1 out of the box doesn’t even work anymore, since the ibiblio repository was changed in a way that breaks it), and it is still prohibitively difficult with Maven 2.

Of course, the people that do repeatable builds really well tend to create big all-encompassing solutions that are really hard to use with the tools used in real life, and those only really help when you either do not have a gazillion dependencies, or you control the SCM for all your dependencies, too. For the average Java developer it all breaks down when you find out you can’t quite bootstrap the Sun JDK very well, or are missing some other bit of important ‘source’.

Don’t get me wrong: Maven can be a very useful tool. Moreover, in practice, if you do large-scale Java you simply tend to run into Maven at some point, and as a release engineer you cannot always do much about that. You must simply realize that release engineering based around Maven is only sensible if you still pay really close attention to what you’re doing. Like running Maven in offline mode for official builds. And wiping your local repository before you build releases. And keeping archived local repositories around with your distributions. And so on and so forth.

Not paying attention, or not thinking these tricky release engineering issues through, just isn’t very sensible, not when you’re doing so-called enterprise stuff where you might have to re-run a build three years after the fact. You cannot afford to count on Maven to just magically do the right thing for you. Historically and typically, it doesn’t, at least not quite.

cherry-picking changesets is hard but possible

At work we use svn, with svnmerge for managing branches (most projects tend to use a trunk with a stable branch or two). I’m pretty much an svnmerge newbie, so every now and then I mess up one of my trees. When that happens, I’m lucky enough to have a fair share of total svn experts around to help me figure out what happened.

So here’s a little test scenario that shows the kind of merge conflict you tend to get when you cherry-pick a trunk change to merge to stable out of chronological order, and then later merge an older change that touches a conflicting line:

$ cd /tmp
$ mkdir svnmergetest
$ cd svnmergetest
$ mkdir repo
$ svnadmin create /tmp/svnmergetest/repo
$ svn co file:///tmp/svnmergetest/repo co
Checked out revision 0.
$ cd co
$ svn mkdir trunk
A
$ cd trunk
$ cat > A <<END
>       line ASTART
>       line A1
>       line A2
>       line AEND
> END
$ svn add A
A         A
$ svn commit -m "Init trunk"
Adding
Adding         trunk/A
Transmitting file data .
Committed revision 1.
$ cat > A <<END
>       line ASTART
>       line A1
>       line A3
>       line AEND
> END
$ svn diff A
Index: A
===================================================================
--- A   (revision 1)
+++ A   (working copy)
@@ -1,4 +1,4 @@
       line ASTART
       line A1
-      line A2
+      line A3
       line AEND
$ svn commit -m "commit 1"
Sending        trunk/A
Transmitting file data .
Committed revision 2.
$ cat > A <<END
>       line ASTART
>       line A1
>       line A4
>       line AEND
> END
$ svn diff A
Index: A
===================================================================
--- A   (revision 2)
+++ A   (working copy)
@@ -1,4 +1,4 @@
       line ASTART
       line A1
-      line A3
+      line A4
       line AEND
$ svn commit -m "commit 2"
Sending        trunk/A
Transmitting file data .
Committed revision 3.
$ cd ..
# FWIW, I messed up revisions 4 and 5; revision 6 un-does the mess-up
$ svn cp -m "Init stable" -r 1 file:///tmp/svnmergetest/repo/trunk \
>   file:///tmp/svnmergetest/repo/stable

Committed revision 7.
$ svn up
A    stable
A    stable/A
Updated to revision 7.
$ cd stable
$ cat A
      line ASTART
      line A1
      line A2
      line AEND
$ svnmerge.py init
property 'svnmerge-integrated' set on '.'

$ svn commit -F svnmerge-commit-message.txt
Sending

Committed revision 8.
$ rm svnmerge-commit-message.txt
$ svnmerge.py avail
2-3
$ svnmerge.py merge 3
svnmerge: "3" is not a subversion working directory
$ svnmerge.py merge -r 3
C    A

property 'svnmerge-integrated' set on '.'

$ cat A
      line ASTART
      line A1
<<<<<<< .working
      line A2
=======
      line A4
>>>>>>> .merge-right.r3
      line AEND
$ cat > A <<END
>       line ASTART
>       line A1
>       line A4
>       line AEND
> END
$ svn resolved A
Resolved conflicted state of 'A'
$ svn diff

Property changes on: .
___________________________________________________________________
Name: svnmerge-integrated
   - /trunk:1
   + /trunk:1,3

Index: A
===================================================================
--- A   (revision 7)
+++ A   (working copy)
@@ -1,4 +1,4 @@
       line ASTART
       line A1
-      line A2
+      line A4
       line AEND
$ svn commit -m "Hand-resolve merge conflict"
Sending
Sending        stable/A
Transmitting file data .
Committed revision 9.
$ svnmerge.py avail
2
$ svnmerge.py merge -r 2
C    A

property 'svnmerge-integrated' set on '.'

$ cat A
      line ASTART
      line A1
<<<<<<< .working
      line A4
=======
      line A3
>>>>>>> .merge-right.r2
      line AEND
$ cat > A <<END
>       line ASTART
>       line A1
>       line A4
>       line AEND
> END
$ svn resolved A
Resolved conflicted state of 'A'
$ svn diff

Property changes on: .
___________________________________________________________________
Name: svnmerge-integrated
   - /trunk:1,3
   + /trunk:1-3
$ svn commit -m "Hand-resolve merge conflict"
Sending

Committed revision 10.

Justin tells me there’s very little in terms of handy fancy dandy software tooling that magically does the ‘right thing’ here without a human’s intervention. The main reason for that is that there is, fundamentally, no ‘right thing’.

So it’s possible to resolve this cleanly, and svn + svnmerge make it pretty clear what is going on, but it’s still a bit of work to figure out what to do. To stay out of trouble, it’s a safe bet that you should limit cherry picking as much as possible, and try to do merges between branches in chronological order whenever you can.

Replace maven with a shell script

One of the things I find myself trying to instill in a lot of our developers these days is that a little pragmatism can often go a long way.

By popular demand (really!), here’s my trivial shell script that pretends to be maven. For smallish projects and small sizes of your local maven repository, it is orders of magnitude faster than doing an actual maven run, and it has many other advantages over the “real” maven.

Of course, I don’t actually use this script (much). Lately I’ve been using Ant 1.7 with Ivy. Oh, and mod_perl‘s Apache::Test for TripleSoup.

#!/usr/bin/env bash

# Scrape the bits we need out of the maven 1 project.xml.
artifactId=`xmllint --noblanks project.xml |
        egrep -o '<id>[^>]+<\/id>' |
        sed -e 's/<id>//' -e 's/<\/id>//'`
groupId=`xmllint --noblanks project.xml |
        egrep -o '<groupId>[^>]+<\/groupId>' |
        sed -e 's/<groupId>//' -e 's/<\/groupId>//'`
currentVersion=`xmllint --noblanks project.xml |
        egrep -o '<currentVersion>[^>]+<\/currentVersion>' |
        sed -e 's/<currentVersion>//' -e 's/<\/currentVersion>//'`
shortDescription=`xmllint --noblanks project.xml |
        egrep -o '<shortDescription>[^>]+<\/shortDescription>' |
        sed -e 's/<shortDescription>//' \
            -e 's/<\/shortDescription>//'`
package=`xmllint --noblanks project.xml |
        egrep -o '<package>[^>]+<\/package>' |
        sed -e 's/<package>//' \
            -e 's/<\/package>//'`
organization=`xmllint --noblanks project.xml |
        grep -A5 '<organization>' |
        egrep -o '<name>[^>]+<\/name>' |
        sed -e 's/<name>//' \
            -e 's/<\/name>//'`

# Put every jar in the local maven repository on the classpath.
for jar in `find $HOME/.maven/repository -name "*.jar"`; do
    CLASSPATH=$CLASSPATH:$jar
done
CLASSPATH=`pwd`/target/classes:`pwd`/target/test-classes:$CLASSPATH
export CLASSPATH

echo Building $artifactId-$currentVersion.jar...
rm -Rf target
mkdir -p target/classes
mkdir -p target/test-classes

# Compile the sources and copy over the non-java resources.
cd src/java
javac -nowarn -Xlint:-deprecation -source 1.4 -target 1.4 \
        -d ../../target/classes \
        `find . -name '*.java'`
for dir in `find . -type d -not -path '*svn*'`; do
    mkdir -p ../../target/classes/$dir
done
cp -r `find . -type f -not -name '*.java' -not -path '*svn*'` \
        ../../target/classes
cd ../..

# Assemble the jar metadata.
mkdir -p target/classes/META-INF
cp -f LICENSE* NOTICE* target/classes/META-INF 2>/dev/null
cat > target/classes/META-INF/MANIFEST.MF <<MFEND
Manifest-Version: 1.0
Created-By: Apache Maven Simulator 1.0
Extension-Name: $artifactId
Specification-Title: $shortDescription
Specification-Vendor: $organization
Specification-Version: $currentVersion
Implementation-Vendor: $organization
Implementation-Title: $package
Implementation-Version: $currentVersion
MFEND

cd target/classes
jar cf ../$artifactId-$currentVersion.jar *
cd ../..

echo Installing $artifactId-$currentVersion.jar...
mkdir -p $HOME/.maven/repository/$groupId/jars
cp target/$artifactId-$currentVersion.jar \
        $HOME/.maven/repository/$groupId/jars
echo done

Memories from ApacheCon US 2005

Leo Simons at the ApacheCon US 2005 Lightning Lottery Talks, holding two beers and an iBook

Talk about blogging backlog. Over on my old blog, the last entry is about the day before the hackathon before ApacheCon. The longtime reader will know by now that after that I got a little too busy doing other things to blog about them. Fortunately, other people took pictures. Shockingly, this picture of my 5 minutes of fame bashing the ASF at the lightning lottery talks is not under a Creative Commons license, so I don’t think I’m allowed to reproduce it here. On the other hand, I think I’m allowed to show a thumbnail…

Update: Ted promptly put his ApacheCon photos on flickr under a Creative Commons license. Thank you!