LSD::RELOAD

March 25, 2007

Review of Google Apps

Filed under: Moved from old blog, Tech — Leo Simons @ 14:00

If you e-mail lsimons A jicarilla dot nl (which no-one ever does), the e-mail no longer ends up in my inbox; it ends up at gmail.jicarilla.nl. Encouraged by a colleague, I’ve been giving Google Apps a try.

Setting up

They have this figured out amazingly well. It definitely made me go “wow”. And again, “wow”. If you own a domain and have access to its DNS settings, they guide you, step by step, through making their different services available under that domain, including using it for e-mail.

I had it all done within an hour, just clicking around and following some instructions (most of which I didn’t read). It works so much better than other managed hosting solutions I’ve tried in the past (like things using cPanel); it’s amazing.

It was a little annoying that it kept forgetting that I, while being Dutch and living in The Netherlands, like to have all my user interfaces in English. I had to change that behavior in every individual app, and when I changed my location information, some of it flipped back to Dutch.

GMail

The free account offers 2GB, which is not enough for me. The paid account offers 10GB, which is still not enough. So migrating is not even an option, even if GMail were deemed secure enough by the powers that be at work.

Things I like

  • the ‘conversation’ model where you can manage whole conversations (wouldn’t work for my work e-mail though, where many people just hijack threads all the time)
  • the keyboard shortcuts
  • the gtalk integration
  • how editing of an e-mail starts inline by default, and you can break it out later

Things I don’t like

  • The unnecessary information and hints like, “visit settings to save time with keyboard shortcuts”. I’m a geek; the first thing I visited was the settings.
  • The primitive filtering. I am used to having all the power of procmail available to me (filtering by mailing list headers that ezmlm adds, for example). I can’t ever see this working.
  • Tags. I’m used to folders, and they are a vital part of what helps me mentally “switch hats” between job/open source/personal tech interest/private life. I don’t see how I could do that with tags.
  • No offline mode, or at least it’s not clear to me how to get one.

All in all, it’s just very different, and I’m not actually looking for different. I suspect large parts of the world feel similarly and just want to keep their basic three-pane interface. Of course, there aren’t that many people who process as much e-mail as I do (and as weirdly), so I can understand why the GMail interface is not exactly optimized for people like me. I would definitely recommend it to, for example, my parents, if they hadn’t been using Outlook Express (and before that, Netscape Mail) and didn’t just want to keep using that kind of interface.

Google Calendar

Day view, week view, month view. Snappy interface. Search. Multiple calendars distinguished by color. iCal support. Creating an event in the right spot takes just a few seconds. I like.

What I miss is the GTalk mini-window in the calendaring interface; GMail does have one.

What would truly make this a killer app for me is if I could sync from Apple’s iCal to it, instead of just subscribing to Google Calendar from iCal. The reason is that I often jot things down while offline, and I’d want those to appear on Google automatically (or at the click of a button) as soon as I had a network connection again.

Oh, and then of course I’d need Apple’s iCal to support SSL.

I would definitely recommend google calendar to anyone that needs a calendaring solution, especially if you want to use shared calendaring, and you don’t need to edit your calendar while offline.

Google Docs and spreadsheets

It works, and it autosaves your work, and it has revisions. I could never actually use this (since I need to work with all these big technical and/or legal MS Word documents which have comments, master pages, revision tracking, etc), but it is otherwise really nice for simpler things. Especially the shared editing.

This would be one of my favorite apps on the web, and I would probably actually use it the next time I need some loose planning for something (like a party or a conference call or whatever) if it weren’t for the existence of Dabble DB. Google’s solution works, but dabble is magic.

Integration between pieces

While everything except the page creator-generated website is available over SSL, the links between the pieces (i.e. when you’re in google calendar and you click on “gmail”) keep redirecting back to plain HTTP. In general, security seems, ehm, not paranoid enough for people like me.

While, wherever you are, there’s always a link to the other apps within reach in a consistent location, there’s still a highly modal way of working, where you have to mentally switch from writing an e-mail to writing a document. You can’t actually send a google docs document as an e-mail easily; you have to export it, save it locally, switch to gmail, create an e-mail, navigate to the file, upload the attachment, type the message, and send it. Hpff. Similarly, you can’t select a document in google docs and use “send to” (windows) or drag-and-drop (mac) to create an e-mail out of it.

This is really where either the microsoft solution (outlook, office, windows explorer integration, “start > my recent documents”) or a geeky solution (like me…mutt, SubEthaEdit, spotlight, locate/find/ls, office available when I need it) offers stellar productivity, and where trying to do it “in the web browser” still completely breaks down.

Managedness

Think of all the things you get for free in this solution! High availability (from any computer!), good backups (or so I assume), incredible search and filtering capabilities, real web-based collaborative editing. I think what it removes from the average computer user (or IT administrator) is the need to worry about many of these things. Once you get used to it, it works. And google has a proven track record of keeping things working.

It’s just that I already have these things…high availability (yay, svn), good backups (hurray for backuppc), great search (spotlight, mairix), better filtering (procmail, unix command line) and more mature collaborative editing (again svn, and SubEthaEdit)…and I can trust in myself instead of placing trust in a US company (with the US having laws I am not always a fan of).

Page Creator

It seems to work well enough, and is easy enough to understand. It produces HTML 4.01 Transitional which doesn’t validate. I didn’t spend much time with it; I prefer writing my XHTML-compliant HTML by hand…

Google Talk

We pretty much standardize on skype at work, and my non-geek friends pretty much standardize on MSN these days. Until they integrate GTalk with those closed silos, it is not that useful to me.

Start Page

Yawn. I’ve never seen one of these that I liked. I tried to use it, but quickly just dragged the service URLs into my firefox bookmarks toolbar.

Conclusion

Google’s current offering is easy to set up and easy to use for certain distinct, well-defined tasks. Yet, it is not secure enough for the paranoid (even if you trust google with your data). Nor does it offer productivity resembling anything a modern-day knowledge worker is used to.

As far as web-based solutions go…I like the e-mail offering from Yahoo! better. I like Dabble DB better for collaborative editing. Yet, google currently has the best calendaring solution I’ve seen, and is seemingly the furthest along with offering integration between all the different pieces.

The big plus to this offering, and google is getting this right in an amazingly intuitive way, is that it simplifies administration dramatically. For any small business IT administrator that is struggling to get MS Exchange and SharePoint up and running, going with the google setup will definitely be a breath of fresh air.

I won’t be switching to actual use of Google Apps any time soon, but I would definitely recommend small businesses, schools, families, and individuals to evaluate it and try it out. It is simple, it is intuitive, and it is free (or really cheap at $50/user/year if you want paid support and more storage).

Google Apps is serious competition for microsoft, even if perhaps not yet for the enterprise. Even if it doesn’t take off in a big way, this new solution will help to break the microsoft monopoly, and will get microsoft to produce better products at a lower price. Yay!

March 24, 2007

Replace maven with a shell script

Filed under: Java, Moved from old blog, Open Source, Tech — Leo Simons @ 16:29

One of the things I find myself trying to instill in a lot of our developers these days is that a little pragmatism can often go a long way.

By popular demand (really!), here’s my trivial shell script that pretends to be maven. For smallish projects and small sizes of your local maven repository, it is orders of magnitude faster than doing an actual maving run, and it has many other advantages over the “real” maven.

Of course, I don’t actually use this script (much). Lately I’ve been using Ant 1.7 with Ivy. Oh, and mod_perl’s Apache::Test for TripleSoup.

#!/usr/bin/env bash

# pull the project metadata out of the maven 1 project.xml
artifactId=`xmllint --noblanks project.xml |
        egrep -o '<id>[^>]+<\/id>' |
        sed -e 's/<id>//' -e 's/<\/id>//'`
groupId=`xmllint --noblanks project.xml |
        egrep -o '<groupId>[^>]+<\/groupId>' |
        sed -e 's/<groupId>//' -e 's/<\/groupId>//'`
currentVersion=`xmllint --noblanks project.xml |
        egrep -o '<currentVersion>[^>]+<\/currentVersion>' |
        sed -e 's/<currentVersion>//' -e 's/<\/currentVersion>//'`
shortDescription=`xmllint --noblanks project.xml |
        egrep -o '<shortDescription>[^>]+<\/shortDescription>' |
        sed -e 's/<shortDescription>//' \
            -e 's/<\/shortDescription>//'`
package=`xmllint --noblanks project.xml |
        egrep -o '<package>[^>]+<\/package>' |
        sed -e 's/<package>//' \
            -e 's/<\/package>//'`
organization=`xmllint --noblanks project.xml |
        grep -A5 '<organization>' |
        egrep -o '<name>[^>]+<\/name>' |
        sed -e 's/<name>//' \
            -e 's/<\/name>//'`

# put every jar in the local repository on the classpath
for jar in `find $HOME/.maven/repository -name "*.jar"`; do
    CLASSPATH=$CLASSPATH:$jar
done
CLASSPATH=`pwd`/target/classes:`pwd`/target/test-classes:$CLASSPATH
export CLASSPATH

echo Building $artifactId-$currentVersion.jar...
rm -Rf target
mkdir -p target/classes
mkdir -p target/test-classes

cd src/java
javac -nowarn -Xlint:-deprecation -source 1.4 -target 1.4 \
        -d ../../target/classes \
        `find . -name '*.java'`
for dir in `find . -type d -not -path '*svn*'`; do
    mkdir -p ../../target/classes/$dir
done
# copy resources into the matching package directories
for file in `find . -type f -not -name '*.java' -not -path '*svn*'`; do
    cp $file ../../target/classes/$file
done
cd ../..

mkdir -p target/classes/META-INF
cp -f LICENSE* NOTICE* target/classes/META-INF 2>/dev/null
cat > target/classes/META-INF/MANIFEST.MF <<MFEND
Manifest-Version: 1.0
Created-By: Apache Maven Simulator 1.0
Extension-Name: $artifactId
Specification-Title: $shortDescription
Specification-Vendor: $organization
Specification-Version: $currentVersion
Implementation-Vendor: $organization
Implementation-Title: $package
Implementation-Version: $currentVersion
MFEND

cd target/classes
# 'm' makes jar use our manifest instead of generating a default one
jar cfm ../$artifactId-$currentVersion.jar META-INF/MANIFEST.MF *
cd ../..

echo Installing $artifactId-$currentVersion.jar...
mkdir -p $HOME/.maven/repository/$groupId/jars
cp target/$artifactId-$currentVersion.jar \
        $HOME/.maven/repository/$groupId/jars
echo done

March 18, 2007

RDF modelling at Joost: no bnodes

Filed under: Moved from old blog, Tech, Work — Leo Simons @ 16:24

As I mentioned in a previous entry, Joost™ uses quite a bit of RDF. I’m sorry, but I’m not going to share our full data model with you (though we might do that in the future). All I want to try and do is highlight some basic choices that we (mostly Alberto) have made on how to model things using RDF.

Choice number one:

No bnodes

A blank node (or “bnode” for short) is a resource in RDF that doesn’t have a ‘real’ URI.

Where do you use bnodes?

You encounter bnodes when modelling things in a ‘normal’ object-oriented fashion, and especially a lot in ‘normal’ modern XML. For example, the XML document

Listing 1

might be turned into RDF as

Listing 2
@prefix :  .

leo      isA               Person ;
         name              "Leo Simons" .

alberto  isA               Person ;
         name              "Alberto Reggiori" .

foo      isA               Group ;
         name              "Mentioned in article" ;
         containsPeople    ( leo alberto ) .

which is a special Notation3 (or Turtle) shorthand for

Listing 3
@prefix :  .

leo      isA               Person ;
         name              "Leo Simons" .

alberto  isA               Person ;
         name              "Alberto Reggiori" .

foo      isA               Group ;
         name              "Mentioned in article" ;
         containsPeople    _:1 .

_:1      rdf:first         leo ;
         rdf:rest          _:2 .

_:2      rdf:first         alberto ;
         rdf:rest          rdf:nil .

_:1 and _:2 are bnodes. There doesn’t seem to be a problem with this, does there? (Aside from RDF collections being cumbersome.)

How does it look without bnodes?

Well, consider this alternative:

Listing 4
@prefix :  .

leo      isA               Person ;
         name              "Leo Simons" .

alberto  isA               Person ;
         name              "Alberto Reggiori" .

foo      isA               Group ;
         name              "Mentioned in article" ;
         containsPerson    leo ;
         containsPerson    alberto .

It consists of fewer triples, obviously meaning less storage space and, given the nature of RDF databases today, also better performance. As a data model grows in complexity, the percentage of bnodes seems to grow a bit as well, so the effect is more pronounced with lots of data.

But, more importantly, the software you have to write becomes more involved. Let’s investigate.

The effect of bnode use on source code

Here’s some imaginary java code (using jena) that prints certain data it finds in the model:

Listing 5

import java.net.URI;
import java.util.ArrayList;
import java.util.List;

import com.hp.hpl.jena.datatypes.DatatypeFormatException;
import com.hp.hpl.jena.rdf.model.Literal;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.RDFList;
import com.hp.hpl.jena.rdf.model.RDFNode;
import com.hp.hpl.jena.rdf.model.Resource;
import com.hp.hpl.jena.vocabulary.RDF;

// Example is an imaginary vocabulary class defining the Person, Group,
// name and containsPeople terms used in listings 2 and 3.
public class GroupPrinter {

  private String[] getName(Model m, Resource r) {
    List<String> names = new ArrayList<String>();
    for(RDFNode nameNode : m.listObjectsOfProperty(r, Example.name).toList()) {
      if(!nameNode.isLiteral()) {
        continue; // open world: someone may have used a resource as a name
      }
      Literal nameLiteral = (Literal)nameNode.as(Literal.class);
      try {
        names.add(nameLiteral.getString());
      } catch(DatatypeFormatException e) {
        // skip ill-typed literals
      }
    }
    return names.toArray(new String[names.size()]);
  }

  private boolean hasType(Model m, Resource toCheck, Resource expectedType) {
    return m.contains(toCheck, RDF.type, expectedType);
  }

  private void printHeader(Model m, Resource groupResource) {
    String[] groupName = getName(m, groupResource);
    if(groupName.length > 0) {
      for(String name : groupName) {
        System.out.println("Group name: " + name);
      }
    } else {
      System.out.println("Group name: ");
    }
    System.out.println("----");
  }

  private void printName(Model m, Resource personResource) {
    String[] personName = getName(m, personResource);
    if(personName.length > 0) {
      for(String name : personName) {
        System.out.println("  " + name);
      }
    } else {
      System.out.println("  ");
    }
  }

  public void printInfoAboutGroup(Model m, URI groupId) {
    if(m == null || groupId == null) {
      throw new IllegalArgumentException("m and groupId must not be null");
    }
    Resource groupResource = m.getResource(groupId.toString());
    if(!m.containsResource(groupResource)) {
      System.err.println("Warning: no such group: " + groupId);
      return;
    }
    if(!hasType(m, groupResource, Example.Group)) {
      System.err.println("Warning: not typed as a group: " + groupId);
    }
    printHeader(m, groupResource);

    //
    // NOTE: for loop in a for loop in a for loop
    //
    for(RDFNode groupHead : m.listObjectsOfProperty(groupResource, Example.containsPeople).toList()) {
      if(!groupHead.canAs(RDFList.class)) {
        System.err.println("Warning: Group " + groupId + " containsPeople points to a non-list");
        continue;
      }
      RDFList list = (RDFList)groupHead.as(RDFList.class);
      for(RDFNode peopleNode : list.asJavaList()) {
        if(!peopleNode.isResource()) {
          System.err.println("Warning: Group " + groupId + " rdf:first points to a literal: " +
              ((Literal)peopleNode.as(Literal.class)).getLexicalForm());
          continue;
        }
        Resource peopleResource = (Resource)peopleNode.as(Resource.class);
        if(!hasType(m, peopleResource, Example.Person)) {
          String identifier = peopleResource.isAnon() ?
              peopleResource.getId().getLabelString() :
              peopleResource.getURI();
          System.err.println("Warning: not typed as a person: " + identifier);
        }
        printName(m, peopleResource);
      }
    }
  }

  // ...
}

Here’s the printInfoAboutGroup() method again, now for the RDF model structure from listing 4.

Listing 6

I suspect that, if you haven’t seen RDF-inspecting source code before, all of the above looks a little scary. There’s a load of looping and checking that you don’t have to take into account when using simple javabeans. This is the price to pay for the open world assumption, though of course a lot of it can be abstracted out in utility code a lot better than I’ve done above.

However, in the midst of all that Java fluff, the difference should still be clear: Listing 6 has one fewer nested for loop than Listing 5. No matter how much you clean up this code, that fundamental difference remains, and, because of the open world assumption, every extra level of looping comes with its own set of checks…compare…

Conclusion

Because of the open-world assumption, making use of bnodes is very expensive when doing real-world software development. Therefore, bnodes should be avoided. Compare:

  • Object oriented world: foo.bar.getXyz() vs. foo.getBarXyz().
  • XML world: foo.getElementsByTagName("bar").getAttribute("xyz") vs. foo.getAttribute("barXyz")
  • RDF world: for(foo) { for(bar) { for(xyz) { ... }}} vs. for(foo) { for(barXyz) { ... }}.
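To make the loop difference concrete without pulling in Jena, here is a minimal sketch using a toy in-memory list of triples. The `ToyGraph` class and its methods are hypothetical and not a real library’s API; the property names follow listings 3 and 4. Walking the collection needs a loop inside a loop plus cycle and shape checks, while the flat property is a single lookup.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy, library-free triple store (hypothetical, not Jena) used only to
// compare the two modelling styles from listings 3 and 4.
class ToyGraph {
    private final List<String[]> triples = new ArrayList<>();

    ToyGraph add(String s, String p, String o) {
        triples.add(new String[] {s, p, o});
        return this;
    }

    // All objects of (subject, predicate), in insertion order.
    List<String> objects(String s, String p) {
        List<String> result = new ArrayList<>();
        for (String[] t : triples) {
            if (t[0].equals(s) && t[1].equals(p)) {
                result.add(t[2]);
            }
        }
        return result;
    }

    // Listing 3 style: loop over the list heads, then walk each rdf:first/rdf:rest chain.
    List<String> membersViaList(String group) {
        List<String> members = new ArrayList<>();
        for (String head : objects(group, "containsPeople")) {
            Set<String> seen = new HashSet<>();
            String cell = head;
            // open world: stop on cycles and on cells without exactly one rdf:rest
            while (!cell.equals("rdf:nil") && seen.add(cell)) {
                members.addAll(objects(cell, "rdf:first"));
                List<String> rest = objects(cell, "rdf:rest");
                if (rest.size() != 1) {
                    break;
                }
                cell = rest.get(0);
            }
        }
        return members;
    }

    // Listing 4 style: a single lookup, no bnodes to chase.
    List<String> membersViaProperty(String group) {
        return objects(group, "containsPerson");
    }
}
```

A real RDF store makes no promise about the order of the `containsPerson` values; the toy keeps insertion order only because it is backed by a list.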

You can forget all of the above, just remember these rules:

  • Don’t use RDF collections. Use one-to-many properties that result in “collections” instead.
  • If you need ordering, define the sorting algorithm instead of putting the ordering in your data.
  • If you have (sort-of) one-to-one relationships in your model, and one or both sides of the relationship is identified by a bnode, merge the concepts into one and distinguish using properties.
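The second rule, defining the ordering in code rather than in the data, might look like the sketch below. It assumes member URIs arrive from the store in no particular order and that their display names have already been looked up into a map; the class name, the method name and the ordering rule (by name, then by URI as a tie-breaker) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the store returns members of a one-to-many property
// unordered; the sorting algorithm lives in code, not in the data.
class MemberOrdering {
    static List<String> sortMembers(List<String> memberUris, Map<String, String> names) {
        List<String> sorted = new ArrayList<>(memberUris);
        sorted.sort(Comparator
                // members without a name sort first (empty string)
                .comparing((String uri) -> names.getOrDefault(uri, ""))
                // tie-breaker so the result is deterministic
                .thenComparing(Comparator.naturalOrder()));
        return sorted;
    }
}
```

Changing the order then means changing one comparator, not rewriting an rdf:first/rdf:rest chain in the store.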

I upgraded to SVN 1.4

Filed under: Moved from old blog, Tech — Leo Simons @ 13:20

What should’ve been simple (“port install subversion”) took about two months. The reason is that the SVN working copy format changed in a backwards-incompatible fashion. The SVN team had always said that the format would change at some point, but many people just ignored those warnings (mainly because early Subversion releases offered no convenient way to get the same information otherwise). So…

  • I had to buy an upgrade for IntelliJ IDEA, from version 5 to version 6, to use my IDE with SVN
    • (because IDEA uses SVNKit, a pure java replacement for the SVN client, written by a different developer team)
  • I had to fiddle for hours to get the security settings just right. I still haven’t found out exactly what broke; it has something to do with our certificate-based setup at work. Since I don’t dare share details of our security measures publicly, I told the SVN developers we employ around here instead; I hope one of them finds time to fix it.
  • I had to upgrade my custom blog software, xblog, since it, like SVNKit, also parsed the contents of .svn/entries. Here’s the patch.

Needless to say, I got consistently more unhappy during all of this. What I hated the most was having to fork out a bunch of money (of course, the company paid, but I’d rather see that cash go to the soccer table so badly desired by some) just to keep my trusted developer toolset working.

It’s also rather painful to note that the XML support in the subversion client still isn’t quite what you would expect out of a ‘normal’ XML tool. For example:

$ svn info --xml foo
<?xml version="1.0"?>
<info>
foo:  (Not a versioned resource)</info>
$ echo $?
0
$ # note how the error message does go to stderr properly
$ svn info --xml foo 2>/dev/null
<?xml version="1.0"?>
<info>
</info>

Much better would be if, in this case, --xml returned some kind of <error>, or at least set the exit code to something that readily tells you there might be a problem. It doesn’t really matter for my little xblog script, but for a tool like Gump, it seems likely to cause problems one way or another.
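Until then, a tool has to detect the problem itself. A minimal sketch, assuming you have already captured the stdout of `svn info --xml` as a string: treat an `<info>` element without any `<entry>` children as “nothing found”, since the exit code won’t tell you. The class and method names are hypothetical; only the JDK’s built-in DOM parser is used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper: svn exits with 0 even for unversioned paths, so look
// at the XML itself and check whether any <entry> elements came back.
class SvnInfoCheck {
    static boolean hasEntries(String svnInfoXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(svnInfoXml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName("entry").getLength() > 0;
        } catch (Exception e) {
            throw new IllegalArgumentException("unparseable svn info output", e);
        }
    }
}
```

A caller (for instance one running svn through `ProcessBuilder`) would then treat `hasEntries(...) == false` as an error, regardless of the exit code.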

Make Rocks (too)

Filed under: Moved from old blog, Tech — Leo Simons @ 12:49

William writes:

Nothing made this more clear than working with Rake, Make, and Ant—all in the same day. Make is ridiculous, Ant is reasonable, and Rake rocks.

He has a nice example of how to do flash-y things without flash and how to use rake to do cool things. However, I think it doesn’t justify his “Make is ridiculous” statement at all. Here’s a solution to his problem using make and bash (might be proper sh, who knows):

Makefile

IMAGES:=$(wildcard resources/images/*.jpg) $(wildcard resources/images/*.png)

resources/images.xml: $(IMAGES)
	bash resources/images.xml.sh $(IMAGES) > resources/images.xml

images.xml.settings.sh

SWF_VERSION=7
SWF_WIDTH=450
SWF_HEIGHT=550
SWF_BACKGROUND="#ffffff"
SWF_FRAMERATE=24

images.xml.sh

# load settings (kept next to this script)
. "$(dirname "$0")/images.xml.settings.sh"

# header
cat << END
<?xml version="1.0" encoding="iso-8859-1"?>
<movie version="$SWF_VERSION"
       width="$SWF_WIDTH"
       height="$SWF_HEIGHT"
       framerate="$SWF_FRAMERATE">
  <background color="$SWF_BACKGROUND"/>
  <frame>
    <library>
END

# line for each clip
for fname in $*; do
  name=`basename "$fname" | sed -e 's/\.png$//' -e 's/\.jpg$//'`
  echo "      <clip id=\"$name\" import=\"$fname\"/>"
done

# footer
cat << END
    </library>
  </frame>
</movie>
END

I haven’t tested it anywhere except on my laptop, but I’m reasonably confident this setup will work by default on just about all linux/unix/mac os x machines out there, including ones from 10 years ago. It also doesn’t require one to learn a new language (ruby) or a new domain-specific language (rake) if you’re an “old fart”, integrates easily with most existing build systems one can imagine, and has about the same number of lines of code.

January 1, 2007

The startup suck

Filed under: Life, Moved from old blog, Work — Leo Simons @ 1:58

Startups suck. Not suck as in suck ass, but suck as in suck you in. It’s an interesting experience.

I work at a company that somewhat resembles a startup. I’ve worked there (as an employee, that is) for 3 months now. Like many of my colleagues, I tend to work 60-hour weeks.

Before that, I was self-employed, and I worked 60-hour weeks, too, and I dare say with more stress. However, somehow I always managed to find some time to keep somewhat involved with some amount of open source stuff on the side, whereas now, I don’t even seem to find the time to read the members@apache.org mailing list.

It’s not because we don’t use open source software at work:

$ pwd
/Users/lsimons/dev2/tvp/bistro-trunk/external-libs
$ find . -name '*.jar' | grep -v '.svn' | wc -l
     294

It’s also not because my employer doesn’t want me working on open source projects (quote: “if this thing starts taking up more than 30% of your time we should sort-of talk about it”).

It’s about the rhythm. Just about everyone in our little p2p video company works like crazy, yet parties like students (actually, the students tend to leave the bars way early in Leiden, whereas we often get kicked out). In many ways, coming into the office (which I tend to do for about two days a week, the rest I work from home) feels more like arriving at a mini-conference than like, ehm, coming into the office. It just sucks me in, and I like it that way.

What I do these days? Judging by flickr, crowd control, ranting, eating, drinking, and making a fool of myself (and others). I don’t think there’s a public picture to be found on flickr yet of me actually working. Hmm….

I guess I should be happy I don’t work at YouTube; otherwise the above links would’ve been to the even more embarrassing Christmas dinner video footage that, as far as I know, hasn’t made it off our SSL-secured intranet. Of course, if we do our job really well on the COW (short for Content Owner Website, looks to be the main project for me to work on in January), we might be seeing a video channel on TVP about TVP before I can come up with a ploy to prevent it…

PS: you’ll be missed, Mads. I’m sure you’ll find the job you’re looking for soon.

November 11, 2006

RDF at The Venice Project

Filed under: Moved from old blog, Tech, Work — Leo Simons @ 15:42

So, the secrecy is slowly being dropped.

As should be obvious from the sudden influx of people blogging about our little project, the company blogging policy was made available. Such. A. Relief. I hate secrecy. There are only a few things (that interest me) which I can’t talk about, apparently. Yay!

So let’s mention something which might be interesting to my audience.

We make extensive use of RDF in different places. It all starts with a core RDFS/Owl schema that is used to capture various kinds of information (think FOAF + IMDb + RSS + a lot more). I suspect some parts of the modelling work that was done here will make it into future standards for online video.

We have a custom distributed digital asset management system (or DAM), currently built around jena-with-postgres for storage and for (CRUD-like) management of all that RDF-ized information over a REST protocol.

We convert from RDF to different specialized XML formats and back again. We convert from RDF to excel spreadsheets and back again (ugh). We have our jira instance hooked up to our RDF store. We convert RDF to other kinds of RDF. We have custom RDF visualization tools. We have custom RDF store crawlers that do efficient validation. We have RDF schemas that control the behavior of other distributed systems by adding intelligence to the core schema. We do triple timestamping. We do intelligent schema-driven indexing. We have custom libraries to make doing wicket-based, RDF-based web application development easier. Oh, we do RDF-based web applications. In short, we do more RDF than you can shake a stick at. So not a day goes by without some of our developers swearing about “RDF” or “metadata”, since in many ways RDF still isn’t exactly mature technology. But we’ll fix the warts, and contribute those fixes back to the open source community.

In many ways, to me, the RDF part of our server architecture is much like WADI (I spent a year building the next version of it with asemantics before joining The Venice Project), with postgres instead of oracle, REST instead of SOAP, and a much less scary data model.

This RDF-based digital asset management (or DAM) seems like something everyone’s doing right now. For example, Sesame has HTTPSail now, and the Simile people have Semantic Bank (I know of several more examples I’m not sure I can mention here).

Since everyone is inventing roughly the same wheel at the same time, and some people have re-invented it several times now, it is clearly about time for an open source project that does RDF-over-HTTP properly. I’ve been talking to various people about this for a while now, and a bunch of us are almost ready to approach the Apache Incubator with a proposal for a project to build a “sparql endpoint”. And the venice project will be donating some code (and developer time!) to seed this effort. Hopefully we will go from annoyingly secretive to actively open (and open source) in the span of a few weeks.

Now, back to work. Or rather, lunch, in London, with some people on our content team. See if the sushi is even better here than it was in New York.

May 14, 2006

Organising design-by-contract test code

Filed under: Java, Moved from old blog, Tech — Leo Simons @ 16:57

Let’s say you write component-oriented, loosely coupled, design-by-contract (java) software and you do rigorous unit testing. How do you organise your codebase?

Diagram showing a way to organise interface, test, and implementation code

Here’s one way.

  1. Put all of your (work) interfaces, and everything referenced from those interfaces (like simple data beans and exceptions) in an “API” package. If you use a tool like Maven to split your code up into small libraries, make the API package into a separate library. The main reason for this is that you (hopefully) make it possible for packages that use your library to compile and link against only the API package, which promotes loose coupling.
  2. Optionally put “support code” which will only ever really need one implementation or will be shared by implementations in an “SPI” package. If your library has an “implementation registration” or brokering interface such as the JDBC driver manager, it is likely a good candidate for this SPI package. (The abbreviation SPI stands for Service Provider Interface.) Either package this SPI package with your API package, or provide it separately.
  3. Write abstract test cases for each of your (work) interfaces. These abstract test cases are specifically geared at testing the contract promoted by the (work) interfaces, and not things like setup, initialization or destruction code. (If you write your code test-first, you’ll write these tests before you start on the work interfaces). Since JUnit (at least for 3.x) pretty much requires that you use subclassing of its TestCase class, it may make sense to introduce a common abstract base test class immediately (I tend to have an abstract base test case for handling the “no test result” result that junit 3.x does not natively support). This kind of abstract test case might look like this:
    public abstract class AbstractFooXXXTestCase
        extends AbstractFooTestCase {

      protected abstract FooXXX getInstance();

      public void testFooNeverReturnsNull() {
        assertNotNull( getInstance().foo() );
      }
    }

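    Together, these abstract test cases form a TCK (Technology Compatibility Kit) for your interfaces. You can keep these TCK tests right next to the concrete tests for your (first) implementation code, or you can make them into a separate package.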

  4. Create concrete subclasses for these abstract test cases, which feed the superclass an instance of an implementation of the (work) interface that is being tested. These concrete classes are responsible for creating mock objects of the dependencies, doing any necessary initialization, etc.
  5. Create the implementation code that is needed to make the concrete test cases compile and run successfully. If you’ve prepared a separate “API” package, you might want to consider naming this package the implementation (or “impl” for short) package. If there’s a specific technology used for this implementation (like a specific database backend), you might want to name the package after that, instead of just using “impl”.
  6. The SPI package probably needs some test cases of its own. These are not a part of your TCK. Keep them close to the SPI code.
  7. Rinse and repeat. When doing incremental development, you’ll typically add a test to one of the TCK test classes, possibly modify the API and implementation packages to make sure everything compiles once more, re-run the concrete test suite to make sure the newly added test fails, then modify the implementation code and re-run the tests until all the tests pass again. When doing more of an up-front design, it’s still a good idea to start codifying design contracts for your (work) interfaces in a TCK suite as soon as possible, as a properly written TCK is probably a bit less ambiguous than most specification documents.
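The steps above can be sketched in a single self-contained file. To avoid a JUnit dependency, this sketch swaps in a tiny hand-rolled `runAll()` method where JUnit 3.x would normally discover the `test*` methods; all names (`Foo`, `AbstractFooContractTest`, `SimpleFoo`, `SimpleFooTest`) are hypothetical, and in a real codebase each class would live in its own API, TCK or impl package.

```java
import java.util.ArrayList;
import java.util.List;

// API package: the work interface and its contract.
interface Foo {
    /** Contract: never returns null. */
    String foo();
}

// TCK: an abstract contract test that only knows about the API.
abstract class AbstractFooContractTest {
    protected abstract Foo getInstance();

    public void testFooNeverReturnsNull() {
        if (getInstance().foo() == null) {
            throw new AssertionError("foo() returned null");
        }
    }

    // Stand-in for a JUnit runner: run each contract check, collect failures.
    public List<String> runAll() {
        List<String> failures = new ArrayList<>();
        try {
            testFooNeverReturnsNull();
        } catch (AssertionError e) {
            failures.add(e.getMessage());
        }
        return failures;
    }
}

// impl package: one implementation of the API.
class SimpleFoo implements Foo {
    public String foo() {
        return "foo";
    }
}

// Concrete test: supplies the implementation (and would set up mocks).
class SimpleFooTest extends AbstractFooContractTest {
    protected Foo getInstance() {
        return new SimpleFoo();
    }
}
```

A second implementation only needs its own small concrete subclass to be checked against the same contract.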

Note that, for bigger projects with multiple implementations, the exposed interfaces of the SPI and TCK packages need to be treated with the same care as those in the API package, since otherwise adapting implementation test cases to changes in the TCK might be too much effort.

I’ve been using this kind of code organisation for quite a while now (first for the jicarilla.org codebase, later also for commercial projects). PicoContainer uses it too, and it might very well be where I borrowed the idea from.

May 12, 2006

Theatersport: a Dutch form of improvisation theatre

Filed under: Moved from old blog, Personal — Leo Simons @ 14:49

For about a year and a half now, I’ve been very happy to be a part of Pro Deo, a theatersportvereniging (English: Theatresports). I thought I’d share a little about it.

Mix tournament

Picture from the ‘mixtoernooi’ in 2005. By Rudine Bijlsma.

Yesterday was the twice-yearly “mixtoernooi” (above picture from a previous one), where the teams (if you’re thinking “teams? What teams?” - Go click those links) consist of players with varying levels of experience. These are always lots of fun, since it’s a good way for the less experienced players to learn to be “on stage” with a bit of a “safety net” provided by the players with a little more experience.

It was a good show. Quite a bit of audience, good atmosphere, and lots of energy.

We had two matches of about 45 minutes each, and I was in one of them, meaning I played a part in three improvised scenes of about 4 minutes each. Our team played a spacejump, a free impro, and a time for a song. It went well, and afterwards I was really proud of each of my teammates for putting on a great show. Of course, we lost to the other team by a large margin, which is how it should be.

Improvisation theatre builds character

Theatresports teaches its practitioners how to listen, how to be a team player, how to feel confident about themselves, and much more.

Theatresports is an excellent way to learn how to feel confident on stage (once you dare step up there with a hundred people watching you and no clue yet what it is you’re going to do in about 3 seconds, you dare do almost anything), which is also a great way to learn how to feel confident when presenting or speaking in front of an audience.

Theatresports excels even more at teaching you how to be a team player. In order to function well as an improvisation theatre team, there needs to be a whole lot of trust between each of the players, and a high comfort level. And beyond that, nearly everything about the “art form” is there to encourage or even require healthy collaborative behaviour. Golden rules like “you should accept whatever it is that someone invents right there on the spot” (you don’t really want to get into a discussion about it in the middle of your scene) go way, way beyond the “lazy consensus” that open source people may be used to.

Theatresports makes you a better listener. In order to be able to interact with other people on stage (and with the audience) in a witty and dynamic fashion, without any kind of script, you need to pay a lot of attention to what is going on around you; otherwise you’ll misread your teammates’ intentions and the whole scene can go down the drain.

Etc etc etc.

Of course, for me, these are really insignificant pluses compared to the joy of doing various silly things with friends. We have a few hours of lessons and practice tonight, and after that we’ll frequently hang out in the bar until dawn.

April 17, 2006

What is make?

Filed under: Moved from old blog, Tech — Leo Simons @ 16:29

Before we go and talk about a better make, we should make sure we define what make actually is. It’s a rather simple tool for transforming files from their source form into something else and doing some amount of dependency tracking. It’s a “dependency maintenance tool” and a “software construction tool” which is at its best when used together with other tools to form a “build system” and/or a “package management system”.

(This article is part of a series on build tools.)

Make as a tool

The wikipedia page on make starts off like this:

make is a utility that automates the process of converting files from one form to another, doing dependency tracking and invoking external programs to do additional work as needed. Its dependency tracking is very simple and centers on using the modification time of the input files. Most frequently it is used for compiling source code into object code, joining and then linking object code into executables or libraries. It uses files called “makefiles” to determine the dependency graph for a given output, and the build scripts which need to be passed to the shell to build them. The term “makefile” stems from their traditional file name of “makefile” or (later) “Makefile”.

That’s a rather good, if compact description. There are many variants of make available, all with slightly different features. The ones in most common use today are GNU make and BSD make.
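The modification-time rule from that description fits in a few lines of code. The sketch below is a simplification and not how any particular make variant implements it: timestamps come from an in-memory map instead of the filesystem, and the class and method names (OutOfDateDemo, isOutOfDate) are made up for illustration.

```java
import java.util.List;
import java.util.Map;

class OutOfDateDemo {
    // Decide whether `target` must be rebuilt, using only modification times.
    // `mtimes` maps file names to (simulated) modification times; a missing
    // entry means the file does not exist.
    static boolean isOutOfDate(String target, List<String> prerequisites, Map<String, Long> mtimes) {
        Long targetTime = mtimes.get(target);
        if (targetTime == null) {
            return true; // never built: always rebuild
        }
        for (String prereq : prerequisites) {
            Long prereqTime = mtimes.get(prereq);
            if (prereqTime == null) {
                throw new IllegalStateException("missing prerequisite: " + prereq);
            }
            if (prereqTime > targetTime) {
                return true; // a prerequisite changed after the target was built
            }
        }
        return false; // target is at least as new as everything it depends on
    }

    public static void main(String[] args) {
        List<String> prereqs = List.of("foo.c", "foo.h");
        // foo.c was edited (time 200) after foo.o was last built (time 150).
        System.out.println(isOutOfDate("foo.o", prereqs, Map.of("foo.c", 200L, "foo.h", 100L, "foo.o", 150L))); // true
        // After a rebuild, foo.o (time 250) is newer than both of its sources.
        System.out.println(isOutOfDate("foo.o", prereqs, Map.of("foo.c", 200L, "foo.h", 100L, "foo.o", 250L))); // false
    }
}
```

Note that a target with the same timestamp as its prerequisite counts as up to date here; only a strictly newer prerequisite triggers a rebuild.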

Make as a part of a process

There is more to software than transforming source files into object files. I’ve written more about the software management process. Make is often the “driver” program for several of these stages. To build software, you often just type make. To test it, you type make test. To package software up into a release, you’ll type make dist. To distribute it, you’ll type make publish. To install it, you type make install.

Make really is optimized primarily for the “build” step, i.e. the actual source-file-to-object-file transformations. But since it uses the shell for executing commands, and the shell is the usual way you execute commands (yeah yeah, I know there is such a thing as a GUI), it’s really easy to hook up commands for doing most of the other stuff, too. It’s even possible to implement rules such as “if a source code file changed, recompile the output files that are created from that file, then rerun the tests that test those output files”, even if such rules quickly become quite complex and awkward to maintain.
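A rule chain like that (source file, object file, test run) can be modelled as a small dependency graph that is walked depth first. The following is a hypothetical Java sketch, assuming an in-memory map of simulated timestamps: “running a recipe” just records the target name and bumps its timestamp, where real make would hand the commands to the shell. RuleChainDemo and its methods are illustrative names only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RuleChainDemo {
    final Map<String, List<String>> rules = new HashMap<>(); // target -> prerequisites
    final Map<String, Long> mtimes = new HashMap<>();        // file -> simulated mtime
    final List<String> executed = new ArrayList<>();         // recipes "run", in order
    private long clock = 0;

    void rule(String target, String... prerequisites) {
        rules.put(target, List.of(prerequisites));
    }

    // Simulate editing or writing a file: give it the newest timestamp.
    void touch(String file) {
        mtimes.put(file, ++clock);
    }

    // Bring `target` up to date: prerequisites first (depth first), then the
    // target itself if it is missing or older than any prerequisite.
    // No cycle detection: a circular rule set would recurse forever.
    void make(String target) {
        List<String> prerequisites = rules.get(target);
        if (prerequisites == null) {
            // Plain source file: nothing to build, but it has to exist.
            if (!mtimes.containsKey(target)) {
                throw new IllegalStateException("no rule to make " + target);
            }
            return;
        }
        for (String p : prerequisites) {
            make(p);
        }
        Long targetTime = mtimes.get(target);
        boolean stale = targetTime == null;
        for (String p : prerequisites) {
            if (targetTime != null && mtimes.get(p) > targetTime) {
                stale = true;
            }
        }
        if (stale) {
            executed.add(target); // stands in for running the rule's shell commands
            touch(target);
        }
    }

    public static void main(String[] args) {
        RuleChainDemo mk = new RuleChainDemo();
        mk.rule("foo.o", "foo.c");           // compile step
        mk.rule("test-report", "foo.o");     // "rerun the tests" step
        mk.touch("foo.c");

        mk.make("test-report");
        System.out.println(mk.executed);     // [foo.o, test-report]

        mk.executed.clear();
        mk.make("test-report");
        System.out.println(mk.executed);     // [] (everything is up to date)

        mk.executed.clear();
        mk.touch("foo.c");                   // edit the source again
        mk.make("test-report");
        System.out.println(mk.executed);     // [foo.o, test-report]
    }
}
```

The second run doing nothing is the property that makes this useful: an unchanged source means neither the compile nor the test step runs again.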

Using make for building software

Now, even when using make for just “the build” it usually does not operate in isolation. In the more-or-less standard way make is used to build a piece of “native” software, the sequence of commands is actually more like ./configure; make. Here, configure is a (rather extensive) shell script.

Since writing a really complex, portable shell script is very hard, many developers of C/C++ software use GNU Autoconf and the other parts of the GNU build system (autoconf, automake, and libtool) to assist them. These tools help with generating the configure script and the Makefile. The complete process looks something like this:

Simplified program flow for the GNU build system

So to make effective use of make for builds of software of realistic complexity which must build on a variety of platforms, we have to add several more steps of macro expansion, introspection, transformation, compilation, etc. Automake and autoconf are usually run by the original package developer, and then the generated configure script and the generated Makefile.in templates are shipped to the end user, whose run of configure turns them into the actual Makefiles. This means that the automake and autoconf dependencies (like Perl and the GNU version of the M4 macro language) are not needed by the end user. However, since the generated configure script and Makefiles are very complex, if the end user wants or needs to make changes to the build process, they usually still need to have automake and autoconf installed.
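The core trick behind turning a Makefile.in into a Makefile is template substitution: variables passed to autoconf’s AC_SUBST appear as @NAME@ placeholders in Makefile.in, and configure (through the config.status script it generates) replaces them with whatever it found on the build machine. The sketch below only illustrates that substitution step; the probing is faked with a hard-coded map, and ConfigSubstDemo and substitute are hypothetical names.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConfigSubstDemo {
    // Placeholders look like @NAME@, as in autoconf-style Makefile.in templates.
    private static final Pattern PLACEHOLDER = Pattern.compile("@([A-Za-z_][A-Za-z0-9_]*)@");

    // Replace each @NAME@ with the value "detected" for NAME.
    // In this sketch, placeholders without a known value are left untouched.
    static String substitute(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = values.getOrDefault(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String makefileIn = "CC = @CC@\nCFLAGS = @CFLAGS@\nprefix = @prefix@\n";
        // Hypothetical results of configure's probing of the build machine.
        Map<String, String> detected = Map.of("CC", "gcc", "CFLAGS", "-g -O2", "prefix", "/usr/local");
        System.out.print(substitute(makefileIn, detected));
        // CC = gcc
        // CFLAGS = -g -O2
        // prefix = /usr/local
    }
}
```

The hard part of a real configure script is the probing that fills the map, not this substitution, which is exactly why it ends up as a long, generated shell script.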

Note that while the GNU build system has some support for software written in languages such as Perl, Python or Java, developers using those languages tend not to use make at all, instead opting for a language-specific tool.

Using make for other tasks

Any kind of task which involves the transformation of one kind of file into another kind of file is something where make can be very useful. For example, make can be used to package up generated code into a tarball, or to invoke any of the multitude of LaTeX tools out there to generate HTML or PDF documentation.
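For file transformations like that, GNU make offers pattern rules such as %.pdf: %.tex, where the % matches a non-empty “stem” that is then reused for the prerequisite. Here is a rough Java sketch of just that matching step, assuming a single % per pattern and ignoring GNU make’s special handling of directory names; PatternRuleDemo, matchStem and expand are made-up names.

```java
import java.util.Optional;

class PatternRuleDemo {
    // Match `target` against a pattern with a single '%' (e.g. "%.pdf") and
    // return the stem, i.e. the part the '%' stands for. As in GNU make, the
    // stem must be non-empty. Directory handling is ignored in this sketch.
    static Optional<String> matchStem(String pattern, String target) {
        int pct = pattern.indexOf('%');
        if (pct < 0) {
            return Optional.empty();
        }
        String prefix = pattern.substring(0, pct);
        String suffix = pattern.substring(pct + 1);
        if (target.length() <= prefix.length() + suffix.length()
                || !target.startsWith(prefix) || !target.endsWith(suffix)) {
            return Optional.empty();
        }
        return Optional.of(target.substring(prefix.length(), target.length() - suffix.length()));
    }

    // Substitute the stem into a prerequisite pattern such as "%.tex".
    static String expand(String pattern, String stem) {
        int pct = pattern.indexOf('%');
        return pct < 0 ? pattern : pattern.substring(0, pct) + stem + pattern.substring(pct + 1);
    }

    public static void main(String[] args) {
        // Hypothetical rule "%.pdf: %.tex"; its recipe would call a LaTeX tool.
        for (String target : new String[] {"report.pdf", "notes.html", ".pdf"}) {
            Optional<String> stem = matchStem("%.pdf", target);
            System.out.println(target + " -> "
                + stem.map(s -> expand("%.tex", s)).orElse("no match"));
        }
        // report.pdf -> report.tex
        // notes.html -> no match
        // .pdf -> no match
    }
}
```

One generic rule like this covers every document in a directory, which is what makes make a comfortable fit for documentation builds.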

Make is also often used for tasks that have little to do with transformation of input files into output files, but which are somewhat related to the more general process of dealing with software. For example, many Makefiles support an install target, which copies compiled software into a location on the filesystem where it can be easily invoked. Similarly, make is the basis of the BSD ports system.

How make interacts with packaging systems

Many Linux distributions have a special kind of “packaging format” which adds its own “metadata” to a particular piece of software. This metadata describes what commands should be invoked to make the software compile and install successfully on the target platform. Several management tools are usually provided for managing this kind of metadata and/or these kinds of packages. In most cases, these tools at some point in their execution invoke make to do the actual software build. The BSD ports system is an interesting exception: it is completely based around make, and hence make tends to be used to invoke itself.

Here’s a picture showing the relationship between make and many common packaging tools:

Image illustrating the relationship between packaging tools and Make

(Note how building software for Windows generally does not involve make. Also note how messed up package management on Windows generally is. Coincidence? I don’t think so.)

Make as something to improve on

Make is rather simple, stable and mature software available on just about every computing platform out there, and installed by default on just about every operating system but Microsoft Windows. While make is primarily optimized for building object files from source code and tracking dependencies between object files and the corresponding source files, it is flexible enough to be integrated with and usable for many other tasks of the software management process.

So what is there to improve on? Together with the previous two posts, this post should provide enough “background” so I can start compiling a list…

April 17 update: minor formatting updates and reference to the overarching series.


Blog at WordPress.com.