Review of Google Apps

If you e-mail lsimons A jicarilla dot nl (which no-one ever does), the e-mail no longer ends up in my inbox; it ends up at gmail.jicarilla.nl. Encouraged by a colleague, I’ve been giving Google Apps a try.

Setting up

They have this figured out amazingly well. It definitely made me go “wow”. And again, “wow”. If you own a domain and have access to its DNS settings, they guide you, step by step, through making their different services available under that domain, including using it for e-mail.

I had it all done within an hour, just clicking around and following some instructions (most of which I didn’t read). It works so much better than other managed hosting solutions I’ve tried in the past (like those using cPanel); it’s amazing.

It was a little annoying that it kept forgetting that I, while being Dutch and living in The Netherlands, like to have all my user interfaces in English. I had to change that behavior in every individual app, and when I changed my location information, some of it flipped back to English.

GMail

The free account is 2GB, which is not enough for me. The paid account is 10GB, which is still not enough for me. So migrating is not even an option, even if GMail were deemed secure enough by the powers that be at work.

Things I like

  • the ‘conversation’ model where you can manage whole conversations (wouldn’t work for my work e-mail though, where many people just hijack threads all the time)
  • the keyboard shortcuts
  • the gtalk integration
  • how editing of an e-mail starts inline by default, and you can break it out later

Things I don’t like

  • The unnecessary information and hints, like “visit settings to save time with keyboard shortcuts”. I’m a geek; the first thing I visited was the settings.
  • The primitive filtering. I am used to having all the power of procmail available to me (filtering by the mailing list headers that ezmlm adds, for example). I can’t see this ever working for me.
  • Tags. I’m used to folders, and they are a vital part of what helps me mentally “switch hats” between job/open source/personal tech interest/private life. I don’t see how I could do that with tags.
  • No offline mode, or at least it’s not clear to me how to get one.

All in all, it’s just very different, and I’m not actually looking for different. I suspect large parts of the world feel similarly and just want to keep their basic three-pane interface. Of course, there aren’t that many people who process as much e-mail as I do (or as weirdly), so I can understand why the GMail interface is not exactly optimized for people like me. I would definitely recommend it to, for example, my parents, if they hadn’t been using Outlook Express (and before that, Netscape Mail) and didn’t just want to keep using that kind of interface.

Google Calendar

Day view, week view, month view. Snappy interface. Search. Multiple calendars distinguished by color. iCal support. Creating an event in the right spot takes just a few seconds. I like.

What I miss in the calendar interface is the GTalk mini-window that GMail does have.

What would truly make this a killer app for me is if I could sync from Apple’s iCal to it, instead of just subscribing to Google Calendar from iCal. The reason is that I often jot things down while offline, and I’d want those to appear on Google automatically (or at the click of a button) as soon as I had a network connection again.

Oh, and then of course I’d need Apple’s iCal to support SSL.

I would definitely recommend Google Calendar to anyone who needs a calendaring solution, especially if you want shared calendaring and don’t need to edit your calendar while offline.

Google Docs and spreadsheets

It works, it autosaves your work, and it has revisions. I could never actually use this (since I need to work with all these big technical and/or legal MS Word documents which have comments, master pages, revision tracking, etc.), but it is otherwise really nice for simpler things. Especially the shared editing.

This would be one of my favorite apps on the web, and I would probably actually use it the next time I need some loose planning for something (like a party or a conference call or whatever), if it weren’t for the existence of Dabble DB. Google’s solution works, but Dabble is magic.

Integration between pieces

While everything except the Page Creator-generated website is available over SSL, the links between the pieces (e.g. when you’re in Google Calendar and you click on “gmail”) keep redirecting back to plain HTTP. In general, security seems, ehm, not paranoid enough for people like me.

While wherever you are there’s always a link to the other apps within reach in a consistent location, there’s still a highly modal way of working, where you have to mentally switch from writing an e-mail to writing a document. You can’t easily send a Google Docs document as an e-mail; you have to export it, save it locally, switch to GMail, create an e-mail, navigate to the file, upload the attachment, type the message, and send it. Hpff. Similarly, you can’t select a document in Google Docs and use “send to” (Windows) or drag-and-drop (Mac) to create an e-mail out of it.

This is really where either the Microsoft solution (Outlook, Office, Windows Explorer integration, “Start > My Recent Documents”) or a geeky solution (like mine: mutt, SubEthaEdit, Spotlight, locate/find/ls, Office available when I need it) offers stellar productivity, and where trying to do it “in the web browser” still completely breaks down.

Managedness

Think of all the things you get for free in this solution! High availability (from any computer!), good backups (or so I assume), incredible search and filtering capabilities, real web-based collaborative editing. I think what it removes for the average computer user (or IT administrator) is the need to worry about many of these things. Once you get used to it, it works. And Google has a proven track record of keeping things working.

It’s just that I already have these things…high availability (yay, svn), good backups (hurray for backuppc), great search (Spotlight, mairix), better filtering (procmail, the unix command line) and more mature collaborative editing (again svn, and SubEthaEdit)…and I can trust myself instead of placing trust in a US company (with the US having laws I am not always a fan of).

Page Creator

It seems to work well enough, and is easy enough to understand. It produces HTML 4.01 Transitional, which doesn’t validate. I didn’t spend much time with it; I prefer writing my XHTML-compliant HTML by hand…

Google Talk

We have pretty much standardized on Skype at work, and my non-geek friends have pretty much standardized on MSN these days. Until Google integrates GTalk with those closed silos, it is not that useful to me.

Start Page

Yawn. I’ve never seen one of these that I liked. I tried to use it, but quickly just dragged the service URLs into my Firefox bookmarks toolbar.

Conclusion

Google’s current offering is easy to set up and easy to use for certain distinct, well-defined tasks. Yet it is not secure enough for the paranoid (even if you trust Google with your data). Nor does it offer productivity resembling anything a modern-day knowledge worker is used to.

As far as web-based solutions go…I like the e-mail offering from Yahoo! better. I like Dabble DB better for collaborative editing. Yet Google currently has the best calendaring solution I’ve seen, and is seemingly the farthest along with offering integration between all the different pieces.

The big plus of this offering, and Google is getting this right in an amazingly intuitive way, is that it simplifies administration dramatically. For any small business IT administrator who is struggling to get MS Exchange and SharePoint up and running, going with the Google setup will definitely be a breath of fresh air.

I won’t be switching to actual use of Google Apps any time soon, but I would definitely recommend small businesses, schools, families, and individuals to evaluate it and try it out. It is simple, it is intuitive, and it is free (or really cheap at $50/user/year if you want paid support and more storage).

Google Apps is serious competition for Microsoft, even if perhaps not yet in the enterprise. Even if it doesn’t take off in a big way, this new solution will help to break the Microsoft monopoly, and will push Microsoft to produce better products at a lower price. Yay!

Replace maven with a shell script

One of the things I find myself trying to instill in a lot of our developers these days is that a little pragmatism can often go a long way.

By popular demand (really!), here’s my trivial shell script that pretends to be maven. For smallish projects and a smallish local maven repository, it is orders of magnitude faster than an actual maven run, and it has many other advantages over the “real” maven.

Of course, I don’t actually use this script (much). Lately I’ve been using Ant 1.7 with Ivy. Oh, and mod_perl’s Apache::Test for TripleSoup.

#!/usr/bin/env bash
# Poor man's maven 1: reads project.xml, compiles, jars, and installs
# the jar into the local maven 1 repository.

# pull a few values out of project.xml
artifactId=`xmllint --noblanks project.xml |
        egrep -o '<id>[^>]+<\/id>' |
        sed -e 's/<id>//' -e 's/<\/id>//'`
groupId=`xmllint --noblanks project.xml |
        egrep -o '<groupId>[^>]+<\/groupId>' |
        sed -e 's/<groupId>//' -e 's/<\/groupId>//'`
currentVersion=`xmllint --noblanks project.xml |
        egrep -o '<currentVersion>[^>]+<\/currentVersion>' |
        sed -e 's/<currentVersion>//' -e 's/<\/currentVersion>//'`
shortDescription=`xmllint --noblanks project.xml |
        egrep -o '<shortDescription>[^>]+<\/shortDescription>' |
        sed -e 's/<shortDescription>//' \
            -e 's/<\/shortDescription>//'`
package=`xmllint --noblanks project.xml |
        egrep -o '<package>[^>]+<\/package>' |
        sed -e 's/<package>//' \
            -e 's/<\/package>//'`
organization=`xmllint --noblanks project.xml |
        grep -A5 '<organization>' |
        egrep -o '<name>[^>]+<\/name>' |
        sed -e 's/<name>//' \
            -e 's/<\/name>//'`

# put every jar in the local repository on the classpath
for jar in `find $HOME/.maven/repository -name "*.jar"`; do
    CLASSPATH=$CLASSPATH:$jar
done
CLASSPATH=`pwd`/target/classes:`pwd`/target/test-classes:$CLASSPATH
export CLASSPATH

echo Building $artifactId-$currentVersion.jar...
rm -Rf target
mkdir -p target/classes
mkdir -p target/test-classes

# compile the sources
cd src/java
javac -nowarn -Xlint:-deprecation -source 1.4 -target 1.4 \
        -d ../../target/classes \
        `find . -name '*.java'`

# copy non-java resources, keeping their package directories intact
for dir in `find . -type d -not -path '*svn*'`; do
    mkdir -p ../../target/classes/$dir
done
for file in `find . -type f -not -name '*.java' -not -path '*svn*'`; do
    cp $file ../../target/classes/$file
done
cd ../..

# META-INF: license files plus a manifest
# (no blank lines inside the manifest: a blank line ends its main section)
mkdir -p target/classes/META-INF
cp -f LICENSE* NOTICE* target/classes/META-INF 2>/dev/null
cat > target/classes/META-INF/MANIFEST.MF <<MFEND
Manifest-Version: 1.0
Created-By: Apache Maven Simulator 1.0
Extension-Name: $artifactId
Specification-Title: $shortDescription
Specification-Vendor: $organization
Specification-Version: $currentVersion
Implementation-Vendor: $organization
Implementation-Title: $package
Implementation-Version: $currentVersion
MFEND

# package; the m flag makes jar use our manifest instead of generating its own
cd target/classes
jar cfm ../$artifactId-$currentVersion.jar META-INF/MANIFEST.MF *
cd ../..

# install into the local repository
echo Installing $artifactId-$currentVersion.jar...
mkdir -p $HOME/.maven/repository/$groupId/jars
cp target/$artifactId-$currentVersion.jar \
        $HOME/.maven/repository/$groupId/jars
echo done
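The six nearly identical extraction pipelines at the top of the script could be folded into a single function. Here is a hedged sketch: `pom_value` is a made-up name, it uses plain grep and sed instead of xmllint (so it assumes each element sits on a single line, as in a typical hand-written project.xml), and like the original it knows nothing about XML nesting. It simply keeps the first match, which at least avoids collecting every `<groupId>` from the dependency list as well.

```bash
#!/usr/bin/env bash
# pom_value TAG [FILE]: print the text of the first <TAG>...</TAG> element
# found in FILE (default: project.xml). Hypothetical helper; assumes the
# element's text contains no '<' and sits on one line.
pom_value() {
  local tag=$1 file=${2:-project.xml}
  grep -o "<$tag>[^<]*</$tag>" "$file" |
    head -n 1 |
    sed -e "s|<$tag>||" -e "s|</$tag>||"
}
```

With this in place, the top of the script shrinks to lines like `artifactId=$(pom_value id)` and `groupId=$(pom_value groupId)`.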

RDF modelling at Joost: no bnodes

As I mentioned in a previous entry, Joost™ uses quite a bit of RDF. I’m sorry, but I’m not going to share our full data model with you (though we might do that in the future). All I want to do is highlight some basic choices that we (mostly Alberto) have made about how to model things using RDF.

Choice number one:

No bnodes

A blank node, or “bnode” for short, is a node in an RDF graph (typically a subject) that doesn’t have a ‘real’ URI.

Where do you use bnodes?

You encounter bnodes when modelling things in a ‘normal’ object-oriented fashion, and especially often in ‘normal’ modern XML. For example, the XML document

Listing 1

might be turned into RDF as

Listing 2
@prefix :  .

leo      isA               Person ;
         name              "Leo Simons" .

alberto  isA               Person ;
         name              "Alberto Reggiori" .

foo      isA               Group ;
         name              "Mentioned in article" ;
         containsPeople    ( leo alberto ) .

which is a special Notation3 (or Turtle) shorthand for

Listing 3
@prefix :  .

leo      isA               Person ;
         name              "Leo Simons" .

alberto  isA               Person ;
         name              "Alberto Reggiori" .

foo      isA               Group ;
         name              "Mentioned in article" ;
         containsPeople    _:1 .

_:1      rdf:first         leo ;
         rdf:rest          _:2 .

_:2      rdf:first         alberto ;
         rdf:rest          rdf:nil .

_:1 and _:2 are bnodes. There doesn’t seem to be a problem with this, does there? (Aside from RDF collections being cumbersome.)
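In N-Triples, the line-based RDF syntax, blank nodes are always written with a `_:` label, which makes them easy to spot in a dump. As a rough sketch for auditing a model against a “no bnodes” rule (the function name is made up, and it relies on N-Triples putting exactly one triple on each line), this counts the triples that have a bnode as subject or object:

```bash
#!/usr/bin/env bash
# count_bnode_triples: read N-Triples on stdin and print how many triples
# have a blank node as subject ($1) or object ($3). Predicates are always
# IRIs, and a literal object starts with '"', so neither can match /^_:/.
count_bnode_triples() {
  awk '$1 ~ /^_:/ || $3 ~ /^_:/ { n++ } END { print n + 0 }'
}
```

For example, `count_bnode_triples < model.nt` on a dump of a bnode-free model should print 0.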

How does it look without bnodes?

Well, consider this alternative:

Listing 4
@prefix :  .

leo      isA               Person ;
         name              "Leo Simons" .

alberto  isA               Person ;
         name              "Alberto Reggiori" .

foo      isA               Group ;
         name              "Mentioned in article" ;
         containsPerson    leo ;
         containsPerson    alberto .

It consists of fewer triples, which obviously means less storage space and, given the nature of RDF databases today, also better performance. As a data model grows in complexity, the percentage of bnodes normally seems to grow a bit as well, so the effect is more pronounced for lots of data.
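Counting the triples in the two listings makes this concrete. In Listing 3 each person takes two triples, the group takes three, and each list cell (_:1, _:2) takes two more; Listing 4 replaces the list with one containsPerson triple per member. The last line generalises this to a group of n members (a simple extrapolation from the listings, not a measurement of our data):

```latex
\begin{aligned}
\text{Listing 3:}\quad & \underbrace{2 + 2}_{\text{leo, alberto}} + \underbrace{3}_{\text{foo}} + \underbrace{2 + 2}_{\text{list cells}} = 11 \text{ triples}\\
\text{Listing 4:}\quad & \underbrace{2 + 2}_{\text{leo, alberto}} + \underbrace{4}_{\text{foo}} = 8 \text{ triples}\\
\text{group of } n \text{ members:}\quad & (3 + 2n) \text{ vs. } (2 + n) \text{ triples, i.e. } n + 1 \text{ extra with a collection}
\end{aligned}
```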

But, more importantly, the software you have to write becomes more involved when bnodes are used. Let’s investigate.

The effect of bnode use on source code

Here’s some imaginary Java code (using Jena) that prints certain data it finds in the model:

Listing 5

import java.net.URI;
import java.util.ArrayList;
import java.util.List;

import com.hp.hpl.jena.datatypes.DatatypeFormatException;
import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.vocabulary.RDF;

// Example (the vocabulary) and ValidationUtil are imaginary helper classes.
...

// All string values of the name property of r; non-literals are skipped.
private String[] getName(Model m, Resource r) {
  List<String> names = new ArrayList<String>();
  for(RDFNode nameNode : m.listObjectsOfProperty(r, Example.name).toList()) {
    if(!nameNode.isLiteral()) {
      continue;
    }
    Literal nameLiteral = (Literal)nameNode.as(Literal.class);
    try {
      names.add(nameLiteral.getString());
    } catch(DatatypeFormatException e) {
      // not interpretable as a string; skip it
    }
  }
  return names.toArray(new String[names.size()]);
}

private boolean hasType(Model m, Resource toCheck, Resource expectedType) {
  return m.contains(toCheck, RDF.type, expectedType);
}

private void printHeader(Model m, Resource groupResource) {
  String[] groupNames = getName(m, groupResource);
  if(groupNames.length > 0) {
    for(String name : groupNames) {
      System.out.println("Group name: " + name);
    }
  } else {
    System.out.println("Group name: ");
  }
  System.out.println("----");
}

private void printName(Model m, Resource personResource) {
  String[] personNames = getName(m, personResource);
  if(personNames.length > 0) {
    for(String name : personNames) {
      System.out.println("  " + name);
    }
  } else {
    System.out.println("  ");
  }
}

public void printInfoAboutGroup(Model m, URI groupId) {
  ValidationUtil.checkNotNull(m, "m");
  ValidationUtil.checkNotNull(groupId, "groupId");

  Resource groupResource = m.getResource(groupId.toString());
  if(groupResource == null) {
    System.err.println("Warning: no such group: " + groupId.toString());
    return;
  }

  if(!hasType(m, groupResource, Example.Group)) {
    System.err.println("Warning: not typed as a group: " + groupId.toString());
  }

  printHeader(m, groupResource);

  //
  // NOTE: for loop in a for loop in a for loop
  //
  for(RDFNode groupHead : m.listObjectsOfProperty(groupResource, Example.containsPeople).toList()) {
    // an RDF collection (rdf:first/rdf:rest) is an RDFList in Jena, not a Container
    if(!groupHead.canAs(RDFList.class)) {
      System.err.println("Warning: Group " + groupId.toString() + " containsPeople points to a non-list");
      continue;
    }
    RDFList people = (RDFList)groupHead.as(RDFList.class);

    for(RDFNode peopleNode : people.asJavaList()) {
      if(!peopleNode.isResource()) {
        System.err.println("Warning: Group " + groupId.toString() + " contains a literal: " +
          ((Literal)peopleNode.as(Literal.class)).getLexicalForm());
        continue;
      }
      Resource peopleResource = (Resource)peopleNode.as(Resource.class);
      if(!hasType(m, peopleResource, Example.Person)) {
        String identifier = peopleResource.isAnon() ?
            peopleResource.getId().getLabelString() :
            peopleResource.getURI();
        System.err.println("Warning: not typed as a person: " + identifier);
      }
      printName(m, peopleResource);
    }
  }
}

...

Here’s the printInfoAboutGroup() method again, now for the RDF model structure from Listing 4.

Listing 6

I suspect that, if you haven’t seen RDF-inspecting source code before, all of the above looks a little scary. There’s a load of looping and checking that you don’t have to worry about when using simple JavaBeans. This is the price to pay for the open world assumption, though of course much of it can be abstracted out into utility code far better than I’ve done above.

However, in the midst of all that Java fluff, the difference should still be clear: Listing 6 has one fewer nested for loop than Listing 5. No matter how much you clean up this code, that fundamental difference remains, and, because of the open world assumption, it is rather more important than it looks.

Conclusion

Because of the open-world assumption, making use of bnodes is very expensive when doing real-world software development. Therefore, bnodes should be avoided. Compare:

  • Object oriented world: foo.bar.getXyz() vs. foo.getBarXyz().
  • XML world: foo.getElementsByTagName("bar").getAttribute("xyz") vs. foo.getAttribute("barXyz")
  • RDF world: for(foo) { for(bar) { for(xyz) { ... }}} vs. for(foo) { for(barXyz) { ... }}.

You can forget all of the above; just remember these rules:

  • Don’t use RDF collections. Use one-to-many properties that result in “collections” instead.
  • If you need ordering, define the sorting algorithm instead of putting the ordering in your data.
  • If you have (sort-of) one-to-one relationships in your model, and one or both sides of the relationship is identified by a bnode, merge the concepts into one and distinguish using properties.
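To show the loop-depth argument outside of Jena, here is a toy sketch in bash. It assumes a made-up format (one triple per line as whitespace-separated “subject predicate object”, with no real URIs and no literals) rather than any real RDF syntax, and its data mirrors Listings 3 and 4. Reading the members of a collection needs a `while` loop nested inside a `for` loop to follow rdf:rest; reading a one-to-many property is a single lookup.

```bash
#!/usr/bin/env bash
# Toy model only: triples as "subject predicate object", one per line.

# objects SUBJECT PREDICATE: read triples on stdin, print each matching object.
objects() {
  awk -v s="$1" -v p="$2" '$1 == s && $2 == p { print $3 }'
}

# Listing 3 style: loop over the list heads, then follow rdf:first/rdf:rest
# one cell at a time until rdf:nil or a broken list (empty rest) is reached.
# A cyclic list would loop forever; real code would need a visited set.
members_via_collection() {
  local triples=$1 group=$2 node
  for node in $(objects "$group" containsPeople <<<"$triples"); do
    while [ -n "$node" ] && [ "$node" != "rdf:nil" ]; do
      objects "$node" rdf:first <<<"$triples"
      node=$(objects "$node" rdf:rest <<<"$triples")
    done
  done
}

# Listing 4 style: a single lookup of a one-to-many property.
members_via_property() {
  objects "$2" containsPerson <<<"$1"
}

with_list='foo containsPeople _:1
_:1 rdf:first leo
_:1 rdf:rest _:2
_:2 rdf:first alberto
_:2 rdf:rest rdf:nil'

without_bnodes='foo containsPerson leo
foo containsPerson alberto'

members_via_collection "$with_list" foo     # prints leo, then alberto
members_via_property "$without_bnodes" foo  # prints leo, then alberto
```

Both functions return the same members; the difference is purely in how much traversal code the reader of the data has to write and get right.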

I upgraded to SVN 1.4

What should’ve been simple (“port install subversion”) took about two months. The reason is that the SVN working copy format changed in a backwards-incompatible fashion. The SVN team had always warned that the format would change at some point, but many people just ignored those warnings (mainly because early subversion releases didn’t offer a convenient alternative way to get the same information). So…

  • I had to buy an upgrade for IntelliJ IDEA, from version 5 to version 6, to use my IDE with SVN
    • (because IDEA uses SVNKit, a pure Java replacement for the SVN client, written by a different team of developers)
  • I had to fiddle for hours to get the security settings just right. (I still haven’t found out exactly what broke; it has something to do with our certificate-based setup at work. Since I don’t dare share details of our security measures publicly, I made sure to tell the SVN developers we employ around here instead; I hope one of them finds time to fix it.)
  • I had to upgrade my custom blog software, xblog, since it, like SVNKit, also parsed the contents of .svn/entries. Here’s the patch.

Needless to say, I grew steadily more unhappy during all of this. What I hated most was having to fork out a bunch of money (of course, the company paid, but I’d rather see that cash go to the soccer table so badly desired by some) just to keep my trusted developer toolset working.

It’s also rather painful to note that the XML support in the subversion client still isn’t quite what you would expect out of a ‘normal’ XML tool. For example:

$ svn info --xml foo
<?xml version="1.0"?>
<info>
foo:  (Not a versioned resource)</info>
$ echo $?
0
$ # note how the error message does go to stderr properly
$ svn info --xml foo 2>/dev/null
<?xml version="1.0"?>
<info>
</info>

It would be much better if, in this case, --xml returned some kind of <error> element, or at least set the exit code to something that readily tells you there might be a problem. It doesn’t really matter for my little xblog script, but for a tool like Gump, it seems likely to cause problems one way or another.
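Until that changes, one workaround is to treat any output on stderr as a failure. The `strict` function below is a hypothetical helper (it is not part of svn), and it assumes that anything written to stderr signals a problem. That holds for the `svn info --xml` case above, but not for every tool.

```bash
#!/usr/bin/env bash
# strict CMD [ARGS...]: run CMD, pass its stdout through unchanged, and
# return non-zero if CMD failed *or* wrote anything to stderr.
strict() {
  local errfile status
  errfile=$(mktemp) || return 2
  "$@" 2>"$errfile"
  status=$?
  if [ -s "$errfile" ]; then
    cat "$errfile" >&2                 # still show the error message
    [ "$status" -eq 0 ] && status=1    # exit code 0 plus stderr: call it a failure
  fi
  rm -f "$errfile"
  return "$status"
}
```

A script would then write something like `if ! xml=$(strict svn info --xml foo); then echo "svn info failed" >&2; fi` instead of trusting the exit code alone.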

Make Rocks (too)

William writes:

Nothing made this more clear than working with Rake, Make, and Ant—all in the same day. Make is ridiculous, Ant is reasonable, and Rake rocks.

He has a nice example of how to do Flash-y things without Flash and how to use Rake to do cool things. However, I don’t think it justifies his “Make is ridiculous” statement at all. Here’s a solution to his problem using make and bash (might be proper sh, who knows):

Makefile

IMAGES:=$(wildcard resources/images/*.jpg) $(wildcard resources/images/*.png)

resources/images.xml: $(IMAGES)
	bash resources/images.xml.sh $(IMAGES) > resources/images.xml

images.xml.settings.sh

SWF_VERSION=7
SWF_WIDTH=450
SWF_HEIGHT=550
SWF_BACKGROUND="#ffffff"
SWF_FRAMERATE=24

images.xml.sh

# load settings (from this script's own directory, since make runs it
# from the project root)
. "$(dirname "$0")/images.xml.settings.sh"

# header
cat << END
<?xml version="1.0" encoding="iso-8859-1"?>
<movie version="$SWF_VERSION"
       width="$SWF_WIDTH"
       height="$SWF_HEIGHT"
       framerate="$SWF_FRAMERATE">
  <background color="$SWF_BACKGROUND"/>
  <frame>
    <library>
END

# line for each clip
for fname in "$@"; do
  name=`basename "$fname" | sed -e 's/\.png$//' -e 's/\.jpg$//'`
  echo "      <clip id=\"$name\" import=\"$fname\"/>"
done

# footer
cat << END
    </library>
  </frame>
</movie>
END

I have only tested it on my laptop, but I’m reasonably confident this setup will work out of the box on just about all Linux/Unix/Mac OS X machines out there, including ones from 10 years ago. It also doesn’t require you to learn a new language (Ruby) or a new domain-specific language (Rake) if you’re an “old fart”, integrates easily with most existing build systems one can imagine, and has about the same number of lines of code.
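One thing the clip loop in images.xml.sh does not handle is file names containing XML special characters: an image called `a&b.png` would put a raw `&` into an attribute and produce malformed XML. Below is a sketch of a more defensive clip-line generator. The function names `xml_escape` and `clip_line` are made up for this example. It assumes only POSIX sh and sed, and it strips the extension with parameter expansion instead of piping through sed.

```bash
#!/usr/bin/env bash
# xml_escape TEXT: print TEXT with &, <, > and " replaced by XML entities.
# The & substitution runs first so the other entities aren't escaped twice.
xml_escape() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' \
                         -e 's/>/\&gt;/g' -e 's/"/\&quot;/g'
}

# clip_line FILE: print one <clip/> element for an image file.
clip_line() {
  name=${1##*/}    # strip the directory part, like basename
  name=${name%.*}  # strip the last extension (.png, .jpg, ...)
  printf '      <clip id="%s" import="%s"/>\n' \
    "$(xml_escape "$name")" "$(xml_escape "$1")"
}

# drop-in replacement for the "line for each clip" loop
for fname in "$@"; do
  clip_line "$fname"
done
```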