Writing one-line shell scripts with bash

If you are using ruby with bundler and Gemfiles properly, you probably know about running commands with bundle exec. However, sometimes this does not get you quite the right results, in particular if your Gemfile is not quite precise enough.

For example, I had an issue with cucumber + autotest + rails where I had both rails 3.1 and rails 3.2 apps using the same RVM. Since I was confused, and in a hurry, I figured one brute force option to unconfuse me would be to simply remove all old versions of all gems from the environment. I did just that, and thought I’d explain the process of incrementally coming up with the right shell script one liner.

First things first, let’s figure out what we have installed:

$ gem
...
  Usage:
...
    gem command [arguments...] [options...]
...
  Examples:
...
    gem list --local
...

Ok, I guess we want gem list:

$ gem list
*** LOCAL GEMS ***

actionmailer (3.2.2, 3.1.1)
...
ZenTest (4.7.0)

actionmailer is part of rails, and we can see there are two versions installed. Let’s figure out how to remove one of them…

$ gem help commands
GEM commands are:
...
    uninstall         Uninstall gems from the local repository
...
$ gem uninstall --help
Usage: gem uninstall GEMNAME [GEMNAME ...] [options]

  Options:
...
    -I, --[no-]ignore-dependencies   Ignore dependency requirements while
                                     uninstalling
...
    -v, --version VERSION            Specify version of gem to uninstall
...

Great. Let’s try it:

$ gem uninstall actionmailer -v 3.1.1

You have requested to uninstall the gem:
	actionmailer-3.1.1
rails-3.1.1 depends on [actionmailer (= 3.1.1)]
If you remove this gems, one or more dependencies will not be met.
Continue with Uninstall? [Yn]  y
Successfully uninstalled actionmailer-3.1.1

Ok, so we need to have it not ask us that question. From studying the command line options, the magic switch is to add -I.

So once we have pinpointed a version to uninstall, our command becomes something like gem uninstall -I $gem_name -v $gem_version. Now we need the list of gems to do this on, so we can run that command a bunch of times.

We’ll now start building our big fancy one-line script. I tend to do this by typing the command, executing it, and then pressing the up arrow to go back in the bash history to re-edit the same command.

Looking at the gem list output again, we can see that any gem with multiple installed versions has a comma in the output, and gems with just one installed version do not. We can use grep to filter the list:

$ gem list | grep ','
actionmailer (3.2.2, 3.1.1)
...
sprockets (2.1.2, 2.0.3)

Great. Now we need to extract out of this just the name of the gem and the problematic version. One way of looking at the listing is as a space-separated set of fields: gemname SPACE (version1, SPACE version2), so we can use cut to pick fields one and three:

$ gem list | grep ',' | cut -d ' ' -f 1,3
...
gherkin 2.5.4)
jquery-rails 2.0.1,
...

Wait, why does the jquery-rails line look different?

$ gem list | grep ',' | grep jquery-rails
jquery-rails (2.0.2, 2.0.1, 1.0.16)

Ok, so it has 3 versions. Really in this instance we need to pick out fields 3,4,5,… and loop over them, uninstalling all the old versions. But that’s a bit hard to do. The alternative is to just pick out field 3 anyway, and run the same command a few times. The first time will remove jquery-rails 2.0.1, and then the second time the output will become something like

jquery-rails (2.0.2, 1.0.16)

and we will remove jquery-rails 1.0.16.

We’re almost there, but we still need to get rid of the ( and , in our output.

$ gem list | grep ',' | cut -d ' ' -f 1,3 | sed 's/,//' | sed 's/)//'
...
childprocess 0.2.2
...
rack 1.3.5

Looking nice and clean.

To run our gem uninstall command, we know we need to prefix the version with -v:

$ gem list | grep ',' | cut -d ' ' -f 1,3 | sed 's/,//' | sed 's/)//' | sed 's/ / -v /'
...
childprocess -v 0.2.2
...

Ok, so now at the start of the list we want to put gem uninstall -I . We can use the regular expression ‘^’ to match the beginning of the line, switching sed to extended regular expression mode while we’re at it…

$ gem list | grep ',' | cut -d ' ' -f 1,3 | sed 's/,//' | sed 's/)//' | sed 's/ / -v /' | sed -r 's/^/gem uninstall -I/'
sed: illegal option -- r
usage: sed script [-Ealn] [-i extension] [file ...]
       sed [-Ealn] [-i extension] [-e script] ... [-f script_file] ... [file ...]

Ugh. -r is the switch used in the GNU version of sed. I’m on Mac OS X which comes with BSD sed, which uses -E.

$ gem list | grep ',' | cut -d ' ' -f 1,3 | sed 's/,//' | sed 's/)//' | sed 's/ / -v /' | sed -E 's/^/gem uninstall -I /'
...
gem uninstall -I childprocess -v 0.2.2
...

Ok. That looks like it’s the list of commands that we want to run. Since the next step will be the big one, before we actually run all the commands, let’s check that we can do so safely. A nice trick is to echo out the commands.

$ gem list | grep ',' | cut -d ' ' -f 1,3 | sed 's/,//' | sed 's/)//' | sed 's/ / -v /' | sed -E 's/^/echo gem uninstall -I /' | sh
gem uninstall -I childprocess -v 0.2.2

Ok, so evaluating through sh works. Let’s remove the echo:

$ gem list | grep ',' | cut -d ' ' -f 1,3 | sed 's/,//' | sed 's/)//' | sed 's/ / -v /' | sed -E 's/^/gem uninstall -I /' | sh
...
Successfully uninstalled childprocess-0.2.2
Removing rails
Successfully uninstalled rails-3.1.1
...

I have no idea why that rails gem gets the extra line of output. But it looks like it all went ok. Let’s remove the ‘sh’ again and check:

$ gem list | grep ',' | cut -d ' ' -f 1,3 | sed 's/,//' | sed 's/)//' | sed 's/ / -v /' | sed -E 's/^/gem uninstall /'
gem uninstall jquery-rails -v 1.0.16

Oh, that’s right, the jquery-rails gem had three versions installed, so there’s still one old version left. Let’s uninstall that, then. We press the up arrow twice to get back to the command line that ends with | sh, and run that. Great: we’re all done!

Let’s look at the final command again:

gem list | grep ',' | cut -d ' ' -f 1,3 | sed 's/,//' | sed 's/)//' | sed 's/ / -v /' | sed -E 's/^/gem uninstall -I /' | sh
  1. gem list shows us the locally installed gems
  2. | grep ',' limits that output to lines with a comma in it, that is, gems with multiple versions installed
  3. | cut -d ' ' -f 1,3 splits the remaining lines by spaces, then picks fields one and three
  4. | sed 's/,//' removes all , from the output
  5. | sed 's/)//' removes all ) from the output
  6. | sed 's/ / -v /' replaces the (one remaining) space with -v
  7. | sed -E 's/^/gem uninstall -I /' puts gem uninstall -I at the start of every line
  8. | sh evaluates all the lines in the output as commands

Note that, as a reusable program or installable shell script, this command really isn’t good enough. For example:

  • It does not check the exit codes / statuses of the commands it runs; it just assumes they all succeed
  • It assumes the output of gem list will always match our expectations (for example, that the output header does not contain a comma, or that gem names and versions cannot contain spaces, commas, or parentheses -- this may be true but I wouldn't know for sure)
  • It assumes that -I is the only switch needed to prevent gem uninstall from ever asking us questions

The best approach for a reusable command would probably be to write a script in ruby that used the RubyGems API. However, that would be much more work than writing our one-liner, which is “safe enough” for this use-once approach.
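
For the curious, a slightly more defensive shell version that loops over every old version of every gem (so the jquery-rails case needs only one pass) might look something like this. It’s just a sketch and still trusts the gem list output format:

#!/usr/bin/env bash
# Sketch: uninstall all but the newest version of every installed gem.
# Still trusts the "name (newest, older, ...)" format of `gem list`.
set -e

gem list | grep ',' | while read -r line; do
    name=${line%% *}                  # field 1: the gem name
    versions=${line#* }               # "(3.2.2, 3.1.1, ...)"
    versions=${versions//[(),]/}      # strip parentheses and commas
    for version in ${versions#* }; do # everything after the newest version
        gem uninstall -I "$name" -v "$version"
    done
done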

(For the record, this didn’t solve my rails + cucumber woes. Instead, it turns out I had run bundle install --without test at some point, and bundler remembers the --without. So I needed rm Gemfile.lock; bundle install --without ''.)

Setting up a new mac

I got a new macbook pro. Rather than migrate my settings automatically I decided I’d go for a little spring cleaning by building it up from scratch.

Initial setup stuff

  • start software update
  • go through system preferences, add appropriate paranoia
  • open keychain utility, add appropriate paranoia
  • run software update
  • insert Mac OS X CD, install XCode
  • remove all default bookmarks from safari
  • open iTunes, sign in, turn off annoying things like ping
  • download, extract, install, run once, enter serial (where needed)
    • Chrome
    • Firefox
      • set master password
    • Thunderbird
      • set master password
    • Skype
    • OmniGraffle
    • OmniPlan
    • TextMate
    • VMware Fusion
    • Colloquy
    • IntelliJ IDEA; disable most enterpriseish plugins; then open plugin manager and install
      • python plugin
      • ruby plugin
      • I have years of crap in my intellij profile which I’m not adding to this machine
    • VLC
    • Things
  • download stunnel, extract, sudo mkdir /usr/local && sudo chown $USER /usr/local, ./configure --disable-libwrap, make, sudo make install
  • launch terminal, customize terminal settings (font size, window size, window title, background color) and save as default

Transfer from old mac

  • copy over secure.dmg, mount, set to automount on startup
  • import certs into keychain, firefox, thunderbird
  • set up thunderbird with e-mail accounts
  • copy over ~/.ssh and ~/.subversion
  • set up stunnel for colloquy, run colloquy, add localhost as server, set autojoin channels, change to tabbed interface
  • copy over keychains
  • copy over Office, run office setup assistant
  • copy over documents, data
  • copy over virtual machines
  • open each virtual machine, selecting “I moved it”
  • copy over itunes library
  • plug in, pair and sync ipad and iphone

Backup

  • set up time machine

Development packages

I like to know what is in my /usr/local, so I don’t use MacPorts or fink or anything like it.

  • download and extract pip, python setup.py install
  • pip install virtualenv Django django-piston south django-audit-log django-haystack httplib2 lxml
    • Ah mac 10.6 has a decent libxml2 and libxslt so lxml just installs. What a breeze.
  • download and install 64-bit mysql .dmg from mysql.com, also install preference pane
  • Edit ~/.profile, set CLICOLOR=yes, set PATH, add /usr/local/mysql/bin to the PATH, source ~/.profile (a minimal sketch of the resulting profile follows this list)
    • again, I have years of accumulated stuff in my bash profile that I’m dumping. It’s amazing how fast bash starts when it doesn’t have to execute a gazillion lines of shell script…
  • pip install MySQL-python, install_name_tool -change libmysqlclient.16.dylib /usr/local/mysql/lib/libmysqlclient.16.dylib /Library/Python/2.6/site-packages/_mysql.so
  • take care of the many dependencies for yum, mostly standard ./configure && make && make install of
    • pkg-config
    • gettext
    • libiconv
    • gettext again, --with-libiconv-prefix=/usr/local
    • glib, --with-libiconv=gnu
    • popt
    • db-5.1.19.tar.gz, cd build_unix && ../dist/configure --enable-sql && make && make install, cp sql/sqlite3.pc /usr/local/lib/pkgconfig/, cd /usr/local/BerkeleyDB.5.1/include && mkdir db51 && cd db51 && ln -s ../*.h .
    • neon, ./configure --without-gssapi --with-ssl
    • rpm (5.3), export CPATH=/usr/local/BerkeleyDB.5.1/include, sudo mkdir /var/local && sudo chown $USER /var/local, ./configure --with-db=/usr/local/BerkeleyDB.5.1 --with-python --disable-nls --disable-openmp --with-neon=/usr/local && make && make install
    • pip install pysqlite pycurl
    • urlgrabber, sudo mkdir /System/Library/Frameworks/Python.framework/Versions/2.6/share && sudo chown $USER /System/Library/Frameworks/Python.framework/Versions/2.6/share && python setup.py install
    • intltool
    • yum-metadata-parser, export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:/usr/lib/pkgconfig; python setup.py install
  • install yum. Yum comes without decent install scripts (the idea is you install it from rpm). I did some hacks to get somewhere reasonable:
    cat Makefile | sed -E -e 's/.PHONY(.*)/.PHONY\1 install/' > Makefile.new
    mv Makefile.new Makefile
    cat etc/Makefile | sed -E -e 's/install -D/install/' > etc/Makefile.new
    mv etc/Makefile.new etc/Makefile
    make DESTDIR=/usr/local PYLIBDIR=/Library/Python/2.6 install
    mv /usr/local/Library/Python/2.6/site-packages/* /Library/Python/2.6/site-packages/
    mv /usr/local/usr/bin/* /usr/local/bin/
    mkdir /usr/local/sbin
    mv /usr/local/usr/sbin/* /usr/local/sbin/
    rsync -av /usr/local/usr/share/ /usr/local/share/
    rm -Rf /usr/local/usr/
    cat /usr/local/bin/yum | sed -E -e 's|/usr/share/yum-cli|/usr/local/share/yum-cli|' > /tmp/yum.new
    mv /tmp/yum.new /usr/local/bin
    chmod +x /usr/local/bin/yum
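
For reference, the minimal ~/.profile mentioned in the list above looks something like this (a sketch; adjust the paths to taste):

# ~/.profile -- deliberately minimal this time around
export CLICOLOR=yes
# /usr/local for self-built tools, /usr/local/mysql/bin for the mysql .dmg install
export PATH="/usr/local/bin:/usr/local/sbin:/usr/local/mysql/bin:$PATH"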
    

That’s how far I got last night, resulting in 86GB of disk use (most of which is VMs and iTunes library), just enough to be productive on my current $work project. I’m sure there’s weeks of tuning in my future.

Setting up vmware fusion, puppet and gitolite

The goal today is to start a rather ‘complete’ local dev environment on my laptop, from scratch, suitable for playing with continuous integration and deployment tools. My host is a mac. I’ll use VMWare Fusion for virtualization, running CentOS 5.5 guests. I’ll be setting up poor man’s vm cloning, and after that gitolite for version control and puppet for configuration management.

The main goal of all this is actually playing around with configuration management, so I’m not going to bother with backups or redundancy or any high availability config, however I will be deploying some basic java webapps and some basic PHP frontends to exercise the config management put in place.

Unfortunately, bootstrapping such a setup is a lot of work so I’ll write down a detailed installation log. This way, it may be less work next time, as per my own advice on bootstrapping. Maybe it helps you, too 🙂

1. Virtual Machine Setup

All these things will be deployed in virtual machines so as to match the production environment as much as possible. So the first step is to make it easy to create identical virtual machines that can be spun up and down on demand. Probably the best way is to use cobbler, but here’s how I did things:

1.1 Create a base vanilla VM

1.2 Create a setup to be able to clone VMs

As part of an effort to learn a bit about vmware, I wrote a simple script that goes into ~/Documents/Virtual Machines:

#!/usr/bin/env python
# make-sandbox.py: clones virtual machines

import sys
import re
import os
import shutil

def usage():
    print "./make-sandbox.py [name]"
    print "  name should match ^[a-z][A-Z0-9]{1,15}$"

def fail(msg):
    print msg
    sys.exit(1)

def isVmConfigFile(name):
    # these files embed the VM name and need their contents rewritten
    return re.match("^.*?vanilla-sandbox\.(vmdk|vmsd|vmx|vmxf)$",
            name)

def writeNewConfig(src, dst, renamer):
    # copy a config file, applying the rename to its contents
    s = open(src, 'r').read()
    s = renamer(s)
    print "writing custom", dst
    f = open(dst, 'w')
    f.write(s)
    f.close()

def sandboxCopyTree(src, dst, renamer):
    # like shutil.copytree, but renames paths, skips logs and rewrites VM config files
    names = os.listdir(src)

    print "mkdir", dst
    os.makedirs(dst)
    errors = []
    for name in names:
        srcname = os.path.join(src, name)
        dstname = renamer(os.path.join(dst, name))
        
        try:
            if os.path.islink(srcname):
                linkto = os.readlink(srcname)
                os.symlink(linkto, dstname)
                print "ln -s", srcname, dstname
            elif os.path.isdir(srcname):
                sandboxCopyTree(srcname, dstname, renamer)
            elif srcname.endswith(".log"):
                continue
            elif isVmConfigFile(srcname):
                writeNewConfig(srcname, dstname, renamer)
            else:
                print "cp", srcname, dstname
                shutil.copy2(srcname, dstname)
        except (IOError, os.error), why:
            errors.append((srcname, dstname, str(why)))
        # catch the Error from the recursive copytree so
        # that we can continue with other files
        except shutil.Error, err:
            errors.extend(err.args[0])
    try:
        shutil.copystat(src, dst)
    except OSError, why:
        errors.extend((src, dst, str(why)))
    if errors:
        raise shutil.Error(errors)

def sandboxRenameMaker(name):
    # build a function that replaces "vanilla" with the new sandbox name
    def renamer(dst):
        return dst.replace("vanilla", name)
    return renamer

if __name__ == "__main__":
    if len(sys.argv) != 2:
        usage()
        sys.exit(1)
    
    name = sys.argv[1]
    if not re.match('^[a-z][a-z0-9]{1,15}$', name):
        usage()
        sys.exit(1)

    basename = "%s-sandbox" % (name,)
    if os.path.exists(basename):
        fail("%s already exists?" % (basename,))
    
    sandboxCopyTree("vanilla-sandbox", basename,
            sandboxRenameMaker(name))

The script is pretty basic and hardcodes some stuff it probably shouldn’t, but it works ok for me. You can find various other imperfect scripts that do similar things on the vmware fusion forums. This script expects a directory ./vanilla-sandbox containing a vm named vanilla-sandbox, and the intended name of the new virtual machine as an argument.

1.3 Create some VMs

Invoke the new script like so:

cd ~/Documents/Virtual\ Machines && ./make-sandbox.py puppet

Which results in a virtual machine named puppet-sandbox. The virtual machine is not ready for use yet. Additional steps:

  • In VMWare Fusion, select File > Open..., then open the new VM.
  • Start the VM. VMWare will ask whether you moved or copied the VM. Select “I copied…”
  • log in as root
  • vi /etc/sysconfig/network, change hostname to match vm name (e.g. puppet.sandbox)
  • vi /etc/hosts, change hostname to match vm name
  • hostname [hostname], change hostname for the running system (these edits are sketched as commands just below)
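
For reference, those three edits can be scripted roughly like this, run as root inside the new VM (a sketch, assuming the CentOS 5 file layout and that the vanilla VM was called vanilla.sandbox):

NEW=puppet.sandbox
OLD=vanilla.sandbox
# persist the new hostname
sed -i "s/^HOSTNAME=.*/HOSTNAME=${NEW}/" /etc/sysconfig/network
sed -i "s/${OLD}/${NEW}/g" /etc/hosts
# change the hostname of the running system
hostname "${NEW}"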

Yes, this manual edit of the network settings is kind of icky, but I looked at how cobbler integrates vmware and koan and it just seemed a bit too much work for me right now. Perhaps I’ll look at that later.

Specifically for puppet, I want the machine running it to have a static IP, so I can put a static entry into /etc/hosts on the guest OS and have that always work. So, for the puppet machine, network config gets an extra step:

cp /etc/sysconfig/network-scripts/ifcfg-eth0 /etc/sysconfig/network-scripts/ifcfg-eth0.bak
cat >/etc/sysconfig/network-scripts/ifcfg-eth0 <<END
# Advanced Micro Devices [AMD] 79c970 [PCnet32 LANCE]
DEVICE=eth0
BOOTPROTO=static
IPADDR=172.16.64.3
NETMASK=255.255.255.0
NETWORK=172.16.64.0
BROADCAST=172.16.64.255
ONBOOT=yes
GATEWAY=172.16.64.2
END
vi /etc/resolv.conf    # 172.16.64.2 is the DNS server
service network restart

Then, on the host, echo "172.16.64.3 puppet.sandbox puppet" >> /etc/hosts.

Note 172.16.64.x is the default network used by vmware fusion NAT, vmnet8. You can find these details in /Library/Application Support/VMware Fusion/vmnet8/dhcpd.conf, which I believe really ought to know about this static address, too:

host puppet {
    hardware ethernet 00:0C:29:3E:FC:56;
    fixed-address 172.16.64.3;
}

where that ethernet address is generated by vmware fusion, and you can find it with grep generatedAddress ~/Documents/Virtual\ Machines/puppet-sandbox/puppet-sandbox.vmx. Restart vmware’s networking fanciness with "/Library/Application Support/VMware Fusion/boot.sh" --restart. Now restart networking on the puppet guest and check it’s working ok:

/etc/init.d/network restart
ping -c 1 www.google.com
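
As an aside, if you end up pinning more than one VM like this, the host stanza above can be generated from the .vmx file. A rough sketch, run on the mac:

VMX="$HOME/Documents/Virtual Machines/puppet-sandbox/puppet-sandbox.vmx"
MAC=$(grep 'generatedAddress = ' "$VMX" | cut -d '"' -f 2)
cat <<END | sudo tee -a "/Library/Application Support/VMware Fusion/vmnet8/dhcpd.conf"
host puppet {
    hardware ethernet ${MAC};
    fixed-address 172.16.64.3;
}
END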

2. Bootstrapping services

Once you have yourselves some base VMs one way or another, the next step is to get the combo of puppet and gitolite up properly. Normally these really need dedicated machines but I’m trying to conserve RAM so they’ll have to fit together on one VM for now.

2.1 Basic puppet server install

This bit is very easy:

yum install puppet-server ruby-shadow
mkdir -p /etc/puppet/manifests
puppetmasterd --genconfig > /etc/puppet/puppet.conf
cat >/etc/puppet/manifests/site.pp <<END

file { "/etc/sudoers":
  owner => root, group => root, mode => 440
}

END
service puppetmaster start
puppetd --test

2.2 Basic gitolite install

We’ll actually use this puppet to install gitolite for us, then move the puppet config into gitolite once it’s set up.

Note that gitolite currently needs git 1.6.2+, so you need to get that from somewhere; it isn’t currently in CentOS 5 or EPEL. For this reason, I added the webtatic yum repo config earlier. Probably not a good idea for a production environment!

Here’s all the bits and pieces to get gitolite set up:

yum install git

# move /etc/sudoers resource to a module
cd /etc/puppet
mkdir -p /etc/puppet/modules/sudo/manifests
cat > /etc/puppet/modules/sudo/manifests/init.pp <<END
class sudo {
    file { "/etc/sudoers":
      owner => root, group => root, mode => 440
    }
}
END

# create a gitolite module
mkdir -p /etc/puppet/modules/gitolite/{manifests,files}

cat > /etc/puppet/modules/gitolite/files/install-gitolite.sh <<END
#!/usr/bin/env bash
# initial system install of gitolite. Run as root.

set -e

cd /home/git

if [[ ! -d "gitolite-source" ]]; then
    git clone git://github.com/sitaramc/gitolite gitolite-source
fi
cd gitolite-source
git checkout v1.5.8
mkdir -p /usr/local/share/gitolite/conf /usr/local/share/gitolite/hooks
src/gl-system-install /usr/local/bin /usr/local/share/gitolite/conf /usr/local/share/gitolite/hooks
END

cat > /etc/puppet/modules/gitolite/files/setup-gitolite.sh <<END
#!/usr/bin/env bash
# initial for-gitolite-user setup of gitolite. Run as the git user.

set -e

/usr/local/bin/gl-setup /home/git/lsimons.pub
END

# note: truncated ssh key for blog post
cat > /etc/puppet/modules/gitolite/files/lsimons.pub <<END
ssh-rsa AAAA...== lsimons@...
END

cat > /etc/puppet/modules/gitolite/files/gitolite-rc <<END
\$GL_PACKAGE_CONF = '/usr/local/share/gitolite/conf';
\$GL_PACKAGE_HOOKS = '/usr/local/share/gitolite/hooks';
\$REPO_BASE="repositories";
\$REPO_UMASK = 0077;         # gets you 'rwx------'
\$PROJECTS_LIST = \$ENV{HOME} . "/projects.list";
\$GL_ADMINDIR=\$ENV{HOME} . "/.gitolite";
\$GL_LOGT="\$GL_ADMINDIR/logs/gitolite-%y-%m.log";
\$GL_CONF="\$GL_ADMINDIR/conf/gitolite.conf";
\$GL_KEYDIR="\$GL_ADMINDIR/keydir";
\$GL_CONF_COMPILED="\$GL_ADMINDIR/conf/gitolite.conf-compiled.pm";
\$GIT_PATH="";
\$GL_BIG_CONFIG = 0;
\$GL_NO_DAEMON_NO_GITWEB = 0;
\$GL_NO_CREATE_REPOS = 0;
\$GL_NO_SETUP_AUTHKEYS = 0;
\$GL_GITCONFIG_KEYS = "";
\$HTPASSWD_FILE = "";
\$RSYNC_BASE = "";
\$SVNSERVE = "";
\$GL_WILDREPOS = 0;
\$GL_WILDREPOS_PERM_CATS = "READERS WRITERS";
1;
END

cat > /etc/puppet/modules/gitolite/manifests/init.pp <<END
class gitolite {
    package { git:
        ensure => latest
    }
    
    group { git:
        ensure => present,
        gid => 802
    }
    
    user { git:
        ensure => present,
        gid => 802,
        uid => 802,
        home => "/home/git",
        shell => "/bin/bash",
        require => Group["git"]
    }
    
    file {
        "/home/git":
            ensure => directory,
            mode => 0750,
            owner => git,
            group => git,
            require => [User["git"], Group["git"]];

        "/home/git/install-gitolite.sh":
            ensure => present,
            mode => 0770,
            owner => git,
            group => git,
            require => File["/home/git"],
            source => "puppet:///modules/gitolite/install-gitolite.sh";
            
        "/home/git/setup-gitolite.sh":
            ensure => present,
            mode => 0770,
            owner => git,
            group => git,
            require => File["/home/git"],
            source => "puppet:///modules/gitolite/setup-gitolite.sh";
            
        "/home/git/lsimons.pub":
            ensure => present,
            mode => 0660,
            owner => git,
            group => git,
            require => File["/home/git"],
            source => "puppet:///modules/gitolite/lsimons.pub";
        
        "/home/git/.gitolite.rc":
            ensure => present,
            mode => 0660,
            owner => git,
            group => git,
            require => File["/home/git"],
            source => "puppet:///modules/gitolite/gitolite-rc";
    }
    
    exec {
        "/home/git/install-gitolite.sh":
            cwd => "/home/git",
            user => root,
            require => [
                        File["/home/git/install-gitolite.sh"],
                        Package["git"]
                ];

        "/home/git/setup-gitolite.sh":
            cwd => "/home/git",
            user => git,
            environment => "HOME=/home/git",
            require => [
                    Exec["/home/git/install-gitolite.sh"],
                    File["/home/git/setup-gitolite.sh"],
                    User["git"]
                ]
    }
}
END

# update the site config to use the sudo and gitolite modules
cat > /etc/puppet/manifests/modules.pp <<END
import "sudo"
import "gitolite"
END

cat > /etc/puppet/manifests/nodes.pp <<END
node basenode {
    include sudo
}

node "puppet.sandbox" inherits basenode {
    include gitolite
}
END

cat > /etc/puppet/manifests/site.pp <<END
import "modules"
import "nodes"
END

# invoke puppet once to apply the new config
puppetd -v --test

So now we have gitolite installed on the server. So far so good.
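
A quick sanity check from the mac at this point (assuming the private key matching lsimons.pub is the one your ssh agent offers):

# should greet us and list the repositories we have access to
ssh git@puppet info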

2.3 Creating an scm git repo

I now need a repository in which to put the puppet configs. I was originally planning to have a repo ‘scm’ and a directory ‘puppet’ within it, so that I could have /etc/scm, with /etc/puppet a symlink to /etc/scm/puppet. It turns out puppet doesn’t support symlinks for /etc/puppet, so I ended up fiddling about a bit…

# on client
git clone git@puppet:gitolite-admin
cd gitolite-admin
cat >>conf/gitolite.conf <<END

repo    scm
        RW+     =   lsimons

END
git add conf/gitolite.conf 
git commit -m "Adding 'scm' repo"
git push origin master
cd ..

2.4 Making puppet use the config from the scm repo

First we need to get the existing config into version control:

mkdir scm
cd scm
git init
cat >> .git/config <<END
[remote "origin"]
    url = git@puppet:scm
    fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
    remote = origin
    merge = refs/heads/master
END
scp -r root@puppet:/etc/puppet/* .
git add *
git commit -m "Check in initial puppet config"
git push --all

Next, get the config out of version control and underneath puppet, and automate this process:

# install from-git puppet config on server

cd /etc
mv puppet puppet.bak
mkdir puppet
chown git puppet
cd puppet
sudo -u git git clone file:///home/git/repositories/scm.git .

# install post-receive hook to update /etc/puppet after a push

su - git
cat > /home/git/repositories/scm.git/hooks/post-receive <<END
#!/usr/bin/env bash
(cd /etc/puppet; env -i git pull -q origin master)
END
chmod u+x /home/git/repositories/scm.git/hooks/post-receive
exit

That little bit of env -i git inside that hook had me baffled for a bit. It turns out that I needed to empty the environment before invoking git from inside of a hook, because otherwise it’ll pick up the GIT_DIR variable. D’oh!

With this config re-set up, there should be effectively 0 changes when we run puppet. Let’s check:

puppet /etc/puppet/manifests/site.pp

2.5 On (not) putting all the puppet.sandbox config in puppet

Note that installing the post-receive hook from the previous step is not puppeted. The reason for this is one of synchronization. For example, if puppet somehow creates that post-receive file before the scm repository exists, gitolite will complain. It seems easier to have puppet not touch things managed by gitolite and vice versa.

Similarly, the installation of puppet itself is not puppeted, leaving the configuration of puppet.sandbox not something that can be completely automatically rebuilt.

Instead, rebuilding this box should be done by first re-following the instructions above, and then restoring the contents of the git@puppet:gitolite-admin and git@puppet:scm repositories from their current state (or latest backup). For my current purposes, that’s absolutely fine.

3. Setting up puppet dashboard

I also had a look at installing puppet dashboard. Because I know ruby and rails and gems can be a big dependency hell I figured I didn’t even want to try it in a VM, and instead I “just” got it running on my mac.

3.1 MySQL, ruby and mac os x

Puppet dashboard is built using ruby on rails and suggests using mysql for persistence (in retrospect I should not have listened and used sqlite :-)). Ruby on Rails apparently accesses MySQL through the mysql gem. The MySQL gem has to link against both the native mysql library and the native ruby library. Fortunately, I’m aware enough of the potential pain that I tend to carefully install the most compatible version of systems like this:

$ file `which ruby`
/usr/local/bin/ruby: Mach-O executable i386
$ file `which mysql`
/usr/local/mysql/bin/mysql: Mach-O executable i386
$ ruby --version
ruby 1.8.7 (2009-06-12 patchlevel 174) [i686-darwin9.7.0]
$ mysql --version
mysql  Ver 14.14 Distrib 5.1.31,
        for apple-darwin9.5.0 (i386) using readline 5.1

You’d hope that the mysql gem build picks up on all this ok, but that’s not quite the case. Instead, you really have to be quite explicit:

vi /usr/local/lib/ruby/1.8/i686-darwin9.7.0/rbconfig.rb
# change to CONFIG["ARCH_FLAG"] = "-arch i386"

sudo gem uninstall mysql
sudo env ARCHFLAGS="-arch i386" gem install --verbose \
    --no-rdoc --no-ri mysql \
    -- --with-mysql-dir=/usr/local \
    --with-mysql-config=/usr/local/mysql/bin/mysql_config
cd /usr/local/mysql/lib/
sudo ln -s . mysql
cd .

3.2 Look, ma, it’s a rails app

After this, fortunately it’s easy again.

tar zxf ...puppet-dashboard...
mv ... ~/puppet-dashboard
cd ~/puppet-dashboard
rake RAILS_ENV=production db:create
rake RAILS_ENV=production db:migrate
./script/server -e production
open http://localhost:3000/

Works like a charm. Getting some data to look at requires tweaking the puppet VM:

cd /Users/lsimons/puppet-dashboard/ext/puppet
vi puppet_dashboard.rb
# change HOST to 172.16.64.1
ssh root@puppet mkdir -p /var/lib/puppet/lib/puppet/reports
scp puppet_dashboard.rb root@puppet:/var/lib/puppet/lib/puppet/reports/
exit

cd ~/dev/scm/
vi puppet/puppet.conf
# report = true for [puppetd]
# reports = puppet_dashboard for [puppetmasterd]
git add puppet/puppet.conf
git commit -m "Enable reporting"
git push

4. Recap

So now we have:

  • A working, documented, repeatable process for creating new VMs
  • A working, documented, repeatable process for bootstrapping puppet
  • A neat version-controlled way of changing the puppet config
  • An installation of a puppet master that serves up the latest config
  • A puppeted installation of gitolite
  • A not-so-great but working installation of puppet dashboard
  • A few more VMs to configure

PowerDNS 2.9.22 on Mac OS X 10.5.8

Based on this hint and the official docs I got PowerDNS running on my mac.

Prerequisites

  • mac os x developer tools
  • mysql 5.0 or later (I’m using 5.1.31-osx10.5-x86)

Installation steps

  • Download boost library (I’m using 1_42_0) and extract
  • Download PowerDNS source distribution (I’m using 2.9.22) and extract
  • Compile and install:
$ CXXFLAGS="-I/Users/lsimons/Downloads/boost_1_42_0 -DDARWIN" ./configure \
    --with-mysql=/usr/local/mysql-5.1.31-osx10.5-x86 \
    --with-mysql-includes=/usr/local/mysql-5.1.31-osx10.5-x86/include \
    --without-pgsql \
    --without-sqlite \
    --without-sqlite3 \
    --prefix=/usr/local/pdns-2.9.22
$ make
$ sudo make install
$ cd /usr/local/pdns-2.9.22/etc
$ sudo cp pdns.conf-dist pdns.conf
$ vi pdns.conf
# look for the line #launch, just below add into pdns.conf:
#   launch=gmysql
#   gmysql-host=127.0.0.1
#   gmysql-user=root
#   gmysql-dbname=pdnstest
$ cd ../bin
$ sudo cp /Users/lsimons/Downloads/pdns-2.9.22/pdns/pdns .
$ sudo cp /Users/lsimons/Downloads/pdns-2.9.22/pdns/precursor .

Set up mysql database

Create pdns.sql:

CREATE TABLE domains (
    id              INT UNSIGNED NOT NULL PRIMARY KEY auto_increment,
    name            VARCHAR(255) NOT NULL,
    master          VARCHAR(128) DEFAULT NULL,
    last_check      INT DEFAULT NULL,
    type            VARCHAR(6) NOT NULL,
    notified_serial INT DEFAULT NULL, 
    account         VARCHAR(40) DEFAULT NULL,

    UNIQUE INDEX name_index (name)
) ENGINE=InnoDB;

CREATE TABLE records (
    id              INT UNSIGNED NOT NULL PRIMARY KEY auto_increment,
    domain_id       INT DEFAULT NULL,
    name            VARCHAR(255) DEFAULT NULL,
    type            VARCHAR(6) DEFAULT NULL,
    content         VARCHAR(255) DEFAULT NULL,
    ttl             INT DEFAULT NULL,
    prio            INT DEFAULT NULL,
    change_date     INT DEFAULT NULL,

    INDEX rec_name_index (name),
    INDEX nametype_index (name, type),
    INDEX domain_id (domain_id)
) ENGINE=InnoDB;

create table supermasters (
    ip              VARCHAR(25) NOT NULL, 
    nameserver      VARCHAR(255) NOT NULL, 
    account         VARCHAR(40) DEFAULT NULL
) ENGINE=InnoDB;

GRANT SELECT ON supermasters TO pdns;
GRANT ALL ON domains TO pdns;
GRANT ALL ON domains TO pdns@localhost;
GRANT ALL ON records TO pdns;
GRANT ALL ON records TO pdns@localhost;

Create pdns_sample_data.sql:

INSERT INTO domains (name, type) values ('test.com', 'NATIVE');
INSERT INTO records (domain_id, name, content, type,ttl,prio) 
    VALUES (1,'test.com','localhost ahu@ds9a.nl 1','SOA',86400,NULL);
INSERT INTO records (domain_id, name, content, type,ttl,prio)
    VALUES (1,'test.com','dns-us1.powerdns.net','NS',86400,NULL);
INSERT INTO records (domain_id, name, content, type,ttl,prio)
    VALUES (1,'test.com','dns-eu1.powerdns.net','NS',86400,NULL);
INSERT INTO records (domain_id, name, content, type,ttl,prio)
    VALUES (1,'www.test.com','199.198.197.196','A',120,NULL);
INSERT INTO records (domain_id, name, content, type,ttl,prio)
    VALUES (1,'mail.test.com','195.194.193.192','A',120,NULL);
INSERT INTO records (domain_id, name, content, type,ttl,prio)
    VALUES (1,'localhost.test.com','127.0.0.1','A',120,NULL);
INSERT INTO records (domain_id, name, content, type,ttl,prio)
    VALUES (1,'test.com','mail.test.com','MX',120,25);

Populate mysql database:

$ echo "CREATE DATABASE pdnstest" | mysql -uroot -e
$ mysql -uroot < pdns.sql
$ mysql -uroot < pdns_sample_data.sql

Run pdns

$ cd /usr/local/pdns-2.9.22
$ sudo bin/pdns start

Test

$ dig www.test.com @127.0.0.1
...
www.test.com.		120	IN	A	199.198.197.196
...

Using long-lived stable branches

For the last couple of years I’ve been using subversion on all the commercial software projects I’ve done. At joost and after that at the BBC we’ve usually used long-lived stable branches for most of the codebases. Since I cannot find a good explanation of the pattern online I thought I’d write up the basics.

Working on trunk

Imagine a brand new software project. There’s two developers: Bob and Fred. They create a new project with a new trunk and happily code away for a while:

Example flow diagram of two developers committing to trunk

Stable branch for release

Flow diagram of two developers creating a stable branch to cut releases

At some point (after r7, in fact) the project is ready to start getting some QA, and it’s Bob’s job to cut a first release and get it to the QA team. Bob creates the new stable branch (svn cp -r7 ../trunk ../branches/stable, resulting in r8). Then he fixes one last thing (r9), which he merges to stable (using svnmerge, r10). (Not paying much attention to the release work, Fred has continued working and fixed a bug in r11.) Bob then makes a tag of the stable branch (svn cp branches/stable tags/1.0.0, r12) to create the first release.
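
Spelled out as commands, Bob’s release work looks roughly like this (a sketch; the repository URL is made up, and svnmerge here is the svnmerge.py helper script):

# create the stable branch from trunk as it was at r7
svn cp -r7 -m "create stable branch" \
    http://svn.example.com/repo/trunk \
    http://svn.example.com/repo/branches/stable

# set up svnmerge tracking in a working copy of the stable branch
svn co http://svn.example.com/repo/branches/stable stable
cd stable
svnmerge init http://svn.example.com/repo/trunk
svn commit -F svnmerge-commit-message.txt

# merge the last-minute fix from trunk to stable
svnmerge merge -r9
svn commit -F svnmerge-commit-message.txt

# tag the first release
svn cp -m "tag 1.0.0" \
    http://svn.example.com/repo/branches/stable \
    http://svn.example.com/repo/tags/1.0.0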

QA reproduce the bug Fred has already fixed, so Bob merges that change to stable (r14) and tags 1.0.1 (r15). 1.0.1 passes all tests and is eventually deployed to live.

Release branch for maintenance

Flow diagram of creating a release branch for hosting a bug fix

A few weeks later, a problem is found on the live environment. Since it looks like a serious problem, Bob and Fred both drop what they were doing (working on the 1.1 release) and hook up on IRC to troubleshoot. Fred finds the bug and commits the fix to trunk (r52), tells Bob on IRC, and then continues hacking away at 1.1 (r55). Bob merges the fix to stable (r53) and makes the first 1.1 release (1.1.0, r54) so that QA can verify the bug is fixed. It turns out Fred did fix the bug, so Bob creates a new release branch for the 1.0 series (r56), merges the fix to the 1.0 release branch (r57) and tags a new release 1.0.2 (r58). QA run regression tests on 1.0.2 and tests for the production bug. All seems ok so 1.0.2 is rolled to live.
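
The maintenance branch dance follows the same pattern (again a sketch; exactly where you branch the 1.0 series from is a matter of taste, and branching from the last 1.0.x tag is one option):

# create the 1.0 release branch off the last 1.0.x release
svn cp -m "create 1.0 release branch" \
    http://svn.example.com/repo/tags/1.0.1 \
    http://svn.example.com/repo/branches/1.0

# merge the production fix into a working copy of branches/1.0,
# the same way changes are merged from trunk to stable, then tag
svn cp -m "tag 1.0.2" \
    http://svn.example.com/repo/branches/1.0 \
    http://svn.example.com/repo/tags/1.0.2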

Interaction with continuous integration

Flow diagram showing what continuous integration projects use what branch

Every commit on trunk may trigger a trunk build. The trunk build has a stable period of just a few minutes. Every successful trunk build may trigger an integration deploy. The integration deploy has a longer stable period, about an hour or two. It is also frequently triggered manually when an integration deploy failed or deployed broken software.

Ideally the integration deploy takes the artifacts from the latest successful trunk build and deploys those, but due to the way maven projects are frequently set up it may have to rebuild trunk before deploying it.

Every merge to stable may trigger a stable build. The stable build also has a stable period of just a few minutes, but it doesn’t run as frequently as the trunk build, simply because merges are not done as frequently as trunk commits. The test deploy is not automatic – an explicit decision is made to deploy to the test environment, and typically a specific version or svn revision is deployed.

Reflections

Main benefits of this approach

  • Reasonably easy to understand (even for the average java weenie that’s a little scared of merging, or the tester that doesn’t touch version control at all).
  • Controlled release process.
  • Development (on trunk) never stops, so that there is usually no need for feature branches (though you can still use them if you need to) and communication overhead between developers is limited.
  • Subversion commit history tells the story of what actually happened reasonably well.

Why just one stable?

A lot of people seeing this might expect to see a 1.0-STABLE, 1.1-STABLE, and so forth. The BSDs and mozilla do things that way, for example. The reason not to have those comes down to tool support – with a typical svn / maven / hudson / jira toolchain, branching is not quite as cheap as you’d like it to be, especially on large crufty java projects. It’s simpler to work with just one stable branch, and you can often get away with it.

From a communication perspective it’s also just slightly easier this way – rather than talk about “the current stable branch” or “the 1.0 stable branch”, you can just say “the stable branch” (or “merge to stable”) and it is not ever ambiguous.

Why a long-lived stable?

In the example above, Bob and Fred have continued to evolve stable as they worked on the 1.1 release series – for example we can see that Bob merged r46,47,49 to stable. When continuously integrating on trunk, it’s quite common to see a lot of commits to trunk that in retrospect are best grouped together and considered a single logical change set. By identifying and merging those change sets early on, the story of the code evolution on stable gives a neat story of what features were code complete when, and it allows for providing QA with probably reasonably stable code drops early on.

This is usually not quite cherry-picking — it’s more likely melon-picking, where related chunks of code are kept out of stable for a while and then merged as they become stable. The more coarse-grained chunking tends to be rather necessary on “agile” java projects where there can be a lot of refactoring, which tends to make merging hard.

Why not just release from trunk?

The simplest model does not have a stable branch, and it simply cuts 1.0.0 / 1.0.1 / 1.1.0 from trunk. When a maintenance problem presents itself, you then branch from the tag for 1.0.2.

The challenge with this approach is sort-of shown in these examples — Fred’s commit r13 should not make it into 1.0.1. By using a long-lived stable branch Bob can essentially avoid creating the 1.0 maintenance branch. It doesn’t look like there’s a benefit here, but when you consider 1.1, 1.2, 1.3, and so forth, it starts to matter.

The alternative trunk-only approach (telling Fred to hold off committing r13 until 1.0 is in production) is absolutely horrible for what are hopefully obvious reasons, and I will shout at you if you suggest it to me.

For small and/or mature projects I do often revert back to having just a trunk. When you have high quality code that’s evolving in a controlled fashion, with small incremental changes that are released frequently, the need to do maintenance fixes becomes very rare and you can pick up some speed by not having a stable branch.

What about developing on stable?

It’s important to limit commits (rather than merges) that go directly to stable to an absolute minimum. By always committing to trunk first, you ensure that the latest version of the codebase really has all the latest features and bugfixes. Secondly, merging in just one direction greatly simplifies merge management and helps avoid conflicts. That’s relatively important with subversion because its ability to untangle complex merge trees without help is still a bit limited.

But, but, this is all massively inferior to distributed version control!

From an expert coders’ perspective, definitely.

For a team that incorporates people that are not all that used to version control and working with multiple parallel versions of a code base, this is very close to the limit of what can be understood and communicated. Since 80% of the cost of a typical (commercial) software project has nothing to do with coding, that’s a very significant argument. The expert coders just have to suck it up and sacrifice some productivity for the benefit of everyone else.

So the typical stance I end up taking is that those expert coders can use git-svn to get most of what they need, and they assume responsibility for transforming their own many-branches view back to a trunk+stable model for consumption by everyone else. This is quite annoying when you have three expert coders that really want to use git together. I’ve not found a good solution for that scenario; the cost of setting up decent server-side git hosting is quite difficult to justify even when you’re not constrained by audit-ability rules.

But, but this is a lot of work!

Usually when explaining this model to a new group of developers, they realize at some point that someone (like Bob) or some people will have to do the work of merging changes from trunk to stable, and that the tool support for stuff like that is a bit limited. They’ll also need extra hudson builds, and they’ll have to worry a great deal about how on earth to deal with maven’s need to have the version number inside the pom.xml file.

To many teams it just seems easier to avoid all this branching mess altogether, and instead they will just be extra good at their TDD and their agile skills. Surely it isn’t that much of a problem to avoid committing for a few hours and working on your local copy while people are sorting out how to bake a release with the right code in it. Right?

The resolution usually comes from the project managers, release managers, product managers, and testers. In service-oriented architecture setups it can also come from other developers. All those stakeholders quickly realize that all this extra work that the developers don’t really want to do is exactly the work that they do want the developers to do. They can see that if the developers spend some extra effort as they go along to think about what is “stable” and what isn’t, the chance of getting a decent code drop goes up.

Forward references for RESTful resource collections

I could use some help with a design problem for a RESTful API.

The APIs we’re trying to design are for a media production process with MXF and AAF as interchange formats. Data comes out of a database to go into complex long-running processes that slice and dice the data, eventually coming back to merge into the database. That database itself is replicated across half a dozen sites in an eventually consistent pattern, and connected up in various ways to other (enterprise) databases. Because the full complexity of these media formats gets in the way of designing the API basics, I’ve come up with a simpler example. The weirdness of the example comes from it being distilled out of the complex use cases, where it does make (some) sense.

Setting the scene

Imagine a library of digital books. For reasons of storage efficiency (among others), the library has ripped all the books apart and stored the individual chapters. When you are searching through the library or fetching bits of content, you interact with a representation of the books and the chapters (like a virtual index card) that does not include their content.

So books consist of 0 or more chapters, and chapters are part of one or more books. Chapters really can be part of multiple books: this happens because The collected works of William Shakespeare is represented as all the chapters from all of his books stitched together.

Both books and chapters have 0 or more titles (usually one title per language but there are various also known as edge cases).

Browsing through books

Imagine we represent a book as

<book xmlns="http://schemas.example.com/library/v1/" id="urn:uuid:084E014B-784D-41AE-9EF6-01CE202B5EDA" href="/library/book/084E014B-784D-41AE-9EF6-01CE202B5EDA">
  <title xml:lang="en-GB">The Merchant of Venice</title>
  <title xml:lang="nl">De Koopman van Venetië</title>
  <chapters>
    <chapter id="urn:uuid:B24B6A07-7E48-4C61-B10F-FE13CCE7B20E" href="/library/chapter/B24B6A07-7E48-4C61-B10F-FE13CCE7B20E">
      <title xml:lang="en-GB">FIRST ACT</title>
      <title xml:lang="nl">EERSTE BEDRIJF</title>
    </chapter>
  </chapters>
</book>


and a chapter as

<chapter xmlns="http://schemas.example.com/library/v1/" id="urn:uuid:B24B6A07-7E48-4C61-B10F-FE13CCE7B20E" href="/library/chapter/B24B6A07-7E48-4C61-B10F-FE13CCE7B20E">
  <title xml:lang="en-GB">FIRST ACT</title>
  <title xml:lang="nl">EERSTE BEDRIJF</title>
  <book id="urn:uuid:084E014B-784D-41AE-9EF6-01CE202B5EDA" href="/library/book/084E014B-784D-41AE-9EF6-01CE202B5EDA">
    <title xml:lang="en-GB">The Merchant of Venice</title>
    <title xml:lang="nl">De Koopman van Venetië</title>
  </book>
</chapter>

It’s hopefully obvious that you can do a GET /library/{book|chapter}/{uuid} to retrieve these representations.

Changing book metadata

It’s also not difficult to imagine that you can do a PUT to the same URL to update the resource. You just PUT the same kind of document back.
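
In curl terms that round trip looks something like this (hostname made up):

# fetch the current representation
curl -o book.xml \
    http://library.example.com/library/book/084E014B-784D-41AE-9EF6-01CE202B5EDA

# ... edit book.xml ...

# put the edited representation back
curl -X PUT -H 'Content-Type: application/xml' --data-binary @book.xml \
    http://library.example.com/library/book/084E014B-784D-41AE-9EF6-01CE202B5EDA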

What is a bit difficult is what happens when you do that PUT. The logic that I want is that a PUT of a book can be used to change the titles for that book and change which chapters are part of that book. For a PUT of a chapter, that should be used to change the titles for the chapter, but not to add or remove the chapter from a book (the list of chapters is ordered and the chapter doesn’t know where it is in the ordering).

(Again these rules seem pretty artificial in the example but in MXF there’s a variety of complex constraints that dictate in many cases that a new UMID should be created if an object in the model changes in a way that matters)

This sort-of breaks the PUT contract, because no matter how often you GET a book document, change the title of a chapter inside the book, and PUT that changed representation, your change will not be picked up. You have to follow the href, get the representation for the chapter, change the title there, and PUT it back.

This also breaks the common expectation people have with XML documents — if the data is there and you edit it and then you save it, normal things happen.

The problem with minimal representations

It’s easy to minimize the representations in use so this problem goes away. For example,

<chapter xmlns="http://schemas.example.com/library/v1/" id="urn:uuid:B24B6A07-7E48-4C61-B10F-FE13CCE7B20E">
  <title xml:lang="en-GB">FIRST ACT</title>
  <title xml:lang="nl">EERSTE BEDRIJF</title>
  <book href="/library/book/084E014B-784D-41AE-9EF6-01CE202B5EDA" />
</chapter>


It’s clear what you’re dealing with. The PUT does what it is supposed to do, and to learn the book title you just do another GET.

The problem with this approach is that the number of HTTP requests grows much larger if you want to display something in the UI, because the visual representation of a chapter shows the book title. To build snappy UIs that use ajax to communicate with my service, the rich representation that has the title information is much better.

Some options

So what should I do?

Use multiple representations

I could have /library/{book|chapter}/{uuid}/annotated as well as /library/{book|chapter}/{uuid}, with the latter serving the minimal representation and supporting PUT, or if I had smart ajax clients (I don’t) I could use some kind of content negotiation to get to the rich annotated version.
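
Clients would then pick a representation either by URL or, with smarter clients, by Accept header; something like this (the annotated media type is made up):

# minimal representation, the one you can PUT back
curl http://library.example.com/library/chapter/B24B6A07-7E48-4C61-B10F-FE13CCE7B20E

# annotated, read-only representation via a separate URL...
curl http://library.example.com/library/chapter/B24B6A07-7E48-4C61-B10F-FE13CCE7B20E/annotated

# ...or via content negotiation
curl -H 'Accept: application/vnd.example.library.annotated+xml' \
    http://library.example.com/library/chapter/B24B6A07-7E48-4C61-B10F-FE13CCE7B20E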

This is rather a bit of work, and when documents leave the web for some kind of offline processing (the AAF files go into a craft edit suite and come back very different many weeks later, but they will still reference some of my original data) I run the risk that the “wrong” document makes it into that edit suite.

Document the situation

I could stick with my original richly annotated XML and simply document which fields are and aren’t processed when you do a PUT. I’d probably change the PUT to a POST to make it a bit clearer.

Document and enforce the situation

I could strongly validate all documents that are PUT to me to make sure they do not contain any elements (in my namespace) that I do not intend to save, and reject documents that do.

Document the situation inside the XML

I could do something like

<chapter xmlns="http://schemas.example.com/library/v1/" id="urn:uuid:B24B6A07-7E48-4C61-B10F-FE13CCE7B20E" href="/library/chapter/B24B6A07-7E48-4C61-B10F-FE13CCE7B20E">
  <title xml:lang="en-GB">FIRST ACT</title>
  <title xml:lang="nl">EERSTE BEDRIJF</title>
  <referencedBy>
    <!-- please note that referencedBy contents
         cannot be changed through PUT -->
    <book id="urn:uuid:084E014B-784D-41AE-9EF6-01CE202B5EDA" href="/library/book/084E014B-784D-41AE-9EF6-01CE202B5EDA">
      <title xml:lang="en-GB">The Merchant of Venice</title>
      <title xml:lang="nl">De Koopman van Venetië</title>
    </book>
  </referencedBy>
</chapter>


This way it’s hopefully quite obvious to the API consumer what is going to happen when they PUT a document back. It is still rather unclean REST (so should I use POST?), but it avoids me having to design separate representations for browse vs edit.

One disadvantage is that I have to keep more resource state around when parsing or generating the content. That’s not an issue when things are built up in memory, but for large documents and/or for pipeline processing I’ve made life a lot harder. There are other possibilities to alleviate this (like adding an isReference attribute or inlining referencedBy sections throughout the document rather than putting them all at the bottom), but they’re even less pleasing esthetically.

Something else?

Which approach do you think is best? Is there a better one? What would you do?

Right now, since I’m just doing some quick prototyping, I’ve gone for the “document the situation” approach, but I think that eventually I’d either like to somehow highlight the “this is a forward reference for your convenience but don’t edit it” bits of the XML, or go for the multiple representations approach.

Capacity planning for the network

Lesson learned: on our HP blades, with the standard old crappy version of memcached that comes with red hat, when we use them as the backend for PHP’s object cache, we can saturate a 1gbit ethernet connection with CPU usage of about 20-70%:

Zenoss/RRD graph of memcache I/O plateau at 1gbit

No, we did not learn this lesson in a controlled load test, and no, we didn’t know this was going to be our bottleneck. Fortunately, it seems we degraded pretty gracefully, and so as far as we know most of the world didn’t really notice 🙂

Immediate response:

  1. take some frontend boxes out of load balancer to reduce pressure on memcached
  2. repurpose some servers for memcached, add frontend boxes back into pool
  3. tune object cache a little

Some of the follow-up:

  • redo some capacity planning paying more attention to network
  • see if we can get more and/or faster interfaces into the memcached blades
  • test if we can/should make the object caches local to the frontend boxes
  • test if dynamically turning on/off object caches in some places is sensible

I have to say it’s all a bit embarrassing – forgetting about network capacity is a bit of a rookie mistake. In our defense, most people doing scalability probably don’t deal with applications that access over 30 megs of object cache memory to service one request. The shape of our load spikes (when we advertise websites on primetime on BBC One) is probably a little unique, too.
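
The back-of-envelope sum we should have done up front is not exactly difficult, either (roughly, ignoring protocol overhead):

# 1 gbit/s is about 125 MB/s; at roughly 30 MB of object cache traffic
# per request, one interface tops out around 4 requests per second
echo $((125 / 30))   # => 4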

update: I was mistakenly using “APC” in the above to mean “cache” but APC is just the “opcode cache” and is completely disjoint from “object cache”. D’oh!