Capacity planning for the network

Lesson learned: on our HP blades, running the standard old crappy version of memcached that comes with red hat as the backend for PHP’s object cache, we can saturate a 1 Gbit ethernet connection with CPU usage at only about 20-70%:

[Zenoss/RRD graph: memcached network I/O plateauing at 1 Gbit]

No, we did not learn this lesson in a controlled load test, and no, we didn’t know this was going to be our bottleneck. Fortunately, it seems we degraded pretty gracefully, and so as far as we know most of the world didn’t really notice 🙂

Immediate response:

  1. take some frontend boxes out of the load balancer to reduce pressure on memcached
  2. repurpose some servers for memcached, add the frontend boxes back into the pool
  3. tune the object cache a little

Some of the follow-up:

  • redo some capacity planning, paying more attention to the network
  • see if we can get more and/or faster interfaces into the memcached blades
  • test if we can/should make the object caches local to the frontend boxes
  • test if dynamically turning on/off object caches in some places is sensible

I have to say it’s all a bit embarrassing – forgetting about network capacity is a bit of a rookie mistake. In our defense, most people doing scalability probably don’t deal with applications that read over 30 megs of object cache data to service one request. The shape of our load spikes (when we advertise websites on primetime on BBC One) is probably a bit unusual, too.
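
For the capacity planning redo, the back-of-the-envelope arithmetic we skipped is short enough to write down. A sketch using the rough numbers above (1 Gbit interfaces, roughly 30 MB of cache reads per request); it ignores protocol overhead and the CPU side entirely:

```python
# Back-of-the-envelope: how many requests can one gigabit link sustain
# if each request pulls ~30 MB of object cache data over the network?
# (Rough numbers from the incident above; real traffic is far burstier.)

link_gbit = 1.0                                 # NIC capacity in gigabit/s
link_bytes_per_sec = link_gbit * 1e9 / 8        # ~125 MB/s, ignoring protocol overhead

cache_bytes_per_request = 30 * 1024 * 1024      # ~30 MB of object cache reads per request

requests_per_sec = link_bytes_per_sec / cache_bytes_per_request
print("max requests/s per 1 Gbit link: %.1f" % requests_per_sec)   # roughly 4
```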

update: I was mistakenly using “APC” in the above to mean “cache” but APC is just the “opcode cache” and is completely disjoint from “object cache”. D’oh!

A personal history of computers and the internet

I’m pretty sure this is not very interesting to anyone at all, but I wanted to write it down so I can look back at it some years from now and remember.

Early years

In ’88 or ’89, when I was about 5 or 6, my dad bought an Amiga 500. I remember that he taught me how to draw Mandelbrot fractals with it. We would sometimes leave it running for two days, slowly drawing the picture. We had the extra floppy drive, which meant that we could play games that came on two floppy disks. At first my brother and I mostly played a demo version of Pang that had come with a magazine, and then we got Rainbow Islands, which we played for ages and ages. I think we got it for Sinterklaas. We also got Batman eventually.

Then, much later (the next-door neighbours had an IBM PC by then, though we weren’t allowed to play with it), another magazine came with some kind of a programming environment; I can’t remember which one. I figured out how to make it display something like “your computer has been infected with a virus”, and I also figured out how to change the high scores on some of our games. We had that Amiga for a long time.

The next computer to enter the house was an old IBM PC that had been obsoleted at my dad’s work. It had an 80×24 screen with green text on it, ran MS-DOS, and had a hard drive. We broke it the same day! With the Amiga I had learned that if things went wrong you could just turn off the power and turn it on again. With this computer, that was enough to crash the hard drive.

When the Pentium processor came out in ’93 my dad bought us a new computer. It came with DOS and Windows 3.1. I read the entire manual cover to cover and figured out how to do simple things with BASIC in DOS, including messing with a version of Snake that I typed in from a magazine.

When we got CompuServe with WinCIM and a 14k4 modem, my 10-year-old self got properly addicted to the computer. Online time was expensive, so I was allowed only 30 minutes a week at first. When my dad learned of the internet (I think after a business trip to the US) we spent ages together trying to get Winsock and WinCIM to work together. After we called CompuServe support, they sent us a new version of WinCIM with Mosaic and we got on the world wide web. I’m not sure when exactly this was, but the WWW was grey with blue and black, and we used WebCrawler to find stuff, which was really hard.

I remember I didn’t like the world wide web very much, because it wasn’t as easy to use as the CompuServe forums. It all changed when I read about Yahoo! in a magazine; that was probably some time in 1995. I was active on the CompuServe AD&D forum by then, and as hobbyists set up AD&D websites, the web became very interesting.

Teenage years

I learned some HTML and made my own website on Geocities, I think in 1996 or 1997. When we switched ISPs to Demon my website moved there; the Wayback Machine keeps some copies of it. I became an Amazon associate and made commission on AD&D and other roleplaying books. I used the credit I earned to buy roleplaying books and then a JavaScript book. I was learning JavaScript and eventually some PHP, mostly to manage the book catalog.

According to my Amazon history I got my first book credit in May 1998, when I was 15. That’s also roughly the time when I started making my first “company presence” websites for money. I started my first company just before I turned 16 and invested my first earnings in a copy of Flash 4, which I used to build lots of stuff. I never made much money, but it was a bit more than I made from being a paperboy.

In 1999 or 2000 or so I had learned enough PHP and MySQL and Corel PhotoPaint 4 to help out on a major rework of another roleplaying site, AtFantasy, which stayed pretty popular for several years; I think the guy who started it eventually made a living out of running the site.

I started lurking on the PHP-Dev mailing list at some point, where there was this Sam Ruby person talking about server-side java all the time. It was intriguing, so I read a book or two about java, started programming in it, got interested in servers, and one thing led to another. I was voted in as a committer on Apache Avalon in March of 2001 (aged 17, my mum had to sign the CLA).

I switched my desktop to (red hat) linux around this time.

Going professional

In June 2001, just after finishing high school, I got a job as a web programmer for Planet Internet, which I quit after 3 months to go backpacking in Australia. It was my first real job, and I learned a lot in those 3 months (how to take down an Oracle cluster, how to royally piss off the sysadmins by asserting they had misconfigured the reverse proxy, how to write Tcl, how to do cross-browser HTML and JavaScript, how to use flasm, just how many support calls you cause if you accidentally publish the wrong version of the help pages).

After returning from Australia I got a job at Multi-M/IA, where I was hired to do some PHP CMS work. I then worked on an e-mail-based CMS in java (for UNAIDS field doctors who had about 20 minutes of GPRS connectivity per day), a bulk mail tool using JavaMail, and several filesharing/intranet projects. This is where I first learned about server administration (we had managed hosting with Rackspace), which probably also triggered my interest in build engineering. When I started studying physics I eventually quit that job.

I partially switched to Mac when I bought an iBook G4, though I kept using (ubuntu) linux on my desktop for a long time.

The next key turning point was when I got a phone call from Dirk-Willem, who needed someone to help out on some project infrastructure and a build system for a major web service project for the Dutch government. I found I enjoyed that way more than studying, so I quit uni.

I got a beefy PowerMac and a 30″ Cinema Display and have been mac-only ever since.

I worked as a freelancer for 2 years, most of which was with asemantics, where I learned about low-level engineering and business and large-scale commercial software projects.

Asemantics was a subcontractor for Joost, which I joined in October 2006 and then left in May this year. Like everyone else there I worked long hours and my open source contributions dwindled, but working with many very smart and talented people meant I continued to learn quite a lot. In particular I ended up learning a lot more about data modeling and databases, eventually leading a migration effort away from an RDF database to a relational model.

At the moment I’m back to contracting, working at the BBC, where I’m part of a platform engineering team responsible for the BBC’s shiny new web platform. This is the first time I’ve been fully embedded in a really big (and old) organization. That means learning about policies and processes and organization structures, and then sometimes trying to change them. When a lot of engineering choices are dictated by non-technical constraints (“we must be able to hire people 10 years from now that can maintain our software”), your perspective does change. I think.

Web application platform technology choices

The hardest bit in the web application platform challenge is making reasonable choices. Here’s a stab at some of them…

Hosting models

I see these basic choices:

  1. LAMP virtual hosting. If you can build everything you need with mysql+php and you have few enough users that you need only one database server, this is by far the easiest and cheapest option.
  2. Application hosting. Code on GitHub, project management with Basecamp or hosted JIRA, build on AppEngine, Heroku or force.com. You don’t have to run your own infrastructure, but you’re limited in what you can build. It also comes with a large chance of lock-in.
  3. Managed hosting. Rent (virtual) servers with pre-installed operating systems and managed networking. Expensive for large deployments, but you don’t need the full set of web operations skills in-house and you have a lot of flexibility (famously, Twitter do this).
  4. Dedicated hosting. Buy or rent servers, rent rack space or build your own data center. You need network engineers and people who can handle hardware. Usually the only cost-effective option beyond a certain size.

Given our stated requirements, we are really only talking about option #4, but I wanted to mention the alternatives because they will make sense for a lot of people. Oh, and I think all the other options are these days called cloud computing 🙂

Hardware platform

I’m not really a hardware guy; normally I leave this kind of stuff to others. Anyone have any good hardware evaluation guides? Some things I do know:

  • Get at least two of everything.
  • Get quality switches. Many of the worst outages have something to do with blown-up switches, and since you usually have only a few, losing one during a traffic spike is uncool.
  • Get beefy database boxes. Scaling databases out is hard, but they scale up nicely without wasting resources.
  • Get beefy (hardware) load balancers. Going to more than 2 load balancers is complicated, and while the load balancers have spare capacity they can help with SSL, caching, etc.
  • Get beefy boxes to run your monitoring systems (remember, two of everything). In my experience most monitoring systems suffer from pretty crappy architectures, and so are real resource hogs.
  • Get hardware RAID (RAID 5 seems common) with a battery-backed write cache, for all storage systems. That is, unless you have some other redundancy architecture and you don’t need RAID for redundancy.
  • Don’t forget about hardware for backups. Do you need tape?

Other thoughts:

  • Appliances. I really like the idea. Things like the Schooner appliances for mysql and memcached, or the Kickfire appliance for mysql analytics. I have no firsthand experience with them (yet) though. I’m guessing oracle+sun is going to be big in this space.
  • SSD. It is obviously the future, but right now they seem to come with limited warranties, and they’re still expensive enough that you should only use them for data that will actually get hot.

Operating system

Choice #1: unix-ish or windows or both. The Microsoft Web Platform actually looks pretty impressive to me these days but I don’t know much about it. So I’ll go for unix-ish.

Choice #2: ubuntu or red hat or freebsd or opensolaris.

I think Ubuntu is currently the best of the debian-based linuxes. I somewhat prefer Ubuntu to red hat, primarily because I really don’t like RPM. Unfortunately, red hat comes with better training and certification programs, better hardware vendor support, and better available support options.

FreeBSD and solaris have a whole bunch of advantages over linux (zfs, zones/jails, smf, the network stack, many-core support, …) that would make linux seem like a useless toy, if it weren’t for the fact that linux sees so much more use. This is important: linux has the largest array of pre-packaged software that works on it out of the box, linux runs on more hardware (like laptops…), and many more developers are used to linux.

One approach would be solaris for database (ZFS) and media (ZFS!) hosting, and linux for application hosting. The cost of that, of course, would be the complexity of having to manage two platforms. The question then is whether the gain in manageability offsets the price paid in complexity.

And so, red hat gains another (reluctant) customer.

Database

As much sympathy as I have for the NoSQL movement, the relational database is not dead, and it sure as hell is easier to manage. When dealing with a wide variety of applications by a wide variety of developers, and a lot of legacy software, I think a SQL database is still the default model to go with. There’s a large range of options there.

Choice #1: clustered or sharded. At some point some application will have more data than fits on one server, and it will have to be split. Either you use a fancy database that supports clustering (like Oracle or SQL Server), or you use some fancy clustering middleware (like continuent), or you teach your application to split up the data (using horizontal partitioning or sharding) and you use a more no-frills open source database (mysql or postgres).
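
For the sharding route, the essence of “teach your application to split up the data” is a deterministic mapping from a partition key to a database. A minimal sketch of that mapping (the shard list, hostnames and the choice of user id as partition key are made-up examples; a real setup also has to deal with resharding and cross-shard queries):

```python
import zlib

# Hypothetical static shard map; in practice this lives in configuration
# management so that every application server agrees on it.
SHARDS = [
    {"host": "db1.example.com", "db": "app_shard_0"},
    {"host": "db2.example.com", "db": "app_shard_1"},
    {"host": "db3.example.com", "db": "app_shard_2"},
]

def shard_for(user_id):
    """Map a partition key (here: a user id) to one shard, deterministically."""
    # crc32 rather than hash() so the mapping is stable across processes and languages
    bucket = zlib.crc32(str(user_id).encode("utf-8")) & 0xffffffff
    return SHARDS[bucket % len(SHARDS)]

# Every query about a user is routed to shard_for(user_id); anything that
# spans users (reporting, search) needs its own plan.
print(shard_for(12345))
```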

I suspect that the additional cost of operating an Oracle cluster may very well be worth paying – besides not having to do application-level clustering, the excellent management and analysis tools are worth it. I wish someone would build a model/spreadsheet to prove it. Anyone?

However, it is much easier to find developers skilled with open source databases, and it is much easier for developers to run a local copy of their database for development. Again there’s a tradeoff.

The choice between mysql and postgres has a similar tradeoff. Postgres has a much more complete feature set, but mysql is slightly easier to get started with and has significantly easier-to-use replication features.

And so, mysql gains another (reluctant) customer.

With that choice made, I think it’s important to invest early on in providing some higher-level APIs, so that while the storage engine might be InnoDB and the access to that storage engine might be MySQL, many applications are coded against a more constrained API. Things like Amazon’s S3, SimpleDB and the Google AppEngine data store provide good examples of constrained APIs that are worth emulating.
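
To illustrate what “a more constrained API” could look like, here is a sketch of a tiny get/put/delete wrapper that applications would code against instead of writing SQL. The table layout, the namespace idea and the MySQLdb usage are illustrative assumptions, not an existing library:

```python
import MySQLdb  # assumes the MySQL-python (MySQLdb) driver

class SimpleStore(object):
    """A deliberately constrained API: get/put/delete by key within a namespace.
    Applications never see SQL, so the storage behind this can change later."""

    def __init__(self, **conn_args):
        self.conn = MySQLdb.connect(**conn_args)

    def put(self, namespace, key, value):
        cur = self.conn.cursor()
        # Hypothetical table: simple_store(namespace, k, v), primary key (namespace, k)
        cur.execute("REPLACE INTO simple_store (namespace, k, v) VALUES (%s, %s, %s)",
                    (namespace, key, value))
        self.conn.commit()

    def get(self, namespace, key):
        cur = self.conn.cursor()
        cur.execute("SELECT v FROM simple_store WHERE namespace = %s AND k = %s",
                    (namespace, key))
        row = cur.fetchone()
        return row[0] if row else None

    def delete(self, namespace, key):
        cur = self.conn.cursor()
        cur.execute("DELETE FROM simple_store WHERE namespace = %s AND k = %s",
                    (namespace, key))
        self.conn.commit()
```

Swapping InnoDB/MySQL for something else later then means reimplementing one class, not touching every application.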

HTTP architecture

Apache HTTPD. Easiest choice so far. Its swiss-army-knife characteristic is quite important. It’s what everyone knows. Things like nginx are pretty cool and can be used as the main web server, but I suspect most people who switch to them should’ve spent some time tuning httpd instead. Since I know how to do that… I’ll stick with what I know.

As easy as that choice is, the choice of what to put between HTTPD and the web seems harder than ever. The basic sanctioned architecture these days seems to use BGP load sharing to have the switches direct traffic at some fancy layer 7 load balancers, where you terminate SSL and KeepAlive. Those fancy load balancers may then point at a layer of caching reverse proxies, which in turn point at the (httpd) app servers.

I’m going to assume we can afford a pair of F5 Big-IPs per datacenter. Since they can do caching, too, we might avoid building that reverse proxy layer until we need it (at which point we can evaluate squid, varnish, HAProxy, nginx and perlbal, with that evaluation showing we should go with Varnish 🙂 ).

Application architecture

Memcache is nearly everywhere, obviously. Or is it? If you’re starting mostly from scratch and most stuff can be AJAX, http caching in front of the frontends (see above) might be nearly enough.
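
Where memcache does earn its place, the application-side pattern is almost always the same cache-aside dance. A minimal sketch using the python-memcached client (the server addresses, key scheme and TTL are placeholders):

```python
import memcache  # python-memcached client

mc = memcache.Client(["10.0.0.1:11211", "10.0.0.2:11211"])

def get_user_profile(user_id, loader, ttl=300):
    """Cache-aside: try memcached first, fall back to the real loader, then cache."""
    key = "user_profile:%d" % user_id
    value = mc.get(key)
    if value is None:
        value = loader(user_id)        # e.g. a database query
        mc.set(key, value, time=ttl)   # cache it for five minutes
    return value
```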

Assuming a 3-tier (web, middleware, db) system, reasonable choices for the front-end layer might include PHP, WSGI+Django, and mod_perl. I still can’t see myself rolling out Ruby on Rails on a large scale. Reasonable middleware choices might include java servlets, unix daemons written in C/C++, and more mod_perl. I’d say Twisted would be an unreasonable but feasible choice 🙂

Communication between the layers could be REST/HTTP (probably going through the reverse proxy caches) but I’d like to try and make use of thrift. Latency is a bitch, and HTTP doesn’t help.
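
To show what the thrift option looks like from a frontend’s point of view, here is a sketch of a single backend call. `ProfileService` and its `getProfile` method are hypothetical stand-ins for whatever a real .thrift interface definition would generate:

```python
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol

# Hypothetical generated code: thrift would produce this module from profile.thrift
from profileservice import ProfileService

def fetch_profile(user_id, host="backend1.example.com", port=9090):
    socket = TSocket.TSocket(host, port)
    transport = TTransport.TBufferedTransport(socket)   # buffering cuts down on syscalls
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = ProfileService.Client(protocol)
    transport.open()
    try:
        # One compact binary round trip instead of a full HTTP request/response
        return client.getProfile(user_id)
    finally:
        transport.close()
```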

I’m not sure whether a 2-tier system (i.e. PHP straight to the database, or perhaps PHP linked against C/C++ modules that talk to the database) makes sense these days. I think the layered architecture is usually worth it, mostly for organizational reasons: you can have specialized backend teams and frontend teams.

If it were me personally doing the development, I’m pretty sure I would go 3-tier, with (mostly) mod_wsgi/python frontends using (mostly) thrift to connect to (mostly) daemonized python backends (to be re-written in faster/more concurrent languages as usage patterns dictate) that connect to a farm of (mostly) mysql databases using raw _mysql, with just about all caching in front of the frontend layer. I’m not so sure it’s easy to teach a large community of people that pattern; it’d be interesting to try 🙂

As for the more boring choice… PHP frontends with java and/or C/C++ backends and REST in the middle seems easier to teach and evangelize, and it’s also easier to patch up bad apps by sticking custom caching stuff (and, shudder, mod_rewrite) in the middle.

Messaging

If there’s anything obvious in today’s web architecture it is that deferred processing is absolutely key to low-latency user experiences.

The obvious way to do asynchronous work is by pushing jobs on queues. One hard choice at the moment is what messaging stack to use. Obvious contenders include:

  • WebSphere MQ (the expensive incumbent)
  • ActiveMQ (the best-known open source option, with some stability issues)
  • OpenAMQ (AMQP, backed by an interesting startup)
  • 0MQ (AMQP, bought up by the same startup)
  • RabbitMQ (AMQP by another startup; erlang, yuck)
  • MRG (or Qpid, AMQP by red hat, which is not exactly a startup)

A less obvious way to do asynchronous work is through a job architecture such as gearman, app engine cron or quartz, where the queue is not explicit but rather exists as a “pending work” set.

I’m not sure what I would pick right now. I’d probably still play it safe and use AMQ with JMS and/or STOMP with JMS semantics. Two months from now I might choose differently.
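
Whichever broker wins, the application-side shape stays the same: the request handler pushes a small job description onto a queue and returns immediately, and a pool of workers does the slow part later. A toy in-process sketch of that shape (the standard library queue stands in for a real broker reached over STOMP or AMQP; the job type is made up):

```python
import threading
import Queue  # "queue" in Python 3; here it stands in for a real broker

jobs = Queue.Queue()

def handle_request(user_id):
    """The latency-sensitive path: enqueue and return immediately."""
    jobs.put({"type": "send_welcome_email", "user_id": user_id})
    return "ok"   # the user is not kept waiting for the email to go out

def worker():
    """The deferred path: runs in the background, off the request cycle."""
    while True:
        job = jobs.get()
        try:
            print("processing %r" % job)   # real work (mail, resizing, indexing) goes here
        finally:
            jobs.task_done()

for _ in range(4):   # a small worker pool
    t = threading.Thread(target=worker)
    t.setDaemon(True)
    t.start()

handle_request(42)
jobs.join()   # in a real deployment the broker, not the web process, owns the queue
```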

Bootstrapping a web application platform

Some rules for building a web application platform:

  1. Use a configuration management system.
  2. Put all configuration in version control.
  3. Have backups of everything important. Verify all backups (a sketch of one way to do that follows this list).
  4. Have documentation of configuration and process.
  5. Reliability and availability of the configuration management system are critical.
  6. Have monitoring of everything important. Verify all monitoring.
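
On rule 3, “verify” means actually restoring a backup and comparing it against what you expect, not just checking that the backup job exited cleanly. A minimal sketch of the idea, assuming the backup is a plain directory tree restored to a scratch location (the paths are made up; for data that changes constantly you would compare against checksums recorded at backup time instead):

```python
import hashlib
import os

def checksum_tree(root):
    """Walk a directory tree and return {relative_path: md5} for every file."""
    sums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.md5()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1024 * 1024), b""):
                    h.update(chunk)
            sums[os.path.relpath(path, root)] = h.hexdigest()
    return sums

def verify_backup(source_root, restored_root):
    """Compare a restored copy of a backup against the data it is supposed to match."""
    source, restored = checksum_tree(source_root), checksum_tree(restored_root)
    missing = sorted(set(source) - set(restored))
    changed = sorted(p for p in source if p in restored and source[p] != restored[p])
    return missing, changed

# Hypothetical paths: restore last night's backup to /tmp/restore-test first
missing, changed = verify_backup("/srv/repo", "/tmp/restore-test/srv/repo")
print("missing: %d, mismatched: %d" % (len(missing), len(changed)))
```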

Unfortunately it is quite hard to get to this state when you start from an existing system that violates those rules.

I’ve seen a few organizations deploy a first version of a platform “by hand” and then spend painful months re-imaging all their servers based on some chosen configuration management tool, causing all sorts of destabilization and pain.

Similarly, time and time again I also see people fail to consider backups from the start. Sorting out backups properly only after being burnt by data loss is a very costly process, too.

It is well worth avoiding such a mess by making sure you are in this clean state from the start, and then never, ever leaving it.

Before you begin

Some of the choices that you should have made:

  1. hardware and/or hosting platform
  2. operating system
  3. version control system
  4. backup system
  5. configuration management system
  6. monitoring system
  7. availability / redundancy requirements

That last item might be unexpected, but you need some idea of how much availability you want right away. The reason is that when all your servers are managed through centralized configuration management backed up by a version control system, those components become critically important. Imagine a scenario where a newly found security hole in one of your software components is being exploited for a DDoS attack. To deal with this, you need to upgrade the software component, and to do that, you need your configuration management and version control to be functioning.

Because there are a lot of scenarios that you haven’t planned for, and because extensive downtime is expensive, it is probably worth significantly over-provisioning your configuration management, version control, and monitoring systems.

Bootstrapping configuration management

  1. Install a repo server
  2. Install version control on the repo server
  3. Install configuration management on the repo server
  4. Put the configuration of the repo server into version control
  5. Install a backup server
  6. Install configuration management on the backup server
  7. Put the configuration of the backup server into version control
  8. Configure backups of the repo server
  9. Install a fallback repo server using the versioned configuration
  10. Restore a backup of the repo server onto the fallback repo server
  11. Fail over to the fallback repo server
  12. Reinstall the repo server using the versioned configuration
  13. Restore a backup of the fallback repo server onto the repo server
  14. Fail over to the repo server
  15. Install a secondary backup server using the versioned configuration
  16. Restore a backup from the secondary backup server onto the fallback repo server
  17. Fail over to the fallback repo server
  18. Fail over to the repo server
  19. Reinstall the backup server using the versioned configuration
  20. Restore a backup from the backup server onto the fallback repo server
  21. Fail over to the fallback repo server
  22. Fail over to the repo server
  23. Reboot every server in turn
  24. Shut down all servers then start up all the servers
  25. Document everything you’ve done so far in version control:
    • Document the precise bootstrap procedure
    • Document the precise failover procedure
    • Document the precise backup restoration procedure
    • Document the precise procedure to add a new server
    • Document the precise procedure to reboot a server
  26. Test all the documented procedures (by executing them)

Bootstrapping essential services

With basic configuration management in place, we can now work on setting up things like DNS (zone files obviously go in version control), NTP (which should run on every machine), SSH (make sure you’re secure enough) and logrotate (make sure you don’t fill up your disks).

Which services should be considered essential varies a bit. For example, you might use an external DNS provider if all you have are public IPs. Similarly, if you want to use x509 client certificates to talk to SVN, then this is probably a good time to set up your certificate authority. Obviously, you should make sure to back up your certificate authority root certificate and the server certs for your essential servers somewhere really safe, like on a couple of dedicated, labelled, high-quality USB keys.

Again, make sure you have enough instances of all these services to satisfy your availability guarantees. Two is a good minimum.

Bootstrapping package management

Up until now you’ve probably been downloading and installing packages from somewhere external. It is a good idea to set up a mirror (and a backup mirror) of the package repositories you use, so that you’re not dependent on the 3rd party’s server availability. Make sure all your own servers then use these package mirrors.

This is also a good point to consider how you will handle installation of your own software. If you’re going to be packaging it up, also set up your own package management servers at this point.

Bootstrapping security

Security is a rather big topic and I’ll ignore most of it here, but there are a few places where it connects specifically to bootstrapping. Your configuration management tools should help ensure you can roll out software updates relatively easily. However, you still have to put a process in place to ensure that you actually roll out required security updates in a timely fashion.

For example, you should subscribe to security announcements from the vendors of the software that you use. It can be a good idea to keep your own log of security updates in version control. Keep a timestamp of when the update was received and what action was taken (where a common action is “no action”). This allows you to audit your vulnerability windows later on.

Bootstrapping a base build

If you are using managed hosting or a cloud service, presumably you’re starting off from a base install of your chosen operating system, and so you don’t have to build your own servers. But if you’ve got your own hardware, you need a good process in place for building or rebuilding a server; it needs to be documented and repeatable, and hopefully it’s quick, too.

If you’re using your own hardware, remember that some hardware can be faulty, so whenever you add new hardware you should run burn-in tests, which can usually double as a basic hardware benchmark.

Setting up initial monitoring

A monitoring setup is probably the first complicated piece of infrastructure to set up. It’s pretty reasonable to assume that most of the currently installed systems behave reasonably well, and that individual performance metrics on them are not very important, so you don’t need that much monitoring yet. Perhaps something like this (a sketch of two of the threshold checks follows the list):

  • alerts:
    • server unreachable
    • service unreachable
    • backup failed
    • configuration management operation failed
    • partition usage above threshold (80%)
    • swap usage above threshold (1%)
    • 15 minute load average above threshold (10.0)
  • Per-server graphs at 5 minute granularity:
    • load average
    • CPU usage
    • Memory usage
    • Network usage and/or I/O
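
Most of those alert checks boil down to a few lines each. Here is a sketch of two of the threshold checks, using the numbers above (in practice the thresholds live in the monitoring system’s configuration and the checks run through its agent, not as a standalone script):

```python
import os

def check_load(threshold=10.0):
    """Alert if the 15 minute load average is above the threshold."""
    load15 = os.getloadavg()[2]
    return ("WARN" if load15 > threshold else "OK", "load15=%.2f" % load15)

def check_partition(path="/", threshold=0.80):
    """Alert if a partition is more than 80% full."""
    st = os.statvfs(path)
    used = 1.0 - float(st.f_bavail) / st.f_blocks
    return ("WARN" if used > threshold else "OK", "%s %.0f%% used" % (path, used * 100))

for status, detail in (check_load(), check_partition("/")):
    print("%s %s" % (status, detail))
```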

With monitoring it is particularly important to make sure that alerts are working, so make sure to test this. It’s also important that newly added servers get monitored, so automate (and test) that, or make sure the procedure documentation explains how to add a server to the monitoring.

Oh, and once you build an actual web application, it is a really good idea to get 3rd party monitoring of your web application(s), so that you are not dependent on your own monitoring to know when they go down. You might want to use one of those 3rd party services to monitor your monitoring tool.

A dummy service

You should now create a very simple service as an integration test of all the basic infrastructure. I like setting up a vanilla apache httpd with a trivial CGI that writes data you give it to a file.
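
For reference, the whole “service” can be as small as this; a sketch of such a CGI (the form field name and output path are arbitrary choices):

```python
#!/usr/bin/env python
# Trivial CGI: append whatever is POSTed in the "data" field to a file.
import cgi
import time

OUTFILE = "/var/tmp/dummy-service.log"   # arbitrary path; must be writable by the httpd user

form = cgi.FieldStorage()
data = form.getfirst("data", "")

with open(OUTFILE, "a") as f:
    f.write("%s %s\n" % (time.strftime("%Y-%m-%dT%H:%M:%S"), data))

print("Content-Type: text/plain")
print("")
print("stored %d bytes" % len(data))
```

The checks below then exercise packaging, alerting, log rotation and backups against something real but harmless.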

  • Make sure to test that you can install, upgrade, downgrade, uninstall, reinstall the service
  • Make sure that you get alerts when it goes down
  • Make sure 3rd party monitoring and your monitoring correlate
  • Make sure log rotation of the apache logs is working
  • Make sure backups of the written data are working and that you can restore from them

Bootstrapping complete!

That’s it. Keep following the same discipline exemplified in the description above. As a consequence you will know that things are safe and secure, because you will always be following those base rules.

You can now either set up a real service, or start setting up the development infrastructure, such as issue tracker, wiki, and continuous integration servers. I don’t think those are strictly necessary at first – in my experience it is much easier to change how you do issue tracking or software builds than it is to change how you do deployments.

Is this really necessary?

Well, maybe not, but I tend to think it’s really a worthwhile investment. Consider what you now have:

  • Peace of mind because you have full backups
  • Peace of mind because you have full monitoring
  • Peace of mind because you will get an alert when something important breaks
  • Peace of mind because you can quickly fail over if anything breaks
  • Peace of mind that you could rebuild everything in the case of disaster

Also consider that if you already know what you are doing, the above is actually less work than you may think. I haven’t tried in quite a few years, but I imagine it’d take me about a week to get it all set up from scratch.

Cloud computing to the rescue?

If you use an application-as-a-service platform like Google AppEngine you don’t have to do any of this – that is, as long as you trust your cloud provider to have done a good enough job that you’re confident you don’t need a backup “just in case”!

If you use an elastic computing platform like EC2, you really still have to do most of this work. More importantly, because you don’t get very good uptime guarantees on EC2 instances themselves, you still have to find some dedicated managed hosting for your configuration management systems.

That said, it is pretty easy to use EC2 to do most of the bootstrapping work, and EC2 is also a pretty cheap way to keep your backup servers, which you can mostly keep turned off until you actually need them, especially if a few minutes of unscheduled downtime is acceptable.

The web application platform challenge

Your mission, should you choose to accept it: design, build and administer a platform for hosting a complex multi-purpose content-centric website.

Let’s imagine some high-level requirements for this platform:

  • It has to support the kind of traffic that would put it in the worldwide top 20.
  • It has to support static and dynamic web content, including simple and complex web applications.
  • It has to support audio and video and other such multimedia.
  • It has to support many devices and many languages.
  • It has to be evolvable, running for the next 10 years without major/expensive rework.
  • It has to host the sites and applications produced by a large community: dozens of teams and hundreds of developers with content produced by thousands of professionals (and millions of users).
  • It has to provide a full production pipeline to those teams.
  • It has to be efficient.
  • It has to be budgeted, with all significant costs financially justifiable.

When designing this platform, you’re expected to evaluate all options and then make clear choices (on hardware, software, architecture, programming languages, release process, etc). Then, make reasonable attempts to standardize those choices within the community and provide training, though you should also retain the flexibility to respond to changes in requirements and provide migration paths for legacy sites and applications.

How would you tackle this challenge? Where would you start? What choices would you have to make first, and what choices would you defer until later? What would you buy, what would you build? What kind of team would you need?

I intend to start writing some blog posts giving possible answers to those questions. Be aware though: usually when I have such intentions I’m not so good at following through. For example, I never published part 2 of serious PHP.