Cloud computing – comparing Google AppEngine and Amazon Web Services

Cloud computing seems to be the hot new topic these days! AWS has been around for a while, and Google has now added billing features to AppEngine, which was the last thing I was waiting for. So let’s try to compare the two.

Pricing

Numbers are in US dollars. Source for Google numbers, Amazon numbers here and here.

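|                               | Google AppEngine | Amazon S3 50/10* | Amazon EC2 Small** | Amazon S3 5000/1000*** | Amazon EC2 Huge**** |
|-------------------------------|------------------|------------------|--------------------|------------------------|---------------------|
| per CPU hour                  | 0.10             | 0                | 0                  | 0                      | 0                   |
| per hour per running instance | 0                | 0                | 0.10               | 0                      | 0.8                 |
| per GB bandwidth incoming     | 0.10             | 0.10             | 0.10               | 0.10                   | 0.10                |
| per GB bandwidth outgoing     | 0.12             | 0.17             | 0.17               | 0.10                   | 0.10                |
| per GB storage per month      | 0.15             | 0.15             | 0                  | 0.12                   | 0                   |
| per request incoming          | 0                | 0.00001          | 0                  | 0.00001                | 0                   |
| per request outgoing          | 0                | 0.000001         | 0                  | 0.000001               | 0                   |
| per email outgoing            | 0.0001           | 0                | 0                  | 0                      |                     |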
* up to 50TB storage and 10TB outbound traffic; beyond that it gets cheaper
** “small” instances, 160GB storage, up to 10TB outbound traffic; beyond that it gets cheaper
*** far over 500TB storage and far over 150TB outbound traffic
**** “extra large” instances, 8 compute units, 1.6TB storage, over 150TB outbound traffic
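To make these numbers a bit more tangible, here is a rough sketch that prices one month of a made-up workload on AppEngine and on always-on EC2 small instances with files on S3 (50/10 tier). The rates are copied from the table above; everything else is an assumption for illustration: the function names, the 30-day month, ignoring free quotas, charging bandwidth once at the EC2 rate, and billing every S3 request at the "per request incoming" rate.

```python
# Rates in US dollars, copied from the pricing table above.
APPENGINE = {"cpu_hour": 0.10, "gb_in": 0.10, "gb_out": 0.12, "gb_month": 0.15}
EC2_SMALL = {"instance_hour": 0.10, "gb_in": 0.10, "gb_out": 0.17}
S3_50_10 = {"gb_month": 0.15, "request": 0.00001}

HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month


def appengine_cost(cpu_hours, gb_in, gb_out, gb_stored):
    """Monthly AppEngine bill: you pay for CPU hours actually used."""
    r = APPENGINE
    return (cpu_hours * r["cpu_hour"]
            + gb_in * r["gb_in"]
            + gb_out * r["gb_out"]
            + gb_stored * r["gb_month"])


def aws_cost(instances, gb_in, gb_out, gb_stored, s3_requests):
    """Monthly AWS bill: instances are paid for every hour they run."""
    return (instances * HOURS_PER_MONTH * EC2_SMALL["instance_hour"]
            + gb_in * EC2_SMALL["gb_in"]
            + gb_out * EC2_SMALL["gb_out"]
            + gb_stored * S3_50_10["gb_month"]
            + s3_requests * S3_50_10["request"])


# Hypothetical workload: 200 CPU hours, 20GB in, 100GB out, 10GB stored.
print(round(appengine_cost(200, 20, 100, 10), 2))
print(round(aws_cost(1, 20, 100, 10, 1000000), 2))
```

With those made-up figures the sketch comes out at roughly $35.50 for AppEngine and $102.50 for AWS, most of the difference being the instance that runs all month whether it is busy or not. Change the workload and the picture changes with it.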

Apple, meet pear

It should be obvious that the table above is really comparing incomparables: Amazon gives you much more flexibility, more control, and more options. That in turn puts a lot of the responsibility for efficient resource utilization on the application developer. If you are not careful, you can end up paying for many CPU hours and a lot of storage that you do not use. The Google offering provides far less flexibility and fewer options, with the advantage that you can start for free, and you do not have to plan resource utilization as much: you pay for CPU hours used, not for CPU hours reserved.
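The used-versus-reserved point is simple arithmetic. As a minimal sketch (the function name and the utilization figure are my own assumptions, and an EC2 instance hour is not really the same unit of compute as an AppEngine CPU hour), the price you effectively pay per hour of useful work on a reserved instance is the hourly rate divided by how busy you keep it:

```python
def cost_per_used_hour(rate_per_reserved_hour, utilization):
    """Effective price of one busy hour on capacity billed by the reserved hour.

    utilization is the fraction of reserved hours doing useful work (0 < u <= 1).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return rate_per_reserved_hour / utilization


# An EC2 small instance at 0.10 per hour that is busy a quarter of the time.
print(round(cost_per_used_hour(0.10, 0.25), 2))
```

At 25% utilization the 0.10 instance effectively costs 0.40 per busy hour, so keeping instances busy matters as much as the list price.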

There will still be a large number of applications for which AppEngine hosting simply is not an option (those that need a data store different from GFS, those that are not written in Python, etc.), where one might still consider AWS.

But what if you are writing a WSGI-compliant pure Python application that you want to scale out, that does not need any native optimization (pyrex, swig, C) or bespoke caching, and whose data model supports GFS-style persistence? How do the offerings compare then?
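For reference, "WSGI-compliant pure Python" means something like the sketch below: a callable that takes the request environment and a `start_response` function, with nothing but the standard library involved. The greeting text and the `serve` helper are illustrative assumptions; in production the same `application` callable would sit behind mod_wsgi or whatever the hosting platform provides.

```python
from wsgiref.simple_server import make_server


def application(environ, start_response):
    """A minimal WSGI application: one callable, no native extensions."""
    body = b"Hello from a pure Python WSGI app\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def serve(port=8000):
    """Run the app locally with the standard library's reference server."""
    make_server("localhost", port, application).serve_forever()
```

Calling `serve()` starts a local test server. The point is that `application` itself knows nothing about the server underneath it.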

Scaling with AWS

In the Amazon case you will have to do a lot of up-front infrastructure development yourself. You will need to figure out how to deploy EC2 instances with apache+mod_wsgi on them, automatically adding more instances under load and removing them again when they are no longer needed. You will need to think about how to split off your large files (if any) so they are hosted on S3 (or the new Elastic Block Store, perhaps with MongoDB on top, and some memcache obviously). You will need to think about, and set up, whatever application-level across-cluster monitoring you need. Etc etc. Lots of fun, but lots of work.
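The "add instances under load, remove them again" part boils down to a decision rule that your own tooling has to evaluate over and over. A minimal sketch, assuming you already collect average CPU load across the cluster; the function name, the 60% target, and the instance limits are all hypothetical, and actually starting or stopping EC2 instances is deliberately left out:

```python
import math


def desired_instances(current, avg_cpu_percent, target_percent=60,
                      minimum=1, maximum=20):
    """Return how many instances would bring average CPU near the target.

    current          -- number of instances running now
    avg_cpu_percent  -- average CPU load across them, 0..100 (or more if overloaded)
    """
    needed = math.ceil(current * avg_cpu_percent / target_percent)
    # Never scale below the floor or above the budget ceiling.
    return max(minimum, min(maximum, needed))


print(desired_instances(4, 90))   # busy cluster: grow
print(desired_instances(4, 15))   # idle cluster: shrink
```

With four instances at 90% load the rule asks for six; at 15% load it asks for one. A real setup would also want some damping so the cluster does not flap up and down on every spike.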

Once you have done all that work and your infrastructure is up and running, you will enjoy tremendous flexibility when scaling: with the right application patterns and optimization techniques you have many ways to scale up as well as out. For example, you will soon figure out how to speed up your application using psyco, pyrex, and hand-crafted C. You will end up using a messaging pattern where it makes sense, using SQS or ActiveMQ, maybe some Twisted, etc etc.

By the end of all that you’ll be pretty well-versed in what it means to scale out a web application, and probably looking to set up your own hosting (which will be cheaper still than AWS). Your application will be somewhat aware of the infrastructure it runs on, which can be a good thing or a bad thing depending on how you set it up.

Scaling with AppEngine

In the AppEngine case, lots and lots of decisions have already been made for you. You will build your application following Google’s manuals, monitor it using Google’s tools (at least for quite a while), and not worry much about capacity planning (that is, until your site is discussed on a prime-time TV show, your costs go through the roof, and you wish you had optimized for minimum bandwidth consumption) or advanced performance optimization techniques.

You won’t be able to build a very useful messaging infrastructure (better not be writing an MMOG or chat app) or streaming thingie or anything like that. So after you become wildly successful you’re going to need to augment your AppEngine-hosted systems, or migrate away from them completely.

On the upside, if your application never becomes massively popular, and you stay within the free usage quotas, the only thing you have wasted is your own time, and less of it at that, since you focused most of your time on building the application.

Scaling out an internet startup, 2009

All in all my advice is, if you’re planning to build the next fancy social 2.0 web 2.0 application, to seriously consider building on Google AppEngine. Be careful what APIs you use, and make sure to insulate yourself from their specifics – you don’t want to be tied at the hip to GFS, with no way to move away from it.
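One way to insulate yourself is to have your application code talk to a small storage interface of your own, and keep the platform-specific calls behind it. The sketch below is only an illustration: `ProfileStore`, `InMemoryProfileStore`, and `rename_user` are hypothetical names, and the AppEngine-backed (or S3-backed) implementation you would eventually write is not shown.

```python
from abc import ABC, abstractmethod


class ProfileStore(ABC):
    """The only persistence API the application code is allowed to see."""

    @abstractmethod
    def save(self, user_id, profile):
        """Store a profile dict under user_id."""

    @abstractmethod
    def load(self, user_id):
        """Return the stored profile dict, or None if there is none."""


class InMemoryProfileStore(ProfileStore):
    """Stand-in backend for tests; a datastore-backed class would replace it."""

    def __init__(self):
        self._rows = {}

    def save(self, user_id, profile):
        self._rows[user_id] = dict(profile)  # copy so callers can't mutate storage

    def load(self, user_id):
        row = self._rows.get(user_id)
        return dict(row) if row is not None else None


def rename_user(store, user_id, new_name):
    """Application logic depends on ProfileStore, not on any hosting platform."""
    profile = store.load(user_id) or {}
    profile["name"] = new_name
    store.save(user_id, profile)
    return profile
```

Moving to another platform then means writing one new `ProfileStore` subclass rather than touching every handler.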

If AppEngine hosting is too expensive for your business model to work, get a new business model immediately. Both AppEngine and AWS are dirt cheap.

Once your application really takes off (and you start paying Google some significant $s), try and see if you have some time to keep a version of it running on the AWS infrastructure, perhaps redirecting a few percent of your traffic there. If you can, that’s great: you now have options. At that point, one of the options to seriously consider is probably setting up your own dedicated infrastructure, starting with getting a competent ops guy or two, and then buying some server space (or some managed servers, maybe) and some bandwidth.
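Sending "a few percent" of traffic elsewhere works best when the same user always lands on the same side. A minimal sketch of one way to do that, assuming you have some stable user identifier; the function name, the backend labels, and the 5% default are my own assumptions, and the actual redirect is left to your front end:

```python
import hashlib


def backend_for(user_id, aws_percent=5):
    """Deterministically assign a user to 'aws' or 'appengine'.

    Hashing the user id gives a stable bucket from 0 to 99, so a given
    user keeps hitting the same backend between requests.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "aws" if bucket < aws_percent else "appengine"


# Roughly aws_percent of a large user population ends up on AWS.
users = ["user-%d" % i for i in range(10000)]
share = sum(backend_for(u) == "aws" for u in users) / len(users)
print(share)
```

Raising `aws_percent` gradually moves more users over without reshuffling the ones already there.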

Can I do XML web services and SOAP?

The only reasonable Python library for doing serious XML is lxml, which is built on the libxml2 and libxslt C libraries and is therefore a native extension module, something AppEngine currently does not support. So not yet with AppEngine, or at least not properly. With AWS, yes, you can 🙂

But I don’t like Python!

Well, apparently these guys can cloud-compute your Rails, and these guys can cloud-compute your PHP and Java. But really you should just use Python.