Skip to content

The headache of mapping shards to servers

October 1, 2009

At work we had a lot of headache figuring out how to reshard our CouchDB data. We have 2 data centers with 16 CouchDB instances each. One server holds 4 CouchDB nodes. At the moment each data center has one copy of the data. We want to improve resilience so we are changing this so that each data center has two copies of the data (on different nodes of course). Figuring out how to reshard was not so simple at all.

This inspired some thinking about how we would do the same to our MySQL instances. Its not a challenge yet (we have much more KV data than relational data) but if we’re lucky we’ll get much more data pretty soon, and the issue will pop up in a few months.

I was thinking about what makes this so hard to think about (for someone with a small brain like me at least). It is probably about letting go of symmetry. Do you have the same trouble?

Imagine you have a database with information about users and perhaps data for/by those users. You might use horizontal partitioning with consistent hashing to distribute this data across two machines, which use master/slave replication between them for resilience. You might partition the data into four shards so you can scale out to 4 physical master servers later without repartitioning. It’d look like this:

diagram of shards on 2 mirrored servers

Now imagine that you add a new data center and you need to split the data between the two data centers. Assuming your database only does master/slave (like MySQL) and you prefer to daisy-chain the replication it might look like this:

diagram of shards on 2 servers in 2 data centers

Now imagine adding a third data center and an additional machine in each data center to provide extra capacity for reads. Maybe:

diagram of shards on 3 servers in 3 data centers

You can see that the configuration has suddenly become a bit unbalanced and also somewhat non-obvious. Given this availability of hardware, the most resilient distribution of masters and slaves is not symmetric at all. When you lose symmetry the configuration becomes much more complicated to understand and manage.

Now imagine a bigger database. To highlight how to deal with asymmetry, imagine 4 data centers with 3 nodes per data center. Further imagine having 11 shards to distribute, wanting at least one slave in the same data center as its master. Further imagine wanting 3 slaves for each master. Further imagine wanting to have the data split as evenly as possible across all data centers, so that you can lose up to half of them without losing any data.

Can you work out how to do that configuration? Maybe:

diagram showing many shards on many servers in many data centers

Can you work out how to do it for, oh, 27 data centers, 400 shards with at least 3 copies of each shard, with 7 data centers having 15% beefier boxes and one data center having twice the number of boxes?

As the size of the problem grows, a diagram of a good solution looks more and more like a circle that quickly becomes hard to draw.

When you add in the need to be able to reconfigure server allocations, to fail over masters to one of their slaves, and more, it turns out you eventually need software assistance to solve the mapping of shards to servers. We haven’t written such software yet; I suspect we can steal it from Voldemort by the time we need it 🙂

The tipping point is when you lose symmetry (the 3rd data centre, the 5th database node, etc).

Consistent hashing helps make it easier to do resharding and node rewiring, especially when you’re not (only) dependent on something like mysql replication. But figuring out what goes in which bucket is not as easy as figuring out which bucket goes where. Unless you have many hundreds of buckets, then maybe you can assume your distribution of buckets is even enough if you hash the bucket id to find the server id.

Advertisements
2 Comments
  1. Steve permalink
    October 1, 2009 22:31

    ouch – my head hurts…

  2. Leo Simons permalink*
    November 24, 2009 19:08

    And guess what: we actually made a mistake with our KV reshard mesh config. Grr. You can read about it here: http://enda.squarespace.com/tech/2009/11/24/problems-with-replication-maps.html

Comments are closed.

%d bloggers like this: