<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>LSD::RELOAD</title>
	<atom:link href="http://lsimons.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://lsimons.wordpress.com</link>
	<description>Blog where Leo talks tech</description>
	<lastBuildDate>Sun, 18 Oct 2009 12:59:34 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<cloud domain='lsimons.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://www.gravatar.com/blavatar/139aa6670041c087cfe4f445c0195fd0?s=96&#038;d=http://s.wordpress.com/i/buttonw-com.png</url>
		<title>LSD::RELOAD</title>
		<link>http://lsimons.wordpress.com</link>
	</image>
			<item>
		<title>Improving accessibility of this blog</title>
		<link>http://lsimons.wordpress.com/2009/10/18/improving-accessibility/</link>
		<comments>http://lsimons.wordpress.com/2009/10/18/improving-accessibility/#comments</comments>
		<pubDate>Sun, 18 Oct 2009 12:59:34 +0000</pubDate>
		<dc:creator>Leo Simons</dc:creator>
				<category><![CDATA[Blogging]]></category>
		<category><![CDATA[Tech]]></category>

		<guid isPermaLink="false">http://lsimons.wordpress.com/?p=405</guid>
		<description><![CDATA[I&#8217;ve been studying accessibility. After reading a whole bunch of stuff I decided I should try to make my blog as accessible as reasonably possible. This blog post chronicles some of the changes I&#8217;ve made. If this helps you enjoy my blog more, please do let me know. If I made anything worse, please also [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>I&#8217;ve been studying accessibility. After reading a whole bunch of stuff I decided I should try to make my blog as accessible as reasonably possible. This blog post chronicles some of the changes I&#8217;ve made. If this helps you enjoy my blog more, please do let me know. If I made anything worse, please also let me know <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<h3>Different wordpress theme</h3>
<p>This new theme features:</p>
<ul>
<li>High-contrast design (good old black-on-white)</li>
<li>Relative font sizes in CSS</li>
<li>A reasonable serif font</li>
<li>Clean &#8220;pure&#8221; CSS for layout</li>
<li>A sensible layout and ordering of the HTML elements on the page</li>
</ul>
<h3>Different theme configuration</h3>
<p>Some of the changes:</p>
<ul>
<li>Re-ordered the widgets</li>
<li>Replaced the calendar widget with a list of archived posts</li>
<li>Added more descriptive titles to the widgets</li>
<li>Changed the custom HTML for the RSS link (relative font size, alt attribute for the icon)</li>
<li>Stopped using drop-down menus for lists of links</li>
<li>Made the blog tagline more descriptive</li>
</ul>
<p>I also tried to replace the search widget with something more accessible, but did not manage to hack that together. If someone from WordPress could make it resemble something like</p>
<pre>
&lt;form method="get" id="search_form" action="http://lsimons.wordpress.com/"&gt;
    &lt;div&gt;
        &lt;label for="s"&gt;Search:&lt;/label&gt;&lt;br /&gt;
        &lt;input type="text" name="s" id="s" /&gt;
        &lt;input type="submit" value="Go" /&gt;
    &lt;/div&gt;
&lt;/form&gt;
</pre>
<p>that would be great. Or, if someone knows how to set up an accessible search box on a WordPress.com blog, please let me know <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Second, it would be nice if the theme could be modified to have &#8220;skip to content&#8221; and &#8220;skip to navigation&#8221; links, which I don&#8217;t seem to be able to add myself.</p>
<h3>Change of blog contents</h3>
<p>Some of the changes:</p>
<ul>
<li>Fixed some bits of broken HTML</li>
<li>Added reasonable alt attributes to images (descriptive, but shorter than 100 characters)</li>
<li>Replaced use of &lt;h4&gt; with &lt;h3&gt; in blog post contents (so that the header structure goes h1 &#8211; h2 &#8211; h3 properly)</li>
<li>Cleaned up the page navigation to have just Home and About pages</li>
<li>Cleaned up the About page, in particular getting rid of the struck-through links, since strike-through is a subtlety that is easily lost when using a screen reader</li>
</ul>
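<p>Two of the content fixes in the list above boil down to mechanical rules that are easy to check: heading levels should never skip a level on the way down (an h4 directly after an h2), and alt text on content images should be present and shorter than 100 characters. A minimal Java sketch, assuming the heading levels and alt texts have already been pulled out of the page by some other means; the class and method names are made up for illustration, and this is not how WAVE or any other tool actually works:</p>

```java
public class AccessibilityChecks {
    /** Alt texts of this length or longer get flagged (the post aims for under 100). */
    static final int MAX_ALT_LENGTH = 100;

    /**
     * Returns the index of the first heading that skips a level on the way
     * down (e.g. an h4 directly after an h2), or -1 if the outline is fine.
     * Going back up (an h3 followed by an h2) is always allowed.
     */
    static int firstSkippedHeading(int[] levels) {
        int previous = 0; // the page starts "above" h1, so the first heading should be h1
        for (int i = 0; i < levels.length; i++) {
            if (levels[i] > previous + 1) {
                return i;
            }
            previous = levels[i];
        }
        return -1;
    }

    /**
     * True if the alt text is present, non-blank and shorter than MAX_ALT_LENGTH.
     * Only meant for content images: purely decorative images legitimately use alt="".
     */
    static boolean isReasonableAlt(String alt) {
        return alt != null && !alt.trim().isEmpty() && alt.length() < MAX_ALT_LENGTH;
    }

    public static void main(String[] args) {
        // Hypothetical outline: h1, h2, h3, h3, h2, then an h4 that skips h3.
        int[] outline = {1, 2, 3, 3, 2, 4};
        System.out.println("first skipped heading at index " + firstSkippedHeading(outline));
        System.out.println(isReasonableAlt("Screenshot of the new blog theme"));
    }
}
```

<p>For the hypothetical outline in <code>main</code> this reports index 5, the h4 that follows an h2.</p>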
<h3>&#8220;Doing&#8221; accessibility using tools</h3>
<p>I got rather depressed a couple of years ago with the complete lack of tool support, and basically just gave up. Recently I talked to some front-end engineers at the BBC (which cares a great deal about accessibility) who gave me some useful pointers. Tool support has now gotten a lot better. You should take a look around!</p>
<p>Here are three links:</p>
<ul>
<li><a href="http://wave.webaim.org/">WebAIM WAVE</a>, an accessibility validator. I feel this validator is much better than any of the others; for example, it doesn&#8217;t accuse me of using ASCII art when it encounters a block of Java code. It also doesn&#8217;t sound like a stern kindergarten teacher or some haughty user experience expert.</li>
<li><a href="http://www.standards-schmandards.com/projects/fangs/">Fangs</a>, an easy-to-use Firefox plugin for &#8220;screen reader emulation&#8221;. Basically, it processes a web page and then spits out text that is roughly what a screen reader would say.</li>
<li><a href="http://www.webaim.org/simulations/screenreader.php">WebAIM Screen reader simulator</a>, gives you an idea of what a screen reader does, good to try if you&#8217;ve never seen an actual screen reader in action.</li>
</ul>
<h3>Dive into accessibility</h3>
<p>If you&#8217;ve never bothered about accessibility before but you&#8217;re interested now, I suggest you start your reading with <a href="http://diveintoaccessibility.org/">Dive into Accessibility</a>, a free online book with lots of clear, practical advice by someone who knows his stuff and actually builds websites out in the real world. From the introduction:</p>
<blockquote><p>
Don&#8217;t panic if you are not an HTML expert. Don&#8217;t panic if the only web site you have is a personal weblog, you picked your template out of a list on your first day of blogging, and you&#8217;ve never touched it since. I am not here to tell you that you need to radically redesign your web site from scratch, rip out all your nested tables, and convert to XHTML and CSS. This is about taking what you have and making it better in small but important ways.
</p></blockquote>
</div>]]></content:encoded>
			<wfw:commentRss>http://lsimons.wordpress.com/2009/10/18/improving-accessibility/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/90d726e803a6829d46a165f1729e68b7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">lsimons</media:title>
		</media:content>
	</item>
		<item>
		<title>First look at open source IntelliJ</title>
		<link>http://lsimons.wordpress.com/2009/10/16/open-source-intellij/</link>
		<comments>http://lsimons.wordpress.com/2009/10/16/open-source-intellij/#comments</comments>
		<pubDate>Fri, 16 Oct 2009 16:16:21 +0000</pubDate>
		<dc:creator>Leo Simons</dc:creator>
				<category><![CDATA[Java]]></category>

		<guid isPermaLink="false">http://lsimons.wordpress.com/?p=386</guid>
		<description><![CDATA[IntelliJ IDEA was open sourced yesterday!
Codebase overview

over 20k java source files, totalling just over 2M lines
over 150 jar files
over 500 xml files
build system based on ant, gant, and a library called jps for running intellij builds for which the source apparently is not available yet (see IDEA-25160)
Apache license header applied to most of the files, [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p><a href="http://blogs.jetbrains.com/idea/2009/10/intellij-idea-open-sourced/">IntelliJ IDEA was open sourced yesterday</a>!</p>
<h3>Codebase overview</h3>
<ul>
<li>over 20k Java source files, totalling just over 2M lines</li>
<li>over 150 jar files</li>
<li>over 500 xml files</li>
<li><a href="http://git.jetbrains.org/?p=idea/community.git;a=tree;f=build">build system</a> based on Ant, <a href="http://gant.codehaus.org/">gant</a>, and a library called <a href="http://git.jetbrains.org/?p=idea/community.git;a=history;f=build/lib/gant/lib/jps.jar">jps</a> for running IntelliJ builds, whose source apparently is not available yet (see <a href="http://youtrack.jetbrains.net/issue/IDEA-25160">IDEA-25160</a>)</li>
<li>Apache license header applied to most of the files, with copyrights held by both JetBrains and a variety of individuals; the <a href="http://git.jetbrains.org/?p=idea/community.git;a=tree;f=license">license data</a> is not quite complete and there is no NOTICE.txt (see <a href="http://youtrack.jetbrains.net/issue/IDEA-25161">IDEA-25161</a>)</li>
<li><a href="http://git.jetbrains.org/?p=idea/community.git;a=tree;f=platform">./platform</a> is the core system</li>
<li><a href="http://git.jetbrains.org/?p=idea/community.git;a=tree;f=plugins">./plugins</a> plug into the core platform</li>
<li><a href="http://git.jetbrains.org/?p=idea/community.git;a=tree;f=java">./java</a> and <a href="http://git.jetbrains.org/?p=idea/community.git;a=tree;f=xml">./xml</a> are bigger subsystems that are more or less collections of plugins</li>
</ul>
<h3>Building&#8230;</h3>
<ul>
<li>Install Ant (there is a copy of Ant in <a href="http://git.jetbrains.org/?p=idea/community.git;a=tree;f=lib/ant">./lib/ant</a>)</li>
<li>Run ant</li>
<li>The build takes about 7 minutes on my MacBook</li>
</ul>
<h3>Running&#8230;</h3>
<p>On Mac OS X I ran into <a href="http://www.jetbrains.net/devnet/thread/279482">64-bit problems</a>. Falling back to a <a href="http://samcogan.com/blog/?p=23">32-bit version of JDK 5.0</a> works for me, though it seems <a href="http://blogs.jetbrains.com/idea/2009/10/updated-build-of-community-edition-released/">JetBrains may have just fixed it</a>.</p>
<pre>
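# Workaround: make the JDK 5.0 "java" launcher 32-bit only
cd /System/Library/Frameworks/JavaVM.framework/Versions/1.5.0/Home/bin
sudo mv java java.orig
# Strip the 64-bit slice from the universal binary into java_32
sudo lipo java.orig -remove x86_64 -output java_32
sudo ln -s java_32 java
# Back to the git checkout you started from
cd -
# Build with JDK 5.0
export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.5.0/Home
export PATH=$JAVA_HOME/bin:$PATH
rm -Rf out
ant
# Unpack and start the freshly built IDE
cd out/artifacts
unzip ideaIC-90.SNAPSHOT.mac.zip
open ./Maia-IC-90.SNAPSHOT.app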
</pre>
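<p>To check that the <code>java</code> you end up running really is the 32-bit one, you can ask the JVM itself. A minimal sketch, assuming a Sun or Apple JVM: the <code>sun.arch.data.model</code> system property is vendor-specific rather than part of the Java spec, so the code falls back to a rough heuristic based on <code>os.arch</code> (which reports the architecture of the running JVM, typically something like <code>i386</code> or <code>x86_64</code>). The class name is made up.</p>

```java
public class JvmBitness {
    /**
     * Returns 32 or 64. Prefers the vendor-specific sun.arch.data.model
     * property; otherwise guesses from os.arch (heuristic: any "64" in the
     * architecture name is taken to mean a 64-bit JVM).
     */
    static int dataModel(String sunArchDataModel, String osArch) {
        if ("32".equals(sunArchDataModel) || "64".equals(sunArchDataModel)) {
            return Integer.parseInt(sunArchDataModel);
        }
        return (osArch != null && osArch.contains("64")) ? 64 : 32;
    }

    public static void main(String[] args) {
        int bits = dataModel(System.getProperty("sun.arch.data.model"),
                             System.getProperty("os.arch"));
        System.out.println("java.version=" + System.getProperty("java.version")
                + ", " + bits + "-bit JVM");
    }
}
```

<p>Run it with the same <code>JAVA_HOME</code> and <code>PATH</code> as the build; after the lipo workaround it should report a 32-bit JVM.</p>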
<p>Loading the IDEA source code into your just-built IDE works seamlessly: just open your git repository, since an IntelliJ project is already set up in the <a href="http://git.jetbrains.org/?p=idea/community.git;a=tree;f=.idea">.idea</a> directory.</p>
<h3>Reading the code</h3>
<p><code>com.intellij.idea.Main</code> uses <code>Bootstrap</code> and <code>MainImpl</code> to invoke <code>IdeaApplication.run()</code>. We&#8217;re in <a href="http://www.jetbrains.com/idea/plugins/plugin_developers.html">IntelliJ OpenAPI</a> land now. Somewhere further down the call stack something creates an <code><a href="http://git.jetbrains.org/?p=idea/community.git;a=blob;f=platform/platform-impl/src/com/intellij/openapi/application/impl/ApplicationImpl.java">ApplicationImpl</a></code>, which uses <a href="http://www.picocontainer.org/">PicoContainer</a>. w00t! That makes much more sense to me than the heavyweight OSGi/Equinox that underpins Eclipse. That is where plugins and extensions get loaded, after which things become very fluid, multi-threaded and harder to follow.</p>
<p>So now I&#8217;m thinking I should find a way to attach a debugger in one IntelliJ instance to another&#8230; though it&#8217;d be cool if IntelliJ were somehow &#8220;self-hosting&#8221; in that sense. Here&#8217;s hoping the IntelliJ devs will write some how-to-hack docs soon!</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://lsimons.wordpress.com/2009/10/16/open-source-intellij/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/90d726e803a6829d46a165f1729e68b7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">lsimons</media:title>
		</media:content>
	</item>
		<item>
		<title>java 1.6 exposes the system load average</title>
		<link>http://lsimons.wordpress.com/2009/10/15/java-load-average/</link>
		<comments>http://lsimons.wordpress.com/2009/10/15/java-load-average/#comments</comments>
		<pubDate>Thu, 15 Oct 2009 16:30:49 +0000</pubDate>
		<dc:creator>Leo Simons</dc:creator>
				<category><![CDATA[Java]]></category>

		<guid isPermaLink="false">http://lsimons.wordpress.com/?p=382</guid>
		<description><![CDATA[See the javadoc.
Example usage:

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class LoadAverage {
    public static void main(String[] args) {
        final OperatingSystemMXBean osStats =
                ManagementFactory.getOperatingSystemMXBean();
        final double loadAverage [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>See the <a href="http://java.sun.com/javase/6/docs/api/java/lang/management/OperatingSystemMXBean.html#getSystemLoadAverage%28%29">javadoc</a>.</p>
<p>Example usage:</p>
<pre>
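import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class LoadAverage {
    public static void main(String[] args) {
        final OperatingSystemMXBean osStats =
                ManagementFactory.getOperatingSystemMXBean();
        // System load average for the last minute (new in Java 6);
        // negative if the platform cannot provide it.
        final double loadAverage = osStats.getSystemLoadAverage();
        if (loadAverage >= 0) {
            System.out.println(String.format("load average: %f", loadAverage));
        } else {
            System.out.println("load average not available on this platform");
        }
    }
}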
</pre>
<p>This is a rather useful feature if you are writing software that should do less when the overall system load is high.</p>
<p>For example, if you&#8217;re me, you might be working on a Java daemon that instructs some CouchDB instances on the same box to do database compactions and/or replications, and you could use this to tune down the concurrency or the frequency when the load average is above a threshold.</p>
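<p>The &#8220;tune down the concurrency&#8221; idea can be sketched as a small policy function. This is a minimal sketch with made-up names and an assumed policy, not code from that daemon: full concurrency while the load average is at or below a threshold (here assumed to be one runnable process per CPU), proportionally fewer concurrent jobs above it, never fewer than one, and no throttling at all when the platform reports a negative (unavailable) load average.</p>

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class LoadThrottle {
    /**
     * Hypothetical policy: how many concurrent jobs (say, compactions or
     * replications) to allow. Full concurrency at or below the threshold;
     * above it, scale down in proportion to the overload, but never below 1.
     * A negative load average means "not available": do not throttle.
     */
    static int allowedConcurrency(double loadAverage, double threshold,
                                  int maxConcurrency) {
        if (loadAverage < 0 || loadAverage <= threshold) {
            return maxConcurrency;
        }
        int scaled = (int) Math.floor(maxConcurrency * threshold / loadAverage);
        return Math.max(1, scaled);
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // Assumed threshold: one runnable process per available CPU.
        double threshold = os.getAvailableProcessors();
        int maxConcurrency = 4; // assumed upper bound on parallel jobs
        int allowed = allowedConcurrency(os.getSystemLoadAverage(),
                threshold, maxConcurrency);
        System.out.println("allowed concurrent jobs: " + allowed);
    }
}
```

<p>For example, with a threshold of 4 and at most 4 jobs, a load average of 8 halves the allowed concurrency to 2. The same function could just as well drive the frequency of a scheduled task instead of the number of parallel jobs.</p>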
</div>]]></content:encoded>
			<wfw:commentRss>http://lsimons.wordpress.com/2009/10/15/java-load-average/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/90d726e803a6829d46a165f1729e68b7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">lsimons</media:title>
		</media:content>
	</item>
		<item>
		<title>You don&#8217;t know and you don&#8217;t understand</title>
		<link>http://lsimons.wordpress.com/2009/10/10/you-dont-know-and-you-dont-understand/</link>
		<comments>http://lsimons.wordpress.com/2009/10/10/you-dont-know-and-you-dont-understand/#comments</comments>
		<pubDate>Sat, 10 Oct 2009 14:23:28 +0000</pubDate>
		<dc:creator>Leo Simons</dc:creator>
				<category><![CDATA[Life]]></category>
		<category><![CDATA[Tech]]></category>
		<category><![CDATA[Work]]></category>

		<guid isPermaLink="false">http://lsimons.wordpress.com/?p=378</guid>
		<description><![CDATA[You know much less than you think you know. You misunderstand many more things than you think you do. You&#8217;re also much more wrong much more often than you think.
(Don&#8217;t worry, it&#8217;s not just you, the same is true for everyone else.)
Even better, this is how science works. Being a scientist is all about actively [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>You know much less than you think you know. You misunderstand many more things than you think you do. You&#8217;re also much more wrong much more often than you think.</p>
<p>(Don&#8217;t worry, it&#8217;s not just you, the same is true for everyone else.)</p>
<p>Even better, this is how science works. Being a scientist is all about actively trying to be wrong (and proving everyone else wrong), all the time. When you do science, you don&#8217;t know, and what you learn doing the science, you don&#8217;t ever know for sure.</p>
<h3>The scientific method</h3>
<p>Here are the basic steps in the <a href="http://en.wikipedia.org/wiki/Scientific_method">scientific method</a>:</p>
<ol>
<li>Based on your own and others&#8217; past experience, try to make some sense of a problem</li>
<li>Try to find a reasonable explanation for the problem</li>
<li>If the explanation is correct, what else would you be able to see or measure?</li>
<li>Try to disprove the explanation by making those observations and measurements</li>
</ol>
<p>Scientists do this all day, every day; they do it together on a world-wide scale, and they do it to each other.</p>
<h3>Experimentation</h3>
<p>In uni, studying applied physics, I was trained in a specific application of the scientific method to experimentation, which went something like:</p>
<ol>
<li>Define a question to answer.</li>
<li>Define what you already know (or will assume) that is related.</li>
<li>Form a hypothesis of what the answer may be.</li>
<li>Figure out what you can measure.</li>
<li>Define how those measurements could be interpreted to verify or disprove the hypothesis.</li>
<li>Do the experiments and collect the measurements.</li>
<li>Analyze the data.</li>
<li>Check the internal consistency of the experimental data by applying statistics.</li>
<li>Draw conclusions from the analysis.</li>
</ol>
<p>The course was called <em>Introduction to Experimentation</em>, and it included many more specifics than just that process. For example, it was also about teamwork basics, the use of lab journals, safe lab practices, how to think about <a href="http://en.wikipedia.org/wiki/Accuracy_and_precision">accuracy and precision</a>, and quite a lot of engineering discipline.</p>
<p>The course was nearly completely free of actually interesting math or physics content. For example, the first two 4-hour practicums of the course centered around measuring the resistance of a 10 ohm <a href="http://en.wikipedia.org/wiki/Resistor">resistor</a>. Some of the brightest 18- and 19-year-olds in the country would leave that practicum feeling properly stupid for the first time, very frustrated that they had &#8220;proven&#8221; the resistor to have a resistance of 11+/-0.4 Ohm (where in reality the resistor was &#8220;known&#8221; to be something like 10.000+/-0.001 Ohm).</p>
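<p>The statistics step in that list, and the resistor story, can be made concrete with a little arithmetic. A minimal Java sketch (class and method names are made up) that computes the mean and standard error of repeated measurements, and how many combined standard uncertainties separate two results. It assumes independent, roughly Gaussian errors and reads each +/- as one standard uncertainty, which is only one of several conventions:</p>

```java
import java.util.Locale;

public class MeasurementCheck {
    /** Arithmetic mean of repeated measurements. */
    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** Standard error of the mean: sample standard deviation / sqrt(n). Needs at least 2 values. */
    static double standardError(double[] xs) {
        double m = mean(xs);
        double squares = 0;
        for (double x : xs) {
            squares += (x - m) * (x - m);
        }
        double sampleStdDev = Math.sqrt(squares / (xs.length - 1));
        return sampleStdDev / Math.sqrt(xs.length);
    }

    /**
     * How many combined standard uncertainties separate a +/- ua from b +/- ub,
     * assuming the two uncertainties are independent.
     */
    static double discrepancy(double a, double ua, double b, double ub) {
        return Math.abs(a - b) / Math.sqrt(ua * ua + ub * ub);
    }

    public static void main(String[] args) {
        // Numbers from the story: measured 11 +/- 0.4 Ohm vs. known 10.000 +/- 0.001 Ohm,
        // treating each +/- as one standard uncertainty (an assumption).
        double d = discrepancy(11.0, 0.4, 10.000, 0.001);
        System.out.println(String.format(Locale.ROOT,
                "discrepancy: %.2f standard uncertainties", d));
    }
}
```

<p>With the resistor numbers it prints <code>discrepancy: 2.50 standard uncertainties</code>: a gap that is hard to blame on chance alone, which suggests something about the setup is not understood yet.</p>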
<h3>The art of being wrong</h3>
<p><em>Teaching</em> that same course (some 2 years later) has turned out to be one of the most valuable things I&#8217;ve ever done in my life. One of the key things that students learned in that course was that the teacher might not know either &#8211; after all, a lab is a strange and wonderful place, and voltmeters can in fact break! The teacher in turn learned that even when teaching something seemingly trivial it is possible to be utterly wrong. Powerful phrases that I learned to use included &#8220;I don&#8217;t know either&#8221;, &#8220;You are probably right, but <em>I</em> really don&#8217;t understand what&#8217;s going on&#8221;, &#8220;Are you sure?&#8221;, &#8220;I&#8217;m not sure&#8221;, &#8220;How can you be so sure?&#8221;, &#8220;How can we test that?&#8221;, and the uber-powerful &#8220;Ah yes, so I was wrong&#8221; (eclipsed in power only by &#8220;Ok, enough of this, let&#8217;s go drink beer&#8221;).</p>
<p>This way of inquisitive thinking, with its fundamental acceptance of uncertainty and of being wrong, was later amplified by studying things like quantum mechanics, with its horrible math and even more horrible concepts. &#8220;I don&#8217;t know&#8221; became my default state of mind. Today, it is one of the most important things I contribute to my work environment (whether that is software development, project management or business analytics): the power to say &#8220;I don&#8217;t know&#8221; and/or &#8220;I was wrong&#8221;.</p>
<p>For the last week or two I&#8217;ve had lots of fun working closely with a similarly schooled engineer (he really doesn&#8217;t know anything either&#8230;) to try to debug and change a complex software system. It&#8217;s been useful staring at the same screen, arguing with each other that really we don&#8217;t know enough about X or Y or Z to even try to form a hypothesis. Communicating with the wider group, I&#8217;ve found that almost everyone cringes at the phrase &#8220;we don&#8217;t know&#8221; or my recent favorite, &#8220;we still have many unknown unknowns&#8221;. Not knowing seems to be seen as a horrible state of mind, rather than the normal one.</p>
<h3>Bits and bytes don&#8217;t lie?</h3>
<p>I have a hypothesis about that aversion to the unknown: people see computers as doing simple boolean logic on bits and bytes, so it should be quite possible to just know everything about a software system. As systems grow bigger, all that changes is that there are more operations on more data, but you never really stop knowing. A sound and safe castle of logic!</p>
<p>In fact, I think that&#8217;s a lot of what computer science teaches (as far as I know; I never actually studied computer science at university, I just argued a lot with the computer so-called-scientists). You start with clean discrete math and, through state machines, automata and functional programming, you can eventually find your way to the design of distributed systems and all the way to the nirvana of artificial intelligence. (AI being much better than the messy biological reality of forgetting things and the like.) Dealing with uncertainty and unknowns is not what computer science seems to be about.</p>
<p>The model of &#8220;clean logic all the way down&#8221; is completely useless when doing actual software development work. Do you really know which compiler was used on which version of the source code that led to the firmware that is now in your RAID controller, and that there are no relevant bugs in it or in that compiler? Are you sure the RAM is plugged in correctly in all your 200 boxes? Is your data centre shielded enough from magnetic disturbances? Is that code you wrote 6 months ago really bug-free? What about that open source library you&#8217;re using everywhere?</p>
<p>In fact, this computer-science focus on logic and algorithms, and the high regard for building systems, is worse than just useless. It creates real problems. It means the associated industry sees its output in terms of lines of code written, features delivered, etc. The most revered superstar engineers are those who crank out new software all the time. Web frameworks are popular because you can build an entire blog with them in 5 minutes.</p>
<p>Debugging and testing, that&#8217;s what people that make mistakes have to do. Software design is a group activity but debugging is something you do on your own without telling anyone that you can&#8217;t find your own mistake. If you are really good you will make fewer mistakes, will have to spend less time testing, and so produce more and better software more quickly. If you are really <em>really</em> good you might do test-driven development and with your 100% test coverage you just <em>know</em> that you <em>cannot</em> be wrong&#8230;</p>
<p>The environment in which we develop software is not nearly as controlled as we tend to assume. Our brains are not nearly as powerful as we believe. By not looking at the environment, by not accepting that there is quite a lot we don&#8217;t know, we become very bad at forming a reasonable hypothesis, and worse at interpreting our test data.</p>
<h3>Go measure a resistor</h3>
<p>So here&#8217;s my advice to people who want to become better software developers: try to measure some resistors. Accept that you&#8217;re wrong, that you don&#8217;t know, and that you don&#8217;t understand.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://lsimons.wordpress.com/2009/10/10/you-dont-know-and-you-dont-understand/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/90d726e803a6829d46a165f1729e68b7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">lsimons</media:title>
		</media:content>
	</item>
		<item>
		<title>[RT] MyCouch</title>
		<link>http://lsimons.wordpress.com/2009/10/07/rt-mycouch/</link>
		<comments>http://lsimons.wordpress.com/2009/10/07/rt-mycouch/#comments</comments>
		<pubDate>Tue, 06 Oct 2009 23:40:33 +0000</pubDate>
		<dc:creator>Leo Simons</dc:creator>
				<category><![CDATA[Apache]]></category>
		<category><![CDATA[CouchDB]]></category>
		<category><![CDATA[Work]]></category>
		<category><![CDATA[databases]]></category>

		<guid isPermaLink="false">http://lsimons.wordpress.com/?p=373</guid>
		<description><![CDATA[The below post is an edited version of a $work e-mail, re-posted here at request of some colleagues that wanted to forward the story. My apologies if some of the bits are unclear due to lack-of-context. In particular, let me make clear:

we have had a production CouchDB setup for months that works well
we are planning [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>The below post is an edited version of a <code>$work</code> e-mail, re-posted here at request of some colleagues that wanted to forward the story. My apologies if some of the bits are unclear due to lack-of-context. In particular, let me make clear:</p>
<ul>
<li>we have had a production CouchDB setup for months that works well</li>
<li>we are planning to keep that production setup roughly intact for many more months and we are <em>not</em> currently planning to migrate away from CouchDB <em>at all</em></li>
<li>overall we are big fans of the CouchDB project and its community and we expect great things to come out of it</li>
</ul>
<p>Nevertheless, using pre-1.0 software based on an archaic language with rather crappy error handling can get frustrating <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<pre>
Subject: [RT] MyCouch
From: Leo Simons
To: Forge Engineering
</pre>
<p>This particular RT gives one possible answer to the question &#8220;what would be a good way to make this KV debugging somewhat less frustrating?&#8221; <em>(we have been fighting erratic response times from CouchDB under high load while replicating and compacting)</em></p>
<p>That answer is &#8220;we could probably replace CouchDB with java+mysql, and it might even be easy to do so&#8221;. And, then, &#8220;if it really is easy, that&#8217;s extra cool (and <em>because of</em> CouchDB)&#8221;.</p>
<h4>Why replace CouchDB?</h4>
<p>Things we really like about CouchDB (as the backend for our KV service):</p>
<ul>
<li>The architecture: HTTP/REST all the way down, MVCC, many-to-many replication, scales without bound, and neat composable building blocks make for an evolvable platform.</li>
<li>Working system: it&#8217;s in production, it&#8217;s running, and it&#8217;s running pretty well.</li>
<li>Community: open source, active project, know the developers, &#8220;cool&#8221;.</li>
<li>Integrity: it hasn&#8217;t corrupted or lost any data yet, and it probably won&#8217;t anytime soon.</li>
</ul>
<p>Things we like less:</p>
<ul>
<li>Debugging: cryptic error messages, erlang stack traces, process deaths.</li>
<li>Capacity planning: many unknown and changing performance characteristics.</li>
<li>Immaturity: pre-1.0.</li>
<li>Humanware: lack of erlang development skills, lack of DBA-like skills, lack of training material (or trainers) to gain skills.</li>
<li>Tool support: JProfiler for erlang? Eclipse for erlang? Etc.</li>
<li>Map/Reduce and views: alien concept to most developers, hard to audit and manage free-form javascript from tenants, hard to use for data migrations and aggregations.</li>
<li>JSON: leads to developers storing JSON, which is horribly inefficient.</li>
</ul>
<p>Those things we don&#8217;t like about couch unfortunately aren&#8217;t going to change very quickly. For example, the effort required to train up a bunch of DBAs so they can juggle CouchDB namespaces and instances and on-disk data structures is probably rather non-trivial.</p>
<h4>The basic idea</h4>
<p>It is not easy to see what other document storage system out there would be a particularly good replacement. Tokyo Cabinet, Voldemort, Cassandra, &#8230; all of these are also young and immature systems with a variety of quirks. Besides, we really really like the CouchDB architecture.</p>
<p>So why don&#8217;t we replace CouchDB with a re-implemented CouchDB? We keep the architecture almost exactly the same, but re-implement the features we care about using technology that we know well and is in many ways much more boring. &#8220;HTTP all the way down&#8221; should mean this is possible.</p>
<p>We could use mysql underneath (but not use any of its built-in replication features). The java program on top would do the schema and index management, and most importantly implement the CouchDB replication and compaction functionality.</p>
<p>We could even keep the same deployment structure. Assuming one java server is paired with one mysql database instance, we&#8217;d end up with 4 tomcat instances on 4 ports (5984-5987) and 4 mysql services on 4 other ports (3306-3309). Use of mysqld_multi probably makes sense. Eventually we could perhaps optimize a bit more by having one tomcat process and one mysql process &#8211; it&#8217;ll make better use of memory.</p>
<p>Now, what is really really really cool about the CouchDB architecture and its complete HTTP-ness is that we should be able to do any actual migration one node at a time, without downtime. Moving the data across is as simple as running a replication. Combined with the fact that we&#8217;ve been carefully avoiding a lot of its features, this makes CouchDB probably one of the <em>easiest</em> systems to replace <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_biggrin.gif' alt=':-D' class='wp-smiley' /> </p>
<h4>Database implementation sketch</h4>
<p>How would we implement the database? If we think of our KV data as having the form</p>
<pre>
  ns1:key1 [_rev=1-12345]: { ...}
  ns1:key2 [_rev=2-78901]: { subkey1: ..., }
  ns2:key3 [_rev=1-43210]: { subkey1: ..., subkey2: ...}
</pre>
<p>where the first integer part of the _rev is dubbed &#8220;v&#8221; and the remainder &#8220;src&#8221;, then a somewhat obvious database schema looks like <em>(disclaimer: schema has not been tested, do not use <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> )</em>:</p>
<pre>
CREATE TABLE namespace (
  id varchar(64) CHARACTER SET ascii COLLATE ascii_bin
      NOT NULL PRIMARY KEY,
  state enum('enabled','disabled','deleted') NOT NULL
      DEFAULT 'enabled'
) ENGINE=InnoDB;

-- `key` is a reserved word in MySQL, so it has to be quoted
CREATE TABLE {namespace}_key (
  ns varchar(64) CHARACTER SET ascii COLLATE ascii_bin
      NOT NULL,
  `key` varchar(180) CHARACTER SET ascii COLLATE ascii_bin
      NOT NULL,
  v smallint UNSIGNED NOT NULL,
  src int UNSIGNED NOT NULL,

  PRIMARY KEY (ns, `key`, v, src),
  FOREIGN KEY (ns) REFERENCES namespace(id)
) ENGINE=InnoDB;

CREATE TABLE {namespace}_value (
  ns varchar(64) CHARACTER SET ascii COLLATE ascii_bin
      NOT NULL,
  `key` varchar(180) CHARACTER SET ascii COLLATE ascii_bin
      NOT NULL,
  v smallint UNSIGNED NOT NULL,
  src int UNSIGNED NOT NULL,
  -- binary collation: JSON subkeys are case-sensitive
  subkey varchar(255) CHARACTER SET utf8 COLLATE utf8_bin
      NOT NULL,
  small_value varchar(512) CHARACTER SET utf8 COLLATE utf8_general_ci
      DEFAULT NULL
      COMMENT 'will contain the value if it fits',
  large_value mediumtext CHARACTER SET utf8 COLLATE utf8_general_ci
      DEFAULT NULL
      COMMENT 'will contain the value if it is big',

  PRIMARY KEY (ns, `key`, v, src, subkey),
  FOREIGN KEY (ns) REFERENCES namespace(id),
  FOREIGN KEY (ns, `key`, v, src)
      REFERENCES {namespace}_key(ns, `key`, v, src)
      ON DELETE CASCADE
) ENGINE=InnoDB;
</pre>
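<p>The schema depends on splitting each <code>_rev</code> into <code>v</code> and <code>src</code>. Below is a minimal Java sketch of that split. It assumes, as in the example data above, that the part after the dash is a decimal number that fits the schema&#8217;s <code>int UNSIGNED</code> column; other revision formats would need a different encoding. The class name <code>Rev</code> is hypothetical.</p>

```java
/** Hypothetical value type: a revision split into the schema's v and src columns. */
final class Rev {
    final int v;     // generation number, stored in the smallint UNSIGNED column
    final long src;  // remainder, stored in the int UNSIGNED column (long avoids Java's signed int)

    Rev(int v, long src) {
        this.v = v;
        this.src = src;
    }

    /** Parses "2-78901" into v=2, src=78901; rejects anything that would not fit the columns. */
    static Rev parse(String rev) {
        int dash = rev.indexOf('-');
        if (dash <= 0 || dash == rev.length() - 1) {
            throw new IllegalArgumentException("not a revision: " + rev);
        }
        int v = Integer.parseInt(rev.substring(0, dash));
        long src = Long.parseLong(rev.substring(dash + 1));
        if (v < 0 || v > 0xFFFF) {
            throw new IllegalArgumentException("v out of smallint UNSIGNED range: " + rev);
        }
        if (src < 0 || src > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("src out of int UNSIGNED range: " + rev);
        }
        return new Rev(v, src);
    }

    @Override
    public String toString() {
        return v + "-" + src;
    }
}
```

<p>Usage: <code>Rev.parse("2-78901")</code> yields <code>v = 2</code> and <code>src = 78901</code>, ready to bind into the <code>INSERT INTO {namespace}_key</code> statement.</p>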
<p>With obvious queries including</p>
<pre>
  SELECT id FROM namespace WHERE state = 'enabled';

  SELECT `key` FROM {namespace}_key WHERE ns = ?;
  SELECT `key`, v, src FROM {namespace}_key WHERE ns = ?;
  SELECT v, src FROM {namespace}_key WHERE ns = ?
      AND `key` = ?;
  SELECT v, src FROM {namespace}_key WHERE ns = ?
      AND `key` = ? ORDER BY v DESC, src DESC LIMIT 1;
  SELECT subkey, small_value FROM {namespace}_value
      WHERE ns = ? AND `key` = ? AND v = ? AND src = ?;
  SELECT large_value FROM {namespace}_value
      WHERE ns = ? AND `key` = ? AND v = ? AND src = ?
      AND subkey = ?;

  -- CREATE TABLE causes an implicit commit in MySQL,
  -- so this block is not actually atomic
  BEGIN;
  CREATE TABLE {namespace}_key (...);
  CREATE TABLE {namespace}_value (...);
  INSERT INTO namespace(id) VALUES (?);
  COMMIT;

  UPDATE namespace SET state = 'disabled' WHERE id = ?;
  UPDATE namespace SET state = 'deleted' WHERE id = ?;

  -- DROP TABLE also commits implicitly
  BEGIN;
  DROP TABLE {namespace}_value;
  DROP TABLE {namespace}_key;
  DELETE FROM namespace WHERE id = ?;
  COMMIT;

  INSERT INTO {namespace}_key (ns,`key`,v,src)
      VALUES (?,?,?,?);
  INSERT INTO {namespace}_value (ns,`key`,v,src,subkey,small_value)
      VALUES (?,?,?,?,?,?),(?,?,?,?,?,?),
             (?,?,?,?,?,?),(?,?,?,?,?,?);
  INSERT INTO {namespace}_value (ns,`key`,v,src,subkey,large_value)
      VALUES (?,?,?,?,?,?);

  DELETE FROM {namespace}_key WHERE ns = ? AND `key` = ?;
  DELETE FROM {namespace}_key WHERE ns = ? AND `key` = ?
      AND v &lt; ?;
  DELETE FROM {namespace}_key WHERE ns = ? AND `key` = ?
      AND v = ? AND src = ?;
</pre>
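<p>Table names cannot be bound as <code>?</code> parameters, so the java code has to substitute <code>{namespace}</code> into the SQL text itself, which makes validating the namespace id essential. Below is a minimal sketch. It assumes namespace ids are restricted to lowercase ASCII letters, digits and underscores, a restriction the schema above does not impose. MySQL identifiers are limited to 64 characters, so a 64-character namespace id plus the <code>_value</code> suffix would not fit, and the sketch therefore caps ids at 58 characters. <code>NamespaceTables</code> and its methods are hypothetical names.</p>

```java
import java.util.regex.Pattern;

/** Hypothetical helper turning a namespace id into its per-namespace table names. */
final class NamespaceTables {
    // MySQL identifiers may be at most 64 characters; "_value" is the longest suffix.
    private static final int MAX_NS_LENGTH = 64 - "_value".length();
    // Deliberately strict whitelist, so no quoting or escaping is needed once it matches.
    private static final Pattern VALID_NS = Pattern.compile("[a-z0-9_]+");

    private NamespaceTables() {}

    static String keyTable(String ns) {
        return validate(ns) + "_key";
    }

    static String valueTable(String ns) {
        return validate(ns) + "_value";
    }

    /** The "latest revision" query for one namespace; ns and key stay bound parameters. */
    static String latestRevisionSql(String ns) {
        return "SELECT v, src FROM " + keyTable(ns)
                + " WHERE ns = ? AND `key` = ? ORDER BY v DESC, src DESC LIMIT 1";
    }

    private static String validate(String ns) {
        if (ns == null || ns.length() > MAX_NS_LENGTH || !VALID_NS.matcher(ns).matches()) {
            throw new IllegalArgumentException("invalid namespace id: " + ns);
        }
        return ns;
    }
}
```

<p>Rejecting bad ids up front, rather than trying to quote them, keeps the generated SQL trivially safe; the cost is that namespace ids become more restricted than the <code>varchar(64)</code> column allows.</p>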
<p>The usefulness of <code>{namespace}_value</code> is debatable; it helps a lot when implementing CouchDB views or some equivalent functionality (&#8220;get me all the documents in this namespace where subkey1=&#8230;&#8221;), but if we decide not to care, then it&#8217;s redundant and <code>{namespace}_key</code> can grow some additional small_value (which should then be big enough to contain a typical JSON document, i.e. maybe 1k) and large_value columns instead.</p>
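<p>Partitioning the tables by <code>{namespace}</code> manually isn&#8217;t needed if we use MySQL 5.1 or later; its built-in table partitioning could be used instead (though partitioned InnoDB tables cannot have foreign keys, so the schema above would need adjusting).</p>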
<p>I&#8217;m not sure if we should have a &#8216;state&#8217; on the keys and do soft-deletes; that might make actual DELETE calls faster, and it could also reduce the impact of compactions.</p>
<h4>Webapp implementation notes</h4>
<p>The java &#8220;CouchDB&#8221; webapp also does not seem that complicated to build (famous last words?). I would probably build it roughly the same way as <em>[some existing internal webapps]</em>.</p>
<p>The basic GET/PUT/DELETE operations are straightforward mappings onto queries that are also rather straightforward.</p>
<p>The POST /_replicate and POST /_compact operations are of course a little bit more involved, but not that much. Assuming some kind of a pool of url fetchers and some periodic executors&#8230;</p>
<p><strong>Replication:</strong></p>
<ol>
<li>get last-seen revision number for source</li>
<li>get list of updates from source</li>
<li>for each update
<ul>
<li>INSERT key</li>
<li>if duplicate key error, ignore and don&#8217;t update values</li>
<li>insert all the values, replacing any existing rows (<code>REPLACE INTO</code>)</li>
</ul>
</li>
</ol>
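<p>The replication steps can be sketched in memory as follows. Sets and maps stand in for the two tables, a failed <code>Set.add()</code> plays the role of the duplicate-key error, and the updates are assumed to have been fetched from the source already (for example by one of the url fetchers). <code>ReplicationSketch</code>, <code>Update</code> and the sequence-number field are hypothetical, not an existing API.</p>

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for one node's {namespace}_key and {namespace}_value tables. */
final class ReplicationSketch {

    /** One entry from the source's list of updates: a complete revision of one document. */
    static final class Update {
        final long seq;                    // source's sequence number for this change
        final String key;
        final int v;
        final long src;
        final Map<String, String> values;  // subkey -> value

        Update(long seq, String key, int v, long src, Map<String, String> values) {
            this.seq = seq;
            this.key = key;
            this.v = v;
            this.src = src;
            this.values = values;
        }
    }

    // (key, v, src) rows; Set.add() returning false plays the role of a duplicate-key error
    final Set<List<Object>> keyRows = new HashSet<>();
    // (key, v, src, subkey) -> value rows
    final Map<List<Object>, String> valueRows = new HashMap<>();
    // the replication record: last sequence number seen per source
    final Map<String, Long> lastSeen = new HashMap<>();

    /** Applies a batch of updates from one source; returns the number of new revisions stored. */
    int replicate(String source, List<Update> updates) {
        long since = lastSeen.getOrDefault(source, 0L);        // 1. last-seen number for source
        long newest = since;
        int stored = 0;
        for (Update u : updates) {                              // 2. the list of updates
            if (u.seq <= since) {
                continue;                                       // already applied in an earlier run
            }
            newest = Math.max(newest, u.seq);
            if (!keyRows.add(Arrays.<Object>asList(u.key, u.v, u.src))) {
                continue;                                       // 3. duplicate key: leave values alone
            }
            for (Map.Entry<String, String> e : u.values.entrySet()) {
                // REPLACE semantics: overwrite any existing value row
                valueRows.put(Arrays.<Object>asList(u.key, u.v, u.src, e.getKey()), e.getValue());
            }
            stored++;
        }
        lastSeen.put(source, newest);
        return stored;
    }
}
```

<p>Because a revision is immutable once written, skipping the values on a duplicate key is safe; re-running a replication over the same updates is a no-op.</p>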
<p><strong>Compaction:</strong></p>
<ol>
<li>get list of namespaces</li>
<li>for each namespace:
<ul>
<li><code>SELECT `key`, v, src FROM {namespace}_key WHERE ns = ? ORDER BY `key` ASC, v DESC, src DESC;</code></li>
<li>skip the first row for each key</li>
<li>if the second row for the key has the same v, there is a conflict; don&#8217;t compact this key</li>
<li><code>DELETE IGNORE FROM {namespace}_key WHERE ns = ? AND `key` = ? AND v = ? AND src = ?;</code></li>
</ul>
</li>
</ol>
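<p>The row-selection part of the compaction loop is pure logic, so it can be sketched without a database. The sketch takes the rows in the order produced by the <code>SELECT</code> above and returns the ones to <code>DELETE</code>. <code>CompactionSketch</code> and <code>KeyRow</code> are hypothetical names.</p>

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of choosing which {namespace}_key rows a compaction deletes. */
final class CompactionSketch {

    /** One row of SELECT `key`, v, src ... ORDER BY `key` ASC, v DESC, src DESC. */
    static final class KeyRow {
        final String key;
        final int v;
        final long src;

        KeyRow(String key, int v, long src) {
            this.key = key;
            this.v = v;
            this.src = src;
        }
    }

    /**
     * Given rows sorted by key ASC, v DESC, src DESC, returns the rows to delete.
     * The first row of each key is the current revision and is kept; if the second
     * row has the same v the key is in conflict and none of its rows are deleted.
     */
    static List<KeyRow> rowsToDelete(List<KeyRow> sortedRows) {
        List<KeyRow> toDelete = new ArrayList<>();
        int i = 0;
        while (i < sortedRows.size()) {
            // find where this key's group of rows ends
            String key = sortedRows.get(i).key;
            int end = i;
            while (end < sortedRows.size() && sortedRows.get(end).key.equals(key)) {
                end++;
            }
            boolean conflict = end - i > 1 && sortedRows.get(i + 1).v == sortedRows.get(i).v;
            if (!conflict) {
                // keep the first (newest) row, delete all older revisions of this key
                toDelete.addAll(sortedRows.subList(i + 1, end));
            }
            i = end;
        }
        return toDelete;
    }
}
```

<p>Each returned row maps onto one <code>DELETE IGNORE</code> call; the <code>ON DELETE CASCADE</code> foreign key then removes the matching <code>{namespace}_value</code> rows.</p>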
<p>So we need some kind of a replication record; once we have mysql available, using &#8220;documents&#8221; for that seems awkward, so let&#8217;s use a database table. We might as well have one more MySQL database on each server with a full copy of a &#8216;kvconfig&#8217; database, which is replicated around (using mysql replication) to all the nodes. We might also want to migrate away from NAMESPACE_METADATA documents&#8230; though maybe not, it <em>is</em> nice and flexible that way.</p>
<h4>Performance notes</h4>
<p>In theory, the couchdb on-disk format should be much faster than innodb for writes. In practice, innodb has seen quite a few years of tuning. More importantly, in our tests on our servers raw mysql performance seems to be rather better than couchdb. Some of that is due to the extra fsyncs in couchdb, but not all of it.</p>
<p>In theory, the erlang OTP platform should scale out much better than something java-based. In practice, the http server inside couchdb is pretty much a standard fork design using blocking I/O. More importantly, raw tomcat can take &gt;100k req/s on our hardware, which is much much more than our disks can do.</p>
<p>In theory, having the entire engine inside one process should be more efficient than java talking to mysql over TCP. In practice, I doubt this will really show up if we run java and mysql on the same box. More importantly, if this does become an issue, longer-term we may be able to &#8220;flatten the stack&#8221; by pushing the java &#8220;CouchDB&#8221; up into the service layer and merging it with the KV service, at which point java-to-mysql will be rather more efficient than java-to-couch.</p>
<p>In theory and in practice innodb has better indexes for the most common SELECTs/GETs so it should be a bit faster. It also is better at making use of large chunks of memory. I suspect the two most common requests (GET that returns 200, GET that returns 404) will both be faster, which incidentally are the most important for us to optimize, too.</p>
<p>We might worry java is slow. That&#8217;s kind-of silly <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> . In theory and in practice garbage collection makes software go faster. We just need to avoid doing those things that make it slow.</p>
<p>The overhead of ACID guarantees might be a concern. Fortunately MySQL is not <em>really</em> a proper relational database if you don&#8217;t want it to be. We can probably set the transaction isolation level to READ UNCOMMITTED safely, and the schema design / usage pattern is such that we don&#8217;t need transactions in most places. More importantly we are keeping the eventual consistency model, with MVCC and all, on a larger scale. Any over-ACID-ness will be local to the particular node only.</p>
<p>Most importantly, this innodb/mysql thing is mature/boring technology that powers a lot of the biggest websites in the world. As such, you can buy books and consultancy and read countless websites about mysql/innodb/tomcat tuning. Its performance characteristics are pretty well-known and pretty predictable, and lots of people (including here at $work) can make those predictions easily.</p>
<h4>So when are we doing this?</h4>
<p>No no, we&#8217;re not, that&#8217;s not the point, this is just an RT! I woke up (rather early) with this idea in my head so I wrote it down to make space for other thoughts. At a minimum, I hope the above helps propagate some ideas:</p>
<ul>
<li>just how well we applied REST and service-oriented architecture here and the benefits it&#8217;s giving us</li>
<li>in particular because we picked the right architecture we are not stuck with / tied to CouchDB, now or later</li>
<li>we can always re-engineer things (though we should have good enough reasons)</li>
<li>things like innodb and/or bdb (or any of the old dbs) are actually great tools with some great characteristics</li>
</ul>
<h4>Just like FriendFeed?</h4>
<p>Bret Taylor has a good <a href="http://bret.appspot.com/entry/how-friendfeed-uses-mysql">explanation of how FriendFeed built a non-relational database on top of a relational one</a>. The approach outlined above is rather a lot like the solution they implemented, though there are also important differences.</p>
  </div>]]></content:encoded>
			<wfw:commentRss>http://lsimons.wordpress.com/2009/10/07/rt-mycouch/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/90d726e803a6829d46a165f1729e68b7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">lsimons</media:title>
		</media:content>
	</item>
		<item>
		<title>The headache of mapping shards to servers</title>
		<link>http://lsimons.wordpress.com/2009/10/01/the-headache-of-mapping-shards-to-servers/</link>
		<comments>http://lsimons.wordpress.com/2009/10/01/the-headache-of-mapping-shards-to-servers/#comments</comments>
		<pubDate>Thu, 01 Oct 2009 20:48:34 +0000</pubDate>
		<dc:creator>Leo Simons</dc:creator>
				<category><![CDATA[CouchDB]]></category>
		<category><![CDATA[Work]]></category>
		<category><![CDATA[databases]]></category>

		<guid isPermaLink="false">http://lsimons.wordpress.com/?p=364</guid>
		<description><![CDATA[At work we had a lot of headache figuring out how to reshard our CouchDB data. We have 2 data centers with 16 CouchDB instances each. One server holds 4 CouchDB nodes. At the moment each data center has one copy of the data. We want to improve resilience so we are changing this so [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=lsimons.wordpress.com&blog=2100343&post=364&subd=lsimons&ref=&feed=1" />]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>At work we had a lot of headache figuring out how to reshard our CouchDB data. We have 2 data centers with 16 CouchDB instances each. One server holds 4 CouchDB nodes. At the moment each data center has one copy of the data. We want to improve resilience so we are changing this so that each data center has two copies of the data (on different nodes of course). Figuring out how to reshard was not so simple at all.</p>
<p>This inspired some thinking about how we would do the same to our MySQL instances. It&#8217;s not a challenge yet (we have much more KV data than relational data) but if we&#8217;re lucky we&#8217;ll get much more data pretty soon, and the issue will pop up in a few months.</p>
<p>I was thinking about what makes this so hard to think about (for someone with a small brain like me at least). It is probably about letting go of symmetry. Do you have the same trouble?</p>
<p>Imagine you have a database with information about users and perhaps data for/by those users. You might use horizontal partitioning with consistent hashing to distribute this data across two machines, which use master/slave replication between them for resilience. You might partition the data into four shards so you can scale out to 4 physical master servers later without repartitioning. It&#8217;d look like this:</p>
<div><img src="http://lsimons.files.wordpress.com/2009/10/mysql-replication.png?w=225&#038;h=227" alt="diagram of shards on 2 mirrored servers" title="mysql-replication" width="225" height="227" /></div>
<p>Now imagine that you add a new data center and you need to split the data between the two data centers. Assuming your database only does master/slave (like MySQL) and you prefer to daisy-chain the replication it might look like this:</p>
<div><img src="http://lsimons.files.wordpress.com/2009/10/mysql-dual-site.png?w=464&#038;h=236" alt="diagram of shards on 2 servers in 2 data centers" title="mysql-dual-site" width="464" height="236" /></div>
<p>Now imagine adding a third data center and an additional machine in each data center to provide extra capacity for reads. Maybe:</p>
<div><img src="http://lsimons.files.wordpress.com/2009/10/mysql-multi-site.png?w=636&#038;h=370" alt="diagram of shards on 3 servers in 3 data centers" title="mysql-multi-site" width="636" height="370" /></div>
<p>You can see that the configuration has suddenly become a bit unbalanced and also somewhat non-obvious. Given this availability of hardware, the most resilient distribution of masters and slaves is not symmetric at all. When you lose symmetry the configuration becomes much more complicated to understand and manage.</p>
<p>Now imagine a bigger database. To highlight how to deal with asymmetry, imagine 4 data centers with 3 nodes per data center. Further imagine having 11 shards to distribute, wanting at least one slave in the same data center as its master. Further imagine wanting 3 slaves for each master. Further imagine wanting to have the data split as evenly as possible across all data centers, so that you can lose up to half of them without losing any data.</p>
<p>Can you work out how to do that configuration? Maybe:</p>
<div><img src="http://lsimons.files.wordpress.com/2009/10/circles-in-boxes1.png?w=592&#038;h=578" alt="diagram showing many shards on many servers in many data centers" title="circles-in-boxes" width="592" height="578" /></div>
<p>Can you work out how to do it for, oh, 27 data centers, 400 shards with at least 3 copies of each shard, with 7 data centers having 15% beefier boxes and one data center having twice the number of boxes?</p>
<p>As the size of the problem grows, a diagram of a good solution looks more and more like a circle that quickly becomes hard to draw.</p>
<p>When you add in the need to be able to reconfigure server allocations, to fail over masters to one of their slaves, and more, it turns out you eventually need software assistance to solve the mapping of shards to servers. We haven&#8217;t written such software yet; I suspect we can steal it from Voldemort by the time we need it <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
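<p>To make &#8220;software assistance&#8221; a bit more concrete, here is a minimal greedy placement sketch for the 4-data-center example above: one master plus 3 slaves per shard, one slave in the master&#8217;s data center, the other slaves in distinct other data centers, always picking the least-loaded eligible node. It is a heuristic assumption, not a known-optimal algorithm, and it ignores heterogeneous boxes and failover; <code>ShardPlacement</code> and its methods are hypothetical.</p>

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical greedy placement of shard masters and slaves onto nodes in data centers. */
final class ShardPlacement {

    /** A node, identified by its data center and its position within that data center. */
    static final class Node {
        final int dc;
        final int index;
        int load; // shard copies (master or slave) placed here so far

        Node(int dc, int index) {
            this.dc = dc;
            this.index = index;
        }

        @Override
        public String toString() {
            return "dc" + dc + "/node" + index;
        }
    }

    /**
     * Places every shard as one master plus the given number of slaves: the first slave on another
     * node in the master's data center, each further slave in a different data center.
     * Returns each shard's nodes, master first.
     */
    static List<List<Node>> place(int dataCenters, int nodesPerDc, int shards, int slaves) {
        if (nodesPerDc < 2 || slaves < 1 || slaves > dataCenters) {
            throw new IllegalArgumentException("constraints cannot be satisfied");
        }
        List<Node> nodes = new ArrayList<>();
        for (int dc = 0; dc < dataCenters; dc++) {
            for (int n = 0; n < nodesPerDc; n++) {
                nodes.add(new Node(dc, n));
            }
        }
        List<List<Node>> placement = new ArrayList<>();
        for (int s = 0; s < shards; s++) {
            List<Node> copies = new ArrayList<>();
            boolean[] usedDcs = new boolean[dataCenters];
            Node master = pick(nodes, copies, -1, usedDcs);
            usedDcs[master.dc] = true;
            pick(nodes, copies, master.dc, usedDcs);            // local slave, same data center
            for (int r = 1; r < slaves; r++) {
                Node remote = pick(nodes, copies, -1, usedDcs); // remote slave, unused data center
                usedDcs[remote.dc] = true;
            }
            placement.add(copies);
        }
        return placement;
    }

    /** Least-loaded node not yet holding this shard: in requiredDc if >= 0, else in an unused DC. */
    private static Node pick(List<Node> nodes, List<Node> copies, int requiredDc, boolean[] usedDcs) {
        Node best = null;
        for (Node n : nodes) {
            boolean eligible = !copies.contains(n)
                    && (requiredDc >= 0 ? n.dc == requiredDc : !usedDcs[n.dc]);
            if (eligible && (best == null || n.load < best.load)) {
                best = n;
            }
        }
        best.load++;
        copies.add(best);
        return best;
    }
}
```

<p>With <code>place(4, 3, 11, 3)</code> every shard lands in exactly 3 distinct data centers, so losing any 2 of the 4 still leaves at least one copy. Even this toy version shows how quickly the rules stop being something you can draw by hand.</p>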
<p>The tipping point is when you lose symmetry (the 3rd data center, the 5th database node, etc).</p>
<p>Consistent hashing helps make resharding and node rewiring easier, especially when you&#8217;re not (only) dependent on something like mysql replication. But figuring out which bucket goes where is not as easy as figuring out what goes in which bucket. That is, unless you have many hundreds of buckets, in which case you can perhaps assume your distribution of buckets is even enough if you simply hash the bucket id to find the server id.</p>
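<p>For reference, the &#8220;hash the bucket id to find the server id&#8221; approach can be sketched as a consistent-hash ring in Java. The sketch assumes MD5-derived ring positions and a fixed number of virtual points per server; <code>HashRing</code> and its methods are hypothetical names. It demonstrates the property that makes resharding cheaper (removing a server only moves the buckets that server owned), but it does nothing about the harder problem above of placing masters and slaves across data centers.</p>

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical consistent-hash ring mapping bucket ids to servers via virtual points. */
final class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>(); // ring position -> server
    private final int pointsPerServer;

    HashRing(int pointsPerServer) {
        this.pointsPerServer = pointsPerServer;
    }

    void addServer(String server) {
        for (int i = 0; i < pointsPerServer; i++) {
            ring.put(hash(server + "#" + i), server);
        }
    }

    void removeServer(String server) {
        for (int i = 0; i < pointsPerServer; i++) {
            ring.remove(hash(server + "#" + i), server); // only remove points this server owns
        }
    }

    /** The owning server is the first ring point at or after the bucket's hash, wrapping around. */
    String serverFor(String bucketId) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no servers");
        }
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(bucketId));
        return e != null ? e.getValue() : ring.firstEntry().getValue();
    }

    /** First 8 bytes of the MD5 digest as a (signed) long ring position. */
    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

<p>More virtual points per server give a more even spread of buckets, at the cost of a bigger ring; with only a handful of buckets the spread can still be quite uneven, which is the caveat in the paragraph above.</p>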
  </div>]]></content:encoded>
			<wfw:commentRss>http://lsimons.wordpress.com/2009/10/01/the-headache-of-mapping-shards-to-servers/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/90d726e803a6829d46a165f1729e68b7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">lsimons</media:title>
		</media:content>

		<media:content url="http://lsimons.files.wordpress.com/2009/10/mysql-replication.png" medium="image">
			<media:title type="html">mysql-replication</media:title>
		</media:content>

		<media:content url="http://lsimons.files.wordpress.com/2009/10/mysql-dual-site.png" medium="image">
			<media:title type="html">mysql-dual-site</media:title>
		</media:content>

		<media:content url="http://lsimons.files.wordpress.com/2009/10/mysql-multi-site.png" medium="image">
			<media:title type="html">mysql-multi-site</media:title>
		</media:content>

		<media:content url="http://lsimons.files.wordpress.com/2009/10/circles-in-boxes1.png" medium="image">
			<media:title type="html">circles-in-boxes</media:title>
		</media:content>
	</item>
		<item>
		<title>Making Large-scale CSRF protection performant and efficient</title>
		<link>http://lsimons.wordpress.com/2009/09/20/large-scale-csrf/</link>
		<comments>http://lsimons.wordpress.com/2009/09/20/large-scale-csrf/#comments</comments>
		<pubDate>Sun, 20 Sep 2009 15:04:18 +0000</pubDate>
		<dc:creator>Leo Simons</dc:creator>
				<category><![CDATA[Tech]]></category>
		<category><![CDATA[Work]]></category>

		<guid isPermaLink="false">http://lsimons.wordpress.com/?p=358</guid>
		<description><![CDATA[Defining the challenge
Imagine

&#62;&#62; GET /popularPage HTTP/1.1
&#62;&#62; Host: bbc.co.uk

&#60;&#60; HTTP/1.1 200 OK
&#60;&#60; Set-Cookie: BBC-UID=$BBCUIDCOOKIE
&#60;&#60; &#60;form action="poll" method="POST"&#62;
&#60;&#60;   &#60;input type="text" name="comment"/&#62;
&#60;&#60;   &#60;input type="submit"/&#62;
&#60;&#60; &#60;/form&#62;

&#62;&#62; POST /popularPage/poll HTTP/1.1
&#62;&#62; Host: bbc.co.uk
&#62;&#62; Cookie: BBC-UID=$BBCUIDCOOKIE
&#62;&#62; comment=me%20too

&#60;&#60; HTTP/1.1 200 OK

As presented this form is vulnerable to CSRF. Basic requirements:

We need to provide reasonable protection against CSRF
We need to [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=lsimons.wordpress.com&blog=2100343&post=358&subd=lsimons&ref=&feed=1" />]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><h4>Defining the challenge</h4>
<p>Imagine</p>
<pre>
&gt;&gt; GET /popularPage HTTP/1.1
&gt;&gt; Host: bbc.co.uk

&lt;&lt; HTTP/1.1 200 OK
&lt;&lt; Set-Cookie: BBC-UID=$BBCUIDCOOKIE
&lt;&lt; &lt;form action="poll" method="POST"&gt;
&lt;&lt;   &lt;input type="text" name="comment"/&gt;
&lt;&lt;   &lt;input type="submit"/&gt;
&lt;&lt; &lt;/form&gt;

&gt;&gt; POST /popularPage/poll HTTP/1.1
&gt;&gt; Host: bbc.co.uk
&gt;&gt; Cookie: BBC-UID=$BBCUIDCOOKIE
&gt;&gt; comment=me%20too

&lt;&lt; HTTP/1.1 200 OK
</pre>
<p>As presented this form is vulnerable to <a href="http://en.wikipedia.org/wiki/Cross-site_request_forgery">CSRF</a>. Basic requirements:</p>
<ul>
<li>We need to provide reasonable protection against CSRF</li>
<li>We need to do this in a way that is not very intrusive (i.e. this is not a banking app, it is a comment form)</li>
<li>We need to support users that use privacy software that spoofs the Referer header</li>
<li>We need to support users that have disabled javascript or are using a web browser that does not support javascript</li>
<li>We need to support users that have disabled cookies</li>
<li>We need to support users with old known-broken web browsers</li>
<li>We need to support both logged-in and anonymous users</li>
</ul>
<p>So far, this problem is easy: we can use <a href="http://shiflett.org/articles/cross-site-request-forgeries">well-documented</a> techniques based on unique tokens that are generated and checked server-side. We can also choose to improve the situation a little bit more when users&#8217; browsers send Referer headers, accept cookies, run javascript, etc.</p>
<p>But, here is the additional requirement that makes the problem a tad more difficult:</p>
<ul>
<li>We need to do this CSRF protection in an efficient and scalable way (popular BBC pages receive a lot of web traffic).</li>
</ul>
<p>In particular, we should optimize for the fact that most users requesting /popularPage will not submit this form. We also really want to avoid the use of server-side sessions, or any other similar constructs that become hideously expensive when you have multiple data centers each containing many web servers.</p>
<h4>Being smart</h4>
<p>Imagine we have a cheap way to generate lots of tokens that cannot be spoofed, are strongly associated with the user and/or their web browser, and are easy to verify just by looking at the token. Next, imagine that rather than storing the token when it is generated, we only store it once we receive it in a form submission. If we find that the token is already stored, we&#8217;re probably the victim of a CSRF attack, so we have to reject the submission.</p>
<p>It looks roughly like this on the wire:</p>
<pre>
&gt;&gt; GET /popularPage HTTP/1.1
&gt;&gt; Host: bbc.co.uk

&lt;&lt; HTTP/1.1 200 OK
&lt;&lt; Set-Cookie: BBC-UID=$BBCUIDCOOKIE
&lt;&lt; &lt;form action="poll" method="POST"&gt;
&lt;&lt;   &lt;input type="text" name="comment"/&gt;
&lt;&lt;   &lt;input type="hidden" name="bbcuid" value="$BBCUIDCOOKIE"/&gt;
&lt;&lt;   &lt;input type="hidden" name="nonce" value="$NONCE"/&gt;
&lt;&lt;   &lt;input type="submit"/&gt;
&lt;&lt; &lt;/form&gt;

&gt;&gt; POST /popularPage/poll HTTP/1.1
&gt;&gt; Host: bbc.co.uk
&gt;&gt; comment=me%20too&amp;bbcuid=$BBCUIDCOOKIE&amp;nonce=$NONCE

&lt;&lt; HTTP/1.1 200 OK
</pre>
<p>The nonce is calculated using something like this:</p>
<pre>
  BBCUID := COOKIE["BBC-UID"] or PARAM["bbcuid"] or RANDOM_STRING()
  SECRET := "someconstant"
  X := tstamp()
  NONCE := MD5( X, SECRET, BBCUID ) + X
</pre>
<p>To verify the nonce:</p>
<ol>
<li>extract the timestamp X out of the submitted nonce</li>
<li>if X has expired, reject</li>
<li>re-calculate the nonce using the X from the submitted nonce and the submitted BBCUID</li>
<li>if submitted nonce and calculated nonce do not match, reject</li>
<li>if the nonce exists in the seen-nonce-store, reject</li>
<li>store the nonce (with an expiry timestamp) in the seen-nonce-store</li>
</ol>
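<p>The generation pseudocode and the verification steps can be combined into a minimal Java sketch. It assumes the MD5 inputs are joined with <code>':'</code> (the pseudocode does not say how they are combined), that X is milliseconds since the epoch, and that an in-memory map stands in for the seen-nonce-store; <code>CsrfNonces</code> and its methods are hypothetical names. It follows the pseudocode&#8217;s MD5, although a keyed construction such as HMAC would be the more conventional choice for this kind of token.</p>

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the nonce scheme; an in-memory map stands in for the seen-nonce-store. */
final class CsrfNonces {
    private static final int MD5_HEX_LENGTH = 32;

    private final String secret;
    private final long maxAgeMillis;
    private final Map<String, Long> seen = new ConcurrentHashMap<>(); // nonce -> expiry time

    CsrfNonces(String secret, long maxAgeMillis) {
        this.secret = secret;
        this.maxAgeMillis = maxAgeMillis;
    }

    /** NONCE := MD5(X, SECRET, BBCUID) + X, with X = now in milliseconds. */
    String generate(String bbcUid, long now) {
        return md5Hex(now + ":" + secret + ":" + bbcUid) + now;
    }

    /** Runs the verification steps; call it only once the rest of the form has validated. */
    boolean verify(String nonce, String bbcUid, long now) {
        if (nonce == null || nonce.length() <= MD5_HEX_LENGTH) {
            return false;
        }
        long x;
        try {
            x = Long.parseLong(nonce.substring(MD5_HEX_LENGTH));        // 1. extract X
        } catch (NumberFormatException e) {
            return false;
        }
        if (x > now || now - x > maxAgeMillis) {
            return false;                                               // 2. expired (or from the future)
        }
        String expected = md5Hex(x + ":" + secret + ":" + bbcUid) + x;  // 3. re-calculate
        if (!MessageDigest.isEqual(expected.getBytes(StandardCharsets.UTF_8),
                nonce.getBytes(StandardCharsets.UTF_8))) {
            return false;                                               // 4. mismatch
        }
        // 5 + 6: reject if already seen, otherwise remember it until it would expire anyway
        return seen.putIfAbsent(nonce, x + maxAgeMillis) == null;
    }

    /** Forgets nonces that can no longer pass step 2; meant to be called periodically. */
    void purgeExpired(long now) {
        seen.values().removeIf(expiry -> expiry < now);
    }

    private static String md5Hex(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(MD5_HEX_LENGTH);
            for (byte b : digest) {
                hex.append(Character.forDigit((b >> 4) & 0xF, 16))
                   .append(Character.forDigit(b & 0xF, 16));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

<p>Note that nothing is written to the store until a submission passes steps 1 to 4, which is what keeps page views that never submit the form cheap.</p>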
<p>It&#8217;s pretty important to verify the nonce after verifying all the other form data (but before actually processing the form data). That way, if the form fails to validate, you can present the user with the old nonce and they can try again.</p>
<p>Some of my esteemed colleagues came up with the above (unfortunately I don&#8217;t know whom to credit&#8230;) and are coding it up at the moment. I&#8217;ve also thought about it for a while, and I suspect this&#8217;ll work well.</p>
<h4>Using the right kind of storage</h4>
<p>You may have wondered, &#8220;what is this seen-nonce-store you speak of?&#8221;. I have been wondering too, ever since I was asked to suggest a solution <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>We&#8217;re not processing credit card transactions here so we obviously don&#8217;t need <a href="http://en.wikipedia.org/wiki/ACID">ACID</a>. All we&#8217;re doing is protecting against a relatively obscure website attack, which even when actually exploited will not cause that many problems for us (we don&#8217;t even <a href="http://shiflett.org/blog/2007/mar/my-amazon-anniversary">sell books</a>). Our store does have to be fast and available, but we&#8217;ll sacrifice many things to achieve that!</p>
<p>Getting more specific, let&#8217;s say the conceptual API for our imaginary seen-nonce-store is something like this:</p>
<pre>
public interface NonceStore {
  /**
   * Check if a nonce already exists in the store.
   *
   * @param nonce the nonce to check for
   * @param timeout maximum time in milliseconds to spend looking
   *     for the nonce
   * @return true if the nonce was found in the store
   * @throws NonceNotFoundOnTimeException if the timeout is reached
   *     without finishing the existence test
   * @throws TransientException if there was some error that prevents
   *     the existence test from completing
   */
  public boolean exists(String nonce, int timeout)
      throws NonceNotFoundOnTimeException, TransientException;

  /**
   * Store a nonce in the store. This is an asynchronous operation
   * that may fail.
   *
   * @param nonce the calculated nonce to store
   * @param tstamp the timestamp that was used for calculating the nonce
   * @param bbcUid the bbcUid that was used for calculating the nonce
   * @param expireAfter time in minutes to attempt to hold on to
   *     the nonce
   * @return whether the nonce was accepted for storage
   */
  public boolean store(String nonce, long tstamp, String bbcUid,
          int expireAfter);
}

/** Minimal declarations so the interface compiles on its own. */
class NonceNotFoundOnTimeException extends Exception {}
class TransientException extends Exception {}
</pre>
<p>Let&#8217;s further assume this seen-nonce-store is some kind of distributed horizontally scaled-out database that is usually accessed from apache/php over a tcp socket. There are some pretty specific characteristics to optimize for:</p>
<ol>
<li><code>exists()</code> has to be fast</li>
<li><code>exists()</code> should fail rather than be too slow</li>
<li><code>exists()</code> failures should be treated the same as non-existence</li>
<li><code>store()</code> has to be fast</li>
<li><code>store()</code> should ignore duplicate nonces</li>
<li>storage does not have to be consistent</li>
<li>storage does not quite have to be eventually consistent</li>
<li>it is acceptable to lose some data some of the time</li>
<li>it is acceptable to fail to store some data some of the time</li>
</ol>
<p>What we&#8217;re looking for here is some kind of distributed tree with built-in expiry features. Some observations:</p>
<ul>
<li>Not having to support a <code>get()</code> means that we should be able to do better than a generic key/value store.</li>
<li>Being able to lose some data should help when making <code>store()</code> fast. We can probably <code>store()</code> to local memory immediately and persist somewhere else later, asynchronously, in batches.</li>
<li>Needing only very weak consistency should help even more. We can <code>fsync()</code> to disk rather infrequently if at all, and node-to-node replication doesn&#8217;t have to worry too much about ordering, retries, duplicate prevention, or any number of other problems.</li>
<li><code>exists()</code> tests are likely to be heavily biased towards younger nonces. We could perhaps look into keeping old nonces around but in some kind of cheaper storage, etc.</li>
<li>Due to how we load balance, there is likely to be some per-user data center affinity that can be exploited. Replication across data centers seems somewhat sacrifice-able.</li>
<li>If we do not store the <code>bbcUid</code> (nothing says we have to), our storage and network traffic does not have any particular privacy or security concerns beyond making it sufficiently unlikely that replication traffic gets blocked. Things like multicast seem ok, use of SSL seems not needed at all.</li>
</ul>
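<p>The second observation (store to local memory now, persist elsewhere later in batches) could look roughly like the following. This is a hypothetical sketch, not the system described in this post: <code>LocalNonceBuffer</code> keeps nonces and their expiry times in local memory, ignores duplicates, and queues new nonces so that a background task (not shown) can hand them to slower storage by calling <code>flush()</code>. The current time is passed in explicitly to keep the class easy to test.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

// Hypothetical in-memory front for the nonce store: store() only touches local
// memory, and a background task (not shown) calls flush() to persist batches.
final class LocalNonceBuffer {
    private final Map<String, Long> expiresAt = new ConcurrentHashMap<>();
    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();

    // Returns false for a nonce that is already held, so duplicates are ignored.
    boolean store(String nonce, int expireAfterMinutes, long nowMillis) {
        long expiry = nowMillis + expireAfterMinutes * 60_000L;
        if (expiresAt.putIfAbsent(nonce, expiry) != null) {
            return false;
        }
        pending.add(nonce);
        return true;
    }

    boolean exists(String nonce, long nowMillis) {
        Long expiry = expiresAt.get(nonce);
        return expiry != null && expiry > nowMillis;
    }

    // Forget expired nonces; could run from the same background task as flush().
    void evictExpired(long nowMillis) {
        expiresAt.values().removeIf(expiry -> expiry <= nowMillis);
    }

    // Hand up to maxBatch queued nonces to the persister in one call.
    int flush(int maxBatch, Consumer<List<String>> persister) {
        List<String> batch = new ArrayList<>();
        String nonce;
        while (batch.size() < maxBatch && (nonce = pending.poll()) != null) {
            batch.add(nonce);
        }
        if (!batch.isEmpty()) {
            persister.accept(batch);
        }
        return batch.size();
    }
}
```

<p>If the process dies before a flush, the queued nonces are simply lost, which the requirements explicitly allow.</p>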
<h4>Theory vs practice vs future practice</h4>
<p>Due to the difficulty of building and provisioning such optimized pieces of infrastructure, our initial nonce storage system will use a combination of <a href="http://www.danga.com/memcached/">memcache</a> and our in-house distributed key/value store which is built on top of <a href="http://couchdb.apache.org/">CouchDB</a>. They&#8217;re probably not the best hammers for this problem, but they&#8217;re good enough to deal with this particular nail for us, for now.</p>
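<p>The post does not say how memcache and the key/value store are combined. One plausible layering, sketched below with a hypothetical <code>NonceTier</code> interface standing in for both clients, checks the fast tier first, falls back to the durable tier on a miss, and writes new nonces to both. The fixed refill expiry is an assumption, since this interface cannot report how long a nonce has left to live.</p>

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical minimal view of one storage tier (a memcache client, a key/value
// store client, ...); the real client APIs are not shown in the post.
interface NonceTier {
    boolean contains(String nonce);

    void put(String nonce, int expireAfterMinutes);
}

// Read-through layering: ask the fast tier first, fall back to the durable tier.
final class TieredNonceStore {
    private final NonceTier fast;
    private final NonceTier durable;
    private final int refillMinutes; // assumed fixed expiry when re-caching a hit

    TieredNonceStore(NonceTier fast, NonceTier durable, int refillMinutes) {
        this.fast = fast;
        this.durable = durable;
        this.refillMinutes = refillMinutes;
    }

    boolean exists(String nonce) {
        if (fast.contains(nonce)) {
            return true;
        }
        if (durable.contains(nonce)) {
            fast.put(nonce, refillMinutes); // re-cache so the next check stays fast
            return true;
        }
        return false;
    }

    void store(String nonce, int expireAfterMinutes) {
        fast.put(nonce, expireAfterMinutes);
        durable.put(nonce, expireAfterMinutes);
    }
}

// Toy in-memory tier for experiments: ignores expiry and counts lookups.
final class MapTier implements NonceTier {
    final Map<String, Integer> entries = new ConcurrentHashMap<>();
    int lookups = 0;

    @Override
    public boolean contains(String nonce) {
        lookups++;
        return entries.containsKey(nonce);
    }

    @Override
    public void put(String nonce, int expireAfterMinutes) {
        entries.putIfAbsent(nonce, expireAfterMinutes);
    }
}
```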
<p>That said, I think it&#8217;s an interesting challenge to come up with something really solid, and to define the de facto standard for session-less CSRF protection. It&#8217;d be nice if there were something as obvious, maintenance-free, easy to use, and fast as memcached, with a really robust open source C/PHP library on top. Do you have any suggestions of where to start? Have you solved any similar problems? How?</p>
  </div>]]></content:encoded>
			<wfw:commentRss>http://lsimons.wordpress.com/2009/09/20/large-scale-csrf/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/90d726e803a6829d46a165f1729e68b7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">lsimons</media:title>
		</media:content>
	</item>
		<item>
		<title>instanceof vs ClassCastException performance, which is faster?</title>
		<link>http://lsimons.wordpress.com/2009/07/20/instanceof-vs-classcastexception-performance-which-is-faster/</link>
		<comments>http://lsimons.wordpress.com/2009/07/20/instanceof-vs-classcastexception-performance-which-is-faster/#comments</comments>
		<pubDate>Mon, 20 Jul 2009 17:56:47 +0000</pubDate>
		<dc:creator>Leo Simons</dc:creator>
				<category><![CDATA[Java]]></category>

		<guid isPermaLink="false">http://lsimons.wordpress.com/?p=355</guid>
		<description><![CDATA[Say you are writing a for loop for a Java server application. This for loop is iterating over something that in practice (i.e. in production) is always of a specific subtype. But in theory it might not be, so being an average Java developer, you add some defensive checking.
Now since the for loop will be [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>Say you are writing a for loop for a Java server application. This for loop iterates over something that in practice (i.e. in production) is always of a specific subtype. But in theory it might not be, so, being an average Java developer, you add some defensive checking.</p>
<p>Now since the for loop will be run at least once for every request your platform receives, and you receive many millions of requests a week (perhaps you&#8217;re working at a place like the BBC?), you decide to spend 5 minutes making it &#8220;fast enough&#8221;.</p>
<p>So, should you keep the <code>if(!(r instanceof HttpServletRequest))</code> check that you put in, replace it with <code>try { ... } catch(ClassCastException e) { ... }</code>, or remove the check completely?</p>
<p>Just to be clear: this is a really silly question and you should really just not bother spending your brain cycles thinking about it. Java has been heavily optimized over many years; it will be fast enough, and many times faster than all your I/O handling and whatnot.</p>
<p>But now you&#8217;ve asked yourself this question and you really <em>really</em> want to know. You Google it and find no good answer. So you write a microbenchmark:</p>
<pre>
class InstanceOfBenchmark {
    static long globalCounter = 0;
    static long noCheck = 0;
    static long classCast = 0;
    static long instanceOf = 0;

    public static void main(final String[] args) {
        final int loops = 10000;
        final int encounterB = 1000;

        final A[] instances = new A[loops];
        for(int i = 0; i &lt; loops; i++) {
            //if(i % encounterB == 0) {
            //    instances[i] = new B();
            //} else {
            instances[i] = new C();
            //}
        }

        for(int i = 0; i &lt; 1000; i++) {
            testNoop(instances);
        }

        for (int i = 0; i &lt; 1000; i++) {
            testClassCast(instances);
        }

        for (int i = 0; i &lt; 1000; i++) {
            testInstanceOf(instances);
        }

        System.out.println(&quot;globalCounter = &quot; + globalCounter);
        System.out.println(String.format(
                &quot;Time for no-check test: %d&quot;, noCheck));
        System.out.println(String.format(
                &quot;Time for ClassCastException test: %d&quot;, classCast));
        System.out.println(String.format(
                &quot;Time for instanceof test: %d&quot;, instanceOf));
        System.out.println(String.format(
                &quot;instanceof is slower than classCast by %f ms&quot;,
                1.0 * (instanceOf - classCast) / 1000 / 1000 / loops));
        System.out.println(String.format(
                &quot;instanceof is slower than no check by %f ms&quot;,
                1.0 * (instanceOf - noCheck) / 1000 / 1000 / loops));
    }

    static void testNoop(final A[] instances) {
        long start, end;

        start = System.nanoTime();
        for (final A instance : instances) {
            final C c = (C)instance;
            c.otherDoNothing();
        }
        end = System.nanoTime();
        noCheck += end - start;
    }

    static void testClassCast(final A[] instances) {
        long start, end;

        start = System.nanoTime();
        for (final A instance : instances) {
            final C c;
            try {
                c = (C) instance;
                c.otherDoNothing();
            } catch (ClassCastException e) {
                // ignore
            }
        }
        end = System.nanoTime();
        classCast += end - start;
    }

    static void testInstanceOf(final A[] instances) {
        long start, end;

        start = System.nanoTime();
        for (final A instance : instances) {
            final C c;
            if (!(instance instanceof C)) {
                continue;
            }
            c = (C) instance;
            c.otherDoNothing();
        }
        end = System.nanoTime();
        instanceOf += end - start;
    }

    static class A {
        public void doNothing() {
            // does nothing
        }
    }

    static class B extends A {
    }

    static class C extends A {

        public void otherDoNothing() {
            // does almost nothing, but just enough not to be optimized away as a no-op
            globalCounter++;
        }
    }
}
</pre>
<p>Here are some sample results from my machine:</p>
<pre>
globalCounter = 30000000
Time for no-check test: 23981000
Time for ClassCastException test: 21082000
Time for instanceof test: 21551000
instanceof is slower than classCast by 0.000047 ms
instanceof is slower than no check by -0.000243 ms

globalCounter = 30000000
Time for no-check test: 27874000
Time for ClassCastException test: 21886000
Time for instanceof test: 35174000
instanceof is slower than classCast by 0.001329 ms
instanceof is slower than no check by 0.000730 ms

globalCounter = 30000000
Time for no-check test: 26983000
Time for ClassCastException test: 22007000
Time for instanceof test: 22568000
instanceof is slower than classCast by 0.000056 ms
instanceof is slower than no check by -0.000442 ms
</pre>
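<p>So for performance <strong>it really doesn&#8217;t matter</strong>. The printed differences are labelled &#8220;ms&#8221; but are really microseconds per loop iteration (the nanosecond totals are divided by 10<sup>6</sup> and by <code>loops</code>, but not by the 1000 repetitions), so even the largest one amounts to about 1.3 nanoseconds, and <code>instanceof</code> and no-check swap places between runs. It&#8217;s safe to assume that the JVM&#8217;s HotSpot JIT works its way to an equally optimal code path in all cases.</p>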
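<p>If you want to fiddle some more, note that the test above has some commented-out code that lets you vary the success/failure path: uncomment those lines and tweak <code>encounterB</code>. Once some elements are <code>B</code> instances, the unguarded cast in <code>testNoop</code> throws a <code>ClassCastException</code>, so comment out that loop as well. The result is the same in all cases: performance is the same.</p>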
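<p>One caveat about the numbers above: <code>testNoop</code> always runs first, so its total may include JIT warm-up time, which could explain why the no-check variant looks slowest in the sample runs. A hypothetical variant that warms up all three loops before timing, rotates the order in which they run, and reports nanoseconds per element might look like this (for serious measurements, a harness such as OpenJDK&#8217;s JMH is the usual tool):</p>

```java
// Hypothetical re-run of the benchmark above: every variant is warmed up before
// timing starts, and the order rotates so no variant always runs first.
final class RotatingCastBenchmark {
    static long sink = 0; // side effect that keeps the loops from being optimized away

    static class A {
    }

    static class C extends A {
        void touch() {
            sink++;
        }
    }

    static long noCheck(A[] xs) {
        long start = System.nanoTime();
        for (A x : xs) {
            ((C) x).touch();
        }
        return System.nanoTime() - start;
    }

    static long classCast(A[] xs) {
        long start = System.nanoTime();
        for (A x : xs) {
            try {
                ((C) x).touch();
            } catch (ClassCastException e) {
                // ignore
            }
        }
        return System.nanoTime() - start;
    }

    static long instanceOf(A[] xs) {
        long start = System.nanoTime();
        for (A x : xs) {
            if (x instanceof C) {
                ((C) x).touch();
            }
        }
        return System.nanoTime() - start;
    }

    static long time(int variant, A[] xs) {
        switch (variant) {
            case 0: return noCheck(xs);
            case 1: return classCast(xs);
            default: return instanceOf(xs);
        }
    }

    // Average nanoseconds per element for {no check, ClassCastException, instanceof}.
    static double[] run(A[] xs, int warmupRounds, int rounds) {
        for (int r = 0; r < warmupRounds; r++) {
            for (int v = 0; v < 3; v++) {
                time(v, xs); // timing discarded; this pass only gives the JIT a chance to compile
            }
        }
        long[] totals = new long[3];
        for (int r = 0; r < rounds; r++) {
            for (int k = 0; k < 3; k++) {
                int v = (r + k) % 3; // rotate which variant goes first
                totals[v] += time(v, xs);
            }
        }
        double[] nsPerElement = new double[3];
        for (int v = 0; v < 3; v++) {
            nsPerElement[v] = (double) totals[v] / ((long) rounds * xs.length);
        }
        return nsPerElement;
    }

    public static void main(String[] args) {
        A[] xs = new A[10000];
        for (int i = 0; i < xs.length; i++) {
            xs[i] = new C();
        }
        double[] ns = run(xs, 1000, 1000);
        System.out.printf("no check: %.3f ns, ClassCastException: %.3f ns, instanceof: %.3f ns per element%n",
                ns[0], ns[1], ns[2]);
        System.out.println("sink = " + sink);
    }
}
```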
<p>Now with that settled once and for all, I can keep my <code>instanceof</code> and get back to work!</p>
  </div>]]></content:encoded>
			<wfw:commentRss>http://lsimons.wordpress.com/2009/07/20/instanceof-vs-classcastexception-performance-which-is-faster/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/90d726e803a6829d46a165f1729e68b7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">lsimons</media:title>
		</media:content>
	</item>
		<item>
		<title>Administrative stuff to take care of when moving to the UK</title>
		<link>http://lsimons.wordpress.com/2009/07/18/administrative-stuff-to-take-care-of-when-moving-to-the-uk/</link>
		<comments>http://lsimons.wordpress.com/2009/07/18/administrative-stuff-to-take-care-of-when-moving-to-the-uk/#comments</comments>
		<pubDate>Sat, 18 Jul 2009 12:58:26 +0000</pubDate>
		<dc:creator>Leo Simons</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://lsimons.wordpress.com/?p=352</guid>
		<description><![CDATA[As I&#8217;ve found out recently, the UK is not exactly the easiest country in the world for foreigners to move to, and it also isn&#8217;t exactly the most digitized. Navigating the maze of different rules and institutions depends on doing things in the right order. Here is that order:

Get a pre-paid mobile phone &#8211; just [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>As I&#8217;ve found out recently, the UK is not exactly the easiest country in the world for foreigners to move to, and it also isn&#8217;t exactly the most digitized. Navigating the maze of different rules and institutions depends on doing things in the right order. Here is that order:</p>
<ol>
<li>Get a <strong>pre-paid mobile phone</strong> &#8211; just walk into a shop and buy one. Lots of institutions want a UK phone number, and you can&#8217;t get a mobile subscription until you have proof of residency. Also get a pre-paid USB 3G dongle, preferably from Vodafone, because with them your pre-paid bandwidth does not expire after a month.</li>
<li>Get a <strong>residential address</strong> where you can <strong>receive mail</strong>. Lots of institutions require one. It cannot be a business address and it cannot be a post box, so ask a friend or colleague! Alternatively, if you are going for a <a href="http://www.moveflat.co.uk/cgi-bin/Flatsharing.asp">flatshare</a> arrangement, you might be able to get one by paying cash up front, and that becomes your address.</li>
<li>Get a <strong>bank account</strong> with <strong>Barclays</strong>. They seem to be the only bank that can easily deal with foreigners moving to the country. You cannot apply online. Find the <strong>local branch</strong> using the <a href="http://www.barclays.co.uk/">Barclays website</a>, and ring them or walk in for an <strong>appointment</strong>. A lot of the people at the branch will think they cannot help you, but you should usually be able to find at least one account manager who knows that (s)he <em>can</em> help you. For your appointment you will need your UK phone number, your UK address and <strong>two forms of ID</strong> (for example passport and driver&#8217;s license). If you have an ID that has an address on it (it does not have to be your UK address), that helps. Make it clear that you are already employed in the UK, that you get paid well, that your salary will go into this new account, and that you are also happy to put a significant amount of cash in your new account immediately. Be prepared for the appointment to take at least two hours. Make sure to get online banking. You get your bank account number and sort code immediately, and your bank card and online banking details in 5-7 working days.</li>
<li>Register for <strong>online banking</strong> on the Barclays website.</li>
<li>Give your bank details (bank, account number, sort code, local branch) to your employer.</li>
<li>Get a flat or flatshare. <strong><a href="http://www.findaproperty.com/">findaproperty.com</a></strong> is the best site for finding a flat; <strong><a href="http://www.moveflat.co.uk/">moveflat.co.uk</a></strong> is the best site for finding a flatshare. Offering to <strong>pay rent in advance</strong> tends to take care of most housing agent concerns (like you not having a referral from another UK housing agent). You also have to be able to set up a standing order, for which you need your bank account number, sort code, and the address details of your local bank branch.</li>
<li>If you got your own flat, get a <strong>BT landline</strong>. You cannot do this online; instead, <strong>call 0800 800 150</strong>. I called on a Friday at 7:30 pm and was helped quickly. You can order broadband at the same time if you want. Don&#8217;t bother trying to get a phone line anywhere else &#8211; it&#8217;s significantly more painful than going with BT.</li>
<li>Walk into your local Barclays branch to change your address details. Bring your bank card, account details and identification.</li>
<li>Give your new address details to your employer.</li>
<li>Register for <strong><a href="http://en.wikipedia.org/wiki/Council_Tax">council tax</a></strong> with your local council. How to do so depends on the council; check their website. I e-mailed my council who then sent me forms to fill out and send back.</li>
<li>Register for <strong>utilities</strong> (electricity, water, gas). Your landlord or agent should be able to help you figure this out, though utilities are provided commercially, so you have to pick a provider. Again, set up standing orders or direct debit to pay your bills.</li>
<li>If you own a TV, register for a <a href="http://www.tvlicensing.co.uk/">TV Licence</a>. You can do this online. Set up direct debit.</li>
<li>Once your BT phone line is set up, you can opt to get <strong><a href="http://www.sky.com/">Sky</a> satellite TV</strong>. If you live in a house or small block of flats you can do this online; otherwise, check with your landlord whether there is a minidish for your block, and if there is, call Sky at 08442 410 137.</li>
<li>Once you get a utility bill, you can finally get a <strong>mobile phone subscription plan</strong>. When you walk into the shop, bring a utility bill, proof of ID (e.g. a passport), bank and address details, and your landline phone number. (Faking a utility bill is obviously not very hard, and I&#8217;ve heard from people who have done this to get a subscription.)</li>
<li>Give your new phone details to your employer. If you care about your bank being able to call you, pay yet another visit to the local branch.</li>
<li>Once you&#8217;ve received salary in your new bank account at least twice, you can <strong>apply for a credit card</strong>. With Barclays you can do this online as long as you&#8217;re already registered for online banking.</li>
<li>Get a <strong>National Insurance number</strong>. I have yet to try this out. According to <a href="http://www.direct.gov.uk/en/MoneyTaxAndBenefits/BenefitsTaxCreditsAndOtherSupport/BeginnersGuideToBenefits/DG_10014073">this website</a> I have to call <strong>0845 600 0643</strong> (8.00 am to 6.00 pm Monday to Friday) and then go through an interview.</li>
<li>Give your employer your National Insurance number.</li>
</ol>
<p>I was happy to have several colleagues around who had been through the same process, so I got many useful tips from them, but even then I found that it takes quite a bit of calendar time for all these things to get sorted out. Because of that, I recommend starting on the first steps as soon as you know you&#8217;re moving to the UK.</p>
<p>Have you recently moved to the UK? Did you do things the same way, or did you perhaps find a more efficient route? Any hints or tips to share?</p>
  </div>]]></content:encoded>
			<wfw:commentRss>http://lsimons.wordpress.com/2009/07/18/administrative-stuff-to-take-care-of-when-moving-to-the-uk/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/90d726e803a6829d46a165f1729e68b7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">lsimons</media:title>
		</media:content>
	</item>
		<item>
		<title>Unemployed!</title>
		<link>http://lsimons.wordpress.com/2009/05/01/unemployed/</link>
		<comments>http://lsimons.wordpress.com/2009/05/01/unemployed/#comments</comments>
		<pubDate>Fri, 01 May 2009 06:59:19 +0000</pubDate>
		<dc:creator>Leo Simons</dc:creator>
				<category><![CDATA[Work]]></category>

		<guid isPermaLink="false">http://lsimons.wordpress.com/?p=348</guid>
		<description><![CDATA[My last day at Joost was one week ago. On paper, it was yesterday.
Today, I am officially unemployed! Such freedom!  
Next week I&#8217;m starting on a contract for the BBC, in London.
I&#8217;m quite excited about it, also because this means I&#8217;ll be spending the vast majority of my time in London, and maybe I&#8217;ll [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><div id="attachment_349" class="wp-caption alignright" style="width: 170px"><a href="http://www.flickr.com/photos/rudolf_schuba/983646145/"><img class="size-full wp-image-349" title="bigben" src="http://lsimons.files.wordpress.com/2009/05/bigben.jpg?w=160&#038;h=240" alt="Picture of the Big Ben in London" width="160" height="240" /></a><p class="wp-caption-text">Photo of Big Ben in London</p></div>
<p>My last day at <a href="http://www.joost.com/">Joost</a> was one week ago. On paper, it was yesterday.</p>
<p>Today, I am officially unemployed! Such freedom! <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>Next week I&#8217;m starting on a contract for the BBC, in London.</p>
<p>I&#8217;m quite excited about it, also because this means I&#8217;ll be spending the vast majority of my time in London, and maybe I&#8217;ll even move over there permanently! Eek!</p>
<p>Exciting times. <a href="http://lsimons.wordpress.com/2009/03/09/massive-change-is-inevitable/">My mood is eager impatience</a>.</p>
<p><em>(photo by </em><a title="Flickr page for Big Ben Photo" href="http://www.flickr.com/photos/rudolf_schuba/983646145/" target="_blank"><em>Rudolf Schuba</em></a><em>)</em></p>
  </div>]]></content:encoded>
			<wfw:commentRss>http://lsimons.wordpress.com/2009/05/01/unemployed/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/90d726e803a6829d46a165f1729e68b7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">lsimons</media:title>
		</media:content>

		<media:content url="http://lsimons.files.wordpress.com/2009/05/bigben.jpg" medium="image">
			<media:title type="html">bigben</media:title>
		</media:content>
	</item>
	</channel>
</rss>