Running MySQL Cluster without Arbitrator: don’t, but if you have to...

This post explains how to disable Arbitration when using MySQL Cluster, and describes a case where this could be useful.

First, a piece of advice: you do not want to run MySQL Cluster with arbitration disabled. But if you must, e.g. because of an oversight in your implementation, you can.
Arbitration is very important in MySQL Cluster. It makes sure you don’t end up in a Split Brain situation: two halves working independently, both continuing to change data, making it impossible for them to work together again later on.

However, Arbitration comes at a price: you need an extra machine. “Sure, what’s the big deal?” It’s not that easy when you lack the money or, more problematic, when you lack the space in your rack.

Everyone running MySQL Cluster should know that you should not run ndb_mgmd on the same machines as the data node processes (ndbd or ndbmtd). The Management Nodes need to be on a separate machine so they can act as Arbitrator.

Here’s an example of why: suppose you have two hosts, A and B, each running both a management node and a data node process. Host A’s ndb_mgmd is currently the Arbitrator. Now unplug host A *BANG*: one data node and the Arbitrator are down. The data node on host B notices this and tries to figure out whether it can continue. So it checks whether it can reach the Arbitrator: but that’s gone as well! So the data node on host B faithfully shuts down. This all happens within a few seconds; there is no time to elect a new Arbitrator. “Cluster’s dead, Jim”.
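To make the two-host failure concrete, here is a deliberately simplified Python model of the decision the surviving data node faces. It is not MySQL Cluster’s actual algorithm (the real one also reasons about node groups), and the function survives() and its parameters are hypothetical; arbitrator_grants stands for “an Arbitrator was reachable and granted us the win”.

```python
def survives(alive_data_nodes: int, total_data_nodes: int,
             arbitrator_grants: bool) -> bool:
    """Simplified model: may the surviving set of data nodes keep running?

    Hypothetical helper for illustration only; real NDB logic also
    looks at node groups, which this sketch ignores.
    """
    if 2 * alive_data_nodes > total_data_nodes:
        # Clear majority: no arbitration needed.
        return True
    if 2 * alive_data_nodes < total_data_nodes:
        # Minority: shut down to avoid a split brain.
        return False
    # Exactly half: only an Arbitrator can break the tie.
    return arbitrator_grants


# Host A (data node + Arbitrator) is unplugged: host B's data node is left
# with 1 of 2 data nodes and nobody to ask.
print(survives(1, 2, arbitrator_grants=False))  # False: "Cluster's dead, Jim"

# Same failure, but the Arbitrator runs on a separate third machine.
print(survives(1, 2, arbitrator_grants=True))   # True
```

With two data nodes, every single-host failure lands in the “exactly half” case, which is why the Arbitrator must live on a machine that does not fail together with either of them.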

What if you can’t get a third machine? There’s an option for that: data nodes can be configured with the Arbitration option set to WaitExternal. This means you will have to develop your own arbitration application or script. How cool is that? Well, it might be cool, but it’s a pain in the butt.

[ndbd default]
Arbitration = WaitExternal
# ArbitrationTimeout is given in milliseconds
ArbitrationTimeout = 3000

What happens with our two-host setup after the above changes? When Host A, which has the Arbitrator, goes down, the data node on Host B will wait for 3 seconds, i.e. ArbitrationTimeout. During that time it blocks all incoming transactions, refusing changes. An application, the External Arbitrator, running on Host B (in fact on every host running MySQL Cluster processes) has 3 seconds to figure out whether Host B can continue running its ndbd process(es) or not. In this case, it should find out that Host A is down and that Host B should continue keeping the data available.
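A minimal sketch of the probing half of such an External Arbitrator, in Python and assuming nothing about MySQL Cluster’s own interfaces: it treats “Host A is down” as “Host A accepted no TCP connection during a budget shorter than ArbitrationTimeout”. The helper name peer_is_down(), the hostname hostA and the choice of port 1186 (the default ndb_mgmd port) are illustrative assumptions. A refused connection is also counted as “down”, which is a simplification, and actually stopping or keeping the local data node is left out.

```python
import socket
import time


def peer_is_down(host: str, port: int, budget_s: float,
                 attempt_timeout_s: float = 0.5) -> bool:
    """Hypothetical helper: True if host:port accepted no TCP connection
    during budget_s seconds. Keep budget_s below ArbitrationTimeout."""
    deadline = time.monotonic() + budget_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return True  # budget used up without a single answer
        try:
            # Never let a single attempt run past the overall budget.
            with socket.create_connection(
                    (host, port), timeout=min(attempt_timeout_s, remaining)):
                return False  # the peer answered, so it is up
        except OSError:
            # Refused, unreachable, unresolvable or timed out: retry shortly.
            time.sleep(0.1)


# Example (on Host B, with ArbitrationTimeout = 3 seconds):
#   if peer_is_down("hostA", 1186, budget_s=2.0):
#       ...keep the local data node running...
```

Leaving a margin between the probe budget and ArbitrationTimeout gives the arbitrator time to act on its decision before the data node stops waiting.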

“Ah, easy! Problem solved!”, you might joyfully exclaim. No, it isn’t. It’s more complicated than that. What happens when Host A doesn’t go down, but the two hosts can’t see each other because of a network issue between them? Both External Arbitrators would conclude that they need to continue, and you again end up with a split brain. So you still need some way to handle that.
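To show why the naive check breaks down, here is a small Python model comparing a “peer doesn’t answer, so carry on” rule with one possible tie-break policy. Everything here is hypothetical: View, naive_policy() and tie_break_policy() are made-up names, and the inputs (a third reference point such as the shared router, an out-of-band “peer is powered off” signal, a statically preferred host) are assumptions about what your site can observe, not anything MySQL Cluster provides.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class View:
    """What one host's External Arbitrator observes after its data node lost
    contact with the peer data node AND the peer host stopped answering.
    All three fields are hypothetical inputs; collecting them is up to you."""
    gateway_reachable: bool   # a third reference point, e.g. the shared router
    peer_confirmed_off: bool  # out-of-band proof, e.g. chassis/power status
    preferred: bool           # statically chosen winner for ties


def naive_policy(view: View) -> str:
    # "The peer doesn't answer, so it must be dead": always carry on.
    return "keep"


def tie_break_policy(view: View) -> str:
    if not view.gateway_reachable:
        # We can't reach anything else either: we are probably the isolated side.
        return "stop"
    if view.peer_confirmed_off or view.preferred:
        # The peer is provably down, or we are the designated winner.
        return "keep"
    # The peer may still be running behind a broken link: back off.
    return "stop"


def survivors(policy: Callable[[View], str], view_a: View, view_b: View) -> List[str]:
    """Names of the hosts whose data nodes keep running under `policy`."""
    return [name for name, view in (("A", view_a), ("B", view_b))
            if policy(view) == "keep"]


# Network partition: both hosts are up and still reach the gateway,
# but not each other. Host A is the preferred winner.
a = View(gateway_reachable=True, peer_confirmed_off=False, preferred=True)
b = View(gateway_reachable=True, peer_confirmed_off=False, preferred=False)
print(survivors(naive_policy, a, b))      # ['A', 'B']: split brain
print(survivors(tie_break_policy, a, b))  # ['A']
```

Note the trade-off: if Host A really dies and the chassis check cannot confirm it, Host B sees View(True, False, False) and stops, taking the whole cluster down. That errs on the safe side, but it shows how quickly such rules become site-specific.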

At this point, I would like to say: “Good luck!” Every situation is going to be different. Everyone will have their own External Arbitrator requirements, or their own ways to check whether a host or blade chassis is up. It’s a great option, and it puts you more in control of your MySQL Cluster, but it adds a lot of complexity.

So, my advice: revise and correct your MySQL Cluster setup when you think you need to disable Arbitration.

My first thunderstorm shoot

About 2 hours and more than 500 shots later I finally did it: I shot lightning! It takes patience, lots of nerve and luck. I didn’t read up on how to do it beforehand, but in the end I pretty much figured it out. I was using the camera so much that a full battery ran out.

Exposure time is not so important, especially when you are exposed to the light pollution of the city. I kept it short, from 6 to 10 seconds. I tried 30 seconds (the maximum), but it was just too bright.

Thunderstorm over Kraków

What I found is that getting the aperture right is the key. Open it up (a lower f-number) when the storm is farther away, and stop down when it’s closing in. In the end, when the storm reached our block of flats, I set it between f/5.6 and f/6.3. This worked well. I was shooting continuously, one frame every 6 seconds, holding the shutter button down (I didn’t have my remote cable handy).

Thunderstorm over Kraków

Apart from the camera technique, there were a few more challenges, e.g. the real danger as the storm got closer: keeping windows open is no good. The incoming rain was also making everything wet. Great fun, however, even though only 20 shots actually contained lightning.

Thunderstorm over Kraków

My gear at the time of shooting: a Canon 450D with the EF 24-105mm f/4L IS USM lens. I thought about putting my 30D to work too, but I lacked a second tripod.

Check how old your MySQL books are before usage

This is a friendly reminder to check the publication date and the MySQL version covered by your MySQL books before you start hacking, or even posting about limitations. Lots of old books are still going around. Maybe it’s better to destroy them rather than give them to students or newbies.

A few days ago (28 May 2010), for example, we came across a blog post (now removed) that copied a book word for word and discussed MySQL Cluster limitations from years ago. It was funny at first and we had a good laugh, but it’s a bit worrisome. My colleague Matthew posted a rebuttal.

How would you recycle old technical books? It’s not worth giving them to public libraries, and burning them is probably unhealthy. So how would you do it?