This post explains how to disable Arbitration when using MySQL Cluster. It gives a case where this could be useful.
First, a piece of advice: you do not want to run MySQL Cluster with arbitration disabled. But if you must, e.g. because of an oversight in your implementation, you can.
Arbitration is very important in MySQL Cluster. It makes sure you don't end up with a Split Brain situation: 2 halves working independently, continuing changing data, making it impossible for them to work together later on.
However, Arbitration comes with a price: you need an extra machine. "Sure, what's the big deal?". It's not that easy when you lack the money, or more problematic, when you lack the real-estate in your rack.
Everyone running MySQL Cluster should know that you should not run the ndb_mgmd on the same machines on which the data node processes, ndbd or ndbmtd, are running. The Management Nodes need to be on a separate machine so it can act as an Arbitrator.
Here's an example why: If you have two hosts A and B and both are running a management and data node process. Host A's ndb_mgmd is currently the Arbitrator. Now unplug host A *BANG*: one data node and the arbitrator down. The other data node on Host B notices this, and tries to figure out if it can continue. So it checks if it can reach the Arbitrator: but it's gone as well! So, the data node on host B goes faithfully down. This all happens in a few seconds, there is no time to elect a new Arbitrator. "Cluster's dead, Jim".
What if you can't get a 3rd machine? There's an option for that.. Data nodes can be configured with setting the Arbitration-option to WaitExternal. This means you will have to develop your own arbitration application or script. How cool is that? Well, it might be cool, but it's a pain in the butt.
[ndbd default] Arbitration = WaitExternal ArbitrationTimeout = 3
What happens with our 2 host setup with above changes: When Host A, which has the Arbitrator, goes down, the data node on Host B will wait for 3 seconds, i.e. ArbitrationTimeout. It will block all incoming transactions, refusing changes. An application, the External Arbitrator, running on Host B (actually on all hosts running MySQL Cluster proceses) has 3 seconds to figure out whether Host B can continue running it's ndbd process(es), or not. In this case, it should find out that Host A is down and that Host B should continue keeping the data available.
"Ah, easy! Problem solved!", you might joyfully exclaim. No, it isn't. It's more complicated than that. What happens when Host A doesn't go down, but both hosts can't see each other due to a network issue between them? Both External Arbitrators would figure out that they need to continue: you end up again with a split brain. So you still need someway to handle that.
At this point, I would like to say: "Goodluck!". Every situation is going to be different. Everyone will have his own External Arbitrator requirements or ways to check if a host or blade chassis is up or not. It's a great option, and it puts you more in control of your MySQL Cluster, but it adds a lot of complexity.
So, my advice: revise and correct your MySQL Cluster setup when you think you need to disable Arbitration.
Last week I was struggling to find an easy way to simulate a troubled Data Node (ndbd process) using MySQL Cluster. It's as simple as pancackes: using the kill command!
To freeze a process you just need to kill the process using the SIGSTOP signal. To let the processes continue, use SIGCONT. Here's an example shell script showing how you would use these two signals on a data node:
# 2010-05-03 08:11:46 [ndbd] INFO -- Angel pid: 542 ndb pid: 543
NDBDPID=`grep 'Angel pid' ndb_3_out.log | tail -n1 | awk '{ print $11 }'`
kill -STOP $NDBDPID
sleep 10
kill -CONT $NDBDPID
I'm using the out-log because the file ndb_3.pid contains only the PID of the Angel process. The sleep command is something variable which you can set as low or as high as you want.
In the above example the script sleeps long enough for data node to fail with an Arbitration Error. If you would set options HeartbeatIntervalDbDb and TimeBetweenWatchDogCheck to a lower value than the default, you would only be able to sleep for a few seconds. The result:
[MgmtSrvr] WARNING -- Node 2: Node 3 missed heartbeat 2 [MgmtSrvr] WARNING -- Node 2: Node 3 missed heartbeat 3 [MgmtSrvr] ALERT -- Node 1: Node 3 Disconnected [MgmtSrvr] ALERT -- Node 1: Node 3 Disconnected [MgmtSrvr] WARNING -- Node 2: Node 3 missed heartbeat 4 [MgmtSrvr] ALERT -- Node 2: Node 3 declared dead due to missed heartbeat [MgmtSrvr] INFO -- Node 2: Communication to Node 3 closed [MgmtSrvr] ALERT -- Node 2: Network partitioning - arbitration required [MgmtSrvr] INFO -- Node 2: President restarts arbitration thread [state=7] [MgmtSrvr] ALERT -- Node 2: Arbitration won - positive reply from node 1 [MgmtSrvr] ALERT -- Node 2: Node 3 Disconnected [MgmtSrvr] INFO -- Node 2: Started arbitrator node 1 [ticket=019b00025cc8aad8] [MgmtSrvr] ALERT -- Node 3: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary error, restart node'.
How is this useful? Well, for simulating a data node which is having problems while having load for example. Maybe you would like to see what happens if you tune the WatchDog or Hearbeat parameters. Or maybe you want to give a demonstration to your management without going through hassel of overloading a disk or CPU or pulling network cables (e.g. for prove of concept).
In any case, I think it's a cool use of the kill-command. One I didn't know of.
You are using MySQL Cluster and crazy enough to digest NDB API? Sick of SQL? Here's a treat: a function to make C/C++ strings ready for inserting into a VARCHAR field. The special thing about them is that the length is prefixed in the first 2 bytes.
void make_ndb_varchar(char *buffer, char *str)
{
int len = strlen(str);
int hlen = (len > 255) ? 2 : 1;
buffer[0] = len & 0xff;
if( len > 255 )
buffer[1] = (len / 256);
strcpy(buffer+hlen, str);
}
Yes, you can use memcpy. Whatever floats your boat.
Lets use this function for a table t1, defined as follows (note: latin1!):
CREATE TABLE t1 ( id INT UNSIGNED NOT NULL, vc VARCHAR(128), vclong VARCHAR(1280), PRIMARY KEY (id) ) ENGINE=NDB DEFAULT CHARSET=latin1
Here is part of the code, simplified for this post:
char vc[128+1]; // Size of 'vc', +1 for length info
char vclong[1280+2]; // Size of 'vclong', +2 for length info
..
make_ndb_varchar(vc, "NDB API kicks ass");
operation->setValue("vc", vc);
..
The above example uses latin1. You could use Unicode, but that would probably mean converting from one encoding to the other using iconv. That's another story.
This post complements Johan Andersson's blog entry. Thanks to my colleagues Mats and Roger who helped me with a silly problem today regarding this function.
A preliminary schedule is now available for the MySQL Conference & Expo 2010 (in Santa Clara, California, USA). I have two talks and a tutorial, currently scheduled as follows:
- MySQL Cluster Tutorial, Mon 08:30 on 12 Apr 2010, together with Andrew Hutchings and Andrew Morgan
- MySQL Cluster: An Introduction, Tue 11:55 on 13 Apr 2010
- Connecting MySQL and Python, Tue 15:05 on 13 Apr 2010
Schedule might change a bit, so I suggest keeping an eye on the conference website.
And please remember: we might have changed colors a few times past 2 years, but we are still the nice and friendly people we were back in MySQL AB!
You're not scared of writing clusterious code and eavesdropping is your favorite pastime at work? You want to know what's going on in your MySQL Cluster but were afraid asking? The MySQL Cluster Management API can help you!
Below you'll find example C-code that will get you started with MGM API. It's rather dull at first, but imagine you, instead of printing the event information, taking action. Imagine you starting another thread where you run some procedure which tells a monitoring system: "Hey! Some error happend!". Imagine you spending countless hours of clusterious fun with this API!
The example listens for events in 4 categories (Statistic, Info, Error and Checkpoint) with the greatest level of 15. The code only prints out what event was received, and it also shows how to interprete the event.category, which proves to be a bit tricky.
#include <stdlib.h>
#include <unistd.h>
#include <mgmapi/mgmapi.h>
int main()
{
NdbMgmHandle handle = ndb_mgm_create_handle();
NdbLogEventHandle logevent;
struct ndb_logevent event;
int res, category;
int timeout = 1000; // in ms
int filter[] = {
15, NDB_MGM_EVENT_CATEGORY_STATISTIC,
15, NDB_MGM_EVENT_CATEGORY_INFO,
15, NDB_MGM_EVENT_CATEGORY_ERROR,
15, NDB_MGM_EVENT_CATEGORY_CHECKPOINT,
0
};
ndb_mgm_set_connectstring(handle,"localhost");
ndb_mgm_connect(handle,0,0,0);
logevent = ndb_mgm_create_logevent_handle(handle, filter);
while(true) {
res = ndb_logevent_get_next(logevent,&event,timeout);
category = event.category + CFG_MIN_LOGLEVEL;
if (category == NDB_MGM_EVENT_CATEGORY_CHECKPOINT) {
printf("We got a checkpoint event!\n");
}
printf("Event type %d category %d\n",
event.type, event.category);
usleep(500); // temporize!
}
ndb_mgm_destroy_handle(&handle);
}
Note that the above is very bare bone without error handling, but it works. Also, run it on the machine which runs your Management Node (ndb_mgmd), or change the connection string in the code.
Compiling can be a bit of a chore, unless you read "Want to compile a MySQL Cluster MGM API application?".
clus·te·ri·ous |ˈkləstərēəs;|
adjective
highly pleasant to MySQL Support Engineers: A clusterious issue a day, keeps the spouse away.
· clusterful : a hard day.
DERIVATIVES
clusteriously adverb
clusteriousness noun
ORIGIN MySQL Support Team: via Sun Microsystems.
Clusterious
noun
a rare variety of MySQL Support Engineers originally cultivated in Sweden.
A week ago we found a workaround for a bug in MySQL Cluster making it impossible to run a management node on MacOS X. Until the bug is fixed, you should use the --nodaemon option for the ndb_mgmd executable. Both MySQL Cluster v6.3 and v7.0 are affected.
Currently, I'm starting the management node like this:
(
cd /opt/mysql/mysql ;
./libexec/ndb_mgmd -f /opt/mysql/config.ini \
--nodaemon 2>/dev/null 1>&2 </dev/null &
)
Obviously, you'll want to change the paths.
Eventually, the bug will get fixed, but until then you got no excuse to not try MySQL Cluster on Mac!
Here is a quick way to compile a simple MGM API application. The example will get the state of all nodes in MySQL Cluster and print whether they are connected or not.
All this without a Makefile, we just want to have some simple example on Linux to see how it works. It's basic, maybe, but sometimes useful to just have a peek.
Requirements! We assume that:
- you installed MySQL Cluster 6.3 or higher, preferably under /usr/local/mysql,
- your cluster is up and shiny,
- and ndb_mgmd runs on the same machine you are compiling the MGM API test application on.
The code, save it in a file called mgmapi_test.cc (from the MySQL manual):
#include <stdlib.h>
#include <mgmapi/mgmapi.h>
int main()
{
NdbMgmHandle handle = ndb_mgm_create_handle();
ndb_mgm_set_connectstring(handle,"localhost");
ndb_mgm_connect(handle,0,0,0);
struct ndb_mgm_cluster_state *state = ndb_mgm_get_status(handle);
for(int i=0; i < state->no_of_nodes; i++)
{
struct ndb_mgm_node_state *node_state= &state->node_states[i];
printf("node with ID=%d ", node_state->node_id);
if(node_state->version != 0)
printf("connected\n");
else
printf("not connected\n");
}
free((void*)state);
ndb_mgm_destroy_handle(&handle);
}
Lets compile it! Note that it goes all on one line, but you could also paste the following lines in a shell script:
(
MYBASE="/usr/local/mysql";
PATH="$MYBASE/bin:$PATH";
g++ `mysql_config --libs` `mysql_config --cflags` -lndbclient \
-I$MYBASE/include/storage/ndb \
-o mgmapi_test mgmapi_test.cc
)
No errors? Lets run it! Setting LD_LIBRARY_PATH doesn't hurt:
(
export LD_LIBRARY_PATH="/usr/local/mysql/lib";
./mgmapi_test
)
The result should be something like this:
node with ID=1 connected
node with ID=2 connected
node with ID=3 connected
node with ID=4 connected
node with ID=10 connected
node with ID=11 not connected
node with ID=12 not connected
node with ID=13 not connected
..
FAQ: Wouldn't it be better to have a Makefile? Absolutely. Will there be problems doing it above? Sure, but the error message should help; start by checking if the locations are OK.
MySQL Cluster 6.3.28 and 7.0.9 introduce the MySQL server option --ndb-wait-setup. This makes sure that clients can not connect the SQL Node when no Data Nodes are available within, by default, 15 seconds. When the timeout is reached, and no Data Nodes are available, the NDB storage engine will be marked as unavailable.
The following will appear in the MySQL server error log when --ndb-wait-setup=30 has been set:
[Note] NDB: NodeID is 10, management server 'ndbsup-priv-1:1406'
[Note] NDB[0]: NodeID: 10, no storage nodes connected (timed out)
[Note] Starting Cluster Binlog Thread
[Note] Event Scheduler: Loaded 0 events
[Note] NDB Binlog: Ndb tables initially read only.
.. 30 seconds later..
[Warning] NDB : Tables not available after 30 seconds.
Consider increasing --ndb-wait-setup value
[Note] /data1/mysql/5.1.39_6.3.28/libexec/mysqld: ready for connections.
Use case: when doing an installation and you start Data and SQL Nodes quickly after each other. Normally, services which connect to a MySQL server (which is connect to a MySQL Cluster) will have failures because NDB tables are not yet available. With --ndb-wait-setup option set, they will not even be able to connect. It could help in some automated install scenarios where you want to make sure clients can't do anything until Data Nodes are available.
Stay tuned for the binaries due first half of November (2009). Source is already available for MySQL Cluster 7.0.9 and 6.3.28.
Replication channel failover is discussed in the MySQL manual and it is quite easy to implement. When the primary replication channel fails, all you need to do is some SQL queries, do a CHANGE MASTER TO, issue a START SLAVE and it should be replicating again.
This fail-over procedure is also very useful if you are upgrading the Master MySQL Cluster mysqld processes doing binary logging. For this you do not need a (controlled) fail-over to the Slave Cluster. You simply point the SQL Node reading events from the Master to another mysqld and switch back when the upgrade is done.
The MySQL manual describes what to do, here is a summary in a few point:
- Stop the Slave SQL Node
- Get the last epoch on this Slave SQL Node
- Using this epoch, get the binary log file and position on the new Master SQL Node
- On the Slave SQL Node, use CHANGE MASTER TO with the file and log position from previous point
- Start the Slave SQL Node, and it should work
Few notes:
- You need to have 2 SQL nodes on the Master MySQL Cluster which are configured to do binary logging. If you haven't, please do ASAP.
- It's good to temporize between steps 2 and 3, like 10 seconds.
- Yes, SQL Node is in this case a mysqld process./li>
The following Python functions are just examples, but hopefully they useful for somebody (full script not provided but it is using MySQL Connector/Python!):
def _get_last_epoch(slave):
c = slave.cursor()
stmt = "SELECT MAX(epoch) AS lastepoch "\
"FROM mysql.ndb_apply_status"
c.execute(stmt)
row = c.fetchone()
c.close()
return row[0]
def _get_binary_log_info(master, lastEpoch):
c = master.cursor()
stmt = "SELECT SUBSTRING_INDEX(File, '/', -1),"\
"Position "\
"FROM mysql.ndb_binlog_index "\
"WHERE epoch > %s "\
"ORDER BY epoch ASC LIMIT 1"
c.execute(stmt, (lastEpoch,))
row = c.fetchone()
c.close()
return row
def _change_master(slave,masterHost=None,
masterPort=None,
logFile=None,logPos=None):
stmtArgs = []
params = []
if masterHost:
stmtArgs.append('MASTER_HOST=%s')
params.append(masterHost)
if masterPort:
stmtArgs.append('MASTER_PORT=%s')
params.append(masterPort)
if logFile:
stmtArgs.append('MASTER_LOG_FILE=%s')
params.append(logFile)
if logPos:
stmtArgs.append('MASTER_LOG_POS=%s')
params.append(logPos)
c = slave.cursor()
stmt = "CHANGE MASTER TO %s" % ', '.join(stmtArgs)
c.execute(stmt, tuple(params))
c.close()
Update 2010-02-26: Updating the text a bit; correcting Python code.
Make sure to specify everytime the --configdir option when starting the ndb_mgmd.
shell> ndb_mgmd -f config.ini --configdir=/path/to/cachedirI'm using the same value as Datadir for the --configdir option.
An elegant solution it is not, IMHO. Maybe it's good to make your own ndb_mgmd_safe?
The LOST_EVENTS incident is reported in the MySQL binary log in two cases. Each time the Slave will try to tell you why. Here is an example output of the SHOW SLAVE STATUS done on the Slave MySQL Cluster (bit edited to fit):
mysql> SHOW SLAVE STATUS\GThe message can be one of the following (MySQL Cluster 7.0.7):
Last_Errno: 1590
Last_Error: The incident LOST_EVENTS occured on the master.
Message: mysqld startup
- mysqld startup: SQL Node (mysqld) acting as Master was started.
- cluster disconnect: SQL Node acting as Master lost connection to its data nodes.
cd /where/binlog/are/
for f in `ls binglog.*`;
do
echo $f
mysqlbinlog $f | grep LOST_EVENTS
done
.. it could have been the title for the next James Bond movie, but no.. It's much more exciting!
The LOST_EVENTS incident is reported each time you are starting a MySQL server which is binary logging and is participating as an SQL Node in MySQL Cluster setup. Slaves reading from these binary logs will stop their I/O Thread (the one getting information from the master) and report this incident as an error (see below for example).
It is very important that this incident is reported and is stopping the replication between MySQL Clusters. If not, you could end up with inconsistent data!
A world without LOST_EVENTS incidents
You have a MySQL Cluster running with two SQL Nodes. One mysqld is binary logging, getting all updates happening in the cluster. You have a Slave MySQL Cluster which has a mysqld reading from this Master and all is replicating just fine, until..
Somebody trips over the wire and the SQL Node doing binary logging goes down. *BANG*. Luckily this person's head didn't hit the rack and is still conscious enough for quickly putting back the power cord into the socket. The machine spins up and runs happily the SQL node. All is good. Well, not really.. Next day you found your mobile silent because your girlfriend had a headache and you missed an alarm. Few hours later you're getting angry reports from customers: they are missing data on the slave?
What the..?
If your Master SQL Node doing binary logging fails, and you still have other SQL Nodes doing updates on data in your Cluster, you might end up with the following situation:
On Master:
mysql> SELECT * FROM t1 ORDER BY c1;
+----+----------+---------------------+
| c1 | serverid | c2 |
+----+----------+---------------------+
| 1 | 1 | 2009-09-17 12:50:39 |
| 2 | 2 | 2009-09-17 12:51:03 |
| 3 | 2 | 2009-09-17 12:51:06 |
| 4 | 1 | 2009-09-17 12:51:09 |
| 5 | 2 | 2009-09-17 13:01:46 | Missing on slave!
| 6 | 2 | 2009-09-17 13:01:47 | Missing on slave!
| 7 | 1 | 2009-09-17 13:02:36 |
+----+----------+---------------------+
On Slave:
mysql> SELECT * FROM t1 ORDER BY c1;
+----+----------+---------------------+
| c1 | serverid | c2 |
+----+----------+---------------------+
| 1 | 1 | 2009-09-17 12:50:39 |
| 2 | 2 | 2009-09-17 12:51:03 |
| 3 | 2 | 2009-09-17 12:51:06 |
| 4 | 1 | 2009-09-17 12:51:09 |
| 7 | 1 | 2009-09-17 13:02:36 |
+----+----------+---------------------+
The above is actual output of what could happen when LOST_EVENTS incident is not written in the binary logs due to a bug.
The Master has 2 more rows, which were inserted on the 2nd SQL Node, while the SQL node doing binary logging was down. The replication picked up after the failing SQL Node came back, but now there is data missing: the Slave misses or lost events. What's in a name, huh!?
A world with LOST_EVENTS incidents
If your SQL Node doing binary logging fails, and restarts, it will report in its binary logs that there might be events lost: LOST_EVENTS incident. The Slave's I/O thread will stop and report an error. Good DBA's will of course monitor Slaves and they will have to act upon this.
Here an example of how the Slave looks like when it receives a LOST_EVENTS incident:
mysql> SHOW SLAVE STATUS\G
..
Slave_IO_Running: Yes
Slave_SQL_Running: No
..
Last_Errno: 1590
Last_Error: The incident LOST_EVENTS occurred on the master..
..
Now, if you had only 1 SQL Node in the Master Cluster, you're done for. You'll have re-initialize your Slave Cluster because you can not guarantee the consistency of the data.
To prevent this, you have to have at least 2 SQL Nodes doing binary logging. This way, if one fails, you can switch to another replication channel. This switching over is explained in the MySQL Manual, and will be material for a next article!
Conclusion
If you got a LOST_EVENTS incident on the Slave, consider yourself (kind of) lucky! Because then you know there could be something wrong, and not find out later that your Slave is inconsistent with the Master.
This article will describe how we start this cluster after a shut down, changing configuration and do a rolling restart.
The configuration
The config.ini stored on the first management node:
[NDBD DEFAULT]
Datadir=/data2/users/geert/cluster/master
NoOfReplicas=2
DataMemory=80M
IndexMemory=10M
[NDB_MGMD DEFAULT]
Datadir=/data2/users/geert/cluster/master
[NDB_MGMD]
Id=1
Hostname = machine-1
[NDB_MGMD]
Id=2
Hostname = machine-2
[NDBD]
Id=3
Hostname = machine-3
[NDBD]
Id=4
Hostname = machine-4
[API]
[API]
[API]
[API]
Starting MySQL Cluster 7.0 with 2 management nodes
We assume that the cluster from previous post was shut down and we need to restart it. Here are the instructions to do so.
We start both management node process the same way, without the options --initial or --reload!
# machine-1
shell> ndb_mgmd --configdir=/path/to/configcache/dir
..NDB Cluster Management Server. mysql-5.1.35 ndb-7.0.7
..Loaded config from '/path/to/configcache/dir/ndb_1_config.bin.1'
# machine-2
shell> ndb_mgmd --configdir=/path/to/configcache/dir
..NDB Cluster Management Server. mysql-5.1.35 ndb-7.0.7
..Loaded config from '/path/to/configcache/dir/ndb_2_config.bin.1'
Data nodes are started on machine-3 and machine-4:
# machine-3
shell> ndbd -c machine-1
..Configuration fetched from 'machine-1', generation: 1
# machine-4
shell> ndbd -c machine-2
..Configuration fetched from 'machine-2', generation: 1
That should bring your cluster back up, ready for some experiments!
Rolling restart after configuration change
Lets assume we want more memory to store data and index information. We change the following in the configuration file config.ini, which you find on the first management node, machine-1:
[NDBD DEFAULT]
DataMemory=160M
IndexMemory=20M
Save your new config.ini and kill the ndb_mgmd process on machine-1, followed by starting it again with the --reload option:
# machine-1
shell> killall ndb_mgmd
shell> ndb_mgmd -f config.ini --reload --configdir=/path/to/configcache/dir
..NDB Cluster Management Server. mysql-5.1.35 ndb-7.0.7
..Loaded config from '/path/to/configcache/dir/ndb_1_config.bin.1'
The above output might be a bit confusing: we started with a changed config.ini but it said it loaded config from the previous cached version. This is normal. It needs to first read the old to know the changes from the new. The real magic is shown in the cluster log ndb_1_cluster.log on machine-1 (simplified for this blog post):
.. Detected change of config.ini on disk, will try to set it
when all ndb_mgmd(s) started. This is the actual diff:
[ndbd(DB)]
NodeId=3
-IndexMemory=10485760
+IndexMemory=20971520
[ndbd(DB)]
NodeId=4
-IndexMemory=10485760
+IndexMemory=20971520
..
Node 2 connected
Starting configuration change, generation: 1
Configuration 2 commited
Config change completed! New generation: 2
Notice that currently in MySQL Cluster 7.0.7 there is bug that when changing 2 parameters, only 1 will show up when the difference is shown in the logs.
On the second management node you'll find in ndb_2_cluster.log something like this:
..Node 2: Node 1 Connected
..
..Configuration 2 commited
Both management nodes have agreed on the same configuration, and both have it binary cached in files named like ndb_2_config.bin.*.
We continue now with restarting the data nodes, while connected to either management node do the following:
# machine-1 or machine-2
shell> ndb_mgm
ndb_mgm> ALL REPORT MEMORY USAGE
Node 3: Data usage is 0%(4 32K pages of total 2560)
Node 3: Index usage is 0%(8 8K pages of total 1312)
Node 4: Data usage is 0%(4 32K pages of total 2560)
Node 4: Index usage is 0%(8 8K pages of total 1312)
ndb_mgm> 3 RESTART
Node 3: Node shutdown initiated
Node 3: Node shutdown completed, restarting, no start.
Node 3 is being restarted
Node 3: Started (version 7.0.7)
ndb_mgm> 4 RESTART
Node 4: Node shutdown initiated
Node 4: Node shutdown completed, restarting, no start.
Node 4 is being restarted
Node 4: Data usage decreased to 0%(0 32K pages of total 5120)
Node 4: Started (version 7.0.7)
ndb_mgm> ALL REPORT MEMORY USAGE
Node 3: Data usage is 0%(6 32K pages of total 5120)
Node 3: Index usage is 0%(8 8K pages of total 2592)
Node 4: Data usage is 0%(6 32K pages of total 5120)
Node 4: Index usage is 0%(8 8K pages of total 2592)
The ALL REPORT MEMORY USAGE output shows that the configuration took effect and a rolling restart was succesful.
We've seen a few people in the community struggling with the new management node features in MySQL Cluster 7.0. To be honnest, we sometimes in MySQL Support, are scratching our heads as well.
This simple how-to will explain how to start from scratch a MySQL Cluster 7.0.7 (or above) with 2 management nodes (and 2 data nodes). This is not a rolling upgrade from MySQL Cluster 6.3.
Here is the basic config.ini we are going to use. Note the Hostname parameters as we are going to use them often:
[NDBD DEFAULT]
Datadir=/data2/users/geert/cluster/master
NoOfReplicas=2
DataMemory=80M
IndexMemory=10M
[NDB_MGMD DEFAULT]
Datadir=/data2/users/geert/cluster/master
[NDB_MGMD]
Id=1
Hostname = machine-1
[NDB_MGMD]
Id=2
Hostname = machine-2
[NDBD]
Id=3
Hostname = machine-3
[NDBD]
Id=4
Hostname = machine-4
[API]
[API]
[API]
[API]
We start with an empty MySQL Cluster: no data, no logs. The config.ini is only on machine-1 (however, it's good to have it on both anyway, making it HA!).
Start the first ndb_mgmd process on host machine-1:
shell> ndb_mgmd -f config.ini \
--configdir=/path/to/empty/dir --initial
..NDB Cluster Management Server. mysql-5.1.35 ndb-7.0.7
..Reading cluster configuration from 'config.ini'
On machine-2 you start the 2nd management as follows:
shell> ndb_mgmd -c machine-1 --ndb-nodeid=2 \
--configdir=/path/to/empty/dir
..NDB Cluster Management Server. mysql-5.1.35 ndb-7.0.7
..Trying to get configuration from other mgmd(s) using
'nodeid=2,machine-1'...
..Connected to 'machine-1'...
Note that starting the second ndb_mgmd is quite different:
- It does not read config.ini, but gets configuration from machine-1.
- It needs to know what node ID it has, because it doesn't read a configuration file.
- There is no --initial option.
At this point, you should have a cluster that looks like this (doing SHOW connected to the first management node on machine-1):
ndb_mgm> SHOW
..
[ndbd(NDB)] 2 node(s)
id=3 (not connected, accepting connect from machine-3)
id=4 (not connected, accepting connect from machine-4)
[ndb_mgmd(MGM)] 2 node(s)
id=1 @machine-1 (mysql-5.1.35 ndb-7.0.7)
id=2 (not connected, accepting connect from machine-2)
..
Important: The fact that you don't see the second management node connect is because you did not start data nodes. Management nodes 'see' each other through connected data nodes!
Now start the data nodes, but for fun, point the 2nd one to the 2nd management node.
On machine-3 you do:
shell> ndbd -c machine-1
..Configuration fetched from 'machine-1', generation: 1
Same on machine-4 but connect to the 2nd management node:
shell> ndbd -c machine-2
..Configuration fetched from 'machine-2', generation: 1
Your MySQL Cluster is now up and running with 2 management and 2 data nodes.
ndb_mgm> SHOW
..
[ndbd(NDB)] 2 node(s)
id=3 @machine-3 (mysql-5.1.35 ndb-7.0.7, Nodegroup: 0, Master)
id=4 @machine-4 (mysql-5.1.35 ndb-7.0.7, Nodegroup: 0)
[ndb_mgmd(MGM)] 2 node(s)
id=1 @machine-1 (mysql-5.1.35 ndb-7.0.7)
id=2 @machine-2 (mysql-5.1.35 ndb-7.0.7)
..
Next we'll do some experimenting changing some parameters and doing a rolling restart, but that's for another article/blog post.
Problem is that when you do a system restart, this ID is reset to 1. If you didn't move your backups from your data nodes to another machine (which is of course something you should do), you might get errors.
A rather easy trick is to provide a timestamp as ID, for example using Unix time.
shell> ID=`date +%s` && ndb_mgm "START BACKUP $ID"
Now your the directories on the data nodes containing the backups will have a time in them, and it will survive system restarts. You can of use of course use any integer as ID: 2^32 as of MySQL 6.3.23 (earlier only 2^31 is supported).
--include-databases=name
Comma separated list of databases to restore.
Example: db1,db3
--exclude-databases=name
Comma separated list of databases to not restore.
Example: db1,db3
--include-tables=name
Comma separated list of tables to restore. Table name
should include database name. Example: db1.t1,db3.t1
--exclude-tables=name
Comma separated list of tables to not restore. Table name
should include database name. Example: db1.t1,db3.t1
To demonstrate in a few examples, lets assume you have the following tables:
mysql> SELECT TABLE_SCHEMA AS `Schema`,TABLE_NAME AS `Table`
FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA LIKE 'db_';
+--------+-------+
| Schema | Table |
+--------+-------+
| db1 | t1 |
| db1 | t2 |
| db1 | t3 |
| db2 | t2 |
| db2 | t3 |
| db3 | t1 |
| db3 | t4 |
+--------+-------+
If you need to restore table db3.t4 and the complete database db2 you should do the following on all data nodes (some important options are omitted!):
shell> ndb_restore [...] -r --include-tables=db3.t4
shell> ndb_restore [...] -r --include-databases=db2
In a similar way you can exclude. For example, if you need to restore all database but db1:
shell> ndb_restore [...] -r --exclude-databases=db1
To exclude the table db3.t1, and restore everything else:
shell> ndb_restore [...] -r --exclude-tables=db3.t1
I helped making the initial patches for this feature and hope there are not to much bugs! Maybe some more suggestions on how to improve it?
Of course, all this is documented in the MySQL manual.
A good feedback later on was to show more usecases. This a good point and I think on working on a talk showing less how it works, but how it can be used. Doing both technical details and use-cases is hard to do in 50 mintes.
I feedback is appriciated!
BTW, slide will eventually come online somewhere..
Take for example the following configuration for your SQL Nodes:
[SQL]
Id=10
Hostname= 10.100.9.6
[SQL]
Id=11
Hostname= 10.100.9.6
[SQL]
Id=12
Hostname= 10.100.9.7
[SQL]
Id=13
Hostname= 10.100.9.7
No further slots are defined, and you setup your two MySQL Servers to use connection pooling:
[mysqld]
ndb-cluster-connection-pool = 2
This will result in the following output of SHOW in the ndb_mgm client tool:
[mysqld(API)] 4 node(s)
id=10 @10.100.9.6 (mysql-5.1.30 ndb-6.3.20)
id=11 @10.100.9.6 (mysql-5.1.30 ndb-6.3.20)
id=12 @10.100.9.7 (mysql-5.1.30 ndb-6.3.20)
id=13 @10.100.9.7 (mysql-5.1.30 ndb-6.3.20)
Two problems:
- You don't have a free slot for restoring backups using ndb_restore, or for using other ndb command line tools.
- If you go single user mode, e.g. doing ALTER TABLE, you can't choose 10 or 11 because those will be used by other MySQL connections.
Solution: leave a free [API] slot available to do administrative tasks. Best is to asign it a dedicated NodeID and Hostname so nobody else can use it but the DBA. It's quite simple, and each MySQL Cluster setup should have this already, even before connection pooling was available.
Connection pooling makes it just harder because when you say 'ENTER SINGLE USER MODE 12', you connect to the host '10.100.9.6', you are not sure you'll get this Node ID at all! Eventually you will, but mind that other MySQL connections can use it too.
Not really my job, but lots of us in MySQL Support are developers. Doing some coding from time to time keeps one sharp!
How it works? Let me show it with some (simplified) examples:
# only restore db1
ndb_restore --include-database="db1" /dir/.../
# only restore db2.t2 (1 table)
ndb_restore --include-tables="db2.t2" /dir/.../
# restore the rest of db2
ndb_restore --include-databases="db2" --exclude-tables="db2.t2" /dir/.../
# restore db3 excluding db1 and db2
ndb_restore --exclude-databases="db1,db2" /dir/.../
This isn't pushed yet, but if anyone has some comments or wants something more, just comment on the bug report.
Now I have to figure out how to make test cases and push it to MySQL Cluster 6.3 or 6.4.
