Debugging MySQL Cluster installed from RPMs with gdb

This post explains how to debug MySQL Cluster 7.1, installed from the RPM packages, with gdb on a Linux box (Red Hat, Oracle Enterprise Linux, CentOS, ...).

When a data node crashes, lots of information goes into the error log, trace files and out log. However, when you can reproduce the crash, it sometimes makes sense to run the data node in debug mode, or under gdb.

First, using RPMs and a Linux distribution, make sure you have the ‘debuginfo’ package installed. For example, for Red Hat or Oracle Enterprise Linux on a 64-bit machine, this package would be called: MySQL-Cluster-gpl-debuginfo-7.1.15-1.rhel5.x86_64.rpm.

Create a file with the following gdb commands; we will name it ‘ndbd.gdb’:

set pagination off
set logging overwrite on
set logging file ndbd_gdb_backtrace.txt
set logging on
run --foreground -c <YourMGM:1186> --ndb-nodeid=<YourID>
thread apply all bt
set logging off

Note line 5: pass the options to ‘run’ which you usually pass when starting ndbd, but keep the --foreground option.
Note line 3: you can of course save the logging file wherever you want to.

Then all you need to do is run gdb with the ndbd binary and the commands file you just created:

shell> gdb /usr/sbin/ndbd -x ndbd.gdb

A full backtrace of all threads will be available in the logging file when ndbd crashes. You can then upload this file when reporting bugs.

There are probably more options and things you can do, please post them in the comments!

MySQL Cluster: Rotating the log file of the Data Nodes

The MySQL Cluster data nodes create a log file called ndb_<NodeID>_out.log which can become quite big over time. Unlike the cluster logs created by the management nodes, it has no rotation built in. So you have to revert to the basics: copy the file away and truncate the original.

For example, if you want to ‘rotate’ the log file of data node with NodeID 3:

shell> mv ndb_3_out.log.1.gz ndb_3_out.log.2.gz
shell> cp ndb_3_out.log ndb_3_out.log.1
shell> cat /dev/null > ndb_3_out.log
shell> gzip ndb_3_out.log.1

It’s not elegant, and you might lose some entries, but it will help you keep disk usage minimal. If you don’t need the log at all, just line 3 (the cat /dev/null command) would do the trick.

You can use logrotate’s copytruncate directive to achieve something similar.
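For example, a logrotate snippet could look like this (the /var/lib/mysql-cluster path is an assumption; adjust it to your DataDir):

```
/var/lib/mysql-cluster/ndb_*_out.log {
    weekly
    rotate 4
    compress
    missingok
    copytruncate
}
```

As with the manual steps above, copytruncate copies the file and then truncates it in place, so a few entries written in between can still be lost.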

Custom logger for your MySQL Cluster data nodes

The MySQL Cluster data node log files can become very big. The best solution is to actually fix the underlying problem. But if you know what you are doing, you can work around it and filter out the annoying log entries.

An example of ‘annoying’ entries is when you run MySQL Cluster on virtual machines (not good!) and the disks and OS can’t keep up; a few lines from the ndb_X_out.log:

2011-04-03 10:52:31 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck in: Scanning Timers elapsed=100
2011-04-03 10:52:31 [ndbd] INFO     -- timerHandlingLab now: 1301820751642 sent: 1301820751395 diff: 247
2011-04-03 10:52:31 [ndbd] INFO     -- Watchdog: User time: 296  System time: 536
2011-04-03 10:52:31 [ndbd] INFO     -- Watchdog: User time: 296  System time: 536
2011-04-03 10:52:31 [ndbd] WARNING  -- Watchdog: Warning overslept 276 ms, expected 100 ms.
2011-04-03 10:53:33 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck in: Performing Receive elapsed=100
2011-04-03 10:53:33 [ndbd] INFO     -- Watchdog: User time: 314  System time: 571
2011-04-03 10:53:33 [ndbd] INFO     -- timerHandlingLab now: 1301820813839 sent: 1301820813476 diff: 363
2011-04-03 10:53:33 [ndbd] INFO     -- Watchdog: User time: 314  System time: 571

You can’t set the log levels like you would for the cluster logs produced by the management node. However, you can run the data nodes so they write their messages to STDOUT and pipe that into a script:

ndbd --nodaemon 2>&1 | ndbd_logger.py /var/log/ndb_3_out.log &

And here’s the ndbd_logger.py script filtering out the ‘annoying’ messages. Extra candy: it fixes lines which do not have a timestamp!

import sys
from time import strftime

# Messages containing any of these substrings are dropped
FILTERED = (
  'Watchdog',
  'timerHandlingLab',
  'time to complete',
)

def main():
  try:
    log_file = sys.argv[1]
  except IndexError:
    print("Need location for log file (preferably an absolute path)")
    sys.exit(1)

  try:
    fp = open(log_file, 'a')
  except IOError as e:
    print("Failed opening file: %s" % e)
    sys.exit(2)

  while True:
    line = sys.stdin.readline()
    if line == '':
      # EOF: ndbd went away, close up
      break
    line = line.strip()
    if any(f in line for f in FILTERED):
      continue
    # Fix lines which do not have the usual timestamp/severity header
    if '[ndbd]' not in line:
      line = strftime('%Y-%m-%d %H:%M:%S [ndbd] NA       -- ') + line
    fp.write(line + '\n')
    fp.flush()
  fp.write(strftime('%Y-%m-%d %H:%M:%S Closing log\n'))
  fp.close()

if __name__ == '__main__':
  main()

The above script can definitely be improved, but it shows the basics. I particularly like the timestamp fixing.

Running MySQL Cluster without an Arbitrator: don’t, but if you have to..

This post explains how to disable Arbitration when using MySQL Cluster. It gives a case where this could be useful.

First, a piece of advice: you do not want to run MySQL Cluster with arbitration disabled. But if you must, e.g. because of an oversight in your implementation, you can.
Arbitration is very important in MySQL Cluster. It makes sure you don’t end up with a split-brain situation: two halves working independently, each continuing to change data, making it impossible for them to work together later on.

However, Arbitration comes with a price: you need an extra machine. “Sure, what’s the big deal?” It’s not that easy when you lack the money or, more problematic, the real estate in your rack.

Everyone running MySQL Cluster should know that you should not run ndb_mgmd on the same machines on which the data node processes, ndbd or ndbmtd, are running. The management nodes need to be on a separate machine so one of them can act as the Arbitrator.

Here’s an example why. Suppose you have two hosts, A and B, each running a management and a data node process, and host A’s ndb_mgmd is currently the Arbitrator. Now unplug host A *BANG*: one data node and the Arbitrator are down. The data node on host B notices this and tries to figure out whether it can continue, so it checks if it can reach the Arbitrator: but that is gone as well! So the data node on host B dutifully shuts down. This all happens in a few seconds; there is no time to elect a new Arbitrator. “Cluster’s dead, Jim.”

What if you can’t get a third machine? There’s an option for that: data nodes can be configured by setting the Arbitration option to WaitExternal. This means you will have to develop your own arbitration application or script. How cool is that? Well, it might be cool, but it’s also a pain in the butt.

[ndbd default]
Arbitration = WaitExternal
ArbitrationTimeout = 3

What happens with our two-host setup after the above changes: when host A, which has the Arbitrator, goes down, the data node on host B will wait for 3 seconds, i.e. the ArbitrationTimeout. It will block all incoming transactions, refusing changes. An application, the External Arbitrator, running on host B (actually on all hosts running MySQL Cluster processes) has 3 seconds to figure out whether host B can continue running its ndbd process(es), or not. In this case, it should find out that host A is down and that host B should continue keeping the data available.

“Ah, easy! Problem solved!”, you might joyfully exclaim. No, it isn’t; it’s more complicated than that. What happens when host A doesn’t go down, but the two hosts can’t see each other due to a network issue between them? Both External Arbitrators would figure out that they need to continue: you end up with a split brain again. So you still need some way to handle that.
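One common way to break such a tie is a third witness that only one side of a network partition can reach, for example the rack’s gateway. The sketch below is hypothetical (the witness address and the ping-based check are assumptions on my part, not anything MySQL Cluster provides); your External Arbitrator would run something like it within the ArbitrationTimeout:

```python
import subprocess
import sys

# Hypothetical witness: an address only the "healthy" side of a
# network partition can reach, e.g. the rack's gateway.
WITNESS = '192.0.2.1'

def reachable(host, timeout=1):
  """True when a single ping to the host succeeds."""
  return subprocess.call(
      ['ping', '-c', '1', '-W', str(timeout), host],
      stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL) == 0

def decide(witness_up):
  """Only the side still seeing the witness may keep its data nodes."""
  return 'continue' if witness_up else 'shutdown'

if __name__ == '__main__':
  verdict = decide(reachable(WITNESS))
  print(verdict)
  # A real External Arbitrator would now actually stop the local
  # ndbd process(es) on 'shutdown' instead of merely reporting.
  sys.exit(0 if verdict == 'continue' else 1)
```

Note that this only shifts the problem: now the witness must be placed so that at most one partition can ever reach it.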

At this point, I would like to say: “Good luck!” Every situation is going to be different, and everyone will have their own External Arbitrator requirements or ways to check whether a host or blade chassis is up. It’s a great option, and it puts you more in control of your MySQL Cluster, but it adds a lot of complexity.

So, my advice: revise and correct your MySQL Cluster setup when you think you need to disable Arbitration.

Signals to freeze a Data Node: simulating trouble

Last week I was struggling to find an easy way to simulate a troubled data node (ndbd process) in MySQL Cluster. It’s as simple as pancakes: use the kill command!

To freeze a process you just need to kill the process using the SIGSTOP signal. To let the processes continue, use SIGCONT. Here’s an example shell script showing how you would use these two signals on a data node:

# 2010-05-03 08:11:46 [ndbd] INFO     -- Angel pid: 542 ndb pid: 543
NDBDPID=`grep 'Angel pid' ndb_3_out.log | tail -n1 | awk '{ print $11 }'`
kill -STOP $NDBDPID
sleep 10
kill -CONT $NDBDPID

I’m using the out-log because the file ndb_3.pid contains only the PID of the Angel process. The sleep duration is variable; you can set it as low or as high as you want.

In the above example the script sleeps long enough for the data node to fail with an Arbitration Error. If you set the options HeartbeatIntervalDbDb and TimeBetweenWatchDogCheck to lower values than the default, a sleep of just a few seconds would already be enough. The result:

 [MgmtSrvr] WARNING  -- Node 2: Node 3 missed heartbeat 2
 [MgmtSrvr] WARNING  -- Node 2: Node 3 missed heartbeat 3
 [MgmtSrvr] ALERT    -- Node 1: Node 3 Disconnected
 [MgmtSrvr] ALERT    -- Node 1: Node 3 Disconnected
 [MgmtSrvr] WARNING  -- Node 2: Node 3 missed heartbeat 4
 [MgmtSrvr] ALERT    -- Node 2: Node 3 declared dead due to missed heartbeat
 [MgmtSrvr] INFO     -- Node 2: Communication to Node 3 closed
 [MgmtSrvr] ALERT    -- Node 2: Network partitioning - arbitration required
 [MgmtSrvr] INFO     -- Node 2: President restarts arbitration thread [state=7]
 [MgmtSrvr] ALERT    -- Node 2: Arbitration won - positive reply from node 1
 [MgmtSrvr] ALERT    -- Node 2: Node 3 Disconnected
 [MgmtSrvr] INFO     -- Node 2: Started arbitrator node 1 [ticket=019b00025cc8aad8]
 [MgmtSrvr] ALERT    -- Node 3: Forced node shutdown completed. 
  Caused by error 2305:  'Node lost connection to other nodes and can not
  form a unpartitioned cluster, please investigate if there are error(s)
  on other node(s)(Arbitration error). Temporary error, restart node'.

How is this useful? Well, for simulating a data node which is having problems while under load, for example. Maybe you would like to see what happens when you tune the WatchDog or Heartbeat parameters, or maybe you want to give a demonstration to your management without going through the hassle of overloading a disk or CPU or pulling network cables (e.g. for a proof of concept).

In any case, I think it’s a cool use of the kill command. One I didn’t know of.
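The same STOP/CONT pair is easy to drive from Python too. Here is a minimal sketch; it uses a `sleep` child process as a stand-in for ndbd and reads the process state from /proc, so it is Linux-only:

```python
import os
import signal
import subprocess
import time

def proc_state(pid):
  """Return the one-letter state from /proc/<pid>/stat ('T' = stopped)."""
  with open('/proc/%d/stat' % pid) as f:
    # The comm field sits in parentheses and may contain spaces,
    # so split on the closing parenthesis instead of whitespace.
    return f.read().rsplit(')', 1)[1].split()[0]

def freeze(pid, seconds):
  """SIGSTOP a process, wait, then SIGCONT it; return its state while frozen."""
  os.kill(pid, signal.SIGSTOP)
  time.sleep(seconds)
  state = proc_state(pid)
  os.kill(pid, signal.SIGCONT)
  return state

if __name__ == '__main__':
  # 'sleep' stands in here for the ndbd PID you would really target
  child = subprocess.Popen(['sleep', '60'])
  print(freeze(child.pid, 2))
  child.kill()
```

Running it prints 'T' while the child is frozen; point `freeze()` at the ndbd PID from the out-log instead and you have the shell script above in script-friendly form.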