Refactored: Poor man’s MySQL replication monitoring

This is a reply to the blog post Poor man’s MySQL replication monitoring. Haidong Ji ran into a few problems using MySQLdb (he could have used the dictionary cursor) and apparently he doesn’t want too many dependencies. I agree that using the mysql client tool is a nice alternative if you don’t want to use any 3rd-party Python modules, and the MySQL client tools usually are (and should be) installed alongside the server.

However, since MySQL Connector/Python needs nothing but Python itself, dependencies are reduced to a minimum. Here you’ll find a refactored version of Haidong’s script (which can of course be made much more sophisticated) using the connector:

import sys
from socket import gethostname
import smtplib
import mysql.connector

emailSubject = "Replication problem on slave %s"
emailTo = "recipient@example.com"
emailFrom = "monitor-tool@example.com"

def runCmd(cmd):
    cnx = mysql.connector.connect(user='root',
                                  unix_socket='/path/to/mysql.sock')
    cur = cnx.cursor(buffered=True)
    cur.execute(cmd)
    # The first element of each DB-API description tuple is the column name
    columns = tuple(d[0] for d in cur.description)
    row = cur.fetchone()
    cur.close()
    cnx.close()
    if row is None:
        raise RuntimeError("MySQL Server not configured as Slave")
    return dict(zip(columns, row))

try:
    slave_status = runCmd("SHOW SLAVE STATUS")
except mysql.connector.Error as e:
    print("There was a MySQL error:", e, file=sys.stderr)
    sys.exit(1)
except RuntimeError as e:
    print("There was an error:", e, file=sys.stderr)
    sys.exit(1)

if (slave_status['Slave_IO_Running'] == 'Yes' and
    slave_status['Slave_SQL_Running'] == 'Yes' and
    slave_status['Last_Errno'] == 0):
    print("Cool")
else:
    emailBody = [
        "From: %s" % emailFrom,
        "To: %s" % emailTo,
        "Subject: %s" % (emailSubject % gethostname()),
        "",
        '\n'.join(k + ' : ' + str(v) for k, v in slave_status.items()),
        "\r\n",
        ]
    server = smtplib.SMTP("localhost")
    server.sendmail(emailFrom, [emailTo], '\r\n'.join(emailBody))
    server.quit()
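The dict(zip(columns, row)) idiom in runCmd() is what stands in for MySQLdb’s dictionary cursor (recent Connector/Python versions also offer cnx.cursor(dictionary=True) directly). With made-up column names and values, the idiom works like this:

```python
# Standalone sketch of the dict(zip(...)) trick: pair DB-API column
# names with the values of one fetched row. The data here is fabricated.
columns = ('Slave_IO_Running', 'Slave_SQL_Running', 'Last_Errno')
row = ('Yes', 'No', 1062)

slave_status = dict(zip(columns, row))

print(slave_status['Slave_SQL_Running'])  # → No
print(slave_status['Last_Errno'])         # → 1062
```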

Custom logger for your MySQL Cluster data nodes

The MySQL Cluster data node log files can become very big. The best solution is to actually fix the underlying problem. But if you know what you are doing, you can work around it and filter out these annoying log entries.

An example of ‘annoying’ entries is when you run MySQL Cluster on virtual machines (not a good idea!) and the disks and OS can’t keep up; a few lines from ndb_X_out.log:

2011-04-03 10:52:31 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck in: Scanning Timers elapsed=100
2011-04-03 10:52:31 [ndbd] INFO     -- timerHandlingLab now: 1301820751642 sent: 1301820751395 diff: 247
2011-04-03 10:52:31 [ndbd] INFO     -- Watchdog: User time: 296  System time: 536
2011-04-03 10:52:31 [ndbd] INFO     -- Watchdog: User time: 296  System time: 536
2011-04-03 10:52:31 [ndbd] WARNING  -- Watchdog: Warning overslept 276 ms, expected 100 ms.
2011-04-03 10:53:33 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck in: Performing Receive elapsed=100
2011-04-03 10:53:33 [ndbd] INFO     -- Watchdog: User time: 314  System time: 571
2011-04-03 10:53:33 [ndbd] INFO     -- timerHandlingLab now: 1301820813839 sent: 1301820813476 diff: 363
2011-04-03 10:53:33 [ndbd] INFO     -- Watchdog: User time: 314  System time: 571

You can’t set the log levels like you would for the cluster logs produced by the management node. However, you can run the data nodes so that they write their messages to STDOUT, and pipe that into a script:

ndbd --nodaemon 2>&1 | python ndbd_logger.py /var/log/ndb_3_out.log &

And here’s the ndbd_logger.py script that filters out the ‘annoying’ messages. Extra candy: it adds a timestamp to lines that don’t have one!

import sys
from time import strftime

# Substrings identifying the log entries we want to drop
FILTERED = (
  'Watchdog',
  'timerHandlingLab',
  'time to complete',
)

def main():
  try:
    log_file = sys.argv[1]
  except IndexError:
    print("Need location for log file (preferably an absolute path)")
    sys.exit(1)

  try:
    fp = open(log_file, 'a')
  except IOError as e:
    print("Failed opening file: %s" % e)
    sys.exit(2)

  while True:
    line = sys.stdin.readline()
    if line == '':
      break  # EOF: ndbd closed its end of the pipe
    line = line.strip()
    if any(f in line for f in FILTERED):
      continue  # drop the 'annoying' entries
    if '[ndbd]' not in line:
      # Prefix lines that lack a timestamp
      line = strftime('%Y-%m-%d %H:%M:%S [ndbd] NA       -- ') + line
    fp.write(line + '\n')
    fp.flush()
  fp.write(strftime('%Y-%m-%d %H:%M:%S Closing log\n'))
  fp.close()

if __name__ == '__main__':
  main()

The above script can definitely be improved, but it shows the basics. I particularly like the timestamp fixing.
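To see the filtering in isolation: applying the same substring check to a few of the sample entries from above keeps only the ‘stuck in’ warning.

```python
# Standalone sketch of the filter: a line is dropped if it contains
# any of the FILTERED substrings. Sample lines taken from the log above.
FILTERED = ('Watchdog', 'timerHandlingLab', 'time to complete')

lines = [
    '2011-04-03 10:52:31 [ndbd] WARNING  -- Ndb kernel thread 0 is stuck in: Scanning Timers elapsed=100',
    '2011-04-03 10:52:31 [ndbd] INFO     -- timerHandlingLab now: 1301820751642 sent: 1301820751395 diff: 247',
    '2011-04-03 10:52:31 [ndbd] INFO     -- Watchdog: User time: 296  System time: 536',
]

kept = [l for l in lines if not any(f in l for f in FILTERED)]
print(len(kept))  # → 1 (only the 'stuck in' warning survives)
```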