MySQL Cluster: replication failover (+ some Python)

Replication channel failover is discussed in the MySQL manual and it is quite easy to implement. When the primary replication channel fails, all you need to do is some SQL queries, do a CHANGE MASTER TO, issue a START SLAVE and it should be replicating again.

This fail-over procedure is also very useful if you are upgrading the Master MySQL Cluster mysqld processes doing binary logging. For this you do not need a (controlled) fail-over to the Slave Cluster. You simply point the SQL Node reading events from the Master to another mysqld and switch back when the upgrade is done.

The MySQL manual describes what to do, here is a summary in a few point:

  1. Stop the Slave SQL Node
  2. Get the last epoch on this Slave SQL Node
  3. Using this epoch, get the binary log file and position on the new Master SQL Node
  4. On the Slave SQL Node, use CHANGE MASTER TO with the file and log position from previous point
  5. Start the Slave SQL Node, and it should work

Few notes:

  • You need to have 2 SQL nodes on the Master MySQL Cluster which are configured to do binary logging. If you haven’t, please do ASAP.
  • It’s good to temporize between steps 2 and 3, like 10 seconds.
  • Yes, SQL Node is in this case a mysqld process.

The following Python functions are just examples, but hopefully they useful for somebody (full script not provided but it is using MySQL Connector/Python!):

def _get_last_epoch(slave):
  c = slave.cursor()
  stmt = "SELECT MAX(epoch) AS lastepoch FROM mysql.ndb_apply_status"
  c.execute(stmt)
  row = c.fetchone()
  c.close()
  return row[0]

def _get_binary_log_info(master, lastEpoch):
  c = master.cursor()
  stmt = ("SELECT SUBSTRING_INDEX(File, '/', -1),"
    "Position "
    "FROM mysql.ndb_binlog_index "
    "WHERE epoch > %s "
    "ORDER BY epoch ASC LIMIT 1")
  c.execute(stmt, (lastEpoch,))
  row = c.fetchone()
  c.close()
  return row

def _change_master(slave,masterHost=None,
                   masterPort=None,
                   logFile=None,logPos=None):
  stmtArgs = []
  params = []
  if masterHost:
    stmtArgs.append('MASTER_HOST=%s')
    params.append(masterHost)
  if masterPort:
    stmtArgs.append('MASTER_PORT=%s')
    params.append(masterPort)
  if logFile:
    stmtArgs.append('MASTER_LOG_FILE=%s')
    params.append(logFile)
  if logPos:
    stmtArgs.append('MASTER_LOG_POS=%s')
    params.append(logPos)

  c = slave.cursor()
  stmt = "CHANGE MASTER TO %s" % ', '.join(stmtArgs)
  c.execute(stmt, tuple(params))
  c.close()

Update 2010-02-26: Updating the text a bit; correcting Python code.