Geert JM Vanderkelen

Refactored: Poor man’s MySQL replication monitoring

Refactored: Poor man’s MySQL replication monitoring

This is a reply to the blog post Poor man’s MySQL replication monitoring. Haidong Ji had a few problems using MySQLdb (could use the ‘dict’ cursor) and apparently he doesn’t want to much dependencies. I agree that using the mysql client tool is a nice alternative if you don’t want to use any 3rd party Python modules. And the MySQL client tools are usually and should be installed with the server.

However, since MySQL Connector/Python only needs itself and Python, dependencies are reduced to a minimum. Here you’ll find a refactored version of Haidong’s version (can of course be made much more sophisticated) using the connector:

import sys
from socket import gethostname
import smtplib
import mysql.connector

emailSubject = "Replication problem on slave %s"
emailTo = "recipient@example.com"
emailFrom = "monitor-tool@example.com"

def runCmd(cmd):
    cnx = mysql.connector.connect(user='root',
                                  unix_socket='/path/to/mysql.sock')
    cur = cnx.cursor(buffered=True)
    cur.execute(cmd)
    columns = tuple( [d[0].decode('utf8') for d in cur.description] )
    row = cur.fetchone()
    if row is None:
        raise StandardError("MySQL Server not configured as Slave")
    result = dict(zip(columns, row))
    cur.close()
    cnx.close()
    return result

try:
    slave_status = runCmd("SHOW SLAVE STATUS")
except mysql.connector.Error, e:
    print >> sys.stderr, "There was a MySQL error:", e
    sys.exit(1)
except StandardError, e:
    print >> sys.stderr, "There was an error:", e
    sys.exit(1)
    
if (slave_status['Slave_IO_Running'] == 'Yes' and
    slave_status['Slave_SQL_Running'] == 'Yes' and
    slave_status['Last_Errno'] == 0):
    print "Cool"
else:
    emailBody = [
        "From: %s" % emailFrom,
        "To: %s" % emailTo,
        "Subject: %s" % (emailSubject %  gethostname()),
        "",
        '\n'.join([ k + ' : ' + str(v) for k,v in slave_status.iteritems()]),
        "\r\n",
        ]
    server = smtplib.SMTP("localhost")
    server.sendmail(emailFrom, [emailTo], '\r\n'.join(emailBody))
    server.quit()

4 thoughts on “Refactored: Poor man’s MySQL replication monitoring

  1. Joel Chaney

    @Joel Chaney
    Pushed the wrong code. use this one.

    #!/bin/sh
    ## Joel Chaney##
    ## joel.chaney@mongoosemetrics.com (look at robots.txt) ##
    ## 2012-02-03 ##

    repeat_alert_interval=30 # minutes for lock file life
    lock_file=/tmp/slave_alert.lck # location of lock file

    EMAIL=YOURNAME@YOURCOMPANY.DOM # where to send alerts
    SSTATUS=/tmp/sstatus # location of sstatus file

    ### Code — do not edit below ##
    NODE=`uname -n`
    ## Check if alert is locked ##
    function check_alert_lock () {
    if [ -f $lock_file ] ; then
    current_file=`find $lock_file -cmin -$repeat_alert_interval`
    if [ -n "$current_file" ] ; then
    # echo “Current lock file found”
    return 1
    else
    # echo “Expired lock file found”
    rm $lock_file
    return 0
    fi
    else
    touch $lock_file
    return 0
    fi
    }

    SLAVE=mysql

    $SLAVE -e ‘SHOW SLAVE STATUS\G’ > $SSTATUS

    function extract_value {
    FILENAME=$1
    VAR=$2
    grep -w $VAR $FILENAME | awk ‘{print $2}’
    }

    Master_Binlog=$(extract_value $SSTATUS Master_Log_File )
    Master_Position=$(extract_value $SSTATUS Read_Master_Log_Pos )
    Master_Host=$(extract_value $SSTATUS Master_Host)
    Master_Port=$(extract_value $SSTATUS Master_Port)
    Master_Log_File=$(extract_value $SSTATUS Master_Log_File)
    Read_Master_Log_Pos=$(extract_value $SSTATUS Read_Master_Log_Pos)
    Slave_IO_Running=$(extract_value $SSTATUS Slave_IO_Running)
    Slave_SQL_Running=$(extract_value $SSTATUS Slave_SQL_Running)
    Slave_ERROR=$(extract_value $SSTATUS Last_Error)

    ERROR_COUNT=0
    if [ "$Master_Binlog" != "$Master_Log_File" ]
    then
    ERRORS[$ERROR_COUNT]=”master binlog ($Master_Binlog) and Master_Log_File ($Master_Log_File) differ”
    ERROR_COUNT=$(($ERROR_COUNT+1))
    fi

    POS_DIFFERENCE=$(echo ${Master_Position}-${Read_Master_Log_Pos}|bc)

    if [ $POS_DIFFERENCE -gt 1000 ]
    then
    ERRORS[$ERROR_COUNT]=”The slave is lagging behind of $POS_DIFFERENCE”
    ERROR_COUNT=$(($ERROR_COUNT+1))
    fi

    if [ "$Slave_IO_Running" == "No" ]
    then
    ERRORS[$ERROR_COUNT]=”Replication is stopped”
    ERROR_COUNT=$(($ERROR_COUNT+1))
    fi

    if [ "$Slave_SQL_Running" == "No" ]
    then
    ERRORS[$ERROR_COUNT]=”Replication (SQL) is stopped”
    ERROR_COUNT=$(($ERROR_COUNT+1))
    fi

    if [ $ERROR_COUNT -gt 0 ]
    then
    if [ check_alert_lock == 0 ]
    then
    SUBJECT=”${NODE}-ERRORS in replication”
    BODY=”
    CNT=0
    while [ "$CNT" != "$ERROR_COUNT" ]
    do
    BODY=”$BODY ${ERRORS[$CNT]}”
    CNT=$(($CNT+1))
    done
    BODY=$BODY” \n${Slave_ERROR}”
    echo $BODY | mail -s “$SUBJECT” $EMAIL
    fi
    else
    echo “Replication OK”
    fi

Leave a Reply

Your email address will not be published. Required fields are marked *

56 − 51 =

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>