This is a reply to the blog post Poor man’s MySQL replication monitoring. Haidong Ji had a few problems using MySQLdb (could use the ‘dict’ cursor) and apparently he doesn’t want to much dependencies. I agree that using the [mysql](http://dev.mysql.com/doc/refman/5.5/en/mysql.html)
client tool is a nice alternative if you don’t want to use any 3rd party Python modules. And the MySQL client tools are usually and should be installed with the server.
However, since MySQL Connector/Python only needs itself and Python, dependencies are reduced to a minimum. Here you’ll find a refactored version of Haidong’s version (can of course be made much more sophisticated) using the connector:
import sys
from socket import gethostname
import smtplib
import mysql.connector
emailSubject = "Replication problem on slave %s"
emailTo = "recipient@example.com"
emailFrom = "monitor-tool@example.com"
def runCmd(cmd):
cnx = mysql.connector.connect(user='root',
unix_socket='/path/to/mysql.sock')
cur = cnx.cursor(buffered=True)
cur.execute(cmd)
columns = tuple( [d[0].decode('utf8') for d in cur.description] )
row = cur.fetchone()
if row is None:
raise StandardError("MySQL Server not configured as Slave")
result = dict(zip(columns, row))
cur.close()
cnx.close()
return result
try:
slave_status = runCmd("SHOW SLAVE STATUS")
except mysql.connector.
Error, e:
print >> sys.stderr, "There was a MySQL error:", e
sys.exit(1)
except StandardError, e:
print >> sys.stderr, "There was an error:", e
sys.exit(1)
if (slave_status['Slave_IO_Running'] == 'Yes' and
slave_status['Slave_SQL_Running'] == 'Yes' and
slave_status['Last_Errno'] == 0):
print "Cool"
else:
emailBody = [
"From: %s" % emailFrom,
"To: %s" % emailTo,
"Subject: %s" % (emailSubject % gethostname()),
"",
'\n'.join([ k + ' : ' + str(v) for k,v in slave_status.iteritems()]),
"\r\n",
]
server = smtplib.
SMTP("localhost")
server.sendmail(emailFrom, [emailTo], '\r\n'.join(emailBody))
server.quit()
Comments
@Joel Chaney Pushed the wrong code. use this one.
#!/bin/sh
Joel Chaney
joel.chaney@mongoosemetrics.com (look at robots.txt)
2012-02-03
repeat_alert_interval=30 # minutes for lock file life lock_file=/tmp/slave_alert.lck # location of lock file
EMAIL=YOURNAME@YOURCOMPANY.DOM # where to send alerts SSTATUS=/tmp/sstatus # location of sstatus file
Code -- do not edit below
NODE=
uname -n
Check if alert is locked
function check_alert_lock () { if [ -f $lock_file ] ; then current_file=
find $lock_file -cmin -$repeat_alert_interval
if [ -n “$current_file” ] ; then # echo “Current lock file found” return 1 else # echo “Expired lock file found” rm $lock_file return 0 fi else touch $lock_file return 0 fi }SLAVE=mysql
$SLAVE -e ‘SHOW SLAVE STATUS\G’ > $SSTATUS
function extract_value { FILENAME=$1 VAR=$2 grep -w $VAR $FILENAME | awk ‘{print $2}’ }
Master_Binlog=$(extract_value $SSTATUS Master_Log_File ) Master_Position=$(extract_value $SSTATUS Read_Master_Log_Pos ) Master_Host=$(extract_value $SSTATUS Master_Host) Master_Port=$(extract_value $SSTATUS Master_Port) Master_Log_File=$(extract_value $SSTATUS Master_Log_File) Read_Master_Log_Pos=$(extract_value $SSTATUS Read_Master_Log_Pos) Slave_IO_Running=$(extract_value $SSTATUS Slave_IO_Running) Slave_SQL_Running=$(extract_value $SSTATUS Slave_SQL_Running) Slave_ERROR=$(extract_value $SSTATUS Last_Error)
ERROR_COUNT=0 if [ “$Master_Binlog” != “$Master_Log_File” ] then ERRORS[$ERROR_COUNT]=“master binlog ($Master_Binlog) and Master_Log_File ($Master_Log_File) differ” ERROR_COUNT=$(($ERROR_COUNT+1)) fi
POS_DIFFERENCE=$(echo ${Master_Position}-${Read_Master_Log_Pos}|bc)
if [ $POS_DIFFERENCE -gt 1000 ] then ERRORS[$ERROR_COUNT]=“The slave is lagging behind of $POS_DIFFERENCE” ERROR_COUNT=$(($ERROR_COUNT+1)) fi
if [ “$Slave_IO_Running” == “No” ] then ERRORS[$ERROR_COUNT]=“Replication is stopped” ERROR_COUNT=$(($ERROR_COUNT+1)) fi
if [ “$Slave_SQL_Running” == “No” ] then ERRORS[$ERROR_COUNT]=“Replication (SQL) is stopped” ERROR_COUNT=$(($ERROR_COUNT+1)) fi
if [ $ERROR_COUNT -gt 0 ] then if [ check_alert_lock == 0 ] then SUBJECT=“${NODE}-ERRORS in replication” BODY=” CNT=0 while [ “$CNT” != “$ERROR_COUNT” ] do BODY=“$BODY ${ERRORS[$CNT]}” CNT=$(($CNT+1)) done BODY=$BODY” \n${Slave_ERROR}” echo $BODY | mail -s “$SUBJECT” $EMAIL fi else echo “Replication OK” fi
…