Poor man’s MySQL replication monitoring


Using MySQL replication slave(s) for reporting (with potentially different storage engines) is a very popular way of scaling database read activities. As usual, you want to be on top of things when replication breaks so end users can be notified and issues addressed. When Nagios, Zabbix, or whatever monitoring tools are not available or otherwise not accessible, there got to be another way. I wrote the following Python script for that purpose. The script can then be scheduled via crontab to check replication status in an interval you define. When things break, you get a notification email.

Notes:
1. I toyed with MySQLdb Python module for this task, but I don’t like the fact that I cannot easily retrieve values via column names in a MySQLdb cursor. If there is an easier way that I am not aware of due to my ignorance, I’d appreciate it if you could let me know.

I then tried pyodbc, but got a Python 2.4 compiling issue. I didn’t have time to troubleshoot at the moment, so I opted for subprocess calling mysql client and then processing standard output. The added benefit of this approach is the lack of dependencies on additional Python modules;

2. Here is my test for replication status: I look at 3 replication slave status variables. If Slave_IO_Running == “Yes” and Slave_SQL_Running == “Yes” and Last_Errno == “0”, then I consider that to be success. Otherwise, I want to know about it. Let me know if this test is not sufficient or there is a better way.

3. Good place to scavenge code for running external command, list splitting, removing the first and last list items, and dictionary building for Python.

[sourcecode language=”Python”]
import shlex, smtplib, string, subprocess, sys
from socket import gethostname

def runCmd(cmd):
proc = subprocess.Popen(shlex.split(cmd), stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=False)
out, err = proc.communicate()
ret = proc.returncode
return (ret, out, err)

returnCode, stdOut, stdErr = runCmd(‘mysql -e “show slave status\\G”‘)
if returnCode:
print >> sys.stderr, “There was an error (%d): \n” % returnCode
print >> sys.stderr, stdErr

slaveStatusList = stdOut.split(‘\n’)
#Remove the first and last item of the list
del slaveStatusList[0]
del slaveStatusList[-1]
slaveStatusDict={}
for i in slaveStatusList:
slaveStatusDict[i.split(‘:’)[0].strip()] = i.split(‘:’)[1].strip()

if (slaveStatusDict[“Slave_IO_Running”] == “Yes” and slaveStatusDict[“Slave_SQL_Running”] == “Yes” and slaveStatusDict[“Last_Errno”] == “0”):
print “cool”
else:
emailSubject = “Replication problem on slave ” + gethostname()
emailTo = “firstName.lastName@gmail.com”
emailFrom = “robot@mycompany.com”
emailBody = string.join((
“From: %s” % emailFrom,
“To: %s” % emailTo,
“Subject: %s” % emailSubject ,
“”,
stdOut
), “\r\n”)
server = smtplib.SMTP(“localhost”)
server.sendmail(emailFrom, [emailTo], emailBody)
server.quit()
[/sourcecode]

, ,

3 responses to “Poor man’s MySQL replication monitoring”

  1. Brilliant, Geert and Giuseppe!

    Now this “poor man” is made a bit richer in knowledge, thanks a lot guys! I really appreciate it.

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.