Installing Perl DBI and DBD-mysql on Windows 64 bit

I have had trouble getting Perl DBI and DBD-mysql working on Windows in the past. In addition, on 64-bit Windows you sometimes see recommendations to use 32-bit Perl instead.

Today I got to test the latest 64-bit ActiveState Perl distro for Windows, version 5.12.3.1204, on Windows 2008 R2 64-bit. I am happy to report that it works, though I am not categorically recommending 64-bit Perl on Windows.

Here are the steps:
1. Get the ActiveState Perl 64-bit package for Windows and install it, following all the default options;
2. At a command prompt, run:
cd c:\perl64\bin
ppm install DBI
ppm install DBD-mysql

I then tested against both Oracle’s MySQL 5.5 Community Server and MariaDB 5.2.7 on Windows with Maatkit’s mk-table-checksum to confirm, and it worked fine:

C:\Users\Administrator\Downloads\maatkit-7540\maatkit-7540\bin>c:\Perl64\bin\perl.exe mk-table-checksum --databases mysql h=localhost,u=root,p=password

Finding long-running InnoDB transactions

Notes:
1. The script prints out the elapsed time since the transaction started, the MySQL thread id, and a kill statement for transactions running longer than a defined threshold value, in seconds. Just copy, paste, and then execute the kill statement if you want to terminate the long-running transaction(s);
2. Adjust the shellCmd variable for your host;
3. Adjust longRunningThreshold value as needed. It is measured in seconds;
4. No special libraries/modules needed, as long as there is a working mysql client;
5. The re module is used for regex processing, so this is a good place to find examples of regular expression search and grouping. A state flag is used to help locate the MySQL thread id once a transaction running longer than the defined threshold is found.

import re, shlex, subprocess 

def runCmd(cmd):
    proc = subprocess.Popen(shlex.split(cmd), stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=False)
    out, err = proc.communicate()
    ret = proc.returncode
    return (ret, out, err)

shellCmd = """mysql -h hostName -e "show innodb status\\G""""
longRunningThreshold = 600

returnCode, stdOut, stdErr = runCmd(shellCmd)

targetTransactionFound = False
for line in stdOut.split('\n'):
    if targetTransactionFound:
        match = re.search(r'^MySQL\sthread\sid\s(\d+),', line)
        if match:
            print 'MySQL thread id', match.group(1), 'has been running for', secondsTransactionElapsed, 'seconds'
            print 'To kill it, run: kill', match.group(1)
            targetTransactionFound = False
    else:
        match = re.search(r'^---TRANSACTION\s\w+,\sACTIVE\s(\d+)\ssec', line)
        if match:
            if int(match.group(1)) > longRunningThreshold:
                targetTransactionFound = True
                secondsTransactionElapsed = match.group(1)
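
For reference, here is a minimal, self-contained sketch of the two kinds of lines the regular expressions above look for. The sample status text is hypothetical (the exact SHOW ENGINE INNODB STATUS wording varies a bit between MySQL/InnoDB versions), but it illustrates the search-and-group logic:

import re

# Hypothetical fragment of "show engine innodb status" output; the exact
# wording and ids vary by MySQL/InnoDB version.
sampleStatus = (
    "---TRANSACTION 4C8ECBC, ACTIVE 745 sec, fetching rows\n"
    "MySQL thread id 8079, query id 728899 localhost root\n"
)
for line in sampleStatus.split('\n'):
    m = re.search(r'^---TRANSACTION\s\w+,\sACTIVE\s(\d+)\ssec', line)
    if m:
        print 'transaction has been active for', m.group(1), 'seconds'
    m = re.search(r'^MySQL\sthread\sid\s(\d+),', line)
    if m:
        print 'owning connection; to terminate it, run: kill', m.group(1)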

Some notes and observations on ICE storage engine

I’ve used Vertica, a commercial column-based database storage engine, and was reasonably impressed. During the O’Reilly MySQL conference last month, I checked out Infobright’s vendor booth and talked with some users. I became curious and wanted to test it out. Infobright has a free community version, ICE (Infobright Community Edition), of its column-based storage engine that works with MySQL, which is what I used for my testing. I have no relationship with Infobright whatsoever; I just happen to think that column-based storage can be a potentially disruptive technology in the BI/DW field. I’d love to hear your comments/experiences.

Here are some noteworthy points:

1. Setup is pretty easy. You can follow the steps here. Note that the package includes most of the relevant MySQL tools; a separate install of the mysql client and server is NOT needed.

The ICE package has the following storage engines bundled:
BRIGHTHOUSE
MRG_MYISAM
CSV
MyISAM
MEMORY
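
If you want to confirm that list on your own install, a quick sketch along the lines below works; it assumes the bundled mysql-ib client (used in the load examples further down) is on your PATH:

import shlex, subprocess

# Run "show engines" through the ICE-bundled client and print the raw output
proc = subprocess.Popen(shlex.split('mysql-ib -e "show engines"'),
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()
print out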

2. I used TPC-H’s dbgen tool to generate data for testing. The raw text file is around 8.8 GB, about 72 million rows;

3. I used a Rackspace cloud server (CentOS 5.5, 64-bit, 1 GB of memory, 35 GB of disk) for testing. I created 2 databases, each with one table called lineitem: one table uses the BRIGHTHOUSE storage engine, the other uses MyISAM. No index on the MyISAM table initially;

4. On this particular Rackspace server, below is the record of how long it took to load that amount of data into the BRIGHTHOUSE table:

# time mysql-ib infobright < load.sql 

real	22m46.974s
user	0m2.320s
sys	0m16.140s

And here is the record of how long it took to load into the MyISAM table:

# time mysql-ib test < load.sql 

real	6m11.966s
user	0m1.960s
sys	0m14.420s

Here is what's inside load.sql:

load data local infile '/root/dbgen/lineitem.tbl' into table lineitem fields terminated by '|' lines terminated by '\n';

5. Size comparison, in GB. Here you can see the impressive compression a column-based storage engine can achieve:
Raw Text: 8.8
Raw Text After GZIP: 2.6
MyISAM File Size: 7.2
Infobright File Size: 1.5
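
To put those numbers in perspective, here is a quick back-of-the-envelope calculation of how much smaller each copy is than the raw text file, using only the figures above:

# Sizes in GB, taken from the comparison above
rawGb = 8.8
otherGb = {'Raw Text After GZIP': 2.6, 'MyISAM': 7.2, 'Infobright': 1.5}
for name, size in otherGb.items():
    print '%-20s %.1fx smaller than the raw text file' % (name, rawGb / size)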

6. I did a rudimentary performance comparison. The first query runs against the Infobright table, the second against the MyISAM table:

mysql> select count(*) from lineitem where l_shipdate between '1993-01-01' and '1995-01-01';
+----------+
| count(*) |
+----------+
| 21880025 |
+----------+
1 row in set (15.06 sec)

mysql> use test;
Database changed
mysql> select count(*) from lineitem where l_shipdate between '1993-01-01' and '1995-01-01';
+----------+
| count(*) |
+----------+
| 21880025 |
+----------+
1 row in set (1 min 9.23 sec)

I then created an index on l_shipdate for the MyISAM table, which improved the query time to a bit more than 10 seconds.

Setting up replication with XtraBackup

I attended Vadim Tkachenko’s talk on XtraBackup during the MySQL conference in Santa Clara last week. Backups are obviously very important, but the use case I had in mind is this:
Replicating a database that has InnoDB tables in it, while keeping both master and slave online if possible.

Tangent: by the way, I love the native backup utility that was once promised for MySQL 6.0, similar to SQL Server’s approach to backups. It would have been like running “BACKUP myDb TO DISK = '/backupDirectory/myDb.bak'” from the mysql client, but I digress…

I have used mysqldump to accomplish this in the past, but I wondered how XtraBackup would fare at this task, especially after hearing Vadim’s talk and reading news on Percona’s development effort. To cut to the chase, here are my conclusions; the steps to reproduce are listed immediately afterwards.
1. innobackupex provides a consistent database backup and prints the binlog file name and position to stdout, which is nice and useful for slave initialization;
2. It works with both MyISAM and InnoDB tables;
3. If MyISAM tables are all you have, just run innobackupex --prepare /directoryWhereBackupIs, move the database directory from under /directoryWhereBackupIs to under your slave’s datadir, make the necessary group and owner changes to that directory and its files, and you are ready to run the “change master” command and start the slave;
4. If the database has InnoDB tables, then in addition to step 3 you will also need to stop mysql on the slave, move the ibdata1 file to the datadir, restart mysql, and then run the “change master …” and “start slave” commands. It does not matter whether you are using innodb_file_per_table or not.

It would be nice if I could keep the slave up and running during this step when the database has InnoDB tables in it. Did I do anything wrong? Is there a better way? What if the slave already has a database with InnoDB tables and thus uses ibdata1 to begin with? What do you do then? Should I play with Tungsten’s replication? What are the compelling reasons to use it?

In any case, from my limited testing, I think I will use innobackupex for future replication setup tasks, if I can afford a mysqld restart. Overall, it feels a bit easier than the mysqldump approach I’ve been using.

Here are the steps needed to reproduce:

1. Fire up 2 Rackspace CentOS 5.5 servers. Rackspace cloud servers beat Amazon EC2 servers hands down, in my view, for developing/sandboxing purposes;
2. Install the required mysql client, server, and XtraBackup on both servers;
3. Create /etc/my.cnf by cloning the sample my-small.cnf file under /usr/share. Three changes are needed at a minimum: log-bin=mysql-bin, server-id=a unique number, and datadir=/var/lib/mysql. The first two are necessary for replication; the last is needed by innobackupex.

While you are at it, on the slave, add read-only and skip-slave-start if appropriate; that’s best practice for a read-only slave.

4. Add master server’s public key to authorized_keys on slave, to facilitate easy ssh connection.
5. On master, run this command:

innobackupex --databases=test --stream=tar /tmp/ --slave-info | ssh root@slave "tar xfi - -C /root"
When it finishes, you should see something like this:
110419 18:54:21  innobackupex: completed OK!
tar: Read 6656 bytes from -

Take note of the 3 lines immediately above it, which state the binlog file name and position, like this:

innobackupex: MySQL binlog position: filename 'mysql-bin.000002', position 2515

6. On slave, run this command:

innobackupex --apply-log /locationWhereBackupIs

then, assuming the database name is test, run the 2 commands below to change the group and owner to mysql:

chgrp -R mysql test
chown -R mysql test

move the directory under mysqld’s datadir:

mv test/ /mysql/datadir

If test database has innodb tables in it, stop mysql on slave, then copy ibdata1 to datadir, restart mysql.

7. On master, open up port 3306 if it is not already open, then create the replication account:

grant replication slave, replication client on *.* to repl@'50.56.121.%' identified by 'p@ssw0rd';

8. On slave, run:

change master to master_host='50.56.121.96', master_user='repl', master_password='p@ssw0rd', master_log_file='see output from innobackupex backup command on master', master_log_pos=numFrominnobackupexOutputOnMaster;

start slave;

show slave status\G

Poor man’s MySQL replication monitoring

Using MySQL replication slave(s) for reporting (potentially with different storage engines) is a very popular way of scaling database read activity. As usual, you want to be on top of things when replication breaks, so end users can be notified and issues addressed. When Nagios, Zabbix, or other monitoring tools are not available or otherwise not accessible, there has to be another way. I wrote the following Python script for that purpose. The script can be scheduled via crontab to check replication status at whatever interval you define. When things break, you get a notification email.

Notes:
1. I toyed with the MySQLdb Python module for this task, but I don’t like the fact that I cannot easily retrieve values by column name from a MySQLdb cursor. If there is an easier way that I am not aware of due to my ignorance, I’d appreciate it if you could let me know.

I then tried pyodbc, but ran into a Python 2.4 compilation issue. I didn’t have time to troubleshoot it at the moment, so I opted for subprocess calling the mysql client and then processing its standard output. The added benefit of this approach is that there are no dependencies on additional Python modules;

2. Here is my test for replication health: I look at 3 slave status variables. If Slave_IO_Running == “Yes”, Slave_SQL_Running == “Yes”, and Last_Errno == “0”, then I consider replication to be healthy. Otherwise, I want to know about it. Let me know if this test is not sufficient or if there is a better way.

3. This is also a good place to scavenge Python code for running an external command, splitting a list, removing the first and last list items, and building a dictionary.

import shlex, smtplib, string, subprocess, sys
from socket import gethostname

def runCmd(cmd):
    proc = subprocess.Popen(shlex.split(cmd), stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=False)
    out, err = proc.communicate()
    ret = proc.returncode
    return (ret, out, err)

returnCode, stdOut, stdErr = runCmd('mysql -e "show slave status\\G"')
if returnCode:
        print >> sys.stderr, "There was an error (%d):" % returnCode
        print >> sys.stderr, stdErr
        sys.exit(returnCode)

slaveStatusList = stdOut.split('\n')
#Remove the first and last item of the list
del slaveStatusList[0]
del slaveStatusList[-1]
slaveStatusDict={}
for i in slaveStatusList:
        # Split on the first colon only, since values (e.g. Last_Error) may themselves contain colons
        slaveStatusDict[i.split(':', 1)[0].strip()] = i.split(':', 1)[1].strip()

if (slaveStatusDict["Slave_IO_Running"] == "Yes" and slaveStatusDict["Slave_SQL_Running"] == "Yes" and slaveStatusDict["Last_Errno"] == "0"):
        print "cool"
else:
        emailSubject = "Replication problem on slave " + gethostname()
        emailTo = "firstName.lastName@gmail.com"
        emailFrom = "robot@mycompany.com"
        emailBody = string.join((
                "From: %s" % emailFrom,
                "To: %s" % emailTo,
                "Subject: %s" % emailSubject ,
                "",
                stdOut
                ), "\r\n")
        server = smtplib.SMTP("localhost")
        server.sendmail(emailFrom, [emailTo], emailBody)
        server.quit()
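
For illustration, here is the same parsing technique applied to a trimmed, hypothetical fragment of "show slave status\G" output, showing why the first and last list items are dropped and how the dictionary is built:

# Hypothetical, trimmed "show slave status\G" output for demonstration only
sampleOut = """*************************** 1. row ***************************
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
                   Last_Errno: 0
"""
lines = sampleOut.split('\n')
del lines[0]   # drop the "*** 1. row ***" header line
del lines[-1]  # drop the empty string left by the trailing newline
statusDict = {}
for l in lines:
    statusDict[l.split(':', 1)[0].strip()] = l.split(':', 1)[1].strip()
print statusDict["Slave_IO_Running"], statusDict["Slave_SQL_Running"], statusDict["Last_Errno"]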