Having fun with MySQL and Python: converting MySQL character set to utf8


Lately I worked quite a bit with Python and Linux, writing monitoring and automation utilities. I am in a transition period, so I thought I ought to write some Python stuff interfacing with MySQL for fun, and start positioning myself for expanded career horizon, I hope.

To get started, I thought it would be fun to rewrite a Perl utility I wrote before with Python. That script converts MySQL character sets to utf8, a very common task for wikis and blogs during an upgrade. This time, I did everything from scratch: firing up an Amazon EC2 Linux instance, hand install and configuring MySQL 5.1.50 (creating mysql user, group, wget tarball, setting directory ownership and permissions, creating symbolic to MySQL binaries, editing my.cnf, /etc/init.d/ and chkconfig automatic startup, environmental variables, the works), compiled and configured Python 2.7, compiled and configured Python easy_install, compiled and installed MySQLdb module for Python, and finally successfully rewrote and tested the utility in Python. It is listed at the bottom of this post.

The original Perl program used some SQL code generation technique based on metadata within information_schema. As is hopefully well-known by now, information_schema should be used with caution because it can leads to server lockup. In fact, I noticed it when I ran my Perl program on a shared web hosting server. See Baron’s post here for an alternative way of doing it. But my Python program listed below didn’t use it, just to be consistent with the one used in Perl. I had a lot of fun going through this exercise.

Here are some notes I took.
1. It would be really nice if there is a catalog of every publicly available AMIs that contains detailed metadata that we can query, such as:
a. Version and distro of *nix;
b. What has been installed? Apache, MySQL, Python, PHP, Nagios, etc., etc.
c. For the software that has been installed, what is the version? What version(s) of MySQL was installed? What patches have been applied? Is it Python 2.4, 2.5, 2.6, 2.7, 3.0? Things like that.
d. Hardware related stuff: memory, disk size, etc.

Granted, I used AWS Management Console web interface to fire up and terminate instances, and accessed the servers via Putty. I lost my Linux workstation and haven’t built a new one yet. I do recall there were some EC2 command line tools on Linux that can do some of that when I last tried it a few months back. So it could be that things have improved and/or I just didn’t know the right way to get that out. If that is the case, I’d really appreciate it if you can give me some pointers. As it stands, the xml manifest file is too vague and not detailed enough.

2. Trying to do Python upgrade, say from 2.4 to 2.6 or 2.7, is generally not a good idea. Having multiple versions of Python side by side is probably a better way to go. Yes, it can be confusing, but I think in most cases, the cost/benefit analysis weighs against an upgrade: broken libraries and modules, failed cron jobs, you name it.

When having multiple versions of Python side by side, you can define a default version by some creative use of symbolic link. For instance, I did
./configure && make && sudo make install
for Python 2.7 on a server that has 2.4 pre-installed. I then did:
ln -s /usr/local/bin/python2.7 /usr/bin/python
to make Python 2.7 the default version. If I need Python 2.4, I will just call python2.4 binary.

While on that note, I found it surprising that 2 AMI instances (ami-3e836657 and ami-01b75668) I used both had Python 2.4 installed, which in my mind was pretty old.

3. Installing MySQLdb Python module was a bit problematic. On one server,
sudo easy_install mysql-python
did not work for me. I didn’t spend time digging because the following worked:
sudo yum install MySQL-python
On another server, easy_install mysql-python did work, but when I did import MySQLdb, I got:
ImportError: libmysqlclient_r.so.16: cannot open shared object file: No such file or directory
To fix that, do:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/mysql/lib
Here is the Python code
[sourcecode language=”python”]
import MySQLdb
db = MySQLdb.connect(host=”localhost”, user=”root”, passwd=””, db=”charset_test”, unix_socket=”/tmp/mysql.sock”)
cursor = db.cursor()

sql = “SELECT CONCAT(‘ALTER TABLE ‘, table_name, ‘ MODIFY ‘, column_name, ‘ ‘, REPLACE(column_type, ‘char’, ‘binary’)) FROM information_schema.columns WHERE table_schema = ‘charset_test’ and data_type LIKE ‘%char%’”
cursor.execute(sql)
results = cursor.fetchall()

for row in results:
“””Convert char types to binary types first”””
cursor.execute(row[0])
“””Now we convert binary type back to char with CHARACTER SET utf8 defined”””
cursor.execute(row[0].rpartition(‘ ‘)[0] + ‘ ‘ + row[0].rpartition(‘ ‘)[2].replace(‘binary’, ‘char’) + ‘ CHARACTER SET utf8’)

sql = “SELECT CONCAT(‘ALTER TABLE ‘, table_name, ‘ MODIFY ‘, column_name, ‘ ‘, REPLACE(column_type, ‘text’, ‘blob’)) FROM information_schema.columns WHERE table_schema = ‘MyDatabase’ and data_type LIKE ‘%text%’”
cursor.execute(sql)
results = cursor.fetchall()

for row in results:
“””Convert text types to blob types first”””
cursor.execute(row[0])
“””Now we convert blob type back to text with CHARACTER SET utf8 defined”””
cursor.execute(row[0].rpartition(‘ ‘)[0] + ‘ ‘ + row[0].rpartition(‘ ‘)[2].replace(‘blob’, ‘text’) + ‘ CHARACTER SET utf8’)

db.close()
[/sourcecode]

, , , ,

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.