Perl script to convert MySQL character set to utf8

“Groovy, baby!”, as international man of mystery likes to say.

It seems converting character and text data to utf8 character set is a common task for MySQL, especially during an upgrade. In fact, I had trouble with it during server and WordPress database upgrade for this blog site. I wrote about it in this post, where I explained how to do it step-by-step using a command line tool such as mysql, taking advantage of some nifty code generation trick with information_schema.

One drawback of that method is that it’s a manual process, therefore time-consuming and error-prone.

I spent some time today to cook up a simple Perl script to automate this task. See below. Remember to change MyDatabase to your database name (there are 3 places that need to be changed), and adjust the user name and password accordingly. This script also prints out sql statements used during the process so you will know what has been done to the database.

I have tested this successfully on my own WordPress blog. Hopefully it will help somebody out there. Enjoy!

#!/usr/bin/perl
# MyUtf8Converter.pl - convert all character data to utf8 character set
use strict;
use warnings;
use DBI;

# data source name, username, password, connection attributes
my $dsn = "DBI:mysql:MyDatabase:localhost";
my $user_name = "MySQLUserName";
my $password = "MySQLPassword";
my %conn_attrs = (RaiseError => 1, PrintError => 0, AutoCommit => 1);

# connect to database
my $dbh = DBI->connect ($dsn, $user_name, $password, \%conn_attrs);

# Convert char types first. Use information_schema to generate alter table code to convert data to binary first.
my $sth = $dbh->prepare ("SELECT CONCAT('ALTER TABLE ', table_name, ' MODIFY ', column_name, ' ', REPLACE(column_type, 'char', 'binary')) FROM information_schema.columns WHERE table_schema = 'MyDatabase' and data_type LIKE '%char%'");

$sth->execute ();

# Run alter table stamement. First convert char types to binary types, then convert those to char types utf8 character set
while (my ($val) = $sth->fetchrow_array ())
    {
        print $val, "\n";
        $dbh->do ($val);
        $val =~ s/binary/char/;
        $val = $val . " CHARACTER SET utf8";
        print $val, "\n";
        $dbh->do ($val);
    }
$sth->finish ();
# Now we convert text data. Use information_schema to generate alter table code to convert data to blob first.
$sth = $dbh->prepare ("SELECT CONCAT('ALTER TABLE ', table_name, ' MODIFY ', column_name, ' ', REPLACE(column_type, 'text', 'blob')) FROM information_schema.columns WHERE table_schema = 'MyDatabase' and data_type LIKE '%text%'");

$sth->execute ();

# Run alter table stamement. First convert text types to blob types, then convert those to text types utf8 character set
while (my ($val) = $sth->fetchrow_array ())
    {
        print $val, "\n";
        $dbh->do ($val);
        $val =~ s/blob/text/;
        $val = $val . " CHARACTER SET utf8";
        print $val, "\n";
        $dbh->do ($val);
    }
$sth->finish ();

$dbh->disconnect ();

13 Comments »

  1. Chuck Said,

    February 16, 2009 @

    you rock, man.

  2. Haidong Ji Said,

    February 16, 2009 @

    Thanks Chuck. Long time no see. Hope all is well.

  3. Bastiaan Said,

    March 17, 2009 @

    ok I did sort of the same but I do get after changing a TEXT field to BLOB than the data was to big?? while the data war almost nothing

    any tips where to look? corrupt unicode?

  4. Haidong Ji Said,

    March 17, 2009 @

    Hi Bastiaan,

    In your case, I doubt it is a matter of corrupt unicode. When the character set is UTF8, MySQL uses 3 bytes to store one character, instead of 1 when the character is Latin1. I suppose that is why the size grown after the conversion?

  5. SL Said,

    May 21, 2009 @

    Hi, I’m getting this in my error log:

    malformed header from script. Bad header=ALTER TABLE wp_comments MODIFY: convert.pl

    (convert.pl being the name of the script)

    The only other change I made to the script other than changing the database info is that I changed “localhost” to the correct hostname, since I was getting an error on that too.

    Do you know what the new error means?

    Thanks!

  6. Haidong Ji Said,

    May 24, 2009 @

    Hmmm, not sure about that SL.

    I tried it again on my own machine and it worked. Hopefully you’ve fixed this problem. If not, send me the script and I can take a look.

  7. Emre Said,

    September 1, 2009 @

    Hi,

    I tried it on my localhost(wamp, perl installed correctly.) but got “500 Internal Server Error”

    By the way can you convert it to php, it will be very useful.

  8. Haidong Ji Said,

    September 1, 2009 @

    Hi Emre,

    Sorry, at this point I don’t have enough time for a proper php code.

    You can try the manual approach documented here and see what happens:

    http://www.haidongji.com/2008/11/11/convert-character-set-to-utf8-in-mysql/

    Hope it helps.

  9. knb Said,

    September 17, 2009 @

    If there are foreign keys defined on in your table (column values referenced from other tables), conversion won’t work for the referenced column. drop f-keys first, then recreate. can be tricky, though.

  10. Haidong Ji Said,

    September 17, 2009 @

    Thanks for pointing that out, knb.

  11. Ипотпал Said,

    November 15, 2009 @

    this is a chunk of gold

  12. Haidong Ji Said,

    November 15, 2009 @

    Thanks Ипотпал. Glad you liked it. Cheers!

  13. The Ji Village News » Having fun with MySQL and Python: converting MySQL character set to utf8 Said,

    September 15, 2010 @

    [...] get me started, I thought it would be fun to rewrite a Perl utility I wrote before with Python. That script converts MySQL character sets to utf8, a very common task for wikis and [...]

RSS feed for comments on this post · TrackBack URI

Leave a Comment