Convert character set to utf8 in MySQL

Update: If you are interested in automating this task with a simple Perl script, I created one here.

Say, you had a MySQL database that stores texts in languages such as Chinese, Japanese, Korean, Hebrew, Arabic, and so on, and recently there was a server move or upgrade, and all of sudden text in the above mentioned languages didn’t display properly. What do we do to fix it then?

That was what happened to me recently. My hosting company migrated this site to a new LAMP server, and managed to screw up the character set so my Chinese entries looked like gibberish. I spent some time researching, and was able to convert my WordPress database character set to utf8 successfully. Below is the notes for my own consumption. If it helps you, dear reader, all the better!

Please follow the procedures below step by step in the order presented here. If you need help running sql script using command line tools, I have a little discussion here. I am assuming you have access to command line tools or things like phpMyAdmin. Please remember to replace MyDb with your database name.

1. Switch to information_schema database by:

USE information_schema;

2. Run this statement below. It will generate sql statements to be executed later. Save the result in a sql script file.

SELECT CONCAT('ALTER TABLE ', table_name, ' MODIFY ', column_name, ' ', REPLACE(column_type, 'char', 'binary'), ';') FROM columns WHERE table_schema = 'MyDb' and data_type LIKE '%char%';

3. Run this statement below. It will generate sql statements to be executed later. Save the result in the same sql script file, below the sql statements generated earlier;

SELECT CONCAT('ALTER TABLE ', table_name, ' MODIFY ', column_name, ' ', REPLACE(column_type, 'text', 'blob'), ';') FROM columns WHERE table_schema = 'MyDb' and data_type LIKE '%text%';

4. Run this statement below. It will generate sql statements to be executed later. Save the result in the same sql script file, below the sql statements generated earlier;

SELECT CONCAT('ALTER TABLE ', table_name, ' MODIFY ', column_name, ' ', column_type, ' CHARACTER SET utf8;') FROM columns WHERE table_schema = 'MyDb' and data_type LIKE '%char%';

5. Run this statement below. It will generate sql statements to be executed later. Save the result in the same sql script file, below the sql statements generated earlier;

SELECT CONCAT('ALTER TABLE ', table_name, ' MODIFY ', column_name, ' ', column_type, ' CHARACTER SET utf8;') FROM columns WHERE table_schema = 'MyDb' and data_type LIKE '%text%';

6. Add the line below to the sql script file;

ALTER DATABASE MyDb CHARACTER SET utf8;

7. Switch to MyDb by:

USE MyDb;

8. Run the script file generated from the above steps, and you should be good.

Enjoy! My procedure above is especially useful for upgrading databases from MySQL 4 and earlier to MySQL 5 and later, as discussed here. It appears that for MySQL 5 and later, it is possible to convert character set of all columns in one command, like:

ALTER TABLE wp_posts CONVERT TO CHARACTER SET utf8;

However, I didn’t test it so I am not sure.

By the way, if you are interested in collation in MySQL, especially for Chinese, I wrote a little note here.

33 thoughts on “Convert character set to utf8 in MySQL”

  1. This has been an on-going problem since the early days of MySQL, none of the advice or recommendations worked properly, even those suggested by MySQL!

    Your solution *WORKS*, it’s intuitive and easy to implement.

    This is a nicely written article. Thank you for sharing this information.

    Xie xie

    Andy

  2. 不用谢, you are welcome Andy. Glad it helped.

    I plan to write a Perl script to do this all automatically. That should make it even easier. When I do, I will link it here.

  3. Hello,

    I’ve started the process, but observe that the order is perhaps not correct. The var type is changed before the utf8 is introduced. The schema generates an ALTER statement which searches for the original column type in all cases. If the type is changed first, they get changed back on the next step. Please advise. Thanks for your contribution.

  4. I found an explanation here regarding the blob/varbinary technique implemented here. Now I understand the double-processing of candidate fields in the db tables.

  5. Sorry for the late reply alank. Glad you found out the reason.

    I do want to make this an automated process, through Perl. Just haven’t got the time to do it yet.

  6. 我也尝试过直接转换数据表(table)还有列(columns)的字符集,从gbk/gb2312转到utf8,或者反向,但结果不行,当时是在5.0版本下做的。找时间再按照你的方法做一遍试试看。

  7. 用我写的这个方法应当可以。我也写了个简单的Perl脚本。
    http://www.haidongji.com/2009/02/16/perl-script-to-convert-mysql-character-set-to-utf8/

    但如果有很多数据的话,速度会慢。如果用Fulltext的索引,还会有其他麻烦。

    我和Ryan Lowe在电邮里讨论过这个问题,他的这篇博客就更深入,很有参考意义。

    http://www.mysqlperformanceblog.com/2009/03/17/converting-character-sets/

    按:刚到你的网站逛逛,感觉有很多实用的信息,以加入我的Google Reader。叶大侠如果是苏北徐州附近的人,我们的方言都差不多。

  8. 我大学是在福建读的,厦门大学。今年夏天会故地重游。

  9. Hello,

    I tried using this with a MediaWiki db, and ran into a spot of bother with in the searchindex table.

    For example, this line,

    ALTER TABLE searchindex MODIFY si_title varbinary(255);

    threw this error:

    ERROR 1283 (HY000) at line 108: Column ‘si_title’ cannot be part of FULLTEXT index

    I thought it might have been remedied by changing the collation of each table from latin1 to utf8. However, after that step, I still got the same error. I’m not certain what causes this stumble, but thought I’d share it here, at any rate. Any thoughts?

  10. could you say a few words about DATA WAREHOUSING? Any advice about how to acquire relevant skills? Thank you!

  11. Naitan,

    Data Warehousing is a big topic. I suggest looking for books on Amazon and read reviews, then find the book that has the best review and read it.

    Cheers.

  12. Gee dude,
    Thanks for this post, this really helped me getting closer to a solution to my problem!

  13. Thanks for this solution to converting a mysql database to utf8 it rocks.

    had one slight problem i had a table called interval , so altered the sql slightly to quote around it like so `interval`

    then worked perfectly :)

  14. hi haidong,

    I have a question, your scripts seems can not solve the problem of foreign key with ‘char’ column inside, any suggestion ?

  15. Hi Haidong Ji,

    I adapted your script for converting a latin1 WP MU site into UTF8 and it worked perfectly. Many thanks for sharing your knowledge –and beside, it’s a beautiful solution;-)

  16. Tomas,

    Not too sure what you mean by that. Short of having access to the server and files, it is hard to see where exactly the problem is.

    Daniel,

    Sharing with and benefiting from each other is my goal. I am really glad it helped. Cheers!

  17. Hi, thx for your work – it was a starter but the problem is more complex as the browsersettings can ignore the websites needs. So if you set your browser not to autorecognize the websites coding – rather than to a fixed setting this will lead also after a clean-up to a messed database. So i think it should be fixed in the post-message in WordPress.

    yours Joe

  18. Cheers, Joe.

    Yeah, if a browser forces a particular encoding, it will display gibberish.

  19. Hey,
    adding this to the comment-template helped:
    form accept-charset=”UTF-8″

    but it is still not final. Every modern Browser except IE<8 accept this – so it is an IE/old-Browser-Issue. I am looking for some Java-Script to get the Browsers settings but was unlucky yet.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>