Update: If you are interested in automating this task with a simple Perl script, I created one here.
Say, you had a MySQL database that stores texts in languages such as Chinese, Japanese, Korean, Hebrew, Arabic, and so on, and recently there was a server move or upgrade, and all of sudden text in the above mentioned languages didn’t display properly. What do we do to fix it then?
That was what happened to me recently. My hosting company migrated this site to a new LAMP server, and managed to screw up the character set so my Chinese entries looked like gibberish. I spent some time researching, and was able to convert my WordPress database character set to utf8 successfully. Below is the notes for my own consumption. If it helps you, dear reader, all the better!
Please follow the procedures below step by step in the order presented here. If you need help running sql script using command line tools, I have a little discussion here. I am assuming you have access to command line tools or things like phpMyAdmin. Please remember to replace MyDb with your database name.
1. Switch to information_schema database by:
2. Run this statement below. It will generate sql statements to be executed later. Save the result in a sql script file.
SELECT CONCAT(‘ALTER TABLE ‘, table_name, ‘ MODIFY ‘, column_name, ‘ ‘, REPLACE(column_type, ‘char’, ‘binary’), ‘;’) FROM columns WHERE table_schema = ‘MyDb’ and data_type LIKE ‘%char%’;
3. Run this statement below. It will generate sql statements to be executed later. Save the result in the same sql script file, below the sql statements generated earlier;
SELECT CONCAT(‘ALTER TABLE ‘, table_name, ‘ MODIFY ‘, column_name, ‘ ‘, REPLACE(column_type, ‘text’, ‘blob’), ‘;’) FROM columns WHERE table_schema = ‘MyDb’ and data_type LIKE ‘%text%’;
4. Run this statement below. It will generate sql statements to be executed later. Save the result in the same sql script file, below the sql statements generated earlier;
SELECT CONCAT(‘ALTER TABLE ‘, table_name, ‘ MODIFY ‘, column_name, ‘ ‘, column_type, ‘ CHARACTER SET utf8;’) FROM columns WHERE table_schema = ‘MyDb’ and data_type LIKE ‘%char%’;
5. Run this statement below. It will generate sql statements to be executed later. Save the result in the same sql script file, below the sql statements generated earlier;
SELECT CONCAT(‘ALTER TABLE ‘, table_name, ‘ MODIFY ‘, column_name, ‘ ‘, column_type, ‘ CHARACTER SET utf8;’) FROM columns WHERE table_schema = ‘MyDb’ and data_type LIKE ‘%text%’;
6. Add the line below to the sql script file;
ALTER DATABASE MyDb CHARACTER SET utf8;
7. Switch to MyDb by:
8. Run the script file generated from the above steps, and you should be good.
Enjoy! My procedure above is especially useful for upgrading databases from MySQL 4 and earlier to MySQL 5 and later, as discussed here. It appears that for MySQL 5 and later, it is possible to convert character set of all columns in one command, like:
ALTER TABLE wp_posts CONVERT TO CHARACTER SET utf8;
However, I didn’t test it so I am not sure.
By the way, if you are interested in collation in MySQL, especially for Chinese, I wrote a little note here.
33 responses to “Convert character set to utf8 in MySQL”
我用了个SyntaxHighlighter Plus WordPress 插件。这个插件是建在Google code的syntaxhighlighter。这儿是网站，说不准你也可以用上。
This has been an on-going problem since the early days of MySQL, none of the advice or recommendations worked properly, even those suggested by MySQL!
Your solution *WORKS*, it’s intuitive and easy to implement.
This is a nicely written article. Thank you for sharing this information.
不用谢, you are welcome Andy. Glad it helped.
I plan to write a Perl script to do this all automatically. That should make it even easier. When I do, I will link it here.
I’ve started the process, but observe that the order is perhaps not correct. The var type is changed before the utf8 is introduced. The schema generates an ALTER statement which searches for the original column type in all cases. If the type is changed first, they get changed back on the next step. Please advise. Thanks for your contribution.
I found an explanation here regarding the blob/varbinary technique implemented here. Now I understand the double-processing of candidate fields in the db tables.
Sorry for the late reply alank. Glad you found out the reason.
I do want to make this an automated process, through Perl. Just haven’t got the time to do it yet.
[…] Approach #2 (outlined here): […]
[…] Описан здесь. […]
I tried using this with a MediaWiki db, and ran into a spot of bother with in the searchindex table.
For example, this line,
ALTER TABLE searchindex MODIFY si_title varbinary(255);
threw this error:
ERROR 1283 (HY000) at line 108: Column ‘si_title’ cannot be part of FULLTEXT index
I thought it might have been remedied by changing the collation of each table from latin1 to utf8. However, after that step, I still got the same error. I’m not certain what causes this stumble, but thought I’d share it here, at any rate. Any thoughts?
Check out Ryan Lowe’s discussion of this topic here. He has a script written, and the comments in that entry is also interesting.
Hope that helps. I would be interested in hearing how you solve your issue eventually.
could you say a few words about DATA WAREHOUSING? Any advice about how to acquire relevant skills? Thank you!
Data Warehousing is a big topic. I suggest looking for books on Amazon and read reviews, then find the book that has the best review and read it.
Thanks for this post, this really helped me getting closer to a solution to my problem!
[…] harus dilalui dalam proses konversi ke UTF-8 bagi database yang sudah berisi banyak data. Untungnya di laman inidipaparkan cara yang lebih […]
Thanks for this solution to converting a mysql database to utf8 it rocks.
had one slight problem i had a table called interval , so altered the sql slightly to quote around it like so `interval`
then worked perfectly 🙂
You are welcome, Louie. Cheers!
I have a question, your scripts seems can not solve the problem of foreign key with ‘char’ column inside, any suggestion ?
Hi Haidong Ji,
I adapted your script for converting a latin1 WP MU site into UTF8 and it worked perfectly. Many thanks for sharing your knowledge –and beside, it’s a beautiful solution;-)
Not too sure what you mean by that. Short of having access to the server and files, it is hard to see where exactly the problem is.
Sharing with and benefiting from each other is my goal. I am really glad it helped. Cheers!
Very good articles ) 😛
[…] Actually my atantion comes to some weird characters in my wordpress blog. Such as Ã¼ or Ã¶ as represantants of ä and ö. So i had a look into my mysql-db and saw that it was still on latin1. On my way to the clearance i got over that explanation. But still all howtos doesn’t work out for me. There are others allready had a look into it like fischerlander or Ji’s Blog. […]
Hi, thx for your work – it was a starter but the problem is more complex as the browsersettings can ignore the websites needs. So if you set your browser not to autorecognize the websites coding – rather than to a fixed setting this will lead also after a clean-up to a messed database. So i think it should be fixed in the post-message in WordPress.
Yeah, if a browser forces a particular encoding, it will display gibberish.
[…] following 4 queries are lifted directly from Haidong Ji via the WordPress Codex. I added the pipe out to text file. Note that the path to mysql executable […]
adding this to the comment-template helped:
but it is still not final. Every modern Browser except IE<8 accept this – so it is an IE/old-Browser-Issue. I am looking for some Java-Script to get the Browsers settings but was unlucky yet.
OK, i has just successfull
[…] （代码来源于Haidong Ji's blog中的一篇文章） […]