Monthly Archives: April 2022

Out with the latin1, in with the utf8mb4

The MySQL database underneath this blog has been using the latin1 encoding since the beginning. However, at some point I started posting with Unicode characters encoded with UTF-8. WordPress dutifully stuffed this UTF-8 data into the latin1 columns without complaint. After all, it’s technically valid, even if it’s nonsense.

This wasn’t a problem until I upgraded to MySQL 8.0. Suddenly, characters in a few of my posts were displaying incorrectly. For example, non-breaking spaces in several posts were no longer invisible, showing up as “Å” in odd places. Characters such as “ö” were obliterated. Worse yet, my posts were a mix of latin1 and UTF-8 encoding.
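A minimal Python sketch of how this kind of garbling can arise, assuming the UTF-8 bytes are being reinterpreted as latin1 somewhere between the table and the browser. The function name is hypothetical, and Python's cp1252 codec stands in for MySQL's latin1, which is close to it but not byte-for-byte identical.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode the same bytes as cp1252.

    This mimics UTF-8 data being read back as if it were latin1.
    """
    return text.encode("utf-8").decode("cp1252")


# "ö" is two bytes in UTF-8 (0xC3 0xB6); read as cp1252 they become two characters.
print(misread_as_latin1("ö"))  # prints "Ã¶"
```

Each multi-byte UTF-8 character turns into two or more unrelated latin1 characters, which is why a single accented letter can come out as a short run of junk.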

Thankfully, Nic Jansma developed a solution for this problem some time ago. It’s not completely automated, but it helps. If you do this yourself, make sure you run the script against a copy of your database, over and over again, until you’ve resolved all of the problems.
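The general idea behind this kind of repair can be sketched in Python. This is not Nic Jansma's script, only an illustration under the assumption that damaged values are UTF-8 bytes that were decoded as latin1 (approximated here with cp1252). The function name `repair_mojibake` is hypothetical.

```python
def repair_mojibake(text: str) -> str:
    """Undo a UTF-8-read-as-latin1 mix-up when it is safe to do so.

    The text is turned back into cp1252 bytes and decoded as UTF-8.
    If either step fails, the value was probably genuine latin1 (or
    something else entirely), so it is returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# A garbled value, a genuine latin1 value, and plain ASCII.
for value in ["Ã¶", "ö", "plain text"]:
    print(repr(value), "->", repr(repair_mojibake(value)))
```

The fallback is what makes a mix of encodings tolerable: values that are already correct fail the round trip and are left alone. A single value that mixes both encodings would also be left alone, which is one reason a manual pass over the results is still needed.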

At this point everything has been updated to UTF-8 using the MySQL utf8mb4 encoding. Please drop me a line if you notice anything that looks messed up.