Database convert to utf8

I want to convert a database from latin1_swedish_ci to utf8mb3_general_ci.
For this I use the script from Bob https://bobsguides.com/convert-db-utf8.html

But:
2024-09-27_155354

ModX 3.0.5
PHP 8.2.22
Server: Localhost via UNIX socket
Server-Typ: MariaDB
Server-Version: 10.6.19-MariaDB - MariaDB Server
Server-Zeichensatz: cp1252 West European (latin1)

Is this a question or a just posted to let others know?

Did this stop the whole process or were you able to run the rest of the SQL queries?


Also, if you go through the trouble of changing the character set, why not use utf8mb4 instead of utf8mb3? utf8mb3 isn’t real Unicode and can’t store all values (especially Emoticons).

The process is stopped.
I have also tried utf8mb4 and I get the same error message.

With a tool like phpMyAdmin you could check if the index “target” exists in the table modx_access_actiondom. (In phpMyAdmin the indexes are listed in the tab “Structure” under the table with the columns.)

If the index “target” doesn’t exist (what seems to be the case), just delete the line ALTER TABLE modx_access_actiondom DROP INDEX `target` from the SQL code and run it again.

Here is the whole process:
Executing the SQL code on a freshly copied database. I get the following error message:

2024-09-30_084234-content

if I change the type to ‘LONGTEXT’ in the ‘content’ field in the ‘modx_site_content’ table before the SQL command, I get the same error message.

However, the text fields have already been changed to the type ‘blob’.

Before:


After:

In addition, some indexes have been deleted.

But the collation has not been changed and the type is still set to ‘blob’.

If I then execute the SQL command again, it logically does not find the ‘indexes’.

Maybe try changing all occurrences of BLOB in the generated SQL code (or at least the one in the line that fails) to MEDIUMBLOB or LONGBLOB instead.

Yes, if the script fails and stops in the middle of the execution, you have to start again from a fresh database backup (or only run the SQL code after the line that failed).

I changed all BLOB to LONGBLOB.
Now i get the following error:

\xDF is the code for the Eszett (ß), but the same problem seems to occur with e.g. umlauts.


I have never used this script from Bob to convert the charset for a database.

It seems that first, all text columns are changed to binary

ALTER TABLE `myTable` MODIFY `myColumn` BLOB;

and then the columns are changed back to text with the new charset.

ALTER TABLE `myTable` MODIFY `myColumn` varchar(255) CHARACTER SET utf8mb4...;

I don’t know why the binary step is necessary. Maybe @bobray can explain.


If I understand the MySQL documentation correctly, then this extra step (converting the column to binary) is only necessary, if the column contains values that are encoded with the wrong character set.

If on the other hand, your data is correctly encoded with latin1 in your colum, then I believe only the second line (ALTER TABLE ... MODIFY ... CHARACTER SET ...) is necessary.


I would try deleting all ALTER TABLE ... BLOB; lines from the SQL code, running the remaining code and see if that works correctly (especially for characters like umlauts and eszett).

I believe the conversion to binary is necessary if you are changing the character set of the DB, its tables, and their text fields. It puts them in a state where they have no character set, so it doesn’t matter how they’re encoded, kind of like Greenwich Mean Time.

I could be wrong.

Did you take a look a the MySQL documentation I linked above?

For example it says that…

Suppose that table t has a nonbinary column named col1 defined as CHAR(50) CHARACTER SET latin1 but you want to convert it to use utf8mb4 so that you can store values from many languages. The following statement accomplishes this: ALTER TABLE t MODIFY col1 CHAR(50) CHARACTER SET utf8mb4;

The additional step with the change to binary is only mentioned, when the encoding of the content is different than defined for the column. So you need this step only, if the data is already correctly encoded, but you want to correct the dataset of the column (without performing any character conversion).

Thanks for reminding me. I created it quite a while ago. I did it the way I did because some users might have characters that are not encoded correctly. You’re right that it’s probably not necessary for the OP’s case.

Thank you all!
I deleted all ALTER TABLE ... BLOB; lines and it works perfectly!

1 Like