This is a read-only copy of the MariaDB Knowledgebase generated on 2024-11-15. For the latest, interactive version please visit https://mariadb.com/kb/.

Poor performance with MariaDB (11) and utf8mb4_unicode_ci (vs utf8mb4_general_ci)

We have a large database using MariaDB 11, and up until now, we have been working with various encodings, primarily utf8mb3_general_ci. As part of a standardization process, we attempted to encode all tables with a single encoding, utf8mb4_unicode_ci. However, two primary SQL queries have become significantly slower, taking over 8 seconds compared to the previous 3 seconds. Upon examining the execution plan, we noticed that MariaDB can no longer utilize an index that was working before. Additionally, there's another query that performs an order by/group by operation on a large table, which never finishes executing, though I can't identify any index-related issues.

It's quite challenging for me to share or explain these queries here, as they are complex and involve multiple tables. So, my question is more general in nature.

Is this behavior normal? Is it bad to use the utf8mb4_general_ci encoding for all tables?

I tried using utf8mb4_general_ci only for one large table and continued with utf8mb4_unicode_ci for the rest. The queries ran faster, but I'm now forced to explicitly specify the same collation when comparing columns.

Thank you

Answer Answered by Daniel Black in this comment.

So not from the latest uca1400 family of collations?

When comparing columns across tables, indexes can be used when the collation matches (as its an order). Are you using ALTER TABLE ... CONVERT TO CHARACTER SET ... COLLATE ....? ALTER TABLE ... DEFAULT COLLATE is only going to change added columns to the table?

Check ANALYZE FORMAT=JSON or optimizer trace for a few more details what's going on.

Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.