This is a read-only copy of the MariaDB Knowledgebase generated on 2024-11-21. For the latest, interactive version please visit https://mariadb.com/kb/.

Unicode

Unicode is a standard for encoding text across multiple writing systems. MariaDB supports a number of character sets for storing Unicode data:

| Character Set | Description |
| --- | --- |
| ucs2 | UCS-2: each character is represented by a 2-byte code with the most significant byte first. Fixed-length 16-bit encoding. |
| utf8 | Through MariaDB 10.5, this was a UTF-8 encoding using one to three bytes per character. Basic Latin letters, numbers, and punctuation use one byte. European and Middle Eastern letters mostly fit into two bytes. Korean, Chinese, and Japanese ideographs use three bytes. No supplementary characters are stored. From MariaDB 10.6, utf8 is by default an alias for utf8mb3, but this can be changed to utf8mb4 by changing the default value of the old_mode system variable (removing the UTF8_IS_UTF8MB3 flag). |
| utf8mb3 | UTF-8 encoding using one to three bytes per character. Basic Latin letters, numbers, and punctuation use one byte. European and Middle Eastern letters mostly fit into two bytes. Korean, Chinese, and Japanese ideographs use three bytes. No supplementary characters are stored. Through MariaDB 10.5, this was an alias for utf8. From MariaDB 10.6, utf8 is by default an alias for utf8mb3, but this can be changed to utf8mb4 by changing the default value of the old_mode system variable (removing the UTF8_IS_UTF8MB3 flag). |
| utf8mb4 | UTF-8 encoding, the same as utf8mb3 except that it also stores supplementary characters, using four bytes each. |
| utf16 | UTF-16: the same as ucs2, but also stores supplementary characters, using 32 bits each. Each character takes 16 or 32 bits. |
| utf32 | UTF-32: fixed-length 32-bit encoding. |
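
The byte counts described above can be checked outside the database with a minimal Python sketch. It is only an illustration: the mapping from MariaDB character set names to Python codecs is an assumption made here (MariaDB does not use Python codecs), and the helper names `byte_lengths` and `is_supplementary` are hypothetical.

```python
# Approximate Python codec for each family of MariaDB character sets.
# This mapping is an assumption for illustration only.
CODECS = {
    "utf8mb3/utf8mb4": "utf-8",
    "ucs2/utf16": "utf-16-be",  # big-endian: most significant byte first
    "utf32": "utf-32-be",
}


def byte_lengths(ch):
    """Return the encoded size in bytes of one character under each codec."""
    return {name: len(ch.encode(codec)) for name, codec in CODECS.items()}


def is_supplementary(ch):
    """True for code points above U+FFFF, which ucs2 and utf8mb3 cannot store."""
    return ord(ch) > 0xFFFF


# Basic Latin, a European letter, a Korean syllable, and an emoji.
for ch in ["A", "\u00e9", "\ud55c", "\U0001F600"]:
    kind = "supplementary" if is_supplementary(ch) else "BMP"
    print(f"U+{ord(ch):04X}", byte_lengths(ch), kind)
```

The last character (U+1F600) is a supplementary character: it needs four bytes in UTF-8 and in UTF-16, so of the character sets listed above only utf8mb4, utf16, and utf32 can store it.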