String Manipulation Functions Returns Incorrect Result When Data Values Are In UTF8 Japanese

Hello. I couldn't find a page describing which string manipulation functions are available in MariaDB ColumnStore, so I did a tiny test how Japanese string data will be treated within.

- TRIM : So far confirmed correct results. - UPPER / LEFT : No error given but the results are incorrect. - LENGTH : Seems like this function is returning the consumed bytes, rather than a string length (number of characters) . - LEFT / RIGHT : No error given but the results are incorrect. - SUBSTRING : No error given but the results are incorrect. - REVERSE : If heading whitespace(s) exists, this is not included in the reversing process.

Hadn't tried other functions yet (since I don't know which functions are available) . I found it hard to handle Japanese (nonUSACII or nonLatin1 ?) in ColumnStore mode. In InnoDB mode, no problem observed. Used CharacterSet 'utf8' and Collate 'utf8_general_ci' for this testing.

Below follows the testing code that I've used.

/* Test Code (Begin) */ DROP DATABASE IF EXISTS `kosilatest` ; COMMIT ; CREATE DATABASE IF NOT EXISTS `kosilatest` CHARACTER SET = 'utf8' COLLATE = 'utf8_general_ci'

COMMIT ; USE `kosilatest` ; COMMIT ; DROP TABLE IF EXISTS `test` ; COMMIT ; CREATE TABLE IF NOT EXISTS `test` ( `col1` CHAR(99) , `col2` VARCHAR(99) ) ENGINE=ColumnStore DEFAULT CHARSET=utf8
SHOW TABLES ; DESCRIBE `test` ; TRUNCATE TABLE `test` ; COMMIT ; INSERT INTO `test` ( `col1` , `col2` ) VALUES ( 'AAA' , 'aaa' ) ; INSERT INTO `test` ( `col1` , `col2` ) VALUES ( 'BBBBBB' , 'bbbbbb' ) ; INSERT INTO `test` ( `col1` , `col2` ) VALUES ( 'CCCCCCCCC' , 'ccccccccc' ) ; INSERT INTO `test` ( `col1` , `col2` ) VALUES ( 'A quick brown fox' , 'jumped over the lazy dog,' ) ; INSERT INTO `test` ( `col1` , `col2` ) VALUES ( ' I have a pen ! ' , ' I have a pineapple ? ' ) ; INSERT INTO `test` ( `col1` , `col2` ) VALUES ( 'テスト' , 'テスト' ) ; INSERT INTO `test` ( `col1` , `col2` ) VALUES ( '123' , 'ABC' ) ; INSERT INTO `test` ( `col1` , `col2` ) VALUES ( ' 456 ' , ' xyz ' ) ; COMMIT ; SELECT * FROM `test` ; COMMIT ; SELECT `col1` , TRIM(`col1`) FROM `test` ; SELECT `col2` , UPPER(`col2`) , LOWER(`col2`) FROM `test` ; SELECT `col1` , LENGTH(`col1`) , `col2` , LENGTH(`col2`) FROM `test` ; SELECT `col1` , LEFT(`col1`,2) , `col2` , RIGHT(`col2`,2) FROM `test` ; SELECT `col1` , SUBSTRING(`col1`,2,2) , `col2` , SUBSTRING(`col2`,2,2) FROM `test` ; SELECT `col1` , REVERSE(`col1`) , `col2` , REVERSE(`col2`) FROM `test` ; COMMIT ; DROP DATABASE `kosilatest` ; COMMIT ; /* Test Code (End) */

Answer Answered by David Thompson in this comment.

Thank you for reporting. You are using recommended settings for utf8.

If you are willing to help here it would be great if you could file specific bugs with test cases in jira https://jira.mariadb.org, project MCOL as it'll be easier to track progress there.

Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.