CONNECT Data Types
Contents
Many data types make no or little sense when applied to plain files. This why CONNECT supports only a restricted set of data types. However, ODBC, JDBC or MYSQL source tables may contain data types not supported by CONNECT. In this case, CONNECT makes an automatic conversion to a similar supported type when it is possible.
The data types currently supported by CONNECT are:
Type name | Description | Used for |
---|---|---|
TYPE_STRING | Zero ended string | char, varchar, text |
TYPE_INT | 4 bytes integer | int, mediumint, integer |
TYPE_SHORT | 2 bytes integer | smallint |
TYPE_TINY | 1 byte integer | tinyint |
TYPE_BIGINT | 8 bytes integer | bigint, longlong |
TYPE_DOUBLE | 8 bytes floating point | double, float, real |
TYPE_DECIM | Numeric value | decimal, numeric, number |
TYPE_DATE | 4 bytes integer | date, datetime, time, timestamp, year |
TYPE_STRING
This type corresponds to what is generally known as CHAR or VARCHAR by database users, or as strings by programmers. Columns containing characters have a maximum length but the character string is of fixed or variable length depending on the file format.
The DATA_CHARSET option must be used to specify the character set used in the data source or file. Note that, unlike usually with MariaDB, when a multi-byte character set is used, the column size represents the number of bytes the column value can contain, not the number of characters.
TYPE_INT
The INTEGER type contains signed integer numeric 4-byte values (the int/ of
the C language) ranging from –2,147,483,648
to 2,147,483,647
for signed
type and 0
to 4,294,967,295
for unsigned type.
TYPE_SHORT
The SHORT data type contains signed integer numeric 2-byte values (the short
integer of the C language) ranging from –32,768
to 32,767
for signed
type and 0
to 65,535
for unsigned type.
TYPE_TINY
The TINY data type contains integer numeric 1-byte values (the char of
the C language) ranging from –128
to 127
for signed type and 0
to
255
for unsigned type. For some table types, TYPE_TINY is used to represent Boolean values (0 is false, anything else is true).
TYPE_BIGINT
The BIGINT data type contains signed integer 8-byte values (the long long of the C language) ranging from -9,223,372,036,854,775,808
to
9,223,372,036,854,775,807
for signed type and from 0
to
18,446,744,073,709,551,615
for unsigned type.
Inside tables, the coding of all integer values depends on the table type. In
tables represented by text files, the number is written in characters, while in
tables represented by binary files (BIN
or VEC
) the number is directly
stored in the binary representation corresponding to the platform.
The length (or precision) specification corresponds to the length of the table field in which the value is stored for text files only. It is used to set the output field length for all table types.
TYPE_DOUBLE
The DOUBLE data type corresponds to the C language double type, a floating-point double precision value coded with 8 bytes. Like for integers, the internal coding in tables depends on the table type, characters for text files, and platform binary representation for binary files.
The length specification corresponds to the length of the table field in which the value is stored for text files only. The scale (was precision) is the number of decimal digits written into text files. For binary table types (BIN and VEC) this does not apply. The length and scale specifications are used to set the output field length and number of decimals for all types of tables.
TYPE_DECIM
The DECIMAL data type corresponds to what MariaDB or ODBC data sources call NUMBER, NUMERIC, or DECIMAL: a numeric value with a maximum number of digits (the precision) some of them eventually being decimal digits (the scale). The internal coding in CONNECT is a character representation of the number. For instance:
colname decimal(14,6)
This defines a column colname as a number having a precision of 14 and a scale of 6. Supposing it is populated by:
insert into xxx values (-2658.74);
The internal representation of it will be the character string
-2658.740000
. The way it is stored in a file table depends on the table
type. The length field specification corresponds to the length of the table
field in which the value is stored and is calculated by CONNECT from the
precision and the scale values. This length is precision plus 1 if
scale is not 0 (for the decimal point) plus 1 if this column is not
unsigned (for the eventual minus sign). In fix formatted tables the number is
right justified in the field of width length, for variable formatted
tables, such as CSV, the field is the representing character string.
Because this type is mainly used by CONNECT to handle numeric or decimal fields of ODBC, JDBC and MySQL table types, CONNECT does not provide decimal calculations or comparison by itself. This is why decimal columns of CONNECT tables cannot be indexed.
DATE Data type
Internally, date/time values are stored by CONNECT as a signed 4-byte integer. The value 0 corresponds to 01 January 1970 12:00:00 am coordinated universal time (UTC). All other date/time values are represented by the number of seconds elapsed since or before midnight (00:00:00), 1 January 1970, to that date/time value. Date/time values before midnight 1 January 1970 are represented by a negative number of seconds.
CONNECT handles dates from 13 December 1901, 20:45:52 to 18 January 2038, 19:14:07.
Although date and time information can be represented in both CHAR and INTEGER data types, the DATE data type has special associated properties. For each DATE value, CONNECT can store all or only some of the following information: century, year, month, day, hour, minute, and second.
Date Format in Text Tables
Internally, date/time values are handled as a signed 4-byte integer. But in text tables (type DOS, FIX, CSV, FMT, and DBF) dates are most of the time stored as a formatted character string (although they also can be stored as a numeric string representing their internal value). Because there are infinite ways to format a date, the format to use for decoding dates, as well as the field length in the file, must be associated to date columns (except when they are stored as the internal numeric value).
Note that this associated format is used only to describe the way the temporal value is stored internally. This format is used both for output to decode the date in a SELECT statement as well as for input to encode the date in INSERT or UPDATE statements. However, what is kept in this value depends on the data type used in the column definition (all the MariaDB temporal values can be specified). When creating a table, the format is associated to a date column using the DATE_FORMAT option in the column definition, for instance:
create table birthday ( Name varchar(17), Bday date field_length=10 date_format='MM/DD/YYYY', Btime time field_length=8 date_format='hh:mm tt') engine=CONNECT table_type=CSV; insert into birthday values ('Charlie','2012-11-12','15:30:00'); select * from birthday;
The SELECT query returns:
Name | Bday | Btime |
---|---|---|
Charlie | 2012-11-12 | 15:30:00 |
The values of the INSERT statement must be specified using the standard MariaDB syntax and these values are displayed as MariaDB temporal values. Sure enough, the column formats apply only to the way these values are represented inside the CSV files. Here, the inserted record will be:
Charlie,11/12/2012,03:30 PM
Note: The field_length option exists because the MariaDB syntax does not allow specifying the field length between parentheses for temporal column types. If not specified, the field length is calculated from the date format (sometimes as a max value) or made equal to the default length value if there is no date format. In the above example it could have been removed as the calculated values are the ones specified. However, if the table type would have been DOS or FIX, these values could be adjusted to fit the actual field length within the file.
A CONNECT format string consists of a series of elements that represent a particular piece of information and define its format. The elements will be recognized in the order they appear in the format string. Date and time format elements will be replaced by the actual date and time as they appear in the source string. They are defined by the following groups of characters:
Element | Description |
---|---|
YY | The last two digits of the year (that is, 1996 would be coded as "96"). |
YYYY | The full year (that is, 1996 could be entered as "96" but displayed as “1996”). |
MM | The one or two-digit month number. |
MMM | The three-character month abbreviation. |
MMMM | The full month name. |
DD | The one or two-digit month day. |
DDD | The three-character weekday abbreviation. |
DDDD | The full weekday name. |
hh | The one or two-digit hour in 12-hour or 24-hour format. |
mm | The one or two-digit minute. |
ss | The one or two-digit second. |
t | The one-letter AM/PM abbreviation (that is, AM is entered as "A"). |
tt | The two-letter AM/PM abbreviation (that is, AM is entered as "AM"). |
Usage Notes
- To match the source string, you can add body text to the format string, enclosing it in single quotes or double quotes if it would be ambiguous. Punctuation marks do not need to be quoted.
- The hour information is regarded as 12-hour format if a “t” or “tt” element follows the “hh” element in the format or as 24-hour format otherwise.
- The "MM", "DD", "hh", "mm", "ss" elements can be specified with one or two letters (e.g. "MM" or "M") making no difference on input, but placing a leading zero to one-digit values on output[1] for two-letter elements.
- If the format contains elements DDD or DDDD, the day of week name is skipped on input and ignored to calculate the internal date value. On output, the correct day of week name is generated and displayed.
- Temporal values are always stored as numeric in BIN and VEC tables.
Handling dates that are out of the range of supported CONNECT dates
If you want to make a table containing, for instance, historical dates not being convertible into CONNECT dates, make your column CHAR or VARCHAR and store the dates in the MariaDB format. All date functions applied to these strings will convert them to MariaDB dates and will work as if they were real dates. Of course they must be inserted and will be displayed using the MariaDB format.
NULL handling
CONNECT handles null values for data sources able to produce nulls. Currently this concerns mainly the ODBC, JDBC, MONGO, MYSQL, XML, JSON and INI table types. For INI, JSON, MONGO or XML types, null values are returned when the key is missing in the section (INI) or when the corresponding node does not exist in a row (XML, JSON, MONGO).
For other file tables, the issue is to define what a null value is. In a numeric column, 0 can sometimes be a valid value but, in some other cases, it can make no sense. The same for character columns; is a blank field a valid value or not?
A special case is DATE columns with a DATE_FORMAT specified. Any value not matching the format can be regarded as NULL.
CONNECT leaves the decision to you. When declaring a column in the CREATE TABLE statement, if it is declared NOT NULL, blank or zero values will be considered as valid values. Otherwise they will be considered as NULL values. In all cases, nulls are replaced on insert or update by pseudo null values, a zero-length character string for text types or a zero value for numeric types. Once converted to pseudo null values, they will be recognized as NULL only for columns declared as nullable.
For instance:
create table t1 (a int, b char(10)) engine=connect; insert into t1 values (0,'zero'),(1,'one'),(2,'two'),(null,'???'); select * from t1 where a is null;
The select query replies:
a | b |
---|---|
NULL | zero |
NULL | ??? |
Sure enough, the value 0 entered on the first row is regarded as NULL for a nullable column. However, if we execute the query:
select * from t1 where a = 0;
This will return no line because a NULL is not equal to 0 in an SQL where clause.
Now let us see what happens with not null columns:
create table t1 (a int not null, b char(10) not null) engine=connect; insert into t1 values (0,'zero'),(1,'one'),(2,'two'),(null,'???');
The insert statement will produce a warning saying:
Level | Code | Message |
---|---|---|
Warning | 1048 | Column 'a' cannot be null |
It is replaced by a pseudo null 0
on the fourth row. Let us see the result:
select * from t1 where a is null; select * from t1 where a = 0;
The first query returns no rows, 0 are valid values and not NULL. The second query replies:
a | b |
---|---|
0 | zero |
0 | ??? |
It shows that the NULL inserted value was replaced by a valid 0 value.
Unsigned numeric types
They are supported by CONNECT since version 1.01.0010 for fixed numeric types (TINY, SHORT, INTEGER, and BITINT).
Data type conversion
CONNECT is able to convert data from one type to another in most cases. These conversions are done without warning even when this leads to truncation or loss of precision. This is true, in particular, for tables of type ODBC, JDBC, MYSQL and PROXY (via MySQL) because the source table may contain some data types not supported by CONNECT. They are converted when possible to CONNECT types.
When converted, MariaDB types are converted as:
MariaDB Types | CONNECT Type | Remark |
---|---|---|
integer, medium integer | TYPE_INT | 4 byte integer |
small integer | TYPE_SHORT | 2 byte integer |
tiny integer | TYPE_TINY | 1 byte integer |
char, varchar | TYPE_STRING | Same length |
double, float, real | TYPE_DOUBLE | 8 byte floating point |
decimal, numeric | TYPE_DECIM | Length depends on precision and scale |
all date related types | TYPE_DATE | Date format can be set accordingly |
bigint, longlong | TYPE_BIGINT | 8 byte integer |
enum, set | TYPE_STRING | Numeric value not accessible |
All text types | TYPE_STRING TYPE_ERROR | Depending on the value of the connect_type_conv system variable value. |
Other types | TYPE_ERROR | Not supported, no conversion provided. |
For ENUM, the length of the column is the length of the longest value of the enumeration. For SET the length is enough to contain all the set values concatenated with comma separator.
In the case of TEXT columns, the handling depends on the values given to the connect_type_conv and connect_conv_size system variables.
Note: BLOB is currently not converted by default until a TYPE_BIN type is added to CONNECT. However, the FORCE option (from Connect 1.06.006) can be specified for blob columns containing text and the SKIP option also applies to ODBC BLOB columns.
ODBC SQL types are converted as:
SQL Types | Connect Type | Remark |
---|---|---|
SQL_CHAR, SQL_VARCHAR | TYPE_STRING | |
SQL_LONGVARCHAR | TYPE_STRING | len = min(abs(len), connect_conv_size) If the column is generated by discovery (columns not specified) its length is connect_conv_size. |
SQL_NUMERIC, SQL_DECIMAL | TYPE_DECIM | |
SQL_INTEGER | TYPE_INT | |
SQL_SMALLINT | TYPE_SHORT | |
SQL_TINYINT, SQL_BIT | TYPE_TINY | |
SQL_FLOAT, SQL_REAL, SQL_DOUBLE | TYPE_DOUBLE | |
SQL_DATETIME | TYPE_DATE | len = 10 |
SQL_INTERVAL | TYPE_STRING | len = 8 + ((scale) ? (scale+1) : 0) |
SQL_TIMESTAMP | TYPE_DATE | len = 19 + ((scale) ? (scale +1) : 0) |
SQL_BIGINT | TYPE_BIGINT | |
SQL_GUID | TYPE_STRING | llen=36 |
SQL_BINARY, SQL_VARBINARY, SQL_LONG-VARBINARY | TYPE_STRING | len = min(abs(len), connect_conv_size ) Only if the value of connect_type_conv is force . The column should use the binary charset. |
Other types | TYPE_ERROR | Not supported. |
JDBC SQL types are converted as:
JDBC Types | Connect Type | Remark |
---|---|---|
(N)CHAR, (N)VARCHAR | TYPE_STRING | |
LONG(N)VARCHAR | TYPE_STRING | len = min(abs(len), connect_conv_size) If the column is generated by discovery (columns not specified), its length is connect_conv_size |
NUMERIC, DECIMAL, VARBINARY | TYPE_DECIM | |
INTEGER | TYPE_INT | |
SMALLINT | TYPE_SHORT | |
TINYINT, BIT | TYPE_TINY | |
FLOAT, REAL, DOUBLE | TYPE_DOUBLE | |
DATE | TYPE_DATE | len = 10 |
TIME | TYPE_DATE | len = 8 + ((scale) ? (scale+1) : 0) |
TIMESTAMP | TYPE_DATE | len = 19 + ((scale) ? (scale +1) : 0) |
BIGINT | TYPE_BIGINT | |
UUID (specific to PostgreSQL) | TYPE_STRING TYPE_ERROR | len=36 If connect_type_conv=NO |
Other types | TYPE_ERROR | Not supported. |
Note: The connect_type_conv SKIP option also applies to ODBC and JDBC tables.