Introduction to the CONNECT Engine
CONNECT is not just a new “YASE” (Yet another Storage Engine) that provides another way to store data with additional features. It brings a new dimension to MariaDB, already one of the best products to deal with traditional database transactional applications, further into the world of business intelligence and data analysis, including NoSQL facilities. Indeed, BI is the set of techniques and tools for the transformation of raw data into meaningful and useful information. And where is this data?
"It's amazing in an age where relational databases reign supreme when it comes to managing data that so much information still exists outside RDBMS engines in the form of flat files and other such constructs. In most enterprises, data is passed back and forth between disparate systems in a fashion and speed that would rival the busiest expressways in the world, with much of this data existing in common, delimited files. Target systems intercept these source files and then typically proceed to load them via ETL (extract, transform, load) processes into databases that then utilize the information for business intelligence, transactional functions, or other standard operations. ETL tasks and data movement jobs can consume quite a bit of time and resources, especially if large volumes of data are present that require loading into a database. This being the case, many DBAs welcome alternative means of accessing and managing data that exists in file format."
- Robin Schumacher[1]
What he describes is known as MED (Management of External Data) enabling the handling of data not stored in a DBMS database as if it were stored in tables. An ISO standard exists that describes one way to implement and use MED in SQL by defining foreign tables for which an external FDW (Foreign Data Wrapper) has been developed in C.
However, since this was written, a new source of data was developed as the “cloud”. Data are existing worldwide and, in particular, can be obtained in JSON or XML format in answer to REST queries. From Connect 1.06.0010, it is possible to create JSON, XML or CSV tables based on data retrieved from such REST queries.
MED as described above is a rather complex way to achieve this goal and MariaDB does not support the ISO SQL/MED standard. But, to cover the need, possibly in transactional but mostly in decision support applications, the CONNECT storage engine supports MED in a much simpler way.
The main features of CONNECT are:
- No need for additional SQL language extensions.
- Embedded wrappers for many external data types (files, data sources, virtual).
- NoSQL query facilities for JSON, XML, HTML files and using JSON UDFs.
- NoSQL data obtained from REST queries (requires cpprestsdk).
- NoSQL new data type MONGO accessing MongoDB collections as relational tables.
- Read/Write access to external files of most commonly used formats.
- Direct access to most external data sources via ODBC, JDBC and MySQL or MongoDB API.
- Only used columns are retrieved from external scan.
- Push-down WHERE clauses when appropriate.
- Support of special and virtual columns.
- Parallel execution of multi-table tables (currently unavailable).
- Supports partitioning by sub-files or by sub-tables (enabling table sharding).
- Support of MRR for SELECT, UPDATE and DELETE.
- Provides remote, block, dynamic and virtual indexing.
- Can execute complex queries on remote servers.
- Provides an API that allows writing additional FDW in C++.
With CONNECT, MariaDB has one of the most advanced implementations of MED and NoSQL, without the need for complex additions to the SQL syntax (foreign tables are "normal" tables using the CONNECT engine).
Giving MariaDB easy and natural access to external data enables the use of all of its powerful functions and SQL-handling abilities for developing business intelligence applications.
With version 1.07 of CONNECT, retrieving data from REST queries is available in all binary distributed version of MariaDB, and, from 1.07.002, CONNECT allows workspaces greater than 4GB.
- ↑ Robin Schumacher is Vice President Products at DataStax and former Director of Product Management at MySQL. He has over 13 years of database experience in DB2, MySQL, Oracle, SQL Server and other database engines.