Vector Overview
MariaDB starting with 11.7.1
Vectors were introduced in MariaDB 11.7.1
MariaDB Vector is a feature that allows MariaDB Server to perform as a relational vector database. Vectors generated by your favourite AI model can be stored and searched for in MariaDB.
The initial implementation uses the modified HNSW algorithm for searching in the vector index (to solve the so-called Approximate Nearest Neighbor problem), and defaults to Euclidean distance.
Creating
Vectors are defined using VECTOR INDEX
for the index definition in the CREATE TABLE with vectors statement.
CREATE TABLE v ( id INT PRIMARY KEY, v VECTOR(5) NOT NULL, VECTOR INDEX (v) );
Inserting
Vector columns store 32-bit IEEE 754 floating point numbers.
INSERT INTO v VALUES (1, x'e360d63ebe554f3fcdbc523f4522193f5236083d'), (2, x'f511303f72224a3fdd05fe3eb22a133ffae86a3f'), (3,x'f09baa3ea172763f123def3e0c7fe53e288bf33e'), (4,x'b97a523f2a193e3eb4f62e3f2d23583e9dd60d3f'), (5,x'f7c5df3e984b2b3e65e59d3d7376db3eac63773e'), (6,x'de01453ffa486d3f10aa4d3fdd66813c71cb163f'), (7,x'76edfc3e4b57243f10f8423fb158713f020bda3e'), (8,x'56926c3fdf098d3e2c8c5e3d1ad4953daa9d0b3e'), (9,x'7b713f3e5258323f80d1113d673b2b3f66e3583f'), (10,x'6ca1d43e9df91b3fe580da3e1c247d3f147cf33e');
Alternatively one can use VEC_FromText()
function:
INSERT INTO v VALUES (1,Vec_FromText('[0.418708,0.809902,0.823193,0.598179,0.0332549]')), (2,Vec_FromText('[0.687774,0.789588,0.496138,0.57487,0.917617]')), (3,Vec_FromText('[0.333221,0.962687,0.467263,0.448235,0.475671]')), (4,Vec_FromText('[0.822185,0.185643,0.683452,0.211072,0.554056]')), (5,Vec_FromText('[0.437057,0.167281,0.0770977,0.428638,0.241591]')), (6,Vec_FromText('[0.76956,0.926895,0.803376,0.0157961,0.589042]')), (7,Vec_FromText('[0.493999,0.641957,0.761598,0.94276,0.425865]')), (8,Vec_FromText('[0.924108,0.275466,0.0543329,0.0731585,0.136344]')), (9,Vec_FromText('[0.186956,0.69666,0.0356002,0.668875,0.84722]')), (10,Vec_FromText('[0.415294,0.609278,0.426765,0.988832,0.475556]'));
Querying
VEC_DISTANCE_EUCLIDEAN is a function that calculates a Euclidean (L2) distance between two points.
SELECT id FROM v ORDER BY VEC_DISTANCE_EUCLIDEAN(v, x'6ca1d43e9df91b3fe580da3e1c247d3f147cf33e'); +----+ | id | +----+ | 10 | | 7 | | 3 | | 9 | | 2 | | 1 | | 5 | | 4 | | 6 | | 8 | +----+
Most commonly, this kind of query is done with a limit, for example to return vectors that are closest to a given vector, such as from a user search query, image or a song fragment.
SELECT id FROM v ORDER BY VEC_DISTANCE_EUCLIDEAN(v, x'6ca1d43e9df91b3fe580da3e1c247d3f147cf33e') LIMIT 2; +----+ | id | +----+ | 10 | | 7 | +----+
System Variables
There are a number of system variables used for vectors:
- mhnsw_max_cache_size — upper limit for one MHNSW vector index cache.
- mhnsw_default_distance — default value for the
DISTANCE
vector index option. - mhnsw_default_m — default value for the
M
vector index option. - mhnsw_ef_search — Defines the minimal number of result candidates to look for in the vector index for ORDER BY ... LIMIT N queries. The search will never search for less rows than that, even if LIMIT is smaller.