This is a read-only copy of the MariaDB Knowledgebase generated on 2024-11-21. For the latest, interactive version please visit https://mariadb.com/kb/.

ColumnStore User Defined Aggregate and Window Functions

Introduction

Starting with MariaDB ColumnStore 1.1, the ability to create and use user defined aggregate and window functions is supported in addition to scalar functions. With Columnstore 1.2, multiple parameters are supported. A C++ SDK is provided as well as 3 reference examples that provide additional functions that may be of general use:

  • median - mathematical median, equivalent to percentile_cont(0.5)
  • avg_mode - mathematical mode, i.e the most frequent value in the set
  • ssq - sum of squares, i.e the sum of each individual number squared in the set

Similar to built in functions, the SDK supports distributed aggregate execution where as much of the calculation is scaled out across PM nodes and then collected / finalized in the UM node. Window functions (due to the ordering requirement) are only executed at the UM level.

Using user defined aggregate functions

The reference examples above are included in the standard build of MariaDB ColumnStore and so can be used by registering them as user defined aggregate functions. The same can be done for new functions assuming the instance has the updated libraries included. From a mcsmysql prompt:

CREATE AGGREGATE FUNCTION median returns REAL soname 'libudf_mysql.so';
CREATE AGGREGATE FUNCTION avg_mode returns REAL soname 'libudf_mysql.so';
CREATE AGGREGATE FUNCTION ssq returns REAL soname 'libudf_mysql.so';

After this these may be used in the same way as any other aggregate or window function like sum:

SELECT grade, 
AVG(loan_amnt) avg, 
MEDIAN(loan_amnt) median 
FROM loanstats 
GROUP BY grade 
ORDER BY grade;

Developing a new function

This requires a MariaDB ColumnStore source tree and necessary tools to compile C/C++ code. The SDK and reference examples are available in the utils/udfsdk directory of the source tree. This contains the SDK documentation which is also available here:

Limitations

  • The implementation of the median and avg_mode functions will scale in memory consumption to the size of the set of unique values in the aggregation.
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.