Data archiving with Mqfilter and Tee filters
Data archiving with Mqfilter and Tee filters
This tutorial gives a quick look into how you can combine various filters to create systems for archiving data for analysis. The aim of this tutorial is to show what can be done with MariaDB MaxScale's filters rather than demonstrate a proven method of archiving data. For this tutorial you will need two MariaDB servers, one for archiving the data and one for actual use, a RabbitMQ server and a MariaDB MaxScale server. For testing purposes some of these can locate on the same server but for actual use, an HA solution is recommended.
The use case for this tutorial is a production system with one main server where all queries are routed and an archive server where only INSERT, UPDATE and DELETE statements are routed. The queries routed to the archive servers are also transformed into a canonical format and sent to a RabbitMQ broker for analysis. This setup allows us to control what we send to the server and could possibly allow us to filter out DELETE statements completely, making the archive server a true archive of all data.
Setting up MariaDB MaxScale
The installation of MariaDB MaxScale is covered in the Installation chapter of the MariaDB MaxScale Tutorial.
Setting up the MariaDB servers
Since the archive server will not replicate from the main server, we don't need to set up replication between the two. The only thing we need to do is to create the users we will use for monitoring and authentication.
The process of creating monitoring and authentication users for MariaDB MaxScale is described in the Creating Database Users section of the MariaDB MaxScale Tutorial.
Setting up RabbitMQ server
To set up the RabbitMQ server, follow the instructions for your OS on the RabbitMQ website.
Useful documentation about access rights can be found on the Access Control
page and for UNIX systems the rabbitmqctl
manpage
has all the needed commands to manage your installation of RabbitMQ.
For this tutorial, we will use a RabbitMQ server installed on a CentOS 7 from the RPM packages. Since CentOS 7 doesn't have the RabbitMQ server in the default repositories, we will need two extra repositories: The EPEL repository and the Erlang repository.
First we need the EPEL and Erlang repositories for CentOS 7 after which we will install the RabbitMQ server:
sudo yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm sudo yum -y install http://packages.erlang-solutions.com/erlang-solutions-1.0-1.noarch.rpm sudo yum -y install https://www.rabbitmq.com/releases/rabbitmq-server/v3.5.6/rabbitmq-server-3.5.6-1.noarch.rpm
Next step is to start the RabbitMQ server and configure it.
sudo systemctl start rabbitmq-server
We need to create the msguser
account which we will use to relay the messages
to the broker. This can be done with the rabbitmqctl
command.
sudo rabbitmqctl add_user "msguser" "msgpwd" sudo rabbitmqctl set_permissions "msguser" ".*" ".*" ".*"
This creates the msguser
account with msgpwd
as the password and grants read, write and configurations
permissions on the default virtual host /
.
Configuration file creation
The final step before testing is the creation of the maxscale.cnf file. We start off by adding the two servers to the configuration file.
[production-1] type=server address=192.168.0.200 port=3306 protocol=MariaDBBackend [archive-1] type=server address=192.168.0.201 port=3000 protocol=MariaDBBackend
After we have defined the production-1
and archive-1
servers, we need a monitor
module for those servers. This module will detect if connectivity to the servers
is lost and notify MariaDB MaxScale of the changed server states.
[MySQL-Monitor] type=monitor module=mariadbmon servers=production-1, archive-1 user=maxuser password=maxpwd monitor_interval=5000
The monitor will use the user maxuser
with the password maxpwd
to connect to
the servers and query them for their state. In the servers
parameter we have
listed both of the production-1
and archive-1
servers. All objects in the
MariaDB MaxScale configuration file are referred by their section names. Here the section
names of the servers are used in the servers
parameter. The monitor_interval
parameter controls how often the monitor will poll the servers for status. For
this tutorial, we've set it to 5000 milliseconds.
Next we configure the Production and Archive services. The Production service will be the main access point to the system.
[Production] type=service router=readconnroute servers=production-1 user=maxuser password=maxpwd filters=Tee [Archive] type=service router=readconnroute servers=archive-1 user=maxuser password=maxpwd filters=MQ-Filter
The filters
parameters for the services refer to the filters we will be creating next.
The Production service will use the Tee filter to duplicate INSERT, UPDATE and DELETE
statements to the Archive service. The statements passed to the Archive service will
use the MQ Filter to send the canonical versions of the statements to the RabbitMQ broker.
The Production service will use the production-1
server and the Archive service will
use the archive-1
server. Both services user the maxuser
user with the maxpwd
password.
Next we will configure the listeners for these two services.
[Production-Listener] type=listener service=Production protocol=MariaDBClient port=4000 [Archive-Listener] type=listener service=Archive protocol=MariaDBClient port=4001
The port
parameter controls which port the listener will listen on and where the client
connections should be made. The service
parameter tells which listener belongs to which
service.
After the services and their listeners are configured we will configure the two filters we'll use. We begin with the Tee filter.
[Tee] type=filter module=tee service=Archive match=\(insert\)\|\(update\)\|\(delete\)
The service
parameter controls which service we want to duplicate the statements to.
This needs to be set to the section name of the archive service: Archive
. The match
parameter is a regular expression which, if matched, cause the statement to be duplicated.
Here we will want a regular expression which will match the INSERT, UPDATE and DELETE
statements.
After the Tee filter is configured, we will configure the Mqfilter to communicate with the RabbitMQ server we've set up.
[MQ-Filter] type=filter module=mqfilter hostname=192.168.0.201 port=5672 username=msguser password=msgpwd exchange=msg-ex-1 queue=msg-queue-1 key=MaxScale
The hostname
is the address and the port
is the port of the RabbitMQ server. For
username
and password
we will use the msguser
and msgpwd
credentials we
created earlier. The exchange
is the name of the RabbitMQ exchange we're using and
the key
is the key used for all the sent messages. The messages will be stored
in the mxs-queue-1
queue.
The last thing to add is the administration interface.
[MaxAdmin-Service] type=service router=cli [MaxAdmin-Listener] type=listener service=MaxAdmin-Service protocol=maxscaled socket=default
Now we have created the MariaDB MaxScale configuration file and all we need to do is to save
it in /etc/maxscale.cnf
, start MariaDB MaxScale and test that it works. The testing will
be done in the next section.
Here is the complete configuration file.
# The production and archive servers [production-1] type=server address=192.168.0.200 port=3306 protocol=MariaDBBackend [archive-1] type=server address=192.168.0.201 port=3000 protocol=MariaDBBackend # MySQL server monitor [MySQL-Monitor] type=monitor module=mariadbmon servers=production-1, archive-1 user=maxuser password=maxpwd monitor_interval=5000 # Production service [Production] type=service router=readconnroute servers=production-1 user=maxuser password=maxpwd filters=Tee # Archive service [Archive] type=service router=readconnroute servers=archive-1 user=maxuser password=maxpwd filters=MQ-Filter # Listeners for the services [Production-Listener] type=listener service=Production protocol=MariaDBClient port=4000 [Archive-Listener] type=listener service=Archive protocol=MariaDBClient port=4001 # Tee filter to duplicate insert, update and delete # statements to the archive server [Tee] type=filter module=tee service=Archive match=\(insert\)\|\(update\)\|\(delete\) # Mqfilter to log the canonical versions of the archive # server queries to a RabbitMQ broker [MQ-Filter] type=filter module=mqfilter hostname=192.168.0.201 port=5672 username=msguser password=msgpwd exchange=msg-ex-1 queue=msg-queue-1 key=MaxScale # Administration interface setup [MaxAdmin-Service] type=service router=cli [MaxAdmin-Listener] type=listener service=MaxAdmin-Service protocol=maxscaled socket=default
Testing the setup
Now that we have created the configuration file, prepared the RabbitMQ server and the database servers we can start testing the setup. We do that by starting MariaDB MaxScale:
sudo systemctl start maxscale
We can see the state of the two servers with MaxAdmin:
sudo maxadmin list servers Servers. -------------------+-----------------+-------+-------------+-------------------- Server | Address | Port | Connections | Status -------------------+-----------------+-------+-------------+-------------------- production-1 | 192.168.0.200 | 3306 | 0 | Running archive-1 | 192.168.0.201 | 3000 | 0 | Running -------------------+-----------------+-------+-------------+--------------------
We can also check the queue status on the RabbitMQ server:
sudo rabbitmqctl list_queues Listing queues ...
If we create a connection on the Production service on port 4000 and execute a set of data modifying statements we should see an equal number of statements being sent to the RabbitMQ server:
MariaDB [(none)]> insert into test.t1 values(1); Query OK, 1 row affected (0.09 sec) MariaDB [(none)]> insert into test.t1 values(2); Query OK, 1 row affected (0.07 sec) MariaDB [(none)]> update test.t1 set id = 0; Query OK, 2 rows affected (0.05 sec) Rows matched: 2 Changed: 2 Warnings: 0
On the RabbitMQ server we can see the new queue msg-queue-1
with three new messages in it.
sudo rabbitmqctl list_queues Listing queues ... msg-queue-1 3
We can also see the data on the archive server:
MariaDB [(none)]> select * from test.t1; +------+ | id | +------+ | 0 | | 0 | +------+ 2 rows in set (0.00 sec)
To read the data from the RabbitMQ, we can use the RabbitMQ Consumer tool included in the MariaDB MaxScale source. For a tutorial on how to use this tool, please read RabbitMQ Consumer Client.