Node Joining Failure When wsrep_sst_method=mariabackup
Hello,
Simplest possible test environment for evaluating MariaDB clustering as follows (aka Steps to Reproduce):
Two fresh installs of CentOS 7 minimal, updated to all latest packages. Remove all firewall rules (non-internet facing test servers).
Imported MariaDB yum repo details.
yum install MariaDB-server MariaDB-backup perl-DBD-MySQL socat
As of this writing, MariaDB-server and MariaDB-backup version 10.3.8 get installed, all to default locations.
/etc/my.cnf.d/server.cnf on both machines includes:
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address=gcomm://10.0.0.1,10.0.0.2
wsrep_sst_method=mariabackup
wsrep_sst_auth=******:******
Bring up first MariaDB server with:
galera_new_cluster
Create SST user on first MariaDB server using the same credentials specified above for wsrep_sst_auth:
GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT, SUPER ON *.* TO ******@'localhost' identified by '******';
Bring up second MariaDB server with:
systemctl start mariadb
File innobackup.backup.log on first MariaDB (the donor) includes lines such as:
Streaming ./mysql/event.MYD to STDOUT ...done
Last line in innobackup.backup.log on that machine is:
completed OK!
Second server still fails to join cluster. Log file on second MariaDB server ends with:
WSREP_SST: [INFO] Cleaning the binlog directory /var/log/mysql/bin as well (20180723 22:16:40.849)
removed ‘/var/log/mysql/bin/mariadb-master-bin.000001’
WSREP_SST: [INFO] Waiting for SST streaming to complete! (20180723 22:16:40.855)
2018-07-23 22:16:41 0 [Note] WSREP: (11d8586e, 'tcp://0.0.0.0:4567') turning message relay requesting off
2018-07-23 22:16:52 0 [Note] WSREP: 1.0 (mariadb1.local): State transfer to 0.0 (mariadb2.local) complete.
2018-07-23 22:16:52 0 [Note] WSREP: Member 1.0 (mariadb1.local) synced with group.
WSREP_SST: [INFO] Preparing the backup at /var/lib/mysql//.sst (20180723 22:16:52.240)
WSREP_SST: [INFO] Evaluating mariabackup --innobackupex --apply-log $rebuildcmd ${DATA} &>${DATA}/innobackup.prepare.log (20180723 22:16:52.244)
rm: cannot remove ‘/var/lib/mysql//innobackup.prepare.log’: No such file or directory
rm: cannot remove ‘/var/lib/mysql//innobackup.move.log’: No such file or directory
WSREP_SST: [INFO] Moving the backup to /var/lib/mysql/ (20180723 22:16:52.644)
WSREP_SST: [INFO] Evaluating mariabackup --innobackupex --move-back --force-non-empty-directories ${DATA} &>${DATA}/innobackup.move.log (20180723 22:16:52.647)
WSREP_SST: [ERROR] Cleanup after exit with status:1 (20180723 22:16:52.662)
2018-07-23 22:16:52 0 [ERROR] WSREP: Process completed with error: wsrep_sst_mariabackup --role 'joiner' --address '10.0.0.2' --datadir '/var/lib/mysql/' --parent '5115' --binlog '/var/log/mysql/bin/mariadb-master-bin' : 1 (Operation not permitted)
2018-07-23 22:16:52 0 [ERROR] WSREP: Failed to read uuid:seqno and wsrep_gtid_domain_id from joiner script.
2018-07-23 22:16:52 0 [ERROR] WSREP: SST failed: 1 (Operation not permitted)
2018-07-23 22:16:52 0 [ERROR] Aborting
No [ERROR] lines on the first MariaDB server at all.
Have been running MariaDB clusters with the rsync SST method for years just fine. The above example works perfectly when using the rsync SST method instead. Cannot get even this simplest test scenario to work with the mariabackup method. Grateful for any suggestions please.
Answer
Answered by Geoff Montee.
> First, setting the SST user's GRANT line to include the IP addresses instead of localhost. Previously, I was using 'sst'@'localhost' as is the normal/basic setup. However, MariaBackup seems to require an IP address for each node (e.g. Node1 needs a grant for 'sst'@'node2', 'sst'@'node3', etc). As the current setup has a private RFC1918 /24 subnet available, I used 'sst'@'10.20.30.%' for the GRANT.
This is not correct. The 'sst'@'localhost' user account on the donor node is sufficient. If this were a privilege issue, then your SST would have failed on the donor node--not the joiner node. See here:
https://mariadb.com/kb/en/library/mariabackup-sst-method/#authentication-and-privileges
> Finally, the 'datadir' entry under [mysqld] MUST exist or MariaBackup will make a mess of itself.
This is most likely what caused your problems. However, keep in mind that this bug has been fixed in MariaDB 10.3.10 and later. See here:
https://mariadb.com/kb/en/library/mariabackup-overview/#no-default-datadir
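On an affected version, one quick check is whether the option file really sets datadir under [mysqld]. The following is a minimal sketch, not an official tool: the function name has_mysqld_datadir is hypothetical, the path /etc/my.cnf.d/server.cnf is an assumption about where the server options live, and only that one file is inspected (included files and other option groups the server reads are not followed).

```shell
#!/bin/sh
# Hypothetical helper: succeed if FILE sets datadir inside its [mysqld] section.
# Only this one file is inspected; !include directives are not followed.
has_mysqld_datadir() {
    awk '
        # Any section header switches the flag on or off.
        /^[ \t]*\[/ { in_mysqld = ($0 ~ /^[ \t]*\[mysqld\][ \t]*$/) }
        # A datadir assignment counts only while inside [mysqld].
        in_mysqld && /^[ \t]*datadir[ \t]*=/ { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Assumed path; adjust to wherever your [mysqld] options are set.
cnf=/etc/my.cnf.d/server.cnf
if [ -r "$cnf" ]; then
    if has_mysqld_datadir "$cnf"; then
        echo "datadir is set under [mysqld] in $cnf"
    else
        echo "datadir is NOT set under [mysqld] in $cnf"
    fi
fi
```

If the check reports that datadir is missing, adding an explicit datadir line under [mysqld] on both nodes (for example the default /var/lib/mysql/ seen in the joiner log) is the change the quoted advice describes.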
> No [ERROR] lines on the first MariaDB server at all.
In the future, remember to check the SST logs on the donor node and the joiner node! See here:
https://mariadb.com/kb/en/library/mariabackup-sst-method/#logs
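As a rough sketch of that check, the helper below prints the tail of each mariabackup SST log it finds in a data directory. The log names are taken from the output quoted in the question (innobackup.backup.log on the donor; innobackup.prepare.log and innobackup.move.log on the joiner). The function name show_sst_logs, the default of /var/lib/mysql, and the choice of 20 lines are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: print the last lines of each mariabackup SST log
# that exists in the given data directory (default: /var/lib/mysql).
show_sst_logs() {
    datadir=${1:-/var/lib/mysql}
    for name in innobackup.backup.log innobackup.prepare.log innobackup.move.log; do
        log="$datadir/$name"
        if [ -f "$log" ]; then
            echo "== $log =="
            tail -n 20 "$log"
        else
            # A missing log is itself a hint about which SST stage was reached.
            echo "== $log (not present) =="
        fi
    done
}
```

Run show_sst_logs on the donor and again on the joiner, passing the data directory as the first argument if it is not the default.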