MariaDB Galera cluster GTIDs falling out of sync despite setting wsrep_cluster
I have asynchronous replication between two clusters (call them the master cluster and the slave cluster). As expected, one node in the slave cluster receives the async replication stream and Galera transmits the transactions to the rest of the nodes in the slave cluster. When the GTID counter is incremented on the nodes of this slave cluster, the node acting as the async slave keeps track of the master's server_id and increments the GTID as wsrep_gtid_domain_id, server_id, Galera counter+1. However, the other nodes of the slave cluster do not keep track of that server_id and increment the GTID as wsrep_gtid_domain_id, Galera counter+1. See below:
Slave node of the slave cluster
show variables like '%gtid%';
+------------------------+-------------------------+
| Variable_name          | Value                   |
+------------------------+-------------------------+
| gtid_binlog_pos        | 2-100-36261             |
| gtid_binlog_state      | 2-200-36263,2-100-36261 |
| gtid_current_pos       | 1-100-36261             |
| gtid_domain_id         | 22                      |
| gtid_ignore_duplicates | OFF                     |
| gtid_seq_no            | 0                       |
| gtid_slave_pos         | 1-100-36261             |
| gtid_strict_mode       | OFF                     |
| last_gtid              | 2-200-36263             |
| wsrep_gtid_domain_id   | 2                       |
| wsrep_gtid_mode        | ON                      |
+------------------------+-------------------------+
11 rows in set (0.00 sec)
Another node in the slave cluster
show variables like '%gtid%';
+------------------------+-------------------------+
| Variable_name          | Value                   |
+------------------------+-------------------------+
| gtid_binlog_pos        | 2-100-36265             |
| gtid_binlog_state      | 2-200-36264,2-100-36265 |
| gtid_current_pos       | 1-100-36259             |
| gtid_domain_id         | 22                      |
| gtid_ignore_duplicates | OFF                     |
| gtid_seq_no            | 0                       |
| gtid_slave_pos         | 1-100-36259             |
| gtid_strict_mode       | OFF                     |
| last_gtid              |                         |
| wsrep_gtid_domain_id   | 2                       |
| wsrep_gtid_mode        | ON                      |
+------------------------+-------------------------+
11 rows in set (0.00 sec)
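For comparison, a quick way to see each node's view of these positions is to query the GTID variables and the mysql.gtid_slave_pos table directly (just a sketch; run it on every node of the slave cluster):

-- Each node's view of the async replication position
SELECT @@hostname                AS node,
       @@GLOBAL.gtid_slave_pos   AS slave_pos,
       @@GLOBAL.gtid_binlog_pos  AS binlog_pos,
       @@GLOBAL.gtid_current_pos AS current_pos;

-- The slave position as persisted on disk
SELECT domain_id, sub_id, server_id, seq_no FROM mysql.gtid_slave_pos;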
This behavior has been noted in MariaDB 10.1.11, 10.1.13 and 10.1.14. In 10.1.11 the mysql.gtid_slave_pos table is replicated by Galera, so if the slave node goes down it is possible to move the slave process to another node by first setting the gtid_slave_pos variable (sketched below). However, starting with 10.1.13 the table is no longer replicated. I accept that this was a fix for another issue, but it means there is now no way to determine the slave position in order to start replication on another node of the cluster.
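For reference, the failover I am describing is roughly the following (a sketch; the GTID value is taken from the output above, and the master host, user and password are placeholders):

-- On the surviving node that should take over as the async slave
-- (requires that no slave threads are configured/running on this node yet)
SET GLOBAL gtid_slave_pos = '1-100-36261';    -- last position applied by the failed node
CHANGE MASTER TO
    MASTER_HOST     = 'master-cluster-node',  -- placeholder
    MASTER_USER     = 'repl',                 -- placeholder
    MASTER_PASSWORD = '***',
    MASTER_USE_GTID = slave_pos;
START SLAVE;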
Can this behavior be classified as a bug, and should a bug report be opened?