MariaDB ColumnStore 1.2.3-1 cluster with UM1/UM2 replication fails HA
Hello,
I've been struggling to understand how to handle HA for this type of cluster: UM1, UM2, PM1 and PM2, with Data Redundancy (GlusterFS) and Data Replication enabled.
Consider the following disaster scenario:
The master host (UM1 by default) loses its network connection, leaving the slave UM2 as the only remaining front-end server. During the downtime, some writes go to the UM2 database.
When this happens, the cluster takes about 30 seconds to detect that UM1 is down, another 30 seconds to restart all services on the other hosts (PM1, PM2, UM2), and then another 30 seconds to restart them again (I have no idea why this happens twice). When UM1 comes back, it rejoins the cluster and the same minute-and-a-half sequence repeats. UM1 becomes the slave and UM2 becomes the master, but the records created during the downtime are not replicated. Replication only starts after I Disable/Enable MySQL Replication, and even then only records written after that point are replicated. The records written during the downtime remain only on UM2 and are never written to UM1.
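One plausible explanation for the missing rows, offered as an assumption rather than something confirmed for ColumnStore 1.2.3: if the returning UM1 is attached to UM2 at UM2's current binlog position, any events logged before that position are never read. The toy Python model below (all names are hypothetical and are not ColumnStore or MariaDB APIs) contrasts that case with starting from the position where UM2 began taking writes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a UM node: applied rows plus its own binlog."""
    rows: set = field(default_factory=set)
    binlog: list = field(default_factory=list)

    def write(self, row: str) -> None:
        # A client write is applied locally and recorded in this node's binlog.
        self.rows.add(row)
        self.binlog.append(row)


def replicate(master: Node, replica: Node, start_pos: int) -> int:
    """Apply master's binlog events from start_pos onward to replica.

    Events before start_pos are never read, as in position-based replication.
    Returns the position the replica would read from next.
    """
    for event in master.binlog[start_pos:]:
        replica.rows.add(event)
    return len(master.binlog)


def outage_scenario(start_at_current_position: bool) -> list:
    # Before the outage both UMs hold the same rows.
    um1, um2 = Node({"r1", "r2"}), Node({"r1", "r2"})
    # UM1 is offline; UM2 takes over and receives writes.
    um2.write("r3")
    um2.write("r4")
    # UM1 returns and is attached to UM2 as a replica.
    start = len(um2.binlog) if start_at_current_position else 0
    pos = replicate(um2, um1, start)
    # Writes made after the rejoin replicate normally in both cases.
    um2.write("r5")
    replicate(um2, um1, pos)
    return sorted(um1.rows)


print(outage_scenario(start_at_current_position=True))   # ['r1', 'r2', 'r5']
print(outage_scenario(start_at_current_position=False))  # ['r1', 'r2', 'r3', 'r4', 'r5']
```

The first case reproduces the symptom described above (only post-rejoin rows reach UM1); the second shows what starting from the failover point would look like. Whether ColumnStore's failover actually chooses the current position is not established here.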
Is there any way to avoid this and have the records sync automatically when the server rejoins the cluster? A second question: what kind of data is stored on GlusterFS? Isn't it supposed to preserve such records while some of the hosts are down, or does it only cover the PM hosts?