MariaDB Galera - nodes cannot join cluster
Hi,
I have a 3-node MariaDB Galera cluster that was working on Linux; the firewall was stopped but not disabled. After a power failure, the firewall was started again and all nodes had safe_to_bootstrap: 0 at the same seqno.
I changed safe_to_bootstrap to 1 on one node, and that node started.
The other nodes didn't start because of the firewall. So I stopped all the firewalls, stopped everything again, and did a restart: galera_new_cluster on the 1st node,
then systemctl start mariadb on the other nodes... but the 2nd and 3rd nodes don't start.
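For anyone who wants to compare the saved state across nodes, here is a minimal sketch of how the seqno and safe_to_bootstrap values can be read from a grastate.dat-style file. The helper name grastate_field and the sample file contents are only illustrative (the uuid is the one from my logs, the seqno value is made up); on a real node the file is normally /var/lib/mysql/grastate.dat.

```shell
#!/bin/sh
# Print the value of one field (e.g. seqno or safe_to_bootstrap)
# from a grastate.dat-style file.
# grastate_field is a hypothetical helper name, not a Galera tool.
#   $1 = field name, $2 = path to the state file
grastate_field() {
    awk -F': *' -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# Illustrative sample only; the seqno here is an assumed value.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# GALERA saved state
version: 2.1
uuid:    81e2e742-0016-11ea-ae1f-221166a346b3
seqno:   -1
safe_to_bootstrap: 0
EOF

echo "seqno=$(grastate_field seqno "$sample")"
echo "safe_to_bootstrap=$(grastate_field safe_to_bootstrap "$sample")"
rm -f "$sample"
```

On a real node the call would be something like grastate_field seqno /var/lib/mysql/grastate.dat, run on each of the three machines so the values can be compared side by side.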
On the 1st node (IP ending with .140), I get timeouts, which is expected since the other nodes (the 2nd node has the IP ending with .141) don't start. On node 1:
---
[root@eidlot2-database-1 ]# systemctl status mariadb
● mariadb.service - MariaDB 10.4.9 database server
Loaded: loaded (/usr/lib/systemd/system/mariadb.service; disabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/mariadb.service.d
└─migrated-from-my.cnf-settings.conf
Active: active (running) since Mon 2020-01-27 14:23:34 CET; 32min ago
Docs: man:mysqld(8)
https://mariadb.com/kb/en/library/systemd/
Process: 15194 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Process: 15068 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=0/SUCCESS)
Process: 15066 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Main PID: 15154 (mysqld)
Status: "Taking your SQL requests now..."
CGroup: /system.slice/mariadb.service
└─15154 /usr/sbin/mysqld --wsrep-new-cluster --wsrep_start_position=81e2e742-0016-11ea-ae1f-221166a346b3:38024
Jan 27 14:55:44 eidlot2-database-1.pass.lan mysqld[15154]: 2020-01-27 14:55:44 0 [Note] WSREP: (37ab1fde, 'tcp:0.0.0.0:4567') connection to peer 00000000 with addr tcp:172.16.57.141:4567 timed out, no messages seen in PT3S
Jan 27 14:55:46 eidlot2-database-1.pass.lan mysqld[15154]: 2020-01-27 14:55:46 0 [Note] WSREP: (37ab1fde, 'tcp:0.0.0.0:4567') connection to peer 00000000 with addr tcp:172.16.57.144:4567 timed out, no messages seen in PT3S
Jan 27 14:55:48 eidlot2-database-1.pass.lan mysqld[15154]: 2020-01-27 14:55:48 0 [Note] WSREP: (37ab1fde, 'tcp:0.0.0.0:4567') connection to peer 00000000 with addr tcp:172.16.57.141:4567 timed out, no messages seen in PT3S
Jan 27 14:55:50 eidlot2-database-1.pass.lan mysqld[15154]: 2020-01-27 14:55:50 0 [Note] WSREP: (37ab1fde, 'tcp:0.0.0.0:4567') connection to peer 00000000 with addr tcp:172.16.57.144:4567 timed out, no messages seen in PT3S
Jan 27 14:55:52 eidlot2-database-1.pass.lan mysqld[15154]: 2020-01-27 14:55:52 0 [Note] WSREP: (37ab1fde, 'tcp:0.0.0.0:4567') connection to peer 00000000 with addr tcp:172.16.57.141:4567 timed out, no messages seen in PT3S
Jan 27 14:55:54 eidlot2-database-1.pass.lan mysqld[15154]: 2020-01-27 14:55:54 0 [Note] WSREP: (37ab1fde, 'tcp:0.0.0.0:4567') connection to peer 00000000 with addr tcp:172.16.57.144:4567 timed out, no messages seen in PT3S
Jan 27 14:55:56 eidlot2-database-1.pass.lan mysqld[15154]: 2020-01-27 14:55:56 0 [Note] WSREP: (37ab1fde, 'tcp:0.0.0.0:4567') connection to peer 00000000 with addr tcp:172.16.57.141:4567 timed out, no messages seen in PT3S
Jan 27 14:55:58 eidlot2-database-1.pass.lan mysqld[15154]: 2020-01-27 14:55:58 0 [Note] WSREP: (37ab1fde, 'tcp:0.0.0.0:4567') connection to peer 00000000 with addr tcp:172.16.57.144:4567 timed out, no messages seen in PT3S
Jan 27 14:56:00 eidlot2-database-1.pass.lan mysqld[15154]: 2020-01-27 14:56:00 0 [Note] WSREP: (37ab1fde, 'tcp:0.0.0.0:4567') connection to peer 00000000 with addr tcp:172.16.57.141:4567 timed out, no messages seen in PT3S
Jan 27 14:56:02 eidlot2-database-1.pass.lan mysqld[15154]: 2020-01-27 14:56:02 0 [Note] WSREP: (37ab1fde, 'tcp:0.0.0.0:4567') connection to peer 00000000 with addr tcp:172.16.57.144:4567 timed out, no messages seen in PT3S
On node 2:
[root@eidlot2-database-2 ]# systemctl status mariadb
● mariadb.service - MariaDB 10.4.9 database server
Loaded: loaded (/usr/lib/systemd/system/mariadb.service; disabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/mariadb.service.d
└─migrated-from-my.cnf-settings.conf
Active: failed (Result: exit-code) since Mon 2020-01-27 14:24:55 CET; 33min ago
Docs: man:mysqld(8)
https://mariadb.com/kb/en/library/systemd/
Process: 20219 ExecStart=/usr/sbin/mysqld $MYSQLD_OPTS $_WSREP_NEW_CLUSTER $_WSREP_START_POSITION (code=exited, status=1/FAILURE)
Process: 20133 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=0/SUCCESS)
Process: 20131 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Main PID: 20219 (code=exited, status=1/FAILURE)
Status: "MariaDB server is down"
Jan 27 14:24:54 eidlot2-database-2.pass.lan mysqld[20219]: at gcomm/src/pc.cpp:connect():158
Jan 27 14:24:54 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:54 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():220: Failed to open backend connection: -110 (Connection timed out)
Jan 27 14:24:54 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:54 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1608: Failed to open channel 'galeratest' at 'gcomm:172.16.57.140,172.16.57.141,172.16.57.144'...tion timed out)
Jan 27 14:24:54 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:54 0 [ERROR] WSREP: gcs connect failed: Connection timed out
Jan 27 14:24:54 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:54 0 [ERROR] WSREP: wsrep::connect(gcomm:172.16.57.140,172.16.57.141,172.16.57.144) failed: 7
Jan 27 14:24:54 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:54 0 [ERROR] Aborting
Jan 27 14:24:55 eidlot2-database-2.pass.lan systemd[1]: mariadb.service: main process exited, code=exited, status=1/FAILURE
Jan 27 14:24:55 eidlot2-database-2.pass.lan systemd[1]: Failed to start MariaDB 10.4.9 database server.
Jan 27 14:24:55 eidlot2-database-2.pass.lan systemd[1]: Unit mariadb.service entered failed state.
Jan 27 14:24:55 eidlot2-database-2.pass.lan systemd[1]: mariadb.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
[root@eidlot2-database-2 ]# journalctl -xe
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: GCache::RingBuffer initial scan... 0.0% ( 0/314572824 bytes) complete.
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: GCache::RingBuffer initial scan...100.0% (314572824/314572824 bytes) complete.
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: Recovering GCache ring buffer: found gapless sequence 209-37984
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: GCache::RingBuffer unused buffers scan... 0.0% ( 0/61980600 bytes) complete.
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: GCache::RingBuffer unused buffers scan...100.0% (61980600/61980600 bytes) complete.
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: GCache DEBUG: RingBuffer::recover(): found 1/37778 locked buffers
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: GCache DEBUG: RingBuffer::recover(): used space: 61980600/314572800
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 172.16.57.141; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa =
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: Service thread queue flushed.
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: Assign initial position for certification: 81e2e742-0016-11ea-ae1f-221166a346b3:37984, protocol version: -1
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: Start replication
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: Connecting with bootstrap option: 0
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: Setting GCS initial position to 81e2e742-0016-11ea-ae1f-221166a346b3:37984
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: protonet asio version 0
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: Using CRC-32C for message checksums.
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: backend: asio
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Warning] WSREP: access file(/var/lib/mysqlgvwstate.dat) failed(No such file or directory)
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: restore pc from disk failed
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: GMCast version 0
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: (55e1b535, 'tcp:0.0.0.0:4567') listening at tcp:0.0.0.0:4567
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: (55e1b535, 'tcp:0.0.0.0:4567') multicast: , ttl: 1
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: EVS version 1
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: gcomm: connecting to group 'galeratest', peer '172.16.57.140:,172.16.57.141:,172.16.57.144:'
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: (55e1b535, 'tcp:0.0.0.0:4567') Found matching local endpoint for a connection, blacklisting address tcp:172.16.57.141:4567
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: (55e1b535, 'tcp:0.0.0.0:4567') connection established to 55d53448 tcp:172.16.57.144:4567
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: (55e1b535, 'tcp:0.0.0.0:4567') turning message relay requesting on, nonlive peers:
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: (55e1b535, 'tcp:0.0.0.0:4567') connection established to 37ab1fde tcp:172.16.57.140:4567
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: EVS version upgrade 0 -> 1
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: declaring 37ab1fde at tcp:172.16.57.140:4567 stable
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: declaring 55d53448 at tcp:172.16.57.144:4567 stable
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: PC protocol upgrade 0 -> 1
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:24 0 [Note] WSREP: view(view_id(NON_PRIM,37ab1fde,12) memb {
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 37ab1fde,0
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 55d53448,0
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 55e1b535,0
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: } joined {
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: } left {
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: } partitioned {
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 48b83e85,0
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: 4d71f4d4,0
Jan 27 14:24:24 eidlot2-database-2.pass.lan mysqld[20219]: })
Jan 27 14:24:25 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:25 0 [Note] WSREP: (55e1b535, 'tcp:0.0.0.0:4567') connection established to 55d53448 tcp:172.16.57.144:4567
Jan 27 14:24:28 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:28 0 [Note] WSREP: (55e1b535, 'tcp:0.0.0.0:4567') turning message relay requesting off
Jan 27 14:24:54 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:54 0 [Note] WSREP: (55e1b535, 'tcp:0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp:172.16.57.144:4567
Jan 27 14:24:54 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:54 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
Jan 27 14:24:54 eidlot2-database-2.pass.lan mysqld[20219]: at gcomm/src/pc.cpp:connect():158
Jan 27 14:24:54 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:54 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():220: Failed to open backend connection: -110 (Connection timed out)
Jan 27 14:24:54 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:54 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1608: Failed to open channel 'galeratest' at 'gcomm:172.16.57.140,172.16.57.141,172.16.57.144': -110 (Connection
Jan 27 14:24:54 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:54 0 [ERROR] WSREP: gcs connect failed: Connection timed out
Jan 27 14:24:54 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:54 0 [ERROR] WSREP: wsrep::connect(gcomm:172.16.57.140,172.16.57.141,172.16.57.144) failed: 7
Jan 27 14:24:54 eidlot2-database-2.pass.lan mysqld[20219]: 2020-01-27 14:24:54 0 [ERROR] Aborting
Jan 27 14:24:55 eidlot2-database-2.pass.lan systemd[1]: mariadb.service: main process exited, code=exited, status=1/FAILURE
Jan 27 14:24:55 eidlot2-database-2.pass.lan systemd[1]: Failed to start MariaDB 10.4.9 database server.
-- Subject: Unit mariadb.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit mariadb.service has failed.
--
-- The result is failed.
Jan 27 14:24:55 eidlot2-database-2.pass.lan systemd[1]: Unit mariadb.service entered failed state.
Jan 27 14:24:55 eidlot2-database-2.pass.lan systemd[1]: mariadb.service failed.
-------------------------------------------------------
(IP .144 is the 3rd node, which is still down and not yet started, so the -110 timeout for it is expected.)
My question: why does node 2 go into the failed state? If it joins node 1, they have a majority. Should I also set safe_to_bootstrap to 1 on the 2nd node?
Thanks.