Data corruption on the secondary node of a MariaDB Galera cluster while the main node is OK: how to recover?
We have a Galera cluster of MariaDB databases, consisting of:
galera-4.x86_64 v26.4.9-1.el7.centos @mariadb
MariaDB-server.x86_64 v10.5.12-1.el7.centos @mariadb
The cluster has two nodes running active-active. The main node has a higher quorum weight than the secondary node, so that ties are broken in its favor during a network outage.
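As a minimal sketch, assuming the weights are set through Galera's pc.weight provider option (the post does not say which mechanism is used), the effective weight on a node can be read back from wsrep_provider_options. The helper name pc_weight is hypothetical.

```shell
# Print the pc.weight value found in a wsrep_provider_options string on stdin.
# Assumption: options are separated by ';' and written as "name = value".
pc_weight() {
  tr ';' '\n' |
    sed -n 's/^[[:space:]]*pc\.weight[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*/\1/p'
}

# Usage on a node (prompts for the password):
#   mysql -u username -p -N -B \
#     -e "SHOW VARIABLES LIKE 'wsrep_provider_options'" | pc_weight
```

Running this on both nodes would show whether the main node really carries the larger weight.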
A recent vulnerability scan included a simulated cyberattack, and the test caused the mariadbd daemon to fail. The main node failed to restart, and the secondary node's system log contained messages about data corruption.
We recovered the cluster by:
- first, re-bootstrapping the main node, and then
- restarting the secondary node with innodb_force_recovery=1, and then
- restarting the secondary node again with innodb_force_recovery=1 removed from the configuration file.
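The steps above can be sketched roughly as follows. This is a dry-run outline under stated assumptions, not the exact commands we ran: it assumes the stock systemd unit named mariadb, uses galera_new_cluster for the bootstrap, and passes the recovery option through the MYSQLD_OPTS environment variable instead of editing the configuration file. The function names are hypothetical.

```shell
# With DRY_RUN=1 (the default here) each command is only printed;
# set DRY_RUN=0 to actually execute it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step 1, on the main node: bootstrap a new primary component.
bootstrap_main() {
  run galera_new_cluster
}

# Step 2, on the secondary node: start once with forced InnoDB recovery.
# Assumption: the option is injected via MYSQLD_OPTS, not the config file.
start_secondary_forced() {
  run systemctl set-environment MYSQLD_OPTS=--innodb-force-recovery=1
  run systemctl start mariadb
}

# Step 3, on the secondary node: restart without the recovery option.
restart_secondary_normal() {
  run systemctl stop mariadb
  run systemctl unset-environment MYSQLD_OPTS
  run systemctl start mariadb
}
```

Calling bootstrap_main on the main node, then start_secondary_forced and restart_secondary_normal on the secondary node, prints the sequence without touching the services.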
The database cluster now works, and our test transactions succeed. However, mysqlcheck found corrupted index B-trees on the secondary node, while it reports OK on the main node.
Our questions:
- Considering the clustering context, what is the best way to fix the data corruption?
- What might have caused the corruption, and how can we prevent it in the future?
- Optionally, if we want to troubleshoot similar issues ourselves, how do we start learning the programming basics of this open-source project? Tentatively, we are thinking about learning 1) core dumping, 2) core backtracing, and 3) debugging the database binary with its source code, but we are open to your advice on where to start.
We are new to this area, and we highly appreciate hints, suggestions, and reference links.
Details:
The mysqlcheck results on the secondary node:
-sh-4.2$ mysqlcheck --all-databases --verbose -u username -p
Enter password:
Processing databases
appdb
appdb.auth_group OK
appdb.auth_group_permissions OK
appdb.auth_permission OK
appdb.auth_user OK
appdb.auth_user_groups OK
appdb.auth_user_user_permissions OK
appdb.authtoken_token OK
appdb.celery_taskmeta OK
appdb.celery_tasksetmeta OK
appdb.corsheaders_corsmodel OK
appdb.django_admin_log OK
appdb.django_content_type OK
appdb.django_migrations OK
appdb.django_session OK
appdb.django_site OK
appdb.djcelery_crontabschedule OK
appdb.djcelery_intervalschedule OK
appdb.djcelery_periodictask OK
appdb.djcelery_periodictasks OK
appdb.djcelery_taskstate OK
appdb.djcelery_workerstate OK
appdb.eav_attribute OK
appdb.eav_enumgroup OK
appdb.eav_enumgroup_enums OK
appdb.eav_enumvalue OK
appdb.eav_value OK
appdb.app_business OK
appdb.app_customer
Warning : InnoDB: The B-tree of index PRIMARY is corrupted.
Warning : InnoDB: The B-tree of index app_customer_transaction_id_2xxxxxxxxxxxxxxa_uniq is corrupted.
Warning : InnoDB: The B-tree of index eb_transaction_id_2xxxxxxxxxxxxxxa_fk_app_transaction_id is corrupted.
error : Corrupt
appdb.app_emailtemplate OK
appdb.app_ministry OK
appdb.app_orderitem OK
appdb.app_orderitemlog OK
appdb.app_product OK
appdb.app_producttemplate OK
appdb.app_producttemplateattribute OK
appdb.app_transaction OK
appdb.app_transactionattribute OK
appdb.app_userbusiness OK
information_schema
-sh-4.2$
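To make it easier to compare the two nodes, the verbose output can be reduced to only the tables reported as broken. The filter below is a minimal sketch: it assumes the output layout shown here (a db.table line, optionally followed by Warning and error lines), and the function name corrupt_tables is hypothetical.

```shell
# Read mysqlcheck output on stdin and print each table that has an
# "error" line reported under it.
corrupt_tables() {
  awk '
    # Remember the most recent "db.table" line (with or without trailing OK).
    $1 ~ /^[A-Za-z0-9_]+\.[A-Za-z0-9_]+$/ && (NF == 1 || $2 == "OK") {
      table = $1
      next
    }
    # An "error : ..." line belongs to the last table seen; print it once.
    $1 == "error" && table != reported {
      print table
      reported = table
    }
  '
}

# Usage on a node (prompts for the password):
#   mysqlcheck --all-databases --verbose -u username -p | corrupt_tables
```

Run against the output above, this would list only appdb.app_customer; an empty result on the main node would match what we observed there.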