High Availability
This section provides guidance on how to run MariaDB and MaxScale in high availability mode. If you are looking to run the operator itself in HA as well, please refer to the Helm documentation.
Our recommended HA setup for production is:
- Galera with at least 3 nodes, always an odd number, so that quorum can be established.
- MaxScale with at least 2 nodes to load balance requests to the Galera cluster.
- Use dedicated nodes to avoid noisy neighbours.
- Define pod disruption budgets.
Refer to the following sections for further detail.
Kubernetes Services
In order to address nodes, MariaDB Enterprise Operator provides you with the following Kubernetes Services:

- `<mariadb-name>`: To be used for read requests. It will point to all nodes.
- `<mariadb-name>-primary`: To be used for write requests. It will point to a single node, the primary.
- `<mariadb-name>-secondary`: To be used for read requests. It will point to all nodes, except the primary.
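For instance, assuming a MariaDB named `mariadb-galera` deployed in the `default` namespace (instance name, namespace and user below are hypothetical), applications can reach these Services through the usual in-cluster DNS names:

```bash
# Writes: always target the primary Service
mariadb -h mariadb-galera-primary.default.svc.cluster.local -u myuser -p

# Reads: target all nodes, or all nodes except the primary
mariadb -h mariadb-galera.default.svc.cluster.local -u myuser -p
mariadb -h mariadb-galera-secondary.default.svc.cluster.local -u myuser -p
```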
Whenever the primary changes, either by the user or by the operator, both the `<mariadb-name>-primary` and `<mariadb-name>-secondary` Services will be automatically updated by the operator to address the right nodes.

The primary may be manually changed by the user at any point by updating the `spec.galera.primary.podIndex` field. Alternatively, automatic primary failover can be enabled by setting `spec.galera.primary.automaticFailover`, which will make the operator switch the primary whenever the primary Pod goes down.
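As a reference, a minimal sketch of these fields in the MariaDB resource, assuming a Galera cluster named `mariadb-galera`:

```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-galera
spec:
  ...
  galera:
    enabled: true
    primary:
      # Pod to be used as the primary: mariadb-galera-0
      podIndex: 0
      # Let the operator switch the primary whenever the primary Pod goes down
      automaticFailover: true
```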
MaxScale
While Kubernetes Services
can be utilized to dynamically address primary and secondary instances, the most robust high availability configuration we recommend relies on MaxScale. Please refer to MaxScale docs for further detail.
Pod Anti-Affinity
WARNING
Bear in mind that, when enabling this, you need to have at least as many Nodes available as the replicas specified. Otherwise your Pods will remain unscheduled and the cluster won't bootstrap.
To achieve real high availability, we need to run each MariaDB Pod on a different Kubernetes Node. This practice, known as anti-affinity, helps reduce the blast radius of a Node becoming unavailable.

By default, anti-affinity is disabled, which means that multiple Pods may be scheduled on the same Node, something undesirable in HA scenarios.
You can selectively enable anti-affinity in all the different Pods
managed by the MariaDB
resource:
```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-galera
spec:
  bootstrapFrom:
    restoreJob:
      affinity:
        antiAffinityEnabled: true
  ...
  metrics:
    exporter:
      affinity:
        antiAffinityEnabled: true
  ...
  affinity:
    antiAffinityEnabled: true
```
Anti-affinity may also be enabled in the resources that have a reference to MariaDB, resulting in their Pods being scheduled on Nodes where MariaDB is not running. For instance, the Backup and Restore processes can run on different Nodes:
```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: Backup
metadata:
  name: backup
spec:
  mariaDbRef:
    name: mariadb-galera
  ...
  affinity:
    antiAffinityEnabled: true
```
```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: Restore
metadata:
  name: restore
spec:
  mariaDbRef:
    name: mariadb-galera
  ...
  affinity:
    antiAffinityEnabled: true
```
In the case of MaxScale, its Pods will also be placed on compute-isolated Nodes, ensuring isolation not only among themselves but also from the MariaDB Pods. For example, if you run MariaDB and MaxScale with 3 replicas each, you will need 6 Nodes in total:
```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MaxScale
metadata:
  name: maxscale-galera
spec:
  mariaDbRef:
    name: mariadb-galera
  ...
  metrics:
    exporter:
      affinity:
        antiAffinityEnabled: true
  ...
  affinity:
    antiAffinityEnabled: true
```
Default anti-affinity rules generated by the operator might not satisfy your needs, but you can always define your own rules. For example, if you want the MaxScale Pods to run on different Nodes, while allowing them to share Nodes with MariaDB:
```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MaxScale
metadata:
  name: maxscale-galera
spec:
  mariaDbRef:
    name: mariadb-galera
  ...
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/instance
                operator: In
                values:
                  - maxscale-galera
                  # 'mariadb-galera' instance omitted (default anti-affinity rule)
          topologyKey: kubernetes.io/hostname
```
Dedicated Nodes
If you want to avoid noisy neighbours running on the same Kubernetes Nodes as your MariaDB, you may consider using dedicated Nodes. To achieve this, you will need to:

- Taint your Nodes and add the counterpart toleration to your Pods (see the kubectl sketch after this list).

IMPORTANT
Tainting your Nodes is not covered by this operator; it is something you need to do yourself beforehand. You may take a look at the Kubernetes documentation to understand how to achieve this.

- Select the Nodes where Pods will be scheduled via a `nodeSelector`.

NOTE
Although you can use the default Node labels, you may consider adding more meaningful labels to your Nodes, as you will have to reference them in your Pod's `nodeSelector`. Refer to the Kubernetes documentation.

- Add `podAntiAffinity` to your Pods as described in the Pod Anti-Affinity section.
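For the tainting and labeling steps, a minimal kubectl sketch, assuming a Node named `worker-1` (hypothetical) and the taint and label keys used in the manifest below:

```bash
# Taint the Node so that only Pods with a matching toleration can be scheduled on it
kubectl taint nodes worker-1 enterprise.mariadb.com/ha:NoSchedule

# Label the Node so that it can be targeted via the Pod's nodeSelector
kubectl label nodes worker-1 enterprise.mariadb.com/node=ha
```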
The previous steps can be achieved by setting these fields in the MariaDB
resource:
```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-galera
spec:
  ...
  tolerations:
    - key: "enterprise.mariadb.com/ha"
      operator: "Exists"
      effect: "NoSchedule"
  nodeSelector:
    "enterprise.mariadb.com/node": "ha"
  affinity:
    antiAffinityEnabled: true
```
Pod Disruption Budgets
IMPORTANT
Take a look at the Kubernetes documentation if you are unfamiliar with `PodDisruptionBudgets`.

By defining a `PodDisruptionBudget`, you are telling Kubernetes how many Pods your database can tolerate being down. This is quite important for planned maintenance operations such as Node upgrades.
MariaDB Enterprise Operator creates a default `PodDisruptionBudget` if you are running in HA, but you can define your own by setting:
```yaml
apiVersion: enterprise.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-galera
spec:
  ...
  podDisruptionBudget:
    maxUnavailable: 33%
```
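With 3 replicas, a `maxUnavailable` of 33% allows a single Pod to be voluntarily disrupted at a time, which keeps Galera quorum intact during maintenance. You can inspect the resulting budget with kubectl:

```bash
# List the PodDisruptionBudgets in the namespace where MariaDB runs
kubectl get pdb -n <namespace>
```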