High Availability (HA) Concepts

Introduction: Eliminating the Single Point of Failure

In the previous lesson, we identified the core components of Kubernetes. We saw that components like the kube-apiserver and etcd are absolutely critical for a functioning cluster. This leads to a very important question: what happens if the single machine running your control plane fails?
If that node goes down, etcd is gone, the API server is unreachable, and the controllers stop running. While your existing applications on the worker nodes might continue to run for a while, the cluster is effectively unmanageable. You can’t deploy new applications, scale existing ones, or respond to any failures.
This is what we call a “single point of failure,” and in any production system, our goal is to eliminate them. The solution is High Availability (HA).
High Availability (HA) is the practice of designing a system to be resilient to failures, ensuring it remains operational and accessible even when individual components fail.

The Foundation of HA: A Resilient etcd Cluster

When we talk about Kubernetes HA, we must start with its brain and source of truth: etcd.
Making etcd highly available is the foundation of a resilient control plane. etcd is a distributed system, designed to be run as a cluster of multiple members. For this cluster to work, it needs to be able to safely write data and agree on the state of the world. To do this, it must achieve quorum.

Understanding Quorum

Quorum is the minimum number of members that must be online and in agreement for the etcd cluster to be operational. The formula for calculating quorum is:
Quorum = floor(N / 2) + 1
…where N is the total number of members in the etcd cluster.
This formula is the reason why you will always hear the advice: “Always run an odd number of control plane nodes.” Let’s examine why:
2-Node Cluster: Quorum is floor(2/2) + 1 = 2. Both nodes must be online. If you lose one, you lose quorum. Fault Tolerance: 0 nodes.
3-Node Cluster: Quorum is floor(3/2) + 1 = 1 + 1 = 2. You can lose one node, and the remaining two can still form a quorum. Fault Tolerance: 1 node.
4-Node Cluster: Quorum is floor(4/2) + 1 = 2 + 1 = 3. You can still only afford to lose one node. If you lose two, the two remaining members can’t form a quorum. Fault Tolerance: 1 node.
5-Node Cluster: Quorum is floor(5/2) + 1 = 2 + 1 = 3. You can lose two nodes, and the remaining three can still form a quorum. Fault Tolerance: 2 nodes.
As you can see, an odd number of nodes always gives you the best return on fault tolerance for the number of machines you are running. This is why production clusters will almost always have 3 or 5 control plane nodes.
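The arithmetic above can be sketched in a few lines of Python. The function names are illustrative; the formulas are exactly the ones from the table:

```python
def quorum(n: int) -> int:
    """Minimum members that must agree: floor(n / 2) + 1."""
    return n // 2 + 1

def fault_tolerance(n: int) -> int:
    """How many members the cluster can lose and still reach quorum."""
    return n - quorum(n)

# Print the trade-off for cluster sizes 1 through 7.
for n in range(1, 8):
    print(f"{n} members: quorum={quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

Running this makes the pattern obvious: fault tolerance only increases when you step from an even size to the next odd size, which is why the even sizes are a poor deal.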

Control Plane HA Architectures

Now that we know we need multiple etcd members, where do we run them? This leads to two common HA architectures.
Stacked Control Plane: This is the most common approach. In this model, the etcd members run on the same machines as the other control plane components (kube-apiserver, kube-scheduler, etc.). It is simpler to configure and manage, and it requires fewer physical or virtual machines. The trade-off is a coupled failure domain: losing one node takes out both an API server instance and an etcd member at the same time.
External etcd: In this model, the etcd cluster runs on a dedicated set of machines, completely separate from the nodes running the other control plane components. This provides better resilience by decoupling the components. If a control plane node running the API server fails, it doesn’t also take an etcd member with it. This separation is often used for very large-scale or business-critical clusters where maximum resilience is the primary goal.
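As one concrete illustration, kubeadm expresses this choice in its ClusterConfiguration: omitting the etcd section gives you the default stacked topology, while an external block points every API server at a dedicated etcd cluster. The endpoints, hostname, and certificate paths below are placeholders:

```yaml
# Hypothetical kubeadm ClusterConfiguration for the external-etcd topology.
# All addresses and file paths are illustrative placeholders.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controlPlaneEndpoint: "k8s-api.example.com:6443"   # the load balancer address
etcd:
  external:
    endpoints:
      - https://10.0.1.10:2379
      - https://10.0.1.11:2379
      - https://10.0.1.12:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
```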

Accessing the HA Control Plane: The Role of the Load Balancer

With three control plane nodes, each running its own kube-apiserver, a new problem arises: how do worker nodes and kubectl clients know which API server to talk to?
The solution is to place a Load Balancer in front of the control plane nodes.
The load balancer provides a single, stable endpoint (an IP address or DNS name) that acts as the front door for your entire cluster. All components, from the kubelets on the worker nodes to your own command line, are configured to talk to this one address.
The load balancer’s job is twofold:
Distribute Traffic: It distributes incoming API requests across all the healthy kube-apiserver instances.
Provide Failover: It continuously runs health checks on the API servers. If a control plane node fails, the load balancer automatically detects this, stops sending traffic to the failed node, and seamlessly redirects all requests to the remaining healthy nodes.
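Both jobs can be seen in a minimal load balancer configuration. The sketch below uses HAProxy in TCP mode (the API servers terminate their own TLS); the node addresses are assumptions for illustration:

```
# Hypothetical HAProxy fragment fronting three kube-apiserver instances.
frontend kube-apiserver
    bind *:6443
    mode tcp
    default_backend control-plane-nodes

backend control-plane-nodes
    mode tcp
    balance roundrobin        # distribute requests across healthy API servers
    option tcp-check          # health-check each backend; failed nodes stop receiving traffic
    server cp1 10.0.0.11:6443 check
    server cp2 10.0.0.12:6443 check
    server cp3 10.0.0.13:6443 check
```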

Beyond the Control Plane: A Holistic Approach to HA

A resilient control plane is the most important step, but true high availability involves the entire cluster. You must design for resilience at every layer.
Worker Nodes & Workloads: Run your worker nodes across different physical locations. In a cloud environment, this means distributing nodes across multiple Availability Zones (AZs), which are effectively separate data centers. To complement this, use Kubernetes features like Pod Anti-Affinity. This allows you to create scheduling rules that prevent multiple replicas of your application from being placed on the same node or even in the same AZ, preventing a single point of failure for your application.
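A zone-spreading rule of this kind can be sketched as a Pod Anti-Affinity fragment inside a Deployment's Pod template. The label key and value are illustrative:

```yaml
# Hypothetical fragment: forbid two replicas labelled app: web in the same AZ.
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: web
              topologyKey: topology.kubernetes.io/zone  # use kubernetes.io/hostname for per-node spread
```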
Storage: If your application is stateful, its data must also be highly available. Use resilient network storage solutions that replicate data across multiple locations. In the cloud, this means using storage classes that are multi-AZ aware, so if a Pod fails over to a new AZ, its persistent storage volume can follow it.
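In practice, topology awareness is often configured on the StorageClass. A common pattern is to delay volume binding until the Pod has been scheduled, so the volume is provisioned in whatever zone the scheduler picked. The provisioner and parameters below are one assumed example (the AWS EBS CSI driver); substitute your own:

```yaml
# Hypothetical topology-aware StorageClass sketch.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: resilient-ssd
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer  # provision the volume in the zone where the Pod lands
parameters:
  type: gp3
```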
Networking: For on-premise clusters, this means having redundant switches, routers, and network paths. In the cloud, you can largely rely on the highly available networking infrastructure provided by your vendor.
By considering resilience at the control plane, worker node, application, storage, and network layers, you can build a truly robust and highly available Kubernetes cluster.