
etcd

What is etcd, what data does it store, and why is it so critical to cluster reliability and consistency?

From the perspective of a platform engineer, etcd is the single most critical database in your Kubernetes infrastructure. It functions as the cluster's definitive "brain" and memory. While the API Server handles the processing of requests, it is stateless; etcd is the only component that persists the state. If you lose etcd, you lose the cluster.

What is etcd?

etcd is a consistent, highly available key-value store used as the backing store for all Kubernetes cluster data. Unlike the traditional relational (SQL) databases used in many web applications, etcd is a distributed key-value store designed specifically for the needs of distributed systems.

It operates as a leader-based distributed system. This means that in a cluster of etcd nodes (typically 3 or 5), one node is elected leader to handle writes, while the others serve as followers to ensure redundancy. It relies on the Raft consensus algorithm to ensure data consistency across these nodes.

What data does it store?

etcd stores the serialized state of all objects in the Kubernetes API. Practically, this encompasses two categories of data:

  1. The Record of Intent (Desired State): Every object you create—Pods, Deployments, Services, ConfigMaps, and Ingress rules—is serialized and written to etcd. When you apply a YAML manifest, you are essentially writing a record to etcd stating, "I want the cluster to look like this".

  2. The Cluster Status (Current State): The control plane components and Kubelets report status back to the API Server, which writes it to etcd. This includes:

    • Node health and readiness.
    • Which Node a Pod is currently running on.
    • The current phase of a Pod (Pending, Running, Failed).
    • Secrets: Sensitive data like passwords and tokens are stored in etcd. By default, Secret values are only base64-encoded, not encrypted, making etcd protection a critical security concern.
    • Events: Cluster events (logs of what has happened) are also stored here, though in large clusters, it is a best practice to shard these into a separate etcd instance to prevent performance degradation.

Why is etcd critical to reliability and consistency?

Because Kubernetes is a distributed system, it requires a "Single Source of Truth." etcd provides this through strong consistency.

1. The "Split-Brain" Prevention

In distributed systems, you cannot risk two different controllers making conflicting decisions because they see different versions of the data. etcd ensures that once a write is committed, all reads will reflect that change.

  • Operational Reality: You must run etcd with an odd number of members (usually 3 or 5) to establish a quorum. If a quorum cannot be established (e.g., 2 out of 3 nodes fail), etcd stops accepting writes to prevent data corruption.
  • Impact: If etcd becomes unstable or loses quorum, the cluster freezes. No new Pods can be scheduled, and the current state cannot be updated.
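The quorum arithmetic behind these rules is simple majority math, and it also explains why odd member counts are preferred. A minimal sketch:

```python
# Minimal sketch of etcd's quorum rule: a write is only committed once a
# majority of members acknowledge it, so losing the majority halts writes.

def quorum(members: int) -> int:
    """Smallest majority of a cluster of the given size."""
    return members // 2 + 1

def tolerable_failures(members: int) -> int:
    """How many members can fail while the cluster still accepts writes."""
    return members - quorum(members)

def accepts_writes(members: int, healthy: int) -> bool:
    return healthy >= quorum(members)

# Why odd sizes: adding a 4th member raises the quorum (2 -> 3) without
# raising fault tolerance (still 1), so 3 and 5 are the usual choices.
for n in (3, 4, 5):
    print(f"members={n} quorum={quorum(n)} tolerable_failures={tolerable_failures(n)}")

print(accepts_writes(3, 1))  # 2 of 3 members down -> quorum lost -> False
```

Note that a 4-member cluster tolerates no more failures than a 3-member one while adding replication overhead, which is exactly why the guidance is "usually 3 or 5."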

2. Sensitivity to Latency (Disk I/O)

etcd is extremely sensitive to disk write latency and network instability.

  • The Heartbeat: The leader must periodically send heartbeats to followers. If disk I/O saturation delays these heartbeats, followers may presume the leader is dead and trigger a new election. This causes "instability," where the cluster pauses operations while trying to elect a new leader.
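The failure mode above can be sketched with etcd's documented default tuning (heartbeat-interval of 100 ms, election-timeout of 1000 ms). This is a simplification: real Raft followers randomize their election timeout to avoid split votes, but the core check, "has the heartbeat been silent longer than the election timeout?", is the same.

```python
# Follower-side election timer, sketched with etcd's default tuning
# (--heartbeat-interval=100, --election-timeout=1000, both in ms).
# If disk or network stalls delay heartbeats past the election timeout,
# a follower campaigns even though the leader is still alive.

HEARTBEAT_INTERVAL_MS = 100   # leader normally sends a heartbeat this often
ELECTION_TIMEOUT_MS = 1000    # follower waits this long before campaigning

def follower_starts_election(ms_since_last_heartbeat: int) -> bool:
    return ms_since_last_heartbeat > ELECTION_TIMEOUT_MS

print(follower_starts_election(150))    # normal jitter: leader stays leader
print(follower_starts_election(1200))   # stalled I/O: spurious election begins
```

This is why the practical guidance is to measure disk fsync latency on etcd hosts: sustained write latency approaching the election timeout translates directly into leader churn.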

Best Practice: Dedicated Hardware

It is strongly recommended to run etcd on dedicated hardware or isolated environments with high-performance SSDs to guarantee resource requirements and prevent starvation.

3. Security Implication: The "Root" Bypass

Access to etcd is equivalent to having root permission on the entire cluster.

Security Alert

Bypassing RBAC: Kubernetes RBAC (Role-Based Access Control) is enforced by the API Server. If an attacker gains direct access to etcd, they can read all Secrets and modify cluster state directly, bypassing all API-level security controls.

Mitigation: You should enable encryption at rest for API data, and ensure that only the API Server certificates are authorized to communicate with etcd.
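Encryption at rest is configured by passing an EncryptionConfiguration file to the API Server via `--encryption-provider-config`. A minimal fragment encrypting Secrets with AES-CBC looks like the following (the key name, path, and base64 key value are placeholders you supply yourself):

```yaml
# Passed to kube-apiserver with --encryption-provider-config=<path-to-this-file>
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <BASE64-ENCODED 32-BYTE KEY>   # placeholder; generate your own
      - identity: {}   # fallback so existing plaintext data can still be read
```

The provider order matters: the first provider encrypts new writes, while later entries (here `identity`) allow reading data written before encryption was enabled. Existing Secrets are only re-encrypted when rewritten.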

Topology Choices

For production, you have two primary architectural choices for deploying etcd, usually managed by tools like kubeadm:

  1. Stacked etcd: The etcd members run on the same nodes as the control plane components. This is simpler to manage but couples the failure domains; if a control plane node goes down, you lose an etcd member.
  2. External etcd: The etcd cluster runs on hosts dedicated solely to etcd, separate from the control plane. This provides better isolation and performance guarantees but requires more infrastructure.
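With kubeadm, the external topology is selected by pointing the ClusterConfiguration at your etcd endpoints instead of letting kubeadm stack etcd locally. A hedged fragment (the hostnames are hypothetical; the certificate paths follow kubeadm's conventional layout):

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
etcd:
  external:
    endpoints:
      - https://etcd-0.example.internal:2379   # hypothetical hostnames
      - https://etcd-1.example.internal:2379
      - https://etcd-2.example.internal:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
```

When the `etcd.external` section is omitted entirely, kubeadm defaults to the stacked topology, running an etcd member alongside each control plane node.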