Etcd Architecture & Latency Metrics
How do I monitor etcd latency in a production cluster?
Monitoring etcd latency is a critical operational workflow. Because etcd serves as the highly available key-value store and the absolute "source of truth" for all Kubernetes cluster data, any disk or network starvation underneath it will rapidly destabilize the entire control plane.
To understand why latency kills etcd, you must first understand its architecture.
Part 1: Architecture & Mechanisms
The Storage Mechanism (Key-Value Store)
When the Kubernetes API server persists an object, it serializes the data—either in JSON or the highly efficient binary Protobuf format—and writes it directly into the etcd key-value store. This serialized state holds exactly what every Pod, ConfigMap, and Secret should look like.
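You can inspect this serialized state directly, since Kubernetes objects live under the `/registry` key prefix in etcd. A hedged sketch (the certificate paths follow a typical kubeadm layout, and the Pod name is an assumption):

```shell
# Read a Pod's stored record straight from etcd (kubeadm-style cert paths).
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  get /registry/pods/default/my-pod
# The value is typically the Protobuf-encoded Pod object,
# so expect mostly binary output rather than readable JSON.
```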
The Consensus Protocol (Raft)
To ensure that the stored data remains consistent across multiple nodes, etcd operates as a leader-based distributed system relying on the Raft consensus algorithm.
In this architecture, an elected leader node must continuously send periodic heartbeats to its follower nodes to validate its leadership and keep the cluster stable. Because the Raft protocol requires strict timing to achieve consensus, etcd is highly sensitive to network and disk I/O.
Leader Election Chaos
If slow disk performance (slow fsync times) delays the leader's heartbeats, followers that hear nothing within their election timeout assume the leader is dead and trigger a new election. During an election, the cluster refuses new writes, effectively causing the Kubernetes control plane to "freeze."
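Both timing windows are tunable on each etcd member. A sketch of the relevant flags as they might appear in a static pod manifest (the values shown are etcd's documented defaults; tune them for your network, keeping the election timeout roughly 10x the heartbeat interval):

```yaml
# Fragment of an etcd static pod command list (illustrative).
# --heartbeat-interval: how often the leader pings followers (milliseconds).
# --election-timeout: how long a follower waits without a heartbeat before
#   starting a new election (milliseconds).
- --heartbeat-interval=100
- --election-timeout=1000
```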
The Network Protocol (mTLS over TCP)
For data transmission, etcd utilizes TCP split into two distinct channels:
- Client Communication (Port 2379): The `kube-apiserver` (the client) communicates with `etcd`.
- Peer Communication (Port 2380): The individual `etcd` members communicate with one another to replicate state and achieve consensus.
These communication channels should always be explicitly secured using mutual TLS (mTLS) backed by an x509 Public Key Infrastructure (PKI). This ensures that only clients and peers with valid, trusted certificates can participate in the protocol.
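A sketch of the etcd flags that enforce mTLS on both channels (the file paths are assumptions; the flag names are etcd's own):

```yaml
# Illustrative etcd flags securing both channels with mTLS.
- --cert-file=/etc/etcd/pki/server.crt            # client channel (2379)
- --key-file=/etc/etcd/pki/server.key
- --trusted-ca-file=/etc/etcd/pki/ca.crt
- --client-cert-auth=true                          # require client certs
- --peer-cert-file=/etc/etcd/pki/peer.crt          # peer channel (2380)
- --peer-key-file=/etc/etcd/pki/peer.key
- --peer-trusted-ca-file=/etc/etcd/pki/ca.crt
- --peer-client-cert-auth=true                     # require peer certs
```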
Part 2: Monitoring Latency Bottlenecks
To prevent the cluster from freezing due to disk or network starvation, you must continuously monitor etcd health and latency.
1. Monitor Request Latencies via Metrics
To preemptively spot bottlenecks before they cause a full outage, you should continuously monitor request latencies using a tool like Prometheus.
In a standard Kubernetes architecture, etcd itself exposes Prometheus metrics on its /metrics endpoint (for example, the etcd_disk_wal_fsync_duration_seconds histogram), and the kube-apiserver reports its etcd request latencies via etcd_request_duration_seconds. By scraping these endpoints, operators can observe performance degradation (such as slow fsync write operations) and take action before the latency triggers leader elections.
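To illustrate what an alerting pipeline computes from those metrics, the sketch below estimates a p99 latency from cumulative histogram buckets the way Prometheus's histogram_quantile() does. The sample scrape text and the ~10 ms alerting threshold mentioned in the comment are assumptions for illustration; in production you would scrape etcd's /metrics endpoint over mTLS instead.

```python
# Sketch: estimate p99 fsync latency from etcd's Prometheus histogram buckets.

def histogram_quantile(q, buckets):
    """Linear-interpolation quantile over cumulative (le, count) buckets,
    mirroring the behavior of Prometheus's histogram_quantile()."""
    buckets = sorted(buckets, key=lambda b: b[0])
    total = buckets[-1][1]
    rank = q * total
    prev_le, prev_count = 0.0, 0.0
    for le, count in buckets:
        if count >= rank:
            if le == float("inf"):
                return prev_le  # cannot interpolate into the +Inf bucket
            # Interpolate linearly within this bucket.
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count
    return buckets[-1][0]

# Illustrative scrape output: cumulative counts per upper bound, in seconds.
SAMPLE = """\
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.001"} 800
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.002"} 950
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.004"} 990
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.008"} 1000
etcd_disk_wal_fsync_duration_seconds_bucket{le="+Inf"} 1000
"""

buckets = []
for line in SAMPLE.splitlines():
    name, value = line.rsplit(" ", 1)
    le = name.split('le="')[1].rstrip('"}')
    buckets.append((float("inf") if le == "+Inf" else float(le), float(value)))

p99 = histogram_quantile(0.99, buckets)
print(f"p99 fsync latency ~ {p99 * 1000:.2f} ms")  # alert if this stays above ~10 ms
```

The same calculation is expressed in PromQL as `histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))`, which is the usual expression to alert on.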
2. Track API Server Health Endpoints
Kubernetes itself is designed to continually monitor the health and responsiveness of etcd. The kube-apiserver acts as the gateway to etcd and exposes dedicated HTTP endpoints for this purpose:
- `/livez/etcd`
- `/readyz/etcd`
If etcd experiences severe latency that renders it effectively unreachable, these endpoints will explicitly fail and return an error (such as an HTTP 500 Internal Server Error). Load balancers should be configured to poll these endpoints to remove unhealthy API servers from rotation.
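As one way to wire this up, a load balancer fronting the API servers can probe the etcd-specific readiness check. A hedged HAProxy sketch (backend name, addresses, and ports are assumptions):

```
# HAProxy backend for the kube-apiservers (illustrative).
backend kube-apiservers
    mode tcp
    # Probe the etcd-specific readiness check over TLS; a non-200 response
    # (e.g. 500 when etcd is unreachable) marks the server DOWN.
    option httpchk GET /readyz/etcd
    http-check expect status 200
    server cp1 10.0.0.11:6443 check check-ssl verify none
    server cp2 10.0.0.12:6443 check check-ssl verify none
    server cp3 10.0.0.13:6443 check check-ssl verify none
```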
3. Perform Active Network Checks (etcdctl)
For direct operational troubleshooting, operators should use the etcdctl command-line tool.
(Note: While etcdutl is used for disk-level administration like restoring snapshots, etcdctl is designed for ongoing network-based operations).
By executing the etcdctl endpoint health command against your cluster, you can perform active, real-time health checks. This command interacts directly over mTLS with the etcd members—bypassing the Kubernetes API server entirely—to verify their individual responsiveness and overall ability to participate in the Raft quorum.
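A hedged sketch of such a check (endpoint addresses follow an assumed three-member topology, and the certificate paths match a typical kubeadm layout):

```shell
# Query every member directly over mTLS, bypassing the kube-apiserver.
ETCDCTL_API=3 etcdctl \
  --endpoints=https://10.0.0.11:2379,https://10.0.0.12:2379,https://10.0.0.13:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  endpoint health --write-out=table
```

When diagnosing election churn specifically, `etcdctl endpoint status` with the same flags adds per-member leader and Raft-term columns, which makes repeated leader changes easy to spot.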