K8s architecture
Kubernetes Cluster Architecture is built on a "Hub-and-Spoke" pattern designed to facilitate declarative configuration and automation. The system is fundamentally divided into two distinct parts: the Control Plane (the brains of the operation) and the Nodes (the worker machines that execute the workloads).
Here is a comprehensive breakdown of the architectural components, their interactions, and operational models.
1. The Control Plane
The Control Plane manages the global state of the cluster. It detects and responds to cluster events (like a Pod dying) and makes high-level decisions (like scheduling).
- kube-apiserver: This is the core component and the "front end" of the Kubernetes control plane. It exposes the HTTP API that users, internal components, and external components communicate with. It is designed to scale horizontally by deploying more instances.
- etcd: This is the consistent, highly-available key-value store used as the backing store for all cluster data. Because it stores the entire state of the cluster, having a robust backup plan for etcd is critical for disaster recovery.
- kube-scheduler: This component watches for newly created Pods that have no Node assigned. It selects the best Node for them to run on based on resource requirements, hardware constraints, affinity/anti-affinity specifications, and data locality.
- kube-controller-manager: This runs controller processes. Logically, each controller (e.g., Node Controller, Job Controller) is a separate process, but to reduce complexity, they are compiled into a single binary. These controllers run loops that constantly compare the current state to the desired state defined in the API.
- cloud-controller-manager (Optional): This component embeds cloud-specific control logic. It links the cluster into the cloud provider's API, decoupling the Kubernetes core from the underlying cloud infrastructure. It manages resources like cloud load balancers and routes.
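All of these components converge on the API server. The sketch below, written against the official Go client library (client-go), simply lists the cluster's nodes; the kubeconfig path is a placeholder and the error handling is minimal. The point is that a client reads cluster state from the kube-apiserver (which persists it in etcd), never from etcd or the nodes directly.

```go
// Minimal sketch (Go + client-go): all reads and writes of cluster state go
// through the kube-apiserver, which in turn backs the data with etcd.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load API server credentials from a local kubeconfig (path is an assumption).
	config, err := clientcmd.BuildConfigFromFlags("", "/home/user/.kube/config")
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// The client never talks to etcd or a kubelet directly; the API server is the hub.
	nodes, err := clientset.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, n := range nodes.Items {
		fmt.Println("node:", n.Name)
	}
}
```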
2. The Worker Nodes
Nodes are the worker machines (virtual or physical) that host the Pods (workloads). Every node must run the following components to be part of the cluster:
- kubelet: An agent that runs on each node. It takes a set of PodSpecs (provided via the API server or static file paths) and ensures the containers described in those specs are running and healthy (see the sketch after this list). It communicates with the container runtime using the Container Runtime Interface (CRI).
- kube-proxy: A network proxy running on each node that maintains network rules. These rules allow network communication to Pods from inside or outside the cluster, essentially implementing the Kubernetes Service concept. It typically uses the operating system's packet filtering layer (such as iptables or IPVS).
- Container Runtime: The software responsible for actually running the containers (e.g., containerd, CRI-O, Docker Engine).
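To see how these pieces fit together, here is a hedged sketch (reusing the clientset from the earlier example) that submits a minimal PodSpec through the API server. The pod and image names are invented for illustration; once the object is stored, the scheduler assigns a node and that node's kubelet starts the container through the CRI.

```go
// Sketch: submit a PodSpec through the API server; the scheduler and kubelet do the rest.
// Extra import assumed: corev1 "k8s.io/api/core/v1".
func createDemoPod(ctx context.Context, clientset kubernetes.Interface) error {
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "demo", Namespace: "default"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  "web",
				Image: "nginx:1.25", // image chosen purely for illustration
			}},
		},
	}
	// The kubelet on whichever node the scheduler selects will notice this Pod
	// and start its container through the CRI; kube-proxy only becomes involved
	// once a Service routes traffic to it.
	_, err := clientset.CoreV1().Pods("default").Create(ctx, pod, metav1.CreateOptions{})
	return err
}
```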
3. Communication Patterns
The architecture enforces specific communication pathways to maintain security and stability.
- Hub-and-Spoke API: All API usage terminates at the API server. Nodes and Pods do not communicate directly with each other regarding cluster state; they talk to the API server.
- Node to Control Plane: The kubelet and kube-proxy communicate with the API server to register themselves and update status. This traffic is secured via TLS and typically runs over a public or private network.
- Control Plane to Node: The API server connects to the kubelet for specific operational tasks, such as fetching logs, attaching to running Pods, or port-forwarding (see the log-streaming sketch after this list). Because these connections may traverse untrusted networks, SSH tunnels (deprecated) or the Konnectivity service (a TCP-level proxy) are used to secure this traffic.
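As an illustration of the Control Plane to Node path, the following sketch streams a Pod's logs. The client talks only to the API server, which in turn connects to the kubelet on the Pod's node; this is the kind of connection the Konnectivity service is meant to protect. The function signature builds on the earlier clientset example, and the extra imports are noted in the comment.

```go
// Sketch: reading container logs. The request flows client -> API server -> kubelet
// on the Pod's node. Extra imports assumed: "io", "os".
func printPodLogs(ctx context.Context, clientset kubernetes.Interface, namespace, podName string) error {
	req := clientset.CoreV1().Pods(namespace).GetLogs(podName, &corev1.PodLogOptions{})
	stream, err := req.Stream(ctx)
	if err != nil {
		return err
	}
	defer stream.Close()
	// Copy the log stream relayed by the API server to stdout.
	_, err = io.Copy(os.Stdout, stream)
	return err
}
```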
4. Controller Architecture & The Control Loop
Kubernetes is not just an execution engine; it is a system of "independent, composable control processes".
- The Controller Pattern: Controllers track at least one resource type. They observe the current state, compare it to the desired state (spec), and make changes to bring the current state closer to the desired state.
- Example: The Job controller watches for Job objects. When it sees a new Job, it doesn't run the task itself; it tells the API server to create Pods. The Kubelet (another controller-like agent) sees the Pods and runs the containers.
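A minimal version of this control loop can be sketched with client-go's shared informers: the informer watches the API server for Pod events, keeps a local cache in sync, and hands each change to a handler where a real controller would reconcile desired versus current state. The resync interval and the log line standing in for reconciliation are illustrative choices, not prescribed values.

```go
// Minimal sketch of the controller pattern using a shared informer.
// Extra imports assumed: "time", corev1 "k8s.io/api/core/v1",
// "k8s.io/client-go/informers", "k8s.io/client-go/tools/cache".
func runPodWatcher(clientset kubernetes.Interface, stopCh <-chan struct{}) {
	factory := informers.NewSharedInformerFactory(clientset, 30*time.Second)
	podInformer := factory.Core().V1().Pods().Informer()

	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			pod := obj.(*corev1.Pod)
			// A real controller would enqueue this object's key and reconcile
			// desired state (spec) against current state (status).
			fmt.Printf("observed new Pod %s/%s (phase %s)\n", pod.Namespace, pod.Name, pod.Status.Phase)
		},
	})

	factory.Start(stopCh)            // begin watching the API server
	factory.WaitForCacheSync(stopCh) // block until the local cache mirrors cluster state
	<-stopCh
}
```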
5. High Availability (HA) Topologies
For production environments, the control plane is usually replicated across multiple machines to provide fault tolerance. There are two primary topology options for HA clusters:
- Stacked etcd Topology:
- Etcd members and control plane components (apiserver, scheduler, controller-manager) are co-located on the same nodes.
- Pros: Simpler infrastructure setup and management.
- Cons: Higher risk of coupled failure: if a node goes down, you lose both a control plane instance and an etcd member. A minimum of three nodes is required so that etcd retains a quorum (see the worked example after this list).
- External etcd Topology:
- Etcd members run on hosts separate from the control plane nodes.
- Pros: Decouples the control plane from the data store. Losing a control plane node doesn't impact data redundancy.
- Cons: Requires twice the number of hosts (minimum 3 for control plane + 3 for etcd).
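The three-node minimum follows from etcd's quorum rule: writes succeed only while a majority of members is reachable, so a single-member cluster tolerates no failures and a three-member cluster tolerates one. A tiny worked example:

```go
// Worked example: etcd stays available only while a quorum (majority) of members
// is reachable, which is why HA guides start at three members.
func etcdFaultTolerance(members int) int {
	quorum := members/2 + 1 // floor(n/2) + 1
	return members - quorum // members that can fail without losing quorum
}

// etcdFaultTolerance(1) == 0, etcdFaultTolerance(3) == 1, etcdFaultTolerance(5) == 2
```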
6. Node Architecture & Lifecycle
Nodes can self-register with the control plane or be added manually. Once part of the cluster, the Node Controller manages their lifecycle:
- Heartbeats: Nodes send heartbeats (via Node status updates and Lease objects) to prove availability (see the sketch after this list).
- Eviction: If a node stops sending heartbeats (e.g., it becomes unreachable), the node controller marks the node's status Unknown and, after a timeout (5 minutes by default), schedules its Pods for eviction.
- Graceful Shutdown: The kubelet attempts to detect OS shutdown events and terminate Pods gracefully before the node goes offline.
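As a sketch of the heartbeat mechanism, the function below (same assumptions as the earlier client-go examples) reads the Lease object that a node renews in the kube-node-lease namespace; a RenewTime that stops advancing is what the node controller interprets as a missed heartbeat.

```go
// Sketch: inspecting a node's heartbeat. Each node renews a Lease named after
// itself in the kube-node-lease namespace.
func printNodeHeartbeat(ctx context.Context, clientset kubernetes.Interface, nodeName string) error {
	lease, err := clientset.CoordinationV1().Leases("kube-node-lease").Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	// A stale RenewTime means the node has stopped heartbeating.
	fmt.Printf("node %s last renewed its lease at %s\n", nodeName, lease.Spec.RenewTime)
	return nil
}
```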
7. Addons
While not part of the core Kubernetes binaries, addons are essential for a functional cluster. They usually run as Pods (Deployments or DaemonSets) in the kube-system namespace. Key addons include:
- DNS: Required for service discovery. All Pods use this DNS server.
- Container Resource Monitoring: Records time-series metrics about containers.
- Network Plugins: Implement the CNI specification to allocate IP addresses and enable Pod-to-Pod communication.
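For example, the DNS addon is what makes a Service name resolvable from inside a Pod. The snippet below assumes a Service named my-service in the default namespace and the default cluster.local domain; it only succeeds when run inside a Pod on a cluster with the DNS addon installed.

```go
// Sketch: service discovery through the cluster DNS addon. The name resolves via
// the DNS server configured in the Pod's /etc/resolv.conf.
package main

import (
	"fmt"
	"net"
)

func main() {
	// Service and namespace names are assumptions for illustration.
	addrs, err := net.LookupHost("my-service.default.svc.cluster.local")
	if err != nil {
		fmt.Println("lookup failed (this only works inside a Pod with the DNS addon):", err)
		return
	}
	fmt.Println("service resolves to:", addrs) // typically the Service's ClusterIP
}
```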