What happens when a Pod or Node fails, and how does Kubernetes detect and recover from failures?
Let's break down the failure-handling mechanisms in Kubernetes.
Kubernetes operates on a declarative model: you describe a desired state, and the system continuously compares it against the current state of the cluster, using control loops to reconcile any differences. When a failure occurs, whether at the container, Pod, or Node level, it is treated as a deviation from the desired state and triggers specific self-healing workflows.
Here is the technical breakdown of how Kubernetes detects and recovers from failures at different layers of the stack.
1. Container and Pod Failures
At the workload level, the primary agents of recovery are the kubelet (running on each node) for local issues and Controllers (running in the Control Plane) for total Pod loss.
Detection Mechanisms
- Process Exit Codes: The kubelet monitors the lifecycle of containers. If a container's main process exits with a non-zero status code, it is considered a failure.
- Liveness Probes: Process checks are insufficient for deadlocks (where the app is running but stuck). Liveness probes allow the kubelet to periodically check application health via HTTP, TCP, or gRPC. If the probe fails, the kubelet kills the container.
- Startup Probes: For slow-starting legacy applications, a startup probe holds off the other probes until the application has fully initialized. If it fails, the container is killed and restarted. (Both probe types are shown in the sketch below.)
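Here is a minimal sketch of both probe types on a single Pod. The Pod name, image, port, and `/healthz` path are placeholders for whatever your application actually exposes:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo                                # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example.com/legacy-app:1.0    # placeholder image
    ports:
    - containerPort: 8080
    # Startup probe: gives a slow-starting app up to 30 x 10s = 300s to come up.
    # The liveness probe is held off until this succeeds.
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 30
      periodSeconds: 10
    # Liveness probe: checked every 10s; after 3 consecutive failures the
    # kubelet kills the container, which is then handled by the restartPolicy.
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
```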
Recovery Workflow
- Restart Policy: When a container fails, the kubelet checks the Pod's `restartPolicy` (default: `Always`).
  - If set to `Always` or `OnFailure`, the kubelet restarts the container on the same node.
  - If set to `Never`, the container is not restarted and the Pod eventually ends up in the `Failed` phase.
- Exponential Backoff: To prevent a failing container from overwhelming the node, the kubelet delays restarts with an exponential backoff (10s, 20s, 40s, ...) capped at 5 minutes (300 seconds). This state is visible as `CrashLoopBackOff`.
- Controller Replacement: If the Pod cannot recover or is deleted, the higher-level Workload Controller (e.g., Deployment, ReplicaSet) observes that the number of running replicas is lower than the desired `replicas` count and creates a completely new Pod to replace it (see the Deployment sketch below).
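A minimal Deployment sketch tying these pieces together (name and image are placeholders). The Pod template's `restartPolicy` defaults to `Always`, so crashed containers are restarted in place by the kubelet, while the ReplicaSet recreates any Pod that disappears entirely:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                    # hypothetical name
spec:
  replicas: 3                  # desired state; the ReplicaSet keeps 3 Pods running
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      # restartPolicy defaults to Always (the only value a Deployment allows),
      # so the kubelet restarts crashed containers on the same node, with backoff.
      containers:
      - name: web
        image: registry.example.com/web:1.0   # placeholder image
```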
2. Node Failures
Node failures are handled by the Control Plane, specifically the Node Controller and the Pod Garbage Collector.
Detection Mechanisms
- Heartbeats (Leases): Nodes send periodic heartbeats to the API server via `Lease` objects in the `kube-node-lease` namespace. This proves the node is still available.
- Controller Monitoring: The Node Controller checks these heartbeats. If a node stops sending updates, the controller sets the node's `Ready` condition in `.status` to `Unknown` (an illustrative excerpt follows this list).
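For illustration, this is roughly what the `Ready` condition looks like in a Node's status once heartbeats stop; the timestamps and wording are representative, not taken from a real cluster:

```yaml
# Excerpt from `kubectl get node <node-name> -o yaml` (illustrative values)
status:
  conditions:
  - type: Ready
    status: "Unknown"                  # set by the Node Controller when heartbeats stop
    reason: NodeStatusUnknown
    message: Kubelet stopped posting node status.
    lastHeartbeatTime: "2024-05-01T10:00:00Z"
    lastTransitionTime: "2024-05-01T10:05:00Z"
```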
Recovery Workflow
- Tainting: When a node becomes unhealthy, the Node Controller applies taints to it, such as `node.kubernetes.io/unreachable` or `node.kubernetes.io/not-ready`.
- Toleration Window: Pods typically carry a default toleration for these taints of 300 seconds (5 minutes). This prevents massive rescheduling storms during minor network blips (the default tolerations are sketched after this list).
- Eviction: If the node remains unreachable after the toleration expires, the `NoExecute` taint causes the control plane to evict (delete) the Pods on that node.
- Rescheduling: The Workload Controller (e.g., Deployment) sees the Pods are terminating/gone. It creates new replacement Pods. The kube-scheduler then places these new Pods onto healthy nodes.
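These tolerations are injected into Pods by the DefaultTolerationSeconds admission plugin; you can set `tolerationSeconds` in your own spec to shorten or lengthen the window. A sketch of how they appear on a Pod:

```yaml
# Default tolerations as they appear on a Pod (override tolerationSeconds to tune the window)
tolerations:
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300      # Pod stays bound for 5 minutes after the taint appears
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300
```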
Important Distinction for StatefulSets: StatefulSets require "at most one" semantics to prevent data corruption (split-brain). If a node fails, Kubernetes will not automatically reschedule StatefulSet Pods, because it cannot confirm the old Pod is truly dead (the node might just be partitioned). You must force-delete the Pod or apply the `node.kubernetes.io/out-of-service` taint to the node to trigger recovery (sketched below).
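A sketch of the out-of-service taint as it appears on the Node object; the value shown follows upstream examples, and the taint can also be applied with `kubectl taint`:

```yaml
# Applied with, for example:
#   kubectl taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
spec:
  taints:
  - key: node.kubernetes.io/out-of-service
    value: nodeshutdown        # value used in upstream examples
    effect: NoExecute
```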
3. Resource Starvation (Node-Pressure Eviction)
Sometimes a node is "healthy" (heartbeating) but has run out of resources (Memory, Disk, or PIDs).
Detection and Recovery
- Eviction Signals: The kubelet monitors signals such as `memory.available` and `nodefs.available`. If these drop below configured soft or hard eviction thresholds, the node enters a pressure state.
- Ranking: The kubelet proactively terminates Pods to reclaim resources. It ranks Pods for eviction based on their Quality of Service (QoS) class (example `resources` blocks for each class follow this list):
- BestEffort (No requests/limits) are evicted first.
- Burstable (Usage exceeds requests) are evicted next.
- Guaranteed (Usage within limits) are evicted last.
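For reference, sketches of the container `resources` blocks that place a Pod in each QoS class (the sizes are arbitrary placeholders):

```yaml
# Guaranteed: every container sets limits equal to requests (evicted last)
resources:
  requests:
    cpu: 500m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 256Mi
---
# Burstable: requests set lower than limits (or only some values set)
resources:
  requests:
    memory: 128Mi
  limits:
    memory: 256Mi
---
# BestEffort: no requests or limits at all; first in line for eviction
resources: {}
```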
Summary Table
| Failure Type | Detected By | Primary Recovery Action |
|---|---|---|
| Container Crash | Kubelet (Exit Code) | Kubelet restarts container (Subject to Backoff). |
| App Deadlock | Kubelet (Liveness Probe) | Kubelet kills and restarts container. |
| Node Offline | Node Controller (Heartbeat) | Controller taints node; Pods evicted after 5 min; Replicas recreated on other nodes. |
| Resource Exhaustion | Kubelet (Thresholds) | Kubelet evicts lower QoS Pods to reclaim resources. |