How do resource requests and limits work, and how do they affect scheduling, performance, and stability?
Consider Resource Requests and Limits to be the foundational "contract" between your applications and the Kubernetes cluster. They determine three critical outcomes: where workloads run (Scheduling), how they perform under load (Performance), and which workloads survive during resource starvation (Stability).
Here is the technical breakdown of how these mechanisms operate.
1. The Distinction: Scheduling vs. Runtime Enforcement
The most important concept to grasp is that Requests and Limits target different parts of the Kubernetes lifecycle.
- Requests are used by the Scheduler to find a suitable Node. They act as a reservation.
- Limits are used by the Container Runtime (CRI) and Linux Kernel to enforce usage. They act as a hard ceiling.
2. How Requests Affect Scheduling
When you define a request (e.g., cpu: 500m, memory: 256Mi), you are guaranteeing that amount of resource to the container.
- Bin Packing: The kube-scheduler filters out any node that does not have enough unallocated capacity to meet the Pod's requests.
- Reservation: Once a Pod is scheduled, that requested capacity is deducted from the Node's allocatable pool. It does not matter if the container is currently using 0% CPU; that space is reserved and cannot be given to other Pods.
- CPU Weighting: On Linux, the CPU request is also used to configure cpu.shares (cpu.weight on cgroup v2). If a node is under CPU pressure (contention), containers with higher CPU requests are allocated more CPU time relative to those with lower requests.
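As a rough illustration of the CPU weighting point, the sketch below converts CPU requests (in millicores) to cgroup v1 cpu.shares and shows how two busy containers would split a saturated core. The 1024-shares-per-core ratio and the minimum of 2 reflect my understanding of the kubelet's conversion; the container names and the `contended_split` helper are made up for the example.

```python
SHARES_PER_CPU = 1024   # cgroup v1 shares granted per full core
MILLI_PER_CPU = 1000    # 1000m == 1 core
MIN_SHARES = 2          # smallest value the kernel accepts for cpu.shares


def milli_cpu_to_shares(milli_cpu: int) -> int:
    """Convert a CPU request in millicores to a cpu.shares value."""
    if milli_cpu <= 0:
        return MIN_SHARES
    return max(MIN_SHARES, milli_cpu * SHARES_PER_CPU // MILLI_PER_CPU)


def contended_split(requests_milli: dict) -> dict:
    """Fraction of CPU time each container gets when all are runnable at once."""
    shares = {name: milli_cpu_to_shares(m) for name, m in requests_milli.items()}
    total = sum(shares.values())
    return {name: s / total for name, s in shares.items()}


# Hypothetical containers: "api" requests 500m, "worker" requests 250m.
split = contended_split({"api": 500, "worker": 250})
print(milli_cpu_to_shares(500), milli_cpu_to_shares(250))  # 512 256
print(split)
```

Under contention the container with twice the request gets roughly twice the CPU time. When the node is idle, shares impose no cap at all.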
Engineering Insight: If you set requests too high (over-provisioning), you waste money because nodes appear "full" to the scheduler even if utilization is low. If you set them too low, you risk node over-subscription and performance degradation.
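To make the over-provisioning point concrete, here is a minimal sketch of the fit check, assuming a simplified node model with only CPU and memory. The `Node` class and `fits` function are hypothetical names for illustration, not the scheduler's real types. Note that live usage is never consulted.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    allocatable_cpu_m: int      # millicores available to Pods
    allocatable_mem_mi: int     # MiB available to Pods
    requested_cpu_m: int = 0    # sum of requests of Pods already placed
    requested_mem_mi: int = 0
    used_cpu_m: int = 0         # live usage; deliberately ignored below


def fits(node: Node, cpu_m: int, mem_mi: int) -> bool:
    """A Pod fits only if its requests fit in the unreserved capacity."""
    return (node.requested_cpu_m + cpu_m <= node.allocatable_cpu_m
            and node.requested_mem_mi + mem_mi <= node.allocatable_mem_mi)


# 95% of CPU is reserved by requests, yet only 10% is actually in use.
node = Node("node-a", 4000, 8192,
            requested_cpu_m=3800, requested_mem_mi=4096, used_cpu_m=400)

print(fits(node, cpu_m=500, mem_mi=256))  # False
print(fits(node, cpu_m=200, mem_mi=256))  # True
```

The first Pod is rejected even though the node is nearly idle, which is exactly how inflated requests translate into wasted capacity.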
3. How Limits Affect Performance and Stability
Limits define the maximum resources a container is allowed to consume. The behavior differs drastically between compressible resources (CPU) and incompressible resources (Memory).
CPU Limits (Compressible)
CPU is measured in units (cores) or millicores (m).
- Mechanism: CPU limits are enforced using the Linux CFS (Completely Fair Scheduler) quota.
- Behavior: If a container attempts to exceed its CPU limit, it is throttled. Once it has used up its quota, the kernel denies it CPU time for the remainder of the current CFS period (100 ms by default).
- Impact: The application slows down, latency increases, but it does not crash.
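The throttling arithmetic can be sketched as follows. This assumes the default 100 ms CFS period, a single-threaded task that starts at a period boundary, and a 1 ms minimum quota (my understanding of the kubelet's conversion). The function names are illustrative.

```python
CFS_PERIOD_US = 100_000   # default CFS period: 100 ms
MIN_QUOTA_US = 1_000      # assumed lower bound on the quota


def cfs_quota_us(limit_milli: int, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (in microseconds) the container may use per period."""
    return max(MIN_QUOTA_US, limit_milli * period_us // 1000)


def wall_time_us(cpu_needed_us: int, limit_milli: int,
                 period_us: int = CFS_PERIOD_US) -> int:
    """Wall-clock time for one thread to finish cpu_needed_us of work."""
    if cpu_needed_us <= 0:
        return 0
    # A single thread can never use more than one full period of CPU per period.
    quota = min(cfs_quota_us(limit_milli, period_us), period_us)
    full_periods = -(-cpu_needed_us // quota) - 1   # ceil(cpu / quota) - 1
    remainder = cpu_needed_us - full_periods * quota
    return full_periods * period_us + remainder


print(cfs_quota_us(500))            # 50000
print(wall_time_us(200_000, 500))   # 350000
print(wall_time_us(200_000, 2000))  # 200000
```

With a 500m limit, 200 ms of CPU work takes 350 ms of wall-clock time because the task sits idle for the second half of each period. This is the latency increase described above.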
Memory Limits (Incompressible)
Memory is measured in bytes (Mi, Gi).
- Mechanism: Memory limits are enforced by the kernel's cgroup memory controller, which invokes the Out-Of-Memory (OOM) killer when the limit is exceeded.
- Behavior: If a container tries to allocate memory beyond its limit, the kernel cannot throttle the allocation the way it throttles CPU. A process in the container is terminated (OOM Killed), typically resulting in a container restart (Exit Code 137, i.e. 128 + SIGKILL).
- Impact: Immediate application downtime or a crash loop.
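For reference, the sketch below shows how a memory quantity such as 256Mi maps to bytes and where exit code 137 comes from. It handles only plain integers and the binary suffixes Ki through Ti, which is a simplification of the full Kubernetes quantity syntax. The helper names are illustrative.

```python
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
SIGKILL = 9  # signal delivered by the OOM killer


def parse_memory(quantity: str) -> int:
    """Convert a quantity like '256Mi' to bytes (binary suffixes only)."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)  # no suffix: already in bytes


def exceeds_limit(usage_bytes: int, limit: str) -> bool:
    """True when usage is past the limit, i.e. the OOM killer would step in."""
    return usage_bytes > parse_memory(limit)


limit_bytes = parse_memory("256Mi")
oom_exit_code = 128 + SIGKILL  # shells report signal deaths as 128 + signal number

print(limit_bytes)                              # 268435456
print(exceeds_limit(300 * 2**20, "256Mi"))      # True
print(oom_exit_code)                            # 137
```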
4. Quality of Service (QoS) Classes
Kubernetes implicitly assigns a QoS Class to every Pod based on how you configure these requests and limits. This classification determines which Pods are evicted first when a Node runs out of resources.
| QoS Class | Configuration | Behavior & Eviction Priority |
|---|---|---|
| Guaranteed | Every container sets Requests = Limits (for both CPU and Memory). | Top Priority. These Pods are guaranteed their resources. They are the last to be evicted during node resource starvation. |
| Burstable | At least one Request or Limit is set, but the Pod does not meet the Guaranteed criteria (e.g., Requests < Limits, or Limits not set). | Middle Priority. These Pods have a guaranteed baseline (request) but can burst up to their limit. If the node is under pressure, they are evicted after BestEffort pods. |
| BestEffort | No Requests or Limits set on any container. | Lowest Priority. These Pods can use free resources on the node but are the first to be evicted when the node comes under memory pressure. (CPU shortage causes throttling and contention, not eviction.) |
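
The classification rules in the table can be sketched as a small function. This is an illustrative reimplementation, not Kubernetes source. It assumes each container is a dict with optional `requests` and `limits` maps, and it compares quantities as literal strings (so "500m" and "0.5" would be treated as different).

```python
def qos_class(containers: list) -> str:
    """Classify a Pod from its containers' CPU/memory requests and limits."""
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in ("cpu", "memory"):
            req, lim = requests.get(resource), limits.get(resource)
            if req is not None or lim is not None:
                any_set = True
            # Guaranteed needs a limit for every resource; an omitted
            # request defaults to the limit, so only a mismatch disqualifies.
            if lim is None or (req is not None and req != lim):
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


full = {"cpu": "500m", "memory": "256Mi"}
print(qos_class([{"requests": full, "limits": full}]))   # Guaranteed
print(qos_class([{"requests": {"cpu": "250m"}}]))        # Burstable
print(qos_class([{}]))                                   # BestEffort
```

Note that a Pod with one fully specified container and one unconfigured sidecar is Burstable, not Guaranteed, because the rule applies to every container.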
5. Advanced Resource Management
Pod-Level Resources
Traditionally, resources are defined per container. However, newer versions of Kubernetes (beta as of v1.34) allow specifying resources at the Pod level. This enables containers within a Pod to share a resource "budget," improving utilization for patterns like sidecars.
Extended Resources
Beyond CPU and Memory, you can manage Extended Resources (like GPUs or dongles). These must be whole integers and cannot be overcommitted (if both are set, Requests must equal Limits).
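The two constraints on extended resources can be expressed as a small check. This is a hedged sketch of the rules stated above, not the API server's actual validation. The function name is made up, and `example.com/dongle` is a placeholder resource name.

```python
def validate_extended(name: str, request, limit) -> list:
    """Return a list of problems with an extended-resource request/limit pair.

    Quantities are given as they would appear in a manifest (strings),
    or None when the field is omitted.
    """
    errors = []
    for label, value in (("request", request), ("limit", limit)):
        # str.isdigit() rejects fractions ("1.5") and millis ("500m").
        if value is not None and not value.isdigit():
            errors.append(f"{name}: {label} {value!r} is not a whole integer")
    if request is not None and limit is not None and request != limit:
        errors.append(f"{name}: request {request} != limit {limit} (no overcommit)")
    return errors


print(validate_extended("example.com/dongle", "2", "2"))      # []
print(validate_extended("example.com/dongle", "1", "2"))
print(validate_extended("example.com/dongle", "1.5", "1.5"))
```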
Summary Recommendation
To ensure cluster reliability:
- Always set Memory Limits to prevent a memory leak in one app from crashing the entire node (OOM).
- Set Requests accurately to ensure the scheduler places Pods on nodes that can actually handle the load.
- Use LimitRanges in namespaces to prevent users from creating Pods without resource boundaries.
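As a sketch of the last recommendation, the snippet below builds a LimitRange manifest as a Python dict and prints it as JSON, which kubectl accepts alongside YAML. The object name, the `team-a` namespace, and all of the quantities are placeholder assumptions to adapt to your own workloads.

```python
import json

limit_range = {
    "apiVersion": "v1",
    "kind": "LimitRange",
    "metadata": {"name": "container-defaults", "namespace": "team-a"},
    "spec": {
        "limits": [
            {
                "type": "Container",
                # Applied when a container omits its own requests.
                "defaultRequest": {"cpu": "250m", "memory": "128Mi"},
                # Applied when a container omits its own limits.
                "default": {"cpu": "500m", "memory": "256Mi"},
                # Upper bound any single container may ask for.
                "max": {"cpu": "2", "memory": "1Gi"},
            }
        ]
    },
}

manifest = json.dumps(limit_range, indent=2)
print(manifest)
```

Saving the output to a file such as limitrange.json and running `kubectl apply -f limitrange.json` would give every new container in that namespace default boundaries, so no Pod there ends up as BestEffort by accident.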