Kubernetes Resource Management: Requests, Limits & QoS Classes

How do resource requests and limits work, and how do they affect scheduling, performance, and stability?

Consider Resource Requests and Limits to be the foundational "contract" between your applications and the Kubernetes cluster. They determine three critical outcomes: where workloads run (Scheduling), how they perform under load (Performance), and which workloads survive during resource starvation (Stability).

Here is the technical breakdown of how these mechanisms operate.

1. The Distinction: Scheduling vs. Runtime Enforcement

The most important concept to grasp is that Requests and Limits target different parts of the Kubernetes lifecycle.

Requests are used by the Scheduler to find a suitable Node. They act as a reservation.
Limits are enforced at runtime by the container runtime and the Linux kernel (through cgroups). They act as a hard ceiling.

2. How Requests Affect Scheduling

When you define a request (e.g., cpu: 500m, memory: 256Mi), you are guaranteeing that amount of resource to the container.

Bin Packing: The kube-scheduler filters nodes based on whether they have enough unallocated capacity to meet the Pod's requests.
Reservation: Once a Pod is scheduled, that requested capacity is deducted from the Node's allocatable pool. It does not matter if the container is currently using 0% CPU; that space is reserved and cannot be given to other Pods.
CPU Weighting: On Linux, the CPU request is also used to configure the container's CPU weight (cpu.shares on cgroup v1, cpu.weight on cgroup v2). If a node is under CPU pressure (contention), containers with higher CPU requests are allocated more CPU time relative to those with lower requests.
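The CPU weighting point can be made concrete with a small sketch. It assumes the cgroup v1 conversion of 1024 shares per full core with a floor of 2 and a cap of 262144, which is how the kubelet is commonly described as mapping requests to cpu.shares; the function names and the two-container workload are hypothetical, and cgroup v2 uses a different scale.

```python
# Assumption: cgroup v1 style mapping, 1024 shares per core, clamped to [2, 262144].
MIN_SHARES = 2
MAX_SHARES = 262144


def milli_cpu_to_shares(milli_cpu: int) -> int:
    """Convert a CPU request in millicores to an approximate cpu.shares value."""
    if milli_cpu <= 0:
        return MIN_SHARES
    shares = milli_cpu * 1024 // 1000
    return max(MIN_SHARES, min(shares, MAX_SHARES))


def contended_split(requests_milli: dict, node_milli: int) -> dict:
    """Split a fully contended node's CPU in proportion to each container's shares."""
    shares = {name: milli_cpu_to_shares(m) for name, m in requests_milli.items()}
    total = sum(shares.values())
    return {name: node_milli * s / total for name, s in shares.items()}


if __name__ == "__main__":
    # Hypothetical containers competing for a 2-core (2000m) node.
    print(contended_split({"api": 500, "batch": 1500}, node_milli=2000))
```

The split only applies while every container is trying to use more CPU than is available; when the node is idle, a container may use spare cycles regardless of its weight (subject to its limit).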

Engineering Insight: If you set requests too high (over-provisioning), you waste money because nodes appear "full" to the scheduler even if utilization is low. If you set them too low, you risk node over-subscription and performance degradation.
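To see why a node can look "full" while sitting idle, here is a minimal sketch of the fit check described above. The Node class, its field names, and the numbers are hypothetical; the real scheduler runs many more filters and scoring steps, but the core comparison is requests against allocatable capacity, not live usage.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical, simplified view of a node as the scheduler sees it."""
    name: str
    allocatable_milli_cpu: int
    allocatable_mem_mib: int
    requested_milli_cpu: int = 0
    requested_mem_mib: int = 0

    def fits(self, cpu_milli: int, mem_mib: int) -> bool:
        # Only the sum of requests matters here; actual utilization is ignored.
        return (
            self.requested_milli_cpu + cpu_milli <= self.allocatable_milli_cpu
            and self.requested_mem_mib + mem_mib <= self.allocatable_mem_mib
        )

    def bind(self, cpu_milli: int, mem_mib: int) -> None:
        # Deduct the Pod's requests from the node's remaining capacity.
        self.requested_milli_cpu += cpu_milli
        self.requested_mem_mib += mem_mib


if __name__ == "__main__":
    node = Node("node-1", allocatable_milli_cpu=4000, allocatable_mem_mib=8192)
    for pod in ("pod-a", "pod-b", "pod-c"):
        if node.fits(1500, 1024):
            node.bind(1500, 1024)
            print(pod, "scheduled")
        else:
            print(pod, "does not fit")
```

In this example the third Pod is rejected because 4500m of requests would exceed 4000m allocatable, even if the first two Pods are using almost no CPU.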

3. How Limits Affect Performance and Stability

Limits define the maximum resources a container is allowed to consume. The behavior differs drastically between compressible resources (CPU) and incompressible resources (Memory).

CPU Limits (Compressible)

CPU is measured in units (cores) or millicores (m).

Mechanism: CPU limits are enforced using the Linux CFS (Completely Fair Scheduler) quota.
Behavior: If a container attempts to exceed its CPU limit, it is throttled. Once it has used up its quota, the kernel denies it further CPU time for the remainder of the current CFS period (100 ms by default).
Impact: The application slows down, latency increases, but it does not crash.
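The quota arithmetic behind throttling can be sketched as follows. It assumes the default 100 ms CFS period and a 1 ms minimum quota; the function names are hypothetical, and the "denied" figure is CPU time summed across threads, not wall-clock time.

```python
# Assumption: default CFS period of 100 ms, expressed in microseconds.
CFS_PERIOD_US = 100_000
# Assumption: quotas are not set below 1 ms.
MIN_QUOTA_US = 1_000


def cfs_quota_us(limit_milli_cpu: int, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (in microseconds) a container may use per period for a given limit."""
    return max(limit_milli_cpu * period_us // 1000, MIN_QUOTA_US)


def denied_cpu_us_per_period(
    demand_milli_cpu: int, limit_milli_cpu: int, period_us: int = CFS_PERIOD_US
) -> int:
    """CPU time per period the container wanted but was not allowed to use."""
    wanted_us = demand_milli_cpu * period_us // 1000
    return max(0, wanted_us - cfs_quota_us(limit_milli_cpu, period_us))


if __name__ == "__main__":
    # A single busy thread (1000m of demand) under a 500m limit.
    print(cfs_quota_us(500))                    # quota per 100 ms period
    print(denied_cpu_us_per_period(1000, 500))  # CPU time lost to throttling
```

Under these assumptions, a 500m limit yields 50 ms of CPU per 100 ms period, so a fully busy single-threaded process runs for half of each period and waits for the other half.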

Memory Limits (Incompressible)

Memory is measured in bytes (Mi, Gi).

Mechanism: Memory limits are set as a cgroup memory limit and enforced by the kernel's Out-Of-Memory (OOM) killer.
Behavior: If a container tries to allocate memory beyond its limit, the kernel cannot throttle it the way it throttles CPU. The process is terminated (OOM Killed), typically resulting in a container restart (Exit Code 137).
Impact: Immediate application downtime or a crash loop.
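As a rough illustration of the units and the exit code mentioned above, the sketch below converts a memory quantity such as 256Mi to bytes and derives 137 from the usual shell convention of 128 plus the signal number (SIGKILL is 9). The helper names are hypothetical, and the parser is deliberately simplified: it handles only whole-number quantities with a few common suffixes.

```python
# Binary (power-of-two) and decimal suffixes; a simplified subset.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}

SIGKILL = 9                    # signal used to kill the process on OOM
OOM_EXIT_CODE = 128 + SIGKILL  # convention: 128 + signal number


def parse_memory(quantity: str) -> int:
    """Convert a quantity like '256Mi' or '500M' to bytes (integers only)."""
    for suffix, factor in {**_BINARY, **_DECIMAL}.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: plain bytes


def exceeds_limit(usage_bytes: int, limit: str) -> bool:
    """True when usage is above the limit, i.e. the container is an OOM candidate."""
    return usage_bytes > parse_memory(limit)


if __name__ == "__main__":
    print(parse_memory("256Mi"))
    print(exceeds_limit(300 * 2**20, "256Mi"))
    print(OOM_EXIT_CODE)
```

Note that 256Mi (mebibytes) and 256M (megabytes) are different amounts, which is a common source of off-by-a-few-percent limits.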

4. Quality of Service (QoS) Classes

Kubernetes implicitly assigns a QoS Class to every Pod based on how you configure these requests and limits. This classification determines which Pods are evicted first when a Node runs out of resources.

QoS Class	Configuration	Behavior & Eviction Priority
Guaranteed	Requests = Limits (for both CPU and Memory, in every container).	Top Priority. These Pods are guaranteed their resources. They are the last to be evicted during node resource starvation.
Burstable	At least one request or limit is set, but the Pod does not meet the Guaranteed criteria (e.g., Requests < Limits, or Limits not set).	Middle Priority. These Pods have a guaranteed baseline (request) but can burst up to their limit. If the node is under pressure, they are evicted after BestEffort pods.
BestEffort	No Requests or Limits set on any container.	Lowest Priority. These Pods can use free resources on the node but are the first to be evicted when the node comes under resource pressure (for example, when it runs low on Memory).
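The classification rules in the table can be expressed as a short function. This is a sketch, not the kubelet's implementation: containers are represented as plain dictionaries (a hypothetical shape), only CPU and memory are considered, an unset request is assumed to default to the limit, and quantities are compared as literal strings, so "500m" and "0.5" would wrongly be treated as different.

```python
RESOURCES = ("cpu", "memory")


def qos_class(containers: list) -> str:
    """Classify a Pod from its containers' CPU and memory requests and limits."""
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in RESOURCES:
            limit = limits.get(resource)
            # Assumption: a missing request defaults to the limit, if one is set.
            request = requests.get(resource, limit)
            if request is not None or limit is not None:
                any_set = True
            # Guaranteed needs a limit on every resource, equal to the request.
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    same = {"cpu": "500m", "memory": "256Mi"}
    print(qos_class([{"requests": same, "limits": same}]))
    print(qos_class([{"requests": {"cpu": "250m"}}]))
    print(qos_class([{}]))
```

A Pod with one fully specified container and one container with nothing set comes out as Burstable, because the Guaranteed criteria must hold for every container.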

5. Advanced Resource Management

Pod-Level Resources

Traditionally, resources are defined per container. However, newer versions of Kubernetes (beta as of v1.34) allow specifying resources at the Pod level. This enables containers within a Pod to share a resource "budget," improving utilization for patterns like sidecars.

Extended Resources

Beyond CPU and Memory, you can manage Extended Resources (like GPUs or dongles). These must be requested in whole integers and cannot be overcommitted (Requests must equal Limits).
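The two constraints on extended resources can be checked with a small validator. This is an illustrative sketch only: the function name and the resource name example.com/dongle are hypothetical, quantities are passed as strings, and the real API server performs its own validation.

```python
from typing import List, Optional


def validate_extended_resource(
    name: str, request: Optional[str], limit: Optional[str]
) -> List[str]:
    """Return a list of problems with an extended resource's request and limit."""
    errors = []
    for label, value in (("request", request), ("limit", limit)):
        # Extended resources are counted in whole units, so "1.5" or "500m" is invalid.
        if value is not None and not value.isdigit():
            errors.append(f"{name}: {label} {value!r} is not a whole integer")
    # No overcommit: when both are given, they must match.
    if request is not None and limit is not None and request != limit:
        errors.append(f"{name}: request {request} must equal limit {limit}")
    return errors


if __name__ == "__main__":
    print(validate_extended_resource("example.com/dongle", "2", "2"))
    print(validate_extended_resource("example.com/dongle", "1", "2"))
```

An empty list means the pair passes both checks.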

Summary Recommendation

To ensure cluster reliability:

Always set Memory Limits to prevent a memory leak in one app from crashing the entire node (OOM).
Set Requests accurately to ensure the scheduler places Pods on nodes that can actually handle the load.
Use LimitRanges in namespaces to prevent users from creating Pods without resource boundaries.
