In-Place Pod Resizing: Seamless Resource Updates
How does In-Place Pod Resizing work in 1.35?
Historically, Kubernetes has enforced a strictly immutable approach to Pod resource allocation: changing a Pod's CPU or memory requirements necessitated terminating the existing Pod and scheduling a replacement. As of Kubernetes 1.35, the In-Place Pod Vertical Scaling feature is stable, fundamentally shifting this paradigm by allowing dynamic adjustment of CPU and memory for running containers without disruption.
As an architect, understanding the mechanics of how the control plane, kubelet, and Linux kernel collaborate to execute these live updates is critical for building resilient, auto-scaling platforms.
Here is the deep dive into the architecture, cgroup v2 mechanics, and production challenges of in-place Pod resizing.
1. The Architectural Workflow (API to Kubelet)
The in-place resize operation is driven by a new declarative state reconciliation loop managed by the kubelet, utilizing a dedicated API subresource.
- The `/resize` subresource: To trigger an update, a client (like a VerticalPodAutoscaler or a user running `kubectl patch --subresource resize`) submits new `requests` and `limits` to the Pod's specification.
- State Tracking: Kubernetes now tracks three distinct dimensions of resource state to manage the transition:
  - Desired Resources (`spec.containers[*].resources`): the new intended state.
  - Allocated Resources (`status.containerStatuses[*].allocatedResources`): the resources the kubelet has confirmed are reserved for the container (used by the scheduler to ensure node capacity isn't oversubscribed during a pending resize).
  - Actual Resources (`status.containerStatuses[*].resources`): the resources currently configured on the running container.
- Kubelet Reconciliation: The kubelet observes the discrepancy between desired and allocated resources. If the node lacks capacity, the kubelet adds a `PodResizePending` condition with a reason of `Infeasible` or `Deferred`. If the resize is `Deferred`, the kubelet periodically retries it, prioritizing Pods by PriorityClass, QoS class (Guaranteed over Burstable), and time waited.
- Container Resize Policies: Because some applications (like Java JVMs) cannot dynamically adapt to memory changes, you can define a `resizePolicy` per resource type. Setting it to `NotRequired` (the default) applies the change live; setting it to `RestartContainer` forces the kubelet to restart the container with the new resource boundaries.
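As an illustration of the fields described above, a Pod that accepts live CPU resizes but requires a container restart for memory changes might be declared as follows (the Pod name, container name, and image are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resizable-app                     # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0 # hypothetical image
      resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired      # apply CPU changes live (the default)
        - resourceName: memory
          restartPolicy: RestartContainer # e.g. JVM apps that size heap at startup
      resources:
        requests:
          cpu: "500m"
          memory: "512Mi"
        limits:
          cpu: "1"
          memory: "512Mi"
```

A CPU resize could then be submitted without recreating the Pod, e.g. `kubectl patch pod resizable-app --subresource resize --patch '{"spec":{"containers":[{"name":"app","resources":{"limits":{"cpu":"2"}}}]}}'` (again, name and values are illustrative).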
2. Under the Hood: cgroup v2 Mechanics
When the kubelet accepts a live resize request (`PodResizeInProgress`), it communicates with the container runtime (via the Container Runtime Interface) to modify the container's control groups (cgroups) in the Linux kernel on the fly.
With cgroup v2, Kubernetes maps your declarative YAML into specific kernel-level resource control files:
- CPU Limits (Throttling): The container's CPU `limit` translates to a hard ceiling in the kernel (`cpu.max` in cgroup v2 / CFS quota). The kubelet updates this value live. If the container is currently using more CPU than the newly lowered limit, the kernel immediately begins throttling the process during its scheduling time slices.
- CPU Requests (Weighting): The CPU `request` translates to proportional CPU time (`cpu.weight` in cgroup v2 / `cpu.shares` in v1). Updating this changes the container's priority relative to other containers when the host CPU is under contention.
- Memory Requests (Protection): On nodes running cgroup v2, the memory `request` is used by the container runtime as a hint to configure `memory.min` and `memory.low`. Setting `memory.min` ensures that a specific amount of memory is strictly reserved and never reclaimed by the kernel, guaranteeing availability for the workload.
- Memory Limits (OOM Control): The memory `limit` dictates the hard memory boundary (`memory.max`) for the cgroup. Under cgroup v2 Memory QoS, the system may also use `memory.high` to proactively throttle a workload as it approaches its limit, before the kernel is forced to invoke the Out-Of-Memory (OOM) killer to terminate the process.
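To make the CPU mapping concrete, here is a small sketch with illustrative values (not read from a live node) of how declarative CPU figures translate into `cpu.max` and `cpu.weight`. The shares-to-weight formula is the conversion used by common OCI runtimes such as runc; treat the exact arithmetic as an implementation detail that could differ between runtimes:

```shell
# A limit of 500m becomes a CFS quota against the default 100ms period.
millicores_limit=500          # resources.limits.cpu: 500m
period=100000                 # default CFS period in microseconds
quota=$(( millicores_limit * period / 1000 ))
echo "cpu.max: ${quota} ${period}"

# A request of 250m maps to cgroup v1 shares (250 * 1024 / 1000 = 256),
# which runc converts to a cgroup v2 weight in the range 1..10000.
shares=256
weight=$(( 1 + (shares - 2) * 9999 / 262142 ))
echo "cpu.weight: ${weight}"
```

Lowering the limit rewrites `cpu.max` immediately; lowering the request only changes the container's relative weight, which matters solely under CPU contention.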
3. Production Challenges and Constraints
While powerful, in-place resizing introduces new operational edge cases and strict constraints that platform teams must design around.
The Memory Scale-Down Race Condition
Decreasing a container's memory limit is a highly sensitive operation. If the kubelet simply applied a lower limit while the application was using memory above that new threshold, the kernel would instantly OOM-kill the container. To mitigate this, when the memory resizePolicy is NotRequired, the kubelet performs a best-effort check: if current memory usage exceeds the newly requested limit, it refuses to apply the limit and leaves the resize marked as in progress. Even so, a race condition remains: memory usage can spike in the milliseconds between the kubelet reading usage and the kernel applying the new limit, leading to an unexpected OOM kill.
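The guard described above can be sketched with made-up numbers; the real kubelet reads live usage from the container's cgroup and its internal logic is more involved than this:

```shell
# Hypothetical readings: current usage is above the requested new limit.
current_usage=$((600 * 1024 * 1024))   # e.g. memory.current from the cgroup
new_limit=$((512 * 1024 * 1024))       # newly requested memory limit

if [ "$current_usage" -gt "$new_limit" ]; then
  decision="defer"   # keep the old limit; resize stays in progress
else
  decision="apply"   # safe to write the new value to memory.max
fi
echo "$decision"
```

The race window lies between reading `current_usage` and writing the new limit: a spike in that interval can still trigger an OOM kill.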
QoS Class Immutability
A Pod's Quality of Service (QoS) class—Guaranteed, Burstable, or BestEffort—is calculated at creation and cannot be changed by a resize operation.
- If a Pod is `Guaranteed`, you can resize it, but the new CPU and memory `requests` must continue to exactly equal the new `limits`.
- If a Pod is `Burstable`, you cannot resize its requests to equal its limits, as that would dynamically elevate it to `Guaranteed`.
- If a Pod is `BestEffort`, you cannot add resource requests or limits to it after the fact.
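As an illustration of the Guaranteed rule, a valid resize must move `requests` and `limits` in lockstep (values hypothetical):

```yaml
# Before: Guaranteed, since requests == limits.
resources:
  requests: { cpu: "1", memory: "1Gi" }
  limits:   { cpu: "1", memory: "1Gi" }
```

```yaml
# Valid resize target -- still Guaranteed, requests and limits changed together.
resources:
  requests: { cpu: "2", memory: "1Gi" }
  limits:   { cpu: "2", memory: "1Gi" }
```

Raising only the limit (making requests < limits) would be rejected, because it would demote the Pod to `Burstable`.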
Architectural Incompatibilities
In Kubernetes 1.35, several features fundamentally conflict with in-place resizing:
- Static Resource Managers: Pods managed by the `static` CPU Manager policy (which assigns exclusive CPU cores) or the `Static` Memory Manager policy (which pins memory to specific NUMA nodes) cannot be resized in place.
- Container Types: While sidecar containers support resizing, ephemeral debug containers and non-restartable init containers cannot be resized.
- Resource Removal: You can change the values of requests and limits, but you cannot completely remove them once they have been set.
- Swap Memory: If the node uses swap, you cannot resize memory requests dynamically unless the container's memory `resizePolicy` is explicitly set to `RestartContainer`.
- Windows: Windows Pods do not support in-place resizing at the OS level.