In-Place Pod Resizing: Seamless Resource Updates
How does In-Place Pod Resizing work in 1.35?
Historically, Kubernetes has enforced a strictly immutable approach to Pod resource allocation: changing a Pod's CPU or memory requirements necessitated terminating the existing Pod and scheduling a replacement. As of Kubernetes 1.35, the In-Place Pod Vertical Scaling feature is stable, fundamentally shifting this paradigm by allowing dynamic adjustment of CPU and memory for running containers without disruption.
As an architect, understanding the mechanics of how the control plane, kubelet, and Linux kernel collaborate to execute these live updates is critical for building resilient, auto-scaling platforms.
Here is the deep dive into the architecture, cgroup v2 mechanics, and production challenges of in-place Pod resizing.
1. The Architectural Workflow (API to Kubelet)
The in-place resize operation is driven by a new declarative state reconciliation loop managed by the kubelet, utilizing a dedicated API subresource.
- The `/resize` subresource: To trigger an update, a client (like a VerticalPodAutoscaler or a user running `kubectl patch --subresource resize`) submits new `requests` and `limits` to the Pod's specification.
- State Tracking: Kubernetes now tracks three distinct dimensions of resource state to manage the transition:
  - Desired Resources (`spec.containers[*].resources`): the new intended state.
  - Allocated Resources (`status.containerStatuses[*].allocatedResources`): the resources the kubelet has confirmed are reserved for the container (used by the scheduler to ensure node capacity isn't oversubscribed during a pending resize).
  - Actual Resources (`status.containerStatuses[*].resources`): the resources currently configured on the running container.
- Kubelet Reconciliation: The kubelet observes the discrepancy between desired and allocated resources. If the node lacks capacity, the kubelet adds a `PodResizePending` condition with a reason of `Infeasible` or `Deferred`. If the resize is `Deferred`, the kubelet periodically retries it, prioritizing Pods by PriorityClass, QoS class (Guaranteed over Burstable), and time waited.
- Container Resize Policies: Because some applications (like Java JVMs) cannot dynamically adapt to memory changes, you can define a `resizePolicy` per resource type. Setting it to `NotRequired` (the default) applies the change live; setting it to `RestartContainer` forces the kubelet to restart the container with the new resource boundaries.
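As an illustration of the fields described above, a Pod that accepts live CPU resizes but requires a container restart for memory changes might be declared as follows (the Pod name, container name, and image are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resizable-app                     # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0 # hypothetical image
      resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired      # apply CPU changes live (the default)
        - resourceName: memory
          restartPolicy: RestartContainer # e.g. JVM apps that size heap at startup
      resources:
        requests:
          cpu: "500m"
          memory: "512Mi"
        limits:
          cpu: "1"
          memory: "512Mi"
```

A CPU resize could then be submitted without recreating the Pod, e.g. `kubectl patch pod resizable-app --subresource resize --patch '{"spec":{"containers":[{"name":"app","resources":{"limits":{"cpu":"2"}}}]}}'` (again, name and values are illustrative).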
2. Under the Hood: cgroup v2 Mechanics
When the kubelet accepts a live resize request (`PodResizeInProgress`), it communicates with the container runtime (via the Container Runtime Interface) to modify the container's control groups (cgroups) in the Linux kernel on the fly.
With cgroup v2, Kubernetes maps your declarative YAML into specific kernel-level resource control files:
- CPU Limits (Throttling): The container's CPU `limit` translates to a hard ceiling in the kernel (`cpu.max` in cgroup v2 / CFS quota). The kubelet updates this value live. If the container is currently using more CPU than the newly lowered limit, the kernel immediately begins throttling the process during its scheduling time slices.
- CPU Requests (Weighting): The CPU `request` translates to proportional CPU time (`cpu.weight` in cgroup v2 / `cpu.shares` in v1). Updating this changes the container's priority relative to other containers when the host CPU is under contention.
- Memory Requests (Protection): On nodes running cgroup v2, the memory `request` is used by the container runtime as a hint to configure `memory.min` and `memory.low`. Setting `memory.min` ensures that a specific amount of memory is strictly reserved and never reclaimed by the kernel, guaranteeing availability for the workload.
- Memory Limits (OOM Control): The memory `limit` dictates the hard memory boundary (`memory.max`) for the cgroup. Under cgroup v2 Memory QoS, the system may also use `memory.high` to proactively throttle a workload as it approaches its limit, before the kernel is forced to invoke the Out-Of-Memory (OOM) killer to terminate the process.
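To make the CPU mapping concrete, here is a small sketch with illustrative values (not read from a live node) of how declarative CPU figures translate into `cpu.max` and `cpu.weight`. The shares-to-weight formula is the conversion used by common OCI runtimes such as runc; treat the exact arithmetic as an implementation detail that could differ between runtimes:

```shell
# A limit of 500m becomes a CFS quota against the default 100ms period.
millicores_limit=500          # resources.limits.cpu: 500m
period=100000                 # default CFS period in microseconds
quota=$(( millicores_limit * period / 1000 ))
echo "cpu.max: ${quota} ${period}"

# A request of 250m maps to cgroup v1 shares (250 * 1024 / 1000 = 256),
# which runc converts to a cgroup v2 weight in the range 1..10000.
shares=256
weight=$(( 1 + (shares - 2) * 9999 / 262142 ))
echo "cpu.weight: ${weight}"
```

Lowering the limit rewrites `cpu.max` immediately; lowering the request only changes the container's relative weight, which matters solely under CPU contention.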
3. Production Challenges and Constraints
While powerful, in-place resizing introduces new operational edge cases and strict constraints that platform teams must design around.
The Memory Scale-Down Race Condition
Decreasing a container's memory limit is a highly sensitive operation. If the kubelet simply applied a lower limit while the application was using memory above that new threshold, the kernel would instantly OOM-kill the container. To mitigate this, when the memory resizePolicy is NotRequired, the kubelet performs a best-effort check: if current memory usage exceeds the newly requested limit, it refuses to apply the limit and leaves the resize marked as in progress. Even so, a race condition remains: memory usage can spike in the milliseconds between the kubelet reading usage and the kernel applying the new limit, leading to an unexpected OOM kill.
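The guard described above can be sketched with made-up numbers; the real kubelet reads live usage from the container's cgroup and its internal logic is more involved than this:

```shell
# Hypothetical readings: current usage is above the requested new limit.
current_usage=$((600 * 1024 * 1024))   # e.g. memory.current from the cgroup
new_limit=$((512 * 1024 * 1024))       # newly requested memory limit

if [ "$current_usage" -gt "$new_limit" ]; then
  decision="defer"   # keep the old limit; resize stays in progress
else
  decision="apply"   # safe to write the new value to memory.max
fi
echo "$decision"
```

The race window lies between reading `current_usage` and writing the new limit: a spike in that interval can still trigger an OOM kill.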
QoS Class Immutability
A Pod's Quality of Service (QoS) class—Guaranteed, Burstable, or BestEffort—is calculated at creation and cannot be changed by a resize operation.
- If a Pod is `Guaranteed`, you can resize it, but the new CPU and memory `requests` must continue to exactly equal the new `limits`.
- If a Pod is `Burstable`, you cannot resize its requests to equal its limits, as that would dynamically elevate it to `Guaranteed`.
- If a Pod is `BestEffort`, you cannot add resource requests or limits to it after the fact.
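As an illustration of the Guaranteed rule, a valid resize must move `requests` and `limits` in lockstep (values hypothetical):

```yaml
# Before: Guaranteed, since requests == limits.
resources:
  requests: { cpu: "1", memory: "1Gi" }
  limits:   { cpu: "1", memory: "1Gi" }
```

```yaml
# Valid resize target -- still Guaranteed, requests and limits changed together.
resources:
  requests: { cpu: "2", memory: "1Gi" }
  limits:   { cpu: "2", memory: "1Gi" }
```

Raising only the limit (making requests < limits) would be rejected, because it would demote the Pod to `Burstable`.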
Architectural Incompatibilities
In Kubernetes 1.35, several features fundamentally conflict with in-place resizing:
- Static Resource Managers: Pods managed by the `static` CPU Manager policy (which assigns exclusive CPU cores) or the `Static` Memory Manager policy (which pins memory to specific NUMA nodes) cannot be resized in place.
- Container Types: While sidecar containers support resizing, ephemeral debug containers and non-restartable init containers cannot be resized.
- Resource Removal: You can change the values of requests and limits, but you cannot completely remove them once they have been set.
- Swap Memory: If the node uses swap, you cannot resize memory requests dynamically unless the container's memory `resizePolicy` is explicitly set to `RestartContainer`.
- Windows: Windows Pods do not support in-place resizing at the OS level.