OOM Killer Mechanics: RSS vs. Active Cache

How does the OOM killer choose between RSS and cache?

The Out of Memory (OOM) Killer does not technically "choose" between RSS (anonymous memory) and cache during the kill process. Instead, the OOM Killer is a kernel-level fail-safe of last resort that activates only after the Linux kernel has failed to reclaim enough memory (usually page cache) to keep a container within its memory limit or to relieve memory pressure on the node.

Understanding how the kernel prioritizes reclaiming memory versus killing processes is critical for preventing unexpected OOMKilled pod crashes.


1. The Pre-OOM Phase: Reclaiming Cache

Before the OOM Killer is ever triggered, the Linux kernel aggressively attempts to free up memory by "evicting" reclaimable pages. The system distinguishes between memory that must be kept and memory that can safely be dropped.

  1. Inactive Page Cache (First to go): The kernel maintains an "inactive Least Recently Used (LRU)" cache list. This represents cached files that have not been accessed recently. When memory pressure mounts, the kernel simply drops these pages to free up RAM. This does not kill your process and is practically invisible to the application.
  2. Active Page Cache (Harder to reclaim): Memory used for file buffering that is currently being accessed or written to is tracked as active_file. The kubelet and the kernel generally treat this as "not reclaimable" in the immediate sense, because evicting it would cause severe thrashing or require waiting for data to flush to disk.
  3. RSS (Anonymous Memory): This is the memory used by your application's heap and stack. It cannot be dropped or reclaimed; it can only be swapped out (if swap is enabled) or freed by killing the process outright.
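The reclaim order above can be illustrated with a small model. This is purely an illustrative sketch (the function name and page categories are invented for the example, not kernel code): only inactive file cache is freed cheaply, and any shortfall that remains is what pushes the kernel toward the OOM Killer.

```python
# Illustrative model (not kernel code) of which memory the kernel can
# reclaim under pressure, in the order described above.

def reclaim(pages, needed):
    """Free up to `needed` pages, cheapest first.

    `pages` maps a memory category to a page count. Only inactive file
    cache can simply be dropped; active file cache and anonymous memory
    (RSS) survive reclaim, so any shortfall is what would trigger the
    OOM Killer.
    """
    freed = min(pages.get("inactive_file", 0), needed)
    pages["inactive_file"] -= freed
    shortfall = needed - freed
    return freed, shortfall

pages = {"inactive_file": 300, "active_file": 500, "anon_rss": 200}
freed, shortfall = reclaim(pages, needed=400)
# Only the 300 inactive pages can be dropped; the remaining 100-page
# shortfall cannot be reclaimed from cache.
```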

The "Working Set" Trap

A common OOMKilled scenario occurs when a workload performs intensive disk I/O. This activity fills the container's memory allocation with active_file (Active Page Cache). Because the kernel struggles to reclaim active cache quickly, the Kubelet observes this as high working set memory. If RSS + Active Cache exceeds the container's hard limit, the kernel cannot reclaim enough memory, and the OOM Killer is triggered—even though the application's actual heap (RSS) might be very small!
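The trap is easier to see in numbers. The working-set metric the kubelet watches (via cAdvisor) is, in essence, total cgroup memory usage minus inactive file cache; the helper below is a sketch of that arithmetic, not the kubelet's actual code.

```python
# Sketch of the kubelet/cAdvisor working-set calculation: total usage
# minus inactive file cache (e.g. memory.current minus the
# inactive_file field of memory.stat on cgroup v2).

def working_set_bytes(usage, inactive_file):
    """Everything except inactive cache counts against the container."""
    return max(usage - inactive_file, 0)

# Small heap, heavy I/O: 100 MiB RSS + 800 MiB active cache
# + 50 MiB inactive cache = 950 MiB total usage. Only the inactive
# cache is subtracted, so the working set is 900 MiB even though the
# application heap is tiny.
MIB = 1024 * 1024
ws = working_set_bytes(usage=950 * MIB, inactive_file=50 * MIB)
```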


2. The OOM Killer Phase: Selecting a Victim

Once the kernel determines it cannot reclaim enough Cache to stay below the limit, it invokes the OOM Killer. The OOM Killer does not distinguish between "killing RSS" or "killing Cache"—it kills an entire process to release all of its held memory resources instantly.

It selects the victim based on a dynamically calculated OOM Score. Kubernetes intentionally manipulates this score to protect critical cluster components and prioritize which user workloads should die first.

A. The Kernel's Base Calculation

The kernel calculates a base score from the fraction of total memory a process is consuming: in oom_badness() this is roughly the process's RSS plus its swap and page-table usage. A process using 80% of the node's RAM gets a much higher base score than one using 5%. Page cache is not charged to individual processes in this calculation, but it does count toward the cgroup usage that triggered the OOM event in the first place.
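A rough model of that calculation, loosely following oom_badness() in the kernel's mm/oom_kill.c (the exact accounting differs across kernel versions; this sketch keeps only the shape of the formula):

```python
# Rough model of the kernel's badness score: a process's points are
# its RSS + swap + page-table pages, with oom_score_adj folded in as
# a fraction of total system pages.

def oom_badness(rss_pages, swap_pages, pgtable_pages, total_pages,
                oom_score_adj=0):
    points = rss_pages + swap_pages + pgtable_pages
    points += oom_score_adj * total_pages // 1000
    return max(points, 0)

total = 1_000_000  # pages on the node
hog   = oom_badness(rss_pages=800_000, swap_pages=0,
                    pgtable_pages=2_000, total_pages=total)
small = oom_badness(rss_pages=50_000, swap_pages=0,
                    pgtable_pages=500, total_pages=total)
# The process holding ~80% of RAM scores far higher than the 5% one,
# so it is the preferred victim (all else being equal).
```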

B. Kubernetes Influence (oom_score_adj)

Kubernetes intervenes by applying a heavy adjustment value (oom_score_adj) directly to the kernel's calculation. This adjustment is based entirely on the Pod's Quality of Service (QoS) Class. This ensures that "Burstable" or "BestEffort" pods are killed long before "Guaranteed" critical workloads.

| QoS Class   | Pod Configuration                          | oom_score_adj      | OOM Priority                        |
|-------------|--------------------------------------------|--------------------|-------------------------------------|
| BestEffort  | No memory requests or limits set.          | 1000               | Highest score; first to be killed.  |
| Burstable   | Requests < limits.                         | Calculated (2-999) | Killed after BestEffort pods.       |
| Guaranteed  | Requests equal limits for every container. | -997               | Very low score; last to be killed.  |

(Note: Essential infrastructure such as the kubelet and the container runtime receive an oom_score_adj of -999, making them effectively immune to the OOM Killer relative to user workloads.)
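For Burstable pods, the kubelet derives the adjustment from the ratio of the pod's memory request to the node's memory capacity, clamped so it never collides with the Guaranteed or BestEffort values. The sketch below follows the formula used in the kubelet's QoS policy (pkg/kubelet/qos); treat it as an approximation of that code, not a copy.

```python
# Burstable oom_score_adj: the more memory a pod requests relative to
# node capacity, the lower (safer) its score. Clamped into 2..999 so a
# Burstable pod never outranks Guaranteed (-997) or ties BestEffort (1000).

GUARANTEED_ADJ = -997
BEST_EFFORT_ADJ = 1000

def burstable_oom_score_adj(memory_request, node_capacity):
    adj = 1000 - (1000 * memory_request) // node_capacity
    return min(max(adj, 2), 999)

GIB = 1024 ** 3
adj = burstable_oom_score_adj(memory_request=4 * GIB, node_capacity=16 * GIB)
# 4 GiB requested on a 16 GiB node: 1000 - 250 = 750
```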


Summary Workflow

  1. Pressure: Container memory usage approaches the configured limit.
  2. Attempt Reclaim: The Linux kernel attempts to drop Inactive Page Cache.
  3. Failure: If total usage remains high (due to a high application RSS footprint, or heavy Active Page Cache from I/O), reclamation fails.
  4. OOM Kill: The kernel targets the process with the highest cumulative OOM Score (Raw Usage + Kubernetes QoS adjustment oom_score_adj) and forcibly kills it via SIGKILL, releasing both its RSS and Cache simultaneously.
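The whole workflow reduces to picking the process with the highest combined score. A final sketch, reusing the simplified scoring model from above (process names and numbers are invented for illustration):

```python
# Victim selection sketch: base points (RSS) plus the QoS-driven
# oom_score_adj, scaled by total pages; highest combined score dies.

def pick_victim(procs, total_pages):
    """procs: list of (name, rss_pages, oom_score_adj) tuples."""
    def score(proc):
        _, rss_pages, adj = proc
        return rss_pages + adj * total_pages // 1000
    return max(procs, key=score)[0]

procs = [
    ("guaranteed-db", 600_000, -997),    # large RSS, protected by QoS
    ("besteffort-batch", 100_000, 1000), # small RSS, first in line
]
victim = pick_victim(procs, total_pages=1_000_000)
# The BestEffort pod is chosen despite using far less memory: the QoS
# adjustment dominates the raw usage.
```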

Based on Kubernetes v1.35 (Timbernetes). Changelog.