Linux Kernel Primitives for Kubernetes: Namespaces & Cgroups Explained

Cgroups & Namespaces

Before Kubernetes was an orchestrator, it was just Linux. The entire container revolution is built upon two kernel primitives that existed long before Docker: cgroups (Control Groups) and namespaces.

Historical Context

Google engineers (initially Paul Menage and Rohit Seth) began working on "Process Containers" in 2006. This was later renamed to cgroups and merged into the Linux Kernel 2.6.24 in 2008. Without this kernel patch, Kubernetes would not exist.

1. Key Concepts: The Two Pillars

If you think of a Container as a "Box", these two primitives define the walls of that box.

Primitive	Function	The Metaphor
Namespaces	Isolation (Visibility)	"What can I see?"
Cgroups	Restriction (Usage)	"How much can I use?"

2. Namespaces: The Walls of Reality

Linux Namespaces lie to the process. When you run a process in a namespace, it thinks it is the only thing running on the machine. It has its own "view" of the system.

The 7 Critical Namespaces

While cgroups enforce limits, namespaces manage resource visibility. They create the fundamental illusion that a container is running on its own dedicated operating system.

1. PID Namespace (Process ID)

This namespace isolates the process ID number space.

Process Isolation: Processes within a container can only "see" other processes in the same PID namespace, preventing them from spying on host processes.
How Containers get PID 1: Inside a new PID namespace, the first process started is assigned PID 1 (your application's entrypoint). PID 1 must handle system signals (like SIGTERM) and reap zombie processes.
Host vs. Container View: A process has two IDs. Inside, it sees itself as PID 1. On the host, the kernel sees it as PID 12345.
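
The two-ID view can be checked from any Linux shell, with no container runtime involved. The following is a minimal sketch, assuming a kernel that exposes /proc/<pid>/ns and the NSpid field in /proc/<pid>/status (roughly 4.1 and later); the helper name pid_ns_id is made up for illustration.

```shell
# Print the PID namespace identifier of a process (defaults to this shell).
# Two processes are in the same PID namespace exactly when this value matches.
pid_ns_id() {
  readlink "/proc/${1:-$$}/ns/pid"
}

pid_ns_id    # prints something like pid:[4026531836]

# NSpid lists the process ID as seen from each nested PID namespace,
# outermost first. A single number means the process is not nested
# relative to this /proc mount.
grep '^NSpid:' "/proc/$$/status" || true
```

Run the same commands on the host and inside a container: a different pid:[...] value means the two shells live in different PID namespaces.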

Engineering Note

Kubernetes lets you relax this isolation within a Pod by setting shareProcessNamespace: true in the Pod spec. Containers in the same Pod then share one PID namespace and can see each other's processes (useful for sidecar debugging).

2. Network Namespace

This provides a completely independent network stack (interfaces, IPs, routing tables).

Virtual Network Interfaces: To connect this isolated namespace to the cluster, CNI plugins typically use a veth pair (virtual ethernet). One end is placed inside the container (eth0), and the other remains on the host, where bridge-based plugins attach it to a bridge (such as cni0).
How Pods get their own IP: Kubernetes creates the network namespace for the Pod, not the individual container. All containers in a Pod share this single network namespace (and thus the same IP address and localhost).
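
The veth wiring described above can be sketched with plain iproute2 commands. This is an illustration only, not what any particular CNI plugin runs: the namespace name demo, the interface names veth-host and veth-ctr, and the 10.200.0.0/24 range are all placeholders. The run wrapper is also hypothetical; it only prints each command unless DRY_RUN=0 is set, because applying them requires root.

```shell
# Print each command by default; run as root with DRY_RUN=0 to apply them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create a new, empty network namespace (stands in for the Pod).
run ip netns add demo
# Create the veth pair: two interfaces joined like a virtual cable.
run ip link add veth-host type veth peer name veth-ctr
# Move one end into the namespace and rename it to eth0.
run ip link set veth-ctr netns demo
run ip netns exec demo ip link set veth-ctr name eth0
# Give the namespace an address and bring both ends up.
run ip netns exec demo ip addr add 10.200.0.2/24 dev eth0
run ip netns exec demo ip link set eth0 up
# A bridge-based plugin would additionally attach the host end to its bridge.
run ip link set veth-host up
```

After a real run, ip netns del demo removes the namespace, and the veth pair goes with it.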

3. Mount Namespace

Allows processes to have their own isolated view of filesystem mount points.

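Root Filesystem (/): When a container starts, it sees its image layers as the root filesystem.
pivot_root: A system call used during container creation that changes the root mount within the container's mount namespace. The container image's filesystem becomes the new /, and the old host root can then be unmounted so it is no longer reachable.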

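4. UTS Namespace (UNIX Time-Sharing)

Hostname Isolation: Allows a container to have its own hostname independent of the host node. Kubernetes uses this to set a Pod's hostname to the Pod name (e.g., web-0 in a StatefulSet), which stays the same across reschedules. Making that name resolvable is the job of cluster DNS and a headless Service, not of the UTS namespace itself.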

5. IPC Namespace (Inter-Process Communication)

Shared Memory: Containers within the same Pod share the IPC namespace. This enables them to communicate using high-performance IPC mechanisms like shared memory segments, while remaining completely isolated from other Pods.

6. User Namespace

UID/GID Mapping: Allows a process to run as root (UID 0) inside the container, but be mapped to an unprivileged user (e.g., UID 1000) on the host. This drastically reduces the blast radius of a container breakout.
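
The mapping itself is visible in /proc/<pid>/uid_map, where each line reads: first ID inside the namespace, first ID outside it, and the length of the range. The sketch below translates an inside UID to the UID the host sees. The helper name map_uid is hypothetical, and the sample mapping simply mirrors the UID 0 to UID 1000 example above.

```shell
# Translate a UID seen inside a user namespace into the host-side UID,
# using uid_map lines of the form: <inside-start> <outside-start> <count>.
# With no second argument, the map of the current process is used.
map_uid() {
  awk -v uid="$1" '
    uid >= $1 && uid < $1 + $3 { print $2 + (uid - $1); found = 1; exit }
    END { if (!found) exit 1 }
  ' "${2:-/proc/self/uid_map}"
}

# Sample mapping: container UID 0 is host UID 1000, for a single UID.
printf '0 1000 1\n' | map_uid 0 /dev/stdin    # prints 1000
```

A UID that falls outside every range has no mapping; the helper then prints nothing and returns a non-zero status.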

7. Cgroup Namespace

Cgroup View Isolation: Virtualizes the view of the /proc/self/cgroup file. The container sees itself at the root of the hierarchy, which hides the host's cgroup layout and keeps the cgroups of neighboring containers out of its view.
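
To see what this virtualization changes, compare the same file from the host and from inside a container. A minimal sketch, assuming a cgroup v2 system, where the file holds a single 0::<path> entry; cgroup_v2_path is a hypothetical helper name.

```shell
# Print the cgroup v2 path recorded for a process.
# With no argument, the current process's own file is read.
cgroup_v2_path() {
  sed -n 's/^0:://p' "${1:-/proc/self/cgroup}"
}

cgroup_v2_path
```

Inside a cgroup namespace the path is reported relative to the namespace root, so a container typically sees just /.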

3. Cgroups: The Resource Police

If Namespaces are the walls, Cgroups are the guards watching the door. They enforce limits and accounting.

When you write this YAML in Kubernetes:

yaml

resources:
  limits:
    memory: "512Mi"
    cpu: "500m"

The kubelet, working through the container runtime, translates these values into specific cgroup files on the worker node.

How it works on disk

On the Linux host, you can find these controls in /sys/fs/cgroup/.
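
File names under this directory differ between cgroup v1 and cgroup v2, so it helps to check which one a node uses first. A small sketch based on the filesystem type of the mount point; cgroup_version is a hypothetical helper, and stat -fc %T assumes GNU coreutils.

```shell
# Report which cgroup layout is mounted at a path (default /sys/fs/cgroup).
cgroup_version() {
  case "$(stat -fc %T "${1:-/sys/fs/cgroup}" 2>/dev/null)" in
    cgroup2fs) echo "v2" ;;       # unified hierarchy
    tmpfs)     echo "v1" ;;       # one directory per controller
    *)         echo "unknown" ;;  # not mounted, or a stat without -f support
  esac
}

cgroup_version
```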

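1. CPU Throttling (cpu.cfs_quota_us) When you set a CPU limit of 0.5 (500m), the container runtime writes a quota that the kernel scheduler enforces: "This process group gets 50,000 microseconds of runtime for every 100,000 microseconds of real time." On cgroup v1 the quota lives in cpu.cfs_quota_us next to cpu.cfs_period_us; on cgroup v2 both values are stored together in cpu.max.

If the app asks for more? The kernel pauses (throttles) the process until the next period begins.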
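
The arithmetic behind that quota is simple enough to check by hand. A sketch, assuming the common default period of 100,000 microseconds; cpu_quota_us is a hypothetical helper name, not part of any tool.

```shell
# Convert a CPU limit in millicores into a CFS quota in microseconds:
#   quota = millicores * period / 1000
cpu_quota_us() {
  millicores=$1
  period_us=${2:-100000}    # assumed default period
  echo $(( millicores * period_us / 1000 ))
}

cpu_quota_us 500                     # prints 50000
# cgroup v2 keeps quota and period on one line in cpu.max:
echo "$(cpu_quota_us 500) 100000"    # prints 50000 100000
```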

2. Memory OOM (memory.limit_in_bytes) When you set a limit of 512Mi, the container runtime writes 536870912 (512 × 1024 × 1024) to this file. On cgroup v2 the equivalent file is memory.max.

If the app asks for more? The kernel invokes the OOM Killer (Out of Memory Killer) to instantly terminate the process (SIGKILL).
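
The byte value comes from treating Mi as a binary suffix (1 Mi = 1024 × 1024 bytes). A sketch of that conversion; to_bytes is a hypothetical helper and deliberately handles only the Ki, Mi, and Gi suffixes.

```shell
# Convert a binary-suffixed Kubernetes quantity to bytes.
# Anything without a recognized suffix is passed through unchanged.
to_bytes() {
  case "$1" in
    *Ki) echo $(( ${1%Ki} * 1024 )) ;;
    *Mi) echo $(( ${1%Mi} * 1024 * 1024 )) ;;
    *Gi) echo $(( ${1%Gi} * 1024 * 1024 * 1024 )) ;;
    *)   echo "$1" ;;
  esac
}

to_bytes 512Mi    # prints 536870912
```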

The Engineering Takeaway

"Containers" technically do not exist. There is no kernel object called "Container". There are only normal Linux processes that have been tricked by Namespaces (so they can't see neighbors) and restricted by Cgroups (so they don't eat all the RAM).

4. Prove it to yourself (Hands-on CLI)

You don't need Docker or Kubernetes to see this. You just need Linux (or a VM).

A. Create a Namespace manually

The unshare command lets you create a process in a new namespace.

bash

# Become root, then start a new shell in a new PID namespace
sudo unshare --pid --fork --mount-proc /bin/bash

Inside this new shell, run ps aux. You will see you are PID 1. You have become the container.

B. Inspecting Namespaces (`lsns`)

You can list all namespaces on a host to find "hidden" containers.

bash

# Run as root; an unprivileged user only sees the namespaces of their own processes
sudo lsns -t net

This shows every Network Namespace. If you run a Docker container, a new entry appears here.

C. Break into a Container (`nsenter`)

This is the most critical troubleshooting command for CKA. If a container has no shell (distroless), how do you debug it?

1. Find the PID of the container process on the host.
2. Use nsenter to jump into its namespace using your host's tools.

bash

# 1. Get the PID using the modern CRI tool (containerd/CRI-O)
PID=$(crictl inspect --output go-template --template '{{.info.pid}}' <container-id>)

# 2. Enter the NET and PID namespaces (requires root)
nsenter --target "$PID" --net --pid

You are now "inside" the container's reality, but using your host's extensive binary toolkit (curl, tcpdump, strace).
