Linux Kernel Primitives for Kubernetes: Namespaces & Cgroups Explained
Cgroups & Namespaces
Before Kubernetes was an orchestrator, it was just Linux. The entire container revolution is built upon two kernel primitives that existed long before Docker: cgroups (Control Groups) and namespaces.
Historical Context
Google engineers (initially Paul Menage and Rohit Seth) began working on "Process Containers" in 2006. The feature was later renamed to cgroups and merged into Linux kernel 2.6.24, released in January 2008. Without this kernel patch, Kubernetes would not exist.
1. Key Concepts: The Two Pillars
If you think of a Container as a "Box", these two primitives define the walls of that box.
| Primitive | Function | The Metaphor |
|---|---|---|
| Namespaces | Isolation (Visibility) | "What can I see?" |
| Cgroups | Restriction (Usage) | "How much can I use?" |
2. Namespaces: The Walls of Reality
Linux Namespaces lie to the process. When you run a process in a namespace, it thinks it is the only thing running on the machine. It has its own "view" of the system.
The 7 Critical Namespaces
While cgroups enforce limits, namespaces manage resource visibility. They create the fundamental illusion that a container is running on its own dedicated operating system.
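The kernel exposes each namespace a process belongs to as a symlink under `/proc/<pid>/ns/`, with a target such as `pid:[4026531836]`. Two processes share a namespace when those identifiers match. The helpers below are a minimal sketch of that comparison; the function names `ns_id`, `ns_inode` and `same_ns` are hypothetical, not part of any standard tool.

```shell
# ns_id PID TYPE: print the namespace identifier, e.g. "pid:[4026531836]".
# TYPE is an entry name under /proc/<pid>/ns, such as pid, net, mnt, uts, ipc, user or cgroup.
ns_id() {
    readlink "/proc/$1/ns/$2"
}

# ns_inode LINK: extract the number from a string like "net:[4026531840]".
# Prints nothing if the input does not have that shape.
ns_inode() {
    printf '%s\n' "$1" | sed -n 's/^[a-z_]*:\[\([0-9]*\)]$/\1/p'
}

# same_ns PID_A PID_B TYPE: succeed when both processes are in the same namespace.
# Returns 2 if either link cannot be read (for example, without root).
same_ns() {
    a=$(ns_id "$1" "$3") || return 2
    b=$(ns_id "$2" "$3") || return 2
    [ "$a" = "$b" ]
}
```

For example, `same_ns 1 <container-pid> net` run as root should fail for a containerized process, because its network namespace differs from the one PID 1 on the host lives in.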
1. PID Namespace (Process ID)
This namespace isolates the process ID number space.
- Process Isolation: Processes within a container can only "see" other processes in the same PID namespace, preventing them from spying on host processes.
- How Containers get PID 1: Inside a new PID namespace, the first process started is assigned PID 1 (your application's entrypoint). PID 1 must handle system signals (like `SIGTERM`) and reap zombie processes.
- Host vs. Container View: A process has two IDs. Inside, it sees itself as `PID 1`. On the host, the kernel sees it as `PID 12345`.
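The two IDs can be read from the host. Since Linux 4.1, `/proc/<pid>/status` carries an `NSpid` line listing the process's PID in each nested PID namespace, outermost first. The sketch below parses that line; `nspids` and `container_pid` are hypothetical helper names, and the file path is passed in so the functions can be pointed at any status file.

```shell
# nspids STATUS_FILE: print the PIDs from the NSpid line, outermost namespace first.
nspids() {
    awk '$1 == "NSpid:" {
        for (i = 2; i <= NF; i++) printf "%s%s", $i, (i < NF ? " " : "\n")
    }' "$1"
}

# container_pid STATUS_FILE: print the last NSpid entry,
# i.e. the PID as seen inside the innermost namespace.
container_pid() {
    awk '$1 == "NSpid:" { print $NF }' "$1"
}
```

For the process in the example above, `nspids /proc/12345/status` on the host would be expected to print `12345 1`.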
Engineering Note
Kubernetes lets you relax this isolation within a Pod by setting `shareProcessNamespace: true` in the Pod spec. All containers in the Pod then share one PID namespace and can see each other's processes (useful for sidecar debugging); isolation from the host and from other Pods is unchanged.
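As a rough illustration, the snippet below writes a Pod manifest that turns this setting on. The Pod name, container names, images and file name are placeholders chosen for the example; only `shareProcessNamespace: true` comes from the text above.

```shell
# Write an illustrative manifest; every name and image here is a placeholder.
cat > shared-pid-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: shared-pid-demo
spec:
  shareProcessNamespace: true
  containers:
    - name: app
      image: nginx
    - name: debugger
      image: busybox
      command: ["sleep", "3600"]
EOF

# To try it on a cluster (assumes kubectl is already configured):
#   kubectl apply -f shared-pid-pod.yaml
#   kubectl exec shared-pid-demo -c debugger -- ps
```

With sharing enabled, `ps` in the `debugger` container should also list the `app` container's processes. Note that the application is then no longer PID 1 in the Pod; that slot is taken by the Pod's sandbox process.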
2. Network Namespace
This provides a completely independent network stack (interfaces, IPs, routing tables).
- Virtual Network Interfaces: To connect this isolated namespace to the cluster, many CNI plugins use a veth pair (virtual ethernet). One end is placed inside the container (`eth0`), and the other remains on the host, typically attached to a bridge (for example, `cni0`).
- How Pods get their own IP: Kubernetes creates the network namespace for the Pod, not the individual container. All containers in a Pod share this single network namespace (and thus the same IP address and `localhost`).
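The veth-plus-bridge wiring described above can be reproduced by hand with `ip`. The sketch below is an approximation of what a bridge-style CNI plugin does, not the code of any particular plugin. The helper names `run` and `wire_netns`, the bridge name `br-demo`, the interface names and the `10.200.0.0/24` range are all assumptions made for the demo. It defaults to a dry run that only prints the commands, since executing them requires root.

```shell
# run CMD...: print the command; execute it only when DRY_RUN=0 (which needs root).
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# wire_netns NAME: create a network namespace and connect it to a host bridge.
# All names and addresses below are arbitrary demo values.
wire_netns() {
    ns=$1
    run ip netns add "$ns"                                     # the isolated network stack
    run ip link add br-demo type bridge                        # host-side bridge
    run ip link add "veth-$ns" type veth peer name "ceth-$ns"  # the veth pair
    run ip link set "ceth-$ns" netns "$ns"                     # move one end inside
    run ip netns exec "$ns" ip link set "ceth-$ns" name eth0   # rename it to eth0
    run ip link set "veth-$ns" master br-demo                  # attach host end to the bridge
    run ip addr add 10.200.0.1/24 dev br-demo
    run ip link set br-demo up
    run ip link set "veth-$ns" up
    run ip netns exec "$ns" ip addr add 10.200.0.2/24 dev eth0
    run ip netns exec "$ns" ip link set eth0 up
}

wire_netns demo
```

Running it as root with `DRY_RUN=0` should leave a `demo` namespace whose `eth0` can reach `10.200.0.1` on the host. Clean up with `ip netns del demo` and `ip link del br-demo`.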
3. Mount Namespace
Allows processes to have their own isolated view of filesystem mount points.
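- Root Filesystem (`/`): When a container starts, it sees its image layers as the root filesystem.
- `pivot_root`: A system call used during container creation that swaps the root mount from the host OS root to the container image root. Unlike `chroot`, it lets the runtime unmount the old root afterwards, so the host filesystem is no longer reachable from inside the container.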
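One way to observe the swapped root is to look at what is mounted on `/` from the process's own point of view. Each line of `/proc/<pid>/mountinfo` describes one mount: the fifth field is the mount point, and the filesystem type follows a lone `-` separator. The helper below is a small sketch that reports that type; `root_fstype` is a hypothetical name.

```shell
# root_fstype MOUNTINFO_FILE: print the filesystem type mounted on "/".
# If several mounts are stacked on "/", the last one listed wins.
root_fstype() {
    awk '$5 == "/" {
        # Optional fields start at column 7; the type follows the "-" separator.
        for (i = 7; i <= NF; i++) if ($i == "-") { fstype = $(i + 1); break }
    }
    END { if (fstype != "") print fstype; else exit 1 }' "$1"
}
```

Inside a container, `root_fstype /proc/self/mountinfo` commonly reports `overlay`, while on the host it reports the node's own root filesystem type. The exact values depend on the runtime's storage driver.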
4. UTS Namespace (UNIX Time-Sharing)
- Hostname Isolation: Allows a container to have its own hostname, independent of the host node. This is what lets Kubernetes give each Pod its own hostname (e.g., `web-0` in a StatefulSet, where the name persists across reschedules). Making that name resolvable is handled separately by cluster DNS.
5. IPC Namespace (Inter-Process Communication)
- Shared Memory: Containers within the same Pod share the IPC namespace. This enables them to communicate using high-performance IPC mechanisms like shared memory segments, while remaining completely isolated from other Pods.
6. User Namespace
- UID/GID Mapping: Allows a process to run as `root` (UID 0) inside the container, but be mapped to an unprivileged user (e.g., UID 1000) on the host. This drastically reduces the blast radius of a container breakout.
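The mapping itself is visible in `/proc/<pid>/uid_map`, where each line has three numbers: the first ID inside the namespace, the first ID outside, and the length of the range. The sketch below translates a container UID to its host UID from such a file; `host_uid` is a hypothetical helper name.

```shell
# host_uid MAP_FILE CONTAINER_UID: translate a UID inside the namespace to the host UID.
# Each map line is "<first-inside-id> <first-outside-id> <count>".
# Exits non-zero when the UID is not covered by any range.
host_uid() {
    awk -v uid="$2" '
        (uid + 0) >= $1 && (uid + 0) < $1 + $3 {
            print $2 + (uid - $1)
            found = 1
            exit
        }
        END { if (!found) exit 1 }
    ' "$1"
}
```

With a map line of `0 1000 1`, as in the example above, `host_uid` maps container UID 0 to host UID 1000. A process that is not in a user namespace typically shows `0 0 4294967295`, meaning root inside is root outside.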
7. Cgroup Namespace
- Cgroup View Isolation: Virtualizes the view of the `/proc/self/cgroup` file. The container sees itself at the root of the hierarchy, which hides the host's cgroup layout and helps keep it from reaching cgroups that belong to neighboring containers.
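To see this view, read the file directly. On a cgroup v2 host, it contains a single entry of the form `0::<path>`. The helper below is a minimal sketch that extracts that path; `cgroup_v2_path` is a hypothetical name, and cgroup v1 entries (which have a different prefix) are ignored.

```shell
# cgroup_v2_path FILE: print the cgroup v2 path from a /proc/<pid>/cgroup-style file.
# cgroup v2 entries look like "0::<path>"; other lines are skipped.
cgroup_v2_path() {
    sed -n 's/^0:://p' "$1"
}
```

Inside a container with its own cgroup namespace, `cgroup_v2_path /proc/self/cgroup` is expected to print `/`. Reading `/proc/<pid>/cgroup` for the same process from the host shows the full path the runtime placed it under.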
3. Cgroups: The Resource Police
If Namespaces are the walls, Cgroups are the guards watching the door. They enforce limits and accounting.
When you write this YAML in Kubernetes:
```yaml
resources:
  limits:
    memory: "512Mi"
    cpu: "500m"
```
The kubelet translates these declarative values into specific Linux cgroup files on the worker node.
How it works on disk
On the Linux host, you can find these controls in /sys/fs/cgroup/.
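1. CPU Throttling (`cpu.cfs_quota_us`): When you set a CPU limit of 0.5 (500m), the container runtime writes 50000 to this file (with `cpu.cfs_period_us` at its default of 100000), telling the scheduler: "This process group gets 50,000 microseconds of runtime for every 100,000 microseconds of real time." These are the cgroup v1 file names; on cgroup v2 hosts, both values live in `cpu.max`.
- If the app asks for more? The kernel pauses (throttles) the group's processes until the next period begins.
2. Memory OOM (`memory.limit_in_bytes`): When you set a limit of 512Mi, the container runtime writes 536870912 (512 × 1024 × 1024) to this file (`memory.max` on cgroup v2).
- If the app asks for more? If the kernel cannot reclaim enough memory within the group, it invokes the OOM Killer (Out of Memory Killer) to terminate a process in the group with SIGKILL.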
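The arithmetic behind both values is simple enough to check by hand. The sketch below reproduces it with two hypothetical helpers, `cpu_quota_us` and `mem_bytes`. It assumes the default 100,000 µs period and only handles whole-number quantities with the binary suffixes `Ki`, `Mi` and `Gi`; Kubernetes accepts more forms than that.

```shell
# cpu_quota_us MILLICORES [PERIOD_US]: CFS quota for a CPU limit.
# The period defaults to 100000 microseconds.
cpu_quota_us() {
    echo $(( $1 * ${2:-100000} / 1000 ))
}

# mem_bytes QUANTITY: convert a binary-suffixed quantity (Ki, Mi, Gi) to bytes.
# Anything without one of those suffixes is echoed back unchanged.
mem_bytes() {
    case $1 in
        *Ki) echo $(( ${1%Ki} * 1024 )) ;;
        *Mi) echo $(( ${1%Mi} * 1024 * 1024 )) ;;
        *Gi) echo $(( ${1%Gi} * 1024 * 1024 * 1024 )) ;;
        *)   echo "$1" ;;
    esac
}

cpu_quota_us 500   # prints 50000
mem_bytes 512Mi    # prints 536870912
```

On a node, these numbers can be compared with the contents of the container's cgroup directory under `/sys/fs/cgroup/`.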
The Engineering Takeaway
"Containers" technically do not exist. There is no kernel object called "Container". There are only normal Linux processes that have been tricked by Namespaces (so they can't see neighbors) and restricted by Cgroups (so they don't eat all the RAM).
4. Prove it to yourself (Hands-on CLI)
You don't need Docker or Kubernetes to see this. You just need Linux (or a VM).
A. Create a Namespace manually
The unshare command lets you create a process in a new namespace.
```bash
# Become root, then start a new shell in a new PID namespace
sudo unshare --pid --fork --mount-proc /bin/bash
```
Inside this new shell, run `ps aux`. You will see you are PID 1. You have become the container.
B. Inspecting Namespaces (lsns)
You can list all namespaces on a host to find "hidden" containers.
```bash
# Run as root; without it, lsns only lists namespaces of your own processes
sudo lsns -t net
```
This shows every network namespace. If you run a Docker container, a new entry appears here.
C. Break into a Container (nsenter)
This is the most critical troubleshooting command for CKA. If a container has no shell (distroless), how do you debug it?
- Find the PID of the container process on the host.
- Use `nsenter` to jump into its namespace using your host's tools.
```bash
# 1. Get the PID using the modern CRI tool (containerd/CRI-O)
#    Replace <container-id> with the ID shown by `crictl ps`
PID=$(crictl inspect --output go-template --template '{{.info.pid}}' <container-id>)
# 2. Enter the NET and PID namespaces (requires root)
sudo nsenter --target "$PID" --net --pid
```
You are now "inside" the container's reality, but using your host's extensive binary toolkit (curl, tcpdump, strace).