
The Origins

Cgroups & Namespaces

Before Kubernetes was an orchestrator, it was just Linux. The entire container revolution is built upon two kernel primitives that existed long before Docker: cgroups (Control Groups) and namespaces.

Historical Context

Google engineers (initially Paul Menage and Rohit Seth) began working on "Process Containers" in 2006. This was later renamed to cgroups and merged into the Linux Kernel 2.6.24 in 2008. Without this kernel patch, Kubernetes would not exist.

1. Key Concepts: The Two Pillars

If you think of a Container as a "Box", these two primitives define the walls of that box.

| Primitive  | Function               | The Metaphor           |
|------------|------------------------|------------------------|
| Namespaces | Isolation (Visibility) | "What can I see?"      |
| Cgroups    | Restriction (Usage)    | "How much can I use?"  |

2. Namespaces: The Walls of Reality

Linux Namespaces lie to the process. When you run a process in a namespace, it thinks it is the only thing running on the machine. It has its own "view" of the system.

The 7 Critical Namespaces

  1. PID (Process ID):
    • The Illusion: "I am PID 1."
    • The Reality: You are actually PID 24590 on the host node.
  2. MNT (Mount):
    • The Illusion: "I see a full Debian filesystem at /."
    • The Reality: You are trapped in a folder (/var/lib/docker/overlay2...).
  3. NET (Network):
    • The Illusion: "I have my own eth0 interface and IP (10.244.0.5)."
    • The Reality: You are using a virtual ethernet pair (veth) connected to a bridge on the host.
  4. UTS (Unix Timesharing): Allows the container to have its own Hostname.
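  5. IPC (Inter-Process Comms): Gives each container its own System V IPC objects and POSIX message queues, so containers cannot attach to each other's shared memory segments.
  6. USER: Maps UIDs and GIDs, so "Root" (UID 0) inside the container can correspond to an unprivileged user outside (for security).
  7. CGROUP: (Newer, Linux 4.6+) Gives the process an isolated view of the cgroup hierarchy, so it sees its own cgroup as the root.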
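
You can check these memberships from the host side: every process exposes its namespaces as symlinks under /proc/<pid>/ns. The sketch below assumes a Linux host with /proc mounted; the helper names ns_id and same_ns are hypothetical, chosen only for illustration. Reading the entries of another user's process generally requires root.

```bash
#!/usr/bin/env bash
# ns_id PID TYPE: print the namespace identifier of a process,
# in the form "pid:[<inode number>]". TYPE is an entry under
# /proc/PID/ns (pid, mnt, net, uts, ipc, user, cgroup).
ns_id() {
  readlink "/proc/$1/ns/$2"
}

# same_ns PID_A PID_B TYPE: succeed if both processes are in the
# same namespace of the given type.
same_ns() {
  [ "$(ns_id "$1" "$3")" = "$(ns_id "$2" "$3")" ]
}

# Example: list the namespaces of the current shell.
for t in pid mnt net uts ipc user cgroup; do
  printf '%-7s %s\n' "$t" "$(ns_id $$ "$t")"
done
```

Two processes in the same container report identical identifiers for each type; a process on the host and a process in a container typically differ on all of the types the runtime isolates.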

3. Cgroups: The Resource Police

If Namespaces are the walls, Cgroups are the guards watching the door. They enforce limits and accounting.

When you write this YAML in Kubernetes:

yaml
resources:
  limits:
    memory: "512Mi"
    cpu: "500m"

The Kubelet passes these values to the container runtime, which writes them into specific Linux cgroup files on the worker node.

How it works on disk

On the Linux host, you can find these controls in /sys/fs/cgroup/.
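
The file names used in this section (cpu.cfs_quota_us, memory.limit_in_bytes) are the cgroup v1 names. Hosts running cgroup v2 expose the same limits as cpu.max and memory.max. A minimal sketch for checking which version a node uses, assuming GNU stat is available; the function name cgroup_version is hypothetical:

```bash
#!/usr/bin/env bash
# cgroup_version [MOUNTPOINT]: print "v2", "v1", or "unknown" based on
# the filesystem type mounted at the cgroup root.
cgroup_version() {
  local root="${1:-/sys/fs/cgroup}"
  case "$(stat -fc %T "$root" 2>/dev/null)" in
    cgroup2fs) echo "v2" ;;       # unified hierarchy
    tmpfs)     echo "v1" ;;       # legacy (or hybrid) hierarchy
    *)         echo "unknown" ;;  # not mounted, or not a Linux host
  esac
}

cgroup_version
```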

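1. CPU Throttling (cpu.cfs_quota_us): When you set a CPU limit of 0.5 (500m), the container runtime writes 50000 to this file. Together with the default period of 100,000 microseconds (cpu.cfs_period_us), that tells the scheduler: "This process group gets 50,000 microseconds of runtime for every 100,000 microseconds of real time." On cgroup v2 hosts, the same two values live in a single file, cpu.max.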

  • If the app asks for more? The kernel pauses (throttles) the process until the next period begins.
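
The arithmetic behind that quota can be sketched as follows. This is an illustration of the conversion, not the runtime's actual code; the function name cpu_quota_us is hypothetical, and the 100,000 microsecond period is assumed as the default.

```bash
#!/usr/bin/env bash
# cpu_quota_us MILLICORES [PERIOD_US]: CFS quota in microseconds per period.
# 1000 millicores = one full CPU = one full period of runtime.
cpu_quota_us() {
  local millicores="$1" period_us="${2:-100000}"
  echo $(( millicores * period_us / 1000 ))
}

cpu_quota_us 500    # prints 50000
cpu_quota_us 2000   # prints 200000 (two CPUs' worth of runtime per period)
```

A quota larger than the period is valid: it means the group's threads may run on more than one CPU at the same time.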

2. Memory OOM (memory.limit_in_bytes): When you set a limit of 512Mi, the container runtime writes 536870912 (512 × 1024 × 1024) to this file. On cgroup v2 hosts, the file is memory.max.

  • If the app asks for more? The kernel first tries to reclaim memory within the cgroup. If that fails, it invokes the OOM Killer (Out of Memory Killer), which terminates a process in the group with SIGKILL.
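
The byte value comes from a simple unit conversion, sketched below. The function name mi_to_bytes is hypothetical. Note that the binary suffix Mi (mebibytes) and the decimal suffix M (megabytes) give different results, which is an easy mistake to make in a manifest.

```bash
#!/usr/bin/env bash
# mi_to_bytes N: convert N mebibytes ("Mi" in Kubernetes) to bytes.
mi_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

mi_to_bytes 512                   # prints 536870912
echo $(( 512 * 1000 * 1000 ))     # prints 512000000, what "512M" would mean
```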

The Engineering Takeaway

"Containers" technically do not exist. There is no kernel object called "Container". There are only normal Linux processes that have been tricked by Namespaces (so they can't see neighbors) and restricted by Cgroups (so they don't eat all the RAM).

4. Prove it to yourself (Hands-on CLI)

You don't need Docker or Kubernetes to see this. You just need Linux (or a VM).

A. Create a Namespace manually

The unshare command lets you create a process in a new namespace.

bash
# Become root, then start a new shell in a new PID namespace
sudo unshare --pid --fork --mount-proc /bin/bash

Inside this new shell, run ps aux. You will see you are PID 1. You have become the container.

B. Inspecting Namespaces (lsns)

You can list all namespaces on a host to find "hidden" containers.

bash
# Root is needed to see namespaces owned by other users' processes
sudo lsns -t net

This shows every Network Namespace. If you run a Docker container, a new entry appears here.

C. Break into a Container (nsenter)

This is the most critical troubleshooting command for CKA. If a container has no shell (distroless), how do you debug it?

  1. Find the PID of the container process on the host.
  2. Use nsenter to jump into its namespace using your host's tools.
bash
# 1. Get the PID
PID=$(docker inspect -f '{{.State.Pid}}' my-container)

# 2. Enter the NET and PID namespaces (requires root)
sudo nsenter --target "$PID" --net --pid

You are now "inside" the container's reality, but using your host's extensive binary toolkit (curl, tcpdump, strace).