What are the security primitives used to harden containerized workloads?

Kubernetes security relies on Defense in Depth. While namespaces and cgroups provide the basic structure of a container, the primitives below—Capabilities, Seccomp, AppArmor/SELinux, and User Namespaces—secure the boundary between the container and the host kernel.


1. Linux Capabilities

Concept: Breaking "Root" into Pieces. Traditionally, the Unix root user (UID 0) has unlimited power. Linux Capabilities break this power into distinct units. This allows you to grant a process the specific privileges it needs (like binding to a low-numbered network port) without giving it full superuser access to the entire system.

  • What they are: A granular set of permissions (e.g., CAP_CHOWN, CAP_KILL). In a Pod spec the CAP_ prefix is omitted, so CAP_CHOWN is written as CHOWN.
  • Common Capabilities:
    • NET_ADMIN: Allows interface configuration, firewall rule manipulation, and routing table modification.
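    • SYS_ADMIN: The "catch-all" capability. It is very powerful and covers a wide range of operations, such as mounting filesystems, setting the hostname, and configuring swap. (ptrace is governed by the separate SYS_PTRACE capability.) Avoid granting this if possible.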
    • NET_BIND_SERVICE: Allows binding to ports below 1024 (like port 80).
  • Dropping Capabilities: A best practice is to drop all capabilities and only add back the ones strictly necessary. This minimizes the "blast radius" if an attacker compromises the container.

Example: Dropping ALL and adding specific networking permission

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo
spec:
  containers:
  - name: sec-ctx-demo
    image: nginx
    securityContext:
      capabilities:
        drop:
        - ALL          # Best Practice: Start with zero privileges
        add:
        - NET_ADMIN    # Only grant specific network admin rights
```

2. Seccomp (Secure Computing Mode)

Concept: A Firewall for System Calls. Capabilities decide which privileged operations a process may perform; Seccomp decides which system calls (syscalls) it may issue to the Linux kernel at all. A tool such as strace shows which syscalls a program actually makes.

  • System Call Filtering: Every time a program opens a file, accepts a connection, or exits, it makes a syscall. Seccomp restricts which of these calls are allowed. If a containerized application tries to use a restricted syscall (like reboot), the kernel rejects it: depending on the action configured in the profile, the call fails with an error (typically EPERM) or the process is killed.
  • Profiles:
    • Unconfined: No filtering (Dangerous).
    • RuntimeDefault: A sane default profile provided by the container runtime. This blocks highly dangerous calls and is the recommended baseline.
    • Localhost: A custom profile defined in a JSON file on the node.

Example: Enforcing the RuntimeDefault profile

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: default-seccomp
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault  # Uses the container runtime's safe default
  containers:
  - name: test-container
    image: nginx
```
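
The Localhost profile type follows the same pattern. The sketch below assumes that a custom profile has already been placed on the node at profiles/audit.json, relative to the kubelet's seccomp directory (/var/lib/kubelet/seccomp by default). The Pod name and the profile file name are illustrative and not taken from this article.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: localhost-seccomp          # hypothetical name
spec:
  securityContext:
    seccompProfile:
      type: Localhost              # Use a custom JSON profile stored on the node
      # Assumed path, resolved relative to the kubelet's seccomp directory
      localhostProfile: profiles/audit.json
  containers:
  - name: test-container
    image: nginx
```

If the referenced file is missing on the node where the Pod is scheduled, the container is expected to fail to start, so the profile has to be distributed to every eligible node first.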

3. AppArmor / SELinux (Mandatory Access Control)

Concept: Policy-Driven Access Control. AppArmor and SELinux are Linux Security Modules (LSMs) built into the kernel that act as a second layer of enforcement. Even if a user has "permission" (via standard Linux file permissions) to touch a file, MAC (Mandatory Access Control) can block the access based on a security profile.

  • AppArmor (Application Armor):
    • Uses Path-based profiles. It restricts what files an executable can read, write, or execute.
    • K8s Usage: You load profiles onto the node (e.g., k8s-nginx) and reference them in the Pod spec through the appArmorProfile field of the securityContext (older clusters use a Pod annotation instead).
    • Modes: Profiles can be in Enforce (block actions) or Complain (log actions) mode.
  • SELinux (Security-Enhanced Linux):
    • Uses Label-based controls. Every file and process has a label (e.g., container_t). Policies dictate how labels interact.
    • K8s Usage: The seLinuxOptions field sets the SELinux label of the container's processes. Volumes are then given a matching SELinux context, either by relabeling or through mount options, so that the container can access them.
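
Both mechanisms are selected through the securityContext. The two Pods below are a minimal sketch, not a complete hardening recipe. The first assumes an AppArmor-enabled node on which the k8s-nginx profile mentioned above is already loaded, and a cluster recent enough to support the appArmorProfile field (v1.30 or later). The second assumes an SELinux-enabled node; the MCS level shown is an arbitrary example value. Pod names are hypothetical.

```yaml
# Sketch 1: AppArmor (assumes the k8s-nginx profile is loaded on the node)
apiVersion: v1
kind: Pod
metadata:
  name: apparmor-demo              # hypothetical name
spec:
  containers:
  - name: nginx
    image: nginx
    securityContext:
      appArmorProfile:
        type: Localhost            # Use a profile already loaded on the node
        localhostProfile: k8s-nginx
---
# Sketch 2: SELinux (assumes an SELinux-enabled node)
apiVersion: v1
kind: Pod
metadata:
  name: selinux-demo               # hypothetical name
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:c123,c456"        # Example MCS level; choose values that fit your policy
  containers:
  - name: nginx
    image: nginx
```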

4. User Namespaces and UID Mapping

Concept: Identity Isolation (The Ultimate Illusion). User Namespaces decouple the user ID (UID) inside the container from the UID on the host.

  • The Mechanism:
    • Inside the container: The process looks like root (UID 0). It can install packages and modify files owned by root inside the container image.
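    • Outside (On the Host): The kernel maps UID 0 inside the container to an unprivileged UID on the host. Kubernetes picks these host UIDs from a range above 65535 so that they do not overlap with regular host users or with other Pods.
  • Security Implications: This substantially reduces risk. If an attacker manages to "break out" of the container, they land on the host node as an unprivileged user that holds no capabilities there and owns no host files, so they cannot modify system files or signal other users' processes.
  • Configuration: Enabled by setting hostUsers: false in the Pod spec. The feature first appeared as alpha in Kubernetes v1.25; depending on the cluster version it may require the UserNamespacesSupport feature gate, and it needs a container runtime and kernel that support user namespaces.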
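
A minimal sketch of a Pod that opts into its own user namespace, assuming the cluster, container runtime, and node kernel all support the feature. The Pod name, image, and command are illustrative choices.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: userns-demo                # hypothetical name
spec:
  hostUsers: false                 # Run the Pod in its own user namespace
  containers:
  - name: shell
    image: debian
    command: ["sleep", "infinity"] # Keep the container running for inspection
```

To check the mapping, running kubectl exec userns-demo -- cat /proc/self/uid_map should print the container UID range next to the host UID range it is mapped to.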

Summary Comparison

| Primitive | Controls | Best Practice |
| --- | --- | --- |
| Capabilities | Privilege Power. (Can I modify network settings?) | Drop ALL, add only needed. |
| Seccomp | Kernel Actions. (Can I call reboot or swapon?) | Use RuntimeDefault. |
| AppArmor/SELinux | Resource Access. (Can I write to /etc?) | Use standard profiles; avoid Unconfined. |
| User Namespaces | User Identity. (Am I really root?) | Enable hostUsers: false where supported. |
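
The best practices in the table can be combined in a single Pod spec. The following is a sketch under several assumptions: the image name is hypothetical and stands for an application that needs no extra capabilities, the nodes are assumed to run AppArmor (remove that stanza on SELinux hosts), and the cluster is assumed to support user namespaces.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-demo                     # hypothetical name
spec:
  hostUsers: false                        # User Namespaces: container root is not host root
  securityContext:
    seccompProfile:
      type: RuntimeDefault                # Seccomp: the runtime's default syscall filter
    appArmorProfile:
      type: RuntimeDefault                # AppArmor: assumes AppArmor-enabled nodes
  containers:
  - name: app
    image: registry.example.com/app:1.0   # hypothetical image
    securityContext:
      capabilities:
        drop:
        - ALL                             # Capabilities: start from zero, add back only if needed
```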

Based on Kubernetes v1.35 (Timbernetes).