What is kube-controller-manager?

### Definition: kube-controller-manager

The kube-controller-manager is a critical component of the Kubernetes Control Plane that acts as the primary automation engine for the cluster. While it runs as a single binary process to reduce architectural complexity, it logically encapsulates a suite of independent control loops (controllers).

From a platform engineering perspective, the kube-controller-manager is responsible for reconciling the current state of the cluster with the desired state defined in the API. It does not execute workloads itself; rather, it watches the shared state via the API server and makes changes (such as creating Pods or calling cloud APIs) to ensure reality matches your "Record of Intent".
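
To see this reconciliation loop in action, declare a new desired state and watch the relevant controllers converge on it. A minimal sketch, assuming a Deployment named web whose Pods carry the label app=web (both names are illustrative):

bash
# Declare a new desired state: five replicas instead of the current count
kubectl scale deployment web --replicas=5
# Watch the Deployment/ReplicaSet controllers create Pods until the
# observed state matches the declared intent
kubectl get pods -l app=web --watch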

Key Responsibilities:

  • Node Controller: Monitors node health, managing timeouts and evictions when nodes go offline.
  • Replication Controller: Ensures the correct number of Pods is running for each ReplicationController object.
  • ServiceAccount Controller: Creates the default ServiceAccount (and, in older releases, API access tokens) for new namespaces; see the example after this list.
  • Job Controller: Watches Job objects and creates Pods to run tasks to completion.
  • Cloud Interactions: If not using an external cloud-controller-manager, it handles interactions with underlying cloud providers (e.g., configuring routes or load balancers).
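
The ServiceAccount controller is easy to observe directly: the moment a namespace exists, the controller reconciles a default ServiceAccount into it without any user action (the namespace name below is illustrative):

bash
# Create a fresh namespace
kubectl create namespace sa-demo
# The "default" ServiceAccount appears automatically, created by the controller
kubectl get serviceaccounts -n sa-demo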

Operational Workflows and Useful Commands

Managing the kube-controller-manager often involves debugging cluster state (why isn't X happening?), checking high-availability status, or modifying low-level cluster behaviors via flags.

1. Checking Status and Health

In most production clusters (like those built with kubeadm), the controller manager runs as a static Pod on the control plane nodes.

Command: Verify the Pod status

bash
kubectl get pods -n kube-system -l component=kube-controller-manager

Why this is useful: It confirms that the binary is running and passing its liveness probes.

Command: Check component health status (if exposed)

bash
kubectl get --raw /healthz

Note: This call checks the API server's health endpoint, not the controller manager's. The controller manager serves its own /healthz on its secure port (10257 by default), which you can query directly from a control plane node, as shown below.
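
A minimal sketch, assuming shell access to a control plane node (depending on how authentication is configured, anonymous access to /healthz may or may not be permitted):

bash
# Run on a control plane node; -k skips verification of the self-signed cert
curl -k https://localhost:10257/healthz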

2. Investigating Logs

Because the controller manager is responsible for "fixing" the cluster, its logs are the first place to look when workloads are stuck (e.g., PersistentVolumes not binding, Pods not being created by a Deployment).

Command: Tail logs for the active leader

bash
# List the pods first to get the specific name
kubectl logs -n kube-system kube-controller-manager-<node-name> --follow

Why this is useful: This reveals errors such as "failed to create pod," "quota exceeded," or cloud provider API failures.
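
If you'd rather not look up the exact Pod name, a label selector works too (on kubeadm clusters the static Pod carries the component=kube-controller-manager label):

bash
# Tail the controller manager's logs without knowing the Pod's exact name
kubectl logs -n kube-system -l component=kube-controller-manager --tail=50 --follow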

Command: Check logs on the node (if the container runtime fails). If the static Pod isn't starting, check the file system logs on the control plane node directly:

bash
# Run on the control plane node
cat /var/log/kube-controller-manager.log
# OR
journalctl -u kubelet # Kubelet is responsible for starting the static pod

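If the API server itself is unreachable, crictl on the node can read the container's logs straight from the runtime. A sketch, assuming a CRI runtime such as containerd (the container ID will differ):

bash
# Run on the control plane node
crictl ps -a --name kube-controller-manager   # find the container ID
crictl logs <container-id>                    # read its logs directly from the runtime
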
3. Managing Leader Election (High Availability)

In a highly available cluster, multiple controller manager instances run, but only one is the active leader to prevent conflicting actions.

Command: Identify the current leader. The leader-election logic uses a Lease object in the kube-system namespace to coordinate.

bash
kubectl get lease -n kube-system kube-controller-manager

Why this is useful: Use this to determine which specific node is currently orchestrating the cluster. If you are debugging, you must look at the logs of the instance that holds the lease.
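
To extract just the holder, query the Lease's holderIdentity field directly:

bash
# Prints an identity like "<node-name>_<uuid>" for the active leader
kubectl get lease -n kube-system kube-controller-manager \
  -o jsonpath='{.spec.holderIdentity}'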

4. Configuration and Flags

The behavior of the controller manager is heavily driven by startup flags. In a kubeadm environment, these are often managed via the ClusterConfiguration.

Example: Modifying flags via kubeadm. To change settings (e.g., tuning how node failures are detected or enabling specific feature gates), edit the cluster configuration. Note that in the v1beta4 API, extraArgs is a list of name/value pairs rather than a map.

yaml
# Example excerpt from a kubeadm config
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
controllerManager:
  extraArgs:
    # Example: change how long a node may be unresponsive before it is
    # marked unhealthy (the older --pod-eviction-timeout flag has been
    # superseded by taint-based evictions)
    - name: "node-monitor-grace-period"
      value: "40s"
    # Example: enable/disable specific controllers
    - name: "controllers"
      value: "*,bootstrapsigner,tokencleaner"

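Because the controller manager runs as a static Pod, you can also change flags by editing its manifest on the control plane node; the kubelet watches the manifest directory and restarts the Pod automatically when the file changes:

bash
# Run on the control plane node; kubelet watches this directory for changes
sudo vi /etc/kubernetes/manifests/kube-controller-manager.yaml
# Verify the Pod came back with the new flags
kubectl get pods -n kube-system -l component=kube-controller-manager
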
Command: View current flags (on a live node). If you cannot find the config file, inspect the definition of the running Pod.

bash
kubectl get pod -n kube-system kube-controller-manager-<node-name> -o yaml

Look under .spec.containers[0].command to see the flags passed to the binary.
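
A quicker way to list the flags, roughly one per line:

bash
# Print the container's command array, splitting on commas for readability
kubectl get pod -n kube-system kube-controller-manager-<node-name> \
  -o jsonpath='{.spec.containers[0].command}' | tr ',' '\n'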

5. Debugging Performance (Metrics)

The controller manager exposes metrics that are vital for monitoring the health of the control plane (e.g., how long work queues are taking).

Command: Accessing raw metrics

bash
# Note: this returns the API server's metrics, not the controller manager's.
# The controller manager serves its own /metrics on its secure port
# (10257 by default); see the example below for querying it directly.
kubectl get --raw /metrics

Why this is useful: You can look for metrics like workqueue_depth to see if the controller manager is overwhelmed and falling behind on processing changes.
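
A minimal sketch for querying the controller manager's own endpoint, assuming shell access to a control plane node and an identity authorized to GET the /metrics non-resource URL (the metrics-reader ServiceAccount here is illustrative, not a built-in):

bash
# Mint a short-lived token for an authorized identity (illustrative name)
TOKEN=$(kubectl create token metrics-reader -n kube-system)
# Query the controller manager's secure port directly on the node
curl -sk -H "Authorization: Bearer ${TOKEN}" \
  https://localhost:10257/metrics | grep workqueue_depth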

Summary of Critical Flags

When configuring the kube-controller-manager, these are some of the most common flags engineers interact with:

| Flag | Purpose | Example Context |
| --- | --- | --- |
| --cluster-cidr | Specifies the CIDR range for Pods in the cluster. | Network setup. |
| --allocate-node-cidrs | Tells the manager to allocate Pod CIDRs for new nodes automatically. | Setting up CNI networking. |
| --controllers | Enables/disables specific control loops. | Disabling the cloud-node-lifecycle controller when migrating to an external Cloud Controller Manager. |
| --cluster-signing-cert-file | Location of the CA used to sign certificates (CSRs). | Managing Certificate Signing Requests. |
| --use-service-account-credentials | Runs each controller with an individual ServiceAccount token rather than one super-admin credential. | Security best practice. |
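
For instance, when --allocate-node-cidrs is enabled you can confirm the per-node Pod CIDRs the controller manager has assigned:

bash
# Print each node and the Pod CIDR allocated to it by the controller manager
kubectl get nodes -o custom-columns='NODE:.metadata.name,POD_CIDR:.spec.podCIDR'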