Attach/Detach Mechanics

How does the attach/detach controller handle Pod migration, and what causes 'Multi-Attach' errors?

The attach/detach controller is a core control loop running within the kube-controller-manager that is responsible for safely orchestrating the attachment and detachment of storage volumes to and from cluster nodes.

When working with modern out-of-tree Container Storage Interface (CSI) plugins, the attach/detach controller does not communicate with the storage vendor's API directly. Instead, it relies on a highly decoupled architecture utilizing the VolumeAttachment API object and the CSI external-attacher sidecar.

Internal Mechanics of the Attach/Detach Workflow

  1. Checking Requirements: When a Pod is scheduled to a node, the control plane consults the CSIDriver object's attachRequired field to determine whether the specific storage plugin requires an explicit attach operation.
  2. Declaring Intent: If required, the attach/detach controller creates a VolumeAttachment object in the Kubernetes API. This object captures the strict, declarative intent to attach a specific PersistentVolume to a specific Node.
  3. Execution via Sidecar: The external-attacher sidecar (which runs alongside the vendor's CSI driver) constantly watches for new VolumeAttachment objects. It reads the intent, makes the imperative gRPC calls to the vendor's CSI driver to perform the physical attachment, and then updates the VolumeAttachment status to indicate the operation is complete.
  4. Kubelet Mounting: The kubelet on the target node continuously monitors the VolumeAttachment status; it waits until the status is marked as attached before it proceeds to mount the volume into the Pod's filesystem.
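The declarative intent from step 2 takes the form of a VolumeAttachment manifest. A sketch of one is shown below; the object name, node name, and PV name are illustrative, but the spec and status fields are those defined by the storage.k8s.io/v1 API:

```yaml
# Illustrative VolumeAttachment; node and PV names are hypothetical.
apiVersion: storage.k8s.io/v1
kind: VolumeAttachment
metadata:
  name: csi-1234567890abcdef
spec:
  attacher: ebs.csi.aws.com        # CSI driver responsible for the attach
  nodeName: node-a                 # node the volume should be attached to
  source:
    persistentVolumeName: pv-data  # PV to attach
status:
  attached: true                   # set by the external-attacher on success
```

The kubelet in step 4 is effectively polling for status.attached to become true before it mounts the volume.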

The Migration Workflow: Moving from Node A to Node B

When a stateful Pod is rescheduled from Node A to Node B (for example, due to a deployment update or scaling event), Kubernetes must safely transfer the storage.

Before the volume can be attached to Node B, the attach/detach controller must ensure it is fully unmounted and detached from Node A. The kubelet on Node A must successfully stop the container, unmount the filesystem, and report that success back to the API server. Only then does the attach/detach controller delete the VolumeAttachment for Node A, signaling the external-attacher to physically detach the disk. Once the detach succeeds, the controller creates a new VolumeAttachment for Node B.
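The ordering the controller enforces can be sketched as a toy state machine. This is illustrative Python, not real controller code; the class and method names are invented for the example:

```python
# Minimal sketch (not real Kubernetes code) of the ordered hand-off the
# attach/detach controller enforces when a Pod moves from node A to node B.

class AttachDetachController:
    def __init__(self):
        self.attachments = {}   # volume -> node it is currently attached to
        self.unmounted = set()  # (volume, node) pairs the kubelet reported unmounted

    def kubelet_reports_unmounted(self, volume, node):
        # The source node's kubelet stops the container, unmounts the
        # filesystem, and reports success back to the API server.
        self.unmounted.add((volume, node))

    def migrate(self, volume, src, dst):
        # Detach from the source node only after its kubelet confirmed
        # the unmount; otherwise the data may still be in use.
        if (volume, src) not in self.unmounted:
            raise RuntimeError(f"{volume} may still be in use on {src}; refusing detach")
        del self.attachments[volume]    # delete VolumeAttachment for src
        self.attachments[volume] = dst  # create VolumeAttachment for dst

ctrl = AttachDetachController()
ctrl.attachments["pv-data"] = "node-a"
ctrl.kubelet_reports_unmounted("pv-data", "node-a")
ctrl.migrate("pv-data", "node-a", "node-b")
print(ctrl.attachments["pv-data"])  # node-b
```

If migrate is called before the kubelet reports the unmount, the sketch refuses to proceed, which mirrors the safety guarantee described above.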

The 'Multi-Attach' Error on AWS EBS and GCE PD

A "Multi-Attach" error is a common operational hurdle when running stateful workloads on cloud provider block storage, such as Amazon Elastic Block Store (EBS) or Google Compute Engine Persistent Disks (PD).

By architectural design, AWS EBS and GCE PD are block storage devices that support only the ReadWriteOnce access mode for writable volumes: the volume can be mounted as read-write by a single node at any given time.
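This constraint is declared on the volume itself. An EBS- or PD-backed PersistentVolume is typically provisioned with a spec fragment like the following (illustrative fragment, not a complete manifest):

```yaml
# Fragment of a PersistentVolume backed by single-writer block storage.
spec:
  accessModes:
  - ReadWriteOnce   # mountable read-write by one node at a time
```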

The "Multi-Attach" error almost exclusively occurs during a Node failure or a network partition. If Node A becomes unresponsive, the following sequence unfolds:

  1. Loss of Communication: The API server loses contact with the kubelet on Node A.
  2. Safety Lock: Because the kubelet cannot report that it has safely unmounted the volume and terminated the Pod, the control plane cannot guarantee that the application isn't still actively writing to the disk.
  3. Rejection: To prevent catastrophic data corruption (such as two separate Linux kernels attempting to manage the same ext4 filesystem simultaneously), the Kubernetes control plane refuses to issue a clean detach command.
  4. The Error: The scheduler places the replacement Pod on Node B, but the cloud provider's backend rejects the CSI driver's attach request because the disk is physically still locked to Node A. Kubernetes surfaces this rejection as a Multi-Attach error.
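The failure sequence above can be condensed into a small sketch. This is illustrative Python, not real Kubernetes or cloud-provider code; the function and exception names are invented for the example:

```python
# Sketch (illustrative, not real Kubernetes code) of why a Multi-Attach
# error surfaces after a node failure: no kubelet confirmation means no
# clean detach, so the backend rejects a second attach of an RWO disk.

class MultiAttachError(Exception):
    pass

def try_attach(volume_to_node, volume, new_node, kubelet_reachable):
    current = volume_to_node.get(volume)
    if current is None or current == new_node:
        volume_to_node[volume] = new_node
        return
    if not kubelet_reachable.get(current, True):
        # Safety lock: the control plane cannot prove the old node has
        # stopped writing, so it refuses a clean detach and the backend
        # rejects the new attach request.
        raise MultiAttachError(f"volume {volume} is still attached to {current}")
    # Healthy path: clean detach from the old node, then attach.
    volume_to_node[volume] = new_node

attachments = {"pv-data": "node-a"}
try:
    # Node A is unreachable, so attaching to node-b is rejected.
    try_attach(attachments, "pv-data", "node-b", {"node-a": False})
except MultiAttachError as e:
    print(e)
```

When the old node's kubelet is reachable, the same call succeeds: the controller detaches cleanly and the volume moves to the new node.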

Operational Resolution

Historically, this required manual intervention to delete the Node object or force-detach the disk via the cloud console. Modern Kubernetes provides built-in mechanisms to handle this:

  • Forced Detach on Timeout: If a Pod deletion has not succeeded for 6 minutes and the node is unhealthy, Kubernetes will automatically attempt to force-detach the volume. However, this carries a strict architectural warning: if the node is simply network-partitioned and the workload is actually still running, force-detaching the volume violates the CSI specification and can result in severe data corruption.
  • The Out-Of-Service Taint: For a much safer and faster recovery during a non-graceful node shutdown (e.g., the VM died and will not return), an administrator can manually apply the node.kubernetes.io/out-of-service:NoExecute taint to Node A. This explicitly tells the control plane that the node is permanently out of service. Kubernetes will immediately force-delete the Pods on that node and perform the volume detach operations without waiting for the 6-minute timeout, allowing the storage to quickly attach to Node B.
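Applied to the Node object, the out-of-service taint looks like the following spec fragment (node name omitted; the value "nodeshutdown" is the conventional one from the Kubernetes documentation):

```yaml
# Fragment of the Node spec after applying the out-of-service taint.
spec:
  taints:
  - key: node.kubernetes.io/out-of-service
    value: nodeshutdown
    effect: NoExecute
```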

Based on Kubernetes v1.35 (Timbernetes).