Scenario: Multi-Attach Error
When a node fails, Kubernetes protects your data by refusing to detach its volumes. The symptom that surfaces on the replacement Pod is the Multi-Attach error.
The Scenario: Volume Detachment During Node Failure
When a node fails (crashes, loses network, or powers down unexpectedly), the kubelet on that node can no longer communicate with the Kubernetes API server. This creates a "stuck" state for volumes:
- Normal Behavior: When a Pod terminates, the kubelet unmounts the volume, reports this to the API server, and the control plane then detaches the volume.
- Failure Behavior: Because the node is unreachable, the kubelet cannot confirm that the volume has been unmounted. Consequently, the control plane (specifically the AttachDetachController) assumes the volume is still in use and potentially mounted on the failed node. To avoid data corruption, it will not detach the volume automatically.
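You can observe this stuck state from the outside: the failed node reports NotReady while its volumes remain attached. A quick check (node and object names in your cluster will differ):

```sh
# The failed node shows NotReady ...
kubectl get nodes

# ... yet its volumes still show ATTACHED=true for that node
kubectl get volumeattachments
```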
The "Multi-Attach error for volume"
This error typically occurs when a StatefulSet or Deployment tries to reschedule a Pod from the failed node (Node A) to a healthy node (Node B).
- The Cause: Most standard PersistentVolumes (like EBS, PD, or Cinder) use the ReadWriteOnce (RWO) access mode, meaning the volume can only be attached to one node at a time.
- The Conflict: The volume remains "Attached" to the failed Node A in the cloud provider's backend. When the AttachDetachController attempts to attach the same volume to the new Node B, the storage provider rejects the request because the volume is already attached elsewhere.
- The Error: This rejection results in a "Multi-Attach error" event on the Pod, blocking it from starting.
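In practice, the rejection shows up as a FailedAttachVolume event on the pending Pod. You can inspect it with kubectl (the Pod and volume names below are placeholders, and the exact wording varies by Kubernetes version):

```sh
# Look for the Multi-Attach event on the Pod that cannot start
kubectl describe pod my-app-0

# Typical event:
#   Warning  FailedAttachVolume  attachdetach-controller
#   Multi-Attach error for volume "pvc-1234abcd" Volume is already
#   exclusively attached to one node and can't be attached to another
```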
Role of the AttachDetachController
The AttachDetachController is a control plane loop running inside the kube-controller-manager. It is responsible for ensuring the actual attachment state of volumes matches the desired state defined by Pod schedules.
- During Failure: The controller observes that the volume is still attached to the failed node. Without a signal that the node is safely out of service, it keeps waiting for the node to recover or for the volume to be safely unmounted, and holds off the detach operation.
- Force Detach on Timeout: There is an escape hatch called "force storage detach on timeout," enabled by default (it can be turned off via the kube-controller-manager flag --disable-force-detach-on-timeout). With it enabled, Kubernetes force-detaches a volume if a Pod deletion has been stuck for 6 minutes and the node is unhealthy. However, this is dangerous: it can lead to data corruption if the workload on the "failed" node is actually still writing to the disk.
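A minimal sketch of turning that escape hatch off (assumes kube-controller-manager v1.30 or newer, where the flag exists; how you pass flags depends on how your control plane is deployed):

```sh
# Never force-detach based on the 6-minute timeout alone; wait for a clean
# unmount or an explicit out-of-service signal instead.
kube-controller-manager --disable-force-detach-on-timeout=true
```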
Role of VolumeAttachment Objects
VolumeAttachment is a non-namespaced API object that captures the intent to attach a specific volume to a specific node.
- Tracking State: It serves as the source of truth for the AttachDetachController and CSI drivers. It tracks whether a volume is attached: true or false, and records any attachError or detachError.
- CSI Interaction: For CSI drivers, the external-attacher sidecar watches VolumeAttachment objects. When it sees a new object, it calls the CSI driver to perform the physical attachment. It updates the status of the VolumeAttachment object to reflect success or failure.
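To see these objects in a live cluster (the object name below is a placeholder; real names are generated by the attacher):

```sh
# Cluster-scoped list: which PV is attached (or should be attached) to which node
kubectl get volumeattachments

# Full object, including status.attached and any attachError / detachError
kubectl get volumeattachment csi-0a1b2c3d -o yaml
```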
Resolution: Non-Graceful Node Shutdown
To resolve the "Multi-Attach error" and safely detach the volume without waiting for the timeout, you must tell Kubernetes that the node is permanently gone.
- The "Out-of-Service" Taint: You can manually add the taint
node.kubernetes.io/out-of-service:NoExecute(orNoSchedule) to the failed node. - The Effect: This taint signals the control plane that the node is irrecoverable. The control plane will then:
- Force delete the Pods on that node (ignoring the missing kubelet confirmation).
- Immediately trigger the detach operation for the volumes, freeing them up to be attached to the new node.
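A sketch using kubectl (node-a is a placeholder; the taint value nodeshutdown follows the example in the upstream Kubernetes docs):

```sh
# Only after verifying that node-a is truly powered off or fenced:
kubectl taint nodes node-a node.kubernetes.io/out-of-service=nodeshutdown:NoExecute

# Once Pods have moved and the node is repaired or replaced, remove the taint:
kubectl taint nodes node-a node.kubernetes.io/out-of-service=nodeshutdown:NoExecute-
```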
WARNING
You must verify the node is actually shut down before adding this taint to avoid data corruption caused by two nodes writing to the same volume simultaneously.