Kubernetes Volume Troubleshooting: Essential Commands & Quick Fixes
When a volume issue occurs, you need fast diagnosis. This guide provides the essential kubectl commands and troubleshooting workflow to identify and fix volume problems quickly.
Fast Diagnostic Workflow
Step 1: Check the PVC Status
```bash
# List all PVCs in current namespace
kubectl get pvc

# Detailed view with events
kubectl describe pvc <pvc-name>
```
Status meanings:
- `Pending` → Waiting for binding or provisioning
- `Bound` → Successfully bound to a PV
- `Lost` → Associated PV deleted but PVC still exists

Look for these event messages:
- `waiting for first consumer` → WaitForFirstConsumer mode (normal)
- `failed to provision volume` → Provisioner error
- `no persistent volumes available` → No matching PV exists
- `exceeded quota` → Resource limit hit
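Rather than re-running `kubectl get pvc` by hand, the check above can be wrapped in a small poll loop. This is only a sketch: `wait_for_pvc` is a hypothetical helper name, and it assumes `kubectl` is on the PATH and pointed at the right cluster.

```bash
#!/usr/bin/env sh
# Poll a PVC until its phase is Bound, or give up after a timeout (sketch).
wait_for_pvc() {
  pvc="$1"
  timeout="${2:-120}"   # seconds; default 2 minutes
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    phase=$(kubectl get pvc "$pvc" -o jsonpath='{.status.phase}' 2>/dev/null)
    if [ "$phase" = "Bound" ]; then
      echo "PVC $pvc is Bound"
      return 0
    fi
    sleep 5
    elapsed=$((elapsed + 5))
  done
  echo "PVC $pvc is still '$phase' after ${timeout}s" >&2
  return 1
}
```

Useful in CI or deployment scripts where a workload should not start until its claim is provisioned.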
Step 2: Check the Pod Status
```bash
# Check if pod is waiting for volume
kubectl get pod <pod-name>

# Detailed events
kubectl describe pod <pod-name> | grep -A 10 Events
```
Volume-related pod states:
- `ContainerCreating` → Waiting for volume mount
- `Pending` → Can't schedule (often due to volume topology)

Critical events to find:
- `FailedAttachVolume` → Volume can't attach to node
- `FailedMount` → Volume attached but mount failed
- `Multi-Attach error` → Volume already attached elsewhere
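These events can also be spotted cluster-wide in one pass. A sketch that filters event output on stdin so it composes with any `kubectl get events` invocation (the helper name is hypothetical):

```bash
#!/usr/bin/env sh
# Keep only volume-related warning events from kubectl event output (sketch).
filter_volume_events() {
  grep -E 'FailedMount|FailedAttachVolume|Multi-Attach' || true
}

# Typical usage (assumes kubectl is configured):
#   kubectl get events -A --field-selector type=Warning | filter_volume_events
```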
Step 3: Check VolumeAttachment Objects
```bash
# List all volume attachments
kubectl get volumeattachment

# Detailed view
kubectl describe volumeattachment <va-name>
```
Key fields:
- `.status.attached: true` → Successfully attached
- `.status.attached: false` → Attachment failed or in progress
- `.status.attachError` → Error from CSI driver or cloud provider
Step 4: Check the PersistentVolume
```bash
# List all PVs (cluster-wide)
kubectl get pv

# Detailed view
kubectl describe pv <pv-name>
```
PV status meanings:
- `Available` → Ready to bind to a PVC
- `Bound` → Bound to a PVC
- `Released` → PVC deleted, needs reclaim
- `Failed` → Reclamation failed
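A PV left in `Released` still carries a `claimRef` pointing at the deleted PVC; clearing that reference returns the PV to `Available` so a new claim can bind. Only do this if the data on the volume may legitimately be reused. A sketch (`release_pv` is a hypothetical helper, not a kubectl subcommand):

```bash
#!/usr/bin/env sh
# Make a Released PV bindable again by clearing its stale claimRef (sketch).
# WARNING: the new PVC will see whatever data the old claim left behind.
release_pv() {
  pv="$1"
  kubectl patch pv "$pv" -p '{"spec":{"claimRef":null}}'
}
```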
Common Issues & Quick Fixes
Issue: PVC Stuck in Pending (WaitForFirstConsumer)
Cause: StorageClass uses volumeBindingMode: WaitForFirstConsumer
Check:
```bash
kubectl get storageclass <sc-name> -o yaml | grep volumeBindingMode
```
Fix: Create a Pod that uses the PVC. The binding happens when the Pod is scheduled.
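For example, a minimal Pod that mounts the claim is enough to trigger binding. This is a sketch; the Pod name and image are placeholders, and `<pvc-name>` is the Pending claim:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pvc-consumer   # placeholder name
spec:
  containers:
  - name: app
    image: busybox:1.36
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: <pvc-name>   # the Pending PVC
```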
Issue: PVC Stuck in Pending (No StorageClass)
Check:
```bash
# List available storage classes
kubectl get storageclass

# Check what the PVC is requesting
kubectl get pvc <pvc-name> -o yaml | grep storageClassName
```
Fix:
```bash
# spec.storageClassName is immutable on an existing PVC, so recreate the claim
kubectl delete pvc <pvc-name>

# Re-apply the PVC manifest with storageClassName set to a valid class
kubectl apply -f <pvc-manifest>.yaml
```
Issue: Multi-Attach Error After Node Failure
Check if node is truly dead:
```bash
kubectl get nodes
kubectl describe node <failed-node-name>
```
Fix 1: Add out-of-service taint (K8s 1.26+):
```bash
kubectl taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
```
This signals the control plane that the node is permanently gone, triggering immediate volume detachment.
Fix 2: Wait for automatic detachment (6-10 minutes): The AttachDetachController will eventually force-detach after timeout.
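Progress of either fix can be watched with a small poll loop. A sketch (`wait_for_detach` is a hypothetical helper; assumes `kubectl` access):

```bash
#!/usr/bin/env sh
# Count VolumeAttachments still pointing at a node (sketch).
attachments_on_node() {
  node="$1"
  kubectl get volumeattachment -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' \
    | grep -c -x "$node" || true
}

# Loop until no VolumeAttachments reference the node, or time out.
wait_for_detach() {
  node="$1"
  timeout="${2:-600}"   # seconds; force-detach can take 6-10 minutes
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    n=$(attachments_on_node "$node")
    if [ "$n" -eq 0 ]; then
      echo "no volumes attached to $node"
      return 0
    fi
    sleep 15
    elapsed=$((elapsed + 15))
  done
  echo "$n volume(s) still attached to $node after ${timeout}s" >&2
  return 1
}
```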
Fix 3: Manual cleanup (last resort):
```bash
# Delete the stuck VolumeAttachment
kubectl delete volumeattachment <va-name>

# If the volume is still stuck, manually detach it in the cloud console:
# AWS: EC2 → Volumes → Force Detach
# GCP: Compute Engine → Disks → Detach
# Azure: Disks → Detach
```
WARNING
Only force-detach if you're certain the old node is completely powered off. Otherwise, you risk filesystem corruption.
Issue: CSI Driver Not Responding
Check CSI driver pods:
```bash
# List CSI driver pods
kubectl get pods -n kube-system | grep csi

# Common CSI driver names:
# ebs-csi-controller, gce-pd-csi-driver, azuredisk-csi-driver
```
Check CSI controller logs:
```bash
# Provisioner logs (handles volume creation)
kubectl logs -n kube-system <csi-controller-pod> -c csi-provisioner

# Attacher logs (handles volume attachment)
kubectl logs -n kube-system <csi-controller-pod> -c csi-attacher
```
Check CSI node driver logs:
```bash
# Find the CSI node pod on the target node
kubectl get pods -n kube-system -o wide | grep csi-node

# Check logs (handles volume mounting)
kubectl logs -n kube-system <csi-node-pod> -c csi-driver
```
Common CSI errors:
- `rpc error: code = Unavailable` → Driver not running or unreachable
- `Access Denied` → IAM/RBAC permissions missing
- `Quota Exceeded` → Cloud provider limits hit
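These signatures can be scanned for mechanically. A sketch that reads log text on stdin, so it composes with any of the `kubectl logs` commands above (the helper name and labels are my own, not CSI terminology):

```bash
#!/usr/bin/env sh
# Tag lines in CSI driver logs that match known failure signatures (sketch).
triage_csi_logs() {
  while IFS= read -r line; do
    case "$line" in
      *"code = Unavailable"*) echo "driver unreachable: $line" ;;
      *"Access Denied"*)      echo "permissions issue: $line" ;;
      *"Quota Exceeded"*)     echo "cloud quota hit: $line" ;;
    esac
  done
}

# Usage: kubectl logs -n kube-system <csi-controller-pod> -c csi-provisioner | triage_csi_logs
```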
Issue: Volume Quota Exceeded
Check namespace quota:
```bash
kubectl get resourcequota -n <namespace>
kubectl describe resourcequota -n <namespace>
```
Fix:
```bash
# Delete unused PVCs
kubectl delete pvc <unused-pvc-name>

# Or request a quota increase from the cluster admin
```
Issue: StorageClass Provisioner Failed
Check provisioner configuration:
```bash
kubectl get storageclass <sc-name> -o yaml
```
Verify the provisioner is running:
```bash
# For CSI drivers
kubectl get pods -n kube-system | grep <provisioner-name>

# Check external-provisioner logs
kubectl logs -n kube-system <provisioner-pod> -c csi-provisioner
```
Force Cleanup Procedures
Warning
These commands bypass Kubernetes' safety mechanisms. Only use when you've verified the volume is not in use and standard cleanup has failed.
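Before removing any finalizers, it is worth confirming that no pod still mounts the claim. A sketch (`pvc_in_use` is a hypothetical helper; assumes `kubectl` access to the current namespace):

```bash
#!/usr/bin/env sh
# Print pods in the current namespace that mount a given PVC (sketch).
# If this prints nothing, no pod references the claim and force cleanup is safer.
pvc_in_use() {
  pvc="$1"
  kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.volumes[*].persistentVolumeClaim.claimName}{"\n"}{end}' \
    | awk -v pvc="$pvc" -F '\t' '$2 ~ pvc { print $1 }'
}
```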
Force Delete Stuck PVC
```bash
# Try graceful delete first
kubectl delete pvc <pvc-name>

# If stuck in Terminating state
kubectl patch pvc <pvc-name> -p '{"metadata":{"finalizers":null}}'
```
Force Delete Stuck PV
```bash
# Remove finalizers to allow deletion
kubectl patch pv <pv-name> -p '{"metadata":{"finalizers":null}}'
```
Force Delete VolumeAttachment
```bash
kubectl delete volumeattachment <va-name>
```
Force Delete Pod Waiting for Volume
```bash
# Force delete with zero grace period
kubectl delete pod <pod-name> --grace-period=0 --force
```
Validation Commands
Verify StorageClass Configuration
```bash
# Check if the StorageClass exists
kubectl get storageclass

# Check the default StorageClass
kubectl get storageclass -o json | jq '.items[] | select(.metadata.annotations["storageclass.kubernetes.io/is-default-class"]=="true") | .metadata.name'

# Verify the provisioner
kubectl get storageclass <sc-name> -o jsonpath='{.provisioner}'
```
Verify CSI Driver Installation
```bash
# Check CSI drivers
kubectl get csidrivers

# Check CSI nodes
kubectl get csinodes

# Verify the driver is registered on a node
kubectl get csinode <node-name> -o yaml
```
Check Volume Attachment Timeline
```bash
# See when each VolumeAttachment was created
kubectl get volumeattachment -o custom-columns=NAME:.metadata.name,AGE:.metadata.creationTimestamp,ATTACHED:.status.attached

# If created >5 minutes ago and still not attached, investigate
```
Prevention Best Practices
Use StatefulSets for Stateful Workloads
StatefulSets handle PVC lifecycle correctly:
```yaml
apiVersion: apps/v1
kind: StatefulSet
spec:
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 10Gi
```
Set Appropriate Reclaim Policy
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com   # in-tree kubernetes.io/aws-ebs is removed in K8s 1.27+
reclaimPolicy: Delete          # or Retain for important data
volumeBindingMode: WaitForFirstConsumer
```
Monitor Volume Attachment Time
```bash
# Alert if attachment takes >2 minutes
kubectl get volumeattachment -o json | \
  jq -r '.items[] | select(.status.attached == false) |
    select((now - (.metadata.creationTimestamp | fromdateiso8601)) > 120) |
    .metadata.name'
```
Access Modes Quick Reference
| Access Mode | Abbreviation | Meaning |
|---|---|---|
| ReadWriteOnce | RWO | Volume mounted read-write by one node |
| ReadWriteMany | RWX | Volume mounted read-write by many nodes |
| ReadOnlyMany | ROX | Volume mounted read-only by many nodes |
| ReadWriteOncePod | RWOP | Volume mounted read-write by one pod (K8s 1.22+) |
Key Point
ReadWriteOnce = one node, not one pod. Multiple pods on the same node can share an RWO volume. This is why multi-attach errors occur when pods move between nodes.
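When single-pod exclusivity is what's actually wanted, ReadWriteOncePod expresses it directly. A sketch of such a claim; the name is a placeholder, and this mode requires a CSI driver and a cluster recent enough to support it:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-pod-data   # placeholder name
spec:
  accessModes: ["ReadWriteOncePod"]   # enforced per pod, not per node
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 10Gi
```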
Quick Reference Commands
```bash
# Complete volume stack overview
kubectl get pvc,pv,volumeattachment,storageclass

# Watch volume attachments in real time
kubectl get volumeattachment -w

# Find PVCs using a given StorageClass
kubectl get pvc --all-namespaces -o json | \
  jq -r '.items[] | select(.spec.storageClassName=="fast-ssd") |
    "\(.metadata.namespace)/\(.metadata.name)"'

# Find all PVCs in Pending state
kubectl get pvc --all-namespaces | grep Pending

# Check node volume limits (max attachable volumes per node)
kubectl get csinode <node-name> -o yaml

# List all volumes attached to a specific node
kubectl get volumeattachment -o json | \
  jq -r '.items[] | select(.spec.nodeName=="<node-name>") | .spec.source.persistentVolumeName'
```
Troubleshooting Checklist
When debugging volume issues, check in this order:
- ✅ PVC status - Is it Pending, Bound, or Lost?
- ✅ PVC events - What does `kubectl describe pvc` say?
- ✅ Pod status - Is it ContainerCreating or Pending?
- ✅ Pod events - Any FailedMount or FailedAttachVolume?
- ✅ VolumeAttachment - Is it attached? Any errors?
- ✅ StorageClass - Does it exist? Is provisioner running?
- ✅ CSI driver - Are controller and node pods healthy?
- ✅ Node status - Is the target node Ready?
- ✅ Resource quota - Any limits preventing creation?
- ✅ Cloud provider - Manual check in console if needed
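The checklist order can be rough-automated as a single triage pass. A sketch (the individual commands are from this guide; the `volume_triage` wrapper itself is hypothetical and assumes `kubectl` access):

```bash
#!/usr/bin/env sh
# One-shot volume triage for a pod and its PVC, in checklist order (sketch).
volume_triage() {
  pod="$1"
  pvc="$2"
  echo "== PVC status and events =="
  kubectl describe pvc "$pvc"
  echo "== Pod status and events =="
  kubectl describe pod "$pod"
  echo "== VolumeAttachments =="
  kubectl get volumeattachment
  echo "== StorageClasses =="
  kubectl get storageclass
  echo "== CSI driver pods =="
  kubectl get pods -n kube-system | grep -i csi || echo "no CSI pods found"
  echo "== Nodes =="
  kubectl get nodes
}
```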
Key Takeaways
- Always check events first - `kubectl describe` shows what's actually wrong
- WaitForFirstConsumer is normal - PVC stays Pending until Pod is created
- Multi-attach errors resolve automatically - Wait 6-10 minutes after node failure
- Use out-of-service taint for faster recovery from node failures
- CSI driver health is critical - Check controller and node pods
- Don't remove finalizers unless necessary - They prevent orphaned cloud resources
- ReadWriteOnce = one node - Not one pod, which causes confusion