Kubernetes Volume Stuck: PVC Pending & Mount Failures Fixed

Kubernetes Volume Troubleshooting: Essential Commands & Quick Fixes

When a volume issue occurs, you need fast diagnosis. This guide provides the essential kubectl commands and troubleshooting workflow to identify and fix volume problems quickly.

Fast Diagnostic Workflow

Step 1: Check the PVC Status

bash
# List all PVCs in current namespace
kubectl get pvc

# Detailed view with events
kubectl describe pvc <pvc-name>

Status meanings:

  • Pending → Waiting for binding or provisioning
  • Bound → Successfully bound to a PV
  • Lost → Associated PV deleted but PVC still exists

Look for these event messages:

  • waiting for first consumer → WaitForFirstConsumer mode (normal)
  • failed to provision volume → Provisioner error
  • no persistent volumes available → No matching PV exists
  • exceeded quota → Resource limit hit
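The status check above is easy to script. A minimal sketch that classifies each PVC phase, using an inline JSON sample in place of live `kubectl get pvc -o json` output (names and phases here are made up):

```shell
# Sample standing in for: kubectl get pvc -o json
pvc_json='{"items":[
  {"metadata":{"name":"data-0"},"status":{"phase":"Pending"}},
  {"metadata":{"name":"data-1"},"status":{"phase":"Bound"}}]}'

# Print one "name phase" line per PVC, then map each phase to a verdict
echo "$pvc_json" | jq -r '.items[] | "\(.metadata.name) \(.status.phase)"' |
while read -r name phase; do
  case "$phase" in
    Bound)   echo "$name: ok" ;;
    Pending) echo "$name: waiting for binding or provisioning" ;;
    Lost)    echo "$name: backing PV is gone" ;;
  esac
done
```

Swap the sample for real cluster output and the same filter works unchanged.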

Step 2: Check the Pod Status

bash
# Check if pod is waiting for volume
kubectl get pod <pod-name>

# Detailed events
kubectl describe pod <pod-name> | grep -A 10 Events

Volume-related pod states:

  • ContainerCreating → Waiting for volume mount
  • Pending → Can't schedule (often due to volume topology)

Critical events to find:

  • FailedAttachVolume → Volume can't attach to node
  • FailedMount → Volume attached but mount failed
  • Multi-Attach error → Volume already attached elsewhere
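These events can be scanned in bulk rather than pod by pod. A sketch that filters volume failures out of event output; the function stubs in a captured sample where `kubectl get events --sort-by=.lastTimestamp` would run on a real cluster:

```shell
# Stand-in for: kubectl get events --sort-by=.lastTimestamp
sample_events() {
cat <<'EOF'
2m   Warning  FailedAttachVolume  pod/web-0  Multi-Attach error for volume "pvc-123"
90s  Warning  FailedMount         pod/web-0  timed out waiting for the condition
1m   Normal   Scheduled           pod/web-1  Successfully assigned default/web-1
EOF
}

# Keep only the volume-related failures
sample_events | grep -E 'FailedAttachVolume|FailedMount|Multi-Attach'
```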

Step 3: Check VolumeAttachment Objects

bash
# List all volume attachments
kubectl get volumeattachment

# Detailed view
kubectl describe volumeattachment <va-name>

Key fields:

  • .status.attached: true → Successfully attached
  • .status.attached: false → Attachment failed or in progress
  • .status.attachError → Error from CSI driver or cloud provider
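To surface stuck attachments and their errors in one pass, you can filter on exactly these fields. The inline JSON is a fabricated sample standing in for `kubectl get volumeattachment -o json`:

```shell
# Sample standing in for: kubectl get volumeattachment -o json
va_json='{"items":[
  {"metadata":{"name":"csi-aaa"},"status":{"attached":true}},
  {"metadata":{"name":"csi-bbb"},"status":{"attached":false,
    "attachError":{"message":"rpc error: code = Unavailable"}}}]}'

# Print name and error for every attachment that is not yet attached
echo "$va_json" | jq -r '.items[]
  | select(.status.attached == false)
  | "\(.metadata.name): \(.status.attachError.message // "in progress")"'
```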

Step 4: Check the PersistentVolume

bash
# List all PVs (cluster-wide)
kubectl get pv

# Detailed view
kubectl describe pv <pv-name>

PV Status meanings:

  • Available → Ready to bind to a PVC
  • Bound → Bound to a PVC
  • Released → PVC deleted, needs reclaim
  • Failed → Reclamation failed
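Released volumes linger until reclaimed, so they are worth listing explicitly. A quick filter, with fabricated sample JSON standing in for `kubectl get pv -o json`:

```shell
# Sample standing in for: kubectl get pv -o json
pv_json='{"items":[
  {"metadata":{"name":"pv-a"},"status":{"phase":"Released"}},
  {"metadata":{"name":"pv-b"},"status":{"phase":"Bound"}}]}'

# List PVs whose claim was deleted but which have not been reclaimed yet
echo "$pv_json" | jq -r '.items[] | select(.status.phase == "Released") | .metadata.name'
```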

Common Issues & Quick Fixes

Issue: PVC Stuck in Pending (WaitForFirstConsumer)

Cause: StorageClass uses volumeBindingMode: WaitForFirstConsumer

Check:

bash
kubectl get storageclass <sc-name> -o yaml | grep volumeBindingMode

Fix: Create a Pod that uses the PVC. The binding happens when the Pod is scheduled.
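For instance, a minimal consumer Pod like this (pod name, image, and mount path are placeholders) triggers binding as soon as it schedules:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pvc-consumer          # placeholder name
spec:
  containers:
  - name: app
    image: busybox:1.36
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: <pvc-name>   # the Pending PVC
```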

Issue: PVC Stuck in Pending (No StorageClass)

Check:

bash
# List available storage classes
kubectl get storageclass

# Check what PVC is requesting
kubectl get pvc <pvc-name> -o yaml | grep storageClassName

Fix: A PVC's spec.storageClassName is immutable after creation, so it cannot be patched in place. Delete and recreate the claim with a valid class:

bash
# Delete the misconfigured PVC, then recreate its manifest with
# spec.storageClassName set to an existing StorageClass
kubectl delete pvc <pvc-name>

Issue: Multi-Attach Error After Node Failure

Check if node is truly dead:

bash
kubectl get nodes
kubectl describe node <failed-node-name>

Fix 1: Add out-of-service taint (K8s 1.26+):

bash
kubectl taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute

This signals the control plane that the node is permanently gone, triggering immediate volume detachment.

Fix 2: Wait for automatic detachment (6-10 minutes): The AttachDetachController will eventually force-detach after timeout.

Fix 3: Manual cleanup (last resort):

bash
# Delete the stuck VolumeAttachment
kubectl delete volumeattachment <va-name>

# If volume is still stuck, manually detach in cloud console
# AWS: EC2 → Volumes → Force Detach
# GCP: Compute Engine → Disks → Detach
# Azure: Disks → Detach

WARNING

Only force-detach if you're certain the old node is completely powered off; otherwise you risk filesystem corruption from two nodes writing to the same volume.

Issue: CSI Driver Not Responding

Check CSI driver pods:

bash
# List CSI driver pods
kubectl get pods -n kube-system | grep csi

# Common CSI driver names:
# ebs-csi-controller, gce-pd-csi-driver, azuredisk-csi-driver

Check CSI controller logs:

bash
# Provisioner logs (handles volume creation)
kubectl logs -n kube-system <csi-controller-pod> -c csi-provisioner

# Attacher logs (handles volume attachment)
kubectl logs -n kube-system <csi-controller-pod> -c csi-attacher

Check CSI node driver logs:

bash
# Find CSI node pod on the target node
kubectl get pods -n kube-system -o wide | grep csi-node

# Check logs (handles volume mounting; container name varies by driver)
kubectl logs -n kube-system <csi-node-pod> -c <driver-container>  # e.g. ebs-plugin

Common CSI errors:

  • rpc error: code = Unavailable → Driver not running or unreachable
  • Access Denied → IAM/RBAC permissions missing
  • Quota Exceeded → Cloud provider limits hit
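The error patterns above can be grepped out of driver logs in one pass. A sketch where the function stubs in fabricated sample lines in place of a real `kubectl logs` stream:

```shell
# Stand-in for: kubectl logs -n kube-system <csi-controller-pod> -c csi-attacher
sample_csi_logs() {
cat <<'EOF'
I0101 00:00:01 connecting to CSI driver
E0101 00:00:05 rpc error: code = Unavailable desc = transport is closing
E0101 00:00:09 AccessDenied: not authorized to perform ec2:AttachVolume
EOF
}

# Count lines matching the hard-failure patterns
sample_csi_logs | grep -cE 'Unavailable|AccessDenied|QuotaExceeded'
```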

Issue: Volume Quota Exceeded

Check namespace quota:

bash
kubectl get resourcequota -n <namespace>
kubectl describe resourcequota -n <namespace>

Fix:

bash
# Delete unused PVCs
kubectl delete pvc <unused-pvc-name>

# Or request quota increase from cluster admin

Issue: StorageClass Provisioner Failed

Check provisioner configuration:

bash
kubectl get storageclass <sc-name> -o yaml

Verify provisioner is running:

bash
# For CSI drivers
kubectl get pods -n kube-system | grep <provisioner-name>

# Check external-provisioner logs
kubectl logs -n kube-system <provisioner-pod> -c csi-provisioner

Force Cleanup Procedures

Warning

These commands bypass Kubernetes' safety mechanisms. Only use when you've verified the volume is not in use and standard cleanup has failed.

Force Delete Stuck PVC

bash
# Try graceful delete first
kubectl delete pvc <pvc-name>

# If stuck in Terminating state
kubectl patch pvc <pvc-name> -p '{"metadata":{"finalizers":null}}'

Force Delete Stuck PV

bash
# Remove finalizers to allow deletion
kubectl patch pv <pv-name> -p '{"metadata":{"finalizers":null}}'

Force Delete VolumeAttachment

bash
kubectl delete volumeattachment <va-name>

Force Delete Pod Waiting for Volume

bash
# Force delete with zero grace period (skips graceful termination)
kubectl delete pod <pod-name> --grace-period=0 --force

Validation Commands

Verify StorageClass Configuration

bash
# Check if StorageClass exists
kubectl get storageclass

# Check default StorageClass
kubectl get storageclass -o json | jq '.items[] | select(.metadata.annotations["storageclass.kubernetes.io/is-default-class"]=="true") | .metadata.name'

# Verify provisioner
kubectl get storageclass <sc-name> -o jsonpath='{.provisioner}'

Verify CSI Driver Installation

bash
# Check CSI driver
kubectl get csidrivers

# Check CSI nodes
kubectl get csinodes

# Verify driver is registered on nodes
kubectl get csinode <node-name> -o yaml

Check Volume Attachment Timeline

bash
# See when VolumeAttachment was created
kubectl get volumeattachment -o custom-columns=NAME:.metadata.name,AGE:.metadata.creationTimestamp,ATTACHED:.status.attached

# If created >5 minutes ago and still not attached, investigate

Prevention Best Practices

Use StatefulSets for Stateful Workloads

StatefulSets handle PVC lifecycle correctly:

yaml
apiVersion: apps/v1
kind: StatefulSet
spec:
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 10Gi

Set Appropriate Reclaim Policy

yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com  # CSI driver; the in-tree kubernetes.io/aws-ebs provisioner is removed in current releases
reclaimPolicy: Delete  # or Retain for important data
volumeBindingMode: WaitForFirstConsumer

Monitor Volume Attachment Time

bash
# Alert if attachment takes >2 minutes
kubectl get volumeattachment -o json | \
  jq -r '.items[] | select(.status.attached == false) | 
  select((now - (.metadata.creationTimestamp | fromdateiso8601)) > 120) | 
  .metadata.name'

Access Modes Quick Reference

  • ReadWriteOnce (RWO) → Volume mounted read-write by one node
  • ReadWriteMany (RWX) → Volume mounted read-write by many nodes
  • ReadOnlyMany (ROX) → Volume mounted read-only by many nodes
  • ReadWriteOncePod (RWOP) → Volume mounted read-write by one pod (K8s 1.22+)

Key Point

ReadWriteOnce = one node, not one pod. Multiple pods on the same node can share an RWO volume. This is why multi-attach errors occur when pods move between nodes.
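If single-pod exclusivity is what you actually need, request ReadWriteOncePod instead. A sketch (claim and class names are placeholders):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-pod-data      # placeholder name
spec:
  accessModes: ["ReadWriteOncePod"]  # at most one pod may use the volume
  storageClassName: fast-ssd         # placeholder class
  resources:
    requests:
      storage: 10Gi
```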

Quick Reference Commands

bash
# Complete volume stack overview
kubectl get pvc,pv,volumeattachment,storageclass

# Watch volume attachments in real-time
kubectl get volumeattachment -w

# Find PVCs using a given StorageClass
kubectl get pvc --all-namespaces -o json | \
  jq -r '.items[] | select(.spec.storageClassName=="fast-ssd") | 
  "\(.metadata.namespace)/\(.metadata.name)"'

# Find all PVCs in Pending state
kubectl get pvc --all-namespaces | grep Pending  # status.phase is not a supported field selector for PVCs

# Check node volume limits (max attachable volumes per node)
kubectl get csinode <node-name> -o yaml

# List all volumes attached to a specific node
kubectl get volumeattachment -o json | \
  jq -r '.items[] | select(.spec.nodeName=="<node-name>") | .spec.source.persistentVolumeName'

Troubleshooting Checklist

When debugging volume issues, check in this order:

  1. PVC status - Is it Pending, Bound, or Lost?
  2. PVC events - What does kubectl describe pvc say?
  3. Pod status - Is it ContainerCreating or Pending?
  4. Pod events - Any FailedMount or FailedAttachVolume?
  5. VolumeAttachment - Is it attached? Any errors?
  6. StorageClass - Does it exist? Is provisioner running?
  7. CSI driver - Are controller and node pods healthy?
  8. Node status - Is the target node Ready?
  9. Resource quota - Any limits preventing creation?
  10. Cloud provider - Manual check in console if needed

Key Takeaways

  1. Always check events first - kubectl describe shows what's actually wrong
  2. WaitForFirstConsumer is normal - PVC stays Pending until Pod is created
  3. Multi-attach errors resolve automatically - Wait 6-10 minutes after node failure
  4. Use out-of-service taint for faster recovery from node failures
  5. CSI driver health is critical - Check controller and node pods
  6. Don't remove finalizers unless necessary - They prevent orphaned cloud resources
  7. ReadWriteOnce = one node - Not one pod, which causes confusion

Further Reading

Based on Kubernetes v1.35 (Timbernetes). Changelog.