Kubernetes Volume Stuck: PVC Pending & Mount Failures Fixed

Kubernetes Volume Troubleshooting: Essential Commands & Quick Fixes

When a volume issue occurs, you need fast diagnosis. This guide provides the essential kubectl commands and troubleshooting workflow to identify and fix volume problems quickly.

Fast Diagnostic Workflow

Step 1: Check the PVC Status

bash
# List all PVCs in current namespace
kubectl get pvc

# Detailed view with events
kubectl describe pvc <pvc-name>

Status meanings:

  • Pending → Waiting for binding or provisioning
  • Bound → Successfully bound to a PV
  • Lost → Associated PV deleted but PVC still exists

Look for these event messages:

  • waiting for first consumer → WaitForFirstConsumer mode (normal)
  • failed to provision volume → Provisioner error
  • no persistent volumes available → No matching PV exists
  • exceeded quota → Resource limit hit
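The status check above is easy to script. A minimal sketch that classifies each PVC phase, using an inline JSON sample in place of live `kubectl get pvc -o json` output (names and phases here are made up):

```shell
# Sample standing in for: kubectl get pvc -o json
pvc_json='{"items":[
  {"metadata":{"name":"data-0"},"status":{"phase":"Pending"}},
  {"metadata":{"name":"data-1"},"status":{"phase":"Bound"}}]}'

# Print one "name phase" line per PVC, then map each phase to a verdict
echo "$pvc_json" | jq -r '.items[] | "\(.metadata.name) \(.status.phase)"' |
while read -r name phase; do
  case "$phase" in
    Bound)   echo "$name: ok" ;;
    Pending) echo "$name: waiting for binding or provisioning" ;;
    Lost)    echo "$name: backing PV is gone" ;;
  esac
done
```

Swap the sample for real cluster output and the same filter works unchanged.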

Step 2: Check the Pod Status

bash
# Check if pod is waiting for volume
kubectl get pod <pod-name>

# Detailed events
kubectl describe pod <pod-name> | grep -A 10 Events

Volume-related pod states:

  • ContainerCreating → Waiting for volume mount
  • Pending → Can't schedule (often due to volume topology)

Critical events to find:

  • FailedAttachVolume → Volume can't attach to node
  • FailedMount → Volume attached but mount failed
  • Multi-Attach error → Volume already attached elsewhere
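These events can be scanned in bulk rather than pod by pod. A sketch that filters volume failures out of event output; the function stubs in a captured sample where `kubectl get events --sort-by=.lastTimestamp` would run on a real cluster:

```shell
# Stand-in for: kubectl get events --sort-by=.lastTimestamp
sample_events() {
cat <<'EOF'
2m   Warning  FailedAttachVolume  pod/web-0  Multi-Attach error for volume "pvc-123"
90s  Warning  FailedMount         pod/web-0  timed out waiting for the condition
1m   Normal   Scheduled           pod/web-1  Successfully assigned default/web-1
EOF
}

# Keep only the volume-related failures
sample_events | grep -E 'FailedAttachVolume|FailedMount|Multi-Attach'
```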

Step 3: Check VolumeAttachment Objects

bash
# List all volume attachments
kubectl get volumeattachment

# Detailed view
kubectl describe volumeattachment <va-name>

Key fields:

  • .status.attached: true → Successfully attached
  • .status.attached: false → Attachment failed or in progress
  • .status.attachError → Error from CSI driver or cloud provider
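To surface stuck attachments and their errors in one pass, you can filter on exactly these fields. The inline JSON is a fabricated sample standing in for `kubectl get volumeattachment -o json`:

```shell
# Sample standing in for: kubectl get volumeattachment -o json
va_json='{"items":[
  {"metadata":{"name":"csi-aaa"},"status":{"attached":true}},
  {"metadata":{"name":"csi-bbb"},"status":{"attached":false,
    "attachError":{"message":"rpc error: code = Unavailable"}}}]}'

# Print name and error for every attachment that is not yet attached
echo "$va_json" | jq -r '.items[]
  | select(.status.attached == false)
  | "\(.metadata.name): \(.status.attachError.message // "in progress")"'
```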

Step 4: Check the PersistentVolume

bash
# List all PVs (cluster-wide)
kubectl get pv

# Detailed view
kubectl describe pv <pv-name>

PV Status meanings:

  • Available → Ready to bind to a PVC
  • Bound → Bound to a PVC
  • Released → PVC deleted, needs reclaim
  • Failed → Reclamation failed
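Released volumes linger until reclaimed, so they are worth listing explicitly. A quick filter, with fabricated sample JSON standing in for `kubectl get pv -o json`:

```shell
# Sample standing in for: kubectl get pv -o json
pv_json='{"items":[
  {"metadata":{"name":"pv-a"},"status":{"phase":"Released"}},
  {"metadata":{"name":"pv-b"},"status":{"phase":"Bound"}}]}'

# List PVs whose claim was deleted but which have not been reclaimed yet
echo "$pv_json" | jq -r '.items[] | select(.status.phase == "Released") | .metadata.name'
```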

Common Issues & Quick Fixes

Issue: PVC Stuck in Pending (WaitForFirstConsumer)

Cause: StorageClass uses volumeBindingMode: WaitForFirstConsumer

Check:

bash
kubectl get storageclass <sc-name> -o yaml | grep volumeBindingMode

Fix: Create a Pod that uses the PVC. The binding happens when the Pod is scheduled.
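For instance, a minimal consumer Pod like this (pod name, image, and mount path are placeholders) triggers binding as soon as it schedules:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pvc-consumer          # placeholder name
spec:
  containers:
  - name: app
    image: busybox:1.36
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: <pvc-name>   # the Pending PVC
```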

Issue: PVC Stuck in Pending (No StorageClass)

Check:

bash
# List available storage classes
kubectl get storageclass

# Check what PVC is requesting
kubectl get pvc <pvc-name> -o yaml | grep storageClassName

Fix: A PVC's spec.storageClassName is immutable after creation, so it cannot be patched in place. Delete and recreate the claim with a valid class:

bash
# Delete the misconfigured PVC, then recreate its manifest with
# spec.storageClassName set to an existing StorageClass
kubectl delete pvc <pvc-name>

Issue: Multi-Attach Error After Node Failure

Check if node is truly dead:

bash
kubectl get nodes
kubectl describe node <failed-node-name>

Fix 1: Add out-of-service taint (K8s 1.26+):

bash
kubectl taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute

This signals the control plane that the node is permanently gone, triggering immediate volume detachment.

Fix 2: Wait for automatic detachment (6-10 minutes): The AttachDetachController will eventually force-detach after timeout.

Fix 3: Manual cleanup (last resort):

bash
# Delete the stuck VolumeAttachment
kubectl delete volumeattachment <va-name>

# If volume is still stuck, manually detach in cloud console
# AWS: EC2 → Volumes → Force Detach
# GCP: Compute Engine → Disks → Detach
# Azure: Disks → Detach

WARNING

Only force-detach if you're certain the old node is completely powered off; otherwise you risk filesystem corruption from two nodes writing to the same volume.

Issue: CSI Driver Not Responding

Check CSI driver pods:

bash
# List CSI driver pods
kubectl get pods -n kube-system | grep csi

# Common CSI driver names:
# ebs-csi-controller, gce-pd-csi-driver, azuredisk-csi-driver

Check CSI controller logs:

bash
# Provisioner logs (handles volume creation)
kubectl logs -n kube-system <csi-controller-pod> -c csi-provisioner

# Attacher logs (handles volume attachment)
kubectl logs -n kube-system <csi-controller-pod> -c csi-attacher

Check CSI node driver logs:

bash
# Find CSI node pod on the target node
kubectl get pods -n kube-system -o wide | grep csi-node

# Check logs (handles volume mounting; container name varies by driver)
kubectl logs -n kube-system <csi-node-pod> -c <driver-container>  # e.g. ebs-plugin

Common CSI errors:

  • rpc error: code = Unavailable → Driver not running or unreachable
  • Access Denied → IAM/RBAC permissions missing
  • Quota Exceeded → Cloud provider limits hit
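The error patterns above can be grepped out of driver logs in one pass. A sketch where the function stubs in fabricated sample lines in place of a real `kubectl logs` stream:

```shell
# Stand-in for: kubectl logs -n kube-system <csi-controller-pod> -c csi-attacher
sample_csi_logs() {
cat <<'EOF'
I0101 00:00:01 connecting to CSI driver
E0101 00:00:05 rpc error: code = Unavailable desc = transport is closing
E0101 00:00:09 AccessDenied: not authorized to perform ec2:AttachVolume
EOF
}

# Count lines matching the hard-failure patterns
sample_csi_logs | grep -cE 'Unavailable|AccessDenied|QuotaExceeded'
```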

Issue: Volume Quota Exceeded

Check namespace quota:

bash
kubectl get resourcequota -n <namespace>
kubectl describe resourcequota -n <namespace>

Fix:

bash
# Delete unused PVCs
kubectl delete pvc <unused-pvc-name>

# Or request quota increase from cluster admin

Issue: StorageClass Provisioner Failed

Check provisioner configuration:

bash
kubectl get storageclass <sc-name> -o yaml

Verify provisioner is running:

bash
# For CSI drivers
kubectl get pods -n kube-system | grep <provisioner-name>

# Check external-provisioner logs
kubectl logs -n kube-system <provisioner-pod> -c csi-provisioner

Force Cleanup Procedures

Warning

These commands bypass Kubernetes' safety mechanisms. Only use when you've verified the volume is not in use and standard cleanup has failed.

Force Delete Stuck PVC

bash
# Try graceful delete first
kubectl delete pvc <pvc-name>

# If stuck in Terminating state
kubectl patch pvc <pvc-name> -p '{"metadata":{"finalizers":null}}'

Force Delete Stuck PV

bash
# Remove finalizers to allow deletion
kubectl patch pv <pv-name> -p '{"metadata":{"finalizers":null}}'

Force Delete VolumeAttachment

bash
kubectl delete volumeattachment <va-name>

Force Delete Pod Waiting for Volume

bash
# Force delete with zero grace period (skips graceful termination)
kubectl delete pod <pod-name> --grace-period=0 --force

Validation Commands

Verify StorageClass Configuration

bash
# Check if StorageClass exists
kubectl get storageclass

# Check default StorageClass
kubectl get storageclass -o json | jq '.items[] | select(.metadata.annotations["storageclass.kubernetes.io/is-default-class"]=="true") | .metadata.name'

# Verify provisioner
kubectl get storageclass <sc-name> -o jsonpath='{.provisioner}'

Verify CSI Driver Installation

bash
# Check CSI driver
kubectl get csidrivers

# Check CSI nodes
kubectl get csinodes

# Verify driver is registered on nodes
kubectl get csinode <node-name> -o yaml

Check Volume Attachment Timeline

bash
# See when VolumeAttachment was created
kubectl get volumeattachment -o custom-columns=NAME:.metadata.name,AGE:.metadata.creationTimestamp,ATTACHED:.status.attached

# If created >5 minutes ago and still not attached, investigate

Prevention Best Practices

Use StatefulSets for Stateful Workloads

StatefulSets handle PVC lifecycle correctly:

yaml
apiVersion: apps/v1
kind: StatefulSet
spec:
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 10Gi

Set Appropriate Reclaim Policy

yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com  # CSI driver; the in-tree kubernetes.io/aws-ebs provisioner is removed in current releases
reclaimPolicy: Delete  # or Retain for important data
volumeBindingMode: WaitForFirstConsumer

Monitor Volume Attachment Time

bash
# Alert if attachment takes >2 minutes
kubectl get volumeattachment -o json | \
  jq -r '.items[] | select(.status.attached == false) | 
  select((now - (.metadata.creationTimestamp | fromdateiso8601)) > 120) | 
  .metadata.name'

Access Modes Quick Reference

  • ReadWriteOnce (RWO) → Volume mounted read-write by one node
  • ReadWriteMany (RWX) → Volume mounted read-write by many nodes
  • ReadOnlyMany (ROX) → Volume mounted read-only by many nodes
  • ReadWriteOncePod (RWOP) → Volume mounted read-write by one pod (K8s 1.22+)

Key Point

ReadWriteOnce = one node, not one pod. Multiple pods on the same node can share an RWO volume. This is why multi-attach errors occur when pods move between nodes.
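If single-pod exclusivity is what you actually need, request ReadWriteOncePod instead. A sketch (claim and class names are placeholders):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-pod-data      # placeholder name
spec:
  accessModes: ["ReadWriteOncePod"]  # at most one pod may use the volume
  storageClassName: fast-ssd         # placeholder class
  resources:
    requests:
      storage: 10Gi
```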

Quick Reference Commands

bash
# Complete volume stack overview
kubectl get pvc,pv,volumeattachment,storageclass

# Watch volume attachments in real-time
kubectl get volumeattachment -w

# Find PVCs using a given StorageClass
kubectl get pvc --all-namespaces -o json | \
  jq -r '.items[] | select(.spec.storageClassName=="fast-ssd") | 
  "\(.metadata.namespace)/\(.metadata.name)"'

# Find all PVCs in Pending state
kubectl get pvc --all-namespaces | grep Pending  # status.phase is not a supported field selector for PVCs

# Check node volume limits (max attachable volumes per node)
kubectl get csinode <node-name> -o yaml

# List all volumes attached to a specific node
kubectl get volumeattachment -o json | \
  jq -r '.items[] | select(.spec.nodeName=="<node-name>") | .spec.source.persistentVolumeName'

Troubleshooting Checklist

When debugging volume issues, check in this order:

  1. PVC status - Is it Pending, Bound, or Lost?
  2. PVC events - What does kubectl describe pvc say?
  3. Pod status - Is it ContainerCreating or Pending?
  4. Pod events - Any FailedMount or FailedAttachVolume?
  5. VolumeAttachment - Is it attached? Any errors?
  6. StorageClass - Does it exist? Is provisioner running?
  7. CSI driver - Are controller and node pods healthy?
  8. Node status - Is the target node Ready?
  9. Resource quota - Any limits preventing creation?
  10. Cloud provider - Manual check in console if needed

Key Takeaways

  1. Always check events first - kubectl describe shows what's actually wrong
  2. WaitForFirstConsumer is normal - PVC stays Pending until Pod is created
  3. Multi-attach errors resolve automatically - Wait 6-10 minutes after node failure
  4. Use out-of-service taint for faster recovery from node failures
  5. CSI driver health is critical - Check controller and node pods
  6. Don't remove finalizers unless necessary - They prevent orphaned cloud resources
  7. ReadWriteOnce = one node - Not one pod, which causes confusion

Further Reading

Based on Kubernetes v1.35 (Timbernetes). Changelog.