Kubernetes Cluster Discovery & Storage Investigation Playbook
A practical field guide for when you land on an unfamiliar Kubernetes cluster and need to understand what's running, where data lives, and how to recover it.
Phase 1 — Cluster Scope
First thing — understand what you're dealing with at the cluster level.
```bash
kubectl cluster-info
kubectl version
kubectl get nodes -o wide
kubectl top nodes
kubectl describe nodes | grep -E "Taints|Roles|Capacity|Allocatable|CPU|Memory"
```
Phase 2 — Namespace Inventory
Get a sense of scale before diving into individual resources.
```bash
kubectl get namespaces
kubectl get all -A | wc -l
kubectl get pods -A --no-headers | awk '{print $1}' | sort | uniq -c | sort -rn
```
Phase 3 — Workload Overview
Map out everything that's running.
```bash
kubectl get pods -A -o wide
kubectl get pods -A --field-selector=status.phase!=Running
kubectl get deployments -A
kubectl get statefulsets -A
kubectl get daemonsets -A
kubectl get jobs -A
kubectl get cronjobs -A
```
Phase 4 — Unhealthy & Problem Pods
Find what's broken before anything else.
```bash
kubectl get pods -A | grep -vE "Running|Completed"
kubectl get pods -A | grep -E "CrashLoop|Error|Pending|Evicted|OOMKilled"
kubectl get pods -A -o json | jq '.items[] | select(.status.containerStatuses[]?.restartCount > 5) | .metadata.name'
kubectl top pods -A --sort-by=cpu | head -20
kubectl top pods -A --sort-by=memory | head -20
```
Phase 5 — Networking
```bash
kubectl get svc -A
kubectl get ingress -A
kubectl get networkpolicies -A
kubectl get endpoints -A | grep "<none>"
kubectl get svc -A | grep -v ClusterIP
```
Phase 6 — Storage
```bash
# PVs and StorageClasses are cluster-scoped, so no -A needed
kubectl get pv
kubectl get pvc -A
kubectl get pvc -A | grep -v Bound
kubectl get storageclass
```
Phase 7 — Config & Secrets
```bash
kubectl get configmaps -A
kubectl get secrets -A
kubectl get secrets -A | grep -v "kubernetes.io/service-account"
```
Phase 8 — RBAC & Security
```bash
kubectl get roles -A
kubectl get rolebindings -A
kubectl get clusterroles | grep -v "system:"
kubectl get clusterrolebindings | grep -v "system:"
kubectl auth can-i --list
kubectl get serviceaccounts -A
```
Phase 9 — Events (the most underused)
Events tell you what the cluster has been doing. Always check these.
```bash
kubectl get events -A --sort-by='.lastTimestamp'
kubectl get events -A --field-selector=type=Warning
kubectl get events -A --sort-by='.lastTimestamp' | tail -30
```
Phase 10 — Resource Pressure
Find pods with no resource limits — these are ticking time bombs.
```bash
kubectl describe nodes | grep -A5 "Allocated resources"
kubectl get pods -A -o json | jq '.items[] | select(.spec.containers[].resources.limits == null) | .metadata.name'
kubectl get limitrange -A
kubectl get resourcequota -A
```
Phase 11 — Add-ons & Platform
```bash
kubectl get pods -n kube-system
kubectl get pods -n monitoring 2>/dev/null
kubectl get pods -n ingress-nginx 2>/dev/null
kubectl get pods -n cert-manager 2>/dev/null
helm list -A 2>/dev/null
```
Phase 12 — One-shot Triage Command
When you need a quick answer on what's broken:
```bash
kubectl get pods -A -o wide | grep -vE "Running|Completed" | sort -k4
```
Storage Deep Dive
1. Which Pod is Using What Storage
```bash
kubectl get pods -A -o json | jq '.items[] | {pod: .metadata.name, ns: .metadata.namespace, volumes: [.spec.volumes[]? | select(.persistentVolumeClaim) | .persistentVolumeClaim.claimName]}'
kubectl get pvc -A
kubectl get pvc -n <namespace> -o wide
# Map pod -> PVC -> PV in one view
kubectl get pods -n <namespace> -o json | jq '.items[] | {pod: .metadata.name, pvcs: [.spec.volumes[]?.persistentVolumeClaim?.claimName // empty]}'
```
2. Where is Data Actually Stored + Protocol
```bash
# PVC -> PV binding
kubectl get pvc -n <namespace> <pvc-name> -o yaml | grep -E "volumeName|storageClass|accessModes"
# PV details — where it actually lives
kubectl get pv <pv-name> -o yaml
```
The answer is in `.spec` — look for one of these blocks:
| Field | Meaning |
|---|---|
| `hostPath` | Local path on a specific node |
| `nfs` | NFS server + exported path |
| `csi` | Cloud/vendor CSI driver |
| `emptyDir` | RAM or node temp disk — not persistent |
| `awsElasticBlockStore` / `gcePersistentDisk` / `azureDisk` | Legacy in-tree cloud block storage |
```bash
# Check what type a PV is
kubectl get pv <pv-name> -o json | jq '.spec | keys'
# Extract the key details
kubectl get pv <pv-name> -o json | jq '{driver: .spec.csi?.driver, path: .spec.hostPath?.path, nfs: .spec.nfs}'
# StorageClass tells you the provisioner and protocol
kubectl describe storageclass <name>
kubectl get storageclass -o yaml | grep provisioner
```
3. Will the Data Persist?
```bash
# Reclaim policy: Retain = data survives PVC deletion, Delete = data gone
kubectl get pv <pv-name> -o json | jq '{reclaimPolicy: .spec.persistentVolumeReclaimPolicy, volumeMode: .spec.volumeMode}'
# emptyDir = dies with the pod, never persists
kubectl get pod <pod-name> -n <namespace> -o json | jq '.spec.volumes[] | select(.emptyDir) | .name'
# Check if PVC is owned by a parent resource (gets deleted with it)
kubectl get pvc <pvc-name> -n <namespace> -o json | jq '.metadata.ownerReferences'
# Full persistence overview for all PVs
kubectl get pv -o json | jq '.items[] | {name: .metadata.name, reclaim: .spec.persistentVolumeReclaimPolicy, status: .status.phase}'
```
Quick rule of thumb:
- `emptyDir` → data gone when pod dies
- `hostPath` → data survives pod restarts but tied to one node
- PVC with `reclaimPolicy: Delete` → data gone when PVC deleted
- PVC with `reclaimPolicy: Retain` → data survives PVC deletion, PV must be manually reclaimed
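The rule of thumb can be applied mechanically with jq. A minimal sketch, assuming `jq` is installed; the here-doc is a hypothetical sample pod spec standing in for `kubectl get pod <pod> -n <namespace> -o json` on a live cluster:

```shell
# Classify each volume of a pod against the rule of thumb above.
# The jq filter inspects which key each volume entry carries.
classify='
  .spec.volumes[]? |
  if .emptyDir then "\(.name): emptyDir -> gone when pod dies"
  elif .hostPath then "\(.name): hostPath -> survives pod restarts, tied to one node"
  elif .persistentVolumeClaim then "\(.name): PVC \(.persistentVolumeClaim.claimName) -> check PV reclaimPolicy"
  else "\(.name): other (configMap/secret/projected)"
  end'

# Offline demo input; replace the here-doc with live kubectl output:
#   kubectl get pod <pod> -n <namespace> -o json | jq -r "$classify"
jq -r "$classify" <<'JSON'
{"spec":{"volumes":[
  {"name":"scratch","emptyDir":{}},
  {"name":"data","persistentVolumeClaim":{"claimName":"mysql-pvc"}}
]}}
JSON
```

For the PVC case the verdict still depends on the bound PV's `reclaimPolicy`, which the next section extracts.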
4. Full Persistence Picture — One Command
```bash
kubectl get pv -o json | jq '.items[] | {
  name: .metadata.name,
  reclaim: .spec.persistentVolumeReclaimPolicy,
  type: (.spec | keys | map(select(. != "accessModes" and . != "capacity" and . != "claimRef" and . != "volumeMode" and . != "persistentVolumeReclaimPolicy" and . != "storageClassName" and . != "nodeAffinity")))[0],
  claim: .spec.claimRef.name,
  namespace: .spec.claimRef.namespace
}'
```
Data Recovery — Copy Data & Restore to a New Pod
Step 1 — Find the Data Path and Node
```bash
kubectl get pv <pv-name> -o json | jq '.spec.hostPath.path // .spec.nfs // .spec.csi'
kubectl get pod <pod-name> -n <namespace> -o json | jq '.spec.nodeName'
```
Step 2 — Copy Data Out
```bash
# Option A: kubectl cp (easiest, works for running pods)
kubectl cp <namespace>/<pod-name>:/var/lib/mysql ./mysql-backup
# Option B: exec + tar stream (target dir must exist first)
mkdir -p ./mysql-backup
kubectl exec -n <namespace> <pod-name> -- tar czf - /var/lib/mysql | tar xzf - -C ./mysql-backup
# Option C: proper database dumps (always preferred for DBs)
# MySQL
kubectl exec -n <namespace> <pod-name> -- mysqldump -u root -p<pass> --all-databases > dump.sql
# PostgreSQL
kubectl exec -n <namespace> <pod-name> -- pg_dumpall -U postgres > dump.sql
# MongoDB
kubectl exec -n <namespace> <pod-name> -- mongodump --out /tmp/mongodump
kubectl cp <namespace>/<pod-name>:/tmp/mongodump ./mongodump
```
Step 3 — Create PV/PVC Pointing at Backup Data
```yaml
# pv-restore.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: restore-pv
spec:
  capacity:
    storage: 10Gi
  accessModes: [ReadWriteOnce]
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /path/to/your/mysql-backup
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restore-pvc
  namespace: default
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: ""  # skip dynamic provisioning so the static bind works
  resources:
    requests:
      storage: 10Gi
  volumeName: restore-pv  # bind directly to the PV above
```
Step 4 — Spin Up Pod Using That PVC
```yaml
# restore-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: mysql-restore
  namespace: default
spec:
  containers:
  - name: mysql
    image: mysql:8.0
    env:
    - name: MYSQL_ROOT_PASSWORD
      value: "yourpassword"
    volumeMounts:
    - name: data
      mountPath: /var/lib/mysql
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: restore-pvc
```
```bash
kubectl apply -f pv-restore.yaml
kubectl apply -f restore-pod.yaml
# Wait for pod to be ready
kubectl wait --for=condition=Ready pod/mysql-restore --timeout=60s
# Verify data is there
kubectl exec -it mysql-restore -- mysql -u root -p -e "show databases;"
```
Step 5 — If Restoring from SQL Dump
```bash
# Push dump into the running pod
kubectl exec -i mysql-restore -- mysql -u root -pyourpassword < dump.sql
# Verify
kubectl exec -it mysql-restore -- mysql -u root -p -e "show databases;"
```
Quick Reference — Storage Decision Matrix
| Volume Type | Survives Pod Restart | Survives Node Reboot | Survives PVC Delete | Notes |
|---|---|---|---|---|
| `emptyDir` | ✗ | ✗ | ✗ | Temp scratch space only |
| `hostPath` | ✓ | ✓ | ✓ | Tied to one node |
| PVC + `Delete` policy | ✓ | ✓ | ✗ | Default for most cloud provisioners |
| PVC + `Retain` policy | ✓ | ✓ | ✓ | Data survives, PV must be reclaimed manually |
| `emptyDir` with `medium: Memory` | ✗ | ✗ | ✗ | RAM disk — fastest, most volatile |
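The matrix also makes a useful audit: every bound PV with `reclaimPolicy: Delete` is one `kubectl delete pvc` away from data loss. A minimal sketch, assuming `jq` is installed; the here-doc holds hypothetical sample PVs standing in for `kubectl get pv -o json` on a live cluster:

```shell
# Flag bound PVs whose data is destroyed if the matching PVC is deleted.
audit='.items[]
  | select(.spec.persistentVolumeReclaimPolicy == "Delete" and .status.phase == "Bound")
  | "\(.metadata.name)  \(.spec.claimRef.namespace)/\(.spec.claimRef.name)"'

# Offline demo input; on a real cluster run:
#   kubectl get pv -o json | jq -r "$audit"
jq -r "$audit" <<'JSON'
{"items":[
  {"metadata":{"name":"pv-a"},
   "spec":{"persistentVolumeReclaimPolicy":"Delete","claimRef":{"namespace":"prod","name":"db-data"}},
   "status":{"phase":"Bound"}},
  {"metadata":{"name":"pv-b"},
   "spec":{"persistentVolumeReclaimPolicy":"Retain","claimRef":{"namespace":"prod","name":"logs"}},
   "status":{"phase":"Bound"}}
]}
JSON
```

Only `pv-a` is printed here; the `Retain` volume passes the audit. Anything this flags on a real cluster is a candidate for patching to `Retain` before you touch its PVC.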