Module 4.5: Storage Troubleshooting
Complexity: [MEDIUM] - Diagnosing and fixing storage issues
Time to Complete: 35-45 minutes
Prerequisites: Modules 4.1-4.4 (all previous storage modules)
What You’ll Be Able to Do
After this module, you will be able to:
- Diagnose storage failures systematically (PVC Pending, mount errors, capacity issues, permission denied)
- Trace the storage provisioning chain from PVC → StorageClass → provisioner → PV → mount
- Fix common storage issues: stuck finalizers, orphaned PVs, filesystem corruption recovery
- Design a troubleshooting checklist for storage problems in CKA exam scenarios
Why This Module Matters
Storage issues are among the most common problems in Kubernetes clusters. Pods stuck in ContainerCreating, PVCs that never bind, permission errors, and capacity problems can bring applications to a halt. The CKA exam heavily tests troubleshooting skills, and storage problems appear frequently. This module gives you a systematic approach to diagnose and fix storage issues.
The Detective Analogy
Troubleshooting storage is like being a detective. The pod won’t start - that’s the crime. Your tools are kubectl describe, kubectl logs, and kubectl get events - your magnifying glass, fingerprint kit, and witness interviews. You follow the evidence: Pod → PVC → PV → StorageClass → CSI driver. Each step reveals clues until you find the culprit.
What You’ll Learn
By the end of this module, you’ll be able to:
- Systematically troubleshoot storage issues
- Debug PVC binding problems
- Fix volume mount errors
- Diagnose CSI driver issues
- Resolve permissions and capacity problems
Did You Know?
- Most storage issues are misconfiguration: Wrong StorageClass name, mismatched access modes, or missing labels cause 80% of problems
- Events are your best friend: kubectl describe shows recent events that often contain the exact error message
- CSI drivers have their own logs: When the usual commands don’t help, check CSI controller and node logs
Part 1: Troubleshooting Framework
1.1 The Storage Debug Path

    Storage Troubleshooting Path

    Pod Issue
        │
        ▼
    1. k describe pod <name>
        └─► Check Events section
        └─► Check volume mount errors
        │
        ▼
    2. k get pvc <name>
        └─► Is STATUS "Bound"?
        └─► If "Pending", check Events
        │
        ▼
    3. k get pv
        └─► Does matching PV exist?
        └─► Is STATUS "Available" or "Bound"?
        │
        ▼
    4. k get sc <storageclass>
        └─► Does StorageClass exist?
        └─► Is provisioner correct?
        │
        ▼
    5. Check CSI driver
        └─► Is driver installed?
        └─► Check driver pod logs

1.2 Key Commands Reference

    # Pod-level debugging
    k describe pod <pod-name>
    k get pod <pod-name> -o yaml
    k logs <pod-name>

    # PVC debugging
    k get pvc
    k describe pvc <pvc-name>
    k get pvc <pvc-name> -o yaml

    # PV debugging
    k get pv
    k describe pv <pv-name>
    k get pv <pv-name> -o yaml

    # StorageClass debugging
    k get sc
    k describe sc <sc-name>

    # Events (often most useful!)
    k get events --sort-by='.lastTimestamp'
    k get events --field-selector involvedObject.name=<pvc-name>

    # CSI debugging
    k get pods -n kube-system | grep csi
    k logs -n kube-system <csi-pod> -c <container>

Part 2: PVC Binding Problems
2.1 PVC Stuck in Pending
Symptoms: PVC shows STATUS: Pending, never becomes Bound

    k get pvc
    # NAME     STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS
    # my-pvc   Pending                                      fast-ssd

Debug steps:

    # Step 1: Check events
    k describe pvc my-pvc
    # Look at Events section for errors

2.2 Common Pending Causes
Cause 1: No matching PV exists (static provisioning)

    Events:
      Type    Reason         Message
      ----    ------         -------
      Normal  FailedBinding  no persistent volumes available for this claim

Solution: Create a PV that matches the PVC requirements:

    # Check what PVC needs
    k get pvc my-pvc -o yaml | grep -A5 spec:

    # Create matching PV or fix PVC to match existing PV

Cause 2: StorageClass doesn’t exist

    Events:
      Type     Reason              Message
      ----     ------              -------
      Warning  ProvisioningFailed  storageclass.storage.k8s.io "fast-ssd" not found

Solution:

    # List available StorageClasses
    k get sc

    # storageClassName cannot be changed to another class on an existing PVC,
    # so a patch like this one is rejected by the API server:
    #   k patch pvc my-pvc -p '{"spec":{"storageClassName":"standard"}}'
    # Delete the PVC and recreate it with an existing StorageClass instead
    k delete pvc my-pvc

Cause 3: No CSI driver/provisioner

    Events:
      Type     Reason              Message
      ----     ------              -------
      Warning  ProvisioningFailed  failed to provision volume: no csi driver

Solution: Install the required CSI driver for your storage backend
Cause 4: WaitForFirstConsumer mode
    k get pvc my-pvc
    # STATUS: Pending (this is normal until pod uses it!)

    k get sc fast-ssd -o jsonpath='{.volumeBindingMode}'
    # WaitForFirstConsumer

Solution: This is expected behavior! Create a pod that uses the PVC, and it will bind.

Pause and predict: You see a PVC in Pending status with no error events. Before diving into debugging, what single piece of information should you check on the StorageClass that might immediately explain the Pending status as normal behavior?
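The check described above can be wrapped in a small helper. This is a minimal sketch, not a standard tool: the function name pvc_pending_reason is made up for illustration, and it assumes kubectl is on the PATH and the PVC lives in the namespace you pass (default if omitted).

```bash
# Sketch: decide whether a Pending PVC is expected to be Pending.
# Usage: pvc_pending_reason <pvc-name> [namespace]
# (pvc_pending_reason is a hypothetical helper name.)
pvc_pending_reason() {
  pvc="$1"
  ns="${2:-default}"
  # StorageClass requested by the PVC (empty when none is set).
  sc="$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.spec.storageClassName}')"
  if [ -z "$sc" ]; then
    echo "no storageClassName set: look for a default StorageClass or a matching static PV"
    return 0
  fi
  # volumeBindingMode is a top-level field of the StorageClass object.
  mode="$(kubectl get sc "$sc" -o jsonpath='{.volumeBindingMode}')"
  if [ "$mode" = "WaitForFirstConsumer" ]; then
    echo "expected: StorageClass $sc waits for a pod to use the PVC"
  else
    echo "investigate: StorageClass $sc does not wait for a consumer (mode: ${mode:-unknown})"
  fi
}
```

Running pvc_pending_reason my-pvc before anything else separates the "normal Pending" case from the ones that need the remaining steps in this part.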
2.3 Access Mode Mismatch
Symptoms: PVC won’t bind even though PV exists

    k get pv
    # NAME   CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS
    # pv-1   100Gi      RWO            Retain           Available

    k get pvc
    # NAME     STATUS    ACCESS MODES   STORAGECLASS
    # my-pvc   Pending   RWX            manual

Problem: PVC requests RWX, PV only offers RWO

Solution:

    # Option 1: Change PVC to match PV
    k delete pvc my-pvc
    # Recreate with RWO

    # Option 2: Create new PV with RWX (if storage supports it)

2.4 StorageClass Mismatch

    k get pv pv-1 -o jsonpath='{.spec.storageClassName}'
    # manual

    k get pvc my-pvc -o jsonpath='{.spec.storageClassName}'
    # fast

Problem: PVC and PV have different storageClassName
Solution: Align storageClassName on both resources
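The two mismatch checks from 2.3 and 2.4 can be combined into one comparison. The sketch below is illustrative only: binding_mismatches is a hypothetical helper, it takes values you have already read from the PVC and PV, and it deliberately leaves out the capacity comparison (the PV must also be at least as large as the request).

```bash
# Report why a PV cannot satisfy a PVC, given values already read from both objects.
# Arguments: <pvc-class> <pv-class> <pvc-access-modes> <pv-access-modes>
# Access modes are space-separated, e.g. "ReadWriteOnce ReadOnlyMany".
# Returns 0 when nothing mismatches, 1 otherwise.
binding_mismatches() {
  pvc_class="$1"; pv_class="$2"; pvc_modes="$3"; pv_modes="$4"
  status=0
  if [ "$pvc_class" != "$pv_class" ]; then
    echo "storageClassName differs: PVC='$pvc_class' PV='$pv_class'"
    status=1
  fi
  for mode in $pvc_modes; do
    case " $pv_modes " in
      *" $mode "*) ;;  # the PV offers this mode
      *) echo "access mode $mode requested by the PVC is not offered by the PV"
         status=1 ;;
    esac
  done
  return "$status"
}

# Possible usage against a live cluster (object names are placeholders):
# binding_mismatches \
#   "$(kubectl get pvc my-pvc -o jsonpath='{.spec.storageClassName}')" \
#   "$(kubectl get pv pv-1 -o jsonpath='{.spec.storageClassName}')" \
#   "$(kubectl get pvc my-pvc -o jsonpath='{.spec.accessModes[*]}')" \
#   "$(kubectl get pv pv-1 -o jsonpath='{.spec.accessModes[*]}')"
```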
Part 3: Volume Mount Errors
3.1 Pod Stuck in ContainerCreating
Symptoms: Pod stays in ContainerCreating state

    k get pods
    # NAME     READY   STATUS              RESTARTS   AGE
    # my-pod   0/1     ContainerCreating   0          5m

Debug:

    k describe pod my-pod
    # Look for volume-related errors in Events

3.2 Common Mount Errors
Error 1: PVC not found

    Events:
      Warning  FailedMount  Unable to attach or mount volumes: persistentvolumeclaim "my-pvc" not found

Solution:

    # Check PVC exists in same namespace
    k get pvc my-pvc -n <namespace>

    # Fix pod spec if PVC name is wrong

Error 2: Volume already attached to another node

    Events:
      Warning  FailedAttachVolume  Multi-Attach error for volume "pvc-xxx": Volume is already attached to node "node-1"

Cause: RWO volume attached to another node (common during node failures)

Solution:

    # Wait for old pod to terminate, or force delete
    k delete pod <old-pod> --force --grace-period=0

    # If using StatefulSet, might need to delete old PV attachment
    k get volumeattachment

Error 3: Permission denied

    Events:
      Warning  FailedMount  MountVolume.SetUp failed: mount failed: exit status 32, permission denied

Solution:

    # Add securityContext to pod
    spec:
      securityContext:
        fsGroup: 1000        # Group ID for volume
      containers:
      - name: app
        securityContext:
          runAsUser: 1000    # User ID

Error 4: hostPath doesn’t exist

    Events:
      Warning  FailedMount  hostPath type check failed: path /data/myapp does not exist

Solution:

    # Use DirectoryOrCreate type
    volumes:
    - name: data
      hostPath:
        path: /data/myapp
        type: DirectoryOrCreate  # Instead of Directory

Stop and think: A pod is stuck in ContainerCreating and kubectl describe pod shows “Multi-Attach error for volume.” You know the volume is RWO. Before force-deleting the old pod, what should you check first? Could force-deleting cause data corruption?
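One thing worth checking before any force-delete is which node still holds the volume. The sketch below filters the table printed by kubectl get volumeattachment for a single PV; attachments_for_pv is a hypothetical helper, and the column order (NAME, ATTACHER, PV, NODE, ATTACHED, AGE) is an assumption about the default output that you should confirm on your cluster.

```bash
# Print the VolumeAttachment name and node for one PV.
# Reads `kubectl get volumeattachment` table output on stdin.
# Assumed columns: NAME ATTACHER PV NODE ATTACHED AGE
attachments_for_pv() {
  awk -v pv="$1" 'NR > 1 && $3 == pv { print $1 " -> " $4 " (attached=" $5 ")" }'
}

# Possible usage (the PV name is a placeholder):
# kubectl get volumeattachment | attachments_for_pv pvc-xxx
```

If the node it reports is still Ready and the old pod is still running there, force-deleting risks two writers on the same volume.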
3.3 Mount Timeout
    Events:
      Warning  FailedMount  Unable to attach or mount volumes: timeout expired waiting for volumes to attach

Causes:
- CSI driver not responding
- Storage backend unreachable
- Node issues
Debug:
    # Check CSI driver pods
    k get pods -n kube-system | grep csi

    # Check CSI driver logs
    k logs -n kube-system <csi-controller-pod> -c csi-provisioner

    # Check node conditions
    k describe node <node-name> | grep -A5 Conditions

Part 4: Capacity Problems
4.1 Volume Full
Symptoms: Application errors about disk space
Debug:
    # Check PVC capacity
    k get pvc my-pvc
    # CAPACITY: 10Gi

    # Check actual usage in pod
    k exec my-pod -- df -h /data
    # Shows actual usage

Solution 1: Expand PVC (if StorageClass supports it)

    # Check if expansion is allowed
    k get sc <storageclass> -o jsonpath='{.allowVolumeExpansion}'
    # true

    # Expand PVC
    k patch pvc my-pvc -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'

    # Monitor expansion status
    k describe pvc my-pvc | grep -A5 Conditions

Solution 2: Clean up data

    # Run the glob through a shell inside the container; without sh -c the * is never expanded there
    k exec my-pod -- sh -c 'rm -rf /data/tmp/*'

4.2 Insufficient Capacity

    Events:
      Warning  ProvisioningFailed  insufficient capacity

Causes:
- Storage backend is full
- Quota exceeded
- Regional capacity limits (cloud)
Debug:
    # Check ResourceQuota
    k get resourcequota -n <namespace>

    # Check LimitRange
    k get limitrange -n <namespace>

    # For cloud, check cloud console for capacity

Part 5: CSI Driver Issues
5.1 CSI Driver Not Installed
Symptoms: PVC stuck pending, events mention CSI

    k describe pvc my-pvc
    # Events:
    #   Warning  ProvisioningFailed  error getting CSI driver name

Debug:
    # List CSI drivers
    k get csidrivers

    # Check if driver pods are running
    k get pods -n kube-system | grep csi

    # Check CSINode objects
    k get csinode

5.2 CSI Driver Crashlooping

    k get pods -n kube-system | grep csi
    # NAME                     READY   STATUS             RESTARTS
    # ebs-csi-controller-xxx   0/6     CrashLoopBackOff   5

Debug:

    # Check logs
    k logs -n kube-system ebs-csi-controller-xxx -c csi-provisioner
    k logs -n kube-system ebs-csi-controller-xxx -c csi-attacher

    # Common causes:
    # - Missing cloud credentials
    # - Wrong IAM permissions
    # - Network connectivity issues

5.3 CSI Driver Permissions
For cloud storage, CSI drivers need appropriate permissions:

AWS: IAM role with EBS permissions

    # Check service account
    k get sa -n kube-system ebs-csi-controller-sa -o yaml
    # Look for eks.amazonaws.com/role-arn annotation

GCP: Workload Identity or node service account
Azure: Managed Identity or service principal
Pause and predict: You see a CSI controller pod in CrashLoopBackOff. The pod logs show “failed to assume IAM role.” The EBS CSI driver was working yesterday. What could have changed, and where would you look to verify the IAM configuration?
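Spotting an unhealthy CSI pod in a long pod listing is easy to script. The following is a rough sketch under two assumptions: the CSI pods have "csi" in their names (as in the examples in this part), and the input is the default table from kubectl get pods. The helper name unhealthy_csi_pods is invented for this example.

```bash
# List CSI pods that are not fully healthy.
# Reads `kubectl get pods` table output on stdin.
# Assumed columns: NAME READY STATUS RESTARTS AGE
unhealthy_csi_pods() {
  awk 'NR > 1 && $1 ~ /csi/ {
    split($2, ready, "/")                 # READY looks like "0/6"
    if ($3 != "Running" || ready[1] != ready[2])
      print $1 " " $3 " ready=" $2 " restarts=" $4
  }'
}

# Possible usage:
# kubectl get pods -n kube-system | unhealthy_csi_pods
```

Any pod it prints is the one whose container logs to read next.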
Part 6: Quick Reference: Error Messages
6.1 Error Message Cheatsheet
| Error Message | Likely Cause | Quick Fix |
|---|---|---|
| no persistent volumes available | No matching PV for static provisioning | Create matching PV |
| storageclass not found | Wrong StorageClass name | Check k get sc |
| waiting for first consumer | WaitForFirstConsumer mode | Create pod using PVC |
| Multi-Attach error | RWO volume on multiple nodes | Delete old pod first |
| permission denied | Filesystem permissions | Set fsGroup/runAsUser |
| path does not exist | hostPath missing | Use DirectoryOrCreate |
| timeout waiting for volumes | CSI driver issue | Check CSI pods/logs |
| insufficient capacity | No space in storage backend | Expand or clean up |
| volume is already attached | Stale volume attachment | Delete VolumeAttachment |
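The cheatsheet can also be expressed as a lookup, which is handy when pasting an event message during practice. This is only a sketch: storage_hint is a hypothetical function, the patterns are substring matches taken from the table rows, and real event text varies between Kubernetes versions and drivers.

```bash
# Map a storage-related event message to the quick fix from the cheatsheet.
# Patterns are checked top to bottom; the first match wins.
storage_hint() {
  case "$1" in
    *"no persistent volumes available"*) echo "create a matching PV" ;;
    *storageclass*"not found"*)          echo "check StorageClass names: k get sc" ;;
    *"waiting for first consumer"*)      echo "create a pod that uses the PVC" ;;
    *"Multi-Attach error"*)              echo "delete the old pod first" ;;
    *"already attached"*)                echo "look for a stale VolumeAttachment" ;;
    *"permission denied"*)               echo "set fsGroup/runAsUser" ;;
    *"does not exist"*)                  echo "use hostPath type DirectoryOrCreate" ;;
    *"waiting for volumes"*)             echo "check CSI pods and logs" ;;
    *"insufficient capacity"*)           echo "expand or clean up" ;;
    *)                                   echo "no hint: read the full event text" ;;
  esac
}
```

The Multi-Attach pattern is listed before the "already attached" one on purpose, because a Multi-Attach message contains both phrases.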
6.2 Quick Debug Commands
    # One-liner for common checks
    echo "=== PVCs ===" && k get pvc && \
    echo "=== PVs ===" && k get pv && \
    echo "=== StorageClasses ===" && k get sc && \
    echo "=== Recent Events ===" && k get events --sort-by='.lastTimestamp' | tail -20

Common Mistakes
| Mistake | Problem | Solution |
|---|---|---|
| Not checking Events | Missing the actual error message | Always k describe first |
| Ignoring namespace | PVC in different namespace than pod | Verify namespace matches |
| Forgetting WaitForFirstConsumer | Thinking PVC is broken when Pending | Check volumeBindingMode |
| Deleting PVC before pod | Pod can’t unmount properly | Delete pod first |
| Not checking CSI logs | Generic errors hide real cause | Check CSI driver pods |
| Wrong YAML indentation | Volume spec invalid | Use --dry-run=client -o yaml |
Q1: Systematic Triage
A developer reports their pod has been stuck in ContainerCreating for 10 minutes. They have already tried deleting and recreating the pod twice. Walk through the exact sequence of commands you would run to diagnose this, starting from the pod and working down through the storage stack. At each step, what specific information are you looking for?
Answer
Start with kubectl describe pod <name> and look at the Events section for volume-related errors (FailedMount, FailedAttach, timeout). This tells you whether the issue is a missing PVC, a mount error, or an attach error. Next, kubectl get pvc <name> to check if the PVC is Bound — if it shows Pending, the problem is upstream. If Pending, kubectl describe pvc <name> reveals why: “storageclass not found,” “no persistent volumes available,” or “waiting for first consumer.” Then check kubectl get sc to verify the StorageClass exists and has the right provisioner. Finally, kubectl get pods -n kube-system | grep csi to verify the CSI driver is running, and check its logs with kubectl logs. Each step narrows the problem: pod events tell you the symptom, PVC status tells you where the chain breaks, and CSI logs reveal the root cause.
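The first hops of that sequence can be collapsed into one pass. The sketch below is illustrative rather than a replacement for reading the Events: trace_storage is a made-up name, it assumes kubectl is on the PATH, and it only prints what each PVC referenced by the pod reports about itself.

```bash
# Walk pod -> PVC and print one summary line per claim the pod references.
# Usage: trace_storage <pod-name> [namespace]
trace_storage() {
  pod="$1"
  ns="${2:-default}"
  # Claim names of every PVC-backed volume in the pod spec.
  claims="$(kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.spec.volumes[*].persistentVolumeClaim.claimName}')"
  for claim in $claims; do
    phase="$(kubectl get pvc "$claim" -n "$ns" -o jsonpath='{.status.phase}')"
    pv="$(kubectl get pvc "$claim" -n "$ns" -o jsonpath='{.spec.volumeName}')"
    sc="$(kubectl get pvc "$claim" -n "$ns" -o jsonpath='{.spec.storageClassName}')"
    # Empty values mean the field is unset or the PVC does not exist.
    echo "pvc=$claim phase=${phase:-missing} pv=${pv:-none} storageclass=${sc:-none}"
  done
}
```

A line with phase=Pending or phase=missing tells you to stop at the PVC; a Bound claim means the problem is further down, at attach or mount time.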
Q2: The Pending PVC That Is Not Broken
A new team member files an urgent ticket: “PVC data-volume has been Pending for 2 hours, nothing is working!” You check kubectl describe pvc data-volume and see no error events, just a normal “waiting for first consumer to be created” message. The team member insists something is wrong because other PVCs bind immediately. How do you explain what is happening, and what should the team member do next?
Answer
The PVC is using a StorageClass with volumeBindingMode: WaitForFirstConsumer. This means the PV is deliberately not provisioned until a pod that uses the PVC is scheduled. This is the correct behavior for zone-specific storage (like AWS EBS or GCE PD) to ensure the volume is created in the same availability zone as the pod. The “other PVCs” that bind immediately likely use a StorageClass with Immediate binding mode or an NFS-type provisioner. The team member needs to create a pod that references this PVC in its volumes section. Once the scheduler assigns the pod to a node, the provisioner will create the PV in the correct zone, the PVC will bind, and the pod will start. This is not a bug — it is a feature that prevents cross-zone mount failures.
Q3: Node Failure and Multi-Attach
A 3-node cluster loses node-2 (it goes NotReady). A StatefulSet pod on node-2 used an RWO EBS volume. Kubernetes tries to reschedule the pod to node-3, but the new pod is stuck in ContainerCreating with “Multi-Attach error for volume: Volume is already exclusively attached to node-2.” The old pod shows Terminating but will not complete because node-2 is down. What are the steps to recover, and what are the risks of force-deleting the old pod?
Answer
The RWO volume is still attached to the unreachable node-2, and the new pod on node-3 cannot attach it simultaneously. Recovery steps: (1) Verify node-2 is truly down: kubectl get node node-2 and check conditions. (2) Force-delete the stuck pod: kubectl delete pod <name> --force --grace-period=0. This removes the pod from the API server but does not cleanly unmount the volume on node-2. (3) Delete the stale VolumeAttachment: kubectl get volumeattachment, find the one for this volume, and kubectl delete volumeattachment <name>. This tells the control plane to release the volume. (4) The new pod on node-3 should now be able to attach the volume. Risks: if node-2 is only partitioned rather than down, the old container may still be writing, and attaching the volume elsewhere gives it two writers, which is why step 1 comes first. Even when the node really is down, the volume was never cleanly unmounted, so data the application was writing at that moment (like database WAL files) may be incomplete. After recovery, run a filesystem check (e.g., fsck) if the volume does not mount cleanly, and an application-level integrity check (e.g., a database repair).
Q4: Permission Denied After Migration
A team migrates their application to a new container image that runs as uid 1000 (previously ran as root). After the migration, the pod starts but the application logs show “Permission denied: cannot write to /data/app.log.” The PVC mounts successfully and kubectl describe pod shows no errors. What is the root cause, and what is the correct fix without reverting to running as root?
Answer
The root cause is a filesystem ownership mismatch. The PV’s files were created by the previous container running as root (uid 0), so they are owned by root. The new container runs as uid 1000 and cannot write to root-owned files. The correct fix is to set fsGroup in the pod’s security context: spec.securityContext.fsGroup: 1000. This tells the kubelet to recursively change the group ownership of all files in the mounted volume to GID 1000, and set the setgid bit so new files inherit this group. Additionally, set runAsUser: 1000 and runAsNonRoot: true in the container’s securityContext. Do NOT revert to running as root — that would be a security regression. For volumes with many files, the fsGroup change can cause slow pod startup on first mount, which is a known trade-off.
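To see why fsGroup helps, it can be useful to replay the Unix permission check by hand. The sketch below is a deliberately simplified model (no ACLs, no capabilities, no read-only mounts); can_write is a hypothetical helper, and the "after" case assumes the volume's files end up group-owned by the fsGroup and group-writable.

```bash
# Simplified Unix write check for one file.
# Usage: can_write <mode-string> <owner-uid> <owner-gid> <process-uid> <process-gids...>
# The mode string is the first column of `ls -l`, e.g. -rw-r--r--
can_write() {
  mode="$1"; owner="$2"; group="$3"; uid="$4"
  shift 4
  if [ "$uid" = "0" ]; then
    bit="w"                                           # root bypasses the mode bits
  elif [ "$uid" = "$owner" ]; then
    bit="$(printf '%s' "$mode" | cut -c3)"            # owner write bit
  else
    bit="$(printf '%s' "$mode" | cut -c9)"            # "other" write bit
    for gid in "$@"; do
      if [ "$gid" = "$group" ]; then
        bit="$(printf '%s' "$mode" | cut -c6)"        # group write bit
      fi
    done
  fi
  if [ "$bit" = "w" ]; then echo yes; else echo no; fi
}

# Before the fix: root-owned file, process runs as uid 1000 in group 1000.
# can_write -rw-r--r-- 0 0 1000 1000
# After fsGroup: 1000 (assumed result): group 1000 owns the file and may write.
# can_write -rw-rw-r-- 0 1000 1000 1000
```

In a real pod, kubectl exec <pod> -- id and kubectl exec <pod> -- ls -ln /data supply the numbers to plug in.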
Q5: Dynamic Provisioning Silently Failing
A PVC referencing StorageClass premium-ssd stays Pending. kubectl describe pvc shows the event: “waiting for a volume to be created, either by external provisioner ‘ebs.csi.aws.com’ or manually created by system administrator.” The StorageClass exists and looks correct. Other PVCs using the default StorageClass work fine. Where do you look next, and what are the three most likely causes?
Answer
The error means the PVC reached the provisioner but the provisioner has not acted. Next step: check the CSI driver pods in kube-system with kubectl get pods -n kube-system | grep csi and then check their logs with kubectl logs -n kube-system <csi-controller-pod> -c csi-provisioner. The three most likely causes: (1) CSI controller is crashlooping or not running — the provisioner sidecar cannot process the request. Check pod status and restart counts. (2) IAM/permission issue — the CSI driver lacks permission to create EBS volumes (e.g., expired IAM role, wrong IRSA annotation on the service account, or missing ec2:CreateVolume permission). Check kubectl get sa -n kube-system ebs-csi-controller-sa -o yaml for the role-arn annotation. (3) Invalid StorageClass parameters — the parameters section contains values the provisioner cannot use (wrong type, invalid kmsKeyId, or unsupported iopsPerGB value). The CSI provisioner logs will show the specific error from the AWS API.
Q6: Full Troubleshooting Scenario
On exam day, you are given this scenario: A Deployment with 2 replicas is failing. Pod-1 is Running, Pod-2 is ContainerCreating. Both reference PVC app-data. The PVC is Bound to a PV with access mode RWO. The StorageClass uses volumeBindingMode: WaitForFirstConsumer and reclaimPolicy: Delete. kubectl get pods -o wide shows Pod-1 on node-A and Pod-2 scheduled to node-B. What is the problem, and provide two distinct solutions (one quick fix for the exam, one proper production fix).
Answer
The problem is that both pods share one PVC with RWO access mode, and they are on different nodes. RWO means the volume can only be attached to one node at a time. Pod-1 on node-A has the volume, so Pod-2 on node-B gets a Multi-Attach error. Quick exam fix: Scale the Deployment to 1 replica (kubectl scale deploy <name> --replicas=1) so only one pod needs the volume. Or, add a nodeAffinity/nodeSelector to force all pods onto the same node (RWO allows multiple pods on the same node). Proper production fix: Convert the Deployment to a StatefulSet with volumeClaimTemplates, giving each replica its own independent PVC and PV. Alternatively, if the workload truly needs shared storage, switch to a storage backend that supports ReadWriteMany (RWX) like NFS or EFS, and update the PVC access mode. The reclaimPolicy: Delete is also risky for a stateful app — consider changing to Retain on the PV.
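For the production fix, the shape of a StatefulSet with volumeClaimTemplates looks roughly like the manifest printed below. Treat it as a sketch: the function name, the app-data claim name, the busybox image, and the 1Gi size are placeholders, and serviceName assumes a headless Service of the same name exists.

```bash
# Print a minimal StatefulSet manifest in which every replica gets its own PVC.
# Usage: statefulset_manifest <name> [replicas]
statefulset_manifest() {
  name="$1"
  replicas="${2:-2}"
  cat <<EOF
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ${name}
spec:
  serviceName: ${name}   # assumes a headless Service with this name
  replicas: ${replicas}
  selector:
    matchLabels:
      app: ${name}
  template:
    metadata:
      labels:
        app: ${name}
    spec:
      containers:
      - name: app
        image: busybox:1.36
        command: ['sleep', '3600']
        volumeMounts:
        - name: app-data
          mountPath: /data
  volumeClaimTemplates:   # one PVC per replica: app-data-${name}-0, app-data-${name}-1, ...
  - metadata:
      name: app-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
EOF
}

# Possible usage: statefulset_manifest app 2 | kubectl apply -f -
```

Because each replica binds its own RWO volume, the pods can land on different nodes without a Multi-Attach error.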
Hands-On Exercise: Storage Troubleshooting Scenarios
    # Create namespace
    k create ns storage-debug

    # We'll create broken configurations and fix them

Scenario 1: PVC Won’t Bind (Wrong StorageClass)
    # Create a PVC with wrong StorageClass
    cat <<EOF | k apply -f -
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: broken-pvc-1
      namespace: storage-debug
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
      storageClassName: nonexistent-class
    EOF

    # Check status
    k get pvc -n storage-debug broken-pvc-1

    # Debug
    k describe pvc -n storage-debug broken-pvc-1
    # Look for: storageclass "nonexistent-class" not found

    # Fix: List available StorageClasses and recreate PVC
    k get sc
    k delete pvc -n storage-debug broken-pvc-1

    # Recreate with correct StorageClass (use your cluster's SC)
    cat <<EOF | k apply -f -
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: broken-pvc-1
      namespace: storage-debug
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
      storageClassName: standard  # Use actual SC name
    EOF

Scenario 2: Pod Can’t Mount (Wrong PVC Name)
    # Create a valid PVC
    cat <<EOF | k apply -f -
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: correct-pvc
      namespace: storage-debug
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
    EOF

    # Create pod with wrong PVC reference
    cat <<EOF | k apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: broken-pod-1
      namespace: storage-debug
    spec:
      containers:
      - name: app
        image: busybox:1.36
        command: ['sleep', '3600']
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: wrong-pvc-name  # This doesn't exist!
    EOF

    # Check pod status
    k get pod -n storage-debug broken-pod-1
    # STATUS: Pending or ContainerCreating (a pod whose PVC is missing usually stays Pending, unscheduled)

    # Debug
    k describe pod -n storage-debug broken-pod-1
    # Look for: persistentvolumeclaim "wrong-pvc-name" not found

    # Fix
    k delete pod -n storage-debug broken-pod-1

    cat <<EOF | k apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: broken-pod-1
      namespace: storage-debug
    spec:
      containers:
      - name: app
        image: busybox:1.36
        command: ['sleep', '3600']
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: correct-pvc  # Fixed!
    EOF

Scenario 3: hostPath Type Error
    # Create pod with strict hostPath type
    cat <<EOF | k apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: broken-pod-2
      namespace: storage-debug
    spec:
      containers:
      - name: app
        image: busybox:1.36
        command: ['sleep', '3600']
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        hostPath:
          path: /tmp/nonexistent-path-xyz
          type: Directory  # Fails if directory doesn't exist
    EOF

    # Debug
    k describe pod -n storage-debug broken-pod-2
    # May show: hostPath type check failed

    # Fix: Use DirectoryOrCreate
    k delete pod -n storage-debug broken-pod-2

    cat <<EOF | k apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: broken-pod-2
      namespace: storage-debug
    spec:
      containers:
      - name: app
        image: busybox:1.36
        command: ['sleep', '3600']
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        hostPath:
          path: /tmp/nonexistent-path-xyz
          type: DirectoryOrCreate  # Creates if missing
    EOF

Success Criteria
- Identified StorageClass error from events
- Fixed PVC to use correct StorageClass
- Identified wrong PVC name from pod events
- Fixed pod to reference correct PVC
- Understood hostPath type requirements
Cleanup
    k delete ns storage-debug

Practice Drills
Drill 1: Quick Status Check (1 min)

    # Task: Check if all PVCs in namespace are Bound
    k get pvc -n <namespace>
    # Look for any not showing "Bound"

Drill 2: Find PVC Events (1 min)

    # Task: Get events for a specific PVC
    k describe pvc <pvc-name> | grep -A20 Events

Drill 3: Check Volume in Pod (2 min)

    # Task: Verify a volume is mounted correctly in pod
    k exec <pod> -- df -h
    k exec <pod> -- ls -la /data

Drill 4: Debug ContainerCreating (2 min)

    # Task: Find why pod is stuck in ContainerCreating
    k describe pod <pod-name>
    # Check Events for mount errors

Drill 5: Check CSI Driver Status (2 min)

    # Task: Verify CSI driver is running
    k get pods -n kube-system | grep csi
    k get csidrivers

Drill 6: Find Matching PV (2 min)

    # Task: Find why PVC won't bind to existing PV
    # -A1 also prints the line after each match, where the accessModes values sit
    k get pvc <pvc-name> -o yaml | grep -A1 -E 'storage:|accessModes:|storageClassName:'
    k get pv <pv-name> -o yaml | grep -A1 -E 'storage:|accessModes:|storageClassName:'
    # Compare values

Drill 7: Check VolumeAttachments (1 min)

    # Task: List all volume attachments
    k get volumeattachment
    # Useful for debugging Multi-Attach errors

Drill 8: Get Recent Storage Events (1 min)

    # Task: Get recent events related to PVCs
    k get events --field-selector reason=FailedBinding
    k get events --field-selector reason=ProvisioningFailed

Summary: Storage Troubleshooting Checklist

    □ Pod stuck? → k describe pod → check Events
    □ PVC Pending? → k describe pvc → check Events
    □ StorageClass exists? → k get sc
    □ PV available? → k get pv
    □ Access modes match? → Compare PVC and PV
    □ StorageClassName match? → Compare PVC and PV
    □ CSI driver running? → k get pods -n kube-system | grep csi
    □ Permissions issue? → Check securityContext fsGroup
    □ Capacity issue? → Check quotas and storage backend

Next Steps
Congratulations! You’ve completed Part 4: Storage. You should now be able to:
- Configure volumes (emptyDir, hostPath, projected)
- Work with PersistentVolumes and PersistentVolumeClaims
- Use StorageClasses for dynamic provisioning
- Create and restore from volume snapshots
- Troubleshoot common storage issues
Continue to the Part 4 Cumulative Quiz to test your knowledge, then proceed to Part 5: Troubleshooting.