Intermediate · Scenario
10 min
Node Disk Full
Linux, Storage, Kubernetes, Troubleshooting
Interview Question
One of your production nodes is reporting 100% disk usage and workloads are failing. How do you investigate and resolve this?
Key Points to Cover
- Run `df` to identify which filesystem is full, then `du` to find the largest paths on it
- Check logs, temp directories, and container overlay storage
- Rotate or delete large logs and remove stale container images
- Expand the disk or attach a new volume if the problem persists
- Add monitoring to detect disk usage trends earlier
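The first two points can be sketched in Python for cases where you want the check in a script or a monitoring hook rather than typed by hand. This is a minimal sketch, not a replacement for `df` and `du`: the function names `usage_percent` and `largest_subdirs` are hypothetical, sizes are apparent file sizes rather than allocated blocks, and the percentage may differ slightly from `df` output because reserved blocks are not accounted for in the same way.

```python
import os
import shutil


def usage_percent(path):
    """Return used space on the filesystem containing `path`, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def largest_subdirs(root, top_n=5):
    """Return the top_n immediate subdirectories of `root` by total file size.

    Sizes are apparent sizes (st_size), summed over all files below each
    subdirectory. Symlinks are not followed.
    """
    sizes = {}
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                try:
                    total += os.lstat(os.path.join(dirpath, name)).st_size
                except OSError:
                    pass  # file vanished or is unreadable; skip it
        sizes[entry.path] = total
    # Largest consumers first.
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

On a node you might call `usage_percent("/var")` to confirm which mount is full and then `largest_subdirs("/var/lib")` to see whether logs or container runtime storage dominate. The paths here are illustrative; use whichever mount `df` reports as full.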
Evaluation Rubric
- Identifies full disk using system tools (30% weight)
- Finds culprit dirs/files (30% weight)
- Applies cleanup/expansion appropriately (20% weight)
- Mentions preventive monitoring (20% weight)
Hints
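- 💡Check container runtime storage: Docker's overlay2 layers (by default under /var/lib/docker) and containerd's overlayfs snapshots (by default under /var/lib/containerd).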
Common Pitfalls to Avoid
- ⚠️Failing to immediately identify the specific filesystem that is full.
- ⚠️Spending too much time on log analysis without first using `df` and `du` to pinpoint the largest consumers.
- ⚠️Not considering container runtime storage (image layers, volumes) as a potential cause in a Kubernetes environment.
- ⚠️Making broad assumptions about the cause without systematically investigating the node and its workloads.
- ⚠️Deleting files indiscriminately without understanding their purpose, potentially causing further system instability.
Potential Follow-up Questions
- ❓What if /var/lib/docker fills up?
- ❓How do you prevent inode exhaustion?
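On the inode question, a filesystem can refuse new files while still showing free space, so a good answer checks inode usage (`df -i`) alongside byte usage. A minimal Python sketch of the same check follows; the function name `inode_usage_percent` is hypothetical, and the code assumes a Unix-like node where `os.statvfs` is available.

```python
import os


def inode_usage_percent(path):
    """Return the percentage of inodes in use on the filesystem containing `path`.

    Returns None when the filesystem reports no fixed inode count
    (f_files == 0), which some filesystems do.
    """
    st = os.statvfs(path)
    if st.f_files == 0:
        return None
    return 100.0 * (st.f_files - st.f_ffree) / st.f_files
```

Alerting on this value with the same threshold used for disk space is one way to catch workloads that create very large numbers of small files before the node runs out of inodes.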