Intermediate · Scenario
10 min

Node Disk Full

Linux · Storage · Kubernetes · Troubleshooting
Interview Question

One of your production nodes is reporting 100% disk usage and workloads are failing. How do you investigate and resolve this?

Key Points to Cover
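  • Run `df -h` to identify which filesystem is full, then `du` on that filesystem to find which paths consume the most space
  • Check the usual suspects: logs, temp directories, and container runtime storage (overlay layers, images, volumes)
  • Rotate or truncate large logs and remove stale container images (e.g. `crictl rmi --prune` or `docker image prune`)
  • If the issue keeps recurring, expand the disk or attach a new volume
  • Add monitoring and alerts so that disk usage trends are caught before the disk fills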
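The first and last key points can be sketched as a `df`-style check that a monitoring job could run on a schedule. This is a minimal sketch, assuming a Linux node (it reads /proc/mounts); the pseudo-filesystem list, the 85% threshold, and the names `parse_mounts` and `usage_report` are illustrative choices, not part of any standard tool. On a live node `df -h` answers the same question faster; the sketch mainly shows what `df` measures.

```python
import shutil

# Pseudo filesystems that never hold workload data (an assumed, non-exhaustive list).
PSEUDO_FS = {
    "proc", "sysfs", "cgroup", "cgroup2", "devpts", "mqueue", "securityfs",
    "debugfs", "tracefs", "pstore", "bpf", "configfs", "fusectl", "autofs",
}


def parse_mounts(lines):
    """Return sorted mount points from /proc/mounts-style lines, skipping pseudo filesystems."""
    points = set()
    for line in lines:
        fields = line.split()
        # Format: device mountpoint fstype options dump pass
        if len(fields) >= 3 and fields[2] not in PSEUDO_FS:
            # The kernel escapes spaces in mount paths as \040.
            points.add(fields[1].replace("\\040", " "))
    return sorted(points)


def usage_report(mountpoints, threshold_pct=85.0):
    """Return (rows, alerts); each row is (mountpoint, percent_used, bytes_available)."""
    rows = []
    for mp in mountpoints:
        try:
            du = shutil.disk_usage(mp)
        except OSError:
            continue  # unreadable or vanished mount
        # Mirrors df's Use% (used / (used + available)) without df's rounding up;
        # du.free is the space available to unprivileged users.
        denom = du.used + du.free
        pct = 100.0 * du.used / denom if denom else 0.0
        rows.append((mp, pct, du.free))
    rows.sort(key=lambda row: row[1], reverse=True)  # fullest first
    alerts = [row for row in rows if row[1] >= threshold_pct]
    return rows, alerts


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as fh:
            mounts = parse_mounts(fh)
    except FileNotFoundError:
        mounts = ["/"]
    rows, alerts = usage_report(mounts)
    alert_mounts = {row[0] for row in alerts}
    for mp, pct, free in rows:
        flag = "  <-- above threshold" if mp in alert_mounts else ""
        print(f"{pct:6.1f}%  {free / 2**30:8.1f} GiB free  {mp}{flag}")
```

Sorting fullest-first puts the filesystem to investigate at the top, which is the step the pitfalls below warn against skipping.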
Evaluation Rubric
  • Identifies the full disk using system tools (30% weight)
  • Finds the culprit directories/files (30% weight)
  • Applies cleanup or expansion appropriately (20% weight)
  • Mentions preventive monitoring (20% weight)
Hints
  • 💡Check container image layers in Docker's overlay2 storage or containerd's overlayfs snapshotter.
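Following the hint usually means ranking the largest subdirectories under the container runtime's storage root, which is what `du -xh --max-depth=1 <dir> | sort -h` does. A minimal Python sketch of that ranking follows; the paths in the `__main__` block are assumed defaults for Docker's overlay2 driver and containerd's overlayfs snapshotter and may differ if the runtime's root directory is configured elsewhere, and the function names are illustrative.

```python
import os
import stat


def disk_usage_bytes(path):
    """Approximate `du -sx` for one path: allocated bytes, hard links counted once,
    without descending into other filesystems mounted below it."""
    try:
        root_st = os.lstat(path)  # lstat: never follow symlinks
    except OSError:
        return 0
    root_dev = root_st.st_dev
    seen = {(root_dev, root_st.st_ino)}
    total = root_st.st_blocks * 512  # st_blocks is in 512-byte units on Linux
    if not stat.S_ISDIR(root_st.st_mode):
        return total
    for dirpath, dirs, files in os.walk(path):
        same_fs_dirs = []
        for name in dirs + files:
            full = os.path.join(dirpath, name)
            try:
                st = os.lstat(full)
            except OSError:
                continue  # vanished or permission denied; du would print a warning
            if st.st_dev != root_dev:
                continue  # another filesystem is mounted here; skip it like `du -x`
            if (st.st_dev, st.st_ino) not in seen:
                seen.add((st.st_dev, st.st_ino))
                total += st.st_blocks * 512
            if stat.S_ISDIR(st.st_mode):
                same_fs_dirs.append(name)
        dirs[:] = same_fs_dirs  # prune the walk to real directories on this filesystem
    return total


def largest_children(root, top=10):
    """Return the `top` largest immediate children of root as (bytes, path), largest first."""
    with os.scandir(root) as it:
        results = [(disk_usage_bytes(entry.path), entry.path) for entry in it]
    results.sort(reverse=True)
    return results[:top]


if __name__ == "__main__":
    # Assumed default locations; the runtime's configured root may differ.
    candidates = (
        "/var/lib/docker/overlay2",
        "/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots",
        "/var/log/pods",
    )
    for candidate in candidates:
        if os.path.isdir(candidate):
            print(candidate)
            for size, path in largest_children(candidate, top=5):
                print(f"  {size / 2**20:10.1f} MiB  {path}")
```

Staying on the starting filesystem matters here: running containers have overlay "merged" mounts inside the storage directory, and walking into them would count the lower layers a second time. For Docker, `docker system df` also summarizes image, container, and volume usage directly.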
Common Pitfalls to Avoid
  • ⚠️Failing to immediately identify the specific filesystem that is full.
  • ⚠️Spending too much time on log analysis without first using `df` and `du` to pinpoint the largest consumers.
  • ⚠️Not considering container runtime storage (image layers, volumes) as a potential cause in a Kubernetes environment.
  • ⚠️Making broad assumptions about the cause without systematically investigating the node and its workloads.
  • ⚠️Deleting files indiscriminately without understanding their purpose, potentially causing further system instability.
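The last pitfall argues for reviewing before deleting. A minimal sketch of a dry-run finder that only lists large, old regular files as cleanup candidates; the 100 MiB and 7-day defaults, the /var/log starting point, and the name `cleanup_candidates` are hypothetical choices for illustration.

```python
import os
import stat
import time


def cleanup_candidates(root, min_bytes=100 * 2**20, min_age_days=7, now=None):
    """List large, old regular files under root as (size_bytes, age_days, path).

    Dry run only: nothing is deleted, so each path can be reviewed (which
    process writes it? is it still open? is it already rotated?) before acting.
    """
    now = time.time() if now is None else now
    found = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # vanished or permission denied
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, device nodes
            age_days = (now - st.st_mtime) / 86400
            if st.st_size >= min_bytes and age_days >= min_age_days:
                found.append((st.st_size, age_days, path))
    found.sort(reverse=True)  # largest first
    return found


if __name__ == "__main__":
    for size, age, path in cleanup_candidates("/var/log"):
        print(f"{size / 2**20:8.1f} MiB  {age:5.1f} days old  {path}")
```

Reporting instead of deleting is deliberate. Also note that deleting a file a process still holds open does not free its space until that process closes it (`df` keeps showing the disk as full while `du` no longer counts the file; `lsof +L1` lists such files), so rotating or truncating an active log is usually safer than removing it.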
Potential Follow-up Questions
  • What if /var/lib/docker fills up?
  • How would you detect and prevent inode exhaustion?
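For the inode follow-up: a filesystem can refuse new files while it still has free bytes if it runs out of inodes, which `df -i` reports. A minimal sketch, assuming a Linux filesystem reachable through `os.statvfs`; the function names and the /var starting point are illustrative, and the per-directory count is a rough ranking (hard links and nested mounts are not special-cased).

```python
import os


def inode_usage(path="/"):
    """Return (used, total, percent) inodes for the filesystem holding path, like `df -i`.

    Returns None when the filesystem reports no fixed inode count
    (some filesystems allocate inodes dynamically and report 0).
    """
    st = os.statvfs(path)
    if st.f_files == 0:
        return None
    used = st.f_files - st.f_ffree
    return used, st.f_files, 100.0 * used / st.f_files


def busiest_dirs(root, top=10):
    """Rank root's immediate subdirectories by how many entries (inodes) they contain."""
    counts = []
    with os.scandir(root) as it:
        for entry in it:
            if not entry.is_dir(follow_symlinks=False):
                continue
            n = 0
            for _dirpath, dirs, files in os.walk(entry.path):
                n += len(dirs) + len(files)
            counts.append((n, entry.path))
    counts.sort(reverse=True)  # most entries first
    return counts[:top]


if __name__ == "__main__":
    usage = inode_usage("/")
    if usage is None:
        print("/: filesystem does not report a fixed inode count")
    else:
        used, total, pct = usage
        print(f"/: {used}/{total} inodes used ({pct:.1f}%)")
    for n, path in busiest_dirs("/var", top=5):
        print(f"{n:10d}  {path}")
```

Inode exhaustion usually comes from huge numbers of tiny files, so the directory with the most entries is the place to clean up. On ext4 the inode count is fixed when the filesystem is created, so a node that legitimately stores many small files needs a smaller bytes-per-inode ratio at `mke2fs -i` time, plus an alert on inode usage alongside the byte-usage alert.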