Intermediate, Scenario
10 min

Node CPU Thrashing

Linux, Performance, Troubleshooting
Interview Question

One node in your cluster shows 100% CPU usage with context switching spikes. How do you troubleshoot?

Key Points to Cover
  • Run top/htop/mpstat to confirm CPU saturation
  • Check for high context-switch or interrupt rates (e.g. with vmstat or pidstat -w)
  • Identify runaway processes or noisy neighbors
  • Adjust CPU limits or reschedule workloads
  • Tune kernel parameters if the problem is systemic
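
The first two points can be sketched in code. The example below is a minimal sketch, not a replacement for top or mpstat: it parses two snapshots of /proc/stat and reports the CPU busy percentage and the context-switch rate between them. The function names (parse_proc_stat, cpu_and_ctxt_rate, sample_live) are hypothetical, and the sketch assumes a Linux host where /proc/stat has the usual "cpu" and "ctxt" lines.

```python
import time


def parse_proc_stat(text):
    """Return (busy_jiffies, total_jiffies, context_switches) from /proc/stat text."""
    busy = total = ctxt = 0
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "cpu":
            # Aggregate line: user nice system idle iowait irq softirq steal [guest guest_nice]
            values = [int(v) for v in fields[1:]]
            idle = values[3] + (values[4] if len(values) > 4 else 0)
            # guest and guest_nice are already counted in user and nice, so sum only the first 8.
            total = sum(values[:8])
            busy = total - idle
        elif fields[0] == "ctxt":
            ctxt = int(fields[1])
    return busy, total, ctxt


def cpu_and_ctxt_rate(before, after, interval_s):
    """Compare two /proc/stat snapshots; return (busy_percent, context_switches_per_second)."""
    b0, t0, c0 = parse_proc_stat(before)
    b1, t1, c1 = parse_proc_stat(after)
    delta_total = t1 - t0
    busy_pct = 100.0 * (b1 - b0) / delta_total if delta_total > 0 else 0.0
    return busy_pct, (c1 - c0) / interval_s


def sample_live(interval_s=1.0):
    """Take two live snapshots on a Linux host (assumes /proc/stat is readable)."""
    with open("/proc/stat") as f:
        before = f.read()
    time.sleep(interval_s)
    with open("/proc/stat") as f:
        after = f.read()
    return cpu_and_ctxt_rate(before, after, interval_s)
```

On the affected node, calling sample_live() a few times and comparing the result with a healthy node gives a rough baseline. A busy percentage near 100 combined with a context-switch rate far above that baseline is consistent with thrashing rather than plain compute load.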
Evaluation Rubric
Uses CPU monitoring tools: 30% weight
Finds process/neighbors causing thrash: 30% weight
Mitigates via limits/rescheduling: 20% weight
Mentions tuning or safeguards: 20% weight
Hints
  • 💡 Noisy-neighbor issues are common in shared clusters.
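
One way to look for a noisy neighbor is to rank processes by involuntary context switches, since a process that is frequently preempted is competing for CPU. The sketch below is illustrative only: the function names are hypothetical, and it assumes a Linux host where /proc/<pid>/status exposes the voluntary_ctxt_switches and nonvoluntary_ctxt_switches fields. These counters are cumulative since process start, so in practice you would sample twice and compare the difference.

```python
import os


def parse_ctxt_switches(status_text):
    """Extract (voluntary, nonvoluntary) context-switch counts from /proc/<pid>/status text."""
    counts = {"voluntary_ctxt_switches": 0, "nonvoluntary_ctxt_switches": 0}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in counts:
            counts[key] = int(value.strip())
    return counts["voluntary_ctxt_switches"], counts["nonvoluntary_ctxt_switches"]


def rank_by_involuntary(statuses, top_n=5):
    """Given {pid: status_text}, return the top_n (pid, nonvoluntary) pairs, highest first."""
    ranked = [(pid, parse_ctxt_switches(text)[1]) for pid, text in statuses.items()]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


def read_live_statuses():
    """Collect status text for every visible process (assumes a Linux /proc filesystem)."""
    statuses = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/status") as f:
                statuses[int(entry)] = f.read()
        except OSError:
            # The process exited between listing and reading; skip it.
            continue
    return statuses
```

Calling rank_by_involuntary(read_live_statuses()) on the node gives a starting list of candidates, which can then be mapped back to containers or pods.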
Potential Follow-up Questions
  • How do you prevent CPU starvation?
  • What about CPU pinning?
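
For the CPU pinning follow-up, a minimal sketch of pinning at the process level is shown below. It assumes a Linux host, where Python exposes os.sched_setaffinity. The helpers parse_cpu_list and pin_process are hypothetical names. The CPU list syntax ("0-2,5") mirrors the format accepted by taskset -c and used in cpuset files. In Kubernetes, pinning is usually configured through the kubelet's CPU Manager rather than done by hand.

```python
import os


def parse_cpu_list(spec):
    """Parse a CPU list such as "0-2,5" into a set of CPU indices."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-", 1)
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_process(pid, spec):
    """Restrict a process to the CPUs in spec and return its resulting affinity (Linux only)."""
    os.sched_setaffinity(pid, parse_cpu_list(spec))
    return os.sched_getaffinity(pid)
```

For example, pin_process(0, "0-1") would restrict the calling process to CPUs 0 and 1, provided both exist on the machine. Pinning reduces migration and cache churn for a latency-sensitive workload, but it also removes scheduling flexibility, so it is worth mentioning that trade-off in an answer.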