Interview Question
Your Kubernetes etcd cluster shows high fsync latency, which is making the API server slow. How do you troubleshoot and resolve it?
Key Points to Cover
- Check etcd metrics for fsync/disk latency
- Correlate with node disk performance and saturation
- Migrate etcd to SSD or provisioned-IOPS storage, or give it a dedicated disk
- Tune etcd compaction and defragmentation
- Monitor disk latency continuously with alerts
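The first key point can be made concrete with a small sketch. etcd exposes the histogram `etcd_disk_wal_fsync_duration_seconds` on its metrics endpoint, and the etcd documentation suggests that its 99th percentile should stay below roughly 10 ms. The Python below estimates that percentile from the cumulative buckets using linear interpolation, similar in spirit to Prometheus's `histogram_quantile`. The function names and the `SAMPLE` text are illustrative assumptions, not real cluster output.

```python
import math

# Hypothetical scrape excerpt in Prometheus text format (cumulative bucket counts).
SAMPLE = """
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.001"} 50
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.002"} 90
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.004"} 98
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.008"} 100
etcd_disk_wal_fsync_duration_seconds_bucket{le="+Inf"} 100
"""


def parse_buckets(metrics_text, metric="etcd_disk_wal_fsync_duration_seconds"):
    """Return sorted (upper_bound_seconds, cumulative_count) pairs for one histogram."""
    prefix = metric + "_bucket{"
    buckets = []
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue
        labels, value = line[len(prefix):].rsplit("}", 1)
        for part in labels.split(","):
            key, _, val = part.partition("=")
            if key.strip() == "le":
                # float() accepts "+Inf" and returns infinity.
                bound = float(val.strip().strip('"'))
                buckets.append((bound, float(value.split()[0])))
    return sorted(buckets)


def estimate_quantile(buckets, q):
    """Estimate quantile q (0..1) by interpolating inside the matching bucket."""
    if not buckets or buckets[-1][1] == 0:
        return None
    rank = q * buckets[-1][1]
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # No finite upper bound to interpolate towards.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    p99 = estimate_quantile(parse_buckets(SAMPLE), 0.99)
    # 0.010 s is the commonly cited guideline, used here as an assumed threshold.
    print(f"WAL fsync p99 ~ {p99 * 1000:.1f} ms, over 10 ms: {p99 > 0.010}")
```

The same functions can be pointed at `etcd_disk_backend_commit_duration_seconds` by passing a different `metric` argument. In practice the query would normally live in Prometheus with an alert attached, and this sketch only shows what that query computes.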
Evaluation Rubric
- Collects etcd disk latency metrics (30% weight)
- Links disk performance to cluster slowness (30% weight)
- Suggests SSD/tuning solutions (20% weight)
- Mentions monitoring/alerting (20% weight)
Hints
- 💡 etcd depends heavily on low-latency disk writes: each write is persisted to the write-ahead log and synced to disk before it is acknowledged.
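To check whether the disk itself is slow, independent of etcd, a candidate can time small synchronous writes on the volume that holds the etcd data directory. The Python below is a minimal probe under stated assumptions: the function names, the 2 KiB payload, and the write count are illustrative choices, and it calls `os.fsync`, which is close to but not identical to what etcd does on every platform. A dedicated tool such as fio is the more usual choice for a real benchmark.

```python
import math
import os
import tempfile
import time


def measure_fsync_latency(directory, writes=200, payload_size=2048):
    """Append small payloads to a temp file in `directory`, syncing after each.

    Returns a list of per-write latencies in seconds. The temp file is removed
    afterwards, so the probe leaves nothing behind.
    """
    payload = b"\0" * payload_size
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory, prefix="fsync-probe-")
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the write to stable storage
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct in 0..100)."""
    ordered = sorted(values)
    index = max(0, math.ceil(pct * len(ordered) / 100) - 1)
    return ordered[index]


if __name__ == "__main__":
    # Assumption: replace with a directory on the same volume as the etcd data.
    target = tempfile.gettempdir()
    samples = measure_fsync_latency(target)
    print(f"p50 = {percentile(samples, 50) * 1000:.2f} ms")
    print(f"p99 = {percentile(samples, 99) * 1000:.2f} ms")
```

Running the probe while other workloads are active on the node, and again when they are quiet, helps separate a slow device from contention caused by neighbouring processes.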
Common Pitfalls to Avoid
- ⚠️Focusing solely on etcd metrics without investigating node-level disk performance.
- ⚠️Not understanding the underlying storage technology and its limitations (e.g., shared storage contention, cloud provider IOPS limits).
- ⚠️Ignoring the potential impact of other processes on the etcd node's disk I/O.
- ⚠️Making broad system-wide storage changes without isolating the problem to etcd first.
- ⚠️Failing to verify the resolution and establish ongoing monitoring after implementing a fix.
Potential Follow-up Questions
- ❓How would you benchmark etcd?
- ❓How would you tune etcd compaction?
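For the compaction follow-up, one point worth illustrating is that compaction discards old revisions but leaves the freed pages inside the backend file, and defragmentation is what returns that space. The Python sketch below estimates how much space a defragmentation could reclaim from two sizes that etcd reports as the metrics `etcd_mvcc_db_total_size_in_bytes` and `etcd_mvcc_db_total_size_in_use_in_bytes`. The function names and both thresholds are assumptions chosen for illustration, not etcd recommendations.

```python
def reclaimable_fraction(db_size_bytes, db_size_in_use_bytes):
    """Fraction of the backend file that is allocated but no longer in use."""
    if db_size_bytes <= 0:
        return 0.0
    free = max(0, db_size_bytes - db_size_in_use_bytes)
    return free / db_size_bytes


def should_defragment(db_size_bytes, db_size_in_use_bytes,
                      min_fraction=0.3, min_bytes=100 * 1024 * 1024):
    """Suggest defragmentation only when the reclaimable space is significant.

    min_fraction and min_bytes are hypothetical thresholds; tune them per cluster.
    """
    free = max(0, db_size_bytes - db_size_in_use_bytes)
    fraction = reclaimable_fraction(db_size_bytes, db_size_in_use_bytes)
    return free >= min_bytes and fraction >= min_fraction


if __name__ == "__main__":
    total, in_use = 2 * 1024**3, 1 * 1024**3  # made-up sizes: 2 GiB file, 1 GiB live
    print(f"reclaimable: {reclaimable_fraction(total, in_use):.0%}")
    print(f"defragment:  {should_defragment(total, in_use)}")
```

Because defragmentation blocks the member while it runs, a strong answer mentions running it on one member at a time rather than across the whole cluster at once.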