BeginnerScenario
5 min
Health Check Misconfiguration Causing Flapping
Advertisement
Interview Question
Instances are flapping in and out of load balancers due to aggressive health checks. How do you detect and fix this without masking real failures?
Key Points to Cover
- Correlate LB health check logs with app readiness/liveness
- Adjust thresholds/intervals/timeout to match latency profile
- Differentiate startup vs steady-state checks
- Add graceful shutdown/drain and connection draining
- Monitor false-positive/negative rates after changes
Evaluation Rubric
Shows evidence of flapping and cause30% weight
Tunes health check parameters properly30% weight
Handles startup/shutdown gracefully20% weight
Validates reduction in flapping20% weight
Hints
- 💡Use separate readiness and liveness probes.
Common Pitfalls to Avoid
- ⚠️Simply increasing the health check timeout or interval without understanding the underlying cause.
- ⚠️Failing to correlate load balancer logs with application-specific logs.
- ⚠️Treating startup health checks the same as steady-state health checks.
- ⚠️Not considering the application's inherent latency and resource constraints when configuring health checks.
- ⚠️Ignoring the possibility that the health check endpoint itself might be faulty or experiencing performance issues within the application.
Potential Follow-up Questions
- ❓How to set SLO-aware thresholds?
- ❓Should you gate deploys on health trends?
Advertisement
Related Questions
Questions that share similar topics with this one
Types of Load Balancers
Intermediate📞 Phone Screen•2 min•Phone
K8s Readiness vs Liveness Probes
Intermediate📞 Phone Screen•2 min•Phone
PostgreSQL Replication Lag Troubleshooting
Advanced🔬 Technical Deep Dive•5 min•Technical
Exactly-Once Effects with the Outbox Pattern
Advanced🔬 Technical Deep Dive•5 min•Technical
Designing Backpressure in Reactive Systems
Advanced🔬 Technical Deep Dive•5 min•Technical