Interview Questions/Troubleshooting Scenarios/Health Check Misconfiguration Causing Flapping
BeginnerScenario
5 min

Health Check Misconfiguration Causing Flapping

Load BalancingReliabilityOperations
Advertisement
Interview Question

Instances are flapping in and out of load balancers due to aggressive health checks. How do you detect and fix this without masking real failures?

Key Points to Cover
  • Correlate LB health check logs with app readiness/liveness
  • Adjust thresholds/intervals/timeout to match latency profile
  • Differentiate startup vs steady-state checks
  • Add graceful shutdown/drain and connection draining
  • Monitor false-positive/negative rates after changes
Evaluation Rubric
Shows evidence of flapping and cause30% weight
Tunes health check parameters properly30% weight
Handles startup/shutdown gracefully20% weight
Validates reduction in flapping20% weight
Hints
  • 💡Use separate readiness and liveness probes.
Potential Follow-up Questions
  • How to set SLO-aware thresholds?
  • Should you gate deploys on health trends?
Advertisement