Health Check Misconfiguration Causing Flapping

Interview Question

Instances are flapping in and out of load balancers due to aggressive health checks. How do you detect and fix this without masking real failures?

Key Points to Cover

Evaluation Rubric

Shows evidence of flapping and cause30% weight

Tunes health check parameters properly30% weight

Handles startup/shutdown gracefully20% weight

Validates reduction in flapping20% weight

Hints

Common Pitfalls to Avoid

⚠️Simply increasing the health check timeout or interval without understanding the underlying cause.
⚠️Failing to correlate load balancer logs with application-specific logs.
⚠️Treating startup health checks the same as steady-state health checks.
⚠️Not considering the application's inherent latency and resource constraints when configuring health checks.
⚠️Ignoring the possibility that the health check endpoint itself might be faulty or experiencing performance issues within the application.

Potential Follow-up Questions