Intermediate · Scenario
10 min
Cloud API Throttling (429) Causing Failures
Cloud · Rate Limiting · Reliability
Interview Question
Background jobs calling a cloud provider API start failing with 429 Too Many Requests during peak hours. How do you stabilize the system and prevent recurrence?
Key Points to Cover
- Measure request rate vs provider quotas; identify burst patterns
- Implement client-side backoff with jitter and respect Retry-After (see the sketch after this list)
- Shard credentials/projects or request quota increases where appropriate
- Batch requests and add concurrency controls
- Add dashboards/alerts on 429 rates and latency
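The backoff and Retry-After points above are where most fixes go wrong in practice. Below is a minimal sketch, assuming a Python worker that calls the provider over plain HTTP with the requests library; the function name, URL, and tuning constants are illustrative, not part of any provider SDK:

```python
import random
import time

import requests


def call_with_backoff(url, max_attempts=5, base_delay=1.0, cap=60.0):
    """GET an endpoint, retrying 429s with capped exponential backoff and full jitter.

    Honors the Retry-After header when the provider sends one (assumed here
    to be in seconds; some providers send an HTTP-date instead).
    """
    for attempt in range(max_attempts):
        resp = requests.get(url, timeout=10)
        if resp.status_code != 429:
            resp.raise_for_status()  # surface non-throttling errors
            return resp

        retry_after = resp.headers.get("Retry-After")
        if retry_after is not None:
            delay = float(retry_after)
        else:
            # Full jitter: sleep a random amount up to the capped exponential
            # delay, so workers that got throttled together do not retry together.
            delay = random.uniform(0, min(cap, base_delay * 2 ** attempt))
        time.sleep(delay)

    raise RuntimeError(f"still throttled after {max_attempts} attempts: {url}")
```

Full jitter (a random delay up to the capped exponential value) rather than a fixed schedule is what avoids the synchronized retries flagged in the pitfalls below.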
Evaluation Rubric
- Quantifies usage vs quotas (30% weight)
- Implements robust retry/backoff (30% weight)
- Batches/shards to reduce bursts (20% weight)
- Monitors 429s and adjusts proactively (20% weight)
Hints
- 💡 Prefer token-bucket-style client throttling (a minimal sketch follows).
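A minimal token-bucket sketch, assuming a single-process worker; the rate and capacity values are placeholders that should be derived from the measured quota rather than guessed:

```python
import threading
import time


class TokenBucket:
    """Client-side throttle: allow `rate` calls per second with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self):
        """Block until one token is available, then consume it."""
        while True:
            with self.lock:
                now = time.monotonic()
                # Refill proportionally to elapsed time, never above capacity.
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.updated) * self.rate)
                self.updated = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                wait = (1 - self.tokens) / self.rate
            time.sleep(wait)


# Example: stay under an assumed quota of 10 requests/second with bursts of 20.
bucket = TokenBucket(rate=10, capacity=20)
# bucket.acquire()  # call before each provider request
```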
Common Pitfalls to Avoid
- ⚠️ Not implementing exponential backoff, leading to immediate retries that exacerbate the problem.
- ⚠️ Forgetting to add jitter to backoff, resulting in synchronized retries (thundering herd).
- ⚠️ Ignoring, or incorrectly implementing, the Retry-After header provided by the API.
- ⚠️ Assuming a simple retry is sufficient without understanding the underlying burst patterns or quota limits (a bounded-concurrency sketch follows this list).
- ⚠️ Focusing solely on client-side fixes without exploring upstream options such as quota increases or workload optimization.
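Retries alone do not remove bursts; shaping the workload does. The sketch below, tied to the burst-pattern pitfall above, caps in-flight calls with a bounded thread pool; MAX_IN_FLIGHT is an assumed knob to be derived from the measured quota, not a provider value:

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed tuning knob: derive from the measured per-second quota.
MAX_IN_FLIGHT = 5


def run_jobs(jobs, handler):
    """Run handler(job) for every job with at most MAX_IN_FLIGHT concurrent calls.

    Bounding the pool size keeps a backlog of queued jobs from bursting
    past the provider's quota all at once when workers come back online.
    """
    with ThreadPoolExecutor(max_workers=MAX_IN_FLIGHT) as pool:
        return list(pool.map(handler, jobs))
```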
Potential Follow-up Questions
- ❓ When to request quota increases?
- ❓ How to test under quota constraints? (a test sketch follows)
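For the testing follow-up, one option is to stub the HTTP layer with a scripted sequence of 429 and success responses and assert that the client honors Retry-After. A minimal sketch with unittest.mock, assuming the call_with_backoff helper sketched earlier on this page:

```python
from unittest import mock


def test_backoff_honors_retry_after():
    throttled = mock.Mock(status_code=429, headers={"Retry-After": "2"})
    ok = mock.Mock(status_code=200, headers={})

    # First call is throttled, second succeeds; sleep is patched so the
    # test stays fast and we can assert on the delay that was requested.
    with mock.patch("requests.get", side_effect=[throttled, ok]), \
         mock.patch("time.sleep") as fake_sleep:
        resp = call_with_backoff("https://provider.example/api")  # helper sketched above

    assert resp is ok
    fake_sleep.assert_called_once_with(2.0)
```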