Interview Question
A critical API has intermittent p99 latency spikes without increased error rates. How would you isolate the cause and stabilize tail latency?
Key Points to Cover
- Correlate spikes with GC, CPU steal, IO wait, or network retransmits
- Instrument queue depths, thread pools, and timeouts
- Add hedged requests or request collapsing where safe
- Apply connection pooling and tune TCP parameters
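As a rough illustration of the hedged-request point above, the sketch below sends a backup copy of a request only if the first copy has not answered within a delay, then returns whichever finishes first. It assumes the request is idempotent and safe to run twice. The names `hedged_call` and `make_fake_backend` are hypothetical, and the fake backend stands in for a real client call.

```python
import asyncio


async def hedged_call(make_request, hedge_delay_s):
    """Run make_request(); if it is still pending after hedge_delay_s,
    start a second copy and return the first result to arrive."""
    primary = asyncio.create_task(make_request())
    done, _ = await asyncio.wait({primary}, timeout=hedge_delay_s)
    if done:
        # Fast path: the primary answered before the hedge delay elapsed.
        return primary.result()

    # Slow path: launch the backup and race the two attempts.
    backup = asyncio.create_task(make_request())
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # stop the loser so it does not keep consuming capacity
    return done.pop().result()


def make_fake_backend(latencies):
    """Hypothetical backend: the n-th call sleeps latencies[n] seconds
    and returns that latency."""
    calls = iter(latencies)

    async def request():
        delay = next(calls)
        await asyncio.sleep(delay)
        return delay

    return request


if __name__ == "__main__":
    # First attempt is slow (0.5 s), the hedge is fast (0.01 s).
    backend = make_fake_backend([0.5, 0.01])
    print(asyncio.run(hedged_call(backend, hedge_delay_s=0.05)))
```

A common choice is to set the hedge delay near the observed p95 or p99 latency, so that only a small fraction of requests trigger a second attempt and the extra load stays bounded.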
Evaluation Rubric
- Correlates latency spikes to system signals (35% weight)
- Instruments queues/pools/timeouts (25% weight)
- Proposes tail-latency mitigation patterns (20% weight)
- Validates with trace/metrics before/after (20% weight)
Hints
- 💡 Check for head-of-line blocking and for Nagle's algorithm interacting with delayed ACKs.
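For the Nagle/delayed-ACK hint, one common check is whether the client or server socket has `TCP_NODELAY` set. With Nagle's algorithm enabled, a small write can be held back until the previous segment is acknowledged, and a peer that delays its ACK can stretch that wait. The following is a minimal sketch using Python's standard `socket` module; the helper name `make_low_latency_socket` is hypothetical.

```python
import socket


def make_low_latency_socket():
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY=1 sends small writes immediately instead of
    # coalescing them while waiting for an outstanding ACK.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


if __name__ == "__main__":
    s = make_low_latency_socket()
    # A non-zero value confirms the option is set on this socket.
    print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
    s.close()
```

Disabling Nagle trades a few more small packets for lower per-write latency, so it is worth confirming with before/after traces rather than applying blindly.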
Common Pitfalls to Avoid
- ⚠️ Focusing only on average latency and missing tail latency issues.
- ⚠️ Insufficient instrumentation, leading to a lack of visibility into internal application behavior.
- ⚠️ Making broad, unverified assumptions about the cause without data correlation.
- ⚠️ Implementing mitigation strategies without fully understanding their impact on consistency or other performance metrics.
- ⚠️ Not considering the impact of external dependencies or upstream/downstream services on the API's latency.
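To make the first pitfall concrete, the sketch below uses made-up latency samples (an assumption, not measured data) in which 2% of requests are slow. The mean barely moves while the p99 lands on the slow group. The helper `p99` is hypothetical and uses the nearest-rank method.

```python
def p99(samples):
    """99th percentile by the nearest-rank method."""
    ordered = sorted(samples)
    # ceil(0.99 * n) computed with integers to avoid float rounding.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Synthetic data: 980 requests at 20 ms, 20 requests at 800 ms.
samples_ms = [20] * 980 + [800] * 20

mean_ms = sum(samples_ms) / len(samples_ms)
p99_ms = p99(samples_ms)

if __name__ == "__main__":
    print(f"mean={mean_ms} ms, p99={p99_ms} ms")  # mean=35.6 ms, p99=800 ms
```

A dashboard that tracks only the 35.6 ms mean would look healthy here even though one request in fifty takes 800 ms.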
Potential Follow-up Questions
- ❓ When should you use hedged requests?
- ❓ How do retries affect queue depth and load?
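One way to reason about the retry question is to estimate how much extra work retries put on a queue. Under the simplifying assumption that every attempt fails independently with the same probability `p`, and that a client gives up after `max_retries` retries, the expected number of attempts per request is the sum of `p**k` for `k` from 0 to `max_retries`. The function name below is hypothetical, and real failures during an overload are usually correlated, so treat this as an approximate bound rather than a prediction.

```python
def expected_attempts(p_fail, max_retries):
    """Expected attempts per request when each attempt fails
    independently with probability p_fail and the client retries
    at most max_retries times."""
    return sum(p_fail ** k for k in range(max_retries + 1))


if __name__ == "__main__":
    # Healthy system: 1% of attempts fail, so retries add about 1% load.
    print(expected_attempts(0.01, 3))
    # Degraded system: half of attempts time out, load nearly doubles.
    print(expected_attempts(0.5, 3))  # 1.875
```

The second case shows why retries can deepen queues exactly when the service is already slow, which is the usual argument for retry budgets, backoff with jitter, and timeouts that cancel abandoned work.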