AdvancedTechnical
5 min
Diagnosing Intermittent p99 Latency Spikes
PerformanceSRENetworking
Advertisement
Interview Question
A critical API has intermittent p99 latency spikes without increased error rates. How would you isolate the cause and stabilize tail latency?
Key Points to Cover
- Correlate spikes with GC, CPU steal, IO wait, or network retransmits
- Instrument queue depths, thread pools, and timeouts
- Add hedged requests or request collapsing where safe
- Apply connection pooling and tune TCP parameters
Evaluation Rubric
Correlates latency spikes to system signals35% weight
Instruments queues/pools/timeouts25% weight
Proposes tail-latency mitigation patterns20% weight
Validates with trace/metrics before/after20% weight
Hints
- 💡Check head-of-line blocking and Nagle/Delayed ACKs.
Potential Follow-up Questions
- ❓When to use hedged requests?
- ❓How do retries affect queues?
Advertisement