Advanced, Technical
5 min

Diagnosing Intermittent p99 Latency Spikes

Performance, SRE, Networking
Interview Question

A critical API has intermittent p99 latency spikes without increased error rates. How would you isolate the cause and stabilize tail latency?

Key Points to Cover
  • Correlate spike timestamps with GC pauses, CPU steal, I/O wait, or network retransmits
  • Instrument queue depths, thread pools, and timeouts
  • Add hedged requests or request collapsing where safe
  • Apply connection pooling and tune TCP parameters
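As an illustration of the hedged-request point above, here is a minimal sketch in Python using asyncio. The names `hedged`, `make_call`, `hedge_after_s`, and `demo` are hypothetical, and the sketch assumes the call is idempotent, so issuing it twice is safe. It sends one backup attempt only when the primary has not answered within the hedge delay, then returns whichever attempt finishes first.

```python
import asyncio


async def hedged(make_call, hedge_after_s):
    """Await make_call(); if it is still pending after hedge_after_s seconds,
    start one backup attempt and return the first attempt to finish.

    Assumption: make_call is idempotent. If the first attempt to finish
    raised, that exception propagates (no fallback to the other attempt).
    """
    primary = asyncio.ensure_future(make_call())
    done, pending = await asyncio.wait({primary}, timeout=hedge_after_s)
    if not done:
        # Primary is slower than the hedge delay: race it against a backup.
        backup = asyncio.ensure_future(make_call())
        done, pending = await asyncio.wait(
            {primary, backup}, return_when=asyncio.FIRST_COMPLETED
        )
    for task in pending:
        task.cancel()  # stop the losing attempt so it does not hold resources
    return next(iter(done)).result()


async def demo():
    """Simulated backend: the first attempt is slow, later attempts are fast."""
    calls = 0

    async def flaky():
        nonlocal calls
        calls += 1
        attempt = calls
        await asyncio.sleep(1.0 if attempt == 1 else 0.01)
        return attempt

    return await hedged(flaky, hedge_after_s=0.05)
```

A common choice is to set the hedge delay near the observed p95 latency, so that only a small share of requests ever send a second attempt and the extra backend load stays bounded.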
Evaluation Rubric
  • Correlates latency spikes to system signals (35% weight)
  • Instruments queues/pools/timeouts (25% weight)
  • Proposes tail-latency mitigation patterns (20% weight)
  • Validates with trace/metrics before/after (20% weight)
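The before/after validation in the rubric can be made concrete with a small percentile comparison. The sketch below uses the nearest-rank definition of a percentile; the function names `percentile` and `compare_tail` are hypothetical, and it assumes latency samples have already been exported from traces or metrics as plain lists of milliseconds.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def compare_tail(before_ms, after_ms):
    """Return (before, after) pairs for the median and the p99."""
    return {
        name: (percentile(before_ms, pct), percentile(after_ms, pct))
        for name, pct in (("p50", 50), ("p99", 99))
    }
```

Reporting p50 next to p99 helps show whether a change fixed the tail without moving the median, which is the usual signature of a successful tail-latency mitigation.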
Hints
  • Check for head-of-line blocking and for Nagle's algorithm interacting with delayed ACKs.
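For the Nagle/delayed-ACK hint, one thing a candidate might check is whether the client sockets set `TCP_NODELAY`. Nagle's algorithm holds back small writes while earlier data is unacknowledged, and a peer that delays its ACK can turn that into a stall of tens of milliseconds on request/response traffic. The sketch below uses Python's standard `socket` module; the helper name `disable_nagle` is hypothetical.

```python
import socket


def disable_nagle(sock):
    """Set TCP_NODELAY so small writes are sent immediately instead of
    waiting for outstanding data to be acknowledged.

    Returns the option value read back from the socket (nonzero = enabled).
    """
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
```

Disabling Nagle trades a few more small packets for lower latency, so it tends to suit chatty RPC traffic more than bulk transfers.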
Potential Follow-up Questions
  • When should you use hedged requests?
  • How do retries affect queues?