Performance
All interview questions related to Performance
What is HTTP keep-alive and why does connection pooling improve performance?
Name two key improvements HTTP/2 brings over HTTP/1.1 and why they matter.
What do the 1-, 5-, and 15-minute load averages indicate on Linux, and how do you interpret them relative to the number of CPU cores?
What problem does a CDN solve and how does it improve user-perceived performance?
What is API rate limiting, why is it important, and how is it commonly implemented?
Name two popular load testing tools and explain their typical use cases.
Your production Kubernetes cluster shows unusually high CPU usage in multiple pods. Walk me through your investigation and mitigation steps.
How do you scale a message queue system like Kafka or RabbitMQ to handle millions of messages per second?
Your Java services show p99 latency spikes during peak traffic. How would you analyze and tune JVM garbage collection to reduce pause times?
In a streaming system under bursty load, how do you implement backpressure to prevent overload and cascading failures?
A critical API has intermittent p99 latency spikes without increased error rates. How would you isolate the cause and stabilize tail latency?
Explain different caching strategies and their trade-offs for high-performance applications.
Your application's database response times have increased by 300% over the last hour. Users are complaining about slow page loads. How do you investigate and resolve this?
Users in one region report very slow page loads, but the rest of the world is fine. How do you troubleshoot this CDN performance issue?
Your Kafka consumer groups are showing high lag and messages are processing slowly. How do you investigate and remediate this?
Your API's average latency jumped from 100ms to 2s without an increase in traffic. How would you investigate?
Your RabbitMQ/SQS queue has millions of unprocessed messages. What steps do you take?
One node in your cluster shows 100% CPU usage with context switching spikes. How do you troubleshoot?
A cache eviction triggers a surge of requests to the origin, causing overload. How do you diagnose and prevent cache stampede?
Nightly maintenance jobs overlap and create resource contention and backlog. Explain your triage and prevention.
Services on certain VMs show latency spikes correlated with CPU steal time. How do you investigate and mitigate?
Sudden latency spikes correlate with saturated server thread pools. How do you diagnose and remediate safely?
A misconfigured deployment invalidates most CDN cache objects at once, flooding the origin. What's your triage and prevention plan?
An application shows sudden latency spikes due to cloud storage IOPS limits being hit. How do you confirm and fix?