Interview Questions/Troubleshooting Scenarios/Sudden 5xx Spike on Web Tier
IntermediateScenario
10 min

Sudden 5xx Spike on Web Tier

HTTPWebObservabilityIncident Response
Advertisement
Interview Question

Production dashboards show a sharp increase in HTTP 5xx responses from the web tier over the last 10 minutes, but traffic volume is normal. Describe your step-by-step triage and remediation.

Key Points to Cover
  • Confirm scope with metrics (5xx rate, latency, saturation) and traces
  • Check recent deploys/feature flags and roll back if correlated
  • Inspect app/web logs and upstream dependency health
  • Validate config/env changes and TLS/cert validity
  • Apply circuit breakers/retries; widen capacity if needed
Evaluation Rubric
Uses metrics/logs/traces to scope incident30% weight
Considers recent changes and flag rollbacks25% weight
Checks upstream/downstream dependencies25% weight
Applies safe mitigation and rollback20% weight
Hints
  • 💡Look for config drift, expired certs, or downstream timeouts.
Potential Follow-up Questions
  • How would you prevent regressions?
  • Which SLOs/alerts would you set?
Advertisement