Advanced · Scenario
15 min

Cloud Provider Outage

Interview Question

Your cloud provider reports an ongoing outage in one region, affecting your services. How do you respond?

Key Points to Cover
  • Verify the scope of the outage using both the provider's status page and your own metrics
  • Fail over to healthy regions if a multi-region setup exists
  • Communicate with stakeholders and update the status page
  • Reduce the blast radius by disabling impacted features
  • Conduct a postmortem and implement resilience improvements
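The verification, failover, and feature-disablement points above can be combined into one small decision routine. The following is a minimal sketch that assumes you already run your own synthetic probes in each region. The region names, the 90% health threshold, `plan_response`, and the `feature_regions` mapping are all hypothetical. Real failover also involves data replication and traffic shifting, which this sketch does not model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Hypothetical input: region name -> results of our OWN synthetic probes
# (True = request succeeded), gathered independently of the provider's status page.
ProbeResults = Dict[str, List[bool]]


@dataclass
class IncidentPlan:
    impaired_regions: List[str]
    failover_target: Optional[str]
    disabled_features: Set[str] = field(default_factory=set)


def success_rate(results: List[bool]) -> float:
    """Fraction of successful probes; no data at all counts as failing."""
    return sum(results) / len(results) if results else 0.0


def plan_response(
    probes: ProbeResults,
    active_region: str,
    feature_regions: Dict[str, str],  # hypothetical: feature -> region of its only backing dependency
    healthy_threshold: float = 0.9,   # hypothetical threshold
) -> IncidentPlan:
    """Classify regions from independent probes, choose a failover target if
    the active region is impaired, and list features to switch off because
    their only backing dependency lives in an impaired region."""
    impaired = sorted(r for r, res in probes.items()
                      if success_rate(res) < healthy_threshold)
    healthy = sorted(r for r in probes if r not in impaired)

    target: Optional[str] = None
    if active_region in impaired and healthy:
        # Fail over to the healthiest remaining region (ties go to the alphabetically first).
        target = max(healthy, key=lambda r: success_rate(probes[r]))

    # Reduce blast radius: disable features that cannot follow the failover.
    disabled = {f for f, region in feature_regions.items() if region in impaired}
    return IncidentPlan(impaired, target, disabled)


if __name__ == "__main__":
    probes = {
        "us-east-1": [False, False, True, False],  # 25% success
        "us-west-2": [True, True, True, True],     # 100% success
        "eu-west-1": [True, True, True, False],    # 75% success
    }
    plan = plan_response(probes, "us-east-1",
                         {"checkout": "us-east-1", "search": "us-west-2"})
    print(plan)
```

With these sample numbers, both us-east-1 and eu-west-1 fall below the threshold. Traffic would therefore move to us-west-2, and `checkout` would be disabled. The routine keeps your own probes as the source of truth, which avoids the first pitfall listed below.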
Evaluation Rubric
Confirms outage impact clearly: 30% weight
Executes failover or feature disablement: 30% weight
Communicates status effectively: 20% weight
Plans for long-term resilience: 20% weight
Hints
  • 💡Always prepare for regional outages.
Common Pitfalls to Avoid
  • ⚠️Relying solely on the cloud provider's status page without independent verification of service impact.
  • ⚠️Delaying or failing to initiate failover procedures in a multi-region setup.
  • ⚠️Under-communicating or providing infrequent, vague updates to stakeholders.
  • ⚠️Not actively engaging with the cloud provider's support for detailed information or escalation.
  • ⚠️Skipping or conducting a superficial post-incident review, missing opportunities for improvement.
Potential Follow-up Questions
  • How do you design for cross-region failover?
  • How would you use DNS-based routing to shift traffic between regions?
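One common answer to the DNS follow-up is to use health-checked failover records. While the primary region's health check passes, the DNS service answers with that region's address. Once the check fails, it answers with a secondary's address instead. The following sketch models that decision in plain Python. The record layout, the role labels, and the timing formula are illustrative assumptions, not any specific provider's API. The addresses come from the 203.0.113.0/24 documentation range.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FailoverRecord:
    region: str
    address: str
    role: str  # "primary" or "secondary" (hypothetical labels)


def resolve(records: List[FailoverRecord], health: Dict[str, bool]) -> Optional[str]:
    """Answer the way a health-checked failover policy would: the primary's
    address while its region is healthy, otherwise the first healthy secondary."""
    for role in ("primary", "secondary"):
        for rec in records:
            if rec.role == role and health.get(rec.region, False):
                return rec.address
    return None  # nothing healthy: no answer to give


def estimated_switch_seconds(ttl: int, check_interval: int, failure_threshold: int) -> int:
    """Rough estimate of how long clients may keep hitting the failed region:
    time for health checks to declare it unhealthy, plus the record's TTL.
    Resolvers that cache past the TTL can make it longer."""
    return check_interval * failure_threshold + ttl


if __name__ == "__main__":
    records = [
        FailoverRecord("us-east-1", "203.0.113.10", "primary"),
        FailoverRecord("us-west-2", "203.0.113.20", "secondary"),
    ]
    print(resolve(records, {"us-east-1": True, "us-west-2": True}))   # → 203.0.113.10
    print(resolve(records, {"us-east-1": False, "us-west-2": True}))  # → 203.0.113.20
    print(estimated_switch_seconds(ttl=60, check_interval=30, failure_threshold=3))  # → 150
```

The main trade-off to raise in an interview is TTL length. A low TTL speeds up failover but increases query volume to the DNS service. DNS changes also cannot move long-lived connections that clients already hold open.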