Advanced, Technical
5 min
Cloud Disaster Recovery Planning
Cloud, Disaster Recovery, Resilience
Interview Question
How would you design a disaster recovery (DR) strategy for a critical cloud-hosted application?
Key Points to Cover
- Define RTO (Recovery Time Objective) and RPO (Recovery Point Objective)
- Choose active-active, active-passive, or backup-restore strategies
- Automate failover using DNS, traffic managers, or orchestration
- Test DR plan regularly with game days
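As a rough illustration of how the points above fit together, the following Python sketch maps RTO/RPO targets to one of the three strategies and models an automated failover trigger. The cutoffs (one minute, one hour), the `Objectives` and `FailoverMonitor` names, the region labels, and the three-failure threshold are all assumptions made for this example, not recommendations from any provider; a real design would derive them from business requirements and would call an actual DNS or traffic-manager API where this sketch only flips a string.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Objectives:
    rto: timedelta  # maximum tolerable downtime
    rpo: timedelta  # maximum tolerable window of data loss


def choose_strategy(obj: Objectives) -> str:
    """Map RTO/RPO targets to a DR strategy using illustrative cutoffs."""
    tightest = min(obj.rto, obj.rpo)
    if tightest <= timedelta(minutes=1):
        # Near-zero downtime and data loss: serve from multiple regions at once.
        return "active-active"
    if tightest <= timedelta(hours=1):
        # A warm standby can take over within minutes.
        return "active-passive"
    # Relaxed targets: restoring from backups is usually the cheapest option.
    return "backup-restore"


class FailoverMonitor:
    """Switch to the secondary region after N consecutive failed health checks."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.consecutive_failures = 0
        self.active_region = "primary"

    def record(self, healthy: bool) -> str:
        """Record one health-check result and return the region to route to."""
        if healthy:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.threshold:
                # Hypothetical stand-in for a DNS or traffic-manager update.
                self.active_region = "secondary"
        # There is no automatic failback: once switched, the monitor
        # keeps returning "secondary" until an operator resets it.
        return self.active_region


if __name__ == "__main__":
    targets = Objectives(rto=timedelta(minutes=30), rpo=timedelta(minutes=15))
    print(choose_strategy(targets))

    monitor = FailoverMonitor(threshold=3)
    for result in (False, False, False):
        region = monitor.record(result)
    print(region)
```

Requiring several consecutive failures before switching is one common way to avoid failing over on a single transient error; the trade-off is that a higher threshold adds detection time to the effective RTO.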
Evaluation Rubric
- Defines RTO/RPO clearly (30% weight)
- Chooses appropriate DR strategy (30% weight)
- Automates failover effectively (20% weight)
- Emphasizes testing and drills (20% weight)
Hints
- 💡Think about chaos engineering as a way to validate failover, and about backup frequency, which bounds the RPO you can actually achieve.
Common Pitfalls to Avoid
- ⚠️Not involving business stakeholders in defining RTO/RPO, leading to misaligned expectations and over- or under-investment.
- ⚠️Relying solely on manual failover procedures, which are prone to human error and significantly increase RTO.
- ⚠️Inadequate or infrequent DR testing, resulting in a false sense of security and a DR plan that fails when needed.
- ⚠️Failing to replicate or synchronize data sufficiently to meet the defined RPO, leading to unacceptable data loss.
- ⚠️Neglecting to consider all dependencies, including third-party services, network configurations, and IAM permissions, which can derail a DR attempt.
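The data-loss pitfall above can be checked with simple arithmetic. The sketch below is a simplified model, assuming that periodic backups can lose up to one full backup interval, and that asynchronous replication (when a replica survives the disaster) limits loss to the replication lag. The function names and the sample numbers are hypothetical, and the model ignores backup duration and replica failure.

```python
from datetime import timedelta
from typing import Optional


def worst_case_data_loss(
    backup_interval: timedelta,
    replication_lag: Optional[timedelta] = None,
) -> timedelta:
    """Estimate the largest window of data that could be lost.

    Assumption: with backups only, a failure just before the next backup
    loses up to one full interval. With a surviving replica, loss is
    limited to the replication lag, if that is smaller.
    """
    if replication_lag is not None:
        return min(backup_interval, replication_lag)
    return backup_interval


def meets_rpo(
    rpo: timedelta,
    backup_interval: timedelta,
    replication_lag: Optional[timedelta] = None,
) -> bool:
    """Return True when the estimated worst-case loss fits within the RPO."""
    return worst_case_data_loss(backup_interval, replication_lag) <= rpo


if __name__ == "__main__":
    rpo = timedelta(minutes=15)
    hourly = timedelta(hours=1)

    # Hourly backups alone cannot satisfy a 15-minute RPO.
    print(meets_rpo(rpo, hourly))

    # Adding replication with a 5-second lag brings the loss within the RPO.
    print(meets_rpo(rpo, hourly, replication_lag=timedelta(seconds=5)))
```

A check like this could run alongside monitoring so that a growing replication lag or a skipped backup is flagged before it silently violates the agreed RPO.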
Potential Follow-up Questions
- ❓What are common pitfalls in DR testing?
- ❓How would you manage DR costs?