Advanced · System Design · 45 min

Design a Cloud Data Warehouse

Data Warehouse, Storage, ETL, Analytics
Interview Question

Design a data warehouse platform supporting petabyte-scale storage, ELT/ETL pipelines, query federation, and cost controls.

Key Points to Cover
  • Columnar storage and MPP (massively parallel processing) architecture
  • ETL/ELT pipelines, batch vs streaming loads
  • Query federation and optimizer design
  • Concurrency scaling, workload management
  • Governance: lineage, RBAC, cost monitoring
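To make the first point concrete, here is a minimal Python sketch of how columnar micro-partitions with per-column min/max "zone maps" let a scan skip data, and how an MPP-style engine fans the surviving partitions out to parallel workers and combines their partial aggregates. `MicroPartition`, `parallel_sum`, and the in-memory layout are hypothetical; a thread pool stands in for separate compute nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class MicroPartition:
    """One immutable chunk of a table, stored column by column (hypothetical layout)."""
    columns: dict[str, list[float]]
    # Zone map: per-column (min, max), computed once at load time.
    zone_map: dict[str, tuple[float, float]]

    @classmethod
    def from_rows(cls, rows: list[dict[str, float]]) -> "MicroPartition":
        names = rows[0].keys()
        columns = {n: [r[n] for r in rows] for n in names}
        zone_map = {n: (min(v), max(v)) for n, v in columns.items()}
        return cls(columns, zone_map)


def scan_partition(part: MicroPartition, filter_col: str, lo: float, hi: float,
                   agg_col: str) -> float:
    """Worker-side scan: touch only the two needed columns, return a partial sum."""
    keys = part.columns[filter_col]
    vals = part.columns[agg_col]
    return sum(v for k, v in zip(keys, vals) if lo <= k <= hi)


def parallel_sum(parts: list[MicroPartition], filter_col: str, lo: float, hi: float,
                 agg_col: str, workers: int = 4) -> tuple[float, int]:
    """SELECT SUM(agg_col) WHERE filter_col BETWEEN lo AND hi.

    Returns (total, number_of_partitions_scanned).
    """
    # Pruning: skip partitions whose [min, max] cannot overlap [lo, hi].
    survivors = [
        p for p in parts
        if not (p.zone_map[filter_col][1] < lo or p.zone_map[filter_col][0] > hi)
    ]
    # Phase 1: each "node" aggregates its partitions; phase 2: combine partials.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(
            lambda p: scan_partition(p, filter_col, lo, hi, agg_col), survivors
        )
        total = sum(partials)
    return total, len(survivors)


# Three partitions, each holding ten consecutive days of revenue = 10.0.
parts = [
    MicroPartition.from_rows([{"day": d, "revenue": 10.0} for d in range(start, start + 10)])
    for start in (0, 10, 20)
]
narrow = parallel_sum(parts, "day", 12, 15, "revenue")
wide = parallel_sum(parts, "day", 5, 25, "revenue")
```

Narrowing the day range from 5..25 to 12..15 cuts the scan from three partitions to one, which is why load pipelines often sort or cluster data on common filter columns.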
Evaluation Rubric
  • Efficient columnar storage & MPP (25% weight)
  • Robust ETL/ELT design (25% weight)
  • Strong federation & optimizer plan (25% weight)
  • Governance & cost management (25% weight)
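A common pattern behind a robust ELT design is an incremental load that tracks a high-watermark and applies changes with an idempotent upsert, so retries and overlapping batches are harmless. The sketch below is a minimal, in-memory illustration under those assumptions; `Change`, `incremental_merge`, and the last-writer-wins rule are hypothetical stand-ins for a warehouse `MERGE` statement.

```python
from dataclasses import dataclass


@dataclass
class Change:
    key: str         # business primary key
    updated_at: int  # source modification timestamp (epoch seconds)
    payload: dict


def incremental_merge(target: dict[str, Change], source: list[Change], watermark: int) -> int:
    """Upsert rows changed at or after `watermark`; return the new watermark.

    Re-running with the same inputs leaves `target` unchanged (idempotent),
    so a failed batch can simply be retried.
    """
    new_watermark = watermark
    for row in sorted(source, key=lambda c: c.updated_at):
        if row.updated_at < watermark:
            continue  # strictly older than the watermark: already loaded
        current = target.get(row.key)
        # Last-writer-wins: never let an older change overwrite a newer one.
        if current is None or row.updated_at >= current.updated_at:
            target[row.key] = row
        new_watermark = max(new_watermark, row.updated_at)
    return new_watermark


warehouse: dict[str, Change] = {}
batch = [Change("a", 100, {"v": 1}), Change("b", 105, {"v": 2}), Change("a", 110, {"v": 3})]
wm1 = incremental_merge(warehouse, batch, 0)
wm2 = incremental_merge(warehouse, batch, wm1)  # retry of the same batch
```

Reprocessing rows at exactly the watermark (`<` rather than `<=`) costs a little duplicate work but still catches late rows that share the boundary timestamp; because the merge is idempotent, that extra work carries no correctness risk.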
Hints
  • 💡Leverage separation of storage and compute: keep data in shared object storage and scale compute clusters independently of it.
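Because the data lives in shared object storage, adding a compute cluster does not require moving or re-sharding data, so concurrency scaling reduces to a policy over queue depth. A minimal sketch, assuming each cluster runs a fixed number of query slots; `ScalingPolicy` and `desired_clusters` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    slots_per_cluster: int   # concurrent queries one cluster can run (assumed fixed)
    min_clusters: int = 1
    max_clusters: int = 10   # hard cap doubles as a cost guardrail


def desired_clusters(policy: ScalingPolicy, running: int, queued: int) -> int:
    """Number of clusters needed so every running or queued query gets a slot."""
    demand = running + queued
    needed = math.ceil(demand / policy.slots_per_cluster)
    # Clamp between the floor (warm capacity) and the cap (budget).
    return max(policy.min_clusters, min(policy.max_clusters, needed))


policy = ScalingPolicy(slots_per_cluster=8)
busy = desired_clusters(policy, running=8, queued=20)
idle = desired_clusters(policy, running=0, queued=0)
spike = desired_clusters(policy, running=100, queued=100)
```

In practice a scaler would also add hysteresis (for example, waiting several minutes before removing a cluster) to avoid thrashing; giving each team its own policy and cap is one way to approach workload isolation.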
Common Pitfalls to Avoid
  • ⚠️Underestimating the complexity and cost of data governance and security at petabyte scale.
  • ⚠️Failing to adequately plan for data volume growth and scalability limitations of chosen technologies.
  • ⚠️Over-reliance on a single ETL/ELT tool without considering diverse source system requirements.
  • ⚠️Neglecting query optimization and performance tuning, leading to high compute costs and slow analytics.
  • ⚠️Lack of robust monitoring and alerting, resulting in undetected performance degradations or cost overruns.
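The last two pitfalls are often addressed together by attributing compute spend to teams and alerting before budgets are exhausted. A minimal sketch, assuming a node-hour billing model and a query log that records team, runtime, and cluster size; `QueryRecord`, `cost_alerts`, and the 80% warning threshold are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class QueryRecord:
    team: str
    runtime_s: float
    cluster_nodes: int


def cost_alerts(log: list[QueryRecord], budgets: dict[str, float],
                price_per_node_hour: float, warn_at: float = 0.8) -> dict[str, tuple[float, str]]:
    """Return {team: (spend, status)} where status is 'ok', 'warn', or 'over'."""
    spend: defaultdict[str, float] = defaultdict(float)
    for q in log:
        # Assumed billing model: runtime (hours) x nodes x price per node-hour.
        spend[q.team] += q.runtime_s / 3600 * q.cluster_nodes * price_per_node_hour
    report = {}
    for team, budget in budgets.items():
        s = spend.get(team, 0.0)
        if s > budget:
            status = "over"
        elif s >= warn_at * budget:
            status = "warn"
        else:
            status = "ok"
        report[team] = (round(s, 2), status)
    return report


log = [
    QueryRecord("bi", 3600, 4),
    QueryRecord("ds", 1800, 10),
    QueryRecord("ds", 1800, 10),
]
report = cost_alerts(log, {"bi": 9.0, "ds": 15.0, "ops": 5.0}, price_per_node_hour=2.0)
```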
Potential Follow-up Questions
  • How do you handle schema evolution?
  • How do you isolate workloads between teams?
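For the schema-evolution follow-up, one defensible answer is to gate schema changes with a compatibility check in CI: added columns pass, safe type widenings pass, and dropped columns or narrowing changes are flagged. A minimal sketch over `{column: type}` mappings; `SAFE_WIDENINGS` and `check_evolution` are hypothetical, and a real rule set would depend on the table format in use.

```python
# Type widenings assumed safe for readers of older files (hypothetical rule set).
SAFE_WIDENINGS = {("int32", "int64"), ("float32", "float64"), ("int32", "float64")}


def check_evolution(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Return a list of breaking changes between two {column: type} schemas."""
    problems = []
    for col, old_type in old.items():
        if col not in new:
            problems.append(f"dropped column: {col}")
        elif new[col] != old_type and (old_type, new[col]) not in SAFE_WIDENINGS:
            problems.append(f"incompatible type change: {col} {old_type} -> {new[col]}")
    # Added columns are allowed: rows written before the change read them as NULL.
    return problems


old = {"id": "int64", "amount": "float32", "region": "string"}
widened = {"id": "int64", "amount": "float64", "region": "string", "channel": "string"}
narrowed = {"id": "int64", "amount": "float64", "region": "int32"}
dropped = {"id": "int64", "amount": "float32"}
```

Dropped columns can still be supported as a deliberate, versioned migration; the check only ensures such changes are never accidental.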