Advanced · System-Design
45 min
Design a Time-Series Metrics Database
Databases, Storage, Observability, Compression
Interview Question
Design a horizontally scalable time-series database for metrics with high-cardinality support, rollups, and retention policies.
Key Points to Cover
- Write path: ingestion, batching, WAL, backpressure
- Schema/indexing: labels/tags, cardinality control, inverted index
- Compression & storage layout (LSM/columnar, downsampling, rollups)
- Query path: aggregations, range queries, precomputed materializations
- Retention & tiered storage (hot/warm/cold), compaction strategies
- Multi-tenant isolation and quota/limits
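To make the schema/indexing point concrete, here is a minimal sketch of an inverted index that maps each label pair to the set of series IDs carrying it. The class name `LabelIndex` and its methods are hypothetical, chosen for illustration; a real system would likely use compressed posting lists rather than Python sets.

```python
from collections import defaultdict


class LabelIndex:
    """Toy inverted index: (label, value) -> set of series IDs (illustrative only)."""

    def __init__(self):
        self._postings = defaultdict(set)  # (label, value) -> series IDs
        self._series_ids = {}              # canonical label tuple -> series ID

    def get_or_create(self, labels: dict) -> int:
        # Sort the label pairs so the same label set always yields the same key.
        key = tuple(sorted(labels.items()))
        sid = self._series_ids.get(key)
        if sid is None:
            sid = len(self._series_ids)
            self._series_ids[key] = sid
            for pair in key:
                self._postings[pair].add(sid)
        return sid

    def select(self, **matchers) -> set:
        # Equality matchers only: intersect posting lists, smallest first.
        lists = [self._postings.get(pair, set()) for pair in matchers.items()]
        if not lists:
            return set(self._series_ids.values())
        lists.sort(key=len)
        result = set(lists[0])
        for postings in lists[1:]:
            result &= postings
        return result


idx = LabelIndex()
a = idx.get_or_create({"metric": "cpu", "host": "a", "region": "us"})
b = idx.get_or_create({"metric": "cpu", "host": "b", "region": "us"})
print(idx.select(metric="cpu", host="a") == {a})  # True
```

Each new series adds one entry to the posting list of every label pair it carries, so index size grows with the number of distinct series. That is why the index and cardinality control are listed together.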
Evaluation Rubric
- High-throughput, durable ingest path: 25% weight
- Cardinality-aware schema/indexing: 25% weight
- Efficient query/rollup/compaction: 25% weight
- Retention/tiering and multi-tenancy: 25% weight
Hints
- 💡 Cardinality explosions often come from unbounded labels, such as user IDs, request IDs, or raw URLs.
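A quick way to see why: in the worst case, the number of series for one metric is the product of the distinct values of each label. The sketch below uses made-up label names and counts purely for illustration.

```python
from math import prod


def worst_case_series(label_cardinalities: dict) -> int:
    """Upper bound on series count: product of distinct values per label."""
    return prod(label_cardinalities.values())


# Hypothetical label sets; the numbers are illustrative assumptions.
bounded = {"method": 5, "status": 6, "region": 4}
with_user_id = {**bounded, "user_id": 1_000_000}

print(worst_case_series(bounded))       # 120
print(worst_case_series(with_user_id))  # 120000000
```

One unbounded label turns a metric with at most 120 series into one with up to 120 million, which is why limits are usually enforced per label rather than per metric alone.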
Common Pitfalls to Avoid
- ⚠️**Overlooking cardinality limits:** Failing to implement effective cardinality control can lead to excessive memory usage and slow queries as indexes grow unbounded.
- ⚠️**Inefficient indexing for high-cardinality data:** Indexing high-cardinality labels the way a traditional relational database would (for example, one B-tree per label column) scales poorly and slows queries; an inverted index from label pairs to series IDs is the usual alternative.
- ⚠️**Underestimating write amplification in LSM-trees:** Without proper tuning of compactions and write buffering, LSM-trees can suffer from high write amplification, which consumes disk bandwidth and can stall ingestion.
- ⚠️**Lack of backpressure awareness:** Not implementing backpressure at various stages of the ingestion pipeline can lead to cascading failures and data loss under load.
- ⚠️**Ignoring query patterns in storage design:** Designing storage layout without considering typical aggregation and filtering patterns will result in inefficient data retrieval and slow query responses.
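As an illustration of the backpressure pitfall, the sketch below shows one possible approach: a bounded buffer between the ingest endpoint and the WAL/storage writer that rejects new batches when full instead of growing without limit. `IngestBuffer`, `offer`, and `drain` are hypothetical names; how the caller signals rejection (for example, an HTTP 429 with a retry hint) is an assumption left outside the sketch.

```python
import queue


class IngestBuffer:
    """Bounded buffer between receivers and the storage writer (illustrative)."""

    def __init__(self, max_batches: int):
        self._q = queue.Queue(maxsize=max_batches)

    def offer(self, batch) -> bool:
        """Try to enqueue without blocking; False means 'shed load, ask client to retry'."""
        try:
            self._q.put_nowait(batch)
            return True
        except queue.Full:
            return False

    def drain(self, max_items: int) -> list:
        """Writer side: pull up to max_items batches for one WAL append/flush."""
        out = []
        while len(out) < max_items:
            try:
                out.append(self._q.get_nowait())
            except queue.Empty:
                break
        return out


buf = IngestBuffer(max_batches=2)
print(buf.offer(["cpu 0.5"]), buf.offer(["cpu 0.6"]), buf.offer(["cpu 0.7"]))
# True True False
```

Rejecting early keeps memory bounded and pushes the slowdown back to the producers, rather than letting an overloaded writer fail and take the queued data with it.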
Potential Follow-up Questions
- ❓How do you detect label cardinality abuse?
- ❓How would you store exemplars/traces?
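For the first follow-up, one possible starting point is to track distinct values per tenant and label at ingest time and flag labels that exceed a limit. The sketch below uses exact sets for clarity; `CardinalityGuard` and its limit are hypothetical, and at scale an approximate counter such as HyperLogLog would likely replace the sets.

```python
from collections import defaultdict


class CardinalityGuard:
    """Flags labels whose distinct-value count would exceed a per-tenant limit."""

    def __init__(self, max_values_per_label: int):
        self.max_values = max_values_per_label
        self._seen = defaultdict(set)  # (tenant, label) -> distinct values seen

    def check(self, tenant: str, labels: dict) -> list:
        """Return offending label names; an empty list means the sample is admitted."""
        offenders = []
        for name, value in labels.items():
            seen = self._seen[(tenant, name)]
            if value not in seen and len(seen) >= self.max_values:
                offenders.append(name)
        if not offenders:
            # Record values only for admitted samples so rejected ones do not count.
            for name, value in labels.items():
                self._seen[(tenant, name)].add(value)
        return offenders


guard = CardinalityGuard(max_values_per_label=2)
guard.check("tenant-a", {"host": "a", "user_id": "1"})
guard.check("tenant-a", {"host": "a", "user_id": "2"})
print(guard.check("tenant-a", {"host": "a", "user_id": "3"}))  # ['user_id']
```

The offending label names can feed an alert or a per-tenant report, so abuse is visible before it is enforced as a hard limit.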