Advanced · System Design
45 min
Design a Real-Time Analytics Pipeline
Streaming · Data · Messaging · OLAP
Interview Question
Design a pipeline to ingest, process, and query billions of events per day with second-level latency and exactly-once semantics.
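A quick back-of-the-envelope calculation helps anchor the ingestion design: it turns "billions of events per day" into a per-second rate and a partition count. The sketch below is illustrative only. The 5 billion events/day, 1 KB average event size, 3x peak-to-average ratio, 5 MB/s sustained per partition and 1.5x headroom are all assumed inputs, not benchmarks; real per-partition throughput depends on hardware, replication and batching settings.

```python
import math

SECONDS_PER_DAY = 86_400


def size_topic(
    events_per_day: float,
    avg_event_bytes: int,
    peak_to_average: float,
    per_partition_mb_s: float,
    headroom: float = 1.5,
) -> tuple[float, float, int]:
    """Return (average events/s, peak ingest MB/s, partition count)."""
    avg_eps = events_per_day / SECONDS_PER_DAY
    peak_mb_s = avg_eps * peak_to_average * avg_event_bytes / 1_000_000
    # Size for the peak, plus headroom for consumer catch-up and growth.
    partitions = math.ceil(peak_mb_s * headroom / per_partition_mb_s)
    return avg_eps, peak_mb_s, partitions


# Hypothetical inputs: 5B events/day, 1 KB each, 3x daily peak, 5 MB/s per partition.
avg_eps, peak_mb_s, partitions = size_topic(5e9, 1_000, 3.0, 5.0)
print(f"{avg_eps:,.0f} events/s avg, {peak_mb_s:.1f} MB/s peak, {partitions} partitions")
```

With these assumptions the average rate is about 58,000 events/s, the peak is about 174 MB/s, and the sketch suggests 53 partitions. Partition count also caps consumer parallelism within a Kafka consumer group, so it is worth sizing for the processing side as well. Kinesis shards have a fixed write limit of 1 MB/s (or 1,000 records/s) each, so the same load would need many more shards.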
Key Points to Cover
- Ingestion: Kafka or Kinesis, with the number of partitions (shards) sized for peak throughput
- Processing: Flink or Spark Structured Streaming, with checkpointed state and source offsets so a restarted job resumes where it left off
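- Storage: a hot tier (ClickHouse, Druid, or BigQuery) for recent data and interactive queries, plus a cold tier (Parquet files on S3) for full history and backfills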
- Semantics: idempotent writes or a transactional outbox, event-time watermarks, and handling of late data
- Serving: pre-aggregations, rollups, multi-tenant isolation
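The semantics and serving points above can be illustrated together with a toy sink that turns at-least-once delivery into effectively-once results by deduplicating on an event ID, while maintaining per-tenant, per-minute rollups as a simple pre-aggregation. Everything here is an assumption for illustration: the `Event` fields, the producer-assigned `event_id`, and the in-memory set standing in for a durable dedup table. It is not the API of ClickHouse, Druid, or any other store.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str  # assumed unique, assigned by the producer
    tenant: str
    ts: int        # event time, epoch seconds
    value: int


class IdempotentRollupSink:
    """Toy sink: a replayed event_id is ignored, so at-least-once delivery
    from the stream processor does not double-count the rollups."""

    def __init__(self, bucket_seconds: int = 60) -> None:
        self.bucket_seconds = bucket_seconds
        self.seen: set[str] = set()  # stand-in for a persistent dedup table
        # (tenant, bucket start) -> [event count, value sum]
        self.rollups: dict[tuple[str, int], list[int]] = defaultdict(lambda: [0, 0])

    def write(self, event: Event) -> bool:
        if event.event_id in self.seen:
            return False  # duplicate delivery: no effect
        self.seen.add(event.event_id)
        bucket = event.ts - event.ts % self.bucket_seconds
        agg = self.rollups[(event.tenant, bucket)]
        agg[0] += 1
        agg[1] += event.value
        return True


sink = IdempotentRollupSink()
batch = [Event("e1", "acme", 125, 10), Event("e2", "acme", 130, 5), Event("e3", "globex", 61, 7)]
for e in batch + batch[:2]:  # simulate a replay of e1 and e2 after a crash
    sink.write(e)
print(dict(sink.rollups))
```

In a real pipeline the dedup record and the rollup update must be committed atomically (for example in one database transaction, together with the consumed offsets); otherwise a crash between the two writes reintroduces duplicates or losses. Keying rollups by tenant also makes it straightforward to place or throttle tenants separately at query time.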
Evaluation Rubric
- High-throughput ingestion design: 25% weight
- Fault-tolerant streaming processing: 25% weight
- Hot/cold storage and rollups: 25% weight
- Low-latency query serving and isolation: 25% weight
Hints
- Plan for backfills and schema evolution.
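For schema evolution, one common approach is to upgrade older event versions to the current shape at read time, so that the live stream and backfills replayed from cold storage go through the same code path. The sketch assumes each event carries a `schema_version` field (absent on the oldest records); the field names and both migrations are hypothetical. In production, a schema registry enforcing Avro or Protobuf compatibility rules typically handles part of this job.

```python
from typing import Any

Event = dict[str, Any]
CURRENT_VERSION = 3


def _v1_to_v2(e: Event) -> Event:
    # Hypothetical change: v2 renamed "user" to "user_id".
    e = dict(e)
    e["user_id"] = e.pop("user")
    e["schema_version"] = 2
    return e


def _v2_to_v3(e: Event) -> Event:
    # Hypothetical change: v3 added an optional "country" field.
    e = dict(e)
    e.setdefault("country", "unknown")
    e["schema_version"] = 3
    return e


UPGRADERS = {1: _v1_to_v2, 2: _v2_to_v3}


def upgrade(event: Event) -> Event:
    """Bring an event of any known older version up to CURRENT_VERSION."""
    version = event.get("schema_version", 1)  # v1 events carried no version tag
    if version > CURRENT_VERSION:
        raise ValueError(f"unknown future schema version {version}")
    while version < CURRENT_VERSION:
        event = UPGRADERS[version](event)
        version = event["schema_version"]
    return event


# A v1 record replayed from cold storage during a backfill:
print(upgrade({"user": "u1", "value": 3}))
```

Chaining one small migration per version keeps each change testable on its own and means a backfill over years of Parquet files does not need a separate reader for every historical schema.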
Potential Follow-up Questions
- How do you handle out-of-order events?
- How do you guarantee exactly-once semantics?
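One common answer to the out-of-order question is event-time windows driven by a watermark that trails the largest timestamp seen by a fixed bound; events for windows that have already closed go to a late-data side output instead of silently changing emitted results. The sketch below is a single-threaded toy, similar in spirit to Flink's bounded-out-of-orderness watermarks but not Flink's API; the window size, the delay bound and the choice to set late events aside (rather than update the emitted window) are assumptions.

```python
from collections import defaultdict


class TumblingWindowCounter:
    """Counts events per event-time tumbling window, closing a window once
    the watermark (max timestamp seen minus max_delay) passes its end."""

    def __init__(self, window: int, max_delay: int) -> None:
        self.window = window
        self.max_delay = max_delay
        self.watermark = float("-inf")
        self.open: dict[int, int] = defaultdict(int)  # window start -> count
        self.emitted: list[tuple[int, int]] = []      # (window start, count)
        self.late: list[int] = []                     # side output of late timestamps

    def on_event(self, ts: int) -> None:
        start = ts - ts % self.window
        if start + self.window <= self.watermark:
            self.late.append(ts)  # its window already closed
            return
        self.open[start] += 1
        self.watermark = max(self.watermark, ts - self.max_delay)
        self._fire()

    def _fire(self) -> None:
        # sorted() copies the keys, so popping while iterating is safe.
        for start in sorted(self.open):
            if start + self.window <= self.watermark:
                self.emitted.append((start, self.open.pop(start)))


counter = TumblingWindowCounter(window=10, max_delay=5)
for ts in [1, 4, 12, 9, 17, 3, 26]:  # 9 and 3 arrive out of order
    counter.on_event(ts)
print(counter.emitted, counter.late)
```

Here the event at 9 is out of order but within the 5-second bound, so it is still counted in window 0; the event at 3 arrives after window 0 has closed and lands in the late list. Raising `max_delay` trades latency for completeness. Flink additionally offers an allowed-lateness setting that re-fires an already-emitted window instead of discarding the event. For the exactly-once follow-up, the usual answer combines checkpointed source offsets and state with idempotent or transactional sinks.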