Advanced, System-Design
45 min

Design a Real-Time Analytics Pipeline

Streaming, Data, Messaging, OLAP
Interview Question

Design a pipeline to ingest, process, and query billions of events per day with second-level latency and exactly-once semantics.

Key Points to Cover
  • Ingestion: Kafka/Kinesis with partitions sized for throughput
  • Processing: Flink/Spark Structured Streaming with checkpoints
  • Storage: hot (ClickHouse/Druid/BigQuery) + cold (S3 + Parquet)
  • Semantics: idempotency/outbox, watermarking, late data handling
  • Serving: pre-aggregations, rollups, multi-tenant isolation
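As a rough illustration of "partitions sized for throughput", the sketch below turns a daily event volume into a partition count. Every input (5 billion events/day, 1,000-byte events, a 3x peak factor, 10 MB/s per partition, 50% headroom) is an assumed figure chosen for the example, not a measured limit of Kafka or Kinesis, and `partitions_needed` is a hypothetical helper.

```python
import math

def partitions_needed(events_per_day, avg_event_bytes, peak_factor,
                      per_partition_mb_s, headroom=0.5):
    """Estimate how many partitions are needed to absorb peak write throughput."""
    avg_events_per_s = events_per_day / 86_400            # seconds per day
    peak_bytes_per_s = avg_events_per_s * peak_factor * avg_event_bytes
    # Reserve a fraction of each partition's capacity for replays and growth.
    usable_bytes_per_s = per_partition_mb_s * 1_000_000 * (1 - headroom)
    return math.ceil(peak_bytes_per_s / usable_bytes_per_s)

# All inputs are illustrative assumptions, not benchmarks.
n = partitions_needed(
    events_per_day=5_000_000_000,
    avg_event_bytes=1_000,
    peak_factor=3,
    per_partition_mb_s=10,
)
print(n)  # 35
```

In an interview, the arithmetic matters more than the exact numbers: state the assumptions, derive peak bytes per second, and leave headroom so consumers can catch up after an outage.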
Evaluation Rubric
High-throughput ingestion design: 25% weight
Fault-tolerant streaming processing: 25% weight
Hot/cold storage and rollups: 25% weight
Low-latency query serving and isolation: 25% weight
Hints
  • 💡 Plan for backfills and schema evolution.
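One way to make backfills and schema evolution concrete is to upgrade every event to the current schema version before processing, so that replayed historical data and live data share one code path. The sketch below assumes a hypothetical `schema_version` field, a hypothetical `tenant_id` field added in version 2, and a default value of "unknown"; none of these come from a specific system.

```python
CURRENT_VERSION = 2

def upgrade_v1_to_v2(event):
    """Version 2 adds tenant_id; older events get a placeholder default."""
    upgraded = dict(event)
    upgraded["tenant_id"] = event.get("tenant_id", "unknown")
    upgraded["schema_version"] = 2
    return upgraded

# Maps a source version to the function that upgrades it by one version.
UPGRADERS = {1: upgrade_v1_to_v2}

def normalize(event):
    """Upgrade an event step by step until it matches CURRENT_VERSION."""
    event = dict(event)
    version = event.get("schema_version", 1)  # events without a version are treated as v1
    while version < CURRENT_VERSION:
        event = UPGRADERS[version](event)
        version = event["schema_version"]
    return event

old = {"event_id": "a", "ts": 10}
new = {"event_id": "b", "ts": 11, "tenant_id": "t1", "schema_version": 2, "extra": True}
print(normalize(old))
print(normalize(new))  # unknown fields such as "extra" pass through untouched
```

Chaining single-step upgraders keeps each schema change small, and tolerating unknown fields lets producers roll out new fields before consumers are updated.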
Potential Follow-up Questions
  • How do you handle out-of-order events?
  • How do you guarantee exactly-once processing?
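Both follow-ups can be illustrated with a toy event-time counter: a watermark that trails the largest event time seen decides when a window is final, and deduplication by event ID makes redelivered events harmless. This is a minimal in-memory sketch, not how Flink or Spark implement these features; the class name, the 60-second window, and the 30-second allowed lateness are assumptions made for the example.

```python
from collections import defaultdict

class WindowedCounter:
    """Toy tumbling-window counter using event time, a watermark, and dedup."""

    def __init__(self, window_s=60, allowed_lateness_s=30):
        self.window_s = window_s
        self.allowed_lateness_s = allowed_lateness_s
        self.max_event_time = 0
        self.open_windows = defaultdict(int)  # window start -> running count
        self.seen_ids = set()                 # unbounded here; real systems expire this state
        self.emitted = {}                     # window start -> final count
        self.dropped_late = []                # IDs that arrived after their window closed

    @property
    def watermark(self):
        # Events older than this are assumed to have already arrived.
        return self.max_event_time - self.allowed_lateness_s

    def process(self, event_id, event_time):
        if event_id in self.seen_ids:         # redelivery: ignore, so replays are idempotent
            return
        self.seen_ids.add(event_id)
        start = event_time - event_time % self.window_s
        if start + self.window_s <= self.watermark:
            self.dropped_late.append(event_id)  # could go to a side output instead
            return
        self.open_windows[start] += 1
        self.max_event_time = max(self.max_event_time, event_time)
        # Finalize every window whose end is at or before the watermark.
        for s in sorted(self.open_windows):
            if s + self.window_s <= self.watermark:
                self.emitted[s] = self.open_windows.pop(s)

counter = WindowedCounter()
events = [("a", 10), ("b", 50), ("a", 10), ("c", 70), ("d", 55), ("e", 100), ("f", 20)]
for event_id, event_time in events:
    counter.process(event_id, event_time)

print(counter.emitted)             # {0: 3}
print(dict(counter.open_windows))  # {60: 2}
print(counter.dropped_late)        # ['f']
```

Event "d" arrives out of order but before the watermark passes its window, so it is counted; "f" arrives after the window is final and is dropped. The duplicate "a" is ignored, which turns at-least-once delivery into effectively-once results. End-to-end exactly-once additionally needs the state and output to be committed atomically, for example through checkpoints combined with a transactional or idempotent sink.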