Interview Question
How do you scale a message queue system like Kafka or RabbitMQ to handle millions of messages per second?
Key Points to Cover
- Partition topics across multiple brokers
- Tune producer batching and acks
- Ensure consumer groups are balanced
- Monitor lag and rebalance as needed
- Optimize disk throughput and replication factor
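As a rough illustration of the partitioning and producer-tuning points above, the sketch below sizes a topic with a common rule of thumb: take enough partitions that neither the producer side nor the consumer side limits throughput. The per-partition rates, the message size, and the function name `estimate_partitions` are assumptions made for this example and should be replaced with numbers measured on your own cluster. The `producer_config` dictionary uses real Kafka producer setting names, but the values are only illustrative starting points.

```python
import math


def estimate_partitions(target_mb_s: float,
                        producer_mb_s: float,
                        consumer_mb_s: float) -> int:
    """Rule-of-thumb partition count.

    target_mb_s:   desired topic throughput
    producer_mb_s: measured throughput a producer achieves on ONE partition
    consumer_mb_s: measured throughput a consumer achieves on ONE partition
    """
    if min(target_mb_s, producer_mb_s, consumer_mb_s) <= 0:
        raise ValueError("all throughput values must be positive")
    needed_for_producers = math.ceil(target_mb_s / producer_mb_s)
    needed_for_consumers = math.ceil(target_mb_s / consumer_mb_s)
    # The slower side dictates how many partitions are required.
    return max(needed_for_producers, needed_for_consumers)


# Hypothetical workload: 1,000,000 messages/s at 1,000 bytes each.
target_mb_s = 1_000_000 * 1_000 / 1_000_000  # = 1000 MB/s
partitions = estimate_partitions(target_mb_s, producer_mb_s=50, consumer_mb_s=25)

# Illustrative producer settings (values are assumptions, not recommendations).
producer_config = {
    "acks": "all",              # strongest durability; "1" trades safety for latency
    "linger.ms": 10,            # wait briefly so batches can fill
    "batch.size": 256 * 1024,   # larger batches mean fewer requests per message
    "compression.type": "lz4",  # less network and disk traffic for some CPU cost
}

print(partitions)
```

The estimate is a lower bound for throughput only. Leave headroom for growth, since adding partitions later changes which partition a given key maps to.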
Evaluation Rubric
Uses partitioning effectively: 30% weight
Optimizes producers/acks: 30% weight
Ensures consumer scaling: 20% weight
Optimizes infra for throughput: 20% weight
Hints
- 💡 Think about disk I/O and partition parallelism.
Common Pitfalls to Avoid
- ⚠️ **Underestimating Network Bandwidth:** Assuming internal network capacity is sufficient without measuring it, which leads to dropped messages or high latency. Replication multiplies broker-to-broker traffic, so include it in the estimate.
- ⚠️ **Improper Partitioning Strategy:** Choosing too few partitions limits consumer parallelism and concentrates load on a few brokers; choosing too many adds metadata, file-handle, and rebalance overhead.
- ⚠️ **Ignoring Producer/Consumer Latency:** Focusing solely on broker throughput while overlooking the latency introduced by application-level producers and consumers.
- ⚠️ **Lack of Idempotency in Consumers:** Not designing consumers to handle duplicate messages gracefully, which corrupts data when messages are redelivered after retries, rebalances, or failures.
- ⚠️ **Neglecting Disk I/O Bottlenecks:** Overlooking the impact of slow disks on message persistence, which can throttle the entire system under write-heavy load.
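To make the idempotency pitfall concrete, here is a minimal sketch of a consumer-side handler that ignores redelivered messages by tracking message IDs. The class name `IdempotentHandler`, the `message_id` field, and the in-memory set are assumptions for illustration. A real consumer would need a durable store for the seen IDs and would have to update it atomically with the side effect.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class IdempotentHandler:
    """Applies each message at most once, keyed by a unique message ID."""
    processed_ids: Set[str] = field(default_factory=set)
    balance: int = 0

    def handle(self, message_id: str, amount: int) -> bool:
        """Return True if the message was applied, False if it was a duplicate."""
        if message_id in self.processed_ids:
            return False  # redelivery: skip the side effect
        self.balance += amount
        self.processed_ids.add(message_id)
        return True


handler = IdempotentHandler()
# "m1" arrives twice, as can happen after a retry or a consumer rebalance.
deliveries = [("m1", 10), ("m2", 5), ("m1", 10)]
applied = [handler.handle(mid, amount) for mid, amount in deliveries]

print(applied, handler.balance)
```

Without the ID check, the duplicate delivery of `m1` would be applied twice and the balance would be wrong.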
Potential Follow-up Questions
- ❓ How do you monitor consumer lag?
- ❓ What happens if a broker fails?
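For the consumer-lag question, one way to reason about it is that lag per partition is the log end offset minus the consumer group's committed offset. The sketch below computes that from two plain dictionaries. The offset values, the alert threshold, and the function name `partition_lag` are made up for illustration. In Kafka, the same figures are reported by the `kafka-consumer-groups.sh --describe` tool, among other sources.

```python
from typing import Dict, List


def partition_lag(end_offsets: Dict[int, int],
                  committed: Dict[int, int]) -> Dict[int, int]:
    """Lag per partition = log end offset - committed offset (never negative).

    Assumption: a partition with no committed offset is treated as offset 0.
    """
    return {
        partition: max(end - committed.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    }


def lagging_partitions(lag: Dict[int, int], threshold: int) -> List[int]:
    """Partitions whose lag exceeds the alert threshold, in sorted order."""
    return sorted(p for p, value in lag.items() if value > threshold)


# Hypothetical snapshot of one consumer group on a three-partition topic.
end_offsets = {0: 1500, 1: 1200, 2: 900}
committed = {0: 1450, 1: 1200, 2: 400}

lag = partition_lag(end_offsets, committed)
total_lag = sum(lag.values())
alerts = lagging_partitions(lag, threshold=100)

print(lag, total_lag, alerts)
```

A single snapshot says little on its own. Lag that keeps growing over time is the signal that consumers cannot keep up and need more instances or more partitions.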