Intermediate • System-Design
30 min
Design a Distributed Job Scheduler
Interview Question
Design a reliable, horizontally scalable scheduler (distributed cron) that supports one-off and recurring jobs with retries and idempotency.
Key Points to Cover
- Job model: due time, retry policy, idempotency keys
- Leader election or partitioned ownership; fencing tokens
- Durable storage with visibility timeouts and leases
- At-least-once execution with dedupe and DLQs
- Observability: job states, SLAs, replays, and pause/resume
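The job model, lease, and fencing-token points above can be sketched roughly as follows. This is a minimal in-memory illustration, not a reference design: the `Job` fields, the `InMemoryJobStore` class, and its method names are all hypothetical, and a real scheduler would perform the lease and completion steps atomically in durable storage.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass
class Job:
    job_id: str
    due_time: float                  # epoch seconds when the job becomes runnable
    idempotency_key: str             # lets downstream effects dedupe retried runs
    max_attempts: int = 5            # retry policy: attempts allowed before the DLQ
    attempts: int = 0
    state: str = "pending"           # pending -> leased -> done
    lease_owner: Optional[str] = None
    lease_expires_at: float = 0.0    # visibility timeout deadline
    fencing_token: int = 0           # incremented on every successful lease


class InMemoryJobStore:
    """Hypothetical stand-in for a durable store (assumption: single process)."""

    def __init__(self) -> None:
        self.jobs: Dict[str, Job] = {}

    def add(self, job: Job) -> None:
        self.jobs[job.job_id] = job

    def lease(self, worker: str, now: float, visibility_timeout: float) -> Optional[Job]:
        """Lease the earliest due job that is pending or whose lease has expired."""
        candidates = [
            j for j in self.jobs.values()
            if j.due_time <= now
            and (j.state == "pending"
                 or (j.state == "leased" and j.lease_expires_at <= now))
        ]
        if not candidates:
            return None
        job = min(candidates, key=lambda j: j.due_time)
        job.state = "leased"
        job.lease_owner = worker
        job.lease_expires_at = now + visibility_timeout
        job.fencing_token += 1
        job.attempts += 1
        return replace(job)  # snapshot, so the caller keeps the token it was given

    def complete(self, job_id: str, fencing_token: int) -> bool:
        """Accept completion only from the holder of the latest fencing token."""
        job = self.jobs[job_id]
        if job.state != "leased" or job.fencing_token != fencing_token:
            return False  # stale worker: its lease expired and was taken over
        job.state = "done"
        return True
```

If worker A stalls past its visibility timeout and worker B takes over the job, A's later call to `complete` carries an older token and is rejected. That rejection is what keeps at-least-once delivery from turning into conflicting writes.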
Evaluation Rubric
Clear job model & idempotency (25% weight)
Safe partition/lease ownership (25% weight)
Durable storage & retries/DLQs (25% weight)
Operational visibility & controls (25% weight)
Hints
- 💡Use time wheels or priority queues for timers.
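As one way to picture the priority-queue option from the hint, here is a small sketch built on Python's `heapq`. The `TimerQueue` class and its method names are illustrative assumptions, and it only covers the in-memory timer, not persistence or recovery after a crash.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class TimerQueue:
    """Min-heap of (due_time, sequence, job_id); the earliest due job is on top."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._seq = itertools.count()  # tie-breaker: equal due times stay FIFO

    def schedule(self, job_id: str, due_time: float) -> None:
        heapq.heappush(self._heap, (due_time, next(self._seq), job_id))

    def pop_due(self, now: float) -> List[str]:
        """Remove and return every job whose due time has passed, earliest first."""
        due: List[str] = []
        while self._heap and self._heap[0][0] <= now:
            _, _, job_id = heapq.heappop(self._heap)
            due.append(job_id)
        return due

    def seconds_until_next(self, now: float) -> Optional[float]:
        """How long a dispatcher loop could sleep; None when nothing is scheduled."""
        if not self._heap:
            return None
        return max(0.0, self._heap[0][0] - now)
```

Insertion and removal are both O(log n). A time wheel trades that for O(1) insertion at a fixed tick resolution, which tends to matter only when the number of pending timers is very large.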
Common Pitfalls to Avoid
- ⚠️**Lack of Idempotency:** Failing to design for idempotency can lead to data corruption or inconsistent system states when jobs are retried.
- ⚠️**Single Point of Failure:** Not implementing distributed coordination or leader election for job dispatch can make the scheduler vulnerable to single-node failures.
- ⚠️**Job Loss During Failures:** Improper handling of worker failures without visibility timeouts or durable queues can result in jobs being lost indefinitely.
- ⚠️**Clock Skew:** Node clocks drift apart, so comparing `due_time` against each node's local wall clock can fire jobs early or late; evaluating due times against a single time source, or using relative timers or an event-driven approach, is usually more robust.
- ⚠️**Infinite Retries Without Backoff:** Aggressively retrying failed jobs without exponential backoff or a cap on attempts can overload downstream services and the scheduler itself.
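The retry pitfall above is commonly handled with capped exponential backoff plus a dead-letter queue. The sketch below uses the "full jitter" variant, which is a choice made here for illustration rather than something the question prescribes. The function names, the `base` and `cap` defaults, and the return shape are all assumptions.

```python
import random
from typing import Optional, Tuple


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 300.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    source = rng if rng is not None else random.Random()
    ceiling = min(cap, base * (2 ** attempt))
    return source.uniform(0.0, ceiling)


def next_action(attempts_so_far: int, max_attempts: int,
                now: float) -> Tuple[str, Optional[float]]:
    """Decide what to do after a failure.

    Returns ("retry", next_due_time) while attempts remain,
    otherwise ("dead_letter", None) so the job goes to the DLQ.
    """
    if attempts_so_far >= max_attempts:
        return ("dead_letter", None)
    return ("retry", now + backoff_delay(attempts_so_far))
```

The jitter spreads retries from many failed jobs over time instead of letting them hit a recovering downstream service in synchronized waves, and the attempt limit guarantees that a permanently failing job eventually lands in the DLQ for inspection.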
Potential Follow-up Questions
- ❓How to avoid clock skew issues?
- ❓How to guarantee exactly-once for certain jobs?