Intermediate · System Design
30 min
Design a Distributed Job Scheduler
Scheduling · Reliability · Databases · Messaging
Interview Question
Design a reliable, horizontally scalable job scheduler (a distributed cron) that supports one-off and recurring jobs, with retries and idempotency.
Key Points to Cover
- Job model: due time, retry policy, idempotency keys
- Leader election or partitioned ownership; fencing tokens
- Durable storage with visibility timeouts and leases
- At-least-once execution with deduplication and dead-letter queues (DLQs)
- Observability: job states, SLAs, replays, and pause/resume
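The key points above can be sketched as a small in-memory job store. This is a minimal illustration under assumptions, not a reference design. The names `Job`, `JobStore`, `acquire`, `complete` and `fail` are hypothetical, and the dict stands in for a durable table. The clock is injected so that only the store's clock judges lease expiry, not each worker's. Every lease grant issues a larger fencing token, and completions carrying an older token are rejected. An expired lease counts as a failed attempt. Failures retry with exponential backoff until `max_attempts` is reached, then go to the DLQ.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Job:
    job_id: str
    idempotency_key: str
    due_at: float
    max_attempts: int = 3
    base_backoff: float = 1.0  # seconds; doubled on each retry
    attempts: int = 0
    state: str = "PENDING"  # PENDING -> LEASED -> SUCCEEDED | DEAD
    lease_owner: Optional[str] = None
    lease_expires_at: float = 0.0
    fencing_token: int = 0


class JobStore:
    """Hypothetical in-memory stand-in for a durable jobs table."""

    def __init__(self, clock: Callable[[], float]):
        self.clock = clock  # the store's clock is the only one that judges leases
        self.jobs: dict[str, Job] = {}
        self.by_key: dict[str, str] = {}  # idempotency key -> job id
        self.dlq: list[str] = []
        self._last_token = 0

    def submit(self, job: Job) -> str:
        # Resubmitting with a known idempotency key returns the existing job id.
        existing = self.by_key.get(job.idempotency_key)
        if existing is not None:
            return existing
        self.jobs[job.job_id] = job
        self.by_key[job.idempotency_key] = job.job_id
        return job.job_id

    def acquire(self, worker: str, lease_secs: float) -> Optional[tuple[Job, int]]:
        now = self.clock()
        # An expired lease counts as a failed attempt (the worker may have crashed).
        for job in self.jobs.values():
            if job.state == "LEASED" and job.lease_expires_at <= now:
                self._retry_or_dead_letter(job, now)
        due = [j for j in self.jobs.values() if j.state == "PENDING" and j.due_at <= now]
        if not due:
            return None
        job = min(due, key=lambda j: j.due_at)  # earliest due job first
        self._last_token += 1  # monotonically increasing fencing token
        job.state = "LEASED"
        job.lease_owner = worker
        job.lease_expires_at = now + lease_secs
        job.fencing_token = self._last_token
        job.attempts += 1
        return job, job.fencing_token

    def _holds_lease(self, job: Job, token: int) -> bool:
        # Reject stale holders: superseded token, or lease already expired.
        return (job.state == "LEASED" and job.fencing_token == token
                and job.lease_expires_at > self.clock())

    def complete(self, job_id: str, token: int) -> bool:
        job = self.jobs[job_id]
        if not self._holds_lease(job, token):
            return False
        job.state = "SUCCEEDED"
        job.lease_owner = None
        return True

    def fail(self, job_id: str, token: int) -> bool:
        job = self.jobs[job_id]
        if not self._holds_lease(job, token):
            return False
        self._retry_or_dead_letter(job, self.clock())
        return True

    def _retry_or_dead_letter(self, job: Job, now: float) -> None:
        job.lease_owner = None
        if job.attempts >= job.max_attempts:
            job.state = "DEAD"
            self.dlq.append(job.job_id)
        else:
            # Exponential backoff: base, 2*base, 4*base, ...
            job.state = "PENDING"
            job.due_at = now + job.base_backoff * 2 ** (job.attempts - 1)
```

A real deployment would need `acquire` to be a single conditional update in the database, claiming a row only if it is still unleased or its lease has expired. Otherwise two scheduler nodes could grab the same job. The `state` field is also what dashboards and pause/resume controls would read and write.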
Evaluation Rubric
Clear job model & idempotency: 25% weight
Safe partition/lease ownership: 25% weight
Durable storage & retries/DLQs: 25% weight
Operational visibility & controls: 25% weight
Hints
- 💡 Use timing wheels or priority queues (min-heaps) to track pending timers.
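The priority-queue option from the hint can be sketched with Python's `heapq` module. `TimerQueue`, `schedule`, `next_due` and `pop_due` are hypothetical names. Recurring jobs are rescheduled at a fixed rate. Ticks missed during a pause are skipped, so a late recurring job fires once rather than in a burst. That is an assumed policy; some schedulers replay missed runs instead. A hashed timing wheel would trade the heap's O(log n) insert for O(1) bucketed inserts at a fixed tick granularity.

```python
import heapq
import itertools
from typing import Optional


class TimerQueue:
    """Min-heap of (due_time, seq, name, interval); interval=None means one-off."""

    def __init__(self) -> None:
        self._heap: list = []
        self._seq = itertools.count()  # tie-breaker so equal due times never compare names

    def schedule(self, name: str, due: float, interval: Optional[float] = None) -> None:
        heapq.heappush(self._heap, (due, next(self._seq), name, interval))

    def next_due(self) -> Optional[float]:
        # A dispatcher would sleep until this time (or until a new job arrives).
        return self._heap[0][0] if self._heap else None

    def pop_due(self, now: float) -> list:
        fired = []
        while self._heap and self._heap[0][0] <= now:
            due, _, name, interval = heapq.heappop(self._heap)
            fired.append(name)
            if interval is not None:
                # Next tick strictly after `now`; intermediate missed ticks are skipped.
                nxt = due + (int((now - due) // interval) + 1) * interval
                self.schedule(name, nxt, interval)
        return fired
```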
Potential Follow-up Questions
- ❓ How do you avoid clock-skew issues?
- ❓ How would you guarantee exactly-once execution for certain jobs?
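One common answer to both follow-ups keeps the scheduler at-least-once and gets exactly-once effects from the job's sink. The sink records the idempotency key in the same transaction as the side effect. It also rejects writes whose fencing token is older than the newest one it has seen. Ordering then comes from store-issued tokens rather than timestamps, so skew between worker clocks cannot let a stale worker overwrite a newer one. This is a minimal sketch using the standard-library `sqlite3` module. The table names and `apply_once` are hypothetical, and the `ledger` insert stands in for any real side effect.

```python
import sqlite3

SCHEMA = """
CREATE TABLE processed (idempotency_key TEXT PRIMARY KEY);
CREATE TABLE ledger (account TEXT NOT NULL, amount INTEGER NOT NULL);
CREATE TABLE fences (resource TEXT PRIMARY KEY, token INTEGER NOT NULL);
"""


def apply_once(conn: sqlite3.Connection, key: str, account: str,
               amount: int, resource: str, token: int) -> str:
    """Apply a side effect at most once per idempotency key.

    Returns "applied", "duplicate" (key already processed), or
    "stale" (a newer fencing token has already written to this resource).
    """
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading the fence
    try:
        row = conn.execute(
            "SELECT token FROM fences WHERE resource = ?", (resource,)
        ).fetchone()
        if row is not None and token < row[0]:
            conn.execute("ROLLBACK")
            return "stale"
        # The PRIMARY KEY makes a second insert of the same key raise IntegrityError.
        conn.execute("INSERT INTO processed VALUES (?)", (key,))
        conn.execute("INSERT INTO ledger VALUES (?, ?)", (account, amount))
        conn.execute("INSERT OR REPLACE INTO fences VALUES (?, ?)", (resource, token))
        conn.execute("COMMIT")
        return "applied"
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # undo the whole transaction, including the side effect
        return "duplicate"
    except BaseException:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None disables implicit transactions so BEGIN/COMMIT are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript(SCHEMA)
```

This only yields exactly-once effects when the side effect lives in the same database as the dedupe table. For external systems such as email or payment APIs, the usual fallback is to pass the idempotency key downstream, if that service supports one.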