Intermediate, System-Design
30 min

Design a Distributed Job Scheduler

Scheduling, Reliability, Databases, Messaging
Interview Question

Design a reliable, horizontally scalable scheduler (distributed cron) that supports one-off and recurring jobs with retries and idempotency.

Key Points to Cover
  • Job model: due time, retry policy, idempotency keys
  • Leader election or partitioned ownership; fencing tokens
  • Durable storage with visibility timeouts and leases
  • At-least-once execution with dedupe and DLQs
  • Observability: job states, SLAs, replays, and pause/resume
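
The job model and lease ownership points above can be pictured with a minimal sketch. It assumes an in-memory dictionary standing in for durable storage. `Job`, `LeaseStore`, `claim`, and `complete` are hypothetical names chosen for illustration, and a real store would need to perform each method as a single atomic conditional update.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Job:
    job_id: str
    due_at: float                  # epoch seconds when the job becomes runnable
    idempotency_key: str           # lets handlers dedupe repeated executions
    max_attempts: int = 3          # retry policy: attempts before dead-lettering
    attempts: int = 0
    state: str = "pending"         # pending | leased | done | dead
    lease_owner: Optional[str] = None
    lease_expires_at: float = 0.0  # acts as the visibility timeout
    fencing_token: int = 0         # increases on every successful claim


class LeaseStore:
    """In-memory stand-in (an assumption) for a durable job table."""

    def __init__(self) -> None:
        self.jobs: Dict[str, Job] = {}

    def add(self, job: Job) -> None:
        self.jobs[job.job_id] = job

    def claim(self, job_id: str, worker: str, now: float,
              lease_seconds: float) -> Optional[int]:
        """Try to lease a job; return a fencing token, or None if not claimable."""
        job = self.jobs[job_id]
        if job.state in ("done", "dead") or now < job.due_at:
            return None
        if job.state == "leased" and now < job.lease_expires_at:
            return None  # another worker still holds a live lease
        if job.attempts >= job.max_attempts:
            job.state = "dead"  # a real system would move it to a DLQ here
            return None
        job.attempts += 1
        job.state = "leased"
        job.lease_owner = worker
        job.lease_expires_at = now + lease_seconds
        job.fencing_token += 1
        return job.fencing_token

    def complete(self, job_id: str, token: int) -> bool:
        """Accept completion only from the holder of the newest token."""
        job = self.jobs[job_id]
        if job.state != "leased" or token != job.fencing_token:
            return False  # stale worker: its lease was superseded
        job.state = "done"
        return True
```

In this sketch, a worker whose lease expired and was reclaimed by another worker still holds an old token, so its late `complete` call is rejected. Execution remains at-least-once, because the job may have run twice before the stale completion was refused.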
Evaluation Rubric
  • Clear job model & idempotency: 25% weight
  • Safe partition/lease ownership: 25% weight
  • Durable storage & retries/DLQs: 25% weight
  • Operational visibility & controls: 25% weight
Hints
  • 💡 Use timing wheels or priority queues for timers.
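
As an illustration of the priority-queue option in the hint, here is a minimal sketch of a timer queue built on Python's `heapq`. `TimerQueue`, `schedule`, `pop_due`, and `next_wakeup` are hypothetical names, and the sketch assumes a single process holding timers in memory. A timing wheel would trade the O(log n) insert for O(1) insertion into time buckets.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class TimerQueue:
    """Min-heap of (due_at, sequence, job_id); the earliest due time is on top."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        # The sequence number breaks ties so equal due times pop in insert order.
        self._seq = itertools.count()

    def schedule(self, due_at: float, job_id: str) -> None:
        heapq.heappush(self._heap, (due_at, next(self._seq), job_id))

    def pop_due(self, now: float) -> List[str]:
        """Remove and return every job whose due time is at or before `now`."""
        due: List[str] = []
        while self._heap and self._heap[0][0] <= now:
            _, _, job_id = heapq.heappop(self._heap)
            due.append(job_id)
        return due

    def next_wakeup(self) -> Optional[float]:
        """Due time of the earliest pending job, or None if the queue is empty."""
        return self._heap[0][0] if self._heap else None
```

A polling loop could sleep until `next_wakeup()` and then hand the result of `pop_due(now)` to workers, rather than scanning every job on each tick.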
Potential Follow-up Questions
  • How would you avoid clock skew issues?
  • How would you guarantee exactly-once execution for certain jobs?
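
One common direction for the exactly-once question is to keep at-least-once delivery and make the effect idempotent, so repeated runs collapse into one. The sketch below is a minimal illustration under stated assumptions: `IdempotentExecutor` and `run` are hypothetical names, and the in-memory dictionary stands in for a durable table. In a real system the result record and the side effect would have to commit atomically, otherwise a crash between the two could still duplicate the effect.

```python
from typing import Any, Callable, Dict


class IdempotentExecutor:
    """Runs a handler at most once per idempotency key and replays the result."""

    def __init__(self) -> None:
        # Assumption: stands in for a durable key -> result table.
        self._results: Dict[str, Any] = {}

    def run(self, idempotency_key: str, handler: Callable[[], Any]) -> Any:
        if idempotency_key in self._results:
            # Duplicate delivery: return the stored result, skip the side effect.
            return self._results[idempotency_key]
        result = handler()
        self._results[idempotency_key] = result
        return result
```

With this shape, a retry of the same job carries the same key and returns the recorded result, while a new occurrence of a recurring job would use a new key (for example, one derived from the job ID and its scheduled time).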