Code Room
System design · Hard
Question
Design a scheduled-jobs / cron platform that lets ~10K internal services register recurring jobs (cron expressions) and one-off delayed tasks. Scale: ~2M scheduled executions/day, jobs range from 100ms to 30min, some are critical (billing runs) and must fire exactly on time. Requirements: a job must fire even if a scheduler node crashes, must NOT fire twice (or must be idempotent), late jobs (missed window during an outage) need a defined catch-up policy, and the system must scale horizontally without two schedulers double-firing the same job.
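As a starting point for the scale discussion, the figures in the question can be turned into rough rates. The sketch below is a back-of-envelope calculation only: it assumes executions are spread evenly over the day (real cron workloads tend to cluster on minute and hour boundaries, so peaks will be higher), and the function names are illustrative, not part of any real system.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(executions_per_day: float) -> float:
    """Mean executions started per second, assuming a perfectly even spread."""
    return executions_per_day / SECONDS_PER_DAY


def mean_concurrency(rate_per_second: float, mean_duration_seconds: float) -> float:
    """Little's law: jobs in flight = arrival rate x mean time in the system."""
    return rate_per_second * mean_duration_seconds


# Figures taken from the question: ~2M executions/day, durations from 100 ms to 30 min.
rate = average_rate(2_000_000)
low = mean_concurrency(rate, 0.1)       # bound if every job took 100 ms
high = mean_concurrency(rate, 30 * 60)  # bound if every job took 30 min

print(f"average rate: {rate:.1f} executions/s")
print(f"in flight if every job took 100 ms: {low:.1f}")
print(f"in flight if every job took 30 min: {high:,.0f}")
```

The average comes out at roughly 23 executions per second, with somewhere between about 2 and about 42,000 jobs in flight depending on the duration mix. The true duration distribution is not given in the question, so it is worth asking for before sizing the worker fleet.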
What a strong answer looks like
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
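One way to make the consistency and failure-mode discussion concrete is a claim step that only one scheduler can win. The sketch below is a minimal illustration, not a reference design: it assumes a shared SQL store (SQLite in memory stands in for it here), and the `runs` table, its columns, and the `try_claim` and `mark_done` helpers are hypothetical names. Each scheduled firing is a row keyed by `(job_id, fire_at)`, and a conditional `UPDATE` hands it to at most one owner until the lease expires.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    """Create the hypothetical runs table; one row per scheduled firing."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE runs ("
        " job_id TEXT, fire_at INTEGER, status TEXT,"
        " owner TEXT, lease_until INTEGER,"
        " PRIMARY KEY (job_id, fire_at))"
    )
    return conn


def try_claim(conn, job_id, fire_at, owner, now, lease_seconds=60) -> bool:
    """Return True if this scheduler won the right to fire the run."""
    # The primary key makes registering the same firing twice a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO runs VALUES (?, ?, 'pending', NULL, 0)",
        (job_id, fire_at),
    )
    # Compare-and-set: succeeds only if the run is unfinished and either
    # unowned or held under a lease that has already expired.
    cur = conn.execute(
        "UPDATE runs SET owner = ?, lease_until = ?"
        " WHERE job_id = ? AND fire_at = ? AND status = 'pending'"
        " AND (owner IS NULL OR lease_until < ?)",
        (owner, now + lease_seconds, job_id, fire_at, now),
    )
    conn.commit()
    return cur.rowcount == 1


def mark_done(conn, job_id, fire_at, owner) -> bool:
    """Record completion so the run can never be claimed again."""
    cur = conn.execute(
        "UPDATE runs SET status = 'done'"
        " WHERE job_id = ? AND fire_at = ? AND owner = ?",
        (job_id, fire_at, owner),
    )
    conn.commit()
    return cur.rowcount == 1


conn = make_store()
first = try_claim(conn, "billing", 1000, "node-a", now=1000)     # node-a wins
second = try_claim(conn, "billing", 1000, "node-b", now=1010)    # lease still held
takeover = try_claim(conn, "billing", 1000, "node-b", now=1100)  # lease expired
finished = mark_done(conn, "billing", 1000, "node-b")
after_done = try_claim(conn, "billing", 1000, "node-a", now=2000)
```

The trade-off to name here is that lease expiry cannot distinguish a crashed owner from a slow one, so a takeover can still run a job twice. That is why the question pairs "must not fire twice" with idempotency: the job handler, or a fencing check on `owner` when it writes its result, has to tolerate a second attempt.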