Code Room
System design · Medium
Question
Design a distributed delayed-job (scheduled-task) system that can hold hundreds of millions of pending jobs, each scheduled to fire at an arbitrary future time, anywhere from seconds to a year out: for example, 'send this reminder in 3 days', 'expire this hold in 15 minutes', or 'retry this charge at 9am tomorrow'. Jobs must fire close to on time (within a few seconds) and exactly-ish once, and the system must absorb spikes in which millions of jobs fall due in the same minute (e.g., a midnight batch). How do you store, schedule, and dispatch these jobs at scale?
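One common way to approach the store/schedule/dispatch parts is a sharded, time-bucketed job table with lease-based claiming. The sketch below is an in-memory stand-in written under stated assumptions, not a reference answer: `ShardedDelayQueue`, `BUCKET_SECONDS`, `NUM_SHARDS`, and the 30-second lease are all hypothetical. Jobs are hashed to a shard by `job_id` so that a midnight burst spreads across shards, bucketed by minute so a poller only scans buckets that are already due, and leased rather than deleted on claim so that a crashed worker's jobs are re-offered. Because a lease can expire while the first worker is still running, this gives at-least-once delivery; recording completed `job_id`s and keeping handlers idempotent is what makes it behave as "exactly-ish once".

```python
import hashlib
import heapq
from dataclasses import dataclass, field

# Hypothetical tuning knobs; real values would come from measured load.
BUCKET_SECONDS = 60
NUM_SHARDS = 8


@dataclass(order=True)
class Job:
    fire_at: float                      # epoch seconds when the job is due
    job_id: str = field(compare=False)  # idempotency key
    payload: str = field(compare=False)


class ShardedDelayQueue:
    """In-memory stand-in for a sharded, time-bucketed job table (hypothetical)."""

    def __init__(self, num_shards: int = NUM_SHARDS) -> None:
        self.num_shards = num_shards
        # (shard, bucket) -> min-heap of jobs ordered by fire_at
        self.buckets: dict[tuple[int, int], list[Job]] = {}
        # job_id -> (job, lease expiry) for jobs handed to a worker
        self.leased: dict[str, tuple[Job, float]] = {}
        self.completed: set[str] = set()

    def shard_of(self, job_id: str) -> int:
        # Hash on job_id, not on time, so jobs due in the same minute
        # are spread across all shards instead of one hot partition.
        digest = hashlib.sha256(job_id.encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.num_shards

    def schedule(self, job: Job) -> None:
        key = (self.shard_of(job.job_id), int(job.fire_at // BUCKET_SECONDS))
        heapq.heappush(self.buckets.setdefault(key, []), job)

    def claim_due(self, shard: int, now: float, lease: float = 30.0) -> list[Job]:
        """Pop every due job in `shard` and lease it to the caller."""
        # Re-offer jobs whose lease expired (the worker presumably crashed).
        for job_id, (job, expiry) in list(self.leased.items()):
            if expiry <= now and self.shard_of(job_id) == shard:
                del self.leased[job_id]
                self.schedule(job)
        claimed = []
        current = int(now // BUCKET_SECONDS)
        for (s, bucket), heap in self.buckets.items():
            if s != shard or bucket > current:
                continue  # other shard, or a bucket that is not due yet
            while heap and heap[0].fire_at <= now:
                job = heapq.heappop(heap)
                if job.job_id in self.completed:
                    continue  # already acked once; skip the redelivery
                self.leased[job.job_id] = (job, now + lease)
                claimed.append(job)
        return claimed

    def ack(self, job_id: str) -> None:
        # The worker confirms success: drop the lease and remember the id.
        self.leased.pop(job_id, None)
        self.completed.add(job_id)
```

In a real deployment the dict of heaps would typically be a durable store (for example, a table ordered by shard, bucket, and fire time), and each shard would be polled by a single dispatcher at a time. Jobs a year out simply sit in distant buckets and cost nothing to the poller until their bucket comes due.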
What a strong answer looks like
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
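A back-of-envelope estimate is a quick way to "clarify scale" before choosing components. The helpers below are a minimal sketch; the figures used with them (500 million pending jobs, 200 bytes per job record, 5 million jobs due at midnight, 2,000 dispatches per second per worker) are illustrative assumptions, not numbers from the question.

```python
import math


def storage_gb(pending_jobs: int, bytes_per_job: int) -> float:
    """Raw size of the pending-job table, before replication and indexes."""
    return pending_jobs * bytes_per_job / 1e9


def workers_needed(jobs_due: int, drain_seconds: float, per_worker_rate: float) -> int:
    """Dispatch workers needed to drain a burst within the allowed window."""
    rate = jobs_due / drain_seconds  # jobs per second the system must sustain
    return math.ceil(rate / per_worker_rate)


# Hypothetical inputs for illustration only.
print(storage_gb(500_000_000, 200))           # 100.0 GB
print(workers_needed(5_000_000, 60, 2_000))   # burst spread over the minute: 42
print(workers_needed(5_000_000, 5, 2_000))    # all due at 00:00:00, 5 s tolerance: 500
```

The gap between the last two results is the core trade-off behind the midnight spike: if the burst must fire within a few seconds rather than across the minute, the system needs roughly an order of magnitude more dispatch capacity, or it has to pre-stage due jobs ahead of time.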