Code Room
System design · Medium
Question
Design a job/task queue with a pool of worker processes for a background-processing platform: producers enqueue ~20k jobs/sec, jobs take 50ms–5min, and the system must guarantee at-least-once execution even if a worker dies mid-job. Requirements: no job is silently lost, a stuck job is retried, and a single poison job can't block its queue forever. Describe enqueue/dequeue, how in-flight jobs are tracked, and how concurrent workers avoid running the same job twice in the common case.
What a strong answer looks like
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
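One way to make the "hard parts" concrete is a lease (visibility timeout) model: dequeuing a job moves it to an in-flight table with a deadline, an acknowledgement removes it, and an expired lease returns it to the ready queue until an attempt cap sends it to a dead-letter list. The sketch below is a minimal, single-process, in-memory illustration of that idea, not a reference solution. The names LeaseQueue, lease_seconds, max_attempts, and the injectable clock are all hypothetical, and a real answer at ~20k jobs/sec would need a durable, partitioned store, lease renewal (heartbeats) for jobs that run up to 5 minutes, and idempotent handlers.

```python
import time
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    payload: str
    attempts: int = 0  # number of times this job has been leased to a worker


class LeaseQueue:
    """In-memory sketch of at-least-once delivery using leases (hypothetical API)."""

    def __init__(self, lease_seconds=30.0, max_attempts=3, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.max_attempts = max_attempts
        self.clock = clock          # injectable so tests can control time
        self.ready = deque()        # jobs waiting for a worker
        self.in_flight = {}         # job_id -> (Job, lease deadline)
        self.dead = []              # jobs that exhausted their attempts

    def enqueue(self, job_id, payload):
        self.ready.append(Job(job_id, payload))

    def dequeue(self):
        """Lease the next ready job, or return None if nothing is ready."""
        self._reap_expired()
        if not self.ready:
            return None
        job = self.ready.popleft()
        job.attempts += 1
        # Only the worker holding the lease sees the job until the deadline,
        # which avoids duplicate execution in the common case.
        self.in_flight[job.job_id] = (job, self.clock() + self.lease_seconds)
        return job

    def ack(self, job_id):
        """Mark a job as done. Returns False if it was not in flight."""
        return self.in_flight.pop(job_id, None) is not None

    def _reap_expired(self):
        """Requeue jobs whose lease expired; dead-letter repeat offenders."""
        now = self.clock()
        expired = [jid for jid, (_, deadline) in self.in_flight.items()
                   if deadline <= now]
        for jid in expired:
            job, _ = self.in_flight.pop(jid)
            if job.attempts >= self.max_attempts:
                self.dead.append(job)    # poison job stops blocking the queue
            else:
                self.ready.append(job)   # worker presumed dead: retry
```

In this sketch a worker that dies mid-job simply never calls ack, so the job reappears after lease_seconds. A slow but healthy worker can also lose its lease and cause a second execution, which is why the guarantee is at-least-once rather than exactly-once.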