One worker can only do one thing at a time — when arrivals outpace its service rate, the queue backs up without bound.
A single-threaded event loop or worker — a Node.js process, a Redis server, one CPU draining a queue — handles tasks strictly one at a time. While its throughput stays ahead of the arrival rate, the queue stays short and waits stay flat.
The moment a slow or blocking task appears, or arrivals climb past the one-task-per-tick service rate, utilization presses toward 100%. Queueing theory is unforgiving here: as utilization approaches 1, the expected wait runs off toward infinity. One slow task head-of-line blocks everyone behind it. And piling more requests onto an already-saturated thread does not help — you can only shed load, make the task faster, or parallelize across more workers.
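To put numbers on how sharply the wait climbs, here is a minimal sketch that assumes the textbook M/M/1 model (Poisson arrivals, exponentially distributed service times, one server). The text above does not commit to that model, and the function name `mm1_mean_wait` is illustrative only. Under those assumptions the mean time a task spends waiting in the queue is ρ / (μ − λ), where λ is the arrival rate, μ the service rate, and ρ = λ / μ the utilization.

```python
def mm1_mean_wait(arrival_rate: float, service_rate: float) -> float:
    """Mean time spent waiting in queue (excluding service), in seconds.

    Assumes an M/M/1 queue; real workloads may be burstier or smoother.
    """
    if arrival_rate >= service_rate:
        # No steady state exists: the backlog grows for as long as load persists.
        return float("inf")
    rho = arrival_rate / service_rate  # utilization, between 0 and 1
    return rho / (service_rate - arrival_rate)


# A worker that serves 1000 tasks/s, driven at increasing utilization.
for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    wait_ms = mm1_mean_wait(rho * 1000.0, 1000.0) * 1000.0
    print(f"utilization {rho:.2f}: mean queueing delay ~{wait_ms:.1f} ms")
```

In this model, going from 50% to 95% utilization multiplies the mean queueing delay by about 19, and the last few percent before saturation cost far more than the first fifty.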
A single consumer pulls one task off the front of a FIFO queue and runs it to completion before touching the next. There is no second worker to overlap with, so the time spent inside process() is time the loop cannot accept or advance anything else.

    queue = FIFO()                  # tasks waiting their turn
    while True:
        task = queue.pop_front()    # take the oldest waiter
        process(task)               # run it to completion, ONE at a time
        # nothing else happens until process() returns.
        # if process() blocks (slow I/O, heavy CPU), the
        # whole loop stalls and every queued task waits.
    # steady state: if arrival_rate > service_rate, the
    # queue grows without bound. utilization → 1 ⇒ wait → ∞.

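The loop above can be replayed deterministically to see head-of-line blocking in numbers. This is a small sketch, not a model of any particular runtime: the function name `run_single_worker` and the task timings are made up for illustration. Each task is an `(arrival_time, service_time)` pair in milliseconds, and the single worker serves them strictly in arrival order.

```python
def run_single_worker(tasks):
    """Return how long each task waited before the worker started it.

    tasks: list of (arrival_time, service_time), sorted by arrival_time.
    """
    waits = []
    free_at = 0  # time at which the worker finishes its current task
    for arrival, service in tasks:
        start = max(free_at, arrival)  # cannot start before the worker is free
        waits.append(start - arrival)
        free_at = start + service      # runs to completion, no overlap
    return waits


# Tasks arrive every 2 ms and take 1 ms each, except one 10 ms slow task.
tasks = [(0, 1), (2, 10), (4, 1), (6, 1), (8, 1),
         (10, 1), (12, 1), (14, 1), (16, 1), (18, 1)]
print(run_single_worker(tasks))  # [0, 0, 8, 7, 6, 5, 4, 3, 2, 1]
```

The one slow task delays every fast task behind it. The backlog drains here only because the worker runs at about 50% utilization outside the stall; at higher utilization the same stall would take proportionally longer to clear.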
| Aspect | Cost | Signal to watch |
|---|---|---|
| Throughput ceiling | Hard cap at 1 / mean service time — one thread, no overlap | Completed-per-second flattening while arrivals keep rising |
| Latency under load | Rises steeply once utilization passes roughly 80%, and without bound as it approaches 100% | p99 wait climbing far faster than the mean |
| Head-of-line blocking | One slow task stalls every task behind it, fast or not | Queue age spiking while a single task runs long |
| Simplicity | No locks, no races — one task's state at a time, easy to reason about | Tempting to lean on it past the point it can keep up |
| Mitigations | Shed load, parallelize, shard the queue, or offload CPU work off the loop | Queue depth still rising after you "added capacity" the wrong way |
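Of the mitigations in the table, shedding load is the simplest to sketch. The example below assumes a hypothetical `BoundedQueue` class with a fixed maximum depth; the name, the depth of 5, and the tick-based load are illustrative choices rather than anything prescribed above. When the queue is full, new tasks are rejected immediately instead of waiting behind a backlog the worker cannot clear.

```python
from collections import deque


class BoundedQueue:
    """FIFO queue that rejects new tasks once it holds max_depth items."""

    def __init__(self, max_depth):
        self.max_depth = max_depth
        self.items = deque()
        self.rejected = 0  # count of tasks shed because the queue was full

    def offer(self, task):
        """Enqueue task if there is room; return False if it was shed."""
        if len(self.items) >= self.max_depth:
            self.rejected += 1
            return False
        self.items.append(task)
        return True

    def take(self):
        """Dequeue the oldest task, or return None if the queue is empty."""
        return self.items.popleft() if self.items else None


# Overload: 3 tasks arrive per tick, but the worker completes only 2 per tick.
q = BoundedQueue(max_depth=5)
for tick in range(100):
    for i in range(3):
        q.offer((tick, i))
    for _ in range(2):
        q.take()

print(len(q.items), q.rejected)  # 3 97
```

Queue depth, and therefore the wait of every accepted task, stays bounded; the price is that the excess arrivals fail fast rather than queueing indefinitely.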
Say one worker handles 1000 req/s on average: its mean service time is 1 ms, so its throughput ceiling is 1000/s. At 950 req/s utilization is 95%: the queue is often non-empty but stays bounded, and waits are noticeable but stable. At 1000 req/s you are at 100%: the worker never idles, has zero slack, and any jitter or one slow task pushes the queue up with no chance to recover. At 1050 req/s arrivals beat service by 5%: every second adds about 50 net tasks the worker can never claw back, so the queue grows linearly without bound and the wait for each new arrival grows with it. That is exactly the runaway you see in the animation once arrivals cross one per tick.
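The overload arithmetic can be checked with a short calculation. This sketch uses a fluid approximation (steady rates, no randomness, queue empty at time zero), and the function names are illustrative: the backlog is the rate difference times elapsed time, and a task arriving at that moment waits roughly the backlog divided by the service rate.

```python
def backlog_after(seconds, arrival_rate, service_rate):
    """Approximate queued tasks after `seconds` of sustained load."""
    return max(0.0, (arrival_rate - service_rate) * seconds)


def wait_for_new_arrival(seconds, arrival_rate, service_rate):
    """Approximate wait, in seconds, for a task arriving at time `seconds`."""
    return backlog_after(seconds, arrival_rate, service_rate) / service_rate


for t in (1, 60, 600):
    depth = backlog_after(t, 1050, 1000)
    wait = wait_for_new_arrival(t, 1050, 1000)
    print(f"after {t:>3} s: ~{depth:.0f} queued, new arrival waits ~{wait:.2f} s")
```

A 5% overload looks harmless after one second (about 50 queued tasks, 0.05 s of wait), but after ten minutes the same load leaves roughly 30,000 tasks queued and a 30 s wait for anything that arrives.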
1. Arrivals exceed the service rate by 5% and stay there. What does the steady-state queue length do?
2. One worker is already saturated. You send it twice as many requests. What happens to latency?