Single-threaded saturation

One worker can only do one thing at a time — when arrivals outpace its service rate, the queue backs up without bound.

The idea

A single-threaded event loop or worker — a Node.js process, a Redis server, one CPU draining a queue — handles tasks strictly one at a time. While its throughput stays ahead of the arrival rate, the queue stays short and waits stay flat.

The moment a slow or blocking task appears, or arrivals climb past the one-task-per-tick service rate, utilization presses toward 100%. Queueing theory is unforgiving here: as utilization approaches 1, the expected wait runs off toward infinity. One slow task head-of-line blocks everyone behind it. And piling more requests onto an already-saturated thread does not help — you can only shed load, make the task faster, or parallelize across more workers.
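
To put rough numbers on that blow-up, here is a minimal sketch that assumes the textbook M/M/1 model (Poisson arrivals, exponentially distributed service times), which the text above does not specify. Under that assumption the mean time a task spends waiting in the queue is Wq = rho / (mu * (1 - rho)), where rho is utilization and mu is the service rate. The function name `mm1_mean_wait` and the 1000 tasks/s service rate are illustrative choices, not part of any library.

```python
def mm1_mean_wait(utilization: float, service_rate: float) -> float:
    """Mean time in queue (seconds) for an M/M/1 queue: Wq = rho / (mu * (1 - rho))."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1); at 1 or above there is no steady state")
    return utilization / (service_rate * (1.0 - utilization))


if __name__ == "__main__":
    service_rate = 1000.0  # assumed: 1000 tasks/s, i.e. 1 ms mean service time
    for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
        wait_ms = mm1_mean_wait(rho, service_rate) * 1000
        print(f"utilization {rho:.2f}: mean wait {wait_ms:.1f} ms")
```

Under this model the mean wait is about 1 ms at 50% utilization and about 99 ms at 99%, so the last few percent of utilization cost far more than the first fifty.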

See it work

Press play to watch one worker drain its queue.

How it works

A single consumer pulls one task off the front of a FIFO queue and runs it to completion before touching the next. There is no second worker to overlap with, so any time spent inside process() is time during which the loop cannot accept or advance anything else.

from collections import deque

queue = deque()                # FIFO: tasks waiting their turn

def process(task):
    task()                     # placeholder: a task here is just a callable

while queue:                   # a long-running worker would wait for work here instead of exiting
    task = queue.popleft()     # take the oldest waiter
    process(task)              # run it to completion, ONE at a time
    # nothing else happens until process() returns.
    # if process() blocks (slow I/O, heavy CPU), the
    # whole loop stalls and every queued task waits.

# steady state: if arrival_rate > service_rate, the
# queue grows without bound. utilization → 1 ⇒ wait → ∞.
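
The stall described in the loop above can be made concrete with a small deterministic sketch. It computes how long each task waits before a single FIFO worker starts it. The function name `fifo_waits` and the workload (one arrival every 2 ms, 1 ms of work each, one 10 ms blocker) are hypothetical values chosen for illustration.

```python
def fifo_waits(arrivals, service_times):
    """Return how long each task waits before a single FIFO worker starts it.

    arrivals: arrival times in non-decreasing order.
    service_times: how long each task takes once started.
    """
    waits = []
    worker_free_at = 0
    for arrived, service in zip(arrivals, service_times):
        start = max(arrived, worker_free_at)  # cannot start until the worker is idle
        waits.append(start - arrived)
        worker_free_at = start + service      # worker is busy until this task finishes
    return waits


if __name__ == "__main__":
    # Hypothetical workload, times in milliseconds: the second task blocks for 10 ms.
    arrivals = [0, 2, 4, 6, 8, 10]
    service = [1, 10, 1, 1, 1, 1]
    print(fifo_waits(arrivals, service))
```

In this workload the four fast tasks that arrive behind the 10 ms task wait 8, 7, 6, and 5 ms, even though each needs only 1 ms of work. The backlog drains only because the worker has spare capacity (1 ms of work per 2 ms of arrivals); at full utilization it would not.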

Trade-offs

Aspect | Cost | Signal to watch
Throughput ceiling | Hard cap at 1 / mean service time: one thread, no overlap | Completed-per-second flattening while arrivals keep rising
Latency under load | Rises sharply once utilization passes roughly 80%, and without bound as it approaches 100% | p99 wait climbing far faster than the mean
Head-of-line blocking | One slow task stalls every task behind it, fast or not | Queue age spiking while a single task runs long
Simplicity | No locks, no races: one task's state at a time, easy to reason about | Tempting to lean on it past the point it can keep up
Mitigations | Shed load, parallelize, shard the queue, or offload CPU work off the loop | Queue depth still rising after you "added capacity" the wrong way
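
Of the mitigations in the table, load shedding is the simplest to sketch. The example below assumes a bounded queue in front of the worker and rejects new tasks once it is full, using Python's standard `queue.Queue` with `maxsize`. The helper name `try_enqueue` and the depth limit of 3 are illustrative assumptions, not a recommendation.

```python
import queue


def try_enqueue(q: queue.Queue, task) -> bool:
    """Admit the task if there is room; otherwise shed it instead of queueing it."""
    try:
        q.put_nowait(task)
        return True
    except queue.Full:
        return False


if __name__ == "__main__":
    q = queue.Queue(maxsize=3)  # assumed depth limit, chosen only for illustration
    accepted = [try_enqueue(q, n) for n in range(5)]
    print(accepted)             # the last two tasks are rejected
    print(q.qsize())
```

Bounding the queue caps the wait of every admitted task at roughly the queue depth times the mean service time, at the price of turning some callers away.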

Watch out for

Worked example

Say one worker handles 1000 req/s on average — its mean service time is 1 ms, so its throughput ceiling is exactly 1000/s. At 950 req/s utilization is 95%: the queue is non-empty but bounded, and waits are noticeable but stable. At 1000 req/s you are at 100% — the worker never idles, has zero slack, and any jitter or one slow task pushes the queue up with no chance to recover. At 1050 req/s arrivals beat service by 5%: every second adds ~50 net tasks the worker can never claw back, so the queue grows linearly without bound and head-of-line latency runs away — exactly the runaway you see in the animation once arrivals cross one per tick.
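
The arithmetic in the example can be checked with a short sketch. It uses average rates only, so it ignores jitter and reports an empty queue at or below capacity, where a real worker would still see short, fluctuating queues. The function name `backlog_after` and the 60-second horizon are illustrative assumptions.

```python
def backlog_after(seconds: float, arrival_rate: float, service_rate: float) -> float:
    """Queued tasks after `seconds`, starting from an empty queue, using average rates only."""
    net_growth = max(0.0, arrival_rate - service_rate)  # backlog builds only when arrivals exceed service
    return net_growth * seconds


if __name__ == "__main__":
    service_rate = 1000.0  # req/s, from the worked example
    for arrival_rate in (950.0, 1000.0, 1050.0):
        queued = backlog_after(60, arrival_rate, service_rate)
        delay_s = queued / service_rate  # time to drain what is already ahead of a new arrival
        print(f"{arrival_rate:.0f} req/s: ~{queued:.0f} queued after 60 s, ~{delay_s:.1f} s wait")
```

At 1050 req/s the backlog reaches about 3000 tasks after one minute, so a request arriving then waits roughly 3 seconds for a task that takes 1 ms to serve, and the gap keeps widening.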

Check yourself

1. Arrivals exceed the service rate by 5% and stay there. What does the steady-state queue length do?

2. One worker is already saturated. You send it twice as many requests. What happens to latency?