Single-threaded saturation

One worker can only do one thing at a time — when arrivals outpace its service rate, the queue backs up without bound.

The idea

A single-threaded event loop or worker (a Node.js process, a Redis server, one CPU draining a queue) handles tasks strictly one at a time. While its service rate stays ahead of the arrival rate, the queue stays short and waits stay flat.

The moment a slow or blocking task appears, or arrivals climb past the one-task-per-tick service rate, utilization presses toward 100%. Queueing theory is unforgiving here: as utilization approaches 1, the expected wait runs off toward infinity. One slow task head-of-line blocks everyone behind it. And piling more requests onto an already-saturated thread does not help — you can only shed load, make the task faster, or parallelize across more workers.
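To put numbers on "wait runs off toward infinity", here is a minimal sketch that assumes the textbook M/M/1 model (Poisson arrivals, exponentially distributed service times, one server). The text above does not name a model, so treat that choice, the function name `mean_queue_wait`, and the 1000 tasks/s service rate as illustrative assumptions.

```python
def mean_queue_wait(arrival_rate: float, service_rate: float) -> float:
    """Mean time a task waits in the queue before service starts (M/M/1).

    Assumes Poisson arrivals and exponential service times; real workloads
    will differ. Returns infinity once the worker is saturated.
    """
    utilization = arrival_rate / service_rate
    if utilization >= 1.0:
        return float("inf")
    # M/M/1 result: W_q = rho / (mu - lambda)
    return utilization / (service_rate - arrival_rate)


if __name__ == "__main__":
    service_rate = 1000.0  # tasks per second (assumed for illustration)
    for utilization in (0.5, 0.8, 0.9, 0.95, 0.99):
        wait_s = mean_queue_wait(utilization * service_rate, service_rate)
        print(f"utilization {utilization:.0%}: mean wait {wait_s * 1000:.1f} ms")
```

Under these assumptions the mean wait is about 1 ms at 50% utilization, 19 ms at 95%, and 99 ms at 99%: the last few points of utilization cost far more than the first fifty.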

See it work

Press play to watch one worker drain its queue.

How it works

A single consumer pulls one task off the front of a FIFO queue and runs it to completion before touching the next. There is no second worker to overlap with, so the time spent inside process() is time the loop cannot accept or advance anything else.

import queue

tasks = queue.Queue()      # FIFO of tasks waiting their turn

def process(task):
    """Placeholder for the real work; runs on the loop's only thread."""

while True:
    task = tasks.get()     # block until the oldest waiter is available
    process(task)          # run it to completion, ONE at a time
    # nothing else happens until process() returns.
    # if process() blocks (slow I/O, heavy CPU), the
    # whole loop stalls and every queued task waits.

# under sustained load: if arrival_rate > service_rate, the
# queue grows without bound. as utilization → 1, wait → ∞.
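The loop above never terminates, so it is hard to experiment with. The following is a small deterministic sketch of the same FIFO discipline that reports how long each task waited. The function name `fifo_waits` and the sample arrival and service times are made up for illustration.

```python
from collections import deque


def fifo_waits(arrivals):
    """Simulate one worker draining a FIFO queue.

    `arrivals` is a list of (arrival_time, service_time) pairs, sorted by
    arrival time. Returns how long each task waited before it started.
    """
    pending = deque(arrivals)
    clock = 0            # time at which the worker next becomes free
    waits = []
    while pending:
        arrival_time, service_time = pending.popleft()  # oldest waiter first
        start = max(clock, arrival_time)  # worker idles if nothing has arrived yet
        waits.append(start - arrival_time)
        clock = start + service_time      # worker is busy until this time
    return waits


if __name__ == "__main__":
    # Tasks arrive every 2 ms and take 1 ms each, except the second, which takes 10 ms.
    sample = [(0, 1), (2, 10), (4, 1), (6, 1), (8, 1)]
    print(fifo_waits(sample))  # [0, 0, 8, 7, 6]
```

The slow task itself waits 0 ms. The fast tasks behind it absorb the delay, which is head-of-line blocking in miniature.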

Trade-offs

Aspect	Cost	Signal to watch
Throughput ceiling	Hard cap at `1 / mean service time` — one thread, no overlap	Completed-per-second flattening while arrivals keep rising
Latency under load	Rises steeply once utilization passes roughly 80%, and without bound as it approaches 100%	p99 wait climbing far faster than the mean
Head-of-line blocking	One slow task stalls every task behind it, fast or not	Queue age spiking while a single task runs long
Simplicity	No locks, no races — one task's state at a time, easy to reason about	Tempting to lean on it past the point it can keep up
Mitigations	Shed load, parallelize, shard the queue, or offload CPU work off the loop	Queue depth still rising after you "added capacity" the wrong way

Watch out for

Running near 100% utilization "to be efficient" — you leave no headroom, so the first burst makes the queue explode.
A single blocking or CPU-heavy call (a sync hash, a big JSON parse) freezing the whole event loop while it runs.
Unbounded queues that hide the problem: latency creeps up silently until memory runs out and the process is killed.
Assuming retries help — re-sending work to a saturated thread adds arrivals and makes saturation worse, not better.
Not measuring queue depth. Without it you only notice saturation once users feel the latency.
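Two of the pitfalls above, unbounded queues and unmeasured queue depth, can be tackled together. The sketch below wraps Python's standard `queue.Queue` in a bounded queue that rejects new work when full and exposes its depth. The class `SheddingQueue` and its method names are hypothetical, and a suitable `max_depth` depends on your latency budget.

```python
import queue


class SheddingQueue:
    """Bounded FIFO that rejects new work when full instead of growing."""

    def __init__(self, max_depth: int) -> None:
        if max_depth < 1:
            # queue.Queue treats maxsize <= 0 as "unbounded", which would
            # defeat the purpose here.
            raise ValueError("max_depth must be at least 1")
        self._tasks = queue.Queue(maxsize=max_depth)
        self.rejected = 0  # count of shed tasks, worth exporting as a metric

    def submit(self, task) -> bool:
        """Enqueue a task; return False (and count it) if the queue is full."""
        try:
            self._tasks.put_nowait(task)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    def depth(self) -> int:
        """Approximate number of waiting tasks, the signal to monitor."""
        return self._tasks.qsize()

    def take(self):
        """Remove and return the oldest task; raises queue.Empty if none."""
        return self._tasks.get_nowait()
```

Rejecting at the door keeps memory and worst-case wait bounded. The caller sees the failure immediately and can back off, rather than retrying into a saturated worker.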

Worked example

Say one worker handles 1000 req/s on average: its mean service time is 1 ms, so its throughput ceiling is 1000 req/s. At 950 req/s utilization is 95%. The queue is often non-empty but stable, and waits are noticeable but do not grow over time. At 1000 req/s utilization is 100%. The worker never idles and has zero slack, so any jitter or one slow task adds backlog that it has no spare capacity to work off. At 1050 req/s arrivals beat service by 5%. Every second adds about 50 net tasks the worker can never claw back, so the queue grows linearly without bound and waiting time grows with it. This is the runaway you see in the animation once arrivals cross one per tick.
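The overload arithmetic can be checked with a few lines. This sketch uses a fluid approximation (perfectly smooth arrival and service rates, no jitter), so it reports zero backlog at or below the ceiling, where a real queue would still fluctuate. The function names and the 60-second window are illustrative assumptions.

```python
def backlog(arrival_rate: int, service_rate: int, seconds: int) -> int:
    """Tasks queued after `seconds`, starting empty, with smooth rates."""
    net_per_second = arrival_rate - service_rate
    return max(0, net_per_second * seconds)


def wait_for_newest(backlog_tasks: int, service_rate: int) -> float:
    """Seconds the most recent arrival waits behind the current backlog."""
    return backlog_tasks / service_rate


if __name__ == "__main__":
    SERVICE_RATE = 1000  # req/s, from the worked example
    for arrival_rate in (950, 1000, 1050):
        queued = backlog(arrival_rate, SERVICE_RATE, seconds=60)
        print(arrival_rate, queued, wait_for_newest(queued, SERVICE_RATE))
```

At 1050 req/s, one minute of overload leaves 3000 tasks queued, and a request arriving at that point waits about 3 seconds. Both figures keep growing for as long as the overload lasts.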

Check yourself

1. Arrivals exceed the service rate by 5% and stay there. What does the queue length do over time?

2. One worker is already saturated. You send it twice as many requests. What happens to latency?