Queue backlog

When work arrives faster than it's done, the line only gets longer.

The idea

A queue between a producer and a consumer is a shock absorber: a brief burst lands in the buffer and drains once the worker catches up. That only holds while the consumer keeps pace. Let λ (lambda) be the arrival rate, in items added per tick, and μ (mu) be the service rate, in items finished per tick. Here each consumer finishes one item per tick, so μ equals the number of consumers.

The moment λ > μ, the queue depth grows by λ − μ every tick: a backlog, linear in time. Wait time grows right along with it, because a new item must wait for everything ahead of it to drain at μ per tick, roughly depth / μ ticks (compare Little's law, L = λ·W). The only real fixes are more service rate (add consumers) or less arrival rate (shed load). A bigger buffer just delays the reckoning.

For example, if work arrives at λ = 5 per tick but only μ = 2 items get served, the backlog climbs by 3 items every tick.

How it works

One tick is one pass of this loop: take in the arrivals, then let each consumer finish one item. Depth can never go below zero (you can't serve work that isn't there), so the update clamps at zero. Wait time follows from the drain rate: a backlog of depth items drains at μ per tick, so a fresh item waits about depth / μ ticks. This is Little's law (L = λ·W) with the throughput of a saturated queue, μ, standing in for λ.

def tick(depth, arrivals, service):
    """Advance the queue one tick; service is the number of consumers (mu)."""
    # Clamp at zero: consumers can't serve work that isn't there.
    depth = max(0, depth + arrivals - service)
    # A saturated queue drains at mu per tick, so a fresh item waits ~depth / mu ticks.
    wait = depth / service if service else float('inf')
    return depth, wait

# lambda > mu  ->  depth grows by (lambda - mu) every tick: a backlog, linear in time.
# mu >= lambda ->  depth holds or drains; once depth hits 0 the system is caught up.
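
To see the two regimes side by side, the sketch below runs the loop for a few ticks with λ = 5 and μ = 2, then again with μ raised to 6. The `simulate` helper and the choice of μ = 6 are illustrative assumptions; `tick` is repeated only so the block runs on its own.

```python
def tick(depth, arrivals, service):
    """One tick: take in arrivals, then let each consumer finish one item."""
    depth = max(0, depth + arrivals - service)
    wait = depth / service if service else float("inf")
    return depth, wait


def simulate(depth, arrivals, service, ticks):
    """Hypothetical helper: run `ticks` steps and return the depth after each one."""
    history = []
    for _ in range(ticks):
        depth, _wait = tick(depth, arrivals, service)
        history.append(depth)
    return history


# Overloaded: lambda = 5, mu = 2, so depth grows by 3 per tick.
overloaded = simulate(depth=0, arrivals=5, service=2, ticks=5)
print(overloaded)   # [3, 6, 9, 12, 15]

# Assumed recovery: mu raised to 6, so depth shrinks by 1 per tick.
draining = simulate(depth=overloaded[-1], arrivals=5, service=6, ticks=5)
print(draining)     # [14, 13, 12, 11, 10]
```

Note the asymmetry: the backlog built up at 3 items per tick but drains at only 1 per tick, so five ticks of overload take fifteen ticks to clear at this setting.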

The levers all push on those two numbers. Autoscaling consumers raises μ. Load shedding (drop or reject low-priority work) and backpressure (tell the producer to slow down) lower λ. A bounded queue caps memory but spills the overflow as dropped work once it's full.
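
A bounded queue can be sketched as a small variation on the same loop. The function name `bounded_tick`, the `capacity` parameter, and the admit-then-serve order are assumptions made for illustration, not a prescribed design.

```python
def bounded_tick(depth, arrivals, service, capacity):
    """One tick of a bounded queue; returns (new_depth, dropped).

    Assumes arrivals are admitted first, up to `capacity`, and the consumers
    then finish up to `service` items, mirroring the order of the loop above.
    """
    room = max(0, capacity - depth)      # space left in the buffer
    admitted = min(arrivals, room)
    dropped = arrivals - admitted        # overflow is shed, not queued
    depth = max(0, depth + admitted - service)
    return depth, dropped


depth, total_dropped = 0, 0
for _ in range(10):
    depth, dropped = bounded_tick(depth, arrivals=5, service=2, capacity=12)
    total_dropped += dropped

print(depth, total_dropped)   # 10 20
```

Once the buffer fills, this sketch drops 3 items per tick, which is exactly λ − μ: the bound does not remove the excess, it only decides that the excess is rejected rather than stored.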

Cost

Regime	Queue depth	Wait time
μ > λ (draining)	Falls toward 0	Shrinks each tick
μ = λ (balanced)	Flat (whatever it was)	Constant ~depth/μ
λ > μ, unbounded	Grows O(t) — linear in time	Climbs without bound
λ > μ, bounded	Pins at capacity	Capped, but work is dropped

A buffer trades memory for the ability to absorb bursts — genuinely useful when overload is brief. But for a sustained λ > μ, a bigger buffer only buys time before it overflows or runs you out of memory. The cure is more μ or less λ, not more buffer.
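
How much time a bigger buffer buys is simple arithmetic. The sketch below uses a hypothetical helper, `ticks_until_full`, under the linear-growth model described above; the capacities are made-up numbers.

```python
import math


def ticks_until_full(capacity, depth, arrivals, service):
    """Ticks until depth reaches `capacity`, or None if it never does."""
    growth = arrivals - service
    if growth <= 0:
        return None                      # mu >= lambda: the buffer never fills
    return max(0, math.ceil((capacity - depth) / growth))


print(ticks_until_full(1_000, 0, arrivals=5, service=2))   # 334
print(ticks_until_full(2_000, 0, arrivals=5, service=2))   # 667
print(ticks_until_full(1_000, 0, arrivals=5, service=6))   # None
```

Doubling the buffer roughly doubles the time to overflow and changes nothing else, whereas raising μ above λ removes the deadline entirely.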

Watch out for

Unbounded queues hide the problem. An in-memory queue with no cap looks healthy on a throughput graph right up until it exhausts memory and the process is killed. Bound the queue so overload surfaces as backpressure, not a sudden OOM.
A bigger buffer is not a fix. Doubling the queue size while λ > μ only postpones collapse: the depth still grows linearly at the same rate, just toward a higher ceiling. Sustained overload needs more consumers or less load.
Measuring throughput, not depth or age. Items served per second can look great while the oldest item in the queue is hours old. Alert on queue depth and the age of the head item, not just throughput.
Head-of-line blocking. One slow or stuck item at the front holds up everything behind it, so effective μ collapses even though workers look "busy." Use per-item timeouts and let independent work pass.
Retries amplifying arrivals. When consumers fall behind and time out, naive retries re-enqueue the same work — raising λ exactly when you can least afford it. This feedback loop turns a small backlog into a runaway one; cap retries and back off.
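
Alerting on depth and head age can be sketched as follows. `TimedQueue`, `should_alert`, and both thresholds are hypothetical names and values; the caller passes the current time in explicitly (in practice something like `time.monotonic()`) so the example stays deterministic.

```python
from collections import deque


class TimedQueue:
    """Hypothetical FIFO wrapper that remembers when each item was enqueued."""

    def __init__(self):
        self._items = deque()            # (enqueued_at, item) pairs, oldest first

    def put(self, item, now):
        self._items.append((now, item))

    def get(self):
        _enqueued_at, item = self._items.popleft()
        return item

    def depth(self):
        return len(self._items)

    def head_age(self, now):
        """Age of the oldest waiting item, or 0.0 when the queue is empty."""
        if not self._items:
            return 0.0
        return now - self._items[0][0]


def should_alert(queue, now, max_depth, max_head_age):
    """Alert on depth or head age; both thresholds are placeholders to tune."""
    return queue.depth() > max_depth or queue.head_age(now) > max_head_age


q = TimedQueue()
q.put("event-1", now=0.0)
q.put("event-2", now=1.0)
print(q.head_age(now=90.0))                                          # 90.0
print(should_alert(q, now=90.0, max_depth=100, max_head_age=60.0))   # True
```

Here the depth of 2 looks harmless, yet the head item is 90 seconds old, so the head-age check fires where a depth-only or throughput-only alert would stay silent.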

Worked example

During a flash sale, a payment-webhook consumer that normally handles 200 events/sec is hit with 600/sec (λ far above μ). The queue depth climbs by ~400/sec; within minutes there are hundreds of thousands of pending events and p99 latency for a webhook goes from milliseconds to many minutes. Because dashboards showed steady throughput, nobody noticed until merchants reported orders stuck in "pending."
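
The numbers in this scenario can be checked with a few lines of arithmetic. The rates come from the example; the helper names and the sampled durations are illustrative, and the queue is assumed to start empty.

```python
ARRIVAL_RATE = 600   # events/sec during the sale (lambda)
SERVICE_RATE = 200   # events/sec the consumer normally handles (mu)


def backlog_after(seconds, arrivals=ARRIVAL_RATE, service=SERVICE_RATE):
    """Pending events after `seconds` of sustained overload, starting from empty."""
    return max(0, arrivals - service) * seconds


def wait_seconds(depth, service=SERVICE_RATE):
    """Approximate wait for a newly arrived event: depth / mu."""
    return depth / service


for minutes in (1, 5, 10):
    depth = backlog_after(minutes * 60)
    print(minutes, depth, wait_seconds(depth) / 60)
# 1 24000 2.0
# 5 120000 10.0
# 10 240000 20.0
```

Under these assumptions every minute of overload adds about two minutes of wait, which is how latency moves from milliseconds to many minutes so quickly.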

The fix had two moves, both straight off the levers above. First, autoscale consumers: spin up more webhook workers to raise μ above the 600/sec arrival rate so the backlog starts draining. Second, shed non-critical load: route analytics and email-receipt events to a separate low-priority queue so the critical payment-capture path gets all the new capacity. Depth peaked, then fell to zero; latency recovered.
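
How long the recovery takes depends on the headroom μ − λ, not on μ alone. The sketch below makes that concrete; the backlog size, the scaled-up service rate of 800/sec, and the reduced arrival rate of 450/sec are assumed figures chosen for illustration, not values from the incident.

```python
def drain_seconds(depth, arrivals, service):
    """Seconds to clear `depth` pending events, or None if the queue never drains."""
    headroom = service - arrivals
    if headroom <= 0:
        return None                      # mu <= lambda: backlog holds or grows
    return depth / headroom


BACKLOG = 240_000   # assumed: ten minutes of overload at +400 events/sec

print(drain_seconds(BACKLOG, arrivals=600, service=600))   # None
print(drain_seconds(BACKLOG, arrivals=600, service=800))   # 1200.0
print(drain_seconds(BACKLOG, arrivals=450, service=800))   # about 685.7
```

Matching the arrival rate only stops the growth. Clearing the backlog needs μ strictly above λ, and combining both levers shortens the drain because each one widens the same gap.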

Check yourself

A queue is backlogged: items arrive at λ = 8/sec and consumers finish μ = 5/sec, and it has been this way for an hour. What actually clears the backlog?

With depth held steady at 600 items and 3 consumers each finishing one item per tick, roughly how long does a newly arriving item wait?