Queue backlog

When work arrives faster than it's done, the line only gets longer.

The idea

A queue between a producer and a consumer is a shock absorber: a brief burst lands in the buffer and drains once the worker catches up. That only holds while the consumer keeps pace. Let λ (lambda) be the arrival rate, in items added per tick, and μ (mu) be the service rate, in items finished per tick. Each consumer finishes one item per tick, so μ equals the number of consumers.

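The moment λ > μ, the queue depth grows by λ − μ every tick: a backlog, linear in time. Wait time grows right along with it, because a new arrival sits behind the whole backlog. By Little's law (L = λ·W, where the rate is the throughput, which equals μ once the consumers are saturated), that wait is about depth / μ ticks. The only real fixes are more service rate (add consumers) or less arrival rate (shed load). A bigger buffer just delays the reckoning.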

For example, if work arrives at λ = 5 per tick but only μ = 2 items get served, the backlog climbs by 3 items every tick.
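To make the linear growth concrete, here is a minimal Python sketch that replays the numbers λ = 5 and μ = 2 for a few ticks. The function name simulate_backlog and its parameters are illustrative only; the rates are assumed constant.

```python
def simulate_backlog(arrivals, service, ticks, depth=0):
    """Return the queue depth after each tick for constant arrival and service rates."""
    history = []
    for _ in range(ticks):
        # Arrivals join the queue, then up to `service` items are finished.
        depth = max(0, depth + arrivals - service)
        history.append(depth)
    return history


depths = simulate_backlog(arrivals=5, service=2, ticks=5)
print(depths)  # [3, 6, 9, 12, 15]
```

Each entry is 3 larger than the last, which is exactly λ − μ.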

How it works

One tick is one pass of this loop: take in the arrivals, then let each consumer finish one item. Depth can never go below zero (you can't serve work that isn't there), so the update clamps at zero. Wait time follows from Little's law: with depth items queued and μ items finished per tick, a fresh item waits about depth / μ ticks.

def tick(depth, arrivals, service):
    """Advance the queue by one tick; service is the number of consumers (mu)."""
    # Take in the arrivals, then finish up to `service` items; depth never drops below zero.
    depth = max(0, depth + arrivals - service)
    # Estimated wait for a fresh item: the backlog drains at `service` items per tick.
    wait = depth / service if service else float('inf')
    return depth, wait

# lambda > mu  ->  depth grows by (lambda - mu) every tick: a backlog, linear in time.
# mu >= lambda ->  depth holds or drains; once depth hits 0 the system is caught up.

The levers all push on those two numbers. Autoscaling consumers raises μ. Load shedding (drop or reject low-priority work) and backpressure (tell the producer to slow down) lower λ. A bounded queue caps memory but spills the overflow as dropped work once it's full.
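The bounded-queue case can be sketched as a small variation on the tick update. This is an illustration under stated assumptions, not a production design: tick_bounded and capacity are hypothetical names, and the model assumes overflow is simply dropped at the end of each tick.

```python
def tick_bounded(depth, arrivals, service, capacity):
    """One tick of a bounded queue; returns (new_depth, dropped)."""
    # Same update as the unbounded tick: arrivals in, up to `service` items out.
    unclamped = max(0, depth + arrivals - service)
    # Anything that does not fit in the buffer is dropped.
    dropped = max(0, unclamped - capacity)
    return min(unclamped, capacity), dropped


depth, total_dropped = 0, 0
for _ in range(10):
    depth, dropped = tick_bounded(depth, arrivals=5, service=2, capacity=12)
    total_dropped += dropped
print(depth, total_dropped)  # 12 18
```

Once the buffer is full, depth stays at capacity and the queue drops λ − μ = 3 items per tick, so the overload is still there, just expressed as lost work instead of growing memory.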

Cost

| Regime | Queue depth | Wait time |
|---|---|---|
| μ > λ (draining) | Falls toward 0 | Shrinks each tick |
| μ = λ (balanced) | Flat (whatever it was) | Constant ~depth/μ |
| λ > μ, unbounded | Grows O(t), linear in time | Climbs without bound |
| λ > μ, bounded | Pins at capacity | Capped, but work is dropped |

A buffer trades memory for the ability to absorb bursts — genuinely useful when overload is brief. But for a sustained λ > μ, a bigger buffer only buys time before it overflows or runs you out of memory. The cure is more μ or less λ, not more buffer.
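How much time a bigger buffer buys is simple arithmetic: the free space divided by the growth rate λ − μ. The sketch below is a rough illustration with assumed capacities; ticks_until_full is a hypothetical helper, and the rates are treated as constant.

```python
import math


def ticks_until_full(capacity, arrivals, service, depth=0):
    """Ticks until a bounded buffer fills under sustained overload."""
    growth = arrivals - service
    if growth <= 0:
        return math.inf  # the consumers keep up, so the buffer never fills
    return math.ceil((capacity - depth) / growth)


for capacity in (1_000, 2_000, 4_000):
    print(capacity, ticks_until_full(capacity, arrivals=5, service=2))
# 1000 334
# 2000 667
# 4000 1334
```

Doubling the buffer only doubles the time to overflow, while raising μ to meet λ makes that time infinite.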

Watch out for

Worked example

During a flash sale, a payment-webhook consumer that normally handles 200 events/sec is hit with 600/sec (λ far above μ). The queue depth climbs by ~400/sec; within minutes there are hundreds of thousands of pending events, and p99 webhook latency goes from milliseconds to many minutes. Because the dashboards tracked throughput, which stays pinned at the consumer's 200/sec ceiling during overload, nobody noticed until merchants reported orders stuck in "pending."

The fix had two moves, both straight off the levers above. First, autoscale consumers: spin up more webhook workers to raise μ above the 600/sec arrival rate so the backlog starts draining. Second, shed non-critical load: route analytics and email-receipt events to a separate low-priority queue so the critical payment-capture path gets all the new capacity. Depth peaked, then fell to zero; latency recovered.
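The arithmetic behind this incident can be sketched in a few lines. The 600/sec and 200/sec rates come from the example; the 10-minute overload window and the 800/sec scaled-up service rate are assumptions chosen for illustration, and both function names are hypothetical.

```python
def backlog_after(seconds, arrival_rate, service_rate, depth=0):
    """Queue depth after `seconds` at constant rates (events/sec), never below zero."""
    return max(0, depth + (arrival_rate - service_rate) * seconds)


def drain_seconds(depth, arrival_rate, service_rate):
    """Seconds to clear `depth` once service_rate exceeds arrival_rate."""
    if service_rate <= arrival_rate:
        return float("inf")  # the backlog never clears
    return depth / (service_rate - arrival_rate)


# From the example: 600 events/sec arriving, 200 events/sec served.
# ASSUMPTION: the overload runs for 10 minutes (600 s) before the fix lands.
peak = backlog_after(600, arrival_rate=600, service_rate=200)
print(peak)        # 240000 pending events
print(peak / 200)  # 1200.0 seconds of wait at the old service rate

# ASSUMPTION: autoscaling raises the service rate to 800 events/sec.
print(drain_seconds(peak, arrival_rate=600, service_rate=800))  # 1200.0
```

Under these assumptions the backlog takes as long to drain as it took to build, because the surplus capacity (200/sec) is half the original deficit (400/sec) applied to the same depth. Shedding load lowers arrival_rate in the same formula, which shortens the drain further.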

Check yourself

A queue is backlogged: items arrive at λ = 8/sec and consumers finish μ = 5/sec, and it has been this way for an hour. What actually clears the backlog?

With depth held steady at 600 items and 3 consumers each finishing one item per tick, roughly how long does a newly arriving item wait?