Semaphore permit leaks

A semaphore is a tray of permits. Every borrower must put theirs back — one that forgets, even once, shrinks the tray forever.

The idea

A counting semaphore guards a limited pool — say four database connections, or four concurrent jobs. It hands out permits. A worker must acquire() a permit before touching the resource and release() it when done. If no permit is free, acquire() blocks and waits in line.
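To make the permit limit concrete, here is a minimal sketch using Python's standard threading.Semaphore. The worker function, the counters, and the choice of twelve threads are illustrative assumptions, not part of the original example; the sleep stands in for real use of the resource.

```python
import threading
import time

MAX_PERMITS = 4                       # pool size used throughout the text
sem = threading.Semaphore(MAX_PERMITS)

lock = threading.Lock()               # protects the two counters below
active = 0                            # workers currently holding a permit
peak = 0                              # highest value `active` ever reached


def worker():
    global active, peak
    sem.acquire()                     # blocks while all permits are taken
    try:
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)              # stand-in for using the resource
        with lock:
            active -= 1
    finally:
        sem.release()                 # hand the permit back


# Twelve workers compete for four permits.
threads = [threading.Thread(target=worker) for _ in range(12)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"peak concurrency: {peak}")    # never exceeds MAX_PERMITS
```

However the threads are scheduled, the peak cannot exceed the number of permits, because a worker only increments the counter while it holds one.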

A permit leak is when a path acquires a permit but never releases it — usually because work throws an exception between acquire and release and there's no try/finally, so the release is skipped. Each leak permanently removes one permit. The available count drifts down, and once it hits zero every acquire() blocks forever: exhaustion.

[Figure: semaphore tray with 4 permits, all free. Worker: idle, no permit held. Acquire queue: empty, permits available.]
All four permits are free. The pool is healthy.

How it works

The whole bug lives in one missing keyword. When work runs between acquire and release, an exception jumps straight past the release. The permit is logically held by a worker that has already unwound and gone away — nobody owns it, nobody will ever give it back.

import threading

sem = threading.Semaphore(4)      # assumed setup: four permits, as in the example pool

# do_work() stands in for any call that may raise.

# BUG: release is skipped when do_work() throws
def handle():
    sem.acquire()                 # take a permit
    do_work()                     # raises -> jumps out of handle()
    sem.release()                 # never runs: the permit is leaked

# FIX: release always runs, even on the error path
def handle():
    sem.acquire()                 # take a permit
    try:
        do_work()                 # raise all you like
    finally:
        sem.release()             # the permit always goes back

The finally block is the contract: the permit returns no matter how the body exits, whether by normal return, early return, or exception. In languages without finally, a guard object that releases in its destructor (RAII) does the same job; in Python, a with context manager is the equivalent guard.
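The context-manager form can be sketched as follows. Python's threading.Semaphore supports the with statement directly; the always-failing do_work() and the probe loop at the end are illustrative assumptions added to exercise the error path.

```python
import threading

sem = threading.Semaphore(4)          # four permits, as in the text


def do_work():
    # Hypothetical workload that always fails, to exercise the error path.
    raise ValueError("malformed input")


def handle():
    with sem:                         # acquire on entry, release on exit
        do_work()                     # the exception still propagates


for _ in range(10):                   # far more failures than permits
    try:
        handle()
    except ValueError:
        pass                          # the caller sees the error as usual

# Probe: if nothing leaked, all four permits can be taken without blocking.
free = 0
while free < 5 and sem.acquire(blocking=False):
    free += 1
print(f"permits still available: {free}")   # 4
```

The guard leaves no release call to forget: the permit goes back whenever the with block exits. Note that the probe itself drains the semaphore, so it belongs in a demo or test rather than in production code.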

Signals

Choice: try/finally vs RAII / context-manager guard
Buys you: Explicit and language-portable
Costs you: Easy to forget; the guard makes leaks structurally impossible

Choice: Bounded semaphore vs plain
Buys you: A bounded one throws on over-release, catching double-frees early
Costs you: A plain one silently inflates the count, hiding a different bug

Choice: Leak vs deadlock vs slow consumer
Buys you: Leak: available only ever falls. Deadlock: two holders wait on each other. Slow: available recovers once load eases
Costs you: All three look like "it hangs" until you watch the available count over time

Choice: Bigger pool vs leak detection
Buys you: More permits delays exhaustion
Costs you: A leak still drains any size pool; it buys time, not a fix
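The bounded-versus-plain trade-off can be seen in a few lines with Python's threading module. The single-permit pools and variable names here are illustrative choices, not taken from the original.

```python
import threading

plain = threading.Semaphore(1)
bounded = threading.BoundedSemaphore(1)

# Over-release on a plain semaphore: no error, the count silently grows.
plain.release()                       # count is now 2, although the pool is 1
extra = 0
while extra < 3 and plain.acquire(blocking=False):
    extra += 1
print(f"plain semaphore handed out {extra} permits")     # 2

# Over-release on a bounded semaphore: rejected immediately.
try:
    bounded.release()
    over_release_caught = False
except ValueError:
    over_release_caught = True
print(f"bounded semaphore raised: {over_release_caught}")  # True
```

The bounded variant costs nothing extra to use and turns a silent double-release into an error at the line that caused it.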

Watch out for

Worked example

A connection-pool handler calls pool.acquire(), runs a query, then calls pool.release(). The pool holds four permits. Most requests are fine, but one query path throws on malformed input, and the author wrote acquire(); query(); release() with no try/finally. Every bad request takes a connection and never returns it. After four bad requests the available count is zero: the fifth caller blocks on acquire(), then the sixth, and the service hangs even though traffic is normal and the database is idle. Putting query() in a try block and pool.release() in its finally block returns the connection on the error path too, and the available count stops drifting, which is exactly the contrast the visual walks through.
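The scenario above can be reproduced in a small simulation. This is a sketch under stated assumptions: the pool is modelled as a bare threading.Semaphore, and query(), leaky_handler(), safe_handler(), and survives_bad_requests() are hypothetical names invented for the illustration.

```python
import threading

POOL_SIZE = 4                         # four permits, as in the example


def query(payload):
    # Hypothetical query: rejects malformed input by raising.
    if payload is None:
        raise ValueError("malformed input")
    return f"rows for {payload}"


def leaky_handler(pool, payload):
    pool.acquire()
    result = query(payload)           # raises -> the release below is skipped
    pool.release()
    return result


def safe_handler(pool, payload):
    pool.acquire()
    try:
        return query(payload)
    finally:
        pool.release()                # runs on success and on error


def survives_bad_requests(handler):
    """Send four malformed requests, then report whether a permit is still free."""
    pool = threading.Semaphore(POOL_SIZE)
    for _ in range(POOL_SIZE):
        try:
            handler(pool, None)       # None plays the malformed input
        except ValueError:
            pass
    # Non-blocking probe, so the demo reports exhaustion instead of hanging.
    return pool.acquire(blocking=False)


print(survives_bad_requests(leaky_handler))   # False: pool exhausted
print(survives_bad_requests(safe_handler))    # True: permits came back
```

The probe uses a non-blocking acquire so the demo reports exhaustion rather than hanging the way the real service would.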

Check yourself

The available-permit count only ever goes down, never back up, and eventually every request hangs. Most likely cause?

Coach note: the giveaway is that available never recovers. A small pool or a slow consumer dips and rebounds; a deadlock involves mutual waiting. A one-way drift to zero is a leak. Give it another pass if that distinction feels slippery — it's the heart of the bug.
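One way to make that distinction concrete is a rough check over a series of available-count readings. This is only a heuristic sketch: the function name and sample data are invented for illustration, and how the readings are gathered is left open (Python's threading.Semaphore, for example, exposes no public counter). It cannot identify a deadlock, which needs evidence of mutual waiting rather than counts.

```python
def is_one_way_drift(samples):
    """True if the available count never rises and ends lower than it began."""
    never_rises = all(b <= a for a, b in zip(samples, samples[1:]))
    return len(samples) >= 2 and never_rises and samples[-1] < samples[0]


leak_like = [4, 4, 3, 3, 2, 1, 0, 0]      # falls and stays down
slow_consumer = [4, 1, 0, 0, 2, 4]        # dips, then rebounds

print(is_one_way_drift(leak_like))        # True
print(is_one_way_drift(slow_consumer))    # False
```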