A semaphore is a tray of permits. Every borrower must put theirs back — one that forgets, even once, shrinks the tray forever.
A counting semaphore guards a limited pool — say four database connections, or four concurrent jobs. It hands out permits. A worker must acquire() a permit before touching the resource and release() it when done. If no permit is free, acquire() blocks and waits in line.
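As a rough illustration of the acquire/release cycle described above, the sketch below uses Python's `threading.Semaphore` with a pool of four permits. The names `POOL_SIZE`, `worker`, and `run`, and the `time.sleep` stand-in for real work, are assumptions made for this example only. It records how many workers hold a permit at the same time.

```python
import threading
import time

POOL_SIZE = 4                        # assumed pool size, matching the "four jobs" example
sem = threading.Semaphore(POOL_SIZE)

lock = threading.Lock()              # protects the two counters below
active = 0                           # workers currently holding a permit
peak = 0                             # highest value `active` has reached


def worker():
    global active, peak
    sem.acquire()                    # blocks while all permits are handed out
    try:
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)             # stand-in for real work on the resource
        with lock:
            active -= 1
    finally:
        sem.release()                # hand the permit back


def run(n_workers=12):
    """Start n_workers threads and return the peak number of concurrent holders."""
    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

However many workers are started, the peak should never exceed `POOL_SIZE`, because the remaining workers wait inside `acquire()` until a permit comes back.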
A permit leak is when a path acquires a permit but never releases it — usually because work throws an exception between acquire and release and there's no try/finally, so the release is skipped. Each leak permanently removes one permit. The available count drifts down, and once it hits zero every acquire() blocks forever: exhaustion.
The whole bug lives in one missing keyword. When work runs between acquire and release, an exception jumps straight past the release. The permit is logically held by a worker that has already unwound and gone away — nobody owns it, nobody will ever give it back.
    # BUG: release is skipped when do_work() throws
    def handle():
        sem.acquire()        # take a permit
        do_work()            # raises -> jumps out of handle()
        sem.release()        # never runs: the permit is leaked

    # FIX: release always runs, even on the error path
    def handle():
        sem.acquire()        # take a permit
        try:
            do_work()        # raise all you like
        finally:
            sem.release()    # the permit always goes back
The finally block is the contract: the permit returns no matter how the body exits, whether by normal return, early return, or exception. In languages without finally, such as C++ or Rust, a guard object that releases in its destructor (RAII) does the same job. In Python, a with context manager gives the same guarantee with less to forget.
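The context-manager form mentioned above can be sketched in Python, where `threading.Semaphore` supports the `with` statement directly. The functions `do_work`, `handle`, and `free_permits` are hypothetical helpers written for this example. `free_permits` counts free permits by taking them without blocking and putting them back, since the standard semaphore exposes no public counter.

```python
import threading

sem = threading.Semaphore(4)         # assumed pool of four permits


def do_work(fail):
    # Hypothetical stand-in for the real work.
    if fail:
        raise ValueError("malformed input")


def handle(fail=False):
    with sem:                        # acquires on entry, releases on exit, even on error
        do_work(fail)


def free_permits(s, limit=16):
    """Count free permits by taking them without blocking, then returning them."""
    taken = 0
    while taken < limit and s.acquire(blocking=False):
        taken += 1
    for _ in range(taken):
        s.release()
    return taken


# Ten failing requests in a row: every one raises, none keeps its permit.
for _ in range(10):
    try:
        handle(fail=True)
    except ValueError:
        pass
```

After the loop, all four permits are still available, because the `with` block released on every error path.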
| Choice | Buys you | Costs you |
|---|---|---|
| try/finally vs RAII / context-manager guard | Explicit and language-portable | Easy to forget; the guard makes leaks structurally impossible |
| Bounded semaphore vs plain | A bounded one throws on over-release, catching double-frees early | A plain one silently inflates the count, hiding a different bug |
| Leak vs deadlock vs slow consumer | Leak: available only ever falls. Deadlock: two holders wait on each other. Slow: available recovers once load eases | All three look like "it hangs" until you watch the available count over time |
| Bigger pool vs leak detection | More permits delays exhaustion | A leak still drains any size pool — it only buys time, not a fix |
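The bounded-versus-plain row can be checked with Python's standard library, which ships both variants as `threading.Semaphore` and `threading.BoundedSemaphore`. This is a small sketch; `free_permits` is a hypothetical helper that counts permits by acquiring without blocking and then returning them.

```python
import threading


def free_permits(s, limit=16):
    """Count free permits by taking them without blocking, then returning them."""
    taken = 0
    while taken < limit and s.acquire(blocking=False):
        taken += 1
    for _ in range(taken):
        s.release()
    return taken


# Plain semaphore: an extra release() is accepted and the count grows to 5.
plain = threading.Semaphore(4)
plain.release()
plain_count = free_permits(plain)

# Bounded semaphore: an extra release() raises ValueError immediately.
bounded = threading.BoundedSemaphore(4)
try:
    bounded.release()
    over_release_caught = False
except ValueError:
    over_release_caught = True
```

The plain semaphore now admits five concurrent holders into a pool meant for four, while the bounded one reports the double release at the point where it happens.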
- acquire() with no try/finally around the work: the classic leak. The happy path returns the permit; the error path keeps it.
- return or break between acquire and release. It exits before the release line just as surely as an exception does.
- acquire() with no timeout. A bounded wait fails fast and surfaces exhaustion instead of hanging the request thread forever.

A connection-pool handler does pool.acquire(), runs a query, then pool.release(). The pool holds four permits. Most requests are fine. But one query path throws on malformed input, and the author wrote acquire(); query(); release() with no try/finally. Every bad request takes a connection and never returns it. After four bad requests the available count is zero; the fifth caller blocks on acquire(), then the sixth, and the service hangs, even though traffic is normal and the database is idle. Wrapping the body in try: query() finally: pool.release() returns the connection on the error path too, and the available count stops drifting, which is exactly the contrast the visual walks through.
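The connection-pool scenario can be replayed in a few lines. The sketch below is a simulation under stated assumptions, not a real pool: `query`, `leaky_handler`, `safe_handler`, `PoolExhausted`, and `fifth_request_outcome` are hypothetical names, an empty string stands in for malformed input, and a `threading.Semaphore` stands in for the pool. Both handlers acquire with a timeout so that exhaustion shows up as an error instead of a hang.

```python
import threading


class PoolExhausted(Exception):
    """Raised when no permit becomes free within the timeout."""


def query(sql):
    # Hypothetical query: treats empty input as malformed.
    if not sql.strip():
        raise ValueError("malformed input")


def leaky_handler(pool, sql, timeout=0.05):
    if not pool.acquire(timeout=timeout):    # bounded wait: fail fast
        raise PoolExhausted("no connection available")
    query(sql)                               # raises on bad input ...
    pool.release()                           # ... so this line is skipped


def safe_handler(pool, sql, timeout=0.05):
    if not pool.acquire(timeout=timeout):
        raise PoolExhausted("no connection available")
    try:
        query(sql)
    finally:
        pool.release()                       # runs on the error path too


def fifth_request_outcome(handler):
    """Send four bad requests, then report what happens to a good fifth one."""
    pool = threading.Semaphore(4)            # pool of four permits
    for _ in range(4):
        try:
            handler(pool, "")                # malformed input
        except ValueError:
            pass
    try:
        handler(pool, "SELECT 1")
        return "served"
    except PoolExhausted:
        return "exhausted"
```

With `leaky_handler` the fifth request finds an empty pool and fails after the timeout; with `safe_handler` it is served normally, because each bad request returned its permit.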
The available permit count only ever goes down, never back up, and eventually every request hangs. What is the most likely cause?
Coach note: the giveaway is that available never recovers. A small pool or a slow consumer dips and rebounds; a deadlock involves mutual waiting. A one-way drift to zero is a leak. Give it another pass if that distinction feels slippery — it's the heart of the bug.
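One way to watch for the one-way drift the coach note describes is to wrap the semaphore in a small class that tracks its own available count, since Python's `threading.Semaphore` does not expose one publicly. `TrackedSemaphore` and `leaky_request` are hypothetical names for this sketch. The samples are taken while nothing is running, when a healthy pool would be back at its full count.

```python
import threading


class TrackedSemaphore:
    """Semaphore wrapper that keeps a readable count of free permits."""

    def __init__(self, permits):
        self._sem = threading.Semaphore(permits)
        self._lock = threading.Lock()
        self._available = permits

    def acquire(self, timeout=None):
        ok = self._sem.acquire(timeout=timeout)
        if ok:
            with self._lock:
                self._available -= 1
        return ok

    def release(self):
        with self._lock:
            self._available += 1
        self._sem.release()

    @property
    def available(self):
        with self._lock:
            return self._available


def leaky_request(sem):
    # Hypothetical handler with the bug: the work fails before release.
    sem.acquire()
    raise ValueError("malformed input")


sem = TrackedSemaphore(4)
samples = []                         # available count, sampled while idle
for _ in range(4):
    try:
        leaky_request(sem)
    except ValueError:
        pass
    samples.append(sem.available)
```

The idle samples fall by one each time and never rebound, which is the leak signature. A slow consumer would show the count returning to four once its requests finish.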