Code Room
On-call · Medium · oc-g635
Subject: Semaphore lock permit leak
Level: Mid–Senior · ~35 min
Common in: Networking & APIs · Concurrency interviews
Industries: Technology, Software development

Question

A service guards calls to a flaky vendor with a `Semaphore(50)` to cap concurrency. Over hours it slowly degrades: throughput to the vendor falls steadily, and eventually every call blocks on `semaphore.acquire()` and times out, even though the vendor itself is responding fine and your traffic hasn't increased. A graph of available permits trends downward over time and never recovers — like a slow leak — flattening at 0. Restarting the pod fixes it instantly, then it slowly leaks again. A retry/timeout change shipped two days ago. Triage and find the leak.

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
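
One plausible root cause, given that a retry/timeout change shipped recently, is an error path that acquires a permit but never releases it. The sketch below is an illustration under assumptions, not the service's actual code: it assumes Python's `threading` semaphores, and `VendorTimeout`, `flaky_vendor`, `call_vendor_leaky`, `call_vendor_safe`, and `permits_left_after_failures` are hypothetical names invented for the example. It shows how each failed call can cost one permit, and how releasing in `finally` prevents that.

```python
import threading


class VendorTimeout(Exception):
    """Hypothetical stand-in for the error raised when a vendor call times out."""


def flaky_vendor(fail: bool) -> str:
    # Hypothetical vendor call; raises when asked to simulate a timeout.
    if fail:
        raise VendorTimeout("vendor call timed out")
    return "ok"


def call_vendor_leaky(sem, fail: bool) -> str:
    # Leak: if flaky_vendor raises, release() is never reached,
    # so the permit is lost until the process restarts.
    if not sem.acquire(timeout=1.0):
        raise TimeoutError("no permit available")
    result = flaky_vendor(fail)
    sem.release()
    return result


def call_vendor_safe(sem, fail: bool) -> str:
    # Release only after a successful acquire, and always in finally,
    # so exceptions and early returns cannot skip it.
    if not sem.acquire(timeout=1.0):
        raise TimeoutError("no permit available")
    try:
        return flaky_vendor(fail)
    finally:
        sem.release()


def permits_left_after_failures(call, permits: int = 5, failures: int = 3) -> int:
    # BoundedSemaphore raises ValueError on over-release, which would
    # surface the opposite bug (releasing a permit that was never acquired).
    sem = threading.BoundedSemaphore(permits)
    for _ in range(failures):
        try:
            call(sem, True)
        except VendorTimeout:
            pass
    # Count the remaining permits by draining them without blocking.
    left = 0
    while sem.acquire(blocking=False):
        left += 1
    return left
```

With 5 permits and 3 simulated failures, `permits_left_after_failures(call_vendor_leaky)` returns 2 while `permits_left_after_failures(call_vendor_safe)` returns 5. The same shape matches the incident graph: available permits step down on every failed call and only a restart resets the count.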
