Code Room
On-callMediumoc-g031
Subject Thread pool exhaustionLevel Mid–Senior~30 minCommon in Concurrency · Distributed systems interviewsIndustries Technology, Software development

Question

A Python web app (gunicorn with sync workers, 4 workers × 8 threads = 32 concurrent slots) fronts three downstream services: auth, billing, and recommendations. At 18:00 the recommendations service (a non-critical sidebar widget) starts taking 10s per call instead of 50ms. Within minutes your *entire* app — including login and checkout, which don't even touch recommendations — becomes unresponsive and starts timing out. Recommendations is only ~5% of your traffic. Triage and explain why a slow non-critical dependency took down everything.

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.

Diagram & narrate the incident
Loading whiteboard…
Run or narrate your approach, then ask the coach.