On-call · Hard · oc-g405

Subject: Resource exhaustion
Level: Senior–Staff
Time: ~35 min
Common in: Concurrency · Code quality & review interviews
Industries: Technology, Software development

Question

A Java service backed by a fixed 200-thread request pool starts returning 503s ('pool exhausted / task rejected') at 15:20; p99 on *every* endpoint balloons to the 30s timeout, even endpoints that don't touch the database. CPU sits at 20% and memory is flat — the box is nearly idle. A thread dump shows ~195 of 200 worker threads BLOCKED/parked in a synchronous call to one downstream recommendations API. That recommendations API got slow (its own incident) starting at 15:15. Throughput on your service has collapsed even though nothing about your service changed. Triage and mitigate.

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
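One way to picture the "prevents a repeat" step is a bulkhead with a hard timeout around the recommendations call, so a slow dependency can occupy only a small, fixed number of threads and never the whole 200-thread request pool. The sketch below is a minimal illustration under stated assumptions, not the service's actual code: the class name `RecommendationsBulkhead`, the concurrency cap, and the timeout are all hypothetical, and it uses only `java.util.concurrent` from the JDK.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical guard around one downstream dependency (names and limits are assumptions). */
final class RecommendationsBulkhead {
    private final ExecutorService pool;
    private final long timeoutMillis;

    RecommendationsBulkhead(int maxConcurrent, Duration timeout) {
        // SynchronousQueue has no capacity: once maxConcurrent calls are in flight,
        // further submissions are rejected immediately instead of queuing.
        this.pool = new ThreadPoolExecutor(
                maxConcurrent, maxConcurrent, 0L, TimeUnit.SECONDS,
                new SynchronousQueue<>(),
                r -> {
                    Thread t = new Thread(r, "recs-bulkhead");
                    t.setDaemon(true); // do not keep the JVM alive for stuck calls
                    return t;
                });
        this.timeoutMillis = timeout.toMillis();
    }

    /** Runs the downstream call; returns the fallback if the bulkhead is full, the call is slow, or it fails. */
    <T> T call(Callable<T> downstream, T fallback) {
        Future<T> future;
        try {
            future = pool.submit(downstream);
        } catch (RejectedExecutionException bulkheadFull) {
            return fallback; // shed load: the request thread is released right away
        }
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException tooSlow) {
            future.cancel(true); // interrupt the worker so its slot frees up if the call is interruptible
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return fallback;
        } catch (ExecutionException downstreamFailed) {
            return fallback;
        }
    }
}
```

With, say, `new RecommendationsBulkhead(20, Duration.ofMillis(300))`, a request thread waits at most about 300 ms on recommendations and at most 20 calls are in flight, so endpoints that never touch the dependency keep their threads. A blocking HTTP client that ignores interrupts still needs its own connect and read timeouts, otherwise the bulkhead's worker threads stay occupied until the socket call returns.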

