Code Room
On-call · Hard · oc-g406
Subject: Resource exhaustion
Level: Mid–Senior
Time: ~30 min
Common in: Reliability & on-call interviews
Industries: Technology, Software development

Question

A Node.js API using a 20-connection Postgres pool starts throwing 'timeout exceeded when trying to acquire a connection' from the pool and 'remaining connection slots are reserved' from Postgres at 12:00. The pool's in-use gauge is pinned at 20/20 and never drops: during a brief traffic lull when QPS halved, used connections stayed at 20. CPU and memory are fine. A feature shipped this morning added a code path that runs a query inside a manual transaction (`BEGIN`); on one early-return error branch, it returns without `COMMIT`/`ROLLBACK` and without releasing the client. Triage and mitigate.
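A toy model makes the leak signature concrete. The sketch below is a minimal simulation, not node-postgres itself: `ToyPool`, `buggyHandler`, the 50 ms acquire timeout, and the one-in-five failure rate are all hypothetical. Once the error branch has run 20 times, the in-use count sits at 20 even after traffic stops, and the next acquire fails with the timeout. That is why halving QPS did not move the gauge: a busy but healthy pool drains when load drops, while a leaking one does not.

```typescript
// Hypothetical toy pool (NOT node-postgres): it only counts checked-out
// clients and rejects acquire() after a timeout when none are free.
type ToyClient = { release: () => void };

class ToyPool {
  private readonly max: number;
  private readonly acquireTimeoutMs: number;
  private inUse = 0;
  private waiters: Array<() => void> = [];

  constructor(max: number, acquireTimeoutMs: number) {
    this.max = max;
    this.acquireTimeoutMs = acquireTimeoutMs;
  }

  get inUseCount(): number {
    return this.inUse;
  }

  // Hand out a client that counts as in use until release() is called.
  private checkout(): ToyClient {
    this.inUse++;
    let released = false;
    return {
      release: () => {
        if (released) return; // ignore double release
        released = true;
        this.inUse--;
        const next = this.waiters.shift();
        if (next) next(); // wake the oldest waiter, if any
      },
    };
  }

  acquire(): Promise<ToyClient> {
    if (this.inUse < this.max) return Promise.resolve(this.checkout());
    // Pool exhausted: wait for a release, or fail after the timeout.
    return new Promise<ToyClient>((resolve, reject) => {
      const wake = (): void => {
        clearTimeout(timer);
        resolve(this.checkout());
      };
      const timer = setTimeout(() => {
        this.waiters = this.waiters.filter((w) => w !== wake);
        reject(new Error("timeout exceeded when trying to acquire a connection"));
      }, this.acquireTimeoutMs);
      this.waiters.push(wake);
    });
  }
}

// Hypothetical handler with the same shape as the bug: the error branch
// returns early, skipping COMMIT/ROLLBACK and release().
async function buggyHandler(pool: ToyPool, hitsErrorBranch: boolean): Promise<number> {
  const client = await pool.acquire();
  // BEGIN would be issued here.
  if (hitsErrorBranch) {
    return 400; // BUG: client leaked, transaction left open
  }
  // Query and COMMIT would be issued here.
  client.release();
  return 200;
}

async function demo(): Promise<{ inUseAfterTraffic: number; nextAcquire: string }> {
  const pool = new ToyPool(20, 50);
  // 100 sequential requests; i = 4, 9, ..., 99 take the error branch (20 leaks).
  for (let i = 0; i < 100; i++) {
    await buggyHandler(pool, i % 5 === 4);
  }
  // Traffic has stopped entirely, yet no client comes back to the pool.
  const inUseAfterTraffic = pool.inUseCount;
  let nextAcquire = "acquired";
  try {
    await pool.acquire();
  } catch (e) {
    nextAcquire = (e as Error).message;
  }
  return { inUseAfterTraffic, nextAcquire };
}

demo().then((r) => console.log(r));
```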

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
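For this incident, once the bleeding is stopped (for example by rolling back this morning's deploy), the change that prevents a repeat is to make rollback and release unconditional. Below is a minimal sketch assuming a node-postgres-style API (`pool.connect()`, `client.query()`, `client.release(err?)`). The `TxPool`/`TxClient` interfaces and the `withTransaction` helper are hypothetical names declared locally, not part of any library.

```typescript
// Structural types mirroring the subset of node-postgres used here.
// Declared locally (an assumption) so the sketch compiles without `pg`.
interface TxClient {
  query(sql: string): Promise<unknown>;
  release(err?: Error | boolean): void;
}

interface TxPool {
  connect(): Promise<TxClient>;
}

// Hypothetical helper: runs fn between BEGIN and COMMIT. Any throw,
// including an error branch inside fn, triggers ROLLBACK, and the finally
// block returns the client to the pool on every exit path.
async function withTransaction<T>(
  pool: TxPool,
  fn: (client: TxClient) => Promise<T>,
): Promise<T> {
  const client = await pool.connect();
  let discardReason: Error | undefined;
  try {
    await client.query("BEGIN");
    const result = await fn(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    try {
      await client.query("ROLLBACK");
    } catch (rollbackErr) {
      // Connection state is unknown: ask the pool to destroy this client
      // instead of handing it to the next request.
      discardReason = rollbackErr instanceof Error ? rollbackErr : new Error(String(rollbackErr));
    }
    throw err;
  } finally {
    // undefined returns the client to the pool; an Error discards it.
    client.release(discardReason);
  }
}
```

With a helper like this, the leaking branch becomes a `throw` instead of a `return`, so the transaction is rolled back and the client released; the caller maps the error to a response. As a server-side backstop, Postgres's `idle_in_transaction_session_timeout` setting ends sessions that sit idle inside an open transaction, which is the state these leaked clients leave behind (`state = 'idle in transaction'` in `pg_stat_activity`).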
