Question
A Java service backed by a fixed 50-connection HikariCP pool starts timing out: p99 latency jumps to the 30s request timeout and throughput collapses, but CPU and memory are flat and low. The pool's 'active connections' metric is pinned at 50/50 and 'pending threads waiting for a connection' is climbing into the hundreds. A thread dump shows many threads BLOCKED in `getConnection()`, and a handful HOLDING a connection while themselves blocked calling a second `getConnection()` inside the same request. A feature shipped yesterday added a nested transaction that opens a second DB connection mid-request. Triage and fix.
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
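The failure pattern in the scenario (each request holds one connection while waiting for a second from the same fixed pool) can be reproduced in miniature. The sketch below is an illustration under stated assumptions, not the service's code: a `java.util.concurrent.Semaphore` stands in for the HikariCP pool, and the class name `PoolExhaustionSketch`, the method `simulate`, and the pool size of 4 are all hypothetical. It starts as many concurrent requests as there are connections, lets each take its first connection, and then counts how many time out asking for a second.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolExhaustionSketch {

    /**
     * Runs poolSize concurrent "requests" against a pool of poolSize permits
     * (a Semaphore standing in for a connection pool; this is an assumption,
     * not HikariCP itself). Returns how many requests timed out waiting for
     * a second connection.
     */
    public static int simulate(int poolSize, boolean nestedAcquire, long timeoutMs) {
        Semaphore pool = new Semaphore(poolSize);
        CountDownLatch allHoldOne = new CountDownLatch(poolSize);
        CountDownLatch allAttempted = new CountDownLatch(poolSize);
        AtomicInteger timedOut = new AtomicInteger();
        ExecutorService requests = Executors.newFixedThreadPool(poolSize);

        for (int i = 0; i < poolSize; i++) {
            requests.submit(() -> {
                try {
                    pool.acquire();                 // outer connection
                    try {
                        allHoldOne.countDown();
                        allHoldOne.await();         // pool is now fully checked out
                        if (nestedAcquire) {
                            // Nested transaction asks for a second connection
                            // while still holding the first one.
                            if (pool.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS)) {
                                pool.release();
                            } else {
                                timedOut.incrementAndGet();
                            }
                        }
                        // Keep the outer connection until every request has
                        // finished its attempt, so the count is deterministic.
                        allAttempted.countDown();
                        allAttempted.await();
                    } finally {
                        pool.release();             // return the outer connection
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        requests.shutdown();
        try {
            requests.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return timedOut.get();
    }

    public static void main(String[] args) {
        System.out.println("nested acquire, timed out: " + simulate(4, true, 200));
        System.out.println("single connection, timed out: " + simulate(4, false, 200));
    }
}
```

In this sketch every request times out when the nested acquire is enabled, and none do when each request sticks to the connection it already holds. That matches the thread dump in the scenario: the holders cannot finish without a second connection, and no second connection can free up until a holder finishes. It also suggests why a larger pool only postpones the problem, while reusing the outer connection (or drawing the nested one from a separate pool) removes the cycle.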