Code Room
On-call · Medium
Question
The reporting service intermittently throws `TimeoutException: could not acquire a connection from the pool within 30000ms`. The failures correlate with mornings when analysts run dashboards. The HikariCP pool size is 20. During incidents, the metrics show `pool.active` pinned at 20 while `pool.pending` (threads waiting for a connection) spikes; the database itself reports only ~30% of its `max_connections` in use and is healthy. A change last week added an export endpoint that streams a large result set to the client while holding the DB connection open for the whole download. Triage and fix.
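A minimal stdlib-only model of the failure (not HikariCP itself) may help frame the triage. The pool is a `java.util.concurrent.Semaphore` with one permit per connection, and the 30000 ms connection timeout becomes a `tryAcquire` timeout, scaled down so the demo runs quickly. The class and method names, plus the export rate and download time in `main`, are hypothetical. The point it illustrates: spare `max_connections` on the database does not help, because the cap being hit is the client-side pool of 20. Little's law (average connections held ~ arrival rate x hold time) gives a rough sense of how few long downloads it takes to fill that pool.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Minimal model of pool exhaustion (not HikariCP itself): one Semaphore
// permit per pooled connection, and the pool's connection timeout becomes
// the tryAcquire timeout. Times are scaled down so the demo runs quickly.
public class PoolExhaustionDemo {

    // Little's law: average connections held = arrival rate x hold time
    // (both expressed in the same time unit, here minutes).
    static double connectionsHeld(double arrivalsPerMinute, double holdMinutes) {
        return arrivalsPerMinute * holdMinutes;
    }

    // Returns true if one short dashboard query can borrow a connection while
    // `exportsInFlight` downloads each keep one for their full duration.
    static boolean dashboardGetsConnection(int poolSize, int exportsInFlight, long timeoutMs) {
        Semaphore pool = new Semaphore(poolSize, true);   // fair: waiters served FIFO
        int held = Math.min(exportsInFlight, poolSize);
        pool.acquireUninterruptibly(held);                 // exports pin these connections
        try {
            // The dashboard query waits up to timeoutMs, like connectionTimeout.
            boolean ok = pool.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
            if (ok) pool.release();                        // short query finishes quickly
            return ok;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.release(held);                            // downloads eventually finish
        }
    }

    public static void main(String[] args) {
        // Hypothetical load: 2 exports per minute, each download taking 10 minutes.
        System.out.println("connections held by exports ~ " + connectionsHeld(2.0, 10.0));
        System.out.println("19 exports in flight, dashboard ok: " + dashboardGetsConnection(20, 19, 100));
        System.out.println("20 exports in flight, dashboard ok: " + dashboardGetsConnection(20, 20, 100));
    }
}
```

Under those hypothetical numbers, exports alone average 20 held connections, the entire pool, which is consistent with `pool.active` pinned at 20 while `pool.pending` grows and the database stays mostly idle.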
What a strong answer looks like
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
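Applied to this incident, one possible root-cause fix (sketched under assumptions, not the only option) is to stop tying a pooled connection to the client's download speed. The export reads the result set in short keyset-paginated pages and holds a connection only while a page is fetched, releasing it before the slow write to the client. As in a plain pool model, the `Semaphore` stands in for HikariCP; `PagedExport`, `fetchPage`, and the in-memory `table` are hypothetical stand-ins for a query such as `SELECT ... WHERE id > ? ORDER BY id LIMIT ?`, and no real JDBC is involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.function.Consumer;

// Sketch of the fix: keyset-paginated export that holds a pooled connection
// only while one page is read, never while the client download is written.
// The Semaphore stands in for the pool and fetchPage() for a hypothetical
// query like: SELECT ... WHERE id > ? ORDER BY id LIMIT ?
public class PagedExport {
    private final Semaphore pool;
    private final List<Integer> table; // hypothetical row ids, sorted ascending

    public PagedExport(Semaphore pool, List<Integer> table) {
        this.pool = pool;
        this.table = table;
    }

    // Hypothetical keyset query: up to `limit` ids strictly greater than `afterId`.
    private List<Integer> fetchPage(int afterId, int limit) {
        List<Integer> page = new ArrayList<>();
        for (int id : table) {
            if (page.size() == limit) break;
            if (id > afterId) page.add(id);
        }
        return page;
    }

    // Sends every row to `client` page by page; returns the number of rows sent.
    public int export(int pageSize, Consumer<List<Integer>> client) {
        int lastId = Integer.MIN_VALUE;
        int sent = 0;
        while (true) {
            List<Integer> page;
            pool.acquireUninterruptibly();   // borrow a connection (a real pool would time out)
            try {
                page = fetchPage(lastId, pageSize);
            } finally {
                pool.release();              // return it before touching the client
            }
            if (page.isEmpty()) return sent;
            client.accept(page);             // slow network write: no connection held
            sent += page.size();
            lastId = page.get(page.size() - 1);
        }
    }

    public static void main(String[] args) {
        Semaphore pool = new Semaphore(20);
        List<Integer> rows = new ArrayList<>();
        for (int id = 1; id <= 25; id++) rows.add(id);
        int sent = new PagedExport(pool, rows).export(10, page ->
                System.out.println("wrote " + page.size() + " rows, free connections: "
                        + pool.availablePermits()));
        System.out.println("total rows: " + sent);
    }
}
```

Each connection is now held for one page query rather than the whole download, so a slow client no longer pins a pool slot. Moving `client.accept(page)` inside the `try` block would reproduce the original bug: the client callback would see 19 free permits instead of 20. The trade-off is that several short queries no longer share one consistent snapshot, which may matter if exports must be point-in-time consistent.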