Question
A deploy upgrades the database driver (a minor version, e.g. 5.3 → 5.4). Tests pass. Under production load, intermittent `connection pool timeout: no connection available within 10000ms` errors appear, but only at peak; DB server-side metrics show the DB itself is healthy and under-utilized, connection count to the DB is LOWER than before, and app-side latency is up. Recent context: the driver minor release changed the default max pool size from 100 to 10 (a 'safer default'), documented only in release notes, not the changelog summary. How do you triage and mitigate?
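A back-of-envelope check with Little's law explains the symptom pattern. The pool needs roughly (arrival rate × mean connection hold time) connections in use at once. When that demand reaches the cap, callers queue for a connection and eventually hit the acquire timeout, while the DB sees fewer connections and stays idle. The request rates, hold time, and 1.5× headroom factor below are hypothetical illustration values, not measurements from this incident.

```python
import math


def connections_in_use(requests_per_sec: float, hold_time_ms: float) -> float:
    """Little's law: average concurrent connections L = lambda * W."""
    return requests_per_sec * hold_time_ms / 1000.0


def required_pool_size(requests_per_sec: float, hold_time_ms: float,
                       headroom: float = 1.5) -> int:
    """Pool size with an assumed headroom factor for bursts (tune from real metrics)."""
    return math.ceil(connections_in_use(requests_per_sec, hold_time_ms) * headroom)


def saturated(requests_per_sec: float, hold_time_ms: float, pool_max: int) -> bool:
    """True when average demand meets or exceeds the cap, so the wait queue grows
    without bound; even just below the cap, waits are already long."""
    return connections_in_use(requests_per_sec, hold_time_ms) >= pool_max


# Hypothetical traffic: 800 req/s at peak, 200 req/s off-peak, 25 ms per connection hold.
print(required_pool_size(800, 25))   # 30
print(saturated(800, 25, 10))        # True: the new default of 10 starves the app at peak
print(saturated(800, 25, 100))       # False: the old default of 100 had room
print(saturated(200, 25, 10))        # False: off-peak traffic fits, so errors appear only at peak
```

With these assumed numbers, demand at peak is 20 concurrent connections. That is double the new cap and well under the old one, which is consistent with "timeouts at peak, healthy DB, lower connection count."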
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
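For the "prevents a repeat" step, one option is to pin the pool size explicitly instead of inheriting the driver default, and to fail fast at startup if the effective value drops below an expected floor. This is only a sketch. `PoolConfig`, `load_pool_config`, the `DB_POOL_MAX` / `DB_POOL_ACQUIRE_TIMEOUT_MS` variables, and the floor of 50 are hypothetical names and values, not any real driver's API. You would need to map them onto the options your driver actually exposes.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PoolConfig:
    # Hypothetical config object; pass these values to your driver's real pool options.
    max_size: int
    acquire_timeout_ms: int


def load_pool_config(env: Optional[Mapping[str, str]] = None,
                     min_expected: int = 50) -> PoolConfig:
    """Read pool settings explicitly (hypothetical env var names) so a driver
    upgrade cannot silently change them, and refuse to start if they look wrong."""
    env = os.environ if env is None else env
    # Pin the pre-upgrade value (100) as the explicit fallback, not the driver's default.
    max_size = int(env.get("DB_POOL_MAX", "100"))
    timeout_ms = int(env.get("DB_POOL_ACQUIRE_TIMEOUT_MS", "10000"))
    if max_size < min_expected:
        raise ValueError(
            f"DB pool max_size={max_size} is below the expected floor {min_expected}; "
            "check for a changed driver default or a misconfiguration"
        )
    return PoolConfig(max_size=max_size, acquire_timeout_ms=timeout_ms)


cfg = load_pool_config({"DB_POOL_MAX": "100"})
print(cfg.max_size)  # 100
```

The startup check is deliberately strict. A pool that is silently too small shows up only at peak, as this incident shows, so it is cheaper to fail loudly at deploy time than to find out under load.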