On-call | Hard | oc-g280

Subject: Dependency upgrade | Level: Senior–Staff | ~35 min | Common in: Code quality & review interviews | Industries: Technology, Software development

Question

A deploy upgrades the database driver by a minor version (e.g. 5.3 → 5.4). Tests pass. Under production load, intermittent `connection pool timeout: no connection available within 10000ms` errors appear, but only at peak. Server-side metrics show the database itself is healthy and under-utilized, the connection count to the database is lower than before, and app-side latency is up. Recent context: the minor release changed the driver's default max pool size from 100 to 10 (a "safer default"), and the change was documented only in the release notes, not in the changelog summary. How do you triage and mitigate?
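One way to reason about why the errors show up only at peak is Little's Law: the average number of connections in use is roughly the request rate multiplied by how long each request holds a connection. The sketch below applies that estimate to the two pool sizes from the scenario (100 and 10). The request rates and the 20 ms hold time are assumptions chosen for illustration, not figures from the incident, and the function names are hypothetical.

```python
import math

def required_connections(requests_per_sec: int, hold_time_ms: int) -> int:
    """Little's Law estimate: connections in use = arrival rate x hold time."""
    return math.ceil(requests_per_sec * hold_time_ms / 1000)

def pool_is_saturated(requests_per_sec: int, hold_time_ms: int, max_pool_size: int) -> bool:
    """True when average demand exceeds the pool, so callers queue and may time out."""
    return required_connections(requests_per_sec, hold_time_ms) > max_pool_size

# Assumed traffic figures, for illustration only.
OFF_PEAK_RPS = 200
PEAK_RPS = 1500
HOLD_TIME_MS = 20  # assumed time each request holds a connection

# Pool sizes taken from the scenario: old default 100, new default 10.
OLD_POOL_SIZE = 100
NEW_POOL_SIZE = 10

off_peak_need = required_connections(OFF_PEAK_RPS, HOLD_TIME_MS)  # 4 connections
peak_need = required_connections(PEAK_RPS, HOLD_TIME_MS)          # 30 connections

for label, rps in (("off-peak", OFF_PEAK_RPS), ("peak", PEAK_RPS)):
    for size in (OLD_POOL_SIZE, NEW_POOL_SIZE):
        state = "saturated" if pool_is_saturated(rps, HOLD_TIME_MS, size) else "ok"
        print(f"{label:8s} pool={size:3d} need={required_connections(rps, HOLD_TIME_MS):2d} -> {state}")
```

Under these assumed numbers, only the peak traffic against the pool of 10 is saturated. That matches the reported pattern: requests queue inside the application waiting for a connection, so app-side latency rises while the database sees fewer connections and stays under-utilized.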

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
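For the "what prevents a repeat" step, one possible guard is to pin the pool size explicitly in application config instead of inheriting the driver default, and to compare the effective settings against expected values at startup. The following is a minimal sketch under that assumption. `PoolSettings`, `resolve_max_pool_size`, and `find_pool_regressions` are hypothetical names, not part of any real driver's API, and reading the effective values from an actual driver is left out.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class PoolSettings:
    max_pool_size: int
    acquire_timeout_ms: int

# What the application was sized for (values from the scenario).
EXPECTED = PoolSettings(max_pool_size=100, acquire_timeout_ms=10000)

# Driver default after the 5.3 -> 5.4 upgrade, per the scenario.
NEW_DRIVER_DEFAULT = 10

def resolve_max_pool_size(configured: Optional[int], driver_default: int) -> int:
    """An explicitly configured value wins; otherwise the driver default applies."""
    return configured if configured is not None else driver_default

def find_pool_regressions(effective: PoolSettings, expected: PoolSettings) -> List[str]:
    """Return human-readable problems; an empty list means the settings look as expected."""
    problems = []
    if effective.max_pool_size < expected.max_pool_size:
        problems.append(
            f"max_pool_size is {effective.max_pool_size}, "
            f"expected at least {expected.max_pool_size}"
        )
    if effective.acquire_timeout_ms != expected.acquire_timeout_ms:
        problems.append(
            f"acquire_timeout_ms is {effective.acquire_timeout_ms}, "
            f"expected {expected.acquire_timeout_ms}"
        )
    return problems

# Unpinned: the app silently inherits the new default of 10.
unpinned = PoolSettings(resolve_max_pool_size(None, NEW_DRIVER_DEFAULT), 10000)
# Pinned: the app states its own value, so the default change has no effect.
pinned = PoolSettings(resolve_max_pool_size(100, NEW_DRIVER_DEFAULT), 10000)

print(find_pool_regressions(unpinned, EXPECTED))
print(find_pool_regressions(pinned, EXPECTED))
```

Run at startup or in CI, a check like this would turn a silent default change into a visible failure before it reaches peak traffic. During the incident itself, the same explicit setting (or a rollback of the driver) is the mitigation.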
