Question
It's 14:10 and you're on call for the payments API (Java/Spring Boot, Postgres via HikariCP, behind an ALB). PagerDuty fires: p99 latency on POST /charges jumped from 180ms to 2.4s, while p50 is unchanged at 60ms. Error rate is flat — nothing is failing, requests are just slow. The Hikari dashboard shows `connections.active` pinned at the pool max of 20 and `connections.pending` climbing into the dozens. Postgres `pg_stat_activity` shows ~20 sessions, most in state `idle in transaction`. A deploy went out at 13:55 that 'added an external fraud-score lookup before committing the charge.' Walk me through how you triage this and what you do in the next 15 minutes versus the next week.
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.