Code Room
On-call · Medium · oc-g023
Subject: Latency spikes · Level: Mid–Senior · ~30 min · Common in: Reliability & on-call interviews · Industries: Technology, Software development

Question

It's 14:10 and you're on call for the payments API (Java/Spring Boot, Postgres via HikariCP, behind an ALB). PagerDuty fires: p99 latency on POST /charges jumped from 180ms to 2.4s, while p50 is unchanged at 60ms. Error rate is flat — nothing is failing, requests are just slow. The Hikari dashboard shows `connections.active` pinned at the pool max of 20 and `connections.pending` climbing into the dozens. Postgres `pg_stat_activity` shows ~20 sessions, most in state `idle in transaction`. A deploy went out at 13:55 that 'added an external fraud-score lookup before committing the charge.' Walk me through how you triage this and what you do in the next 15 minutes versus the next week.

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
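
One way to turn the pool signals into a testable hypothesis is a back-of-envelope capacity check using Little's law (connections in use = arrival rate × time each request holds a connection). The sketch below is illustrative only: the class and method names are hypothetical, the pool size of 20 comes from the scenario, and the hold times (20 ms of SQL, 480 ms for the fraud lookup) and the 100 req/s load are assumed numbers, not measurements from the incident.

```java
// Back-of-envelope pool capacity via Little's law:
//   connections in use = arrival rate x time each request holds a connection.
// All timings below are hypothetical; substitute values measured on your own dashboards.
public class PoolCapacity {

    /** Max sustainable requests/sec when each request holds a connection for holdMillis. */
    public static double maxRequestsPerSecond(int poolSize, double holdMillis) {
        if (poolSize <= 0 || holdMillis <= 0) {
            throw new IllegalArgumentException("poolSize and holdMillis must be positive");
        }
        return poolSize * 1000.0 / holdMillis;
    }

    /** Connections needed to serve requestsPerSecond at the given hold time without queueing. */
    public static int connectionsNeeded(double requestsPerSecond, double holdMillis) {
        if (requestsPerSecond < 0 || holdMillis <= 0) {
            throw new IllegalArgumentException("rate must be >= 0 and holdMillis positive");
        }
        return (int) Math.ceil(requestsPerSecond * holdMillis / 1000.0);
    }

    public static void main(String[] args) {
        int poolSize = 20;             // from the scenario: Hikari pool max
        double dbOnlyHoldMillis = 20;  // assumed: time spent on SQL alone
        double fraudCallMillis = 480;  // assumed: external lookup made while the transaction is open
        double afterDeployHold = dbOnlyHoldMillis + fraudCallMillis;

        System.out.printf("capacity before deploy: %.0f req/s%n",
                maxRequestsPerSecond(poolSize, dbOnlyHoldMillis));
        System.out.printf("capacity after deploy:  %.0f req/s%n",
                maxRequestsPerSecond(poolSize, afterDeployHold));
        System.out.printf("connections needed at 100 req/s after deploy: %d%n",
                connectionsNeeded(100, afterDeployHold));
    }
}
```

Under these assumed numbers, the same 20 connections drop from roughly 1000 req/s of capacity to 40 req/s, and 100 req/s would need 50 connections. That is consistent with `connections.active` pinned at max, `connections.pending` climbing, and sessions sitting `idle in transaction` while they wait on the external call. It also suggests why raising the pool size only buys time: the durable fix is to shorten how long each request holds a connection.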

Diagram & narrate the incident
Run or narrate your approach, then ask the coach.