Code Room
On-call | Easy | oc-g529
Subject: Bad deploy | Level: Entry–Mid | ~15 min | Common in: Reliability & on-call interviews | Industries: Software development, Technology

Question

You're on call for a single web API. Ten minutes ago the error-rate dashboard jumped from ~0.1% to ~40% of requests returning HTTP 500, and latency on the success path is unchanged. The deploy timeline shows a new version went out 12 minutes ago; nothing else changed (no traffic spike, no dependency alerts). The on-call channel is filling up with customer reports. How do you triage and mitigate?

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
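
One way to make that ordering concrete is a small decision sketch. It is illustrative only, not a prescribed runbook: the `Signals` fields, the 10x spike threshold, the 1% floor, and the 15-minute correlation window are all assumptions chosen for this example, and the returned strings are placeholders for real actions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Signals:
    """Hypothetical snapshot of what the on-call engineer can see."""
    baseline_error_rate: float          # e.g. 0.001 for ~0.1%
    current_error_rate: float           # e.g. 0.40 for ~40%
    error_spike_at: datetime
    last_deploy_at: Optional[datetime]  # None if nothing was deployed recently
    traffic_spike: bool
    dependency_alerts: bool


def triage(s: Signals, window: timedelta = timedelta(minutes=15)) -> List[str]:
    """Return an ordered list of actions: mitigate before investigating."""
    # Assumed threshold: 10x baseline, with a 1% floor so a zero baseline
    # does not make every request count as a spike.
    spiking = s.current_error_rate >= max(10 * s.baseline_error_rate, 0.01)
    if not spiking:
        return ["monitor: error rate is near baseline"]

    steps = ["communicate: post impact and current action in the on-call channel"]

    # The deploy is the prime suspect if the spike began shortly after it
    # and no other signal (traffic, dependencies) explains the errors.
    deploy_suspect = (
        s.last_deploy_at is not None
        and timedelta(0) <= s.error_spike_at - s.last_deploy_at <= window
        and not s.traffic_spike
        and not s.dependency_alerts
    )
    if deploy_suspect:
        steps.append("mitigate: roll back to the previous version")
    else:
        steps.append("mitigate: reduce impact (shed load, fail over) while investigating")

    steps.append("investigate: read the 500s' logs and traces to separate root cause from symptom")
    steps.append("follow up: write up the incident and what prevents a repeat")
    return steps
```

With the numbers from the question (baseline ~0.1%, current ~40%, deploy 12 minutes ago, spike 10 minutes ago, no traffic spike or dependency alerts), this sketch puts the rollback ahead of any root-cause investigation, which is the ordering the answer is looking for.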
