Code Room
On-call · Medium
Question
After a deploy at 10:00, GET /cart p99 regressed from 90ms to 700ms; p50 went from 40ms to 60ms (mild). Throughput and error rate are unchanged. Distributed traces for slow requests show the time is spent in many small, sequential database calls — a slow request makes ~150 round-trips to the DB, each ~3ms, while a fast request makes ~5. The deploy 'added per-line-item promotional pricing.' Carts vary in size; big carts are slow, small carts are fine. Triage and explain the regression precisely.
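As a back-of-envelope check on the trace numbers above, the sketch below multiplies round-trips by per-call latency, assuming the calls are strictly sequential as the traces suggest. The function name `db_time_ms` is illustrative only; the 150, 5, and 3 ms figures come from the question.

```python
RTT_MS = 3.0  # per-round-trip DB latency reported in the traces


def db_time_ms(round_trips: int, rtt_ms: float = RTT_MS) -> float:
    """Wall-clock time spent in DB calls when they run one after another."""
    return round_trips * rtt_ms


slow_db_ms = db_time_ms(150)  # slow request: ~150 sequential round-trips
fast_db_ms = db_time_ms(5)    # fast request: ~5 round-trips

print(f"slow request DB time: {slow_db_ms:.0f} ms")
print(f"fast request DB time: {fast_db_ms:.0f} ms")
```

Under this assumption, sequential DB round-trips alone account for roughly 450 ms of a slow request versus about 15 ms of a fast one, which fits a cost that grows with cart size rather than a uniform slowdown.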
What a strong answer looks like
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
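One hypothesis consistent with the traces is an N+1 query pattern: the new promotional pricing looks up a promotion once per line item instead of once per cart. The sketch below is a minimal illustration of that hypothesis, not the service's actual code. `FakePromoDB`, `get_promo`, `get_promos`, and both pricing functions are hypothetical names, and the in-memory store only counts round-trips rather than talking to a real database.

```python
from dataclasses import dataclass


@dataclass
class FakePromoDB:
    """Hypothetical stand-in for the promotions store; counts round-trips."""
    promos: dict
    round_trips: int = 0

    def get_promo(self, sku: int) -> float:
        # One round-trip per call: the suspected per-line-item lookup.
        self.round_trips += 1
        return self.promos.get(sku, 0.0)

    def get_promos(self, skus: list) -> dict:
        # One round-trip for the whole cart (for example, a single IN query).
        self.round_trips += 1
        return {sku: self.promos.get(sku, 0.0) for sku in skus}


def price_cart_n_plus_one(db: FakePromoDB, cart: list) -> float:
    """Suspected regression: round-trips grow linearly with cart size."""
    return sum(price * (1 - db.get_promo(sku)) for sku, price in cart)


def price_cart_batched(db: FakePromoDB, cart: list) -> float:
    """Candidate fix: fetch all promotions in one call, then price in memory."""
    promos = db.get_promos([sku for sku, _ in cart])
    return sum(price * (1 - promos[sku]) for sku, price in cart)


# A large cart: 50 line items of (sku, unit_price); every tenth SKU is 10% off.
cart = [(sku, 10.0) for sku in range(50)]
discounts = {sku: 0.1 for sku in range(0, 50, 10)}

slow_db = FakePromoDB(promos=discounts)
fast_db = FakePromoDB(promos=discounts)
slow_total = price_cart_n_plus_one(slow_db, cart)
fast_total = price_cart_batched(fast_db, cart)

print(slow_db.round_trips, fast_db.round_trips)
```

Both functions return the same total, but the per-item version issues one lookup per line item while the batched version issues one per cart. That would explain why large carts dominate the p99 while the p50, driven by small carts, moves only slightly. Rolling back the deploy or gating the feature behind a flag remains the first mitigation; batching is the follow-up fix.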