On-call · Hard · oc-g473

Subject: Version skew
Level: Senior–Staff
Time: ~40 min
Common in: Networking & APIs interviews
Industries: Technology, Software development

Question

During a rolling deploy of the `payments` API (24 pods, ~10 min), reconciliation later flags a small cluster of DOUBLE charges, all timestamped within the rollout window.

Context: v51 (new) changed the idempotency-key derivation to include a newly-added `attempt_id` field; v50 (old) derives the key the old way (without it). The idempotency store is shared. During the rollout, a client retry can hit a v50 pod the first time and a v51 pod on retry (or vice versa): the two versions compute DIFFERENT idempotency keys for the SAME logical payment, so the dedup check misses and the charge runs twice.

Dashboards: no error spike, p99 normal, charge volume slightly elevated.

Triage, explain why only the rollout window is affected, then mitigate.
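The skew described in the question can be reproduced in a few lines. This is a minimal sketch, not the service's actual code: the field names `merchant_id`, `order_id`, and `amount_cents`, the SHA-256 derivation, and the helpers `key_v50`, `key_v51`, `charge`, and `simulate` are all hypothetical, and the retry is assumed to carry the same `attempt_id` as the original request. Only the presence or absence of `attempt_id` in the key comes from the scenario.

```python
import hashlib


def key_v50(req: dict) -> str:
    """Old derivation (hypothetical field set): no attempt_id."""
    raw = f"{req['merchant_id']}|{req['order_id']}|{req['amount_cents']}"
    return hashlib.sha256(raw.encode()).hexdigest()


def key_v51(req: dict) -> str:
    """New derivation: same fields plus the newly added attempt_id."""
    raw = (
        f"{req['merchant_id']}|{req['order_id']}|"
        f"{req['amount_cents']}|{req['attempt_id']}"
    )
    return hashlib.sha256(raw.encode()).hexdigest()


def charge(req: dict, derive_key, store: dict, ledger: list) -> str:
    """Charge once per idempotency key; replay the stored result on a hit."""
    key = derive_key(req)
    if key in store:
        return store[key]  # dedup hit: no new charge
    charge_id = f"ch_{len(ledger) + 1}"
    ledger.append(charge_id)  # the charge actually executes here
    store[key] = charge_id
    return charge_id


def simulate(first_pod, retry_pod) -> int:
    """Send one payment and one retry; return how many charges executed."""
    store, ledger = {}, []  # the shared idempotency store and the charge log
    req = {
        "merchant_id": "m_1",
        "order_id": "o_42",
        "amount_cents": 1999,
        "attempt_id": "a_1",  # assumption: the retry reuses this value
    }
    charge(req, first_pod, store, ledger)
    charge(req, retry_pod, store, ledger)
    return len(ledger)


if __name__ == "__main__":
    print("v50 then v50:", simulate(key_v50, key_v50))
    print("v51 then v51:", simulate(key_v51, key_v51))
    print("v50 then v51:", simulate(key_v50, key_v51))
    print("v51 then v50:", simulate(key_v51, key_v50))
```

In this model the same-version pairs dedup correctly and only the mixed pairs charge twice. Mixed pairs can occur only while both versions are serving traffic, which is why the duplicates cluster inside the rollout window and why neither error rate nor p99 moves: every request succeeds from the service's point of view.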

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
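For the "what prevents a repeat" part, one option a candidate might propose is a compatibility release in which the new version reads and writes both key formats while the old version may still be running. The sketch below illustrates that idea under stated assumptions; it is not the expected answer or the service's code. The field names, the `_digest` helper, and the functions `handle_v50` and `handle_v51_compat` are hypothetical, and a plain dict stands in for the shared store (a real store would need an atomic set-if-absent rather than a separate check and write).

```python
import hashlib


def _digest(*parts) -> str:
    """Hash the given parts into a key (hypothetical derivation)."""
    return hashlib.sha256("|".join(str(p) for p in parts).encode()).hexdigest()


def old_key(req: dict) -> str:
    return _digest(req["merchant_id"], req["order_id"], req["amount_cents"])


def new_key(req: dict) -> str:
    return _digest(
        req["merchant_id"], req["order_id"], req["amount_cents"], req["attempt_id"]
    )


def handle_v50(req: dict, store: dict, ledger: list) -> str:
    """Unchanged old version: knows only the old key."""
    key = old_key(req)
    if key in store:
        return store[key]
    charge_id = f"ch_{len(ledger) + 1}"
    ledger.append(charge_id)
    store[key] = charge_id
    return charge_id


def handle_v51_compat(req: dict, store: dict, ledger: list) -> str:
    """Transitional new version: dedup on either key, record under both."""
    keys = (new_key(req), old_key(req))
    for key in keys:
        if key in store:
            return store[key]  # a v50 or v51 pod already charged this payment
    charge_id = f"ch_{len(ledger) + 1}"
    ledger.append(charge_id)
    for key in keys:
        store[key] = charge_id  # a later v50 retry will find the old key
    return charge_id


def charges_for(first, retry) -> int:
    """Run one payment plus one retry through the given handlers."""
    store, ledger = {}, []
    req = {
        "merchant_id": "m_1",
        "order_id": "o_42",
        "amount_cents": 1999,
        "attempt_id": "a_1",
    }
    first(req, store, ledger)
    retry(req, store, ledger)
    return len(ledger)


if __name__ == "__main__":
    print("v50 then compat:", charges_for(handle_v50, handle_v51_compat))
    print("compat then v50:", charges_for(handle_v51_compat, handle_v50))
```

The trade-off in this sketch is that, while the old key is still honored, two requests that differ only in `attempt_id` are treated as the same payment, so whatever behavior `attempt_id` was meant to enable stays deferred until no v50 pods remain and a follow-up release drops the old key. This addresses prevention only; it does not replace stopping the rollout and reconciling the double charges during the incident itself.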

Learn the concepts

Diagram & narrate the incident


Run or narrate your approach, then ask the coach.