On-callHardoc-g618

Subject Transaction deadlock stormLevel Senior–Staff~40 minCommon in Databases & SQL · Concurrency interviewsIndustries Technology

Question

After a deploy this afternoon, the wallet/ledger service starts logging a flood of "deadlock detected" errors — hundreds per minute, up from ~zero. The DB auto-aborts one transaction in each cycle, so users see intermittent "transaction failed, please retry" on transfers, and the retry logic causes a thundering retry storm that makes it worse. The deadlock graph in the logs shows two transactions each updating two `accounts` rows: one does `UPDATE account A then B`, the other `UPDATE account B then A`. The deploy changed the transfer code path. CPU is moderate; throughput is down because so much work is being rolled back. Triage and mitigate.

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.

Learn the concepts

Diagram & narrate the incident

Loading whiteboard…

Run or narrate your approach, then ask the coach.