Code Room
On-call · Hard · oc-g424
Subject: Lock contention · Level: Senior–Staff · ~35 min
Common in: Databases & SQL · Concurrency · Algorithms & data structures interviews
Industries: Technology

Question

A routine migration to add a column ran during a low-traffic window and appeared to hang. Within a minute, the entire `users` table was effectively unavailable: every read and write to it times out, the app error rate on those endpoints has spiked to 100%, and the migration's `ALTER TABLE` still shows as `active`. `pg_locks` shows the ALTER waiting on an `AccessExclusiveLock`, and behind it a long queue of normally fast SELECTs and UPDATEs, all waiting. At the front, one old transaction that has been open for 40 minutes holds an `AccessShareLock` on `users`. Triage and mitigate.

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
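
To see why cancelling the migration is the fastest mitigation, it can help to model the lock queue. The sketch below is a toy model in Python, not PostgreSQL's implementation: the class `TableLockQueue`, the helper `build_incident`, and the session names are hypothetical, and the conflict table covers only the three lock modes in this scenario. It assumes a new request waits if it conflicts with a current holder or with a request already waiting ahead of it, which is what lets one pending `AccessExclusiveLock` stall otherwise compatible reads and writes.

```python
from dataclasses import dataclass, field

# Simplified conflict table for the three modes in this scenario:
# SELECT -> AccessShareLock, UPDATE -> RowExclusiveLock,
# ALTER TABLE -> AccessExclusiveLock (conflicts with everything).
CONFLICTS = {
    "AccessShareLock": {"AccessExclusiveLock"},
    "RowExclusiveLock": {"AccessExclusiveLock"},
    "AccessExclusiveLock": {
        "AccessShareLock",
        "RowExclusiveLock",
        "AccessExclusiveLock",
    },
}


@dataclass
class TableLockQueue:
    """Toy model of one table's lock queue (hypothetical, not PostgreSQL code)."""

    granted: list = field(default_factory=list)  # [(session, mode)]
    waiting: list = field(default_factory=list)  # FIFO of (session, mode)

    def request(self, session: str, mode: str) -> bool:
        """Grant unless the mode conflicts with a holder or an earlier waiter."""
        ahead = self.granted + self.waiting
        if any(other in CONFLICTS[mode] for _, other in ahead):
            self.waiting.append((session, mode))
            return False
        self.granted.append((session, mode))
        return True

    def remove(self, session: str) -> None:
        """Session commits, is cancelled, or is terminated; re-check waiters in order."""
        self.granted = [(s, m) for s, m in self.granted if s != session]
        pending = [(s, m) for s, m in self.waiting if s != session]
        self.waiting = []
        for s, m in pending:
            self.request(s, m)

    def waiters(self) -> list:
        return [s for s, _ in self.waiting]


def build_incident() -> TableLockQueue:
    """Recreate the state described in the question."""
    q = TableLockQueue()
    q.request("old_txn", "AccessShareLock")        # open for 40 minutes
    q.request("migration", "AccessExclusiveLock")  # ALTER TABLE waits on old_txn
    q.request("select_1", "AccessShareLock")       # queues behind the ALTER
    q.request("update_1", "RowExclusiveLock")      # queues behind the ALTER
    return q


if __name__ == "__main__":
    q = build_incident()
    print(q.waiters())     # ['migration', 'select_1', 'update_1']
    q.remove("migration")  # mitigation: cancel the ALTER
    print(q.waiters())     # []
```

In this model, removing `migration` drains the queue immediately while `old_txn` keeps its lock, which matches the "mitigate first" ordering: cancel the ALTER (in PostgreSQL, for example with `pg_cancel_backend` on its pid), then deal with the 40-minute transaction as the root cause. Removing `old_txn` instead lets the ALTER proceed, but the reads and writes stay queued until it finishes. For prevention, one common option is to run migrations with a short `lock_timeout` and retry, so a blocked ALTER gives up instead of holding the queue.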
