Code Room
On-callHardoc-g047
Subject Schema migrationLevel Senior–Staff~40 minCommon in Concurrency interviewsIndustries Technology, Software development

Question

A migration job kicks off at 02:30 to add a NOT NULL column with a default to the 'orders' table (Postgres 11, ~400M rows). At 02:31 writes to orders start timing out, the API returns 503s for anything touching orders, and pg_stat_activity shows a long-running ALTER TABLE holding an AccessExclusiveLock with a growing queue of blocked queries behind it. Replica lag begins climbing. The migration was tested on a staging copy with 1M rows and ran in 2 seconds. Triage and mitigate.

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.

Diagram & narrate the incident
Loading whiteboard…
Run or narrate your approach, then ask the coach.