Question
Postgres 14. A schema migration (`ALTER TABLE accounts ADD COLUMN ...`) was deployed at 16:30 and the deploy 'hung'. Within 2 minutes, p99 latency on every API endpoint that touches `accounts` spiked and `pg_stat_activity` filled with sessions showing `wait_event_type = Lock`, `wait_event = relation`. The ALTER itself is shown as `active` but is waiting on a lock. There is also a long-running analytics `SELECT` on `accounts` that started at 16:25, plus a steady stream of normal app queries. Explain the lock cascade, how you triage with the lock views, the safe mitigation, and how to ship such migrations without an outage.
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
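To make the cascade in the scenario concrete, here is a minimal toy model of a single table's lock queue. It is a sketch under stated assumptions, not Postgres's lock manager: the `LockQueue` class and the session names are hypothetical, and only two lock modes are modelled. The two facts it relies on are that `ACCESS EXCLUSIVE` (taken by `ALTER TABLE ... ADD COLUMN`) conflicts with every other mode, and that a new request queues behind an earlier waiter it conflicts with.

```python
from dataclasses import dataclass, field

# Two-mode subset of the table-level conflict rules (mode names as shown in pg_locks).
# ACCESS SHARE (plain SELECT) conflicts only with ACCESS EXCLUSIVE;
# ACCESS EXCLUSIVE (ALTER TABLE ... ADD COLUMN) conflicts with everything.
CONFLICTS = {
    "AccessShareLock": {"AccessExclusiveLock"},
    "AccessExclusiveLock": {"AccessShareLock", "AccessExclusiveLock"},
}


@dataclass
class LockQueue:
    """Hypothetical toy model of one relation's lock queue."""
    granted: list = field(default_factory=list)  # (session, mode) pairs holding the lock
    waiting: list = field(default_factory=list)  # FIFO of (session, mode) pairs

    def request(self, session, mode):
        # A request must wait if it conflicts with a granted lock OR with a
        # request already waiting ahead of it. The second condition is what
        # turns one blocked ALTER into an outage.
        ahead = self.granted + self.waiting
        if any(held in CONFLICTS[mode] for _, held in ahead):
            self.waiting.append((session, mode))
            return False
        self.granted.append((session, mode))
        return True

    def release(self, session):
        # Drop the session's locks, then grant waiters in order until one conflicts.
        self.granted = [(s, m) for s, m in self.granted if s != session]
        while self.waiting:
            _, mode = self.waiting[0]
            if any(held in CONFLICTS[mode] for _, held in self.granted):
                break
            self.granted.append(self.waiting.pop(0))


def demo():
    q = LockQueue()
    results = [
        q.request("analytics_select", "AccessShareLock"),  # 16:25, granted
        q.request("alter_table", "AccessExclusiveLock"),   # 16:30, waits on analytics
        q.request("app_select_1", "AccessShareLock"),      # waits behind the ALTER
        q.request("app_select_2", "AccessShareLock"),      # waits behind the ALTER
    ]
    return q, results
```

In this model the analytics `SELECT` never blocks the app queries directly. They pile up only because the waiting `ALTER` sits ahead of them in the queue, which is why removing the `ALTER` from the queue (cancelling it) lets the app queries through, while they stay blocked for as long as it keeps waiting.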