Code Room
On-callHard
Question
A routine schema migration that adds a NOT NULL column with a default to a 200M-row orders table on MySQL 8 (InnoDB) was deployed at 16:00. By 16:02 the entire checkout path is timing out. SHOW PROCESSLIST is full of queries in 'Waiting for table metadata lock'. The migration itself shows as running. APM shows write throughput dropped to near zero while reads continue. The on-call DBA says the table is 'too big to lock'. How do you triage and what's your mitigation plan?
What a strong answer looks like
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
Learn the concepts
Loading whiteboard…
Run or narrate your approach, then ask the coach.