Code Room
On-call | Hard | oc-g336
Subject: Migration gone wrong | Level: Senior–Staff | ~40 min | Common in: Distributed systems interviews | Industries: Technology

Question

A schema migration adds a `currency` column (defaulting writes to a per-row value) to an `orders` table across a 16-shard MySQL fleet. The migration runner applies DDL shard by shard. At 03:20 it failed partway: shards 0–9 have the new column, shards 10–15 don't. The runner reported the failure and paged, but the API kept serving. Reads are now inconsistent: orders on migrated shards return a currency, requests for orders on un-migrated shards return HTTP 500 because the new code path expects the column, and a reporting job is summing amounts across currencies as if they were all USD. How do you triage, stabilize, and safely complete the migration?

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
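One way to make "form hypotheses from real signals" and "safely complete" concrete is to audit each shard's actual schema, apply the DDL only where the column is missing, and keep the reporting job from mixing currencies. The sketch below is illustrative only: it uses in-memory SQLite databases as stand-ins for the 16 MySQL shards, and the helper names (`has_column`, `classify`, `complete_migration`, `totals_by_currency`, `make_demo_fleet`), the `amount_minor` column, and the `UNKNOWN` bucket are all assumptions, not part of the original scenario. On MySQL the column check would more likely query `information_schema.COLUMNS`.

```python
import sqlite3
from collections import defaultdict

NEW_COLUMN = "currency"  # column name taken from the question


def has_column(conn, table, column):
    # SQLite stand-in (assumption); a MySQL version would query
    # information_schema.COLUMNS instead of PRAGMA table_info.
    return any(row[1] == column for row in conn.execute(f"PRAGMA table_info({table})"))


def classify(shards):
    """Split shard ids into (migrated, pending) based on the live schema."""
    migrated = [sid for sid in sorted(shards) if has_column(shards[sid], "orders", NEW_COLUMN)]
    pending = [sid for sid in sorted(shards) if sid not in migrated]
    return migrated, pending


def complete_migration(shards):
    """Idempotent resume: skip shards that already have the column."""
    applied = []
    for sid in sorted(shards):
        conn = shards[sid]
        if has_column(conn, "orders", NEW_COLUMN):
            continue  # already migrated, so a re-run is a no-op here
        conn.execute(f"ALTER TABLE orders ADD COLUMN {NEW_COLUMN} TEXT")
        applied.append(sid)
    return applied


def totals_by_currency(rows):
    """Sum (amount_minor, currency) pairs per currency; never assume USD."""
    totals = defaultdict(int)
    for amount_minor, currency in rows:
        # Rows with no currency go to an explicit UNKNOWN bucket (hypothetical label).
        totals[currency or "UNKNOWN"] += amount_minor
    return dict(totals)


def make_demo_fleet():
    """Build 16 in-memory shards where only shards 0-9 have the new column."""
    shards = {}
    for sid in range(16):
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount_minor INTEGER)")
        if sid < 10:
            conn.execute(f"ALTER TABLE orders ADD COLUMN {NEW_COLUMN} TEXT")
        shards[sid] = conn
    return shards
```

Calling `classify(make_demo_fleet())` reproduces the split described in the question, and running `complete_migration` twice shows the second run touches nothing. Checking live schema rather than trusting the runner's own bookkeeping is a deliberate choice here: after a partial failure, the runner's recorded state is one of the things under suspicion.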
