Code Room
On-call · Medium · oc-g128
Subject: Migration gone wrong
Level: Mid–Senior · ~35 min
Common in: Reliability & on-call interviews
Industries: Technology, Software development

Question

A migration added an `is_active BOOLEAN NOT NULL DEFAULT true` column to a 50M-row `subscriptions` table, then ran a backfill to set `is_active=false` for cancelled subscriptions by joining against a `cancellations` table. The migration 'succeeded'. An hour later, billing ran and charged ~12,000 customers who had cancelled. Dashboards show refund and dispute tickets spiking, and the new column shows almost everyone as `is_active=true`. Investigation hint: the backfill query used a `LEFT JOIN` with a `WHERE cancellations.id IS NOT NULL` filter that was negated incorrectly, and it joined on `subscription_id` even though the `cancellations` table uses `customer_id`. How do you triage, stop the wrong charges, and correct the data?
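One way to confirm the dashboard signal is to compare how many rows should be inactive against how many actually are. The sketch below uses Python's built-in `sqlite3` with a tiny in-memory dataset. The schema is an assumption: the question only states that `cancellations` uses `customer_id`, so the `subscriptions.customer_id` column, the sample rows, and the rule that a cancellation applies to every subscription of that customer are all hypothetical.

```python
import sqlite3

# Hypothetical miniature of the two tables; the real schema is not given.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subscriptions (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    is_active BOOLEAN NOT NULL DEFAULT 1  -- state after the failed backfill
);
CREATE TABLE cancellations (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
);
INSERT INTO subscriptions (id, customer_id)
VALUES (1, 101), (2, 102), (3, 103), (4, 104);
INSERT INTO cancellations (id, customer_id)
VALUES (1, 102), (2, 104);
""")

# Rows that SHOULD be inactive: matched on customer_id, the key the
# cancellations table actually uses (assumed: cancels all of a customer's subs).
expected_inactive = conn.execute("""
    SELECT COUNT(*)
    FROM subscriptions s
    WHERE EXISTS (
        SELECT 1 FROM cancellations c WHERE c.customer_id = s.customer_id
    )
""").fetchone()[0]

# Rows the backfill actually flipped.
actual_inactive = conn.execute(
    "SELECT COUNT(*) FROM subscriptions WHERE is_active = 0"
).fetchone()[0]

missed = expected_inactive - actual_inactive
print(f"expected={expected_inactive} actual={actual_inactive} missed={missed}")
```

A large gap between the two counts points at the backfill itself rather than at the billing job, which helps separate root cause from symptom early.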

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
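The mitigate-then-correct order can be sketched in two steps: a billing guard that ignores the untrusted `is_active` column, followed by a corrected backfill that checks its own row count before committing. This is a minimal illustration using Python's built-in `sqlite3`, not a production runbook. The helper names (`make_db`, `billable_ids`, `corrected_backfill`), the `subscriptions.customer_id` column, and the sample rows are assumptions.

```python
import sqlite3

# Assumed rule: a cancellation row cancels every subscription of that customer.
HAS_CANCELLATION = (
    "EXISTS (SELECT 1 FROM cancellations c WHERE c.customer_id = s.customer_id)"
)

def make_db() -> sqlite3.Connection:
    """Build a hypothetical miniature of the post-migration state."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE subscriptions (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        is_active BOOLEAN NOT NULL DEFAULT 1
    );
    CREATE TABLE cancellations (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL
    );
    INSERT INTO subscriptions (id, customer_id)
    VALUES (1, 101), (2, 102), (3, 103), (4, 104);
    INSERT INTO cancellations (id, customer_id)
    VALUES (1, 102), (2, 104);
    """)
    return conn

def billable_ids(conn: sqlite3.Connection) -> list[int]:
    """Mitigation: skip anyone with a cancellation, regardless of is_active."""
    rows = conn.execute(
        f"SELECT s.id FROM subscriptions s WHERE NOT {HAS_CANCELLATION} ORDER BY s.id"
    ).fetchall()
    return [row[0] for row in rows]

def corrected_backfill(conn: sqlite3.Connection) -> int:
    """Flip is_active for cancelled rows; roll back if the count is unexpected."""
    expected = conn.execute(
        f"SELECT COUNT(*) FROM subscriptions s "
        f"WHERE s.is_active = 1 AND {HAS_CANCELLATION}"
    ).fetchone()[0]
    cur = conn.execute(
        f"UPDATE subscriptions AS s SET is_active = 0 "
        f"WHERE s.is_active = 1 AND {HAS_CANCELLATION}"
    )
    if cur.rowcount != expected:
        conn.rollback()
        raise RuntimeError(f"updated {cur.rowcount} rows, expected {expected}")
    conn.commit()
    return cur.rowcount

conn = make_db()
safe_to_bill = billable_ids(conn)   # step 1: stop wrong charges
fixed = corrected_backfill(conn)    # step 2: correct the data
print(safe_to_bill, fixed)
```

On a 50M-row table the update would more plausibly run in bounded batches by primary-key range, with the same count check applied per batch. The single statement here only keeps the sketch short.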
