Question
A migration added a `is_active BOOLEAN NOT NULL DEFAULT true` column to a 50M-row `subscriptions` table and ran a backfill to set `is_active=false` for cancelled subscriptions using a join against a `cancellations` table. The migration 'succeeded'. An hour later, billing runs and charges ~12,000 customers who had cancelled. Dashboards: refund/dispute tickets spike; the new column shows almost everyone as `is_active=true`. Investigation hint: the backfill query used a `LEFT JOIN` and a `WHERE cancellations.id IS NOT NULL` that was negated incorrectly, plus the `cancellations` table uses `customer_id` while the join was on `subscription_id`. How do you triage, stop wrong charges, and correct the data?
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.