Question
A deploy ran a destructive migration: it merged two columns `first_name` + `last_name` into a single `full_name` column and DROPPED the originals, and the new code reads/writes only `full_name`. Twenty minutes post-deploy, a serious bug is found in the new name-handling code. The on-call's instinct is to roll back the deploy. But rolling the *code* back leaves it reading `first_name`/`last_name`, which no longer exist — so a naive rollback would turn a partial outage into a total one. Dashboards: the new code is erroring ~6% on name-render paths; the old columns are gone; a `full_name` column is fully populated. How do you reason about this and what do you actually do?
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.