Question
A change ships in two separate artifacts: a code deploy (new `tax` service that reads a new `rate_table_v2` config) and a config deploy (publishing `rate_table_v2`). They're deployed by different pipelines and the team assumed config would land first, but today the CODE pipeline finished at 09:58 and the CONFIG pipeline at 10:07. Between 09:58 and 10:07, the new code looked for `rate_table_v2`, didn't find it, and fell back to an empty table — so for 9 minutes, ~30% of checkouts computed $0 tax (no error, 200s). Dashboards: error rate flat, latency flat; a `tax_collected` business metric dipped sharply from 09:58 to 10:07 then recovered. Triage, explain the window, then prevent it.
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.