Question
A deploy bumps the feature-flag SDK from v6 to v7 (a 'modernization' chore, no flag changes intended). For 50 minutes everything is normal. Then a slow drift: a paid 'priority_delivery' experiment that was supposed to be 8% on starts behaving as if ~50% of users are in the treatment, driving up courier-assignment cost. Dashboards: `flag_eval{flag="priority_delivery", variation="on"}` rose from 8% toward 50% over the first hour after deploy, tracking the rolling-deploy progress (pod by pod). No error rate change, no latency change. The flag's percentage rollout is unchanged in the dashboard. v7's changelog notes 'improved hashing for more uniform bucketing.' Triage and mitigate.
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.