Question
You're running a 50/50 online A/B between ranking model v-control and v-treatment on the homepage feed. At 11:00 the experiment guardrail dashboard escalates: the treatment arm's downstream 'session revenue per user' is down 9% versus control and the gap is statistically significant and widening over two hours, while top-line CTR on the feed is actually slightly UP in treatment. Both arms return 200s, latency is equal, no errors. Dashboards: treatment surfaces more clickbait-style low-value items high in the ranking; control's revenue-weighted positions are unchanged. How do you triage and decide?
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.