Question
Your fraud-scoring pipeline calls a third-party risk-signals vendor for each transaction. There is no outage: the vendor returns HTTP 200 on every call, latency is normal, your error rate is flat at baseline. But starting around 15:00, your auto-approve rate quietly climbed from 88% to 99% and chargeback-flagged volume is creeping up in the back office. Dashboards: vendor call success 100%, p99 normal, no timeouts, no 5xx. The vendor's status page is all green. The only context: the vendor emailed last week about a 'minor schema update' to their response. How do you triage and mitigate?
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.