On-call | Hard | oc-g289

Subject: Third party down | Level: Senior–Staff | ~40 min | Common in Reliability & on-call interviews | Industries: Technology

Question

Your fraud-scoring pipeline calls a third-party risk-signals vendor for each transaction. There is no outage: the vendor returns HTTP 200 on every call, latency is normal, your error rate is flat at baseline. But starting around 15:00, your auto-approve rate quietly climbed from 88% to 99% and chargeback-flagged volume is creeping up in the back office. Dashboards: vendor call success 100%, p99 normal, no timeouts, no 5xx. The vendor's status page is all green. The only context: the vendor emailed last week about a 'minor schema update' to their response. How do you triage and mitigate?

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
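One hypothesis worth testing early is that the schema update moved or renamed the risk field, and the scoring code silently falls back to a "low risk" default when the field is missing. The sketch below illustrates a fail-closed mitigation under that assumption: any response that does not match the expected shape is routed to manual review instead of being auto-approved. The field name `risk_score`, its 0.0 to 1.0 range, the `approve_below` threshold, and the decision labels are all hypothetical; nothing in the scenario specifies the vendor's real schema.

```python
from dataclasses import dataclass
from typing import Any, Mapping


class VendorSchemaError(ValueError):
    """Raised when a vendor response does not match the expected shape."""


@dataclass(frozen=True)
class RiskSignal:
    score: float  # assumed range: 0.0 (safe) to 1.0 (risky)


def parse_risk_signal(payload: Mapping[str, Any]) -> RiskSignal:
    """Strictly parse the vendor payload; never substitute a default score.

    "risk_score" is a hypothetical field name used for illustration only.
    """
    if "risk_score" not in payload:
        raise VendorSchemaError("missing field: risk_score")
    raw = payload["risk_score"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(raw, bool) or not isinstance(raw, (int, float)):
        raise VendorSchemaError(f"risk_score has type {type(raw).__name__}")
    score = float(raw)
    # Also catches NaN and a silent change of scale (e.g. 0-100).
    if not 0.0 <= score <= 1.0:
        raise VendorSchemaError(f"risk_score out of range: {score}")
    return RiskSignal(score=score)


def decide(payload: Mapping[str, Any], approve_below: float = 0.2) -> str:
    """Return "auto_approve" or "manual_review", failing closed on bad input."""
    try:
        signal = parse_risk_signal(payload)
    except VendorSchemaError:
        # An HTTP 200 with an unexpected body is treated as "no signal".
        return "manual_review"
    return "auto_approve" if signal.score < approve_below else "manual_review"


if __name__ == "__main__":
    print(decide({"risk_score": 0.05}))        # expected shape
    print(decide({"risk": {"score": 0.9}}))    # field moved by a schema change
```

Failing closed trades approval latency and review-queue load for bounded fraud exposure, so a strong answer would also say how the extra manual-review volume is handled while the root cause is confirmed.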


Diagram & narrate the incident
