Code Room
On-call · Medium · oc-g084
Subject: Third party down
Level: Mid–Senior
Time: ~30 min
Common in: Reliability & on-call interviews
Industries: Telecom

Question

At 19:20, new-user signups and 2FA logins start failing at the SMS step. Dashboards show: your SMS provider's API for sending OTP codes returns elevated 500s and high latency for one route (carrier/region specific); the login-completion rate for users requiring SMS 2FA drops sharply; users with app-based authenticator codes are unaffected. The provider's status page notes degraded delivery to certain carriers. Recent changes on your side: none. How do you triage and mitigate?

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
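
One way to make "stop the bleeding first" concrete is a per-route circuit breaker that stops sending through the degraded route and diverts OTP delivery to a fallback channel. The sketch below is illustrative only: `RouteBreaker`, `send_otp`, the `primary` and `fallback` callables, and the window and threshold values are all hypothetical, and it assumes a fallback channel (for example a secondary SMS provider or a voice call) exists, which the scenario does not state.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class RouteBreaker:
    """Tracks recent send outcomes per route; 'open' means the route looks unhealthy."""
    window: int = 20         # recent attempts remembered per route (assumed value)
    threshold: float = 0.5   # error rate at or above which the route is skipped (assumed value)
    min_samples: int = 5     # avoid tripping on a handful of attempts
    _results: dict = field(default_factory=lambda: defaultdict(deque))

    def record(self, route: str, ok: bool) -> None:
        results = self._results[route]
        results.append(ok)
        if len(results) > self.window:
            results.popleft()

    def is_open(self, route: str) -> bool:
        results = self._results[route]
        if len(results) < self.min_samples:
            return False
        failures = sum(1 for ok in results if not ok)
        return failures / len(results) >= self.threshold


def send_otp(route, phone, code, primary, fallback, breaker):
    """Try the primary sender unless the route's breaker is open; otherwise use the fallback."""
    if not breaker.is_open(route):
        try:
            primary(phone, code)
            breaker.record(route, True)
            return "primary"
        except Exception:
            # Count the failure, then fall through to the fallback channel.
            breaker.record(route, False)
    fallback(phone, code)
    return "fallback"
```

Keying the breaker by route keeps healthy carriers on the primary provider while only the degraded route is diverted, which matches the carrier-specific symptom in the question. The sketch omits a half-open probe, so a real version would also need a way to send traffic back to the primary once the provider recovers.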
