Code Room
On-call · Medium · oc-g680
Subject: Networking, dns · Level: Mid–Senior · ~35 min · Common in Networking & APIs interviews · Industries: Technology

Question

Twenty minutes after a planned DNS change repointing api.example.com to a new ingress IP, you get partial outage reports: roughly 40% of users can't reach the API while 60% are fine, and the split doesn't correlate with region. Your new ingress is healthy and serving the 60%. Some clients resolve the old IP (now decommissioned, connections refused) and some the new IP. The old record had a TTL of 3600s. A handful of large enterprise customers behind corporate resolvers are heavily represented in the failures. Triage and stabilize.
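
The timing in the scenario can be made concrete with a little arithmetic. The sketch below assumes resolvers honor the published TTL (some cap or extend it, so treat the result as a lower bound on the stale window, not a guarantee). The function name is illustrative.

```python
def worst_case_stale_seconds(old_ttl_s: int, elapsed_s: int) -> int:
    """Seconds until the last TTL-honoring cache drops the old record.

    A resolver that cached the old answer just before the change keeps it
    for the full TTL, so the worst case is the old TTL minus time elapsed.
    """
    return max(0, old_ttl_s - elapsed_s)


# Figures from the scenario: TTL of 3600 s, reports arrive 20 minutes after the change.
remaining = worst_case_stale_seconds(old_ttl_s=3600, elapsed_s=20 * 60)
print(f"Up to {remaining // 60} more minutes of stale answers")
# prints: Up to 40 more minutes of stale answers
```

So without intervention, affected clients could keep failing for roughly another 40 minutes, and longer behind any resolver that holds records past their TTL.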

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
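
One way to ground a hypothesis in real signals is to record what several resolvers actually return (for example, the answer and remaining TTL shown by `dig @<resolver> api.example.com A`) and group the results. The sketch below is a minimal illustration under stated assumptions: the IP addresses, resolver labels, and TTL values are made up, and `Observation` and `triage` are hypothetical names, not part of any tool.

```python
from dataclasses import dataclass

OLD_IP = "192.0.2.10"     # hypothetical: decommissioned ingress (documentation range)
NEW_IP = "198.51.100.20"  # hypothetical: new ingress (documentation range)


@dataclass
class Observation:
    resolver: str    # label for the resolver that was queried
    answer: str      # A record it returned for api.example.com
    ttl_left_s: int  # remaining TTL reported alongside that answer


def triage(observations):
    """Split resolvers into stale and fresh, and find the longest remaining stale TTL."""
    stale = [o for o in observations if o.answer == OLD_IP]
    fresh = [o for o in observations if o.answer == NEW_IP]
    longest = max((o.ttl_left_s for o in stale), default=0)
    return stale, fresh, longest


# Made-up sample data, not real measurements.
sample = [
    Observation("public-resolver-a", NEW_IP, 280),
    Observation("corp-resolver-x", OLD_IP, 2310),
    Observation("corp-resolver-y", OLD_IP, 1575),
    Observation("public-resolver-b", NEW_IP, 120),
]

stale, fresh, longest = triage(sample)
print("stale:", [o.resolver for o in stale])
print(f"longest remaining stale TTL: {longest} s")
```

If the stale answers cluster on a few resolvers and their remaining TTLs are all below the old 3600 s, that is consistent with ordinary cache expiry rather than a faulty change, and the longest remaining TTL suggests how long any mitigation needs to stay in place.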

Diagram & narrate the incident