Code Room
On-callHard
Question
At 15:40 your public site becomes unreachable for a large fraction of users worldwide, while a smaller fraction load it fine. Dashboards: your origin and CDN edges are healthy and lightly loaded — almost no traffic is arriving at all. Synthetic checks from several regions fail at the DNS-resolution step ('SERVFAIL'). Your authoritative DNS is hosted by a single managed DNS provider, and that provider's status page reports it is mitigating a large DDoS against its anycast network. Recent context: none on your side; no deploy. How do you triage and mitigate?
What a strong answer looks like
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
Learn the concepts
Loading whiteboard…
Run or narrate your approach, then ask the coach.