Question
At 08:00 you added a new internal service, payments-v2.internal, and rolled out clients that call it. For ~20 minutes after the DNS record was created, a large fraction of clients kept failing with NXDOMAIN / 'no such host' for payments-v2.internal, even though the record clearly exists and `dig` from a fresh resolver returns it instantly. The failures slowly tapered off on their own. The dashboards show that the record's positive TTL is short, yet the failures persisted far longer than that TTL. Some clients started working at 08:05, others not until 08:18. How do you triage, and what's the fix?
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
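One hypothesis these signals point to is negative caching. Under RFC 2308, a resolver that received NXDOMAIN for a name may keep serving that answer for the lesser of the zone's SOA TTL and SOA MINIMUM field, regardless of the positive TTL later set on the new record. The minimal sketch below models why recovery would then be staggered rather than simultaneous: each cache counts down from the moment it stored the miss. The SOA values, resolver names, and miss times are hypothetical, chosen only to reproduce the 08:05 / 08:18 pattern.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SOA:
    """The two SOA values that govern negative caching (RFC 2308)."""
    ttl: int      # TTL of the SOA record itself, in seconds
    minimum: int  # SOA MINIMUM field, in seconds


def negative_ttl(soa: SOA) -> int:
    """Seconds a resolver may cache NXDOMAIN: the lesser of SOA TTL and MINIMUM."""
    return min(soa.ttl, soa.minimum)


def recovery_times(first_miss: dict[str, datetime], soa: SOA) -> dict[str, datetime]:
    """Map each cache to the moment its cached NXDOMAIN expires.

    Each cache starts its own countdown when it stored the negative answer,
    so caches that missed at different times recover at different times.
    """
    ttl = timedelta(seconds=negative_ttl(soa))
    return {name: t + ttl for name, t in first_miss.items()}


if __name__ == "__main__":
    day = datetime(2024, 1, 1)  # arbitrary date; only the clock time matters
    soa = SOA(ttl=3600, minimum=900)  # hypothetical zone values: 15 min negative TTL
    misses = {  # hypothetical times each resolver cached the NXDOMAIN
        "resolver-a": day.replace(hour=7, minute=50),
        "resolver-b": day.replace(hour=8, minute=3),
    }
    for name, t in sorted(recovery_times(misses, soa).items()):
        print(name, t.strftime("%H:%M"))
```

To check the real value rather than guess, the SOA returned in the authority section of an NXDOMAIN response (for example via `dig +noall +authority` for the failing name) typically carries the negative-caching TTL. In this model, a miss cached after 08:00, like the hypothetical resolver-b, would mean some resolver still got NXDOMAIN after the record was created, which is worth chasing separately (for example, an authoritative server that had not yet loaded the new zone serial).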