Question
A blue-green cutover flips 100% to green at 09:00 for a service that calls a third-party geocoding API. Green is healthy and warmed in every internal sense. But at 09:00 the third-party API starts returning HTTP 429 to green, and ~10% of requests fail for 20 minutes before recovering. Context: the third-party rate-limits per source IP, and blue and green run in different subnets / behind different NAT egress IPs. Blue's egress IP had built up a higher allowed-burst allowance with the vendor over time; green's brand-new egress IP starts cold with the default lower limit. Dashboards: green's 429s spike at 09:00 then taper as the vendor's adaptive limit ramps up for the new IP; internal metrics all green. Triage, explain, and prevent.
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.