Question
A push notification drives a 5x surge to your mobile home-screen API. Your home-screen service autoscales fine and stays healthy, but a shared downstream `user-profile` service — which it calls 8 times per home-screen request to hydrate widgets — saturates: profile p99 explodes, its DB hits connection limits, and home-screen requests start failing on the profile dependency. Profile is used by many teams and can't scale instantly. Dashboards show home-screen traffic up 5x but profile traffic up ~40x. How do you triage and mitigate without taking down profile for everyone?
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
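As one illustration of a "stop the bleeding" step, here is a minimal sketch of two caller-side mitigations in the home-screen service: deduplicating profile lookups within a single home-screen request, and capping in-flight calls to profile so that excess load is shed to a degraded fallback instead of queuing. The `ProfileClient` class, the `fetch` callable, the `max_in_flight` parameter, and the `fallback` value are all hypothetical names introduced for this example; nothing here is taken from a real profile API, and the right limits would have to come from what profile can actually sustain.

```python
import threading


class ProfileClient:
    """Hypothetical caller-side wrapper around the shared profile service."""

    def __init__(self, fetch, max_in_flight):
        # fetch: assumed callable that performs the real profile lookup.
        self._fetch = fetch
        # Cap on concurrent calls this process may have open against profile.
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def get(self, user_id, request_cache, fallback=None):
        # request_cache is a dict scoped to ONE home-screen request, so the
        # 8 widget hydrations for the same user share a single lookup.
        if user_id in request_cache:
            return request_cache[user_id]

        # Fail fast when the cap is reached: return a degraded value rather
        # than adding more load to an already saturated dependency.
        if not self._slots.acquire(blocking=False):
            return fallback

        try:
            profile = self._fetch(user_id)
        finally:
            self._slots.release()

        request_cache[user_id] = profile
        return profile


if __name__ == "__main__":
    calls = []

    def fake_fetch(user_id):
        # Stand-in for the network call; records how often it is invoked.
        calls.append(user_id)
        return {"id": user_id}

    client = ProfileClient(fake_fetch, max_in_flight=50)
    cache = {}  # one dict per home-screen request
    widgets = [client.get("u1", cache) for _ in range(8)]
    print(len(widgets), "widgets hydrated with", len(calls), "profile call(s)")
```

The design choice is that both controls live in the calling service, so they can be rolled out behind a flag without any change to profile itself. Fallback results are deliberately not cached, so a later widget in the same request can still try again once a slot frees up.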