Question
Your recommendations service runs on a Kubernetes HPA targeting 65% CPU, min 8 / max 40 pods. Since this morning the replica count oscillates wildly — scaling 8→36→8→34 every few minutes — and p99 latency has a sawtooth pattern that tracks the swings. Each scale-down evicts warm pods, and the surviving pods briefly spike to 95% CPU before new ones become ready (your readiness probe gates on a 40s warm-cache load). Dashboards show average CPU hovering right at the 65% target and the HPA event log is full of alternating SuccessfulRescale up/down events. Traffic is flat versus last week. Walk through your triage and how you'd stop the flapping.
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.