Question
Your service runs with an Envoy sidecar per pod and an HPA that scales on the APP container's CPU. The app itself is comfortably under its CPU target and the HPA isn't scaling up. But during short traffic bursts, end-to-end p99 spikes by 100-300 ms for a few seconds, then settles. The app container's own latency and CPU look fine during the bursts; the added time is in the sidecar hop. The sidecar container has a CPU limit of 500m, and `container_cpu_cfs_throttled_periods_total` on the SIDECAR container is high and lines up exactly with the latency spikes. App-container throttling is zero. How do you triage and mitigate?
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
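
To turn the throttling signal in the question into numbers, here is a rough sketch of the CFS arithmetic. It assumes the default 100 ms CFS period, under which a 500m limit gives the sidecar 50 ms of CPU time per period. The function names, the busy-thread count, and the counter samples are hypothetical, chosen only for illustration. The stall estimate is a simplified upper bound for one period, not a model of Envoy's actual scheduling.

```python
# Simplified model of CFS quota throttling for one container.
# Assumption: default CFS period of 100 ms (cpu.cfs_period_us = 100000).
CFS_PERIOD_MS = 100.0


def quota_ms(limit_millicores, period_ms=CFS_PERIOD_MS):
    """CPU time the container may consume in one CFS period."""
    return limit_millicores / 1000.0 * period_ms


def worst_case_stall_ms(limit_millicores, busy_threads, period_ms=CFS_PERIOD_MS):
    """Upper bound on how long the container sits throttled in one period.

    With `busy_threads` threads all runnable, the quota is burned in
    quota / busy_threads of wall time; the rest of the period is a stall.
    """
    wall_to_exhaust = quota_ms(limit_millicores, period_ms) / busy_threads
    return max(0.0, period_ms - wall_to_exhaust)


def throttle_ratio(periods_before, throttled_before, periods_after, throttled_after):
    """Fraction of CFS periods in a window in which the container was throttled.

    Inputs are two samples of the cumulative counters (for example the
    cAdvisor periods and throttled-periods counters for the sidecar).
    """
    delta_periods = periods_after - periods_before
    if delta_periods <= 0:
        return 0.0
    return (throttled_after - throttled_before) / delta_periods
```

For example, `quota_ms(500)` is 50.0, and `worst_case_stall_ms(500, busy_threads=4)` is 87.5: with four hypothetical worker threads busy during a burst, the 50 ms quota is gone after 12.5 ms of wall time and the proxy can be paused for the rest of the period. A few consecutive throttled periods would be in the same range as the 100-300 ms p99 bump. A `throttle_ratio` near zero outside bursts and high during them would support the sidecar limit, rather than the app, as the bottleneck.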