On-call | Hard | oc-g453

Subject: Latency spikes | Level: Senior–Staff | ~40 min | Common in: Reliability & on-call interviews | Industries: Technology, Software development

Question

Your service runs with an Envoy sidecar per pod and an HPA that scales on the APP container's CPU. The app itself is comfortably under its CPU target, so the HPA isn't scaling up. But during short traffic bursts, end-to-end p99 spikes by 100-300 ms for a few seconds, then settles. The app container's own latency and CPU look fine during the bursts; the added time is in the sidecar hop. The sidecar container has a CPU limit of 500m, and `container_cpu_cfs_throttled_periods_total` on the SIDECAR container is high and lines up exactly with the latency spikes. App-container throttling is zero. How do you triage and mitigate?
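To see why a sidecar can be throttled even when average CPU looks low, it may help to model the CFS quota directly. The sketch below is a simplified illustration, not a kernel-accurate simulation. It assumes the default 100 ms CFS period (so a 500m limit is 50 ms of CPU time per period), a burst that arrives at the start of a period, and two proxy worker threads. The function name and all parameters are hypothetical.

```python
def added_delay_ms(burst_cpu_ms, limit_millicores=500, period_ms=100, threads=2):
    """Estimate extra wall-clock time a CPU burst incurs under a CFS quota.

    Assumptions (illustrative only): the burst arrives at the start of a
    CFS period, the threads run fully in parallel, and once the quota is
    spent the container is stalled until the next period begins.
    Returns (extra_wall_ms, throttled_periods).
    """
    quota_ms = period_ms * limit_millicores / 1000  # CPU-ms allowed per period
    remaining = float(burst_cpu_ms)
    wall = 0.0
    throttled_periods = 0
    while remaining > 0:
        run = min(remaining, quota_ms)
        remaining -= run
        if remaining > 0:
            # Quota exhausted with work still pending: wait out the period.
            wall += period_ms
            throttled_periods += 1
        else:
            # Final slice fits in the quota and runs at full parallelism.
            wall += run / threads
    unthrottled = burst_cpu_ms / threads  # wall time with no CPU limit
    return wall - unthrottled, throttled_periods


if __name__ == "__main__":
    extra, throttled = added_delay_ms(120)
    print(f"extra wall time: {extra} ms, throttled periods: {throttled}")
```

Under these assumptions, a burst needing 120 ms of CPU time would take 60 ms unthrottled but about 210 ms with the 500m limit. That is roughly 150 ms of added latency across two throttled periods, which falls in the range the question describes.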

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
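As one example of forming a hypothesis from real signals, the fraction of throttled CFS periods over a window can be computed from two samples of the container's counters. cAdvisor exposes these as `container_cpu_cfs_periods_total` and `container_cpu_cfs_throttled_periods_total`. The sketch below is a minimal illustration. The `CfsSample` type and the function name are hypothetical, and how the samples are collected is left out.

```python
from dataclasses import dataclass


@dataclass
class CfsSample:
    """One reading of a container's cumulative CFS counters (hypothetical type)."""
    periods: int            # e.g. container_cpu_cfs_periods_total
    throttled_periods: int  # e.g. container_cpu_cfs_throttled_periods_total


def throttle_ratio(prev: CfsSample, curr: CfsSample) -> float:
    """Fraction of CFS periods in the window during which the container was throttled.

    Returns 0.0 when no periods elapsed or the counters reset (for example
    after a container restart), since the delta is then not meaningful.
    """
    d_periods = curr.periods - prev.periods
    if d_periods <= 0:
        return 0.0
    d_throttled = curr.throttled_periods - prev.throttled_periods
    return d_throttled / d_periods


if __name__ == "__main__":
    before = CfsSample(periods=1000, throttled_periods=10)
    after = CfsSample(periods=1100, throttled_periods=55)
    print(f"sidecar throttle ratio: {throttle_ratio(before, after):.2f}")
```

Comparing this ratio for the sidecar and the app container over the same window is one way to separate symptom from cause. A high ratio on the sidecar with zero on the app points at the sidecar's limit rather than at the application or the HPA target.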
