On-call · Medium · oc-g174

Subject: Backlog buildup · Level: Mid–Senior · ~30 min · Common in: Distributed systems interviews · Industries: Technology, Software development

Question

A Pub/Sub *pull* subscription feeds a fleet of worker pods (on GKE) that do CPU-heavy ML inference. Starting at 16:00, `num_undelivered_messages` climbs from ~5k to 400k over 30 minutes and the oldest unacked message age keeps growing. The worker pods are pinned at ~100% CPU, but the pod count is *not* increasing. The HPA (horizontal pod autoscaler) scales on CPU, and its max replica count has already been reached. `ack_message_count` is steady: workers are acking what they process, just not fast enough to keep up. Recent context: an upstream feature launch roughly doubled inference request volume today. Triage and mitigate.

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
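
One way to turn the signals in the question into a hypothesis is a quick capacity estimate. The sketch below uses only the backlog numbers from the question (5k to 400k in 30 minutes). Everything else is an assumption made for illustration: that volume exactly doubled, that the fleet was just keeping pace before the launch, a hypothetical current pod count of 20 (the question does not state the HPA max), and a hypothetical one-hour drain target.

```python
import math


def backlog_growth_rate(start_backlog, end_backlog, window_seconds):
    """Net messages per second added to the backlog over the window."""
    return (end_backlog - start_backlog) / window_seconds


def replicas_needed(publish_rate, per_pod_rate, backlog, drain_seconds):
    """Pods needed to absorb incoming traffic and drain `backlog` in `drain_seconds`."""
    required_rate = publish_rate + backlog / drain_seconds
    # The small epsilon keeps float rounding from bumping an exact ratio up by one pod.
    return math.ceil(required_rate / per_pod_rate - 1e-9)


# From the question: ~5k -> 400k undelivered messages over 30 minutes.
growth = backlog_growth_rate(5_000, 400_000, 30 * 60)

# Assumption: traffic exactly doubled and the fleet was just keeping up before,
# so the steady ack rate is roughly equal to the deficit.
ack_rate = growth
publish_rate = 2 * ack_rate

# Hypothetical: 20 pods at the HPA max (not stated in the question).
current_pods = 20
per_pod_rate = ack_rate / current_pods

# Pods needed just to stop the backlog from growing.
hold_steady = replicas_needed(publish_rate, per_pod_rate, backlog=0, drain_seconds=1)

# Pods needed to also drain the 400k backlog within a hypothetical one-hour target.
drain_in_hour = replicas_needed(
    publish_rate, per_pod_rate, backlog=400_000, drain_seconds=3600
)

print(f"backlog growth: {growth:.0f} msg/s")
print(f"pods to hold steady: {hold_steady}, pods to drain in 1h: {drain_in_hour}")
```

Under these assumptions, holding the backlog steady takes about twice the current pod count, and draining it takes more. That points at the HPA max replica ceiling as the immediate constraint, provided the cluster has node capacity to schedule the extra pods.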
