Code Room
On-call · Medium · oc-g109
Subject: Cold start · Level: Mid–Senior · ~35 min · Common in: Reliability & on-call interviews · Industries: Technology

Question

Your service runs on a platform that scales to zero when idle (Knative/Cloud Run style). A monitoring synthetic and the first real users after an idle period get 10–20 second responses or timeouts, while warm responses are ~150ms. The platform logs show the request waiting on a new pod being created: a large container image is pulled, the runtime starts, and only then does the app boot and connect to its dependencies. Traffic is spiky — long idle gaps then sudden bursts. How do you triage and cut the first-request latency?

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
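
One way to "form hypotheses from real signals" here is to split the cold start into the phases named in the question and rank them by how much of the total they cost. The sketch below is a minimal illustration, not a prescribed method: the `Phase` type, the `rank_phases` helper, and every timing in `cold_start` are hypothetical placeholders, and real numbers would come from the platform's pod events and the app's own startup logs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """One step of a cold start and how long it took, in seconds."""
    name: str
    seconds: float


def rank_phases(phases):
    """Return (name, seconds, share_of_total) tuples, slowest phase first."""
    total = sum(p.seconds for p in phases)
    if total == 0:
        return []
    ranked = sorted(phases, key=lambda p: p.seconds, reverse=True)
    return [(p.name, p.seconds, p.seconds / total) for p in ranked]


# Hypothetical timings for illustration only; they are not measurements.
cold_start = [
    Phase("schedule pod", 1.0),
    Phase("pull image", 9.0),
    Phase("start runtime", 2.0),
    Phase("app boot", 2.5),
    Phase("connect dependencies", 1.5),
]

if __name__ == "__main__":
    for name, seconds, share in rank_phases(cold_start):
        print(f"{name:<22}{seconds:>5.1f}s {share:>7.1%}")
```

With a breakdown like this, the largest phase decides where to look first: an image pull that dominates points at image size or caching, while a slow app boot or dependency connection points at the application's startup path. Keeping a warm instance addresses the symptom in either case, so it is a mitigation rather than the root-cause fix.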
