Code Room
On-callMedium
Question
During a rolling deploy of your JVM service, every freshly deployed instance shows a 60–90 second window of high p99 latency and elevated errors right after it starts taking traffic, then settles to normal. The load balancer adds each new instance to rotation as soon as its `/health` returns 200, which happens within 2 seconds of process start. Heap and GC look fine after warmup; the spike is only at the start of each instance's life. Caches are empty and the JIT hasn't compiled hot paths yet. How do you triage and stop deploys from causing latency spikes?
What a strong answer looks like
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
Learn the concepts
Loading whiteboard…
Run or narrate your approach, then ask the coach.