Code Room
On-callHardoc-g414
Subject Cpu saturationLevel Senior–Staff~40 minCommon in Concurrency interviewsIndustries Technology, Software development

Question

A Go in-memory rate-limiter service's CPU jumps from 25% to 95% at 13:30 and p99 doubles, yet *useful* throughput (allowed+denied decisions/sec) actually drops. `perf`/pprof shows most on-CPU time in runtime mutex/lock and futex, not in the business logic, and the OS shows a huge spike in context switches and run-queue length. A deploy at 13:25 added a single global `sync.Mutex` around a counter map that every request now contends on; before, sharding kept contention low. Request rate is only up ~5%. Triage and mitigate.

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.

Diagram & narrate the incident
Loading whiteboard…
Run or narrate your approach, then ask the coach.