Code Room
On-call · Hard
Question
A low-latency market-data fan-out service regressed after a refactor that replaced a single global counter with a small array of per-shard counters "to reduce contention." Counterintuitively, throughput dropped ~30% and p99 latency got worse at high core counts, even though each thread now writes only to its own array slot and there is no lock. CPU utilization is high but instructions per cycle (IPC) dropped sharply; perf counters show a large rise in L2/L3 cache-coherence traffic and HITM events (loads that hit a cache line held in modified state by another core). No allocation, no GC, no lock contention. How do you triage and fix?
What a strong answer looks like
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
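The symptoms in the question (low IPC, rising coherence traffic, HITM events, adjacent per-thread slots in a small array) are consistent with false sharing: several counters sit on the same cache line, so each write invalidates that line in the other cores' caches. One possible fix, sketched below, is to give every shard its own cache line. The names `PaddedCounter`, `ShardedCounter`, `run_demo`, and the 64-byte `kCacheLine` value are illustrative assumptions, not taken from the service in question; check the actual line size of the target CPU, and note that the over-aligned `std::vector` storage relies on C++17 aligned allocation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumed cache-line size: 64 bytes is common on x86-64, but verify for your hardware.
constexpr std::size_t kCacheLine = 64;

// alignas forces each counter onto its own cache line; sizeof is rounded up
// to the alignment, so neighbouring array elements never share a line.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per cache line");
static_assert(alignof(PaddedCounter) == kCacheLine, "counter starts on a line boundary");

// Hypothetical replacement for the per-shard counter array from the refactor.
class ShardedCounter {
public:
    explicit ShardedCounter(std::size_t shards) : slots_(shards) {}

    // Each thread is expected to pass its own shard index.
    void add(std::size_t shard, std::uint64_t n) {
        assert(shard < slots_.size());
        slots_[shard].value.fetch_add(n, std::memory_order_relaxed);
    }

    // Sums all shards; the result is exact only once writers have stopped.
    std::uint64_t total() const {
        std::uint64_t sum = 0;
        for (const auto& slot : slots_) {
            sum += slot.value.load(std::memory_order_relaxed);
        }
        return sum;
    }

private:
    std::vector<PaddedCounter> slots_;  // C++17: allocator honours the 64-byte alignment
};

// Small driver: one writer thread per shard, each adding 1 `increments` times.
std::uint64_t run_demo(std::size_t threads, std::uint64_t increments) {
    ShardedCounter counter(threads);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, t, increments] {
            for (std::uint64_t i = 0; i < increments; ++i) {
                counter.add(t, 1);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter.total();
}
```

The padding trades memory (64 bytes per shard instead of 8) for the absence of cross-core line bouncing. To confirm the diagnosis before and after the change, compare the HITM counts on the counter array's addresses rather than relying on throughput alone.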