On-call | Hard | oc-g642

Subject: False sharing / cache contention
Level: Senior–Staff
Time: ~45 min
Common in: Reliability & on-call interviews
Industries: Technology, Software development

Question

A C++ high-frequency packet-counter service regressed after a 'cache-friendly' refactor that packed per-thread counters into a tight contiguous array (`counters[thread_id]++`, no locks, each thread touches only its own index). On a 32-core box, throughput is now ~40% LOWER than before, and it gets worse as you add threads, even though there are zero locks and zero data races (each thread writes a different element). `perf` shows a huge spike in L2/L3 cache misses and in HITM events (loads that hit a cache line held Modified in another core's cache, i.e. cache-line transfers between cores), and the CPUs are busy but doing little useful work. Triage the regression and explain why a lock-free design got slower.
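To make the arithmetic concrete, here is a minimal sketch of the regressed layout. It assumes 64-byte cache lines (typical on x86-64, but not stated in the question) and 8-byte `uint64_t` counters; `kCacheLine`, `line_of`, `threads_per_line`, and `on_packet` are hypothetical names for illustration. With 32 threads the whole array is only 256 bytes, so each 64-byte line holds eight threads' counters, and every increment needs that line in exclusive ownership, pulling it away from the other seven cores that write it.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, typical on x86-64. Verify on the target box,
// e.g. with `getconf LEVEL1_DCACHE_LINESIZE` on Linux.
constexpr std::size_t kCacheLine = 64;
constexpr int kThreads = 32;

// The regressed "cache-friendly" layout: one 8-byte counter per thread, packed
// back to back, so the whole array spans 32 * 8 = 256 bytes = 4 cache lines.
// alignas makes the array start on a line boundary so line_of() is exact.
alignas(kCacheLine) std::uint64_t counters[kThreads];

// Which cache line (counted from the start of the array) holds counters[tid].
constexpr std::size_t line_of(int tid) {
    return (static_cast<std::size_t>(tid) * sizeof(std::uint64_t)) / kCacheLine;
}

// How many threads' counters share one cache line in this layout.
constexpr std::size_t threads_per_line = kCacheLine / sizeof(std::uint64_t);

// Hot path: no lock and no data race, yet each increment is a read-modify-write
// on a line that up to seven other cores are also writing. The line keeps moving
// between their caches, and those cross-core transfers are what perf counts as HITM.
inline void on_packet(int tid) { counters[tid]++; }
```

Here `line_of(7)` is 0 and `line_of(8)` is 1, so threads 0 through 7 contend for one line, threads 8 through 15 for the next, and so on. Up to eight threads, every thread you add is one more writer fighting over the same line, which is consistent with the "worse with more threads" symptom.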

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
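After the immediate mitigation (for example, rolling back the refactor), one common way to close the incident is to give each counter its own cache line and make that layout a compile-time invariant, so a later "packing" refactor fails the build instead of shipping. The sketch below assumes 64-byte cache lines; `PaddedCounter`, `on_packet`, and `total` are hypothetical names. C++17 also defines `std::hardware_destructive_interference_size` in `<new>` for this purpose, but standard-library support varies, so the sketch hard-codes the value.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines (typical on x86-64).
constexpr std::size_t kCacheLine = 64;
constexpr int kThreads = 32;

// alignas rounds both the alignment and the size of the struct up to one full
// cache line, so two threads' counters can never land on the same line.
struct alignas(kCacheLine) PaddedCounter {
    std::uint64_t value = 0;
};

// Regression guards: the build breaks if someone packs the counters again.
static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per cache line");
static_assert(alignof(PaddedCounter) == kCacheLine, "counters start on a line boundary");

PaddedCounter counters[kThreads];

// Hot path: each thread now writes a line no other thread writes,
// so increments stay in the local core's cache.
inline void on_packet(int tid) { counters[tid].value++; }

// Reporting path: sum the per-thread counters.
// Plain fields are only safe if totals are read after the writers stop; if a
// reporter thread calls this while writers run, make `value` a
// std::atomic<std::uint64_t> and use memory_order_relaxed loads and stores.
inline std::uint64_t total() {
    std::uint64_t sum = 0;
    for (const auto& c : counters) sum += c.value;
    return sum;
}
```

The trade-off is memory: 32 counters now occupy 2 KiB instead of 256 bytes, which is negligible next to a cross-core line transfer on every increment. Some codebases pad to 128 bytes to also avoid adjacent-line prefetching effects on some Intel CPUs; measure on the target hardware before choosing.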
