Code Room
On-callMedium
Question
A Node.js notification service starts timing out health checks and dropping WebSocket connections under moderate load. Dashboards: event-loop lag (measured via `perf_hooks`) spikes from <5ms to 400ms+ and stays there; CPU on the single Node process is pinned at 100% on one core while the box has 7 idle cores; the async task queue (in-process) grows without bound. A feature shipped yesterday generates a personalized digest by doing synchronous JSON parsing + template rendering + a `bcrypt`-style hashing step inline in the request handler. Triage and remediate.
What a strong answer looks like
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
Learn the concepts
Loading whiteboard…
Run or narrate your approach, then ask the coach.