Code Room
On-call · Hard
Question
A reactive Java service (Project Reactor / Netty) that normally handles 50k RPS on a handful of event-loop threads suddenly handles almost nothing: latency climbs to seconds and throughput collapses, yet CPU is near idle (15%). Thread dumps show every one of the few event-loop threads stalled in blocking code, inside a synchronous JDBC call and a `.block()` added in last night's deploy to 'quickly reuse' a legacy DAO. Nothing changed upstream or downstream. How do you triage and mitigate?
What a strong answer looks like
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
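For this scenario, rolling back last night's deploy is usually the fastest way to stop the bleeding. If the legacy DAO has to stay, the forward fix is to move the blocking call off the event loop. The sketch below illustrates that idea with plain JDK executors rather than Reactor itself: `legacyDaoLookup`, `lookupAsync`, `namedPool`, the pool sizes, and the 50 ms delay are all hypothetical stand-ins, not names from the service in question. In Reactor the same pattern is typically written as `Mono.fromCallable(...).subscribeOn(Schedulers.boundedElastic())`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class OffloadBlockingCall {

    // Hypothetical stand-in for the legacy DAO: sleeps to mimic a synchronous
    // JDBC round trip and reports which thread paid for the wait.
    static String legacyDaoLookup(String id) {
        try {
            Thread.sleep(50); // assumed latency, for illustration only
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return id + " loaded on " + Thread.currentThread().getName();
    }

    // Runs the blocking lookup on a dedicated pool, so the caller (the event
    // loop in the real service) gets a future back without waiting.
    static CompletableFuture<String> lookupAsync(String id, ExecutorService blockingPool) {
        return CompletableFuture.supplyAsync(() -> legacyDaoLookup(id), blockingPool);
    }

    // Fixed-size pool of daemon threads named "<prefix>-1", "<prefix>-2", ...
    static ExecutorService namedPool(int size, String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return Executors.newFixedThreadPool(size, runnable -> {
            Thread t = new Thread(runnable, prefix + "-" + counter.incrementAndGet());
            t.setDaemon(true);
            return t;
        });
    }

    public static void main(String[] args) throws Exception {
        ExecutorService eventLoop = namedPool(1, "event-loop"); // simulated event loop
        ExecutorService jdbcPool = namedPool(8, "jdbc-pool");   // bounded pool for blocking I/O

        // The event-loop thread only schedules the lookup and returns at once;
        // the wait itself happens on a jdbc-pool thread.
        CompletableFuture<String> result = CompletableFuture
                .supplyAsync(() -> lookupAsync("order-42", jdbcPool), eventLoop)
                .thenCompose(future -> future);

        System.out.println(result.get(2, TimeUnit.SECONDS));

        eventLoop.shutdown();
        jdbcPool.shutdown();
    }
}
```

The printed thread name shows the wait landing on the dedicated pool instead of the event loop. Keeping that pool bounded matters: it caps how many JDBC connections the offloaded work can hold at once, so the fix does not simply move the saturation to the database.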