Question
A Java market-data fan-out service freezes completely about once a day: it stops publishing updates, accepts no new work, and only a restart recovers it. There's no crash and no OOM. When it freezes, CPU drops to ~0% and stays there (it is not pegged). A thread dump taken during the freeze shows two threads, each holding one of two locks (a subscription-registry lock and a publisher lock) and waiting for the other, and the JVM's dump reports 'Found one Java-level deadlock'. The deadlock first appeared after a refactor that lets a publish callback re-enter and modify the subscription registry. Triage and design the fix.
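A common way to remove this kind of lock cycle is to never invoke callbacks while holding a lock. Below is a minimal sketch that assumes one registry lock and one publisher lock, as the question describes. `SubscriptionRegistry`, `Publisher`, `oneShot` and `stress` are hypothetical names, not the service's real classes. The publisher copies the subscriber list under the registry lock, releases it, and then calls subscribers with no lock held. A callback that subscribes or unsubscribes therefore takes at most one lock at a time and cannot close a cycle. `List.copyOf` requires Java 10 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class FanOutSketch {

    /** Hypothetical registry: every read and write goes through one private lock. */
    public static final class SubscriptionRegistry {
        private final Object lock = new Object();
        private final List<Consumer<String>> subscribers = new ArrayList<>();

        public void subscribe(Consumer<String> s) { synchronized (lock) { subscribers.add(s); } }
        public void unsubscribe(Consumer<String> s) { synchronized (lock) { subscribers.remove(s); } }
        public int size() { synchronized (lock) { return subscribers.size(); } }

        /** Copies under the lock; callers iterate the immutable copy with no lock held. */
        public List<Consumer<String>> snapshot() { synchronized (lock) { return List.copyOf(subscribers); } }
    }

    /** Hypothetical publisher: its lock guards only its own state and is never held during callbacks. */
    public static final class Publisher {
        private final Object lock = new Object();
        private final SubscriptionRegistry registry;
        private long sequence;

        public Publisher(SubscriptionRegistry registry) { this.registry = registry; }

        public void publish(String update) {
            long seq;
            synchronized (lock) { seq = ++sequence; } // short critical section, no foreign calls inside
            // Neither lock is held here, so a callback that re-enters the registry cannot form a cycle.
            for (Consumer<String> s : registry.snapshot()) {
                s.accept(seq + ":" + update);
            }
        }
    }

    /** A subscriber that removes itself from inside its own callback (the re-entrant case). */
    public static Consumer<String> oneShot(SubscriptionRegistry registry, AtomicInteger counter) {
        return new Consumer<String>() {
            @Override
            public void accept(String msg) {
                counter.incrementAndGet();
                registry.unsubscribe(this); // re-enters the registry while publish() is iterating
            }
        };
    }

    /** One thread publishes while another adds one-shot subscribers; returns {delivered, remaining}. */
    public static int[] stress(int iterations) {
        SubscriptionRegistry registry = new SubscriptionRegistry();
        Publisher publisher = new Publisher(registry);
        AtomicInteger delivered = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> { for (int i = 0; i < iterations; i++) publisher.publish("tick " + i); });
        pool.submit(() -> { for (int i = 0; i < iterations; i++) registry.subscribe(oneShot(registry, delivered)); });
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish; possible deadlock");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        publisher.publish("final"); // reaches any one-shot that subscribed after the last tick
        return new int[] { delivered.get(), registry.size() };
    }

    public static void main(String[] args) {
        int[] result = stress(10_000);
        System.out.println("delivered=" + result[0] + " remaining=" + result[1]);
    }
}
```

Running `main` should print `delivered=10000 remaining=0`. Each one-shot subscriber appears in exactly one snapshot, because the single publisher thread removes it before taking the next snapshot. The design trades one list copy per publish for lock-free callbacks. When subscriptions change rarely compared with publishes, `java.util.concurrent.CopyOnWriteArrayList` gives the same snapshot-iteration semantics without the explicit copy.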
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
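For the mitigation and "real signals" steps, the JVM can detect this class of freeze by itself. `ThreadMXBean.findDeadlockedThreads()` returns the IDs of threads caught in a cycle on monitors or ownable synchronizers, or `null` if there is none. Below is a sketch of how a watchdog might turn that into a readable report. `DeadlockProbe`, `findDeadlocks`, `awaitDeadlock` and `startDeliberateDeadlock` are hypothetical names, and the deliberate deadlock exists only to exercise the probe. A monitor deadlock cannot be broken from inside the process. In production, a watchdog like this would more plausibly raise an alert, capture a thread dump, and fail the health check so the orchestrator restarts the instance instead of waiting for someone to notice. The method is intended for troubleshooting, so a slow polling interval seems prudent.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class DeadlockProbe {

    /** One line per deadlocked thread ("name waits on lock held by owner"), or an empty list. */
    public static List<String> findDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no cycle exists
        List<String> report = new ArrayList<>();
        if (ids == null) {
            return report;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) { // a thread may have exited between the two calls
                report.add(info.getThreadName() + " waits on " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return report;
    }

    /** Polls until a deadlock appears or the timeout passes; a real watchdog would run on a timer. */
    public static List<String> awaitDeadlock(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        List<String> found = findDeadlocks();
        while (found.isEmpty() && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            found = findDeadlocks();
        }
        return found;
    }

    /** Demo only: two daemon threads that each take one monitor and then wait for the other's. */
    public static void startDeliberateDeadlock() {
        Object registryLock = new Object();  // stands in for the subscription-registry lock
        Object publisherLock = new Object(); // stands in for the publisher lock
        CountDownLatch bothHoldOne = new CountDownLatch(2);
        startDaemon("publisher", publisherLock, registryLock, bothHoldOne);
        startDaemon("registry", registryLock, publisherLock, bothHoldOne);
    }

    private static void startDaemon(String name, Object first, Object second, CountDownLatch latch) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                latch.countDown();
                try {
                    latch.await(); // both threads now hold their first lock
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (second) {
                    // never reached: the other thread holds `second` and waits for `first`
                }
            }
        }, name);
        t.setDaemon(true); // lets the JVM exit even though these threads never finish
        t.start();
    }

    public static void main(String[] args) {
        startDeliberateDeadlock();
        for (String line : awaitDeadlock(5_000)) {
            System.out.println(line);
        }
    }
}
```

The ~0% CPU signal fits this picture: threads blocked on monitor entry consume no CPU, whereas a livelock or a spin loop would peg a core.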