Question
At 14:40 a Java input-validation service's CPU pins to 100% across all cores and p99 jumps from 40ms to multi-second; throughput collapses and some health checks time out, but request *rate* is flat and the upstream/downstream look fine. A thread dump shows many worker threads parked deep inside `java.util.regex.Pattern$...match` on the same stack. There was no deploy today; the only change is that a new B2B customer started submitting unusually long, structured strings to one endpoint this afternoon. CPU profiles attribute ~90% of on-CPU time to one regex evaluation. Triage and mitigate.
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.