Question
Mid-morning, the orders API latency p99 jumps from 80ms to 8s and error rate climbs as request timeouts pile up. The DB (MySQL/InnoDB) CPU is only at 30% — it's not saturated. Your APM shows hundreds of `UPDATE orders SET status=...` statements stuck in "waiting for lock." `SHOW ENGINE INNODB STATUS` and `sys.innodb_lock_waits` reveal they're all blocked behind a single transaction that's been open for 22 minutes and holds row locks on a swath of the orders table. That transaction came from an analytics/reporting connection. Recent context: a data analyst started running an ad-hoc "reconcile yesterday's orders" script before standup. Triage and mitigate.
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.