Question
Finance reports that ~0.3% of customers were charged twice for the same order over the last 24h. Dashboards: the payments worker fleet was auto-scaled from 4 to 12 pods yesterday at noon to handle a backlog; the duplicate-charge rate is near zero before noon and rises sharply after. The order-events queue (SQS-style, at-least-once) occasionally redelivers messages. The charge handler does: read order, `if order.status != 'charged'` then call Stripe and set `status='charged'`. No DB row lock around that check-then-set. Triage, contain the bleeding, and explain the root cause.
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.