Question
An accounting reconciliation flags that aggregate 'credits remaining' across wallets is drifting LOWER than the sum of individual debits and credits allows: money is quietly disappearing from balances under high concurrency. Dashboards show no errors. The wallet service reads a balance, computes the new value in app code, and writes it back (`UPDATE wallets SET balance = ? WHERE id = ?`). During traffic spikes, concurrent debits on the same hot wallet (shared team accounts) interleave. The database is Postgres at READ COMMITTED, and there is no row lock or atomic decrement. How do you triage this corruption, stop the silent balance loss, and recover the lost amounts?
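To make the suspected failure mode concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory `Wallet` class standing in for one row of `wallets`, with a `threading.Lock` playing the role of Postgres's row lock. `replay_lost_update` replays one bad interleaving step by step, and `run_atomic` applies many concurrent deltas through a single locked read-modify-write. It illustrates the lost-update race in general, not the service's actual code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Wallet:
    """Hypothetical in-memory stand-in for one row of the `wallets` table."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self._row_lock = threading.Lock()  # models a row-level lock

    # Unsafe pattern from the question: read, compute in app code, write back.
    def read(self) -> int:
        return self.balance

    def write(self, value: int) -> None:
        # Like `UPDATE wallets SET balance = ? WHERE id = ?`: a blind overwrite.
        self.balance = value

    # Safe pattern: read-modify-write happens as one step under the lock,
    # analogous to `UPDATE wallets SET balance = balance + $1 WHERE id = $2`.
    def apply_delta(self, delta: int) -> None:
        with self._row_lock:
            self.balance += delta


def replay_lost_update(start: int, debit: int, credit: int) -> int:
    """Replay one bad interleaving of a debit (T1) and a credit (T2)."""
    w = Wallet(start)
    t1_seen = w.read()         # T1 reads the balance
    t2_seen = w.read()         # T2 reads the same value before T1 writes
    w.write(t2_seen + credit)  # T2 writes start + credit
    w.write(t1_seen - debit)   # T1 overwrites it; the credit silently vanishes
    return w.balance


def run_atomic(start: int, deltas: list[int]) -> int:
    """Apply many deltas concurrently using the locked (atomic) pattern."""
    w = Wallet(start)
    with ThreadPoolExecutor(max_workers=8) as pool:
        # list() waits for every task and re-raises any exception.
        list(pool.map(w.apply_delta, deltas))
    return w.balance


if __name__ == "__main__":
    print(replay_lost_update(100, debit=30, credit=50))  # 70; the ledger expects 120
    print(run_atomic(100, [-30, 50] * 1000))             # 20100
```

The replay pairs a debit with a credit because a lost update moves the balance in the direction of whichever write was overwritten: losing a credit leaves the balance too low, matching the drift the reconciliation reports, while losing a debit leaves it too high. In Postgres the corresponding fix is either the in-place expression `UPDATE wallets SET balance = balance - $1 WHERE id = $2`, which at READ COMMITTED waits on a concurrent writer's row lock and then re-applies the expression to the newly committed row version, or a `SELECT ... FOR UPDATE` on the row inside the same transaction before the write.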
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
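For the recovery step, and as a "real signal" to test the lost-update hypothesis, a per-wallet reconciliation can compare each stored balance with what the debit/credit history implies. The sketch below assumes an append-only record of every debit and credit exists (the reconciliation in the question implies one) and a trusted opening balance per wallet; `LedgerEntry`, `expected_balances`, and `drift_report` are hypothetical names, and amounts are integer minor units to avoid float rounding.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerEntry:
    """Hypothetical append-only record of one debit or credit."""
    wallet_id: str
    amount: int  # positive = credit, negative = debit (minor units, e.g. cents)


def expected_balances(opening: dict[str, int],
                      ledger: list[LedgerEntry]) -> dict[str, int]:
    """Opening balance plus the signed sum of every ledger entry, per wallet."""
    totals: defaultdict[str, int] = defaultdict(int, opening)
    for entry in ledger:
        totals[entry.wallet_id] += entry.amount
    return dict(totals)


def drift_report(opening: dict[str, int],
                 ledger: list[LedgerEntry],
                 stored: dict[str, int]) -> dict[str, int]:
    """Return {wallet_id: expected - stored} for wallets that disagree.

    Positive drift: stored balance is lower than the ledger allows (money lost,
    e.g. an overwritten credit). Negative drift: stored balance is too high.
    """
    expected = expected_balances(opening, ledger)
    report: dict[str, int] = {}
    for wallet_id in expected.keys() | stored.keys():
        diff = expected.get(wallet_id, 0) - stored.get(wallet_id, 0)
        if diff != 0:
            report[wallet_id] = diff
    return report


if __name__ == "__main__":
    opening = {"team-a": 100, "solo": 10}
    ledger = [LedgerEntry("team-a", -30), LedgerEntry("team-a", 50),
              LedgerEntry("solo", 5)]
    stored = {"team-a": 70, "solo": 15}
    print(drift_report(opening, ledger, stored))  # {'team-a': 50}
```

If the drift concentrates on hot shared wallets and tracks traffic spikes, that supports the lost-update root cause rather than, say, a bad reconciliation job. Positive drifts are candidate restitution amounts; applying them as reviewed, compensating ledger entries through the now-atomic write path keeps the correction auditable, rather than overwriting balances directly.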