Question
You're migrating a `wallets` table from an old Postgres store to a new sharded store using a dual-write + gradual read-cutover: a router decides per `account_id` whether reads go to old or new based on a 'migrated' flag set when each account's data is copied. At 16:00 you bumped the read-cutover from 10% to 60% of accounts. Within minutes, support reports a slice of users seeing *stale balances* — a deposit made an hour ago is missing on refresh, then reappears. Dashboards: error rate flat, both stores healthy, replication current. How do you triage, stop serving stale data, and reconcile?
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.