Question
You're mid-migration from a monolith Postgres `inventory` table to a new sharded store, running a dual-write phase (app writes to both, reads still from old). At 09:30, reconciliation alerts fire: 0.3% of SKUs have divergent stock counts between old and new, and the divergence is growing ~0.1%/hour. Dashboards: both stores are up; the new store occasionally returns write timeouts (~0.5% of writes) under load, but the app logs those as warnings and moves on; no read-path impact yet. The cutover to read-from-new is scheduled for tomorrow. How do you triage the divergence, decide whether to proceed, and reconcile the two stores?
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.