On-callMediumoc-g117

Subject Inconsistent replicasLevel Mid–Senior~30 minCommon in Databases & SQL · Distributed systems interviewsIndustries Technology

Question

Users report that after saving a profile change, refreshing sometimes shows the OLD value, then it 'fixes itself' a few seconds later. Started 30 minutes ago. Dashboards: write traffic is normal; one of three Postgres read replicas shows replication lag oscillating 3–15s (the other two are <500ms); the load balancer routes reads round-robin across all three; the lagging replica's `WALReceiver` is fine but its disk write latency doubled after a noisy co-tenant landed on that host. No app deploy. How do you triage, stop the bad user experience now, and prevent stale reads going forward?

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.

Learn the concepts

Diagram & narrate the incident

Loading whiteboard…

Run or narrate your approach, then ask the coach.