Question
A read-heavy SaaS app (95% reads) scales out a single primary by offloading reads to a fleet of async read replicas behind a load balancer. After the rollout, support tickets spike: users update a setting and the next page load shows the old value, and an admin's bulk import "finishes" but the dashboard shows partial data for a few seconds. Replication lag is normally under 100 ms but spikes to 2 s under load. Design a replica read-routing policy that fixes the user-visible staleness without sending everything back to the primary, and state the trade-off.
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts (data model, bottlenecks, consistency, failure modes), and name the trade-offs you are making.
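For orientation only, one common shape for such a policy is session-level read-your-writes routing keyed on a log position: each write records the primary's commit position in the user's session, and a read goes to any replica that has replayed at least that far, falling back to the primary only when none has. The sketch below assumes the database exposes a monotonically increasing commit position (an LSN, GTID, or similar) for both the primary and each replica. It simulates that position with a plain integer, and every name in it (`Replica`, `Session`, `ReadRouter`, `replayed_lsn`, `min_lsn`, `record_write`, `route_read`) is hypothetical, not any particular database's or driver's API.

```python
import random
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    # Hypothetical: last log position this replica has applied.
    replayed_lsn: int = 0


@dataclass
class Session:
    # Highest log position this session must be able to observe.
    min_lsn: int = 0


class ReadRouter:
    """Hypothetical LSN-aware router giving read-your-writes per session."""

    def __init__(self, primary: str, replicas: list[Replica]):
        self.primary = primary
        self.replicas = replicas
        # Simulated commit position on the primary.
        self.primary_lsn = 0

    def record_write(self, session: Session) -> int:
        # Simulates a commit; a real system would capture the commit
        # position reported by the database after the write.
        self.primary_lsn += 1
        session.min_lsn = max(session.min_lsn, self.primary_lsn)
        return self.primary_lsn

    def route_read(self, session: Session) -> str:
        # Only replicas that have caught up to this session's writes qualify.
        fresh = [r for r in self.replicas if r.replayed_lsn >= session.min_lsn]
        if fresh:
            return random.choice(fresh).name
        # No replica has caught up yet: serve this read from the primary.
        return self.primary
```

Under these assumptions, sessions that have not written recently keep reading from any replica. Only reads that follow a write inside the lag window are redirected, so the extra primary load roughly tracks how many sessions are inside that window rather than total read volume. For the bulk-import case, the job's final commit position could be copied into the admin's `min_lsn` before the dashboard is shown. The trade-off is that a 2 s lag spike temporarily shifts more post-write reads onto the primary; alternatives such as waiting briefly for a replica to catch up move that cost into read latency instead.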