Why your replicas show stale data, and how write-heavy jobs break them.
To handle high traffic, databases use a Primary-Replica architecture. You send all WRITES (INSERT/UPDATE) to the Primary. The Primary then streams those changes to Replicas. You send READS to the Replicas to spread the load.
However, this streaming is asynchronous. If someone runs a massive background job (e.g., deleting 1,000,000 old rows), the Primary processes it fast, but the Replica takes time to catch up. This delay is called Replication Lag. During this lag, if a user updates their profile (Write to Primary) and refreshes the page (Read from Replica), they will see their old profile data!
# BAD: Naive Read
def get_user_profile(user_id):
# If the user JUST updated their profile 1 second ago,
# the replica might not have the change yet!
return db_replica.query("SELECT * FROM users WHERE id=?", user_id)
# GOOD: Read-your-writes consistency
def update_profile(user_id, data):
db_primary.execute("UPDATE users...", data)
# Set a flag in Redis/Cookie: "User wrote recently"
cache.set(f"recent_write:{user_id}", True, expire=5)
def get_user_profile(user_id):
if cache.get(f"recent_write:{user_id}"):
# Force read from Primary to ensure freshness!
return db_primary.query("SELECT * FROM users WHERE id=?", user_id)
return db_replica.query("SELECT * FROM users WHERE id=?", user_id)