On-callMediumoc-g056

Subject Schema migrationLevel Mid–Senior~35 minCommon in Databases & SQL · Distributed systems interviewsIndustries Technology, Software development

Question

A migration to add an index on a large 'events' table (MySQL 8, primary + 2 read replicas) runs at 03:00 using a standard online DDL. The primary stays healthy and the migration completes there in 6 minutes. But starting at 03:00, replica lag on both read replicas climbs to 40 minutes, and the app — which reads recent events from replicas — starts showing users stale/missing data and the analytics dashboards go blank. After the index finishes replicating, lag drains and everything recovers. Triage and mitigate.

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.

Learn the concepts

Diagram & narrate the incident

Loading whiteboard…

Run or narrate your approach, then ask the coach.