Code Room
On-callHardoc-g339
Subject Backfill stormLevel Senior–Staff~35 minCommon in Databases & SQL · Distributed systems interviewsIndustries Technology

Question

A backfill to populate a new `search_vector` tsvector column across a 300M-row Postgres `documents` table was launched at 22:00 as a tight loop issuing large `UPDATE ... WHERE id BETWEEN ...` batches. By 22:20, read replicas are 8 minutes behind and climbing, user-facing reads (routed to replicas) are serving stale data, and the primary's disk is filling fast. Dashboards: primary CPU moderate, but WAL generation rate is 10x normal and replica apply lag is a straight diagonal line up. No app errors yet. How do you triage and mitigate before replicas fall over?

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.

Diagram & narrate the incident
Loading whiteboard…
Run or narrate your approach, then ask the coach.