Code Room
On-call | Hard | oc-g350
Subject: Data loss
Level: Senior–Staff
~35 min
Common in: Reliability & on-call interviews
Industries: Technology

Question

An engineer ran an approved cleanup at 15:30 to delete ~2,000 stale `projects` rows for a churned customer cohort. The query ran in seconds and looked correct. Within minutes, support tickets flood in: thousands of *active* customers report that their tasks, comments, and attachments are gone. Dashboards show no errors and a healthy DB, but row counts on several child tables have dropped by millions. A migration last quarter added `ON DELETE CASCADE` foreign keys from those child tables to `projects`. How do you triage, stop further loss, and recover?

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
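For the "what prevents a repeat" part, one option a candidate might sketch is a blast-radius check that counts the child rows a cascading delete would remove and refuses to run if the total exceeds an agreed budget. The example below is a minimal illustration only, using Python's built-in `sqlite3` with a toy schema. The names `guarded_delete`, `make_demo_db`, `CHILD_TABLES`, `max_child_rows`, and the `project_id` column are all hypothetical and not taken from the incident; a real system would derive the child tables from the database catalog and run the count and the delete in a single transaction.

```python
import sqlite3

# Hypothetical child tables that reference projects(id) with ON DELETE CASCADE.
CHILD_TABLES = ["tasks", "comments"]

SCHEMA = """
CREATE TABLE projects (id INTEGER PRIMARY KEY);
CREATE TABLE tasks (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE
);
CREATE TABLE comments (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE
);
"""


def make_demo_db():
    """Build a toy in-memory database: project 1 is 'stale', project 2 is 'active'."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys (and cascades) when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO projects(id) VALUES (?)", [(1,), (2,)])
    conn.executemany("INSERT INTO tasks(project_id) VALUES (?)", [(1,), (2,), (2,), (2,)])
    conn.executemany("INSERT INTO comments(project_id) VALUES (?)", [(2,), (2,)])
    conn.commit()
    return conn


def guarded_delete(conn, project_ids, max_child_rows):
    """Delete projects only if the cascade would remove at most max_child_rows rows.

    Returns (deleted, blast), where blast maps each child table to the number
    of rows that reference the targeted projects.
    """
    placeholders = ",".join("?" for _ in project_ids)
    blast = {}
    for table in CHILD_TABLES:
        row = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE project_id IN ({placeholders})",
            project_ids,
        ).fetchone()
        blast[table] = row[0]

    if sum(blast.values()) > max_child_rows:
        # Refuse: the cascade is larger than the operator signed off on.
        return False, blast

    with conn:  # commits on success, rolls back on exception
        conn.execute(
            f"DELETE FROM projects WHERE id IN ({placeholders})", project_ids
        )
    return True, blast


if __name__ == "__main__":
    db = make_demo_db()
    print(guarded_delete(db, [1, 2], max_child_rows=2))  # refused, nothing deleted
    print(guarded_delete(db, [1], max_child_rows=2))     # allowed, cascades to one task
```

The point of the sketch is that the guard surfaces the cascade's size before anything is deleted, which turns a silent side effect of the foreign keys into an explicit number a reviewer has to approve. Because the count and the delete are separate statements here, concurrent writes could change the numbers in between; that gap is a known simplification.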
