Code Room
On-call | Hard | oc-g118
Subject: Data corruption | Level: Senior–Staff | ~40 min | Common in Reliability & on-call interviews | Industries: Technology, Software development

Question

Analysts report that a downstream table's `amount_cents` values look "shifted": many rows have plausible-but-wrong numbers, and a few fields are swapped with adjacent ones. Nothing reports an error: ingestion succeeds, the consumer commits its offsets, and dashboards are green. Context: 90 minutes ago a producer team deployed a change that added a new field in the MIDDLE of an Avro record's field list and bumped the schema version, but a subset of consumers is pinned to a stale Schema Registry cache and never picked up the new schema. How do you triage this silent corruption, stop it from spreading, and recover the affected rows?
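The "shifted" symptom follows from how Avro binary encoding works: a record is written as its field values back to back, in schema order, with no field names or tags in the bytes. A reader that decodes with the wrong writer schema therefore assigns values to the wrong fields without raising anything. The sketch below hand-rolls only Avro's zigzag varint encoding for `long` to show this. The field names other than `amount_cents` (`order_id`, `discount_cents`, `tax_cents`), the all-`long` record, and the sample values are assumptions made for illustration; a real pipeline would use an Avro library and the registry's framing.

```python
# Minimal illustration, not a real Avro library: only the `long` type is handled.

def write_long(n: int) -> bytes:
    """Encode a 64-bit integer the way Avro does: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        low7 = z & 0x7F
        z >>= 7
        if z:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)

def read_long(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one zigzag varint starting at pos; return (value, next_pos)."""
    z = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos

def encode_record(fields: list[str], row: dict[str, int]) -> bytes:
    # A record is just its field values concatenated in schema order.
    return b"".join(write_long(row[name]) for name in fields)

def decode_record(fields: list[str], buf: bytes) -> tuple[dict[str, int], int]:
    """Decode using `fields` as the assumed writer schema; return (row, unread_bytes)."""
    row = {}
    pos = 0
    for name in fields:
        row[name], pos = read_long(buf, pos)
    return row, len(buf) - pos

# Hypothetical schemas: v2 inserts discount_cents in the middle of the field list.
V1_FIELDS = ["order_id", "amount_cents", "tax_cents"]
V2_FIELDS = ["order_id", "discount_cents", "amount_cents", "tax_cents"]

payload = encode_record(
    V2_FIELDS,
    {"order_id": 1001, "discount_cents": 500, "amount_cents": 12999, "tax_cents": 1040},
)

# A consumer with a stale cache decodes v2 bytes as if they were v1.
stale_row, unread = decode_record(V1_FIELDS, payload)
# stale_row == {"order_id": 1001, "amount_cents": 500, "tax_cents": 12999}
# unread == 2 (the real tax_cents bytes are silently left over)

# Decoding with the actual writer schema recovers the true values.
correct_row, _ = decode_record(V2_FIELDS, payload)
```

In this sketch the stale decode reports the discount as `amount_cents` and the real amount as `tax_cents`, which matches the "plausible but wrong" pattern analysts describe. It also suggests why recovery usually means re-decoding retained raw bytes with the correct writer schema rather than trying to patch the stored values.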

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
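One way to make "stop the bleeding" concrete is a fail-closed guard: inspect the writer schema ID carried by each message and quarantine anything the consumer was not built for, instead of decoding it. The sketch assumes the topic uses the Confluent Schema Registry wire format (a zero magic byte, a 4-byte big-endian schema ID, then the Avro payload); the question does not say so. The `Message` type, the function names, and schema IDs 41 and 42 are hypothetical.

```python
import struct
from typing import Iterable, NamedTuple, Optional

MAGIC_BYTE = 0  # first byte of the Confluent wire format

class Message(NamedTuple):  # hypothetical stand-in for a consumed Kafka record
    offset: int
    value: bytes

def writer_schema_id(value: bytes) -> int:
    """Read the schema ID from the 5-byte header without decoding the payload."""
    if len(value) < 5:
        raise ValueError("message too short to carry a schema ID header")
    magic, schema_id = struct.unpack(">bI", value[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte {magic}")
    return schema_id

def split_by_known_schema(
    messages: Iterable[Message], known_ids: set[int]
) -> tuple[list[Message], list[Message]]:
    """Return (safe, quarantined). Anything unrecognized is quarantined, not decoded."""
    safe, quarantined = [], []
    for msg in messages:
        try:
            ok = writer_schema_id(msg.value) in known_ids
        except ValueError:
            ok = False
        (safe if ok else quarantined).append(msg)
    return safe, quarantined

def first_suspect_offset(
    messages: Iterable[Message], known_ids: set[int]
) -> Optional[int]:
    """Lowest offset written with an unknown schema: the start of the blast radius."""
    _, quarantined = split_by_known_schema(messages, known_ids)
    return min((m.offset for m in quarantined), default=None)

def frame(schema_id: int, payload: bytes) -> bytes:
    # Test helper that builds a framed message for the demo below.
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload

# Demo: the consumer is pinned to schema 41; the producer switched to 42 at offset 2.
demo = [
    Message(0, frame(41, b"\x02")),
    Message(1, frame(41, b"\x04")),
    Message(2, frame(42, b"\x06")),
    Message(3, frame(42, b"\x08")),
]
safe, quarantined = split_by_known_schema(demo, known_ids={41})
start = first_suspect_offset(demo, known_ids={41})
```

Under these assumptions the same header check serves two triage steps: quarantining keeps new bad rows out of the table, and the first suspect offset per partition bounds which rows need to be re-decoded from retained raw messages during recovery.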
