Code Room
On-callHardoc-g479
Subject Schema migrationLevel Senior–Staff~40 minCommon in Reliability & on-call interviewsIndustries Technology, Software development

Question

A migration adds a new value to a Postgres native ENUM type (`order_status`) so the new code can write status `partially_refunded`. The migration runs fine and is fast. The new code rolls out and starts writing `partially_refunded`. But a separate, OLDER reporting service (deployed months ago, not part of this rollout) begins throwing `invalid input value for enum order_status: "partially_refunded"` on a subset of rows, and its nightly export fails. Dashboards: the primary app is healthy; only the reporting service errors, and only on rows written after the migration. The reporting service uses an ORM that loaded the enum's allowed values at process start and validates reads against that cached set. Triage, explain, and mitigate.

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.

Diagram & narrate the incident
Loading whiteboard…
Run or narrate your approach, then ask the coach.