On-call | Easy | oc-g534

Subject: Config error
Level: Entry–Mid
~15 min
Common in Reliability & on-call interviews
Industries: Software development, Technology

Question

A service fails to start and the orchestrator is reporting it as unhealthy across every instance. The startup logs show the process exiting immediately with an error about an invalid value in its configuration. The last change was a config update pushed five minutes ago to tune a setting — no application code changed. Every instance is failing identically. How do you triage and recover?

What a strong answer looks like

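Stop the bleeding first (mitigate): the config update is the only recent change and every instance fails identically, so revert it to the last known-good version before investigating further. Then form hypotheses from real signals, such as the exact setting and value named in the startup error. Separate root cause (how an invalid value reached production without being validated) from symptom (the process exiting on startup), communicate status as you go, and close with what prevents a repeat.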
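One way to make "what prevents a repeat" concrete is a validation step that runs before a config change is pushed. The sketch below is a minimal illustration under stated assumptions, not the service's actual mechanism: the setting names (`max_connections`, `request_timeout_seconds`), their bounds, and the `validate_config` helper are all hypothetical.

```python
from typing import Any

# Hypothetical schema: setting name -> (expected type, minimum, maximum).
# The names and bounds are illustrative assumptions, not taken from the incident.
SCHEMA: dict[str, tuple[type, int, int]] = {
    "max_connections": (int, 1, 10_000),
    "request_timeout_seconds": (int, 1, 300),
}


def validate_config(config: dict[str, Any]) -> list[str]:
    """Return human-readable errors; an empty list means the config is valid."""
    errors: list[str] = []
    for key, (expected_type, low, high) in SCHEMA.items():
        if key not in config:
            errors.append(f"{key}: missing")
            continue
        value = config[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            errors.append(
                f"{key}: expected {expected_type.__name__}, got {type(value).__name__}"
            )
            continue
        if not (low <= value <= high):
            errors.append(f"{key}: {value} outside [{low}, {high}]")
    return errors


if __name__ == "__main__":
    # A tuned value that was accidentally quoted as a string.
    candidate = {"max_connections": "100", "request_timeout_seconds": 30}
    problems = validate_config(candidate)
    if problems:
        # In a deploy pipeline this would block the push instead of printing.
        print("Config rejected:", problems)
```

Run as a gate in the deploy pipeline, a check like this rejects the bad value before any instance restarts. Pairing it with a staged rollout to a single instance first would limit the blast radius of anything the schema does not cover.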
