Code Room
System design · Medium · sd-g167
Subject: Alerting systems · Level: Mid–Senior · ~30 min · Common in: Reliability & on-call interviews · Industries: Technology, Software development

Question

Design the alert evaluation engine that turns metric/log conditions into alert states for an org with 8,000 alert rules. Problems to solve: a metric oscillating around its threshold causes a rule to flap (fire/resolve every 30s), producing pager spam; planned maintenance and deploys should suppress alerts; and some alerts should only fire if a condition holds for several minutes, not on a single bad sample. Design the rule model, the state machine, flap damping, and suppression.

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
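
As one way to make the state machine, flap damping, and suppression concrete, here is a minimal Python sketch of a per-rule evaluator. It is an illustration under stated assumptions, not a reference design: every name in it (`Rule`, `RuleState`, `evaluate`, `fire_above`, `clear_below`, `for_seconds`, `keep_firing_seconds`) is hypothetical, it handles only "value above threshold" rules, and it treats suppression as a list of (start, end) maintenance windows that mute notifications while the state is still tracked.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    OK = "ok"
    PENDING = "pending"   # condition true, but not yet for long enough
    FIRING = "firing"


@dataclass
class Rule:                       # hypothetical rule model
    name: str
    fire_above: float             # condition is true when value > fire_above
    clear_below: float            # hysteresis: clearing requires value < clear_below
    for_seconds: float            # condition must hold this long before firing
    keep_firing_seconds: float    # value must stay clear this long before resolving


@dataclass
class RuleState:
    state: State = State.OK
    pending_since: Optional[float] = None
    clear_since: Optional[float] = None


def is_suppressed(now, windows):
    """True if `now` falls inside any (start, end) maintenance window."""
    return any(start <= now < end for start, end in windows)


def evaluate(rule, rs, value, now, windows=()):
    """Advance one rule by one sample; return "fire", "resolve", or None."""
    notify = None
    if rs.state is State.OK and value > rule.fire_above:
        rs.state, rs.pending_since = State.PENDING, now
    if rs.state is State.PENDING:
        if value <= rule.fire_above:
            # A single good sample cancels the pending period.
            rs.state, rs.pending_since = State.OK, None
        elif now - rs.pending_since >= rule.for_seconds:
            rs.state, rs.clear_since = State.FIRING, None
            notify = "fire"
    elif rs.state is State.FIRING:
        if value >= rule.clear_below:
            # Still inside or above the hysteresis band: restart the clear timer.
            rs.clear_since = None
        else:
            if rs.clear_since is None:
                rs.clear_since = now
            if now - rs.clear_since >= rule.keep_firing_seconds:
                rs.state, rs.pending_since = State.OK, None
                notify = "resolve"
    # Suppression mutes the notification but leaves the state transition intact.
    if notify and is_suppressed(now, windows):
        return None
    return notify
```

Two damping mechanisms are combined here: the pending period stops a single bad sample from paging, and the gap between `fire_above` and `clear_below` plus the clear timer stops a value oscillating around the threshold from resolving and re-firing. One trade-off worth naming in an answer: because this sketch only mutes notifications, an alert that starts firing during a maintenance window is never announced after the window ends, so a fuller design would likely need to re-notify for rules still in the firing state at that point.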
