System design · Medium
Question
Design the alert evaluation engine that turns metric/log conditions into alert states for an org with 8,000 alert rules. Problems to solve: a metric oscillating around its threshold causes a rule to flap (fire/resolve every 30s), producing pager spam; planned maintenance and deploys should suppress alerts; and some alerts should only fire if a condition holds for several minutes, not on a single bad sample. Design the rule model, the state machine, flap damping, and suppression.
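Below is a minimal sketch of one possible answer in Python. It assumes a single evaluator that receives one numeric sample per rule per tick. Every name in it (`Rule`, `RuleState`, `evaluate`, `MaintenanceWindow`, and the fields `fire_above`, `clear_below`, `for_s`, `resolve_after_s`) is hypothetical.

The state machine runs INACTIVE → PENDING → FIRING. PENDING requires the condition to hold for `for_s` seconds, similar in spirit to the `for` clause in Prometheus alerting rules.

Flap damping here combines a hysteresis band with a resolve hold-down: the alert resolves only after the value stays below a lower threshold for `resolve_after_s`. Flap-rate detection is another option and is not shown.

Suppression mutes notifications but keeps evaluating state. A real design would still need to decide whether to page for alerts that are still firing when a maintenance window closes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class State(Enum):
    INACTIVE = "inactive"
    PENDING = "pending"   # condition true, but not yet for long enough
    FIRING = "firing"


# Hypothetical rule model for illustration; all field names are assumptions.
@dataclass
class Rule:
    name: str
    fire_above: float       # value must exceed this to count as breaching
    clear_below: float      # value must drop below this to count as cleared (hysteresis)
    for_s: float            # breach must hold this long before firing
    resolve_after_s: float  # clear must hold this long before resolving


@dataclass
class MaintenanceWindow:
    start: float
    end: float

    def covers(self, t: float) -> bool:
        return self.start <= t < self.end


@dataclass
class RuleState:
    state: State = State.INACTIVE
    breach_since: Optional[float] = None
    clear_since: Optional[float] = None


def evaluate(rule: Rule, rs: RuleState, value: float, now: float,
             windows: List[MaintenanceWindow]) -> Optional[str]:
    """Advance one rule by one sample; return 'firing'/'resolved' if a page should go out."""
    breaching = value > rule.fire_above
    cleared = value < rule.clear_below  # a value between the thresholds is neither
    event = None

    if rs.state is State.INACTIVE and breaching:
        rs.state, rs.breach_since = State.PENDING, now

    if rs.state is State.PENDING:
        if not breaching:
            # any non-breaching sample cancels the pending alert
            rs.state, rs.breach_since = State.INACTIVE, None
        elif now - rs.breach_since >= rule.for_s:
            rs.state, rs.clear_since = State.FIRING, None
            event = "firing"
    elif rs.state is State.FIRING:
        if not cleared:
            rs.clear_since = None  # any non-cleared sample restarts the resolve timer
        elif rs.clear_since is None:
            rs.clear_since = now
        if rs.clear_since is not None and now - rs.clear_since >= rule.resolve_after_s:
            rs.state, rs.clear_since = State.INACTIVE, None
            event = "resolved"

    # Suppression mutes notifications only; the state machine keeps running.
    if event and any(w.covers(now) for w in windows):
        return None
    return event
```

Consider a rule with `fire_above=90`, `clear_below=80`, `for_s=120` and 30-second ticks. A metric alternating between 91 and 89 keeps dropping back from PENDING to INACTIVE and never pages. Five consecutive samples of 95 (t = 0 to 120 s) fire on the fifth sample.

Once the rule is firing, a dip into the 80 to 90 band does not resolve it. Only 300 seconds of continuous values below 80 does.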
What a strong answer looks like
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.