Code Room
System design · Medium · sd-g161
Subject: SLO monitoring · Level: Mid–Senior · ~35 min · Common in Reliability & on-call interviews · Industries: Technology, Software development

Question

Design an SLO monitoring and error-budget system for a payments API with two SLOs, both measured over a rolling 28-day window: 99.95% availability and p99 latency under 300 ms. The platform serves 40k req/s. The system must compute compliance, show the remaining error budget, and page on-call only when the budget is burning fast enough to matter, not on every blip. Design the measurement, the budget computation, and the alerting policy.
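A minimal sketch of the budget arithmetic and a multi-window, multi-burn-rate paging rule follows. It rests on several assumptions. Both SLOs are treated as good/total ratios: the latency SLO is read as "at least 99% of requests finish within 300 ms", so the same functions apply with a target of 0.99. The names `Slo`, `BurnRule`, `RULES`, and `should_fire` are hypothetical. The 2% / 5% / 10% budget-consumption figures and the "short window = 1/12 of the long window" ratio are the commonly cited defaults from Google's SRE Workbook. Here they are rescaled from a 30-day to a 28-day window and are not tuned values.

```python
from dataclasses import dataclass

# Figures taken from the question: rolling 28-day window, 40k req/s.
WINDOW_HOURS = 28 * 24  # 672 hours in the SLO window
REQ_PER_SEC = 40_000


@dataclass(frozen=True)
class Slo:
    name: str
    target: float  # required fraction of good events, e.g. 0.9995

    @property
    def budget(self) -> float:
        """Error budget as a ratio: the allowed fraction of bad events."""
        return 1.0 - self.target


def remaining_budget(slo: Slo, good: int, total: int) -> float:
    """Share of the window's budget still unspent (1.0 = untouched, < 0 = SLO missed)."""
    if total == 0:
        return 1.0
    return 1.0 - (total - good) / (slo.budget * total)


def burn_rate(slo: Slo, good: int, total: int) -> float:
    """Observed error rate over budgeted error rate; 1.0 spends the budget exactly over 28 days."""
    if total == 0:
        return 0.0
    return ((total - good) / total) / slo.budget


@dataclass(frozen=True)
class BurnRule:
    long_hours: float    # window showing the burn is significant
    short_hours: float   # window showing the burn is still happening
    budget_spent: float  # fraction of the 28-day budget consumed within long_hours
    severity: str        # "page" or "ticket"

    @property
    def threshold(self) -> float:
        # Burn rate at which budget_spent of the whole budget is gone after long_hours.
        return self.budget_spent * WINDOW_HOURS / self.long_hours


# Hypothetical policy: SRE Workbook-style fractions, rescaled to a 28-day window.
RULES = [
    BurnRule(long_hours=1, short_hours=5 / 60, budget_spent=0.02, severity="page"),
    BurnRule(long_hours=6, short_hours=0.5, budget_spent=0.05, severity="page"),
    BurnRule(long_hours=72, short_hours=6, budget_spent=0.10, severity="ticket"),
]


def should_fire(slo: Slo, rule: BurnRule, long_counts, short_counts) -> bool:
    """long_counts and short_counts are (good, total) pairs for the rule's two windows."""
    return (burn_rate(slo, *long_counts) >= rule.threshold
            and burn_rate(slo, *short_counts) >= rule.threshold)


if __name__ == "__main__":
    availability = Slo("availability", 0.9995)
    per_window = REQ_PER_SEC * 3600 * WINDOW_HOURS  # 96,768,000,000 requests
    print(round(availability.budget * per_window))   # 48384000 allowed failures
    for rule in RULES:
        print(rule.severity, rule.long_hours, round(rule.threshold, 2))
```

With these inputs the availability budget is about 48.4 million failed requests per window. That is roughly 20 minutes of total outage (40,320 minutes × 0.0005 = 20.16 min). The paging thresholds come out at burn rates of 13.44 (1 h) and 5.6 (6 h). Requiring both windows to exceed the threshold is what filters blips. The long window ensures that only burns consuming a meaningful share of the budget page. The short window lets the alert clear soon after the burn stops instead of staying red for the whole long window.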

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
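For this question, one possible data model for the rolling window is sketched below under stated assumptions. API hosts are assumed to pre-aggregate three counts per minute: total requests, requests that did not fail, and requests completed within 300 ms. The last count could come from a latency histogram with a bucket boundary at exactly 300 ms. This way the p99 SLO becomes the ratio fast/total ≥ 0.99 and no raw latencies are stored. Averaging per-minute p99 values would not yield a correct 28-day p99, which is why counts are kept instead. `Bucket`, `RollingSli`, and one-minute granularity are hypothetical choices. A fixed ring of 40,320 slots per series keeps memory bounded regardless of the 40k req/s ingest rate.

```python
from dataclasses import dataclass

# Hypothetical granularity: one bucket per minute across the rolling 28-day window.
MINUTES_PER_WINDOW = 28 * 24 * 60  # 40,320 slots


@dataclass
class Bucket:
    minute: int = -1  # epoch minute held by this slot (-1 = never used)
    total: int = 0    # all requests in the minute
    ok: int = 0       # requests that did not fail (availability SLI numerator)
    fast: int = 0     # requests completed within 300 ms (latency SLI numerator)


class RollingSli:
    """Fixed-size ring of per-minute counters covering the rolling window."""

    def __init__(self, minutes: int = MINUTES_PER_WINDOW):
        self.minutes = minutes
        self.ring = [Bucket() for _ in range(minutes)]

    def record(self, minute: int, total: int, ok: int, fast: int) -> bool:
        """Merge pre-aggregated counts for one minute; returns False if the data is too late."""
        b = self.ring[minute % self.minutes]
        if b.minute > minute:
            return False  # slot already recycled for a newer minute: drop late data
        if b.minute < minute:
            b.minute, b.total, b.ok, b.fast = minute, 0, 0, 0  # evict the expired minute
        b.total += total
        b.ok += ok
        b.fast += fast
        return True

    def window(self, now_minute: int, span: int) -> tuple[int, int, int]:
        """Sum (total, ok, fast) over the `span` minutes ending at now_minute."""
        total = ok = fast = 0
        for m in range(now_minute - span + 1, now_minute + 1):
            b = self.ring[m % self.minutes]
            if b.minute == m:  # ignore slots holding some other (stale) minute
                total += b.total
                ok += b.ok
                fast += b.fast
        return total, ok, fast


if __name__ == "__main__":
    sli = RollingSli()
    sli.record(100, total=1000, ok=999, fast=990)   # counts from host A
    sli.record(100, total=1000, ok=1000, fast=995)  # counts from host B
    total, ok, fast = sli.window(100, MINUTES_PER_WINDOW)
    print(ok / total, fast / total)  # availability and latency SLIs: 0.9995 0.9925
```

Summing up to 40,320 slots on each evaluation is cheap at a once-per-minute cadence. Keeping running totals would make the 28-day query O(1), but it complicates handling of late or re-sent data. Late data for a minute whose slot has already been recycled is refused rather than allowed to overwrite newer counts.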
