Code Room
System designHardsd-g175
Subject Metrics systemsLevel Senior–Staff~40 minCommon in Distributed systems interviewsIndustries Technology

Question

Design how a metrics platform records and serves latency percentiles for 100,000 service endpoints. Engineers want p50/p90/p99/p99.9 for any endpoint, and crucially want to aggregate percentiles across arbitrary groupings (all endpoints in a region, a whole service) computed on the fly. Averaging stored per-endpoint percentiles is mathematically wrong, but storing every raw latency is too expensive. Design the data representation, the aggregation, and the accuracy/cost trade-off.

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.

Narrate your design
Loading whiteboard…
Run or narrate your approach, then ask the coach.