Code Room
System design · Hard · sd-g156
Subject: Distributed tracing · Level: Senior–Staff · ~45 min · Common in: Distributed systems interviews · Industries: Technology

Question

Design a distributed tracing pipeline for a microservice platform of ~800 services producing 5M traces/min, average 40 spans/trace (so ~200M spans/min at peak). The business wants to keep essentially all error traces and all traces slower than the p99 latency for that endpoint, but can afford to store only ~2% of the healthy traces. Storage budget allows ~30 days hot. Design ingest, the sampling decision, and trace storage/retrieval by trace ID and by service+latency+error filters.
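The sampling requirement (keep every error trace, every trace slower than its endpoint's p99, and ~2% of the rest) can only be evaluated after a trace has finished. That points toward tail-based sampling: buffer spans grouped by trace ID, wait for the trace to go quiet, then decide. Below is a minimal Python sketch of only the decision step. It assumes spans have already been grouped by trace ID and that per-endpoint p99 thresholds come from a separate metrics job. The `Span` fields, the `P99_MS` table and its 850 ms value, and the function names are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Span:
    # Hypothetical minimal span record; real spans carry many more fields.
    trace_id: str
    service: str
    endpoint: str
    duration_ms: float
    is_error: bool
    is_root: bool = False


# Hypothetical per-(service, endpoint) p99 thresholds in ms, assumed to be
# refreshed by a separate metrics job. The 850 ms value is made up.
P99_MS = {("checkout", "/pay"): 850.0}

HEALTHY_KEEP_RATE = 0.02  # ~2% of healthy traces, from the question


def healthy_sample(trace_id: str, rate: float = HEALTHY_KEEP_RATE) -> bool:
    """Keep/drop derived from a hash of the trace ID, so every collector agrees."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big")  # 64-bit value taken from the hash
    return bucket < rate * 2**64


def keep_trace(spans: list[Span]) -> tuple[bool, str]:
    """Tail-based decision over all buffered spans of one trace."""
    if not spans:
        return False, "empty"
    # Rule 1: keep every trace containing at least one error span.
    if any(s.is_error for s in spans):
        return True, "error"
    # Rule 2: keep traces whose root span is slower than its endpoint's p99.
    root = next((s for s in spans if s.is_root), None)
    if root is not None:
        threshold = P99_MS.get((root.service, root.endpoint))
        if threshold is not None and root.duration_ms > threshold:
            return True, "slow"
    # Rule 3: keep ~2% of the remaining (healthy) traces.
    # Traces with no root span (e.g. incomplete ones) also fall through to here.
    if healthy_sample(spans[0].trace_id):
        return True, "sampled"
    return False, "dropped"


spans = [
    Span("t1", "checkout", "/pay", 900.0, False, is_root=True),
    Span("t1", "payments", "charge", 120.0, False),
]
print(keep_trace(spans))  # (True, 'slow')
```

Hashing the trace ID instead of drawing a random number makes the healthy-trace choice deterministic, so any collector or retry that sees the same trace ID reaches the same keep/drop answer. A real pipeline would also need a buffering timeout for traces that never complete; that part is left out of this sketch.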

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts (data model, bottlenecks, consistency, failure modes) and name the trade-offs you are making.
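For the scale-clarification step, a back-of-envelope estimate helps anchor the ingest and storage discussion. The sketch below uses the question's figures (5M traces/min, 40 spans/trace, 2% of healthy traces kept, 30 days hot) plus placeholder assumptions that are not from the question: a 1% error rate and ~500 bytes per stored span. The ~1% slow share follows from the definition of p99. Overlap between error and slow traces is ignored, so the kept fraction is a slight overestimate.

```python
from dataclasses import dataclass


@dataclass
class Assumptions:
    traces_per_min: float = 5_000_000   # from the question
    spans_per_trace: float = 40         # from the question
    healthy_keep_rate: float = 0.02     # from the question
    retention_days: int = 30            # from the question
    slow_fraction: float = 0.01         # slower than p99 is ~1% by definition
    # The values below are ASSUMPTIONS for illustration only.
    error_fraction: float = 0.01        # hypothetical share of traces with an error
    bytes_per_span: int = 500           # hypothetical stored size per span


def estimate(a: Assumptions) -> dict[str, float]:
    spans_per_sec = a.traces_per_min * a.spans_per_trace / 60
    # Treat error and slow traces as disjoint (upper bound on kept volume).
    healthy = 1 - a.error_fraction - a.slow_fraction
    kept_fraction = a.error_fraction + a.slow_fraction + healthy * a.healthy_keep_rate
    kept_spans_per_sec = spans_per_sec * kept_fraction
    bytes_per_day = kept_spans_per_sec * 86_400 * a.bytes_per_span
    return {
        "ingest_spans_per_sec": spans_per_sec,
        "kept_fraction": kept_fraction,
        "kept_spans_per_sec": kept_spans_per_sec,
        "hot_storage_tb": bytes_per_day * a.retention_days / 1e12,
    }


for name, value in estimate(Assumptions()).items():
    print(f"{name}: {value:,.4g}")
```

With these placeholder values the estimate comes to about 3.3M spans/s of ingest, a kept fraction near 4% (~132k spans/s), and roughly 171 TB of hot storage for 30 days, before replication and index overhead. The ratio matters more than the absolute numbers: the collectors must absorb the full ingest rate even though only about 4% of it is stored.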
