Code Room
System design · Medium
Question
Design metrics collection for a highly elastic environment: a Kubernetes platform where 30,000 pods churn constantly (median pod lifetime 20 minutes), spread across 12 clusters in 4 regions. You need every pod's metrics scraped within 30s of it appearing, no metric gaps when pods die, and a single global query view across all regions. Decide push vs pull, how targets are discovered, and how you make collection highly available without double-counting.
What a strong answer looks like
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
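As one way to clarify scale first, the figures in the question can be turned into rough load numbers. The sketch below is a back-of-envelope estimate, not a sizing guide. The pod count, cluster count, and median lifetime come from the question; `series_per_pod` and `scrape_interval_s` are assumptions introduced here for illustration, and the median lifetime is treated as if it were the mean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    pods: int = 30_000                 # from the question
    clusters: int = 12                 # from the question
    median_lifetime_s: int = 20 * 60   # from the question (20 minutes)
    series_per_pod: int = 500          # ASSUMPTION: not stated in the question
    scrape_interval_s: int = 15        # ASSUMPTION: not stated in the question


def estimate(w: Workload) -> dict[str, float]:
    """Rough steady-state load figures for the metrics pipeline."""
    # With a constant pod count, replacements per second are roughly
    # pods / lifetime (Little's law). Using the median in place of the
    # mean lifetime is an approximation.
    pod_starts_per_s = w.pods / w.median_lifetime_s
    active_series = w.pods * w.series_per_pod
    new_series_per_s = pod_starts_per_s * w.series_per_pod
    return {
        "pod_starts_per_s": pod_starts_per_s,
        "pods_per_cluster": w.pods / w.clusters,
        "active_series": active_series,
        "samples_per_s": active_series / w.scrape_interval_s,
        "new_series_per_s": new_series_per_s,
        "new_series_per_day": new_series_per_s * 86_400,
    }


if __name__ == "__main__":
    for name, value in estimate(Workload()).items():
        print(f"{name}: {value:,.0f}")
```

Under these assumptions the platform sees about 25 pod starts per second and on the order of a billion new series per day, which suggests that series churn, rather than raw sample rate, is likely the constraint to design around. The 30s requirement also implies that discovery latency plus one scrape interval has to fit inside 30s.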