Code Room
System design · Medium · sd-g165
Subject: Metrics systems · Level: Mid–Senior · ~35 min · Common in: Distributed systems interviews · Industries: Technology, Software development

Question

Design metrics collection for a highly elastic environment: a Kubernetes platform where 30,000 pods churn constantly (median pod lifetime 20 minutes), spread across 12 clusters in 4 regions. You need every pod's metrics scraped within 30s of it appearing, no metric gaps when pods die, and a single global query view across all regions. Decide push vs pull, how targets are discovered, and how you make collection highly available without double-counting.

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
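
As one way to start on scale, the numbers in the question can be turned into a rough churn estimate. The sketch below is a back-of-envelope calculation, not a sizing method the question prescribes. It assumes the mean pod lifetime is close to the stated 20-minute median, and it uses hypothetical values for series per pod (500) and scrape interval (15s); none of those three figures come from the prompt, so they should be replaced with real ones.

```python
# Back-of-envelope churn estimate for the scenario in the question.
# Values marked "assumed" are illustrative and do NOT come from the prompt.

PODS = 30_000                  # from the prompt
CLUSTERS = 12                  # from the prompt
FRESHNESS_TARGET_S = 30        # from the prompt: scraped within 30s of appearing
MEAN_LIFETIME_S = 20 * 60      # assumed: mean lifetime ~= the stated 20-minute median
SERIES_PER_POD = 500           # assumed, hypothetical
SCRAPE_INTERVAL_S = 15         # assumed, hypothetical


def churn_estimate(pods, mean_lifetime_s, clusters, series_per_pod):
    """Steady-state estimate: arrival rate = population / mean lifetime."""
    starts_per_s = pods / mean_lifetime_s
    return {
        "pod_starts_per_s": starts_per_s,
        "pod_starts_per_s_per_cluster": starts_per_s / clusters,
        "new_series_per_s": starts_per_s * series_per_pod,
        "active_series": pods * series_per_pod,
    }


estimate = churn_estimate(PODS, MEAN_LIFETIME_S, CLUSTERS, SERIES_PER_POD)

# If the first scrape can land up to one full interval after discovery,
# this is roughly the time left for target discovery to notice a new pod.
discovery_budget_s = FRESHNESS_TARGET_S - SCRAPE_INTERVAL_S

for name, value in estimate.items():
    print(f"{name}: {value:,.2f}")
print(f"discovery_budget_s: {discovery_budget_s}")
```

Under these assumptions the platform sees about 25 pod starts per second overall (about 2 per second per cluster, if pods are spread evenly), which is the rate that target discovery and series creation have to keep up with. The estimate only holds in steady state; deploys and autoscaling bursts will push the peak well above the average.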
