Code Room
System design · Hard · sd-g647
Subject: Service mesh networking
Level: Senior–Staff · ~50 min
Common in: Networking & APIs interviews
Industries: Technology

Question

Design a service mesh for a Kubernetes platform running 8,000 services across 60,000 pods, where every pod gets a sidecar proxy that handles mTLS, retries, timeouts, traffic splitting, and telemetry. The mesh must mint and rotate workload identities (SPIFFE-style) with cert lifetimes under 24h, enforce per-route authz policy, and let a platform team shift traffic (canary 1% -> 100%) declaratively. Constraints: sidecar p99 added latency under 1ms per hop, config/cert pushes to all sidecars within seconds, and a control-plane outage must not break existing data-plane traffic. Walk through the components, the identity/cert model, and the central trade-off.

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
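
As one way to make "clarify scale and constraints first" concrete, the numbers in the question can be turned into rough load figures before any components are drawn. The sketch below is a back-of-envelope calculation only. The rotation point (half of the certificate lifetime) and the 60-second full-fleet reissue window are assumptions for illustration, not figures from the question, and the function names are hypothetical.

```python
# Back-of-envelope sizing from the figures stated in the question.
PODS = 60_000
SERVICES = 8_000
CERT_LIFETIME_H = 24      # upper bound given in the question
ROTATE_FRACTION = 0.5     # ASSUMPTION: sidecars rotate at half the cert lifetime
BURST_WINDOW_S = 60       # ASSUMPTION: worst case, whole fleet reissues in 60 s


def steady_state_issuance_per_sec(pods: int, lifetime_h: float, rotate_fraction: float) -> float:
    """Average cert signings per second if every pod rotates once per interval."""
    rotation_interval_s = lifetime_h * 3600 * rotate_fraction
    return pods / rotation_interval_s


def burst_issuance_per_sec(pods: int, window_s: float) -> float:
    """Signings per second if every pod requests a new cert inside one window."""
    return pods / window_s


pods_per_service = PODS / SERVICES
rate = steady_state_issuance_per_sec(PODS, CERT_LIFETIME_H, ROTATE_FRACTION)
burst = burst_issuance_per_sec(PODS, BURST_WINDOW_S)

print(f"average pods per service: {pods_per_service:.1f}")
print(f"steady-state cert issuance: {rate:.2f}/s")
print(f"full-fleet reissue burst: {burst:.0f}/s")
```

Under these assumptions the steady-state signing load is small, while a fleet-wide restart is roughly three orders of magnitude larger, which suggests sizing the certificate authority for the burst rather than the average.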
