Code Room
System design · Medium
Question
Design the traffic-splitting layer of a reverse proxy / service mesh that does progressive canary rollouts: shift 1% → 5% → 25% → 100% of traffic to a new service version based on live health metrics, with automatic rollback if error rate or latency degrades. Requirements: the split must be sticky per user (a user shouldn't bounce between versions mid-session), the rollout must be observable, and rollback must be fast. Describe the routing decision, the metric-driven promotion/rollback loop, and the stickiness mechanism.
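One possible way to get the sticky routing decision is to hash each user into a fixed bucket and compare the bucket to the current canary percentage. The sketch below is a minimal illustration, not a reference answer. The names `bucket_for`, `route`, `BUCKETS`, and the salt value `"rollout-42"` are hypothetical, and it assumes a stable `user_id` is available on every request (for example from a cookie or auth token).

```python
import hashlib

# Assumption: 10,000 buckets gives 0.01% granularity for the split.
BUCKETS = 10_000


def bucket_for(user_id: str, salt: str = "rollout-42") -> int:
    """Map a user to a stable bucket in [0, BUCKETS).

    The salt is a hypothetical per-rollout identifier, so that the same
    users are not always the first to be exposed to every canary.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def route(user_id: str, canary_percent: float, salt: str = "rollout-42") -> str:
    """Return "canary" or "stable" for this user at the given percentage."""
    threshold = int(canary_percent * BUCKETS / 100)
    return "canary" if bucket_for(user_id, salt) < threshold else "stable"


if __name__ == "__main__":
    for pct in (1, 5, 25, 100):
        share = sum(route(f"user-{i}", pct) == "canary" for i in range(20_000))
        print(f"{pct}% target: {share / 20_000:.2%} of users on canary")
```

Because the bucket depends only on the user and the salt, a user who is on the canary at 1% stays on it at 5% and 25%, and no per-user state has to be stored in the proxy. Rollback in this sketch is just setting `canary_percent` to 0, which sends every bucket back to stable on the next request.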
What a strong answer looks like
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
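As one example of going deep on a hard part, the promotion/rollback loop from the question can be sketched as a small state machine that is evaluated once per metrics window. This is an illustration under stated assumptions: the `Health` and `Rollout` types, the `evaluate` function, and every threshold (1 percentage point of extra error rate, 1.5x p99 latency, 500 requests minimum) are hypothetical values chosen for the example, not recommendations.

```python
from dataclasses import dataclass

# Rollout steps taken from the question: 1% -> 5% -> 25% -> 100%.
STEPS = [1, 5, 25, 100]


@dataclass(frozen=True)
class Health:
    error_rate: float      # fraction of failed requests, 0.0 to 1.0
    p99_latency_ms: float  # 99th percentile latency for the window
    requests: int          # number of requests observed in the window


@dataclass(frozen=True)
class Rollout:
    step_index: int = 0
    rolled_back: bool = False

    @property
    def canary_percent(self) -> int:
        return 0 if self.rolled_back else STEPS[self.step_index]


def is_degraded(canary: Health, baseline: Health,
                max_error_delta: float = 0.01,
                max_latency_ratio: float = 1.5) -> bool:
    """Compare the canary against the stable version, not a fixed limit."""
    worse_errors = canary.error_rate - baseline.error_rate > max_error_delta
    worse_latency = canary.p99_latency_ms > baseline.p99_latency_ms * max_latency_ratio
    return worse_errors or worse_latency


def evaluate(rollout: Rollout, canary: Health, baseline: Health,
             min_requests: int = 500) -> Rollout:
    """Return the next rollout state for one metrics window."""
    if rollout.rolled_back:
        return rollout                      # terminal until an operator resets
    if is_degraded(canary, baseline):
        return Rollout(rollout.step_index, rolled_back=True)
    if canary.requests < min_requests:
        return rollout                      # not enough data: hold this step
    if rollout.step_index < len(STEPS) - 1:
        return Rollout(rollout.step_index + 1)
    return rollout                          # already at 100%
```

The trade-off worth naming here is asymmetry: promotion waits for `min_requests` so a quiet window cannot advance the rollout, while rollback triggers on the first degraded window, which favors speed over tolerance of noisy small samples.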