Question
Design a layer-7 (HTTP/2) load balancer that distributes 500k requests/sec across a backend pool of 2,000 stateful cache servers where requests for the same key should prefer the same backend to maximize cache hit rate. Targets: p99 added latency under 3ms, even load spread (no backend over ~1.2x the mean), and when backends are added/removed only a small fraction of keys should remap. Backends fail constantly at this scale, so detect a dead backend within a couple of seconds and stop sending it traffic without a stampede onto its neighbors. Walk through the balancing algorithm, the health-check model, and the main trade-off.
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.