Question
Design a request-coalescing and batching layer in front of a backend that is far more efficient at batch lookups than at single lookups (e.g. a database or a service where one call fetching 100 keys costs roughly the same as one call fetching 1 key, but per-call overhead is high). Requests arrive individually and concurrently, each for a single key. Describe how you coalesce concurrent in-flight requests for the same key and batch distinct keys into grouped backend calls, while bounding the added latency each caller pays and handling partial failures within a batch.
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
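
To make the cost premise of the question concrete, here is a minimal sketch of the backend cost model it describes: a high fixed cost per call plus a small marginal cost per key. The class `BackendCostModel`, the helper `total_cost_us`, and every number below are hypothetical, chosen only for illustration and not taken from any real backend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendCostModel:
    """Hypothetical cost model: fixed per-call overhead plus a small per-key cost."""

    per_call_overhead_us: int = 5000  # assumed: 5 ms paid on every backend call
    per_key_cost_us: int = 10         # assumed: 10 us for each key in the call

    def call_cost_us(self, num_keys: int) -> int:
        """Cost in microseconds of one backend call fetching num_keys keys."""
        return self.per_call_overhead_us + self.per_key_cost_us * num_keys


def total_cost_us(model: BackendCostModel, num_keys: int, batch_size: int) -> int:
    """Total backend cost of fetching num_keys keys in calls of at most batch_size."""
    full_batches, remainder = divmod(num_keys, batch_size)
    cost = full_batches * model.call_cost_us(batch_size)
    if remainder:
        # The last, partially filled batch still pays the full per-call overhead.
        cost += model.call_cost_us(remainder)
    return cost


model = BackendCostModel()
unbatched_us = total_cost_us(model, num_keys=100, batch_size=1)   # 100 single-key calls
batched_us = total_cost_us(model, num_keys=100, batch_size=100)   # one 100-key call
print(unbatched_us, batched_us)
```

Under these assumed numbers, 100 single-key calls cost far more than one 100-key call because the per-call overhead dominates. That gap is what the layer is meant to recover, at the price of the time each caller waits for a batch to fill.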