Question
Design a shared connection-pool / concurrency-permit coordinator that fairly multiplexes a hard cap of database connections (say 200) across dozens of service instances and many request types, so that one bursty endpoint can't exhaust the pool and starve the rest. Constraints: total in-flight DB connections must never exceed the cap, a slow query must not let callers pile up without bound, and waiting callers should be served fairly with a bounded timeout. Describe how permits are allocated, how concurrent callers coordinate, and the overload behavior.
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
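
To make the three constraints concrete, here is a minimal sketch of what the per-instance piece might look like. It is one possible shape, not a full answer. It assumes each instance has already been handed a fixed share of the global cap (for example, 25 instances splitting 200 connections statically would get 8 each), and it ignores how shares are leased or rebalanced across instances. Every name in it (`PermitLimiter`, `Rejected`, `per_class_max`, `max_waiters`) is hypothetical. It shows a per-request-type ceiling so one endpoint cannot take every permit, a bounded FIFO wait queue so callers cannot pile up behind a slow query, and a wait timeout after which the caller is shed.

```python
import threading
from collections import deque


class Rejected(Exception):
    """Caller was shed: wait queue full, or timed out waiting for a permit."""


class _Waiter:
    def __init__(self, cls):
        self.cls = cls
        self.event = threading.Event()
        self.granted = False


class PermitLimiter:
    """Hypothetical per-instance limiter; all names here are illustrative."""

    def __init__(self, total, per_class_max, max_waiters):
        self._lock = threading.Lock()
        self._total = total                  # this instance's share of the global cap
        self._per_class_max = per_class_max  # ceiling for any one request type
        self._max_waiters = max_waiters      # bound on queued callers
        self._in_use = 0
        self._in_use_by_class = {}
        self._waiters = deque()              # FIFO of _Waiter

    def _can_grant(self, cls):
        return (self._in_use < self._total
                and self._in_use_by_class.get(cls, 0) < self._per_class_max)

    def _grant(self, cls):
        self._in_use += 1
        self._in_use_by_class[cls] = self._in_use_by_class.get(cls, 0) + 1

    def acquire(self, cls, timeout):
        with self._lock:
            # Queued waiters are never grantable here (release() wakes every
            # grantable one), so taking the fast path does not jump the queue.
            if self._can_grant(cls):
                self._grant(cls)
                return
            if len(self._waiters) >= self._max_waiters:
                raise Rejected("wait queue full")
            waiter = _Waiter(cls)
            self._waiters.append(waiter)
        waiter.event.wait(timeout)
        with self._lock:
            if waiter.granted:      # also covers a grant that raced the timeout
                return
            self._waiters.remove(waiter)
            raise Rejected("timed out waiting for a permit")

    def release(self, cls):
        with self._lock:
            self._in_use -= 1
            self._in_use_by_class[cls] -= 1
            # Grant in arrival order, skipping waiters whose class is at its ceiling.
            remaining = deque()
            for waiter in self._waiters:
                if self._can_grant(waiter.cls):
                    self._grant(waiter.cls)
                    waiter.granted = True
                    waiter.event.set()
                else:
                    remaining.append(waiter)
            self._waiters = remaining
```

A caller would wrap each query in `limiter.acquire("checkout", timeout=0.25)` followed by `try: ... finally: limiter.release("checkout")`, and treat `Rejected` as a fast-fail overload signal. The trade-off in this sketch is that a fixed per-class ceiling wastes capacity when only one request type is busy. A fuller design would probably let classes borrow idle permits and would need a coordinator to reclaim shares from crashed instances.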