Question
During a flash sale, the app starts returning 500s and logs fill with "FATAL: sorry, too many clients already" and "remaining connection slots are reserved." Postgres `max_connections` is 200. The dashboard shows `pg_stat_activity` near 200 connections, but most are in state `idle in transaction` or plain `idle` rather than running queries, and DB CPU is low. The app runs 30 Kubernetes pods that just autoscaled from 8, each with its own connection pool sized at 20. There is no PgBouncer in front. A new feature shipped last week that opens a transaction early in a request handler. Triage and mitigate.
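The pool arithmetic implied by these numbers is worth making explicit. The following is a minimal sketch, not part of the original scenario: the function names are hypothetical, and `reserved=3` assumes the stock default for `superuser_reserved_connections`, which the scenario does not state.

```python
# Worst-case connection demand, using the numbers given in the scenario.
MAX_CONNECTIONS = 200  # Postgres max_connections (from the scenario)
POOL_SIZE = 20         # per-pod pool size (from the scenario)


def worst_case_connections(pods: int, pool_size: int = POOL_SIZE) -> int:
    """Connections the app can hold open if every pod fills its pool."""
    return pods * pool_size


def max_pool_size(pods: int,
                  max_connections: int = MAX_CONNECTIONS,
                  reserved: int = 3) -> int:
    """Largest per-pod pool that cannot exhaust the server.

    `reserved` is an assumption: slots held back for superusers
    (superuser_reserved_connections), assumed here to be the default of 3.
    """
    return (max_connections - reserved) // pods


before = worst_case_connections(8)    # 160: fits under 200
after = worst_case_connections(30)    # 600: three times the server limit
safe_pool = max_pool_size(30)         # 6 connections per pod at 30 pods
```

At 8 pods the pools could never exceed the server limit, so the same configuration that was safe before the autoscale becomes unsafe after it without any change to the database.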
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
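One way to turn "real signals" into a first mitigation step is to group sessions by state and pick out the ones stuck in `idle in transaction`. The sketch below works on rows already fetched from `pg_stat_activity` (its `pid`, `state`, and `state_change` columns), so it runs without a database. The `Session` type, both function names, and the 30-second threshold are assumptions made for illustration, not part of the scenario.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Session:
    """One row from pg_stat_activity, reduced to what triage needs."""
    pid: int
    state: str               # pg_stat_activity.state
    seconds_in_state: float  # now() - state_change, computed by the caller


def summarize(sessions: Iterable[Session]) -> Counter:
    """Count sessions per state to see where the slots are going."""
    return Counter(s.state for s in sessions)


def termination_candidates(sessions: Iterable[Session],
                           max_idle_in_txn_seconds: float = 30.0) -> List[int]:
    """PIDs that have sat in 'idle in transaction' longer than the threshold.

    The threshold is an assumed value; pick one that suits the workload.
    Plain 'idle' sessions are left alone here, since they are pooled
    connections waiting for work rather than abandoned transactions.
    """
    return sorted(
        s.pid
        for s in sessions
        if s.state == "idle in transaction"
        and s.seconds_in_state > max_idle_in_txn_seconds
    )
```

The returned PIDs could then be passed to `pg_terminate_backend(pid)` by an operator, and setting `idle_in_transaction_session_timeout` makes Postgres enforce a similar cutoff itself. Neither addresses the root cause, which still has to be separated from the symptom as described above.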