Code Room
On-call · Hard · oc-g420
Subject: Vacuum bloat · Level: Senior–Staff · ~40 min · Common in: Reliability & on-call interviews · Industries: Technology

Question

PagerDuty fires: Postgres is logging `WARNING: database "prod" must be vacuumed within 8400000 transactions`. `SELECT datname, age(datfrozenxid) FROM pg_database` shows your main DB at 1.94 billion and climbing. A few tables are sitting in the high hundreds of millions on `age(relfrozenxid)`. There's a huge append-only `events` table that nobody ever updates or deletes, and a couple of tables owned by a service that runs hours-long transactions. Autovacuum is enabled with defaults. You're roughly 6 hours from the 2-billion shutdown threshold if the rate holds. Triage and mitigate.
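The "roughly 6 hours" figure is plain rate arithmetic, and it is worth re-deriving during the incident rather than trusting once. The sketch below is a minimal illustration: the function names, the two sample values, and the 30-minute sampling interval are assumptions made for the example, not values from the scenario. Only the 8,400,000 remaining transactions comes from the warning text.

```python
def xid_burn_rate(age_then: int, age_now: int, seconds_between: float) -> float:
    """Estimate XIDs consumed per hour from two samples of age(datfrozenxid)."""
    if seconds_between <= 0:
        raise ValueError("samples must be taken at different times")
    return (age_now - age_then) * 3600.0 / seconds_between


def hours_of_runway(remaining_xids: int, xids_per_hour: float) -> float:
    """Hours until the remaining XIDs run out, assuming a constant burn rate."""
    if xids_per_hour <= 0:
        # No consumption observed between samples, so this estimate gives no deadline.
        return float("inf")
    return remaining_xids / xids_per_hour


# Hypothetical samples of age(datfrozenxid), taken 30 minutes apart.
rate = xid_burn_rate(1_940_000_000, 1_940_700_000, seconds_between=1800)

# 8_400_000 is the "must be vacuumed within" count from the WARNING.
runway = hours_of_runway(8_400_000, rate)
print(f"burn rate: {rate:,.0f} XIDs/hour, runway: {runway:.1f} hours")
```

Re-sampling every few minutes shows whether the runway is shrinking faster than expected, for example when a batch job starts, or growing once XID consumption is throttled.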

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
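One way to make "mitigate first" concrete is to turn the observed `age(relfrozenxid)` values into an ordered list of manual freeze commands, oldest first. The sketch below is illustrative only. The `TableAge` type, the `freeze_plan` helper, and every table name and age other than `events` are hypothetical. The 200,000,000 cutoff mirrors the default `autovacuum_freeze_max_age`. A manual freeze cannot advance `relfrozenxid` past the oldest snapshot still held open, so in practice the hours-long transactions would need to be dealt with alongside it.

```python
from dataclasses import dataclass


@dataclass
class TableAge:
    schema: str
    name: str
    xid_age: int  # value of age(relfrozenxid) for this table


def quote_ident(identifier: str) -> str:
    """Double-quote an SQL identifier, doubling any embedded quotes."""
    return '"' + identifier.replace('"', '""') + '"'


def freeze_plan(tables: list[TableAge], min_age: int = 200_000_000) -> list[str]:
    """Return VACUUM statements for tables at or above min_age, oldest first."""
    due = sorted(
        (t for t in tables if t.xid_age >= min_age),
        key=lambda t: t.xid_age,
        reverse=True,
    )
    return [
        f"VACUUM (FREEZE, VERBOSE) {quote_ident(t.schema)}.{quote_ident(t.name)};"
        for t in due
    ]


# Hypothetical ages; only the "events" table name comes from the scenario.
observed = [
    TableAge("public", "sessions", 150_000_000),
    TableAge("public", "events", 900_000_000),
    TableAge("public", "orders", 750_000_000),
]

for statement in freeze_plan(observed):
    print(statement)
```

Ordering by age puts the effort where it moves `datfrozenxid` the most, since the database-level value can only advance as far as its oldest table allows.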
