On-callHardoc-g188

Subject Disk fullLevel Senior–Staff~40 minCommon in Databases & SQL · Storage & CDN · Distributed systems interviewsIndustries Software development

Question

A primary PostgreSQL instance's data volume fills to 100% at 04:00 and the database refuses writes (it shuts down to protect itself). Disk dashboards show steady growth on pg_wal/ over the last 12 hours specifically — table data size is flat. A read replica went offline for maintenance at 16:00 yesterday and never reconnected, and a logical replication slot for a downstream analytics CDC pipeline shows a replication-lag/slot-retained-bytes metric climbing all night. Normal table churn is unchanged. Triage and recover.

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.

Learn the concepts

Diagram & narrate the incident

Loading whiteboard…

Run or narrate your approach, then ask the coach.