Question
Alert at 21:30: "DB host disk > 92% and climbing ~1%/min." The Postgres primary is on a 500GB volume. Writes are slowing and you're minutes from the disk filling, at which point Postgres will refuse writes and could crash. The disk dashboard shows the `pg_wal` directory growing fast — it's now 180GB, far above normal. Recent context: a replica went offline for maintenance 3 hours ago and hasn't reconnected, and someone set up a logical replication slot last week for a new analytics pipeline that's been failing silently. Table sizes themselves look normal. Triage and act fast.
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
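For a sense of the time budget implied by the alert, here is a rough back-of-the-envelope sketch in Python. It assumes growth stays linear at about 1%/min, which the alert only approximates, and the function names are illustrative rather than part of any tool.

```python
# Rough time-budget arithmetic from the alert figures.
# Assumption: disk usage keeps climbing linearly at the alerted rate.

def minutes_until_full(used_pct: float, growth_pct_per_min: float) -> float:
    """Minutes until the volume reaches 100% at a constant growth rate."""
    if growth_pct_per_min <= 0:
        return float("inf")  # not growing, so no deadline from this signal
    return max(0.0, (100.0 - used_pct) / growth_pct_per_min)


def free_gb(volume_gb: float, used_pct: float) -> float:
    """Free space remaining on the volume, in GB."""
    return volume_gb * (100.0 - used_pct) / 100.0


def wal_share_pct(wal_gb: float, volume_gb: float) -> float:
    """Share of the whole volume occupied by pg_wal, in percent."""
    return 100.0 * wal_gb / volume_gb


if __name__ == "__main__":
    # Figures taken from the alert: 500 GB volume, 92% used, ~1%/min, 180 GB of WAL.
    print(minutes_until_full(92, 1.0))  # 8.0
    print(free_gb(500, 92))             # 40.0
    print(wal_share_pct(180, 500))
```

With the numbers from the alert, that leaves roughly 8 minutes and about 40 GB of headroom, so any mitigation has to land well inside that window.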