Question
A billing pipeline consumes a Pub/Sub subscription and writes invoice line-items to Postgres. Finance reports two problems this week: (1) a few customers were *double-charged*, and (2) one batch of charges appears *missing*. Dashboards: the subscription's `ack_message_count` looks healthy, `oldest_unacked_message_age` is low. Recent context: the consumer was changed three days ago to ack the Pub/Sub message *first* and then write to Postgres (to 'reduce ack-deadline expirations'), and the Postgres write has no uniqueness constraint on `(invoice_id, line_item_id)`. There was a brief Postgres failover two days ago. Triage and explain both the duplicates and the loss.
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.