Question
RabbitMQ cluster, quorum queue `image-resize`. At 11:30 the `redelivered` message rate on the queue spikes to ~95% of all deliveries, the queue's `ready` count oscillates between 40k and 80k, and consumer `unacked` counts on each of the 8 workers sit pinned at their `prefetch` limit (50). Throughput (acks/sec) has fallen ~70%. No deploy. Recent context: the upstream object store the resize workers read from started returning intermittent 5s timeouts about 15 minutes before the alert; the broker's per-message `delivery-attempts`/x-death header counts are climbing. How do you triage and mitigate this redelivery storm?
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.