Question
Support tickets report customers charged twice for the same order over the last hour. Dashboards show three signals. First, the payment-service consumer group that processes the Kafka `order.placed` topic has rebalanced 3 times in 30 minutes. Second, per-message processing latency has risen, pushing the time between polls close to the `max.poll.interval.ms` ceiling. Third, `charge_created` counts at your payment provider are about 1.8x the `order.placed` counts for the same window. No code was deployed; earlier, a noisy neighbor caused GC pauses on two consumer pods. How do you confirm the cause, stop the double-charges now, and reconcile the customers who were already charged twice?
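To confirm that the extra `charge_created` events are redeliveries of orders that were already charged, rather than genuinely new orders, one approach is to group the provider's charges by order, list the orders with more than one charge, and check whether the extra charges cluster just after the rebalances. The same list can serve as a draft refund list for reconciliation. This is a minimal sketch. It assumes charges can be exported as records with a hypothetical `order_id` field, and that rebalance timestamps are available from consumer logs or metrics. The `Charge` type and all function names are illustrative, not a provider API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Charge:
    # Hypothetical shape of a charge record exported from the payment provider.
    charge_id: str
    order_id: str
    amount_cents: int
    created_at: float  # epoch seconds


def find_duplicate_charges(charges):
    """Return {order_id: [extra charges]} for orders charged more than once.

    The earliest charge per order is treated as legitimate; every later
    charge for the same order is a duplicate candidate to review and refund.
    """
    by_order = defaultdict(list)
    for c in charges:
        by_order[c.order_id].append(c)
    duplicates = {}
    for order_id, group in by_order.items():
        if len(group) > 1:
            ordered = sorted(group, key=lambda c: c.created_at)
            duplicates[order_id] = ordered[1:]
    return duplicates


def charge_to_order_ratio(charges, placed_order_ids):
    """Charges per placed order in the window (1.0 means no duplicates)."""
    placed = set(placed_order_ids)
    if not placed:
        return 0.0
    matched = sum(1 for c in charges if c.order_id in placed)
    return matched / len(placed)


def duplicates_near_rebalances(duplicates, rebalance_times, window_s=60.0):
    """Fraction of duplicate charges created within window_s after a rebalance.

    A value near 1.0 supports the hypothesis that rebalance-driven
    redelivery is causing the double charges.
    """
    extra = [c for group in duplicates.values() for c in group]
    if not extra:
        return 0.0
    near = sum(
        1
        for c in extra
        if any(0 <= c.created_at - t <= window_s for t in rebalance_times)
    )
    return near / len(extra)


if __name__ == "__main__":
    charges = [
        Charge("ch_1", "o1", 500, 10.0),
        Charge("ch_2", "o1", 500, 42.0),  # same order, charged again later
        Charge("ch_3", "o2", 900, 11.0),
    ]
    dups = find_duplicate_charges(charges)
    print({k: [c.charge_id for c in v] for k, v in dups.items()})
    print(charge_to_order_ratio(charges, ["o1", "o2"]))
    print(duplicates_near_rebalances(dups, [40.0], window_s=5.0))
```

The sketch keeps the earliest charge per order and flags the rest, which assumes each order should be charged exactly once. Orders with legitimate multiple charges (split payments, for example) would need a different rule before any refunds are issued.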
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
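Kafka consumers get at-least-once delivery by default. If a consumer does not call `poll()` within `max.poll.interval.ms` (for example, because of a long GC pause), it is removed from the group and its partitions are reassigned. Records whose offsets were not yet committed are then delivered again to another consumer. The usual prevention is to make the charge idempotent per order, so a redelivered `order.placed` event cannot produce a second charge.

This is a minimal sketch. It assumes a durable store with an atomic insert-if-absent keyed on order ID, modeled here by an in-memory dict behind a lock. It also assumes a hypothetical `charge_fn` that forwards a stable idempotency key to the payment provider. `IdempotentChargeHandler` and `charge_fn` are illustrative names, not an existing library.

```python
import threading
from typing import Callable, Dict


class IdempotentChargeHandler:
    """Charges each order at most once, even if its event is redelivered.

    `_claims` stands in for a durable table with a unique key on order_id
    (assumption: a real deployment needs storage shared by all consumer pods;
    an in-memory dict neither survives restarts nor spans processes).
    """

    def __init__(self, charge_fn: Callable[[str, int, str], str]):
        # Hypothetical provider call: charge_fn(order_id, amount_cents, key) -> charge_id
        self._charge_fn = charge_fn
        self._claims: Dict[str, str] = {}
        # Process-local lock; it serializes charges within this sketch only.
        self._lock = threading.Lock()

    def handle_order_placed(self, order_id: str, amount_cents: int) -> str:
        with self._lock:
            existing = self._claims.get(order_id)
            if existing is not None:
                # Redelivered event: do not charge again, return the earlier result.
                return existing
            # Key derived only from the order, so a retry that reaches the
            # provider carries the same key and can be deduplicated there too.
            key = f"order-charge-{order_id}"
            charge_id = self._charge_fn(order_id, amount_cents, key)
            self._claims[order_id] = charge_id
            return charge_id


if __name__ == "__main__":
    calls = []

    def fake_charge(order_id: str, amount_cents: int, key: str) -> str:
        calls.append(key)
        return f"ch_{len(calls)}"

    handler = IdempotentChargeHandler(fake_charge)
    first = handler.handle_order_placed("o1", 500)
    again = handler.handle_order_placed("o1", 500)  # same event after a rebalance
    print(first, again, len(calls))
```

Lowering `max.poll.records` or moving slow work off the poll loop reduces the chance of exceeding `max.poll.interval.ms`. Only idempotency, however, makes a redelivery harmless.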