Question
A Go request-fan-out service's RSS climbs ~120MB/hour and it gets OOMKilled roughly every 20 hours; restarting resets it and the climb begins again. Heap dashboards show in-use heap rising slowly, but the standout signal is that the `go_goroutines` metric rises monotonically — from ~300 at boot to tens of thousands by hour 18 — tracking RSS almost exactly. A pprof goroutine dump shows huge counts blocked on channel receive in one helper that fans out to several backends and waits for the first result. A change last week added a slow optional backend to that fan-out. Triage and mitigate.
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.