Question
A Java export service (limit 4Gi) that was rock-solid for months suddenly gets OOMKilled (exit 137) twice this afternoon, each time within seconds of a *single* request — not a slow climb. Memory dashboards show a flat ~1.2GB baseline with sudden vertical spikes to the limit right before each kill. There was no deploy. The access log shows that just before each crash, one new enterprise customer called the 'export all records' endpoint with no date filter, returning ~8 million rows that the service materializes fully into a list and serializes in memory before responding. Triage and mitigate.
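Some rough arithmetic shows why a single request is enough. A ~1.2GB baseline under a 4Gi limit leaves at most roughly 3 GB of headroom. In practice it is less, because the JVM heap cap sits below the container limit and off-heap memory also counts. Spread over ~8 million rows, that is under 400 bytes per row to cover the row object, its list slot, and its serialized form combined, and a record with a few String fields can easily exceed that. The Java sketch below contrasts the materialize-then-serialize pattern with a streaming export that writes each row as it is read and enforces a row cap. Everything in it is hypothetical and illustrative, not the service's actual code: `Row`, `syntheticRows` (a stand-in for a database cursor), `CountingWriter` (a stand-in for the HTTP response stream), the CSV output format, and the `maxRows` cap.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ExportSketch {

    // Hypothetical row type; the service's real entity is not described in the question.
    public record Row(long id, String name) {}

    // The pattern from the question: every row is held in a List, then the whole
    // serialized body is held again in a StringBuilder, so heap use grows with row count.
    public static String exportMaterialized(Iterator<Row> rows) {
        List<Row> all = new ArrayList<>();
        rows.forEachRemaining(all::add);
        StringBuilder body = new StringBuilder();
        for (Row r : all) {
            body.append(r.id()).append(',').append(r.name()).append('\n');
        }
        return body.toString();
    }

    // Streaming alternative: each row is written and then becomes garbage, so heap use
    // stays roughly flat regardless of row count. maxRows is a hypothetical cap that
    // fails the one oversized request instead of letting it exhaust the container.
    public static long exportStreaming(Iterator<Row> rows, Writer out, long maxRows) {
        long written = 0;
        try {
            while (rows.hasNext()) {
                if (written >= maxRows) {
                    throw new IllegalStateException(
                        "export exceeds " + maxRows + " rows; narrow it with a date filter");
                }
                Row r = rows.next();
                out.write(Long.toString(r.id()));
                out.write(',');
                out.write(r.name());
                out.write('\n');
                written++;
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return written;
    }

    // Lazily generates n synthetic rows, standing in for a database cursor.
    public static Iterator<Row> syntheticRows(long n) {
        return new Iterator<>() {
            private long cursor = 0;

            @Override public boolean hasNext() { return cursor < n; }

            @Override public Row next() {
                if (!hasNext()) throw new NoSuchElementException();
                long id = cursor++;
                return new Row(id, "customer-" + id);
            }
        };
    }

    // Writer that only counts characters, standing in for the HTTP response stream.
    public static final class CountingWriter extends Writer {
        public long chars = 0;

        @Override public void write(char[] buf, int off, int len) { chars += len; }
        @Override public void flush() {}
        @Override public void close() {}
    }

    public static void main(String[] args) {
        CountingWriter sink = new CountingWriter();
        long n = exportStreaming(syntheticRows(8_000_000), sink, 10_000_000);
        System.out.println(n + " rows, " + sink.chars + " chars streamed");
    }
}
```

Streaming bounds memory end to end only if two other conditions also hold. The data source must be read incrementally; for example, a JDBC `ResultSet` with a fetch size set through `Statement.setFetchSize`, because some drivers buffer the entire result by default. The web framework also must not buffer the full response body. The row cap is a blunt stopgap, but it turns an OOMKill that takes down every in-flight request into a single failed request.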
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.