Question
A Python (Gunicorn + Flask) recommendation API has workers whose RSS grows from 200 MB to 2 GB over ~6 hours, at which point the orchestrator kills them and in-flight requests are dropped. Each worker handles all routes. Dashboards show RSS climbing monotonically; `tracemalloc` top-stats taken an hour apart attribute the growth to a module-level dict used to memoize per-user feature vectors with a hand-rolled `@cache` decorator that has no eviction. Request volume is steady and the set of active users is bounded, but since last week user IDs carry a high-cardinality experiment-bucket suffix. Triage and fix.
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
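
For concreteness, the failure mode described above (an unbounded memo dict keyed on IDs that now carry a high-cardinality suffix) could be addressed along these lines. This is only a minimal sketch under stated assumptions: the names `bounded_cache`, `strip_bucket`, and `feature_vector` are hypothetical, the `"<user>:<bucket>"` ID format is invented for illustration, and it assumes the feature vector does not depend on the experiment bucket.

```python
from collections import OrderedDict
from functools import wraps
from threading import Lock


def bounded_cache(maxsize, key=lambda arg: arg):
    """Memoize a one-argument function with LRU eviction.

    `key` maps the raw argument to the cache key, so variants of the
    same logical ID can share a single entry.
    """
    def decorator(fn):
        entries = OrderedDict()
        lock = Lock()

        @wraps(fn)
        def wrapper(arg):
            k = key(arg)
            with lock:
                if k in entries:
                    entries.move_to_end(k)  # mark as most recently used
                    return entries[k]
            value = fn(arg)  # compute outside the lock
            with lock:
                entries[k] = value
                entries.move_to_end(k)
                while len(entries) > maxsize:
                    entries.popitem(last=False)  # evict least recently used
            return value

        # Expose the current size so it can be exported as a metric.
        wrapper.cache_len = lambda: len(entries)
        return wrapper
    return decorator


def strip_bucket(user_id):
    # Hypothetical ID format "<user>:<bucket>"; adjust to the real one.
    return user_id.split(":", 1)[0]


@bounded_cache(maxsize=50_000, key=strip_bucket)
def feature_vector(user_id):
    # Placeholder for the real (expensive) feature computation.
    return [float(len(strip_bucket(user_id)))]
```

The cap bounds memory even if a new high-cardinality dimension enters the key later, while the key normalization restores the hit rate that the suffix destroyed. The `maxsize` value here is a guess and would need sizing against the real active-user count and per-entry footprint. As a stopgap while such a change ships, Gunicorn's `max_requests` and `max_requests_jitter` settings can recycle workers gracefully before the orchestrator kills them.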