Question
A Python (FastAPI/uvicorn) reporting endpoint's CPU utilization climbs from 30% to 95% and its p99 latency goes from 200 ms to 9 s starting at 10:05, while request rate is flat and DB query time is unchanged. A py-spy flame graph shows ~85% of on-CPU time in JSON serialization and in a list-membership check (`if x in big_list`) nested inside a per-row loop. The only change this morning was a 'cosmetic' update that made each report include the full list of all related entities per row instead of just their IDs. The average number of rows per report didn't change, but the *per-row* payload and the related-entity list both got much bigger. Triage and mitigate.
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
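To make the membership hot spot concrete, here is a minimal sketch of the pattern the flame graph points at, next to one possible fix. The function names, the `entity_id` key, and the row shape are hypothetical and do not come from the actual service. It assumes the related entities can be reduced to hashable IDs. It addresses only the `in big_list` cost, not the JSON serialization cost, and it is a follow-up fix rather than the first mitigation (rolling back or flagging off the change comes first).

```python
def flag_rows_list(rows: list[dict], related_ids: list[int]) -> list[dict]:
    """Hypothetical 'before' shape: `in` on a list scans it for every row."""
    # Cost grows with len(rows) * len(related_ids).
    return [{**row, "related": row["entity_id"] in related_ids} for row in rows]


def flag_rows_set(rows: list[dict], related_ids: list[int]) -> list[dict]:
    """Possible fix: build a set once, then do hash lookups per row."""
    related = set(related_ids)  # one O(len(related_ids)) pass, outside the loop
    return [{**row, "related": row["entity_id"] in related} for row in rows]
```

Both functions return the same result, so the change can be checked against recorded report inputs before it ships. Whether the real loop can be restructured this way depends on what `big_list` actually holds.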