Question
A rolling deploy of the `catalog` API changes a list endpoint: `inventory_count` (integer), a field that was always present, is now OMITTED entirely for out-of-stock items (instead of being returned as 0) as a payload-size optimization. Web is fine. But during and after the rollout, the `cart` service (a SEPARATE backend caller, deployed weeks ago) starts treating some items as `inventory_count = null → unlimited`, which allows oversell, and the oversell rate roughly tracks the fraction of `catalog` pods on the new version. Dashboards show no errors (cart's deserializer defaults a missing field to its 'unlimited' sentinel) and a normal p99, while an oversell business metric climbs. Triage, explain why the oversell rate tracks the rollout, and mitigate.
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
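
To make the failure mode in the question concrete, here is a minimal Python sketch, not the real services. It assumes `cart` reads the field with a default and uses `None` as its 'unlimited' sentinel; the function names (`old_catalog_item`, `new_catalog_item`, `cart_allows`, `oversell_rate`) and the uniform random routing across pods are hypothetical and exist only for illustration.

```python
import random

# Hypothetical stand-in for cart's 'unlimited' sentinel (assumption).
UNLIMITED = None


def old_catalog_item(sku: str, stock: int) -> dict:
    """Old catalog version: inventory_count is always present, 0 when out of stock."""
    return {"sku": sku, "inventory_count": stock}


def new_catalog_item(sku: str, stock: int) -> dict:
    """New catalog version: inventory_count is omitted for out-of-stock items."""
    item = {"sku": sku}
    if stock > 0:
        item["inventory_count"] = stock
    return item


def cart_allows(payload: dict, qty: int) -> bool:
    """Cart-side check: a missing field silently becomes 'unlimited'.

    No exception is raised and no error is logged, which is why error
    dashboards stay flat while the business metric moves.
    """
    count = payload.get("inventory_count", UNLIMITED)
    return count is UNLIMITED or qty <= count


def oversell_rate(new_fraction: float, requests: int = 10_000, seed: int = 0) -> float:
    """Fraction of add-to-cart attempts on an out-of-stock item that succeed.

    Each request lands on a new-version pod with probability new_fraction,
    a simplified model of load balancing during a rolling deploy.
    """
    rng = random.Random(seed)
    oversold = 0
    for _ in range(requests):
        serve = new_catalog_item if rng.random() < new_fraction else old_catalog_item
        if cart_allows(serve("sku-1", 0), qty=1):
            oversold += 1
    return oversold / requests
```

Under these assumptions, only responses served by new-version pods can trigger the sentinel, so the oversell rate for out-of-stock items follows `new_fraction` as the rollout progresses. Calling `oversell_rate` at a few fractions between 0 and 1 shows the proportional relationship.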