Code Room
On-callMedium
Question
Your product page calls an upstream catalog service for pricing and availability. Starting 13:10, customers report pages rendering with blank prices and 'unavailable' badges on items that are in stock. Your service treats this as a soft success — no errors thrown, page still renders. Dashboards: upstream catalog calls return HTTP 200, latency is actually slightly LOWER than normal, your error rate is flat. The upstream team deployed a caching-layer change at 13:05. How do you triage and mitigate, and what would you change in your client?
What a strong answer looks like
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
Learn the concepts
Loading whiteboard…
Run or narrate your approach, then ask the coach.