Code Room
On-call | Hard | oc-g359
Subject: DDoS
Level: Senior–Staff
~40 min
Common in Reliability & on-call interviews
Industries: Software development, Technology

Question

Your product-search API origin CPU saturates and p99 triples, but your CDN cache-hit ratio quietly **collapsed** from 96% to 11% over an hour. Traffic volume at the edge is only up ~20% — not a volumetric flood. Inspection shows requests appending random query params (`?_=<rand>`, `&v=<uuid>`) to otherwise cacheable search URLs, busting the cache so every request hits origin. Sources are ~40k residential IPs, realistic UAs, each slow and human-paced; many requests carry a valid logged-in cookie. It correlates with a competitor's price-scraping season. How do you triage and mitigate without blocking real shoppers?

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
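
One possible "stop the bleeding" step for the cache-busting pattern in the question, as an added example that is not part of the original page: build the cache key from an allowlist of query parameters so `?_=<rand>` and `&v=<uuid>` collapse onto the cached entry, and flag clients whose requests mostly carry unknown parameters. The allowlisted names (`q`, `page`, `sort`, `category`), the client identifiers, the thresholds and the sample requests are assumptions constructed for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl, urlencode

# Assumption: the parameters the search API actually reads. Anything else is
# ignored by origin and must not become part of the cache key.
ALLOWED_PARAMS = {"q", "page", "sort", "category"}

def cache_key(url):
    """Return (normalized cache key, names of parameters outside the allowlist)."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = sorted((k, v) for k, v in pairs if k in ALLOWED_PARAMS)
    unknown = [k for k, _ in pairs if k not in ALLOWED_PARAMS]
    return parts.path + ("?" + urlencode(kept) if kept else ""), unknown

def summarize(requests, min_requests=3, bust_ratio=0.8):
    """requests: iterable of (client_id, url); client_id may be an account,
    session or IP. Returns (hit ratio keyed on raw URL, hit ratio keyed on
    normalized key, clients whose traffic mostly carries unknown parameters)."""
    seen_raw, seen_norm = set(), set()
    raw_hits = norm_hits = total = 0
    per_client = defaultdict(lambda: [0, 0])  # [requests, requests with unknown params]
    for client, url in requests:
        key, unknown = cache_key(url)
        total += 1
        raw_hits += url in seen_raw      # what a cache keyed on the full URL sees
        norm_hits += key in seen_norm    # what a cache keyed on the allowlist sees
        seen_raw.add(url)
        seen_norm.add(key)
        per_client[client][0] += 1
        per_client[client][1] += bool(unknown)
    suspects = sorted(c for c, (n, b) in per_client.items()
                      if n >= min_requests and b / n >= bust_ratio)
    return raw_hits / total, norm_hits / total, suspects

# Constructed sample requests (not real traffic).
SAMPLE = [
    ("shopper-1", "/search?q=lamp&page=1"),
    ("shopper-2", "/search?page=1&q=lamp"),
    ("scraper-a", "/search?q=lamp&page=1&_=83120"),
    ("scraper-a", "/search?q=lamp&page=1&v=6f1c2a9e"),
    ("scraper-a", "/search?q=lamp&page=1&_=55417"),
    ("shopper-1", "/search?q=desk&page=1"),
    ("scraper-a", "/search?q=desk&page=1&_=90211"),
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Normalizing the key restores hits for real shoppers without blocking anyone, while the suspect list is a starting point for rate limiting or challenges rather than a block list, since a flagged client may still be a legitimate logged-in account.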
