Question
At 12:00 your app's images, uploads, and even some page loads break in one region. Dashboards show calls to object storage (S3-style) in us-east-1 returning 503 'SlowDown' with elevated error rates, and the cloud provider's status page confirms a regional object-storage degradation. Your compute is healthy, but several services that synchronously read config/feature-flag JSON from that bucket on each request are now timing out and cascading. Recent context: one service was changed to read its feature-flag config from S3 on every request instead of caching it. How do you triage and mitigate?
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
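One way to stop the bleeding for the config reads is to take object storage off the per-request path: serve flags from memory, refresh at most once per TTL, and fall back to the last known good value (or built-in defaults) when a refresh fails. The sketch below is a minimal illustration under assumptions, not the service's actual code: `CachedFlags`, `fetch`, `defaults`, and the 30-second TTL are hypothetical names and values, and `fetch` stands in for whatever call reads the JSON from the bucket.

```python
import time
from typing import Any, Callable, Dict, Optional


class CachedFlags:
    """Serve feature flags from memory and refresh at most once per TTL.

    If a refresh fails, keep serving the last known good value (or the
    built-in defaults) instead of failing the request.
    """

    def __init__(
        self,
        fetch: Callable[[], Dict[str, Any]],   # hypothetical: reads the flag JSON
        defaults: Dict[str, Any],              # safe values used before any fetch succeeds
        ttl_seconds: float = 30.0,             # assumed refresh interval
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._value = dict(defaults)
        self._ttl = ttl_seconds
        self._clock = clock
        self._next_refresh: Optional[float] = None  # None means "refresh on first use"
        self.stale = False  # True while the last refresh attempt has failed

    def get(self) -> Dict[str, Any]:
        now = self._clock()
        if self._next_refresh is None or now >= self._next_refresh:
            # Schedule the next attempt before fetching, so a failing
            # backend is retried once per TTL rather than on every request.
            self._next_refresh = now + self._ttl
            try:
                self._value = self._fetch()
                self.stale = False
            except Exception:
                # Degraded storage: keep the previous value and flag it as stale.
                self.stale = True
        return self._value
```

The refresh in this sketch still runs inline on the request that triggers it, so `fetch` should carry a short timeout; moving the refresh to a background task would remove it from the request path entirely. The class is also not thread-safe as written. Exposing `stale` as a metric gives a real signal for how long the service has been running on old config.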