On-call · Medium · oc-g650

Subject: Object storage read-after-write consistency

Level: Mid–Senior · ~30 min

Common in: Databases & SQL · Storage & CDN interviews

Industries: Technology

Question

Your image-processing pipeline writes a derived thumbnail to an object store and immediately enqueues a job whose worker reads that object back for a second transform. On-call is paged: ~0.5% of jobs fail with 'object not found' or read a stale previous version, even though the write 'succeeded'. The dashboards show that the write success rate is 100% with no write-side errors; the read-back failures correlate with jobs where the read happens <200ms after the write; and the failures cluster on objects served from a specific replica region. A recent migration moved this bucket from a strongly consistent store to a different object store (or to a cross-region replicated bucket). How do you triage and fix these read-after-write consistency misses?

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
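As one illustration of the "mitigate first" step, a candidate might sketch a version-checked read with bounded backoff: the producer puts the version identifier it wrote (an ETag is assumed here) into the job payload, and the worker retries until that exact version is visible. This is a minimal sketch under stated assumptions, not a definitive fix: `fetch`, `read_expected_version`, and `StaleReadError` are hypothetical names, and `fetch` stands in for whatever client call the real object store provides.

```python
import time
from typing import Callable, Optional, Tuple


class StaleReadError(Exception):
    """The expected version never became visible within the retry budget."""


def read_expected_version(
    fetch: Callable[[str], Optional[Tuple[str, bytes]]],
    key: str,
    expected_etag: str,
    attempts: int = 5,
    base_delay_s: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Read `key`, retrying until the version the producer wrote is visible.

    `fetch` is a hypothetical adapter around the object-store client. It is
    assumed to return (etag, body), or None when the object is not found.
    """
    for attempt in range(attempts):
        result = fetch(key)
        if result is not None:
            etag, body = result
            if etag == expected_etag:
                return body  # the write has reached the replica we read from
        # Missing or stale: back off exponentially, but not after the last try.
        if attempt < attempts - 1:
            sleep(base_delay_s * (2 ** attempt))
    raise StaleReadError(
        f"{key}: version {expected_etag} not visible after {attempts} attempts"
    )


if __name__ == "__main__":
    # Simulated replica: not found, then the stale version, then the new one.
    responses = iter([None, ("v1", b"old"), ("v2", b"new")])
    body = read_expected_version(lambda key: next(responses), "thumb/1.jpg", "v2")
    print(body)
```

Checking the version rather than mere existence matters because the incident includes stale reads, not only 'object not found'. The retry only masks the replication lag; the durable fix is still to read from a strongly consistent endpoint or to trigger the job once the write is known to be visible in the region being read.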

