Question
Design a file-processing pipeline for a document/image platform handling ~200K uploads/day (avg 4 MB, up to 200 MB). Every upload must be virus-scanned; images additionally need a set of thumbnails generated, and documents need their text extracted for search. Uploads come from browsers and mobile clients. Requirements: the upload request returns fast (the user shouldn't wait for processing); processing should start as soon as possible but is best-effort and can take seconds to minutes; a scan failure must quarantine the file (never serve it); and the pipeline must survive a worker crash mid-job without losing or double-serving files.
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
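As a starting point for the scale clarification, the stated numbers can be turned into rough averages. The sketch below is back-of-envelope arithmetic only; the function name `estimate_load` and the 5x peak factor are assumptions made for illustration and are not given in the question.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_load(uploads_per_day: int, avg_size_mb: float, peak_factor: float) -> dict:
    """Convert daily upload volume into rough per-second figures.

    peak_factor is an assumed ratio of peak to average traffic; the
    question does not state one, so treat it as a placeholder to confirm.
    """
    avg_uploads_per_s = uploads_per_day / SECONDS_PER_DAY
    daily_ingest_gb = uploads_per_day * avg_size_mb / 1000  # decimal GB
    avg_ingest_mb_per_s = uploads_per_day * avg_size_mb / SECONDS_PER_DAY
    return {
        "avg_uploads_per_s": avg_uploads_per_s,
        "peak_uploads_per_s": avg_uploads_per_s * peak_factor,
        "daily_ingest_gb": daily_ingest_gb,
        "avg_ingest_mb_per_s": avg_ingest_mb_per_s,
    }


if __name__ == "__main__":
    # 200K uploads/day at 4 MB average come from the question;
    # the 5x peak factor is an assumption.
    load = estimate_load(uploads_per_day=200_000, avg_size_mb=4, peak_factor=5)
    for name, value in load.items():
        print(f"{name}: {value:.2f}")
```

With the question's numbers this works out to roughly 2.3 uploads/s and about 9.3 MB/s of ingest on average, or about 800 GB of new data per day. The averages are modest, so the 200 MB maximum file size and the actual peak-to-average ratio are likely the more useful things to ask about.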