Code Room
System design | Medium | sd-g499
Subject: Image processing
Level: Mid–Senior
Time: ~40 min
Common in: Algorithms & data structures interviews
Industries: Technology, Software development

Question

Design the near-duplicate detection and clustering system for a consumer photo-backup product that stores 100B photos. The product wants to (a) avoid re-storing the same photo when a user uploads it twice from different devices, and (b) group visually near-identical shots (burst frames, or the same scene re-shared or recompressed) so the UI can show one representative and the user can free up space. Exact-hash dedup misses re-encoded or resized copies, so you need perceptual matching at planet scale without an O(n^2) all-pairs comparison.
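
To make the scale concrete, here is a rough back-of-envelope calculation. The 100B figure comes from the question; the 64-bit hash size and the comparison throughput are assumptions picked only for illustration, not measurements.

```python
# Back-of-envelope numbers for the scale in the question.
N_PHOTOS = 100_000_000_000          # 100B photos, from the question
HASH_BYTES = 8                      # ASSUMPTION: one 64-bit perceptual hash per photo
COMPARES_PER_SEC = 1_000_000_000    # ASSUMPTION: 1e9 hash comparisons/sec on one core

# Number of unordered pairs an all-pairs comparison would have to examine.
all_pairs = N_PHOTOS * (N_PHOTOS - 1) // 2

# Raw size of the hash column alone (no keys, no replication, no index overhead).
index_bytes = N_PHOTOS * HASH_BYTES

# Time to brute-force every pair on a single core at the assumed throughput.
core_years = all_pairs / COMPARES_PER_SEC / (365 * 24 * 3600)

print(f"all-pairs comparisons:    {all_pairs:.2e}")
print(f"raw hash storage:         {index_bytes / 1e9:.0f} GB")
print(f"core-years for all pairs: {core_years:,.0f}")
```

Under these assumptions the all-pairs count is about 5 × 10^21 and the raw hashes take about 800 GB. The hashes themselves are cheap to store; it is the pairwise comparison that has to be avoided.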

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
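
As one illustration of going deep on a hard part, the sketch below shows a possible way to generate candidate pairs without comparing everything to everything. It assumes each photo already has a 64-bit perceptual hash, splits the hash into bands, buckets photos by band value, and only compares photos that share a bucket. The names `BANDS`, `MAX_HAMMING`, `band_keys`, and `cluster` are hypothetical, and the thresholds are placeholders; this is a toy in-memory version, not a production design.

```python
from collections import defaultdict
from itertools import combinations

BANDS = 4          # ASSUMPTION: split each 64-bit hash into 4 bands
BAND_BITS = 16     # 4 bands * 16 bits = 64 bits
MAX_HAMMING = 3    # ASSUMPTION: near-duplicate threshold in differing bits

def hamming(a: int, b: int) -> int:
    """Number of bit positions where the two hashes differ."""
    return bin(a ^ b).count("1")

def band_keys(h: int):
    """Yield (band_index, band_value) for each 16-bit band of the hash."""
    mask = (1 << BAND_BITS) - 1
    for i in range(BANDS):
        yield (i, (h >> (i * BAND_BITS)) & mask)

def cluster(hashes: dict[str, int]) -> list[set[str]]:
    """Group photo ids whose hashes are within MAX_HAMMING of each other."""
    # Bucket photos by each band value; only bucket-mates become candidates.
    buckets = defaultdict(list)
    for pid, h in hashes.items():
        for key in band_keys(h):
            buckets[key].append(pid)

    # Union-find over photo ids to merge verified near-duplicate pairs.
    parent = {pid: pid for pid in hashes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for ids in buckets.values():
        for a, b in combinations(ids, 2):
            if hamming(hashes[a], hashes[b]) <= MAX_HAMMING:
                parent[find(a)] = find(b)

    groups = defaultdict(set)
    for pid in hashes:
        groups[find(pid)].add(pid)
    return list(groups.values())

# Toy data: "b" differs from "a" by one bit; "c" is unrelated.
example = {
    "a": 0xF0F0_F0F0_F0F0_F0F0,
    "b": 0xF0F0_F0F0_F0F0_F0F1,
    "c": 0x0123_4567_89AB_CDEF,
}
clusters = cluster(example)
```

With 4 bands and a threshold of 3 bits, any two hashes within the threshold must agree exactly on at least one band (three differing bits can touch at most three bands), so this bucketing misses no qualifying pair. Trade-offs worth naming in an answer: very common band values create hot buckets, and transitive merging through union-find can chain together photos that are not close to each other directly.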
