System design · Hard · sd-g712

Subject: Object storage
Level: Senior–Staff
Time: ~45 min
Common in: Storage & CDN interviews
Industries: Technology

Question

Design the storage layer for a data lake holding petabytes of analytical data as columnar (Parquet) files on object storage, queried by Spark/Presto-style engines. Workloads are append-heavy ingestion plus large analytical scans that read a few columns across billions of rows, with occasional row-level updates/deletes (GDPR erasure). How do you organize files, make queries fast, and support mutation on an immutable object store?

What a strong answer looks like

Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts (data model, bottlenecks, consistency, failure modes) and name the trade-offs you are making.
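A minimal Python sketch of one widely used answer to the query-speed and mutation questions follows. Parquet files are treated as immutable, per-column min/max statistics are recorded when each file is written, and table versions are published as snapshots. Readers plan against a snapshot and skip files whose statistics cannot match the predicate. A GDPR erasure writes a delete vector (merge-on-read) instead of rewriting the file, and a writer commits by compare-and-swap on the catalog's current-snapshot pointer. Table formats such as Apache Iceberg and Delta Lake follow this general shape. However, every class and function here (DataFile, Snapshot, prune, delete_rows, live_row_count, Catalog) is hypothetical and in-memory, not any library's API, and the file paths are made up.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataFile:
    """Hypothetical model of an immutable Parquet file plus stats recorded at write time."""
    path: str
    row_count: int
    # column name -> (min, max); lets the planner skip files without opening them
    stats: dict


@dataclass(frozen=True)
class Snapshot:
    """One table version: the set of live data files and their delete vectors."""
    snapshot_id: int
    files: tuple
    # data-file path -> frozenset of deleted row positions (merge-on-read)
    deletes: dict = field(default_factory=dict)


def prune(snapshot, column, lo, hi):
    """Keep only files whose [min, max] for `column` can overlap the predicate [lo, hi]."""
    return [
        f for f in snapshot.files
        if f.stats[column][1] >= lo and f.stats[column][0] <= hi
    ]


def delete_rows(snapshot, path, positions):
    """Return a new snapshot that marks rows deleted; the data file is never rewritten."""
    deletes = dict(snapshot.deletes)  # copy so the old snapshot stays readable
    deletes[path] = deletes.get(path, frozenset()) | frozenset(positions)
    return Snapshot(snapshot.snapshot_id + 1, snapshot.files, deletes)


def live_row_count(snapshot, data_file):
    """Rows a reader sees after applying this snapshot's delete vector for the file."""
    return data_file.row_count - len(snapshot.deletes.get(data_file.path, frozenset()))


class Catalog:
    """Holds the pointer to the current snapshot; commits are compare-and-swap."""

    def __init__(self, snapshot):
        self.current = snapshot

    def commit(self, expected_id, new_snapshot):
        # Optimistic concurrency: reject the commit if another writer got there first.
        if self.current.snapshot_id != expected_id:
            return False
        self.current = new_snapshot
        return True


# Usage with two hypothetical files partitioned by date and sorted by timestamp.
files = (
    DataFile("dt=2024-01-01/part-0.parquet", 1000, {"ts": (100, 199)}),
    DataFile("dt=2024-01-01/part-1.parquet", 1000, {"ts": (200, 299)}),
)
base = Snapshot(0, files)
catalog = Catalog(base)

hits = prune(base, "ts", 150, 180)                  # only part-0 can match
erased = delete_rows(base, files[0].path, [3, 7])   # erase two rows
first = catalog.commit(0, erased)                   # succeeds
second = catalog.commit(0, erased)                  # stale base id, must retry
print([f.path for f in hits], first, second, live_row_count(erased, files[0]))
```

The trade-off shown is merge-on-read: an erasure is cheap to commit, but every scan must apply delete vectors until a compaction job rewrites the affected files (copy-on-write). The erased bytes physically leave object storage only after that rewrite and after old snapshots that still reference the original file are expired.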
