Code Room
System design · Hard
Question
Design a distributed file system for a data/analytics platform (GFS/HDFS-style) that backs large sequential reads. Constraints: files are mostly huge (GBs to TBs), write-once/append-mostly with concurrent appends, throughput matters more than latency, hundreds of PB across thousands of commodity nodes, and the system must tolerate frequent node failures. Cover the namespace, chunking, and replication.
What a strong answer looks like
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
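One way to make "clarify scale and constraints first" concrete is a quick back-of-envelope calculation. The sketch below is only illustrative: the 200 PB total, 5,000 nodes, 64 MiB chunk size, 3x replication, and 64 bytes of metadata per chunk are all assumed values picked to fit the "hundreds of PB across thousands of nodes" constraint, not figures given in the question.

```python
# Back-of-envelope sizing for a GFS/HDFS-style design.
# Every constant passed in below is an assumption for illustration only.
PB = 10**15
MiB = 2**20


def estimate(logical_bytes, chunk_bytes, replicas, meta_bytes_per_chunk, nodes):
    """Return rough chunk count, raw storage, and metadata footprint."""
    chunks = -(-logical_bytes // chunk_bytes)  # ceiling division
    raw_bytes = logical_bytes * replicas
    return {
        "chunks": chunks,
        "raw_bytes": raw_bytes,
        "metadata_bytes": chunks * meta_bytes_per_chunk,
        "raw_bytes_per_node": raw_bytes // nodes,
    }


# Hypothetical inputs: 200 PB logical data, 64 MiB chunks, 3 replicas,
# 64 bytes of chunk metadata, 5,000 storage nodes.
est = estimate(200 * PB, 64 * MiB, 3, 64, 5000)

if __name__ == "__main__":
    print(f"chunks:           {est['chunks']:,}")
    print(f"raw storage:      {est['raw_bytes'] / PB:.0f} PB")
    print(f"chunk metadata:   {est['metadata_bytes'] / 10**9:.1f} GB")
    print(f"raw per node:     {est['raw_bytes_per_node'] / 10**12:.0f} TB")
```

Under these assumed inputs the chunk metadata alone comes to roughly 190 GB, which is the kind of number that drives the namespace discussion: it suggests weighing a larger chunk size or a partitioned metadata service against a single in-memory master, and naming that trade-off explicitly.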