Code Room
System design · Medium
Question
A lakehouse table (Iceberg/Delta on S3) ingests data in near real time through tiny streaming micro-batch commits every few seconds, producing millions of small files and tens of thousands of metadata/snapshot entries per day. Trino queries that used to take seconds now take minutes, query planning alone is slow, S3 LIST/GET costs have ballooned, and writers occasionally hit commit conflicts. Design the table maintenance and layout strategy to restore fast queries without sacrificing the few-second ingest freshness.
What a strong answer looks like
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
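One way to start clarifying scale is a quick back-of-envelope estimate. The sketch below is illustrative only: the 3-second commit interval, 40 files per commit, 1 MiB average file size, and 512 MiB compaction target are assumed values chosen to land in the ranges the question describes ("tens of thousands" of snapshots, "millions" of files), not figures given in the prompt. The function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_footprint(commit_interval_s: float, files_per_commit: int) -> tuple[int, int]:
    """Return (commits_per_day, files_per_day) for a steady micro-batch writer."""
    commits = int(SECONDS_PER_DAY // commit_interval_s)
    return commits, commits * files_per_commit


def files_after_compaction(bytes_per_day: int, target_file_bytes: int) -> int:
    """Return how many files one day of data needs at the target file size."""
    # Ceiling division: a partial last file still counts as a file.
    return -(-bytes_per_day // target_file_bytes)


MIB = 2**20

# Assumed inputs (not from the prompt): 3 s commits, 40 files per commit,
# 1 MiB average file size, 512 MiB compaction target.
commits, files = daily_footprint(commit_interval_s=3, files_per_commit=40)
compacted = files_after_compaction(files * 1 * MIB, 512 * MIB)

print(f"snapshots/day: {commits}")        # 28800
print(f"small files/day: {files}")        # 1152000
print(f"files after compaction: {compacted}")  # 2250
```

Under these assumptions, compaction would cut the daily file count by roughly a factor of 500, which is the kind of number worth stating before discussing how often to compact and how to expire snapshots.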