Code Room
System design · Hard
Question
Design a centralized log aggregation system ingesting 2 TB/day of application logs (peak 60 MB/s) from 10,000 containers, with structured JSON and free-text lines mixed. Most queries are 'tail the last 15 minutes for service X' or 'grep this error string across service Y for the last 24h'; a minority are 30-day forensic searches. Cost matters more than millisecond query latency. Design ingest, storage, indexing, and how you make 30-day grep affordable.
What a strong answer looks like
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
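One way to start clarifying scale is a back-of-envelope calculation from the numbers in the question. The sketch below is illustrative only: the names `Workload` and `estimate` are hypothetical, decimal units (1 TB = 10^12 bytes) are assumed, and the 10x compression ratio is an assumption that the question does not state and that should be measured on real log samples.

```python
from dataclasses import dataclass

TB = 10**12          # decimal terabyte (assumption: the question uses decimal units)
MB = 10**6
SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class Workload:
    """Figures taken directly from the question."""
    daily_bytes: int = 2 * TB
    peak_bytes_per_s: int = 60 * MB
    containers: int = 10_000
    retention_days: int = 30


def estimate(w: Workload, compression_ratio: float = 10.0) -> dict:
    """Derive rough sizing numbers.

    compression_ratio is an ASSUMPTION, not given in the question.
    """
    avg_bytes_per_s = w.daily_bytes / SECONDS_PER_DAY
    raw_retained = w.daily_bytes * w.retention_days
    return {
        "avg_mb_per_s": avg_bytes_per_s / MB,
        "peak_to_avg": w.peak_bytes_per_s / avg_bytes_per_s,
        "per_container_mb_per_day": w.daily_bytes / w.containers / MB,
        "raw_retained_tb": raw_retained / TB,
        "compressed_retained_tb": raw_retained / compression_ratio / TB,
    }


if __name__ == "__main__":
    for name, value in estimate(Workload()).items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions the average ingest rate is about 23 MB/s, so the stated peak is roughly 2.6 times the average, and 30 days of retention is 60 TB raw. Those figures suggest sizing ingest buffers for the peak and scoping any index to cover far less than the full 60 TB.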