Question
Design a near-real-time lakehouse for IoT/telematics: ~3M connected vehicles emit GPS + sensor pings every few seconds (~hundreds of thousands of writes/sec), and the platform must serve both (a) a 'latest state per vehicle' lookup used by an ops dashboard, and (b) historical analytical queries over months of pings. Pings arrive out of order and late (a vehicle reconnects and dumps an hour of buffered pings). You need fresh latest-state and queryable history on object storage without a separate OLTP database. Design ingestion, the upsert path, and the layout.
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
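As a starting point for the scale clarification, here is a rough back-of-envelope sketch of the write rate implied by the numbers in the question. The fleet size comes from the question; the ping intervals (5 s and 10 s) are assumptions standing in for "every few seconds", and the function names are hypothetical helpers, not part of any library.

```python
# Back-of-envelope sizing for the telematics workload described above.
# FLEET_SIZE is taken from the question; the ping intervals are ASSUMED
# stand-ins for "every few seconds" and should be replaced with real values.

FLEET_SIZE = 3_000_000          # ~3M connected vehicles (from the question)
SECONDS_PER_DAY = 86_400
BUFFER_WINDOW_S = 3_600         # "an hour of buffered pings" (from the question)


def writes_per_second(fleet_size: int, ping_interval_s: float) -> float:
    """Steady-state pings/sec if every vehicle pings once per interval."""
    return fleet_size / ping_interval_s


def pings_per_day(fleet_size: int, ping_interval_s: float) -> float:
    """Total pings landed per day at the steady-state rate."""
    return writes_per_second(fleet_size, ping_interval_s) * SECONDS_PER_DAY


def buffered_pings(buffer_window_s: float, ping_interval_s: float) -> float:
    """Late pings one vehicle dumps after being offline for the window."""
    return buffer_window_s / ping_interval_s


if __name__ == "__main__":
    for interval in (5, 10):  # assumed ping intervals, in seconds
        print(
            f"interval={interval}s "
            f"rate={writes_per_second(FLEET_SIZE, interval):,.0f}/s "
            f"per_day={pings_per_day(FLEET_SIZE, interval):,.0f} "
            f"reconnect_burst={buffered_pings(BUFFER_WINDOW_S, interval):,.0f}"
        )
```

Under these assumed intervals the steady-state rate falls between 300,000 and 600,000 writes/sec, which is consistent with the "hundreds of thousands of writes/sec" stated in the question. The reconnect burst figure is per vehicle, so the late-data volume depends on how many vehicles reconnect at once, which the question does not specify.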