Question
Design the real-time analytics pipeline for a mobile app that aggregates per-minute active-user and event counts from clients that frequently go offline. Events carry a client-side event_time but can arrive minutes-to-hours late (subway rides, airplane mode, flaky networks); some arrive out of order within a single device. You must publish per-minute aggregates within ~10 seconds for a live ops dashboard, but the numbers must also eventually be correct when the stragglers land. How do you handle watermarks, lateness, and the inevitable correction?
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.
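To make the terms in the question concrete (event-time windows, watermark, late correction), here is a minimal in-memory sketch. It is an illustration under stated assumptions, not a reference design. The class `MinuteAggregator`, the `event_id` and `user_id` fields, the 10-second `WATERMARK_LAG_MS`, and the heuristic "watermark = max event_time seen minus a fixed lag" are all hypothetical choices that do not come from the question. The sketch also keeps all state forever and only advances the watermark when an event arrives, so it does not by itself meet a wall-clock publishing deadline.

```python
from collections import defaultdict

WINDOW_MS = 60_000         # one-minute tumbling windows keyed by event_time
WATERMARK_LAG_MS = 10_000  # assumption: watermark trails the max event_time seen


class MinuteAggregator:
    """Hypothetical single-process aggregator, for illustration only."""

    def __init__(self):
        self.event_counts = defaultdict(int)   # window_start -> event count
        self.active_users = defaultdict(set)   # window_start -> distinct user ids
        self.seen_event_ids = set()            # dedup for client retries
        self.versions = {}                     # window_start -> last published version
        self.max_event_time = 0

    def _snapshot(self, window_start, kind):
        # Every publication of a window carries a monotonically increasing
        # version so a downstream store can upsert and keep the latest one.
        version = self.versions.get(window_start, 0) + 1
        self.versions[window_start] = version
        return {
            "kind": kind,
            "window_start": window_start,
            "version": version,
            "events": self.event_counts[window_start],
            "active_users": len(self.active_users[window_start]),
        }

    def ingest(self, event_id, user_id, event_time_ms):
        """Apply one event; return the aggregate records it causes to be emitted."""
        if event_id in self.seen_event_ids:
            return []  # duplicate delivery, already counted
        self.seen_event_ids.add(event_id)

        # Bucket by the client-side event_time, not by arrival time.
        window_start = event_time_ms - event_time_ms % WINDOW_MS
        self.event_counts[window_start] += 1
        self.active_users[window_start].add(user_id)
        self.max_event_time = max(self.max_event_time, event_time_ms)
        watermark = self.max_event_time - WATERMARK_LAG_MS

        out = []
        # A straggler landing in an already-published window triggers a correction.
        if window_start in self.versions:
            out.append(self._snapshot(window_start, "correction"))
        # First publication of any window whose end the watermark has passed.
        for start in sorted(self.event_counts):
            if start not in self.versions and start + WINDOW_MS <= watermark:
                out.append(self._snapshot(start, "initial"))
        return out


if __name__ == "__main__":
    agg = MinuteAggregator()
    agg.ingest("e1", "u1", 5_000)
    agg.ingest("e2", "u2", 30_000)
    print(agg.ingest("e3", "u1", 75_000))  # watermark passes minute 0: initial publish
    print(agg.ingest("e4", "u3", 40_000))  # straggler for minute 0: versioned correction
```

The design choice worth noticing is that a correction re-emits the full aggregate for the window with a higher version rather than a delta, which keeps downstream writes idempotent. A real answer would still need to address state eviction, untrusted client clocks, and distinct-user counting at scale.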