Question
Design a footfall/dwell-time analytics pipeline: from a stream of anonymized device location pings, attribute visits to physical venues (a store, a mall) and compute metrics like visit count, average dwell time, and visit frequency per venue per day. Venues are polygons; ~500M pings/day; pings are sparse and noisy (a device pings irregularly, GPS drifts). The output feeds retail-analytics dashboards. Describe how you attribute pings to venues, how you reconstruct discrete 'visits' and dwell time from sparse pings, the streaming pipeline, and the accuracy vs noise trade-off.
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.