Turn a click you made one second ago into the next thing you see.
Open any feed — a video app, a shop, a music queue — and the order of items is not fixed. It is scored just for you, on the fly, from signals about what you just did. Tap a running-shoe video, linger on it for eight seconds, and within a heartbeat the next thing in line shifts to match.
Real-time personalization is the machinery behind that shift. It reads your live signals from a fast feature store, packs them into a feature vector, asks a lightweight ranking model to score every candidate item, and re-sorts the feed — all inside a tight latency budget of a few milliseconds. When a new signal lands, it does the whole thing again.
Three pieces, each tuned for speed. An online feature store holds fresh per-user signals keyed by user_id, answering in about a millisecond. A small ranking model (often gradient-boosted trees or a shallow net) takes the feature vector plus one candidate and emits a relevance score. A low-latency serving layer scores every candidate, sorts, and returns the top K. The candidates themselves come from a cheaper upstream retrieval step; ranking only re-orders that short list.
```python
# feature_store, model, and current_candidates are assumed to be defined elsewhere.
def rank_feed(user_id, candidates, K=5):
    feats = feature_store.get(user_id)  # online lookup, ~1ms
    scores = [model.score(feats, c) for c in candidates]
    ranked = sorted(zip(candidates, scores),
                    key=lambda cs: cs[1], reverse=True)
    return [c for c, s in ranked[:K]]  # top-K feed


# A fresh signal arrives -> write it, then re-rank with current candidates
def on_event(user_id, event):
    feature_store.update(user_id, event)  # e.g. click, dwell
    return rank_feed(user_id, current_candidates(user_id))
```
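To try the two functions above end to end, here is a minimal runnable sketch. Everything beyond `rank_feed` and `on_event` is a hypothetical stand-in rather than a real library: `InMemoryFeatureStore` is a plain dictionary, `TopicMatchModel` is a toy scorer, and the candidate data is made up. The two functions are repeated so the block runs on its own.

```python
from collections import defaultdict


class InMemoryFeatureStore:
    """Hypothetical stand-in for an online feature store, keyed by user_id."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def get(self, user_id):
        # Return a copy so callers cannot mutate stored state.
        return dict(self._rows[user_id])

    def update(self, user_id, event):
        # Overwrite the latest-value features carried by the event.
        self._rows[user_id].update(event)


class TopicMatchModel:
    """Toy scorer (an assumption): popularity plus a boost on topic match."""

    def score(self, feats, candidate):
        match = feats.get("recent_click") == candidate["topic"]
        return candidate["popularity"] + (0.5 if match else 0.0)


feature_store = InMemoryFeatureStore()
model = TopicMatchModel()

# Made-up short list, as if returned by the upstream retrieval step.
CANDIDATES = [
    {"id": "pasta", "topic": "cooking", "popularity": 0.6},
    {"id": "trail", "topic": "running", "popularity": 0.4},
    {"id": "desk", "topic": "tech", "popularity": 0.3},
]


def current_candidates(user_id):
    return CANDIDATES


def rank_feed(user_id, candidates, K=5):
    feats = feature_store.get(user_id)
    scores = [model.score(feats, c) for c in candidates]
    ranked = sorted(zip(candidates, scores),
                    key=lambda cs: cs[1], reverse=True)
    return [c for c, s in ranked[:K]]


def on_event(user_id, event):
    feature_store.update(user_id, event)
    return rank_feed(user_id, current_candidates(user_id))


before = [c["id"] for c in rank_feed("mara", current_candidates("mara"))]
after = [c["id"] for c in on_event("mara", {"recent_click": "running", "dwell_s": 12})]
print(before)  # ['pasta', 'trail', 'desk']
print(after)   # ['trail', 'pasta', 'desk']
```

A production feature store would add expiry, concurrency control, and a network hop; the dictionary here only mimics the `get` and `update` shape used above.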
| Decision | Cheaper / faster | Richer / slower |
|---|---|---|
| Feature freshness | Batch features, minutes old: trivial latency | Streaming features, seconds old: adds write + read cost on the hot path |
| Re-rank strategy | Incremental: nudge only affected scores | Full re-rank every event: simplest and correct, more compute per click |
| Candidate count | Small set (~50): low p99, may miss good items | Large set (~500): better recall, p99 climbs |
| Model size | GBDT / shallow net: sub-ms scoring | Deep model: often more accurate, usually needs batching and often a GPU |
| Where to score | Server-side: one place, easy to update | On-device: zero round-trip, harder to ship and debug |
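The re-rank strategy row can be made concrete with a small sketch. It assumes a toy scorer in which only candidates whose topic matches the old or new `recent_click` can change score; under that assumption, re-scoring just those candidates gives the same result as a full re-rank. The function names, the candidate data, and the scorer are all hypothetical. With a real model, where one feature can shift every score, the shortcut may not be safe.

```python
import heapq

# Made-up candidates with distinct popularity values (no ties).
CANDIDATES = [
    {"id": "pasta", "topic": "cooking", "popularity": 0.60},
    {"id": "soup", "topic": "cooking", "popularity": 0.20},
    {"id": "lofi", "topic": "music", "popularity": 0.55},
    {"id": "marathon", "topic": "running", "popularity": 0.45},
    {"id": "trail", "topic": "running", "popularity": 0.40},
    {"id": "desk", "topic": "tech", "popularity": 0.30},
]


def score(feats, cand):
    # Toy scorer (assumption): popularity plus a boost on topic match.
    boost = 0.5 if feats.get("recent_click") == cand["topic"] else 0.0
    return cand["popularity"] + boost


def top_k(scores, k):
    # heapq.nlargest avoids sorting the full list when k is small.
    return heapq.nlargest(k, scores, key=scores.get)


def full_rerank(feats, candidates):
    # Score every candidate from scratch.
    return {c["id"]: score(feats, c) for c in candidates}


def incremental_rerank(cached, old_feats, new_feats, candidates):
    # Only topics named by the old or new click can change score here.
    touched = {old_feats.get("recent_click"), new_feats.get("recent_click")}
    scores = dict(cached)
    rescored = 0
    for c in candidates:
        if c["topic"] in touched:
            scores[c["id"]] = score(new_feats, c)
            rescored += 1
    return scores, rescored


old_feats = {"recent_click": "cooking"}
new_feats = {"recent_click": "running"}

cached = full_rerank(old_feats, CANDIDATES)
inc_scores, rescored = incremental_rerank(cached, old_feats, new_feats, CANDIDATES)
full_scores = full_rerank(new_feats, CANDIDATES)

print(top_k(inc_scores, 3))  # ['marathon', 'trail', 'pasta']
print(rescored, "of", len(CANDIDATES), "candidates re-scored")
```

Here the incremental path re-scores four of six candidates and matches the full re-rank exactly. The saving grows with the candidate count, but so does the risk of stale cached scores if the "touched" rule is wrong.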
Mara opens her feed at 7:42 PM. The feature store returns recent_click=running shoes, dwell=8s, time=evening, follows=trail-running. The model scores five candidates and the feed settles into this order — a cooking clip on top, a trail-running video down at rank 4:
```text
before:  1. Easy weeknight pasta         0.61
         2. Lo-fi study mix              0.55
         3. City marathon recap          0.48
         4. Trail running in the Alps    0.41   <- candidate
         5. Desk setup tour              0.33
```
Then Mara taps the marathon recap and watches 12 seconds of it. That single event writes two fresh features — recent_click=running, dwell=12s — which line up with her follows=trail-running signal. On the next request the model re-scores, the trail-running video jumps, and the feed re-sorts:
```text
after:   1. Trail running in the Alps    0.88   <- rank 4 -> rank 1
         2. City marathon recap          0.79
         3. Lo-fi study mix              0.52
         4. Easy weeknight pasta         0.44
         5. Desk setup tour              0.30
```
No model retraining happened — only the feature vector changed. That is the whole trick: keep the model fixed and let fresh features move the order in milliseconds.
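A small sketch can show the same idea with numbers: the weights stay frozen and only the feature vector changes. The weights, the tags, and the feature layout below are invented for illustration, and the earlier shoe click is assumed to carry a `shoes` tag that matches none of the five candidates. The sketch reproduces the trail-running video's move from rank 4 to rank 1, but not the exact scores or the order of the other items in Mara's feed.

```python
# Frozen "model": a fixed weight per feature, never modified below.
# Order: popularity, click match, click match * dwell, follow match,
#        click match * follow match. All values are made up.
WEIGHTS = (1.0, 0.2, 0.3, 0.05, 0.1)

CANDIDATES = [
    {"title": "Easy weeknight pasta", "tags": {"cooking"}, "popularity": 0.61},
    {"title": "Lo-fi study mix", "tags": {"music"}, "popularity": 0.55},
    {"title": "City marathon recap", "tags": {"running"}, "popularity": 0.48},
    {"title": "Trail running in the Alps",
     "tags": {"running", "trail-running"}, "popularity": 0.41},
    {"title": "Desk setup tour", "tags": {"tech"}, "popularity": 0.33},
]


def feature_vector(feats, cand):
    # Pack user signals and one candidate into a numeric vector.
    click = 1.0 if feats["recent_click"] in cand["tags"] else 0.0
    follow = 1.0 if feats["follows"] in cand["tags"] else 0.0
    dwell = min(feats["dwell_s"], 30) / 30.0  # cap and normalise to [0, 1]
    return (cand["popularity"], click, click * dwell, follow, click * follow)


def score(feats, cand):
    # Dot product of the fixed weights with the feature vector.
    return sum(w * x for w, x in zip(WEIGHTS, feature_vector(feats, cand)))


def rank(feats, candidates):
    ordered = sorted(candidates, key=lambda c: score(feats, c), reverse=True)
    return [c["title"] for c in ordered]


feats_before = {"recent_click": "shoes", "dwell_s": 8, "follows": "trail-running"}
feats_after = {"recent_click": "running", "dwell_s": 12, "follows": "trail-running"}

before = rank(feats_before, CANDIDATES)
after = rank(feats_after, CANDIDATES)

print(before.index("Trail running in the Alps") + 1)  # 4
print(after.index("Trail running in the Alps") + 1)   # 1
```

`WEIGHTS` is identical for both calls; the only input that differs is the feature dictionary, which is what a feature-store write would change.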
1. A user clicks an item. The feed does not change at all, even on refresh. Where would you look first?
2. Your p50 latency is great, but p99 requests occasionally time out. What is the least disruptive fix?