Real-time personalization

Turn a click you made one second ago into the next thing you see.

The idea

Open any feed — a video app, a shop, a music queue — and the order of items is not fixed. It is scored just for you, on the fly, from signals about what you just did. Tap a running-shoe video, linger on it for eight seconds, and within a heartbeat the next thing in line shifts to match.

Real-time personalization is the machinery behind that shift. It reads your live signals from a fast feature store, packs them into a feature vector, asks a lightweight ranking model to score every candidate item, and re-sorts the feed — all inside a tight latency budget of a few milliseconds. When a new signal lands, it does the whole thing again.
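To make "packs them into a feature vector" concrete, here is a minimal sketch in Python. The signal names (recent_click, dwell_s, hour), the topic vocabulary, and the encoding choices are all illustrative assumptions, not a description of any particular system.

```python
import math

# Hypothetical topic vocabulary; a real system would derive it from its catalog.
TOPICS = ["running", "cooking", "music", "tech"]

def build_feature_vector(signals):
    """Pack raw per-user signals into a fixed-length list of floats."""
    # One-hot encode the most recently clicked topic (all zeros if unknown).
    topic = signals.get("recent_click")
    one_hot = [1.0 if t == topic else 0.0 for t in TOPICS]
    # Squash dwell time into [0, 1) so long dwells saturate instead of dominating.
    dwell = 1.0 - math.exp(-signals.get("dwell_s", 0.0) / 10.0)
    # Encode hour of day on a circle so 23:00 and 00:00 end up close together.
    angle = 2.0 * math.pi * signals.get("hour", 0) / 24.0
    return one_hot + [dwell, math.sin(angle), math.cos(angle)]

vec = build_feature_vector({"recent_click": "running", "dwell_s": 8.0, "hour": 19})
```

The vector always has the same length and ordering, which is what lets a fixed model consume it on every request.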

See it work

[Interactive demo: three panels (Live features, Ranking model, Feed) show one click becoming a re-ranked feed.]

How it works

Three pieces, each tuned for speed. An online feature store holds fresh per-user signals keyed by user_id, answering in about a millisecond. A small ranking model (often gradient-boosted trees or a shallow net) takes the feature vector plus one candidate and emits a relevance score. A low-latency serving layer scores every candidate, sorts, and returns the top K. The candidates themselves come from a cheaper upstream retrieval step; ranking only re-orders that short list.

def rank_feed(user_id, candidates, K=5):
    feats = feature_store.get(user_id)              # online lookup, ~1ms
    scores = [model.score(feats, c) for c in candidates]
    ranked = sorted(zip(candidates, scores),
                    key=lambda cs: cs[1], reverse=True)
    return [c for c, s in ranked[:K]]               # top-K feed

# A fresh signal arrives -> write it, then re-rank with current candidates
def on_event(user_id, event):
    feature_store.update(user_id, event)            # e.g. click, dwell
    return rank_feed(user_id, current_candidates(user_id))
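
The sketch above leaves feature_store, model, and current_candidates undefined. One way to run it end to end is with toy stand-ins, as below. The class names, the candidate fields (title, topic, popularity), and the 0.5 topic boost are hypothetical and chosen only for illustration. The two functions are repeated so the block runs on its own.

```python
class InMemoryFeatureStore:
    """Toy stand-in for an online feature store: a dict keyed by user_id."""
    def __init__(self):
        self._rows = {}

    def get(self, user_id):
        return dict(self._rows.get(user_id, {}))

    def update(self, user_id, event):
        self._rows.setdefault(user_id, {}).update(event)


class TopicMatchModel:
    """Toy stand-in for the ranker: base popularity plus a boost on topic match."""
    def score(self, feats, candidate):
        boost = 0.5 if feats.get("recent_click") == candidate["topic"] else 0.0
        return candidate["popularity"] + boost


feature_store = InMemoryFeatureStore()
model = TopicMatchModel()

CANDIDATES = [
    {"title": "Easy weeknight pasta", "topic": "cooking", "popularity": 0.6},
    {"title": "Trail running in the Alps", "topic": "running", "popularity": 0.4},
    {"title": "Desk setup tour", "topic": "tech", "popularity": 0.3},
]

def current_candidates(user_id):
    return CANDIDATES  # a real system would call the retrieval step here

def rank_feed(user_id, candidates, k=5):
    feats = feature_store.get(user_id)
    scored = [(model.score(feats, c), c) for c in candidates]
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return [c for _, c in scored[:k]]

def on_event(user_id, event):
    feature_store.update(user_id, event)
    return rank_feed(user_id, current_candidates(user_id))

before = [c["title"] for c in rank_feed("mara", CANDIDATES)]
after = [c["title"] for c in on_event("mara", {"recent_click": "running", "dwell_s": 12})]
```

With no stored features the order follows popularity alone. After the click event is written, the running video moves to the top without any change to the model.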

Trade-offs

| Decision | Cheaper / faster | Richer / slower |
|---|---|---|
| Feature freshness | Batch features, minutes old: trivial latency | Streaming features, seconds old: adds write + read cost on the hot path |
| Re-rank strategy | Incremental: nudge only affected scores | Full re-rank every event: simplest and correct, more compute per click |
| Candidate count | Small set (~50): low p99, may miss good items | Large set (~500): better recall, p99 climbs |
| Model size | GBDT / shallow net: sub-ms scoring | Deep model: more accurate, must batch and use a GPU |
| Where to score | Server-side: one place, easy to update | On-device: zero round-trip, harder to ship and debug |
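
The re-rank strategy row can be illustrated with a small sketch. It assumes a hypothetical scorer that depends on a single feature (recent_click), so the candidates "affected" by an event are easy to identify; the field names and the 0.5 boost are made up for the example.

```python
def score(feats, cand):
    # Hypothetical scorer: base popularity plus a boost when topics match.
    boost = 0.5 if feats.get("recent_click") == cand["topic"] else 0.0
    return cand["popularity"] + boost

def full_rerank(feats, candidates):
    """Re-score everything: simplest, cost grows with the candidate count."""
    scores = {c["id"]: score(feats, c) for c in candidates}
    return scores, len(candidates)

def incremental_rerank(cached, old_feats, new_feats, candidates):
    """Re-score only candidates whose topic matches the old or new click."""
    touched = {old_feats.get("recent_click"), new_feats.get("recent_click")}
    scores = dict(cached)
    rescored = 0
    for c in candidates:
        if c["topic"] in touched:
            scores[c["id"]] = score(new_feats, c)
            rescored += 1
    return scores, rescored

def order(scores):
    """Candidate ids sorted by score, highest first."""
    return sorted(scores, key=scores.get, reverse=True)

candidates = [
    {"id": "a", "topic": "cooking", "popularity": 0.6},
    {"id": "b", "topic": "running", "popularity": 0.4},
    {"id": "c", "topic": "tech", "popularity": 0.3},
    {"id": "d", "topic": "running", "popularity": 0.2},
]
old_feats = {"recent_click": "cooking"}
new_feats = {"recent_click": "running"}

cached, _ = full_rerank(old_feats, candidates)
inc_scores, inc_work = incremental_rerank(cached, old_feats, new_feats, candidates)
full_scores, full_work = full_rerank(new_feats, candidates)
```

Here the incremental path matches the full re-rank while scoring fewer candidates. That equivalence holds only because the toy scorer reads one feature; once features interact, working out which scores are affected gets harder, which is why the full re-rank is the safer default.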

Watch out for

Worked example

Mara opens her feed at 7:42 PM. The feature store returns recent_click=running shoes, dwell=8s, time=evening, follows=trail-running. The model scores five candidates and the feed settles into this order — a cooking clip on top, a trail-running video down at rank 4:

before:   1. Easy weeknight pasta        0.61
          2. Lo-fi study mix            0.55
          3. City marathon recap        0.48
          4. Trail running in the Alps  0.41   <- candidate
          5. Desk setup tour            0.33

Then Mara taps the marathon recap and watches 12 seconds of it. That single event writes two fresh features — recent_click=running, dwell=12s — which line up with her follows=trail-running signal. On the next request the model re-scores, the trail-running video jumps, and the feed re-sorts:

after:    1. Trail running in the Alps  0.88   <- rank 4 -> rank 1
          2. City marathon recap        0.79
          3. Lo-fi study mix            0.52
          4. Easy weeknight pasta       0.44
          5. Desk setup tour            0.30

No model retraining happened — only the feature vector changed. That is the whole trick: keep the model fixed and let fresh features move the order in milliseconds.
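
As a quick check of the rank movement, the sketch below takes the scores from the two listings above and computes each item's old and new rank. The helper names (ranks, rank_moves) are illustrative; the scores are copied from the listings, not produced by a model here.

```python
# Scores copied from the before/after listings in the worked example.
before = {
    "Easy weeknight pasta": 0.61,
    "Lo-fi study mix": 0.55,
    "City marathon recap": 0.48,
    "Trail running in the Alps": 0.41,
    "Desk setup tour": 0.33,
}
after = {
    "Trail running in the Alps": 0.88,
    "City marathon recap": 0.79,
    "Lo-fi study mix": 0.52,
    "Easy weeknight pasta": 0.44,
    "Desk setup tour": 0.30,
}

def ranks(scores):
    """Map each title to its 1-based rank, highest score first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {title: i for i, title in enumerate(ordered, start=1)}

def rank_moves(before, after):
    """Return {title: (old_rank, new_rank)} for every title in both lists."""
    old, new = ranks(before), ranks(after)
    return {t: (old[t], new[t]) for t in old if t in new}

moves = rank_moves(before, after)
for title, (old, new) in sorted(moves.items(), key=lambda kv: kv[1][1]):
    print(f"{new}. {title:<27} was rank {old}")
```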

Check yourself

1. A user clicks an item. The feed does not change at all, even on refresh. Where would you look first?

2. Your p50 latency is great but p99 occasionally times out. What is the least disruptive fix?