Real-time personalization

Turn a click you made one second ago into the next thing you see.

The idea

Open any feed — a video app, a shop, a music queue — and the order of items is not fixed. It is scored just for you, on the fly, from signals about what you just did. Tap a running-shoe video, linger on it for eight seconds, and within a heartbeat the next thing in line shifts to match.

Real-time personalization is the machinery behind that shift. It reads your live signals from a fast feature store, packs them into a feature vector, asks a lightweight ranking model to score every candidate item, and re-sorts the feed — all inside a tight latency budget of a few milliseconds. When a new signal lands, it does the whole thing again.
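
Packing live signals into a feature vector mostly means fixing the order and filling gaps. Here is a minimal Python sketch. It assumes signals arrive as a name-to-number dict. The feature names in `FEATURE_ORDER` and the helper `to_feature_vector` are hypothetical.

```python
from typing import Mapping

# Hypothetical fixed feature order: the ranking model expects values in exactly this order.
FEATURE_ORDER = ("dwell_seconds", "clicks_last_5min", "is_evening")


def to_feature_vector(signals: Mapping[str, float]) -> list:
    """Pack a dict of live signals into a fixed-order list of floats.

    Missing signals default to 0.0, so a sparse (for example brand-new)
    user still produces a vector of the length the model expects.
    """
    return [float(signals.get(name, 0.0)) for name in FEATURE_ORDER]


print(to_feature_vector({"dwell_seconds": 8, "is_evening": 1}))  # [8.0, 0.0, 1.0]
```

A zero default is the simplest choice. A real pipeline might use per-feature defaults learned offline instead.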

How it works

Three pieces, each tuned for speed. An online feature store holds fresh per-user signals keyed by user_id, answering in about a millisecond. A small ranking model (often gradient-boosted trees or a shallow net) takes the feature vector plus one candidate and emits a relevance score. A low-latency serving layer scores every candidate, sorts, and returns the top K. The candidates themselves come from a cheaper upstream retrieval step; ranking only re-orders that short list.

def rank_feed(user_id, candidates, K=5):
    # feature_store and model stand for the online store and ranking model above
    feats = feature_store.get(user_id)              # online lookup, ~1ms
    scores = [model.score(feats, c) for c in candidates]
    ranked = sorted(zip(candidates, scores),
                    key=lambda cs: cs[1], reverse=True)  # highest score first
    return [c for c, _ in ranked[:K]]               # top-K feed

# A fresh signal arrives -> write it, then re-rank with current candidates
def on_event(user_id, event):
    feature_store.update(user_id, event)            # e.g. click, dwell
    # current_candidates stands for the upstream retrieval step
    return rank_feed(user_id, current_candidates(user_id))
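
The functions above rely on a feature store, a model, and a retrieval step that are left undefined. The sketch below fills them with hypothetical in-memory stand-ins so the same ranking logic runs end to end. `InMemoryFeatureStore` and `TagAffinityModel` are illustrative and not a real library. `on_event` takes the candidate list directly instead of calling `current_candidates`. The store also drops values older than a TTL, which is one simple guard against stale features.

```python
import time


class InMemoryFeatureStore:
    """Hypothetical stand-in for an online feature store keyed by user_id.

    Each value is stored with its write time; get() drops anything older
    than ttl_seconds so stale signals never reach the model.
    """

    def __init__(self, ttl_seconds=3600.0):
        self.ttl_seconds = ttl_seconds
        self._rows = {}  # user_id -> {feature: (value, written_at)}

    def update(self, user_id, event):
        now = time.monotonic()
        row = self._rows.setdefault(user_id, {})
        for name, value in event.items():
            row[name] = (value, now)

    def get(self, user_id):
        now = time.monotonic()
        row = self._rows.get(user_id, {})
        return {name: value for name, (value, written_at) in row.items()
                if now - written_at <= self.ttl_seconds}


class TagAffinityModel:
    """Hypothetical scorer: item prior plus the user's affinity for each item tag."""

    def score(self, feats, item):
        return item["prior"] + sum(feats.get(tag, 0.0) for tag in item["tags"])


feature_store = InMemoryFeatureStore(ttl_seconds=3600)
model = TagAffinityModel()


def rank_feed(user_id, candidates, K=5):
    # Same logic as rank_feed above, now backed by the stand-ins.
    feats = feature_store.get(user_id)
    scores = [model.score(feats, c) for c in candidates]
    ranked = sorted(zip(candidates, scores), key=lambda cs: cs[1], reverse=True)
    return [c for c, _ in ranked[:K]]


def on_event(user_id, event, candidates):
    feature_store.update(user_id, event)  # write the fresh signal first
    return rank_feed(user_id, candidates)  # then re-rank with it


items = [{"id": "pasta", "prior": 0.5, "tags": ["cooking"]},
         {"id": "trail", "prior": 0.2, "tags": ["running", "outdoors"]}]
print([c["id"] for c in rank_feed("mara", items)])                    # ['pasta', 'trail']
print([c["id"] for c in on_event("mara", {"running": 0.6}, items)])  # ['trail', 'pasta']
```

Nothing about the model changes between the two calls. The single written signal (`running = 0.6`) lifts the trail item from 0.2 to 0.8, which is enough to pass the pasta item's 0.5.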

Trade-offs

Decision	Cheaper / faster	Richer / slower
Feature freshness	Batch features, minutes old: trivial latency	Streaming features, seconds old: adds write + read cost on the hot path
Re-rank strategy	Incremental: nudge only affected scores	Full re-rank every event: simplest and correct, more compute per click
Candidate count	Small set (~50): low p99, may miss good items	Large set (~500): better recall, p99 climbs
Model size	GBDT / shallow net: sub-ms scoring	Deep model: more accurate, usually needs request batching and a GPU to stay in budget
Where to score	Server-side: one place, easy to update	On-device: zero round-trip, harder to ship and debug
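
The incremental strategy in the table works like this: keep the previous pass's scores and re-score only the candidates a new event could affect. This sketch assumes that each item's score depends only on the features named in its `tags`. The function name, the `tags` field, and the scoring rule in the usage are all hypothetical. If anything else feeds the score (time of day, for example), cached scores silently drift. That risk is why a full re-rank is the simple, correct baseline.

```python
def incremental_rerank(scores, candidates, changed_features, score_fn):
    """Re-score only candidates an event could affect, then re-sort all of them.

    scores: dict item_id -> score cached from the previous pass (updated in place).
    candidates: dict item_id -> item; item["tags"] names the features its score
                depends on (an assumption of this sketch).
    changed_features: set of feature names the new event wrote.
    score_fn: item -> float, computed from the current feature values.
    """
    for item_id, item in candidates.items():
        if changed_features & set(item["tags"]):
            scores[item_id] = score_fn(item)  # only affected items pay to be scored
    return sorted(scores, key=scores.get, reverse=True)


features = {"running": 0.0, "cooking": 0.3}
catalog = {
    "pasta": {"prior": 0.3, "tags": ["cooking"]},
    "trail": {"prior": 0.2, "tags": ["running"]},
    "desk": {"prior": 0.1, "tags": ["tech"]},
}


def score(item):
    return item["prior"] + sum(features.get(t, 0.0) for t in item["tags"])


scores = {item_id: score(item) for item_id, item in catalog.items()}  # one full pass
features["running"] = 0.9  # a click on a running video lands
print(incremental_rerank(scores, catalog, {"running"}, score))  # ['trail', 'pasta', 'desk']
```

Only `trail` is re-scored here. The saving grows with the candidate count, but so does the cost of getting the dependency tracking wrong.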

Watch out for

Stale features. If the feature store still serves yesterday's behavior, you personalize for a person who no longer exists. A click that never reaches the hot path is a click that never re-ranks the feed.
Training/serving skew. The features computed offline for training must match the ones computed live at serving time in name, units, and timing. A silent mismatch (seconds vs. milliseconds, log-scaled vs. raw) quietly wrecks scores.
Feedback loops and filter bubbles. Showing more of what was clicked teaches the model to show even more of it. Without exploration or a diversity term, the feed narrows until it only mirrors the past.
Cold-start users. A brand-new user has an almost-empty feature vector. Lean on context (time of day, locale, trending items) and popularity priors until real signals accumulate.
Tail latency blowing the budget. The average request can be fast while the slow tail (p99) times out. Cap the candidate count, set a deadline on the feature read, and serve a default order on timeout rather than blocking the page.
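
Setting a deadline on the feature read and falling back to a default order can look like the sketch below. It uses Python's standard `concurrent.futures`. The hooks `fetch_features`, `score`, and `popularity` and the 5 ms deadline are hypothetical parameters. The same popularity fallback also covers cold-start users whose feature vector comes back empty.

```python
import concurrent.futures
import time

# One long-lived pool. A per-request pool used in a `with` block would wait
# for the slow read on exit, which defeats the deadline.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def rank_with_deadline(user_id, candidates, fetch_features, score, popularity,
                       deadline_s=0.005, k=5):
    """Rank candidates without ever waiting past deadline_s for features.

    fetch_features(user_id) -> dict and score(feats, item) -> float are
    hypothetical hooks. popularity(item) -> float is the fallback prior,
    used both on timeout and for cold-start users with no features yet.
    """
    future = _POOL.submit(fetch_features, user_id)
    try:
        feats = future.result(timeout=deadline_s)
    except concurrent.futures.TimeoutError:
        feats = None  # the read is still running; stop waiting for it
    if not feats:  # timed out, or an empty (cold-start) feature vector
        return sorted(candidates, key=popularity, reverse=True)[:k]
    return sorted(candidates, key=lambda c: score(feats, c), reverse=True)[:k]


# Usage: a feature read that takes 50 ms against a 5 ms deadline.
def slow_fetch(user_id):
    time.sleep(0.05)
    return {"running": 1.0}


items = [{"id": "pasta", "pop": 0.9, "tags": ["cooking"]},
         {"id": "trail", "pop": 0.2, "tags": ["running"]}]
order = rank_with_deadline("mara", items, slow_fetch,
                           score=lambda f, c: sum(f.get(t, 0.0) for t in c["tags"]),
                           popularity=lambda c: c["pop"])
print([c["id"] for c in order])  # ['pasta', 'trail']: popularity fallback
```

The abandoned read keeps running in the pool. In production you would also bound the pool and cancel or time out the underlying client call.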
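
For feedback loops, one common form of diversity term is a greedy re-rank that discounts items from categories already placed higher in the feed. The sketch below shows that general technique, not necessarily what any given production system uses. `diversify`, `category_of`, and the `penalty` value are illustrative assumptions.

```python
def diversify(ranked, category_of, penalty=0.5):
    """Greedy re-rank that discounts items whose category is already placed.

    ranked: list of (item, score) pairs from the ranking model.
    category_of: function item -> category label (hypothetical helper).
    penalty: fraction of score lost for each earlier item of the same
             category; 0.0 keeps the model's order unchanged.
    """
    remaining = list(ranked)
    placed = {}  # category -> how many items of it are already in the feed
    feed = []

    def adjusted(pair):
        item, score = pair
        return score * (1.0 - penalty) ** placed.get(category_of(item), 0)

    while remaining:
        best = max(remaining, key=adjusted)  # first maximum wins ties
        remaining.remove(best)
        feed.append(best[0])
        cat = category_of(best[0])
        placed[cat] = placed.get(cat, 0) + 1
    return feed


scores = [("run-1", 0.90), ("run-2", 0.85), ("run-3", 0.80), ("pasta", 0.60)]
category = {"run-1": "running", "run-2": "running", "run-3": "running", "pasta": "cooking"}
print(diversify(scores, category.get))  # ['run-1', 'pasta', 'run-2', 'run-3']
```

Without the penalty the three running items would fill the top three slots. With it, the second running item's 0.85 is halved to 0.425, so the 0.60 pasta item breaks the streak.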

Worked example

Mara opens her feed at 7:42 PM. The feature store returns recent_click=running shoes, dwell=8s, time=evening, follows=trail-running. The model scores five candidates and the feed settles into this order — a cooking clip on top, a trail-running video down at rank 4:

before:   1. Easy weeknight pasta        0.61
          2. Lo-fi study mix            0.55
          3. City marathon recap        0.48
          4. Trail running in the Alps  0.41   <- watch this one
          5. Desk setup tour            0.33

Then Mara taps the marathon recap and watches 12 seconds of it. That single event writes two fresh features — recent_click=running, dwell=12s — which line up with her follows=trail-running signal. On the next request the model re-scores, the trail-running video jumps, and the feed re-sorts:

after:    1. Trail running in the Alps  0.88   <- rank 4 -> rank 1
          2. City marathon recap        0.79
          3. Lo-fi study mix            0.52
          4. Easy weeknight pasta       0.44
          5. Desk setup tour            0.30

No model retraining happened — only the feature vector changed. That is the whole trick: keep the model fixed and let fresh features move the order in milliseconds.
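
The same effect can be reproduced with a toy linear scorer whose weights never change. This is a sketch only. The weights, priors, per-item match strengths, and signal values are invented so that the trail-running video moves from rank 4 to rank 1 as in Mara's example. The scores it produces are not the ones shown above.

```python
# Fixed, hypothetical model: one weight per signal, never retrained below.
WEIGHTS = {"running": 0.5, "trail-running": 0.4, "evening": 0.1}

# Hypothetical priors and per-item match strengths (tag -> 0..1).
ITEMS = {
    "Easy weeknight pasta":      {"prior": 0.55, "tags": {"cooking": 1.0, "evening": 1.0}},
    "Lo-fi study mix":           {"prior": 0.45, "tags": {"music": 1.0, "evening": 1.0}},
    "City marathon recap":       {"prior": 0.35, "tags": {"running": 0.7}},
    "Trail running in the Alps": {"prior": 0.15, "tags": {"running": 1.0, "trail-running": 1.0}},
    "Desk setup tour":           {"prior": 0.25, "tags": {"tech": 1.0}},
}


def score(features, item):
    """Prior plus weight * user signal * item match, summed over the item's tags."""
    return item["prior"] + sum(WEIGHTS.get(tag, 0.0) * features.get(tag, 0.0) * match
                               for tag, match in item["tags"].items())


def ranking(features):
    return sorted(ITEMS, key=lambda title: score(features, ITEMS[title]), reverse=True)


before = {"evening": 1.0, "trail-running": 0.2, "running": 0.1}
after = {**before, "running": 1.0}  # the marathon click: only this signal changes

for rank, title in enumerate(ranking(before), start=1):
    print("before", rank, title)
for rank, title in enumerate(ranking(after), start=1):
    print("after ", rank, title)
```

`WEIGHTS` and `score` are identical in both calls. Only the feature dict differs. The trail video gains from both its strong `running` match and the standing `trail-running` signal, so it overtakes the marathon recap once `running` jumps.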

Check yourself

1. A user clicks an item. The feed does not change at all, even on refresh. Where would you look first?

2. Your p50 latency is great but p99 occasionally times out. What is the calmer fix?