Question
Design the embedding-versioning and backfill system for a recommendations platform with 800M item embeddings and 400M user embeddings, where embeddings from the item encoder and the user encoder must live in the same vector space to be comparable. The ML team ships a new encoder roughly monthly. The hard constraint: a v5 user embedding is meaningless against a v4 item embedding, and you cannot freeze traffic for the hours or days it takes to re-embed 1.2B entities. Walk through how you version embeddings, run the backfill, and switch retrieval over to the new space without ever mixing versions in a single query.
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.