Online Learning Feedback Loops

When a recommendation algorithm becomes a self-fulfilling prophecy.

The idea

A video recommendation algorithm learns what you like by looking at what you click. If it predicts you like Cat videos, it shows you Cat videos. You click them (because they are the only thing on the screen), and the model updates itself: "Wow, I was right, they love Cats!" This is a Feedback Loop: the model trains on click data that its own recommendations produced. Over time, the model becomes overconfident and narrows your recommendations into a tiny echo chamber, completely unaware that you might have also loved Dog videos if it had ever bothered to show you one.

Step 1: The model thinks you like Cats slightly more than Dogs (55% to 45%).
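
To see how the loop can play out from the 55% to 45% starting point in Step 1, here is a minimal simulation sketch. It assumes a purely greedy recommender with only two categories and a made-up user who would actually click Dog videos more often. Every name and number in it (TRUE_CLICK_RATE, simulateGreedyLoop, the prior of 20 impressions per category) is an illustrative assumption, not part of any real system.

```javascript
// Hypothetical user: actually clicks Dog videos more often than Cat videos.
const TRUE_CLICK_RATE = { cats: 0.6, dogs: 0.9 };

// Model's starting belief, encoded as pseudo-counts: 11/20 = 55%, 9/20 = 45%.
function initialBelief() {
  return {
    cats: { shown: 20, clicks: 11 },
    dogs: { shown: 20, clicks: 9 },
  };
}

// Estimated click rate for one category.
function estimate(stats) {
  return stats.clicks / stats.shown;
}

// Purely greedy loop: always show the category with the highest estimate,
// then update that category's counts with the observed click.
function simulateGreedyLoop(rounds, rng = Math.random) {
  const belief = initialBelief();
  for (let i = 0; i < rounds; i++) {
    const shown = estimate(belief.cats) >= estimate(belief.dogs) ? "cats" : "dogs";
    const clicked = rng() < TRUE_CLICK_RATE[shown];
    belief[shown].shown += 1;
    if (clicked) belief[shown].clicks += 1;
  }
  return belief;
}

const result = simulateGreedyLoop(1000);
console.log("Cat estimate:", estimate(result.cats).toFixed(2));
console.log("Dog estimate:", estimate(result.dogs).toFixed(2));
console.log("Dog videos shown by the model:", result.dogs.shown - 20);
```

In most runs the Cat estimate drifts toward this user's true Cat click rate while Dog videos are never shown, so the Dog estimate stays frozen at its initial 45% even though the user would click Dogs more often.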

How it works (Exploration vs Exploitation)

To break the feedback loop, systems typically add deliberate exploration, using a Multi-Armed Bandit strategy such as Epsilon-Greedy. The algorithm is forced to make intentionally "sub-optimal" recommendations a small percentage of the time (e.g., 5% of the time, show a completely random video). Because these impressions are not picked by the model's own predictions, the resulting clicks are far less biased feedback, allowing the model to discover new interests and correct its own biases.

// Breaking the Feedback Loop (Epsilon-Greedy)
// Assumes getRandomVideo() and mlModel are defined elsewhere in the application.

const EPSILON = 0.05; // 5% exploration rate

function getRecommendations(user) {
    if (Math.random() < EPSILON) {
        // EXPLORE: Show something completely random.
        // Whether or not the user clicks, the model learns something NEW.
        return getRandomVideo();
    }
    // EXPLOIT: Show what the model thinks is best.
    // This maximizes short-term clicks.
    return mlModel.predictBestVideo(user);
}
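
The function above only chooses what to show; the loop is broken only once the outcome is fed back into the model. The following self-contained sketch pairs epsilon-greedy selection with a simple running click-rate estimate per category. The category names, the true click rates, and the helper names (chooseCategory, runEpsilonGreedy) are hypothetical, and a production recommender would use a far richer model.

```javascript
// Hypothetical user whose true preference (Dogs) differs from the model's prior.
const TRUE_RATES = { cats: 0.6, dogs: 0.9 };

// Pick a category: explore uniformly with probability epsilon, else exploit.
function chooseCategory(stats, epsilon, rng) {
  const categories = Object.keys(stats);
  if (rng() < epsilon) {
    const index = Math.floor(rng() * categories.length);
    return { category: categories[index], explored: true };
  }
  let best = categories[0];
  for (const c of categories) {
    if (stats[c].clicks / stats[c].shown > stats[best].clicks / stats[best].shown) {
      best = c;
    }
  }
  return { category: best, explored: false };
}

// Run the recommend -> observe click -> update loop.
function runEpsilonGreedy(rounds, epsilon, rng = Math.random) {
  // Prior pseudo-counts: the model starts out believing 55% Cats vs 45% Dogs.
  const stats = {
    cats: { shown: 20, clicks: 11 },
    dogs: { shown: 20, clicks: 9 },
  };
  let explorations = 0;
  for (let i = 0; i < rounds; i++) {
    const { category, explored } = chooseCategory(stats, epsilon, rng);
    if (explored) explorations += 1;
    stats[category].shown += 1;
    if (rng() < TRUE_RATES[category]) stats[category].clicks += 1;
  }
  return { stats, explorations };
}

const { stats, explorations } = runEpsilonGreedy(5000, 0.05);
console.log("Exploration impressions:", explorations);
console.log("Cats shown:", stats.cats.shown - 20, "Dogs shown:", stats.dogs.shown - 20);
```

With enough rounds, the occasional random Dog impressions typically lift the Dog estimate above the Cat estimate, after which the exploit branch starts recommending Dogs as well.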

Cost

Exploration has a real monetary cost. By intentionally showing random or "worse" videos 5% of the time, you should expect a slight drop in immediate click-through rates and ad revenue today. You are trading short-term profits for long-term model health and user retention, since users who get bored of the echo chamber eventually leave.
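
The size of that trade-off can be estimated with simple arithmetic: the blended click-through rate is a weighted average of the exploit rate and the random rate. The sketch below uses made-up rates purely for illustration; expectedCtr and all three input values are assumptions, not measurements.

```javascript
// Expected click-through rate when a fraction `epsilon` of impressions is random.
function expectedCtr(epsilon, exploitCtr, randomCtr) {
  return (1 - epsilon) * exploitCtr + epsilon * randomCtr;
}

// Hypothetical numbers, chosen only for illustration.
const exploitCtr = 0.10; // CTR of the model's best guess
const randomCtr = 0.02;  // CTR of a uniformly random video
const epsilon = 0.05;    // share of impressions spent on exploration

const blended = expectedCtr(epsilon, exploitCtr, randomCtr);
const relativeDrop = (exploitCtr - blended) / exploitCtr;

console.log("Expected CTR with exploration:", blended.toFixed(4));
console.log("Relative drop vs. pure exploitation:", (relativeDrop * 100).toFixed(1) + "%");
```

With these inputs the blended rate is about 9.6%, a relative drop of about 4%. The drop grows with both the exploration rate and the gap between the two click-through rates.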

Watch out for

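Cold Start (accidental Shadow Banning): If a new creator uploads a video, a purely exploitative model will never show it because it has no historical clicks to prove it's good. Nobody decided to hide the video, but the effect on the creator is the same as a shadow ban. Without forced exploration, new content can never break into the ecosystem.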
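
One possible way to give new uploads with no click history a chance is to spend the exploration slot on under-exposed videos instead of on the whole catalog. The sketch below assumes each video carries an impression count; the field names, the threshold of 100, and pickExplorationVideo are hypothetical, and this is one illustrative policy rather than an established standard.

```javascript
// Choose the video for an exploration slot, favoring under-exposed uploads.
// `videos` is an array of { id, impressions }; `minImpressions` is a tunable threshold.
function pickExplorationVideo(videos, minImpressions, rng = Math.random) {
  if (videos.length === 0) return null;
  const underExposed = videos.filter((v) => v.impressions < minImpressions);
  // Fall back to the whole catalog when every video already has enough data.
  const pool = underExposed.length > 0 ? underExposed : videos;
  return pool[Math.floor(rng() * pool.length)];
}

// Hypothetical catalog: one brand-new upload among established videos.
const catalog = [
  { id: "established-1", impressions: 50000 },
  { id: "established-2", impressions: 12000 },
  { id: "new-upload", impressions: 0 },
];

// Only "new-upload" is below the threshold, so it is always the one chosen here.
console.log(pickExplorationVideo(catalog, 100).id);
```

The design choice here is to treat exploration as a limited budget: random picks over a large catalog rarely land on a fresh upload, while restricting the pool sends that budget to the videos the model knows least about.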