ML Content Moderation (Human-in-the-Loop)

How platforms manage 10 million posts a day without hiring a million reviewers.

The idea

Social networks cannot afford to have a human read every single post. Instead, they pass every post through a fast Machine Learning classifier, which outputs a "Toxicity Score" from 0% to 100%. If the score is extremely low, the post is approved instantly. If the score is extremely high, the post is deleted instantly. But ML models aren't perfect: they struggle with sarcasm, context, and slang. Posts that score in the middle ground (the grey area) are sent to a queue for manual review by a human. This is called a Human-in-the-Loop architecture.

How it works (Thresholds)

The engineering team defines two strict thresholds based on the business's risk tolerance. Classification runs as a pipeline step immediately after the post is saved to the database. It is often processed asynchronously via a message queue (such as Kafka or RabbitMQ), so the user sees "Post submitted" without waiting for the analysis to finish.

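// Async worker processing a new post.
// The classifier is assumed to return a probability in [0, 1],
// so 0.95 corresponds to a Toxicity Score of 95%.
const AUTO_DELETE_THRESHOLD = 0.95;
const HUMAN_REVIEW_THRESHOLD = 0.70;

async function processModeration(post) {
    const toxicity = await mlClassifier.predict(post.text);

    // Threshold 1: the auto-delete line (e.g. 95%)
    if (toxicity > AUTO_DELETE_THRESHOLD) {
        await deletePost(post.id);
        await banUser(post.author);
    }
    // Threshold 2: the human review line (e.g. 70%)
    else if (toxicity > HUMAN_REVIEW_THRESHOLD) {
        // Keep the post hidden until a reviewer decides
        await flagPostAsHidden(post.id);
        // Push to the manual review queue along with the model's score
        await humanReviewQueue.push({ postId: post.id, score: toxicity });
    }
    // 70% or lower: auto-approve
    else {
        await publishPost(post.id);
    }
}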
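
The submit side of this flow can be sketched as well. This is a minimal illustration, not a production design: savedPosts and moderationQueue are hypothetical in-memory stand-ins for the database and the message broker, and submitPost and drainQueue are names invented for this example. The point is that the request handler only saves and enqueues, while a separate worker calls a moderation function such as processModeration above.

```javascript
// Hypothetical in-memory stand-ins. A real system would use a database
// and a broker such as Kafka or RabbitMQ.
const savedPosts = new Map();
const moderationQueue = [];
let nextId = 1;

// Request path: persist the post, enqueue it, acknowledge. No ML call here,
// so the response time does not depend on the classifier.
function submitPost(author, text) {
    const post = { id: nextId++, author, text, status: "pending" };
    savedPosts.set(post.id, post);
    moderationQueue.push(post.id);
    return { postId: post.id, message: "Post submitted" };
}

// Worker path: drain the queue and hand each post to a moderation function.
// Returns the number of posts processed.
async function drainQueue(moderate) {
    let processed = 0;
    while (moderationQueue.length > 0) {
        const postId = moderationQueue.shift();
        await moderate(savedPosts.get(postId));
        processed += 1;
    }
    return processed;
}
```

A worker would call drainQueue(processModeration) in a loop or on a schedule. With a real broker, the consumer callback plays the role of drainQueue.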

Cost

Running a Deep Learning text classifier on millions of posts is computationally expensive and typically requires GPUs. To save costs, platforms often use a cascading approach: a cheap, fast regex filter drops obvious spam first, so the expensive ML model only runs on the remaining posts. Human reviewers are the most expensive part of the pipeline, which is why the "grey area" thresholds must be tuned carefully: widening the band between them sends more posts to the review queue.
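
The cascading approach might look roughly like the sketch below. The patterns in SPAM_PATTERNS are made up for illustration, and isObviousSpam and moderateWithCascade are hypothetical names. The classifier argument is assumed to expose an async predict(text) method, like mlClassifier in the worker above.

```javascript
// Hypothetical spam patterns, for illustration only.
const SPAM_PATTERNS = [
    /\bfree\s+crypto\b/i,
    /\bclick\s+here\s+to\s+win\b/i,
    /(https?:\/\/\S+\s*){3,}/, // three or more links in a row
];

// Stage 1: cheap check that runs on every post.
function isObviousSpam(text) {
    return SPAM_PATTERNS.some((pattern) => pattern.test(text));
}

// Stage 2: the expensive classifier is only called for posts
// that pass the cheap filter.
async function moderateWithCascade(text, classifier) {
    if (isObviousSpam(text)) {
        return { decision: "rejected_by_filter", score: null };
    }
    const score = await classifier.predict(text);
    return { decision: "scored_by_model", score };
}
```

Every post rejected in stage 1 is one classifier call that never happens. The trade-off is that the regex list needs its own maintenance and can produce false positives of its own.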

Watch out for

Model Drift: Language evolves constantly. If a new slang word or political meme goes viral, a static ML model will misclassify it, often with high confidence. The decisions made by the Human Reviewers must be continuously fed back into the training dataset so the model can be retrained daily or weekly.
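
One way to capture that feedback is to store each reviewer verdict as a labeled example next to the score the model gave. The sketch below assumes review items shaped like the { postId, score } objects pushed to humanReviewQueue earlier. The trainingExamples store, the "toxic" / "acceptable" verdict values, and both function names are hypothetical.

```javascript
// Hypothetical in-memory store. A real pipeline would write to durable storage.
const trainingExamples = [];

// Record a reviewer's verdict as a labeled training example.
// `verdict` is assumed to be either "toxic" or "acceptable".
function recordReviewDecision(reviewItem, text, verdict) {
    if (verdict !== "toxic" && verdict !== "acceptable") {
        throw new Error(`Unknown verdict: ${verdict}`);
    }
    const example = {
        postId: reviewItem.postId,
        text,
        label: verdict === "toxic" ? 1 : 0,
        modelScore: reviewItem.score,
        reviewedAt: new Date().toISOString(),
    };
    trainingExamples.push(example);
    return example;
}

// Fraction of reviewed posts that humans judged acceptable.
// Returns 0 for an empty list.
function acceptableRate(examples) {
    if (examples.length === 0) return 0;
    const acceptable = examples.filter((e) => e.label === 0).length;
    return acceptable / examples.length;
}
```

The retraining job would read trainingExamples on its daily or weekly schedule. Tracking acceptableRate over time is one rough signal worth watching: a sustained change in how often reviewers overturn the model's suspicion may indicate that language has shifted since the last training run.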