Question
A suspiciously high eval score often means test data leaked into training. Given train and test, two lists of example strings (chat messages used to train and evaluate a moderation model), return how many entries of test also appear anywhere in train, using exact string comparison. Count every occurrence in test separately: a string that appears twice in test and is present in train contributes 2, no matter how many times it appears in train. Example: train = ["buy now", "hello team"], test = ["hello team", "lunch?", "hello team"] returns 2.
leakage_count(train: list[str], test: list[str]) → int
in: [["buy now","hello team"],["hello team","lunch?","hello team"]]
out: 2
State your approach and its time/space complexity out loud before you optimize. Handle the edge cases (empty input, duplicates, overflow), and say why you chose this over the brute force. Green tests are the floor, not the grade.
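One possible approach, offered as a minimal sketch rather than the reference answer: build a set from train once, then walk test and count the entries found in that set. The function name and signature come from the prompt; the variable name train_seen is only illustrative. Assuming average-case constant-time hashing, this takes O(n + m) time and O(n) extra space for n train entries and m test entries, versus O(n * m) string comparisons for the brute-force nested loop.

```python
def leakage_count(train: list[str], test: list[str]) -> int:
    # Deduplicate train up front: how often a string repeats in train
    # does not matter, only whether it is present at all.
    train_seen = set(train)

    # Each occurrence in test counts once, so duplicates in test add up.
    return sum(1 for example in test if example in train_seen)


if __name__ == "__main__":
    print(leakage_count(["buy now", "hello team"],
                        ["hello team", "lunch?", "hello team"]))
```

On the edge cases the prompt lists: if either list is empty the sum runs over nothing (or matches nothing) and the result is 0; duplicates in train collapse in the set; and overflow is not a concern in Python because its integers are arbitrary precision, though a fixed-width language would need a counter wide enough for the length of test.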
Vibe coding: describe the solution in plain language (or narrate it) and the coach grades your approach. Generating runnable code from your description is coming next.