Coding · Easy · cod-g1350

Subject: ML metrics · Level: Entry–Mid · ~10 min · Common in ML systems and algorithms & data structures interviews · Industries: Software development

Question

A suspiciously high eval score often means test data leaked into training. You are given train and test, two lists of example strings (chat messages used to train and evaluate a moderation model). Return how many entries of test also appear anywhere in train, using exact string comparison. Count each test entry every time it occurs in the test list, but at most once per occurrence, regardless of how many times it appears in train. Example: train = ["buy now", "hello team"], test = ["hello team", "lunch?", "hello team"] returns 2, because both occurrences of "hello team" are in train and "lunch?" is not.
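As a baseline, the counting rule can be translated almost word for word. This is a minimal brute-force sketch; the name `leakage_count_naive` is hypothetical and used only for illustration. Scanning the `train` list for every test occurrence costs O(len(train) × len(test)) comparisons, which is the version an optimized answer should be measured against.

```python
def leakage_count_naive(train: list[str], test: list[str]) -> int:
    """Literal reading of the rule: O(len(train) * len(test)) time, O(1) extra space."""
    count = 0
    for example in test:      # every occurrence in test is considered separately
        if example in train:  # list membership: linear scan, stops at the first match
            count += 1        # at most +1 per test occurrence, however often it is in train
    return count


train = ["buy now", "hello team"]
test = ["hello team", "lunch?", "hello team"]
print(leakage_count_naive(train, test))  # 2
```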

Implement

leakage_count(train: list[str], test: list[str]) -> int
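One common way to implement this signature is a hash set: build the set of training strings once, then test each occurrence in `test` against it. The sketch below assumes exact, case- and whitespace-sensitive equality as the question specifies; it is one reasonable answer, not the only one. Building the set takes O(len(train)) expected time and O(number of distinct train strings) space, and each membership check is an average O(1) hash lookup (proportional to the string's length), so the whole pass is expected O(len(train) + len(test)) lookups.

```python
def leakage_count(train: list[str], test: list[str]) -> int:
    # Build the lookup set once. Duplicates in train collapse into one entry,
    # so they can never make a single test occurrence count more than once.
    seen = set(train)
    # Check every test occurrence independently, so repeated test entries
    # are each counted.
    return sum(1 for example in test if example in seen)


print(leakage_count(["buy now", "hello team"], ["hello team", "lunch?", "hello team"]))  # 2
```

The trade is memory for time: the set costs extra space proportional to the distinct training strings, but it removes the inner linear scan of the brute-force version.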

Examples

Input: train = ["buy now", "hello team"], test = ["hello team", "lunch?", "hello team"]
Output: 2

What a strong answer looks like

State your approach and its time/space complexity out loud before you optimize. Handle the edge cases (empty input, duplicates, overflow), and say why you chose this over the brute force. Green tests are the floor, not the grade.
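The edge cases named above can be exercised directly. This is a sketch of one way to check them, assuming the set-based solution and the brute-force baseline both take the signature given in the question (`leakage_count_naive` is a hypothetical name for the baseline). The hand-picked cases cover empty inputs, duplicates on either side, and near-miss strings that exact comparison must reject. A small randomized cross-check then confirms that the optimized version agrees with the brute force. On overflow: Python integers have arbitrary precision, and in any language the result is bounded by `len(test)`, so a counter as wide as the list length suffices.

```python
import random


def leakage_count(train: list[str], test: list[str]) -> int:
    seen = set(train)  # hash set of training strings
    return sum(1 for example in test if example in seen)


def leakage_count_naive(train: list[str], test: list[str]) -> int:
    return sum(1 for example in test if example in train)  # O(len(train) * len(test))


# Hand-picked edge cases: (train, test, expected)
cases = [
    ([], [], 0),                      # both lists empty
    ([], ["hi"], 0),                  # empty train: nothing can leak
    (["hi"], [], 0),                  # empty test
    (["hi", "hi", "hi"], ["hi"], 1),  # duplicates in train do not inflate the count
    (["hi"], ["hi", "hi"], 2),        # duplicates in test are each counted
    (["Hi"], ["hi", "hi ", ""], 0),   # exact comparison: case and whitespace matter
    ([""], [""], 1),                  # the empty string is still a valid example
]
for train, test, expected in cases:
    assert leakage_count(train, test) == expected, (train, test)

# Randomized cross-check on a tiny alphabet, which forces many duplicates.
rng = random.Random(0)
for _ in range(500):
    train = [rng.choice("abc") * rng.randint(0, 2) for _ in range(rng.randint(0, 6))]
    test = [rng.choice("abc") * rng.randint(0, 2) for _ in range(rng.randint(0, 6))]
    assert leakage_count(train, test) == leakage_count_naive(train, test)

print("all checks passed")
```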
