Code Room
Coding | Hard | cod-g1062
Subject: Machine learning
Level: Mid–Senior
Time: ~30 min
Common in ML systems interviews
Industries: Software development

Question

Compute TF-IDF for a tiny corpus. Given docs, a list of documents where each document is a list of word tokens, and a query word, return each document's TF-IDF weight for that word, in document order. Term frequency tf(word, doc) = (count of word in doc) / (total tokens in doc), or 0 if the doc is empty. Inverse document frequency idf(word) = ln(N / (1 + df)) + 1, where N is the number of documents and df is the number of documents that contain the word (the 1 added to df in the denominator is smoothing). TF-IDF = tf * idf. Return the list of weights, each rounded to 6 decimal places. There is at least one document; documents may be empty.
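The definitions above can be written out in display form. This is only a restatement of the prompt's formulas; the symbols w (query word), d (document), and |d| (token count of d) are notation introduced here for convenience.

```latex
\begin{align*}
\mathrm{tf}(w, d) &=
  \begin{cases}
    \dfrac{\mathrm{count}(w, d)}{|d|} & \text{if } |d| > 0 \\[6pt]
    0 & \text{if } |d| = 0
  \end{cases} \\[4pt]
\mathrm{idf}(w) &= \ln\!\left(\frac{N}{1 + \mathrm{df}(w)}\right) + 1 \\[4pt]
\mathrm{tfidf}(w, d) &= \mathrm{tf}(w, d) \cdot \mathrm{idf}(w)
\end{align*}
```

One consequence worth noting: since df ≤ N and N ≥ 1, the ratio N / (1 + df) is at least 1/2, so idf ≥ 1 − ln 2 ≈ 0.307. With this smoothing the weights are therefore never negative, even when every document contains the word.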

Implement
tfidf_column(docs: list[list[str]], query: str) → list[float]
Examples
in: [[["the","cat","sat"],["the","dog","ran"],["a","cat","cat"]], "cat"]
out: [0.333333, 0, 0.666667]
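To see where the example output comes from: N = 3 and "cat" appears in 2 documents, so idf = ln(3 / (1 + 2)) + 1 = 1, and the weights are just the term frequencies 1/3, 0, and 2/3. A minimal Python sketch of one possible solution follows; it assumes the signature from the prompt and is not the only valid approach.

```python
import math


def tfidf_column(docs: list[list[str]], query: str) -> list[float]:
    """Return the TF-IDF weight of `query` in each document, in order."""
    n_docs = len(docs)  # the prompt guarantees at least one document

    # df: number of documents that contain the query at least once.
    df = sum(1 for doc in docs if query in doc)

    # Smoothed idf as defined in the prompt: ln(N / (1 + df)) + 1.
    idf = math.log(n_docs / (1 + df)) + 1

    weights = []
    for doc in docs:
        # tf is defined as 0 for an empty document (avoids dividing by zero).
        tf = doc.count(query) / len(doc) if doc else 0.0
        weights.append(round(tf * idf, 6))
    return weights


example_docs = [["the", "cat", "sat"], ["the", "dog", "ran"], ["a", "cat", "cat"]]
print(tfidf_column(example_docs, "cat"))
```

This sketch scans each document a constant number of times, so it runs in O(T) time for T total tokens and uses O(N) extra space for the output list. The idf is computed once, outside the loop, because it does not depend on the individual document.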
What a strong answer looks like

State your approach and its time/space complexity out loud before you optimize. Handle the edge cases (empty input, duplicates, overflow), and say why you chose this over the brute force. Green tests are the floor, not the grade.

