Code Room
CodingMediumcod-g687
Subject TokenizationLevel Mid–Senior~30 minCommon in Algorithms & data structures interviewsIndustries Software development

Question

Tokenize source text while tracking source positions for error reporting. Given a string, emit one token per maximal run of either word characters ([A-Za-z0-9_]) or punctuation that is a single non-space, non-word character. Whitespace (spaces, tabs, newlines) separates tokens and is not emitted, but advances position. For each token return [text, line, col] where line is 1-based (incremented by '\n') and col is the 1-based column of the token's first character (column resets to 1 after each newline). Return the list of [text, line, col].

Implement
tokenize_pos(src: str) → list[list]
Examples
in["ab c\nd"]out[["ab",1,1],["c",1,4],["d",2,1]]
What a strong answer looks like

State your approach and its time/space complexity out loud before you optimize. Handle the edge cases (empty input, duplicates, overflow), and say why you chose this over the brute force. Green tests are the floor, not the grade.

Vibe coding: describe the solution in plain language (or narrate it) and the coach grades your approach. Generating runnable code from your description is coming next.

Run or narrate your approach, then ask the coach.