CodingMediumcod-g232

Subject TokenizationLevel Mid–Senior~30 minCommon in Algorithms & data structures interviewsIndustries Software development

Question

Write a lexer for a tiny language. Given a source string, produce a list of (type, value) tuples. Token types: 'NUM' for a maximal run of digits (value is the substring), 'ID' for a maximal run of letters/underscores/digits that starts with a letter or underscore (value is the substring), 'OP' for any single character in the set + - * / = ( ) (value is that character). Whitespace (spaces, tabs, newlines) separates tokens and is otherwise ignored. Any other character is invalid: stop and return whatever tokens were produced so far plus a final ('ERR', <the bad char>) tuple. On clean input, no ERR is appended. Return the token list (empty for empty/all-whitespace input).

Implement

tokenize(src: str) → list

Examples

in["x = 42 + y1"]out[["ID","x"],["OP","="],["NUM","42"],["OP","+"],["ID","y1"]]

What a strong answer looks like

State your approach and its time/space complexity out loud before you optimize. Handle the edge cases (empty input, duplicates, overflow), and say why you chose this over the brute force. Green tests are the floor, not the grade.

Learn the concepts

Vibe coding: describe the solution in plain language (or narrate it) and the coach grades your approach. Generating runnable code from your description is coming next.

Run or narrate your approach, then ask the coach.