Code Room
Coding | Medium | cod-g453
Subject: Tokenization
Level: Mid–Senior
Time: ~30 min
Common in: Algorithms & data structures interviews
Industries: Software development

Question

Write a lexer for a small expression language. Given a source string, produce a list of [type, value] tokens in source order. Token types: 'NUM' for integer or decimal numbers (digits with at most one internal dot, e.g. '12', '3.14'); 'ID' for identifiers (a letter or underscore followed by any number of letters, digits, or underscores); 'OP' for any one of the single characters + - * / ( ) = and the comma. Whitespace ('WS': spaces and tabs) is skipped and never emitted. The value of a NUM or ID token is the matched substring; the value of an OP token is the single character. Unknown characters never occur.

Implement
tokenize_expr(src: str) → list[list]
Examples
in: ["x = 3.14 + y2"]
out: [["ID","x"],["OP","="],["NUM","3.14"],["OP","+"],["ID","y2"]]
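
One possible approach is a hand-written single-pass scanner that looks at the current character and consumes the longest run belonging to that token class. The sketch below is an illustration, not a reference solution. It assumes ASCII input, and it assumes a dot only ever appears between digits (the question does not say what to do with input like '3.'). The ValueError branch is a defensive addition, since the question states that unknown characters never occur.

```python
import string

DIGITS = set(string.digits)
ID_START = set(string.ascii_letters) | {"_"}
ID_BODY = ID_START | DIGITS
OPERATORS = set("+-*/()=,")


def tokenize_expr(src: str) -> list[list]:
    tokens = []
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        if ch in " \t":
            # Whitespace is skipped, never emitted.
            i += 1
        elif ch in DIGITS:
            start = i
            while i < n and src[i] in DIGITS:
                i += 1
            # Take a fractional part only when the dot is followed by a digit,
            # which keeps the dot "internal" to the number.
            if i + 1 < n and src[i] == "." and src[i + 1] in DIGITS:
                i += 1
                while i < n and src[i] in DIGITS:
                    i += 1
            tokens.append(["NUM", src[start:i]])
        elif ch in ID_START:
            start = i
            while i < n and src[i] in ID_BODY:
                i += 1
            tokens.append(["ID", src[start:i]])
        elif ch in OPERATORS:
            tokens.append(["OP", ch])
            i += 1
        else:
            # Assumption: not reachable for valid input per the question.
            raise ValueError(f"unexpected character {ch!r} at index {i}")
    return tokens
```

Each character is examined a constant number of times, so the scan runs in O(n) time, with O(n) space for the output list.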
What a strong answer looks like

State your approach and its time/space complexity out loud before you optimize. Handle the edge cases (empty input, duplicates, overflow), and say why you chose this over the brute force. Green tests are the floor, not the grade.
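
For this problem, the edge cases worth checking are empty input, whitespace-only input, and expressions with no spaces between tokens. As one way to exercise them, here is a compact regex-based variant using Python's standard re module. The name tokenize_expr_regex is hypothetical, chosen only for this illustration, and the pattern assumes ASCII input with a dot appearing only between digits.

```python
import re

# One named group per token class; alternatives are tried left to right.
TOKEN_RE = re.compile(
    r"""
    (?P<NUM>\d+(?:\.\d+)?)
  | (?P<ID>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<OP>[+\-*/()=,])
  | (?P<WS>[ \t]+)
    """,
    re.VERBOSE | re.ASCII,
)


def tokenize_expr_regex(src: str) -> list[list]:
    tokens = []
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            # Assumption: not reachable for valid input per the question.
            raise ValueError(f"unexpected character {src[pos]!r} at index {pos}")
        if m.lastgroup != "WS":
            tokens.append([m.lastgroup, m.group()])
        pos = m.end()
    return tokens


# Edge cases: empty input, whitespace only, and no separating spaces.
assert tokenize_expr_regex("") == []
assert tokenize_expr_regex(" \t ") == []
assert tokenize_expr_regex("a=(b+1)/2.5") == [
    ["ID", "a"], ["OP", "="], ["OP", "("], ["ID", "b"], ["OP", "+"],
    ["NUM", "1"], ["OP", ")"], ["OP", "/"], ["NUM", "2.5"],
]
```

Both the regex variant and a hand-written scanner make a single left-to-right pass, so time is O(n) in the length of the source and space is O(n) for the token list. The trade-off is mainly readability versus explicit control over each character class.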

Vibe coding: describe the solution in plain language (or narrate it) and the coach grades your approach. Generating runnable code from your description is coming next.
