Strings & parsing

Parsing turns a raw sequence of characters into meaningful chunks and evaluates them step by step.

The idea

A string is fundamentally just an array of characters. To make sense of it—like when calculating a math expression—we must scan it from left to right.

As we scan, we group raw characters into meaningful units (tokens), such as multi-digit numbers. When we encounter operations, we push them onto a stack, waiting until we have enough context to safely evaluate them according to the rules of precedence.

See it work

Press play, or step through.

How it works

We read characters linearly. Digits are accumulated into full numbers. Operators dictate when to pause and evaluate the numbers we've accumulated. Using an explicit stack allows us to handle operations with varying precedences correctly in a single pass.

def evaluate(expression):
    vals, ops = [], []
    i = 0
    
    def process_op():
        op = ops.pop()
        b, a = vals.pop(), vals.pop()
        if op == '+': vals.append(a + b)
        elif op == '*': vals.append(a * b)
        
    while i < len(expression):
        if expression[i].isdigit():
            num = 0
            while i < len(expression) and expression[i].isdigit():
                num = num * 10 + int(expression[i])
                i += 1
            vals.append(num)
            continue
            
        if expression[i] in "+*":
            # Process any pending ops of equal or higher precedence
            while ops and (ops[-1] == '*' or expression[i] == '+'):
                process_op()
            ops.append(expression[i])
            
        i += 1
        
    while ops:
        process_op()
        
    return vals[0] if vals else 0

Cost

OperationTimeSpace
EvaluationO(n)O(n)

We trade O(n) auxiliary memory for stacks to evaluate the expression efficiently in a single pass.

Watch out for

Worked example

Consider evaluating "12+3*4". We scan left-to-right. '1' and '2' are grouped into the number 12 and pushed to the values stack. When we see '+', it waits on the ops stack because we don't know the right-hand side yet. Then 3 is pushed to values. When we see '*', it has higher precedence than '+', so it is also pushed to wait. After 4 is parsed, the string ends. We pop and evaluate the highest precedence first: 3 * 4 = 12, push 12, and then evaluate 12 + 12 = 24.

Check yourself

In a stack-based expression parser, what should you do when you encounter a closing parenthesis )?