Step 3 — The Recursive-Descent Parser
Goal: turn a token stream into the AST you designed in Step 2, respecting precedence and associativity.
The Grammar (Recap From CONCEPTS.md)
expr = term { ('+' | '-') term }
term = factor { ('*' | '/') factor }
factor = NUMBER
| '(' expr ')'
| '-' factor
Three rules, three functions. Recursive descent is a literal transcription.
The Cursor Helpers
const Token& Parser::peek() const { return tokens_[pos_]; }
const Token& Parser::advance() { return tokens_[pos_++]; }
bool Parser::check(TokenKind k) const { return peek().kind == k; }
bool Parser::match(TokenKind k) { if (check(k)) { advance(); return true; } return false; }
const Token& Parser::expect(TokenKind k, const char* msg) {
if (!check(k)) throw ParseError(/* … */);
return advance();
}
Five helpers cover every recursive-descent parser ever written:
peek— look without consumingadvance— consume onecheck— peek's kind matches?match— if so, consume and return trueexpect— must match, or throw
This vocabulary scales: Clang's parser uses essentially the same primitives, just with richer diagnostics in expect.
parseExpr — Lowest Precedence
ExprPtr Parser::parseExpr() {
ExprPtr left = parseTerm();
while (check(TokenKind::Plus) || check(TokenKind::Minus)) {
TokenKind op = peek().kind; advance();
ExprPtr right = parseTerm();
left = std::make_unique<BinaryExpr>(op, std::move(left), std::move(right));
}
return left;
}
Reads aloud: "parse a term, then while the next token is + or -, consume the operator, parse another term, and wrap the previous result as the new left."
Why The While Loop = Left Associativity
Trace 5 - 3 - 1:
Initial: left = parseTerm() = (5)
Iter 1: token = -, consume
right = parseTerm() = (3)
left = BinaryExpr(-, (5), (3)) ← (5 - 3)
Iter 2: token = -, consume
right = parseTerm() = (1)
left = BinaryExpr(-, (5-3), (1)) ← ((5 - 3) - 1)
Done. evaluate → 1
Each iteration nests the previous left on the inside. The tree leans left. That's left-associativity.
To make - right-associative (5 - 3 - 1 = 5 - (3 - 1) = 3), you'd write:
ExprPtr right = parseExpr(); // recurse instead of loop
return std::make_unique<BinaryExpr>(op, std::move(left), std::move(right));
Recursion-on-the-right ⇒ right-associativity. This is exactly how we'd handle ^ (exponentiation) in Step 6's extension.
parseTerm — Same Pattern, Higher Precedence
ExprPtr Parser::parseTerm() {
ExprPtr left = parseFactor();
while (check(TokenKind::Star) || check(TokenKind::Slash)) {
TokenKind op = peek().kind; advance();
ExprPtr right = parseFactor();
left = std::make_unique<BinaryExpr>(op, std::move(left), std::move(right));
}
return left;
}
Identical shape; the only changes are the operators consumed and the next-level call (parseFactor). Adding a new precedence level (modulo, comparison, etc.) means inserting another function in the call chain.
Why This = Higher Precedence Than +/-
When parseExpr calls parseTerm, control descends into parseTerm's while loop. That loop consumes all the * and / operators before returning. By the time parseTerm returns to parseExpr, the multiplication has already been packaged into a sub-tree. parseExpr then attaches that sub-tree as either left or right of its +/- node.
* binds tighter because parseTerm "grabs" its operators first — they're inside its loop, not parseExpr's.
parseFactor — Atoms and Recursion Back to parseExpr
ExprPtr Parser::parseFactor() {
if (check(TokenKind::Number)) {
double v = peek().value; advance();
return std::make_unique<NumberExpr>(v);
}
if (match(TokenKind::LParen)) {
ExprPtr inner = parseExpr(); // back to the top
expect(TokenKind::RParen, "expected ')'");
return inner;
}
if (check(TokenKind::Minus)) {
advance();
ExprPtr operand = parseFactor(); // right-recursive
return std::make_unique<UnaryExpr>(TokenKind::Minus, std::move(operand));
}
if (check(TokenKind::Error)) { /* propagate lex error */ }
throw ParseError(/* … */);
}
Three productions:
NUMBER— consume and wrap.'(' expr ')'— consume(, recurse all the way back toparseExpr(precedence resets!), expect). This is how parens override precedence.'-' factor— unary minus.--5works because the recursion is onparseFactor, not directly creating a Number.
Why Parens Override Precedence
(1 + 2) * 3:
parseExpr→parseTerm→parseFactorparseFactorsees(, callsparseExpragain.- That
parseExpr(the inner one) parses1 + 2→BinaryExpr(+, 1, 2). - Outer
parseFactorconsumes), returnsBinaryExpr(+, 1, 2). - Control returns to outer
parseTerm. Itsleftis(1+2). Loop sees*, parses3. BuildsBinaryExpr(*, (1+2), 3).
The parens temporarily gave us "expr-level reset" inside what would have been factor-level parsing. The grammar isn't doing anything magical; the recursion-back-to-parseExpr is.
Trip Through The Full Pipeline
For 2 * (3 + 4):
Tokens: NUMBER(2) STAR LPAREN NUMBER(3) PLUS NUMBER(4) RPAREN EOF
parseExpr
└─ parseTerm
├─ parseFactor → NumberExpr(2)
├─ sees STAR, consume; parseFactor:
│ ├─ sees LPAREN, consume
│ ├─ parseExpr (inner)
│ │ └─ parseTerm
│ │ └─ parseFactor → NumberExpr(3)
│ │ (no */)
│ │ ├─ sees PLUS, consume; parseTerm → parseFactor → NumberExpr(4)
│ │ └─ returns BinaryExpr(+, 3, 4)
│ └─ consume RPAREN; returns BinaryExpr(+, 3, 4)
└─ wrap into BinaryExpr(*, 2, BinaryExpr(+, 3, 4))
Result: BinaryExpr(*, NumberExpr(2), BinaryExpr(+, NumberExpr(3), NumberExpr(4)))
Error Cases
| Input | What Happens |
|---|---|
"" | parse() sees Eof immediately → throws "empty input" |
1 + | parseExpr parses 1, consumes +, calls parseTerm → parseFactor which sees Eof and throws |
(1 + 2 | parseFactor matches (, recurses, then expect(RParen) fails → throws "expected ')'" |
1 2 | parseExpr parses 1, no +/-, returns. parse() checks for Eof, finds NUMBER(2) → throws "unexpected token" |
1 / 0 | parses fine; the evaluator throws (Step 4) |
1 + abc | lexer emits Error token; parseFactor propagates it |
All five error categories live in this lab. Phase 9 (Diagnostics) replaces these one-line throws with Clang-style source spans + fix-it hints, but the detection logic is identical.
Why Is This Called "LL(1)"?
- Left-to-right scan of input.
- Leftmost derivation produced.
- 1 token of lookahead (we only ever call
peek()once per decision; neverpeek(2)).
LL(1) is the most common parsing class for hand-written compilers. Some real languages need LL(2) or even unbounded lookahead (C++ template parsing) — Clang uses tentative parsing in those spots.
Next
→ 04-the-evaluator.md — walk the tree and produce a number.