Step 2 — The AST
Goal: design the tree data structure the parser will build and the evaluator will walk.
Three Node Types Cover All Of Arithmetic
| Node | Represents | Children |
|---|---|---|
NumberExpr | 42, 3.14 | none |
BinaryExpr | a + b, a * b, … | lhs, rhs |
UnaryExpr | -a | operand |
That's it. Three classes, no surprises.
Class Hierarchy + Visitor
struct NumberExpr; struct BinaryExpr; struct UnaryExpr;
template <typename R>
struct ExprVisitor {
virtual ~ExprVisitor() = default;
virtual R visit(NumberExpr&) = 0;
virtual R visit(BinaryExpr&) = 0;
virtual R visit(UnaryExpr&) = 0;
};
struct Expr {
virtual ~Expr() = default;
virtual double accept(ExprVisitor<double>& v) = 0;
};
struct NumberExpr : Expr {
double value;
explicit NumberExpr(double v) : value(v) {}
double accept(ExprVisitor<double>& v) override { return v.visit(*this); }
};
// BinaryExpr, UnaryExpr follow the same accept pattern
The dance:
- Each node has
accept(Visitor&)— one virtual call site. acceptimmediately callsvisitor.visit(*this)— and because*thishas its concrete type at the call site, the right overload is selected at compile time.- The visitor's
visit(NumberExpr&)etc. are user-defined per pass.
This is the double dispatch trick: virtual dispatch picks the node's concrete type; static dispatch (overloading) picks the operation on it.
Why Not Just A Switch / std::variant?
Two viable alternatives:
Alternative A — std::variant<Number, Binary, Unary>
using Expr = std::variant<NumberExpr, BinaryExpr, UnaryExpr>;
double eval(const Expr& e) {
return std::visit(overloaded{
[](const NumberExpr& n) { return n.value; },
[](const BinaryExpr& b) { /* … */ },
[](const UnaryExpr& u) { /* … */ }
}, e);
}
Pros: no virtuals, slightly faster, more "modern C++".
Cons: every new node type requires updating every visit site (or using if constexpr); recursive types need extra wrapping (std::variant<NumberExpr, std::unique_ptr<BinaryExpr>, …>); error messages are worse.
Alternative B — Switch On Tag
struct Expr { ExprKind kind; /* … */ };
double eval(const Expr& e) {
switch (e.kind) {
case ExprKind::Number: /* … */;
case ExprKind::Binary: /* … */;
}
}
Pros: zero virtual overhead; single allocation possible (one Expr struct sized for the largest variant).
Cons: every pass touches every node type's data layout; no encapsulation.
Which does production use?
- Clang's AST: tagged union via
Stmt::getStmtClass()+ visitor (RecursiveASTVisitor). - rustc's AST: enums (Rust's
std::variant-equivalent). - LLVM IR's
Instruction: tagged class hierarchy + visitor.
We use the virtual + Visitor pattern here because it's the most teachable and the most directly comparable to Clang's design. The trade-offs are real — we discuss them in docs/analysis.md.
Ownership: std::unique_ptr
using ExprPtr = std::unique_ptr<Expr>;
struct BinaryExpr : Expr {
TokenKind op;
ExprPtr lhs;
ExprPtr rhs;
BinaryExpr(TokenKind o, ExprPtr l, ExprPtr r)
: op(o), lhs(std::move(l)), rhs(std::move(r)) {}
};
Children are owned via unique_ptr. A tree is not shared — destroying the root recursively destroys the whole tree.
Why not shared_ptr? Compilers never need shared ownership of AST nodes. Sharing would imply two parents, and ASTs are trees — by definition no shared parents.
Why not raw pointers + manual delete? Memory leak waiting to happen, especially with parser exceptions. unique_ptr is RAII for ownership.
Why not arena allocation? It's the technique production compilers use (LLVM has BumpPtrAllocator; Clang's ASTContext allocates from a single arena). We'll add arena allocation in cp-09 when we have many AST nodes per program. For arithmetic, individual unique_ptr allocations are fine.
The accept Method, Concretely
double NumberExpr::accept(ExprVisitor<double>& v) { return v.visit(*this); }
When the evaluator runs someExpr->accept(eval):
- Virtual dispatch goes to the right
accept(e.g.,NumberExpr::accept). - That calls
eval.visit(*this)— and because*thisis aNumberExpr&, overload resolution choosesEvaluator::visit(NumberExpr&).
The combination is sometimes called "double dispatch" — but conceptually it's just: one virtual call routes to the right type, then static overload routes to the right operation.
The Template Parameter R
template <typename R>
struct ExprVisitor {
virtual R visit(NumberExpr&) = 0;
virtual R visit(BinaryExpr&) = 0;
virtual R visit(UnaryExpr&) = 0;
};
R is the return type. We define Evaluator : ExprVisitor<double>. A printer would be Printer : ExprVisitor<std::string>. A type-checker would be TypeChecker : ExprVisitor<Type>. Code generation in cp-11 will be LLVMGen : ExprVisitor<llvm::Value*>.
In this lab Expr::accept is hard-wired to ExprVisitor<double> — sufficient for our one visitor. In production you'd either define multiple accept overloads or use std::any/erased return types. Clang uses a non-templated visitor + side-effects to a Result member; rustc uses Rust enums and match.
Try It
After we build, this minimal main exercises the AST manually:
auto five = std::make_unique<NumberExpr>(5.0);
auto three = std::make_unique<NumberExpr>(3.0);
auto plus = std::make_unique<BinaryExpr>(TokenKind::Plus,
std::move(five),
std::move(three));
Evaluator e;
std::cout << e.eval(*plus) << "\n"; // → 8
(For the actual interactive evaluator we go through the parser; this is just to confirm the AST machinery works in isolation.)
Next
→ 03-recursive-descent-parser.md — build the AST from tokens.