Step 2 — The AST

Goal: design the tree data structure the parser will build and the evaluator will walk.

Three Node Types Cover All Of Arithmetic

NodeRepresentsChildren
NumberExpr42, 3.14none
BinaryExpra + b, a * b, …lhs, rhs
UnaryExpr-aoperand

That's it. Three classes, no surprises.

Class Hierarchy + Visitor

struct NumberExpr; struct BinaryExpr; struct UnaryExpr;

template <typename R>
struct ExprVisitor {
    virtual ~ExprVisitor() = default;
    virtual R visit(NumberExpr&) = 0;
    virtual R visit(BinaryExpr&) = 0;
    virtual R visit(UnaryExpr&)  = 0;
};

struct Expr {
    virtual ~Expr() = default;
    virtual double accept(ExprVisitor<double>& v) = 0;
};

struct NumberExpr : Expr {
    double value;
    explicit NumberExpr(double v) : value(v) {}
    double accept(ExprVisitor<double>& v) override { return v.visit(*this); }
};
// BinaryExpr, UnaryExpr follow the same accept pattern

The dance:

  1. Each node has accept(Visitor&)one virtual call site.
  2. accept immediately calls visitor.visit(*this) — and because *this has its concrete type at the call site, the right overload is selected at compile time.
  3. The visitor's visit(NumberExpr&) etc. are user-defined per pass.

This is the double dispatch trick: virtual dispatch picks the node's concrete type; static dispatch (overloading) picks the operation on it.

Why Not Just A Switch / std::variant?

Two viable alternatives:

Alternative A — std::variant<Number, Binary, Unary>

using Expr = std::variant<NumberExpr, BinaryExpr, UnaryExpr>;
double eval(const Expr& e) {
    return std::visit(overloaded{
        [](const NumberExpr& n) { return n.value; },
        [](const BinaryExpr& b) { /* … */ },
        [](const UnaryExpr& u)  { /* … */ }
    }, e);
}

Pros: no virtuals, slightly faster, more "modern C++". Cons: every new node type requires updating every visit site (or using if constexpr); recursive types need extra wrapping (std::variant<NumberExpr, std::unique_ptr<BinaryExpr>, …>); error messages are worse.

Alternative B — Switch On Tag

struct Expr { ExprKind kind; /* … */ };
double eval(const Expr& e) {
    switch (e.kind) {
        case ExprKind::Number: /* … */;
        case ExprKind::Binary: /* … */;
    }
}

Pros: zero virtual overhead; single allocation possible (one Expr struct sized for the largest variant). Cons: every pass touches every node type's data layout; no encapsulation.

Which does production use?

  • Clang's AST: tagged union via Stmt::getStmtClass() + visitor (RecursiveASTVisitor).
  • rustc's AST: enums (Rust's std::variant-equivalent).
  • LLVM IR's Instruction: tagged class hierarchy + visitor.

We use the virtual + Visitor pattern here because it's the most teachable and the most directly comparable to Clang's design. The trade-offs are real — we discuss them in docs/analysis.md.

Ownership: std::unique_ptr

using ExprPtr = std::unique_ptr<Expr>;

struct BinaryExpr : Expr {
    TokenKind op;
    ExprPtr   lhs;
    ExprPtr   rhs;
    BinaryExpr(TokenKind o, ExprPtr l, ExprPtr r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
};

Children are owned via unique_ptr. A tree is not shared — destroying the root recursively destroys the whole tree.

Why not shared_ptr? Compilers never need shared ownership of AST nodes. Sharing would imply two parents, and ASTs are trees — by definition no shared parents.

Why not raw pointers + manual delete? Memory leak waiting to happen, especially with parser exceptions. unique_ptr is RAII for ownership.

Why not arena allocation? It's the technique production compilers use (LLVM has BumpPtrAllocator; Clang's ASTContext allocates from a single arena). We'll add arena allocation in cp-09 when we have many AST nodes per program. For arithmetic, individual unique_ptr allocations are fine.

The accept Method, Concretely

double NumberExpr::accept(ExprVisitor<double>& v) { return v.visit(*this); }

When the evaluator runs someExpr->accept(eval):

  1. Virtual dispatch goes to the right accept (e.g., NumberExpr::accept).
  2. That calls eval.visit(*this) — and because *this is a NumberExpr&, overload resolution chooses Evaluator::visit(NumberExpr&).

The combination is sometimes called "double dispatch" — but conceptually it's just: one virtual call routes to the right type, then static overload routes to the right operation.

The Template Parameter R

template <typename R>
struct ExprVisitor {
    virtual R visit(NumberExpr&) = 0;
    virtual R visit(BinaryExpr&) = 0;
    virtual R visit(UnaryExpr&)  = 0;
};

R is the return type. We define Evaluator : ExprVisitor<double>. A printer would be Printer : ExprVisitor<std::string>. A type-checker would be TypeChecker : ExprVisitor<Type>. Code generation in cp-11 will be LLVMGen : ExprVisitor<llvm::Value*>.

In this lab Expr::accept is hard-wired to ExprVisitor<double> — sufficient for our one visitor. In production you'd either define multiple accept overloads or use std::any/erased return types. Clang uses a non-templated visitor + side-effects to a Result member; rustc uses Rust enums and match.

Try It

After we build, this minimal main exercises the AST manually:

auto five  = std::make_unique<NumberExpr>(5.0);
auto three = std::make_unique<NumberExpr>(3.0);
auto plus  = std::make_unique<BinaryExpr>(TokenKind::Plus,
                                          std::move(five),
                                          std::move(three));
Evaluator e;
std::cout << e.eval(*plus) << "\n";   // → 8

(For the actual interactive evaluator we go through the parser; this is just to confirm the AST machinery works in isolation.)

Next

03-recursive-descent-parser.md — build the AST from tokens.