Step 3 — Compiling Functions as Nested Compilers

Goal

Extend the cp-06 single-chunk compiler so that a declaration fn foo(...) { ... } compiles to a separate Function with its own chunk and its own local-variable bookkeeping, while the compiler stays able to resume compiling the outer code afterwards.

The Mental Model

A function body is just another little program. When the parser hands the compiler a FnDeclStmt, the compiler temporarily switches its target from the current chunk to a fresh chunk owned by a new Function. When the body finishes, the compiler:

  1. Emits Nil; Return (so a body without an explicit return does the right thing — see step 5 for control-flow specifics).
  2. Pops back to the outer compiler state.
  3. Records the new Function as a constant in the outer chunk's constant pool.
  4. Emits Closure <const-ix> at the outer cursor, which loads the function value onto the operand stack.
  5. Stores that value as a global or as a new local in the outer scope.

Crucially, the outer compiler never needs to look inside the inner body: all it sees is a single opaque function value in its constant pool.
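The five steps above can be made concrete with a minimal sketch that hand-compiles the top-level declaration fn <name>() {} and leaves the resulting bytes open to inspection. Everything here is a toy stand-in assumed for illustration: the Op enum, the Constant struct, and the compileEmptyFnDecl helper are hypothetical and are not the tutorial's real types. The sketch covers only the global-binding case of step 5 and omits the script's own trailing Nil; Return.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only (not the tutorial's real types).
enum class Op : uint8_t { Nil, Return, Closure, DefGlobal };

struct Function;
struct Constant {                       // either a function or a name string
    std::shared_ptr<Function> fn;       // set for function constants
    std::string               str;      // set for name constants
};
struct Chunk {
    std::vector<uint8_t>  code;
    std::vector<Constant> constants;
};
struct Function { std::string name; Chunk chunk; };

// Hand-compiles the top-level declaration `fn <name>() {}` and returns the
// outer function whose chunk holds the binding code.
std::shared_ptr<Function> compileEmptyFnDecl(const std::string& name) {
    auto outer = std::make_shared<Function>();
    outer->name = "<script>";
    auto inner = std::make_shared<Function>();
    inner->name = name;

    Function* target = outer.get();     // where emit() currently writes
    auto emit = [&](uint8_t byte) { target->chunk.code.push_back(byte); };
    auto op   = [](Op o) { return static_cast<uint8_t>(o); };
    auto addConstant = [&](Constant c) -> uint8_t {
        target->chunk.constants.push_back(std::move(c));
        return static_cast<uint8_t>(target->chunk.constants.size() - 1);
    };

    target = inner.get();               // switch to the fresh chunk
    // (an empty body emits nothing)
    emit(op(Op::Nil));                  // step 1: implicit "return nil"
    emit(op(Op::Return));
    target = outer.get();               // step 2: back to the outer state
    assert(target == outer.get());

    uint8_t fnIx = addConstant({inner, ""});        // step 3: function as constant
    emit(op(Op::Closure));   emit(fnIx);            // step 4: load it
    uint8_t nameIx = addConstant({nullptr, name});  // step 5: bind as a global
    emit(op(Op::DefGlobal)); emit(nameIx);
    return outer;
}
```

Calling compileEmptyFnDecl("foo") in this sketch leaves the outer chunk with four bytes (Closure, 0, DefGlobal, 1) and the inner chunk with two (Nil, Return). The outer chunk refers to the inner one only through constant index 0.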

State

class Compiler {
    struct Local { std::string name; int depth; bool isConst; };
    struct FunctionState {
        FunctionPtr           fn;
        std::vector<Local>    locals;
        int                   scopeDepth = 0;
        bool                  isScript = false;
    };
    std::vector<FunctionState> states_;

    Chunk&                chunk()      { return states_.back().fn->chunk; }
    std::vector<Local>&   locals()     { return states_.back().locals; }
    int&                  scopeDepth() { return states_.back().scopeDepth; }
};

The whole "current compilation context" is the top of states_. Push to enter a function, pop to leave.

void pushFunction(std::string name, int arity, bool isScript) {
    auto fn = std::make_shared<Function>();
    fn->name = std::move(name);
    fn->arity = arity;
    FunctionState fs;
    fs.fn = fn;
    fs.isScript = isScript;
    // Reserve slot 0 for the function value itself (matches the VM's call ABI).
    fs.locals.push_back({"", 0, true});
    states_.push_back(std::move(fs));
}

FunctionPtr popFunction() {
    auto fn = states_.back().fn;
    states_.pop_back();
    return fn;
}

That reserved slot 0 is the link to step 2 — the runtime puts the callable there, and the compiler must not accidentally allocate it to a user variable.
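To see the push/pop discipline in isolation, here is a minimal sketch, assuming stripped-down Chunk and Function types and a hypothetical MiniCompiler class that keeps only the FunctionPtr per state (no locals, no scope depth, no reserved slot). It is not the tutorial's Compiler. It only illustrates that emit() always lands in whichever function is on top of the stack, because chunk() is re-evaluated on every call.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins, assumed for illustration.
struct Chunk    { std::vector<uint8_t> code; };
struct Function { std::string name; Chunk chunk; };
using FunctionPtr = std::shared_ptr<Function>;

class MiniCompiler {
public:
    // Enter a function: all later emits go to its fresh, empty chunk.
    void pushFunction(std::string name) {
        auto fn = std::make_shared<Function>();
        fn->name = std::move(name);
        states_.push_back(State{fn});
    }

    // Leave the current function and hand it back to the caller.
    FunctionPtr popFunction() {
        assert(!states_.empty());
        auto fn = states_.back().fn;
        states_.pop_back();
        return fn;
    }

    // Looked up on every call, so it always names the innermost target.
    Chunk& chunk() {
        assert(!states_.empty());
        return states_.back().fn->chunk;
    }

    void emit(uint8_t byte) { chunk().code.push_back(byte); }

private:
    struct State { FunctionPtr fn; };
    std::vector<State> states_;
};
```

With this sketch, pushing "<script>", emitting 1, pushing "inner", emitting 2 and 3, popping, and then emitting 4 leaves the inner chunk holding {2, 3} and the outer chunk holding {1, 4}. The outer code resumes exactly where it left off.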

Compiling a FnDeclStmt

void visit(FnDeclStmt& s) override {
    pushFunction(s.name, static_cast<int>(s.params.size()), /*isScript=*/false);

    for (auto& p : s.params) addLocal(p, /*isConst=*/false, s.line);

    beginScope();
    for (auto& stmt : s.body) stmt->accept(*this);
    endScope();

    emit(Op::Nil);
    emit(Op::Return);

    auto fn = popFunction();

    // Outer scope: load the function value as a constant, then bind it.
    uint8_t ix = makeConstant(Value::makeFn(fn));
    emit(Op::Closure); emit(ix);

    if (scopeDepth() == 0) {
        uint8_t nameIx = makeConstant(Value::makeStr(s.name));
        emit(Op::DefGlobal); emit(nameIx);
    } else {
        addLocal(s.name, /*isConst=*/true, s.line);
    }
}

A few things worth noting:

  • We pass isConst=true for the binding itself but isConst=false for the parameters — assigning to a parameter inside its function body is legal.
  • The body opens its own block scope so that endScope() cleans up any lets declared inside. The parameters are added before beginScope(), so they sit outside that scope and persist for the entire function, as they should.
  • Op::Closure is currently a synonym for Op::Constant. We give it a distinct opcode so cp-12 can graft upvalue handling on without touching every call site.

Why addLocal(p, ...) Just Works

The cp-06 local table is indexed by insertion order, which matches the runtime slot numbering. Because pushFunction reserved slot 0, the first parameter ends up at slot 1, the second at slot 2, and so on, which is exactly the layout the call ABI delivers.
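The slot arithmetic can be checked with a small sketch. It assumes the Local struct from the State section, plus two hypothetical helpers that are not part of the tutorial's code: localsForParams, which mimics pushFunction followed by one addLocal per parameter, and resolveLocal, which treats a local's index in the table as its slot.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Local { std::string name; int depth; bool isConst; };

// Builds the table the way pushFunction + addLocal would:
// the reserved entry first, then one entry per parameter.
std::vector<Local> localsForParams(const std::vector<std::string>& params) {
    std::vector<Local> locals;
    locals.push_back({"", 0, true});               // reserved slot 0: the callee
    for (const auto& p : params) locals.push_back({p, 0, false});
    return locals;
}

// Hypothetical resolver: the slot is simply the index in the table.
// Searches newest-first so a shadowing declaration wins, and never
// returns the reserved slot 0. Returns -1 if the name is not a local.
int resolveLocal(const std::vector<Local>& locals, const std::string& name) {
    assert(!locals.empty());                       // slot 0 must be reserved
    for (int i = static_cast<int>(locals.size()) - 1; i >= 1; --i)
        if (locals[i].name == name) return i;
    return -1;
}
```

For parameters a and b, this sketch resolves a to slot 1 and b to slot 2. Dropping the reserved entry would shift both down by one, so a would collide with the callee's slot.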

Forbidding Closure Capture (for now)

Without an upvalue system, inner can't see outer's local a:

fn outer(a) {
    fn inner() { return a; }   // ← capture
    return inner();
}

The compiler must detect this at compile time and refuse, rather than emit broken bytecode. Helper:

bool isOuterLocal(const std::string& name) {
    for (int i = (int)states_.size() - 2; i >= 0; --i) {
        const auto& ls = states_[i].locals;
        for (int j = (int)ls.size() - 1; j >= 1; --j)
            if (ls[j].name == name) return true;
    }
    return false;
}

IdentExpr and AssignExpr consult isOuterLocal after their normal local lookup misses but before they fall back to globals. If true, they emit a diagnostic pointing the user to cp-12.
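The lookup order described here can be sketched as a single classification function. This is an illustration under assumptions, not the tutorial's visitor code: the Binding enum and the classify helper are hypothetical, and FunctionState is reduced to its locals table.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Local { std::string name; int depth; bool isConst; };
struct FunctionState { std::vector<Local> locals; };   // reduced for the sketch

enum class Binding { OwnLocal, CapturedOuterLocal, Global };

// Classifies a name in the order the text describes:
// own locals first, then enclosing functions' locals, then globals.
Binding classify(const std::vector<FunctionState>& states,
                 const std::string& name) {
    assert(!states.empty());

    // 1. Normal local lookup in the innermost function (slot 0 is reserved).
    const auto& own = states.back().locals;
    for (int j = static_cast<int>(own.size()) - 1; j >= 1; --j)
        if (own[j].name == name) return Binding::OwnLocal;

    // 2. Same walk as isOuterLocal: a hit here would need an upvalue,
    //    so the compiler should report a diagnostic instead of emitting code.
    for (int i = static_cast<int>(states.size()) - 2; i >= 0; --i) {
        const auto& ls = states[i].locals;
        for (int j = static_cast<int>(ls.size()) - 1; j >= 1; --j)
            if (ls[j].name == name) return Binding::CapturedOuterLocal;
    }

    // 3. Nothing matched: fall back to a global lookup by name.
    return Binding::Global;
}
```

With states for <script>, outer(a), and inner(), the name a classifies as CapturedOuterLocal while compiling inner, which is the case that must be rejected until cp-12.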

<script> Is a Function Too

Result compile(Program& p) {
    pushFunction("<script>", 0, /*isScript=*/true);
    for (auto& s : p.statements) s->accept(*this);
    emit(Op::Nil); emit(Op::Return);
    auto script = popFunction();
    return Result{script, diagnostics_};
}

Everything composes. No special case for top-level — the VM just calls <script> like any other function.

Compiling CallExpr

void visit(CallExpr& e) override {
    e.callee->accept(*this);                 // pushes <fn>
    for (auto& a : e.args) a->accept(*this); // pushes args
    if (e.args.size() > 255)
        error(e.line, "too many arguments to a single call (>255)");
    emit(Op::Call);
    emit(uint8_t(e.args.size()));
}

At Op::Call N the stack holds the callee followed by its N arguments, with the last argument on top. That is exactly the shape callValue expects, and it is how the static side and the runtime side cooperate.
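The stack shape can be illustrated with a short sketch. callValue itself belongs to step 2 and is not shown in this section, so the code below assumes, in line with the reserved slot 0, that the callee's stack position becomes the base of the new frame. The operand stack is modelled as a vector of strings, and both helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Index of the callee on the operand stack just before Op::Call argc runs.
// Under the assumption above, this is also where frame slot 0 lives.
std::size_t calleeIndex(const std::vector<std::string>& stack,
                        std::size_t argc) {
    assert(stack.size() >= argc + 1);   // callee plus argc arguments
    return stack.size() - argc - 1;
}

// Value seen by the called function at a given frame slot:
// slot 0 is the callee, slots 1..argc are the arguments in order.
const std::string& frameSlot(const std::vector<std::string>& stack,
                             std::size_t argc, std::size_t slot) {
    assert(slot <= argc);
    return stack[calleeIndex(stack, argc) + slot];
}
```

For a stack of {"tmp", "<fn add>", "1", "2"} and a call with two arguments, the callee sits at index 1, and frame slots 1 and 2 hold "1" and "2". These are the same slots the compiler assigned to the first and second parameters.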

Compiling ReturnStmt

void visit(ReturnStmt& s) override {
    if (states_.back().isScript)
        error(s.line, "'return' outside a function");
    if (s.value) s.value->accept(*this);
    else         emit(Op::Nil);
    emit(Op::Return);
}

Pitfalls

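  • Forgetting the reserved slot 0. Every parameter then lands one slot too low: the first parameter resolves to slot 0, which at runtime holds the callee rather than an argument.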
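  • Calling pushFunction after you have started emitting code for the function. Anything emitted before the push lands in the outer chunk. The fresh FunctionState's chunk is empty by design, and nothing should be emitted into it before the body.
  • Holding a reference across pushFunction/popFunction. A cached Chunk& keeps pointing at the chunk that was current when it was taken, so later emits go to the wrong function. A cached std::vector<Local>& or int& is worse: those objects live inside states_, and states_.push_back can reallocate the vector and leave them dangling. Always go through the chunk()/locals()/scopeDepth() accessors.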