Step 1 — Operand Stack, Values, and the Dispatch Loop

Goal

Build the minimum VM that can execute a flat sequence of opcodes acting on a stack of Values. At the end of this step:

push(Constant 3); push(Constant 4); Add; Print;

prints 7.

The Operand Stack

A stack virtual machine evaluates expressions by pushing intermediates to a LIFO buffer and popping them when consumed. This mirrors how a tree-walker recurses, but flattens the call sequence into a linear instruction stream.

class VM {
    std::vector<Value> stack_;
    void push(Value v)     { stack_.push_back(std::move(v)); }
    Value pop()            { Value v = stack_.back(); stack_.pop_back(); return v; }
    Value& peek(size_t off=0) { return stack_[stack_.size()-1-off]; }
};

Important properties:

  • The stack is vector<Value> (not array<Value, 256>). Real VMs pre-allocate, but vector is simpler and correct.
  • Every instruction has a known stack effect — e.g. Add is -2 / +1. The compiler tracks this implicitly; bugs here manifest as crashes deep in unrelated code because the wrong value is on top.

The Value Union

A VM Value is a tagged sum, NOT a pointer. Cache-friendly, no allocation for primitives:

enum class ValueKind : uint8_t { Nil, Bool, Number, Str, Fn };

struct Value {
    ValueKind kind;
    union {
        bool   b;
        double n;
    };
    std::string s;     // outside union — has a non-trivial destructor
    FunctionPtr fn;    // shared_ptr — Fn case
};

Aside — why std::string outside the union? C++ unions can hold non-trivial members but you must placement-new and manually destruct them. That's a real win for production VMs (every cache miss matters) but adds 40 lines of error-prone code. cp-07 chooses clarity; cp-12 swaps in a tagged 64-bit NaN-boxed payload for the JIT.

The Dispatch Loop

The dispatch loop is the single hottest piece of code in any interpreter. It reads an opcode, jumps to its handler, executes, repeats:

for (;;) {
    auto op = Op(readByte());
    switch (op) {
        case Op::Constant: push(readConstant()); break;
        case Op::Add: {
            Value b = pop(), a = pop();
            push(Value::makeNum(a.n + b.n));
            break;
        }
        case Op::Print: {
            (*out) << pop().toString() << "\n";
            break;
        }
        case Op::Return: return Status::Ok;
        // …
    }
}

Why a switch?

GCC and Clang compile a dense switch over a uint8_t to a jump table: one indirect branch per dispatch. It is hard to beat without going to inline assembly.

What about computed goto?

Computed goto (&&Op_Add labels in a static const void* table[256]) reduces branch predictor pressure by giving each handler its own predictor entry — each opcode's "what comes next?" is a separate prediction. Speedups are real (15–30%) but the code is GNU-only and harder to read. We use switch here and revisit computed goto when we profile in cp-12.

Reading Operands

Each opcode may consume immediate bytes (operands) from the instruction stream:

uint8_t  readByte()      { return *ip_++; }
uint16_t readShort()     { uint16_t hi = *ip_++; uint16_t lo = *ip_++; return (hi<<8)|lo; }
Value    readConstant()  { return chunk()->constants[readByte()]; }

The instruction pointer (ip_) is a const uint8_t* into the current chunk's code buffer. Advancing it past the buffer end is undefined behavior — the compiler is responsible for terminating every code path with Return.

Stack-Balance Discipline

Every opcode handler in cp-07 obeys a stack discipline:

OpcodePopsPushes
Constant01
Add/Sub/...21
Neg/Not11
Print10
Pop10
Jump00
JumpIfFalse00
Call NN+11
Return10 (then push result into caller frame)

When you introduce a new opcode, document its stack effect first; the compiler tracks scope depth and locals based on this contract.

Try It

cd src/cpp && cmake --build build -j
echo 'print 2 + 3 * 4;' | ./build/mlr --dump

You should see the disassembly of the script chunk followed by 14.

Pitfalls

  • Popping in the wrong order. For binary ops the right-hand side is on top: b = pop(); a = pop();. Reverse it and 1 - 2 becomes 1.
  • push(pop() + …) evaluation order. C++ does not specify argument evaluation order — sequence the pops into named locals before pushing.
  • Re-reading from stack_ after push. vector::push_back can reallocate, invalidating references. Always copy Values out before pushing.