cp-06 — Bytecode Compiler (AST → Stack-VM Chunks)

Status: ✅ Implemented.

Replaces the tree-walking interpreter with a compile-then-run model: the AST is compiled to a flat array of bytecode (a "chunk"). Chunks are only produced and disassembled here; the VM that executes them arrives in cp-07.

What's Built

  • Op enum — a 32-instruction bytecode ISA: stack manipulation, globals/locals access, arithmetic/logic/comparison, control flow (JUMP, JUMP_IF_FALSE, LOOP), I/O, plus reserved opcodes (CALL, RETURN, CLOSURE, upvalues) that cp-07 will activate.
  • Chunk — bytecode array + deduplicated constants pool + parallel line table.
  • Compiler — AST visitor (both ExprVisitor<void> and StmtVisitor) that emits bytecode while tracking lexical locals as stack slots.
  • Disassembler — human-readable dump of a chunk for debugging and unit testing.
  • mlc CLI: mlc file.ml compiles a file and prints its disassembled chunk; with no arguments, mlc reads the program from stdin.
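
As a rough illustration of the Chunk bullet above, the sketch below shows one way a bytecode array, a deduplicated constants pool, and a parallel line table can fit together. The Value alias and the member names (write, addConstant) are assumptions made for this example and may not match the code in src/cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical value type for illustration; the real one may differ.
using Value = std::variant<double, bool, std::string>;

struct Chunk {
    std::vector<std::uint8_t> code;  // flat bytecode stream
    std::vector<Value> constants;    // deduplicated constants pool
    std::vector<int> lines;          // lines[i] is the source line of code[i]

    // Append one byte and record its source line in the parallel table.
    void write(std::uint8_t byte, int line) {
        code.push_back(byte);
        lines.push_back(line);
    }

    // Return the pool index of v, reusing an existing equal entry if any.
    std::uint8_t addConstant(const Value& v) {
        for (std::size_t i = 0; i < constants.size(); ++i) {
            if (constants[i] == v) return static_cast<std::uint8_t>(i);
        }
        assert(constants.size() < 256 && "a 1-byte index allows 256 constants");
        constants.push_back(v);
        return static_cast<std::uint8_t>(constants.size() - 1);
    }
};
```

The linear scan keeps the sketch short; a hash map from value to index would be the usual choice once pools grow. Keeping lines the same length as code means the disassembler can look up the source line of any byte by its offset.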

Architecture

source → Lexer → Parser → Resolver → TypeChecker → Compiler → Chunk
                                                                │
                                                                └─→ Disassembler → text

The frontend (lex/parse/resolve/typecheck) is reused unchanged from cp-05. The tree-walking interpreter has been deleted, and the new backend stages are Compiler and Disassembler. The tree-walker's Environment chain is gone as well: locals are stack slots, and globals will live in a runtime hash table (not built yet) keyed by name strings interned in the constants pool.
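
To make "locals are stack slots" concrete, here is a minimal sketch of compile-time local tracking. LocalTable and its members are hypothetical names chosen for this example; the actual Compiler in src/cpp may organise this differently. The idea is that a local's position in the compiler's list equals its slot on the runtime stack, and a failed lookup means the name is treated as a global.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Local {
    std::string name;
    int depth;  // scope depth at which the local was declared
};

// Hypothetical helper: tracks which names are locals and which slot each owns.
struct LocalTable {
    std::vector<Local> locals;  // index in this vector == runtime stack slot
    int scopeDepth = 0;         // 0 means global scope

    void beginScope() { ++scopeDepth; }

    // Leave the current scope. Returns how many locals went out of scope,
    // which is how many pops a compiler would need to emit.
    int endScope() {
        assert(scopeDepth > 0);
        int popped = 0;
        while (!locals.empty() && locals.back().depth == scopeDepth) {
            locals.pop_back();
            ++popped;
        }
        --scopeDepth;
        return popped;
    }

    void declare(const std::string& name) {
        locals.push_back({name, scopeDepth});
    }

    // Search innermost-first so shadowing resolves to the newest binding.
    // Returns the stack slot, or -1 when the name should be treated as a global.
    int resolve(const std::string& name) const {
        for (int i = static_cast<int>(locals.size()) - 1; i >= 0; --i) {
            if (locals[i].name == name) return i;
        }
        return -1;
    }
};
```

With this shape, a reference that resolves to slot 2 can be compiled to a local-access instruction with operand 2, while a result of -1 leads the compiler to intern the name in the constants pool and emit a global-access instruction instead.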

Reading Order

  1. CONCEPTS.md — stack machines, bytecode design, operand encoding, why this is faster than tree-walking.
  2. steps/01-instruction-set-design.md
  3. steps/02-the-chunk.md
  4. steps/03-emit-helpers-and-jumps.md
  5. steps/04-locals-vs-globals.md
  6. steps/05-control-flow.md
  7. steps/06-short-circuit-logic.md
  8. steps/07-disassembler-and-testing.md
  9. src/cpp/ — actual code.

Build & Run

cd src/cpp
cmake -S . -B build -G "Unix Makefiles"
cmake --build build -j
ctest --test-dir build --output-on-failure

Then disassemble a program:

echo 'let n = 10; print n * (n + 1) / 2;' | ./build/mlc

Outcomes

After reading the code and steps you can:

  • Design a bytecode instruction set from first principles, justifying every operand width.
  • Compile a typed AST to a flat, executable form using a single forward pass.
  • Encode if/else, while, and short-circuit &&/|| using only conditional jumps.
  • Resolve identifier references to stack slots (locals) vs hash lookups (globals).
  • Disassemble chunks for debugging and assert on the byte stream in unit tests.
  • Articulate the trade-offs between stack VMs (this) and register VMs (Lua, Dalvik).
  • Identify what's deferred to cp-07 (call frames, closures, CALL/RETURN, GC) and why each requires a runtime.
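
The if/else encoding mentioned in the list above relies on forward jumps whose targets are unknown when they are emitted. The sketch below shows the common emit-then-patch approach. The opcode byte values, the 2-byte big-endian offset, and the helper names (emitJump, patchJump, compileIfElse) are assumptions for illustration, not the project's confirmed encoding. Whether the condition value is popped by the jump or by a separate instruction is also left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Opcode byte values are placeholders for this sketch.
enum class Op : std::uint8_t { JUMP = 0x10, JUMP_IF_FALSE = 0x11 };

using Code = std::vector<std::uint8_t>;

// Emit a jump with a 2-byte placeholder operand.
// Returns the offset of the operand so it can be patched later.
std::size_t emitJump(Code& code, Op op) {
    code.push_back(static_cast<std::uint8_t>(op));
    code.push_back(0xFF);
    code.push_back(0xFF);
    return code.size() - 2;
}

// Patch the operand so the jump lands on the next byte to be emitted.
// The distance is measured from the byte just after the operand.
void patchJump(Code& code, std::size_t operandAt) {
    std::size_t distance = code.size() - (operandAt + 2);
    assert(distance <= 0xFFFF && "jump too far for a 2-byte operand");
    code[operandAt] = static_cast<std::uint8_t>((distance >> 8) & 0xFF);
    code[operandAt + 1] = static_cast<std::uint8_t>(distance & 0xFF);
}

// Layout for `if (c) A else B`, assuming the condition is already emitted:
//   JUMP_IF_FALSE -> else ; A ; JUMP -> end ; else: B ; end:
Code compileIfElse(const Code& thenBody, const Code& elseBody) {
    Code code;
    std::size_t toElse = emitJump(code, Op::JUMP_IF_FALSE);
    code.insert(code.end(), thenBody.begin(), thenBody.end());
    std::size_t toEnd = emitJump(code, Op::JUMP);
    patchJump(code, toElse);  // else branch starts here
    code.insert(code.end(), elseBody.begin(), elseBody.end());
    patchJump(code, toEnd);   // both branches rejoin here
    return code;
}
```

Short-circuit && and || can reuse the same two helpers: the right operand plays the role of one branch and is skipped when the left operand already decides the result.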

Limitations (revisited in cp-07)

  • No execution. We compile, we disassemble, we stop. The VM is cp-07's job.
  • No function bodies, calls, or return. Closures need call frames and upvalues — both runtime concepts.
  • Constants are capped at 256 per chunk (1-byte index). cp-07 will add CONSTANT_LONG with a 3-byte index for chunks that need more.
  • No source spans for error reporting beyond line numbers. cp-15 expands this.
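
For the constants cap above, a 3-byte index raises the limit from 256 to 2^24 (16,777,216) entries. The sketch below shows how such an operand could be written and read back. The function names and the big-endian byte order are assumptions made for this example, since CONSTANT_LONG is not implemented yet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kMaxLongIndex = 0xFFFFFF;  // 2^24 - 1

// Append a 24-bit index as three bytes, most significant byte first.
void emitIndex24(std::vector<std::uint8_t>& code, std::uint32_t index) {
    assert(index <= kMaxLongIndex && "index does not fit in 3 bytes");
    code.push_back(static_cast<std::uint8_t>((index >> 16) & 0xFF));
    code.push_back(static_cast<std::uint8_t>((index >> 8) & 0xFF));
    code.push_back(static_cast<std::uint8_t>(index & 0xFF));
}

// Read back a 24-bit index stored at code[at..at+2].
std::uint32_t readIndex24(const std::vector<std::uint8_t>& code, std::size_t at) {
    assert(at + 2 < code.size());
    return (static_cast<std::uint32_t>(code[at]) << 16) |
           (static_cast<std::uint32_t>(code[at + 1]) << 8) |
           static_cast<std::uint32_t>(code[at + 2]);
}
```

A compiler using this scheme would keep emitting the 1-byte form while the index is below 256 and switch to the long form only for later constants, so small chunks pay no size penalty.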