Step 7 — Disassembler and Testing

With no VM yet (that is cp-07's job), the only way to know the compiler is right is to read the bytecode it produces. The disassembler is therefore both a debugging tool and the primary test surface.

See disassembler.cpp and tests/test_compiler.cpp.

Output Format

== <script> ==
0000     1  CONSTANT          0   ; 10
0002     |  DEF_GLOBAL        1   ; n
0004     |  GET_GLOBAL        1   ; n
0006     |  CONSTANT          2   ; 1
0008     |  ADD
0009     |  PRINT
000a     |  RETURN

Per line:

  • 4-hex-digit byte offset.
  • Source line number, or | if same as previous (visual grouping).
  • Opcode name.
  • Operand, if the opcode has one (right-aligned), with a ; comment showing the resolved constant or scope info.
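
The first two columns can be produced with ordinary stream formatting. The following is a minimal sketch, not the code in disassembler.cpp: linePrefix is a hypothetical helper, and it assumes the line table is a vector<int> holding one source line number per code byte.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from disassembler.cpp): builds the offset and
// line-number columns for the instruction that starts at `offset`.
// Assumption: `lines` holds one source line number per code byte.
std::string linePrefix(const std::vector<int>& lines, std::size_t offset) {
    std::ostringstream os;
    // 4-hex-digit, zero-padded byte offset.
    os << std::hex << std::setw(4) << std::setfill('0') << offset
       << std::dec << std::setfill(' ') << ' ';
    // Show '|' when this byte is on the same source line as the previous one.
    if (offset > 0 && lines[offset] == lines[offset - 1]) {
        os << std::setw(5) << '|';
    } else {
        os << std::setw(5) << lines[offset];
    }
    os << "  ";
    return os.str();
}
```

With a line table of all 1s, offset 0 yields "0000     1  " and offset 2 yields "0002     |  ", matching the sample above.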

The format owes everything to Crafting Interpreters' Lox disassembler.

The Dispatcher

Each opcode falls into one of four "shapes":

case Op::Constant:     consumed = constantInstr("CONSTANT", chunk, offset, os); break;
case Op::Pop:          consumed = simple("POP", os); break;
case Op::GetLocal:     consumed = byteInstr("GET_LOCAL", chunk, offset, os); break;
case Op::Jump:         consumed = jumpInstr("JUMP",          +1, chunk, offset, os); break;
case Op::Loop:         consumed = jumpInstr("LOOP",          -1, chunk, offset, os); break;

  • simple — opcode only, advances 1 byte.
  • byteInstr — opcode + 1 operand byte, advances 2.
  • constantInstr — opcode + 1 operand byte (constants index), advances 2, resolves and prints the value.
  • jumpInstr — opcode + 2 operand bytes (big-endian), advances 3, computes and prints the target offset.
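
The target computation in jumpInstr is the only arithmetic among the four shapes, so it is worth spelling out. This is a sketch under stated assumptions: jumpTarget is a hypothetical name, the code stream is taken to be a plain byte vector, and the jump distance is assumed to be relative to the byte just after the 3-byte instruction (the convention Crafting Interpreters uses).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the offset a jump instruction lands on.
// `sign` is +1 for forward jumps (JUMP) and -1 for backward jumps (LOOP).
std::size_t jumpTarget(const std::vector<std::uint8_t>& code,
                       std::size_t offset, int sign) {
    assert(offset + 2 < code.size());  // opcode plus two operand bytes
    // Big-endian operand: high byte first, then low byte.
    const std::uint16_t distance = static_cast<std::uint16_t>(
        (code[offset + 1] << 8) | code[offset + 2]);
    // Assumed convention: the distance is measured from the next instruction.
    const std::ptrdiff_t next = static_cast<std::ptrdiff_t>(offset) + 3;
    return static_cast<std::size_t>(
        next + sign * static_cast<std::ptrdiff_t>(distance));
}
```

For example, a forward jump at offset 0 with operand bytes 00 05 lands on offset 8, and a backward jump at offset 10 with operand bytes 00 0d lands on offset 0.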

disassembleInstruction returns consumed, the number of bytes the instruction occupies, so the outer loop knows how far to advance. Your VM dispatch loop (cp-07) will have the same shape: the two are structurally identical, except that the VM swaps "print this" for "execute this".
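
The outer loop that drives this might look roughly like the sketch below. It uses a toy four-opcode set and a raw byte vector rather than the real Op enum and chunk type, both of which are assumptions made for illustration, and it assumes the code stream is well formed (no truncated operands).

```cpp
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Toy opcode set for illustration only; the real Op enum is larger.
enum class Op : std::uint8_t { Constant, Add, Print, Return };

// Prints one instruction and returns how many bytes it occupies.
std::size_t disassembleInstruction(const std::vector<std::uint8_t>& code,
                                   std::size_t offset, std::ostream& os) {
    switch (static_cast<Op>(code[offset])) {
        case Op::Constant:  // opcode + 1 operand byte
            os << "CONSTANT " << static_cast<int>(code[offset + 1]) << '\n';
            return 2;
        case Op::Add:    os << "ADD\n";    return 1;
        case Op::Print:  os << "PRINT\n";  return 1;
        case Op::Return: os << "RETURN\n"; return 1;
    }
    os << "UNKNOWN\n";  // unrecognized byte: skip it and keep going
    return 1;
}

// The outer loop never increments by a fixed amount; it advances by
// whatever the instruction at `offset` reports having consumed.
std::string disassemble(const std::vector<std::uint8_t>& code) {
    std::ostringstream os;
    for (std::size_t offset = 0; offset < code.size();) {
        offset += disassembleInstruction(code, offset, os);
    }
    return os.str();
}
```

A VM loop would keep the same switch and the same advance-by-size logic, replacing each print with a push, pop, or arithmetic step.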

Test Strategy

Two complementary approaches:

(a) Exact opcode sequence

For short programs where the lowering is fully predictable:

auto out = compileSource("print 1 + 2 * 3;");
CHECK(opsMatch(out.chunk,
    {Op::Constant, Op::Constant, Op::Constant, Op::Mul, Op::Add,
     Op::Print, Op::Return}));

opsMatch walks the byte stream, extracts only the opcodes (skipping operand bytes by knowing the size of each opcode), and compares the resulting vector<Op> to your expected list. It's robust to operand-value churn — if Op::Constant's constant-pool slot changes, the test still passes; only the opcode shape matters.
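
One way such a helper could be written is sketched below. It is not the implementation in tests/test_compiler.cpp: the reduced Op enum, the operandBytes table, and the raw byte vector standing in for out.chunk are all assumptions for the sake of a self-contained example.

```cpp
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Reduced opcode set for illustration; the real enum has more entries.
enum class Op : std::uint8_t {
    Constant, Pop, GetLocal, Jump, Loop, Add, Mul, Print, Return
};

// Number of operand bytes that follow each opcode.
std::size_t operandBytes(Op op) {
    switch (op) {
        case Op::Constant:
        case Op::GetLocal: return 1;
        case Op::Jump:
        case Op::Loop:     return 2;
        default:           return 0;
    }
}

// Walks the byte stream and keeps only the opcodes, skipping operands.
std::vector<Op> extractOps(const std::vector<std::uint8_t>& code) {
    std::vector<Op> ops;
    for (std::size_t i = 0; i < code.size();) {
        const Op op = static_cast<Op>(code[i]);
        ops.push_back(op);
        i += 1 + operandBytes(op);
    }
    return ops;
}

// True when the opcode sequence matches, regardless of operand values.
bool opsMatch(const std::vector<std::uint8_t>& code,
              std::initializer_list<Op> expected) {
    return extractOps(code) == std::vector<Op>(expected);
}
```

Because operand bytes are skipped rather than compared, two chunks that differ only in constant-pool indices produce the same opcode vector.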

(b) Substring match on the disassembly

For control flow where exact jump offsets are noisy but landmark opcodes matter:

CHECK_CONTAINS(out.disasm, "LOOP");
CHECK_CONTAINS(out.disasm, "JUMP_IF_FALSE");

Use this when the presence of an opcode is the assertion, not the exact byte sequence.

Negative tests

auto out = compileSource("{ let a = 1; a = 2; }");
CHECK(!out.compiledOk);
CHECK(/* "immutable" appears in some diagnostic */);

The compiler collects diagnostics without throwing, so tests can verify both the failure and the message text.
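
The placeholder check on the message text could be filled in with a small search helper. The sketch below is hypothetical: the Diagnostic struct, its fields, and the name anyDiagnosticContains are illustrative assumptions, and the compiler's actual diagnostic type may look different.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical diagnostic record; the real compiler's type may differ.
struct Diagnostic {
    int line;
    std::string message;
};

// True if any collected diagnostic mentions `needle` in its message.
bool anyDiagnosticContains(const std::vector<Diagnostic>& diags,
                           const std::string& needle) {
    return std::any_of(diags.begin(), diags.end(),
                       [&](const Diagnostic& d) {
                           return d.message.find(needle) != std::string::npos;
                       });
}
```

Matching on a keyword such as "immutable" rather than the full message keeps the test stable when the wording of a diagnostic is polished later.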

What the Tests Cover

  • Arithmetic / unary / logical operators emit the right opcodes in the right order.
  • Globals get DEF_GLOBAL/GET_GLOBAL/SET_GLOBAL; locals get GET_LOCAL/SET_LOCAL.
  • Leaving a block scope emits one Pop per local declared in it.
  • let immutability is enforced.
  • if/else and while emit Jump/JumpIfFalse/Loop correctly with paired Pops.
  • String constants are deduplicated in the pool.
  • The line table has the same length as the code stream and contains real source line numbers.
  • fn declarations and calls emit clear "deferred to cp-07" diagnostics.
  • Redeclaring a local in the same scope is reported as an error at resolve or compile time.
  • Short-circuit && lowers exactly as expected.

That's 15 tests covering all currently-supported features and the principled subset of unsupported ones.

Using the Disassembler at the REPL

echo 'var x = 0; while (x < 3) { x = x + 1; print x; }' | ./build/mlc

Read the output as a sanity check on any compiler change you make. cp-07's VM will replay these same bytes.

Self-Check

  • What does a typical disassembled IfStmt look like? Predict the line count.
  • Why is opsMatch (approach a) preferable to a string comparison on the disassembly for short programs?
  • How would you extend the disassembler to print stack-effect estimates next to each opcode?