Step 7 — Disassembler and Testing
Without a VM yet (cp-07's job), the only way to know the compiler is right is to read the bytecode it produces. The disassembler is therefore both a debugging tool and the primary test surface.
See disassembler.cpp and tests/test_compiler.cpp.
Output Format
== <script> ==
0000 1 CONSTANT 0 ; 10
0002 | DEF_GLOBAL 1 ; n
0004 | GET_GLOBAL 1 ; n
0006 | CONSTANT 2 ; 1
0008 | ADD
0009 | PRINT
000a | RETURN
Per line:
- 4-hex-digit byte offset.
- Source line number, or | if same as previous (visual grouping).
- Opcode name.
- Operand (right-aligned), with a ; comment showing the resolved constant or scope info.
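The first two columns can be sketched as below. This is a minimal illustration, not the code in disassembler.cpp: the helper name linePrefix and the column widths are assumptions, and it relies on the line table holding one source line per code byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: formats the offset and line columns of one output row.
// Assumes `lines` holds one source line number per byte of the code stream.
std::string linePrefix(const std::vector<int>& lines, std::size_t offset) {
    assert(offset < lines.size());
    char buf[32];
    if (offset > 0 && lines[offset] == lines[offset - 1]) {
        // Same source line as the previous byte: print the grouping bar.
        std::snprintf(buf, sizeof buf, "%04zx    | ", offset);
    } else {
        // New source line: print its number, right-aligned in 4 columns.
        std::snprintf(buf, sizeof buf, "%04zx %4d ", offset, lines[offset]);
    }
    return buf;
}
```

Both branches produce a prefix of the same width, so the opcode column stays aligned whether or not the line number is repeated.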
The format owes everything to Crafting Interpreters' Lox disassembler.
The Dispatcher
Each opcode falls into one of four "shapes":
case Op::Constant: consumed = constantInstr("CONSTANT", chunk, offset, os); break;
case Op::Pop: consumed = simple("POP", os); break;
case Op::GetLocal: consumed = byteInstr("GET_LOCAL", chunk, offset, os); break;
case Op::Jump: consumed = jumpInstr("JUMP", +1, chunk, offset, os); break;
case Op::Loop: consumed = jumpInstr("LOOP", -1, chunk, offset, os); break;
- simple: opcode only, advances 1 byte.
- byteInstr: opcode + 1 operand byte, advances 2.
- constantInstr: opcode + 1 operand byte (constants index), advances 2, resolves and prints the value.
- jumpInstr: opcode + 2 operand bytes (big-endian), advances 3, computes and prints the target offset.
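The jump shape is the only one that does arithmetic, so here is a hedged sketch of it. The function names are hypothetical, and the sketch assumes the jump distance is measured from the byte after the 3-byte instruction (the convention Crafting Interpreters uses); check disassembler.cpp for the actual rule.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decodes the 16-bit big-endian operand that follows the opcode at `offset`.
std::uint16_t readJumpOperand(const std::vector<std::uint8_t>& code,
                              std::size_t offset) {
    assert(offset + 2 < code.size());  // opcode plus two operand bytes
    return static_cast<std::uint16_t>((code[offset + 1] << 8) | code[offset + 2]);
}

// Computes the target offset. `sign` is +1 for forward jumps, -1 for loops.
// Assumption: the distance is relative to the end of the instruction.
std::size_t jumpTarget(const std::vector<std::uint8_t>& code,
                       std::size_t offset, int sign) {
    const std::size_t next = offset + 3;
    const std::uint16_t distance = readJumpOperand(code, offset);
    if (sign > 0) return next + distance;
    assert(distance <= next);  // a backward jump cannot land before byte 0
    return next - distance;
}
```

Passing the sign as a parameter is what lets JUMP and LOOP share one helper, as in the dispatcher cases above.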
disassembleInstruction returns consumed so the outer loop knows how many bytes to skip. Your VM dispatch loop (cp-07) will have the same shape: the disassembler and the VM are structurally isomorphic, and the VM swaps "print this" for "execute this".
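A stripped-down version of that outer loop might look like the sketch below. The Op subset and its numeric values are hypothetical, and the function takes a raw byte vector rather than the project's chunk type, so treat it as an illustration of the control flow only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical opcode subset; the real Op enum is larger and may be numbered differently.
enum class Op : std::uint8_t { Constant, Pop, Add, Return };

// Prints one instruction and returns the number of bytes it occupied.
std::size_t disassembleInstruction(const std::vector<std::uint8_t>& code,
                                   std::size_t offset, std::ostream& os) {
    assert(offset < code.size());
    switch (static_cast<Op>(code[offset])) {
        case Op::Constant:
            assert(offset + 1 < code.size());  // operand byte must exist
            os << "CONSTANT " << static_cast<int>(code[offset + 1]) << '\n';
            return 2;
        case Op::Pop:    os << "POP\n";    return 1;
        case Op::Add:    os << "ADD\n";    return 1;
        case Op::Return: os << "RETURN\n"; return 1;
    }
    // Unknown byte: still advance by one so the outer loop terminates.
    os << "UNKNOWN " << static_cast<int>(code[offset]) << '\n';
    return 1;
}

// The outer loop: each instruction reports how far to advance.
void disassembleChunk(const std::vector<std::uint8_t>& code, std::ostream& os) {
    for (std::size_t offset = 0; offset < code.size();) {
        offset += disassembleInstruction(code, offset, os);
    }
}
```

Replacing each `os << ...` with a stack operation gives the skeleton of an interpreter loop, which is the structural similarity described above.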
Test Strategy
Two complementary approaches:
(a) Exact opcode sequence
For short programs where the lowering is fully predictable:
auto out = compileSource("print 1 + 2 * 3;");
CHECK(opsMatch(out.chunk,
{Op::Constant, Op::Constant, Op::Constant, Op::Mul, Op::Add,
Op::Print, Op::Return}));
opsMatch walks the byte stream, extracts only the opcodes (skipping operand bytes using each opcode's known operand size), and compares the resulting vector<Op> to your expected list. It's robust to operand-value churn: if Op::Constant's constant-pool slot changes, the test still passes, because only the opcode sequence matters.
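One way such a helper could be written is sketched here. The Op values and the operandBytes table are assumptions made for the example, and this version takes the raw code bytes instead of out.chunk; the real helper lives in tests/test_compiler.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcode subset with made-up numeric values.
enum class Op : std::uint8_t { Constant, Add, Mul, Print, Return, Jump };

// Number of operand bytes that follow each opcode (assumed sizes).
std::size_t operandBytes(Op op) {
    switch (op) {
        case Op::Constant: return 1;
        case Op::Jump:     return 2;
        default:           return 0;
    }
}

// Walks the byte stream and keeps only the opcodes.
std::vector<Op> extractOps(const std::vector<std::uint8_t>& code) {
    std::vector<Op> ops;
    std::size_t i = 0;
    while (i < code.size()) {
        const Op op = static_cast<Op>(code[i]);
        assert(i + 1 + operandBytes(op) <= code.size());  // no truncated instruction
        ops.push_back(op);
        i += 1 + operandBytes(op);  // skip the opcode and its operands
    }
    return ops;
}

// True when the opcode sequence equals `expected`, ignoring operand values.
bool opsMatch(const std::vector<std::uint8_t>& code, const std::vector<Op>& expected) {
    return extractOps(code) == expected;
}
```

Because operand bytes are skipped rather than compared, renumbering the constant pool leaves the result unchanged, while adding, removing, or reordering an instruction makes the comparison fail.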
(b) Substring match on the disassembly
For control flow where exact jump offsets are noisy but landmark opcodes matter:
CHECK_CONTAINS(out.disasm, "LOOP");
CHECK_CONTAINS(out.disasm, "JUMP_IF_FALSE");
Use this when the presence of an opcode is the assertion, not the exact byte sequence.
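One caveat with plain substring matching is that "JUMP" is also a substring of "JUMP_IF_FALSE". The sketch below contrasts a substring check with a whole-token check; both helper names are hypothetical and are not part of the project's test macros.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Plain substring search, the behaviour a CHECK_CONTAINS-style macro is assumed to have.
bool containsText(const std::string& disasm, const std::string& needle) {
    assert(!needle.empty());
    return disasm.find(needle) != std::string::npos;
}

// Stricter variant: `name` must appear as a whole whitespace-delimited token.
bool containsToken(const std::string& disasm, const std::string& name) {
    assert(!name.empty());
    std::istringstream in(disasm);
    std::string token;
    while (in >> token) {
        if (token == name) return true;
    }
    return false;
}
```

For landmark opcodes with unique names such as LOOP, the substring form is enough; the token form only matters when one opcode name is a prefix of another.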
Negative tests
auto out = compileSource("{ let a = 1; a = 2; }");
CHECK(!out.compiledOk);
CHECK(/* "immutable" appears in some diagnostic */);
The compiler collects diagnostics without throwing, so tests can verify both the failure and the message text.
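The collect-then-inspect pattern can be sketched as follows. CompileOutput here is a hypothetical stand-in: the compiledOk flag appears in the test above, but the diagnostics field, report, and anyDiagnosticContains are names invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the compiler's result object.
struct CompileOutput {
    bool compiledOk = true;
    std::vector<std::string> diagnostics;  // assumed field name
};

// Records an error instead of throwing, so compilation can keep going.
void report(CompileOutput& out, const std::string& message) {
    assert(!message.empty());
    out.diagnostics.push_back(message);
    out.compiledOk = false;
}

// True if any collected diagnostic mentions `text`.
bool anyDiagnosticContains(const CompileOutput& out, const std::string& text) {
    return std::any_of(out.diagnostics.begin(), out.diagnostics.end(),
                       [&](const std::string& d) {
                           return d.find(text) != std::string::npos;
                       });
}
```

With a helper like this, the placeholder assertion in the negative test could become a check that some diagnostic contains "immutable", without pinning the full message wording.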
What the Tests Cover
- Arithmetic / unary / logical operators emit the right opcodes in the right order.
- Globals get DEF_GLOBAL/GET_GLOBAL/SET_GLOBAL; locals get GET_LOCAL/SET_LOCAL.
- Block scope correctly emits per-local Pop on exit.
- let immutability is enforced.
- if/else and while emit Jump/JumpIfFalse/Loop correctly with paired Pops.
- String constants are deduplicated in the pool.
- The line table has the same length as the code stream and contains real source line numbers.
- fn declarations and calls emit clear "deferred to cp-07" diagnostics.
- Same-scope local redeclaration errors at resolve or compile time.
- Short-circuit && lowers exactly as expected.
That's 15 tests covering every currently supported feature plus a deliberately chosen subset of unsupported ones.
Using the Disassembler at the REPL
echo 'var x = 0; while (x < 3) { x = x + 1; print x; }' | ./build/mlc
Read the output as a sanity check on any compiler change you make. cp-07's VM will replay these same bytes.
Self-Check
- What does a typical disassembled IfStmt look like? Predict the line count.
- Why is opsMatch (a) preferable to a string compare on the disassembly for short programs?
- How would you extend the disassembler to print stack-effect estimates next to each opcode?