cp-08 — Three-Address Code IR

cp-08 adds a new compiler middle-end that lowers the resolved, type-checked AST into Three-Address Code (TAC), the classic compiler IR taught in the Dragon Book. The bytecode VM of cp-07 was a good way to run code but a poor representation for reasoning about it: an operand stack hides def/use relationships, and locals are addressed by slot rather than by name.

TAC reverses these tradeoffs. Each instruction performs at most one operation and writes its result, if any, to a single named destination, for example t3 = add t1, t2. Control flow lives in a control-flow graph (CFG) of basic blocks connected by explicit jumps. This is exactly the shape an SSA construction algorithm wants in cp-09, and it's the shape LLVM IR will demand in cp-11.
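To make the one-operation, one-destination shape concrete, here is a minimal sketch of flattening (a + b) * c into three-address form. The names Expr, leaf, binary and Lowerer are hypothetical and are not the API in src/ir_builder.*; operands are plain strings to keep the example short, and the mul op name is an assumption (only add appears in this README).

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified AST: a leaf names a variable, an inner node
// applies one binary operation to two subexpressions.
struct Expr {
    std::string op;                  // "add", "mul", ...; empty for a leaf
    std::string name;                // variable name, used only by leaves
    std::unique_ptr<Expr> lhs, rhs;
};

std::unique_ptr<Expr> leaf(std::string name) {
    auto e = std::make_unique<Expr>();
    e->name = std::move(name);
    return e;
}

std::unique_ptr<Expr> binary(std::string op, std::unique_ptr<Expr> l,
                             std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->op = std::move(op);
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Emits one instruction per operation; every result lands in a fresh temp.
struct Lowerer {
    std::vector<std::string> code;
    int next_temp = 0;

    // Returns the operand that holds the value of `e`.
    std::string lower(const Expr& e) {
        if (e.op.empty()) return "%" + e.name;   // named storage, no instruction
        std::string a = lower(*e.lhs);           // left operand is lowered first
        std::string b = lower(*e.rhs);
        std::string dst = "t" + std::to_string(next_temp++);
        code.push_back(dst + " = " + e.op + " " + a + ", " + b);
        return dst;
    }
};
```

Lowering binary("mul", binary("add", leaf("a"), leaf("b")), leaf("c")) with this sketch leaves two instructions in code: t0 = add %a, %b followed by t1 = mul t0, %c. The def/use chain that an operand stack would hide is now visible in the operand names.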

What's in the box

File               Purpose
src/ir.hpp/cpp     Operand, Op enum, Instr, BasicBlock, Function, Module
src/ir_printer.*   Textual IR pretty-printer (the "assembly" we read in tests)
src/ir_builder.*   AST → IR lowering pass
src/main.cpp       mltac CLI driver: source → IR text on stdout
tests/test_ir.cpp  String-level golden tests over the printed IR

The pipeline is now:

source ─► lexer ─► parser ─► resolver ─► typecheck ─► ir::Builder ─► Module

There is no execution stage in cp-08. cp-09 wires up an interpreter that walks this IR directly (and adds SSA + a couple of optimisation passes).

Build & run

cmake -S src/cpp -B src/cpp/build
cmake --build src/cpp/build -j
ctest --test-dir src/cpp/build --output-on-failure
echo 'fn add(a,b){return a+b;} print add(3,4);' | ./src/cpp/build/mltac

Expected output:

fn @__script__() {
bb0 (entry):
    t0 = call @add(3, 4)
    print t0
    ret
}

fn @add(%a, %b) {
bb0 (entry):
    t0 = add %a, %b
    ret t0
}
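As an illustration of how text in this format could be produced, the following sketch prints a single function. The struct layouts and print_function are hypothetical simplifications, not the contents of src/ir.hpp or src/ir_printer.*; in particular, instructions are stored as pre-formatted strings, and the sketch assumes that only the first block carries the (entry) tag.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified IR containers for illustration only.
struct Instr {
    std::string text;                 // already formatted, e.g. "t0 = add %a, %b"
};

struct BasicBlock {
    std::string label;                // e.g. "bb0"
    std::vector<Instr> instrs;
};

struct Function {
    std::string name;                 // printed with a leading '@'
    std::vector<std::string> params;  // printed with a leading '%'
    std::vector<BasicBlock> blocks;   // blocks[0] is treated as the entry block
};

std::string print_function(const Function& f) {
    std::ostringstream out;
    out << "fn @" << f.name << "(";
    for (std::size_t i = 0; i < f.params.size(); ++i) {
        if (i != 0) out << ", ";
        out << "%" << f.params[i];
    }
    out << ") {\n";
    for (std::size_t i = 0; i < f.blocks.size(); ++i) {
        out << f.blocks[i].label;
        if (i == 0) out << " (entry)";   // assumption: only the first block is tagged
        out << ":\n";
        for (const Instr& in : f.blocks[i].instrs) {
            out << "    " << in.text << "\n";   // four-space indent, as in the output above
        }
    }
    out << "}\n";
    return out.str();
}
```

Because the result is a plain string, a golden test in the style of tests/test_ir.cpp can compare it directly against the expected text for @add.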

What's new conceptually

  • Three operand kinds. t<n> temps (SSA-friendly), %name named storage (local variables / params), and immediate constants.
  • One op per instruction. Compound expressions are flattened by introducing fresh temps for each subexpression result.
  • Globals through memory ops. ldg @x / stg @x, v make global reads and writes explicit — paralleling LLVM's load/store.
  • Explicit control flow. Every block ends in a terminator (jmp, cjmp, ret). No fall-through. No implicit "next instruction".
  • Short-circuit lowered to branches. a && b becomes a cjmp plus a join block, just as cp-07 did with patchable jumps — but now the join lives in the CFG, ready for phi insertion in cp-09.
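The last two points can be sketched together: a lowering of a && b into three blocks, plus a check that every block ends in exactly one terminator. Everything here is an assumption made for illustration: the mov op, the cjmp operand order (condition, target if true, target if false), the fixed labels bb0 to bb2, and the helper names lower_and and well_formed. The real builder may merge the two paths differently before cp-09 introduces phis.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical op set; only jmp, cjmp and ret are named in this README.
enum class Op { Mov, CJmp, Jmp, Ret };

struct Instr {
    Op op;
    std::vector<std::string> args;
};

struct BasicBlock {
    std::string label;
    std::vector<Instr> instrs;
};

bool is_terminator(Op op) {
    return op == Op::CJmp || op == Op::Jmp || op == Op::Ret;
}

// True when every block is non-empty, ends in a terminator, and has no
// terminator before its last instruction (no fall-through, no dead tail).
bool well_formed(const std::vector<BasicBlock>& blocks) {
    for (const BasicBlock& bb : blocks) {
        if (bb.instrs.empty()) return false;
        for (std::size_t i = 0; i < bb.instrs.size(); ++i) {
            bool last = (i + 1 == bb.instrs.size());
            if (is_terminator(bb.instrs[i].op) != last) return false;
        }
    }
    return true;
}

// Lowers `result = a && b`. In a real lowering, the instructions that
// compute b would be emitted into bb1 so they are skipped when a is false.
std::vector<BasicBlock> lower_and(const std::string& a, const std::string& b,
                                  const std::string& result) {
    BasicBlock entry{"bb0", {Instr{Op::Mov, {result, a}},
                             Instr{Op::CJmp, {result, "bb1", "bb2"}}}};
    BasicBlock rhs{"bb1", {Instr{Op::Mov, {result, b}},
                           Instr{Op::Jmp, {"bb2"}}}};
    BasicBlock join{"bb2", {Instr{Op::Ret, {result}}}};   // join block; ret only ends the example
    return {entry, rhs, join};
}
```

In this sketch the result is written on both paths into the same destination, which is not SSA; the join block bb2 is where an SSA construction pass would place a phi.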

Reading order

The seven step docs in steps/ follow the same progression as the code:

  1. 01-tac-and-three-address-form.md
  2. 02-operands-and-instructions.md
  3. 03-basic-blocks-and-cfg.md
  4. 04-lowering-expressions.md
  5. 05-lowering-statements-and-control-flow.md
  6. 06-short-circuit-and-phi-preview.md
  7. 07-printer-and-debugging.md