cp-08 — Three-Address Code IR
A new compiler middle-end that lowers the resolved/type-checked AST into Three-Address Code (TAC), the canonical IR taught in compiler textbooks such as the Dragon Book. The bytecode VM of cp-07 was a great way to run code, but a poor representation for reasoning about code: an operand stack hides def/use relationships, and locals are addressed by slot rather than name.
TAC reverses these tradeoffs. Each instruction has at most one operation
and writes its result into one named destination, e.g. `t3 = add t1, t2`.
Control flow lives in a control-flow graph (CFG) of basic blocks
connected by explicit jumps. This is exactly the shape an SSA construction
algorithm wants in cp-09, and it's the shape LLVM IR will demand in cp-11.
What's in the box
| File | Purpose |
|---|---|
| `src/ir.hpp/cpp` | `Operand`, `Op` enum, `Instr`, `BasicBlock`, `Function`, `Module` |
| `src/ir_printer.*` | Textual IR pretty-printer (the "assembly" we read in tests) |
| `src/ir_builder.*` | AST → IR lowering pass |
| `src/main.cpp` | `mltac` CLI driver: source → IR text on stdout |
| `tests/test_ir.cpp` | String-level golden tests over the printed IR |
The pipeline is now:
source ─► lexer ─► parser ─► resolver ─► typecheck ─► ir::Builder ─► Module
There is no execution stage in cp-08. cp-09 wires up an interpreter that walks this IR directly (and adds SSA + a couple of optimisation passes).
Build & run
cmake -S src/cpp -B src/cpp/build
cmake --build src/cpp/build -j
ctest --test-dir src/cpp/build --output-on-failure
echo 'fn add(a,b){return a+b;} print add(3,4);' | ./src/cpp/build/mltac
Expected output:
fn @__script__() {
bb0 (entry):
t0 = call @add(3, 4)
print t0
ret
}
fn @add(%a, %b) {
bb0 (entry):
t0 = add %a, %b
ret t0
}
What's new conceptually
- Three operand kinds. `t<n>` temps (SSA-friendly), `%name` named storage (local variables / params), and immediate constants.
- One op per instruction. Compound expressions are flattened by introducing fresh temps for each subexpression result.
- Globals through memory ops. `ldg @x` / `stg @x, v` make global reads and writes explicit, paralleling LLVM's `load`/`store`.
- Explicit control flow. Every block ends in a terminator (`jmp`, `cjmp`, `ret`). No fall-through. No implicit "next instruction".
- Short-circuit lowered to branches. `a && b` becomes a `cjmp` plus a join block, just as cp-07 did with patchable jumps, but now the join lives in the CFG, ready for phi insertion in cp-09.
Reading order
The seven step docs in steps/ follow the same progression as the code:
- `01-tac-and-three-address-form.md`
- `02-operands-and-instructions.md`
- `03-basic-blocks-and-cfg.md`
- `04-lowering-expressions.md`
- `05-lowering-statements-and-control-flow.md`
- `06-short-circuit-and-phi-preview.md`
- `07-printer-and-debugging.md`