Step 1 — TAC and three-address form
Why a new IR?
The cp-07 bytecode VM was a perfectly good interpreter. But the moment you try to do interesting things to the program — eliminate dead code, fold constants, allocate registers, prove that two pointers don't alias — the stack representation fights you at every turn.
Three things make a stack machine awkward for analysis:
- Implicit operands.
ADDdoesn't say what it adds; you have to simulate the stack to find out. Every analysis becomes an abstract interpretation. - Position-dependent. Reordering two adjacent instructions changes what's on the stack. You can't pattern-match peepholes locally.
- No SSA story. SSA wants every value to have a name; stack slots are anonymous and transient.
TAC fixes all three by writing each operation in the form
dst = op src1, src2
— at most one operation per instruction, every operand explicit, every result named.
A first example
Source:
print (1 + 2) * 3;
Stack bytecode (cp-07-style):
PUSH 1
PUSH 2
ADD
PUSH 3
MUL
PRINT
TAC:
t0 = add 1, 2
t1 = mul t0, 3
print t1
The TAC form has more "instructions" by raw count, but each line is a
fully self-contained def with explicit uses. You can ask "what defines
t1?" without simulating anything — just grep.
Three operand kinds
Our Operand is a tagged variant with four cases:
None— placeholder for instructions that produce no value.Temp(n)—t0,t1, ... — single-assignment compiler-generated intermediates. In cp-09 these become SSA values.Constant(Value)— an immediate, printed inline.Named(name)—%x,%a— a local variable or parameter. Named operands behave like alloca'd memory slots (read and write via Move), but at TAC level we expose them as first-class operands. cp-09's mem2reg pass promotes these into SSA temps.
Globals do not get an Operand form; they are accessed through explicit
ldg @x / stg @x instructions. That mirrors LLVM's model where
"variable" really means "memory cell" and SSA only describes local
register flow.
The deal we're making
TAC introduces verbosity and a small constant-factor compile-time hit compared to direct bytecode generation. In exchange, we get:
- a uniform substrate for all later analyses (cp-09 SSA, cp-10 passes, cp-11 LLVM, cp-13 MLIR),
- a printable, diff-able IR that makes compiler bugs visible,
- a CFG abstraction we can use to talk precisely about reachability, dominance, and loop structure.
That's the trade every production compiler makes. cp-08 just makes us pay the price explicitly so cp-09 onwards can spend the proceeds.