Step 1 — TAC and three-address form

Why a new IR?

The cp-07 bytecode VM was a perfectly good interpreter. But the moment you try to do interesting things to the program — eliminate dead code, fold constants, allocate registers, prove that two pointers don't alias — the stack representation fights you at every turn.

Three things make a stack machine awkward for analysis:

  1. Implicit operands. ADD doesn't say what it adds; you have to simulate the stack to find out. Every analysis becomes an abstract interpretation.
  2. Position-dependent. Reordering two adjacent instructions changes what's on the stack. You can't pattern-match peepholes locally.
  3. No SSA story. SSA wants every value to have a name; stack slots are anonymous and transient.

TAC fixes all three by writing each operation in the form

dst = op src1, src2

— at most one operation per instruction, every operand explicit, every result named.

A first example

Source:

print (1 + 2) * 3;

Stack bytecode (cp-07-style):

PUSH 1
PUSH 2
ADD
PUSH 3
MUL
PRINT

TAC:

t0 = add 1, 2
t1 = mul t0, 3
print t1

The TAC form has more "instructions" by raw count, but each line is a fully self-contained def with explicit uses. You can ask "what defines t1?" without simulating anything — just grep.

Three operand kinds

Our Operand is a tagged variant with four cases:

  • None — placeholder for instructions that produce no value.
  • Temp(n)t0, t1, ... — single-assignment compiler-generated intermediates. In cp-09 these become SSA values.
  • Constant(Value) — an immediate, printed inline.
  • Named(name)%x, %a — a local variable or parameter. Named operands behave like alloca'd memory slots (read and write via Move), but at TAC level we expose them as first-class operands. cp-09's mem2reg pass promotes these into SSA temps.

Globals do not get an Operand form; they are accessed through explicit ldg @x / stg @x instructions. That mirrors LLVM's model where "variable" really means "memory cell" and SSA only describes local register flow.

The deal we're making

TAC introduces verbosity and a small constant-factor compile-time hit compared to direct bytecode generation. In exchange, we get:

  • a uniform substrate for all later analyses (cp-09 SSA, cp-10 passes, cp-11 LLVM, cp-13 MLIR),
  • a printable, diff-able IR that makes compiler bugs visible,
  • a CFG abstraction we can use to talk precisely about reachability, dominance, and loop structure.

That's the trade every production compiler makes. cp-08 just makes us pay the price explicitly so cp-09 onwards can spend the proceeds.