Step 3 — SSA: the universal IR shape

Static Single Assignment is the IR shape that essentially every modern compiler uses internally. LLVM IR is SSA. HotSpot's C2 is SSA. V8's Sparkplug and Turbofan are SSA. Cranelift is SSA. WebKit's B3 is SSA. MLIR is SSA. Even pre-LLVM compilers (gcc ≥ 4.0) added SSA in the form of "GIMPLE" + tree-SSA.

The core idea

Every variable is assigned exactly once. If source code re-assigns, we create a new SSA name:

// source                IR (SSA)
x = 1;                   %x.1 = 1
x = x + 2;               %x.2 = add %x.1, 2
print x;                 print %x.2

Every %name has exactly one definition. That is the magic property: to find the value of %x.2, you don't need to scan blocks looking for the most recent assignment — there is no most-recent, there is the one. Every analysis that classical IR does with backwards walks becomes a one-step lookup in SSA.

The hard part: control flow

if (cond) { x = 1; } else { x = 2; }
print x;

In the join block, what is x? Neither %x.1 nor %x.2 alone. SSA solves this with phi nodes:

bb_then:  %x.1 = 1; jmp bb_join
bb_else:  %x.2 = 2; jmp bb_join
bb_join:  %x.3 = phi [%x.1, bb_then], [%x.2, bb_else]
          print %x.3

A phi instruction at the top of a block picks which incoming value to use based on which predecessor flow came from. It is the SSA analogue of "the variable's most recent definition".

Where do phi nodes go?

The classical answer is the dominance frontier:

A block B is in the dominance frontier of A if A dominates a predecessor of B but does not dominate B itself.

Every block where two control paths from different definitions can join is in some definition's dominance frontier — and that's exactly where we need a phi.

The Cytron et al. (1991) algorithm computes dominance frontiers, then inserts phi nodes, then renames every variable use to refer to its unique SSA definition. This is mem2reg in LLVM terminology — "promote memory locations (allocas) into SSA registers".

Why we're not implementing it here

Full mem2reg is ~600 lines including dominator-tree construction (Lengauer–Tarjan or Cooper-Harvey-Kennedy iterative), DF computation, phi insertion, and the dominator-tree renaming walk. It is a worthy exercise — and the standard reference implementation (LLVM's Utils/PromoteMemoryToRegister.cpp) is the right thing to read.

Our constFold cheats: it requires that a temp has exactly one definition (which TAC builder-produced IR already satisfies for temps, since each tN is fresh), and propagates only those. That gets us constant folding through binary-op chains without ever computing a dominator tree.

When we move to LLVM in cp-10/11, mem2reg comes for free as a built-in optimisation pass. The point of this step is conceptual: you should be able to draw the dominance frontier of an arbitrary CFG on a whiteboard, and you should know what a phi node is and where it goes.

Practical SSA in real compilers

LLVM — SSA from frontend; mem2reg promotes allocas; instcombine, GVN, EarlyCSE, SROA all assume SSA.
HotSpot C2 — "Ideal Graph" is a sea-of-nodes SSA variant.
V8 Turbofan / Sparkplug — sea-of-nodes graph IR; SSA invariant.
Cranelift — strict SSA, block parameters instead of phi nodes (semantically equivalent but easier to manipulate).
GCC — GIMPLE → tree-SSA → RTL; first major non-research compiler to adopt SSA in mainline.

Compilers & Parser Engineer