Step 3 — SSA: the universal IR shape
Static Single Assignment is the IR shape that essentially every modern
compiler uses internally. LLVM IR is SSA. HotSpot's C2 is SSA. V8's
Sparkplug and Turbofan are SSA. Cranelift is SSA. WebKit's B3 is SSA.
MLIR is SSA. Even pre-LLVM compilers (gcc ≥ 4.0) added SSA in the
form of "GIMPLE" + tree-SSA.
The core idea
Every variable is assigned exactly once. If source code re-assigns, we create a new SSA name:
// source IR (SSA)
x = 1; %x.1 = 1
x = x + 2; %x.2 = add %x.1, 2
print x; print %x.2
Every %name has exactly one definition. That is the magic property:
to find the value of %x.2, you don't need to scan blocks looking for
the most recent assignment — there is no most-recent, there is the
one. Every analysis that classical IR does with backwards walks becomes
a one-step lookup in SSA.
The hard part: control flow
if (cond) { x = 1; } else { x = 2; }
print x;
In the join block, what is x? Neither %x.1 nor %x.2 alone. SSA
solves this with phi nodes:
bb_then: %x.1 = 1; jmp bb_join
bb_else: %x.2 = 2; jmp bb_join
bb_join: %x.3 = phi [%x.1, bb_then], [%x.2, bb_else]
print %x.3
A phi instruction at the top of a block picks which incoming value to use based on which predecessor flow came from. It is the SSA analogue of "the variable's most recent definition".
Where do phi nodes go?
The classical answer is the dominance frontier:
A block
Bis in the dominance frontier ofAifAdominates a predecessor ofBbut does not dominateBitself.
Every block where two control paths from different definitions can join is in some definition's dominance frontier — and that's exactly where we need a phi.
The Cytron et al. (1991) algorithm computes dominance frontiers, then
inserts phi nodes, then renames every variable use to refer to its
unique SSA definition. This is mem2reg in LLVM terminology —
"promote memory locations (allocas) into SSA registers".
Why we're not implementing it here
Full mem2reg is ~600 lines including dominator-tree construction
(Lengauer–Tarjan or Cooper-Harvey-Kennedy iterative), DF computation,
phi insertion, and the dominator-tree renaming walk. It is a worthy
exercise — and the standard reference implementation (LLVM's
Utils/PromoteMemoryToRegister.cpp) is the right thing to read.
Our constFold cheats: it requires that a temp has exactly one
definition (which TAC builder-produced IR already satisfies for temps,
since each tN is fresh), and propagates only those. That gets us
constant folding through binary-op chains without ever computing a
dominator tree.
When we move to LLVM in cp-10/11, mem2reg comes for free as a built-in optimisation pass. The point of this step is conceptual: you should be able to draw the dominance frontier of an arbitrary CFG on a whiteboard, and you should know what a phi node is and where it goes.
Practical SSA in real compilers
- LLVM — SSA from frontend; mem2reg promotes
allocas; instcombine, GVN, EarlyCSE, SROA all assume SSA. - HotSpot C2 — "Ideal Graph" is a sea-of-nodes SSA variant.
- V8 Turbofan / Sparkplug — sea-of-nodes graph IR; SSA invariant.
- Cranelift — strict SSA, block parameters instead of phi nodes (semantically equivalent but easier to manipulate).
- GCC — GIMPLE → tree-SSA → RTL; first major non-research compiler to adopt SSA in mainline.
Read next
- Cytron, Ferrante, Rosen, Wegman, Zadeck (1991) — Efficiently Computing Static Single Assignment Form and the Control Dependence Graph.
- Cooper, Harvey, Kennedy (2001) — A Simple, Fast Dominance Algorithm. The iterative version everyone now uses.
- Braun et al. (2013) — Simple and Efficient Construction of Static Single Assignment Form. Constructs SSA on-the-fly during IR generation; this is what Cranelift, V8, and Rust's MIR do today.