Step 1 — Why an optimising middle-end
A compiler that emits unoptimised IR straight to a backend gets you correct code; that is all. Everything good about a modern compiler — inlining, escape analysis, vectorisation, devirtualisation — happens in the middle-end, after the frontend has built an IR and before the backend lowers it to machine code.
The single most important pattern
IR_in → pass_1 → IR_intermediate → pass_2 → IR_intermediate' → ...
Each pass is a function IR → IR that preserves the program's observable
behaviour. The middle-end's job is to thread enough passes that the IR
that comes out the other side is small, dense, and easy for the backend.
A pass manager schedules and re-runs passes. LLVM's new pass manager (2020) tracks per-pass invalidation of analyses; we use a much simpler "run everything to a fixed point" loop. Both designs share the same invariants:
- Passes do not change observable behaviour (no I/O reordering, no new observable allocations, no new exceptions).
- Passes are robust: malformed input (e.g. a
cjmpon a constant) should be cleaned up, not crashed on. - Passes are composable: running pass A then pass B should produce the same IR as running them as a single fused pass would.
Why ordering matters
Our pipeline runs constFold → dce → simplifyCFG. Each unlocks the
next:
constFoldrewritescjmp t0, bb1, bb2(witht0constant) tojmp bb1. Thebb2block is now unreachable — but that fact isn't visible insideconstFold.simplifyCFGdrops the unreachable block. The temps defined in it are now uses-of-nothing.dcedrops those orphaned defs.- Some of those defs were the only uses of upstream temps. So we run the whole pipeline again. Repeat to fixed point.
LLVM calls this the "phase-ordering problem" and ships hundreds of passes; finding a good static order is genuinely hard. Our loop is the brute-force version: keep going until nobody changes anything.
What we'll cover
- Step 2 — How an IR interpreter works and why it's the semantic oracle.
- Step 3 — SSA: what it is, why every modern compiler uses it, why we don't fully implement it here.
- Step 4 — Constant folding + propagation, including the single-def temp substitution trick that approximates SSA.
- Step 5 — Dead-code elimination and why purity matters.
- Step 6 — CFG simplification.
- Step 7 — The pass-manager loop and how to think about phase ordering.