Step 4 — Constant folding and propagation
What it is
Constant folding evaluates expressions at compile time when all the inputs are known:
t0 = add 3, 4 → t0 = 7
t1 = lt 10, 20 → t1 = true
cjmp t1, bb1, bb2 → jmp bb1
Constant propagation takes folded constants and substitutes them forward, so the next pass can fold further:
t0 = 7
t1 = add t0, 1 → (after propagation) t1 = add 7, 1
→ (after folding) t1 = 8
These two are conceptually one optimisation but practically two passes: fold uses what's already constant, propagation makes more things constant.
Why it's the first pass everyone implements
- Cheap. Linear scan; no analyses required.
- Cascades. Folding one operation usually enables folding several more — the compiler equivalent of compound interest.
- High value. Source code is rich in constants:
(width * 4 + padding * 2) / 8— every operation here can fold once the user picks values. - Foundation for everything. Inlining + constant folding + propagation is how you get the C++ template-meta-programming style of zero-cost abstraction.
Implementation walkthrough
bool constFold(Function& fn) {
bool changed = false;
// Step 1: propagate.
unordered_map<int, Value> constMap;
buildConstMap(fn, constMap); // tN -> Value if sole def is Move-of-const
for (auto& bb : fn.blocks)
for (auto& ins : bb.instrs)
for (auto& s : ins.srcs)
if (substituteOperand(s, constMap)) changed = true;
// Step 2: fold what's now foldable.
for (auto& bb : fn.blocks) {
for (auto& ins : bb.instrs) {
// binary, unary, cjmp-on-const ...
}
}
return changed;
}
The crucial check is buildConstMap: a temp is eligible for
substitution only if it has exactly one definition. In real SSA
that's automatic. In our TAC IR it's true for builder-generated
temps but we double-check because future passes might break the
invariant.
What we don't do (yet)
- Algebraic identities —
x + 0 → x,x * 1 → x,x - x → 0. Every textbook calls these "strength reductions" and they're trivial to add to the fold loop. - Comparison strength reduction —
x < x → false,x == x → true. - GVN (Global Value Numbering) — recognising that
t1 = add a, bandt2 = add a, balways compute the same value, sot2can be replaced witht1. - Conditional constant propagation (SCCP) — Wegman-Zadeck (1991).
Folds across control flow: if every reaching def of
xis the same constant, treatxas that constant.
LLVM's InstCombine is the production-grade version of this pass —
~10k lines and counting, and it's still one of the most-modified files
in the LLVM tree.
The cjmp rewrite is special
cjmp <const>, bbT, bbF
→ jmp bbT // if const is truthy
→ jmp bbF // otherwise
This is the single most important fold for any compiler because it
exposes dead code to simplifyCFG. if (false) { huge_block }
disappears entirely once cjmp folds and simplifyCFG drops the
unreachable side.
This pattern — "fold the predicate, then simplify the CFG" — is
exactly what makes if constexpr viable in C++17 and what makes
generic monomorphisation in Rust/Go produce reasonable code.