Step 6 — CFG simplification
After constFold rewrites a cjmp to a jmp, one of the targets
becomes unreachable. The instructions in that block — prints, function
calls, everything — are still there in fn.blocks. They take no
runtime time (nothing branches to them) but they bloat the IR, confuse
later passes, and confuse humans reading the dump.
simplifyCFG is the pass that throws them away.
The algorithm
unordered_set<int> reachable;
vector<int> stack{ fn.blocks[0].id };
while (!stack.empty()) {
int id = stack.back(); stack.pop_back();
if (!reachable.insert(id).second) continue;
// For each successor (jmp / cjmp / ret) of bb id, push onto stack.
}
// Drop blocks not in `reachable`.
Pure flood-fill from the entry block (always bb0). Successors come
from inspecting the terminator of each block — the one place in the
IR where control flow is encoded explicitly. That's the entire point
of the basic-block model: control flow is structural, not embedded in
straight-line instructions.
Other things simplifyCFG could do
We only do reachability pruning. Production-grade implementations also:
- Block merging. If
bb1ends injmp bb2andbb2has onlybb1as predecessor, merge them. - Empty-block bypass. If
bb1's only instruction isjmp bb2, rewrite every predecessor ofbb1to targetbb2directly. - Branch threading. If
bb1ends incjmp t, bb2, bb3andbb2is a known-constant block (e.g.cjmp false, ...), thread directly. - Hoisting common code. If both arms of a cjmp begin with the same instruction, hoist it before the branch.
LLVM's SimplifyCFG pass implements all of these and quite a few
more.
A subtle correctness rule
Whenever you delete a block, you must also check whether any remaining block referenced it as a successor — and if so, that reference is now invalid. Our reachability flood guarantees this: nothing reaches a deleted block, by definition, so no remaining terminator points to one.
In a more aggressive pass (block merging), you must rewrite phi nodes' incoming-block lists when you merge predecessors. We don't have phi nodes, so we sidestep that landmine. Welcome to mem2reg in cp-10/11.
Renumbering, or not?
We don't renumber block ids after pruning. bb0, bb2, bb4 is fine in
the printer — readers know it's a sparse sequence. If we did renumber,
every bbT / bbF field in every terminator would need updating.
That's fragile and offers no semantic benefit.
LLVM and Cranelift both keep sparse block numbering for the same reason.
Composition with other passes
loop {
a = constFold(fn); // may rewrite cjmp → jmp
b = dce(fn); // may drop now-unused temps
c = simplifyCFG(fn); // may drop now-unreachable blocks
if (!a && !b && !c) break;
}
The order matters: constFold first (creates work for the others),
then dce, then simplifyCFG. Swap any pair and you'll get the same
fixed point eventually, but more iterations to reach it.
This kind of phase ordering is one of the long-standing research problems in compiler engineering — see Whitfield & Soffa (1997) for a formal treatment.