Step 6 — Lowering control flow
Unconditional branch
br label %L3
br is the workhorse terminator. The single-operand form is an
unconditional jump.
Our TAC Jump op lowers directly:
case Op::Jump:
out << " br label %" << blockLabel(ins.bbT) << "\n";
Conditional branch
br i1 %cond, label %T, label %F
The condition must be i1. Since we keep our values in i64, we
must emit an explicit icmp ne ..., 0 before the branch:
case Op::CondJump: {
std::string a; operandValue(ins.srcs[0], a);
std::string cmp = fresh();
out << " " << cmp << " = icmp ne i64 " << a << ", 0\n";
out << " br i1 " << cmp << ", label %L" << ins.bbT
<< ", label %L" << ins.bbF << "\n";
}
If we were storing booleans as i1 natively, this would just be:
br i1 %cond, label %T, label %F
That's a real performance argument for using i1 as the boolean
type, even if it complicates the rest of the IR.
Multi-way branches: switch
LLVM supports an n-way switch:
switch i64 %tag, label %default [
i64 0, label %A
i64 1, label %B
i64 2, label %C
]
We don't emit switch because our TAC doesn't have one — any
switch-like construct would have been lowered to a chain of cjmps
in the IR builder. A more sophisticated frontend would detect
switch-like AST patterns and lower to switch directly, giving the
backend the opportunity to emit a jump table.
phi nodes (or the lack thereof)
Because we use the alloca trick, our emitted IR has no phi
nodes. Every variable read is a load; every write is a store.
The IR for if (c) { x = 1; } else { x = 2; } print x;:
%v0 = load i64, i64* %c.addr
%v1 = icmp ne i64 %v0, 0
br i1 %v1, label %L1, label %L2
L1:
store i64 1, i64* %x.addr
br label %L3
L2:
store i64 2, i64* %x.addr
br label %L3
L3:
%v2 = load i64, i64* %x.addr
call i32 (i8*, ...) @printf(...)
After mem2reg:
br i1 %v1, label %L1, label %L2
L1:
br label %L3
L2:
br label %L3
L3:
%x.3 = phi i64 [1, %L1], [2, %L2]
call i32 (i8*, ...) @printf(...)
That phi was always implicit in the semantics; mem2reg just made it
explicit.
Loops
A while (cond) { body } in TAC has three blocks: header, body, exit.
The header tests the condition and cjmps into body or exit. The
body unconditionally jumps back to header. The header dominates both
body and exit; the body's backward edge is what makes this a natural
loop.
br label %Lheader
Lheader:
...test...
br i1 %cond, label %Lbody, label %Lexit
Lbody:
...body...
br label %Lheader
Lexit:
...
LLVM detects this pattern (loop analysis runs on the SSA form after mem2reg) and enables loop-specific optimisations: invariant code motion, induction-variable simplification, vectorisation, unrolling.
unreachable
If a block has no logical successor (e.g. control falls off the end
of noreturn code), the terminator is unreachable. Our compiler
doesn't generate unreachable because cp-09's simplifyCFG already
removed those blocks. But they appear all the time in C frontends
after calls to abort(), exit(), etc.
Indirect branches
indirectbr label %target, [label %a, label %b, ...] is how
computed-goto / threaded-code interpreters express their dispatch.
Out of scope here; relevant for cp-12's JIT capstone where we may
explore inline caches.
invoke and exception handling
LLVM's exception model uses invoke (a call with a normal and an
exceptional destination) plus landingpad blocks. We have no
exceptions in MiniLang. C++ frontends spend a lot of time on this.