Step 3 — Module / function / block
The module
A module is the unit of compilation. One .ll file = one
llvm::Module = one translation unit. Modules contain:
- A target triple (arm64-apple-macosx) and data layout.
- Global declarations (@printf, @.fmt, @x).
- Function definitions.
- Metadata (debug info, optimisation hints).
; ModuleID = 'minilang'
target triple = "arm64-apple-macosx"
@.fmt = private constant [6 x i8] c"%lld\0A\00"
declare i32 @printf(i8*, ...)
define i32 @main() { ... }
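The [6 x i8] in the @.fmt line is easy to get wrong by hand: the array length must count every byte, including the escaped newline and the trailing NUL. A minimal sketch of a helper that computes it, assuming the emitter builds IR as text; the name emitStringConstant and its escaping policy are hypothetical, not part of the original emitter:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: render a C string as a private LLVM IR constant.
// Printable ASCII is copied as-is; any other byte, plus '"' and '\\',
// becomes a two-digit hex escape. A NUL terminator is appended, so the
// array length is text.size() + 1.
static std::string emitStringConstant(const std::string& name,
                                      const std::string& text) {
    std::string body;
    for (unsigned char c : text) {
        if (c >= 0x20 && c < 0x7F && c != '"' && c != '\\') {
            body += static_cast<char>(c);
        } else {
            char buf[4];  // backslash + two hex digits + NUL
            std::snprintf(buf, sizeof buf, "\\%02X", static_cast<unsigned>(c));
            body += buf;
        }
    }
    body += "\\00";
    return "@" + name + " = private constant [" +
           std::to_string(text.size() + 1) + " x i8] c\"" + body + "\"";
}
```

Calling emitStringConstant(".fmt", "%lld\n") should reproduce the @.fmt line above: five source bytes plus the terminator gives the length 6.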
The function
A function has a return type, name, parameter list, and a list of basic blocks:
define i64 @add(i64 %arg0, i64 %arg1) {
L0:
%v0 = add i64 %arg0, %arg1
ret i64 %v0
}
The first block listed is the entry block; it is implicit and carries no special marker. Parameters are values in scope from the entry block onward. The return type is declared up front, and every ret in the function must return a value of that type.
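The define line follows mechanically from the name and the parameter count. A minimal sketch, assuming every parameter and the return value are i64 and that parameters are named %arg0, %arg1, and so on as in the @add example; emitFunctionHeader is a hypothetical name:

```cpp
#include <string>

// Hypothetical helper: build the `define` line for a function whose
// return type and parameters are all i64 (an assumption of this sketch).
static std::string emitFunctionHeader(const std::string& name, int paramCount) {
    std::string out = "define i64 @" + name + "(";
    for (int i = 0; i < paramCount; ++i) {
        if (i > 0) out += ", ";
        out += "i64 %arg" + std::to_string(i);
    }
    out += ") {";
    return out;
}
```

emitFunctionHeader("add", 2) yields the first line of the @add definition shown above. A real emitter would also need to handle other types, such as the i32 that @main returns.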
Linkage and visibility
Function definitions default to external linkage (visible to the
linker, as if extern in C). Other options:
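- private: invisible to the linker; the symbol does not appear in the object file's symbol table (we use this for @.fmt).
- internal: visible only within the module; emitted as a local symbol in the object file.
- weak / linkonce: for inline functions and templates, where several modules may carry the same definition.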
We don't decorate @main or @add — external is the right default.
The basic block
A basic block is a maximal sequence of straight-line instructions
ending in a terminator: ret, br, switch, unreachable, indirectbr,
invoke, callbr, resume, catchswitch, catchret, or cleanupret. Exactly
one terminator per block, and it must come last; if you forget it, the
verifier rejects the function.
L1:
%v3 = add i64 %v0, 1
store i64 %v3, i64* %i.addr
br label %L0
Blocks are labelled (L1: …). Labels are values of type label,
referred to as %L1 in br targets. The label name at the
definition site has no %, but the reference site does. (Yes, this
inconsistency is annoying. Welcome to LLVM IR.)
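The one-terminator rule is cheap to check before handing IR to LLVM. A rough sketch, assuming each instruction is held as one line of text; the helper names are hypothetical, and the opcode list is the full set of terminators in the LLVM Language Reference. This only approximates what the real verifier does:

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Opcode of a textual instruction: the first token, or the token after
// "%name =" when the instruction produces a value.
static std::string opcodeOf(const std::string& inst) {
    std::istringstream in(inst);
    std::string first, eq, second;
    in >> first;
    if (!first.empty() && first[0] == '%') {
        in >> eq >> second;
        return second;
    }
    return first;
}

static bool isTerminator(const std::string& inst) {
    static const char* const kTerminators[] = {
        "ret", "br", "switch", "unreachable", "indirectbr", "invoke",
        "callbr", "resume", "catchswitch", "catchret", "cleanupret"};
    const std::string op = opcodeOf(inst);
    for (const char* t : kTerminators)
        if (op == t) return true;
    return false;
}

// A block is well formed when it is non-empty, its last instruction is a
// terminator, and no earlier instruction is one.
static bool blockIsWellFormed(const std::vector<std::string>& insts) {
    if (insts.empty()) return false;
    for (std::size_t i = 0; i + 1 < insts.size(); ++i)
        if (isTerminator(insts[i])) return false;
    return isTerminator(insts.back());
}
```

Applied to the three instructions of block L1 above, blockIsWellFormed returns true; drop the final br and it returns false.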
Why basic blocks at all?
Because every flow-graph analysis is dramatically simpler if you can reason about straight-line sequences as opaque units, then handle control at the boundaries. Dominator computation, liveness, register allocation, scheduling — all of them operate on basic-block CFGs.
Compare to a representation where any instruction could be a branch target: now every analysis has to track "did anyone jump into the middle of this run?" The basic-block invariant — enter at the top, exit at the bottom — buys you enormous simplification.
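To see how control is handled only at the boundaries, note that the CFG edges leaving a block can be read off its terminator alone. A small sketch, assuming textual IR with unquoted label names; successorsOf is a hypothetical helper:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collect the names that follow each "label %" in a terminator line.
// Assumes unquoted label names; quoted names are not handled.
static std::vector<std::string> successorsOf(const std::string& terminator) {
    std::vector<std::string> out;
    const std::string key = "label %";
    std::size_t pos = 0;
    while ((pos = terminator.find(key, pos)) != std::string::npos) {
        pos += key.size();
        std::size_t end = pos;
        while (end < terminator.size() && terminator[end] != ',' &&
               terminator[end] != ' ' && terminator[end] != ']')
            ++end;
        out.push_back(terminator.substr(pos, end - pos));
        pos = end;
    }
    return out;
}
```

For "br i1 %c, label %L1, label %L2" this returns L1 and L2, and for a ret it returns nothing. No instruction in the middle of a block ever needs to be inspected.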
Mapping our TAC
Our TAC already had basic blocks (BasicBlock in cp-08), so the
mapping is one-to-one. The only difference: LLVM identifies blocks by
name (L3 at the definition, %L3 at each reference), while ours use an
integer id. The emitter prefixes the id with L:
static std::string blockLabel(int id) { return "L" + std::to_string(id); }
It also emits a br label %L<id> to enter the first block from the
alloca region. LLVM blocks never fall through: every block, including
the one holding the alloca prelude, must end in an explicit terminator,
so we put the allocas first and close the prelude with the br.
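A sketch of what that prelude emission might look like, assuming all locals are i64, slots are named %<name>.addr as in the store shown earlier, and the prelude block is labelled entry. The label name and emitEntryPrelude are assumptions of this sketch, not taken from the original emitter:

```cpp
#include <string>
#include <vector>

static std::string blockLabel(int id) { return "L" + std::to_string(id); }

// Hypothetical sketch: emit a prelude block with one alloca per local
// variable, then the explicit branch into the first body block.
static std::string emitEntryPrelude(const std::vector<std::string>& locals,
                                    int firstBlockId) {
    std::string out = "entry:\n";
    for (const std::string& name : locals)
        out += "  %" + name + ".addr = alloca i64\n";
    out += "  br label %" + blockLabel(firstBlockId) + "\n";
    return out;
}
```

A separate prelude block also helps when the first body block is a loop header such as L0: LLVM's entry block may not have predecessors, so a block that is the target of br label %L0 cannot itself be the entry.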