Step 02 · The shape of MLIR

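module {
  // Assumed declarations (not in the original excerpt) so that @fmt and
  // @printf resolve; the format string is illustrative.
  llvm.mlir.global internal constant @fmt("%ld\0A\00")
  llvm.func @printf(!llvm.ptr, ...) -> i32

  llvm.func @main() -> i32 {
    %0 = llvm.mlir.constant(42 : i64) : i64
    %1 = llvm.mlir.addressof @fmt : !llvm.ptr
    %2 = llvm.call @printf(%1, %0) vararg(!llvm.func<i32 (ptr, ...)>)
         : (!llvm.ptr, i64) -> i32
    %3 = llvm.mlir.constant(0 : i32) : i32
    llvm.return %3 : i32
  }
}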

Key concepts:

  • Operation — the unit of IR. Every statement is an Operation, even one that spans several lines. The prefix of its name is the dialect (llvm., arith., func., scf., ...).
  • Region — an ordered list of Blocks, enclosed in { ... } and owned by an op. Ops such as scf.for and func.func carry regions; that nesting is how MLIR expresses structured control flow.
  • Block — a list of operations ending in a terminator. Labels are ^bb0, ^bb1, .... Blocks may take SSA arguments (MLIR's unification of Φ-nodes and parameters).
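  • Value (%name) — an SSA value: either the result of an op or a block argument.
  • Type (i64, !llvm.ptr, tensor<4xf32>) — every Value has exactly one. Builtin types are written bare; the ! prefix marks a type defined by a dialect (here llvm).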
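The nesting behind these terms (an op owns regions, a region owns blocks, a block owns ops) can be sketched as a small C++ model. This is a toy under stated assumptions: the structs Operation, Region and Block and the helpers makeOp, dialectOf, countOps and buildExample are hypothetical names for illustration, not the real mlir:: classes, and the model covers only @main wrapped in its module.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy model of MLIR's nesting. Illustrative only; these are not the real
// mlir::Operation / mlir::Region / mlir::Block classes.
struct Operation;

struct Block {
  std::vector<std::string> argTypes;  // types of the block arguments
  std::vector<Operation> ops;         // in order; a terminator comes last
};

struct Region {
  std::vector<Block> blocks;          // blocks[0] is the entry block
};

struct Operation {
  std::string name;                      // "dialect.mnemonic"
  std::vector<std::string> resultTypes;  // one entry per SSA result
  std::vector<Region> regions;           // empty for most ops
};

Operation makeOp(std::string name, std::vector<std::string> results = {}) {
  Operation op;
  op.name = std::move(name);
  op.resultTypes = std::move(results);
  return op;
}

// The dialect is the part of the op name before the first '.'.
std::string dialectOf(const Operation &op) {
  return op.name.substr(0, op.name.find('.'));
}

// Counts an op plus every op nested inside its regions.
std::size_t countOps(const Operation &op) {
  std::size_t n = 1;
  for (const Region &region : op.regions)
    for (const Block &block : region.blocks)
      for (const Operation &child : block.ops)
        n += countOps(child);
  return n;
}

// Models module { llvm.func @main { ...five ops... } }.
Operation buildExample() {
  Block entry;  // the only block in @main's body region
  entry.ops.push_back(makeOp("llvm.mlir.constant", {"i64"}));
  entry.ops.push_back(makeOp("llvm.mlir.addressof", {"!llvm.ptr"}));
  entry.ops.push_back(makeOp("llvm.call", {"i32"}));
  entry.ops.push_back(makeOp("llvm.mlir.constant", {"i32"}));
  entry.ops.push_back(makeOp("llvm.return"));  // terminator, no result

  Region body;
  body.blocks.push_back(std::move(entry));
  Operation func = makeOp("llvm.func");
  func.regions.push_back(std::move(body));

  Block top;  // the module's single block
  top.ops.push_back(std::move(func));
  Region moduleBody;
  moduleBody.blocks.push_back(std::move(top));
  Operation module = makeOp("builtin.module");
  module.regions.push_back(std::move(moduleBody));
  return module;
}
```

Calling countOps(buildExample()) walks the module, the function and the five ops in its body. The recursion works only because a region is reachable solely through the op that owns it.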

Implications

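  • No global symbol table for SSA values: %names are scoped to the enclosing function body, so two functions can both define %0. Anything referenced across functions is a symbol instead (@main, @printf, @fmt).
  • In the generic form, every op spells out all its operand and result types, so the IR is self-describing: mlir-opt can parse it without the producing dialect's C++ classes if --allow-unregistered-dialect is passed. The custom (pretty) forms shown above do require the dialect to be registered.
  • module is itself an op (builtin.module) whose single region holds the program.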
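To make the self-describing point concrete, here is a minimal C++ sketch that prints an op in MLIR's generic form: quoted op name, operand list, then the full operand and result types. OpDesc, join and printGeneric are hypothetical names, not MLIR APIs, and the sketch assumes an op with zero or one result and no attributes or regions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one op; not an MLIR API type.
struct OpDesc {
  std::string result;                     // SSA name without '%', empty if none
  std::string name;                       // full op name, e.g. "arith.addi"
  std::vector<std::string> operands;      // SSA names without '%'
  std::vector<std::string> operandTypes;  // one type per operand
  std::vector<std::string> resultTypes;   // zero or one entry in this sketch
};

// Joins items with ", " and puts `prefix` in front of each one.
static std::string join(const std::vector<std::string> &items,
                        const std::string &prefix) {
  std::string out;
  for (std::size_t i = 0; i < items.size(); ++i) {
    if (i != 0) out += ", ";
    out += prefix + items[i];
  }
  return out;
}

// Prints: %r = "dialect.op"(%a, %b) : (ta, tb) -> tr
std::string printGeneric(const OpDesc &op) {
  assert(op.operands.size() == op.operandTypes.size());
  std::string out;
  if (!op.result.empty()) out += "%" + op.result + " = ";
  out += "\"" + op.name + "\"(" + join(op.operands, "%") + ")";
  out += " : (" + join(op.operandTypes, "") + ") -> ";
  if (op.resultTypes.size() == 1)
    out += op.resultTypes[0];                       // single result: bare type
  else
    out += "(" + join(op.resultTypes, "") + ")";    // otherwise a type list
  return out;
}
```

For an integer add with operands %0 and %1 this yields %2 = "arith.addi"(%0, %1) : (i64, i64) -> i64. mlir-opt prints any module this way when given --mlir-print-op-generic.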

Our emission strategy

Emitter::emitFunction produces an llvm.func whose entry block holds an llvm.alloca for every named local and ends with llvm.br ^bb1, a branch into the first TAC block. Each TAC block then becomes a ^bbN label, and its instructions translate one-for-one to llvm.* ops.
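The layout described above can be sketched in C++ as follows. This is not the real Emitter: it assumes the emitter writes textual MLIR, that every local is an i64, that the function takes no arguments and returns i32, and that each TAC instruction has already been translated to a line of llvm.* text. TacBlock, TacFunction and this free-standing emitFunction are hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical TAC containers; the real Emitter's input types are not shown.
struct TacBlock {
  std::vector<std::string> ops;  // translated llvm.* op text, terminator last
};

struct TacFunction {
  std::string name;
  std::vector<std::string> locals;  // named locals, each assumed to be an i64
  std::vector<TacBlock> blocks;     // blocks[0] becomes ^bb1
};

std::string emitFunction(const TacFunction &fn) {
  assert(!fn.blocks.empty());  // the entry block must have a branch target
  std::ostringstream os;
  os << "llvm.func @" << fn.name << "() -> i32 {\n";

  // Entry block: a shared element count, then one stack slot per local.
  os << "  %one = llvm.mlir.constant(1 : i32) : i32\n";
  for (const std::string &local : fn.locals)
    os << "  %" << local << ".addr = llvm.alloca %one x i64"
       << " : (i32) -> !llvm.ptr\n";
  os << "  llvm.br ^bb1\n";

  // TAC block i becomes ^bb(i+1); its ops are copied through unchanged.
  for (std::size_t i = 0; i < fn.blocks.size(); ++i) {
    os << "^bb" << (i + 1) << ":\n";
    for (const std::string &op : fn.blocks[i].ops)
      os << "  " << op << "\n";
  }
  os << "}\n";
  return os.str();
}
```

Keeping all allocas in the entry block means every stack slot dominates every TAC block, so no block needs arguments just to reach a local's address.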