02 — The IR Skeleton: Ops, Regions, Blocks, Values

MLIR's core insight is a uniform IR shape. Every operation — whether it represents a constant, a loop, a function, or a module — has the same structure:

operation:
  name           : string ("dialect.opname")
  operands       : list of Values used as inputs
  results        : list of Values produced as outputs
  attributes     : list of (name, constant) — compile-time metadata
  regions        : list of Regions — nested IR
  parent block   : back-pointer

A Region is a list of Blocks. A Block is a list of Operations plus a list of block arguments (SSA values that flow in at the head of the block). The graph is recursive: regions contain blocks, blocks contain ops, ops contain regions, ad infinitum.

cp-18 implements this directly:

struct Op {
    std::string                          name;
    std::vector<Value*>                  operands;
    std::vector<std::unique_ptr<Value>>  results;
    std::vector<std::unique_ptr<Region>> regions;
    std::vector<NamedAttr>               attrs;
    Block*                               parent = nullptr;
    Op*                                  prev = nullptr;
    Op*                                  next = nullptr;
};

struct Block {
    std::vector<std::unique_ptr<Value>> args;
    Op*                                 first = nullptr;
    Op*                                 last  = nullptr;
    Region*                             parent = nullptr;
};

struct Region {
    std::vector<std::unique_ptr<Block>> blocks;
    std::vector<std::unique_ptr<Op>>    opStore;  // ownership
    Op*                                 parentOp = nullptr;
};

Why a single shape for everything?

Because every algorithm that walks IR can use the same traversal. The constant folder, the DCE pass, the printer, the verifier — none of them need special cases for "is this a function or a basic op". A function is just an op with one region. A loop is an op with one region. A module is an op with one region. Same data structure, same walks.

Compare with LLVM IR, where Module, Function, BasicBlock, Instruction are four distinct classes with four distinct traversal APIs. Every pass that operates above the instruction level has to hand-roll its own walk over the right level of the hierarchy.

MLIR's nested-op design is a strict generalisation: anything LLVM can express, MLIR can express with llvm.func and llvm.* op-per-instruction. But the reverse isn't true.

Linked-list blocks

cp-18 uses an intrusive linked list of ops inside each block (Op::prev, Op::next). This matters because passes constantly insert and delete ops in the middle of blocks. A vector<unique_ptr<Op>> would invalidate iterators on every mutation and require O(n) shifts.

The trick from real MLIR: ownership lives in a separate opStore on the parent region (a vector of unique_ptr<Op>). The list pointers in Op form the logical sequence. Erasing an op detaches it from the list but leaves the storage alive until the region dies. That's why Block::eraseOp doesn't actually delete.

Block arguments instead of phi nodes

LLVM IR uses phi instructions at the start of merge blocks:

%x = phi i64 [%a, %B1], [%b, %B2]

MLIR (and cp-18) uses block arguments:

^bb3(%x: i64):
  ...

with predecessors supplying values when they branch in. Functionally equivalent; structurally cleaner. Block-argument indexing matches operand indexing of the branch op, so you never have to scan the block header to match incoming values to predecessors.

cp-18 doesn't yet emit block arguments (the tiny.* dialect has no control flow), but Block::addArg is there for when you extend it.