02 — The IR Skeleton: Ops, Regions, Blocks, Values
MLIR's core insight is a uniform IR shape. Every operation — whether it represents a constant, a loop, a function, or a module — has the same structure:
operation:
name : string ("dialect.opname")
operands : list of Values used as inputs
results : list of Values produced as outputs
attributes : list of (name, constant) — compile-time metadata
regions : list of Regions — nested IR
parent block : back-pointer
A Region is a list of Blocks. A Block is a list of Operations plus a list of block arguments (SSA values that flow in at the head of the block). The graph is recursive: regions contain blocks, blocks contain ops, ops contain regions, ad infinitum.
cp-18 implements this directly:
struct Op {
std::string name;
std::vector<Value*> operands;
std::vector<std::unique_ptr<Value>> results;
std::vector<std::unique_ptr<Region>> regions;
std::vector<NamedAttr> attrs;
Block* parent = nullptr;
Op* prev = nullptr;
Op* next = nullptr;
};
struct Block {
std::vector<std::unique_ptr<Value>> args;
Op* first = nullptr;
Op* last = nullptr;
Region* parent = nullptr;
};
struct Region {
std::vector<std::unique_ptr<Block>> blocks;
std::vector<std::unique_ptr<Op>> opStore; // ownership
Op* parentOp = nullptr;
};
Why a single shape for everything?
Because every algorithm that walks IR can use the same traversal. The constant folder, the DCE pass, the printer, the verifier — none of them need special cases for "is this a function or a basic op". A function is just an op with one region. A loop is an op with one region. A module is an op with one region. Same data structure, same walks.
Compare with LLVM IR, where Module, Function, BasicBlock,
Instruction are four distinct classes with four distinct traversal
APIs. Every pass that operates above the instruction level has to
hand-roll its own walk over the right level of the hierarchy.
MLIR's nested-op design is a strict generalisation: anything LLVM can
express, MLIR can express with llvm.func and llvm.* op-per-instruction.
But the reverse isn't true.
Linked-list blocks
cp-18 uses an intrusive linked list of ops inside each block (Op::prev,
Op::next). This matters because passes constantly insert and delete
ops in the middle of blocks. A vector<unique_ptr<Op>> would invalidate
iterators on every mutation and require O(n) shifts.
The trick from real MLIR: ownership lives in a separate opStore on the
parent region (a vector of unique_ptr<Op>). The list pointers in Op
form the logical sequence. Erasing an op detaches it from the list
but leaves the storage alive until the region dies. That's why
Block::eraseOp doesn't actually delete.
Block arguments instead of phi nodes
LLVM IR uses phi instructions at the start of merge blocks:
%x = phi i64 [%a, %B1], [%b, %B2]
MLIR (and cp-18) uses block arguments:
^bb3(%x: i64):
...
with predecessors supplying values when they branch in. Functionally equivalent; structurally cleaner. Block-argument indexing matches operand indexing of the branch op, so you never have to scan the block header to match incoming values to predecessors.
cp-18 doesn't yet emit block arguments (the tiny.* dialect has no
control flow), but Block::addArg is there for when you extend it.