04 — Defining Dialects
A dialect in cp-18 is just a namespace of helpers that construct ops with the right name and signature. There's no class hierarchy, no registration, no inheritance.
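```cpp
namespace mlf::tiny {

// tiny.const: no operands, one i64 result; the constant lives in the "value" attribute.
Op* constant(Builder& b, int64_t v) {
  return b.create("tiny.const", {}, {i64Ty()},
                  {{"value", Attribute::integer(v)}});
}

// tiny.add: two operands, one i64 result, no attributes.
Op* add(Builder& b, Value* l, Value* r) {
  return b.create("tiny.add", {l, r}, {i64Ty()});
}

// ...

}  // namespace mlf::tiny
```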
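To make "no class hierarchy, no registration" concrete, here is a self-contained sketch of the kind of machinery those helpers sit on. The `sketch` namespace, its `Op`/`Value`/`Builder` types, and the plain string type `"i64"` are hypothetical stand-ins, not cp-18's actual definitions (which use `Attribute` and `i64Ty()`); attributes are integer-only and regions are omitted for brevity. The point is that `Builder::create` knows nothing about dialects, while the `tiny` helpers are ordinary free functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, stripped-down stand-ins for cp-18's IR types.
namespace sketch {

struct Op;

// An SSA value: result #index of its defining op.
struct Value {
  Op* def = nullptr;
  int index = 0;
  std::string type;
};

// An op is just data: a name, operands, owned results, and attributes.
struct Op {
  std::string name;                             // e.g. "tiny.add"
  std::vector<Value*> operands;                 // inputs (not owned)
  std::vector<std::unique_ptr<Value>> results;  // outputs (owned)
  std::map<std::string, std::int64_t> attrs;    // integer-only, for brevity
};

// The builder owns every op it creates and has no notion of dialects.
class Builder {
 public:
  Op* create(const std::string& name, std::vector<Value*> operands,
             const std::vector<std::string>& resultTypes,
             std::map<std::string, std::int64_t> attrs = {}) {
    auto op = std::make_unique<Op>();
    op->name = name;
    op->operands = std::move(operands);
    op->attrs = std::move(attrs);
    for (std::size_t i = 0; i < resultTypes.size(); ++i) {
      auto v = std::make_unique<Value>();
      v->def = op.get();
      v->index = static_cast<int>(i);
      v->type = resultTypes[i];
      op->results.push_back(std::move(v));
    }
    ops_.push_back(std::move(op));
    return ops_.back().get();
  }

 private:
  std::vector<std::unique_ptr<Op>> ops_;
};

}  // namespace sketch

// The "dialect": plain functions that pick the right name and signature.
namespace tiny {
inline sketch::Op* constant(sketch::Builder& b, std::int64_t v) {
  return b.create("tiny.const", {}, {"i64"}, {{"value", v}});
}
inline sketch::Op* add(sketch::Builder& b, sketch::Value* l, sketch::Value* r) {
  return b.create("tiny.add", {l, r}, {"i64"});
}
}  // namespace tiny
```

Adding an op to `tiny` in this model means writing one more function; nothing in `sketch::Builder` changes.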
Three things define an op:
- Name: a dotted string "dialect.op". Passes match on it.
- Signature: operand types, result types, region count.
- Attributes: a map from name → constant, e.g. value for tiny.const, sym_name for tiny.func.
That's it. A dialect is a contract about what those three things mean.
Why two dialects?
cp-18 ships tiny.* (high-level, source-aware) and ll.* (low-level, LLVM-ish). The split exists for the same reason upstream MLIR ships dozens of dialects: different passes want different abstractions.
- On tiny.* we can run constant folding trivially: the operands of tiny.add either come from tiny.const or they don't. No interleaved loads/stores, no aliasing, no ABI quirks.
- On ll.* we'd run register allocation, calling-convention rewrites, and memory-layout passes, all of which need to know about the lowered representation.
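Here is roughly what constant folding on tiny.* can look like. It is a sketch on a simplified, hypothetical IR in which every op has exactly one result, so an operand can point straight at its defining op; `foldConstants` is an illustrative name rather than cp-18's pass, and wrapping i64 arithmetic (modulo 2^64) is an assumed semantics.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <vector>

// Hypothetical simplified IR: one result per op, so operands are defining ops.
struct Op {
  std::string name;                   // "tiny.const", "tiny.add", "tiny.mul", ...
  std::vector<Op*> operands;          // defining ops of the inputs
  std::optional<std::int64_t> value;  // the "value" attribute of tiny.const
};

// Folds tiny.add / tiny.mul whose operands are both tiny.const.
// Ops are visited in program order, so an op folded earlier is already a
// tiny.const when its users are visited; chains fold in a single pass.
int foldConstants(std::vector<std::unique_ptr<Op>>& block) {
  int folded = 0;
  for (auto& op : block) {
    bool arith = op->name == "tiny.add" || op->name == "tiny.mul";
    if (!arith || op->operands.size() != 2) continue;
    const Op* l = op->operands[0];
    const Op* r = op->operands[1];
    if (l->name != "tiny.const" || r->name != "tiny.const") continue;
    if (!l->value || !r->value) continue;
    // Compute in uint64_t so overflow wraps instead of being undefined.
    auto a = static_cast<std::uint64_t>(*l->value);
    auto b = static_cast<std::uint64_t>(*r->value);
    std::uint64_t bits = (op->name == "tiny.add") ? a + b : a * b;
    // Rewrite in place: users still point at this op, which is now a constant.
    op->name = "tiny.const";
    op->operands.clear();
    op->value = static_cast<std::int64_t>(bits);
    ++folded;
  }
  return folded;
}
```

The original tiny.const operands stay in the block after folding; removing them is a separate dead-code job.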
If you tried to do both at the same level, you'd have one giant dialect
where every pass needs if (op->name == "tiny.add" || op->name == "ll.add") ...
checks. Splitting cleanly separates concerns.
Dialects as a contract
When passes.cpp writes:
if (op->name == "tiny.add" || op->name == "tiny.mul") { ... }
it's relying on the contract that those op names always mean what
dialects.cpp says they mean. If someone adds a tiny.add with two
results, or with a side effect, that contract breaks and the fold pass
becomes a miscompiler.
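cp-18 does not check this contract at runtime, but it helps to spell out what the fold pass silently assumes. The hypothetical verifier below (`verifyTinyBinary` and its `Op` struct are illustrative, not part of cp-18) encodes the shape tiny.add and tiny.mul are expected to have; a defensive fold pass could run it before trusting an op's name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical op shape: just enough structure to state the contract.
struct Op {
  std::string name;
  std::vector<std::string> operandTypes;
  std::vector<std::string> resultTypes;
  bool hasSideEffects = false;
};

// The contract behind matching "tiny.add" / "tiny.mul" by name:
// two i64 operands, one i64 result, no side effects.
// Returns an empty string if the op conforms, otherwise a diagnostic.
std::string verifyTinyBinary(const Op& op) {
  if (op.name != "tiny.add" && op.name != "tiny.mul") return "not a tiny binary op";
  if (op.operandTypes.size() != 2) return op.name + ": expected 2 operands";
  if (op.resultTypes.size() != 1) return op.name + ": expected 1 result";
  for (const auto& t : op.operandTypes)
    if (t != "i64") return op.name + ": operand must be i64";
  if (op.resultTypes[0] != "i64") return op.name + ": result must be i64";
  if (op.hasSideEffects) return op.name + ": must be side-effect free";
  return "";
}
```

Either violation from the paragraph above (two results, or a side effect) is exactly what this check rejects.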
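Real MLIR codifies this with traits and op interfaces: instead of
matching names, a pass can ask whether an op carries a trait (such as
OneResult) or implements an interface (a "BinaryOp"-style interface, say),
and the op verifier checks that any op declaring a structural trait really
has that structure. cp-18 has no such machinery; it trusts the
dialect-helper API as the contract.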
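The difference between matching names and matching traits can be sketched with a small trait table. This is a hypothetical model, not MLIR's API: in MLIR, traits are mixins attached to generated op classes (OpTrait::OneResult, for example) and checked by the op verifier. Here `Trait`, `traitTable`, `isFoldableBinary`, `OpShape`, and `verifyTraits` are illustrative names, and ll.store is an invented op. The pass asks for properties, so ll.add is picked up without editing the pass, and the verifier is what makes those properties trustworthy.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical trait vocabulary.
enum class Trait { TwoOperands, OneResult, NoSideEffects };

// Hypothetical per-op-name trait declarations (MLIR attaches these to op classes).
const std::map<std::string, std::set<Trait>>& traitTable() {
  static const std::map<std::string, std::set<Trait>> table = {
      {"tiny.add", {Trait::TwoOperands, Trait::OneResult, Trait::NoSideEffects}},
      {"tiny.mul", {Trait::TwoOperands, Trait::OneResult, Trait::NoSideEffects}},
      {"ll.add", {Trait::TwoOperands, Trait::OneResult, Trait::NoSideEffects}},
      {"ll.store", {Trait::TwoOperands}},  // value + address, no result, writes memory
  };
  return table;
}

bool hasTrait(const std::string& opName, Trait t) {
  const auto& table = traitTable();
  auto it = table.find(opName);
  return it != table.end() && it->second.count(t) > 0;
}

// A pass that wants "pure binary ops" asks for traits, not names.
bool isFoldableBinary(const std::string& opName) {
  return hasTrait(opName, Trait::TwoOperands) && hasTrait(opName, Trait::OneResult) &&
         hasTrait(opName, Trait::NoSideEffects);
}

// The verifier backs the declarations: an op claiming a shape must have it.
struct OpShape {
  std::string name;
  std::size_t numOperands;
  std::size_t numResults;
};

bool verifyTraits(const OpShape& op) {
  if (hasTrait(op.name, Trait::TwoOperands) && op.numOperands != 2) return false;
  if (hasTrait(op.name, Trait::OneResult) && op.numResults != 1) return false;
  return true;
}
```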
How would you add a new dialect?
Pick a name (e.g. tensor.*), decide on op signatures, write helpers in
the namespace. That's the user-facing work. The framework requires no
changes: the printer, the walks, the rewrites all operate on opaque
Op objects.
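Because the framework treats ops as opaque, a generic printer handles a new dialect without modification. This sketch assumes a hypothetical flat IR where each op has one result referenced by index; `print`, `Block`, and the tensor.fill op with its size attribute are invented for illustration and are unrelated to MLIR's real tensor dialect.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal IR: one result per op, referenced by its index.
struct Op {
  std::string name;
  std::vector<int> operands;  // indices of defining ops
  std::map<std::string, std::int64_t> attrs;
};

struct Block {
  std::vector<Op> ops;
  int append(Op op) {  // returns the new op's result index
    ops.push_back(std::move(op));
    return static_cast<int>(ops.size()) - 1;
  }
};

// Generic printer: it never inspects op names, so new dialects need no changes.
std::string print(const Block& b) {
  std::ostringstream os;
  for (std::size_t i = 0; i < b.ops.size(); ++i) {
    const Op& op = b.ops[i];
    os << "%" << i << " = " << op.name << "(";
    for (std::size_t j = 0; j < op.operands.size(); ++j)
      os << (j > 0 ? ", " : "") << "%" << op.operands[j];
    os << ")";
    if (!op.attrs.empty()) {
      os << " {";
      bool first = true;
      for (const auto& kv : op.attrs) {
        os << (first ? "" : ", ") << kv.first << " = " << kv.second;
        first = false;
      }
      os << "}";
    }
    os << "\n";
  }
  return os.str();
}

// Existing dialect.
namespace tiny {
inline int constant(Block& b, std::int64_t v) {
  return b.append({"tiny.const", {}, {{"value", v}}});
}
}  // namespace tiny

// A new dialect is just another namespace of helpers (hypothetical op).
namespace tensor {
inline int fill(Block& b, std::int64_t size, int scalar) {
  return b.append({"tensor.fill", {scalar}, {{"size", size}}});
}
}  // namespace tensor
```

Printing a block that mixes both dialects yields `%0 = tiny.const() {value = 7}` followed by `%1 = tensor.fill(%0) {size = 4}`, with no printer changes.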
In real MLIR you'd also subclass Dialect, register your ops with it,
write verifiers, and usually generate the op classes from TableGen (ODS)
definitions. The architectural shape is the same as cp-18's; the
production scaffolding is heavier.
Function ops vs basic ops
tiny.func is a region-carrying op: it has one region containing the
function body. Same for ll.func. Notice how this is just an op —
no special "Function" class in the IR. That's MLIR's design choice: at
the IR level a function isn't fundamentally different from an scf.for
or an scf.if. They all carry regions; they all participate in the same
walks; they all live in the same pass manager.
The implication: you can put a function inside another op. Closures,
nested function definitions, module-of-modules — none require special
casing. Real MLIR exploits this constantly (e.g. gpu.module contains
gpu.func).
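Treating functions as ordinary region-carrying ops means one generic walk covers everything, including nested functions and modules of modules. This is a sketch on a hypothetical IR where each region is a single list of ops; tiny.module and tiny.return are invented op names, and `walk` is an illustrative helper rather than cp-18's walker.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal IR: a region is a list of ops (one block, for brevity),
// and any op may own regions. A function is just an op with one region.
struct Op;
using Region = std::vector<std::unique_ptr<Op>>;

struct Op {
  std::string name;
  std::vector<Region> regions;

  explicit Op(std::string n, std::size_t numRegions = 0)
      : name(std::move(n)), regions(numRegions) {}

  // Appends a new child op to region r and returns it.
  Op* append(std::size_t r, std::string childName, std::size_t childRegions = 0) {
    regions.at(r).push_back(std::make_unique<Op>(std::move(childName), childRegions));
    return regions.at(r).back().get();
  }
};

// One walk for every op kind: visit the op, then recurse into its regions.
void walk(Op& op, int depth, const std::function<void(Op&, int)>& fn) {
  fn(op, depth);
  for (Region& region : op.regions)
    for (auto& child : region) walk(*child, depth + 1, fn);
}
```

`walk` never special-cases tiny.func; a function nested in a function, or a module nested in a module, falls out of the recursion.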