04 — Defining Dialects

A dialect in cp-18 is just a namespace of helper functions that construct ops with the right name and signature. There is no Dialect class to subclass and no registration step.

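namespace mlf::tiny {

// tiny.const: no operands, one i64 result, the literal stored in "value".
Op* constant(Builder& b, int64_t v) {
    return b.create("tiny.const", {}, {i64Ty()},
                    {{"value", Attribute::integer(v)}});
}

// tiny.add: two operands, one i64 result, no attributes.
Op* add(Builder& b, Value* l, Value* r) {
    return b.create("tiny.add", {l, r}, {i64Ty()});
}
// ...

} // namespace mlf::tiny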

Three things define an op:

  1. Name: a dotted string of the form "dialect.op". Passes match on it.
  2. Signature: operand types, result types, and region count.
  3. Attributes: a map from attribute name to compile-time constant, e.g. value for tiny.const and sym_name for tiny.func.

That's it. A dialect is a contract about what those three things mean.
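As a rough illustration of those three ingredients, here is a minimal, self-contained sketch of an op record. It is not cp-18's actual Op class: the sketch namespace, the field names, and the dialectOf and makeConst helpers are all assumptions made for this example, and types are reduced to plain strings to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for cp-18's Op; the real class will differ.
namespace sketch {

struct Op {
    std::string name;                           // 1. name: "dialect.op"
    std::vector<const Op*> operands;            // 2. signature: operands
    std::vector<std::string> resultTypes;       //    signature: result types
    std::size_t regionCount = 0;                //    signature: region count
    std::map<std::string, std::int64_t> attrs;  // 3. attributes: name -> constant
};

// Returns the dialect prefix, i.e. everything before the first dot.
inline std::string dialectOf(const Op& op) {
    const std::size_t dot = op.name.find('.');
    assert(dot != std::string::npos && "op names are dotted: dialect.op");
    return op.name.substr(0, dot);
}

// Builds a tiny.const as described in the text: no operands, one i64
// result, and the literal stored in the "value" attribute.
inline Op makeConst(std::int64_t v) {
    Op op;
    op.name = "tiny.const";
    op.resultTypes.push_back("i64");
    op.attrs["value"] = v;
    return op;
}

} // namespace sketch
```

Nothing in the record says what "tiny.const" means. That meaning lives only in the helper that builds it and in the passes that match on the name.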

Why two dialects?

cp-18 ships tiny.* (high-level, source-aware) and ll.* (low-level, LLVM-ish). The reason for the split is the same reason real MLIR ships dozens of built-in dialects: different passes want different abstractions.

  • On tiny.* we can run constant folding trivially — operands of tiny.add either come from tiny.const or they don't. No interleaved loads/stores, no aliasing, no ABI quirks.
  • On ll.* we'd run register allocation, calling-convention rewrites, memory-layout passes — all things that need to know about the lowered representation.

If you tried to do both at the same level, you'd have one giant dialect where every pass needs if (op.name == "tiny.add" || op.name == "ll.add") ... checks. Splitting cleanly separates concerns.

Dialects as a contract

When passes.cpp writes:

if (op->name == "tiny.add" || op->name == "tiny.mul") { ... }

it's relying on the contract that those op names always mean what dialects.cpp says they mean. If someone adds a tiny.add with two results, or with a side effect, that contract breaks and the fold pass becomes a miscompiler.
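To make that dependency concrete, here is a hedged sketch of what a name-matched fold might look like. It does not reproduce passes.cpp: the simplified Op, the constant and binary helpers, and tryFold are hypothetical names introduced for this example. The comments mark where the code leans on the dialect contract.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical simplified op: every op yields a single i64 value.
struct Op {
    std::string name;
    std::vector<const Op*> operands;
    std::map<std::string, std::int64_t> attrs;
};

inline Op constant(std::int64_t v) {
    Op op;
    op.name = "tiny.const";
    op.attrs["value"] = v;
    return op;
}

// Operands are stored by address, so `l` and `r` must outlive the result.
inline Op binary(const std::string& name, const Op& l, const Op& r) {
    Op op;
    op.name = name;
    op.operands.push_back(&l);
    op.operands.push_back(&r);
    return op;
}

// Folds tiny.add / tiny.mul when both operands are tiny.const.
// Returns true and writes the folded value to `out` on success.
inline bool tryFold(const Op& op, std::int64_t& out) {
    if (op.name != "tiny.add" && op.name != "tiny.mul") return false;
    // From here on the code trusts the contract: exactly two operands,
    // no side effects, and tiny.const carrying an integer "value".
    assert(op.operands.size() == 2 && "contract: binary ops take two operands");
    const Op& l = *op.operands[0];
    const Op& r = *op.operands[1];
    if (l.name != "tiny.const" || r.name != "tiny.const") return false;
    const std::int64_t a = l.attrs.at("value");
    const std::int64_t b = r.attrs.at("value");
    out = (op.name == "tiny.add") ? a + b : a * b;  // overflow not handled
    return true;
}

} // namespace sketch
```

If a tiny.add with a side effect existed, tryFold would still replace it with a constant and silently drop the effect, which is exactly the miscompile described above.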

Real MLIR codifies this with op interfaces and traits: a pass declares "I match anything implementing this interface" (say, a hypothetical BinaryOp interface), and the trait system guarantees the matched op has the expected structure. cp-18 has no such mechanism. The dialect-helper API is the only place the contract is written down, and passes trust it.
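One way to picture the difference is a pass that checks structure as well as name. The sketch below is a hand-written stand-in for the idea behind traits, not MLIR's trait machinery and not something cp-18 provides. Op, make, hasBinaryShape, and isFoldCandidate are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical op record reduced to the fields the check needs.
struct Op {
    std::string name;
    std::vector<const Op*> operands;
    std::size_t resultCount = 1;
    std::size_t regionCount = 0;
};

inline Op make(const std::string& name,
               const std::vector<const Op*>& operands,
               std::size_t resultCount) {
    assert(!name.empty());
    Op op;
    op.name = name;
    op.operands = operands;
    op.resultCount = resultCount;
    return op;
}

// Stand-in for a "binary op" trait: two operands, one result, no regions.
inline bool hasBinaryShape(const Op& op) {
    return op.operands.size() == 2 && op.resultCount == 1 &&
           op.regionCount == 0;
}

// Name match plus structural check: the pass declines to touch an op that
// carries the right name but the wrong shape.
inline bool isFoldCandidate(const Op& op) {
    const bool knownName = op.name == "tiny.add" || op.name == "tiny.mul";
    return knownName && hasBinaryShape(op);
}

} // namespace sketch
```

A check like this catches a malformed tiny.add with two results, but it cannot detect side effects. That gap is one reason a production framework wants declared traits rather than structural guesses.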

How would you add a new dialect?

Pick a name (e.g. tensor.*), decide on op signatures, write helpers in the namespace. That's the user-facing work. The framework requires no changes: the printer, the walks, the rewrites all operate on opaque Op objects.
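The following sketch illustrates that claim under simplifying assumptions. The print function plays the role of framework code written before the new dialect existed, and the tensor namespace is a hypothetical dialect added purely as helpers. None of these names come from cp-18, and the printer shows operand op names instead of SSA value names to stay short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

namespace sketch {

struct Op {
    std::string name;
    std::vector<const Op*> operands;
    std::vector<std::string> resultTypes;
    std::map<std::string, std::int64_t> attrs;
};

// "Framework" side: knows nothing about any particular dialect.
inline std::string print(const Op& op) {
    std::string s = op.name + "(";
    for (std::size_t i = 0; i < op.operands.size(); ++i) {
        assert(op.operands[i] != nullptr);
        if (i != 0) s += ", ";
        s += op.operands[i]->name;
    }
    s += ")";
    for (const auto& kv : op.attrs)
        s += " {" + kv.first + " = " + std::to_string(kv.second) + "}";
    return s;
}

// "User" side: a hypothetical tensor.* dialect, added as plain helpers.
namespace tensor {

inline Op splat(std::int64_t v) {
    Op op;
    op.name = "tensor.splat";
    op.resultTypes.push_back("tensor<i64>");
    op.attrs["value"] = v;
    return op;
}

// Operands are stored by address, so `l` and `r` must outlive the result.
inline Op add(const Op& l, const Op& r) {
    Op op;
    op.name = "tensor.add";
    op.operands.push_back(&l);
    op.operands.push_back(&r);
    op.resultTypes.push_back("tensor<i64>");
    return op;
}

} // namespace tensor
} // namespace sketch
```

The printer never mentions tensor.*, yet it prints the new ops correctly because it only reads the generic fields every op shares.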

In real MLIR you'd also subclass Dialect, register your ops with it, write verifiers, and typically generate the op classes from TableGen. The architectural shape is the same as cp-18; the production scaffolding is heavier.

Function ops vs basic ops

tiny.func is a region-carrying op: it has one region containing the function body. The same goes for ll.func. Notice that this is just an op; there is no special "Function" class in the IR. That is MLIR's design choice: at the IR level a function isn't fundamentally different from an scf.for or an scf.if. They all carry regions, they all participate in the same walks, and they are all processed by the same pass manager.

The implication: you can put a function inside another op. Closures, nested function definitions, module-of-modules — none require special casing. Real MLIR exploits this constantly (e.g. gpu.module contains gpu.func).
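A small sketch shows why nesting needs no special casing. It assumes a simplified op whose single region is a flat list of child ops (real regions hold blocks, which this omits). Op, make, walk, countOps, and maxDepth are hypothetical names for this example, not cp-18's API.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical op with at most one region, flattened to a list of ops.
struct Op {
    std::string name;
    std::vector<Op> region;  // empty for ops that carry no region
};

inline Op make(const std::string& name,
               const std::vector<Op>& region = std::vector<Op>()) {
    assert(name.find('.') != std::string::npos && "expected dialect.op");
    Op op;
    op.name = name;
    op.region = region;
    return op;
}

// Pre-order walk. It recurses into any region without asking whether the
// parent is a function, a loop, or anything else.
inline void walk(const Op& op,
                 const std::function<void(const Op&, int)>& visit,
                 int depth = 0) {
    visit(op, depth);
    for (const Op& child : op.region) walk(child, visit, depth + 1);
}

inline int countOps(const Op& root, const std::string& name) {
    int n = 0;
    walk(root, [&](const Op& op, int) { if (op.name == name) ++n; });
    return n;
}

inline int maxDepth(const Op& root) {
    int deepest = 0;
    walk(root, [&](const Op&, int depth) { if (depth > deepest) deepest = depth; });
    return deepest;
}

} // namespace sketch
```

Building a tiny.func whose region contains another tiny.func requires no change to walk: the inner function is visited like any other child op, one level deeper.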