07 — Where to Grow

cp-18 reproduces MLIR's shape. If you want to grow it into something genuinely useful, here is a roadmap.

1. Verifiers

Right now any code that builds an ill-formed op succeeds silently. The first quality-of-life upgrade is a verifyOp(Op&) function that checks:

  • operand and result counts match the dialect spec,
  • operand and result types match,
  • required attributes are present and the right kind,
  • region count is right,
  • terminator ops are present at end of every block.

Real MLIR generates these from TableGen; you can hand-write them per dialect. Run after every pass; refuse to print invalid IR.
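A minimal sketch of such a verifier follows, assuming a simplified Op that stores type names as strings and a hand-written spec table per dialect. The Op layout, the OpSpec and DialectSpec names, the extra dialect parameter, and the "i32" type name are all assumptions made for illustration, not cp-18's actual definitions. It covers counts, types, attribute presence, and region count; attribute kinds and the terminator check are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for cp-18's Op; the real class will differ.
struct Op {
    std::string name;
    std::vector<std::string> operandTypes;  // type name of each operand
    std::vector<std::string> resultTypes;   // type name of each result
    std::map<std::string, std::string> attrs;
    std::size_t numRegions = 0;
};

// Hand-written per-op spec, standing in for what MLIR generates from TableGen.
struct OpSpec {
    std::vector<std::string> operandTypes;
    std::vector<std::string> resultTypes;
    std::vector<std::string> requiredAttrs;
    std::size_t numRegions = 0;
};

using DialectSpec = std::map<std::string, OpSpec>;

// Returns one diagnostic per violated rule; empty means the op is well formed.
std::vector<std::string> verifyOp(const Op& op, const DialectSpec& dialect) {
    std::vector<std::string> errors;
    auto it = dialect.find(op.name);
    if (it == dialect.end()) {
        errors.push_back("unknown op '" + op.name + "'");
        return errors;
    }
    const OpSpec& spec = it->second;
    // Comparing the vectors checks both the count and each type.
    if (op.operandTypes != spec.operandTypes)
        errors.push_back(op.name + ": operand count or types differ from spec");
    if (op.resultTypes != spec.resultTypes)
        errors.push_back(op.name + ": result count or types differ from spec");
    for (const std::string& attr : spec.requiredAttrs)
        if (op.attrs.count(attr) == 0)
            errors.push_back(op.name + ": missing attribute '" + attr + "'");
    if (op.numRegions != spec.numRegions)
        errors.push_back(op.name + ": wrong region count");
    return errors;
}

// Example spec for two ops; the "i32" type name is an assumption.
DialectSpec tinyDialect() {
    DialectSpec d;
    d["tiny.const"].resultTypes = {"i32"};
    d["tiny.const"].requiredAttrs = {"value"};
    d["tiny.add"].operandTypes = {"i32", "i32"};
    d["tiny.add"].resultTypes = {"i32"};
    return d;
}
```

Returning a list of diagnostics rather than a bool lets the pass manager report every problem at once before refusing to print.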

2. Op interfaces / traits

Hard-coded if (op.name == "tiny.add" || op.name == "tiny.mul") doesn't scale past a handful of ops. Replace with a trait system:

struct BinaryOp { Value* lhs(Op& o); Value* rhs(Op& o); };
bool implementsBinary(const std::string& name);

Then folders and DCE check implementsBinary(op.name) rather than naming specific ops. New ops opt into the trait by registering with the system. This is MLIR's OpInterface mechanism in skeleton form.
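One way the registration could look is sketched below, assuming a global set of op names as the registry. The registerBinary and countBinaryOps helpers, the minimal Value and Op structs, and the choice to make lhs and rhs static are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for cp-18's Value and Op.
struct Value { int id; };
struct Op {
    std::string name;
    std::vector<Value*> operands;
};

// Names of ops that have opted into the binary trait.
std::set<std::string>& binaryRegistry() {
    static std::set<std::string> names;
    return names;
}

// Called once per op by the dialect that defines it.
void registerBinary(const std::string& name) { binaryRegistry().insert(name); }

bool implementsBinary(const std::string& name) {
    return binaryRegistry().count(name) != 0;
}

// Uniform accessors, valid for any op that implements the trait.
struct BinaryOp {
    static Value* lhs(Op& o) {
        assert(implementsBinary(o.name) && o.operands.size() == 2);
        return o.operands[0];
    }
    static Value* rhs(Op& o) {
        assert(implementsBinary(o.name) && o.operands.size() == 2);
        return o.operands[1];
    }
};

// A client that asks about the trait instead of naming specific ops.
std::size_t countBinaryOps(const std::vector<Op>& ops) {
    std::size_t n = 0;
    for (const Op& op : ops)
        if (implementsBinary(op.name)) ++n;
    return n;
}
```

A dialect would call registerBinary("tiny.add") and registerBinary("tiny.mul") at startup; a later tiny.sub then needs one more registration call and no changes to any pass.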

3. Pattern DSL

Switch from hand-written if/switch to declarative patterns:

addPattern<BinaryOpPattern<"tiny.add">>(folder);

The base class encapsulates the match + create-replacement + replace-uses + erase boilerplate. Patterns become 5-line specs of "what to match" and "what to emit". This is RewritePattern in MLIR.
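To make the split between shared boilerplate and per-pattern logic concrete, here is a rough sketch under simplifying assumptions: each op defines exactly one value, operands point directly at their defining ops, and a block is a flat vector. The names Pattern, FoldConstBinary, applyPatterns, and addOp are hypothetical, and the pattern is parameterised by a runtime string rather than a template argument.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy IR: one result per op, operands point at defining ops.
struct Op {
    std::string name;
    std::vector<Op*> operands;
    long value = 0;  // payload, meaningful only for "tiny.const"
};
using Block = std::vector<std::unique_ptr<Op>>;

class Pattern {
public:
    virtual ~Pattern() = default;
    virtual bool match(const Op& op) const = 0;                 // what to match
    virtual std::unique_ptr<Op> emit(const Op& op) const = 0;   // what to emit

    // Shared boilerplate: match, create replacement, replace uses, erase.
    bool apply(Block& block, std::size_t index) const {
        assert(index < block.size());
        Op* old = block[index].get();
        if (!match(*old)) return false;
        std::unique_ptr<Op> fresh = emit(*old);
        for (auto& user : block)
            for (Op*& operand : user->operands)
                if (operand == old) operand = fresh.get();
        block[index] = std::move(fresh);  // takes the old slot, destroys the old op
        return true;
    }
};

// Folds name(const, const) into a single constant.
class FoldConstBinary : public Pattern {
public:
    FoldConstBinary(std::string name, std::function<long(long, long)> fn)
        : name_(std::move(name)), fn_(std::move(fn)) {}
    bool match(const Op& op) const override {
        return op.name == name_ && op.operands.size() == 2 &&
               op.operands[0]->name == "tiny.const" &&
               op.operands[1]->name == "tiny.const";
    }
    std::unique_ptr<Op> emit(const Op& op) const override {
        auto folded = std::make_unique<Op>();
        folded->name = "tiny.const";
        folded->value = fn_(op.operands[0]->value, op.operands[1]->value);
        return folded;
    }
private:
    std::string name_;
    std::function<long(long, long)> fn_;
};

// One forward sweep; returns the number of rewrites performed.
std::size_t applyPatterns(Block& block, const std::vector<const Pattern*>& patterns) {
    std::size_t rewrites = 0;
    for (std::size_t i = 0; i < block.size(); ++i)
        for (const Pattern* p : patterns)
            if (p->apply(block, i)) { ++rewrites; break; }
    return rewrites;
}

// Convenience builder: appends an op to the block and returns it.
Op* addOp(Block& block, std::string name, std::vector<Op*> operands = {}, long value = 0) {
    auto op = std::make_unique<Op>();
    op->name = std::move(name);
    op->operands = std::move(operands);
    op->value = value;
    block.push_back(std::move(op));
    return block.back().get();
}
```

Because operands are defined before their users, a single forward sweep is enough here: folding tiny.add(2, 3) first turns it into a constant, which then lets a tiny.mul that consumes it fold as well. A real driver would iterate to a fixed point instead.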

4. Real type system

Replace Type { string name } with a tagged union or class hierarchy:

struct Type { enum class Kind { Int, Float, Tensor, Function, ... }; ... };
struct TensorType : Type { Type elemType; vector<int64_t> shape; };

Then types can be compared structurally and dialects can demand specific type shapes. Shape inference becomes possible: an op like tensor.matmul : tensor<MxKxf32> × tensor<KxNxf32> → tensor<MxNxf32> can verify and propagate shapes.
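As an illustration of structural comparison and the matmul rule, here is a small sketch that uses a flat tagged struct rather than the class hierarchy shown above; either layout would work. The factory helpers, the bitWidth field, and the inferMatmul name are assumptions, and dynamic dimensions are not modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// A flat tagged struct; only the fields relevant to `kind` are meaningful.
struct Type {
    enum class Kind { Int, Float, Tensor };
    Kind kind = Kind::Int;
    unsigned bitWidth = 0;                 // used by Int and Float
    std::shared_ptr<const Type> elemType;  // used by Tensor
    std::vector<std::int64_t> shape;       // used by Tensor
};

Type intType(unsigned bits) {
    Type t;
    t.kind = Type::Kind::Int;
    t.bitWidth = bits;
    return t;
}

Type floatType(unsigned bits) {
    Type t;
    t.kind = Type::Kind::Float;
    t.bitWidth = bits;
    return t;
}

Type tensorType(const Type& elem, std::vector<std::int64_t> shape) {
    Type t;
    t.kind = Type::Kind::Tensor;
    t.elemType = std::make_shared<const Type>(elem);
    t.shape = std::move(shape);
    return t;
}

// Structural equality: same kind and same parameters, recursively.
bool operator==(const Type& a, const Type& b) {
    if (a.kind != b.kind) return false;
    if (a.kind != Type::Kind::Tensor) return a.bitWidth == b.bitWidth;
    assert(a.elemType && b.elemType);
    return a.shape == b.shape && *a.elemType == *b.elemType;
}

// tensor<MxKxT> x tensor<KxNxT> -> tensor<MxNxT>.
// Returns false, leaving `out` untouched, when the operand types do not fit.
bool inferMatmul(const Type& a, const Type& b, Type& out) {
    const bool bothMatrices =
        a.kind == Type::Kind::Tensor && b.kind == Type::Kind::Tensor &&
        a.shape.size() == 2 && b.shape.size() == 2;
    if (!bothMatrices) return false;
    if (!(*a.elemType == *b.elemType)) return false;  // element types must agree
    if (a.shape[1] != b.shape[0]) return false;       // inner dimensions must agree
    out = tensorType(*a.elemType, {a.shape[0], b.shape[1]});
    return true;
}
```

The same function serves both purposes named above: a verifier calls it and compares the result with the op's declared result type, while a shape-inference pass calls it to fill in a result type that was left unspecified.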

5. Conversion framework

Generalise lowerTinyToLL into:

struct TypeConverter { Type convert(Type src); };
struct ConversionPattern { virtual bool match(Op&) = 0; virtual void rewrite(Op&, ...) = 0; };
void applyFullConversion(Op& root, vector<ConversionPattern*>, TargetSpec);

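Here TargetSpec declares which ops are "legal" in the output, and patterns plug in modularly. This is the same idea as MLIR's dialect conversion framework, where mlir::ConversionTarget plays the role of TargetSpec and patterns perform their rewrites through an mlir::ConversionPatternRewriter.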
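The pieces might fit together roughly as follows. This sketch makes several simplifying assumptions: ops live in a flat vector instead of under a root op, types are plain strings, legality is decided per dialect prefix, and the "tiny.int" to "i64" mapping and the RenamePattern class are invented for the example. Unlike MLIR, a failed conversion here does not roll back earlier rewrites.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flat op: a name plus the type of its single result.
struct Op {
    std::string name;
    std::string type;
};

struct TypeConverter {
    // Assumed mapping: the source type "tiny.int" becomes "i64".
    std::string convert(const std::string& src) const {
        return src == "tiny.int" ? "i64" : src;
    }
};

// Declares which dialects may appear in the output.
struct TargetSpec {
    std::set<std::string> legalDialects;
    bool isLegal(const Op& op) const {
        const std::string dialect = op.name.substr(0, op.name.find('.'));
        return legalDialects.count(dialect) != 0;
    }
};

struct ConversionPattern {
    virtual ~ConversionPattern() = default;
    virtual bool match(const Op& op) const = 0;
    virtual void rewrite(Op& op, const TypeConverter& types) const = 0;
};

// Simplest possible pattern: rename one op 1:1 and convert its result type.
class RenamePattern : public ConversionPattern {
public:
    RenamePattern(std::string from, std::string to)
        : from_(std::move(from)), to_(std::move(to)) {}
    bool match(const Op& op) const override { return op.name == from_; }
    void rewrite(Op& op, const TypeConverter& types) const override {
        op.name = to_;
        op.type = types.convert(op.type);
    }
private:
    std::string from_, to_;
};

// Full conversion: succeeds only if every op is legal afterwards.
bool applyFullConversion(std::vector<Op>& ops,
                         const std::vector<const ConversionPattern*>& patterns,
                         const TargetSpec& target, const TypeConverter& types) {
    for (Op& op : ops) {
        if (target.isLegal(op)) continue;  // already in the target dialect
        for (const ConversionPattern* p : patterns)
            if (p->match(op)) { p->rewrite(op, types); break; }
        if (!target.isLegal(op)) return false;  // no pattern legalised this op
    }
    return true;
}
```

The useful property is that the driver knows nothing about tiny.* or ll.*: adding a lowering means adding a pattern to the list, and forgetting one surfaces as a failed conversion instead of illegal ops leaking into the output.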

6. A useful dialect: tensors

The natural next thing to model is tensor.*:

  • tensor.const : tensor<NxNxf32> — a constant tensor with shape.
  • tensor.add : (tensor, tensor) -> tensor — elementwise.
  • tensor.matmul.
  • Conversion to a loop dialect (scf.for + memref.store/load).
  • Conversion of the loop dialect to ll.*.

That's MLIR's Toy tutorial redone in your own framework. Three dialects and two lowering steps demonstrate the whole stack: high-level algebraic IR → loop nest → low-level CPU code.

7. Plug into real LLVM

If you wire the final ll.* dialect to actually emit llvm::IRBuilder calls (the way cp-17's ir_emit.cpp does), you have a complete frontend: surface language → tiny → ll → LLVM IR → JIT or native code.

At that point you are only a small amount of implementation work away from your own domain-specific compiler. The IRBuilder bridge is the same code as in cp-17, with op-name dispatch driving it.

Where this lab leaves you

You can read MLIR source code, recognise its idioms, and understand why an op-centric, region-carrying, dialect-extensible IR was the right answer for modern compiler stacks. You can also build your own project-internal IR with this shape when LLVM IR is too low-level for your problem domain — which, for any compiler targeting ML, hardware design, or DSLs, is essentially always.