05 — Pass Infrastructure and the Canonicalisation Idea

A pass in cp-18 is just a function:

using Pass = std::function<bool(Op& moduleOp)>;

It mutates the module and returns true if anything changed. A Pipeline runs them in sequence:

mlf::pass::Pipeline pipe;
pipe.add("constant-fold", mlf::pass::constantFold);
pipe.add("dce",           mlf::pass::deadCodeElimination);
pipe.run(moduleOp);

That's the entire model. Real MLIR's PassManager adds threading, caching of analyses, scheduling at different nesting levels (module pass vs function pass vs op pass), pass options, statistics, and pipeline specification via textual config. None of those change the core idea: passes are functions, pipelines compose them.

Constant folding as a model rewrite

constantFold in cp-18 demonstrates the universal rewrite pattern:

for (Op* op in candidates):
    if op matches a known shape:                 // ← pattern matcher
        compute folded value                     // ← semantic step
        Builder b; b.setInsertionPointBefore(op);
        Op* replacement = b.create("...", ...);  // ← rewriter
        replaceAllUses(root, op->result(0), replacement->result(0));
        op->parent->eraseOp(op);                 // ← cleanup
        restart  // because the IR shape changed under us

Real MLIR formalises this as RewritePattern. You write a class with match and rewrite methods, register it, and a driver (applyPatternsAndFoldGreedily) handles the fixpoint loop. The driver solves three problems cp-18 punts on:

  1. Termination. Fixpoint iteration can loop if patterns keep undoing each other. MLIR tracks rewrites and bails if the same op gets rewritten too many times.
  2. Cost-based selection. When multiple patterns match, MLIR picks the cheapest. cp-18 only has one folding rule, so no ambiguity.
  3. Worklist management. Newly inserted ops should be revisited. MLIR keeps a worklist; cp-18 restarts the whole walk after each change (correct but quadratic).

DCE as a model "delete dead things" pass

bool deadCodeElimination(Op& moduleOp) {
    // 1. Find all values that ARE used somewhere.
    // 2. Find a pure op whose results aren't in that set.
    // 3. Delete it. Repeat.
}

The crucial concept: purity. We hard-code the set of pure ops:

static bool isPure(const Op& op) {
    return op.name == "tiny.const" || op.name == "tiny.add"
        || op.name == "tiny.mul"   || /* etc */;
}

Real MLIR uses the MemoryEffectOpInterface: an op declares the read/write/allocate/free effects it has on memory. DCE removes an op only if it has no effects (or only MemoryEffects::Read from immutable storage). This generalises to any dialect without changing the DCE pass.

cp-18 hard-codes the set because we don't have an interface system. Same algorithm, less elegant.

Why a single pipeline, not many "phases"?

Because IRs in this design are monomorphic: every op is just mlf::Op. Passes can be composed freely — fold then DCE then fold again then convert — without serialising to text and re-parsing. Real MLIR does the same: you build a single PassManager and run dozens of passes in sequence, all operating on the same in-memory module.

Compare with classic LLVM, where each pass is a different FunctionPass/ModulePass subclass, ordering is managed by a Pass Manager whose API is large and historical, and pipeline specification (e.g. -O2) is hard-coded in C++ rather than declared in text. MLIR's PassManager is a cleaner take on the same idea.