Step 2 — IR interpreter as oracle
Before writing any optimisation pass, we built an IR interpreter.
It lives in ir_interp.cpp, the most important file in src/.
Why interpret IR at all?
In production, IR is consumed by a backend that lowers it to machine code. We are not yet ready to do that — cp-10 introduces LLVM IR emission, cp-11 native codegen. But to test our middle-end now, we need a way to ask "does this IR module mean the same thing it did before I ran the passes?"
That is the role of runProgram(module). Given an IR module, it
executes it and returns the stream of print outputs. Two invocations
on semantically-equivalent modules must produce identical strings.
This is the strongest test we can write for a pass. It avoids the "golden file" trap (matching exact instruction sequences is brittle: any harmless permutation breaks the test) and instead checks the only thing that matters — meaning.
Implementation shape
struct Interp {
    const Module& mod;                     // the module being executed
    unordered_map<string, Value> globals;  // global variables, by name
    ostringstream out;                     // accumulated print output
    Value callFunction(const Function& fn, const vector<Value>& args);
};
A Frame is created per call. It holds two maps: temps (tempId → Value)
and named locals (string → Value). Globals live on the interpreter.
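A minimal sketch of such a frame, under stated assumptions: Value is reduced to an integer here, and Operand, readOperand and writeOperand are hypothetical names used only for illustration. The real definitions in ir_interp.cpp may differ.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Assumption: the real Value may carry more than an integer.
using Value = long long;

// One Frame per call: temps keyed by id, named locals keyed by name.
struct Frame {
    std::unordered_map<int, Value> temps;
    std::unordered_map<std::string, Value> locals;
};

// Hypothetical operand: either a temp (isTemp, tempId) or a named local (name).
struct Operand {
    bool isTemp;
    int tempId;
    std::string name;
};

// Reads a slot; throws std::out_of_range if it was never written.
Value readOperand(const Frame& f, const Operand& op) {
    if (op.isTemp) return f.temps.at(op.tempId);
    return f.locals.at(op.name);
}

// Writes a slot, creating it on first use.
void writeOperand(Frame& f, const Operand& op, Value v) {
    if (op.isTemp) f.temps[op.tempId] = v;
    else f.locals[op.name] = v;
}
```

Because every call gets a fresh Frame, a recursive call cannot clobber the caller's temps or locals; only globals are shared.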
Execution is a while (true) over blocks. Within a block we walk
instrs sequentially:
- Value-producing ops (add, lt, move, neg…) compute a Value and write to the dst operand.
- print formats the operand and appends to out.
- ldg/stg read or write a global.
- call looks up a function by name in the module, recursively invokes callFunction, and stores the result in the dst.
- jmp/cjmp set currentId and goto next_block.
- ret returns from callFunction.
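The loop described above can be sketched as follows. This is a reduced illustration, not the code in ir_interp.cpp: the Op, Instr, Block and Function types, the single Slots map (temps and named locals merged), the one-line-per-print output format, and the countToThree helper are all assumptions. Globals and calls are left out to keep it short.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for the real IR classes.
enum class Op { Move, Add, Lt, Print, Jmp, CJmp, Ret };

struct Instr {
    Op op;
    std::string dst;   // destination slot, if any
    std::string a, b;  // sources: "%name" for a slot, otherwise an integer literal
    int target;        // jmp target, or cjmp target when the condition holds
    int fallthrough;   // cjmp target when the condition does not hold
};
struct Block { std::vector<Instr> instrs; };
struct Function { std::vector<Block> blocks; };

using Value = long long;
using Slots = std::unordered_map<std::string, Value>;

Value eval(const Slots& s, const std::string& operand) {
    assert(!operand.empty());
    if (operand[0] == '%') return s.at(operand);
    return std::stoll(operand);
}

Value runFunction(const Function& fn, std::ostringstream& out) {
    Slots slots;
    int currentId = 0;
    while (true) {
        const Block& block = fn.blocks.at(currentId);
        for (const Instr& in : block.instrs) {
            switch (in.op) {
            case Op::Move:  slots[in.dst] = eval(slots, in.a); break;
            case Op::Add:   slots[in.dst] = eval(slots, in.a) + eval(slots, in.b); break;
            case Op::Lt:    slots[in.dst] = eval(slots, in.a) < eval(slots, in.b); break;
            case Op::Print: out << eval(slots, in.a) << '\n'; break;
            case Op::Jmp:   currentId = in.target; goto next_block;
            case Op::CJmp:
                currentId = eval(slots, in.a) != 0 ? in.target : in.fallthrough;
                goto next_block;
            case Op::Ret:   return eval(slots, in.a);
            }
        }
        throw std::runtime_error("block ended without a terminator");
    next_block:;
    }
}

// Example program: i = 0; while (i < 3) { print i; i = i + 1; } return i;
Function countToThree() {
    Function fn;
    fn.blocks = {
        Block{{ Instr{Op::Move, "%i", "0"}, Instr{Op::Jmp, "", "", "", 1} }},
        Block{{ Instr{Op::Lt, "%t0", "%i", "3"}, Instr{Op::CJmp, "", "%t0", "", 2, 3} }},
        Block{{ Instr{Op::Print, "", "%i"}, Instr{Op::Add, "%i", "%i", "1"},
                Instr{Op::Jmp, "", "", "", 1} }},
        Block{{ Instr{Op::Ret, "", "%i"} }},
    };
    return fn;
}
```

Terminators leave the inner for loop with a goto rather than a flag, which keeps each case to one or two lines and makes a block without a terminator an explicit error.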
A safety budget (safety = 1e6 instructions) prevents tests from
hanging on infinite loops. See test_const_fold_comparison, which
would have spun forever had we not fixed the IR-builder bug.
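One way such a budget can work, sketched on a deliberately tiny IR. The BudgetExceeded type, the run function and the "next index" program encoding are hypothetical, and how the real interpreter reports an exhausted budget is not specified here.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical error type raised when the budget runs out.
struct BudgetExceeded : std::runtime_error {
    BudgetExceeded() : std::runtime_error("instruction budget exhausted") {}
};

// Toy IR: next[pc] is the index of the instruction to run after pc,
// or -1 to halt. Returns the number of instructions executed.
long run(const std::vector<int>& next, long safety = 1000000) {
    long executed = 0;
    int pc = 0;
    while (pc != -1) {
        // Charge one unit per instruction; fail loudly instead of hanging.
        if (--safety < 0) throw BudgetExceeded();
        pc = next.at(pc);
        ++executed;
    }
    return executed;
}
```

With this shape, run({1, 2, -1}) halts after three instructions, while run({0}) jumps to itself and ends in BudgetExceeded once the budget is spent, so a miscompiled loop surfaces as a test failure rather than a stuck test run.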
Why named operands work
A typical SSA interpreter only has temps. Ours has both temps and
named locals (%i, %x) because cp-08's lowering keeps source-level
variables as memory cells. That's deliberate: cp-09's passes never
need to reason about them.
When we move to LLVM in cp-10, those named locals become allocas and
LLVM's own mem2reg pass converts them into SSA temps. We simulate the
same final result by using the named-local convention as a "loadable
storage slot" model.
How tests use the interpreter
auto preOut = ir::runProgram(module); // before any pass
ir::runAll(module); // mutate
auto postOut = ir::runProgram(module); // after all passes
CHECK(preOut.output == postOut.output);
If a pass ever breaks semantics, that assertion fails before the golden-string checks do. It is the single most valuable assertion in the test file.
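The same pattern can be wrapped in a small helper and exercised against a pass that is known to be wrong. Everything below is a toy stand-in: PrintAdd, ToyModule, toyRun, foldConstants, brokenFold and preservesOutput are hypothetical names, not part of the ir:: API.

```cpp
#include <cassert>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Toy IR: each instruction prints lhs + rhs on its own line.
struct PrintAdd { long lhs, rhs; };
using ToyModule = std::vector<PrintAdd>;

// Stand-in for the interpreter: returns everything the module prints.
std::string toyRun(const ToyModule& m) {
    std::ostringstream out;
    for (const PrintAdd& in : m) out << (in.lhs + in.rhs) << '\n';
    return out.str();
}

// A correct pass: folds the sum into lhs, changing the instructions
// but not what they print.
void foldConstants(ToyModule& m) {
    for (PrintAdd& in : m) { in.lhs += in.rhs; in.rhs = 0; }
}

// A deliberately broken pass: drops rhs without folding it in.
void brokenFold(ToyModule& m) {
    for (PrintAdd& in : m) in.rhs = 0;
}

// Runs the module, applies the pass to a private copy, runs again,
// and reports whether the printed output is unchanged.
bool preservesOutput(ToyModule m, const std::function<void(ToyModule&)>& pass) {
    const std::string pre = toyRun(m);
    pass(m);
    return toyRun(m) == pre;
}
```

foldConstants rewrites every instruction, so an exact-instruction golden file would flag it, yet the output comparison accepts it; brokenFold is rejected because the printed sums change.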