Step 2 — Operands and instructions

Operand

struct Operand {
    enum class Kind { None, Temp, Constant, Named };
    Kind        kind = Kind::None;
    int         tempId = -1;     // Temp
    Value       constVal;        // Constant
    std::string name;            // Named  (stored without the leading % sigil)
    // factories: none(), temp(id), constant(v), named(name)
};

We use a single struct with a Kind tag rather than std::variant to keep the struct simple to copy and (more importantly) easy to print in a debugger. (It is not trivially copyable in the strict C++ sense, because it holds a std::string, but an ordinary copy does the right thing.) When you're chasing an IR bug at 1 a.m. you want p ins.srcs[0] to show something, not a variant index.

tempId, constVal, and name are independent fields; at most one is meaningful for any given Kind (none of them for Kind::None). The factories mirror that:

Operand::temp(3);                 // t3
Operand::constant(Value::makeInt(42));   // immediate
Operand::named("x");              // %x
Operand::none();                  // placeholder
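
To make the factory comment in the struct concrete, here is a minimal sketch of how the four factories and a debug printer might look. The Value stand-in (an int wrapper), the toString helper, and the "_" spelling for a None operand are assumptions for illustration, not the project's actual code.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Stand-in for the project's Value type (assumption: the real one holds more
// than integers). Only makeInt appears in the text above.
struct Value {
    long long i = 0;
    static Value makeInt(long long v) { Value out; out.i = v; return out; }
};

struct Operand {
    enum class Kind { None, Temp, Constant, Named };
    Kind        kind = Kind::None;
    int         tempId = -1;     // Temp
    Value       constVal;        // Constant
    std::string name;            // Named

    // Each factory sets the tag plus exactly one payload field.
    static Operand none() { return Operand{}; }
    static Operand temp(int id) {
        assert(id >= 0 && "temp ids are assumed to be non-negative");
        Operand o; o.kind = Kind::Temp; o.tempId = id; return o;
    }
    static Operand constant(Value v) {
        Operand o; o.kind = Kind::Constant; o.constVal = v; return o;
    }
    static Operand named(std::string n) {
        Operand o; o.kind = Kind::Named; o.name = std::move(n); return o;
    }
};

// Hypothetical printer: renders an operand the way the comments above do.
std::string toString(const Operand& o) {
    switch (o.kind) {
        case Operand::Kind::None:     return "_";
        case Operand::Kind::Temp:     return "t" + std::to_string(o.tempId);
        case Operand::Kind::Constant: return std::to_string(o.constVal.i);
        case Operand::Kind::Named:    return "%" + o.name;
    }
    return "?";  // not reached for valid Kind values
}
```

With these definitions, toString(Operand::temp(3)) yields "t3" and toString(Operand::named("x")) yields "%x", matching the comments in the usage lines above.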

Op — the opcode enum

Group        Opcodes
arithmetic   Add Sub Mul Div Mod Neg
comparison   Eq Ne Lt Le Gt Ge
logical      Not (and/or are lowered, not opcodes)
move/load    Move LoadGlobal StoreGlobal
control      Jump CondJump Return
effects      Print Call
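
As a sketch, the table maps onto an enum roughly like the one below. The enumerator order and the groupOf helper are assumptions made for illustration; only the opcode names come from the table.

```cpp
#include <cassert>
#include <string>

enum class Op {
    Add, Sub, Mul, Div, Mod, Neg,      // arithmetic
    Eq, Ne, Lt, Le, Gt, Ge,            // comparison
    Not,                               // logical (and/or are lowered)
    Move, LoadGlobal, StoreGlobal,     // move/load
    Jump, CondJump, Return,            // control
    Print, Call                        // effects
};

// Hypothetical helper: returns the table's group name for an opcode.
std::string groupOf(Op op) {
    switch (op) {
        case Op::Add: case Op::Sub: case Op::Mul:
        case Op::Div: case Op::Mod: case Op::Neg:
            return "arithmetic";
        case Op::Eq: case Op::Ne: case Op::Lt:
        case Op::Le: case Op::Gt: case Op::Ge:
            return "comparison";
        case Op::Not:
            return "logical";
        case Op::Move: case Op::LoadGlobal: case Op::StoreGlobal:
            return "move/load";
        case Op::Jump: case Op::CondJump: case Op::Return:
            return "control";
        case Op::Print: case Op::Call:
            return "effects";
    }
    assert(false && "unknown opcode");
    return "";
}
```

Leaving the switch without a default case lets most compilers warn when a new opcode is added but not classified.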

Notable design choices:

  • No And / Or opcode. Short-circuit semantics demand control flow; we lower them to CondJump (see step 6).
  • Move rather than Copy. Same idea as RISC-V or MIPS pseudo-ops: one instruction that says "write the source into the destination, unchanged." The mem2reg pass in cp-09 will eliminate most of these.
  • Call is a regular instruction. It has a destination temp (for the return value), a callee name in ins.name for direct calls, and source operands [callee, arg0, arg1, ...]. Indirect calls (cp-12 closures) will store <indirect> in ins.name and use srcs[0] for the callee operand.
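
The Call layout described in the last bullet can be sketched as a pair of builder functions. Everything here is illustrative: the types are trimmed stand-ins, makeDirectCall and makeIndirectCall are hypothetical names, and representing the direct callee in srcs[0] as a Named operand is an assumption (the text only says srcs starts with the callee).

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Trimmed stand-ins for this step's types (assumption: constants omitted).
struct Operand {
    enum class Kind { None, Temp, Constant, Named };
    Kind        kind = Kind::None;
    int         tempId = -1;
    std::string name;
    static Operand temp(int id) {
        Operand o; o.kind = Kind::Temp; o.tempId = id; return o;
    }
    static Operand named(std::string n) {
        Operand o; o.kind = Kind::Named; o.name = std::move(n); return o;
    }
};
enum class Op { Call /* other opcodes omitted */ };
struct Instr {
    Op                   op = Op::Call;
    Operand              dst;
    std::vector<Operand> srcs;
    std::string          name;
    int                  line = 0;
};

// Direct call: callee name in ins.name, srcs = [callee, arg0, arg1, ...].
Instr makeDirectCall(int dstTemp, const std::string& callee,
                     const std::vector<Operand>& args, int line) {
    Instr ins;
    ins.op = Op::Call;
    ins.dst = Operand::temp(dstTemp);
    ins.name = callee;
    ins.srcs.push_back(Operand::named(callee));  // assumption: Named operand
    ins.srcs.insert(ins.srcs.end(), args.begin(), args.end());
    ins.line = line;
    return ins;
}

// Indirect call: "<indirect>" marker in ins.name, callee value in srcs[0].
Instr makeIndirectCall(int dstTemp, const Operand& calleeValue,
                       const std::vector<Operand>& args, int line) {
    assert(calleeValue.kind != Operand::Kind::None &&
           "indirect call needs a callee operand");
    Instr ins;
    ins.op = Op::Call;
    ins.dst = Operand::temp(dstTemp);
    ins.name = "<indirect>";
    ins.srcs.push_back(calleeValue);
    ins.srcs.insert(ins.srcs.end(), args.begin(), args.end());
    ins.line = line;
    return ins;
}
```

Either way, the arguments always start at srcs[1], so a pass that only cares about arguments does not need to distinguish the two forms.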

Instr

struct Instr {
    Op            op;
    Operand       dst;         // None if the op produces no value
    std::vector<Operand> srcs; // 0..N source operands
    std::string   name;        // global name / function name
    int           bbT = -1;    // Jump target / CondJump true target
    int           bbF = -1;    // CondJump false target
    int           line = 0;    // source line for diagnostics
};

One struct fits all instruction kinds. The alternative, a class hierarchy with AddInstr, JumpInstr, CallInstr, ..., is arguably purer but painful to walk in passes: every pass would need a giant visitor or a type-switch. A flat struct lets passes loop over instrs and switch on ins.op.
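
As an illustration of the loop-and-switch style, here is a small sketch of a pass that collects every block id some jump targets. The branchTargets name and the trimmed Instr (only the fields this pass reads) are assumptions for the example.

```cpp
#include <cassert>
#include <set>
#include <vector>

enum class Op { Add, Move, Jump, CondJump, Return /* others omitted */ };

struct Instr {              // trimmed: only the fields this pass reads
    Op  op  = Op::Add;
    int bbT = -1;           // Jump target / CondJump true target
    int bbF = -1;           // CondJump false target
};

// Hypothetical pass: gather the ids of all blocks that are jump targets.
std::set<int> branchTargets(const std::vector<Instr>& instrs) {
    std::set<int> targets;
    for (const Instr& ins : instrs) {
        switch (ins.op) {
            case Op::Jump:
                assert(ins.bbT >= 0 && "Jump needs a target");
                targets.insert(ins.bbT);
                break;
            case Op::CondJump:
                assert(ins.bbT >= 0 && ins.bbF >= 0 &&
                       "CondJump needs both targets");
                targets.insert(ins.bbT);
                targets.insert(ins.bbF);
                break;
            default:
                break;      // non-branching ops contribute nothing
        }
    }
    return targets;
}
```

The whole pass is one loop and one switch; with a class hierarchy the same logic would be spread across visitor methods or a chain of casts.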

The cost: each instruction carries unused fields. For TAC at this scale that's a sub-megabyte overhead even for large programs, and it's the shape MLIR uses (an Operation* with attributes, results, operands, successors). Compiler IRs converge on this design for a reason.

Why constants are inline operands

In some IRs (notably LLVM) constants are first-class Values, distinct from instructions. We took the simpler route: a constant is just an Operand whose kind is Kind::Constant, printed inline. Pros: trivial printer, no constant pool to manage. Cons: you can't dyn_cast a constant the way you can in LLVM. For a teaching IR that's the right trade.
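
In place of dyn_cast, a pass simply checks the Kind tag. The sketch below shows what that might look like for a tiny constant-folding peephole. The isConst and foldConstants helpers, the int-only Value stand-in, and the choice to rewrite a folded instruction into a Move are all assumptions for illustration; overflow is ignored.

```cpp
#include <cassert>
#include <vector>

// Stand-in for the project's Value type (assumption: integers only).
struct Value {
    long long i = 0;
    static Value makeInt(long long v) { Value out; out.i = v; return out; }
};

struct Operand {            // trimmed: the name field is omitted
    enum class Kind { None, Temp, Constant, Named };
    Kind  kind = Kind::None;
    int   tempId = -1;
    Value constVal;
    static Operand temp(int id) {
        Operand o; o.kind = Kind::Temp; o.tempId = id; return o;
    }
    static Operand constant(Value v) {
        Operand o; o.kind = Kind::Constant; o.constVal = v; return o;
    }
};

enum class Op { Add, Mul, Move /* others omitted */ };

struct Instr {
    Op                   op = Op::Move;
    Operand              dst;
    std::vector<Operand> srcs;
};

// The tag check that stands in for LLVM-style dyn_cast.
bool isConst(const Operand& o) { return o.kind == Operand::Kind::Constant; }

// Hypothetical peephole: fold Add/Mul of two constants into a Move of the
// result. Returns true if the instruction was rewritten.
bool foldConstants(Instr& ins) {
    if (ins.op != Op::Add && ins.op != Op::Mul) return false;
    if (ins.srcs.size() != 2) return false;
    if (!isConst(ins.srcs[0]) || !isConst(ins.srcs[1])) return false;

    long long a = ins.srcs[0].constVal.i;
    long long b = ins.srcs[1].constVal.i;
    long long r = (ins.op == Op::Add) ? a + b : a * b;

    ins.op = Op::Move;
    ins.srcs = { Operand::constant(Value::makeInt(r)) };
    assert(isConst(ins.srcs[0]));
    return true;
}
```

For example, an Add with sources 40 and 2 becomes a Move of the constant 42, while an Add with a temp source is left untouched.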