Step 1 — Instruction Set Design

Goal

Design a bytecode instruction set for MiniLang that:

  1. Is easy to emit from the AST (one or two opcodes per node).
  2. Is easy to execute with a tight dispatch loop.
  3. Encodes operands compactly (typical instruction = 1–3 bytes).
  4. Leaves headroom for cp-07's call frames, closures, and upvalues.

The output of this step is opcode.hpp: a single enum and its opName() lookup.

Stack Machine vs Register Machine

Two dominant designs:

Stack machineRegister machine
ExamplesJVM, .NET CLR, CPython, WASMLua 5+, Dalvik, V8 Ignition
Operand encodingImplicit (top of stack)Explicit (register indices)
Instruction count for a = b + c4 (GetB, GetC, Add, SetA)1 (Add a, b, c)
Compiler complexityLowHigh (register allocation)
Per-op dispatch overheadHigher (more ops)Lower
Total dispatch overheadComparable in practiceComparable in practice

We pick stack because (a) compilation is dead simple — every expression node "leaves its result on the stack", and (b) the visitor pattern compiles cleanly to stack opcodes (you don't need to thread destination registers through the visitor).

Byte-Aligned, Variable-Length Encoding

Each instruction is one opcode byte followed by zero or more inline operand bytes:

+--------+--------+--------+
| OPCODE | (operands…)     |
+--------+--------+--------+

Operand widths used in cp-06:

  • 1 byte for constant-pool indices (max 256 constants per chunk).
  • 1 byte for local-slot indices (max 256 locals per function).
  • 2 bytes (big-endian) for jump offsets (max ±32 KB jump distance).

The trade-off: variable length means the disassembler has to know each opcode's operand size, but the byte stream is smaller than fixed-width 32-bit encoding (Lua's 4-byte instructions waste bytes on RETURN, POP, etc.).

The cp-06 ISA at a glance

The 32 opcodes break into six functional groups:

Stack/literal:   Constant Nil True False Pop
Variables:       DefGlobal GetGlobal SetGlobal GetLocal SetLocal
Arithmetic:      Add Sub Mul Div Mod Neg
Logic/compare:   Not Eq Ne Lt Le Gt Ge
Control flow:    Jump JumpIfFalse Loop
I/O:             Print
Reserved (cp-07): Call Return Closure GetUpvalue SetUpvalue CloseUpvalue

Notice:

  • No dedicated And/Or. Short-circuit logic compiles to control flow (JumpIfFalse + Pop). See step 6.
  • Loop is just an unconditional backward jump with a u16 operand (forward jumps are signed via the encoding pattern; backward jumps are explicit because the bytecode is being emitted forward and the target is already known).
  • No typed arithmetic ops. Add works on both numbers (sum) and strings (concat) — the type check at compile time guarantees the runtime knows which to do. Other VMs (JVM with iadd/fadd/…) split by type for performance; we trade that for simplicity.
  • Reserved opcodes are emitted by the disassembler but never produced by the cp-06 compiler. cp-07 wires them up.

Why 32 Opcodes and Not 200?

Crafting Interpreters' Lox has ~30 opcodes. Python has ~120, Java has ~200. Each extra opcode is one more branch in your dispatch switch — and as opcodes proliferate, micro-ops dilute the hot ones. Modern VMs (V8) actually go the other direction: a few big "polymorphic" opcodes that internally specialise.

For learning, fewer opcodes means less to memorise. cp-08 (TAC IR) and cp-09 (SSA) re-explore the design space; cp-13 (MLIR) lets you define your own dialects.

Self-Check

After this step you should be able to:

  • Pick an instruction set design (stack vs register, variable vs fixed) and defend it.
  • Predict, for a source like a + b * c, the exact opcode sequence.
  • Explain why we don't have OpAnd.
  • List which opcodes are reserved and what runtime concept each needs.