Step 1 — Instruction Set Design

Goal

Design a bytecode instruction set for MiniLang that:

Is easy to emit from the AST (one or two opcodes per node).
Is easy to execute with a tight dispatch loop.
Encodes operands compactly (typical instruction = 1–3 bytes).
Leaves headroom for cp-07's call frames, closures, and upvalues.

The output of this step is opcode.hpp: a single enum and its opName() lookup.

Stack Machine vs Register Machine

Two dominant designs:

	Stack machine	Register machine
Examples	JVM, .NET CLR, CPython, WASM	Lua 5+, Dalvik, V8 Ignition
Operand encoding	Implicit (top of stack)	Explicit (register indices)
Instruction count for `a = b + c`	4 (`GetB`, `GetC`, `Add`, `SetA`)	1 (`Add a, b, c`)
Compiler complexity	Low	High (register allocation)
Per-op dispatch overhead	Higher (more ops)	Lower
Total dispatch overhead	Comparable in practice	Comparable in practice

We pick stack because (a) compilation is dead simple — every expression node "leaves its result on the stack", and (b) the visitor pattern compiles cleanly to stack opcodes (you don't need to thread destination registers through the visitor).

Byte-Aligned, Variable-Length Encoding

Each instruction is one opcode byte followed by zero or more inline operand bytes:

+--------+--------+--------+
| OPCODE | (operands…)     |
+--------+--------+--------+

Operand widths used in cp-06:

1 byte for constant-pool indices (max 256 constants per chunk).
1 byte for local-slot indices (max 256 locals per function).
2 bytes (big-endian) for jump offsets (max ±32 KB jump distance).

The trade-off: variable length means the disassembler has to know each opcode's operand size, but the byte stream is smaller than fixed-width 32-bit encoding (Lua's 4-byte instructions waste bytes on RETURN, POP, etc.).

The cp-06 ISA at a glance

The 32 opcodes break into six functional groups:

Stack/literal:   Constant Nil True False Pop
Variables:       DefGlobal GetGlobal SetGlobal GetLocal SetLocal
Arithmetic:      Add Sub Mul Div Mod Neg
Logic/compare:   Not Eq Ne Lt Le Gt Ge
Control flow:    Jump JumpIfFalse Loop
I/O:             Print
Reserved (cp-07): Call Return Closure GetUpvalue SetUpvalue CloseUpvalue

Notice:

No dedicated And/Or. Short-circuit logic compiles to control flow (JumpIfFalse + Pop). See step 6.
Loop is just an unconditional backward jump with a u16 operand (forward jumps are signed via the encoding pattern; backward jumps are explicit because the bytecode is being emitted forward and the target is already known).
No typed arithmetic ops. Add works on both numbers (sum) and strings (concat) — the type check at compile time guarantees the runtime knows which to do. Other VMs (JVM with iadd/fadd/…) split by type for performance; we trade that for simplicity.
Reserved opcodes are emitted by the disassembler but never produced by the cp-06 compiler. cp-07 wires them up.

Why 32 Opcodes and Not 200?

Crafting Interpreters' Lox has ~30 opcodes. Python has ~120, Java has ~200. Each extra opcode is one more branch in your dispatch switch — and as opcodes proliferate, micro-ops dilute the hot ones. Modern VMs (V8) actually go the other direction: a few big "polymorphic" opcodes that internally specialise.

For learning, fewer opcodes means less to memorise. cp-08 (TAC IR) and cp-09 (SSA) re-explore the design space; cp-13 (MLIR) lets you define your own dialects.

Self-Check

After this step you should be able to:

Pick an instruction set design (stack vs register, variable vs fixed) and defend it.
Predict, for a source like a + b * c, the exact opcode sequence.
Explain why we don't have OpAnd.
List which opcodes are reserved and what runtime concept each needs.

Compilers & Parser Engineer