Step 3 — Emit Helpers and Back-Patching Jumps
The compiler is one long sequence of emit(byte, line) calls. Most are trivial; the interesting case is forward jumps, where you have to emit an instruction whose target you don't know yet.
See emit*, emitJump, patchJump, emitLoop in compiler.cpp.
Emit Primitives
void emit(Op op, int line);
void emit(uint8_t byte, int line);
void emitConstant(const Value& v, int line);
These all reduce to chunk_->writeByte, which appends the byte and records its source line. emitConstant is a single helper because every constant emission is the same two-byte pair: [OpConstant, ix] (the opcode, then the constant's index).
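As a rough illustration of how these primitives could fit together, here is a minimal sketch. The Chunk layout, the addConstant name, the Value alias, the opcode list, and the one-byte constant index are all assumptions made for the example, not the definitions in compiler.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins; the real definitions live elsewhere in the project.
using Value = double;
enum class Op : uint8_t { Constant, Pop, Return };

struct Chunk {
    std::vector<uint8_t> code;     // raw bytecode
    std::vector<int> lines;        // lines[i] is the source line of code[i]
    std::vector<Value> constants;  // constant pool

    // Append one byte and record which source line produced it.
    void writeByte(uint8_t byte, int line) {
        code.push_back(byte);
        lines.push_back(line);
    }

    // Store a constant and return its index in the pool (assumed helper).
    size_t addConstant(const Value& v) {
        constants.push_back(v);
        return constants.size() - 1;
    }
};

struct Compiler {
    Chunk* chunk_;

    void emit(uint8_t byte, int line) { chunk_->writeByte(byte, line); }
    void emit(Op op, int line) { emit(static_cast<uint8_t>(op), line); }

    // Every constant is the same two-byte pair: [OpConstant, ix].
    // Assumes a one-byte index, so at most 256 constants per chunk.
    void emitConstant(const Value& v, int line) {
        size_t ix = chunk_->addConstant(v);
        emit(Op::Constant, line);
        emit(static_cast<uint8_t>(ix), line);
    }
};
```

Keeping the Op overload as a thin cast over the byte overload means there is only one place that touches the chunk, which is convenient if line tracking ever changes.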
The Back-Patching Problem
Consider if (cond) thenBranch else elseBranch. We want bytecode roughly:
<cond>
JumpIfFalse ELSE_START
Pop ; drop condition on then-path
<thenBranch>
Jump END
ELSE_START:
Pop ; drop condition on else-path
<elseBranch>
END:
When we emit JumpIfFalse ELSE_START, we don't yet know what ELSE_START is: it depends on how big the <thenBranch> bytecode turns out to be. The same goes for Jump END, whose target depends on the size of <elseBranch>.
Solution: emit a placeholder operand (0xff 0xff), remember its offset, and write the real value once the target is known.
size_t emitJump(Op op, int line) {
    emit(op, line);
    emit(0xff, line); // placeholder, high byte
    emit(0xff, line); // placeholder, low byte
    return chunk_->code.size() - 2; // offset of the placeholder bytes
}

void patchJump(size_t at, int line) {
    size_t target = chunk_->code.size(); // next byte to be emitted
    size_t jumpFrom = at + 2; // ip after the operand
    size_t off = target - jumpFrom;
    if (off > 0xffff) error(line, "branch too far");
    // Store the offset big-endian: high byte first.
    chunk_->code[at] = static_cast<uint8_t>((off >> 8) & 0xff);
    chunk_->code[at + 1] = static_cast<uint8_t>(off & 0xff);
}
Usage:
size_t thenJump = emitJump(Op::JumpIfFalse, line);
emit(Op::Pop, line);
visit(thenBranch);
size_t endJump = emitJump(Op::Jump, line);
patchJump(thenJump, line);
emit(Op::Pop, line);
visit(elseBranch);
patchJump(endJump, line);
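To make the offsets concrete, the following self-contained sketch runs the same lowering on one-byte stand-ins for the condition and both branches. The opcode set, the omission of line tracking, and the use of an exception in place of error() are simplifications assumed for the demo.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical opcode set, just large enough for the demo.
enum class Op : uint8_t { True, Nil, Pop, Jump, JumpIfFalse };

struct Compiler {
    std::vector<uint8_t> code; // stands in for chunk_->code; line info omitted

    void emit(uint8_t byte) { code.push_back(byte); }
    void emit(Op op) { emit(static_cast<uint8_t>(op)); }

    // Emit the opcode plus a two-byte placeholder; return the operand offset.
    size_t emitJump(Op op) {
        emit(op);
        emit(0xff);
        emit(0xff);
        return code.size() - 2;
    }

    // Overwrite the placeholder at `at` with the distance to the current end.
    void patchJump(size_t at) {
        size_t off = code.size() - (at + 2);
        if (off > 0xffff) throw std::runtime_error("branch too far");
        code[at] = static_cast<uint8_t>((off >> 8) & 0xff);
        code[at + 1] = static_cast<uint8_t>(off & 0xff);
    }
};

// Lowers `if (true) nil else nil` and returns the resulting bytes.
std::vector<uint8_t> lowerIfElse() {
    Compiler c;
    c.emit(Op::True);                               // 0: <cond>
    size_t thenJump = c.emitJump(Op::JumpIfFalse);  // 1: opcode, 2-3: operand
    c.emit(Op::Pop);                                // 4
    c.emit(Op::Nil);                                // 5: <thenBranch>
    size_t endJump = c.emitJump(Op::Jump);          // 6: opcode, 7-8: operand
    c.patchJump(thenJump);                          // ELSE_START = 9, off = 9 - 4 = 5
    c.emit(Op::Pop);                                // 9
    c.emit(Op::Nil);                                // 10: <elseBranch>
    c.patchJump(endJump);                           // END = 11, off = 11 - 9 = 2
    return c.code;
}
```

In this run the JumpIfFalse operand ends up as 00 05 and the Jump operand as 00 02, each measured from the byte just after its own operand.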
Backward Jumps — Loop
Loops are easier because the target (loop start) was already emitted:
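void emitLoop(size_t loopStart, int line) {
    emit(Op::Loop, line);
    // + 2 covers the two operand bytes that are about to be emitted.
    size_t off = chunk_->code.size() - loopStart + 2;
    if (off > 0xffff) error(line, "branch too far");
    emit(static_cast<uint8_t>((off >> 8) & 0xff), line);
    emit(static_cast<uint8_t>(off & 0xff), line);
}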
No patching needed — the offset is computed inline. The VM reads the unsigned operand and subtracts it from ip (which now points past the operand bytes).
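On the VM side, decoding could look roughly like the sketch below. The function names (readU16, execJump, execLoop) and the use of an index-based ip are hypothetical; the only details taken from the text are the big-endian two-byte operand and the rule that the offset is applied after ip has moved past the operand.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a big-endian u16 operand at ip and advance ip past it.
inline uint16_t readU16(const std::vector<uint8_t>& code, size_t& ip) {
    uint16_t hi = code[ip];
    uint16_t lo = code[ip + 1];
    ip += 2;
    return static_cast<uint16_t>((hi << 8) | lo);
}

// Forward jump: ip already points past the operand when the offset is added.
inline void execJump(const std::vector<uint8_t>& code, size_t& ip) {
    uint16_t off = readU16(code, ip);
    ip += off;
}

// Backward jump: same operand format, but the offset is subtracted.
inline void execLoop(const std::vector<uint8_t>& code, size_t& ip) {
    uint16_t off = readU16(code, ip);
    ip -= off;
}
```

For example, with loopStart at offset 1 and the Loop opcode at offset 3, emitLoop computes off = 4 - 1 + 2 = 5; the VM reads the operand, leaving ip at 6, and 6 - 5 lands back on offset 1.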
Why Two-Byte Offsets?
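The u16 operand is unsigned, and the direction is encoded by the opcode (Jump and JumpIfFalse go forward, Loop goes backward), so each jump site can reach up to 65,535 bytes (just under 64 KB) in its direction. Real programs in MiniLang rarely have functions larger than a few KB. If we hit the limit (the compiler emits "branch too far"), cp-07 will either: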
- introduce a JumpLong with a u24/u32 operand, or
- bounce through a trampoline (emit a short jump to an intermediate JumpLong that does the real work).
The JVM has goto_w (a 32-bit offset, alongside goto's 16-bit one) for the same reason: long jumps are a separate opcode flavour, not a mode switch.
Sentinel Bytes — 0xff 0xff
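Why fill placeholders with 0xff 0xff rather than 0x00 0x00? It's a defensive habit: if we forget to call patchJump, the VM reads a 65535-byte jump, which almost certainly lands outside the chunk and fails loudly. A zero offset would instead fall through silently to the next instruction and produce a much subtler bug. A bounds check in the VM's jump handler, or a tool such as AddressSanitizer when the read runs off the end of the code buffer, can make the failure easier to pinpoint.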
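One way to see the difference is a small debug-time bounds check on a forward jump's target. The helper below is hypothetical (jumpTargetInBounds is not a function from compiler.cpp) and only assumes the big-endian operand layout described earlier.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical debug helper: true if the forward jump whose operand starts at
// `at` lands inside the chunk, or exactly at its end.
inline bool jumpTargetInBounds(const std::vector<uint8_t>& code, size_t at) {
    if (at + 2 > code.size()) return false;  // operand itself is truncated
    size_t off = (static_cast<size_t>(code[at]) << 8) | code[at + 1];
    size_t target = at + 2 + off;            // offset is relative to ip after the operand
    return target <= code.size();
}
```

In any chunk smaller than 64 KB an unpatched 0xff 0xff operand fails this check, while an unpatched 0x00 0x00 would pass it, which is exactly why the zero placeholder is the riskier choice.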
Self-Check
- For an if with no else, you only need one jump. Why?
- Why do we Pop twice in the if/else lowering (once on each branch)?
- Could emitJump return an Op* instead of a size_t? What problem would that cause?