Step 3 — Emit Helpers and Back-Patching Jumps

The compiler is one long sequence of emit(byte, line) calls. Most are trivial; the interesting case is forward jumps, where you have to emit an instruction whose target you don't know yet.

See emit*, emitJump, patchJump, emitLoop in compiler.cpp.

Emit Primitives

void emit(Op op, int line);
void emit(uint8_t byte, int line);
void emitConstant(const Value& v, int line);

All three reduce to chunk_->writeByte, which appends a byte and records its source line. emitConstant gets its own helper because every constant emission has the same two-part shape: [OpConstant, ix], where ix is the index of the constant.
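As a rough illustration of how thin these helpers can be, here is a minimal sketch. It is not the code in compiler.cpp: the Chunk layout, the addConstant helper, the Emitter struct, the Op::Constant spelling, the one-byte constant index, and Value being a plain double are all assumptions made so the example stands alone.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the real types; names and layout are assumed.
enum class Op : uint8_t { Constant, Pop, Jump, JumpIfFalse, Loop };
using Value = double;  // assumption: the real Value type is richer

struct Chunk {
    std::vector<uint8_t> code;
    std::vector<int> lines;        // lines[i] is the source line of code[i]
    std::vector<Value> constants;

    // Append one byte and record the source line it came from.
    void writeByte(uint8_t byte, int line) {
        code.push_back(byte);
        lines.push_back(line);
    }

    // Assumed helper: store the value and return its index.
    std::size_t addConstant(const Value& v) {
        constants.push_back(v);
        return constants.size() - 1;
    }
};

struct Emitter {
    Chunk* chunk_;

    void emit(uint8_t byte, int line) { chunk_->writeByte(byte, line); }
    void emit(Op op, int line) { emit(static_cast<uint8_t>(op), line); }

    // Every constant emission is the pair [Constant, ix].
    void emitConstant(const Value& v, int line) {
        std::size_t ix = chunk_->addConstant(v);
        assert(ix <= 0xff && "this sketch assumes a one-byte constant index");
        emit(Op::Constant, line);
        emit(static_cast<uint8_t>(ix), line);
    }
};
```

Because Op is a scoped enum, a call such as emit(0xff, line) can only bind to the uint8_t overload, so the two emit overloads never become ambiguous.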

The Back-Patching Problem

Consider if (cond) thenBranch else elseBranch. We want bytecode roughly:

<cond>
JumpIfFalse  ELSE_START
Pop                       ; drop condition on then-path
<thenBranch>
Jump         END
ELSE_START:
Pop                       ; drop condition on else-path
<elseBranch>
END:

When we emit JumpIfFalse ELSE_START, we don't yet know what ELSE_START is — it depends on how big the <thenBranch> bytecode turns out to be. Same for Jump END.

Solution: emit a placeholder operand (0xff 0xff), remember its offset, and write the real value once the target is known.

size_t emitJump(Op op, int line) {
    emit(op, line);
    emit(0xff, line);
    emit(0xff, line);
    return chunk_->code.size() - 2;  // offset of the placeholder bytes
}

void patchJump(size_t at, int line) {
    size_t target = chunk_->code.size();
    size_t jumpFrom = at + 2;             // ip after the operand
    size_t off = target - jumpFrom;
    if (off > 0xffff) error(line, "branch too far");
    chunk_->code[at]     = (off >> 8) & 0xff;
    chunk_->code[at + 1] = off & 0xff;
}

Usage:

size_t thenJump = emitJump(Op::JumpIfFalse, line);
emit(Op::Pop, line);
visit(thenBranch);
size_t endJump = emitJump(Op::Jump, line);
patchJump(thenJump, line);
emit(Op::Pop, line);
visit(elseBranch);
patchJump(endJump, line);
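To check the arithmetic, the lowering above can be run against a toy emitter. The sketch below is an illustration under assumptions, not the real compiler: MiniEmitter, IfLayout, lowerIf, readOperand, and the Op::Nop filler opcode are hypothetical names, line tracking is dropped, and each sub-expression is stood in for by a run of Nop bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Op : uint8_t { Pop, Jump, JumpIfFalse, Nop };  // Nop is filler for this demo

struct MiniEmitter {
    std::vector<uint8_t> code;

    void emit(uint8_t byte) { code.push_back(byte); }
    void emit(Op op) { emit(static_cast<uint8_t>(op)); }

    // Emit the opcode plus a two-byte placeholder; return the placeholder's offset.
    std::size_t emitJump(Op op) {
        emit(op);
        emit(0xff);
        emit(0xff);
        return code.size() - 2;
    }

    // Overwrite the placeholder at `at` so the jump lands on the current end of code.
    void patchJump(std::size_t at) {
        std::size_t off = code.size() - (at + 2);  // measured from ip after the operand
        assert(off <= 0xffff);
        code[at]     = static_cast<uint8_t>((off >> 8) & 0xff);
        code[at + 1] = static_cast<uint8_t>(off & 0xff);
    }

    // Decode the big-endian u16 stored at `at`.
    std::size_t readOperand(std::size_t at) const {
        return (static_cast<std::size_t>(code[at]) << 8) | code[at + 1];
    }
};

struct IfLayout { std::size_t thenJump, endJump, elseStart, end; };

// Lower an if/else whose condition, then-branch and else-branch are
// condLen, thenLen and elseLen filler bytes long.
IfLayout lowerIf(MiniEmitter& e, std::size_t condLen,
                 std::size_t thenLen, std::size_t elseLen) {
    auto filler = [&](std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) e.emit(Op::Nop);
    };
    filler(condLen);
    std::size_t thenJump = e.emitJump(Op::JumpIfFalse);
    e.emit(Op::Pop);
    filler(thenLen);
    std::size_t endJump = e.emitJump(Op::Jump);
    e.patchJump(thenJump);                 // ELSE_START is now known
    std::size_t elseStart = e.code.size();
    e.emit(Op::Pop);
    filler(elseLen);
    e.patchJump(endJump);                  // END is now known
    return {thenJump, endJump, elseStart, e.code.size()};
}
```

With a 1-byte condition, a 3-byte then-branch and a 2-byte else-branch, the JumpIfFalse operand comes out as 7 and the Jump operand as 3. In both cases, placeholder offset + 2 + operand equals the intended target (11 for ELSE_START, 14 for END).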

Backward Jumps — Loop

Loops are easier because the target (loop start) was already emitted:

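void emitLoop(size_t loopStart, int line) {
    emit(Op::Loop, line);
    // +2 accounts for the operand bytes we are about to emit: the VM
    // subtracts the offset from ip after it has read them.
    size_t off = chunk_->code.size() - loopStart + 2;
    if (off > 0xffff) error(line, "branch too far");
    emit((off >> 8) & 0xff, line);
    emit(off & 0xff, line);
}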

No patching needed — the offset is computed inline. The VM reads the unsigned operand and subtracts it from ip (which now points past the operand bytes).
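The VM side is not shown in this step, so the following is only a sketch of how the two operand conventions might be decoded. The function names (readU16, execJump, execLoop) and the free-function form of emitLoop are assumptions for illustration; the one thing carried over from the text is that ip points past the operand when the offset is applied.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a big-endian u16 at code[ip] and advance ip past it.
inline std::size_t readU16(const std::vector<uint8_t>& code, std::size_t& ip) {
    std::size_t hi = code[ip];
    std::size_t lo = code[ip + 1];
    ip += 2;
    return (hi << 8) | lo;
}

// Forward jump: ip enters pointing at the first operand byte.
// Returns the new ip.
inline std::size_t execJump(const std::vector<uint8_t>& code, std::size_t ip) {
    std::size_t off = readU16(code, ip);
    return ip + off;
}

// Backward jump: same operand encoding, but the offset is subtracted.
inline std::size_t execLoop(const std::vector<uint8_t>& code, std::size_t ip) {
    std::size_t off = readU16(code, ip);
    return ip - off;
}

// Emitter side, mirroring emitLoop without line tracking.
inline void emitLoop(std::vector<uint8_t>& code, uint8_t loopOpcode,
                     std::size_t loopStart) {
    code.push_back(loopOpcode);
    std::size_t off = code.size() - loopStart + 2;  // +2 for the operand itself
    assert(off <= 0xffff);
    code.push_back(static_cast<uint8_t>((off >> 8) & 0xff));
    code.push_back(static_cast<uint8_t>(off & 0xff));
}
```

For example, with five bytes already emitted and loopStart = 2, emitLoop writes an offset of 6. The VM reads the operand at offsets 6 and 7, leaving ip at 8, and 8 - 6 lands back on offset 2.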

Why Two-Byte Offsets?

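The operand is unsigned and the direction lives in the opcode (Jump and JumpIfFalse go forward, Loop goes backward), so a u16 gives each jump site up to 65,535 bytes of range, just under 64 KB. MiniLang functions are rarely larger than a few KB, so the limit seldom matters in practice. If we do hit it (the compiler reports "branch too far"), cp-07 will either: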

  • introduce a JumpLong with a u24/u32 operand, or
  • bounce through a trampoline (emit a short jump to an intermediate JumpLong that does the real work).

The JVM takes the first route: goto carries a 16-bit offset and goto_w a 32-bit one. A long jump is a separate opcode flavour, not a mode switch on the existing one.

Sentinel Bytes — 0xff 0xff

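Why fill placeholders with 0xff 0xff rather than 0x00 0x00? It's a defensive habit: if we forget to call patchJump, the VM reads a 65,535-byte jump, which almost always lands far outside the chunk and fails loudly. A zero placeholder would instead decode as a jump of zero bytes, which silently falls through to the next instruction and is much harder to notice. Tooling can make the failure louder still: running the VM under AddressSanitizer is likely to report the out-of-range read at the faulting instruction, and a lint-style pass over the finished chunk could flag any jump operand still equal to 0xffff.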
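The sentinel only helps once the VM actually executes the bad jump. A complementary compile-time check, sketched below, is to remember which placeholders are still outstanding. This is a hypothetical addition, not something compiler.cpp is known to do: CheckedEmitter, pending, and allJumpsPatched are invented names, and opcodes are passed as raw bytes to keep the example small.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

struct CheckedEmitter {
    std::vector<uint8_t> code;
    std::set<std::size_t> pending;  // placeholder offsets not yet patched

    // Emit opcode + 0xff 0xff and remember that the operand is unresolved.
    std::size_t emitJump(uint8_t opcode) {
        code.push_back(opcode);
        code.push_back(0xff);
        code.push_back(0xff);
        std::size_t at = code.size() - 2;
        pending.insert(at);
        return at;
    }

    // Resolve the jump to the current end of code and mark it as done.
    void patchJump(std::size_t at) {
        assert(pending.count(at) == 1 && "unknown or already-patched jump");
        std::size_t off = code.size() - (at + 2);
        assert(off <= 0xffff);
        code[at]     = static_cast<uint8_t>((off >> 8) & 0xff);
        code[at + 1] = static_cast<uint8_t>(off & 0xff);
        pending.erase(at);
    }

    // Call once the function body is complete; false means a patchJump was missed.
    bool allJumpsPatched() const { return pending.empty(); }
};
```

A compiler could assert allJumpsPatched() at the end of each function in debug builds. The 0xff 0xff sentinel then remains as a second line of defence for release builds.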

Self-Check

  • For an if with no else, you only need one jump. Why?
  • Why do we Pop twice in the if/else lowering (once on each branch)?
  • Could emitJump return an Op* instead of a size_t? What problem would that cause?