Step 2 — The Chunk

A chunk is a self-contained compiled unit: every byte the VM will execute, every constant those bytes refer to, and the source-line metadata for diagnostics. In cp-06 there is exactly one chunk — the top-level script. cp-07 will add a chunk-per-function model.

See chunk.hpp.

Data Layout

struct Chunk {
    std::vector<uint8_t> code;       // flat byte stream
    std::vector<Value>   constants;  // referenced by 1-byte index
    std::vector<int>     lines;      // lines[i] = source line for code[i]
    std::string          name = "<script>";
};

Three vectors. code and lines are parallel (one entry per byte); constants is indexed by operand bytes in code:

code

The byte stream. Opcode and operand bytes are interleaved: for example, [CONSTANT, 0, ADD] is three bytes but two instructions. The VM advances ip by 1 for the opcode, then by N more for that opcode's N operand bytes.
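The ip-advance rule can be sketched as a small decoder. This is a minimal sketch, not the VM's actual loop: the Op enum and the operandBytes helper are hypothetical names, and the assumed encoding gives OP_CONSTANT one operand byte and the other opcodes none.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcode set for illustration; the real one lives in chunk.hpp.
enum Op : uint8_t { OP_CONSTANT, OP_ADD, OP_RETURN };

// Number of operand bytes that follow each opcode (assumed encoding).
inline std::size_t operandBytes(uint8_t op) {
    return op == OP_CONSTANT ? 1 : 0;
}

// Walks the byte stream the way a fetch step would and counts instructions:
// one byte for the opcode, then N more for its operands.
inline std::size_t countInstructions(const std::vector<uint8_t>& code) {
    std::size_t ip = 0, count = 0;
    while (ip < code.size()) {
        uint8_t op = code[ip++];   // advance past the opcode
        ip += operandBytes(op);    // skip its operand bytes
        ++count;
    }
    // A well-formed chunk never ends in the middle of an operand.
    assert(ip == code.size() && "truncated operand");
    return count;
}
```

Feeding it {OP_CONSTANT, 0, OP_ADD} walks three bytes and counts two instructions, matching the example above.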

constants

A pool of Values. Literals in the source (42, "hello") are interned here. The compiler emits OpConstant ix, where ix is the pool index. Because ix is a single byte, the pool holds at most 256 entries. Deduplication is by value equality, so print 1; print 1; uses one slot.

lines

Parallel array: lines[i] is the source line of byte code[i]. When the VM throws a runtime error, it consults lines[ip-1] (ip has already advanced past the faulting byte) to print "RuntimeError at line 17". The disassembler prints a line number only when it changes and shows | for the following bytes on the same line, so they read as a group.
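The bookkeeping described above can be sketched as follows. MiniChunk, write, runtimeError and lineColumn are hypothetical names for illustration, not the contents of chunk.hpp; the sketch only assumes that every byte is appended together with its source line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Pared-down stand-in for Chunk: only the two parallel vectors.
struct MiniChunk {
    std::vector<uint8_t> code;
    std::vector<int>     lines;

    // Appending through one helper keeps code and lines the same length.
    void write(uint8_t byte, int line) {
        code.push_back(byte);
        lines.push_back(line);
        assert(code.size() == lines.size());
    }
};

// ip has already moved past the faulting byte, hence ip - 1.
inline std::string runtimeError(const MiniChunk& c, std::size_t ip) {
    assert(ip >= 1 && ip <= c.lines.size());
    return "RuntimeError at line " + std::to_string(c.lines[ip - 1]);
}

// Line column as a disassembler might print it: the number the first time,
// "|" when the byte at this offset shares the previous byte's line.
inline std::string lineColumn(const MiniChunk& c, std::size_t offset) {
    assert(offset < c.lines.size());
    if (offset > 0 && c.lines[offset] == c.lines[offset - 1]) return "|";
    return std::to_string(c.lines[offset]);
}
```

Routing every append through a single helper is one way to guarantee that the two vectors never drift apart.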

Why Parallel Vectors?

The alternative is a vector of Instruction { opcode; operand; line; } structs. That keeps each instruction's fields together, but each struct is 8+ bytes versus 1 byte per opcode in a packed stream. For a typical chunk (hundreds to thousands of bytes), the byte stream fits more instructions into each cache line.
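The size argument can be checked at compile time. This is a rough sketch under stated assumptions: the Instruction layout below is a guess at what such a struct would contain, and the 64-byte cache line is common but not universal. On typical platforms where int is 4 bytes and 4-byte aligned, sizeof(Instruction) comes out at 8.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical one-struct-per-instruction layout from the paragraph above.
struct Instruction {
    uint8_t opcode;
    uint8_t operand;
    int     line;
};

// Assumed cache line size in bytes.
constexpr std::size_t kCacheLine = 64;

// How many whole structs, or packed bytes, fit into one cache line.
constexpr std::size_t kStructsPerLine = kCacheLine / sizeof(Instruction);
constexpr std::size_t kBytesPerLine   = kCacheLine / sizeof(uint8_t);

static_assert(sizeof(Instruction) >= 2 * sizeof(uint8_t) + sizeof(int),
              "padding can only add to the struct's size");
static_assert(kStructsPerLine < kBytesPerLine,
              "the packed stream fits more entries per cache line");
```

The packed layout still pays for lines, but that vector lives apart from code and is only read when an error is reported or the chunk is disassembled, so it does not compete for cache space in the dispatch loop.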

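Production VMs make the same choice. HotSpot interprets the JVM's packed, variable-length bytecode. V8's Ignition also uses a byte stream: one-byte opcodes followed by operands, with prefix bytes that widen the operands, and bytecode handlers generated through TurboFan's backend. Neither uses one struct per instruction.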

addConstant and Deduplication

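uint8_t addConstant(const Value& v) {
    // Reuse the existing slot when an equal constant is already pooled.
    for (size_t i = 0; i < constants.size(); ++i)
        if (valuesEqual(constants[i], v)) return static_cast<uint8_t>(i);
    // Pool full: a 1-byte operand cannot address slot 256 (see Overflow).
    if (constants.size() >= 256) return 255;
    constants.push_back(v);
    return static_cast<uint8_t>(constants.size() - 1);
}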

Each call is an O(n) scan, so compiling a chunk with n constants costs O(n²) in the worst case. That is acceptable because n is small (typically fewer than 50 constants per script). For real workloads you'd hash; we keep the linear scan for clarity and zero dependencies.
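For comparison, a hashed pool might look like the sketch below. It is not part of cp-06: HashedPool is a hypothetical name, and Value is replaced by a std::variant stand-in so that the standard library's std::hash and operator== can be used directly. Overflow handling is left out to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <variant>
#include <vector>

// Hypothetical stand-in for the tutorial's Value: a number or a string.
using Value = std::variant<double, std::string>;

struct HashedPool {
    std::vector<Value> constants;
    std::unordered_map<Value, std::size_t> index;  // value -> pool slot

    // Expected O(1) per call instead of a linear scan over the pool.
    std::size_t addConstant(const Value& v) {
        // try_emplace inserts only when v is absent; otherwise it returns
        // the existing entry and leaves the stored slot untouched.
        auto [it, inserted] = index.try_emplace(v, constants.size());
        if (inserted) constants.push_back(v);
        assert(constants.size() == index.size());
        return it->second;
    }
};
```

The trade-off is a second container that must stay in sync with the pool, which is the extra moving part the linear scan avoids.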

valuesEqual is structural: same kind, same payload. For strings this is == on the contained std::string. For functions (cp-07) we'll compare by pointer identity since two fn () {} declarations are different closures even with identical source.
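A structural comparison of that shape could be sketched as follows. Only Value::s is named in the text; Kind, the n and fn fields, Function, and the three factory helpers are hypothetical and exist purely to make the sketch self-contained.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical shapes for illustration; only Value::s appears in the text.
enum class Kind { Number, String, Function };
struct Function { std::string name; };

struct Value {
    Kind            kind = Kind::Number;
    double          n    = 0.0;      // payload when kind == Number
    std::string     s;               // payload when kind == String
    const Function* fn   = nullptr;  // payload when kind == Function
};

inline Value number(double d)            { Value v; v.kind = Kind::Number;   v.n = d;             return v; }
inline Value text(std::string str)       { Value v; v.kind = Kind::String;   v.s = std::move(str); return v; }
inline Value fnValue(const Function* f)  { Value v; v.kind = Kind::Function; v.fn = f;            return v; }

// Same kind, same payload. Functions compare by address, so two distinct
// Function objects are never equal even when their contents match.
inline bool valuesEqual(const Value& a, const Value& b) {
    if (a.kind != b.kind) return false;
    switch (a.kind) {
        case Kind::Number:   return a.n == b.n;
        case Kind::String:   return a.s == b.s;
        case Kind::Function: return a.fn == b.fn;
    }
    assert(false && "unhandled Kind");
    return false;
}
```

Checking the kind first means a number never compares equal to a string that happens to spell the same digits.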

Overflow

If the pool already holds 256 entries (constants.size() >= 256), addConstant returns 255 and the compiler emits a diagnostic ("too many constants in chunk"). cp-07 introduces OpConstantLong with a 3-byte (24-bit) operand, which raises the limit to 2^24 (about 16.7 million) entries.
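A 24-bit operand can be packed into three bytes and read back as sketched below. emitIndex24 and readIndex24 are hypothetical helper names, and the little-endian byte order is an assumption; the text does not say which order cp-07 uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends a 24-bit index as three bytes, least significant byte first
// (assumed order).
inline void emitIndex24(std::vector<uint8_t>& code, uint32_t ix) {
    assert(ix < (1u << 24) && "index does not fit in 24 bits");
    code.push_back(static_cast<uint8_t>(ix & 0xFF));
    code.push_back(static_cast<uint8_t>((ix >> 8) & 0xFF));
    code.push_back(static_cast<uint8_t>((ix >> 16) & 0xFF));
}

// Reassembles the index from the three bytes starting at offset `at`.
inline uint32_t readIndex24(const std::vector<uint8_t>& code, std::size_t at) {
    assert(at + 3 <= code.size());
    return static_cast<uint32_t>(code[at])
         | static_cast<uint32_t>(code[at + 1]) << 8
         | static_cast<uint32_t>(code[at + 2]) << 16;
}
```

The largest index that round-trips is 2^24 - 1 = 16777215, which is where the 16M figure comes from.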

Lifetimes

The Chunk owns its constants by value. Strings are std::strings on the heap inside Value::s. cp-07 will introduce a GC for runtime-allocated strings (such as string concatenation results), but constant strings live for the chunk's lifetime. They are effectively read-only and could be interned across chunks in a future optimisation pass.

Self-Check

  • Why three separate vectors and not one array of structs?
  • How would you change Chunk to support more than 256 constants?
  • What invariant must hold between code.size() and lines.size()?