Step 2 — The Chunk
A chunk is a self-contained compiled unit: every byte the VM will execute, every constant those bytes refer to, and the source-line metadata for diagnostics. In cp-06 there is exactly one chunk — the top-level script. cp-07 will add a chunk-per-function model.
See chunk.hpp.
Data Layout
struct Chunk {
    std::vector<uint8_t> code;       // flat byte stream
    std::vector<Value>   constants;  // referenced by 1-byte index
    std::vector<int>     lines;      // lines[i] = source line for code[i]
    std::string          name = "<script>";
};
Three parallel structures:
code
The byte stream. Opcode and operand bytes are mixed: e.g., [CONSTANT, 0, ADD] is three bytes, two instructions. The VM advances ip by 1 for opcode then by N more for operands.
constants
A pool of Values. Literals in the source (42, "hello") are interned here. The compiler emits OpConstant ix where ix is the pool index. Up to 256 entries (1-byte index). Deduplication is by value-equality so print 1; print 1; uses one slot.
lines
Parallel array: lines[i] is the source line of byte code[i]. When the VM throws a runtime error, it consults lines[ip-1] (ip has already advanced past the byte it just read) to print "RuntimeError at line 17". The disassembler prints | in place of a repeated line number, so consecutive bytes on the same line read as a group.
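The lockstep between code and lines can be sketched as below. This is a minimal illustration, not the contents of chunk.hpp: MiniChunk, write, lineAt, and demoChunk are hypothetical names, and the opcode values are placeholders.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Placeholder opcode values for illustration only.
enum : uint8_t { OpConstant = 0, OpAdd = 1 };

struct MiniChunk {
    std::vector<uint8_t> code;
    std::vector<int> lines;

    // Hypothetical helper: append one byte and its source line together,
    // so code.size() == lines.size() holds after every call.
    void write(uint8_t byte, int line) {
        code.push_back(byte);
        lines.push_back(line);
    }

    // Source line of the byte the VM has just consumed; ip already points
    // one past that byte, hence the -1. Requires ip >= 1.
    int lineAt(std::size_t ip) const { return lines[ip - 1]; }
};

// Builds [CONSTANT, 0, ADD], with ADD placed on the following source line.
MiniChunk demoChunk() {
    MiniChunk c;
    c.write(OpConstant, 17);  // opcode
    c.write(0, 17);           // operand: constant pool index 0
    c.write(OpAdd, 18);       // opcode on the next line
    return c;
}
```

Routing every append through a single helper like this is one way to keep the two vectors the same length by construction rather than by convention.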
Why Parallel Vectors?
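The alternative is a vector of Instruction { opcode; operand; line; } structs. That would be simpler to decode, since every instruction sits at a fixed stride, but each struct is 8+ bytes, whereas the packed stream spends one byte per opcode or operand. For a typical chunk (hundreds to thousands of bytes), the byte stream fits more instructions per cache line. Line numbers are only read when an error is reported or the chunk is disassembled, so keeping them in a separate vector keeps them out of the interpreter's hot path.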
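The size difference can be checked with a small sketch. The Instruction struct below is a hypothetical rendering of the one-struct-per-instruction layout described above; its exact size depends on the platform's int size and alignment.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical one-struct-per-instruction layout.
struct Instruction {
    uint8_t opcode;
    uint8_t operand;
    int     line;
};

// Storage for the two instructions CONSTANT 0; ADD in each layout.
constexpr std::size_t structBytes = 2 * sizeof(Instruction);
constexpr std::size_t packedBytes = 3;  // [CONSTANT, 0, ADD]

static_assert(packedBytes < structBytes,
              "the packed stream is smaller than the struct layout");
```

On a platform where int is 4 bytes with 4-byte alignment, sizeof(Instruction) is 8, so a 64-byte cache line holds 8 struct instructions but 64 bytes of packed stream.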
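Real VMs make the same choice: HotSpot interprets the JVM's packed bytecode (one-byte opcodes followed by operand bytes); V8's Ignition also uses a variable-length byte stream, with prefix bytecodes that widen operands, and dispatches through a table of handlers generated with TurboFan's backend. Neither uses one-struct-per-instruction in production.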
addConstant and Deduplication
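uint8_t addConstant(const Value& v) {
    // Reuse an existing slot when an equal constant is already pooled.
    for (size_t i = 0; i < constants.size(); ++i)
        if (valuesEqual(constants[i], v)) return static_cast<uint8_t>(i);
    // Pool full: return 255 and leave the diagnostic to the compiler (see Overflow).
    if (constants.size() >= 256) return 255;
    constants.push_back(v);
    return static_cast<uint8_t>(constants.size() - 1);
}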
Each call is a linear scan, so adding n constants costs O(n²) over a chunk's compilation, but n is small (typically fewer than 50 constants per script). For real workloads you'd hash; we keep the linear scan for clarity and zero dependencies.
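For comparison, a hashed pool might look like the sketch below. It is not part of the checkpoint: SketchValue is a stand-in for the real Value type (here just a number or a string), HashedPool is a hypothetical name, and the 256-entry limit is deliberately not handled. It needs C++17.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <variant>
#include <vector>

// Stand-in for the real Value type: a number or a string.
using SketchValue = std::variant<double, std::string>;

struct HashedPool {
    std::vector<SketchValue> constants;
    std::unordered_map<SketchValue, std::size_t> index;  // value -> slot

    // Returns the slot for v, appending it only if no equal constant
    // is pooled yet. Average O(1) per call instead of O(n).
    std::size_t add(const SketchValue& v) {
        auto [it, inserted] = index.try_emplace(v, constants.size());
        if (inserted) constants.push_back(v);
        return it->second;
    }
};
```

The trade-off is a second container that must stay in sync with constants, which is part of why the linear scan is a reasonable choice at this scale.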
valuesEqual is structural: same kind, same payload. For strings this is == on the contained std::string. For functions (cp-07) we'll compare by pointer identity since two fn () {} declarations are different closures even with identical source.
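A structural comparison along these lines could be sketched as follows. Only the member name s comes from the text; the TaggedValue type, its Kind enum, the other fields, and the two constructor helpers are assumptions made for illustration.

```cpp
#include <string>
#include <utility>

// Hypothetical tagged value; the real Value may be laid out differently.
struct TaggedValue {
    enum class Kind { Nil, Bool, Number, String };
    Kind kind = Kind::Nil;
    bool b = false;
    double n = 0.0;
    std::string s;
};

// Illustrative constructors for the two literal kinds used in the text.
TaggedValue number(double n) {
    TaggedValue v;
    v.kind = TaggedValue::Kind::Number;
    v.n = n;
    return v;
}

TaggedValue str(std::string s) {
    TaggedValue v;
    v.kind = TaggedValue::Kind::String;
    v.s = std::move(s);
    return v;
}

// Same kind, same payload; values of different kinds are never equal.
bool valuesEqual(const TaggedValue& a, const TaggedValue& b) {
    if (a.kind != b.kind) return false;
    switch (a.kind) {
        case TaggedValue::Kind::Nil:    return true;
        case TaggedValue::Kind::Bool:   return a.b == b.b;
        case TaggedValue::Kind::Number: return a.n == b.n;
        case TaggedValue::Kind::String: return a.s == b.s;
    }
    return false;
}
```

Under this sketch the number 1 and the string "1" occupy separate pool slots, because the kind check fails before any payload is compared.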
Overflow
If a new constant arrives when constants.size() >= 256, addConstant returns 255 and the compiler emits a diagnostic ("too many constants in chunk"). cp-07 introduces OpConstantLong with a 3-byte (24-bit) operand, raising the limit to 2^24 (about 16.7 million) constants.
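A 24-bit operand can be packed into three bytes as sketched below. The function names are hypothetical, and the little-endian byte order is an assumption; the text does not say which order cp-07 uses.

```cpp
#include <array>
#include <cstdint>

// Split a constant index into three operand bytes, lowest byte first.
// Byte order is an assumption. Only the low 24 bits of ix are kept.
std::array<uint8_t, 3> encodeIndex24(uint32_t ix) {
    return {static_cast<uint8_t>(ix & 0xFF),
            static_cast<uint8_t>((ix >> 8) & 0xFF),
            static_cast<uint8_t>((ix >> 16) & 0xFF)};
}

// Reassemble the index from the three bytes starting at p.
uint32_t decodeIndex24(const uint8_t* p) {
    return static_cast<uint32_t>(p[0]) |
           (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16);
}
```

With this encoding, index 300 becomes the bytes 44, 1, 0, and the largest representable index is 16,777,215.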
Lifetimes
The Chunk owns its constants by value. Strings are std::strings on the heap inside Value::s. cp-07 will introduce a GC for runtime-allocated strings (string concat results) but constant strings live for the chunk's lifetime — they're effectively read-only and could be interned across chunks in a future optimisation pass.
Self-Check
- Why three parallel arrays and not one array of structs?
- How would you change Chunk to support more than 256 constants?
- What invariant must hold between code.size() and lines.size()?