Step 05 · Roots and safepoints
GC needs to know which Values are live. The mark phase starts from the root set; anything not reachable from a root is garbage.
Our root set is explicit:
class Heap {
std::vector<Value*> roots_;
void pushRoot(Value* slot);
void popRoot();
};
class RootScope {
RootScope(Heap& h, Value* slot) { h.pushRoot(slot); }
~RootScope() { h.popRoot(); }
};
Callers root every local that could hold an object before any allocation:
Value greet = makeString(h, "hello, ");
RootScope rg(h, &greet); // greet is now safe across GC
Value who = makeString(h, "world"); // GC may run here
RootScope rw(h, &who);
Value full = add(greet, who, h);
This is awkward and easy to forget. Production runtimes do better:
Conservative stack scanning
Scan every byte of the C stack as if each pointer-aligned word might be a pointer; if it points into a known heap object, treat it as a root. The Boehm GC works this way.
- Pros: no rooting boilerplate; works with arbitrary C/C++.
- Cons: false positives keep dead objects alive; can't relocate objects (so no copying / compacting GC).
Precise stack maps
The compiler emits, per call site, a map of which stack slots and registers contain pointers. The GC walks the stack frame by frame, asks the map "what's live here?", and traces those slots.
- Pros: precise, supports relocating GC.
- Cons: codegen complexity, every safepoint inhibits some optimisations.
Safepoints
A safepoint is a point in code where the runtime guarantees the stack is in a known state — typically function entry and loop backedges. The compiler inserts a poll:
if (gc_request) suspend();
When the GC wants to run, it sets gc_request, then waits for all
mutator threads to reach a safepoint. This bounds GC latency to
roughly the loop-iteration time.
cp-14 has only one thread and no JIT integration, so we just call
collect() from alloc synchronously — the entire VM is a "GC
safepoint" by construction. cp-17's capstone wires precise stack maps
into the cp-12 JIT.