Step 04 · Mark-sweep GC
Algorithm (in heap.cpp):
collect():
for each root slot:
mark(*slot)
sweep:
for each object in object table:
if marked: clear mark
else: unlink + free
mark is the classic tri-colour (here: just two-colour) traversal:
void markObject(Object* o) {
if (!o || o->marked) return; // already grey/black
o->marked = 1; // turn black
if (o->kind == ObjKind::Array)
for (each elem) mark(elem); // grey-ify children
}
Recursion-depth concern: a long array chain can blow the C stack. In
production, switch to an explicit work queue (std::vector<Object*>)
and pop until empty. Not needed for our tests but worth knowing.
Why mark-sweep?
Pros:
- Simplest correct GC. Trivial to debug — print the object table before and after, diff the survivors.
- Doesn't move objects. Pointers from the C/C++ host remain valid
across collections. This matters when a JIT'd function takes a raw
char*from a string. - Tolerates ambiguous roots. Even if a root might be a pointer (conservative scanning), false positives just keep objects alive a cycle longer.
Cons:
- Fragmentation. Repeated allocate/free leaves holes that the bump allocator can't reuse. We'd need a free list (next step upward) or compaction.
- Pause time scales with live set + dead set. Sweep is O(total objects), even if very few survived.
- Cache-unfriendly object table walks. Each pointer chase costs a miss.
Triggering policy
maybeCollect() runs on every allocation:
if (allocatedBytes_ >= gcThreshold_) collect();
// after collect:
if (allocatedBytes_ * 2 > gcThreshold_) gcThreshold_ = allocatedBytes_ * 2;
Doubling the threshold based on the post-collect live set is a classic way to keep amortised GC cost bounded. If 1 KiB survives, we let the heap grow to 2 KiB before collecting again — guaranteeing O(live) work per allocated byte.