Step 04 · Mark-sweep GC

Algorithm (in heap.cpp):

collect():
    for each root slot:
        mark(*slot)
    sweep:
        for each object in object table:
            if marked: clear mark
            else: unlink + free

mark is the classic tri-colour (here: just two-colour) traversal:

void markObject(Object* o) {
    if (!o || o->marked) return;     // already grey/black
    o->marked = 1;                   // turn black
    if (o->kind == ObjKind::Array)
        for (each elem) mark(elem);  // grey-ify children
}

Recursion-depth concern: a long array chain can blow the C stack. In production, switch to an explicit work queue (std::vector<Object*>) and pop until empty. Not needed for our tests but worth knowing.

Why mark-sweep?

Pros:

  • Simplest correct GC. Trivial to debug — print the object table before and after, diff the survivors.
  • Doesn't move objects. Pointers from the C/C++ host remain valid across collections. This matters when a JIT'd function takes a raw char* from a string.
  • Tolerates ambiguous roots. Even if a root might be a pointer (conservative scanning), false positives just keep objects alive a cycle longer.

Cons:

  • Fragmentation. Repeated allocate/free leaves holes that the bump allocator can't reuse. We'd need a free list (next step upward) or compaction.
  • Pause time scales with live set + dead set. Sweep is O(total objects), even if very few survived.
  • Cache-unfriendly object table walks. Each pointer chase costs a miss.

Triggering policy

maybeCollect() runs on every allocation:

if (allocatedBytes_ >= gcThreshold_) collect();
// after collect:
if (allocatedBytes_ * 2 > gcThreshold_) gcThreshold_ = allocatedBytes_ * 2;

Doubling the threshold based on the post-collect live set is a classic way to keep amortised GC cost bounded. If 1 KiB survives, we let the heap grow to 2 KiB before collecting again — guaranteeing O(live) work per allocated byte.