07 — Where to Grow

cp-17 is a complete dynamic-language JIT in maybe 800 lines. It demonstrates the architecture; it doesn't demonstrate the optimisations that make JITs worth their complexity. Here's the roadmap if you wanted to grow this into a real VM.

1. Interpreter tier

Most production JITs don't compile first. They interpret bytecode until a function gets hot, then compile it. Why? Because compilation is expensive and much code runs only once or a handful of times. cp-17 currently JITs everything immediately, paying the LLVM compile cost even for print 42.

Add: a bytecode design (stack-based, small), an interpreter loop in C++, per-function call counters, a threshold (say 1000 calls), and a queue of "functions to compile". The JIT becomes the second tier, not the first.
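The counter, threshold, and queue pieces of that list can be sketched in a few lines. This is a minimal illustration, not cp-17 code: TieringPolicy, FunctionProfile, recordCall, and nextToCompile are hypothetical names, and functions are identified by name only to keep the example short.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>
#include <unordered_map>

// Hypothetical per-function profile; none of these names come from cp-17.
struct FunctionProfile {
    std::uint32_t calls = 0;  // bumped by the interpreter on every call
    bool queued = false;      // set once the function has been handed to the JIT
};

class TieringPolicy {
public:
    explicit TieringPolicy(std::uint32_t threshold) : threshold_(threshold) {
        assert(threshold > 0);
    }

    // Called by the interpreter at function entry.
    // Returns true exactly once: on the call that crosses the threshold.
    bool recordCall(const std::string& name) {
        FunctionProfile& p = profiles_[name];
        ++p.calls;
        if (!p.queued && p.calls >= threshold_) {
            p.queued = true;
            compileQueue_.push_back(name);
            return true;
        }
        return false;
    }

    // The JIT tier drains this queue; returns false when nothing is pending.
    bool nextToCompile(std::string& out) {
        if (compileQueue_.empty()) return false;
        out = compileQueue_.front();
        compileQueue_.pop_front();
        return true;
    }

    std::uint32_t callCount(const std::string& name) const {
        auto it = profiles_.find(name);
        return it == profiles_.end() ? 0 : it->second.calls;
    }

private:
    std::uint32_t threshold_;
    std::unordered_map<std::string, FunctionProfile> profiles_;
    std::deque<std::string> compileQueue_;
};
```

The interpreter would call recordCall on every function entry and keep interpreting regardless of the result; a separate step pops names from the queue and hands them to the existing LLVM pipeline.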

2. Inline caches

Right now every call is compiled to the same generic IR, with no specialisation. The hook for change is there (ml_record_int_arg proves you can collect type observations), but the IR doesn't use them.

Add: a CallSite struct keyed by (function, bytecode offset). On each call, write the receiver type into a small slot. On recompile, generate code that checks the cached type with a single compare-and-branch ("guard") and then proceeds along the fast path. On guard failure, fall back to the generic dispatch.

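That single mechanism (a type guard plus a cached fast path) accounts for much of the speed of JITs like V8 and LuaJIT; LuaJIT places the same kind of guard inside recorded traces rather than at per-call-site caches.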
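To make the guard-plus-fast-path idea concrete, here is a small sketch written as plain C++ rather than generated IR. It assumes a hypothetical tagged Value and a CallSite that caches one type; cp-17's real value layout and the shape of its recorded observations may differ. Integer overflow is ignored for brevity.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type tags and value layout, for illustration only.
enum class TypeTag : std::uint8_t { None, Int, Double };

struct Value {
    TypeTag tag;
    union { std::int64_t i; double d; };
};

inline Value makeInt(std::int64_t v) { Value x; x.tag = TypeTag::Int; x.i = v; return x; }
inline Value makeDouble(double v) { Value x; x.tag = TypeTag::Double; x.d = v; return x; }

// One cache per call site; a real VM would key it by (function, bytecode offset).
struct CallSite {
    TypeTag cached = TypeTag::None;
    std::uint32_t hits = 0;
    std::uint32_t misses = 0;
};

// Generic slow path: handles every supported type combination.
inline Value addGeneric(const Value& a, const Value& b) {
    assert(a.tag != TypeTag::None && b.tag != TypeTag::None);
    if (a.tag == TypeTag::Int && b.tag == TypeTag::Int) return makeInt(a.i + b.i);
    double x = (a.tag == TypeTag::Int) ? static_cast<double>(a.i) : a.d;
    double y = (b.tag == TypeTag::Int) ? static_cast<double>(b.i) : b.d;
    return makeDouble(x + y);
}

inline Value addWithCache(CallSite& site, const Value& a, const Value& b) {
    // Guard: compare the operand tags against the cached type.
    if (site.cached == TypeTag::Int && a.tag == TypeTag::Int && b.tag == TypeTag::Int) {
        ++site.hits;
        return makeInt(a.i + b.i);  // fast path, no generic dispatch
    }
    // Guard failed: record what was seen, then fall back to the generic path.
    ++site.misses;
    site.cached = (a.tag == b.tag) ? a.tag : TypeTag::None;
    return addGeneric(a, b);
}
```

In a JIT the guard would be emitted as a compare-and-branch in the recompiled function, and only the Int fast path is specialised here; the counters exist so the effect of the cache can be observed.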

3. Deoptimisation / OSR

Once you have guards, you need to handle guard failures. The optimised frame is laid out differently from the interpreter's stack; on a bailout you must reconstruct the interpreter state from the optimised frame's registers and spills, then resume in the interpreter.

This is on-stack replacement (OSR) in the deopt direction. The OSR-in direction (interpreter → JIT, mid-loop) is also useful: detect a hot loop, JIT it, patch the interpreter to jump into the JIT'd loop with current state.

Both are hard. Both require precise side-tables emitted by the JIT describing how every interpreter value maps to a JIT location at every deopt point.
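A side-table of this kind can be pictured as one record per deopt point, listing where each interpreter slot currently lives. The sketch below is a simplified model under stated assumptions: all values are 64-bit integers, the optimised frame is a snapshot of registers and spill slots, and every name (Location, DeoptRecord, MachineFrame, materialise) is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Where the optimised code keeps one interpreter value at a deopt point.
enum class LocKind : std::uint8_t { Register, StackSlot, Constant };

struct Location {
    LocKind kind;
    std::int64_t payload;  // register number, spill-slot index, or the constant itself
};

// One record per deopt point, emitted by the JIT alongside the machine code.
struct DeoptRecord {
    std::uint32_t bytecodeOffset;  // where the interpreter resumes
    std::vector<Location> slots;   // index = interpreter slot number
};

// Snapshot of the optimised frame taken at the bailout.
struct MachineFrame {
    std::vector<std::int64_t> registers;
    std::vector<std::int64_t> spills;
};

struct InterpreterFrame {
    std::uint32_t pc;
    std::vector<std::int64_t> slots;
};

// Rebuild the interpreter's view of the frame from the optimised one.
inline InterpreterFrame materialise(const DeoptRecord& rec, const MachineFrame& mf) {
    InterpreterFrame out;
    out.pc = rec.bytecodeOffset;
    for (const Location& loc : rec.slots) {
        const auto idx = static_cast<std::size_t>(loc.payload);
        switch (loc.kind) {
        case LocKind::Register:
            assert(loc.payload >= 0 && idx < mf.registers.size());
            out.slots.push_back(mf.registers[idx]);
            break;
        case LocKind::StackSlot:
            assert(loc.payload >= 0 && idx < mf.spills.size());
            out.slots.push_back(mf.spills[idx]);
            break;
        case LocKind::Constant:
            out.slots.push_back(loc.payload);  // value was folded away by the optimiser
            break;
        }
    }
    return out;
}
```

The Constant case is why the table has to be precise: an optimiser may delete a variable entirely, and the record is then the only place its value still exists.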

4. Hidden classes for objects

cp-17 has no objects. When you add them: every object header should point to a shape (V8 calls them "maps", JavaScriptCore "structures") that describes its layout. Two objects with the same key sequence share a shape; adding a key transitions to a new shape, recording the transition.

Why? Because inline caches key on shape, not on dynamic type. A property lookup becomes "load shape pointer; compare to cached shape; if match, load at cached offset; else miss". This is the single most important optimisation for dynamic-OO languages and it falls out naturally from the inline-cache infrastructure above.
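The shape-transition tree and the shape-keyed lookup can be sketched together. This is an illustrative model only: Shape, Object, PropertyCache, and getProperty are hypothetical names, property values are plain long, and the cache is monomorphic (one shape per site).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct Shape {
    std::map<std::string, std::size_t> offsets;                 // key -> slot index
    std::map<std::string, std::unique_ptr<Shape>> transitions;  // added key -> next shape

    // Returns the shape reached by adding `key`, creating it on first use.
    Shape* addKey(const std::string& key) {
        auto it = transitions.find(key);
        if (it != transitions.end()) return it->second.get();
        auto next = std::make_unique<Shape>();
        next->offsets = offsets;
        next->offsets[key] = offsets.size();
        Shape* raw = next.get();
        transitions[key] = std::move(next);
        return raw;
    }
};

struct Object {
    Shape* shape;             // the "header" pointer described above
    std::vector<long> slots;  // values, laid out as the shape dictates

    void set(const std::string& key, long value) {
        auto it = shape->offsets.find(key);
        if (it != shape->offsets.end()) { slots[it->second] = value; return; }
        shape = shape->addKey(key);  // transition to the shape that has `key`
        slots.push_back(value);
    }
};

// Inline cache for one property-access site (the key is fixed per site).
struct PropertyCache {
    const Shape* shape = nullptr;
    std::size_t offset = 0;
    unsigned hits = 0;
    unsigned misses = 0;
};

inline long getProperty(PropertyCache& ic, const Object& obj, const std::string& key) {
    if (obj.shape == ic.shape) {  // guard: one pointer compare
        ++ic.hits;
        return obj.slots[ic.offset];
    }
    ++ic.misses;  // miss: do the full lookup and refill the cache
    auto it = obj.shape->offsets.find(key);
    assert(it != obj.shape->offsets.end());
    ic.shape = obj.shape;
    ic.offset = it->second;
    return obj.slots[it->second];
}
```

Two objects that add x then y end up pointing at the same Shape, so a cache filled by the first object hits for the second. An object that adds only y takes a different transition and misses.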

5. Garbage collection

The runtime currently leaks every string global into the JIT's data section. A real VM needs:

A heap with allocation, marking, and reclamation.
GC roots identified in optimised frames (more side-tables from the JIT).
Write barriers for generational GC (yet another runtime symbol the JIT must inject around every store-to-heap).

cp-14 (Runtime Systems) showed a tagged-value layout; this is where you'd plug a real collector under it.
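As a rough picture of the first two requirements, here is a minimal, non-generational mark-and-sweep heap. It is a sketch under simplifying assumptions: objects hold only child pointers, roots are passed in explicitly instead of being discovered from JIT side-tables, and write barriers are left out entirely. HeapObject and Heap are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct HeapObject {
    bool marked = false;
    std::vector<HeapObject*> children;  // outgoing references
};

class Heap {
public:
    HeapObject* allocate() {
        objects_.push_back(std::make_unique<HeapObject>());
        return objects_.back().get();
    }

    // Marks everything reachable from `roots`, frees the rest,
    // and returns the number of objects reclaimed.
    std::size_t collect(const std::vector<HeapObject*>& roots) {
        // Mark phase: iterative traversal with an explicit worklist.
        std::vector<HeapObject*> worklist(roots.begin(), roots.end());
        while (!worklist.empty()) {
            HeapObject* obj = worklist.back();
            worklist.pop_back();
            assert(obj != nullptr && "roots and children must be non-null");
            if (obj->marked) continue;
            obj->marked = true;
            for (HeapObject* child : obj->children) worklist.push_back(child);
        }
        // Sweep phase: keep marked objects (clearing the bit), drop the others.
        const std::size_t before = objects_.size();
        std::vector<std::unique_ptr<HeapObject>> survivors;
        for (auto& obj : objects_) {
            if (obj->marked) {
                obj->marked = false;
                survivors.push_back(std::move(obj));
            }
        }
        objects_ = std::move(survivors);
        return before - objects_.size();
    }

    std::size_t liveCount() const { return objects_.size(); }

private:
    std::vector<std::unique_ptr<HeapObject>> objects_;
};
```

Because reachability is computed from the roots, an unreachable cycle is reclaimed like any other garbage. The hard part in a JIT setting is producing the roots vector from optimised frames.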

6. Concurrency

ORC is designed for multi-threaded compilation: a ThreadSafeModule pairs each module with a lockable context so that compile threads never share an unguarded LLVMContext, and LLJIT can be configured to compile on worker threads. A real VM compiles on background threads while the main thread keeps interpreting, then atomically swaps the function entry pointer when the JIT result is ready.
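The entry-pointer swap can be illustrated without LLVM at all. In this sketch the "interpreted" and "compiled" versions are ordinary C++ functions standing in for the two tiers, and FunctionEntry, publish, and startBackgroundCompile are hypothetical names; a real VM would publish the address returned by the JIT's symbol lookup.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

using EntryFn = long (*)(long);

// Stand-ins for the two tiers; both compute x * x for non-negative x.
inline long interpretedSquare(long x) {
    long r = 0;
    for (long i = 0; i < x; ++i) r += x;  // deliberately slow
    return r;
}
inline long compiledSquare(long x) { return x * x; }

struct FunctionEntry {
    // Every call goes through this pointer, so swapping it retargets all callers.
    std::atomic<EntryFn> entry{&interpretedSquare};

    long call(long x) const {
        return entry.load(std::memory_order_acquire)(x);
    }

    // Called by the compile thread once the optimised code is ready.
    void publish(EntryFn compiled) {
        assert(compiled != nullptr);
        entry.store(compiled, std::memory_order_release);
    }
};

// Simulates a background compile: the worker "compiles", then publishes.
inline std::thread startBackgroundCompile(FunctionEntry& fn) {
    return std::thread([&fn] { fn.publish(&compiledSquare); });
}
```

The main thread can keep calling fn.call while the worker runs; each call sees either the old or the new entry point, never a torn pointer. The release/acquire pair is what would make the freshly written machine code visible before the pointer to it is.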

Where this lab leaves you

Concretely, after cp-17 you should be able to:

Build an llvm::Module from an AST with IRBuilder (no text).
Wire a runtime function into JIT'd code via ORC's symbol API.
Emit a runtime callback at any IR point and read its results from C++.
Diagnose JIT bugs with verifyModule before they crash.

These are the muscles. The rest — IC, deopt, hidden classes, GC — are combinations of them.
