07 — Where to Grow
cp-17 is a complete dynamic-language JIT in maybe 800 lines. It demonstrates the architecture; it doesn't demonstrate the optimisations that make JITs worth their complexity. Here's the roadmap if you wanted to grow this into a real VM.
1. Interpreter tier
Real JITs don't JIT first. They interpret bytecode until a function gets
warm, then JIT. Why? Because compilation is expensive and most code runs
once. cp-17 currently JITs everything immediately, paying the LLVM
compile cost even for print 42.
Add: a bytecode design (stack-based, small), an interpreter loop in C++, per-function call counters, a threshold (say 1000 calls), and a queue of "functions to compile". The JIT becomes the second tier, not the first.
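The counter, threshold, and compile queue could be sketched as follows. This is a minimal illustration, not cp-17 code: TierManager, FunctionState, recordCall, and popHot are hypothetical names, and functions are identified by name only to keep the example short.

```cpp
#include <cstdint>
#include <deque>
#include <string>
#include <unordered_map>

// Hypothetical bookkeeping for a two-tier VM; none of these names exist in cp-17.
enum class Tier { Interpreted, Queued, Compiled };

struct FunctionState {
    std::uint32_t calls = 0;
    Tier tier = Tier::Interpreted;
};

class TierManager {
public:
    explicit TierManager(std::uint32_t threshold) : threshold_(threshold) {}

    // The interpreter calls this on every function entry.
    // Returns true exactly once per function: on the call that reaches the threshold.
    bool recordCall(const std::string& name) {
        FunctionState& fn = functions_[name];
        if (fn.tier != Tier::Interpreted) return false;  // already queued or compiled
        if (++fn.calls < threshold_) return false;       // still cold
        fn.tier = Tier::Queued;
        compileQueue_.push_back(name);
        return true;
    }

    // The JIT tier drains the queue; returns false when nothing is waiting.
    bool popHot(std::string& out) {
        if (compileQueue_.empty()) return false;
        out = compileQueue_.front();
        compileQueue_.pop_front();
        return true;
    }

    void markCompiled(const std::string& name) { functions_[name].tier = Tier::Compiled; }

    Tier tierOf(const std::string& name) const {
        auto it = functions_.find(name);
        return it == functions_.end() ? Tier::Interpreted : it->second.tier;
    }

private:
    std::uint32_t threshold_;
    std::unordered_map<std::string, FunctionState> functions_;
    std::deque<std::string> compileQueue_;
};
```

With a threshold of 1000, the first 999 calls to a function only bump a counter; call 1000 queues it, and the interpreter keeps running it until the JIT calls markCompiled.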
2. Inline caches
Right now every method call goes through full IR with no specialisation.
The hook for change is there — ml_record_int_arg proves you can collect
type observations — but the IR doesn't use them.
Add: a CallSite struct keyed by (function, bytecode offset). On each
call, write the receiver type into a small slot. On recompile, generate
code that checks the cached type with a single compare-and-branch
("guard") and then proceeds along the fast path. On guard failure, fall
back to the generic dispatch.
That single mechanism, a type guard plus a cached fast path, is a large part of what makes V8 and LuaJIT fast.
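Here is a rough C++ model of a monomorphic inline cache, written as ordinary code rather than generated IR. It assumes a receiver type is a TypeInfo with a name-to-function method table; TypeInfo, CallSite, InlineCaches, and dispatch are all hypothetical names for illustration. The pointer comparison plays the role of the guard, and the map lookup stands in for generic dispatch.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>

using Method = std::int64_t (*)(std::int64_t);

// Hypothetical runtime type: a method table looked up by name (the slow path).
struct TypeInfo {
    std::unordered_map<std::string, Method> methods;
};

// One cache slot per call site.
struct CallSite {
    const TypeInfo* cachedType = nullptr;
    Method cachedTarget = nullptr;
    std::uint32_t hits = 0;
    std::uint32_t misses = 0;
};

// Call sites keyed by (function id, bytecode offset).
class InlineCaches {
public:
    CallSite& at(std::uint32_t functionId, std::uint32_t bytecodeOffset) {
        return sites_[{functionId, bytecodeOffset}];
    }

private:
    std::map<std::pair<std::uint32_t, std::uint32_t>, CallSite> sites_;
};

inline std::int64_t dispatch(CallSite& site, const TypeInfo& receiverType,
                             const std::string& method, std::int64_t arg) {
    // Guard: one pointer compare. On a match, jump straight to the cached target.
    if (site.cachedTarget != nullptr && site.cachedType == &receiverType) {
        ++site.hits;
        return site.cachedTarget(arg);
    }
    // Guard failure: generic lookup (throws std::out_of_range if the method
    // is missing), then refill the cache for the next call.
    ++site.misses;
    Method target = receiverType.methods.at(method);
    site.cachedType = &receiverType;
    site.cachedTarget = target;
    return target(arg);
}
```

In a JIT the guard and the fast path would be emitted inline at the call site; the hit and miss counters here only make the behaviour observable.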
3. Deoptimisation / OSR
Once you have guards, you need to handle guard failures. The optimised frame is laid out differently from the interpreter's stack; on a bailout you must reconstruct the interpreter state from the optimised frame's registers and spills, then resume in the interpreter.
This is on-stack replacement (OSR) in the deopt direction. The OSR-in direction (interpreter → JIT, mid-loop) is also useful: detect a hot loop, JIT it, patch the interpreter to jump into the JIT'd loop with current state.
Both are hard. Both require precise side-tables emitted by the JIT describing how every interpreter value maps to a JIT location at every deopt point.
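To make the side-table idea concrete, here is a toy model of the deopt direction. It assumes every value fits in an int64 and that the optimised frame can be snapshotted as two arrays (registers and spill slots); Location, DeoptPoint, OptimisedFrame, InterpreterFrame, and deoptimise are hypothetical names, not anything LLVM provides.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

enum class LocKind { Register, Spill, Constant };

// Where one interpreter slot lives at a given deopt point.
struct Location {
    LocKind kind;
    std::int64_t payload;  // register number, spill index, or the constant itself
};

// One side-table entry, emitted by the JIT for each guard.
struct DeoptPoint {
    std::uint32_t bytecodeOffset;   // where the interpreter resumes
    std::vector<Location> slots;    // slots[i] locates interpreter slot i
};

// Snapshot of the optimised frame at the moment the guard failed.
struct OptimisedFrame {
    std::vector<std::int64_t> registers;
    std::vector<std::int64_t> spills;
};

struct InterpreterFrame {
    std::uint32_t pc;
    std::vector<std::int64_t> slots;
};

using DeoptTable = std::map<std::uint32_t, DeoptPoint>;  // keyed by deopt id

// Rebuild the interpreter's view of the frame from the optimised one.
// Throws std::out_of_range for an unknown id or an out-of-bounds location.
inline InterpreterFrame deoptimise(const DeoptTable& table, std::uint32_t deoptId,
                                   const OptimisedFrame& frame) {
    const DeoptPoint& point = table.at(deoptId);
    InterpreterFrame out{point.bytecodeOffset, {}};
    out.slots.reserve(point.slots.size());
    for (const Location& loc : point.slots) {
        switch (loc.kind) {
            case LocKind::Register:
                out.slots.push_back(frame.registers.at(static_cast<std::size_t>(loc.payload)));
                break;
            case LocKind::Spill:
                out.slots.push_back(frame.spills.at(static_cast<std::size_t>(loc.payload)));
                break;
            case LocKind::Constant:
                out.slots.push_back(loc.payload);  // value was folded away by the optimiser
                break;
        }
    }
    return out;
}
```

The Constant case is the reason the table has to be precise: a value the optimiser folded away has no register or spill slot at all, so only the side-table knows what it was.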
4. Hidden classes for objects
cp-17 has no objects. When you add them: every object header should point to a shape (V8 calls them "maps", JavaScriptCore "structures") that describes its layout. Two objects with the same key sequence share a shape; adding a key transitions to a new shape, recording the transition.
Why? Because inline caches key on shape, not on dynamic type. A property lookup becomes "load shape pointer; compare to cached shape; if match, load at cached offset; else miss". This is the single most important optimisation for dynamic-OO languages and it falls out naturally from the inline-cache infrastructure above.
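A small sketch of shapes, transitions, and a shape-keyed property cache, assuming objects store doubles in a flat slot vector. Shape, Object, PropertyCache, and getCached are hypothetical names; real VMs add property attributes, prototype chains, and a dictionary-mode fallback that this leaves out.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// A shape describes an object layout: which key lives in which slot.
struct Shape {
    std::map<std::string, std::size_t> offsets;
    std::map<std::string, std::unique_ptr<Shape>> transitions;  // key added -> next shape

    // Follow (or create and record) the transition for adding `key`.
    Shape* transition(const std::string& key) {
        auto it = transitions.find(key);
        if (it != transitions.end()) return it->second.get();
        auto next = std::make_unique<Shape>();
        next->offsets = offsets;
        next->offsets[key] = offsets.size();  // new key takes the next free slot
        Shape* raw = next.get();
        transitions.emplace(key, std::move(next));
        return raw;
    }
};

struct Object {
    Shape* shape;
    std::vector<double> slots;

    explicit Object(Shape* root) : shape(root) {}

    void set(const std::string& key, double value) {
        auto it = shape->offsets.find(key);
        if (it != shape->offsets.end()) { slots[it->second] = value; return; }
        shape = shape->transition(key);  // layout changes: move to the next shape
        slots.push_back(value);
    }
};

// Inline cache for one property-load site (the site's key is fixed).
struct PropertyCache {
    const Shape* shape = nullptr;
    std::size_t offset = 0;
    unsigned hits = 0;
    unsigned misses = 0;
};

// Returns false if the object has no such property.
inline bool getCached(PropertyCache& ic, const Object& obj,
                      const std::string& key, double& out) {
    if (obj.shape == ic.shape) {  // guard: compare shape pointers
        ++ic.hits;
        out = obj.slots[ic.offset];
        return true;
    }
    ++ic.misses;                  // miss: full lookup, then refill the cache
    auto it = obj.shape->offsets.find(key);
    if (it == obj.shape->offsets.end()) return false;
    ic.shape = obj.shape;
    ic.offset = it->second;
    out = obj.slots[it->second];
    return true;
}
```

Two objects built by setting x then y end up with the same Shape pointer, so a cache filled by the first serves the second without a lookup. An object built in a different key order gets a different shape and misses.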
5. Garbage collection
The runtime currently leaks every string global into the JIT's data section. A real VM needs:
- A heap with allocation, marking, and reclamation.
- GC roots identified in optimised frames (more side-tables from the JIT).
- Write barriers for generational GC (yet another runtime symbol the JIT must inject around every store-to-heap).
cp-14 (Runtime Systems) showed a tagged-value layout; this is where you'd plug a real collector under it.
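As a starting point for the first two bullets, here is a minimal mark-and-sweep sketch over an explicit root list. Heap, Cell, addRoot, and collect are hypothetical names, and the design is deliberately naive: it is non-moving and non-generational, so it needs no write barrier, and roots are registered by hand rather than discovered from JIT side-tables.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// A heap cell that may reference other cells.
struct Cell {
    bool marked = false;
    std::vector<Cell*> children;
};

class Heap {
public:
    Cell* allocate() {
        cells_.push_back(std::make_unique<Cell>());
        return cells_.back().get();
    }

    void addRoot(Cell* cell) { roots_.push_back(cell); }
    void clearRoots() { roots_.clear(); }
    std::size_t size() const { return cells_.size(); }

    // Mark everything reachable from the roots, then free the rest.
    // Returns the number of cells reclaimed.
    std::size_t collect() {
        for (Cell* root : roots_) mark(root);
        const std::size_t before = cells_.size();
        std::vector<std::unique_ptr<Cell>> live;
        for (auto& cell : cells_) {
            if (cell->marked) {
                cell->marked = false;          // reset for the next cycle
                live.push_back(std::move(cell));
            }
        }
        cells_ = std::move(live);              // unmarked cells are destroyed here
        return before - cells_.size();
    }

private:
    // Iterative marking with an explicit worklist, so deep graphs cannot
    // overflow the native stack; the marked bit also terminates cycles.
    static void mark(Cell* root) {
        std::vector<Cell*> work{root};
        while (!work.empty()) {
            Cell* cell = work.back();
            work.pop_back();
            if (cell == nullptr || cell->marked) continue;
            cell->marked = true;
            for (Cell* child : cell->children) work.push_back(child);
        }
    }

    std::vector<std::unique_ptr<Cell>> cells_;
    std::vector<Cell*> roots_;
};
```

The hard part in a JIT is everything addRoot hides: optimised frames keep pointers in registers and spill slots, so the collector can only find them if the JIT emits a map for every point where a collection may happen.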
6. Concurrency
ORC is designed for concurrent compilation: ThreadSafeModule pairs each
module with a lockable context so that it can be compiled off the main
thread, and LLJIT can be configured with a pool of compile threads. A
real VM compiles on background threads while the main thread keeps
interpreting, then atomically swaps the function entry pointer when the
JIT result is ready.
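The entry-pointer swap itself is small. Below is a sketch assuming every function is reached through one atomic pointer slot; FunctionEntry, publish, interpretedSquare, and compiledSquare are hypothetical names, with the two square functions standing in for an interpreter trampoline and a finished JIT result.

```cpp
#include <atomic>

using EntryFn = int (*)(int);

// One slot per function. Callers always go through call(), so whichever
// pointer was published most recently is the one that runs.
class FunctionEntry {
public:
    explicit FunctionEntry(EntryFn initial) : entry_(initial) {}

    int call(int arg) const {
        // Acquire pairs with the release in publish(): a caller that sees the
        // new pointer also sees everything the compile thread wrote before it.
        return entry_.load(std::memory_order_acquire)(arg);
    }

    // Called by the compile thread once the machine code is ready.
    void publish(EntryFn compiled) {
        entry_.store(compiled, std::memory_order_release);
    }

    EntryFn current() const { return entry_.load(std::memory_order_acquire); }

private:
    std::atomic<EntryFn> entry_;
};

// Stand-in for "run this function in the interpreter" (deliberately slow).
inline int interpretedSquare(int x) {
    const int n = x < 0 ? -x : x;
    int result = 0;
    for (int i = 0; i < n; ++i) result += n;
    return result;
}

// Stand-in for the JIT-compiled version of the same function.
inline int compiledSquare(int x) { return x * x; }
```

A slot would start as FunctionEntry square(&interpretedSquare); once the background compile finishes, square.publish(&compiledSquare) redirects later calls. Calls already in flight finish in the old code, so the old entry must stay valid until they return.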
Where this lab leaves you
Concretely, after cp-17 you should be able to:
- Build an llvm::Module from an AST with IRBuilder (no text).
- Wire a runtime function into JIT'd code via ORC's symbol API.
- Emit a runtime callback at any IR point and read its results from C++.
- Diagnose JIT bugs with verifyModule before they crash.
These are the muscles. The rest — IC, deopt, hidden classes, GC — are combinations of them.