03 — Registering Runtime Symbols with ORC
The JIT'd module declares but does not define the runtime functions:
declare void @ml_print_int(i64)
declare void @ml_print_str(ptr)
declare void @ml_record_int_arg(i64)
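For reference, the host side of those three declarations might look like the following. This is a minimal sketch: the function bodies, the recorded_int_args vector, and the use of stdio are assumptions made for illustration, not the project's actual runtime. The part that matters is extern "C", which keeps the names unmangled so they match the IR declarations exactly.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical storage for recorded arguments; the real runtime may differ.
static std::vector<std::int64_t> recorded_int_args;

// extern "C" prevents C++ name mangling, so the symbol names are exactly
// "ml_print_int", "ml_print_str", and "ml_record_int_arg".
extern "C" void ml_print_int(std::int64_t v) {
  std::printf("%lld\n", static_cast<long long>(v));
}

extern "C" void ml_print_str(const char* s) {
  std::printf("%s\n", s);
}

extern "C" void ml_record_int_arg(std::int64_t v) {
  recorded_int_args.push_back(v);  // assumed behaviour: just remember the value
}
```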
When ORC compiles main, those calls become real bl/call instructions to
some address, but to which address? Nothing in the module says. While linking,
ORC searches the JITDylib for a symbol named ml_print_int, and if none is found,
the lookup of main fails with a "symbols not found" error before any JIT'd code runs.
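The failure mode can be modelled without LLVM at all. The sketch below is a toy resolver, not ORC's implementation: SymbolTable, unresolved, and the sample address are hypothetical names invented for illustration. It takes the names a module declares and reports the ones that have no definition in the table.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Toy stand-in for a JITDylib's symbol table: name -> address.
using SymbolTable = std::unordered_map<std::string, std::uint64_t>;

// Returns the declared names that have no definition in `table`.
// An empty result means every external reference can be resolved.
std::vector<std::string> unresolved(const SymbolTable& table,
                                    const std::vector<std::string>& declared) {
  std::vector<std::string> missing;
  for (const auto& name : declared)
    if (table.find(name) == table.end())
      missing.push_back(name);
  return missing;
}
```

With only ml_print_int registered, asking about ml_print_int and ml_print_str returns a one-element list naming ml_print_str, which is the condition a real linker would turn into an error.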
Our job is to put the host process's function addresses into the JITDylib's
symbol table before the first lookup. In jit.cpp:
llvm::orc::SymbolMap syms;
// Wrap a host pointer as an exported, callable symbol definition.
auto def = [&](void* p) {
  return llvm::orc::ExecutorSymbolDef(
      llvm::orc::ExecutorAddr::fromPtr(p),
      llvm::JITSymbolFlags::Exported | llvm::JITSymbolFlags::Callable);
};
syms[es.intern("ml_print_int")] = def((void*)&ml_print_int);
syms[es.intern("ml_print_str")] = def((void*)&ml_print_str);
syms[es.intern("ml_record_int_arg")] = def((void*)&ml_record_int_arg);
// define() returns an llvm::Error that must be consumed; cantFail aborts on failure.
llvm::cantFail(jd.define(llvm::orc::absoluteSymbols(std::move(syms))));
Three pieces:
- intern(name) turns a StringRef into a SymbolStringPtr. The string pool is owned by the ExecutionSession, so all lookups can compare pointers instead of strings.
- ExecutorSymbolDef(addr, flags) wraps a raw pointer with metadata. Exported makes the symbol visible to lookups; Callable distinguishes function pointers from data pointers (relevant for some platforms' ABI).
- absoluteSymbols wraps the map in a MaterializationUnit whose "materialise" step is trivial: the addresses are already known.
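The pointer-comparison property of interning can be shown with a small stand-alone pool. This is a sketch of the idea only: StringPool and its intern method are hypothetical and are not LLVM's string pool implementation.

```cpp
#include <string>
#include <unordered_set>

// Toy string pool: each distinct string is stored once, and intern() hands
// back a pointer to the stored copy.
class StringPool {
 public:
  const std::string* intern(const std::string& s) {
    // insert() yields the existing element if `s` is already present.
    // Element addresses in std::unordered_set stay valid across rehashing.
    return &*pool_.insert(s).first;
  }

 private:
  std::unordered_set<std::string> pool_;
};
```

Two calls to intern with the text ml_print_int return the same pointer, so equality is a single pointer compare; interning ml_print_str returns a different pointer.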
Then jd.define(...) installs the unit. From this point on,
jit->lookup("ml_print_int") would return the host address, and so will
ORC's internal linker when it resolves the declare in the user module.
Why not just rely on DynamicLibrarySearchGenerator?
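LLJIT can search the host process for symbols by name through a
DynamicLibrarySearchGenerator, and depending on the LLVM version and builder
settings it may attach one by default. If our runtime functions were exported
from the main executable under unmangled (extern "C") names, that mechanism
would find them automatically. We register explicitly for three reasons: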
- Determinism. We control which names are reachable; nothing else leaks from the host into JIT'd code.
- Plumbing for sandboxing. In a production VM you eventually want the JIT to live in a different address space (or a different process). The ExecutorAddr indirection is what makes that swap possible: same API, just point at a remote address.
- It's the same path the type-feedback hook will take. Future VM services (bailout, GC write barrier, deopt) register exactly the same way.
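To see why an address type that is not a host pointer helps, consider this sketch. Addr is a hypothetical miniature modelled loosely on the idea behind ExecutorAddr, not LLVM's class: it stores a plain 64-bit integer, and converting to or from a host pointer is an explicit step that only makes sense when the JIT'd code runs in the same process. The add_one function is likewise just a sample target.

```cpp
#include <cstdint>

// Hypothetical miniature of an executor address: just a 64-bit integer.
// Nothing in the type says the address belongs to this process.
struct Addr {
  std::uint64_t value = 0;

  // In-process case: derive the address from a host pointer.
  template <typename T>
  static Addr fromPtr(T* p) {
    return Addr{static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(p))};
  }

  // Only valid when the address really refers to this process's memory.
  template <typename FnPtr>
  FnPtr toPtr() const {
    return reinterpret_cast<FnPtr>(static_cast<std::uintptr_t>(value));
  }
};

// Sample host function used to exercise the round trip.
inline std::int64_t add_one(std::int64_t x) { return x + 1; }
```

In-process, Addr::fromPtr(&add_one) converted back with toPtr and called with 41 yields 42. In an out-of-process setup the same value field would carry an address in the other process, and the host would never call toPtr on it.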
Lookup and call
auto sym = jit->lookup("main");            // ORC compiles `main` on demand
if (!sym)                                  // e.g. an unresolved runtime symbol
  llvm::report_fatal_error(sym.takeError());
auto fn = sym->toPtr<int64_t(*)()>();      // raw function pointer
int64_t result = fn();
That lookup call is the moment ORC walks the module, runs any IR transforms
installed on the JIT, lowers to machine code, copies the bytes into executable
pages, and applies relocations. From the C++ side it looks like a hash-table
lookup; in reality it's the whole back-end pipeline that an ahead-of-time
build would run through clang.
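The "looks like a hash-table lookup, but compiles on demand" behaviour can be imitated in a few lines. The sketch below is a toy model, not ORC: LazyTable, define, lookup, and fake_main are hypothetical names, and the "compile" step is just a callback that produces a function pointer. What it shows is the shape: the expensive work runs on the first lookup of a name, and the result is cached afterwards.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

using Fn = std::int64_t (*)();

// Toy model of on-demand materialisation.
class LazyTable {
 public:
  // Register a name together with the work needed to produce its address.
  void define(const std::string& name, std::function<Fn()> materialise) {
    pending_[name] = std::move(materialise);
  }

  // Returns nullptr for unknown names; otherwise materialises on first use.
  Fn lookup(const std::string& name) {
    auto ready = ready_.find(name);
    if (ready != ready_.end()) return ready->second;  // already compiled
    auto todo = pending_.find(name);
    if (todo == pending_.end()) return nullptr;       // no such symbol
    Fn fn = todo->second();                           // the "compile" step
    pending_.erase(todo);
    ready_[name] = fn;
    return fn;
  }

 private:
  std::unordered_map<std::string, std::function<Fn()>> pending_;
  std::unordered_map<std::string, Fn> ready_;
};

// Stand-in for compiled code.
inline std::int64_t fake_main() { return 42; }
```

Defining a name does no work by itself; the callback runs once, on the first lookup, and later lookups of the same name return the cached pointer.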