03 — Registering Runtime Symbols with ORC

The JIT'd module declares but does not define the runtime functions:

declare void @ml_print_int(i64)
declare void @ml_print_str(ptr)
declare void @ml_record_int_arg(i64)

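When ORC compiles main, its calls to these functions become real call instructions (bl on AArch64, call on x86-64), but nothing in the module says which address they should target. ORC's linker resolves each external name by searching the JITDylibs in main's link order for a symbol named ml_print_int. If none is found, materialisation fails and the lookup of main returns a "Symbols not found" error instead of a function pointer.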
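The resolution step can be modelled without LLVM at all. The sketch below is a toy stand-in, not ORC's API: `SymbolTable`, `LinkResult` and `resolveExternals` are hypothetical names, and a JITDylib is reduced to a name-to-address map. It shows the all-or-nothing behaviour: if any external is missing, the caller gets the list of missing names and no usable addresses.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy model: a JITDylib reduced to a name -> address table.
using SymbolTable = std::map<std::string, std::uintptr_t>;

struct LinkResult {
    std::vector<std::uintptr_t> addresses;  // one per undefined name, in order
    std::vector<std::string> missing;       // names with no definition
    bool ok() const { return missing.empty(); }
};

// Resolve every external a module references. Missing names are collected
// so an error message can list all of them at once.
LinkResult resolveExternals(const std::vector<std::string>& undefined,
                            const SymbolTable& table) {
    LinkResult r;
    for (const auto& name : undefined) {
        auto it = table.find(name);
        if (it == table.end()) {
            r.missing.push_back(name);
        } else {
            r.addresses.push_back(it->second);
        }
    }
    if (!r.ok()) r.addresses.clear();  // a partially resolved module is unusable
    return r;
}
```

With only ml_print_int registered, a module that also calls ml_print_str fails to link; registering the second name makes the same module resolve.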

Our job is to put the host process's function addresses into the JITDylib before anything triggers compilation. In jit.cpp:

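auto& jd = jit->getMainJITDylib();

llvm::orc::SymbolMap syms;
auto def = [](void* p) {
    return llvm::orc::ExecutorSymbolDef(
        llvm::orc::ExecutorAddr::fromPtr(p),
        llvm::JITSymbolFlags::Exported | llvm::JITSymbolFlags::Callable);
};
// mangleAndIntern adds the platform's global prefix (e.g. "_" on Darwin)
// before interning, so these keys match the names the linker will ask for.
syms[jit->mangleAndIntern("ml_print_int")]      = def((void*)&ml_print_int);
syms[jit->mangleAndIntern("ml_print_str")]      = def((void*)&ml_print_str);
syms[jit->mangleAndIntern("ml_record_int_arg")] = def((void*)&ml_record_int_arg);
// define() returns an llvm::Error; it fails only if a name is already defined,
// which would be a bug here, so cantFail is appropriate.
llvm::cantFail(jd.define(llvm::orc::absoluteSymbols(std::move(syms))));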

Three pieces:

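  • intern(name) turns a StringRef into a SymbolStringPtr. The string pool is owned by the ExecutionSession, so lookups compare pointers instead of strings. The interned name must match what the linker will ask for, which is the mangled name: on Darwin, ml_print_int becomes _ml_print_int. LLJIT::mangleAndIntern applies that mangling before interning, so it is the safer choice on every platform.
  • ExecutorSymbolDef(addr, flags) pairs an ExecutorAddr (an address in the process that will run the code) with JITSymbolFlags. Exported makes the symbol visible to lookups coming from other JITDylibs, not just the one it is defined in; Callable marks it as a function rather than data, which ORC checks in places such as lazy re-export stubs.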
  • absoluteSymbols wraps the map in a MaterializationUnit whose "materialise" step is trivial: the addresses are already known.
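The pointer-comparison point in the first bullet is easy to see in a toy pool. This is a minimal sketch, not llvm::orc::SymbolStringPool; `StringPool` is a hypothetical name. Each distinct string is stored once, so two lookups of the same name yield the same pointer regardless of where the caller's string came from.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_set>

// Hypothetical toy interning pool: one stored copy per distinct string.
class StringPool {
public:
    // Returns a pointer to the pool's single copy of s, inserting it if new.
    // unordered_set never relocates elements on rehash, so the pointer stays valid.
    const std::string* intern(std::string_view s) {
        return &*pool_.emplace(s).first;
    }
    std::size_t size() const { return pool_.size(); }

private:
    std::unordered_set<std::string> pool_;
};
```

Once interned, equality of names is a single pointer comparison, which is what lets ORC's symbol tables key on SymbolStringPtr cheaply.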

Then jd.define(...) installs the unit. From this point on, jit->lookup("ml_print_int") returns the host address, and ORC's linker gets the same answer when it resolves the declarations in the user module.
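The section does not show the host-side definitions, so the following is a hypothetical sketch of what they could look like, together with a toy version of what an absolute symbol amounts to: an address that already exists in this process, later called through a pointer cast back to the declared signature. The C types mirror the IR declarations (i64 as std::int64_t, ptr as const char*); `g_feedback`, `defineAbsolute` and `callRecordIntArg` are illustrative names, not part of the original runtime.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Hypothetical type-feedback log filled by ml_record_int_arg.
std::vector<std::int64_t> g_feedback;

// Hypothetical host runtime. extern "C" keeps the names unmangled, which only
// matters if something later looks them up by symbol name.
extern "C" void ml_print_int(std::int64_t v) { std::printf("%lld\n", (long long)v); }
extern "C" void ml_print_str(const char* s) { std::printf("%s\n", s); }
extern "C" void ml_record_int_arg(std::int64_t v) { g_feedback.push_back(v); }

// Toy analogue of absoluteSymbols: "materialising" is just recording
// addresses that already exist in this process.
std::map<std::string, std::uintptr_t> defineAbsolute() {
    return {
        {"ml_print_int", reinterpret_cast<std::uintptr_t>(&ml_print_int)},
        {"ml_print_str", reinterpret_cast<std::uintptr_t>(&ml_print_str)},
        {"ml_record_int_arg", reinterpret_cast<std::uintptr_t>(&ml_record_int_arg)},
    };
}

// Roughly what a relocated JIT'd call site does: call through the resolved
// address, cast back to the signature the IR declared.
void callRecordIntArg(const std::map<std::string, std::uintptr_t>& syms, std::int64_t v) {
    using Fn = void (*)(std::int64_t);
    auto fn = reinterpret_cast<Fn>(syms.at("ml_record_int_arg"));
    fn(v);
}
```

The cast back to `void (*)(std::int64_t)` is only correct because the host definition and the IR declaration agree on the signature; nothing in the symbol table checks that.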

Why not just rely on DynamicLibrarySearchGenerator?

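Recent LLJIT versions link the main JITDylib against a process-symbols JITDylib whose definition generator looks names up in the host process; in older versions you attach DynamicLibrarySearchGenerator::GetForCurrentProcess yourself. If our runtime functions had C linkage and were exported in the executable's dynamic symbol table (for example by linking with -rdynamic on Linux), that mechanism would find them automatically. We register explicitly for three reasons: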

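  1. Determinism. We control exactly which names JIT'd code can reach. This only holds if no process-symbol generator is attached as well; otherwise anything the host exports is still reachable.
  2. Plumbing for out-of-process execution. In a production VM you may eventually want JIT'd code to run in a separate process, for example as a sandbox. ORC supports this through its ExecutorProcessControl layer, and ExecutorAddr is what makes the swap cheap: it names an address in the executing process, not necessarily in the compiler's own, so the registration code keeps the same shape and only the addresses change.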
  3. It's the same path the type-feedback hook will take. Future VM services — bailout, GC write barrier, deopt — register exactly the same way.
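The trade-off in reason 1 can be sketched as a scope with explicit definitions plus an optional fallback. This is a toy model under stated assumptions: `SymbolScope` and its fallback `Generator` are hypothetical names loosely standing in for a JITDylib and a process-wide search generator. With the fallback unset, unknown names fail; with it set, any name the fallback can answer (here, a libc name like "system") leaks through.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical toy: explicit definitions first, then an optional fallback
// that may resolve anything the host exposes.
class SymbolScope {
public:
    using Generator = std::function<std::optional<std::uintptr_t>(const std::string&)>;

    void define(const std::string& name, std::uintptr_t addr) { defs_[name] = addr; }
    void setFallback(Generator g) { fallback_ = std::move(g); }

    std::optional<std::uintptr_t> lookup(const std::string& name) const {
        if (auto it = defs_.find(name); it != defs_.end()) return it->second;
        if (fallback_) return fallback_(name);  // host symbols leak in here
        return std::nullopt;                    // explicit-only: unknown names fail
    }

private:
    std::map<std::string, std::uintptr_t> defs_;
    Generator fallback_;
};
```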

Lookup and call

llvm::ExitOnError exitOnErr("jit: ");
auto addr = exitOnErr(jit->lookup("main")); // first lookup compiles and links the module defining main
auto fn   = addr.toPtr<int64_t (*)()>();     // raw function pointer into JIT'd code
int64_t result = fn();

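That first lookup is the moment ORC materialises the module defining main: it runs whatever IR transforms the JIT was configured with (an optimisation pipeline, if you installed one), lowers the IR to machine code, copies the bytes into freshly allocated memory, applies relocations (including the addresses of our runtime functions), and finally marks the pages executable. From the C++ side it looks like a hash-table lookup; in reality it is the whole compiler back end that clang would otherwise run ahead of time.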
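The "looks like a hash-table lookup" behaviour can be modelled as a table whose entries start out as materializers. This is a hypothetical sketch, not ORC's API: `LazyTable` and `fakeMain` are illustrative names, the materializer stands in for compile, link and relocate, and a returned 0 stands in for a "not found" error. The first lookup pays the cost; later lookups are plain table hits, and the resulting address is cast to a function pointer as toPtr does.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical toy of lookup-triggered materialisation with caching.
class LazyTable {
public:
    using Materializer = std::function<std::uintptr_t()>;

    void defineLazy(const std::string& name, Materializer m) { pending_[name] = std::move(m); }

    std::uintptr_t lookup(const std::string& name) {
        if (auto it = ready_.find(name); it != ready_.end()) return it->second;
        auto p = pending_.find(name);
        if (p == pending_.end()) return 0;  // stands in for a "not found" error
        std::uintptr_t addr = p->second();  // the expensive step runs exactly once
        pending_.erase(p);
        ready_[name] = addr;
        return addr;
    }

private:
    std::map<std::string, Materializer> pending_;
    std::map<std::string, std::uintptr_t> ready_;
};

// Hypothetical stand-in for the JIT'd main.
inline std::int64_t fakeMain() { return 42; }
```

A usage pattern mirroring the snippet above: register "main" lazily, look it up, cast the address with `reinterpret_cast<std::int64_t (*)()>(addr)`, and call it.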