05 — Strings as Private Globals + GEP
The Str expression node is the only non-numeric value type in cp-17. Its
lowering reveals two LLVM concepts every IR emitter must internalise:
GlobalVariable for constant data, and getelementptr (GEP) for
address arithmetic.
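    case Expr::K::Str: {
        // The string bytes plus a trailing NUL, as an [N x i8] constant.
        Constant* s = ConstantDataArray::getString(ctx, e.str, /*addNull=*/true);
        // A private, read-only module-level global initialised with that constant.
        auto* gv = new GlobalVariable(
            mod, s->getType(), /*isConstant=*/true,
            GlobalValue::PrivateLinkage, s, ".str");
        // Address of byte 0: the array "decayed" to a char*.
        return b.CreateInBoundsGEP(
            s->getType(), gv,
            {ConstantInt::get(i32(), 0), ConstantInt::get(i32(), 0)});
    }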
Step by step:
1. ConstantDataArray::getString builds an [N x i8] constant from the string
   bytes, optionally NUL-terminated. This is the value.
2. new GlobalVariable(...) registers that constant as a module-level symbol
   with PrivateLinkage (linker-internal, won't conflict across modules) and
   isConstant=true (the optimiser may place it in .rodata). The variable's
   value type is [N x i8]; gv itself is a Constant* whose IR type is a pointer
   (in LLVM 20 with opaque pointers, just ptr).
3. CreateInBoundsGEP computes the address of element [0][0]:
   - first index 0 steps through the pointer (typical idiom for "the array
     itself, not array #N");
   - second index 0 steps to the first byte inside the array.

   The result is a ptr to byte 0 of the string, which is exactly what
   ml_print_str(const char*) expects.
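For intuition, the three steps have a rough counterpart in plain C++, with no LLVM involved. The sketch below is only an analogy, not what the emitter generates: kStr, firstByte and visibleLength are made-up names, the string "hello" is an arbitrary example, and `static` merely stands in for PrivateLinkage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Steps 1 and 2: a constant, NUL-terminated byte array. `static` keeps the
// symbol local to this translation unit, loosely like PrivateLinkage.
static const char kStr[] = "hello";   // const char[6], cf. [6 x i8]
static_assert(sizeof(kStr) == 6, "5 characters plus the terminating NUL");

// Step 3: the C++ spelling of GEP {0, 0}.
const char* firstByte() {
    const char (*gv)[6] = &kStr;   // pointer to the whole array, like gv
    return &gv[0][0];              // index 0 through the pointer, then byte 0
}

// What a const char* consumer sees: the bytes up to the NUL.
std::size_t visibleLength() {
    const char* p = firstByte();
    assert(p == kStr);             // same address as ordinary array decay
    return std::strlen(p);
}
```

Writing the access as `gv[0][0]` keeps both GEP indices visible: the first `[0]` selects the array the pointer refers to, the second selects a byte inside it.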
The two-index GEP for arrays
You'll see {0, 0} patterns everywhere in LLVM IR for "decay an array to a
char*". The mental model:
gv : ptr to [N x i8]
gep 0 : same as gv (no offset, but lets us index into the pointee)
gep 0,0 : pointer to the first i8 in the array
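Conceptually this is C's &str[0]. If the global were [10 x [4 x i8]],
{0, i, j} would give &str[i][j]. The first index is special: it offsets the
base pointer in units of the whole pointee type, like pointer arithmetic in C.
Subsequent indices walk into the aggregate type.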
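The index rule can be checked with a small plain C++ model of the [10 x [4 x i8]] case. This is a sketch under stated assumptions, not LLVM code: Row, Grid, gepOffset and cxxOffset are hypothetical names, and it assumes i8 is one byte and arrays have no padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the global's type: [10 x [4 x i8]].
using Row  = char[4];
using Grid = Row[10];

// Byte offset that a GEP with indices {first, i, j} would add to the base
// pointer, assuming one-byte i8 and no padding.
constexpr std::size_t gepOffset(std::size_t first, std::size_t i, std::size_t j) {
    return first * sizeof(Grid)   // first index: steps over whole Grid objects
         + i * sizeof(Row)        // second index: selects a row inside the Grid
         + j * sizeof(char);      // third index: selects a byte inside the row
}

// The same offset obtained from ordinary C++ indexing, for comparison.
std::size_t cxxOffset(const Grid& g, std::size_t i, std::size_t j) {
    assert(i < 10 && j < 4);      // stay inside the array
    return reinterpret_cast<std::uintptr_t>(&g[i][j])
         - reinterpret_cast<std::uintptr_t>(&g);
}
```

With a first index of 0 the two functions agree for every in-range i and j. A first index of 1 would instead skip a whole 40-byte Grid, which is why {0, ...} is the usual prefix when the base pointer refers to a single global.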
Why opaque pointers don't change this
LLVM 20 has no typed pointers (support was removed in LLVM 17): every pointer
in IR is just ptr. But the GEP instruction still needs a source element type
to compute offsets. That's what the explicit s->getType() (the array type)
argument to CreateInBoundsGEP is for. The IRBuilder cannot infer it from the
pointer's type, because ptr carries no pointee type.
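A minimal plain C++ model may make the point concrete: with an untyped address, the element size has to arrive as a separate argument. OpaquePtr and gep are hypothetical names for illustration, and the model covers only a single-index GEP.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// An address with no pointee type attached, standing in for IR's ptr.
using OpaquePtr = const unsigned char*;

// Hypothetical stand-in for a single-index GEP. The caller supplies the
// element size because the pointer alone cannot.
//   gep(1, p, 3)  models  getelementptr i8,  ptr %p, i64 3
//   gep(4, p, 3)  models  getelementptr i32, ptr %p, i64 3
OpaquePtr gep(std::size_t elemSize, OpaquePtr base, std::size_t index) {
    assert(base != nullptr);
    return base + elemSize * index;
}
```

Given the same base and index, `gep(1, p, 3)` lands 3 bytes in while `gep(sizeof(std::int32_t), p, 3)` lands 12 bytes in. The type argument to CreateInBoundsGEP plays the role of elemSize here.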
Lifetime
The GlobalVariable is owned by the Module. When the Module is moved
into ThreadSafeModule and handed to ORC, ownership transfers to the JIT.
After JITting, the global's bytes live somewhere in the JIT's data section,
and the address the GEP evaluates to at run time is valid for as long as the
LLJIT instance lives. For test code this is fine; for a long-running VM you'd
care about reclaiming unused string globals (a job for ORC's resource-tracker
API).