Glossary

Curriculum-wide terminology, alphabetised. When a term appears for the first time in a lab's CONCEPTS.md, it's also defined inline; this file is the consolidated index.

TermDefinition
AOTAhead-of-time compilation. Source → native binary before execution. Clang, rustc, GCC, MSVC. Opposite of JIT.
ASTAbstract Syntax Tree. Hierarchical representation of source code after parsing, with syntax noise (parens, whitespace, semicolons) removed.
Basic BlockMaximal straight-line sequence of IR instructions with one entry and one exit. Building block of the CFG.
BytecodeA linear sequence of opcodes designed for a virtual machine, not real hardware. CPython, JVM, V8 Ignition.
CFGControl-Flow Graph. Directed graph of basic blocks where edges are possible jumps.
ClosureA function value that captures variables from its lexical environment. Requires escape analysis or heap-allocated environments.
Computed GotoA C extension (&&label) enabling threaded bytecode dispatch — 2–3× faster than switch-based dispatch on most CPUs.
Constant FoldingOptimization that pre-computes constant expressions at compile time (2+3 → 5).
DCEDead Code Elimination. Removes instructions whose results are never used.
Dialect (MLIR)A self-contained set of operations and types in MLIR. tensor, arith, affine, llvm are dialects.
Dispatch (VM)The act of selecting and jumping to the implementation of the current opcode. Hottest loop in any interpreter.
DominatorBlock A dominates B if every path from entry to B passes through A. Foundation of SSA construction.
EBNFExtended Backus-Naur Form. Standard notation for context-free grammars.
ELFExecutable and Linkable Format. Linux/BSD object/binary format. macOS uses Mach-O instead.
FFIForeign Function Interface. Calling C (or other-ABI) functions from your language.
GCGarbage Collector. Subsystem that reclaims unreachable heap memory. We build mark-sweep in Phase 8.
HMHindley-Milner. Type inference algorithm (Algorithm W) for functional languages with let-polymorphism.
IRIntermediate Representation. Any data structure between AST and machine code. Compilers typically have 2–10 IRs (e.g., Clang has AST → MLIR → LLVM IR → MIR → MachineInstr).
IRBuilderLLVM helper class that constructs LLVM IR instructions and tracks the insertion point. Most-used API in LLVM frontends.
JITJust-In-Time compilation. Source/bytecode is compiled to native at runtime. V8, HotSpot, LuaJIT, PyPy, ORC.
LexerAlso called scanner or tokenizer. Converts a character stream into a token stream.
LLVM IRLLVM's typed, SSA-form intermediate representation. Human-readable assembly-like syntax.
lliLLVM IR interpreter / JIT driver. Runs .ll files directly.
llcLLVM static compiler. Lowers LLVM IR to target assembly.
Mach-OMach Object file format. macOS/iOS executable/library format. ELF's macOS counterpart.
mem2regLLVM pass that promotes stack allocas to SSA registers when their address is never taken. Foundational; most frontends rely on it.
MLIRMulti-Level Intermediate Representation. LLVM project for multi-IR compiler infrastructure. Powers TensorFlow XLA, IREE, Mojo.
mlir-optThe MLIR optimizer driver. clang -opt for MLIR.
ORCOn-Request Compilation. LLVM's JIT framework, current version is ORC v2.
ParserConverts a token stream into an AST. Two flavors: hand-written (recursive descent / Pratt) or generated (yacc, ANTLR).
Phi Node (φ)SSA-form instruction that selects a value based on which predecessor block was the source. The defining characteristic of SSA.
Pratt ParserTop-down operator-precedence parser. Used by V8, Crafting Interpreters, many JavaScript parsers.
Recursive DescentA top-down parser written as one mutually-recursive function per grammar rule. Used by Clang, GCC, rustc.
SSAStatic Single Assignment. IR form in which every variable is assigned exactly once. Enables almost all modern optimizations.
Symbol TableData structure mapping names to declarations, often a stack of hash maps (one per scope).
Target Triple<arch>-<vendor>-<os>-<abi> string identifying the compilation target (e.g., arm64-apple-macosx15.0).
Three-Address Code (TAC)IR form where instructions have at most 3 operands: x = y op z. Common pre-SSA representation.
TokenAtomic syntactic unit emitted by the lexer (keywords, identifiers, literals, operators).
Tree-Walk InterpreterExecutes a program by recursively visiting AST nodes. Simplest backend; slowest runtime.
Type Environment (Γ)Mapping from variable names to types, used during type checking.
Visitor PatternDesign pattern that adds an operation to a class hierarchy without modifying it. Standard tool for AST traversal.
VMVirtual Machine. Interpreter for a custom bytecode instruction set. CPython, JVM, V8 Ignition, Lua.