Glossary
Curriculum-wide terminology, alphabetised. When a term appears for the first time in a lab's CONCEPTS.md, it's also defined inline; this file is the consolidated index.
| Term | Definition |
|---|---|
| AOT | Ahead-of-time compilation. Source → native binary before execution. Clang, rustc, GCC, MSVC. Opposite of JIT. |
| AST | Abstract Syntax Tree. Hierarchical representation of source code after parsing, with syntax noise (parens, whitespace, semicolons) removed. |
| Basic Block | Maximal straight-line sequence of IR instructions with one entry and one exit. Building block of the CFG. |
| Bytecode | A linear sequence of opcodes designed for a virtual machine, not real hardware. CPython, JVM, V8 Ignition. |
| CFG | Control-Flow Graph. Directed graph of basic blocks where edges are possible jumps. |
| Closure | A function value that captures variables from its lexical environment. Requires escape analysis or heap-allocated environments. |
| Computed Goto | A C extension (&&label) enabling threaded bytecode dispatch — 2–3× faster than switch-based dispatch on most CPUs. |
| Constant Folding | Optimization that pre-computes constant expressions at compile time (2+3 → 5). |
| DCE | Dead Code Elimination. Removes instructions whose results are never used. |
| Dialect (MLIR) | A self-contained set of operations and types in MLIR. tensor, arith, affine, llvm are dialects. |
| Dispatch (VM) | The act of selecting and jumping to the implementation of the current opcode. Hottest loop in any interpreter. |
| Dominator | Block A dominates B if every path from entry to B passes through A. Foundation of SSA construction. |
| EBNF | Extended Backus-Naur Form. Standard notation for context-free grammars. |
| ELF | Executable and Linkable Format. Linux/BSD object/binary format. macOS uses Mach-O instead. |
| FFI | Foreign Function Interface. Calling C (or other-ABI) functions from your language. |
| GC | Garbage Collector. Subsystem that reclaims unreachable heap memory. We build mark-sweep in Phase 8. |
| HM | Hindley-Milner. Type inference algorithm (Algorithm W) for functional languages with let-polymorphism. |
| IR | Intermediate Representation. Any data structure between AST and machine code. Compilers typically have 2–10 IRs (e.g., Clang has AST → MLIR → LLVM IR → MIR → MachineInstr). |
| IRBuilder | LLVM helper class that constructs LLVM IR instructions and tracks the insertion point. Most-used API in LLVM frontends. |
| JIT | Just-In-Time compilation. Source/bytecode is compiled to native at runtime. V8, HotSpot, LuaJIT, PyPy, ORC. |
| Lexer | Also called scanner or tokenizer. Converts a character stream into a token stream. |
| LLVM IR | LLVM's typed, SSA-form intermediate representation. Human-readable assembly-like syntax. |
| lli | LLVM IR interpreter / JIT driver. Runs .ll files directly. |
| llc | LLVM static compiler. Lowers LLVM IR to target assembly. |
| Mach-O | Mach Object file format. macOS/iOS executable/library format. ELF's macOS counterpart. |
| mem2reg | LLVM pass that promotes stack allocas to SSA registers when their address is never taken. Foundational; most frontends rely on it. |
| MLIR | Multi-Level Intermediate Representation. LLVM project for multi-IR compiler infrastructure. Powers TensorFlow XLA, IREE, Mojo. |
| mlir-opt | The MLIR optimizer driver. clang -opt for MLIR. |
| ORC | On-Request Compilation. LLVM's JIT framework, current version is ORC v2. |
| Parser | Converts a token stream into an AST. Two flavors: hand-written (recursive descent / Pratt) or generated (yacc, ANTLR). |
| Phi Node (φ) | SSA-form instruction that selects a value based on which predecessor block was the source. The defining characteristic of SSA. |
| Pratt Parser | Top-down operator-precedence parser. Used by V8, Crafting Interpreters, many JavaScript parsers. |
| Recursive Descent | A top-down parser written as one mutually-recursive function per grammar rule. Used by Clang, GCC, rustc. |
| SSA | Static Single Assignment. IR form in which every variable is assigned exactly once. Enables almost all modern optimizations. |
| Symbol Table | Data structure mapping names to declarations, often a stack of hash maps (one per scope). |
| Target Triple | <arch>-<vendor>-<os>-<abi> string identifying the compilation target (e.g., arm64-apple-macosx15.0). |
| Three-Address Code (TAC) | IR form where instructions have at most 3 operands: x = y op z. Common pre-SSA representation. |
| Token | Atomic syntactic unit emitted by the lexer (keywords, identifiers, literals, operators). |
| Tree-Walk Interpreter | Executes a program by recursively visiting AST nodes. Simplest backend; slowest runtime. |
| Type Environment (Γ) | Mapping from variable names to types, used during type checking. |
| Visitor Pattern | Design pattern that adds an operation to a class hierarchy without modifying it. Standard tool for AST traversal. |
| VM | Virtual Machine. Interpreter for a custom bytecode instruction set. CPython, JVM, V8 Ignition, Lua. |