Phases & Labs
This curriculum has 9 teaching phases and 18 labs: 15 phase labs (cp-01 to cp-15) followed by 3 capstone projects (cp-16 to cp-18). Labs build on each other, but after Phase 4 you can start Phase 5 (LLVM) or Phase 7 (MLIR) in either order; Phase 6 (JIT) builds on Phase 5.
Legend: ✅ complete · 🟡 scaffolded · ⬜ planned
Phase 1 — Frontend Foundations
Before you can compile, you must convert source text into a structured tree. This phase teaches lexing, parsing, AST design, and tree-walk interpretation.
| Lab | Title | Status | Key Concepts |
|---|---|---|---|
| cp-01 | Environment Setup & Toolchain | ✅ | Clang vs Apple Clang, target triples, LLVM toolchain, Mach-O vs ELF, CMake, llvm-config |
| cp-02 | Arithmetic Evaluator | ✅ | Tokens, recursive descent, EBNF, precedence vs grammar nesting, associativity, AST + Visitor, post-order eval |
| cp-03 | MiniLang v0 Frontend | 🟡 | Pratt parsing, statements vs expressions, blocks, functions, closures, REPL state, tree-walk interpreter |
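To make the cp-02 concepts concrete, here is a minimal sketch of a recursive-descent arithmetic evaluator. It is not the lab's reference solution: the names `Num`, `BinOp`, `Parser`, `parse`, and `evaluate` are hypothetical, and the grammar in the comment is an assumed EBNF in which precedence comes from rule nesting and left associativity comes from the `while` loops.

```python
import operator
import re
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: float


@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Num, BinOp]

TOKEN_RE = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/()])")
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def tokenize(src: str) -> list[str]:
    """Split source text into number, operator, and parenthesis tokens."""
    tokens, pos = [], 0
    src = src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    # Assumed grammar (EBNF):
    #   expr   := term (('+' | '-') term)*
    #   term   := factor (('*' | '/') factor)*
    #   factor := NUMBER | '(' expr ')'
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):  # loop => left associativity
            op = self.advance()
            node = BinOp(op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):  # nested deeper => binds tighter
            op = self.advance()
            node = BinOp(op, node, self.factor())
        return node

    def factor(self):
        tok = self.advance()
        if tok == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or not tok[0].isdigit():
            raise SyntaxError(f"expected number, got {tok!r}")
        return Num(float(tok))


def parse(src: str) -> Node:
    parser = Parser(tokenize(src))
    node = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return node


def evaluate(node: Node) -> float:
    """Post-order walk: evaluate both children, then apply the operator."""
    if isinstance(node, Num):
        return node.value
    return OPS[node.op](evaluate(node.left), evaluate(node.right))
```

Calling `evaluate(parse("1 + 2 * 3"))` walks the tree bottom-up, so the multiplication node is evaluated before the addition that contains it.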
Phase 2 — Static Semantics
A type-checked language with scoped variables is the foundation of every real frontend. This phase makes MiniLang reject invalid programs before they ever run.
| Lab | Title | Status | Key Concepts |
|---|---|---|---|
| cp-04 | Symbol Tables & Scoping | 🟡 | Lexical scoping, scope stacks, closure capture, name resolution, shadowing, two-pass resolution |
| cp-05 | Static Type System (MiniLang v1) | 🟡 | Hindley-Milner basics, type environments, monomorphic types, structural vs nominal typing, diagnostics |
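As a rough illustration of the scope-stack idea in cp-04, the sketch below resolves names through a list of dictionaries, innermost scope last. The class name `ScopeStack` and its methods are hypothetical and are not taken from the lab code; closure capture and two-pass resolution are left out.

```python
class ScopeStack:
    """Lexical scopes as a stack of dicts; the last dict is the innermost scope."""

    def __init__(self):
        self.scopes = [{}]  # index 0 is the global scope

    def push(self):
        self.scopes.append({})

    def pop(self):
        if len(self.scopes) == 1:
            raise RuntimeError("cannot pop the global scope")
        self.scopes.pop()

    def declare(self, name, info):
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"redeclaration of {name!r} in the same scope")
        scope[name] = info

    def resolve(self, name):
        """Return (info, depth); depth counts scopes hopped outward from the innermost."""
        for depth, scope in enumerate(reversed(self.scopes)):
            if name in scope:
                return scope[name], depth
        raise NameError(f"undefined name {name!r}")


scopes = ScopeStack()
scopes.declare("x", "int")            # global x
scopes.push()
outer_view = scopes.resolve("x")      # found one scope out
scopes.declare("x", "string")         # shadows the global x
inner_view = scopes.resolve("x")      # found in the innermost scope
scopes.pop()
after_pop = scopes.resolve("x")       # global x is visible again
```

The hop count returned by `resolve` is the kind of information a resolver pass can record so that later stages do not need to search by name again.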
Phase 3 — Bytecode Virtual Machines
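Tree-walkers are slow because of pointer-chasing and virtual dispatch. Bytecode VMs, the design used by CPython, the JVM, V8 (Ignition), and Lua, flatten the tree into a compact linear instruction stream and can run many times faster than a naive tree-walker (figures of 10–50× are workload-dependent).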
| Lab | Title | Status | Key Concepts |
|---|---|---|---|
| cp-06 | Bytecode Design & Compiler | 🟡 | Stack-based vs register-based VMs, opcode encoding, constant pools, AST → bytecode lowering, disassembler |
| cp-07 | Stack VM Execution (MiniLang v2) | 🟡 | Computed-goto dispatch, frame layout, call/return, switch-vs-direct-threading, ICache effects |
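The sketch below shows the shape of a stack-based bytecode loop with a constant pool, as covered in cp-06 and cp-07. The opcode set (`Op`) and the `run` function are hypothetical, and because Python has no computed goto, the `if`/`elif` chain stands in for switch-style dispatch only.

```python
from enum import IntEnum


class Op(IntEnum):
    PUSH_CONST = 0  # one operand: index into the constant pool
    ADD = 1
    SUB = 2
    MUL = 3
    HALT = 4


def run(code, constants):
    """Execute a flat list of ints: opcodes interleaved with their operands."""
    stack = []
    pc = 0  # program counter into `code`
    while True:
        op = code[pc]
        pc += 1
        if op == Op.PUSH_CONST:
            stack.append(constants[code[pc]])
            pc += 1  # skip the operand
        elif op in (Op.ADD, Op.SUB, Op.MUL):
            right = stack.pop()  # operands come off in reverse order
            left = stack.pop()
            if op == Op.ADD:
                stack.append(left + right)
            elif op == Op.SUB:
                stack.append(left - right)
            else:
                stack.append(left * right)
        elif op == Op.HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op} at pc={pc - 1}")


# Bytecode for 1 + 2 * 3, as a post-order lowering of the AST would emit it.
constants = [1, 2, 3]
code = [
    Op.PUSH_CONST, 0,
    Op.PUSH_CONST, 1,
    Op.PUSH_CONST, 2,
    Op.MUL,
    Op.ADD,
    Op.HALT,
]
result = run(code, constants)
```

A register-based design would instead name its operands in each instruction, trading a larger encoding for fewer dispatches.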
Phase 4 — Compiler Middle-End (IR & Optimization)
Every production compiler has a middle-end: AST → IR → optimized IR → backend. This phase introduces SSA, the CFG, and classical optimization passes.
| Lab | Title | Status | Key Concepts |
|---|---|---|---|
| cp-08 | Three-Address IR & CFG | 🟡 | TAC representation, basic blocks, control-flow graph, dominators, immediate dominator computation |
| cp-09 | SSA & Optimization Passes (MiniLang v3) | 🟡 | φ-nodes, SSA construction (Cytron's algorithm), constant folding, DCE, mem2reg, pass manager |
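As a small taste of the passes named in cp-09, here is a sketch of constant folding and dead-code elimination over straight-line three-address code. It assumes a single basic block in which every temporary is assigned exactly once (so no φ-nodes are needed) and instructions have no side effects; `Instr`, `constant_fold`, and `dead_code_eliminate` are hypothetical names.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str
    op: str      # "const", "add", "sub", or "mul"
    args: tuple  # (int,) for "const", operand names otherwise


FOLD = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def constant_fold(instrs):
    """Replace operations whose operands are all known constants."""
    known = {}  # temporary name -> constant value
    out = []
    for ins in instrs:
        if ins.op == "const":
            known[ins.dest] = ins.args[0]
            out.append(ins)
        elif ins.op in FOLD and all(a in known for a in ins.args):
            value = FOLD[ins.op](*(known[a] for a in ins.args))
            known[ins.dest] = value
            out.append(Instr(ins.dest, "const", (value,)))
        else:
            out.append(ins)
    return out


def dead_code_eliminate(instrs, live_out):
    """Backward scan: keep only instructions whose result is still needed."""
    live = set(live_out)
    kept = []
    for ins in reversed(instrs):
        if ins.dest in live:
            kept.append(ins)
            if ins.op != "const":
                live.update(ins.args)  # operands of a live instruction are live
    kept.reverse()
    return kept


program = [
    Instr("t0", "const", (2,)),
    Instr("t1", "const", (3,)),
    Instr("t2", "mul", ("t0", "t1")),
    Instr("t3", "add", ("t2", "x")),   # x is an input, not a known constant
    Instr("t4", "sub", ("t0", "t1")),  # never used afterwards
]
optimized = dead_code_eliminate(constant_fold(program), live_out={"t3"})
```

Running the two passes in this order matters here: folding turns `t2` into a constant, which is what makes `t0` and `t1` dead.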
Phase 5 — LLVM Backend (Industry Core)
LLVM is the compiler infrastructure for Clang, Swift, Rust, Julia, Mojo, and dozens of others. This phase teaches you to generate LLVM IR, run its optimizer, and produce native binaries.
| Lab | Title | Status | Key Concepts |
|---|---|---|---|
| cp-10 | LLVM IR Fundamentals | 🟡 | Module / Function / BasicBlock / Instruction hierarchy, IRBuilder, types, attributes, intrinsics |
| cp-11 | LLVM Codegen (MiniLang++) | 🟡 | AST → LLVM IR, control-flow IR patterns, calling conventions, opt pipelines, llc, native linking on macOS |
Phase 6 — JIT Compilation (LLVM ORC)
JITs make dynamic languages fast (V8, LuaJIT, HotSpot). LLVM's ORC v2 API is the industrial-strength way to embed a JIT into your runtime.
| Lab | Title | Status | Key Concepts |
|---|---|---|---|
| cp-12 | ORC JIT Runtime | 🟡 | ORC v2 layers, lazy compilation, symbol resolution, function caching, hot-path materialization |
Phase 7 — MLIR (Multi-Level IR)
MLIR is the next-generation compiler infrastructure powering TensorFlow XLA, IREE, Mojo, and Triton. This phase teaches dialect design and progressive lowering.
| Lab | Title | Status | Key Concepts |
|---|---|---|---|
| cp-13 | MiniLang MLIR Dialect & Lowering | 🟡 | Operations / Types / Dialects, TableGen, rewrite patterns, ConversionTarget, lowering to LLVM dialect |
Phase 8 — Runtime Systems
A language is more than a compiler — it needs a runtime: stack frames, a heap, a GC, and an FFI.
| Lab | Title | Status | Key Concepts |
|---|---|---|---|
| cp-14 | Stack, Heap, GC, FFI | 🟡 | Calling conventions (System V vs Apple ARM64), object headers, mark-sweep GC, root-set scanning, C FFI |
Phase 9 — Tooling
Production compilers live or die by their error messages and tooling.
| Lab | Title | Status | Key Concepts |
|---|---|---|---|
| cp-15 | Diagnostics, Modules, CLI | 🟡 | Source spans, fix-it hints (Clang-style), module loader, dependency graph, CLI driver design |
Capstones
| Lab | Title | Status | Demonstrates |
|---|---|---|---|
| cp-16 | MiniLang Compiler Suite | 🟡 | End-to-end: interpreter + VM + LLVM backend in one toolchain |
| cp-17 | JIT-Accelerated Dynamic Language | 🟡 | Python-like subset, ORC JIT, runtime specialization |
| cp-18 | MLIR-Style Compiler Framework | 🟡 | Plugin dialect registry, multi-level lowering, custom passes |
Suggested Pace
- Full-time learner: ~2 labs per week ⇒ ~9 weeks end-to-end.
- Side-project learner: ~1 lab per 1–2 weeks ⇒ ~5 months.
- Concept-only path: skim CONCEPTS.md + docs/analysis.md per lab ⇒ ~1 week to absorb the field.
Recommended Progression
    Phase 1 (cp-01, cp-02, cp-03) ── MANDATORY, in order
      │
      └─→ Phase 2 (cp-04, cp-05) ── MANDATORY (frontends pile up)
            │
            ├─→ Phase 3 (cp-06, cp-07) ── VM track
            │
            └─→ Phase 4 (cp-08, cp-09) ── IR track ── MANDATORY before Phase 5/6/7
                  │
                  ├─→ Phase 5 (cp-10, cp-11) ── LLVM backend
                  │     │
                  │     └─→ Phase 6 (cp-12) ── JIT (needs LLVM)
                  │
                  └─→ Phase 7 (cp-13) ── MLIR (parallel to LLVM)
                        │
                        └─→ Phase 8 (cp-14) ── Runtime
                              │
                              └─→ Phase 9 (cp-15) ── Tooling
                                    │
                                    └─→ Capstones (cp-16 / 17 / 18)
Phase 3 (Bytecode VM) and Phase 4 (IR) are independent, so start with whichever excites you more. Phases 5, 6, and 7 are each a serious commitment; prioritize the one most relevant to your career goals (LLVM = static compilers, JIT = dynamic languages, MLIR = ML compilers / DSLs), keeping in mind that Phase 6 builds on Phase 5.