Phases & Labs

This curriculum has 9 teaching phases and 18 labs: 15 phase labs followed by 3 capstone projects. Labs build on each other. After Phase 4, Phase 5 (LLVM) and Phase 7 (MLIR) can be started in either order; Phase 6 (JIT) builds on Phase 5.

Legend: ✅ complete · 🟡 scaffolded · ⬜ planned

Phase 1 — Frontend Foundations

Before you can compile, you must convert source text into a structured tree. This phase teaches lexing, parsing, AST design, and tree-walk interpretation.

Lab	Title	Status	Key Concepts
cp-01	Environment Setup & Toolchain	✅	Clang vs Apple Clang, target triples, LLVM toolchain, Mach-O vs ELF, CMake, `llvm-config`
cp-02	Arithmetic Evaluator	✅	Tokens, recursive descent, EBNF, precedence vs grammar nesting, associativity, AST + Visitor, post-order eval
cp-03	MiniLang v0 Frontend	🟡	Pratt parsing, statements vs expressions, blocks, functions, closures, REPL state, tree-walk interpreter
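
As a rough illustration of the cp-02 concepts (tokens, recursive descent, precedence through grammar nesting, left associativity, post-order evaluation), here is a minimal Python sketch. It is not the lab's reference solution: the names `tokenize`, `Parser`, `evaluate`, and `calc` are hypothetical, and the grammar is assumed to cover only numbers, parentheses, and the four binary operators.

```python
import operator
import re
from dataclasses import dataclass


@dataclass
class Num:
    value: float


@dataclass
class BinOp:
    op: str
    left: object
    right: object


TOKEN_RE = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/()])")
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def tokenize(src):
    """Split source text into number and operator tokens."""
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        match = TOKEN_RE.match(src, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """One method per grammar rule; deeper rules bind tighter."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        # expr := term (('+' | '-') term)*   (the loop gives left associativity)
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = BinOp(op, node, self.term())
        return node

    def term(self):
        # term := factor (('*' | '/') factor)*
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = BinOp(op, node, self.factor())
        return node

    def factor(self):
        # factor := NUMBER | '(' expr ')'
        tok = self.advance()
        if tok == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or tok in "+-*/)":
            raise SyntaxError(f"unexpected token {tok!r}")
        return Num(float(tok))


def evaluate(node):
    """Post-order walk: evaluate both children, then apply the operator."""
    if isinstance(node, Num):
        return node.value
    return OPS[node.op](evaluate(node.left), evaluate(node.right))


def calc(src):
    parser = Parser(tokenize(src))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return evaluate(tree)
```

Because `term` is called from inside `expr`, multiplication binds tighter than addition without any explicit precedence table; `calc("10 - 4 - 3")` groups as `(10 - 4) - 3` because each loop folds the new operand into the left-hand node.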

Phase 2 — Static Semantics

A type-checked language with scoped variables is the foundation of every real frontend. This phase makes MiniLang reject invalid programs before they ever run.

Lab	Title	Status	Key Concepts
cp-04	Symbol Tables & Scoping	🟡	Lexical scoping, scope stacks, closure capture, name resolution, shadowing, two-pass resolution
cp-05	Static Type System (MiniLang v1)	🟡	Hindley-Milner basics, type environments, monomorphic types, structural vs nominal typing, diagnostics
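
The scope-stack idea from cp-04 can be sketched in a few lines of Python. This is an assumption-laden illustration rather than MiniLang's actual design: the `ScopeStack` class and its method names are hypothetical, and types are represented as plain strings.

```python
class ScopeStack:
    """A stack of dictionaries; the innermost scope is the last element."""

    def __init__(self):
        self.scopes = [{}]  # start with a single global scope

    def push(self):
        self.scopes.append({})

    def pop(self):
        if len(self.scopes) == 1:
            raise RuntimeError("cannot pop the global scope")
        self.scopes.pop()

    def declare(self, name, type_name):
        # Redeclaring in the same scope is an error; shadowing an outer
        # scope's name is allowed.
        if name in self.scopes[-1]:
            raise NameError(f"'{name}' is already declared in this scope")
        self.scopes[-1][name] = type_name

    def resolve(self, name):
        # Search from the innermost scope outward.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"undeclared name '{name}'")


scopes = ScopeStack()
scopes.declare("x", "int")
scopes.push()
scopes.declare("x", "string")   # shadows the outer x
inner = scopes.resolve("x")     # "string"
scopes.pop()
outer = scopes.resolve("x")     # "int"
```

Rejecting a program "before it ever runs" then amounts to raising a diagnostic whenever `resolve` fails or `declare` finds a duplicate during the resolution pass.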

Phase 3 — Bytecode Virtual Machines

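Tree-walkers are slow because of pointer-chasing and virtual dispatch on every node. Bytecode VMs are how CPython, the JVM, V8 (Ignition), and Lua execute code before or without a JIT; a well-built one can reach roughly 10–50× the throughput of a naive tree-walker, depending on the workload.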

Lab	Title	Status	Key Concepts
cp-06	Bytecode Design & Compiler	🟡	Stack-based vs register-based VMs, opcode encoding, constant pools, AST → bytecode lowering, disassembler
cp-07	Stack VM Execution (MiniLang v2)	🟡	Computed-goto dispatch, frame layout, call/return, switch-vs-direct-threading, ICache effects
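
To make the Phase 3 pipeline concrete, the following Python sketch lowers a tiny expression AST to stack bytecode with a constant pool and then executes it. Everything here is a hypothetical stand-in for the labs' real design: the four-opcode instruction set, the tuple-based AST, and the names `compile_program` and `run` are assumptions. Python has no computed goto, so the interpreter loop uses the switch-style dispatch that cp-07 contrasts with direct threading.

```python
from enum import IntEnum


class Op(IntEnum):
    PUSH_CONST = 0  # one operand: index into the constant pool
    ADD = 1
    MUL = 2
    HALT = 3


BINARY = {"+": Op.ADD, "*": Op.MUL}


def compile_expr(node, code, constants):
    """Post-order lowering: operands first, then the operator."""
    if isinstance(node, tuple):
        op, left, right = node
        compile_expr(left, code, constants)
        compile_expr(right, code, constants)
        code.append(BINARY[op])
    else:
        constants.append(node)
        code.extend([Op.PUSH_CONST, len(constants) - 1])


def compile_program(ast):
    code, constants = [], []
    compile_expr(ast, code, constants)
    code.append(Op.HALT)
    return code, constants


def run(code, constants):
    """Switch-style dispatch loop over a flat list of opcodes and operands."""
    stack, pc = [], 0
    while True:
        op = code[pc]
        pc += 1
        if op == Op.PUSH_CONST:
            stack.append(constants[code[pc]])
            pc += 1
        elif op == Op.ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == Op.MUL:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif op == Op.HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")


# 1 + 2 * 3, written as a nested tuple AST
code, constants = compile_program(("+", 1, ("*", 2, 3)))
result = run(code, constants)
```

The interpreter never touches the tree: it walks a flat array with an integer program counter, which is where the locality advantage over a tree-walker comes from.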

Phase 4 — Compiler Middle-End (IR & Optimization)

Every production compiler has a middle-end: AST → IR → optimized IR → backend. This phase introduces SSA, the CFG, and classical optimization passes.

Lab	Title	Status	Key Concepts
cp-08	Three-Address IR & CFG	🟡	TAC representation, basic blocks, control-flow graph, dominators, immediate dominator computation
cp-09	SSA & Optimization Passes (MiniLang v3)	🟡	φ-nodes, SSA construction (Cytron's algorithm), constant folding, DCE, mem2reg, pass manager
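
Dominator computation from cp-08 can be sketched with the simple iterative data-flow formulation. This is not a faster algorithm such as Lengauer-Tarjan, and it is not necessarily what the lab implements. The CFG representation (a dict from block name to successor list, with every block present as a key) and the function names are assumptions.

```python
def dominators(cfg, entry):
    """Return {block: set of blocks that dominate it}.

    dom(entry) = {entry}
    dom(n)     = {n} | intersection of dom(p) over all predecessors p of n
    Iterate until nothing changes.
    """
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for block, succs in cfg.items():
        for succ in succs:
            preds[succ].add(block)

    dom = {n: set(nodes) for n in nodes}  # start from "everything dominates"
    dom[entry] = {entry}

    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | set.intersection(*incoming) if incoming else {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def immediate_dominator(dom, node):
    """The strict dominator closest to `node` (None for the entry block)."""
    strict = dom[node] - {node}
    # Dominators of a reachable block form a chain, so the closest one
    # is the one with the largest dominator set of its own.
    return max(strict, key=lambda d: len(dom[d])) if strict else None


# A diamond: entry branches to then/else, which both rejoin at merge.
cfg = {
    "entry": ["then", "else"],
    "then": ["merge"],
    "else": ["merge"],
    "merge": [],
}
dom = dominators(cfg, "entry")
```

In the diamond, neither branch dominates `merge`, so its immediate dominator is `entry`. Join points like this are where dominance-frontier-based SSA construction places φ-nodes.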

Phase 5 — LLVM Backend (Industry Core)

LLVM is the compiler infrastructure for Clang, Swift, Rust, Julia, Mojo, and dozens of others. This phase teaches you to generate LLVM IR, run its optimizer, and produce native binaries.

Lab	Title	Status	Key Concepts
cp-10	LLVM IR Fundamentals	🟡	Module / Function / BasicBlock / Instruction hierarchy, IRBuilder, types, attributes, intrinsics
cp-11	LLVM Codegen (MiniLang++)	🟡	AST → LLVM IR, control-flow IR patterns, calling conventions, opt pipelines, `llc`, native linking on macOS

Phase 6 — JIT Compilation (LLVM ORC)

JITs are what make dynamic and managed languages fast (V8, LuaJIT, HotSpot). LLVM's ORC v2 API is a production-grade way to embed a JIT into your runtime.

Lab	Title	Status	Key Concepts
cp-12	ORC JIT Runtime	🟡	ORC v2 layers, lazy compilation, symbol resolution, function caching, hot-path materialization

Phase 7 — MLIR (Multi-Level IR)

MLIR is the next-generation compiler infrastructure powering TensorFlow XLA, IREE, Mojo, and Triton. This phase teaches dialect design and progressive lowering.

Lab	Title	Status	Key Concepts
cp-13	MiniLang MLIR Dialect & Lowering	🟡	Operations / Types / Dialects, TableGen, rewrite patterns, `ConversionTarget`, lowering to LLVM dialect

Phase 8 — Runtime Systems

A language is more than a compiler — it needs a runtime: stack frames, a heap, a GC, and an FFI.

Lab	Title	Status	Key Concepts
cp-14	Stack, Heap, GC, FFI	🟡	Calling conventions (System V vs Apple ARM64), object headers, mark-sweep GC, root-set scanning, C FFI
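
Mark-sweep collection and root-set scanning can be modeled in a few lines of Python. This is a toy model under stated assumptions: the `Obj` and `Heap` classes are hypothetical, the mark bit stands in for a real object header, and "freeing" simply drops the object from a list instead of returning memory to an allocator.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []       # outgoing references to other heap objects
        self.marked = False  # stand-in for a mark bit in an object header


class Heap:
    def __init__(self):
        self.objects = []  # every allocated object
        self.roots = []    # stand-in for stack slots and globals

    def alloc(self, name):
        obj = Obj(name)
        self.objects.append(obj)
        return obj

    def collect(self):
        """Run one mark-sweep cycle and return the number of objects freed."""
        # Mark phase: trace everything reachable from the root set.
        worklist = list(self.roots)
        while worklist:
            obj = worklist.pop()
            if obj.marked:
                continue
            obj.marked = True
            worklist.extend(obj.refs)

        # Sweep phase: keep marked objects, drop the rest, reset mark bits.
        live = [obj for obj in self.objects if obj.marked]
        freed = len(self.objects) - len(live)
        for obj in live:
            obj.marked = False
        self.objects = live
        return freed


heap = Heap()
a, b, c, d = (heap.alloc(name) for name in "abcd")
a.refs.append(b)   # a -> b, reachable from the root
c.refs.append(d)   # c <-> d form an unreachable cycle
d.refs.append(c)
heap.roots.append(a)
freed = heap.collect()
survivors = [obj.name for obj in heap.objects]
```

The `c`/`d` cycle is reclaimed even though each object is still referenced, which is the property that distinguishes tracing collection from plain reference counting.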

Phase 9 — Tooling

Production compilers live or die by their error messages and tooling.

Lab	Title	Status	Key Concepts
cp-15	Diagnostics, Modules, CLI	🟡	Source spans, fix-it hints (Clang-style), module loader, dependency graph, CLI driver design
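
A source span is just enough position information to point at the offending text. The Python sketch below renders a caret-style diagnostic from one; the `Span` fields, the `render` function, and the output layout are assumptions for illustration and only loosely imitate Clang's format.

```python
from dataclasses import dataclass


@dataclass
class Span:
    line: int    # 1-based line number
    col: int     # 1-based column of the first character
    length: int  # number of characters covered (at least 1)


def render(source, span, message):
    """Return a three-line diagnostic: message, source line, caret marker."""
    text = source.splitlines()[span.line - 1]
    gutter = f"{span.line} | "
    padding = " " * (len(gutter) + span.col - 1)
    marker = padding + "^" + "~" * (span.length - 1)
    return f"error: {message}\n{gutter}{text}\n{marker}"


source = "let x = 1;\nlet y = z + 1;"
report = render(source, Span(line=2, col=9, length=1), "undeclared name 'z'")
```

Printing `report` shows the second source line with a caret under `z`. Carrying spans on every token and AST node from the lexer onward is what makes this kind of message possible in later phases.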

Capstones

Lab	Title	Status	Demonstrates
cp-16	MiniLang Compiler Suite	🟡	End-to-end: interpreter + VM + LLVM backend in one toolchain
cp-17	JIT-Accelerated Dynamic Language	🟡	Python-like subset, ORC JIT, runtime specialization
cp-18	MLIR-Style Compiler Framework	🟡	Plugin dialect registry, multi-level lowering, custom passes

Suggested Pace

Full-time learner: ~2 labs per week ⇒ ~9 weeks end-to-end.
Side-project learner: ~1 lab per 1–2 weeks ⇒ ~4–8 months.
Concept-only path: skim CONCEPTS.md + docs/analysis.md per lab ⇒ ~1 week for a conceptual overview of the field.

Recommended Progression

Phase 1 (cp-01, cp-02, cp-03)  ── MANDATORY, in order
   │
   └─→ Phase 2 (cp-04, cp-05)  ── MANDATORY (every later phase builds on it)
          │
          ├─→ Phase 3 (cp-06, cp-07)  ── VM track
          │
          └─→ Phase 4 (cp-08, cp-09)  ── IR track ── MANDATORY before Phase 5/6/7
                 │
                 ├─→ Phase 5 (cp-10, cp-11)  ── LLVM backend
                 │       │
                 │       └─→ Phase 6 (cp-12)  ── JIT (needs LLVM)
                 │
                 └─→ Phase 7 (cp-13)  ── MLIR (parallel to LLVM)
                        │
                        └─→ Phase 8 (cp-14)  ── Runtime
                               │
                               └─→ Phase 9 (cp-15)  ── Tooling
                                      │
                                      └─→ Capstones (cp-16 / 17 / 18)

Phase 3 (Bytecode VM) and Phase 4 (IR) are independent of each other, so start with whichever excites you more. Phases 5, 6, and 7 are each a serious commitment; prioritize the one most relevant to your career goals (LLVM = static compilers, JIT = dynamic languages, MLIR = ML compilers / DSLs), keeping in mind that Phase 6 builds on Phase 5.
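
If you want to sanity-check a personal study plan against the diagram, the prerequisites can be written down as a small graph. The Python sketch below encodes only the edges for Phases 1 to 7, as read from the diagram; the `PREREQS` table and the function names are hypothetical helpers, not part of the curriculum's tooling.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# phase -> phases that must be finished first (read off the diagram above)
PREREQS = {
    "Phase 1": [],
    "Phase 2": ["Phase 1"],
    "Phase 3": ["Phase 2"],
    "Phase 4": ["Phase 2"],
    "Phase 5": ["Phase 4"],
    "Phase 6": ["Phase 5"],
    "Phase 7": ["Phase 4"],
}


def study_order():
    """One valid order through all encoded phases."""
    return list(TopologicalSorter(PREREQS).static_order())


def is_valid_order(order):
    """True if every phase in `order` comes after all of its prerequisites."""
    done = set()
    for phase in order:
        if any(req not in done for req in PREREQS[phase]):
            return False
        done.add(phase)
    return True


mlir_first = ["Phase 1", "Phase 2", "Phase 4", "Phase 7"]
jit_without_llvm = ["Phase 1", "Phase 2", "Phase 4", "Phase 6"]
```

Under these edges `is_valid_order(mlir_first)` holds, while `is_valid_order(jit_without_llvm)` does not, because Phase 6 lists Phase 5 as a prerequisite.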
