Phases & Labs

This curriculum has 9 teaching phases and 18 labs: 15 phase labs (cp-01 to cp-15) followed by 3 capstone projects (cp-16 to cp-18). Labs build on each other. After Phase 4, Phase 5 (LLVM) and Phase 7 (MLIR) can be tackled in either order; Phase 6 (JIT) requires Phase 5.

Legend: ✅ complete · 🟡 scaffolded · ⬜ planned


Phase 1 — Frontend Foundations

Before you can compile, you must convert source text into a structured tree. This phase teaches lexing, parsing, AST design, and tree-walk interpretation.

| Lab | Title | Status | Key Concepts |
|---|---|---|---|
| cp-01 | Environment Setup & Toolchain | | Clang vs Apple Clang, target triples, LLVM toolchain, Mach-O vs ELF, CMake, llvm-config |
| cp-02 | Arithmetic Evaluator | | Tokens, recursive descent, EBNF, precedence vs grammar nesting, associativity, AST + Visitor, post-order eval |
| cp-03 | MiniLang v0 Frontend | 🟡 | Pratt parsing, statements vs expressions, blocks, functions, closures, REPL state, tree-walk interpreter |
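
To make the cp-02 concepts concrete, here is a minimal sketch of a recursive-descent arithmetic evaluator. It is not the lab's reference solution: the grammar, the tuple-based AST, and every name (`tokenize`, `Parser`, `evaluate`) are assumptions chosen for illustration. Precedence comes from grammar nesting (`expr` calls `term`, which calls `factor`), left associativity comes from the `while` loops, and evaluation is a post-order walk.

```python
import operator
import re

# One token is an integer literal or a single operator/paren character.
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.floordiv}


def tokenize(src):
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.toks, self.i = tokens, 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def take(self):
        tok = self.peek()
        self.i += 1
        return tok

    # expr := term (('+' | '-') term)*      lowest precedence
    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())  # loop makes it left-associative
        return node

    # term := factor (('*' | '/') factor)*  binds tighter than expr
    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    # factor := NUMBER | '(' expr ')'
    def factor(self):
        tok = self.take()
        if tok == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return int(tok)


def parse(src):
    parser = Parser(tokenize(src))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("trailing input")
    return tree


def evaluate(node):
    # Post-order: evaluate both children, then apply the operator.
    if isinstance(node, int):
        return node
    op, lhs, rhs = node
    return OPS[op](evaluate(lhs), evaluate(rhs))
```

With these definitions, `parse("8 - 3 - 2")` yields `("-", ("-", 8, 3), 2)`, so subtraction groups to the left and evaluates to 3.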

Phase 2 — Static Semantics

A type-checked language with scoped variables is the foundation of every real frontend. This phase makes MiniLang reject invalid programs before they ever run.

| Lab | Title | Status | Key Concepts |
|---|---|---|---|
| cp-04 | Symbol Tables & Scoping | 🟡 | Lexical scoping, scope stacks, closure capture, name resolution, shadowing, two-pass resolution |
| cp-05 | Static Type System (MiniLang v1) | 🟡 | Hindley-Milner basics, type environments, monomorphic types, structural vs nominal typing, diagnostics |
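
The scope-stack idea from cp-04 can be sketched in a few lines. This is an illustrative assumption about how such a table might look, not MiniLang's actual design; the class name `ScopeStack` and its methods are hypothetical. Each block pushes a new dictionary, lookups walk from the innermost scope outward, and an inner declaration shadows an outer one without overwriting it.

```python
class ScopeStack:
    """A stack of name -> type mappings, innermost scope last."""

    def __init__(self):
        self.scopes = [{}]  # index 0 is the global scope

    def push(self):
        self.scopes.append({})

    def pop(self):
        if len(self.scopes) == 1:
            raise RuntimeError("cannot pop the global scope")
        self.scopes.pop()

    def declare(self, name, type_name):
        scope = self.scopes[-1]
        if name in scope:
            # Redeclaring in the *same* scope is an error; shadowing is not.
            raise NameError(f"redeclaration of {name!r} in the same scope")
        scope[name] = type_name

    def resolve(self, name):
        # Search innermost to outermost so inner declarations win.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"undefined name {name!r}")
```

A checker built on this would call `push` on entering a block and `pop` on leaving it, and would report the `NameError` cases as diagnostics before the program ever runs.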

Phase 3 — Bytecode Virtual Machines

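Tree-walking interpreters are slow because every node visit chases pointers and goes through dynamic dispatch. Bytecode VMs, the design used by CPython, the JVM, V8 (Ignition), and Lua, replace the tree with a compact linear instruction stream and are often several times faster than a naive tree-walker, with the exact factor depending on the language and workload.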

| Lab | Title | Status | Key Concepts |
|---|---|---|---|
| cp-06 | Bytecode Design & Compiler | 🟡 | Stack-based vs register-based VMs, opcode encoding, constant pools, AST → bytecode lowering, disassembler |
| cp-07 | Stack VM Execution (MiniLang v2) | 🟡 | Computed-goto dispatch, frame layout, call/return, switch-vs-direct-threading, ICache effects |
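
As a rough picture of what cp-06 and cp-07 build toward, the sketch below lowers a tuple-based expression tree to a flat instruction list with a constant pool, then runs it on a stack machine. The opcode set, the encoding, and the function names are assumptions made for this example only. The dispatch loop is a plain `if` chain, which corresponds to switch-style dispatch; computed-goto and direct threading are not expressible in Python.

```python
from enum import IntEnum


class Op(IntEnum):
    PUSH = 0   # one operand: index into the constant pool
    ADD = 1
    SUB = 2
    MUL = 3
    HALT = 4


BINARY = {"+": Op.ADD, "-": Op.SUB, "*": Op.MUL}


def compile_expr(node, code, consts):
    """Post-order lowering: operands first, then the operator."""
    if isinstance(node, int):
        if node not in consts:
            consts.append(node)
        code += [Op.PUSH, consts.index(node)]
        return
    op, lhs, rhs = node
    compile_expr(lhs, code, consts)
    compile_expr(rhs, code, consts)
    code.append(BINARY[op])


def compile_program(node):
    code, consts = [], []
    compile_expr(node, code, consts)
    code.append(Op.HALT)
    return code, consts


def run(code, consts):
    stack, pc = [], 0
    while True:
        op = code[pc]
        pc += 1
        if op == Op.PUSH:
            stack.append(consts[code[pc]])
            pc += 1
        elif op == Op.HALT:
            return stack.pop()
        else:
            rhs, lhs = stack.pop(), stack.pop()  # right operand is on top
            if op == Op.ADD:
                stack.append(lhs + rhs)
            elif op == Op.SUB:
                stack.append(lhs - rhs)
            elif op == Op.MUL:
                stack.append(lhs * rhs)
            else:
                raise ValueError(f"unknown opcode {op}")
```

For the tree `("-", 10, ("*", 2, 3))` the compiler emits three pushes followed by `MUL`, `SUB`, and `HALT`, and `run` returns 4.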

Phase 4 — Compiler Middle-End (IR & Optimization)

Every production compiler has a middle-end: AST → IR → optimized IR → backend. This phase introduces SSA, the CFG, and classical optimization passes.

| Lab | Title | Status | Key Concepts |
|---|---|---|---|
| cp-08 | Three-Address IR & CFG | 🟡 | TAC representation, basic blocks, control-flow graph, dominators, immediate dominator computation |
| cp-09 | SSA & Optimization Passes (MiniLang v3) | 🟡 | φ-nodes, SSA construction (Cytron's algorithm), constant folding, DCE, mem2reg, pass manager |
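
Dominators are the piece of cp-08 that SSA construction later depends on, so a small worked example may help. The sketch below uses the simple iterative set-based formulation, Dom(n) = {n} ∪ ⋂ Dom(p) over all predecessors p; it is not the faster algorithms a production compiler would use, and the CFG encoding (a dict from block name to successor list, with every block present as a key) is an assumption for illustration.

```python
def dominators(cfg, entry):
    """Return {block: set of blocks that dominate it}."""
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)

    # Start from "everything dominates everything" and shrink to a fixpoint.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            if not preds[n]:
                continue  # unreachable block: leave it untouched
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def immediate_dominator(dom, n):
    """The closest strict dominator of n, or None for the entry block."""
    strict = dom[n] - {n}
    if not strict:
        return None
    # Strict dominators form a chain; the closest has the largest set.
    return max(strict, key=lambda d: len(dom[d]))
```

On a diamond-shaped CFG (an entry that branches to `then` and `else`, both of which jump to `merge`), neither branch dominates `merge`, so its immediate dominator is the entry block. That is exactly the situation where SSA construction places a φ-node.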

Phase 5 — LLVM Backend (Industry Core)

LLVM is the compiler infrastructure for Clang, Swift, Rust, Julia, Mojo, and dozens of others. This phase teaches you to generate LLVM IR, run its optimizer, and produce native binaries.

| Lab | Title | Status | Key Concepts |
|---|---|---|---|
| cp-10 | LLVM IR Fundamentals | 🟡 | Module / Function / BasicBlock / Instruction hierarchy, IRBuilder, types, attributes, intrinsics |
| cp-11 | LLVM Codegen (MiniLang++) | 🟡 | AST → LLVM IR, control-flow IR patterns, calling conventions, opt pipelines, llc, native linking on macOS |
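
To show the shape of AST → LLVM IR lowering without depending on an LLVM installation, the sketch below emits textual IR as a string. This is a deliberate simplification: the labs presumably use the `IRBuilder` API rather than string concatenation, and the function name `emit_function`, the tuple AST, and the `%tN` temporary naming are all assumptions for illustration. It handles only `i32` parameters and the `add`, `sub`, and `mul` instructions.

```python
import itertools

LLVM_OPS = {"+": "add", "-": "sub", "*": "mul"}


def emit_function(name, params, body):
    """Lower an expression tree over i32 parameters to textual LLVM IR.

    Leaves are ints (constants) or strings (parameter names). Parameter
    names must not collide with the generated temporaries t0, t1, ...
    """
    lines = []
    counter = itertools.count()

    def lower(node):
        if isinstance(node, int):
            return str(node)
        if isinstance(node, str):
            return f"%{node}"
        op, lhs, rhs = node
        a, b = lower(lhs), lower(rhs)
        tmp = f"%t{next(counter)}"  # each result gets a fresh SSA name
        lines.append(f"  {tmp} = {LLVM_OPS[op]} i32 {a}, {b}")
        return tmp

    result = lower(body)
    signature = ", ".join(f"i32 %{p}" for p in params)
    return "\n".join([
        f"define i32 @{name}({signature}) {{",
        "entry:",
        *lines,
        f"  ret i32 {result}",
        "}",
    ])
```

Calling `emit_function("f", ["a", "b"], ("+", ("*", "a", "a"), "b"))` produces a single-block function with one `mul`, one `add`, and a `ret`. Because every temporary is assigned exactly once, the output is already in SSA form.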

Phase 6 — JIT Compilation (LLVM ORC)

JIT compilers are what make runtimes such as V8, LuaJIT, and HotSpot fast. LLVM's ORC v2 API is its current interface for embedding a JIT in your own runtime.

| Lab | Title | Status | Key Concepts |
|---|---|---|---|
| cp-12 | ORC JIT Runtime | 🟡 | ORC v2 layers, lazy compilation, symbol resolution, function caching, hot-path materialization |
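
Lazy compilation and function caching can be illustrated independently of ORC. The sketch below is a conceptual analogy only: it does not use or mirror the ORC v2 API, and the `LazyJIT` class is hypothetical. Python's built-in `compile` and `exec` stand in for the real backend. A function's source is registered up front, compiled the first time its symbol is looked up, and served from a cache afterwards.

```python
class LazyJIT:
    """Compile a function on first lookup, then reuse the cached result."""

    def __init__(self):
        self.sources = {}       # symbol -> source text, not yet compiled
        self.compiled = {}      # symbol -> callable (the function cache)
        self.compile_count = 0  # how many times the "backend" actually ran

    def add_source(self, name, source):
        # Registering is cheap: nothing is compiled yet.
        self.sources[name] = source

    def lookup(self, name):
        if name in self.compiled:
            return self.compiled[name]
        if name not in self.sources:
            raise KeyError(f"unresolved symbol {name!r}")
        # Materialize on demand; compile()/exec() stand in for codegen.
        namespace = {}
        exec(compile(self.sources[name], f"<jit:{name}>", "exec"), namespace)
        self.compile_count += 1
        fn = namespace[name]
        self.compiled[name] = fn
        return fn
```

The point of the design is that startup cost scales with the code that actually runs, not with the code that was registered.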

Phase 7 — MLIR (Multi-Level IR)

MLIR is a newer compiler infrastructure from the LLVM project, used by TensorFlow, XLA, IREE, Mojo, and Triton. This phase teaches dialect design and progressive lowering.

| Lab | Title | Status | Key Concepts |
|---|---|---|---|
| cp-13 | MiniLang MLIR Dialect & Lowering | 🟡 | Operations / Types / Dialects, TableGen, rewrite patterns, ConversionTarget, lowering to LLVM dialect |
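
Progressive lowering can be modeled in miniature as repeated pattern rewriting until every operation belongs to a legal dialect. The sketch below is a toy model of that idea, not MLIR's C++ API: real MLIR expresses this with `RewritePattern` classes, a `ConversionTarget`, and TableGen-defined ops. The op `minilang.add` is hypothetical; `arith.addi` and `llvm.add` are borrowed from MLIR's `arith` and LLVM dialects purely as names.

```python
# An op is a tuple: (name, operand, operand, ...).

def lower_minilang_add(op):
    return [("arith.addi",) + op[1:]]


def lower_arith_addi(op):
    return [("llvm.add",) + op[1:]]


# One rewrite pattern per op name; each returns replacement ops.
PATTERNS = {
    "minilang.add": lower_minilang_add,
    "arith.addi": lower_arith_addi,
}


def is_legal(op, legal_dialects):
    # The dialect is the part of the op name before the first dot.
    return op[0].split(".")[0] in legal_dialects


def convert(ops, legal_dialects):
    """Rewrite illegal ops until only legal ones remain."""
    work = list(ops)
    while True:
        out, changed = [], False
        for op in work:
            if is_legal(op, legal_dialects):
                out.append(op)
            elif op[0] in PATTERNS:
                out.extend(PATTERNS[op[0]](op))
                changed = True
            else:
                raise ValueError(f"failed to legalize {op[0]!r}")
        work = out
        if not changed:
            return work
```

Choosing the legal set decides how far lowering goes: with `{"arith", "llvm"}` the conversion stops at `arith.addi`, while with `{"llvm"}` alone it continues down to `llvm.add`.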

Phase 8 — Runtime Systems

A language is more than a compiler — it needs a runtime: stack frames, a heap, a GC, and an FFI.

| Lab | Title | Status | Key Concepts |
|---|---|---|---|
| cp-14 | Stack, Heap, GC, FFI | 🟡 | Calling conventions (System V vs Apple ARM64), object headers, mark-sweep GC, root-set scanning, C FFI |
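
Mark-sweep is compact enough to sketch in full. The version below is a teaching model under stated assumptions: objects are Python instances rather than raw memory, the `marked` field plays the role of a mark bit in an object header, and the root set is passed in explicitly instead of being found by scanning stack frames. The names `Obj` and `Heap` are hypothetical.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []       # outgoing references to other heap objects
        self.marked = False  # stands in for a mark bit in the header


class Heap:
    def __init__(self):
        self.objects = []    # every object ever allocated and not yet freed

    def alloc(self, name):
        obj = Obj(name)
        self.objects.append(obj)
        return obj

    def collect(self, roots):
        """Run one mark-sweep cycle and return the number of objects freed."""
        # Mark phase: everything reachable from the roots is live.
        worklist = list(roots)
        while worklist:
            obj = worklist.pop()
            if obj.marked:
                continue     # already visited; this also handles cycles
            obj.marked = True
            worklist.extend(obj.refs)

        # Sweep phase: keep marked objects, drop the rest, reset mark bits.
        live = [o for o in self.objects if o.marked]
        freed = len(self.objects) - len(live)
        for o in live:
            o.marked = False
        self.objects = live
        return freed
```

Unlike reference counting, this reclaims unreachable cycles: two objects that point only at each other are never marked, so the sweep frees both.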

Phase 9 — Tooling

Production compilers live or die by their error messages and tooling.

| Lab | Title | Status | Key Concepts |
|---|---|---|---|
| cp-15 | Diagnostics, Modules, CLI | 🟡 | Source spans, fix-it hints (Clang-style), module loader, dependency graph, CLI driver design |
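
A source span plus a caret line is most of what makes a diagnostic readable. The sketch below renders an error in a layout loosely modeled on Clang's, with an optional fix-it line under the caret. The `Span` type, the `render` function, and the file name `demo.ml` are assumptions for illustration, and the code handles single-line spans only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    line: int    # 1-based line number
    col: int     # 1-based column of the first character
    length: int  # number of characters covered


def render(source, filename, span, message, fixit=None):
    """Format one error with the offending line, a caret, and a fix-it."""
    text = source.splitlines()[span.line - 1]
    indent = " " * (span.col - 1)
    marker = indent + "^" + "~" * (span.length - 1)
    out = [
        f"{filename}:{span.line}:{span.col}: error: {message}",
        text,
        marker,
    ]
    if fixit is not None:
        out.append(indent + fixit)  # suggested replacement under the caret
    return "\n".join(out)
```

For the two-line program `let x = 1` / `print(y)`, a span at line 2, column 7 with the fix-it `x` puts the caret directly under `y` and the suggestion directly under the caret.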

Capstones

| Lab | Title | Status | Demonstrates |
|---|---|---|---|
| cp-16 | MiniLang Compiler Suite | 🟡 | End-to-end: interpreter + VM + LLVM backend in one toolchain |
| cp-17 | JIT-Accelerated Dynamic Language | 🟡 | Python-like subset, ORC JIT, runtime specialization |
| cp-18 | MLIR-Style Compiler Framework | 🟡 | Plugin dialect registry, multi-level lowering, custom passes |

Suggested Pace

  • Full-time learner: ~2 labs per week ⇒ ~9 weeks end-to-end.
  • Side-project learner: ~1 lab per 1–2 weeks ⇒ ~4–8 months.
  • Concept-only path: skim CONCEPTS.md + docs/analysis.md per lab ⇒ ~1 week for a high-level overview.

Dependency Graph

Phase 1 (cp-01, cp-02, cp-03)  ── MANDATORY, in order
   │
   └─→ Phase 2 (cp-04, cp-05)  ── MANDATORY (later phases build on the frontend)
          │
          ├─→ Phase 3 (cp-06, cp-07)  ── VM track
          │
          └─→ Phase 4 (cp-08, cp-09)  ── IR track ── MANDATORY before Phase 5/6/7
                 │
                 ├─→ Phase 5 (cp-10, cp-11)  ── LLVM backend
                 │       │
                 │       └─→ Phase 6 (cp-12)  ── JIT (needs LLVM)
                 │
                 └─→ Phase 7 (cp-13)  ── MLIR (parallel to LLVM)
                        │
                        └─→ Phase 8 (cp-14)  ── Runtime
                               │
                               └─→ Phase 9 (cp-15)  ── Tooling
                                      │
                                      └─→ Capstones (cp-16 / 17 / 18)

Phase 3 (Bytecode VM) and Phase 4 (IR) are independent of each other, so pick whichever excites you first. Phases 5, 6, and 7 are each a serious commitment; start with the one most relevant to your career goals (LLVM = static compilers, JIT = dynamic languages, MLIR = ML compilers / DSLs), keeping in mind that Phase 6 (JIT) requires Phase 5 (LLVM) first.