Step 01 · Pipeline overview
minilangc is a thin orchestrator over six clean stages:
source bytes
│
▼ ml::lex
Token stream ── diagnostics? ──► render & exit 65
│
▼ ml::parse
AST (Program) ── diagnostics? ──► render & exit 65
│
▼ ml::typecheck
AST + scope info ── diagnostics? ──► render & exit 65
│
▼ ml::emitLLVMIR
"module.ll" (textual LLVM IR string)
│
▼ ml::buildExecutable → shell out to llc -filetype=obj
"module.o"
│
▼ same path → shell out to clang
executable
│
▼ ml::runExecutable
stdout text
Each arrow corresponds to one function in driver.hpp. The two phases that can fail (parse / build) return rich result structs so the CLI can format the failure however it wants.
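To make the "rich result struct" idea concrete, here is a minimal sketch. The names Diagnostic, StageResult, renderDiagnostics, and exitCodeFor are hypothetical and are not taken from driver.hpp. The 1-based positions and the file:line:col output format are also assumptions. Only the exit code 65 comes from the diagram above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical shape of one diagnostic; the real driver.hpp may differ.
struct Diagnostic {
    int line;             // assumed 1-based
    int column;           // assumed 1-based
    std::string message;
};

// Hypothetical result struct: a stage succeeds when it reports no diagnostics.
struct StageResult {
    std::vector<Diagnostic> diagnostics;
    bool ok() const { return diagnostics.empty(); }
};

// Format each diagnostic as "file:line:col: error: message", one per line.
// This is just one possible rendering; a caller is free to format differently.
std::string renderDiagnostics(const std::string& file, const StageResult& result) {
    std::string out;
    for (const Diagnostic& d : result.diagnostics) {
        assert(d.line >= 1 && d.column >= 1);
        out += file + ":" + std::to_string(d.line) + ":" +
               std::to_string(d.column) + ": error: " + d.message + "\n";
    }
    return out;
}

// Exit code 65 mirrors the "render & exit 65" branches in the diagram.
int exitCodeFor(const StageResult& result) {
    return result.ok() ? 0 : 65;
}
```

A CLI could write the rendered text to stderr and return exitCodeFor(result) from main, while another front end (an editor plugin, say) could consume the same struct and present it differently.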
Why shell out?
Linking against LLVM-as-a-library is the "right" answer for production
compilers (incremental compilation, JIT, fewer process forks). For this
capstone we shell out to llc + clang because:
- Zero LLVM CMake friction: works as long as /opt/homebrew/opt/llvm/bin exists.
- Easier to debug: you can re-run the exact llc command yourself.
- The pipeline is the same idea: only the boundary is text on disk vs. in-memory Module*.
Subsequent labs (cp-17 JIT, cp-18 MLIR) demonstrate the linked alternative.
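The shell-out step might look roughly like the sketch below, which only builds the two command strings so they can be logged and re-run by hand. shellQuote, BuildCommands, and makeBuildCommands are hypothetical names, not the actual ml::buildExecutable internals. Resolving both llc and clang from the same bin directory is also an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: wrap an argument in single quotes for a POSIX shell,
// escaping any embedded single quote as '\''.
std::string shellQuote(const std::string& arg) {
    std::string out = "'";
    for (char c : arg) {
        if (c == '\'') {
            out += "'\\''";
        } else {
            out += c;
        }
    }
    out += "'";
    return out;
}

// The two commands of the build phase, kept as text so they can be printed.
struct BuildCommands {
    std::string compile;  // llc:   <stem>.ll -> <stem>.o
    std::string link;     // clang: <stem>.o  -> <stem>
};

// Assumes both tools live under llvmBin (e.g. a Homebrew LLVM install).
BuildCommands makeBuildCommands(const std::string& llvmBin, const std::string& stem) {
    assert(!stem.empty());
    const std::string ll = stem + ".ll";
    const std::string obj = stem + ".o";
    BuildCommands cmds;
    cmds.compile = shellQuote(llvmBin + "/llc") + " -filetype=obj " +
                   shellQuote(ll) + " -o " + shellQuote(obj);
    cmds.link = shellQuote(llvmBin + "/clang") + " " + shellQuote(obj) +
                " -o " + shellQuote(stem);
    return cmds;
}
```

Each string could then be passed to std::system from <cstdlib>, with a non-zero return value treated as a build failure. Keeping the commands as plain text is what makes the "re-run it yourself" debugging workflow possible.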
Stages, separately
Want only the IR? minilangc emit-ir foo.ml > foo.ll.
Want only typecheck? minilangc check foo.ml.
Want everything? minilangc run foo.ml.
This separability is the architecture. Each stage's output is serialisable (tokens → JSON, AST → JSON, IR → text), so you can mix & match: write a third-party formatter, a linter, a documentation generator, all on the same frontend.
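One way to picture this separability is a subcommand that simply names the last stage to run. The sketch below is an assumption about how such a dispatch could work, not minilangc's actual CLI code. The Stage enum, kStageNames, and stagesFor are hypothetical, and it assumes check runs every stage up to and including typecheck.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stage list, in pipeline order.
enum class Stage { Lex, Parse, Typecheck, EmitIR, Build, Run };

const char* const kStageNames[] = {
    "lex", "parse", "typecheck", "emitLLVMIR", "buildExecutable", "runExecutable"};

static_assert(sizeof(kStageNames) / sizeof(kStageNames[0]) ==
                  static_cast<std::size_t>(Stage::Run) + 1,
              "one name per Stage enumerator");

// Return the stages a subcommand would execute, in order.
// An unknown subcommand yields an empty list.
std::vector<std::string> stagesFor(const std::string& subcommand) {
    Stage last;
    if (subcommand == "check") {
        last = Stage::Typecheck;
    } else if (subcommand == "emit-ir") {
        last = Stage::EmitIR;
    } else if (subcommand == "run") {
        last = Stage::Run;
    } else {
        return {};
    }
    std::vector<std::string> stages;
    for (std::size_t i = 0; i <= static_cast<std::size_t>(last); ++i) {
        stages.push_back(kStageNames[i]);
    }
    assert(!stages.empty());
    return stages;
}
```

Under these assumptions, a third-party tool such as a linter would only need the prefix of the list that ends at typecheck, and would never touch llc or clang.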