Step 01 · Pipeline overview

minilangc is a thin orchestrator over six clean stages:

   source bytes
       │
       ▼  ml::lex
   Token stream  ── diagnostics? ──► render & exit 65
       │
       ▼  ml::parse
   AST (Program)  ── diagnostics? ──► render & exit 65
       │
       ▼  ml::typecheck
   AST + scope info  ── diagnostics? ──► render & exit 65
       │
       ▼  ml::emitLLVMIR
   "module.ll"  (textual LLVM IR string)
       │
       ▼  ml::buildExecutable  → shell out to llc -filetype=obj
   "module.o"
       │
       ▼  same path             → shell out to clang
   executable
       │
       ▼  ml::runExecutable
   stdout text

Each arrow corresponds to a function in driver.hpp; the llc and clang arrows share ml::buildExecutable, which is why seven arrows map onto six stages. The two phases that can fail (parse / build) return rich result structs so the CLI can format the failure however it wants.
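The "stop at the first stage that reports diagnostics" flow from the diagram might look roughly like the sketch below. Everything in namespace sketch is hypothetical: Diagnostic, StageResult, runFrontEnd and the stub lex/parse are stand-ins invented for illustration, not the real declarations in driver.hpp. Only the exit code 65 comes from the diagram.

```cpp
#include <string>
#include <vector>

namespace sketch {

// Hypothetical diagnostic record; the real one likely carries more fields.
struct Diagnostic {
    int line;
    std::string message;
};

// Hypothetical "rich result": a payload plus any diagnostics the stage produced.
template <typename T>
struct StageResult {
    T value{};
    std::vector<Diagnostic> diagnostics;
    bool ok() const { return diagnostics.empty(); }
};

constexpr int kExitOk = 0;
constexpr int kExitDiagnostics = 65;  // exit code shown in the diagram

// Stub lexer (assumption): rejects empty input, otherwise yields one "token".
inline StageResult<std::vector<std::string>> lex(const std::string& src) {
    StageResult<std::vector<std::string>> r;
    if (src.empty()) r.diagnostics.push_back(Diagnostic{1, "empty source"});
    else r.value.push_back(src);
    return r;
}

// Stub parser (assumption): rejects a lone "?" token, otherwise yields "Program".
inline StageResult<std::string> parse(const std::vector<std::string>& tokens) {
    StageResult<std::string> r;
    if (tokens.front() == "?") r.diagnostics.push_back(Diagnostic{1, "unexpected token"});
    else r.value = "Program";
    return r;
}

// Runs the stages in order and stops at the first one that reports diagnostics,
// handing them back to the caller so the CLI decides how to render them.
inline int runFrontEnd(const std::string& src, std::vector<Diagnostic>& out) {
    auto tokens = lex(src);
    if (!tokens.ok()) { out = tokens.diagnostics; return kExitDiagnostics; }
    auto ast = parse(tokens.value);
    if (!ast.ok()) { out = ast.diagnostics; return kExitDiagnostics; }
    return kExitOk;
}

}  // namespace sketch
```

The driver never prints anything itself: it only returns a status and the collected diagnostics, which is what leaves the formatting decision to the CLI.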

Why shell out?

Linking against LLVM-as-a-library is the "right" answer for production compilers (incremental compilation, JIT, fewer process forks). For this capstone we shell out to llc + clang because:

  • Zero LLVM CMake friction — works as long as /opt/homebrew/opt/llvm/bin exists.
  • Easier to debug — you can re-run the exact llc command yourself.
  • Same pipeline either way: the only difference is the boundary, which is text on disk instead of an in-memory Module*.

Subsequent labs (cp-17 JIT, cp-18 MLIR) demonstrate the linked alternative.
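As a rough illustration of the shell-out boundary, the two commands could be assembled and run as below. This is a sketch under assumptions, not the actual ml::buildExecutable: BuildResult, shellQuote, llcCommand and clangCommand are hypothetical names, and llc and clang are assumed to be on PATH. The only flag taken from the diagram is llc -filetype=obj.

```cpp
#include <cstdlib>
#include <string>

namespace sketch {

// Hypothetical outcome struct; field names are illustrative only.
struct BuildResult {
    bool ok;
    std::string failedCommand;  // empty on success
};

// Wraps a string in single quotes for a POSIX shell, escaping embedded quotes.
inline std::string shellQuote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";
        else out += c;
    }
    out += "'";
    return out;
}

// Command that lowers textual IR (.ll) to an object file.
inline std::string llcCommand(const std::string& llPath, const std::string& objPath) {
    return "llc -filetype=obj " + shellQuote(llPath) + " -o " + shellQuote(objPath);
}

// Command that links the object file into an executable.
inline std::string clangCommand(const std::string& objPath, const std::string& exePath) {
    return "clang " + shellQuote(objPath) + " -o " + shellQuote(exePath);
}

// Runs both commands in order; any non-zero status from std::system is
// treated as failure and the offending command is reported back.
inline BuildResult buildExecutable(const std::string& llPath,
                                   const std::string& objPath,
                                   const std::string& exePath) {
    const std::string steps[] = {llcCommand(llPath, objPath),
                                 clangCommand(objPath, exePath)};
    for (const std::string& cmd : steps) {
        if (std::system(cmd.c_str()) != 0) return BuildResult{false, cmd};
    }
    return BuildResult{true, ""};
}

}  // namespace sketch
```

Returning the failed command verbatim is what makes the "re-run the exact llc command yourself" workflow practical: the CLI can print it and the user can paste it into a shell.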

Stages, separately

Each subcommand stops the pipeline at a different stage:

  • Only the IR: minilangc emit-ir foo.ml > foo.ll
  • Only typecheck: minilangc check foo.ml
  • Everything: minilangc run foo.ml
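One way to model these subcommands is as a "stop after this stage" marker that the driver consults. The sketch below is hypothetical: the StopAfter enum, parseSubcommand and needsBackend are invented for illustration, and only the three subcommand names (check, emit-ir, run) come from the text.

```cpp
#include <string>

namespace sketch {

// Last stage the driver should execute for a given subcommand.
enum class StopAfter { Unknown, Typecheck, EmitIR, Run };

// Maps a subcommand name to the stage where the pipeline stops.
inline StopAfter parseSubcommand(const std::string& name) {
    if (name == "check") return StopAfter::Typecheck;
    if (name == "emit-ir") return StopAfter::EmitIR;
    if (name == "run") return StopAfter::Run;
    return StopAfter::Unknown;
}

// Only a full run needs the llc + clang back end; check and emit-ir
// finish before any external process is spawned.
inline bool needsBackend(StopAfter stage) {
    return stage == StopAfter::Run;
}

}  // namespace sketch
```

With this shape, adding a new subcommand means adding one enum value and one line in parseSubcommand, while the stage functions themselves stay untouched.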

This separability is the architecture. Each stage's output is serialisable (tokens → JSON, AST → JSON, IR → text), so you can mix & match: write a third-party formatter, a linter, a documentation generator, all on the same frontend.
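To make "tokens → JSON" concrete, here is a minimal serialiser sketch. The Token layout (kind, text, line) and the JSON field names are assumptions made for illustration; the real token type and output schema may differ. The escaping covers only quotes, backslashes, newlines and tabs, so it is not a complete JSON encoder.

```cpp
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical token shape; the real lexer's Token may have other fields.
struct Token {
    std::string kind;
    std::string text;
    int line;
};

// Escapes the characters most likely to appear in source text.
// Other control characters are passed through unchanged (sketch limitation).
inline std::string jsonEscape(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\t': out += "\\t";  break;
            default:   out += c;      break;
        }
    }
    return out;
}

// Serialises a token stream as a compact JSON array of objects.
inline std::string tokensToJson(const std::vector<Token>& tokens) {
    std::string out = "[";
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (i != 0) out += ",";
        out += "{\"kind\":\"" + jsonEscape(tokens[i].kind) +
               "\",\"text\":\"" + jsonEscape(tokens[i].text) +
               "\",\"line\":" + std::to_string(tokens[i].line) + "}";
    }
    out += "]";
    return out;
}

}  // namespace sketch
```

A formatter or linter written in any language could then consume this array without linking against the compiler, which is the point of keeping each stage's output serialisable.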