Step 05 · Pretty printing and formatting
A formatter is a function AST → canonical source. Round-tripping
through parse ∘ format should be the identity on the AST (and
idempotent on the formatted text). Our formatter is in
format.cpp; it's tiny because the grammar
is tiny.
Key design choices:
Precedence-aware parenthesisation
void writeExpr(os, e, parentPrec) {
case Bin: {
int p = precOf(e.op);
bool wrap = parentPrec > p;
if (wrap) os << "(";
writeExpr(os, *e.lhs, p);
os << " " << e.op << " ";
writeExpr(os, *e.rhs, p + 1); // right-bias for left-assoc
if (wrap) os << ")";
}
}
parentPrec > p wraps when the parent expects tighter precedence.
The p + 1 on the right side preserves left-associativity:
(1 - 2) - 3 keeps its inner parens (because subtraction isn't
associative) but 1 - 2 - 3 doesn't need any.
Insertion of canonical whitespace
let x=1+2*3; print x ; → let x = 1 + 2 * 3;\nprint x;\n.
A formatter has one correct output per AST. Don't make whitespace configurable; that's gofmt's wisdom. Once teams agree, debates disappear.
Comments are hard
Our toy formatter strips comments because the parser drops them. A real formatter needs to:
- Carry comments through the AST (attach to neighbouring nodes).
- Distinguish leading vs. trailing comments.
- Reflow long lines without orphaning the comment.
This is where rustfmt, prettier, and gofmt all sink most of their implementation budget. The simple case (no comments) is fine for our educational scope.
Beyond plain text
- AST → diff for refactorings (rename a variable, output is a formatted version with the new name everywhere).
- AST → HTML for documentation (syntax-highlighted source).
- AST → AST transformations (CST-preserving rewrites for codemod-style tools).
These all start with a clean formatter.