Step 05 · Pretty printing and formatting

A formatter is a function AST → canonical source. Round-tripping through parse ∘ format should be the identity on the AST (and idempotent on the formatted text). Our formatter is in format.cpp; it's tiny because the grammar is tiny.

Key design choices:

Precedence-aware parenthesisation

void writeExpr(os, e, parentPrec) {
    case Bin: {
        int p = precOf(e.op);
        bool wrap = parentPrec > p;
        if (wrap) os << "(";
        writeExpr(os, *e.lhs, p);
        os << " " << e.op << " ";
        writeExpr(os, *e.rhs, p + 1);  // right-bias for left-assoc
        if (wrap) os << ")";
    }
}

parentPrec > p wraps when the parent expects tighter precedence. The p + 1 on the right side preserves left-associativity: (1 - 2) - 3 keeps its inner parens (because subtraction isn't associative) but 1 - 2 - 3 doesn't need any.

Insertion of canonical whitespace

let x=1+2*3; print x ;let x = 1 + 2 * 3;\nprint x;\n.

A formatter has one correct output per AST. Don't make whitespace configurable; that's gofmt's wisdom. Once teams agree, debates disappear.

Comments are hard

Our toy formatter strips comments because the parser drops them. A real formatter needs to:

  • Carry comments through the AST (attach to neighbouring nodes).
  • Distinguish leading vs. trailing comments.
  • Reflow long lines without orphaning the comment.

This is where rustfmt, prettier, and gofmt all sink most of their implementation budget. The simple case (no comments) is fine for our educational scope.

Beyond plain text

  • AST → diff for refactorings (rename a variable, output is a formatted version with the new name everywhere).
  • AST → HTML for documentation (syntax-highlighted source).
  • AST → AST transformations (CST-preserving rewrites for codemod-style tools).

These all start with a clean formatter.