Step 04 · Parser error recovery

A parser that aborts on the first error is useless for IDEs and frustrating in the CLI. Error recovery is the difference between "fix one thing at a time" and "see the whole picture, fix everything in one pass".

cp-15's parser uses panic-mode recovery, the simplest strategy that works:

Program parseProgram() {
    Program p;
    while (peek().kind != Tok::Eof) {
        auto s = parseStmt();          // empty on a syntax error
        if (s) p.stmts.push_back(std::move(*s));
        else   skipToSyncPoint();      // resync
    }
    return p;
}

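void skipToSyncPoint() {
    // Discard tokens up to the next ';' (or end of input)...
    while (peek().kind != Tok::Eof && peek().kind != Tok::Semi) ++i;
    accept(Tok::Semi);                 // ...then consume the ';' itself, if present
}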

The semicolon (;) is our synchronisation token. After an error inside a statement, we discard everything up to and including the next ; and start fresh. This guarantees:

  • The parser terminates (no infinite loops on bad input).
  • Subsequent valid statements still get parsed.
  • Each broken statement reports roughly one error instead of a cascade of follow-on errors, so the error count tracks the number of real mistakes. (Cascade failures are a common parser-design pitfall.)
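
To see these properties on a concrete input, here is a minimal, self-contained sketch of the same loop. It is not cp-15's actual parser: the toy grammar (stmt := Ident '=' Num ';'), the bare Tok enum, and the parsed/errors fields are assumptions made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy grammar:  stmt := Ident '=' Num ';'
enum class Tok { Ident, Eq, Num, Semi, Eof };

struct Parser {
    std::vector<Tok> toks;            // token stream, must end with Tok::Eof
    std::size_t i = 0;                // index of the current token
    std::vector<std::string> errors;  // one entry per failed statement
    int parsed = 0;                   // number of well-formed statements

    explicit Parser(std::vector<Tok> t) : toks(std::move(t)) {
        assert(!toks.empty() && toks.back() == Tok::Eof);
    }

    Tok peek() const { return toks[i]; }

    // Consume the current token only if it has the expected kind.
    bool accept(Tok k) {
        if (peek() != k) return false;
        ++i;
        return true;
    }

    // Records one diagnostic and returns false at the first mismatch.
    bool parseStmt() {
        for (Tok want : {Tok::Ident, Tok::Eq, Tok::Num, Tok::Semi}) {
            if (!accept(want)) {
                errors.push_back("unexpected token at index " + std::to_string(i));
                return false;
            }
        }
        return true;
    }

    // Panic mode: drop tokens up to the next ';', then consume the ';'.
    void skipToSyncPoint() {
        while (peek() != Tok::Eof && peek() != Tok::Semi) ++i;
        accept(Tok::Semi);
    }

    void parseProgram() {
        while (peek() != Tok::Eof) {
            if (parseStmt()) ++parsed;
            else             skipToSyncPoint();
        }
    }
};
```

Fed the tokens for "a = 1; = = ; b = 2;", this sketch records a single error for the middle statement and still counts both valid statements. Termination follows because every failed statement consumes at least one token: either the skipped tokens or the ; itself. The cost of a single sync token also shows up here: with a missing ;, as in "a = 1 b = 2;", the skip swallows the following statement as well.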

Better recovery strategies

  • Token deletion / insertion: try plausible edits (insert a missing ')', delete a stray '+') and continue. Powerful but combinatorial.
  • Phrase-level recovery: define multi-token sync sets per non-terminal. Statements sync on ';', '{', 'fn', 'while' and 'if'; expressions sync on ')', ';' and ','.
  • Tree-sitter / GLR: parse as much as possible, leaving "ERROR" nodes in the tree. Fast enough to re-run on every keystroke.
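
As a rough illustration of the phrase-level variant, the sketch below skips to the nearest token in a per-non-terminal sync set. The sets mirror the ones listed above; the Tok enum, the function names, and the index-passing interface are hypothetical and not part of cp-15.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical token kinds, just enough to express the two sync sets.
enum class Tok { Ident, Num, Plus, LParen, RParen, Comma, Semi,
                 LBrace, KwFn, KwWhile, KwIf, Eof };

const std::set<Tok> kStmtSync = {Tok::Semi, Tok::LBrace, Tok::KwFn,
                                 Tok::KwWhile, Tok::KwIf};
const std::set<Tok> kExprSync = {Tok::RParen, Tok::Semi, Tok::Comma};

// Advance from index i to the first token that is in `sync` (or to Eof)
// and return that index. The sync token itself is not consumed.
std::size_t skipToSyncSet(const std::vector<Tok>& toks, std::size_t i,
                          const std::set<Tok>& sync) {
    while (i < toks.size() && toks[i] != Tok::Eof && sync.count(toks[i]) == 0) {
        ++i;
    }
    return i;
}

// Statement-level resync: a ';' ends the broken statement, so it is
// consumed; a keyword or '{' starts the next statement, so it is kept.
std::size_t resyncStmt(const std::vector<Tok>& toks, std::size_t i) {
    i = skipToSyncSet(toks, i, kStmtSync);
    if (i < toks.size() && toks[i] == Tok::Semi) ++i;
    return i;
}
```

Compared with the single-token version, an error inside "if ( + + while x ;" resumes at while rather than after the ;, so the nested statement is not thrown away. One caveat with this design: if a statement fails without consuming anything while the current token is already a keyword in the sync set, resyncStmt returns the same index, so the caller has to skip one token itself to guarantee progress.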

For a small language, panic mode with a single sync token delivers most of the benefit of these strategies for a fraction of the code.

Don't forget the lexer

The lexer must also recover. When it hits a character it does not recognise, lex emits a diagnostic for that character and advances by one byte:

out.diagnostics.push_back(Diagnostic{...});
++i;

Skipping the whole rest of the file on a single bad character would be a denial-of-service vector for IDE users mid-typing.
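
A self-contained sketch of that behaviour follows. The Diagnostic fields, the LexResult struct, and the tiny token set are assumptions for illustration; the real lex and Diagnostic types are not shown in this step.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical shapes; the real Diagnostic fields are elided in the text.
struct Diagnostic { std::size_t offset; std::string message; };
struct LexResult {
    std::vector<std::string> tokens;
    std::vector<Diagnostic> diagnostics;
};

// True for bytes that may appear in an identifier or number.
inline bool isWordByte(char ch) {
    return std::isalnum(static_cast<unsigned char>(ch)) != 0 || ch == '_';
}

LexResult lex(const std::string& src) {
    LexResult out;
    std::size_t i = 0;
    while (i < src.size()) {
        char c = src[i];
        if (std::isspace(static_cast<unsigned char>(c))) { ++i; continue; }
        if (isWordByte(c)) {
            std::size_t start = i;
            while (i < src.size() && isWordByte(src[i])) ++i;
            out.tokens.push_back(src.substr(start, i - start));
            continue;
        }
        if (c == ';' || c == '=' || c == '+' || c == '(' || c == ')') {
            out.tokens.push_back(std::string(1, c));
            ++i;
            continue;
        }
        // Unknown byte: report it, skip exactly one byte, keep lexing.
        out.diagnostics.push_back(
            Diagnostic{i, std::string("unexpected character '") + c + "'"});
        ++i;
    }
    return out;
}
```

With the input "a = 1 $ ; b = 2;" this version reports one diagnostic at offset 6 and still produces all eight tokens around it, so the parser sees both statements.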