Step 04 · Parser error recovery
A parser that aborts on the first error is useless for IDEs and frustrating in the CLI. Error recovery is the difference between "fix one thing at a time" and "see the whole picture, fix everything in one pass".
cp-15's parser uses panic mode recovery — the simplest strategy that works:
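```cpp
Program parseProgram() {
    Program p;
    while (peek().kind != Tok::Eof) {
        auto s = parseStmt();                 // empty on a parse error
        if (s) p.stmts.push_back(std::move(*s));
        else skipToSyncPoint();               // resync after an error
    }
    return p;
}

// Discard tokens up to and including the next ';' (or stop at end of file).
void skipToSyncPoint() {
    while (peek().kind != Tok::Eof && peek().kind != Tok::Semi) ++i;
    accept(Tok::Semi);
}
```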
`;` is our synchronisation token. After an error inside a statement, we discard every token up to and including the next `;` and start fresh.
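This guarantees:
- The parser terminates (no infinite loops on bad input): every failed statement consumes at least one token, because `skipToSyncPoint` either advances past a non-`;` token or consumes the `;` itself.
- Subsequent valid statements still get parsed, starting with the first one after the next `;`. (If the mistake is a missing `;`, the statement that follows is swallowed along with the broken one.)
- The number of errors reported roughly tracks the number of real mistakes, instead of each mistake setting off a cascade of follow-on errors (cascading failures are a common parser-design pitfall).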
Better recovery strategies
- Token deletion / insertion: try plausible single-token edits (insert a missing `)`, delete a stray `+`) and continue. Powerful but combinatorial.
- Phrase-level recovery: define multi-token sync sets per non-terminal. Statements sync on `;`, `{`, `fn`, `while`, `if`; expressions sync on `)`, `;`, `,`.
- Tree-sitter / GLR: parse as much as possible, leaving "ERROR" nodes in the tree. Fast enough to re-run on every keystroke.
For a small language, panic mode with one sync token is 95% as useful as any of these and 10× less code.
Don't forget the lexer
The lexer must also recover. When `lex` hits a character it does not recognise, it emits a diagnostic for that character and advances by one byte:
```cpp
out.diagnostics.push_back(Diagnostic{...});  // report the bad character
++i;                                         // skip exactly one byte, keep lexing
```
Skipping the whole rest of the file on a single bad character would be a denial-of-service vector for IDE users mid-typing.