Step 02 · Source spans and locations
A span = (start_offset, length) in the source buffer. A
location = (line, column). We compute a location from an
offset on demand; only offsets are stored.
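class SourceFile {
public:
    Loc loc(size_t offset) const;     // offset -> 1-based (line, column)
private:
    std::string text_;
    std::vector<size_t> lineStarts_;  // offset of each line start; lineStarts_[0] must be 0
};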
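Loc SourceFile::loc(size_t offset) const {
    // First line start strictly greater than offset; the line holding offset is the one before it.
    auto it = std::upper_bound(lineStarts_.begin(), lineStarts_.end(), offset);
    int line = static_cast<int>(it - lineStarts_.begin());  // 1-based line number
    return {line, static_cast<int>(offset - lineStarts_[line - 1]) + 1};  // 1-based column
}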
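- Why store offsets, not (line, col)? An offset is a single integer, so spans compare in constant time and are easy to hash and deduplicate. Offsets also index the buffer directly, while (line, col) is derived from the line table and only matters for display. Offsets are per-file.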
- Why binary-search lookup? O(log lines) is fast enough for diagnostics. We only convert offsets → (line, col) at print time.
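The lookup above depends on a line-start table that the section does not show being built. A minimal sketch, assuming lines are terminated by '\n' and columns are counted in bytes; buildLineStarts and locate are hypothetical free-function names used only for illustration, not part of cp-15:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Loc {
    int line;    // 1-based
    int column;  // 1-based, counted in bytes
};

// Hypothetical helper: record the offset at which each line begins.
std::vector<std::size_t> buildLineStarts(const std::string& text) {
    std::vector<std::size_t> starts{0};  // line 1 always starts at offset 0
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\n') starts.push_back(i + 1);  // next line begins after the newline
    }
    return starts;
}

// Same binary search as SourceFile::loc, written as a free function.
Loc locate(const std::vector<std::size_t>& lineStarts, std::size_t offset) {
    assert(!lineStarts.empty() && lineStarts.front() == 0);
    auto it = std::upper_bound(lineStarts.begin(), lineStarts.end(), offset);
    int line = static_cast<int>(it - lineStarts.begin());
    return {line, static_cast<int>(offset - lineStarts[line - 1]) + 1};
}
```

For the text "let x\n= 1;\n" the table is {0, 6, 11}, and offset 8 (the character 1) maps to line 2, column 3. Because the first entry is always 0, upper_bound never returns begin(), so lineStarts[line - 1] is always a valid index.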
Span propagation
Tokens carry spans straight from the lexer. AST nodes either copy their primary token's span (literals, identifiers) or merge:
e->span = {lhs->span.start,
           rhs->span.start + rhs->span.length - lhs->span.start};
A binary expression's span covers lhs op rhs end-to-end. This is
crucial for IDE highlighting: hover over 1 + 2, the whole
expression lights up.
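The merge arithmetic can be checked in isolation. A small sketch, assuming a Span struct with start and length fields as the snippet above implies; the merge helper is a hypothetical name, and it assumes the left operand starts at or before the right one and does not extend past it:

```cpp
#include <cassert>
#include <cstddef>

struct Span {
    std::size_t start;   // byte offset of the first character
    std::size_t length;  // number of bytes covered
};

// Hypothetical helper: the span running from the start of lhs to the end of rhs.
Span merge(const Span& lhs, const Span& rhs) {
    assert(lhs.start <= rhs.start);  // operands appear in source order
    return {lhs.start, rhs.start + rhs.length - lhs.start};
}
```

For the source 1 + 2, the literal 1 has span (0, 1) and the literal 2 has span (4, 1), so the merged span is (0, 5): the whole expression, operator and spaces included.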
Multi-file
Real compilers carry a (fileId, span) pair. cp-15 uses one file at
a time because that's all the REPL and CLI need; extending to a
SourceMap of files is mechanical (vector<SourceFile> keyed by id).
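One way the multi-file extension might look, as a sketch only. FileId, FileSpan, and the add/file/text members are hypothetical names, and SourceFile here is a pared-down stand-in (name and text, no line table) rather than the class shown earlier:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using FileId = std::uint32_t;  // hypothetical: index into the SourceMap's vector

struct Span {
    std::size_t start;
    std::size_t length;
};

struct FileSpan {  // the (fileId, span) pair
    FileId file;
    Span span;
};

struct SourceFile {  // stand-in: name and text only
    std::string name;
    std::string text;
};

class SourceMap {
public:
    // Register a file; its id is its position in the vector.
    FileId add(std::string name, std::string text) {
        files_.push_back({std::move(name), std::move(text)});
        return static_cast<FileId>(files_.size() - 1);
    }
    const SourceFile& file(FileId id) const {
        assert(id < files_.size());
        return files_[id];
    }
    // The source text a span covers, e.g. for quoting in a diagnostic.
    std::string text(const FileSpan& fs) const {
        return file(fs.file).text.substr(fs.span.start, fs.span.length);
    }
private:
    std::vector<SourceFile> files_;
};
```

Ids are handed out in insertion order and never reused, so a FileSpan stays valid for as long as the SourceMap lives. Spans remain per-file offsets; only the id in front of them is new.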