Step 02 · Source spans and locations

A span is a (start_offset, length) pair into the source buffer. A location is a (line, column) pair, both 1-based. We compute locations from offsets on demand.

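class SourceFile {
public:
    Loc loc(size_t offset) const;     // offset → 1-based (line, column)
private:
    std::string         text_;
    std::vector<size_t> lineStarts_;  // offset of each line's first byte; lineStarts_[0] == 0
};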

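Loc SourceFile::loc(size_t offset) const {
    // upper_bound finds the first line start strictly past offset; its index is
    // the 1-based line number. Assumes lineStarts_[0] == 0, so line >= 1.
    auto it = std::upper_bound(lineStarts_.begin(), lineStarts_.end(), offset);
    int  line = static_cast<int>(it - lineStarts_.begin());
    return {line, static_cast<int>(offset - lineStarts_[line - 1]) + 1};
}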
  • Why store offsets, not (line, col)? An offset is a single integer, so spans are cheap to compare, deduplicate, and hash. Lines change with edits; offsets don't (per-file).
  • Why binary-search lookup? O(log n) in the number of lines is fast enough for diagnostics, because we only convert offsets → (line, col) at print time.
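
The class above does not show how lineStarts_ gets filled. Here is a minimal, self-contained sketch, assuming lines end in '\n' and columns are counted in bytes. The constructor and the Loc field names are illustrative assumptions, not necessarily what cp-15 does:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Loc {
    int line;    // 1-based
    int column;  // 1-based, counted in bytes (assumption)
};

class SourceFile {
public:
    // Assumed constructor: scan the text once and record where each line starts.
    explicit SourceFile(std::string text) : text_(std::move(text)) {
        lineStarts_.push_back(0);  // line 1 always starts at offset 0
        for (std::size_t i = 0; i < text_.size(); ++i) {
            if (text_[i] == '\n') lineStarts_.push_back(i + 1);
        }
    }

    Loc loc(std::size_t offset) const {
        assert(offset <= text_.size());  // text_.size() itself is allowed (EOF)
        // First line start strictly past offset; the line containing offset
        // is the one just before it.
        auto it = std::upper_bound(lineStarts_.begin(), lineStarts_.end(), offset);
        std::size_t line = static_cast<std::size_t>(it - lineStarts_.begin());
        return {static_cast<int>(line),
                static_cast<int>(offset - lineStarts_[line - 1]) + 1};
    }

private:
    std::string              text_;
    std::vector<std::size_t> lineStarts_;  // sorted ascending by construction
};
```

For the text "ab\ncd\n\nx" the table is {0, 3, 6, 7}, so offset 4 (the 'd') maps to line 2, column 2. Because the table always contains 0, upper_bound never returns begin() and the line - 1 index stays in range. Tabs and multi-byte UTF-8 would need extra handling if columns are meant to match what an editor displays.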

Span propagation

Tokens carry spans straight from the lexer. AST nodes either copy their primary token's span (literals, identifiers) or merge the spans of their children:

e->span = {lhs->span.start,
           rhs->span.start + rhs->span.length - lhs->span.start};

A binary expression's span covers lhs op rhs end-to-end. This is crucial for IDE highlighting: hover over 1 + 2 and the whole expression lights up.
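
The merge arithmetic is easy to get wrong by one, so it may be worth wrapping in a helper. Below is a sketch with hypothetical names (Span::end and merge are not taken from cp-15), assuming spans are half-open byte ranges and the operands appear in source order:

```cpp
#include <cassert>
#include <cstddef>

struct Span {
    std::size_t start;
    std::size_t length;
    std::size_t end() const { return start + length; }  // one past the last byte
};

// Smallest span running from the start of lhs to the end of rhs.
// Assumes lhs starts no later than rhs ends (operands in source order).
inline Span merge(Span lhs, Span rhs) {
    assert(lhs.start <= rhs.end());
    return {lhs.start, rhs.end() - lhs.start};
}
```

In the source text "x = 1 + 2", the literal 1 sits at offset 4 and the literal 2 at offset 8, each of length 1. merge({4, 1}, {8, 1}) gives {4, 5}, which covers exactly "1 + 2", operator and spaces included.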

Multi-file

Real compilers carry a (fileId, span) pair. cp-15 uses one file at a time because that's all the REPL and CLI need; extending to a SourceMap of files is mechanical (vector<SourceFile> keyed by id).
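
One possible shape for that extension is sketched below. FileId, FileSpan, and the SourceMap methods are hypothetical names chosen for illustration, and SourceFile is reduced to a stand-in holding only a name and its text:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct Span { std::size_t start; std::size_t length; };

// Stand-in for the real SourceFile, reduced to what this sketch needs.
struct SourceFile {
    std::string name;
    std::string text;
};

using FileId = std::uint32_t;  // hypothetical: index into SourceMap's vector

struct FileSpan {  // the (fileId, span) pair
    FileId file;
    Span   span;
};

class SourceMap {
public:
    // Registers a file and returns its id. Ids are handed out in order
    // and never reused.
    FileId add(std::string name, std::string text) {
        files_.push_back({std::move(name), std::move(text)});
        return static_cast<FileId>(files_.size() - 1);
    }

    // The reference is invalidated by a later add(); copy if it must outlive one.
    const SourceFile& file(FileId id) const {
        assert(id < files_.size());
        return files_[id];
    }

    // The source text a span refers to.
    std::string text(FileSpan fs) const {
        const SourceFile& f = file(fs.file);
        assert(fs.span.start + fs.span.length <= f.text.size());
        return f.text.substr(fs.span.start, fs.span.length);
    }

private:
    std::vector<SourceFile> files_;
};
```

With this layout a diagnostic only needs to carry a FileSpan; the file name and the line table are looked up through the id at print time, the same way offsets are resolved to (line, col) only when printing.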