author	Eric Sunshine <sunshine@sunshineco.com>
	Tue, 8 Nov 2022 19:08:28 +0000 (19:08 +0000)
committer	Taylor Blau <me@ttaylorr.com>
	Tue, 8 Nov 2022 20:10:49 +0000 (15:10 -0500)
commit	ca748f518358448efa46f01539a8bdce5cfc710f
tree	5422f4445ab48912d6cc1cf4abafa791fef2348a
parent	c90d81f8bb691b6627497e69d599e1f7fa7e9dfa

chainlint: tighten accuracy when consuming input stream

To extract the next token in the input stream, Lexer::scan_token() finds
the start of the token by skipping whitespace, then consumes characters
belonging to the token until it encounters a non-token character, such
as an operator, punctuation, or whitespace. In the case of an operator
or punctuation which ends a token, before returning the just-scanned
token, it pushes that operator or punctuation character back onto the
input stream to ensure that it will be the first character consumed by
the next call to scan_token().
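
The pushback behavior described above can be sketched roughly as follows.
This is an illustrative Python model, not the actual chainlint code; the
Lexer class layout, the `pos` field, and the simplified OPERATORS set are
all assumptions made for the example.

```python
# Illustrative sketch only; not the actual chainlint implementation.
OPERATORS = set(";|&()<>")  # assumed, simplified set of token-ending characters


class Lexer:
    def __init__(self, text):
        self.text = text
        self.pos = 0  # index of the next unread character

    def scan_token(self):
        # Step 1: skip whitespace to find the start of the token.
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1
        if self.pos >= len(self.text):
            return None  # input exhausted
        # In this sketch, a lone operator character is a token by itself.
        if self.text[self.pos] in OPERATORS:
            self.pos += 1
            return self.text[self.pos - 1]
        # Step 2: consume characters until a non-token character appears.
        token = ""
        while self.pos < len(self.text):
            c = self.text[self.pos]
            self.pos += 1  # consume the character
            if c in OPERATORS:
                self.pos -= 1  # push the operator back for the next call
                break
            if c.isspace():
                break  # whitespace ends the token and stays consumed
            token += c
        return token
```

With the input "echo foo;bar", successive calls would yield "echo", "foo",
";", and "bar": the ";" that ends "foo" is pushed back, so it is the first
character seen by the following call.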

However, scan_token() is intentionally lax when whitespace ends a token;
it doesn't bother pushing the whitespace character back onto the input
stream since it knows that the next call to scan_token() will, as its
first step, skip over whitespace anyhow when looking for the start of
the next token.

Although such laxity is harmless for the proper functioning of the
lexical analyzer, it does make it difficult to precisely identify the
token's end position in the input stream. Accurate token position
information may be desirable, for instance, to annotate problems or
highlight other interesting facets of the input found during the parsing
phase. To accommodate such possibilities, tighten scan_token() by making
it push the token-ending whitespace character back onto the input
stream, just as it does for other token-ending characters.
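
The effect on token end positions can be illustrated with a small
standalone Python sketch. It is a model of the idea only, not the actual
chainlint code: the function signature, the `push_back_whitespace` flag,
and the simplified OPERATORS set are hypothetical and exist solely to
compare the lax and tightened behaviors side by side.

```python
# Illustrative sketch only; names and the operator set are assumptions.
OPERATORS = set(";|&()<>")


def scan_token(text, pos, push_back_whitespace):
    """Return (token, start, end) for the next token at or after pos.

    `end` is the stream position after the call; with pushback enabled it
    is also the exact end of the token.
    """
    while pos < len(text) and text[pos].isspace():
        pos += 1  # skip whitespace to find the start of the token
    if pos >= len(text):
        return None, pos, pos
    start = pos
    if text[pos] in OPERATORS:
        return text[pos], start, pos + 1
    token = ""
    while pos < len(text):
        c = text[pos]
        pos += 1  # consume the character
        if c in OPERATORS or (c.isspace() and push_back_whitespace):
            pos -= 1  # push the token-ending character back
            break
        if c.isspace():
            break  # lax mode: the whitespace stays consumed
        token += c
    return token, start, pos


text = "echo  foo"
lax = scan_token(text, 0, push_back_whitespace=False)
tight = scan_token(text, 0, push_back_whitespace=True)
```

In this model the lax scan leaves the stream at position 5, one past the
true end of "echo", whereas the tightened scan leaves it at position 4, so
text[start:end] is exactly the token. The next call returns "foo" from
either position, which is why the laxity never affected lexing itself.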

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>