Sean Allred [Fri, 19 May 2023 04:17:51 +0000 (04:17 +0000)]
ls-remote doc: remove redundant --tags example
The --tags option is already demonstrated in the later example that
lists version-patterned tags. As it doesn't appear to add anything to
the documentation, it ought to be removed to keep the documentation
easier to read.
Signed-off-by: Sean Allred <allred.sean@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Fri, 19 May 2023 04:17:50 +0000 (04:17 +0000)]
show-branch doc: say <ref>, not <reference>
The glossary defines 'ref' as the official name of the thing,
and the output from "git grep -e '<ref' Documentation/" shows
that most everybody uses <ref>, not <reference>. In addition,
the page already says <ref> in its SYNOPSIS section for the
command when it is used in the mode to follow the reflogs.
Strictly speaking, many references of these should be updated to
<commit> after adding an explanation on how these <commit>s are
discovered (i.e. we take <rev>, <glob>, or <ref> and starting from
these commits, follow their ancestry or reflog entries to list
commits), but that would be a lot bigger change I would rather not
to do in this patch, whose primary purpose is to make the existing
documentation more consistent.
Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Sean Allred <allred.sean@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sean Allred [Fri, 19 May 2023 04:17:49 +0000 (04:17 +0000)]
show-ref doc: update for internal consistency
- Use inline-code syntax for options where appropriate.
- Use code blocks to clarify output format.
- Use 'OID' (for 'object ID') instead of 'SHA-1' as we support
different hashing algorithms these days.
Signed-off-by: Sean Allred <allred.sean@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 10 May 2023 17:23:27 +0000 (10:23 -0700)]
Merge branch 'hx/negotiator-non-recursive'
The implementation of the default "negotiator", used to find common
ancestor over the network for object tranfer, used to be recursive;
it was updated to be iterative to conserve stackspace usage.
* hx/negotiator-non-recursive:
negotiator/skipping: fix some problems in mark_common()
negotiator/default: avoid stack overflow
Junio C Hamano [Wed, 10 May 2023 17:23:27 +0000 (10:23 -0700)]
Merge branch 'tb/credential-long-lines'
The implementation of credential helpers used fgets() over fixed
size buffers to read protocol messages, causing the remainder of
the folded long line to trigger unexpected behaviour, which has
been corrected.
* tb/credential-long-lines:
contrib/credential: embiggen fixed-size buffer in wincred
contrib/credential: avoid fixed-size buffer in libsecret
contrib/credential: .gitignore libsecret build artifacts
contrib/credential: remove 'gnome-keyring' credential helper
contrib/credential: avoid fixed-size buffer in osxkeychain
t/lib-credential.sh: ensure credential helpers handle long headers
credential.c: store "wwwauth[]" values in `credential_read()`
Junio C Hamano [Tue, 9 May 2023 23:45:45 +0000 (16:45 -0700)]
Merge branch 'en/header-split-cache-h-part-2'
More header clean-up.
* en/header-split-cache-h-part-2: (22 commits)
reftable: ensure git-compat-util.h is the first (indirect) include
diff.h: reduce unnecessary includes
object-store.h: reduce unnecessary includes
commit.h: reduce unnecessary includes
fsmonitor: reduce includes of cache.h
cache.h: remove unnecessary headers
treewide: remove cache.h inclusion due to previous changes
cache,tree: move basic name compare functions from read-cache to tree
cache,tree: move cmp_cache_name_compare from tree.[ch] to read-cache.c
hash-ll.h: split out of hash.h to remove dependency on repository.h
tree-diff.c: move S_DIFFTREE_IFXMIN_NEQ define from cache.h
dir.h: move DTYPE defines from cache.h
versioncmp.h: move declarations for versioncmp.c functions from cache.h
ws.h: move declarations for ws.c functions from cache.h
match-trees.h: move declarations for match-trees.c functions from cache.h
pkt-line.h: move declarations for pkt-line.c functions from cache.h
base85.h: move declarations for base85.c functions from cache.h
copy.h: move declarations for copy.c functions from cache.h
server-info.h: move declarations for server-info.c functions from cache.h
packfile.h: move pack_window and pack_entry from cache.h
...
The detect-compilers script to help auto-tweaking the build system
had trouble working with compilers whose version number has extra
suffixes. The script has been taught that certain suffixes (like
"-win32" in "gcc 10-win32") can be safely stripped as they share
the same features and bugs with the version without the suffix.
* mh/fix-detect-compilers-with-nondigit-versions:
Handle some compiler versions containing a dash
The commit object parser has been taught to be a bit more lenient
to parse timestamps on the author/committer line with a malformed
author/committer ident.
* jk/parse-commit-with-malformed-ident:
parse_commit(): describe more date-parsing failure modes
parse_commit(): handle broken whitespace-only timestamp
parse_commit(): parse timestamp from end of line
t4212: avoid putting git on left-hand side of pipe
DocBook Stylesheets limit the size of the manpage titles for some
reason.
Even some of the longest git commands have no trouble fitting in 80
character terminals, so it's not clear why we would want to limit titles
to 20 characters, especially when modern terminals are much bigger.
Alex Henrie [Wed, 3 May 2023 04:46:56 +0000 (22:46 -0600)]
docs: rewrite the documentation of the text and eol attributes
These two sentences are confusing because the description of the text
attribute sounds exactly the same as the description of the text=auto
attribute:
"Setting the text attribute on a path enables end-of-line normalization"
"When text is set to "auto", the path is marked for automatic
end-of-line conversion"
Unless the reader is already familiar with the two variants, there's a
high probability that they will think that "end-of-line normalization"
is the same thing as "automatic end-of-line conversion".
It's also not clear that the phrase "When the file has been committed
with CRLF, no conversion is done" in the paragraph for text=auto does
not apply equally to the bare text attribute which is described earlier.
Moreover, it falsely implies that normalization is only suppressed if
the file has been committed. In fact, running `git add` on a CRLF file,
adding the text=auto attribute to the file, and running `git add` again
does not do anything to the line endings either.
On top of that, in several places the documentation for the eol
attribute sounds like either it does not affect normalization on checkin
or it forces normalization on checkin. It also sounds like setting eol
(or setting a config variable) is required to turn on conversion on
checkout, but the text attribute can turn on conversion on checkout by
itself if eol is unspecified.
Rephrase the documentation of text, text=auto, eol, eol=crlf, and eol=lf
to be clear about how they are the same, how they are different, and in
what cases conversion is performed.
Helped-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Andrei Rybak [Mon, 17 Apr 2023 19:10:44 +0000 (21:10 +0200)]
t2019: don't create unused files
Tests in t2019-checkout-ambiguous-ref.sh redirect two invocations of
"git checkout" to files "stdout" and "stderr". Several assertions are
made using file "stderr". File "stdout", however, is unused.
Don't redirect standard output of "git checkout" to file "stdout" in
t2019-checkout-ambiguous-ref.sh to avoid creating unnecessary files.
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Andrei Rybak [Mon, 17 Apr 2023 19:10:43 +0000 (21:10 +0200)]
t1502: don't create unused files
Three tests in file t1502-rev-parse-parseopt.sh use three redirections
with invocation of "git rev-parse --parseopt --". All three tests
redirect standard output to file "out" and file "spec" to standard
input. Two of the tests redirect standard output a second time to file
"actual", and the third test redirects standard error to file "err".
These tests check contents of files "actual" and "err", but don't use
the files named "out" for assertions. The two tests that redirect to
standard output twice might also be confusing to the reader.
Don't redirect standard output of "git rev-parse" to file "out" in
t1502-rev-parse-parseopt.sh to avoid creating unnecessary files.
Acked-by: Øystein Walle <oystwa@gmail.com> Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Andrei Rybak [Mon, 17 Apr 2023 19:10:42 +0000 (21:10 +0200)]
t1450: don't create unused files
Test 'fsck error and recovery on invalid object type' in file
t1450-fsck.sh redirects output of a failing "git fsck" invocation to
files "out" and "err" to assert presence of error messages in the output
of the command. Commit 31deb28f5e (fsck: don't hard die on invalid
object types, 2021-10-01) changed the way assertions in this test are
performed. The test doesn't compare the whole standard error with
prepared file "err.expect" and it doesn't assert that standard output is
empty.
Don't create unused files "err.expect" and "out" in test 'fsck error and
recovery on invalid object type'.
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Andrei Rybak [Mon, 17 Apr 2023 19:10:41 +0000 (21:10 +0200)]
t1300: don't create unused files
Three tests in t1300-config.sh check that "git config --get" barfs when
syntax errors are present in the config file. The tests redirect
standard output and standard error of "git config --get" to files,
"actual" and "error" correspondingly. They assert presence of an error
message in file "error". However, these tests don't use file "actual"
for assertions.
Don't redirect standard output of "git config --get" to file "actual" in
t1300-config.sh to avoid creating unnecessary files.
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Andrei Rybak [Mon, 17 Apr 2023 19:10:40 +0000 (21:10 +0200)]
t1300: fix config file syntax error descriptions
Three tests in t1300-config.sh check that "git config --get" barfs when
the config file contains various syntax errors: key=value pair without
equals sign, broken section line, and broken value string. The sample
config files include a comment describing the kind of broken syntax.
This description seems to have been copy-pasted from the "broken section
line" sample to the other two samples.
Fix descriptions of broken config file syntax in samples used in
t1300-config.sh.
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Andrei Rybak [Mon, 17 Apr 2023 19:10:39 +0000 (21:10 +0200)]
t0300: don't create unused file
Test 'credential config with partial URLs' in t0300-credentials.sh
contains three "git credential fill" invocations. For two of the
invocations, the test asserts presence or absence of string "yep" in the
standard output. For the third test it checks for an error message in
standard error.
Don't redirect standard output of "git credential" to file "stdout" in
t0300-credentials.sh to avoid creating an unnecessary file when only
standard error is checked.
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Felipe Contreras [Tue, 18 Apr 2023 01:18:28 +0000 (19:18 -0600)]
doc: remove custom callouts format
The code to render callouts for manpages comes from 17 years ago: 776e994af5 (Properly render asciidoc "callouts" in git man pages.,
2006-04-28), and it was needed back then, but DocBook Stylesheets added
support for that in 2008 [1], since 1.74.0 it hasn't been necessary.
What's worse: the format of the upstream callouts is much nicer than our
hacked version.
1. Changes in the working tree not yet staged for the next
commit.
2. Changes between the index and your last commit; what you
would be committing if you run git commit without -a
option.
3. Changes in the working tree since your last commit; what
you would be committing if you run git commit -a
1. Changes in the working tree not yet staged for the next commit.
2. Changes between the index and your last commit; what you would
be committing if you run git commit without -a option.
3. Changes in the working tree since your last commit; what you
would be committing if you run git commit -a
Let's drop our unnecessary inferior custom format and use the official
one.
[1] https://sourceforge.net/p/docbook/code/7842/
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 2 May 2023 17:13:35 +0000 (10:13 -0700)]
Merge branch 'tb/ban-strtok'
Mark strtok() and strtok_r() to be banned.
* tb/ban-strtok:
banned.h: mark `strtok()` and `strtok_r()` as banned
t/helper/test-json-writer.c: avoid using `strtok()`
t/helper/test-oidmap.c: avoid using `strtok()`
t/helper/test-hashmap.c: avoid using `strtok()`
string-list: introduce `string_list_setlen()`
string-list: multi-delimiter `string_list_split_in_place()`
Junio C Hamano [Tue, 2 May 2023 17:13:34 +0000 (10:13 -0700)]
Merge branch 'jk/blame-fake-commit-label'
The output given by "git blame" that attributes a line to contents
taken from the file specified by the "--contents" option shows it
differently from a line attributed to the working tree file.
* jk/blame-fake-commit-label:
blame: use different author name for fake commit generated by --contents
The completion script used to use bare "read" without the "-r"
option to read the contents of various state files, which risked
getting confused with backslashes in them. This has been
corrected.
* ek/completion-use-read-r-to-read-literally:
completion: suppress unwanted unescaping of `read`
René Scharfe [Mon, 1 May 2023 19:51:57 +0000 (21:51 +0200)]
test-ctype: check EOF
The character classifiers are supposed to allow passing EOF to them, a
negative value. It isn't part of any character class. Extend the tests
to cover that.
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Mon, 1 May 2023 15:54:06 +0000 (11:54 -0400)]
contrib/credential: embiggen fixed-size buffer in wincred
As in previous commits, harden the wincred credential helper against the
aforementioned protocol injection attack.
Unlike the approached used for osxkeychain and libsecret, where a
fixed-size buffer was replaced with `getline()`, we must take a
different approach here. There is no `getline()` equivalent in Windows,
and the function is not available to us with ordinary compiler settings.
Instead, allocate a larger (still fixed-size) buffer in which to process
each line. The value of 100 KiB is chosen to match the maximum-length
header that curl will allow, CURL_MAX_HTTP_HEADER.
To ensure that we are reading complete lines at a time, and that we
aren't susceptible to a similar injection attack (albeit with more
padding), ensure that each read terminates at a newline (i.e., that no
line is more than 100 KiB long).
Note that it isn't sufficient to turn the old loop into something like:
the credential helper would fill its buffer after reading up through the
first '\r', call fgets() again, and then see "host=example.com\r\n" on
its line.
Note that the original code was written in a way that would trim an
arbitrary number of "\r" and "\n" from the end of the string. We should
get only a single "\n" (since the point of `fgets()` is to return the
buffer to us when it sees one), and likewise would not expect to see
more than one associated "\r". The new code trims a single "\r\n", which
matches the original intent.
Tested-by: Matthew John Cheetham <mjcheetham@outlook.com> Helped-by: Matthew John Cheetham <mjcheetham@outlook.com> Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Mon, 1 May 2023 15:54:03 +0000 (11:54 -0400)]
contrib/credential: avoid fixed-size buffer in libsecret
The libsecret credential helper reads the newline-delimited
protocol stream one line at a time by repeatedly calling fgets() into a
fixed-size buffer, and is thus affected by the vulnerability described
in the previous commit.
To mitigate this attack, avoid using a fixed-size buffer, and instead
rely on getline() to allocate a buffer as large as necessary to fit the
entire content of the line, preventing any protocol injection.
In most parts of Git we don't assume that every platform has getline().
But libsecret is primarily used on Linux, where we do already assume it
(using a knob in config.mak.uname). POSIX also added getline() in 2008,
so we'd expect other recent Unix-like operating systems to have it
(e.g., FreeBSD also does).
Note that the buffer was already allocated on the heap in this case, but
we'll swap `g_free()` for `free()`, since it will now be allocated by
the system `getline()`, rather than glib's `g_malloc()`.
Tested-by: Jeff King <peff@peff.net> Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The libsecret credential helper does not mark its build artifact as
ignored, so running "make" results in a dirty working tree.
Mark the "git-credential-libsecret" binary as ignored to avoid the above.
Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
libgnome-keyring was deprecated in 2014 (in favor of libsecret), more
than nine years ago [1].
The credential helper implemented using libgnome-keyring has had a small
handful of commits since 2013, none of which implemented or changed any
functionality. The last commit to do substantial work in this area was 15f7221686 (contrib/git-credential-gnome-keyring.c: support really
ancient gnome-keyring, 2013-09-23), just shy of nine years ago.
This credential helper suffers from the same `fgets()`-related injection
attack (using the new "wwwauth[]" feature) as in the previous commit.
Instead of patching it, let's remove this helper as deprecated.
Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Mon, 1 May 2023 15:53:54 +0000 (11:53 -0400)]
contrib/credential: avoid fixed-size buffer in osxkeychain
The macOS Keychain-based credential helper reads the newline-delimited
protocol stream one line at a time by repeatedly calling fgets() into a
fixed-size buffer, and is thus affected by the vulnerability described
in the previous commit.
To mitigate this attack, avoid using a fixed-size buffer, and instead
rely on getline() to allocate a buffer as large as necessary to fit the
entire content of the line, preventing any protocol injection.
We solved a similar problem in a5bb10fd5e (config: avoid fixed-sized
buffer when renaming/deleting a section, 2023-04-06) by switching to
strbuf_getline(). We can't do that here because the contrib helpers do
not link with the rest of Git, and so can't use a strbuf. But we can use
the system getline() directly, which works similarly.
In most parts of Git we don't assume that every platform has getline().
But this helper is run only on OS X, and that platform added support in
10.7 ("Lion") which was released in 2011.
Tested-by: Taylor Blau <me@ttaylorr.com> Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Mon, 1 May 2023 15:53:51 +0000 (11:53 -0400)]
t/lib-credential.sh: ensure credential helpers handle long headers
Add a test ensuring that the "wwwauth[]" field cannot be used to
inject malicious data into the credential helper stream.
Many of the credential helpers in contrib/credential read the
newline-delimited protocol stream one line at a time by repeatedly
calling fgets() into a fixed-size buffer.
This assumes that each line is no more than 1024 characters long, since
each iteration of the loop assumes that it is parsing starting at the
beginning of a new line in the stream. However, similar to a5bb10fd5e
(config: avoid fixed-sized buffer when renaming/deleting a section,
2023-04-06), if a line is longer than 1024 characters, a malicious actor
can embed another command within an existing line, bypassing the usual
checks introduced in 9a6bbee800 (credential: avoid writing values with
newlines, 2020-03-11).
As with the problem fixed in that commit, specially crafted input can
cause the helper to return the credential for the wrong host, letting an
attacker trick the victim into sending credentials for one host to
another.
Luckily, all parts of the credential helper protocol that are available
in a tagged release of Git are immune to this attack:
- "protocol" is restricted to known values, and is thus immune.
- "host" is immune because curl will reject hostnames that have a '='
character in them, which would be required to carry out this attack.
- "username" is immune, because the buffer characters to fill out the
first `fgets()` call would pollute the `username` field, causing the
credential helper to return nothing (because it would match a
username if present, and the username of the credential to be stolen
is likely not 1024 characters).
- "password" is immune because providing a password instructs
credential helpers to avoid filling credentials in the first place.
- "path" is similar to username; if present, it is not likely to match
any credential the victim is storing. It's also not enabled by
default; the victim would have to set credential.useHTTPPath
explicitly.
However, the new "wwwauth[]" field introduced via 5f2117b24f
(credential: add WWW-Authenticate header to cred requests, 2023-02-27)
can be used to inject data into the credential helper stream. For
example, running:
being sent to the credential helper. If we tweak that "1024" to align
our output with the helper's buffer size and the rest of the data on the
line, it can cause the helper to see "host=victim.com" on its own line,
allowing motivated attackers to exfiltrate credentials belonging to
"victim.com".
The below test demonstrates these failures and provides us with a test
to ensure that our fix is correct. That said, it has a couple of
shortcomings:
- it's in t0303, since that's the only mechanism we have for testing
random helpers. But that means nobody is going to run it under
normal circumstances.
- to get the attack right, it has to line up the stuffed name with the
buffer size, so we depend on the exact buffer size. I parameterized
it so it could be used to test other helpers, but in practice it's
not likely for anybody to do that.
Still, it's the best we can do, and will help us confirm the presence of
the problem (and our fixes) in the new few patches.
Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Mon, 1 May 2023 15:53:48 +0000 (11:53 -0400)]
credential.c: store "wwwauth[]" values in `credential_read()`
Teach git-credential to read "wwwauth[]" value(s) when parsing the
output of a credential helper.
These extra headers are not needed for Git's own HTTP support to use the
feature internally, but the feature would not be available for a
scripted caller (say, git-remote-mediawiki providing the header in the
same way).
As a bonus, this also makes it easier to use wwwauth[] in synthetic
credential inputs in our test suite.
Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t/t3501-revert-cherry-pick.sh: clarify scope of the file
The file started out as a test for picks and reverts with renames, but
has been subsequently populated with all kinds of basic tests, in
accordance with its generic name. Adjust the description to reflect
that.
Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Fri, 28 Apr 2023 23:03:03 +0000 (16:03 -0700)]
Merge branch 'jk/gpg-trust-level-fix'
The "%GT" placeholder for the "--format" option of "git log" and
friends caused BUG() to trigger on a commit signed with an unknown
key, which has been corrected.
* jk/gpg-trust-level-fix:
gpg-interface: set trust level of missing key to "undefined"
Junio C Hamano [Fri, 28 Apr 2023 23:03:03 +0000 (16:03 -0700)]
Merge branch 'tb/enable-cruft-packs-by-default'
When "gc" needs to retain unreachable objects, packing them into
cruft packs (instead of exploding them into loose object files) has
been offered as a more efficient option for some time. Now the use
of cruft packs has been made the default and no longer considered
an experimental feature.
* tb/enable-cruft-packs-by-default:
repository.h: drop unused `gc_cruft_packs`
builtin/gc.c: make `gc.cruftPacks` enabled by default
t/t9300-fast-import.sh: prepare for `gc --cruft` by default
t/t6500-gc.sh: add additional test cases
t/t6500-gc.sh: refactor cruft pack tests
t/t6501-freshen-objects.sh: prepare for `gc --cruft` by default
t/t5304-prune.sh: prepare for `gc --cruft` by default
builtin/gc.c: ignore cruft packs with `--keep-largest-pack`
builtin/repack.c: fix incorrect reference to '-C'
pack-write.c: plug a leak in stage_tmp_packfiles()
messages: capitalization and punctuation exceptions
These are conscious violations of the usual rules for error messages,
based on this reasoning:
- If an error message is directly followed by another sentence, it
needs to be properly terminated with a period, lest the grammar
looks broken and becomes hard to read.
- That second sentence isn't actually an error message any more, so
it should abide to conventional language rules for good looks and
legibility. Arguably, these should be converted to advice
messages (which the user can squelch, too), but that's a much
bigger effort to get right.
- Neither of these apply to the first hunk in do_exec(), but this
two-line message looks just too much like a real sentence to not
terminate it. Also, leaving it alone would make it asymmetrical
to the other hunk.
Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
N_() is meant to be used on strings that are subsequently _()'d, which
isn't the case here.
The affected construct is a bit questionable from an i18n perspective,
as it pieces together a sentence from separate strings. However, it
doesn't appear to be that bad, as the "assembly instructions" are in a
translatable message as well. Lacking specific complaints from
translators, it doesn't seem worth changing this.
Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Thu, 27 Apr 2023 23:00:59 +0000 (16:00 -0700)]
Merge branch 'tb/pack-revindex-on-disk'
The on-disk reverse index that allows mapping from the pack offset
to the object name for the object stored at the offset has been
enabled by default.
* tb/pack-revindex-on-disk:
t: invert `GIT_TEST_WRITE_REV_INDEX`
config: enable `pack.writeReverseIndex` by default
pack-revindex: introduce `pack.readReverseIndex`
pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK
pack-revindex: make `load_pack_revindex` take a repository
t5325: mark as leak-free
pack-write.c: plug a leak in stage_tmp_packfiles()
Jeff King [Thu, 27 Apr 2023 08:17:24 +0000 (04:17 -0400)]
parse_commit(): describe more date-parsing failure modes
The previous few commits improved the parsing of dates in malformed
commit objects. But there's one big case left implicit: we may still
feed garbage to parse_timestamp(). This is preferable to trying to be
more strict, but let's document the thinking in a comment.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The comment in parse_commit_date() claims that parse_timestamp() will
not walk past the end of the buffer we've been given, since it will hit
the newline at "eol" and stop. This is usually true, when dateptr
contains actual numbers to parse. But with a line like:
committer name <email> \n
with just whitespace, and no numbers, parse_timestamp() will consume
that newline as part of the leading whitespace, and we may walk past our
"tail" pointer (which itself is set from the "size" parameter passed in
to parse_commit_buffer()).
In practice this can't cause us to walk off the end of an array, because
we always add an extra NUL byte to the end of objects we load from disk
(as a defense against exactly this kind of bug). However, you can see
the behavior in action when "committer" is the final header (which it
usually is, unless there's an encoding) and the subject line can be
parsed as an integer. We walk right past the newline on the committer
line, as well as the "\n\n" separator, and mistake the subject for the
timestamp.
We can solve this by trimming the whitespace ourselves, making sure that
it has some non-whitespace to parse. Note that we need to be a bit
careful about the definition of "whitespace" here, as our isspace()
doesn't match exotic characters like vertical tab or formfeed. We can
work around that by checking for an actual number (see the in-code
comment). This is slightly more restrictive than the current code, but
in practice the results are either the same (we reject "foo" as "0", but
so would parse_timestamp()) or extremely unlikely even for broken
commits (parse_timestamp() would allow "\v123" as "123", but we'll now
make it "0").
I did also allow "-" here, which may be controversial, as we don't
currently support negative timestamps. My reasoning was two-fold. One,
the design of parse_timestamp() is such that we should be able to easily
switch it to handling signed values, and this otherwise creates a
hard-to-find gotcha that anybody doing that work would get tripped up
on. And two, the status quo is that we currently parse them, though the
result of course ends up as a very large unsigned value (which is likely
to just get clamped to "0" for display anyway, since our date routines
can't handle it).
The new test checks the commit parser (via "--until") for both vanilla
spaces and the vertical-tab case. I also added a test to check these
against the pretty-print formatter, which uses split_ident_line(). It's
not subject to the same bug, because it already insists that there be
one or more digits in the timestamp.
Helped-by: Phillip Wood <phillip.wood123@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 27 Apr 2023 08:14:09 +0000 (04:14 -0400)]
parse_commit(): parse timestamp from end of line
To find the committer timestamp, we parse left-to-right looking for the
closing ">" of the email, and then expect the timestamp right after
that. But we've seen some broken cases in the wild where this fails, but
we _could_ find the timestamp with a little extra work. E.g.:
This means that features that rely on the committer timestamp, like
--since or --until, will treat the commit as happening at time 0 (i.e.,
1970).
This is doubly confusing because the pretty-print parser learned to
handle these in 03818a4a94 (split_ident: parse timestamp from end of
line, 2013-10-14). So printing them via "git show", etc, makes
everything look normal, but --until, etc are still broken (despite the
fact that that commit explicitly mentioned --until!).
So let's use the same trick as 03818a4a94: find the end of the line, and
parse back to the final ">". In theory we could use split_ident_line()
here, but it's actually a bit more strict. In particular, it requires a
valid time-zone token, too. That should be present, of course, but we
wouldn't want to break --until for cases that are working currently.
We might want to teach split_ident_line() to become more lenient there,
but it would require checking its many callers (since right now they can
assume that if date_start is non-NULL, so is tz_start).
So for now we'll just reimplement the same trick in the commit parser.
The test is in t4212, which already covers similar cases, courtesy of 03818a4a94. We'll just adjust the broken commit to munge both the author
and committer timestamps. Note that we could match (author|committer)
here, but alternation can't be used portably in sed. Since we wouldn't
expect to see ">" except as part of an ident line, we can just match
that character on any line.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Mon, 24 Apr 2023 22:20:26 +0000 (18:20 -0400)]
banned.h: mark `strtok()` and `strtok_r()` as banned
`strtok()` has a couple of drawbacks that make it undesirable to have
any new instances. In addition to being thread-unsafe, it also
encourages confusing data flows, where `strtok()` may be called from
multiple functions with its first argument as NULL, making it unclear
from the immediate context which string is being tokenized.
Now that we have removed all instances of `strtok()` from the tree,
let's ban `strtok()` to avoid introducing new ones in the future. If new
callers should arise, they are encouraged to use
`string_list_split_in_place()` (and `string_list_remove_empty_items()`,
if applicable).
string_list_split_in_place() is not a perfect drop-in replacement
for `strtok_r()`, particularly if the caller is processing a string with
an arbitrary number of tokens, and wants to process each token one at a
time.
But there are no instances of this in Git's tree which are more
well-suited to `strtok_r()` than the friendlier
`string_list_split_in_place()`, so ban `strtok_r()`, too.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Han Xin [Wed, 26 Apr 2023 13:15:04 +0000 (21:15 +0800)]
negotiator/skipping: fix some problems in mark_common()
The mark_common() method in negotiator/skipping.c was converted
from recursive to iterative in 4654134976f (negotiator/skipping:
avoid stack overflow, 2022-10-25), but there is some more work
to do:
1. prio_queue() should be used with clear_prio_queue(), otherwise there
will be a memory leak.
2. It does not do duplicate protection before prio_queue_put().
(The COMMON bit would work here, too.)
3. When it translated from recursive to iterative it kept "return"
statements that should probably be "continue" statements.
4. It does not attempt to parse commits, and instead returns
immediately when finding an unparsed commit. This is something
that it did in its original version, so maybe it is by design,
but it doesn't match the doc comment for the method.
Helped-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Han Xin <hanxin.hx@bytedance.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Han Xin [Wed, 26 Apr 2023 13:15:03 +0000 (21:15 +0800)]
negotiator/default: avoid stack overflow
mark_common() in negotiator/default.c may overflow the stack due to
recursive function calls. Avoid this by instead recursing using a
heap-allocated data structure.
This is the same case as 4654134976f (negotiator/skipping: avoid
stack overflow, 2022-10-25)
Reported-by: Xin Xing <xingxin.xx@bytedance.com> Signed-off-by: Han Xin <hanxin.hx@bytedance.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 25 Apr 2023 20:56:20 +0000 (13:56 -0700)]
Merge branch 'ps/fix-geom-repack-with-alternates'
Geometric repacking ("git repack --geometric=<n>") in a repository
that borrows from an alternate object database had various corner
case bugs, which have been corrected.
* ps/fix-geom-repack-with-alternates:
repack: disable writing bitmaps when doing a local repack
repack: honor `-l` when calculating pack geometry
t/helper: allow chmtime to print verbosely without modifying mtime
pack-objects: extend test coverage of `--stdin-packs` with alternates
pack-objects: fix error when same packfile is included and excluded
pack-objects: fix error when packing same pack twice
pack-objects: split out `--stdin-packs` tests into separate file
repack: fix generating multi-pack-index with only non-local packs
repack: fix trying to use preferred pack in alternates
midx: fix segfault with no packs and invalid preferred pack
The sendemail-validate validate hook learned to pass the total
number of input files and where in the sequence each invocation is
via environment variables.
* rj/send-email-validate-hook-count-messages:
send-email: export patch counters in validate environment
Junio C Hamano [Tue, 25 Apr 2023 20:56:20 +0000 (13:56 -0700)]
Merge branch 'jk/protocol-cap-parse-fix'
The code to parse capability list for v0 on-wire protocol fell into
an infinite loop when a capability appears multiple times, which
has been corrected.
* jk/protocol-cap-parse-fix:
v0 protocol: use size_t for capability length/offset
t5512: test "ls-remote --heads --symref" filtering with v0 and v2
t5512: allow any protocol version for filtered symref test
t5512: add v2 support for "ls-remote --symref" test
v0 protocol: fix sha1/sha256 confusion for capabilities^{}
t5512: stop referring to "v1" protocol
v0 protocol: fix infinite loop when parsing multi-valued capabilities
Junio C Hamano [Tue, 25 Apr 2023 20:56:19 +0000 (13:56 -0700)]
Merge branch 'en/header-split-cache-h'
Header clean-up.
* en/header-split-cache-h: (24 commits)
protocol.h: move definition of DEFAULT_GIT_PORT from cache.h
mailmap, quote: move declarations of global vars to correct unit
treewide: reduce includes of cache.h in other headers
treewide: remove double forward declaration of read_in_full
cache.h: remove unnecessary includes
treewide: remove cache.h inclusion due to pager.h changes
pager.h: move declarations for pager.c functions from cache.h
treewide: remove cache.h inclusion due to editor.h changes
editor: move editor-related functions and declarations into common file
treewide: remove cache.h inclusion due to object.h changes
object.h: move some inline functions and defines from cache.h
treewide: remove cache.h inclusion due to object-file.h changes
object-file.h: move declarations for object-file.c functions from cache.h
treewide: remove cache.h inclusion due to git-zlib changes
git-zlib: move declarations for git-zlib functions from cache.h
treewide: remove cache.h inclusion due to object-name.h changes
object-name.h: move declarations for object-name.c functions from cache.h
treewide: remove unnecessary cache.h inclusion
treewide: be explicit about dependence on mem-pool.h
treewide: be explicit about dependence on oid-array.h
...
Taylor Blau [Mon, 24 Apr 2023 22:20:23 +0000 (18:20 -0400)]
t/helper/test-json-writer.c: avoid using `strtok()`
Apply similar treatment as in the previous commit to remove usage of
`strtok()` from the "oidmap" test helper.
Each of the different commands that the "json-writer" helper accepts
pops the next space-delimited token from the current line and interprets
it as a string, integer, or double (with the exception of the very first
token, which is the command itself).
To accommodate this, split the line in place by the space character, and
pass the corresponding string_list to each of the specialized `get_s()`,
`get_i()`, and `get_d()` functions.
`get_i()` and `get_d()` are thin wrappers around `get_s()` that convert
their result into the appropriate type by either calling `strtol()` or
`strtod()`, respectively. In `get_s()`, we mark the token as "consumed"
by incrementing the `consumed_nr` counter, indicating how many tokens we
have read up to that point.
Because each of these functions needs the string-list parts, the number
of tokens consumed, and the line number, these three are wrapped up in
to a struct representing the line state.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Mon, 24 Apr 2023 22:20:17 +0000 (18:20 -0400)]
t/helper/test-hashmap.c: avoid using `strtok()`
Avoid using the non-reentrant `strtok()` to separate the parts of each
incoming command. Instead of replacing it with `strtok_r()`, let's
instead use the more friendly pair of `string_list_split_in_place()` and
`string_list_remove_empty_items()`.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
is preferable over calling `string_list_clear()` on every iteration of
the loop. This is because `string_list_clear()` causes us free our
existing `items` array. This means that every time we call
`string_list_split_in_place()`, the string-list internals re-allocate
the same size array.
Since in the above example we do not care about the individual parts
after processing each line, it is much more efficient to pretend that
there aren't any elements in the `string_list` by setting `list->nr` to
0 while leaving the list of elements allocated as-is.
This allows `string_list_split_in_place()` to overwrite any existing
entries without needing to free and re-allocate them.
However, setting `list->nr` manually is not safe in all instances. There
are a couple of cases worth worrying about:
- If the `string_list` is initialized with `strdup_strings`,
truncating the list can lead to overwriting strings which are
allocated elsewhere. If there aren't any other pointers to those
strings other than the ones inside of the `items` array, they will
become unreachable and leak.
(We could ourselves free the truncated items between
string_list->items[nr] and `list->nr`, but no present or future
callers would benefit from this additional complexity).
- If the given `nr` is larger than the current value of `list->nr`,
we'll trick the `string_list` into a state where it thinks there are
more items allocated than there actually are, which can lead to
undefined behavior if we try to read or write those entries.
Guard against both of these by introducing a helper function which
guards assignment of `list->nr` against each of the above conditions.
Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Enhance `string_list_split_in_place()` to accept multiple characters as
delimiters instead of a single character.
Instead of using `strchr(2)` to locate the first occurrence of the given
delimiter character, `string_list_split_in_place_multi()` uses
`strcspn(2)` to move past the initial segment of characters comprised of
any characters in the delimiting set.
When only a single delimiting character is provided, `strpbrk(2)` (which
is implemented with `strcspn(2)`) has equivalent performance to
`strchr(2)`. Modern `strcspn(2)` implementations treat an empty
delimiter or the singleton delimiter as a special case and fall back to
calling strchrnul(). Both glibc[1] and musl[2] implement `strcspn(2)`
this way.
This change is one step to removing `strtok(2)` from the tree. Note that
`string_list_split_in_place()` is not a strict replacement for
`strtok()`, since it will happily turn sequential delimiter characters
into empty entries in the resulting string_list. For example:
Jacob Keller [Mon, 24 Apr 2023 19:35:08 +0000 (12:35 -0700)]
blame: use different author name for fake commit generated by --contents
When the --contents option is used with git blame, and the contents of
the file have lines which can't be annotated by the history being
blamed, the user will see an author of "Not Committed Yet". This is
similar to the way blame handles working tree contents when blaming
without a revision.
This is slightly confusing since this data isn't the working copy and
while it is technically "not committed yet", its also coming from an
external file. Replace this author name with "External file
(--contents)" to better differentiate such lines from actual working
copy lines.
Suggested-by: Junio C Hamano <gitster@pobox.com> Suggested-by: Glen Choo <chooglen@google.com> Signed-off-by: Jacob Keller <jacob.keller@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
merge-ort: fix calling merge_finalize() with no intermediate merge
If some code sets up the data structures for a merge, but then never
actually performs one before calling merge_finalize(), then
merge_finalize() wouldn't notice that result->priv was NULL and
return early, resulting in following that NULL pointer and getting
a segfault. There is currently no code in the git codebase that does
this, but this issue was found during testing of some proposed patches
that had the following structure:
where some flags could cause the code to have N=0, i.e. doing no merges.
Add a check for result->priv being NULL and return early to avoid a
segfault in these kinds of cases.
While at it, ensure the FREE_AND_NULL() in the function does something
useful with the nulling aspect, namely sets result->priv to NULL rather
than a mere temporary.
Reported-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
cache,tree: move basic name compare functions from read-cache to tree
None of base_name_compare(), df_name_compare(), or name_compare()
depended upon a cache_entry or index_state in any way. By moving these
functions to tree.h, half a dozen other files can stop depending upon
cache.h (though that change will be made in a later commit).
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
cache,tree: move cmp_cache_name_compare from tree.[ch] to read-cache.c
Since cmp_cache_name_compare() was comparing cache_entry structs, it
was associated with the cache rather than with trees. Move the
function. As a side effect, we can make cache_name_stage_compare()
static as well.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
hash-ll.h: split out of hash.h to remove dependency on repository.h
hash.h depends upon and includes repository.h, due to the definition and
use of the_hash_algo (defined as the_repository->hash_algo). However,
most headers trying to include hash.h are only interested in the layout
of the structs like object_id. Move the parts of hash.h that do not
depend upon repository.h into a new file hash-ll.h (the "low level"
parts of hash.h), and adjust other files to use this new header where
the convenience inline functions aren't needed.
This allows hash.h and object.h to be fairly small, minimal headers. It
also exposes a lot of hidden dependencies on both path.h (which was
brought in by repository.h) and repository.h (which was previously
implicitly brought in by object.h), so also adjust other files to be
more explicit about what they depend upon.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>