unpack-trees: detect mismatching number of cache-tree/index entries
Same as the preceding commit, we unconditionally dereference the index's
cache entries depending on the number of cache-tree entries, which can
lead to a segfault when the cache-tree is corrupted. Fix this bug.
This also makes t4058 pass with the leak sanitizer enabled.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
cache-tree: detect mismatching number of index entries
In t4058 we have some tests that exercise git-read-tree(1) when used
with a tree that contains duplicate entries. While the expectation is
that we fail, we ideally should fail gracefully without a segfault.
But that is not the case: we never check that the number of entries in
the cache-tree is less than or equal to the number of entries in the
index. This can lead to an out-of-bounds read as we unconditionally
access `istate->cache[idx]`, where `idx` is controlled by the number of
cache-tree entries and the current position therein. The result is a
segfault.
Fix this segfault by adding a sanity check for the number of index
entries before dereferencing them.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
cache-tree: refactor verification to return error codes
The function `cache_tree_verify()` will `BUG()` whenever it finds that
the cache-tree extension of the index is corrupt. The function is thus
inherently untestable because the resulting call to `abort()` will be
detected by our testing framework and labelled an error. And rightfully
so: it shouldn't ever be possible to hit bugs, as they should indicate a
programming error rather than corruption of on-disk state.
Refactor the function to instead return error codes. This also ensures
that the function can be used e.g. by git-fsck(1) without the whole
process dying. Furthermore, this refactoring plugs some memory leaks
when returning early by creating a common exit path.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Mon, 16 Sep 2024 21:22:55 +0000 (14:22 -0700)]
Merge branch 'jk/ref-filter-trailer-fixes'
Bugfixes and leak plugging in "git for-each-ref --format=..." code
paths.
* jk/ref-filter-trailer-fixes:
ref-filter: fix leak with unterminated %(if) atoms
ref-filter: add ref_format_clear() function
ref-filter: fix leak when formatting %(push:remoteref)
ref-filter: fix leak with %(describe) arguments
ref-filter: fix leak of %(trailers) "argbuf"
ref-filter: store ref_trailer_buf data per-atom
ref-filter: drop useless cast in trailers_atom_parser()
ref-filter: strip signature when parsing tag trailers
ref-filter: avoid extra copies of payload/signature
t6300: drop newline from wrapped test title
Junio C Hamano [Mon, 16 Sep 2024 21:22:53 +0000 (14:22 -0700)]
Merge branch 'cp/unit-test-reftable-stack'
Another reftable test migrated to the unit-test framework.
* cp/unit-test-reftable-stack:
t-reftable-stack: add test for stack iterators
t-reftable-stack: add test for non-default compaction factor
t-reftable-stack: use reftable_ref_record_equal() to compare ref records
t-reftable-stack: use Git's tempfile API instead of mkstemp()
t: harmonize t-reftable-stack.c with coding guidelines
t: move reftable/stack_test.c to the unit testing framework
The environment GIT_ADVICE has been intentionally kept undocumented
to discourage its use by interactive users. Add documentation to
help tool writers.
* ds/doc-wholesale-disabling-advice-messages:
advice: recommend GIT_ADVICE=0 for tools
Junio C Hamano [Fri, 13 Sep 2024 22:27:42 +0000 (15:27 -0700)]
Merge branch 'jk/sparse-fdleak-fix'
A file descriptor left open is now properly closed when "git
sparse-checkout" updates the sparse patterns.
* jk/sparse-fdleak-fix:
sparse-checkout: use fdopen_lock_file() instead of xfdopen()
sparse-checkout: check commit_lock_file when writing patterns
sparse-checkout: consolidate cleanup when writing patterns
Junio C Hamano [Fri, 13 Sep 2024 22:26:49 +0000 (15:26 -0700)]
Merge branch 'jk/free-commit-buffer-of-skipped-commits' into maint-2.46
The code forgot to discard unnecessary in-core commit buffer data
for commits that "git log --skip=<number>" traversed but omitted
from the output, which has been corrected.
* jk/free-commit-buffer-of-skipped-commits:
revision: free commit buffers for skipped commits
Junio C Hamano [Thu, 12 Sep 2024 18:47:23 +0000 (11:47 -0700)]
Merge branch 'jk/messages-with-excess-lf-fix'
One-line messages to "die" and other helper functions will get LF
added by these helper functions, but many existing messages had an
unnecessary LF at the end, which have been corrected.
* jk/messages-with-excess-lf-fix:
drop trailing newline from warning/error/die messages
Junio C Hamano [Thu, 12 Sep 2024 18:47:23 +0000 (11:47 -0700)]
Merge branch 'ps/pack-refs-auto-heuristics'
"git pack-refs --auto" for the files backend was too aggressive,
which has been a bit tamed.
* ps/pack-refs-auto-heuristics:
refs/files: use heuristic to decide whether to repack with `--auto`
t0601: merge tests for auto-packing of refs
wrapper: introduce `log2u()`
Junio C Hamano [Thu, 12 Sep 2024 18:47:23 +0000 (11:47 -0700)]
Merge branch 'tb/multi-pack-reuse-fix'
A data corruption bug when multi-pack-index is used and the same
objects are stored in multiple packfiles has been corrected.
* tb/multi-pack-reuse-fix:
builtin/pack-objects.c: do not open-code `MAX_PACK_OBJECT_HEADER`
pack-bitmap.c: avoid repeated `pack_pos_to_offset()` during reuse
builtin/pack-objects.c: translate bit positions during pack-reuse
pack-bitmap: tag bitmapped packs with their corresponding MIDX
t/t5332-multi-pack-reuse.sh: verify pack generation with --strict
Junio C Hamano [Thu, 12 Sep 2024 18:02:16 +0000 (11:02 -0700)]
Merge branch 'ps/bundle-outside-repo-fix' into maint-2.46
"git bundle unbundle" outside a repository triggered a BUG()
unnecessarily, which has been corrected.
* ps/bundle-outside-repo-fix:
bundle: default to SHA1 when reading bundle headers
builtin/bundle: have unbundle check for repo before opening its bundle
Junio C Hamano [Thu, 12 Sep 2024 18:02:16 +0000 (11:02 -0700)]
Merge branch 'jc/patch-id' into maint-2.46
The patch parser in "git patch-id" has been tightened to avoid
getting confused by lines that look like a patch header in the log
message.
cf. <Zqh2T_2RLt0SeKF7@tanuki>
* jc/patch-id:
patch-id: tighten code to detect the patch header
patch-id: rewrite code that detects the beginning of a patch
patch-id: make get_one_patchid() more extensible
patch-id: call flush_current_id() only when needed
t4204: patch-id supports various input format
The code forgot to discard unnecessary in-core commit buffer data
for commits that "git log --skip=<number>" traversed but omitted
from the output, which has been corrected.
* jk/free-commit-buffer-of-skipped-commits:
revision: free commit buffers for skipped commits
ref-filter: fix leak with unterminated %(if) atoms
When parsing `%(if)` atoms we expect a few other atoms to exist to
complete it, like `%(then)` and `%(end)`. Whether or not we have seen
these other atoms is tracked in an allocated `if_then_else` structure,
which gets free'd by the `if_then_else_handler()` once we have parsed
the complete conditional expression.
This results in a memory leak when the `%(if)` atom is not terminated
correctly and thus incomplete. We never end up executing its handler and
thus don't end up freeing the structure.
Plug this memory leak by introducing a new `at_end_data_free` callback
function. If set, we'll execute it in `pop_stack_element()` and pass it
the `at_end_data` variable with the intent to free its state. Wire it up
for the `%(if)` atom accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Mon, 9 Sep 2024 23:21:18 +0000 (19:21 -0400)]
ref-filter: add ref_format_clear() function
After using the ref-filter API, callers should use ref_filter_clear() to
free any used memory. However, there's not a matching function to clear
the ref_format struct.
Traditionally this did not need to be cleaned up, as it was just a way
for the caller to store and pass format options as a single unit. Even
though the parsing step of some placeholders may allocate data, that's
usually inside their "used_atom" structs, which are part of the
ref_filter itself.
But a few placeholders keep data outside of there. The %(ahead-behind)
and %(is-base) parsers both keep a master list of bases, because they
perform a single filtering pass outside of the use of any particular
atom. And since the format parser does not have access to the ref_filter
struct, they store their cross-atom data in the ref_format struct
itself.
And thus when they are finished, the ref_format also needs to be cleaned
up. So let's add a function to do so, and call it from all of the users
of the ref-filter API.
The %(is-base) case is found by running LSan on t6300. After this patch,
the script can now be marked leak-free.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Mon, 9 Sep 2024 23:19:51 +0000 (19:19 -0400)]
ref-filter: fix leak when formatting %(push:remoteref)
When we expand the %(upstream) or %(push) placeholders, we rely on
remote.c's remote_ref_for_branch() to fill in the ":refname" argument.
But that function has confusing memory ownership semantics: it may or
may not return an allocated string, depending on whether we are in
"upstream" mode or "push" mode. The caller in ref-filter.c always
duplicates the result, meaning that we leak the original in the case of
%(push:refname).
To solve this, let's make the return value from remote_ref_for_branch()
consistent, by always returning an allocated pointer. Note that the
switch to returning a non-const pointer has a ripple effect inside the
function, too. We were storing the "dst" result as a const pointer, too,
even though it is always allocated! It is the return value from
apply_refspecs(), which is always a non-const allocated string.
And then on the caller side in ref-filter.c (and this is the only caller
at all), we just need to avoid the extra duplication when the return
value is non-NULL.
This clears up one case that LSan finds in t6300, but there are more.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Mon, 9 Sep 2024 23:19:02 +0000 (19:19 -0400)]
ref-filter: fix leak with %(describe) arguments
When we parse a %(describe) placeholder, we stuff its arguments into a
strvec, which is then detached into the used_atom struct. But later,
when ref_array_clear() frees the atom, we never free the memory.
To solve this, we just need to add the appropriate free() calls. But
it's a little awkward, since we have to free each element of the array,
in addition to the array itself. Instead, let's store the actual strvec,
which lets us do a simple strvec_clear().
This clears up one case that LSan finds in t6300, but there are more.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Mon, 9 Sep 2024 23:18:28 +0000 (19:18 -0400)]
ref-filter: fix leak of %(trailers) "argbuf"
When we parse a placeholder like "%(trailers:key=foo)", our atom parsing
function is passed just the argument string "key=foo". We duplicate this
into its own string, but never free it, causing a leak.
We do the duplication for two reasons:
1. There's a mismatch with the pretty.c trailer-formatting code that
we rely on. It expects to see a closing paren, like "key=foo)". So
we duplicate the argument string with that extra character to pass
along.
This is probably something we could fix in the long run, but it's
somewhat non-trivial if we want to avoid regressing error cases for
things like "git log --format='%(trailer:oops'". So let's accept
it as a necessity for now.
2. The argument parser expects to store the list of "key" entries
("foo" in this case) in a string-list. It also stores the length of
the string in the string-list "util" field. The original caller in
pretty.c uses this with a "nodup" string list to avoid making extra
copies, which creates a subtle dependency on the lifetime of the
original format string.
We do the same here, which creates that same dependency. So we
can't simply free it as soon as the parsing is done.
There are two possible solutions here. The first is to hold on to the
duplicated "argbuf" string in the used_atom struct, so that it lives as
long as the string_list which references it.
But I think a less-subtle solution, and what this patch does, is to
switch to a duplicating string_list. That makes it self-contained, and
lets us free argbuf immediately. It may involve a few extra allocations,
but this parsing is something that happens once per program, not once
per output ref.
This clears up one case that LSan finds in t6300, but there are more.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Mon, 9 Sep 2024 23:16:53 +0000 (19:16 -0400)]
ref-filter: store ref_trailer_buf data per-atom
The trailer API takes options via a trailer_opts struct. Some of those
options point to data structures which require extra storage. Those
structures aren't actually embedded in the options struct, but rather we
pass pointers, and the caller is responsible for managing them. This is
a little convoluted, but makes sense since some of them are not even
concrete (e.g., you can pass a filter function and a void data pointer,
but the trailer code doesn't even know what's in the pointer).
When for-each-ref, etc, parse the %(trailers) placeholder, they stuff
the extra data into a ref_trailer_buf struct. But we only hold a single
static global instance of this struct. So if a format string has
multiple %(trailer) placeholders, they'll stomp on each other: the "key"
list will end up with entries for all of them, and the separator buffers
will use the values from whichever was parsed last.
Instead, we should have a ref_trailer_buf for each instance of the
placeholder, and store it alongside the trailer_opts in the used_atom
structure.
And that's what this patch does. Note that we also have to add code to
clean them up in ref_array_clear(). The original code did not bother
cleaning them up, but it wasn't technically a "leak" since they were
still reachable from the static global instance.
Reported-by: Brooke Kuhlmann <brooke@alchemists.io> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Mon, 9 Sep 2024 23:14:45 +0000 (19:14 -0400)]
ref-filter: strip signature when parsing tag trailers
To expand the "%(trailers)" placeholder, we have to feed the commit or
tag body to the trailer API. But that API doesn't know anything about
signatures, and will be confused by a signed tag like this:
because it will start looking for trailers after the signature, and get
stopped walking backwards by the very non-trailer signature lines. So it
thinks there are no trailers.
This problem has existed since %(trailers) was added to the ref-filter
code, but back then trailers on tags weren't something we really
considered (commits don't have the same problem because their signatures
are embedded in the header). But since 066cef7707 (builtin/tag: add
--trailer option, 2024-05-05), we'd generate an object like the above
for "git tag -s --trailer 'Some-trailer: foo' my-tag".
The implementation here is pretty simple: we just make a NUL-terminated
copy of the non-signature part of the tag (which we've already parsed)
and pass it to the trailer API. There are some alternatives I rejected,
at least for now:
- the trailer code already understands skipping past some cruft at the
end of a commit, such as patch dividers. see find_end_of_log_message().
We could teach it to do the same for signatures. But since this is
the only context where we'd want that feature, and since we've already
parsed the object into subject/body/signature here, it seemed easier
to just pass in the truncated message.
- it would be nice if we could just pass in a pointer/len pair to the
trailer API (rather than a NUL-terminated string) to avoid the extra
copy. I think this is possible, since as noted above, the trailer
code already has to deal with ignoring some cruft at the end of the
input. But after an initial attempt at this, it got pretty messy, as
we have to touch a lot of intermediate functions that are also
called in other contexts.
So I went for the simple and stupid thing, at least for now. I don't
think the extra copy overhead will be all that bad. The previous
patch noted that an extra copy seemed to cause about 1-2% slowdown
for something simple like "%(subject)". But here we are only
triggering it for "%(trailers)" (and only when there is a
signature), and the trailer code is a bit allocation-heavy already.
I couldn't measure any difference formatting "%(trailers)" on
linux.git before and after (even though there are not even any
trailers to find).
Reported-by: Brooke Kuhlmann <brooke@alchemists.io> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Mon, 9 Sep 2024 23:12:28 +0000 (19:12 -0400)]
ref-filter: avoid extra copies of payload/signature
When we know we're going to show the subject or body of a tag or commit,
we call find_subpos(), which returns pointers and lengths for the three
parts: subject, body, signature.
Oddly, the function finds the signature twice: once by calling
parse_signature() at the start, which copies the signature into a
separate strbuf, and then again by calling parse_signed_buffer() after
we've parsed past the subject.
This is due to 482c119186 (gpg-interface: improve interface for parsing
tags, 2021-02-11) and 88bce0e24c (ref-filter: hoist signature parsing,
2021-02-11). The idea is that in a multi-hash world, tag signatures may
appear in the header, rather than at the end of the body, in which case
we need to extract them into a separate buffer.
But parse_signature() would never find such a buffer! It only looks for
signature lines (like "-----BEGIN PGP") at the start of each line,
without any header keyword. So this code will never find anything except
the usual in-body signature.
And the extra code has two downsides:
1. We spend time copying the payload and signature into strbufs. That
might even be useful if we ended up with a NUL-terminated copy of
the payload data, but we throw it away immediately. And the
signature, since it comes at the end of the message, is already its
own NUL-terminated buffer.
The overhead isn't huge, but I measured a pretty consistent 1-2%
speedup running "git for-each-ref --format='%(subject)'" with this
patch on a clone of linux.git.
2. The output of find_subpos() is a set of three ptr/len combinations,
but only two of them point into the original buffer. This makes the
interface confusing: you can't do pointer comparisons between them,
and you have to remember to free the signature buffer. Since
there's only one caller, it's not too bad in practice, but it did
bite me while working on the next patch (and simplifying it will
pave the way for that).
In the long run we might have to go back to something like this
approach, if we do have multi-hash header signatures. But I would argue
that the extra buffer should kick in only for a header signature, and be
passed out of find_subpos() separately.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Mon, 9 Sep 2024 23:08:41 +0000 (19:08 -0400)]
t6300: drop newline from wrapped test title
We don't usually include newlines in test titles, because you get funny
TAP output like:
ok 417 - show good signature with custom format
ok 418 - show good signature with custom format
with ssh
ok 419 - signature atom with grade option and bad signature
where a TAP parser would ignore the extra line anyway, giving the wrong
title. This comes from 26c9c03f0a (ref-filter: add new "signature" atom,
2023-06-04), and I think it was probably just editor line wrapping.
reftable_stack_init_ref_iterator and reftable_stack_init_log_iterator
as defined by reftable/stack.{c,h} initialize a stack iterator to
iterate over the ref and log records in a reftable stack respectively.
Since these functions are not exercised by any of the existing tests,
add a test for them.
Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t-reftable-stack: add test for non-default compaction factor
In a recent codebase update (commit ae8e378430, merge branch
'ps/reftable-write-options', 2024/05/13) the geometric factor used
in auto-compaction of reftable tables was made configurable. Add
a test to verify the functionality introduced by this update.
Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t-reftable-stack: use reftable_ref_record_equal() to compare ref records
In the current stack tests, ref records are compared for equality
by sometimes using the dedicated function for ref-record comparison,
reftable_ref_record_equal(), and sometimes by explicity comparing
contents of the ref records.
The latter method is undesired because there can exist unequal ref
records with some of the contents being equal. Replace the latter
instances of ref-record comparison with the former. This has the
added benefit of preserving uniformity throughout the test file.
Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Alex Henrie [Mon, 9 Sep 2024 14:10:58 +0000 (08:10 -0600)]
apply: support --ours, --theirs, and --union for three-way merges
--ours, --theirs, and --union are already supported in `git merge-file`
for automatically resolving conflicts in favor of one version or the
other, instead of leaving conflict markers in the file. Support them in
`git apply -3` as well because the two commands do the same kind of
file-level merges.
In case in the future --ours, --theirs, and --union gain a meaning
outside of three-way-merges, they do not imply --3way but rather must be
specified alongside it.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Back when 7cc91a2f (Add the configuration option skipFetchAll,
2009-11-09) added for the sole purpose of adding skipFetchAll as a
synonym to skipDefaultUpdate, there was no explanation about the
reason why it was needed., but these two configuration variables
mean exactly the same thing.
Also, when we taught the "prefetch" task to "git maintenance" later,
we did make it pay attention to the setting, but we forgot to
document it.
Document these variables as synonyms that collectively implements
the last-one-wins semantics, and also clarify that the prefetch task
is also controlled by this variable.
Ramsay Jones [Mon, 9 Sep 2024 01:23:48 +0000 (02:23 +0100)]
config.mak.uname: add HAVE_DEV_TTY to cygwin config section
If neither HAVE_DEV_TTY nor GIT_WINDOWS_NATIVE is set, while compiling
the 'compat/terminal.c' code, then the fallback code calls the system
getpass() function. Unfortunately, this ignores the 'echo' parameter of
the git_terminal_prompt() function, since it has no way to implement that
functionality. This results in a less than optimal user experience on
cygwin, which does not define either of those build flags.
However, cygwin does have a functional '/dev/tty', so that it can build
with HAVE_DEV_TTY and benefit from the improved user experience.
The improved git_terminal_prompt() function that comes with HAVE_DEV_TTY
is used in the git_prompt() function, which in turn is used by the
'git credential', 'git bisect' and 'git help' commands. In addition to
git_terminal_prompt(), read_key_without_echo() is likewise improved and
used by the 'git add -p' command.
While using the 'git credential fill' command, for example:
$ printf "%s\n" protocol=https host=example.com path=git | ./git credential fill
Username for 'https://example.com': user
Password for 'https://user@example.com':
protocol=https
host=example.com
username=user
password=pass
$
The 'user' name is now echoed while typing (the password isn't), where this
wasn't the case before.
When using the auto-correct feature:
$ ./git -c help.autocorrect=prompt fred
WARNING: You called a Git command named 'fred', which does not exist.
Run 'grep' instead [y/N]? n
$ ./git -c help.autocorrect=prompt fred
WARNING: You called a Git command named 'fred', which does not exist.
Run 'grep' instead [y/N]? y
fatal: no pattern given
$
The user can actually see what they are typing at the prompt. Similar
comments apply to 'git bisect':
$ ./git bisect bad master~1
You need to start by "git bisect start"
Do you want me to do it for you [Y/n]? y
status: waiting for both good and bad commits
status: waiting for good commit(s), bad commit known
$ ./git bisect reset
Already on 'master-tmp'
$
$ ./git bisect start
status: waiting for both good and bad commits
$ ./git bisect bad master~1
status: waiting for good commit(s), bad commit known
$ ./git bisect next
warning: bisecting only with a bad commit
Are you sure [Y/n]? n
$ ./git bisect reset
Already on 'master-tmp'
$
The read_key_without_echo() function leads to a much improved 'git add -p'
command, when the 'interactive.singleKey' configuration is set:
$ cd ..
$ mkdir test-git
$ cd test-git
$ git init -q
$ echo foo >file
$ git add file
$ echo bar >file
$ ../git/git -c interactive.singleKey=true add -p
diff --git a/file b/file
index 257cc56..5716ca5 100644
--- a/file
+++ b/file
@@ -1 +1 @@
-foo
+bar
(1/1) Stage this hunk [y,n,q,a,d,e,p,?]? y
$
Note that, not only is the user input echoed, but that it is immediately
accepted (without having to type <return>) and the program exits with the
hunk staged (in this case) or not.
In order to reap these benefits, set the HAVE_DEV_TTY build flag in the
cygwin configuration section of config.mak.uname.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t-reftable-stack: use Git's tempfile API instead of mkstemp()
Git's tempfile API defined by $GIT_DIR/tempfile.{c,h} provides
a unified interface for tempfile operations. Since reftable/stack.c
uses this API for all its tempfile needs instead of raw functions
like mkstemp(), make the ported stack test strictly use Git's
tempfile API as well.
A bigger benefit is the fact that we know to clean up the tempfile
in case the test fails because it gets registered and pruned via a
signal handler.
Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t: harmonize t-reftable-stack.c with coding guidelines
Harmonize the newly ported test unit-tests/t-reftable-stack.c
with the following guidelines:
- Single line 'for' statements must omit curly braces.
- Structs must be 0-initialized with '= { 0 }' instead of '= { NULL }'.
- Array sizes and indices should preferably be of type 'size_t' and
not 'int'.
- Function pointers should be passed as 'func' and not '&func'.
While at it, remove initialization for those variables that are
re-used multiple times, like loop variables.
Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t: move reftable/stack_test.c to the unit testing framework
reftable/stack_test.c exercises the functions defined in
reftable/stack.{c, h}. Migrate reftable/stack_test.c to the
unit testing framework. Migration involves refactoring the tests
to use the unit testing framework instead of reftable's test
framework and renaming the tests to be in-line with unit-tests'
standards.
Since some of the tests use set_test_hash() defined by
reftable/test_framework.{c, h} but these files are not
'#included' in the test file, copy this function in the
ported test file.
With the migration of stack test to the unit-tests framework,
"test-tool reftable" becomes a no-op. Hence, get rid of everything
that uses "test-tool reftable" alongside everything that is used
to implement it.
While at it, alphabetically sort the cmds[] list in
helper/test-tool.c by moving the entry for "dump-reftable".
Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
diff: report dirty submodules as changes in builtin_diff()
The diff machinery has two ways to detect changes to set the exit code:
Just comparing hashes and comparing blob contents. The latter is needed
if certain changes have to be ignored, e.g. with --ignore-space-change
or --ignore-matching-lines. It's enabled by the diff_options flag
diff_from_contents.
The slower mode as never considered submodules (and subrepos) as changes
with --submodule=diff or --submodule=log, which is inconsistent with
--submodule=short (the default). Fix it.
d7b97b7185 (diff: let external diffs report that changes are
uninteresting, 2024-06-09) set diff_from_contents if external diff
programs are allowed. This is the default e.g. for git diff, and so
that change exposed the inconsistency much more widely.
diff: report copies and renames as changes in run_diff_cmd()
The diff machinery has two ways to detect changes to set the exit code:
Just comparing hashes and comparing blob contents. The latter is needed
if certain changes have to be ignored, e.g. with --ignore-space-change
or --ignore-matching-lines. It's enabled by the diff_options flag
diff_from_contents.
The slower mode has never considered copies and renames to be changes,
which is inconsistent with the quicker one. Fix it. Even if we ignore
the file contents (because it's empty or contains only ignored lines),
there's still the meta data change of adding or changing a filename, so
we need to report it in the exit code.
d7b97b7185 (diff: let external diffs report that changes are
uninteresting, 2024-06-09) set diff_from_contents if external diff
programs are allowed. This is the default e.g. for git diff, and so
that change exposed the inconsistency much more widely.
The GIT_ADVICE environment variable was added implicitly in b79deeb5544
(advice: add --no-advice global option, 2024-05-03) but was not
documented. Add documentation to show that it is an option for tools
that want to disable these messages. Make note that while the
--no-advice option exists, older Git versions will fail to parse that
option. The environment variable presents a way to change the behavior
of Git versions that understand it without disrupting older versions.
Co-authored-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some large repositories use tags to track a huge list of release
versions. While this choice is costly on the ref advertisement, it is
further wasteful for clients who do not need those tags. Allow clients
to optionally skip the tag advertisement.
This behavior is similar to that of 'git clone --no-tags' implemented in 0dab2468ee5 (clone: add a --no-tags option to clone without tags,
2017-04-26), including the modification of the remote.origin.tagOpt
config value to include "--no-tags".
One thing that is opposite of the 'git clone' implementation is that
this allows '--tags' as an assumed option, which can be naturally negated
with '--no-tags'. The clone command does not accept '--tags' but allows
"--no-no-tags" as the negation of its '--no-tags' option.
While testing this option, combine the test with the previously untested
'--no-src' option introduced in 4527db8ff8c (scalar: add --[no-]src
option, 2023-08-28).
Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Fri, 6 Sep 2024 17:38:49 +0000 (10:38 -0700)]
Merge branch 'jk/unused-parameters'
Make our codebase compilable with the -Werror=unused-parameter
option.
* jk/unused-parameters:
CodingGuidelines: mention -Wunused-parameter and UNUSED
config.mak.dev: enable -Wunused-parameter by default
compat: mark unused parameters in win32/mingw functions
compat: disable -Wunused-parameter in win32/headless.c
compat: disable -Wunused-parameter in 3rd-party code
t-reftable-readwrite: mark unused parameter in callback function
gc: mark unused config parameter in virtual functions
Brian Lyles [Fri, 6 Sep 2024 14:50:08 +0000 (09:50 -0500)]
interpret-trailers: handle message without trailing newline
When git-interpret-trailers is used to add a trailer to a message that
does not end in a trailing newline, the new trailer is added on the line
immediately following the message instead of as a trailer block
separated from the message by a blank line.
For example, if a message's text was exactly "The subject" with no
trailing newline present, `git interpret-trailers --trailer
my-trailer=true` will result in the following malformed commit message:
The subject
my-trailer: true
While it is generally expected that a commit message should end with a
newline character, git-interpret-trailers should not be returning an
invalid message in this case.
Use `strbuf_complete_line` to ensure that the message ends with a
newline character when reading the input.
Signed-off-by: Brian Lyles <brianmlyles@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 6 Sep 2024 03:48:41 +0000 (23:48 -0400)]
sparse-checkout: use fdopen_lock_file() instead of xfdopen()
When updating sparse patterns, we open a lock_file to write out the new
data. The lock_file struct holds the file descriptor, but we call
fdopen() to get a stdio handle to do the actual write.
After we finish writing, we fflush() so that all of the data is on disk,
and then call commit_lock_file() which closes the descriptor. But we
never fclose() the stdio handle, leaking it.
The obvious solution seems like it would be to just call fclose(). But
when? If we do it before commit_lock_file(), then the lock_file code is
left thinking it owns the now-closed file descriptor, and will do an
extra close() on the descriptor. But if we do it before, we have the
opposite problem: the lock_file code will close the descriptor, and
fclose() will do the extra close().
We can handle this correctly by using fdopen_lock_file(). That leaves
ownership of the stdio handle with the lock_file, which knows not to
double-close it.
We do have to adjust the code a bit:
- we have to handle errors ourselves; we can just die(), since that's
what xfdopen() would have done (and we can even provide a more
specific error message).
- we no longer need to call fflush(); committing the lock-file
auto-closes it, which will now do the flush for us. As a bonus, this
will actually check that the flush was successful before renaming
the file into place.
- we can get rid of the local "fd" variable, since we never look at it
ourselves now
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 6 Sep 2024 03:47:38 +0000 (23:47 -0400)]
sparse-checkout: check commit_lock_file when writing patterns
When writing a new "sparse-checkout" file, we do the usual strategy of
writing to a lockfile and committing it into place. But we don't check
the outcome of commit_lock_file(). Failing there would prevent us from
writing a bogus file (good), but we would ignore the error and return a
successful exit code (bad).
Fix this by calling die(). Note that we need to keep the sparse_filename
variable valid for longer, since the filename stored in the lock_file
struct will be dropped when we run commit_lock_file().
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 6 Sep 2024 03:47:08 +0000 (23:47 -0400)]
sparse-checkout: consolidate cleanup when writing patterns
In write_patterns_and_update(), we always need to free the pattern list
before exiting the function. Rather than handling it manually when we
return early, we can jump to an "out" label where cleanup happens. This
let us drop one line, but also establishes a pattern we can use for
other cleanup.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 5 Sep 2024 08:51:49 +0000 (04:51 -0400)]
drop trailing newline from warning/error/die messages
Our error reporting routines append a trailing newline, and the strings
we pass to them should not include them (otherwise we get an extra blank
line after the message).
These cases were all found by looking at the results of:
Note that we _do_ sometimes include a newline in the middle of such
messages, to create multiline output (hence our grep matching "," or ")"
after we see the newline, so we know we're at the end of the string).
It's possible that one or more of these cases could intentionally be
including a blank line at the end, but having looked at them all
manually, I think these are all just mistakes.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Kevin Lyles [Tue, 3 Sep 2024 22:06:47 +0000 (22:06 +0000)]
builtin/cat-file: mark 'git cat-file' sparse-index compatible
This change affects how 'git cat-file' works with the index when
specifying an object with the ":<path>" syntax (which will give file
contents from the index).
'git cat-file' expands a sparse index to a full index any time contents
are requested from the index by specifying an object with the ":<path>"
syntax. This is true even when the requested file is part of the sparse
index, and results in much slower 'git cat-file' operations when working
within the sparse index.
Mark 'git cat-file' as not needing a full index, so that you only pay
the cost of expanding the sparse index to a full index when you request
a file outside of the sparse index.
Add tests to ensure both that:
- 'git cat-file' returns the correct file contents whether or not the
file is in the sparse index
- 'git cat-file' expands to the full index any time you request
something outside of the sparse index
Signed-off-by: Kevin Lyles <klyles+github@epic.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Kevin Lyles [Tue, 3 Sep 2024 22:06:46 +0000 (22:06 +0000)]
t1092: allow run_on_* functions to use standard input
The 'run_on_sparse' and 'run_on_all' functions do not work correctly for
commands accepting standard input, because they run the same command
multiple times and the first instance consumes it. This also indirectly
affects 'test_all_match' and 'test_sparse_match'.
To allow these functions to work with commands accepting standard input,
first slurp standard input to a temporary file, and then run the command
with its standard input redirected from the temporary file. This ensures
that each command sees the same contents from its standard input.
Note that this does not impact commands that do not read from standard
input; they continue to ignore it. Additionally, existing uses of the
run_on_* functions do not need to do anything differently, as the
standard input of the test environment is already connected to
/dev/null.
We do not explicitly clean up the input files because they are cleaned
up with the rest of the test repositories and their contents may be
useful for figuring out which command failed when a test case fails.
Signed-off-by: Kevin Lyles <klyles@epic.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs/files: use heuristic to decide whether to repack with `--auto`
The `--auto` flag for git-pack-refs(1) allows the ref backend to decide
whether or not a repack is in order. This switch has been introduced
mostly with the "reftable" backend in mind, which already knows to
auto-compact its tables during normal operations. When the flag is set,
then it will use the same auto-compaction mechanism and thus end up
doing nothing in most cases.
The "files" backend does not have any such heuristic yet and instead
packs any loose references unconditionally. So we rewrite the complete
"packed-refs" file even if there's only a single loose reference to be
packed.
Even worse, starting with 9f6714ab3e (builtin/gc: pack refs when using
`git maintenance run --auto`, 2024-03-25), `git pack-refs --auto` is
unconditionally executed via our auto maintenance, so we end up repacking
references every single time auto maintenance kicks in. And while that
commit already mentioned that the "files" backend unconditionally packs
refs now, the author obviously didn't quite think about the consequences
thereof. So while the idea was sound, we really should have added a
heuristic to the "files" backend before implementing it.
Introduce a heuristic that decides whether or not it is worth to pack
loose references. The important factors to decide here are the number of
loose references in comparison to the overall size of the "packed-refs"
file. The bigger the "packed-refs" file, the longer it takes to rewrite
it and thus we scale up the limit of allowed loose references before we
repack.
As is the nature of heuristics, this mechansim isn't obviously
"correct", but should rather be seen as a tradeoff between how much
resources we spend packing refs and how inefficient the ref store
becomes. For all I can say, we have successfully been using the exact
same heuristic in Gitaly for several years by now.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have two tests in t0601 which exercise the same underlying logic,
once via `git pack-refs --auto` and once via `git maintenance run
--auto`. Merge these two tests into one such that it becomes easier to
extend test coverage for both commands at the same time.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have an implementation of a function that computes the log2 for an
integer. While we could instead use log2(3P), that involves floating
point numbers and is thus needlessly complex and inefficient.
We're about to add a second caller that wants to compute log2 for a
`size_t`. Let's thus move the function into "wrapper.h" such that it
becomes generally available.
While at it, tweak the implementation a bit:
- The parameter is converted from `int` to `uintmax_t`. This
conversion is safe to do in "bisect.c" because we already check that
the argument is positive.
- The return value is an `unsigned`. It cannot ever be negative, so it
is pointless for it to be a signed integer.
- Loop until `!n` instead of requiring that `n > 1` and then subtract
1 from the result and add a special case for `!sz`. This helps
compilers to generate more efficient code.
Compilers recognize the pattern of this function and optimize
accordingly. On GCC 14.2 x86_64:
log2u(unsigned long):
test rdi, rdi
je .L3
bsr rax, rdi
ret
.L3:
mov eax, -1
ret
Clang 18.1 does not yet recognize the pattern, but starts to do so on
Clang trunk x86_64. The code isn't quite as efficient as the one
generated by GCC, but still manages to optimize away the loop:
log2u(unsigned long):
test rdi, rdi
je .LBB0_1
shr rdi
bsr rcx, rdi
mov eax, 127
cmovne rax, rcx
xor eax, -64
add eax, 65
ret
.LBB0_1:
mov eax, -1
ret
The pattern is also recognized on other platforms like ARM64 GCC 14.2.0,
where we end up using `clz`:
log2u(unsigned long):
clz x2, x0
cmp x0, 0
mov w1, 63
sub w0, w1, w2
csinv w0, w0, wzr, ne
ret
Note that we have a similar function `fastlog2()` in the reftable code.
As that codebase is separate from the Git codebase we do not adapt it to
use the new function.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/index-pack: fix segfaults when running outside of a repo
It was reported that git-verify-pack(1) has started to crash with Git
v2.46.0 when run outside of a repository. This is another fallout from c8aed5e8da (repository: stop setting SHA1 as the default object hash,
2024-05-07), where we have stopped setting the default hash algorithm
for `the_repository`. Consequently, code that relies on `the_hash_algo`
will now crash when it hasn't explicitly been initialized, which may be
the case when running outside of a Git repository.
The crash is not in git-verify-pack(1) but instead in git-index-pack(1),
which gets called by the former. Ideally, both of these programs should
be able to identify the hash algorithm used by the packfile and index
without having to rely on external information. But unfortunately, the
format for neither of them is completely self-describing, so it is not
possible to derive that information. This is a design issue that we
should address by introducing a new packfile version that encodes its
object hash.
For now though the more important fix is to not make either of these
programs crash anymore, which we do by falling back to SHA1 when the
object hash is unconfigured. This pessimizes reading packfiles which
use a different hash than SHA1, but restores previous behaviour.
Reported-by: Ilya K <me@0upti.me> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Documentation/BreakingChanges: announce removal of git-pack-redundant(1)
The git-pack-redundant(1) command is already in the process of being
phased out and dies unless the user passes the `--i-still-use-this` flag
since 4406522b76 (pack-redundant: escalate deprecation warning to an
error, 2023-03-23). We haven't heard any complaints, so let's announce
the removal of this command in Git 3.0 in our breaking changes document.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 3 Sep 2024 16:15:01 +0000 (09:15 -0700)]
Merge branch 'js/fetch-push-trace2-annotation'
More trace2 events at key points on push and fetch code paths have
been added.
* js/fetch-push-trace2-annotation:
send-pack: add new tracing regions for push
fetch: add top-level trace2 regions
trace2: implement trace2_printf() for event target
Alex Henrie [Mon, 2 Sep 2024 02:59:14 +0000 (20:59 -0600)]
mergetools: vscode: new tool
VSCode has supported three-way merges since 2022, see
<https://github.com/microsoft/vscode/issues/5770#issuecomment-1188658476>.
Although the program binary is located at /usr/bin/code, name the
mergetool "vscode" because the word "code" is too generic and would lead
to confusion. The name "vscode" also matches Git's existing
contrib/vscode directory.
On Windows, VSCode adds the directory that contains code.cmd to %PATH%,
so there is no need to invoke mergetool_find_win32_cmd to search for the
program.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>