Junio C Hamano [Thu, 12 Jun 2025 17:04:58 +0000 (10:04 -0700)]
build: sed portability fixes
Recently generating the version-def.h file and the config-list.h
file have been updated, which broke versions of "sed" that do not
want to be fed a file that ends with an incomplete line, and/or that
do not understand the more recent "-E" option to use extended
regular expression.
Fix them in response to a build-failure reported on Solaris boxes.
Junio C Hamano [Mon, 7 Apr 2025 21:23:17 +0000 (14:23 -0700)]
Merge branch 'cc/lop-remote'
Bugfix in newly introduced large-object-promisor remote support.
* cc/lop-remote:
promisor-remote: compare remote names case sensitively
promisor-remote: fix possible issue when no URL is advertised
promisor-remote: fix segfault when remote URL is missing
t5710: arrange to delete the client before cloning
Junio C Hamano [Mon, 7 Apr 2025 21:23:17 +0000 (14:23 -0700)]
Merge branch 'jc/name-rev-stdin'
Using "git name-rev --stdin" as an example, improve the framework to
prepare tests to pretend to be in the future where the breaking
changes have already happened.
* jc/name-rev-stdin:
name-rev: remove "--stdin" support
t6120: further modernize
t6120: avoid hiding "git" exit status
t: introduce WITH_BREAKING_CHANGES prerequisite
t: extend test_lazy_prereq
t: document test_lazy_prereq
Junio C Hamano [Sat, 29 Mar 2025 07:39:10 +0000 (16:39 +0900)]
Merge branch 'en/random-cleanups'
Miscellaneous code clean-ups.
* en/random-cleanups:
merge-ort: remove extraneous word in comment
merge-ort: fix accidental strset<->strintmap
t7615: be more explicit about diff algorithm used
t6423: fix a comment that accidentally reversed two commits
stash: remove merge-recursive.h include
Junio C Hamano [Sat, 29 Mar 2025 07:39:10 +0000 (16:39 +0900)]
Merge branch 'jk/use-wunreachable-code-for-devs'
Enable -Wunreachable-code for developer builds.
* jk/use-wunreachable-code-for-devs:
config.mak.dev: enable -Wunreachable-code
git-compat-util: add NOT_CONSTANT macro and use it in atfork_prepare()
run-command: use errno to check for sigfillset() error
Junio C Hamano [Sat, 29 Mar 2025 07:39:08 +0000 (16:39 +0900)]
Merge branch 'jk/fetch-ref-prefix-cleanup'
In protocol v2 where the refs advertisement is constrained, we try
to tell the server side not to limit the advertisement when there
is no specific need to, which has been the source of confusion and
recent bugs. Revamp the logic to simplify.
* jk/fetch-ref-prefix-cleanup:
fetch: use ref prefix list to skip ls-refs
fetch: avoid ls-refs only to ask for HEAD symref update
fetch: stop protecting additions to ref-prefix list
fetch: ask server to advertise HEAD for config-less fetch
refspec_ref_prefixes(): clean up refspec_item logic
t5516: beef up exact-oid ref prefixes test
t5516: drop NEEDSWORK about v2 reachability behavior
t5516: prefer "oid" to "sha1" in some test titles
t5702: fix typo in test name
First step of deprecating and removing merge-recursive.
* en/merge-ort-prepare-to-remove-recursive:
am: switch from merge_recursive_generic() to merge_ort_generic()
merge-ort: fix merge.directoryRenames=false
t3650: document bug when directory renames are turned off
merge-ort: support having merge verbosity be set to 0
merge-ort: allow rename detection to be disabled
merge-ort: add new merge_ort_generic() function
Junio C Hamano [Sat, 29 Mar 2025 07:39:06 +0000 (16:39 +0900)]
Merge branch 'cc/signed-fast-export-import'
"git fast-export | git fast-import" learns to deal with commit and
tag objects with embedded signatures a bit better.
* cc/signed-fast-export-import:
fast-export, fast-import: add support for signed-commits
fast-export: do not modify memory from get_commit_buffer
git-fast-export.adoc: clarify why 'verbatim' may not be a good idea
fast-export: rename --signed-tags='warn' to 'warn-verbatim'
fast-export: fix missing whitespace after switch
git-fast-import.adoc: add missing LF in the BNF
Junio C Hamano [Wed, 26 Mar 2025 07:26:10 +0000 (16:26 +0900)]
Merge branch 'en/merge-process-renames-crash-fix'
The merge-recursive and merge-ort machinery crashed in corner cases
when certain renames are involved.
* en/merge-process-renames-crash-fix:
merge-ort: fix slightly overzealous assertion for rename-to-self
t6423: add a testcase causing a failed assertion in process_renames
Junio C Hamano [Wed, 26 Mar 2025 07:26:10 +0000 (16:26 +0900)]
Merge branch 'ua/some-builtins-wo-the-repository'
A handful of built-in command implementations have been rewritten
to use the repository instance supplied by git.c:run_builtin(), its
caller.
* ua/some-builtins-wo-the-repository:
builtin/checkout-index: stop using `the_repository`
builtin/for-each-ref: stop using `the_repository`
builtin/ls-files: stop using `the_repository`
builtin/pack-refs: stop using `the_repository`
builtin/send-pack: stop using `the_repository`
builtin/verify-commit: stop using `the_repository`
builtin/verify-tag: stop using `the_repository`
config: teach repo_config to allow `repo` to be NULL
Junio C Hamano [Wed, 26 Mar 2025 07:26:10 +0000 (16:26 +0900)]
Merge branch 'tb/refs-exclude-fixes'
The refname exclusion logic in the packed-ref backend has been
broken for some time, which confused upload-pack to advertise
different set of refs. This has been corrected.
Junio C Hamano [Wed, 26 Mar 2025 07:26:09 +0000 (16:26 +0900)]
Merge branch 'sj/ref-consistency-checks-more'
"git fsck" becomes more careful when checking the refs.
* sj/ref-consistency-checks-more:
builtin/fsck: add `git refs verify` child process
packed-backend: check whether the "packed-refs" is sorted
packed-backend: add "packed-refs" entry consistency check
packed-backend: check whether the refname contains NUL characters
packed-backend: add "packed-refs" header consistency check
packed-backend: check if header starts with "# pack-refs with: "
packed-backend: check whether the "packed-refs" is regular file
builtin/refs: get worktrees without reading head information
t0602: use subshell to ensure working directory unchanged
The CI setups of GitLab and GitHub use a common dependency management
script 'ci/install-dependencies.sh'. The script install the necessary
packages based on a combination of the "$distro" and "$jobname" env
variables.
The "$distro" variable is derived from the "CI_JOB_IMAGE" env variable
set by the CI configs. In the GitHub CI config, some of the jobs are
missing this variable. For the 'Documentation' job which depends on
'meson' being installed, this raises an error since the 'meson'
dependency is never installed.
Fix this by adding the 'CI_JOB_IMAGE' variable to all missing jobs. We
don't add it the windows jobs, since they manager their dependency as
part of the CI config and no further dependency management is needed.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jean-Noël Avila [Wed, 19 Mar 2025 08:16:23 +0000 (08:16 +0000)]
doc: apply new format to git-branch man page
- Switch the synopsis to a synopsis block which automatically
formats placeholders in italics and keywords in monospace
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine applies synopsis rules to
these spans.
Possible values for some variables, that were mentioned in the description
prose, are now made into enumerated list.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jean-Noël Avila [Wed, 19 Mar 2025 08:16:22 +0000 (08:16 +0000)]
completion: take into account the formatting backticks for options
With the modern formatting of the manpages, the options and commands are now
backticked in their definition lists. This patch updates the generation of
the completion list to take into account this new format.
The script `generate-configlist.sh` is updated to get rid of extraneous
commands and fit everything in a single sed script.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Christian Couder [Tue, 18 Mar 2025 11:00:08 +0000 (12:00 +0100)]
promisor-remote: compare remote names case sensitively
Because the "[remote "nick"] fetch = ..." configuration variables
have the nickname in the second part, the nicknames are case
sensitive, unlike the first and the third component (i.e.
"remote.origin.fetch" and "Remote.origin.FETCH" are the same thing,
but "remote.Origin.fetch" and "remote.origin.fetch" are different).
Let's follow the way Git works in general and compare the remote
names case sensitively when processing advertised remotes.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Christian Couder [Tue, 18 Mar 2025 11:00:07 +0000 (12:00 +0100)]
promisor-remote: fix possible issue when no URL is advertised
In the 'KnownUrl' case, in should_accept_remote(), let's check that
`remote_url` is not NULL before we use strcmp() to compare it with
the local URL. This could avoid crashes if a server starts to not
advertise any URL in the future.
If `remote_url` is NULL, we should reject the URL. Let's also warn in
this case because we warn otherwise when a remote is rejected to try
to help diagnose things at the end of the function.
And while we are checking that remote_url is not NULL and warning if
it is, it makes sense to also help diagnose the case where remote_url
is empty.
Also while at it, let's spell "URL" with uppercase letters in all the
warnings.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Christian Couder [Tue, 18 Mar 2025 11:00:06 +0000 (12:00 +0100)]
promisor-remote: fix segfault when remote URL is missing
Using strvec_push() to push `NULL` into a 'strvec' results in a
segfault, because `xstrdup(NULL)` crashes.
So when an URL is missing from the config, let's not push the remote
name and URL into the 'strvec's.
While at it, let's also not push them in case the URL is empty. It's
just not worth the trouble and it's consistent with how Git otherwise
treats missing and empty URLs in the same way.
Note that in case of missing or empty URL, Git uses the remote name to
fetch, which can work if the remote is on the same filesystem. So
configurations where the client, server and remote are all on the same
filesystem may need URLs to be configured even if they are the same as
the remote names. But this is a rare case, and the work around is easy
enough.
We leave improving the strvec API and/or xstrdup() for a future
separate effort.
While at it, let's also use git_config_get_string_tmp() instead of
git_config_get_string() to simplify memory management.
Helped-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Christian Couder [Tue, 18 Mar 2025 11:00:05 +0000 (12:00 +0100)]
t5710: arrange to delete the client before cloning
If `test_when_finished "rm -rf client"` is run after we clone, it
will not run if the clone failed, so the "client" directory might
not be removed at the end of the test.
`git clone` does try to remove the directory when it fails, but
let's be safe and try to protect against possibly weird clone
failures by moving `test_when_finished "rm -rf client"` before
the clone. It just makes more sense this way around.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Thu, 13 Mar 2025 02:46:41 +0000 (02:46 +0000)]
am: switch from merge_recursive_generic() to merge_ort_generic()
Switch from merge-recursive to merge-ort. Adjust the following
testcases due to the switch:
* t4151: This test left an untracked file in the way of the merge.
merge-recursive could only sometimes tell when untracked files were
in the way, and by the time it discovers others, it has already made
too many changes to back out of the merge. So, instead of writing the
results to e.g. 'file1' it would instead write them to
'file1~branch1'. This is confusing for users, because they might not
notice 'file1~branch1' and accidentally add and commit 'file1'.
In contrast, merge-ort correctly notices the file in the way before
making any changes and aborts. Since this test didn't care about the
file in the way, just remove it before calling git-am.
* t4255: Usage of merge-ort allows us to change two known failures into
successes.
* t6427: As noted a few commits ago, the choice of conflict label for
diff3 markers for the ancestor commit was previously handled by
merge-recursive.c rather than by callers. Since that has now changed,
`git am` needs to specify that label. Although the previous conflict
label ("constructed merge base") was already fairly somewhat slanted
towards `git am`, let's use wording more along the lines of the
related command-line flag from `git apply` and function involved to
tie it more closely to `git am`.
Signed-off-by: Elijah Newren <newren@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Thu, 13 Mar 2025 02:46:40 +0000 (02:46 +0000)]
merge-ort: fix merge.directoryRenames=false
There are two issues here.
First, when merge.directoryRenames is set to false, there are a few code
paths that should be turned off. I missed one; collect_renames() was
still doing some directory rename detection logic unconditionally. It
ended up not having much effect because
get_provisional_directory_renames() was skipped earlier and not setting
up renames->dir_renames, but the code should still be skipped.
Second, the larger issue is that sometimes we get a cached_pair rename
from a previous commit being replayed mapping A->B, but in a subsequent
commit but collect_merge_info() doesn't even recurse into the
directory containing B because there are no source pairings for that
rename that are relevant; we can merge that commit fine without knowing
the rename. But since the cached renames are added to the normal
renames, when we go to process it and find that B is not part of
opt->priv->paths, we hit the assertion error
process_renames: Assertion `newinfo && ~newinfo->merged.clean` failed.
I think we could fix this at the beginning of detect_regular_renames() by
pruning from cached_pairs any entry whose destination isn't in
opt->priv->paths, but it's suboptimal in that we'd kind of like the
cached_pair to be restored afterwards so that it can help the subsequent
commit, but more importantly since it sits at the intersection of
the caching renames optimization and the relevant renames optimization,
and the trivial directory resolution optimization, and I don't currently
have Documentation/technical/remembering-renames.txt fully paged in, I'm
not sure if that's a full solution or a bandaid for the current
testcase. However, since the remembering renames optimization was the
weakest of the set, and the optimization is far less important when
directory rename detection is off (as that implies far fewer potential
renames), let's just use a bigger hammer to ensure this special case is
fixed: turn off the rename caching. We do the same thing already when
we encounter rename/rename(1to1) cases (as per `git grep -3
disabling.the.optimization`, though it uses a slightly different
triggering mechanism since it's trying to affect the next time that
merge_check_renames_reusable() is called), and I think it makes sense
to do the same here.
Signed-off-by: Elijah Newren <newren@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is quite a curious bug: the same test case will succeed, without any
assertion, if instead run with `merge.directoryRenames=true`.
Further, the assertion does not manifest while replaying the first
commit, it manifests while replaying the _second_ commit of the commit
range. But it does _not_ manifest when the second commit is replayed
individually.
This would indicate that there is an incomplete rename cache left-over
from the first replayed commit which is being reused for the second
commit, and if directory rename detection is enabled, the missing paths
are somehow regenerated.
Incidentally, the same bug can by triggered by modifying t6429 to switch
from merge.directoryRenames=true to merge.directoryRenames=false.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
[en: tweaked the commit message slightly, including adjusting the
line number of the assertion to the latest version, and the much
later discovery that a simple t6429 tweak would also display the
issue.] Signed-off-by: Elijah Newren <newren@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Thu, 13 Mar 2025 02:46:38 +0000 (02:46 +0000)]
merge-ort: support having merge verbosity be set to 0
Various callers such as am & checkout set the merge verbosity to 0 to
avoid having conflict messages printed. While this could be achieved by
avoiding the wrappers from merge-ort-wrappers and instead passing 0 for
display_update_msgs to merge_switch_to_result(), for simplicity of
converting callers simply allow them to also achieve this with the
merge-ort-wrappers by setting verbosity to 0.
Signed-off-by: Elijah Newren <newren@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Thu, 13 Mar 2025 02:46:37 +0000 (02:46 +0000)]
merge-ort: allow rename detection to be disabled
When merge-ort was written, I did not at first allow rename detection to
be disabled, because I suspected that most folks disabling rename
detection were doing so solely for performance reasons. Since I put a
lot of working into providing dramatic speedups for rename detection
performance as used by the merge machinery, I wanted to know if there
were still real world repositories where rename detection was
problematic from a performance perspective. We have had years now to
collect such information, and while we never received one, waiting
longer with the option disabled seems unlikely to help surface such
issues at this point. Also, there has been at least one request to
allow rename detection to be disabled for behavioral rather than
performance reasons (see the thread including
https://lore.kernel.org/git/CABPp-BG-Nx6SCxxkGXn_Fwd2wseifMFND8eddvWxiZVZk0zRaA@mail.gmail.com/
), so let's start heeding the config and command line settings.
Signed-off-by: Elijah Newren <newren@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Thu, 13 Mar 2025 02:46:36 +0000 (02:46 +0000)]
merge-ort: add new merge_ort_generic() function
merge-recursive.[ch] have three entry points:
* merge_trees()
* merge_recursive()
* merge_recursive_generic()
merge-ort*.[ch] only has equivalents for the first two. Add an
equivalent for the final entry point, so we can switch callers to
use it and remove merge-recursive.[ch].
While porting it over, finally fix the issue with the label for the
ancestor (used when merge.conflictStyle=diff3 as a conflict label).
merge-recursive.c has traditionally not allowed callers to set that
label, but I have found that problematic for years.
(Side note: This function was initially part of the merge-ort rewrite,
but reviewers questioned the ancestor label funnyness which I was
never really happy with anyway. It resulted in me jettisoning it and
hoping at the time that I would eventually be able to force the existing
callers to use some other API. That worked with `git stash`, as per 874cf2a60444 (stash: apply stash using 'merge_ort_nonrecursive()',
2022-05-10), but this API is the most reasonable one for `git am` and
`git merge-recursive`, if we can just allow them some freedom over the
ancestor label.)
The merge_recursive_generic() function did not know whether it was being
invoked by `git stash`, `git merge-recursive`, or `git am`, and the
choice of meaningful ancestor label, when there is a unique ancestor,
varies for these different callers:
* git am: ancestor is a constructed "fake ancestor" that user knows
nothing about and has no access to. (And is different than
the normal thing we mean by a "virtual merge base" which is
the merging of merge bases.)
* git merge-recursive: ancestor might be a tree, but at least it
was one specified by the user (if they invoked
merge-recursive directly)
* git stash: ancestor was the commit serving as the stash base
Thus, using a label like "constructed merge base" (as
merge_recursive_generic() does) presupposes that `git am` is the only
caller; it is incorrect for other callers. This label has thrown me off
more than once. Allow the caller to override when there is a unique
merge base.
Signed-off-by: Elijah Newren <newren@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adam Johnson [Thu, 1 Jun 2023 21:14:57 +0000 (21:14 +0000)]
doc: restore: remove note on --patch w/ pathspecs
This note was added to the restore command docs in 46e91b663b
(checkout: split part of it to new command 'restore', 2019-04-25),
but it is now inaccurate. The underlying builtin `add -i` implementation,
made default in 0527ccb1b5 (add -i: default to the built-in implementation,
2021-11-30), supports pathspecs, so `git restore -p <pathspec>...` has
worked for all users since then. I bisected to verify this was the commit
that added support.
Signed-off-by: Adam Johnson <me@adamj.eu> Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In that case it was found by Coverity, but finding it earlier saves
everybody time and effort.
We can use -Wunreachable-code to get some help from the compiler here.
Interestingly, this is a noop in gcc. It was a real warning up until gcc
4.x, when it was removed for being too flaky, but they left the
command-line option to avoid breaking users. See:
However, clang does implement this option, and it finds the case
mentioned above (and no other cases within the code base). And since we
run clang in several of our CI jobs, that's enough to get an early
warning of breakage.
We could enable it only for clang, but since gcc is happy to ignore it,
it's simpler to just turn it on for all developer builds.
Signed-off-by: Jeff King <peff@peff.net>
[jc: squashed meson.build change sent by Patrick] Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Mon, 17 Mar 2025 23:53:28 +0000 (16:53 -0700)]
git-compat-util: add NOT_CONSTANT macro and use it in atfork_prepare()
Our hope is that the number of code paths that falsely trigger
warnings with the -Wunreachable-code compilation option are small,
and they can be worked around case-by-case basis, like we just did
in the previous commit. If we need such a workaround a bit more
often, however, we may benefit from a more generic and descriptive
facility that helps document the cases we need such workarounds.
Side note: if we need the workaround all over the place, it
simply means -Wunreachable-code is not a good tool for us to
save engineering effort to catch mistakes. We are still
exploring if it helps us, so let's assume that it is not the
case.
Introduce NOT_CONSTANT() macro, with which, the developer can tell
the compiler:
Do not optimize this expression out, because, despite whatever
you are told by the system headers, this expression should *not*
be treated as a constant.
and use it as a replacement for the workaround we used that was
somewhat specific to the sigfillset case. If the compiler already
knows that the call to sigfillset() cannot fail on a particular
platform it is compiling for and declares that the if() condition
would not hold, it is plausible that the next version of the
compiler may learn that sigfillset() that never fails would not
touch errno and decide that in this sequence:
errno = 0;
sigfillset(&all)
if (errno)
die_errno("sigfillset");
the if() statement will never trigger. Marking that the value
returned by sigfillset() cannot be a constant would document our
intention better and would not break with such a new version of
compiler that is even more "clever". With the marco, the above
sequence can be rewritten:
if (NOT_CONSTANT(sigfillset(&all)))
die_errno("sigfillset");
which looks almost like other innocuous annotations we have,
e.g. UNUSED.
Jeff King [Mon, 17 Mar 2025 23:53:27 +0000 (16:53 -0700)]
run-command: use errno to check for sigfillset() error
Since enabling -Wunreachable-code, builds with clang on macOS now fail,
complaining that the die_errno() call in:
if (sigfillset(&all))
die_errno("sigfillset");
is unreachable. On that platform the manpage documents that sigfillset()
always returns success, and presumably the implementation is a macro or
inline function that does so in a way that is transparent to the
compiler.
But we should continue to check on other platforms, since POSIX says it
may return an error.
We could solve this with a compile-time knob to split the two cases
(assuming success on macOS and checking for the error elsewhere). But we
can also work around it more directly by relying on errno to check the
outcome (since POSIX dictates that errno will be set on error). And that
works around the compiler's cleverness, since it doesn't know the
semantics of errno (though I suppose if sigfillset() is simple enough,
it could perhaps realize that no writes to errno are possible; however
this does seem to work in practice).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Sun, 16 Mar 2025 06:58:58 +0000 (06:58 +0000)]
merge-ort: fix accidental strset<->strintmap
Both strset_for_each_entry and strintmap_for_each_entry are macros that
evaluate to the same thing, so they are technically interchangeable.
However, the intent is that we use the one matching the variable type we
are passing. Unfortunately, I somehow mistakenly got one of these wrong
in 7bee6c100431 (merge-ort: avoid recursing into directories when we
don't need to, 2021-07-16) -- possibly related to the fact that
relevant_sources was initially a strset and later refactored into a
strintmap. Correct which macro we use.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Sun, 16 Mar 2025 06:58:57 +0000 (06:58 +0000)]
t7615: be more explicit about diff algorithm used
t7615 is entirely about testing the differences about different
diff algorithms, but it doesn't specify any diff algorithm when it
is testing myers. Given that we have discussed potentially switching
defaults (https://lore.kernel.org/git/xmqqed873vgn.fsf@gitster.g/), it
makes sense in tests that are about different diff algorithms to be
explicitly about which one is intended to be used in each test. Add
that specificity.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Sun, 16 Mar 2025 06:58:55 +0000 (06:58 +0000)]
stash: remove merge-recursive.h include
stash was modified to use merge_ort_nonrecursive() instead of
merge_recursive_generic() back in commit 874cf2a60444 (stash: apply
stash using 'merge_ort_nonrecursive()', 2022-05-10). That makes the
inclusion of merge-recursive.h unnecessary. In preparation for the
removal of merge-recursive.h, remove the unnecessary include.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Eli Schwartz [Sun, 16 Mar 2025 06:06:05 +0000 (02:06 -0400)]
meson: fix perl detection when docs are enabled, but perl bindings aren't
The `perl` variable in meson.build is assigned to a program lookup,
which may have the value "not-found object" if configuring with
`-Dperl=disabled`.
There is already a list of other cases where we do need a perl command,
even when not building perl bindings. Building documentation should be
one of those cases, but was missing from the list. Add it.
Fixes:
```
$ meson setup builddir/ -Ddocs=man -Dperl=disabled -Dtests=false
[...]
Documentation/meson.build:308:22: ERROR: Tried to use not-found external program in "command"
```
Bug: https://bugs.gentoo.org/949247 Signed-off-by: Eli Schwartz <eschwartz@gentoo.org> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Sat, 15 Mar 2025 01:08:13 +0000 (01:08 +0000)]
diffcore-rename: fix BUG when break detection and --follow used together
Prior to commit 9db2ac56168e (diffcore-rename: accelerate rename_dst
setup, 2020-12-11), the function add_rename_dst() resulted in quadratic
runtime since each call inserted the new entry into the array in sorted
order. The reason for the sorted order requirement was so that
locate_rename_dst(), used when break detection is turned on, could find
the appropriate entry in logarithmic time via bisection on string
comparisons. (It's better to be quadratic in moving pointers than
quadratic in string comparisons, so this made some sense.) However,
since break detection always sticks the broken pairs adjacent to each
other, that commit decided to simply append entries to rename_dst, and
record the mapping of (filename) -> (index within rename_dst) via a
strintmap. Doing this relied on the fact that when adding the source of
a broken pair via register_rename_src(), that the next item we'd process
was the other half of the same broken pair and would be added to
rename_dst via add_rename_dst(). This assumption was fine under break
detection alone, but the combination of break detection and
single_follow violated that assumption because of this code:
else if (options->single_follow &&
strcmp(options->single_follow, p->two->path))
continue; /* not interested */
which would end up skipping calling add_rename_dst() below that point.
Since I knew I was assuming that the dst pair of a break would always be
added right after the src pair of a break, I added a new BUG() directive
as part of that commit later on at time of use that would check my
assumptions held. That BUG() didn't trip for nearly 4 years...which
sadly meant I had long since forgotten the related details. Anyway...
When the dst half of a broken pair is skipped like this, it means that
not only could my recorded index be invalid (just past the end of the
array), it could also point to some unrelated dst that just happened to
be the next one added to the array. So, to fix this, we need to add a
little more safety around the checks for the recorded break_idx.
It turns out that making a testcase to trigger this is quite the
challenge. I actually added two testscases:
* One testcase which uses --follow incorrectly (it uses its single
pathspec to specifying something other than a single filename), and
which triggers the same bug reported-by Olaf. This triggers a
special case within locate_rename_dst() where idx evaluates to 0
and rename_dst is NULL, meaning that our return value of
&rename_dst[idx] happens to evaluate to NULL as well. This
addressing of an index into a NULL array hints at deeper problems,
which are raised in the next testcase...
* A second testcase which when run under valgrind shows that the code
actually depends upon unintialized memory, in particular the entry
just after the end of the rename_dst array.
In short, when the two rare options -B and --follow are used together,
fix the accidental find of the wrong dst entry (which would often be
uninitialized memory just past the end of the array, but also could
have just been a dst for an unrelated path if no dst was recorded for
the expected path). Do so by adding a little more care around checking
the recorded indices in break_idx.
Reported-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Fri, 14 Mar 2025 22:00:42 +0000 (23:00 +0100)]
xdiff: avoid arithmetic overflow in xdl_get_hunk()
xdl_get_hunk() calculates the maximum number of common lines between two
changes that would fit into the same hunk for the given context options.
It involves doubling and addition and thus can overflow if the terms are
huge.
The type of ctxlen and interhunkctxlen in xdemitconf_t is long, while
the type of the corresponding context and interhunkcontext in struct
diff_options is int. On many platforms longs are bigger that ints,
which prevents the overflow. On Windows they have the same range and
the overflow manifests as hunks that are split erroneously and lines
being repeated between them.
Fix the overflow by checking and not going beyond LONG_MAX. This allows
specifying a huge context line count and getting all lines of a changed
files in a single hunk, as expected.
Reported-by: Jason Cho <jason11choca@proton.me> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 13 Mar 2025 18:09:47 +0000 (14:09 -0400)]
builtin/pack-objects.c: freshen objects from existing cruft packs
Once an object is written into a cruft pack, we can only freshen it by
writing a new loose or packed copy of that object with a more recent
mtime.
Prior to 61568efa95 (builtin/pack-objects.c: support `--max-pack-size`
with `--cruft`, 2023-08-28), we typically had at most one cruft pack in
a repository at any given time. So freshening unreachable objects was
straightforward when already rewriting the cruft pack (and its *.mtimes
file).
But 61568efa95 changes things: 'pack-objects' now supports writing
multiple cruft packs when invoked with `--cruft` and the
`--max-pack-size` flag. Cruft packs are rewritten until they reach some
size threshold, at which point they are considered "frozen", and will
only be modified in a pruning GC, or if the threshold itself is
adjusted.
Prior to this patch, however, this process breaks down when we attempt
to freshen an object packed in an earlier cruft pack, and that cruft
pack is larger than the threshold and thus will survive the repack.
When this is the case, it is impossible to freshen objects in cruft
pack(s) when those cruft packs are larger than the threshold. This is
because we would avoid writing them in the new cruft pack entirely, for
a couple of reasons.
1. When enumerating packed objects via 'add_objects_in_unpacked_packs()'
we pass the SKIP_IN_CORE_KEPT_PACKS, which is used to avoid looping
over the packs we're going to retain (which are marked as kept
in-core by 'read_cruft_objects()').
This means that we will avoid enumerating additional packed copies
of objects found in any cruft packs which are larger than the given
size threshold. Thus there is no opportunity to call
'create_object_entry()' whatsoever.
2. We likewise will discard the loose copy (if one exists) of any
unreachable object packed in a cruft pack that is larger than the
threshold. Here our call path is 'add_unreachable_loose_objects()',
which uses the 'add_loose_object()' callback.
That function will eventually land us in 'want_object_in_pack()'
(via 'add_cruft_object_entry()'), and we'll discard the object as it
appears in one of the packs which we marked as kept in-core.
This means in effect that it is impossible to freshen an unreachable
object once it appears in a cruft pack larger than the given threshold.
Instead, we should pack an additional copy of an unreachable object we
want to freshen even if it appears in a cruft pack, provided that the
cruft copy has an mtime which is before the mtime of the copy we are
trying to pack/freshen. This is sub-optimal in the sense that it
requires keeping an additional copy of unreachable objects upon
freshening, but we don't have a better alternative without the ability
to make in-place modifications to existing *.mtimes files.
In order to implement this, we have to adjust the behavior of
'want_found_object()'. When 'pack-objects' is told that we're *not*
going to retain any cruft packs (i.e. the set of packs marked as kept
in-core does not contain a cruft pack), the behavior is unchanged.
But when there *is* at least one cruft pack that we're holding onto, it
is no longer sufficient to reject a copy of an object found in that
cruft pack for that reason alone. In this case, we only want to reject a
candidate object when copies of that object either:
- exists in a non-cruft pack that we are retaining, regardless of that
pack's mtime, or
- exists in a cruft pack with an mtime at least as recent as the copy
we are debating whether or not to pack, in which case freshening
would be redundant.
To do this, keep track of whether or not we have any cruft packs in our
in-core kept list with a new 'ignore_packed_keep_in_core_has_cruft'
flag. When we end up in this new special case, we replace a call to
'has_object_kept_pack()' to 'want_cruft_object_mtime()', and only reject
objects when we have a copy in an existing cruft pack with at least as
recent an mtime as our candidate (in which case "freshening" would be
redundant).
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs: reuse iterators when determining refname availability
When verifying whether refnames are available we have to verify whether
any reference exists that is nested under the current reference. E.g.
given a reference "refs/heads/foo", we must make sure that there is no
other reference "refs/heads/foo/*".
This check is performed using a ref iterator with the prefix set to the
nested reference namespace. Until now it used to not be possible to
reseek iterators, so we always had to reallocate the iterator for every
single reference we're about to check. This keeps us from reusing state
that the iterator may have and that may make it work more efficiently.
Refactor the logic to reseek iterators. This leads to a sizeable speedup
with the "reftable" backend:
Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
Time (mean ± σ): 39.8 ms ± 0.9 ms [User: 29.7 ms, System: 9.8 ms]
Range (min … max): 38.4 ms … 42.0 ms 62 runs
Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
Time (mean ± σ): 31.9 ms ± 1.1 ms [User: 27.0 ms, System: 4.5 ms]
Range (min … max): 29.8 ms … 34.3 ms 74 runs
Summary
update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
1.25 ± 0.05 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
The "files" backend doesn't really show a huge impact:
Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
Time (mean ± σ): 392.3 ms ± 7.1 ms [User: 59.7 ms, System: 328.8 ms]
Range (min … max): 384.6 ms … 404.5 ms 10 runs
Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
Time (mean ± σ): 387.7 ms ± 7.4 ms [User: 54.6 ms, System: 329.6 ms]
Range (min … max): 377.0 ms … 397.7 ms 10 runs
Summary
update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
1.01 ± 0.03 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
This is mostly because it is way slower to begin with because it has to
create a separate file for each new reference, so the milliseconds we
shave off by reseeking the iterator doesn't really translate into a
significant relative improvement.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs/iterator: implement seeking for files iterators
Implement seeking for "files" iterators. As we simply use a ref-cache
iterator under the hood the implementation is straight-forward. Note
that we do not implement seeking on reflog iterators, same as with the
"reftable" backend.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs/iterator: implement seeking for packed-ref iterators
Implement seeking of `packed-ref` iterators. The implementation is again
straight forward, except that we cannot continue to use the prefix
iterator as we would otherwise not be able to reseek the iterator
anymore in case one first asks for an empty and then for a non-empty
prefix. Instead, we open-code the logic to in `advance()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs/iterator: implement seeking for ref-cache iterators
Implement seeking of ref-cache iterators. This is done by splitting most
of the logic to seek iterators out of `cache_ref_iterator_begin()` and
putting it into `cache_ref_iterator_seek()` so that we can reuse the
logic.
Note that we cannot use the optimization anymore where we return an
empty ref iterator when there aren't any references, as otherwise it
wouldn't be possible to reseek the iterator to a different prefix that
may exist. This shouldn't be much of a performance concern though as we
now start to bail out early in case `advance()` sees that there are no
more directories to be searched.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs/iterator: implement seeking for reftable iterators
Implement seeking of reftable iterators. As the low-level reftable
iterators already support seeking this change is straight-forward. Two
notes though:
- We do not support seeking on reflog iterators. It is unclear what
seeking would even look like in this context, as you typically would
want to seek to a specific entry in the reflog for a specific ref.
There is currently no use case for this, but if one arises in the
future, we can still implement seeking at that later point.
- We start to check whether `reftable_stack_init_ref_iterator()` is
successful.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs/iterator: implement seeking for merged iterators
Implement seeking on merged iterators. The implementation is rather
straight forward, with the only exception that we must not deallocate
the underlying iterators once they have been exhausted.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs/iterator: provide infrastructure to re-seek iterators
Reftable iterators need to be scrapped after they have either been
exhausted or aren't useful to the caller anymore, and it is explicitly
not possible to reuse them for iterations. But enabling for reuse of
iterators may allow us to tune them by reusing internal state of an
iterator. The reftable iterators for example can already be reused
internally, but we're not able to expose this to any users outside of
the reftable backend.
Introduce a new `.seek` function in the ref iterator vtable that allows
callers to seek an iterator multiple times. It is expected to be
functionally the same as calling `refs_ref_iterator_begin()` with a
different (or the same) prefix.
Note that it is not possible to adjust parameters other than the seeked
prefix for now, so exclude patterns, trimmed prefixes and flags will
remain unchanged. We do not have a usecase for changing these parameters
right now, but if we ever find one we can adapt accordingly.
Implement the callback for trivial cases. The other iterators will be
implemented in subsequent commits.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ref and reflog iterators have their lifecycle attached to iteration:
once the iterator reaches its end, it is automatically released and the
caller doesn't have to care about that anymore. When the iterator should
be released before it has been exhausted, callers must explicitly abort
the iterator via `ref_iterator_abort()`.
This lifecycle is somewhat unusual in the Git codebase and creates two
problems:
- Callsites need to be very careful about when exactly they call
`ref_iterator_abort()`, as calling the function is only valid when
the iterator itself still is. This leads to somewhat awkward calling
patterns in some situations.
- It is impossible to reuse iterators and re-seek them to a different
prefix. This feature isn't supported by any iterator implementation
except for the reftable iterators anyway, but if it was implemented
it would allow us to optimize cases where we need to search for
specific references repeatedly by reusing internal state.
Detangle the lifecycle from iteration so that we don't deallocate the
iterator anymore once it is exhausted. Instead, callers are now expected
to always call a newly introduce `ref_iterator_free()` function that
deallocates the iterator and its internal state.
Note that the `dir_iterator` is somewhat special because it does not
implement the `ref_iterator` interface, but is only used to implement
other iterators. Consequently, we have to provide `dir_iterator_free()`
instead of `dir_iterator_release()` as the allocated structure itself is
managed by the `dir_iterator` interfaces, as well, and not freed by
`ref_iterator_free()` like in all the other cases.
While at it, drop the return value of `ref_iterator_abort()`, which
wasn't really required by any of the iterator implementations anyway.
Furthermore, stop calling `base_ref_iterator_free()` in any of the
backends, but instead call it in `ref_iterator_free()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs: stop re-verifying common prefixes for availability
One of the checks done by `refs_verify_refnames_available()` is whether
any of the prefixes of a reference already exists. For example, given a
reference "refs/heads/main", we'd check whether "refs/heads" or "refs"
already exist, and if so we'd abort the transaction.
When updating multiple references at once, this check is performed for
each of the references individually. Consequently, because references
tend to have common prefixes like "refs/heads/" or refs/tags/", we
evaluate the availability of these prefixes repeatedly. Naturally this
is a waste of compute, as the availability of those prefixes should in
general not change in the middle of a transaction. And if it would,
backends would notice at a later point in time.
Optimize this pattern by storing prefixes in a `strset` so that we can
trivially track those prefixes that we have already checked. This leads
to a significant speedup with the "reftable" backend when creating many
references that all share a common prefix:
Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
Time (mean ± σ): 63.1 ms ± 1.8 ms [User: 41.0 ms, System: 21.6 ms]
Range (min … max): 60.6 ms … 69.5 ms 38 runs
Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
Time (mean ± σ): 40.0 ms ± 1.3 ms [User: 29.3 ms, System: 10.3 ms]
Range (min … max): 38.1 ms … 47.3 ms 61 runs
Summary
update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
1.58 ± 0.07 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
For the "files" backend we see an improvement, but a much smaller one:
Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
Time (mean ± σ): 395.8 ms ± 5.3 ms [User: 63.6 ms, System: 330.5 ms]
Range (min … max): 387.0 ms … 404.6 ms 10 runs
Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
Time (mean ± σ): 386.0 ms ± 4.0 ms [User: 51.5 ms, System: 332.8 ms]
Range (min … max): 380.8 ms … 392.6 ms 10 runs
Summary
update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
1.03 ± 0.02 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
This change also leads to a modest improvement when writing references
with "initial" semantics, for example when migrating references. The
following benchmarks are migrating 1m references from the "reftable" to
the "files" backend:
Benchmark 1: migrate reftable:files (refcount = 1000000, revision = HEAD~)
Time (mean ± σ): 836.6 ms ± 5.6 ms [User: 645.2 ms, System: 185.2 ms]
Range (min … max): 829.6 ms … 845.9 ms 10 runs
Benchmark 2: migrate reftable:files (refcount = 1000000, revision = HEAD)
Time (mean ± σ): 759.8 ms ± 5.1 ms [User: 574.9 ms, System: 178.9 ms]
Range (min … max): 753.1 ms … 768.8 ms 10 runs
Summary
migrate reftable:files (refcount = 1000000, revision = HEAD) ran
1.10 ± 0.01 times faster than migrate reftable:files (refcount = 1000000, revision = HEAD~)
And vice versa:
Benchmark 1: migrate files:reftable (refcount = 1000000, revision = HEAD~)
Time (mean ± σ): 870.7 ms ± 5.7 ms [User: 735.2 ms, System: 127.4 ms]
Range (min … max): 861.6 ms … 883.2 ms 10 runs
Benchmark 2: migrate files:reftable (refcount = 1000000, revision = HEAD)
Time (mean ± σ): 799.1 ms ± 8.5 ms [User: 661.1 ms, System: 130.2 ms]
Range (min … max): 787.5 ms … 812.6 ms 10 runs
Summary
migrate files:reftable (refcount = 1000000, revision = HEAD) ran
1.09 ± 0.01 times faster than migrate files:reftable (refcount = 1000000, revision = HEAD~)
The impact here is significantly smaller given that we don't perform any
reference reads with "initial" semantics, so the speedup only comes from
us doing less string list lookups.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs/files: batch refname availability checks for initial transactions
The "files" backend explicitly carves out special logic for its initial
transaction so that it can avoid writing out every single reference as
a loose reference. While the assumption is that there shouldn't be any
preexisting references, we still have to verify that none of the newly
written references will conflict with any other new reference in the
same transaction.
Refactor the initial transaction to use batched refname availability
checks. This does not yet have an effect on performance as we still call
`refs_verify_refname_available()` in a loop. But this will change in
subsequent commits and then impact performance when cloning a repository
with many references or when migrating references to the "files" format.
This will improve performance when cloning a repository with many
references or when migrating references from any format to the "files"
format once the availability checks have learned to optimize checks for
many references in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs/files: batch refname availability checks for normal transactions
Same as the "reftable" backend that we have adapted in the preceding
commit to use batched refname availability checks we can also do so for
the "files" backend. Things are a bit more intricate here though, as we
call `refs_verify_refname_available()` in a set of different contexts:
1. `lock_raw_ref()` when it hits either EEXISTS or EISDIR when creating
a new reference, mostly to create a nice, user-readable error
message. This is nothing we have to care about too much, as we only
hit this code path at most once when we hit a conflict.
2. `lock_raw_ref()` when it _could_ create the lockfile to check
whether it is conflicting with any packed refs. In the general case,
this code path will be hit once for every (successful) reference
update.
3. `lock_ref_oid_basic()`, but it is only executed when copying or
renaming references or when expiring reflogs. It will thus not be
called in contexts where we have many references queued up.
4. `refs_refname_ref_available()`, but again only when copying or
renaming references. It is thus not interesting due to the same
reason as the previous case.
5. `files_transaction_finish_initial()`, which is only executed when
creating a new repository or migrating references.
So out of these, only (2) and (5) are viable candidates to use the
batched checks.
Adapt `lock_raw_ref()` accordingly by queueing up reference names that
need to be checked for availability and then checking them after we have
processed all updates. This check is done before we (optionally) lock
the `packed-refs` file, which is somewhat flawed because it means that
the `packed-refs` could still change after the availability check and
thus create an undetected conflict. But unconditionally locking the file
would change semantics that users are likely to rely on, so we keep the
current locking sequence intact, even if it's suboptmial.
The refactoring of `files_transaction_finish_initial()` will be done in
the next commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the "reftable" backend to batch the availability check for
refnames. This does not yet have an effect on performance as
`refs_verify_refnames_available()` effectively still performs the
availability check for each refname individually. But this will be
optimized in subsequent commits, where we learn to optimize some parts
of the logic when checking multiple refnames for availability.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs: introduce function to batch refname availability checks
The `refs_verify_refname_available()` functions checks whether a
reference update can be committed or whether it would conflict with
either a prefix or suffix thereof. This function needs to be called once
per reference that one wants to check, which requires us to redo a
couple of checks every time the function is called.
Introduce a new function `refs_verify_refnames_available()` that does
the same, but for a list of references. For now, the new function uses
the exact same implementation, except that we loop through all refnames
provided by the caller. This will be tuned in subsequent commits.
The existing `refs_verify_refname_available()` function is reimplemented
on top of the new function. As such, the diff is best viewed with the
`--ignore-space-change option`.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/update-ref: skip ambiguity checks when parsing object IDs
Most of the commands in git-update-ref(1) accept an old and/or new
object ID to update a specific reference to. These object IDs get parsed
via `repo_get_oid()`, which not only handles plain object IDs, but also
those that have a suffix like "~" or "^2". More surprisingly though, it
even knows to resolve arbitrary revisions, despite the fact that its
manpage does not mention this fact even once.
One consequence of this is that we also check for ambiguous references:
when parsing a full object ID where the DWIM mechanism would also cause
us to resolve it as a branch, we'd end up printing a warning. While this
check makes sense to have in general, it is arguably less useful in the
context of git-update-ref(1). This is due to multiple reasons:
- The manpage is explicitly structured around object IDs. So if we see
a fully blown object ID, the intent should be quite clear in
general.
- The command is part of our plumbing layer and not a tool that users
would generally use in interactive workflows. As such, the warning
will likely not be visible to anybody in the first place.
- Users can and should use the fully-qualified refname in case there
is any potential for ambiguity. And given that this command is part
of our plumbing layer, one should always try to be as defensive as
possible and use fully-qualified refnames.
Furthermore, this check can be quite expensive when updating lots of
references via `--stdin`, because we try to read multiple references per
object ID that we parse according to the DWIM rules. This effect can be
seen both with the "files" and "reftable" backend.
The issue is not unique to git-update-ref(1), but was also an issue in
git-cat-file(1), where it was addressed by disabling the ambiguity check
in 25fba78d36b (cat-file: disable object/refname ambiguity check for
batch mode, 2013-07-12).
Disable the warning in git-update-ref(1), which provides a significant
speedup with both backends. The user-visible outcome is unchanged even
when ambiguity exists, except that we don't show the warning anymore.
The following benchmark creates 10000 new references with a 100000
preexisting refs with the "files" backend:
Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
Time (mean ± σ): 467.3 ms ± 5.1 ms [User: 100.0 ms, System: 365.1 ms]
Range (min … max): 461.9 ms … 479.3 ms 10 runs
Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
Time (mean ± σ): 394.1 ms ± 5.8 ms [User: 63.3 ms, System: 327.6 ms]
Range (min … max): 384.9 ms … 405.7 ms 10 runs
Summary
update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
1.19 ± 0.02 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
And with the "reftable" backend:
Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
Time (mean ± σ): 146.9 ms ± 2.2 ms [User: 90.4 ms, System: 56.0 ms]
Range (min … max): 142.7 ms … 150.8 ms 19 runs
Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
Time (mean ± σ): 63.2 ms ± 1.1 ms [User: 41.0 ms, System: 21.8 ms]
Range (min … max): 61.1 ms … 66.6 ms 41 runs
Summary
update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
2.32 ± 0.05 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
Note that the absolute improvement with both backends is roughly in the
same ballpark, but the relative improvement for the "reftable" backend
is more significant because writing the new table to disk is faster in
the first place.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
object-name: allow skipping ambiguity checks in `get_oid()` family
When reading an object ID via `get_oid_basic()` or any of its related
functions we perform a check whether the object ID is ambiguous, which
can be the case when a reference with the same name exists. While the
check is generally helpful, there are cases where it only adds to the
runtime overhead without providing much of a benefit.
Add a new flag that allows us to disable the check. The flag will be
used in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new function `repo_get_oid_with_flags()`. This function
behaves the same as `repo_get_oid()`, except that it takes an extra
`flags` parameter that it ends up passing to `get_oid_with_context()`.
This function will be used in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
meson: don't install git-pack-redundant(1) docs with breaking changes
When breaking changes are enabled we continue to install documentation
of the git-pack-redundant(1) command even though it is completely
disabled and thus inaccessible. Improve this by only installing the
documentation in case breaking changes aren't enabled.
Based-on-patch-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
meson: don't compile git-pack-redundant(1) with breaking changes
We continue to compile the git-pack-redundant(1) builtin with Meson when
breaking changes are enabled even though we ultimately don't expose this
command at all. This is mostly harmless, but given that the intent of
the build option is to be as close as possible to the state where the
breaking change has been fully implemented this isn't optimal either.
Improve the situation by not compiling the builtin when breaking changes
are enabled.
Based-on-patch-by: Phillip Wood <phillip.wood123@gmail.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
meson: define WITH_BREAKING_CHANGES when enabling breaking changes
While Meson already supports the `-Dbreaking_changes=true` option, it
only wires up the build option that propagates into the tests. The build
option is only used for our tests to enable the `WITH_BREAKING_CHANGES`
prerequisite though, and does not influence the code that is actually
being built.
The omission went unnoticed because we only have tests right now that
get disabled when breaking changes are enabled, but not the other way
round. In other words, we don't have any tests that verify that breaking
changes behave as expected.
Fix the build issue by setting the `WITH_BREAKING_CHANGES` preprocessor
macro when breaking changes are enabled. Note that the `libgit_c_args`
array is defined after the current spot where we handle the option, so
to not have multiple sites where we handle it we instead move it after
the array has been defined.
Based-on-patch-by: Phillip Wood <phillip.wood123@gmail.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 11 Mar 2025 21:25:05 +0000 (14:25 -0700)]
name-rev: remove "--stdin" support
As part of Git 3.0, remove the hidden synonym for "--annotate-stdin"
for real. As this does not change the fact that it used to be
called "--stdin" in older version of Git, keep that passage in the
documentation for "--annotate-stdin".
Junio C Hamano [Tue, 11 Mar 2025 21:25:04 +0000 (14:25 -0700)]
t6120: further modernize
There is absolutely no reason why a pattern given to grep to find
'warning: --stdin is deprecated' must be quoted within a pair of
single quotes, or the pattern to look for the literal string as ERE.
Quote the test body with a pair of single quotes like everybody
else, and quote the needle string in a pair of double quotes. Also
use test_grep instead of "grep -E".
Junio C Hamano [Tue, 11 Mar 2025 21:25:02 +0000 (14:25 -0700)]
t: introduce WITH_BREAKING_CHANGES prerequisite
Earlier c5bc9a7f (Makefile: wire up build option for deprecated
features, 2025-01-22) made an unfortunate decision to introduce the
WITHOUT_BREAKING_CHANGES prerequisite to perform tests that ensure
the historical behaviour that may be different from what we will
have in the future. It would inevitably invite double-negation when
we need to add tests to ensure the behaviour we want to have in the
future.
Introduce WITH_BREAKING_CHANGES prerequisite and replace the
existing uses of WITHOUT_BREAKING_CHANGES prerequisite. To catch
any future topics that add more uses of WITHOUT_BREAKING_CHANGES,
mark it as a removed prerequisite.
dir.h: remove duplicate forward declaration of struct repository
The `struct repository;` forward declaration appears twice in `dir.h`:
once at line 10 and again at line 46. This duplication is unnecessary
and likely unintentional.
Removing the second declaration has no impact on compilation, as verified
by a clean build.
Signed-off-by: Abhijeetsingh Meena <abhijeet040403@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 11 Mar 2025 21:25:01 +0000 (14:25 -0700)]
t: extend test_lazy_prereq
Allow test_lazy_prereq script to signal a programming error by
exiting with status 125 (like how bisect scripts do). This is used
to signal a deprecated-and-then-removed prerequisite that should
never be used in tests anymore.