builtin/cat-file: support "blob:none" objects filter
Implement support for the "blob:none" filter in git-cat-file(1), which
causes us to omit all blobs.
Note that this new filter requires us to read the object type via
`oid_object_info_extended()` in `batch_object_write()`. But as we try to
optimize away reading objects from the database the `data->info.typep`
pointer may not be set. We thus have to adapt the logic to conditionally
set the pointer in cases where the filter is given.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/cat-file: wire up an option to filter objects
In batch mode, git-cat-file(1) enumerates all objects and prints them
by iterating through both loose and packed objects. This works without
considering their reachability at all, and consequently most options to
filter objects as they exist in e.g. git-rev-list(1) are not applicable.
In some situations it may still be useful though to filter objects based
on properties that are inherent to them. This includes the object size
as well as its type.
Such a filter already exists in git-rev-list(1) with the `--filter=`
command line option. While this option supports a couple of filters that
are not applicable to our usecase, some of them are quite a neat fit.
Wire up the filter as an option for git-cat-file(1). This allows us to
reuse the same syntax as in git-rev-list(1) so that we don't have to
reinvent the wheel. For now, we die when any of the filter options has
been passed by the user, but they will be wired up in subsequent
commits.
Further note that the filters that we are about to introduce don't
significantly speed up the runtime of git-cat-file(1). While we can skip
emitting a lot of objects in case they are uninteresting to us, the
majority of time is spent reading the packfile, which is bottlenecked by
I/O and not the processor. This will change though once we start to make
use of bitmaps, which will allow us to skip reading the whole packfile.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/cat-file: introduce function to report object status
We have multiple callsites that report the status of an object, for
example when the objec tis missing or its name is ambiguous. We're about
to add a couple more such callsites to report on "excluded" objects.
Prepare for this by introducing a new function `report_object_status()`
that encapsulates the functionality.
Note that this function also flushes stdout, which is a requirement so
that request-response style batched modes can learn about the status
before proceeding to the next object. We already flush correctly at all
existing callsites, even though the flush in `batch_one_object()` only
comes after the switch statement. That flush is now redundant, and we
could in theory deduplicate it by moving it into all branches that don't
use `report_object_status()`. But that doesn't quite feel sensible:
- The duplicate flush should ultimately just be a no-op for us and
thus shouldn't impact performance significantly.
- By keeping the flush in `report_object_status()` we ensure that all
future callers get semantics correct.
So let's just be pragmatic and live with the duplicated flush.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/cat-file: rename variable that tracks usage
The usage strings for git-cat-file(1) that we pass to `parse_options()`
and `usage_msg_optf()` are stored in a variable called `usage`. This
variable shadows the declaration of `usage()`, which we'll want to use
in a subsequent commit.
Rename the variable to `builtin_catfile_usage`, which is in line with
how the variable is typically called in other builtins.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 06c92dafb8 (Makefile: allow specifying a SHA-1 for non-cryptographic
uses, 2024-09-26), support for unsafe SHA-1 is added. Add the unsafe
SHA-1 build info to `git version --build-info` and update corresponding
documentation.
Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the `--build-options` flag is used with git-version(1), additional
information about the built version of Git is printed. During build
time, different SHA implementations may be configured, but this
information is not included in the version info.
Add the SHA implementations Git is built with to the version info by
requiring each backend to define a SHA1_BACKEND or SHA256_BACKEND symbol
as appropriate and use the value in the printed build options.
Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Mon, 7 Apr 2025 21:23:17 +0000 (14:23 -0700)]
Merge branch 'cc/lop-remote'
Bugfix in newly introduced large-object-promisor remote support.
* cc/lop-remote:
promisor-remote: compare remote names case sensitively
promisor-remote: fix possible issue when no URL is advertised
promisor-remote: fix segfault when remote URL is missing
t5710: arrange to delete the client before cloning
Junio C Hamano [Mon, 7 Apr 2025 21:23:17 +0000 (14:23 -0700)]
Merge branch 'jc/name-rev-stdin'
Using "git name-rev --stdin" as an example, improve the framework to
prepare tests to pretend to be in the future where the breaking
changes have already happened.
* jc/name-rev-stdin:
name-rev: remove "--stdin" support
t6120: further modernize
t6120: avoid hiding "git" exit status
t: introduce WITH_BREAKING_CHANGES prerequisite
t: extend test_lazy_prereq
t: document test_lazy_prereq
Mark Levedahl [Tue, 1 Apr 2025 03:01:02 +0000 (23:01 -0400)]
gitk: limit PATH search to bare executable names
The path search overrides used by gitk on Windows are applied to any
executable whose name is not 'absolute', meaning that
[exec foo/bar ...]
will search each element of $PATH to find one with subdirectory foo
containing bar. But, per POSIX, and Tcl implementation on all platforms,
foo/bar is taken as $(pwd)/foo/bar, and is not searched on $PATH.
Fix this descrepency using the same approach applied to git-gui in
commit 3f71c97e. The key is that the executable name must have no path
component, indicated by [file split $exename] having array length 1.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com> Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Mark Levedahl [Tue, 1 Apr 2025 03:01:01 +0000 (23:01 -0400)]
gitk: _search_exe is no longer needed
The _search_exe variable allows specifying the suffix used for executables,
typically {} on unix, .exe on Windows. But, the override code is now
used only on Windows, so _search_exe is no longer needed. Eliminate it.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com> Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Mark Levedahl [Tue, 1 Apr 2025 03:01:00 +0000 (23:01 -0400)]
gitk: override $PATH search only on Windows
Commit 4cbe9e0e2 was written to address problems that result from Tcl's
documented behavior on Windows where the current working directory and a
number of Windows system directories are automatically prepended to
$PATH when searching for executables [1]. This basic Windows behavior
has resulted in more than one CVE against git for Windows:
CVE-2023-23618, CVE-2022-41953 are listed on the git for Windows github
website for the Tcl components of git (gitk, git-gui).
4cbe9e0e2 is intended to restrict the search to looking only in
directories given in $PATH and in the given order, which is exactly the
Tcl behavior documented to exist on non-Windows platforms [1]. Thus,
this change could have been written to affect only Windows, leaving
other platforms alone.
However, 4cbe9e0e2 implements the override for all platforms. This
includes specialized code for Cygwin, copied from git-gui prior to
commit 7145c654 on https://github.com/j6t/git-gui, so targets a
long retired Cygwin port of the Windows Tcl/Tk using Windows pathnames.
Since 2012, Cygwin uses a Unix/X11 port requiring Unix pathnames,
meaning 4cbe9e0e2 is incompatible. 4cbe9e0e2 also induces an infinite
recursion as _which now invokes the exec wrapper that invokes _which.
This is part of git v2.49.0, so gitk on Cygwin is broken in that
release.
Rather than fix the unnecessary override code for Cygwin, let's just
limit the override of exec/open to Windows, leaving all other platforms
using their native exec/open as they did prior to 4cbe9e0e2. This patch
wraps the override code in an "if {[is_Windows]} { ... }" block while
removing the non-Windows code added in 4cbe9e0e2.
[1] see https://www.tcl-lang.org/man/tcl8.6/TclCmd/exec.htm
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com> Signed-off-by: Johannes Sixt <j6t@kdbg.org>
brian m. carlson [Mon, 31 Mar 2025 21:53:58 +0000 (21:53 +0000)]
t5605: fix test for cloning from a different user
This test currently passes, but for the wrong reason. The
repo_is_hardlinked function expects a .git directory or a bare
repository and currently fails because it cannot find the objects
directory.
One solution is to use the --bare argument, but then --show-toplevel
won't work. We could change that, but there's no need to, so just add
the missing .git directory.
In addition, use the built-in negation functionality of test_grep to
avoid mishandling real errors (such as a missing file) and, as a final
fix, remove the extra newline.
Reported-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 1 Apr 2025 10:05:13 +0000 (19:05 +0900)]
Merge branch 'ps/reftable-sans-compat-util' into ps/reftable-api-revamp
* ps/reftable-sans-compat-util:
Makefile: skip reftable library for Coccinelle
reftable: decouple from Git codebase by pulling in "compat/posix.h"
git-compat-util.h: split out POSIX-emulating bits
compat/mingw: split out POSIX-related bits
reftable/basics: introduce `REFTABLE_UNUSED` annotation
reftable/basics: stop using `SWAP()` macro
reftable/stack: stop using `sleep_millisec()`
reftable/system: introduce `reftable_rand()`
reftable/reader: stop using `ARRAY_SIZE()` macro
reftable/basics: provide wrappers for big endian conversion
reftable/basics: stop using `st_mult()` in array allocators
reftable: stop using `BUG()` in trivial cases
reftable/record: don't `BUG()` in `reftable_record_cmp()`
reftable/record: stop using `BUG()` in `reftable_record_init()`
reftable/record: stop using `COPY_ARRAY()`
reftable/blocksource: stop using `xmmap()`
reftable/stack: stop using `write_in_full()`
reftable/stack: stop using `read_in_full()`
Add a new builtin driver for generic INI files (e. g. the gitconfig
files), where:
- the funcname regular expression matches section names, i. e. any
string between brackets at the beginning of the line, with or without
indentation;
- word_regex matches any word with one or more non-whitespace
characters without checking if it is a valid variable name or value.
Also add tests for the new userdiff driver. These files define sections
and subsections, with and without indentation.
Helped-by: Patrick Steinhardt <ps@pks.im> Helped-by: D. Ben Knoble <ben.knoble@gmail.com> Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Matt Hunter [Sun, 30 Mar 2025 11:24:06 +0000 (07:24 -0400)]
revision: fix --left/right-only use with unrelated histories
This is a similar fix as 023756f4eb (revision walker: --cherry-pick is a
limited operation), but for the --left-only and --right-only options.
When computing a symmetric difference between two unrelated histories,
no suitable merge base exists, and so no boundary commit is flagged as
UNINTERESTING. Previously, we relied on the presence of such boundary
to trigger limiting and thus consideration of either "revs->left_only"
or "revs->right_only".
A number of other entries in the option parser have started including
overrides for "revs->limited = 1". Do the same for these options.
Signed-off-by: Matt Hunter <m@lfurio.us> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Arnav Bhate [Sun, 30 Mar 2025 17:45:06 +0000 (23:15 +0530)]
pathspec: fix sign comparison warnings
There are multiple places, especially in loops, where a signed and an
unsigned data type are compared. Git uses a mix of signed and unsigned
types to store lengths of arrays. This sometimes leads to using a signed
index for an array whose length is stored in an unsigned variable or
vice versa. In some cases, where both signed and unsigned data types
have been used to store lengths of arrays in the same function, only
one variable was used to iterate over both types.
Replace signed data types with unsigned data types and vice versa
wherever necessary. Where both types of iterators are required, move
the declaration inside the for loop. In cases where this is not
possible, add appropriate cast.
Remove #define DISABLE_SIGN_COMPARE_WARNINGS.
Signed-off-by: Arnav Bhate <bhatearnav@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
ci: use Visual Studio for win+meson job on GitHub Workflows
In 7304bd2bc39 (ci: wire up Visual Studio build with Meson, 2025-01-22)
we have wired up a new CI job that builds and tests Git with Meson on a
Windows machine. The expectation here was that this build uses the
Visual Studio toolchain to do so, and that is true on GitLab CI. But on
GitHub Workflows it is not the case because we've got GCC in our PATH,
and thus Meson favors that compiler toolchain over Visual Studio's.
Fix this by explicitly asking Meson to use the Visual Studio toolchain.
While this is only really required for GitHub Workflows, let's also pass
the flag in GitLab CI so that we don't implicitly assume the toolchain
that Meson is going to pick.
Reported-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Almost all of the tools we discover during the build process need to be
native programs. There are only a handful of exceptions, which typically
are programs whose paths we need to embed into the resulting executable
so that they can be found on the target system when Git executes. While
this distinction typically doesn't matter, it does start to matter when
considering cross-compilation where the build and target machines are
different.
Meson supports cross-compilation via so-called machine files. These
machine files allow the user to override parameters for the build
machine, but also for the target machine when cross-compiling. Part of
the machine file is a section that allows the user to override the
location where binaries are to be found in the target system. The
following machine file would for example override the path of the POSIX
shell:
[binaries]
sh = '/usr/xpg4/bin/sh'
It can be handed over to Meson via `meson setup --cross-file`.
We do not handle this correctly right now though because we don't know
to distinguish binaries for the build and target hosts at all. Address
this by explicitly passing the `native:` parameter to `find_program()`:
- When set to `true`, we get binaries discovered on the build host.
- When set to `false`, we get either the path specified in the
machine file. Or, if no machine file exists or it doesn't specify
the binary path, then we fall back to the binary discovered on the
build host.
As mentioned, only a handful of binaries are not native: only the system
shell, Python and Perl need to be treated specially here.
Reported-by: Peter Seiderer <ps.report@gmx.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both the "netrc" credential helper and git-subtree(1) from "contrib/"
carry a couple of tests with them. These tests get wired up in Meson
unconditionally even in the case where `-Dtests=false`. As those tests
depend on the `test_enviroment` variable, which only gets defined in
case `-Dtests=true`, the result is an error:
In 19d8fe7da65 (Makefile: extract script to generate gitweb.js,
2024-12-06) we have extracted the logic to build "gitweb.js" into a
separate script. As part of that the rules that builds the script
has gained a new dependency on that script.
This refactoring is broken though because we use "$^" to determine
the set of JavaScript files that need to be concatenated, and this
implicit variable now also contains the build script itself. As a
result, the build script ends up ni the generated "gitweb.js" file,
which is wrong.
Fix the issue by filtering out non-JavaScript files.
Based-on-patch-by: Thorsten Glaser <tg@debian.org> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "curl" option controls whether or not a couple of features that
depend on curl shall be included. Most importantly, these features
include the HTTP remote helpers, which are rather quintessential for a
well-functioning Git installation. So while the dependency can in theory
be dropped, most users wouldn't consider the resulting installation to
be fully functional.
The "curl" option is defined as a feature, which means that it can be
"enabled", "disabled" or "auto", which has the effect that the feature
will be enabled if the dependency itself has been found. While most of
the other features have "auto" as default value, the "curl" option is
set to "enabled" by default due to it being so important. Consequently,
autoconfiguration of Git will fail by default if the library cannot be
found.
There is a bug though with how we handle the option in case the user
overrides the feature with `meson setup -Dcurl=auto`: while we will try
to find the library in that case, we won't ever use it because we later
on check for `get_option('curl').enabled()` when deciding whether or not
we want to build dependent sources. But `enabled()` only returns true if
the option has the value "enabled", for "auto" it will return false.
Fix the issue by instead checking for `curl.found()`, which is only true
if the library has been found. And as we only try to find the library
when `get_option('curl')` returns "true" or "auto" this is exactly what
we want.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Arnav Bhate [Sat, 29 Mar 2025 06:03:14 +0000 (11:33 +0530)]
rm: fix sign comparison warnings
There are multiple places in loops, where a signed and an
unsigned data type are compared. Git uses a mix of signed and unsigned
types to store lengths of arrays. This sometimes leads to using a signed
index for an array whose length is stored in an unsigned variable or
vice versa.
get_ours_cache_pos is a special case where i, though derived from a
signed variable is never negative. Move this part to the caller side
and make i an unsigned argument of the function. Rename i to
pos to make it descriptive, now that it is a function argument.
Replace signed data types with unsigned data types and vice versa
wherever necessary. Where both signed and unsigned data types have been
used, define a new variable in the scope of the for loop for use as the
iterator. Remove #define DISABLE_SIGN_COMPARE_WARNINGS.
Signed-off-by: Arnav Bhate <bhatearnav@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Sat, 29 Mar 2025 07:39:10 +0000 (16:39 +0900)]
Merge branch 'en/random-cleanups'
Miscellaneous code clean-ups.
* en/random-cleanups:
merge-ort: remove extraneous word in comment
merge-ort: fix accidental strset<->strintmap
t7615: be more explicit about diff algorithm used
t6423: fix a comment that accidentally reversed two commits
stash: remove merge-recursive.h include
Junio C Hamano [Sat, 29 Mar 2025 07:39:10 +0000 (16:39 +0900)]
Merge branch 'jk/use-wunreachable-code-for-devs'
Enable -Wunreachable-code for developer builds.
* jk/use-wunreachable-code-for-devs:
config.mak.dev: enable -Wunreachable-code
git-compat-util: add NOT_CONSTANT macro and use it in atfork_prepare()
run-command: use errno to check for sigfillset() error
Junio C Hamano [Sat, 29 Mar 2025 07:39:08 +0000 (16:39 +0900)]
Merge branch 'jk/fetch-ref-prefix-cleanup'
In protocol v2 where the refs advertisement is constrained, we try
to tell the server side not to limit the advertisement when there
is no specific need to, which has been the source of confusion and
recent bugs. Revamp the logic to simplify.
* jk/fetch-ref-prefix-cleanup:
fetch: use ref prefix list to skip ls-refs
fetch: avoid ls-refs only to ask for HEAD symref update
fetch: stop protecting additions to ref-prefix list
fetch: ask server to advertise HEAD for config-less fetch
refspec_ref_prefixes(): clean up refspec_item logic
t5516: beef up exact-oid ref prefixes test
t5516: drop NEEDSWORK about v2 reachability behavior
t5516: prefer "oid" to "sha1" in some test titles
t5702: fix typo in test name
First step of deprecating and removing merge-recursive.
* en/merge-ort-prepare-to-remove-recursive:
am: switch from merge_recursive_generic() to merge_ort_generic()
merge-ort: fix merge.directoryRenames=false
t3650: document bug when directory renames are turned off
merge-ort: support having merge verbosity be set to 0
merge-ort: allow rename detection to be disabled
merge-ort: add new merge_ort_generic() function
Junio C Hamano [Sat, 29 Mar 2025 07:39:06 +0000 (16:39 +0900)]
Merge branch 'cc/signed-fast-export-import'
"git fast-export | git fast-import" learns to deal with commit and
tag objects with embedded signatures a bit better.
* cc/signed-fast-export-import:
fast-export, fast-import: add support for signed-commits
fast-export: do not modify memory from get_commit_buffer
git-fast-export.adoc: clarify why 'verbatim' may not be a good idea
fast-export: rename --signed-tags='warn' to 'warn-verbatim'
fast-export: fix missing whitespace after switch
git-fast-import.adoc: add missing LF in the BNF
Philippe Blain [Fri, 28 Mar 2025 17:07:49 +0000 (17:07 +0000)]
p9210: fix 'scalar clone' when running from a detached HEAD
In p9210-scalar-clone.sh, we test using 'scalar clone' to clone
$GIT_PERF_LARGE_REPO (copied locally as 'to-clone'), which defaults to
the git.git checkout we are running the test from.
When --branch is not specified (as in this test), 'scalar clone' tries
to get the default branch of the remote repository by parsing the output
of 'git ls-remote --symref $URL HEAD', as implemented in
scalar.c:remote_default_branch. When the git.git checkout we are running
the test from is in detached HEAD, this fails and we fall back to using
the name of the currently checked out branch in the newly initialized
repository, which in this case is the value returned earlier in
cmd_clone by repo_default_branch_name.
We then invoke 'git checkout -t origin/$branch', with $branch being the
name we got from remote_default_branch. This invocation fails if
'$branch' does not exist as a branch in the current git.git checkout.
Fix this by creating a local branch in 'to-clone' in the setup test
"enable server-side partial clone", making sure to use '-B' in case a
branch named 'test-branch' already exists.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Philippe Blain [Fri, 28 Mar 2025 17:07:48 +0000 (17:07 +0000)]
p7821: fix test_perf invocation for prereqs
Since 5dccd9155f (t/perf: add iteration setup mechanism to perf-lib,
2022-04-04), perf tests need to declare their prerequisites with
'--prereq', after the test title. p7821 was forgotten in that commit,
such that running that test on a machine where the PCRE prereq is not
satisfied aborts the test with:
error: bug in the test script: test_wrapper_ needs 2 positional parameters
Fix this by correcting the two 'test_perf' invocations in that test
suite.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Phillip Wood [Fri, 28 Mar 2025 14:45:40 +0000 (14:45 +0000)]
merge-file doc: set conflict-marker-size attribute
When committing a conflict resolution for a merge containing 1f010d6bdf7 (doc: use .adoc extension for AsciiDoc files, 2025-01-20)
my pre-commit hook failed because "git diff --check" thought there was
a left over conflict marker in "merge-file.adoc". Fix this by setting
the "conflict-marker-size" attribute as we do for all the other
documentation files that contain example conflict markers.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Sat, 29 Mar 2025 01:10:25 +0000 (10:10 +0900)]
Merge branch 'tb/incremental-midx-part-2' into ps/cat-file-filter-batch
* tb/incremental-midx-part-2:
midx: implement writing incremental MIDX bitmaps
pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
pack-bitmap.c: keep track of each layer's type bitmaps
ewah: implement `struct ewah_or_iterator`
pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs
pack-bitmap.c: compute disk-usage with incremental MIDXs
pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs
pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
pack-bitmap.c: open and store incremental bitmap layers
pack-revindex: prepare for incremental MIDX bitmaps
Documentation: describe incremental MIDX bitmaps
Documentation: remove a "future work" item from the MIDX docs
read-cache: check range before dereferencing an array element
Before accessing an array element at a given index, we should make sure
that the index is within the desired bounds, otherwise it makes little
sense to access the array element in the first place.
In this instance, testing whether `ce->name[common]` is the trailing NUL
byte is technically different from testing whether `common` is within
the bounds of `previous_name`. It is also redundant, as the range-check
guarantees that `previous_name->buf[common]` cannot be NUL and therefore
the condition `ce->name[common] == previous_name->buf[common]` would not
be met if `ce->name[common]` evaluated to NUL.
However, in the interest of reducing the cognitive load to reason about
the correctness of this loop (so that I can focus on interesting
projects again), I'll simply move the range-check to the beginning of
the loop condition and keep the redundant NUL check.
This acquiesces CodeQL's `cpp/offset-use-before-range-check` rule.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
detect-compiler: detect clang even if it found CUDA
In my setup, clang finds `/usr/local/cuda` and hence the output of
`clang -v` ends with this line:
Found CUDA installation: /usr/local/cuda, version
This confuses the `detect-compiler` script because it matches _all_
lines that contain the needle "version" surrounded by spaces. As a
consequence, the `get_family` function returns two lines: "Ubuntu clang"
and above-mentioned line, which the `case` statement does not handle
well and hence reports "unknown compiler family" instead of the expected
set of "clang14", "clang13", ..., "clang1" output.
Let's unconfuse the script by letting it parse the first matching line
and ignore the rest.
Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When compiling Git using `clang`, the `-Wcomma` option can be used to
warn about code using the comma operator (because it is typically
unintentional and wants to use the semicolon instead).
Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
compat/regex: explicitly mark intentional use of the comma operator
The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. That is why the
`-Wcomma` option of clang was introduced: To identify unintentional uses
of the comma operator.
In the `compat/regex/` code, the comma operator is used twice, once to
avoid surrounding two conditional statements with curly brackets, the
other one to increment two counters simultaneously in a `do ... while`
condition.
The first one is replaced with a proper conditional block, surrounded by
curly brackets.
The second one would be harder to replace because the loop contains two
`continue`s. Therefore, the second one is marked as intentional by
casting the value-to-discard to `void`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. That is why the
`-Wcomma` option of clang was introduced: To identify unintentional uses
of the comma operator.
In this instance, the usage is intentional because it allows storing the
value of the current character as `prev_ch` before making the next
character the current one, all of which happens in the loop condition
that lets the loop stop at a closing bracket.
However, it is hard to read.
The chosen alternative to using the comma operator is to move those
assignments from the condition into the loop body; In this particular
case that requires special care because the loop body contains a
`continue` for the case where a character class is found that starts
with `[:` but does not end in `:]` (and the assignments should occur
even when that code path is taken), which needs to be turned into a
`goto`.
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. That is why the
`-Wcomma` option of clang was introduced: To identify unintentional uses
of the comma operator.
Intentional uses include situations where one wants to avoid curly
brackets around multiple statements that need to be guarded by a
condition. This is the case here, as the repetitive nature of the
statements is easier to see for a human reader this way. At least in my
opinion.
However, opinions on this differ wildly, take 10 people and you have 10
different preferences.
On the Git mailing list, it seems that the consensus is to use the long
form instead, so let's do just that.
Suggested-by: Phillip Wood <phillip.wood123@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
xdiff: avoid using the comma operator unnecessarily
The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. While the code in
this patch used the comma operator intentionally (to avoid curly
brackets around two statements, each, that want to be guarded by a
condition), it is better to surround it with curly brackets and to use a
semicolon instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
clar: avoid using the comma operator unnecessarily
The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. In this instance, it
makes the code harder to read than necessary, too. Better use a
semicolon instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
kwset: avoid using the comma operator unnecessarily
The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. Better use a
semicolon instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
rebase: avoid using the comma operator unnecessarily
The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. Better use a
semicolon instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
remote-curl: avoid using the comma operator unnecessarily
The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. Better use a
semicolon instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 26 Mar 2025 07:26:10 +0000 (16:26 +0900)]
Merge branch 'en/merge-process-renames-crash-fix'
The merge-recursive and merge-ort machinery crashed in corner cases
when certain renames are involved.
* en/merge-process-renames-crash-fix:
merge-ort: fix slightly overzealous assertion for rename-to-self
t6423: add a testcase causing a failed assertion in process_renames
Junio C Hamano [Wed, 26 Mar 2025 07:26:10 +0000 (16:26 +0900)]
Merge branch 'ua/some-builtins-wo-the-repository'
A handful of built-in command implementations have been rewritten
to use the repository instance supplied by git.c:run_builtin(), its
caller.
* ua/some-builtins-wo-the-repository:
builtin/checkout-index: stop using `the_repository`
builtin/for-each-ref: stop using `the_repository`
builtin/ls-files: stop using `the_repository`
builtin/pack-refs: stop using `the_repository`
builtin/send-pack: stop using `the_repository`
builtin/verify-commit: stop using `the_repository`
builtin/verify-tag: stop using `the_repository`
config: teach repo_config to allow `repo` to be NULL
Junio C Hamano [Wed, 26 Mar 2025 07:26:10 +0000 (16:26 +0900)]
Merge branch 'tb/refs-exclude-fixes'
The refname exclusion logic in the packed-ref backend has been
broken for some time, which confused upload-pack to advertise
different set of refs. This has been corrected.
Junio C Hamano [Wed, 26 Mar 2025 07:26:09 +0000 (16:26 +0900)]
Merge branch 'sj/ref-consistency-checks-more'
"git fsck" becomes more careful when checking the refs.
* sj/ref-consistency-checks-more:
builtin/fsck: add `git refs verify` child process
packed-backend: check whether the "packed-refs" is sorted
packed-backend: add "packed-refs" entry consistency check
packed-backend: check whether the refname contains NUL characters
packed-backend: add "packed-refs" header consistency check
packed-backend: check if header starts with "# pack-refs with: "
packed-backend: check whether the "packed-refs" is regular file
builtin/refs: get worktrees without reading head information
t0602: use subshell to ensure working directory unchanged
By default, whatever ends up been written to the "MERGED" window will
become the file which conflict we are resolving.
However, it is possible to use the "@" symbol to specify a different
one. For example, if we use this slightly different version of the
previously used string:
"(LOCAL,BASE,@REMOTE)/MERGED"
...then the user should proceed to edit the contents of the top right
window (instead of the bottom window) as *that* is what will become the
conflicts free file once vim is closed.
Before this commit, the "@" marker worked for all targets *except* for
"REMOTE". In other words, these worked as expected:
Reported-by: kawarimidoll <kawarimidoll+git@gmail.com> Suggested-by: D. Ben Knoble <ben.knoble@gmail.com> Signed-off-by: Fernando Ramos <greenfoo@u92.eu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Eli Schwartz [Tue, 25 Mar 2025 20:08:48 +0000 (16:08 -0400)]
meson: disable coccinelle configuration when building from a tarball
Wiring up coccinelle in the build, depends on running git commands to
get the list of files to operate on. Reasonable, for a feature mainly
used by people developing on git. If building git itself from a tarball
distribution of git's own source code, one likely does not need to run
coccinelle.
But running those git commands failed, and caused the build to error
out, if `spatch` was installed -- because the build assumed that its
presence indicated a desire to use it on this source tree. Instead, we
can expand the conditional to check for both `spatch` and the `.git`
file or directory.
Meson's `opt.require()` method allows us to add a prerequisite for the
feature option. If the prerequisite fails, then the option either:
- converts autodetection to disabled
- emits an informative error if the feature was set to enabled:
```
ERROR: Feature coccinelle cannot be enabled: coccinelle can only be run from a git checkout
```
Signed-off-by: Eli Schwartz <eschwartz@gentoo.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
D. Ben Knoble [Mon, 24 Mar 2025 20:52:23 +0000 (16:52 -0400)]
vimdiff: clarify the sigil used for marking the buffer to save
The original documentation from 7b5cf8be18 (vimdiff: add tool
documentation, 2022-03-30) mistakenly described the marker as an
asterisk, which is the character "*". The code and examples have always
looked for an arobase ("@").
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com> Acked-by: Fernando Ramos <greenfoo@u92.eu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Justin Tobler [Tue, 25 Mar 2025 00:51:48 +0000 (19:51 -0500)]
advice: allow disabling default branch name advice
The default branch name advice message is displayed when
`repo_default_branch_name()` is invoked and the `init.defaultBranch`
config is not set. In this scenario, the advice message is always shown
even if the `--no-advice` option is used.
Adapt `repo_default_branch_name()` to allow the default branch name
advice message to be disabled with the `--no-advice` option and
corresponding configuration.
Signed-off-by: Justin Tobler <jltobler@gmail.com> Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 199f44cb2ead (builtin/clone: allow remote helpers to detect repo,
2024-02-27), clones started partially initializing the refdb before
executing the remote helpers by creating a HEAD file and "refs/"
directory. This has resulted in some scenarios where git-clone(1) now
prints the default branch name advice message where it previously did
not.
A side-effect of the HEAD file already existing, is that computation of
the default branch name is handled later in execution. This matters
because prior to 97abaab5f6 (refs: drop `git_default_branch_name()`,
2024-05-17), the default branch value would be computed during its first
execution and cached. Subsequent invocations would simply return the
cached value. Since the next `git_default_branch_name()` call site,
which is invoked through `guess_remote_head()`, is not configured to
suppress the advice message, computing the default branch name results
in the advice message being printed.
Configure `guess_remote_head()` to suppress the advice message,
restoring the previous behavior.
Signed-off-by: Justin Tobler <jltobler@gmail.com> Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Justin Tobler [Tue, 25 Mar 2025 00:51:46 +0000 (19:51 -0500)]
remote: allow `guess_remote_head()` to suppress advice
The `repo_default_branch_name()` invoked through `guess_remote_head()`
is configured to always display the default branch advice message.
Adapt `guess_remote_head()` to accept flags and convert the `all`
parameter to a flag. Add the `REMOTE_GUESS_HEAD_QUIET` flag to to enable
suppression of advice messages. Call sites are updated accordingly.
Signed-off-by: Justin Tobler <jltobler@gmail.com> Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tuomas Ahola [Mon, 24 Mar 2025 21:47:03 +0000 (23:47 +0200)]
bulk-checkin: fix sign compare warnings
In file bulk-checkin.c, three warnings are emitted by
"-Wsign-compare", two of which are caused by trivial loop iterator
type mismatches. For the third case, the type of `rsize` from
can be changed to size_t as both options of the ternary expression are
unsigned and the signedness of the variable isn't really needed
anywhere.
To prevent `read_result != rsize` making a clash, it is to be noted
that `read_result` is checked not to hold negative values. Therefore
casting the variable to size_t is a safe operation and enough to
remove the sign-compare warning.
Fix issues accordingly, and remove `DISABLE_SIGN_COMPARE_WARNINGS` to
enable "-Wsign-compare" for the file.
Signed-off-by: Tuomas Ahola <taahol@utu.fi> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is a bug to obtain the peer certificate without verifying it.
Having said that, from my reading of
https://www.openssl.org/docs/man1.1.1/man3/SSL_set_verify.html, it would
appear that Git is saved by the fact that it calls
`SSL_CTX_set_verify(ctx, SSL_VERIFY_PEER, NULL)` already early on.
In other words, that `SSL_VERIFY_PEER` combined with the `NULL`
parameter (i.e. no overridden callback) would _already_ verify the peer
certificate. The fact that we later call `SSL_get_peer_certificate()`
is mistaken by CodeQL to mean that that peer certificate still needs to
be verified, but that had already happened at that point.
Nevertheless, it is better to verify the peer certificate explicitly
than to rely on some side effect that is really hard to reason about
(and that took me more than one business day to analyze fully). It also
makes it easier for static analyzers to validate the correctness of the
code.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The check for dubious ownership has one particular quirk on Windows: if
running as an administrator, files owned by the Administrators _group_
are considered owned by the user.
The rationale for that is: When running in elevated mode, Git creates
files that aren't owned by the individual user but by the Administrators
group.
There is yet another quirk, though: The check I introduced to determine
whether the current user is an administrator uses the
`CheckTokenMembership()` function with the current process token. And
that check only succeeds when running in elevated mode!
Let's be a bit more lenient here and look harder whether the current
user is an administrator. We do this by looking for a so-called "linked
token". That token exists when administrators run in non-elevated mode,
and can be used to create a new process in elevated mode. And feeding
_that_ token to the `CheckTokenMembership()` function succeeds!
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Mon, 24 Mar 2025 00:51:51 +0000 (00:51 +0000)]
maintenance: add loose-objects.batchSize config
The 'loose-objects' task of 'git maintenance run' first deletes loose
objects that exit within packfiles and then collects loose objects into
a packfile. This second step uses an implicit limit of fifty thousand
that cannot be modified by users.
Add a new config option that allows this limit to be adjusted or ignored
entirely.
While creating tests for this option, I noticed that actually there was
an off-by-one error due to the strict comparison in the limit check. I
considered making the limit check turn true on equality, but instead I
thought to use INT_MAX as a "no limit" barrier which should mean it's
never possible to hit the limit. Thus, a new decrement to the limit is
provided if the value is positive. (The restriction to positive values
is to avoid underflow if INT_MIN is configured.)
Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Mon, 24 Mar 2025 00:51:50 +0000 (00:51 +0000)]
maintenance: force progress/no-quiet to children
The --no-quiet option for 'git maintenance run' is supposed to indicate
that progress should happen even while ignoring the value of isatty(2).
However, Git implicitly asks child processes to check isatty(2) since
these arguments are not passed through.
The pass through of --no-quiet will be useful in a test in the next
change.
Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
David Mandelberg [Sun, 23 Mar 2025 21:06:53 +0000 (17:06 -0400)]
completion: fix bugs with slashes in remote names
Previously, some calls to for-each-ref passed fixed numbers of path
components to strip from refs, assuming that remote names had no slashes
in them. This made completions like:
René Scharfe [Sun, 23 Mar 2025 09:53:21 +0000 (10:53 +0100)]
commit: move clear_commit_marks_many() loop body to clear_commit_marks()
clear_commit_marks_many() clears multiple commits one by one. Move the
code for handling a single commit to clear_commit_marks() and call it
instead of the other way around, to simplify the code.
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 20 Mar 2025 17:57:08 +0000 (13:57 -0400)]
midx: implement writing incremental MIDX bitmaps
Now that the pack-bitmap machinery has learned how to read and interact
with an incremental MIDX bitmap, teach the pack-bitmap-write.c machinery
(and relevant callers from within the MIDX machinery) to write such
bitmaps.
The details for doing so are mostly straightforward. The main changes
are as follows:
- find_object_pos() now makes use of an extra MIDX parameter which is
used to locate the bit positions of objects which are from previous
layers (and thus do not exist in the current layer's pack_order
field).
(Note also that the pack_order field is moved into struct
write_midx_context to further simplify the callers for
write_midx_bitmap()).
- bitmap_writer_build_type_index() first determines how many objects
precede the current bitmap layer and offsets the bits it sets in
each respective type-level bitmap by that amount so they can be OR'd
together.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 20 Mar 2025 17:57:05 +0000 (13:57 -0400)]
pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
Now that we have initialized arrays for each bitmap layer's type bitmaps
in the previous commit, adjust existing callers to use them in
preparation for multi-layered bitmaps.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 20 Mar 2025 17:57:02 +0000 (13:57 -0400)]
pack-bitmap.c: keep track of each layer's type bitmaps
Prepare for reading the type-level bitmaps from previous bitmap layers
by maintaining an array for each type, where each element in that type's
array corresponds to one layer's bitmap for that type.
These fields will be used in a later commit to instantiate the 'struct
ewah_or_iterator' for each type.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 20 Mar 2025 17:56:59 +0000 (13:56 -0400)]
ewah: implement `struct ewah_or_iterator`
While individual bitmap layers store different commit, type-level, and
pseudo-merge bitmaps, only the top-most layer is used to compute
reachability traversals.
Many functions which implement the aforementioned traversal rely on
enumerating the results according to the type-level bitmaps, and so
would benefit from a conceptual type-level bitmap that spans multiple
layers.
Implement `struct ewah_or_iterator` which is capable of enumerating
multiple EWAH bitmaps at once, and OR-ing the results together. When
initialized with, for example, all of the commit type bitmaps from each
layer, callers can pretend as if they are enumerating a large type-level
bitmap which contains the commits from *all* bitmap layers.
There are a couple of alternative approaches which were considered:
- Decompress each EWAH bitmap and OR them together, enumerating a
single (non-EWAH) bitmap. This would work, but has the disadvantage
of decompressing a potentially large bitmap, which may not be
necessary if the caller does not wish to read all of it.
- Recursively call bitmap internal functions, reusing the "result" and
"haves" bitmap from the top-most layer. This approach resembles the
original implementation of this feature, but is inefficient in that
it both (a) requires significant refactoring to implement, and (b)
enumerates large sections of later bitmaps which are all zeros (as
they pertain to objects in earlier layers).
(b) is not so bad in and of itself, but can cause significant
slow-downs when combined with expensive loop bodies.
This approach (enumerating an OR'd together version of all of the
type-level bitmaps from each layer) produces a significantly more
straightforward implementation with significantly less refactoring
required in order to make it work.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 20 Mar 2025 17:56:56 +0000 (13:56 -0400)]
pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs
Prepare for using pseudo-merges with incremental MIDX bitmaps by
attempting to apply pseudo-merges from each layer when encountering a
given commit during a walk.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 20 Mar 2025 17:56:46 +0000 (13:56 -0400)]
pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
Implement support for the special `--test-bitmap` mode of `git rev-list`
when using incremental MIDXs.
The bitmap_test_data structure is extended to contain a "base" pointer
that mirrors the structure of the bitmap chain that it is being used to
test.
When we find a commit to test, we first chase down the ->base pointer to
find the appropriate bitmap_test_data for the bitmap layer that the
given commit is contained within, and then perform the test on that
bitmap.
In order to implement this, light modifications are made to
bitmap_for_commit() to reimplement it in terms of a new function,
find_bitmap_for_commit(), which fills out a pointer which indicates the
bitmap layer which contains the given commit.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 20 Mar 2025 17:56:43 +0000 (13:56 -0400)]
pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
In a similar fashion as previous commits in the first phase of
incremental MIDXs, enumerate not just the packs in the current
incremental MIDX layer, but previous ones as well.
Likewise, in reuse_partial_packfile_from_bitmap(), when reusing only a
single pack from a MIDX, use the oldest layer's preferred pack as it is
likely to contain the largest number of reusable sections.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 20 Mar 2025 17:56:40 +0000 (13:56 -0400)]
pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs
Since we may ask for a pack_id that is in an earlier MIDX layer relative
to the one corresponding to our bitmap, use nth_midxed_pack() instead of
accessing the ->packs array directly.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 20 Mar 2025 17:56:37 +0000 (13:56 -0400)]
pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
The pack-bitmap machinery uses `bitmap_for_commit()` to locate the
EWAH-compressed bitmap corresponding to some given commit object.
Teach this function about incremental MIDX bitmaps by teaching it to
recur on earlier bitmap layers when it fails to find a given commit in
the current layer.
The changes to do so are as follows:
- Avoid initializing hash_pos at its declaration, since
bitmap_for_commit() is now a recursive function and may receive a
NULL bitmap_index pointer as its first argument.
- In cases where we would previously return NULL (to indicate that a
lookup failed and the given bitmap_index does not contain an entry
corresponding to the given commit), recursively call the function on
the previous bitmap layer.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 20 Mar 2025 17:56:34 +0000 (13:56 -0400)]
pack-bitmap.c: open and store incremental bitmap layers
Prepare the pack-bitmap machinery to work with incremental MIDXs by
adding a new "base" field to keep track of the bitmap index associated
with the previous MIDX layer.
The changes in this commit are mostly boilerplate to open the correct
bitmap(s), add them to the chain of bitmap layers along the "base"
pointer, ensure that the correct packs and their reverse indexes are
loaded across MIDX layers, etc.
While we're at it, keep track of a base_nr field to indicate how many
bitmap layers (including the current bitmap) exist. This will be used in
a future commit to allocate an array of 'struct ewah_bitmap' pointers to
collect all of the respective type bitmaps among all layers to
initialize a multi-EWAH iterator.
Subsequent commits will teach the functions within the pack-bitmap
machinery how to interact with these new fields.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 20 Mar 2025 17:56:31 +0000 (13:56 -0400)]
pack-revindex: prepare for incremental MIDX bitmaps
Prepare the reverse index machinery to handle object lookups in an
incremental MIDX bitmap. These changes are broken out across a few
functions:
- load_midx_revindex() learns to use the appropriate MIDX filename
depending on whether the given 'struct multi_pack_index *' is
incremental or not.
- pack_pos_to_midx() and midx_to_pack_pos() now both take in a global
object position in the MIDX pseudo-pack order, and find the
earliest containing MIDX (similar to midx.c::midx_for_object().
- midx_pack_order_cmp() adjusts its call to pack_pos_to_midx() by the
number of objects in the base (since 'vb - midx->revindx_data' is
relative to the containing MIDX, and pack_pos_to_midx() expects a
global position).
Likewise, this function adjusts its output by adding
m->num_objects_in_base to return a global position out through the
`*pos` pointer.
Together, these changes are sufficient to use the multi-pack index's
reverse index format for incremental multi-pack reachability bitmaps.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>