git.ipfire.org Git - thirdparty/git.git/log

Merge branch 'kn/ci-meson-check-build-docs-fix'

GitHub Actions CI switched on a CI/CD variable that does not exist
when choosing what packages to install etc., which has been
corrected.

* kn/ci-meson-check-build-docs-fix:
ci/github: add missing 'CI_JOB_IMAGE' env variable

Merge branch 'aj/doc-restore-p-update'

Stale description in "git restore -p" documentation has been
updated.

* aj/doc-restore-p-update:
doc: restore: remove note on --patch w/ pathspecs

gitk: limit PATH search to bare executable names

The path search overrides used by gitk on Windows are applied to any
executable whose name is not 'absolute', meaning that
[exec foo/bar ...]
will search each element of $PATH to find one with subdirectory foo
containing bar. But, per POSIX, and Tcl implementation on all platforms,
foo/bar is taken as $(pwd)/foo/bar, and is not searched on $PATH.

Fix this descrepency using the same approach applied to git-gui in
commit 3f71c97e. The key is that the executable name must have no path
component, indicated by [file split $exename] having array length 1.

Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>

gitk: _search_exe is no longer needed

The _search_exe variable allows specifying the suffix used for executables,
typically {} on unix, .exe on Windows. But, the override code is now
used only on Windows, so _search_exe is no longer needed. Eliminate it.

Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>

gitk: override $PATH search only on Windows

Commit 4cbe9e0e2 was written to address problems that result from Tcl's
documented behavior on Windows where the current working directory and a
number of Windows system directories are automatically prepended to
$PATH when searching for executables [1]. This basic Windows behavior
has resulted in more than one CVE against git for Windows:
CVE-2023-23618, CVE-2022-41953 are listed on the git for Windows github
website for the Tcl components of git (gitk, git-gui).

4cbe9e0e2 is intended to restrict the search to looking only in
directories given in $PATH and in the given order, which is exactly the
Tcl behavior documented to exist on non-Windows platforms [1]. Thus,
this change could have been written to affect only Windows, leaving
other platforms alone.

However, 4cbe9e0e2 implements the override for all platforms. This
includes specialized code for Cygwin, copied from git-gui prior to
commit 7145c654 on https://github.com/j6t/git-gui, so targets a
long retired Cygwin port of the Windows Tcl/Tk using Windows pathnames.
Since 2012, Cygwin uses a Unix/X11 port requiring Unix pathnames,
meaning 4cbe9e0e2 is incompatible. 4cbe9e0e2 also induces an infinite
recursion as _which now invokes the exec wrapper that invokes _which.
This is part of git v2.49.0, so gitk on Cygwin is broken in that
release.

Rather than fix the unnecessary override code for Cygwin, let's just
limit the override of exec/open to Windows, leaving all other platforms
using their native exec/open as they did prior to 4cbe9e0e2. This patch
wraps the override code in an "if {[is_Windows]} { ... }" block while
removing the non-Windows code added in 4cbe9e0e2.

[1] see https://www.tcl-lang.org/man/tcl8.6/TclCmd/exec.htm

Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>

gitk: adjust indentation to match the style used in this script

We do not use tab characters for intentation in general. A recent patch
introduced many lines that do use them. Replace them by 4 spaces each.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>

t5605: fix test for cloning from a different user

This test currently passes, but for the wrong reason. The
repo_is_hardlinked function expects a .git directory or a bare
repository and currently fails because it cannot find the objects
directory.

One solution is to use the --bare argument, but then --show-toplevel
won't work. We could change that, but there's no need to, so just add
the missing .git directory.

In addition, use the built-in negation functionality of test_grep to
avoid mishandling real errors (such as a missing file) and, as a final
fix, remove the extra newline.

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'ps/reftable-sans-compat-util' into ps/reftable-api-revamp

* ps/reftable-sans-compat-util:
  Makefile: skip reftable library for Coccinelle
  reftable: decouple from Git codebase by pulling in "compat/posix.h"
  git-compat-util.h: split out POSIX-emulating bits
  compat/mingw: split out POSIX-related bits
  reftable/basics: introduce `REFTABLE_UNUSED` annotation
  reftable/basics: stop using `SWAP()` macro
  reftable/stack: stop using `sleep_millisec()`
  reftable/system: introduce `reftable_rand()`
  reftable/reader: stop using `ARRAY_SIZE()` macro
  reftable/basics: provide wrappers for big endian conversion
  reftable/basics: stop using `st_mult()` in array allocators
  reftable: stop using `BUG()` in trivial cases
  reftable/record: don't `BUG()` in `reftable_record_cmp()`
  reftable/record: stop using `BUG()` in `reftable_record_init()`
  reftable/record: stop using `COPY_ARRAY()`
  reftable/blocksource: stop using `xmmap()`
  reftable/stack: stop using `write_in_full()`
  reftable/stack: stop using `read_in_full()`

userdiff: add builtin driver for INI files

Add a new builtin driver for generic INI files (e. g. the gitconfig
files), where:

- the funcname regular expression matches section names, i. e. any
  string between brackets at the beginning of the line, with or without
  indentation;

- word_regex matches any word with one or more non-whitespace
  characters without checking if it is a valid variable name or value.

Also add tests for the new userdiff driver. These files define sections
and subsections, with and without indentation.

Helped-by: Patrick Steinhardt <ps@pks.im>
Helped-by: D. Ben Knoble <ben.knoble@gmail.com>
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

revision: fix --left/right-only use with unrelated histories

This is a similar fix as 023756f4eb (revision walker: --cherry-pick is a
limited operation), but for the --left-only and --right-only options.

When computing a symmetric difference between two unrelated histories,
no suitable merge base exists, and so no boundary commit is flagged as
UNINTERESTING. Previously, we relied on the presence of such boundary
to trigger limiting and thus consideration of either "revs->left_only"
or "revs->right_only".

A number of other entries in the option parser have started including
overrides for "revs->limited = 1". Do the same for these options.

Signed-off-by: Matt Hunter <m@lfurio.us>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pathspec: fix sign comparison warnings

There are multiple places, especially in loops, where a signed and an
unsigned data type are compared. Git uses a mix of signed and unsigned
types to store lengths of arrays. This sometimes leads to using a signed
index for an array whose length is stored in an unsigned variable or
vice versa. In some cases, where both signed and unsigned data types
have been used to store lengths of arrays in the same function, only
one variable was used to iterate over both types.

Replace signed data types with unsigned data types and vice versa
wherever necessary. Where both types of iterators are required, move
the declaration inside the for loop. In cases where this is not
possible, add appropriate cast.

Remove #define DISABLE_SIGN_COMPARE_WARNINGS.

Signed-off-by: Arnav Bhate <bhatearnav@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: use Visual Studio for win+meson job on GitHub Workflows

In 7304bd2bc39 (ci: wire up Visual Studio build with Meson, 2025-01-22)
we have wired up a new CI job that builds and tests Git with Meson on a
Windows machine. The expectation here was that this build uses the
Visual Studio toolchain to do so, and that is true on GitLab CI. But on
GitHub Workflows it is not the case because we've got GCC in our PATH,
and thus Meson favors that compiler toolchain over Visual Studio's.

Fix this by explicitly asking Meson to use the Visual Studio toolchain.
While this is only really required for GitHub Workflows, let's also pass
the flag in GitLab CI so that we don't implicitly assume the toolchain
that Meson is going to pick.

Reported-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

meson: distinguish build and target host binaries

Almost all of the tools we discover during the build process need to be
native programs. There are only a handful of exceptions, which typically
are programs whose paths we need to embed into the resulting executable
so that they can be found on the target system when Git executes. While
this distinction typically doesn't matter, it does start to matter when
considering cross-compilation where the build and target machines are
different.

Meson supports cross-compilation via so-called machine files. These
machine files allow the user to override parameters for the build
machine, but also for the target machine when cross-compiling. Part of
the machine file is a section that allows the user to override the
location where binaries are to be found in the target system. The
following machine file would for example override the path of the POSIX
shell:

    [binaries]
    sh = '/usr/xpg4/bin/sh'

It can be handed over to Meson via `meson setup --cross-file`.

We do not handle this correctly right now though because we don't know
to distinguish binaries for the build and target hosts at all. Address
this by explicitly passing the `native:` parameter to `find_program()`:

  - When set to `true`, we get binaries discovered on the build host.

  - When set to `false`, we get either the path specified in the
    machine file. Or, if no machine file exists or it doesn't specify
    the binary path, then we fall back to the binary discovered on the
    build host.

As mentioned, only a handful of binaries are not native: only the system
shell, Python and Perl need to be treated specially here.

Reported-by: Peter Seiderer <ps.report@gmx.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

meson: respect 'tests' build option in contrib

Both the "netrc" credential helper and git-subtree(1) from "contrib/"
carry a couple of tests with them. These tests get wired up in Meson
unconditionally even in the case where `-Dtests=false`. As those tests
depend on the `test_enviroment` variable, which only gets defined in
case `-Dtests=true`, the result is an error:

```
$ meson setup -Dtests=false -Dcontrib=subtree build
[...]

contrib/subtree/meson.build:15:27: ERROR: Unknown variable "test_environment".
```

Fix the issue by not defining these tests at all in case the "tests"
option is set to `false`.

Reported-by: Sam James <sam@gentoo.org>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

gitweb: fix generation of "gitweb.js"

In 19d8fe7da65 (Makefile: extract script to generate gitweb.js,
2024-12-06) we have extracted the logic to build "gitweb.js" into a
separate script. As part of that the rules that builds the script
has gained a new dependency on that script.

This refactoring is broken though because we use "$^" to determine
the set of JavaScript files that need to be concatenated, and this
implicit variable now also contains the build script itself. As a
result, the build script ends up ni the generated "gitweb.js" file,
which is wrong.

Fix the issue by filtering out non-JavaScript files.

Based-on-patch-by: Thorsten Glaser <tg@debian.org>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

meson: fix handling of '-Dcurl=auto'

The "curl" option controls whether or not a couple of features that
depend on curl shall be included. Most importantly, these features
include the HTTP remote helpers, which are rather quintessential for a
well-functioning Git installation. So while the dependency can in theory
be dropped, most users wouldn't consider the resulting installation to
be fully functional.

The "curl" option is defined as a feature, which means that it can be
"enabled", "disabled" or "auto", which has the effect that the feature
will be enabled if the dependency itself has been found. While most of
the other features have "auto" as default value, the "curl" option is
set to "enabled" by default due to it being so important. Consequently,
autoconfiguration of Git will fail by default if the library cannot be
found.

There is a bug though with how we handle the option in case the user
overrides the feature with `meson setup -Dcurl=auto`: while we will try
to find the library in that case, we won't ever use it because we later
on check for `get_option('curl').enabled()` when deciding whether or not
we want to build dependent sources. But `enabled()` only returns true if
the option has the value "enabled", for "auto" it will return false.

Fix the issue by instead checking for `curl.found()`, which is only true
if the library has been found. And as we only try to find the library
when `get_option('curl')` returns "true" or "auto" this is exactly what
we want.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

rm: fix sign comparison warnings

There are multiple places in loops, where a signed and an
unsigned data type are compared. Git uses a mix of signed and unsigned
types to store lengths of arrays. This sometimes leads to using a signed
index for an array whose length is stored in an unsigned variable or
vice versa.

get_ours_cache_pos is a special case where i, though derived from a
signed variable is never negative. Move this part to the caller side
and make i an unsigned argument of the function. Rename i to
pos to make it descriptive, now that it is a function argument.

Replace signed data types with unsigned data types and vice versa
wherever necessary. Where both signed and unsigned data types have been
used, define a new variable in the scope of the for loop for use as the
iterator. Remove #define DISABLE_SIGN_COMPARE_WARNINGS.

Signed-off-by: Arnav Bhate <bhatearnav@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

The second batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'hj/doc-rev-list-ancestry-fix'

Doc update.

* hj/doc-rev-list-ancestry-fix:
doc: add missing commit C to the graph for --ancestry-path=H D..M

Merge branch 'es/meson-building-docs-requires-perl'

Build update.

* es/meson-building-docs-requires-perl:
meson: fix perl detection when docs are enabled, but perl bindings aren't

Merge branch 'en/random-cleanups'

Miscellaneous code clean-ups.

* en/random-cleanups:
  merge-ort: remove extraneous word in comment
  merge-ort: fix accidental strset<->strintmap
  t7615: be more explicit about diff algorithm used
  t6423: fix a comment that accidentally reversed two commits
  stash: remove merge-recursive.h include

Merge branch 'rs/xdiff-context-length-fix'

The xdiff code on 32-bit platform misbehaved when an insanely large
context size is given, which has been corrected.

* rs/xdiff-context-length-fix:
xdiff: avoid arithmetic overflow in xdl_get_hunk()

Merge branch 'jk/use-wunreachable-code-for-devs'

Enable -Wunreachable-code for developer builds.

* jk/use-wunreachable-code-for-devs:
  config.mak.dev: enable -Wunreachable-code
  git-compat-util: add NOT_CONSTANT macro and use it in atfork_prepare()
  run-command: use errno to check for sigfillset() error

Merge branch 'en/diff-rename-follow-fix'

A corner-case bug in "git log --follow -B" has been fixed.

* en/diff-rename-follow-fix:
diffcore-rename: fix BUG when break detection and --follow used together

Merge branch 'tb/multi-cruft-pack-refresh-fix'

Certain "cruft" objects would have never been refreshed when there
are multiple cruft packs in the repository, which has been
corrected.

* tb/multi-cruft-pack-refresh-fix:
builtin/pack-objects.c: freshen objects from existing cruft packs

Merge branch 'am/dir-dedup-decl-of-repository'

Code cleanup.

* am/dir-dedup-decl-of-repository:
dir.h: remove duplicate forward declaration of struct repository

Merge branch 'ps/meson-with-breaking-changes'

Update meson based build procedure for breaking changes support.

* ps/meson-with-breaking-changes:
  meson: don't install git-pack-redundant(1) docs with breaking changes
  meson: don't compile git-pack-redundant(1) with breaking changes
  meson: define WITH_BREAKING_CHANGES when enabling breaking changes

Merge branch 'jk/fetch-ref-prefix-cleanup'

In protocol v2 where the refs advertisement is constrained, we try
to tell the server side not to limit the advertisement when there
is no specific need to, which has been the source of confusion and
recent bugs.  Revamp the logic to simplify.

* jk/fetch-ref-prefix-cleanup:
  fetch: use ref prefix list to skip ls-refs
  fetch: avoid ls-refs only to ask for HEAD symref update
  fetch: stop protecting additions to ref-prefix list
  fetch: ask server to advertise HEAD for config-less fetch
  refspec_ref_prefixes(): clean up refspec_item logic
  t5516: beef up exact-oid ref prefixes test
  t5516: drop NEEDSWORK about v2 reachability behavior
  t5516: prefer "oid" to "sha1" in some test titles
  t5702: fix typo in test name

Merge branch 'ab/decorate-code-cleanup'

Code clean-up.

* ab/decorate-code-cleanup:
decorate: fix sign comparison warnings

Merge branch 'en/merge-ort-prepare-to-remove-recursive'

First step of deprecating and removing merge-recursive.

* en/merge-ort-prepare-to-remove-recursive:
  am: switch from merge_recursive_generic() to merge_ort_generic()
  merge-ort: fix merge.directoryRenames=false
  t3650: document bug when directory renames are turned off
  merge-ort: support having merge verbosity be set to 0
  merge-ort: allow rename detection to be disabled
  merge-ort: add new merge_ort_generic() function

Merge branch 'ps/refname-avail-check-optim'

The code paths to check whether a refname X is available (by seeing
if another ref X/Y exists, etc.) have been optimized.

* ps/refname-avail-check-optim:
  refs: reuse iterators when determining refname availability
  refs/iterator: implement seeking for files iterators
  refs/iterator: implement seeking for packed-ref iterators
  refs/iterator: implement seeking for ref-cache iterators
  refs/iterator: implement seeking for reftable iterators
  refs/iterator: implement seeking for merged iterators
  refs/iterator: provide infrastructure to re-seek iterators
  refs/iterator: separate lifecycle from iteration
  refs: stop re-verifying common prefixes for availability
  refs/files: batch refname availability checks for initial transactions
  refs/files: batch refname availability checks for normal transactions
  refs/reftable: batch refname availability checks
  refs: introduce function to batch refname availability checks
  builtin/update-ref: skip ambiguity checks when parsing object IDs
  object-name: allow skipping ambiguity checks in `get_oid()` family
  object-name: introduce `repo_get_oid_with_flags()`

Merge branch 'cc/signed-fast-export-import'

"git fast-export | git fast-import" learns to deal with commit and
tag objects with embedded signatures a bit better.

* cc/signed-fast-export-import:
  fast-export, fast-import: add support for signed-commits
  fast-export: do not modify memory from get_commit_buffer
  git-fast-export.adoc: clarify why 'verbatim' may not be a good idea
  fast-export: rename --signed-tags='warn' to 'warn-verbatim'
  fast-export: fix missing whitespace after switch
  git-fast-import.adoc: add missing LF in the BNF

p9210: fix 'scalar clone' when running from a detached HEAD

In p9210-scalar-clone.sh, we test using 'scalar clone' to clone
$GIT_PERF_LARGE_REPO (copied locally as 'to-clone'), which defaults to
the git.git checkout we are running the test from.

When --branch is not specified (as in this test), 'scalar clone' tries
to get the default branch of the remote repository by parsing the output
of 'git ls-remote --symref $URL HEAD', as implemented in
scalar.c:remote_default_branch. When the git.git checkout we are running
the test from is in detached HEAD, this fails and we fall back to using
the name of the currently checked out branch in the newly initialized
repository, which in this case is the value returned earlier in
cmd_clone by repo_default_branch_name.

We then invoke 'git checkout -t origin/$branch', with $branch being the
name we got from remote_default_branch. This invocation fails if
'$branch' does not exist as a branch in the current git.git checkout.

Fix this by creating a local branch in 'to-clone' in the setup test
"enable server-side partial clone", making sure to use '-B' in case a
branch named 'test-branch' already exists.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

p7821: fix test_perf invocation for prereqs

Since 5dccd9155f (t/perf: add iteration setup mechanism to perf-lib,
2022-04-04), perf tests need to declare their prerequisites with
'--prereq', after the test title. p7821 was forgotten in that commit,
such that running that test on a machine where the PCRE prereq is not
satisfied aborts the test with:

error: bug in the test script: test_wrapper_ needs 2 positional parameters

Fix this by correcting the two 'test_perf' invocations in that test
suite.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-file doc: set conflict-marker-size attribute

When committing a conflict resolution for a merge containing
1f010d6bdf7 (doc: use .adoc extension for AsciiDoc files, 2025-01-20)
my pre-commit hook failed because "git diff --check" thought there was
a left over conflict marker in "merge-file.adoc". Fix this by setting
the "conflict-marker-size" attribute as we do for all the other
documentation files that contain example conflict markers.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'tb/incremental-midx-part-2' into ps/cat-file-filter-batch

* tb/incremental-midx-part-2:
  midx: implement writing incremental MIDX bitmaps
  pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
  pack-bitmap.c: keep track of each layer's type bitmaps
  ewah: implement `struct ewah_or_iterator`
  pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs
  pack-bitmap.c: compute disk-usage with incremental MIDXs
  pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
  pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
  pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs
  pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
  pack-bitmap.c: open and store incremental bitmap layers
  pack-revindex: prepare for incremental MIDX bitmaps
  Documentation: describe incremental MIDX bitmaps
  Documentation: remove a "future work" item from the MIDX docs

read-cache: check range before dereferencing an array element

Before accessing an array element at a given index, we should make sure
that the index is within the desired bounds, otherwise it makes little
sense to access the array element in the first place.

In this instance, testing whether `ce->name[common]` is the trailing NUL
byte is technically different from testing whether `common` is within
the bounds of `previous_name`. It is also redundant, as the range-check
guarantees that `previous_name->buf[common]` cannot be NUL and therefore
the condition `ce->name[common] == previous_name->buf[common]` would not
be met if `ce->name[common]` evaluated to NUL.

However, in the interest of reducing the cognitive load to reason about
the correctness of this loop (so that I can focus on interesting
projects again), I'll simply move the range-check to the beginning of
the loop condition and keep the redundant NUL check.

This acquiesces CodeQL's `cpp/offset-use-before-range-check` rule.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

detect-compiler: detect clang even if it found CUDA

In my setup, clang finds `/usr/local/cuda` and hence the output of
`clang -v` ends with this line:

Found CUDA installation: /usr/local/cuda, version

This confuses the `detect-compiler` script because it matches _all_
lines that contain the needle "version" surrounded by spaces. As a
consequence, the `get_family` function returns two lines: "Ubuntu clang"
and above-mentioned line, which the `case` statement does not handle
well and hence reports "unknown compiler family" instead of the expected
set of "clang14", "clang13", ..., "clang1" output.

Let's unconfuse the script by letting it parse the first matching line
and ignore the rest.

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

clang: warn when the comma operator is used

When compiling Git using `clang`, the `-Wcomma` option can be used to
warn about code using the comma operator (because it is typically
unintentional and wants to use the semicolon instead).

Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

compat/regex: explicitly mark intentional use of the comma operator

The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. That is why the
`-Wcomma` option of clang was introduced: To identify unintentional uses
of the comma operator.

In the `compat/regex/` code, the comma operator is used twice, once to
avoid surrounding two conditional statements with curly brackets, the
other one to increment two counters simultaneously in a `do ... while`
condition.

The first one is replaced with a proper conditional block, surrounded by
curly brackets.

The second one would be harder to replace because the loop contains two
`continue`s. Therefore, the second one is marked as intentional by
casting the value-to-discard to `void`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

wildmatch: avoid using of the comma operator

The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. That is why the
`-Wcomma` option of clang was introduced: To identify unintentional uses
of the comma operator.

In this instance, the usage is intentional because it allows storing the
value of the current character as `prev_ch` before making the next
character the current one, all of which happens in the loop condition
that lets the loop stop at a closing bracket.

However, it is hard to read.

The chosen alternative to using the comma operator is to move those
assignments from the condition into the loop body; In this particular
case that requires special care because the loop body contains a
`continue` for the case where a character class is found that starts
with `[:` but does not end in `:]` (and the assignments should occur
even when that code path is taken), which needs to be turned into a
`goto`.

Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

diff-delta: avoid using the comma operator

The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. That is why the
`-Wcomma` option of clang was introduced: To identify unintentional uses
of the comma operator.

Intentional uses include situations where one wants to avoid curly
brackets around multiple statements that need to be guarded by a
condition. This is the case here, as the repetitive nature of the
statements is easier to see for a human reader this way. At least in my
opinion.

However, opinions on this differ wildly, take 10 people and you have 10
different preferences.

On the Git mailing list, it seems that the consensus is to use the long
form instead, so let's do just that.

Suggested-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

xdiff: avoid using the comma operator unnecessarily

The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. While the code in
this patch used the comma operator intentionally (to avoid curly
brackets around two statements, each, that want to be guarded by a
condition), it is better to surround it with curly brackets and to use a
semicolon instead.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

clar: avoid using the comma operator unnecessarily

The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. In this instance, it
makes the code harder to read than necessary, too. Better use a
semicolon instead.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

kwset: avoid using the comma operator unnecessarily

The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. Better use a
semicolon instead.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

rebase: avoid using the comma operator unnecessarily

The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. Better use a
semicolon instead.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

remote-curl: avoid using the comma operator unnecessarily

The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. Better use a
semicolon instead.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Start 2.50 cycle (batch #1)

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'ja/doc-block-delimiter-markup-fix'

Doc markup updates.

* ja/doc-block-delimiter-markup-fix:
doc: add a blank line around block delimiters

Merge branch 'en/merge-process-renames-crash-fix'

The merge-recursive and merge-ort machinery crashed in corner cases
when certain renames are involved.

* en/merge-process-renames-crash-fix:
merge-ort: fix slightly overzealous assertion for rename-to-self
t6423: add a testcase causing a failed assertion in process_renames

Merge branch 'ua/some-builtins-wo-the-repository'

A handful of built-in command implementations have been rewritten
to use the repository instance supplied by git.c:run_builtin(), its
caller.

* ua/some-builtins-wo-the-repository:
  builtin/checkout-index: stop using `the_repository`
  builtin/for-each-ref: stop using `the_repository`
  builtin/ls-files: stop using `the_repository`
  builtin/pack-refs: stop using `the_repository`
  builtin/send-pack: stop using `the_repository`
  builtin/verify-commit: stop using `the_repository`
  builtin/verify-tag: stop using `the_repository`
  config: teach repo_config to allow `repo` to be NULL

Merge branch 'tb/refs-exclude-fixes'

The refname exclusion logic in the packed-ref backend has been
broken for some time, which confused upload-pack to advertise
different set of refs.  This has been corrected.

* tb/refs-exclude-fixes:
  refs.c: stop matching non-directory prefixes in exclude patterns
  refs.c: remove empty '--exclude' patterns

Merge branch 'sj/ref-consistency-checks-more'

"git fsck" becomes more careful when checking the refs.

* sj/ref-consistency-checks-more:
  builtin/fsck: add `git refs verify` child process
  packed-backend: check whether the "packed-refs" is sorted
  packed-backend: add "packed-refs" entry consistency check
  packed-backend: check whether the refname contains NUL characters
  packed-backend: add "packed-refs" header consistency check
  packed-backend: check if header starts with "# pack-refs with: "
  packed-backend: check whether the "packed-refs" is regular file
  builtin/refs: get worktrees without reading head information
  t0602: use subshell to ensure working directory unchanged

Merge branch 'jt/diff-pairs'

A post-processing filter for "diff --raw" output has been
introduced.

* jt/diff-pairs:
  builtin/diff-pairs: allow explicit diff queue flush
  builtin: introduce diff-pairs command
  diff: add option to skip resolving diff statuses
  diff: return diff_filepair from diff queue helpers

mergetools: vimdiff: add tests for layout with REMOTE as the target

Add some tests to make sure that now "REMOTE" can be used as a target
(ie. can be used together with the "@" marker) inside
"mergetool.vimdiff.layout"

Signed-off-by: Fernando Ramos <greenfoo@u92.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mergetools: vimdiff: fix layout where REMOTE is the target

"mergetool.vimdiff.layout" is used to define the vim layout (ie. how
windows, tabs and buffers are physically organized) when resolving
conflicts.

For example, if we set it to this:

    "(LOCAL,BASE,REMOTE)/MERGED"

...vim will open and show this layout:

    ------------------------------------------
    |             |           |              |
    |   LOCAL     |   BASE    |   REMOTE     |
    |             |           |              |
    ------------------------------------------
    |                                        |
    |                MERGED                  |
    |                                        |
    ------------------------------------------

By default, whatever ends up been written to the "MERGED" window will
become the file which conflict we are resolving.

However, it is possible to use the "@" symbol to specify a different
one.  For example, if we use this slightly different version of the
previously used string:

    "(LOCAL,BASE,@REMOTE)/MERGED"

...then the user should proceed to edit the contents of the top right
window (instead of the bottom window) as *that* is what will become the
conflicts free file once vim is closed.

Before this commit, the "@" marker worked for all targets *except* for
"REMOTE". In other words, these worked as expected:

    "(@LOCAL,BASE,REMOTE)/MERGED"
    "(LOCAL,@BASE,REMOTE)/MERGED"
    "(LOCAL,BASE,REMOTE)/@MERGED"

...but this didn't:

    "(LOCAL,BASE,@REMOTE)/MERGED"

This commit fixes that.

Reported-by: kawarimidoll <kawarimidoll+git@gmail.com>
Suggested-by: D. Ben Knoble <ben.knoble@gmail.com>
Signed-off-by: Fernando Ramos <greenfoo@u92.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

meson: disable coccinelle configuration when building from a tarball

Wiring up coccinelle in the build, depends on running git commands to
get the list of files to operate on. Reasonable, for a feature mainly
used by people developing on git. If building git itself from a tarball
distribution of git's own source code, one likely does not need to run
coccinelle.

But running those git commands failed, and caused the build to error
out, if `spatch` was installed -- because the build assumed that its
presence indicated a desire to use it on this source tree. Instead, we
can expand the conditional to check for both `spatch` and the `.git`
file or directory.

Meson's `opt.require()` method allows us to add a prerequisite for the
feature option. If the prerequisite fails, then the option either:

- converts autodetection to disabled

- emits an informative error if the feature was set to enabled:
  ```
  ERROR: Feature coccinelle cannot be enabled: coccinelle can only be run from a git checkout
  ```

Signed-off-by: Eli Schwartz <eschwartz@gentoo.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

vimdiff: clarify the sigil used for marking the buffer to save

The original documentation from 7b5cf8be18 (vimdiff: add tool
documentation, 2022-03-30) mistakenly described the marker as an
asterisk, which is the character "*". The code and examples have always
looked for an arobase ("@").

Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Acked-by: Fernando Ramos <greenfoo@u92.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

advice: allow disabling default branch name advice

The default branch name advice message is displayed when
`repo_default_branch_name()` is invoked and the `init.defaultBranch`
config is not set. In this scenario, the advice message is always shown
even if the `--no-advice` option is used.

Adapt `repo_default_branch_name()` to allow the default branch name
advice message to be disabled with the `--no-advice` option and
corresponding configuration.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/clone: suppress unexpected default branch advice

In 199f44cb2ead (builtin/clone: allow remote helpers to detect repo,
2024-02-27), clones started partially initializing the refdb before
executing the remote helpers by creating a HEAD file and "refs/"
directory. This has resulted in some scenarios where git-clone(1) now
prints the default branch name advice message where it previously did
not.

A side-effect of the HEAD file already existing, is that computation of
the default branch name is handled later in execution. This matters
because prior to 97abaab5f6 (refs: drop `git_default_branch_name()`,
2024-05-17), the default branch value would be computed during its first
execution and cached. Subsequent invocations would simply return the
cached value. Since the next `git_default_branch_name()` call site,
which is invoked through `guess_remote_head()`, is not configured to
suppress the advice message, computing the default branch name results
in the advice message being printed.

Configure `guess_remote_head()` to suppress the advice message,
restoring the previous behavior.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

remote: allow `guess_remote_head()` to suppress advice

The `repo_default_branch_name()` invoked through `guess_remote_head()`
is configured to always display the default branch advice message.

Adapt `guess_remote_head()` to accept flags and convert the `all`
parameter to a flag. Add the `REMOTE_GUESS_HEAD_QUIET` flag to to enable
suppression of advice messages. Call sites are updated accordingly.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

bulk-checkin: fix sign compare warnings

In file bulk-checkin.c, three warnings are emitted by
"-Wsign-compare", two of which are caused by trivial loop iterator
type mismatches. For the third case, the type of `rsize` from

ssize_t rsize = size < sizeof(ibuf) ? size : sizeof(ibuf);

can be changed to size_t as both options of the ternary expression are
unsigned and the signedness of the variable isn't really needed
anywhere.

To prevent `read_result != rsize` making a clash, it is to be noted
that `read_result` is checked not to hold negative values. Therefore
casting the variable to size_t is a safe operation and enough to
remove the sign-compare warning.

Fix issues accordingly, and remove `DISABLE_SIGN_COMPARE_WARNINGS` to
enable "-Wsign-compare" for the file.

Signed-off-by: Tuomas Ahola <taahol@utu.fi>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

imap-send: explicitly verify the peer certificate

It is a bug to obtain the peer certificate without verifying it.

Having said that, from my reading of
https://www.openssl.org/docs/man1.1.1/man3/SSL_set_verify.html, it would
appear that Git is saved by the fact that it calls
`SSL_CTX_set_verify(ctx, SSL_VERIFY_PEER, NULL)` already early on.

In other words, that `SSL_VERIFY_PEER` combined with the `NULL`
parameter (i.e. no overridden callback) would _already_ verify the peer
certificate. The fact that we later call `SSL_get_peer_certificate()`
is mistaken by CodeQL to mean that that peer certificate still needs to
be verified, but that had already happened at that point.

Nevertheless, it is better to verify the peer certificate explicitly
than to rely on some side effect that is really hard to reason about
(and that took me more than one business day to analyze fully). It also
makes it easier for static analyzers to validate the correctness of the
code.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

test-tool path-utils: support debugging "dubious ownership" issues

This adds a new sub-sub-command for `test-tool`, simply passing through
the command-line arguments to the `is_path_owned_by_current_user()`
function.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mingw: special-case administrators even more

The check for dubious ownership has one particular quirk on Windows: if
running as an administrator, files owned by the Administrators _group_
are considered owned by the user.

The rationale for that is: When running in elevated mode, Git creates
files that aren't owned by the individual user but by the Administrators
group.

There is yet another quirk, though: The check I introduced to determine
whether the current user is an administrator uses the
`CheckTokenMembership()` function with the current process token. And
that check only succeeds when running in elevated mode!

Let's be a bit more lenient here and look harder whether the current
user is an administrator. We do this by looking for a so-called "linked
token". That token exists when administrators run in non-elevated mode,
and can be used to create a new process in elevated mode. And feeding
_that_ token to the `CheckTokenMembership()` function succeeds!

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

maintenance: add loose-objects.batchSize config

The 'loose-objects' task of 'git maintenance run' first deletes loose
objects that exit within packfiles and then collects loose objects into
a packfile. This second step uses an implicit limit of fifty thousand
that cannot be modified by users.

Add a new config option that allows this limit to be adjusted or ignored
entirely.

While creating tests for this option, I noticed that actually there was
an off-by-one error due to the strict comparison in the limit check. I
considered making the limit check turn true on equality, but instead I
thought to use INT_MAX as a "no limit" barrier which should mean it's
never possible to hit the limit. Thus, a new decrement to the limit is
provided if the value is positive. (The restriction to positive values
is to avoid underflow if INT_MIN is configured.)

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

maintenance: force progress/no-quiet to children

The --no-quiet option for 'git maintenance run' is supposed to indicate
that progress should happen even while ignoring the value of isatty(2).
However, Git implicitly asks child processes to check isatty(2) since
these arguments are not passed through.

The pass through of --no-quiet will be useful in a test in the next
change.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

completion: fix bugs with slashes in remote names

Previously, some calls to for-each-ref passed fixed numbers of path
components to strip from refs, assuming that remote names had no slashes
in them. This made completions like:

git push github/dseomn :com<Tab>

Result in:

git push github/dseomn :dseomn/completion-remote-slash

With this patch, it instead results in:

git push github/dseomn :completion-remote-slash

Signed-off-by: David Mandelberg <david@mandelberg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

completion: add helper to count path components

A follow-up commit will use this with for-each-ref to strip the right
number of path components from refnames.

Signed-off-by: David Mandelberg <david@mandelberg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit: move clear_commit_marks_many() loop body to clear_commit_marks()

clear_commit_marks_many() clears multiple commits one by one. Move the
code for handling a single commit to clear_commit_marks() and call it
instead of the other way around, to simplify the code.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: implement writing incremental MIDX bitmaps

Now that the pack-bitmap machinery has learned how to read and interact
with an incremental MIDX bitmap, teach the pack-bitmap-write.c machinery
(and relevant callers from within the MIDX machinery) to write such
bitmaps.

The details for doing so are mostly straightforward. The main changes
are as follows:

  - find_object_pos() now makes use of an extra MIDX parameter which is
    used to locate the bit positions of objects which are from previous
    layers (and thus do not exist in the current layer's pack_order
    field).

    (Note also that the pack_order field is moved into struct
    write_midx_context to further simplify the callers for
    write_midx_bitmap()).

  - bitmap_writer_build_type_index() first determines how many objects
    precede the current bitmap layer and offsets the bits it sets in
    each respective type-level bitmap by that amount so they can be OR'd
    together.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators

Now that we have initialized arrays for each bitmap layer's type bitmaps
in the previous commit, adjust existing callers to use them in
preparation for multi-layered bitmaps.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-bitmap.c: keep track of each layer's type bitmaps

Prepare for reading the type-level bitmaps from previous bitmap layers
by maintaining an array for each type, where each element in that type's
array corresponds to one layer's bitmap for that type.

These fields will be used in a later commit to instantiate the 'struct
ewah_or_iterator' for each type.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ewah: implement `struct ewah_or_iterator`

While individual bitmap layers store different commit, type-level, and
pseudo-merge bitmaps, only the top-most layer is used to compute
reachability traversals.

Many functions which implement the aforementioned traversal rely on
enumerating the results according to the type-level bitmaps, and so
would benefit from a conceptual type-level bitmap that spans multiple
layers.

Implement `struct ewah_or_iterator` which is capable of enumerating
multiple EWAH bitmaps at once, and OR-ing the results together. When
initialized with, for example, all of the commit type bitmaps from each
layer, callers can pretend as if they are enumerating a large type-level
bitmap which contains the commits from *all* bitmap layers.

There are a couple of alternative approaches which were considered:

  - Decompress each EWAH bitmap and OR them together, enumerating a
    single (non-EWAH) bitmap. This would work, but has the disadvantage
    of decompressing a potentially large bitmap, which may not be
    necessary if the caller does not wish to read all of it.

  - Recursively call bitmap internal functions, reusing the "result" and
    "haves" bitmap from the top-most layer. This approach resembles the
    original implementation of this feature, but is inefficient in that
    it both (a) requires significant refactoring to implement, and (b)
    enumerates large sections of later bitmaps which are all zeros (as
    they pertain to objects in earlier layers).

    (b) is not so bad in and of itself, but can cause significant
    slow-downs when combined with expensive loop bodies.

This approach (enumerating an OR'd together version of all of the
type-level bitmaps from each layer) produces a significantly more
straightforward implementation with significantly less refactoring
required in order to make it work.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs

Prepare for using pseudo-merges with incremental MIDX bitmaps by
attempting to apply pseudo-merges from each layer when encountering a
given commit during a walk.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-bitmap.c: compute disk-usage with incremental MIDXs

In a similar fashion as previous commits, use nth_midxed_pack() instead
of accessing the MIDX's ->packs array directly to support incremental
MIDXs.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs

Implement support for the special `--test-bitmap` mode of `git rev-list`
when using incremental MIDXs.

The bitmap_test_data structure is extended to contain a "base" pointer
that mirrors the structure of the bitmap chain that it is being used to
test.

When we find a commit to test, we first chase down the ->base pointer to
find the appropriate bitmap_test_data for the bitmap layer that the
given commit is contained within, and then perform the test on that
bitmap.

In order to implement this, light modifications are made to
bitmap_for_commit() to reimplement it in terms of a new function,
find_bitmap_for_commit(), which fills out a pointer which indicates the
bitmap layer which contains the given commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs

In a similar fashion as previous commits in the first phase of
incremental MIDXs, enumerate not just the packs in the current
incremental MIDX layer, but previous ones as well.

Likewise, in reuse_partial_packfile_from_bitmap(), when reusing only a
single pack from a MIDX, use the oldest layer's preferred pack as it is
likely to contain the largest number of reusable sections.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs

Since we may ask for a pack_id that is in an earlier MIDX layer relative
to the one corresponding to our bitmap, use nth_midxed_pack() instead of
accessing the ->packs array directly.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs

The pack-bitmap machinery uses `bitmap_for_commit()` to locate the
EWAH-compressed bitmap corresponding to some given commit object.

Teach this function about incremental MIDX bitmaps by teaching it to
recur on earlier bitmap layers when it fails to find a given commit in
the current layer.

The changes to do so are as follows:

  - Avoid initializing hash_pos at its declaration, since
    bitmap_for_commit() is now a recursive function and may receive a
    NULL bitmap_index pointer as its first argument.

  - In cases where we would previously return NULL (to indicate that a
    lookup failed and the given bitmap_index does not contain an entry
    corresponding to the given commit), recursively call the function on
    the previous bitmap layer.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-bitmap.c: open and store incremental bitmap layers

Prepare the pack-bitmap machinery to work with incremental MIDXs by
adding a new "base" field to keep track of the bitmap index associated
with the previous MIDX layer.

The changes in this commit are mostly boilerplate to open the correct
bitmap(s), add them to the chain of bitmap layers along the "base"
pointer, ensure that the correct packs and their reverse indexes are
loaded across MIDX layers, etc.

While we're at it, keep track of a base_nr field to indicate how many
bitmap layers (including the current bitmap) exist. This will be used in
a future commit to allocate an array of 'struct ewah_bitmap' pointers to
collect all of the respective type bitmaps among all layers to
initialize a multi-EWAH iterator.

Subsequent commits will teach the functions within the pack-bitmap
machinery how to interact with these new fields.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-revindex: prepare for incremental MIDX bitmaps

Prepare the reverse index machinery to handle object lookups in an
incremental MIDX bitmap. These changes are broken out across a few
functions:

  - load_midx_revindex() learns to use the appropriate MIDX filename
    depending on whether the given 'struct multi_pack_index *' is
    incremental or not.

  - pack_pos_to_midx() and midx_to_pack_pos() now both take in a global
    object position in the MIDX pseudo-pack order, and find the
    earliest containing MIDX (similar to midx.c::midx_for_object().

  - midx_pack_order_cmp() adjusts its call to pack_pos_to_midx() by the
    number of objects in the base (since 'vb - midx->revindx_data' is
    relative to the containing MIDX, and pack_pos_to_midx() expects a
    global position).

    Likewise, this function adjusts its output by adding
    m->num_objects_in_base to return a global position out through the
    `*pos` pointer.

Together, these changes are sufficient to use the multi-pack index's
reverse index format for incremental multi-pack reachability bitmaps.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Documentation: describe incremental MIDX bitmaps

Prepare to implement support for reachability bitmaps for the new
incremental multi-pack index (MIDX) feature over the following commits.

This commit begins by first describing the relevant format and usage
details for incremental MIDX bitmaps.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Documentation: remove a "future work" item from the MIDX docs

One of the items listed as "future work" in the MIDX's technical
documentation is to extend the format to allow MIDXs to be written
incrementally across multiple layers.

This was suggested all the way back in ceab693d1f (multi-pack-index: add
design document, 2018-07-12), and implemented in b9497848df (Merge
branch 'tb/incremental-midx-part-1', 2024-08-19). Let's remove it
accordingly.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

compat/mingw: fix EACCESS when opening files with `O_CREAT | O_EXCL`

In our CI systems we can observe that t0610 fails rather frequently.
This testcase races a bunch of git-update-ref(1) processes with one
another which are all trying to update a unique reference, where we
expect that all processes succeed and end up updating the reftable
stack. The error message in this case looks like the following:

    fatal: update_ref failed for ref 'refs/heads/branch-88': reftable: transaction prepare: I/O error

Instrumenting the code with a couple of calls to `BUG()` in relevant
sites where we return `REFTABLE_IO_ERROR` quickly leads one to discover
that this error is caused when calling `flock_acquire()`, which is a
thin wrapper around our lockfile API. Curiously, the error code we get
in such cases is `EACCESS`, indicating that we are not allowed to access
the file.

The root cause of this is an oddity of `CreateFileW()`, which is what
`_wopen()` uses internally. Quoting its documentation [1]:

    If you call CreateFile on a file that is pending deletion as a
    result of a previous call to DeleteFile, the function fails. The
    operating system delays file deletion until all handles to the file
    are closed. GetLastError returns ERROR_ACCESS_DENIED.

This behaviour is triggered quite often in the above testcase because
all the processes race with one another trying to acquire the lock for
the "tables.list" file. This is due to how locking works in the reftable
library when compacting a stack:

    1. Lock the "tables.list" file and reads its contents.

    2. Decide which tables to compact.

    3. Lock each of the individual tables that we are about to compact.

    4. Unlock the "tables.list" file.

    5. Compact the individual tables into one large table.

    6. Re-lock the "tables.list" file.

    7. Write the new list of tables into it.

    8. Commit the "tables.list" file.

The important step is (4): we don't commit the file directly by renaming
it into place, but instead we delete the lockfile so that concurrent
processes can continue to append to the reftable stack while we compact
the tables. And because we use `DeleteFileW()` to do so, we may now race
with another process that wants to acquire that lockfile. So if we are
unlucky, we would now see `ERROR_ACCESS_DENIED` instead of the expected
`ERROR_FILE_EXISTS`, which the lockfile subsystem isn't prepared to
handle and thus it will bail out without retrying to acquire the lock.

In theory, the issue is not limited to the reftable library and can be
triggered by every other user of the lockfile subsystem, as well. My gut
feeling tells me it's rather unlikely to surface elsewhere though.

Fix the issue by translating the error to `EEXIST`. This makes the
lockfile subsystem handle the error correctly: in case a timeout is set
it will now retry acquiring the lockfile until the timeout has expired.

With this, t0610 is now always passing on my machine whereas it was
previously failing in around 20-30% of all test runs.

[1]: https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilew

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

meson: fix compat sources when compiling with MSVC

In our compat library we have both "msvc.c" and "mingw.c". The former is
mostly a thin wrapper around the latter as it directly includes it, but
it has a couple of extra headers that aren't included in "mingw.c" and
is expected to be used with the Visual Studio compiler toolchain.

While our Makefile knows to pick up the correct file depending on
whether or not the Visual Studio toolchain is used, we don't do the same
with Meson. Fix this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/fetch: avoid aborting closed reference transaction

As part of the reference transaction commit phase, the transaction is
set to a closed state regardless of whether it was successful of not.
Attempting to abort a closed transaction via `ref_transaction_abort()`
results in a `BUG()`.

In c92abe71df (builtin/fetch: fix leaking transaction with `--atomic`,
2024-08-22), logic to free a transaction after the commit phase is moved
to the centralized exit path. In cases where the transaction commit
failed, this results in a closed transaction being aborted and signaling
a bug.

Free the transaction and set it to NULL when the commit fails. This
allows the exit path to correctly handle the error without attempting to
abort the transaction.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

repack: begin combining cruft packs with `--combine-cruft-below-size`

The previous commit changed the behavior of repack's '--max-cruft-size'
to specify a cruft pack-specific override for '--max-pack-size'.

Introduce a new flag, '--combine-cruft-below-size' which is a
replacement for the old behavior of '--max-cruft-size'. This new flag
does explicitly what it says: it combines together cruft packs which are
smaller than a given threshold, and leaves alone ones which are
larger.

This accomplishes the original intent of '--max-cruft-size', which was
to avoid repacking cruft packs larger than the given threshold.

The new behavior is slightly different. Instead of building up small
packs together until the threshold is met, '--combine-cruft-below-size'
packs up *all* cruft packs smaller than the threshold. This means that
we may make a pack much larger than the given threshold (e.g., if you
aggregate 5 packs which are each 99 MiB in size with a threshold of 100
MiB).

But that's OK: the point isn't to restrict the size of the cruft packs
we generate, it's to avoid working with ones that have already grown too
large. If repositories still want to limit the size of the generated
cruft pack(s), they may use '--max-cruft-size'.

There's some minor test fallout as a result of the slight differences in
behavior between the old meaning of '--max-cruft-size' and the behavior
of '--combine-cruft-below-size'. In the test which is now called
"--combine-cruft-below-size combines packs", we need to use the new flag
over the old one to exercise that test's intended behavior. The
remainder of the changes there are to improve the clarity of the
comments.

Suggested-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

repack: avoid combining cruft packs with `--max-cruft-size`

In 37dc6d8104 (builtin/repack.c: implement support for
`--max-cruft-size`, 2023-10-02), we exposed new functionality that
allowed repositories to specify the behavior of when we should combine
multiple cruft packs together.

This feature was designed to ensure that we never repacked cruft packs
which were larger than the given threshold in order to provide tighter
I/O bounds for repositories that have many unreachable objects. In
essence, specifying '--max-cruft-size=N' instructed 'repack' to
aggregate cruft packs together (in order of ascending size) until the
combine size grows past 'N', and then make a new cruft pack whose
contents includes the packs we rolled up.

But this isn't quite how it works in practice. Suppose for example that
we have two cruft packs which are each 100MiB in size. One might expect
specifying "--max-cruft-size=200M" would combine these two packs
together, and then avoid repacking them until a pruning GC takes place.
In reality, 'repack' would try and aggregate these together, but writing
a pack that is strictly smaller than 200 MiB (since pack-objects'
"--max-pack-size" provides a strict bound for packs containing more than
one object).

So instead we'll write out a pack that is, say, 199 MiB in size, and
then another 1 MiB pack containing the balance. If we later repack the
repository without adding any new unreachable objects, we'll repeat the
same exercise again, making the same 199 MiB and 1 MiB packs each time.

This happens because of a poor choice to bolt the '--max-cruft-size'
functionality onto pack-objects' '--max-pack-size', forcing us to
generate packs which are always smaller than the provided threshold and
thus subject to repacking.

The following commit will introduce a new flag that implements something
similar to the behavior above. Let's prepare for that by making repack's
'--max-cruft-size' flag behave as an cruft pack-specific override for
'--max-pack-size'.

Do so by temporarily repurposing the 'collapse_small_cruft_packs()'
function to instead generate a cruft pack using the same instructions as
if we didn't specify any maximum pack size. The calling code looks
something like:

    if (args->max_pack_size && !cruft_expiration) {
        collapse_small_cruft_packs(in, args->max_pack_size, existing);
    } else {
        for_each_string_list_item(item, &existing->non_kept_packs)
            fprintf(in, "-%s.pack\n", item->string);
        for_each_string_list_item(item, &existing->cruft_packs)
            fprintf(in, "-%s.pack\n", item->string);
    }

This patch makes collapse_small_cruft_packs() behave identically to the
'else' arm of the conditional above. This repurposing of
'collapse_small_cruft_packs()' is intentional, since it will set us up
nicely to introduce the new behavior in the following commit.

Naturally, there is some test fallout in the test which exercises the
old meaning of '--max-cruft-size'. Mark that test as failing for now to
be dealt with in the following commit. Likewise, add a new test which
explicitly tests the behavior of '--max-cruft-size' to place a hard
limit on the size of any generated cruft pack(s).

Note that this is a breaking change, as it alters the user-visible
behavior of '--max-cruft-size'. But I'm OK changing this behavior in
this instance, since the behavior wasn't accurate to begin with.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t/t7704-repack-cruft.sh: consolidate `write_blob()`

A previous commit moved a handful of tests from a different script into
t7704, including one that relies on generating random blobs.

Incidentally, the original home of this test defined its own helper
"write_blob" for doing so, which is identical in function to our
"generate_random_blob" (and is slightly inferior to the latter, which
cleans up after itself).

Rewrite the test that uses "write_blob" to no longer do so and then
remove the function.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t/t7704-repack-cruft.sh: clarify wording in --max-cruft-size tests

Now that a number of new tests have landed in t7704, make sure that they
all make sense and are testing the things they say they are.

Things are mostly OK, but a handful of tests needed tweaks. Those tweaks
are as follows:

  - Use the terms "too large" or "too small" in tests that exercise the
    '--max-cruft-size' behavior. This has historically been treated as a
    threshold beneath which to combine cruft packs, but that will change
    in a subsequent commit. Prepare for that by using a more generic
    term.

  - Remove references to "--max-cruft-size" in the freshening tests.
    These tests provide coverage of our ability to record updated mtimes
    for objects already in cruft packs whose mtimes are upserted from
    various sources (loose objects, finding that object in a new pack,
    another cruft pack, etc.).

    These have nothing to do with the '--max-cruft-size' feature, and in
    fact none of the tests even *use* '--max-cruft-size'. Name them
    appropriately to make it clear that these tests exercise freshening
    behavior, not '--max-cruft-size' behavior.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests

The cruft pack feature has two primary test scripts which exercise
various parts of it, which are:

- t5329-pack-objects-cruft.sh
- t7704-repack-cruft.sh

The former is designed to test low-level pack generation mechanics at
the 'git pack-objects --cruft'-level, which is plumbing. The latter, on
the other hand, is designed to test the user-facing behavior through
'git repack --cruft', which is porcelain (under the "ancillary
manipulators" sub-section).

At some point a handful of tests which should have been added to the
latter script were instead written to the former. This isn't a huge
deal, but rectifying it is straightforward. Move a handful of
'repack'-related tests out of t5329 and into their rightful home in
t7704.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

rev-list: support NUL-delimited --missing option

The `--missing={print,print-info}` option for git-rev-list(1) prints
missing objects found while performing the object walk in the form:

        $ git rev-list --missing=print-info <rev>
        ?<oid> [SP <token>=<value>]... LF

Add support for printing missing objects in a NUL-delimited format when
the `-z` option is enabled.

        $ git rev-list -z --missing=print-info <rev>
        <oid> NUL missing=yes NUL [<token>=<value> NUL]...

In this mode, values containing special characters or spaces are printed
as-is without being escaped or quoted. Instead of prefixing the missing
OID with '?', a separate `missing=yes` token/value pair is appended.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

rev-list: support NUL-delimited --boundary option

The `--boundary` option for git-rev-list(1) prints boundary objects
found while performing the object walk in the form:

        $ git rev-list --boundary <rev>
        -<oid> LF

Add support for printing boundary objects in a NUL-delimited format when
the `-z` option is enabled.

        $ git rev-list -z --boundary <rev>
        <oid> NUL boundary=yes NUL

In this mode, instead of prefixing the boundary OID with '-', a separate
`boundary=yes` token/value pair is appended.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

rev-list: support delimiting objects with NUL bytes

When walking objects, git-rev-list(1) prints each object entry on a
separate line. Some options, such as `--objects`, may print additional
information about tree and blob object on the same line in the form:

        $ git rev-list --objects <rev>
        <tree/blob oid> SP [<path>] LF

Note that in this form the SP is appended regardless of whether the tree
or blob object has path information available. Paths containing a
newline are also truncated at the newline.

Introduce the `-z` option for git-rev-list(1) which reformats the output
to use NUL-delimiters between objects and associated info in the
following form:

        $ git rev-list -z --objects <rev>
        <oid> NUL [path=<path> NUL]

In this form, the start of each record is signaled by an OID entry that
is all hexidecimal and does not contain any '='. Additional path info
from `--objects` is appended to the record as a token/value pair
`path=<path>` as-is without any truncation.

For now, the `--objects` flag is the only options that can be used in
combination with `-z`. In a subsequent commit, NUL-delimited support for
other options is added. Other options that do not make sense when used
in combination with `-z` are rejected.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

rev-list: refactor early option parsing

Before invoking `setup_revisions()`, the `--missing` and
`--exclude-promisor-objects` options are parsed early. In a subsequent
commit, another option is added that must be parsed early.

Refactor the code to parse both options in a single early pass.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

rev-list: inline `show_object_with_name()` in `show_object()`

The `show_object_with_name()` function only has a single call site.
Inline call to `show_object_with_name()` in `show_object()` so the
explicit function can be cleaned up and live closer to where it is used.
While at it, factor out the code that prints the OID and newline for
both objects with and without a name. In a subsequent commit,
`show_object()` is modified to support printing object information in a
NUL-delimited format.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

treewide: replace assert() with ASSERT() in special cases

When the compiler/linker cannot verify that an assert() invocation is
free of side effects for us (e.g. because the assertion includes some
kind of function call), replace the use of assert() with ASSERT().

Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: add build checking for side-effects in assert() calls

It is a big no-no to have side-effects in an assertion, because if the
assert() is compiled out, you don't get that side-effect, leading to the
code behaving differently.  That can be a large headache to debug.

We have roughly 566 assert() calls in our codebase (my grep might have
picked up things that aren't actually assert() calls, but most appeared
to be).  All but 9 of them can be determined by gcc to be free of side
effects with a clever redefine of assert() provided by Bruno De Fraine
(from
https://stackoverflow.com/questions/10593492/catching-assert-with-side-effects),
who upon request has graciously placed his two-liner into the public
domain without warranty of any kind.  The current 9 assert() calls
flagged by this clever redefinition of assert() appear to me to be free
of side effects as well, but are too complicated for a compiler/linker
to figure that since each assertion involves some kind of function call.
Add a CI job which will find and report these possibly problematic
assertions, and have the job suggest to the user that they replace these
with ASSERT() calls.

Example output from running:

```
ERROR: The compiler could not verify the following assert()
       calls are free of side-effects.  Please replace with
       ASSERT() calls.
/home/newren/floss/git/diffcore-rename.c:1409
assert(!dir_rename_count || strmap_empty(dir_rename_count));
/home/newren/floss/git/merge-ort.c:1645
assert(renames->deferred[side].trivial_merges_okay &&
       !strset_contains(&renames->deferred[side].target_dirs,
path));
/home/newren/floss/git/merge-ort.c:794
assert(omittable_hint ==
       (!starts_with(type_short_descriptions[type], "CONFLICT") &&
!starts_with(type_short_descriptions[type], "ERROR")) ||
       type == CONFLICT_DIR_RENAME_SUGGESTED);
/home/newren/floss/git/merge-recursive.c:1200
assert(!merge_remote_util(commit));
/home/newren/floss/git/object-file.c:2709
assert(would_convert_to_git_filter_fd(istate, path));
/home/newren/floss/git/parallel-checkout.c:280
assert(is_eligible_for_parallel_checkout(pc_item->ce, &pc_item->ca));
/home/newren/floss/git/scalar.c:244
assert(have_fsmonitor_support());
/home/newren/floss/git/scalar.c:254
assert(have_fsmonitor_support());
/home/newren/floss/git/sequencer.c:4968
assert(!(opts->signoff || opts->no_commit ||
opts->record_origin || should_edit(opts) ||
opts->committer_date_is_author_date ||
opts->ignore_date));
```

Note that if there are possibly problematic assertions, not necessarily
all of them will be shown in a single run, because the compiler errors
may include something like "ld: ... more undefined references to
`not_supposed_to_survive' follow" instead of listing each individually.
But in such cases, once you clean up a few that are shown in your first
run, subsequent runs will show (some of) the ones that remain, allowing
you to iteratively remove them all.

Helped-by: Bruno De Fraine <defraine@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

git-compat-util: introduce ASSERT() macro

Create a ASSERT() macro which is similar to assert(), but will not be
compiled out when NDEBUG is defined, and is thus safe to use even if its
argument has side-effects.

We will use this new macro in a subsequent commit to convert a few
existing assert() invocations to ASSERT(). In particular, we'll
convert the handful of invocations which cannot be proven to be free of
side effects with a simple compiler/linker hack.

Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>