git.ipfire.org Git - thirdparty/git.git/log

test-tool path-utils: support debugging "dubious ownership" issues

This adds a new sub-sub-command for `test-tool`, simply passing through
the command-line arguments to the `is_path_owned_by_current_user()`
function.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mingw: special-case administrators even more

The check for dubious ownership has one particular quirk on Windows: if
running as an administrator, files owned by the Administrators _group_
are considered owned by the user.

The rationale for that is: When running in elevated mode, Git creates
files that aren't owned by the individual user but by the Administrators
group.

There is yet another quirk, though: The check I introduced to determine
whether the current user is an administrator uses the
`CheckTokenMembership()` function with the current process token. And
that check only succeeds when running in elevated mode!

Let's be a bit more lenient here and look harder whether the current
user is an administrator. We do this by looking for a so-called "linked
token". That token exists when administrators run in non-elevated mode,
and can be used to create a new process in elevated mode. And feeding
_that_ token to the `CheckTokenMembership()` function succeeds!

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

maintenance: add loose-objects.batchSize config

The 'loose-objects' task of 'git maintenance run' first deletes loose
objects that exist within packfiles and then collects loose objects into
a packfile. This second step uses an implicit limit of fifty thousand
that cannot be modified by users.

Add a new config option that allows this limit to be adjusted or ignored
entirely.

While creating tests for this option, I noticed that actually there was
an off-by-one error due to the strict comparison in the limit check. I
considered making the limit check turn true on equality, but instead I
thought to use INT_MAX as a "no limit" barrier which should mean it's
never possible to hit the limit. Thus, a new decrement to the limit is
provided if the value is positive. (The restriction to positive values
is to avoid underflow if INT_MIN is configured.)
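
The interaction between the strict comparison and the decrement can be
sketched as follows. This is a minimal illustration, not the code from
the patch: the helper names are hypothetical, and treating a configured
value of 0 as "no limit" is an assumption.

```c
#include <limits.h>

/* Hypothetical helper: map the configured value to the internal limit. */
static int normalize_batch_size(int configured)
{
    if (configured == 0)
        return INT_MAX;        /* assumption: 0 means "no limit" */
    if (configured > 0)
        return configured - 1; /* compensate for the strict '>' below */
    return configured;         /* negative values are left untouched */
}

/* Count how many of `available` objects are written before the check fires. */
static int objects_written(int configured, int available)
{
    int limit = normalize_batch_size(configured);
    int count = 0, written = 0;

    for (int i = 0; i < available; i++) {
        written++;             /* the object is written first ... */
        if (++count > limit)   /* ... then the strict limit check runs */
            break;
    }
    return written;
}
```

With a configured value of 3 and ten candidates, this sketch writes
exactly three objects; without the decrement it would write four.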

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

maintenance: force progress/no-quiet to children

The --no-quiet option for 'git maintenance run' is supposed to indicate
that progress should happen even while ignoring the value of isatty(2).
However, Git implicitly asks child processes to check isatty(2) since
these arguments are not passed through.

The pass through of --no-quiet will be useful in a test in the next
change.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

completion: fix bugs with slashes in remote names

Previously, some calls to for-each-ref passed fixed numbers of path
components to strip from refs, assuming that remote names had no slashes
in them. This made completions like:

git push github/dseomn :com<Tab>

result in:

git push github/dseomn :dseomn/completion-remote-slash

With this patch, it instead results in:

git push github/dseomn :completion-remote-slash
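
The completion script itself is shell, but the idea can be sketched in
C: count the path components of the prefix that includes the remote
name, then strip that many from each refname. Both function names below
are hypothetical and only illustrate the counting.

```c
#include <stddef.h>
#include <string.h>

/* Count non-empty, slash-separated components in `path`. */
static size_t count_path_components(const char *path)
{
    size_t n = 0;

    while (*path) {
        while (*path == '/')
            path++;
        if (!*path)
            break;
        n++;
        while (*path && *path != '/')
            path++;
    }
    return n;
}

/* Skip the first `n` components of `ref`; NULL if there are too few. */
static const char *strip_components(const char *ref, size_t n)
{
    while (n-- && ref) {
        const char *slash = strchr(ref, '/');
        ref = slash ? slash + 1 : NULL;
    }
    return ref;
}
```

For the remote "github/dseomn", the prefix "refs/remotes/github/dseomn"
has four components, so stripping four (rather than a fixed three)
leaves just the branch name.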

Signed-off-by: David Mandelberg <david@mandelberg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

completion: add helper to count path components

A follow-up commit will use this with for-each-ref to strip the right
number of path components from refnames.

Signed-off-by: David Mandelberg <david@mandelberg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit: move clear_commit_marks_many() loop body to clear_commit_marks()

clear_commit_marks_many() clears multiple commits one by one. Move the
code for handling a single commit to clear_commit_marks() and call it
instead of the other way around, to simplify the code.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: implement writing incremental MIDX bitmaps

Now that the pack-bitmap machinery has learned how to read and interact
with an incremental MIDX bitmap, teach the pack-bitmap-write.c machinery
(and relevant callers from within the MIDX machinery) to write such
bitmaps.

The details for doing so are mostly straightforward. The main changes
are as follows:

  - find_object_pos() now makes use of an extra MIDX parameter which is
    used to locate the bit positions of objects which are from previous
    layers (and thus do not exist in the current layer's pack_order
    field).

    (Note also that the pack_order field is moved into struct
    write_midx_context to further simplify the callers for
    write_midx_bitmap()).

  - bitmap_writer_build_type_index() first determines how many objects
    precede the current bitmap layer and offsets the bits it sets in
    each respective type-level bitmap by that amount so they can be OR'd
    together.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators

Now that we have initialized arrays for each bitmap layer's type bitmaps
in the previous commit, adjust existing callers to use them in
preparation for multi-layered bitmaps.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-bitmap.c: keep track of each layer's type bitmaps

Prepare for reading the type-level bitmaps from previous bitmap layers
by maintaining an array for each type, where each element in that type's
array corresponds to one layer's bitmap for that type.

These fields will be used in a later commit to instantiate the 'struct
ewah_or_iterator' for each type.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ewah: implement `struct ewah_or_iterator`

While individual bitmap layers store different commit, type-level, and
pseudo-merge bitmaps, only the top-most layer is used to compute
reachability traversals.

Many functions which implement the aforementioned traversal rely on
enumerating the results according to the type-level bitmaps, and so
would benefit from a conceptual type-level bitmap that spans multiple
layers.

Implement `struct ewah_or_iterator` which is capable of enumerating
multiple EWAH bitmaps at once, and OR-ing the results together. When
initialized with, for example, all of the commit type bitmaps from each
layer, callers can pretend as if they are enumerating a large type-level
bitmap which contains the commits from *all* bitmap layers.

There are a couple of alternative approaches which were considered:

  - Decompress each EWAH bitmap and OR them together, enumerating a
    single (non-EWAH) bitmap. This would work, but has the disadvantage
    of decompressing a potentially large bitmap, which may not be
    necessary if the caller does not wish to read all of it.

  - Recursively call bitmap internal functions, reusing the "result" and
    "haves" bitmap from the top-most layer. This approach resembles the
    original implementation of this feature, but is inefficient in that
    it both (a) requires significant refactoring to implement, and (b)
    enumerates large sections of later bitmaps which are all zeros (as
    they pertain to objects in earlier layers).

    (b) is not so bad in and of itself, but can cause significant
    slow-downs when combined with expensive loop bodies.

This approach (enumerating an OR'd together version of all of the
type-level bitmaps from each layer) produces a significantly more
straightforward implementation with significantly less refactoring
required in order to make it work.
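
The OR-ing idea can be sketched as follows. To stay short, this sketch
iterates over plain (uncompressed) 64-bit words instead of EWAH runs,
and the struct and function names are hypothetical rather than the ones
added by this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a per-layer bitmap iterator (uncompressed words). */
struct word_iter {
    const uint64_t *words;
    size_t nr;
    size_t pos;
};

/* Enumerates several bitmaps at once, OR-ing their words together. */
struct or_iter {
    struct word_iter *its;
    size_t nr;
};

/* Returns 1 and stores the OR'd word in *out, or 0 once all are exhausted. */
static int or_iter_next(struct or_iter *oit, uint64_t *out)
{
    int any = 0;

    *out = 0;
    for (size_t i = 0; i < oit->nr; i++) {
        struct word_iter *it = &oit->its[i];
        if (it->pos < it->nr) {
            *out |= it->words[it->pos++];
            any = 1;
        }
    }
    return any;
}
```

A caller that initializes the iterator with one type bitmap per layer
sees a single stream of words, without materializing a combined bitmap
up front.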

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs

Prepare for using pseudo-merges with incremental MIDX bitmaps by
attempting to apply pseudo-merges from each layer when encountering a
given commit during a walk.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-bitmap.c: compute disk-usage with incremental MIDXs

In a similar fashion as previous commits, use nth_midxed_pack() instead
of accessing the MIDX's ->packs array directly to support incremental
MIDXs.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs

Implement support for the special `--test-bitmap` mode of `git rev-list`
when using incremental MIDXs.

The bitmap_test_data structure is extended to contain a "base" pointer
that mirrors the structure of the bitmap chain that it is being used to
test.

When we find a commit to test, we first chase down the ->base pointer to
find the appropriate bitmap_test_data for the bitmap layer that the
given commit is contained within, and then perform the test on that
bitmap.

In order to implement this, light modifications are made to
bitmap_for_commit() to reimplement it in terms of a new function,
find_bitmap_for_commit(), which fills out a pointer which indicates the
bitmap layer which contains the given commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs

In a similar fashion as previous commits in the first phase of
incremental MIDXs, enumerate not just the packs in the current
incremental MIDX layer, but previous ones as well.

Likewise, in reuse_partial_packfile_from_bitmap(), when reusing only a
single pack from a MIDX, use the oldest layer's preferred pack as it is
likely to contain the largest number of reusable sections.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs

Since we may ask for a pack_id that is in an earlier MIDX layer relative
to the one corresponding to our bitmap, use nth_midxed_pack() instead of
accessing the ->packs array directly.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs

The pack-bitmap machinery uses `bitmap_for_commit()` to locate the
EWAH-compressed bitmap corresponding to some given commit object.

Teach this function about incremental MIDX bitmaps by teaching it to
recur on earlier bitmap layers when it fails to find a given commit in
the current layer.

The changes to do so are as follows:

  - Avoid initializing hash_pos at its declaration, since
    bitmap_for_commit() is now a recursive function and may receive a
    NULL bitmap_index pointer as its first argument.

  - In cases where we would previously return NULL (to indicate that a
    lookup failed and the given bitmap_index does not contain an entry
    corresponding to the given commit), recursively call the function on
    the previous bitmap layer.
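
The recursion along the chain of layers can be sketched like this. The
struct is a hypothetical stand-in for the bitmap index, and commits are
represented by small integers instead of object IDs.

```c
#include <stddef.h>

/* Hypothetical stand-in for one bitmap layer and its link to the base. */
struct bitmap_layer {
    const struct bitmap_layer *base; /* previous layer, or NULL */
    const unsigned *commits;         /* commits with a bitmap in this layer */
    size_t nr;
};

/* Find the layer that holds a bitmap for `commit`, recursing on the base. */
static const struct bitmap_layer *
layer_for_commit(const struct bitmap_layer *layer, unsigned commit)
{
    if (!layer)
        return NULL; /* ran off the end of the chain: lookup failed */

    for (size_t i = 0; i < layer->nr; i++)
        if (layer->commits[i] == commit)
            return layer;

    /* Where a single-layer lookup would give up, try the previous layer. */
    return layer_for_commit(layer->base, commit);
}
```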

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-bitmap.c: open and store incremental bitmap layers

Prepare the pack-bitmap machinery to work with incremental MIDXs by
adding a new "base" field to keep track of the bitmap index associated
with the previous MIDX layer.

The changes in this commit are mostly boilerplate to open the correct
bitmap(s), add them to the chain of bitmap layers along the "base"
pointer, ensure that the correct packs and their reverse indexes are
loaded across MIDX layers, etc.

While we're at it, keep track of a base_nr field to indicate how many
bitmap layers (including the current bitmap) exist. This will be used in
a future commit to allocate an array of 'struct ewah_bitmap' pointers to
collect all of the respective type bitmaps among all layers to
initialize a multi-EWAH iterator.

Subsequent commits will teach the functions within the pack-bitmap
machinery how to interact with these new fields.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-revindex: prepare for incremental MIDX bitmaps

Prepare the reverse index machinery to handle object lookups in an
incremental MIDX bitmap. These changes are broken out across a few
functions:

  - load_midx_revindex() learns to use the appropriate MIDX filename
    depending on whether the given 'struct multi_pack_index *' is
    incremental or not.

  - pack_pos_to_midx() and midx_to_pack_pos() now both take in a global
    object position in the MIDX pseudo-pack order, and find the
    earliest containing MIDX (similar to midx.c::midx_for_object()).

  - midx_pack_order_cmp() adjusts its call to pack_pos_to_midx() by the
    number of objects in the base (since 'vb - midx->revindex_data' is
    relative to the containing MIDX, and pack_pos_to_midx() expects a
    global position).

    Likewise, this function adjusts its output by adding
    m->num_objects_in_base to return a global position out through the
    `*pos` pointer.
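
The relationship between global and layer-relative positions can be
sketched as below. The struct is a simplified, hypothetical stand-in
for 'struct multi_pack_index' that keeps only the object counts.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one MIDX layer in an incremental chain. */
struct midx_layer {
    const struct midx_layer *base;
    uint32_t num_objects;         /* objects in this layer only */
    uint32_t num_objects_in_base; /* objects in all earlier layers */
};

/*
 * Find the layer containing `global_pos` and store the position relative
 * to that layer in *local_pos. Returns NULL if the position is out of range.
 */
static const struct midx_layer *
layer_for_global_pos(const struct midx_layer *m, uint32_t global_pos,
                     uint32_t *local_pos)
{
    while (m && global_pos < m->num_objects_in_base)
        m = m->base;
    if (!m || global_pos >= m->num_objects_in_base + m->num_objects)
        return NULL;
    *local_pos = global_pos - m->num_objects_in_base;
    return m;
}

/* Convert a layer-relative position back into a global one. */
static uint32_t to_global_pos(const struct midx_layer *m, uint32_t local_pos)
{
    return local_pos + m->num_objects_in_base;
}
```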

Together, these changes are sufficient to use the multi-pack index's
reverse index format for incremental multi-pack reachability bitmaps.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Documentation: describe incremental MIDX bitmaps

Prepare to implement support for reachability bitmaps for the new
incremental multi-pack index (MIDX) feature over the following commits.

This commit begins by first describing the relevant format and usage
details for incremental MIDX bitmaps.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Documentation: remove a "future work" item from the MIDX docs

One of the items listed as "future work" in the MIDX's technical
documentation is to extend the format to allow MIDXs to be written
incrementally across multiple layers.

This was suggested all the way back in ceab693d1f (multi-pack-index: add
design document, 2018-07-12), and implemented in b9497848df (Merge
branch 'tb/incremental-midx-part-1', 2024-08-19). Let's remove it
accordingly.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

compat/mingw: fix EACCES when opening files with `O_CREAT | O_EXCL`

In our CI systems we can observe that t0610 fails rather frequently.
This testcase races a bunch of git-update-ref(1) processes with one
another which are all trying to update a unique reference, where we
expect that all processes succeed and end up updating the reftable
stack. The error message in this case looks like the following:

    fatal: update_ref failed for ref 'refs/heads/branch-88': reftable: transaction prepare: I/O error

Instrumenting the code with a couple of calls to `BUG()` in relevant
sites where we return `REFTABLE_IO_ERROR` quickly leads one to discover
that this error is caused when calling `flock_acquire()`, which is a
thin wrapper around our lockfile API. Curiously, the error code we get
in such cases is `EACCES`, indicating that we are not allowed to access
the file.

The root cause of this is an oddity of `CreateFileW()`, which is what
`_wopen()` uses internally. Quoting its documentation [1]:

    If you call CreateFile on a file that is pending deletion as a
    result of a previous call to DeleteFile, the function fails. The
    operating system delays file deletion until all handles to the file
    are closed. GetLastError returns ERROR_ACCESS_DENIED.

This behaviour is triggered quite often in the above testcase because
all the processes race with one another trying to acquire the lock for
the "tables.list" file. This is due to how locking works in the reftable
library when compacting a stack:

    1. Lock the "tables.list" file and read its contents.

    2. Decide which tables to compact.

    3. Lock each of the individual tables that we are about to compact.

    4. Unlock the "tables.list" file.

    5. Compact the individual tables into one large table.

    6. Re-lock the "tables.list" file.

    7. Write the new list of tables into it.

    8. Commit the "tables.list" file.

The important step is (4): we don't commit the file directly by renaming
it into place, but instead we delete the lockfile so that concurrent
processes can continue to append to the reftable stack while we compact
the tables. And because we use `DeleteFileW()` to do so, we may now race
with another process that wants to acquire that lockfile. So if we are
unlucky, we would now see `ERROR_ACCESS_DENIED` instead of the expected
`ERROR_FILE_EXISTS`, which the lockfile subsystem isn't prepared to
handle and thus it will bail out without retrying to acquire the lock.

In theory, the issue is not limited to the reftable library and can be
triggered by every other user of the lockfile subsystem, as well. My gut
feeling tells me it's rather unlikely to surface elsewhere though.

Fix the issue by translating the error to `EEXIST`. This makes the
lockfile subsystem handle the error correctly: in case a timeout is set
it will now retry acquiring the lockfile until the timeout has expired.
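
The effect of the translation on a retrying caller can be sketched in
portable C. This is a simplified model and not the code in
compat/mingw.c: the function names are hypothetical, the retry loop
counts attempts instead of honoring a timeout, and the real fix may
check further conditions before rewriting the error.

```c
#include <errno.h>
#include <fcntl.h>

/* Map EACCES to EEXIST, but only for exclusive-create opens. */
static int translate_create_errno(int oflags, int err)
{
    int exclusive = (oflags & (O_CREAT | O_EXCL)) == (O_CREAT | O_EXCL);

    if (exclusive && err == EACCES)
        return EEXIST;
    return err;
}

/* Callback that tries to create the lockfile; returns 0 or an errno value. */
typedef int (*try_create_fn)(void *ctx);

/* Retry while the (translated) error says "lockfile exists". */
static int lock_with_retries(try_create_fn try_create, void *ctx,
                             int oflags, int max_attempts)
{
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        int err = translate_create_errno(oflags, try_create(ctx));
        if (!err)
            return 0;
        if (err != EEXIST)
            return err; /* unexpected error: bail out without retrying */
    }
    return EEXIST;
}

/* Stand-in for a file that is pending deletion for `*ctx` more attempts. */
static int fake_create(void *ctx)
{
    int *pending = ctx;

    if (*pending > 0) {
        (*pending)--;
        return EACCES;
    }
    return 0;
}
```

Without the translation, the first EACCES would end the loop
immediately, which mirrors the failure described above.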

With this, t0610 is now always passing on my machine whereas it was
previously failing in around 20-30% of all test runs.

[1]: https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilew

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

meson: fix compat sources when compiling with MSVC

In our compat library we have both "msvc.c" and "mingw.c". The former is
mostly a thin wrapper around the latter as it directly includes it, but
it has a couple of extra headers that aren't included in "mingw.c" and
is expected to be used with the Visual Studio compiler toolchain.

While our Makefile knows to pick up the correct file depending on
whether or not the Visual Studio toolchain is used, we don't do the same
with Meson. Fix this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/fetch: avoid aborting closed reference transaction

As part of the reference transaction commit phase, the transaction is
set to a closed state regardless of whether it was successful or not.
Attempting to abort a closed transaction via `ref_transaction_abort()`
results in a `BUG()`.

In c92abe71df (builtin/fetch: fix leaking transaction with `--atomic`,
2024-08-22), logic to free a transaction after the commit phase is moved
to the centralized exit path. In cases where the transaction commit
failed, this results in a closed transaction being aborted and signaling
a bug.

Free the transaction and set it to NULL when the commit fails. This
allows the exit path to correctly handle the error without attempting to
abort the transaction.
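
The pattern can be sketched with a toy transaction type. Everything
below is hypothetical and only models the control flow: a counter
stands in for `BUG()`, and the functions are not the real ref
transaction API.

```c
#include <stdlib.h>

enum txn_state { TXN_OPEN, TXN_CLOSED };

struct txn {
    enum txn_state state;
};

static int bug_count; /* stand-in for BUG(): counts aborts of closed txns */

static struct txn *txn_begin(void)
{
    struct txn *t = malloc(sizeof(*t));
    if (t)
        t->state = TXN_OPEN;
    return t;
}

/* Committing closes the transaction whether or not it succeeds. */
static int txn_commit(struct txn *t, int fail)
{
    t->state = TXN_CLOSED;
    return fail ? -1 : 0;
}

static void txn_free(struct txn *t)
{
    free(t);
}

static void txn_abort(struct txn *t)
{
    if (t->state == TXN_CLOSED)
        bug_count++;
    free(t);
}

static int update_refs(int prepare_fails, int commit_fails)
{
    struct txn *t = txn_begin();
    int ret = -1;

    if (!t || prepare_fails)
        goto cleanup; /* the transaction (if any) is still open here */

    ret = txn_commit(t, commit_fails);
    if (ret) {
        /* Closed by the failed commit: free it and clear the pointer. */
        txn_free(t);
        t = NULL;
        goto cleanup;
    }
    txn_free(t);
    t = NULL;

cleanup:
    if (t)
        txn_abort(t); /* only ever reached with an open transaction */
    return ret;
}
```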

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

repack: begin combining cruft packs with `--combine-cruft-below-size`

The previous commit changed the behavior of repack's '--max-cruft-size'
to specify a cruft pack-specific override for '--max-pack-size'.

Introduce a new flag, '--combine-cruft-below-size' which is a
replacement for the old behavior of '--max-cruft-size'. This new flag
does explicitly what it says: it combines together cruft packs which are
smaller than a given threshold, and leaves alone ones which are
larger.

This accomplishes the original intent of '--max-cruft-size', which was
to avoid repacking cruft packs larger than the given threshold.

The new behavior is slightly different. Instead of building up small
packs together until the threshold is met, '--combine-cruft-below-size'
packs up *all* cruft packs smaller than the threshold. This means that
we may make a pack much larger than the given threshold (e.g., if you
aggregate 5 packs which are each 99 MiB in size with a threshold of 100
MiB).
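
The selection rule can be sketched as follows. The function name is
hypothetical and sizes are plain integers (think MiB); the point is
only that every pack below the threshold is picked, regardless of the
combined total.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Pick every cruft pack strictly smaller than `threshold`. Returns the
 * number of packs picked and stores their combined size in *combined.
 */
static size_t select_cruft_below(const uint64_t *sizes, size_t nr,
                                 uint64_t threshold, uint64_t *combined)
{
    size_t picked = 0;

    *combined = 0;
    for (size_t i = 0; i < nr; i++) {
        if (sizes[i] < threshold) {
            *combined += sizes[i];
            picked++;
        }
    }
    return picked;
}
```

With five packs of 99 and one of 150 against a threshold of 100, this
picks the five small packs for a combined size of 495, well above the
threshold, and leaves the large pack alone.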

But that's OK: the point isn't to restrict the size of the cruft packs
we generate, it's to avoid working with ones that have already grown too
large. If repositories still want to limit the size of the generated
cruft pack(s), they may use '--max-cruft-size'.

There's some minor test fallout as a result of the slight differences in
behavior between the old meaning of '--max-cruft-size' and the behavior
of '--combine-cruft-below-size'. In the test which is now called
"--combine-cruft-below-size combines packs", we need to use the new flag
over the old one to exercise that test's intended behavior. The
remainder of the changes there are to improve the clarity of the
comments.

Suggested-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

repack: avoid combining cruft packs with `--max-cruft-size`

In 37dc6d8104 (builtin/repack.c: implement support for
`--max-cruft-size`, 2023-10-02), we exposed new functionality that
allowed repositories to specify the behavior of when we should combine
multiple cruft packs together.

This feature was designed to ensure that we never repacked cruft packs
which were larger than the given threshold in order to provide tighter
I/O bounds for repositories that have many unreachable objects. In
essence, specifying '--max-cruft-size=N' instructed 'repack' to
aggregate cruft packs together (in order of ascending size) until the
combined size grows past 'N', and then make a new cruft pack whose
contents include the packs we rolled up.

But this isn't quite how it works in practice. Suppose for example that
we have two cruft packs which are each 100MiB in size. One might expect
specifying "--max-cruft-size=200M" would combine these two packs
together, and then avoid repacking them until a pruning GC takes place.
In reality, 'repack' would try to aggregate these together, but would
write a pack that is strictly smaller than 200 MiB (since pack-objects'
"--max-pack-size" provides a strict bound for packs containing more than
one object).

So instead we'll write out a pack that is, say, 199 MiB in size, and
then another 1 MiB pack containing the balance. If we later repack the
repository without adding any new unreachable objects, we'll repeat the
same exercise again, making the same 199 MiB and 1 MiB packs each time.

This happens because of a poor choice to bolt the '--max-cruft-size'
functionality onto pack-objects' '--max-pack-size', forcing us to
generate packs which are always smaller than the provided threshold and
thus subject to repacking.

The following commit will introduce a new flag that implements something
similar to the behavior above. Let's prepare for that by making repack's
'--max-cruft-size' flag behave as a cruft pack-specific override for
'--max-pack-size'.

Do so by temporarily repurposing the 'collapse_small_cruft_packs()'
function to instead generate a cruft pack using the same instructions as
if we didn't specify any maximum pack size. The calling code looks
something like:

    if (args->max_pack_size && !cruft_expiration) {
        collapse_small_cruft_packs(in, args->max_pack_size, existing);
    } else {
        for_each_string_list_item(item, &existing->non_kept_packs)
            fprintf(in, "-%s.pack\n", item->string);
        for_each_string_list_item(item, &existing->cruft_packs)
            fprintf(in, "-%s.pack\n", item->string);
    }

This patch makes collapse_small_cruft_packs() behave identically to the
'else' arm of the conditional above. This repurposing of
'collapse_small_cruft_packs()' is intentional, since it will set us up
nicely to introduce the new behavior in the following commit.

Naturally, there is some test fallout in the test which exercises the
old meaning of '--max-cruft-size'. Mark that test as failing for now to
be dealt with in the following commit. Likewise, add a new test which
explicitly tests the behavior of '--max-cruft-size' to place a hard
limit on the size of any generated cruft pack(s).

Note that this is a breaking change, as it alters the user-visible
behavior of '--max-cruft-size'. But I'm OK changing this behavior in
this instance, since the behavior wasn't accurate to begin with.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t/t7704-repack-cruft.sh: consolidate `write_blob()`

A previous commit moved a handful of tests from a different script into
t7704, including one that relies on generating random blobs.

Incidentally, the original home of this test defined its own helper
"write_blob" for doing so, which is identical in function to our
"generate_random_blob" (and is slightly inferior to the latter, which
cleans up after itself).

Rewrite the test that uses "write_blob" to no longer do so and then
remove the function.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t/t7704-repack-cruft.sh: clarify wording in --max-cruft-size tests

Now that a number of new tests have landed in t7704, make sure that they
all make sense and are testing the things they say they are.

Things are mostly OK, but a handful of tests needed tweaks. Those tweaks
are as follows:

  - Use the terms "too large" or "too small" in tests that exercise the
    '--max-cruft-size' behavior. This has historically been treated as a
    threshold beneath which to combine cruft packs, but that will change
    in a subsequent commit. Prepare for that by using a more generic
    term.

  - Remove references to "--max-cruft-size" in the freshening tests.
    These tests provide coverage of our ability to record updated mtimes
    for objects already in cruft packs whose mtimes are upserted from
    various sources (loose objects, finding that object in a new pack,
    another cruft pack, etc.).

    These have nothing to do with the '--max-cruft-size' feature, and in
    fact none of the tests even *use* '--max-cruft-size'. Name them
    appropriately to make it clear that these tests exercise freshening
    behavior, not '--max-cruft-size' behavior.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests

The cruft pack feature has two primary test scripts which exercise
various parts of it, which are:

- t5329-pack-objects-cruft.sh
- t7704-repack-cruft.sh

The former is designed to test low-level pack generation mechanics at
the 'git pack-objects --cruft'-level, which is plumbing. The latter, on
the other hand, is designed to test the user-facing behavior through
'git repack --cruft', which is porcelain (under the "ancillary
manipulators" sub-section).

At some point a handful of tests which should have been added to the
latter script were instead written to the former. This isn't a huge
deal, but rectifying it is straightforward. Move a handful of
'repack'-related tests out of t5329 and into their rightful home in
t7704.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

rev-list: support NUL-delimited --missing option

The `--missing={print,print-info}` option for git-rev-list(1) prints
missing objects found while performing the object walk in the form:

        $ git rev-list --missing=print-info <rev>
        ?<oid> [SP <token>=<value>]... LF

Add support for printing missing objects in a NUL-delimited format when
the `-z` option is enabled.

        $ git rev-list -z --missing=print-info <rev>
        <oid> NUL missing=yes NUL [<token>=<value> NUL]...

In this mode, values containing special characters or spaces are printed
as-is without being escaped or quoted. Instead of prefixing the missing
OID with '?', a separate `missing=yes` token/value pair is appended.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

rev-list: support NUL-delimited --boundary option

The `--boundary` option for git-rev-list(1) prints boundary objects
found while performing the object walk in the form:

        $ git rev-list --boundary <rev>
        -<oid> LF

Add support for printing boundary objects in a NUL-delimited format when
the `-z` option is enabled.

        $ git rev-list -z --boundary <rev>
        <oid> NUL boundary=yes NUL

In this mode, instead of prefixing the boundary OID with '-', a separate
`boundary=yes` token/value pair is appended.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

rev-list: support delimiting objects with NUL bytes

When walking objects, git-rev-list(1) prints each object entry on a
separate line. Some options, such as `--objects`, may print additional
information about tree and blob objects on the same line in the form:

        $ git rev-list --objects <rev>
        <tree/blob oid> SP [<path>] LF

Note that in this form the SP is appended regardless of whether the tree
or blob object has path information available. Paths containing a
newline are also truncated at the newline.

Introduce the `-z` option for git-rev-list(1) which reformats the output
to use NUL-delimiters between objects and associated info in the
following form:

        $ git rev-list -z --objects <rev>
        <oid> NUL [path=<path> NUL]

In this form, the start of each record is signaled by an OID entry that
is all hexadecimal and does not contain any '='. Additional path info
from `--objects` is appended to the record as a token/value pair
`path=<path>` as-is without any truncation.
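
A consumer of this format could split records along these lines. This
is a minimal sketch with hypothetical names: it only counts records and
token/value pairs, using the rule that a token without '=' starts a new
record.

```c
#include <stddef.h>
#include <string.h>

struct z_stats {
    size_t records; /* tokens without '=' (object IDs) */
    size_t attrs;   /* <token>=<value> pairs */
};

/* Scan a NUL-delimited buffer of `len` bytes and classify each token. */
static struct z_stats scan_z_output(const char *buf, size_t len)
{
    struct z_stats st = { 0, 0 };
    size_t pos = 0;

    while (pos < len) {
        const char *tok = buf + pos;
        const char *end = memchr(tok, '\0', len - pos);
        size_t toklen = end ? (size_t)(end - tok) : len - pos;

        if (memchr(tok, '=', toklen))
            st.attrs++;   /* e.g. "path=<path>", taken as-is */
        else if (toklen)
            st.records++; /* an OID: starts a new record */

        pos += toklen + 1; /* skip the token and its NUL */
    }
    return st;
}
```

Because values are not quoted, a path containing a space or a newline
arrives intact inside its `path=` token.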

For now, the `--objects` flag is the only option that can be used in
combination with `-z`. In a subsequent commit, NUL-delimited support for
other options is added. Other options that do not make sense when used
in combination with `-z` are rejected.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

rev-list: refactor early option parsing

Before invoking `setup_revisions()`, the `--missing` and
`--exclude-promisor-objects` options are parsed early. In a subsequent
commit, another option is added that must be parsed early.

Refactor the code to parse both options in a single early pass.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

rev-list: inline `show_object_with_name()` in `show_object()`

The `show_object_with_name()` function only has a single call site.
Inline the call to `show_object_with_name()` in `show_object()` so the
explicit function can be cleaned up and live closer to where it is used.
While at it, factor out the code that prints the OID and newline for
both objects with and without a name. In a subsequent commit,
`show_object()` is modified to support printing object information in a
NUL-delimited format.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

treewide: replace assert() with ASSERT() in special cases

When the compiler/linker cannot verify that an assert() invocation is
free of side effects for us (e.g. because the assertion includes some
kind of function call), replace the use of assert() with ASSERT().
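
The difference matters because an always-evaluated check keeps any
function call in the condition even in builds where assert() is
compiled out. A minimal sketch of such a macro follows; this definition
and the helper functions are illustrative assumptions, not necessarily
how ASSERT() is defined in the tree.

```c
#include <stdio.h>
#include <stdlib.h>

/* Illustrative always-on assertion: the condition is evaluated in every build. */
#define ASSERT(cond) \
    do { \
        if (!(cond)) { \
            fprintf(stderr, "Assertion `%s' failed.\n", #cond); \
            abort(); \
        } \
    } while (0)

static int lookups; /* counts calls, i.e. the observable side effect */

/* Hypothetical lookup whose call the compiler cannot prove side-effect free. */
static int contains(const int *set, size_t n, int value)
{
    lookups++;
    for (size_t i = 0; i < n; i++)
        if (set[i] == value)
            return 1;
    return 0;
}

/* Returns how many lookups ran; this stays 1 even when NDEBUG is defined. */
static int checked_lookups(void)
{
    static const int set[] = { 1, 2, 3 };

    lookups = 0;
    ASSERT(contains(set, 3, 2));
    return lookups;
}
```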

Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: add build checking for side-effects in assert() calls

It is a big no-no to have side-effects in an assertion, because if the
assert() is compiled out, you don't get that side-effect, leading to the
code behaving differently.  That can be a large headache to debug.

We have roughly 566 assert() calls in our codebase (my grep might have
picked up things that aren't actually assert() calls, but most appeared
to be).  All but 9 of them can be determined by gcc to be free of side
effects with a clever redefine of assert() provided by Bruno De Fraine
(from
https://stackoverflow.com/questions/10593492/catching-assert-with-side-effects),
who upon request has graciously placed his two-liner into the public
domain without warranty of any kind.  The current 9 assert() calls
flagged by this clever redefinition of assert() appear to me to be free
of side effects as well, but are too complicated for a compiler/linker
to figure that out, since each assertion involves some kind of function
call.
Add a CI job which will find and report these possibly problematic
assertions, and have the job suggest to the user that they replace these
with ASSERT() calls.

Example output from running:

```
ERROR: The compiler could not verify the following assert()
       calls are free of side-effects.  Please replace with
       ASSERT() calls.
/home/newren/floss/git/diffcore-rename.c:1409
assert(!dir_rename_count || strmap_empty(dir_rename_count));
/home/newren/floss/git/merge-ort.c:1645
assert(renames->deferred[side].trivial_merges_okay &&
       !strset_contains(&renames->deferred[side].target_dirs,
path));
/home/newren/floss/git/merge-ort.c:794
assert(omittable_hint ==
       (!starts_with(type_short_descriptions[type], "CONFLICT") &&
!starts_with(type_short_descriptions[type], "ERROR")) ||
       type == CONFLICT_DIR_RENAME_SUGGESTED);
/home/newren/floss/git/merge-recursive.c:1200
assert(!merge_remote_util(commit));
/home/newren/floss/git/object-file.c:2709
assert(would_convert_to_git_filter_fd(istate, path));
/home/newren/floss/git/parallel-checkout.c:280
assert(is_eligible_for_parallel_checkout(pc_item->ce, &pc_item->ca));
/home/newren/floss/git/scalar.c:244
assert(have_fsmonitor_support());
/home/newren/floss/git/scalar.c:254
assert(have_fsmonitor_support());
/home/newren/floss/git/sequencer.c:4968
assert(!(opts->signoff || opts->no_commit ||
opts->record_origin || should_edit(opts) ||
opts->committer_date_is_author_date ||
opts->ignore_date));
```

Note that if there are possibly problematic assertions, not necessarily
all of them will be shown in a single run, because the compiler errors
may include something like "ld: ... more undefined references to
`not_supposed_to_survive' follow" instead of listing each individually.
But in such cases, once you clean up a few that are shown in your first
run, subsequent runs will show (some of) the ones that remain, allowing
you to iteratively remove them all.

Helped-by: Bruno De Fraine <defraine@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

git-compat-util: introduce ASSERT() macro

Create an ASSERT() macro which is similar to assert(), but will not be
compiled out when NDEBUG is defined, and is thus safe to use even if its
argument has side-effects.

We will use this new macro in a subsequent commit to convert a few
existing assert() invocations to ASSERT(). In particular, we'll
convert the handful of invocations which cannot be proven to be free of
side effects with a simple compiler/linker hack.
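
A minimal sketch of the idea, not the definition used in Git: the macro
below never consults NDEBUG, so its argument is always evaluated. The
fprintf()/abort() reporting and the helpers `note_call()` and
`calls_after_ASSERT()` are assumptions made up for this illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Always evaluates 'expr', whether or not NDEBUG is defined.
 * fprintf()/abort() are stand-ins for whatever reporting the real
 * macro uses.
 */
#define ASSERT(expr) \
	do { \
		if (!(expr)) { \
			fprintf(stderr, "Assertion `%s' failed.\n", #expr); \
			abort(); \
		} \
	} while (0)

/* Hypothetical helper with a side effect: counts how often it ran. */
int note_call(int *calls)
{
	(*calls)++;
	return 1;
}

/* Returns how many times the asserted expression was evaluated. */
int calls_after_ASSERT(void)
{
	int calls = 0;

	ASSERT(note_call(&calls));
	return calls;
}
```

With a plain assert() and NDEBUG defined, `note_call()` would never run
and the count would stay at zero; with this ASSERT() it is incremented
regardless of build flags.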

Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable: adapt write_object_record() to propagate block_writer_add() errors

Previously, write_object_record() would flush the current block and retry
appending the record whenever block_writer_add() returned any nonzero
error. This forced an assumption that every failure meant the block was
full, even when errors such as memory allocation or I/O failures occurred.

Update write_object_record() to inspect the error code returned by
block_writer_add() and flush and reinitialize the writer iff the
error is REFTABLE_ENTRY_TOO_BIG_ERROR. For any other error, immediately
propagate it.

If the flush and reinitialization still fail with
REFTABLE_ENTRY_TOO_BIG_ERROR, reset the record's offset length to zero
before a final attempt.

All call sites now handle various error codes returned by
block_writer_add().

Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable: adapt writer_add_record() to propagate block_writer_add() errors

Previously, writer_add_record() would flush the current block and retry
appending the record whenever block_writer_add() returned any nonzero
error. This forced an assumption that every failure meant the block was
full, even when errors such as memory allocation or I/O failures occurred.

Update writer_add_record() to inspect the error code returned by
block_writer_add() and only flush and reinitialize the writer when the
error is REFTABLE_ENTRY_TOO_BIG_ERROR. For any other error, immediately
propagate it.
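
The control flow described above can be sketched roughly as follows. All
names here (`sketch_block`, `block_add()`, `block_flush()`,
`add_record()`) and the error values are hypothetical stand-ins, not the
reftable API; in particular, the sketch assumes a flush cannot fail,
which the real code does not.

```c
#include <stddef.h>

/* Hypothetical error codes; the real ones live in the reftable headers. */
enum {
	SKETCH_OK = 0,
	SKETCH_ENTRY_TOO_BIG = -1,
	SKETCH_OUT_OF_MEMORY = -2
};

/* Hypothetical block whose capacity is measured in records. */
struct sketch_block {
	size_t used;
	size_t capacity;
	int fail_with; /* if nonzero, block_add() returns this error */
	int flushes;
};

int block_add(struct sketch_block *b)
{
	if (b->fail_with)
		return b->fail_with;
	if (b->used >= b->capacity)
		return SKETCH_ENTRY_TOO_BIG;
	b->used++;
	return SKETCH_OK;
}

void block_flush(struct sketch_block *b)
{
	b->used = 0;
	b->flushes++;
}

int add_record(struct sketch_block *b)
{
	int err = block_add(b);

	/* Success, or an error that starting a new block cannot fix. */
	if (err != SKETCH_ENTRY_TOO_BIG)
		return err;

	/* Only a full block justifies flushing and retrying. */
	block_flush(b);
	return block_add(b);
}
```

The point of the pattern is that an allocation failure is returned to
the caller on the first attempt instead of being mistaken for a full
block and triggering a pointless flush.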

Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable: propagate specific error codes in block_writer_add()

Previously, block_writer_add() and related functions returned
-1 when the record did not fit, forcing the caller to assume that any
failure meant the entry was too big. Replace these generic -1 returns
with defined error codes.

This prepares the codebase for finer-grained error handling so that
callers can distinguish between a block-full condition and other errors.

Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pseudo-merge.h: fix a typo

The comment added in 7252d9a036 (pseudo-merge: implement support for
finding existing merges, 2024-05-23) misspells 'bitmap' as 'bitamp'.

Correct that so that we no longer have any stray "bitamps" lurking
throughout the tree:

$ git grep -ci bitamp | wc -l
0

Noticed-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refspec: replace `refspec_item_init()` with fetch/push variants

For similar reasons as in the previous refactoring of `refspec_init()`
into `refspec_init_fetch()` and `refspec_init_push()`, apply the same
refactoring to `refspec_item_init()`.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refspec: remove refspec_item_init_or_die()

There are two callers of this function, which ensures that a dispatched
call to refspec_item_init() does not fail.

In the following commit, we're going to add fetch/push-specific variants
of refspec_item_init(), which will turn one function into two. To avoid
introducing yet another pair of new functions (such as
refspec_item_init_push_or_die() and refspec_item_init_fetch_or_die()),
let's remove the thin wrapper entirely.

This duplicates a single line of code among two callers, but thins the
refspec.h API by one function, and prevents introducing two more in the
following commit.

Note that we still have a trailing Boolean argument in the function
`refspec_item_init()`. The following commit will address this.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refspec: replace `refspec_init()` with fetch/push variants

To avoid having a Boolean argument in the refspec_init() function,
replace it with two variants:

- `refspec_init_fetch()`
- `refspec_init_push()`

to codify the meaning of that Boolean into the function's name itself.
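
A sketch of the shape of such a split, using a pared-down, hypothetical
`struct sketch_refspec` rather than Git's real `struct refspec`: the
Boolean survives only inside a file-local helper, while callers pick a
function whose name says what they mean.

```c
/* Hypothetical, pared-down stand-in for Git's 'struct refspec'. */
struct sketch_refspec {
	int nr;
	unsigned fetch : 1;
};

/* File-local initializer that still takes the Boolean. */
static void sketch_refspec_init(struct sketch_refspec *rs, int fetch)
{
	rs->nr = 0;
	rs->fetch = !!fetch; /* normalize to 0 or 1 for the 1-bit field */
}

void sketch_refspec_init_fetch(struct sketch_refspec *rs)
{
	sketch_refspec_init(rs, 1);
}

void sketch_refspec_init_push(struct sketch_refspec *rs)
{
	sketch_refspec_init(rs, 0);
}
```

A call site then reads `sketch_refspec_init_push(&rs)` instead of
`sketch_refspec_init(&rs, 0)`, so the reader no longer has to look up
what the trailing argument means.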

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refspec: treat 'fetch' as a Boolean value

Since 6d4c057859 (refspec: introduce struct refspec, 2018-05-16), we
have macros called REFSPEC_FETCH and REFSPEC_PUSH. This confusingly
suggests that we might introduce other modes in the future, which, while
possible, is highly unlikely.

But these values are treated as a Boolean, and stored in a struct field
called 'fetch'. So the following:

    if (refspec->fetch == REFSPEC_FETCH) { ... }

, and

    if (refspec->fetch) { ... }

are equivalent. Let's avoid renaming the Boolean values "true" and
"false" here and remove the two REFSPEC_ macros mentioned above.

Since this value is truly a Boolean and will only ever take on a value
of 0 or 1, we can declare it as a single bit unsigned field. In
practice this won't shrink the size of 'struct refspec', but it more
clearly indicates the intent.

Note that this introduces some awkwardness like:

    refspec_item_init_or_die(&spec, refspec, 1);

, where it's unclear what the final "1" does. This will be addressed in
the following commits.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'jk/fetch-ref-prefix-cleanup' into tb/refspec-fetch-cleanup

* jk/fetch-ref-prefix-cleanup:
  fetch: use ref prefix list to skip ls-refs
  fetch: avoid ls-refs only to ask for HEAD symref update
  fetch: stop protecting additions to ref-prefix list
  fetch: ask server to advertise HEAD for config-less fetch
  refspec_ref_prefixes(): clean up refspec_item logic
  t5516: beef up exact-oid ref prefixes test
  t5516: drop NEEDSWORK about v2 reachability behavior
  t5516: prefer "oid" to "sha1" in some test titles
  t5702: fix typo in test name

http.c: allow custom TCP keepalive behavior via config

curl supports a few options to control when and how often it should
instruct the OS to send TCP keepalives, like KEEPIDLE, KEEPINTVL, and
KEEPCNT. Until this point, there hasn't been a way for users to change
what values are used for these options, forcing them to rely on curl's
defaults.

But we do unconditionally enable TCP keepalives without giving users an
ability to tweak any fine-grained parameters. Ordinarily this isn't a
problem, particularly for users that have fast-enough connections,
and/or are talking to a server that has generous or nonexistent
thresholds for killing a connection it hasn't heard from in a while.

But it can present a problem when one or both of those assumptions fail.
For instance, I can reliably get an in-progress clone to be killed from
the remote end when cloning from some forges while using trickle to
limit my clone's bandwidth.

For those users and others who wish to more finely tune the OS's
keepalive behavior, expose configuration and environment variables which
allow setting curl's KEEPIDLE, KEEPINTVL, and KEEPCNT options.

Note that while KEEPIDLE and KEEPINTVL were added in curl 7.25.0,
KEEPCNT was added much more recently in curl 8.9.0. Per f7c094060c
(git-curl-compat: remove check for curl 7.25.0, 2024-10-23), both
KEEPIDLE and KEEPINTVL are set unconditionally. But since we may be
compiled with a curl that isn't as new as 8.9.0, only set KEEPCNT when
we have CURLOPT_TCP_KEEPCNT to begin with.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

http.c: inline `set_curl_keepalive()`

At the end of `get_curl_handle()` we call `set_curl_keepalive()` to
enable TCP keepalive probes on our CURL handle. `set_curl_keepalive()`
dates back to 47ce115370 (http: use curl's tcp keepalive if available,
2013-10-14), which conditionally compiled different variants of
`set_curl_keepalive()` depending on what version of curl we were
compiled with[1].

As of f7c094060c (git-curl-compat: remove check for curl 7.25.0,
2024-10-23), we no longer conditionally compile `set_curl_keepalive()`
since we no longer support pre-7.25.0 versions of curl. But the version
of that function that we kept is really just a thin wrapper around
setting the TCP_KEEPALIVE option, so there's no reason to keep it in its
own function.

Inline the definition of `set_curl_keepalive()` to within
`get_curl_handle()` so that the setup of our CURL handle is
self-contained.

[1]: The details are spelled out in 47ce115370, but the gist is curl
  7.25.0 and newer use CURLOPT_TCP_KEEPALIVE, older versions use
  CURLOPT_SOCKOPTFUNCTION with a custom callback, and older versions
  that predate even that option do nothing.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

http.c: introduce `set_long_from_env()` for convenience

In 7059cd99fc (http_init(): Fix config file parsing, 2009-03-09), http.c
gained a new "set_from_env()" function as a convenience function around
conditionally assigning an environment variable to some variable if and
only if the environment variable was set to begin with.

But prior to 7059cd99fc, there were two spots which need to first
strtol() whatever is set in the environment before assigning it to a
long pointer. Both instances stored the result of getenv() in a
temporary variable, and conditionally strtol() it depending on whether
or not getenv() returned NULL.

Replace those two instances with a new cousin of 'set_from_env()' called
'set_long_from_env()', which does what its name suggests. This allows us
to remove the temporary variables and clean up some minor code
duplication while also adding more robust error handling.

More importantly, however, it prepares us for a future commit which will
introduce more instances of assigning an environment variable to a long.
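
For illustration only, a helper of this kind could look roughly like the
sketch below. It is not the function added to http.c: the names
`sketch_parse_long()` and `sketch_set_long_from_env()`, the warning text,
and the choice to leave the target untouched on a parse error are all
assumptions.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse 'val' as a base-10 long. Returns 0 and stores the result in
 * '*out' on success; returns -1 and leaves '*out' alone if the string
 * is empty, has trailing garbage, or is out of range.
 */
int sketch_parse_long(const char *val, long *out)
{
	char *end;
	long parsed;

	errno = 0;
	parsed = strtol(val, &end, 10);
	if (errno || end == val || *end)
		return -1;
	*out = parsed;
	return 0;
}

/*
 * Assign the value of environment variable 'envname' to '*var', but
 * only if the variable is set and parses cleanly.
 */
void sketch_set_long_from_env(long *var, const char *envname)
{
	const char *val = getenv(envname);

	if (!val)
		return;
	if (sketch_parse_long(val, var) < 0)
		fprintf(stderr, "warning: ignoring non-numeric %s\n", envname);
}
```

Splitting the parsing from the getenv() lookup keeps the caller free of
temporary variables and puts the error handling in one place.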

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

http.c: remove unnecessary casts to long

When parsing 'http.lowSpeedLimit' and 'http.lowSpeedTime', we explicitly
cast the result of 'git_config_int()' to a long before assignment. This
cast has been in place since all the way back in 58e60dd203 (Add support
for pushing to a remote repository using HTTP/DAV, 2005-11-02).

But that cast has always been unnecessary, since long is guaranteed to
be at least as wide as int. Let's drop the cast accordingly.

Noticed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci/github: add missing 'CI_JOB_IMAGE' env variable

The CI setups of GitLab and GitHub use a common dependency management
script 'ci/install-dependencies.sh'. The script installs the necessary
packages based on a combination of the "$distro" and "$jobname" env
variables.

The "$distro" variable is derived from the "CI_JOB_IMAGE" env variable
set by the CI configs. In the GitHub CI config, some of the jobs are
missing this variable. For the 'Documentation' job which depends on
'meson' being installed, this raises an error since the 'meson'
dependency is never installed.

Fix this by adding the 'CI_JOB_IMAGE' variable to all missing jobs. We
don't add it to the Windows jobs, since they manage their dependencies as
part of the CI config and no further dependency management is needed.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

doc: apply new format to git-branch man page

- Switch the synopsis to a synopsis block which automatically
formats placeholders in italics and keywords in monospace
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine applies synopsis rules to
these spans.

Possible values for some variables, which were mentioned in the description
prose, are now made into an enumerated list.

Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

completion: take into account the formatting backticks for options

With the modern formatting of the manpages, the options and commands are now
backticked in their definition lists. This patch updates the generation of
the completion list to take into account this new format.

The script `generate-configlist.sh` is updated to get rid of extraneous
commands and fit everything in a single sed script.

Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

index-pack, unpack-objects: restore missing ->init_fn

Commit 0578f1e66a ("global: adapt callers to use generic hash context helpers")
accidentally removed `->init_fn`, which is required for OpenSSL 3+ SHA1.

This fixes the following error on fetch:
fatal: fetch-pack: invalid index-pack output

Signed-off-by: Jensen Huang <hmz007@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

promisor-remote: compare remote names case sensitively

Because the "[remote "nick"] fetch = ..." configuration variables
have the nickname in the second part, the nicknames are case
sensitive, unlike the first and the third component (i.e.
"remote.origin.fetch" and "Remote.origin.FETCH" are the same thing,
but "remote.Origin.fetch" and "remote.origin.fetch" are different).

Let's follow the way Git works in general and compare the remote
names case sensitively when processing advertised remotes.
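
The rule can be illustrated with a small comparison function. This is a
sketch under the assumption that a three-level key is split at its first
and last dot; `same_config_key()` and `equal_ignoring_case()` are
hypothetical names and not functions from Git.

```c
#include <ctype.h>
#include <string.h>

/* Compare 'n' bytes of 'a' and 'b', ignoring ASCII case. */
static int equal_ignoring_case(const char *a, const char *b, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
			return 0;
	return 1;
}

/*
 * Compare two keys of the form "section.subsection.variable". Section
 * and variable ignore case; the subsection (for remotes, the nickname)
 * must match exactly. Returns 1 if both keys name the same setting,
 * 0 otherwise, including for keys with fewer than two dots.
 */
int same_config_key(const char *a, const char *b)
{
	const char *a_first = strchr(a, '.'), *a_last = strrchr(a, '.');
	const char *b_first = strchr(b, '.'), *b_last = strrchr(b, '.');
	size_t section_len, sub_len, var_len;

	if (!a_first || !b_first || a_first == a_last || b_first == b_last)
		return 0;

	section_len = (size_t)(a_first - a);
	sub_len = (size_t)(a_last - a_first); /* includes the leading dot */
	var_len = strlen(a_last);

	if (section_len != (size_t)(b_first - b) ||
	    sub_len != (size_t)(b_last - b_first) ||
	    var_len != strlen(b_last))
		return 0;

	return equal_ignoring_case(a, b, section_len) &&
	       !memcmp(a_first, b_first, sub_len) &&
	       equal_ignoring_case(a_last, b_last, var_len);
}
```

Applied to the examples above, "remote.origin.fetch" and
"Remote.origin.FETCH" compare as the same key, while
"remote.Origin.fetch" and "remote.origin.fetch" do not.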

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

promisor-remote: fix possible issue when no URL is advertised

In the 'KnownUrl' case, in should_accept_remote(), let's check that
`remote_url` is not NULL before we use strcmp() to compare it with
the local URL. This could avoid crashes if a server starts to not
advertise any URL in the future.

If `remote_url` is NULL, we should reject the URL. Let's also warn in
this case because we warn otherwise when a remote is rejected to try
to help diagnose things at the end of the function.

And while we are checking that remote_url is not NULL and warning if
it is, it makes sense to also help diagnose the case where remote_url
is empty.
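
The guard described above amounts to checking the advertised URL before
handing it to strcmp(). The sketch below shows that ordering with a
hypothetical function `sketch_accept_known_url()`; the warning texts are
made up for the example and are not the messages Git prints.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for the 'KnownUrl' comparison: accept only if
 * the advertised URL is present, non-empty, and identical to the local
 * one. Passing NULL to strcmp() is undefined behavior, so the NULL
 * checks must come first.
 */
int sketch_accept_known_url(const char *local_url, const char *remote_url)
{
	if (!remote_url || !*remote_url) {
		fprintf(stderr, "warning: no or empty URL advertised\n");
		return 0;
	}
	if (!local_url || strcmp(local_url, remote_url)) {
		fprintf(stderr, "warning: advertised URL '%s' differs\n",
			remote_url);
		return 0;
	}
	return 1;
}
```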

Also while at it, let's spell "URL" with uppercase letters in all the
warnings.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

promisor-remote: fix segfault when remote URL is missing

Using strvec_push() to push `NULL` into a 'strvec' results in a
segfault, because `xstrdup(NULL)` crashes.

So when a URL is missing from the config, let's not push the remote
name and URL into the 'strvec's.

While at it, let's also not push them in case the URL is empty. It's
just not worth the trouble and it's consistent with how Git otherwise
treats missing and empty URLs in the same way.

Note that in case of missing or empty URL, Git uses the remote name to
fetch, which can work if the remote is on the same filesystem. So
configurations where the client, server and remote are all on the same
filesystem may need URLs to be configured even if they are the same as
the remote names. But this is a rare case, and the workaround is easy
enough.

We leave improving the strvec API and/or xstrdup() for a future
separate effort.

While at it, let's also use git_config_get_string_tmp() instead of
git_config_get_string() to simplify memory management.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t5710: arrange to delete the client before cloning

If `test_when_finished "rm -rf client"` is run after we clone, it
will not run if the clone failed, so the "client" directory might
not be removed at the end of the test.

`git clone` does try to remove the directory when it fails, but
let's be safe and try to protect against possibly weird clone
failures by moving `test_when_finished "rm -rf client"` before
the clone. It just makes more sense this way around.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fetch: don't ask for remote HEAD if followRemoteHEAD is "never"

When we are going to consider updating the refs/remotes/*/HEAD symref,
we have to ask the remote side where its HEAD points. But if we know
that the feature is disabled by config, we don't need to bother!

This saves a little bit of work and network communication for the
server. And even a little bit of effort on the client, as our local
set_head() function did a bit of work matching the remote HEAD before
realizing that we're not going to do anything with it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fetch: only respect followRemoteHEAD with configured refspecs

The new followRemoteHEAD feature is triggered for almost every fetch,
causing us to ask the server about the remote "HEAD" and to consider
updating our local tracking HEAD symref. This patch limits the feature
only to the case when we are fetching a remote using its configured
refspecs (typically into its refs/remotes/ hierarchy). There are two
reasons for this.

One is efficiency. E.g., the fixes in 6c915c3f85 (fetch: do not ask for
HEAD unnecessarily, 2024-12-06) and 20010b8c20 (fetch: avoid ls-refs
only to ask for HEAD symref update, 2025-03-08) were aimed at reducing
the work we do when we would not be able to update HEAD anyway. But they
do not quite cover all cases. The remaining one is:

git fetch origin refs/heads/foo:refs/remotes/origin/foo

which _sometimes_ can update HEAD, but usually not. And that leads us to
the second point, which is being simple and explainable.

The code for updating the tracking HEAD symref requires both that we
learned which ref the remote HEAD points at, and that the server
advertised that ref to us. But because the v2 protocol narrows the
server's advertisement, the command above would not typically update
HEAD at all, unless it happened to point to the "foo" branch. Or even
weirder, it probably _would_ update if the server is very old and
supports only the v0 protocol, which always gives a full advertisement.

This creates confusing behavior for the user: sometimes we may try to
update HEAD and sometimes not, depending on vague rules.

One option here would be to loosen the update code to accept the remote
HEAD even if the server did not advertise that ref. I think that could
work, but it may also lead to interesting corner cases (e.g., creating a
dangling symref locally, even though the branch is not unborn on the
server, if we happen not to have fetched it).

So let's instead simplify the rules: we'll only consider updating the
tracking HEAD symref when we're doing a full fetch of the remote's
configured refs. This is easy to implement; we can just set a flag at
the moment we realize we're using the configured refspecs. And we can
drop the special case code added by 6c915c3f85 and 20010b8c20, since
this covers those cases. The existing tests from those commits still
pass.

In t5505, an incidental call to "git fetch <remote> <refspec>" updated
HEAD, which caused us to adjust the test in 3f763ddf28 (fetch: set
remote/HEAD if it does not exist, 2024-11-22). We can now adjust that
back to how it was before the feature was added.

Even though t5505 is incidentally testing our new desired behavior,
we'll add an explicit test in t5510 to make sure it is covered.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'jk/fetch-ref-prefix-cleanup' into jk/fetch-follow-remote-head-fix

* jk/fetch-ref-prefix-cleanup:
  fetch: use ref prefix list to skip ls-refs
  fetch: avoid ls-refs only to ask for HEAD symref update
  fetch: stop protecting additions to ref-prefix list
  fetch: ask server to advertise HEAD for config-less fetch
  refspec_ref_prefixes(): clean up refspec_item logic
  t5516: beef up exact-oid ref prefixes test
  t5516: drop NEEDSWORK about v2 reachability behavior
  t5516: prefer "oid" to "sha1" in some test titles
  t5702: fix typo in test name

docs: add BreakingChanges to TECH_DOCS target

When BreakingChanges.txt was added in 57ec9254eb9 (docs: introduce
document to announce breaking changes, 2024-06-14) there was no
corresponding change to the Makefile to build it. Fix that by adding it
to the TECH_DOCS target.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-refs doc: fix indentation for --exclude

Separate the paragraphs in the description of `--exclude` with a `+`
rather than an empty line to indent the whole description rather than
just the first paragraph.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

am: switch from merge_recursive_generic() to merge_ort_generic()

Switch from merge-recursive to merge-ort.  Adjust the following
testcases due to the switch:

* t4151: This test left an untracked file in the way of the merge.
  merge-recursive could only sometimes tell when untracked files were
  in the way, and by the time it discovers others, it has already made
  too many changes to back out of the merge.  So, instead of writing the
  results to e.g. 'file1' it would instead write them to
  'file1~branch1'.  This is confusing for users, because they might not
  notice 'file1~branch1' and accidentally add and commit 'file1'.
  In contrast, merge-ort correctly notices the file in the way before
  making any changes and aborts.  Since this test didn't care about the
  file in the way, just remove it before calling git-am.

* t4255: Usage of merge-ort allows us to change two known failures into
  successes.

* t6427: As noted a few commits ago, the choice of conflict label for
  diff3 markers for the ancestor commit was previously handled by
  merge-recursive.c rather than by callers.  Since that has now changed,
  `git am` needs to specify that label.  Although the previous conflict
  label ("constructed merge base") was already somewhat slanted
  towards `git am`, let's use wording more along the lines of the
  related command-line flag from `git apply` and function involved to
  tie it more closely to `git am`.

Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-ort: fix merge.directoryRenames=false

There are two issues here.

First, when merge.directoryRenames is set to false, there are a few code
paths that should be turned off.  I missed one; collect_renames() was
still doing some directory rename detection logic unconditionally.  It
ended up not having much effect because
get_provisional_directory_renames() was skipped earlier and not setting
up renames->dir_renames, but the code should still be skipped.

Second, the larger issue is that sometimes we get a cached_pair rename
from a previous commit being replayed mapping A->B, but in a subsequent
commit collect_merge_info() doesn't even recurse into the
directory containing B because there are no source pairings for that
rename that are relevant; we can merge that commit fine without knowing
the rename.  But since the cached renames are added to the normal
renames, when we go to process it and find that B is not part of
opt->priv->paths, we hit the assertion error
  process_renames: Assertion `newinfo && !newinfo->merged.clean` failed.
I think we could fix this at the beginning of detect_regular_renames() by
pruning from cached_pairs any entry whose destination isn't in
opt->priv->paths, but it's suboptimal in that we'd kind of like the
cached_pair to be restored afterwards so that it can help the subsequent
commit, but more importantly since it sits at the intersection of
the caching renames optimization and the relevant renames optimization,
and the trivial directory resolution optimization, and I don't currently
have Documentation/technical/remembering-renames.txt fully paged in, I'm
not sure if that's a full solution or a bandaid for the current
testcase.  However, since the remembering renames optimization was the
weakest of the set, and the optimization is far less important when
directory rename detection is off (as that implies far fewer potential
renames), let's just use a bigger hammer to ensure this special case is
fixed: turn off the rename caching.  We do the same thing already when
we encounter rename/rename(1to1) cases (as per `git grep -3
disabling.the.optimization`, though it uses a slightly different
triggering mechanism since it's trying to affect the next time that
merge_check_renames_reusable() is called), and I think it makes sense
to do the same here.

Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t3650: document bug when directory renames are turned off

There is a bug in the way renames are cached that rears its head when
`merge.directoryRenames` is set to false; it results in the following
message:

merge-ort.c:3002: process_renames: Assertion `newinfo && !newinfo->merged.clean' failed.
Aborted

It is quite a curious bug: the same test case will succeed, without any
assertion, if instead run with `merge.directoryRenames=true`.

Further, the assertion does not manifest while replaying the first
commit, it manifests while replaying the _second_ commit of the commit
range. But it does _not_ manifest when the second commit is replayed
individually.

This would indicate that there is an incomplete rename cache left-over
from the first replayed commit which is being reused for the second
commit, and if directory rename detection is enabled, the missing paths
are somehow regenerated.

Incidentally, the same bug can be triggered by modifying t6429 to switch
from merge.directoryRenames=true to merge.directoryRenames=false.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
[en: tweaked the commit message slightly, including adjusting the
line number of the assertion to the latest version, and the much
later discovery that a simple t6429 tweak would also display the
issue.]
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-ort: support having merge verbosity be set to 0

Various callers such as am & checkout set the merge verbosity to 0 to
avoid having conflict messages printed. While this could be achieved by
avoiding the wrappers from merge-ort-wrappers and instead passing 0 for
display_update_msgs to merge_switch_to_result(), for simplicity of
converting callers simply allow them to also achieve this with the
merge-ort-wrappers by setting verbosity to 0.

Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-ort: allow rename detection to be disabled

When merge-ort was written, I did not at first allow rename detection to
be disabled, because I suspected that most folks disabling rename
detection were doing so solely for performance reasons.  Since I put a
lot of work into providing dramatic speedups for rename detection
performance as used by the merge machinery, I wanted to know if there
were still real world repositories where rename detection was
problematic from a performance perspective.  We have had years now to
collect such information, and while we never received one, waiting
longer with the option disabled seems unlikely to help surface such
issues at this point.  Also, there has been at least one request to
allow rename detection to be disabled for behavioral rather than
performance reasons (see the thread including
https://lore.kernel.org/git/CABPp-BG-Nx6SCxxkGXn_Fwd2wseifMFND8eddvWxiZVZk0zRaA@mail.gmail.com/
), so let's start heeding the config and command line settings.

Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-ort: add new merge_ort_generic() function

merge-recursive.[ch] have three entry points:
  * merge_trees()
  * merge_recursive()
  * merge_recursive_generic()
merge-ort*.[ch] only has equivalents for the first two.  Add an
equivalent for the final entry point, so we can switch callers to
use it and remove merge-recursive.[ch].

While porting it over, finally fix the issue with the label for the
ancestor (used when merge.conflictStyle=diff3 as a conflict label).
merge-recursive.c has traditionally not allowed callers to set that
label, but I have found that problematic for years.

(Side note: This function was initially part of the merge-ort rewrite,
but reviewers questioned the ancestor label funniness which I was
never really happy with anyway.  It resulted in me jettisoning it and
hoping at the time that I would eventually be able to force the existing
callers to use some other API.  That worked with `git stash`, as per
874cf2a60444 (stash: apply stash using 'merge_ort_nonrecursive()',
2022-05-10), but this API is the most reasonable one for `git am` and
`git merge-recursive`, if we can just allow them some freedom over the
ancestor label.)

The merge_recursive_generic() function did not know whether it was being
invoked by `git stash`, `git merge-recursive`, or `git am`, and the
choice of meaningful ancestor label, when there is a unique ancestor,
varies for these different callers:

  * git am: ancestor is a constructed "fake ancestor" that user knows
            nothing about and has no access to.  (And is different than
            the normal thing we mean by a "virtual merge base" which is
            the merging of merge bases.)
  * git merge-recursive: ancestor might be a tree, but at least it
                         was one specified by the user (if they invoked
                         merge-recursive directly)
  * git stash: ancestor was the commit serving as the stash base

Thus, using a label like "constructed merge base" (as
merge_recursive_generic() does) presupposes that `git am` is the only
caller; it is incorrect for other callers.  This label has thrown me off
more than once.  Allow the caller to override when there is a unique
merge base.
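
As a rough illustration of the override rule described above, the label
choice could be sketched as below.  The helper name
pick_ancestor_label() and its signature are hypothetical and are not
the API added by this commit; only the "constructed merge base" string
comes from the text above.

```c
#include <stddef.h>

/*
 * Choose the label shown for the ancestor in diff3-style conflict
 * markers.  Hypothetical helper: returns the caller's label when there
 * is exactly one merge base and a label was supplied, falls back to the
 * historical "constructed merge base" when there is one merge base but
 * no label, and returns NULL otherwise so that the merge machinery can
 * pick its own label.
 */
const char *pick_ancestor_label(const char *caller_label, size_t nr_merge_bases)
{
	if (nr_merge_bases != 1)
		return NULL;
	return caller_label ? caller_label : "constructed merge base";
}
```

With this shape, a caller such as `git stash` could pass a label that
describes its own ancestor, while a caller that passes NULL keeps the
old behavior.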

Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

doc: add missing commit C to the graph for --ancestry-path=H D..M

The graph for `--ancestry-path=H D..M` should contain commit C.

Signed-off-by: Han Jiang <jhcarl0814@gmail.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

doc: restore: remove note on --patch w/ pathspecs

This note was added to the restore command docs in 46e91b663b
(checkout: split part of it to new command 'restore', 2019-04-25),
but it is now inaccurate. The underlying builtin `add -i` implementation,
made default in 0527ccb1b5 (add -i: default to the built-in implementation,
2021-11-30), supports pathspecs, so `git restore -p <pathspec>...` has
worked for all users since then. I bisected to verify this was the commit
that added support.

Signed-off-by: Adam Johnson <me@adamj.eu>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config.mak.dev: enable -Wunreachable-code

Having the compiler point out unreachable code can help avoid bugs, like
the one discussed in:

https://lore.kernel.org/git/20250307195057.GA3675279@coredump.intra.peff.net/

In that case it was found by Coverity, but finding it earlier saves
everybody time and effort.

We can use -Wunreachable-code to get some help from the compiler here.
Interestingly, this is a noop in gcc. It was a real warning up until gcc
4.x, when it was removed for being too flaky, but they left the
command-line option to avoid breaking users. See:

https://stackoverflow.com/questions/17249934/why-does-gcc-not-warn-for-unreachable-code

However, clang does implement this option, and it finds the case
mentioned above (and no other cases within the code base). And since we
run clang in several of our CI jobs, that's enough to get an early
warning of breakage.

We could enable it only for clang, but since gcc is happy to ignore it,
it's simpler to just turn it on for all developer builds.

Signed-off-by: Jeff King <peff@peff.net>
[jc: squashed meson.build change sent by Patrick]
Signed-off-by: Junio C Hamano <gitster@pobox.com>

git-compat-util: add NOT_CONSTANT macro and use it in atfork_prepare()

Our hope is that the number of code paths that falsely trigger
warnings with the -Wunreachable-code compilation option is small,
and they can be worked around on a case-by-case basis, like we just did
in the previous commit.  If we need such a workaround a bit more
often, however, we may benefit from a more generic and descriptive
facility that helps document the cases where we need such workarounds.

    Side note: if we need the workaround all over the place, it
    simply means -Wunreachable-code is not a good tool for us to
    save engineering effort to catch mistakes.  We are still
    exploring if it helps us, so let's assume that it is not the
    case.

Introduce the NOT_CONSTANT() macro, with which the developer can tell
the compiler:

    Do not optimize this expression out, because, despite whatever
    you are told by the system headers, this expression should *not*
    be treated as a constant.

and use it as a replacement for the workaround we used that was
somewhat specific to the sigfillset case.  If the compiler already
knows that the call to sigfillset() cannot fail on a particular
platform it is compiling for and declares that the if() condition
would not hold, it is plausible that the next version of the
compiler may learn that sigfillset() that never fails would not
touch errno and decide that in this sequence:

    errno = 0;
    sigfillset(&all);
    if (errno)
        die_errno("sigfillset");

the if() statement will never trigger.  Marking that the value
returned by sigfillset() cannot be a constant would document our
intention better and would not break with such a new version of
compiler that is even more "clever".  With the macro, the above
sequence can be rewritten:

    if (NOT_CONSTANT(sigfillset(&all)))
        die_errno("sigfillset");

which looks almost like other innocuous annotations we have,
e.g. UNUSED.
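
One way such a macro could be built is sketched below, assuming it is
enough to OR the expression with a value the compiler cannot fold away.
The variable name opaque_false and the helper fill_signal_set() are
hypothetical names chosen for this illustration, and the volatile
global is just one possible way to hide the value from the optimizer.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>

/*
 * Hypothetical global that is always 0 at run time.  It is volatile so
 * the compiler must load it and cannot treat it as a constant.
 */
volatile int opaque_false;

/* Truth value of expr, but never a compile-time constant. */
#define NOT_CONSTANT(expr) ((expr) || opaque_false)

/* Returns 0 on success, -1 if sigfillset() reports failure. */
int fill_signal_set(sigset_t *all)
{
	if (NOT_CONSTANT(sigfillset(all)))
		return -1; /* stays reachable even if sigfillset() is a macro yielding 0 */
	return 0;
}
```

Because the right-hand operand is only read when the expression is
false, the run-time cost is a single load on the success path.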

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'tb/multi-cruft-pack-refresh-fix' into tb/combine-cruft-below-size

* tb/multi-cruft-pack-refresh-fix:
builtin/pack-objects.c: freshen objects from existing cruft packs

reflog: implement subcommand to drop reflogs

While 'git-reflog(1)' currently allows users to expire reflogs and
delete individual entries, it lacks functionality to completely remove
reflogs for specific references. This becomes problematic in
repositories where reflogs are not needed but continue to accumulate
entries despite setting 'core.logAllRefUpdates=false'.

Add a new 'drop' subcommand to git-reflog that allows users to delete
the entire reflog for a specified reference. Include an '--all' flag to
enable dropping all reflogs from all worktrees and an addon flag
'--single-worktree', to only drop all reflogs from the current worktree.

While here, remove an extraneous newline in the file.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reflog: improve error for when reflog is not found

The 'git reflog expire' command prints the error message '<ref> points nowhere!'
when used with a non-existent ref. This message is a bit confusing and
vague. Modify the message to be more clear and direct.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

run-command: use errno to check for sigfillset() error

Since enabling -Wunreachable-code, builds with clang on macOS now fail,
complaining that the die_errno() call in:

    if (sigfillset(&all))
        die_errno("sigfillset");

is unreachable. On that platform the manpage documents that sigfillset()
always returns success, and presumably the implementation is a macro or
inline function that does so in a way that is transparent to the
compiler.

But we should continue to check on other platforms, since POSIX says it
may return an error.

We could solve this with a compile-time knob to split the two cases
(assuming success on macOS and checking for the error elsewhere). But we
can also work around it more directly by relying on errno to check the
outcome (since POSIX dictates that errno will be set on error). And that
works around the compiler's cleverness, since it doesn't know the
semantics of errno (though I suppose if sigfillset() is simple enough,
it could perhaps realize that no writes to errno are possible; however
this does seem to work in practice).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-ort: remove extraneous word in comment

"is was" -> "was"

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-ort: fix accidental strset<->strintmap

Both strset_for_each_entry and strintmap_for_each_entry are macros that
evaluate to the same thing, so they are technically interchangeable.
However, the intent is that we use the one matching the variable type we
are passing. Unfortunately, I somehow mistakenly got one of these wrong
in 7bee6c100431 (merge-ort: avoid recursing into directories when we
don't need to, 2021-07-16) -- possibly related to the fact that
relevant_sources was initially a strset and later refactored into a
strintmap. Correct which macro we use.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t7615: be more explicit about diff algorithm used

t7615 is entirely about testing the differences between different
diff algorithms, but it doesn't specify any diff algorithm when it
is testing myers. Given that we have discussed potentially switching
defaults (https://lore.kernel.org/git/xmqqed873vgn.fsf@gitster.g/), it
makes sense in tests that are about different diff algorithms to be
explicit about which one is intended to be used in each test. Add
that specificity.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t6423: fix a comment that accidentally reversed two commits

The comment describing testcase 13b of t6423 somehow mixed up commits
A and B in one paragraph. Fix the references.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

stash: remove merge-recursive.h include

stash was modified to use merge_ort_nonrecursive() instead of
merge_recursive_generic() back in commit 874cf2a60444 (stash: apply
stash using 'merge_ort_nonrecursive()', 2022-05-10). That makes the
inclusion of merge-recursive.h unnecessary. In preparation for the
removal of merge-recursive.h, remove the unnecessary include.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

meson: fix perl detection when docs are enabled, but perl bindings aren't

The `perl` variable in meson.build is assigned to a program lookup,
which may have the value "not-found object" if configuring with
`-Dperl=disabled`.

There is already a list of other cases where we do need a perl command,
even when not building perl bindings. Building documentation should be
one of those cases, but was missing from the list. Add it.

Fixes:

```
$ meson setup builddir/ -Ddocs=man -Dperl=disabled -Dtests=false
[...]
Documentation/meson.build:308:22: ERROR: Tried to use not-found external program in "command"
```

Bug: https://bugs.gentoo.org/949247
Signed-off-by: Eli Schwartz <eschwartz@gentoo.org>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

git-gui: heed core.commentChar/commentString

This amends 1ae85ff6d (git-gui: strip comments and consecutive empty
lines from commit messages, 2024-08-13) to deal with custom comment
characters/strings.

The magic commentString value "auto" is not handled, because the option
makes no sense to me - it does not support comments in templates and
hook output, and it seems far-fetched that someone would introduce
comments while editing the message.

Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>

diffcore-rename: fix BUG when break detection and --follow used together

Prior to commit 9db2ac56168e (diffcore-rename: accelerate rename_dst
setup, 2020-12-11), the function add_rename_dst() resulted in quadratic
runtime since each call inserted the new entry into the array in sorted
order.  The reason for the sorted order requirement was so that
locate_rename_dst(), used when break detection is turned on, could find
the appropriate entry in logarithmic time via bisection on string
comparisons.  (It's better to be quadratic in moving pointers than
quadratic in string comparisons, so this made some sense.)  However,
since break detection always sticks the broken pairs adjacent to each
other, that commit decided to simply append entries to rename_dst, and
record the mapping of (filename) -> (index within rename_dst) via a
strintmap.  Doing this relied on the fact that when adding the source of
a broken pair via register_rename_src(), that the next item we'd process
was the other half of the same broken pair and would be added to
rename_dst via add_rename_dst().  This assumption was fine under break
detection alone, but the combination of break detection and
single_follow violated that assumption because of this code:

    else if (options->single_follow &&
             strcmp(options->single_follow, p->two->path))
        continue; /* not interested */

which would end up skipping calling add_rename_dst() below that point.
Since I knew I was assuming that the dst pair of a break would always be
added right after the src pair of a break, I added a new BUG() directive
as part of that commit later on at time of use that would check my
assumptions held.  That BUG() didn't trip for nearly 4 years...which
sadly meant I had long since forgotten the related details.  Anyway...

When the dst half of a broken pair is skipped like this, it means that
not only could my recorded index be invalid (just past the end of the
array), it could also point to some unrelated dst that just happened to
be the next one added to the array.  So, to fix this, we need to add a
little more safety around the checks for the recorded break_idx.

It turns out that making a testcase to trigger this is quite the
challenge.  I actually added two testcases:
  * One testcase which uses --follow incorrectly (it uses its single
    pathspec to specify something other than a single filename), and
    which triggers the same bug reported by Olaf.  This triggers a
    special case within locate_rename_dst() where idx evaluates to 0
    and rename_dst is NULL, meaning that our return value of
    &rename_dst[idx] happens to evaluate to NULL as well.  This
    addressing of an index into a NULL array hints at deeper problems,
    which are raised in the next testcase...
  * A second testcase which when run under valgrind shows that the code
    actually depends upon uninitialized memory, in particular the entry
    just after the end of the rename_dst array.

In short, when the two rare options -B and --follow are used together,
fix the accidental find of the wrong dst entry (which would often be
uninitialized memory just past the end of the array, but also could
have just been a dst for an unrelated path if no dst was recorded for
the expected path).  Do so by adding a little more care around checking
the recorded indices in break_idx.
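
The kind of extra care meant here can be sketched as follows.  This is
a simplified illustration, not the actual change: struct dst_entry and
checked_dst() are hypothetical stand-ins for the rename_dst array and
for the lookup that consumes the recorded index.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an entry in the rename_dst array. */
struct dst_entry {
	const char *path;
};

/*
 * Return the destination recorded at idx, or NULL when the recorded
 * index cannot be trusted: the array is empty, idx is at or past the
 * end of the array, or the entry at idx belongs to a different path.
 */
const struct dst_entry *checked_dst(const struct dst_entry *dst, size_t nr,
				    size_t idx, const char *path)
{
	if (!dst || idx >= nr)
		return NULL;
	if (strcmp(dst[idx].path, path))
		return NULL;
	return &dst[idx];
}
```

The three checks correspond to the three failure modes described
above: a NULL array, an index just past the end, and an index that
points at an unrelated dst.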

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

xdiff: avoid arithmetic overflow in xdl_get_hunk()

xdl_get_hunk() calculates the maximum number of common lines between two
changes that would fit into the same hunk for the given context options.
It involves doubling and addition and thus can overflow if the terms are
huge.

The type of ctxlen and interhunkctxlen in xdemitconf_t is long, while
the type of the corresponding context and interhunkcontext in struct
diff_options is int.  On many platforms longs are bigger than ints,
which prevents the overflow.  On Windows they have the same range and
the overflow manifests as hunks that are split erroneously and lines
being repeated between them.

Fix the overflow by checking and not going beyond LONG_MAX.  This allows
specifying a huge context line count and getting all lines of a changed
file in a single hunk, as expected.
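
A minimal sketch of the clamping idea, assuming non-negative inputs.
The function names saturating_add() and max_common_lines() are used
here for illustration and need not match the patch; the formula
(context doubled plus inter-hunk context) follows the description
above.

```c
#include <limits.h>

/* Add two non-negative longs, clamping to LONG_MAX instead of overflowing. */
long saturating_add(long a, long b)
{
	return (a > LONG_MAX - b) ? LONG_MAX : a + b;
}

/*
 * Largest run of common lines that still lets two changes share a hunk:
 * trailing context of the first change, leading context of the second,
 * plus the inter-hunk context.
 */
long max_common_lines(long ctxlen, long interhunkctxlen)
{
	return saturating_add(saturating_add(ctxlen, ctxlen), interhunkctxlen);
}
```

The comparison against LONG_MAX - b is done before the addition, so the
sum itself is never evaluated when it would not fit.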

Reported-by: Jason Cho <jason11choca@proton.me>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Git 2.49

Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/pack-objects.c: freshen objects from existing cruft packs

Once an object is written into a cruft pack, we can only freshen it by
writing a new loose or packed copy of that object with a more recent
mtime.

Prior to 61568efa95 (builtin/pack-objects.c: support `--max-pack-size`
with `--cruft`, 2023-08-28), we typically had at most one cruft pack in
a repository at any given time. So freshening unreachable objects was
straightforward when already rewriting the cruft pack (and its *.mtimes
file).

But 61568efa95 changes things: 'pack-objects' now supports writing
multiple cruft packs when invoked with `--cruft` and the
`--max-pack-size` flag. Cruft packs are rewritten until they reach some
size threshold, at which point they are considered "frozen", and will
only be modified in a pruning GC, or if the threshold itself is
adjusted.

Prior to this patch, however, this process breaks down when we attempt
to freshen an object packed in an earlier cruft pack, and that cruft
pack is larger than the threshold and thus will survive the repack.

When this is the case, it is impossible to freshen objects in cruft
pack(s) when those cruft packs are larger than the threshold. This is
because we would avoid writing them in the new cruft pack entirely, for
a couple of reasons.

1. When enumerating packed objects via 'add_objects_in_unpacked_packs()'
    we pass the SKIP_IN_CORE_KEPT_PACKS flag, which is used to avoid looping
    over the packs we're going to retain (which are marked as kept
    in-core by 'read_cruft_objects()').

    This means that we will avoid enumerating additional packed copies
    of objects found in any cruft packs which are larger than the given
    size threshold. Thus there is no opportunity to call
    'create_object_entry()' whatsoever.

2. We likewise will discard the loose copy (if one exists) of any
    unreachable object packed in a cruft pack that is larger than the
    threshold. Here our call path is 'add_unreachable_loose_objects()',
    which uses the 'add_loose_object()' callback.

    That function will eventually land us in 'want_object_in_pack()'
    (via 'add_cruft_object_entry()'), and we'll discard the object as it
    appears in one of the packs which we marked as kept in-core.

This means in effect that it is impossible to freshen an unreachable
object once it appears in a cruft pack larger than the given threshold.

Instead, we should pack an additional copy of an unreachable object we
want to freshen even if it appears in a cruft pack, provided that the
cruft copy has an mtime which is before the mtime of the copy we are
trying to pack/freshen. This is sub-optimal in the sense that it
requires keeping an additional copy of unreachable objects upon
freshening, but we don't have a better alternative without the ability
to make in-place modifications to existing *.mtimes files.

In order to implement this, we have to adjust the behavior of
'want_found_object()'. When 'pack-objects' is told that we're *not*
going to retain any cruft packs (i.e. the set of packs marked as kept
in-core does not contain a cruft pack), the behavior is unchanged.

But when there *is* at least one cruft pack that we're holding onto, it
is no longer sufficient to reject a copy of an object found in that
cruft pack for that reason alone. In this case, we only want to reject a
candidate object when copies of that object either:

- exist in a non-cruft pack that we are retaining, regardless of that
   pack's mtime, or

- exist in a cruft pack with an mtime at least as recent as the copy
   we are debating whether or not to pack, in which case freshening
   would be redundant.

To do this, keep track of whether or not we have any cruft packs in our
in-core kept list with a new 'ignore_packed_keep_in_core_has_cruft'
flag. When we end up in this new special case, we replace a call to
'has_object_kept_pack()' with 'want_cruft_object_mtime()', and only reject
objects when we have a copy in an existing cruft pack with at least as
recent an mtime as our candidate (in which case "freshening" would be
redundant).
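
The rejection rule can be summarized with a small sketch.  The struct
kept_copy and the function want_candidate() below are hypothetical and
only model the decision for a single existing copy; they are not the
code in builtin/pack-objects.c.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical summary of where an existing copy of the object lives. */
struct kept_copy {
	bool in_kept_pack;  /* found in a pack marked as kept in-core */
	bool pack_is_cruft; /* that pack is a cruft pack */
	time_t mtime;       /* mtime recorded for the object in that pack */
};

/*
 * Decide whether a candidate copy with mtime candidate_mtime should be
 * written into the new cruft pack, given one existing copy.
 */
bool want_candidate(const struct kept_copy *copy, time_t candidate_mtime)
{
	if (!copy->in_kept_pack)
		return true;  /* no retained pack holds it */
	if (!copy->pack_is_cruft)
		return false; /* a retained non-cruft pack already has it */
	/* Retained cruft copy: pack again only if the candidate is strictly newer. */
	return copy->mtime < candidate_mtime;
}
```

A candidate whose mtime equals the cruft copy's mtime is rejected,
matching the "at least as recent" wording above.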

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge tag 'l10n-2.49.0-rnd1' of https://github.com/git-l10n/git-po

l10n-2.49.0-rnd1

* tag 'l10n-2.49.0-rnd1' of https://github.com/git-l10n/git-po:
  l10n: zh_TW: Git 2.49.0 round 1
  l10n: update German translation
  l10n: po-id for 2.49
  l10n: zh_CN: updated translation for 2.49
  l10n: uk: add 2.49 translation
  l10n: tr: Update Turkish translations for 2.49.0
  l10n: ko: fix minor typo in Korean translation
  l10n: it: fix spelling of "sorgente" (Italian for "source")
  l10n: sv.po: Fix Swedish typos
  l10n: sv.po: Update Swedish translation
  l10n: fr: 2.49 round 2
  l10n: bg.po: Updated Bulgarian translation (5836t)
  l10n: Updated translation for vi-2.49

Merge branch 'l10n/zh-TW/2025-03-09' of github.com:l10n-tw/git-po

* 'l10n/zh-TW/2025-03-09' of github.com:l10n-tw/git-po:
l10n: zh_TW: Git 2.49.0 round 1

l10n: zh_TW: Git 2.49.0 round 1

Co-authored-by: Lumynous <lumynou5.tw@gmail.com>
Signed-off-by: Yi-Jyun Pan <pan93412@gmail.com>

Merge branch 'l10n-de-2.49' of github.com:ralfth/git

* 'l10n-de-2.49' of github.com:ralfth/git:
l10n: update German translation

l10n: update German translation

Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com>

l10n: po-id for 2.49

Update following components:

  * builtin/clone.c
  * builtin/commit.c
  * builtin/fetch.c
  * builtin/index-pack.c
  * builtin/pack-objects.c
  * builtin/refs.c
  * builtin/repack.c
  * builtin/unpack-objects.c
  * command-list.h
  * diff.c
  * object-file.c
  * parse-options.c
  * promisor-remote.c
  * refspec.c
  * remote.c

Translate following new components:

  * path-walk.c
  * builtin/backfill.c
  * t/helper/test-path-walk.c

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>

A bit more updates after -rc2

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'pb/doc-follow-remote-head'

Doc updates.

* pb/doc-follow-remote-head:
config/remote.txt: improve wording for 'remote.<name>.followRemoteHEAD'
config/remote.txt: reunite 'severOption' description paragraphs

Merge branch 'tc/zlib-ng-fix'

"git version --build-options" stopped showing zlib version by
mistake due to recent refactoring, which has been corrected.

* tc/zlib-ng-fix:
help: print zlib-ng version number
help: include git-zlib.h to print zlib version

Merge branch 'ma/clone-doc-markup-fix'

Doc markup fix.

* ma/clone-doc-markup-fix:
git-clone doc: fix indentation

Merge branch 'ps/refname-avail-check-optim' into kn/non-transactional-batch-updates

* ps/refname-avail-check-optim: (43 commits)
  refs: reuse iterators when determining refname availability
  refs/iterator: implement seeking for files iterators
  refs/iterator: implement seeking for packed-ref iterators
  refs/iterator: implement seeking for ref-cache iterators
  refs/iterator: implement seeking for reftable iterators
  refs/iterator: implement seeking for merged iterators
  refs/iterator: provide infrastructure to re-seek iterators
  refs/iterator: separate lifecycle from iteration
  refs: stop re-verifying common prefixes for availability
  refs/files: batch refname availability checks for initial transactions
  refs/files: batch refname availability checks for normal transactions
  refs/reftable: batch refname availability checks
  refs: introduce function to batch refname availability checks
  builtin/update-ref: skip ambiguity checks when parsing object IDs
  object-name: allow skipping ambiguity checks in `get_oid()` family
  object-name: introduce `repo_get_oid_with_flags()`
  Git 2.49-rc0
  The fourteenth batch
  mailmap: fix check-mailmap with full mailmap line
  The thirteenth batch
  ...

refs: reuse iterators when determining refname availability

When verifying whether refnames are available we have to verify whether
any reference exists that is nested under the current reference. E.g.
given a reference "refs/heads/foo", we must make sure that there is no
other reference "refs/heads/foo/*".

This check is performed using a ref iterator with the prefix set to the
nested reference namespace. Until now it was not possible to
reseek iterators, so we always had to reallocate the iterator for every
single reference we're about to check. This keeps us from reusing state
that the iterator may have and that may make it work more efficiently.

Refactor the logic to reseek iterators. This leads to a sizeable speedup
with the "reftable" backend:

    Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):      39.8 ms ±   0.9 ms    [User: 29.7 ms, System: 9.8 ms]
      Range (min … max):    38.4 ms …  42.0 ms    62 runs

    Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):      31.9 ms ±   1.1 ms    [User: 27.0 ms, System: 4.5 ms]
      Range (min … max):    29.8 ms …  34.3 ms    74 runs

    Summary
      update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.25 ± 0.05 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)

The "files" backend doesn't really show a huge impact:

    Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):     392.3 ms ±   7.1 ms    [User: 59.7 ms, System: 328.8 ms]
      Range (min … max):   384.6 ms … 404.5 ms    10 runs

    Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):     387.7 ms ±   7.4 ms    [User: 54.6 ms, System: 329.6 ms]
      Range (min … max):   377.0 ms … 397.7 ms    10 runs

    Summary
      update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.01 ± 0.03 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)

This is mostly because it is way slower to begin with because it has to
create a separate file for each new reference, so the milliseconds we
shave off by reseeking the iterator don't really translate into a
significant relative improvement.
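
To illustrate the idea of reseeking instead of reallocating, here is a
toy model that uses a sorted in-memory array in place of a real ref
backend.  The struct ref_iter and both functions are hypothetical and
much simpler than the iterator interface in refs/; the fixed-size
prefix buffer is a shortcut for the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical iterator over an array of refnames sorted with strcmp(). */
struct ref_iter {
	const char **refs;
	size_t nr;
	size_t pos;
};

/* Reposition an existing iterator at the first ref that sorts >= prefix. */
void ref_iter_seek(struct ref_iter *it, const char *prefix)
{
	size_t lo = 0, hi = it->nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (strcmp(it->refs[mid], prefix) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	it->pos = lo;
}

/* True if some ref lives under "<refname>/", reusing the caller's iterator. */
bool has_nested_ref(struct ref_iter *it, const char *refname)
{
	char prefix[256];
	int len = snprintf(prefix, sizeof(prefix), "%s/", refname);

	if (len < 0 || (size_t)len >= sizeof(prefix))
		return false; /* name too long for this sketch */
	ref_iter_seek(it, prefix);
	return it->pos < it->nr &&
	       !strncmp(it->refs[it->pos], prefix, (size_t)len);
}
```

A caller checking many refnames would set up one struct ref_iter and
call has_nested_ref() repeatedly, so whatever state the iterator holds
is kept across checks rather than rebuilt each time.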

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>