git.ipfire.org Git - thirdparty/git.git/log

midx: teach `midx_preferred_pack()` about incremental MIDXs

The function `midx_preferred_pack()` is used to determine the identity
of the preferred pack, which is the identity of a unique pack within
the MIDX which is used as a tie-breaker when selecting from which pack
to represent an object that appears in multiple packs within the MIDX.

Historically we have said that the MIDX's preferred pack has the unique
property that all objects from that pack are represented in the MIDX.
But that isn't quite true: a more precise statement would be that all
objects from that pack *which appear in the MIDX* are selected from that
pack.

This helps us extend the concept of preferred packs across a MIDX chain,
where some object(s) in the preferred pack may appear in other packs
in an earlier MIDX layer, in which case those object(s) will not appear
in a subsequent MIDX layer from either the preferred pack or any other
pack.

Extend the concept of preferred packs by using the pack which represents
the object at the first position in MIDX pseudo-pack order belonging to
the current MIDX layer (i.e., at position 'm->num_objects_in_base').

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: teach `midx_contains_pack()` about incremental MIDXs

Now that the `midx_contains_pack()` versus `midx_locate_pack()` debacle
has been cleaned up, teach the former about how to operate in an
incremental MIDX-aware world in a similar fashion as in previous
commits.

Instead of using either of the two `midx_for_object()` or
`midx_for_pack()` helpers, this function is split into two: one that
determines whether a pack is contained in a single MIDX, and another
which calls the former in a loop over all MIDXs.

This approach does not require that we change any of the implementation
in what is now `midx_contains_pack_1()` as it still operates over a
single MIDX.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: remove unused `midx_locate_pack()`

Commit 307d75bbe6 (midx: implement `midx_locate_pack()`, 2023-12-14)
introduced `midx_locate_pack()`, which was described at the time as a
complement to the function `midx_contains_pack()` which allowed
callers to determine where in the MIDX lexical order a pack appeared, as
opposed to whether or not it was simply contained.

307d75bbe6 suggests that future patches would be added which would
introduce callers for this new function, but none ever were, meaning the
function has gone unused since its introduction.

Clean this up by in effect reverting 307d75bbe6, which removes the
unused functions and inlines its definition back into
`midx_contains_pack()`.

(Looking back through the list archives when 307d75bbe6 was written,
this was in preparation for this[1] patch from back when we had the
concept of "disjoint" packs while developing multi-pack verbatim reuse.
That concept was abandoned before the series was merged, but I never
dropped what would become 307d75bbe6 from the series, leading to the
state prior to this commit).

[1]: https://lore.kernel.org/git/3019738b52ba8cd78ea696a3b800fa91e722eb66.1701198172.git.me@ttaylorr.com/

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: teach `fill_midx_entry()` about incremental MIDXs

In a similar fashion as previous commits, teach the `fill_midx_entry()`
function to work in a incremental MIDX-aware fashion.

This function, unlike others which accept an index into either the
lexical order of objects or packs, takes in an object_id, and attempts
to fill a caller-provided 'struct pack_entry' with the remaining pieces
of information about that object from the MIDX.

The function uses `bsearch_midx()` which fills out the frame-local 'pos'
variable, recording the given object_id's lexical position within the
MIDX chain, if found (if no matching object ID was found, we'll return
immediately without filling out the `pack_entry` structure).

Once given that position, we jump back through the `->base_midx` pointer
to ensure that our `m` points at the MIDX layer which contains the given
object_id (and not an ancestor or descendant of it in the chain). Note
that we can drop the bounds check "if (pos >= m->num_objects)" because
`midx_for_object()` performs this check for us.

After that point, we only need to make two special considerations within
this function:

  - First, the pack_int_id returned to us by `nth_midxed_pack_int_id()`
    is a position in the concatenated lexical order of packs, so we must
    ensure that we subtract `m->num_packs_in_base` before accessing the
    MIDX-local `packs` array.

  - Second, we must avoid translating the `pos` back to a MIDX-local
    index, since we use it as an argument to `nth_midxed_offset()` which
    expects a position relative to the concatenated lexical order of
    objects.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: teach `nth_midxed_offset()` about incremental MIDXs

In a similar fashion as in previous commits, teach the function
`nth_midxed_offset()` about incremental MIDXs.

The given object `pos` is used to find the containing MIDX, and
translated back into a MIDX-local position by assigning the return value
of `midx_for_object()` to it.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: teach `bsearch_midx()` about incremental MIDXs

Now that the special cases callers of `bsearch_midx()` have been dealt
with, teach `bsearch_midx()` to handle incremental MIDX chains.

The incremental MIDX-aware version of `bsearch_midx()` works by
repeatedly searching for a given OID in each layer along the
`->base_midx` pointer, stopping either when an exact match is found, or
the end of the chain is reached.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: introduce `bsearch_one_midx()`

The `bsearch_midx()` function will be extended in a following commit to
search for the location of a given object ID across all MIDXs in a chain
(or the single non-chain MIDX if no chain is available).

While most callers will naturally want to use the updated
`bsearch_midx()` function, there are a handful of special cases that
will want finer control and will only want to search through a single
MIDX.

For instance, the object abbreviation code, which cares about object IDs
near to where we'd expect to find a match in a MIDX. In that case, we
want to look at the nearby matches in each layer of the MIDX chain, not
just a single one).

Split the more fine-grained control out into a separate function called
`bsearch_one_midx()` which searches only a single MIDX.

At present both `bsearch_midx()` and `bsearch_one_midx()` have identical
behavior, but the following commit will rewrite the former to be aware
of incremental MIDXs for the remaining non-special case callers.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: teach `nth_bitmapped_pack()` about incremental MIDXs

In a similar fashion as in previous commits, teach the function
`nth_bitmapped_pack()` about incremental MIDXs by translating the given
`pack_int_id` from the concatenated lexical order to a MIDX-local
lexical position.

When accessing the containing MIDX's array of packs, use the local pack
ID. Likewise, when reading the 'BTMP' chunk, use the MIDX-local offset
when accessing the data within that chunk.

(Note that the both the call to prepare_midx_pack() and the assignment
of bp->pack_int_id both care about the global pack_int_id, so avoid
shadowing the given 'pack_int_id' parameter).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: teach `nth_midxed_object_oid()` about incremental MIDXs

The function `nth_midxed_object_oid()` returns the object ID for a given
object position in the MIDX lexicographic order.

Teach this function to instead operate over the concatenated
lexicographic order defined in an earlier step so that it is able to be
used with incremental MIDXs.

To do this, we need to both (a) adjust the bounds check for the given
'n', as well as record the MIDX-local position after chasing the
`->base_midx` pointer to find the MIDX which contains that object.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: teach `prepare_midx_pack()` about incremental MIDXs

The function `prepare_midx_pack()` is part of the midx.h API and
loads the pack identified by the MIDX-local 'pack_int_id'. This patch
prepares that function to be aware of an incremental MIDX world.

To do this, introduce the second of the two general purpose helpers
mentioned in the previous commit. This commit introduces
`midx_for_pack()`, which is the pack-specific analog of
`midx_for_object()`, and works in the same fashion.

Like `midx_for_object()`, this function chases down the '->base_midx'
field until it finds the MIDX layer within the chain that contains the
given pack.

Use this function within `prepare_midx_pack()` so that the `pack_int_id`
it expects is now relative to the entire MIDX chain, and that it
prepares the given pack in the appropriate MIDX.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs

The function `nth_midxed_pack_int_id()` takes in a object position in
MIDX lexicographic order and returns an identifier of the pack from
which that object was selected in the MIDX.

Currently, the given object position is an index into the lexicographic
order of objects in a single MIDX. Change this position to instead refer
into the concatenated lexicographic order of all MIDXs in a MIDX chain.

This has two visible effects within the implementation of
`prepare_midx_pack()`:

  - First, the given position is now an index into the concatenated
    lexicographic order of all MIDXs in the order in which they appear
    in the MIDX chain.

  - Second the pack ID returned from this function is now also in the
    concatenated order of packs among all layers of the MIDX chain in
    the same order that they appear in the MIDX chain.

To do this, introduce the first of two general purpose helpers, this one
being `midx_for_object()`. `midx_for_object()` takes a double pointer to
a `struct multi_pack_index` as well as an object `pos` in terms of the
entire MIDX chain[^1].

The function chases down the '->base_midx' field until it finds the MIDX
layer within the chain that contains the given object. It then:

  - modifies the double pointer to point to the containing MIDX, instead
    of the tip of the chain, and

  - returns the MIDX-local position[^2] at which the given object can be
    found.

Use this function within `nth_midxed_pack_int_id()` so that the `pos` it
expects is now relative to the entire MIDX chain, and that it returns
the appropriate pack position for that object.

[^1]: As a reminder, this means that the object is identified among the
  objects contained in all layers of the incremental MIDX chain, not any
  particular layer. For example, consider MIDX chain with two individual
  MIDXs, one with 4 objects and another with 3 objects. If the MIDX with
  4 objects appears earlier in the chain, then asking for object 6 would
  return the second object in the MIDX with 3 objects.

[^2]: Building on the previous example, asking for object 6 in a MIDX
  chain with (4, 3) objects, respectively, this would set the double
  pointer to point at the MIDX containing three objects, and would
  return an index to the second object within that MIDX.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: add new fields for incremental MIDX chains

The incremental MIDX chain feature is designed around the idea of
indexing into a concatenated lexicographic ordering of object IDs
present in the MIDX.

When given an object position, the MIDX machinery needs to be able to
locate both (a) which MIDX layer contains the given object, and (b) at
what position *within that MIDX layer* that object appears.

To do this, three new fields are added to the `struct multi_pack_index`:

  - struct multi_pack_index *base_midx;
  - uint32_t num_objects_in_base;
  - uint32_t num_packs_in_base;

These three fields store the pieces of information suggested by their
respective field names. In turn, the `num_objects_in_base` and
`num_packs_in_base` fields are used to crawl backwards along the
`base_midx` pointer to locate the appropriate position for a given
object within the MIDX that contains it.

The following commits will update various parts of the MIDX machinery
(as well as their callers from outside of midx.c and midx-write.c) to be
aware and make use of these fields when performing object lookups.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Documentation: describe incremental MIDX format

Prepare to implement incremental multi-pack indexes (MIDXs) over the
next several commits by first describing the relevant prerequisites
(like a new chunk in the MIDX format, the directory structure for
incremental MIDXs, etc.)

The format is described in detail in the patch contents below, but the
high-level description is as follows.

Incremental MIDXs live in $GIT_DIR/objects/pack/multi-pack-index.d, and
each `*.midx` within that directory has a single "parent" MIDX, which is
the MIDX layer immediately before it in the MIDX chain. The chain order
resides in a file 'multi-pack-index-chain' in the same directory.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t3206: test_when_finished before dirtying operations, not after

Many existing tests in this script perform operation(s) and then use
test_when_finished to define how to undo the effect of the
operation(s).

This is backwards. When your operation(s) fail before you manage to
successfully call test_when_finished (remember, that these commands
must be all &&-chained, so a failure of an earlier operation mean
your test_when_finished may not be executed at all). You must
establish how to clean up your mess with test_when_finished before
you create the mess to be cleaned up.

Also make sure that the body of test_when_finished deals with case
where the cruft it wants to remove failed to be created, by using
"rm -f" (instead of "rm") to remove potential cruft files, and
having "|| :" after "git notes remove" to remove potential cruft
notes---both of these by default fail when asked to remove something
that does not exist, instead of being silently idempotent no-ops.

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t: port helper/test-hashmap.c to unit-tests/t-hashmap.c

helper/test-hashmap.c along with t0011-hashmap.sh test the hashmap.h
library. Migrate them to the unit testing framework for better
debugging, runtime performance and concise code.

Along with the migration, make 'add' tests from the shell script order
agnostic in unit tests, since they iterate over entries with the same
keys and we do not guarantee the order. This was already done for the
'iterate' tests[1].

The helper/test-hashmap.c is still not removed because it contains a
performance test meant to be run by the user directly (not used in
t/perf). And it makes sense for such a utility to be a helper.

[1]: e1e7a77141 (t: sort output of hashmap iteration, 2019-07-30)

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Helped-by: Josh Steadmon <steadmon@google.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t/t7704-repack-cruft.sh: avoid failures during long-running tests

On systems where running t7704.09 takes longer than 10 seconds, the test
can fail.

The test works by doing the following:

  - First write three unreachable objects, backdating the mtime for a
    single object ($foo) which we expect to prune.

  - Repack the repository into a pack containing reachable objects, and
    another three cruft packs, each containing one of the objects
    written in the previous step.

  - Backdate the mtimes of the cruft pack *.mtimes files themselves.
    (Note that this does not affect what is pruned further down in the
    test, but is done to ensure that the cruft packs are rewritten
    during that step).

  - Then repack with --cruft-expiration=10.seconds.ago, expecting to
    prune one of the three unreachable objects written in the first
    step.

  - Assert that the surviving cruft packs were rewritten, object $foo is
    pruned, and unreachable objects $bar, and $baz remain in the
    repository.

If longer than 10 seconds pass between writing the three unreachable
objects (the first step) and the "git repack --cruft" (the fourth step),
we will mistakenly prune more objects than expected, causing the test to
fail.

The $foo object which we expect to prune has its mtime set back to
10,000 seconds relative to the current time, but we prune it with a
cutoff of 10.seconds.ago.

Instead, set the cutoff to be 1,000 seconds to give the test much longer
time to run without failing. This helps platforms where running
individual tests can perform slowly, on my machine this test runs much
more quickly:

    $ hyperfine './t7704-repack-cruft.sh --run=9'
    Benchmark 1: ./t7704-repack-cruft.sh --run=9
      Time (mean ± σ):     647.4 ms ±  30.7 ms    [User: 528.5 ms, System: 124.1 ms]
      Range (min … max):   594.1 ms … 696.5 ms    10 runs

Reported-by: Randall Becker <randall.becker@nexbridge.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t6421: fix test to work when repo dir contains d0

The `grep` statement in this test looks for `d0.*<string>`, attempting
to filter to only show lines that had tabular output where the 2nd
column had `d0` and the final column had a substring of
[`git -c `]`fetch.negotiationAlgorithm`. These lines also have
`child_start` in the 4th column, but this isn't part of the condition.

A subsequent line will have `d1` in the 2nd column, `start` in the 4th
column, and `/path/to/git/git -c fetch.negotiationAlgorihm` in the final
column. If `/path/to/git/git` contains the substring `d0`, then this
line is included by `grep` as well as the desired line, leading to an
effective doubling of the number of lines, and test failures.

Tighten the grep expression to require `d0` to be surrounded by spaces,
and to have the `child_start` label.

Signed-off-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

set errno=0 before strtoX calls

To detect conversion failure after calls to functions like `strtod`, one
can check `errno == ERANGE`. These functions are not guaranteed to set
`errno` to `0` on successful conversion, however. Manual manipulation of
`errno` can likely be avoided by checking that the output pointer
differs from the input pointer, but that's not how other locations, such
as parse.c:139, handle this issue; they set errno to 0 prior to
executing the function.

For every place I could find a strtoX function with an ERANGE check
following it, set `errno = 0;` prior to executing the conversion
function.

Signed-off-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

log-tree: use decimal_width()

Reduce code duplication by calling decimal_width() to count the digits
in the number of commits instead of calculating it locally.

It also has the advantage of returning int, which is the exact type
expected by the printf()-like function strbuf_addf() for field width
arguments.

Additionally, decimal_width() supports numbers bigger than 1410065407,
which is (hopefully) just a theoretical advantage.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/files: prevent memory leak by freeing packed_ref_store

This complements 64a6dd8ffc (refs: implement removal of ref storages,
2024-06-06).

Signed-off-by: Sven Strickroth <email@cs-ware.de>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

apply: canonicalize modes read from patches

Git stores only canonical modes for blobs. So for a regular file, we
care about only "100644" or "100755" (depending only on the executable
bit), but never modes where the group or other permissions are more
exotic. So never "100664", "100700", etc. When a file in the working
tree has such a mode, we quietly turn it into one of the two canonical
modes, and that's what is stored both in the index and in tree objects.

However, we don't canonicalize modes we read from incoming patches in
git-apply. These may appear in a few lines:

  - "old mode" / "new mode" lines for mode changes

  - "new file mode" lines for newly created files

  - "deleted file mode" for removing files

For "new mode" and for "new file mode", this is harmless. The patch is
asking the result to have a certain mode, but:

  - when we add an index entry (for --index or --cached), it is
    canonicalized as we create the entry, via create_ce_mode().

  - for a working tree file, try_create_file() passes either 0777 or
    0666 to open(), so what you get depends only on your umask, not any
    other bits (aside from the executable bit) in the original mode.

However, for "old mode" and "deleted file mode", there is a minor
annoyance. We compare the patch's expected preimage mode with the
current state. But that current state is always going to be a canonical
mode itself:

  - updating an index entry via --cached will have the canonical mode in
    the index

  - for updating a working tree file, check_preimage() runs the mode
    through ce_mode_from_stat(), which does the usual canonicalization

So if the patch feeds a non-canonical mode, it's impossible for it to
match, and we will always complain with something like:

  file has type 100644, expected 100664

Since this is just a warning, the operation proceeds, but it's
confusing and annoying.

These cases should be pretty rare in practice. Git would never produce a
patch with non-canonical modes itself (since it doesn't store them).
And while we do accept patches from other programs, all of those lines
were invented by Git. So you'd need a program trying to be Git
compatible, but not handling canonicalization the same way. Reportedly
"quilt" is such a program.

We should canonicalize the modes as we read them so that the user never
sees the useless warning.

A few notes on the tests:

  - I've covered instances of all lines for completeness, even though
    the "new mode" / "new file mode" ones behave OK currently.

  - the tests apply patches to both the index and working tree, and
    check the result of both. Again, we know that all of these paths
    canonicalize anyway, but it's giving us extra coverage (although we
    are even less likely to have such a bug now since we canonicalize up
    front).

  - the test patches are missing "index" lines, which is also something
    Git would never produce. But they don't matter for the test, they do
    match the case from quilt we saw in the wild, and they avoid some
    sha1/sha256 complexity.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t-reftable-tree: improve the test for infix_walk()

In the current testing setup for infix_walk(), the following
properties of an infix traversal of a tree remain untested:
- every node of the tree must be visited
- every node must be visited exactly once
In fact, only the property 'traversal in increasing order' is tested.
Modify test_infix_walk() to check for all the properties above.

This can be achieved by storing the nodes' keys linearly, in a nullified
buffer, as we visit them and then checking the input keys against this
buffer in increasing order. By checking that the element just after
the last input key is 'NULL' in the output buffer, we ensure that
every node is traversed exactly once.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t-reftable-tree: add test for non-existent key

In the current testing setup for tree_search(), the case for
non-existent key is not exercised. Improve this by adding a
test-case for the same.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t-reftable-tree: split test_tree() into two sub-test functions

In the current testing setup, tests for both tree_search() and
infix_walk() defined by reftable/tree.{c, h} are performed by
a single test function, test_tree(). Split tree_test() into
test_tree_search() and test_infix_walk() responsible for
independently testing tree_search() and infix_walk() respectively.
This improves the overall readability of the test file as well as
simplifies debugging.

Note that the last parameter in the tree_search() functiom is
'int insert' which when set, inserts the key if it is not found
in the tree. Otherwise, the function returns NULL for such cases.

While at it, use 'func' to pass function pointers and not '&func'.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t: move reftable/tree_test.c to the unit testing framework

reftable/tree_test.c exercises the functions defined in
reftable/tree.{c, h}. Migrate reftable/tree_test.c to the unit
testing framework. Migration involves refactoring the tests to use
the unit testing framework instead of reftable's test framework and
renaming the tests to align with unit-tests' standards.

Also add a comment to help understand the test routine.

Note that this commit mostly moves the test from reftable/ to
t/unit-tests/ and most of the refactoring is performed by the
trailing commits.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable: remove unnecessary curly braces in reftable/tree.c

According to Documentation/CodingGuidelines, single-line control-flow
statements must omit curly braces (except for some special cases).
Make reftable/tree.c adhere to this guideline.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

git-gui: Remove forced rescan of stat-dirty files.

It is possible that stat information of tracked files is modified without
actually modifying the content. Plumbing commands would detect such files
as modified, so that Git GUI runs `git update-info --refresh` in order to
synchronize the cached stat info with the reality. However, this can be
an expensive operation in large repositories. As remediation,
e534f3a88676 (git-gui: Allow the user to disable update-index --refresh
during rescan, 2006-11-07) introduced an option to skip the expensive
part.

The option was named "trust file modification timestamp". But the catch
is that sometimes file timestamps can't be trusted. In this case, a file
would remain listed in Unstaged Changes although there are no changes.
So 16403d0b1f9d (git-gui: Refresh a file if it has an empty diff,
2006-11-11) introduced a popup message informing the user about the
situation and then removed the file from the Unstaged Changes list.

Now users had to click away the message box for every file that was
stat-dirty. Under the assumption that a file in such a state is not
the only one, 124355d32c06 (git-gui: Always start a rescan on an empty
diff, 2007-01-22) introduced a forced (potentially expensive) refresh
that would de-list all stat-dirty files after the first notification was
dismissed.

Along came 6c510bee2013 (Lazy man's auto-CRLF, 2007-02-13) in Git. It
introduced a new case where a file in the worktree can have no essential
differences to the staged version, but still be detected as modified by
plumbing commands. This time, however, the index cannot be synchronized
fully by `git update-index --refresh`, so that the file remains listed
in Unstaged Changes until it is staged manually.

Needless to say that the message box now becomes an annoyance, because
it must be dismissed every time an affected file is selected, and the
file remains listed nevertheless.

Remove the message box. Write the notice that no differences were found
in the diff panel instead. Also include a link that, when clicked,
initiates the rescan. With this scheme, the rescan does not happen
automatically anymore, but requires an additional click. (This is now
two clicks in total for users who encounter stat-dirty files after
enabling the "trust file modification timestamps" option.) However,
users whom the rescan does not help (autocrlf-related dirty files) save
half the clicks because there is no message box to dismiss.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>

Documentation: add platform support policy

Supporting many platforms is only possible when we have the right tools to
ensure that support.

Teach platform maintainers how they can help us to help them, by
explaining what kind of tooling support we would like to have, and what
level of support becomes available as a result. Provide examples so that
platform maintainers can see what we're asking for in practice.

With this policy in place, we can make changes with stronger assurance
that we are not breaking anybody we promised not to. Instead, we can
feel confident that our existing testing and integration practices
protect those who care from breakage.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs: drop `ref_store`-less functions

In c8f815c208 (refs: remove functions without ref store, 2024-05-07), we
have removed functions of the refs subsystem that do not take a ref
store as input parameter. In order to make it easier for folks to figure
out how to replace calls to such functions in in-flight patch series, we
kept their definitions around in an ifdeffed block.

Now that Git v2.46 is out, it is rather unlikely that anybody still has
references to these old functions in their unreleased patches. Let's
thus drop them.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

http: do not ignore proxy path

The documentation for `http.proxy` describes that option, and the
environment variables it overrides, as supporting "the syntax understood
by curl". curl allows SOCKS proxies to use a path to a Unix domain
socket, like `socks5h://localhost/path/to/socket.sock`. Git should
therefore include, if present, the path part of the proxy URL in what it
passes to libcurl.

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Ryan Hendrickson <ryan.hendrickson@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/ls-remote: fall back to SHA1 outside of a repo

In c8aed5e8da (repository: stop setting SHA1 as the default object hash,
2024-05-07), we have stopped setting the default hash algorithm for
`the_repository`. Consequently, code that relies on `the_hash_algo` will
now crash when it hasn't explicitly been initialized, which may be the
case when running outside of a Git repository.

It was reported that git-ls-remote(1) may crash in such a way when using
a remote helper that advertises refspecs. This is because the refspec
announced by the helper will get parsed during capability negotiation.
At that point we haven't yet figured out what object format the remote
uses though, so when run outside of a repository then we will fail.

The course of action is somewhat dubious in the first place. Ideally, we
should only parse object IDs once we have asked the remote helper for
the object format. And if the helper didn't announce the "object-format"
capability, then we should always assume SHA256. But instead, we used to
take either SHA1 if there was no repository, or we used the hash of the
local repository, which is wrong.

Arguably though, crashing hard may not be in the best interest of our
users, either. So while the old behaviour was buggy, let's restore it
for now as a short-term fix. We should eventually revisit, potentially
by deferring the point in time when we parse the refspec until after we
have figured out the remote's object hash.

Reported-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t0018: remove leftover debugging cruft

The actual file is copied out to /tmp, presumably so that the tester
can inspect it after the test is done, which may have been a useful
debugging aid.

But in the final shape of the test suite, such a code should not
exist. We cannot even assume that we are allowed to write into /tmp
(our TMPDIR may not even be pointing at it) or read from it for that
matter.

Noticed-by: Randall S. Becker <rsbecker@nexbridge.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config.c: avoid segfault with --fixed-value and valueless config

When using `--fixed-value` with a key whose value is left empty (implied
as being "true"), 'git config' may crash when invoked like either of:

    $ git config set --file=config --value=value --fixed-value \
        section.key pattern
    $ git config --file=config --fixed-value section.key value pattern

The original bugreport[1] bisects to 00bbdde141 (builtin/config:
introduce "set" subcommand, 2024-05-06), which is a red-herring, since
the original bugreport uses the new 'git config set' invocation.

The behavior likely bisects back to c90702a1f6 (config: plumb
--fixed-value into config API, 2020-11-25), which introduces the new
--fixed-value option in the first place.

Looking at the relevant frame from a failed process's coredump, the
crash appears in config.c::matches() like so:

    (gdb) up
    #1  0x000055b3e8b06022 in matches (key=0x55b3ea894360 "section.key", value=0x0,
        store=0x7ffe99076eb0) at config.c:2884
    2884 return !strcmp(store->fixed_value, value);

where we are trying to compare the `--fixed-value` argument to `value`,
which is NULL.

Avoid attempting to match `--fixed-value` for configuration keys with no
explicit value. A future patch could consider the empty value to mean
"true", "yes", "on", etc. when invoked with `--type=bool`, but let's
punt on that for now in the name of avoiding the segfault.

[1]: https://lore.kernel.org/git/CANrWfmTek1xErBLrnoyhHN+gWU+rw14y6SQ+abZyzGoaBjmiKA@mail.gmail.com/

Reported-by: Han Jiang <jhcarl0814@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

The second batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'as/show-ref-option-help-update'

A few descriptions in "git show-ref -h" have been clarified.

* as/show-ref-option-help-update:
show-ref: improve short help messages of options

Merge branch 'jc/doc-reviewing-guidelines-positive-reviews'

The reviewing guidelines document now explicitly encourages people
to give positive reviews and how.

* jc/doc-reviewing-guidelines-positive-reviews:
ReviewingGuidelines: encourage positive reviews more

Merge branch 'jc/doc-rebase-fuzz-vs-offset-fix'

"git rebase --help" referred to "offset" (the difference between
the location a change was taken from and the change gets replaced)
incorrectly and called it "fuzz", which has been corrected.

* jc/doc-rebase-fuzz-vs-offset-fix:
doc: difference in location to apply is "offset", not "fuzz"

t-reftable-pq: add tests for merged_iter_pqueue_top()

merged_iter_pqueue_top() as defined by reftable/pq.{c, h} returns
the element at the top of a priority-queue's heap without removing
it. Since there are no tests for this function in the existing
setup, add tests for the same.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t-reftable-pq: add test for index based comparison

When comparing two entries, the priority queue as defined by
reftable/pq.{c, h} first compares the entries on the basis of
their ref-record's keys. If the keys turn out to be equal, the
comparison is then made on the basis of their update indices
(which are never equal).

In the current testing setup, only the case for comparison on
the basis of ref-record's keys is exercised. Add a test for
index-based comparison as well. Rename the existing test to
reflect its nature of only testing record-based comparison.

While at it, replace 'strbuf_detach' with 'xstrfmt' to assign
refnames in the existing test. This makes the test conciser.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t-reftable-pq: make merged_iter_pqueue_check() callable by reference

merged_iter_pqueue_check() checks the validity of a priority queue
represented by a merged_iter_pqueue struct by asserting the
parent-child relation in the struct's heap. Explicity passing a
struct to this function means a copy of the entire struct is created,
which is inefficient.

Make the function accept a pointer to the struct instead. This is
safe to do since the function doesn't modify the struct in any way.
Make the function parameter 'const' to assert immutability.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t-reftable-pq: make merged_iter_pqueue_check() static

merged_iter_pqueue_check() is a function previously defined in
reftable/pq_test.c (now t/unit-tests/t-reftable-pq.c) and used in
the testing of a priority queue as defined by reftable/pq.{c, h}.
As such, this function is only called by reftable/pq_test.c and it
makes little sense to expose it to non-testing code via reftable/pq.h.

Hence, make this function static and remove its prototype from
reftable/pq.h.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t: move reftable/pq_test.c to the unit testing framework

reftable/pq_test.c exercises a priority queue defined by
reftable/pq.{c, h}. Migrate reftable/pq_test.c to the unit testing
framework. Migration involves refactoring the tests to use the unit
testing framework instead of reftable's test framework, and
renaming the tests to align with unit-tests' standards.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable: change the type of array indices to 'size_t' in reftable/pq.c

The variables 'i', 'j', 'k' and 'min' are used as indices for
'pq->heap', which is an array. Additionally, 'pq->len' is of
type 'size_t' and is often used to assign values to these
variables. Hence, change the type of these variables from 'int'
to 'size_t'.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable: remove unnecessary curly braces in reftable/pq.c

According to Documentation/CodingGuidelines, control-flow statements
with a single line as their body must omit curly braces. Make
reftable/pq.c conform to this guideline. Besides that, remove
unnecessary newlines and variable assignment.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

credential/osxkeychain: respect NUL terminator in username

This patch fixes a case where git-credential-osxkeychain might output
uninitialized bytes to stdout.

We need to get the username string from a system API using
CFStringGetCString(). To do that, we get the max size for the string
from CFStringGetMaximumSizeForEncoding(), allocate a buffer based on
that, and then read into it. But then we print the entire buffer to
stdout, including the trailing NUL and any extra bytes which were not
needed. Instead, we should stop at the NUL.

This code comes from 9abe31f5f1 (osxkeychain: replace deprecated
SecKeychain API, 2024-02-17). The bug was probably overlooked back then
because this code is only used as a fallback when we can't get the
string via CFStringGetCStringPtr(). According to Apple's documentation:

  Whether or not this function returns a valid pointer or NULL depends
  on many factors, all of which depend on how the string was created and
  its properties.

So it's not clear how we could make a test for this, and we'll have to
rely on manually testing on a system that triggered the bug in the first
place.

Reported-by: Hong Jiang <ilford@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Tested-by: Hong Jiang <ilford@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit-reach: fix trivial memory leak when computing reachability

We don't free the local `stack` commit list that we use to compute
reachability of multiple commits at once. Do so.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

convert: fix leaking config strings

In `read_convert_config()`, we end up reading some string values into
variables. We don't free any potentially-existing old values though,
which will result in a memory leak in case the same key has been defined
multiple times.

Fix those leaks.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

entry: fix leaking pathnames during delayed checkout

When filtering files during delayed checkout, we pass a string list to
`async_query_available_blobs()`. This list is initialized with NODUP,
and thus inserted strings will not be owned by the list. In the latter
function we then try to hand over ownership by passing an `xstrup()`'d
value to `string_list_insert()`. But this is not how this works: a NODUP
list does not take ownership of allocated strings and will never free
them for the caller.

Fix this issue by initializing the list as `DUP` instead and dropping
the explicit call to `xstrdup()`. This is okay to do given that this is
the single callsite of `async_query_available_blobs()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

object-name: fix leaking commit list items

When calling `get_oid_oneline()`, we pass in a `struct commit_list` that
gets modified by the function. This creates a weird situation where the
commit list may sometimes be empty after returning, but sometimes it
will continue to carry additional commits. In those cases the remainder
of the list leaks.

Ultimately, the design where we only pass partial ownership to
`get_oid_oneline()` feels shoddy. Refactor the code such that we only
pass a constant pointer to the list, creating a local copy as needed.
Callers are thus always responsible for freeing the commit list, which
then allows us to plug a bunch of memory leaks.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t/test-repository: fix leaking repository

The test-repository test helper zeroes out `the_repository` such that it
can be sure that our codebase only ends up using the supplied repository
that we initialize in the respective helper functions. This does cause
memory leaks though as the data that `the_repository` has been holding
onto is not referenced anymore.

Fix this by calling `repo_clear()` instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/credential-cache: fix trivial leaks

There are two trivial leaks in git-credential-cache(1):

  - We leak the child process in `spawn_daemon()`. As we do not call
    `finish_command()` and instead let the created process daemonize, we
    have to clear the process manually.

  - We do not free the computed socket path in case it wasn't given via
    `--socket=`.

Plug both of these memory leaks.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/worktree: fix leaking derived branch names

There are several heuristics that git-worktree(1) uses to derive the
name of the newly created branch when not given explicitly. These
heuristics all allocate a new string, but we only end up freeing that
string in a subset of cases.

Fix the remaining cases where we didn't yet free the derived branch
names. While at it, also free `opt_track`, which is being populated via
an `OPT_PASSTHRU()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/shortlog: fix various trivial memory leaks

There is a trivial memory leak in git-shortlog(1). Fix it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/rerere: fix various trivial memory leaks

There are multiple trivial memory leaks in git-rerere(1). Fix those.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/credential-store: fix leaking credential

We never free credentials read by the credential store, leading to a
memory leak. Plug it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/show-branch: fix several memory leaks

There are several memory leaks in git-show-branch(1). Fix them.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/rev-parse: fix memory leak with `--parseopt`

The `--parseopt` mode allows shell scripts to have the same option
parsing mode as we have in C builtins. It soaks up a set of option
descriptions via stdin and massages them into proper `struct option`s
that we can then use to parse a set of arguments.

We only partially free those options when done though, creating a memory
leak. Interestingly, we only end up free'ing the first option's help,
which is of course wrong.

Fix this by freeing all option's help fields as well as their `argh`
fields to plug this memory leak.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/stash: fix various trivial memory leaks

There are multiple trivial memory leaks in git-stash(1). Fix those.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/remote: fix various trivial memory leaks

There are multiple trivial memory leaks in git-remote(1). Fix those.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/remote: fix leaking strings in `branch_list`

The `struct string_list branch_list` is declared as `NODUP`, which makes
it not copy strings inserted into it. This causes memory leaks though,
as this means it also won't be responsible for _freeing_ inserted
strings. Thus, every branch we add to this will leak.

Fix this by marking the list as `DUP` instead and free the local copy we
have of the variable.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/ls-remote: fix leaking `pattern` strings

Users can pass patterns to git-ls-remote(1), which allows them to filter
the list of printed references. We assemble those patterns into an array
and prefix them with "*/", but never free either the array nor the
allocated strings.

Refactor the code to use a `struct strvec` instead of manually tracking
the strings in an array. Like this, we can easily use `strvec_clear()`
to release both the vector and the contained string for us, plugging the
leak.

Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/submodule--helper: fix leaking buffer in `is_tip_reachable`

The `rev` buffer in `is_tip_reachable()` is being populated with the
output of git-rev-list(1) -- if either the command fails or the buffer
contains any data, then the input commit is not reachable.

The buffer isn't used for anything else, but neither do we free it,
causing a memory leak. Fix this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/submodule--helper: fix leaking clone depth parameter

The submodule helper supports a `--depth` parameter for both its "add"
and "clone" subcommands, which in both cases end up being forwarded to
git-clone(1). But while the former subcommand uses an `OPT_INTEGER()` to
parse the depth, the latter uses `OPT_STRING()`. Consequently, it is
possible to pass non-integer input to "--depth" when calling the "clone"
subcommand, where the value will then ultimately cause git-clone(1) to
bail out.

Besides the fact that the parameter verification should happen earlier,
the submodule helper infrastructure also internally tracks the depth via
a string. This requires us to convert the integer in the "add"
subcommand into an allocated string, and this string ultimately leaks.

Refactor the code to consistently track the clone depth as an integer.
This plugs the memory leak, simplifies the code and allows us to use
`OPT_INTEGER()` instead of `OPT_STRING()`, validating the input before
we shell out to git--clone(1).

Original-patch-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/name-rev: fix various trivial memory leaks

There are several structures that we don't release after
`cmd_name_rev()` is done. Plug those leaks.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/describe: fix trivial memory leak when describing blob

We never free the `struct strvec args` variable in `describe_blob()`,
which thus causes a memory leak. Fix this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/describe: fix leaking array when running diff-index

When running git-describe(1) with `--dirty`, we will set up a `struct
rev_info` with arguments for git-diff-index(1). The way we assemble the
arguments it causes two memory leaks though:

  - We never release the `struct strvec`.

  - `setup_revisions()` may end up removing some entries from the
    `strvec`, which we wouldn't free even if we released the struct.

While we could plug those leaks, this is ultimately unnecessary as the
arguments we pass are part of a static array anyway. So instead,
refactor the code to drop the `struct strvec` and just pass this static
array directly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/describe: fix memory leak with `--contains=`

When calling `git describe --contains=`, we end up invoking
`cmd_name_rev()` with some munged argv array. This array may contain
allocated strings and furthermore will likely be modified by the called
function. This results in two memory leaks:

  - First, we leak the array that we use to assemble the arguments.

  - Second, we leak the allocated strings that we may have put into the
    array.

Fix those leaks by creating a separate copy of the array that we can
hand over to `cmd_name_rev()`. This allows us to free all strings
contained in the `strvec`, as the original vector will not be modified
anymore.

Furthermore, free both the `strvec` and the copied array to fix the
first memory leak.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/log: fix leaking branch name when creating cover letters

When calling `make_cover_letter()` without a branch name, we try to
derive the branch name by calling `find_branch_name()`. But while this
function returns an allocated string, we never free the result and thus
have a memory leak. Fix this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/replay: plug leaking `advance_name` variable

The `advance_name` variable can either contain a static string when
parsed via the `--advance` command line option or it may be an allocated
string when set via `determine_replay_mode()`. Because we cannot be sure
whether it is allocated or not we just didn't free it at all, resulting
in a memory leak.

Split up the variables such that we can track the static and allocated
strings separately and then free the allocated one to fix the memory
leak.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Start the 2.47 cycle

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'jc/how-to-maintain-updates'

Doc update.

* jc/how-to-maintain-updates:
howto-maintain: update daily tasks
howto-maintain: cover a whole development cycle

Merge branch 'tn/doc-commit-fix'

Docfix.

* tn/doc-commit-fix:
doc: remove dangling closing parenthesis

Merge branch 'jc/doc-one-shot-export-with-shell-func'

It has been documented that we avoid "VAR=VAL shell_func" and why.

* jc/doc-one-shot-export-with-shell-func:
CodingGuidelines: document a shell that "fails" "VAR=VAL shell_func"

Merge branch 'cp/unit-test-reftable-merged'

Another reftable test has been ported to use the unit test framework.

* cp/unit-test-reftable-merged:
  t-reftable-merged: add test for REFTABLE_FORMAT_ERROR
  t-reftable-merged: use reftable_ref_record_equal to compare ref records
  t-reftable-merged: add tests for reftable_merged_table_max_update_index
  t-reftable-merged: improve the const-correctness of helper functions
  t-reftable-merged: improve the test t_merged_single_record()
  t: harmonize t-reftable-merged.c with coding guidelines
  t: move reftable/merged_test.c to the unit testing framework

Merge branch 'kn/ci-clang-format'

A CI job that use clang-format to check coding style issues in new
code has been added.

* kn/ci-clang-format:
  ci/style-check: add `RemoveBracesLLVM` in CI job
  check-whitespace: detect if no base_commit is provided
  ci: run style check on GitHub and GitLab
  clang-format: formalize some of the spacing rules
  clang-format: avoid spacing around bitfield colon
  clang-format: indent preprocessor directives after hash

Merge branch 'jc/checkout-no-op-switch-errors'

"git checkout --ours" (no other arguments) complained that the
option is incompatible with branch switching, which is technically
correct, but found confusing by some users. It now says that the
user needs to give pathspec to specify what paths to checkout.

* jc/checkout-no-op-switch-errors:
checkout: special case error messages during noop switching

Merge branch 'pw/add-patch-with-suppress-blank-empty'

"git add -p" by users with diff.suppressBlankEmpty set to true
failed to parse the patch that represents an unmodified empty line
with an empty line (not a line with a single space on it), which
has been corrected.

* pw/add-patch-with-suppress-blank-empty:
add-patch: use normalize_marker() when recounting edited hunk
add-patch: handle splitting hunks with diff.suppressBlankEmpty

Merge branch 'rj/make-cleanup'

A build tweak knob has been simplified by not setting the value
that is already the default; another unused one has been removed.

* rj/make-cleanup:
config.mak.uname: remove unused uname_P variable
Makefile: drop -Wno-universal-initializer from SP_EXTRA_FLAGS

Merge branch 'jt/doc-post-receive-hook-update'

Doc update.

* jt/doc-post-receive-hook-update:
doc: clarify post-receive hook behavior

Merge branch 'ad/merge-with-diff-algorithm'

Many Porcelain commands that internally use the merge machinery
were taught to consistently honor the diff.algorithm configuration.

* ad/merge-with-diff-algorithm:
merge-recursive: honor diff.algorithm

Merge branch 'rs/t-strvec-use-test-msg'

Unit test clean-up.

* rs/t-strvec-use-test-msg:
  t-strvec: fix type mismatch in check_strvec
  t-strvec: improve check_strvec() output
  t-strvec: use test_msg()

t98xx: mark Perforce tests as memory-leak free

All the Perforce tests are free of memory leaks. This went unnoticed
because most folks do not have p4 and p4d installed on their computers.
Consequently, given that the prerequisites for running those tests
aren't fulfilled, `TEST_PASSES_SANITIZE_LEAK=check` won't notice that
those tests are indeed memory leak free.

Mark those tests accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: update Perforce version to r23.2

Update our Perforce version from r21.2 to r23.2. Note that the updated
version is not the newest version. Instead, it is the last version where
the way that Perforce is being distributed remains the same as in r21.2.
Newer releases stopped distributing p4 and p4d executables as well as
the macOS archives directly and would thus require more work.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t98xx: fix Perforce tests with p4d r23 and newer

Some of the tests in t98xx modify the Perforce depot in ways that the
tool wouldn't normally allow. This is done to test behaviour of git-p4
in certain edge cases that we have observed in the wild, but which
should in theory not be possible.

Naturally, modifying the depot on disk directly is quite intimate with
the tool and thus prone to breakage when Perforce updates the way that
data is stored. And indeed, those tests are broken nowadays with r23 of
Perforce. While a file revision was previously stored as a plain file
"depot/file,v", it is now stored in a directory "depot/file,d" with
compression.

Adapt those tests to handle both old- and new-style depot layouts.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

convert: return early when not tracing

When Git adds a file requiring encoding conversion and tracing of encoding
conversion is not requested via the GIT_TRACE_WORKING_TREE_ENCODING
environment variable, the `trace_encoding()` function still allocates &
prepares "human readable" copies of the file contents before and after
conversion to show in the trace. This results in a high memory footprint
and increased runtime without providing any user-visible benefit.

This fix introduces an early exit from the `trace_encoding()` function
when tracing is not requested, preventing unnecessary memory allocation
and processing.

Signed-off-by: D Harithamma <harithamma.d@ibm.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Documentation: consistently use spaces inside initializers

Our coding guide is inconsistent with how it uses spaces inside of
initializers (`struct foo bar = { something }`). While we mostly carry
the space between open and closing braces and the initialized members,
in one case we don't.

Fix this one instance such that we consistently carry the space. This is
also consistent with how clang-format formats such initializers.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Documentation: document idiomatic function names

We semi-regularly have discussions around whether a function shall be
named `S_release()`, `S_clear()` or `S_free()`. Indeed, it may not be
obvious which of these is preferable as we never really defined what
each of these variants means exactly.

Carve out a space where we can add idiomatic names for common functions
in our coding guidelines and define each of those functions. Like this,
we can get to a shared understanding of their respective semantics and
can easily point towards our style guide in future discussions such that
our codebase becomes more consistent over time.

Note that the intent is not to rename all functions which violate these
semantics right away. Rather, the intent is to slowly converge towards a
common style over time.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Documentation: document naming schema for structs and their functions

We nowadays have a proper mishmash of struct-related functions that are
called `<verb>_<struct>` (e.g. `clear_prio_queue()`) versus functions
that are called `<struct>_<verb>` (e.g. `strbuf_clear()`). While the
former style may be easier to tie into a spoken conversation, most of
our communication happens in text anyway. Furthermore, prefixing
functions with the name of the structure they operate on makes it way
easier to group them together, see which functions are related, and will
also help folks who are using code completion.

Let's thus settle on one style, namely the one where functions start
with the name of the structure they operate on.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Documentation: clarify indentation style for C preprocessor directives

In the preceding commit, we have settled on using a single space per
nesting level to indent preprocessor directives. Clarify our coding
guidelines accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

clang-format: fix indentation width for preprocessor directives

In [1], we have improved our clang-format configuration to also specify
the style for how to indent preprocessor directives. But while we have
settled the question of where to put the indentation, either before or
after the hash sign, we didn't specify exactly how to indent.

With the current configuration, clang-format uses tabs to indent each
level of nested preprocessor directives, which is in fact unintentional
and never done in our codebase. Instead, we use a mixture of indenting
by either one or two spaces, where using a single space is somewhat more
common.

Adapt our clang-format configuration accordingly by specifying an
indentation width of one space.

[1]: <20240708092317.267915-1-karthik.188@gmail.com>

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'kn/ci-clang-format' into ps/doc-more-c-coding-guidelines

* kn/ci-clang-format:
  ci/style-check: add `RemoveBracesLLVM` in CI job
  check-whitespace: detect if no base_commit is provided
  ci: run style check on GitHub and GitLab
  clang-format: formalize some of the spacing rules
  clang-format: avoid spacing around bitfield colon
  clang-format: indent preprocessor directives after hash

refs/reftable: stop using `the_repository`

Convert the reftable ref backend to stop using `the_repository` in favor
of the repo that gets passed in via `struct ref_store`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/packed: stop using `the_repository`

Convert the packed ref backend to stop using `the_repository` in favor
of the repo that gets passed in via `struct ref_store`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/files: stop using `the_repository`

Convert the files ref backend to stop using `the_repository` in favor of
the repo that gets passed in via `struct ref_store`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/files: stop using `the_repository` in `parse_loose_ref_contents()`

We implicitly rely on `the_repository` in `parse_loose_ref_contents()`
by calling `parse_oid_hex()`. Convert the function to instead use
`parse_oid_hex_algop()` and have callers pass in the hash algorithm to
use.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs: stop using `the_repository`

Convert "refs.c" to stop using `the_repository` in favor of the repo
that gets passed in via `struct ref_store`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t-strvec: use if_test

The macro TEST takes a single expression. If a test requires multiple
statements then they need to be placed in a function that's called in
the TEST expression.

Remove the cognitive overhead of defining and calling single-use
functions by using if_test instead.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t-reftable-basics: use if_test

The macro TEST takes a single expression.  If a test requires multiple
statements then they need to be placed in a function that's called in
the TEST expression.

Remove the overhead of defining and calling single-use functions by
using if_test instead.

Run the tests in the order of definition.  We can reorder them like that
because they are independent.  Technically this changes the output, but
retains the meaning of a full run and allows for easier review e.g. with
diff option --ignore-all-space.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t-ctype: use if_test

Use the documented macro if_test instead of the internal functions
test__run_begin() and test__run_end(), which are supposed to be private
to the unit test framework.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

unit-tests: add if_test

The macro TEST only allows defining a test that consists of a single
expression. Add a new macro, if_test, which provides a way to define
unit tests that are made up of one or more statements.

if_test allows defining self-contained tests en bloc, a bit like
test_expect_success does for regular tests. It acts like a conditional;
the test body is executed if test_skip_all() had not been called before.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>