git.ipfire.org Git - thirdparty/git.git/log

]> git.ipfire.org Git - thirdparty/git.git/log

Junio C Hamano [Wed, 14 Aug 2024 21:54:47 +0000 (14:54 -0700)]

Merge branch 'jk/osxkeychain-username-is-nul-terminated'

The credential helper to talk to OSX keychain sometimes sent
garbage bytes after the username, which has been corrected.

* jk/osxkeychain-username-is-nul-terminated:
credential/osxkeychain: respect NUL terminator in username

commit | commitdiff | tree

Junio C Hamano [Wed, 14 Aug 2024 21:54:47 +0000 (14:54 -0700)]

Merge branch 'ps/leakfixes-part-3'

More leakfixes.

* ps/leakfixes-part-3: (24 commits)
  commit-reach: fix trivial memory leak when computing reachability
  convert: fix leaking config strings
  entry: fix leaking pathnames during delayed checkout
  object-name: fix leaking commit list items
  t/test-repository: fix leaking repository
  builtin/credential-cache: fix trivial leaks
  builtin/worktree: fix leaking derived branch names
  builtin/shortlog: fix various trivial memory leaks
  builtin/rerere: fix various trivial memory leaks
  builtin/credential-store: fix leaking credential
  builtin/show-branch: fix several memory leaks
  builtin/rev-parse: fix memory leak with `--parseopt`
  builtin/stash: fix various trivial memory leaks
  builtin/remote: fix various trivial memory leaks
  builtin/remote: fix leaking strings in `branch_list`
  builtin/ls-remote: fix leaking `pattern` strings
  builtin/submodule--helper: fix leaking buffer in `is_tip_reachable`
  builtin/submodule--helper: fix leaking clone depth parameter
  builtin/name-rev: fix various trivial memory leaks
  builtin/describe: fix trivial memory leak when describing blob
  ...

commit | commitdiff | tree

Jacob Keller [Wed, 14 Aug 2024 00:05:10 +0000 (17:05 -0700)]

t9001-send-email.sh: update alias list used for pine test

The set of aliases used for the pine --dump-aliases test do not
perfectly mesh with the way the pine address book is defined. While
technically all valid, there are some oddities including bob's name
being partially split so that the actual address is returned as
"Bobbyton <bob@example.com>". A strict reading of the pine documentation
indicates that the address should either be of the form
"address@domain" or a comma separated list of address, name/address
pairs, or other aliases enclosed by ().

The parsing implementation in git-send-email is not as strict, but it
makes sense to ensure the test data used is. Although the --dump-aliases
test does not make use of the address data, it is helpful to avoid
giving future developers the wrong impression of the file format.

Also add an alias which translates to multiple addresses using the ()
format.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Derrick Stolee [Wed, 14 Aug 2024 10:31:30 +0000 (10:31 +0000)]

p1500: add is-base performance tests

The previous two changes introduced a commit walking heuristic for finding
the most likely base branch for a given source. This algorithm walks
first-parent histories until reaching a collision.

This walk _should_ be very fast. Exceptions include cases where a
commit-graph file does not exist, leading to a full walk of all reachable
commits to compute generation numbers, or a case where no collision in the
first-parent history exists, leading to a walk of all first-parent history
to the root commits.

The p1500 test script guarantees a complete commit-graph file during its
setup, so we will not test that scenario. Do create a new root commit in an
effort to test the scenario of parallel first-parent histories.

Even with the extra root commit, these tests take no longer than 0.02
seconds on my machine for the Git repository. However, the results are
slightly more interesting in a copy of the Linux kernel repository:

Test
---------------------------------------------------------------
1500.2: ahead-behind counts: git for-each-ref              0.12
1500.3: ahead-behind counts: git branch                    0.12
1500.4: ahead-behind counts: git tag                       0.12
1500.5: contains: git for-each-ref --merged                0.04
1500.6: contains: git branch --merged                      0.04
1500.7: contains: git tag --merged                         0.04
1500.8: is-base check: test-tool reach (refs)              0.03
1500.9: is-base check: test-tool reach (tags)              0.03
1500.10: is-base check: git for-each-ref                   0.03
1500.11: is-base check: git for-each-ref (disjoint-base)   0.07

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Derrick Stolee [Wed, 14 Aug 2024 10:31:29 +0000 (10:31 +0000)]

for-each-ref: add 'is-base' token

The previous change introduced the get_branch_base_for_tip() method in
commit-reach.c. The motivation of that change was about using a heuristic to
deteremine the base branch for a source commit from a list of candidate
commit tips. This change makes that algorithm visible to users via a new
atom in the 'git for-each-ref' format. This change is very similar to the
chang in 49abcd21da6 (for-each-ref: add ahead-behind format atom,
2023-03-20).

Introduce the 'is-base:<source>' atom, which will indicate that the
algorithm should be computed and the result of the algorithm is reported
using an indicator of the form '(<source>)'. For example, using
'%(is-base:HEAD)' would result in one line having the token '(HEAD)'.

Use the sorted order of refs included in the ref filter to break ties in the
algorithm's heuristic. In the previous change, the motivating examples
include using an L0 trunk, long-lived L1 branches, and temporary release
branches. A caller could communicate the ordered preference among these
categories using the input refpecs and avoiding a different sort mechanism.
This sorting behavior is tested in the test scripts.

It is important to include this atom as a special case to
can_do_iterative_format() to match the expectations created in bd98f9774e1
(ref-filter.c: filter & format refs in the same callback, 2023-11-14). The
ahead-behind atom was one of the special cases, and this similarly requires
using an algorithm across all input refs before starting the format of any
single ref.

In the test script, the format tokens use colons or lack whitespace to avoid
Git complaining about trailing whitespace errors.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Derrick Stolee [Wed, 14 Aug 2024 10:31:28 +0000 (10:31 +0000)]

commit: add gentle reference lookup method

The lookup_commit_reference_by_name() method uses lookup_commit_reference()
without an option to use lookup_commit_reference_gently(). Create a gentle
version of the method so it can be used in locations where non-commits may
be found but error messages should be silenced.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Derrick Stolee [Wed, 14 Aug 2024 10:31:27 +0000 (10:31 +0000)]

commit-reach: add get_branch_base_for_tip

Add a new reachability algorithm that intends to discover (from a heuristic)
which branch was used as the starting point for a given commit. Add focused
tests using the 'test-tool reach' command.

In repositories that use pull requests (or merge requests) to advance one or
more "protected" branches, the history of that reference can be recovered by
following the first-parent history in most cases. Most are completed using
no-fast-forward merges, though squash merges are quite common. Less common
is rebase-and-merge, which still validates this assumption. Finally, the
case that breaks this assumption is the fast-forward update (with potential
rebasing).  Even in this case, the previous commit commonly appears in the
first-parent history of the branch.

Similar assumptions can be made for a topic branch created by a single user
with the intention to merge back into another branch. Using 'git commit',
'git merge', and 'git cherry-pick' from HEAD will default to having the
first-parent commit be the previous commit at HEAD. This history changes
only with commands such as 'git reset' or 'git rebase', where the command
names also imply that the branch is starting from a new location.

With this movement of branches in mind, the following heuristic is proposed
as a way to determine the base branch for a given source branch:

  Among a list of candidate base branches, select the candidate that
  minimizes the number of commits in the first-parent history of the source
  that are not in the first-parent history of the candidate.

Prior third-party solutions to this problem have used this optimization
criteria, but have relied upon extracting the first-parent history and
comparing those lists as tables instead of using commit-graph walks.

Given current command-line interface options, this optimization criteria is
not easy to detect directly. Even using the command

  git rev-list --count --first-parent <base>..<source>

does not measure this count, as it uses full reachability from <base> to
determine which commits to remove from the range '<base>..<source>'. This
may lead to one asking if we should instead be using the full reachability
of the candidate and only the first-parent history of the source. This,
unfortunately, does not work for repositories that use long-lived branches
and automation to merge across those branches.

In extremely large repositories, merging into a single trunk may not be
feasible.  This is usually due to the desired frequency of updates
(thousands of engineers doing daily work) combined with the time required to
perform a validation build.  These factors combine to create significant
risk of semantic merge conflicts, leading to build breaks on the trunk. In
response, repository maintainers can create a single Level Zero (L0) trunk
and multiple Level One (L1) branches. By partitioning the engineers by
organization, these engineers may see lower risk of semantic merge conflicts
as well as be protected against build breaks in other L1 branches. The key
to making this system work is a semi-automated process of merging L1
branches into the L0 trunk and vice-versa.  In a large enough organization,
these L1 branches may further split into L2 or L3 branches, but the same
principles apply for merging across deeper levels.

If these automated merges use a typical merge with the second parent
bringing in the "new" content, then each L0 and L1 branch can track its
previous positions by following first-parent history, which appear as
parallel paths (until reaching the first place where the branches diverged).
If we also walk to second parents, then the histories overlap significantly
and cannot be distinguished except for very-recent changes.

For this reason, the first-parent condition should be symmetrical across the
base and source branches.

Another common case for desiring the result of this optimization method is
the use of release branches. When releasing a version of a repository, a
branch can be used to track that release. Any updates that are worth fixing
in that release can be merged to the release branch and shipped with only
the necessary fixes without any new features introduced in the trunk branch.
The 'maint-2.<X>' branches represent this pattern in the Git project. The
microsoft/git fork uses 'vfs-2.<X>.<Y>' branches to track the changes that
are custom to that fork on top of each upstream Git release 2.<X>.<Y>. This
application doesn't need the symmetrical first-parent condition, but the use
of first-parent histories does not change the results for these branches.

To determine the base branch from a list of candidates, create a new method
in commit-reach.c that performs a single* commit-graph walk. The core
concept is to walk first-parents starting at the candidate bases and the
source, tracking the "best" base to reach a given commit. Use generation
numbers to ensure that a commit is walked at most once and all children have
been explored before visiting it.  When reaching a commit that is reachable
from both a base and the source, we will then have a guarantee that this is
the closest intersection of first-parent histories. Track the best base to
reach that commit and return it as a result. In rare cases involving
multiple root commits, the first-parent history of the source may never
intersect any of the candidates and thus a null result is returned.

* There are up to two walks, since we require all commits to have a computed
  generation number in order to avoid incorrect results. This is similar to
  the need for computed generation numbers in ahead_behind() as implemented
  in fd67d149bde (commit-reach: implement ahead_behind() logic, 2023-03-20).

In order to track the "best" base, use a new commit slab that stores an
integer.  This value defaults to zero upon initialization, so use -1 to
track that the source commit can reach this commit and use 'i + 1' to track
that the ith base can reach this commit. When multiple bases can reach a
commit, minimize the index to break ties. This allows the caller to specify
an order to the bases that determines some amount of preference when the
heuristic does not result in a unique result.

The trickiest part of the integer slab is what happens when reaching a
collision among the histories of the bases and the history of the source.
This is noticed when viewing the first parent and seeing that it has a slab
value that differs in sign (negative or positive). In this case, the
collision commit is stored in the method variable 'branch_point' and its
slab value is set to -1. The index of the best base (so far) is stored in
the method variable 'best_index'. It is possible that there are multiple
commits that have the branch_point as its first parent, leading to multiple
updates of best_index.  The result is determined when 'branch_point' is
visited in the commit walk, giving the guarantee that all commits that could
reach 'branch_point' were visited.

Several interesting cases of collisions and different results are tested in
the t6600-test-reach.sh script. Recall that this script also tests the
algorithm in three possible states involving the commit-graph file and how
many commits are written in the file. This provides some coverage of the
need (and lack of need) for the ensure_generations_valid() method.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Wed, 14 Aug 2024 06:52:58 +0000 (08:52 +0200)]

builtin/diff: free symmetric diff members

We populate a `struct symdiff` in case the user has requested a
symmetric diff. Part of this is to populate a `skip` bitmap that
indicates which commits shall be ignored in the diff. But while this
bitmap is dynamically allocated, we never free it.

Fix this by introducing and calling a new `symdiff_release()` function
that does this for us.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Wed, 14 Aug 2024 06:52:55 +0000 (08:52 +0200)]

diff: free state populated via options

The `objfind` and `anchors` members of `struct diff_options` are
populated via option parsing, but are never freed in `diff_free()`. Fix
this to plug those memory leaks.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Wed, 14 Aug 2024 06:52:53 +0000 (08:52 +0200)]

builtin/log: fix leak when showing converted blob contents

In `show_blob_object()`, we proactively call `textconv_object()`. In
case we have a textconv driver for this blob we will end up showing the
converted contents, otherwise we'll show the un-converted contents of it
instead.

When the object has been converted we never free the buffer containing
the converted contents. Fix this to plug this memory leak.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Wed, 14 Aug 2024 06:52:50 +0000 (08:52 +0200)]

userdiff: fix leaking memory for configured diff drivers

The userdiff structures may be initialized either statically on the
stack or dynamically via configuration keys. In the latter case we end
up leaking memory because we didn't have any infrastructure to discern
those strings which have been allocated statically and those which have
been allocated dynamically.

Refactor the code such that we have two pointers for each of these
strings: one that holds the value as accessed by other subsystems, and
one that points to the same string in case it has been allocated. Like
this, we can safely free the second pointer and thus plug those memory
leaks.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Wed, 14 Aug 2024 06:52:44 +0000 (08:52 +0200)]

builtin/format-patch: fix various trivial memory leaks

There are various memory leaks hit by git-format-patch(1). Basically all
of them are trivial, except that un-setting `diffopt.no_free` requires
us to unset the `diffopt.file` because we manually close it already.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Wed, 14 Aug 2024 06:52:42 +0000 (08:52 +0200)]

diff: fix leak when parsing invalid ignore regex option

When parsing invalid ignore regexes passed via the `-I` option we don't
free already-allocated memory, leading to a memory leak. Fix this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Wed, 14 Aug 2024 06:52:39 +0000 (08:52 +0200)]

unpack-trees: clear index when not propagating it

When provided a pointer to a destination index, then `unpack_trees()`
will end up copying its `o->internal.result` index into the provided
pointer. In those cases it is thus not necessary to free the index, as
we have transferred ownership of it.

There are cases though where we do not end up transferring ownership of
the memory, but `clear_unpack_trees_porcelain()` will never discard the
index in that case and thus cause a memory leak. And right now it cannot
do so in the first place because we have no indicator of whether we did
or didn't transfer ownership of the index.

Adapt the code to zero out the index in case we transfer its ownership.
Like this, we can now unconditionally discard the index when being asked
to clear the `unpack_trees_options`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Wed, 14 Aug 2024 06:52:36 +0000 (08:52 +0200)]

sequencer: release todo list on error paths

We're not releasing the `todo_list` in `sequencer_pick_revisions()` when
hitting an error path. Restructure the function to have a common exit
path such that we can easily clean up the list and thus plug this memory
leak.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Wed, 14 Aug 2024 06:52:33 +0000 (08:52 +0200)]

merge-ort: unconditionally release attributes index

We conditionally release the index used for reading gitattributes in
merge-ort based on whether or the index has been populated. This check
uses `cache_nr` as a condition. This isn't sufficient though, as the
variable may be zero even when some other parts of the index have been
populated. This leads to memory leaks when sparse checkouts are in use,
as we may not end up releasing the sparse checkout patterns.

Fix this issue by unconditionally releasing the index.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Wed, 14 Aug 2024 06:52:31 +0000 (08:52 +0200)]

builtin/fast-export: plug leaking tag names

When resolving revisions in `get_tags_and_duplicates()`, we only
partially manage the lifetime of `full_name`. In fact, managing its
lifetime properly is almost impossible because we put direct pointers to
that variable into multiple lists without duplicating the string. The
consequence is that these strings will ultimately leak.

Refactor the code to make the lists we put those names into duplicate
the memory. This allows us to properly free the string as required and
thus plugs the memory leak.

While this requires us to allocate more data overall, it shouldn't be
all that bad given that the number of allocations corresponds with the
number of command line parameters, which typically aren't all that many.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Wed, 14 Aug 2024 06:52:26 +0000 (08:52 +0200)]

builtin/fast-export: fix leaking diff options

Before calling `handle_commit()` in a loop, we set `diffopt.no_free`
such that its contents aren't getting freed inside of `handle_commit()`.
We never unset that flag though, which means that the structure's
allocated resources will ultimately leak.

Fix this by unsetting the flag after the loop such that we release its
resources via `release_revisions()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Wed, 14 Aug 2024 06:52:23 +0000 (08:52 +0200)]

builtin/fast-import: plug trivial memory leaks

Plug some trivial memory leaks in git-fast-import(1).

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Wed, 14 Aug 2024 06:52:20 +0000 (08:52 +0200)]

builtin/notes: fix leaking `struct notes_tree` when merging notes

We allocate a `struct notes_tree` in `merge_commit()` which we then
initialize via `init_notes()`. It's not really necessary to allocate the
structure though given that we never pass ownership to the caller.
Furthermore, the allocation leads to a memory leak because despite its
name, `free_notes()` doesn't free the `notes_tree` but only clears it.

Fix this issue by converting the code to use an on-stack variable.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Wed, 14 Aug 2024 06:52:17 +0000 (08:52 +0200)]

builtin/rebase: fix leaking `commit.gpgsign` value

In `get_replay_opts()`, we override the `gpg_sign` field that already
got populated by `sequencer_init_config()` in case the user has
"commit.gpgsign" set in their config. This creates a memory leak because
we overwrite the previously assigned value, which may have already
pointed to an allocated string.

Let's plug the memory leak by freeing the value before we overwrite it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Wed, 14 Aug 2024 06:52:14 +0000 (08:52 +0200)]

config: fix leaking comment character config

When the comment line character has been specified multiple times in the
configuration, then `git_default_core_config()` will cause a memory leak
because it unconditionally copies the string into `comment_line_str`
without free'ing the previous value. In fact, it can't easily free the
value in the first place because it may contain a string constant.

Refactor the code such that we track allocated comment character strings
via a separate non-constant variable `comment_line_str_to_free`. Adapt
sites that set `comment_line_str` to set both and free the old value
that was stored in `comment_line_str_to_free`.

This memory leak is being hit in t3404. As there are still other memory
leaks in that file we cannot yet mark it as passing with leak checking
enabled.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Wed, 14 Aug 2024 06:52:12 +0000 (08:52 +0200)]

submodule-config: fix leaking name entry when traversing submodules

We traverse through submodules in the tree via `tree_entry()`, passing
to it a `struct name_entry` that it is supposed to populate with the
tree entry's contents. We unnecessarily allocate this variable instead
of passing a variable that is allocated on the stack, and the ultimately
don't even free that variable. This is unnecessary and leaks memory.

Convert the variable to instead be allocated on the stack to plug the
memory leak.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Wed, 14 Aug 2024 06:52:06 +0000 (08:52 +0200)]

read-cache: fix leaking hashfile when writing index fails

In `do_write_index()`, we use a `struct hashfile` to write the index
with a trailer hash. In case the write fails though, we never clean up
the allocated `hashfile` state and thus leak memory.

Refactor the code to have a common exit path where we can free this and
other allocated memory. While at it, refactor our use of `strbuf`s such
that we reuse the same buffer to avoid some unneeded allocations.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Wed, 14 Aug 2024 06:52:03 +0000 (08:52 +0200)]

bulk-checkin: fix leaking state TODO

When flushing a bulk-checking to disk we also reset the `struct
bulk_checkin_packfile` state. But while we free some of its members,
others aren't being free'd, leading to memory leaks:

  - The temporary packfile name is not getting freed.

  - The `struct hashfile` only gets freed in case we end up calling
    `finalize_hashfile()`. There are code paths though where that is not
    the case, namely when nothing has been written. For this, we need to
    make `free_hashfile()` public.

Fix those leaks.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Wed, 14 Aug 2024 06:52:00 +0000 (08:52 +0200)]

object-name: fix leaking symlink paths in object context

The object context may be populated with symlink contents when reading a
symlink, but the associated strbuf doesn't ever get released when
releasing the object context, causing a memory leak. Plug it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Wed, 14 Aug 2024 06:51:58 +0000 (08:51 +0200)]

object-file: fix memory leak when reading corrupted headers

When reading corrupt object headers in `read_loose_object()`, we bail
out immediately. This causes a memory leak though because we would have
already initialized the zstream in `unpack_loose_header()`, and it is
the callers responsibility to finish the zstream even on error. While
this feels weird, other callsites do it correctly already.

Fix this leak by ending the zstream even on errors. We may want to
revisit this interface in the future such that the callee handles this
for us already when there was an error.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Wed, 14 Aug 2024 06:51:55 +0000 (08:51 +0200)]

git: fix leaking system paths

Git has some flags to make it output system paths as they have been
compiled into Git. This is done by calling `system_path()`, which
returns an allocated string. This string isn't ever free'd though,
creating a memory leak.

Plug those leaks. While they are surfaced by t0211, there are more
memory leaks looming exposed by that test suite and it thus does not yet
pass with the memory leak checker enabled.

Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Wed, 14 Aug 2024 06:51:52 +0000 (08:51 +0200)]

remote: plug memory leak when aliasing URLs

When we have a `url.*.insteadOf` configuration, then we end up aliasing
URLs when populating remotes. One place where this happens is in
`alias_all_urls()`, where we loop through all remotes and then alias
each of their URLs. The actual aliasing logic is then contained in
`alias_url()`, which returns an allocated string that contains the new
URL. This URL replaces the old URL that we have in the strvec that
contains all remote URLs.

We replace the remote URLs via `strvec_replace()`, which does not hand
over ownership of the new string to the vector. Still, we didn't free
the aliased URL and thus have a memory leak here. Fix it by freeing the
aliased string.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Jacob Keller [Wed, 14 Aug 2024 00:05:09 +0000 (17:05 -0700)]

t9001-send-email.sh: fix quoting for mailrc --dump-aliases test

The .mailrc alias file format documents that multiple addresses are
separated by spaces. The alias file used in the t9001 --dump-aliases
mailrc test have addresses which include both a name and email. These
are unquoted, so git send-email will parse this as an alias that
translates to multiple independent addresses.

The existing test does not care about this, as --dump-aliases only dumps
the alias and not the address. However, it is incorrect for a future
where --dump-aliases could also dump the mail addresses.

Fix the test to quote the aliases properly, so that they translate to a
single address.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Jeff King [Tue, 13 Aug 2024 05:02:16 +0000 (01:02 -0400)]

midx: drop unused parameters from add_midx_to_chain()

When loading a chained midx, we build up an array of hashes, one per
layer of the chain. But since the chain is also represented by the
linked list of multi_pack_index structs, nobody actually reads this
array. We pass it to add_midx_to_chain(), but the parameters are
completely ignored.

So we can drop those unused parameters. And then we can see that its
sole caller, load_midx_chain_fd_st(), only cares about one layer hash at a
time (for parsing each line and feeding it to the single-layer midx
code). So we can replace the array with a single object_id on the stack.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Tue, 13 Aug 2024 09:18:15 +0000 (11:18 +0200)]

bundle: default to SHA1 when reading bundle headers

We hit a segfault when trying to open a bundle via `git bundle
list-heads` when running outside of a repository. This is caused by
c8aed5e8da (repository: stop setting SHA1 as the default object hash,
2024-05-07), which stopped setting the default object hash so that
`the_hash_algo` is a `NULL` pointer when running outside of any repo.

This is only a symptom of a deeper issue though. Bundles default to the
SHA1 object format unless they advertise an "@object-format=" header.
Consequently, it has been wrong in the first place to use the object
format used by the current repository when parsing bundles. The
consequence is that trying to open a bundle that uses a different object
hash than the current repository will fail:

$ git bundle list-heads sha1.bundle
error: unrecognized header: ee4b540943284700a32591ad09f7e15bdeb2a10c HEAD (45)

Fix the bug by defaulting to the SHA1 object hash. We already handle the
"@object-format=" header as expected, so we don't need to adapt this
part.

Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Tue, 13 Aug 2024 09:18:08 +0000 (11:18 +0200)]

builtin/bundle: have unbundle check for repo before opening its bundle

The `git bundle unbundle` subcommand requires a repository to unbundle
the contents into. As thus, the subcommand checks whether we have a
startup repository in the first place, and if not it dies.

This check happens after we have already opened the bundle though. This
causes a segfault when running outside of a repository starting with
c8aed5e8da (repository: stop setting SHA1 as the default object hash,
2024-05-07) because we have no hash function set up, but we do try to
parse refs advertised by the bundle's header.

The next commit will fix that underlying issue by defaulting to the SHA1
object format for bundles, which will also fix the described segfault here.
But as we know that we will die anyway, we can do better than that and
avoid some vain work by moving the check for a repository before we try
to open the bundle.

Reported-by: ArcticLampyrid <ArcticLampyrid@outlook.com>
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Chandra Pratap [Tue, 13 Aug 2024 14:34:50 +0000 (20:04 +0530)]

t-reftable-readwrite: add test for known error

When using reftable_writer_add_ref() to add a ref record to a
reftable writer, The update_index of the ref record must be within
the limits set by reftable_writer_set_limits(), or REFTABLE_API_ERROR
is returned. This scenario is currently left untested. Add a test
case for the same.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Chandra Pratap [Tue, 13 Aug 2024 14:34:49 +0000 (20:04 +0530)]

t-reftable-readwrite: use 'for' in place of infinite 'while' loops

Using a for loop with an empty conditional statement is more concise
and easier to read than an infinite 'while' loop in instances
where we need a loop variable. Hence, replace such instances of a
'while' loop with the equivalent 'for' loop.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Chandra Pratap [Tue, 13 Aug 2024 14:34:48 +0000 (20:04 +0530)]

t-reftable-readwrite: use free_names() instead of a for loop

free_names() as defined by reftable/basics.{c,h} frees a NULL
terminated array of malloced strings along with the array itself.
Use this function instead of a for loop to free such an array.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Chandra Pratap [Tue, 13 Aug 2024 14:34:47 +0000 (20:04 +0530)]

t: move reftable/readwrite_test.c to the unit testing framework

reftable/readwrite_test.c exercises the functions defined in
reftable/reader.{c,h} and reftable/writer.{c,h}. Migrate
reftable/readwrite_test.c to the unit testing framework. Migration
involves refactoring the tests to use the unit testing framework
instead of reftable's test framework and renaming the tests to
align with unit-tests' naming conventions.

Since some tests in reftable/readwrite_test.c use the functions
set_test_hash(), noop_flush() and strbuf_add_void() defined in
reftable/test_framework.{c,h} but these files are not #included
in the ported unit test, copy these functions in the new test file.

While at it, ensure structs are 0-initialized with '= { 0 }'
instead of '= { NULL }'.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Tue, 13 Aug 2024 09:14:23 +0000 (11:14 +0200)]

config: hide functions using `the_repository` by default

The config subsystem provides a bunch of legacy functions that read or
set configuration for `the_repository`. The use of those functions is
discouraged, and it is easy to miss the implicit dependency on
`the_repository` that calls to those functions may cause.

Move all config-related functions that use `the_repository` into a block
that gets only conditionally compiled depending on whether or not the
macro has been defined. This also removes all dependencies on that
variable in "config.c", allowing us to remove the definition of said
preprocessor macro.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Tue, 13 Aug 2024 09:14:21 +0000 (11:14 +0200)]

global: prepare for hiding away repo-less config functions

We're about to hide config functions that implicitly depend on
`the_repository` behind the `USE_THE_REPOSITORY_VARIABLE` macro. This
will uncover a bunch of dependents that transitively relied on the
global variable, but didn't define the macro yet.

Adapt them such that we define the macro to prepare for this change.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Tue, 13 Aug 2024 09:14:18 +0000 (11:14 +0200)]

config: don't depend on `the_repository` with branch conditions

When computing branch "includeIf" conditions we use `the_repository` to
obtain the main ref store. We really shouldn't depend on this global
repository though, but should instead use the repository that is being
passed to us via `struct config_include_data`. Otherwise, when parsing
configuration of e.g. submodules, we may end up evaluating the condition
the via the wrong refdb.

Fix this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Tue, 13 Aug 2024 09:14:15 +0000 (11:14 +0200)]

config: don't have setters depend on `the_repository`

Some of the setters that accept a `struct repository` still implicitly
rely on `the_repository` via `git_config_set_multivar_in_file()`. While
this function would typically use the caller-provided path, it knows to
fall back to using the configuration path indicated by `the_repository`.

Adapt those functions to instead use the caller-provided repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Tue, 13 Aug 2024 09:14:12 +0000 (11:14 +0200)]

config: pass repo to functions that rename or copy sections

Refactor functions that rename or copy config sections to accept a
`struct repository` such that we can get rid of the implicit dependency
on `the_repository`. Rename the functions accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Tue, 13 Aug 2024 09:14:07 +0000 (11:14 +0200)]

config: pass repo to `git_die_config()`

Refactor `git_die_config()` to accept a `struct repository` such that we
can get rid of the implicit dependency on `the_repository`. Rename the
function accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Tue, 13 Aug 2024 09:14:03 +0000 (11:14 +0200)]

config: pass repo to `git_config_get_expiry_in_days()`

Refactor `git_config_get_expiry_in_days()` to accept a `struct
repository` such that we can get rid of the implicit dependency on
`the_repository`. Rename the function accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Tue, 13 Aug 2024 09:13:59 +0000 (11:13 +0200)]

config: pass repo to `git_config_get_expiry()`

Refactor `git_config_get_expiry()` to accept a `struct repository` such
that we can get rid of the implicit dependency on `the_repository`.
Rename the function accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Tue, 13 Aug 2024 09:13:57 +0000 (11:13 +0200)]

config: pass repo to `git_config_get_max_percent_split_change()`

Refactor `git_config_get_max_percent_split_change()` to accept a `struct
repository` such that we can get rid of the implicit dependency on
`the_repository`. Rename the function accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Tue, 13 Aug 2024 09:13:54 +0000 (11:13 +0200)]

config: pass repo to `git_config_get_split_index()`

Refactor `git_config_get_split_index()` to accept a `struct repository`
such that we can get rid of the implicit dependency on `the_repository`.
Rename the function accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Tue, 13 Aug 2024 09:13:49 +0000 (11:13 +0200)]

config: pass repo to `git_config_get_index_threads()`

Refactor `git_config_get_index_threads()` to accept a `struct
repository` such that we can get rid of the implicit dependency on
`the_repository`. Rename the function accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Tue, 13 Aug 2024 09:13:46 +0000 (11:13 +0200)]

config: expose `repo_config_clear()`

While we already have `repo_config_clear()` as an alternative to
`git_config_clear()` that doesn't rely on `the_repository`, it is not
exposed to callers outside of the config subsystem. Do so.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Tue, 13 Aug 2024 09:13:43 +0000 (11:13 +0200)]

config: introduce missing setters that take repo as parameter

While we already provide some of the config-setting interfaces with a
`struct repository` as parameter, others only have a variant that
implicitly depends on `the_repository`. Fill in those gaps such that we
can start to deprecate the repo-less variants.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Tue, 13 Aug 2024 09:13:40 +0000 (11:13 +0200)]

path: hide functions using `the_repository` by default

The path subsystem provides a bunch of legacy functions that compute
paths relative to the "gitdir" and "commondir" directories of the global
`the_repository` variable. Use of those functions is discouraged, and it
is easy to miss the implicit dependency on `the_repository` that calls
to those functions may cause.

With `USE_THE_REPOSITORY_VARIABLE`, we have recently introduced a tool
that allows us to get rid of such functions over time. With this macro,
we can hide away functions that have such implicit dependency such that
other subsystems that want to be free of `the_repository` will not use
them by accident.

Move all path-related functions that use `the_repository` into a block
that gets only conditionally compiled depending on whether or not the
macro has been defined. This also removes all dependencies on that
variable in "path.c", allowing us to remove the definition of said
preprocessor macro.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Tue, 13 Aug 2024 09:13:37 +0000 (11:13 +0200)]

path: stop relying on `the_repository` in `worktree_git_path()`

When not provided a worktree, then `worktree_git_path()` will fall back
to returning a path relative to the main repository. In this case, we
implicitly rely on `the_repository` to derive the path. Remove this
dependency by passing a `struct repository` as parameter.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Tue, 13 Aug 2024 09:13:31 +0000 (11:13 +0200)]

path: stop relying on `the_repository` when reporting garbage

We access `the_repository` in `report_linked_checkout_garbage()` both
directly and indirectly via `get_git_dir()`. Remove this dependency by
instead passing a `struct repository` as parameter.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Tue, 13 Aug 2024 09:13:28 +0000 (11:13 +0200)]

hooks: remove implicit dependency on `the_repository`

We implicitly depend on `the_repository` in our hook subsystem because
we use `strbuf_git_path()` to compute hook paths. Remove this dependency
by accepting a `struct repository` as parameter instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Tue, 13 Aug 2024 09:13:25 +0000 (11:13 +0200)]

editor: do not rely on `the_repository` for interactive edits

We implicitly rely on `the_repository` when editing a file interactively
because we call `git_path()`. Adapt the function to instead take a
`struct repository` as a parameter so that we can remove this hidden
dependency.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Tue, 13 Aug 2024 09:13:23 +0000 (11:13 +0200)]

path: expose `do_git_common_path()` as `repo_common_pathv()`

With the same reasoning as the preceding commit, expose the function
`do_git_common_path()` as `repo_common_pathv()`. While at it, reorder
parameters such that they match the order we have in `repo_git_pathv()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Tue, 13 Aug 2024 09:13:20 +0000 (11:13 +0200)]

path: expose `do_git_path()` as `repo_git_pathv()`

We're about to move functions of the "path" subsytem that do not use a
`struct repository` into "path.h" as static inlined functions. This will
require us to call `do_git_path()`, which is internal to "path.c".

Expose the function as `repo_git_pathv()` to prepare for the change.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Junio C Hamano [Fri, 9 Aug 2024 22:31:35 +0000 (15:31 -0700)]

remerge-diff: clean up temporary objdir at a central place

After running a diff between two things, or a series of diffs while
walking the history, the diff computation is concluded by a call to
diff_result_code() to extract the exit status of the diff machinery.

The function can work on "struct diffopt", but all the callers
historically and currently pass "struct diffopt" that is embedded in
the "struct rev_info" that is used to hold the remerge_diff bit and
the remerge_objdir variable that points at the temporary object
directory in use.

Redefine diff_result_code() to take the whole "struct rev_info" to
give it an access to these members related to remerge-diff, so that
it can get rid of the temporary object directory for any and all
callers that used the feature. We can lose the equivalent code to
do so from the code paths for individual commands, diff-tree, diff,
and log.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Junio C Hamano [Fri, 9 Aug 2024 22:30:51 +0000 (15:30 -0700)]

remerge-diff: lazily prepare temporary objdir on demand

It is error prone for each caller that sets revs.remerge_diff bit
to be responsible for preparing a temporary object directory and
rotate it into the list of alternate object stores, making it the
primary object store.

Instead, remove the code to set up and arrange the temporary object
directory from the current callers and implement it in the code that
runs remerge-diff logic. The code to undo the futzing of the list
of alternate object store is still spread across the callers, but we
will deal with it in future steps.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Junio C Hamano [Fri, 9 Aug 2024 17:14:12 +0000 (10:14 -0700)]

doc: grammofix in git-diff-tree

Describe in present tense what the option does when it is given.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Junio C Hamano [Fri, 9 Aug 2024 17:13:49 +0000 (10:13 -0700)]

tutorial: grammofix

We say "these", so "range notations" must be plural.

Reported-by: Furkan Akkurt
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

John Cai [Fri, 9 Aug 2024 15:37:51 +0000 (15:37 +0000)]

ref-filter: populate symref from iterator

With a previous commit, the reference the symbolic ref points to is saved
in the ref iterator records. Instead of making a separate call to
resolve_refdup() each time, we can just populate the ref_array_item with
the value from the iterator.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

John Cai [Fri, 9 Aug 2024 15:37:50 +0000 (15:37 +0000)]

refs: add referent to each_ref_fn

Add a parameter to each_ref_fn so that callers to the ref APIs
that use this function as a callback can have acess to the
unresolved value of a symbolic ref.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

John Cai [Fri, 9 Aug 2024 15:37:49 +0000 (15:37 +0000)]

refs: keep track of unresolved reference value in iterators

Since ref iterators do not hold onto the direct value of a reference
without resolving it, the only way to get ahold of a direct value of a
symbolic ref is to make a separate call to refs_read_symbolic_ref.

To make accessing the direct value of a symbolic ref more efficient,
let's save the direct value of the ref in the iterators for both the
files backend and the reftable backend.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Xing Xin [Fri, 9 Aug 2024 07:24:52 +0000 (07:24 +0000)]

diff-tree: fix crash when used with --remerge-diff

When using "git-diff-tree" to get the tree diff for merge commits with
the diff format set to `remerge`, a bug is triggered as shown below:

  $ git diff-tree -r --remerge-diff 363337e6eb
  363337e6eb812d0c0d785ed4261544f35559ff8b
  BUG: log-tree.c:1006: did a remerge diff without remerge_objdir?!?

This bug is reported by `log-tree.c:do_remerge_diff`, where a bug check
added in commit 7b90ab467a (log: clean unneeded objects during log
--remerge-diff, 2022-02-02) detects the absence of `remerge_objdir` when
attempting to clean up temporary objects generated during the remerge
process.

After some further digging, I find that the remerge-related diff options
were introduced in db757e8b8d (show, log: provide a --remerge-diff
capability, 2022-02-02), which also affect the setup of `rev_info` for
"git-diff-tree", but were not accounted for in the original
implementation (inferred from the commit message).

Elijah Newren, the author of the remerge diff feature, notes that other
callers of `log-tree.c:log_tree_commit` (the only caller of
`log-tree.c:do_remerge_diff`) also exist, but:

  `builtin/am.c`: manually sets all flags; remerge_diff is not among them
  `sequencer.c`: manually sets all flags; remerge_diff is not among them

so `builtin/diff-tree.c` really is the only caller that was overlooked
when remerge-diff functionality was added.

This commit resolves the crash by adding `remerge_objdir` setup logic to
`builtin/diff-tree.c`, mirroring `builtin/log.c:cmd_log_walk_no_free`.
It also includes the necessary cleanup for `remerge_objdir`.

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Junio C Hamano [Thu, 8 Aug 2024 21:19:25 +0000 (14:19 -0700)]

tests: drop use of 'tee' that hides exit status

A few tests have "| tee output" downstream of a git command, and
then inspect the contents of the file. The net effect is that we
use an extra process, and hide the exit status from the upstream git
command.

In any of these tests, I do not see a reason why we want to hide a
possible failure from these git commands. Replace the use of tee
with a plain simple redirection.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Junio C Hamano [Wed, 7 Aug 2024 16:46:53 +0000 (09:46 -0700)]

The third batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Junio C Hamano [Thu, 8 Aug 2024 17:41:20 +0000 (10:41 -0700)]

Merge branch 'ps/p4-tests-updates'

Perforce tests have been updated.

* ps/p4-tests-updates:
  t98xx: mark Perforce tests as memory-leak free
  ci: update Perforce version to r23.2
  t98xx: fix Perforce tests with p4d r23 and newer

commit | commitdiff | tree

Junio C Hamano [Thu, 8 Aug 2024 17:41:20 +0000 (10:41 -0700)]

Merge branch 'dh/encoding-trace-optim'

An expensive operation to prepare tracing was done in re-encoding
code path even when the tracing was not requested, which has been
corrected.

* dh/encoding-trace-optim:
convert: return early when not tracing

commit | commitdiff | tree

Junio C Hamano [Thu, 8 Aug 2024 17:41:19 +0000 (10:41 -0700)]

Merge branch 'ps/doc-more-c-coding-guidelines'

Some project conventions have been added to CodingGuidelines.

* ps/doc-more-c-coding-guidelines:
  Documentation: consistently use spaces inside initializers
  Documentation: document idiomatic function names
  Documentation: document naming schema for structs and their functions
  Documentation: clarify indentation style for C preprocessor directives
  clang-format: fix indentation width for preprocessor directives

commit | commitdiff | tree

Junio C Hamano [Thu, 8 Aug 2024 17:41:19 +0000 (10:41 -0700)]

Merge branch 'rs/grep-omit-blank-lines-after-function-at-eof'

"git grep -W" omits blank lines that follow the found function at
the end of the file, just like it omits blank lines before the next
function.

* rs/grep-omit-blank-lines-after-function-at-eof:
grep: -W: skip trailing empty lines at EOF, too

commit | commitdiff | tree

Junio C Hamano [Thu, 8 Aug 2024 17:41:19 +0000 (10:41 -0700)]

Merge branch 'dd/notes-empty-no-edit-by-default'

"git notes add -m '' --allow-empty" and friends that take prepared
data to create notes should not invoke an editor, but it started
doing so since Git 2.42, which has been corrected.

* dd/notes-empty-no-edit-by-default:
notes: do not trigger editor when adding an empty note

commit | commitdiff | tree

Junio C Hamano [Thu, 8 Aug 2024 17:41:18 +0000 (10:41 -0700)]

Merge branch 'es/shell-check-updates'

Test script linter has been updated to catch an attempt to use
one-shot export construct "VAR=VAL func" for shell functions (which
does not work for some shells) better.

* es/shell-check-updates:
  check-non-portable-shell: improve `VAR=val shell-func` detection
  check-non-portable-shell: suggest alternative for `VAR=val shell-func`
  check-non-portable-shell: loosen one-shot assignment error message
  t4034: fix use of one-shot variable assignment with shell function
  t3430: drop unnecessary one-shot "VAR=val shell-func" invocation

commit | commitdiff | tree

Junio C Hamano [Thu, 8 Aug 2024 17:41:18 +0000 (10:41 -0700)]

Merge branch 'rj/add-p-pager'

A 'P' command to "git add -p" that passes the patch hunk to the
pager has been added.

* rj/add-p-pager:
  add-patch: render hunks through the pager
  pager: introduce wait_for_pager
  pager: do not close fd 2 unnecessarily
  add-patch: test for 'p' command

commit | commitdiff | tree

Junio C Hamano [Thu, 8 Aug 2024 17:41:17 +0000 (10:41 -0700)]

Merge branch 'ks/unit-test-comment-typofix'

Typofix.

* ks/unit-test-comment-typofix:
unit-tests/test-lib: fix typo in check_pointer_eq() description

commit | commitdiff | tree

AbdAlRahman Gad [Thu, 8 Aug 2024 16:32:07 +0000 (19:32 +0300)]

t7004: make use of write_script

Use write_script which takes care of emitting the `#!/bin/sh` line
and the `chmod +x`.

Signed-off-by: AbdAlRahman Gad <abdobngad@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

AbdAlRahman Gad [Thu, 8 Aug 2024 16:32:06 +0000 (19:32 +0300)]

t7004: use single quotes instead of double quotes

Some test bodies and test description are surrounded with double
quotes instead of single quotes, violating our coding style.

Signed-off-by: AbdAlRahman Gad <abdobngad@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

AbdAlRahman Gad [Thu, 8 Aug 2024 16:32:05 +0000 (19:32 +0300)]

t7004: begin the test body on the same line as test_expect_success

Test body should begin with a single quote right after the test
description instead of backslash followed by new line.

Signed-off-by: AbdAlRahman Gad <abdobngad@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

AbdAlRahman Gad [Thu, 8 Aug 2024 16:32:04 +0000 (19:32 +0300)]

t7004: description on the same line as test_expect_success

There are several tests in t7004 where the test description that
follows `test_expect_success` is on a separate line, violating our
coding style. Adapt these to be on the same line.

Signed-off-by: AbdAlRahman Gad <abdobngad@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

AbdAlRahman Gad [Thu, 8 Aug 2024 16:32:02 +0000 (19:32 +0300)]

t7004: do not prepare things outside test_expect_success

Do not prepare expect and other things outside test_expect_success.
If such code fails for some reason, we won't necessarily hear about
it in a timely fashion (or perhaps at all). By placing all code inside
`test_expect_success` it ensures that we know immediately if it fails.

Also add '\' before EOF to avoid shell interpolation and '-' to allow
indentation of the body.

Signed-off-by: AbdAlRahman Gad <abdobngad@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

AbdAlRahman Gad [Thu, 8 Aug 2024 16:32:03 +0000 (19:32 +0300)]

t7004: use indented here-doc

Use <<-\EOF instead of <<\EOF where the latter allows us to indent
the body of the here-doc.

Signed-off-by: AbdAlRahman Gad <abdobngad@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

AbdAlRahman Gad [Thu, 8 Aug 2024 16:32:01 +0000 (19:32 +0300)]

t7004: one command per line

One of the tests in t7004 has multiple commands on a single line,
which is discouraged. Adapt these by splitting up these into one
line per command.

Signed-off-by: AbdAlRahman Gad <abdobngad@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

AbdAlRahman Gad [Thu, 8 Aug 2024 16:32:00 +0000 (19:32 +0300)]

t7004: remove space after redirect operators

Modernize 't7004' by removing whitespace after redirect operators.

Signed-off-by: AbdAlRahman Gad <abdobngad@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Thu, 8 Aug 2024 14:06:58 +0000 (16:06 +0200)]

reftable/stack: handle locked tables during auto-compaction

When compacting tables, it may happen that we want to compact a set of
tables which are already locked by a concurrent process that compacts
them. In the case where we wanted to perform a full compaction of all
tables it is sensible to bail out in this case, as we cannot fulfill the
requested action.

But when performing auto-compaction it isn't necessarily in our best
interest of us to abort the whole operation. For example, due to the
geometric compacting schema that we use, it may be that process A takes
a lot of time to compact the bulk of all tables whereas process B
appends a bunch of new tables to the stack. B would in this case also
notice that it has to compact the tables that process A is compacting
already and thus also try to compact the same range, probably including
the new tables it has appended. But because those tables are locked
already, it will fail and thus abort the complete auto-compaction. The
consequence is that the stack will grow longer and longer while A isn't
yet done with compaction, which will lead to a growing performance
impact.

Instead of aborting auto-compaction altogether, let's gracefully handle
this situation by instead compacting tables which aren't locked. To do
so, instead of locking from the beginning of the slice-to-be-compacted,
we start locking tables from the end of the slice. Once we hit the first
table that is locked already, we abort. If we succeeded to lock two or
more tables, then we simply reduce the slice of tables that we're about
to compact to those which we managed to lock.

This ensures that we can at least make some progress for compaction in
said scenario. It also helps in other scenarios, like for example when a
process died and left a stale lockfile behind. In such a case we can at
least ensure some compaction on a best-effort basis.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Thu, 8 Aug 2024 14:06:53 +0000 (16:06 +0200)]

reftable/stack: fix corruption on concurrent compaction

The locking employed by compaction uses the following schema:

  1. Lock "tables.list" and verify that it matches the version we have
     loaded in core.

  2. Lock each of the tables in the user-supplied range of tables that
     we are supposed to compact. These locks prohibit any concurrent
     process to compact those tables while we are doing that.

  3. Unlock "tables.list". This enables concurrent processes to add new
     tables to the stack, but also allows them to compact tables outside
     of the range of tables that we have locked.

  4. Perform the compaction.

  5. Lock "tables.list" again.

  6. Move the compacted table into place.

  7. Write the new order of tables, including the compacted table, into
     the lockfile.

  8. Commit the lockfile into place.

Letting concurrent processes modify the "tables.list" file while we are
doing the compaction is very much part of the design and thus expected.
After all, it may take some time to compact tables in the case where we
are compacting a lot of very large tables.

But there is a bug in the code. Suppose we have two processes which are
compacting two slices of the table. Given that we lock each of the
tables before compacting them, we know that the slices must be disjunct
from each other. But regardless of that, compaction performed by one
process will always impact what the other process needs to write to the
"tables.list" file.

Right now, we do not check whether the "tables.list" has been changed
after we have locked it for the second time in (5). This has the
consequence that we will always commit the old, cached in-core tables to
disk without paying to respect what the other process has written. This
scenario would then lead to data loss and corruption.

This can even happen in the simpler case of one compacting process and
one writing process. The newly-appended table by the writing process
would get discarded by the compacting process because it never sees the
new table.

Fix this bug by re-checking whether our stack is still up to date after
locking for the second time. If it isn't, then we adjust the indices of
tables to replace in the updated stack.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Thu, 8 Aug 2024 14:06:49 +0000 (16:06 +0200)]

reftable/stack: use lock_file when adding table to "tables.list"

When modifying "tables.list", we need to lock the list before updating
it to ensure that no concurrent writers modify the list at the same
point in time. While we do this via the `lock_file` subsystem when
compacting the stack, we manually handle the lock when adding a new
table to it. While not wrong, it is at least inconsistent.

Refactor the code to consistently lock "tables.list" via the `lock_file`
subsytem.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Thu, 8 Aug 2024 14:06:44 +0000 (16:06 +0200)]

reftable/stack: do not die when fsyncing lock file files

We use `fsync_component_or_die()` when committing an addition to the
"tables.list" lock file, which unsurprisingly dies in case the fsync
fails. Given that this is part of the reftable library, we should never
die and instead let callers handle the error.

Adapt accordingly and use `fsync_component()` instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Thu, 8 Aug 2024 14:06:39 +0000 (16:06 +0200)]

reftable/stack: simplify tracking of table locks

When compacting tables, we store the locks of all tables we are about to
compact in the `table_locks` array. As we currently only ever compact
all tables in the user-provided range or none, we simply track those
locks via the indices of the respective tables in the merged stack.

This is about to change though, as we will introduce a mode where auto
compaction gracefully handles the case of already-locked files. In this
case, it may happen that we only compact a subset of the user-supplied
range of tables. In this case, the indices will not necessarily match
the lock indices anymore.

Refactor the code such that we track the number of locks via a separate
variable. The resulting code is expected to perform the same, but will
make it easier to perform the described change.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Thu, 8 Aug 2024 14:06:34 +0000 (16:06 +0200)]

reftable/stack: update stats on failed full compaction

When auto-compaction fails due to a locking error, we update the
statistics to indicate this failure. We're not doing the same when
performing a full compaction.

Fix this inconsistency by using `stack_compact_range_stats()`, which
handles the stat update for us.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Thu, 8 Aug 2024 14:06:29 +0000 (16:06 +0200)]

reftable/stack: test compaction with already-locked tables

We're lacking test coverage for compacting tables when some of the
tables that we are about to compact are locked. Add two tests that
exercise this, one for auto-compaction and one for full compaction.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Thu, 8 Aug 2024 14:06:24 +0000 (16:06 +0200)]

reftable/stack: extract function to setup stack with N tables

We're about to add two tests, and both of them will want to initialize
the reftable stack with a set of N tables. Introduce a new function that
handles this and refactor existing tests that use such a setup to use
it.

Note that this changes the exact records contained in the preexisting
tests. This is fine though as we only care about the shape of the stack
here, not the shape of each table.

Furthermore, with this change we now start to disable auto compaction
when writing the tables, as otherwise we might not end up with the
expected amount of new tables added. This also slightly changes the
behaviour of these tests, but the properties we care for remain intact.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

Patrick Steinhardt [Thu, 8 Aug 2024 14:06:19 +0000 (16:06 +0200)]

reftable/stack: refactor function to gather table sizes

Refactor the function that gathers table sizes to be more idiomatic. For
one, use `REFTABLE_CALLOC_ARRAY()` instead of `reftable_calloc()`.
Second, avoid using an integer to iterate through the tables in the
reftable stack given that `stack_len` itself is using a `size_t`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

shejialuo [Thu, 8 Aug 2024 11:31:42 +0000 (19:31 +0800)]

fsck: add ref name check for files backend

The git-fsck(1) only implicitly checks the reference, it does not fully
check refs with bad format name such as standalone "@".

However, a file ending with ".lock" should not be marked as having a bad
ref name. It is expected that concurrent writers may have such lock files.
We currently ignore this situation. But for bare ".lock" file, we will
report it as error.

In order to provide such checks, add a new fsck message id "badRefName"
with default ERROR type. Use existing "check_refname_format" to explicit
check the ref name. And add a new unit test to verify the functionality.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

shejialuo [Thu, 8 Aug 2024 11:31:31 +0000 (19:31 +0800)]

files-backend: add unified interface for refs scanning

For refs and reflogs, we need to scan its corresponding directories to
check every regular file or symbolic link which shares the same pattern.
Introduce a unified interface for scanning directories for
files-backend.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

shejialuo [Thu, 8 Aug 2024 11:27:28 +0000 (19:27 +0800)]

builtin/refs: add verify subcommand

Introduce a new subcommand "verify" in git-refs(1) to allow the user to
check the reference database consistency and also this subcommand will
be used as the entry point of checking refs for "git-fsck(1)".

Add "verbose" field into "fsck_options" to indicate whether we should
print verbose messages when checking refs and objects consistency.

Remove bit-field for "strict" field, this is because we cannot take
address of a bit-field which makes it unhandy to set member variables
when parsing the command line options.

The "git-fsck(1)" declares "fsck_options" variable with "static"
identifier which avoids complaint by the leak-checker. However, in
"git-refs verify", we need to do memory clean manually. Thus add
"fsck_options_clear" function in "fsck.c" to provide memory clean
operation.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

shejialuo [Thu, 8 Aug 2024 11:27:17 +0000 (19:27 +0800)]

refs: set up ref consistency check infrastructure

The "struct ref_store" is the base class which contains the "be" pointer
which provides backend-specific functions whose interfaces are defined
in the "ref_storage_be". We could reuse this polymorphism to define only
one interface. For every backend, we need to provide its own function
pointer.

The interfaces defined in the `ref_storage_be` are carefully structured
in semantic. It's organized as the five parts:

1. The name and the initialization interfaces.
2. The ref transaction interfaces.
3. The ref internal interfaces (pack, rename and copy).
4. The ref filesystem interfaces.
5. The reflog related interfaces.

To keep consistent with the git-fsck(1), add a new interface named
"fsck_refs_fn" to the end of "ref_storage_be". This semantic cannot be
grouped into any above five categories. Explicitly add blank line to
make it different from others.

Last, implement placeholder functions for each ref backends.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

shejialuo [Thu, 8 Aug 2024 11:27:08 +0000 (19:27 +0800)]

fsck: add refs report function

Introduce a new struct "fsck_ref_report" to contain the information we
need when reporting refs-related messages.

With the new "fsck_vreport" function, add a new function
"fsck_report_ref" to report refs-related fsck error message. Unlike
"report" function uses the exact parameters, we simply pass "struct
fsck_ref_report *report" as the parameter. This is because at current we
don't know exactly how many fields we need. By passing this parameter,
we don't need to change this function prototype when we want to add more
information into "fsck_ref_report".

We have introduced "fsck_report_ref" function to report the error
message for refs. We still need to add the corresponding callback
function. Create refs-specific "error_func" callback
"fsck_refs_error_function".

Last, add "FSCK_REFS_OPTIONS_DEFAULT" macro to create default options
when checking ref consistency.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

shejialuo [Thu, 8 Aug 2024 11:26:57 +0000 (19:26 +0800)]

fsck: add a unified interface for reporting fsck messages

The static function "report" provided by "fsck.c" aims at checking error
type and calling the callback "error_func" to report the message. Both
refs and objects need to check the error type of the current fsck
message. In order to extract this common behavior, create a new function
"fsck_vreport". Instead of using "...", provide "va_list" to allow more
flexibility.

Instead of changing "report" prototype to be align with the
"fsck_vreport" function, we leave the "report" prototype unchanged due
to the reason that there are nearly 62 references about "report"
function. Simply change "report" function to use "fsck_vreport" to
report objects related messages.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

shejialuo [Thu, 8 Aug 2024 11:26:47 +0000 (19:26 +0800)]

fsck: make "fsck_error" callback generic

The "fsck_error" callback is designed to report the objects-related
error messages. It accepts two parameter "oid" and "object_type" which
is not generic. In order to provide a unified callback which can report
either objects or refs, remove the objects-related parameters and add
the generic parameter "void *fsck_report".

Create a new "fsck_object_report" structure which incorporates the
removed parameters "oid" and "object_type". Then change the
corresponding references to adapt to new "fsck_error" callback.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit | commitdiff | tree

shejialuo [Thu, 8 Aug 2024 11:24:25 +0000 (19:24 +0800)]

fsck: rename objects-related fsck error functions

The names of objects-related fsck error functions are generic. It's OK
when there is only object database check. However, we are going to
introduce refs database check report function. To avoid ambiguity,
rename object-related fsck error functions to explicitly indicate these
functions are used to report objects-related messages.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Unnamed repository; edit this file 'description' to name the repository.

RSS Atom