git.ipfire.org Git - thirdparty/git.git/log

doc: fetch: document `--jobs=0` behavior

In c39952b92 (fetch: choose a sensible default with --jobs=0 again,
2023-02-20), the `--jobs=0` behavior was (re)introduced, but it went
undocumented. Since this is the same behavior as `git -c fetch.parallel=0
fetch`, which is documented, this change creates symmetry between the two
documentation sections.

Signed-off-by: Daniel D. Beck <daniel@ddbeck.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit: use commit graph in `lookup_commit_reference_gently()`

In the preceding commit we refactored `lookup_commit_reference_gently()`
so that it doesn't parse non-commit objects anymore. This has led to a
speedup when git-receive-pack(1) accepts a shallow push into a repo
with lots of refs that point to blobs or trees.

But while this case is now faster, we still have the issue that
accepting pushes with lots of "normal" refs that point to commits are
still slow. This is mostly because we look up the commits via the object
database, and that is rather costly.

Adapt the code to use `repo_parse_commit_gently()` instead of
`parse_object()` to parse the resulting commit object. This function
knows to use the commit-graph to fill in the object, which is way more
cost efficient.

This leads to another significant speedup when accepting shallow pushes.
The following benchmark pushes a single objects from a shallow clone
into a repository with 600,000 references that all point to commits:

  Benchmark 1: git-receive-pack (rev = HEAD~)
    Time (mean ± σ):      9.179 s ±  0.031 s    [User: 8.858 s, System: 0.528 s]
    Range (min … max):    9.154 s …  9.213 s    3 runs

  Benchmark 2: git-receive-pack (rev = HEAD)
    Time (mean ± σ):      2.337 s ±  0.032 s    [User: 2.331 s, System: 0.234 s]
    Range (min … max):    2.308 s …  2.371 s    3 runs

  Summary
    git-receive-pack . </tmp/input (rev = HEAD) ran
      3.93 ± 0.05 times faster than git-receive-pack (rev = HEAD~)

Also, this again leads to a significant reduction in memory allocations.
Before this change:

  HEAP SUMMARY:
      in use at exit: 17,524,978 bytes in 22,393 blocks
    total heap usage: 33,313 allocs, 10,920 frees, 407,774,251 bytes allocated

And after this change:

  HEAP SUMMARY:
      in use at exit: 11,534,036 bytes in 12,406 blocks
    total heap usage: 13,284 allocs, 878 frees, 15,521,451 bytes allocated

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit: make `repo_parse_commit_no_graph()` more robust

In the next commit we will start to parse more commits via the
commit-graph. This change will lead to a segfault though because we try
to access the tree of a commit via `repo_get_commit_tree()`, but:

  - The commit has been parsed via the commit-graph, and thus its
    `maybe_tree` field is not yet populated.

  - We cannot use the commit-graph to populate the commit's tree because
    we're in the process of writing the commit-graph.

The consequence is that we'll get a `NULL` pointer for the tree in
`write_graph_chunk_data()`.

In theory we are already mindful of this situation, as we explicitly use
`repo_parse_commit_no_graph()` to parse the commit without the help of
the commit-graph. But that doesn't do the trick as the commit is already
marked as parsed, so the function will not re-populate it. And as the
commit-graph has been closed, neither will `get_commit_tree_oid()` be
able to load the tree for us.

It seems like this issue can only be hit under artificial circumstances:
the error was hit via `git_test_write_commit_graph_or_die()`, which is
run by git-commit(1) and git-merge(1) in case `GIT_TEST_COMMIT_GRAPH=1`:

  $ GIT_TEST_COMMIT_GRAPH=1 meson test t7507-commit-verbose \
      --test-args=-ix -i
  ...
  ++ git -c commit.verbose=true commit --amend
  hint: Waiting for your editor to close the file...
  ./test-lib.sh: line 1012: 55895 Segmentation fault         (core dumped) git -c commit.verbose=true commit --amend

To the best of my knowledge, this is the only case where we end up
writing a commit-graph in the same process that might have already
consulted the commit-graph to look up arbitrary objects. But regardless
of that, this feels like a bigger accident that is just waiting to
happen.

Make the code more robust by extending `repo_parse_commit_no_graph()` to
unparse a commit first in case we detect it's coming from a graph. This
ensures that we will re-read the object without it, and thus we will
populate `maybe_tree` properly.

This fix shouldn't have any performance consequences: the function is
only ever called in the "commit-graph.c" code, and we'll only re-parse
the commit at most once.

Add an exclusion to our Coccinelle rules so that it doesn't complain
about us accessing `maybe_tree` directly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit: avoid parsing non-commits in `lookup_commit_reference_gently()`

The function `lookup_commit_reference_gently()` can be used to look up a
committish by object ID. As such, the function knows to peel for example
tag objects so that we eventually end up with the commit.

The function is used quite a lot throughout our tree. One such user is
"shallow.c" via `assign_shallow_commits_to_refs()`. The intent of this
function is to figure out whether a shallow push is missing any objects
that are required to satisfy the ref updates, and if so, which of the
ref updates is missing objects.

This is done by painting the tree with `UNINTERESTING`. We start
painting by calling `refs_for_each_ref()` so that we can mark all
existing referenced objects as the boundary of objects that we already
have, and which are supposed to be fully connected. The reference tips
are then parsed via `lookup_commit_reference_gently()`, and the commit
is then marked as uninteresting.

But references may not necessarily point to a committish, and if a lot
of them aren't then this step takes a lot of time. This is mostly due to
the way that `lookup_commit_reference_gently()` is implemented: before
we learn about the type of the object we already call `parse_object()`
on the object ID. This has two consequences:

  - We parse all objects, including trees and blobs, even though we
    don't even need the contents of them.

  - More importantly though, `parse_object()` will cause us to check
    whether the object ID matches its contents.

Combined this means that we deflate and hash every non-committish
object, and that of course ends up being both CPU- and memory-intensive.

Improve the logic so that we first use `peel_object()`. This function
won't parse the object for us, and thus it allows us to learn about the
object's type before we parse and return it.

The following benchmark pushes a single object from a shallow clone into
a repository that has 100,000 refs. These refs were created by listing
all objects via `git rev-list(1) --objects --all` and creating refs for
a subset of them, so lots of those refs will cover non-commit objects.

  Benchmark 1: git-receive-pack (rev = HEAD~)
    Time (mean ± σ):     62.571 s ±  0.413 s    [User: 58.331 s, System: 4.053 s]
    Range (min … max):   62.191 s … 63.010 s    3 runs

  Benchmark 2: git-receive-pack (rev = HEAD)
    Time (mean ± σ):     38.339 s ±  0.192 s    [User: 36.220 s, System: 1.992 s]
    Range (min … max):   38.176 s … 38.551 s    3 runs

  Summary
    git-receive-pack . </tmp/input (rev = HEAD) ran
      1.63 ± 0.01 times faster than git-receive-pack . </tmp/input (rev = HEAD~)

This leads to a sizeable speedup as we now skip reading and parsing
non-commit objects. Before this change we spent around 40% of the time
in `assign_shallow_commits_to_refs()`, after the change we only spend
around 1.2% of the time in there. Almost the entire remainder of the
time is spent in git-rev-list(1) to perform the connectivity checks.

Despite the speedup though, this also leads to a massive reduction in
allocations. Before:

  HEAP SUMMARY:
      in use at exit: 352,480,441 bytes in 97,185 blocks
    total heap usage: 2,793,820 allocs, 2,696,635 frees, 67,271,456,983 bytes allocated

And after:

  HEAP SUMMARY:
      in use at exit: 17,524,978 bytes in 22,393 blocks
    total heap usage: 33,313 allocs, 10,920 frees, 407,774,251 bytes allocated

Note that when all references refer to commits performance stays roughly
the same, as expected. The following benchmark was executed with 600k
commits:

  Benchmark 1: git-receive-pack (rev = HEAD~)
    Time (mean ± σ):      9.101 s ±  0.006 s    [User: 8.800 s, System: 0.520 s]
    Range (min … max):    9.095 s …  9.106 s    3 runs

  Benchmark 2: git-receive-pack (rev = HEAD)
    Time (mean ± σ):      9.128 s ±  0.094 s    [User: 8.820 s, System: 0.522 s]
    Range (min … max):    9.019 s …  9.188 s    3 runs

  Summary
    git-receive-pack (rev = HEAD~) ran
      1.00 ± 0.01 times faster than git-receive-pack (rev = HEAD)

This will be improved in the next commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t: use test_seq -f and pipes in a few more places

Several tests use a pattern that writes to a temporary file like this:

  printf "do something with %d\n" $(test_seq <count>) >tmpfile &&
  git do-something --stdin <tmpfile

Other tests use test_seq's -f parameter, but still write to a temporary file:

  test_seq -f "do something with %d" <count> >input &&
  git do-something --stdin <input

Simplify both of these patterns to

  test_seq -f "do something with %d" <count> |
  git do-something --stdin

Signed-off-by: Aaron Plattner <aplattner@nvidia.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

wt-status: use hash_algo from local repository instead of global the_hash_algo

wt-status.c still uses the global the_hash_algo even though a repository
instance is already available via struct wt_status.

Replace uses of the_hash_algo with the hash algorithm stored in the
associated repository (s->repo->hash_algo or r->hash_algo).

This removes another dependency on global state and keeps wt-status
consistent with local repository usage.

Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

wt-status: replace uses of the_repository with local repository instances

wt-status.c uses the global the_repository in several places even when a
repository instance is already available via struct wt_status *s or struct
repository *r.

Replace these uses of the_repository with the repository available in the
local context (i.e. s->repo or r).

The replacements of all the_repository with s->repo are mostly to cases
where a repository instance is already available via struct wt_status *s
and struct repository *r, all functions operating on struct wt_status *s
are only used after s is initialized by wt_status_prepare(), which sets
s->repo from the repository provided by the caller. As a result, s->repo is
guaranteed to be available and consistent whenever these functions are
invoked.

This reduces reliance on global state and keeps wt-status consistent,
though many functions operating on struct wt_status *s are called via
commit.c and it still relies on the_repository, but within wt-status.c the
local repository pointer refers to the same underlying repository object.

Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

wt-status: pass struct repository through function parameters

Some functions in wt-status.c (count_stash_entries(),
read_line_from_git_path(), abbrev_oid_in_line(), and
read_rebase_todolist()) rely on the_repository as they do not have access
to a local repository instance.

Add a struct repository *r parameter to these functions and pass the local
repository instance through the callers, which already have access to it
either directly by struct repository *r or indirectly by struct wt_state
*s (s->repo).

Replace uses of the_repository in these functions with the passed parameter.

Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'pks-meson-fixes' of github.com:pks-gitlab/git-gui

* 'pks-meson-fixes' of github.com:pks-gitlab/git-gui:
  git-gui: wire up "git-gui--askyesno" with Meson
  git-gui: massage "git-gui--askyesno" with "generate-script.sh"
  git-gui: prefer shell at "/bin/sh" with Meson
  git-gui: fix use of GIT_CEILING_DIRECTORIES

Signed-off-by: Johannes Sixt <j6t@kdbg.org>

format-patch: fix From header in cover letter

"git format-patch" takes "--from=<user ident>" command line option
and uses the given ident for patch e-mails, but this is not applied
to the cover letter, the option is ignored and the committer ident
of the current user is used. This has been the case ever since
"--from" was introduced in a9080475 (teach format-patch to place
other authors into in-body "From", 2013-07-03).

Teach the make_cover_letter() function to honor the option, instead of
always using the current committer identity. Change variable name from
"committer" to "from" to better reflect the purpose of the variable.

Signed-off-by: Mirko Faina <mroik@delayed.space>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

The 5th batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'jc/doc-rerere-update'

Doc update.

* jc/doc-rerere-update:
rerere: minor documantation update

Merge branch 'sd/t7003-test-path-is-helpers'

Test updates.

* sd/t7003-test-path-is-helpers:
t7003: modernize path existence checks using test helpers

Merge branch 'kh/doc-rerere-options-xref'

Doc update.

* kh/doc-rerere-options-xref:
doc: rerere-options.adoc: link to git-rerere(1)

Merge branch 'bk/t2003-modernise'

Test update.

* bk/t2003-modernise:
t2003: modernize path existence checks using test helpers

Merge branch 'rs/commit-commit-stack'

Code clean-up to use the commit_stack API.

* rs/commit-commit-stack:
commit: use commit_stack

Merge branch 'rs/clean-includes'

Clean up redundant includes of header files.

* rs/clean-includes:
remove duplicate includes

Merge branch 'rs/xdiff-wo-the-repository'

Reduce dependency on the_repository of xdiff-interface layer.

* rs/xdiff-wo-the-repository:
xdiff-interface: stop using the_repository

Merge branch 'rs/version-wo-the-repository'

Code clean-up.

* rs/version-wo-the-repository:
version: stop using the_repository

Merge branch 'yt/merge-file-outside-a-repository'

"git merge-file" can be run outside a repository, but it ignored
all configuration, even the per-user ones. The command now uses
available configuration files to find its customization.

* yt/merge-file-outside-a-repository:
merge-file: honor merge.conflictStyle outside of a repository

Merge branch 'ja/doc-synopsis-style-even-more'

A handful of documentation pages have been modernized to use the
"synopsis" style.

* ja/doc-synopsis-style-even-more:
  doc: convert git-show to synopsis style
  doc: fix some style issues in git-clone and for-each-ref-options
  doc: finalize git-clone documentation conversion to synopsis style
  doc: convert git-submodule to synopsis style

Merge branch 'pc/lockfile-pid'

Allow recording process ID of the process that holds the lock next
to a lockfile for diagnosis.

* pc/lockfile-pid:
lockfile: add PID file for debugging stale locks

environment: stop storing `core.attributesFile` globally

The `core.attributeFile` config value is parsed in
git_default_core_config(), loaded eagerly and stored in the global
variable `git_attributes_file`. Storing this value in a global
variable can lead to it being overwritten by another repository when
more than one Git repository run in the same Git process.

Create a new struct `repo_config_values` to hold this value and
other repository dependent values parsed by `git_default_config()`.
This will ensure the current behaviour remains the same while also
enabling the libification of Git.

An accessor function 'repo_config_values()' s created to ensure
that we do not access an uninitialized repository, or an instance
of a different repository than the current one.

Suggested-by: Phillip Wood <phillip.wood123@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Olamide Caleb Bello <belkid98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

gitweb: let page header grow on mobile for long wrapped project names

On mobile, long project names in the page header can wrap to multiple lines,
but the fixed 25px header height does not grow with wrapped content.

Switch the header from fixed height to min-height so it expands as needed
while keeping the same baseline height for single-line titles.

Signed-off-by: Rito Rhymes <rito@ritovision.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

gitweb: fix mobile footer overflow by wrapping text and clearing floats

On narrow screens, footer text can wrap, but the fixed 22px footer height
and floated footer blocks can cause overflow.

Switch to min-height and add a clearfix on the footer container so it grows
to contain wrapped float content cleanly.

Signed-off-by: Rito Rhymes <rito@ritovision.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

gitweb: fix mobile page overflow across log/commit/blob/diff views

On mobile-sized viewports, gitweb pages in log/commit/blob/diff views can
overflow horizontally due to desktop-oriented paddings and fixed-width
preformatted content.

Extend the existing mobile media query to rebalance those layouts: reduce
or clear paddings in log/commit sections, and allow horizontal scrolling
for preformatted blob/diff content instead of forcing page-wide overflow.

All layout adjustments in this patch are mobile-scoped, except one global
safeguard: set overflow-wrap:anywhere on div.log_body. Log content can
contain escaped or non-breaking text that behaves like a single long token
and can overflow at any viewport width, including desktop.

Signed-off-by: Rito Rhymes <rito@ritovision.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

gitweb: prevent project search bar from overflowing on mobile

On narrow screens, the project search input can exceed the available width
and force page-wide horizontal scrolling.

Add a mobile media query and apply side padding to the search container,
then cap the input width to its container with border-box sizing so the
form stays within the viewport.

Signed-off-by: Rito Rhymes <rito@ritovision.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

gitweb: add viewport meta tag for mobile devices

Without a viewport meta tag, phone browsers render gitweb at desktop
width and scale the whole page down to fit the screen.

Add a viewport meta tag so the layout viewport tracks device width.
This is the baseline needed for mobile CSS fixes in follow-up commits.

Signed-off-by: Rito Rhymes <rito@ritovision.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fetch-pack: wire up and enable auto filter logic

Previous commits have set up an infrastructure for `--filter=auto` to
automatically prepare a partial clone filter based on what the server
advertised and the client accepted.

Using that infrastructure, let's now enable the `--filter=auto` option
in `git clone` and `git fetch` by setting `allow_auto_filter` to 1.

Note that these small changes mean that when `git clone --filter=auto`
or `git fetch --filter=auto` are used, "auto" is automatically saved
as the partial clone filter for the server on the client. Therefore
subsequent calls to `git fetch` on the client will automatically use
this "auto" mode even without `--filter=auto`.

Let's also set `allow_auto_filter` to 1 in `transport.c`, as the
transport layer must be able to accept the "auto" filter spec even if
the invoking command hasn't fully parsed it yet.

When an "auto" filter is requested, let's have the "fetch-pack.c" code
in `do_fetch_pack_v2()` compute a filter and send it to the server.

In `do_fetch_pack_v2()` the logic also needs to check for the
"promisor-remote" capability and call `promisor_remote_reply()` to
parse advertised remotes and populate the list of those accepted (and
their filters).

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

promisor-remote: change promisor_remote_reply()'s signature

The `promisor_remote_reply()` function performs two tasks:
1. It uses filter_promisor_remote() to parse the server's
   "promisor-remote" advertisement and to mark accepted remotes in the
   repository configuration.
2. It assembles a reply string containing the accepted remote names to
   send back to the server.

In a following commit, the fetch-pack logic will need to trigger the
side effect (1) to ensure the repository state is correct, but it will
not need to send a reply (2).

To avoid assembling a reply string when it is not needed, let's change
the signature of promisor_remote_reply(). It will now return `void` and
accept a second `char **accepted_out` argument. Only if that argument
is not NULL will a reply string be assembled and returned back to the
caller via that argument.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

promisor-remote: keep advertised filters in memory

Currently, advertised filters are only kept in memory temporarily
during parsing, or persisted to disk if `promisor.storeFields`
contains 'partialCloneFilter'.

In a following commit though, we will add a `--filter=auto` option.
This option will enable the client to use the filters that the server
is suggesting for the promisor remotes the client accepts.

To use them even if `promisor.storeFields` is not configured, these
filters should be stored somewhere for the current session.

Let's add an `advertised_filter` field to `struct promisor_remote`
for that purpose.

To ensure that the filters are available in all cases,
filter_promisor_remote() captures them into a temporary list and
applies them to the `promisor_remote` structs after the potential
configuration reload.

Then the accepted remotes are marked as `accepted` in the repository
state. This ensures that subsequent calls to look up accepted remotes
(like in the filter construction below) actually find them.

In a following commit, we will add a `--filter=auto` option that will
enable a client to use the filters suggested by the server for the
promisor remotes the client accepted.

To enable the client to construct a filter spec based on these filters,
let's also add a `promisor_remote_construct_filter(repo)` function.

This function:

- iterates over all accepted promisor remotes in the repository,
- collects the filters advertised for them (using `advertised_filter`
added in this commit, and
- generates a single filter spec for them.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

list-objects-filter-options: support 'auto' mode for --filter

In a following commit, we are going to allow passing "auto" as a
<filterspec> to the `--filter=<filterspec>` option, but only for some
commands. Other commands that support the `--filter=<filterspec>`
option should still die() when 'auto' is passed.

Let's set up the "list-objects-filter-options.{c,h}" infrastructure to
support that:

- Add a new `unsigned int allow_auto_filter : 1;` flag to
  `struct list_objects_filter_options` which specifies if "auto" is
  accepted or not by the current command.
- Change gently_parse_list_objects_filter() to parse "auto" if it's
  accepted.
- Make sure we die() if "auto" is combined with another filter.
- Update list_objects_filter_release() to preserve the
  allow_auto_filter flag, as this function is often called (via
  opt_parse_list_objects_filter) to reset the struct before parsing a
  new value.

Let's also update `list-objects-filter.c` to recognize the new
`LOFC_AUTO` choice. Since "auto" must be resolved to a concrete filter
before filtering actually begins, initializing a filter with
`LOFC_AUTO` is invalid and will trigger a BUG().

Note that ideally combining "auto" with "auto" could be allowed, but in
practice, it's probably not worth the added code complexity. And if we
really want it, nothing prevents us to allow it in future work.

If we ever want to give a meaning to combining "auto" with a different
filter too, nothing prevents us to do that in future work either.

Also note that the new `allow_auto_filter` flag depends on the command,
not user choices, so it should be reset to the command default when
`struct list_objects_filter_options` instances are reset.

While at it, let's add a new "u-list-objects-filter-options.c" file for
`struct list_objects_filter_options` related unit tests. For now it
only tests gently_parse_list_objects_filter() though.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

doc: fetch: document `--filter=<filter-spec>` option

The `--filter=<filter-spec>` option is documented in most commands that
support it except `git fetch`.

Let's fix that and document this option. To ensure consistency across
commands, let's reuse the exact description currently found in
`git clone`.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fetch: make filter_options local to cmd_fetch()

The `struct list_objects_filter_options filter_options` variable used
in "builtin/fetch.c" to store the parsed filters specified by
`--filter=<filterspec>` is currently a static variable global to the
file.

As we are going to use it more in a following commit, it could become a
bit less easy to understand how it's managed.

To avoid that, let's make it clear that it's owned by cmd_fetch() by
moving its definition into that function and making it non-static.

This requires passing a pointer to it through the prepare_transport(),
do_fetch(), backfill_tags(), fetch_one_setup_partial(), and fetch_one()
functions, but it's quite straightforward.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

clone: make filter_options local to cmd_clone()

The `struct list_objects_filter_options filter_options` variable used
in "builtin/clone.c" to store the parsed filters specified by
`--filter=<filterspec>` is currently a static variable global to the
file.

As we are going to use it more in a following commit, it could become
a bit less easy to understand how it's managed.

To avoid that, let's make it clear that it's owned by cmd_clone() by
moving its definition into that function and making it non-static.

The only additional change to make this work is to pass it as an
argument to checkout(). So it's a small quite cheap cleanup anyway.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

promisor-remote: allow a client to store fields

A previous commit allowed a server to pass additional fields through
the "promisor-remote" protocol capability after the "name" and "url"
fields, specifically the "partialCloneFilter" and "token" fields.

Another previous commit, c213820c51 (promisor-remote: allow a client
to check fields, 2025-09-08), has made it possible for a client to
decide if it accepts a promisor remote advertised by a server based
on these additional fields.

Often though, it would be interesting for the client to just store in
its configuration files these additional fields passed by the server,
so that it can use them when needed.

For example if a token is necessary to access a promisor remote, that
token could be updated frequently only on the server side and then
passed to all the clients through the "promisor-remote" capability,
avoiding the need to update it on all the clients manually.

Storing the token on the client side makes sure that the token is
available when the client needs to access the promisor remotes for a
lazy fetch.

To allow this, let's introduce a new "promisor.storeFields"
configuration variable.

Note that for a partial clone filter, it's less interesting to have
it stored on the client. This is because a filter should be used
right away and we already pass a `--filter=<filter-spec>` option to
`git clone` when starting a partial clone. Storing the filter could
perhaps still be interesting for information purposes.

Like "promisor.checkFields" and "promisor.sendFields", the new
configuration variable should contain a comma or space separated list
of field names. Only the "partialCloneFilter" and "token" field names
are supported for now.

When a server advertises a promisor remote, for example "foo", along
with for example "token=XXXXX" to a client, and on the client side
"promisor.storeFields" contains "token", then the client will store
XXXXX for the "remote.foo.token" variable in its configuration file
and reload its configuration so it can immediately use this new
configuration variable.

A message is emitted on stderr to warn users when the config is
changed.

Note that even if "promisor.acceptFromServer" is set to "all", a
promisor remote has to be already configured on the client side for
some of its config to be changed. In any case no new remote is
configured and no new URL is stored.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

promisor-remote: refactor initialising field lists

In "promisor-remote.c", the fields_sent() and fields_checked()
functions serve similar purposes and contain a small amount of
duplicated code.

As we are going to add a similar function in a following commit,
let's refactor this common code into a new initialize_fields_list()
function.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

shallow: handling fetch relative-deepen

When a shallowed repository gets deepened beyond the beginning of a
merged branch, we may end up with some shallows that are hidden behind
the reachable shallow commits. Added test 'fetching deepen beyond
merged branch' exposes that behaviour.

An example showing the problem based on added test:
0. Whole initial git repo to be cloned from
   Graph:
   *   033585d (HEAD -> main) Merge branch 'branch'
   |\
   | * 984f8b1 (branch) five
   | * ecb578a four
   |/
   * 0cb5d20 three
   * 2b4e70d two
   * 61ba98b one

1. Initial shallow clone --depth=3 (all good)
   Shallows:
   2b4e70da2a10e1d3231a0ae2df396024735601f1
   ecb578a3cf37198d122ae5df7efed9abaca17144
   Graph:
   *   033585d (HEAD -> main) Merge branch 'branch'
   |\
   | * 984f8b1 five
   | * ecb578a (grafted) four
   * 0cb5d20 three
   * 2b4e70d (grafted) two

2. Deepen shallow clone with fetch --deepen=1 (NOT OK)
   Shallows:
   0cb5d204f4ef96ed241feb0f2088c9f4794ba758
   61ba98be443fd51c542eb66585a1f6d7e15fcdae
   Graph:
   *   033585d (HEAD -> main) Merge branch 'branch'
   |\
   | * 984f8b1 five
   | * ecb578a four
   |/
   * 0cb5d20 (grafted) three
   ---
   Note that second shallow commit 61ba98be443fd51c542eb66585a1f6d7e15fcdae
   is not reachable.

On the other hand, it seems that equivalent absolute depth driven
fetches result in all the correct shallows. That led to this proposal,
which unifies absolute and relative deepening in a way that the same
get_shallow_commits() call is used in both cases. The difference is
only that depth is adapted for relative deepening by measuring
equivalent depth of current local shallow commits in the current remote
repo. Thus a new function get_shallows_depth() has been added and the
function get_reachable_list() became redundant / removed.

Same example showing the corrected second step:
2. Deepen shallow clone with fetch --deepen=1 (all good)
   Shallow:
   61ba98be443fd51c542eb66585a1f6d7e15fcdae
   Graph:
   *   033585d (HEAD -> main) Merge branch 'branch'
   |\
   | * 984f8b1 five
   | * ecb578a four
   |/
   * 0cb5d20 three
   * 2b4e70d two
   * 61ba98b (grafted) one

The get_shallows_depth() function also shares the logic of the
get_shallow_commits() function, but it focuses on counting depth of
each existing shallow commit. The minimum result is stored as
'data->deepen_relative', which is set not to be zero for relative
deepening anyway. That way we can always sum 'data->deepen_relative'
and 'depth' values, because 'data->deepen_relative' is always 0 in
absolute deepening.
To avoid duplicating logic between get_shallows_depth() and
get_shallow_commits(), get_shallow_commits() was modified so that
it is used by get_shallows_depth().

Signed-off-by: Samo Pogačnik <samo_pogacnik@t-2.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

shallow: free local object_array allocations

The local object_array 'stack' in get_shallow_commits() function
does not free its dynamic elements before the function returns.
As a result elements remain allocated and their reference forgotten.

Also note, that test 'fetching deepen beyond merged branch' added by
'shallow: handling fetch relative-deepen' patch fails without this
correction in linux-leaks and linux-reftable-leaks test runs.

Signed-off-by: Samo Pogačnik <samo_pogacnik@t-2.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

doc: patch-id: see also git-cherry(1)

git-cherry(1) links to this command. These two commands are similar and
we also mention it in the “Examples” section now. Let’s link to it.

Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

doc: patch-id: add script example

The utility and usability of git-patch-id(1) was discussed
relatively recently:[1]

    Using "git patch-id" is definitely in the "write a script for it"
    category. I don't think I've ever used it as-is from the command
    line as part of a one-liner. It's very much a command that is
    designed purely for scripting, the interface is just odd and baroque
    and doesn't really make sense for one-liners.

    The typical use of patch-id is to generate two *lists* of patch-ids,
    then sort them and use the patch-id as a key to find commits that
    look the same.

The command doc *could* use an example, and since it is a mapper command
it makes sense for that example to be a little script.

Mapping the commits of some branch to an upstream ref allows us to
demonstrate generating two lists, sorting them, joining them, and
finally discarding the patch ID lookup column with cut(1).

† 1: https://lore.kernel.org/workflows/CAHk-=wiN+8EUoik4UeAJ-HPSU7hczQP+8+_uP3vtAy_=YfJ9PQ@mail.gmail.com/

Inspired-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

doc: patch-id: emphasize multi-patch processing

Emphasize that you can pass multiple patches or diffs to this command.

git-patch-id(1) is an efficient pID–commit mapper, able to map
thousands of commits in seconds. But discussions on the command
seem to typically[1] use the standard loop-over-rev-list-and-
shell-out pattern:

    for commit in rev-list:
        prepare a diff from commit | git patch-id

This is unnecessary; we can bulk-process the patches:

    git rev-list --no-merges <ref> |
         git diff-tree --patch --stdin |
         git patch-id --stable

The first version (translated to shell) takes a little over nine
minutes for a commit history of about 78K commits.[2] The other one,
by contrast, takes slightly less than a minute.

Also drop “the” from “standard input”.

† 1: https://stackoverflow.com/a/19758159
† 2: This is `master` of this repository on 2025-10-02

Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

add-patch: allow interfile navigation when selecting hunks

After deciding on all hunks in a file, the interactive session
advances automatically to the next file if there is another,
or the process ends.

Now using the `--no-auto-advance` flag with `--patch`, the process
does not advance automatically. A user can choose to go to the next
file by pressing '>' or the previous file by pressing '<', before or
after deciding on all hunks in the current file.

After all hunks have been decided in a file, the user can still
rework with the file by applying the options available in the permit
set for that hunk, and after all the decisions, the user presses 'q'
to submit.
After all hunks have been decided, the user can press '?' which will
show the hunk selection summary in the help patch remainder text
including the total hunks, number of hunks marked for use and number
of hunks marked for skip.

This feature is enabled by passing the `--no-auto-advance` flag
to `--patch` option of the subcommands add, stash, reset,
and checkout.

Signed-off-by: Abraham Samuel Adekunle <abrahamadekunle50@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

add-patch: allow all-or-none application of patches

When the flag `--no-auto-advance` is used with `--patch`,
if the user has decided `USE` on a hunk in a file, goes to another
file, and then returns to this file and changes the previous
decision on the hunk to `SKIP`, because the patch has already
been applied, the last decision is not registered and the now
SKIPPED hunk is still applied.

Move the logic for applying patches into a function so that we can
reuse this logic to implement the all or non application of the patches
after the user is done with the hunk selection.

Signed-off-by: Abraham Samuel Adekunle <abrahamadekunle50@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

add-patch: modify patch_update_file() signature

The function `patch_update_file()` takes the add_p_state struct
pointer and the current `struct file_diff` pointer and returns an
int.

When using the `--no-auto-advance` flag, we want to be able to request
the next or previous file from the caller.

Modify the function signature to instead take the index of the
current `file_diff` and the `add_p_state` struct pointer so that we
can compute the `file_diff` from the index while also having
access to the file index. This will help us request the next or
previous file from the caller.

Signed-off-by: Abraham Samuel Adekunle <abrahamadekunle50@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

interactive -p: add new `--auto-advance` flag

When using the interactive add, reset, stash or checkout machinery,
we do not have the option of reworking with a file when selecting
hunks, because the session automatically advances to the next file
or ends if we have just one file.

Introduce the flag `--auto-advance` which auto advances by default,
when interactively selecting patches with the '--patch' option.
However, the `--no-auto-advance` option does not auto advance, thereby
allowing users the option to rework with files.

Signed-off-by: Abraham Samuel Adekunle <abrahamadekunle50@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ref-filter: avoid strrchr() in rstrip_ref_components()

To strip path components from our refname string, we repeatedly call
strrchr() to find the trailing slash, shortening the string each time by
assigning NUL over it. This has two downsides:

  1. Calling strrchr() in a loop is quadratic, since each call has to
     call strlen() under the hood to find the end of the string (even
     though we know exactly where it is from the last loop iteration).

  2. We need a temporary buffer, since we're munging the string with NUL
     as we shorten it (which we must do, because strrchr() has no other
     way of knowing what we consider the end of the string).

Using memrchr() would let us fix both of these, but it isn't portable.
So instead, let's just open-code the string traversal from back to
front as we loop.

I doubt that the quadratic nature is a serious concern. You can see it
in practice with something like:

  git init
  git commit --allow-empty -m foo
  echo "$(git rev-parse HEAD) refs/heads$(perl -e 'print "/a" x 500_000')" >.git/packed-refs
  time git for-each-ref --format='%(refname:rstrip=-1)'

That takes ~5.5s to run on my machine before this patch, and ~11ms
after. But I don't think there's a reasonable way for somebody to infect
you with such a garbage ref, as the wire protocol is limited to 64k
pkt-lines. The difference is measurable for me for a 32k-component ref
(about 19ms vs 7ms), so perhaps you could create some chaos by pushing a
lot of them. But we also run into filesystem limits (if the loose
backend is in use), and in practice it seems like there are probably
simpler and more effective ways to waste CPU.

Likewise the extra allocation probably isn't really measurable. In fact,
since our goal is to return an allocated string, we end up having to
make the same allocation anyway (though it is sized to the result,
rather than the input). My main goal was simplicity in avoiding the need
to handle cleaning it up in the early return path.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ref-filter: simplify rstrip_ref_components() memory handling

We're stripping path components from the end of a string, which we do by
assigning a NUL as we parse each component, shortening the string. This
requires an extra temporary buffer to avoid munging our input string.

But the way that we allocate the buffer is unusual. We have an extra
"to_free" variable. Usually this is used when the access variable is
conceptually const, like:

   const char *foo;
   char *to_free = NULL;

   if (...)
           foo = to_free = xstrdup(...);
   else
           foo = some_const_string;
   ...
   free(to_free);

But that's not what's happening here. Our "start" variable always points
to the allocated buffer, and to_free is redundant. Worse, it is marked
as const itself, requiring a cast when we free it.

Let's drop to_free entirely, and mark "start" as non-const, making the
memory handling more clear. As a bonus, this also silences a warning
from glibc-2.43 that our call to strrchr() implicitly strips away the
const-ness of "start".

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ref-filter: simplify lstrip_ref_components() memory handling

We're walking forward in the string, skipping path components from
left-to-right. So when we've stripped as much as we want, the pointer we
have is a complete NUL-terminated string and we can just return it
(after duplicating it, of course). So there is no need for a temporary
allocated string.

But we do make an extra temporary copy due to f0062d3b74 (ref-filter:
free item->value and item->value->s, 2018-10-18). This is probably from
cargo-culting the technique used in rstrip_ref_components(), which
_does_ need a separate string (since it is stripping from the end and
ties off the temporary string with a NUL).

Let's drop the extra allocation. This is slightly more efficient, but
more importantly makes the code much simpler.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ref-filter: factor out refname component counting

The "lstrip" and "rstrip" options to the %(refname) placeholder both
accept a negative length, which asks us to keep that many path
components (rather than stripping that many).

The code to count components and convert the negative value to a
positive was copied from lstrip to rstrip in 1a34728e6b (ref-filter: add
an 'rstrip=<N>' option to atoms which deal with refnames, 2017-01-10).

Let's factor it out into a separate function. This reduces duplication
and also makes the lstrip/rstrip functions much easier to follow, since
the bulk of their code is now the actual stripping.

Note that the computed "remaining" value is currently stored as a
"long", so in theory that's what our function should return. But this is
purely historical. When the variable was added in 0571979bd6 (tag: do
not show ambiguous tag names as "tags/foo", 2016-01-25), we parsed the
value from strtol(), and thus used a long. But these days we take "len"
as an int, and also use an int to count up components. So let's just
consistently use int here. This value could only overflow in a
pathological case (e.g., 4GB worth of "a/a/...") and even then will not
result in out-of-bounds memory access (we keep stripping until we run
out of string to parse).

The minimal Myers diff here is a little hard to read; with --patience
the code movement is shown much more clearly.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Documentation/git-history: document default for "--update-refs="

While we document the values that can be passed to the "--update-refs="
option, we don't give the user any hint what the default behaviour is.
Document it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/history: rename "--ref-action=" to "--update-refs="

With the preceding commit we have changed "--ref-action=" to only
control which refs are supposed to be updated, not what happens with
them. As a consequence, the option is now somewhat misnamed, as we don't
control the action itself anymore.

Rename it to "--update-refs=" to better align it with its new use.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/history: replace "--ref-action=print" with "--dry-run"

The git-history(1) command has the ability to perform a dry-run
that will not end up modifying any references. Instead, we'll only print
any ref updates that would happen as a consequence of performing the
operation.

This mode is somewhat hidden though behind the "--ref-action=print"
option. This command line option has its origin in git-replay(1), where
it's probably an okayish interface as this command is sitting more on
the plumbing side of tools. But git-history(1) is a user-facing tool,
and this way of achieving a dry-run is way too technical and thus not
very discoverable.

Besides usability issues, it also has another issue: the dry-run mode
will always operate as if the user wanted to rewrite all branches. But
in fact, the user also has the option to only update the HEAD reference,
and they might want to perform a dry-run of such an operation, too. We
could of course introduce "--ref-action=print-head", but that would
become even less ergonomic.

Replace "--ref-action=print" with a new "--dry-run" toggle. This new
toggle works with both "--ref-action={head,branches}" and is way more
discoverable.

Add a test to verify that both "--ref-action=" values behave as
expected.

This patch is best viewed with "--ignore-space-change".

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/history: check for merges before asking for user input

The replay infrastructure is not yet capable of replaying merge commits.
Unfortunately, we only notice that we're about to replay merges after we
have already asked the user for input, so any commit message that the
user may have written will be discarded in that case.

Fix this by checking whether the revwalk contains merge commits before
we ask for user input.

Adapt one of the tests that is expected to fail because of this check
to use false(1) as editor. If the editor had been executed by Git, it
would fail with the error message "Aborting commit as launching the
editor failed."

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/history: perform revwalk checks before asking for user input

When setting up the revision walk in git-history(1) we also perform some
verifications whether the request actually looks sane. Unfortunately,
these verifications come _after_ we have already asked the user for the
commit message of the commit that is to be rewritten. So in case any of
the verifications fails, the user will have lost their modifications.

Extract the function to set up the revision walk and call it before we
ask for user input to fix this.

Adapt one of the tests that is expected to fail because of this check
to use false(1) as editor. If the editor had been executed by Git, it
would fail with the error message "Aborting commit as launching the
editor failed."

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

git-gui: wire up "git-gui--askyesno" with Meson

The new "git-gui--askyesno" helper script has only been wired up for our
Makefile, not for Meson. Wire it up properly to bring both build systems
on par with each other again.

Signed-off-by: Patrick Steinhardt <ps@pks.im>

git-gui: massage "git-gui--askyesno" with "generate-script.sh"

In e749c87 (git-gui: provide question helper for retry fallback on
Windows, 2025-08-28) we have introudced a new "git-gui--askyesno" helper
script. While the script is conceptually similar to our existing helper
script "git-gui--askpass", we don't massage it via "generate-script.sh".
This means that build options like the path to the wish shell are not
propagated correctly.

Fix this issue.

Signed-off-by: Patrick Steinhardt <ps@pks.im>

git-gui: prefer shell at "/bin/sh" with Meson

Meson detects the path of the target shell via `find_program("sh")`,
which essentially does a lookup via `PATH`. We know that almost all
systems have "/bin/sh" available though, which makes it the superior
choice as a default value.

Adapt `find_program()` to prefer "/bin/sh" over any other "sh"
executable.

Signed-off-by: Patrick Steinhardt <ps@pks.im>

git-gui: fix use of GIT_CEILING_DIRECTORIES

The GIT-VERSION-GEN script sets up GIT_CEILING_DIRECTORIES so that we
won't accidentally parse version information from an unrelated parent
repository. The ceiling is derived from the source directory by simply
appendign "/.." to it, which mean that we'll only consider the current
directory for repository discovery.

This works alright in the case where git-gui is built as a standalone
project, but it breaks when git-gui is embedded into a _related_ parent
project. This is for example how git-gui is distributed via Git.

Interestingly enough, the version information is still derived properly
when building git-gui via Git's Makefile. In that case we eventually end
up specifying the ceiling directory as "./.." as we use relative paths
there, and that seems to not restrict the repository discovery.

But when building via Meson we specify the source directory as an
absolute path, and if so the repository discovery _is_ stopped. The
consequence is that we won't be able to derive the version in that case.

Fix the issue by adding a new optional parameter to GIT-VERSION-GEN that
allows the caller to override the parent project directory and wire up
new build options for Meson and Make that allows users to specify it.

Note that by default we won't set the parent project directory. This
isn't required for Meson anyway as we already use absolute paths there,
but for our Makefile it means that we still end up with "./.." as
ceiling directory, which is ineffective. But using e.g. pwd(1) as the
default value would break downstream's version generation, unless we
updated git-gui and the Makefile at the same point in time.

Signed-off-by: Patrick Steinhardt <ps@pks.im>

repo: add new flag --keys to git-repo-info

If the user wants to find what are the available keys, they need to
either check the documentation or to ask for all the key-value pairs
by using --all.

Add a new flag --keys for listing only the available keys without
listing the values.

Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

repo: rename the output format "keyvalue" to "lines"

Both subcommands in git-repo(1) accept the "keyvalue" format. This
format is newline-delimited, where the key is separated from the
value with an equals sign.

The name of this option is suboptimal though, as it is both too
limiting while at the same time not really indicating what it
actually does:

  - There is no mention of the format being newline-delimited, which
    is the key differentiator to the "nul" format.

  - Both "nul" and "keyvalue" have a key and a value, so the latter
    is not exactly giving any hint what makes it so special.

  - "keyvalue" requires there to be, well, a key and a value, but we
    want to add additional output that is only going to be newline
    delimited.

Taken together, "keyvalue" is kind of a bad name for this output
format.

Luckily, the git-repo(1) command is still rather new and marked as
experimental, so things aren't cast into stone yet. Rename the
format to "lines" instead to better indicate that the major
difference is that we'll get newline-delimited output. This new name
will also be a better fit for a subsequent extension in git-repo(1).

Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

CodingGuidelines: document NEEDSWORK comments

We often say things like /* NEEDSWORK: further _do_ _this_ */ in
comments, but it is a short-hand to say "We might later want to do
this.  We might not.  We do not have to decide it right now at this
moment in the commit this comment was added.  If somebody is
inclined to work in this area further, the first thing they need to
do is to figure out if it truly makes sense to do so, before blindly
doing it."

This seems to have never been documented.  Do so now.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

The 4th batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'sb/merge-ours-sparse'

"git merge-ours" is taught to work better in a sparse checkout.

* sb/merge-ours-sparse:
merge-ours: integrate with sparse-index
merge-ours: drop USE_THE_REPOSITORY_VARIABLE

Merge branch 'sd/doc-my1c-api-config-reference-fix'

Docfix.

* sd/doc-my1c-api-config-reference-fix:
doc: fix repo_config documentation reference

Merge branch 'jc/ci-test-contrib-too'

Test contrib/ things in CI to catch breakages before they enter the
"next" branch.

* jc/ci-test-contrib-too:
  : Some of our downstream folks run more tests than we do and catch
  : breakages in them, namely, where contrib/*/Makefile has "test" target.
  : Let's make sure we fail upon accepting a new topic that break them in
  : 'seen'.
  ci: ubuntu: use GNU coreutils for dirname
  test: optionally test contrib in CI

Merge branch 'jt/odb-transaction-per-source'

Transaction to create objects (or not) is currently tied to the
repository, but in the future a repository can have multiple object
sources, which may have different transaction mechanisms.  Make the
odb transaction API per object source.

* jt/odb-transaction-per-source:
  odb: transparently handle common transaction behavior
  odb: prepare `struct odb_transaction` to become generic
  object-file: rename transaction functions
  odb: store ODB source in `struct odb_transaction`

Merge branch 'ps/commit-list-functions-renamed'

Rename three functions around the commit_list data structure.

* ps/commit-list-functions-renamed:
  commit: rename `free_commit_list()` to conform to coding guidelines
  commit: rename `reverse_commit_list()` to conform to coding guidelines
  commit: rename `copy_commit_list()` to conform to coding guidelines

Merge branch 'tc/last-modified-not-a-tree'

Giving "git last-modified" a tree (not a commit-ish) died an
uncontrolled death, which has been corrected.

* tc/last-modified-not-a-tree:
  last-modified: verify revision argument is a commit-ish
  last-modified: remove double error message
  last-modified: fix memory leak when more than one commit is given
  last-modified: rewrite error message when more than one commit given

Merge branch 'mc/doc-send-email-signed-off-by-cc'

Docfix.

* mc/doc-send-email-signed-off-by-cc:
doc: send-email: correct --no-signed-off-by-cc misspelling

Merge branch 'cf/c23-const-preserving-strchr-updates-0'

ISO C23 redefines strchr and friends that tradiotionally took
a const pointer and returned a non-const pointer derived from it to
preserve constness (i.e., if you ask for a substring in a const
string, you get a const pointer to the substring).  Update code
paths that used non-const pointer to receive their results that did
not have to be non-const to adjust.

* cf/c23-const-preserving-strchr-updates-0:
  gpg-interface: remove an unnecessary NULL initialization
  global: constify some pointers that are not written to

Merge branch 'jc/diff-highlight-main-master-testfix'

Test fix (in contrib/)

* jc/diff-highlight-main-master-testfix:
diff-highlight: allow testing with Git 3.0 breaking changes

Merge branch 'cs/subtree-reftable-testfix'

Test fix (in contrib/)

* cs/subtree-reftable-testfix:
contrib/subtree: fix tests with reftable backend

Merge branch 'tc/memzero-array'

Coccinelle rules update.

* tc/memzero-array:
cocci: extend MEMZERO_ARRAY() rules

t0213: add trace2 cmd_ancestry tests

Add a new test script t0213-trace2-ancestry.sh that verifies
cmd_ancestry events across all three trace2 output formats (normal,
perf, and event).

The tests use the "400ancestry" test helper to spawn child processes
with controlled trace2 environments. Git alias resolution (which
spawns a child git process) creates a predictable multi-level process
tree. Filter functions extract cmd_ancestry events from each format,
truncating the ancestor list at the outermost "test-tool" so that only
the controlled portion of the tree is verified, regardless of the test
runner environment.

A runtime prerequisite (TRACE2_ANCESTRY) is used to detect whether the
platform has a real procinfo implementation; platforms with only the
stub are skipped.

We must pay attention to an extra ancestor on Windows (MINGW) when
running without the bin-wrappers (such as we do in CI). In this
situation we see an extra "sh.exe" ancestor after "test-tool.exe".

Also update the comment in t0210-trace2-normal.sh to reflect that
ancestry testing now has its own dedicated test script.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

test-tool: extend trace2 helper with 400ancestry

Add a new test helper "400ancestry" to the trace2 test-tool that
spawns a child process with a controlled trace2 environment, capturing
only the child's trace2 output (including cmd_ancestry events) in
isolation.

The helper clears all inherited GIT_TRACE2* variables in the child
and enables only the requested target (normal, perf, or event),
directing output to a specified file. This gives the test suite a
reliable way to capture cmd_ancestry events: the child always sees
"test-tool" as its immediate parent in the process ancestry, providing
a predictable value to verify in tests.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

trace2: emit cmd_ancestry data for Windows

Since 2f732bf15e (tr2: log parent process name, 2021-07-21) it is now
now possible to emit a specific process ancestry event in TRACE2. We
should emit the Windows process ancestry data with the correct event
type.

To not break existing consumers of the data_json "windows/ancestry"
event, we continue to emit the ancestry data as a JSON event.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

trace2: refactor Windows process ancestry trace2 event

In 353d3d77f4 (trace2: collect Windows-specific process information,
2019-02-22) we added process ancestry information for Windows to TRACE2
via a data_json event. It was only later in 2f732bf15e (tr2: log parent
process name, 2021-07-21) that the specific cmd_ancestry event was
added to TRACE2.

In a future commit we will emit the ancestry information with the newer
cmd_ancestry TRACE2 event. Right now, we rework this implementation of
trace2_collect_process_info to separate the calculation of ancestors
from building and emiting the JSON array via a data_json event.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

build: include procinfo.c impl for macOS

Include an implementation of trace2_collect_process_info for macOS.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

trace2: add macOS process ancestry tracing

In 353d3d77f4 (trace2: collect Windows-specific process information,
2019-02-22) Windows-specific process ancestry information was added as
a data_json event to TRACE2. Furthermore in 2f732bf15e (tr2: log
parent process name, 2021-07-21) similar functionality was added for
Linux-based systems, using procfs.

Teach Git to also log process ancestry on macOS using the sysctl with
KERN_PROC to get process information (PPID and process name).
Like the Linux implementation, we use the cmd_ancestry TRACE2 event
rather than using a data_json event and creating another custom data
point.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

templates: detect commit messages containing diffs

If the body of a commit message contains a diff that is not indented
then "git am" will treat that diff as part of the patch rather than
as part of the commit message. This allows it to apply email messages
that were created by adding a commit message in front of a regular diff
without adding the "---" separator used by "git format-patch". This
often surprises users [1-4] so add a check to the sample "commit-msg"
hook to reject messages that would confuse "git am". Even if a project
does not use an email based workflow it is not uncommon for people
to generate patches from it and apply them with "git am". Therefore
it is still worth discouraging the creation of commit messages that
would not be applied correctly.

A further source of confusion when applying patches with "git am" is
the "---" separator that is added by "git format patch". If a commit
message body contains that line then it will be truncated by "git am".
As this is often used by patch authors to add some commentary that
they do not want to end up in the commit message when the patch is
applied, the hook does not complain about the presence of "---" lines
in the message.

Detecting if the message contains a diff is complicated by the
hook being passed the message before it is cleaned up so we need to
ignore any diffs below the scissors line. There are also two possible
config keys to check to find the comment character at the start of
the scissors line. The first paragraph of the commit message becomes
the email subject header which beings "Subject: " and so does not
need to be checked. The trailing ".*" when matching commented lines
ensures that if the comment string ends with a "$" it is not treated
as an anchor.

[1] https://lore.kernel.org/git/bcqvh7ahjjgzpgxwnr4kh3hfkksfruf54refyry3ha7qk7dldf@fij5calmscvm
[2] https://lore.kernel.org/git/ca13705ae4817ffba16f97530637411b59c9eb19.camel@scientia.org/
[3] https://lore.kernel.org/git/d0b577825124ac684ab304d3a1395f3d2d0708e8.1662333027.git.matheus.bernardino@usp.br/
[4] https://lore.kernel.org/git/CAFOYHZC6Qd9wkoWPcTJDxAs9u=FGpHQTkjE-guhwkya0DRVA6g@mail.gmail.com/

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

templates: add .gitattributes entry for sample hooks

The sample hooks are shell scripts but the filenames end with ".sample"
so they need their own .gitattributes rule. Update our editorconfig
settings to match the attributes as well.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sparse-checkout: use string_list_sort_u

sparse_checkout_list() uses string_list_sort and
string_list_remove_duplicates instead of string_list_sort_u.

use string_list_sort_u at that place.

Signed-off-by: Amisha Chhajed <136238836+amishhaa@users.noreply.github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

doc: add caveat about round-tripping format-patch

git-format-patch(1) and git-am(1) deal with formatting commits as
patches and applying them, respectively. Naturally they use a few
delimiters to mark where the commit message ends. This can lead to
surprising behavior when these delimiters are used in the commit
message itself.

git-format-patch(1) will accept any commit message and not warn or error
about these delimiters being used.[1]

Especially problematic is the presence of unindented diffs in the commit
message; the patch machinery will naturally (since the commit message
has ended) try to apply that diff and everything after it.[2]

It is unclear whether any commands in this chain will learn to warn
about this. One concern could be that users have learned to rely on
the three-dash line rule to conveniently add extra-commit message
information in the commit message, knowing that git-am(1) will
ignore it.[4]

All of this is covered already, technically. However, we should spell
out the implications.

† 1: There is also git-commit(1) to consider. However, making that
     command warn or error out over such delimiters would be disruptive
     to all Git users who never use email in their workflow.
† 2: Recently patch(1) caused this issue for a project, but it was noted
     that git-am(1) has the same behavior[3]
† 3: https://github.com/i3/i3/pull/6564#issuecomment-3858381425
† 4: https://lore.kernel.org/git/xmqqldh4b5y2.fsf@gitster.g/
     https://lore.kernel.org/git/V3_format-patch_caveats.354@msgid.xyz/

Reported-by: Matthias Beyer <mail@beyermatthias.de>
Reported-by: Christoph Anton Mitterer <calestyo@scientia.org>
Reported-by: Matheus Tavares <matheus.tavb@gmail.com>
Reported-by: Chris Packham <judge.packham@gmail.com>
Helped-by: Jakob Haufe <sur5r@sur5r.net>
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t9812: modernize test path helpers

Replace assertion-style 'test -f' checks with Git's
test_path_is_file() helper for clearer failures and
consistency.

Signed-off-by: Ashwani Kumar Kamal <ashwanikamal.im421@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

odb: convert `odb_has_object()` flags into an enum

Following the reason in the preceding commit, convert the
`odb_has_object()` flags into an enum.

With this change, we would have catched the misuse of `odb_has_object()`
that was fixed in a preceding commit as the compiler would have
generated a warning:

  ../builtin/backfill.c:71:9: error: implicit conversion from enumeration type 'enum odb_object_info_flag' to different enumeration type 'enum odb_has_object_flag' [-Werror,-Wenum-conversion]
     70 |                 if (!odb_has_object(ctx->repo->objects, &list->oid[i],
        |                      ~~~~~~~~~~~~~~
     71 |                                     OBJECT_INFO_FOR_PREFETCH))
        |                                     ^~~~~~~~~~~~~~~~~~~~~~~~
  1 error generated.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

odb: convert object info flags into an enum

Convert the object info flags into an enum and adapt all functions that
receive these flags as parameters to use the enum instead of an integer.
This serves two purposes:

  - The function signatures become more self-documenting, as callers
    don't have to wonder which flags they expect.

  - The compiler can warn when a wrong flag type is passed.

Note that the second benefit is somewhat limited. For example, when
or-ing multiple enum flags together the result will be an integer, and
the compiler will not warn about such use cases. But where it does help
is when a single flag of the wrong type is passed, as the compiler would
generate a warning in that case.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

odb: drop gaps in object info flag values

The object info flag values have a two gaps in their definitions, where
some bits are skipped over. These gaps don't really hurt, but it makes
one wonder whether anything is going on and whether a subset of flags
might be defined somewhere else.

That's not the case though. Instead, this is a case of flags that have
been dropped in the past:

  - The value 4 was used by `OBJECT_INFO_SKIP_CACHED`, removed in
    9c8a294a1a (sha1-file: remove OBJECT_INFO_SKIP_CACHED, 2020-01-02).

  - The value 8 was used by `OBJECT_INFO_ALLOW_UNKNOWN_TYPE`, removed in
    ae24b032a0 (object-file: drop OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag,
    2025-05-16).

Close those gaps to avoid any more confusion.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/fsck: fix flags passed to `odb_has_object()`

In `mark_object()` we invoke `has_object()` with a value of 1. This is
somewhat fishy given that the function expects a bitset of flags, so any
behaviour that this results in is purely coincidental and may break at
any point in time.

The call to `has_object()` was originally introduced in 9eb86f41de
(fsck: do not lazy fetch known non-promisor object, 2020-08-05). The
intent here was to skip lazy fetches of promisor objects: we have
already verified that the object is not a promisor object, so if the
object is missing it indicates a corrupt repository.

The hardcoded value that we pass maps to `HAS_OBJECT_RECHECK_PACKED`,
which is probably the intended behaviour: `odb_has_object()` will not
fetch promisor objects unless `HAS_OBJECT_FETCH_PROMISOR` is passed, but
we may want to verify that no concurrent process has written the object
that we're trying to read.

Convert the code to use the named flag instead of the the hardcoded
value.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/backfill: fix flags passed to `odb_has_object()`

The function `fill_missing_blobs()` receives an array of object IDs and
verifies for each of them whether the corresponding object exists. If it
doesn't exist, we add it to a set of objects and then batch-fetch all of
the objects at once.

The check for whether or not we already have the object is broken
though: we pass `OBJECT_INFO_FOR_PREFETCH`, but `odb_has_object()`
expects us to pass `HAS_OBJECT_*` flags. The flag expands to:

  - `OBJECT_INFO_QUICK`, which asks the object database to not reprepare
    in case the object wasn't found. This makes sense, as we'd otherwise
    reprepare the object database as many times as we have missing
    objects.

  - `OBJECT_INFO_SKIP_FETCH_OBJECT`, which asks the object database to
    not fetch the object in case it's missing. Again, this makes sense,
    as we want to batch-fetch the objects.

This shows that we indeed want the equivalent of this flag, but of
course represented as `HAS_OBJECT_*` flags.

Luckily, the code is already working correctly. The `OBJECT_INFO` flag
expands to `(1 << 3) | (1 << 4)`, none of which are valid `HAS_OBJECT`
flags. And if no flags are passed, `odb_has_object()` ends up calling
`odb_read_object_info_extended()` with exactly the above two flags that
we wanted to set in the first place.

Of course, this is pure luck, and this can break any moment. So let's
fix this and correct the code to not pass any flags at all.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

diff --anchored: avoid checking unmatched lines

For a line to be an anchor it has to appear in each of the files being
diffed exactly once. With that in mind lets delay checking whether
a line is an anchor until we know there is exactly one instance of
the line in each file. As each line is checked at most once, there
is no need to cache the result of is_anchor() and we can drop that
field from the hashmap entries. When diffing 5000 recent commits in
git.git this gives a modest speedup of ~2%. In the (rather extreme)
example below that consists largely of deletions the speedup is ~16%.

    seq 0 10000000 >old
    printf '%s\n' 300000 100000 200000 >new
    git diff --no-index --anchored=300000 old new

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'master' of https://github.com/j6t/gitk

* 'master' of https://github.com/j6t/gitk:
gitk: fix msgfmt being required
gitk: fix highlighted remote prefix of branches with directories

CodingGuidelines: document // comments

We do not use // comments in our C code, which is implied by the
description of multi-line comment rule and its examples, but is not
explicitly spelled out. Spell it out.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

The 3rd batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'rs/blame-ignore-colors-fix'

"git blame --ignore-revs=... --color-lines" did not account for
ignored revisions passing blame to the same commit an adjacent line
gets blamed for.

* rs/blame-ignore-colors-fix:
blame: fix coloring for repeated suspects

Merge branch 'hs/t9160-test-paths'

Test update.

* hs/t9160-test-paths:
t9160:modernize test path checking

Merge branch 'am/doc-github-contributiong-link-to-submittingpatches'

GitHub repository banner update.

* am/doc-github-contributiong-link-to-submittingpatches:
.github/CONTRIBUTING.md: link to SubmittingPatches on git-scm.com

Merge branch 'kh/doc-shortlog-fix'

Doc fix.

* kh/doc-shortlog-fix:
doc: shortlog: put back trailer paragraphs

Merge branch 'sp/show-index-warn-fallback'

When "git show-index" is run outside a repository, it silently
defaults to SHA-1; the tool now warns when this happens.

* sp/show-index-warn-fallback:
show-index: use gettext wrapping in user facing error messages
show-index: warn when falling back to SHA-1 outside a repository

builtin/pack-objects: don't fetch objects when merging packs

The "--stdin-packs" option can be used to merge objects from multiple
packfiles given via stdin into a new packfile. One big upside of this
option is that we don't have to perform a complete rev walk to enumerate
objects. Instead, we can simply enumerate all objects that are part of
the specified packfiles, which can be significantly faster in very large
repositories.

There is one downside though: when we don't perform a rev walk we also
don't have a good way to learn about the respective object's names. As a
consequence, we cannot use the name hashes as a heuristic to get better
delta selection.

We try to offset this downside though by performing a localized rev
walk: we queue all objects that we're about to repack as interesting,
and all objects from excluded packfiles as uninteresting. We then
perform a best-effort rev walk that allows us to fill in object names.

There is one gotcha here though: when "--exclude-promisor-objects" has
not been given we will perform backfill fetches for any promised objects
that are missing. This used to not be an issue though as this option was
mutually exclusive with "--stdin-packs". But that has changed recently,
and starting with dcc9c7ef47 (builtin/repack: handle promisor packs with
geometric repacking, 2026-01-05) we will now repack promisor packs
during geometric compaction. The consequence is that a geometric repack
may now perform a bunch of backfill fetches.

We of course cannot pass "--exclude-promisor-objects" to fix this
issue -- after all, the whole intent is to repack objects part of a
promisor pack. But arguably we don't have to: the rev walk is intended
as best effort, and we already configure it to ignore missing links to
other objects. So we can adapt the walk to unconditionally disable
fetching any missing objects.

Do so and add a test that verifies we don't backfill any objects.

Reported-by: Lukas Wanko <lwanko@gitlab.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>