Junio C Hamano [Tue, 4 Nov 2025 15:48:07 +0000 (07:48 -0800)]
Merge branch 'qj/doc-my1stcontrib-email-verify'
The "MyFirstContribution" tutorial tells the reader how to send out
their patches; the section gained a hint to verify the message
reached the mailing list.
* qj/doc-my1stcontrib-email-verify:
MyFirstContribution: add note on confirming patches
Junio C Hamano [Tue, 4 Nov 2025 15:48:07 +0000 (07:48 -0800)]
Merge branch 'tz/test-prepare-gnupghome'
Tests did not set up GNUPGHOME correctly, which is fixed but some
flaky tests are exposed in t1016, which needs to be addressed
before this topic can move forward.
* tz/test-prepare-gnupghome:
t/lib-gpg: call prepare_gnupghome() in GPG2 prereq
t/lib-gpg: add prepare_gnupghome() to create GNUPGHOME dir
Junio C Hamano [Mon, 3 Nov 2025 14:49:55 +0000 (06:49 -0800)]
Merge branch 'jk/diff-patch-dry-run-cleanup'
Finishing touches to fixes to the recent regression in "git diff -w
--quiet" and anything that needs to internally generate patch to
see if it turns empty.
* jk/diff-patch-dry-run-cleanup:
diff: simplify run_external_diff() quiet logic
diff: drop dry-run redirection to /dev/null
diff: replace diff_options.dry_run flag with NULL file
diff: drop save/restore of color_moved in dry-run mode
diff: send external diff output to diff_options.file
Junio C Hamano [Mon, 3 Nov 2025 14:49:55 +0000 (06:49 -0800)]
Merge branch 'ps/maintenance-geometric'
"git maintenance" command learns the "geometric" strategy where it
avoids doing maintenance tasks that rebuilds everything from
scratch.
* ps/maintenance-geometric:
t7900: fix a flaky test due to git-repack always regenerating MIDX
builtin/maintenance: introduce "geometric" strategy
builtin/maintenance: make "gc" strategy accessible
builtin/maintenance: extend "maintenance.strategy" to manual maintenance
builtin/maintenance: run maintenance tasks depending on type
builtin/maintenance: improve readability of strategies
builtin/maintenance: don't silently ignore invalid strategy
builtin/maintenance: make the geometric factor configurable
builtin/maintenance: introduce "geometric-repack" task
builtin/gc: make `too_many_loose_objects()` reusable without GC config
builtin/gc: remove global `repack` variable
Junio C Hamano [Mon, 3 Nov 2025 14:49:54 +0000 (06:49 -0800)]
Merge branch 'rs/add-patch-quit'
The 'q'(uit) command in "git add -p" has been improved to quit
without doing any meaningless work before leaving, and giving EOF
(typically control-D) to the prompt is made to behave the same way.
* rs/add-patch-quit:
add-patch: quit on EOF
add-patch: quit without skipping undecided hunks
Junio C Hamano [Thu, 30 Oct 2025 15:00:20 +0000 (08:00 -0700)]
Merge branch 'kf/log-shortlog-completion-fix'
"git shortlog" knows "--committer" and "--author" options, which
the command line completion (in contrib/) did not handle well,
which has been corrected.
* kf/log-shortlog-completion-fix:
completion: complete some 'git log' options
Junio C Hamano [Thu, 30 Oct 2025 15:00:19 +0000 (08:00 -0700)]
Merge branch 'ps/remove-packfile-store-get-packs'
Two slightly different ways to get at "all the packfiles" in API
has been cleaned up.
* ps/remove-packfile-store-get-packs:
packfile: rename `packfile_store_get_all_packs()`
packfile: introduce macro to iterate through packs
packfile: drop `packfile_store_get_packs()`
builtin/grep: simplify how we preload packs
builtin/gc: convert to use `packfile_store_get_all_packs()`
object-name: convert to use `packfile_store_get_all_packs()`
Junio C Hamano [Wed, 29 Oct 2025 19:38:24 +0000 (12:38 -0700)]
Merge branch 'tb/incremental-midx-part-3.1'
Clean-up "git repack" machinery to prepare for incremental update
of midx files.
* tb/incremental-midx-part-3.1: (49 commits)
builtin/repack.c: clean up unused `#include`s
repack: move `write_cruft_pack()` out of the builtin
repack: move `write_filtered_pack()` out of the builtin
repack: move `pack_kept_objects` to `struct pack_objects_args`
repack: move `finish_pack_objects_cmd()` out of the builtin
builtin/repack.c: pass `write_pack_opts` to `finish_pack_objects_cmd()`
repack: extract `write_pack_opts_is_local()`
repack: move `find_pack_prefix()` out of the builtin
builtin/repack.c: use `write_pack_opts` within `write_cruft_pack()`
builtin/repack.c: introduce `struct write_pack_opts`
repack: 'write_midx_included_packs' API from the builtin
builtin/repack.c: inline packs within `write_midx_included_packs()`
builtin/repack.c: pass `repack_write_midx_opts` to `midx_included_packs`
builtin/repack.c: inline `remove_redundant_bitmaps()`
builtin/repack.c: reorder `remove_redundant_bitmaps()`
repack: keep track of MIDX pack names using existing_packs
builtin/repack.c: use a string_list for 'midx_pack_names'
builtin/repack.c: extract opts struct for 'write_midx_included_packs()'
builtin/repack.c: remove ref snapshotting from builtin
repack: remove pack_geometry API from the builtin
...
Jeff King [Wed, 29 Oct 2025 19:10:31 +0000 (15:10 -0400)]
test-tool: fix leak in delete-gpgsig command
We read the input into a strbuf, so we must free it. Without this, t1016
complains in SANITIZE=leak mode.
The bug was introduced in 7673ecd2dc (t1016-compatObjectFormat: add
tests to verify the conversion between objects, 2023-10-01). But nobody
seems to have noticed, probably because CI did not run these tests until
the fix in 6cd8369ef3 (t/lib-gpg: call prepare_gnupghome() in GPG2
prereq, 2024-07-03).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Wed, 29 Oct 2025 15:32:37 +0000 (11:32 -0400)]
doc: document backslash in gitignore patterns
Because gitignore patterns are passed to fnmatch, the handling of
backslashes is the same as it is there: it can be used to escape
metacharacters. We do reference fnmatch(3) for more details, but it may
be friendlier to point out this implication explicitly (especially for
people who want to know about backslash handling and search the
documentation for that word). There are also two cases that I've seen
some other backslash-escaping systems handle differently, so let's
describe those:
1. A backslash before any character treats that character literally,
even if it's not otherwise a meta-character. As opposed to
including the backslash itself (like "foo\bar" in shell expands to
"foo\bar") or forbidding it ("foo\zar" is required to produce a
diagnostic in C).
2. A backslash at the end of the string is an invalid pattern (and not
a literal backslash).
This second one in particular was a point of confusion between our
implementation and the one in JGit. Our wildmatch behavior matches what
POSIX specifies for fnmatch, so the code and documentation are in line.
But let's add a test to cover this case. Note that the behavior here
differs between wildmatch itself (which is what gitignore will use) and
pathspec matching (which will only turn to wildmatch if a literal match
fails). So we match "foo\" to "foo\" in pathspecs, but not via
gitignore.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t1016-compatObjectFormat: really freeze time for reproduciblity
The strategy in t1016-compatObjectFormat is to build two trees with
identical commits, one tree encoded in sha1 the other tree encoded
in sha256 and to use the compatibility code to test and see if
the two trees are identical.
GPG signatures include the current time as part of the signature.
To make gpg deterministic I forced the use of gpg --faked-system-time.
Unfortunately I did not look closely enough.
By default gpg still allows time to move forward with --faked-system-time.
So in those rare instances when the system is heavily loaded and gpg runs
slower than other times, signatures over the exact same data differ
due to timestamps with a minuscule difference.
Reading through the gpg documentation with a close eye, time can be
frozen by including an exclamation point at the end of the argument to
--faked-system-time.
Add the exclamation point so gpg really runs with a fixed notion of time,
resulting in the exact same data having identical gpg signatures.
That is enough that I can run "t1016-compatObjectFormat.sh --stress"
and I don't see any failures.
It is possible a future change to gpg will make replay protection more
robust and not provide a way to allow two separate runs of gpg to
produce exactly the same signature for exactly the same data. If that
happens a deeper comparison of the two repositories will need to be
performed. A comparison that simply verifies the signatures and
compares the data for equality. For now that is a lot of work
for no gain so I am just documenting the possibility.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The placeholder markup is underscore (_), not backtick (`) as well.
The inline-verbatim markup (backticks) handle interior formatting. This
means in this case that it applies HTML `<code>` to the underscores and
`<em>` to the placeholder.
That is the effect, anyway; we can see from the rest of 042d6f34 (doc:
git-checkout: clarify `-b` and `-B`, 2025-09-10) that this was probably
an unintended mix-up.
Acked-by: Julia Evans <julia@jvns.ca> Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 28 Oct 2025 17:29:09 +0000 (10:29 -0700)]
Merge branch 'ps/ci-rust'
CI improvements to handle the recent Rust integration better.
* ps/ci-rust:
rust: support for Windows
ci: verify minimum supported Rust version
ci: check for common Rust mistakes via Clippy
rust/varint: add safety comments
ci: check formatting of our Rust code
ci: deduplicate calls to `apt-get update`
Junio C Hamano [Tue, 28 Oct 2025 17:29:09 +0000 (10:29 -0700)]
Merge branch 'cc/fast-import-strip-signed-tags'
"git fast-import" is taught to handle signed tags, just like it
recently learned to handle signed commits, in different ways.
* cc/fast-import-strip-signed-tags:
fast-import: add '--signed-tags=<mode>' option
fast-export: handle all kinds of tag signatures
t9350: properly count annotated tags
lib-gpg: allow tests with GPGSM or GPGSSH prereq first
doc: git-tag: stop focusing on GPG signed tags
Junio C Hamano [Tue, 28 Oct 2025 17:29:08 +0000 (10:29 -0700)]
Merge branch 'ds/sparse-checkout-clean'
"git sparse-checkout" subcommand learned a new "clean" action to
prune otherwise unused working-tree files that are outside the
areas of interest.
* ds/sparse-checkout-clean:
sparse-index: improve advice message instructions
t: expand tests around sparse merges and clean
sparse-index: point users to new 'clean' action
sparse-checkout: add --verbose option to 'clean'
dir: add generic "walk all files" helper
sparse-checkout: match some 'clean' behavior
sparse-checkout: add basics of 'clean' command
sparse-checkout: remove use of the_repository
t7900: fix a flaky test due to git-repack always regenerating MIDX
When a supposedly no-op "git repack" runs across a second boundary,
because the command always touches the MIDX file and updates its
timestamp, "ls -l $GIT_DIR/objects/pack/" before and after the
operation can change, which causes such a test to fail. Only
compare the *.pack files in the directory before and after the
operation to work around this flakyness.
Arguably, git-repack(1) should learn to not rewrite the MIDX in case
we know it is already up-to-date. But this is not a new problem
introduced via the new geometric maintenance task, so for now it
should be good enough to paper over the issue.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
[jc: taken from diff to v4 from v3 that was already merged to 'next'] Signed-off-by: Junio C Hamano <gitster@pobox.com>
MyFirstContribution: add note on confirming patches
Add a note after the `git send-email` section explaining how
contributors can confirm that their patches reached the mailing
list by checking https://lore.kernel.org/git/. This helps
contributors verify that their emails were successfully delivered.
Signed-off-by: Queen Ediri Jessa <qjessa662@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Xinyu Ruan [Fri, 24 Oct 2025 08:38:14 +0000 (08:38 +0000)]
refs: add missing remove_on_disk implementation for debug backend
The debug ref backend (refs_be_debug) was missing the remove_on_disk
function pointer, which caused a segmentation fault when running
'GIT_TRACE_REFS=1 git refs migrate --ref-format=reftable' commands.
Signed-off-by: Xinyu Ruan <r200981113@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Mon, 27 Oct 2025 02:48:20 +0000 (19:48 -0700)]
Merge branch 'js/ci-github-actions-update' into maint-2.51
CI update.
* js/ci-github-actions-update:
build(deps): bump actions/github-script from 7 to 8
build(deps): bump actions/setup-python from 5 to 6
build(deps): bump actions/checkout from 4 to 5
build(deps): bump actions/download-artifact from 4 to 5
Junio C Hamano [Mon, 27 Oct 2025 02:48:20 +0000 (19:48 -0700)]
Merge branch 'ps/t7528-ssh-agent-uds-workaround' into maint-2.51
Recent OpenSSH creates the Unix domain socket to communicate with
ssh-agent under $HOME instead of /tmp, which causes our test to
fail doe to overly long pathname in our test environment, which has
been worked around by using "ssh-agent -T".
* ps/t7528-ssh-agent-uds-workaround:
t7528: work around ETOOMANY in OpenSSH 10.1 and newer
Junio C Hamano [Mon, 27 Oct 2025 02:48:19 +0000 (19:48 -0700)]
Merge branch 'jk/status-z-short-fix' into maint-2.51
The "--short" option of "git status" that meant output for humans
and "-z" option to show NUL delimited output format did not mix
well, and colored some but not all things. The command has been
updated to color all elements consistently in such a case.
* jk/status-z-short-fix:
status: make coloring of "-z --short" consistent
Junio C Hamano [Mon, 27 Oct 2025 02:48:19 +0000 (19:48 -0700)]
Merge branch 'jk/diff-no-index-with-pathspec-fix' into maint-2.51
An earlier addition to "git diff --no-index A B" to limit the
output with pathspec after the two directories misbehaved when
these directories were given with a trailing slash, which has been
corrected.
* jk/diff-no-index-with-pathspec-fix:
diff --no-index: fix logic for paths ending in '/'
Junio C Hamano [Mon, 27 Oct 2025 02:48:18 +0000 (19:48 -0700)]
Merge branch 'ps/gitlab-ci-disable-windows-monitoring' into maint-2.51
Windows "real-time monitoring" interferes with the execution of
tests and affects negatively in both correctness and performance,
which has been disabled in Gitlab CI.
* ps/gitlab-ci-disable-windows-monitoring:
gitlab-ci: disable realtime monitoring to unbreak Windows jobs
Junio C Hamano [Mon, 27 Oct 2025 02:48:18 +0000 (19:48 -0700)]
Merge branch 'jc/diff-from-contents-fix' into maint-2.51
The code to squelch output from "git diff -w --name-status"
etc. for paths that "git diff -w -p" would have stayed silent
leaked output from dry-run patch generation, which has been
corrected.
* jc/diff-from-contents-fix:
diff: make sure the other caller of diff_flush_patch_quietly() is silent
Junio C Hamano [Mon, 27 Oct 2025 02:48:18 +0000 (19:48 -0700)]
Merge branch 'jk/diff-from-contents-fix' into maint-2.51
Recently we attempted to improve "git diff -w" and friends to
handle cases where patch output would be suppressed, but it
introduced a bug that emits unnecessary output, which has been
corrected.
* jk/diff-from-contents-fix:
diff: restore redirection to /dev/null for diff_from_contents
René Scharfe [Sat, 25 Oct 2025 05:48:28 +0000 (07:48 +0200)]
add-patch: quit on EOF
If we reach the end of the input, e.g. because the user pressed ctrl-D
on Linux, there is no point in showing any more prompts, as we won't get
any reply. Do the same as option 'q' would: Quit.
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Sun, 26 Oct 2025 15:42:22 +0000 (11:42 -0400)]
match_pathname(): give fnmatch one char of prefix context
In match_pathname(), which we use for matching .gitignore and
.gitattribute patterns, we are comparing paths with fnmatch patterns
(actually our extended wildmatch, which will be important). There's an
extra optimization there: we pre-compute the number of non-wildcard
characters at the beginning of the pattern and do an fspathncmp() on
that prefix.
That lets us avoid fnmatch entirely on patterns without wildcards, and
shrinks the amount of work we hand off to fnmatch. For a pattern like
"foo*.txt" and a path "foobar.txt", we'd cut away the matching "foo"
prefix and just pass "*.txt" and "bar.txt" to fnmatch().
But this misses a subtle corner case. In fnmatch(), we'll think
"bar.txt" is the start of the path, but it's not. This doesn't matter
for the pattern above, but consider the wildmatch pattern "foo**/bar"
and the path "foobar". These two should not match, because there is no
file named "bar", and the "**" applies only to the containing directory
name. But after removing the "foo" prefix, fnmatch will get "**/bar" and
"bar", which it does consider a match, because "**/" can match zero
directories.
We can solve this by giving fnmatch a bit more context. As long as it
has one byte of the matched prefix, then it will know that "bar" is not
the start of the path. In this example it would get "o**/bar" and
"obar", and realize that they cannot match.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Sun, 26 Oct 2025 15:41:46 +0000 (11:41 -0400)]
match_pathname(): reorder prefix-match check
As an optimization, we use fspathncmp() to match a prefix of the pattern
that does not contain any wildcards, and then pass the remainder to
fnmatch(). If it has matched the whole thing, we can return early.
Let's shift this early-return check to before we tweak the pattern and
name strings. That will gives us more flexibility with that tweaking.
It might also save a few instructions, but I couldn't measure any
improvement in doing so (and I wouldn't be surprised if an optimizing
compiler could figure that out itself).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Sat, 25 Oct 2025 05:46:42 +0000 (07:46 +0200)]
add-patch: quit without skipping undecided hunks
Option q implies d, i.e., it marks any undecided hunks towards the
bottom of the hunk array as skipped. This is unnecessary; later code
treats undecided and skipped hunks the same: The only functions that
use UNDECIDED_HUNK and SKIP_HUNK are patch_update_file() itself (but
not after its big for loop) and its helpers get_first_undecided() and
display_hunks().
Streamline the handling of option q by quitting immediately.
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Fri, 24 Oct 2025 20:48:05 +0000 (13:48 -0700)]
Merge branch 'ps/t7528-ssh-agent-uds-workaround'
Recent OpenSSH creates the Unix domain socket to communicate with
ssh-agent under $HOME instead of /tmp, which causes our test to
fail doe to overly long pathname in our test environment, which has
been worked around by using "ssh-agent -T".
* ps/t7528-ssh-agent-uds-workaround:
t7528: work around ETOOMANY in OpenSSH 10.1 and newer
Junio C Hamano [Fri, 24 Oct 2025 20:48:04 +0000 (13:48 -0700)]
Merge branch 'jk/status-z-short-fix'
The "--short" option of "git status" that meant output for humans
and "-z" option to show NUL delimited output format did not mix
well, and colored some but not all things. The command has been
updated to color all elements consistently in such a case.
* jk/status-z-short-fix:
status: make coloring of "-z --short" consistent
We have two different repacking strategies in Git:
- The "gc" strategy uses git-gc(1).
- The "incremental" strategy uses multi-pack indices and `git
multi-pack-index repack` to merge together smaller packfiles as
determined by a specific batch size.
The former strategy is our old and trusted default, whereas the latter
has historically been used for our scheduled maintenance. But both
strategies have their shortcomings:
- The "gc" strategy performs regular all-into-one repacks. Furthermore
it is rather inflexible, as it is not easily possible for a user to
enable or disable specific subtasks.
- The "incremental" strategy is not a full replacement for the "gc"
strategy as it doesn't know to prune stale data.
So today, we don't have a strategy that is well-suited for large repos
while being a full replacement for the "gc" strategy.
Introduce a new "geometric" strategy that aims to fill this gap. This
strategy invokes all the usual cleanup tasks that git-gc(1) does like
pruning reflogs and rerere caches as well as stale worktrees. But where
it differs from both the "gc" and "incremental" strategy is that it uses
our geometric repacking infrastructure exposed by git-repack(1) to
repack packfiles. The advantage of geometric repacking is that we only
need to perform an all-into-one repack when the object count in a repo
has grown significantly.
One downside of this strategy is that pruning of unreferenced objects is
not going to happen regularly anymore. Every geometric repack knows to
soak up all loose objects regardless of their reachability, and merging
two or more packs doesn't consider reachability, either. Consequently,
the number of unreachable objects will grow over time.
This is remedied by doing an all-into-one repack instead of a geometric
repack whenever we determine that the geometric repack would end up
merging all packfiles anyway. This all-into-one repack then performs our
usual reachability checks and writes unreachable objects into a cruft
pack. As cruft packs won't ever be merged during geometric repacks we
can thus phase out these objects over time.
Of course, this still means that we retain unreachable objects for far
longer than with the "gc" strategy. But the maintenance strategy is
intended especially for large repositories, where the basic assumption
is that the set of unreachable objects will be significantly dwarfed by
the number of reachable objects.
If this assumption is ever proven to be too disadvantageous we could for
example introduce a time-based strategy: if the largest packfile has not
been touched for longer than $T, we perform an all-into-one repack. But
for now, such a mechanism is deferred into the future as it is not clear
yet whether it is needed in the first place.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/maintenance: make "gc" strategy accessible
While the user can pick the "incremental" maintenance strategy, it is
not possible to explicitly use the "gc" strategy. This has two
downsides:
- It is impossible to use the default "gc" strategy for a specific
repository when the strategy was globally set to a different strategy.
- It is not possible to use git-gc(1) for scheduled maintenance.
Address these issues by making making the "gc" strategy configurable.
Furthermore, extend the strategy so that git-gc(1) runs for both manual
and scheduled maintenance.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/maintenance: extend "maintenance.strategy" to manual maintenance
The "maintenance.strategy" configuration allows users to configure how
Git is supposed to perform repository maintenance. The idea is that we
provide a set of high-level strategies that may be useful in different
contexts, like for example when handling a large monorepo. Furthermore,
the strategy can be tweaked by the user by overriding specific tasks.
In its current form though, the strategy only applies to scheduled
maintenance. This creates something of a gap, as scheduled and manual
maintenance will now use _different_ strategies as the latter would
continue to use git-gc(1) by default. This makes the strategies way less
useful than they could be on the one hand. But even more importantly,
the two different strategies might clash with one another, where one of
the strategies performs maintenance in such a way that it discards
benefits from the other strategy.
So ideally, it should be possible to pick one strategy that then applies
globally to all the different ways that we perform maintenance. This
doesn't necessarily mean that the strategy always does the _same_ thing
for every maintenance type. But it means that the strategy can configure
the different types to work in tandem with each other.
Change the meaning of "maintenance.strategy" accordingly so that the
strategy is applied to both types, manual and scheduled. As preceding
commits have introduced logic to run maintenance tasks depending on this
type we can tweak strategies so that they perform those tasks depending
on the context.
Note that this raises the question of backwards compatibility: when the
user has configured the "incremental" strategy we would have ignored
that strategy beforehand. Instead, repository maintenance would have
continued to use git-gc(1) by default.
But luckily, we can match that behaviour by:
- Keeping all current tasks of the incremental strategy as
`MAINTENANCE_TYPE_SCHEDULED`. This ensures that those tasks will not
run during manual maintenance.
- Configuring the "gc" task so that it is invoked during manual
maintenance.
Like this, the user shouldn't observe any difference in behaviour.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/maintenance: run maintenance tasks depending on type
We basically have three different ways to execute repository
maintenance:
1. Manual maintenance via `git maintenance run`.
2. Automatic maintenance via `git maintenance run --auto`.
3. Scheduled maintenance via `git maintenance run --schedule=`.
At the moment, maintenance strategies only have an effect for the last
type of maintenance. This is about to change in subsequent commits, but
to do so we need to be able to skip some tasks depending on how exactly
maintenance was invoked.
Introduce a new maintenance type that discern between manual (1 & 2) and
scheduled (3) maintenance. Convert the `enabled` field into a bitset so
that it becomes possible to specifiy which tasks exactly should run in a
specific context.
The types picked for existing strategies match the status quo:
- The default strategy is only ever executed as part of a manual
maintenance run. It is not possible to use it for scheduled
maintenance.
- The incremental strategy is only ever executed as part of a
scheduled maintenance run. It is not possible to use it for manual
maintenance.
The strategies will be tweaked in subsequent commits to make use of this
new infrastructure.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/maintenance: improve readability of strategies
Our maintenance strategies are essentially a large array of structures,
where each of the tasks can be enabled and scheduled individually. With
the current layout though all the configuration sits on the same nesting
layer, which makes it a bit hard to discern which initialized fields
belong to what task.
Improve readability of the individual tasks by using nested designated
initializers instead.
Suggested-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When parsing maintenance strategies we completely ignore the
user-configured value in case it is unknown to us. This makes it
basically undiscoverable to the user that scheduled maintenance is
devolving into a no-op.
Change this to instead die when seeing an unknown maintenance strategy.
While at it, pull out the parsing logic into a separate function so that
we can reuse it in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/maintenance: make the geometric factor configurable
The geometric repacking task uses a factor of two for its geometric
sequence, meaning that each next pack must contain at least twice as
many objects as the next-smaller one. In some cases it may be helpful to
configure this factor though to reduce the number of packfile merges
even further, e.g. in very big repositories. But while git-repack(1)
itself supports doing this, the maintenance task does not give us a way
to tune it.
Introduce a new "maintenance.geometric-repack.splitFactor" configuration
to plug this gap.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new "geometric-repack" task. This task uses our geometric
repack infrastructure as provided by git-repack(1) itself, which is a
strategy that especially hosting providers tend to use to amortize the
costs of repacking objects.
There is one issue though with geometric repacks, namely that they
unconditionally pack all loose objects, regardless of whether or not
they are reachable. This is done because it means that we can completely
skip the reachability step, which significantly speeds up the operation.
But it has the big downside that we are unable to expire objects over
time.
To address this issue we thus use a split strategy in this new task:
whenever a geometric repack would merge together all packs, we instead
do an all-into-one repack. By default, these all-into-one repacks have
cruft packs enabled, so unreachable objects would now be written into
their own pack. Consequently, they won't be soaked up during geometric
repacking anymore and can be expired with the next full repack, assuming
that their expiry date has surpassed.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/gc: make `too_many_loose_objects()` reusable without GC config
To decide whether or not a repository needs to be repacked we estimate
the number of loose objects. If the number exceeds a certain threshold
we perform the repack, otherwise we don't.
This is done via `too_many_loose_objects()`, which takes as parameter
the `struct gc_config`. This configuration is only used to determine the
threshold. In a subsequent commit we'll add another caller of this
function that wants to pass a different limit than the one stored in
that structure.
Refactor the function accordingly so that we only take the limit as
parameter instead of the whole structure.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The global `repack` variable is used to store all command line arguments
that we eventually want to pass to git-repack(1). It is being appended
to from multiple different functions, which makes it hard to follow the
logic. Besides being hard to follow, it also makes it unnecessarily hard
to reuse this infrastructure in new code.
Refactor the code so that we store this variable on the stack and pass
a pointer to it around as needed. This is done so that we can reuse
`add_repack_all_options()` in a subsequent commit.
The refactoring itself is straight-forward. One function that deserves
attention though is `need_to_gc()`: this function determines whether or
not we need to execute garbage collection for `git gc --auto`, but also
for `git maintenance run --auto`. But besides figuring out whether we
have to perform GC, the function also sets up the `repack` arguments.
For `git gc --auto` it's trivial to adapt, as we already have the
on-stack variable at our fingertips. But for the maintenance condition
it's less obvious what to do.
As it turns out, we can just use another temporary variable there that
we then immediately discard. If we need to perform GC we execute a child
git-gc(1) process to repack objects for us, and that process will have
to recompute the arguments anyway.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 24 Oct 2025 17:25:07 +0000 (13:25 -0400)]
diff: simplify run_external_diff() quiet logic
We'd sometimes end up in run_external_diff() to do a dry-run diff (e.g.,
to find content-level changes for --quiet). We recognize this quiet mode
by seeing the lack of DIFF_FORMAT_PATCH in the output format.
But since introducing an explicit dry-run check via 3ed5d8bd73 (diff:
stop output garbled message in dry run mode, 2025-10-20), this logic can
never trigger. We can only get to this function by calling
diff_flush_patch(), and that comes from only two places:
1. A dry-run flush comes from diff_flush_patch_quietly(), which is
always in dry-run mode (so the other half of our "||" is true
anyway).
2. A regular flush comes from diff_flush_patch_all_file_pairs(),
which is only called when output_format has DIFF_FORMAT_PATCH in
it.
So we can simplify our "quiet" condition to just checking dry-run mode
(which used to be a specific flag, but recently became just a NULL
"file" pointer). And since it's so simple, we can just do that inline.
This makes the logic about o->file more obvious, since we handle the
NULL and non-stdout cases next to each other.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 24 Oct 2025 17:09:25 +0000 (13:09 -0400)]
diff: drop dry-run redirection to /dev/null
As an added protection against dry-run diffs accidentally producing
output, we redirect diff_options.file to /dev/null. But as of the
previous patch, this now does nothing, since dry-run diffs are
implemented by setting "file" to NULL.
So we can drop this extra code with no change in behavior. This is
effectively a revert of 623f7af284 (diff: restore redirection to
/dev/null for diff_from_contents, 2025-10-17) and 3da4413dbc (diff: make
sure the other caller of diff_flush_patch_quietly() is silent,
2025-10-22), but:
1. We get a conflict because we already dropped the color_moved
handling in an earlier patch. But we just resolve the conflicts to
"theirs" (removing all of the code).
Jeff King [Fri, 24 Oct 2025 17:08:53 +0000 (13:08 -0400)]
diff: replace diff_options.dry_run flag with NULL file
We introduced a dry_run flag to diff_options in b55e6d36eb (diff: ensure
consistent diff behavior with ignore options, 2025-08-08), with the idea
that the lower-level diff code could skip output when it is set.
As we saw with the bugs fixed by 3ed5d8bd73 (diff: stop output garbled
message in dry run mode, 2025-10-20), it is easy to miss spots. In the
end, we located all of them by checking where diff_options.file is used.
That suggests another possible approach: we can replace the dry_run
boolean with a NULL pointer for "file", as we know that using "file" in
dry_run mode would always be an error. This turns any missed spots from
producing extra output[1] into a segfault. Which is less forgiving, but
that is the point: this is indicative of a programming error, and
complaining loudly and immediately is good.
[1] We protect ourselves against garbled output as a separate step,
courtesy of 623f7af284 (diff: restore redirection to /dev/null for
diff_from_contents, 2025-10-17). So in that sense this patch can
only introduce user-visible errors (since any "bugs" were going to
/dev/null before), but the idea is to catch them rather than quietly
send garbage to /dev/null.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 24 Oct 2025 17:07:53 +0000 (13:07 -0400)]
diff: drop save/restore of color_moved in dry-run mode
When running a dry-run content-level diff to check whether a "--quiet"
diff has any changes, we have always unset the color_moved variable
since the feature was added in 2e2d5ac184 (diff.c: color moved lines
differently, 2017-06-30). The reasoning is not given explicitly there,
but presumably the idea is that since color_moved requires a lot of
extra computation to match lines but does not actually affect the
found_changes flag, we want to skip it.
Later, in 3da4413dbc (diff: make sure the other caller of
diff_flush_patch_quietly() is silent, 2025-10-22) we copied the same
idea for other dry-run diffs.
But neither spot actually needs to reset this flag at all, because
diff_flush_patch() will not ever compute color_moved. Nor could it, as
it is only looking at a single file-pair, and we detect moves across
files. So color_moved is checked only when we are actually doing real
DIFF_FORMAT_PATCH output, and call diff_flush_patch_all_file_pairs().
So we can get rid of these extra lines to save and restore the
color_moved flag without changing the behavior at all. (Note that there
is no "restore" to drop for the second caller, as we know at that point
we are not generating any output and can just leave the feature
disabled).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 24 Oct 2025 17:06:49 +0000 (13:06 -0400)]
diff: send external diff output to diff_options.file
Diff output usually goes to the process stdout, but it can be redirected
with the "--output" option. We store this in the "file" pointer of
diff_options, and all of the diff code should write there instead of to
stdout.
But there's one spot we missed: running an external diff cmd. We don't
redirect its output at all, so it just defaults to the stdout of the
parent process. We should instead point its stdout at our output file.
There are a few caveats to watch out for when doing so:
- The stdout field takes a descriptor, not a FILE pointer. We can pull
out the descriptor with fileno().
- The run-command API always closes the stdout descriptor we pass to
it. So we must duplicate it (otherwise we break the FILE pointer,
since it now points to a closed descriptor).
- We don't need to worry about closing our dup'd descriptor, since the
point is that run-command will do it for us (even in the case of an
error). But we do need to make sure we skip the dup() if we set
no_stdout (because then run-command will not look at it at all).
- When the output is going to stdout, it would not be wrong to dup()
the descriptor, but we don't need to. We can skip that extra work
with a simple pointer comparison.
- It seems like you'd need to fflush() the descriptor before handing
off a copy to the child process to prevent out-of-order writes. But
that was true even before this patch! It works because run-command
always calls fflush(NULL) before running the child.
The new test shows the breakage (and fix). The need for duplicating the
descriptor doesn't need a new test; that is covered by the later test
"GIT_EXTERNAL_DIFF with more than one changed files".
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Fri, 24 Oct 2025 16:47:10 +0000 (18:47 +0200)]
commit-reach: avoid commit_list_insert_by_date()
Building a list using commit_list_insert_by_date() has quadratic worst
case complexity. Avoid it by just appending in the loop and sorting at
the end.
The number of merge bases is usually small, so don't expect speedups in
normal repositories. It has no limit, though. The added perf test
shows a nice improvement when dealing with 16384 merge bases:
Test v2.51.1 HEAD
-----------------------------------------------------------------
6010.2: git merge-base 0.55(0.54+0.00) 0.03(0.02+0.00) -94.5%
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Fri, 24 Oct 2025 16:10:37 +0000 (09:10 -0700)]
Merge branch 'jc/diff-from-contents-fix'
The code to squelch output from "git diff -w --name-status"
etc. for paths that "git diff -w -p" would have stayed silent
leaked output from dry-run patch generation, which has been
corrected.
* jc/diff-from-contents-fix:
diff: make sure the other caller of diff_flush_patch_quietly() is silent
Junio C Hamano [Fri, 24 Oct 2025 16:10:37 +0000 (09:10 -0700)]
Merge branch 'jk/diff-from-contents-fix'
Recently we attempted to improve "git diff -w" and friends to
handle cases where patch output would be suppressed, but it
introduced a bug that emits unnecessary output, which has been
corrected.
* jk/diff-from-contents-fix:
diff: restore redirection to /dev/null for diff_from_contents
t7528: work around ETOOMANY in OpenSSH 10.1 and newer
In t7528 we spawn an SSH agent to verify that we can sign a commit via
it. This test has started to fail on some machines:
+++ ssh-agent
unix_listener_tmp: path "/home/pks/Development/git/build/test-output/trash directory.t7528-signed-commit-ssh/.ssh/agent/s.UTulegefEg.agent.UrPHumMXPq" too long for Unix domain socket
main: Couldn't prepare agent socket
As it turns out this is caused by a change in OpenSSH 10.1 [1]:
* ssh-agent(1), sshd(8): move agent listener sockets from /tmp to
under ~/.ssh/agent for both ssh-agent(1) and forwarded sockets
in sshd(8).
Instead of creating the socket in "/tmp", OpenSSH now creates the socket
in our home directory. And as the home directory gets modified to be
located in our test output directory we end up with paths that are
somewhat long. But Linux has a rather short limit of 108 characters for
socket paths, and other systems have even lower limits, so it is very
easy now to exceed the limit and run into the above error.
Work around the issue by using `ssh-agent -T`, which instructs it to
use the old behaviour and create the socket in "/tmp" again. This switch
has only been introduced with 10.1 though, so for older versions we have
to fall back to not using it. That's fine though, as older versions know
to put the socket into "/tmp" already.
An alternative approach would be to abbreviate the socket name itself so
that we create it as e.g. "sshsock" in the trash directory. But taking
the above example we'd still end up with a path that is 91 characters
long. So we wouldn't really have a lot of headroom, and it is quite
likely that some developers would see the issue on their machines.
[1]: https://www.openssh.com/txt/release-10.1
Reported-by: Xi Ruoyao <xry111@xry111.site> Suggested-by: brian m. carlson <sandals@crustytoothpaste.net> Helped-by: Jeff King <peff@peff.net> Helped-by: Lauri Tirkkonen <lauri@hacktheplanet.fi> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
gpg-interface: do not use misdesigned strbuf_split*()
In get_default_ssh_signing_key(), the default ssh signing key is
retrieved in `key_stdout` buf, which is then split using
strbuf_split_max() into up to two strbufs at a new line and the first
strbuf is returned as a `char *`and not a strbuf.
This makes the function lack the use of strbuf API as no edits are
performed on the split tokens.
Simplify the process of retrieving and returning the desired line by
using strchr() to isolate the line and xmemdupz() to return a copy of the
line. This removes the roundabout way of splitting the string into
strbufs, just to return the line.
Reported-by: Junio Hamano <gitster@pobox.com> Helped-by: Christian Couder <christian.couder@gmail.com> Helped-by: Kristoffer Haugsbakk <kristofferhaugsbakk@fastmail.com> Signed-off-by: Olamide Caleb Bello <belkid98@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
gpg-interface: do not use misdesigned strbuf_split*()
In get_ssh_finger_print(), the output of the `ssh-keygen` command is
put into `fingerprint_stdout` strbuf. The string in `fingerprint_stdout`
is then split into up to 3 strbufs using strbuf_split_max(). However they
are not modified after the split thereby not making use of the strbuf API
as the fingerprint token is merely returned as a char * and not a strbuf.
Hence they do not need to be strbufs.
Simplify the process of retrieving and returning the desired token by
using strchr() to isolate the token and xmemdupz() to return a copy of the
token. This removes the roundabout way of splitting the string into
strbufs just to return the token.
Reported-by: Junio Hamano <gitster@pobox.com> Helped-by: Christian Couder <christian.couder@gmail.com> Helped-by: Kristoffer Haugsbakk <kristofferhaugsbakk@fastmail.com> Signed-off-by: Olamide Caleb Bello <belkid98@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Lidong Yan [Sun, 19 Oct 2025 16:30:24 +0000 (00:30 +0800)]
diff: stop output garbled message in dry run mode
Earlier, b55e6d36 (diff: ensure consistent diff behavior with
ignore options, 2025-08-08) introduced "dry-run" mode to the
diff machinery so that content-based diff filtering (like
ignoring space changes or those that match -I<regex>) can first
try to produce a patch without emitting any output to see if
under the given diff filtering condition we would get any output
lines, and a new helper function diff_flush_patch_quietly() was
introduced to use the mode to see an individual filepair needs
to be shown.
However, the solution was not complete. When files are deleted,
file modes change, or there are unmerged entries in the index,
dry-run mode still produces output because we overlooked these
conditions, and as a result, dry-run mode was not quiet.
To fix this, return early in emit_diff_symbol_from_struct() if
we are in dry-run mode. This function will be called by all the
emit functions to output the results. Returning early can avoid
diff output when files are deleted or file modes are changed.
Stop print message in dry-run mode if we have unmerged entries
in index. Discard output of external diff tool in dry-run mode.
Signed-off-by: Lidong Yan <yldhome2d2@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 22 Oct 2025 17:39:12 +0000 (10:39 -0700)]
diff: make sure the other caller of diff_flush_patch_quietly() is silent
Earlier, we added is a protection for the loop that computes "git
diff --quiet -w" to ensure calls to the diff_flush_patch_quietly()
helper stays quiet. Do the same for another loop that deals with
options like "--name-status" to make calls to the same helper.
Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 22 Oct 2025 18:38:58 +0000 (11:38 -0700)]
Merge branch 'je/doc-pull'
Documentation updates.
* je/doc-pull:
doc: git-pull: clarify how to exit a conflicted merge
doc: git-pull: delete the example
doc: git-pull: clarify options for integrating remote branch
doc: git-pull: move <repository> and <refspec> params
Junio C Hamano [Wed, 22 Oct 2025 18:38:58 +0000 (11:38 -0700)]
Merge branch 'bc/sha1-256-interop-01'
The beginning of SHA1-SHA256 interoperability work.
* bc/sha1-256-interop-01:
t1010: use BROKEN_OBJECTS prerequisite
t: allow specifying compatibility hash
fsck: consider gpgsig headers expected in tags
rev-parse: allow printing compatibility hash
docs: add documentation for loose objects
docs: improve ambiguous areas of pack format documentation
docs: reflect actual double signature for tags
docs: update offset order for pack index v3
docs: update pack index v3 format
Junio C Hamano [Wed, 22 Oct 2025 18:38:58 +0000 (11:38 -0700)]
Merge branch 'js/ci-github-actions-update'
CI update.
* js/ci-github-actions-update:
build(deps): bump actions/github-script from 7 to 8
build(deps): bump actions/setup-python from 5 to 6
build(deps): bump actions/checkout from 4 to 5
build(deps): bump actions/download-artifact from 4 to 5
Ruoyu Zhong [Wed, 22 Oct 2025 08:36:13 +0000 (08:36 +0000)]
bisect: fix handling of `help` and invalid subcommands
As documented in git-bisect(1), `git bisect help` should display usage
information. However, since the migration of `git bisect` to a full
builtin command in 73fce29427 (Turn `git bisect` into a full built-in,
2022-11-10), this behavior was broken. Running `git bisect help` would,
instead of showing usage, either fail silently if already in a bisect
session, or otherwise trigger an interactive autostart prompt asking "Do
you want me to do it for you [Y/n]?".
Similarly, since df63421be9 (bisect--helper: handle states directly,
2022-11-10), running invalid subcommands like `git bisect foobar` also
led to the same behavior.
This occurred because `help` and other unrecognized subcommands were
being unconditionally passed to `bisect_state`, which then called
`bisect_autostart`, triggering the interactive prompt.
Fix this by:
1. Adding explicit handling for the `help` subcommand to show usage;
2. Validating that unrecognized commands are actually valid state
commands before calling `bisect_state`;
3. Showing an error with usage for truly invalid commands.
This ensures that `git bisect help` displays the usage as documented,
and invalid commands fail cleanly without entering interactive mode.
Alternate terms are still handled correctly through
`check_and_set_terms`.
Signed-off-by: Ruoyu Zhong <zhongruoyu@outlook.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Emily Yang [Fri, 17 Oct 2025 20:58:59 +0000 (20:58 +0000)]
commit-graph: add new config for changed-paths & recommend it in scalar
The changed-path Bloom filters feature has proven stable and reliable
over several years of use, delivering significant performance
improvement for file history computation in large monorepos. Currently
a user can opt-in to writing the changed-path Bloom filters using the
"--changed-paths" option to "git commit-graph write". The filters will
be persisted until the user drops the filters using the
"--no-changed-paths" option. For this functionality, refer to 0087a87ba8
(commit-graph: persist existence of changed-paths, 2020-07-01).
Large monorepos using Git's background maintenance to build and update
commit-graph files could use an easy switch to enable this feature
without a foreground computation. In this commit, we're proposing a new
config option "commitGraph.changedPaths":
* If "true", "git commit-graph write" will write Bloom filters,
equivalent to passing "--changed-paths";
* If "false" or "unset", Bloom filters will be written during "git
commit-graph write" only if the filters already exist in the current
commit-graph file. This matches the default behaviour of "git
commit-graph write" without any "--[no-]changed-paths" option. Note
"false" can disable a previous "true" config value but doesn't imply
"--no-changed-paths".
This config will always respect the precedence of command line option
"--[no-]changed-paths".
We also set this new config as optional recommended config in scalar to
turn on this feature for large repos.
Helped-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Emily Yang <emilyyang.git@gmail.com> Acked-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>