Junio C Hamano [Sun, 31 May 2026 01:00:38 +0000 (10:00 +0900)]
Merge branch 'jk/commit-graph-lazy-load-fallback'
The logic to lazy-load trees from the commit-graph has been made
more robust by falling back to reading the commit object when
the commit-graph is no longer available.
* jk/commit-graph-lazy-load-fallback:
commit: fall back to full read when maybe_tree is NULL
Junio C Hamano [Sun, 31 May 2026 01:00:38 +0000 (10:00 +0900)]
Merge branch 'pt/fsmonitor-linux'
The fsmonitor daemon has been implemented for Linux.
* pt/fsmonitor-linux:
fsmonitor: convert shown khash to strset in do_handle_client
fsmonitor: add tests for Linux
fsmonitor: add timeout to daemon stop command
fsmonitor: close inherited file descriptors and detach in daemon
run-command: add close_fd_above_stderr option
fsmonitor: implement filesystem change listener for Linux
fsmonitor: rename fsm-settings-darwin.c to fsm-settings-unix.c
fsmonitor: rename fsm-ipc-darwin.c to fsm-ipc-unix.c
fsmonitor: use pthread_cond_timedwait for cookie wait
compat/win32: add pthread_cond_timedwait
fsmonitor: fix hashmap memory leak in fsmonitor_run_daemon
fsmonitor: fix khash memory leak in do_handle_client
t9210, t9211: disable GIT_TEST_SPLIT_INDEX for scalar clone tests
Junio C Hamano [Sun, 31 May 2026 01:00:37 +0000 (10:00 +0900)]
Merge branch 'ps/graph-lane-limit'
The graph output from commands like "git log --graph" can now be
limited to a specified number of lanes, preventing overly wide output
in repositories with many branches.
* ps/graph-lane-limit:
graph: add truncation mark to capped lanes
graph: add --graph-lane-limit option
graph: limit the graph width to a hard-coded max
Junio C Hamano [Sun, 31 May 2026 01:00:37 +0000 (10:00 +0900)]
Merge branch 'jr/bisect-custom-terms-in-output'
"git bisect" now uses the selected terms (e.g., old/new) more
consistently in its output.
* jr/bisect-custom-terms-in-output:
rev-parse: use selected alternate terms to look up refs
bisect: print bisect terms in single quotes
bisect: use selected alternate terms in status output
Junio C Hamano [Sun, 31 May 2026 01:00:37 +0000 (10:00 +0900)]
Merge branch 'kk/tips-reachable-from-bases-optim'
Revision traversal optimization.
* kk/tips-reachable-from-bases-optim:
t6600: add tests for duplicate tips in tips_reachable_from_bases()
commit-reach: use object flags for tips_reachable_from_bases()
Junio C Hamano [Wed, 27 May 2026 05:15:46 +0000 (14:15 +0900)]
Merge branch 'ps/setup-wo-the-repository'
Many uses of the_repository have been updated to use a more
appropriate struct repository instance in the setup.c codepath.
* ps/setup-wo-the-repository:
setup: stop using `the_repository` in `init_db()`
setup: stop using `the_repository` in `create_reference_database()`
setup: stop using `the_repository` in `initialize_repository_version()`
setup: stop using `the_repository` in `check_repository_format()`
setup: stop using `the_repository` in `upgrade_repository_format()`
setup: stop using `the_repository` in `setup_git_directory()`
setup: stop using `the_repository` in `setup_git_directory_gently()`
setup: stop using `the_repository` in `setup_git_env()`
setup: stop using `the_repository` in `set_git_work_tree()`
setup: stop using `the_repository` in `setup_work_tree()`
setup: stop using `the_repository` in `enter_repo()`
setup: stop using `the_repository` in `verify_non_filename()`
setup: stop using `the_repository` in `verify_filename()`
setup: stop using `the_repository` in `path_inside_repo()`
setup: stop using `the_repository` in `prefix_path()`
setup: stop using `the_repository` in `is_inside_work_tree()`
setup: stop using `the_repository` in `is_inside_git_dir()`
setup: replace use of `the_repository` in static functions
Junio C Hamano [Wed, 27 May 2026 05:15:46 +0000 (14:15 +0900)]
Merge branch 'ps/odb-in-memory'
Add a new odb "in-memory" source that is meant to only hold
tentative objects (like the virtual blob object that represents the
working tree file used by "git blame").
* ps/odb-in-memory:
t/unit-tests: add tests for the in-memory object source
odb: generic in-memory source
odb/source-inmemory: stub out remaining functions
odb/source-inmemory: implement `freshen_object()` callback
odb/source-inmemory: implement `count_objects()` callback
odb/source-inmemory: implement `find_abbrev_len()` callback
odb/source-inmemory: implement `for_each_object()` callback
odb/source-inmemory: convert to use oidtree
oidtree: add ability to store data
cbtree: allow using arbitrary wrapper structures for nodes
odb/source-inmemory: implement `write_object_stream()` callback
odb/source-inmemory: implement `write_object()` callback
odb/source-inmemory: implement `read_object_stream()` callback
odb/source-inmemory: implement `read_object_info()` callback
odb: fix unnecessary call to `find_cached_object()`
odb/source-inmemory: implement `free()` callback
odb: introduce "in-memory" source
Junio C Hamano [Wed, 27 May 2026 05:15:45 +0000 (14:15 +0900)]
Merge branch 'tb/incremental-midx-part-3.3'
The repacking code has been refactored, compaction of MIDX layers
has been implemented, and an incremental strategy that does not require
all-into-one repacking has been introduced.
* tb/incremental-midx-part-3.3:
repack: allow `--write-midx=incremental` without `--geometric`
repack: introduce `--write-midx=incremental`
repack: implement incremental MIDX repacking
packfile: ensure `close_pack_revindex()` frees in-memory revindex
builtin/repack.c: convert `--write-midx` to an `OPT_CALLBACK`
repack-geometry: prepare for incremental MIDX repacking
repack-midx: extract `repack_fill_midx_stdin_packs()`
repack-midx: factor out `repack_prepare_midx_command()`
midx: expose `midx_layer_contains_pack()`
repack: track the ODB source via existing_packs
midx: support custom `--base` for incremental MIDX writes
midx: introduce `--no-write-chain-file` for incremental MIDX writes
midx: use `strvec` for `keep_hashes`
midx: build `keep_hashes` array in order
midx: use `strset` for retained MIDX files
midx-write: handle noop writes when converting incremental chains
Junio C Hamano [Wed, 27 May 2026 05:15:45 +0000 (14:15 +0900)]
Merge branch 'ds/fetch-negotiation-options'
The negotiation tip options in "git fetch" have been reworked to
allow requiring certain refs to be sent as "have" lines, and to
restrict negotiation to a specific set of refs.
* ds/fetch-negotiation-options:
send-pack: pass negotiation config in push
remote: add remote.*.negotiationInclude config
fetch: add --negotiation-include option for negotiation
negotiator: add have_sent() interface
remote: add remote.*.negotiationRestrict config
transport: rename negotiation_tips
fetch: add --negotiation-restrict option
t5516: fix test order flakiness
Junio C Hamano [Wed, 27 May 2026 05:15:44 +0000 (14:15 +0900)]
Merge branch 'kn/refs-fsck-skip-lock-files'
The consistency checks for the files reference backend have been updated
to skip lock files earlier, avoiding unnecessary parsing of
intermediate files.
* kn/refs-fsck-skip-lock-files:
refs/files: skip lock files during consistency checks
Junio C Hamano [Wed, 27 May 2026 05:15:43 +0000 (14:15 +0900)]
Merge branch 'en/batch-prefetch'
In a lazy clone, "git cherry" and "git grep" often fetch necessary
blob objects one by one from promisor remotes. It has been corrected
to collect necessary object names and fetch them in bulk to gain
reasonable performance.
Junio C Hamano [Wed, 27 May 2026 05:15:43 +0000 (14:15 +0900)]
Merge branch 'pb/doc-diff-format-updates'
Doc updates.
* pb/doc-diff-format-updates:
diff-format.adoc: mode and hash are 0* for unmerged paths from index only
diff-format.adoc: 'git diff-files' prints two lines for unmerged files
diff-format.adoc: remove mention of diff-tree specific output
Junio C Hamano [Wed, 27 May 2026 05:15:43 +0000 (14:15 +0900)]
Merge branch 'kk/limit-list-optim'
The limit_list() function, which is one of the core parts of the
revision traversal infrastructure, has been optimized by replacing
its use of a linear list with a priority queue.
* kk/limit-list-optim:
revision: use priority queue in limit_list()
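As a rough illustration of the kind of change summarized above (this is not the limit_list() code itself), the sketch below compares keeping pending commits in a date-sorted linear list with keeping them in a binary heap. The date-based ordering, the tie-breaking rule, and all names here are assumptions made for the example.

```python
import heapq

def pop_order_linear(commits):
    """Keep pending commits in a list sorted newest-first; each insert is O(n)."""
    pending = []
    for name, date in commits:
        i = 0
        # Walk past every entry that is at least as new as the one being added.
        while i < len(pending) and pending[i][1] >= date:
            i += 1
        pending.insert(i, (name, date))
    return [name for name, _ in pending]

def pop_order_heap(commits):
    """Keep pending commits in a heap; each insert and pop is O(log n)."""
    heap = []
    for seq, (name, date) in enumerate(commits):
        # Negate the date so the newest commit is popped first; seq breaks ties
        # in insertion order, matching the linear version.
        heapq.heappush(heap, (-date, seq, name))
    order = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order

# Hypothetical commits as (name, timestamp) pairs.
commits = [("A", 30), ("B", 10), ("C", 20), ("D", 20)]
linear = pop_order_linear(commits)
heaped = pop_order_heap(commits)
```

Both functions yield the same order; only the cost of each insertion differs.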
Junio C Hamano [Mon, 25 May 2026 00:40:08 +0000 (09:40 +0900)]
Merge branch 'jk/dumb-http-alternate-fix'
The HTTP walker misinterpreted the alternates file that gives an
absolute path when the server URL does not have the final slash
(i.e., "https://example.com" not "https://example.com/").
* jk/dumb-http-alternate-fix:
http: handle absolute-path alternates from server root
Junio C Hamano [Mon, 25 May 2026 00:40:08 +0000 (09:40 +0900)]
Merge branch 'jk/pretty-no-strbuf-presizing'
Remove ineffective strbuf presizing, which would have computed either
an allocation that would not have fit in the available memory anyway,
or one so small (due to integer wraparound) that the buffer would
immediately have to grow automatically.
* jk/pretty-no-strbuf-presizing:
pretty: drop strbuf pre-sizing from add_rfc2047()
Junio C Hamano [Mon, 25 May 2026 00:40:07 +0000 (09:40 +0900)]
Merge branch 'mm/diff-U-takes-no-negative-values'
The command line parser for "git diff" learned that a few options take
only non-negative integers.
* mm/diff-U-takes-no-negative-values:
parse-options: clarify what "negated" means for PARSE_OPT_NONEG
xdiff: guard against negative context lengths
diff: reject negative values for -U/--unified
diff: reject negative values for --inter-hunk-context
Junio C Hamano [Thu, 21 May 2026 23:48:20 +0000 (08:48 +0900)]
Merge branch 'ps/maintenance-daemonize-lockfix'
"git maintenance" that goes background did not use the lockfile to
prevent multiple maintenance processes from running at the same
time, which has been corrected.
* ps/maintenance-daemonize-lockfix:
run-command: honor "gc.auto" for auto-maintenance
builtin/maintenance: fix locking with "--detach"
Junio C Hamano [Thu, 21 May 2026 03:28:55 +0000 (12:28 +0900)]
Merge branch 'js/mingw-no-nedmalloc' into maint-2.54
Stop using the unmaintained custom allocator in the Windows build,
which was the last user of the code.
* js/mingw-no-nedmalloc:
mingw: remove the vendored compat/nedmalloc/ subtree
mingw: drop the build-system plumbing for nedmalloc
mingw: stop using nedmalloc
Junio C Hamano [Thu, 21 May 2026 03:27:47 +0000 (12:27 +0900)]
Merge branch 'js/maintenance-fix-deadlock-on-win10' into maint-2.54
To help Windows 10 installations, avoid removing files whose
contents are still mmap()'ed.
* js/maintenance-fix-deadlock-on-win10:
maintenance(geometric): do release the `.idx` files before repacking
mingw: optionally use legacy (non-POSIX) delete semantics
Junio C Hamano [Thu, 21 May 2026 03:26:28 +0000 (12:26 +0900)]
Merge branch 'js/ci-github-actions-update' into maint-2.54
Update various GitHub Actions versions.
* js/ci-github-actions-update:
l10n: bump mshick/add-pr-comment from v2 to v3
ci: bump git-for-windows/setup-git-for-windows-sdk from v1 to v2
ci: bump actions/checkout from v5 to v6
ci: bump actions/github-script from v8 to v9
ci: bump actions/{upload,download}-artifact to v7 and v8
ci: bump microsoft/setup-msbuild from v2 to v3
Junio C Hamano [Thu, 21 May 2026 03:06:47 +0000 (12:06 +0900)]
Merge branch 'kn/refs-generic-helpers'
Refactor service routines in the ref subsystem backends.
* kn/refs-generic-helpers:
refs: use peeled tag values in reference backends
refs: add peeled object ID to the `ref_update` struct
refs: move object parsing to the generic layer
update-ref: handle rejections while adding updates
update-ref: move `print_rejected_refs()` up
refs: return `ref_transaction_error` from `ref_transaction_update()`
refs: extract out reflog config to generic layer
refs: introduce `ref_store_init_options`
refs: remove unused typedef 'ref_transaction_commit_fn'
Jeff King [Tue, 19 May 2026 06:15:34 +0000 (02:15 -0400)]
commit: fall back to full read when maybe_tree is NULL
When we load a commit object from the commit graph (rather than reading
the object contents), we don't fill in its "maybe_tree" entry, but
rather wait to lazy-load it. This goes back to 7b8a21dba1 (commit-graph:
lazy-load trees for commits, 2018-04-06), and saves the work of
instantiating tree objects that nobody cares about.
But it creates a data dependency: now the commit struct depends on the
graph file to do that lazy load. This is a problem if we close the graph
file; now we have a commit struct that claims to be parsed but is
missing some of its data.
It's rare for this to be a problem in practice, because we don't tend to
close the graph files at all, and if we do we don't tend to look at
their commits afterward. But there is one case that is easy to trigger:
git-clone's --dissociate option will close the object database before
running the dissociate repack, and then afterwards still try to check
out the working tree. This will yield an error like:
What happens is that we expect repo_get_commit_tree() to lazy-load the
tree, but commit_graph_position() returns COMMIT_NOT_FROM_GRAPH because
the position slab has gone away (and even if it hadn't, we don't have
the graph file itself available anymore).
Let's try harder to find the tree in repo_get_commit_tree() by actually
opening the commit object and parsing the tree line. This is extra work,
but no more than we'd have to go to if we hadn't done the initial graph
load in the first place.
It does mean that a corrupt commit (e.g., one that points to a non-tree
object for which we couldn't instantiate a struct) will repeatedly load
the object from disk, once for each call to repo_get_commit_tree(). But
such corruptions should be rare, and we don't tend to perform such calls
repeatedly (usually we'd abort the operation upon seeing corruption).
It also means we have to reimplement a bit of the commit parsing. We
can't just use parse_commit_buffer() here, because it expects an
unparsed struct and wants to load everything, including parent links.
But we don't know if the parent list has been munged during traversal,
so it's not safe for us to touch it. Fortunately, it's quite easy to
load just the tree, as it is always the first line of the commit object.
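To illustrate why loading just the tree is easy, here is a minimal sketch in Python (not the C code from this patch) that reads only the leading "tree" line of a raw commit object and leaves everything else, including parent lines, untouched. The function name, the 40-character SHA-1 default, and the sample commit are assumptions for the example.

```python
def parse_tree_oid(commit_buffer: bytes, hex_len: int = 40) -> str:
    """Return the hex object ID from the first line of a raw commit object."""
    prefix = b"tree "
    # Only the first line is inspected; parents and the rest are ignored.
    first_line, _, _ = commit_buffer.partition(b"\n")
    if not first_line.startswith(prefix):
        raise ValueError("commit object does not start with a tree line")
    oid = first_line[len(prefix):].decode("ascii")
    if len(oid) != hex_len or any(c not in "0123456789abcdef" for c in oid):
        raise ValueError("malformed tree object ID")
    return oid

# Hypothetical commit contents, for illustration only.
raw = b"\n".join([
    b"tree " + b"a" * 40,
    b"parent " + b"b" * 40,
    b"author A U Thor <author@example.com> 0 +0000",
    b"",
    b"subject",
    b"",
])
tree_oid = parse_tree_oid(raw)
```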
There is an alternative approach which I considered but rejected:
"complete" each graph-loaded commit struct when we close the graph file
by looking up and instantiating their trees at close time. This is the
most elegant solution in some sense, as it resolves the data dependency
at the moment it goes away. And it avoids ever opening the commit
objects at all, which can be more efficient.
But not always. The resolving effort scales with the number of
graph-loaded commits, even though we may only later access one or a few.
So the tradeoff depends on how many were loaded in total versus how many
will be later accessed.
And in most cases, we will not access any at all! Programs which close
the object database before exiting will then do a bunch of work for no
reason. This could be mitigated by requiring a separate function to
resolve the graph structs before closing the file. But now each close
call has to consider whether to call that resolving function. So we'd
fix this case in git-clone, but we don't know what other cases (if any)
are lurking.
Moreover, this strategy does nothing if we lose access to the graph file
unexpectedly (e.g., due to a system error). I'm not entirely sure this
is possible now (we mmap it, so I'd guess any error would turn into
SIGBUS anyway). But it feels like making the lazy-load more robust
(which this patch does) is the best way to handle a wide variety of
possible failure modes.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Tue, 19 May 2026 16:24:55 +0000 (16:24 +0000)]
send-pack: pass negotiation config in push
When push.negotiate is enabled, 'git push' spawns a child 'git fetch
--negotiate-only' process to find common commits. Pass
--negotiation-include and --negotiation-restrict options from the
'remote.<name>.negotiationInclude' and
'remote.<name>.negotiationRestrict' config keys to this child process.
When negotiationRestrict is configured, it replaces the default
behavior of using all remote refs as negotiation tips. This allows
the user to control which local refs are used for push negotiation.
When negotiationInclude is configured, the specified ref patterns
are passed as --negotiation-include to ensure their tips are always
sent as 'have' lines during push negotiation.
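The option passing described above might be sketched as follows. This is a simplified model, not the send-pack code: the helper name is hypothetical, the `--option=value` spelling is an assumption, and the real child process receives further arguments that are omitted here.

```python
def negotiation_args(include, restrict):
    """Build a partial argument list for the child 'git fetch --negotiate-only'."""
    args = ["fetch", "--negotiate-only"]
    # Values taken from remote.<name>.negotiationRestrict, if any.
    args += [f"--negotiation-restrict={value}" for value in restrict]
    # Values taken from remote.<name>.negotiationInclude, if any.
    args += [f"--negotiation-include={value}" for value in include]
    return args

# Hypothetical configuration: only negotiationInclude is set.
argv = negotiation_args(include=["refs/heads/release"], restrict=[])
```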
Reviewed-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Tue, 19 May 2026 16:24:54 +0000 (16:24 +0000)]
remote: add remote.*.negotiationInclude config
Add a new 'remote.<name>.negotiationInclude' multi-valued config option that
provides default values for --negotiation-include when no
--negotiation-include arguments are specified over the command line. This
is a mirror of how 'remote.<name>.negotiationRestrict' specifies defaults
for the --negotiation-restrict arguments.
Each value is either an exact ref name or a glob pattern whose tips should
always be sent as 'have' lines during negotiation. The config values are
resolved through the same resolve_negotiation_include() codepath as the CLI
options.
This option is additive with the normal negotiation process: the negotiation
algorithm still runs and advertises its own selected commits, but the refs
matching the config are sent unconditionally on top of those heuristically
selected commits.
Similar to the negotiationRestrict config, an empty value resets the value
list to allow ignoring earlier config values, such as those that might be
set in system or global config.
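The precedence and reset rules described in this message can be modeled with a short sketch. It assumes config values arrive in the order they were read (system, then global, then local); the function name is hypothetical and this is not the code path git uses.

```python
def effective_patterns(cli_values, config_values):
    """Pick the negotiation patterns to use for one fetch."""
    # Any command-line argument means the config defaults are not used.
    if cli_values:
        return list(cli_values)
    result = []
    for value in config_values:
        if value == "":
            # An empty value discards everything configured before it.
            result.clear()
        else:
            result.append(value)
    return result

# Hypothetical config: a system-level value, a reset, then a local value.
from_config = effective_patterns([], ["refs/heads/old", "", "refs/heads/release"])
from_cli = effective_patterns(["refs/heads/dev"], ["refs/heads/release"])
```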
Reviewed-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Tue, 19 May 2026 16:24:53 +0000 (16:24 +0000)]
fetch: add --negotiation-include option for negotiation
Add a new --negotiation-include option to 'git fetch', which ensures
that certain ref tips are always sent as 'have' lines during fetch
negotiation, regardless of what the negotiation algorithm selects.
This is useful when the repository has a large number of references, so
the normal negotiation algorithm truncates the list. This is especially
important in repositories with long parallel commit histories. For
example, a repo could have a 'dev' branch for development and a
'release' branch for released versions. If the 'dev' branch isn't
selected for negotiation, then it's not a big deal because there are
many in-progress development branches with a shared history. However, if
'release' is not selected for negotiation, then the server may think
that this is the first time the client has asked for that reference,
causing a full download of its parallel commit history (and any extra
data that may be unique to that branch). This is based on a real example
where certain fetches would grow to 60+ GB when a release branch
updated.
This option is a complement to --negotiation-restrict, which reduces the
negotiation ref set to a specific list. In the earlier example, using
--negotiation-restrict to focus the negotiation to 'dev' and 'release'
would avoid those problematic downloads, but would still not allow
advertising potentially-relevant user branches. In this way, the
'include' version solves the problem I mention while allowing
negotiation to pick other references opportunistically. The two options
can also be combined to allow the best of both worlds.
The argument may be an exact ref name or a glob pattern. Non-existent
refs are silently ignored. This behavior is also updated in the ref matching
logic for the related --negotiation-restrict option to match.
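A minimal sketch of the matching behavior described above, using Python's fnmatch as a stand-in for git's own pattern matching (whose glob semantics differ in detail). The function name and the sample refs are hypothetical.

```python
from fnmatch import fnmatchcase

def resolve_include(patterns, refs):
    """Map exact names or glob patterns to tip object IDs.

    'refs' maps ref names to object IDs. Patterns that match nothing
    contribute nothing and raise no error.
    """
    tips = []
    for pattern in patterns:
        for name in sorted(refs):
            if fnmatchcase(name, pattern) and refs[name] not in tips:
                tips.append(refs[name])
    return tips

# Hypothetical refs with shortened object IDs.
refs = {
    "refs/heads/dev": "1111",
    "refs/heads/release": "2222",
    "refs/heads/release-next": "3333",
}
tips = resolve_include(["refs/heads/release*", "refs/heads/missing"], refs)
```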
The implementation outputs the requested objects as haves before the
negotiator performs its own algorithm to choose the next haves. Use the new
have_sent() interface to signal these have commits were sent before engaging
with the negotiator's next() iterator.
Also add --negotiation-include to 'git pull' passthrough options.
Reviewed-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Tue, 19 May 2026 16:24:52 +0000 (16:24 +0000)]
negotiator: add have_sent() interface
In a future change, we will introduce a capability to choose specific commit
OIDs as 'have's in fetch negotiation, with the ability to have the
negotiator choose more 'have's to increase coverage beyond that required
core set. The negotiator works to avoid emitting 'have's that can reach each
other, but that logic is hidden beneath the negotiator's iterator function
pointer ('next'). We need a way to communicate to the negotiator that we
have picked a 'have' so it could incorporate that into its logic.
Add a have_sent() method to the fetch_negotiator interface. This is the
signal that allows the negotiator to track the commit as already shown and
perform the proper bookkeeping to avoid emitting those objects or
anything they can reach.
For our non-trivial negotiators, it is sufficient to mark these commits as
common, so the implementation is quite simple. This logic will be exercised
in the next change.
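The idea can be sketched with a toy negotiator. This is a deliberately simplified model, not one of git's negotiators: the class, its breadth-first walk, and the sample history are all assumptions. It only shows how telling the negotiator about an already-sent 'have' keeps next() from emitting that commit or anything it can reach.

```python
from collections import deque

class ToyNegotiator:
    def __init__(self, parents, tips):
        self.parents = parents      # commit -> list of parent commits
        self.queue = deque(tips)    # commits still to consider
        self.common = set()         # commits that must not be emitted

    def have_sent(self, commit):
        """Record that 'commit' was already sent; mark it and its ancestors."""
        stack = [commit]
        while stack:
            c = stack.pop()
            if c in self.common:
                continue
            self.common.add(c)
            stack.extend(self.parents.get(c, []))

    def next(self):
        """Return the next 'have' to send, or None when exhausted."""
        while self.queue:
            c = self.queue.popleft()
            if c in self.common:
                continue
            self.common.add(c)
            self.queue.extend(self.parents.get(c, []))
            return c
        return None

# Hypothetical history: 'dev' and 'release' both descend from 'base'.
parents = {"release": ["r1"], "r1": ["base"], "dev": ["d1"], "d1": ["base"], "base": []}
negotiator = ToyNegotiator(parents, tips=["dev", "release"])
negotiator.have_sent("release")   # sent up front, before asking the negotiator

haves = []
while (have := negotiator.next()) is not None:
    haves.append(have)
```

In this example the negotiator emits only the 'dev' side of the history, because 'release', 'r1', and 'base' were covered by the commit sent up front.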
Reviewed-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Tue, 19 May 2026 16:24:51 +0000 (16:24 +0000)]
remote: add remote.*.negotiationRestrict config
In a previous change, the --negotiation-restrict command-line option of 'git
fetch' was added as a synonym of --negotiation-tip. Both of these options
restrict the set of 'haves' the client can send as part of negotiation.
This was previously not available via a configuration option. Add a new
'remote.<name>.negotiationRestrict' multi-valued config option that updates
'git fetch <name>' to use these restrictions by default.
If the user provides even one --negotiation-restrict argument, then the
config is ignored.
An empty value resets the value list to allow ignoring earlier config
values, such as those that might be set in system or global config.
Reviewed-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Tue, 19 May 2026 16:24:50 +0000 (16:24 +0000)]
transport: rename negotiation_tips
The previous change added the --negotiation-restrict synonym for the
--negotiation-tip option for 'git fetch'. In anticipation of adding a new
option that behaves similarly but with distinct changes to its behavior,
rename the internal representation of this data from 'negotiation_tips' to
'negotiation_restrict_tips'.
The 'tips' part is kept because this is an oid_array in the transport layer.
This requires the builtin to handle parsing refs into collections of oids so
the transport layer can handle this cleaner form of the data.
Also update the string_list used to store the inputs from command-line
options.
Reviewed-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Tue, 19 May 2026 16:24:49 +0000 (16:24 +0000)]
fetch: add --negotiation-restrict option
The --negotiation-tip option to 'git fetch' and 'git pull' allows users
to specify that they want to focus negotiation on a small set of
references. This is a _restriction_ on the negotiation set, helping to
focus the negotiation when the ref count is high. However, it doesn't
allow for the ability to opportunistically select references beyond that
list.
This subtle detail that this is a 'maximum set' and not a 'minimum set'
is not immediately clear from the option name. This makes it more
complicated to add a new option that provides the complementary behavior
of a minimum set.
For now, create a new synonym option, --negotiation-restrict, that
behaves identically to --negotiation-tip. Update the documentation to
make it clear that this new name is the preferred option, but we keep
the old name for compatibility. Mark --negotiation-tip as an alias of the
new, preferred option.
Update a few warning messages with the new option, but also make them
translatable with the option name inserted by formatting. At least one
of these messages will be reused later for a new option.
Reviewed-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Tue, 19 May 2026 16:24:48 +0000 (16:24 +0000)]
t5516: fix test order flakiness
The 'fetch follows tags by default' test sorts using 'sort -k 4', but
for-each-ref output only has 3 columns. This relies on sort treating records
with fewer fields as having an empty fourth field, which may produce
unstable results depending on locale. This appears to be an accident added
in 3f763ddf28 (fetch: set remote/HEAD if it does not exist, 2024-11-22).
Use 'sort -k 3' to match the actual number of columns in the output.
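The difference between the two keys can be seen in a small model. This is a simplified sketch of how a field-based sort key is extracted, not a reimplementation of sort(1), and the three-column sample lines are hypothetical. With a key starting at field 4, every record yields an empty key, so the key itself no longer determines the order.

```python
def sort_key(line, start_field):
    """Return the text from whitespace-separated field 'start_field' (1-based) onward."""
    fields = line.split()
    return " ".join(fields[start_field - 1:])

# Hypothetical three-column records: object ID, type, ref name.
lines = [
    "1111 commit refs/remotes/origin/main",
    "2222 commit refs/heads/main",
    "3333 tag refs/tags/v1",
]
keys_k4 = [sort_key(line, 4) for line in lines]   # nothing to compare
keys_k3 = [sort_key(line, 3) for line in lines]   # the ref names
sorted_k3 = sorted(lines, key=lambda line: sort_key(line, 3))
```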
Reviewed-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 19 May 2026 15:58:25 +0000 (11:58 -0400)]
repack: allow `--write-midx=incremental` without `--geometric`
Previously, `--write-midx=incremental` required `--geometric` and would
die() without it. Relax this restriction so that incremental MIDX
repacking can be used independently.
Without `--geometric`, the behavior is append-only: a single new MIDX
layer is created containing whatever packs were written by the repack
and appended to the existing chain (or a new chain is started). Existing
layers are preserved as-is with no compaction or merging.
Implement this via a new repack_make_midx_append_plan() that builds a
plan consisting of a WRITE step for the freshly written packs followed
by COPY steps for every existing MIDX layer. The existing compaction
plan (repack_make_midx_compaction_plan) is used only when `--geometric`
is active.
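The shape of the append-only plan described above can be sketched like this. It is a Python model, not repack_make_midx_append_plan() itself; the Step type, the layer names, and the assumption that at least one pack was written are all illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Step:
    kind: str                  # "WRITE" or "COPY"
    payload: Tuple[str, ...]   # pack names for WRITE, one layer name for COPY

def make_append_plan(new_packs: List[str], existing_layers: List[str]) -> List[Step]:
    """One WRITE step for the fresh packs, then a COPY step per existing layer."""
    plan = [Step("WRITE", tuple(new_packs))]
    # Existing layers are carried over unchanged: no compaction, no merging.
    plan += [Step("COPY", (layer,)) for layer in existing_layers]
    return plan

# Hypothetical pack and layer names.
plan = make_append_plan(["pack-new.idx"], ["layer-1", "layer-2"])
```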
Update the documentation to describe the behavior with and without
`--geometric`, and replace the test that enforced the old restriction
with one exercising append-only incremental MIDX repacking.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 19 May 2026 15:58:22 +0000 (11:58 -0400)]
repack: introduce `--write-midx=incremental`
Expose the incremental MIDX repacking mode (implemented in an earlier
commit) via a new --write-midx=incremental option for `git repack`.
Add "incremental" as a recognized argument to the --write-midx
OPT_CALLBACK, mapping it to REPACK_WRITE_MIDX_INCREMENTAL. When this
mode is active and --geometric is in use, set the midx_layer_threshold
on the pack geometry so that only packs in sufficiently large tip layers
are considered for repacking.
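A rough model of how the option value maps to a mode is shown below. The enum mirrors the REPACK_WRITE_MIDX_* names from the commit messages, but the function is hypothetical, and rejecting every other argument string is an assumption rather than documented behavior.

```python
from enum import Enum
from typing import Optional

class WriteMidx(Enum):
    NONE = "none"                 # option not given
    DEFAULT = "default"           # --write-midx or -m without an argument
    INCREMENTAL = "incremental"   # --write-midx=incremental

def parse_write_midx(given: bool, arg: Optional[str] = None) -> WriteMidx:
    """Translate the presence and argument of --write-midx into a mode."""
    if not given:
        return WriteMidx.NONE
    if arg is None:
        return WriteMidx.DEFAULT
    if arg == "incremental":
        return WriteMidx.INCREMENTAL
    # Assumption: anything else is treated as an error in this sketch.
    raise ValueError(f"unrecognized --write-midx mode: {arg!r}")

mode = parse_write_midx(given=True, arg="incremental")
```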
Two new configuration options control the compaction behavior:
- repack.midxSplitFactor (default: 2): the factor used in the
geometric merging condition for MIDX layers.
- repack.midxNewLayerThreshold (default: 8): the minimum number of
packs in the tip MIDX layer before its packs are considered as
candidates for geometric repacking.
Add tests exercising the new mode across a variety of scenarios
including basic geometric violations, multi-round chain integrity,
branching and merging histories, cross-layer object uniqueness, and
threshold-based compaction.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 19 May 2026 15:58:19 +0000 (11:58 -0400)]
repack: implement incremental MIDX repacking
Implement the `write_midx_incremental()` function, which builds and
maintains an incremental MIDX chain as part of the geometric repacking
process.
Unlike the default mode which writes a single flat MIDX, the incremental
mode constructs a compaction plan that determines which MIDX layers to
write, compact, or copy, and then executes each step using `git
multi-pack-index` subcommands with the --no-write-chain-file flag.
The repacking strategy works as follows:
* Acquire the lock guarding the multi-pack-index-chain.
* A new MIDX layer is always written containing the newly created
pack(s). If the tip MIDX layer was rewritten during geometric
repacking, any surviving packs from that layer are also included.
* Starting from the new layer, adjacent MIDX layers are merged together
as long as the accumulated object count exceeds half the object count
of the next deeper layer (controlled by 'repack.midxSplitFactor').
* Remaining layers in the chain are evaluated pairwise and either
compacted or copied as-is, following the same merging condition.
* Write the contents of the new multi-pack-index chain, atomically move
it into place, and then release the lock.
* Delete any now-unused MIDX layers.
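The merging condition in the third step above can be sketched as follows. This models only the initial roll-up from the new layer, not the later pairwise evaluation of the remaining layers, and it reads the split factor as "merge while accumulated * factor > next layer's object count", which is an interpretation of the description rather than the code itself.

```python
def layers_to_merge(new_objects, existing_tip_to_base, split_factor=2):
    """Return (number of existing layers merged into the new one, total objects).

    'existing_tip_to_base' lists object counts of existing layers, starting
    with the layer adjacent to the new one and going deeper.
    """
    accumulated = new_objects
    merged = 0
    for layer_objects in existing_tip_to_base:
        # With the default factor of 2: stop once the accumulated count no
        # longer exceeds half of the next deeper layer's object count.
        if accumulated * split_factor <= layer_objects:
            break
        accumulated += layer_objects
        merged += 1
    return merged, accumulated

# Hypothetical object counts: a new layer of 100 objects on top of three layers.
result = layers_to_merge(100, [150, 400, 5000])
```

Here the new layer absorbs the 150- and 400-object layers (200 > 150, then 500 > 400) and stops at the 5000-object layer, since 1300 does not exceed 5000.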
After writing the new layer, the strategy is evaluated among the
existing MIDX layers in order from oldest to newest. Each step that
writes a new MIDX layer uses "--no-write-chain-file" to avoid updating
the multi-pack-index-chain file. After all steps are complete, the new
chain file is written and then atomically moved into place.
At present, this functionality is exposed behind a new enum value,
`REPACK_WRITE_MIDX_INCREMENTAL`, but has no external callers. A
subsequent commit will expose this mode via `git repack
--write-midx=incremental`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
packfile: ensure `close_pack_revindex()` frees in-memory revindex
The following commit will introduce a case where we write a MIDX bitmap
over packs that do not themselves have on-disk *.rev files.
This case is supported within Git, and we will simply fall back to
generating the revindex in memory. But we don't ever release that
memory, causing a leak that is exposed by a test introduced in the
following commit.
(As far as I could find, we never free()'d memory allocated as a
byproduct of creating an in-memory revindex, likely because that code
predates the leak-checking niceties we have in the test suite now.)
Rectify this by calling `FREE_AND_NULL()` on the `p->revindex` field
when calling `close_pack_revindex()`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 19 May 2026 15:58:13 +0000 (11:58 -0400)]
builtin/repack.c: convert `--write-midx` to an `OPT_CALLBACK`
Change the --write-midx (-m) flag from an OPT_BOOL to an OPT_CALLBACK
that accepts an optional mode argument. Introduce an enum with
REPACK_WRITE_MIDX_NONE and REPACK_WRITE_MIDX_DEFAULT to distinguish
between the two states, and update all existing boolean checks
accordingly.
For now, passing no argument (or just `-m`) selects the default mode,
preserving existing behavior. A subsequent commit will add a new mode
for writing incremental MIDXs.
Extract repack_write_midx() as a dispatcher that selects the
appropriate MIDX-writing implementation based on the mode.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 19 May 2026 15:58:10 +0000 (11:58 -0400)]
repack-geometry: prepare for incremental MIDX repacking
Teach `pack_geometry_init()` to optionally restrict the set of
repacking candidates to only packs in the tip MIDX layer when a
`midx_layer_threshold` is configured. If the tip layer has fewer packs
than the threshold, those packs are excluded entirely; otherwise only
packs in that layer participate in the geometric repack.
Also track whether any tip-layer packs were included in the rollup
(`midx_tip_rewritten`), which a subsequent commit will use to decide
how to update the MIDX chain after repacking.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
repack-midx: extract `repack_fill_midx_stdin_packs()`
The function `write_midx_included_packs()` manages the lifecycle of
writing packs to stdin when running `git multi-pack-index write` as a
child process.
Extract a standalone `repack_fill_midx_stdin_packs()` helper, which
handles `--stdin-packs` argument setup, starting the command, writing
pack names to its standard input, and finishing the command.
This simplifies `write_midx_included_packs()` and prepares for a
subsequent commit where the same helper is called with `cmd->out = -1`
to capture the MIDX's checksum from the command's standard output,
which is needed when writing MIDX layers with `--no-write-chain-file`.
No functional changes are included in this patch.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 19 May 2026 15:58:03 +0000 (11:58 -0400)]
repack-midx: factor out `repack_prepare_midx_command()`
The `write_midx_included_packs()` function assembles and executes a
`git multi-pack-index write` command, constructing the argument list
inline.
Future commits will introduce additional callers that need to construct
similar `git multi-pack-index` commands (for both `write` and `compact`
subcommands), so extract the common portions of the command setup into a
reusable `repack_prepare_midx_command()` helper.
The extracted helper sets `git_cmd`, pushes `multi-pack-index` and a
subcommand, and handles `--progress`/`--no-progress` and `--bitmap`
flags. The remaining arguments that are specific to the `write`
subcommand (such as `--stdin-packs`) are left to the caller.
No functional changes are included in this patch.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 19 May 2026 15:58:00 +0000 (11:58 -0400)]
midx: expose `midx_layer_contains_pack()`
Rename the function `midx_contains_pack_1()` to instead be called
`midx_layer_contains_pack()` and make it accessible. Unlike
`midx_contains_pack()` (which recurses through the entire chain), this
function checks only a single MIDX layer.
This will be used by a subsequent commit to determine whether a given
pack belongs to the tip MIDX layer specifically, rather than to any
layer in the chain.
No functional changes are present in this commit.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
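The difference between a single-layer check and a whole-chain check
can be sketched as below. The `midx_layer_sketch` struct and both
function names are hypothetical stand-ins and do not mirror the real
MIDX data structures in midx.c.

```c
#include <stddef.h>
#include <string.h>

struct midx_layer_sketch {
	const char **pack_names;
	size_t num_packs;
	struct midx_layer_sketch *base; /* next older layer, or NULL */
};

/* Look at exactly one layer, ignoring anything below it. */
static int layer_contains_pack(const struct midx_layer_sketch *m,
			       const char *name)
{
	for (size_t i = 0; i < m->num_packs; i++)
		if (!strcmp(m->pack_names[i], name))
			return 1;
	return 0;
}

/* Walk from the given layer down through every older layer. */
static int chain_contains_pack(const struct midx_layer_sketch *m,
			       const char *name)
{
	for (; m; m = m->base)
		if (layer_contains_pack(m, name))
			return 1;
	return 0;
}
```

A caller that only cares about the tip layer would use the first
function on the tip, whereas the second one answers "is this pack
anywhere in the chain?".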
Taylor Blau [Tue, 19 May 2026 15:57:57 +0000 (11:57 -0400)]
repack: track the ODB source via existing_packs
Store the ODB source in the `existing_packs` struct and use that in
place of the raw `repo->objects->sources` access within `cmd_repack()`.
The source used is still assigned from the first source in the list, so
there are no functional changes in this commit. The changes instead
serve two purposes (one immediate, one not):
- The incremental MIDX-based repacking machinery will need to know what
source is being used to read the existing MIDX/chain (should one
exist).
- In the future, if "git repack" is taught how to operate on other
object sources, this field will serve as the authoritative value for
that source.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 19 May 2026 15:57:54 +0000 (11:57 -0400)]
midx: support custom `--base` for incremental MIDX writes
Both `compact` and `write --incremental` fix the base of the resulting
MIDX layer: `compact` always places the compacted result on top of
"from's" immediate parent in the chain, and `write --incremental` always
appends a new layer to the existing tip. In both cases the base is not
configurable.
Future callers need additional flexibility. For instance, the incremental
MIDX-based repacking code may wish to write a layer based on some
intermediate ancestor rather than the current tip, or produce a root
layer when replacing the bottommost entries in the chain.
Introduce a new `--base` option for both subcommands to specify the
checksum of the MIDX layer to use as the base. The given checksum must
refer to a valid layer in the MIDX chain that is an ancestor of the
topmost layer being written or compacted.
The special value "none" is accepted to produce a root layer with no
parent. This will be needed when the incremental repacking machinery
determines that the bottommost layers of the chain should be replaced.
If no `--base` is given, behavior is unchanged: `compact` uses "from's"
immediate parent in the chain, and `write` appends to the existing tip.
For the `write` subcommand, `--base` requires `--no-write-chain-file`. A plain
`write --incremental` appends a new layer to the live chain tip with no
mechanism to atomically replace it; overriding the base would produce a
layer that does not extend the tip, breaking chain invariants. With
`--no-write-chain-file` the chain is left unmodified and the caller is
responsible for assembling a valid chain.
For `compact`, no such restriction applies. The compaction operation
atomically replaces the compacted range in the chain file, so writing
the result on top of any valid ancestor preserves chain invariants.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
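A minimal sketch of the `--base` validation rule described above. It
assumes the chain is available as an array of checksum strings ordered
from oldest to newest, and that `top` is the position of the topmost
layer being written or compacted. `resolve_base()` and the two
`BASE_*` constants are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define BASE_NONE  (-1) /* "none": produce a root layer */
#define BASE_ERROR (-2) /* not a valid ancestor */

/*
 * Resolve a --base argument against the chain. Only layers strictly
 * below position `top` count as ancestors. Returns the index of the
 * base layer, BASE_NONE for the special value "none", or BASE_ERROR.
 */
static long resolve_base(const char **chain, size_t top,
			 const char *base_arg)
{
	if (!strcmp(base_arg, "none"))
		return BASE_NONE;
	for (size_t i = 0; i < top; i++)
		if (!strcmp(chain[i], base_arg))
			return (long)i;
	return BASE_ERROR;
}
```

With a chain of "aaa", "bbb", "ccc" and the layer "ccc" being
compacted (`top` is 2), both "aaa" and "bbb" are acceptable bases,
while "ccc" itself is rejected because a layer is not its own
ancestor.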
Taylor Blau [Tue, 19 May 2026 15:57:51 +0000 (11:57 -0400)]
midx: introduce `--no-write-chain-file` for incremental MIDX writes
When writing an incremental MIDX layer, the MIDX machinery writes the
new layer into the multi-pack-index.d directory and then updates the
multi-pack-index-chain file to include the freshly written layer.
Future callers however may not wish to immediately update the MIDX chain
itself, preferring instead to write out new layer(s) themselves before
atomically updating the chain. Concretely, the new incremental
MIDX-based repacking strategy will want to do exactly this (that is,
assemble the new MIDX chain itself before writing a new chain file and
atomically linking it into place).
Introduce a `--no-write-chain-file` flag that:
* writes the new MIDX layer into the multi-pack-index.d directory
* prints its checksum
* does not update the multi-pack-index-chain file.
The MIDX chain file (and thus, the lock protecting it) remain untouched,
allowing callers to assemble the chain themselves. This flag requires
`--incremental`, since the notion of a separate layer only makes sense
for incremental MIDXs.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
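To illustrate what "assemble the chain themselves" might look like on
the caller's side, here is a rough sketch that writes a list of layer
checksums to a temporary file and then moves it into place with
rename(). It assumes a chain file with one checksum per line, oldest
layer first. `install_chain_file()` is a hypothetical helper, and Git
itself relies on its own lockfile machinery rather than this code.

```c
#include <stdio.h>

/*
 * Write `nr` checksums, one per line, to `tmp_path` and then rename
 * the result to `final_path`. Readers see either the old file or the
 * complete new one. Returns 0 on success and -1 on failure.
 */
static int install_chain_file(const char *tmp_path, const char *final_path,
			      const char **checksums, size_t nr)
{
	FILE *fp = fopen(tmp_path, "w");

	if (!fp)
		return -1;
	for (size_t i = 0; i < nr; i++) {
		if (fprintf(fp, "%s\n", checksums[i]) < 0) {
			fclose(fp);
			remove(tmp_path);
			return -1;
		}
	}
	if (fclose(fp)) {
		remove(tmp_path);
		return -1;
	}
	if (rename(tmp_path, final_path)) {
		remove(tmp_path);
		return -1;
	}
	return 0;
}
```

The checksums passed in would be the ones printed by each
`--no-write-chain-file` invocation, collected in the order the caller
wants the layers to appear.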
Taylor Blau [Tue, 19 May 2026 15:57:48 +0000 (11:57 -0400)]
midx: use `strvec` for `keep_hashes`
The `keep_hashes` array in `write_midx_internal()` accumulates the
checksums of MIDX files that should be retained when pruning stale
entries from the MIDX chain. For similar reasons as in a previous
commit, rewrite this using a strvec, requiring us to pass one fewer
parameter.
Unlike the aforementioned previous commit, use a `strvec` instead of a
`string_list`, which provides a more ergonomic interface to adjust the
values at a particular index. The ordering is important here, as this
value is used to determine the contents of the resulting
`multi-pack-index-chain` file when writing with "--incremental".
Since the previous commit already builds the array in forward order, the
conversion is straightforward: replace indexed assignments with
`strvec_push()`, drop the pre-counting and `CALLOC_ARRAY()`, and
simplify cleanup via `strvec_clear()`.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 19 May 2026 15:57:45 +0000 (11:57 -0400)]
midx: build `keep_hashes` array in order
Instead of filling the keep_hashes array using reverse indexing (e.g.,
`keep_hashes[count - i - 1]`) while traversing linked lists forward,
collect linked list nodes into a temporary `layers` array and then
iterate it backwards to fill `keep_hashes` sequentially.
This makes the filling logic easier to follow, since each segment of the
array is filled with a simple forward-marching index. Moreover, this
change prepares us for a subsequent commit that will switch to using a
`strvec`.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
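The "collect, then iterate backwards" approach can be sketched like
this. The `layer_sketch` struct and `collect_keep_hashes()` are
hypothetical; the sketch assumes a linked list that runs from the tip
layer towards the base and an output array ordered from base to tip.

```c
#include <stdlib.h>

struct layer_sketch {
	const char *checksum;
	struct layer_sketch *base; /* next older layer, or NULL */
};

/*
 * Return a newly allocated array of checksums ordered base-first.
 * The caller frees the array (but not the strings). On allocation
 * failure, returns NULL and sets *nr_out to 0.
 */
static const char **collect_keep_hashes(struct layer_sketch *tip,
					size_t *nr_out)
{
	size_t nr = 0, i, j = 0;
	struct layer_sketch *m;
	struct layer_sketch **layers;
	const char **keep_hashes;

	for (m = tip; m; m = m->base)
		nr++;

	layers = calloc(nr ? nr : 1, sizeof(*layers));
	keep_hashes = calloc(nr ? nr : 1, sizeof(*keep_hashes));
	if (!layers || !keep_hashes) {
		free(layers);
		free(keep_hashes);
		*nr_out = 0;
		return NULL;
	}

	/* Forward traversal: layers[0] is the tip. */
	for (m = tip, i = 0; m; m = m->base)
		layers[i++] = m;

	/* Backward iteration, forward-marching output index. */
	for (i = nr; i > 0; i--)
		keep_hashes[j++] = layers[i - 1]->checksum;

	free(layers);
	*nr_out = nr;
	return keep_hashes;
}
```

Because the output index only ever increases, swapping the indexed
assignment for an append-style call later is a mechanical change.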
Taylor Blau [Tue, 19 May 2026 15:57:42 +0000 (11:57 -0400)]
midx: use `strset` for retained MIDX files
Both `clear_midx_files_ext()` and `clear_incremental_midx_files_ext()`
build a list of filenames to keep while pruning stale MIDX files. Today
they hand-roll an array instead of using a `strset`, which requires us
to pass an additional length parameter and makes lookups linear.
Replace the bare array with a `strset` which can be passed around as a
single parameter. Though it improves lookup performance, the difference
is likely immeasurable given how small the keep_hashes array typically
is.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 19 May 2026 15:57:39 +0000 (11:57 -0400)]
midx-write: handle noop writes when converting incremental chains
When updating a MIDX, we optimize out writes that will result in an
identical MIDX as the one we already have on disk. See b3bab9d2729
(midx-write: extract function to test whether MIDX needs updating,
2025-12-10) for more details on exactly which writes are optimized out.
If `midx_needs_update()` can't rule out any of the obvious cases (e.g.,
the checksum is invalid, we're requesting a different version, or
performing compaction which always requires an update), then we compare
the packs we're writing to the packs we already know about. If the
number of packs being written equals the number of packs in the existing
MIDX layer(s), then we compare the packs by their name.
This comparison fails when we have an incremental MIDX chain with
at least two layers, since we do not recursively peel through earlier
layers, instead treating the `->pack_names` array of the tip MIDX layer
as containing all `m->num_packs + m->num_packs_in_base` packs.
Adjust this to instead look through the MIDX layers one by one when
comparing pack names. While we're at it, fix a typo above in the same
function.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
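A simplified sketch of a layer-by-layer comparison, assuming that pack
names are unique across the chain. The `midx_layer_sketch` struct,
`name_in_list()`, and `packs_match_chain()` are hypothetical names,
and the linear membership test is chosen for brevity rather than
speed.

```c
#include <stddef.h>
#include <string.h>

struct midx_layer_sketch {
	const char **pack_names; /* packs of this layer only */
	size_t num_packs;
	struct midx_layer_sketch *base; /* next older layer, or NULL */
};

static int name_in_list(const char *name, const char **list, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (!strcmp(list[i], name))
			return 1;
	return 0;
}

/*
 * Return 1 if the packs about to be written are exactly the packs
 * found across all layers of the chain, and 0 otherwise. Each layer's
 * own `pack_names` array is consulted in turn instead of assuming the
 * tip layer lists the packs of its bases as well.
 */
static int packs_match_chain(const struct midx_layer_sketch *tip,
			     const char **to_write, size_t nr)
{
	const struct midx_layer_sketch *m;
	size_t total = 0;

	for (m = tip; m; m = m->base)
		total += m->num_packs;
	if (total != nr)
		return 0;

	for (m = tip; m; m = m->base)
		for (size_t i = 0; i < m->num_packs; i++)
			if (!name_in_list(m->pack_names[i], to_write, nr))
				return 0;
	return 1;
}
```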
Junio C Hamano [Wed, 20 May 2026 01:30:57 +0000 (10:30 +0900)]
Merge branch 'ps/history-fixup'
"git history" learned the "fixup" subcommand.
* ps/history-fixup:
builtin/history: introduce "fixup" subcommand
builtin/history: generalize function to commit trees
replay: allow callers to control what happens with empty commits
Some tests assume that bare repository accesses are by default
allowed; rewrite some of them to avoid the assumption, rewrite
others to explicitly set safe.bareRepository to allow them.
* js/adjust-tests-to-explicitly-access-bare-repo:
safe.bareRepository: default to "explicit" with WITH_BREAKING_CHANGES
status tests: filter `.gitconfig` from status output
ls-files tests: filter `.gitconfig` from `--others` output
t5601: restore `.gitconfig` after includeIf test
t1305: use `--git-dir=.` for bare repo in include cycle test
t1300: remove global config settings injected by test-lib.sh
t7900: do not let `$HOME/.gitconfig` interfere with XDG tests
test-lib: allow bare repository access when breaking changes are enabled
Junio C Hamano [Wed, 20 May 2026 01:30:56 +0000 (10:30 +0900)]
Merge branch 'en/diffstat-utf8-truncation-fix'
The computation to shorten the filenames shown in diffstat added up
the widths of individual UTF-8 characters, but forgot to take
into account error cases (e.g., an invalid UTF-8 sequence, or a
control character).
* en/diffstat-utf8-truncation-fix:
diff: fix out-of-bounds reads and NULL deref in diffstat UTF-8 truncation
Junio C Hamano [Wed, 20 May 2026 01:30:56 +0000 (10:30 +0900)]
Merge branch 'js/mingw-no-nedmalloc'
Stop using the unmaintained custom allocator in the Windows build,
which was the last user of the code.
* js/mingw-no-nedmalloc:
mingw: remove the vendored compat/nedmalloc/ subtree
mingw: drop the build-system plumbing for nedmalloc
mingw: stop using nedmalloc
Update code paths that assumed "unsigned long" was long enough for
"size_t".
* js/objects-larger-than-4gb-on-windows:
ci: run expensive tests on push builds to integration branches
t5608: mark >4GB tests as EXPENSIVE
test-tool synthesize: add precomputed SHA-256 pack for 4 GiB + 1
test-tool synthesize: precompute pack for 4 GiB + 1
test-tool synthesize: use the unsafe hash for speed
t5608: add regression test for >4GB object clone
test-tool: add a helper to synthesize large packfiles
delta, packfile: use size_t for delta header sizes
odb, packfile: use size_t for streaming object sizes
git-zlib: handle data streams larger than 4GB
index-pack, unpack-objects: use size_t for object size
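For context, on 64-bit Windows "unsigned long" is 32 bits wide while
"size_t" is 64 bits wide, so an object size above 4 GiB does not
survive a round trip through "unsigned long". The sketch below shows
the effect in a platform-independent way by using `uint32_t` as a
stand-in for a 32-bit "unsigned long" and `uint64_t` as a stand-in for
a 64-bit "size_t". Both helper names are hypothetical.

```c
#include <stdint.h>

/* Stand-in for "unsigned long" on a platform where it is 32 bits. */
typedef uint32_t ulong32_sketch;

/* Does the size survive being stored in a 32-bit unsigned long? */
static int size_fits_in_ulong32(uint64_t size)
{
	return size <= UINT32_MAX;
}

/* What a silent narrowing conversion would store instead. */
static ulong32_sketch truncate_to_ulong32(uint64_t size)
{
	return (ulong32_sketch)size;
}
```

For an object of 4 GiB + 1 bytes, the size used by the tests in this
series, the narrowed value is 1, which is why such code paths need to
carry "size_t" end to end.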
Stop using `the_repository` in `init_db()` and instead accept
the repository as a parameter. The injection of `the_repository` is thus
bumped one level higher, where callers now pass it in explicitly.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
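The refactoring pattern used throughout this series can be sketched in
miniature as follows. `repository_sketch`, `the_repository_sketch`,
and both accessor functions are hypothetical stand-ins and not Git's
real types or functions.

```c
#include <stddef.h>

struct repository_sketch {
	const char *gitdir;
};

/* Stand-in for the process-wide global repository. */
static struct repository_sketch the_repository_sketch = { ".git" };

/* Before: the function reaches for the global on its own. */
static const char *get_gitdir_global(void)
{
	return the_repository_sketch.gitdir;
}

/* After: the repository is injected by the caller. */
static const char *get_gitdir(const struct repository_sketch *repo)
{
	return repo->gitdir;
}
```

Existing callers keep their behavior by passing the global explicitly,
as in `get_gitdir(&the_repository_sketch)`, while new callers are free
to pass any other repository.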
setup: stop using `the_repository` in `create_reference_database()`
Stop using `the_repository` in `create_reference_database()` and instead
accept the repository as a parameter. The injection of `the_repository`
is thus bumped one level higher, where callers now pass it in
explicitly.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
setup: stop using `the_repository` in `initialize_repository_version()`
Stop using `the_repository` in `initialize_repository_version()` and
instead accept the repository as a parameter. The injection of
`the_repository` is thus bumped one level higher, where callers now pass
it in explicitly.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
setup: stop using `the_repository` in `check_repository_format()`
Stop using `the_repository` in `check_repository_format()` and instead
accept the repository as a parameter. The injection of `the_repository`
is thus bumped one level higher, where callers now pass it in
explicitly.
Furthermore, the function is never used outside "setup.c". Drop its
declaration in "setup.h" and make it static. Note that this requires us
to reorder the function.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
setup: stop using `the_repository` in `upgrade_repository_format()`
Stop using `the_repository` in `upgrade_repository_format()` and instead
accept the repository as a parameter. The injection of `the_repository`
is thus bumped one level higher, where callers now pass it in
explicitly.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
setup: stop using `the_repository` in `setup_git_directory()`
Stop using `the_repository` in `setup_git_directory()` and instead
accept the repository as a parameter. The injection of `the_repository`
is thus bumped one level higher, where callers now pass it in
explicitly.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
setup: stop using `the_repository` in `setup_git_directory_gently()`
Stop using `the_repository` in `setup_git_directory_gently()` and
instead accept the repository as a parameter. The injection of
`the_repository` is thus bumped one level higher, where callers now pass
it in explicitly.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
setup: stop using `the_repository` in `setup_git_env()`
Stop using `the_repository` in `setup_git_env()` and instead accept the
repository as a parameter. The injection of `the_repository` is thus
bumped one level higher, where callers now pass it in explicitly.
Furthermore, the function is never used outside of "setup.c". Drop the
declaration in "environment.h" and make it static.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
setup: stop using `the_repository` in `set_git_work_tree()`
Stop using `the_repository` in `set_git_work_tree()` and instead accept
the repository as a parameter. The injection of `the_repository` is thus
bumped one level higher, where callers now pass it in explicitly.
Similar to the preceding commit, we track whether the worktree has
been initialized already via a global variable so that we can die in
case the repository is re-initialized with a different worktree path.
Store this info in the `struct repository` instead so that we correctly
handle this per repository.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
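A small sketch of why the "already initialized" flag belongs in the
repository rather than in a global. The struct and function below are
hypothetical. The sketch compares paths as plain strings and returns
-1 where the real code would die.

```c
#include <stddef.h>
#include <string.h>

struct repository_sketch {
	const char *worktree;
	int worktree_initialized; /* per-repository, not global */
};

/*
 * Record the worktree for `repo`. Setting the same path again is a
 * no-op, whereas a different path is reported as an error.
 */
static int set_work_tree_sketch(struct repository_sketch *repo,
				const char *path)
{
	if (repo->worktree_initialized)
		return strcmp(repo->worktree, path) ? -1 : 0;

	repo->worktree = path;
	repo->worktree_initialized = 1;
	return 0;
}
```

With a single global flag, initializing a second repository's worktree
would be mistaken for a conflicting re-initialization of the first.
Keeping the flag in each struct lets two repositories be set up
independently.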
setup: stop using `the_repository` in `setup_work_tree()`
Stop using `the_repository` in `setup_work_tree()` and instead accept
the repository as a parameter. The injection of `the_repository` is thus
bumped one level higher, where callers now pass it in explicitly.
Note that the function tracks two bits of information via global
variables. This of course doesn't make much sense anymore now that we
can set up worktrees for arbitrary repositories:
- We track whether the worktree has already been initialized and, if
so, we skip the call to `chdir_notify()` and setenv(3p). It does not
make much sense to store this info in the repository, as we _would_
want to update the environment when switching between worktrees back
and forth.
So instead of storing this info in the repository, we drop this
state entirely and live with the fact that we may execute the logic
twice. It should ultimately be idempotent though and thus not be
much of a problem.
- We track whether the worktree configuration is bogus. If so, and if
later on some caller tries to set up the worktree, then we'll die
instead. This is indeed information that we can move into the
repository itself.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
setup: stop using `the_repository` in `enter_repo()`
Stop using `the_repository` in `enter_repo()` and instead accept the
repository as a parameter. The injection of `the_repository` is thus
bumped one level higher, where callers now pass it in explicitly.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>