Every pktline that we send out via `send_sideband()` currently requires
two syscalls: one to write the pktline's length, and one to send its
data. This typically isn't all that much of a problem, but under extreme
load the syscalls may cause contention in the kernel.
Refactor the code to instead use the newly introduced writev(3p) infra
so that we can send out the data with a single syscall. This reduces the
number of syscalls from around 133,000 calls to write(3p) to around
67,000 calls to writev(3p).
Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commit we have added a compatibility wrapper for the
writev(3p) syscall. Introduce some generic wrappers for this function
that we nowadays take for granted in the Git codebase.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a subsequent commit we're going to add the first caller to
writev(3p). Introduce a compatibility wrapper for this syscall that we
can use on systems that don't have this syscall.
The syscall exists on modern Unixes like Linux and macOS, and seemingly
even for NonStop according to [1]. It doesn't seem to exist on Windows
though.
upload-pack: reduce lock contention when writing packfile data
In our production systems we have recently observed write contention in
git-upload-pack(1). The system in question was consistently streaming
packfiles at a rate of dozens of gigabits per second, but curiously the
system was neither bottlenecked on CPU, memory or IOPS.
We eventually discovered that Git was spending 80% of its time in
`pipe_write()`, out of which almost all of the time was spent in the
`ep_poll_callback` function in the kernel. Quoting the reporter:
This infrastructure is part of an event notification queue designed to
allow for multiple producers to emit events, but that concurrency
safety is guarded by 3 layers of locking. The layer we're hitting
contention in uses a simple reader/writer lock mode (a.k.a. shared
versus exclusive mode), where producers need shared-mode (read mode),
and various other actions use exclusive (write) mode.
The system in question generates workloads where we have hundreds of
git-upload-pack(1) processes active at the same point in time. These
processes end up contending around those locks, and the consequence is
that the Git processes stall.
Now git-upload-pack(1) already has the infrastructure in place to buffer
some of the data it reads from git-pack-objects(1) before actually
sending it out. We only use this infrastructure in very limited ways
though, so we generally end up matching one read(3p) call with one
write(3p) call. Even worse, when the sideband is enabled we end up
matching one read with _two_ writes: one for the pkt-line length, and
one for the packfile data.
Extend our use of the buffering infrastructure so that we soak up bytes
until the buffer is filled up at least 2/3rds of its capacity. The
change is relatively simple to implement as we already know to flush the
buffer in `create_pack_file()` after git-pack-objects(1) has finished.
This significantly reduces the number of write(3p) syscalls we need to
do. Before this change, cloning the Linux repository resulted in around
400,000 write(3p) syscalls. With the buffering in place we only do
around 130,000 syscalls.
Now we could of course go even further and make sure that we always fill
up the whole buffer. But this might cause an increase in read(3p)
syscalls, and some tests show that this only reduces the number of
write(3p) syscalls from 130,000 to 100,000. So overall this doesn't seem
worth it.
Note that the issue could also be fixed by adapting the write buffer
that we use in the downstream git-pack-objects(1) command, and such a
change would have roughly the same result. But the command that
generates the packfile data may not always be git-pack-objects(1) as it
can be changed via "uploadpack.packObjectsHook", so such a fix would
only help in _some_ cases. Regardless of that, we'll also adapt the
write buffer size of git-pack-objects(1) in a subsequent commit.
Helped-by: Matt Smiley <msmiley@gitlab.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
upload-pack: prefer flushing data over sending keepalive
When using the sideband in git-upload-pack(1) we know to send out
keepalive packets in case generating the pack takes too long. These
keepalives take the form of a simple empty pktline.
In the preceding commit we have adapted git-upload-pack(1) to buffer
data more aggressively before sending it to the client. This creates an
obvious optimization opportunity: when we hit the keepalive timeout
while we still hold on to some buffered data, then it makes more sense
to flush out the data instead of sending the empty keepalive packet.
This is overall not going to be a significant win. Most keepalives will
come before the pack data starts, and once pack-objects starts producing
data, it tends to do so pretty consistently. And of course we can't send
data before we see the PACK header, because the whole point is to buffer
the early bit waiting for packfile URIs. But the optimization is easy
enough to realize.
Do so and flush out data instead of sending an empty pktline.
Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `create_pack_file()` is responsible for sending the
packfile data to the client of git-upload-pack(1). As generating the
bytes may take significant computing resources we also have a mechanism
in place that optionally sends keepalive pktlines in case we haven't
sent out any data.
The keepalive logic is purely based poll(3p): we pass a timeout to that
syscall, and if the call times out we send out the keepalive pktline.
While reasonable, this logic isn't entirely sufficient: even if the call
to poll(3p) ends because we have received data on any of the file
descriptors we may not necessarily send data to the client.
The most important edge case here happens in `relay_pack_data()`. When
we haven't seen the initial "PACK" signature from git-pack-objects(1)
yet we buffer incoming data. So in the worst case, if each of the bytes
of that signature arrive shortly before the configured keepalive
timeout, then we may not send out any data for a time period that is
(almost) four times as long as the configured timeout.
This edge case is rather unlikely to matter in practice. But in a
subsequent commit we're going to adapt our buffering mechanism to become
more aggressive, which makes it more likely that we don't send any data
for an extended amount of time.
Adapt the logic so that instead of using a fixed timeout on every call
to poll(3p), we instead figure out how much time has passed since the
last-sent data.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
upload-pack: fix debug statement when flushing packfile data
When git-upload-pack(1) writes packfile data to the client we have some
logic in place that buffers some partial lines. When that buffer still
contains data after git-pack-objects(1) has finished we flush the buffer
so that all remaining bytes are sent out.
Curiously, when we do so we also print the string "flushed." to stderr.
This statement has been introduced in b1c71b7281 (upload-pack: avoid
sending an incomplete pack upon failure, 2006-06-20), so quite a while
ago. What's interesting though is that stderr is typically spliced
through to the client-side, and consequently the client would see this
message. Munging the way how we do the caching indeed confirms this:
It's quite clear that this string shouldn't ever be visible to the
client, so it rather feels like this is a left-over debug statement. The
menitoned commit doesn't mention this line, either.
Remove the debug output to prepare for a change in how we do the
buffering in the next commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jiang Xin [Sat, 31 Jan 2026 13:32:54 +0000 (21:32 +0800)]
Merge branch 'jx/zh_CN' of github.com:jiangxin/git
* 'jx/zh_CN' of github.com:jiangxin/git:
l10n: zh_CN: standardize glossary terms
l10n: zh_CN: updated translation for 2.53
l10n: zh_CN: fix inconsistent use of standard vs. wide colons
Jiang Xin [Fri, 30 Jan 2026 02:38:47 +0000 (10:38 +0800)]
l10n: zh_CN: standardize glossary terms
Add preferred Chinese terminology notes and align existing translations
to the updated glossary. AI-assisted review was used to check and
improve legacy translations.
Jiang Xin [Thu, 29 Jan 2026 13:41:39 +0000 (21:41 +0800)]
l10n: zh_CN: fix inconsistent use of standard vs. wide colons
Replace mixed usage of standard (ASCII) colons ':' with full-width
(wide) colons ':' in Chinese translations to ensure typographic
consistency, as reported by CAESIUS-TIM [1].
Full-width punctuation is preferred in Chinese localization for better
readability and adherence to typesetting conventions.
Junio C Hamano [Sun, 25 Jan 2026 17:08:06 +0000 (09:08 -0800)]
Merge branch 'master' of https://github.com/j6t/git-gui
* 'master' of https://github.com/j6t/git-gui:
git-gui: mark *.po files at any directory level as UTF-8
git-gui i18n: Update Bulgarian translation (558t)
git-gui i18n: Update Bulgarian translation (557t)
Johannes Sixt [Sun, 25 Jan 2026 09:46:23 +0000 (10:46 +0100)]
git-gui: mark *.po files at any directory level as UTF-8
When a commit is viewed in Gitk that changes a file in po/glossary, the
patch text shows mojibake instead of correctly decoded UTF-8 text.
Gitk retrieves the encoding attribute to decide how to treat the bytes
that make up the patch text. There is an attribute definition that all
files are US-ASCII, and a later attribute definition overrides this.
But the override, which specifies UTF-8, applies only to *.po files in
directory po/ and does not apply to subdirectories.
Widen the pattern to apply to all directory levels.
* dk/replay-doc-omit-irrelevant-rev-list-options:
lint-gitlink: preemptively ignore all /ifn?def|endif/ macros
replay: drop rev-list formatting options from manual
Junio C Hamano [Fri, 23 Jan 2026 21:34:36 +0000 (13:34 -0800)]
Merge branch 'js/symlink-windows'
Upstream symbolic link support on Windows from Git-for-Windows.
* js/symlink-windows:
mingw: special-case index entries for symlinks with buggy size
mingw: emulate `stat()` a little more faithfully
mingw: try to create symlinks without elevated permissions
mingw: add support for symlinks to directories
mingw: implement basic `symlink()` functionality (file symlinks only)
mingw: implement `readlink()`
mingw: allow `mingw_chdir()` to change to symlink-resolved directories
mingw: support renaming symlinks
mingw: handle symlinks to directories in `mingw_unlink()`
mingw: add symlink-specific error codes
mingw: change default of `core.symlinks` to false
mingw: factor out the retry logic
mingw: compute the correct size for symlinks in `mingw_lstat()`
mingw: teach dirent about symlinks
mingw: let `mingw_lstat()` error early upon problems with reparse points
mingw: drop the separate `do_lstat()` function
mingw: implement `stat()` with symlink support
mingw: don't call `GetFileAttributes()` twice in `mingw_lstat()`
Junio C Hamano [Fri, 23 Jan 2026 21:34:36 +0000 (13:34 -0800)]
Merge branch 'js/ci-leak-skip-svn'
Dscho observed that SVN tests are taking too much time in CI leak
checking tasks, but most time is spent not in our code but in libsvn
code (which happen to be written in Perl), whose leaks have little
value to discover for us. Skip SVN, P4, and CVS tests in the leak
checking tasks.
* js/ci-leak-skip-svn:
ci: skip CVS and P4 tests in leaks job, too
ci(*-leaks): skip the git-svn tests to save time
Junio C Hamano [Thu, 22 Jan 2026 00:16:28 +0000 (16:16 -0800)]
Merge branch 'rs/tree-wo-the-repository'
Remove implicit reliance on the_repository global in the APIs
around tree objects and make it explicit which repository to work
in.
* rs/tree-wo-the-repository:
cocci: remove obsolete the_repository rules
cocci: convert parse_tree functions to repo_ variants
tree: stop using the_repository
tree: use repo_parse_tree()
path-walk: use repo_parse_tree_gently()
pack-bitmap-write: use repo_parse_tree()
delta-islands: use repo_parse_tree()
bloom: use repo_parse_tree()
add-interactive: use repo_parse_tree_indirect()
tree: add repo_parse_tree*()
environment: move access to core.maxTreeDepth into repo settings
Junio C Hamano [Thu, 22 Jan 2026 00:16:27 +0000 (16:16 -0800)]
Merge branch 'tb/midx-write-corrupt-checksum-fix'
The logic that avoids reusing MIDX files with a wrong checksum was
broken, which has been corrected.
* tb/midx-write-corrupt-checksum-fix:
midx-write.c: assume checksum-invalid MIDXs require an update
t/t5319-multi-pack-index.sh: drop early 'test_done'
"git repack --geometric" did not work with promisor packs, which
has been corrected.
* ps/geometric-repacking-with-promisor-remotes:
builtin/repack: handle promisor packs with geometric repacking
repack-promisor: extract function to remove redundant packs
repack-promisor: extract function to finalize repacking
repack-geometry: extract function to compute repacking split
builtin/pack-objects: exclude promisor objects with "--stdin-packs"
Junio C Hamano [Wed, 21 Jan 2026 16:29:00 +0000 (08:29 -0800)]
Merge branch 'js/prep-symlink-windows'
Further preparation to upstream symbolic link support on Windows.
* js/prep-symlink-windows:
trim_last_path_component(): avoid hard-coding the directory separator
strbuf_readlink(): support link targets that exceed 2*PATH_MAX
strbuf_readlink(): avoid calling `readlink()` twice in corner-cases
init: do parse _all_ core.* settings early
mingw: do resolve symlinks in `getcwd()`
Junio C Hamano [Wed, 21 Jan 2026 16:29:00 +0000 (08:29 -0800)]
Merge branch 'ps/read-object-info-improvements'
The object-info API has been cleaned up.
* ps/read-object-info-improvements:
packfile: drop repository parameter from `packed_object_info()`
packfile: skip unpacking object header for disk size requests
packfile: disentangle return value of `packed_object_info()`
packfile: always populate pack-specific info when reading object info
packfile: extend `is_delta` field to allow for "unknown" state
packfile: always declare object info to be OI_PACKED
object-file: always set OI_LOOSE when reading object info
Junio C Hamano [Wed, 21 Jan 2026 16:28:58 +0000 (08:28 -0800)]
Merge branch 'ps/packfile-store-in-odb-source'
The packfile_store data structure is moved from object store to odb
source.
* ps/packfile-store-in-odb-source:
packfile: move MIDX into packfile store
packfile: refactor `find_pack_entry()` to work on the packfile store
packfile: inline `find_kept_pack_entry()`
packfile: only prepare owning store in `packfile_store_prepare()`
packfile: only prepare owning store in `packfile_store_get_packs()`
packfile: move packfile store into object source
packfile: refactor misleading code when unusing pack windows
packfile: refactor kept-pack cache to work with packfile stores
packfile: pass source to `prepare_pack()`
packfile: create store via its owning source
Junio C Hamano [Wed, 21 Jan 2026 16:28:58 +0000 (08:28 -0800)]
Merge branch 'ps/ref-consistency-checks'
Update code paths that check data integrity around refs subsystem.
cf. <CAOLa=ZShPP3BPXa=YnC-vuX4zF=pUTFdUidZwOdna8bfVTNM9w@mail.gmail.com>
* ps/ref-consistency-checks:
builtin/fsck: drop `fsck_head_link()`
builtin/fsck: move generic HEAD check into `refs_fsck()`
builtin/fsck: move generic object ID checks into `refs_fsck()`
refs/reftable: introduce generic checks for refs
refs/reftable: fix consistency checks with worktrees
refs/reftable: extract function to retrieve backend for worktree
refs/reftable: adapt includes to become consistent
refs/files: introduce function to perform normal ref checks
refs/files: extract generic symref target checks
fsck: drop unused fields from `struct fsck_ref_report`
refs/files: perform consistency checks for root refs
refs/files: improve error handling when verifying symrefs
refs/files: extract function to check single ref
refs/files: remove useless indirection
refs/files: remove `refs_check_dir` parameter
refs/files: move fsck functions into global scope
refs/files: simplify iterating through root refs
Junio C Hamano [Wed, 21 Jan 2026 16:28:57 +0000 (08:28 -0800)]
Merge branch 'tb/macos-iconv-workarounds'
The iconv library on macOS fails to correctly handle stateful
ISO/IEC 2022 encoded strings. Work it around instead of replacing
it wholesale from homebrew.
* tb/macos-iconv-workarounds:
utf8.c: enable workaround for iconv under macOS 14/15
utf8.c: prepare workaround for iconv under macOS 14/15
Jean-Noël Avila [Wed, 21 Jan 2026 13:27:05 +0000 (14:27 +0100)]
lint-gitlink: preemptively ignore all /ifn?def|endif/ macros
Instead of testing if the macro name is ifn?def:: as if it were a inline
macro, it is faster and safer to just ignore such block macro lines before
hand.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
D. Ben Knoble [Tue, 20 Jan 2026 14:05:57 +0000 (09:05 -0500)]
replay: drop rev-list formatting options from manual
The rev-list options in our manuals are quite long; git-replay's manual
is no exception. Since replay doesn't use the formatting options at all
(it has its own output format), drop them.
This is the first time we have needed compound tests [1] for if[n]def in
our documentation:
For both ifdef and ifndef, the "," takes on the intuitive meaning:
- ifdef: if any of the listed attributes are set…
- ifndef: unless any of the listed attributes are set
(Use "+" for "all".)
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Sat, 17 Jan 2026 18:34:17 +0000 (10:34 -0800)]
ci: skip CVS and P4 tests in leaks job, too
Looking at the CI logs, the p4 and cvs tests account for another 24
minutes of test time and they offer minimal value for quite a
similar reason as the previous step.
Let's introduce and use a mechanism to skip these tests to save
some resources.
Suggested-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
I noticed recently that the leak-checking jobs still take a lot of time,
and upon analysis, the git-svn tests contribute significantly to this.
Analyzing a recent CI run, I saw that the Git test suite contains
1,017 tests, running for approximately 5¼ hours total. Of these, 65
git-svn-related tests (~6% of test count) took 42.24 minutes combined,
accounting for ~13.% of the total runtime. This implies that the git-svn
tests are roughly twice as expernsive compared to the other tests.
However, testing git-svn in the leak-checking jobs provides minimal
value: git-svn is implemented as a Perl script, and leak checking only
handles C code. While git-svn does call into Git's built-in commands
that are implemented in C, these are standard Git operations that are
already thoroughly exercised elsewhere in the test suite. Therefore,
running the git-svn tests in the leak-checking jobs only adds to the
overall run time with little value in return.
Given that the leak-checking jobs are particularly time-intensive and
these 42+ minutes of SVN tests per job provide no additional leak
detection value, skip them in the *-leaks jobs to reduce CI runtime.
Assisted-by: Claude Sonnet 4.5 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tian Yuchen [Sat, 17 Jan 2026 06:25:15 +0000 (14:25 +0800)]
t1005: modernize "! test -f" to "test_path_is_missing"
Replace instances of "! test -f <file>" with "test_path_is_missing <file>".
This macro provides better diagnostics when the test fails (it prints
"Path exists:" instead of silently failing).
Signed-off-by: Tian Yuchen <a3205153416@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jiang Xin [Sat, 17 Jan 2026 13:59:38 +0000 (21:59 +0800)]
help: report on whether or not gettext is enabled
When users report that Git has no localized output, we need to check not
only their locale settings, but also whether Git was built with GETTEXT
support in the first place.
Expose this information via the existing build info output by adding a
"gettext: enabled" line to `git version --build-options` (and therefore
also to `git bugreport`) when `NO_GETTEXT` is not defined at build time.
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ramsay Jones [Fri, 16 Jan 2026 20:39:56 +0000 (20:39 +0000)]
t0610-reftable-basics: mitigate a flaky test on cygwin
Test #29 ('ref transaction: corrupted tables cause failure') started to
fail intermittently for me (from v2.52.0-rc0) when running the testsuite
with '-j8'. (Also, having moved to a new laptop and windows 11, rather
than windows 10). If the test is run by hand, or without any parallelism,
then it passes without issue.
When the test fails (e.g. 1 out of 32 parallel runs) the cause is due to
a permission error while corrupting a table file:
./test-lib.sh: line 1010: .git/reftable/0x000000000001-0x000000000002-d89bb8ee.ref: Permission denied
This corruption is done in a shell loop, directly after a 'test_commit',
which uses an ': >"$f"' expression to truncate the file. Adding a sleep
of one second after the 'test_commit' and before the shell loop fixes
the test (it is not clear why). Replacing the redirection shell expression
with a 'test-tool truncate "$f" 0' invocation also provides a fix, which
could simply be another way to change the timing sufficiently to win the
race.
During a debug session, I tried looking at the strace output for the
shell redirection:
$ rm /tmp/hello; echo hello >/tmp/hello; ls -l /tmp/hello
-rw-r--r-- 1 ramsay None 6 Nov 10 17:25 /tmp/hello
$
When comparing the output, the differences seemed to be what you would
expect and, if anything, the shell redirect probably would have taken
longer than the test-tool solution (many fcntl() calls to dup the stdout
to the <fd>). The call to the win32 api NtCreateFile() was identical,
apart from the first (FileHandle) parameter, of course.
In order to fix this flaky test on cygwin, despite not knowing why it
works, replace the shell redirection with the above 'test-tool truncate'
invocation.
Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ramsay Jones [Fri, 16 Jan 2026 20:39:44 +0000 (20:39 +0000)]
t9700/test.pl: fix path type expectation on cygwin
Commit 4ec7ac101b ("t9700: accommodate for Windows paths", 2025-12-17)
changed the type of the absolute path to the git directory from unix to
win32 for both GfW and cygwin. This fixed the test for GfW but causes
new failures on cygwin, since the test expectation is that it uses unix
paths on cygwin. In order to not break cygwin, disable the new code by
removing the "or $^O eq 'cygwin'" sub-expression from the conditional
part of the fix.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Fri, 16 Jan 2026 20:40:28 +0000 (12:40 -0800)]
Merge branch 'kh/doc-patch-id'
"git patch-id" documentation updates.
* kh/doc-patch-id:
doc: patch-id: --verbatim locks in --stable
doc: patch-id: spell out the git-diff-tree(1) form
doc: patch-id: use definite article for the result
patch-id: use “patch ID” throughout
doc: patch-id: capitalize Git version
doc: patch-id: don’t use semicolon between bullet points
René Scharfe [Thu, 15 Jan 2026 22:01:25 +0000 (23:01 +0100)]
cocci: remove obsolete the_repository rules
035c7de9e9e (cocci: apply the "revision.h" part of
"the_repository.pending", 2023-03-28) removed the last of the repo-less
functions and macros mentioned in the_repository.cocci at the time. No
stragglers appeared since then. Remove the applied rules now that they
have outlived their usefulness.
Also add a reminder to eventually remove the just added rules for
tree.h.
Suggested-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
It seems to have caused a few regressions, two of the three known
ones we have proposed solutions for. Let's give ourselves a bit
more room to maneuver during the pre-release freeze period and
restart once the 2.53 ships.
Junio C Hamano [Thu, 15 Jan 2026 15:12:41 +0000 (07:12 -0800)]
Merge branch 'ps/clar-integers'
Import newer version of "clar", unit testing framework.
* ps/clar-integers:
gitattributes: disable blank-at-eof errors for clar test expectations
t/unit-tests: demonstrate use of integer comparison assertions
t/unit-tests: update clar to 39f11fe
Junio C Hamano [Thu, 15 Jan 2026 15:12:41 +0000 (07:12 -0800)]
Merge branch 'kh/replay-invalid-onto-advance'
Improve the error message when a bad argument is given to the
`--onto` option of "git replay". Test coverage of "git replay" has
been improved.
* kh/replay-invalid-onto-advance:
t3650: add more regression tests for failure conditions
replay: die if we cannot parse object
replay: improve code comment and die message
replay: die descriptively when invalid commit-ish is given
replay: find *onto only after testing for ref name
replay: remove dead code and rearrange
builtin/repack: handle promisor packs with geometric repacking
When performing a fetch with an object filter, we mark the resulting
packfile as a promisor pack. An object part of such a pack may miss any
of its referenced objects, and Git knows to handle this case by fetching
any such missing objects from the promisor remote.
The "promisor" property needs to be retained going forward. So every
time we pack a promisor object, the resulting pack must be marked as a
promisor pack. git-repack(1) does this already: when a repository has a
promisor remote, it knows to pass "--exclude-promisor-objects" to the
git-pack-objects(1) child process. Promisor packs are written separately
when doing an all-into-one repack via `repack_promisor_objects()`.
But we don't support promisor objects when doing a geometric repack yet.
Promisor packs do not get any special treatment there, as we simply
merge promisor and non-promisor packs. The resulting pack is not even
marked as a promisor pack, which essentially corrupts the repository.
This corruption couldn't happen in the real world though: we pass both
"--exclude-promisor-objects" and "--stdin-packs" to git-pack-objects(1)
if a repository has a promisor remote, but as those options are mutually
exclusive we always end up dying. And while we made those flags
compatible with one another in a preceding commit, we still end up dying
in case git-pack-objects(1) is asked to repack a promisor pack.
There's multiple ways to fix this:
- We can exclude promisor packs from the geometric progression
altogether. This would have the consequence that we never repack
promisor packs at all. But in a partial clone it is quite likely
that the user generates a bunch of promisor packs over time, as
every backfill fetch would create another one. So this doesn't
really feel like a sensible option.
- We can adapt git-pack-objects(1) to support repacking promisor packs
and include them in the normal geometric progression. But this would
mean that the set of promisor objects expands over time as the packs
are merged with normal packs.
- We can use a separate geometric progression to repack promisor
packs.
The first two options both have significant downsides, so they aren't
really feasible. But the third option fixes both of these downsides: we
make sure that promisor packs get merged, and at the same time we never
expand the set of promisor objects beyond the set of objects that are
already marked as promisor objects.
Implement this strategy so that geometric repacking works in partial
clones.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
repack-promisor: extract function to remove redundant packs
We're about to add a second caller that wants to remove redundant packs
after a geometric repack. Split out the function which does this to
prepare for that.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
repack-geometry: extract function to compute repacking split
We're about to add a second caller that wants to compute the repacking
split for a set of packfiles. Split out the function that computes this
split to prepare for that.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/pack-objects: exclude promisor objects with "--stdin-packs"
It is currently not possible to combine "--exclude-promisor-objects"
with "--stdin-packs" because both flags want to set up a revision walk
to enumerate the objects to pack. In a subsequent commit though we want
to extend geometric repacks to support promisor objects, and for that we
need to handle the combination of both flags.
There are two cases we have to think about here:
- "--stdin-packs" asks us to pack exactly the objects part of the
specified packfiles. It is somewhat questionable what to do in the
case where the user asks us to exclude promisor objects, but at the
same time explicitly passes a promisor pack to us. For now, we
simply abort the request as it is self-contradicting. As we have
also been dying before this commit there is no regression here.
- "--stdin-packs=follow" does the same as the first flag, but it also
asks us to include all objects transitively reachable from any
object in the packs we are about to repack. This is done by doing
the revision walk mentioned further up. Luckily, fixing this case is
trivial: we only need to modify the revision walk to also set the
`exclude_promisor_objects` field.
Note that we do not support the "--exclude-promisor-objects-best-effort"
flag for now as we don't need it to support geometric repacking with
promisor objects.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Mon, 12 Jan 2026 23:45:06 +0000 (18:45 -0500)]
midx-write.c: assume checksum-invalid MIDXs require an update
In 6ce9d558ced (midx-write: skip rewriting MIDX with `--stdin-packs`
unless needed, 2025-12-10), the MIDX machinery learned how to optimize
out unnecessary writes with "--stdin-packs".
In order to do this, it compares the contents of the in-progress write
against a MIDX loaded directly from the object store. We load a separate
MIDX (as opposed to checking our update relative to "ctx.m") because the
MIDX code does not reuse an existing MIDX with --stdin-packs, and always
leaves "ctx.m" as NULL. See commit 0c5a62f14bc (midx-write.c: do not
read existing MIDX with `packs_to_include`, 2024-06-11) for details on
why.
If "ctx.m" is non-NULL, however, it is guaranteed to be checksum-valid,
since we only assign "ctx.m" when "midx_checksum_valid()" returns true.
Since the same guard does not exist for the MIDX we pass to
"midx_needs_update()", we may ignore on-disk corruption when determining
whether or not we can optimize out the write.
Add a similar guard within "midx_needs_update()" to prevent such an
issue.
A more robust fix would involve revising 0c5a62f14bc and teaching the
MIDX generation code how to reuse an existing MIDX even when invoked
with "--stdin-packs", such that we could avoid side-loading the MIDX
directly from the object store in order to call "midx_needs_update()".
For now, pursue the minimal fix.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Mon, 12 Jan 2026 23:45:03 +0000 (18:45 -0500)]
t/t5319-multi-pack-index.sh: drop early 'test_done'
In 6ce9d558ced (midx-write: skip rewriting MIDX with `--stdin-packs`
unless needed, 2025-12-10), an extra 'test_done' was added, causing the
test script to finish before having run all of its tests.
Dropping this extraneous 'test_done' exposes a bug from commit 6ce9d558ced that causes a subsequent test to fail. Mark that test with a
'test_expect_failure' for now, and the subsequent commit will explain
and fix the bug.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 13 Jan 2026 13:21:09 +0000 (05:21 -0800)]
Merge branch 'ps/repack-avoid-noop-midx-rewrite' into tb/midx-write-corrupt-checksum-fix
* ps/repack-avoid-noop-midx-rewrite:
midx-write: skip rewriting MIDX with `--stdin-packs` unless needed
midx-write: extract function to test whether MIDX needs updating
midx: fix `BUG()` when getting preferred pack without a reverse index