Junio C Hamano [Wed, 27 May 2026 05:15:43 +0000 (14:15 +0900)]
Merge branch 'pb/doc-diff-format-updates'
Doc updates.
* pb/doc-diff-format-updates:
diff-format.adoc: mode and hash are 0* for unmerged paths from index only
diff-format.adoc: 'git diff-files' prints two lines for unmerged files
diff-format.adoc: remove mention of diff-tree specific output
Junio C Hamano [Wed, 27 May 2026 05:15:43 +0000 (14:15 +0900)]
Merge branch 'kk/limit-list-optim'
The limit_list() function, which is one of the core parts of the
revision traversal infrastructure, has been optimized by replacing
its use of a linear list with a priority queue.
* kk/limit-list-optim:
revision: use priority queue in limit_list()
Junio C Hamano [Mon, 25 May 2026 00:40:08 +0000 (09:40 +0900)]
Merge branch 'jk/dumb-http-alternate-fix'
The HTTP walker misinterpreted the alternates file that gives an
absolute path when the server URL does not have the final slash
(i.e., "https://example.com" not "https://example.com/").
* jk/dumb-http-alternate-fix:
http: handle absolute-path alternates from server root
Junio C Hamano [Mon, 25 May 2026 00:40:08 +0000 (09:40 +0900)]
Merge branch 'jk/pretty-no-strbuf-presizing'
Remove ineffective strbuf presizing whose computed allocation would
either not have fit in the available memory anyway, or have been too
small due to integer wraparound, causing immediate automatic
growing.
* jk/pretty-no-strbuf-presizing:
pretty: drop strbuf pre-sizing from add_rfc2047()
Junio C Hamano [Mon, 25 May 2026 00:40:07 +0000 (09:40 +0900)]
Merge branch 'mm/diff-U-takes-no-negative-values'
The command line parser for "git diff" learned that a few options
take only non-negative integers.
* mm/diff-U-takes-no-negative-values:
parse-options: clarify what "negated" means for PARSE_OPT_NONEG
xdiff: guard against negative context lengths
diff: reject negative values for -U/--unified
diff: reject negative values for --inter-hunk-context
Junio C Hamano [Thu, 21 May 2026 23:48:20 +0000 (08:48 +0900)]
Merge branch 'ps/maintenance-daemonize-lockfix'
"git maintenance" that goes into the background did not use the
lockfile to prevent multiple maintenance processes from running at
the same time, which has been corrected.
* ps/maintenance-daemonize-lockfix:
run-command: honor "gc.auto" for auto-maintenance
builtin/maintenance: fix locking with "--detach"
Junio C Hamano [Thu, 21 May 2026 03:28:55 +0000 (12:28 +0900)]
Merge branch 'js/mingw-no-nedmalloc' into maint-2.54
Stop using the unmaintained custom allocator in the Windows build,
which was the last user of the code.
* js/mingw-no-nedmalloc:
mingw: remove the vendored compat/nedmalloc/ subtree
mingw: drop the build-system plumbing for nedmalloc
mingw: stop using nedmalloc
Junio C Hamano [Thu, 21 May 2026 03:27:47 +0000 (12:27 +0900)]
Merge branch 'js/maintenance-fix-deadlock-on-win10' into maint-2.54
To help Windows 10 installations, avoid removing files whose
contents are still mmap()'ed.
* js/maintenance-fix-deadlock-on-win10:
maintenance(geometric): do release the `.idx` files before repacking
mingw: optionally use legacy (non-POSIX) delete semantics
Junio C Hamano [Thu, 21 May 2026 03:26:28 +0000 (12:26 +0900)]
Merge branch 'js/ci-github-actions-update' into maint-2.54
Update various GitHub Actions versions.
* js/ci-github-actions-update:
l10n: bump mshick/add-pr-comment from v2 to v3
ci: bump git-for-windows/setup-git-for-windows-sdk from v1 to v2
ci: bump actions/checkout from v5 to v6
ci: bump actions/github-script from v8 to v9
ci: bump actions/{upload,download}-artifact to v7 and v8
ci: bump microsoft/setup-msbuild from v2 to v3
Junio C Hamano [Thu, 21 May 2026 03:06:47 +0000 (12:06 +0900)]
Merge branch 'kn/refs-generic-helpers'
Refactor service routines in the ref subsystem backends.
* kn/refs-generic-helpers:
refs: use peeled tag values in reference backends
refs: add peeled object ID to the `ref_update` struct
refs: move object parsing to the generic layer
update-ref: handle rejections while adding updates
update-ref: move `print_rejected_refs()` up
refs: return `ref_transaction_error` from `ref_transaction_update()`
refs: extract out reflog config to generic layer
refs: introduce `ref_store_init_options`
refs: remove unused typedef 'ref_transaction_commit_fn'
Junio C Hamano [Wed, 20 May 2026 01:30:57 +0000 (10:30 +0900)]
Merge branch 'ps/history-fixup'
"git history" learned the "fixup" subcommand.
* ps/history-fixup:
builtin/history: introduce "fixup" subcommand
builtin/history: generalize function to commit trees
replay: allow callers to control what happens with empty commits
Some tests assume that bare repository accesses are by default
allowed; rewrite some of them to avoid the assumption, rewrite
others to explicitly set safe.bareRepository to allow them.
* js/adjust-tests-to-explicitly-access-bare-repo:
safe.bareRepository: default to "explicit" with WITH_BREAKING_CHANGES
status tests: filter `.gitconfig` from status output
ls-files tests: filter `.gitconfig` from `--others` output
t5601: restore `.gitconfig` after includeIf test
t1305: use `--git-dir=.` for bare repo in include cycle test
t1300: remove global config settings injected by test-lib.sh
t7900: do not let `$HOME/.gitconfig` interfere with XDG tests
test-lib: allow bare repository access when breaking changes are enabled
Junio C Hamano [Wed, 20 May 2026 01:30:56 +0000 (10:30 +0900)]
Merge branch 'en/diffstat-utf8-truncation-fix'
The computation to shorten the filenames shown in diffstat measured
width of individual UTF-8 characters to add up, but forgot to take
into account error cases (e.g., an invalid UTF-8 sequence, or a
control character).
* en/diffstat-utf8-truncation-fix:
diff: fix out-of-bounds reads and NULL deref in diffstat UTF-8 truncation
Junio C Hamano [Wed, 20 May 2026 01:30:56 +0000 (10:30 +0900)]
Merge branch 'js/mingw-no-nedmalloc'
Stop using the unmaintained custom allocator in the Windows build,
which was the last user of the code.
* js/mingw-no-nedmalloc:
mingw: remove the vendored compat/nedmalloc/ subtree
mingw: drop the build-system plumbing for nedmalloc
mingw: stop using nedmalloc
Update code paths that assumed "unsigned long" was long enough for
"size_t".
* js/objects-larger-than-4gb-on-windows:
ci: run expensive tests on push builds to integration branches
t5608: mark >4GB tests as EXPENSIVE
test-tool synthesize: add precomputed SHA-256 pack for 4 GiB + 1
test-tool synthesize: precompute pack for 4 GiB + 1
test-tool synthesize: use the unsafe hash for speed
t5608: add regression test for >4GB object clone
test-tool: add a helper to synthesize large packfiles
delta, packfile: use size_t for delta header sizes
odb, packfile: use size_t for streaming object sizes
git-zlib: handle data streams larger than 4GB
index-pack, unpack-objects: use size_t for object size
"git rebase --update-refs", when used with a rebase.instructionFormat
with "%d" (describe) in it, tried to update local branch HEAD by
mistake, which has been corrected.
Junio C Hamano [Tue, 19 May 2026 00:57:44 +0000 (09:57 +0900)]
Merge branch 'kh/name-rev-custom-format'
A new builtin "git format-rev" is introduced for pretty formatting
one revision expression per line or commit object names found in
running text.
* kh/name-rev-custom-format:
format-rev: introduce builtin for on-demand pretty formatting
name-rev: make dedicated --annotate-stdin --name-only test
name-rev: factor code for sharing with a new command
name-rev: run clang-format before factoring code
name-rev: wrap both blocks in braces
Junio C Hamano [Tue, 19 May 2026 00:57:43 +0000 (09:57 +0900)]
Merge branch 'en/xdiff-cleanup-3'
Preparation of the xdiff/ codebase to work with Rust.
* en/xdiff-cleanup-3:
xdiff/xdl_cleanup_records: make execution of action easier to follow
xdiff/xdl_cleanup_records: make setting action easier to follow
xdiff/xdl_cleanup_records: make limits more clear
xdiff/xdl_cleanup_records: use unambiguous types
xdiff: use unambiguous types in xdl_bogo_sqrt()
xdiff/xdl_cleanup_records: delete local recs pointer
Junio C Hamano [Tue, 19 May 2026 00:57:43 +0000 (09:57 +0900)]
Merge branch 'mc/http-emptyauth-negotiate-fix'
The 'http.emptyAuth=auto' configuration now correctly attempts
Negotiate authentication before falling back to manual credentials.
This allows seamless Kerberos ticket-based authentication without
requiring users to explicitly set 'http.emptyAuth=true'.
* mc/http-emptyauth-negotiate-fix:
doc: clarify http.emptyAuth values
t5563: add tests for http.emptyAuth with Negotiate
http: attempt Negotiate auth in http.emptyAuth=auto mode
http: extract http_reauth_prepare() from retry paths
Junio C Hamano [Sun, 17 May 2026 13:58:30 +0000 (22:58 +0900)]
Merge branch 'hn/git-checkout-m-with-stash'
"git checkout -m another-branch" was invented to deal with local
changes to paths that are different between the current and the new
branch, but it gave only one chance to resolve conflicts. The command
was taught to create a stash to save the local changes.
* hn/git-checkout-m-with-stash:
checkout -m: autostash when switching branches
checkout: rollback lock on early returns in merge_working_tree
sequencer: teach autostash apply to take optional conflict marker labels
sequencer: allow create_autostash to run silently
stash: add --label-ours, --label-theirs, --label-base for apply
Junio C Hamano [Sun, 17 May 2026 13:58:30 +0000 (22:58 +0900)]
Merge branch 'ss/t7004-unhide-git-failures'
Test clean-up.
* ss/t7004-unhide-git-failures:
t7004: avoid subshells to capture git exit codes
t7004: dynamically grab expected state in tests
t7004: drop hardcoded tag count for state verification
Junio C Hamano [Sun, 17 May 2026 13:58:29 +0000 (22:58 +0900)]
Merge branch 'en/backfill-fixes-and-edges'
The 'git backfill' command now rejects revision-limiting options that
are incompatible with its operation, uses standard documentation for
revision ranges, and includes blobs from boundary commits by default
to improve performance of subsequent operations.
* en/backfill-fixes-and-edges:
backfill: default to grabbing edge blobs too
backfill: document acceptance of revision-range in more standard manner
backfill: reject rev-list arguments that do not make sense
Philippe Blain [Fri, 15 May 2026 15:48:11 +0000 (15:48 +0000)]
diff-format.adoc: mode and hash are 0* for unmerged paths from index only
In the "Raw output format" section, we mention that the 'mode' and
'sha1' for "src" and "dst" are 0* if "(creation|deletion) or unmerged".
For unmerged entries, 'mode' and 'sha1' are in fact 0* only when we are
looking at the index, i.e. on the left side for 'git diff-files' and on
the right side for 'git diff-index --cached'. Be more precise by
mentioning this, and while at it uniformize the wording of the "work
tree out of sync with the index" case.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Philippe Blain [Fri, 15 May 2026 15:48:10 +0000 (15:48 +0000)]
diff-format.adoc: 'git diff-files' prints two lines for unmerged files
Since 10637b84d9 (diff-files: -1/-2/-3 to diff against unmerged stage.,
2005-11-29), for unmerged entries 'git diff-files' prints both an
"unmerged" line ('U'), as well as an "in-place edit" line ('M')
comparing stage 2 (by default) with the working tree. The "Raw output
format" documentation however mentions that all commands print a single
line per changed file. Adjust diff-format.adoc to also mention this
special case, for completeness.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Philippe Blain [Fri, 15 May 2026 15:48:09 +0000 (15:48 +0000)]
diff-format.adoc: remove mention of diff-tree specific output
In the "Raw output format" section, we start by mentioning that 'git
diff-tree' prints the hashes of what is being compared. This is only
true in --stdin mode, and is already mentioned in the description of
'--stdin' in git-diff-tree.adoc. Remove this sentence such that we only
focus on the common output between diff-tree, diff-index, diff-files and
René Scharfe [Fri, 15 May 2026 07:33:53 +0000 (09:33 +0200)]
trailer: change strbuf in-place in unfold_value()
Avoid an allocation by doing s/\n\s*/ /g (replacing NL and any following
whitespace with a SP) right in the strbuf instead of copying the result
to a temporary one and swapping them in the end. We can safely do that
because the replacement is never longer than the original string.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
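The in-place rewrite described in this commit can be sketched with a read index and a write index over the same buffer. This is only an illustration, not the actual unfold_value() code: the name sketch_unfold_in_place() and its plain char-buffer interface are assumptions made for this example (the real function works on a strbuf). It is safe for the same reason the message gives: the write index can never overtake the read index.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: replace each NL and any whitespace following it
 * with a single SP, rewriting "buf" in place. "buf" must have room for
 * a terminating NUL at buf[len]. Returns the new length.
 */
size_t sketch_unfold_in_place(char *buf, size_t len)
{
	size_t in = 0, out = 0;

	while (in < len) {
		if (buf[in] == '\n') {
			/* Skip the NL and the whitespace run after it. */
			in++;
			while (in < len && isspace((unsigned char)buf[in]))
				in++;
			buf[out++] = ' ';
		} else {
			buf[out++] = buf[in++];
		}
	}
	buf[out] = '\0';
	return out;
}
```

Because every step consumes at least as many bytes as it writes, no temporary buffer is needed.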
Jeff King [Sat, 16 May 2026 02:23:10 +0000 (22:23 -0400)]
commit: handle large commit messages in utf8 verification
Running t4205 under UBSan with the EXPENSIVE prereq enabled triggers an
error when we try to create a commit message that is over 2GB:
commit.c:1574:6: runtime error: signed integer overflow:
-2147483648 - 1 cannot be represented in type 'int'
The problem is that find_invalid_utf8() is not prepared to handle
large buffers, as it uses an "int" to represent buffer sizes and
offsets.
We can fix this with a few changes:
1. We'll take in "len" as a size_t (which is what the caller has
anyway, since it's working with a strbuf).
2. We need to return a size_t to give the offset to the invalid utf8,
but we also need a sentinel value for "no invalid value"
(previously "-1"). Let's split these to return a bool for "found
invalid utf8" and then pass back the offset as an out-parameter.
We'll switch the function name to match the new semantics.
3. The caller in verify_utf8() uses a "long" to store buffer
positions, which is a bit funny. This goes back to 08a94a145c
(commit/commit-tree: correct latin1 to utf-8, 2012-06-28) and is
perhaps trying to match our use of "unsigned long" for object sizes
(though we don't care about it ever becoming negative here). This
should be a size_t, too, as some platforms (like Windows) still use
a 32-bit long on machines with 64-bit pointers.
4. The "bytes" field within find_invalid_utf8() does not have range
problems. It is the number of bytes the utf8 sequence claims to
have, so is limited by how many bits can be set in a single 8-bit
byte. However, if we leave it as an "int" then the compiler will
complain about the sign mismatch when comparing it to "len". So
let's make it unsigned, too.
All of this is a little silly, of course, because 2GB text commit
messages are clearly nonsense. So we might consider rejecting them
outright, but it is easy enough to make these helper functions more
robust in the meantime.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
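The interface change in items 1 and 2 of this commit (a size_t length, a bool result, and the offset passed back through an out-parameter) might look roughly like the sketch below. The name sketch_has_invalid_utf8() is hypothetical, since the message does not give the new function name, and the validation is deliberately simplified: it checks only lead bytes and continuation bytes, not overlong forms or surrogates.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical sketch: returns true if "buf" contains an invalid
 * sequence and stores the offset of its first byte in "*bad_offset".
 * Returns false (leaving "*bad_offset" untouched) otherwise.
 */
bool sketch_has_invalid_utf8(const char *buf, size_t len, size_t *bad_offset)
{
	size_t pos = 0;

	while (pos < len) {
		unsigned char c = (unsigned char)buf[pos];
		unsigned int bytes; /* continuation bytes claimed by the lead byte */
		unsigned int i;

		if (c < 0x80)
			bytes = 0;
		else if ((c & 0xe0) == 0xc0)
			bytes = 1;
		else if ((c & 0xf0) == 0xe0)
			bytes = 2;
		else if ((c & 0xf8) == 0xf0)
			bytes = 3;
		else
			goto invalid;

		/* pos < len here, so the subtraction cannot wrap. */
		if (bytes > len - pos - 1)
			goto invalid;
		for (i = 1; i <= bytes; i++)
			if (((unsigned char)buf[pos + i] & 0xc0) != 0x80)
				goto invalid;
		pos += bytes + 1;
		continue;
invalid:
		*bad_offset = pos;
		return true;
	}
	return false;
}
```

Splitting the result this way removes the need for a "-1" sentinel, so the offset can use the full size_t range.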
Jeff King [Sat, 16 May 2026 02:16:22 +0000 (22:16 -0400)]
apply: plug leak on "patch too large" error
In apply_patch(), we return immediately if read_patch_file() returns an
error. Traditionally this was OK, since an error from strbuf_read()
would restore the strbuf to its unallocated state.
But since f1c0e3946e (apply: reject patches larger than ~1 GiB,
2022-10-25), we may also return an error if we successfully read the
patch but it is too large. In this case we leak the strbuf contents when
apply_patch() returns.
You can see it in action by running t4141 under LSan with the EXPENSIVE
prereq enabled.
We can fix this in one of two places:
1. In read_patch_file(), we could release the buffer before returning
the error, behaving more like a raw strbuf_read() call.
2. In apply_patch(), we can release the strbuf ourselves before
returning.
I picked the latter, since it future proofs us against read_patch_file()
getting new error modes. We also have a cleanup label in that function
already, so now our error handling at this spot matches the rest of the
function (and all of the variables are initialized such that the rest of
the cleanup is correctly a noop at this point).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
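The second option chosen in this commit (release the buffer in the caller by jumping to a shared cleanup label) can be sketched as follows. Everything here is a stand-in: struct sketch_buf, sketch_read_patch(), sketch_apply_patch() and the tiny size limit are assumptions for illustration and do not reproduce the code in apply.c.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a strbuf; not Git's real type. */
struct sketch_buf {
	char *data;
	size_t len;
};

/* Tiny illustrative limit; the limit in the commit is around 1 GiB. */
#define SKETCH_MAX_PATCH_LEN 16

/* Reads "src" into "buf"; on the too-large error the buffer stays allocated. */
int sketch_read_patch(struct sketch_buf *buf, const char *src)
{
	size_t len = strlen(src);

	buf->data = malloc(len + 1);
	if (!buf->data)
		return -1;
	memcpy(buf->data, src, len + 1);
	buf->len = len;
	return len > SKETCH_MAX_PATCH_LEN ? -1 : 0;
}

/* Every exit path, including the read error, goes through "cleanup". */
int sketch_apply_patch(struct sketch_buf *buf, const char *src)
{
	int res = 0;

	buf->data = NULL;
	buf->len = 0;
	if (sketch_read_patch(buf, src) < 0) {
		res = -1;
		goto cleanup;
	}
	/* ... parsing and applying the patch would happen here ... */
cleanup:
	free(buf->data);
	buf->data = NULL;
	buf->len = 0;
	return res;
}
```

Releasing in the caller means any error mode added to the reader later is covered without further changes.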
limit_list() maintains a date-sorted work queue of commits using a
linked list with commit_list_insert_by_date() for insertion. Each
insertion walks the list to find the right position — O(n) per insert.
In repositories with merge-heavy histories, the symmetric difference
can contain thousands of commits, making this O(n) insertion the
dominant cost.
Replace the sorted linked list with a prio_queue (binary heap). This
gives O(log n) insertion and O(log n) extraction instead of O(n)
insertion and O(1) extraction, which is a net win when the queue is
large.
The still_interesting() and everybody_uninteresting() helpers are
updated to scan the prio_queue's contiguous array instead of walking a
linked list. process_parents() already accepts both a commit_list and
a prio_queue parameter, so the change in limit_list() simply switches
which one is passed.
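To make the complexity argument above concrete, here is a minimal binary max-heap keyed on commit date. It is a sketch under stated assumptions, not Git's prio_queue: the names sketch_heap, sketch_heap_put() and sketch_heap_get(), the fixed capacity, and the use of bare dates instead of commit pointers are all invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HEAP_CAP 64

/* Hypothetical fixed-size heap of commit dates; newest date on top. */
struct sketch_heap {
	uint64_t dates[SKETCH_HEAP_CAP];
	size_t nr;
};

static void sketch_swap(uint64_t *a, uint64_t *b)
{
	uint64_t tmp = *a;
	*a = *b;
	*b = tmp;
}

/* O(log n): append at the end, then sift up toward the root. */
int sketch_heap_put(struct sketch_heap *h, uint64_t date)
{
	size_t i;

	if (h->nr == SKETCH_HEAP_CAP)
		return -1;
	i = h->nr++;
	h->dates[i] = date;
	while (i > 0) {
		size_t parent = (i - 1) / 2;
		if (h->dates[parent] >= h->dates[i])
			break;
		sketch_swap(&h->dates[parent], &h->dates[i]);
		i = parent;
	}
	return 0;
}

/* O(log n): take the root, move the last entry up, then sift down. */
int sketch_heap_get(struct sketch_heap *h, uint64_t *out)
{
	size_t i = 0;

	if (!h->nr)
		return -1;
	*out = h->dates[0];
	h->dates[0] = h->dates[--h->nr];
	for (;;) {
		size_t left = 2 * i + 1, right = left + 1, largest = i;

		if (left < h->nr && h->dates[left] > h->dates[largest])
			largest = left;
		if (right < h->nr && h->dates[right] > h->dates[largest])
			largest = right;
		if (largest == i)
			break;
		sketch_swap(&h->dates[i], &h->dates[largest]);
		i = largest;
	}
	return 0;
}
```

The entries live in one contiguous array, which is also what allows a helper to scan all queued items with a plain loop instead of walking list nodes.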
Benchmark: git rev-list --left-right --count HEAD~N...HEAD
Repository: 2.3M commits, merge-heavy DAG (monorepo)
Best of 5 runs, times in seconds:
The "gc.auto" configuration has traditionally been used to turn off
running git-gc(1) as part of our auto-maintenance. We have eventually
switched over to git-maintenance(1) in a95ce12430 (maintenance: replace
run_auto_gc(), 2020-09-17), and with 1942d48380 (maintenance: optionally
skip --auto process, 2020-08-28) we have introduced "maintenance.auto"
to control whether or not to run auto-maintenance.
At that point though we still shelled out to git-gc(1) internally. So
if "gc.auto=0" was set we would still _execute_ git-maintenance(1), but
the command would have exited fast because git-gc(1) itself knew to
honor the config key.
This has recently changed though, as we have adapted the default
maintenance strategy to not use git-gc(1) anymore. The consequence is
that "gc.auto=0" doesn't have an effect anymore, which is a somewhat
surprising change in behaviour for our users.
Adapt `run_auto_maintenance()` so that it knows to also read "gc.auto",
similar to how it also reads both "maintenance.autoDetach" and
"gc.autoDetach".
Reported-by: Jean-Christophe Manciot <actionmystique@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running git-maintenance(1), we create a lockfile that is supposed
to keep other maintenance processes from running at the same time. This
lockfile is broken though in case the "--detach" flag is passed: the
lockfile is created by the parent process and will be cleaned up either
manually or on exit. But when detaching, the parent will exit before all
of the background maintenance tasks have been run, and consequently the
lock only covers a smaller part of the whole maintenance process.
Fix this bug by reassigning all tempfiles from the parent process to the
child process when daemonizing so that it becomes the responsibility of
the child to clean them up.
Note that this is a broader fix, as we now always reassign tempfiles
when daemonizing. This is a natural consequence of the semantics of
`daemonize()` though, as it essentially promises to continue running the
current process in the background. It is thus sensible to have that
function perform the whole dance of assigning resources to the child
process, including tempfiles.
There's only a single other caller in "daemon.c", but that process
doesn't create any tempfiles before the call to `daemonize()` and is
thus not impacted by this change.
Reported-by: Jean-Christophe Manciot <actionmystique@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Derrick Stolee <stolee@gmail.com>
Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Michael Montalbo [Tue, 12 May 2026 18:10:23 +0000 (18:10 +0000)]
parse-options: clarify what "negated" means for PARSE_OPT_NONEG
The documentation says the flag prevents an option from being
"negated" without specifying what that means. Add a parenthetical
to clarify that it rejects the "--no-<option>" form.
Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Michael Montalbo [Tue, 12 May 2026 18:10:22 +0000 (18:10 +0000)]
xdiff: guard against negative context lengths
The xdemitconf_t fields ctxlen and interhunkctxlen are typed as long
(signed), but negative values are not meaningful for context line
counts. Unlike the diff_options fields changed in the previous two
commits, these cannot be converted to unsigned because the xdiff
arithmetic relies on signed subtraction:
s1 = XDL_MAX(xch->i1 - xecfg->ctxlen, 0);
If ctxlen were unsigned long, the signed operand would be implicitly
converted to unsigned, and the subtraction would wrap to a large
positive value when i1 < ctxlen, defeating the XDL_MAX clamp. The
signed type is required for correct context-window calculations.
The previous two commits reject negative values at the parse layer
for --inter-hunk-context and -U/--unified, so negative values should
no longer reach xdiff in normal use. Add BUG() guards at the top of
xdl_get_hunk() as defense in depth to catch programming errors in
current or future callers that bypass option parsing.
xdl_get_hunk() is called by both xdl_emit_diff() and
xdl_call_hunk_func(), so a single guard covers all xdiff consumers.
Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
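The reason this commit keeps ctxlen signed can be shown with a small stand-alone comparison. The macro SKETCH_MAX and both function names are hypothetical; they only mirror the shape of the XDL_MAX(xch->i1 - xecfg->ctxlen, 0) expression quoted in the message.

```c
#define SKETCH_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Signed arithmetic: a negative difference is clamped to 0 as intended. */
long sketch_start_signed(long i1, long ctxlen)
{
	return SKETCH_MAX(i1 - ctxlen, 0);
}

/* Unsigned context length: the difference wraps, so the clamp never fires. */
unsigned long sketch_start_unsigned(long i1, unsigned long ctxlen)
{
	/* The cast spells out the conversion C would apply implicitly. */
	unsigned long diff = (unsigned long)i1 - ctxlen;

	return SKETCH_MAX(diff, 0);
}
```

With i1 = 2 and ctxlen = 3 the signed version yields 0, while the unsigned version yields the largest unsigned long value instead of a clamped start line.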
Line 503 of a 106-line file, count "999-" is not a valid integer.
The config variable diff.context already rejects negative values, but
the command line callback diff_opt_unified() uses strtol() with no
range check.
Change the type of diff_options.context and its static default from
int to unsigned int, matching the change to interhunkcontext in the
previous commit. The type change requires reworking the callback and
config parsing to validate in a local variable before assigning to
the now-unsigned field.
Unlike --inter-hunk-context which could be converted to OPT_UNSIGNED,
-U needs OPT_CALLBACK_F for PARSE_OPT_OPTARG (bare -U with no value
enables patch output). Add a range check in the callback instead.
Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
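A range check of the kind this commit adds to the callback (validate in a local signed variable, then assign to the unsigned field) could be sketched as below. The function sketch_parse_context() is hypothetical and is not the diff_opt_unified() callback; it only shows strtol() followed by explicit checks for trailing garbage, negative values and overflow.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical parser: stores the value in "*out" and returns 0 on
 * success, or returns -1 for empty input, trailing garbage, negative
 * values, or values that do not fit in an unsigned int.
 */
int sketch_parse_context(const char *arg, unsigned int *out)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(arg, &end, 10);
	if (end == arg || *end != '\0')
		return -1;
	if (errno == ERANGE || val < 0 || (unsigned long)val > UINT_MAX)
		return -1;
	*out = (unsigned int)val;
	return 0;
}
```

Validating before the assignment matters because a negative long stored directly into an unsigned field would silently become a very large context length.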
Hunk 1 covers lines 110-115, hunk 2 starts at 115 (overlap), hunk 3
starts at 116 (overlaps both). The resulting patch cannot be applied.
The config variable diff.interHunkContext already rejects negative
values, but the command line option does not.
Change the type of diff_options.interhunkcontext and its static
default from int to unsigned int, and switch the option parser from
OPT_INTEGER_F to OPT_UNSIGNED. This rejects negative values at parse
time via git_parse_unsigned() and enforces the correct type at compile
time via BARF_UNLESS_UNSIGNED.
Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 12 May 2026 16:26:19 +0000 (12:26 -0400)]
http: handle absolute-path alternates from server root
When a dumb http server reports alternates with an absolute path, we try
to paste that onto the root of the URL we're trying to fetch from. So if
we go to "http://example.com/path/to/child.git" and it tells us about an
alternate at "/parent.git", we'll hit "http://example.com/parent.git".
But there's a bug in computing the base when the URL does not have any
path component at all, like "http://example.com". When looking for the
first slash after the host, strchr() returns NULL, and we compute a
nonsense value for the length of the host portion. And then when we use
that length to copy the base of the URL into a strbuf, we're likely to
fail.
The security implications are minimal here. We store the nonsense length
("serverlen") as an int, so on a 64-bit system it may effectively be
anything (it is zero minus a 64-bit heap pointer, then truncated to
32-bits and stuffed into a signed value). When we feed that length to
strbuf_add(), it is cast into a size_t and one of four things will
happen:
1. If serverlen was negative, it will turn into a very large positive
value and strbuf_add() will fail to allocate, ending the program.
Ditto if serverlen was positive but just very large.
This doesn't really get an attacker anything; the victim will just
fail to clone their evil repo.
2. If serverlen was small enough, we'll successfully extend the target
strbuf, and then copy an arbitrary set of bytes from "base". And
then one of these is true:
a. That set of bytes is much larger than the length of the "base"
string. This is an out-of-bounds read, but there's no
out-of-bounds write, since the strbuf code both allocates and
copies using the same size_t. This is likely to cause a
segfault as we try to read unmapped pages of memory.
b. Like (2a), but if the set of bytes is small enough we might
not segfault. We might read random memory from the process and
copy it into the "target" strbuf.
What happens then? We know that "base" ends with a NUL
terminator, which will be copied into "target" as well. So
even though target.len might be 1000 bytes (or whatever), when
interpreted as a NUL-terminated string, target.buf is still
the exact same string as "base".
And that's all we ever do with target: pass it around as a C
string, and then eventually strbuf_detach() it to become a C
string. So even though there was arbitrary memory copied into
the strbuf, we never access it.
c. The other interesting case is when serverlen is actually
_shorter_ than the length of base. And there we truncate the
string. Probably in a way that makes it totally invalid, but
if you were very unlucky you could turn something like:
http://victim.com.evil.domain:8000
into:
http://victim.com
Which looks like the start of a redirect attack, except that
the attacker could just have written "http://victim.com" in
the first place! Either way we feed it to
is_alternate_allowed(), which is where we check redirect and
protocol rules.
I think we can just treat this like a regular bug.
And it's quite a weird setup in the first place, as it implies that the
root of the web server is serving a repository (i.e., that you can get
something useful from "http://example.com/info/refs"). The bug has been
there since b3661567cf ([PATCH] Add support for alternates in HTTP,
2005-09-14) without anybody noticing.
I kind of doubt anybody really cares about making this work, but it's
easy enough to do so: the host-portion of the URL ends at either the
first slash or the end-of-string. So we can just replace strchr() with
strchrnul().
The test setup is a little gross, as we take over the httpd document
root by shoving our bare-repo components into it. But it demonstrates
the problem and shows that our solution actually allows the alternate to
function, if the server is configured to allow it.
Reported-by: slonkazoid <slonkazoid@slonk.ing>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
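The fix described in this commit (treat the end of the string as the end of the host portion when there is no slash) can be illustrated as follows. Both helpers are hypothetical: sketch_strchrnul() is a portable stand-in for the strchrnul() extension, and sketch_server_len() assumes the host starts after "://", which is an assumption of this example rather than a statement about the code in the HTTP walker.

```c
#include <stddef.h>
#include <string.h>

/* Portable stand-in for strchrnul(): never returns NULL. */
const char *sketch_strchrnul(const char *s, int c)
{
	while (*s && *s != (char)c)
		s++;
	return s;
}

/*
 * Hypothetical helper: length of the "scheme://host" prefix of "base".
 * With strchr() the no-slash case would yield NULL and the pointer
 * subtraction would produce a nonsense length.
 */
size_t sketch_server_len(const char *base)
{
	const char *host = strstr(base, "://");

	host = host ? host + 3 : base;
	return (size_t)(sketch_strchrnul(host, '/') - base);
}
```

Both "http://example.com" and "http://example.com/path/to/child.git" give the same prefix length here, so an absolute alternate such as "/parent.git" is appended to the same server root in either case.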
Jeff King [Tue, 12 May 2026 16:20:22 +0000 (12:20 -0400)]
pretty: drop strbuf pre-sizing from add_rfc2047()
At the top of add_rfc2047() we do this:
strbuf_grow(sb, len * 3 + strlen(encoding) + 100);
where "len" is the size of the header (like an author name) we are about
to encode into the buffer. This pre-sizing is purely an optimization; we
use strbuf_addf() and friends to actually write into the buffer, and
they will grow the buffer as necessary.
But there's a problem with the code above: the input can be arbitrarily
large, so we might overflow a size_t while doing that computation,
ending up with a too-small allocation request. Overflowing requires an
impractically large input on a 64-bit system, but is easy to demonstrate
on a 32-bit system with a commit whose author name is ~1.4GB.
Because this pre-sizing is just an optimization, there's no real harm.
We'll start with a smaller buffer and grow it as necessary. But it
_looks_ like a vulnerability, since some other code may pre-size a
strbuf and then write directly into its buffer. So it's worth avoiding
the overflow in the first place.
The obvious way to do that is via checked operations like st_add() and
friends. But taking a step back, is this pre-sizing actually helping
anything?
The computation goes all the way back to 4234a76167 (Extend
--pretty=oneline to cover the first paragraph,, 2007-06-11), but back
then we really were sizing the array to write into directly! In
674d172730 (Rework pretty_print_commit to use strbufs instead of custom
buffers., 2007-09-10) that switched to a strbuf, and at that point it
was a pure optimization.
Is the optimization helping? I don't think so. Even for a gigantic case
like the 1.4GB author name, I couldn't measure any slowdown when
removing it. And most input will be much smaller, and added to a running
strbuf containing the rest of the email-header output. We can just rely
on strbuf's usual amortized-linear growth.
So deleting the line seems like the best way to go. It eliminates the
integer overflow and makes the code a tiny bit simpler.
Reported-by: Luke Martin <lmartin@paramenoeng.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
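The wraparound discussed in this commit can be reproduced on any platform by doing the same arithmetic in 32 bits. The sketch below is illustrative only: sketch_presize32() imitates the removed "len * 3 + strlen(encoding) + 100" computation with uint32_t standing in for a 32-bit size_t, and sketch_presize32_checked() is a hypothetical checked variant in the spirit of helpers like st_add(), not Git's implementation of them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Imitates the removed computation with 32-bit size arithmetic. */
uint32_t sketch_presize32(uint32_t len, uint32_t encoding_len)
{
	return len * 3u + encoding_len + 100u;
}

/* Checked variant: reports failure instead of silently wrapping. */
bool sketch_presize32_checked(uint32_t len, uint32_t encoding_len,
			      uint32_t *out)
{
	uint64_t wide = (uint64_t)len * 3u + encoding_len + 100u;

	if (wide > UINT32_MAX)
		return false;
	*out = (uint32_t)wide;
	return true;
}
```

For a length of 1,500,000,000 bytes (roughly 1.4 GiB) and a 5-byte encoding name, the unchecked version returns 205,032,809, far smaller than the input, which is the too-small request the message describes.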
To help Windows 10 installations, avoid removing files whose
contents are still mmap()'ed.
* js/maintenance-fix-deadlock-on-win10:
maintenance(geometric): do release the `.idx` files before repacking
mingw: optionally use legacy (non-POSIX) delete semantics
Junio C Hamano [Tue, 12 May 2026 02:04:44 +0000 (11:04 +0900)]
Merge branch 'js/ci-github-actions-update'
Update various GitHub Actions versions.
* js/ci-github-actions-update:
l10n: bump mshick/add-pr-comment from v2 to v3
ci: bump git-for-windows/setup-git-for-windows-sdk from v1 to v2
ci: bump actions/checkout from v5 to v6
ci: bump actions/github-script from v8 to v9
ci: bump actions/{upload,download}-artifact to v7 and v8
ci: bump microsoft/setup-msbuild from v2 to v3
Taylor Blau [Tue, 12 May 2026 00:47:12 +0000 (20:47 -0400)]
pack-bitmap: prevent pattern leak on pseudo-merge re-assignment
When "bitmapPseudoMerge.*.pattern" appears more than once for the same
group, `pseudo_merge_config()` frees the old `regex_t *` pointer
but does not call `regfree()` on it first. This leaks whatever internal
state `regcomp()` allocated.
The final cleanup path in `pseudo_merge_group_release()` does call
`regfree()` before `free()`, so only the intermediate replacement is
affected.
Fix this by guarding the replacement with a NULL check and calling
`regfree()` before `free()` when the pointer is non-NULL.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
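The replacement pattern in this commit (call regfree() and then free() on the old pointer, guarded by a NULL check, before installing a new one) can be sketched with the POSIX regex API. The function sketch_set_pattern() and its slot argument are hypothetical and do not mirror the layout of pseudo_merge_config().

```c
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical setter: compiles "pattern" into "*slot", first releasing
 * any previously compiled expression. Returns 0 on success, -1 on error.
 */
int sketch_set_pattern(regex_t **slot, const char *pattern)
{
	regex_t *re;

	if (*slot) {
		/* Release regcomp()'s internal state before the struct itself. */
		regfree(*slot);
		free(*slot);
		*slot = NULL;
	}

	re = malloc(sizeof(*re));
	if (!re)
		return -1;
	if (regcomp(re, pattern, REG_EXTENDED | REG_NOSUB)) {
		/* This sketch assumes a failed regcomp() leaves nothing to regfree(). */
		free(re);
		return -1;
	}
	*slot = re;
	return 0;
}
```

Calling free() alone would release only the regex_t structure and leave whatever regcomp() allocated behind, which is the leak the commit describes.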
Taylor Blau [Tue, 12 May 2026 00:47:09 +0000 (20:47 -0400)]
Documentation: fix broken `sampleRate` in gitpacking(7)
The documentation explaining some sample configurations for bitmap
pseudo-merges incorrectly uses a sample rate outside of the allowed
(0,1] range.
This dates back to faf558b23ef (pseudo-merge: implement support for
selecting pseudo-merge commits, 2024-05-23), and was likely written when
the allowable range for this configuration was the integral values
between (0,100].
Fix this to conform to the actual allowable range for this
configuration.
Noticed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 12 May 2026 00:47:06 +0000 (20:47 -0400)]
pack-bitmap: reject pseudo-merge "sampleRate" of 0
The "bitmapPseudoMerge.*.sampleRate" configuration controls what
fraction of unstable commits are included in each pseudo-merge group.
The config validation accepts values in the range `[0, 1]`, but a value
of exactly 0 causes a division by zero in `select_pseudo_merges_1()`:
if (j % (uint32_t)(1.0 / group->sample_rate))
When `sample_rate` is 0, `1.0 / 0.0` produces `+inf`, and casting
infinity to `uint32_t` is undefined behavior in C. On most platforms
this yields 0, making the subsequent modulo operation (`j % 0`) a
fatal arithmetic trap.
This path was not previously reachable because an earlier bug caused
all pseudo-merge candidates to be classified as "stable" (where the
sampling rate is not used), regardless of their actual commit date. Now
that the date classification is fixed, the unstable path is exercised
and the division by zero can fire.
Fix this by changing the validation to require a strict lower bound and
thus reject 0.
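
A minimal sketch of the stricter check, assuming the rate is held in a
`double`. Both function names are hypothetical, and the clamp for very
small rates is an extra precaution added for this sketch, not something
the commit message describes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical validator: accept only the half-open range (0, 1]. */
int sample_rate_is_valid(double rate)
{
	return rate > 0.0 && rate <= 1.0;
}

/*
 * Hypothetical helper: the stride used in `j % stride`. Returns 0 for
 * a rejected rate, so callers must check before taking the modulo.
 */
uint32_t sample_stride(double rate)
{
	double stride;

	if (!sample_rate_is_valid(rate))
		return 0;
	stride = 1.0 / rate;
	/* Assumption of this sketch: clamp instead of an out-of-range cast. */
	if (stride >= (double)UINT32_MAX)
		return UINT32_MAX;
	return (uint32_t)stride;
}
```

With a strict lower bound, `1.0 / rate` is always finite by the time it
is converted to an integer.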
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 12 May 2026 00:47:03 +0000 (20:47 -0400)]
pack-bitmap: parse commits in `find_pseudo_merge_group_for_ref()`
`find_pseudo_merge_group_for_ref()` uses the commit's date to classify
it as either "stable" (older than the stable threshold) or "unstable"
(otherwise).
However, to find the relevant commit from a given OID, the function
`find_pseudo_merge_group_for_ref()` uses `lookup_commit()` which does
not parse commits.
Because an unparsed commit has its "date" set to zero, every candidate
is placed in the "stable" bucket regardless of its actual committer
timestamp. This means the `bitmapPseudoMerge.*.threshold` and
`stableThreshold` configuration options have no effect: the
stable/unstable split is always determined by comparing against zero
rather than the real commit date.
The net result is that pseudo-merge groups are partitioned by
`stableSize` instead of the intended decay-based sizing, and the
`sampleRate` knob (which only applies to the unstable path) is never
exercised.
Fix this by calling `repo_parse_commit()` after `lookup_commit()`,
bailing out of the callback if parsing fails.
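
The effect can be modelled with a toy commit type whose date stays zero
until it is parsed. Everything in this sketch (`toy_commit`,
`toy_parse()`, `is_stable()`) is hypothetical and only mimics the
behavior described above; it is not Git's object API.

```c
#include <assert.h>

struct toy_commit {
	int parsed;
	long long date;		/* stays 0 until toy_parse() runs */
	long long stored_date;	/* stands in for the on-disk timestamp */
};

/* Fill in the date on first use; returns -1 on a bad argument. */
int toy_parse(struct toy_commit *c)
{
	if (!c)
		return -1;
	if (!c->parsed) {
		c->date = c->stored_date;
		c->parsed = 1;
	}
	return 0;
}

/* "Stable" means at or before the stable threshold. */
int is_stable(const struct toy_commit *c, long long stable_threshold)
{
	assert(c);
	return c->date <= stable_threshold;
}
```

An unparsed commit compares as date 0, which is older than any
realistic threshold, so it always lands in the stable bucket.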
The corresponding test configures two pseudo-merge groups that both
match all tags. The "stable" group uses `threshold=1.month.ago`, and the
"all" group uses `threshold=now`. The test sets our custom
"GIT_TEST_DATE_NOW" environment variable to the value of
"$test_tick" to align Git's notion of "now" (and therefore
"1.month.ago") with the `test_tick` timestamps, so the commits appear to
be younger than one month: only the "all" group matches them, producing
exactly one pseudo-merge.
Without the fix every commit has `date == 0`, which satisfies `date <=
threshold` for both groups (since 0 is older than one month ago), and
the "stable" group erroneously matches as well.
Now that commits are correctly classified as "unstable", the bug
exercised by the "sampleRate=0" test is reachable, and that test is
marked as failing. It will be fixed in a following commit.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 12 May 2026 00:47:00 +0000 (20:47 -0400)]
pack-bitmap: fix pseudo-merge lookup for shared commits
When a commit appears in more than one pseudo-merge group, its entry in
the commit lookup table has the high bit set in its offset field,
indicating that the offset points to an "extended" table containing the
set of pseudo-merges for that commit.
There are three bugs in this path:
* The `next_ext` offset in `write_pseudo_merges()` undercounts the
per-entry size of the lookup table (8 vs. 12 bytes).
* `nth_pseudo_merge_ext()` calls `read_pseudo_merge_commit_at()` on a
pseudo-merge bitmap offset, misinterpreting it as a 12-byte commit
table entry.
* The error check after `pseudo_merge_ext_at()` in
`apply_pseudo_merges_for_commit()` tests `< -1` instead of `< 0`,
silently swallowing errors from `error()`.
The first bug is on the write side: each commit lookup entry contains a
4- and 8-byte unsigned value for a total of 12 bytes, but the
calculation assumes that the entry only contains 8 bytes of data. This
makes `next_ext` too small, so the extended-table offsets that get
written point into the middle of the non-extended lookup table rather
than past it. The reader then interprets non-extended lookup data as
extended entries, producing garbage.
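
The arithmetic for the first bug can be sketched as follows. The macro
and function names are hypothetical; only the 4-byte plus 8-byte entry
layout comes from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each lookup entry holds a 4-byte and an 8-byte unsigned value. */
#define TOY_LOOKUP_ENTRY_SIZE (sizeof(uint32_t) + sizeof(uint64_t))

/* Hypothetical helper: offset of the first byte past the lookup table. */
uint64_t first_ext_offset(uint64_t table_start, uint64_t nr_entries)
{
	assert(TOY_LOOKUP_ENTRY_SIZE == 12);
	return table_start + nr_entries * TOY_LOOKUP_ENTRY_SIZE;
}
```

For a table of 10 entries starting at offset 100, the extended table
must start at 220. Assuming 8 bytes per entry gives 180, which is still
inside the lookup table.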
The second bug is on the read side and is independently fatal: even with
a correctly positioned extended table, `nth_pseudo_merge_ext()` feeds
the offset it reads (which points at pseudo-merge bitmap data) to
`read_pseudo_merge_commit_at()`. That function tries to parse 12 bytes
as a `pseudo_merge_commit` struct, clobbering `merge->pseudo_merge_ofs`
with whatever happens to be at that location. The caller only needs
`pseudo_merge_ofs`, so the fix is to store the offset directly rather
than re-parsing a commit table entry. The `commit_pos` field is left
untouched, retaining the value that `find_pseudo_merge()` set earlier.
The third bug is latent. With the first two fixes applied, the extended
table is correctly written and read, so `pseudo_merge_ext_at()` does not
fail during normal operation. The `< -1` vs `< 0` distinction only
matters when the bitmap file is corrupt or truncated, in which case the
error would be silently ignored and the code would proceed with
uninitialized data.
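
The third bug comes down to the fact that Git's `error()` returns -1,
which `< -1` never catches. The sketch below uses a stand-in
`report_error()` with the same return convention; `read_ext()` and the
two check helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for an error() style helper: report, then return -1. */
int report_error(const char *msg)
{
	assert(msg);
	fprintf(stderr, "error: %s\n", msg);
	return -1;
}

/* Hypothetical reader that fails when the table is truncated. */
int read_ext(size_t avail, size_t need)
{
	if (avail < need)
		return report_error("extended table truncated");
	return 0;
}

/* The buggy comparison: -1 is not less than -1, so errors slip by. */
int buggy_check(int ret)
{
	return ret < -1;
}

/* The fixed comparison: any negative return is treated as an error. */
int fixed_check(int ret)
{
	return ret < 0;
}
```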
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 12 May 2026 00:46:57 +0000 (20:46 -0400)]
pack-bitmap: fix inverted binary search in `pseudo_merge_at()`
The binary search in `pseudo_merge_at()` has its "lo" and "hi" updates
swapped: when the midpoint's offset is less than the target, it sets `hi
= mi` (searching left) instead of `lo = mi + 1` (searching right), and
vice versa.
This means that lookups for pseudo-merges whose offset is not near the
midpoint of the pseudo-merge table are likely to fail.
In practice, with a single pseudo-merge group this is masked because the
lone entry is always at the midpoint. With multiple groups, the inverted
comparisons cause lookups to search in the wrong direction, potentially
missing entries.
Swap the "lo" and "hi" assignments to search in the correct direction,
making it possible to apply pseudo-merges during fill-in when more than
one pseudo-merge exists in a group.
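
For reference, a generic binary search over ascending offsets with the
"lo" and "hi" updates in the correct direction looks like the sketch
below. This is not the code from `pseudo_merge_at()`; the function name
and return convention are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Find `want` in `ofs`, which must be sorted in ascending order.
 * Returns the index of the match, or -1 if there is none.
 */
long find_offset(const uint64_t *ofs, size_t nr, uint64_t want)
{
	size_t lo = 0, hi = nr;

	assert(ofs || !nr);

	while (lo < hi) {
		size_t mi = lo + (hi - lo) / 2;

		if (ofs[mi] == want)
			return (long)mi;
		if (ofs[mi] < want)
			lo = mi + 1;	/* midpoint too small: search right */
		else
			hi = mi;	/* midpoint too large: search left */
	}
	return -1;
}
```

Swapping the two assignments still finds an entry that sits exactly at
the first midpoint, which is why a single-entry table hides the bug.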
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 12 May 2026 00:46:54 +0000 (20:46 -0400)]
pack-bitmap-write: sort pseudo-merge commit lookup table in pack order
The pseudo-merge commit lookup table stores each commit's position in
the pack- or pseudo-pack order, and is used to perform a binary search
in order to determine which pseudo-merge(s) a given commit belongs to.
However, the table was previously sorted in lexical order (via
`oid_array_sort()`), causing the binary search to fail.
While this causes pseudo-merge bitmaps to be de-facto broken for fill-in
traversal, there are a couple of important points to keep in mind:
* Pseudo-merges are applied during the initial phases of a bitmap-based
traversal via `cascade_pseudo_merges_1()`. This function enumerates
the known pseudo-merges and determines whether each one's parents are
a subset of the traversal roots.
This is a different path than the fill-in traversal, where we are
looking for any pseudo-merges which may be satisfied after visiting
some commit along an object walk, which involves the aforementioned
(broken) binary search.
As a consequence, any pseudo-merges we apply at this stage are done
so correctly.
* While this bug makes applying pseudo-merges during fill-in traversal
effectively broken, it does not produce wrong results. Instead of
applying the *wrong* pseudo-merge, we will simply fail to find
satisfied pseudo-merges, leaving the traversal to use the existing
fill-in routines.
Fix this by sorting the table by bit position before writing, matching
the order that the reader's binary search expects.
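
The requirement that writer and reader agree on the sort key can be
sketched with `qsort()` and `bsearch()` sharing one comparator. The
struct and function names are hypothetical; the field names
`commit_pos` and `pseudo_merge_ofs` are borrowed from the names used
elsewhere in this series.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct toy_lookup_entry {
	uint32_t commit_pos;		/* bit position in pack order */
	uint64_t pseudo_merge_ofs;
};

/* Order entries by bit position, the key the reader searches on. */
static int cmp_by_pos(const void *va, const void *vb)
{
	const struct toy_lookup_entry *a = va, *b = vb;

	return (a->commit_pos > b->commit_pos) -
	       (a->commit_pos < b->commit_pos);
}

/* Writer side: sort before the table is written out. */
void sort_lookup(struct toy_lookup_entry *e, size_t nr)
{
	assert(e || !nr);
	qsort(e, nr, sizeof(*e), cmp_by_pos);
}

/* Reader side: binary search using the same ordering. */
const struct toy_lookup_entry *
find_by_pos(const struct toy_lookup_entry *e, size_t nr, uint32_t pos)
{
	struct toy_lookup_entry key = { pos, 0 };

	return bsearch(&key, e, nr, sizeof(*e), cmp_by_pos);
}
```

If the writer sorted by any other key, such as the object ID, the
reader's search would miss entries even though they are present.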
This does produce a change to the on-disk format insofar as the actual code
now complies with the documented format (for more details, refer to:
Documentation/technical/bitmap-format.adoc). Given that this never
worked in the first place, such a change should be OK to perform.
If an out-of-tree implementation of pseudo-merges happened to generate
bitmaps that comply with the documented format, they will continue to be
read and interpreted as normal.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 12 May 2026 00:46:51 +0000 (20:46 -0400)]
t5333: demonstrate various pseudo-merge bugs
Using the test helper introduced via the previous commit, add various
failing tests demonstrating bugs in the pseudo-merge implementation.
These are all marked as failing with one exception. The "sampleRate=0"
test describes a latent bug, which is only reachable through a code path
that is itself masked by a separate bug. A future commit will fix that
bug, and, in turn, cause the aforementioned test to fail. Accordingly,
that commit will mark the test as failing, and it will be re-marked as
passing in a separate commit which fixes the once-latent bug.
For the rest: the following commits will explain and fix the underlying
bugs in detail.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 12 May 2026 00:46:48 +0000 (20:46 -0400)]
t/helper: add 'test-tool bitmap write' subcommand
In f16eb1c091 (pseudo-merge: fix disk reads from find_pseudo_merge(),
2026-03-31), we noted that `apply_pseudo_merges_for_commit()` is never
triggered by the existing test suite, and that this bears further
investigation.
This patch is the first one to begin that investigation. The following
patches will expose and fix a variety of bugs in the implementation of
pseudo-merge bitmaps.
In order to do so, however, many of these tests require very precise
selection of which commits receive bitmaps and which do not. To date,
there isn't a standard approach to easily facilitate this. Address this
by introducing a `test-tool bitmap write` subcommand that writes a
bitmap for a given packfile, reading the set of commits which should
receive individual bitmaps from stdin like so:
, where "<pack-basename>" is the filename for a specific packfile (e.g.,
"pack-abc123.pack"), and "/path/to/commits.list" is a list of commit
OIDs which will receive bitmaps.
The helper respects `bitmapPseudoMerge.*` configuration for creating
pseudo-merge bitmaps alongside the regular commit bitmaps.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Abhinav Gupta [Mon, 11 May 2026 12:21:53 +0000 (12:21 +0000)]
sequencer: remove todo_add_branch_context.commit
The 'commit' field in 'struct todo_add_branch_context' is unused.
It's written to, but never read from.
add_decorations_to_list() gets the commit passed to it explicitly
as an argument.
Signed-off-by: Abhinav Gupta <mail@abhinavg.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit-reach: early exit paint_down_to_common for single merge-base
Commits not in the commit-graph get GENERATION_NUMBER_INFINITY and
sort to the top of the priority queue. After those, commits with
finite generation numbers are popped in non-increasing order.
When MERGE_BASE_FIND_ALL is not set the first doubly-painted commit
with a finite generation is therefore a best merge-base: no commit
still in the queue can be a descendant of it. Skip the expensive
STALE drain in this case.
Add MERGE_BASE_FIND_ALL to the merge_base_flags enum. Callers that
need every merge-base (repo_get_merge_bases_many, repo_get_merge_bases,
repo_in_merge_bases_many, remove_redundant_no_gen) pass the flag to
preserve existing behavior. git merge-base (without --all) passes 0,
triggering the early exit.
On a 2.2M-commit merge-heavy monorepo with commit-graph:

    HEAD vs ~500:    5,229ms -> 24ms
    HEAD vs ~1000:   4,214ms -> 39ms
    HEAD vs ~5000:   3,799ms -> 46ms
    HEAD vs ~10000:  3,827ms -> 61ms
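
The early exit can be modelled on an array that is already in the order
the priority queue would pop it. This is a toy under that assumption,
not the code in paint_down_to_common(); every `toy_`/`TOY_` name is
invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_GENERATION_INFINITY UINT32_MAX
#define TOY_PARENT1 (1u << 0)
#define TOY_PARENT2 (1u << 1)
#define TOY_FIND_ALL (1u << 0)

struct toy_entry {
	uint32_t generation;
	unsigned paint;
};

/*
 * Walk entries in pop order. Stores in *best the index of the first
 * doubly-painted entry with a finite generation (or -1), and returns
 * how many entries were popped before stopping.
 */
size_t toy_drain(const struct toy_entry *q, size_t nr, unsigned flags,
		 long *best)
{
	const unsigned both = TOY_PARENT1 | TOY_PARENT2;
	size_t i;

	assert(best && (q || !nr));
	*best = -1;
	for (i = 0; i < nr; i++) {
		if ((q[i].paint & both) != both)
			continue;
		if (q[i].generation == TOY_GENERATION_INFINITY)
			continue;
		if (*best < 0) {
			*best = (long)i;
			if (!(flags & TOY_FIND_ALL))
				return i + 1;	/* skip the rest of the drain */
		}
	}
	return nr;
}
```

Without the "find all" flag the walk stops as soon as the first
candidate is seen; with it, every remaining entry is still visited.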
Signed-off-by: Kristofer Karlsson <krka@spotify.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace the boolean ignore_missing_commits parameter in
paint_down_to_common() with an enum merge_base_flags, and thread
the flags through merge_bases_many(), get_merge_bases_many_0(),
and the public repo_get_merge_bases_many_dirty() API.
This makes callsites with boolean parameters easier to read and
prepares the function for additional flags in a subsequent commit.
No functional change: the single caller that used
ignore_missing_commits (repo_in_merge_bases_many) now sets
MERGE_BASE_IGNORE_MISSING_COMMITS in the flags word, and all
other callers pass 0.
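
A sketch of the boolean-to-flags conversion, assuming the flag is a
single bit. The bit value and the `toy_check_commits()` function are
hypothetical; only the enum and constant names come from the text
above.

```c
#include <assert.h>

enum merge_base_flags {
	/* Bit value chosen for this sketch only. */
	MERGE_BASE_IGNORE_MISSING_COMMITS = 1 << 0,
};

/*
 * Hypothetical stand-in for a function that used to take a bare
 * `int ignore_missing_commits` argument.
 */
int toy_check_commits(int nr_missing, enum merge_base_flags flags)
{
	assert(nr_missing >= 0);
	if (nr_missing && !(flags & MERGE_BASE_IGNORE_MISSING_COMMITS))
		return -1;	/* a missing commit is an error */
	return 0;
}
```

A call such as `toy_check_commits(n, MERGE_BASE_IGNORE_MISSING_COMMITS)`
documents itself at the callsite, where a bare `1` would not, and new
bits can be added without changing the signature again.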
Signed-off-by: Kristofer Karlsson <krka@spotify.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
format-rev: introduce builtin for on-demand pretty formatting
Introduce a new builtin for pretty formatting one revision expression
per line or commit object names found in running text.
Sometimes you want to format commits. Most of the time you’re
walking the graph, e.g. getting a range of commits like
`master..topic`. That’s a job for git-log(1).
But there are times when you want to format commits that you encounter
on demand:
• Full hashes in running text that you might want to pretty-print
• git-last-modified(1) outputs full hashes that you can do the same
with
• git-cherry(1) has `-v` for commit subject, but maybe you want
something else?
But now you can’t use git-log(1), git-show(1), or git-rev-list(1):
• You can’t feed commits piecemeal to these commands, one input
for one output; they block until standard in is closed
• You can’t feed a list of possibly duplicate commits, like the output
of git-last-modified(1); they effectively deduplicate the output
Beyond these two points there’s also the input massage problem: you
cannot feed mixed input (revisions mixed with arbitrary text).
One might hope that git-cat-file(1) can save us. But it doesn’t
support pretty formats.
But there is one command that already handles revisions as
arguments, revisions on standard input, and even revisions mixed in
with arbitrary text: git-name-rev(1), the command for outputting
symbolic names for commits.
We made some room in `builtin/name-rev.c` two commits ago. Let’s
now add this new git-format-rev(1) command. Taking inspiration from
git-name-rev(1), there are two modes:
• revs: like git-name-rev(1) in argv mode, but one revision per line
on standard in
• text: like git-name-rev(1) with `--annotate-stdin`
***
We need to add this command to the exception list in
`t/t1517-outside-repo.sh` because it uses “EXPERIMENTAL!”
in the usage line.
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk> Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
name-rev: make dedicated --annotate-stdin --name-only test
The previous commit split the `--name-only` handling:
1. `--annotate-stdin`: uses the new `struct command`
2. The rest: uses `struct name_ref_data`
But there is no dedicated test for the option combination in (1). That
means that the following tests will fail if you neglect to set
`command.u.name_only` properly:
    name-rev --annotate-stdin works with commitGraph
    name-rev --annotate-stdin works with non-monotonic timestamps
even though it has nothing to do with what these tests are supposed
to test.
Let’s add another regression test now that it is relevant.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
name-rev: factor code for sharing with a new command
We are about to introduce a new command git-format-rev(1) to this
file. Let’s factor some code so that we can share it with the new
command.
We want to be able to format commits found in freeform text, and
git-name-rev(1) already has a function for that but for symbolic
names. Let’s use a tagged union for the command-specific payload.
No functional changes.
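
A tagged union for a command-specific payload might look roughly like
the sketch below. Only `struct command` and its `u.name_only` member
are named in this series; the tag enum, the `format` member, and
`describe()` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum command_kind { CMD_NAME_REV, CMD_FORMAT_REV };

struct command {
	enum command_kind kind;		/* tag: which union member is live */
	union {
		int name_only;		/* valid when kind == CMD_NAME_REV */
		const char *format;	/* valid when kind == CMD_FORMAT_REV */
	} u;
};

/* Shared code switches on the tag before touching the payload. */
int describe(const struct command *cmd, char *buf, size_t len)
{
	assert(cmd && buf);
	switch (cmd->kind) {
	case CMD_NAME_REV:
		return snprintf(buf, len, "name-rev(name_only=%d)",
				cmd->u.name_only);
	case CMD_FORMAT_REV:
		return snprintf(buf, len, "format-rev(format=%s)",
				cmd->u.format);
	}
	return -1;
}
```

The shared text-scanning code can then take a `struct command` and stay
unaware of which command it is serving until it needs the payload.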
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>