Junio C Hamano [Wed, 4 Feb 2026 17:22:21 +0000 (09:22 -0800)]
Merge branch 'ar/run-command-hook-take-2' into ar/config-hooks
* ar/run-command-hook-take-2:
receive-pack: convert receive hooks to hook API
receive-pack: convert update hooks to new API
run-command: poll child input in addition to output
hook: add jobs option
reference-transaction: use hook API instead of run-command
transport: convert pre-push to hook API
hook: allow separate std[out|err] streams
hook: convert 'post-rewrite' hook in sequencer.c to hook API
hook: provide stdin via callback
run-command: add stdin callback for parallelization
run-command: add helper for pp child states
t1800: add hook output stream tests
Junio C Hamano [Tue, 3 Feb 2026 21:26:00 +0000 (13:26 -0800)]
diff-highlight: allow testing with Git 3.0 breaking changes
The diff-highlight (in contrib/) comes with its own test script,
which relies on the initial branch name being 'master'. This is not
just encoded in the test logic, but in the illustration in the file
that shows the topology of the history.
Force the initial branch name to 'master' to allow it pass.
Toon Claes [Tue, 3 Feb 2026 10:29:03 +0000 (11:29 +0100)]
cocci: extend MEMZERO_ARRAY() rules
Recently the MEMZERO_ARRAY() macro was introduced. In that commit also
coccinelle rules were added to capture cases that can be converted to
use that macro.
Later a few more cases were manually converted to use the macro, but
coccinelle didn't capture those. Extend the rules to capture those as
well.
In various cases the code could be further beautified by removing
parentheses which are no longer needed. Modify the coccinelle rules to
optimize those as well and fix them.
During conversion indentation also used spaces where tabs should be
used, fix that in one go.
Signed-off-by: Toon Claes <toon@iotcl.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Pushkar Singh [Tue, 3 Feb 2026 16:48:16 +0000 (16:48 +0000)]
subtree: validate --prefix against commit in split
git subtree split currently validates --prefix against the working tree.
This breaks when splitting an older commit or when the working tree does
not contain the subtree, even though the commit does.
For example:
git subtree split --prefix=pkg <commit>
fails if pkg was removed later, even though it exists in <commit>.
Fix this by validating the prefix against the specified commit using
git cat-file instead of the working tree.
Add a test to ensure this behavior does not regress.
Signed-off-by: Pushkar Singh <pushkarkumarsingh1970@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
47beb37b (shortlog: match commit trailers with --group, 2020-09-27)
added the `trailer` bullet point with three paragraphs.[1] Later, 3dc95e09 (shortlog: support arbitrary commit format `--group`s,
2022-10-24) put the single-paragraph bullet point about `format` right
after the first paragraph about `trailer`. That meant that the second
and third paragraphs for `trailer` got moved to `format`.
Move the two paragraphs back to `trailer`. We now also need one blank
line before the final bullet point so that it does not get joined with
the second bullet point.
† 1: Technically the bullet list formatting was immediately fixed to
include all three paragraphs in 63d24fa0 (shortlog: allow multiple
groups to be specified, 2020-09-27)
Acked-by: Jeff King <peff@peff.net> Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Justin Tobler [Tue, 3 Feb 2026 00:10:02 +0000 (18:10 -0600)]
odb: transparently handle common transaction behavior
A new ODB transaction is created and returned via
`odb_transaction_begin()` and stored in the ODB. Only a single
transaction may be pending at a time. If the ODB already has a
transaction, the function is expected to return NULL. Similarly, when
committing a transaction via `odb_transaction_commit()` the transaction
being committed must match the pending transaction and upon commit reset
the ODB transaction to NULL.
These behaviors apply regardless of the ODB transaction implementation.
Move the corresponding logic into `odb_transaction_{begin,commit}()`
accordingly.
Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Justin Tobler [Tue, 3 Feb 2026 00:10:01 +0000 (18:10 -0600)]
odb: prepare `struct odb_transaction` to become generic
An ODB transaction handles how objects are stored temporarily and
eventually committed. Due to object storage being implemented
differently for a given ODB source, the ODB transactions must be
implemented in a manner specific to the source the objects are being
written to. To provide generic transactions, `struct odb_transaction` is
updated to store a commit callback that can be configured to support a
specific ODB source. For now `struct odb_transaction_files` is the
only transaction type and what is always returned when starting a
transaction.
Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Justin Tobler [Tue, 3 Feb 2026 00:10:00 +0000 (18:10 -0600)]
object-file: rename transaction functions
In a subsequent commit, ODB transactions are made more generic to
facilitate each ODB source providing its own transaction handling.
Rename `object_file_transaction_{begin,commit}()` to
`odb_transaction_files_{begin,commit}()` to better match the future
source specific transaction implementation.
Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Justin Tobler [Tue, 3 Feb 2026 00:09:59 +0000 (18:09 -0600)]
odb: store ODB source in `struct odb_transaction`
Each `struct odb_transaction` currently stores a reference to the
`struct object_database`. Since transactions are handled per object
source, instead store a reference to the source.
Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Sun, 1 Feb 2026 11:47:53 +0000 (12:47 +0100)]
blame: fix coloring for repeated suspects
The option --ignore-rev passes the blame to an older commit. This can
cause adjacent scoreboard entries to blame the same commit. Currently
we only look at the present entry when determining whether a line needs
to be colored for --color-lines. Check the previous entry as well.
Reported-by: Seth McDonald <sethmcmail@pm.me> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
HodaSalim [Mon, 2 Feb 2026 16:18:00 +0000 (18:18 +0200)]
t9160:modernize test path checking
Replace old-style path checks with Git's dedicated test helpers:
- test -f → test_path_is_file
- test -d → test_path_is_dir
- test -s → test_file_not_empty
Fix typos with the word "subsequent"
Found using: git grep "test -[efd]" t/
This improves test readability and provides better error messages
when path checks fail.
Signed-off-by: HodaSalim <hoda.s.salim@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
.github/CONTRIBUTING.md: link to SubmittingPatches on git-scm.com
The relative link to SubmittingPatches is broken when viewed through
GitHub's specialized "Contributing" tab. Update the link to point to
the documentation on git-scm.com to be consistent with other links in
the same file. Also, wrap the line to improve readability.
Signed-off-by: Abdalrhman Mohamed <Eng.Abdalrhman.Abdalmonem@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jiang Xin [Sat, 31 Jan 2026 13:32:54 +0000 (21:32 +0800)]
Merge branch 'jx/zh_CN' of github.com:jiangxin/git
* 'jx/zh_CN' of github.com:jiangxin/git:
l10n: zh_CN: standardize glossary terms
l10n: zh_CN: updated translation for 2.53
l10n: zh_CN: fix inconsistent use of standard vs. wide colons
Jiang Xin [Fri, 30 Jan 2026 02:38:47 +0000 (10:38 +0800)]
l10n: zh_CN: standardize glossary terms
Add preferred Chinese terminology notes and align existing translations
to the updated glossary. AI-assisted review was used to check and
improve legacy translations.
Tian Yuchen [Fri, 30 Jan 2026 17:01:23 +0000 (01:01 +0800)]
t/perf/p3400: speed up setup using fast-import
The setup phase in 't/perf/p3400-rebase.sh' generates 100 commits to
simulate a noisy history. It currently uses a shell loop that invokes
'git add', 'git commit', 'test_seq', and 'sort' in each iteration.
This incurs significant overhead due to repeated process spawning.
Optimize the setup by using 'git fast-import' to generate the commit
history. Additionally, pre-compute the forward and reversed file contents
to avoid repetitive execution of 'seq' and 'sort'.
To ensure the test measures rebase performance against a consistent
object layout (rather than the suboptimal pack/loose objects created
by the raw import), perform a full repack (`git repack -a -d`) at the
end of the setup.
This reduces the setup time significantly while maintaining the validity
of the subsequent performance tests.
Performance enhancement (Average value of 5 tests):
Real Rebase
Before: 29.045s 13.34s
After: 21.989s 12.84s
Measured on Lenovo Yoga 2020, Ubuntu 24.04.
Signed-off-by: Tian Yuchen <a3205153416@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
show-index: warn when falling back to SHA-1 outside a repository
When 'git show-index' is run outside of a repository and no hashing
algorithm is specified via --object-format, it silently falls back
to SHA-1, relying on the historical default.
This works for existing SHA-1 based index files, but the behavior can
be ambiguous and confusing when the input index file uses a different
hash algorithm, such as SHA-256.
Add a warning when this fallback happens to make the assumption
explicit and to guide users toward using --object-format when needed.
Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix this error by ensuring that the given revision peels to a commit.
This change also adds a test to verify git-last-modified(1) can operate
on an annotated tag. For this an annotated tag is added that points to
the second commit. But this causes ambiguous results when calling
git-name-rev(1) with `--tags`, because now two tags point to the same
commit. To remove this ambiguity, pass `--exclude=<tag>` to
git-name-rev(1) to exclude the new annotated tag.
Reported-by: Gusted <gusted@codeberg.org> Signed-off-by: Toon Claes <toon@iotcl.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Chris Idema [Thu, 29 Jan 2026 11:09:04 +0000 (11:09 +0000)]
git-gui: shift tabstops to account for the first column of patch text
When reviewing a change before staging, it is desirable to see text after
tabstops aligned the same way as in the text editor. However, since there
is always an additional character in column one in patch text ('+', '-',
or space), the alignment is broken if text before the first tab character
is just long enough to push the stop to the next tab position.
Commit a43c5f51a4b1 (git-gui: add configurable tab size to the diff view,
2012-02-12) added infrastructure that manipulates the tabstop positions
of the Tk text widget. However, it does so only when a 3-way diff is
shown and only so that it takes into account the one additional markup at
the beginning of lines. This only achieved that alignment does not get
worse for 3-way diffs compared to regular patch text, but left misaligned
text in regular patch text unmodified.
Use and modify this infrastructure to shift tabstops by one position for
regular patch text and two positions for 3-way diffs. Existing code
already resets the tabstops to an unshifted position when contents of
untracked files are displayed.
Signed-off-by: Chris Idema <github_chris_idema@proton.me>
[j6t: extend commit message] Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Amisha Chhajed [Thu, 29 Jan 2026 12:12:20 +0000 (17:42 +0530)]
string-list: add string_list_sort_u() that mimics "sort -u"
Many callsites of string_list_remove_duplicates() call it
immdediately after calling string_list_sort(), understandably
as the former requires string-list to be sorted, it is clear
that these places are sorting only to remove duplicates and
for no other reason.
Introduce a helper function string_list_sort_u that combines
these two calls that often appear together, to simplify
these callsites. Replace the current calls of those methods with
string_list_sort_u().
Signed-off-by: Amisha Chhajed <amishhhaaaa@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Amisha Chhajed [Thu, 29 Jan 2026 12:12:19 +0000 (17:42 +0530)]
u-string-list: add unit tests for string-list methods
Unit tests in u-string-list.c does not cover several methods
in string-list, this gap in coverage makes it difficult to
ensure no regressions are introduced in future changes.
Add unit tests for the following methods to enhance coverage:
string_list_remove_empty_items()
unsorted_string_list_has_string()
unsorted_string_list_delete_item()
string_list_has_string()
string_list_insert()
string_list_sort()
string_list_remove()
Signed-off-by: Amisha Chhajed <amishhhaaaa@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jiang Xin [Thu, 29 Jan 2026 13:41:39 +0000 (21:41 +0800)]
l10n: zh_CN: fix inconsistent use of standard vs. wide colons
Replace mixed usage of standard (ASCII) colons ':' with full-width
(wide) colons ':' in Chinese translations to ensure typographic
consistency, as reported by CAESIUS-TIM [1].
Full-width punctuation is preferred in Chinese localization for better
readability and adherence to typesetting conventions.
Emily Shaffer [Wed, 28 Jan 2026 21:39:27 +0000 (23:39 +0200)]
receive-pack: convert receive hooks to hook API
This converts the last remaining hooks to the new hook API, for
the same benefits as the previous conversions (no need to toggle
signals, manage custom struct child_process, call find_hook(),
prepares for specifying hooks via configs, etc.).
See the previous three commits for a more in-depth explanation of
how this all works.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Emily Shaffer [Wed, 28 Jan 2026 21:39:26 +0000 (23:39 +0200)]
receive-pack: convert update hooks to new API
The hook API avoids creating a custom struct child_process and other
internal hook plumbing (e.g. calling find_hook()) and prepares for
the specification of hooks via configs or running parallel hooks.
Execution is still sequential through the run_hooks_opt .jobs == 1,
which is the unchanged default for all hooks.
When use_sideband==1, the async thread redirects the hook outputs to
sideband 2, otherwise it is not used and the hooks write directly to
the fds inherited from the main parent process.
When .jobs == 1, run-command's poll loop is avoided entirely via the
ungroup=1 option like before (this was Jeff's suggestion), achieving
the same real-time output performance.
When running in parallel, run-command with ungroup=0 will capture
and de-interleave the output of each hook, then write to the parent
stderr which is redirected via dup2 to the sideband thread, so that
each parallel hook output is presented clearly to the client.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adrian Ratiu [Wed, 28 Jan 2026 21:39:25 +0000 (23:39 +0200)]
run-command: poll child input in addition to output
Child input feeding might hit the 100ms output poll timeout as a
side-effect of the ungroup=0 design when feeding multiple children
in parallel and buffering their outputs.
This throttles the write throughput as reported by Kristoffer.
Peff also noted that the parent might block if the write pipe is full
and cause a deadlock if both parent + child wait for one another.
Thus we refactor the run-command I/O loop so it polls on both child
input and output fds to eliminate the risk of artificial 100ms
latencies and unnecessarily blocking the main process.
This ensures that parallel hooks are fed data ASAP while maintaining
responsiveness for (sideband) output.
It's worth noting that in our current design, sequential execution
is not affected by this because it still uses the ungroup=1 behavior,
so there are no run-command induced buffering delays since the child
sequentially outputs directly to the parent-inherited fds.
Reported-by: Kristoffer Haugsbakk <kristofferhaugsbakk@fastmail.com> Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adrian Ratiu [Wed, 28 Jan 2026 21:39:24 +0000 (23:39 +0200)]
hook: add jobs option
Allow the API callers to specify the number of jobs across which
hook execution can be parallelized. It defaults to 1 and no hook
currently changes it, so all hooks run sequentially as before.
This allows us to both pave the way for parallel hook execution
(that will be a follow-up patch series building upon this) and to
finish the API conversion of builtin/receive-pack.c, keeping the
output async sideband thread ("muxer") design as Peff suggested.
When .jobs==1 nothing changes, the "copy_to_sideband" async thread
still outputs directly via sideband channel 2, keeping the current
(mostly) real-time output characteristics, avoids unnecessary poll
delays or deadlock risks.
When .jobs > 1, a more complex muxer is needed to buffer the hook
output and avoid interleaving. After working on this mux I quickly
realized I was re-implementing run-command with ungroup=0 so that
idea was dropped in favor of run-command which outputs to stderr.
In other words, run-command itself already can buffer/deinterleave
pp child outputs (ungroup=0), so we can just connect its stderr to
the sideband async task when jobs > 1.
Maybe it helps to illustrate how it works with ascii graphics:
Adrian Ratiu [Wed, 28 Jan 2026 21:39:23 +0000 (23:39 +0200)]
reference-transaction: use hook API instead of run-command
Convert the reference-transaction hook to the new hook API,
so it doesn't need to set up a struct child_process, call
find_hook or toggle the pipe signals.
The stdin feed callback is processing one ref update per
call. I haven't noticed any performance degradation due
to this, however we can batch as many we want in each call,
to ensure a good pipe throughtput (i.e. the child does not
wait after stdin).
Helped-by: Emily Shaffer <nasamuffin@google.com> Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Emily Shaffer [Wed, 28 Jan 2026 21:39:22 +0000 (23:39 +0200)]
transport: convert pre-push to hook API
Move the pre-push hook from custom run-command invocations to
the new hook API which doesn't require a custom child_process
structure and signal toggling.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adrian Ratiu [Wed, 28 Jan 2026 21:39:21 +0000 (23:39 +0200)]
hook: allow separate std[out|err] streams
The hook API assumes that all hooks merge stdout to stderr.
This assumption is proven wrong by pre-push: some of its users
actually expect separate stdout and stderr streams and merging
them will cause a regression.
Therefore this adds a mechanism to allow pre-push to separate
the streams, which will be used in the next commit.
The mechanism is generic via struct run_hooks_opt just in case
there are any more surprise exceptions like this.
Reported-by: Chris Darroch <chrisd@apache.org> Suggested-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Emily Shaffer [Wed, 28 Jan 2026 21:39:20 +0000 (23:39 +0200)]
hook: convert 'post-rewrite' hook in sequencer.c to hook API
Replace the custom run-command calls used by post-rewrite with
the newer and simpler hook_run_opt(), which does not need to
create a custom 'struct child_process' or call find_hook().
Another benefit of using the hook API is that hook_run_opt()
handles the SIGPIPE toggle logic.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Emily Shaffer [Wed, 28 Jan 2026 21:39:19 +0000 (23:39 +0200)]
hook: provide stdin via callback
This adds a callback mechanism for feeding stdin to hooks alongside
the existing path_to_stdin (which slurps a file's content to stdin).
The advantage of this new callback is that it can feed stdin without
going through the FS layer. This helps when feeding large amount of
data and uses the run-command parallel stdin callback introduced in
the preceding commit.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Emily Shaffer [Wed, 28 Jan 2026 21:39:18 +0000 (23:39 +0200)]
run-command: add stdin callback for parallelization
If a user of the run_processes_parallel() API wants to pipe a large
amount of information to the stdin of each parallel command, that
data could exceed the pipe buffer of the process's stdin and can be
too big to store in-memory via strbuf & friends or to slurp to a file.
Generally this is solved by repeatedly writing to child_process.in
between calls to start_command() and finish_command(). For a specific
pre-existing example of this, see transport.c:run_pre_push_hook().
This adds a generic callback API to run_processes_parallel() to do
exactly that in a unified manner, similar to the existing callback APIs,
which can then be used by hooks.h to convert the remaining hooks to the
new, simpler parallel interface.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adrian Ratiu [Wed, 28 Jan 2026 21:39:17 +0000 (23:39 +0200)]
run-command: add helper for pp child states
There is a recurring pattern of testing parallel process child states
and file descriptors to determine if a child is running, receiving any
input or if it's ready for cleanup.
Name the pp_child structure and introduce a helper to make the checks
more readable.
Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adrian Ratiu [Wed, 28 Jan 2026 21:39:16 +0000 (23:39 +0200)]
t1800: add hook output stream tests
Lack of test coverage in this area led to some regressions while
converting the remaining hooks to the newer hook.[ch] API.
Add some tests to verify hooks write to the expected output streams.
Suggested-by: Patrick Steinhardt <ps@pks.im> Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sam Bostock [Wed, 28 Jan 2026 05:39:45 +0000 (05:39 +0000)]
worktree: clarify that --expire only affects missing worktrees
The --expire option for "git worktree list" and "git worktree prune"
only affects worktrees whose working directory path no longer exists.
The help text did not make this clear, and the documentation
inconsistently used "unused" for prune but "missing" for list.
Update the help text and documentation to consistently describe these
as "missing worktrees", and use "prune" instead of "expire" when
describing the effect on missing worktrees since the terminology is
clearer.
While at it, expand the description of the "prune" subcommand itself
to better explain what it does and when to use it, as suggested by
Junio.
Helped-by: Eric Sunshine <sunshine@sunshineco.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Sam Bostock <sam@sambostock.ca> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Phillip Wood [Mon, 26 Jan 2026 10:48:52 +0000 (10:48 +0000)]
xdiff: remove unused data from xdlclass_t
Prior to commit 6d507bd41a (xdiff: delete fields ha, line, size
in xdlclass_t in favor of an xrecord_t, 2025-09-26) xdlclass_t
carried a copy of all the fields in xrecord_t. That commit embedded
xrecord_t in xdlclass_t to make it easier to change the types of
the fields in xrecord_t. However commit 6a26019c81 (xdiff: split
xrecord_t.ha into line_hash and minimal_perfect_hash, 2025-11-18)
added the "minimal_perfect_hash" field to xrecord_t which is not
used by xdlclass_t. To avoid wasting space stop copying the whole
of xrecord_t and just copy the pointer and length that we need to
intern the line. Together with the previous commit this effectively
reverts 6d507bd41a.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Phillip Wood [Mon, 26 Jan 2026 10:48:51 +0000 (10:48 +0000)]
xdiff: remove "line_hash" field from xrecord_t
Prior to commit 6a26019c81 (xdiff: split xrecord_t.ha into line_hash
and minimal_perfect_hash, 2025-11-18) the "ha" field of xrecord_t
initially held the "line_hash" value and once the line had been
interned that field was updated to hold the "minimal_perfect_hash". The
"line_hash" is only used to intern the line so there is no point in
storing it after all the input lines have been interned.
Removing the "line_hash" field from xrecord_t and storing it in
xdlclass_t where it is actually used makes it clearer that it is a
temporary value and it should not be used once we're calculated the
"minimal_perfect_hash". This also reduces the size of xrecord_t by 25%
on 64-bit platforms and 40% on 32-bit platforms. While the struct is
small we create one instance per input line so any saving is welcome.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
odb: drop unused `for_each_{loose,packed}_object()` functions
We have converted all callers of `for_each_loose_object()` and
`for_each_packed_object()` to use their new replacement functions
instead. We can thus remove them now.
Do so and inline `packfile_store_for_each_object_internal()` now that it
only has a single callsite again. This makes it a bit easier to follow
the callback indirection that is happening there.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
To figure out which objects expired objects we enumerate all loose and
packed objects individually so that we can figure out their respective
mtimes. Refactor the code to instead use `odb_for_each_object()` with a
request that ask for the object mtime instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/pack-objects: use `packfile_store_for_each_object()`
When enumerating objects that are supposed to be stored in a new cruft
pack we use `for_each_packed_object()` and then derive each object's
mtime individually. Refactor this logic to instead use the new
`packfile_store_for_each_object()` function with an object info request
that asks for the respective mtimes.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
odb: introduce mtime fields for object info requests
There are some use cases where we need to figure out the mtime for
objects. Most importantly, this is the case when we want to prune
unreachable objects. But getting at that data requires users to manually
derive the info either via the loose object's mtime, the packfiles'
mtime or via the ".mtimes" file.
Introduce a new `struct object_info::mtimep` pointer that allows callers
to request an object's mtime. This new field will be used in a
subsequent commit.
Note that the concept of "mtime" is ambiguous: given an object, it may
be stored multiple times in the object database, and each of these
instances may have a different mtime. Disambiguating these mtimes is
nothing that can happen on the generic ODB layer: the caller may search
for the oldest object, the newest object, or even the relation of object
mtimes depending on the specific source they are located in. As such, it
is the responsibility of the caller to disambiguate mtimes.
A consequence of this is that it's most likely incorrect to look up the
mtime via `odb_read_object_info()`, as this interface does not give us
enough information to disambiguate the mtime. Document this accordingly
and tell users to use `odb_for_each_object()` instead.
Even with this gotcha though it's sensible to have this request as part
of the object info, as the mtime is a property of the object storage
format. If we for example had a "black-box" storage backend, we'd still
need to be able to query it for the mtime info in a generic way.
We could introduce a safety mechanism that for example calls `BUG()` in
case we look up the mtime outside of `odb_for_each_object()`. But that
feels somewhat heavy-handed.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
treewide: drop uses of `for_each_{loose,packed}_object()`
We're using `for_each_loose_object()` and `for_each_packed_object()` at
a couple of callsites to enumerate all loose and packed objects,
respectively. These functions will be removed in a subsequent commit in
favor of the newly introduced `odb_source_loose_for_each_object()` and
`packfile_store_for_each_object()` replacements.
Prepare for this by refactoring the sites accordingly.
Note that ideally, we'd convert all callsites to use the generic
`odb_for_each_object()` function already. But for some callers this is
not possible (yet), and it would require some significant refactorings
to make this work. Converting these site will thus be deferred to a
later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
treewide: enumerate promisor objects via `odb_for_each_object()`
We have multiple callsites where we enumerate all promisor objects in
the object database via `for_each_packed_object()`. This is done by
passing the `ODB_FOR_EACH_OBJECT_PROMISOR_ONLY` flag, which causes us to
skip over all non-promisor objects.
These callsites can be trivially converted to `odb_for_each_object()` as
we know to skip enumeration of loose objects in case the `PROMISOR_ONLY`
flag was passed by the caller.
Refactor the sites accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/fsck: refactor to use `odb_for_each_object()`
In git-fsck(1) we have two callsites where we iterate over all objects
via `for_each_loose_object()` and `for_each_packed_object()`. Both of
these are trivially convertible with `odb_for_each_object()`.
Refactor these callsites accordingly.
Note that `odb_for_each_object()` may iterate over the same object
multiple times, for example when it exists both in packed and loose
format. But this has already been the case beforehand, so this does not
result in a change in behaviour.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new function `odb_for_each_object()` that knows to iterate
through all objects part of a given object database. This function is
essentially a simple wrapper around the object database sources.
Subsequent commits will adapt callers to use this new function.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
packfile: introduce function to iterate through objects
Introduce a new function `packfile_store_for_each_object()`. This
function is equivalent to `odb_source_loose_for_each_object()`, except
that it:
- Works on a single packfile store instead of working on the object
database level. Consequently, it will only yield packed objects of a
single object database source.
- Passes a `struct object_info` to the callback function.
As such, it provides the same callback interface as we already provide
for loose objects now. These functions will be used in a subsequent step
to implement `odb_for_each_object()`.
The `for_each_packed_object()` function continues to exist for now, but
it will be removed at the end of this patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
packfile: extract function to iterate through objects of a store
In the next commit we're about to introduce a new function that knows to
iterate through objects of a given packfile store. Same as with the
equivalent function for loose objects, this new function will also be
agnostic of backends by using a `struct object_info`.
Prepare for this by extracting a new shared function to iterate through
a single packfile store.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
object-file: introduce function to iterate through objects
We have multiple divergent interfaces to iterate through objects of a
specific backend:
- `for_each_loose_object()` yields all loose objects.
- `for_each_packed_object()` (somewhat obviously) yields all packed
objects.
These functions have different function signatures, which makes it hard
to create a common abstraction layer that covers both of these.
Introduce a new function `odb_source_loose_for_each_object()` to plug
this gap. This function doesn't take any data specific to loose objects,
but instead it accepts a `struct object_info` that will be populated the
exact same as if `odb_source_loose_read_object()` was called.
The benefit of this new interface is that we can continue to pass
backend-specific data, as `struct object_info` contains a union for
these exact use cases. This will allow us to unify how we iterate
through objects across both loose and packed objects in a subsequent
commit.
The `for_each_loose_object()` function continues to exist for now, but
it will be removed at the end of this patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
object-file: extract function to read object info from path
Extract a new function that allows us to read object info for a specific
loose object via a user-supplied path. This function will be used in a
subsequent commit.
Note that this also allows us to drop `stat_loose_object()`, which is
a simple wrapper around `odb_loose_path()` plus lstat(3p).
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `flags` parameter accepted by various `for_each_object()` functions
is a bitfield of multiple flags. Such parameters are typically unsigned
in the Git codebase, but we use `enum odb_for_each_object_flags` in
some places.
Adapt these function signatures to use the correct type.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename the `FOR_EACH_OBJECT_*` flags to have an `ODB_` prefix. This
prepares us for a new upcoming `odb_for_each_object()` function and
ensures that both the function and its flags have the same prefix.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Karthik Nayak [Sun, 25 Jan 2026 22:52:41 +0000 (23:52 +0100)]
fetch: delay user information post committing of transaction
In Git 2.50 and earlier, we would display failure codes and error
message as part of the status display:
$ git fetch . v1.0.0:refs/heads/foo
error: cannot update ref 'refs/heads/foo': trying to write non-commit object f665776185ad074b236c00751d666da7d1977dbe to branch 'refs/heads/foo'
From .
! [new tag] v1.0.0 -> foo (unable to update local ref)
With the addition of batched updates, this information is no longer
shown to the user:
$ git fetch . v1.0.0:refs/heads/foo
From .
* [new tag] v1.0.0 -> foo
error: cannot update ref 'refs/heads/foo': trying to write non-commit object f665776185ad074b236c00751d666da7d1977dbe to branch 'refs/heads/foo'
Since reference updates are batched and processed together at the end,
information around the outcome is not available during individual
reference parsing.
To overcome this, collate and delay the output to the end. Introduce
`ref_update_display_info` which will hold individual update's
information and also whether the update failed or succeeded. This
finally allows us to iterate over all such updates and print them to the
user.
Using an dynamic array and strmap does add some overhead to
'git-fetch(1)', but from benchmarking this seems to be not too bad:
Benchmark 1: fetch: many refs (refformat = files, refcount = 1000, revision = master)
Time (mean ± σ): 42.6 ms ± 1.2 ms [User: 13.1 ms, System: 29.8 ms]
Range (min … max): 40.1 ms … 45.8 ms 47 runs
Benchmark 2: fetch: many refs (refformat = files, refcount = 1000, revision = HEAD)
Time (mean ± σ): 43.1 ms ± 1.2 ms [User: 12.7 ms, System: 30.7 ms]
Range (min … max): 40.5 ms … 45.8 ms 48 runs
Summary
fetch: many refs (refformat = files, refcount = 1000, revision = master) ran
1.01 ± 0.04 times faster than fetch: many refs (refformat = files, refcount = 1000, revision = HEAD)
Another approach would be to move the status printing logic to be
handled post the transaction being committed. That however would require
adding an iterator to the ref transaction that tracks both the outcome
(success/failure) and the original refspec information for each update,
which is more involved infrastructure work compared to the strmap
approach here.
Helped-by: Phillip Wood <phillip.wood123@gmail.com> Reported-by: Jeff King <peff@peff.net> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Karthik Nayak [Sun, 25 Jan 2026 22:52:40 +0000 (23:52 +0100)]
receive-pack: utilize rejected ref error details
In 9d2962a7c4 (receive-pack: use batched reference updates, 2025-05-19),
git-receive-pack(1) switched to using batched reference updates. This also
introduced a regression wherein instead of providing detailed error
messages for failed referenced updates, the users were provided generic
error messages based on the error type.
Now that the updates also contain detailed error message, propagate
those to the client via 'rp_error'. The detailed error messages can be
very verbose, for e.g. in the files backend, when trying to write a
non-commit object to a branch, you would see:
Here the refname is repeated multiple times due to how error messages
are propagated and filled over the code stack. This potentially can be
cleaned up in a future commit.
Reported-by: Elijah Newren <newren@gmail.com> Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Karthik Nayak [Sun, 25 Jan 2026 22:52:39 +0000 (23:52 +0100)]
fetch: utilize rejected ref error details
In 0e358de64a (fetch: use batched reference updates, 2025-05-19),
git-fetch(1) switched to using batched reference updates. This also
introduced a regression wherein instead of providing detailed error
messages for failed referenced updates, the users were provided generic
error messages based on the error type.
Similar to the previous commit, switch to using detailed error messages
if present for failed reference updates to fix this regression.
Reported-by: Elijah Newren <newren@gmail.com> Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Karthik Nayak [Sun, 25 Jan 2026 22:52:38 +0000 (23:52 +0100)]
update-ref: utilize rejected error details if available
When git-update-ref(1) received the '--update-ref' flag, the error
details generated in the refs namespace wasn't propagated with failed
updates. Instead only an error code pertaining to the type of rejection
was noted.
This missed detailed error message which the user can act upon. The
previous commits added the required code to propagate these detailed
error messages from the refs namespace. Now that additional details are
available, let's output this additional details to stderr. This allows
users to have additional information over the already present machine
parsable output.
While we're here, improve the existing tests for the machine parsable
output by checking for the entire output string and not just the
rejection reason.
Reported-by: Elijah Newren <newren@gmail.com> Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Karthik Nayak [Sun, 25 Jan 2026 22:52:37 +0000 (23:52 +0100)]
refs: add rejection detail to the callback function
The previous commit started storing the rejection details alongside the
error code for rejected updates. Pass this along to the callback
function `ref_transaction_for_each_rejected_update()`. Currently the
field is unused, but will be integrated in the upcoming commits.
Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Karthik Nayak [Sun, 25 Jan 2026 22:52:36 +0000 (23:52 +0100)]
refs: skip to next ref when current ref is rejected
In `refs_verify_refnames_available()` we have two nested loops: the
outer loop iterates over all references to check, while the inner loop
checks for filesystem conflicts for a given ref by breaking down its
path.
With batched updates, when we detect a filesystem conflict, we mark the
update as rejected and execute 'continue'. However, this only skips to
the next iteration of the inner loop, not the outer loop as intended.
This causes the same reference to be repeatedly rejected. Fix this by
using a goto statement to skip to the next reference in the outer loop.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Sun, 25 Jan 2026 17:08:06 +0000 (09:08 -0800)]
Merge branch 'master' of https://github.com/j6t/git-gui
* 'master' of https://github.com/j6t/git-gui:
git-gui: mark *.po files at any directory level as UTF-8
git-gui i18n: Update Bulgarian translation (558t)
git-gui i18n: Update Bulgarian translation (557t)
Johannes Sixt [Sun, 25 Jan 2026 09:46:23 +0000 (10:46 +0100)]
git-gui: mark *.po files at any directory level as UTF-8
When a commit is viewed in Gitk that changes a file in po/glossary, the
patch text shows mojibake instead of correctly decoded UTF-8 text.
Gitk retrieves the encoding attribute to decide how to treat the bytes
that make up the patch text. There is an attribute definition that all
files are US-ASCII, and a later attribute definition overrides this.
But the override, which specifies UTF-8, applies only to *.po files in
directory po/ and does not apply to subdirectories.
Widen the pattern to apply to all directory levels.
* dk/replay-doc-omit-irrelevant-rev-list-options:
lint-gitlink: preemptively ignore all /ifn?def|endif/ macros
replay: drop rev-list formatting options from manual
Junio C Hamano [Fri, 23 Jan 2026 21:34:36 +0000 (13:34 -0800)]
Merge branch 'js/symlink-windows'
Upstream symbolic link support on Windows from Git-for-Windows.
* js/symlink-windows:
mingw: special-case index entries for symlinks with buggy size
mingw: emulate `stat()` a little more faithfully
mingw: try to create symlinks without elevated permissions
mingw: add support for symlinks to directories
mingw: implement basic `symlink()` functionality (file symlinks only)
mingw: implement `readlink()`
mingw: allow `mingw_chdir()` to change to symlink-resolved directories
mingw: support renaming symlinks
mingw: handle symlinks to directories in `mingw_unlink()`
mingw: add symlink-specific error codes
mingw: change default of `core.symlinks` to false
mingw: factor out the retry logic
mingw: compute the correct size for symlinks in `mingw_lstat()`
mingw: teach dirent about symlinks
mingw: let `mingw_lstat()` error early upon problems with reparse points
mingw: drop the separate `do_lstat()` function
mingw: implement `stat()` with symlink support
mingw: don't call `GetFileAttributes()` twice in `mingw_lstat()`
Junio C Hamano [Fri, 23 Jan 2026 21:34:36 +0000 (13:34 -0800)]
Merge branch 'js/ci-leak-skip-svn'
Dscho observed that SVN tests are taking too much time in CI leak
checking tasks, but most time is spent not in our code but in libsvn
code (which happen to be written in Perl), whose leaks have little
value to discover for us. Skip SVN, P4, and CVS tests in the leak
checking tasks.
* js/ci-leak-skip-svn:
ci: skip CVS and P4 tests in leaks job, too
ci(*-leaks): skip the git-svn tests to save time