Junio C Hamano [Tue, 29 Jul 2025 20:03:47 +0000 (13:03 -0700)]
Merge branch 'ps/object-store-midx-dedup-info' into seen
* ps/object-store-midx-dedup-info:
midx: compute paths via their source
midx: stop duplicating info redundant with its owning source
midx: write multi-pack indices via their source
midx: load multi-pack indices via their source
midx: drop redundant `struct repository` parameter
odb: return newly created in-memory sources
odb: allow `odb_find_source()` to fail
odb: store locality in object database sources
With the preceding commits we started to always have the object database
source available when we load, write or access multi-pack indices. With
this in place we can change how MIDX paths are computed so that we don't
have to pass in the combination of a hash algorithm and object directory
anymore, but only the object database source.
Refactor the code accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
midx: stop duplicating info redundant with its owning source
Multi-pack indices store some information that is redundant with their
owning source:
- The locality bit that tracks whether the source is the primary
object source or an alternate.
- The object directory path the multi-pack index is located in.
- The pointer to the owning parent directory.
All of this information is already contained in `struct odb_source`. So
now that we always have that struct available when loading a multi-pack
index we have it readily accessible.
Drop the redundant information and instead store a pointer to the object
source.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to the preceding commit, refactor the writing side of multi-pack
indices so that we pass in the object database source where the index
should be written to.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
To load a multi-pack index the caller is expected to pass both the
repository and the object directory where the multi-pack index is
located. While this works, this layout has a couple of downsides:
- We need to pass in information reduntant with the owning source,
namely its object directory and whether the source is local or not.
- We don't have access to the source when loading the multi-pack
index. If we had that access, we could store a pointer to the owning
source in the MIDX and thus deduplicate some information.
- Multi-pack indices are inherently specific to the object source and
its format. With the goal of pluggable object backends in mind we
will eventually want the backends to own the logic of reading and
writing multi-pack indices. Making the logic work on top of object
sources is a step into that direction.
Refactor loading of multi-pack indices accordingly.
This surfaces one small problem though: git-multi-pack-index(1) and our
MIDX test helper both know to read and write multi-pack-indices located
in a different object directory. This issue is addressed by adding the
user-provided object directory as an in-memory alternate.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
midx: drop redundant `struct repository` parameter
There are a couple of functions that take both a `struct repository` and
a `struct multi_pack_index`. This provides redundant information though
without much benefit given that the multi-pack index already has a
pointer to its owning repository.
Drop the `struct repository` parameter from such functions. While at it,
reorder the list of parameters of `fill_midx_entry()` so that the MIDX
comes first to better align with our coding guidelines.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Callers have no trivial way to obtain the newly created object database
source when adding it to the in-memory list of alternates. While not yet
needed anywhere, a subsequent commit will want to obtain that pointer.
Refactor the function to return the source to make it easily accessible.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When trying to locate a source for an unknown object directory we will
die right away. In subsequent patches we will add new callsites though
that want to handle this situation gracefully instead.
Refactor the function to return a `NULL` pointer if the source could not
be found and adapt the callsites to die instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 29 Jul 2025 18:34:08 +0000 (11:34 -0700)]
Merge branch 'ps/object-store-midx' into ps/object-store-midx-dedup-info
* ps/object-store-midx:
midx: remove now-unused linked list of multi-pack indices
packfile: stop using linked MIDX list in `get_all_packs()`
packfile: stop using linked MIDX list in `find_pack_entry()`
packfile: refactor `get_multi_pack_index()` to work on sources
midx: stop using linked list when closing MIDX
packfile: refactor `prepare_packed_git_one()` to work on sources
midx: start tracking per object database source
Junio C Hamano [Tue, 29 Jul 2025 16:28:13 +0000 (09:28 -0700)]
Merge branch 'lo/repo-info' into seen
A new subcommand "git repo" gives users a way to grab various
repository characteristics.
* lo/repo-info:
repo: add the --format flag
repo: add field layout.shallow
repo: add field layout.bare
repo: add the field references.format
repo: declare the repo command
Junio C Hamano [Tue, 29 Jul 2025 16:28:12 +0000 (09:28 -0700)]
Merge branch 'ds/sparse-checkout-clean' into seen
"git sparse-checkout" subcommand learned a new "clean" action to
prune otherwise unused working-tree files that are outside the
areas of interest.
* ds/sparse-checkout-clean:
sparse-checkout: make 'clean' clear more files
t: expand tests around sparse merges and clean
sparse-index: point users to new 'clean' action
sparse-checkout: add --verbose option to 'clean'
dir: add generic "walk all files" helper
sparse-checkout: match some 'clean' behavior
sparse-checkout: add basics of 'clean' command
sparse-checkout: remove use of the_repository
Junio C Hamano [Tue, 29 Jul 2025 16:28:11 +0000 (09:28 -0700)]
Merge branch 'ac/deglobal-sparse-variables' into seen
Two global variables related to sparse checkout have been moved to
the repository settings structure.
Ready?
* ac/deglobal-sparse-variables:
environment: remove the global variable 'sparse_expect_files_outside_of_patterns'
environment: move access to "core.sparsecheckoutcone" into repo_settings
environment: move access to "core.sparsecheckout" into repo_settings
Junio C Hamano [Tue, 29 Jul 2025 16:28:09 +0000 (09:28 -0700)]
Merge branch 'pw/3.0-commentchar-auto-deprecation' into seen
Proposes to deprecate "core.commentChar=auto" that attempts to
dynamically pick a suitable comment character, as it is too much
trouble to support for little benefit.
* pw/3.0-commentchar-auto-deprecation:
commit: print advice when core.commentString=auto
breaking-changes: deprecate support for core.commentString=auto
Junio C Hamano [Tue, 29 Jul 2025 16:28:09 +0000 (09:28 -0700)]
Merge branch 'cc/promisor-remote-capability' into seen
The "promisor-remote" capability mechanism has been updated to
allow the "partialCloneFilter" settings and the "token" value to be
communicated from the server side.
* cc/promisor-remote-capability:
promisor-remote: use string constants for 'name' and 'url' too
promisor-remote: allow a client to check fields
promisor-remote: refactor how we parse advertised fields
promisor-remote: allow a server to advertise more fields
promisor-remote: refactor to get rid of 'struct strvec'
Junio C Hamano [Tue, 29 Jul 2025 16:28:08 +0000 (09:28 -0700)]
Merge branch 'en/ort-rename-fixes' into seen
Various bugs about rename handling in "ort" merge strategy have
been fixed.
Comments?
* en/ort-rename-fixes:
merge-ort: fix directory rename on top of source of other rename/delete
merge-ort: fix incorrect file handling
t6423: fix missed staging of file in testcases 12i,12j,12k
t6423: document two bugs with rename-to-self testcases
merge-ort: drop unnecessary temporary in check_for_directory_rename()
merge-ort: update comments to modern testfile location
Junio C Hamano [Tue, 29 Jul 2025 16:27:58 +0000 (09:27 -0700)]
Merge branch 'ps/remote-rename-fix' into jch
"git remote rename origin upstream" failed to move origin/HEAD to
upstream/HEAD when origin/HEAD is unborn and performed other
renames extremely inefficiently, which has been corrected.
* ps/remote-rename-fix:
builtin/remote: only iterate through refs that are to be renamed
builtin/remote: rework how remote refs get renamed
refs: simplify logic when migrating reflog entries
refs: pass refname when invoking reflog entry callback
Junio C Hamano [Tue, 29 Jul 2025 16:27:58 +0000 (09:27 -0700)]
Merge branch 'rs/tighten-alias-help' into jch
"git -c alias.foo=bar foo -h baz" reported "'foo' is aliased to
'bar'" and then went on to do "git foo -h baz", which was
unexpected. Tighten the rule so that alias expansion is reported
only when "-h" is the sole option.
* rs/tighten-alias-help:
git: show alias info only with lone -h
Junio C Hamano [Tue, 29 Jul 2025 16:27:57 +0000 (09:27 -0700)]
Merge branch 'tb/prepare-midx-pack-cleanup' into jch
Improvement on Multi-pack-index API.
* tb/prepare-midx-pack-cleanup:
midx: return a `packed_git` pointer from `prepare_midx_pack()`
midx-write.c: extract inner loop from fill_packs_from_midx()
midx-write.c: guard against incremental MIDXs in want_included_pack()
midx: access pack names through `nth_midxed_pack_name()`
Junio C Hamano [Tue, 29 Jul 2025 16:27:56 +0000 (09:27 -0700)]
Merge branch 'ps/reflog-migrate-fixes' into jch
"git refs migrate" to migrate the reflog entries from a refs
backend to another had a handful of bugs squashed.
* ps/reflog-migrate-fixes:
refs: fix invalid old object IDs when migrating reflogs
refs: stop unsetting REF_HAVE_OLD for log-only updates
refs/files: detect race when generating reflog entry for HEAD
refs: fix identity for migrated reflogs
ident: fix type of string length parameter
builtin/reflog: implement subcommand to write new entries
refs: export `ref_transaction_update_reflog()`
builtin/reflog: improve grouping of subcommands
Documentation/git-reflog: convert to use synopsis type
Junio C Hamano [Tue, 29 Jul 2025 16:27:55 +0000 (09:27 -0700)]
Merge branch 'ps/object-file-wo-the-repository' into jch
Reduce implicit assumption and dependence on the_repository in the
object-file subsystem.
* ps/object-file-wo-the-repository:
object-file: get rid of `the_repository` in index-related functions
object-file: get rid of `the_repository` in `force_object_loose()`
object-file: get rid of `the_repository` in `read_loose_object()`
object-file: get rid of `the_repository` in loose object iterators
object-file: remove declaration for `for_each_file_in_obj_subdir()`
object-file: inline `for_each_loose_file_in_objdir_buf()`
object-file: get rid of `the_repository` when writing objects
odb: introduce `odb_write_object()`
loose: write loose objects map via their source
object-file: get rid of `the_repository` in `finalize_object_file()`
object-file: get rid of `the_repository` in `loose_object_info()`
object-file: get rid of `the_repository` when freshening objects
object-file: inline `check_and_freshen()` functions
object-file: get rid of `the_repository` in `has_loose_object()`
object-file: stop using `the_hash_algo`
object-file: fix -Wsign-compare warnings
Junio C Hamano [Tue, 29 Jul 2025 16:27:54 +0000 (09:27 -0700)]
Merge branch 'lm/add-p-context' into jch
"git add/etc -p" now honors diff.context configuration variable,
and learns to honor -U<n> option.
* lm/add-p-context:
add-patch: add diff.context command line overrides
add-patch: respect diff.context configuration
t: use test_config in t4055
t: use test_grep in t3701 and t4055
Junio C Hamano [Tue, 29 Jul 2025 16:27:53 +0000 (09:27 -0700)]
Merge branch 'ps/config-wo-the-repository' into jch
The config API had a set of convenience wrapper functions that
implicitly use the_repository instance; they have been removed and
inlined at the calling sites.
* ps/config-wo-the-repository: (21 commits)
config: fix sign comparison warnings
config: move Git config parsing into "environment.c"
config: remove unused `the_repository` wrappers
config: drop `git_config_set_multivar()` wrapper
config: drop `git_config_get_multivar_gently()` wrapper
config: drop `git_config_set_multivar_in_file_gently()` wrapper
config: drop `git_config_set_in_file_gently()` wrapper
config: drop `git_config_set()` wrapper
config: drop `git_config_set_gently()` wrapper
config: drop `git_config_set_in_file()` wrapper
config: drop `git_config_get_bool()` wrapper
config: drop `git_config_get_ulong()` wrapper
config: drop `git_config_get_int()` wrapper
config: drop `git_config_get_string()` wrapper
config: drop `git_config_get_string()` wrapper
config: drop `git_config_get_string_multi()` wrapper
config: drop `git_config_get_value()` wrapper
config: drop `git_config_get_value()` wrapper
config: drop `git_config_get()` wrapper
config: drop `git_config_clear()` wrapper
...
Junio C Hamano [Tue, 29 Jul 2025 16:27:52 +0000 (09:27 -0700)]
Merge branch 'kn/for-each-ref-skip-updates' into jch
Code clean-up.
* kn/for-each-ref-skip-updates:
ref-filter: use REF_ITERATOR_SEEK_SET_PREFIX instead of '1'
t6302: add test combining '--start-after' with '--exclude'
for-each-ref: reword the documentation for '--start-after'
for-each-ref: fix documentation argument ordering
ref-cache: use 'size_t' instead of int for length
Junio C Hamano [Tue, 29 Jul 2025 16:27:50 +0000 (09:27 -0700)]
Merge branch 'hl/test-helper-fd-close' into jch
A few file descriptors left unclosed upon program completion in a
few test helper programs are now closed.
* hl/test-helper-fd-close:
test-delta: close output descriptor after use
test-delta: use strbufs to hold input files
test-delta: handle errors with die()
t/helper/test-truncate: close file descriptor after truncation
Junio C Hamano [Tue, 29 Jul 2025 16:27:50 +0000 (09:27 -0700)]
Merge branch 'ow/rebase-verify-insn-fmt-before-initializing-state' into jch
"git rebase -i" with bogus rebase.instructionFormat configuration
failed to produce the todo file after recording the state files,
leading to confused "git status"; this has been corrected.
* ow/rebase-verify-insn-fmt-before-initializing-state:
rebase: write script before initializing state
Junio C Hamano [Tue, 29 Jul 2025 16:27:49 +0000 (09:27 -0700)]
Merge branch 'ps/object-store-midx' into jch
Redefine where the multi-pack-index sits in the object subsystem,
which recently was restructured to allow multiple backends that
support a single object source that belongs to one repository. A
midx does span mulitple "object sources".
* ps/object-store-midx:
midx: remove now-unused linked list of multi-pack indices
packfile: stop using linked MIDX list in `get_all_packs()`
packfile: stop using linked MIDX list in `find_pack_entry()`
packfile: refactor `get_multi_pack_index()` to work on sources
midx: stop using linked list when closing MIDX
packfile: refactor `prepare_packed_git_one()` to work on sources
midx: start tracking per object database source
Junio C Hamano [Tue, 29 Jul 2025 16:27:49 +0000 (09:27 -0700)]
Merge branch 'sk/reftable-clarify-tests' into jch
The reftable unit tests are now ported to the "clar" unit testing
framework.
* sk/reftable-clarify-tests:
t/unit-tests: finalize migration of reftable-related tests
t/unit-tests: convert reftable stack test to use clar
t/unit-tests: convert reftable record test to use clar
t/unit-tests: convert reftable readwrite test to use clar
t/unit-tests: convert reftable table test to use clar
t/unit-tests: convert reftable pq test to use clar
t/unit-tests: convert reftable merged test to use clar
t/unit-tests: convert reftable block test to use clar
t/unit-tests: convert reftable basics test to use clar test framework
t/unit-tests: implement clar specific reftable test helper functions
Junio C Hamano [Tue, 29 Jul 2025 16:27:47 +0000 (09:27 -0700)]
Merge branch 'jc/document-test-balloons-in-flight' into jch
To help our developers, document what C99 language features are
being considered for adoption, in addition to what past experiments
have already decided.
* jc/document-test-balloons-in-flight:
CodingGuidelines: document test balloons in flight
Junio C Hamano [Tue, 29 Jul 2025 16:27:45 +0000 (09:27 -0700)]
Merge branch 'kn/for-each-ref-skip' into jch
"git for-each-ref" learns "--skip-until" option to help
applications that want to page its output.
* kn/for-each-ref-skip:
ref-cache: set prefix_state when seeking
for-each-ref: introduce a '--start-after' option
ref-filter: remove unnecessary else clause
refs: selectively set prefix in the seek functions
ref-cache: remove unused function 'find_ref_entry()'
refs: expose `ref_iterator` via 'refs.h'
builtin/remote: only iterate through refs that are to be renamed
When renaming a remote we also need to rename all references
accordingly. But while we only need to rename references that are
contained in the "refs/remotes/$OLDNAME/" namespace, we end up using
`refs_for_each_rawref()` that iterates through _all_ references. We know
to exit early in the callback in case we see an irrelevant reference,
but ultimately this is still a waste of compute as we knowingly iterate
through references that we won't ever care about.
Improve this by introducing `refs_for_each_rawref_in()`, which knows to
only iterate through (potentially broken) references in a given prefix.
The following benchmark renames a remote with a single reference in a
repository that has 100k unrelated references. This shows a sizeable
improvement with the "files" backend:
Benchmark 1: rename remote (refformat = files, revision = HEAD~)
Time (mean ± σ): 42.6 ms ± 0.9 ms [User: 29.1 ms, System: 8.4 ms]
Range (min … max): 40.1 ms … 43.3 ms 10 runs
Benchmark 2: rename remote (refformat = files, revision = HEAD)
Time (mean ± σ): 31.7 ms ± 4.0 ms [User: 19.6 ms, System: 6.9 ms]
Range (min … max): 27.1 ms … 36.0 ms 10 runs
Summary
rename remote (refformat = files, revision = HEAD) ran
1.35 ± 0.17 times faster than rename remote (refformat = files, revision = HEAD~)
The "reftable" backend shows roughly the same absolute improvement, but
given that it's already significantly faster than the "files" backend
this translates to a much larger relative improvement:
Benchmark 1: rename remote (refformat = reftable, revision = HEAD~)
Time (mean ± σ): 18.2 ms ± 0.5 ms [User: 12.7 ms, System: 3.0 ms]
Range (min … max): 17.3 ms … 21.4 ms 110 runs
Benchmark 2: rename remote (refformat = reftable, revision = HEAD)
Time (mean ± σ): 8.8 ms ± 0.5 ms [User: 3.8 ms, System: 2.9 ms]
Range (min … max): 7.5 ms … 9.9 ms 167 runs
Summary
rename remote (refformat = reftable, revision = HEAD) ran
2.07 ± 0.12 times faster than rename remote (refformat = reftable, revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/remote: rework how remote refs get renamed
It was recently reported [1] that renaming a remote that has dangling
symrefs is broken. This issue can be trivially reproduced:
$ git init repo
Initialized empty Git repository in /tmp/repo/.git/
$ cd repo/
$ git remote add origin /dev/null
$ git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/master
$ git remote rename origin renamed
$ git symbolic-ref refs/remotes/origin/HEAD
refs/remotes/origin/master
$ git symbolic-ref refs/remotes/renamed/HEAD
fatal: ref refs/remotes/renamed/HEAD is not a symbolic ref
As one can see, the "HEAD" reference did not get renamed but stays in
the same place. There are two issues here:
- We use `refs_resolve_ref_unsafe()` to resolve references, but we
don't pass the `RESOLVE_REF_NO_RECURSE` flag. Consequently, if the
reference does not resolve, the function will fail and we thus
ignore this branch.
- We use `refs_for_each_ref()` to iterate through the old remote's
references, but that function ignores broken references.
Both of these issues are easy to fix. But having a closer look at the
logic that renames remote references surfaces that it leaves a lot to be
desired overall.
The problem is that we're using O(|refs| + |symrefs| * 2) many reference
transactions to perform the renames. We first delete all symrefs, then
individually rename every direct reference and finally we recreate the
symrefs. On the one hand this isn't even remotely an atomic operation,
so if we hit any error we'll already have deleted all references.
But more importantly it is also extremely inperformant. The number of
transactions for symrefs doesn't really bother us too much, as there
should generally only be a single symref anyway ("HEAD"). But the
renames are very expensive:
- For the "reftable" backend we perform auto-compaction after every
single rename, which does add up.
- For the "files" backend we potentially have to rewrite the
"packed-refs" file on every single rename in case they are packed.
The consequence here is quadratic runtime performance. Renaming a
100k references takes hours to complete.
Ideally we'd refactor the code to use a single transaction to perform
all the reference updates atomically. But unfortunately that does not
work in the case where you rename a remote so that it becomes nested
into itself, as that can lead to conflicting reference updates.
The next-best thing is to do it in two transactions: one to delete all
the references, and one to recreate the references and their reflogs.
This signicantly speeds up the operation with the "files" backend. The
following benchmark renames a remote with 10000 references:
Benchmark 1: rename remote (refformat = files, revision = HEAD~)
Time (mean ± σ): 238.770 s ± 13.857 s [User: 91.473 s, System: 143.793 s]
Range (min … max): 204.863 s … 247.699 s 10 runs
Benchmark 2: rename remote (refformat = files, revision = HEAD)
Time (mean ± σ): 2.103 s ± 0.036 s [User: 0.360 s, System: 1.313 s]
Range (min … max): 2.011 s … 2.141 s 10 runs
Summary
rename remote (refformat = files, revision = HEAD) ran
113.53 ± 6.87 times faster than rename remote (refformat = files, revision = HEAD~)
For the "reftable" backend we see a significant speedup, as well, but
given that we don't have quadratic runtime behaviour there it's way less
extreme:
Benchmark 1: rename remote (refformat = reftable, revision = HEAD~)
Time (mean ± σ): 8.604 s ± 0.539 s [User: 4.985 s, System: 2.368 s]
Range (min … max): 7.880 s … 9.556 s 10 runs
Benchmark 2: rename remote (refformat = reftable, revision = HEAD)
Time (mean ± σ): 1.177 s ± 0.103 s [User: 0.446 s, System: 0.270 s]
Range (min … max): 1.023 s … 1.410 s 10 runs
Summary
rename remote (refformat = reftable, revision = HEAD) ran
7.31 ± 0.79 times faster than rename remote (refformat = reftable, revision = HEAD~)
While at it, fix the handling of symbolic/broken references by using
`refs_for_each_rawref()`. Add tests that cover both this reported issue
and tests that verify that nesting remotes into itself continues to work
with conflicting references.
One thing to note: with this change we cannot provide a proper progress
monitor anymore as we queue the references into the transactions as we
iterate through them. Consequently, as we don't know yet how many refs
there are in total, we cannot report how many percent of the operation
is done anymore. But that's a small price to pay considering that you
now shouldn't need the progress monitor in most situations at all
anymore.
refs: simplify logic when migrating reflog entries
When migrating reflog entries between two storage formats we have to do
so via two callback-driven functions:
- `migrate_one_reflog()` gets invoked via `refs_for_each_reflog()` to
first list all available reflogs.
- `migrate_one_reflog_entry()` gets invoked via
`refs_for_each_reflog_ent()` in `migrate_one_reflog()`.
Before the preceding commit we didn't have the refname available in
`migrate_one_reflog_entry()`, which made it necessary to have a separate
structure that we pass to the second callback so that we can propagate
the refname. Now that `refs_for_each_reflog_ent()` knows to pass the
refname to the callback though that indirection isn't necessary anymore.
There's one catch though: we do have an update index that is also stored
in the entry-specific callback data. This update index is required so
that we can tell the ref backend in which order it should persist the
reflog entries to disk.
But that purpose can be trivially achieved by just converting it into a
global counter that is used for all reflog entries, regardless of which
reference they are for. The ordering will remain the same as both the
update index and the refname is considered when sorting the entries.
Move the index into `struct migration_data` and drop the now-unused
`struct reflog_migration_data` to simplify the code a bit.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs: pass refname when invoking reflog entry callback
With `refs_for_each_reflog_ent()` callers can iterate through all the
reflog entries for a given reference. The callback that is being invoked
for each such entry does not receive the name of the reference that we
are currently iterating through. This isn't really a limiting factor, as
callers can simply pass the name via the callback data.
But this layout sometimes does make for a bit of an awkward calling
pattern. One example: when iterating through all reflogs, and for each
reflog we iterate through all refnames, we have to do some extra book
keeping to track which reference name we are currently yielding reflog
entries for.
Change the signature of the callback function so that the reference name
of the reflog gets passed through to it. Adapt callers accordingly and
start using the new parameter in trivial cases. The next commit will
refactor the reference migration logic to make use of this parameter so
that we can simplify its logic a bit.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 29 Jul 2025 16:17:33 +0000 (09:17 -0700)]
Merge branch 'ps/reflog-migrate-fixes' into ps/remote-rename-fix
* ps/reflog-migrate-fixes:
refs: fix invalid old object IDs when migrating reflogs
refs: stop unsetting REF_HAVE_OLD for log-only updates
refs/files: detect race when generating reflog entry for HEAD
refs: fix identity for migrated reflogs
ident: fix type of string length parameter
builtin/reflog: implement subcommand to write new entries
refs: export `ref_transaction_update_reflog()`
builtin/reflog: improve grouping of subcommands
Documentation/git-reflog: convert to use synopsis type
refs: fix invalid old object IDs when migrating reflogs
When migrating reflog entries between different storage formats we end
up with invalid old object IDs for the migrated entries: instead of
writing the old object ID of the to-be-migrated entry, we end up with
the all-zeroes object ID.
The root cause of this issue is that we don't know to use the old object
ID provided by the caller. Instead, we manually resolve the old object
ID by resolving the current value of its matching reference. But as that
reference does not yet exist in the target ref storage we always end up
resolving it to all-zeroes.
This issue got unnoticed as there is no user-facing command that would
even show the old object ID. While `git log -g` knows to show the new
object ID, we don't have any formatting directive to show the old object
ID.
Fix the bug by introducing a new flag `REF_LOG_USE_PROVIDED_OIDS`. If
set, backends are instructed to use the old and new object IDs provided
by the caller, without doing any manual resolving. Set this flag in
`ref_transaction_update_reflog()`.
Amend our tests in t1460-refs-migrate to use our test tool to read
reflog entries. This test tool prints out both old and new object ID of
each reflog entry, which fixes the test gap. Furthermore it also prints
the full identity used to write the reflog, which provides test coverage
for the previous commit in this patch series that fixed the identity for
migrated reflogs.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs: stop unsetting REF_HAVE_OLD for log-only updates
The `REF_HAVE_OLD` flag indicates whether a given ref update has its old
object ID set. If so, the value of that field is used to verify whether
the current state of the reference matches this expected state. It is
thus an important part of mitigating races with a concurrent process
that updates the same set of references.
When writing reflogs though we explicitly unset that flag. This is a
sensible thing to do: the old state of reflog entry updates may not
necessarily match the current on-disk state of its accompanying ref, but
it's only intended to signal what old object ID we want to write into
the new reflog entry. For example when migrating refs we end up writing
many reflog entries for a single reference, and most likely those reflog
entries will have many different old object IDs.
But unsetting this flag also removes a useful signal, namely that the
caller _did_ provide an old object ID for a given reflog entry. This
signal will become useful in a subsequent commit, where we add a new
flag that tells the transaction to use the provided old and new object
IDs to write a reflog entry. The `REF_HAVE_OLD` flag is then used as a
signal to verify that the caller really did provide an old object ID.
Stop unsetting the flag so that we can use it as this described signal
in a subsequent commit. Skip checking the old object ID for log-only
updates so that we don't expect it to match the current on-disk state.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs/files: detect race when generating reflog entry for HEAD
When updating a reference that is being pointed to HEAD we don't only
write a reflog message for that particular reference, but also generate
one for HEAD. This logic is handled by `split_head_update()`, where we:
1. Verify that the condition actually triggered. This is done by
reading HEAD at the start of the transaction so that we can then
check whether a given reference update refers to its target.
2. Queue a new log-only update for HEAD in case it did.
But the logic is unfortunately not free of races, as we do not lock the
HEAD reference after we have read its target. This can lead to the
following two scenarios:
- HEAD gets concurrently updated to point to one of the references we
have already processed. This causes us not writing a reflog message
even though we should have done so.
- HEAD gets concurrently updated to point to not point to a reference
anymore that we have already processed. This causes us to write a
reflog message even though we should _not_ have done so.
Improve the situation by introducing a new `REF_LOG_VIA_SPLIT` flag that
is specific to the "files" backend. If set, we will double check that
the HEAD reference still points to the reference that we are creating
the reflog entry for after we have locked HEAD. Furthermore, instead of
manually resolving the old object ID of that entry, we now use the same
old state as for the parent update.
Unfortunately, this change only helps with the second race. We cannot
reliably plug the first race without locking the HEAD reference at the
start of the transaction. Locking HEAD unconditionally would effectively
serialize all writes though, and that doesn't seem like an option. Also,
double checking its value at the end of the transaction is not an option
either, as its target may have flip-flopped during the transaction.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When migrating reflog entries between different storage formats we must
reconstruct the identity of reflog entries. This is done by passing the
committer passed to the `migrate_one_reflog_entry()` callback function
to `fmt_ident()`.
This results in an invalid identity though: `fmt_ident()` expects the
caller to provide both name and mail of the author, but we pass the full
identity as mail. This leads to an identity like:
pks <Patrick Steinhardt ps@pks.im>
Fix the bug by splitting the identity line first. This allows us to
extract both the name and mail so that we can pass them to `fmt_ident()`
separately.
This commit does not yet add any tests as there is another bug in the
reflog migration that will be fixed in a subsequent commit. Once that
bug is fixed we'll make the reflog verification in t1450 stricter, and
that will catch both this bug here and the other bug.
Note that we also add two new `name` and `mail` string buffers to the
callback structures and splice them through to the callbacks. This is
done so that we can avoid allocating a new buffer every time we compute
the committer information.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The last parameter in `split_ident_line()` is the length of the line
passed in by the caller. As such, most callers pass in either the result
of `strlen()`, `struct strbuf::len` or a pointer diff, all of which
are expected to be positive numbers. Regardless of that, the function
accepts a signed integer, which is somewhat confusing.
Fix the function signature to instead accept a `size_t`.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/reflog: implement subcommand to write new entries
While we provide a couple of subcommands in git-reflog(1) to remove
reflog entries, we don't provide any to write new entries. Obviously
this is not an operation that really would be needed for many use cases
out there, or otherwise people would have complained that such a command
does not exist yet. But the introduction of the "reftable" backend
changes the picture a bit, as it is now basically impossible to manually
append a reflog entry if one wanted to do so due to the binary format.
Plug this gap by introducing a simple "write" subcommand. For now, all
this command does is to append a single new reflog entry with the given
object IDs and message to the reflog. More specifically, it is not yet
possible to:
- Write multiple reflog entries at once.
- Insert reflog entries at arbitrary indices.
- Specify the date of the reflog entry.
- Insert reflog entries that refer to nonexistent objects.
If required, those features can be added at a future point in time. For
now though, the new command aims to fulfill the most basic use cases
while being as strict as possible when it comes to verifying parameters.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a subsequent commit we'll add another user that wants to write reflog
entries. This requires them to call `ref_transaction_update_reflog()`,
but that function is local to "refs.c".
Export the function to prepare for the change. While at it, drop the
`flags` field, as all callers are for now expected to use the same flags
anyway.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The way subcommands of git-reflog(1) are laid out does not make any
immediate sense. Reorder them such that read-only subcommands precede
writing commands for a bit more structure.
Furthermore, move the "expire" subcommand last. This prepares for a
subsequent change where we are about to introduce a new "write" command
to append reflog entries. Like this, the writing subcommands are ordered
such that those affecting a single reflog come before those spanning
across all reflogs.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Documentation/git-reflog: convert to use synopsis type
With 974cdca345c (doc: introduce a synopsis typesetting, 2024-09-24) we
have introduced a new synopsis type that simplifies the rules for
typesetting a command's synopsis. Convert the git-reflog(1)
documentation to use it.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Leon Michalak [Tue, 29 Jul 2025 07:01:51 +0000 (07:01 +0000)]
add-patch: add diff.context command line overrides
This patch compliments the previous commit, where builtins that use
add-patch infrastructure now respect diff.context and
diff.interHunkContext file configurations.
In particular, this patch helps users who don't want to set persistent
context configurations or just want a way to override them on a one-time
basis, by allowing the relevant builtins to accept corresponding command
line options that override the file configurations.
This mimics commands such as diff and log, which allow for both context
file configuration and command line overrides.
Signed-off-by: Leon Michalak <leonmichalak6@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Leon Michalak [Tue, 29 Jul 2025 07:01:50 +0000 (07:01 +0000)]
add-patch: respect diff.context configuration
Various builtins that use add-patch infrastructure do not respect
the user's diff.context and diff.interHunkContext file configurations.
The user may be used to seeing their diffs with customized context size,
but not in the patches "git add -p" shows them to pick from.
Teach add-patch infrastructure to read these configuration variables and
pass their values when spawning the underlying plumbing commands as
their command line option.
Signed-off-by: Leon Michalak <leonmichalak6@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
meson: ensure correct "clar-decls.h" header is used
The "clar-decls.h" header gets generated by us to extract prototypes of
unit test functions from our clar-based tests. This generated file is
then written into "t/unit-tests/" and included via "unit-test.h". The
intent of all this is that we can keep "-Wmissing-prototype" warnings
enabled. If we had that warning disabled, it would be easy to miss in
case any of the non-static functions had a typo in its name and thus
wasn't picked up by our test case extractor.
Including the file directly has a big downside though: if a source tree
was built both with our Makefile and with Meson, then the Meson build
would include the "clar-decls.h" file from our Makefile. And if those
are out of sync we get compiler errors.
We already fixed a similar issue in 4771501c0a (meson: ensure correct
version-def.h is used, 2025-01-14). Let's do the same and pass the
absolute path to "clar-decls.h" via a preprocessor define.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 25 Jul 2025 05:13:09 +0000 (01:13 -0400)]
t7510: use $PWD instead of $(pwd) inside PATH
On Windows, $(pwd) will give us a Windows-style path like "D:/foo".
Putting that into $PATH confuses anybody parsing that variable, since
colon is a separator character in $PATH. Instead, we should use the
Unix-style value we get from $PWD ("/d/foo").
This is similar to the cases fixed by 71dd50472d (t0021, t5615: use $PWD
instead of $(pwd) in PATH-like shell variables, 2016-11-11).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Han Young [Mon, 28 Jul 2025 03:55:48 +0000 (11:55 +0800)]
blame: remove parameter detailed in get_commit_info()
The get_commit_info() function accepts a parameter that can be used
to stop the commit parsing early. However, none of the callers use
this feature, and testing proved that the performance gain of
stopping parsing early is negligible and unmeasurable.
Signed-off-by: Han Young <hanyang.tony@bytedance.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin: unmark git-switch and git-restore as experimental
In 4e43b7ff (Declare both git-switch and git-restore experimental,
2019-04-25), the newly introduced git-switch(1) and git-restore(1)
commands were marked as experimental. This was done to provide time to
make breaking changes to the interface. It has now been over six years
since these commands were implemented and there hasn't been much change.
Consequently, users have grown to rely on how these commands work and it
is no longer feasible to make any breaking changes.
Let's remove the experimental label for git-switch(1) and
git-restore(1).
Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
ref-filter: use REF_ITERATOR_SEEK_SET_PREFIX instead of '1'
In the commit 51511d68f4 (for-each-ref: introduce a '--start-after'
option, 2025-07-15), for introducing the '--start-after' flag, the
`ref_iterator_seek()` was modified to also accept a flag. This was to
allow the function to also set the prefix when
'REF_ITERATOR_SEEK_SET_PREFIX' was set.
In `do_filter_refs()` instead of passing the flag, we pass in '1' which
is the value of the flag. While this works, this is definitely hard to
read and introduces inconsistency. Change it to use the flag.
While here, remove the unnecessary 'if (prefix)' clause in the 'else'
statement, since the block already checks for 'prefix'.
Reported-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t6302: add test combining '--start-after' with '--exclude'
The '--start-after' doesn't explicitly mention being compatible with the
'--exclude' flag, generally only incompatibility is explicitly called
out. However, it would be nice to test the compatibility between the
two to avoid future regressions. Let's do that.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
for-each-ref: reword the documentation for '--start-after'
The documentation for '--start-after' states that the flag cannot be
used with general pattern matching. This is a bit vague, since there is
no clear understanding about what 'general' means here. Rewrite the
sentence to be more specific.
While here, fix a typo in the 'OPT_STRING'.
Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Improve the 'git-for-each-ref(1)' documentation with two corrections:
1. Add parentheses around `--exclude=<pattern>` to indicate this option
can be repeated as a complete unit.
2. Move `--stdin | <pattern> ...` to the end, after all flags, since
`<pattern>` is a positional argument that should appear last in the
argument list.
While here, change to using the synopsis block which will automatically
format placeholders in italics and keywords in monospace.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit 090eb5336c (refs: selectively set prefix in the seek
functions, 2025-07-15) modified the ref-cache iterator to support
seeking to a specified marker without setting the prefix.
The commit adds and uses an integer 'len' to capture the length of the
seek marker to compare with the entries of a given directory. Since the
type of the variable is 'int', this is met with a typecast of converting
a `strlen` to 'int' so it can be assigned to the 'len' variable.
This is whole operation is a bit wrong:
1. Since the 'len' variable is eventually used in a 'strncmp', it should
have been of type 'size_t'.
2. This also truncates the value provided from 'strlen' to an int, which
could cause a large refname to produce a negative number.
Let's do the correct thing here and simply use 'size_t' for `len`.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
xdl_hash_record_verbatim uses modified djb2 hash with XOR instead of ADD
for combining. The ADD-based variant is used as the basis of the modern
("GNU") symbol lookup scheme in ELF. Glibc dynamic loader received an
optimized version of this hash function thanks to Noah Goldstein [1].
Switch xdl_hash_record_verbatim to additive hashing and implement
an optimized loop following the scheme suggested by Noah.
Timing 'git log --oneline --shortstat v2.0.0..v2.5.0' under perf, I got
version | cycles, bn | instructions, bn
---------------------------------------
A 6.38 11.3
B 6.21 10.89
C 5.80 9.95
D 5.83 8.74
---------------------------------------
A: baseline (git master at e4ef0485fd78)
B: plus 'xdiff: refactor xdl_hash_record()'
C: and plus this patch
D: with 'xdiff: use xxhash' by Phillip Wood
The resulting speedup for xdl_hash_record_verbatim itself is about 1.5x.
Junio C Hamano [Mon, 28 Jul 2025 19:02:34 +0000 (12:02 -0700)]
Merge branch 'ac/auto-comment-char-fix'
"git commit" that concludes a conflicted merge failed to notice and remove
existing comment added automatically (like "# Conflicts:") when the
core.commentstring is set to 'auto'.
* ac/auto-comment-char-fix:
config: set comment_line_str to "#" when core.commentChar=auto
commit: avoid scanning trailing comments when 'core.commentChar' is "auto"
The pop_most_recent_commit() function can have quite expensive
worst case performance characteristics, which has been optimized by
using prio-queue data structure.
* rs/pop-recent-commit-with-prio-queue:
commit: use prio_queue_replace() in pop_most_recent_commit()
prio-queue: add prio_queue_replace()
commit: convert pop_most_recent_commit() to prio_queue
Christian Couder [Fri, 25 Jul 2025 16:05:36 +0000 (18:05 +0200)]
t9350: redirect input to only fast-import
A number of tests in "t9350-fast-export.sh" are using sub-shells to
redirect content to a number of commands instead of only
`git fast-import`.
This is confusing and possibly error-prone, so let's change those tests
so that no sub-shell is used and the content goes only to
`git fast-import`.
Reported-by: Elijah Newren <newren@gmail.com> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Fri, 25 Jul 2025 18:41:24 +0000 (20:41 +0200)]
git: show alias info only with lone -h
Builtin commands show usage information on stdout if called with -h as
their only option, usage.c::show_usage_if_asked() makes sure of that.
Aliases show alias information on stderr if called with -h as the first
option since a9a60b94cc (git.c: handle_alias: prepend alias info when
first argument is -h, 2018-10-09). This is surprising when using
aliases for commands that take -h as a normal argument among others,
like git grep.
Tighten the condition and show the alias information only if -h is the
only option given, to be consistent with builtins.
It's probably still is a good idea to write to stderr, as an alias
command doesn't have to be a builtin and could instead produce output
with just -h that might be spoiled by an extra alias info line.
Reported-by: Kevin Brodsky <kevin.brodsky@arm.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Thu, 24 Jul 2025 23:03:56 +0000 (16:03 -0700)]
Merge branch 'ss/compat-bswap-revamp'
Clean-up compat/bswap.h mess.
* ss/compat-bswap-revamp:
bswap.h: provide a built-in based version of bswap32/64 if possible
bswap.h: remove optimized x86 version of bswap32/64
bswap.h: always overwrite ntohl/ ntohll macros
bswap.h: define GIT_LITTLE_ENDIAN on msvc as little endian
bswap.h: add support for __BYTE_ORDER__
Junio C Hamano [Thu, 24 Jul 2025 23:03:55 +0000 (16:03 -0700)]
Merge branch 'pw/adopt-c99-bool-officially'
Declare weather-balloon we raised for "bool" type 18 months ago a
success and officially allow using the type in our codebase.
* pw/adopt-c99-bool-officially:
strbuf: convert predicates to return bool
git-compat-util: convert string predicates to return bool
CodingGuidelines: allow the use of bool
In 090eb5336c (refs: selectively set prefix in the seek functions,
2025-07-15) we separated the seeking functionality of reference
iterators from the functionality to set prefix to an iterator. This
allows users of ref iterators to seek to a particular reference to
provide pagination support.
The files-backend, uses the ref-cache iterator to iterate over loose
refs. The iterator tracks directories and entries already processed via
a stack of levels. Each level corresponds to a directory under the files
backend. New levels are added to the stack, and when all entries from a
level is yielded, the corresponding level is popped from the stack.
To accommodate seeking, we need to populate and traverse the levels to
stop the requested seek marker at the appropriate level and its entry
index. Each level also contains a 'prefix_state' which is used for
prefix matching, this allows the iterator to skip levels/entries which
don't match a prefix. The default value of 'prefix_state' is
PREFIX_CONTAINS_DIR, which yields all entries within a level. When
purely seeking without prefix matching, we want to yield all entries.
The commit however, skips setting the value explicitly. This causes the
MemorySanitizer to issue a 'use-of-uninitialized-value' error when
running 't/t6302-for-each-ref-filter'.
Set the value explicitly to avoid to fix the issue.
Reported-by: Kyle Lippincott <spectral@google.com> Helped-by: Kyle Lippincott <spectral@google.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
K Jayatheerth [Thu, 24 Jul 2025 15:24:18 +0000 (20:54 +0530)]
submodule: skip redundant active entries when pattern covers path
configure_added_submodule always writes an explicit
submodule.<name>.active entry, even when the new
path is already matched by submodule.active
patterns. This leads to unnecessary and cluttered configuration.
change the logic to centralize wildmatch-based pattern lookup,
in configure_added_submodule. Wrap the active-entry write in a conditional
that only fires when that helper reports no existing pattern covers the
submodule’s path.
Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
K Jayatheerth [Thu, 24 Jul 2025 15:24:17 +0000 (20:54 +0530)]
submodule: prevent overwriting .gitmodules on path reuse
Adding a submodule at a path that previously hosted
another submodule (e.g., 'child') reuses the submodule
name derived from the path. If the original submodule
was only moved (e.g., to 'child_old') and not renamed,
this silently overwrites its configuration in .gitmodules.
This behavior loses user configuration and causes
confusion when the original submodule is expected
to remain intact. It assumes that the path-derived
name is always safe to reuse, even though the name
might still be in use elsewhere in the repository.
Teach module_add() to check if the computed submodule
name already exists in the repository's submodule config,
and if so, refuse the operation unless the user explicitly
renames the submodule or uses the --force option,
which will automatically generate a unique name by
appending a number (e.g., child1).
Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t/unit-tests: finalize migration of reftable-related tests
The old `lib-reftable.{c,h}` implemented helper functions for our
homegrown unit-testing framework. As part of migrating reftable-related
tests to the Clar framework, Clar-specific versions of these functions
in `lib-reftable-clar.{c,h}` were introduced.
Now that all test files using these helpers have been converted to Clar,
we can safely remove the original `lib-reftable.{c,h}` and rename the
Clar- specific versions back to `lib-reftable.{c,h}`. This restores a
clean and consistent naming scheme for shared test utilities.
Finally, update our build system to reflect the changes made and remove
redundant code related to the reftable tests and our old homegrown
unit-testing setup. `test-lib.{c,h}` remains unchanged in our build
system as some files particularly `t/helper/test-example-tap.c` depends
on it in order to run, and removing that would be beyond the scope of
this patch.
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t/unit-tests: convert reftable stack test to use clar
Adapt reftable stack test file to use clar by using clar assertions
where necessary.
This marks the end of all unit tests migrated away from the
`unit-tests/t-*.c` pattern, there are no longer any files matching that
glob. Remove the sanity check for `t-*.c` files to prevent Meson
configuration errors during CI and local builds.
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>