Junio C Hamano [Thu, 8 Jan 2026 07:41:19 +0000 (16:41 +0900)]
Merge branch 'bc/sha1-256-interop-02' into seen
The code to maintain mapping between object names in multiple hash
functions is being added, written in Rust.
* bc/sha1-256-interop-02:
object-file-convert: always make sure object ID algo is valid
rust: add a small wrapper around the hashfile code
rust: add a new binary object map format
rust: add functionality to hash an object
rust: add a build.rs script for tests
hash: expose hash context functions to Rust
write-or-die: add an fsync component for the object map
csum-file: define hashwrite's count as a uint32_t
rust: add additional helpers for ObjectID
hash: add a function to look up hash algo structs
rust: add a hash algorithm abstraction
rust: add a ObjectID struct
hash: use uint32_t for object_id algorithm
conversion: don't crash when no destination algo
repository: require Rust support for interoperability
Junio C Hamano [Thu, 8 Jan 2026 07:41:18 +0000 (16:41 +0900)]
Merge branch 'ps/history' into seen
"git history" history rewriting UI.
Comments?
* ps/history:
builtin/history: implement "reword" subcommand
builtin: add new "history" command
wt-status: provide function to expose status for trees
replay: yield the object ID of the final rewritten commit
replay: small set of cleanups
builtin/replay: move core logic into "libgit.a"
builtin/replay: extract core logic to replay revisions
Junio C Hamano [Thu, 8 Jan 2026 07:41:18 +0000 (16:41 +0900)]
Merge branch 'cc/lop-filter-auto' into seen
"auto filter" logic for large-object promisor remote.
Comments?
* cc/lop-filter-auto:
fetch-pack: wire up and enable auto filter logic
promisor-remote: keep advertised filter in memory
list-objects-filter-options: implement auto filter resolution
list-objects-filter-options: support 'auto' mode for --filter
doc: fetch: document `--filter=<filter-spec>` option
fetch: make filter_options local to cmd_fetch()
clone: make filter_options local to cmd_clone()
promisor-remote: allow a client to store fields
promisor-remote: refactor initialising field lists
Junio C Hamano [Thu, 8 Jan 2026 07:41:17 +0000 (16:41 +0900)]
Merge branch 'hn/status-compare-with-push' into seen
"git status" learned to show comparison between the current branch
and its push destination as well as its upstream, when the two are
different (i.e., triangular workflow).
Comments?
* hn/status-compare-with-push:
status: show comparison with push remote tracking branch
refactor format_branch_comparison in preparation
* js/neuter-sideband:
sideband: add options to allow more control sequences to be passed through
sideband: do allow ANSI color sequences by default
sideband: introduce an "escape hatch" to allow control characters
sideband: mask control characters
Junio C Hamano [Thu, 8 Jan 2026 07:41:15 +0000 (16:41 +0900)]
Merge branch 'dw/config-global-list' into seen
"git config --list --global", unlike "git config --list", did not
consult both of the two possible per-user sources of the
configuration files, i.e. $HOME/.gitconfig and the XDG one, which
has been corrected.
* dw/config-global-list:
config: keep bailing on unreadable global files
config: read global scope via config_sequence
config: test home and xdg files in `list --global`
cleanup_path: force forward slashes on Windows
Junio C Hamano [Thu, 8 Jan 2026 07:41:15 +0000 (16:41 +0900)]
Merge branch 'lc/rebase-trailer' into seen
Refactor code paths to run "interpret-trailers" from "git
commit/tag" and use it in "git rebase".
* lc/rebase-trailer:
rebase: support --trailer
trailer: append trailers in-process and drop the fork to `interpret-trailers`
trailer: move process_trailers to trailer.h
interpret-trailers: factor out buffer-based processing to process_trailers()
Junio C Hamano [Thu, 8 Jan 2026 07:41:14 +0000 (16:41 +0900)]
Merge branch 'jc/exclude-with-gitignore' into seen
"git add ':(exclude)foo.o'" is clearly a request not to add 'foo.o',
but the command complained about listing an ignored path foo.o on
the command line, which has been corrected.
Comments?
* jc/exclude-with-gitignore:
dir.c: do not be fooled by :(exclude) pathspec elements
Junio C Hamano [Thu, 8 Jan 2026 07:41:13 +0000 (16:41 +0900)]
Merge branch 'en/xdiff-cleanup-3' into seen
Preparation of xdiff/ codebase to work with Rust
Comments?
* en/xdiff-cleanup-3:
SQUASH??? cocci
xdiff: move xdl_cleanup_records() from xprepare.c to xdiffi.c
xdiff: remove dependence on xdlclassifier from xdl_cleanup_records()
xdiff: replace xdfile_t.dend with xdfenv_t.delta_end
xdiff: replace xdfile_t.dstart with xdfenv_t.delta_start
xdiff: cleanup xdl_trim_ends()
xdiff: use xdfenv_t in xdl_trim_ends() and xdl_cleanup_records()
xdiff: let patience and histogram benefit from xdl_trim_ends()
xdiff: don't waste time guessing the number of lines
xdiff: make classic diff explicit by creating xdl_do_classic_diff()
ivec: introduce the C side of ivec
Junio C Hamano [Thu, 8 Jan 2026 07:41:13 +0000 (16:41 +0900)]
Merge branch 'ob/core-attributesfile-in-repository' into seen
The core.attributesfile is intended to be set per repository, but
were kept track of by a single global variable in-core, which has
been corrected by moving it to per-repository data structure.
Comments?
* ob/core-attributesfile-in-repository:
environment: move "core.attributesFile" into repo-setting
Junio C Hamano [Thu, 8 Jan 2026 07:40:37 +0000 (16:40 +0900)]
Merge branch 'ps/read-object-info-improvements' into jch
The object-info API has been cleaned up.
Comments?
* ps/read-object-info-improvements:
packfile: drop repository parameter from `packed_object_info()`
packfile: skip unpacking object header for disk size requests
packfile: disentangle return value of `packed_object_info()`
packfile: always populate pack-specific info when reading object info
packfile: extend `is_delta` field to allow for "unknown" state
packfile: always declare object info to be OI_PACKED
object-file: always set OI_LOOSE when reading object info
Junio C Hamano [Thu, 8 Jan 2026 07:40:36 +0000 (16:40 +0900)]
Merge branch 'sb/doc-worktree-prune-expire-improvement' into jch
The help text and the documentation for the "--expire" option of
"git worktree [list|prune]" have been improved.
* sb/doc-worktree-prune-expire-improvement:
worktree: use 'prune' instead of 'expire' in help text
worktree: clarify --expire applies to missing worktrees
Junio C Hamano [Thu, 8 Jan 2026 07:40:36 +0000 (16:40 +0900)]
Merge branch 'js/symlink-windows' into jch
Upstream symbolic link support on Windows from Git-for-Windows.
* js/symlink-windows:
mingw: special-case index entries for symlinks with buggy size
mingw: emulate `stat()` a little more faithfully
mingw: try to create symlinks without elevated permissions
mingw: add support for symlinks to directories
mingw: implement basic `symlink()` functionality (file symlinks only)
mingw: implement `readlink()`
mingw: allow `mingw_chdir()` to change to symlink-resolved directories
mingw: support renaming symlinks
mingw: handle symlinks to directories in `mingw_unlink()`
mingw: add symlink-specific error codes
mingw: change default of `core.symlinks` to false
mingw: factor out the retry logic
mingw: compute the correct size for symlinks in `mingw_lstat()`
mingw: teach dirent about symlinks
mingw: let `mingw_lstat()` error early upon problems with reparse points
mingw: drop the separate `do_lstat()` function
mingw: implement `stat()` with symlink support
mingw: don't call `GetFileAttributes()` twice in `mingw_lstat()`
Junio C Hamano [Thu, 8 Jan 2026 07:40:36 +0000 (16:40 +0900)]
Merge branch 'js/prep-symlink-windows' into jch
Further preparation to upstream symbolic link support on Windows.
* js/prep-symlink-windows:
trim_last_path_component(): avoid hard-coding the directory separator
strbuf_readlink(): support link targets that exceed PATH_MAX
strbuf_readlink(): avoid calling `readlink()` twice in corner-cases
init: do parse _all_ core.* settings early
mingw: do resolve symlinks in `getcwd()`
Junio C Hamano [Thu, 8 Jan 2026 07:40:34 +0000 (16:40 +0900)]
Merge branch 'jk/parse-int' into jch
Introduce a more robust way to parse a decimal integer stored in a
piece of memory that is not necessarily terminated with NUL (which
Asan strict-string-check complains even when use of strtol() is
safe due to varified existence of whitespace after the digits).
* jk/parse-int:
fsck: use parse_unsigned_from_buf() for parsing timestamp
cache-tree: use parse_int_from_buf()
parse: add functions for parsing from non-string buffers
parse: prefer bool to int for boolean returns
Junio C Hamano [Thu, 8 Jan 2026 07:40:34 +0000 (16:40 +0900)]
Merge branch 'kn/ref-location' into jch
A mechanism to specify what reference backend to use and store
references in which directory is introduced, which would likely to
be useful during ref migration.
Comments?
* kn/ref-location:
refs: add GIT_REF_URI to specify reference backend and directory
refs: support obtaining ref_store for given dir
Junio C Hamano [Thu, 8 Jan 2026 07:40:32 +0000 (16:40 +0900)]
Merge branch 'ps/packfile-store-in-odb-source' into jch
The packfile_store data structure is moved from object store to odb
source.
* ps/packfile-store-in-odb-source:
packfile: move MIDX into packfile store
packfile: refactor `find_pack_entry()` to work on the packfile store
packfile: inline `find_kept_pack_entry()`
packfile: only prepare owning store in `packfile_store_prepare()`
packfile: only prepare owning store in `packfile_store_get_packs()`
packfile: move packfile store into object source
packfile: refactor misleading code when unusing pack windows
packfile: refactor kept-pack cache to work with packfile stores
packfile: pass source to `prepare_pack()`
packfile: create store via its owning source
Junio C Hamano [Thu, 8 Jan 2026 07:40:32 +0000 (16:40 +0900)]
Merge branch 'ps/clar-integers' into jch
Import newer version of "clar", unit testing framework.
* ps/clar-integers:
gitattributes: disable blank-at-eof errors for clar test expectations
t/unit-tests: demonstrate use of integer comparison assertions
t/unit-tests: update clar to 39f11fe
Junio C Hamano [Thu, 8 Jan 2026 07:40:32 +0000 (16:40 +0900)]
Merge branch 'kh/replay-invalid-onto-advance' into jch
Test coverage of "git replay" has been improved.
* kh/replay-invalid-onto-advance:
t3650: add more regression tests for failure conditions
replay: die if we cannot parse object
replay: improve code comment and die message
replay: die descriptively when invalid commit-ish is given
replay: find *onto only after testing for ref name
replay: remove dead code and rearrange
Junio C Hamano [Thu, 8 Jan 2026 07:40:31 +0000 (16:40 +0900)]
Merge branch 'rs/commit-stack' into jch
Code clean-up, unifying various hand-rolled "list of commit
objects" and use the commit_stack API.
* rs/commit-stack:
commit-reach: use commit_stack
commit-graph: use commit_stack
commit: add commit_stack_grow()
shallow: use commit_stack
pack-bitmap-write: use commit_stack
commit: add commit_stack_init()
test-reach: use commit_stack
remote: use commit_stack for src_commits
remote: use commit_stack for sent_tips
remote: use commit_stack for local_commits
name-rev: use commit_stack
midx: use commit_stack
log: use commit_stack
revision: export commit_stack
Junio C Hamano [Thu, 8 Jan 2026 07:40:30 +0000 (16:40 +0900)]
Merge branch 'ja/doc-synopsis-style-more' into jch
More doc style updates.
* ja/doc-synopsis-style-more:
doc: convert git-remote to synopsis style
doc: convert git stage to use synopsis block
doc: convert git-status tables to AsciiDoc format
doc: convert git-status to synopsis style
doc: fix t0450-txt-doc-vs-help to select only first synopsis block
Junio C Hamano [Thu, 8 Jan 2026 07:40:12 +0000 (16:40 +0900)]
Merge branch 'en/ort-recursive-d-f-conflict-fix'
The ort merge machinery hit an assertion failure in a history with
criss-cross merges renamed a directory and a non-directory, which
has been corrected.
* en/ort-recursive-d-f-conflict-fix:
merge-ort: fix corner case recursive submodule/directory conflict handling
Running "git diff" with "--name-only" and other options that allows
us not to look at the blob contents, while objects that are lazily
fetched from a promisor remote, caused use-after-free, which has
been corrected.
* ds/diff-lazy-fetch-with-name-only-fix:
diff: avoid segfault with freed entries
Junio C Hamano [Thu, 8 Jan 2026 07:40:11 +0000 (16:40 +0900)]
Merge branch 'rs/tag-wo-the-repository'
Code clean-up.
* rs/tag-wo-the-repository:
tag: stop using the_repository
tag: support arbitrary repositories in parse_tag()
tag: support arbitrary repositories in gpg_verify_tag()
tag: use algo of repo parameter in parse_tag_buffer()
Phillip Wood [Thu, 18 Dec 2025 16:50:26 +0000 (16:50 +0000)]
replay: drop commits that become empty
If the changes in a commit being replayed are already in the branch
that the commits are being replayed onto, then "git replay" creates an
empty commit. This is confusing because the commit message no longer
matches the contents of the commit. Drop the commit instead. Commits
that start off empty are not dropped. This matches the behavior of
"git rebase --reapply-cherry-pick --empty=drop" and "git cherry-pick
--empty-drop".
If a branch points to a commit that is dropped it will be updated
to point to the last commit that was not dropped. This can be seen
in the new test where "topic1" is updated to point to the rebased
"C" as "F" is dropped because it is already upstream. While this is
a breaking change, "git replay" is marked as experimental to allow
improvements like this that change the behavior.
Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adrian Ratiu [Wed, 7 Jan 2026 23:01:45 +0000 (01:01 +0200)]
submodule: detect conflicts with existing gitdir configs
Credit goes to Emily and Josh for testing and noticing a corner-case
which caused conflicts with existing gitdir configs to silently pass
validation, then fail later in add_submodule() with a cryptic error:
fatal: A git directory for 'nested%2fsub' is found locally with remote(s):
origin /.../trash directory.t7425-submodule-gitdir-path-extension/sub
This change ensures the validation step checks existing gitdirs for
conflicts. We only have to do this for submodules having gitdirs,
because those without submodule.%s.gitdir need to be migrated and
will throw an error earlier in the submodule codepath.
Quoting Josh:
My testing setup has been as follows:
* Using our locally-built Git with our downstream patch of [1] included:
* create a repo "sub"
* create a repo "super"
* In "super":
* mkdir nested
* git submodule add ../sub nested/sub
* Verify that the submodule's gitdir is .git/modules/nested%2fsub
* Using a build of git from upstream `next` plus this series:
* git config set --global extensions.submodulepathconfig true
* git clone --recurse-submodules super super2
* create a repo "nested%2fsub"
* In "super2":
* git submodule add ../nested%2fsub
At this point I'd expect the collision detection / encoding to take
effect, but instead I get the error listed above.
End quote
Suggested-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adrian Ratiu [Wed, 7 Jan 2026 23:01:44 +0000 (01:01 +0200)]
submodule: hash the submodule name for the gitdir path
If none of the previous plain-text / encoding / derivation steps work
and case 2.4 is reached, then try a hash of the submodule name to see
if that can be a valid gitdir before giving up and throwing an error.
This is a "last resort" type of measure to avoid conflicts since it
loses the human readability of the gitdir path. This logic will be
reached in rare cases, as can be seen in the test we added.
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new check when extension.submodulePathConfig is enabled, to
detect and prevent case-folding filesystem colisions. When this
new check is triggered, a stricter casefolding aware URI encoding
is used to percent-encode uppercase characters.
By using this check/retry mechanism the uppercase encoding is
only applied when necessary, so case-sensitive filesystems are
not affected.
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adrian Ratiu [Wed, 7 Jan 2026 23:01:42 +0000 (01:01 +0200)]
submodule--helper: fix filesystem collisions by encoding gitdir paths
Fix nested filesystem collisions by url-encoding gitdir paths stored
in submodule.%s.gitdir, when extensions.submodulePathConfig is enabled.
Credit goes to Junio and Patrick for coming up with this design: the
encoding is only applied when necessary, to newly added submodules.
Existing modules don't need the encoding because git already errors
out when detecting nested gitdirs before this patch.
This commit adds the basic url-encoding and some tests. Next commits
extend the encode -> validate -> retry loop to fix more conflicts.
Suggested-by: Junio C Hamano <gitster@pobox.com> Suggested-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adrian Ratiu [Wed, 7 Jan 2026 23:01:41 +0000 (01:01 +0200)]
builtin/credential-store: move is_rfc3986_unreserved to url.[ch]
is_rfc3986_unreserved() was moved to credential-store.c and was made
static by f89854362c (credential-store: move related functions to
credential-store file, 2023-06-06) under a correct assumption, at the
time, that it was the only place using it.
However now we need it to apply URL-encoding to submodule names when
constructing gitdir paths, to avoid conflicts, so bring it back as a
public function exposed via url.h, instead of the old helper path
(strbuf), which has nothing to do with 3986 encoding/decoding anymore.
This function will be used in subsequent commits which do the encoding.
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adrian Ratiu [Wed, 7 Jan 2026 23:01:40 +0000 (01:01 +0200)]
submodule--helper: add gitdir migration command
Manually running
"git config submodule.<name>.gitdir .git/modules/<name>"
for each submodule can be impractical, so add a migration command to
submodule--helper to automatically create configs for all submodules
as required by extensions.submodulePathConfig.
The command calls create_default_gitdir_config() which validates the
gitdir paths before adding the configs.
Suggested-by: Junio C Hamano <gitster@pobox.com> Suggested-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new config `init.defaultSubmodulePathConfig` which allows
enabling `extensions.submodulePathConfig` for new submodules by
default (those created via git init or clone).
Important: setting init.defaultSubmodulePathConfig = true does
not globally enable `extensions.submodulePathConfig`. Existing
repositories will still have the extension disabled and will
require migration (for example via git submodule--helper command
added in the next commit).
Suggested-by: Patrick Steinhardt <ps@pks.im> Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The idea of this extension is to abstract away the submodule gitdir
path implementation: everyone is expected to use the config and not
worry about how the path is computed internally, either in git or
other implementations.
With this extension enabled, the submodule.<name>.gitdir repo config
becomes the single source of truth for all submodule gitdir paths.
The submodule.<name>.gitdir config is added automatically for all new
submodules when this extension is enabled.
Git will throw an error if the extension is enabled and a config is
missing, advising users how to migrate. Migration is manual for now.
E.g. to add a missing config entry for an existing "foo" module:
git config submodule.foo.gitdir .git/modules/foo
Suggested-by: Junio C Hamano <gitster@pobox.com> Suggested-by: Phillip Wood <phillip.wood123@gmail.com> Suggested-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adrian Ratiu [Wed, 7 Jan 2026 23:01:37 +0000 (01:01 +0200)]
builtin/submodule--helper: add gitdir command
This exposes the gitdir name computed by submodule_name_to_gitdir()
internally, to make it easier for users and tests to interact with it.
Next commit will add a gitdir configuration, so this helper can also be
used to easily query that config or validate any gitdir path the user
sets (submodule_name_to_git_dir now runs the validation logic, since
our previous commit).
Based-on-patch-by: Brandon Williams <bwilliams.eng@gmail.com> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the ad-hoc validation checks sprinkled across the source tree,
after calling submodule_name_to_gitdir() into the function proper,
which now always validates the gitdir before returning it.
This simplifies the API and helps to:
1. Avoid redundant validation calls after submodule_name_to_gitdir().
2. Avoid the risk of callers forgetting to validate.
3. Ensure gitdir paths provided by users via configs are always valid
(config gitdir paths are added in a subsequent commit).
The validation function can still be called as many times as needed
outside submodule_name_to_gitdir(), for example we keep two calls
which are still required, to avoid parallel clone races by re-running
the validation in builtin/submodule-helper.c.
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adrian Ratiu [Wed, 7 Jan 2026 23:01:35 +0000 (01:01 +0200)]
submodule--helper: use submodule_name_to_gitdir in add_submodule
While testing submodule gitdir path encoding, I noticed submodule--helper
is still using a hardcoded modules gitdir path leading to test failures.
Call the submodule_name_to_gitdir() helper instead, which was invented
exactly for this purpose and is already used by all the other locations
which work on gitdirs.
Also narrow the scope of the submod_gitdir_path variable which is not
used anymore in the updated "else" branch.
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
packfile: drop repository parameter from `packed_object_info()`
The function `packed_object_info()` takes a packfile and offset and
returns the object info for the corresponding object. Despite these two
parameters though it also takes a repository pointer. This is redundant
information though, as `struct packed_git` already has a repository
pointer that is always populated.
Drop the redundant parameter.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
packfile: skip unpacking object header for disk size requests
While most of the object info requests for a packed object require us to
unpack its headers, reading its disk size doesn't. We still unpack the
object header in that case though, which is unnecessary work.
Skip reading the header if only the disk size is requested. This leads
to a small speedup when reading disk size, only. The following benchmark
was done in the Git repository:
Benchmark 1: ./git rev-list --disk-usage HEAD (rev = HEAD~)
Time (mean ± σ): 105.2 ms ± 0.6 ms [User: 91.4 ms, System: 13.3 ms]
Range (min … max): 103.7 ms … 106.0 ms 27 runs
Benchmark 2: ./git rev-list --disk-usage HEAD (rev = HEAD)
Time (mean ± σ): 96.7 ms ± 0.4 ms [User: 86.2 ms, System: 10.0 ms]
Range (min … max): 96.2 ms … 98.1 ms 30 runs
Summary
./git rev-list --disk-usage HEAD (rev = HEAD) ran
1.09 ± 0.01 times faster than ./git rev-list --disk-usage HEAD (rev = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
packfile: disentangle return value of `packed_object_info()`
The `packed_object_info()` function returns the type of the packed
object. While we use an `enum object_type` to store the return value,
this type is not to be confused with the actual object type. It _may_
contain the object type, but it may just as well encode that the given
packed object is stored as a delta.
We have removed the only caller that relied on this returned object type
in the preceding commit, so let's simplify semantics and return either 0
on success or a negative error code otherwise.
This unblocks a small optimization where we can skip reading the object
type altogether.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
packfile: always populate pack-specific info when reading object info
When reading object information via `packed_object_info()` we may not
populate the object info's packfile-specific fields. This leads to
inconsistent object info depending on whether the info was populated via
`packfile_store_read_object_info()` or `packed_object_info()`.
Fix this inconsistency so that we can always assume the pack info to be
populated when reading object info from a pack.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
packfile: extend `is_delta` field to allow for "unknown" state
The `struct object_info::u::packed::is_delta` field determines whether
or not a specific object is stored as a delta. It only stores whether or
not the object is stored as delta, so it is treated as a boolean value.
This boolean is insufficient though: when reading a packed object via
`packfile_store_read_object_info()` we know to skip parsing the actual
object when the user didn't request any object-specific data. In that
case we won't read the object itself, but will only look up its position
in the packfile. Consequently, we do not know whether it is a delta or
not.
This isn't really an issue right now, as the check for an empty request
is broken. But a subsequent commit will fix it, and once we do we will
have the need to also represent an "unknown" delta state.
Prepare for this change by introducing a new enum that encodes the
object type. We don't use the "unknown" state just yet, but will start
to do so in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
packfile: always declare object info to be OI_PACKED
When reading object info via a packfile we yield one of two types:
- The object can either be OI_PACKED, which is what a caller would
typically expect.
- Or it can be OI_DBCACHED if it is stored in the delta base cache.
The latter really is an implementation detail though, and callers
typically don't care at all about the difference. Furthermore, the
information whether or not it is part of the delta base cache can
already be derived via the `is_delta` field, so the fact that we discern
between OI_PACKED and OI_DBCACHED only further complicates the
interface.
There aren't all that many callers that care about the `whence` field in
the first place. In fact, there's only three:
- `packfile_store_read_object_info()` checks for `whence == OI_PACKED`
and then populates the packfile information of the object info
structure. We now start to do this also for deltified objects, which
gives its callers strictly more information.
- `repack_local_links()` wants to determine whether the object is part
of a promisor pack and checks for `whence == OI_PACKED`. If so, it
verifies that the packfile is a promisor pack. It's arguably wrong
to declare that an object is not part of a promisor pack only
because it is stored in the delta base cache.
- `is_not_in_promisor_pack_obj()` does the same, but checks that a
specific object is _not_ part of a promisor pack. The same reasoning
as above applies.
Drop the OI_DBCACHED enum completely. None of the callers seem to care
about the distinction.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
object-file: always set OI_LOOSE when reading object info
There are some early returns in `odb_source_loose_read_object_info()`
in cases where we don't have to open the loose object. These return
paths do not set `struct object_info::whence` to `OI_LOOSE` though, so
it becomes impossible for the caller to tell the format of such an
object.
The root cause of this really is that we have so many different return
paths in the function. As a consequence, it's harder than necessary to
make sure that all successful exit paths sot up the `whence` field as
expected.
Address this by refactoring the function to have a single exit path.
Like this, we can trivially set up the `whence` field when we exit
successfully from the function.
Note that we also:
- Rename `status` to `ret` to match our usual coding style, but also
to show that the old `status` variable is now always getting the
expected value. Furthermore, the value is not initialized anymore,
which has the consequence that most compilers will warn for exit
paths where we forgot to set it.
- Move the setup of scratch pointers closer to `parse_loose_header()`
to show where it's needed.
- Guard a couple of variables on cleanup so that they only get
released in case they have been set up.
- Reset `oi->delta_base_oid` towards the end of the function, together
with all the other object info pointers.
Overall, all these changes result in a diff that is somewhat hard to
read. But the end result is significantly easier to read and reason
about, so I'd argue this one-time churn is worth it.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ashlesh Gawande [Wed, 7 Jan 2026 07:47:24 +0000 (13:17 +0530)]
t5550: add netrc tests for http 401/403
git allows using .netrc file to supply credentials for HTTP auth.
Three test cases are added in this patch to provide missing coverage
when cloning over HTTP using .netrc file:
- First test case checks that the git clone is successful when credentials
are provided via .netrc file
- Second test case checks that the git clone fails when the .netrc file
provides invalid credentials. The HTTP server is expected to return
401 Unauthorized in such a case. The test checks that the user is
provided with a prompt for username/password on 401 to provide
the valid ones.
- Third test case checks that the git clone fails when the .netrc file
provides credentials that are valid but do not have permission for
this user. For example one may have multiple tokens in GitHub
and uses the one which was not authorized for cloning this repo.
In such a case the HTTP server returns 403 Forbidden.
For this test, the apache.conf is modified to return a 403
on finding a forbidden-user. No prompt for username/password is
expected after the 403 (unlike 401). This is because prompting may wipe
out existing credentials or conflict with custom credential helpers.
Signed-off-by: Ashlesh Gawande <git@ashlesh.me> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement a new "reword" subcommand for git-history(1). This subcommand
is similar to the user performing an interactive rebase with a single
commit changed to use the "reword" instruction.
The "reword" subcommand is built on top of the replay subsystem
instead of the sequencer. This leads to some major differences compared
to git-rebase(1):
- We do not check out the commit that is to be reworded and instead
perform the operation in-memory. This has the obvious benefit of
being significantly faster compared to git-rebase(1), but even more
importantly it allows the user to rewrite history even if there are
local changes in the working tree or in the index.
- We do not execute any hooks, even though we leave some room for
changing this in the future.
- By default, all local branches that contain the commit will be
rewritten. This especially helps with workflows that use stacked
branches.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When rewriting history via git-rebase(1) there are a few very common use
cases:
- The ordering of two commits should be reversed.
- A commit should be split up into two commits.
- A commit should be dropped from the history completely.
- Multiple commits should be squashed into one.
- Editing an existing commit that is not the tip of the current
branch.
While these operations are all doable, it often feels needlessly kludgey
to do so by doing an interactive rebase, using the editor to say what
one wants, and then perform the actions. Also, some operations like
splitting up a commit into two are way more involved than that and
require a whole series of commands.
Another problem that rebases have is that dependent branches are not
being updated. The use of stacked branches has grown quite common with
competiting version control systems like Jujutsu though, so it clearly
is a need that users have. While rebases _can_ serve this use case if
one always works on the latest stacked branch, it is somewhat awkward
and very easy to get wrong.
Add a new "history" command to plug these gaps. This command will have
several different subcommands to imperatively rewrite history for common
use cases like the above.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
wt-status: provide function to expose status for trees
The "wt-status" subsystem is responsible for printing status information
around the current state of the working tree. This most importantly
includes information around whether the working tree or the index have
any changes.
We're about to introduce a new command where the changes in neither of
them are actually relevant to us. Instead, what we want is to format the
changes between two different trees. While it is a little bit of a
stretch to add this as functionality to _working tree_ status, it
doesn't make any sense to open-code this functionality, either.
Implement a new function `wt_status_collect_changes_trees()` that diffs
two trees and formats the status accordingly. This function is not yet
used, but will be in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
replay: yield the object ID of the final rewritten commit
In a subsequent commit we'll introduce a new git-history(1) command that
uses the replay machinery to rewrite commits. One of its supported modes
will only want to update the "HEAD" reference, but that is not currently
supported by the replay machinery.
Allow implementing this use case by exposing a `final_oid` field for the
reference updates. This field will be set to the last commit that was
rewritten, which is sufficient information for us to implement this mode
in git-history(1).
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Perform a small set of cleanups so that the "replay" logic compiles with
"-Wsign-compare" and doesn't use `the_repository` anymore. Note that
there are still some implicit dependencies on `the_repository`, e.g.
because we use `get_commit_output_encoding()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the core logic used to replay commits into "libgit.a" so that it
can be easily reused by other commands. It will be used in a subsequent
commit where we're about to introduce a new git-history(1) command.
Note that with this change we have no sign-comparison warnings anymore,
and neither do we depend on `the_repository`.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/replay: extract core logic to replay revisions
We're about to move the core logic used to replay revisions onto a new
base into the "libgit.a" library. Prepare for this by pulling out the
logic into a new function `replay_revisions()` that:
1. Takes a set of revisions to replay and some options that tell it how
it ought to replay the revisions.
2. Replays the commits.
3. Records any reference updates that would be caused by replaying the
commits in a structure that is owned by the caller.
The logic itself will be moved into a separate file in the next commit.
This change is not expected to cause user-visible change in behaviour.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Thu, 8 Jan 2026 02:01:22 +0000 (11:01 +0900)]
Merge branch 'kh/replay-invalid-onto-advance' into ps/history
* kh/replay-invalid-onto-advance:
t3650: add more regression tests for failure conditions
replay: die if we cannot parse object
replay: improve code comment and die message
replay: die descriptively when invalid commit-ish is given
replay: find *onto only after testing for ref name
replay: remove dead code and rearrange
Paulo Casaretto [Wed, 7 Jan 2026 16:45:55 +0000 (16:45 +0000)]
lockfile: add PID file for debugging stale locks
When a lock file is held, it can be helpful to know which process owns
it, especially when debugging stale locks left behind by crashed
processes. Add an optional feature that creates a companion PID file
alongside each lock file, containing the PID of the lock holder.
For a lock file "foo.lock", the PID file is named "foo~pid.lock". The
tilde character is forbidden in refnames and allowed in Windows
filenames, which guarantees no collision with the refs namespace
(e.g., refs "foo" and "foo~pid" cannot both exist). The file contains
a single line in the format "pid <value>" followed by a newline.
The PID file is created when a lock is acquired (if enabled), and
automatically cleaned up when the lock is released (via commit or
rollback). The file is registered as a tempfile so it gets cleaned up
by signal and atexit handlers if the process terminates abnormally.
When a lock conflict occurs, the code checks for an existing PID file
and, if found, uses kill(pid, 0) to determine if the process is still
running. This allows providing context-aware error messages:
Lock is held by process 12345. Wait for it to finish, or remove
the lock file to continue.
Or for a stale lock:
Lock was held by process 12345, which is no longer running.
Remove the stale lock file to continue.
The feature is controlled via core.lockfilePid configuration (boolean).
Defaults to false. When enabled, PID files are created for all lock
operations.
Existing PID files are always read when displaying lock errors,
regardless of the core.lockfilePid setting. This ensures helpful
diagnostics even when the feature was previously enabled and later
disabled.
Signed-off-by: Paulo Casaretto <pcasaretto@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
add -p: show user's hunk decision when selecting hunks
When a user is interactively deciding which hunks to use or skip for
staging, unstaging, stashing etc, there is no way to know the
decision previously chosen for a hunk when navigating through the
previous and next hunks using K/J respectively.
Improve the UI to explicitly show if a user has previously decided to
use a hunk (by pressing 'y') or skip the hunk (by pressing 'n').
This will improve clarity and aid the navigation process for the
user.
Reported-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Abraham Samuel Adekunle <abrahamadekunle50@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Wed, 7 Jan 2026 01:29:26 +0000 (01:29 +0000)]
fsck: snapshot default refs before object walk
Fsck has a race when operating on live repositories; consider the
following simple script that writes new commits as fsck runs:
#!/bin/bash
git fsck &
PID=$!
while ps -p $PID >/dev/null; do
sleep 3
git commit -q --allow-empty -m "Another commit"
done
Since fsck walks objects for connectivity and then reads the refs at the
end to check, this can cause fsck to get confused and think that the new
refs refer to missing commits and that new reflog entries are invalid.
Running the above script in a clone of git.git results in the following
(output ellipsized to remove additional errors of the same type):
We can minimize the race opportunities by taking a snapshot of refs at
program invocation, doing the connectivity check, and then checking the
snapshotted refs afterward. This avoids races with regular refs between
fsck and adding objects to the database, though it still leaves a race
between a gc and fsck. We are less concerned about folks simultaneously
running gc with fsck; though, if it becomes an issue, we could lock fsck
during gc. We definitely do not want to lock fsck during operations
that may add objects to the object store; that would be problematic for
forges.
Note that refs aren't the only problem, though; reflog entries and index
entries could be problematic as well. For now we punt on index entries
just leaving a TODO comment, and for reflogs we use a coarse solution of
taking the time at the beginning of the program and ignoring reflog
entries newer than that time. That may be imperfect if dealing with a
network filesystem, so we leave TODO comment for those that want to
improve that handling as well.
As a high level overview:
* In addition to fsck_handle_ref(), which now is only a few lines long
to process a ref, there's also a snapshot_ref() which is called
early in the program for each ref and takes all the error checking
logic.
* The iterating over refs that used to be in get_default_heads() plus
a loop over the arguments now appears in shapshot_refs().
* There's a new process_refs() as well that kind of looks like the old
get_default_heads() though it is streamlined due to the work done by
snapshot_refs().
This combination of changes modifies the output of running the script
(from the beginning of this commit message) to:
$ ./fsck-while-writing.sh
Checking ref database: 100% (1/1), done.
Checking object directories: 100% (256/256), done.
warning in tag d6602ec5194c87b0fc87103ca4d67251c76f233a: missingTaggerEntry: invalid format - expected 'tagger' line
Checking objects: 100% (835091/835091), done.
Checking connectivity: 833846, done.
Verifying commits in commit graph: 100% (242243/242243), done.
While worries about live updates while running fsck is likely of most
interest for forge operators, it may also benefit those with
automated jobs (such as git maintenance) or even casual users who want
to do other work in their clone while fsck is running.
Originally-based-on-a-patch-by: Matthew John Cheetham <mjcheetham@outlook.com> Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The multi-pack index still is tracked as a member of the object database
source, but ultimately the MIDX is always tied to one specific packfile
store.
Move the structure into `struct packfile_store` accordingly. This
ensures that the packfile store now keeps track of all data related to
packfiles.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
packfile: refactor `find_pack_entry()` to work on the packfile store
The function `find_pack_entry()` doesn't work on a specific packfile
store, but instead works on the whole repository. This causes a bit of a
conceptual mismatch in its callers:
- `packfile_store_freshen_object()` supposedly acts on a store, and
its callers know to iterate through all sources already.
The `find_kept_pack_entry()` function is only used in
`has_oject_kept_pack()`, which is only a trivial wrapper itself. Inline
the latter into the former.
Furthermore, reorder the code so that we can drop the declaration of the
function in "packfile.h". This allows us to make the function file-local.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
packfile: only prepare owning store in `packfile_store_prepare()`
When calling `packfile_store_prepare()` we prepare not only the provided
packfile store, but also all those of all other sources part of the same
object database. This was required when the store was still sitting on
the object database level. But now that it sits on the source level it's
not anymore.
Refactor the code so that we only prepare the single packfile store
passed by the caller. Adapt callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
packfile: only prepare owning store in `packfile_store_get_packs()`
When calling `packfile_store_get_packs()` we prepare not only the
provided packfile store, but also all those of all other sources part of
the same object database. This was required when the store was still
sitting on the object database level. But now that it sits on the source
level it's not anymore.
Adapt the code so that we only prepare the MIDX of the provided store.
All callers only work in the context of a single store or call the
function in a loop over all sources, so this change shouldn't have any
practical effects.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The packfile store is a member of `struct object_database`, which means
that we have a single store per database. This doesn't really make much
sense though: each source connected to the database has its own set of
packfiles, so there is a conceptual mismatch here. This hasn't really
caused much of a problem in the past, but with the advent of pluggable
object databases this is becoming more of a problem because some of the
sources may not even use packfiles in the first place.
Move the packfile store down by one level from the object database into
the object database source. This ensures that each source now has its
own packfile store, and we can eventually start to abstract it away
entirely so that the caller doesn't even know what kind of store it
uses.
Note that we only need to adjust a relatively small number of callers,
way less than one might expect. This is because most callers are using
`repo_for_each_pack()`, which handles enumeration of all packfiles that
exist in the repository. So for now, none of these callers need to be
adapted. The remaining callers that iterate through the packfiles
directly and that need adjustment are those that are a bit more tangled
with packfiles. These will be adjusted over time.
Note that this patch only moves the packfile store, and there is still a
bunch of functions that seemingly operate on a packfile store but that
end up iterating over all sources. These will be adjusted in subsequent
commits.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
packfile: refactor misleading code when unusing pack windows
The function `unuse_one_window()` is responsible for unmapping one of
the packfile windows, which is done when we have exceeded the allowed
number of window.
The function receives a `struct packed_git` as input, which serves as an
additional packfile that should be considered to be closed. If not
given, we seemingly skip that and instead go through all of the
repository's packfiles. The conditional that checks whether we have a
packfile though does not make much sense anymore, as we dereference the
packfile regardless of whether or not it is a `NULL` pointer to derive
the repository's packfile store.
The function was originally introduced via f0e17e86e1 (pack: move
release_pack_memory(), 2017-08-18), and here we indeed had a caller that
passed a `NULL` pointer. That caller was later removed via 9827d4c185
(packfile: drop release_pack_memory(), 2019-08-12), so starting with
that commit we always pass a `struct packed_git`. In 9c5ce06d74
(packfile: use `repository` from `packed_git` directly, 2024-12-03) we
then inadvertently started to rely on the fact that the pointer is never
`NULL` because we use it now to identify the repository.
Arguably, it didn't really make sense in the first place that the caller
provides a packfile, as the selected window would have been overridden
anyway by the subsequent loop over all packfiles if there was an older
window. So the overall logic is quite misleading overall. The only case
where it _could_ make a difference is when there were two packfiles with
the same `last_used` value, but that case doesn't ever happen because
the `pack_used_ctr` is strictly increasing.
Refactor the code so that we instead pass in the object database to
help make the code less misleading.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
packfile: refactor kept-pack cache to work with packfile stores
The kept pack cache is a cache of packfiles that are marked as kept
either via an accompanying ".kept" file or via an in-memory flag. The
cache can be retrieved via `kept_pack_cache()`, where one needs to pass
in a repository.
Ultimately though the kept-pack cache is a property of the packfile
store, and this causes problems in a subsequent commit where we want to
move down the packfile store to be a per-object-source entity.
Prepare for this and refactor the kept-pack cache to work on top of a
packfile store instead. While at it, rename both the function and flags
specific to the kept-pack cache so that they can be properly attributed
to the respective subsystems.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When preparing a packfile we pass various pieces attached to the pack's
object database source via the `struct prepare_pack_data`. Refactor this
code to instead pass in the source directly. This reduces the number of
variables we need to pass and allows for a subsequent refactoring where
we start to prepare the pack via the source.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In subsequent patches we're about to move the packfile store from the
object database layer into the object database source layer. Once done,
we'll have one packfile store per source, where the source is owning the
store.
Prepare for this future and refactor `packfile_store_new()` to be
initialized via an object database source instead of via the object
database itself.
This refactoring leads to a weird in-between state where the store is
owned by the object database but created via the source. But this makes
subsequent refactorings easier because we can now start to access the
owning source of a given store.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This test indirectly checks that the lost-found folder has 2 files in it
and then checks that the expected two files exist. Make this more
deliberate by removing the old test -f and compare the actual ls of the
lost-found directory with the expected files.
Signed-off-by: Andrew Chitester <andchi@fastmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is possible to hit a memory leak when reading data from a submodule
via git-grep(1):
Direct leak of 192 byte(s) in 1 object(s) allocated from:
#0 0x55555562e726 in calloc (git+0xda726)
#1 0x555555964734 in xcalloc ../wrapper.c:154:8
#2 0x555555835136 in load_multi_pack_index_one ../midx.c:135:2
#3 0x555555834fd6 in load_multi_pack_index ../midx.c:382:6
#4 0x5555558365b6 in prepare_multi_pack_index_one ../midx.c:716:17
#5 0x55555586c605 in packfile_store_prepare ../packfile.c:1103:3
#6 0x55555586c90c in packfile_store_reprepare ../packfile.c:1118:2
#7 0x5555558546b3 in odb_reprepare ../odb.c:1106:2
#8 0x5555558539e4 in do_oid_object_info_extended ../odb.c:715:4
#9 0x5555558533d1 in odb_read_object_info_extended ../odb.c:862:8
#10 0x5555558540bd in odb_read_object ../odb.c:920:6
#11 0x55555580a330 in grep_source_load_oid ../grep.c:1934:12
#12 0x55555580a13a in grep_source_load ../grep.c:1986:10
#13 0x555555809103 in grep_source_is_binary ../grep.c:2014:7
#14 0x555555807574 in grep_source_1 ../grep.c:1625:8
#15 0x555555807322 in grep_source ../grep.c:1837:10
#16 0x5555556a5c58 in run ../builtin/grep.c:208:10
#17 0x55555562bb42 in void* ThreadStartFunc<false>(void*) lsan_interceptors.cpp.o
#18 0x7ffff7a9a979 in start_thread (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x9a979) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
#19 0x7ffff7b22d2b in __GI___clone3 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x122d2b) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
The root caues of this leak is the way we set up and release the
submodule:
1. We use `repo_submodule_init()` to initialize a new repository. This
repository is stored in `repos_to_free`.
2. We now read data from the submodule repository.
3. We then call `repo_clear()` on the submodule repositories.
4. `repo_clear()` calls `odb_free()`.
5. `odb_free()` calls `odb_free_sources()` followed by `odb_close()`.
The issue here is the 5th step: we call `odb_free_sources()` _before_ we
call `odb_close()`. But `odb_free_sources()` already frees all sources,
so the logic that closes them in `odb_close()` now becomes a no-op. As a
consequence, we never explicitly close sources at all.
Fix the leak by closing the store before we free the sources.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/gc: fix condition for whether to write commit graphs
When performing auto-maintenance we check whether commit graphs need to
be generated by counting the number of commits that are reachable by any
reference, but not covered by a commit graph. This search is performed
by iterating through all references and then doing a depth-first search
until we have found enough commits that are not present in the commit
graph.
This logic has a memory leak though:
Direct leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x55555562e433 in malloc (git+0xda433)
#1 0x555555964322 in do_xmalloc ../wrapper.c:55:8
#2 0x5555559642e6 in xmalloc ../wrapper.c:76:9
#3 0x55555579bf29 in commit_list_append ../commit.c:1872:35
#4 0x55555569f160 in dfs_on_ref ../builtin/gc.c:1165:4
#5 0x5555558c33fd in do_for_each_ref_iterator ../refs/iterator.c:431:12
#6 0x5555558af520 in do_for_each_ref ../refs.c:1828:9
#7 0x5555558ac317 in refs_for_each_ref ../refs.c:1833:9
#8 0x55555569e207 in should_write_commit_graph ../builtin/gc.c:1188:11
#9 0x55555569c915 in maintenance_is_needed ../builtin/gc.c:3492:8
#10 0x55555569b76a in cmd_maintenance ../builtin/gc.c:3542:9
#11 0x55555575166a in run_builtin ../git.c:506:11
#12 0x5555557502f0 in handle_builtin ../git.c:779:9
#13 0x555555751127 in run_argv ../git.c:862:4
#14 0x55555575007b in cmd_main ../git.c:984:19
#15 0x5555557523aa in main ../common-main.c:9:11
#16 0x7ffff7a2a4d7 in __libc_start_call_main (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a4d7) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
#17 0x7ffff7a2a59a in __libc_start_main@GLIBC_2.2.5 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a59a) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
#18 0x5555555f0934 in _start (git+0x9c934)
The root cause of this memory leak is our use of `commit_list_append()`.
This function expects as parameters the item to append and the _tail_ of
the list to append. This tail will then be overwritten with the new tail
of the list so that it can be used in subsequent calls. But we call it
with `commit_list_append(parent->item, &stack)`, so we end up losing
everything but the new item.
This issue only surfaces when counting merge commits. Next to being a
memory leak, it also shows that we're in fact miscounting as we only
respect children of the last parent. All previous parents are discarded,
so their children will be disregarded unless they are hit via another
reference.
While crafting a test case for the issue I was puzzled that I couldn't
establish the proper border at which the auto-condition would be
fulfilled. As it turns out, there's another bug: if an object is at the
tip of any reference we don't mark it as seen. Consequently, if it is
the tip of or reachable via another ref, we'd count that object multiple
times.
Fix both of these bugs so that we properly count objects without leaking
any memory.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>