Karsten Blees [Fri, 9 Jan 2026 20:05:06 +0000 (20:05 +0000)]
mingw: add symlink-specific error codes
The Win32 API calls do not set `errno`; Instead, error codes for failed
operations must be obtained via the `GetLastError()` function. Git would
not know what to do with those error values, though, which is why Git's
Windows compatibility layer translates them to `errno` values.
Let's handle a couple of symlink-related error codes that will become
relevant with the upcoming support for symlinks on Windows.
Signed-off-by: Karsten Blees <karsten.blees@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Karsten Blees [Fri, 9 Jan 2026 20:05:05 +0000 (20:05 +0000)]
mingw: change default of `core.symlinks` to false
Symlinks on Windows don't work the same way as on Unix systems. For
example, there are different types of symlinks for directories and
files, and unless using a recent-ish Windows version in Developer Mode,
creating symlinks requires administrative privileges.
By default, disable symlink support on Windows. That is, users
explicitly have to enable it with `git config [--system|--global]
core.symlinks true`; For convenience, `git init` (and `git clone`)
will perform a test whether the current setup allows creating symlinks
and will configure that setting in the repository config.
The test suite ignores system / global config files. Allow
testing *with* symlink support by checking if native symlinks are
enabled in MSYS2 (via setting the special environment variable
`MSYS=winsymlinks:nativestrict` to ask the MSYS2 runtime to enable
creating symlinks).
Note: This assumes that Git's test suite is run in MSYS2's Bash, which
is true for the time being (an experiment to switch to BusyBox-w32
failed due to the experimental nature of BusyBox-w32).
Signed-off-by: Karsten Blees <karsten.blees@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Karsten Blees [Fri, 9 Jan 2026 20:05:04 +0000 (20:05 +0000)]
mingw: factor out the retry logic
In several places, Git's Windows-specific code follows the pattern where
it tries to perform an operation, and retries several times when that
operation fails, sleeping an increasing amount of time, before finally
giving up and asking the user whether to rety (after, say, closing an
editor that held a handle to a file, preventing the operation from
succeeding).
This logic is a bit hard to use, and inconsistent:
`mingw_unlink()` and `mingw_rmdir()` duplicate the code to retry,
and both of them do so incompletely. They also do not restore `errno` if the
user answers 'no'.
Introduce a `retry_ask_yes_no()` helper function that handles retry with
small delay, asking the user, and restoring `errno`.
Note that in `mingw_unlink()`, we include the `_wchmod()` call in the
retry loop (which may fail if the file is locked exclusively).
In `mingw_rmdir()`, we include special error handling in the retry loop.
Signed-off-by: Karsten Blees <karsten.blees@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
mingw: compute the correct size for symlinks in `mingw_lstat()`
POSIX specifies that upon successful return from `lstat()`: "the
value of the st_size member shall be set to the length of the pathname
contained in the symbolic link not including any terminating null byte".
Git typically doesn't trust the `stat.st_size` member of symlinks (e.g.
see `strbuf_readlink()`). Therefore, it is tempting to save on the extra
overhead of opening and reading the reparse point merely to calculate
the exact size of the link target.
This is, in fact, what Git for Windows did, from May 2015 to May 2020.
At least almost: some functions take shortcuts if `st_size` is 0 (e.g.
`diff_populate_filespec()`), hence Git for Windows hard-coded the length
of all symlinks to MAX_PATH.
This did cause problems, though, specifically in Git repositories
that were also accessed by Git for Cygwin or Git for WSL. For example,
doing `git reset --hard` using Git for Windows would update the size of
symlinks in the index to be MAX_PATH; at a later time Git for Cygwin
or Git for WSL would find that symlinks have changed size during `git
status` and update the index. And then Git for Windows would think that
the index needs to be updated. Even if the symlinks did not, in fact,
change. To avoid that, the correct size must be determined.
Signed-off-by: Bill Zissimopoulos <billziss@navimatics.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Karsten Blees [Fri, 9 Jan 2026 20:05:02 +0000 (20:05 +0000)]
mingw: teach dirent about symlinks
Move the `S_IFLNK` detection to `file_attr_to_st_mode()`.
Implement `DT_LNK` detection in dirent.c's `readdir()` function.
Signed-off-by: Karsten Blees <karsten.blees@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Karsten Blees [Fri, 9 Jan 2026 20:05:01 +0000 (20:05 +0000)]
mingw: let `mingw_lstat()` error early upon problems with reparse points
When obtaining lstat information for reparse points, we need to call
`FindFirstFile()` in addition to `GetFileInformationEx()` to obtain
the type of the reparse point (symlink, mount point etc.). However,
currently there is no error handling whatsoever if `FindFirstFile()`
fails.
Call `FindFirstFile()` before modifying the `stat *buf` output parameter
and error out if the call fails.
Note: The `FindFirstFile()` return value includes all the data
that we get from `GetFileAttributesEx()`, so we could replace
`GetFileAttributesEx()` with `FindFirstFile()`. We don't do that because
`GetFileAttributesEx()` is about twice as fast for single files. I.e.
we only pay the extra cost of calling `FindFirstFile()` in the rare case
that we encounter a reparse point.
Please also note that the indentation the remaining reparse point
code changed, and hence the best way to look at this diff is with
`--color-moved -w`. That code was _not_ moved because a subsequent
commit will move it to an altogether different function, anyway.
Signed-off-by: Karsten Blees <karsten.blees@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Karsten Blees [Fri, 9 Jan 2026 20:05:00 +0000 (20:05 +0000)]
mingw: drop the separate `do_lstat()` function
With the new `mingw_stat()` implementation, `do_lstat()` is only called
from `mingw_lstat()` (with the function parameter `follow == 0`). Remove
the extra function and the old `mingw_stat()`-specific (`follow == 1`)
logic.
Signed-off-by: Karsten Blees <karsten.blees@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Karsten Blees [Fri, 9 Jan 2026 20:04:59 +0000 (20:04 +0000)]
mingw: implement `stat()` with symlink support
With respect to symlinks, the current `mingw_stat()` implementation is
almost identical to `mingw_lstat()`: except for the file type (`st_mode
& S_IFMT`), it returns information about the link rather than the target.
Implement `mingw_stat()` by opening the file handle requesting minimal
permissions, and then calling `GetFileInformationByHandle()` on it. This
way, all links are resolved by the Windows file system layer.
Signed-off-by: Karsten Blees <karsten.blees@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Karsten Blees [Fri, 9 Jan 2026 20:04:58 +0000 (20:04 +0000)]
mingw: don't call `GetFileAttributes()` twice in `mingw_lstat()`
The Win32 API function `GetFileAttributes()` cannot handle paths with
trailing dir separators. The current `mingw_stat()`/`mingw_lstat()`
implementation calls `GetFileAttributes()` twice if the path has
trailing slashes (first with the original path that was passed as
function parameter, and and a second time with a path copy with trailing
'/' removed).
With the conversion to wide Unicode, we get the length of the path for
free, and also have a (wide char) buffer that can be modified. This
makes it easy to avoid that extraneous Win32 API call.
Signed-off-by: Karsten Blees <karsten.blees@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Sat, 10 Jan 2026 02:32:29 +0000 (18:32 -0800)]
Merge branch 'js/prep-symlink-windows' into js/symlink-windows
* js/prep-symlink-windows:
trim_last_path_component(): avoid hard-coding the directory separator
strbuf_readlink(): support link targets that exceed 2*PATH_MAX
strbuf_readlink(): avoid calling `readlink()` twice in corner-cases
init: do parse _all_ core.* settings early
mingw: do resolve symlinks in `getcwd()`
t7800: work around the MSYS path conversion on Windows
t6423: introduce Windows-specific handling for symlinking to /dev/null
t1305: skip symlink tests that do not apply to Windows
t1006: accommodate for symlink support in MSYS2
t0600: fix incomplete prerequisite for a test case
t0301: another fix for Windows compatibility
t0001: handle `diff --no-index` gracefully
mingw: special-case `open(symlink, O_CREAT | O_EXCL)`
apply: symbolic links lack a "trustable executable bit"
t9700: accommodate for Windows paths
Karsten Blees [Fri, 9 Jan 2026 20:05:09 +0000 (20:05 +0000)]
trim_last_path_component(): avoid hard-coding the directory separator
Currently, this function hard-codes the directory separator as the
forward slash.
However, on Windows the backslash character is valid, too. And we want
to call this function in the upcoming support for symlinks on Windows
with the symlink targets (which naturally use the canonical directory
separator on Windows, which is _not_ the forward slash).
Prepare that function to be useful also in that context.
Signed-off-by: Karsten Blees <karsten.blees@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
strbuf_readlink(): support link targets that exceed 2*PATH_MAX
The `strbuf_readlink()` function refuses to read link targets that
exceed 2*PATH_MAX (even if a sufficient size was specified by the
caller).
The reason that that limit is 2*PATH_MAX instead of PATH_MAX is that
the symlink targets do not need to be normalized. After running
`ln -s a/../a/../a/../a/../b c`, the target of the symlink `c` will not
be normalized to `b` but instead be much longer. As such, symlink
targets' lengths can far exceed PATH_MAX.
They are frequently much longer than 2*PATH_MAX on Windows, which
actually supports paths up to 32,767 characters, but sets PATH_MAX to
260 for backwards compatibility. For full details, see
https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation
Let's just hard-code the limit used by `strbuf_readlink()` to 32,767 and
make it independent of the current platform's PATH_MAX.
Based-on-a-patch-by: Karsten Blees <karsten.blees@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Karsten Blees [Fri, 9 Jan 2026 20:05:07 +0000 (20:05 +0000)]
strbuf_readlink(): avoid calling `readlink()` twice in corner-cases
The `strbuf_readlink()` function calls `readlink()`` twice if the hint
argument specifies the exact size of the link target (e.g. by passing
stat.st_size as returned by `lstat()`). This is necessary because
`readlink(..., hint) == hint` could mean that the buffer was too small.
Use `hint + 1` as buffer size to prevent this.
Signed-off-by: Karsten Blees <karsten.blees@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In Git for Windows, `has_symlinks` is set to 0 by default. Therefore, we
need to parse the config setting `core.symlinks` to know if it has been
set to `true`. In `git init`, we must do that before copying the
templates because they might contain symbolic links.
Even if the support for symbolic links on Windows has not made it to
upstream Git yet, we really should make sure that all the `core.*`
settings are parsed before proceeding, as they might very well change
the behavior of `git init` in a way the user intended.
This fixes https://github.com/git-for-windows/git/issues/3414
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
As pointed out in https://github.com/git-for-windows/git/issues/1676,
the `git rev-parse --is-inside-work-tree` command currently fails when
the current directory's path contains symbolic links.
The underlying reason for this bug is that `getcwd()` is supposed to
resolve symbolic links, but our `mingw_getcwd()` implementation did not.
We do have all the building blocks for that, though: the
`GetFinalPathByHandleW()` function will resolve symbolic links. However,
we only called that function if `GetLongPathNameW()` failed, for
historical reasons: the latter function was supported for a long time,
but the former API function was introduced only with Windows Vista, and
we used to support also Windows XP. With that support having been
dropped, we are free to call the symbolic link-resolving function right
away.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Fri, 9 Jan 2026 17:49:13 +0000 (17:49 +0000)]
fsck: snapshot default refs before object walk
Fsck has a race when operating on live repositories; consider the
following simple script that writes new commits as fsck runs:
#!/bin/bash
git fsck &
PID=$!
while ps -p $PID >/dev/null; do
sleep 3
git commit -q --allow-empty -m "Another commit"
done
Since fsck walks objects for connectivity and then reads the refs at the
end to check, this can cause fsck to get confused and think that the new
refs refer to missing commits and that new reflog entries are invalid.
Running the above script in a clone of git.git results in the following
(output ellipsized to remove additional errors of the same type):
We can minimize the race opportunities by taking a snapshot of refs at
program invocation, doing the connectivity check, and then checking the
snapshotted refs afterward. This avoids races with regular refs between
fsck and adding objects to the database, though it still leaves a race
between a gc and fsck. We are less concerned about folks simultaneously
running gc with fsck; though, if it becomes an issue, we could lock fsck
during gc. We definitely do not want to lock fsck during operations
that may add objects to the object store; that would be problematic for
forges.
Note that refs aren't the only problem, though; reflog entries and index
entries could be problematic as well. For now we punt on index entries
just leaving a TODO comment, and for reflogs we use a coarse solution of
taking the time at the beginning of the program and ignoring reflog
entries newer than that time. That may be imperfect if dealing with a
network filesystem, so we leave TODO comment for those that want to
improve that handling as well.
As a high level overview:
* In addition to fsck_handle_ref(), which now is only a few lines long
to process a ref, there's also a snapshot_ref() which is called
early in the program for each ref and takes all the error checking
logic.
* The iterating over refs that used to be in get_default_heads() plus
a loop over the arguments now appears in shapshot_refs().
* There's a new process_refs() as well that kind of looks like the old
get_default_heads() though it is streamlined due to the work done by
snapshot_refs().
This combination of changes modifies the output of running the script
(from the beginning of this commit message) to:
$ ./fsck-while-writing.sh
Checking ref database: 100% (1/1), done.
Checking object directories: 100% (256/256), done.
warning in tag d6602ec5194c87b0fc87103ca4d67251c76f233a: missingTaggerEntry: invalid format - expected 'tagger' line
Checking objects: 100% (835091/835091), done.
Checking connectivity: 833846, done.
Verifying commits in commit graph: 100% (242243/242243), done.
While worries about live updates while running fsck is likely of most
interest for forge operators, it may also benefit those with
automated jobs (such as git maintenance) or even casual users who want
to do other work in their clone while fsck is running.
Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The multi-pack index still is tracked as a member of the object database
source, but ultimately the MIDX is always tied to one specific packfile
store.
Move the structure into `struct packfile_store` accordingly. This
ensures that the packfile store now keeps track of all data related to
packfiles.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
packfile: refactor `find_pack_entry()` to work on the packfile store
The function `find_pack_entry()` doesn't work on a specific packfile
store, but instead works on the whole repository. This causes a bit of a
conceptual mismatch in its callers:
- `packfile_store_freshen_object()` supposedly acts on a store, and
its callers know to iterate through all sources already.
The `find_kept_pack_entry()` function is only used in
`has_object_kept_pack()`, which is only a trivial wrapper itself. Inline
the latter into the former.
Furthermore, reorder the code so that we can drop the declaration of the
function in "packfile.h". This allows us to make the function file-local.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
packfile: only prepare owning store in `packfile_store_prepare()`
When calling `packfile_store_prepare()` we prepare not only the provided
packfile store, but also all those of all other sources part of the same
object database. This was required when the store was still sitting on
the object database level. But now that it sits on the source level it's
not anymore.
Refactor the code so that we only prepare the single packfile store
passed by the caller. Adapt callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
packfile: only prepare owning store in `packfile_store_get_packs()`
When calling `packfile_store_get_packs()` we prepare not only the
provided packfile store, but also all those of all other sources part of
the same object database. This was required when the store was still
sitting on the object database level. But now that it sits on the source
level it's not anymore.
Adapt the code so that we only prepare the MIDX of the provided store.
All callers only work in the context of a single store or call the
function in a loop over all sources, so this change shouldn't have any
practical effects.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The packfile store is a member of `struct object_database`, which means
that we have a single store per database. This doesn't really make much
sense though: each source connected to the database has its own set of
packfiles, so there is a conceptual mismatch here. This hasn't really
caused much of a problem in the past, but with the advent of pluggable
object databases this is becoming more of a problem because some of the
sources may not even use packfiles in the first place.
Move the packfile store down by one level from the object database into
the object database source. This ensures that each source now has its
own packfile store, and we can eventually start to abstract it away
entirely so that the caller doesn't even know what kind of store it
uses.
Note that we only need to adjust a relatively small number of callers,
way less than one might expect. This is because most callers are using
`repo_for_each_pack()`, which handles enumeration of all packfiles that
exist in the repository. So for now, none of these callers need to be
adapted. The remaining callers that iterate through the packfiles
directly and that need adjustment are those that are a bit more tangled
with packfiles. These will be adjusted over time.
Note that this patch only moves the packfile store, and there is still a
bunch of functions that seemingly operate on a packfile store but that
end up iterating over all sources. These will be adjusted in subsequent
commits.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
packfile: refactor misleading code when unusing pack windows
The function `unuse_one_window()` is responsible for unmapping one of
the packfile windows, which is done when we have exceeded the allowed
number of window.
The function receives a `struct packed_git` as input, which serves as an
additional packfile that should be considered to be closed. If not
given, we seemingly skip that and instead go through all of the
repository's packfiles. The conditional that checks whether we have a
packfile though does not make much sense anymore, as we dereference the
packfile regardless of whether or not it is a `NULL` pointer to derive
the repository's packfile store.
The function was originally introduced via f0e17e86e1 (pack: move
release_pack_memory(), 2017-08-18), and here we indeed had a caller that
passed a `NULL` pointer. That caller was later removed via 9827d4c185
(packfile: drop release_pack_memory(), 2019-08-12), so starting with
that commit we always pass a `struct packed_git`. In 9c5ce06d74
(packfile: use `repository` from `packed_git` directly, 2024-12-03) we
then inadvertently started to rely on the fact that the pointer is never
`NULL` because we use it now to identify the repository.
Arguably, it didn't really make sense in the first place that the caller
provides a packfile, as the selected window would have been overridden
anyway by the subsequent loop over all packfiles if there was an older
window. So the overall logic is quite misleading overall. The only case
where it _could_ make a difference is when there were two packfiles with
the same `last_used` value, but that case doesn't ever happen because
the `pack_used_ctr` is strictly increasing.
Refactor the code so that we instead pass in the object database to
help make the code less misleading.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
packfile: refactor kept-pack cache to work with packfile stores
The kept pack cache is a cache of packfiles that are marked as kept
either via an accompanying ".kept" file or via an in-memory flag. The
cache can be retrieved via `kept_pack_cache()`, where one needs to pass
in a repository.
Ultimately though the kept-pack cache is a property of the packfile
store, and this causes problems in a subsequent commit where we want to
move down the packfile store to be a per-object-source entity.
Prepare for this and refactor the kept-pack cache to work on top of a
packfile store instead. While at it, rename both the function and flags
specific to the kept-pack cache so that they can be properly attributed
to the respective subsystems.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When preparing a packfile we pass various pieces attached to the pack's
object database source via the `struct prepare_pack_data`. Refactor this
code to instead pass in the source directly. This reduces the number of
variables we need to pass and allows for a subsequent refactoring where
we start to prepare the pack via the source.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In subsequent patches we're about to move the packfile store from the
object database layer into the object database source layer. Once done,
we'll have one packfile store per source, where the source is owning the
store.
Prepare for this future and refactor `packfile_store_new()` to be
initialized via an object database source instead of via the object
database itself.
This refactoring leads to a weird in-between state where the store is
owned by the object database but created via the source. But this makes
subsequent refactorings easier because we can now start to access the
owning source of a given store.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Fri, 9 Jan 2026 03:39:01 +0000 (03:39 +0000)]
builtin.h: update documentation
The documentation for the builtin API was moved from the technical
documentation and into a comment in builtin.h by ec14d4ecb5 (builtin.h: take
over documentation from api-builtin.txt, 2017-08-02). This documentation
wasn't updated as part of the major overhaul to include a repository struct
in 9b1cb5070f (builtin: add a repository parameter for builtin functions,
2024-09-13).
There was a brief update regarding the move from *.txt to *.adoc by e8015223c7 (builtin.h: *.txt -> *.adoc fixes, 2025-03-03).
I noticed that there was quite a bit missing from the old documentation,
which is still visible on git-scm.com [1].
This change updates the documentation in the following ways:
1. Updates the cmd_foo() prototype to include a repository.
2. Adds some newlines to have uniformity in the list of flags.
3. Adds a description of the NO_PARSEOPT flag.
4. Describes the tests that perform checks on all builtins, which may trip
up a contributor working on a new builtin.
I double-checked these instructions against a toy example in my local branch
to be sure that it was complete.
Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
K Jayatheerth [Fri, 9 Jan 2026 03:20:27 +0000 (08:50 +0530)]
t7101: modernize test path checks
Replace old-style `test -[df]` and `! test -[df]` assertions with
the modern `test_path_is_file`, `test_path_is_dir`, and
`test_path_is_missing` helpers.
These helpers provide more informative error messages in case of
failure (e.g., "File 'foo' is missing" instead of just exit code 1).
While at it, fix a typo and an incorrect path
reference in one of the test descriptions.
Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitfaq: document using stash import/export to sync working tree
Git 2.51 learned how to import and export stashes. This is a
secure and robust way to transfer working tree states across machines
and comes with almost none of the pitfalls of rsync or other tools.
Recommend this as an alternative in the FAQ.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Michael Lyons [Thu, 8 Jan 2026 15:30:21 +0000 (10:30 -0500)]
doc: git-blame: convert to new doc format
- Use _<placeholder>_ instead of <placeholder> in the description
- Use _underscores_ around math associated with <placeholders>
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
Signed-off-by: Michael Lyons <git@michael.lyo.nz> Acked-by: Jean-Noël AVILA <jn.avila@free.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Michael Lyons [Thu, 8 Jan 2026 15:30:20 +0000 (10:30 -0500)]
doc: blame-options: convert to new doc format
- Use _<placeholder>_ instead of <placeholder> in the description
- Modify some samples to use <placeholders>
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
Signed-off-by: Michael Lyons <git@michael.lyo.nz> Acked-by: Jean-Noël AVILA <jn.avila@free.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
add -p: show user's hunk decision when selecting hunks
When a user is interactively deciding which hunks to use or skip for
staging, unstaging, stashing etc, there is no way to know the
decision previously chosen for a hunk when navigating through the
previous and next hunks using K/J respectively.
Improve the UI to explicitly show if a user has previously decided to
use a hunk (by pressing 'y') or skip the hunk (by pressing 'n').
This will improve clarity when and aid the navigation process for the
user.
Reported-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Abraham Samuel Adekunle <abrahamadekunle50@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The default `--unstable` is a legacy format that predates `--stable`.
That’s why 2871f4d4 (builtin: patch-id: add --verbatim as a command mode,
2022-10-24) made `--verbatim` lock in[1] `--stable`:
Users of --unstable mainly care about compatibility with old git
versions, which unstripping the whitespace would break. Thus there
isn't a usecase for the combination of --verbatim and --unstable,
and we don't expose this so as to not add maintainence burden.
† 1: imply `--stable`, disallow `--unstable`
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
doc: patch-id: don’t use semicolon between bullet points
These bullet points are full-fledged paragraphs with sentences. It’s
best to restrict semicolon-termination to the case when the bullet list
amounts to a list of items.[1]
† 1: Like “List: ... • first; ... • second; and ... • third.”
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Thu, 8 Jan 2026 07:40:12 +0000 (16:40 +0900)]
Merge branch 'en/ort-recursive-d-f-conflict-fix'
The ort merge machinery hit an assertion failure in a history with
criss-cross merges renamed a directory and a non-directory, which
has been corrected.
* en/ort-recursive-d-f-conflict-fix:
merge-ort: fix corner case recursive submodule/directory conflict handling
Running "git diff" with "--name-only" and other options that allows
us not to look at the blob contents, while objects that are lazily
fetched from a promisor remote, caused use-after-free, which has
been corrected.
* ds/diff-lazy-fetch-with-name-only-fix:
diff: avoid segfault with freed entries
Junio C Hamano [Thu, 8 Jan 2026 07:40:11 +0000 (16:40 +0900)]
Merge branch 'rs/tag-wo-the-repository'
Code clean-up.
* rs/tag-wo-the-repository:
tag: stop using the_repository
tag: support arbitrary repositories in parse_tag()
tag: support arbitrary repositories in gpg_verify_tag()
tag: use algo of repo parameter in parse_tag_buffer()
Ashlesh Gawande [Wed, 7 Jan 2026 07:47:24 +0000 (13:17 +0530)]
t5550: add netrc tests for http 401/403
git allows using .netrc file to supply credentials for HTTP auth.
Three test cases are added in this patch to provide missing coverage
when cloning over HTTP using .netrc file:
- First test case checks that the git clone is successful when credentials
are provided via .netrc file
- Second test case checks that the git clone fails when the .netrc file
provides invalid credentials. The HTTP server is expected to return
401 Unauthorized in such a case. The test checks that the user is
provided with a prompt for username/password on 401 to provide
the valid ones.
- Third test case checks that the git clone fails when the .netrc file
provides credentials that are valid but do not have permission for
this user. For example one may have multiple tokens in GitHub
and uses the one which was not authorized for cloning this repo.
In such a case the HTTP server returns 403 Forbidden.
For this test, the apache.conf is modified to return a 403
on finding a forbidden-user. No prompt for username/password is
expected after the 403 (unlike 401). This is because prompting may wipe
out existing credentials or conflict with custom credential helpers.
Signed-off-by: Ashlesh Gawande <git@ashlesh.me> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Thu, 8 Jan 2026 02:01:22 +0000 (11:01 +0900)]
Merge branch 'kh/replay-invalid-onto-advance' into ps/history
* kh/replay-invalid-onto-advance:
t3650: add more regression tests for failure conditions
replay: die if we cannot parse object
replay: improve code comment and die message
replay: die descriptively when invalid commit-ish is given
replay: find *onto only after testing for ref name
replay: remove dead code and rearrange
This test indirectly checks that the lost-found folder has 2 files in it
and then checks that the expected two files exist. Make this more
deliberate by removing the old test -f and compare the actual ls of the
lost-found directory with the expected files.
Signed-off-by: Andrew Chitester <andchi@fastmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is possible to hit a memory leak when reading data from a submodule
via git-grep(1):
Direct leak of 192 byte(s) in 1 object(s) allocated from:
#0 0x55555562e726 in calloc (git+0xda726)
#1 0x555555964734 in xcalloc ../wrapper.c:154:8
#2 0x555555835136 in load_multi_pack_index_one ../midx.c:135:2
#3 0x555555834fd6 in load_multi_pack_index ../midx.c:382:6
#4 0x5555558365b6 in prepare_multi_pack_index_one ../midx.c:716:17
#5 0x55555586c605 in packfile_store_prepare ../packfile.c:1103:3
#6 0x55555586c90c in packfile_store_reprepare ../packfile.c:1118:2
#7 0x5555558546b3 in odb_reprepare ../odb.c:1106:2
#8 0x5555558539e4 in do_oid_object_info_extended ../odb.c:715:4
#9 0x5555558533d1 in odb_read_object_info_extended ../odb.c:862:8
#10 0x5555558540bd in odb_read_object ../odb.c:920:6
#11 0x55555580a330 in grep_source_load_oid ../grep.c:1934:12
#12 0x55555580a13a in grep_source_load ../grep.c:1986:10
#13 0x555555809103 in grep_source_is_binary ../grep.c:2014:7
#14 0x555555807574 in grep_source_1 ../grep.c:1625:8
#15 0x555555807322 in grep_source ../grep.c:1837:10
#16 0x5555556a5c58 in run ../builtin/grep.c:208:10
#17 0x55555562bb42 in void* ThreadStartFunc<false>(void*) lsan_interceptors.cpp.o
#18 0x7ffff7a9a979 in start_thread (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x9a979) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
#19 0x7ffff7b22d2b in __GI___clone3 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x122d2b) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
The root caues of this leak is the way we set up and release the
submodule:
1. We use `repo_submodule_init()` to initialize a new repository. This
repository is stored in `repos_to_free`.
2. We now read data from the submodule repository.
3. We then call `repo_clear()` on the submodule repositories.
4. `repo_clear()` calls `odb_free()`.
5. `odb_free()` calls `odb_free_sources()` followed by `odb_close()`.
The issue here is the 5th step: we call `odb_free_sources()` _before_ we
call `odb_close()`. But `odb_free_sources()` already frees all sources,
so the logic that closes them in `odb_close()` now becomes a no-op. As a
consequence, we never explicitly close sources at all.
Fix the leak by closing the store before we free the sources.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/gc: fix condition for whether to write commit graphs
When performing auto-maintenance we check whether commit graphs need to
be generated by counting the number of commits that are reachable by any
reference, but not covered by a commit graph. This search is performed
by iterating through all references and then doing a depth-first search
until we have found enough commits that are not present in the commit
graph.
This logic has a memory leak though:
Direct leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x55555562e433 in malloc (git+0xda433)
#1 0x555555964322 in do_xmalloc ../wrapper.c:55:8
#2 0x5555559642e6 in xmalloc ../wrapper.c:76:9
#3 0x55555579bf29 in commit_list_append ../commit.c:1872:35
#4 0x55555569f160 in dfs_on_ref ../builtin/gc.c:1165:4
#5 0x5555558c33fd in do_for_each_ref_iterator ../refs/iterator.c:431:12
#6 0x5555558af520 in do_for_each_ref ../refs.c:1828:9
#7 0x5555558ac317 in refs_for_each_ref ../refs.c:1833:9
#8 0x55555569e207 in should_write_commit_graph ../builtin/gc.c:1188:11
#9 0x55555569c915 in maintenance_is_needed ../builtin/gc.c:3492:8
#10 0x55555569b76a in cmd_maintenance ../builtin/gc.c:3542:9
#11 0x55555575166a in run_builtin ../git.c:506:11
#12 0x5555557502f0 in handle_builtin ../git.c:779:9
#13 0x555555751127 in run_argv ../git.c:862:4
#14 0x55555575007b in cmd_main ../git.c:984:19
#15 0x5555557523aa in main ../common-main.c:9:11
#16 0x7ffff7a2a4d7 in __libc_start_call_main (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a4d7) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
#17 0x7ffff7a2a59a in __libc_start_main@GLIBC_2.2.5 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a59a) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
#18 0x5555555f0934 in _start (git+0x9c934)
The root cause of this memory leak is our use of `commit_list_append()`.
This function expects as parameters the item to append and the _tail_ of
the list to append. This tail will then be overwritten with the new tail
of the list so that it can be used in subsequent calls. But we call it
with `commit_list_append(parent->item, &stack)`, so we end up losing
everything but the new item.
This issue only surfaces when counting merge commits. Next to being a
memory leak, it also shows that we're in fact miscounting as we only
respect children of the last parent. All previous parents are discarded,
so their children will be disregarded unless they are hit via another
reference.
While crafting a test case for the issue I was puzzled that I couldn't
establish the proper border at which the auto-condition would be
fulfilled. As it turns out, there's another bug: if an object is at the
tip of any reference we don't mark it as seen. Consequently, if it is
the tip of or reachable via another ref, we'd count that object multiple
times.
Fix both of these bugs so that we properly count objects without leaking
any memory.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 6 Jan 2026 10:25:58 +0000 (05:25 -0500)]
cat-file: only use bitmaps when filtering
Commit 8002e8ee18 (builtin/cat-file: use bitmaps to efficiently filter
by object type, 2025-04-02) introduced a performance regression when we
are not filtering objects: it uses bitmaps even when they won't help,
incurring extra costs. For example, running the new perf tests from this
commit, which check the performance of listing objects by oid:
$ export GIT_PERF_LARGE_REPO=/path/to/linux.git
$ git -C "$GIT_PERF_LARGE_REPO" repack -adb
$ GIT_SKIP_TESTS=p1006.1 ./run 8002e8ee18^ 8002e8ee18 p1006-cat-file.sh
[...]
Test 8002e8ee18^ 8002e8ee18
-------------------------------------------------------------------------------
1006.2: list all objects (sorted) 1.48(1.44+0.04) 6.39(6.35+0.04) +331.8%
1006.3: list all objects (unsorted) 3.01(2.97+0.04) 3.40(3.29+0.10) +13.0%
1006.4: list blobs 4.85(4.67+0.17) 1.68(1.58+0.10) -65.4%
An invocation that filters, like listing all blobs (1006.4), does
benefit from using the bitmaps; it now doesn't have to check the type of
each object from the pack data, so the tradeoff is worth it.
But for listing all objects in sorted idx order (1006.2), we otherwise
would never open the bitmap nor the revindex file. Worse, our sorting
step gets much worse. Normally we append into an array in pack .idx
order, and the sort step is trivial. But with bitmaps, we get the
objects in pack order, which is apparently random with respect to oid,
and have to sort the whole thing. (Note that this freshly-packed state
represents the best case for .idx sorting; if we had two packs, then
we'd have their objects one after the other and qsort would have to
interleave them).
The unsorted test in 1006.3 is interesting: there we are going in pack
order, so we load the revindex for the pack anyway. And though we don't
sort the result, we do use an oidset to check for duplicates. So we can
see in the 8002e8ee18^ timings that those two things cost ~1.5s over the
sorted case (mostly the oidset hash cost). We also incur the extra cost
to open the bitmap file as of 8002e8ee18, which seems to be ~400ms.
(This would probably be faster with a bitmap lookup table, but writing
that out is not yet the default).
So we know that bitmaps help when there's filtering to be done, but
otherwise make things worse. Let's only use them when there's a filter.
The perf script shows that we've fixed the regressions without hurting
the bitmap case:
Test 8002e8ee18^ 8002e8ee18 HEAD
--------------------------------------------------------------------------------------------------------
1006.2: list all objects (sorted) 1.56(1.53+0.03) 6.44(6.37+0.06) +312.8% 1.62(1.54+0.06) +3.8%
1006.3: list all objects (unsorted) 3.04(2.98+0.06) 3.45(3.38+0.07) +13.5% 3.04(2.99+0.04) +0.0%
1006.4: list blobs 5.14(4.98+0.15) 1.76(1.68+0.06) -65.8% 1.73(1.64+0.09) -66.3%
Note that there's another related case: we might have a filter that
cannot be used with bitmaps. That check is handled already for us in
for_each_bitmapped_object(), though we'd still load the bitmap and
revindex files pointlessly in that case. I don't think it can happen in
practice for cat-file, though, since it allows only blob:none,
blob:limit, and object:type filters, all of which work with bitmaps.
It would be easy-ish to insert an extra check like:
can_filter_bitmap(&opt->objects_filter);
into the conditional, but I didn't bother here. It would be redundant
with the call in for_each_bitmapped_object(), and the can_filter helper
function is static local in the bitmap code (so we'd have to make it
public).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
it will use the repo in /some/path. But if you use the "run" helper
script to aggregate and compare results, like this:
GIT_PERF_LARGE_REPO=/some/path ./run HEAD^ HEAD p1006-cat-file.sh
it will ignore that variable. This is because the presence of the
LARGE_REPO variable in GIT-BUILD-OPTIONS overrides what's in the
environment. This started with 4638e8806e (Makefile: use common template
for GIT-BUILD-OPTIONS, 2024-12-06), which now writes even empty
variables (though arguably it was wrong even before with a non-empty
value, as we generally prefer the environment to take precedence over
on-disk config).
We had the same problem in perf-lib.sh itself, and we hacked around it
with 32b74b9809 (perf: do allow `GIT_PERF_*` to be overridden again,
2025-04-04). That's what lets the direct invocation of "./p1006" work
above.
And in fact that was sufficient for "./run", too, until it started
loading GIT-BUILD-OPTIONS itself in 5756ccd181 (t/perf: fix benchmarks
with out-of-tree builds, 2025-04-28). Now it has the same problem: it
clobbers any incoming GIT_PERF options from the environment.
We can use the same hack here in the "run" script. It's quite ugly, but
it's just short enough that I don't think it's worth trying to factor it
out into a common shell library.
In the long run, we might consider teaching GIT-BUILD-OPTIONS to be more
gentle in overwriting existing entries. There are probably other
GIT_TEST_* variables which would need the same treatment. And if and
when we come up with a more complete solution, we can use it in both
spots.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 6 Jan 2026 10:13:49 +0000 (05:13 -0500)]
t/perf/perf-lib: fix assignment of TEST_OUTPUT_DIRECTORY
Using the perf suite's "run" helper in a vanilla build fails like this:
$ make && (cd t/perf && ./run p0000-perf-lib-sanity.sh)
=== Running 1 tests in this tree ===
perf 1 - test_perf_default_repo works: 1 2 3 ok
perf 2 - test_checkout_worktree works: 1 2 3 ok
ok 3 - test_export works
perf 4 - export a weird var: 1 2 3 ok
perf 5 - éḿíẗ ńöń-ÁŚĆÍÍ ćḧáŕáćẗéŕś: 1 2 3 ok
ok 6 - test_export works with weird vars
perf 7 - important variables available in subshells: 1 2 3 ok
perf 8 - test-lib-functions correctly loaded in subshells: 1 2 3 ok
# passed all 8 test(s)
1..8
cannot open test-results/p0000-perf-lib-sanity.subtests: No such file or directory at ./aggregate.perl line 159.
It is trying to aggregate results written into t/perf/test-results, but
the p0000 script did not write anything there.
The "run" script looks in $TEST_OUTPUT_DIRECTORY/test-results, or if
that variable is not set, in test-results in the current working
directory (which should be t/perf itself). It pulls the value of
$TEST_OUTPUT_DIRECTORY from the GIT-BUILD-OPTIONS file.
But that doesn't quite match the setup in perf-lib.sh (which is what
scripts like p0000 use). There we do this at the top of the script:
TEST_OUTPUT_DIRECTORY=$(pwd)
and then let test-lib.sh append "/test-results" to that. Historically,
that made the vanilla case work: we'd always use t/perf/test-results.
But when $TEST_OUTPUT_DIRECTORY was set, it would break.
Commit 5756ccd181 (t/perf: fix benchmarks with out-of-tree builds,
2025-04-28) fixed that second case by loading GIT-BUILD-OPTIONS
ourselves. But that broke the vanilla case!
Now our setting of $TEST_OUTPUT_DIRECTORY in perf-lib.sh is ignored,
because it is overwritten by GIT-BUILD-OPTIONS. And when test-lib.sh
sees that the output directory is empty, it defaults to t/test-results,
rather than t/perf/test-results.
Nobody seems to have noticed, probably for two reasons:
1. It only matters if you're trying to aggregate results (like the
"run" script does). Just running "./p0000-perf-lib-sanity.sh"
manually still produces useful output; the stored result files are
just in an unexpected place.
2. There might be leftover files in t/perf/test-results from previous
runs (before 5756ccd181). In particular, the ".subtests" files
don't tend to change, and the lack of that file is what causes it
to barf completely. So it's possible that the aggregation could
have been showing stale results that did not match the run that
just happened.
We can fix it by setting TEST_OUTPUT_DIRECTORY only after we've loaded
GIT-BUILD-OPTIONS, so that we override its value and not the other way
around. And we'll do so only when the variable is not set, which should
retain the fix for that case from 5756ccd181.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 6 Jan 2026 07:33:53 +0000 (16:33 +0900)]
Merge branch 'ar/run-command-hook'
Use hook API to replace ad-hoc invocation of hook scripts with the
run_command() API.
* ar/run-command-hook:
receive-pack: convert receive hooks to hook API
receive-pack: convert update hooks to new API
hooks: allow callers to capture output
run-command: allow capturing of collated output
hook: allow overriding the ungroup option
reference-transaction: use hook API instead of run-command
transport: convert pre-push to hook API
hook: convert 'post-rewrite' hook in sequencer.c to hook API
hook: provide stdin via callback
run-command: add stdin callback for parallelization
run-command: add first helper for pp child states
Julia Evans [Mon, 5 Jan 2026 21:48:18 +0000 (16:48 -0500)]
doc: git-reset: clarify `git reset <pathspec>`
From user feedback:
- Continued confusion about the terms "tree-ish" and "pathspec"
- The word "hunks" is confusing folks, use "changes" instead.
- On the part about `git restore`, there were a few comments to the
effect of "wait, this doesn't actually update any files? What? Why?"
Be more direct that `git reset` does not update files: there's no
obvious reason to suggest that folks use `git reset` followed by `git
restore`, instead suggest just using `git restore`.
Continue avoiding the use of the word "reset" to
describe what "git reset" does.
Signed-off-by: Julia Evans <julia@jvns.ca> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Julia Evans [Mon, 5 Jan 2026 21:48:17 +0000 (16:48 -0500)]
doc: git-reset: clarify `git reset [mode]`
From user feedback, there was some confusion about the differences
between the modes, including:
1. Sometimes it says "index" and sometimes "index file".
Fix by replacing "index file" with "index".
2. Many comments about not being able to understand what `--merge` does.
Fix by mentioning obscure situations, since that seems to be what
it's for. Most folks will use `git <cmd> --abort`.
3. Issues telling the difference between --soft and --mixed, as well as
--keep. Leave --keep alone because I couldn't understand its use case,
but change `--soft` / `--mixed` / `--hard` as follows:
--mixed is the default, so put it first.
Describe --soft/--mixed/--hard with the following structure:
* Start by saying what happens to the files in the working directory,
because the thing users want to avoid most is irretrievably losing
changes to their working directory files.
* Then describe what happens to the staging area. Right now it seems to
frame leaving the index alone as being a sort of neutral action.
I think this is part of what's confusing users, because in Git when
you update HEAD, Git almost always updates the index to match HEAD.
So leaving the index unchanged while updating HEAD is actually quite
unusual, and it deserves to be flagged.
* Finally, give an example for --soft to explain a common use case.
Signed-off-by: Julia Evans <julia@jvns.ca> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Julia Evans [Mon, 5 Jan 2026 21:48:16 +0000 (16:48 -0500)]
doc: git-reset: clarify intro
From user feedback, there were several points of confusion:
- What "tree-ish", "entries", "working tree", "HEAD", and "index" mean
("I have no clue what the index is", "I've been using git for 20 years
and still don't know what a tree-ish is"). Avoid using these terms
where it makes sense.
- What "optionally modifying index and working tree to match" means
("to match what?" "optionally based on what?")
Remove this from the intro, we can say it later when giving more
details.
- One user suggested that "The <tree-ish>/<commit> defaults to HEAD
in all forms." should be repeated later on, since it's easy to miss.
Instead say that HEAD is the default in each case later.
Another issue is that `git reset` consistently describes the action
it does as "Reset ...", commands should not use their name to describe
themselves, and that the word "mode" is used to mean several different
things on this page.
Address these by being more clear about two use cases for `git reset`
("to undo operations" and "to update staged files"), and explaining what
the conditions are for each case instead of forcing the user to figure
out the pattern is in first form vs the other 3 forms.
Signed-off-by: Julia Evans <julia@jvns.ca> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Julia Evans [Mon, 5 Jan 2026 21:48:15 +0000 (16:48 -0500)]
doc: git-reset: reorder the forms
From user feedback: three users commented that the `git reset [mode]`
form is the one that they primarily use, and that they were suprised to
see it listed last.
("I've never used git reset in any mode other than --hard").
Move it to be first, since the `git reset [mode]` form is what
"Reset current HEAD to the specified state" at the beginning refers
to, and because the `git reset [mode]` form is the only thing that
`git reset` uniquely does, the others could also be done with
`git restore`.
Signed-off-by: Julia Evans <julia@jvns.ca> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
`parse_object` can return `NULL`. That will in turn make
`repo_peel_to_type` return the same.
Let’s die fast and descriptively with the `*_or_die` variant.
Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
replay: die descriptively when invalid commit-ish is given
Giving an invalid commit-ish to `--onto` makes git-replay(1) fail with:
fatal: Replaying down to root commit is not supported yet!
Going backwards from this point:
1. `onto` is `NULL` from `set_up_replay_mode`;
2. that function in turn calls `peel_committish`; and
3. here we return `NULL` if `repo_get_oid` fails.
Let’s die immediately with a descriptive error message instead.
Doing this also provides us with a descriptive error if we “forget” to
provide an argument to `--onto` (but we really do unintentionally):[1]
$ git replay --onto ^main topic1
fatal: '^main' is not a valid commit-ish
Note that the `--advance` case won’t be triggered in practice because
of the “argument to --advance must be a reference” check (see the
previous test, and commit).
† 1: The argument to `--onto` is mandatory and the option parser accepts
both `--onto=<name>` (stuck form) and `--onto name`. The latter
form makes it easy to unintentionally pass something to the option
when you really meant to pass a positional argument.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
replay: find *onto only after testing for ref name
We are about to make `peel_committish` die when it cannot find
a commit-ish instead of returning `NULL`. But that would make e.g.
`git replay --advance=refs/non-existent` die with a less descriptive
error message; the highest-level error message is that the name does
not exist as a ref, not that we cannot find a commit-ish based on
the name.
Let’s try to find the ref and only after that try to peel to
as a commit-ish.
Also add a regression test to protect this error order from future
modifications.
Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
22d99f01 (replay: add --advance or 'cherry-pick' mode, 2023-11-24) both
added `--advance` and made one of `--onto` or `--advance` mandatory.
But `determine_replay_mode` claims that there is a third alternative;
neither of `--onto` or `--advance` were given:
if (onto_name) {
...
} else if (*advance_name) {
...
} else {
...
}
But this is false—the fallthrough else-block is dead code.
Commit 22d99f01 was iterated upon by several people.[1] The initial
author wrote code for a sort of *guess mode*, allowing for shorter
commands when that was possible. But the next person instead made one
of the aforementioned options mandatory. In turn this code was dead on
arrival in git.git.
Let’s remove this code. We can also join the if-block with the
condition `!*advance_name` into the `*onto` block since we do not set
`*advance_name` in this function. It only looked like we might set it
since the dead code has this line:
*advance_name = xstrdup_or_null(last_key);
Let’s also rename the function since we do not determine the
replay mode here. We just set up `*onto` and refs to update.
Note that there might be more dead code caused by this *guess mode*.
We only concern ourselves with this function for now.
Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Pushkar Singh [Sun, 4 Jan 2026 19:47:59 +0000 (19:47 +0000)]
t1300: use test helpers instead of `test` command
Replace `test -f` and `test -h` checks with `test_path_is_file` and
`test_path_is_symlink`. Using the test framework helpers provides
clearer diagnostics and keeps tests consistent across the suite.
Signed-off-by: Pushkar Singh <pushkarkumarsingh1970@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When difftool's sync-back mechanism checks for changes, it compares
stat data between the temporary index and the modified files. If the
modification happens within the same timestamp granularity window and
file size stays the same, the change goes undetected.
On Windows, this is more likely to manifest because Git relies on
inode changes as a fallback when other stat fields match, but Windows
filesystems lack inodes. This is a real bug that could affect users
scripting difftool similarly, as seen at:
Fix the test by changing the replacement content to "modified content"
(17 bytes), ensuring the size difference is detected regardless of
timestamp resolution or platform-specific stat behavior.
Note: This fixes the test flakiness but not the underlying issue in
difftool's change detection. Other tests with same-size file patterns
(t0010-racy-git.sh, t2200-add-update.sh) are not affected because they
use normal index operations with proper racy-git detection.
Signed-off-by: Paul Tarjan <github@paulisageek.com> Reviewed-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Paul Tarjan [Thu, 1 Jan 2026 00:19:23 +0000 (00:19 +0000)]
t7527: fix flaky fsmonitor event tests with retry logic
The fsmonitor event tests (edit, create, delete, rename, etc.) were
flaky because there can be a race between the daemon writing events
to the trace file and the test's grep commands checking for them.
Add a retry_grep() helper function (similar to retry_until_success
in lib-git-p4.sh) that retries grep with a timeout, and use it in
all event-checking tests to wait for one expected event before
checking the rest.
Signed-off-by: Paul Tarjan <github@paulisageek.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 30 Dec 2025 03:58:19 +0000 (12:58 +0900)]
Merge branch 'ps/repack-avoid-noop-midx-rewrite'
Even when there is no changes in the packfile and no need to
recompute bitmaps, "git repack" recomputed and updated the MIDX
file, which has been corrected.
* ps/repack-avoid-noop-midx-rewrite:
midx-write: skip rewriting MIDX with `--stdin-packs` unless needed
midx-write: extract function to test whether MIDX needs updating
midx: fix `BUG()` when getting preferred pack without a reverse index
Junio C Hamano [Tue, 30 Dec 2025 03:58:19 +0000 (12:58 +0900)]
Merge branch 'js/test-symlink-windows'
Prepare test suite for Git for Windows that supports symbolic
links.
* js/test-symlink-windows:
t7800: work around the MSYS path conversion on Windows
t6423: introduce Windows-specific handling for symlinking to /dev/null
t1305: skip symlink tests that do not apply to Windows
t1006: accommodate for symlink support in MSYS2
t0600: fix incomplete prerequisite for a test case
t0301: another fix for Windows compatibility
t0001: handle `diff --no-index` gracefully
mingw: special-case `open(symlink, O_CREAT | O_EXCL)`
apply: symbolic links lack a "trustable executable bit"
t9700: accommodate for Windows paths
Junio C Hamano [Tue, 30 Dec 2025 03:58:19 +0000 (12:58 +0900)]
Merge branch 'jt/repo-struct-more-objinfo'
More object database related information are shown in "git repo
structure" output.
* jt/repo-struct-more-objinfo:
builtin/repo: add object disk size info to structure table
builtin/repo: add disk size info to keyvalue stucture output
builtin/repo: add inflated object info to structure table
builtin/repo: add inflated object info to keyvalue structure output
builtin/repo: humanise count values in structure output
strbuf: split out logic to humanise byte values
builtin/repo: group per-type object values into struct
Derrick Stolee [Mon, 29 Dec 2025 21:44:57 +0000 (21:44 +0000)]
diff: avoid segfault with freed entries
When computing a diff in a partial clone, there is a chance that we
could trigger a prefetch of missing objects at the same time as we are
freeing entries from the global diff queue. This is difficult to
reproduce, as we need to have some objects be freed from the queue
before triggering the prefetch of missing objects. There is a new test
in t4067 that does trigger the segmentation fault that results in this
case.
The fix is to set the queue pointer to NULL after it is freed, and then
to be careful about NULL values in the prefetch.
The more elaborate explanation is that within diffcore_std(), we may
skip the initial prefetch due to the output format (--name-only in the
test) and go straight to diffcore_skip_stat_unmatch(). In that method,
the index entries that have been invalidated by path changes show up as
entries but may be deleted because they are not actually content diffs
and only newer timestamps than expected. As those entries are deleted,
later entries are checked with diff_filespec_check_stat_unmatch(), which
uses diff_queued_diff_prefetch() as the missing_object_cb in its diff
options. That can trigger downloading missing objects if the appropriate
scenario occurs to trigger a call to diff_popoulate_filespec(). It's
finally within that callback to diff_queued_diff_prefetch() that the
segfault occurs.
The test was hard to find because it required some real differences,
some not-different files that had a newer modified time, and the order
of those files alphabetically was important to trigger the deletion
before the prefetch was triggered.
I briefly considered a "lock" member for the diff queue, but it was a
much larger diff and introduced many more possible error scenarios.
Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Deveshi Dwivedi [Mon, 29 Dec 2025 18:57:37 +0000 (18:57 +0000)]
t5403: use test_path_is_file instead of test -f
Replace 'test -f' with the test_path_is_file in
t5403-post-checkout-hook.sh. This helper provides better error
messages when tests fail, making it easier to debug issues.
Signed-off-by: Deveshi Dwivedi <deveshigurgaon@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
While these may look similar to both a562d90a350d (merge-ort: fix failing merges in special corner case,
2025-11-03)
and f6ecb603ff8a (merge-ort: fix directory rename on top of source of other
rename/delete, 2025-08-06)
the cause is different and in this case the problem is not an
over-conservative assertion, but a bug before the assertion where we did
not update all relevant state appropriately.
It sadly took me a really long time to figure out how to get a simple
reproducer for this one. It doesn't really have that many moving parts,
but there are multiple pieces of background information needed to
understand it.
First of all, when we have two files added at the same path, merge-ort
does a two-way merge of those files. If we have two directories added
at the same path, we basically do the same thing (taking the union of
files, and two-way merging files with the same name). But two-way
merging requires components of the same type. We can't merge the
contents of a regular file with a directory, or with a symlink, or with
a submodule. Nor can any of those other types be merged with each
other, e.g. merging a submodule with a directory is a bad idea. When
two paths have the same name but their types do not match, merge-ort is
forced to move one of them to an alternate filename (using the
unique_path() function).
Second, if two commits being merged have more than one merge-base,
merge-ort will merge the merge-bases to create a virtual merge-base, and
use that as the base commit.
Third, one of the really important optimizations in merge-ort is trivial
tree-level resolution (roughly meaning merging trees without recursing
into them). This optimization has some nuance to it that is important
to the current bug, and to understand it, it helps to first look at the
high-level overview of how merge-ort runs; there are basically three
high-level functions that the work is divided between:
collect_merge_info() - walks the top-level trees getting individual
paths of interest
detect_renames() - detect renames between paths in order to match up
paths for three-way merging
process_entries() - does a few things of interest:
* three-way merging of files,
* other special handling (e.g. adjusting paths with conflicting
types to avoid path collisions)
* as it finishes handling all the files within a subdirectory,
writes out a new tree object for that directory
If it were not for renames, we could just always do tree-level merging
whenever the tree on at least one side was unmodified. Unfortunately,
we need to recurse into trees to determine whether there are renames.
However, we can also do tree-level merging so long as there aren't any
*relevant* renames (another merge-ort optimization), which we can
determine without recursing into trees.
We would also be able to do tree-level merging if we somehow apriori
knew what renames existed, by only recursing into the trees which we
could otherwise trivially merge if they contained files involved in
renames. That might not seem useful, because we need to find out the
renames and we have to recurse into trees to do so, but when you find
out that the process_entries() step is more computationally expensive
than the collect_merge_info() step, it yields an interesting strategy:
* run collect_merge_info()
* run detect_renames()
* cache the renames()
* restart -- rerun collect_merge_info(), using the cached renames to
only recurse into the needed trees
* we already have the renames cached so no need to re-detect
* run process_entries() on the reduced list of paths
which was implemented back in 7bee6c100431 (merge-ort: avoid recursing
into directories when we don't need to, 2021-07-16) Crucially, this
restarting only occurs if the number of paths we could skip recursing
into exceeds the number we still need to recurse into by some safety
factor (wanted_factor in handle_deferred_entries()); forgetting this
fact is a great way to repeatedly fail to create a minimal testcase for
several days and go down alternate wrong paths).
Now, I earlier summarized this optimization as "merging trees without
recursing into them", but this optimization does not require that all
three sides of history has a directory at a given path. So long as the
tree on one side matches the tree in the base version, we can decide to
resolve in favor of whatever the other side of history has at that path
-- be it a directory, a file, a submodule, or a symlink. Unfortunately,
the code in question didn't fully realize this, and was written assuming
the base version and both sides would have a directory at the given
path, as can be seen by the "ci->filemask == 0" comment in
resolve_trivial_directory_merge() that was added as part of 7bee6c100431
(merge-ort: avoid recursing into directories when we don't need to,
2021-07-16). A few additional lines of code are needed to handle cases
where we have something other than a directory on the other side of
history.
But, knowing that resolve_trivial_directory_merge() doesn't have
sufficient state updating logic doesn't show us how to trigger a bug
without combining with the other bits of information we provided above.
Here's a relevant testcase:
* branches A & B
* commit A1: adds "folder" as a directory with files tracked under it
* commit B1: adds "folder" as a submodule
* commit A2: merges B1 into A1, keeping "folder" as a directory
(and in fact, with no changes to "folder" since A1), discarding the
submodule
* commit B2: merges A1 into B1, keeping "folder" as a submodule
(and in fact, with no changes to "folder" since B1), discarding the
directory
Here, if we try to merge A2 & B2, the logic proceeds as follows:
* we have multiple merge-bases: A1 & B1. So we have to merge those
to get a virtual merge base.
* due to "folder" as a directory and "folder" as a submodule, the
path collision logic triggers and renames "folder" as a submodule
to "folder~Temporary merge branch 2" so we can keep it alongside
"folder" as a directory.
* we now have a virtual merge base (containing both "folder"
directory and a "folder~Temporary merge branch 2" submodule) and
can now do the outer merge
* in the first step of the outer merge, we attempt to defer recursing
into folder/ as a directory, but find we need to for rename
detection.
* in rename detection, we note that "folder~Temporary merge branch 2"
has the same hash as "folder" as a submodule in B2, which means we
have an exact rename.
* after rename detection, we discover no path in folder/ is needed
for renames, and so we can cache renames and restart.
* after restarting, we avoid recursing into "folder/" and realize we
can resolve it trivially since it hasn't been modified. The
resolution removes "folder/", leaving us only "folder" as a
submodule from commit B2.
* After this point, we should have a rename/delete conflict on
"folder~Temporary merge branch 2" -> "folder", but our marking of
the merge of "folder" as clean broke our ability to handle that and
in fact triggers an assertion in process_renames().
When there was a df_conflict (directory/"file" conflict, where "file"
could be submodule or regular file or symlink), ensure
resolve_trivial_directory_merge() handles it properly. In particular:
* do not pre-emptively mark the path as cleanly merged if the
remaining path is a file; allow it to be processed in
process_entries() later to determine if it was clean
* clear the parts of dirmask or filemask corresponding to the matching
sides of history, since we are resolving those away
* clear the df_conflict bit afterwards; since we cleared away the two
matching sides and only have one side left, that one side can't
have a directory/file conflict with itself.
Also add the above minimal testcase showcasing this bug to t6422, **with
a sufficient number of paths under the folder/ directory to actually
trigger it**. (I wish I could have all those days back from all the
wrong paths I went down due to not having enough files under that
directory...)
I know this commit has a very high ratio of lines in the commit message
to lines of comments, and a relatively high ratio of comments to actual
code, but given how long it took me to track down, on the off chance
that we ever need to further modify this logic, I wanted it thoroughly
documented for future me and for whatever other poor soul might end up
needing to read this commit message.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Sun, 28 Dec 2025 18:10:51 +0000 (19:10 +0100)]
tag: stop using the_repository
gpg_verify_tag() shows the passed in object name on error. Both callers
provide one. It falls back to abbreviated hashes for future callers
that pass in a NULL name. DEFAULT_ABBREV is default_abbrev, which in
turn is a global variable that's populated by git_default_config() and
only available with USE_THE_REPOSITORY_VARIABLE.
Don't let that hypothetical hold us back from getting rid of
the_repository in tag.c. Fall back to full hashes, which are more
appropriate for error messages anyway. This allows us to stop setting
USE_THE_REPOSITORY_VARIABLE.
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Sun, 28 Dec 2025 18:10:50 +0000 (19:10 +0100)]
tag: support arbitrary repositories in parse_tag()
Allow callers of parse_tag() pass in the repository to use. Let most of
them pass in the_repository to get the same result as before. One of
them has stopped using the_repository in ef9b0370da (sha1-name.c: store
and use repo in struct disambiguate_state, 2019-04-16); let it pass in
its stored repository.
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Sun, 28 Dec 2025 18:10:49 +0000 (19:10 +0100)]
tag: support arbitrary repositories in gpg_verify_tag()
Allow callers of gpg_verify_tag() specify the repository to use by
providing a parameter for that. One of the two has not been using
the_repository since 43a8391977 (builtin/verify-tag: stop using
`the_repository`, 2025-03-08); let it pass in the correct repository.
The other simply passes the_repository to get the same result as before.
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Sun, 28 Dec 2025 18:10:48 +0000 (19:10 +0100)]
tag: use algo of repo parameter in parse_tag_buffer()
Stop using "the_hash_algo" explicitly and implictly via parse_oid_hex()
and instead use the "hash_algo" member of the passed in repository,
which is more correct.
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Sun, 28 Dec 2025 08:36:16 +0000 (17:36 +0900)]
Merge branch 'ap/packfile-promisor-object-optim'
The code path that enumerates promisor objects have been optimized
to skip pointlessly parsing blob objects.
* ap/packfile-promisor-object-optim:
packfile: skip hash checks in add_promisor_object()
object: apply skip_hash and discard_tree optimizations to unknown blobs too
René Scharfe [Sat, 27 Dec 2025 09:29:35 +0000 (10:29 +0100)]
config: use git_parse_int() in git_config_get_expiry_in_days()
git_config_get_expiry_in_days() calls git_parse_signed() with the
maximum value of int, which is equivalent to calling git_parse_int().
Do that instead, as its shorter and clearer.
This requires demoting "days" to int to match. Promote "scale" to
intmax_t in turn to arrive at the same result when multiplying them.
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Emily Shaffer [Fri, 26 Dec 2025 12:23:34 +0000 (14:23 +0200)]
receive-pack: convert receive hooks to hook API
This converts the last remaining hooks to the new hook API, for
the same benefits as the previous conversions (no need to toggle
signals, manage custom struct child_process, call find_hook(),
prepares for specifyinig hooks via configs, etc.).
I noticed a performance degradation when processing large amounts
of hook input with just 1 line per callback, due to run-command's
poll loop, therefore I batched 500 lines per callback, to ensure
similar pipe throughput as before and to avoid hook child waiting
on stdin.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Emily Shaffer [Fri, 26 Dec 2025 12:23:33 +0000 (14:23 +0200)]
receive-pack: convert update hooks to new API
Use the new hook sideband API introduced in the previous commit.
The hook API avoids creating a custom struct child_process and other
internal hook plumbing (e.g. calling find_hook()) and prepares for
the specification of hooks via configs or running parallel hooks.
Execution is still sequential through the current hook.[ch] via the
run_process_parallel_opts.processes=1 arg.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Emily Shaffer [Fri, 26 Dec 2025 12:23:31 +0000 (14:23 +0200)]
run-command: allow capturing of collated output
Some callers, for example server-side hooks which wish to relay hook
output to clients across a transport, want to capture what would
normally print to stderr and do something else with it. Allow that via a
callback.
By calling the callback regardless of whether there's output available,
we allow clients to send e.g. a keepalive if necessary.
Because we expose a strbuf, not a fd or FILE*, there's no need to create
a temporary pipe or similar - we can just skip the print to stderr and
instead hand it to the caller.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>