Junio C Hamano [Tue, 15 Apr 2025 20:50:14 +0000 (13:50 -0700)]
Merge branch 'ps/object-wo-the-repository'
The object layer has been updated to take an explicit repository
instance as a parameter in more code paths.
* ps/object-wo-the-repository:
hash: stop depending on `the_repository` in `null_oid()`
hash: fix "-Wsign-compare" warnings
object-file: split out logic regarding hash algorithms
delta-islands: stop depending on `the_repository`
object-file-convert: stop depending on `the_repository`
pack-bitmap-write: stop depending on `the_repository`
pack-revindex: stop depending on `the_repository`
pack-check: stop depending on `the_repository`
environment: move access to "core.bigFileThreshold" into repo settings
pack-write: stop depending on `the_repository` and `the_hash_algo`
object: stop depending on `the_repository`
csum-file: stop depending on `the_repository`
Junio C Hamano [Tue, 15 Apr 2025 20:50:13 +0000 (13:50 -0700)]
Merge branch 'jk/zlib-inflate-fixes'
Fix our use of zlib corner cases.
* jk/zlib-inflate-fixes:
unpack_loose_rest(): rewrite return handling for clarity
unpack_loose_rest(): simplify error handling
unpack_loose_rest(): never clean up zstream
unpack_loose_rest(): avoid numeric comparison of zlib status
unpack_loose_header(): avoid numeric comparison of zlib status
git_inflate(): skip zlib_post_call() sanity check on Z_NEED_DICT
unpack_loose_header(): fix infinite loop on broken zlib input
unpack_loose_header(): report headers without NUL as "bad"
unpack_loose_header(): simplify next_out assignment
loose_object_info(): BUG() on inflating content with unknown type
Junio C Hamano [Tue, 8 Apr 2025 18:43:16 +0000 (11:43 -0700)]
Merge branch 'fr/vimdiff-layout-fixes'
Layout configuration in vimdiff backend didn't work as advertised,
which has been corrected.
* fr/vimdiff-layout-fixes:
mergetools: vimdiff: add tests for layout with REMOTE as the target
mergetools: vimdiff: fix layout where REMOTE is the target
Junio C Hamano [Tue, 8 Apr 2025 18:43:14 +0000 (11:43 -0700)]
Merge branch 'tb/incremental-midx-part-2'
Incrementally updating multi-pack index files.
* tb/incremental-midx-part-2:
midx: implement writing incremental MIDX bitmaps
pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
pack-bitmap.c: keep track of each layer's type bitmaps
ewah: implement `struct ewah_or_iterator`
pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs
pack-bitmap.c: compute disk-usage with incremental MIDXs
pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs
pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
pack-bitmap.c: open and store incremental bitmap layers
pack-revindex: prepare for incremental MIDX bitmaps
Documentation: describe incremental MIDX bitmaps
Documentation: remove a "future work" item from the MIDX docs
Junio C Hamano [Tue, 8 Apr 2025 18:43:14 +0000 (11:43 -0700)]
Merge branch 'ps/reftable-sans-compat-util'
Make the code in reftable library less reliant on the service
routines it used to borrow from Git proper, to make it easier to
use by external users of the library.
* ps/reftable-sans-compat-util:
Makefile: skip reftable library for Coccinelle
reftable: decouple from Git codebase by pulling in "compat/posix.h"
git-compat-util.h: split out POSIX-emulating bits
compat/mingw: split out POSIX-related bits
reftable/basics: introduce `REFTABLE_UNUSED` annotation
reftable/basics: stop using `SWAP()` macro
reftable/stack: stop using `sleep_millisec()`
reftable/system: introduce `reftable_rand()`
reftable/reader: stop using `ARRAY_SIZE()` macro
reftable/basics: provide wrappers for big endian conversion
reftable/basics: stop using `st_mult()` in array allocators
reftable: stop using `BUG()` in trivial cases
reftable/record: don't `BUG()` in `reftable_record_cmp()`
reftable/record: stop using `BUG()` in `reftable_record_init()`
reftable/record: stop using `COPY_ARRAY()`
reftable/blocksource: stop using `xmmap()`
reftable/stack: stop using `write_in_full()`
reftable/stack: stop using `read_in_full()`
Junio C Hamano [Tue, 8 Apr 2025 18:43:13 +0000 (11:43 -0700)]
Merge branch 'tb/refspec-fetch-cleanup'
Code clean-up.
* tb/refspec-fetch-cleanup:
refspec: replace `refspec_item_init()` with fetch/push variants
refspec: remove refspec_item_init_or_die()
refspec: replace `refspec_init()` with fetch/push variants
refspec: treat 'fetch' as a Boolean value
Junio C Hamano [Tue, 8 Apr 2025 18:43:12 +0000 (11:43 -0700)]
Merge branch 'en/assert-wo-side-effects'
Ensure what we write in assert() does not have side effects,
and introduce ASSERT() macro to mark those that cannot be
mechanically checked for lack of side effects.
* en/assert-wo-side-effects:
treewide: replace assert() with ASSERT() in special cases
ci: add build checking for side-effects in assert() calls
git-compat-util: introduce ASSERT() macro
Junio C Hamano [Mon, 7 Apr 2025 21:23:17 +0000 (14:23 -0700)]
Merge branch 'cc/lop-remote'
Bugfix in newly introduced large-object-promisor remote support.
* cc/lop-remote:
promisor-remote: compare remote names case sensitively
promisor-remote: fix possible issue when no URL is advertised
promisor-remote: fix segfault when remote URL is missing
t5710: arrange to delete the client before cloning
Junio C Hamano [Mon, 7 Apr 2025 21:23:17 +0000 (14:23 -0700)]
Merge branch 'jc/name-rev-stdin'
Using "git name-rev --stdin" as an example, improve the framework to
prepare tests to pretend to be in the future where the breaking
changes have already happened.
* jc/name-rev-stdin:
name-rev: remove "--stdin" support
t6120: further modernize
t6120: avoid hiding "git" exit status
t: introduce WITH_BREAKING_CHANGES prerequisite
t: extend test_lazy_prereq
t: document test_lazy_prereq
Add a new builtin driver for generic INI files (e. g. the gitconfig
files), where:
- the funcname regular expression matches section names, i. e. any
string between brackets at the beginning of the line, with or without
indentation;
- word_regex matches any word with one or more non-whitespace
characters without checking if it is a valid variable name or value.
Also add tests for the new userdiff driver. These files define sections
and subsections, with and without indentation.
Helped-by: Patrick Steinhardt <ps@pks.im> Helped-by: D. Ben Knoble <ben.knoble@gmail.com> Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Sat, 29 Mar 2025 07:39:10 +0000 (16:39 +0900)]
Merge branch 'en/random-cleanups'
Miscellaneous code clean-ups.
* en/random-cleanups:
merge-ort: remove extraneous word in comment
merge-ort: fix accidental strset<->strintmap
t7615: be more explicit about diff algorithm used
t6423: fix a comment that accidentally reversed two commits
stash: remove merge-recursive.h include
Junio C Hamano [Sat, 29 Mar 2025 07:39:10 +0000 (16:39 +0900)]
Merge branch 'jk/use-wunreachable-code-for-devs'
Enable -Wunreachable-code for developer builds.
* jk/use-wunreachable-code-for-devs:
config.mak.dev: enable -Wunreachable-code
git-compat-util: add NOT_CONSTANT macro and use it in atfork_prepare()
run-command: use errno to check for sigfillset() error
Junio C Hamano [Sat, 29 Mar 2025 07:39:08 +0000 (16:39 +0900)]
Merge branch 'jk/fetch-ref-prefix-cleanup'
In protocol v2 where the refs advertisement is constrained, we try
to tell the server side not to limit the advertisement when there
is no specific need to, which has been the source of confusion and
recent bugs. Revamp the logic to simplify.
* jk/fetch-ref-prefix-cleanup:
fetch: use ref prefix list to skip ls-refs
fetch: avoid ls-refs only to ask for HEAD symref update
fetch: stop protecting additions to ref-prefix list
fetch: ask server to advertise HEAD for config-less fetch
refspec_ref_prefixes(): clean up refspec_item logic
t5516: beef up exact-oid ref prefixes test
t5516: drop NEEDSWORK about v2 reachability behavior
t5516: prefer "oid" to "sha1" in some test titles
t5702: fix typo in test name
First step of deprecating and removing merge-recursive.
* en/merge-ort-prepare-to-remove-recursive:
am: switch from merge_recursive_generic() to merge_ort_generic()
merge-ort: fix merge.directoryRenames=false
t3650: document bug when directory renames are turned off
merge-ort: support having merge verbosity be set to 0
merge-ort: allow rename detection to be disabled
merge-ort: add new merge_ort_generic() function
Junio C Hamano [Sat, 29 Mar 2025 07:39:06 +0000 (16:39 +0900)]
Merge branch 'cc/signed-fast-export-import'
"git fast-export | git fast-import" learns to deal with commit and
tag objects with embedded signatures a bit better.
* cc/signed-fast-export-import:
fast-export, fast-import: add support for signed-commits
fast-export: do not modify memory from get_commit_buffer
git-fast-export.adoc: clarify why 'verbatim' may not be a good idea
fast-export: rename --signed-tags='warn' to 'warn-verbatim'
fast-export: fix missing whitespace after switch
git-fast-import.adoc: add missing LF in the BNF
Junio C Hamano [Wed, 26 Mar 2025 07:26:10 +0000 (16:26 +0900)]
Merge branch 'en/merge-process-renames-crash-fix'
The merge-recursive and merge-ort machinery crashed in corner cases
when certain renames are involved.
* en/merge-process-renames-crash-fix:
merge-ort: fix slightly overzealous assertion for rename-to-self
t6423: add a testcase causing a failed assertion in process_renames
Junio C Hamano [Wed, 26 Mar 2025 07:26:10 +0000 (16:26 +0900)]
Merge branch 'ua/some-builtins-wo-the-repository'
A handful of built-in command implementations have been rewritten
to use the repository instance supplied by git.c:run_builtin(), its
caller.
* ua/some-builtins-wo-the-repository:
builtin/checkout-index: stop using `the_repository`
builtin/for-each-ref: stop using `the_repository`
builtin/ls-files: stop using `the_repository`
builtin/pack-refs: stop using `the_repository`
builtin/send-pack: stop using `the_repository`
builtin/verify-commit: stop using `the_repository`
builtin/verify-tag: stop using `the_repository`
config: teach repo_config to allow `repo` to be NULL
Junio C Hamano [Wed, 26 Mar 2025 07:26:10 +0000 (16:26 +0900)]
Merge branch 'tb/refs-exclude-fixes'
The refname exclusion logic in the packed-ref backend has been
broken for some time, which confused upload-pack to advertise
different set of refs. This has been corrected.
Junio C Hamano [Wed, 26 Mar 2025 07:26:09 +0000 (16:26 +0900)]
Merge branch 'sj/ref-consistency-checks-more'
"git fsck" becomes more careful when checking the refs.
* sj/ref-consistency-checks-more:
builtin/fsck: add `git refs verify` child process
packed-backend: check whether the "packed-refs" is sorted
packed-backend: add "packed-refs" entry consistency check
packed-backend: check whether the refname contains NUL characters
packed-backend: add "packed-refs" header consistency check
packed-backend: check if header starts with "# pack-refs with: "
packed-backend: check whether the "packed-refs" is regular file
builtin/refs: get worktrees without reading head information
t0602: use subshell to ensure working directory unchanged
By default, whatever ends up been written to the "MERGED" window will
become the file which conflict we are resolving.
However, it is possible to use the "@" symbol to specify a different
one. For example, if we use this slightly different version of the
previously used string:
"(LOCAL,BASE,@REMOTE)/MERGED"
...then the user should proceed to edit the contents of the top right
window (instead of the bottom window) as *that* is what will become the
conflicts free file once vim is closed.
Before this commit, the "@" marker worked for all targets *except* for
"REMOTE". In other words, these worked as expected:
Reported-by: kawarimidoll <kawarimidoll+git@gmail.com> Suggested-by: D. Ben Knoble <ben.knoble@gmail.com> Signed-off-by: Fernando Ramos <greenfoo@u92.eu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Eli Schwartz [Tue, 25 Mar 2025 20:08:48 +0000 (16:08 -0400)]
meson: disable coccinelle configuration when building from a tarball
Wiring up coccinelle in the build, depends on running git commands to
get the list of files to operate on. Reasonable, for a feature mainly
used by people developing on git. If building git itself from a tarball
distribution of git's own source code, one likely does not need to run
coccinelle.
But running those git commands failed, and caused the build to error
out, if `spatch` was installed -- because the build assumed that its
presence indicated a desire to use it on this source tree. Instead, we
can expand the conditional to check for both `spatch` and the `.git`
file or directory.
Meson's `opt.require()` method allows us to add a prerequisite for the
feature option. If the prerequisite fails, then the option either:
- converts autodetection to disabled
- emits an informative error if the feature was set to enabled:
```
ERROR: Feature coccinelle cannot be enabled: coccinelle can only be run from a git checkout
```
Signed-off-by: Eli Schwartz <eschwartz@gentoo.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
D. Ben Knoble [Mon, 24 Mar 2025 20:52:23 +0000 (16:52 -0400)]
vimdiff: clarify the sigil used for marking the buffer to save
The original documentation from 7b5cf8be18 (vimdiff: add tool
documentation, 2022-03-30) mistakenly described the marker as an
asterisk, which is the character "*". The code and examples have always
looked for an arobase ("@").
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com> Acked-by: Fernando Ramos <greenfoo@u92.eu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Justin Tobler [Tue, 25 Mar 2025 00:51:48 +0000 (19:51 -0500)]
advice: allow disabling default branch name advice
The default branch name advice message is displayed when
`repo_default_branch_name()` is invoked and the `init.defaultBranch`
config is not set. In this scenario, the advice message is always shown
even if the `--no-advice` option is used.
Adapt `repo_default_branch_name()` to allow the default branch name
advice message to be disabled with the `--no-advice` option and
corresponding configuration.
Signed-off-by: Justin Tobler <jltobler@gmail.com> Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 199f44cb2ead (builtin/clone: allow remote helpers to detect repo,
2024-02-27), clones started partially initializing the refdb before
executing the remote helpers by creating a HEAD file and "refs/"
directory. This has resulted in some scenarios where git-clone(1) now
prints the default branch name advice message where it previously did
not.
A side-effect of the HEAD file already existing, is that computation of
the default branch name is handled later in execution. This matters
because prior to 97abaab5f6 (refs: drop `git_default_branch_name()`,
2024-05-17), the default branch value would be computed during its first
execution and cached. Subsequent invocations would simply return the
cached value. Since the next `git_default_branch_name()` call site,
which is invoked through `guess_remote_head()`, is not configured to
suppress the advice message, computing the default branch name results
in the advice message being printed.
Configure `guess_remote_head()` to suppress the advice message,
restoring the previous behavior.
Signed-off-by: Justin Tobler <jltobler@gmail.com> Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Justin Tobler [Tue, 25 Mar 2025 00:51:46 +0000 (19:51 -0500)]
remote: allow `guess_remote_head()` to suppress advice
The `repo_default_branch_name()` invoked through `guess_remote_head()`
is configured to always display the default branch advice message.
Adapt `guess_remote_head()` to accept flags and convert the `all`
parameter to a flag. Add the `REMOTE_GUESS_HEAD_QUIET` flag to to enable
suppression of advice messages. Call sites are updated accordingly.
Signed-off-by: Justin Tobler <jltobler@gmail.com> Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tuomas Ahola [Mon, 24 Mar 2025 21:47:03 +0000 (23:47 +0200)]
bulk-checkin: fix sign compare warnings
In file bulk-checkin.c, three warnings are emitted by
"-Wsign-compare", two of which are caused by trivial loop iterator
type mismatches. For the third case, the type of `rsize` from
can be changed to size_t as both options of the ternary expression are
unsigned and the signedness of the variable isn't really needed
anywhere.
To prevent `read_result != rsize` making a clash, it is to be noted
that `read_result` is checked not to hold negative values. Therefore
casting the variable to size_t is a safe operation and enough to
remove the sign-compare warning.
Fix issues accordingly, and remove `DISABLE_SIGN_COMPARE_WARNINGS` to
enable "-Wsign-compare" for the file.
Signed-off-by: Tuomas Ahola <taahol@utu.fi> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is a bug to obtain the peer certificate without verifying it.
Having said that, from my reading of
https://www.openssl.org/docs/man1.1.1/man3/SSL_set_verify.html, it would
appear that Git is saved by the fact that it calls
`SSL_CTX_set_verify(ctx, SSL_VERIFY_PEER, NULL)` already early on.
In other words, that `SSL_VERIFY_PEER` combined with the `NULL`
parameter (i.e. no overridden callback) would _already_ verify the peer
certificate. The fact that we later call `SSL_get_peer_certificate()`
is mistaken by CodeQL to mean that that peer certificate still needs to
be verified, but that had already happened at that point.
Nevertheless, it is better to verify the peer certificate explicitly
than to rely on some side effect that is really hard to reason about
(and that took me more than one business day to analyze fully). It also
makes it easier for static analyzers to validate the correctness of the
code.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The check for dubious ownership has one particular quirk on Windows: if
running as an administrator, files owned by the Administrators _group_
are considered owned by the user.
The rationale for that is: When running in elevated mode, Git creates
files that aren't owned by the individual user but by the Administrators
group.
There is yet another quirk, though: The check I introduced to determine
whether the current user is an administrator uses the
`CheckTokenMembership()` function with the current process token. And
that check only succeeds when running in elevated mode!
Let's be a bit more lenient here and look harder whether the current
user is an administrator. We do this by looking for a so-called "linked
token". That token exists when administrators run in non-elevated mode,
and can be used to create a new process in elevated mode. And feeding
_that_ token to the `CheckTokenMembership()` function succeeds!
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Mon, 24 Mar 2025 00:51:51 +0000 (00:51 +0000)]
maintenance: add loose-objects.batchSize config
The 'loose-objects' task of 'git maintenance run' first deletes loose
objects that exit within packfiles and then collects loose objects into
a packfile. This second step uses an implicit limit of fifty thousand
that cannot be modified by users.
Add a new config option that allows this limit to be adjusted or ignored
entirely.
While creating tests for this option, I noticed that actually there was
an off-by-one error due to the strict comparison in the limit check. I
considered making the limit check turn true on equality, but instead I
thought to use INT_MAX as a "no limit" barrier which should mean it's
never possible to hit the limit. Thus, a new decrement to the limit is
provided if the value is positive. (The restriction to positive values
is to avoid underflow if INT_MIN is configured.)
Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Mon, 24 Mar 2025 00:51:50 +0000 (00:51 +0000)]
maintenance: force progress/no-quiet to children
The --no-quiet option for 'git maintenance run' is supposed to indicate
that progress should happen even while ignoring the value of isatty(2).
However, Git implicitly asks child processes to check isatty(2) since
these arguments are not passed through.
The pass through of --no-quiet will be useful in a test in the next
change.
Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
David Mandelberg [Sun, 23 Mar 2025 21:06:53 +0000 (17:06 -0400)]
completion: fix bugs with slashes in remote names
Previously, some calls to for-each-ref passed fixed numbers of path
components to strip from refs, assuming that remote names had no slashes
in them. This made completions like:
commit: move clear_commit_marks_many() loop body to clear_commit_marks()
clear_commit_marks_many() clears multiple commits one by one. Move the
code for handling a single commit to clear_commit_marks() and call it
instead of the other way around, to simplify the code.
Taylor Blau [Thu, 20 Mar 2025 17:57:08 +0000 (13:57 -0400)]
midx: implement writing incremental MIDX bitmaps
Now that the pack-bitmap machinery has learned how to read and interact
with an incremental MIDX bitmap, teach the pack-bitmap-write.c machinery
(and relevant callers from within the MIDX machinery) to write such
bitmaps.
The details for doing so are mostly straightforward. The main changes
are as follows:
- find_object_pos() now makes use of an extra MIDX parameter which is
used to locate the bit positions of objects which are from previous
layers (and thus do not exist in the current layer's pack_order
field).
(Note also that the pack_order field is moved into struct
write_midx_context to further simplify the callers for
write_midx_bitmap()).
- bitmap_writer_build_type_index() first determines how many objects
precede the current bitmap layer and offsets the bits it sets in
each respective type-level bitmap by that amount so they can be OR'd
together.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 20 Mar 2025 17:57:05 +0000 (13:57 -0400)]
pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
Now that we have initialized arrays for each bitmap layer's type bitmaps
in the previous commit, adjust existing callers to use them in
preparation for multi-layered bitmaps.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 20 Mar 2025 17:57:02 +0000 (13:57 -0400)]
pack-bitmap.c: keep track of each layer's type bitmaps
Prepare for reading the type-level bitmaps from previous bitmap layers
by maintaining an array for each type, where each element in that type's
array corresponds to one layer's bitmap for that type.
These fields will be used in a later commit to instantiate the 'struct
ewah_or_iterator' for each type.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 20 Mar 2025 17:56:59 +0000 (13:56 -0400)]
ewah: implement `struct ewah_or_iterator`
While individual bitmap layers store different commit, type-level, and
pseudo-merge bitmaps, only the top-most layer is used to compute
reachability traversals.
Many functions which implement the aforementioned traversal rely on
enumerating the results according to the type-level bitmaps, and so
would benefit from a conceptual type-level bitmap that spans multiple
layers.
Implement `struct ewah_or_iterator` which is capable of enumerating
multiple EWAH bitmaps at once, and OR-ing the results together. When
initialized with, for example, all of the commit type bitmaps from each
layer, callers can pretend as if they are enumerating a large type-level
bitmap which contains the commits from *all* bitmap layers.
There are a couple of alternative approaches which were considered:
- Decompress each EWAH bitmap and OR them together, enumerating a
single (non-EWAH) bitmap. This would work, but has the disadvantage
of decompressing a potentially large bitmap, which may not be
necessary if the caller does not wish to read all of it.
- Recursively call bitmap internal functions, reusing the "result" and
"haves" bitmap from the top-most layer. This approach resembles the
original implementation of this feature, but is inefficient in that
it both (a) requires significant refactoring to implement, and (b)
enumerates large sections of later bitmaps which are all zeros (as
they pertain to objects in earlier layers).
(b) is not so bad in and of itself, but can cause significant
slow-downs when combined with expensive loop bodies.
This approach (enumerating an OR'd together version of all of the
type-level bitmaps from each layer) produces a significantly more
straightforward implementation with significantly less refactoring
required in order to make it work.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 20 Mar 2025 17:56:56 +0000 (13:56 -0400)]
pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs
Prepare for using pseudo-merges with incremental MIDX bitmaps by
attempting to apply pseudo-merges from each layer when encountering a
given commit during a walk.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 20 Mar 2025 17:56:46 +0000 (13:56 -0400)]
pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
Implement support for the special `--test-bitmap` mode of `git rev-list`
when using incremental MIDXs.
The bitmap_test_data structure is extended to contain a "base" pointer
that mirrors the structure of the bitmap chain that it is being used to
test.
When we find a commit to test, we first chase down the ->base pointer to
find the appropriate bitmap_test_data for the bitmap layer that the
given commit is contained within, and then perform the test on that
bitmap.
In order to implement this, light modifications are made to
bitmap_for_commit() to reimplement it in terms of a new function,
find_bitmap_for_commit(), which fills out a pointer which indicates the
bitmap layer which contains the given commit.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 20 Mar 2025 17:56:43 +0000 (13:56 -0400)]
pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
In a similar fashion as previous commits in the first phase of
incremental MIDXs, enumerate not just the packs in the current
incremental MIDX layer, but previous ones as well.
Likewise, in reuse_partial_packfile_from_bitmap(), when reusing only a
single pack from a MIDX, use the oldest layer's preferred pack as it is
likely to contain the largest number of reusable sections.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 20 Mar 2025 17:56:40 +0000 (13:56 -0400)]
pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs
Since we may ask for a pack_id that is in an earlier MIDX layer relative
to the one corresponding to our bitmap, use nth_midxed_pack() instead of
accessing the ->packs array directly.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 20 Mar 2025 17:56:37 +0000 (13:56 -0400)]
pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
The pack-bitmap machinery uses `bitmap_for_commit()` to locate the
EWAH-compressed bitmap corresponding to some given commit object.
Teach this function about incremental MIDX bitmaps by teaching it to
recur on earlier bitmap layers when it fails to find a given commit in
the current layer.
The changes to do so are as follows:
- Avoid initializing hash_pos at its declaration, since
bitmap_for_commit() is now a recursive function and may receive a
NULL bitmap_index pointer as its first argument.
- In cases where we would previously return NULL (to indicate that a
lookup failed and the given bitmap_index does not contain an entry
corresponding to the given commit), recursively call the function on
the previous bitmap layer.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 20 Mar 2025 17:56:34 +0000 (13:56 -0400)]
pack-bitmap.c: open and store incremental bitmap layers
Prepare the pack-bitmap machinery to work with incremental MIDXs by
adding a new "base" field to keep track of the bitmap index associated
with the previous MIDX layer.
The changes in this commit are mostly boilerplate to open the correct
bitmap(s), add them to the chain of bitmap layers along the "base"
pointer, ensure that the correct packs and their reverse indexes are
loaded across MIDX layers, etc.
While we're at it, keep track of a base_nr field to indicate how many
bitmap layers (including the current bitmap) exist. This will be used in
a future commit to allocate an array of 'struct ewah_bitmap' pointers to
collect all of the respective type bitmaps among all layers to
initialize a multi-EWAH iterator.
Subsequent commits will teach the functions within the pack-bitmap
machinery how to interact with these new fields.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 20 Mar 2025 17:56:31 +0000 (13:56 -0400)]
pack-revindex: prepare for incremental MIDX bitmaps
Prepare the reverse index machinery to handle object lookups in an
incremental MIDX bitmap. These changes are broken out across a few
functions:
- load_midx_revindex() learns to use the appropriate MIDX filename
depending on whether the given 'struct multi_pack_index *' is
incremental or not.
- pack_pos_to_midx() and midx_to_pack_pos() now both take in a global
object position in the MIDX pseudo-pack order, and find the
earliest containing MIDX (similar to midx.c::midx_for_object().
- midx_pack_order_cmp() adjusts its call to pack_pos_to_midx() by the
number of objects in the base (since 'vb - midx->revindx_data' is
relative to the containing MIDX, and pack_pos_to_midx() expects a
global position).
Likewise, this function adjusts its output by adding
m->num_objects_in_base to return a global position out through the
`*pos` pointer.
Together, these changes are sufficient to use the multi-pack index's
reverse index format for incremental multi-pack reachability bitmaps.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 20 Mar 2025 17:56:24 +0000 (13:56 -0400)]
Documentation: remove a "future work" item from the MIDX docs
One of the items listed as "future work" in the MIDX's technical
documentation is to extend the format to allow MIDXs to be written
incrementally across multiple layers.
This was suggested all the way back in ceab693d1f (multi-pack-index: add
design document, 2018-07-12), and implemented in b9497848df (Merge
branch 'tb/incremental-midx-part-1', 2024-08-19). Let's remove it
accordingly.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
compat/mingw: fix EACCESS when opening files with `O_CREAT | O_EXCL`
In our CI systems we can observe that t0610 fails rather frequently.
This testcase races a bunch of git-update-ref(1) processes with one
another which are all trying to update a unique reference, where we
expect that all processes succeed and end up updating the reftable
stack. The error message in this case looks like the following:
Instrumenting the code with a couple of calls to `BUG()` in relevant
sites where we return `REFTABLE_IO_ERROR` quickly leads one to discover
that this error is caused when calling `flock_acquire()`, which is a
thin wrapper around our lockfile API. Curiously, the error code we get
in such cases is `EACCESS`, indicating that we are not allowed to access
the file.
The root cause of this is an oddity of `CreateFileW()`, which is what
`_wopen()` uses internally. Quoting its documentation [1]:
If you call CreateFile on a file that is pending deletion as a
result of a previous call to DeleteFile, the function fails. The
operating system delays file deletion until all handles to the file
are closed. GetLastError returns ERROR_ACCESS_DENIED.
This behaviour is triggered quite often in the above testcase because
all the processes race with one another trying to acquire the lock for
the "tables.list" file. This is due to how locking works in the reftable
library when compacting a stack:
1. Lock the "tables.list" file and reads its contents.
2. Decide which tables to compact.
3. Lock each of the individual tables that we are about to compact.
4. Unlock the "tables.list" file.
5. Compact the individual tables into one large table.
6. Re-lock the "tables.list" file.
7. Write the new list of tables into it.
8. Commit the "tables.list" file.
The important step is (4): we don't commit the file directly by renaming
it into place, but instead we delete the lockfile so that concurrent
processes can continue to append to the reftable stack while we compact
the tables. And because we use `DeleteFileW()` to do so, we may now race
with another process that wants to acquire that lockfile. So if we are
unlucky, we would now see `ERROR_ACCESS_DENIED` instead of the expected
`ERROR_FILE_EXISTS`, which the lockfile subsystem isn't prepared to
handle and thus it will bail out without retrying to acquire the lock.
In theory, the issue is not limited to the reftable library and can be
triggered by every other user of the lockfile subsystem, as well. My gut
feeling tells me it's rather unlikely to surface elsewhere though.
Fix the issue by translating the error to `EEXIST`. This makes the
lockfile subsystem handle the error correctly: in case a timeout is set
it will now retry acquiring the lockfile until the timeout has expired.
With this, t0610 is now always passing on my machine whereas it was
previously failing in around 20-30% of all test runs.
meson: fix compat sources when compiling with MSVC
In our compat library we have both "msvc.c" and "mingw.c". The former is
mostly a thin wrapper around the latter as it directly includes it, but
it has a couple of extra headers that aren't included in "mingw.c" and
is expected to be used with the Visual Studio compiler toolchain.
While our Makefile knows to pick up the correct file depending on
whether or not the Visual Studio toolchain is used, we don't do the same
with Meson. Fix this.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Wed, 19 Mar 2025 22:52:58 +0000 (18:52 -0400)]
repack: begin combining cruft packs with `--combine-cruft-below-size`
The previous commit changed the behavior of repack's '--max-cruft-size'
to specify a cruft pack-specific override for '--max-pack-size'.
Introduce a new flag, '--combine-cruft-below-size' which is a
replacement for the old behavior of '--max-cruft-size'. This new flag
does explicitly what it says: it combines together cruft packs which are
smaller than a given threshold, and leaves alone ones which are
larger.
This accomplishes the original intent of '--max-cruft-size', which was
to avoid repacking cruft packs larger than the given threshold.
The new behavior is slightly different. Instead of building up small
packs together until the threshold is met, '--combine-cruft-below-size'
packs up *all* cruft packs smaller than the threshold. This means that
we may make a pack much larger than the given threshold (e.g., if you
aggregate 5 packs which are each 99 MiB in size with a threshold of 100
MiB).
But that's OK: the point isn't to restrict the size of the cruft packs
we generate, it's to avoid working with ones that have already grown too
large. If repositories still want to limit the size of the generated
cruft pack(s), they may use '--max-cruft-size'.
There's some minor test fallout as a result of the slight differences in
behavior between the old meaning of '--max-cruft-size' and the behavior
of '--combine-cruft-below-size'. In the test which is now called
"--combine-cruft-below-size combines packs", we need to use the new flag
over the old one to exercise that test's intended behavior. The
remainder of the changes there are to improve the clarity of the
comments.
Suggested-by: Elijah Newren <newren@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Wed, 19 Mar 2025 22:52:54 +0000 (18:52 -0400)]
repack: avoid combining cruft packs with `--max-cruft-size`
In 37dc6d8104 (builtin/repack.c: implement support for
`--max-cruft-size`, 2023-10-02), we exposed new functionality that
allowed repositories to specify the behavior of when we should combine
multiple cruft packs together.
This feature was designed to ensure that we never repacked cruft packs
which were larger than the given threshold in order to provide tighter
I/O bounds for repositories that have many unreachable objects. In
essence, specifying '--max-cruft-size=N' instructed 'repack' to
aggregate cruft packs together (in order of ascending size) until the
combine size grows past 'N', and then make a new cruft pack whose
contents includes the packs we rolled up.
But this isn't quite how it works in practice. Suppose for example that
we have two cruft packs which are each 100MiB in size. One might expect
specifying "--max-cruft-size=200M" would combine these two packs
together, and then avoid repacking them until a pruning GC takes place.
In reality, 'repack' would try and aggregate these together, but writing
a pack that is strictly smaller than 200 MiB (since pack-objects'
"--max-pack-size" provides a strict bound for packs containing more than
one object).
So instead we'll write out a pack that is, say, 199 MiB in size, and
then another 1 MiB pack containing the balance. If we later repack the
repository without adding any new unreachable objects, we'll repeat the
same exercise again, making the same 199 MiB and 1 MiB packs each time.
This happens because of a poor choice to bolt the '--max-cruft-size'
functionality onto pack-objects' '--max-pack-size', forcing us to
generate packs which are always smaller than the provided threshold and
thus subject to repacking.
The following commit will introduce a new flag that implements something
similar to the behavior above. Let's prepare for that by making repack's
'--max-cruft-size' flag behave as an cruft pack-specific override for
'--max-pack-size'.
Do so by temporarily repurposing the 'collapse_small_cruft_packs()'
function to instead generate a cruft pack using the same instructions as
if we didn't specify any maximum pack size. The calling code looks
something like:
This patch makes collapse_small_cruft_packs() behave identically to the
'else' arm of the conditional above. This repurposing of
'collapse_small_cruft_packs()' is intentional, since it will set us up
nicely to introduce the new behavior in the following commit.
Naturally, there is some test fallout in the test which exercises the
old meaning of '--max-cruft-size'. Mark that test as failing for now to
be dealt with in the following commit. Likewise, add a new test which
explicitly tests the behavior of '--max-cruft-size' to place a hard
limit on the size of any generated cruft pack(s).
Note that this is a breaking change, as it alters the user-visible
behavior of '--max-cruft-size'. But I'm OK changing this behavior in
this instance, since the behavior wasn't accurate to begin with.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
A previous commit moved a handful of tests from a different script into
t7704, including one that relies on generating random blobs.
Incidentally, the original home of this test defined its own helper
"write_blob" for doing so, which is identical in function to our
"generate_random_blob" (and is slightly inferior to the latter, which
cleans up after itself).
Rewrite the test that uses "write_blob" to no longer do so and then
remove the function.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Wed, 19 Mar 2025 22:52:48 +0000 (18:52 -0400)]
t/t7704-repack-cruft.sh: clarify wording in --max-cruft-size tests
Now that a number of new tests have landed in t7704, make sure that they
all make sense and are testing the things they say they are.
Things are mostly OK, but a handful of tests needed tweaks. Those tweaks
are as follows:
- Use the terms "too large" or "too small" in tests that exercise the
'--max-cruft-size' behavior. This has historically been treated as a
threshold beneath which to combine cruft packs, but that will change
in a subsequent commit. Prepare for that by using a more generic
term.
- Remove references to "--max-cruft-size" in the freshening tests.
These tests provide coverage of our ability to record updated mtimes
for objects already in cruft packs whose mtimes are upserted from
various sources (loose objects, finding that object in a new pack,
another cruft pack, etc.).
These have nothing to do with the '--max-cruft-size' feature, and in
fact none of the tests even *use* '--max-cruft-size'. Name them
appropriately to make it clear that these tests exercise freshening
behavior, not '--max-cruft-size' behavior.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The former is designed to test low-level pack generation mechanics at
the 'git pack-objects --cruft'-level, which is plumbing. The latter, on
the other hand, is designed to test the user-facing behavior through
'git repack --cruft', which is porcelain (under the "ancillary
manipulators" sub-section).
At some point a handful of tests which should have been added to the
latter script were instead written to the former. This isn't a huge
deal, but rectifying it is straightforward. Move a handful of
'repack'-related tests out of t5329 and into their rightful home in
t7704.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Wed, 19 Mar 2025 16:22:58 +0000 (16:22 +0000)]
treewide: replace assert() with ASSERT() in special cases
When the compiler/linker cannot verify that an assert() invocation is
free of side effects for us (e.g. because the assertion includes some
kind of function call), replace the use of assert() with ASSERT().
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Wed, 19 Mar 2025 16:22:57 +0000 (16:22 +0000)]
ci: add build checking for side-effects in assert() calls
It is a big no-no to have side-effects in an assertion, because if the
assert() is compiled out, you don't get that side-effect, leading to the
code behaving differently. That can be a large headache to debug.
We have roughly 566 assert() calls in our codebase (my grep might have
picked up things that aren't actually assert() calls, but most appeared
to be). All but 9 of them can be determined by gcc to be free of side
effects with a clever redefine of assert() provided by Bruno De Fraine
(from
https://stackoverflow.com/questions/10593492/catching-assert-with-side-effects),
who upon request has graciously placed his two-liner into the public
domain without warranty of any kind. The current 9 assert() calls
flagged by this clever redefinition of assert() appear to me to be free
of side effects as well, but are too complicated for a compiler/linker
to figure that since each assertion involves some kind of function call.
Add a CI job which will find and report these possibly problematic
assertions, and have the job suggest to the user that they replace these
with ASSERT() calls.
Example output from running:
```
ERROR: The compiler could not verify the following assert()
calls are free of side-effects. Please replace with
ASSERT() calls.
/home/newren/floss/git/diffcore-rename.c:1409
assert(!dir_rename_count || strmap_empty(dir_rename_count));
/home/newren/floss/git/merge-ort.c:1645
assert(renames->deferred[side].trivial_merges_okay &&
!strset_contains(&renames->deferred[side].target_dirs,
path));
/home/newren/floss/git/merge-ort.c:794
assert(omittable_hint ==
(!starts_with(type_short_descriptions[type], "CONFLICT") &&
!starts_with(type_short_descriptions[type], "ERROR")) ||
type == CONFLICT_DIR_RENAME_SUGGESTED);
/home/newren/floss/git/merge-recursive.c:1200
assert(!merge_remote_util(commit));
/home/newren/floss/git/object-file.c:2709
assert(would_convert_to_git_filter_fd(istate, path));
/home/newren/floss/git/parallel-checkout.c:280
assert(is_eligible_for_parallel_checkout(pc_item->ce, &pc_item->ca));
/home/newren/floss/git/scalar.c:244
assert(have_fsmonitor_support());
/home/newren/floss/git/scalar.c:254
assert(have_fsmonitor_support());
/home/newren/floss/git/sequencer.c:4968
assert(!(opts->signoff || opts->no_commit ||
opts->record_origin || should_edit(opts) ||
opts->committer_date_is_author_date ||
opts->ignore_date));
```
Note that if there are possibly problematic assertions, not necessarily
all of them will be shown in a single run, because the compiler errors
may include something like "ld: ... more undefined references to
`not_supposed_to_survive' follow" instead of listing each individually.
But in such cases, once you clean up a few that are shown in your first
run, subsequent runs will show (some of) the ones that remain, allowing
you to iteratively remove them all.
Helped-by: Bruno De Fraine <defraine@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Wed, 19 Mar 2025 16:22:56 +0000 (16:22 +0000)]
git-compat-util: introduce ASSERT() macro
Create a ASSERT() macro which is similar to assert(), but will not be
compiled out when NDEBUG is defined, and is thus safe to use even if its
argument has side-effects.
We will use this new macro in a subsequent commit to convert a few
existing assert() invocations to ASSERT(). In particular, we'll
convert the handful of invocations which cannot be proven to be free of
side effects with a simple compiler/linker hack.
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Meet Soni [Wed, 19 Mar 2025 15:29:27 +0000 (20:59 +0530)]
reftable: adapt write_object_record() to propagate block_writer_add() errors
Previously, write_object_record() would flush the current block and retry
appending the record whenever block_writer_add() returned any nonzero
error. This forced an assumption that every failure meant the block was
full, even when errors such as memory allocation or I/O failures occurred.
Update the write_object_record() to inspect the error code returned by
block_writer_add() and flush and reinitialize the writer iff the
error is REFTABLE_ENTRY_TOO_BIG_ERROR. For any other error, immediately
propagate it.
If the flush and reinitialization still fail with
REFTABLE_ENTRY_TOO_BIG_ERROR, reset the record's offset length to zero
before a final attempt.
All call sites now handle various error codes returned by
block_writer_add().
Signed-off-by: Meet Soni <meetsoni3017@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>