Get rid of our dependency on `the_repository` in `find_odb()` by passing
in the object database in which we want to search for the source and
adjusting all callers.
Rename the function to `odb_find_source()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In subsequent commits we'll get rid of our use of `the_repository` in
"odb.c" in favor of explicitly passing in a `struct object_database` or
a `struct odb_source`. In some cases though we'll need access to the
repository, for example to read a config value from it, but we don't
have a way to access the repository owning a specific object database.
Introduce parent pointers for `struct object_database` to its owning
repository as well as for `struct odb_source` to its owning object
database, which will allow us to adapt those use cases.
Note that this change requires us to pass through the object database to
`link_alt_odb_entry()` so that we can set up the parent pointers for any
source there. The callchain is adapted to pass through the object
database accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commits we have renamed the structures contained in
"object-store.h" to `struct object_database` and `struct odb_backend`.
As such, the code files "object-store.{c,h}" are confusingly named now.
Rename them to "odb.{c,h}" accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
object-store: rename `object_directory` to `odb_source`
The `object_directory` structure is used as an access point for a single
object directory like ".git/objects". While the structure isn't yet
fully self-contained, the intent is for it to eventually contain all
information required to access objects in one specific location.
While the name "object directory" is a good fit for now, this will
change over time as we continue with the agenda to make pluggable object
databases a thing. Eventually, objects may not be accessed via any kind
of directory at all anymore, but they could instead be backed by any
kind of durable storage mechanism. While it seems quite far-fetched for
now, it is thinkable that eventually this might even be some form of a
database, for example.
As such, the current name of this structure will become worse over time
as we evolve into the direction of pluggable ODBs. Immediate next steps
will start to carve out proper self-contained object directories, which
requires us to pass in these object directories as parameters. Based on
our modern naming schema this means that those functions should then be
named after their subsystem, which means that we would start to bake the
current name into the codebase more and more.
Let's preempt this by renaming the structure. There have been a couple
alternatives that were discussed:
- `odb_backend` was discarded because it led to the association that
one object database has a single backend, but the model is that one
alternate has one backend. Furthermore, "backend" is more about the
actual backing implementation and less about the high-level concept.
- `odb_alternate` was discarded because it is a bit of a stretch to
also call the main object directory an "alternate".
Instead, pick `odb_source` as the new name. It makes it sufficiently
clear that there can be multiple sources and does not cause confusion
when mixed with the already-existing "alternate" terminology.
In the future, this change allows us to easily introduce for example a
`odb_files_source` and other format-specific implementations.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
object-store: rename `raw_object_store` to `object_database`
The `raw_object_store` structure is the central entry point for reading
and writing objects in a repository. The main purpose of this structure
is to manage object directories and provide an interface to access and
write objects in those object directories.
Right now, many of the functions associated with the raw object store
implicitly rely on `the_repository` to get access to its `objects`
pointer, which is the `raw_object_store`. As we want to generally get
rid of using `the_repository` across our codebase we will have to
convert this implicit dependency on this global variable into an
explicit parameter.
This conversion can be done by simply passing in an explicit pointer to
a repository and then using its `->objects` pointer. But there is a
second effort underway, which is to make the object subsystem more
selfcontained so that we can eventually have pluggable object backends.
As such, passing in a repository wouldn't make a ton of sense, and the
goal is to convert the object store interfaces such that we always pass
in a reference to the `raw_object_store` instead.
This will expose the `raw_object_store` type to a lot more callers
though, which surfaces that this type is named somewhat awkwardly. The
"raw_" prefix makes readers wonder whether there is a non-raw variant of
the object store, but there isn't. Furthermore, we nowadays want to name
functions in a way that they can be clearly attributed to a specific
subsystem, but calling them e.g. `raw_object_store_has_object()` is just
too unwieldy, even when dropping the "raw_" prefix.
Instead, rename the structure to `object_database`. This term is already
used a lot throughout our codebase, and it cannot easily be mistaken for
"object directories", either. Furthermore, its acronym ODB is already
well-known and works well as part of a function's name, like for example
`odb_has_object()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Fri, 30 May 2025 18:59:18 +0000 (11:59 -0700)]
Merge branch 'ps/midx-negative-packfile-cache'
When a stale .midx file refers to .pack files that no longer exist,
we ended up checking for these non-existent files repeatedly, which
has been optimized by memoizing the non-existence.
* ps/midx-negative-packfile-cache:
midx: stop repeatedly looking up nonexistent packfiles
packfile: explain ordering of how we look up auxiliary pack files
Junio C Hamano [Fri, 30 May 2025 18:59:17 +0000 (11:59 -0700)]
Merge branch 'kh/notes-doc-fixes'
"git notes --help" documentation updates.
* kh/notes-doc-fixes:
doc: notes: use stuck form throughout
doc: notes: treat --stdin equally between copy/remove
doc: notes: point out copy --stdin use with argv
doc: notes: clearly state that --stripspace is the default
doc: notes: remove stripspace discussion from other options
doc: notes: rework --[no-]stripspace
doc: notes: split out options with negated forms
doc: config: mention core.commentChar on commit.cleanup
doc: stripspace: mention where the default comes from
"git apply --index/--cached" when applying a deletion patch in
reverse failed to give the mode bits of the path "removed" by the
patch to the file it creates, which has been corrected.
* mm/apply-reverse-mode-of-deleted-path:
apply: set file mode when --reverse creates a deleted file
t4129: test that git apply warns for unexpected mode changes
Junio C Hamano [Fri, 30 May 2025 18:59:16 +0000 (11:59 -0700)]
Merge branch 'op/cvsserver-perl-warning'
Recent versions of Perl started warning against "! A =~ /pattern/"
which does not negate the result of the matching. As it turns out
that the problematic function is not even called, it was removed.
* op/cvsserver-perl-warning:
cvsserver: remove unused escapeRefName function
Junio C Hamano [Fri, 30 May 2025 18:59:16 +0000 (11:59 -0700)]
Merge branch 'am/sparse-index-name-hash-fix'
Avoid adding directory path to a sparse-index tree entries to the
name-hash, since they would bloat the hashtable without anybody
querying for them. This was done already for a single threaded
part of the code, but now the multi-threaded code also does the
same.
Junio C Hamano [Fri, 30 May 2025 18:59:16 +0000 (11:59 -0700)]
Merge branch 'pw/midx-repack-overflow-fix'
Integer overflow fix around code paths for "git multi-pack-index repack"..
* pw/midx-repack-overflow-fix:
midx docs: clarify tie breaking
midx: avoid negative array index
midx repack: avoid potential integer overflow on 64 bit systems
midx repack: avoid integer overflow on 32 bit systems
Since f93b2a0424 (reftable/basics: introduce `REFTABLE_UNUSED`
annotation, 2025-02-18), the reftable library was migrated to
use an internal version of `UNUSED`, which unconditionally sets
a GNU __attribute__ to avoid warnings function parameters that
are not being used.
Make the definition conditional to prevent breaking the build
with non GNU compilers.
Reported-by: "Randall S. Becker" <rsbecker@nexbridge.com> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Thu, 29 May 2025 16:03:01 +0000 (09:03 -0700)]
Merge branch 'master' of https://github.com/j6t/git-gui
* 'master' of https://github.com/j6t/git-gui:
git-gui: wire up support for the Meson build system
git-gui: stop including GIT-VERSION-FILE file
git-gui: extract script to generate macOS app
git-gui: extract script to generate macOS wrapper
git-gui: extract script to generate "tclIndex"
git-gui: extract script to generate "git-gui"
git-gui: drop no-op GITGUI_SCRIPT replacement
git-gui: make output of GIT-VERSION-GEN source'able
git-gui: prepare GIT-VERSION-GEN for out-of-tree builds
git-gui: replace GIT-GUI-VARS with GIT-GUI-BUILD-OPTIONS
Junio C Hamano [Thu, 29 May 2025 16:02:14 +0000 (09:02 -0700)]
Merge branch 'master' of https://github.com/j6t/gitk
* 'master' of https://github.com/j6t/gitk:
gitk: do not hard-code color of search results in commit list
gitk: place file name arguments after options in msgfmt call
gitk: Legacy widgets doesn't have combobox
Johannes Sixt [Thu, 29 May 2025 08:01:14 +0000 (10:01 +0200)]
Merge branch 'pks-meson-support' of github.com:pks-t/git-gui
* 'pks-meson-support' of github.com:pks-t/git-gui:
git-gui: wire up support for the Meson build system
git-gui: stop including GIT-VERSION-FILE file
git-gui: extract script to generate macOS app
git-gui: extract script to generate macOS wrapper
git-gui: extract script to generate "tclIndex"
git-gui: extract script to generate "git-gui"
git-gui: drop no-op GITGUI_SCRIPT replacement
git-gui: make output of GIT-VERSION-GEN source'able
git-gui: prepare GIT-VERSION-GEN for out-of-tree builds
git-gui: replace GIT-GUI-VARS with GIT-GUI-BUILD-OPTIONS
"git receive-pack" optionally learns not to care about connectivity
check, which can be useful when the repository arranges to ensure
connectivity by some other means.
* jt/receive-pack-skip-connectivity-check:
builtin/receive-pack: add option to skip connectivity check
t5410: test receive-pack connectivity check
midx: stop repeatedly looking up nonexistent packfiles
The multi-pack index acts as a cache across a set of packfiles so that
we can quickly look up which of those packfiles contains a given object.
As such, the multi-pack index naturally needs to be updated every time
one of the packfiles goes away, or otherwise the multi-pack index has
grown stale.
A stale multi-pack index should be handled gracefully by Git though, and
in fact it is: if the indexed pack cannot be found we simply ignore it
and eventually we fall back to doing the object lookup by just iterating
through all packs, even if those aren't indexed.
But while this fallback works, it has one significant downside: we don't
cache the fact that a pack has vanished. This leads to us repeatedly
trying to look up the same pack only to realize that it (still) doesn't
exist.
This issue can be easily demonstrated by creating a repository with a
stale multi-pack index and a couple of objects. We do so by creating a
repository with two packfiles, both of which are indexed by the
multi-pack index, and then repack those two packfiles. Note that we have
to move the multi-pack-index before doing the final repack, as Git knows
to delete it otherwise.
$ git init repo
$ cd repo/
$ git config set maintenance.auto false
$ for i in $(seq 1000); do printf "%d-original" $i >file-$i; done
$ git add .
$ git commit -moriginal
$ git repack -dl
$ for i in $(seq 1000); do printf "%d-modified" $i >file-$i; done
$ git commit -a -mmodified
$ git repack -dl
$ git multi-pack-index write
$ mv .git/objects/pack/multi-pack-index .
$ git repack -Adl
$ mv multi-pack-index .git/objects/pack/
Commands that cause a lot of objects lookups will now repeatedly invoke
`add_packed_git()`, which leads to three failed access(3p) calls as well
as one failed stat(3p) call. The following strace for example is done
for `git log --patch` in the above repository:
Fix the issue by introducing a negative lookup cache for indexed packs.
This cache works by simply storing an invalid pointer for a missing pack
when `prepare_midx_pack()` fails to look up the pack. Most users of the
`packs` array don't need to be adjusted, either, as they all know to
call `prepare_midx_pack()` before accessing the array.
With this change in place we can now see a significantly reduced number
of syscalls:
Furthermore, this change also results in a speedup:
Benchmark 1: git log --patch (revision = HEAD~)
Time (mean ± σ): 50.4 ms ± 2.5 ms [User: 22.0 ms, System: 24.4 ms]
Range (min … max): 45.4 ms … 54.9 ms 53 runs
Benchmark 2: git log --patch (revision = HEAD)
Time (mean ± σ): 12.7 ms ± 0.4 ms [User: 11.1 ms, System: 1.6 ms]
Range (min … max): 12.4 ms … 15.0 ms 191 runs
Summary
git log --patch (revision = HEAD) ran
3.96 ± 0.22 times faster than git log --patch (revision = HEAD~)
In the end, it should in theory never be necessary to have this negative
lookup cache given that we know to update the multi-pack index together
with repacks. But as the change is quite contained and as the speedup
can be significant as demonstrated above, it does feel sensible to have
the negative lookup cache regardless.
Based-on-patch-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
packfile: explain ordering of how we look up auxiliary pack files
When adding a packfile to an object database we perform four syscalls:
- Three calls to access(3p) are done to check for auxiliary data
structures.
- One call to stat(3p) is done to check for the ".pack" itself.
One curious bit is that we perform the access(3p) calls before checking
for the packfile itself, but if the packfile doesn't exist we discard
all results. The access(3p) calls are thus essentially wasted, so one
may be triggered to reorder those calls so that we can short-circuit the
other syscalls in case the packfile does not exist.
The order in which we look up files is quite important though to help
avoid races:
- When installing a packfile we move auxiliary data structures into
place before we install the ".idx" file.
- When deleting a packfile we first delete the ".idx" and ".pack"
files before deleting auxiliary data structures.
As such, to avoid any races with concurrently created or deleted packs
we need to make sure that we _first_ read auxiliary data structures
before we read the corresponding ".idx" or ".pack" file. Otherwise it
may easily happen that we return a populated but misclassified pack.
Add a comment to `add_packed_git()` to make future readers aware of this
ordering requirement.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
doc: notes: treat --stdin equally between copy/remove
46538012d94 (notes remove: --stdin reads from the standard input,
2011-05-18) added `--stdin` for the `remove` subcommand, documenting it
in the “Options” section. But `copy --stdin` was added before that, in 160baa0d9cb (notes: implement 'git notes copy --stdin', 2010-03-12).
Treat this option equally between the two subcommands:
• remove: mention `--stdin` on the subcommand as well, like for `copy`
• copy: mention it as well under the option documentation
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
doc: notes: clearly state that --stripspace is the default
Clearly state when which of the regular and negated form of the
option take effect.[1]
Also mention the subtle behavior that occurs when you mix options like
`-m` and `-C`, including a note that it might be fixed in the future.
The topic was brought up on v8 of the `--separator` series.[2][3]
[1]: https://lore.kernel.org/git/xmqqcyct1mtq.fsf@gitster.g/
[2]: https://lore.kernel.org/git/xmqq4jp326oj.fsf@gitster.g/
† 3: v11 was the version that landed
Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
doc: notes: remove stripspace discussion from other options
Cleaning up whitespace in metadata is typical porcelain behavior and
this default does not need to be pointed out.[1] Only speak up when
the default `--stripspace` is not used.
Also remove all misleading mentions of comment lines in the process;
see the previous commit.
Also remove the period that trails the parenthetical here.
† 1: See `-F` in git-commit(1) which has nothing to say about whitespace
cleanup. The cleanup discussion is on `--cleanup`.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Document this option by copying the bullet list from git-stripspace(1).
A bullet list is cleaner when there are this many points to consider.
We also get a more standardized description of the multiple-blank-lines
behavior. Compare the repeating (git-notes(1)):
empty lines other than a single line between paragraphs
With (git-stripspace(1)):
multiple consecutive empty lines
And:
leading [...] whitespace
With:
empty lines from the beginning
Leading whitespace in the form of spaces (indentation) are not removed.
However, empty lines at the start of the message are removed.
Note that we drop the mentions of comment line handling because they are
wrong; this option does not control how lines which can be recognized as
comment lines are handled. Only interactivity controls that:
• Comment lines are stripped after editing interactively
• Lines which could be recognized as comment lines are left alone when
the message is given non-interactively
So it is misleading to document the comment line behavior on
this option.
Further, the text is wrong:
Lines starting with `#` will be stripped out in non-editor cases
like `-m`, [...]
Comment lines are still indirectly discussed on other options. We will
deal with them in the next commit.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 27 May 2025 20:59:10 +0000 (13:59 -0700)]
Merge branch 'js/misc-fixes'
Assorted fixes for issues found with CodeQL.
* js/misc-fixes:
sequencer: stop pretending that an assignment is a condition
bundle-uri: avoid using undefined output of `sscanf()`
commit-graph: avoid using stale stack addresses
trace2: avoid "futile conditional"
Avoid redundant conditions
fetch: avoid unnecessary work when there is no current branch
has_dir_name(): make code more obvious
upload-pack: rename `enum` to reflect the operation
commit-graph: avoid malloc'ing a local variable
fetch: carefully clear local variable's address after use
commit: simplify code
Junio C Hamano [Tue, 27 May 2025 20:59:10 +0000 (13:59 -0700)]
Merge branch 'sj/use-mmap-to-check-packed-refs'
The code path to access the "packed-refs" file while "fsck" is
taught to mmap the file, instead of reading the whole file in the
memory.
* sj/use-mmap-to-check-packed-refs:
packed-backend: mmap large "packed-refs" file during fsck
packed-backend: extract snapshot allocation in `load_contents`
packed-backend: fsck should warn when "packed-refs" file is empty
Junio C Hamano [Tue, 27 May 2025 20:59:09 +0000 (13:59 -0700)]
Merge branch 'ds/sparse-apply-add-p'
"git apply" and "git add -i/-p" code paths no longer unnecessarily
expand sparse-index while working.
* ds/sparse-apply-add-p:
p2000: add performance test for patch-mode commands
reset: integrate sparse index with --patch
git add: make -p/-i aware of sparse index
apply: integrate with the sparse index
Junio C Hamano [Tue, 27 May 2025 20:59:09 +0000 (13:59 -0700)]
Merge branch 'rj/build-tweaks-part2'
Updates to meson-based build procedure.
* rj/build-tweaks-part2:
configure.ac: upgrade to a compilation check for sysinfo
meson.build: correct setting of GIT_EXEC_PATH
meson: correct path to system config/attribute files
meson: correct install location of YAML.pm
meson.build: quote the GITWEBDIR build configuration
Junio C Hamano [Tue, 27 May 2025 20:59:08 +0000 (13:59 -0700)]
Merge branch 'jk/no-funny-object-types'
Support to create a loose object file with unknown object type has
been dropped.
* jk/no-funny-object-types:
object-file: drop support for writing objects with unknown types
hash-object: handle --literally with OPT_NEGBIT
hash-object: merge HASH_* and INDEX_* flags
hash-object: stop allowing unknown types
t: add lib-loose.sh
t/helper: add zlib test-tool
oid_object_info(): drop type_name strbuf
fsck: stop using object_info->type_name strbuf
oid_object_info_convert(): stop using string for object type
cat-file: use type enum instead of buffer for -t option
object-file: drop OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag
cat-file: make --allow-unknown-type a noop
object-file.h: fix typo in variable declaration
Function 'escapeRefName' introduced in 51a7e6dbc9 has never been used.
Despite being dead code, changes in Perl 5.41.4 exposed precedence
warning within its logic, which then caused test failures in t9402 by
logging the warnings to stderr while parsing the code. The affected
tests are t9402.30, t9402.31, t9402.32 and t9402.34.
Remove this unused function to simplify the codebase and stop the
warnings and test failures. Its corresponding unescapeRefName function,
which remains in use, has had its comments updated.
Reported-by: Jitka Plesnikova <jplesnik@redhat.com> Signed-off-by: Ondřej Pohořelský <opohorel@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark Mentovai [Sat, 24 May 2025 03:40:46 +0000 (23:40 -0400)]
apply: set file mode when --reverse creates a deleted file
Commit 01aff0a (apply: correctly reverse patch's pre- and post-image
mode bits, 2023-12-26) revised reverse_patches() to maintain the desired
property that when only one of patch::old_mode and patch::new_mode is
set, the mode will be carried in old_mode. That property is generally
correct, with one notable exception: when creating a file, only new_mode
will be set. Since reversing a deletion results in a creation, new_mode
must be set in that case.
Omitting handling for this case means that reversing a patch that
removes an executable file will not result in the executable permission
being set on the re-created file. Existing test coverage for file modes
focuses only on mode changes of existing files.
Swap old_mode and new_mode in reverse_patches() for what's represented
in the patch as a file deletion, as it is transformed into a file
creation under reversal. This causes git apply --reverse to set the
executable permission properly when re-creating a deleted executable
file.
Add tests ensuring that git apply sets file modes correctly on file
creation, both in the forward and reverse directions.
Signed-off-by: Mark Mentovai <mark@chromium.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark Mentovai [Sat, 24 May 2025 03:40:45 +0000 (23:40 -0400)]
t4129: test that git apply warns for unexpected mode changes
There is no test covering what commit 01aff0a (apply: correctly reverse
patch's pre- and post-image mode bits, 2023-12-26) addressed. Prior to
that commit, git apply was erroneously unaware of a file's expected mode
while reverse-patching a file whose mode was not changing.
Add the missing test coverage to assure that git apply is aware of the
expected mode of a file being patched when the patch does not indicate
that the file's mode is changing. This is achieved by arranging a file
mode so that it doesn't agree with patch being applied, and checking git
apply's output for the warning it's supposed to raise in this situation.
Test in both reverse and normal (forward) directions.
Signed-off-by: Mark Mentovai <mark@chromium.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Fri, 23 May 2025 22:34:07 +0000 (15:34 -0700)]
Merge branch 'ds/scalar-no-maintenance'
Two "scalar" subcommands that adds a repository that hasn't been
under "scalar"'s control are taught an option not to enable the
scheduled maintenance on it.
Junio C Hamano [Fri, 23 May 2025 22:34:06 +0000 (15:34 -0700)]
Merge branch 'js/ci-build-win-in-release-mode'
win+Meson CI pipeline, unlike other pipelines for Windows,
used to build artifacts in develper mode, which has been changed to
build them in release mode for consistency.
* js/ci-build-win-in-release-mode:
ci(win+Meson): build in Release mode
Phillip Wood [Thu, 22 May 2025 15:55:23 +0000 (16:55 +0100)]
midx docs: clarify tie breaking
Clarify what happens when an object exists in more than one pack, but
not in the preferred pack. "git multi-pack-index repack" relies on ties
for objects that are not in the preferred pack being resolved in favor
of the newest pack that contains a copy of the object. If ties were
resolved in favor of the oldest pack as the current documentation
suggests the multi-pack index would not reference any of the objects in
the pack created by "git multi-pack-index repack".
Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Phillip Wood [Thu, 22 May 2025 15:55:22 +0000 (16:55 +0100)]
midx: avoid negative array index
nth_midxed_pack_int_id() returns the index of the pack file in the multi
pack index's list of packfiles that the specified object. The index is
returned as a uint32_t. Storing this in an int will make the index
negative if the most significant bit is set. Fix this by using uint32_t
as the rest of the code does. This is unlikely to be a practical problem
as it requires the multipack index to reference 2^31 packfiles.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Phillip Wood [Thu, 22 May 2025 15:55:21 +0000 (16:55 +0100)]
midx repack: avoid potential integer overflow on 64 bit systems
On a 64 bit system the calculation
p->pack_size * pack_info[i].referenced_objects
could overflow. If a pack file contains 2^28 objects with an average
compressed size of 1KB then the pack size will be 2^38B. If all of the
objects are referenced by the multi-pack index the sum above will
overflow. Avoid this by using shifted integer arithmetic and changing
the order of the calculation so that the pack size is divided by the
total number of objects in the pack before multiplying by the number of
objects referenced by the multi-pack index. Using a shift of 14 bits
should give reasonable accuracy while avoiding overflow for pack sizes
less that 1PB.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The calculation to estimated size of the objects in the pack referenced
by the multi-pack-index uses st_mult() to multiply the pack size by the
number of referenced objects before dividing by the total number of
objects in the pack. As size_t is 32 bits on 32 bit systems this
calculation easily overflows. Fix this by using 64bit arithmetic instead.
Also fix a potential overflow when caluculating the total size of the
objects referenced by the multipack index with a batch size larger
than SIZE_MAX / 2. In that case
total_size += estimated_size
can overflow as both total_size and estimated_size can be greater that
SIZE_MAX / 2. This is addressed by using saturating arithmetic for the
addition. Although estimated_size is of type uint64_t by the time we
reach this sum it is bounded by the batch size which is of type size_t
and so casting estimated_size to size_t does not truncate the value.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Alex Mironov [Wed, 21 May 2025 21:29:31 +0000 (21:29 +0000)]
name-hash: don't add sparse directories in threaded lazy init
Ensure that logic added in 5f11669586 (name-hash: don't add directories
to name_hash, 2021-04-12) also applies in multithreaded hashtable init
path.
As per the original single-threaded change above: sparse directory entries
represent a directory that is outside the sparse-checkout definition.
These are not paths to blobs, so should not be added to the name_hash
table. Instead, they should be added to the directory hashtable when
'ignore_case' is true.
Add a condition to avoid placing sparse directories into the name_hash
hashtable. This avoids filling the table with extra entries that will
never be queried.
Signed-off-by: Alex Mironov <alexandrfox@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Karthik Nayak [Tue, 20 May 2025 14:40:12 +0000 (16:40 +0200)]
t: remove unexpected SANITIZE_LEAK variables
As of 1fc7ddf35b (test-lib: unconditionally enable leak checking,
2024-11-20), both the `GIT_TEST_PASSING_SANITIZE_LEAK` and
`TEST_PASSES_SANITIZE_LEAK` variables no longer have any meaning, the
leak checks are enabled by default. However, some newly added tests
include them by mistake. Let's clean this up.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Acked-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Justin Tobler [Tue, 20 May 2025 16:32:18 +0000 (11:32 -0500)]
builtin/receive-pack: add option to skip connectivity check
During git-receive-pack(1), connectivity of the object graph is
validated to ensure that the received packfile does not leave the
repository in a broken state. This is done via git-rev-list(1) and
walking the objects, which can be expensive for large repositories.
Generally, this check is critical to avoid an incomplete received
packfile from corrupting a repository. Server operators may have
additional knowledge though around exactly how Git is being used on the
server-side which can be used to facilitate more efficient connectivity
computation of incoming objects.
For example, if it can be ensured that all objects in a repository are
connected and do not depend on any missing objects, the connectivity of
newly written objects can be checked by walking the object graph
containing only the new objects from the updated tips and identifying
the missing objects which represent the boundary between the new objects
and the repository. These boundary objects can be checked in the
canonical repository to ensure the new objects connect as expected and
thus avoid walking the rest of the object graph.
Git itself cannot make the guarantees required for such an optimization
as it is possible for a repository to contain an unreachable object that
references a missing object without the repository being considered
corrupt.
Introduce the --skip-connectivity-check option for git-receive-pack(1)
which bypasses this connectivity check to give more control to the
server-side. Note that without proper server-side validation of newly
received objects handled outside of Git, usage of this option risks
corrupting a repository.
Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Justin Tobler [Tue, 20 May 2025 16:32:17 +0000 (11:32 -0500)]
t5410: test receive-pack connectivity check
As part of git-recieve-pack(1), the connectivity of objects is checked.
Add a test validating that git-receive-pack(1) fails due to an incoming
packfile that would leave the repository with missing objects. Instead
of creating a new test file, "t5410" is generalized for receive-pack
testing.
Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Mon, 19 May 2025 23:02:45 +0000 (16:02 -0700)]
Merge branch 'ag/doc-send-email'
The `send-email` documentation has been updated with OAuth2.0
related examples.
* ag/doc-send-email:
docs: add credential helper for outlook and gmail in OAuth list of helpers
docs: improve send-email documentation
send-mail: improve checks for valid_fqdn
Bundle-URI feature did not use refs recorded in the bundle other
than normal branches as anchoring points to optimize the follow-up
fetch during "git clone"; now it is told to utilize all.
* sc/bundle-uri-use-all-refs-in-bundle:
bundle-uri: add test for bundle-uri clones with tags
bundle-uri: copy all bundle references ino the refs/bundle space
Ramsay Jones [Mon, 19 May 2025 16:25:23 +0000 (17:25 +0100)]
configure.ac: upgrade to a compilation check for sysinfo
Commit f5e3c6c57d ("meson: do a full usage-based compile check for
sysinfo", 2025-04-25) updated the 'sysinfo()' check, as part of the
meson build, due to the failure of the check on Solaris. Prior to
that commit, the meson build only checked the availability of the
'<sys/sysinfo.h>' header file. On Solaris, both the header and the
'sysinfo()' function exist, but are completely unrelated to the same
function on Linux (and cygwin).
Commit 50dec7c566 ("config.mak.uname: add sysinfo() configuration for
cygwin", 2025-04-17) added a similar 'sysinfo()' check to the autoconf
build. This check looked for the 'sysinfo()' function itself, rather
than just the header, but it will fail (incorrectly set HAVE_SYSINFO)
for the same reason.
In order to correctly identify the 'sysinfo()' function we require as
part of 'git-gc' (used in the 'total_ram() function), we also upgrade
to a compilation check, in a similar way to the meson commit. Note that
since commit c9a51775a3 ("builtin/gc.c: correct RAM calculation when
using sysinfo", 2025-04-17) both the 'totalram' and 'mem_unit' fields
of the 'struct sysinfo' are used, so the new check includes both of
those fields in the compile check.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ramsay Jones [Mon, 19 May 2025 16:25:22 +0000 (17:25 +0100)]
meson.build: correct setting of GIT_EXEC_PATH
For the non-'runtime prefix' case, the meson build sets the GIT_EXEC_PATH
build variable to an absolute path equivalent to <prefix>/libexec/git-core.
In comparison, the default make build sets it to a relative path equivalent
to 'libexec/git-core'. Indeed, the make build requires the use of some
means outside of the Makefile (eg. config.mak[.*] or the command-line)
to set GIT_EXEC_PATH to anything other than 'libexec/git-core'.
For example, the make invocation:
$ make gitexecdir=/some/other/bin all install
will build git with GIT_EXEC_PATH set to '/some/other/bin' and install
the 'library' executables to that location. However, without setting the
'gitexecdir' make variable, irrespective of the 'runtime prefix' setting,
the GIT_EXEC_PATH is always set to 'libexec/git-core'.
The meson built-in 'libexecdir' option can be used to provide a similar
configurability. The default value for the option is 'libexec'. Attempting
to set the option to '' on the command-line, will reset it to the '.'
string, presumably to ensure a relative path value.
This commit allows the meson build, similar to the above, to configure the
project like:
so that the GIT_EXEC_PATH is set to '/some/other/bin'. Absent the
-Dlibexecdir argument, the GIT_EXEC_PATH is set to 'libexec/git-core'.
In order to correct the value of GIT_EXEC_PATH, default the value to the
static string value 'libexec/git-core', and only override if the value
of the 'libexecdir' option has a value different to 'libexec' or '.'.
Also, like the Makefile, add a check for an absolute path when the
runtime prefix option is true (and if so, error out).
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ramsay Jones [Mon, 19 May 2025 16:25:21 +0000 (17:25 +0100)]
meson: correct path to system config/attribute files
The path to the system-wide config and attributes files are not being
set correctly in the meson build. Unless explicitly overridden on the
command line during setup, the 'gitconfig' and 'gitattributes' options
are defaulting to absolute paths in the '/etc' system directory. This
is only appropriate if the <prefix> is set specifically to '/usr'.
The directory in which these files are placed is generally referred to
as the 'system configuration directory' or 'sysconfdir' for short. When
the prefix is '/usr' then the sysconfdir is usually set to '/etc', but
any other value for prefix results in the relative directory value 'etc'
instead. (eg if prefix is '/usr/local', then the 'etc' relative value
results in a system configuration directory of '/usr/local/etc'). When
setting the 'sysconfdir' builtin option value, the meson system uses
exactly this algorithm, so we can use get_option('sysconfdir') directly
when setting the (non-overridden) build variables.
In order to allow for overriding from the command line, remove the
default values specified for the 'gitconfig' and 'gitattributes' options
in the 'meson_options.txt' file. This allows the user to specify any
pathname for those options, while being able to test for the unset
(empty) value. An absolute pathname will be used unchanged and a relative
pathname will be appended to '<prefix>/'. These values are then used to
set the 'ETC_GITCONFIG' and 'ETC_GITATTRIBUTES' build variables which are,
in turn, passed to the compiler as '-D' arguments.
When the 'gitconfig' or 'gitattributes' options are not used, then use
the built-in 'sysconfdir' and set the ETC_GITCONFIG build variable to
the string "<sysconfdir>/gitconfig". Similarly, set ETC_ATTRIBUTES to
"<sysconfdir>/gitattributes".
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ramsay Jones [Mon, 19 May 2025 16:25:20 +0000 (17:25 +0100)]
meson: correct install location of YAML.pm
When executing an 'meson install' the YAML.pm file is incorrectly
placed in the <prefix>/share/perl5/Git/SVN directory. The YAML.pm
file should be placed in a 'Memoize' subdirectory instead. In order
to correct the location, update the 'install_dir' of the relevant
target in the 'perl/Git/SVN/Memoize/meson.build' file.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ramsay Jones [Mon, 19 May 2025 16:25:19 +0000 (17:25 +0100)]
meson.build: quote the GITWEBDIR build configuration
The build configuration options with (non-empty) values, for example
filesystem paths potentially containing spaces, have been set using
the '.set_quoted()' method. However, the GITWEBDIR value has been
set using the '.set()' method instead. In order to correctly quote
the GITWEBDIR value, replace the '.set()' method with '.set_quoted()'.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Eli Schwartz [Mon, 19 May 2025 17:09:42 +0000 (13:09 -0400)]
meson: reformat default options to workaround bug in `meson configure`
Since 13cb20fc46 ("meson: fix compilation with Visual Studio",
2025-01-22) it has not been possible to list build options via `meson
configure`. This is due to Meson's static analysis of build options
failing to handle constant folding, and thinking we set a totally
invalid default `-std=`.
This is reported upstream but we anyways need to work with existing
versions. It turns out there is a simple solution: turn the entire
default option into a conditional branch, which means Meson sees either
nothing, or everything.
As a result, Git users can once again see pretty-printed options before
building.
Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Bug: https://github.com/mesonbuild/meson/issues/14623 Signed-off-by: Eli Schwartz <eschwartz@gentoo.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
K Jayatheerth [Sun, 18 May 2025 07:43:17 +0000 (13:13 +0530)]
docs: replace git_config to repo_config
Since this document was written, the built-in API has been
updated a few times, but the document was left stale.
Adjust to the current best practices by calling repo_config() on the
repository instance the subcommand implementation receives as a
parameter, instead of calling git_config() that used to be the
common practice.
Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
K Jayatheerth [Sun, 18 May 2025 07:43:16 +0000 (13:13 +0530)]
docs: clarify cmd_psuh signature and explain UNUSED macro
The sample program, as written, would no longer build for at least two
reasons:
- Since this document was first written, the convention to call a
subcommand implementation has changed, and cmd_psuh() now needs
to accept the fourth parameter, repository.
- These days, compiler warning options for developers include one
that detects and complains about unused parameters, so ones that
are deliberately unused have to be marked as such.
Update the old-style examples to adjust to the current practices,
with explanations as needed.
Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
K Jayatheerth [Sun, 18 May 2025 07:43:15 +0000 (13:13 +0530)]
docs: remove unused mentoring mailing list reference
The git-mentoring group was initially created to help newcomers
with their development itches. However, in practice,
most of their questions were already being addressed
directly on the mailing list, and contributors consistently
received helpful responses there.
Remove the mentoring group details from the Documentation.
Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Fri, 16 May 2025 20:04:18 +0000 (20:04 +0000)]
merge-tree: add a new --quiet flag
Git Forges may be interested in whether two branches can be merged while
not being interested in what the resulting merge tree is nor which files
conflicted. For such cases, add a new --quiet flag which
will make use of the new mergeability_only flag added to merge-ort in
the previous commit. This option allows the merge machinery to, in the
outer layer of the merge:
* exit early when a conflict is detected
* avoid writing (most) merged blobs/trees to the object store
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Fri, 16 May 2025 20:04:17 +0000 (20:04 +0000)]
merge-ort: add a new mergeability_only option
Git Forges may be interested in whether two branches can be merged while
not being interested in what the resulting merge tree is nor which files
conflicted. For such cases, add a new mergeability_only option. This
option allows the merge machinery to, in the "outer layer" of the merge:
* exit upon first[-ish] conflict
* avoid (not prevent) writing merged blobs/trees to the object store
I have a number of qualifiers there, so let me explain each:
"outer layer":
Note that since the recursive merge of merge bases (corresponding to
call_depth > 0) can conflict without the outer final merge
(corresponding to call_depth == 0) conflicting, we can't short-circuit
nor avoid writing merged blobs/trees to the object store during those
inner merges.
"first-ish conflict":
The current patch only exits early from process_entries() on the first
conflict it detects, but conflicts could have been detected in a
previous function call, namely detect_and_process_renames(). However:
* conflicts detected by detect_and_process_renames() are quite rare
conflict types
* the detection would still come after regular rename detection
(which is the expensive part of detect_and_process_renames()), so
it is not saving us much in computation time given that
process_entries() directly follows detect_and_process_renames()
* [this overlaps with the next bullet point] process_entries() is the
place where virtually all object writing occurs (object writing is
sometimes more of a concern for Forges than computation time), so
exiting early here isn't saving us much in object writes either
* the code changes needed to handle an earlier exit are slightly
more invasive in detect_and_process_renames() than for
process_entries().
Given the rareness of the even earlier conflicts, the limited savings
we'd get from exiting even earlier, and in an attempt to keep this
patch simpler, we don't guarantee that we actually exit on the first
conflict detected. We can always revisit this decision later if we
decide that a further micro-optimization to exit slightly earlier in
rare cases is worthwhile.
"avoid (not prevent) writing objects":
The detect_and_process_renames() call can also write objects to the
object store, when rename/rename conflicts involve one (or more) files
that have also been modified on both sides. Because of this alternate
call path leading to handle_content_merges(), our "early exit" does not
prevent writing objects entirely, even within the "outer layer"
(i.e. even within call_depth == 0). I figure that's fine though, since
we're already writing objects for the inner merges (i.e. for call_depth
> 0), which are likely going to represent vastly more objects than files
involved in rename/rename+modify/modify cases in the outer merge, on
average.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Fri, 16 May 2025 16:26:26 +0000 (16:26 +0000)]
sequencer: make it clearer that commit descriptions are just comments
Every once in a while, users report that editing the commit summaries
in the todo list does not get reflected in the rebase operation,
suggesting that users are (a) only using one-line commit messages, and
(b) not understanding that the commit summaries are merely helpful
comments to help them find the right hashes.
It may be difficult to correct users' poor commit messages, but we can
at least try to make it clearer that the commit summaries are not
directives of some sort by inserting a comment character. Hopefully
that leads to them looking a little further and noticing the hints at
the bottom to use 'reword' or 'edit' directives.
Yes, this change may look funny at first since it hardcodes '#' rather
than using comment_line_str. However:
* comment_line_str exists to allow disambiguation between lines in
a commit message and lines that are instructions to users editing
the commit message. No such disambiguation is needed for these
comments that occur on the same line after existing directives
* the exact "comment" character(s) on regular pick lines used aren't
actually important; I could have used anything, including completely
random variable length text for each line and it'd work because we
ignore everything after 'pick' and the hash.
* The whole point of this change is to signal to users that they
should NOT be editing any part of the line after the hash (and if
they do so, their edits will be ignored), while the whole point of
comment_line_str is to allow highly flexible editing. So making
it more general by using comment_line_str actually feels
counterproductive.
* The character for merge directives absolutely must be '#'; that
has been deeply hardcoded for a long time (see below), and will
break if some other comment character is used instead. In a
desire to have pick and merge directives be similar, I use the
same comment character for both.
* Perhaps merge directives could be fixed to not be inflexible about
the comment character used, if someone feels highly motivated, but
I think that should be done in a separate follow-on patch.
Here are (some of?) the locations where '#' has already been hardcoded
for a long time for merges:
1) In check_label_or_ref_arg():
case TODO_LABEL:
/*
* '#' is not a valid label as the merge command uses it to
* separate merge parents from the commit subject.
*/
2) In do_merge():
/*
* For octopus merges, the arg starts with the list of revisions to be
* merged. The list is optionally followed by '#' and the oneline.
*/
merge_arg_len = oneline_offset = arg_len;
for (p = arg; p - arg < arg_len; p += strspn(p, " \t\n")) {
if (!*p)
break;
if (*p == '#' && (!p[1] || isspace(p[1]))) {
3) In label_oid():
if ((buf->len == the_hash_algo->hexsz &&
!get_oid_hex(label, &dummy)) ||
(buf->len == 1 && *label == '#') ||
hashmap_get_from_hash(&state->labels,
strihash(label), label)) {
/*
* If the label already exists, or if the label is a
* valid full OID, or the label is a '#' (which we use
* as a separator between merge heads and oneline), we
* append a dash and a number to make it unique.
*/
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Fri, 16 May 2025 14:55:30 +0000 (14:55 +0000)]
p2000: add performance test for patch-mode commands
The previous three changes contributed performance improvements to 'git
apply', 'git add -p', and 'git reset -p' when using a sparse index. The
improvement to 'git apply' also improved 'git checkout -p'. Add
performance tests to demonstrate this (and to help validate that
performance remains good in the future).
In the truncated test output below, we see that the full checkout
performance changes within noise expectations, but the sparse index
cases improve 33% and then 96% for 'git add -p' and 41% and then 95% for
'git reset -p'. 'git checkout -p' improves immediatley by 91% because it
does not need any change to its builtin.
It is worth noting that if our test was more involved and had multiple
hunks to evaluate, then the time spent in 'git apply' would dominate due
to multiple index loads and writes. As it stands, we need the sparse
index improvement in 'git add -p' itself to confirm this performance
improvement.
Since the change for 'git add -i' is identical, we avoid a second test
case for that similar operation.
Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Fri, 16 May 2025 14:55:29 +0000 (14:55 +0000)]
reset: integrate sparse index with --patch
Similar to the previous change for 'git add -p', the reset builtin
checked for integration with the sparse index after possibly redirecting
its logic toward the interactive logic. This means that the builtin
would expand the sparse index to a full one upon read.
Move this check earlier within cmd_reset() to improve performance here.
Add tests to guarantee that we are not universally expanding the index.
Add behavior tests to check that we are doing the same operations as a
full index.
Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Fri, 16 May 2025 14:55:28 +0000 (14:55 +0000)]
git add: make -p/-i aware of sparse index
It is slow to expand a sparse index in-memory due to parsing of trees.
We aim to minimize that performance cost when possible. 'git add -p'
uses 'git apply' child processes to modify the index, but still there
are some expansions that occur.
It turns out that control flows out of cmd_add() in the interactive
cases before the lines that confirm that the builtin is integrated with
the sparse index.
Moving that integration point earlier in cmd_add() allows 'git add -i'
and 'git add -p' to operate without expanding a sparse index to a full
one.
Add test cases that confirm that these interactive add options work with
the sparse index.
Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Fri, 16 May 2025 14:55:27 +0000 (14:55 +0000)]
apply: integrate with the sparse index
The sparse index allows storing directory entries in the index, marked
with the skip-wortkree bit and pointing to a tree object. This may be an
unexpected data shape for some implementation areas, so we are rolling
it out incrementally on a builtin-per-builtin basis.
This change enables the sparse index for 'git apply'. The main
motivation for this change is that 'git apply' is used as a child
process of 'git add -p' and expanding the sparse index for each of those
child processes can lead to significant performance issues.
The good news is that the actual index manipulation code used by 'git
apply' is already integrated with the sparse index, so the only product
change is to mark the builtin as allowing the sparse index so it isn't
inflated on read.
The more involved part of this change is around adding tests that verify
how 'git apply' behaves in a sparse-checkout environment and whether or
not the index expands in certain operations.
Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Moumita Dhar [Fri, 16 May 2025 14:45:12 +0000 (20:15 +0530)]
userdiff: extend Bash pattern to cover more shell function forms
The previous function regex required explicit matching of function
bodies using `{`, `(`, `((`, or `[[`, which caused several issues:
- It failed to capture valid functions where `{` was on the next line
due to line continuation (`\`).
- It did not recognize functions with single command body, such as
`x () echo hello`.
Replacing the function body matching logic with `.*$`, ensures
that everything on the function definition line is captured.
Additionally, the word regex is refined to better recognize shell
syntax, including additional parameter expansion operators and
command-line options.
Signed-off-by: Moumita Dhar <dhar61595@gmail.com> Acked-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 16 May 2025 04:50:13 +0000 (00:50 -0400)]
object-file: drop support for writing objects with unknown types
Since "hash-object --literally" no longer supports objects with unknown
types, there are now no callers of write_object_file_literally() and its
helpers. Let's drop them to simplify the code.
In particular, this gets rid of some ugly copy-and-paste code from
write_object_file_literally(), which is a parallel implementation of
write_object_file(). When the split was originally made, the two weren't
that long, but commits like 63a6745a07 (object-file: update the loose
object map when writing loose objects, 2023-10-01) ended up having to
duplicate some tricky code.
This patch drops all of that duplication and should make things less
error-prone going forward.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 16 May 2025 04:50:10 +0000 (00:50 -0400)]
hash-object: handle --literally with OPT_NEGBIT
Since we recently removed the hash_literally() function, the hash-object
--literally option has been simplified to just removing the
INDEX_FORMAT_CHECK flag. Rather than pass it around as a separate bool,
we can just have the option parser remove the bit from the set of flags
directly. This simplifies the helper functions.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 16 May 2025 04:50:08 +0000 (00:50 -0400)]
hash-object: merge HASH_* and INDEX_* flags
The hash-object command has its own custom flag bits that it sets based
on command-line options. But since we dropped hash_literally() in the
previous commit, the only thing we do with those flag bits is convert
them directly into "index_flags" to pass to index_fd().
This extra layer of indirection makes the code harder to read and reason
about. Let's just use the INDEX_* flags directly.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 16 May 2025 04:50:05 +0000 (00:50 -0400)]
hash-object: stop allowing unknown types
When passed the "--literally" option, hash-object will allow any
arbitrary string for its "-t" type option. Such objects are only useful
for testing or debugging, as they cannot be used in the normal way
(e.g., you cannot fetch their contents!).
Let's drop this feature, which will eventually let us simplify the
object-writing code. This is technically backwards incompatible, but
since such objects were never really functional, it seems unlikely that
anybody will notice.
We will retain the --literally flag, as it also instructs hash-object
not to worry about other format issues (e.g., type-specific things that
fsck would complain about). The documentation does not need to be
updated, as it was always vague about which checks we're loosening (it
uses only the phrase "any garbage").
The code change is a bit hard to verify from just the patch text. We can
drop our local hash_literally() helper, but it was really just wrapping
write_object_file_literally(). We now replace that with calling
index_fd(), as we do for the non-literal code path, but dropping the
INDEX_FORMAT_CHECK flag. This ends up being the same semantically as
what the _literally() code path was doing (modulo handling unknown
types, which is our goal).
We'll be able to clean up these code paths a bit more in subsequent
patches.
The existing test is flipped to show that we now reject the unknown
type. The additional "extra-long type" test is now redundant, as we bail
early upon seeing a bogus type.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 16 May 2025 04:50:02 +0000 (00:50 -0400)]
t: add lib-loose.sh
This commit adds a shell library for writing raw loose objects into the
object database. Normally this is done with hash-object, but the
specific intent here is to allow broken objects that hash-object may not
support.
We'll convert several cases that use "hash-object --literally" to write
objects with invalid types. That works currently, but dropping this
dependency will allow us to remove that feature and simplify the
object-writing code.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 16 May 2025 04:49:59 +0000 (00:49 -0400)]
t/helper: add zlib test-tool
It's occasionally useful when testing or debugging to be able to do raw
zlib inflate/deflate operations (e.g., to check the bytes of a specific
loose or packed object).
Even though zlib's deflate algorithm is used by many other programs,
this is surprisingly hard to do in a portable way. E.g., gzip can do
this if you manually munge some header bytes. But the result is somewhat
arcane, and we don't assume gzip is available anyway. Likewise, pigz
will handle raw zlib, but we can't assume it is available.
So let's introduce a short test helper for just doing zlib operations.
We'll use it in subsequent patches to add some new tests, but it would
also have come in handy a few times in the past:
- The hard-coded pack data from 3b910d0c5e (add tests for indexing
packs with delta cycles, 2013-08-23) could probably be generated on
the fly.
- Likewise we could avoid the hard-coded data from 0b1493c2d4
(git_inflate(): skip zlib_post_call() sanity check on Z_NEED_DICT,
2025-02-25). Though note this would require support for more zlib
options.
- It would have helped with the debugging documented in 41dfbb2dbe
(howto: add article on recovering a corrupted object, 2013-10-25).
I'll leave refactoring existing tests for another day, but I hope the
examples above show the general utility.
I aimed for simplicity in the code. In particular, it will read all
input into a memory buffer, rather than streaming. That makes the zlib
loops harder to get wrong (which has been a source of subtle bugs in the
past).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>