Derrick Stolee [Fri, 16 May 2025 18:11:55 +0000 (18:11 +0000)]
pack-objects: introduce GIT_TEST_PACK_PATH_WALK
There are many tests that validate whether 'git pack-objects' works as
expected. Instead of duplicating these tests, add a new test environment
variable, GIT_TEST_PACK_PATH_WALK, that implies --path-walk by default
when specified.
This was useful in testing the implementation of the --path-walk
implementation, helping to find tests that are overly specific to the
default object walk. These include:
- t0411-clone-from-partial.sh : One test fetches from a repo that does
not have the boundary objects. This causes the path-based walk to
fail. Disable the variable for this test.
- t5306-pack-nobase.sh : Similar to t0411, one test fetches from a repo
without a boundary object.
- t5310-pack-bitmaps.sh : One test compares the case when packing with
bitmaps to the case when packing without them. Since we disable the
test variable when writing bitmaps, this causes a difference in the
object list (the --path-walk option adds an extra object). Specify
--no-path-walk in both processes for the comparison. Another test
checks for a specific delta base, but when computing dynamically
without using bitmaps, the base object it too small to be considered
in the delta calculations so no base is used.
- t5316-pack-delta-depth.sh : This script cares about certain delta
choices and their chain lengths. The --path-walk option changes how
these chains are selected, and thus changes the results of this test.
- t5322-pack-objects-sparse.sh : This demonstrates the effectiveness of
the --sparse option and how it combines with --path-walk.
- t5332-multi-pack-reuse.sh : This test verifies that the preferred
pack is used for delta reuse when possible. The --path-walk option is
not currently aware of the preferred pack at all, so finds a
different delta base.
- t7406-submodule-update.sh : When using the variable, the --depth
option collides with the --path-walk feature, resulting in a warning
message. Disable the variable so this warning does not appear.
I want to call out one specific test change that is only temporary:
- t5530-upload-pack-error.sh : One test cares specifically about an
"unable to read" error message. Since the current implementation
performs delta calculations within the path-walk API callback, a
different "unable to get size" error message appears. When this
is changed in a future refactoring, this test change can be reverted.
Similar to GIT_TEST_NAME_HASH_VERSION, we do not add this option to the
linux-TEST-vars CI build as that's already an overloaded build.
Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Fri, 16 May 2025 18:11:54 +0000 (18:11 +0000)]
p5313: add performance tests for --path-walk
The previous change added a --path-walk option to 'git pack-objects'.
Create a performance test that demonstrates the time and space benefits
of the feature.
In order to get an appropriate comparison, we need to avoid reusing
deltas and recompute them from scratch.
Compare the creation of a thin pack representing a small push and the
creation of a relatively large non-thin pack.
Running on my copy of the Git repository results in this data (removing
the repack tests for --name-hash-version):
Test this tree
------------------------------------------------------------------------
5313.2: thin pack with --name-hash-version=1 0.02(0.01+0.01)
5313.3: thin pack size with --name-hash-version=1 1.6K
5313.4: big pack with --name-hash-version=1 2.55(4.20+0.26)
5313.5: big pack size with --name-hash-version=1 16.4M
5313.6: shallow fetch pack with --name-hash-version=1 1.24(2.03+0.08)
5313.7: shallow pack size with --name-hash-version=1 12.2M
5313.10: thin pack with --name-hash-version=2 0.03(0.01+0.01)
5313.11: thin pack size with --name-hash-version=2 1.6K
5313.12: big pack with --name-hash-version=2 1.91(3.23+0.20)
5313.13: big pack size with --name-hash-version=2 16.4M
5313.14: shallow fetch pack with --name-hash-version=2 1.06(1.57+0.10)
5313.15: shallow pack size with --name-hash-version=2 12.5M
5313.18: thin pack with --path-walk 0.03(0.01+0.01)
5313.19: thin pack size with --path-walk 1.6K
5313.20: big pack with --path-walk 2.05(3.24+0.27)
5313.21: big pack size with --path-walk 16.3M
5313.22: shallow fetch pack with --path-walk 1.08(1.66+0.07)
5313.23: shallow pack size with --path-walk 12.4M
Note that the timing is slower because there is no threading in the
--path-walk case (yet). Also, the shallow pack cases are really not
using the --path-walk logic right now because it is disabled until some
additions are made to the path walk API.
The cases where the --path-walk option really shines is when the default
name-hash is overwhelmed with unhelpful collisions. An open source
example can be found in the microsoft/fluentui repo [1] at a certain
commit [2].
Notice in particular that in the small thin pack, the time performance
has improved from 0.36s for --name-hash-version=1 to 0.08s and this is
likely due to the improved size of the resulting pack: 18.4K instead of
1.2M. The relatively new --name-hash-version=2 is competitive with
--path-walk (0.12s and 22.0K) but not quite as successful.
Finally, running this on a copy of the Linux kernel repository results
in these data points:
Derrick Stolee [Fri, 16 May 2025 18:11:53 +0000 (18:11 +0000)]
pack-objects: update usage to match docs
The t0450 test script verifies that builtin usage matches the synopsis
in the documentation. Adjust the builtin to match and then remove 'git
pack-objects' from the exception list.
Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Fri, 16 May 2025 18:11:52 +0000 (18:11 +0000)]
pack-objects: add --path-walk option
In order to more easily compute delta bases among objects that appear at
the exact same path, add a --path-walk option to 'git pack-objects'.
This option will use the path-walk API instead of the object walk given
by the revision machinery. Since objects will be provided in batches
representing a common path, those objects can be tested for delta bases
immediately instead of waiting for a sort of the full object list by
name-hash. This has multiple benefits, including avoiding collisions by
name-hash.
The objects marked as UNINTERESTING are included in these batches, so we
are guaranteeing some locality to find good delta bases.
After the individual passes are done on a per-path basis, the default
name-hash is used to find other opportunistic delta bases that did not
match exactly by the full path name.
The current implementation performs delta calculations while walking
objects, which is not ideal for a few reasons. First, this will cause
the "Enumerating objects" phase to be much longer than usual. Second, it
does not take advantage of threading during the path-scoped delta
calculations. Even with this lack of threading, the path-walk option is
sometimes faster than the usual approach. Future changes will refactor
this code to allow for threading, but that complexity is deferred until
later to keep this patch as simple as possible.
This new walk is incompatible with some features and is ignored by
others:
* Object filters are not currently integrated with the path-walk API,
such as sparse-checkout or tree depth. A blobless packfile could be
integrated easily, but that is deferred for later.
* Server-focused features such as delta islands, shallow packs, and
using a bitmap index are incompatible with the path-walk API.
* The path walk API is only compatible with the --revs option, not
taking object lists or pack lists over stdin. These alternative ways
to specify the objects currently ignores the --path-walk option
without even a warning.
Future changes will create performance tests that demonstrate the power
of this approach.
Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Fri, 16 May 2025 14:55:30 +0000 (14:55 +0000)]
p2000: add performance test for patch-mode commands
The previous three changes contributed performance improvements to 'git
apply', 'git add -p', and 'git reset -p' when using a sparse index. The
improvement to 'git apply' also improved 'git checkout -p'. Add
performance tests to demonstrate this (and to help validate that
performance remains good in the future).
In the truncated test output below, we see that the full checkout
performance changes within noise expectations, but the sparse index
cases improve 33% and then 96% for 'git add -p' and 41% and then 95% for
'git reset -p'. 'git checkout -p' improves immediatley by 91% because it
does not need any change to its builtin.
It is worth noting that if our test was more involved and had multiple
hunks to evaluate, then the time spent in 'git apply' would dominate due
to multiple index loads and writes. As it stands, we need the sparse
index improvement in 'git add -p' itself to confirm this performance
improvement.
Since the change for 'git add -i' is identical, we avoid a second test
case for that similar operation.
Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Fri, 16 May 2025 14:55:29 +0000 (14:55 +0000)]
reset: integrate sparse index with --patch
Similar to the previous change for 'git add -p', the reset builtin
checked for integration with the sparse index after possibly redirecting
its logic toward the interactive logic. This means that the builtin
would expand the sparse index to a full one upon read.
Move this check earlier within cmd_reset() to improve performance here.
Add tests to guarantee that we are not universally expanding the index.
Add behavior tests to check that we are doing the same operations as a
full index.
Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Fri, 16 May 2025 14:55:28 +0000 (14:55 +0000)]
git add: make -p/-i aware of sparse index
It is slow to expand a sparse index in-memory due to parsing of trees.
We aim to minimize that performance cost when possible. 'git add -p'
uses 'git apply' child processes to modify the index, but still there
are some expansions that occur.
It turns out that control flows out of cmd_add() in the interactive
cases before the lines that confirm that the builtin is integrated with
the sparse index.
Moving that integration point earlier in cmd_add() allows 'git add -i'
and 'git add -p' to operate without expanding a sparse index to a full
one.
Add test cases that confirm that these interactive add options work with
the sparse index.
Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Fri, 16 May 2025 14:55:27 +0000 (14:55 +0000)]
apply: integrate with the sparse index
The sparse index allows storing directory entries in the index, marked
with the skip-wortkree bit and pointing to a tree object. This may be an
unexpected data shape for some implementation areas, so we are rolling
it out incrementally on a builtin-per-builtin basis.
This change enables the sparse index for 'git apply'. The main
motivation for this change is that 'git apply' is used as a child
process of 'git add -p' and expanding the sparse index for each of those
child processes can lead to significant performance issues.
The good news is that the actual index manipulation code used by 'git
apply' is already integrated with the sparse index, so the only product
change is to mark the builtin as allowing the sparse index so it isn't
inflated on read.
The more involved part of this change is around adding tests that verify
how 'git apply' behaves in a sparse-checkout environment and whether or
not the index expands in certain operations.
Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Moumita Dhar [Fri, 16 May 2025 14:45:12 +0000 (20:15 +0530)]
userdiff: extend Bash pattern to cover more shell function forms
The previous function regex required explicit matching of function
bodies using `{`, `(`, `((`, or `[[`, which caused several issues:
- It failed to capture valid functions where `{` was on the next line
due to line continuation (`\`).
- It did not recognize functions with single command body, such as
`x () echo hello`.
Replacing the function body matching logic with `.*$`, ensures
that everything on the function definition line is captured.
Additionally, the word regex is refined to better recognize shell
syntax, including additional parameter expansion operators and
command-line options.
Signed-off-by: Moumita Dhar <dhar61595@gmail.com> Acked-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 16 May 2025 04:50:13 +0000 (00:50 -0400)]
object-file: drop support for writing objects with unknown types
Since "hash-object --literally" no longer supports objects with unknown
types, there are now no callers of write_object_file_literally() and its
helpers. Let's drop them to simplify the code.
In particular, this gets rid of some ugly copy-and-paste code from
write_object_file_literally(), which is a parallel implementation of
write_object_file(). When the split was originally made, the two weren't
that long, but commits like 63a6745a07 (object-file: update the loose
object map when writing loose objects, 2023-10-01) ended up having to
duplicate some tricky code.
This patch drops all of that duplication and should make things less
error-prone going forward.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 16 May 2025 04:50:10 +0000 (00:50 -0400)]
hash-object: handle --literally with OPT_NEGBIT
Since we recently removed the hash_literally() function, the hash-object
--literally option has been simplified to just removing the
INDEX_FORMAT_CHECK flag. Rather than pass it around as a separate bool,
we can just have the option parser remove the bit from the set of flags
directly. This simplifies the helper functions.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 16 May 2025 04:50:08 +0000 (00:50 -0400)]
hash-object: merge HASH_* and INDEX_* flags
The hash-object command has its own custom flag bits that it sets based
on command-line options. But since we dropped hash_literally() in the
previous commit, the only thing we do with those flag bits is convert
them directly into "index_flags" to pass to index_fd().
This extra layer of indirection makes the code harder to read and reason
about. Let's just use the INDEX_* flags directly.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 16 May 2025 04:50:05 +0000 (00:50 -0400)]
hash-object: stop allowing unknown types
When passed the "--literally" option, hash-object will allow any
arbitrary string for its "-t" type option. Such objects are only useful
for testing or debugging, as they cannot be used in the normal way
(e.g., you cannot fetch their contents!).
Let's drop this feature, which will eventually let us simplify the
object-writing code. This is technically backwards incompatible, but
since such objects were never really functional, it seems unlikely that
anybody will notice.
We will retain the --literally flag, as it also instructs hash-object
not to worry about other format issues (e.g., type-specific things that
fsck would complain about). The documentation does not need to be
updated, as it was always vague about which checks we're loosening (it
uses only the phrase "any garbage").
The code change is a bit hard to verify from just the patch text. We can
drop our local hash_literally() helper, but it was really just wrapping
write_object_file_literally(). We now replace that with calling
index_fd(), as we do for the non-literal code path, but dropping the
INDEX_FORMAT_CHECK flag. This ends up being the same semantically as
what the _literally() code path was doing (modulo handling unknown
types, which is our goal).
We'll be able to clean up these code paths a bit more in subsequent
patches.
The existing test is flipped to show that we now reject the unknown
type. The additional "extra-long type" test is now redundant, as we bail
early upon seeing a bogus type.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 16 May 2025 04:50:02 +0000 (00:50 -0400)]
t: add lib-loose.sh
This commit adds a shell library for writing raw loose objects into the
object database. Normally this is done with hash-object, but the
specific intent here is to allow broken objects that hash-object may not
support.
We'll convert several cases that use "hash-object --literally" to write
objects with invalid types. That works currently, but dropping this
dependency will allow us to remove that feature and simplify the
object-writing code.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 16 May 2025 04:49:59 +0000 (00:49 -0400)]
t/helper: add zlib test-tool
It's occasionally useful when testing or debugging to be able to do raw
zlib inflate/deflate operations (e.g., to check the bytes of a specific
loose or packed object).
Even though zlib's deflate algorithm is used by many other programs,
this is surprisingly hard to do in a portable way. E.g., gzip can do
this if you manually munge some header bytes. But the result is somewhat
arcane, and we don't assume gzip is available anyway. Likewise, pigz
will handle raw zlib, but we can't assume it is available.
So let's introduce a short test helper for just doing zlib operations.
We'll use it in subsequent patches to add some new tests, but it would
also have come in handy a few times in the past:
- The hard-coded pack data from 3b910d0c5e (add tests for indexing
packs with delta cycles, 2013-08-23) could probably be generated on
the fly.
- Likewise we could avoid the hard-coded data from 0b1493c2d4
(git_inflate(): skip zlib_post_call() sanity check on Z_NEED_DICT,
2025-02-25). Though note this would require support for more zlib
options.
- It would have helped with the debugging documented in 41dfbb2dbe
(howto: add article on recovering a corrupted object, 2013-10-25).
I'll leave refactoring existing tests for another day, but I hope the
examples above show the general utility.
I aimed for simplicity in the code. In particular, it will read all
input into a memory buffer, rather than streaming. That makes the zlib
loops harder to get wrong (which has been a source of subtle bugs in the
past).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 16 May 2025 04:49:56 +0000 (00:49 -0400)]
oid_object_info(): drop type_name strbuf
We provide a mechanism for callers to get the object type as a raw
string, rather than an object_type enum. This was in theory useful for
returning types that are not representable in the enum, but we consider
any such type to be an error, and there are no callers that use the
strbuf anymore.
Let's drop support to simplify the code a bit.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 16 May 2025 04:49:53 +0000 (00:49 -0400)]
fsck: stop using object_info->type_name strbuf
When fsck-ing a loose object, we use object_info's type_name strbuf to
record the parsed object type as a string. For most objects this is
redundant with the object_type enum, but it does let us report the
string when we encounter an object with an unknown type (for which there
is no matching enum value).
There are a few downsides, though:
1. The code to report these cases is not actually robust. Since we did
not pass a strbuf to unpack_loose_header(), we only retrieved types
from headers up to 32 bytes. In longer cases, we'd simply say
"object corrupt or missing".
2. This is the last caller that uses object_info's type_name strbuf
support. It would be nice to refactor it so that we can simplify
that code.
3. Likewise, we'll check the hash of the object using its unknown type
(again, as long as that type is short enough). That depends on the
hash_object_file_literally() code, which we'd eventually like to
get rid of.
So we can simplify things by bailing immediately in read_loose_object()
when we encounter an unknown type. This has a few user-visible effects:
a. Instead of producing a single line of error output like this:
we'll now issue two lines (the first from read_loose_object() when
we see the unparsable header, and the second from the fsck code,
since we couldn't read the object):
This is a little more verbose, but this sort of error should be
rare (such objects are almost impossible to work with, and cannot
be transferred between repositories as they are not representable
in packfiles). And as a bonus, reporting the broken header in full
could help with debugging other cases (e.g., a header like "blob
xyzzy\0" would fail in parsing the size, but previously we'd not
have showed the offending bytes).
b. An object with an unknown type will be reported as corrupt, without
actually doing a hash check. Again, I think this is unlikely to
matter in practice since such objects are totally unusable.
We'll update one fsck test to match the new error strings. And we can
remove another test that covered the case of an object with an unknown
type _and_ a hash corruption. Since we'll skip the hash check now in
this case, the test is no longer interesting.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 16 May 2025 04:49:50 +0000 (00:49 -0400)]
oid_object_info_convert(): stop using string for object type
In oid_object_info_convert(), we convert objects between their sha1 and
sha256 variants. To do this, we naturally need to know the type, which
we get from oid_object_info_extended() using its type_name strbuf
option.
But getting the value as a string (versus an object_type enum) is not
helpful. Since we do not allow unknown types, the regular enum is
sufficient. And the resulting code is a bit simpler, as we no longer
have to manage the extra allocation nor convert the string to an enum
ourselves.
Note that at first glance, it might seem like we should retain the error
check for "type == -1" to catch bogus types found by the underlying
parser. But we don't need it, as an unknown type would have yielded an
error from the call to oid_object_info_extended(), which would already
have caused us to return an error.
In fact, I suspect this was always impossible to trigger. Even when we
were converting the string to a type enum ourselves, an invalid type
would never have escaped oid_object_info_extended(), since we never
passed the (now removed) OBJECT_INFO_ALLOW_UNKNOWN_TYPE option.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 16 May 2025 04:49:47 +0000 (00:49 -0400)]
cat-file: use type enum instead of buffer for -t option
Now that we no longer support OBJECT_INFO_ALLOW_UNKNOWN_TYPE, there is
no need to pass a strbuf into oid_object_info_extended() to record the
type. The regular object_type enum is sufficient to capture all of the
types we will allow.
This simplifies the code a bit, and will eventually let us drop
object_info's type_name strbuf support.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 16 May 2025 04:49:45 +0000 (00:49 -0400)]
object-file: drop OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag
Since cat-file dropped its "--allow-unknown-type" option in the previous
commit, there are no more uses of the internal flag that implemented it.
Let's drop it.
That in turn lets us drop the strbuf parameter of unpack_loose_header(),
which now is always NULL. And without that, we can drop all of the
additional code to inflate larger headers into the strbuf.
Arguably we could drop ULHR_TOO_LONG, as no callers really care about
the distinction from ULHR_BAD. But it's easy enough to retain, and it
does let us produce a slightly more specific message in one instance.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 16 May 2025 04:49:35 +0000 (00:49 -0400)]
cat-file: make --allow-unknown-type a noop
The cat-file command has some minor support for handling objects with
"unknown" types. I.e., strings that are not "blob", "commit", "tree", or
"tag".
In theory this could be used for debugging or experimenting with
extensions to Git. But in practice this support is not very useful:
1. You can get the type and size of such objects, but nothing else.
Not even the contents!
2. Only loose objects are supported, since packfiles use numeric ids
for the types, rather than strings.
3. Likewise you cannot ever transfer objects between repositories,
because they cannot be represented in the packfiles used for the
on-the-wire protocol.
The support for these unknown types complicates the object-parsing code,
and has led to bugs such as b748ddb7a4 (unpack_loose_header(): fix
infinite loop on broken zlib input, 2025-02-25). So let's drop it.
The first step is to remove the user-facing parts, which are accessible
only via cat-file. This is technically backwards-incompatible, but given
the limitations listed above, these objects couldn't possibly be useful
in any workflow.
However, we can't just rip out the option entirely. That would hurt a
caller who ran:
git cat-file -t --allow-unknown-object <oid>
and fed it normal, well-formed objects. There --allow-unknown-type was
doing nothing, but we wouldn't want to start bailing with an error. So
to protect any such callers, we'll retain --allow-unknown-type as a
noop.
The code change is fairly small (but we'll able to clean up more code in
follow-on patches). The test updates drop any use of the option. We
still retain tests that feed the broken objects to cat-file without
--allow-unknown-type, as we should continue to confirm that those
objects are rejected. Note that in one spot we can drop a layer of loop,
re-indenting the body; viewing the diff with "-w" helps there.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Provide an overview of the set of functions used for manipulating
`json_writer`s, by describing what functions should be used for
each JSON-related task.
Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Patrick Steinhardt <ps@pks.im> Helped-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com> Acked-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a docstring for each function that manipulates json_writers.
Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Patrick Steinhardt <ps@pks.im> Helped-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com> Acked-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Fri, 16 May 2025 00:24:55 +0000 (17:24 -0700)]
Merge branch 'ps/maintenance-missing-tasks'
Make repository clean-up tasks "gc" can do available to "git
maintenance" front-end.
* ps/maintenance-missing-tasks:
builtin/maintenance: introduce "rerere-gc" task
builtin/gc: move rerere garbage collection into separate function
builtin/maintenance: introduce "worktree-prune" task
builtin/gc: move pruning of worktrees into a separate function
builtin/gc: remove global variables where it is trivial to do
builtin/gc: fix indentation of `cmd_gc()` parameters
Junio C Hamano [Fri, 16 May 2025 00:24:55 +0000 (17:24 -0700)]
Merge branch 'cf/wrapper-bsd-eloop'
The fallback implementation of open_nofollow() depended on
open("symlink", O_NOFOLLOW) to set errno to ELOOP, but a few BSD
derived systems use different errno, which has been worked around.
* cf/wrapper-bsd-eloop:
wrapper: NetBSD gives EFTYPE and FreeBSD gives EMFILE where POSIX uses ELOOP
Lidong Yan [Fri, 9 May 2025 08:30:35 +0000 (08:30 +0000)]
commit-graph: fix memory leak when `fill_oids_from_packs()` fails
In commit-graph.c:fill_oids_from_packs, if open_pack_index failed,
memory allocated and returned by add_packed_git will leak. Simply
add close_pack and free(p) will solve this problem.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Lidong Yan [Wed, 14 May 2025 13:53:28 +0000 (13:53 +0000)]
sequencer: fix memory leak if `todo_list_rearrange_squash()` failed
In sequencer.c:todo_list_rearrange_squash, if it fails, memory
allocated in `next`, `tail`, `subjects` and `subject2item` will leak.
Jump to cleanup label before return could fix this leak problem.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Lidong Yan [Tue, 13 May 2025 02:49:10 +0000 (02:49 +0000)]
mailinfo: fix pointential memory leak if `decode_header` failed
In mailinfo.c:decode_header, if convert_to_utf8 failed, the strbuf stored
in dec will leak. Simply add strbuf_release and free(dec) will solve
this problem.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
sequencer: stop pretending that an assignment is a condition
In 3e81bccdf3 (sequencer: factor out todo command name parsing,
2019-06-27), a `return` statement was introduced that basically was a
long sequence of conditions, combined with `&&`, except for the last
condition which is not really a condition but an assignment.
The point of this construct was to return 1 (i.e. `true`) from the
function if all of those conditions held true, and also assign the `bol`
pointer to the end of the parsed command.
Some static analyzers are really unhappy about such constructs. And
human readers are at least puzzled, if not confused, by seeing a single
`=` inside a chain of conditions where they would have expected to see
`==` instead and, based on experience, immediately suspect a typo.
Let's help all of this by turning this into the more verbose, more
readable form of an `if` construct that both assigns the pointer as well
as returns 1 if all of the conditions hold true.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
bundle-uri: avoid using undefined output of `sscanf()`
In c429bed102 (bundle-uri: store fetch.bundleCreationToken, 2023-01-31)
code was introduced that assumes that an `sscanf()` call leaves its
output variables unchanged unless the return value indicates success.
However, the POSIX documentation makes no such guarantee:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/sscanf.html
So let's make sure that the output variable `maxCreationToken` is
always well-defined.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code is a bit too hard to reason about to fully assess whether the
`fill_commit_graph_info()` function is called at all after
`write_commit_graph()` returns (and hence the stack variable
`topo_levels` goes out of context).
Let's simply make sure that the stack address is no longer used at that
stage, thereby making the code quite a bit easier to reason about.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
CodeQL reports empty `if` blocks that only contain a comment as "futile
conditional". The comment talks about potential plans to turn this into
a warning, but that seems not to have been necessary. Replace the entire
construct with a concise comment.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
While `if (i <= 0) ... else if (i > 0) ...` is technically equivalent to
`if (i <= 0) ... else ...`, the latter is vastly easier to read because
it avoids writing out a condition that is unnecessary. Let's drop such
unnecessary conditions.
Pointed out by CodeQL.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
fetch: avoid unnecessary work when there is no current branch
As pointed out by CodeQL, `branch_get()` may return `NULL`, in which
case `branch_has_merge_config()` would return early, but we can even
avoid enumerating the refs prefixes in that case, saving even more CPU
cycles.
Technically, we should enclose these two statements in an `if (branch)
{...}` block, but the indentation is already quite deep, therefore I
refrained from doing that.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
One thing that might be non-obvious to readers (or to analyzers like
CodeQL) is that the function essentially does nothing when the Git index
is empty, and in particular that it does not look at the value of
`len_eq_last` (which would be uninitialized at that point).
Let's make this much easier to understand, by returning early if the Git
index is empty, and by avoiding empty `else` blocks.
This commit changes indentation and is hence best viewed using
`--ignore-space-change`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
upload-pack: rename `enum` to reflect the operation
While 3145ea957d (upload-pack: introduce fetch server command,
2018-03-15) added support for the `fetch` command, from the server's
point of view it is an upload, and hence the `enum` should really be
called `upload_state` instead of `fetch_state`. Likewise, rename its
values.
This also helps unconfuse CodeQL which would otherwise be at sixes or
sevens about having _two_ non-local definitions of the same `enum` with
the same values.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do need a context to write the commit graph, but that context is only
needed during the life time of `commit_graph_write()`, therefore it can
easily be a stack variable.
This also helps CodeQL recognize that it is safe to assign the address
of other local variables to the context's fields.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
fetch: carefully clear local variable's address after use
As pointed out by CodeQL, it is a potentially dangerous practice to
store local variables' addresses in non-local structs. Yet this is
exactly what happens with the `acked_commits` attribute that is used in
`cmd_fetch()`: The pointer to a local variable is assigned to it.
Now, it is Git's convention that `cmd_*()` functions are essentially
only returning just before exiting the process, therefore there is
little danger that this attribute is used after the code flow returns
from that function.
However, code in `cmd_*()` function is often so useful that it gets
lifted into a library function, at which point this issue could become a
real problem.
Let's make sure to clear the `acked_commits` attribute out after it was
used, and before the function returns (at which point the address would
go stale).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The difference of two unsigned integers is defined to be unsigned, and
therefore it is misleading to check whether it is greater than zero
(instead, the more natural way would be to check whether the difference
is zero or not).
Let's instead avoid the subtraction altogether, and compare the two
operands directly, which makes the code more obvious as a side effect.
Pointed out by CodeQL's rule with the ID
`cpp/unsigned-difference-expression-compared-zero`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Johannes Sixt [Wed, 14 May 2025 20:41:30 +0000 (22:41 +0200)]
git-gui: do not end the commit message with an empty line
The commit message is processed to remove unnecessary empty lines.
In particular, it is ensured that the text ends with at most one LF
character. This one is always present, because the Tk text widget
ensures that is present.
However, did not consider that the processed text is written to the
commit message file using `puts`, which also appends a LF character,
so that the final commit message ends with two LF. Trim all trailing
LF characters, and while we are here, use `string trim`, which lets
us remove the leading LF in the same command.
Reported-by: Gareth Fenn <garethfenn@gmail.com> Reviewed-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de> Signed-off-by: Johannes Sixt <j6t@kdbg.org>
gitk: do not hard-code color of search results in commit list
A global variable exists that holds the color name used to highlight
search results everywhere, except that in the commit list the color
is still hard-coded to "yellow". Use the global variable there as well.
Signed-off-by: Alexander Ogorodov <bnfour@bnfour.net>
Elijah Newren [Wed, 14 May 2025 20:33:25 +0000 (20:33 +0000)]
replay: replace the_repository with repo parameter passed to cmd_replay ()
Replace the_repository everywhere with repo, feed repo from cmd_replay()
to all the other functions in the file that need it, and remove the
UNUSED annotation on repo.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
shejialuo [Wed, 14 May 2025 15:50:42 +0000 (23:50 +0800)]
packed-backend: mmap large "packed-refs" file during fsck
During fsck, we use "strbuf_read" to read the content of "packed-refs"
without using mmap mechanism. This is a bad practice which would consume
more memory than using mmap mechanism. Besides, as all code paths in
"packed-backend.c" use this way, we should make "fsck" align with the
current codebase.
As we have introduced the helper function "allocate_snapshot_buffer", we
can simply use this function to use mmap mechanism.
Suggested-by: Jeff King <peff@peff.net> Suggested-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
shejialuo [Wed, 14 May 2025 15:50:35 +0000 (23:50 +0800)]
packed-backend: extract snapshot allocation in `load_contents`
"load_contents" would choose which way to load the content of the
"packed-refs". However, we cannot directly use this function when
checking the consistency due to we don't want to open the file. And we
also need to reuse the logic to avoid causing repetition.
Let's create a new helper function "allocate_snapshot_buffer" to extract
the snapshot allocation logic in "load_contents" and update the
"load_contents" to align with the behavior.
Suggested-by: Jeff King <peff@peff.net> Suggested-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
shejialuo [Wed, 14 May 2025 15:50:26 +0000 (23:50 +0800)]
packed-backend: fsck should warn when "packed-refs" file is empty
We assume the "packed-refs" won't be empty and instead has at least one
line in it (even when there are no refs packed, there is the file header
line). Because there is no terminating LF in the empty file, we will
report "packedRefEntryNotTerminated(ERROR)" to the user.
However, the runtime code paths would accept an empty "packed-refs"
file, for example, "create_snapshot" would simply return the "snapshot"
without checking the content of "packed-refs". So, we should skip
checking the content of "packed-refs" when it is empty during fsck.
After 694b7a1999 (repack_without_ref(): write peeled refs in the
rewritten file, 2013-04-22), we would always write a header into the
"packed-refs" file. So, versions of Git that are not too ancient never
write such an empty "packed-refs" file.
As an empty file often indicates a sign of a filesystem-level issue, the
way we want to resolve this inconsistency is not make everybody totally
silent but notice and report the anomaly.
Let's create a "FSCK_INFO" message id "EMPTY_PACKED_REFS_FILE" to report
to the users that "packed-refs" is empty.
Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Wed, 14 May 2025 13:52:44 +0000 (09:52 -0400)]
scalar reconfigure: improve --maintenance docs
The --maintenance option for 'scalar reconfigure' has three possible
values. Improve the documentation by specifying the option in the -h
help menu and usage information.
Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Aditya Garg [Mon, 12 May 2025 08:11:19 +0000 (08:11 +0000)]
send-email: try to get fqdn by running hostname -f on Linux and macOS
`hostname` is a popular command available on both Linux and macOS. As
per the man-page[1], `hostname -f` command returns the fully qualified
domain name (FQDN) of the system. The current Net::Domain perl module
being used in the script for the same has been quite unrealiable in many
cases. Thankfully, we now have a better check for valid_fqdn, which does
reject the invalid FQDNs given by this module properly, but at the same
time, it will result in a fallback to 'localhost.localdomain' being
used. `hostname -f` has been quite reliable (probably even more reliable
than the Net::Domain module) and before falling back to
'localhost.localdomain', we should try to use it. Interestingly, the
`hostname` command is actually used by perl modules like Net::Domain[2]
and Sys::Hostname[3] to get the hostname. So, lets give `hostname -f` a
chance as well!
Junio C Hamano [Tue, 13 May 2025 21:05:06 +0000 (14:05 -0700)]
Merge branch 'js/ci-buildsystems-cleanup'
Code clean-up around stale CI elements and building with Visual Studio.
* js/ci-buildsystems-cleanup:
config.mak.uname: drop the `vcxproj` target
contrib/buildsystems: drop support for building . vcproj/.vcxproj files
ci: stop linking the `prove` cache
With 7304bd2bc39 (ci: wire up Visual Studio build with Meson,
2025-01-22) we have introduced a CI job that builds and tests Git with
Microsoft Visual Studio via Meson. This job is only being executed by
default on GitHub Workflows though -- on GitLab CI it is marked as a
"manual" job, so the developer has to actively trigger these jobs.
The consequence of this split is that any breakage specific to this job
is only noticed by developers who mainly work with GitHub. Let's improve
this situation by also running the job by default on GitLab CI.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-gui: wire up support for the Meson build system
The Git project has started to wire up Meson as a build system in Git
v2.48.0. Wire up support for Meson in "git-gui" so that we can trivially
include it as a subproject in Git.
The "GITGUI_VERSION" variable is made available by generating and
including the "GIT-VERSION-FILE" file. Its value has been used in
various build steps, but in the preceding commits we have refactored
those to instead source the "GIT-VERSION-FILE" directly. As a result,
the variable is now only used in a single recipe, and this use can be
trivially replaced with sed(1).
Refactor the recipe to do so and stop including "GIT-VERSION-FILE" to
simplify the build process.
Extract script to generate the macOS app. This change allows us to reuse
the build logic with the Meson build system.
Note that as part of this change we also modify the TKEXECUTABLE
variable to track its full path. Like this we don't have to propagate
both the TKEXECUTABLE and TKFRAMEWORK variables into the script, and the
basename can be trivially computed from TKEXECUTABLE anyway.
The value of the GITGUI_SCRIPT variable is only used in a single place
as part of an sed(1) script that massages the "git-gui.sh" script.
Interestingly, this specific replacement does seem to be a no-op: we
replace "^ argv0=$$0" with " argv=$(GITGUI_SCRIPT)", which has a value
of "$$0". The result would thus be completely unchanged.
git-gui: make output of GIT-VERSION-GEN source'able
The output of GIT-VERSION-GEN can be sourced by our Makefile to make the
version available there. The output has a couple of spaces around the
equals sign, which is perfectly valid for parsing it in our Makefile.
But in subsequent steps we'll also want to source the file in a couple
of newly-introduced shell scripts, but having spaces around variable
assignments is invalid there.
Prepare for this step by dropping the spaces surrounding the equals
sign. Like this, we can easily use the same file both in our Makefile
and in shell scripts.
git-gui: prepare GIT-VERSION-GEN for out-of-tree builds
The GIT-VERSION-GEN unconditionally writes version information into the
source directory in the form of the "GIT-VERSION-FILE". We are about to
introduce the Meson build system though, which enforces out-of-tree
builds by default, and in that context we cannot continue to write
version information into the source tree.
Prepare the script for out-of-tree builds by treating the source
directory different from the output file.
git-gui: replace GIT-GUI-VARS with GIT-GUI-BUILD-OPTIONS
The GIT-GUI-VARS file is used to track whether any of our build options
has changed. Unfortunately, the format of that file does not allow us to
propagate those build options to other scripts. But as we are about to
introduce support for the Meson build system, we will extract a couple
of scripts to deduplicate core build logic across Makefiles and Meson.
With this refactoring, it will become necessary to make build options
more widely accessible.
Replace GIT-GUI-VARS with a new GIT-GUI-BUILD-OPTIONS file that is being
populated from a template. This file can easily be sourced from build
scripts in subsequent steps.
Junio C Hamano [Mon, 12 May 2025 19:03:11 +0000 (12:03 -0700)]
whatschanged: list it in BreakingChanges document
This can be squashed into the previous step. That is how our "git
pack-redundant" conversion did.
Theoretically, however, those who want to gauge the need to keep the
command by exposing their users to patches before this one may want
to wait until their experiment finishes before they formally say
"this will go away".
This change is made into a separate patch from the previous step
precisely to help those folks.
While at it, update the documentation page to use the new [synopsis]
facility to mark-up the SYNOPSIS part.
Helped-by: Elijah Newren <newren@gmail.com>
[en: typofix] Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Mon, 12 May 2025 19:03:10 +0000 (12:03 -0700)]
whatchanged: remove when built with WITH_BREAKING_CHANGES
As we made "git whatchanged" require "--i-still-use-this" and asked
the users to report if they still want to use it, the logical next
step is to allow us build Git without "whatchanged" to prepare for
its eventual removal.
If we were to follow the pattern established in 8ccc75c2 (remote:
announce removal of "branches/" and "remotes/", 2025-01-22), we can
do this together with the documentation update to officially list
that the command will be removed in the BreakingChanges document,
but let's just keep the changes separate just in case we want to
proceed a bit slower.
Junio C Hamano [Mon, 12 May 2025 19:03:09 +0000 (12:03 -0700)]
whatchanged: require --i-still-use-this
The documentation of "git whatchanged" is pretty explicit that the
command was retained for historical reasons to help those whose fingers
cannot be retrained. Let's see if they still are finding it hard to
type "git log --raw" instead of "git whatchanged" by marking the
command as "nominated for removal", and require "--i-still-use-this"
on the command line. Adjust the tests so that the option is passed
when we invoke the command. In addition, we test that the command
fails when "--i-still-use-this" is not given.
Junio C Hamano [Mon, 12 May 2025 19:03:08 +0000 (12:03 -0700)]
tests: prepare for a world without whatchanged
Some tests on fast-import run "git whatchanged" without even
checking the output from the command. It is tempting to remove the
calls altogether since they are not doing anything useful, but they
presumably were added there while the tests were developed to manually
sanity check which paths were touched.
Replace these calls with "git log --raw", which is a rough
equivalent in the more modern Git.
This does not remove "git whatchanged", but we no longer have to
worry about adjusting these places when we eventually do.
Helped-by: Elijah Newren <newren@gmail.com>
[en: log message] Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Mon, 12 May 2025 21:22:48 +0000 (14:22 -0700)]
Merge branch 'ps/object-store-cleanup'
Further code clean-up in the object-store layer.
* ps/object-store-cleanup:
object-store: drop `repo_has_object_file()`
treewide: convert users of `repo_has_object_file()` to `has_object()`
object-store: allow fetching objects via `has_object()`
object-store: move function declarations to their respective subsystems
object-store: move and rename `odb_pack_keep()`
object-store: drop `loose_object_path()`
object-store: move `struct packed_git` into "packfile.h"
Junio C Hamano [Mon, 12 May 2025 19:03:06 +0000 (12:03 -0700)]
you-still-use-that??: help deprecating commands for removal
Commands slated for removal like "git pack-redundant" now require
an explicit "--i-still-use-this" option to run. This is to
discourage casual use and surface their pending deprecation to
users.
The warning message is long, so factor it into a helper function
you_still_use_that() to simplify reuse by other commands.
Also add a missing test to ensure this enforcement works for
"pack-redundant".
Helped-by: Elijah Newren <newren@gmail.com>
[en: log message] Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Mon, 12 May 2025 18:52:33 +0000 (14:52 -0400)]
raw_object_store: drop extra pointer to replace_map
We store the replacement data in an oidmap, which is itself a pointer in
the raw_object_store struct. But there's no need for an extra pointer
indirection here. It is always allocated and initialized along with the
containing struct, and we never check it for NULL-ness.
Let's embed the map directly in the struct, which is simpler and avoids
extra pointer chasing.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Mon, 12 May 2025 18:51:30 +0000 (14:51 -0400)]
oidmap: add size function
Callers which want to know how many items are in an oidmap have to look
at the underlying hashmap struct, leaking an implementation detail.
Let's provide a type-appropriate wrapper and use it.
Note in the call from lookup_replace_object(), the caller was actually
looking at the hashmap's tablesize parameter (the allocated size of the
table) rather than hashmap_get_size(), the number of items in the table.
This probably should have been checking the number of items all along,
but the two are functionally equivalent here since we only add to the
map and never remove anything. Thus if there was any allocation, it was
because there is at least one item.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Mon, 12 May 2025 18:50:28 +0000 (14:50 -0400)]
oidmap: rename oidmap_free() to oidmap_clear()
This function does not free the oidmap struct itself; it just drops all
items from the map (using hashmap_clear_() internally). It should be
called oidmap_clear(), per CodingGuidelines.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Lidong Yan [Mon, 12 May 2025 12:22:10 +0000 (12:22 +0000)]
pack-bitmap: fix memory leak if `load_bitmap_entries_v1` failed
In pack-bitmap.c:load_bitmap_entries_v1, the function `read_bitmap_1`
allocates a bitmap and reads index data into it. However, if any of
the validation checks following the allocation fail, the allocated bitmap
is not freed, resulting in a memory leak. To avoid this, the validation
checks should be performed before the bitmap is allocated.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "stats" directory contains a couple of scripts to do some statistics
on a repository:
- "git-common-hash" shows the longest common hash prefixes and can be
used to determine the minimum prefix length to use for object names
to be unique. The script has last been touched in 53474eb92ff
(contrib: update stats/mailmap script, 2012-12-12) and searching for
it on the internet doesn't really surface any potential use cases or
even mentions of it.
Modern Git also shouldn't really need this tool as it knows to
automatically scale printed prefixes via some heuristics.
- "mailmap.pl" performs some statistics on the number of mailmapped
commits in a repository. It has last been modified in 53474eb92ff
(contrib: update stats/mailmap script, 2012-12-12) and has since
been bitrotting. It doesn't even compile nowadays anymore:
$ perl contrib/stats/mailmap.pl
Experimental keys on scalar is now forbidden at contrib/stats/mailmap.pl line 57.
Type of arg 1 to keys must be hash or array (not hash element) at contrib/stats/mailmap.pl line 57, near "}) "
Experimental keys on scalar is now forbidden at contrib/stats/mailmap.pl line 57.
Type of arg 1 to keys must be hash or array (not private variable) at contrib/stats/mailmap.pl line 57, near "$h)"
Experimental keys on scalar is now forbidden at contrib/stats/mailmap.pl line 64.
Type of arg 1 to keys must be hash or array (not private variable) at contrib/stats/mailmap.pl line 64, near "$h)"
Execution of contrib/stats/mailmap.pl aborted due to compilation errors.
This should be good-enough signal to indicate that nobody is using
this script at all anymore.
- "packinfo.pl" takes the output from git-verify-pack(1) and performs
some pretty printing thereof. On the one hand it reformats the
output to be easier to read and provide some summaries. On the other
hand it may also print filenames of blobs.
We don't have any replacement for this tool. Ideally, we should move
its functionality into git-verify-pack(1) itself.
Remove the first two scripts, but retain "packinfo.pl".
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git-new-workdir" command has been introduced to make it possible to
have a separate working directory in a different place. The command thus
predates git-worktree(1), which is what people use nowadays to create
any such working directory. As such, the script doesn't really have much
of a reason to exist nowadays anymore.
It also doesn't seem like the script is still in use: the last time it
has received an update was in e32afab7b03 (git-new-workdir: don't fail
if the target directory is empty, 2014-11-26), more than a decade ago.
Remove it as well as the tests that depend on it.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
While the "emacs/" directory still exists, all of its code has been
replaced with stubs in 6d5ed4836db (git{,-blame}.el: remove old
bitrotting Emacs code, 2018-04-11). Instead, the recommendation is to
use Emacs' own vc-annotate mode.
Remove the code altogether.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git-resurrect.sh" script can be used to find traces of a branch tip
in the reflog and resurrect that branch. Despite a couple of global
cleanups, the script hasn't seen any activity since it was introduced in e1ff064e1bf (contrib git-resurrect: find traces of a branch name and
resurrect it, 2009-02-04).
Furthermore, the tool does not work with the "reftable" backend at all
as it directly reads ".git/logs/HEAD". As reflogs are stored as part of
the individual tables though that file wouldn't exist in a "reftable"-
enabled repository.
Last but not least, the tool doesn't even work unless it is explicitly
invoked via `git resurrect` as it sources "git-sh-setup". As none of our
build systems know to install this script, users thus have to go out of
their way to really make it work, which is highly unlikely.
Another source that indicates that this tool can be removed is a
question for how to restore deleted branches on StackOverflow [1]. The
top-voted answer uses git-reflog(1) directly and has received more than
3000 votes to date. While "git-resurrect.sh" is also mentioned, it only
got 16 upvotes, and comments mention the above caveat that users have to
do some manual setup to make it work.
It's thus rather clear that the tool doesn't have a lot or even any
users. Remove it.
The "persistent-https" remote helper supposedly speeds up SSL operations
by running a daemon that keeps a connection open to a remote server. It
is effectively unmaintained nowadays: the last time it received an
update was in accb613afd2 (contrib/persistent-https: use Git version for
build label, 2016-07-20) and its parent commits to make it compile with
Go 1.7+.
This Go toolchain is somewhat dated by now though and unsupported. The
oldest still-supported toolchain is Go 1.23, which was released in
August 2024. It is not possible to compile the remote helper with that
Go version anymore:
$ go version
go version go1.23.8 linux/amd64
$ make
case $(go version) in \
"go version go"1.[0-5].*) EQ=" " ;; *) EQ="=" ;; esac && \
go build -o git-remote-persistent-https \
-ldflags "-X main._BUILD_EMBED_LABEL${EQ}GIT_VERSION=2.49.0.943.g965a70ebf62"
go: cannot find main module, but found .git/config in /home/pks/Development/git
to create a module there, run:
cd ../.. && go mod init
make: *** [Makefile:31: git-remote-persistent-https] Error 1
The problem is that modern Go toolchains require a "go.mod" file, but we
don't have any such files. This requirement exists since quite a while
already, so it's clear that nobody has tried to use this remote helper
anytime recent.
Remove the remote helper.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "mw-to-git" directory contains tools for accessing MediaWiki via
Git. The scripts are essentially unmaintained in Git: despite a couple
of global cleanups, the last changes were a couple of security-related
issues part of 9a8606465e8 (remote-mediawiki: use "sh" to eliminate
unquoted commands, 2020-09-21) and its parents. We don't ever run any of
the tests so it is more likely than not that many of the tests have been
bitrotting, like e.g. documented in f8ab018dafc (remote-mediawiki tests:
annotate failing tests, 2020-09-21).
According to Matthieu Moy [1], one of the original developers of this
tool, it didn't receive any attention recently and there is no
motivation to keep maintaining it anymore in the community. The project
has been spun out of Git [2] and thus has a new official home, but did
not receive much attention over there, either.
As such, it seems like the MediaWiki transport helper is slowly fading
away. But given that there is a new home, it doesn't make sense to have
it as part of Git anymore only to let it rot. Remove the directory.
The "hooks" directory contains a handful of example hooks. Most of these
hooks are highly specific and haven't really received any updates over
the last couple of years, except for some global cleanups. The multimail
hook has also been removed in f74d11471fa (multimail: stop shipping a
copy, 2021-06-10) in favor of its upstream project [1].
Remove those hooks. If we want to provide examples for how to use Git
hooks we should do that as part of our documentation, for example in
githooks(5).
The "thunderbird-patch-inline" directory in "contrib/" contains a script
to send patch files via Thunderbird. This script depends on the
ExternalEditor extension [1], which seems to be effectively unmaintained
with the last update being in 2008. While the extension has eventually
been maintained in [2], that fork hasn't received any updates since
2020, either.
As such, the ExternalEditor extension does not work with modern versions
of Thunderbird anymore, and as the "thunderbird-patch-inline" script
depends on the ExternalEditor extension it likely doesn't work anymore,
either. The fact that this script hasn't been touched for the last 10
years outside of some global cleanup supports the idea that it is not
useful anymore.
The "remote-helpers" directory contains two remote helper scripts for
Mercurial and Bazaar. These scripts have since been converted into stubs
in b2c851a8e67 (Revert "Merge branch 'jc/graduate-remote-hg-bzr' (early
part)", 2014-05-20) as the helpers have been moved into their own
upstream projects [1][2].
Given that these stubs have been created more than a decade ago it is
very unlikely that anybody still tries to use them. Remove them.
The "examples" directory used to contain scripted versions of some of
our builtins. These have all been removed in 49eb8d39c78 (Remove
contrib/examples/*, 2018-03-25), but we left a note in the directory to
make it discoverable that there used to be examples.
It is unlikely that anybody still looks at these examples more than 7
years after they have been removed. Remove the note and its directory.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remotes can be configured either via a repository's config or by using
the ".git/branches/" or ".git/remotes/" directories. Back when the new
config-based mechanism has been introduced we also introduced a helper
script that migrates from the old-style remote configuration to the new
config-based mechanism.
With the recent removal announcement for the two directories we also
started to instruct users to migrate repositories that still use these
mechanism to use config-based remotes. Notably though, the migration
path doesn't even use the migration script. Instead, git-remote(1)
itself knows how to migrate any such remote via `git remote rename`.
In fact, a full migration _cannot_ use the script as it only knows to
migrate remotes from ".git/remotes/", but not ".git/branches/". As such,
the migration path via `git remote rename` is the only feasible way to
fully migrate repositories over to the new format.
Last but not least, the script doesn't even work as-is as it sources
"git-sh-setup". For this to work it would need to be invoked either via
Git so that this script is in our PATH, users would have to manually
call it with an adjusted PATH, or distributions need to install the
script into "$prefix/libexec/git-core" with a "git-" prefix. All of
these steps are unlikely enough to underpin the claim that this script
is not used at all.
So given that:
- The script cannot perform a full migration of all deprecated remote
types.
- We don't advertise it anywhere.
- It has been basically untouched since 2007.
- It doesn't even work unless users do manual steps.
It should be safe enough to just remove it. Do so.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable: fix perf regression when reading blocks of unwanted type
In fd888311fbc (reftable/table: move reading block into block reader,
2025-04-07), we have refactored how reftable blocks are read so that
most of the logic is contained in the "block.c" subsystem itself. Most
importantly, the whole logic to read the data itself is now contained in
that subsystem.
This change caused a significant performance regression though when
reading blocks that aren't of the specific type one is searching for:
Benchmark 1: update-ref: create 100k refs (revision = fd888311fbc~)
Time (mean ± σ): 2.171 s ± 0.028 s [User: 1.189 s, System: 0.977 s]
Range (min … max): 2.117 s … 2.206 s 10 runs
Benchmark 2: update-ref: create 100k refs (revision = fd888311fbc)
Time (mean ± σ): 3.418 s ± 0.030 s [User: 2.371 s, System: 1.037 s]
Range (min … max): 3.377 s … 3.473 s 10 runs
Summary
update-ref: create 100k refs (revision = fd888311fbc~) ran
1.57 ± 0.02 times faster than update-ref: create 100k refs (revision = fd888311fbc)
The root caute of the performance regression is that we changed when
exactly blocks of an uninteresting type are being discarded. Previous to
the refactoring in the mentioned commit we'd load the block data, read
its type, notice that it's not the wanted type and discard the block.
After the commit though we don't discard the block immediately, but we
fully decode it only to realize that it's not the desired type. We then
discard the block again, but have already performed a bunch of pointless
work.
Fix the regression by making `reftable_block_init()` return early in
case the block is not of the desired type. This fixes the performance
hit:
Benchmark 1: update-ref: create 100k refs (revision = HEAD~)
Time (mean ± σ): 2.712 s ± 0.018 s [User: 1.990 s, System: 0.716 s]
Range (min … max): 2.682 s … 2.741 s 10 runs
Benchmark 2: update-ref: create 100k refs (revision = HEAD)
Time (mean ± σ): 1.670 s ± 0.012 s [User: 0.991 s, System: 0.676 s]
Range (min … max): 1.652 s … 1.693 s 10 runs
Summary
update-ref: create 100k refs (revision = HEAD) ran
1.62 ± 0.02 times faster than update-ref: create 100k refs (revision = HEAD~)
Note that the baseline performance is lower than in the original due to
a couple of unrelated performance improvements that have landed since
the original commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>