refs/reftable: track last log record name via strbuf
The reflog iterator enumerates all reflogs known to a ref backend. In
the "reftable" backend there is no way to list all existing reflogs
directly. Instead, we have to iterate through all reflog entries and
discard all those redundant entries for which we have already returned a
reflog entry.
This logic is implemented by tracking the last reflog name that we have
emitted to the iterator's user. If the next log record has the same name
we simply skip it until we find another record with a different refname.
This last reflog name is stored in a simple C string, which requires us
to free and reallocate it whenever we need to update the reflog name.
Convert it to use a `struct strbuf` instead, which reduces the number of
allocations. Before:
HEAP SUMMARY:
in use at exit: 13,473 bytes in 122 blocks
total heap usage: 1,068,485 allocs, 1,068,363 frees, 281,122,886 bytes allocated
After:
HEAP SUMMARY:
in use at exit: 13,473 bytes in 122 blocks
total heap usage: 68,485 allocs, 68,363 frees, 256,234,072 bytes allocated
Note that even after this change we still allocate quite a lot of data,
even though the number of allocations does not scale with the number of
log records anymore. This remainder comes mostly from decompressing the
log blocks, where we decompress each block into newly allocated memory.
This will be addressed at a later point in time.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/record: use scratch buffer when decoding records
When decoding log records we need a temporary buffer to decode the
reflog entry's name, mail address and message. As this buffer is local
to the function we thus have to reallocate it for every single log
record which we're about to decode, which is inefficient.
Refactor the code such that callers need to pass in a scratch buffer,
which allows us to reuse it for multiple decodes. This reduces the
number of allocations when iterating through reflogs. Before:
HEAP SUMMARY:
in use at exit: 13,473 bytes in 122 blocks
total heap usage: 2,068,487 allocs, 2,068,365 frees, 305,122,946 bytes allocated
After:
HEAP SUMMARY:
in use at exit: 13,473 bytes in 122 blocks
total heap usage: 1,068,485 allocs, 1,068,363 frees, 281,122,886 bytes allocated
Note that this commit also drop some redundant calls to `strbuf_reset()`
right before calling `decode_string()`. The latter already knows to
reset the buffer, so there is no need for these.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/record: reuse message when decoding log records
Same as the preceding commit we can allocate log messages as needed when
decoding log records, thus further reducing the number of allocations.
Before:
HEAP SUMMARY:
in use at exit: 13,473 bytes in 122 blocks
total heap usage: 3,068,488 allocs, 3,068,366 frees, 307,122,961 bytes allocated
After:
HEAP SUMMARY:
in use at exit: 13,473 bytes in 122 blocks
total heap usage: 2,068,487 allocs, 2,068,365 frees, 305,122,946 bytes allocated
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/record: reuse refnames when decoding log records
When decoding a log record we always reallocate their refname arrays.
This results in quite a lot of needless allocation churn.
Refactor the code to grow the array as required only. Like this, we
should usually only end up reallocating the array a small handful of
times when iterating over many refs. Before:
HEAP SUMMARY:
in use at exit: 13,473 bytes in 122 blocks
total heap usage: 4,068,487 allocs, 4,068,365 frees, 332,011,793 bytes allocated
After:
HEAP SUMMARY:
in use at exit: 13,473 bytes in 122 blocks
total heap usage: 3,068,488 allocs, 3,068,366 frees, 307,122,961 bytes allocated
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Each reflog entry contains information regarding the authorship of who
has made the change. This authorship information is not the same as that
of any of the commits that the reflog entry references, but instead
corresponds to the local user that has executed the command. Thus, it is
almost always the case that all reflog entries have the same author.
We can make use of this fact when decoding reftable records: instead of
freeing and then reallocating the authorship information of log records,
we can special-case when the next record during an iteration has the
exact same authorship as the preceding record. If so, then there is no
need to reallocate the respective fields.
This change results in two allocations less per log record that we're
iterating over in the most common case. Before:
HEAP SUMMARY:
in use at exit: 13,473 bytes in 122 blocks
total heap usage: 6,068,489 allocs, 6,068,367 frees, 361,011,822 bytes allocated
After:
HEAP SUMMARY:
in use at exit: 13,473 bytes in 122 blocks
total heap usage: 4,068,487 allocs, 4,068,365 frees, 332,011,793 bytes allocated
An alternative would be to store the capacity of both name and email and
then use `REFTABLE_ALLOC_GROW()` to conditionally reallocate the array.
But reftable records are copied around quite a lot, and thus we need to
be a bit mindful of the overall record size. Furthermore, a memory
comparison should also be more efficient than having to copy over memory
even if we wouldn't have to allocate a new array every time.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/record: convert old and new object IDs to arrays
In 7af607c58d (reftable/record: store "val1" hashes as static arrays,
2024-01-03) and b31e3cc620 (reftable/record: store "val2" hashes as
static arrays, 2024-01-03) we have converted ref records to store their
object IDs in a static array. Convert log records to do the same so that
their old and new object IDs are arrays, too.
This change results in two allocations less per log record that we're
iterating over. Before:
HEAP SUMMARY:
in use at exit: 13,473 bytes in 122 blocks
total heap usage: 8,068,495 allocs, 8,068,373 frees, 401,011,862 bytes allocated
After:
HEAP SUMMARY:
in use at exit: 13,473 bytes in 122 blocks
total heap usage: 6,068,489 allocs, 6,068,367 frees, 361,011,822 bytes allocated
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs/reftable: reload correct stack when creating reflog iter
When creating a new reflog iterator, we first have to reload the stack
that the iterator is being created. This is done so that any concurrent
writes to the stack are reflected. But `reflog_iterator_for_stack()`
always reloads the main stack, which is wrong.
Fix this and reload the correct stack.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 5 Mar 2024 17:09:46 +0000 (09:09 -0800)]
Merge branch 'ps/reftable-iteration-perf-part2' into ps/reftable-reflog-iteration-perf
* ps/reftable-iteration-perf-part2:
refs/reftable: precompute prefix length
reftable: allow inlining of a few functions
reftable/record: decode keys in place
reftable/record: reuse refname when copying
reftable/record: reuse refname when decoding
reftable/merged: avoid duplicate pqueue emptiness check
reftable/merged: circumvent pqueue with single subiter
reftable/merged: handle subiter cleanup on close only
reftable/merged: remove unnecessary null check for subiters
reftable/merged: make subiters own their records
reftable/merged: advance subiter on subsequent iteration
reftable/merged: make `merged_iter` structure private
reftable/pq: use `size_t` to track iterator index
Junio C Hamano [Sun, 3 Mar 2024 22:06:00 +0000 (14:06 -0800)]
clean: further clean-up of implementation around "--force"
We clarified how "clean.requireForce" interacts with the "--dry-run"
option in the previous commit, both in the implementation and in the
documentation. Even when "git clean" (without other options) is
required to be used with "--force" (i.e. either clean.requireForce
is unset, or explicitly set to true) to protect end-users from
casual invocation of the command by mistake, "--dry-run" does not
require "--force" to be used, because it is already its own
protection mechanism by being a no-op to the working tree files.
The previous commit, however, missed another clean-up opportunity
around the same area. Just like in the "--dry-run" mode, the
command in the "--interactive" mode does not require "--force",
either. This is because by going interactive and giving the end
user one more chance to confirm, the mode itself is serving as its
own protection mechanism.
Let's take things one step further, and unify the code that defines
interaction between "--force" and these two other options. Just
like we added explanation for the reason why "--dry-run" does not
honor "clean.requireForce", give an explanation for the reason why
"--interactive" makes "clean.requireForce" to be ignored.
Finally, add some tests to show the interaction between "--force"
and "--interactive". We already have tests that show interaction
between "--force" and "--dry-run", but didn't test "--interactive".
We're recomputing the prefix length on every iteration of the ref
iterator. Precompute it for another speedup when iterating over 1
million refs:
Benchmark 1: show-ref: single matching ref (revision = HEAD~)
Time (mean ± σ): 100.3 ms ± 3.7 ms [User: 97.3 ms, System: 2.8 ms]
Range (min … max): 97.5 ms … 139.7 ms 1000 runs
Benchmark 2: show-ref: single matching ref (revision = HEAD)
Time (mean ± σ): 95.8 ms ± 3.4 ms [User: 92.9 ms, System: 2.8 ms]
Range (min … max): 93.0 ms … 121.9 ms 1000 runs
Summary
show-ref: single matching ref (revision = HEAD) ran
1.05 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a few functions which are basically just accessors to
structures. As those functions are executed inside the hot loop when
iterating through many refs, the fact that they cannot be inlined is
costing us some performance.
Move the function definitions into their respective headers so that they
can be inlined. This results in a performance improvement when iterating
over 1 million refs:
Benchmark 1: show-ref: single matching ref (revision = HEAD~)
Time (mean ± σ): 105.9 ms ± 3.6 ms [User: 103.0 ms, System: 2.8 ms]
Range (min … max): 103.1 ms … 133.4 ms 1000 runs
Benchmark 2: show-ref: single matching ref (revision = HEAD)
Time (mean ± σ): 100.7 ms ± 3.4 ms [User: 97.8 ms, System: 2.8 ms]
Range (min … max): 97.8 ms … 124.0 ms 1000 runs
Summary
show-ref: single matching ref (revision = HEAD) ran
1.05 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reading a record from a block, we need to decode the record's key.
As reftable keys are prefix-compressed, meaning they reuse a prefix from
the preceding record's key, this is a bit more involved than just having
to copy the relevant bytes: we need to figure out the prefix and suffix
lengths, copy the prefix from the preceding record and finally copy the
suffix from the current record.
This is done by passing three buffers to `reftable_decode_key()`: one
buffer that holds the result, one buffer that holds the last key, and
one buffer that points to the current record. The final key is then
assembled by calling `strbuf_add()` twice to copy over the prefix and
suffix.
Performing two memory copies is inefficient though. And we can indeed do
better by decoding keys in place. Instead of providing two buffers, the
caller may only call a single buffer that is already pre-populated with
the last key. Like this, we only have to call `strbuf_setlen()` to trim
the record to its prefix and then `strbuf_add()` to add the suffix.
This refactoring leads to a noticeable performance bump when iterating
over 1 million refs:
Benchmark 1: show-ref: single matching ref (revision = HEAD~)
Time (mean ± σ): 112.2 ms ± 3.9 ms [User: 109.3 ms, System: 2.8 ms]
Range (min … max): 109.2 ms … 149.6 ms 1000 runs
Benchmark 2: show-ref: single matching ref (revision = HEAD)
Time (mean ± σ): 106.0 ms ± 3.5 ms [User: 103.2 ms, System: 2.7 ms]
Range (min … max): 103.2 ms … 133.7 ms 1000 runs
Summary
show-ref: single matching ref (revision = HEAD) ran
1.06 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Do the same optimization as in the preceding commit, but this time for
`reftable_record_copy()`. While not as noticeable, it still results in a
small speedup when iterating over 1 million refs:
Benchmark 1: show-ref: single matching ref (revision = HEAD~)
Time (mean ± σ): 114.0 ms ± 3.8 ms [User: 111.1 ms, System: 2.7 ms]
Range (min … max): 110.9 ms … 144.3 ms 1000 runs
Benchmark 2: show-ref: single matching ref (revision = HEAD)
Time (mean ± σ): 112.5 ms ± 3.7 ms [User: 109.5 ms, System: 2.8 ms]
Range (min … max): 109.2 ms … 140.7 ms 1000 runs
Summary
show-ref: single matching ref (revision = HEAD) ran
1.01 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When decoding a reftable record we will first release the user-provided
record and then decode the new record into it. This is quite inefficient
as we basically need to reallocate at least the refname every time.
Refactor the function to start tracking the refname capacity. Like this,
we can stow away the refname, release, restore and then grow the refname
to the required number of bytes via `REFTABLE_ALLOC_GROW()`.
This refactoring is safe to do because all functions that assigning to
the refname will first call `reftable_ref_record_release()`, which will
zero out the complete record after releasing memory.
This change results in a nice speedup when iterating over 1 million
refs:
Benchmark 1: show-ref: single matching ref (revision = HEAD~)
Time (mean ± σ): 124.0 ms ± 3.9 ms [User: 121.1 ms, System: 2.7 ms]
Range (min … max): 120.4 ms … 152.7 ms 1000 runs
Benchmark 2: show-ref: single matching ref (revision = HEAD)
Time (mean ± σ): 114.4 ms ± 3.7 ms [User: 111.5 ms, System: 2.7 ms]
Range (min … max): 111.0 ms … 152.1 ms 1000 runs
Summary
show-ref: single matching ref (revision = HEAD) ran
1.08 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)
Furthermore, with this change we now perform a mostly constant number of
allocations when iterating. Before this change:
HEAP SUMMARY:
in use at exit: 13,603 bytes in 125 blocks
total heap usage: 1,006,620 allocs, 1,006,495 frees, 25,398,363 bytes allocated
After this change:
HEAP SUMMARY:
in use at exit: 13,603 bytes in 125 blocks
total heap usage: 6,623 allocs, 6,498 frees, 509,592 bytes allocated
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When calling `merged_iter_next_void()` we first check whether the iter
has been exhausted already. We already perform this check two levels
down the stack in `merged_iter_next_entry()` though, which makes this
check redundant.
Now if this check was there to accelerate the common case it might have
made sense to keep it. But the iterator being exhausted is rather the
uncommon case because you can expect most reftable stacks to contain
more than two refs.
Simplify the code by removing the check. As `merged_iter_next_void()` is
basically empty except for calling `merged_iter_next()` now, merge these
two functions. This also results in a tiny speedup when iterating over
many refs:
Benchmark 1: show-ref: single matching ref (revision = HEAD~)
Time (mean ± σ): 125.6 ms ± 3.8 ms [User: 122.7 ms, System: 2.8 ms]
Range (min … max): 122.4 ms … 153.4 ms 1000 runs
Benchmark 2: show-ref: single matching ref (revision = HEAD)
Time (mean ± σ): 124.0 ms ± 3.9 ms [User: 121.1 ms, System: 2.8 ms]
Range (min … max): 120.1 ms … 156.4 ms 1000 runs
Summary
show-ref: single matching ref (revision = HEAD) ran
1.01 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/merged: circumvent pqueue with single subiter
The merged iterator uses a priority queue to order records so that we
can yielid them in the expected order. This priority queue of course
comes with some overhead as we need to add, compare and remove entries
in that priority queue.
In the general case, that overhead cannot really be avoided. But when we
have a single subiter left then there is no need to use the priority
queue anymore because the order is exactly the same as what that subiter
would return.
While having a single subiter may sound like an edge case, it happens
more frequently than one might think. In the most common scenario, you
can expect a repository to have a single large table that contains most
of the records and then a set of smaller tables which contain later
additions to the reftable stack. In this case it is quite likely that we
exhaust subiters of those smaller stacks before exhausting the large
table.
Special-case this and return records directly from the remaining
subiter. This results in a sizeable speedup when iterating over 1m refs
in a repository with a single table:
Benchmark 1: show-ref: single matching ref (revision = HEAD~)
Time (mean ± σ): 135.4 ms ± 4.4 ms [User: 132.5 ms, System: 2.8 ms]
Range (min … max): 131.0 ms … 166.3 ms 1000 runs
Benchmark 2: show-ref: single matching ref (revision = HEAD)
Time (mean ± σ): 126.3 ms ± 3.9 ms [User: 123.3 ms, System: 2.8 ms]
Range (min … max): 122.7 ms … 157.0 ms 1000 runs
Summary
show-ref: single matching ref (revision = HEAD) ran
1.07 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/merged: handle subiter cleanup on close only
When advancing one of the subiters fails we immediately release
resources associated with that subiter. This is not necessary though as
we will release these resources when closing the merged iterator anyway.
Drop the logic and only release resources when the merged iterator is
done. This is a mere cleanup that should help reduce the cognitive load
when reading through the code.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/merged: remove unnecessary null check for subiters
Whenever we advance a subiter we first call `iterator_is_null()`. This
is not needed though because we only ever advance subiters which have
entries in the priority queue, and we do not end entries to the priority
queue when the subiter has been exhausted.
Drop the check as well as the now-unused function. This results in a
surprisingly big speedup:
Benchmark 1: show-ref: single matching ref (revision = HEAD~)
Time (mean ± σ): 138.1 ms ± 4.4 ms [User: 135.1 ms, System: 2.8 ms]
Range (min … max): 133.4 ms … 167.3 ms 1000 runs
Benchmark 2: show-ref: single matching ref (revision = HEAD)
Time (mean ± σ): 134.4 ms ± 4.2 ms [User: 131.5 ms, System: 2.8 ms]
Range (min … max): 130.0 ms … 164.0 ms 1000 runs
Summary
show-ref: single matching ref (revision = HEAD) ran
1.03 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
For each subiterator, the merged table needs to track their current
record. This record is owned by the priority queue though instead of by
the merged iterator. This is not optimal performance-wise.
For one, we need to move around records whenever we add or remove a
record from the priority queue. Thus, the bigger the entries the more
bytes we need to copy around. And compared to pointers, a reftable
record is rather on the bigger side. The other issue is that this makes
it harder to reuse the records.
Refactor the code so that the merged iterator tracks ownership of the
records per-subiter. Instead of having records in the priority queue, we
can now use mere pointers to the per-subiter records. This also allows
us to swap records between the caller and the per-subiter record instead
of doing an actual copy via `reftable_record_copy_from()`, which removes
the need to release the caller-provided record.
This results in a noticeable speedup when iterating through many refs.
The following benchmark iterates through 1 million refs:
Benchmark 1: show-ref: single matching ref (revision = HEAD~)
Time (mean ± σ): 145.5 ms ± 4.5 ms [User: 142.5 ms, System: 2.8 ms]
Range (min … max): 141.3 ms … 177.0 ms 1000 runs
Benchmark 2: show-ref: single matching ref (revision = HEAD)
Time (mean ± σ): 139.0 ms ± 4.7 ms [User: 136.1 ms, System: 2.8 ms]
Range (min … max): 134.2 ms … 182.2 ms 1000 runs
Summary
show-ref: single matching ref (revision = HEAD) ran
1.05 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)
This refactoring also allows a subsequent refactoring where we start
reusing memory allocated by the reftable records because we do not need
to release the caller-provided record anymore.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/merged: advance subiter on subsequent iteration
When advancing the merged iterator, we pop the topmost entry from its
priority queue and then advance the sub-iterator that the entry belongs
to, adding the result as a new entry. This is quite sensible in the case
where the merged iterator is used to actually iterate through records.
But the merged iterator is also used when we look up a single record,
only, so advancing the sub-iterator is wasted effort because we would
never even look at the result.
Instead of immediately advancing the sub-iterator, we can also defer
this to the next iteration of the merged iterator by storing the
intent-to-advance. This results in a small speedup when reading many
records. The following benchmark creates 10000 refs, which will also end
up with many ref lookups:
Benchmark 1: update-ref: create many refs (revision = HEAD~)
Time (mean ± σ): 337.2 ms ± 7.3 ms [User: 200.1 ms, System: 136.9 ms]
Range (min … max): 329.3 ms … 373.2 ms 100 runs
Benchmark 2: update-ref: create many refs (revision = HEAD)
Time (mean ± σ): 332.5 ms ± 5.9 ms [User: 197.2 ms, System: 135.1 ms]
Range (min … max): 327.6 ms … 359.8 ms 100 runs
Summary
update-ref: create many refs (revision = HEAD) ran
1.01 ± 0.03 times faster than update-ref: create many refs (revision = HEAD~)
While this speedup alone isn't really worth it, this refactoring will
also allow two additional optimizations in subsequent patches. First, it
will allow us to special-case when there is only a single sub-iter left
to circumvent the priority queue altogether. And second, it makes it
easier to avoid copying records to the caller.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/merged: make `merged_iter` structure private
The `merged_iter` structure is not used anywhere outside of "merged.c",
but is declared in its header. Move it into the code file so that it is
clear that its implementation details are never exposed to anything.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable priority queue is used by the merged iterator to yield
records from its sub-iterators in the expected order. Each entry has a
record corresponding to such a sub-iterator as well as an index that
indicates which sub-iterator the record belongs to. But while the
sub-iterators are tracked with a `size_t`, we store the index as an
`int` in the entry.
Fix this and use `size_t` consistently.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The TODO comment suggested to heed core.bare from template config file
if no command line override given. And the prev_bare_repository
variable seems to have been placed for this sole purpose as it is not
used anywhere else.
However, it was clarified by Junio [1] that such values (including
core.bare) are ignored intentionally and does not make sense to
propagate them from template config to repository config. Also, the
directories for the worktree and repository are already created, and
therefore the bare/non-bare decision has already been made, by the
point we reach the codepath where the TODO comment is placed.
Therefore, prev_bare_repository does not have a usecase with/without
supporting core.bare from template. And the removal of
prev_bare_repository is safe as proved by the later part of the
comment:
"Unfortunately, the line above is equivalent to
is_bare_repository_cfg = !work_tree;
which ignores the config entirely even if no `--[no-]bare`
command line option was present.
To see why, note that before this function, there was this call:
prev_bare_repository = is_bare_repository()
expanding the right hand side:
= is_bare_repository_cfg && !get_git_work_tree()
= is_bare_repository_cfg && !work_tree
note that the last simplification above is valid because nothing
calls repo_init() or set_git_work_tree() between any of the
relevant calls in the code, and thus the !get_git_work_tree()
calls will return the same result each time. So, what we are
interested in computing is the right hand side of the line of
code just above this comment:
prev_bare_repository || !work_tree
= is_bare_repository_cfg && !work_tree || !work_tree
= !work_tree
because "A && !B || !B == !B" for all boolean values of A & B."
Therefore, remove the TODO comment and remove prev_bare_repository
variable. Also, update relevant testcases and remove one redundant
testcase.
Helped-by: Elijah Newren <newren@gmail.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This happens because we're completing references after seeing a valid
subcommand in the command line. This prevents the implicit "show" from
working properly, but also introduces a new problem: it keeps offering
subcommand options when the subcommand is implicit:
Sergey Organov [Sun, 3 Mar 2024 09:50:32 +0000 (12:50 +0300)]
clean: improve -n and -f implementation and documentation
What -n actually does in addition to its documented behavior is
ignoring of configuration variable clean.requireForce, that makes
sense provided -n prevents files removal anyway.
So, first, document this in the manual, and then modify implementation
to make this more explicit in the code.
Improved implementation also stops to share single internal variable
'force' between command-line -f option and configuration variable
clean.requireForce, resulting in more clear logic.
Two error messages with slightly different text depending on if
clean.requireForce was explicitly set or not, are merged into a single
one.
The resulting error message now does not mention -n as well, as it
neither matches intended clean.requireForce usage nor reflects
clarified implementation.
Documentation of clean.requireForce is changed accordingly.
Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Sun, 3 Mar 2024 12:19:43 +0000 (13:19 +0100)]
parse-options: rearrange long_name matching code
Move the code for handling a full match of long_name first and get rid
of negations. Reduce the indent of the code for matching abbreviations
and remove unnecessary curly braces. Combine the checks for whether
negation is allowed and whether arg is "n", "no" or "no-" because they
belong together and avoid a continue statement. The result is shorter,
more readable code.
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Sun, 3 Mar 2024 12:19:42 +0000 (13:19 +0100)]
parse-options: normalize arg and long_name before comparison
Strip "no-" from arg and long_name before comparing them. This way we
no longer have to repeat the comparison with an offset of 3 for negated
arguments.
Note that we must not modify the "flags" value, which tracks whether arg
is negated, inside the loop. When registering "--n", "--no" or "--no-"
as abbreviation for any negative option, we used to OR it with OPT_UNSET
and end the loop. We can simply hard-code OPT_UNSET and leave flags
unchanged instead.
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Sun, 3 Mar 2024 12:19:41 +0000 (13:19 +0100)]
parse-options: detect ambiguous self-negation
Git currently does not detect the ambiguity of an option that starts
with "no" like --notes and its negated form if given just --n or --no.
All Git commands with such options have other negatable options, and
we detect the ambiguity with them, so that's currently only a potential
problem for scripts that use git rev-parse --parseopt.
Let's fix it nevertheless, as there's no need for that confusion. To
detect the ambiguity we have to loosen the check in register_abbrev(),
as an option is considered an alias of itself. Add non-matching
negation flags as a criterion to recognize an option being ambiguous
with its negated form.
And we need to keep going after finding a non-negated option as an
abbreviated candidate and perform the negation checks in the same
loop.
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Sun, 3 Mar 2024 12:19:40 +0000 (13:19 +0100)]
parse-options: factor out register_abbrev() and struct parsed_option
Add a function, register_abbrev(), for storing the necessary details for
remembering an abbreviated and thus potentially ambiguous option. Call
it instead of sharing the code using goto, to make the control flow more
explicit.
Conveniently collect these details in the new struct parsed_option to
reduce the number of necessary function arguments.
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Sun, 3 Mar 2024 12:19:39 +0000 (13:19 +0100)]
parse-options: set arg of abbreviated option lazily
Postpone setting the opt pointer until we're about to call get_value(),
which uses it. There's no point in setting it eagerly for every
abbreviated candidate option, which may turn out to be ambiguous.
Removing this assignment from the loop doesn't noticeably improve the
performance, but allows further simplification.
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Improve consistency and usefulness of the error message by recognizing
abbreviated negated options even if they have a (most likely bogus)
argument. With this patch we get:
$ git rm --no-dry=bogus
error: option `no-dry-run' takes no value
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Sun, 3 Mar 2024 10:13:28 +0000 (11:13 +0100)]
t-ctype: avoid duplicating class names
TEST_CTYPE_FUNC defines a function for testing a character classifier,
TEST_CHAR_CLASS calls it, causing the class name to be mentioned twice.
Avoid the need to define a class-specific function by letting
TEST_CHAR_CLASS do all the work. This is done by using the internal
functions test__run_begin() and test__run_end(), but they do exist to be
used in test macros after all.
Alternatively we could unroll the loop to provide a very long expression
that tests all 256 characters and EOF and hand that to TEST, but that
seems awkward and hard to read.
No change of behavior or output intended.
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Sun, 3 Mar 2024 10:13:26 +0000 (11:13 +0100)]
t-ctype: simplify EOF check
EOF is not a member of any character class. If a classifier function
returns a non-zero result for it, presumably by mistake, then the unit
test check reports:
# check "!iseof(EOF)" failed at t/unit-tests/t-ctype.c:53
# i: 0xffffffff (EOF)
The numeric value of EOF is not particularly interesting in this
context. Stop printing the second line.
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Sun, 3 Mar 2024 10:13:25 +0000 (11:13 +0100)]
t-ctype: allow NUL anywhere in the specification string
Replace the custom function is_in() for looking up a character in the
specification string with memchr(3) and sizeof. This is shorter,
simpler and allows NUL anywhere in the string, which may come in handy
if we ever want to support more character classes that contain it.
Getting the string size using sizeof only works in a macro and with a
string constant. Use ARRAY_SIZE and compile-time checks to make sure we
are not passed a string pointer.
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Sat, 2 Mar 2024 19:03:48 +0000 (11:03 -0800)]
repack: check error writing to pack-objects subprocess
When "git repack" repacks promisor objects, it starts a pack-objects
subprocess and uses xwrite() to send object names over the pipe to
it, but without any error checking. An I/O error or short write
(even though a short write is unlikely for such a small amount of
data) can result in a packfile that lacks certain objects we wanted
to put in there, leading to a silent repository corruption.
Use write_in_full(), instead of xwrite(), to mitigate short write
risks, check errors from it, and abort if we see a failure.
Junio C Hamano [Sat, 2 Mar 2024 19:03:47 +0000 (11:03 -0800)]
sideband: avoid short write(2)
The sideband demultiplexor writes the data it receives on sideband
with xwrite(). We can lose data if the underlying write(2) results
in a short write.
If they are limited to unimportant bytes like eye-candy progress
meter, it may be OK to lose them, but lets be careful and ensure
that we use write_in_full() instead. Note that the original does
not check for errors, and this rewrite does not check for one. At
least not yet.
Junio C Hamano [Sat, 2 Mar 2024 19:03:46 +0000 (11:03 -0800)]
unpack: replace xwrite() loop with write_in_full()
We have two packfile stream consumers, index-pack and
unpack-objects, that allow excess payload after the packfile stream
data. Their code to relay excess data hasn't changed significantly
since their original implementation that appeared in 67e5a5ec
(git-unpack-objects: re-write to read from stdin, 2005-06-28) and 9bee2478 (mimic unpack-objects when --stdin is used with index-pack,
2006-10-25).
These code blocks contain hand-rolled loops using xwrite(), written
before our write_in_full() helper existed. This helper now provides
the same functionality.
Replace these loops with write_in_full() for shorter, clearer
code. Update related variables accordingly.
Junio C Hamano [Sat, 2 Mar 2024 17:13:51 +0000 (09:13 -0800)]
test_i18ngrep: hard deprecate and forbid its use
Since v2.44.0-rc0~109 (Merge branch 'sp/test-i18ngrep', 2023-12-27)
none of the tests we have, either in 'master' or in flight and
collected in 'seen', use test_i18ngrep.
Perhaps it is good time to update test_i18ngrep to BUG to avoid
people adding new calls to it.
Junio C Hamano [Fri, 1 Mar 2024 22:38:55 +0000 (14:38 -0800)]
Merge branch 'ja/doc-placeholders-markup-rules' into HEAD
The way placeholders are to be marked-up in documentation have been
specified; use "_<placeholder>_" to typeset the word inside a pair
of <angle-brakets> emphasized.
* ja/doc-placeholders-markup-rules:
doc: clarify the format of placeholders
Junio C Hamano [Fri, 1 Mar 2024 22:38:55 +0000 (14:38 -0800)]
Merge branch 'ps/reflog-list' into HEAD
"git reflog" learned a "list" subcommand that enumerates known reflogs.
* ps/reflog-list:
builtin/reflog: introduce subcommand to list reflogs
refs: stop resolving ref corresponding to reflogs
refs: drop unused params from the reflog iterator callback
refs: always treat iterators as ordered
refs/files: sort merged worktree and common reflogs
refs/files: sort reflogs returned by the reflog iterator
dir-iterator: support iteration in sorted order
dir-iterator: pass name to `prepare_next_entry_data()` directly
Junio C Hamano [Fri, 1 Mar 2024 22:38:54 +0000 (14:38 -0800)]
Merge branch 'ja/docfixes' into HEAD
Doc update.
* ja/docfixes:
doc: end sentences with full-stop
doc: close unclosed angle-bracket of a placeholder in git-clone doc
doc: git-rev-parse: enforce command-line description syntax
Git builtins used to be called like e.g. `git-commit`, not `git
commit` (*dashed form* and *non-dashed form*, respectively). The dashed
form was deprecated in version 1.5.4 (2006). Now only a few commands
have an alternative dashed form when `SKIP_DASHED_BUILT_INS` is
active.[1]
The mention here is from 2f7ee089dff (parse-options: Add a gitcli(5) man
page., 2007-12-13), back when the deprecation was relatively
recent. These days though it seems like an irrelevant point to make to
budding CLI scripters—you don’t have to warn against a style that
probably doesn’t even work on their git(1) installation.
† 1: 179227d6e21 (Optionally skip linking/copying the built-ins,
2020-09-21)
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is another preparatory refactor to unify the trailer formatters.
For background, note that the "trailers" string array is the
`char **trailers` member in `struct trailer_info` and that the
trailer_item objects are the elements of the `struct list_head *head`
linked list.
Currently trailer_info_get() only populates `char **trailers`. And
parse_trailers() first calls trailer_info_get() so that it can use the
`char **trailers` to populate a list of `struct trailer_item` objects
Instead of calling trailer_info_get() directly from
format_trailers_from_commit(), make it call parse_trailers() instead
because parse_trailers() already calls trailer_info_get().
This change is a NOP because format_trailer_info() (which
format_trailers_from_commit() wraps around) only looks at the "trailers"
string array, not the trailer_item objects which parse_trailers()
populates. For now we do need to create a dummy
LIST_HEAD(trailer_objects);
because parse_trailers() expects it in its signature.
In a future patch, we'll change format_trailer_info() to use the parsed
trailer_item objects (trailer_objects) instead of the `char **trailers`
array.
Signed-off-by: Linus Arver <linusa@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Linus Arver [Fri, 1 Mar 2024 00:14:45 +0000 (00:14 +0000)]
format_trailer_info(): move "fast path" to caller
This is another preparatory refactor to unify the trailer formatters.
This allows us to drop the "msg" parameter from format_trailer_info(),
so that it take 3 parameters, similar to format_trailers() which also
takes 3 parameters:
The short-term goal is to make format_trailer_info() be smart enough to
deprecate format_trailers(). And then ultimately we will rename
format_trailer_info() to format_trailers().
Signed-off-by: Linus Arver <linusa@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Linus Arver [Fri, 1 Mar 2024 00:14:44 +0000 (00:14 +0000)]
format_trailers(): use strbuf instead of FILE
This is another preparatory refactor to unify the trailer formatters.
Make format_trailers() also write to a strbuf, to align with
format_trailers_from_commit() which also does the same. Doing this makes
format_trailers() behave similar to format_trailer_info() (which will
soon help us replace one with the other).
Signed-off-by: Linus Arver <linusa@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Linus Arver [Fri, 1 Mar 2024 00:14:41 +0000 (00:14 +0000)]
trailer: move interpret_trailers() to interpret-trailers.c
The interpret-trailers.c builtin is the only place we need to call
interpret_trailers(), so move its definition there (together with a few
helper functions called only by it) and remove its external declaration
from <trailer.h>.
Several helper functions that are called by interpret_trailers() remain
in trailer.c because other callers in the same file still call them.
Declare them in <trailer.h> so that interpret_trailers() (now in
builtin/interpret-trailers.c) can continue calling them as a trailer API
user.
This enriches <trailer.h> with a more granular API, which can then be
unit-tested in the future (because interpret_trailers() by itself does
too many things to be able to be easily unit-tested).
Take this opportunity to demote some file-handling functions out of the
trailer API implementation, as these have nothing to do with trailers.
Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Linus Arver <linusa@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
and although they are similar enough (even taking the same
process_trailer_options struct pointer) they are used quite differently.
One might intuitively think that format_trailers_from_commit() builds on
top of format_trailers(), but this is not the case. Instead
format_trailers_from_commit() calls format_trailer_info() and
format_trailers() is never called in that codepath.
This is a preparatory refactor to help us deprecate format_trailers() in
favor of format_trailer_info() (at which point we can rename the latter
to the former). When the deprecation is complete, both
format_trailers_from_commit(), and the interpret-trailers builtin will
be able to call into the same helper function (instead of
format_trailers() and format_trailer_info(), respectively). Unifying the
formatters is desirable because it simplifies the API.
Reorder parameters for format_trailers_from_commit() to prefer
const struct process_trailer_options *opts
as the first parameter, because these options are intimately tied to
formatting trailers. And take
struct strbuf *out
last, because it's an "out parameter" (something that the caller wants
to use as the output of this function).
Similarly, reorder parameters for format_trailer_info(), because later
on we will unify the two together.
Signed-off-by: Linus Arver <linusa@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Linus Arver [Fri, 1 Mar 2024 00:14:40 +0000 (00:14 +0000)]
trailer: rename functions to use 'trailer'
Rename process_trailers() to interpret_trailers(), because it matches
the name for the builtin command of the same name
(git-interpret-trailers), which is the sole user of process_trailers().
In a following commit, we will move "interpret_trailers" from trailer.c
to builtin/interpret-trailers.c. That move will necessitate the growth
of the trailer.h API, forcing us to expose some additional functions in
trailer.h.
Rename relevant functions so that they include the term "trailer" in
their name, so that clients of the API will be able to easily identify
them by their "trailer" moniker, just like all the other functions
already exposed by trailer.h.
Rename `struct list_head *head` to `struct list_head *trailers` because
"head" conveys no additional information beyond the "list_head" type.
Reorder parameters for format_trailers_from_commit() to prefer
const struct process_trailer_options *opts
as the first parameter, because these options are intimately tied to
formatting trailers. Parameters like `FILE *outfile` should be last
because they are a kind of 'out' parameter, so put such parameters at
the end. This will be the pattern going forward in this series.
Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Linus Arver <linusa@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Linus Arver [Fri, 1 Mar 2024 00:14:39 +0000 (00:14 +0000)]
shortlog: add test for de-duplicating folded trailers
The shortlog builtin was taught to use the trailer iterator interface in 47beb37bc6 (shortlog: match commit trailers with --group, 2020-09-27).
The iterator always unfolds values and this has always been the case
since the time the iterator was first introduced in f0939a0eb1 (trailer:
add interface for iterating over commit trailers, 2020-09-27). Add a
comment line to remind readers of this behavior.
The fact that the iterator always unfolds values is important
(at least for shortlog) because unfolding allows it to recognize both
folded and unfolded versions of the same trailer for de-duplication.
Capture the existing behavior in a new test case to guard against
regressions in this area. This test case is based off of the existing
"shortlog de-duplicates trailers in a single commit" just above it. Now
if we were to remove the call to
unfold_value(&iter->val);
inside the iterator, this new test case will break.
Signed-off-by: Linus Arver <linusa@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Linus Arver [Fri, 1 Mar 2024 00:14:38 +0000 (00:14 +0000)]
trailer: free trailer_info _after_ all related usage
In de7c27a186 (trailer: use offsets for trailer_start/trailer_end,
2023-10-20), we started using trailer block offsets in trailer_info. In
particular, we dropped the use of a separate stack variable "size_t
trailer_end", in favor of accessing the new "trailer_block_end" member
of trailer_info (as "info.trailer_block_end").
At that time, we forgot to also move the
trailer_info_release(&info);
line to be _after_ this new use of the trailer_info struct. Move it now.
Note that even without this patch, we didn't have leaks or any other
problems because trailer_info_release() only frees memory allocated on
the heap. The "trailer_block_end" member was allocated on the stack back
then (as it is now) so it was still safe to use for all this time.
Reported-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Linus Arver <linusa@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
By and large, variable groupings in Documentation/config.txt are sorted
alphabetically, though a few are not. Those outliers make it more
difficult to find a specific grouping when quickly running an eye over
the list to locate a variable of interest. Address this shortcoming by
sorting the groupings alphabetically.
NOTE: This change only sorts the top-level groupings (i.e. "core.*"
comes after "completion.*"); it does not touch the ordering of variables
within each group since variables within individual groups might
intentionally be ordered in some other fashion (such as
most-common-first or most-important-first).
Reported-by: Bruno Haible <bruno@clisp.org> Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Eugenio Gigante [Thu, 29 Feb 2024 19:44:44 +0000 (20:44 +0100)]
add: use unsigned type for collection of bits
The 'refresh' function in 'builtin/add.c' declares 'flags' as
signed, and passes it as an argument to the 'refresh_index'
function, which though expects an unsigned value.
Since in this case 'flags' represents a bag of bits, whose MSB is
not used in special ways, change the type of 'flags' to unsigned.
Signed-off-by: Eugenio Gigante <giganteeugenio2@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Wed, 28 Feb 2024 22:50:50 +0000 (17:50 -0500)]
upload-pack: only accept packfile-uris if we advertised it
Clients are only supposed to request particular capabilities or features
if the server advertised them. For the "packfile-uris" feature, we only
advertise it if uploadpack.blobpacfileuri is set, but we always accept a
request from the client regardless.
In practice this doesn't really hurt anything, as we'd pass the client's
protocol list on to pack-objects, which ends up ignoring it. But we
should try to follow the protocol spec, and tightening this up may catch
buggy or misbehaving clients more easily.
Thanks to recent refactoring, we can hoist the config check from
upload_pack_advertise() into upload_pack_config(). Note the subtle
handling of a value-less bool (which does not count for triggering an
advertisement).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit-reach(repo_get_merge_bases_many_dirty): pass on errors
(Actually, this commit is only about passing on "missing commits"
errors, but adding that to the commit's title would have made it too
long.)
The `merge_bases_many()` function was just taught to indicate parsing
errors, and now the `repo_get_merge_bases_many_dirty()` function is
aware of that, too.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit-reach(repo_get_merge_bases_many): pass on "missing commits" errors
The `merge_bases_many()` function was just taught to indicate parsing
errors, and now the `repo_get_merge_bases_many()` function is aware of
that, too.
Naturally, there are a lot of callers that need to be adjusted now, too.
Next stop: `repo_get_merge_bases_dirty()`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit-reach(get_octopus_merge_bases): pass on "missing commits" errors
The `merge_bases_many()` function was just taught to indicate parsing
errors, and now the `repo_get_merge_bases()` function (which is also
surfaced via the `get_merge_bases()` macro) is aware of that, too.
Naturally, the callers need to be adjusted now, too.
Next step: adjust `repo_get_merge_bases_many()`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit-reach(repo_get_merge_bases): pass on "missing commits" errors
The `merge_bases_many()` function was just taught to indicate parsing
errors, and now the `repo_get_merge_bases()` function (which is also
surfaced via the `repo_get_merge_bases()` macro) is aware of that, too.
Naturally, there are a lot of callers that need to be adjusted now, too.
Next step: adjust the callers of `get_octopus_merge_bases()`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit-reach(merge_bases_many): pass on "missing commits" errors
The `paint_down_to_common()` function was just taught to indicate
parsing errors, and now the `merge_bases_many()` function is aware of
that, too.
One tricky aspect is that `merge_bases_many()` parses commits of its
own, but wants to gracefully handle the scenario where NULL is passed as
a merge head, returning the empty list of merge bases. The way this was
handled involved calling `repo_parse_commit(NULL)` and relying on it to
return an error. This has to be done differently now so that we can
handle missing commits correctly by producing a fatal error.
Next step: adjust the caller of `merge_bases_many()`:
`get_merge_bases_many_0()`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a commit cannot be parsed, it is currently ignored when looking for
merge bases. That's undesirable as the operation can pretend success in
a corrupt repository, even though the command should fail with an error
message.
Let's start at the bottom of the stack by teaching the
`paint_down_to_common()` function to return an `int`: if negative, it
indicates fatal error, if 0 success.
This requires a couple of callers to be adjusted accordingly.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit-reach(paint_down_to_common): prepare for handling shallow commits
When `git fetch --update-shallow` needs to test for commit ancestry, it
can naturally run into a missing object (e.g. if it is a parent of a
shallow commit). For the purpose of `--update-shallow`, this needs to be
treated as if the child commit did not even have that parent, i.e. the
commit history needs to be clamped.
For all other scenarios, clamping the commit history is actually a bug,
as it would hide repository corruption (for an analysis regarding
shallow and partial clones, see the analysis further down).
Add a flag to optionally ask the function to ignore missing commits, as
`--update-shallow` needs it to, while detecting missing objects as a
repository corruption error by default.
This flag is needed, and cannot be replaced by `is_repository_shallow()`
to indicate that situation, because that function would return 0 in the
`--update-shallow` scenario: There is not actually a `shallow` file in
that scenario, as demonstrated e.g. by t5537.10 ("add new shallow root
with receive.updateshallow on") and t5538.4 ("add new shallow root with
receive.updateshallow on").
Note: shallow commits' parents are set to `NULL` internally already,
therefore there is no need to special-case shallow repositories here, as
the merge-base logic will not try to access parent commits of shallow
commits.
Likewise, partial clones aren't an issue either: If a commit is missing
during the revision walk in the merge-base logic, it is fetched via
`promisor_remote_get_direct()`. And not only the single missing commit
object: Due to the way the "promised" objects are fetched (in
`fetch_objects()` in `promisor-remote.c`, using `fetch
--filter=blob:none`), there is no actual way to fetch a single commit
object, as the remote side will pass that commit OID to `pack-objects
--revs [...]` which in turn passes it to `rev-list` which interprets
this as a commit _range_ instead of a single object. Therefore, in
partial clones (unless they are shallow in addition), all commits
reachable from a commit that is in the local object database are also
present in that local database.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Wed, 28 Feb 2024 22:48:18 +0000 (17:48 -0500)]
upload-pack: use existing config mechanism for advertisement
When serving a v2 capabilities request, we call upload_pack_advertise()
to tell us the set of features we can advertise to the client. That
involves looking at various config options, all of which need to be kept
in sync with the rules we use in upload_pack_config to set flags like
allow_filter, allow_sideband_all, and so on. If these two pieces of code
get out of sync then we may refuse to respect a capability we
advertised, or vice versa accept one that we should not.
Instead, let's call the same config helper that we'll use for processing
the actual client request, and then just pick the values out of the
resulting struct. This is only a little bit shorter than the current
code, but we don't repeat any policy logic (e.g., we don't have to worry
about the magic sideband-all environment variable here anymore).
And this reveals a gap in the existing code: there is no struct flag for
the packfile-uris capability (we accept it even if it is not advertised,
which we should not). We'll leave the advertisement code for now and
deal with it in the next patch.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Wed, 28 Feb 2024 22:47:18 +0000 (17:47 -0500)]
upload-pack: centralize setup of sideband-all config
We read uploadpack.allowsidebandall to set a matching flag in our
upload_pack_data struct. But for our tests, we also respect
GIT_TEST_SIDEBAND_ALL from the environment, and anybody looking at the
flag in the struct needs to remember to check both. There's only one
such piece of code now, but we're about to add another.
So let's have the config step actually fold the environment value into
the struct, letting the rest of the code use the flag in the obvious
way.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Wed, 28 Feb 2024 22:46:47 +0000 (17:46 -0500)]
upload-pack: use repository struct to get config
Our upload_pack_v2() function gets a repository struct, but we ignore it
totally. In practice this doesn't cause any problems, as it will never
differ from the_repository. But in the spirit of taking a small step
towards getting rid of the_repository, let's at least starting using it
to grab config. There are probably other spots that could benefit, but
it's a start.
Note that we don't need to pass the repo for protected_config(); the
whole point there is that we are not looking at repo config, so there is
no repo-specific version of the function.
For the v0 version of the protocol, we're not passed a repository
struct, so we'll continue to use the_repository there.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Wed, 28 Feb 2024 22:39:07 +0000 (17:39 -0500)]
upload-pack: free tree buffers after parsing
When a client sends us a "want" or "have" line, we call parse_object()
to get an object struct. If the object is a tree, then the parsed state
means that tree->buffer points to the uncompressed contents of the tree.
But we don't really care about it. We only really need to parse commits
and tags; for trees and blobs, the important output is just a "struct
object" with the correct type.
But much worse, we do not ever free that tree buffer. It's not leaked in
the traditional sense, in that we still have a pointer to it from the
global object hash. But if the client requests many trees, we'll hold
all of their contents in memory at the same time.
Nobody really noticed because it's rare for clients to directly request
a tree. It might happen for a lightweight tag pointing straight at a
tree, or it might happen for a "tree:depth" partial clone filling in
missing trees.
But it's also possible for a malicious client to request a lot of trees,
causing upload-pack's memory to balloon. For example, without this
patch, requesting every tree in git.git like:
pktline() {
local msg="$*"
printf "%04x%s\n" $((1+4+${#msg})) "$msg"
}
want_trees() {
pktline command=fetch
printf 0001
git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' |
while read oid type; do
test "$type" = "tree" || continue
pktline want $oid
done
pktline done
printf 0000
}
shows a peak heap usage of ~3.7GB. Which is just about the sum of the
sizes of all of the uncompressed trees. For linux.git, it's closer to
17GB.
So the obvious thing to do is to call free_tree_buffer() after we
realize that we've parsed a tree. We know that upload-pack won't need it
later. But let's push the logic into parse_object_with_flags(), telling
it to discard the tree buffer immediately. There are two reasons for
this. One, all of the relevant call-sites already call the with_options
variant to pass the SKIP_HASH flag. So it actually ends up as less code
than manually free-ing in each spot. And two, it enables an extra
optimization that I'll discuss below.
I've touched all of the sites that currently use SKIP_HASH in
upload-pack. That drops the peak heap of the upload-pack invocation
above from 3.7GB to ~24MB.
I've also modified the caller in get_reference(); a partial clone
benefits from its use in pack-objects for the reasons given in 0bc2557951 (upload-pack: skip parse-object re-hashing of "want" objects,
2022-09-06), where we were measuring blob requests. But note that the
results of get_reference() are used for traversing, as well; so we
really would _eventually_ use the tree contents. That makes this at
first glance a space/time tradeoff: we won't hold all of the trees in
memory at once, but we'll have to reload them each when it comes time to
traverse.
And here's where our extra optimization comes in. If the caller is not
going to immediately look at the tree contents, and it doesn't care
about checking the hash, then parse_object() can simply skip loading the
tree entirely, just like we do for blobs! And now it's not a space/time
tradeoff in get_reference() anymore. It's just a lazy-load: we're
delaying reading the tree contents until it's time to actually traverse
them one by one.
And of course for upload-pack, this optimization means we never load the
trees at all, saving lots of CPU time. Timing the "every tree from
git.git" request above shows upload-pack dropping from 32 seconds of CPU
to 19 (the remainder is mostly due to pack-objects actually sending the
pack; timing just the upload-pack portion shows we go from 13s to
~0.28s).
These are all highly gamed numbers, of course. For real-world
partial-clone requests we're saving only a small bit of time in
practice. But it does help harden upload-pack against malicious
denial-of-service attacks.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Wed, 28 Feb 2024 22:39:03 +0000 (17:39 -0500)]
upload-pack: use PARSE_OBJECT_SKIP_HASH_CHECK in more places
In commit 0bc2557951 (upload-pack: skip parse-object re-hashing of
"want" objects, 2022-09-06), we optimized the parse_object() calls for
v2 "want" lines from the client so that they avoided parsing blobs, and
so that they used the commit-graph rather than parsing commit objects
from scratch.
We should extend that to two other spots:
1. We parse "have" objects in the got_oid() function. These won't
generally be non-commits (unlike "want" lines from a partial
clone). But we still benefit from the use of the commit-graph.
2. For v0, the "want" lines are parsed in receive_needs(). These are
also less likely to be non-commits because by default they have to
be ref tips. There are config options you might set to allow
non-tip objects, but you'd mostly do so to support partial clones,
and clients recent enough to support partial clone will generally
speak v2 anyway.
So I don't expect this change to improve performance much for day-to-day
operations. But both are possible denial-of-service vectors, where an
attacker can waste our time by sending over a large number of objects to
parse (of course we may waste even more time serving a pack to them, but
we try as much as possible to optimize that in pack-objects; we should
do what we can here in upload-pack, too).
With this patch, running p5600 with GIT_TEST_PROTOCOL_VERSION=0 shows
similar results to what we saw in 0bc2557951 (which ran with the v2
protocol by default). Here are the numbers for linux.git:
Test HEAD^ HEAD
-----------------------------------------------------------------------------
5600.3: checkout of result 50.91(87.95+2.93) 41.75(79.00+3.18) -18.0%
Or for a more extreme (and malicious) case, we can claim to "have" every
blob in git.git over the v0 protocol:
$ time ./git.old upload-pack . <input >/dev/null
real 0m52.951s
user 0m51.633s
sys 0m1.304s
$ time ./git.new upload-pack . <input >/dev/null
real 0m0.261s
user 0m0.156s
sys 0m0.105s
(Note that these don't actually compute a pack because of the hacky
protocol usage, so those numbers are representing the raw blob-parsing
effort done by upload-pack).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Wed, 28 Feb 2024 22:39:00 +0000 (17:39 -0500)]
upload-pack: always turn off save_commit_buffer
When the client sends us "want $oid" lines, we call parse_object($oid)
to get an object struct. It's important to parse the commits because we
need to traverse them in the negotiation phase. But of course we don't
need to hold on to the commit messages for each one.
We've turned off the save_commit_buffer flag in get_common_commits() for
a long time, since f0243f26f6 (git-upload-pack: More efficient usage of
the has_sha1 array, 2005-10-28). That helps with the commits we see
while actually traversing. But:
1. That function is only used by the v0 protocol. I think the v2
protocol's code path leaves the flag on (and thus pays the extra
memory penalty), though I didn't measure it specifically.
2. If the client sends us a bunch of "want" lines, that happens before
the negotiation phase. So we'll hold on to all of those commit
messages. Generally the number of "want" lines scales with the
refs, not with the number of objects in the repo. But a malicious
client could send a lot in order to waste memory.
As an example of (2), if I generate a request to fetch all commits in
git.git like this:
pktline() {
local msg="$*"
printf "%04x%s\n" $((1+4+${#msg})) "$msg"
}
want_commits() {
pktline command=fetch
printf 0001
git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' |
while read oid type; do
test "$type" = "commit" || continue
pktline want $oid
done
pktline done
printf 0000
}
before this patch upload-pack peaks at ~125MB, and after at ~35MB. The
difference is not coincidentally about the same as the sum of all commit
object sizes as computed by:
git cat-file --batch-all-objects --batch-check='%(objecttype) %(objectsize)' |
perl -alne '$v += $F[1] if $F[0] eq "commit"; END { print $v }'
In a larger repository like linux.git, that number is ~1GB.
In a repository with a full commit-graph file this will have no impact
(and the commit graph would save us from parsing at all, so is a much
better solution!). But it's easy to do, might help a little in
real-world cases (where even if you have a commit graph it might not be
fully up to date), and helps a lot for a worst-case malicious request.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Wed, 28 Feb 2024 22:38:58 +0000 (17:38 -0500)]
upload-pack: disallow object-info capability by default
We added an "object-info" capability to the v2 upload-pack protocol in a2ba162cda (object-info: support for retrieving object info,
2021-04-20). In the almost 3 years since, we have not added any
client-side support, and it does not appear to exist in other
implementations either (JGit understands the verb on the server side,
but not on the client side).
Since this largely unused code is accessible over the network by
default, it increases the attack surface of upload-pack. I don't know of
any particularly severe problem, but one issue is that because of the
request/response nature of the v2 protocol, it will happily read an
unbounded number of packets, adding each one to a string list (without
regard to whether they are objects we know about, duplicates, etc).
This may be something we want to improve in the long run, but in the
short term it makes sense to disable the feature entirely. We'll add a
config option as an escape hatch for anybody who wants to develop the
feature further.
A more gentle option would be to add the config option to let people
disable it manually, but leave it enabled by default. But given that
there's no client side support, that seems like the wrong balance with
security.
Disabling by default will slow adoption a bit once client-side support
does become available (there were some patches[1] in 2022, but nothing
got merged and there's been nothing since). But clients have to deal
with older servers that do not understand the option anyway (and the
capability system handles that), so it will just be a matter of servers
flipping their config at that point (and hopefully once any unbounded
allocations have been addressed).
[jk: this is a patch that GitHub has been running for several years, but
rebased forward and with a new commit message for upstream]
Jeff King [Wed, 28 Feb 2024 22:38:46 +0000 (17:38 -0500)]
upload-pack: accept only a single packfile-uri line
When we see a packfile-uri line from the client, we use
string_list_split() to split it on commas and store the result in a
string_list. A single packfile-uri line is therefore limited to storing
~64kb, the size of a pkt-line.
But we'll happily accept multiple such lines, and each line appends to
the string list, growing without bound.
In theory this could be useful, making:
0017packfile-uris http
0018packfile-uris https
equivalent to:
001dpackfile-uris http,https
But the protocol documentation doesn't indicate that this should work
(and indeed, refers to this in the singular as "the following argument
can be included in the client's request"). And the client-side
implementation in fetch-pack has always sent a single line (JGit appears
to understand the line on the server side but has no client-side
implementation, and libgit2 understands neither).
If we were worried about compatibility, we could instead just put a
limit on the maximum number of values we'd accept. The current client
implementation limits itself to only two values: "http" and "https", so
something like "256" would be more than enough. But accepting only a
single line seems more in line with the protocol documentation, and
matches other parts of the protocol (e.g., we will not accept a second
"filter" line).
We'll also make this more explicit in the protocol documentation; as
above, I think this was always the intent, but there's no harm in making
it clear.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Wed, 28 Feb 2024 22:38:40 +0000 (17:38 -0500)]
upload-pack: use a strmap for want-ref lines
When the "ref-in-want" capability is advertised (which it is not by
default), then upload-pack processes a "want-ref" line from the client
by checking that the name is a valid ref and recording it in a
string-list.
In theory this list should grow no larger than the number of refs in the
server-side repository. But since we don't do any de-duplication, a
client which sends "want-ref refs/heads/foo" over and over will cause
the array to grow without bound.
We can fix this by switching to strmap, which efficiently detects
duplicates. There are two client-visible changes here:
1. The "wanted-refs" response will now be in an apparently-random
order (based on iterating the hashmap) rather than the order given
by the client. The protocol documentation is quiet on ordering
here. The current fetch-pack implementation is happy with any
order, as it looks up each returned ref using a binary search in
its local sorted list. JGit seems to implement want-ref on the
server side, but has no client-side support. libgit2 doesn't
support either side.
It would obviously be possible to record the original order or to
use the strmap as an auxiliary data structure. But if the client
doesn't care, we may as well do the simplest thing.
2. We'll now reject duplicates explicitly as a protocol error. The
client should never send them (and our current implementation, even
when asked to "git fetch master:one master:two" will de-dup on the
client side).
If we wanted to be more forgiving, we could perhaps just throw away
the duplicates. But then our "wanted-refs" response back to the
client would omit the duplicates, and it's hard to say what a
client that accidentally sent a duplicate would do with that. So I
think we're better off to complain loudly before anybody
accidentally writes such a client.
Let's also add a note to the protocol documentation clarifying that
duplicates are forbidden. As discussed above, this was already the
intent, but it's not very explicit.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Wed, 28 Feb 2024 22:37:44 +0000 (17:37 -0500)]
upload-pack: use oidset for deepen_not list
We record the oid of every deepen-not line the client sends to us. For a
well-behaved client, the resulting array should be bounded by the number
of unique refs we have. But because there's no de-duplication, a
malicious client can cause the array to grow unbounded by just sending
the same "refs/heads/foo" over and over (assuming such a ref exists).
Since the deepen-not list is just being fed to a "rev-list --not"
traversal, the order of items doesn't matter. So we can replace the
oid_array with an oidset which notices and skips duplicates.
That bounds the memory in malicious cases to be linear in the number of
unique refs. And even in non-malicious cases, there may be a slight
improvement in memory usage if multiple refs point to the same oid
(though in practice this list is probably pretty tiny anyway, as it
comes from the user specifying "--shallow-exclude" on the client fetch).
Note that in the trace2 output we'll now output the number of
de-duplicated objects, rather than the total number of "deepen-not"
lines we received. This is arguably a more useful value for tracing /
debugging anyway.
Reported-by: Benjamin Flesch <benjaminflesch@icloud.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Wed, 28 Feb 2024 22:37:20 +0000 (17:37 -0500)]
upload-pack: switch deepen-not list to an oid_array
When we see a "deepen-not" line from the client, we verify that the
given name can be resolved as a ref, and then add it to a string list to
be passed later to an internal "rev-list --not" traversal. We record the
actual refname in the string list (so the traversal resolves it again
later), but we'd be better off recording the resolved oid:
1. There's a tiny bit of wasted work in resolving it twice.
2. There's a small race condition with simultaneous updates; the later
traversal may resolve to a different value (or not at all). This
shouldn't cause any bad behavior (we do not care about the value
in this first resolution, so whatever value rev-list gets is OK)
but it could mean a confusing error message (if upload-pack fails
to resolve the ref it produces a useful message, but a failing
traversal later results in just "revision walk setup failed").
3. It makes it simpler to de-duplicate the results. We don't de-dup at
all right now, but we will in the next patch.
>From the client's perspective the behavior should be the same.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Wed, 28 Feb 2024 22:37:13 +0000 (17:37 -0500)]
upload-pack: drop separate v2 "haves" array
When upload-pack sees a "have" line in the v0 protocol, it immediately
calls got_oid() with its argument and potentially produces an ACK
response. In the v2 protocol, we simply record the argument in an
oid_array, and only later process all of the "have" objects by calling
the equivalent of got_oid() on the contents of the array.
This makes some sense, as v2 is a pure request/response protocol, as
opposed to v0's asynchronous negotiation phase. But there's a downside:
a client can send us an infinite number of garbage "have" lines, which
we'll happily slurp into the array, consuming memory. Whereas in v0,
they are limited by the number of objects in the repository (because
got_oid() only records objects we have ourselves, and we avoid
duplicates by setting a flag on the object struct).
We can make v2 behave more like v0 by also calling got_oid() directly
when v2 parses a "have" line. Calling it early like this is OK because
got_oid() itself does not interact with the client; it only confirms
that we have the object and sets a few flags. Note that unlike v0, v2
does not ever (before or after this patch) check the return code of
got_oid(), which lets the caller know whether we have the object. But
again, that makes sense; v0 is using it to asynchronously tell the
client to stop sending. In v2's synchronous protocol, we just discard
those entries (and decide how to ACK at the end of each round).
There is one slight tweak we need, though. In v2's state machine, we
reach the SEND_ACKS state if the other side sent us any "have" lines,
whether they were useful or not. Right now we do that by checking
whether the "have" array had any entries, but if we record only the
useful ones, that doesn't work. Instead, we can add a simple boolean
that tells us whether we saw any have line (even if it was useless).
This lets us drop the "haves" array entirely, as we're now placing
objects directly into the "have_obj" object array (which is where
got_oid() put them in the long run anyway). And as a bonus, we can drop
the secondary "common" array used in process_haves_and_send_acks(). It
was essentially a copy of "haves" minus the objects we do not have. But
now that we are using "have_obj" directly, we know everything in it is
useful. So in addition to protecting ourselves against malicious input,
we should slightly lower our memory usage for normal inputs.
Note that there is one user-visible effect. The trace2 output records
the number of "haves". Previously this was the total number of "have"
lines we saw, but now is the number of useful ones. We could retain the
original meaning by keeping a separate counter, but it doesn't seem
worth the effort; this trace info is for debugging and metrics, and
arguably the count of common oids is at least as useful as the total
count.
Reported-by: Benjamin Flesch <benjaminflesch@icloud.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Michael Lohmann [Wed, 28 Feb 2024 13:54:54 +0000 (08:54 -0500)]
revision: implement `git log --merge` also for rebase/cherry-pick/revert
'git log' learned in ae3e5e1ef2 (git log -p --merge [[--] paths...],
2006-07-03) to show commits touching conflicted files in the range
HEAD...MERGE_HEAD, an addition documented in d249b45547 (Document
rev-list's option --merge, 2006-08-04).
It can be useful to look at the commit history to understand what lead
to merge conflicts also for other mergy operations besides merges, like
cherry-pick, revert and rebase.
For rebases and cherry-picks, an interesting range to look at is
HEAD...{REBASE_HEAD,CHERRY_PICK_HEAD}, since even if all the commits
included in that range are not directly part of the 3-way merge,
conflicts encountered during these operations can indeed be caused by
changes introduced in preceding commits on both sides of the history.
For revert, as we are (most likely) reversing changes from a previous
commit, an appropriate range is REVERT_HEAD..HEAD, which is equivalent
to REVERT_HEAD...HEAD and to HEAD...REVERT_HEAD, if we keep HEAD and its
parents on the left side of the range.
As such, adjust the code in prepare_show_merge so it constructs the
range HEAD...$OTHER for OTHER={MERGE_HEAD, CHERRY_PICK_HEAD, REVERT_HEAD
or REBASE_HEAD}. Note that we try these pseudorefs in order, so keep
REBASE_HEAD last since the three other operations can be performed
during a rebase. Note also that in the uncommon case where $OTHER and
HEAD do not share a common ancestor, this will show the complete
histories of both sides since their root commits, which is the same
behaviour as currently happens in that case for HEAD and MERGE_HEAD.
Adjust the documentation of this option accordingly.
Co-authored-by: Johannes Sixt <j6t@kdbg.org> Co-authored-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Michael Lohmann <mi.al.lohmann@gmail.com>
[jc: tweaked in j6t's precedence fix that tries REBASE_HEAD last] Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Michael Lohmann [Wed, 28 Feb 2024 13:54:53 +0000 (08:54 -0500)]
revision: ensure MERGE_HEAD is a ref in prepare_show_merge
This is done to
(1) ensure MERGE_HEAD is a ref,
(2) obtain the oid without any prefixing by refs.c:repo_dwim_ref()
(3) error out when MERGE_HEAD is a symref.
Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Michael Lohmann <mi.al.lohmann@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some functions in Git's source code follow the convention that returning
a negative value indicates a fatal error, e.g. repository corruption.
Let's use this convention in `repo_in_merge_bases()` to report when one
of the specified commits is missing (i.e. when `repo_parse_commit()`
reports an error).
Also adjust the callers of `repo_in_merge_bases()` to handle such
negative return values.
Note: As of this patch, errors are returned only if any of the specified
merge heads is missing. Over the course of the next patches, missing
commits will also be reported by the `paint_down_to_common()` function,
which is called by `repo_in_merge_bases_many()`, and those errors will
be properly propagated back to the caller at that stage.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently this function treats unrelated commit histories the same way
as commit histories with missing commit objects.
Typically, missing commit objects constitute a corrupt repository,
though, and should be reported as such. The next commits will make it
so, but there is one exception: In `git fetch --update-shallow` we
_expect_ commit objects to be missing, and we do want to treat the
now-incomplete commit histories as unrelated.
To allow for that, let's introduce an additional parameter that is
passed to `repo_in_merge_bases_many()` to trigger this behavior, and use
it in the two callers in `shallow.c`.
This commit changes behavior slightly: unless called from the
`shallow.c` functions that set the `ignore_missing_commits` bit, any
non-existing tip commit that is passed to `repo_in_merge_bases_many()`
will now result in an error.
Note: When encountering missing commits while traversing the commit
history in search for merge bases, with this commit there won't be a
change in behavior just yet, their children will still be interpreted as
root commits. This bug will get fixed by follow-up commits.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit-reach(paint_down_to_common): plug two memory leaks
When a commit is missing, we return early (currently pretending that no
merge basis could be found in that case). At that stage, it is possible
that a merge base could have been found already, and added to the
`result`, which is now leaked.
The priority queue has a similar issue: There might still be a commit in
that queue.
Let's release both, to address the potential memory leaks.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Christian Couder [Wed, 28 Feb 2024 09:10:11 +0000 (10:10 +0100)]
revision: fix --missing=[print|allow*] for annotated tags
In 9830926c7d (rev-list: add commit object support in `--missing`
option, 2023-10-27) we fixed the `--missing` option in `git rev-list`
so that it works with missing commits, not just blobs/trees.
Unfortunately, such a command was still failing with a "fatal: bad
object <oid>" if it was passed a missing commit, blob or tree as an
argument (before the rev walking even begins). This was fixed in a
recent commit.
That fix still doesn't work when an argument passed to the command is
an annotated tag pointing to a missing commit though. In that case
`git rev-list --missing=...` still errors out with a "fatal: bad
object <oid>" error where <oid> is the object ID of the missing
commit.
Let's fix this issue, and also, while at it, let's add tests not just
for annotated tags but also for regular tags and branches.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>