Taylor Blau [Thu, 14 Dec 2023 22:24:44 +0000 (17:24 -0500)]
pack-bitmap: enable reuse from all bitmapped packs
Now that both the pack-bitmap and pack-objects code are prepared to
handle marking and using objects from multiple bitmapped packs for
verbatim reuse, allow marking objects from all bitmapped packs as
eligible for reuse.
Within the `reuse_partial_packfile_from_bitmap()` function, we no longer
only mark the pack whose first object is at bit position zero for reuse,
and instead mark any pack contained in the MIDX as a reuse candidate.
Provide a handful of test cases in a new script (t5332) exercising
interesting behavior for multi-pack reuse to ensure that we performed
all of the previous steps correctly.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 14 Dec 2023 22:24:42 +0000 (17:24 -0500)]
pack-objects: allow setting `pack.allowPackReuse` to "single"
In e704fc7978 (pack-objects: introduce pack.allowPackReuse, 2019-12-18),
the `pack.allowPackReuse` configuration option was introduced, allowing
users to disable the pack reuse mechanism.
To prepare for debugging multi-pack reuse, allow setting configuration
to "single" in addition to the usual bool-or-int values.
"single" implies the same behavior as "true", "1", "yes", and so on. But
it will complement a new "multi" value (to be introduced in a future
commit). When set to "single", we will only perform pack reuse on a
single pack, regardless of whether or not there are multiple MIDX'd
packs.
This requires no code changes (yet), since we only support single pack
reuse.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 14 Dec 2023 22:24:36 +0000 (17:24 -0500)]
pack-objects: add tracing for various packfile metrics
As part of the multi-pack reuse effort, we will want to add some tests
that assert that we reused a certain number of objects from a certain
number of packs.
We could do this by grepping through the stderr output of
`pack-objects`, but doing so would be brittle in case the output format
changed.
Instead, let's use the trace2 mechanism to log various pieces of
information about the generated packfile, which we can then use to
compare against desired values.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 14 Dec 2023 22:24:34 +0000 (17:24 -0500)]
pack-bitmap: prepare to mark objects from multiple packs for reuse
Now that the pack-objects code is equipped to handle reusing objects
from multiple packs, prepare the pack-bitmap code to mark objects from
multiple packs as reuse candidates.
In order to prepare the pack-bitmap code for this change, remove the
same set of assumptions we unwound in previous commits from the helper
function `reuse_partial_packfile_from_bitmap_1()`, in preparation for it
to be called in a loop over the set of bitmapped packs in a following
commit.
Most importantly, we can no longer assume that the bit position
corresponding to the first object in a given reuse pack candidate is at
the beginning of the bitmap itself.
For the single pack that this assumption is still true for (in MIDX
bitmaps, this is the preferred pack, in single-pack bitmaps it is the
pack the bitmap is tied to), we can still use our whole-words
optimization.
But for all subsequent packs, we can not make use of this optimization,
since it assumes that all delta bases are being sent from the same pack,
which would break if we are sending OFS_DELTAs down to the client. To
understand why, consider two packs, P1 and P2 where:
- P1 has object A which is a delta on base B
- P2 has its own copy of B, in addition to other objects
Suppose that the MIDX which covers P1 and P2 selected its copy of A from
P1, but selected its copy of B from P2. Since A is a delta of B, but the
base was selected from a different pack, sending the bytes corresponding
to A as an OFS_DELTA verbatim from P1 would be incorrect, since we don't
guarantee that B is in the same place relative to A in the generated
pack as in P1.
For now, we detect and reject these cross-pack deltas by searching for
the (pack_id, offset) pair for the delta's base object (using the same
pack_id as the pack containing the delta'd object) in the MIDX. If we
find a match, that means that the MIDX did indeed pick the base object
from the same pack, and we are OK to reuse the delta.
If we don't find a match, however, that means that the base object was
selected from a different pack in the MIDX, and we can let the slower
path handle re-delta'ing our candidate object.
In the future, there are a couple of other things we could do, namely:
- Turn any cross-pack deltas (which are stored as OFS_DELTAs) into
REF_DELTAs. We already do this today when reusing an OFS_DELTA
without `--delta-base-offset` enabled, so it's not a huge stretch to
do the same for cross-pack deltas even when `--delta-base-offset` is
enabled.
This would work, but would obviously result in larger-than-necessary
packs, as we in theory *could* represent these cross-pack deltas by
patching an existing OFS_DELTA. But it's not clear how much that
would matter in practice. I suspect it would have a lot to do with
how you pack your repository in the first place.
- Finally, we could patch OFS_DELTAs across packs in a similar fashion
as we do today for OFS_DELTAs within a single pack on either side of
a gap. This would result in the smallest packs of the three options
here, but implementing this would be more involved.
At minimum, you'd have to keep the reusable chunks list for all
reused packs, not just the one we're currently processing. And you'd
have to ensure that any bases which are a part of cross-pack deltas
appear before the delta. I think this is possible to do, but would
require assembling the reusable chunks list potentially in a
different order than they appear in the source packs.
For now, let's pursue the simplest approach and reject any cross-pack
deltas.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we have extracted the `midx_key_to_pack_pos()` function, we can
implement the `midx_pair_to_pack_pos()` function which accepts (pack_id,
offset) tuples and returns an index into the psuedo-pack order.
This will be used in a following commit in order to figure out whether
or not the MIDX chose a given delta's base object from the same pack as
the delta resides in. It will do so by locating the base object's offset
in the pack, and then performing a binary search using the same pack ID
with the base object's offset.
If (and only if) it finds a match (at any position) we can guarantee
that the MIDX selected both halves of the delta/base pair from the same
pack.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 14 Dec 2023 22:24:28 +0000 (17:24 -0500)]
pack-revindex: factor out `midx_key_to_pack_pos()` helper
The `midx_to_pack_pos()` function implements a binary search over
objects in the MIDX between lexical and pseudo-pack order. It does this
by taking in an index into the lexical order (i.e. the same argument
you'd use for `nth_midxed_object_id()` and similar) and spits out a
position in the pseudo-pack order.
This works for all callers, since they currently all are translating
from lexical order to pseudo-pack order. But future callers may want to
translate a known (offset, pack_id) tuple into an index into the
psuedo-pack order, without knowing where that (offset, pack_id) tuple
appears in lexical order.
Prepare for implementing a function that translates between a (offset,
pack_id) tuple into an index into the psuedo-pack order by extracting a
helper function which does just that, and then reimplementing
midx_to_pack_pos() in terms of it.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 14 Dec 2023 22:24:25 +0000 (17:24 -0500)]
midx: implement `midx_preferred_pack()`
When performing a binary search over the objects in a MIDX's bitmap
(i.e. in pseudo-pack order), the reader reconstructs the pseudo-pack
ordering using a combination of (a) the preferred pack, (b) the pack's
lexical position in the MIDX based on pack names, and (c) the object
offset within the pack.
In order to perform this binary search, the reader must know the
identity of the preferred pack. This could be stored in the MIDX, but
isn't for historical reasons, mostly because it can easily be inferred
at read-time by looking at the object in the first bit position and
finding out which pack it was selected from in the MIDX, like so:
In midx_to_pack_pos() which performs this binary search, we look up the
identity of the preferred pack before each search. This is relatively
quick, since it involves two table-driven lookups (one in the MIDX's
revindex for `pack_pos_to_midx()`, and another in the MIDX's object
table for `nth_midxed_pack_int_id()`).
But since the preferred pack does not change after the MIDX is written,
it is safe to cache this value on the MIDX itself.
Write a helper to do just that, and rewrite all of the existing
call-sites that care about the identity of the preferred pack in terms
of this new helper.
This will prepare us for a subsequent patch where we will need to binary
search through the MIDX's pseudo-pack order multiple times.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 14 Dec 2023 22:24:22 +0000 (17:24 -0500)]
git-compat-util.h: implement checked size_t to uint32_t conversion
In a similar fashion as other checked cast functions in this header
(such as `cast_size_t_to_ulong()` and `cast_size_t_to_int()`), implement
a checked cast function for going from a size_t to a uint32_t value.
This function will be utilized in a future commit which needs to make
such a conversion.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 14 Dec 2023 22:24:17 +0000 (17:24 -0500)]
pack-objects: prepare `write_reused_pack_verbatim()` for multi-pack reuse
The function `write_reused_pack_verbatim()` within
`builtin/pack-objects.c` is responsible for writing out a continuous
set of objects beginning at the start of the reuse packfile.
In the existing implementation, we did something like:
while (pos < reuse_packfile_bitmap->word_alloc &&
reuse_packfile_bitmap->words[pos] == (eword_t)~0)
pos++;
if (pos)
/* write first `pos * BITS_IN_WORD` objects from pack */
as an optimization to record a single chunk for the longest continuous
prefix of objects wanted out of the reuse pack, instead of having a
chunk for each individual object. For more details, see bb514de356
(pack-objects: improve partial packfile reuse, 2019-12-18).
In order to retain this optimization in a multi-pack reuse world, we can
no longer assume that the first object in a pack is on a word boundary
in the bitmap storing the set of reusable objects.
Assuming that all objects from the beginning of the reuse packfile up to
the object corresponding to the first bit on a word boundary are part of
the result, consume whole words at a time until the last whole word
belonging to the reuse packfile. Copy those objects to the resulting
packfile, and track that we reused them by recording a single chunk.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 14 Dec 2023 22:24:14 +0000 (17:24 -0500)]
pack-objects: prepare `write_reused_pack()` for multi-pack reuse
The function `write_reused_pack()` within `builtin/pack-objects.c` is
responsible for performing pack-reuse on a single pack, and has two main
functions:
- it dispatches a call to `write_reused_pack_verbatim()` to see if we
can reuse portions of the packfile in whole-word chunks
- for any remaining objects (that is, any objects that appear after
the first "gap" in the bitmap), call write_reused_pack_one() on that
object to record it for reuse.
Prepare this function for multi-pack reuse by removing the assumption
that the bit position corresponding to the first object being reused
from a given pack must be at bit position zero.
The changes in this function are mostly straightforward. Initialize `i`
to the position of the first word to contain bits corresponding to that
reuse pack. In most situations, we throw the initialized value away,
since we end up replacing it with the return value from
write_reused_pack_verbatim(), moving us past the section of whole words
that we reused.
Likewise, modify the per-object loop to ignore any bits at the beginning
of the first word that do not belong to the pack currently being reused,
as well as skip to the "done" section once we have processed the last
bit corresponding to this pack.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 14 Dec 2023 22:24:12 +0000 (17:24 -0500)]
pack-objects: pass `bitmapped_pack`'s to pack-reuse functions
Further prepare pack-objects to perform verbatim pack-reuse over
multiple packfiles by converting functions that take in a pointer to a
`struct packed_git` to instead take in a pointer to a `struct
bitmapped_pack`.
The additional information found in the bitmapped_pack struct (such as
the bit position corresponding to the beginning of the pack) will be
necessary in order to perform verbatim pack-reuse.
Note that we don't use any of the extra pieces of information contained
in the bitmapped_pack struct, so this step is merely preparatory and
does not introduce any functional changes.
Note further that we do not change the argument type to
write_reused_pack_one(). That function is responsible for copying
sections of the packfile directly and optionally patching any OFS_DELTAs
to account for not reusing sections of the packfile in between a delta
and its base.
As such, that function is (and should remain) oblivious to multi-pack
reuse, and does not require any of the extra pieces of information
stored in the bitmapped_pack struct.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 14 Dec 2023 22:24:09 +0000 (17:24 -0500)]
pack-objects: keep track of `pack_start` for each reuse pack
When reusing objects from a pack, we keep track of a set of one or more
`reused_chunk`s, corresponding to sections of one or more object(s) from
a source pack that we are reusing. Each chunk contains two pieces of
information:
- the offset of the first object in the source pack (relative to the
beginning of the source pack)
- the difference between that offset, and the corresponding offset in
the pack we're generating
The purpose of keeping track of these is so that we can patch an
OFS_DELTAs that cross over a section of the reuse pack that we didn't
take.
For instance, consider a hypothetical pack as shown below:
Suppose that we are sending objects "base", "other", and "delta", and
that the "delta" object is stored as an OFS_DELTA, and that its base is
"base". If we don't send any objects in the "(unused)" range, we can't
copy the delta'd object directly, since its delta offset includes a
range of the pack that we didn't copy, so we have to account for that
difference when patching and reassembling the delta.
In order to compute this value correctly, we need to know not only where
we are in the packfile we're assembling (with `hashfile_total(f)`) but
also the position of the first byte of the packfile that we are
currently reusing. Currently, this works just fine, since when reusing
only a single pack those two values are always identical (because
verbatim reuse is the first thing pack-objects does when enabled after
writing the pack header).
But when reusing multiple packs which have one or more gaps, we'll need
to account for these two values diverging.
Together, these two allow us to compute the reused chunk's offset
difference relative to the start of the reused pack, as desired.
Helped-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
, all of which assume that there is exactly one packfile being reused:
the global constant `reuse_packfile`.
Prepare for reusing objects from multiple packs by making reuse packfile
a parameter of each of the above functions in preparation for calling
these functions in a loop with multiple packfiles.
Note that we still have the global "reuse_packfile", but pass it through
each of the above function's parameter lists, eliminating all but one
direct access (the top-level caller in `write_pack_file()`). Even after
this series, we will still have a global, but it will hold the array of
reusable packfiles, and we'll pass them one at a time to these functions
in a loop.
Note also that we will eventually need to pass a `bitmapped_pack`
instead of a `packed_git` in order to hold onto additional information
required for reuse (such as the bit position of the first object
belonging to that pack). But that change will be made in a future commit
so as to minimize the noise below as much as possible.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 14 Dec 2023 22:24:04 +0000 (17:24 -0500)]
pack-bitmap: return multiple packs via `reuse_partial_packfile_from_bitmap()`
Further prepare for enabling verbatim pack-reuse over multiple packfiles
by changing the signature of reuse_partial_packfile_from_bitmap() to
populate an array of `struct bitmapped_pack *`'s instead of a pointer to
a single packfile.
Since the array we're filling out is sized dynamically[^1], add an
additional `size_t *` parameter which will hold the number of reusable
packs (equal to the number of elements in the array).
Note that since we still have not implemented true multi-pack reuse,
these changes aren't propagated out to the rest of the caller in
builtin/pack-objects.c.
In the interim state, we expect that the array has a single element, and
we use that element to fill out the static `reuse_packfile` variable
(which is a bog-standard `struct packed_git *`). Future commits will
continue to push this change further out through the pack-objects code.
[^1]: That is, even though we know the number of packs which are
candidates for pack-reuse, we do not know how many of those
candidates we can actually reuse.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The signature of `reuse_partial_packfile_from_bitmap()` currently takes
in a bitmap, as well as three output parameters (filled through
pointers, and passed as arguments), and also returns an integer result.
The output parameters are filled out with: (a) the packfile used for
pack-reuse, (b) the number of objects from that pack that we can reuse,
and (c) a bitmap indicating which objects we can reuse. The return value
is either -1 (when there are no objects to reuse), or 0 (when there is
at least one object to reuse).
Some of these parameters are redundant. Notably, we can infer from the
bitmap how many objects are reused by calling bitmap_popcount(). And we
can similar compute the return value based on that number as well.
As such, clean up the signature of this function to drop the "*entries"
parameter, as well as the int return value, since the single caller of
this function can infer these values themself.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 14 Dec 2023 22:23:59 +0000 (17:23 -0500)]
ewah: implement `bitmap_is_empty()`
In a future commit, we will want to check whether or not a bitmap has
any bits set in any of its words. The best way to do this (prior to the
existence of this patch) is to call `bitmap_popcount()` and check
whether the result is non-zero.
But this is semi-wasteful, since we do not need to know the exact number
of bits set, only whether or not there is at least one of them.
Implement a new helper function to check just that.
Suggested-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 14 Dec 2023 22:23:56 +0000 (17:23 -0500)]
pack-bitmap: pass `bitmapped_pack` struct to pack-reuse functions
When trying to assemble a pack with bitmaps using `--use-bitmap-index`,
`pack-objects` asks the pack-bitmap machinery for a bitmap which
indicates the set of objects we can "reuse" verbatim from on-disk.
This set is roughly comprised of: a prefix of objects in the bitmapped
pack (or preferred pack, in the case of a multi-pack reachability
bitmap), plus any other objects not included in the prefix, excluding
any deltas whose base we are not sending in the resulting pack.
The pack-bitmap machinery is responsible for computing this bitmap, and
does so with the following functions:
In the existing implementation, the first function is responsible for
(a) marking the prefix of objects in the reusable pack, and then (b)
calling try_partial_reuse() on any remaining objects to ensure that they
are also reusable (and removing them from the bitmapped set if they are
not).
Likewise, the `try_partial_reuse()` function is responsible for checking
whether an isolated object (that is, an object from the bitmapped
pack/preferred pack not contained in the prefix from earlier) may be
reused, i.e. that it isn't a delta of an object that we are not sending
in the resulting pack.
These functions are based on two core assumptions, which we will unwind
in this and the following commits:
1. There is only a single pack from the bitmap which is eligible for
verbatim pack-reuse. For single-pack bitmaps, this is trivially the
bitmapped pack. For multi-pack bitmaps, this is (currently) the
MIDX's preferred pack.
2. The pack eligible for reuse has its first object in bit position 0,
and all objects from that pack follow in pack-order from that first
bit position.
In order to perform verbatim pack reuse over multiple packs, we must
unwind these two assumptions. Most notably, in order to reuse bits from
a given packfile, we need to know the first bit position occupied by
an object form that packfile. To propagate this information around, pass
a `struct bitmapped_pack *` anywhere we previously passed a `struct
packed_git *`, since the former contains the bitmap position we're
interested in (as well as a pointer to the latter).
As an additional step, factor out a sub-routine from the main
`reuse_partial_packfile_from_bitmap()` function, called
`reuse_partial_packfile_from_bitmap_1()`. This new function will be
responsible for figuring out which objects may be reused from a single
pack, and the existing function will dispatch multiple calls to its new
helper function for each reusable pack.
Consequently, `reuse_partial_packfile_from_bitmap()` will now maintain
an array of reusable packs instead of a single such pack. We currently
expect that array to have only a single element, so this awkward state
is short-lived. It will serve as useful scaffolding in subsequent
commits as we begin to work towards enabling multi-pack reuse.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 14 Dec 2023 22:23:54 +0000 (17:23 -0500)]
midx: implement `midx_locate_pack()`
The multi-pack index API exposes a `midx_contains_pack()` function that
takes in a string ending in either ".idx" or ".pack" and returns whether
or not the MIDX contains a given pack corresponding to that string.
There is no corresponding function to locate the position of a pack
within the MIDX's pack order (sorted lexically by pack filename).
We could add an optional out parameter to `midx_contains_pack()` that is
filled out with the pack's position when the parameter is non-NULL. To
minimize the amount of fallout from this change, instead introduce a new
function by renaming `midx_contains_pack()` to `midx_locate_pack()`,
adding that output parameter, and then reimplementing
`midx_contains_pack()` in terms of it.
Future patches will make use of this new function.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 14 Dec 2023 22:23:51 +0000 (17:23 -0500)]
midx: implement `BTMP` chunk
When a multi-pack bitmap is used to implement verbatim pack reuse (that
is, when verbatim chunks from an on-disk packfile are copied
directly[^1]), it does so by using its "preferred pack" as the source
for pack-reuse.
This allows repositories to pack the majority of their objects into a
single (often large) pack, and then use it as the single source for
verbatim pack reuse. This increases the amount of objects that are
reused verbatim (and consequently, decrease the amount of time it takes
to generate many packs). But this performance comes at a cost, which is
that the preferred packfile must pace its growth with that of the entire
repository in order to maintain the utility of verbatim pack reuse.
As repositories grow beyond what we can reasonably store in a single
packfile, the utility of verbatim pack reuse diminishes. Or, at the very
least, it becomes increasingly more expensive to maintain as the pack
grows larger and larger.
It would be beneficial to be able to perform this same optimization over
multiple packs, provided some modest constraints (most importantly, that
the set of packs eligible for verbatim reuse are disjoint with respect
to the subset of their objects being sent).
If we assume that the packs which we treat as candidates for verbatim
reuse are disjoint with respect to any of their objects we may output,
we need to make only modest modifications to the verbatim pack-reuse
code itself. Most notably, we need to remove the assumption that the
bits in the reachability bitmap corresponding to objects from the single
reuse pack begin at the first bit position.
Future patches will unwind these assumptions and reimplement their
existing functionality as special cases of the more general assumptions
(e.g. that reuse bits can start anywhere within the bitset, but happen
to start at 0 for all existing cases).
This patch does not yet relax any of those assumptions. Instead, it
implements a foundational data-structure, the "Bitampped Packs" (`BTMP`)
chunk of the multi-pack index. The `BTMP` chunk's contents are described
in detail here. Importantly, the `BTMP` chunk contains information to
map regions of a multi-pack index's reachability bitmap to the packs
whose objects they represent.
For now, this chunk is only written, not read (outside of the test-tool
used in this patch to test the new chunk's behavior). Future patches
will begin to make use of this new chunk.
[^1]: Modulo patching any `OFS_DELTA`'s that cross over a region of the
pack that wasn't used verbatim.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 14 Dec 2023 22:23:48 +0000 (17:23 -0500)]
midx: factor out `fill_pack_info()`
When selecting which packfiles will be written while generating a MIDX,
the MIDX internals fill out a 'struct pack_info' with various pieces of
book-keeping.
Instead of filling out each field of the `pack_info` structure
individually in each of the two spots that modify the array of such
structures (`ctx->info`), extract a common routine that does this for
us.
This reduces the code duplication by a modest amount. But more
importantly, it zero-initializes the structure before assigning values
into it. This hardens us for a future change which will add additional
fields to this structure which (until this patch) was not
zero-initialized.
As a result, any new fields added to the `pack_info` structure need only
be updated in a single location, instead of at each spot within midx.c.
There are no functional changes in this patch.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 14 Dec 2023 22:23:45 +0000 (17:23 -0500)]
pack-bitmap: plug leak in find_objects()
The `find_objects()` function creates an object_list for any tips of the
reachability query which do not have corresponding bitmaps.
The object_list is not used outside of `find_objects()`, but we never
free it with `object_list_free()`, resulting in a leak. Let's plug that
leak by calling `object_list_free()`, which results in t6113 becoming
leak-free.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 14 Dec 2023 22:23:42 +0000 (17:23 -0500)]
pack-bitmap-write: deep-clear the `bb_commit` slab
The `bb_commit` commit slab is used by the pack-bitmap-write machinery
to track various pieces of bookkeeping used to generate reachability
bitmaps.
Even though we clear the slab when freeing the bitmap_builder struct
(with `bitmap_builder_clear()`), there are still pointers which point to
locations in memory that have not yet been freed, resulting in a leak.
Plug the leak by introducing a suitable `free_fn` for the `struct
bb_commit` type, and make sure it is called on each member of the slab
via the `deep_clear_bb_data()` function.
Note that it is possible for both of the arguments to `bitmap_free()` to
be NULL, but `bitmap_free()` is a noop for NULL arguments, so it is OK
to pass them unconditionally.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 14 Dec 2023 22:23:39 +0000 (17:23 -0500)]
pack-objects: free packing_data in more places
The pack-objects internals use a packing_data struct to track what
objects are part of the pack(s) being formed.
Since these structures contain allocated fields, failing to
appropriately free() them results in a leak. Plug that leak by
introducing a clear_packing_data() function, and call it in the
appropriate spots.
This is a fairly straightforward leak to plug, since none of the callers
expect to read any values or have any references to parts of the address
space being freed.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Sun, 10 Dec 2023 00:37:51 +0000 (16:37 -0800)]
Merge branch 'tz/send-email-negatable-options'
Newer versions of Getopt::Long started giving warnings against our
(ab)use of it in "git send-email". Bump the minimum version
requirement for Perl to 5.8.1 (from September 2002) to allow
simplifying our implementation.
* tz/send-email-negatable-options:
send-email: avoid duplicate specification warnings
perl: bump the required Perl version to 5.8.1 from 5.8.0
"git for-each-ref --no-sort" still sorted the refs alphabetically
which paid non-trivial cost. It has been redefined to show output
in an unspecified order, to allow certain optimizations to take
advantage of.
* vd/for-each-ref-unsorted-optimization:
t/perf: add perf tests for for-each-ref
ref-filter.c: use peeled tag for '*' format fields
for-each-ref: clean up documentation of --format
ref-filter.c: filter & format refs in the same callback
ref-filter.c: refactor to create common helper functions
ref-filter.c: rename 'ref_filter_handler()' to 'filter_one()'
ref-filter.h: add functions for filter/format & format-only
ref-filter.h: move contains caches into filter
ref-filter.h: add max_count and omit_empty to ref_format
ref-filter.c: really don't sort when using --no-sort
Junio C Hamano [Sun, 10 Dec 2023 00:37:50 +0000 (16:37 -0800)]
Merge branch 'ps/ban-a-or-o-operator-with-test'
Test and shell scripts clean-up.
* ps/ban-a-or-o-operator-with-test:
Makefile: stop using `test -o` when unlinking duplicate executables
contrib/subtree: convert subtree type check to use case statement
contrib/subtree: stop using `-o` to test for number of args
global: convert trivial usages of `test <expr> -a/-o <expr>`
Junio C Hamano [Sun, 10 Dec 2023 00:37:49 +0000 (16:37 -0800)]
Merge branch 'ps/ref-tests-update'
Update ref-related tests.
* ps/ref-tests-update:
t: mark several tests that assume the files backend with REFFILES
t7900: assert the absence of refs via git-for-each-ref(1)
t7300: assert exact states of repo
t4207: delete replace references via git-update-ref(1)
t1450: convert tests to remove worktrees via git-worktree(1)
t: convert tests to not access reflog via the filesystem
t: convert tests to not access symrefs via the filesystem
t: convert tests to not write references via the filesystem
t: allow skipping expected object ID in `ref-store update-ref`
Junio C Hamano [Sun, 10 Dec 2023 00:37:48 +0000 (16:37 -0800)]
Merge branch 'jk/chunk-bounds-more'
Code clean-up for jk/chunk-bounds topic.
* jk/chunk-bounds-more:
commit-graph: mark chunk error messages for translation
commit-graph: drop verify_commit_graph_lite()
commit-graph: check order while reading fanout chunk
commit-graph: use fanout value for graph size
commit-graph: abort as soon as we see a bogus chunk
commit-graph: clarify missing-chunk error messages
commit-graph: drop redundant call to "lite" verification
midx: check consistency of fanout table
commit-graph: handle overflow in chunk_size checks
Junio C Hamano [Sun, 10 Dec 2023 00:37:48 +0000 (16:37 -0800)]
Merge branch 'ps/ci-gitlab'
Add support for GitLab CI.
* ps/ci-gitlab:
ci: add support for GitLab CI
ci: install test dependencies for linux-musl
ci: squelch warnings when testing with unusable Git repo
ci: unify setup of some environment variables
ci: split out logic to set up failed test artifacts
ci: group installation of Docker dependencies
ci: make grouping setup more generic
ci: reorder definitions for grouping functions
Junio C Hamano [Sun, 10 Dec 2023 00:37:47 +0000 (16:37 -0800)]
Merge branch 'js/doc-unit-tests-with-cmake'
Update the base topic to work with CMake builds.
* js/doc-unit-tests-with-cmake:
cmake: handle also unit tests
cmake: use test names instead of full paths
cmake: fix typo in variable name
artifacts-tar: when including `.dll` files, don't forget the unit-tests
unit-tests: do show relative file paths
unit-tests: do not mistake `.pdb` files for being executable
cmake: also build unit tests
Junio C Hamano [Sun, 10 Dec 2023 00:37:46 +0000 (16:37 -0800)]
Merge branch 'ps/httpd-tests-on-nixos'
Portability tweak.
* ps/httpd-tests-on-nixos:
t9164: fix inability to find basename(1) in Subversion hooks
t/lib-httpd: stop using legacy crypt(3) for authentication
t/lib-httpd: dynamically detect httpd and modules path
A warning is issued for options which are specified more than once
beginning with perl-Getopt-Long >= 2.55. In addition to causing users
to see warnings, this results in test failures which compare the output.
An example, from t9001-send-email.37:
| +++ diff -u expect actual
| --- expect 2023-11-14 10:38:23.854346488 +0000
| +++ actual 2023-11-14 10:38:23.848346466 +0000
| @@ -1,2 +1,7 @@
| +Duplicate specification "no-chain-reply-to" for option "no-chain-reply-to"
| +Duplicate specification "to-cover|to-cover!" for option "to-cover"
| +Duplicate specification "cc-cover|cc-cover!" for option "cc-cover"
| +Duplicate specification "no-thread" for option "no-thread"
| +Duplicate specification "no-to-cover" for option "no-to-cover"
| fatal: longline.patch:35 is longer than 998 characters
| warning: no patches were sent
| error: last command exited with $?=1
| not ok 37 - reject long lines
Remove the duplicate option specs. These are primarily the explicit
'--no-' prefix opts which were added in f471494303 (git-send-email.perl:
support no- prefix with older GetOptions, 2015-01-30). This was done
specifically to support perl-5.8.0 which includes Getopt::Long 2.32[1].
Getopt::Long 2.33 added support for the '--no-' prefix natively by
appending '!' to the option specification string, which was included in
perl-5.8.1 and is not present in perl-5.8.0. The previous commit bumped
the minimum supported Perl version to 5.8.1 so we no longer need to
provide the '--no-' variants for negatable options manually.
Teach `--git-completion-helper` to output the '--no-' options. They are
not included in the options hash and would otherwise be lost.
Signed-off-by: Todd Zullinger <tmz@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Todd Zullinger [Thu, 16 Nov 2023 19:30:10 +0000 (14:30 -0500)]
perl: bump the required Perl version to 5.8.1 from 5.8.0
The following commit will make use of a Getopt::Long feature which is
only present in Perl >= 5.8.1. Document that as the minimum version we
support.
Many of our Perl scripts will continue to run with 5.8.0 but this change
allows us to adjust them as needed without breaking any promises to our
users.
The Perl requirement was last changed in d48b284183 (perl: bump the
required Perl version to 5.8 from 5.6.[21], 2010-09-24). At that time,
5.8.0 was 8 years old. It is now over 21 years old.
Signed-off-by: Todd Zullinger <tmz@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Victoria Dye [Tue, 14 Nov 2023 19:53:58 +0000 (19:53 +0000)]
t/perf: add perf tests for for-each-ref
Add performance tests for 'for-each-ref'. The tests exercise different
combinations of filters/formats/options, as well as the overall performance
of 'git for-each-ref | git cat-file --batch-check' to demonstrate the
performance difference vs. 'git for-each-ref' with "%(*fieldname)" format
specifiers.
All tests are run against a repository with 40k loose refs - 10k commits,
each having a unique:
- branch
- custom ref (refs/custom/special_*)
- annotated tag pointing at the commit
- annotated tag pointing at the other annotated tag (i.e., a nested tag)
After those tests are finished, the refs are packed with 'pack-refs --all'
and the same tests are rerun.
Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Victoria Dye [Tue, 14 Nov 2023 19:53:57 +0000 (19:53 +0000)]
ref-filter.c: use peeled tag for '*' format fields
In most builtins ('rev-parse <revision>^{}', 'show-ref --dereference'),
"dereferencing" a tag refers to a recursive peel of the tag object. Unlike
these cases, the dereferencing prefix ('*') in 'for-each-ref' format
specifiers triggers only a single, non-recursive dereference of a given tag
object. For most annotated tags, a single dereference is all that is needed
to access the tag's associated commit or tree; "recursive" and
"non-recursive" dereferencing are functionally equivalent in these cases.
However, nested tags (annotated tags whose target is another annotated tag)
dereferenced once return another tag, where a recursive dereference would
return the commit or tree.
Currently, if a user wants to filter & format refs and include information
about a recursively-dereferenced tag, they can do so with something like
'cat-file --batch-check':
But the combination of commands is inefficient. So, to improve the
performance of this use case and align the defererencing behavior of
'for-each-ref' with that of other commands, update the ref formatting code
to use the peeled tag (from 'peel_iterated_oid()') to populate '*' fields
rather than the tag's immediate target object (from 'get_tagged_oid()').
Additionally, add a test to 't6300-for-each-ref' to verify new nested tag
behavior and update 't6302-for-each-ref-filter.sh' to print the correct
value for nested dereferenced fields.
Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Victoria Dye [Tue, 14 Nov 2023 19:53:56 +0000 (19:53 +0000)]
for-each-ref: clean up documentation of --format
Move the description of the `*` prefix from the --format option
documentation to the part of the command documentation that deals with other
object type-specific modifiers. Also reorganize and reword the remaining
--format documentation so that the explanation of the default format doesn't
interrupt the details on format string interpolation.
Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Victoria Dye [Tue, 14 Nov 2023 19:53:55 +0000 (19:53 +0000)]
ref-filter.c: filter & format refs in the same callback
Update 'filter_and_format_refs()' to try to perform ref filtering &
formatting in a single ref iteration, without an intermediate 'struct
ref_array'. This can only be done if no operations need to be performed on a
pre-filtered array; specifically, if the refs are
- filtered on reachability,
- sorted, or
- formatted with ahead-behind information
they cannot be filtered & formatted in the same iteration. In that case,
fall back on the current filter-then-sort-then-format flow.
This optimization substantially improves memory usage due to no longer
storing a ref array in memory. In some cases, it also dramatically reduces
runtime (e.g. 'git for-each-ref --no-sort --count=1', which no longer loads
all refs into a 'struct ref_array' to printing only the first ref).
Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Victoria Dye [Tue, 14 Nov 2023 19:53:54 +0000 (19:53 +0000)]
ref-filter.c: refactor to create common helper functions
Factor out parts of 'ref_array_push()', 'ref_filter_handler()', and
'filter_refs()' into new helper functions:
* Extract the code to grow a 'struct ref_array' and append a given 'struct
ref_array_item *' to it from 'ref_array_push()' into 'ref_array_append()'.
* Extract the code to filter a given ref by refname & object ID then create
a new 'struct ref_array_item *' from 'filter_one()' into
'apply_ref_filter()'.
* Extract the code for filter pre-processing, contains cache creation, and
ref iteration from 'filter_refs()' into 'do_filter_refs()'.
In later patches, these helpers will be used by new ref-filter API
functions. This patch does not result in any user-facing behavior changes or
changes to callers outside of 'ref-filter.c'.
Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Victoria Dye [Tue, 14 Nov 2023 19:53:53 +0000 (19:53 +0000)]
ref-filter.c: rename 'ref_filter_handler()' to 'filter_one()'
Rename 'ref_filter_handler()' to 'filter_one()' to more clearly distinguish
it from other ref filtering callbacks that will be added in later patches.
The "*_one()" naming convention is common throughout the codebase for
iteration callbacks.
Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Victoria Dye [Tue, 14 Nov 2023 19:53:52 +0000 (19:53 +0000)]
ref-filter.h: add functions for filter/format & format-only
Add two new public methods to 'ref-filter.h':
* 'print_formatted_ref_array()' which, given a format specification & array
of ref items, formats and prints the items to stdout.
* 'filter_and_format_refs()' which combines 'filter_refs()',
'ref_array_sort()', and 'print_formatted_ref_array()' into a single
function.
This consolidates much of the code used to filter and format refs in
'builtin/for-each-ref.c', 'builtin/tag.c', and 'builtin/branch.c', reducing
duplication and simplifying the future changes needed to optimize the filter
& format process.
Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Victoria Dye [Tue, 14 Nov 2023 19:53:51 +0000 (19:53 +0000)]
ref-filter.h: move contains caches into filter
Move the 'contains_cache' and 'no_contains_cache' used in filter_refs into
an 'internal' struct of the 'struct ref_filter'. In later patches, the
'struct ref_filter *' will be a common data structure across multiple
filtering functions. These caches are part of the common functionality the
filter struct will support, so they are updated to be internally accessible
wherever the filter is used.
The design used here mirrors what was introduced in 576de3d956
(unpack_trees: start splitting internal fields from public API, 2023-02-27)
for 'unpack_trees_options'.
Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Victoria Dye [Tue, 14 Nov 2023 19:53:50 +0000 (19:53 +0000)]
ref-filter.h: add max_count and omit_empty to ref_format
Add an internal 'array_opts' struct to 'struct ref_format' containing
formatting options that pertain to the formatting of an entire ref array:
'max_count' and 'omit_empty'. These values are specified by the '--count'
and '--omit-empty' options, respectively, to 'for-each-ref'/'tag'/'branch'.
Storing these values in the 'ref_format' will simplify the consolidation of
ref array formatting logic across builtins in later patches.
Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Victoria Dye [Tue, 14 Nov 2023 19:53:49 +0000 (19:53 +0000)]
ref-filter.c: really don't sort when using --no-sort
When '--no-sort' is passed to 'for-each-ref', 'tag', and 'branch', the
printed refs are still sorted by ascending refname. Change the handling of
sort options in these commands so that '--no-sort' to truly disables
sorting.
'--no-sort' does not disable sorting in these commands is because their
option parsing does not distinguish between "the absence of '--sort'"
(and/or values for tag.sort & branch.sort) and '--no-sort'. Both result in
an empty 'sorting_options' string list, which is parsed by
'ref_sorting_options()' to create the 'struct ref_sorting *' for the
command. If the string list is empty, 'ref_sorting_options()' interprets
that as "the absence of '--sort'" and returns the default ref sorting
structure (equivalent to "refname" sort).
To handle '--no-sort' properly while preserving the "refname" sort in the
"absence of --sort'" case, first explicitly add "refname" to the string list
*before* parsing options. This alone doesn't actually change any behavior,
since 'compare_refs()' already falls back on comparing refnames if two refs
are equal w.r.t all other sort keys.
Now that the string list is populated by default, '--no-sort' is the only
way to empty the 'sorting_options' string list. Update
'ref_sorting_options()' to return a NULL 'struct ref_sorting *' if the
string list is empty, and add a condition to 'ref_array_sort()' to skip the
sort altogether if the sort structure is NULL. Note that other functions
using 'struct ref_sorting *' do not need any changes because they already
ignore NULL values.
Finally, remove the condition around sorting in 'ls-remote', since it's no
longer necessary. Unlike 'for-each-ref' et. al., it does *not* do any
sorting by default. This default is preserved by simply leaving its sort key
string list empty before parsing options; if no additional sort keys are
set, 'struct ref_sorting *' is NULL and sorting is skipped.
Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Andy Koppe [Tue, 14 Nov 2023 21:43:39 +0000 (21:43 +0000)]
rebase: rewrite --(no-)autosquash documentation
Rewrite the description of the rebase --(no-)autosquash options to try
to make it a bit clearer. Don't use "the '...'" to refer to part of a
commit message, mention how --interactive can be used to review the
todo list, and add a bit more detail on commit --squash/amend.
Signed-off-by: Andy Koppe <andy.koppe@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Andy Koppe [Tue, 14 Nov 2023 21:43:38 +0000 (21:43 +0000)]
rebase: support --autosquash without -i
The rebase --autosquash option is quietly ignored when used without
--interactive (apart from preventing preemptive fast-forwarding and
triggering conflicts with apply backend options).
Change that to support --autosquash without --interactive, by dropping
its restriction to REBASE_INTERACTIVE_EXCPLICIT mode. When used this
way, auto-squashing is done without opening the todo list editor.
Drop the -i requirement from the --autosquash description, and amend
t3415-rebase-autosquash.sh to test the option and the rebase.autoSquash
config variable with and without -i.
Signed-off-by: Andy Koppe <andy.koppe@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Andy Koppe [Tue, 14 Nov 2023 21:43:37 +0000 (21:43 +0000)]
rebase: fully ignore rebase.autoSquash without -i
Setting the rebase.autoSquash config variable to true implies a couple
of restrictions: it prevents preemptive fast-forwarding and it triggers
conflicts with apply backend options. However, it only actually results
in auto-squashing when combined with the --interactive (or -i) option,
due to code in run_specific_rebase() that disables auto-squashing unless
the REBASE_INTERACTIVE_EXPLICIT flag is set.
Doing autosquashing for rebase.autoSquash without --interactive would be
problematic in terms of backward compatibility, but conversely, there is
no need for the aforementioned restrictions without --interactive.
So drop the options.config_autosquash check from the conditions for
clearing allow_preemptive_ff, as the case where it is combined with
--interactive is already covered by the REBASE_INTERACTIVE_EXPLICIT
flag check above it.
Also drop the "apply options are incompatible with rebase.autoSquash"
error, because it is unreachable if it is restricted to --interactive,
as apply options already cause an error when used with --interactive.
Drop the tests for the error from t3422-rebase-incompatible-options.sh,
which has separate tests for the conflicts of --interactive with apply
options.
When neither --autosquash nor --no-autosquash are given, only set
options.autosquash to true if rebase.autosquash is combined with
--interactive.
Don't initialize options.config_autosquash to -1, as there is no need to
distinguish between rebase.autoSquash being unset or explicitly set to
false.
Finally, amend the rebase.autoSquash documentation to say it only
affects interactive mode.
Signed-off-by: Andy Koppe <andy.koppe@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Victoria Dye [Mon, 13 Nov 2023 23:17:51 +0000 (23:17 +0000)]
glossary: add definitions for dereference & peel
Add 'gitglossary' definitions for "dereference" (as it used for both symrefs
and objects) and "peel". These terms are used in options and documentation
throughout Git, but they are not clearly defined anywhere and the behavior
they refer to depends heavily on context. Provide explicit definitions to
clarify existing documentation to users and help contributors to use the
most appropriate terminology possible in their additions to Git.
Update other definitions in the glossary that use the term "dereference" to
link to 'def_dereference'.
Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a late amendment of 4a6e4b960263 (CI: remove Travis CI support,
2021-11-23), whereby the `.prove` file (being written by the `prove`
command that is used to run the test suite) is no longer retained
between CI builds: This feature was only ever used in the Travis CI
builds, we tried for a while to do the same in Azure Pipelines CI runs
(but I gave up on it after a while), and we never used that feature in
GitHub Actions (nor does the new GitLab CI code use it).
Retaining the Prove cache has been fragile from the start, even though
the idea seemed good at the time, the idea being that the `.prove` file
caches information about previous `prove` runs (`save`) and uses them
(`slow`) to run the tests in the order from longer-running to shorter
ones, making optimal use of the parallelism implied by `--jobs=<N>`.
However, using a Prove cache can cause some surprising behavior: When
the `prove` caches information about a test script it has run,
subsequent `prove` runs (with `--state=slow`) will run the same test
script again even if said script is not specified on the `prove`
command-line!
So far, this bug did not matter. Right until d8f416bbb87c (ci: run unit
tests in CI, 2023-11-09) did it not matter.
But starting with that commit, we invoke `prove` _twice_ in CI, once to
run the regular test suite of regression test scripts, and once to run
the unit tests. Due to the bug, the second invocation re-runs all of the
tests that were already run as part of the first invocation. This not
only wastes build minutes, it also frequently causes the `osx-*` jobs to
fail because they already take a long time and now are likely to run
into a timeout.
The worst part about it is that there is actually no benefit to keep
running with `--state=slow,save`, ever since we decided no longer to
try to reuse the Prove cache between CI runs.
So let's just drop that Prove option and live happily ever after.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Makefile: stop using `test -o` when unlinking duplicate executables
When building executables we may end up with both `foo` and `foo.exe` in
the project's root directory. This can cause issues on Cygwin, which is
why we unlink the `foo` binary (see 6fc301bbf68 (Makefile: remove $foo
when $foo.exe is built/installed., 2007-01-10)). This step is skipped if
either:
- `foo` is a directory, which can happen when building Git on
Windows via MSVC (see ade2ca0ca9f (Do not try to remove
directories when removing old links, 2009-10-27)).
- `foo` is a hardlink to `foo.exe`, which can happen on Cygwin (see 0d768f7c8f1 (Makefile: building git in cygwin 1.7.0, 2008-08-15)).
These two conditions are currently chained together via `test -o`, which
is discouraged by our code style guide. Convert the recipe to instead
use an `if` statement with `&&`'d conditions, which both matches our
style guide and is easier to ready.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
contrib/subtree: convert subtree type check to use case statement
The `subtree_for_commit ()` helper function asserts that the subtree
identified by its parameters are either a commit or tree. This is done
via the `-o` parameter of test, which is discouraged.
Refactor the code to instead use a switch statement over the type.
Despite being aligned with our coding guidelines, the resulting code is
arguably also easier to read.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
contrib/subtree: stop using `-o` to test for number of args
Functions in git-subtree.sh all assert that they are being passed the
correct number of arguments. In cases where we accept a variable number
of arguments we assert this via a single call to `test` with `-o`, which
is discouraged by our coding guidelines.
Convert these cases to stop doing so. This requires us to decompose
assertions of the style `assert test $# = 2 -o $# = 3` into two calls
because we have no easy way to logically chain statements passed to the
assert function.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
global: convert trivial usages of `test <expr> -a/-o <expr>`
Our coding guidelines say to not use `test` with `-a` and `-o` because
it can easily lead to bugs. Convert trivial cases where we still use
these to instead instead concatenate multiple invocations of `test` via
`&&` and `||`, respectively.
While not all of the converted instances can cause ambiguity, it is
worth getting rid of all of them regardless:
- It becomes easier to reason about the code as we do not have to
argue why one use of `-a`/`-o` is okay while another one isn't.
- We don't encourage people to use these expressions.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t9164: fix inability to find basename(1) in Subversion hooks
Hooks executed by Subversion are spawned with an empty environment. By
default, not even variables like PATH will be propagated to them. In
order to ensure that we're still able to find required executables, we
thus write the current PATH variable into the hook script itself and
then re-export it in t9164.
This happens too late in the script though, as we already tried to
execute the basename(1) utility before exporting the PATH variable. This
tends to work on most platforms as the fallback value of PATH for Bash
(see `getconf PATH`) is likely to contain this binary. But on more
exotic platforms like NixOS this is not the case, and thus the test
fails.
While we could work around this issue by simply setting PATH earlier, it
feels fragile to inject a user-controlled value into the script and have
the shell interpret it. Instead, we can refactor the hook setup to write
a `hooks-env` file that configures PATH for us. Like this, Subversion
will know to set up the environment as expected for all hooks.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t/lib-httpd: stop using legacy crypt(3) for authentication
When setting up httpd for our tests, we also install a passwd and
proxy-passwd file that contain the test user's credentials. These
credentials currently use crypt(3) as the password encryption schema.
This schema can be considered deprecated nowadays as it is not safe
anymore. Quoting Apache httpd's documentation [1]:
> Unix only. Uses the traditional Unix crypt(3) function with a
> randomly-generated 32-bit salt (only 12 bits used) and the first 8
> characters of the password. Insecure.
This is starting to cause issues in modern Linux distributions. glibc
has deprecated its libcrypt library that used to provide crypt(3) in
favor of the libxcrypt library. This newer replacement provides a
compile time switch to disable insecure password encryption schemata,
which causes crypt(3) to always return `EINVAL`. The end result is that
httpd tests that exercise authentication will fail on distros that use
libxcrypt without these insecure encryption schematas.
Regenerate the passwd files to instead use the default password
encryption schema, which is md5. While it feels kind of funny that an
MD5-based encryption schema should be more secure than anything else, it
is the current default and supported by all platforms. Furthermore, it
really doesn't matter all that much given that these files are only used
for testing purposes anyway.
t/lib-httpd: dynamically detect httpd and modules path
In order to set up the Apache httpd server, we need to locate both the
httpd binary and its default module path. This is done with a hardcoded
list of locations that we scan. While this works okayish with distros
that more-or-less follow the Filesystem Hierarchy Standard, it falls
apart on others like NixOS that don't.
While it is possible to specify these paths via `LIB_HTTPD_PATH` and
`LIB_HTTPD_MODULE_PATH`, it is not a nice experience for the developer
to figure out how to set those up. And in fact we can do better by
dynamically detecting both httpd and its module path at runtime:
- The httpd binary can be located via PATH.
- The module directory can (in many cases) be derived via the
`HTTPD_ROOT` compile-time variable.
Amend the code to do so.
Note that the new runtime-detected paths will only be used as a fallback
in case none of the hardcoded paths are usable. For the PATH lookup this
is because httpd is typically installed into "/usr/sbin", which is often
not included in the user's PATH variable. And the module path detection
relies on a configured httpd installation and may thus not work in all
cases, either.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Simon Ser [Thu, 9 Nov 2023 11:19:56 +0000 (11:19 +0000)]
format-patch: fix ignored encode_email_headers for cover letter
When writing the cover letter, the encode_email_headers option was
ignored. That is, UTF-8 subject lines and email addresses were
written out as-is, without any Q-encoding, even if
--encode-email-headers was passed on the command line.
This is due to encode_email_headers not being copied over from
struct rev_info to struct pretty_print_context. Fix that and add
a test.
Signed-off-by: Simon Ser <contact@emersion.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The unit tests should also be available e.g. in Visual Studio's Test
Explorer when configuring Git's source code via CMake.
Suggested-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The primary purpose of Git's CMake definition is to allow developing Git
in Visual Studio. As part of that, the CTest feature allows running
individual test scripts conveniently in Visual Studio's Test Explorer.
However, this Test Explorer's design targets object-oriented languages
and therefore expects the test names in the form
`<namespace>.<class>.<testname>`. And since we specify the full path
of the test scripts instead, including the ugly `/.././t/` part, these
dots confuse the Test Explorer and it uses a large part of the path as
"namespace".
Let's just use `t.suite.<name>` instead. This presents the tests in
Visual Studio's Test Explorer in the following form by default (i.e.
unless the user changes the view via the "Group by" menu):
◢ ◈ git
◢ ◈ t
◢ ◈ suite
◈ t0000-basic
◈ t0001-init
◈ t0002-gitfile
[...]
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
artifacts-tar: when including `.dll` files, don't forget the unit-tests
As of recent, Git also builds executables in `t/unit-tests/`. For
technical reasons, when building with CMake and Visual C, the
dependencies (".dll files") need to be copied there, too, otherwise
running the executable will fail "due to missing dependencies".
The CMake definition already contains the directives to copy those
`.dll` files, but we also need to adjust the `artifacts-tar` rule in
the `Makefile` accordingly to let the `vs-test` job in the CI runs
pass successfully.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Visual C interpolates `__FILE__` with the absolute _Windows_ path of
the source file. GCC interpolates it with the relative path, and the
tests even verify that.
So let's make sure that the unit tests only emit such paths.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
unit-tests: do not mistake `.pdb` files for being executable
When building the unit tests via CMake, the `.pdb` files are built.
Those are, essentially, files containing the debug information
separately from the executables.
Let's not confuse them with the executables we actually want to run.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Josh Steadmon [Thu, 9 Nov 2023 18:50:44 +0000 (10:50 -0800)]
ci: run unit tests in CI
Run unit tests in both Cirrus and GitHub CI. For sharded CI instances
(currently just Windows on GitHub), run only on the first shard. This is
OK while we have only a single unit test executable, but we may wish to
distribute tests more evenly when we add new unit tests in the future.
We may also want to add more status output in our unit test framework,
so that we can do similar post-processing as in
ci/lib.sh:handle_failed_tests().
Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Phillip Wood [Thu, 9 Nov 2023 18:50:43 +0000 (10:50 -0800)]
unit tests: add TAP unit test framework
This patch contains an implementation for writing unit tests with TAP
output. Each test is a function that contains one or more checks. The
test is run with the TEST() macro and if any of the checks fail then the
test will fail. A complete program that tests STRBUF_INIT would look
like
int main(void)
{
TEST(t_static_init(), "static initialization works);
return test_done();
}
The output of this program would be
ok 1 - static initialization works
1..1
If any of the checks in a test fail then they print a diagnostic message
to aid debugging and the test will be reported as failing. For example a
failing integer check would look like
# check "x >= 3" failed at my-test.c:102
# left: 2
# right: 3
not ok 1 - x is greater than or equal to three
There are a number of check functions implemented so far. check() checks
a boolean condition, check_int(), check_uint() and check_char() take two
values to compare and a comparison operator. check_str() will check if
two strings are equal. Custom checks are simple to implement as shown in
the comments above test_assert() in test-lib.h.
Tests can be skipped with test_skip() which can be supplied with a
reason for skipping which it will print. Tests can print diagnostic
messages with test_msg(). Checks that are known to fail can be wrapped
in TEST_TODO().
There are a couple of example test programs included in this
patch. t-basic.c implements some self-tests and demonstrates the
diagnostic output for failing test. The output of this program is
checked by t0080-unit-test-output.sh. t-strbuf.c shows some example
unit tests for strbuf.c
The unit tests will be built as part of the default "make all" target,
to avoid bitrot. If you wish to build just the unit tests, you can run
"make build-unit-tests". To run the tests, you can use "make unit-tests"
or run the test binaries directly, as in "./t/unit-tests/bin/t-strbuf".
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Josh Steadmon [Thu, 9 Nov 2023 18:50:42 +0000 (10:50 -0800)]
unit tests: add a project plan document
In our current testing environment, we spend a significant amount of
effort crafting end-to-end tests for error conditions that could easily
be captured by unit tests (or we simply forgo some hard-to-setup and
rare error conditions). Describe what we hope to accomplish by
implementing unit tests, and explain some open questions and milestones.
Discuss desired features for test frameworks/harnesses, and provide a
comparison of several different frameworks. Finally, document our
rationale for implementing a custom framework.
Co-authored-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 9 Nov 2023 07:26:28 +0000 (02:26 -0500)]
commit-graph: mark chunk error messages for translation
The patches from f32af12cee (Merge branch 'jk/chunk-bounds', 2023-10-23)
added many new untranslated error messages. While it's unlikely for most
users to see these messages at all, most of the other commit-graph error
messages are translated (and likewise for the matching midx messages).
Let's mark them all for consistency (and to help any poor unfortunate
user who does manage to find a broken graph file).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 9 Nov 2023 07:25:24 +0000 (02:25 -0500)]
commit-graph: drop verify_commit_graph_lite()
As we've moved all of the checks from this function directly into the
chunk-reading code used by the caller (and there is only one caller), we
can just drop it entirely.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>