Taylor Blau [Tue, 30 Mar 2021 15:04:26 +0000 (11:04 -0400)]
pack-revindex: read multi-pack reverse indexes
Implement reading for multi-pack reverse indexes, as described in the
previous patch.
Note that these functions don't yet have any callers, and won't until
multi-pack reachability bitmaps are introduced in a later patch series.
In the meantime, this patch implements some of the infrastructure
necessary to support multi-pack bitmaps.
There are three new functions exposed by the revindex API:
- load_midx_revindex(): loads the reverse index corresponding to the
given multi-pack index.
- midx_to_pack_pos() and pack_pos_to_midx(): these convert between the
multi-pack index and pseudo-pack order.
load_midx_revindex() and pack_pos_to_midx() are both relatively
straightforward.
load_midx_revindex() needs a few functions to be exposed from the midx
API. One to get the checksum of a midx, and another to get the .rev's
filename. Similar to recent changes in the packed_git struct, three new
fields are added to the multi_pack_index struct: one to keep track of
the size, one to keep track of the mmap'd pointer, and another to point
past the header and at the reverse index's data.
pack_pos_to_midx() simply reads the corresponding entry out of the
table.
midx_to_pack_pos() is the trickiest, since it needs to find an object's
position in the psuedo-pack order, but that order can only be recovered
in the .rev file itself. This mapping can be implemented with a binary
search, but note that the thing we're binary searching over isn't an
array of values, but rather a permuted order of those values.
So, when comparing two items, it's helpful to keep in mind the
difference. Instead of a traditional binary search, where you are
comparing two things directly, here we're comparing a (pack, offset)
tuple with an index into the multi-pack index. That index describes
another (pack, offset) tuple, and it is _those_ two tuples that are
compared.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
As a prerequisite to implementing multi-pack bitmaps, motivate and
describe the format and ordering of the multi-pack reverse index.
The subsequent patch will implement reading this format, and the patch
after that will implement writing it while producing a multi-pack index.
Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 30 Mar 2021 15:04:20 +0000 (11:04 -0400)]
midx: make some functions non-static
In a subsequent commit, pack-revindex.c will become responsible for
sorting a list of objects in the "MIDX pack order" (which will be
defined in the following patch). To do so, it will need to be know the
pack identifier and offset within that pack for each object in the MIDX.
The MIDX code already has functions for doing just that
(nth_midxed_offset() and nth_midxed_pack_int_id()), but they are
statically declared.
Since there is no reason that they couldn't be exposed publicly, and
because they are already doing exactly what the caller in
pack-revindex.c will want, expose them publicly so that they can be
reused there.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 30 Mar 2021 15:04:17 +0000 (11:04 -0400)]
midx: keep track of the checksum
write_midx_internal() uses a hashfile to write the multi-pack index, but
discards its checksum. This makes sense, since nothing that takes place
after writing the MIDX cares about its checksum.
That is about to change in a subsequent patch, when the optional
reverse index corresponding to the MIDX will want to include the MIDX's
checksum.
Store the checksum of the MIDX in preparation for that.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 30 Mar 2021 15:04:14 +0000 (11:04 -0400)]
midx: don't free midx_name early
A subsequent patch will need to refer back to 'midx_name' later on in
the function. In fact, this variable is already free()'d later on, so
this makes the later free() no longer redundant.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 30 Mar 2021 15:04:11 +0000 (11:04 -0400)]
midx: allow marking a pack as preferred
When multiple packs in the multi-pack index contain the same object, the
MIDX machinery must make a choice about which pack it associates with
that object. Prior to this patch, the lowest-ordered[1] pack was always
selected.
Pack selection for duplicate objects is relatively unimportant today,
but it will become important for multi-pack bitmaps. This is because we
can only invoke the pack-reuse mechanism when all of the bits for reused
objects come from the reuse pack (in order to ensure that all reused
deltas can find their base objects in the same pack).
To encourage the pack selection process to prefer one pack over another
(the pack to be preferred is the one a caller would like to later use as
a reuse pack), introduce the concept of a "preferred pack". When
provided, the MIDX code will always prefer an object found in a
preferred pack over any other.
No format changes are required to store the preferred pack, since it
will be able to be inferred with a corresponding MIDX bitmap, by looking
up the pack associated with the object in the first bit position (this
ordering is described in detail in a subsequent commit).
[1]: the ordering is specified by MIDX internals; for our purposes we
can consider the "lowest ordered" pack to be "the one with the
most-recent mtime.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 30 Mar 2021 15:04:07 +0000 (11:04 -0400)]
t/helper/test-read-midx.c: add '--show-objects'
The 'read-midx' helper is used in places like t5319 to display basic
information about a multi-pack-index.
In the next patch, the MIDX writing machinery will learn a new way to
choose from which pack an object is selected when multiple copies of
that object exist.
To disambiguate which pack introduces an object so that this feature can
be tested, add a '--show-objects' option which displays additional
information about each object in the MIDX.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
--object-dir <file> object directory containing set of packfile and pack-index pairs
--progress force progress reporting
While we're at it, clean up some duplication between the "no sub-command"
and "unrecognized sub-command" conditionals.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 30 Mar 2021 15:04:01 +0000 (11:04 -0400)]
builtin/multi-pack-index.c: don't enter bogus cmd_mode
Even before the recent refactoring, 'git multi-pack-index' calls
'trace2_cmd_mode()' before verifying that the sub-command is recognized.
Push this call down into the individual sub-commands so that we don't
enter a bogus command mode.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 30 Mar 2021 15:03:57 +0000 (11:03 -0400)]
builtin/multi-pack-index.c: split sub-commands
Handle sub-commands of the 'git multi-pack-index' builtin (e.g.,
"write", "repack", etc.) separately from one another. This allows
sub-commands with unique options, without forcing cmd_multi_pack_index()
to reject invalid combinations itself.
This comes at the cost of some duplication and boilerplate. Luckily, the
duplication is reduced to a minimum, since common options are shared
among sub-commands due to a suggestion by Ævar. (Sub-commands do have to
retain the common options, too, since this builtin accepts common
options on either side of the sub-command).
Roughly speaking, cmd_multi_pack_index() parses options (including
common ones), and stops at the first non-option, which is the
sub-command. It then dispatches to the appropriate sub-command, which
parses the remaining options (also including common options).
Unknown options are kept by the sub-commands in order to detect their
presence (and complain that too many arguments were given).
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 30 Mar 2021 15:03:54 +0000 (11:03 -0400)]
builtin/multi-pack-index.c: define common usage with a macro
Factor out the usage message into pieces corresponding to each mode.
This avoids options specific to one sub-command from being shared with
another in the usage.
A subsequent commit will use these #define macros to have usage
variables for each sub-command without duplicating their contents.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that there is a shared 'flags' member in the options structure,
there is no need to keep track of whether to force progress or not,
since ultimately the decision of whether or not to show a progress meter
is controlled by a bit in the flags member.
Manipulate that bit directly, and drop the now-unnecessary 'progress'
field while we're at it.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 30 Mar 2021 15:03:47 +0000 (11:03 -0400)]
builtin/multi-pack-index.c: inline 'flags' with options
Subcommands of the 'git multi-pack-index' command (e.g., 'write',
'verify', etc.) will want to optionally change a set of shared flags
that are eventually passed to the MIDX libraries.
Right now, options and flags are handled separately. That's fine, since
the options structure is never passed around. But a future patch will
make it so that common options shared by all sub-commands are defined in
a common location. That means that "flags" would have to become a global
variable.
Group it with the options structure so that we reduce the number of
global variables we have overall.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 24 Feb 2021 23:26:14 +0000 (15:26 -0800)]
Merge branch 'ds/chunked-file-api' into tb/reverse-midx
* ds/chunked-file-api:
commit-graph.c: display correct number of chunks when writing
chunk-format: add technical docs
chunk-format: restore duplicate chunk checks
midx: use 64-bit multiplication for chunk sizes
midx: use chunk-format read API
commit-graph: use chunk-format read API
chunk-format: create read chunk API
midx: use chunk-format API in write_midx_internal()
midx: drop chunk progress during write
midx: return success/failure in chunk write methods
midx: add num_large_offsets to write_midx_context
midx: add pack_perm to write_midx_context
midx: add entries to write_midx_context
midx: use context in write_midx_pack_names()
midx: rename pack_info to write_midx_context
commit-graph: use chunk-format write API
chunk-format: create chunk format write API
commit-graph: anonymize data in chunk_write_fn
Taylor Blau [Wed, 24 Feb 2021 17:12:22 +0000 (12:12 -0500)]
commit-graph.c: display correct number of chunks when writing
When writing a commit-graph, a progress meter is shown which indicates
the number of pieces of data to write (one per commit in each chunk).
In 47410aa837 (commit-graph: use chunk-format write API, 2021-02-18),
the number of chunks became tracked by the new chunk-format API. But a
stray local variable was left behind from when write_commit_graph_file()
used to keep track of the same.
Since this was no longer updated after 47410aa837, the progress meter
appeared broken:
$ git commit-graph write --reachable
Expanding reachable commits in commit graph: 837569, done.
Writing out commit graph in 3 passes: 166% (4187845/2512707), done.
Drop the local variable and rely instead on the chunk-format API to tell
us the correct number of chunks.
Reported-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 23 Feb 2021 00:12:43 +0000 (16:12 -0800)]
Merge branch 'ab/test-lib'
Test framework clean-up.
* ab/test-lib:
test-lib-functions: assert correct parameter count
test-lib-functions: remove bug-inducing "diagnostics" helper param
test libs: rename "diff-lib" to "lib-diff"
t/.gitattributes: sort lines
test-lib-functions: move function to lib-bitmap.sh
test libs: rename gitweb-lib.sh to lib-gitweb.sh
test libs: rename bundle helper to "lib-bundle.sh"
test-lib-functions: remove generate_zero_bytes() wrapper
test-lib-functions: move test_set_index_version() to its user
test lib: change "error" to "BUG" as appropriate
test-lib: remove check_var_migration
Junio C Hamano [Tue, 23 Feb 2021 00:12:43 +0000 (16:12 -0800)]
Merge branch 'ab/pager-exit-log'
When a pager spawned by us exited, the trace log did not record its
exit status correctly, which has been corrected.
* ab/pager-exit-log:
pager: properly log pager exit code when signalled
run-command: add braces for "if" block in wait_or_whine()
pager: test for exit code with and without SIGPIPE
pager: refactor wait_for_pager() function
Junio C Hamano [Tue, 23 Feb 2021 00:12:42 +0000 (16:12 -0800)]
Merge branch 'bc/signed-objects-with-both-hashes'
Signed commits and tags now allow verification of objects, whose
two object names (one in SHA-1, the other in SHA-256) are both
signed.
* bc/signed-objects-with-both-hashes:
gpg-interface: remove other signature headers before verifying
ref-filter: hoist signature parsing
commit: allow parsing arbitrary buffers with headers
gpg-interface: improve interface for parsing tags
commit: ignore additional signatures when parsing signed commits
ref-filter: switch some uses of unsigned long to size_t
Junio C Hamano [Tue, 23 Feb 2021 00:12:42 +0000 (16:12 -0800)]
Merge branch 'dl/stash-cleanup'
Documentation, code and test clean-up around "git stash".
* dl/stash-cleanup:
stash: declare ref_stash as an array
t3905: use test_cmp() to check file contents
t3905: replace test -s with test_file_not_empty
t3905: remove nested git in command substitution
t3905: move all commands into test cases
t3905: remove spaces after redirect operators
git-stash.txt: be explicit about subcommand options
Derrick Stolee [Thu, 18 Feb 2021 14:07:39 +0000 (14:07 +0000)]
chunk-format: add technical docs
The chunk-based file format is now an API in the code, but we should
also take time to document it as a file format. Specifically, it matches
the CHUNK LOOKUP sections of the commit-graph and multi-pack-index
files, but there are some commonalities that should be grouped in this
document.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Thu, 18 Feb 2021 14:07:38 +0000 (14:07 +0000)]
chunk-format: restore duplicate chunk checks
Before refactoring into the chunk-format API, the commit-graph parsing
logic included checks for duplicate chunks. It is unlikely that we would
desire a chunk-based file format that allows duplicate chunk IDs in the
table of contents, so add duplicate checks into
read_table_of_contents().
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Thu, 18 Feb 2021 14:07:37 +0000 (14:07 +0000)]
midx: use 64-bit multiplication for chunk sizes
When calculating the sizes of certain chunks, we should use 64-bit
multiplication always. This allows us to properly predict the chunk
sizes without risk of overflow.
Other possible overflows were discovered by evaluating each
multiplication in midx.c and ensuring that at least one side of the
operator was of type size_t or off_t.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Thu, 18 Feb 2021 14:07:36 +0000 (14:07 +0000)]
midx: use chunk-format read API
Instead of parsing the table of contents directly, use the chunk-format
API methods read_table_of_contents() and pair_chunk(). In particular, we
can use the return value of pair_chunk() to generate an error when a
required chunk is missing.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Thu, 18 Feb 2021 14:07:35 +0000 (14:07 +0000)]
commit-graph: use chunk-format read API
Instead of parsing the table of contents directly, use the chunk-format
API methods read_table_of_contents() and pair_chunk(). While the current
implementation loses the duplicate-chunk detection, that will be added
in a future change.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Thu, 18 Feb 2021 14:07:34 +0000 (14:07 +0000)]
chunk-format: create read chunk API
Add the capability to read the table of contents, then pair the chunks
with necessary logic using read_chunk_fn pointers. Callers will be added
in future changes, but the typical outline will be:
1. initialize a 'struct chunkfile' with init_chunkfile(NULL).
2. call read_table_of_contents().
3. for each chunk to parse,
a. call pair_chunk() to assign a pointer with the chunk position, or
b. call read_chunk() to run a callback on the chunk start and size.
4. call free_chunkfile() to clear the 'struct chunkfile' data.
We are re-using the anonymous 'struct chunkfile' data, as it is internal
to the chunk-format API. This gives it essentially two modes: write and
read. If the same struct instance was used for both reads and writes,
then there would be failures.
Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Thu, 18 Feb 2021 14:07:33 +0000 (14:07 +0000)]
midx: use chunk-format API in write_midx_internal()
The chunk-format API allows writing the table of contents and all chunks
using the anonymous 'struct chunkfile' type. We only need to convert our
local chunk logic to this API for the multi-pack-index writes to share
that logic with the commit-graph file writes.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Thu, 18 Feb 2021 14:07:32 +0000 (14:07 +0000)]
midx: drop chunk progress during write
Most expensive operations in write_midx_internal() use the context
struct's progress member, and these indicate the process of the
expensive operations within the chunk writing methods. However, there is
a competing progress struct that counts the progress over all chunks.
This is not very helpful compared to the others, so drop it.
This also reduces our barriers to combining the chunk writing code with
chunk-format.c.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Thu, 18 Feb 2021 14:07:31 +0000 (14:07 +0000)]
midx: return success/failure in chunk write methods
Historically, the chunk-writing methods in midx.c have returned the
amount of data written so the writer method could compare this with the
table of contents. This presents with some interesting issues:
1. If a chunk writing method has a bug that miscalculates the written
bytes, then we can satisfy the table of contents without actually
writing the right amount of data to the hashfile. The commit-graph
writing code checks the hashfile struct directly for a more robust
verification.
2. There is no way for a chunk writing method to gracefully fail.
Returning an int presents an opportunity to fail without a die().
3. The current pattern doesn't match chunk_write_fn type exactly, so we
cannot share code with commit-graph.c
For these reasons, convert the midx chunk writer methods to return an
'int'. Since none of them fail at the moment, they all return 0.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Thu, 18 Feb 2021 14:07:30 +0000 (14:07 +0000)]
midx: add num_large_offsets to write_midx_context
In an effort to align write_midx_internal() with the chunk-format API,
continue to group necessary data into "struct write_midx_context". This
change collects the "uint32_t num_large_offsets" into the context. With
this new data, write_midx_large_offsets() now matches the
chunk_write_fn type.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Thu, 18 Feb 2021 14:07:29 +0000 (14:07 +0000)]
midx: add pack_perm to write_midx_context
In an effort to align write_midx_internal() with the chunk-format API,
continue to group necessary data into "struct write_midx_context". This
change collects the "uint32_t *pack_perm" and large_offsets_needed bit
into the context.
Update write_midx_object_offsets() to match chunk_write_fn.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Thu, 18 Feb 2021 14:07:28 +0000 (14:07 +0000)]
midx: add entries to write_midx_context
In an effort to align write_midx_internal() with the chunk-format API,
continue to group necessary data into "struct write_midx_context". This
change collects the "struct pack_midx_entry *entries" list and its count
into the context.
Update write_midx_oid_fanout() and write_midx_oid_lookup() to take the
context directly, as these are easy conversions with this new data.
Only the callers of write_midx_object_offsets() and
write_midx_large_offsets() are updated here, since additional data in
the context before those methods can match chunk_write_fn.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Thu, 18 Feb 2021 14:07:27 +0000 (14:07 +0000)]
midx: use context in write_midx_pack_names()
In an effort to align the write_midx_internal() to use the chunk-format
API, start converting chunk writing methods to match chunk_write_fn. The
first case is to convert write_midx_pack_names() to take "void *data".
We already have the necessary data in "struct write_midx_context", so
this conversion is rather mechanical.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Thu, 18 Feb 2021 14:07:26 +0000 (14:07 +0000)]
midx: rename pack_info to write_midx_context
In an effort to streamline our chunk-based file formats, align some of
the code structure in write_midx_internal() to be similar to the
patterns in write_commit_graph_file().
Specifically, let's create a "struct write_midx_context" that can be
used as a data parameter to abstract function types.
This change only renames "struct pack_info" to "struct
write_midx_context" and the names of instances from "packs" to "ctx". In
future changes, we will expand the data inside "struct
write_midx_context" and align our chunk-writing method with the
chunk-format API.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Thu, 18 Feb 2021 14:07:25 +0000 (14:07 +0000)]
commit-graph: use chunk-format write API
The commit-graph write logic is ready to make use of the chunk-format
write API. Each chunk write method is already in the correct prototype.
We only need to use the 'struct chunkfile' pointer and the correct API
calls.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Thu, 18 Feb 2021 14:07:24 +0000 (14:07 +0000)]
chunk-format: create chunk format write API
In anticipation of combining the logic from the commit-graph and
multi-pack-index file formats, create a new chunk-format API. Use a
'struct chunkfile' pointer to keep track of data that has been
registered for writes. This struct is anonymous outside of
chunk-format.c to ensure no user attempts to interfere with the data.
The next change will use this API in commit-graph.c, but the general
approach is:
1. initialize the chunkfile with init_chunkfile(f).
2. add chunks in the intended writing order with add_chunk().
3. write any header information to the hashfile f.
4. write the chunkfile data using write_chunkfile().
5. free the chunkfile struct using free_chunkfile().
Helped-by: Taylor Blau <me@ttaylorr.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Thu, 18 Feb 2021 01:21:43 +0000 (17:21 -0800)]
Merge branch 'js/reflog-expire-stale-fix'
"git reflog expire --stale-fix" can be used to repair the reflog by
removing entries that refer to objects that have been pruned away,
but was not careful to tolerate missing objects.
* js/reflog-expire-stale-fix:
reflog expire --stale-fix: be generous about missing objects
Junio C Hamano [Thu, 18 Feb 2021 01:21:42 +0000 (17:21 -0800)]
Merge branch 'js/commit-graph-warning'
When certain features (e.g. grafts) used in the repository are
incompatible with the use of the commit-graph, we used to silently
turned commit-graph off; we now tell the user what we are doing.
* js/commit-graph-warning:
commit-graph: when incompatible with graphs, indicate why
Junio C Hamano [Thu, 18 Feb 2021 01:21:41 +0000 (17:21 -0800)]
Merge branch 'jk/mailmap-only-at-root'
The .mailmap is documented to be read only from the root level of a
working tree, but a stray file in a bare repository also was read
by accident, which has been corrected.
* jk/mailmap-only-at-root:
mailmap: only look for .mailmap in work tree
Junio C Hamano [Thu, 18 Feb 2021 01:21:41 +0000 (17:21 -0800)]
Merge branch 'mt/grep-cached-untracked'
"git grep --untracked" is meant to be "let's ALSO find in these
files on the filesystem" when looking for matches in the working
tree files, and does not make any sense if the primary search is
done against the index, or the tree objects. The "--cached" and
"--untracked" options have been marked as mutually incompatible.
* mt/grep-cached-untracked:
grep: error out if --untracked is used with --cached
Junio C Hamano [Thu, 18 Feb 2021 01:21:41 +0000 (17:21 -0800)]
Merge branch 'sh/mergetool-hideresolved'
"git mergetool" feeds three versions (base, local and remote) of
a conflicted path unmodified. The command learned to optionally
prepare these files with unconflicted parts already resolved.
* sh/mergetool-hideresolved:
mergetool: add per-tool support and overrides for the hideResolved flag
mergetool: break setup_tool out into separate initialization function
mergetool: add hideResolved configuration
Junio C Hamano [Thu, 18 Feb 2021 01:21:41 +0000 (17:21 -0800)]
Merge branch 'js/range-diff-one-side-only'
The "git range-diff" command learned "--(left|right)-only" option
to show only one side of the compared range.
* js/range-diff-one-side-only:
range-diff: offer --left-only/--right-only options
range-diff: move the diffopt initialization down one layer
range-diff: combine all options in a single data structure
range-diff: simplify code spawning `git log`
range-diff: libify the read_patches() function again
range-diff: avoid leaking memory in two error code paths
Junio C Hamano [Thu, 18 Feb 2021 01:21:41 +0000 (17:21 -0800)]
Merge branch 'js/range-diff-wo-dotdot'
There are other ways than ".." for a single token to denote a
"commit range", namely "<rev>^!" and "<rev>^-<n>", but "git
range-diff" did not understand them.
* js/range-diff-wo-dotdot:
range-diff(docs): explain how to specify commit ranges
range-diff/format-patch: handle commit ranges other than A..B
range-diff/format-patch: refactor check for commit range
Junio C Hamano [Thu, 18 Feb 2021 01:21:40 +0000 (17:21 -0800)]
Merge branch 'jt/clone-unborn-head'
"git clone" tries to locally check out the branch pointed at by
HEAD of the remote repository after it is done, but the protocol
did not convey the information necessary to do so when copying an
empty repository. The protocol v2 learned how to do so.
* jt/clone-unborn-head:
clone: respect remote unborn HEAD
connect, transport: encapsulate arg in struct
ls-refs: report unborn targets of symrefs
Junio C Hamano [Thu, 18 Feb 2021 01:21:40 +0000 (17:21 -0800)]
Merge branch 'mr/bisect-in-c-4'
Piecemeal of rewrite of "git bisect" in C continues.
* mr/bisect-in-c-4:
bisect--helper: retire `--check-and-set-terms` subcommand
bisect--helper: reimplement `bisect_skip` shell function in C
bisect--helper: retire `--bisect-auto-next` subcommand
bisect--helper: use `res` instead of return in BISECT_RESET case option
bisect--helper: retire `--bisect-write` subcommand
bisect--helper: reimplement `bisect_replay` shell function in C
bisect--helper: reimplement `bisect_log` shell function in C
Junio C Hamano [Thu, 18 Feb 2021 01:21:40 +0000 (17:21 -0800)]
Merge branch 'ds/commit-graph-genno-fix'
Fix incremental update of commit-graph file around corrected commit
date data.
* ds/commit-graph-genno-fix:
commit-graph: prepare commit graph
commit-graph: be extra careful about mixed generations
commit-graph: compute generations separately
commit-graph: validate layers for generation data
commit-graph: always parse before commit_graph_data_at()
commit-graph: use repo_parse_commit
Junio C Hamano [Thu, 18 Feb 2021 01:21:40 +0000 (17:21 -0800)]
Merge branch 'ak/corrected-commit-date'
The commit-graph learned to use corrected commit dates instead of
the generation number to help topological revision traversal.
* ak/corrected-commit-date:
doc: add corrected commit date info
commit-reach: use corrected commit dates in paint_down_to_common()
commit-graph: use generation v2 only if entire chain does
commit-graph: implement generation data chunk
commit-graph: implement corrected commit date
commit-graph: return 64-bit generation number
commit-graph: add a slab to store topological levels
t6600-test-reach: generalize *_three_modes
commit-graph: consolidate fill_commit_graph_info
revision: parse parent in indegree_walk_step()
commit-graph: fix regression when computing Bloom filters
Junio C Hamano [Fri, 12 Feb 2021 22:21:04 +0000 (14:21 -0800)]
Merge branch 'tb/precompose-prefix-too'
When commands are started from a subdirectory, they may have to
compare the path to the subdirectory (called prefix and found out
from $(pwd)) with the tracked paths. On macOS, $(pwd) and
readdir() yield decomposed path, while the tracked paths are
usually normalized to the precomposed form, causing mismatch. This
has been fixed by taking the same approach used to normalize the
command line arguments.
Junio C Hamano [Fri, 12 Feb 2021 22:21:04 +0000 (14:21 -0800)]
Merge branch 'jk/complete-branch-force-delete'
The command line completion (in contrib/) completed "git branch -d"
with branch names, but "git branch -D" offered tagnames in addition,
which has been corrected. "git branch -M" had the same problem.
* jk/complete-branch-force-delete:
doc/git-branch: fix awkward wording for "-c"
completion: handle other variants of "branch -m"
completion: treat "branch -D" the same way as "branch -d"
Junio C Hamano [Fri, 12 Feb 2021 22:21:04 +0000 (14:21 -0800)]
Merge branch 'tb/pack-revindex-on-disk'
Introduce an on-disk file to record revindex for packdata, which
traditionally was always created on the fly and only in-core.
* tb/pack-revindex-on-disk:
t5325: check both on-disk and in-memory reverse index
pack-revindex: ensure that on-disk reverse indexes are given precedence
t: support GIT_TEST_WRITE_REV_INDEX
t: prepare for GIT_TEST_WRITE_REV_INDEX
Documentation/config/pack.txt: advertise 'pack.writeReverseIndex'
builtin/pack-objects.c: respect 'pack.writeReverseIndex'
builtin/index-pack.c: write reverse indexes
builtin/index-pack.c: allow stripping arbitrary extensions
pack-write.c: prepare to write 'pack-*.rev' files
packfile: prepare for the existence of '*.rev' files
Junio C Hamano [Fri, 12 Feb 2021 22:21:04 +0000 (14:21 -0800)]
Merge branch 'ab/tests-various-fixup'
Various test updates.
* ab/tests-various-fixup:
rm tests: actually test for SIGPIPE in SIGPIPE test
archive tests: use a cheaper "zipinfo -h" invocation to get header
upload-pack tests: avoid a non-zero "grep" exit status
git-svn tests: rewrite brittle tests to use "--[no-]merges".
git svn mergeinfo tests: refactor "test -z" to use test_must_be_empty
git svn mergeinfo tests: modernize redirection & quoting style
cache-tree tests: explicitly test HEAD and index differences
cache-tree tests: use a sub-shell with less indirection
cache-tree tests: remove unused $2 parameter
cache-tree tests: refactor for modern test style
Add assertions of the correct parameter count of various functions, in
particularly the wrappers for the shell "test" built-in.
In an earlier commit we fixed a bug with an incorrect number of
arguments being passed to "test_path_is_{file,missing}". Let's also
guard other similar functions from the same sort of misuse.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the optional "diagnostics" parameter of the
test_path_is_{file,dir,missing} functions.
We have a lot of uses of these functions, but the only legitimate use
of the diagnostics parameter is from when the functions themselves
were introduced in 2caf20c52b7 (test-lib: user-friendly alternatives
to test [-d|-f|-e], 2010-08-10).
But as the the rest of this diff demonstrates its presence did more to
silently introduce bugs in our tests. Fix such bugs in the tests added
in ae4e89e549b (gc: add --keep-largest-pack option, 2018-04-15), and c04ba51739a (t6046: testcases checking whether updates can be skipped
in a merge, 2018-04-19).
Let's also assert that those functions are called with exactly one
parameter, a follow-up commit will add similar asserts to other
functions in test-lib-functions.sh that we didn't have existing misuse
of.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename the "diff-lib" to "lib-diff". With this rename and preceding
commits there is no remaining t/*lib* which doesn't follow the
convention of being called t/lib-*.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Thu, 11 Feb 2021 21:58:43 +0000 (13:58 -0800)]
Merge branch 'en/merge-ort-perf'
The "ort" merge strategy.
* en/merge-ort-perf:
merge-ort: begin performance work; instrument with trace2_region_* calls
merge-ort: ignore the directory rename split conflict for now
merge-ort: fix massive leak
Junio C Hamano [Thu, 11 Feb 2021 21:58:43 +0000 (13:58 -0800)]
Merge branch 'en/ort-directory-rename'
ORT merge strategy learns to infer "renamed directory" while
merging.
* en/ort-directory-rename:
merge-ort: fix a directory rename detection bug
merge-ort: process_renames() now needs more defensiveness
merge-ort: implement apply_directory_rename_modifications()
merge-ort: add a new toplevel_dir field
merge-ort: implement handle_path_level_conflicts()
merge-ort: implement check_for_directory_rename()
merge-ort: implement apply_dir_rename() and check_dir_renamed()
merge-ort: implement compute_collisions()
merge-ort: modify collect_renames() for directory rename handling
merge-ort: implement handle_directory_level_conflicts()
merge-ort: implement compute_rename_counts()
merge-ort: copy get_renamed_dir_portion() from merge-recursive.c
merge-ort: add outline of get_provisional_directory_renames()
merge-ort: add outline for computing directory renames
merge-ort: collect which directories are removed in dirs_removed
merge-ort: initialize and free new directory rename data structures
merge-ort: add new data structures for directory rename detection
Andrew Klotz [Thu, 11 Feb 2021 20:30:53 +0000 (20:30 +0000)]
config: improve error message for boolean config
Currently invalid boolean config values return messages about 'bad
numeric', which is slightly misleading when the error was due to a
boolean value. We can improve the developer experience by returning a
boolean error message when we know the value is neither a bool text or
int.
before with an invalid boolean value of `non-boolean`, its unclear what
numeric is referring to:
fatal: bad numeric config value 'non-boolean' for 'commit.gpgsign': invalid unit
now the error message mentions `non-boolean` is a bad boolean value:
fatal: bad boolean config value 'non-boolean' for 'commit.gpgsign'
Signed-off-by: Andrew Klotz <agc.klotz@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Denton Liu [Tue, 9 Feb 2021 07:28:50 +0000 (23:28 -0800)]
t3905: remove nested git in command substitution
If a git command in a nested command substitution fails, it will be
silently ignored since only the return code of the outer command
substitutions is reported. Factor out nested command substitutions so
that the error codes of those commands are reported.
Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Denton Liu [Tue, 9 Feb 2021 07:28:49 +0000 (23:28 -0800)]
t3905: move all commands into test cases
In order to modernize the tests, move commands that currently run
outside of test cases into a test case. Where possible, clean up files
that are produced using test_when_finished() but in the case where files
persist over multiple test cases, create a new test case to perform
cleanup.
Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Denton Liu [Tue, 9 Feb 2021 07:28:48 +0000 (23:28 -0800)]
t3905: remove spaces after redirect operators
For shell scripts, the usual convention is for there to be no space
after redirection operators, (e.g. `>file`, not `> file`). Remove these
spaces wherever they appear.
Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Denton Liu [Tue, 9 Feb 2021 07:28:47 +0000 (23:28 -0800)]
git-stash.txt: be explicit about subcommand options
Currently, the options for the `list` and `show` subcommands are just
listed as `<options>`. This seems to imply, from a cursory glance at the
summary, that they take the stash options listed below. However, reading
more carefully, we see that they take log options and diff options
respectively.
Make it more obvious that they take log and diff options by explicitly
stating this in the subcommand summary.
Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit-graph: when incompatible with graphs, indicate why
When `gc.writeCommitGraph = true`, it is possible that the commit-graph
is _still_ not written: replace objects, grafts and shallow repositories
are incompatible with the commit-graph feature.
Under such circumstances, we need to indicate to the user why the
commit-graph was not written instead of staying silent about it.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reflog expire --stale-fix: be generous about missing objects
Whenever a user runs `git reflog expire --stale-fix`, the most likely
reason is that their repository is at least _somewhat_ corrupt. Which
means that it is more than just possible that some objects are missing.
If that is the case, that can currently let the command abort through
the phase where it tries to mark all reachable objects.
Instead of adding insult to injury, let's be gentle and continue as best
as we can in such a scenario, simply by ignoring the missing objects and
moving on.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
diff: plug memory leak from regcomp() on {log,diff} -I
Fix a memory leak in 296d4a94e7 (diff: add -I<regex> that ignores
matching changes, 2020-10-20) by freeing the memory it allocates in
the newly introduced diff_free(). See the previous commit for details
on that.
This memory leak was intentionally introduced in 296d4a94e7, see the
discussion on a previous iteration of it in
https://lore.kernel.org/git/xmqqeelycajx.fsf@gitster.c.googlers.com/
At that time freeing the memory was somewhat tedious, but since it
isn't anymore with the newly introduced diff_free() let's use it.
Let's retain the pattern for diff_free_file() and add a
diff_free_ignore_regex(), even though (unlike "diff_free_file") we
don't need to call it elsewhere. I think this'll make for more
readable code than gradually accumulating a giant diff_free()
function, sharing "int i" across unrelated code etc.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a diff_free() function to free anything we may have allocated in
the "diff_options" struct, and the ability to make calling it a noop
by setting "no_free" in "diff_options".
This is required because when e.g. "git diff" is run we'll allocate
things in that struct, use the diff machinery once, and then exit.
But if we run e.g. "git log -p" we're going to re-use what we
allocated across multiple diff_flush() calls, and only want to free
things at the end.
We've thus ended up with features like the recently added "diff -I"[1]
where we'll leak memory. As it turns out it could have simply used the
pattern established in 6ea57703f6 (log: prepare log/log-tree to reuse
the diffopt.close_file attribute, 2016-06-22).
Manually adding more such flags to things log_tree_commit() every time
we need to allocate something would be tedious. Let's instead move
that fclose() code it to a new diff_free(), in anticipation of freeing
more things in that function in follow-up commits.
Some functions such as log_tree_commit() need an idiom of optionally
retaining a previous "no_free", as they may either free the memory
themselves, or their caller may do so. I'm keeping that idiom in
log_show_early() for good measure, even though I don't think it's
currently called in this manner. It also gets passed an existing
"struct rev_info", so future callers may want to set the "no_free"
flag.
This change is a bit hard to read because while the freeing pattern
we're introducing isn't unusual, the "file" member is a special
snowflake. We usually don't want to fclose() it. This is because
"file" is usually stdout, in which case we don't want to fclose()
it. We only want to opt-in to closing it when we e.g. open a file on
the filesystem. Thus the opt-in "close_file" flag.
So the API in general just needs a "no_free" flag to defer freeing,
but the "file" member still needs its "close_file" flag. This is made
more confusing because while refactoring this code we could replace
some "close_file=0" with "no_free=1", whereas others need to set both
flags.
This is because there were some cases where an existing "close_file=0"
meant "let's defer deallocation", and others where it meant "we don't
want to close this file handle at all".
1. 296d4a94e7 (diff: add -I<regex> that ignores matching changes,
2020-10-20)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
brian m. carlson [Thu, 11 Feb 2021 02:08:06 +0000 (02:08 +0000)]
gpg-interface: remove other signature headers before verifying
When we have a multiply signed commit, we need to remove the signature
in the header before verifying the object, since the trailing signature
will not be over both pieces of data. Do so, and verify that we
validate the signature appropriately.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
brian m. carlson [Thu, 11 Feb 2021 02:08:05 +0000 (02:08 +0000)]
ref-filter: hoist signature parsing
When we parse a signature in the ref-filter code, we continually
increment the buffer pointer. Hoist the signature parsing above the
blank line delimiting headers and body so we can find the signature when
using a header to sign the buffer.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
brian m. carlson [Thu, 11 Feb 2021 02:08:04 +0000 (02:08 +0000)]
commit: allow parsing arbitrary buffers with headers
Currently only commits are signed with headers. However, in the future,
we'll also sign tags with headers as well. Let's refactor out a
function called parse_buffer_signed_by_header which does exactly that.
In addition, since we'll want to sign things other than commits this
way, let's call the function sign_with_header instead of do_sign_commit.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
brian m. carlson [Thu, 11 Feb 2021 02:08:03 +0000 (02:08 +0000)]
gpg-interface: improve interface for parsing tags
We have a function which parses a buffer with a signature at the end,
parse_signature, and this function is used for signed tags. However,
we'll need to store values for multiple algorithms, and we'll do this by
using a header for the non-default algorithm.
Adjust the parse_signature interface to store the parsed data in two
strbufs and turn the existing function into parse_signed_buffer. The
latter is still used in places where we know we always have a signed
buffer, such as push certs.
Adjust all the callers to deal with this new interface.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Thu, 11 Feb 2021 00:48:07 +0000 (16:48 -0800)]
Merge branch 'tb/ci-run-cocci-with-18.04'
The version of Ubuntu Linux used by default at GitHub Actions CI
has been updated to one that lack coccinelle; until it gets fixed,
work it around by sticking to the previous release (18.04).
* tb/ci-run-cocci-with-18.04:
.github/workflows/main.yml: run static-analysis on bionic
Junio C Hamano [Wed, 10 Feb 2021 22:48:33 +0000 (14:48 -0800)]
Merge branch 'ab/detox-gettext-tests'
Get rid of "GETTEXT_POISON" support altogether, which may or may
not be controversial.
* ab/detox-gettext-tests:
tests: remove uses of GIT_TEST_GETTEXT_POISON=false
tests: remove support for GIT_TEST_GETTEXT_POISON
ci: remove GETTEXT_POISON jobs
Junio C Hamano [Wed, 10 Feb 2021 22:48:33 +0000 (14:48 -0800)]
Merge branch 'jk/pretty-lazy-load-commit'
Some pretty-format specifiers do not need the data in commit object
(e.g. "%H"), but we were over-eager to load and parse it, which has
been made even lazier.
* jk/pretty-lazy-load-commit:
pretty: lazy-load commit data when expanding user-format
Junio C Hamano [Wed, 10 Feb 2021 22:48:32 +0000 (14:48 -0800)]
Merge branch 'js/rebase-i-commit-cleanup-fix'
When "git rebase -i" processes "fixup" insn, there is no reason to
clean up the commit log message, but we did the usual stripspace
processing. This has been corrected.
* js/rebase-i-commit-cleanup-fix:
rebase -i: do leave commit message intact in fixup! chains
Junio C Hamano [Wed, 10 Feb 2021 22:48:32 +0000 (14:48 -0800)]
Merge branch 'jk/t0000-cleanups'
Code clean-up.
* jk/t0000-cleanups:
t0000: consistently use single quotes for outer tests
t0000: run cleaning test inside sub-test
t0000: run prereq tests inside sub-test
t0000: keep clean-up tests together
Junio C Hamano [Wed, 10 Feb 2021 22:48:31 +0000 (14:48 -0800)]
Merge branch 'jk/use-oid-pos'
Code clean-up to ensure our use of hashtables using object names as
keys use the "struct object_id" objects, not the raw hash values.
* jk/use-oid-pos:
oid_pos(): access table through const pointers
hash_pos(): convert to oid_pos()
rerere: use strmap to store rerere directories
rerere: tighten rr-cache dirname check
rerere: check dirname format while iterating rr_cache directory
commit_graft_pos(): take an oid instead of a bare hash
Eric Wong [Wed, 10 Feb 2021 21:55:43 +0000 (21:55 +0000)]
t1500: ensure current --since= behavior remains
This behavior of git-rev-parse is observed since git 1.8.3.1
at least(*), and likely earlier versions.
At least one git-reliant project in-the-wild relies on this
current behavior of git-rev-parse being able to handle multiple
--since= arguments without squeezing identical results together.
So add a test to prevent the potential for regression in
downstream projects.
(*) 1.8.3.1 the version packaged for CentOS 7.x
Signed-off-by: Eric Wong <e@80x24.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sort the lines starting with "/", the only out-of-place line was added
along with most of the file in 614f4f0f350 (Fix the remaining tests
that failed with core.autocrlf=true, 2017-05-09).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>