Matheus Tavares [Tue, 4 May 2021 16:27:31 +0000 (13:27 -0300)]
parallel-checkout: add tests for basic operations
Add tests to populate the working tree during clone and checkout using
sequential and parallel mode, to confirm that they produce identical
results. Also test basic checkout mechanics, such as checking for
symlinks in the leading directories and the abidance to --force.
Note: some helper functions are added to a common lib file which is only
included by t2080 for now. But they will also be used by other
parallel-checkout tests in the following patches.
Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Matheus Tavares [Tue, 4 May 2021 16:27:30 +0000 (13:27 -0300)]
checkout-index: add parallel checkout support
Allow checkout-index to use the parallel checkout framework, honoring
the checkout.workers configuration.
There are two code paths in checkout-index which call
`checkout_entry()`, and thus, can make use of parallel checkout:
`checkout_file()`, which is used to write paths explicitly given at the
command line; and `checkout_all()`, which is used to write all paths in
the index, when the `--all` option is given.
In both operation modes, checkout-index doesn't abort immediately on a
`checkout_entry()` failure. Instead, it tries to check out all remaining
paths before exiting with a non-zero exit code. To keep this behavior
when parallel checkout is being used, we must allow
`run_parallel_checkout()` to try writing the queued entries before we
exit, even if we already got an error code from a previous
`checkout_entry()` call.
However, `checkout_all()` doesn't return on errors, it calls `exit()`
with code 128. We could make it call `run_parallel_checkout()` before
exiting, but it makes the code easier to follow if we unify the exit
path for both checkout-index modes at `cmd_checkout_index()`, and let
this function take care of the interactions with the parallel checkout
API. So let's do that.
With this change, we also have to consider whether we want to keep using
128 as the error code for `git checkout-index --all`, while we use 1 for
`git checkout-index <path>` (even when the actual error is the same).
Since there is not much value in having code 128 only for `--all`, and
there is no mention about it in the docs (so it's unlikely that changing
it will break any existing script), let's make both modes exit with code
1 on `checkout_entry()` errors.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Matheus Tavares [Tue, 4 May 2021 16:27:29 +0000 (13:27 -0300)]
builtin/checkout.c: complete parallel checkout support
Pathspec-limited checkouts (like `git checkout *.txt`) are performed by
a code path that doesn't yet support parallel checkout because it calls
checkout_entry() directly, instead of unpack_trees(). Let's add parallel
checkout support for this code path too.
The transient cache entries allocated in checkout_merged() are now
allocated in a mem_pool which is only discarded after parallel checkout
finishes. This is done because the entries need to be valid when
run_parallel_checkout() is called.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Matheus Tavares [Tue, 4 May 2021 16:27:28 +0000 (13:27 -0300)]
make_transient_cache_entry(): optionally alloc from mem_pool
Allow make_transient_cache_entry() to optionally receive a mem_pool
struct in which it should allocate the entry. This will be used in the
following patch, to store some transient entries which should persist
until parallel checkout finishes.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make parallel checkout configurable by introducing two new settings:
checkout.workers and checkout.thresholdForParallelism. The first defines
the number of workers (where one means sequential checkout), and the
second defines the minimum number of entries to attempt parallel
checkout.
To decide the default value for checkout.workers, the parallel version
was benchmarked during three operations in the linux repo, with cold
cache: cloning v5.8, checking out v5.8 from v2.6.15 (checkout I) and
checking out v5.8 from v5.7 (checkout II). The four tables below show
the mean run times and standard deviations for 5 runs in: a local file
system on SSD, a local file system on HDD, a Linux NFS server, and
Amazon EFS (all on Linux). Each parallel checkout test was executed with
the number of workers that brings the best overall results in that
environment.
Local SSD:
Sequential 10 workers Speedup
Clone 8.805 s ± 0.043 s 3.564 s ± 0.041 s 2.47 ± 0.03
Checkout I 9.678 s ± 0.057 s 4.486 s ± 0.050 s 2.16 ± 0.03
Checkout II 5.034 s ± 0.072 s 3.021 s ± 0.038 s 1.67 ± 0.03
Local HDD:
Sequential 10 workers Speedup
Clone 32.288 s ± 0.580 s 30.724 s ± 0.522 s 1.05 ± 0.03
Checkout I 54.172 s ± 7.119 s 54.429 s ± 6.738 s 1.00 ± 0.18
Checkout II 40.465 s ± 2.402 s 38.682 s ± 1.365 s 1.05 ± 0.07
Linux NFS server (v4.1, on EBS, single availability zone):
Sequential 32 workers Speedup
Clone 240.368 s ± 6.347 s 57.349 s ± 0.870 s 4.19 ± 0.13
Checkout I 242.862 s ± 2.215 s 58.700 s ± 0.904 s 4.14 ± 0.07
Checkout II 65.751 s ± 1.577 s 23.820 s ± 0.407 s 2.76 ± 0.08
EFS (v4.1, replicated over multiple availability zones):
Sequential 32 workers Speedup
Clone 922.321 s ± 2.274 s 210.453 s ± 3.412 s 4.38 ± 0.07
Checkout I 1011.300 s ± 7.346 s 297.828 s ± 0.964 s 3.40 ± 0.03
Checkout II 294.104 s ± 1.836 s 126.017 s ± 1.190 s 2.33 ± 0.03
The above benchmarks show that parallel checkout is most effective on
repositories located on an SSD or over a distributed file system. For
local file systems on spinning disks, and/or older machines, the
parallelism does not always bring a good performance. For this reason,
the default value for checkout.workers is one, a.k.a. sequential
checkout.
To decide the default value for checkout.thresholdForParallelism,
another benchmark was executed in the "Local SSD" setup, where parallel
checkout showed to be beneficial. This time, we compared the runtime of
a `git checkout -f`, with and without parallelism, after randomly
removing an increasing number of files from the Linux working tree. The
"sequential fallback" column below corresponds to the executions where
checkout.workers was 10 but checkout.thresholdForParallelism was equal
to the number of to-be-updated files plus one (so that we end up writing
sequentially). Each test case was sampled 15 times, and each sample had
a randomly different set of files removed. Here are the results:
sequential fallback 10 workers speedup
10 files 772.3 ms ± 12.6 ms 769.0 ms ± 13.6 ms 1.00 ± 0.02
20 files 780.5 ms ± 15.8 ms 775.2 ms ± 9.2 ms 1.01 ± 0.02
50 files 806.2 ms ± 13.8 ms 767.4 ms ± 8.5 ms 1.05 ± 0.02
100 files 833.7 ms ± 21.4 ms 750.5 ms ± 16.8 ms 1.11 ± 0.04
200 files 897.6 ms ± 30.9 ms 730.5 ms ± 14.7 ms 1.23 ± 0.05
500 files 1035.4 ms ± 48.0 ms 677.1 ms ± 22.3 ms 1.53 ± 0.09
1000 files 1244.6 ms ± 35.6 ms 654.0 ms ± 38.3 ms 1.90 ± 0.12
2000 files 1488.8 ms ± 53.4 ms 658.8 ms ± 23.8 ms 2.26 ± 0.12
From the above numbers, 100 files seems to be a reasonable default value
for the threshold setting.
Note: Up to 1000 files, we observe a drop in the execution time of the
parallel code with an increase in the number of files. This is a rather
odd behavior, but it was observed in multiple repetitions. Above 1000
files, the execution time increases according to the number of files, as
one would expect.
About the test environments: Local SSD tests were executed on an
i7-7700HQ (4 cores with hyper-threading) running Manjaro Linux. Local
HDD tests were executed on an Intel(R) Xeon(R) E3-1230 (also 4 cores
with hyper-threading), HDD Seagate Barracuda 7200.14 SATA 3.1, running
Debian. NFS and EFS tests were executed on an Amazon EC2 c5n.xlarge
instance, with 4 vCPUs. The Linux NFS server was running on a m6g.large
instance with 2 vCPUSs and a 1 TB EBS GP2 volume. Before each timing,
the linux repository was removed (or checked out back to its previous
state), and `sync && sysctl vm.drop_caches=3` was executed.
Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use multiple worker processes to distribute the queued entries and call
write_pc_item() in parallel for them. The items are distributed
uniformly in contiguous chunks. This minimizes the chances of two
workers writing to the same directory simultaneously, which could affect
performance due to lock contention in the kernel. Work stealing (or any
other format of re-distribution) is not implemented yet.
The protocol between the main process and the workers is quite simple.
They exchange binary messages packed in pkt-line format, and use
PKT-FLUSH to mark the end of input (from both sides). The main process
starts the communication by sending N pkt-lines, each corresponding to
an item that needs to be written. These packets contain all the
necessary information to load, smudge, and write the blob associated
with each item. Then it waits for the worker to send back N pkt-lines
containing the results for each item. The resulting packet must contain:
the identification number of the item that it refers to, the status of
the operation, and the lstat() data gathered after writing the file (iff
the operation was successful).
For now, checkout always uses a hardcoded value of 2 workers, only to
demonstrate that the parallel checkout framework correctly divides and
writes the queued entries. The next patch will add user configurations
and define a more reasonable default, based on tests with the said
settings.
Co-authored-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
unpack-trees: add basic support for parallel checkout
This new interface allows us to enqueue some of the entries being
checked out to later uncompress them, apply in-process filters, and
write out the files in parallel. For now, the parallel checkout
machinery is enabled by default and there is no user configuration, but
run_parallel_checkout() just writes the queued entries in sequence
(without spawning additional workers). The next patch will actually
implement the parallelism and, later, we will make it configurable.
Note that, to avoid potential data races, not all entries are eligible
for parallel checkout. Also, paths that collide on disk (e.g.
case-sensitive paths in case-insensitive file systems), are detected by
the parallel checkout code and skipped, so that they can be safely
sequentially handled later. The collision detection works like the
following:
- If the collision was at basename (e.g. 'a/b' and 'a/B'), the framework
detects it by looking for EEXIST and EISDIR errors after an
open(O_CREAT | O_EXCL) failure.
- If the collision was at dirname (e.g. 'a/b' and 'A'), it is detected
at the has_dirs_only_path() check, which is done for the leading path
of each item in the parallel checkout queue.
Both verifications rely on the fact that, before enqueueing an entry for
parallel checkout, checkout_entry() makes sure that there is no file at
the entry's path and that its leading components are all real
directories. So, any later change in these conditions indicates that
there was a collision (either between two parallel-eligible entries or
between an eligible and an ineligible one).
After all parallel-eligible entries have been processed, the collided
(and thus, skipped) entries are sequentially fed to checkout_entry()
again. This is similar to the way the current code deals with
collisions, overwriting the previously checked out entries with the
subsequent ones. The only difference is that, since we no longer create
the files in the same order that they appear on index, we are not able
to determine which of the colliding entries will survive on disk (for
the classic code, it is always the last entry).
Co-authored-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 7 Apr 2021 23:54:09 +0000 (16:54 -0700)]
Merge branch 'ab/fsck-api-cleanup'
Fsck API clean-up.
* ab/fsck-api-cleanup:
fetch-pack: use new fsck API to printing dangling submodules
fetch-pack: use file-scope static struct for fsck_options
fetch-pack: don't needlessly copy fsck_options
fsck.c: move gitmodules_{found,done} into fsck_options
fsck.c: add an fsck_set_msg_type() API that takes enums
fsck.c: pass along the fsck_msg_id in the fsck_error callback
fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h
fsck.c: give "FOREACH_MSG_ID" a more specific name
fsck.c: undefine temporary STR macro after use
fsck.c: call parse_msg_type() early in fsck_set_msg_type()
fsck.h: re-order and re-assign "enum fsck_msg_type"
fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type"
fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
fsck.c: remove (mostly) redundant append_msg_id() function
fsck.c: rename variables in fsck_set_msg_type() for less confusion
fsck.h: use "enum object_type" instead of "int"
fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT}
fsck.c: refactor and rename common config callback
Junio C Hamano [Wed, 7 Apr 2021 23:54:09 +0000 (16:54 -0700)]
Merge branch 'js/security-md'
SECURITY.md that is facing individual contributors and end users
has been introduced. Also a procedure to follow when preparing
embargoed releases has been spelled out.
* js/security-md:
Document how we do embargoed releases
SECURITY: describe how to report vulnerabilities
Junio C Hamano [Wed, 7 Apr 2021 23:54:08 +0000 (16:54 -0700)]
Merge branch 'zh/commit-trailer'
"git commit" learned "--trailer <key>[=<value>]" option; together
with the interpret-trailers command, this will make it easier to
support custom trailers.
Junio C Hamano [Wed, 7 Apr 2021 23:54:08 +0000 (16:54 -0700)]
Merge branch 'ah/plugleaks'
Plug or annotate remaining leaks that trigger while running the
very basic set of tests.
* ah/plugleaks:
transport: also free remote_refs in transport_disconnect()
parse-options: don't leak alias help messages
parse-options: convert bitfield values to use binary shift
init-db: silence template_dir leak when converting to absolute path
init: remove git_init_db_config() while fixing leaks
worktree: fix leak in dwim_branch()
clone: free or UNLEAK further pointers when finished
reset: free instead of leaking unneeded ref
symbolic-ref: don't leak shortened refname in check_symref()
Junio C Hamano [Fri, 2 Apr 2021 21:43:14 +0000 (14:43 -0700)]
Merge branch 'mt/parallel-checkout-part-1'
Preparatory API changes for parallel checkout.
* mt/parallel-checkout-part-1:
entry: add checkout_entry_ca() taking preloaded conv_attrs
entry: move conv_attrs lookup up to checkout_entry()
entry: extract update_ce_after_write() from write_entry()
entry: make fstat_output() and read_blob_entry() public
entry: extract a header file for entry.c functions
convert: add classification for conv_attrs struct
convert: add get_stream_filter_ca() variant
convert: add [async_]convert_to_working_tree_ca() variants
convert: make convert_attrs() and convert structs public
When accessing a server with a URL like https://user:pass@site/, we
did not to fall back to the basic authentication with the
credential material embedded in the URL after the "Negotiate"
authentication failed. Now we do.
* cs/http-use-basic-after-failed-negotiate:
remote-curl: fall back to basic auth if Negotiate fails
Junio C Hamano [Tue, 30 Mar 2021 21:35:36 +0000 (14:35 -0700)]
Merge branch 'mt/checkout-remove-nofollow'
When "git checkout" removes a path that does not exist in the
commit it is checking out, it wasn't careful enough not to follow
symbolic links, which has been corrected.
* mt/checkout-remove-nofollow:
checkout: don't follow symlinks when removing entries
symlinks: update comment on threaded_check_leading_path()
It is customary not to begin the help text for each option given to
the parse-options API with a capital letter. Various (sub)commands'
option arrays don't follow the guideline provided by the parse_options
Documentation regarding the descriptions.
Downcase the first word of some option descriptions for "column"
and "range-diff".
Signed-off-by: Chinmoy Chakraborty <chinmoy12c@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Dennis Ameling [Mon, 29 Mar 2021 12:41:45 +0000 (12:41 +0000)]
cmake(install): include vcpkg dlls
Our CMake configuration generates not only build definitions, but also
install definitions: After building Git using `msbuild git.sln`, the
built artifacts can be installed via `msbuild INSTALL.vcxproj`.
To specify _where_ the files should be installed, the
`-DCMAKE_INSTALL_PREFIX=<path>` option can be used when running CMake.
However, this process would really only install the files that were just
built. On Windows, we need more than that: We also need the `.dll` files
of the dependencies (such as libcurl). The `vcpkg` ecosystem, which we
use to obtain those dependencies, can be asked to install said `.dll`
files really easily, so let's do that.
This requires more than just the built `vcpkg` artifacts in the CI build
definition; We now clone the `vcpkg` repository so that the relevant
CMake scripts are available, in particular the ones related to defining
the toolchain.
Signed-off-by: Dennis Ameling <dennis@dennisameling.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
cmake: add a preparatory work-around to accommodate `vcpkg`
We are about to add support for installing the `.dll` files of Git's
dependencies (such as libcurl) in the CMake configuration. The `vcpkg`
ecosystem from which we get said dependencies makes that relatively
easy: simply turn on `X_VCPKG_APPLOCAL_DEPS_INSTALL`.
However, current `vcpkg` introduces a limitation if one does that:
While it is totally cool with CMake to specify multiple targets within
one invocation of `install(TARGETS ...) (at least according to
https://cmake.org/cmake/help/latest/command/install.html#command:install),
`vcpkg`'s parser insists on a single target per `install(TARGETS ...)`
invocation.
Well, that's easily accomplished: Let's feed the targets individually to
the `install(TARGETS ...)` function in a `foreach()` look.
This also has the advantage that we do not have to manually cull off the
two entries from the `${PROGRAMS_BUILT}` array before scheduling the
remainder to be installed into `libexec/git-core`. Instead, we iterate
through the array and decide for each entry where it wants to go.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
fetch-pack: use new fsck API to printing dangling submodules
Refactor the check added in 5476e1efde (fetch-pack: print and use
dangling .gitmodules, 2021-02-22) to make use of us now passing the
"msg_id" to the user defined "error_func". We can now compare against
the FSCK_MSG_GITMODULES_MISSING instead of parsing the generated
message.
Let's also replace register_found_gitmodules() with directly
manipulating the "gitmodules_found" member. A recent commit moved it
into "fsck_options" so we could do this here.
I'm sticking this callback in fsck.c. Perhaps in the future we'd like
to accumulate such callbacks into another file (maybe fsck-cb.c,
similar to parse-options-cb.c?), but while we've got just the one
let's just put it into fsck.c.
A better alternative in this case would be some library some more
obvious library shared by fetch-pack.c ad builtin/index-pack.c, but
there isn't such a thing.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
fetch-pack: use file-scope static struct for fsck_options
Change code added in 5476e1efde (fetch-pack: print and use dangling
.gitmodules, 2021-02-22) so that we use a file-scoped "static struct
fsck_options" instead of defining one in the "fsck_gitmodules_oids()"
function.
We use this pattern in all of
builtin/{fsck,index-pack,mktag,unpack-objects}.c. It's odd to see
fetch-pack be the odd one out. One might think that we're using other
fsck_options structs in fetch-pack, or doing on fsck twice there, but
we're not.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the behavior of the .gitmodules validation added in 5476e1efde (fetch-pack: print and use dangling .gitmodules,
2021-02-22) so we're using one "fsck_options".
I found that code confusing to read. One might think that not setting
up the error_func earlier means that we're relying on the "error_func"
not being set in some code in between the two hunks being modified
here.
But we're not, all we're doing in the rest of "cmd_index_pack()" is
further setup by calling fsck_set_msg_types(), and assigning to
do_fsck_object.
So there was no reason in 5476e1efde to make a shallow copy of the
fsck_options struct before setting error_func. Let's just do this
setup at the top of the function, along with the "walk" assignment.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
fsck.c: move gitmodules_{found,done} into fsck_options
Move the gitmodules_{found,done} static variables added in 159e7b080bf (fsck: detect gitmodules files, 2018-05-02) into the
fsck_options struct. It makes sense to keep all the context in the
same place.
This requires changing the recently added register_found_gitmodules()
function added in 5476e1efde (fetch-pack: print and use dangling
.gitmodules, 2021-02-22) to take fsck_options. That function will be
removed in a subsequent commit, but as it'll require the new
gitmodules_found attribute of "fsck_options" we need this intermediate
step first.
An earlier version of this patch removed the small amount of
duplication we now have between FSCK_OPTIONS_{DEFAULT,STRICT} with a
FSCK_OPTIONS_COMMON macro. I don't think such de-duplication is worth
it for this amount of copy/pasting.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
fsck.c: add an fsck_set_msg_type() API that takes enums
Change code I added in acf9de4c94e (mktag: use fsck instead of custom
verify_tag(), 2021-01-05) to make use of a new API function that takes
the fsck_msg_{id,type} types, instead of arbitrary strings that
we'll (hopefully) parse into those types.
At the time that the fsck_set_msg_type() API was introduced in 0282f4dced0 (fsck: offer a function to demote fsck errors to warnings,
2015-06-22) it was only intended to be used to parse user-supplied
data.
For things that are purely internal to the C code it makes sense to
have the compiler check these arguments, and to skip the sanity
checking of the data in fsck_set_msg_type() which is redundant to
checks we get from the compiler.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
fsck.c: pass along the fsck_msg_id in the fsck_error callback
Change the fsck_error callback to also pass along the
fsck_msg_id. Before this change the only way to get the message id was
to parse it back out of the "message".
Let's pass it down explicitly for the benefit of callers that might
want to use it, as discussed in [1].
Passing the msg_type is now redundant, as you can always get it back
from the msg_id, but I'm not changing that convention. It's really
common to need the msg_type, and the report() function itself (which
calls "fsck_error") needs to call fsck_msg_type() to discover
it. Let's not needlessly re-do that work in the user callback.
fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h
Move the FOREACH_FSCK_MSG_ID macro and the fsck_msg_id enum it helps
define from fsck.c to fsck.h. This is in preparation for having
non-static functions take the fsck_msg_id as an argument.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
fsck.c: give "FOREACH_MSG_ID" a more specific name
Rename the FOREACH_MSG_ID macro to FOREACH_FSCK_MSG_ID in preparation
for moving it over to fsck.h. It's good convention to name macros
in *.h files in such a way as to clearly not clash with any other
names in other files.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In f417eed8cde (fsck: provide a function to parse fsck message IDs,
2015-06-22) the "STR" macro was introduced, but that short macro name
was not undefined after use as was done earlier in the same series for
the MSG_ID macro in c99ba492f1c (fsck: introduce identifiers for fsck
messages, 2015-06-22).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
fsck.c: call parse_msg_type() early in fsck_set_msg_type()
There's no reason to defer the calling of parse_msg_type() until after
we've checked if the "id < 0". This is not a hot codepath, and
parse_msg_type() itself may die on invalid input.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
fsck.h: re-order and re-assign "enum fsck_msg_type"
Change the values in the "enum fsck_msg_type" from being manually
assigned to using default C enum values.
This means we end up with a FSCK_IGNORE=0, which was previously
defined as "2".
I'm confident that nothing relies on these values, we always compare
them for equality. Let's not omit "0" so it won't be assumed that
we're using these as a boolean somewhere.
This also allows us to re-structure the fields to mark which are
"private" v.s. "public". See the preceding commit for a rationale for
not simply splitting these into two enums, namely that this is used
for both the private and public fsck API.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
Move the FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} defines into a new
fsck_msg_type enum.
These defines were originally introduced in:
- ba002f3b28a (builtin-fsck: move common object checking code to
fsck.c, 2008-02-25)
- f50c4407305 (fsck: disallow demoting grave fsck errors to warnings,
2015-06-22)
- efaba7cc77f (fsck: optionally ignore specific fsck issues
completely, 2015-06-22)
- f27d05b1704 (fsck: allow upgrading fsck warnings to errors,
2015-06-22)
The reason these were defined in two different places is because we
use FSCK_{IGNORE,INFO,FATAL} only in fsck.c, but FSCK_{ERROR,WARN} are
used by external callbacks.
Untangling that would take some more work, since we expose the new
"enum fsck_msg_type" to both. Similar to "enum object_type" it's not
worth structuring the API in such a way that only those who need
FSCK_{ERROR,WARN} pass around a different type.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type"
Refactor "if options->msg_type" and other code added in 0282f4dced0 (fsck: offer a function to demote fsck errors to warnings,
2015-06-22) to reduce the scope of the "int msg_type" variable.
This is in preparation for changing its type in a subsequent commit,
only using it in the "!options->msg_type" scope makes that change
This also brings the code in line with the fsck_set_msg_type()
function (also added in 0282f4dced0), which does a similar check for
"!options->msg_type". Another minor benefit is getting rid of the
style violation of not having braces for the body of the "if".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
Rename the remaining variables of type fsck_msg_id from "id" to
"msg_id". This change is relatively small, and is worth the churn for
a later change where we have different id's in the "report" function.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
fsck.c: remove (mostly) redundant append_msg_id() function
Remove the append_msg_id() function in favor of calling
prepare_msg_ids(). We already have code to compute the camel-cased
msg_id strings in msg_id_info, let's use it.
When the append_msg_id() function was added in 71ab8fa840f (fsck:
report the ID of the error/warning, 2015-06-22) the prepare_msg_ids()
function didn't exist. When prepare_msg_ids() was added in a46baac61eb (fsck: factor out msg_id_info[] lazy initialization code,
2018-05-26) this code wasn't moved over to lazy initialization.
This changes the behavior of the code to initialize all the messages
instead of just camel-casing the one we need on the fly. Since the
common case is that we're printing just one message this is mostly
redundant work.
But that's OK in this case, reporting this fsck issue to the user
isn't performance-sensitive. If we were somehow doing so in a tight
loop (in a hopelessly broken repository?) this would help, since we'd
save ourselves from re-doing this work for identical messages, we
could just grab the prepared string from msg_id_info after the first
invocation.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the fsck_walk_func to use an "enum object_type" instead of an
"int" type. The types are compatible, and ever since this was added in 355885d5315 (add generic, type aware object chain walker, 2008-02-25)
we've used entries from object_type (OBJ_BLOB etc.).
So this doesn't really change anything as far as the generated code is
concerned, it just gives the compiler more information and makes this
easier to read.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT}
Refactor the definitions of FSCK_OPTIONS_{DEFAULT,STRICT} to use
designated initializers. This allows us to omit those fields that
are initialized to 0 or NULL.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Dennis Ameling [Sat, 27 Mar 2021 23:06:23 +0000 (23:06 +0000)]
cmake(install): fix double .exe suffixes
By mistake, the `.exe` extension is appended _twice_ when installing the
dashed executables into `libexec/git-core/` on Windows (the extension is
already appended when adding items to the `git_links` list in the
`#Creating hardlinks` section).
Signed-off-by: Dennis Ameling <dennis@dennisameling.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Just like the Makefile-based build learned to skip hard-linking the
dashed built-ins in 179227d6e21 (Optionally skip linking/copying the
built-ins, 2020-09-21), this patch teaches the CMake-based build the
same trick.
Note: In contrast to the Makefile-based process, the built-ins would
only be linked during installation, not already when Git is built.
Therefore, the CMake-based build that we use in our CI builds _already_
does not link those built-ins (because the files are not installed
anywhere, they are used to run the test suite in-place).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Whenever we fix critical vulnerabilities, we follow some sort of
protocol (e.g. setting a coordinated release date, keeping the fix under
embargo until that time, coordinating with packagers and/or hosting
sites, etc).
Similar in spirit to `Documentation/howto/maintain-git.txt`, let's
formalize the details in a document.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the same document, describe that Git does not have Long Term Support
(LTS) release trains, although security fixes are always applied to a
few of the most recent release trains.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Fri, 26 Mar 2021 21:59:03 +0000 (14:59 -0700)]
Merge branch 'cm/rebase-i-fixup-amend-reword'
"git commit --fixup=<commit>", which was to tweak the changes made
to the contents while keeping the original log message intact,
learned "--fixup=(amend|reword):<commit>", that can be used to
tweak both the message and the contents, and only the message,
respectively.
* cm/rebase-i-fixup-amend-reword:
doc/git-commit: add documentation for fixup=[amend|reword] options
t3437: use --fixup with options to create amend! commit
t7500: add tests for --fixup=[amend|reword] options
commit: add a reword suboption to --fixup
commit: add amend suboption to --fixup to create amend! commit
sequencer: export and rename subject_length()
Junio C Hamano [Fri, 26 Mar 2021 21:59:03 +0000 (14:59 -0700)]
Merge branch 'cm/rebase-i-updates'
Follow-up fixes to "cm/rebase-i" topic.
* cm/rebase-i-updates:
doc/rebase -i: fix typo in the documentation of 'fixup' command
t/t3437: fixup the test 'multiple fixup -c opens editor once'
t/t3437: use named commits in the tests
t/t3437: simplify and document the test helpers
t/t3437: check the author date of fixed up commit
t/t3437: remove the dependency of 'expected-message' file from tests
t/t3437: fixup here-docs in the 'setup' test
t/lib-rebase: update the documentation of FAKE_LINES
rebase -i: clarify and fix 'fixup -c' rebase-todo help
sequencer: rename a few functions
sequencer: fixup the datatype of the 'flag' argument
Junio C Hamano [Fri, 26 Mar 2021 21:59:03 +0000 (14:59 -0700)]
Merge branch 'cm/rebase-i'
"rebase -i" is getting cleaned up and also enhanced.
* cm/rebase-i:
doc/git-rebase: add documentation for fixup [-C|-c] options
rebase -i: teach --autosquash to work with amend!
t3437: test script for fixup [-C|-c] options in interactive rebase
rebase -i: add fixup [-C | -c] command
sequencer: use const variable for commit message comments
sequencer: pass todo_item to do_pick_commit()
rebase -i: comment out squash!/fixup! subjects from squash message
sequencer: factor out code to append squash message
rebase -i: only write fixup-message when it's needed
Junio C Hamano [Fri, 26 Mar 2021 21:59:02 +0000 (14:59 -0700)]
Merge branch 'ab/make-cleanup'
Reorganize Makefile to allow building git.o and other essential
objects without extra stuff needed only for testing.
* ab/make-cleanup:
Makefile: add {program,xdiff,test,git,fuzz}-objs & objects targets
Makefile: split OBJECTS into OBJECTS and GIT_OBJS
Makefile: sort OBJECTS assignment for subsequent change
Makefile: split up long OBJECTS line
Makefile: guard against TEST_OBJS in the environment
Derrick Stolee [Fri, 26 Mar 2021 12:38:11 +0000 (12:38 +0000)]
csum-file: make hashwrite() more readable
The hashwrite() method takes an input buffer and updates a hashfile's
hash function while writing the data to a file. To avoid overuse of
flushes, the hashfile has an internal buffer and most writes will use
memcpy() to transfer data from the input 'buf' to the hashfile's buffer
of size 8 * 1024 bytes.
Logic introduced by a8032d12 (sha1write: don't copy full sized buffers,
2008-09-02) reduces the number of memcpy() calls when the input buffer
is sufficiently longer than the hashfile's buffer, causing nr to be the
length of the full buffer. In these cases, the input buffer is used
directly in chunks equal to the hashfile's buffer size.
This method caught my attention while investigating some performance
issues, but it turns out that these performance issues were noise within
the variance of the experiment.
However, during this investigation, I inspected hashwrite() and
misunderstood it, even after looking closely and trying to make it
faster. This change simply reorganizes some parts of the loop within
hashwrite() to make it clear that each batch either uses memcpy() to the
hashfile's buffer or writes directly from the input buffer. The previous
code relied on indirection through local variables and essentially
inlined the implementation of hashflush() to reduce lines of code.
Helped-by: Jeff King <peff@peff.net> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 24 Mar 2021 21:36:27 +0000 (14:36 -0700)]
Merge branch 'nk/diff-index-fsmonitor'
"git diff-index" codepath has been taught to trust fsmonitor status
to reduce number of lstat() calls.
* nk/diff-index-fsmonitor:
fsmonitor: add perf test for git diff HEAD
fsmonitor: add assertion that fsmonitor is valid to check_removed
fsmonitor: skip lstat deletion check during git diff-index
Junio C Hamano [Wed, 24 Mar 2021 21:36:27 +0000 (14:36 -0700)]
Merge branch 'jk/fail-prereq-testfix'
GIT_TEST_FAIL_PREREQS is a mechanism to skip test pieces with
prerequisites to catch broken tests that depend on the side effects
of optional pieces, but did not work at all when negative
prerequisites were involved.
* jk/fail-prereq-testfix:
t: annotate !PTHREADS tests with !FAIL_PREREQS
Junio C Hamano [Wed, 24 Mar 2021 21:36:27 +0000 (14:36 -0700)]
Merge branch 'tb/geometric-repack'
"git repack" so far has been only capable of repacking everything
under the sun into a single pack (or split by size). A cleverer
strategy to reduce the cost of repacking a repository has been
introduced.
* tb/geometric-repack:
builtin/pack-objects.c: ignore missing links with --stdin-packs
builtin/repack.c: reword comment around pack-objects flags
builtin/repack.c: be more conservative with unsigned overflows
builtin/repack.c: assign pack split later
t7703: test --geometric repack with loose objects
builtin/repack.c: do not repack single packs with --geometric
builtin/repack.c: add '--geometric' option
packfile: add kept-pack cache for find_kept_pack_entry()
builtin/pack-objects.c: rewrite honor-pack-keep logic
p5303: measure time to repack with keep
p5303: add missing &&-chains
builtin/pack-objects.c: add '--stdin-packs' option
revision: learn '--no-kept-objects'
packfile: introduce 'find_kept_pack_entry()'
Han Xin [Tue, 23 Mar 2021 03:20:50 +0000 (11:20 +0800)]
pack-objects: fix comment of reused_chunk.difference
As record_reused_object(offset, offset - hashfile_total(out)) said,
reused_chunk.difference should be the offset of original packfile minus
the offset of the generated packfile. But the comment presented an opposite way.
Signed-off-by: Han Xin <hanxin.hx@alibaba-inc.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 24 Mar 2021 17:35:40 +0000 (10:35 -0700)]
format-patch: give an overview of what a "patch" message is
The text says something called a "patch" is prepared one for each
commit, it is suitable for e-mail submission, and "am" is the
command to use it, but does not say what the "patch" really is.
The description in the page also refers to the "three-dash" line,
but it is unclear what it is, unless the reader is given a more
detailed overview of what the "patch" is.
Add a brief paragraph to give an overview of what the output looks
like.
Robert Foss [Tue, 23 Mar 2021 17:33:27 +0000 (18:33 +0100)]
git-send-email: Respect core.hooksPath setting
get-send-email currently makes the assumption that the
'sendemail-validate' hook exists inside of the repository.
Since the introduction of 'core.hooksPath' configuration option in 867ad08a261 (hooks: allow customizing where the hook directory is,
2016-05-04), this is no longer true.
Instead of assuming a hardcoded repo relative path, query
git for the actual path of the hooks directory.
Signed-off-by: Robert Foss <robert.foss@linaro.org> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the rebase.useBuiltin setting and the now-obsolete
GIT_TEST_REBASE_USE_BUILTIN test flag.
This was left in place after my d03ebd411c6 (rebase: remove the
rebase.useBuiltin setting, 2019-03-18) to help anyone who'd used the
experimental flag and wanted to know that it was the default, or that
they should transition their test environment to use the builtin
rebase unconditionally.
It's been more than long enough for those users to get a headsup about
this. So remove all the scaffolding that was left inplace after d03ebd411c6. I'm also removing the documentation entry, if anyone
still has this left in their configuration they can do some source
archaeology to figure out what it used to do, which makes more sense
than exposing every git user reading the documentation to this legacy
configuration switch.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
ZheNing Hu [Tue, 23 Mar 2021 11:12:25 +0000 (11:12 +0000)]
format-patch: allow a non-integral version numbers
The `-v<n>` option of `format-patch` can give nothing but an
integral iteration number to patches in a series. Some people,
however, prefer to mark a new iteration with only a small fixup
with a non integral iteration number (e.g. an "oops, that was
wrong" fix-up patch for v4 iteration may be labeled as "v4.1").
Allow `format-patch` to take such a non-integral iteration
number.
`<n>` can be any string, such as '3.1' or '4rev2'. In the case
where it is a non-integral value, the "Range-diff" and "Interdiff"
headers will not include the previous version.
Signed-off-by: ZheNing Hu <adlternative@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The parallel checkout machinery will call checkout_entry() for entries
that could not be written in parallel due to path collisions. At this
point, we will already be holding the conversion attributes for each
entry, and it would be wasteful to let checkout_entry() load these
again. Instead, let's add the checkout_entry_ca() variant, which
optionally takes a preloaded conv_attrs struct.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Matheus Tavares [Tue, 23 Mar 2021 14:19:35 +0000 (11:19 -0300)]
entry: move conv_attrs lookup up to checkout_entry()
In a following patch, checkout_entry() will use conv_attrs to decide
whether an entry should be enqueued for parallel checkout or not. But
the attributes lookup only happens lower in this call stack. To avoid
the unnecessary work of loading the attributes twice, let's move it up
to checkout_entry(), and pass the loaded struct down to write_entry().
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Matheus Tavares [Tue, 23 Mar 2021 14:19:34 +0000 (11:19 -0300)]
entry: extract update_ce_after_write() from write_entry()
The code that updates the in-memory index information after an entry is
written currently resides in write_entry(). Extract it to a public
function so that it can be called by the parallel checkout functions,
outside entry.c, in a later patch.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Matheus Tavares [Tue, 23 Mar 2021 14:19:33 +0000 (11:19 -0300)]
entry: make fstat_output() and read_blob_entry() public
These two functions will be used by the parallel checkout code, so let's
make them public. Note: fstat_output() is renamed to
fstat_checkout_output(), now that it has become public, seeking to avoid
future name collisions.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Matheus Tavares [Tue, 23 Mar 2021 14:19:32 +0000 (11:19 -0300)]
entry: extract a header file for entry.c functions
The declarations of entry.c's public functions and structures currently
reside in cache.h. Although not many, they contribute to the size of
cache.h and, when changed, cause the unnecessary recompilation of
modules that don't really use these functions. So let's move them to a
new entry.h header. While at it let's also move a comment related to
checkout_entry() from entry.c to entry.h as it's more useful to describe
the function there.
ZheNing Hu [Tue, 23 Mar 2021 13:55:57 +0000 (13:55 +0000)]
commit: add --trailer option
Historically, Git has supported the 'Signed-off-by' commit trailer
using the '--signoff' and the '-s' option from the command line.
But users may need to provide other trailer information from the
command line such as "Helped-by", "Reported-by", "Mentored-by",
Now implement a new `--trailer <token>[(=|:)<value>]` option to pass
other trailers to `interpret-trailers` and insert them into commit
messages.
Signed-off-by: ZheNing Hu <adlternative@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Mon, 22 Mar 2021 21:00:25 +0000 (14:00 -0700)]
Merge branch 'ps/update-ref-trans-hook-doc'
Doc update.
* ps/update-ref-trans-hook-doc:
githooks.txt: clarify documentation on reference-transaction hook
githooks.txt: replace mentions of SHA-1 specific properties
* rs/pretty-describe:
archive: expand only a single %(describe) per archive
pretty: document multiple %(describe) being inconsistent
t4205: assert %(describe) test coverage
pretty: add merge and exclude options to %(describe)
pretty: add %(describe)
Junio C Hamano [Mon, 22 Mar 2021 21:00:23 +0000 (14:00 -0700)]
Merge branch 'en/ort-perf-batch-8'
Rename detection rework continues.
* en/ort-perf-batch-8:
diffcore-rename: compute dir_rename_guess from dir_rename_counts
diffcore-rename: limit dir_rename_counts computation to relevant dirs
diffcore-rename: compute dir_rename_counts in stages
diffcore-rename: extend cleanup_dir_rename_info()
diffcore-rename: move dir_rename_counts into dir_rename_info struct
diffcore-rename: add function for clearing dir_rename_count
Move computation of dir_rename_count from merge-ort to diffcore-rename
diffcore-rename: add a mapping of destination names to their indices
diffcore-rename: provide basic implementation of idx_possible_rename()
diffcore-rename: use directory rename guided basename comparisons
Junio C Hamano [Mon, 22 Mar 2021 21:00:23 +0000 (14:00 -0700)]
Merge branch 'ab/grep-pcre2-allocfix'
Updates to memory allocation code around the use of pcre2 library.
* ab/grep-pcre2-allocfix:
grep/pcre2: move definitions of pcre2_{malloc,free}
grep/pcre2: move back to thread-only PCREv2 structures
grep/pcre2: actually make pcre2 use custom allocator
grep/pcre2: use pcre2_maketables_free() function
grep/pcre2: use compile-time PCREv2 version test
grep/pcre2: add GREP_PCRE2_DEBUG_MALLOC debug mode
grep/pcre2: prepare to add debugging to pcre2_malloc()
grep/pcre2: correct reference to grep_init() in comment
grep/pcre2: drop needless assignment to NULL
grep/pcre2: drop needless assignment + assert() on opt->pcre2
Update C code that sets a few configuration variables when a remote
is configured so that it spells configuration variable names in the
canonical camelCase.
* ab/remote-write-config-in-camel-case:
remote: write camel-cased *.pushRemote on rename
remote: add camel-cased *.tagOpt key, like clone
We had a code to diagnose and die cleanly when a required
clean/smudge filter is missing, but an assert before that
unnecessarily fired, hiding the end-user facing die() message.
* mt/cleanly-die-upon-missing-required-filter:
convert: fail gracefully upon missing clean cmd on required filter