Josh Steadmon [Thu, 18 Nov 2021 22:01:41 +0000 (14:01 -0800)]
trace2: disable tr2_dst before warning on write errors
If writing a trace2 message fails, we optionally warn the user of this
fact. However, in 0ee10fd (usage: add trace2 entry upon warning(),
2020-11-23), we added a trace entry to the warning() function. This
means that we can enter an infinite loop of failing trace2 writes and
warnings. Fix this by disabling the failing trace2 destination before
issuing the warning.
Additionally, trace2 sets a default SIGPIPE handler
(tr2main_signal_handler) when it is initialized. This handler generates
its own trace2 messages when a signal is received. If a trace2 write
fails due to a broken pipe, this handler will run and then cause another
failed write. Fix this by temporarily ignoring SIGPIPE while writing
trace2 messages. This is safe because the write will still fail, and we
will disable the failing destination.
Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Glen Choo [Thu, 18 Nov 2021 00:53:25 +0000 (16:53 -0800)]
remote: die if branch is not found in repository
In a subsequent commit, we would like external-facing functions to be
able to accept "struct repository" and "struct branch" as a pair. This
is useful for functions like pushremote_for_branch(), which need to take
values from the remote_state and branch, even if branch == NULL.
However, a caller may supply an unrelated repository and branch, which
is not supported behavior.
To prevent misuse, add a die_on_missing_branch() helper function that
dies if a given branch is not from a given repository. Speed up the
existence check by replacing the branches list with a branches_hash
hashmap.
Like read_config(), die_on_missing_branch() is only called from
non-static functions; static functions are less prone to misuse because
they have strong conventions for keeping remote_state and branch in
sync.
Signed-off-by: Glen Choo <chooglen@google.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Glen Choo [Thu, 18 Nov 2021 00:53:24 +0000 (16:53 -0800)]
remote: remove the_repository->remote_state from static methods
Replace all remaining references of the_repository->remote_state in
static functions with a struct remote_state parameter.
To do so, move read_config() calls to non-static functions and create a
family of static functions, "remotes_*", that behave like "repo_*", but
accept struct remote_state instead of struct repository. In the case
where a static function calls a non-static function, replace the
non-static function with its "remotes_*" equivalent.
Signed-off-by: Glen Choo <chooglen@google.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Glen Choo [Thu, 18 Nov 2021 00:53:23 +0000 (16:53 -0800)]
remote: use remote_state parameter internally
Without changing external-facing functions, replace
the_repository->remote_state internally by adding a struct remote_state
parameter.
As a result, external-facing functions are still tied to the_repository,
but most static functions no longer reference
the_repository->remote_state. The exceptions are those that are used in
a way that depends on external-facing functions e.g. the callbacks to
remote_get_1().
Signed-off-by: Glen Choo <chooglen@google.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Glen Choo [Thu, 18 Nov 2021 00:53:22 +0000 (16:53 -0800)]
remote: move static variables into per-repository struct
remote.c does not works with non-the_repository because it stores its
state as static variables. To support non-the_repository, we can use a
per-repository struct for the remotes subsystem.
Prepare for this change by defining a struct remote_state that holds
the remotes subsystem state and move the static variables of remote.c
into the_repository->remote_state.
This introduces no behavioral or API changes.
Signed-off-by: Glen Choo <chooglen@google.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Glen Choo [Thu, 18 Nov 2021 00:53:21 +0000 (16:53 -0800)]
t5516: add test case for pushing remote refspecs
"git push remote-name" (that is, with no refspec given on the command
line) should push the refspecs in remote.remote-name.push. There is no
test case that checks this behavior in detached HEAD, so add one.
Signed-off-by: Glen Choo <chooglen@google.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Phillip Wood [Wed, 17 Nov 2021 11:20:25 +0000 (11:20 +0000)]
xdiff: simplify comparison
Now that the histogram algorithm calls xdl_classify_record() it is no
longer necessary to use xdl_recmatch() to compare lines, it is
sufficient just to compare the hash values. This has a negligible
effect on performance.
Phillip Wood [Wed, 17 Nov 2021 11:20:24 +0000 (11:20 +0000)]
xdiff: avoid unnecessary memory allocations
rindex and ha are only used by xdl_cleanup_records() which is not
called by the histogram or patience algorithms. The perf test results
show a small reduction in run time but that is probably within the
noise.
Phillip Wood [Wed, 17 Nov 2021 11:20:23 +0000 (11:20 +0000)]
diff histogram: intern strings
Histogram is the only diff algorithm not to call
xdl_classify_record(). xdl_classify_record() ensures that the hash
values of two strings that are not equal differ which means that it is
not necessary to use xdl_recmatch() when comparing lines, all that is
necessary is to compare the hash values. This gives a 7% reduction in
the runtime of "git log --patch" when using the histogram diff
algorithm.
Erwin Villejo [Wed, 17 Nov 2021 07:55:50 +0000 (07:55 +0000)]
pull: should be noop when already-up-to-date
The already-up-to-date pull bug was fixed for --ff-only but it did not
include the case where --ff or --ff-only are not specified. This updates
the --ff-only fix to include the case where --ff or --ff-only are not
specified in command line flags or config.
Signed-off-by: Erwin Villejo <erwin.villejo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 16 Nov 2021 21:38:50 +0000 (16:38 -0500)]
t5319: corrupt more bytes of the midx checksum
One of the tests in t5319 corrupts the checksum of the midx file by
writing a single 0xff over the final byte, and then confirms that we
detect the problem. This usually works fine, but would break if the
actual checksum ended with that same byte already.
It seems like this should happen in 1 out of 256 test runs, but it turns
out to be less often in practice. The contents of the midx are mostly
deterministic because it's based on the objects, and we remove most
sources of randomness by setting GIT_COMMITTER_DATE, etc. However,
there's still some randomness: some objects are duplicated between
packs, and the midx must decide which to use, which can be based on
timing.
So very occasionally we can end up with a real 0xff byte, and the test
fails. The most robust fix would be to read out the final byte and then
change it to something else (e.g., adding 1 mod 256). But that's awkward
to do in shell. Let's just blindly corrupt 10 bytes instead of 1, which
reduces our chances of an accidental noop to 1 in 2^80.
Reported-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "checkout" command is one of the main sources of leaks in the test
suite, let's fix the common ones by not leaking from the "struct
branch_info".
Doing this is rather straightforward, albeit verbose, we need to
xstrdup() constant strings going into the struct, and free() the ones
we clobber as we go along.
This also means that we can delete previous partial leak fixes in this
area, i.e. the "path_to_free" accounting added by 96ec7b1e708 (Convert
resolve_ref+xstrdup to new resolve_refdup function, 2011-12-13).
There was some discussion about whether "we should retain the "const
char *" here and cast at free() time, or have it be a "char *". Since
this is not a public API with any sort of API boundary let's use
"char *", as is already being done for the "refname" member of the
same struct.
I.e. the ones that were newly passing when the --state=failed command
was run. I left out "t3040-subprojects-basic.sh" and
"t4131-apply-fake-ancestor.sh" to to optimization-level related
differences similar to the ones noted in[1], except that these would
be something the current 'linux-leaks' job would run into.
Teng Long [Thu, 18 Nov 2021 07:11:14 +0000 (15:11 +0800)]
midx: fix a formatting issue in "multi-pack-index.txt"
There is a formatting issue in "multi-pack-index.html", corresponding
to the nesting bulleted list of a wrong usage in "multi-pack-index.txt"
and this commit fix the problem.
In ASCIIDOC, it doesn't treat an indented character as the
beginning of a sub-list. If we want to write a nested bulleted list, we
could just use ASTERISK without any DASH like:
"
* Level 1 list item
** Level 2 list item
*** Level 3 list item
** Level 2 list item
* Level 1 list item
** Level 2 list item
* Level 1 list item
"
The DASH can be used for bulleted list too, But the DASH is suggested
only to be used as the marker for the first level because the DASH
doesn’t work well or a best practice for nested lists,
like (dash is as level 2 below):
"
* Level 1 list item
- Level 2 list item
* Level 1 list item
"
ASTERISK is recommanded to use because it works intuitively and clearly
("marker length = nesting level") in nested lists, but the DASH can't.
However, when you want to write a non-nested bulleted lists, DASH works
too, like:
"
- Level 1 list item
- Level 1 list item
- Level 1 list item
"
Reviewed-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Philippe Blain [Sat, 13 Nov 2021 20:38:05 +0000 (20:38 +0000)]
SubmittingPatches: fix Asciidoc syntax in "GitHub CI" section
A superfluous ']' was added to the title of the GitHub CI section in f003a91f5c (SubmittingPatches: replace discussion of Travis with GitHub
Actions, 2021-07-22). Remove it.
While at it, format the URL for a GitHub user's workflow runs of Git
between backticks, since if not Asciidoc formats only the first part,
"https://github.com/<Your", as a link, which is not very useful.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jiang Xin [Thu, 11 Nov 2021 00:54:16 +0000 (08:54 +0800)]
l10n: ko: fix typos found by git-po-helper
When checking typos in file "po/ko.po", "git-po-helper" reports lots of
false positives because there are no spaces between ASCII and Korean
characters. After applied commit adee197 "(dict: add smudge table for
Korean language, 2021-11-11)" of "git-l10n/git-po-helper" to suppress
these false positives, some easy-to-fix typos are found and fixed.
Josh Steadmon [Thu, 11 Nov 2021 22:34:25 +0000 (14:34 -0800)]
trace2: increment event format version
In 64bc752 (trace2: add trace2_child_ready() to report on background
children, 2021-09-20), we added a new "child_ready" event. In
Documentation/technical/api-trace2.txt, we promise that adding a new
event type will result in incrementing the trace2 event format version
number, but this was not done. Correct this in code & docs.
Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Calvin Wan [Thu, 11 Nov 2021 22:00:48 +0000 (22:00 +0000)]
protocol-v2.txt: align delim-pkt spec with usage
The current protocol EBNF allows command-request to end with the
capability list, if no command specific arguments follow, but the
protocol requires that after the capability list, there must be a
delim-pkt regardless of the number of command specific arguments. Fixed
the EBNF to match. Both JGit and libgit2's implementation has the
delim-pkt as mandatory. JGit's code is not publicly linkable, but
libgit2 is linked below[1]. As for currently implemented commands on v2
(ls-ref and fetch), the delim packet is already being passed through
object-file: free(*contents) only in read_loose_object() caller
In the preceding commit a free() of uninitialized memory regression in 96e41f58fe1 (fsck: report invalid object type-path combinations,
2021-10-01) was fixed, but we'd still have an issue with leaking
memory from fsck_loose(). Let's fix that issue too.
That issue was introduced in my 31deb28f5e0 (fsck: don't hard die on
invalid object types, 2021-10-01). It can be reproduced under
SANITIZE=leak with the test I added in 093fffdfbec (fsck tests: add
test for fsck-ing an unknown type, 2021-10-01):
./t1450-fsck.sh --run=84 -vixd
In some sense it's not a problem, we lost the same amount of memory in
terms of things malloc'd and not free'd. It just moved from the "still
reachable" to "definitely lost" column in valgrind(1) nomenclature[1],
since we'd have die()'d before.
But now that we don't hard die() anymore in the library let's properly
free() it. Doing so makes this code much easier to follow, since we'll
now have one function owning the freeing of the "contents" variable,
not two.
For context on that memory management pattern the read_loose_object()
function was added in f6371f92104 (sha1_file: add read_loose_object()
function, 2017-01-13) and subsequently used in c68b489e564 (fsck:
parse loose object paths directly, 2017-01-13). The pattern of it
being the task of both sides to free() the memory has been there in
this form since its inception.
Junio C Hamano [Thu, 11 Nov 2021 20:34:41 +0000 (12:34 -0800)]
Revert "connected: do not sort input revisions"
This reverts commit f45022dc2fd692fd024f2eb41a86a66f19013d43,
as this is like breakage in the traversal more likely. In a
history with 10 single strand of pearls,
1-->2-->3--...->7-->8-->9-->10
asking "rev-list --unsorted-input 1 10 --not 9 8 7 6 5 4" fails to
paint the bottom 1 uninteresting as the traversal stops, without
completing the propagation of uninteresting bit starting at 4 down
through 3 and 2 to 1.
object-file: fix SEGV on free() regression in v2.34.0-rc2
Fix a regression introduced in my 96e41f58fe1 (fsck: report invalid
object type-path combinations, 2021-10-01). When fsck-ing blobs larger
than core.bigFileThreshold, we'd free() a pointer to uninitialized
memory.
This issue would have been caught by SANITIZE=address, but since it
involves core.bigFileThreshold, none of the existing tests in our test
suite covered it.
Running them with the "big_file_threshold" in "environment.c" changed
to say "6" would have shown this failure, but let's add a dedicated
test for this scenario based on Han Xin's report[1].
The bug was introduced between v9 and v10[2] of the fsck series merged
in 061a21d36d8 (Merge branch 'ab/fsck-unexpected-type', 2021-10-25).
Reported-by: Han Xin <chiyutianyi@gmail.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ivan Frade [Wed, 10 Nov 2021 23:51:29 +0000 (23:51 +0000)]
http-fetch: redact url on die() message
http-fetch prints the URL after failing to fetch it. This can be
confusing to users (they cannot really do anything with it), and they
can share by accident a sensitive URL (e.g. with credentials) while
looking for help.
Redact the URL unless the GIT_TRACE_REDACT variable is set to false. This
mimics the redaction of other sensitive information in git, like the
Authorization header in HTTP.
Fix also capitalization of previous die() message (must start in
lowercase).
Signed-off-by: Ivan Frade <ifrade@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ivan Frade [Wed, 10 Nov 2021 23:51:28 +0000 (23:51 +0000)]
fetch-pack: redact packfile urls in traces
In some setups, packfile uris act as bearer token. It is not
recommended to expose them plainly in logs, although in special
circunstances (e.g. debug) it makes sense to write them.
Redact the packfile URL paths by default, unless the GIT_TRACE_REDACT
variable is set to false. This mimics the redacting of the Authorization
header in HTTP.
Signed-off-by: Ivan Frade <ifrade@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jonathan Tan [Wed, 10 Nov 2021 23:40:33 +0000 (15:40 -0800)]
packfile: avoid overflowing shift during decode
unpack_object_header_buffer() attempts to protect against overflowing
left shifts, but the limit of the shift amount should not be the size of
the variable being shifted. It should be the size minus the size of its
contents. Fix that accordingly.
This was noticed at $DAYJOB by a fuzzer running internally.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
parse-options.c: use "enum parse_opt_result" for parse_nodash_opt()
Change the parse_nodash_opt() function to use "enum
parse_opt_result". In 352e761388b (parse-options.[ch]: consistently
use "enum parse_opt_result", 2021-10-08) its only caller
parse_options_step() started using that return type, and the
get_value() which will be called and return from it uses the same
enum.
Let's do the same here so that this function always returns an "enum
parse_opt_result" value.
We could go for either PARSE_OPT_HELP (-2) or PARSE_OPT_ERROR (-1)
here. The reason we ended up with "-2" is that in code added in 07fe54db3cd (parse-opt: do not print errors on unknown options, return
"-2" instead., 2008-06-23) we used that value in a meaningful way.
Then in 51a9949eda7 (parseopt: add PARSE_OPT_NODASH, 2009-05-07) the
use of "-2" was seemingly copy/pasted from parse_long_opt(), which was
the function immediately above the parse_nodash_opt() function added
in that commit.
Since we only care about whether the return value here is non-zero
let's use the more generic PARSE_OPT_ERROR.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 10 Nov 2021 23:01:20 +0000 (15:01 -0800)]
Merge branch 'js/simple-ipc-cygwin-socket-fix'
The way Cygwin emulates a unix-domain socket, on top of which the
simple-ipc mechanism is implemented, can race with the program on
the other side that wants to use the socket, and briefly make it
appear as a regular file before lstat(2) starts reporting it as a
socket. We now have a workaround on the side that connects to a
unix domain socket.
* js/simple-ipc-cygwin-socket-fix:
simple-ipc: work around issues with Cygwin's Unix socket emulation
"git pull --ff-only" and "git pull --rebase --ff-only" should make
it a no-op to attempt pulling from a remote that is behind us, but
instead the command errored out by saying it was impossible to
fast-forward, which may technically be true, but not a useful thing
to diagnose as an error. This has been corrected.
* jc/fix-pull-ff-only-when-already-up-to-date:
pull: --ff-only should make it a noop when already-up-to-date
Jeff King [Wed, 10 Nov 2021 06:00:47 +0000 (01:00 -0500)]
t/lib-gpg: avoid broken versions of ssh-keygen
The "-Y find-principals" option of ssh-keygen seems to be broken in
Debian's openssh-client 1:8.7p1-1, whereas it works fine in 1:8.4p1-5.
This causes several failures for GPGSSH tests. We fulfill the
prerequisite because generating the keys works fine, but actually
verifying a signature causes results ranging from bogus results to
ssh-keygen segfaulting.
We can find the broken version during the prereq check by feeding it
empty input. This should result in it complaining to stderr, but in the
broken version it triggers the segfault, causing the GPGSSH tests to be
skipped.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Robin Jarry [Wed, 10 Nov 2021 09:29:42 +0000 (10:29 +0100)]
receive-pack: ignore SIGPIPE while reporting status to client
Before running the post-receive hook, status info is reported back to
the client. If a remote client exits before or during the status report,
receive-pack is killed by SIGPIPE and post-receive is never executed.
The post-receive hook is often used to send email notifications (see
contrib/hooks/post-receive-email), update bug trackers, start automatic
builds, etc. Not executing it after an interrupted yet "successful" push
can lead to inconsistencies.
Ignore SIGPIPE before reporting status to the client to increase the
chances of post-receive running if pre-receive was successful. This does
not guarantee 100% consistency but it should resist early disconnection
by the client.
Signed-off-by: Robin Jarry <robin@jarry.cc> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Wed, 10 Nov 2021 18:35:59 +0000 (18:35 +0000)]
maintenance: disable cron on macOS
In eba1ba9 (maintenance: `git maintenance run` learned
`--scheduler=<scheduler>`, 2021-09-04), we introduced the ability to
specify a scheduler explicitly. This led to some extra checks around
whether an alternative scheduler was available. This added the
functionality of removing background maintenance from schedulers other
than the one selected.
On macOS, cron is technically available, but running 'crontab' triggers
a UI prompt asking for special permissions. This is the major reason why
launchctl is used as the default scheduler. The is_crontab_available()
method triggers this UI prompt, causing user disruption.
Remove this disruption by using an #ifdef to prevent running crontab
this way on macOS. This has the unfortunate downside that if a user
manually selects cron via the '--scheduler' option, then adjusting the
scheduler later will not remove the schedule from cron. The
'--scheduler' option ignores the is_available checks, which is how we
can get into this situation.
Extract the new check_crontab_process() method to avoid making the
'child' variable unused on macOS. The method is marked MAYBE_UNUSED
because it has no callers on macOS.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
simple-ipc: work around issues with Cygwin's Unix socket emulation
Cygwin emulates Unix sockets by writing files with custom contents and
then marking them as system files.
The tricky problem is that while the file is written and its `system`
bit is set, it is still identified as a file. This caused test failures
when Git is too fast looking for the Unix sockets and then complains
that there is a plain file in the way.
Let's work around this by adding a delayed retry loop, specifically for
Cygwin.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Tested-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jiang Xin [Wed, 10 Nov 2021 00:55:14 +0000 (08:55 +0800)]
Merge branch 'master' of github.com:git/git
* 'master' of github.com:git/git:
Git 2.34-rc2
parse-options.[ch]: revert use of "enum" for parse_options()
t/lib-git.sh: fix ACL-related permissions failure
A few fixes before -rc2
async_die_is_recursing: work around GCC v11.x issue on Fedora
Document positive variant of commit and merge option "--no-verify"
pull: honor --no-verify and do not call the commit-msg hook
http-backend: remove a duplicated code branch
Jeff King [Tue, 9 Nov 2021 16:35:47 +0000 (11:35 -0500)]
git-jump: pass "merge" arguments to ls-files
We currently throw away any arguments given to "git jump merge". We
should instead pass them along to ls-files, since they're likely to be
pathspecs. This matches the behavior of "git jump diff", etc.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
parse-options.[ch]: revert use of "enum" for parse_options()
Revert the parse_options() prototype change in my recent 352e761388b (parse-options.[ch]: consistently use "enum
parse_opt_result", 2021-10-08) was incorrect. The parse_options()
function returns the number of argc elements that haven't been
processed, not "enum parse_opt_result".
Reported-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jean-Noël Avila [Sat, 6 Nov 2021 18:48:52 +0000 (19:48 +0100)]
doc: use only hyphens as word separators in placeholders
According to CodingGuidelines, multi-word placeholders should use
hyphens as word separators.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com> Reviewed-by: Eli Schwartz <eschwartz@archlinux.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Sat, 6 Nov 2021 18:48:58 +0000 (19:48 +0100)]
init doc: --shared=0xxx does not give umask but perm bits
The description that 0640 makes sure that the group members can read
the repository is correct, but calling that octal number a <umask>
is wrong. Let's call it <perm>, as the value is used to set the
permission bits.
Adam Dinwoodie [Fri, 5 Nov 2021 19:31:06 +0000 (19:31 +0000)]
t/lib-git.sh: fix ACL-related permissions failure
As well as checking that the relevant functionality is available, the
GPGSSH prerequisite check creates the SSH keys that are used by the test
functions it gates. If these keys are created in a directory that
has a default Access Control List, the key files can inherit those
permissions.
This can result in a scenario where the private keys are created
successfully, so the prerequisite check passes and the tests are run,
but the key files have permissions that are too permissive, meaning
OpenSSH will refuse to load them and the tests will fail.
To avoid this happening, before creating the keys, clear any default ACL
set on the directory that will contain them. This step allowed to fail;
if setfacl isn't present, that's a very likely indicator that the
filesystem in question simply doesn't support default ACLs.
Helped-by: Fabian Stelzer <fs@gigacodes.de> Signed-off-by: Adam Dinwoodie <adam@dinwoodie.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the "define_categories()" and "define_category_names()" functions
to take the already-parsed output of "category_list()" as an argument,
which brings our number of passes over "command-list.txt" from three
to two.
Then have "category_list()" itself take the output of "command_list()"
as an argument, bringing the number of times we parse the file to one.
Compared to the pre-image this speeds us up quite a bit:
$ git show HEAD~:generate-cmdlist.sh >generate-cmdlist.sh.old
$ hyperfine --warmup 10 -L v ,.old 'sh generate-cmdlist.sh{v} command-list.txt'
Benchmark #1: sh generate-cmdlist.sh command-list.txt
Time (mean ± σ): 22.9 ms ± 0.3 ms [User: 15.8 ms, System: 9.6 ms]
Range (min … max): 22.5 ms … 24.0 ms 125 runs
Benchmark #2: sh generate-cmdlist.sh.old command-list.txt
Time (mean ± σ): 30.1 ms ± 0.4 ms [User: 24.4 ms, System: 17.5 ms]
Range (min … max): 29.5 ms … 32.3 ms 96 runs
Summary
'sh generate-cmdlist.sh command-list.txt' ran
1.32 ± 0.02 times faster than 'sh generate-cmdlist.sh.old command-list.txt'
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
generate-cmdlist.sh: replace "grep' invocation with a shell version
Replace the "grep" we run to exclude certain programs from the
generated output with a pure-shell loop that strips out the comments,
and sees if the "cmd" we're reading is on a list of excluded
programs. This uses a trick similar to test_have_prereq() in
test-lib-functions.sh.
On my *nix system this makes things quite a bit slower compared to
HEAD~:
o
'sh generate-cmdlist.sh.old command-list.txt' ran
1.56 ± 0.11 times faster than 'sh generate-cmdlist.sh command-list.txt'
18.00 ± 0.19 times faster than 'sh generate-cmdlist.sh.master command-list.txt'
But when I tried running generate-cmdlist.sh 100 times in CI I found
that it helped across the board even on OSX & Linux. I tried testing
it in CI with this ad-hoc few-liner:
for i in $(seq -w 0 11 | sort -nr)
do
git show HEAD~$i:generate-cmdlist.sh >generate-cmdlist-HEAD$i.sh &&
git add generate-cmdlist* &&
cp t/t0000-generate-cmdlist.sh t/t00$i-generate-cmdlist.sh || : &&
perl -pi -e "s/HEAD0/HEAD$i/g" t/t00$i-generate-cmdlist.sh &&
git add t/t00*.sh
done && git commit -m"generated it"
Here HEAD~02 and the t0002* file refers to this change, and HEAD~03
and t0003* file to the preceding commit, the relevant results were:
linux-gcc:
[12:05:33] t0002-generate-cmdlist.sh .. ok 14 ms ( 0.00 usr 0.00 sys + 3.64 cusr 3.09 csys = 6.73 CPU)
[12:05:30] t0003-generate-cmdlist.sh .. ok 32 ms ( 0.00 usr 0.00 sys + 2.66 cusr 1.81 csys = 4.47 CPU)
osx-gcc:
[11:58:04] t0002-generate-cmdlist.sh .. ok 80081 ms ( 0.02 usr 0.02 sys + 17.80 cusr 10.07 csys = 27.91 CPU)
[11:58:16] t0003-generate-cmdlist.sh .. ok 92127 ms ( 0.02 usr 0.01 sys + 22.54 cusr 14.27 csys = 36.84 CPU)
vs-test:
[12:03:14] t0002-generate-cmdlist.sh .. ok 30 s ( 0.02 usr 0.00 sys + 13.14 cusr 26.19 csys = 39.35 CPU)
[12:03:20] t0003-generate-cmdlist.sh .. ok 32 s ( 0.00 usr 0.02 sys + 13.25 cusr 26.10 csys = 39.37 CPU)
I.e. even on *nix running 100 of these in a loop was up to ~2x faster
in absolute runtime, I suspect it's due factors that are exacerbated
in the CI, e.g. much slower process startup due to some platform
limits, or a slower FS.
The "cut -d" change here is because we're not emitting the
40-character aligned output anymore, i.e. we'll get the output from
command_list() now, not an as-is line from command-list.txt.
This also makes the parsing more reliable, as we could tweak the
whitespace alignment without breaking this parser. Let's reword a
now-inaccurate comment in "command-list.txt" describing that previous
alignment limitation. We'll still need the "### command-list [...]"
line due to the "Documentation/cmd-list.perl" logic added in 11c6659d85d (command-list: prepare machinery for upcoming "common
groups" section, 2015-05-21).
There was a proposed change subsequent to this one[3] which continued
moving more logic into the "command_list() function, i.e. replaced the
"cut | tr | grep" chain in "category_list()" with an argument to
"command_list()".
That change might have had a bit of an effect, but not as much as the
preceding commit, so I decided to drop it. The relevant performance
numbers from it were:
linux-gcc:
[12:05:33] t0001-generate-cmdlist.sh .. ok 13 ms ( 0.00 usr 0.00 sys + 3.33 cusr 2.78 csys = 6.11 CPU)
[12:05:33] t0002-generate-cmdlist.sh .. ok 14 ms ( 0.00 usr 0.00 sys + 3.64 cusr 3.09 csys = 6.73 CPU)
osx-gcc:
[11:58:03] t0001-generate-cmdlist.sh .. ok 78416 ms ( 0.02 usr 0.01 sys + 11.78 cusr 6.22 csys = 18.03 CPU)
[11:58:04] t0002-generate-cmdlist.sh .. ok 80081 ms ( 0.02 usr 0.02 sys + 17.80 cusr 10.07 csys = 27.91 CPU)
vs-test:
[12:03:20] t0001-generate-cmdlist.sh .. ok 34 s ( 0.00 usr 0.03 sys + 12.42 cusr 19.55 csys = 32.00 CPU)
[12:03:14] t0002-generate-cmdlist.sh .. ok 30 s ( 0.02 usr 0.00 sys + 13.14 cusr 26.19 csys = 39.35 CPU)
As above HEAD~2 and t0002* are testing the code in this commit (and
the line is the same), but HEAD~1 and t0001* are testing that dropped
change in [3].
Jeff King [Fri, 5 Nov 2021 14:08:06 +0000 (15:08 +0100)]
generate-cmdlist.sh: do not shell out to "sed"
Replace the "sed" invocation in get_synopsis() with a pure-shell
version. This speeds up generate-cmdlist.sh significantly. Compared to
HEAD~ (old) and "master" we are, according to hyperfine(1):
'sh generate-cmdlist.sh command-list.txt' ran
12.69 ± 5.01 times faster than 'sh generate-cmdlist.sh.old command-list.txt'
18.34 ± 3.03 times faster than 'sh generate-cmdlist.sh.master command-list.txt'
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a preceding commit we changed the print_command_list() loop to use
printf's auto-repeat feature. Let's now get rid of get_category_line()
entirely by not sorting the categories.
This will change the output of the generated code from e.g.:
- { "git-apply", N_("Apply a patch to files and/or to the index"), 0 | CAT_complete | CAT_plumbingmanipulators },
To:
+ { "git-apply", N_("Apply a patch to files and/or to the index"), 0 | CAT_plumbingmanipulators | CAT_complete },
I.e. the categories are no longer sorted, but as they're OR'd together
it won't matter for the end result.
This speeds up the generate-cmdlist.sh a bit. Comparing HEAD~ (old)
and "master" to this code:
'sh generate-cmdlist.sh command-list.txt' ran
1.07 ± 0.33 times faster than 'sh generate-cmdlist.sh.old command-list.txt'
1.15 ± 0.36 times faster than 'sh generate-cmdlist.sh.master command-list.txt'
Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Johannes Sixt [Fri, 5 Nov 2021 14:08:04 +0000 (15:08 +0100)]
generate-cmdlist.sh: replace for loop by printf's auto-repeat feature
This is just a small code reduction. There is a small probability that
the new code breaks when the category list is empty. But that would be
noticed during the compile step.
Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
generate-cmdlist.sh: don't call get_categories() from category_list()
This isn't for optimization as the get_categories() is a purely shell
function, but rather for ease of readability, let's just inline these
two lines. We'll be changing this code some more in subsequent commits
to make this worth it.
Rename the get_categories() function to get_category_line(), since
that's what it's doing now.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Johannes Sixt [Fri, 5 Nov 2021 14:08:01 +0000 (15:08 +0100)]
generate-cmdlist.sh: spawn fewer processes
The function get_categories() is invoked in a loop over all commands.
As it runs several processes, this takes an awful lot of time on
Windows. To reduce the number of processes, move the process that
filters empty lines to the other invoker of the function, where it is
needed. The invocation of get_categories() in the loop does not need
the empty line filtered away because the result is word-split by the
shell, which eliminates the empty line automatically.
Furthermore, use sort -u instead of sort | uniq to remove yet another
process.
[Ævar: on Linux this seems to speed things up a bit, although with
hyperfine(1) the results are fuzzy enough to land within the
confidence interval]:
$ git show HEAD~:generate-cmdlist.sh >generate-cmdlist.sh.old
$ hyperfine --warmup 1 -L s ,.old -p 'make clean' 'sh generate-cmdlist.sh{s} command-list.txt'
Benchmark #1: sh generate-cmdlist.sh command-list.txt
Time (mean ± σ): 371.3 ms ± 64.2 ms [User: 430.4 ms, System: 72.5 ms]
Range (min … max): 320.5 ms … 517.7 ms 10 runs
Benchmark #2: sh generate-cmdlist.sh.old command-list.txt
Time (mean ± σ): 489.9 ms ± 185.4 ms [User: 724.7 ms, System: 141.3 ms]
Range (min … max): 346.0 ms … 885.3 ms 10 runs
Summary
'sh generate-cmdlist.sh command-list.txt' ran
1.32 ± 0.55 times faster than 'sh generate-cmdlist.sh.old command-list.txt'
Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add " " before a "|" at the end of a line in generate-cmdlist.sh for
consistency with other code in the file. Some of the surrounding code
will be modified in subsequent commits.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a subsequent commit I'll change generate-cmdlist.sh to use C sort
order, and without this change we'd be led to believe that that change
caused a meaningful change in the output, so let's do this as a
separate step, right now the generate-cmdlist.sh script just uses the
order found in this file.
Note that this refers to the sort order of the lines in
command-list.txt, a subsequent commit will also change how we treat
the sort order of the "category" fields, but that's unrelated to this
change.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 5 Nov 2021 09:01:31 +0000 (05:01 -0400)]
test_bitmap_hashes(): handle repository without bitmaps
If prepare_bitmap_git() returns NULL (one easy-to-trigger cause being
that the repository does not have bitmaps at all), then we'll segfault
accessing bitmap_git->hashes:
We should treat this the same as a repository with bitmaps but no
name-hashes, and quietly produce an empty output. The later call to
free_bitmap_index() in the cleanup label is OK, as it treats a NULL
pointer as a noop.
This isn't a big deal in practice, as this function is intended for and
used only by test-tool. It's probably worth fixing to avoid confusion,
but not worth adding coverage for this to the test suite.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>