Junio C Hamano [Tue, 7 Jun 2022 21:10:56 +0000 (14:10 -0700)]
Merge branch 'ab/plug-leak-in-revisions'
Plug the memory leaks from the trickiest API of all, the revision
walker.
* ab/plug-leak-in-revisions: (27 commits)
revisions API: add a TODO for diff_free(&revs->diffopt)
revisions API: have release_revisions() release "topo_walk_info"
revisions API: have release_revisions() release "date_mode"
revisions API: call diff_free(&revs->pruning) in revisions_release()
revisions API: release "reflog_info" in release revisions()
revisions API: clear "boundary_commits" in release_revisions()
revisions API: have release_revisions() release "prune_data"
revisions API: have release_revisions() release "grep_filter"
revisions API: have release_revisions() release "filter"
revisions API: have release_revisions() release "cmdline"
revisions API: have release_revisions() release "mailmap"
revisions API: have release_revisions() release "commits"
revisions API users: use release_revisions() for "prune_data" users
revisions API users: use release_revisions() with UNLEAK()
revisions API users: use release_revisions() in builtin/log.c
revisions API users: use release_revisions() in http-push.c
revisions API users: add "goto cleanup" for release_revisions()
stash: always have the owner of "stash_info" free it
revisions API users: use release_revisions() needing REV_INFO_INIT
revision.[ch]: document and move code declared around "init"
...
Josh Steadmon [Tue, 7 Jun 2022 18:21:57 +0000 (11:21 -0700)]
run-command: don't spam trace2_child_exit()
In rare cases[1], wait_or_whine() cannot determine a child process's
status (and will return -1 in this case). This can cause Git to issue
trace2 child_exit events despite the fact that the child may still be
running. In pathological cases, we've seen > 80 million exit events in
our trace logs for a single child process.
Fix this by only issuing trace2 events in finish_command_in_signal() if
we get a value other than -1 from wait_or_whine(). This can lead to
missing child_exit events in such a case, but that is preferable to
duplicating events on a scale that threatens to fill the user's
filesystem with invalid trace logs.
[1]: This can happen when:
* waitpid() returns -1 and errno != EINTR
* waitpid() returns an invalid PID
* the status set by waitpid() has neither the WIFEXITED() nor
WIFSIGNALED() flags
Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
hook API: fix v2.36.0 regression: hooks should be connected to a TTY
Fix a regression reported[1] against f443246b9f2 (commit: convert
{pre-commit,prepare-commit-msg} hook to hook.h, 2021-12-22): Due to
using the run_process_parallel() API in the earlier 96e7225b310 (hook:
add 'run' subcommand, 2021-12-22) we'd capture the hook's stderr and
stdout, and thus lose the connection to the TTY in the case of
e.g. the "pre-commit" hook.
As a preceding commit notes GNU parallel's similar --ungroup option
also has it emit output faster. While we're unlikely to have hooks
that emit truly massive amounts of output (or where the performance
thereof matters) it's still informative to measure the overhead. In a
similar "seq" test we're now ~30% faster:
seq 100000000
Benchmark 1: ./git hook run seq-hook' in 'origin/master
Time (mean ± σ): 787.1 ms ± 13.6 ms [User: 701.6 ms, System: 534.4 ms]
Range (min … max): 773.2 ms … 806.3 ms 10 runs
Benchmark 2: ./git hook run seq-hook' in 'HEAD~0
Time (mean ± σ): 603.4 ms ± 1.6 ms [User: 573.1 ms, System: 30.3 ms]
Range (min … max): 601.0 ms … 606.2 ms 10 runs
Summary
'./git hook run seq-hook' in 'HEAD~0' ran
1.30 ± 0.02 times faster than './git hook run seq-hook' in 'origin/master'
Reported-by: Anthony Sottile <asottile@umich.edu> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
[jc: minor fix-up to tests for consistency] Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a bug in fd3cb0501e1 (remote: move static variables into
per-repository struct, 2021-11-17) where we'd free(remote->pushurl[i])
after having NULL'd out remote->pushurl. itself. We free
"remote->pushurl" in the next "for"-loop, so doing this appears to
have been a copy/paste error.
Before this change GCC 12's -fanalyzer would correctly note that we'd
dereference NULL in this case, this change fixes that:
remote.c: In function ‘remote_clear’:
remote.c:153:17: error: dereference of NULL ‘*remote.pushurl’ [CWE-476] [-Werror=analyzer-null-dereference]
153 | free((char *)remote->pushurl[i]);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[...]
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
remote.c: remove braces from one-statement "for"-loops
Remove braces that don't follow the CodingGuidelines from code added
in fd3cb0501e1 (remote: move static variables into per-repository
struct, 2021-11-17). A subsequent commit will edit code adjacent to
this.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
run-command: add an "ungroup" option to run_process_parallel()
Extend the parallel execution API added in c553c72eed6 (run-command:
add an asynchronous parallel child processor, 2015-12-15) to support a
mode where the stdout and stderr of the processes isn't captured and
output in a deterministic order, instead we'll leave it to the kernel
and stdio to sort it out.
This gives the API same functionality as GNU parallel's --ungroup
option. As we'll see in a subsequent commit the main reason to want
this is to support stdout and stderr being connected to the TTY in the
case of jobs=1, demonstrated here with GNU parallel:
Another is as GNU parallel's documentation notes a potential for
optimization. As demonstrated in next commit our results with "git
hook run" will be similar, but generally speaking this shows that if
you want to run processes in parallel where the exact order isn't
important this can be a lot faster:
$ hyperfine -r 3 -L o ,--ungroup 'parallel {o} seq ::: 10000000 >/dev/null '
Benchmark 1: parallel seq ::: 10000000 >/dev/null
Time (mean ± σ): 220.2 ms ± 9.3 ms [User: 124.9 ms, System: 96.1 ms]
Range (min … max): 212.3 ms … 230.5 ms 3 runs
Benchmark 2: parallel --ungroup seq ::: 10000000 >/dev/null
Time (mean ± σ): 154.7 ms ± 0.9 ms [User: 136.2 ms, System: 25.1 ms]
Range (min … max): 153.9 ms … 155.7 ms 3 runs
Summary
'parallel --ungroup seq ::: 10000000 >/dev/null ' ran
1.42 ± 0.06 times faster than 'parallel seq ::: 10000000 >/dev/null '
A large part of the juggling in the API is to make the API safer for
its maintenance and consumers alike.
For the maintenance of the API we e.g. avoid malloc()-ing the
"pp->pfd", ensuring that SANITIZE=address and other similar tools will
catch any unexpected misuse.
For API consumers we take pains to never pass the non-NULL "out"
buffer to an API user that provided the "ungroup" option. The
resulting code in t/helper/test-run-command.c isn't typical of such a
user, i.e. they'd typically use one mode or the other, and would know
whether they'd provided "ungroup" or not.
We could also avoid the strbuf_init() for "buffered_output" by having
"struct parallel_processes" use a static PARALLEL_PROCESSES_INIT
initializer, but let's leave that cleanup for later.
Using a global "run_processes_parallel_ungroup" variable to enable
this option is rather nasty, but is being done here to produce as
minimal of a change as possible for a subsequent regression fix. This
change is extracted from a larger initial version[1] which ends up
with a better end-state for the API, but in doing so needed to modify
all existing callers of the API. Let's defer that for now, and
narrowly focus on what we need for fixing the regression in the
subsequent commit.
It's safe to do this with a global variable because:
A) hook.c is the only user of it that sets it to non-zero, and before
we'll get any other API users we'll refactor away this method of
passing in the option, i.e. re-roll [1].
B) Even if hook.c wasn't the only user we don't have callers of this
API that concurrently invoke this parallel process starting API
itself in parallel.
As noted above "A" && "B" are rather nasty, and we don't want to live
with those caveats long-term, but for now they should be an acceptable
compromise.
Son Luong Ngoc [Tue, 7 Jun 2022 11:14:19 +0000 (13:14 +0200)]
fsmonitor: query watchman with right valid json
In rare circumstances where the current git index does not carry the
last_update_token, the fsmonitor v2 hook will be invoked with an
empty string which would caused the final rendered json to be invalid.
Which will left user with the following error message
> git status
failed to parse command from stdin: line 2, column 13, position 67: unexpected token near ','
Watchman: command returned no output.
Falling back to scanning...
Hide the "since" field in json query when "last_update_token" is empty.
Co-authored-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Son Luong Ngoc <sluongng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Joakim Petersen [Tue, 7 Jun 2022 11:50:24 +0000 (13:50 +0200)]
git-prompt: make colourization consistent
The short upstream state indicator inherits the colour of the last short
state indicator before it (if there is one), and the sparsity state
indicator inherits this colour as well. This behaviour was introduced by 0ec7c23cdc6 (git-prompt: make upstream state indicator location
consistent, 2022-02-27), while before this change the aforementioned
indicators were white/the default text colour. Some examples to
illustrate this behaviour (assuming all indicators are enabled and
colourization is on):
* If there is something in the stash, both the '$' and the short
upstream state indicator following it will be blue.
* If the local tree has new, untracked files and there is nothing in
the stash, both the '%' and the short upstream state indicator
will be red.
* If all local changes are added to the index and the stash is empty,
both the '+' and the short upstream state indicator following it will
be green.
* If the local tree is clean and there is nothing in the stash, the
short upstream state indicator will be white/${default text colour}.
This appears to be an unintended side-effect of the change, and makes
little sense semantically (e.g. why is it bad to be in sync with
upstream when you have uncommitted local changes?). The cause of the
change in colourization is that previously, the short upstream state
indicator appeared immediately after the rebase/revert/bisect/merge
state indicator (note the position of $p in $gitstring):
local f="$h$w$i$s$u"
local gitstring="$c$b${f:+$z$f}${sparse}$r$p"
Said indicator is prepended with the clear colour code, and the short
upstream state indicator is thus also uncoloured. Now, the short
upstream state indicator follows the sequence of colourized indicators,
without any clearing of colour (again note the position of $p, now in
$f):
local f="$h$w$i$s$u$p"
local gitstring="$c$b${f:+$z$f}${sparse}$r${upstream}"
If the user is in a sparse checkout, the sparsity state indicator
follows a similar pattern to the short upstream state indicator.
However, clearing colour of the colourized indicators changes how the
sparsity state indicator is colourized, as it currently inherits (and
before the change referenced also inherited) the colour of the last
short state indicator before it. Reading the commit message of the
change that introduced the sparsity state indicator, afda36dbf3b
(git-prompt: include sparsity state as well, 2020-06-21), it appears
this colourization also was unintended, so clearing the colour for said
indicator further increases consistency.
Make the colourization of these state indicators consistent by making
all colourized indicators clear their own colour. Make colouring of $c
dependent on it not being empty, as it is no longer being used to colour
the branch name. Move clearing of $b's prefix to before colourization so
it gets cleared properly when colour codes are inserted into it. These
changes make changing the layout of the prompt less prone to unintended
colour changes in the future.
Change coloured Bash prompt tests to reflect the colourization changes:
* Move the colour codes to wrap the expected content of the expanded
$__git_ps1_branch_name in all tests.
* Insert a clear-colour code after the symbol for the first indicator
in "prompt - bash color pc mode - dirty status indicator - dirty
index and worktree", to reflect that all indicators should clear
their own colour.
Signed-off-by: Joakim Petersen <joak-pet@online.no> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Philippe Blain [Mon, 6 Jun 2022 20:59:13 +0000 (20:59 +0000)]
range-diff: show submodule changes irrespective of diff.submodule
After generating diffs for each range to be compared using a 'git log'
invocation, range-diff.c::read_patches looks for the "diff --git" header
in those diffs to recognize the beginning of a new change.
In a project with submodules, and with 'diff.submodule=log' set in the
config, this header is missing for the diff of a changed submodule, so
any submodule changes are quietly ignored in the range-diff.
When 'diff.submodule=diff' is set in the config, the "diff --git" header
is also missing for the submodule itself, but is shown for submodule
content changes, which can easily confuse 'git range-diff' and lead to
errors such as:
error: git apply: bad git-diff - inconsistent old filename on line 1
error: could not parse git header 'diff --git path/to/submodule/and/some/file/within
'
error: could not parse log for '@{u}..@{1}'
Force the submodule diff format to its default ("short") when invoking
'git log' to generate the patches for each range, such that submodule
changes are always detected.
Add a test, including an invocation with '--creation-factor=100' to
force the second commit in the range not to be considered a complete
rewrite, in order to verify we do indeed get the "short" format.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
tests: don't assume a .git/info for .git/info/sparse-checkout
Change those tests that assumed that a .git/info directory would be
created for them when writing .git/info/sparse-checkout to explicitly
create the directory by setting "TEST_CREATE_REPO_NO_TEMPLATE=1"
before sourcing test-lib.sh, and using the "--template=" argument to
"git clone".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
tests: don't assume a .git/info for .git/info/exclude
Change those tests that assumed that a .git/info directory would be
created for them when writing .git/info/exclude to explicitly create
the directory by setting "TEST_CREATE_REPO_NO_TEMPLATE=1" before
sourcing test-lib.sh, and using the "--template=" argument to "git
clone" and "git init".
In the case of ".git/modules/sub1/info" we deviate from the
established pattern in this and preceding commits of passing a
"--template=" and doing a "mkdir .git/info".
In that case "git checkout" will run the "submodule--helper clone",
and both e.g. "git submodule update --init" and "git checkout" do not
have a way to pass down options to the eventual "git init" or "git
clone". Let's instead assume that the submodule was populated with our
default templates, remove them, and then run the "mkdir".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
tests: don't assume a .git/info for .git/info/refs
Change those tests that assumed that a .git/info directory would be
created for them when writing .git/info/refs to explicitly create the
directory by using the "--template=" argument to "git init".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
tests: don't assume a .git/info for .git/info/attributes
Change those tests that assumed that a .git/info directory would be
created for them when writing .git/info/attributes to explicitly
create the directory by setting "TEST_CREATE_REPO_NO_TEMPLATE=1"
before sourcing test-lib.sh, and using the "--template=" argument to
"git clone".
The change here in here in t7814-grep-recurse-submodules.sh would
continue "succeeding" with only the "TEST_CREATE_REPO_NO_TEMPLATE=1"
part of this change. That's because those tests use
"test_expect_failure", so they'd "pass" without this change, as
"test_expect_failure" by design isn't discerning about what failure
conditions it'll accept.
But as we're fixing these sorts of issues across the test suite let's
fix this one too. This issue was spotted with a local merge with
another topic of mine[1], which introduces a stricter alternative to
"test_expect_failure".
tests: don't assume a .git/info for .git/info/grafts
Change those tests that assumed that a .git/info directory would be
created for them when writing .git/info/grafts to explicitly create
the directory.
Do this using the new "TEST_CREATE_REPO_NO_TEMPLATE" facility, and use
"mkdir" instead of "mkdir -p" to assert that we don't have the
.git/info already. An exception to this is the "with grafts" test in
"t6001-rev-list-graft.sh". There we're modifying our ".git" state in a
for-loop, in lieu of refactoring that more extensively let's use
"mkdir -p" there.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
tests: don't depend on template-created .git/branches
As noted in c8a58ac5a52 (Revert "Don't create the $GIT_DIR/branches
directory on init", 2009-10-31) there was an attempt long ago in 0cc5691a8b0 (Don't create the $GIT_DIR/branches directory on init,
2009-10-30) to get rid of the legacy "branches" directory.
We should probably get rid of its creation by removing the
"templates/branches--" file. But whatever our default behavior, our
tests should be tightened up to explicitly create the .git/branches
directory if they rely on our default templates, to make the
dependency on those templates clear.
So let's amend the two tests that would fail if .git/branches wasn't
created. To do this introduce a new "TEST_CREATE_REPO_NO_TEMPLATE"
variable, which we'll set before sourcing test-lib.sh, and change the
"git clone" and "git init" commands in the tests themselves to
explicitly pass "--template=".
This way they won't get a .git/branches in either their top-level
.git, or in the ones they create. We can then amend the tests that
rely on the ".git/branches" directory existing to create it
explicitly, and to remove it after its creation.
This new "TEST_CREATE_REPO_NO_TEMPLATE" variable is a less
heavy-handed version of the "NO_SET_GIT_TEMPLATE_DIR" variable. See a94d305bf80 (t/t0001-init.sh: add test for 'init with init.templatedir
set', 2010-02-26) for its implementation.
Unlike "TEST_CREATE_REPO_NO_TEMPLATE", this new
"TEST_CREATE_REPO_NO_TEMPLATE" variable is narrowly scoped to what the
"git init" in test-lib.sh does, as opposed to the global effect of
"NO_SET_GIT_TEMPLATE_DIR" and the setting of "GIT_TEMPLATE_DIR" in
wrap-for-bin.sh.
I experimented with adding a new "GIT_WRAP_FOR_BIN_VIA_TEST_LIB"
variable set in test-lib.sh, which would cause wrap-for-bin.sh to not
set GIT_TEMPLATE_DIR, GITPERLLIB etc, as we set those in
test-lib.sh. I think that's a viable approach, but it would interact
e.g. with the appending feature of GITPERLLIB added in 8bade1e12e2 (wrap-for-bin: make bin-wrappers chainable, 2013-07-04).
Doing so would allow us to convert the tests in t0001-init.sh that now
use "NO_SET_GIT_TEMPLATE_DIR" to simply unset "GIT_TEMPLATE_DIR" in a
sub-shell before invoking "git init" or "git clone". I think that
approach is worth pursuing, but let's table it for now. Some future
wrap-for-bin.sh refactoring can try to address it.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change a test added in 368aa52952d (add git-check-ignore sub-command,
2013-01-06) to clobber .git/info/exclude rather than append to
it.
These tests would break if the "templates/info--exclude" file added in d3af621b147 (Redo the templates generation and installation.,
2005-08-06) wasn't exactly 6 lines (of only comments).
Let's instead clobber the default .git/info/excludes file, and test
only our own expected content. This is not strictly needed for
anything in this series, but is a good cleanup while we're at it.
As discussed in the preceding commit a lot of things depend on the
"info" directory being created, but this was the only test that relied
on the specific content in the "templates/info--exclude" file.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jonathan Tan [Mon, 6 Jun 2022 17:54:37 +0000 (10:54 -0700)]
commit,shallow: unparse commits if grafts changed
When a commit is parsed, it pretends to have a different (possibly
empty) list of parents if there is graft information for that commit.
But there is a bug that could occur when a commit is parsed, the graft
information is updated (for example, when a shallow file is rewritten),
and the same commit is subsequently used: the parents of the commit do
not conform to the updated graft information, but the information at the
time of parsing.
This is usually not an issue, as a commit is usually introduced into the
repository at the same time as its graft information. That means that
when we try to parse that commit, we already have its graft information.
But it is an issue when fetching a shallow point directly into a
repository with submodules. The function
assign_shallow_commits_to_refs() parses all sought objects (including
the shallow point, which we are directly fetching). In update_shallow()
in fetch-pack.c, assign_shallow_commits_to_refs() is called before
commit_shallow_file(), which means that the shallow point would have
been parsed before graft information is updated. Once a commit is
parsed, it is no longer sensitive to any graft information updates. This
parsed commit is subsequently used when we do a revision walk to search
for submodules to fetch, meaning that the commit is considered to have
parents even though it is a shallow point (and therefore should be
treated as having no parents).
Therefore, whenever graft information is updated, mark the commits that
were previously grafts and the commits that are newly grafts as
unparsed.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Philip Oakley [Sat, 4 Jun 2022 11:17:48 +0000 (11:17 +0000)]
rebase: note `preserve` merges may be a pull config option
The `--preserve-merges` option was removed by v2.34.0. However
users may not be aware that it is also a Pull configuration option,
which is still offered by major IDE vendors such as Visual Studio.
Extend the `--preserve-merges` die message to also direct users to
the possible use of the `preserve` option in the `pull.rebase` config.
This is an additional 'belt and braces' information statement.
Signed-off-by: Philip Oakley <philipoakley@iee.email> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Philip Oakley [Sat, 4 Jun 2022 11:17:47 +0000 (11:17 +0000)]
rebase: help users when dying with `preserve-merges`
Git would die if a "rebase --preserve-merges" was in progress.
Users could neither --quit, --abort, nor --continue the rebase.
Make the `rebase --abort` option available to allow users to remove
traces of any preserve-merges rebase, even if they had upgraded
during a rebase.
One trigger case was an unexpectedly difficult to resolve conflict, as
reported on the `git-users` group.
(https://groups.google.com/g/git-for-windows/c/3jMWbBlXXHM)
Other potential use-cases include git-experts using the portable
'Git on a stick' to help users with an older git version.
Signed-off-by: Philip Oakley <philipoakley@iee.email> Signed-off-by: Junio C Hamano <gitster@pobox.com>
ZheNing Hu [Sun, 5 Jun 2022 13:37:28 +0000 (13:37 +0000)]
read-cache.c: reduce unnecessary cache entry name copying
575fa8a3 (read-cache: read data in a hash-independent way,
2019-02-19) added a new code to copy from the on-disk data into the
name member of the in-core cache entry, which is already done
immediately after that in a way that takes prefix-compression into
account.
Remove this code, as it is not just unnecessary, but also can be
reading beyond the on-disk data, when we are copying very long
prefix string from the previous entry.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
[jc: rewrote the log message with Réne's findings] Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Fri, 3 Jun 2022 21:55:36 +0000 (17:55 -0400)]
builtin/show-ref.c: avoid over-iterating with --heads, --tags
When `show-ref` is combined with the `--heads` or `--tags` options, it
can avoid iterating parts of a repository's references that it doesn't
care about.
But it doesn't take advantage of this potential optimization. When this
command was introduced back in 358ddb62cf (Add "git show-ref" builtin
command, 2006-09-15), `for_each_ref_in()` did exist. But since most
repositories don't have many (any?) references that aren't branches or
tags already, this makes little difference in practice.
Though for repositories with a large imbalance of branches and tags (or,
more likely in the case of server operators, many hidden references),
this can make quite a difference. Take, for example, a repository with
500,000 "hidden" references (all of the form "refs/__hidden__/N"), and
a single branch:
Outputting the existence of that single branch currently takes on the
order of ~50ms on my machine. The vast majority of this time is wasted
iterating through references that we know we're going to discard.
Instead, teach `show-ref` that it can iterate just "refs/heads" and/or
"refs/tags" when given `--heads` and/or `--tags`, respectively. A few
small interesting things to note:
- When given either option, we can avoid the general-purpose
for_each_ref() call altogether, since we know that it won't give us
any references that we wouldn't filter out already.
- We can make two separate calls to `for_each_fullref_in()` (and
avoid, say, the more specialized `for_each_fullref_in_prefixes()`,
since we know that the set of references enumerated by each is
disjoint, so we'll never see the same reference appear in both
calls.
- We have to use the "fullref" variant (instead of just
`for_each_branch_ref()` and `for_each_tag_ref()`), since we expect
fully-qualified reference names to appear in `show-ref`'s output.
When either of `heads_only` or `tags_only` is set, we can eliminate the
strcmp() calls in `builtin/show-ref.c::show_ref()` altogether, since we
know that `show_ref()` will never see a non-branch or tag reference.
Unfortunately, we can't use `for_each_fullref_in_prefixes()` to enhance
`show-ref`'s pattern matching, since `show-ref` patterns match on the
_suffix_ (e.g., the pattern "foo" shows "refs/heads/foo",
"refs/tags/foo", and etc, not "foo/*").
Nonetheless, in our synthetic example above, this provides a significant
speed-up ("git" is roughly v2.36, "git.compile" is this patch):
$ hyperfine -N 'git show-ref --heads' 'git.compile show-ref --heads'
Benchmark 1: git show-ref --heads
Time (mean ± σ): 49.9 ms ± 6.2 ms [User: 45.6 ms, System: 4.1 ms]
Range (min … max): 46.1 ms … 73.6 ms 43 runs
Benchmark 2: git.compile show-ref --heads
Time (mean ± σ): 2.8 ms ± 0.4 ms [User: 1.4 ms, System: 1.2 ms]
Range (min … max): 1.3 ms … 5.6 ms 957 runs
Summary
'git.compile show-ref --heads' ran
18.03 ± 3.38 times faster than 'git show-ref --heads'
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Mon, 6 Jun 2022 14:36:16 +0000 (14:36 +0000)]
remote: create fetch.credentialsInUrl config
Users sometimes provide a "username:password" combination in their
plaintext URLs. Since Git stores these URLs in plaintext in the
.git/config file, this is a very insecure way of storing these
credentials. Credential managers are a more secure way of storing this
information.
System administrators might want to prevent this kind of use by users on
their machines.
Create a new "fetch.credentialsInUrl" config option and teach Git to
warn or die when seeing a URL with this kind of information. The warning
anonymizes the sensitive information of the URL to be clear about the
issue.
This change currently defaults the behavior to "allow" which does
nothing with these URLs. We can consider changing this behavior to
"warn" by default if we wish. At that time, we may want to add some
advice about setting fetch.credentialsInUrl=ignore for users who still
want to follow this pattern (and not receive the warning).
An earlier version of this change injected the logic into
url_normalize() in urlmatch.c. While most code paths that parse URLs
eventually normalize the URL, that normalization does not happen early
enough in the stack to avoid attempting connections to the URL first. By
inserting a check into the remote validation, we identify the issue
before making a connection. In the old code path, this was revealed by
testing the new t5601-clone.sh test under --stress, resulting in an
instance where the return code was 13 (SIGPIPE) instead of 128 from the
die().
However, we can reuse the parsing information from url_normalize() in
order to benefit from its well-worn parsing logic. We can use the struct
url_info that is created in that method to replace the password with
"<redacted>" in our error messages. This comes with a slight downside
that the normalized URL might look slightly different from the input URL
(for instance, the normalized version adds a closing slash). This should
not hinder users figuring out what the problem is and being able to fix
the issue.
As an attempt to ensure the parsing logic did not catch any
unintentional cases, I modified this change locally to to use the "die"
option by default. Running the test suite succeeds except for the
explicit username:password URLs used in t5550-http-fetch-dumb.sh and
t5541-http-push-smart.sh. This means that all other tested URLs did not
trigger this logic.
The tests show that the proper error messages appear (or do not
appear), but also count the number of error messages. When only warning,
each process validates the remote URL and outputs a warning. This
happens twice for clone, three times for fetch, and once for push.
Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Fri, 3 Jun 2022 21:30:36 +0000 (14:30 -0700)]
Merge branch 'kl/setup-in-unreadable-worktree'
Disable the "do not remove the directory the user started Git in"
logic when Git cannot tell where that directory is. Earlier we
refused to run in such a case.
* kl/setup-in-unreadable-worktree:
setup: don't die if realpath(3) fails on getcwd(3)
Junio C Hamano [Fri, 3 Jun 2022 21:30:36 +0000 (14:30 -0700)]
Merge branch 'jx/l10n-workflow-change'
A workflow change for translators are being proposed.
* jx/l10n-workflow-change:
l10n: Document the new l10n workflow
Makefile: add "po-init" rule to initialize po/XX.po
Makefile: add "po-update" rule to update po/XX.po
po/git.pot: don't check in result of "make pot"
po/git.pot: this is now a generated file
Makefile: remove duplicate and unwanted files in FOUND_SOURCE_FILES
i18n CI: stop allowing non-ASCII source messages in po/git.pot
Makefile: have "make pot" not "reset --hard"
Makefile: generate "po/git.pot" from stable LOCALIZED_C
Makefile: sort source files before feeding to xgettext
Junio C Hamano [Fri, 3 Jun 2022 21:30:36 +0000 (14:30 -0700)]
Merge branch 'tb/geom-repack-with-keep-and-max'
Teach "git repack --geometric" work better with "--keep-pack" and
avoid corrupting the repository when packsize limit is used.
* tb/geom-repack-with-keep-and-max:
builtin/repack.c: ensure that `names` is sorted
t7703: demonstrate object corruption with pack.packSizeLimit
repack: respect --keep-pack with geometric repack
Junio C Hamano [Fri, 3 Jun 2022 21:30:34 +0000 (14:30 -0700)]
Merge branch 'ds/bundle-uri'
Preliminary code refactoring around transport and bundle code.
* ds/bundle-uri:
bundle.h: make "fd" version of read_bundle_header() public
remote: allow relative_url() to return an absolute url
remote: move relative_url()
http: make http_get_file() external
fetch-pack: move --keep=* option filling to a function
fetch-pack: add a deref_without_lazy_fetch_extended()
dir API: add a generalized path_match_flags() function
connect.c: refactor sending of agent & object-format
Junio C Hamano [Fri, 3 Jun 2022 21:30:34 +0000 (14:30 -0700)]
Merge branch 'ns/batch-fsync'
Introduce a filesystem-dependent mechanism to optimize the way the
bits for many loose object files are ensured to hit the disk
platter.
* ns/batch-fsync:
core.fsyncmethod: performance tests for batch mode
t/perf: add iteration setup mechanism to perf-lib
core.fsyncmethod: tests for batch mode
test-lib-functions: add parsing helpers for ls-files and ls-tree
core.fsync: use batch mode and sync loose objects by default on Windows
unpack-objects: use the bulk-checkin infrastructure
update-index: use the bulk-checkin infrastructure
builtin/add: add ODB transaction around add_files_to_cache
cache-tree: use ODB transaction around writing a tree
core.fsyncmethod: batched disk flushes for loose-objects
bulk-checkin: rebrand plug/unplug APIs as 'odb transactions'
bulk-checkin: rename 'state' variable and separate 'plugged' boolean
Junio C Hamano [Fri, 3 Jun 2022 21:30:33 +0000 (14:30 -0700)]
Merge branch 'en/sparse-cone-becomes-default'
Deprecate non-cone mode of the sparse-checkout feature.
* en/sparse-cone-becomes-default:
Documentation: some sparsity wording clarifications
git-sparse-checkout.txt: mark non-cone mode as deprecated
git-sparse-checkout.txt: flesh out pattern set sections a bit
git-sparse-checkout.txt: add a new EXAMPLES section
git-sparse-checkout.txt: shuffle some sections and mark as internal
git-sparse-checkout.txt: update docs for deprecation of 'init'
git-sparse-checkout.txt: wording updates for the cone mode default
sparse-checkout: make --cone the default
tests: stop assuming --no-cone is the default mode for sparse-checkout
Add a test for the regression introduced in my 9c4d58ff2c3 (ls-tree:
split up "fast path" callbacks, 2022-03-23) and fixed in 350296cc789 (ls-tree: `-l` should not imply recursive listing,
2022-04-04), and test for the test of ls-tree option/mode combinations
to make sure we don't have other blind spots.
The setup for these tests can be shared with those added in the 1041d58b4d9 (Merge branch 'tl/ls-tree-oid-only', 2022-04-04) topic, so
let's create a new t/lib-t3100.sh to help them share data.
The existing tests in "t3104-ls-tree-format.sh" didn't deal with a
submodule, which they'll now encounter with as the
setup_basic_ls_tree_data() sets one up.
This extensive testing should give us confidence that there were no
further regressions in this area. The lack of testing was noted back
in [1], but unfortunately we didn't cover that blind-spot before 9c4d58ff2c3.
run-command API users: use "env" not "env_array" in comments & names
Follow-up on a preceding commit which changed all references to the
"env_array" when referring to the "struct child_process" member. These
changes are all unnecessary for the compiler, but help the code's
human readers.
All the comments that referred to "env_array" have now been updated,
as well as function names and variables that had "env_array" in their
name, they now refer to "env".
In addition the "out" name for the submodule.h prototype was
inconsistent with the function definition's use of "env_array" in
submodule.c. Both of them use "env" now.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Start following-up on the rename mentioned in c7c4bdeccf3 (run-command
API: remove "env" member, always use "env_array", 2021-11-25) of
"env_array" to "env".
The "env_array" name was picked in 19a583dc39e (run-command: add
env_array, an optional argv_array for env, 2014-10-19) because "env"
was taken. Let's not forever keep the oddity of "*_array" for this
"struct strvec", but not for its "args" sibling.
This commit is almost entirely made with a coccinelle rule[1]. The
only manual change here is in run-command.h to rename the struct
member itself and to change "env_array" to "env" in the
CHILD_PROCESS_INIT initializer.
Change "BUG" output originally added in a97e4075a16 (Keep
rename/rename conflicts of intermediate merges while doing recursive
merge, 2007-03-31), and later made to say it was a "BUG" in 19c6a4f8369 (merge-recursive: do not return NULL only to cause
segfault, 2010-01-21) to use the new bug() function.
This gets the same job done with slightly less code, as we won't need
to prefix lines with "BUG: ". More importantly we'll now log the full
set of messages via trace2, before this we'd only log the one BUG()
invocation.
We don't replace the last "BUG()" invocation with "BUG_if_bug()", as
in this case we're sure that we called bug() earlier, so there's no
need to make it a conditional.
While we're at it let's replace "There" with "there" in the message,
i.e. not start a message with a capital letter, per the
CodingGuidelines.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Amend code added in a6a84319686 (receive-pack.c: shorten the
execute_commands loop over all commands, 2015-01-07) and amended to
hard die in b6a4788586d (receive-pack.c: die instead of error in case
of possible future bug, 2015-01-07) to use the new bug() function
instead.
Let's also rename the warn_if_*() function that code is in to
BUG_if_*(), its name became outdated in b6a4788586d.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
parse-options.c: use optbug() instead of BUG() "opts" check
Change the assertions added in bf3ff338a25 (parse-options: stop
abusing 'callback' for lowlevel callbacks, 2019-01-27) to use optbug()
instead of BUG(). At this point we're looping over individual options,
so if we encounter any issues we'd like to report the offending option.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we run into bugs in parse-options.c usage it's good to be able to
note all the issues we ran into before dying. This use-case is why we
have the optbug() function introduced in 1e5ce570ca3 (parse-options:
clearer reporting of API misuse, 2010-12-02)
Let's change this code to use the new bug() API introduced in the
preceding commit, which cuts down on the verbosity of
parse_options_check().
There are existing uses of BUG() in adjacent code that should have
been using optbug() that aren't being changed here. That'll be done in
a subsequent commit. This only changes the optbug() callers.
Since this will invoke BUG() the previous exit(128) code will be
changed, but in this case that's what we want, i.e. to have
encountering a BUG() return the specific "BUG" exit code.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
usage.c: add a non-fatal bug() function to go with BUG()
Add a bug() function to use in cases where we'd like to indicate a
runtime BUG(), but would like to defer the BUG() call because we're
possibly accumulating more bug() callers to exhaustively indicate what
went wrong.
We already have this sort of facility in various parts of the
codebase, just in the form of ad-hoc re-inventions of the
functionality that this new API provides. E.g. this will be used to
replace optbug() in parse-options.c, and the 'error("BUG:[...]' we do
in a loop in builtin/receive-pack.c.
Unlike the code this replaces we'll log to trace2 with this new bug()
function (as with other usage.c functions, including BUG()), we'll
also be able to avoid calls to xstrfmt() in some cases, as the bug()
function itself accepts variadic sprintf()-like arguments.
Any caller to bug() can follow up such calls with BUG_if_bug(),
which will BUG() out (i.e. abort()) if there were any preceding calls
to bug(), callers can also decide not to call BUG_if_bug() and leave
the resulting BUG() invocation until exit() time. There are currently
no bug() API users that don't call BUG_if_bug() themselves after a
for-loop, but allowing for not calling BUG_if_bug() keeps the API
flexible. As the tests and documentation here show we'll catch missing
BUG_if_bug() invocations in our exit() wrapper.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
common-main.c: move non-trace2 exit() behavior out of trace2.c
Change the exit() wrapper added in ee4512ed481 (trace2: create new
combined trace facility, 2019-02-22) so that we'll split up the trace2
logging concerns from wanting to wrap the "exit()" function itself for
other purposes.
This makes more sense structurally, as we won't seem to conflate
non-trace2 behavior with the trace2 code. I'd previously added an
explanation for this in 368b5843158 (common-main.c: call exit(), don't
return, 2021-12-07), that comment is being adjusted here.
Now the only thing we'll do if we're not using trace2 is to truncate
the "code" argument to the lowest 8 bits.
We only need to do that truncation on non-POSIX systems, but in ee4512ed481 that "if defined(__MINGW32__)" code added in 47e3de0e796 (MinGW: truncate exit()'s argument to lowest 8 bits,
2009-07-05) was made to run everywhere. It might be good for clarify
to narrow that down by an "ifdef" again, but I'm not certain that in
the interim we haven't had some other non-POSIX systems rely the
behavior. On a POSIX system taking the lowest 8 bits is implicit, see
exit(3)[1] and wait(2)[2]. Let's leave a comment about that instead.
Jason Yundt [Thu, 2 Jun 2022 11:43:05 +0000 (07:43 -0400)]
gitweb: switch to an XHTML5 DOCTYPE
According to the HTML Standard FAQ:
“What is the DOCTYPE for modern HTML documents?
In text/html documents:
<!DOCTYPE html>
In documents delivered with an XML media type: no DOCTYPE is required
and its use is generally unnecessary. However, you may use one if you
want (see the following question). Note that the above is well-formed
XML.”
Source: [1]
Gitweb uses an XHTML 1.0 DOCTYPE:
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
While that DOCTYPE is still valid [2], it has several disadvantages:
1. It’s misleading. If an XML parser uses the DTD at the given link,
then the entities and ⋅ won’t get declared. Instead, the
parser has to use a DTD from the HTML Standard that has nothing to do
with XHTML 1.0 [2].
2. It’s obsolete. XHTML 1.0 was last revised in 2002 and was superseded in
2018 [3].
3. It’s unreliable. Gitweb uses and ⋅ but lets an external file
define them. “[…U]using entity references for characters in XML documents
is unsafe if they are defined in an external file (except for <, >,
&, ", and ').” [4]
Glen Choo [Tue, 31 May 2022 23:12:33 +0000 (23:12 +0000)]
remote.c: don't BUG() on 0-length branch names
4a2dcb1a08 (remote: die if branch is not found in repository,
2021-11-17) introduced a regression where multiple config entries with
an empty branch name, e.g.
[branch ""]
remote = foo
merge = bar
could cause Git to fail when it tries to look up branch tracking
information.
We parse the config key to get (branch name, branch name length), but
when the branch name subsection is empty, we get a bogus branch name,
e.g. "branch..remote" gives (".remote", 0). We continue to use the bogus
branch name as if it were valid, and prior to 4a2dcb1a08, this wasn't an
issue because length = 0 caused the branch name to effectively be ""
everywhere.
However, that commit handles length = 0 inconsistently when we create
the branch:
- When find_branch() is called to check if the branch exists in the
branch hash map, it interprets a length of 0 to mean that it should
call strlen on the char pointer.
- But the code path that inserts into the branch hash map interprets a
length of 0 to mean that the string is 0-length.
This results in the bug described above:
- "branch..remote" looks for ".remote" in the branch hash map. Since we
do not find it, we insert the "" entry into the hash map.
- "branch..merge" looks for ".merge" in the branch hash map. Since we
do not find it, we again try to insert the "" entry into the hash map.
However, the entries in the branch hash map are supposed to be
appended to, not overwritten.
- Since overwriting an entry is a BUG(), Git fails instead of silently
ignoring the empty branch name.
Fix the bug by removing the convenience strlen functionality, so that
0 means that the string is 0-length. We still insert a bogus branch name
into the hash map, but this will be fixed in a later commit.
Reported-by: "Ing. Martin Prantl Ph.D." <perry@ntis.zcu.cz> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 31 May 2022 16:22:20 +0000 (09:22 -0700)]
revert: --reference should apply only to 'revert', not 'cherry-pick'
As 'revert' and 'cherry-pick' share a lot of code, it is easy to
modify the behaviour of one command and inadvertently affect the
other. An earlier change to teach the '--reference' option and the
'revert.reference' configuration variable to the former was not
careful enough and 'cherry-pick --reference' wasn't rejected as an
error.
It is possible to think 'cherry-pick -x' might benefit from the
'--reference' option, but it is fundamentally different from
'revert' in at least two ways to make it questionable:
- 'revert' names a commit that is ancestor of the resulting commit,
so an abbreviated object name with human readable title is
sufficient to identify the named commit uniquely without using
the full object name. On the other hand, 'cherry-pick'
usually [*] picks a commit that is not an ancestor. It might be
even picking a private commit that never becomes part of the
public history.
- The whole commit message of 'cherry-pick' is a copy of the
original commit, and there is nothing gained to repeat only the
title part on 'cherry-picked from' message.
[*] well, you could revert and then you can pick the original that
was reverted to get back to where you were, but then you can
revert the revert to do the same thing.
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 31 May 2022 06:24:02 +0000 (23:24 -0700)]
Merge branch 'cc/http-curlopt-resolve'
With the new http.curloptResolve configuration, the CURLOPT_RESOLVE
mechanism that allows cURL based applications to use pre-resolved
IP addresses for the requests is exposed to the scripts.
* cc/http-curlopt-resolve:
http: add custom hostname to IP address resolutions
scalar: teach `diagnose` to gather loose objects information
When operating at the scale that Scalar wants to support, certain data
shapes are more likely to cause undesirable performance issues, such as
large numbers of loose objects.
By including statistics about this, `scalar diagnose` now makes it
easier to identify such scenarios.
Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's helpful to see if there are other crud files in the pack
directory. Let's teach the `scalar diagnose` command to gather
file size information about pack files.
While at it, also enumerate the pack files in the alternate
object directories, if any are registered.
Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When analyzing problems with large worktrees/repositories, it is useful
to know how close to a "full disk" situation Scalar/Git operates. Let's
include this information.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Over the course of Scalar's development, it became obvious that there is
a need for a command that can gather all kinds of useful information
that can help identify the most typical problems with large
worktrees/repositories.
The `diagnose` command is the culmination of this hard-won knowledge: it
gathers the installed hooks, the config, a couple statistics describing
the data shape, among other pieces of information, and then wraps
everything up in a tidy, neat `.zip` archive.
Note: originally, Scalar was implemented in C# using the .NET API, where
we had the luxury of a comprehensive standard library that includes
basic functionality such as writing a `.zip` file. In the C version, we
lack such a commodity. Rather than introducing a dependency on, say,
libzip, we slightly abuse Git's `archive` machinery: we write out a
`.zip` of the empty try, augmented by a couple files that are added via
the `--add-file*` options. We are careful trying not to modify the
current repository in any way lest the very circumstances that required
`scalar diagnose` to be run are changed by the `diagnose` run itself.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `scalar` command needs a Scalar enlistment for many subcommands, and
looks in the current directory for such an enlistment (traversing the
parent directories until it finds one).
These is subcommands can also be called with an optional argument
specifying the enlistment. Here, too, we traverse parent directories as
needed, until we find an enlistment.
However, if the specified directory does not even exist, or is not a
directory, we should stop right there, with an error message.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the `--add-virtual-file=<path>:<content>` option, `git archive` now
supports use cases where relatively trivial files need to be added that
do not exist on disk.
This will allow us to generate `.zip` files with generated content,
without having to add said content to the object database and without
having to write it out to disk.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
[jc: tweaked <path> handling] Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Sun, 29 May 2022 22:39:51 +0000 (15:39 -0700)]
pathspec: correct an empty string used as a pathspec element
Pathspecs with only negative elements did not work with some
commands that pass the pathspec along to a subprocess. For
instance,
$ git add -p -- ':!*.txt'
should add everything except for paths ending in ".txt", but it gets
complaint from underlying "diff-index" and aborts.
We used to error out when a pathspec with only negative elements in
it, like the one in the above example. Later, 859b7f1d (pathspec:
don't error out on all-exclusionary pathspec patterns, 2017-02-07)
updated the logic to add an empty string as an extra element. The
intention was to let the extra element to match everything and let
the negative ones given by the user to subtract from it.
At around the same time, we were migrating from "an empty string is
a valid pathspec element that matches everything" to "either a dot
or ":/" is used to match all, and an empty string is rejected",
between d426430e (pathspec: warn on empty strings as pathspec,
2016-06-22) and 9e4e8a64 (pathspec: die on empty strings as
pathspec, 2017-06-06). I think 9e4e8a64, which happened long after 859b7f1d happened, was not careful enough to turn the empty string 859b7f1d added to either a dot or ":/".
A care should be taken as the definition of "everything" depends on
subcommand. For the purpose of "add -p", adding a "." to add
everything in the current directory is the right thing to do. But
for some other commands, ":/" (i.e. really really everything, even
things outside the current subdirectory) is the right choice.
We would break commands in a big way if we get this wrong, so add a
handful of test pieces to make sure the resulting code still
excludes the paths that are expected and includes "everything" else.
Junio C Hamano [Thu, 26 May 2022 19:37:31 +0000 (12:37 -0700)]
http.c: clear the 'finished' member once we are done with it
In http.c, the run_active_slot() function allows the given "slot" to
make progress by calling step_active_slots() in a loop repeatedly,
and the loop is not left until the request held in the slot
completes.
Ages ago, we used to use the slot->in_use member to get out of the
loop, which misbehaved when the request in "slot" completes (at
which time, the result of the request is copied away from the slot,
and the in_use member is cleared, making the slot ready to be
reused), and the "slot" gets reused to service a different request
(at which time, the "slot" becomes in_use again, even though it is
for a different request). The loop terminating condition mistakenly
thought that the original request has yet to be completed.
Today's code, after baa7b67d (HTTP slot reuse fixes, 2006-03-10)
fixed this issue, uses a separate "slot->finished" member that is
set in run_active_slot() to point to an on-stack variable, and the
code that completes the request in finish_active_slot() clears the
on-stack variable via the pointer to signal that the particular
request held by the slot has completed. It also clears the in_use
member (as before that fix), so that the slot itself can safely be
reused for an unrelated request.
One thing that is not quite clean in this arrangement is that,
unless the slot gets reused, at which point the finished member is
reset to NULL, the member keeps the value of &finished, which
becomes a dangling pointer into the stack when run_active_slot()
returns. Clear the finished member before the control leaves the
function, which has a side effect of unconfusing compilers like
recent GCC 12 that is over-eager to warn against such an assignment.
Frantisek Hrbata [Fri, 20 May 2022 12:49:52 +0000 (14:49 +0200)]
transport: free local and remote refs in transport_push()
Fix memory leaks in transport_push(), where remote_refs and local_refs
are never freed.
116 bytes in 1 blocks are definitely lost in loss record 56 of 103
at 0x484486F: malloc (vg_replace_malloc.c:381)
by 0x4938D7E: strdup (strdup.c:42)
by 0x628418: xstrdup (wrapper.c:39)
by 0x4FD454: process_capabilities (connect.c:232)
by 0x4FD454: get_remote_heads (connect.c:354)
by 0x610A38: handshake (transport.c:333)
by 0x612B02: transport_push (transport.c:1302)
by 0x4803D6: push_with_options (push.c:357)
by 0x4811D6: do_push (push.c:414)
by 0x4811D6: cmd_push (push.c:650)
by 0x405210: run_builtin (git.c:465)
by 0x405210: handle_builtin (git.c:719)
by 0x406363: run_argv (git.c:786)
by 0x406363: cmd_main (git.c:917)
by 0x404F17: main (common-main.c:56)
5,912 (388 direct, 5,524 indirect) bytes in 2 blocks are definitely lost in loss record 98 of 103
at 0x4849464: calloc (vg_replace_malloc.c:1328)
by 0x628705: xcalloc (wrapper.c:150)
by 0x5C216D: alloc_ref_with_prefix (remote.c:975)
by 0x5C232A: alloc_ref (remote.c:983)
by 0x5C232A: one_local_ref (remote.c:2299)
by 0x5C232A: one_local_ref (remote.c:2289)
by 0x5BDB03: do_for_each_repo_ref_iterator (iterator.c:418)
by 0x5B4C4F: do_for_each_ref (refs.c:1486)
by 0x5B4C4F: refs_for_each_ref (refs.c:1492)
by 0x5B4C4F: for_each_ref (refs.c:1497)
by 0x5C6ADF: get_local_heads (remote.c:2310)
by 0x612A85: transport_push (transport.c:1286)
by 0x4803D6: push_with_options (push.c:357)
by 0x4811D6: do_push (push.c:414)
by 0x4811D6: cmd_push (push.c:650)
by 0x405210: run_builtin (git.c:465)
by 0x405210: handle_builtin (git.c:719)
by 0x406363: run_argv (git.c:786)
by 0x406363: cmd_main (git.c:917)
Signed-off-by: Frantisek Hrbata <frantisek@hrbata.com> Reviewed-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Frantisek Hrbata [Fri, 20 May 2022 12:49:51 +0000 (14:49 +0200)]
transport: unify return values and exit point from transport_push()
It seems there is no reason to return 1 instead of -1 when push_refs()
is not set in transport vtable. Let's unify the error return values and
use the done label as a single exit point from transport_push().
Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Frantisek Hrbata <frantisek@hrbata.com> Reviewed-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Frantisek Hrbata [Fri, 20 May 2022 12:49:50 +0000 (14:49 +0200)]
transport: remove unnecessary indenting in transport_push()
Remove the big indented block for transport_push() check in transport vtable
and let's just return error immediately. Hopefully this makes the code
more readable.
Signed-off-by: Frantisek Hrbata <frantisek@hrbata.com> Reviewed-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This does not encourage the best practice of describing not just
"what" (i.e. "Revert X" on the title says what we did) but "why"
(i.e. and it does not say why X was undesirable).
We can instead phrase this first line of the body to be more like
This reverts commit 8fa7f667 (do this and that, 2022-04-25)
so that the title does not have to be
Revert "do this and that"
We can instead use the title to describe "why" we are reverting the
original commit.
Introduce the "--reference" option to "git revert", and also the
revert.reference configuration variable, which defaults to false, to
tweak the title and the first line of the draft commit message for
when creating a "revert" commit.
When this option is in use, the first line of the pre-filled editor
buffer becomes a comment line that tells the user to say _why_. If
the user exits the editor without touching this line by mistake,
what we prepare to become the first line of the body, i.e. "This
reverts commit 8fa7f667 (do this and that, 2022-04-25)", ends up to
be the title of the resulting commit. This behaviour is designed to
help such a user to identify such a revert in "git log --oneline"
easily so that it can be further reworded with "git rebase -i" later.
Jeff Hostetler [Thu, 26 May 2022 21:47:24 +0000 (21:47 +0000)]
t7527: improve implicit shutdown testing in fsmonitor--daemon
Refactor the tests that exercise implicit shutdown cases
to make them more robust and less racy.
The fsmonitor--daemon will implicitly shutdown in a variety
of situations, such as when the ".git" directory is deleted
or renamed.
The existing tests would delete or rename the directory, sleep
for one second, and then check the status of the daemon. This
is racy, since the client/status command has no way to sync
with the daemon. This was noticed occasionally on very slow
CI build machines where it would cause a random test to fail.
Replace the simple sleep with a sleep-and-retry loop.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff Hostetler [Thu, 26 May 2022 21:47:23 +0000 (21:47 +0000)]
fsmonitor--daemon: allow --super-prefix argument
Create a test in t7527 to verify that we get a stray warning from
`git fsmonitor--daemon start` when indirectly called from
`git submodule absorbgitdirs`.
Update `git fsmonitor--daemon` to take (and ignore) the `--super-prefix`
argument to suppress the warning.
When we have:
1. a submodule with a `sub/.git/` directory (rather than a `sub/.git`
file).
2. `core.fsmonitor` is turned on in the submodule, but the daemon is
not yet started in the submodule.
3. and someone does a `git submodule absorbgitdirs` in the super.
Git will recursively invoke `git submodule--helper absorb-git-dirs`
in the submodule. This will read the index and may attempt to start
the fsmonitor--daemon with the `--super-prefix` argument.
`git fsmonitor--daemon start` does not accept the `--super-prefix`
argument and causes a warning to be issued.
This does not cause a problem because the `refresh_index()` code
assumes a trivial response if the daemon does not start.
The net-net is a harmelss, but stray warning. Lets eliminate the
warning.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff Hostetler [Thu, 26 May 2022 21:47:22 +0000 (21:47 +0000)]
t7527: test Unicode NFC/NFD handling on MacOS
Confirm that the daemon reports events using the on-disk
spelling for Unicode NFC/NFD characters. On APFS we still
have Unicode aliasing, so we cannot create two files that
only differ by NFC/NFD, but the on-disk format preserves
the spelling used to create the file. On HFS+ we also
have aliasing, but the path is always stored on disk in
NFD.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff Hostetler [Thu, 26 May 2022 21:47:20 +0000 (21:47 +0000)]
t/helper/hexdump: add helper to print hexdump of stdin
Co-authored-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff Hostetler [Thu, 26 May 2022 21:47:19 +0000 (21:47 +0000)]
fsmonitor: on macOS also emit NFC spelling for NFD pathname
Emit NFC or NFC and NFD spellings of pathnames on macOS.
MacOS is Unicode composition insensitive, so NFC and NFD spellings are
treated as aliases and collide. While the spelling of pathnames in
filesystem events depends upon the underlying filesystem, such as
APFS, HFS+ or FAT32, the OS enforces such collisions regardless of
filesystem.
Teach the daemon to always report the NFC spelling and to report
the NFD spelling when stored in that format on the disk.
This is slightly more general than "core.precomposeUnicode".
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff Hostetler [Thu, 26 May 2022 21:47:17 +0000 (21:47 +0000)]
fsmonitor: never set CE_FSMONITOR_VALID on submodules
Never set CE_FSMONITOR_VALID on the cache-entry of submodule
directories.
During a client command like 'git status', we may need to recurse
into each submodule to compute a status summary for the submodule.
Since the purpose of the ce_flag is to let Git avoid scanning a
cache-entry, setting the flag causes the recursive call to be
avoided and we report incorrect (no status) for the submodule.
We created an OS watch on the root directory of our working
directory and we receive events for everything in the cone
under it. When submodules are present inside our working
directory, we receive events for both our repo (the super) and
any subs within it. Since our index doesn't have any information
for items within the submodules, we can't use those events.
We could try to truncate the paths of those events back to the
submodule boundary and mark the GITLINK as dirty, but that
feels expensive since we would have to prefix compare every FS
event that we receive against a list of submodule roots. And
it still wouldn't be sufficient to correctly report status on
the submodule, since we don't have any space in the cache-entry
to cache the submodule's status (the 'SCMU' bits in porcelain
V2 speak). That is, the CE_FSMONITOR_VALID bit just says that
we don't need to scan/inspect it because we already know the
answer -- it doesn't say that the item is clean -- and we
don't have space in the cache-entry to store those answers.
So we should always do the recursive scan.
Therefore, we should never set the flag on GITLINK cache-entries.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff Hostetler [Thu, 26 May 2022 21:47:14 +0000 (21:47 +0000)]
fsmonitor: optimize processing of directory events
Teach Git to perform binary search over the cache-entries for a directory
notification and then linearly scan forward to find the immediate children.
Previously, when the FSMonitor reported a modified directory Git would
perform a linear search on the entire cache-entry array for all
entries matching that directory prefix and invalidate them. Since the
cache-entry array is already sorted, we can use a binary search to
find the first matching entry and then only linearly walk forward and
invalidate entries until the prefix changes.
Also, the original code would invalidate anything having the same
directory prefix. Since a directory event should only be received for
items that are immediately within the directory (and not within
sub-directories of it), only invalidate those entries and not the
whole subtree.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff Hostetler [Thu, 26 May 2022 21:47:12 +0000 (21:47 +0000)]
fsm-health-win32: force shutdown daemon if worktree root moves
Force shutdown fsmonitor daemon if the worktree root directory
is moved, renamed, or deleted.
Use Windows low-level GetFileInformationByHandle() to get and
compare the Windows system unique ID for the directory with a
cached version when we started up. This lets us detect the
case where someone renames the directory that we are watching
and then creates a new directory with the original pathname.
This is important because we are listening to a named pipe for
requests and they are stored in the Named Pipe File System (NPFS)
which a kernel-resident pseudo filesystem not associated with
the actual NTFS directory.
For example, if the daemon was watching "~/foo/", it would have
a directory-watch handle on that directory and a named-pipe
handle for "//./pipe/...foo". Moving the directory to "~/bar/"
does not invalidate the directory handle. (So the daemon would
actually be watching "~/bar" but listening on "//./pipe/...foo".
If the user then does "git init ~/foo" and causes another daemon
to start, the first daemon will still have ownership of the pipe
and the second daemon instance will fail to start. "git status"
clients in "~/foo" will ask "//./pipe/...foo" about changes and
the first daemon instance will tell them about "~/bar".
This commit causes the first daemon to shutdown if the system unique
ID for "~/foo" changes (changes from what it was when the daemon
started). Shutdown occurs after a periodic poll. After the
first daemon exits and releases the lock on the named pipe,
subsequent Git commands may cause another daemon to be started
on "~/foo". Similarly, a subsequent Git command may cause another
daemon to be started on "~/bar".
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff Hostetler [Thu, 26 May 2022 21:47:10 +0000 (21:47 +0000)]
fsmonitor--daemon: stub in health thread
Create another thread to watch over the daemon process and
automatically shut it down if necessary.
This commit creates the basic framework for a "health" thread
to monitor the daemon and/or the file system. Later commits
will add platform-specific code to do the actual work.
The "health" thread is intended to monitor conditions that
would be difficult to track inside the IPC thread pool and/or
the file system listener threads. For example, when there are
file system events outside of the watched worktree root or if
we want to have an idle-timeout auto-shutdown feature.
This commit creates the health thread itself, defines the thread-proc
and sets up the thread's event loop. It integrates this new thread
into the existing IPC and Listener thread models.
This commit defines the API to the platform-specific code where all of
the monitoring will actually happen.
The platform-specific code for MacOS is just stubs. Meaning that the
health thread will immediately exit on MacOS, but that is OK and
expected. Future work can define MacOS-specific monitoring.
The platform-specific code for Windows sets up enough of the
WaitForMultipleObjects() machinery to watch for system and/or custom
events. Currently, the set of wait handles only includes our custom
shutdown event (sent from our other theads). Later commits in this
series will extend the set of wait handles to monitor other
conditions.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff Hostetler [Thu, 26 May 2022 21:47:07 +0000 (21:47 +0000)]
fsmonitor--daemon: cd out of worktree root
Teach the fsmonitor--daemon to CD outside of the worktree
before starting up.
The common Git startup mechanism causes the CWD of the daemon process
to be in the root of the worktree. On Windows, this causes the daemon
process to hold a locked handle on the CWD and prevents other
processes from moving or deleting the worktree while the daemon is
running.
CD to HOME before entering main event loops.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff Hostetler [Thu, 26 May 2022 21:47:06 +0000 (21:47 +0000)]
fsm-listen-darwin: ignore FSEvents caused by xattr changes on macOS
Ignore FSEvents resulting from `xattr` changes. Git does not care about
xattr's or changes to xattr's, so don't waste time collecting these
events in the daemon nor transmitting them to clients.
Various security tools add xattrs to files and/or directories, such as
to mark them as having been downloaded. We should ignore these events
since it doesn't affect the content of the file/directory or the normal
meta-data that Git cares about.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff Hostetler [Thu, 26 May 2022 21:47:05 +0000 (21:47 +0000)]
unpack-trees: initialize fsmonitor_has_run_once in o->result
Initialize `o->result.fsmonitor_has_run_once` based upon value
in `o->src_index->fsmonitor_has_run_once` to prevent a second
fsmonitor query during the tree traversal and possibly getting
a skewed view of the working directory.
The checkout code has already talked to the fsmonitor and the
traversal is updating the index as it traverses, so there is
no need to query the fsmonitor.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff Hostetler [Thu, 26 May 2022 21:47:04 +0000 (21:47 +0000)]
fsmonitor-settings: NTFS and FAT32 on MacOS are incompatible
On MacOS mark repos on NTFS or FAT32 volumes as incompatible.
The builtin FSMonitor used Unix domain sockets on MacOS for IPC
with clients. These sockets are kept in the .git directory.
Unix sockets are not supported by NTFS and FAT32, so the daemon
cannot start up.
Test for this during our compatibility checking so that client
commands do not keep trying to start the daemon.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff Hostetler [Thu, 26 May 2022 21:47:00 +0000 (21:47 +0000)]
fsmonitor-settings: VFS for Git virtual repos are incompatible
VFS for Git virtual repositories are incompatible with FSMonitor.
VFS for Git is a downstream fork of Git. It contains its own custom
file system watcher that is aware of the virtualization. If a working
directory is being managed by VFS for Git, we should not try to watch
it because we may get incomplete results.
We do not know anything about how VFS for Git works, but we do
know that VFS for Git working directories contain a well-defined
config setting. If it is set, mark the working directory as
incompatible.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff Hostetler [Thu, 26 May 2022 21:46:59 +0000 (21:46 +0000)]
fsmonitor-settings: stub in Win32-specific incompatibility checking
Extend generic incompatibility checkout with platform-specific
mechanism. Stub in Win32 version.
In the existing fsmonitor-settings code we have a way to mark
types of repos as incompatible with fsmonitor (whether via the
hook and IPC APIs). For example, we do this for bare repos,
since there are no files to watch.
Extend this exclusion mechanism for platform-specific reasons.
This commit just creates the framework and adds a stub for Win32.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>