git.ipfire.org Git - thirdparty/git.git/log

Change default merge backend from recursive to ort

There are a few reasons to switch the default:
  * Correctness
  * Extensibility
  * Performance

I'll provide some summaries about each.

=== Correctness ===

The original impetus for a new merge backend was to fix issues that were
difficult to fix within recursive's design.  The success with this goal
is perhaps most easily demonstrated by running the following:

  $ git grep -2 KNOWN_FAILURE t/ | grep -A 4 GIT_TEST_MERGE_ALGORITHM
  $ git grep test_expect_merge_algorithm.failure.success t/
  $ git grep test_expect_merge_algorithm.success.failure t/

In order, these greps show:

  * Seven sets of submodule tests (10 total tests) that fail with
    recursive but succeed with ort
  * 22 other tests that fail with recursive, but succeed with ort
  * 0 tests that pass with recursive, but fail with ort

=== Extensibility ===

Being able to perform merges without touching the working tree or index
makes it possible to create new features that were difficult with the
old backend:

  * Merging, cherry-picking, rebasing, reverting in bare repositories...
    or just on branches that aren't checked out.

  * `git diff AUTO_MERGE` -- ability to see what changes the user has
    made to resolve conflicts so far (see commit 5291828df8 ("merge-ort:
    write $GIT_DIR/AUTO_MERGE whenever we hit a conflict", 2021-03-20)

  * A --remerge-diff option for log/show, used to show diffs for merges
    that display the difference between what an automatic merge would
    have created and what was recorded in the merge.  (This option will
    often result in an empty diff because many merges are clean, but for
    the non-clean ones it will show how conflicts were fixed including
    the removal of conflict markers, and also show additional changes
    made outside of conflict regions to e.g. fix semantic conflicts.)

  * A --remerge-diff-only option for log/show, similar to --remerge-diff
    but also showing how cherry-picks or reverts differed from what an
    automatic cherry-pick or revert would provide.

The last three have been implemented already (though only one has been
submitted upstream so far; the others were waiting for performance work
to complete), and I still plan to implement the first one.

=== Performance ===

I'll quote from the summary of my final optimization for merge-ort
(while fixing the testcase name from 'no-renames' to 'few-renames'):

                               Timings

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename    merge-ort
                 v2.30.0      current     detection     current
                ----------   ---------   -----------   ---------
few-renames:      18.912 s    18.030 s     11.699 s     198.3 ms
mega-renames:   5964.031 s   361.281 s    203.886 s     661.8 ms
just-one-mega:   149.583 s    11.009 s      7.553 s     264.6 ms

                           Speedup factors

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename
                 v2.30.0      current     detection    merge-ort
                ----------   ---------   -----------   ---------
few-renames:        1           1.05         1.6           95
mega-renames:       1          16.5         29           9012
just-one-mega:      1          13.6         20            565

And, for partial clone users:

             Factor reduction in number of objects needed

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename
                 v2.30.0      current     detection    merge-ort
                ----------   ---------   -----------   ---------
mega-renames:       1            1            1          181.3

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Update error message and code comment

There were two locations in the code that referred to 'merge-recursive'
but which were also applicable to 'merge-ort'. Update them to more
general wording.

Acked-by: Derrick Stolee <dstolee@microsoft.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-strategies.txt: add coverage of the `ort` merge strategy

Acked-by: Derrick Stolee <dstolee@microsoft.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

git-rebase.txt: correct out-of-date and misleading text about renames

Commit 58634dbff8 ("rebase: Allow merge strategies to be used when
rebasing", 2006-06-21) added the --merge option to git-rebase so that
renames could be detected (at least when using the `recursive` merge
backend).  However, git-am -3 gained that same ability in commit
579c9bb198 ("Use merge-recursive in git-am -3.", 2006-12-28).  As such,
the comment about being able to detect renames is not particularly
noteworthy.  Remove it.

Acked-by: Derrick Stolee <dstolee@microsoft.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-strategies.txt: fix simple capitalization error

Acked-by: Derrick Stolee <dstolee@microsoft.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-strategies.txt: avoid giving special preference to patience algorithm

We already have diff-algorithm that explains why there are special diff
algorithms, so we do not need to re-explain patience. patience exists
as its own toplevel option for historical reasons, but there's no reason
to give it special preference or document it again and suggest it's more
important than other diff algorithms, so just refer to it as a
deprecated shorthand for `diff-algorithm=patience`.

Acked-by: Derrick Stolee <dstolee@microsoft.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-strategies.txt: do not imply using copy detection is desired

Stating that the recursive strategy "currently cannot make use of
detected copies" implies that this is a technical shortcoming of the
current algorithm.  I disagree with that.  I don't see how copies could
possibly be used in a sane fashion in a merge algorithm -- would we
propagate changes in one file on one side of history to each copy of
that file when merging?  That makes no sense to me.  I cannot think of
anything else that would make sense either.  Change the wording to
simply state that we ignore any copies.

Acked-by: Derrick Stolee <dstolee@microsoft.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-strategies.txt: update wording for the resolve strategy

It is probably helpful to cover the default merge strategy first, so
move the text for the resolve strategy to later in the document.

Further, the wording for "resolve" claimed that it was "considered
generally safe and fast", which might imply in some readers minds that
the same is not true of other strategies. Rather than adding this text
to all the strategies, just remove it from this one.

Acked-by: Derrick Stolee <dstolee@microsoft.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Documentation: edit awkward references to `git merge-recursive`

A few places in the documentation referred to the "`recursive` strategy"
using the phrase "`git merge-recursive`", suggesting that it was forking
subprocesses to call a toplevel builtin. Perhaps that was relevant to
when rebase was a shell script, but it seems like a rather indirect way
to refer to the `recursive` strategy. Simplify the references.

Acked-by: Derrick Stolee <dstolee@microsoft.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

directory-rename-detection.txt: small updates due to merge-ort optimizations

In commit 0c4fd732f0 ("Move computation of dir_rename_count from
merge-ort to diffcore-rename", 2021-02-27), much of the logic for
computing directory renames moved into diffcore-rename.
directory-rename-detection.txt had claims that all of that logic was
found in merge-recursive. Update the documentation.

Acked-by: Derrick Stolee <dstolee@microsoft.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

git-rebase.txt: correct antiquated claims about --rebase-merges

When --rebase-merges was first introduced, it only worked with the
`recursive` strategy.  Some time later, it gained support for merges
using the `octopus` strategy.  The limitation of only supporting these
two strategies was documented in 25cff9f109 ("rebase -i --rebase-merges:
add a section to the man page", 2018-04-25) and lifted in e145d99347
("rebase -r: support merge strategies other than `recursive`",
2019-07-31).  However, when the limitation was lifted, the documentation
was not updated.  Update it now.

Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Git 2.33-rc0

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'jk/check-pack-valid-before-opening-bitmap'

A race between repacking and using pack bitmaps has been corrected.

* jk/check-pack-valid-before-opening-bitmap:
pack-bitmap: check pack validity when opening bitmap

Merge branch 'jt/bulk-prefetch'

"git read-tree" had a codepath where blobs are fetched one-by-one
from the promisor remote, which has been corrected to fetch in bulk.

* jt/bulk-prefetch:
cache-tree: prefetch in partial clone read-tree
unpack-trees: refactor prefetching code

Merge branch 'fc/pull-no-rebase-merges-theirs-into-ours'

Documentation fix for "git pull --rebase=no".

* fc/pull-no-rebase-merges-theirs-into-ours:
doc: pull: fix rebase=false documentation

Merge branch 'ab/bundle-tests'

"git bundle" gained more test coverage.

* ab/bundle-tests:
bundle tests: use test_cmp instead of grep
bundle tests: use ">file" not ": >file"

Merge branch 'ps/perf-with-separate-output-directory'

Test update.

* ps/perf-with-separate-output-directory:
perf: fix when running with TEST_OUTPUT_DIRECTORY

Merge branch 'js/ci-check-whitespace-updates'

CI update.

* js/ci-check-whitespace-updates:
ci(check-whitespace): restrict to the intended commits
ci(check-whitespace): stop requiring a read/write token

Merge branch 'jk/config-env-doc'

Documentation around GIT_CONFIG has been updated.

* jk/config-env-doc:
  doc/git-config: simplify "override" advice for FILES section
  doc/git-config: clarify GIT_CONFIG environment variable
  doc/git-config: explain --file instead of referring to GIT_CONFIG

Merge branch 'pb/submodule-recurse-doc'

Doc update.

* pb/submodule-recurse-doc:
doc: clarify description of 'submodule.recurse'

Merge branch 'tb/bitmap-type-filter-comment-fix'

In-code comment update.

* tb/bitmap-type-filter-comment-fix:
pack-bitmap: clarify comment in filter_bitmap_exclude_type()

The seventh batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'ps/t0000-output-directory-fix'

"TEST_OUTPUT_DIRECTORY=there make test" failed to work, which has
been corrected.

* ps/t0000-output-directory-fix:
t0000: fix test if run with TEST_OUTPUT_DIRECTORY

Merge branch 'tb/reverse-midx'

The code that gives an error message in "git multi-pack-index" when
no subcommand is given tried to print a NULL pointer as a strong,
which has been corrected.

* tb/reverse-midx:
multi-pack-index: fix potential segfault without sub-command

Merge branch 'hn/refs-debug-empty-prefix'

Debugging aid.

* hn/refs-debug-empty-prefix:
refs/debug: quote prefix

Merge branch 'pb/dont-complete-aliased-options'

The completion support used to offer alternate spelling of options
that exist only for compatibility, which has been corrected.

* pb/dont-complete-aliased-options:
parse-options: don't complete option aliases by default

Merge branch 'en/rename-limits-doc'

Documentation on "git diff -l<n>" and diff.renameLimit have been
updated, and the defaults for these limits have been raised.

* en/rename-limits-doc:
  rename: bump limit defaults yet again
  diffcore-rename: treat a rename_limit of 0 as unlimited
  doc: clarify documentation for rename/copy limits
  diff: correct warning message when renameLimit exceeded

Merge branch 'ds/gender-neutral-doc-guidelines'

A guideline for gender neutral documentation has been added.

* ds/gender-neutral-doc-guidelines:
CodingGuidelines: recommend gender-neutral description

Merge branch 'ds/status-with-sparse-index'

"git status" codepath learned to work with sparsely populated index
without hydrating it fully.

* ds/status-with-sparse-index:
  t1092: document bad sparse-checkout behavior
  fsmonitor: integrate with sparse index
  wt-status: expand added sparse directory entries
  status: use sparse-index throughout
  status: skip sparse-checkout percentage with sparse-index
  diff-lib: handle index diffs with sparse dirs
  dir.c: accept a directory as part of cone-mode patterns
  unpack-trees: unpack sparse directory entries
  unpack-trees: rename unpack_nondirectories()
  unpack-trees: compare sparse directories correctly
  unpack-trees: preserve cache_bottom
  t1092: add tests for status/add and sparse files
  t1092: expand repository data shape
  t1092: replace incorrect 'echo' with 'cat'
  sparse-index: include EXTENDED flag when expanding
  sparse-index: skip indexes with unmerged entries

Merge branch 'js/ci-make-sparse'

The CI gained a new job to run "make sparse" check.

* js/ci-make-sparse:
  ci/install-dependencies: handle "sparse" job package installs
  ci: run "apt-get update" before "apt-get install"
  ci: run `make sparse` as part of the GitHub workflow

Merge branch 'ab/pkt-line-tests'

Tests that cover protocol bits have been updated and helpers
used there have been consolidated.

* ab/pkt-line-tests:
test-lib-functions: use test-tool for [de]packetize()

Merge branch 'jk/t0000-subtests-fix'

Test fix.

* jk/t0000-subtests-fix:
t0000: clear GIT_SKIP_TESTS before running sub-tests

Merge branch 'dl/diff-merge-base'

"git diff --merge-base" documentation has been updated.

* dl/diff-merge-base:
git-diff: fix missing --merge-base docs

Merge branch 'ab/attribute-format'

Many "printf"-like helper functions we have have been annotated
with __attribute__() to catch placeholder/parameter mismatches.

* ab/attribute-format:
  advice.h: add missing __attribute__((format)) & fix usage
  *.h: add a few missing __attribute__((format))
  *.c static functions: add missing __attribute__((format))
  sequencer.c: move static function to avoid forward decl
  *.c static functions: don't forward-declare __attribute__

Merge branch 'jk/log-decorate-optim'

Optimize "git log" for cases where we wasted cycles to load ref
decoration data that may not be needed.

* jk/log-decorate-optim:
  load_ref_decorations(): fix decoration with tags
  add_ref_decoration(): rename s/type/deco_type/
  load_ref_decorations(): avoid parsing non-tag objects
  object.h: add lookup_object_by_type() function
  object.h: expand docstring for lookup_unknown_object()
  log: avoid loading decorations for userformats that don't need it
  pretty.h: update and expand docstring for userformat_find_requirements()

Merge branch 'sm/worktree-add-lock'

"git worktree add --lock" learned to record why the worktree is
locked with a custom message.

* sm/worktree-add-lock:
  worktree: teach `add` to accept --reason <string> with --lock
  worktree: mark lock strings with `_()` for translation
  t2400: clean up '"add" worktree with lock' test

Merge branch 'ew/many-alternate-optim'

Optimization for repositories with many alternate object store.

* ew/many-alternate-optim:
  oidtree: a crit-bit tree for odb_loose_cache
  oidcpy_with_padding: constify `src' arg
  make object_directory.loose_objects_subdir_seen a bitmap
  avoid strlen via strbuf_addstr in link_alt_odb_entry
  speed up alt_odb_usable() with many alternates

Merge branch 'hj/commit-allow-empty-message'

"git commit --allow-empty-message" won't abort the operation upon
an empty message, but the hint shown in the editor said otherwise.

* hj/commit-allow-empty-message:
commit: remove irrelavent prompt on `--allow-empty-message`
commit: reorganise commit hint strings

Merge branch 'dl/packet-read-response-end-fix'

Error message update.

* dl/packet-read-response-end-fix:
pkt-line: replace "stateless separator" with "response end"

ci/install-dependencies: handle "sparse" job package installs

This just matches the style/location of the package installation for
other jobs. There should be no functional change.

I did flip the order of the options and command-name ("-y update"
instead of "update -y") for consistency with other lines in the same
file.

Note also that we have to reorder the dependency install with the
"checkout" action, so that we actually have the "ci" scripts available.

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: run "apt-get update" before "apt-get install"

The "sparse" workflow runs "apt-get install" to pick up a few necessary
packages. But it needs to run "apt-get update" first, or it risks trying
to download an old package version that no longer exists. And in fact
this happens now, with output like:

2021-07-26T17:40:51.2551880Z E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/c/curl/libcurl4-openssl-dev_7.68.0-1ubuntu2.5_amd64.deb 404 Not Found [IP: 52.147.219.192 80]
2021-07-26T17:40:51.2554304Z E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?

Our other ci jobs don't suffer from this; they rely on scripts in ci/,
and ci/install-dependencies does the appropriate "apt-get update".

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

cache-tree: prefetch in partial clone read-tree

"git read-tree" checks the existence of the blobs referenced by the
given tree, but does not bulk prefetch them. Add a bulk prefetch.

The lack of prefetch here was noticed at $DAYJOB during a merge
involving some specific commits, but I couldn't find a minimal merge
that didn't also trigger the prefetch in check_updates() in
unpack-trees.c (and in all these cases, the lack of prefetch in
cache-tree.c didn't matter because all the relevant blobs would have
already been prefetched by then). This is why I used read-tree here to
exercise this code path.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

unpack-trees: refactor prefetching code

Refactor the prefetching code in unpack-trees.c into its own function,
because it will be used elsewhere in a subsequent commit.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-bitmap: check pack validity when opening bitmap

When pack-objects adds an entry to its list of objects to pack, it may
mark the packfile and offset that contains the file, which we can later
use to output the object verbatim.  If the packfile is deleted while we
are running (e.g., by another process running "git repack"), we may die
in use_pack() if the pack file cannot be opened.

We worked around this in 4c08018204 (pack-objects: protect against
disappearing packs, 2011-10-14) by making sure we can open the pack
before recording it as a source. This detects a pack which has already
disappeared while generating the packing list, and because we keep the
pack's file descriptor (or an mmap window) open, it means we can access
it later (unless you exceed core.packedgitlimit).

The bitmap code that was added later does not do this; it adds entries
to the packlist without checking that the packfile is still valid, and
is vulnerable to this race. It needs the same treatment as 4c08018204.

However, rather than add it in just that one spot, it makes more sense
to simply open and check the packfile when we open the bitmap.
Technically you can use the .bitmap without even looking in the .pack
file (e.g., if you are just printing a list of objects without accessing
them), but it's much simpler to do it early. That covers all later
direct uses of the pack (due to the cached descriptor) without having to
check each one directly. For example, in pack-objects we need to protect
the packlist entries, but we also access the pack directly as part of
the reuse_partial_pack_from_bitmap() feature. This patch covers both
cases.

There's no test here, because the problem is inherently racy. I
reproduced and verified the fix with this script:

  rm -rf parent.git push.git fetch.git

  push() {
    (
      cd push.git &&
      echo content >>file &&
      git add file &&
      git commit -qm "change $1" &&
      git push -q origin HEAD &&
      echo "push $1..."
    ) &&
    (
      cd parent.git &&
      git repack -ad -q &&
      echo "repack $1..."
    )
  }

  fetch() {
    rm -rf fetch.git &&
    git clone -q file://$PWD/parent.git fetch.git &&
    echo "fetch $1..."
  }

  git init --bare parent.git &&
  git --git-dir=parent.git config transfer.unpacklimit 1 &&
  git clone parent.git push.git &&
  (for i in `seq 1 1000`; do push $i || break; done) &
  pusher=$!
  (for i in `seq 1 1000`; do fetch $i || break; done) &
  fetcher=$!
  wait $fetcher
  kill $pusher

That simulates a race between a client cloning and a push triggering a
repack on the server. Without this patch, it generally fails within a
couple hundred iterations with:

  remote: fatal: packfile ./objects/pack/.tmp-1377349-pack-498afdec371232bdb99d1757872f5569331da61e.pack cannot be accessed
  error: git upload-pack: git-pack-objects died with error.
  fatal: git upload-pack: aborting due to possible repository corruption on the remote side.
  remote: aborting due to possible repository corruption on the remote side.
  fatal: early EOF
  fatal: fetch-pack: invalid index-pack output

With this patch, it reliably runs through all thousand attempts.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

bundle tests: use test_cmp instead of grep

Change the bundle tests to fully compare the expected "git ls-remote"
or "git bundle list-heads" output, instead of merely grepping it.

This avoids subtle regressions in the tests. In
f62e0a39b6 (t5704 (bundle): add tests for bundle --stdin, 2010-04-19)
the "bundle --stdin <rev-list options>" test was added to make sure we
didn't include the tag.

But since the --stdin mode didn't work until 5bb0fd2cab (bundle:
arguments can be read from stdin, 2021-01-11) our grepping of
"master" (later "main") missed the important part of the test.

Namely that we should not include the "refs/tags/tag" tag in that
case. Since the test only grepped for "main" in the output we'd miss a
regression in that code.

So let's use test_cmp instead, and also in the other nearby tests
where it's easy.

This does make things a bit more verbose in the case of the test
that's checking the bundle header, since it's different under SHA1 and
SHA256. I think this makes test easier to follow.

I've got some WIP changes to extend the "git bundle" command to dump
parts of the header out, which are easier to understand if we test the
output explicitly like this.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

bundle tests: use ">file" not ": >file"

Change uses of ":" on the LHS of a ">" to the more commonly used
">file" pattern in t/t5607-clone-bundle.sh.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

The sixth batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'bc/rev-list-without-commit-line'

"git rev-list" learns to omit the "commit <object-name>" header
lines from the output with the `--no-commit-header` option.

* bc/rev-list-without-commit-line:
rev-list: add option for --pretty=format without header

Merge branch 'ab/imap-send-read-everything-simplify'

Code simplification.

* ab/imap-send-read-everything-simplify:
imap-send.c: use less verbose strbuf_fread() idiom

Merge branch 'ab/gitignore-discovery-doc'

Doc update.

* ab/gitignore-discovery-doc:
docs: .gitignore parsing is to the top of the repo

Merge branch 'js/ci-windows-update'

GitHub Actions / CI update.

* js/ci-windows-update:
  ci: accelerate the checkout
  ci (vs-build): build with NO_GETTEXT
  artifacts-tar: respect NO_GETTEXT
  ci (windows): transfer also the Git-tracked files to the test jobs
  ci: upgrade to using actions/{up,down}load-artifacts v2
  ci (vs-build): use `cmd` to copy the DLLs, not `powershell`
  ci: use the new GitHub Action to download git-sdk-64-minimal

Merge branch 'ab/send-email-optim'

"git send-email" optimization.

* ab/send-email-optim:
  perl: nano-optimize by replacing Cwd::cwd() with Cwd::getcwd()
  send-email: move trivial config handling to Perl
  perl: lazily load some common Git.pm setup code
  send-email: lazily load modules for a big speedup
  send-email: get rid of indirect object syntax
  send-email: use function syntax instead of barewords
  send-email: lazily shell out to "git var"
  send-email: lazily load config for a big speedup
  send-email: copy "config_regxp" into git-send-email.perl
  send-email: refactor sendemail.smtpencryption config parsing
  send-email: remove non-working support for "sendemail.smtpssl"
  send-email tests: test for boolean variables without a value
  send-email tests: support GIT_TEST_PERL_FATAL_WARNINGS=true

Merge branch 'jk/typofix'

Typofix.

* jk/typofix:
doc/rev-list-options: fix duplicate word typo

doc: pull: fix rebase=false documentation

"git pull --rebase=false" means we merge their history into ours, but
it has been described the other way around.

Cc: Stephen Haberman <stephen@exigencecorp.com>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
[jc: updated the log message]
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-bitmap: clarify comment in filter_bitmap_exclude_type()

The code that eventually became filter_bitmap_exclude_type() was
originally introduced in 4f3bd5606a (pack-bitmap: implement BLOB_NONE
filtering, 2020-02-14) to accelerate BLOB_NONE filters with bitmaps.

In 856e12c18a (pack-bitmap.c: make object filtering functions generic,
2020-05-04), it became filter_bitmap_exclude_type(). But not all of the
comments were updated to be agnostic to the provided type.

Remove the remaining comments which should have been updated in
856e12c18a to reflect the type-agnostic nature of the function.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

doc: clarify description of 'submodule.recurse'

The doc for 'submodule.recurse' starts with "Specifies if commands
recurse into submodles by default". This is not exactly true of all
commands that have a '--recurse-submodules' option. For example, 'git
pull --recurse-submodules' does not run 'git pull' in each submodule,
but rather runs 'git submodule update --recursive' so that the submodule
working trees after the pull matches the commits recorded in the
superproject.

Clarify that by just saying that it enables '--recurse-submodules'.

Note that the way this setting interacts with 'fetch.recurseSubmodules'
and 'push.recurseSubmodules', which can have other values than true or
false, is already documented since 4da9e99e6e (doc: be more precise on
(fetch|push).recurseSubmodules, 2020-04-06).

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

doc/git-config: simplify "override" advice for FILES section

At the end of the FILES section, we indicate that you can override the
regular lookup rules with --global, etc. But:

  - we're missing the --local option

  - we point to GIT_CONFIG instead of --file, but the latter has much
    better documentation

  - we're vague about how the overrides work; the actual option
    descriptions are much better here

So let's just mention the names and point people back to the OPTIONS
section. We could perhaps even delete this paragraph entirely, but the
presence of the names may give people reading FILES a clue about where
to look for more information.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

doc/git-config: clarify GIT_CONFIG environment variable

The scope and utility of the GIT_CONFIG variable was drastically reduced
by dc87183189 (Only use GIT_CONFIG in "git config", not other programs,
2008-06-30). But the documentation in git-config(1) predates that, which
makes it rather misleading.

These days it is really just another way to say "--file". So let's say
that, and explicitly make it clear that it does not impact other Git
commands (like GIT_CONFIG_SYSTEM, etc, would).

I also bumped it to the bottom of the list of variables, and warned
people off of using it. We don't have any plans for deprecation at this
point, but there's little point in encouraging people to use it by
putting it at the top of the list.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

doc/git-config: explain --file instead of referring to GIT_CONFIG

The explanation for the --file option only refers to GIT_CONFIG. This
redirection to an environment variable is confusing, but doubly so
because the description of GIT_CONFIG is out of date.

Let's describe --file from scratch, detailing both the reading and
writing behavior as we do for other similar options like --system, etc.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t0000: fix test if run with TEST_OUTPUT_DIRECTORY

Testcases in t0000 are quite special given that they many of them run
nested testcases to verify that testing functionality itself works as
expected. These nested testcases are realized by writing a new ad-hoc
test script which again sources test-lib.sh, where the new script is
created in a nested subdirectory located beneath the current trash
directory. We then execute the new test script with the nested
subdirectory as current working directory and explicitly re-export
TEST_OUTPUT_DIRECTORY to point to that directory.

While this works as expected in the general case, it falls apart when
the developer has TEST_OUTPUT_DIRECTORY explicitly defined either via
the environment or via config.mak and runs "make test". In that case,
test-lib.sh will clobber the value that we've just carefully set up to
instead contain what the developer has defined. As a result, the
TEST_OUTPUT_DIRECTORY continues to point at the root output directory,
not at the nested one.

This issue causes breakage in the 'test_atexit is run' test case: the
nested test case writes files into "../../", which is assumed to be the
parent's trash directory. But because TEST_OUTPUT_DIRECTORY already
points to to the root output directory, we instead end up writing those
files outside of the output directory. The parent test case will then
try to check whether those files still exist in its own trash directory,
which thus must fail now.

Fix the issue by adding a new TEST_OUTPUT_DIRECTORY_OVERRIDE variable.
If set, then we'll always override the TEST_OUTPUT_DIRECTORY with its
value after sourcing GIT-BUILD-OPTIONS.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

multi-pack-index: fix potential segfault without sub-command

Since cd57bc41bb (builtin/multi-pack-index.c: display usage on
unrecognized command, 2021-03-30) we have used a "usage" label to avoid
having two separate callers of usage_with_options (one when no arguments
are given, and another for unrecognized sub-commands).

But the first caller has been broken since cd57bc41bb, since it will
happily jump to usage without arguments, and then pass argv[0] to the
"unrecognized subcommand" error.

Many compilers will save us from a segfault here, but the end result is
ugly, since it mentions an unrecognized subcommand when we didn't even
pass one, and (on GCC) includes "(null)" in its output.

Move the "usage" label down past the error about unrecognized
subcommands so that it is only triggered when it should be. While we're
at it, bulk up our test coverage in this area, too.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/debug: quote prefix

This makes the empty prefix ("") stand out better.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t0000: clear GIT_SKIP_TESTS before running sub-tests

In t0000, we run several fake "sub-test" suites to verify the behavior
of the test suite. But because we don't clear the parent environment
completely, the sub-tests can be fooled by variables meant for the
parent. For example:

GIT_SKIP_TESTS=t1234 ./t0000-basic.sh

fails when a sub-test expects its fake t1234 to actually run. This
particular pattern is unlikely in practice; we're running a single
script, and there is no t1234 in the real test suite anyway (not yet, at
least). A more real-world example is:

GIT_SKIP_TESTS=t[^0]* make test

to run only the t0* tests.

The fix is conceptually simple: we should clear the GIT_SKIP_TESTS
variable when running the sub-tests, because its contents (if any) will
be meant for the main test suite. This is easy to do centrally in our
sub-test helper.

But there's a catch: some of our tests do set GIT_SKIP_TESTS
intentionally to test the feature. We need to allow them to continue to
set it, but clear it for all the other tests. And the sub-test helper
can't tell if the GIT_SKIP_TESTS it sees is from a test or not. We can
handle this by adding a new option to the helper to let callers specify
the skip list.

I considered adding a more general "--eval" option to let callers set up
the env for the sub-test however they like. That would cover this case
and possible future ones. But the quoting gets awkward for the callers
(since we're now 2 layers deep in evals!), so I went with the simpler
more specific solution.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

test-lib-functions: use test-tool for [de]packetize()

The shell+perl "[de]packetize()" helper functions were added in
4414a150025 (t/lib-git-daemon: add network-protocol helpers,
2018-01-24), and around the same time we added the "pkt-line" helper
in 74e70029615 (test-pkt-line: introduce a packet-line test helper,
2018-03-14).

For some reason it seems we've mostly used the shell+perl version
instead of the helper since then. There were discussions around
88124ab2636 (test-lib-functions: make packetize() more efficient,
2020-03-27) and cacae4329fa (test-lib-functions: simplify packetize()
stdin code, 2020-03-29) to improve them and make them more efficient.

There was one good reason to do so, we needed an equivalent of
"test-tool pkt-line pack", but that command wasn't capable of handling
input with "\n" (a feature) or "\0" (just because it happens to be
printf-based under the hood).

Let's add a "pkt-line-raw" helper for that, and expose is at a
packetize_raw() to go with the existing packetize() on the shell
level, this gives us the smallest amount of change to the tests
themselves.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

The fifth batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'ds/gender-neutral-doc'

Update the documentation not to assume users are of certain gender
and adds to guidelines to do so.

* ds/gender-neutral-doc:
  *: fix typos
  comments: avoid using the gender of our users
  doc: avoid using the gender of other people

Merge branch 'jt/partial-clone-submodule-1'

Prepare the internals for lazily fetching objects in submodules
from their promisor remotes.

* jt/partial-clone-submodule-1:
  promisor-remote: teach lazy-fetch in any repo
  run-command: refactor subprocess env preparation
  submodule: refrain from filtering GIT_CONFIG_COUNT
  promisor-remote: support per-repository config
  repository: move global r_f_p_c to repo struct

Merge branch 'ab/struct-init'

Code cleanup around struct_type_init() functions.

* ab/struct-init:
  string-list.h users: change to use *_{nodup,dup}()
  string-list.[ch]: add a string_list_init_{nodup,dup}()
  dir.[ch]: replace dir_init() with DIR_INIT
  *.c *_init(): define in terms of corresponding *_INIT macro
  *.h: move some *_INIT to designated initializers

Merge branch 'dd/test-stdout-count-lines'

Tiny test clean-up.

* dd/test-stdout-count-lines:
  t6402: preserve git exit status code
  t6400: preserve git ls-files exit status code
  test-lib-functions: introduce test_stdout_line_count

Merge branch 'hn/refs-test-cleanup'

Test clean-up.

* hn/refs-test-cleanup:
t7509: avoid direct file access for writing CHERRY_PICK_HEAD
t1415: avoid direct filesystem access for writing refs

Merge branch 'rs/khash-alloc-cleanup'

Code clean-up.

* rs/khash-alloc-cleanup:
khash: clarify that allocations never fail

Merge branch 'ar/help-micro-cleanup'

Tiny code clean-up.

* ar/help-micro-cleanup:
help: convert git_cmd to page in one place

Merge branch 'ar/submodule-helper-include-cleanup'

Code clean-up.

* ar/submodule-helper-include-cleanup:
submodule--helper: remove redundant include

Merge branch 'ab/bundle-updates'

Code clean-up and leak plugging in "git bundle".

* ab/bundle-updates:
  bundle: remove "ref_list" in favor of string-list.c API
  bundle.c: use a temporary variable for OIDs and names
  bundle cmd: stop leaking memory from parse_options_cmd_bundle()

Merge branch 'hn/refs-iterator-peel-returns-boolean'

Tiny API tweak.

* hn/refs-iterator-peel-returns-boolean:
refs: make explicit that ref_iterator_peel returns boolean

Merge branch 'ab/mktag-tests'

Fill test gaps.

* ab/mktag-tests:
  mktag tests: test fast-export
  mktag tests: test for-each-ref
  mktag tests: test update-ref and reachable fsck
  mktag tests: test hash-object --literally and unreachable fsck
  mktag tests: invert --no-strict test
  mktag tests: parse out options in helper

Merge branch 'ab/show-branch-tests'

Fill test gaps.

* ab/show-branch-tests:
  show-branch tests: add missing tests
  show-branch: don't <COLOR></RESET> for space characters
  show-branch tests: modernize test code
  show-branch tests: rename the one "show-branch" test file

Merge branch 'ab/fetch-negotiate-segv-fix'

Code recently added to support common ancestry negotiation during
"git push" did not sanity check its arguments carefully enough.

* ab/fetch-negotiate-segv-fix:
  fetch: fix segfault in --negotiate-only without --negotiation-tip=*
  fetch: document the --negotiate-only option
  send-pack.c: move "no refs in common" abort earlier

Merge branch 'ab/make-delete-on-error'

Use ".DELETE_ON_ERROR" pseudo target to simplify our Makefile.

* ab/make-delete-on-error:
Makefile: add and use the ".DELETE_ON_ERROR" flag

Merge branch 'ew/mmap-failures'

Error message update.

* ew/mmap-failures:
xmmap: inform Linux users of tuning knobs on ENOMEM

Merge branch 'js/config-mak-windows-pcre-fix'

Whitespace fix.

* js/config-mak-windows-pcre-fix:
config.mak.uname: PCRE1 cleanup

Merge branch 'js/gfw-system-config-loc-fix'

Update the location of system-side configuration file on Windows.

* js/gfw-system-config-loc-fix:
  config: normalize the path of the system gitconfig
  cmake(windows): set correct path to the system Git config
  mingw: move Git for Windows' system config where users expect it

Merge branch 'ks/submodule-cleanup'

Code cleanup.

* ks/submodule-cleanup:
submodule: remove unnecessary `prefix` based option logic

Merge branch 'tb/midx-use-checksum'

When rebuilding the multi-pack index file reusing an existing one,
we used to blindly trust the existing file and ended up carrying
corrupted data into the updated file, which has been corrected.

* tb/midx-use-checksum:
  midx: report checksum mismatches during 'verify'
  midx: don't reuse corrupt MIDXs when writing
  commit-graph: rewrite to use checksum_valid()
  csum-file: introduce checksum_valid()

Merge branch 'en/merge-dir-rename-corner-case-fix'

The merge code had funny interactions between content based rename
detection and directory rename detection.

* en/merge-dir-rename-corner-case-fix:
  merge-recursive: handle rename-to-self case
  merge-ort: ensure we consult df_conflict and path_conflicts
  t6423: test directory renames causing rename-to-self

Merge branch 'en/ort-perf-batch-13'

Performance tweaks of "git merge -sort" around lazy fetching of objects.

* en/ort-perf-batch-13:
  merge-ort: add prefetching for content merges
  diffcore-rename: use a different prefetch for basename comparisons
  diffcore-rename: allow different missing_object_cb functions
  t6421: add tests checking for excessive object downloads during merge
  promisor-remote: output trace2 statistics for number of objects fetched

Merge branch 'en/ort-perf-batch-12'

More fix-ups and optimization to "merge -sort".

* en/ort-perf-batch-12:
  merge-ort: miscellaneous touch-ups
  Fix various issues found in comments
  diffcore-rename: avoid unnecessary strdup'ing in break_idx
  merge-ort: replace string_list_df_name_compare with faster alternative

CodingGuidelines: recommend gender-neutral description

Technical writing seeks to convey information with minimal
friction. One way that a reader can experience friction is if they
encounter a description of "a user" that is later simplified using a
gendered pronoun. If the reader does not consider that pronoun to
apply to them, then they can experience cognitive dissonance that
removes focus from the information.

Give some basic tips to guide us avoid unnecessary uses of gendered
description.

Using a gendered pronoun is appropriate when referring to a specific
person.

There are acceptable existing uses of gendered pronouns within the
Git codebase, such as:

* References to real people (e.g. Linus Torvalds, "the Git maintainer").
  Do not misgender real people. If there is any doubt to the gender of a
  person, then avoid using pronouns.

* References to fictional people with clear genders (e.g. Alice and
  Bob).

* Sample text used in test cases (e.g t3702, t6432).

* The official text of the GPL license contains uses of "he or she",
  but using singular "they" (or modifying the text in some other
  way) is not within the scope of the Git project.

* Literal email messages in Documentation/howto/ should not be edited
  for grammatical concerns such as this, unless we update the entire
  document to fit the standard documentation format. If such an effort is
  taken on, then the authorship would change and no longer refer to the
  exact mail message.

* External projects consumed in contrib/ should not deviate solely for
  style reasons. Recommended edits should be contributed to those
  projects directly.

Other cases within the Git project were cleaned up by the previous
changes.

Co-authored-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

parse-options: don't complete option aliases by default

Since 'OPT_ALIAS' was created in 5c387428f1 (parse-options: don't emit
"ambiguous option" for aliases, 2019-04-29), 'git clone
--git-completion-helper', which is used by the Bash completion script to
list options accepted by clone (via '__gitcomp_builtin'), lists both
'--recurse-submodules' and its alias '--recursive', which was not the
case before since '--recursive' had the PARSE_OPT_HIDDEN flag set, and
options with this flag are skipped by 'parse-options.c::show_gitcomp',
which implements 'git <cmd> --git-completion-helper'.

This means that typing 'git clone --recurs<TAB>' will yield both
'--recurse-submodules' and '--recursive', which is not ideal since both
do the same thing, and so the completion should directly complete the
canonical option.

At the point where 'show_gitcomp' is called in 'parse_options_step',
'preprocess_options' was already called in 'parse_options', so any
aliases are now copies of the original options with a modified help text
indicating they are aliases.

Helpfully, since 64cc539fd2 (parse-options: don't leak alias help
messages, 2021-03-21) these copies have the PARSE_OPT_FROM_ALIAS flag
set, so check that flag early in 'show_gitcomp' and do not print them,
unless the user explicitely requested that *all* completion be shown (by
setting 'GIT_COMPLETION_SHOW_ALL'). After all, if we want to encourage
the use of '--recurse-submodules' over '--recursive', we'd better just
suggest the former.

The only other options alias is 'log' and friends' '--mailmap', which is
an alias for '--use-mailmap', but the Bash completion helpers for these
commands do not use '__gitcomp_builtin', and thus are unnaffected by
this change.

Test the new behaviour in t9902-completion.sh. As a side effect, this
also tests the correct behaviour of GIT_COMPLETION_SHOW_ALL, which was
not tested before. Note that since '__gitcomp_builtin' caches the
options it shows, we need to re-source the completion script to clear
that cache for the second test.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

rename: bump limit defaults yet again

These were last bumped in commit 92c57e5c1d29 (bump rename limit
defaults (again), 2011-02-19), and were bumped both because processors
had gotten faster, and because people were getting ugly merges that
caused problems and reporting it to the mailing list (suggesting that
folks were willing to spend more time waiting).

Since that time:
  * Linus has continued recommending kernel folks to set
    diff.renameLimit=0 (maps to 32767, currently)
  * Folks with repositories with lots of renames were happy to set
    merge.renameLimit above 32767, once the code supported that, to
    get correct cherry-picks
  * Processors have gotten faster
  * It has been discovered that the timing methodology used last time
    probably used too large example files.

The last point is probably worth explaining a bit more:

  * The "average" file size used appears to have been average blob size
    in the linux kernel history at the time (probably v2.6.25 or
    something close to it).
  * Since bigger files are modified more frequently, such a computation
    weights towards larger files.
  * Larger files may be more likely to be modified over time, but are
    not more likely to be renamed -- the mean and median blob size
    within a tree are a bit higher than the mean and median of blob
    sizes in the history leading up to that version for the linux
    kernel.
  * The mean blob size in v2.6.25 was half the average blob size in
    history leading to that point
  * The median blob size in v2.6.25 was about 40% of the mean blob size
    in v2.6.25.
  * Since the mean blob size is more than double the median blob size,
    any file as big as the mean will not be compared to any files of
    median size or less (because they'd be more than 50% dissimilar).
  * Since it is the number of files compared that provides the O(n^2)
    behavior, median-sized files should matter more than mean-sized
    ones.

The combined effect of the above is that the file size used in past
calculations was likely about 5x too large.  Combine that with a CPU
performance improvement of ~30%, and we can increase the limits by
a factor of sqrt(5/(1-.3)) = 2.67, while keeping the original stated
time limits.

Keeping the same approximate time limit probably makes sense for
diff.renameLimit (there is no progress feedback in e.g. git log -p),
but the experience above suggests merge.renameLimit could be extended
significantly.  In fact, it probably would make sense to have an
unlimited default setting for merge.renameLimit, but that would
likely need to be coupled with changes to how progress is displayed.
(See https://lore.kernel.org/git/YOx+Ok%2FEYvLqRMzJ@coredump.intra.peff.net/
for details in that area.)  For now, let's just bump the approximate
time limit from 10s to 1m.

(Note: We do not want to use actual time limits, because getting results
that depend on how loaded your system is that day feels bad, and because
we don't discover that we won't get all the renames until after we've
put in a lot of work rather than just upfront telling the user there are
too many files involved.)

Using the original time limit of 2s for diff.renameLimit, and bumping
merge.renameLimit from 10s to 60s, I found the following timings using
the simple script at the end of this commit message (on an AWS c5.xlarge
which reports as "Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz"):

      N   Timing
   1300    1.995s
   7100   59.973s

So let's round down to nice even numbers and bump the limits from
400->1000, and from 1000->7000.

Here is the measure_rename_perf script (adapted from
https://lore.kernel.org/git/20080211113516.GB6344@coredump.intra.peff.net/
in particular to avoid triggering the linear handling from
basename-guided rename detection):

    #!/bin/bash

    n=$1; shift

    rm -rf repo
    mkdir repo && cd repo
    git init -q -b main

    mkdata() {
      mkdir $1
      for i in `seq 1 $2`; do
        (sed "s/^/$i /" <../sample
         echo tag: $1
        ) >$1/$i
      done
    }

    mkdata initial $n
    git add .
    git commit -q -m initial

    mkdata new $n
    git add .
    cd new
    for i in *; do git mv $i $i.renamed; done
    cd ..
    git rm -q -rf initial
    git commit -q -m new

    time git diff-tree -M -l0 --summary HEAD^ HEAD

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

diffcore-rename: treat a rename_limit of 0 as unlimited

In commit 89973554b52c (diffcore-rename: make diff-tree -l0 mean
-l<large>, 2017-11-29), -l0 was given a special magical "large" value,
but one which was not large enough for some uses (as can be seen from
commit 9f7e4bfa3b6d (diff: remove silent clamp of renameLimit,
2017-11-13). Make 0 (or a negative value) be treated as unlimited
instead and update the documentation to mention this.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

doc: clarify documentation for rename/copy limits

A few places in the docs implied that rename/copy detection is always
quadratic or that all (unpaired) files were involved in the quadratic
portion of rename/copy detection.  The following two commits each
introduced an exception to this:

    9027f53cb505 (Do linear-time/space rename logic for exact renames,
                  2007-10-25)
    bd24aa2f97a0 (diffcore-rename: guide inexact rename detection based
                  on basenames, 2021-02-14)

(As a side note, for copy detection, the basename guided inexact rename
detection is turned off and the exact renames will only result in
sources (without the dests) being removed from the set of files used in
quadratic detection.  So, for copy detection, the documentation was
closer to correct.)

Avoid implying that all files involved in rename/copy detection are
subject to the full quadratic algorithm.  While at it, also note the
default values for all these settings.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

diff: correct warning message when renameLimit exceeded

The warning when quadratic rename detection was skipped referred to
"inexact rename detection".  For years, the only linear portion of
rename detection was looking for exact renames, so "inexact rename
detection" was an accurate way to refer to the quadratic portion of
rename detection.  However, that changed with commit bd24aa2f97a0
(diffcore-rename: guide inexact rename detection based on basenames,
2021-02-14).  Let's instead use the term "exhaustive rename detection"
to refer to the quadratic portion.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

worktree: teach `add` to accept --reason <string> with --lock

The default reason stored in the lock file, "added with --lock",
is unlikely to be what the user would have given in a separate
`git worktree lock` command. Allowing `--reason` to be specified
along with `--lock` when adding a working tree gives the user control
over the reason for locking without needing a second command.

Signed-off-by: Stephen Manz <smanz@alum.mit.edu>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci(check-whitespace): restrict to the intended commits

During a run of the `check-whitespace` we want to verify that the
commits introduced in the Pull Request have no whitespace issues. We
only want to look at those commits, not the upstream commits (because
the contributor cannot do anything about the latter).

However, by using the `-<count>` form in `git log --check`, we run the
risk of looking at the wrong commits. The reason is that the
`actions/checkout` step does _not_ check out the tip commit of the Pull
Request's branch: Instead, it checks out a merge commit that merges that
branch into the target branch. For that reason, we already adjust the
commit count by incrementing it, but that is not enough: if the upstream
branch has newer commits, they are traversed _first_. And obviously we
will then miss some of the commits that we _actually_ wanted to look at.

Therefore, let's be careful to stop assuming a linear, up to date commit
topology in the contributed commits, and instead specify the correct
commit range.

Unfortunately, this means that we no longer can rely on a shallow clone:
There is no way of knowing just how many commits the upstream branch
advanced after the commit from which the PR branch branched off. So
let's just go with a full clone instead, and be safe rather than sorry
(if we have "too shallow" a situation, a commit range `@{u}..` may very
well include a shallow commit itself, and the output of `git show
--check <shallow>` is _not_ pretty).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci(check-whitespace): stop requiring a read/write token

As part of some recent security tightening, GitHub introduced the
ability to configure GitHub workflows to be run with a read-only token.
This is much more secure, in particular when working in a public
repository: While the regular read/write token might be restricted to
writing to the current branch, it is not necessarily restricted to
access only the current Pull Request.

However, the `check-whitespace` workflow threw a wrench into this plan:
it _requires_ write access (because it wants to add a PR comment in case
of a whitespace issue).

Let's just skip that PR comment. The user can always click through to
the actual error, even if it is slightly less convenient.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t1092: document bad sparse-checkout behavior

There are several situations where a repository with sparse-checkout
enabled will act differently than a normal repository, and in ways that
are not intentional. The test t1092-sparse-checkout-compatibility.sh
documents some of these deviations, but a casual reader might think
these are intentional behavior changes.

Add comments on these tests that make it clear that these behaviors
should be updated. Using 'NEEDSWORK' helps contributors find that these
are potential areas for improvement.

Helped-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fsmonitor: integrate with sparse index

If we need to expand a sparse-index into a full one, then the FS Monitor
bitmap is going to be incorrect. Ensure that we start fresh at such an
event.

While this is currently a performance drawback, the eventual hope of the
sparse-index feature is that these expansions will be rare and hence we
will be able to keep the FS Monitor data accurate across multiple Git
commands.

These tests are added to demonstrate that the behavior is the same
across a full index and a sparse index, but also that file modifications
to a tracked directory outside of the sparse cone will trigger
ensure_full_index().

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

wt-status: expand added sparse directory entries

It is difficult, but possible, to get into a state where we intend to
add a directory that is outside of the sparse-checkout definition. Add a
test to t1092-sparse-checkout-compatibility.sh that demonstrates this
using a combination of 'git reset --mixed' and 'git checkout --orphan'.

This test failed before because the output of 'git status
--porcelain=v2' would not match on the lines for folder1/:

* The sparse-checkout repo (with a full index) would output each path
name that is intended to be added.

* The sparse-index repo would only output that "folder1/" is staged for
addition.

The status should report the full list of files to be added, and so this
sparse-directory entry should be expanded to a full list when reaching
it inside the wt_status_collect_changes_initial() method. Use
read_tree_at() to assist.

Somehow, this loop over the cache entries was not guarded by
ensure_full_index() as intended.

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

status: use sparse-index throughout

By testing 'git -c core.fsmonitor= status -uno', we can check for the
simplest index operations that can be made sparse-aware. The necessary
implementation details are already integrated with sparse-checkout, so
modify command_requires_full_index to be zero for cmd_status().

In refresh_index(), we loop through the index entries to refresh their
stat() information. However, sparse directories have no stat()
information to populate. Ignore these entries.

This allows 'git status' to no longer expand a sparse index to a full
one. This is further tested by dropping the "-uno" option and adding an
untracked file into the worktree.

The performance test p2000-sparse-checkout-operations.sh demonstrates
these improvements:

Test                                  HEAD~1           HEAD
-----------------------------------------------------------------------------
2000.2: git status (full-index-v3)    0.31(0.30+0.05)  0.31(0.29+0.06) +0.0%
2000.3: git status (full-index-v4)    0.31(0.29+0.07)  0.34(0.30+0.08) +9.7%
2000.4: git status (sparse-index-v3)  2.35(2.28+0.10)  0.04(0.04+0.05) -98.3%
2000.5: git status (sparse-index-v4)  2.35(2.24+0.15)  0.05(0.04+0.06) -97.9%

Note that since HEAD~1 was expanding the sparse index by parsing trees,
it was artificially slower than the full index case. Thus, the 98%
improvement is misleading, and instead we should celebrate the 0.34s to
0.05s improvement of 85%. This is more indicative of the peformance
gains we are expecting by using a sparse index.

Note: we are dropping the assignment of core.fsmonitor here. This is not
necessary for the test script as we are not altering the config any
other way. Correct integration with FS Monitor will be validated in
later changes.

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>