git.ipfire.org Git - thirdparty/git.git/log

merge-ort: call diffcore_rename() directly

We want to pass additional information to diffcore_rename() (or some
variant thereof) without plumbing that extra information through
diff_tree_oid() and diffcore_std().  Further, since we will need to
gather additional special information related to diffs and are walking
the trees anyway in collect_merge_info(), it seems odd to have
diff_tree_oid()/diffcore_std() repeat those tree walks.  And there may
be times where we can avoid traversing into a subtree in
collect_merge_info() (based on additional information at our disposal),
that the basic diff logic would be unable to take advantage of.  For all
these reasons, just create the add and delete pairs ourself and then
call diffcore_rename() directly.

This change is primarily about enabling future optimizations; the
advantage of avoiding extra tree traversals is small compared to the
cost of rename detection, and the advantage of avoiding the extra tree
traversals is somewhat offset by the extra time spent in
collect_merge_info() collecting the additional data anyway.  However...

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:       13.294 s ±  0.103 s    12.775 s ±  0.062 s
    mega-renames:    187.248 s ±  0.882 s   188.754 s ±  0.284 s
    just-one-mega:     5.557 s ±  0.017 s     5.599 s ±  0.019 s

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

gitdiffcore doc: mention new preliminary step for rename detection

The last few patches have introduced a new preliminary step when rename
detection is on but both break detection and copy detection are off.
Document this new step. While we're at it, add a testcase that checks
the new behavior as well.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

diffcore-rename: guide inexact rename detection based on basenames

Make use of the new find_basename_matches() function added in the last
two patches, to find renames more rapidly in cases where we can match up
files based on basenames.  As a quick reminder (see the last two commit
messages for more details), this means for example that
docs/extensions.txt and docs/config/extensions.txt are considered likely
renames if there are no remaining 'extensions.txt' files elsewhere among
the added and deleted files, and if a similarity check confirms they are
similar, then they are marked as a rename without looking for a better
similarity match among other files.  This is a behavioral change, as
covered in more detail in the previous commit message.

We do not use this heuristic together with either break or copy
detection.  The point of break detection is to say that filename
similarity does not imply file content similarity, and we only want to
know about file content similarity.  The point of copy detection is to
use more resources to check for additional similarities, while this is
an optimization that uses far less resources but which might also result
in finding slightly fewer similarities.  So the idea behind this
optimization goes against both of those features, and will be turned off
for both.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:       13.815 s ±  0.062 s    13.294 s ±  0.103 s
    mega-renames:   1799.937 s ±  0.493 s   187.248 s ±  0.882 s
    just-one-mega:    51.289 s ±  0.019 s     5.557 s ±  0.017 s

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

diffcore-rename: complete find_basename_matches()

It is not uncommon in real world repositories for the majority of file
renames to not change the basename of the file; i.e. most "renames" are
just a move of files into different directories.  We can make use of
this to avoid comparing all rename source candidates with all rename
destination candidates, by first comparing sources to destinations with
the same basenames.  If two files with the same basename are
sufficiently similar, we record the rename; if not, we include those
files in the more exhaustive matrix comparison.

This means we are adding a set of preliminary additional comparisons,
but for each file we only compare it with at most one other file.  For
example, if there was a include/media/device.h that was deleted and a
src/module/media/device.h that was added, and there are no other
device.h files in the remaining sets of added and deleted files after
exact rename detection, then these two files would be compared in the
preliminary step.

This commit does not yet actually employ this new optimization, it
merely adds a function which can be used for this purpose.  The next
commit will do the necessary plumbing to make use of it.

Note that this optimization might give us different results than without
the optimization, because it's possible that despite files with the same
basename being sufficiently similar to be considered a rename, there's
an even better match between files without the same basename.  I think
that is okay for four reasons: (1) it's easy to explain to the users
what happened if it does ever occur (or even for them to intuitively
figure out), (2) as the next patch will show it provides such a large
performance boost that it's worth the tradeoff, and (3) it's somewhat
unlikely that despite having unique matching basenames that other files
serve as better matches.  Reason (4) takes a full paragraph to
explain...

If the previous three reasons aren't enough, consider what rename
detection already does.  Break detection is not the default, meaning
that if files have the same _fullname_, then they are considered related
even if they are 0% similar.  In fact, in such a case, we don't even
bother comparing the files to see if they are similar let alone
comparing them to all other files to see what they are most similar to.
Basically, we override content similarity based on sufficient filename
similarity.  Without the filename similarity (currently implemented as
an exact match of filename), we swing the pendulum the opposite
direction and say that filename similarity is irrelevant and compare a
full N x M matrix of sources and destinations to find out which have the
most similar contents.  This optimization just adds another form of
filename similarity comparison, but augments it with a file content
similarity check as well.  Basically, if two files have the same
basename and are sufficiently similar to be considered a rename, mark
them as such without comparing the two to all other rename candidates.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

diffcore-rename: compute basenames of source and dest candidates

We want to make use of unique basenames among remaining source and
destination files to help inform rename detection, so that more likely
pairings can be checked first.  (src/moduleA/foo.txt and
source/module/A/foo.txt are likely related if there are no other
'foo.txt' files among the remaining deleted and added files.)  Add a new
function, not yet used, which creates a map of the unique basenames
within rename_src and another within rename_dst, together with the
indices within rename_src/rename_dst where those basenames show up.
Non-unique basenames still show up in the map, but have an invalid index
(-1).

This function was inspired by the fact that in real world repositories,
files are often moved across directories without changing names.  Here
are some sample repositories and the percentage of their historical
renames (as of early 2020) that preserved basenames:
  * linux: 76%
  * gcc: 64%
  * gecko: 79%
  * webkit: 89%
These statistics alone don't prove that an optimization in this area
will help or how much it will help, since there are also unpaired adds
and deletes, restrictions on which basenames we consider, etc., but it
certainly motivated the idea to try something in this area.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t4001: add a test comparing basename similarity and content similarity

Add a simple test where a removed file is similar to two different added
files; one of them has the same basename, and the other has a slightly
higher content similarity. In the current test, content similarity is
weighted higher than filename similarity.

Subsequent commits will add a new rule that weighs a mixture of filename
similarity and content similarity in a manner that will change the
outcome of this testcase.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

diffcore-rename: filter rename_src list when possible

We have to look at each entry in rename_src a total of rename_dst_nr
times.  When we're not detecting copies, any exact renames or ignorable
rename paths will just be skipped over.  While checking that these can
be skipped over is a relatively cheap check, it's still a waste of time
to do that check more than once, let alone rename_dst_nr times.  When
rename_src_nr is a few thousand times bigger than the number of relevant
sources (such as when cherry-picking a commit that only touched a
handful of files, but from a side of history that has different names
for some high level directories), this time can add up.

First make an initial pass over the rename_src array and move all the
relevant entries to the front, so that we can iterate over just those
relevant entries.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:       14.119 s ±  0.101 s    13.815 s ±  0.062 s
    mega-renames:   1802.044 s ±  0.828 s  1799.937 s ±  0.493 s
    just-one-mega:    51.391 s ±  0.028 s    51.289 s ±  0.019 s

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

diffcore-rename: no point trying to find a match better than exact

diffcore_rename() had some code to avoid having destination paths that
already had an exact rename detected from being re-checked for other
renames.  Source paths, however, were re-checked because we wanted to
allow the possibility of detecting copies.  But if copy detection isn't
turned on, then this merely amounts to attempting to find a
better-than-exact match, which naturally ends up being an expensive
no-op.  In particular, copy detection is never turned on by the merge
machinery.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:       14.263 s ±  0.053 s    14.119 s ±  0.101 s
    mega-renames:   5504.231 s ±  5.150 s  1802.044 s ±  0.828 s
    just-one-mega:   158.534 s ±  0.498 s    51.391 s ±  0.028 s

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Sync with maint

Merge branch 'en/merge-ort-perf'

The "ort" merge strategy.

* en/merge-ort-perf:
  merge-ort: begin performance work; instrument with trace2_region_* calls
  merge-ort: ignore the directory rename split conflict for now
  merge-ort: fix massive leak

Merge branch 'en/ort-directory-rename'

ORT merge strategy learns to infer "renamed directory" while
merging.

* en/ort-directory-rename:
  merge-ort: fix a directory rename detection bug
  merge-ort: process_renames() now needs more defensiveness
  merge-ort: implement apply_directory_rename_modifications()
  merge-ort: add a new toplevel_dir field
  merge-ort: implement handle_path_level_conflicts()
  merge-ort: implement check_for_directory_rename()
  merge-ort: implement apply_dir_rename() and check_dir_renamed()
  merge-ort: implement compute_collisions()
  merge-ort: modify collect_renames() for directory rename handling
  merge-ort: implement handle_directory_level_conflicts()
  merge-ort: implement compute_rename_counts()
  merge-ort: copy get_renamed_dir_portion() from merge-recursive.c
  merge-ort: add outline of get_provisional_directory_renames()
  merge-ort: add outline for computing directory renames
  merge-ort: collect which directories are removed in dirs_removed
  merge-ort: initialize and free new directory rename data structures
  merge-ort: add new data structures for directory rename detection

Merge branch 'tb/ci-run-cocci-with-18.04' into maint

* tb/ci-run-cocci-with-18.04:
.github/workflows/main.yml: run static-analysis on bionic

Merge branch 'tb/ci-run-cocci-with-18.04'

The version of Ubuntu Linux used by default at GitHub Actions CI
has been updated to one that lack coccinelle; until it gets fixed,
work it around by sticking to the previous release (18.04).

* tb/ci-run-cocci-with-18.04:
.github/workflows/main.yml: run static-analysis on bionic

The seventh batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'ab/detox-gettext-tests'

Get rid of "GETTEXT_POISON" support altogether, which may or may
not be controversial.

* ab/detox-gettext-tests:
  tests: remove uses of GIT_TEST_GETTEXT_POISON=false
  tests: remove support for GIT_TEST_GETTEXT_POISON
  ci: remove GETTEXT_POISON jobs

Merge branch 'ab/grep-pcre-invalid-utf8'

Update support for invalid UTF-8 in PCRE2.

* ab/grep-pcre-invalid-utf8:
grep/pcre2: better support invalid UTF-8 haystacks
grep/pcre2 tests: don't rely on invalid UTF-8 data test

Merge branch 'ab/retire-pcre1'

The support for deprecated PCRE1 library has been dropped.

* ab/retire-pcre1:
Remove support for v1 of the PCRE library
config.mak.uname: remove redundant NO_LIBPCRE1_JIT flag

Merge branch 'jk/pretty-lazy-load-commit'

Some pretty-format specifiers do not need the data in commit object
(e.g. "%H"), but we were over-eager to load and parse it, which has
been made even lazier.

* jk/pretty-lazy-load-commit:
pretty: lazy-load commit data when expanding user-format

Merge branch 'ds/more-index-cleanups'

Cleaning various codepaths up.

* ds/more-index-cleanups:
  t1092: test interesting sparse-checkout scenarios
  test-lib: test_region looks for trace2 regions
  sparse-checkout: load sparse-checkout patterns
  name-hash: use trace2 regions for init
  repository: add repo reference to index_state
  fsmonitor: de-duplicate BUG()s around dirty bits
  cache-tree: extract subtree_pos()
  cache-tree: simplify verify_cache() prototype
  cache-tree: clean up cache_tree_update()

Merge branch 'rs/worktree-list-verbose'

`git worktree list` now annotates worktrees as prunable, shows
locked and prunable attributes in --porcelain mode, and gained
a --verbose option.

* rs/worktree-list-verbose:
  worktree: teach `list` verbose mode
  worktree: teach `list` to annotate prunable worktree
  worktree: teach `list --porcelain` to annotate locked worktree
  t2402: ensure locked worktree is properly cleaned up
  worktree: teach worktree_lock_reason() to gently handle main worktree
  worktree: teach worktree to lazy-load "prunable" reason
  worktree: libify should_prune_worktree()

Merge branch 'js/rebase-i-commit-cleanup-fix'

When "git rebase -i" processes "fixup" insn, there is no reason to
clean up the commit log message, but we did the usual stripspace
processing. This has been corrected.

* js/rebase-i-commit-cleanup-fix:
rebase -i: do leave commit message intact in fixup! chains

Merge branch 'jk/t0000-cleanups'

Code clean-up.

* jk/t0000-cleanups:
  t0000: consistently use single quotes for outer tests
  t0000: run cleaning test inside sub-test
  t0000: run prereq tests inside sub-test
  t0000: keep clean-up tests together

Merge branch 'sg/t7800-difftool-robustify'

Test fix.

* sg/t7800-difftool-robustify:
t7800-difftool: don't accidentally match tmp dirs

Merge branch 'ab/lose-grep-debug'

Lose the debugging aid that may have been useful in the past, but
no longer is, in the "grep" codepaths.

* ab/lose-grep-debug:
grep/log: remove hidden --debug and --grep-debug options

Merge branch 'jk/use-oid-pos'

Code clean-up to ensure our use of hashtables using object names as
keys use the "struct object_id" objects, not the raw hash values.

* jk/use-oid-pos:
  oid_pos(): access table through const pointers
  hash_pos(): convert to oid_pos()
  rerere: use strmap to store rerere directories
  rerere: tighten rr-cache dirname check
  rerere: check dirname format while iterating rr_cache directory
  commit_graft_pos(): take an oid instead of a bare hash

Sync with 2.30.1

.github/workflows/main.yml: run static-analysis on bionic

GitHub Actions is transitioning workflow steps that run on
'ubuntu-latest' from 18.04 to 20.04 [1].

This works fine in all steps except the static-analysis one, since
Coccinelle isn't available on Ubuntu focal (it is only available in the
universe suite).

Until Coccinelle can be installed from 20.04's main suite, pin the
static-analysis build to run on 18.04, where it can be installed by
default.

[1]: https://github.com/actions/virtual-environments/issues/1816

Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Git 2.30.1

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'pb/ci-matrix-wo-shortcut' into maint

Our setting of GitHub CI test jobs were a bit too eager to give up
once there is even one failure found. Tweak the knob to allow
other jobs keep running even when we see a failure, so that we can
find more failures in a single run.

* pb/ci-matrix-wo-shortcut:
ci: do not cancel all jobs of a matrix if one fails

Merge branch 'pb/blame-funcname-range-userdiff' into maint

Test fix.

* pb/blame-funcname-range-userdiff:
annotate-tests: quote variable expansions containing path names

Merge branch 'jk/p5303-sed-portability-fix' into maint

A perf script was made more portable.

* jk/p5303-sed-portability-fix:
p5303: avoid sed GNU-ism

Merge branch 'ab/branch-sort' into maint

The implementation of "git branch --sort" wrt the detached HEAD
display has always been hacky, which has been cleaned up.

* ab/branch-sort:
  branch: show "HEAD detached" first under reverse sort
  branch: sort detached HEAD based on a flag
  ref-filter: move ref_sorting flags to a bitfield
  ref-filter: move "cmp_fn" assignment into "else if" arm
  ref-filter: add braces to if/else if/else chain
  branch tests: add to --sort tests
  branch: change "--local" to "--list" in comment

Merge branch 'ma/more-opaque-lock-file' into maint

Code clean-up.

* ma/more-opaque-lock-file:
  read-cache: try not to peek into `struct {lock_,temp}file`
  refs/files-backend: don't peek into `struct lock_file`
  midx: don't peek into `struct lock_file`
  commit-graph: don't peek into `struct lock_file`
  builtin/gc: don't peek into `struct lock_file`

Merge branch 'dl/p4-encode-after-kw-expansion' into maint

Text encoding fix for "git p4".

* dl/p4-encode-after-kw-expansion:
git-p4: fix syncing file types with pattern

Merge branch 'ar/t6016-modernise' into maint

Test update.

* ar/t6016-modernise:
t6016: move to lib-log-graph.sh framework

Merge branch 'zh/arg-help-format' into maint

Clean up option descriptions in "git cmd --help".

* zh/arg-help-format:
builtin/*: update usage format
parse-options: format argh like error messages

Merge branch 'ma/doc-pack-format-varint-for-sizes' into maint

Doc update.

* ma/doc-pack-format-varint-for-sizes:
pack-format.txt: document sizes at start of delta data

Merge branch 'ma/t1300-cleanup' into maint

Code clean-up.

* ma/t1300-cleanup:
  t1300: don't needlessly work with `core.foo` configs
  t1300: remove duplicate test for `--file no-such-file`
  t1300: remove duplicate test for `--file ../foo`

Merge branch 'fc/t6030-bisect-reset-removes-auxiliary-files' into maint

A 3-year old test that was not testing anything useful has been
corrected.

* fc/t6030-bisect-reset-removes-auxiliary-files:
test: bisect-porcelain: fix location of files

Sync with maint

The sixth batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'sg/test-stress-jobs'

Test framework fix.

* sg/test-stress-jobs:
test-lib: prevent '--stress-jobs=X' from being ignored

Merge branch 'jk/weather-balloon-require-variadic-macro'

We've carried compatibility codepaths for compilers without
variadic macros for quite some time, but the world may be ready for
them to be removed. Force compilation failure on exotic platforms
where variadic macros are not available to find out who screams in
such a way that we can easily revert if it turns out that the world
is not yet ready.

* jk/weather-balloon-require-variadic-macro:
git-compat-util: always enable variadic macros

Merge branch 'pb/ci-matrix-wo-shortcut'

Our setting of GitHub CI test jobs were a bit too eager to give up
once there is even one failure found. Tweak the knob to allow
other jobs keep running even when we see a failure, so that we can
find more failures in a single run.

* pb/ci-matrix-wo-shortcut:
ci: do not cancel all jobs of a matrix if one fails

Merge branch 'pb/blame-funcname-range-userdiff'

Test fix.

* pb/blame-funcname-range-userdiff:
annotate-tests: quote variable expansions containing path names

Merge branch 'jk/p5303-sed-portability-fix'

A perf script was made more portable.

* jk/p5303-sed-portability-fix:
p5303: avoid sed GNU-ism

Merge branch 'jv/pack-objects-narrower-ref-iteration'

The "pack-objects" command needs to iterate over all the tags when
automatic tag following is enabled, but it actually iterated over
all refs and then discarded everything outside "refs/tags/"
hierarchy, which was quite wasteful.

* jv/pack-objects-narrower-ref-iteration:
builtin/pack-objects.c: avoid iterating all refs

Merge branch 'ph/use-delete-refs'

When removing many branches and tags, the code used to do so one
ref at a time. There is another API it can use to delete multiple
refs, and it makes quite a lot of performance difference when the
refs are packed.

* ph/use-delete-refs:
use delete_refs when deleting tags or branches

Merge branch 'tb/ls-refs-optim'

The ls-refs protocol operation has been optimized to narrow the
sub-hierarchy of refs/ it walks to produce response.

* tb/ls-refs-optim:
  ls-refs.c: traverse prefixes of disjoint "ref-prefix" sets
  ls-refs.c: initialize 'prefixes' before using it
  refs: expose 'for_each_fullref_in_prefixes'

Merge branch 'zh/ls-files-deduplicate'

"git ls-files" can and does show multiple entries when the index is
unmerged, which is a source for confusion unless -s/-u option is in
use.  A new option --deduplicate has been introduced.

* zh/ls-files-deduplicate:
  ls-files.c: add --deduplicate option
  ls_files.c: consolidate two for loops into one
  ls_files.c: bugfix for --deleted and --modified

Merge branch 'ds/cache-tree-basics'

Document, clean-up and optimize the code around the cache-tree
extension in the index.

* ds/cache-tree-basics:
  cache-tree: speed up consecutive path comparisons
  cache-tree: use ce_namelen() instead of strlen()
  index-format: discuss recursion of cache-tree better
  index-format: update preamble to cache tree extension
  index-format: use 'cache tree' over 'cached tree'
  cache-tree: trace regions for prime_cache_tree
  cache-tree: trace regions for I/O
  cache-tree: use trace2 in cache_tree_update()
  unpack-trees: add trace2 regions
  tree-walk: report recursion counts

Merge branch 'en/ort-conflict-handling'

ORT merge strategy learns more support for merge conflicts.

* en/ort-conflict-handling:
  merge-ort: add handling for different types of files at same path
  merge-ort: copy find_first_merges() implementation from merge-recursive.c
  merge-ort: implement format_commit()
  merge-ort: copy and adapt merge_submodule() from merge-recursive.c
  merge-ort: copy and adapt merge_3way() from merge-recursive.c
  merge-ort: flesh out implementation of handle_content_merge()
  merge-ort: handle book-keeping around two- and three-way content merge
  merge-ort: implement unique_path() helper
  merge-ort: handle directory/file conflicts that remain
  merge-ort: handle D/F conflict where directory disappears due to merge

Merge branch 'so/log-diff-merge'

"git log" learned a new "--diff-merges=<how>" option.

* so/log-diff-merge: (32 commits)
  t4013: add tests for --diff-merges=first-parent
  doc/git-show: include --diff-merges description
  doc/rev-list-options: document --first-parent changes merges format
  doc/diff-generate-patch: mention new --diff-merges option
  doc/git-log: describe new --diff-merges options
  diff-merges: add '--diff-merges=1' as synonym for 'first-parent'
  diff-merges: add old mnemonic counterparts to --diff-merges
  diff-merges: let new options enable diff without -p
  diff-merges: do not imply -p for new options
  diff-merges: implement new values for --diff-merges
  diff-merges: make -m/-c/--cc explicitly mutually exclusive
  diff-merges: refactor opt settings into separate functions
  diff-merges: get rid of now empty diff_merges_init_revs()
  diff-merges: group diff-merge flags next to each other inside 'rev_info'
  diff-merges: split 'ignore_merges' field
  diff-merges: fix -m to properly override -c/--cc
  t4013: add tests for -m failing to override -c/--cc
  t4013: support test_expect_failure through ':failure' magic
  diff-merges: revise revs->diff flag handling
  diff-merges: handle imply -p on -c/--cc logic for log.c
  ...

Prepare for 2.30.1

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'js/skip-dashed-built-ins-from-config-mak' into maint

Build fix.

* js/skip-dashed-built-ins-from-config-mak:
SKIP_DASHED_BUILT_INS: respect `config.mak`

Merge branch 'jt/packfile-as-uri-doc' into maint

Doc fix for packfile URI feature.

* jt/packfile-as-uri-doc:
Doc: clarify contents of packfile sent as URI

Merge branch 'ab/fsck-doc-fix' into maint

Documentation for "git fsck" lost stale bits that has become
incorrect.

* ab/fsck-doc-fix:
fsck doc: remove ancient out-of-date diagnostics

Merge branch 'jk/log-cherry-pick-duplicate-patches' into maint

When more than one commit with the same patch ID appears on one
side, "git log --cherry-pick A...B" did not exclude them all when a
commit with the same patch ID appears on the other side. Now it
does.

* jk/log-cherry-pick-duplicate-patches:
patch-ids: handle duplicate hashmap entries

Merge branch 'jk/forbid-lf-in-git-url' into maint

Newline characters in the host and path part of git:// URL are
now forbidden.

* jk/forbid-lf-in-git-url:
fsck: reject .gitmodules git:// urls with newlines
git_connect_git(): forbid newlines in host and path

Merge branch 'jc/macos-install-dependencies-fix' into maint

Fix for procedure to building CI test environment for mac.

* jc/macos-install-dependencies-fix:
ci/install-depends: attempt to fix "brew cask" stuff

Merge branch 'tb/local-clone-race-doc' into maint

Doc update.

* tb/local-clone-race-doc:
Documentation/git-clone.txt: document race with --local

Merge branch 'bc/doc-status-short' into maint

Doc update.

* bc/doc-status-short:
docs: rephrase and clarify the git status --short format

Merge branch 'ab/gettext-charset-comment-fix' into maint

Comments update.

* ab/gettext-charset-comment-fix:
gettext.c: remove/reword a mostly-useless comment
Makefile: remove a warning about old GETTEXT_POISON flag

Merge branch 'ug/doc-lose-dircache' into maint

Doc update.

* ug/doc-lose-dircache:
doc: remove "directory cache" from man pages

Merge branch 'ad/t4129-setfacl-target-fix' into maint

Test fix.

* ad/t4129-setfacl-target-fix:
t4129: fix setfacl-related permissions failure

Merge branch 'jk/t5516-deflake' into maint

Test fix.

* jk/t5516-deflake:
t5516: loosen "not our ref" error check

Merge branch 'vv/send-email-with-less-secure-apps-access' into maint

Doc update.

* vv/send-email-with-less-secure-apps-access:
git-send-email.txt: mention less secure app access with Gmail

Merge branch 'pb/mergetool-tool-help-fix' into maint

Fix 2.29 regression where "git mergetool --tool-help" fails to list
all the available tools.

* pb/mergetool-tool-help-fix:
mergetool--lib: fix '--tool-help' to correctly show available tools

Merge branch 'ds/for-each-repo-noopfix' into maint

"git for-each-repo --config=<var> <cmd>" should not run <cmd> for
any repository when the configuration variable <var> is not defined
even once.

* ds/for-each-repo-noopfix:
for-each-repo: do nothing on empty config

Merge branch 'jc/sign-off' into maint

Doc update.

* jc/sign-off:
SubmittingPatches: tighten wording on "sign-off" procedure

Merge branch 'mt/t4129-with-setgid-dir' into maint

Some tests expect that "ls -l" output has either '-' or 'x' for
group executable bit, but setgid bit can be inherited from parent
directory and make these fields 'S' or 's' instead, causing test
failures.

* mt/t4129-with-setgid-dir:
t4129: don't fail if setgid is set in the test directory

Merge branch 'en/stash-apply-sparse-checkout' into maint

"git stash" did not work well in a sparsely checked out working
tree.

* en/stash-apply-sparse-checkout:
  stash: fix stash application in sparse-checkouts
  stash: remove unnecessary process forking
  t7012: add a testcase demonstrating stash apply bugs in sparse checkouts

Merge branch 'nk/perf-fsmonitor-cleanup' into maint

Test fix.

* nk/perf-fsmonitor-cleanup:
p7519: allow running without watchman prereq

Merge branch 'rs/rebase-commit-validation' into maint

Diagnose command line error of "git rebase" early.

* rs/rebase-commit-validation:
rebase: verify commit parameter

Merge branch 'pb/doc-modules-git-work-tree-typofix' into maint

Doc fix.

* pb/doc-modules-git-work-tree-typofix:
gitmodules.txt: fix 'GIT_WORK_TREE' variable name

Merge branch 'ta/doc-typofix' into maint

Doc fix.

* ta/doc-typofix:
doc: fix some typos

Merge branch 'pk/subsub-fetch-fix-take-2' into maint

"git fetch --recurse-submodules" fix (second attempt).

* pk/subsub-fetch-fix-take-2:
submodules: fix of regression on fetching of non-init subsub-repo

The fifth batch

Merge branch 'jk/run-command-use-shell-doc'

The .use_shell flag in struct child_process that is passed to
run_command() API has been clarified with a bit more documentation.

* jk/run-command-use-shell-doc:
run-command: document use_shell option

Merge branch 'jk/peel-iterated-oid'

The peel_ref() API has been replaced with peel_iterated_oid().

* jk/peel-iterated-oid:
refs: switch peel_ref() to peel_iterated_oid()

Merge branch 'js/skip-dashed-built-ins-from-config-mak'

Build fix.

* js/skip-dashed-built-ins-from-config-mak:
SKIP_DASHED_BUILT_INS: respect `config.mak`

Merge branch 'jt/packfile-as-uri-doc'

Doc fix for packfile URI feature.

* jt/packfile-as-uri-doc:
Doc: clarify contents of packfile sent as URI

Merge branch 'ds/maintenance-prefetch-cleanup'

Test clean-up plus UI improvement by hiding extra refs that
the prefetch task uses from "log --decorate" output.

* ds/maintenance-prefetch-cleanup:
t7900: clean up some broken refs
maintenance: set log.excludeDecoration durin prefetch

Merge branch 'ab/fsck-doc-fix'

Documentation for "git fsck" lost stale bits that has become
incorrect.

* ab/fsck-doc-fix:
fsck doc: remove ancient out-of-date diagnostics

annotate-tests: quote variable expansions containing path names

The test case added by 9466e3809d ("blame: enable funcname blaming with
userdiff driver", 2020-11-01) forgot to quote variable expansions. This
causes failures when the current directory contains blanks.

One variable that the test case introduces will not have IFS characters
and could remain without quotes, but let's quote all expansions for
consistency, not just the one that has the path name.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Acked-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

worktree: teach `list` verbose mode

"git worktree list" annotates each worktree according to its state such
as "prunable" or "locked", however it is not immediately obvious why
these worktrees are being annotated. For prunable worktrees a reason
is available that is returned by should_prune_worktree() and for locked
worktrees a reason might be available provided by the user via `lock`
command.

Let's teach "git worktree list" a --verbose mode that outputs the reason
why the worktrees are being annotated. The reason is a text that can take
virtually any size and appending the text on the default columned format
will make it difficult to extend the command with other annotations and
not fit nicely on the screen. In order to address this shortcoming the
annotation is then moved to the next line indented followed by the reason
If the reason is not available the annotation stays on the same line as
the worktree itself.

The output of "git worktree list" with verbose becomes like so:

    $ git worktree list --verbose
    ...
    /path/to/locked-no-reason    acb124 [branch-a] locked
    /path/to/locked-with-reason  acc125 [branch-b]
        locked: worktree with a locked reason
    /path/to/prunable-reason     ace127 [branch-d]
        prunable: gitdir file points to non-existent location
    ...

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

worktree: teach `list` to annotate prunable worktree

The "git worktree list" command shows the absolute path to the worktree,
the commit that is checked out, the name of the branch, and a "locked"
annotation if the worktree is locked, however, it does not indicate
whether the worktree is prunable.

The "prune" command will remove a worktree if it is prunable unless
`--dry-run` option is specified. This could lead to a worktree being
removed without the user realizing before it is too late, in case the
user forgets to pass --dry-run for instance. If the "list" command shows
which worktree is prunable, the user could verify before running
"git worktree prune" and hopefully prevents the working tree to be
removed "accidentally" on the worse case scenario.

Let's teach "git worktree list" to show when a worktree is a prunable
candidate for both default and porcelain format.

In the default format a "prunable" text is appended:

    $ git worktree list
    /path/to/main      aba123 [main]
    /path/to/linked    123abc [branch-a]
    /path/to/prunable  ace127 (detached HEAD) prunable

In the --porcelain format a prunable label is added followed by
its reason:

    $ git worktree list --porcelain
    ...
    worktree /path/to/prunable
    HEAD abc1234abc1234abc1234abc1234abc1234abc12
    detached
    prunable gitdir file points to non-existent location
    ...

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

worktree: teach `list --porcelain` to annotate locked worktree

Commit c57b3367be (worktree: teach `list` to annotate locked worktree,
2020-10-11) taught "git worktree list" to annotate locked worktrees by
appending "locked" text to its output, however, this is not listed in
the --porcelain format.

Teach "list --porcelain" to do the same and add a "locked" attribute
followed by its reason, thus making both default and porcelain format
consistent. If the locked reason is not available then only "locked"
is shown.

The output of the "git worktree list --porcelain" becomes like so:

    $ git worktree list --porcelain
    ...
    worktree /path/to/locked
    HEAD 123abcdea123abcd123acbd123acbda123abcd12
    detached
    locked

    worktree /path/to/locked-with-reason
    HEAD abc123abc123abc123abc123abc123abc123abc1
    detached
    locked reason why it is locked
    ...

In porcelain mode, if the lock reason contains special characters
such as newlines, they are escaped with backslashes and the entire
reason is enclosed in double quotes. For example:

   $ git worktree list --porcelain
   ...
   locked "worktree's path mounted in\nremovable device"
   ...

Furthermore, let's update the documentation to state that some
attributes in the porcelain format might be listed alone or together
with its value depending whether the value is available or not. Thus
documenting the case of the new "locked" attribute.

Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t2402: ensure locked worktree is properly cleaned up

c57b3367be (worktree: teach `list` to annotate locked worktree,
2020-10-11) introduced a new test to ensure locked worktrees are listed
with "locked" annotation. However, the test does not clean up after
itself as "git worktree prune" is not going to remove the locked worktree
in the first place. This not only leaves the test in an unclean state it
also potentially breaks following tests that rely on the
"git worktree list" output.

Let's fix that by unlocking the worktree before the "prune" command.

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

worktree: teach worktree_lock_reason() to gently handle main worktree

worktree_lock_reason() aborts with an assertion failure when called on
the main worktree since locking the main worktree is nonsensical. Not
only is this behavior undocumented, thus callers might not even be aware
that the call could potentially crash the program, but it also forces
clients to be extra careful:

    if (!is_main_worktree(wt) && worktree_locked_reason(...))
        ...

Since we know that locking makes no sense in the context of the main
worktree, we can simply return false for the main worktree, thus making
client code less complex by eliminating the need for the callers to have
inside knowledge about the implementation:

    if (worktree_lock_reason(...))
        ...

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

worktree: teach worktree to lazy-load "prunable" reason

Add worktree_prune_reason() to allow a caller to discover whether a
worktree is prunable and the reason that it is, much like
worktree_lock_reason() indicates whether a worktree is locked and the
reason for the lock. As with worktree_lock_reason(), retrieve the
prunable reason lazily and cache it in the `worktree` structure.

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

worktree: libify should_prune_worktree()

As part of teaching "git worktree list" to annotate worktree that is a
candidate for pruning, let's move should_prune_worktree() from
builtin/worktree.c to worktree.c in order to make part of the worktree
public API.

should_prune_worktree() knows how to select the given worktree for
pruning based on an expiration date, however the expiration value is
stored in a static file-scope variable and it is not local to the
function. In order to move the function, teach should_prune_worktree()
to take the expiration date as an argument and document the new
parameter that is not immediately obvious.

Also, change the function comment to clearly state that the worktree's
path is returned in `wtpath` argument.

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

p5303: avoid sed GNU-ism

Using "1~5" isn't portable. Nobody seems to have noticed, since perhaps
people don't tend to run the perf suite on more exotic platforms. Still,
it's better to set a good example.

We can use:

perl -ne 'print if $. % 5 == 1'

instead. But we can further observe that perl does a good job of the
other parts of this pipeline, and fold the whole thing together.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pretty: lazy-load commit data when expanding user-format

When we expand a user-format, we try to avoid work that isn't necessary
for the output. For instance, we don't bother parsing the commit header
until we know we need the author, subject, etc.

But we do always load the commit object's contents from disk, even if
the format doesn't require it (e.g., just "%H"). Traditionally this
didn't matter much, because we'd have loaded it as part of the traversal
anyway, and we'd typically have those bytes attached to the commit
struct (or these days, cached in a commit-slab).

But when we have a commit-graph, we might easily get to the point of
pretty-printing a commit without ever having looked at the actual object
contents. We should push off that load (and reencoding) until we're
certain that it's needed.

I think the results of p4205 show the advantage pretty clearly (we serve
parent and tree oids out of the commit struct itself, so they benefit as
well):

  # using git.git as the test repo
  Test                          HEAD^             HEAD
  ----------------------------------------------------------------------
  4205.1: log with %H           0.40(0.39+0.01)   0.03(0.02+0.01) -92.5%
  4205.2: log with %h           0.45(0.44+0.01)   0.09(0.09+0.00) -80.0%
  4205.3: log with %T           0.40(0.39+0.00)   0.04(0.04+0.00) -90.0%
  4205.4: log with %t           0.46(0.46+0.00)   0.09(0.08+0.01) -80.4%
  4205.5: log with %P           0.39(0.39+0.00)   0.03(0.03+0.00) -92.3%
  4205.6: log with %p           0.46(0.46+0.00)   0.10(0.09+0.00) -78.3%
  4205.7: log with %h-%h-%h     0.52(0.51+0.01)   0.15(0.14+0.00) -71.2%
  4205.8: log with %an-%ae-%s   0.42(0.41+0.00)   0.42(0.41+0.01) +0.0%

  # using linux.git as the test repo
  Test                          HEAD^             HEAD
  ----------------------------------------------------------------------
  4205.1: log with %H           7.12(6.97+0.14)   0.76(0.65+0.11) -89.3%
  4205.2: log with %h           7.35(7.19+0.16)   1.30(1.19+0.11) -82.3%
  4205.3: log with %T           7.58(7.42+0.15)   1.02(0.94+0.08) -86.5%
  4205.4: log with %t           8.05(7.89+0.15)   1.55(1.41+0.13) -80.7%
  4205.5: log with %P           7.12(7.01+0.10)   0.76(0.69+0.07) -89.3%
  4205.6: log with %p           7.38(7.27+0.10)   1.32(1.20+0.12) -82.1%
  4205.7: log with %h-%h-%h     7.81(7.67+0.13)   1.79(1.67+0.12) -77.1%
  4205.8: log with %an-%ae-%s   7.90(7.74+0.15)   7.81(7.66+0.15) -1.1%

I added the final test to show where we don't improve (the 1% there is
just lucky noise), but also as a regression test to make sure we're not
doing anything stupid like loading the commit multiple times when there
are several placeholders that need it.

Reported-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

rebase -i: do leave commit message intact in fixup! chains

In 6e98de72c03 (sequencer (rebase -i): add support for the 'fixup' and
'squash' commands, 2017-01-02), this developer introduced a change of
behavior by mistake: when encountering a `fixup!` commit (or multiple
`fixup!` commits) without any `squash!` commit thrown in, the final `git
commit` was invoked with `--cleanup=strip`. Prior to that commit, the
commit command had been called without that `--cleanup` option.

Since we explicitly read the original commit message from a file in that
case, there is really no sense in forcing that clean-up.

We actually need to actively suppress that clean-up lest a configured
`commit.cleanup` may interfere with what we want to do: leave the commit
message unchanged.

Reported-by: Vojtěch Knyttl <vojtech@knyt.tl>
Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t0000: consistently use single quotes for outer tests

When we use the sub-test helpers, we end up defining one shell snippet
inside another shell snippet. So if we use single-quotes for the outer
snippet, we have to use double-quotes within the inner snippet (it's
included as here-doc within the outer snippet, but using a single quote
would end the outer snippet early). Or vice versa we can use double
quotes for the outer snippet, but then single quotes in the inner.

We have some of each in the script, and neither is wrong. But it would
be nice to be consistent unless there is a good reason not to. Using
single quotes for the outer script is preferable, because it requires
less metacharacter quoting overall. For example, in:

  test_expect_success 'outer' '
run_sub_test_lib_test ...  <<-\EOF
echo $foo &&
test_expect_success "inner" "
echo \$bar
"
EOF
  '

we need only quote inside "inner", but not inside "outer" or the
here-doc. Whereas if we flip them, we have to quote in both places:

  test_expect_success 'outer' "
run_sub_test_lib_test ...  <<-\EOF
echo \$foo &&
test_expect_success 'inner' '
echo \$bar
'
EOF
  "

The exception is when we need a literal single-quote in an expected
output here-doc. There we can either use outer double-quotes, or just
use ${SQ} within the doc. I chose the latter for consistency (within
this test, but also with other test scripts that face the same problem).

There is one other interesting case, which is some tests that do:

  test_expect_success ... "
do_something --run='"'!3'"'
  "

This is rather confusing to read, but is correct. The outer script sees
'!3' in single-quotes, as does the eval'd snippet. This is perhaps being
overly cautious. In many interactive shells, an exclamation triggers
history expansion even inside double quotes, but that is not generally
true in non-interactive shells.

There's some conflicting information here. Commit 784ce03d55 (t4216:
avoid unnecessary subshell in test_bloom_filters_not_used, 2020-05-19)
reports it as a problem with OpenBSD 6.7's /bin/sh. However, we have
many instances in this script of prereqs like !LAZY_TRUE, which haven't
been a problem. I left them un-escaped here to test out this theory.
It's much nicer if we can not worry about this as a portability issue,
so it's worth knowing.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t0000: run cleaning test inside sub-test

Our check of test_when_finished is done directly in the main script, and
if we failed to clean, we complain and exit immediately. It's nicer to
signal a test failure here, for a few reasons:

  - this gives better output to the user when run under a TAP harness
    like "prove"

  - constency; it's the only test left in the file that behaves this way

  - half of its "if" conditional is nonsense anyway; it picked up a
    reference to GIT_TEST_FAIL_PREREQS_INTERNAL in dfe1a17df9 (tests:
    add a special setup where prerequisites fail, 2019-05-13) along with
    its neighbors, even though it has nothing to do with that flag

We could actually do this without a sub-test at all, and just put our
two tests (one to do cleanup, and one to check that it happened) in the
main script. But doing it in a subtest is conceptually cleaner (from the
perspective of the main test script, we are checking only one thing),
and it remains consistent with the "cleanup when failing" test directly
after it, which has to happen in a sub-test (to avoid the main script
complaining of the failed test).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t0000: run prereq tests inside sub-test

We test the behavior of prerequisites in t0000 by setting up fake ones
in the main test script, trying to run some tests, and then seeing if
those tests impacted the environment correctly. If they didn't, then we
write a message and manually call exit.

Instead, let's push these down into a sub-test, like many of the other
tests covering the framework itself. This has a few advantages:

  - it does not pollute the test output with mention of skipped tests
    (that we know are uninteresting -- the point of the test was to see
    that these are skipped).

  - when running in a TAP harness, we get a useful test failure message
    (whereas when the script exits early, a tool like "prove" simply
    says "Dubious, test returned 1").

  - we do not have to worry about different test environments, such as
    when GIT_TEST_FAIL_PREREQS_INTERNAL is set. Our sub-test helpers
    already give us a known environment.

  - the tests themselves are a bit easier to read, as we can just check
    the test-framework output to see what happened (and get the usual
    test_cmp diff if it failed)

A few notes on the implementation:

  - we could do one sub-test per each individual test_expect_success. I
    broke it up here into a few logical groups, as I think this makes it
    more readable

  - the original tests modified environment variables inside the test
    bodies. Instead, I've used "true" as the body of a test we expect to
    run and "false" otherwise. Technically this does not confirm that
    the body of the "true" test actually ran. We are trusting the
    framework output to believe that it truly ran, which is sufficient
    for these tests. And I think the end result is much simpler to
    follow.

  - the nested_prereq test uses a few bare "test -f" calls; I converted
    these to our usual test_path_is_* helpers while moving the code
    around.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t0000: keep clean-up tests together

We check that test_when_finished cleans up after a test, and that it
runs even after a failure. Those two were originally adjacent, but got
split apart by the new test added in 477dcaddb6 (tests: do not let lazy
prereqs inside `test_expect_*` turn off tracing, 2020-03-26), and then
further by more lazy-prereq tests. Let's move them back together.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

oid_pos(): access table through const pointers

When we are looking up an oid in an array, we obviously don't need to
write to the array. Let's mark it as const in the function interfaces,
as well as in the local variables we use to derference the void pointer
(note a few cases use pointers-to-pointers, so we mark everything
const).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>