Elijah Newren [Sun, 14 Feb 2021 07:51:51 +0000 (07:51 +0000)]
merge-ort: call diffcore_rename() directly
We want to pass additional information to diffcore_rename() (or some
variant thereof) without plumbing that extra information through
diff_tree_oid() and diffcore_std(). Further, since we will need to
gather additional special information related to diffs and are walking
the trees anyway in collect_merge_info(), it seems odd to have
diff_tree_oid()/diffcore_std() repeat those tree walks. And there may
be times where we can avoid traversing into a subtree in
collect_merge_info() (based on additional information at our disposal),
that the basic diff logic would be unable to take advantage of. For all
these reasons, just create the add and delete pairs ourself and then
call diffcore_rename() directly.
This change is primarily about enabling future optimizations; the
advantage of avoiding extra tree traversals is small compared to the
cost of rename detection, and the advantage of avoiding the extra tree
traversals is somewhat offset by the extra time spent in
collect_merge_info() collecting the additional data anyway. However...
For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:
Before After
no-renames: 13.294 s ± 0.103 s 12.775 s ± 0.062 s
mega-renames: 187.248 s ± 0.882 s 188.754 s ± 0.284 s
just-one-mega: 5.557 s ± 0.017 s 5.599 s ± 0.019 s
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Sun, 14 Feb 2021 07:51:50 +0000 (07:51 +0000)]
gitdiffcore doc: mention new preliminary step for rename detection
The last few patches have introduced a new preliminary step when rename
detection is on but both break detection and copy detection are off.
Document this new step. While we're at it, add a testcase that checks
the new behavior as well.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Sun, 14 Feb 2021 07:51:49 +0000 (07:51 +0000)]
diffcore-rename: guide inexact rename detection based on basenames
Make use of the new find_basename_matches() function added in the last
two patches, to find renames more rapidly in cases where we can match up
files based on basenames. As a quick reminder (see the last two commit
messages for more details), this means for example that
docs/extensions.txt and docs/config/extensions.txt are considered likely
renames if there are no remaining 'extensions.txt' files elsewhere among
the added and deleted files, and if a similarity check confirms they are
similar, then they are marked as a rename without looking for a better
similarity match among other files. This is a behavioral change, as
covered in more detail in the previous commit message.
We do not use this heuristic together with either break or copy
detection. The point of break detection is to say that filename
similarity does not imply file content similarity, and we only want to
know about file content similarity. The point of copy detection is to
use more resources to check for additional similarities, while this is
an optimization that uses far less resources but which might also result
in finding slightly fewer similarities. So the idea behind this
optimization goes against both of those features, and will be turned off
for both.
For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:
Before After
no-renames: 13.815 s ± 0.062 s 13.294 s ± 0.103 s
mega-renames: 1799.937 s ± 0.493 s 187.248 s ± 0.882 s
just-one-mega: 51.289 s ± 0.019 s 5.557 s ± 0.017 s
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Sun, 14 Feb 2021 07:51:48 +0000 (07:51 +0000)]
diffcore-rename: complete find_basename_matches()
It is not uncommon in real world repositories for the majority of file
renames to not change the basename of the file; i.e. most "renames" are
just a move of files into different directories. We can make use of
this to avoid comparing all rename source candidates with all rename
destination candidates, by first comparing sources to destinations with
the same basenames. If two files with the same basename are
sufficiently similar, we record the rename; if not, we include those
files in the more exhaustive matrix comparison.
This means we are adding a set of preliminary additional comparisons,
but for each file we only compare it with at most one other file. For
example, if there was a include/media/device.h that was deleted and a
src/module/media/device.h that was added, and there are no other
device.h files in the remaining sets of added and deleted files after
exact rename detection, then these two files would be compared in the
preliminary step.
This commit does not yet actually employ this new optimization, it
merely adds a function which can be used for this purpose. The next
commit will do the necessary plumbing to make use of it.
Note that this optimization might give us different results than without
the optimization, because it's possible that despite files with the same
basename being sufficiently similar to be considered a rename, there's
an even better match between files without the same basename. I think
that is okay for four reasons: (1) it's easy to explain to the users
what happened if it does ever occur (or even for them to intuitively
figure out), (2) as the next patch will show it provides such a large
performance boost that it's worth the tradeoff, and (3) it's somewhat
unlikely that despite having unique matching basenames that other files
serve as better matches. Reason (4) takes a full paragraph to
explain...
If the previous three reasons aren't enough, consider what rename
detection already does. Break detection is not the default, meaning
that if files have the same _fullname_, then they are considered related
even if they are 0% similar. In fact, in such a case, we don't even
bother comparing the files to see if they are similar let alone
comparing them to all other files to see what they are most similar to.
Basically, we override content similarity based on sufficient filename
similarity. Without the filename similarity (currently implemented as
an exact match of filename), we swing the pendulum the opposite
direction and say that filename similarity is irrelevant and compare a
full N x M matrix of sources and destinations to find out which have the
most similar contents. This optimization just adds another form of
filename similarity comparison, but augments it with a file content
similarity check as well. Basically, if two files have the same
basename and are sufficiently similar to be considered a rename, mark
them as such without comparing the two to all other rename candidates.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Sun, 14 Feb 2021 07:51:47 +0000 (07:51 +0000)]
diffcore-rename: compute basenames of source and dest candidates
We want to make use of unique basenames among remaining source and
destination files to help inform rename detection, so that more likely
pairings can be checked first. (src/moduleA/foo.txt and
source/module/A/foo.txt are likely related if there are no other
'foo.txt' files among the remaining deleted and added files.) Add a new
function, not yet used, which creates a map of the unique basenames
within rename_src and another within rename_dst, together with the
indices within rename_src/rename_dst where those basenames show up.
Non-unique basenames still show up in the map, but have an invalid index
(-1).
This function was inspired by the fact that in real world repositories,
files are often moved across directories without changing names. Here
are some sample repositories and the percentage of their historical
renames (as of early 2020) that preserved basenames:
* linux: 76%
* gcc: 64%
* gecko: 79%
* webkit: 89%
These statistics alone don't prove that an optimization in this area
will help or how much it will help, since there are also unpaired adds
and deletes, restrictions on which basenames we consider, etc., but it
certainly motivated the idea to try something in this area.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Sun, 14 Feb 2021 07:51:46 +0000 (07:51 +0000)]
t4001: add a test comparing basename similarity and content similarity
Add a simple test where a removed file is similar to two different added
files; one of them has the same basename, and the other has a slightly
higher content similarity. In the current test, content similarity is
weighted higher than filename similarity.
Subsequent commits will add a new rule that weighs a mixture of filename
similarity and content similarity in a manner that will change the
outcome of this testcase.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Sun, 14 Feb 2021 07:35:01 +0000 (07:35 +0000)]
diffcore-rename: filter rename_src list when possible
We have to look at each entry in rename_src a total of rename_dst_nr
times. When we're not detecting copies, any exact renames or ignorable
rename paths will just be skipped over. While checking that these can
be skipped over is a relatively cheap check, it's still a waste of time
to do that check more than once, let alone rename_dst_nr times. When
rename_src_nr is a few thousand times bigger than the number of relevant
sources (such as when cherry-picking a commit that only touched a
handful of files, but from a side of history that has different names
for some high level directories), this time can add up.
First make an initial pass over the rename_src array and move all the
relevant entries to the front, so that we can iterate over just those
relevant entries.
For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:
Before After
no-renames: 14.119 s ± 0.101 s 13.815 s ± 0.062 s
mega-renames: 1802.044 s ± 0.828 s 1799.937 s ± 0.493 s
just-one-mega: 51.391 s ± 0.028 s 51.289 s ± 0.019 s
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Wed, 3 Feb 2021 20:03:46 +0000 (20:03 +0000)]
diffcore-rename: no point trying to find a match better than exact
diffcore_rename() had some code to avoid having destination paths that
already had an exact rename detected from being re-checked for other
renames. Source paths, however, were re-checked because we wanted to
allow the possibility of detecting copies. But if copy detection isn't
turned on, then this merely amounts to attempting to find a
better-than-exact match, which naturally ends up being an expensive
no-op. In particular, copy detection is never turned on by the merge
machinery.
For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:
Before After
no-renames: 14.263 s ± 0.053 s 14.119 s ± 0.101 s
mega-renames: 5504.231 s ± 5.150 s 1802.044 s ± 0.828 s
just-one-mega: 158.534 s ± 0.498 s 51.391 s ± 0.028 s
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Thu, 11 Feb 2021 21:58:43 +0000 (13:58 -0800)]
Merge branch 'en/merge-ort-perf'
The "ort" merge strategy.
* en/merge-ort-perf:
merge-ort: begin performance work; instrument with trace2_region_* calls
merge-ort: ignore the directory rename split conflict for now
merge-ort: fix massive leak
Junio C Hamano [Thu, 11 Feb 2021 21:58:43 +0000 (13:58 -0800)]
Merge branch 'en/ort-directory-rename'
ORT merge strategy learns to infer "renamed directory" while
merging.
* en/ort-directory-rename:
merge-ort: fix a directory rename detection bug
merge-ort: process_renames() now needs more defensiveness
merge-ort: implement apply_directory_rename_modifications()
merge-ort: add a new toplevel_dir field
merge-ort: implement handle_path_level_conflicts()
merge-ort: implement check_for_directory_rename()
merge-ort: implement apply_dir_rename() and check_dir_renamed()
merge-ort: implement compute_collisions()
merge-ort: modify collect_renames() for directory rename handling
merge-ort: implement handle_directory_level_conflicts()
merge-ort: implement compute_rename_counts()
merge-ort: copy get_renamed_dir_portion() from merge-recursive.c
merge-ort: add outline of get_provisional_directory_renames()
merge-ort: add outline for computing directory renames
merge-ort: collect which directories are removed in dirs_removed
merge-ort: initialize and free new directory rename data structures
merge-ort: add new data structures for directory rename detection
Junio C Hamano [Thu, 11 Feb 2021 00:48:07 +0000 (16:48 -0800)]
Merge branch 'tb/ci-run-cocci-with-18.04'
The version of Ubuntu Linux used by default at GitHub Actions CI
has been updated to one that lack coccinelle; until it gets fixed,
work it around by sticking to the previous release (18.04).
* tb/ci-run-cocci-with-18.04:
.github/workflows/main.yml: run static-analysis on bionic
Junio C Hamano [Wed, 10 Feb 2021 22:48:33 +0000 (14:48 -0800)]
Merge branch 'ab/detox-gettext-tests'
Get rid of "GETTEXT_POISON" support altogether, which may or may
not be controversial.
* ab/detox-gettext-tests:
tests: remove uses of GIT_TEST_GETTEXT_POISON=false
tests: remove support for GIT_TEST_GETTEXT_POISON
ci: remove GETTEXT_POISON jobs
Junio C Hamano [Wed, 10 Feb 2021 22:48:33 +0000 (14:48 -0800)]
Merge branch 'jk/pretty-lazy-load-commit'
Some pretty-format specifiers do not need the data in commit object
(e.g. "%H"), but we were over-eager to load and parse it, which has
been made even lazier.
* jk/pretty-lazy-load-commit:
pretty: lazy-load commit data when expanding user-format
Junio C Hamano [Wed, 10 Feb 2021 22:48:32 +0000 (14:48 -0800)]
Merge branch 'js/rebase-i-commit-cleanup-fix'
When "git rebase -i" processes "fixup" insn, there is no reason to
clean up the commit log message, but we did the usual stripspace
processing. This has been corrected.
* js/rebase-i-commit-cleanup-fix:
rebase -i: do leave commit message intact in fixup! chains
Junio C Hamano [Wed, 10 Feb 2021 22:48:32 +0000 (14:48 -0800)]
Merge branch 'jk/t0000-cleanups'
Code clean-up.
* jk/t0000-cleanups:
t0000: consistently use single quotes for outer tests
t0000: run cleaning test inside sub-test
t0000: run prereq tests inside sub-test
t0000: keep clean-up tests together
Junio C Hamano [Wed, 10 Feb 2021 22:48:31 +0000 (14:48 -0800)]
Merge branch 'jk/use-oid-pos'
Code clean-up to ensure our use of hashtables using object names as
keys use the "struct object_id" objects, not the raw hash values.
* jk/use-oid-pos:
oid_pos(): access table through const pointers
hash_pos(): convert to oid_pos()
rerere: use strmap to store rerere directories
rerere: tighten rr-cache dirname check
rerere: check dirname format while iterating rr_cache directory
commit_graft_pos(): take an oid instead of a bare hash
Taylor Blau [Mon, 8 Feb 2021 21:22:34 +0000 (16:22 -0500)]
.github/workflows/main.yml: run static-analysis on bionic
GitHub Actions is transitioning workflow steps that run on
'ubuntu-latest' from 18.04 to 20.04 [1].
This works fine in all steps except the static-analysis one, since
Coccinelle isn't available on Ubuntu focal (it is only available in the
universe suite).
Until Coccinelle can be installed from 20.04's main suite, pin the
static-analysis build to run on 18.04, where it can be installed by
default.
Junio C Hamano [Mon, 8 Feb 2021 22:05:55 +0000 (14:05 -0800)]
Merge branch 'pb/ci-matrix-wo-shortcut' into maint
Our setting of GitHub CI test jobs were a bit too eager to give up
once there is even one failure found. Tweak the knob to allow
other jobs keep running even when we see a failure, so that we can
find more failures in a single run.
* pb/ci-matrix-wo-shortcut:
ci: do not cancel all jobs of a matrix if one fails
Junio C Hamano [Mon, 8 Feb 2021 22:05:55 +0000 (14:05 -0800)]
Merge branch 'ab/branch-sort' into maint
The implementation of "git branch --sort" wrt the detached HEAD
display has always been hacky, which has been cleaned up.
* ab/branch-sort:
branch: show "HEAD detached" first under reverse sort
branch: sort detached HEAD based on a flag
ref-filter: move ref_sorting flags to a bitfield
ref-filter: move "cmp_fn" assignment into "else if" arm
ref-filter: add braces to if/else if/else chain
branch tests: add to --sort tests
branch: change "--local" to "--list" in comment
Junio C Hamano [Mon, 8 Feb 2021 22:05:54 +0000 (14:05 -0800)]
Merge branch 'ma/more-opaque-lock-file' into maint
Code clean-up.
* ma/more-opaque-lock-file:
read-cache: try not to peek into `struct {lock_,temp}file`
refs/files-backend: don't peek into `struct lock_file`
midx: don't peek into `struct lock_file`
commit-graph: don't peek into `struct lock_file`
builtin/gc: don't peek into `struct lock_file`
Junio C Hamano [Mon, 8 Feb 2021 22:05:53 +0000 (14:05 -0800)]
Merge branch 'ma/t1300-cleanup' into maint
Code clean-up.
* ma/t1300-cleanup:
t1300: don't needlessly work with `core.foo` configs
t1300: remove duplicate test for `--file no-such-file`
t1300: remove duplicate test for `--file ../foo`
We've carried compatibility codepaths for compilers without
variadic macros for quite some time, but the world may be ready for
them to be removed. Force compilation failure on exotic platforms
where variadic macros are not available to find out who screams in
such a way that we can easily revert if it turns out that the world
is not yet ready.
Junio C Hamano [Sat, 6 Feb 2021 00:40:46 +0000 (16:40 -0800)]
Merge branch 'pb/ci-matrix-wo-shortcut'
Our setting of GitHub CI test jobs were a bit too eager to give up
once there is even one failure found. Tweak the knob to allow
other jobs keep running even when we see a failure, so that we can
find more failures in a single run.
* pb/ci-matrix-wo-shortcut:
ci: do not cancel all jobs of a matrix if one fails
The "pack-objects" command needs to iterate over all the tags when
automatic tag following is enabled, but it actually iterated over
all refs and then discarded everything outside "refs/tags/"
hierarchy, which was quite wasteful.
* jv/pack-objects-narrower-ref-iteration:
builtin/pack-objects.c: avoid iterating all refs
Junio C Hamano [Sat, 6 Feb 2021 00:40:45 +0000 (16:40 -0800)]
Merge branch 'ph/use-delete-refs'
When removing many branches and tags, the code used to do so one
ref at a time. There is another API it can use to delete multiple
refs, and it makes quite a lot of performance difference when the
refs are packed.
* ph/use-delete-refs:
use delete_refs when deleting tags or branches
Junio C Hamano [Sat, 6 Feb 2021 00:40:45 +0000 (16:40 -0800)]
Merge branch 'tb/ls-refs-optim'
The ls-refs protocol operation has been optimized to narrow the
sub-hierarchy of refs/ it walks to produce response.
* tb/ls-refs-optim:
ls-refs.c: traverse prefixes of disjoint "ref-prefix" sets
ls-refs.c: initialize 'prefixes' before using it
refs: expose 'for_each_fullref_in_prefixes'
Junio C Hamano [Sat, 6 Feb 2021 00:40:44 +0000 (16:40 -0800)]
Merge branch 'zh/ls-files-deduplicate'
"git ls-files" can and does show multiple entries when the index is
unmerged, which is a source for confusion unless -s/-u option is in
use. A new option --deduplicate has been introduced.
* zh/ls-files-deduplicate:
ls-files.c: add --deduplicate option
ls_files.c: consolidate two for loops into one
ls_files.c: bugfix for --deleted and --modified
Junio C Hamano [Sat, 6 Feb 2021 00:40:44 +0000 (16:40 -0800)]
Merge branch 'ds/cache-tree-basics'
Document, clean-up and optimize the code around the cache-tree
extension in the index.
* ds/cache-tree-basics:
cache-tree: speed up consecutive path comparisons
cache-tree: use ce_namelen() instead of strlen()
index-format: discuss recursion of cache-tree better
index-format: update preamble to cache tree extension
index-format: use 'cache tree' over 'cached tree'
cache-tree: trace regions for prime_cache_tree
cache-tree: trace regions for I/O
cache-tree: use trace2 in cache_tree_update()
unpack-trees: add trace2 regions
tree-walk: report recursion counts
Junio C Hamano [Sat, 6 Feb 2021 00:40:44 +0000 (16:40 -0800)]
Merge branch 'en/ort-conflict-handling'
ORT merge strategy learns more support for merge conflicts.
* en/ort-conflict-handling:
merge-ort: add handling for different types of files at same path
merge-ort: copy find_first_merges() implementation from merge-recursive.c
merge-ort: implement format_commit()
merge-ort: copy and adapt merge_submodule() from merge-recursive.c
merge-ort: copy and adapt merge_3way() from merge-recursive.c
merge-ort: flesh out implementation of handle_content_merge()
merge-ort: handle book-keeping around two- and three-way content merge
merge-ort: implement unique_path() helper
merge-ort: handle directory/file conflicts that remain
merge-ort: handle D/F conflict where directory disappears due to merge
Junio C Hamano [Sat, 6 Feb 2021 00:40:44 +0000 (16:40 -0800)]
Merge branch 'so/log-diff-merge'
"git log" learned a new "--diff-merges=<how>" option.
* so/log-diff-merge: (32 commits)
t4013: add tests for --diff-merges=first-parent
doc/git-show: include --diff-merges description
doc/rev-list-options: document --first-parent changes merges format
doc/diff-generate-patch: mention new --diff-merges option
doc/git-log: describe new --diff-merges options
diff-merges: add '--diff-merges=1' as synonym for 'first-parent'
diff-merges: add old mnemonic counterparts to --diff-merges
diff-merges: let new options enable diff without -p
diff-merges: do not imply -p for new options
diff-merges: implement new values for --diff-merges
diff-merges: make -m/-c/--cc explicitly mutually exclusive
diff-merges: refactor opt settings into separate functions
diff-merges: get rid of now empty diff_merges_init_revs()
diff-merges: group diff-merge flags next to each other inside 'rev_info'
diff-merges: split 'ignore_merges' field
diff-merges: fix -m to properly override -c/--cc
t4013: add tests for -m failing to override -c/--cc
t4013: support test_expect_failure through ':failure' magic
diff-merges: revise revs->diff flag handling
diff-merges: handle imply -p on -c/--cc logic for log.c
...
Junio C Hamano [Sat, 6 Feb 2021 00:31:27 +0000 (16:31 -0800)]
Merge branch 'jk/log-cherry-pick-duplicate-patches' into maint
When more than one commit with the same patch ID appears on one
side, "git log --cherry-pick A...B" did not exclude them all when a
commit with the same patch ID appears on the other side. Now it
does.
Junio C Hamano [Sat, 6 Feb 2021 00:31:22 +0000 (16:31 -0800)]
Merge branch 'mt/t4129-with-setgid-dir' into maint
Some tests expect that "ls -l" output has either '-' or 'x' for
group executable bit, but setgid bit can be inherited from parent
directory and make these fields 'S' or 's' instead, causing test
failures.
* mt/t4129-with-setgid-dir:
t4129: don't fail if setgid is set in the test directory
The test case added by 9466e3809d ("blame: enable funcname blaming with
userdiff driver", 2020-11-01) forgot to quote variable expansions. This
causes failures when the current directory contains blanks.
One variable that the test case introduces will not have IFS characters
and could remain without quotes, but let's quote all expansions for
consistency, not just the one that has the path name.
Signed-off-by: Johannes Sixt <j6t@kdbg.org> Acked-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rafael Silva [Wed, 27 Jan 2021 08:03:10 +0000 (09:03 +0100)]
worktree: teach `list` verbose mode
"git worktree list" annotates each worktree according to its state such
as "prunable" or "locked", however it is not immediately obvious why
these worktrees are being annotated. For prunable worktrees a reason
is available that is returned by should_prune_worktree() and for locked
worktrees a reason might be available provided by the user via `lock`
command.
Let's teach "git worktree list" a --verbose mode that outputs the reason
why the worktrees are being annotated. The reason is a text that can take
virtually any size and appending the text on the default columned format
will make it difficult to extend the command with other annotations and
not fit nicely on the screen. In order to address this shortcoming the
annotation is then moved to the next line indented followed by the reason
If the reason is not available the annotation stays on the same line as
the worktree itself.
The output of "git worktree list" with verbose becomes like so:
$ git worktree list --verbose
...
/path/to/locked-no-reason acb124 [branch-a] locked
/path/to/locked-with-reason acc125 [branch-b]
locked: worktree with a locked reason
/path/to/prunable-reason ace127 [branch-d]
prunable: gitdir file points to non-existent location
...
Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rafael Silva [Wed, 27 Jan 2021 08:03:09 +0000 (09:03 +0100)]
worktree: teach `list` to annotate prunable worktree
The "git worktree list" command shows the absolute path to the worktree,
the commit that is checked out, the name of the branch, and a "locked"
annotation if the worktree is locked, however, it does not indicate
whether the worktree is prunable.
The "prune" command will remove a worktree if it is prunable unless
`--dry-run` option is specified. This could lead to a worktree being
removed without the user realizing before it is too late, in case the
user forgets to pass --dry-run for instance. If the "list" command shows
which worktree is prunable, the user could verify before running
"git worktree prune" and hopefully prevents the working tree to be
removed "accidentally" on the worse case scenario.
Let's teach "git worktree list" to show when a worktree is a prunable
candidate for both default and porcelain format.
In the default format a "prunable" text is appended:
In the --porcelain format a prunable label is added followed by
its reason:
$ git worktree list --porcelain
...
worktree /path/to/prunable
HEAD abc1234abc1234abc1234abc1234abc1234abc12
detached
prunable gitdir file points to non-existent location
...
Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rafael Silva [Wed, 27 Jan 2021 08:03:08 +0000 (09:03 +0100)]
worktree: teach `list --porcelain` to annotate locked worktree
Commit c57b3367be (worktree: teach `list` to annotate locked worktree,
2020-10-11) taught "git worktree list" to annotate locked worktrees by
appending "locked" text to its output, however, this is not listed in
the --porcelain format.
Teach "list --porcelain" to do the same and add a "locked" attribute
followed by its reason, thus making both default and porcelain format
consistent. If the locked reason is not available then only "locked"
is shown.
The output of the "git worktree list --porcelain" becomes like so:
In porcelain mode, if the lock reason contains special characters
such as newlines, they are escaped with backslashes and the entire
reason is enclosed in double quotes. For example:
Furthermore, let's update the documentation to state that some
attributes in the porcelain format might be listed alone or together
with its value depending whether the value is available or not. Thus
documenting the case of the new "locked" attribute.
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rafael Silva [Wed, 27 Jan 2021 08:03:07 +0000 (09:03 +0100)]
t2402: ensure locked worktree is properly cleaned up
c57b3367be (worktree: teach `list` to annotate locked worktree,
2020-10-11) introduced a new test to ensure locked worktrees are listed
with "locked" annotation. However, the test does not clean up after
itself as "git worktree prune" is not going to remove the locked worktree
in the first place. This not only leaves the test in an unclean state it
also potentially breaks following tests that rely on the
"git worktree list" output.
Let's fix that by unlocking the worktree before the "prune" command.
Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rafael Silva [Tue, 19 Jan 2021 21:27:35 +0000 (22:27 +0100)]
worktree: teach worktree_lock_reason() to gently handle main worktree
worktree_lock_reason() aborts with an assertion failure when called on
the main worktree since locking the main worktree is nonsensical. Not
only is this behavior undocumented, thus callers might not even be aware
that the call could potentially crash the program, but it also forces
clients to be extra careful:
if (!is_main_worktree(wt) && worktree_locked_reason(...))
...
Since we know that locking makes no sense in the context of the main
worktree, we can simply return false for the main worktree, thus making
client code less complex by eliminating the need for the callers to have
inside knowledge about the implementation:
if (worktree_lock_reason(...))
...
Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rafael Silva [Tue, 19 Jan 2021 21:27:34 +0000 (22:27 +0100)]
worktree: teach worktree to lazy-load "prunable" reason
Add worktree_prune_reason() to allow a caller to discover whether a
worktree is prunable and the reason that it is, much like
worktree_lock_reason() indicates whether a worktree is locked and the
reason for the lock. As with worktree_lock_reason(), retrieve the
prunable reason lazily and cache it in the `worktree` structure.
Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rafael Silva [Tue, 19 Jan 2021 21:27:33 +0000 (22:27 +0100)]
worktree: libify should_prune_worktree()
As part of teaching "git worktree list" to annotate worktree that is a
candidate for pruning, let's move should_prune_worktree() from
builtin/worktree.c to worktree.c in order to make part of the worktree
public API.
should_prune_worktree() knows how to select the given worktree for
pruning based on an expiration date, however the expiration value is
stored in a static file-scope variable and it is not local to the
function. In order to move the function, teach should_prune_worktree()
to take the expiration date as an argument and document the new
parameter that is not immediately obvious.
Also, change the function comment to clearly state that the worktree's
path is returned in `wtpath` argument.
Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 29 Jan 2021 20:04:08 +0000 (15:04 -0500)]
p5303: avoid sed GNU-ism
Using "1~5" isn't portable. Nobody seems to have noticed, since perhaps
people don't tend to run the perf suite on more exotic platforms. Still,
it's better to set a good example.
We can use:
perl -ne 'print if $. % 5 == 1'
instead. But we can further observe that perl does a good job of the
other parts of this pipeline, and fold the whole thing together.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 28 Jan 2021 19:57:39 +0000 (14:57 -0500)]
pretty: lazy-load commit data when expanding user-format
When we expand a user-format, we try to avoid work that isn't necessary
for the output. For instance, we don't bother parsing the commit header
until we know we need the author, subject, etc.
But we do always load the commit object's contents from disk, even if
the format doesn't require it (e.g., just "%H"). Traditionally this
didn't matter much, because we'd have loaded it as part of the traversal
anyway, and we'd typically have those bytes attached to the commit
struct (or these days, cached in a commit-slab).
But when we have a commit-graph, we might easily get to the point of
pretty-printing a commit without ever having looked at the actual object
contents. We should push off that load (and reencoding) until we're
certain that it's needed.
I think the results of p4205 show the advantage pretty clearly (we serve
parent and tree oids out of the commit struct itself, so they benefit as
well):
# using git.git as the test repo
Test HEAD^ HEAD
----------------------------------------------------------------------
4205.1: log with %H 0.40(0.39+0.01) 0.03(0.02+0.01) -92.5%
4205.2: log with %h 0.45(0.44+0.01) 0.09(0.09+0.00) -80.0%
4205.3: log with %T 0.40(0.39+0.00) 0.04(0.04+0.00) -90.0%
4205.4: log with %t 0.46(0.46+0.00) 0.09(0.08+0.01) -80.4%
4205.5: log with %P 0.39(0.39+0.00) 0.03(0.03+0.00) -92.3%
4205.6: log with %p 0.46(0.46+0.00) 0.10(0.09+0.00) -78.3%
4205.7: log with %h-%h-%h 0.52(0.51+0.01) 0.15(0.14+0.00) -71.2%
4205.8: log with %an-%ae-%s 0.42(0.41+0.00) 0.42(0.41+0.01) +0.0%
# using linux.git as the test repo
Test HEAD^ HEAD
----------------------------------------------------------------------
4205.1: log with %H 7.12(6.97+0.14) 0.76(0.65+0.11) -89.3%
4205.2: log with %h 7.35(7.19+0.16) 1.30(1.19+0.11) -82.3%
4205.3: log with %T 7.58(7.42+0.15) 1.02(0.94+0.08) -86.5%
4205.4: log with %t 8.05(7.89+0.15) 1.55(1.41+0.13) -80.7%
4205.5: log with %P 7.12(7.01+0.10) 0.76(0.69+0.07) -89.3%
4205.6: log with %p 7.38(7.27+0.10) 1.32(1.20+0.12) -82.1%
4205.7: log with %h-%h-%h 7.81(7.67+0.13) 1.79(1.67+0.12) -77.1%
4205.8: log with %an-%ae-%s 7.90(7.74+0.15) 7.81(7.66+0.15) -1.1%
I added the final test to show where we don't improve (the 1% there is
just lucky noise), but also as a regression test to make sure we're not
doing anything stupid like loading the commit multiple times when there
are several placeholders that need it.
Reported-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
rebase -i: do leave commit message intact in fixup! chains
In 6e98de72c03 (sequencer (rebase -i): add support for the 'fixup' and
'squash' commands, 2017-01-02), this developer introduced a change of
behavior by mistake: when encountering a `fixup!` commit (or multiple
`fixup!` commits) without any `squash!` commit thrown in, the final `git
commit` was invoked with `--cleanup=strip`. Prior to that commit, the
commit command had been called without that `--cleanup` option.
Since we explicitly read the original commit message from a file in that
case, there is really no sense in forcing that clean-up.
We actually need to actively suppress that clean-up lest a configured
`commit.cleanup` may interfere with what we want to do: leave the commit
message unchanged.
Reported-by: Vojtěch Knyttl <vojtech@knyt.tl> Helped-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 28 Jan 2021 06:32:37 +0000 (01:32 -0500)]
t0000: consistently use single quotes for outer tests
When we use the sub-test helpers, we end up defining one shell snippet
inside another shell snippet. So if we use single-quotes for the outer
snippet, we have to use double-quotes within the inner snippet (it's
included as here-doc within the outer snippet, but using a single quote
would end the outer snippet early). Or vice versa we can use double
quotes for the outer snippet, but then single quotes in the inner.
We have some of each in the script, and neither is wrong. But it would
be nice to be consistent unless there is a good reason not to. Using
single quotes for the outer script is preferable, because it requires
less metacharacter quoting overall. For example, in:
The exception is when we need a literal single-quote in an expected
output here-doc. There we can either use outer double-quotes, or just
use ${SQ} within the doc. I chose the latter for consistency (within
this test, but also with other test scripts that face the same problem).
There is one other interesting case, which is some tests that do:
This is rather confusing to read, but is correct. The outer script sees
'!3' in single-quotes, as does the eval'd snippet. This is perhaps being
overly cautious. In many interactive shells, an exclamation triggers
history expansion even inside double quotes, but that is not generally
true in non-interactive shells.
There's some conflicting information here. Commit 784ce03d55 (t4216:
avoid unnecessary subshell in test_bloom_filters_not_used, 2020-05-19)
reports it as a problem with OpenBSD 6.7's /bin/sh. However, we have
many instances in this script of prereqs like !LAZY_TRUE, which haven't
been a problem. I left them un-escaped here to test out this theory.
It's much nicer if we can not worry about this as a portability issue,
so it's worth knowing.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 28 Jan 2021 06:32:35 +0000 (01:32 -0500)]
t0000: run cleaning test inside sub-test
Our check of test_when_finished is done directly in the main script, and
if we failed to clean, we complain and exit immediately. It's nicer to
signal a test failure here, for a few reasons:
- this gives better output to the user when run under a TAP harness
like "prove"
- constency; it's the only test left in the file that behaves this way
- half of its "if" conditional is nonsense anyway; it picked up a
reference to GIT_TEST_FAIL_PREREQS_INTERNAL in dfe1a17df9 (tests:
add a special setup where prerequisites fail, 2019-05-13) along with
its neighbors, even though it has nothing to do with that flag
We could actually do this without a sub-test at all, and just put our
two tests (one to do cleanup, and one to check that it happened) in the
main script. But doing it in a subtest is conceptually cleaner (from the
perspective of the main test script, we are checking only one thing),
and it remains consistent with the "cleanup when failing" test directly
after it, which has to happen in a sub-test (to avoid the main script
complaining of the failed test).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 28 Jan 2021 06:32:32 +0000 (01:32 -0500)]
t0000: run prereq tests inside sub-test
We test the behavior of prerequisites in t0000 by setting up fake ones
in the main test script, trying to run some tests, and then seeing if
those tests impacted the environment correctly. If they didn't, then we
write a message and manually call exit.
Instead, let's push these down into a sub-test, like many of the other
tests covering the framework itself. This has a few advantages:
- it does not pollute the test output with mention of skipped tests
(that we know are uninteresting -- the point of the test was to see
that these are skipped).
- when running in a TAP harness, we get a useful test failure message
(whereas when the script exits early, a tool like "prove" simply
says "Dubious, test returned 1").
- we do not have to worry about different test environments, such as
when GIT_TEST_FAIL_PREREQS_INTERNAL is set. Our sub-test helpers
already give us a known environment.
- the tests themselves are a bit easier to read, as we can just check
the test-framework output to see what happened (and get the usual
test_cmp diff if it failed)
A few notes on the implementation:
- we could do one sub-test per each individual test_expect_success. I
broke it up here into a few logical groups, as I think this makes it
more readable
- the original tests modified environment variables inside the test
bodies. Instead, I've used "true" as the body of a test we expect to
run and "false" otherwise. Technically this does not confirm that
the body of the "true" test actually ran. We are trusting the
framework output to believe that it truly ran, which is sufficient
for these tests. And I think the end result is much simpler to
follow.
- the nested_prereq test uses a few bare "test -f" calls; I converted
these to our usual test_path_is_* helpers while moving the code
around.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 28 Jan 2021 06:32:28 +0000 (01:32 -0500)]
t0000: keep clean-up tests together
We check that test_when_finished cleans up after a test, and that it
runs even after a failure. Those two were originally adjacent, but got
split apart by the new test added in 477dcaddb6 (tests: do not let lazy
prereqs inside `test_expect_*` turn off tracing, 2020-03-26), and then
further by more lazy-prereq tests. Let's move them back together.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 28 Jan 2021 06:20:23 +0000 (01:20 -0500)]
oid_pos(): access table through const pointers
When we are looking up an oid in an array, we obviously don't need to
write to the array. Let's mark it as const in the function interfaces,
as well as in the local variables we use to derference the void pointer
(note a few cases use pointers-to-pointers, so we mark everything
const).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>