git.ipfire.org Git - thirdparty/git.git/log

merge-ort: restart merge with cached renames to reduce process entry cost

The merge algorithm mostly consists of the following three functions:
   collect_merge_info()
   detect_and_process_renames()
   process_entries()
Prior to the trivial directory resolution optimization of the last half
dozen commits, process_entries() was consistently the slowest, followed
by collect_merge_info(), then detect_and_process_renames().  When the
trivial directory resolution applies, it often dramatically decreases
the amount of time spent in the two slower functions.

Looking at the performance results in the previous commit, the trivial
directory resolution optimization helps amazingly well when there are no
relevant renames.  It also helps really well when reapplying a long
series of linear commits (such as in a rebase or cherry-pick), since the
relevant renames may well be cached from the first reapplied commit.
But when there are any relevant renames that are not cached (represented
by the just-one-mega testcase), then the optimization does not help at
all.

Often, I noticed that when the optimization does not apply, it is
because there are a handful of relevant sources -- maybe even only one.
It felt frustrating to need to recurse into potentially hundreds or even
thousands of directories just for a single rename, but it was needed for
correctness.

However, staring at this list of functions and noticing that
process_entries() is the most expensive and knowing I could avoid it if
I had cached renames suggested a simple idea: change
   collect_merge_info()
   detect_and_process_renames()
   process_entries()
into
   collect_merge_info()
   detect_and_process_renames()
   <cache all the renames, and restart>
   collect_merge_info()
   detect_and_process_renames()
   process_entries()

This may seem odd and look like more work.  However, note that although
we run collect_merge_info() twice, the second time we get to employ
trivial directory resolves, which makes it much faster, so the increased
time in collect_merge_info() is small.  While we run
detect_and_process_renames() again, all renames are cached so it's
nearly a no-op (we don't call into diffcore_rename_extended() but we do
have a little bit of data structure checking and fixing up).  And the
big payoff comes from the fact that process_entries(), will be much
faster due to having far fewer entries to process.

This restarting only makes sense if we can save recursing into enough
directories to make it worth our while.  Introduce a simple heuristic to
guide this.  Note that this heuristic uses a "wanted_factor" that I have
virtually no actual real world data for, just some back-of-the-envelope
quasi-scientific calculations that I included in some comments and then
plucked a simple round number out of thin air.  It could be that
tweaking this number to make it either higher or lower improves the
optimization.  (There's slightly more here; when I first introduced this
optimization, I used a factor of 10, because I was completely confident
it was big enough to not cause slowdowns in special cases.  I was
certain it was higher than needed.  Several months later, I added the
rough calculations which make me think the optimal number is close to 2;
but instead of pushing to the limit, I just bumped it to 3 to reduce the
risk that there are special cases where this optimization can result in
slowing down the code a little.  If the ratio of path counts is below 3,
we probably will only see minor performance improvements at best
anyway.)

Also, note that while the diffstat looks kind of long (nearly 100
lines), more than half of it is in two comments explaining how things
work.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:      205.1  ms ±  3.8  ms   204.2  ms ±  3.0  ms
    mega-renames:      1.564 s ±  0.010 s     1.076 s ±  0.015 s
    just-one-mega:   479.5  ms ±  3.9  ms   364.1  ms ±  7.0  ms

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-ort: avoid recursing into directories when we don't need to

This combines the work of the last several patches, and implements the
conditions when we don't need to recurse into directories.  It's perhaps
easiest to see the logic by separating the fact that a directory might
have both rename sources and rename destinations:

  * rename sources: only files present in the merge base can serve as
    rename sources, and only when one side deletes that file.  When the
    tree on one side matches the merge base, that means every file
    within the subtree matches the merge base.  This means that the
    skip-irrelevant-rename-detection optimization from before kicks in
    and we don't need any of these files as rename sources.

  * rename destinations: the tree that does not match the merge base
    might have newly added and hence unmatched destination files.
    This is what usually prevents us from doing trivial directory
    resolutions in the merge machinery.  However, the fact that we have
    deferred recursing into this directory until the end means we know
    whether there are any unmatched relevant potential rename sources
    elsewhere in this merge.  If there are no unmatched such relevant
    sources anywhere, then there is no need to look for unmatched
    potential rename destinations to match them with.

This informs our algorithm:
  * Search through relevant_sources; if we have entries, they better all
    be reflected in cached_pairs or cached_irrelevant, otherwise they
    represent an unmatched potential rename source (causing the
    optimization to be disallowed).
  * For any relevant_source represented in cached_pairs, we do need to
    to make sure to get the destination for each source, meaning we need
    to recurse into any ancestor directories of those destinations.
  * Once we've recursed into all the rename destinations for any
    relevant_sources in cached_pairs, we can then do the trivial
    directory resolution for the remaining directories.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:        5.235 s ±  0.042 s   205.1  ms ±  3.8  ms
    mega-renames:      9.419 s ±  0.107 s     1.564 s ±  0.010 s
    just-one-mega:   480.1  ms ±  3.9  ms   479.5  ms ±  3.9  ms

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-ort: defer recursing into directories when merge base is matched

When one side of history matches the merge base (including when the
merge base has no entry for the given directory), have
collect_merge_info_callback() defer recursing into the directory.  To
ensure those entries are eventually handled, add a call to
handled_deferred_entries() in collect_merge_info() after
traverse_trees() returns.

Note that the condition in collect_merge_info_callback() may look more
complicated than necessary at first glance;
renames->trivial_merges_okay[side] is always true until
handle_deferred_entries() is called, and possible_trivial_merges[side]
is always empty right now (and in the future won't be filled until
handle_deferred_entries() is called).  However, when
handle_deferred_entries() calls traverse_trees() for the relevant
deferred directories, those traverse_trees() calls will once again end
up in collect_merge_info_callback() for all the entries under those
subdirectories.  The extra conditions are there for such deferred cases
and will be used more as we do more with those variables.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-ort: add a handle_deferred_entries() helper function

In order to allow trivial directory resolution, we first need to be able
to gather more information to determine if the optimization is safe.  To
enable that, we need a way of deferring the recursion into the directory
until a later time.  Naturally, deferring the entry into a subtree means
that we need some function that will later recurse into the subdirectory
exactly the same way that collect_merge_info_callback() would have done.

Add a helper function that does this.  For now this function is not used
but a subsequent commit will change that.  Future commits will also make
the function sometimes resolve directories instead of traversing inside.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-ort: add data structures for allowable trivial directory resolves

As noted a few commits ago, we can resolve individual files early if all
three sides of the merge have a file at the path and two of the three
sides match.  We would really like to do the same thing with
directories, because being able to do a trivial directory resolve means
we don't have to recurse into the directory, potentially saving us a
huge amount of time in both collect_merge_info() and process_entries().
Unfortunately, resolving directories early would mean missing any
renames whose source or destination is underneath that directory.

If we somehow knew there weren't any renames under the directory in
question, then we could resolve it early.  Sadly, it is impossible to
determine whether there are renames under the directory in question
without recursing into it, and this has traditionally kept us from ever
implementing such an optimization.

In commit f89b4f2bee ("merge-ort: skip rename detection entirely if
possible", 2021-03-11), we added an additional reason that rename
detection could be skipped entirely -- namely, if no *relevant* sources
were present.  Without completing collect_merge_info_callback(), we do
not yet know if there are no relevant sources.  However, we do know that
if the current directory on one side matches the merge base, then every
source file within that directory will not be RELEVANT_CONTENT, and a
few simple checks can often let us rule out RELEVANT_LOCATION as well.
This suggests we can just defer recursing into such directories until
the end of collect_merge_info.

Since the deferred directories are known to not add any relevant sources
due to the above properties, then if there are no relevant sources after
we've traversed all paths other than the deferred ones, then we know
there are not any relevant sources.  Under those conditions, rename
detection is unnecessary, and that means we can resolve the deferred
directories without recursing into them.

Note that the logic for skipping rename detection was also modified
further in commit 76e253793c ("merge-ort, diffcore-rename: employ cached
renames when possible", 2021-01-30); in particular rename detection can
be skipped if we already have cached renames for each relevant source.
We can take advantage of this information as well with our deferral of
recursing into directories where one side matches the merge base.

Add some data structures that we will use to do these deferrals, with
some lengthy comments explaining their purpose.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-ort: add some more explanations in collect_merge_info_callback()

The previous patch possibly raises some questions about whether
additional cases in collect_merge_info_callback() can be handled early.
Add some explanations in the form of comments to help explain these
better. While we're at it, add a few comments to denote what a few
boolean '0' or '1' values stand for.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-ort: resolve paths early when we have sufficient information

When there are no directories involved at a given path, and all three
sides have a file at that path, and two of the three sides of history
match, we can immediately resolve the merge of that path in
collect_merge_info() and do not need to wait until process_entries().

This is actually a very minor improvement: half the time when I run it,
I see an improvement; the other half a slowdown. It seems to be in the
range of noise. However, this idea serves as the beginning of some
bigger optimizations coming in the following patches.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

The fifth batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'ds/gender-neutral-doc'

Update the documentation not to assume users are of certain gender
and adds to guidelines to do so.

* ds/gender-neutral-doc:
  *: fix typos
  comments: avoid using the gender of our users
  doc: avoid using the gender of other people

Merge branch 'jt/partial-clone-submodule-1'

Prepare the internals for lazily fetching objects in submodules
from their promisor remotes.

* jt/partial-clone-submodule-1:
  promisor-remote: teach lazy-fetch in any repo
  run-command: refactor subprocess env preparation
  submodule: refrain from filtering GIT_CONFIG_COUNT
  promisor-remote: support per-repository config
  repository: move global r_f_p_c to repo struct

Merge branch 'ab/struct-init'

Code cleanup around struct_type_init() functions.

* ab/struct-init:
  string-list.h users: change to use *_{nodup,dup}()
  string-list.[ch]: add a string_list_init_{nodup,dup}()
  dir.[ch]: replace dir_init() with DIR_INIT
  *.c *_init(): define in terms of corresponding *_INIT macro
  *.h: move some *_INIT to designated initializers

Merge branch 'dd/test-stdout-count-lines'

Tiny test clean-up.

* dd/test-stdout-count-lines:
  t6402: preserve git exit status code
  t6400: preserve git ls-files exit status code
  test-lib-functions: introduce test_stdout_line_count

Merge branch 'hn/refs-test-cleanup'

Test clean-up.

* hn/refs-test-cleanup:
t7509: avoid direct file access for writing CHERRY_PICK_HEAD
t1415: avoid direct filesystem access for writing refs

Merge branch 'rs/khash-alloc-cleanup'

Code clean-up.

* rs/khash-alloc-cleanup:
khash: clarify that allocations never fail

Merge branch 'ar/help-micro-cleanup'

Tiny code clean-up.

* ar/help-micro-cleanup:
help: convert git_cmd to page in one place

Merge branch 'ar/submodule-helper-include-cleanup'

Code clean-up.

* ar/submodule-helper-include-cleanup:
submodule--helper: remove redundant include

Merge branch 'ab/bundle-updates'

Code clean-up and leak plugging in "git bundle".

* ab/bundle-updates:
  bundle: remove "ref_list" in favor of string-list.c API
  bundle.c: use a temporary variable for OIDs and names
  bundle cmd: stop leaking memory from parse_options_cmd_bundle()

Merge branch 'hn/refs-iterator-peel-returns-boolean'

Tiny API tweak.

* hn/refs-iterator-peel-returns-boolean:
refs: make explicit that ref_iterator_peel returns boolean

Merge branch 'ab/mktag-tests'

Fill test gaps.

* ab/mktag-tests:
  mktag tests: test fast-export
  mktag tests: test for-each-ref
  mktag tests: test update-ref and reachable fsck
  mktag tests: test hash-object --literally and unreachable fsck
  mktag tests: invert --no-strict test
  mktag tests: parse out options in helper

Merge branch 'ab/show-branch-tests'

Fill test gaps.

* ab/show-branch-tests:
  show-branch tests: add missing tests
  show-branch: don't <COLOR></RESET> for space characters
  show-branch tests: modernize test code
  show-branch tests: rename the one "show-branch" test file

Merge branch 'ab/fetch-negotiate-segv-fix'

Code recently added to support common ancestry negotiation during
"git push" did not sanity check its arguments carefully enough.

* ab/fetch-negotiate-segv-fix:
  fetch: fix segfault in --negotiate-only without --negotiation-tip=*
  fetch: document the --negotiate-only option
  send-pack.c: move "no refs in common" abort earlier

Merge branch 'ab/make-delete-on-error'

Use ".DELETE_ON_ERROR" pseudo target to simplify our Makefile.

* ab/make-delete-on-error:
Makefile: add and use the ".DELETE_ON_ERROR" flag

Merge branch 'ew/mmap-failures'

Error message update.

* ew/mmap-failures:
xmmap: inform Linux users of tuning knobs on ENOMEM

Merge branch 'js/config-mak-windows-pcre-fix'

Whitespace fix.

* js/config-mak-windows-pcre-fix:
config.mak.uname: PCRE1 cleanup

Merge branch 'js/gfw-system-config-loc-fix'

Update the location of system-side configuration file on Windows.

* js/gfw-system-config-loc-fix:
  config: normalize the path of the system gitconfig
  cmake(windows): set correct path to the system Git config
  mingw: move Git for Windows' system config where users expect it

Merge branch 'ks/submodule-cleanup'

Code cleanup.

* ks/submodule-cleanup:
submodule: remove unnecessary `prefix` based option logic

Merge branch 'tb/midx-use-checksum'

When rebuilding the multi-pack index file reusing an existing one,
we used to blindly trust the existing file and ended up carrying
corrupted data into the updated file, which has been corrected.

* tb/midx-use-checksum:
  midx: report checksum mismatches during 'verify'
  midx: don't reuse corrupt MIDXs when writing
  commit-graph: rewrite to use checksum_valid()
  csum-file: introduce checksum_valid()

Merge branch 'en/merge-dir-rename-corner-case-fix'

The merge code had funny interactions between content based rename
detection and directory rename detection.

* en/merge-dir-rename-corner-case-fix:
  merge-recursive: handle rename-to-self case
  merge-ort: ensure we consult df_conflict and path_conflicts
  t6423: test directory renames causing rename-to-self

Merge branch 'en/ort-perf-batch-13'

Performance tweaks of "git merge -sort" around lazy fetching of objects.

* en/ort-perf-batch-13:
  merge-ort: add prefetching for content merges
  diffcore-rename: use a different prefetch for basename comparisons
  diffcore-rename: allow different missing_object_cb functions
  t6421: add tests checking for excessive object downloads during merge
  promisor-remote: output trace2 statistics for number of objects fetched

Merge branch 'en/ort-perf-batch-12'

More fix-ups and optimization to "merge -sort".

* en/ort-perf-batch-12:
  merge-ort: miscellaneous touch-ups
  Fix various issues found in comments
  diffcore-rename: avoid unnecessary strdup'ing in break_idx
  merge-ort: replace string_list_df_name_compare with faster alternative

The fourth batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'rs/grep-parser-fix'

"git grep --and -e foo" ought to have been diagnosed as an error
but instead segfaulted, which has been corrected.

* rs/grep-parser-fix:
grep: report missing left operand of --and

Merge branch 'bk/doc-commit-typofix'

Doc typo/grammo-fix.

* bk/doc-commit-typofix:
Documentation: fix typo in the --patch option of the commit command

Merge branch 'dc/p4-binary-submit-fix'

Prevent "git p4" from failing to submit changes to binary file.

* dc/p4-binary-submit-fix:
git-p4: fix failed submit by skip non-text data files

Merge branch 'ab/pre-auto-gc-hook-test'

Test fix.

* ab/pre-auto-gc-hook-test:
gc tests: add a test for the "pre-auto-gc" hook

Merge branch 'jk/union-merge-binary'

The "union" conflict resolution variant misbehaved when used with
binary merge driver.

* jk/union-merge-binary:
  ll_union_merge(): rename path_unused parameter
  ll_union_merge(): pass name labels to ll_xdl_merge()
  ll_binary_merge(): handle XDL_MERGE_FAVOR_UNION

Merge branch 'mr/cmake'

CMake update.

* mr/cmake:
  cmake: add warning for ignored MSGFMT_EXE
  cmake: create compile_commands.json by default
  cmake: add knob to disable vcpkg

Merge branch 'ab/describe-tests-fix'

Various updates to tests around "git describe"

* ab/describe-tests-fix:
  describe tests: support -C in "check_describe"
  describe tests: fix nested "test_expect_success" call
  describe tests: don't rely on err.actual from "check_describe"
  describe tests: refactor away from glob matching
  describe tests: improve test for --work-tree & --dirty

Merge branch 'ab/pickaxe-pcre2'

Rewrite the backend for "diff -G/-S" to use pcre2 engine when
available.

* ab/pickaxe-pcre2: (22 commits)
  xdiff-interface: replace discard_hunk_line() with a flag
  xdiff users: use designated initializers for out_line
  pickaxe -G: don't special-case create/delete
  pickaxe -G: terminate early on matching lines
  xdiff-interface: allow early return from xdiff_emit_line_fn
  xdiff-interface: prepare for allowing early return
  pickaxe -S: slightly optimize contains()
  pickaxe: rename variables in has_changes() for brevity
  pickaxe -S: support content with NULs under --pickaxe-regex
  pickaxe: assert that we must have a needle under -G or -S
  pickaxe: refactor function selection in diffcore-pickaxe()
  perf: add performance test for pickaxe
  pickaxe/style: consolidate declarations and assignments
  diff.h: move pickaxe fields together again
  pickaxe: die when --find-object and --pickaxe-all are combined
  pickaxe: die when -G and --pickaxe-regex are combined
  pickaxe tests: add missing test for --no-pickaxe-regex being an error
  pickaxe tests: test for -G, -S and --find-object incompatibility
  pickaxe tests: add test for "log -S" not being a regex
  pickaxe tests: add test for diffgrep_consume() internals
  ...

Merge branch 'hn/prep-tests-for-reftable'

Preliminary clean-up of tests before the main reftable changes
hits the codebase.

* hn/prep-tests-for-reftable: (22 commits)
  t1415: set REFFILES for test specific to storage format
  t4202: mark bogus head hash test with REFFILES
  t7003: check reflog existence only for REFFILES
  t7900: stop checking for loose refs
  t1404: mark tests that muck with .git directly as REFFILES.
  t2017: mark --orphan/logAllRefUpdates=false test as REFFILES
  t1414: mark corruption test with REFFILES
  t1407: require REFFILES for for_each_reflog test
  test-lib: provide test prereq REFFILES
  t5304: use "reflog expire --all" to clear the reflog
  t5304: restyle: trim empty lines, drop ':' before >
  t7003: use rev-parse rather than FS inspection
  t5000: inspect HEAD using git-rev-parse
  t5000: reformat indentation to the latest fashion
  t1301: fix typo in error message
  t1413: use tar to save and restore entire .git directory
  t1401-symbolic-ref: avoid direct filesystem access
  t1401: use tar to snapshot and restore repo state
  t5601: read HEAD using rev-parse
  t9300: check ref existence using test-helper rather than a file system check
  ...

Merge branch 'fc/push-simple-updates-cleanup'

Some more code and doc clarification around "git push".

* fc/push-simple-updates-cleanup:
  push: don't get a full remote object
  push: only check same_remote when needed
  push: remove trivial function
  push: remove redundant check
  push: factor out the typical case
  push: get rid of all the setup_push_* functions
  push: trivial simplifications
  push: make setup_push_* return the dst
  push: only get the branch when needed
  push: factor out null branch check
  push: split switch cases
  push: return immediately in trivial switch case
  push: create new get_upstream_ref() helper

Merge branch 'fc/push-simple-updates'

Some code and doc clarification around "git push".

* fc/push-simple-updates:
  doc: push: explain default=simple correctly
  push: remove unused code in setup_push_upstream()
  push: simplify setup_push_simple()
  push: reorganize setup_push_simple()
  push: copy code to setup_push_simple()
  push: hedge code of default=simple
  push: rename !triangular to same_remote

Merge branch 'zh/cat-file-batch-fix'

"git cat-file --batch-all-objects"" misbehaved when "--batch" is in
use and did not ask for certain object traits.

* zh/cat-file-batch-fix:
cat-file: merge two block into one
cat-file: handle trivial --batch format with --batch-all-objects

The third batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'js/stop-exporting-bogus-columns'

When we cannot figure out how wide the terminal is, we use a
fallback value of 80 ourselves (which cannot be avoided), but when
we run the pager, we export it in COLUMNS, which forces the pager
to use the hardcoded value, even when the pager is perfectly
capable to figure it out itself. Stop exporting COLUMNS when we
fall back on the hardcoded default value for our own use.

* js/stop-exporting-bogus-columns:
pager: avoid setting COLUMNS when we're guessing its value

Merge branch 'dd/document-log-decorate-default'

Doc clean-up.

* dd/document-log-decorate-default:
doc/log: correct default for --decorate

Merge branch 'ar/test-code-cleanup'

Test code clean-up.

* ar/test-code-cleanup:
t: fix whitespace around &&

Merge branch 'ba/object-info'

Code clean-up.

* ba/object-info:
protocol-caps.h: add newline at end of file

Merge branch 'ab/progress-cleanup'

Code clean-up.

* ab/progress-cleanup:
read-cache.c: don't guard calls to progress.c API

Merge branch 'ab/xdiff-bug-cleanup'

Code clean-up.

* ab/xdiff-bug-cleanup:
xdiff: use BUG(...), not xdl_bug(...)

Merge branch 'ms/mergetools-kdiff3-on-windows'

On Windows, mergetool has been taught to find kdiff3.exe just like
it finds winmerge.exe.

* ms/mergetools-kdiff3-on-windows:
mergetools/kdiff3: make kdiff3 work on Windows too

Merge branch 'ab/cmd-foo-should-return'

Code clean-up.

* ab/cmd-foo-should-return:
builtins + test helpers: use return instead of exit() in cmd_*

Merge branch 'ar/doc-libera-chat-in-my-first-contrib'

Update MyFirst document.

* ar/doc-libera-chat-in-my-first-contrib:
MyFirstContribution: link #git-devel to Libera Chat

Merge branch 'ar/mailinfo-memcmp-to-skip-prefix'

Code clean-up.

* ar/mailinfo-memcmp-to-skip-prefix:
mailinfo: use starts_with() when checking scissors

Merge branch 'jk/doc-max-pack-size'

Doc update.

* jk/doc-max-pack-size:
doc: warn people against --max-pack-size

Merge branch 'ab/fix-columns-to-80-during-tests'

Output from some of our tests were affected by the width of the
terminal that they were run in, which has been corrected by
exporting a fixed value in the COLUMNS environment.

* ab/fix-columns-to-80-during-tests:
test-lib.sh: set COLUMNS=80 for --verbose repeatability

Merge branch 'ar/more-typofix'

Typofixes.

* ar/more-typofix:
  git-worktree.txt: fix typo in example path
  t: fix typos in test messages
  blame: correct name of config option in docs

Merge branch 'fw/complete-cmd-idx-fix'

Recent update to completion script (in contrib/) broke those who
use the __git_complete helper to define completion to their custom
command.

* fw/complete-cmd-idx-fix:
completion: bash: fix late declaration of __git_cmd_idx

Merge branch 'jk/test-without-readlink-1'

Some test scripts assumed that readlink(1) was universally
installed and available, which is not the case.

* jk/test-without-readlink-1:
t: use portable wrapper for readlink(1)

Merge branch 'jx/sideband-cleanup'

The side-band demultiplexer that is used to display progress output
from the remote end did not clear the line properly when the end of
line hits at a packet boundary, which has been corrected.  Also
comes with test clean-ups.

* jx/sideband-cleanup:
  test: refactor to use "get_abbrev_oid" to get abbrev oid
  test: refactor to use "test_commit" to create commits
  test: compare raw output, not mangle tabs and spaces
  sideband: don't lose clear-to-eol at packet boundary

Merge branch 'jk/test-avoid-globmatch-with-skip-patterns'

We broke "GIT_SKIP_TESTS=t?000" to skip certain tests in recent
update, which got fixed.

* jk/test-avoid-globmatch-with-skip-patterns:
test-lib: avoid accidental globbing in match_pattern_list()

Merge branch 'jv/userdiff-csharp-update'

The userdiff pattern for C# learned the token "record".

* jv/userdiff-csharp-update:
userdiff: add support for C# record types

Merge branch 'ab/config-hooks-path-testfix'

Test fix.

* ab/config-hooks-path-testfix:
pre-commit hook tests: don't leave "actual" nonexisting on failure

Merge branch 'fc/pull-cleanups'

Code cleanup.

* fc/pull-cleanups:
  pull: trivial whitespace style fix
  pull: trivial cleanup
  pull: cleanup autostash check

Merge branch 'jk/bitmap-tree-optim'

Avoid duplicated work while building reachability bitmaps.

* jk/bitmap-tree-optim:
bitmaps: don't recurse into trees already in the bitmap

Merge branch 'ah/graph-typofix'

Typofix in an error message.

* ah/graph-typofix:
graph: improve grammar of "invalid color" error message

Merge branch 'jx/t6020-with-older-bash'

Work around inefficient glob substitution in older versions of bash
by rewriting parts of a test.

* jx/t6020-with-older-bash:
t6020: fix incompatible parameter expansion

Merge branch 'ar/typofix'

Typofixes.

* ar/typofix:
*: fix typos which duplicate a word

Merge branch 'jk/revision-squelch-gcc-warning'

Warning fix.

* jk/revision-squelch-gcc-warning:
add_pending_object_with_path(): work around "gcc -O3" complaint

Merge branch 'ah/uninitialized-reads-fix'

Make the codebase MSAN clean.

* ah/uninitialized-reads-fix:
  builtin/checkout--worker: zero-initialise struct to avoid MSAN complaints
  split-index: use oideq instead of memcmp to compare object_id's
  bulk-checkin: make buffer reuse more obvious and safer

Merge branch 'js/no-more-multimail'

Remove multimail from contrib/

* js/no-more-multimail:
multimail: stop shipping a copy

Merge branch 'js/subtree-on-windows-fix'

Update "git subtree" to work better on Windows.

* js/subtree-on-windows-fix:
subtree: fix assumption about the directory separator
subtree: fix the GIT_EXEC_PATH sanity check to work on Windows

Merge branch 'dd/svn-test-wo-locale-a'

"git-svn" tests assumed that "locale -a", which is used to pick an
available UTF-8 locale, is available everywhere. A knob has been
introduced to allow testers to specify a suitable locale to use.

* dd/svn-test-wo-locale-a:
t: use user-specified utf-8 locale for testing svn

Merge branch 'fc/doc-default-to-upstream-config'

Doc clean-up.

* fc/doc-default-to-upstream-config:
doc: merge: mention default of defaulttoupstream

Merge branch 'js/trace2-discard-event-docfix'

Docfix.

* js/trace2-discard-event-docfix:
docs: fix api-trace2 doc for "too_many_files" event

Merge branch 'tb/complete-diff-anchored'

The command line completion (in contrib/) learned that "git diff"
takes the "--anchored" option.

* tb/complete-diff-anchored:
completion: add --anchored to diff's options

Merge branch 'tk/partial-clone-repack-doc'

Docfix.

* tk/partial-clone-repack-doc:
Remove warning that repack only works on non-promisor packfiles

fetch: fix segfault in --negotiate-only without --negotiation-tip=*

The recent --negotiate-only option would segfault in the call to
oid_array_for_each() in negotiate_using_fetch() unless one or more
--negotiation-tip=* options were provided.

All of the other tests for the feature combine both, but nothing was
checking this assumption, let's do that and add a test for it. Fixes a
bug in 9c1e657a8fd (fetch: teach independent negotiation (no
packfile), 2021-05-04).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

submodule--helper: remove redundant include

"dir.h" should have been included only once.

Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Shourya Shukla <periperidip@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

help: convert git_cmd to page in one place

Depending on the chosen format of help pages, git-help uses function
show_man_page, show_info_page, or show_html_page. The first thing all
three functions do is to convert given `git_cmd` to a `page` using
function cmd_to_page.

Move the common part of these three functions to function cmd_help to
avoid code duplication.

Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Reviewed-by: Felipe Contreras <felipe.contreras@gmail.com>
Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

khash: clarify that allocations never fail

We use our standard allocation functions and macros (xcalloc,
ALLOC_ARRAY, REALLOC_ARRAY) in our version of khash.h. They terminate
the program on error instead, so code that's using them doesn't have to
handle allocation failures. Make this behavior explicit by turning
kh_resize_ into a void function and removing the related unreachable
error handling code.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t7509: avoid direct file access for writing CHERRY_PICK_HEAD

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t1415: avoid direct filesystem access for writing refs

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t6402: preserve git exit status code

In t6402, we're checking number of files in the index and the working
tree by piping the output of Git's command to "wc -l", thus losing the
exit status code of git.

Let's use the new helper test_stdout_line_count in order to preserve
Git's exit status code.

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t6400: preserve git ls-files exit status code

In t6400, we're checking number of files in the index and the working
tree by piping the output of "git ls-files" to "wc -l", thus losing the
exit status code of git.

Let's use the newly introduced test_stdout_line_count in order to check
the exit status code of Git's command.

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

test-lib-functions: introduce test_stdout_line_count

In some tests, we're checking the number of lines in output of some
commands, including but not limited to Git's command.

We're doing the check by running those commands in the left side of
a pipe, thus losing the exit status code of those commands. Meanwhile,
we really want to check the exit status code of Git's command.

Let's write the output of those commands to a temporary file, and use
test_line_count separately in order to check exit status code of
those commands properly.

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

bundle: remove "ref_list" in favor of string-list.c API

Move away from the "struct ref_list" in bundle.c in favor of the
almost identical string-list.c API.

That API fits this use-case perfectly, but did not exist in its
current form when this code was added in 2e0afafebd (Add git-bundle:
move objects and references by archive, 2007-02-22), with hindsight we
could have used the path-list API, which later got renamed to
string-list. See 8fd2cb4069 (Extract helper bits from
c-merge-recursive work, 2006-07-25)

We need to change "name" to "string" and "oid" to "util" to make this
conversion, but other than that the APIs are pretty much identical for
what bundle.c made use of.

Let's also replace the memset(..,0,...) pattern with a more idiomatic
"INIT" macro, and finally add a *_release() function so to free the
allocated memory.

Before this the add_to_ref_list() would leak memory, now e.g. "bundle
list-heads" reports no memory leaks at all under valgrind.

In the bundle_header_init() function we're using a clever trick to
memcpy() what we'd get from the corresponding
BUNDLE_HEADER_INIT. There is a concurrent series to make use of that
pattern more generally, see [1].

1. https://lore.kernel.org/git/cover-0.5-00000000000-20210701T104855Z-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

bundle.c: use a temporary variable for OIDs and names

In preparation for moving away from accessing the OID and name via the
"oid" and "name" slots in a subsequent commit, change the code that
accesses it to use named variables. This makes the subsequent change
smaller.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

bundle cmd: stop leaking memory from parse_options_cmd_bundle()

Fix a memory leak from the prefix_filename() function introduced with
its use in 3b754eedd5 (bundle: use prefix_filename with bundle path,
2017-03-20).

As noted in that commit the leak was intentional as a part of being
sloppy about freeing resources just before we exit, I'm changing this
because I'll be fixing other memory leaks in the bundle API (including
the library version) in subsequent commits. It's easier to reason
about those fixes if valgrind runs cleanly at the end without any
leaks whatsoever.

An earlier version of this change[1] went out of its way to not leak
memory on the die() codepaths here, but doing so will only avoid
reports of potential leaks under heap-only leak trackers such as
valgrind, not the SANITIZE=leak mode.

Avoiding those leaks as well might be useful to enable us to run
cleanly under the likes of valgrind in the future. But for now the
relative verbosity of the resulting code, and the fact that we don't
have some valgrind or SANITIZE=leak mode as part of our CI (it's only
run ad-hoc, see [2]), means we're not worrying about that for now.

1. https://lore.kernel.org/git/87v95vdxrc.fsf@evledraar.gmail.com/
2. https://lore.kernel.org/git/87czsv2idy.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

string-list.h users: change to use *_{nodup,dup}()

Change all in-tree users of the string_list_init(LIST, BOOL) API to
use string_list_init_{nodup,dup}(LIST) instead.

As noted in the preceding commit let's leave the now-unused
string_list_init() wrapper in-place for any in-flight users, it can be
removed at some later date.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

string-list.[ch]: add a string_list_init_{nodup,dup}()

In order to use the new "memcpy() a 'blank' struct on the stack"
pattern for string_list_init(), and to make the macro initialization
consistent with the function initialization introduce two new
string_list_init_{nodup,dup}() functions. These are like the old
string_list_init() when called with a false and true second argument,
respectively.

I think this not only makes things more consistent, but also easier to
read. I often had to lookup what the ", 0)" or ", 1)" in these
invocations meant, now it's right there in the function name, and
corresponds to the macros.

A subsequent commit will convert existing API users to this pattern,
but as this is a very common API let's leave a compatibility function
in place for later removal. This intermediate state also proves that
the compatibility function works.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

dir.[ch]: replace dir_init() with DIR_INIT

Remove the dir_init() function and replace it with a DIR_INIT
macro. In many cases in the codebase we need to initialize things with
a function for good reasons, e.g. needing to call another function on
initialization. The "dir_init()" function was not one such case, and
could trivially be replaced with a more idiomatic macro initialization
pattern.

The only place where we made use of its use of memset() was in
dir_clear() itself, which resets the contents of an an existing struct
pointer. Let's use the new "memcpy() a 'blank' struct on the stack"
idiom to do that reset.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

*.c *_init(): define in terms of corresponding *_INIT macro

Change the common patter in the codebase of duplicating the
initialization logic between an *_INIT macro and a
corresponding *_init() function to use the macro as the canonical
source of truth.

Now we no longer need to keep the function up-to-date with the macro
version. This implements a suggestion by Jeff King who found that
under -O2 [1] modern compilers will init new version in place without
the extra copy[1]. The performance of a single *_init() won't matter
in most cases, but even if it does we're going to be producing
efficient machine code to perform these operations.

1. https://lore.kernel.org/git/YNyrDxUO1PlGJvCn@coredump.intra.peff.net/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

*.h: move some *_INIT to designated initializers

Move *_INIT macros I'll use in a subsequent commits to designated
initializers. This isn't required for those follow-up changes, but
since next commits will change things in this area, let's use the
modern pattern over the old one while we're at it.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

test-lib: avoid accidental globbing in match_pattern_list()

We have a custom match_pattern_list() function which we use for matching
test names (like "t1234") against glob-like patterns (like "t1???") for
$GIT_SKIP_TESTS, --verbose-only, etc.

Those patterns may have multiple whitespace-separated elements (e.g.,
"t0* t1234 t5?78"). The callers of match_pattern_list thus pass the
strings unquoted, so that the shell does the usual field-splitting into
separate arguments.

But this also means the shell will do the usual globbing for each
argument, which can result in us seeing an expansion based on what's in
the filesystem, rather than the real pattern. For example, if I have the
path "t5000" in the filesystem, and you feed the pattern "t?000", that
_should_ match the string "t0000", but it won't after the shell has
expanded it to "t5000".

This has been a bug ever since that function was introduced. But it
didn't usually trigger since we typically use the function inside the
trash directory, which has a very limited set of files that are unlikely
to match. It became a lot easier to trigger after edc23840b0 (test-lib:
bring $remove_trash out of retirement, 2021-05-10), because now we match
$GIT_SKIP_TESTS before even entering the trash directory. So the t5000
example above can be seen with:

GIT_SKIP_TESTS=t?000 ./t0000-basic.sh

which should skip all tests but doesn't.

We can fix this by using "set -f" to ask the shell not to glob (which is
in POSIX, so should hopefully be portable enough). We only want to do
this in a subshell (to avoid polluting the rest of the script), which
means we need to get the whole string intact into the match_pattern_list
function by quoting it. Arguably this is a good idea anyway, since it
makes it much more obvious that we intend to split, and it's not simply
sloppy scripting.

Diagnosed-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fetch: document the --negotiate-only option

There was no documentation for the --negotiate-only option added in
9c1e657a8fd (fetch: teach independent negotiation (no packfile),
2021-05-04), only documentation for the related push.negotiation
option added in the following commit in 477673d6f39 (send-pack:
support push negotiation, 2021-05-04).

Let's document it, and update the cross-linking I'd added between
--negotiation-tip=* and 'fetch.negotiationAlgorithm' in
526608284a7 (fetch doc: cross-link two new negotiation options,
2018-08-01).

I think it would be better to say "in common with the remote" here
than "...the server", but the documentation for --negotiation-tip=*
above this talks about "the server", so let's continue doing that in
this related option. See 3390e42adb3 (fetch-pack: support negotiation
tip whitelist, 2018-07-02) for that documentation.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

send-pack.c: move "no refs in common" abort earlier

Move the early return if we have no remote refs in send_pack()
earlier.

When this was added in 4c353e890c0 (Warn when send-pack does nothing,
2005-12-04) one of the first things we'd do was to abort, but as of
cfee10a773b (send-pack/receive-pack: allow errors to be reported back
to pusher., 2005-12-25) we've added numerous server_supports()
conditions that are acted on later in the function, that won't be used
if we don't have remote refs.

Then as of 477673d6f39 (send-pack: support push negotiation,
2021-05-04) we started doing even more work on the assumption that we
had some remote refs to feed to --negotiation-tip=* options.

We only hit this condition if we have nothing to push, so we don't
need to consider "push.negotiate" etc. only to do nothing with that
information.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-recursive: handle rename-to-self case

Directory rename detection can cause transitive renames, e.g. if the two
different sides of history each do one half of:
    A/file -> B/file
    B/     -> C/
then directory rename detection transitively renames to give us
    A/file -> C/file

However, when C/ == A/, note that this gives us
    A/file -> A/file.

merge-recursive assumed that any rename D -> E would have D != E.  While
that is almost always true, the above is a special case where it is not.
So we cannot do things like delete the rename source, we cannot assume
that a file existing at path E implies a rename/add conflict and we have
to be careful about what stages end up in the output.

This change feels a bit hackish.  It took me surprisingly many hours to
find, and given merge-recursive's design causing it to attempt to
enumerate all combinations of edge and corner cases with special code
for each combination, I'm worried there are other similar fixes needed
elsewhere if we can just come up with the right special testcase.
Perhaps an audit would rule it out, but I have not the energy.
merge-recursive deserves to die, and since it is on its way out anyway,
fixing this particular bug narrowly will have to be good enough.

Reported-by: Anders Kaseorg <andersk@mit.edu>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-ort: ensure we consult df_conflict and path_conflicts

Path conflicts (typically rename path conflicts, e.g.
rename/rename(1to2) or rename/add/delete), and directory/file conflicts
should obviously result in files not being marked as clean in the merge.
We had a codepath where we missed consulting the path_conflict and
df_conflict flags, based on match_mask. Granted, it requires an unusual
setup to trigger this codepath (directory rename causing rename-to-self
is the only case I can think of), but we still need to handle it. To
make it clear that we have audited the other codepaths that do not
explicitly mention these flags, add some assertions that the flags are
not set.

Reported-by: Anders Kaseorg <andersk@mit.edu>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t6423: test directory renames causing rename-to-self

Directory rename detection can cause transitive renames, e.g. if the two
different sides of history each do one half of:
    A/file -> B/file
    B/     -> C/
then directory rename detection transitively renames to give us C/file.
Since the default for merge.directoryRenames is conflict, this results
in an error message saying it is unclear whether the file should be
placed at B/file or C/file.

What if C/ is A/, though?  In such a case, the transitive rename would
give us A/file, the original name we started with.  Logically, having
an error message with B/file vs. A/file should be fine, as should
leaving the file where it started.  But the logic in both
merge-recursive and merge-ort did not handle a case of a filename being
renamed to itself correctly; merge-recursive had two bugs, and merge-ort
had one.  Add some testcases covering such a scenario.

Based-on-testcase-by: Anders Kaseorg <andersk@mit.edu>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>