git.ipfire.org Git - thirdparty/git.git/log

t/perf: add iteration setup mechanism to perf-lib

Tests that affect the repo in stateful ways are easier to write if we
can run setup steps outside of the measured portion of perf iteration.

This change adds a "--setup 'setup-script'" parameter to test_perf. To
make invocations easier to understand, I also moved the prerequisites to
a new --prereq parameter.

The setup facility will be used in the upcoming perf tests for batch
mode, but it already helps in some existing tests, like t5302 and t7820.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

core.fsyncmethod: tests for batch mode

Add test cases to exercise batch mode for:
* 'git add'
* 'git stash'
* 'git update-index'
* 'git unpack-objects'

These tests ensure that the added data winds up in the object database.

In this change we introduce a new test helper lib-unique-files.sh. The
goal of this library is to create a tree of files that have different
oids from any other files that may have been created in the current test
repo. This helps us avoid missing validation of an object being added
due to it already being in the repo.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

test-lib-functions: add parsing helpers for ls-files and ls-tree

Several tests use awk to parse OIDs from the output of 'git ls-files
--stage' and 'git ls-tree'. Introduce helpers to centralize these uses
of awk.

Update t5317-pack-objects-filter-objects.sh to use the new ls-files
helper so that it has some usages to review. Other updates are left for
the future.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

core.fsync: use batch mode and sync loose objects by default on Windows

Git for Windows has defaulted to core.fsyncObjectFiles=true since
September 2017. We turn on syncing of loose object files with batch mode
in upstream Git so that we can get broad coverage of the new code
upstream.

We don't actually do fsyncs in the most of the test suite, since
GIT_TEST_FSYNC is set to 0. However, we do exercise all of the
surrounding batch mode code since GIT_TEST_FSYNC merely makes the
maybe_fsync wrapper always appear to succeed.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

unpack-objects: use the bulk-checkin infrastructure

The unpack-objects functionality is used by fetch, push, and fast-import
to turn the transfered data into object database entries when there are
fewer objects than the 'unpacklimit' setting.

By enabling an odb-transaction when unpacking objects, we can take advantage
of batched fsyncs.

Here are some performance numbers to justify batch mode for
unpack-objects, collected on a WSL2 Ubuntu VM.

Fsync Mode | Time for 90 objects (ms)
-------------------------------------
       Off | 170
  On,fsync | 760
  On,batch | 230

Note that the default unpackLimit is 100 objects, so there's a 3x
benefit in the worst case. The non-batch mode fsync scales linearly
with the number of objects, so there are significant benefits even with
smaller numbers of objects.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

update-index: use the bulk-checkin infrastructure

The update-index functionality is used internally by 'git stash push' to
setup the internal stashed commit.

This change enables odb-transactions for update-index infrastructure to
speed up adding new objects to the object database by leveraging the
batch fsync functionality.

There is some risk with this change, since under batch fsync, the object
files will be in a tmp-objdir until update-index is complete, so callers
using the --stdin option will not see them until update-index is done.
This risk is mitigated by flushing the ODB transaction prior to
reporting any verbose output so that objects will be visible to callers
that are synchronizing with update-index by snooping its output.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/add: add ODB transaction around add_files_to_cache

The add_files_to_cache function is invoked internally by
builtin/commit.c and builtin/checkout.c for their flags that stage
modified files before doing the larger operation. These commands
can benefit from batched fsyncing.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

cache-tree: use ODB transaction around writing a tree

Take advantage of the odb transaction infrastructure around writing the
cached tree to the object database.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

core.fsyncmethod: batched disk flushes for loose-objects

When adding many objects to a repo with `core.fsync=loose-object`,
the cost of fsync'ing each object file can become prohibitive.

One major source of the cost of fsync is the implied flush of the
hardware writeback cache within the disk drive. This commit introduces
a new `core.fsyncMethod=batch` option that batches up hardware flushes.
It hooks into the bulk-checkin odb-transaction functionality, takes
advantage of tmp-objdir, and uses the writeout-only support code.

When the new mode is enabled, we do the following for each new object:
1a. Create the object in a tmp-objdir.
2a. Issue a pagecache writeback request and wait for it to complete.

At the end of the entire transaction when unplugging bulk checkin:
1b. Issue an fsync against a dummy file to flush the log and hardware
   writeback cache, which should by now have seen the tmp-objdir writes.
2b. Rename all of the tmp-objdir files to their final names.
3b. When updating the index and/or refs, we assume that Git will issue
   another fsync internal to that operation. This is not the default
   today, but the user now has the option of syncing the index and there
   is a separate patch series to implement syncing of refs.

On a filesystem with a singular journal that is updated during name
operations (e.g. create, link, rename, etc), such as NTFS, HFS+, or XFS
we would expect the fsync to trigger a journal writeout so that this
sequence is enough to ensure that the user's data is durable by the time
the git command returns. This sequence also ensures that no object files
appear in the main object store unless they are fsync-durable.

Batch mode is only enabled if core.fsync includes loose-objects. If
the legacy core.fsyncObjectFiles setting is enabled, but core.fsync does
not include loose-objects, we will use file-by-file fsyncing.

In step (1a) of the sequence, the tmp-objdir is created lazily to avoid
work if no loose objects are ever added to the ODB. We use a tmp-objdir
to maintain the invariant that no loose-objects are visible in the main
ODB unless they are properly fsync-durable. This is important since
future ODB operations that try to create an object with specific
contents will silently drop the new data if an object with the target
hash exists without checking that the loose-object contents match the
hash. Only a full git-fsck would restore the ODB to a functional state
where dataloss doesn't occur.

In step (1b) of the sequence, we issue a fsync against a dummy file
created specifically for the purpose. This method has a little higher
cost than using one of the input object files, but makes adding new
callers of this mechanism easier, since we don't need to figure out
which object file is "last" or risk sharing violations by caching the fd
of the last object file.

_Performance numbers_:

Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
Windows - Same host as Linux, a preview version of Windows 11.

Adding 500 files to the repo with 'git add' Times reported in seconds.

object file syncing | Linux | Mac   | Windows
--------------------|-------|-------|--------
           disabled | 0.06  |  0.35 | 0.61
              fsync | 1.88  | 11.18 | 2.47
              batch | 0.15  |  0.41 | 1.53

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

bulk-checkin: rebrand plug/unplug APIs as 'odb transactions'

Make it clearer in the naming and documentation of the plug_bulk_checkin
and unplug_bulk_checkin APIs that they can be thought of as
a "transaction" to optimize operations on the object database. These
transactions may be nested so that subsystems like the cache-tree
writing code can optimize their operations without caring whether the
top-level code has a transaction active.

Add a flush_odb_transaction API that will be used in update-index to
make objects visible even if a transaction is active. The flush call may
also be useful in future cases if we hold a transaction active around
calling hooks.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

bulk-checkin: rename 'state' variable and separate 'plugged' boolean

This commit prepares for adding batch-fsync to the bulk-checkin
infrastructure.

The bulk-checkin infrastructure is currently used to batch up addition
of large blobs to a packfile. When a blob is larger than
big_file_threshold, we unconditionally add it to a pack. If bulk
checkins are 'plugged', we allow multiple large blobs to be added to a
single pack until we reach the packfile size limit; otherwise, we simply
make a new packfile for each large blob. The 'unplug' call tells us when
the series of blob additions is done so that we can finish the packfiles
and make their objects available to subsequent operations.

Stated another way, bulk-checkin allows callers to define a transaction
that adds multiple objects to the object database, where the object
database can optimize its internal operations within the transaction
boundary.

Batched fsync will fit into bulk-checkin by taking advantage of the
plug/unplug functionality to determine the appropriate time to fsync
and make newly-added objects available in the primary object database.

* Rename 'state' variable to 'bulk_checkin_packfile', since we will
  later be adding 'bulk_fsync_objdir'. This also makes the variable
  easier to find in the debugger, since the name is more unique.

* Rename finish_bulk_checkin to flush_bulk_checkin_packfile and call it
  unconditionally from unplug_bulk_checkin. Internally it will
  conditionally do a flush if there's any work to do.

* Move the 'plugged' data member of 'bulk_checkin_state' into a separate
  static variable. Doing this avoids resetting the variable in
  finish_bulk_checkin when zeroing the 'bulk_checkin_state'. As-is, we
  seem to unintentionally disable the plugging functionality the first
  time a new packfile must be created due to packfile size limits. While
  disabling the plugging state only results in suboptimal behavior for
  the current code, it would be fatal for the bulk-fsync functionality
  later in this patch series.

The net effect of these changes is to make a clear separation between
the portion of the bulk-checkin infrastructure that is related to the
packfile (nearly all of it at present) and the part that is related to
other future optimizations of the ODB.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'ns/core-fsyncmethod' into ns/batch-fsync

* ns/core-fsyncmethod:
  configure.ac: fix HAVE_SYNC_FILE_RANGE definition
  core.fsyncmethod: correctly camel-case warning message
  core.fsync: fix incorrect expression for default configuration
  core.fsync: documentation and user-friendly aggregate options
  core.fsync: new option to harden the index
  core.fsync: add configuration parsing
  core.fsync: introduce granular fsync control infrastructure
  core.fsyncmethod: add writeout-only mode
  wrapper: make inclusion of Windows csprng header tightly scoped

configure.ac: fix HAVE_SYNC_FILE_RANGE definition

If sync_file_range is not available when building the configure script,
there is a cosmetic bug when running that script reporting
"HAVE_SYNC_FILE_RANGE: command not found". Remove that error message by
defining HAVE_SYNC_FILE_RANGE to an empty string, rather than generating
a script where that appears as a bare command.

Signed-off-by: Adam Dinwoodie <adam@dinwoodie.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

core.fsyncmethod: correctly camel-case warning message

The warning for an unrecognized fsyncMethod was not
camel-cased.

Reported-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

core.fsync: fix incorrect expression for default configuration

Commit b9f5d035 (core.fsync: documentation and user-friendly
aggregate options, 2022-03-15) introduced an incorrect value for
FSYNC_COMPONENTS_DEFAULT. We need an AND-NOT rather than OR-NOT.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

core.fsync: documentation and user-friendly aggregate options

This commit adds aggregate options for the core.fsync setting that are
more user-friendly. These options are specified in terms of 'levels of
safety', indicating which Git operations are considered to be sync
points for durability.

The new documentation is also included here in its entirety for ease of
review.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

The eleventh batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'ab/plug-random-leaks'

Plug random memory leaks.

* ab/plug-random-leaks:
  repository.c: free the "path cache" in repo_clear()
  range-diff: plug memory leak in read_patches()
  range-diff: plug memory leak in common invocation
  lockfile API users: simplify and don't leak "path"
  commit-graph: stop fill_oids_from_packs() progress on error and free()
  commit-graph: fix memory leak in misused string_list API
  submodule--helper: fix trivial leak in module_add()
  transport: stop needlessly copying bundle header references
  bundle: call strvec_clear() on allocated strvec
  remote-curl.c: free memory in cmd_main()
  urlmatch.c: add and use a *_release() function
  diff.c: free "buf" in diff_words_flush()
  merge-base: free() allocated "struct commit **" list
  index-pack: fix memory leaks

Merge branch 'nj/read-tree-doc-reffix'

Documentation mark-up fix.

* nj/read-tree-doc-reffix:
Documentation: git-read-tree: separate links using commas

Merge branch 'ps/fetch-atomic-fixup'

Test simplification.

* ps/fetch-atomic-fixup:
t5503: simplify setup of test which exercises failure of backfill

Merge branch 'fs/gpgsm-update'

Newer version of GPGSM changed its output in a backward
incompatible way to break our code that parses its output.  It also
added more processes our tests need to kill when cleaning up.
Adjustments have been made to accommodate these changes.

* fs/gpgsm-update:
  t/lib-gpg: kill all gpg components, not just gpg-agent
  t/lib-gpg: reload gpg components after updating trustlist
  gpg-interface/gpgsm: fix for v2.3

Merge branch 'gc/parse-tree-indirect-errors'

Check the return value from parse_tree_indirect() to turn segfaults
into calls to die().

* gc/parse-tree-indirect-errors:
checkout, clone: die if tree cannot be parsed

Merge branch 'en/merge-ort-align-verbosity-with-recursive'

Align the level of verbose output from the ort backend during inner
merge to that of the recursive backend.

* en/merge-ort-align-verbosity-with-recursive:
merge-ort: exclude messages from inner merges by default

Merge branch 'ab/make-optim-noop'

Makefile refactoring with a bit of suffixes rule stripping to
optimize the runtime overhead.

* ab/make-optim-noop:
  Makefiles: add and use wildcard "mkdir -p" template
  Makefile: add "$(QUIET)" boilerplate to shared.mak
  Makefile: move $(comma), $(empty) and $(space) to shared.mak
  Makefile: move ".SUFFIXES" rule to shared.mak
  Makefile: define $(LIB_H) in terms of $(FIND_SOURCE_FILES)
  Makefile: disable GNU make built-in wildcard rules
  Makefiles: add "shared.mak", move ".DELETE_ON_ERROR" to it
  scalar Makefile: use "The default target of..." pattern

Merge branch 'ps/fetch-atomic'

"git fetch" can make two separate fetches, but ref updates coming
from them were in two separate ref transactions under "--atomic",
which has been corrected.

* ps/fetch-atomic:
  fetch: make `--atomic` flag cover pruning of refs
  fetch: make `--atomic` flag cover backfilling of tags
  refs: add interface to iterate over queued transactional updates
  fetch: report errors when backfilling tags fails
  fetch: control lifecycle of FETCH_HEAD in a single place
  fetch: backfill tags before setting upstream
  fetch: increase test coverage of fetches

core.fsync: new option to harden the index

This commit introduces the new ability for the user to harden
the index. In the event of a system crash, the index must be
durable for the user to actually find a file that has been added
to the repo and then deleted from the working tree.

We use the presence of the COMMIT_LOCK flag and absence of the
alternate_index_output as a proxy for determining whether we're
updating the persistent index of the repo or some temporary
index. We don't sync these temporary indexes.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

core.fsync: add configuration parsing

This change introduces code to parse the core.fsync setting and
configure the fsync_components variable.

core.fsync is configured as a comma-separated list of component names to
sync. Each time a core.fsync variable is encountered in the
configuration heirarchy, we start off with a clean state with the
platform default value. Passing 'none' resets the value to indicate
nothing will be synced. We gather all negative and positive entries from
the comma separated list and then compute the new value by removing all
the negative entries and adding all of the positive entries.

We issue a warning for components that are not recognized so that the
configuration code is compatible with configs from future versions of
Git with more repo components.

Complete documentation for the new setting is included in a later patch
in the series so that it can be reviewed once in final form.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

core.fsync: introduce granular fsync control infrastructure

This commit introduces the infrastructure for the core.fsync
configuration knob. The repository components we want to sync
are identified by flags so that we can turn on or off syncing
for specific components.

If core.fsyncObjectFiles is set and the core.fsync configuration
also includes FSYNC_COMPONENT_LOOSE_OBJECT, we will fsync any
loose objects. This picks the strictest data integrity behavior
if core.fsync and core.fsyncObjectFiles are set to conflicting values.

This change introduces the currently unused fsync_component
helper, which will be used by a later patch that adds fsyncing to
the refs backend.

Actual configuration and documentation of the fsync components
list are in other patches in the series to separate review of
the underlying mechanism from the policy of how it's configured.

Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

core.fsyncmethod: add writeout-only mode

This commit introduces the `core.fsyncMethod` configuration
knob, which can currently be set to `fsync` or `writeout-only`.

The new writeout-only mode attempts to tell the operating system to
flush its in-memory page cache to the storage hardware without issuing a
CACHE_FLUSH command to the storage controller.

Writeout-only fsync is significantly faster than a vanilla fsync on
common hardware, since data is written to a disk-side cache rather than
all the way to a durable medium. Later changes in this patch series will
take advantage of this primitive to implement batching of hardware
flushes.

When git_fsync is called with FSYNC_WRITEOUT_ONLY, it may fail and the
caller is expected to do an ordinary fsync as needed.

On Apple platforms, the fsync system call does not issue a CACHE_FLUSH
directive to the storage controller. This change updates fsync to do
fcntl(F_FULLFSYNC) to make fsync actually durable. We maintain parity
with existing behavior on Apple platforms by setting the default value
of the new core.fsyncMethod option.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

wrapper: make inclusion of Windows csprng header tightly scoped

Including NTSecAPI.h in git-compat-util.h causes build errors in any
other file that includes winternl.h. NTSecAPI.h was included in order to
get access to the RtlGenRandom cryptographically secure PRNG. This
change scopes the inclusion of ntsecapi.h to wrapper.c, which is the only
place that it's actually needed.

The build breakage is due to the definition of UNICODE_STRING in
NtSecApi.h:
    #ifndef _NTDEF_
    typedef LSA_UNICODE_STRING UNICODE_STRING, *PUNICODE_STRING;
    typedef LSA_STRING STRING, *PSTRING ;
    #endif

LsaLookup.h:
    typedef struct _LSA_UNICODE_STRING {
        USHORT Length;
        USHORT MaximumLength;
    #ifdef MIDL_PASS
        [size_is(MaximumLength/2), length_is(Length/2)]
    #endif // MIDL_PASS
        PWSTR  Buffer;
    } LSA_UNICODE_STRING, *PLSA_UNICODE_STRING;

winternl.h also defines UNICODE_STRING:
    typedef struct _UNICODE_STRING {
        USHORT Length;
        USHORT MaximumLength;
        PWSTR  Buffer;
    } UNICODE_STRING;
    typedef UNICODE_STRING *PUNICODE_STRING;

Both definitions have equivalent layouts. Apparently these internal
Windows headers aren't designed to be included together. This is
an oversight in the headers and does not represent an incompatibility
between the APIs.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

The tenth batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'ab/help-fixes'

Updates to how command line options to "git help" are handled.

* ab/help-fixes:
  help: don't print "\n" before single-section output
  help: add --no-[external-commands|aliases] for use with --all
  help: error if [-a|-g|-c] and [-i|-m|-w] are combined
  help: correct usage & behavior of "git help --all"
  help: note the option name on option incompatibility
  help.c: split up list_all_cmds_help() function
  help tests: test "git" and "git help [-a|-g] spacing
  help.c: use puts() instead of printf{,_ln}() for consistency
  help doc: add missing "]" to "[-a|--all]"

Merge branch 'ab/c99-variadic-macros'

Remove the escape hatch we added when we introduced the weather
balloon to use variadic macros unconditionally, to make it official
that we now have a hard dependency on the feature.

* ab/c99-variadic-macros:
C99: remove hardcoded-out !HAVE_VARIADIC_MACROS code
git-compat-util.h: clarify GCC v.s. C99-specific in comment

Merge branch 'hn/reftable-no-empty-keys'

General clean-up in reftable implementation, including
clarification of the API documentation, tightening the code to
honor documented length limit, etc.

* hn/reftable-no-empty-keys:
  reftable: rename writer_stats to reftable_writer_stats
  reftable: add test for length of disambiguating prefix
  reftable: ensure that obj_id_len is >= 2 on writing
  reftable: avoid writing empty keys at the block layer
  reftable: add a test that verifies that writing empty keys fails
  reftable: reject 0 object_id_len
  Documentation: object_id_len goes up to 31

Merge branch 'jc/cat-file-batch-commands'

"git cat-file" learns "--batch-command" mode, which is a more
flexible interface than the existing "--batch" or "--batch-check"
modes, to allow different kinds of inquiries made.

* jc/cat-file-batch-commands:
  cat-file: add --batch-command mode
  cat-file: add remove_timestamp helper
  cat-file: introduce batch_mode enum to replace print_contents
  cat-file: rename cmdmode to transform_mode

Merge branch 'pw/xdiff-alloc-fail'

Improve failure case behaviour of xdiff library when memory
allocation fails.

* pw/xdiff-alloc-fail:
  xdiff: handle allocation failure when merging
  xdiff: refactor a function
  xdiff: handle allocation failure in patience diff
  xdiff: fix a memory leak

Merge branch 'en/present-despite-skipped'

In sparse-checkouts, files mis-marked as missing from the working tree
could lead to later problems.  Such files were hard to discover, and
harder to correct.  Automatically detecting and correcting the marking
of such files has been added to avoid these problems.

* en/present-despite-skipped:
  repo_read_index: add config to expect files outside sparse patterns
  Accelerate clear_skip_worktree_from_present_files() by caching
  Update documentation related to sparsity and the skip-worktree bit
  repo_read_index: clear SKIP_WORKTREE bit from files present in worktree
  unpack-trees: fix accidental loss of user changes
  t1011: add testcase demonstrating accidental loss of user modifications

The ninth batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'jt/ls-files-stage-recurse'

Many output modes of "ls-files" do not work with its
"--recurse-submodules" option, but the "-s" mode has been taught to
work with it.

* jt/ls-files-stage-recurse:
ls-files: support --recurse-submodules --stage

Merge branch 'gc/stash-on-branch-with-multi-level-name'

"git checkout -b branch/with/multi/level/name && git stash" only
recorded the last level component of the branch name, which has
been corrected.

* gc/stash-on-branch-with-multi-level-name:
stash: strip "refs/heads/" with skip_prefix

Merge branch 'ah/advice-switch-requires-detach-to-detach'

The error message given by "git switch HEAD~4" has been clarified
to suggest the "--detach" option that is required.

* ah/advice-switch-requires-detach-to-detach:
switch: mention the --detach option when dying due to lack of a branch

Merge branch 'ab/c99-designated-initializers'

Use designated initializers we started using in mid 2017 in more
parts of the codebase that are relatively quiescent.

* ab/c99-designated-initializers:
  fast-import.c: use designated initializers for "partial" struct assignments
  refspec.c: use designated initializers for "struct refspec_item"
  convert.c: use designated initializers for "struct stream_filter*"
  userdiff.c: use designated initializers for "struct userdiff_driver"
  archive-*.c: use designated initializers for "struct archiver"
  object-file: use designated initializers for "struct git_hash_algo"
  trace2: use designated initializers for "struct tr2_dst"
  trace2: use designated initializers for "struct tr2_tgt"
  imap-send.c: use designated initializers for "struct imap_server_conf"

Merge branch 'mc/index-pack-report-max-size'

When "index-pack" dies due to incoming data exceeding the maximum
allowed input size, include the value of the limit in the error
message.

* mc/index-pack-report-max-size:
index-pack: clarify the breached limit

Merge branch 'ac/usage-string-fixups'

Usage-string normalization.

* ac/usage-string-fixups:
amend remaining usage strings according to style guide

Merge branch 'ab/test-leak-diag'

Random test-framework clean-up.

* ab/test-leak-diag:
  test-lib: add "fast_unwind_on_malloc=0" to LSAN_OPTIONS
  test-lib: make $GIT_BUILD_DIR an absolute path
  test-lib: correct and assert TEST_DIRECTORY overriding
  test-lib: add GIT_SAN_OPTIONS, inherit [AL]SAN_OPTIONS

Merge branch 'ab/hook-tests'

Test modernization.

* ab/hook-tests:
hook tests: use a modern style for "pre-push" tests
hook tests: test for exact "pre-push" hook input

Merge branch 'en/merge-ort-plug-leaks'

Leakfix.

* en/merge-ort-plug-leaks:
merge-ort: fix small memory leak in unique_path()
merge-ort: fix small memory leak in detect_and_process_renames()

Merge branch 'ds/worktree-docs'

Tighten the language around "working tree" and "worktree" in the
docs.

* ds/worktree-docs:
  worktree: use 'worktree' over 'working tree'
  worktree: use 'worktree' over 'working tree'
  worktree: use 'worktree' over 'working tree'
  worktree: use 'worktree' over 'working tree'
  worktree: use 'worktree' over 'working tree'
  worktree: use 'worktree' over 'working tree'
  worktree: use 'worktree' over 'working tree'
  worktree: extract checkout_worktree()
  worktree: extract copy_sparse_checkout()
  worktree: extract copy_filtered_worktree_config()
  worktree: combine two translatable messages

Merge branch 'jc/rerere-train-modernise'

Small modernization of the rerere-train script (in contrib/).

* jc/rerere-train-modernise:
rerere-train: two fixes to the use of "git show -s"

Merge branch 'rs/bisect-executable-not-found'

A not-so-common mistake is to write a script to feed "git bisect
run" without making it executable, in which case all tests will
exit with 126 or 127 error codes, even on revisions that are marked
as good.  Try to recognize this situation and stop iteration early.

* rs/bisect-executable-not-found:
  bisect--helper: double-check run command on exit code 126 and 127
  bisect: document run behavior with exit codes 126 and 127
  bisect--helper: release strbuf and strvec on run error
  bisect--helper: report actual bisect_state() argument on error

Merge branch 'en/sparse-checkout-fixes'

Further polishing of "git sparse-checkout".

* en/sparse-checkout-fixes:
  sparse-checkout: reject arguments in cone-mode that look like patterns
  sparse-checkout: error or warn when given individual files
  sparse-checkout: pay attention to prefix for {set, add}
  sparse-checkout: correctly set non-cone mode when expected
  sparse-checkout: correct reapply's handling of options

Merge branch 'cg/t3903-modernize'

Test modernization.

* cg/t3903-modernize:
  tests: make the code more readable
  tests: allow testing if a path is truly a file or a directory
  t/t3903-stash.sh: replace test [-d|-f] with test_path_is_*

repository.c: free the "path cache" in repo_clear()

The "struct path_cache" added in 102de880d24 (path.c: migrate global
git_path_* to take a repository argument, 2018-05-17) is only used
directly by code in repository.[ch] (but populated in path.[ch]).

Let's move this code to repository.[ch], and stop leaking this memory
when we run repo_clear(). To avoid the cast change it from a "const
char *" to a "char *".

This also removes the "PATH_CACHE_INIT" macro, which has never been
used for anything. For the "struct repository" we already make a hard
assumption that it (and "the_repository") can be identically
initialized by making it a "static" variable, so making use of a
"PATH_CACHE_INIT" somewhere would have been confusing.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

range-diff: plug memory leak in read_patches()

Amend code added in d9c66f0b5bf (range-diff: first rudimentary
implementation, 2018-08-13) to use a "goto cleanup" pattern. This
makes for less code, and frees memory that we'd previously leak.

The reason for changing free(util) to FREE_AND_NULL(util) is because
at the end of the function we append the contents of "util" to a
"struct string_list" if it's non-NULL.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

range-diff: plug memory leak in common invocation

Create a public release_patch() version of the private free_patch()
function added in 13b5af22f39 (apply: move libified code from
builtin/apply.c to apply.{c,h}, 2016-04-22). Unlike the existing
function this one doesn't free() the "struct patch" itself, so we can
use it for variables on the stack.

Use it in range-diff.c to fix a memory leak in common range-diff
invocations, e.g.:

git -P range-diff origin/master origin/next origin/seen

Would emit several errors when compiled with SANITIZE=leak, but now
runs cleanly.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

lockfile API users: simplify and don't leak "path"

Fix a memory leak in code added in 6c622f9f0bb (commit-graph: write
commit-graph chains, 2019-06-18). We needed to free the "lock_name" if
we encounter errors, and the "graph_name" after we'd run unlink() on
it.

For the case of write_commit_graph_file() refactoring the code to free
the "lock_name" after we were done using the "struct lock_file lk"
would have made the control flow more complex. Luckily we can free the
"lock_file" right after the hold_lock_file_for_update() call, if it
makes use of "path" at all it'll have copied its contents to a "struct
strbuf" of its own.

While I'm at it let's fix code added in fb10ca5b543 (sparse-checkout:
write using lockfile, 2019-11-21) in write_patterns_and_update() to
avoid the same complexity that I thought I needed when I wrote the
initial fix for write_commit_graph_file(). We can free the
"sparse_filename" right after calling hold_lock_file_for_update(), we
don't need to wait until we're exiting the function to do so.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit-graph: stop fill_oids_from_packs() progress on error and free()

Fix a bug in fill_oids_from_packs(), we should always stop_progress(),
but did not do so if we returned an error here. This also plugs a
memory leak in those cases by releasing the two "struct strbuf"
variables the function uses.

While I'm at it stop hardcoding "-1" here and just use the return
value of error() instead, which happens to be "-1".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit-graph: fix memory leak in misused string_list API

When this code was migrated to the string_list API in
d88b14b3fd6 (commit-graph: use string-list API for input, 2018-06-27)
it was made to use used both STRING_LIST_INIT_NODUP and a
strbuf_detach() pattern.

Those should not be used together if string_list_clear() is expected
to free the memory, instead we need to either use STRING_LIST_INIT_DUP
with a string_list_append_nodup(), or a STRING_LIST_INIT_NODUP and
manually fiddle with the "strdup_strings" member before calling
string_list_clear(). Let's do the former.

Since "strdup_strings = 1" is set now other code might be broken by
relying on "pack_indexes" not to duplicate it strings, but that
doesn't happen. When we pass this down to write_commit_graph() that
code uses the "struct string_list" without modifying it. Let's add a
"const" to the variable to have the compiler enforce that assumption.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

submodule--helper: fix trivial leak in module_add()

Fix a memory leak in code added in a6226fd772b (submodule--helper:
convert the bulk of cmd_add() to C, 2021-08-10). If "realrepo" isn't a
copy of the "repo" member we should free() it.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

transport: stop needlessly copying bundle header references

Amend the logic added in fddf2ebe388 (transport: teach all vtables to
allow fetch first, 2019-08-21) and save ourselves pointless work in
fetch_refs_from_bundle().

The fetch_refs_from_bundle() caller doesn't care about the "struct
ref *result" return value of get_refs_from_bundle(), and doesn't need
any of the work we were doing in looping over the
"data->header.references" in get_refs_from_bundle().

So this change saves us work, and also fixes a memory leak that we had
when called from fetch_refs_from_bundle(). The other caller of
get_refs_from_bundle() is the "get_refs_list" member we set up for the
"struct transport_vtable bundle_vtable". That caller does care about
the "struct ref *result" return value.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

bundle: call strvec_clear() on allocated strvec

Fixing this small memory leak in cmd_bundle_create() gets
"t5607-clone-bundle.sh" closer to passing under SANITIZE=leak.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

remote-curl.c: free memory in cmd_main()

Plug a trivial memory leak in code added in a2d725b7bdf (Use an
external program to implement fetching with curl, 2009-08-05).

To do this have the cmd_main() use a "goto cleanup" pattern, and to
return an error of 1 unless we can fall through to the http_cleanup()
at the end.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

urlmatch.c: add and use a *_release() function

Plug a memory leak in credential_apply_config() by adding and using a
new urlmatch_config_release() function. This just does a
string_list_clear() on the "vars" member.

This finished up work on normalizing the init/free pattern in this
API, started in 73ee449bbf2 (urlmatch.[ch]: add and use
URLMATCH_CONFIG_INIT, 2021-10-01).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

diff.c: free "buf" in diff_words_flush()

Amend the freeing logic added in e6e045f8031 (diff.c: buffer all
output if asked to, 2017-06-29) to free the containing "buf" in
addition to its members.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-base: free() allocated "struct commit **" list

Fix a memory leak in 53eda89b2fe (merge-base: teach "git merge-base"
to drive underlying merge_bases_many(), 2008-07-30) by calling free()
on the "struct commit **" list used by "git merge-base".

This gets e.g. "t6010-merge-base.sh" closer to passing under
SANITIZE=leak, it failed 8 tests before when compiled with that
option, and now fails only 5 tests.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

index-pack: fix memory leaks

Fix various memory leaks in "git index-pack", due to how tightly
coupled this command is with the revision walking this doesn't make
any new tests pass.

But e.g. this now passes, and had several failures before, i.e. we
still have failures in tests 3, 5 etc., which are being skipped here.

./t5300-pack-object.sh --run=1-2,4,6-27,30-42

It is a bit odd that we'll free "opts.anomaly", since the "opts" is a
"struct pack_idx_option" declared in pack.h. In pack-write.c there's a
reset_pack_idx_option(), but it only wipes the contents, but doesn't
free() anything.

Doing this here in cmd_index_pack() is correct because while the
struct is declared in pack.h, this code in builtin/index-pack.c (in
read_v2_anomalous_offsets()) is what allocates the "opts.anomaly", so
we should also free it here.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t/lib-gpg: kill all gpg components, not just gpg-agent

The gpg-agent is one of several processes that newer releases of GnuPG
start automatically.  Issue a kill to each of them to ensure they do not
affect separate tests.  (Yes, the separate GNUPGHOME should do that
already. If we find that is case, we could drop the --kill entirely.)

In terms of compatibility, the 'all' keyword was added to the --kill &
--reload options in GnuPG 2.1.18.  Debian and RHEL are often used as
indicators of how a change might affect older systems we often try to
support.

    - Debian Strech (old old stable), which has limited security support
      until June 2022, has GnuPG 2.1.18 (or 2.2.x in backports).

    - CentOS/RHEL 7, which is supported until June 2024, has GnuPG
      2.0.22, which lacks the --kill option, so the change won't have
      any impact.

Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t/lib-gpg: reload gpg components after updating trustlist

With gpgsm from gnupg-2.3, the changes to the trustlist.txt do not
appear to be picked up without refreshing the gpg-agent.  Use the 'all'
keyword to reload all of the gpg components.  The scdaemon is started as
a child of gpg-agent, for example.

We used to have a --kill at this spot, but I removed it in 2e285e7803
(t/lib-gpg: drop redundant killing of gpg-agent, 2019-02-07).  It seems
like it might be necessary (again) for 2.3.

Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

gpg-interface/gpgsm: fix for v2.3

Checking if signing was successful will now accept '[GNUPG]:
SIG_CREATED' on the beginning of the first or any subsequent line. Not
just explictly the second one anymore.

Gpgsm v2.3 changed its output when listing keys from `fingerprint` to
`sha1/2 fpr`. This leads to the gpgsm tests silently not being executed
because of a failed prerequisite.
Switch to gpg's `--with-colons` output format when evaluating test
prerequisites to make parsing more robust. This also allows us to
combine the existing grep/cut/tr/echo pipe for writing the trustlist.txt
into a single awk expression.

Adjust error message checking in test for v2.3 specific output changes.

Helped-By: Junio C Hamano <gitster@pobox.com>
Helped-By: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t5503: simplify setup of test which exercises failure of backfill

In the testcase to exercise backfilling of tags for fetches we evoke a
failure of the backfilling mechanism by creating a reference that later
on causes a D/F conflict. Because the assumption was that git-fetch(1)
would notice the D/F conflict early on this conflicting reference was
created via the reference-transaction hook just when we were about to
write the backfilled tag. As it turns out though this is not the case,
and the fetch fails in the same way when we create the conflicting ref
up front.

Simplify the test setup creating the reference up front, which allows us
to get rid of the hook script.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Documentation: git-read-tree: separate links using commas

This makes it consistent with the rest of the documentation.

Signed-off-by: Nihal Jere <nihal@nihaljere.xyz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Makefiles: add and use wildcard "mkdir -p" template

Add a template to do the "mkdir -p" of $(@D) (the parent dir of $@)
for us, and use it for the "make lint-docs" targets I added in
8650c6298c1 (doc lint: make "lint-docs" non-.PHONY, 2021-10-15).

As seen in 4c64fb5aad9 (Documentation/Makefile: fix lint-docs mkdir
dependency, 2021-10-26) maintaining these manual lists of parent
directory dependencies is fragile, in addition to being obviously
verbose.

I used this pattern at the time because I couldn't find another method
than "order-only" prerequisites to avoid doing a "mkdir -p $(@D)" for
every file being created, which as noted in [1] would be significantly
slower.

But as it turns out we can use this neat trick of only doing a "mkdir
-p" if the $(wildcard) macro tells us the path doesn't exist. A re-run
of a performance test similar to that noted downthread of [1] in [2]
shows that this is faster, in addition to being less verbose and more
reliable (this uses my "git-hyperfine" thin wrapper for "hyperfine"[3]):

    $ git -c hyperfine.hook.setup= hyperfine -L rev HEAD~1,HEAD~0 -s 'make -C Documentation lint-docs' -p 'rm -rf Documentation/.build' 'make -C Documentation -j1 lint-docs'
    Benchmark 1: make -C Documentation -j1 lint-docs' in 'HEAD~1
      Time (mean ± σ):      2.914 s ±  0.062 s    [User: 2.449 s, System: 0.489 s]
      Range (min … max):    2.834 s …  3.020 s    10 runs

    Benchmark 2: make -C Documentation -j1 lint-docs' in 'HEAD~0
      Time (mean ± σ):      2.315 s ±  0.062 s    [User: 1.950 s, System: 0.386 s]
      Range (min … max):    2.229 s …  2.397 s    10 runs

    Summary
      'make -C Documentation -j1 lint-docs' in 'HEAD~0' ran
        1.26 ± 0.04 times faster than 'make -C Documentation -j1 lint-docs' in 'HEAD~1'

So let's use that pattern both for the "lint-docs" target, and a few
miscellaneous other targets.

This method of creating parent directories is explicitly racy in that
we don't know if we're going to say always create a "foo" followed by
a "foo/bar" under parallelism, or skip the "foo" because we created
"foo/bar" first. In this case it doesn't matter for anything except
that we aren't guaranteed to get the same number of rules firing when
running make in parallel.

1. https://lore.kernel.org/git/211028.861r45y3pt.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/211028.86o879vvtp.gmgdl@evledraar.gmail.com/
3. https://gitlab.com/avar/git-hyperfine/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Makefile: add "$(QUIET)" boilerplate to shared.mak

The $(QUIET) variables we define are largely duplicated between our
various Makefiles, let's define them in the new "shared.mak" instead.

Since we're not using the environment to pass these around we don't
need to export the "QUIET_GEN" and "QUIET_BUILT_IN" variables
anymore. The "QUIET_GEN" variable is used in "git-gui/Makefile" and
"gitweb/Makefile", but they've got their own definition for those. The
"QUIET_BUILT_IN" variable is only used in the top-level "Makefile". We
still need to export the "V" variable.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Makefile: move $(comma), $(empty) and $(space) to shared.mak

Move these variables over to the shared.mak, we'll make use of them in
a subsequent commit.

Note that there's reason for these to be "simply expanded variables",
i.e. to use ":=" assignments instead of lazily expanded "="
assignments. We could use "=", but let's leave this as-is for now for
ease of review.

See 425ca6710b2 (Makefile: allow combining UBSan with other
sanitizers, 2017-07-15) for the commit that introduced these.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Makefile: move ".SUFFIXES" rule to shared.mak

This was added in 30248886ce8 (Makefile: disable default implicit
rules, 2010-01-26), let's move it to the top of "shared.mak" so it'll
apply to all our Makefiles.

This doesn't benefit the main Makefile at all, since it already had
the rule, but since we're including shared.mak in other Makefiles
starts to benefit them. E.g. running the 'man" target is now faster:

    $ git -c hyperfine.hook.setup= hyperfine -L rev HEAD~1,HEAD~0 -s 'make -C Documentation man' 'make -C Documentation -j1 man'
    Benchmark 1: make -C Documentation -j1 man' in 'HEAD~1
      Time (mean ± σ):     121.7 ms ±   8.8 ms    [User: 105.8 ms, System: 18.6 ms]
      Range (min … max):   112.8 ms … 148.4 ms    26 runs

    Benchmark 2: make -C Documentation -j1 man' in 'HEAD~0
      Time (mean ± σ):      97.5 ms ±   8.0 ms    [User: 80.1 ms, System: 20.1 ms]
      Range (min … max):    89.8 ms … 111.8 ms    32 runs

    Summary
      'make -C Documentation -j1 man' in 'HEAD~0' ran
        1.25 ± 0.14 times faster than 'make -C Documentation -j1 man' in 'HEAD~1'

The reason for that can be seen when comparing that run with
"--debug=a". Without this change making a target like "git-status.1"
will cause "make" to consider not only "git-status.txt", but
"git-status.txt.o", as well as numerous other implicit suffixes such
as ".c", ".cc", ".cpp" etc. See [1] for a more detailed before/after
example.

So this is causing us to omit a bunch of work we didn't need to
do. For making "git-status.1" the "--debug=a" output is reduced from
~140k lines to ~6k.

1. https://lore.kernel.org/git/220222.86bkyz875k.gmgdl@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Makefile: define $(LIB_H) in terms of $(FIND_SOURCE_FILES)

Combine the definitions of $(FIND_SOURCE_FILES) and $(LIB_H) to speed
up the Makefile, as these are the two main expensive $(shell) commands
that we execute unconditionally.

When see what was in $(FOUND_SOURCE_FILES) that wasn't in $(LIB_H) via
the ad-hoc test of:

    $(error $(filter-out $(LIB_H),$(filter %.h,$(ALL_SOURCE_FILES))))
    $(error $(filter-out $(ALL_SOURCE_FILES),$(filter %.h,$(LIB_H))))

We'll get, respectively:

    Makefile:850: *** t/helper/test-tool.h.  Stop.
    Makefile:850: *** .  Stop.

I.e. we only had a discrepancy when it came to
t/helper/test-tool.h. In terms of correctness this was broken before,
but now works:

    $ make t/helper/test-tool.hco
        HDR t/helper/test-tool.h

This speeds things up a lot:

    $ git -c hyperfine.hook.setup= hyperfine -L rev HEAD~1,HEAD~0 -s 'make NO_TCLTK=Y' 'make -j1 NO_TCLTK=Y' --warmup 10 -M 10
    Benchmark 1: make -j1 NO_TCLTK=Y' in 'HEAD~1
      Time (mean ± σ):     159.9 ms ±   6.8 ms    [User: 137.2 ms, System: 28.0 ms]
      Range (min … max):   154.6 ms … 175.9 ms    10 runs

    Benchmark 2: make -j1 NO_TCLTK=Y' in 'HEAD~0
      Time (mean ± σ):     100.0 ms ±   1.3 ms    [User: 84.2 ms, System: 20.2 ms]
      Range (min … max):    98.8 ms … 102.8 ms    10 runs

    Summary
      'make -j1 NO_TCLTK=Y' in 'HEAD~0' ran
        1.60 ± 0.07 times faster than 'make -j1 NO_TCLTK=Y' in 'HEAD~1'

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Makefile: disable GNU make built-in wildcard rules

Override built-in rules of GNU make that use a wildcard target. This
can speeds things up significantly as we don't need to stat() so many
files. GNU make does that by default to see if it can retrieve their
contents from RCS or SCCS. See [1] for an old mailing list discussion
about how to disable these.

The speed-up may vary. I've seen 1-10% depending on the speed of the
local disk, caches, -jN etc. Running:

strace -f -c -S calls make -j1 NO_TCLTK=Y

Shows that we reduce the number of syscalls we make, mostly in "stat"
calls.

We could also invoke make with "-r" by setting "MAKEFLAGS = -r"
early. Doing so might make us a bit faster still. But doing so is a
much bigger hammer, since it will disable all built-in rules,
some (all?) of which can be seen with:

make -f/dev/null -p | grep -v -e ^# -e ^$

We may have something that relies on them, so let's go for the more
isolated optimization here that gives us most or all of the wins.

1. https://lists.gnu.org/archive/html/help-make/2002-11/msg00063.html

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Makefiles: add "shared.mak", move ".DELETE_ON_ERROR" to it

We have various behavior that's shared across our Makefiles, or that
really should be (e.g. via defined templates). Let's create a
top-level "shared.mak" to house those sorts of things, and start by
adding the ".DELETE_ON_ERROR" flag to it.

See my own 7b76d6bf221 (Makefile: add and use the ".DELETE_ON_ERROR"
flag, 2021-06-29) and db10fc6c09f (doc: simplify Makefile using
.DELETE_ON_ERROR, 2021-05-21) for the addition and use of the
".DELETE_ON_ERROR" flag.

I.e. this changes the behavior of existing rules in the altered
Makefiles (except "Makefile" & "Documentation/Makefile"). I'm
confident that this is safe having read the relevant rules in those
Makfiles, and as the GNU make manual notes that it isn't the default
behavior is out of an abundance of backwards compatibility
caution. From edition 0.75 of its manual, covering GNU make 4.3:

    [Enabling '.DELETE_ON_ERROR' is] almost always what you want
    'make' to do, but it is not historical practice; so for
    compatibility, you must explicitly request it.

This doesn't introduce a bug by e.g. having this
".DELETE_ON_ERROR" flag only apply to this new shared.mak, Makefiles
have no such scoping semantics.

It does increase the danger that any Makefile without an explicit "The
default target of this Makefile is..." snippet to define the default
target as "all" could have its default rule changed if our new
shared.mak ever defines a "real" rule. In subsequent commits we'll be
careful not to do that, and such breakage would be obvious e.g. in the
case of "make -C t".

We might want to make that less fragile still (e.g. by using
".DEFAULT_GOAL" as noted in the preceding commit), but for now let's
simply include "shared.mak" without adding that boilerplate to all the
Makefiles that don't have it already. Most of those are already
exposed to that potential caveat e.g. due to including "config.mak*".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

scalar Makefile: use "The default target of..." pattern

Make the "contrib/scalar/Makefile" be stylistically consistent with
the top-level "Makefile" in first declaring "all" to be the default
rule, followed by including other Makefile snippets.

This adjusts code added in 0a43fb22026 (scalar: create a rudimentary
executable, 2021-12-03), it further ensures that when we add another
"include" file in a subsequent commit that the included file won't be
the one to define our default target.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

repo_read_index: add config to expect files outside sparse patterns

Typically with sparse checkouts, we expect files outside the sparsity
patterns to be marked as SKIP_WORKTREE and be missing from the working
tree.  Sometimes this expectation would be violated however; including
in cases such as:
  * users grabbing files from elsewhere and writing them to the worktree
    (perhaps by editing a cached copy in an editor, copying/renaming, or
     even untarring)
  * various git commands having incomplete or no support for the
    SKIP_WORKTREE bit[1,2]
  * users attempting to "abort" a sparse-checkout operation with a
    not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the
    working tree is not atomic)[3].
When the SKIP_WORKTREE bit in the index did not reflect the presence of
the file in the working tree, it traditionally caused confusion and was
difficult to detect and recover from.  So, in a sparse checkout, since
af6a51875a (repo_read_index: clear SKIP_WORKTREE bit from files present
in worktree, 2022-01-14), Git automatically clears the SKIP_WORKTREE
bit at index read time for entries corresponding to files that are
present in the working tree.

There is another workflow, however, where it is expected that paths
outside the sparsity patterns appear to exist in the working tree and
that they do not lose the SKIP_WORKTREE bit, at least until they get
modified.  A Git-aware virtual file system[4] takes advantage of its
position as a file system driver to expose all files in the working
tree, fetch them on demand using partial clone on access, and tell Git
to pay attention to them on demand by updating the sparse checkout
pattern on writes.  This means that commands like "git status" only have
to examine files that have potentially been modified, whereas commands
like "ls" are able to show the entire codebase without requiring manual
updates to the sparse checkout pattern.

Thus since af6a51875a, Git with such Git-aware virtual file systems
unsets the SKIP_WORKTREE bit for all files and commands like "git
status" have to fetch and examine them all.

Introduce a configuration setting sparse.expectFilesOutsideOfPatterns to
allow limiting the tracked set of files to a small set once again.  A
Git-aware virtual file system or other application that wants to
maintain files outside of the sparse checkout can set this in a
repository to instruct Git not to check for the presence of
SKIP_WORKTREE files.  The setting defaults to false, so most users of
sparse checkout will still get the benefit of an automatically updating
index to recover from the variety of difficult issues detailed in
af6a51875a for paths with SKIP_WORKTREE set despite the path being
present.

[1] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/
[2] The three long paragraphs in the middle of
    https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/
[3] https://lore.kernel.org/git/CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com/
[4] such as the vfsd described in
https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@google.com/

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-ort: exclude messages from inner merges by default

merge-recursive would only report messages from inner merges when the
GIT_MERGE_VERBOSITY was set to 5.  Do the same for merge-ort.

Note that somewhat reverts 0d83d8240d ("merge-ort: mark conflict/warning
messages from inner merges as omittable", 2022-02-02) based on two
facts:

  * This commit basically removes the showing of messages from inner
    merges as well, at least by default.  The only difference is that
    users can request to get them back by turning up the verbosity.
  * Messages from inner merges are specially annotated since 4a3d86e1bb
    ("merge-ort: make informational messages from recursive merges
    clearer", 2022-02-17).  The ability to distinguish them from outer
    merge comments make them less problematic to include, and easier
    for humans to parse.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

checkout, clone: die if tree cannot be parsed

When a tree oid is invalid, parse_tree_indirect() can return NULL. Check
for NULL instead of proceeding as though it were a valid pointer and
segfaulting.

Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

test-lib: add "fast_unwind_on_malloc=0" to LSAN_OPTIONS

Add "fast_unwind_on_malloc=0" to LSAN_OPTIONS to get more meaningful
stack traces from LSAN. This isn't required under ASAN which will emit
traces such as this one for a leak in "t/t0006-date.sh":

    $ ASAN_OPTIONS=detect_leaks=1 ./t0006-date.sh -vixd
    [...]
    Direct leak of 3 byte(s) in 1 object(s) allocated from:
        #0 0x488b94 in strdup (t/helper/test-tool+0x488b94)
        #1 0x9444a4 in xstrdup wrapper.c:29:14
        #2 0x5995fa in parse_date_format date.c:991:24
        #3 0x4d2056 in show_dates t/helper/test-date.c:39:2
        #4 0x4d174a in cmd__date t/helper/test-date.c:116:3
        #5 0x4cce89 in cmd_main t/helper/test-tool.c:127:11
        #6 0x4cd1e3 in main common-main.c:52:11
        #7 0x7fef3c695e49 in __libc_start_main csu/../csu/libc-start.c:314:16
        #8 0x422b09 in _start (t/helper/test-tool+0x422b09)

    SUMMARY: AddressSanitizer: 3 byte(s) leaked in 1 allocation(s).
    Aborted

Whereas LSAN would emit this instead:

    $ ./t0006-date.sh -vixd
    [...]
    Direct leak of 3 byte(s) in 1 object(s) allocated from:
        #0 0x4323b8 in malloc (t/helper/test-tool+0x4323b8)
        #1 0x7f2be1d614aa in strdup string/strdup.c:42:15

    SUMMARY: LeakSanitizer: 3 byte(s) leaked in 1 allocation(s).
    Aborted

Now we'll instead git this sensible stack trace under
LSAN. I.e. almost the same one (but starting with "malloc", as is
usual for LSAN) as under ASAN:

    Direct leak of 3 byte(s) in 1 object(s) allocated from:
        #0 0x4323b8 in malloc (t/helper/test-tool+0x4323b8)
        #1 0x7f012af5c4aa in strdup string/strdup.c:42:15
        #2 0x5cb164 in xstrdup wrapper.c:29:14
        #3 0x495ee9 in parse_date_format date.c:991:24
        #4 0x453aac in show_dates t/helper/test-date.c:39:2
        #5 0x453782 in cmd__date t/helper/test-date.c:116:3
        #6 0x451d95 in cmd_main t/helper/test-tool.c:127:11
        #7 0x451f1e in main common-main.c:52:11
        #8 0x7f012aef5e49 in __libc_start_main csu/../csu/libc-start.c:314:16
        #9 0x42e0a9 in _start (t/helper/test-tool+0x42e0a9)

    SUMMARY: LeakSanitizer: 3 byte(s) leaked in 1 allocation(s).
    Aborted

As the option name suggests this does make things slower, e.g. for
t0001-init.sh we're around 10% slower:

    $ hyperfine -L v 0,1 'LSAN_OPTIONS=fast_unwind_on_malloc={v} make T=t0001-init.sh' -r 3
    Benchmark 1: LSAN_OPTIONS=fast_unwind_on_malloc=0 make T=t0001-init.sh
      Time (mean ± σ):      2.135 s ±  0.015 s    [User: 1.951 s, System: 0.554 s]
      Range (min … max):    2.122 s …  2.152 s    3 runs

    Benchmark 2: LSAN_OPTIONS=fast_unwind_on_malloc=1 make T=t0001-init.sh
      Time (mean ± σ):      1.981 s ±  0.055 s    [User: 1.769 s, System: 0.488 s]
      Range (min … max):    1.941 s …  2.044 s    3 runs

    Summary
      'LSAN_OPTIONS=fast_unwind_on_malloc=1 make T=t0001-init.sh' ran
        1.08 ± 0.03 times faster than 'LSAN_OPTIONS=fast_unwind_on_malloc=0 make T=t0001-init.sh'

I think that's more than worth it to get the more meaningful stack
traces, we can always provide LSAN_OPTIONS=fast_unwind_on_malloc=0 for
one-off "fast" runs.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

test-lib: make $GIT_BUILD_DIR an absolute path

Change the GIT_BUILD_DIR from a path like "/path/to/build/t/.." to
"/path/to/build". The "TEST_DIRECTORY" here is already made an
absolute path a few lines above this.

We could simply do $(cd "$TEST_DIRECTORY"/.." && pwd) here, but as
noted in the preceding commit the "$TEST_DIRECTORY" can't be anything
except the path containing this test-lib.sh file at this point, so we
can more cheaply and equally strip the "/t" off the end.

This change will be helpful to LSAN_OPTIONS which will want to strip
the build directory path from filenames, which we couldn't do if we
had a "/.." in there.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

test-lib: correct and assert TEST_DIRECTORY overriding

Correct a misleading comment added by me in 62f539043c7 (test-lib:
Allow overriding of TEST_DIRECTORY, 2010-08-19), and add an assertion
that TEST_DIRECTORY cannot point to any directory except the "t"
directory in the top-level of git.git.

This assertion is in effect not new, since we'd already die if that
wasn't the case[1], but it and the updated commentary help to make
that clearer.

The existing comments were also on the wrong arms of the
"if". I.e. the "allow tests to override this" was on the "test -z"
arm. That came about due to a combination of 62f539043c7 and
85176d72513 (test-lib.sh: convert $TEST_DIRECTORY to an absolute path,
2013-11-17).

Those earlier comments could be read as allowing the "$TEST_DIRECTORY"
to be some path outside of t/. As explained in the updated comment
that's impossible, rather it was meant for *tests* that ran outside of
t/, i.e. the "t0000-basic.sh" tests that use "lib-subtest.sh".

Those tests have a different working directory, but they set the
"TEST_DIRECTORY" to the same path for bootstrapping. The comments now
reflect that, and further comment on why we have a hard dependency on
this.

1. https://lore.kernel.org/git/220222.86o82z8als.gmgdl@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

test-lib: add GIT_SAN_OPTIONS, inherit [AL]SAN_OPTIONS

Change our ASAN_OPTIONS and LSAN_OPTIONS to set defaults for those
variables, rather than punting out entirely if we already have them in
the environment.

We want to take any user-provided settings over our own, but we can do
that by prepending our defaults to the variable. The libsanitizer
options parsing has "last option wins" semantics.

It's now possible to do e.g.:

LSAN_OPTIONS=report_objects=1 ./t0006-date.sh

And not have the "report_objects=1" setting overwrite our sensible
default of "abort_on_error=1", but by prepending to the list we ensure
that:

LSAN_OPTIONS=report_objects=1:abort_on_error=0 ./t0006-date.sh

Will take the desired "abort_on_error=0" over our default.

See b0f4c9087e1 (t: support clang/gcc AddressSanitizer, 2014-12-08)
for the original pattern being altered here, and
85b81b35ff9 (test-lib: set LSAN_OPTIONS to abort by default,
2017-09-05) for when LSAN_OPTIONS was added in addition to the
then-existing ASAN_OPTIONS.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

rerere-train: two fixes to the use of "git show -s"

The script uses "git show -s" to display the title of the merge
commit being studied, without explicitly disabling the pager, which
is not a safe thing to do in a script.

For example, when the pager is set to "less" with "-SF" options (-S
tells the pager not to fold lines but allow horizontal scrolling to
show the overly long lines, -F tells the pager not to wait if the
output in its entirety is shown on a single page), and the title of
the merge commit is longer than the width of the terminal, the pager
will wait until the end-user tells it to quit after showing the
single line.

Explicitly disable the pager with this "git show" invocation to fix
this.

The command uses the "--pretty=format:..." format, which adds LF in
between each pair of commits it outputs, which means that the label
for the merge being learned from will be followed by the next
message on the same line. "--pretty=tformat:..." is what we should
instead, which adds LF after each commit, or a more modern way to
spell it, i.e. "--format=...". This existing breakage becomes
easier to see, now we no longer use the pager.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

switch: mention the --detach option when dying due to lack of a branch

Users who are accustomed to doing `git checkout <tag>` assume that
`git switch <tag>` will do the same thing. Inform them of the --detach
option so they aren't left wondering why `git switch` doesn't work but
`git checkout` does.

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

The eighth batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'tb/coc-plc-update'

Document Taylor as a new member of Git PLC at SFC. Welcome.

* tb/coc-plc-update:
CODE_OF_CONDUCT.md: update PLC members list

Merge branch 'en/ort-inner-merge-conflict-report'

Messages "ort" merge backend prepares while dealing with conflicted
paths were unnecessarily confusing since it did not differentiate
inner merges and outer merges.

* en/ort-inner-merge-conflict-report:
merge-ort: make informational messages from recursive merges clearer

Merge branch 'rs/pcre-invalid-utf8-fix-fix'

Workaround we have for versions of PCRE2 before their version 10.36
were in effect only for their versions newer than 10.36 by mistake,
which has been corrected.

* rs/pcre-invalid-utf8-fix-fix:
grep: fix triggering PCRE2_NO_START_OPTIMIZE workaround

Merge branch 'ds/core-untracked-cache-config'

Setting core.untrackedCache to true failed to add the untracked
cache extension to the index.

* ds/core-untracked-cache-config:
dir: force untracked cache with core.untrackedCache

Merge branch 'ab/diff-free-more'

Leakfixes.

* ab/diff-free-more:
diff.[ch]: have diff_free() free options->parseopts
diff.[ch]: have diff_free() call clear_pathspec(opts.pathspec)

Merge branch 'ab/date-mode-release'

Plug (some) memory leaks around parse_date_format().

* ab/date-mode-release:
  date API: add and use a date_mode_release()
  date API: add basic API docs
  date API: provide and use a DATE_MODE_INIT
  date API: create a date.h, split from cache.h
  cache.h: remove always unused show_date_human() declaration

Merge branch 'jc/name-rev-stdin'

Finishing touches to an earlier "name-rev --annotate-stdin" series.

* jc/name-rev-stdin:
name-rev: replace --stdin with --annotate-stdin in synopsis

Merge branch 'ab/grep-patterntype'

Some code clean-up in the "git grep" machinery.

* ab/grep-patterntype:
  grep: simplify config parsing and option parsing
  grep.c: do "if (bool && memchr())" not "if (memchr() && bool)"
  grep.h: make "grep_opt.pattern_type_option" use its enum
  grep API: call grep_config() after grep_init()
  grep.c: don't pass along NULL callback value
  built-ins: trust the "prefix" from run_builtin()
  grep tests: add missing "grep.patternType" config tests
  grep tests: create a helper function for "BRE" or "ERE"
  log tests: check if grep_config() is called by "log"-like cmds
  grep.h: remove unused "regex_t regexp" from grep_opt

Merge branch 'js/apply-partial-clone-filters-recursively'

"git clone --filter=... --recurse-submodules" only makes the
top-level a partial clone, while submodules are fully cloned. This
behaviour is changed to pass the same filter down to the submodules.

* js/apply-partial-clone-filters-recursively:
clone, submodule: pass partial clone filters to submodules

Merge branch 'ja/i18n-common-messages'

Unify more messages to help l10n.

* ja/i18n-common-messages:
  i18n: fix some misformated placeholders in command synopsis
  i18n: remove from i18n strings that do not hold translatable parts
  i18n: factorize "invalid value" messages
  i18n: factorize more 'incompatible options' messages

Merge branch 'ab/only-single-progress-at-once'

Further tweaks on progress API.

* ab/only-single-progress-at-once:
  pack-bitmap-write.c: don't return without stop_progress()
  progress API: unify stop_progress{,_msg}(), fix trace2 bug
  progress.c: refactor stop_progress{,_msg}() to use helpers
  progress.c: use dereferenced "progress" variable, not "(*p_progress)"
  progress.h: format and be consistent with progress.c naming
  progress.c tests: test some invalid usage
  progress.c tests: make start/stop commands on stdin
  progress.c test helper: add missing braces
  leak tests: fix a memory leak in "test-progress" helper