git.ipfire.org Git - thirdparty/git.git/log

builtin/worktree: create refdb via ref backend

When creating a worktree we create the worktree's ref database manually
by first writing a "HEAD" file so that the directory is recognized as a
Git repository by other commands, and then running git-update-ref(1) or
git-symbolic-ref(1) to write the actual value. But while this is fine
for the files backend, this logic simply assumes too much about how the
ref backend works and will leave behind an invalid ref database once any
other ref backend lands.

Refactor the code to instead use `refs_init_db()` to initialize the ref
database so that git-worktree(1) itself does not need to know about how
to initialize it. This will allow future ref backends to customize how
the per-worktree ref database is set up. Furthermore, as we now already
have a worktree ref store around, we can also avoid spawning external
commands to write the HEAD reference and instead use the refs API to do
so.

Note that we do not have an equivalent to passing the `--quiet` flag to
git-symbolic-ref(1) as we did before. This flag does not have an effect
anyway though, as git-symbolic-ref(1) only honors it when reading a
symref, but never when writing one.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

worktree: expose interface to look up worktree by name

Our worktree interfaces do not provide a way to look up a worktree by
its name. Expose `get_linked_worktree()` to allow for this usecase. As
callers are responsible for freeing this worktree, introduce a new
function `free_worktree()` that does so.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/worktree: move setup of commondir file earlier

Shuffle around how we create supporting worktree files so that we first
ensure that the worktree has all link files ("gitdir", "commondir")
before we try to initialize the ref database by writing "HEAD". This
will be required by a subsequent commit where we start to initialize the
ref database via `refs_init_db()`, which will require an initialized
`struct worktree *`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/files: skip creation of "refs/{heads,tags}" for worktrees

The files ref backend will create both "refs/heads" and "refs/tags" in
the Git directory. While this logic makes sense for normal repositories,
it does not for worktrees because those refs are "common" refs that
would always be contained in the main repository's ref database.

Introduce a new flag telling the backend that it is expected to create a
per-worktree ref database and skip creation of these dirs in the files
backend when the flag is set. No other backends (currently) need
worktree-specific logic, so this is the only required change to start
creating per-worktree ref databases via `refs_init_db()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

setup: move creation of "refs/" into the files backend

When creating the ref database we unconditionally create the "refs/"
directory in "setup.c". This is a mandatory prerequisite for all Git
repositories regardless of the ref backend in use, because Git will be
unable to detect the directory as a repository if "refs/" doesn't exist.

We are about to add another new caller that will want to create a ref
database when creating worktrees. We would require the same logic to
create the "refs/" directory even though the caller really should not
care about such low-level details. Ideally, the ref database should be
fully initialized after calling `refs_init_db()`.

Move the code to create the directory into the files backend itself to
make it so. This means that future ref backends will also need to have
equivalent logic around to ensure that the directory exists, but it
seems a lot more sensible to have it this way round than to require
callers to create the directory themselves.

An alternative to this would be to create "refs/" in `refs_init_db()`
directly. This feels conceptually unclean though as the creation of the
refdb is now cluttered across different callsites. Furthermore, both the
"files" and the upcoming "reftable" backend write backend-specific data
into the "refs/" directory anyway, so splitting up this logic would only
make it harder to reason about.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs: prepare `refs_init_db()` for initializing worktree refs

The purpose of `refs_init_db()` is to initialize the on-disk files of a
new ref database. The function is quite inflexible right now though, as
callers can neither specify the `struct ref_store` nor can they pass any
flags.

Refactor the interface to accept both of these. This will be required so
that we can start initializing per-worktree ref databases via the ref
backend instead of open-coding the initialization in "worktree.c".

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'ps/refstorage-extension' into ps/worktree-refdb-initialization

* ps/refstorage-extension:
  t9500: write "extensions.refstorage" into config
  builtin/clone: introduce `--ref-format=` value flag
  builtin/init: introduce `--ref-format=` value flag
  builtin/rev-parse: introduce `--show-ref-format` flag
  t: introduce GIT_TEST_DEFAULT_REF_FORMAT envvar
  setup: introduce GIT_DEFAULT_REF_FORMAT envvar
  setup: introduce "extensions.refStorage" extension
  setup: set repository's formats on init
  setup: start tracking ref storage format
  refs: refactor logic to look up storage backends
  worktree: skip reading HEAD when repairing worktrees
  t: introduce DEFAULT_REPO_FORMAT prereq
  builtin/clone: create the refdb with the correct object format
  builtin/clone: skip reading HEAD when retrieving remote
  builtin/clone: set up sparse checkout later
  builtin/clone: fix bundle URIs with mismatching object formats
  remote-curl: rediscover repository when fetching refs
  setup: allow skipping creation of the refdb
  setup: extract function to create the refdb

t9500: write "extensions.refstorage" into config

In t9500 we're writing a custom configuration that sets up gitweb. This
requires us to manually ensure that the repository format is configured
as required, including both the repository format version and
extensions. With the introduction of the "extensions.refStorage"
extension we need to update the test to also write this new one.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/clone: introduce `--ref-format=` value flag

Introduce a new `--ref-format` value flag for git-clone(1) that allows
the user to specify the ref format that is to be used for a newly
initialized repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/init: introduce `--ref-format=` value flag

Introduce a new `--ref-format` value flag for git-init(1) that allows
the user to specify the ref format that is to be used for a newly
initialized repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/rev-parse: introduce `--show-ref-format` flag

Introduce a new `--show-ref-format` to git-rev-parse(1) that causes it
to print the ref format used by a repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t: introduce GIT_TEST_DEFAULT_REF_FORMAT envvar

Introduce a new GIT_TEST_DEFAULT_REF_FORMAT environment variable that
lets developers run the test suite with a different default ref format
without impacting the ref format used by non-test Git invocations. This
is modeled after GIT_TEST_DEFAULT_OBJECT_FORMAT, which does the same
thing for the repository's object format.

Adapt the setup of the `REFFILES` test prerequisite to be conditionally
set based on the default ref format.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

setup: introduce GIT_DEFAULT_REF_FORMAT envvar

Introduce a new GIT_DEFAULT_REF_FORMAT environment variable that lets
users control the default ref format used by both git-init(1) and
git-clone(1). This is modeled after GIT_DEFAULT_OBJECT_FORMAT, which
does the same thing for the repository's object format.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

setup: introduce "extensions.refStorage" extension

Introduce a new "extensions.refStorage" extension that allows us to
specify the ref storage format used by a repository. For now, the only
supported format is the "files" format, but this list will likely soon
be extended to also support the upcoming "reftable" format.

There have been discussions on the Git mailing list in the past around
how exactly this extension should look like. One alternative [1] that
was discussed was whether it would make sense to model the extension in
such a way that backends are arbitrarily stackable. This would allow for
a combined value of e.g. "loose,packed-refs" or "loose,reftable", which
indicates that new refs would be written via "loose" files backend and
compressed into "packed-refs" or "reftable" backends, respectively.

It is arguable though whether this flexibility and the complexity that
it brings with it is really required for now. It is not foreseeable that
there will be a proliferation of backends in the near-term future, and
the current set of existing formats and formats which are on the horizon
can easily be configured with the much simpler proposal where we have a
single value, only.

Furthermore, if we ever see that we indeed want to gain the ability to
arbitrarily stack the ref formats, then we can adapt the current
extension rather easily. Given that Git clients will refuse any unknown
value for the "extensions.refStorage" extension they would also know to
ignore a stacked "loose,packed-refs" in the future.

So let's stick with the easy proposal for the time being and wire up the
extension.

[1]: <pull.1408.git.1667846164.gitgitgadget@gmail.com>

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

setup: set repository's formats on init

The proper hash algorithm and ref storage format that will be used for a
newly initialized repository will be figured out in `init_db()` via
`validate_hash_algorithm()` and `validate_ref_storage_format()`. Until
now though, we never set up the hash algorithm or ref storage format of
`the_repository` accordingly.

There are only two callsites of `init_db()`, one in git-init(1) and one
in git-clone(1). The former function doesn't care for the formats to be
set up properly because it never access the repository after calling the
function in the first place.

For git-clone(1) it's a different story though, as we call `init_db()`
before listing remote refs. While we do indeed have the wrong hash
function in `the_repository` when `init_db()` sets up a non-default
object format for the repository, it never mattered because we adjust
the hash after learning about the remote's hash function via the listed
refs.

So the current state is correct for the hash algo, but it's not for the
ref storage format because git-clone(1) wouldn't know to set it up
properly. But instead of adjusting only the `ref_storage_format`, set
both the hash algo and the ref storage format so that `the_repository`
is in the correct state when `init_db()` exits. This is fine as we will
adjust the hash later on anyway and makes it easier to reason about the
end state of `the_repository`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

setup: start tracking ref storage format

In order to discern which ref storage format a repository is supposed to
use we need to start setting up and/or discovering the format. This
needs to happen in two separate code paths.

  - The first path is when we create a repository via `init_db()`. When
    we are re-initializing a preexisting repository we need to retain
    the previously used ref storage format -- if the user asked for a
    different format then this indicates an error and we error out.
    Otherwise we either initialize the repository with the format asked
    for by the user or the default format, which currently is the
    "files" backend.

  - The second path is when discovering repositories, where we need to
    read the config of that repository. There is not yet any way to
    configure something other than the "files" backend, so we can just
    blindly set the ref storage format to this backend.

Wire up this logic so that we have the ref storage format always readily
available when needed. As there is only a single backend and because it
is not configurable we cannot yet verify that this tracking works as
expected via tests, but tests will be added in subsequent commits. To
countermand this ommission now though, raise a BUG() in case the ref
storage format is not set up properly in `ref_store_init()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs: refactor logic to look up storage backends

In order to look up ref storage backends, we're currently using a linked
list of backends, where each backend is expected to set up its `next`
pointer to the next ref storage backend. This is kind of a weird setup
as backends need to be aware of other backends without much of a reason.

Refactor the code so that the array of backends is centrally defined in
"refs.c", where each backend is now identified by an integer constant.
Expose functions to translate from those integer constants to the name
and vice versa, which will be required by subsequent patches.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

worktree: skip reading HEAD when repairing worktrees

When calling `git init --separate-git-dir=<new-path>` on a preexisting
repository, we move the Git directory of that repository to the new path
specified by the user. If there are worktrees present in the repository,
we need to repair the worktrees so that their gitlinks point to the new
location of the repository.

This repair logic will load repositories via `get_worktrees()`, which
will enumerate up and initialize all worktrees. Part of initialization
is logic that we resolve their respective worktree HEADs, even though
that information may not actually be needed in the end by all callers.

Although not a problem presently with the file-based reference backend,
it will become a problem with the upcoming reftable backend. In the
context of git-init(1) we do not have a fully-initialized repository set
up via `setup_git_directory()` or friends. Consequently, we do not know
about the repository format when `repair_worktrees()` is called, and
properly setting up all parts of the repositroy in `init_db()` before we
try to repair worktrees is not an easy task. With the introduction of
the reftable backend, we would ultimately try to look up the worktree
HEADs before we have figured out the reference format, which does not
work.

We do not require the worktree HEADs at all to repair worktrees. So
let's fix this issue by skipping over the step that reads them.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t: introduce DEFAULT_REPO_FORMAT prereq

A limited number of tests require repositories to have the default
repository format or otherwise they would fail to run, e.g. because they
fail to detect the correct hash function. While the hash function is the
only extension right now that creates problems like this, we are about
to add a second extension for the ref format.

Introduce a new DEFAULT_REPO_FORMAT prereq that can easily be amended
whenever we add new format extensions. Next to making any such changes
easier on us, the prerequisite's name should also help to clarify the
intent better.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'ps/clone-into-reftable-repository' into ps/refstorage-extension

* ps/clone-into-reftable-repository:
  builtin/clone: create the refdb with the correct object format
  builtin/clone: skip reading HEAD when retrieving remote
  builtin/clone: set up sparse checkout later
  builtin/clone: fix bundle URIs with mismatching object formats
  remote-curl: rediscover repository when fetching refs
  setup: allow skipping creation of the refdb
  setup: extract function to create the refdb

The third batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'jk/config-cleanup'

Code clean-up around use of configuration variables.

* jk/config-cleanup:
  sequencer: simplify away extra git_config_string() call
  gpg-interface: drop pointless config_error_nonbool() checks
  push: drop confusing configset/callback redundancy
  config: use git_config_string() for core.checkRoundTripEncoding
  diff: give more detailed messages for bogus diff.* config
  config: use config_error_nonbool() instead of custom messages
  imap-send: don't use git_die_config() inside callback
  git_xmerge_config(): prefer error() to die()
  config: reject bogus values for core.checkstat

Merge branch 'jk/implicit-true'

Some codepaths did not correctly parse configuration variables
specified with valueless "true", which has been corrected.

* jk/implicit-true:
  fsck: handle NULL value when parsing message config
  trailer: handle NULL value when parsing trailer-specific config
  submodule: handle NULL value when parsing submodule.*.branch
  help: handle NULL value for alias.* config
  trace2: handle NULL values in tr2_sysenv config callback
  setup: handle NULL value when parsing extensions
  config: handle NULL value when parsing non-bools

Merge branch 'jk/bisect-reset-fix'

"git bisect reset" has been taught to clean up state files and refs
even when BISECT_START file is gone.

* jk/bisect-reset-fix:
bisect: always clean on reset

Merge branch 'jk/end-of-options'

"git $cmd --end-of-options --rev -- --path" for some $cmd failed
to interpret "--rev" as a rev, and "--path" as a path. This was
fixed for many programs like "reset" and "checkout".

* jk/end-of-options:
parse-options: decouple "--end-of-options" and "--"

Merge branch 'rs/incompatible-options-messages'

Clean-up code that handles combinations of incompatible options.

* rs/incompatible-options-messages:
  worktree: simplify incompatibility message for --orphan and commit-ish
  worktree: standardize incompatibility messages
  clean: factorize incompatibility message
  revision, rev-parse: factorize incompatibility messages about - -exclude-hidden
  revision: use die_for_incompatible_opt3() for - -graph/--reverse/--walk-reflogs
  repack: use die_for_incompatible_opt3() for -A/-k/--cruft
  push: use die_for_incompatible_opt4() for - -delete/--tags/--all/--mirror

Merge branch 'jc/revision-parse-int'

The command line parser for the "log" family of commands was too
loose when parsing certain numbers, e.g., silently ignoring the
extra 'q' in "git log -n 1q" without complaining, which has been
tightened up.

* jc/revision-parse-int:
revision: parse integer arguments to --max-count, --skip, etc., more carefully

Merge branch 'mk/doc-gitfile-more'

Doc update.

* mk/doc-gitfile-more:
doc: make the gitfile syntax easier to discover

Merge branch 'ps/ref-tests-update-more'

Tests update.

* ps/ref-tests-update-more:
  t6301: write invalid object ID via `test-tool ref-store`
  t5551: stop writing packed-refs directly
  t5401: speed up creation of many branches
  t4013: simplify magic parsing and drop "failure"
  t3310: stop checking for reference existence via `test -f`
  t1417: make `reflog --updateref` tests backend agnostic
  t1410: use test-tool to create empty reflog
  t1401: stop treating FETCH_HEAD as real reference
  t1400: split up generic reflog tests from the reffile-specific ones
  t0410: mark tests to require the reffiles backend

Merge branch 'jp/use-diff-index-in-pre-commit-sample'

The sample pre-commit hook that tries to catch introduction of new
paths that use potentially non-portable characters did not notice
an existing path getting renamed to such a problematic path, when
rename detection was enabled.

* jp/use-diff-index-in-pre-commit-sample:
hooks--pre-commit: detect non-ASCII when renaming

Merge branch 'en/complete-sparse-checkout'

Command line completion (in contrib/) learned to complete path
arguments to the "add/set" subcommands of "git sparse-checkout"
better.

* en/complete-sparse-checkout:
  completion: avoid user confusion in non-cone mode
  completion: avoid misleading completions in cone mode
  completion: fix logic for determining whether cone mode is active
  completion: squelch stray errors in sparse-checkout completion

The second batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'jh/trace2-redact-auth'

trace2 streams used to record the URLs that potentially embed
authentication material, which has been corrected.

* jh/trace2-redact-auth:
  t0212: test URL redacting in EVENT format
  t0211: test URL redacting in PERF format
  trace2: redact passwords from https:// URLs by default
  trace2: fix signature of trace2_def_param() macro

Merge branch 'ad/merge-file-diff-algo'

"git merge-file" learned to take the "--diff-algorithm" option to
use algorithm different from the default "myers" diff.

* ad/merge-file-diff-algo:
merge-file: add --diff-algorithm option

Merge branch 'rs/column-leakfix'

Leakfix.

* rs/column-leakfix:
column: release strbuf and string_list after use

Merge branch 'rs/i18n-cannot-be-used-together'

Clean-up code that handles combinations of incompatible options.

* rs/i18n-cannot-be-used-together:
i18n: factorize even more 'incompatible options' messages

Merge branch 'jb/reflog-expire-delete-dry-run-options'

Command line parsing fix for "git reflog".

* jb/reflog-expire-delete-dry-run-options:
builtin/reflog.c: fix dry-run option short name

Merge branch 'js/update-urls-in-doc-and-comment'

Stale URLs have been updated to their current counterparts (or
archive.org) and HTTP links are replaced with working HTTPS links.

* js/update-urls-in-doc-and-comment:
  doc: refer to internet archive
  doc: update links for andre-simon.de
  doc: switch links to https
  doc: update links to current pages

Merge branch 'ps/commit-graph-less-paranoid'

Earlier we stopped relying on commit-graph that (still) records
information about commits that are lost from the object store,
which has negative performance implications. The default has been
flipped to disable this pessimization.

* ps/commit-graph-less-paranoid:
commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default

Merge branch 'cc/git-replay'

Introduce "git replay", a tool meant on the server side without
working tree to recreate a history.

* cc/git-replay:
  replay: stop assuming replayed branches do not diverge
  replay: add --contained to rebase contained branches
  replay: add --advance or 'cherry-pick' mode
  replay: use standard revision ranges
  replay: make it a minimal server side command
  replay: remove HEAD related sanity check
  replay: remove progress and info output
  replay: add an important FIXME comment about gpg signing
  replay: change rev walking options
  replay: introduce pick_regular_commit()
  replay: die() instead of failing assert()
  replay: start using parse_options API
  replay: introduce new builtin
  t6429: remove switching aspects of fast-rebase

Merge branch 'ac/fuzz-show-date'

Subject approxidate() and show_date() machinery to OSS-Fuzz.

* ac/fuzz-show-date:
fuzz: add new oss-fuzz fuzzer for date.c / date.h

Merge branch 'ps/ref-deletion-updates'

Simplify API implementation to delete references by eliminating
duplication.

* ps/ref-deletion-updates:
  refs: remove `delete_refs` callback from backends
  refs: deduplicate code to delete references
  refs/files: use transactions to delete references
  t5510: ensure that the packed-refs file needs locking

Merge branch 'js/packfile-h-typofix'

Typofix.

* js/packfile-h-typofix:
packfile.c: fix a typo in `each_file_in_pack_dir_fn()`'s declaration

builtin/clone: create the refdb with the correct object format

We're currently creating the reference database with a potentially
incorrect object format when the remote repository's object format is
different from the local default object format. This works just fine for
now because the files backend never records the object format anywhere.
But this logic will fail with any new reference backend that encodes
this information in some form either on-disk or in-memory.

The preceding commits have reshuffled code in git-clone(1) so that there
is no code path that will access the reference database before we have
detected the remote's object format. With these refactorings we can now
defer initialization of the reference database until after we have
learned the remote's object format and thus initialize it with the
correct format from the get-go.

These refactorings are required to make git-clone(1) work with the
upcoming reftable backend when cloning repositories with the SHA256
object format.

This change breaks a test in "t5550-http-fetch-dumb.sh" when cloning an
empty repository with `GIT_TEST_DEFAULT_HASH=sha256`. The test expects
the resulting hash format of the empty cloned repository to match the
default hash, but now we always end up with a sha1 repository. The
problem is that for dumb HTTP fetches, we have no easy way to figure out
the remote's hash function except for deriving it based on the hash
length of refs in `info/refs`. But as the remote repository is empty we
cannot rely on this detection mechanism.

Before the change in this commit we already initialized the repository
with the default hash function and then left it as-is. With this patch
we always use the hash function detected via the remote, where we fall
back to "sha1" in case we cannot detect it.

Neither the old nor the new behaviour are correct as we second-guess the
remote hash function in both cases. But given that this is a rather
unlikely edge case (we use the dumb HTTP protocol, the remote repository
uses SHA256 and the remote repository is empty), let's simply adapt the
test to assert the new behaviour. If we want to properly address this
edge case in the future we will have to extend the dumb HTTP protocol so
that we can properly detect the hash function for empty repositories.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/clone: skip reading HEAD when retrieving remote

After we have set up the remote configuration in git-clone(1) we'll call
`remote_get()` to read the remote from the on-disk configuration. But
next to reading the on-disk configuration, `remote_get()` will also
cause us to try and read the repository's HEAD reference so that we can
figure out the current branch. Besides being pointless in git-clone(1)
because we're operating in an empty repository anyway, this will also
break once we move creation of the reference database to a later point
in time.

Refactor the code to introduce a new `remote_get_early()` function that
will skip reading the HEAD reference to address this issue.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/clone: set up sparse checkout later

When asked to do a sparse checkout, then git-clone(1) will spawn
`git sparse-checkout set` to set up the configuration accordingly. This
requires a proper Git repository or otherwise the command will fail. But
as we are about to move creation of the reference database to a later
point, this prerequisite will not hold anymore.

Move the logic to a later point in time where we know to have created
the reference database already.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/clone: fix bundle URIs with mismatching object formats

We create the reference database in git-clone(1) quite early before
connecting to the remote repository. Given that we do not yet know about
the object format that the remote repository uses at that point in time
the consequence is that the refdb may be initialized with the wrong
object format.

This is not a problem in the context of the files backend as we do not
encode the object format anywhere, and furthermore the only reference
that we write between initializing the refdb and learning about the
object format is the "HEAD" symref. It will become a problem though once
we land the reftable backend, which indeed does require to know about
the proper object format at the time of creation. We thus need to
rearrange the logic in git-clone(1) so that we only initialize the refdb
once we have learned about the actual object format.

As a first step, move listing of remote references to happen earlier,
which also allow us to set up the hash algorithm of the repository
earlier now. While we aim to execute this logic as late as possible
until after most of the setup has happened already, detection of the
object format and thus later the setup of the reference database must
happen before any other logic that may spawn Git commands or otherwise
these Git commands may not recognize the repository as such.

The first Git step where we expect the repository to be fully initalized
is when we fetch bundles via bundle URIs. Funny enough, the comments
there also state that "the_repository must match the cloned repo", which
is indeed not necessarily the case for the hash algorithm right now. So
in practice it is the right thing to detect the remote's object format
before downloading bundle URIs anyway, and not doing so causes clones
with bundle URIs to fail when the local default object format does not
match the remote repository's format.

Unfortunately though, this creates a new issue: downloading bundles may
take a long time, so if we list refs beforehand they might've grown
stale meanwhile. It is not clear how to solve this issue except for a
second reference listing though after we have downloaded the bundles,
which may be an expensive thing to do.

Arguably though, it's preferable to have a staleness issue compared to
being unable to clone a repository altogether.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

remote-curl: rediscover repository when fetching refs

The reftable format encodes the hash function used by the repository
inside of its tables. The reftable backend thus needs to be initialized
with the correct hash function right from the start, or otherwise we may
end up writing tables with the wrong hash function. But git-clone(1)
initializes the reference database before learning about the hash
function used by the remote repository, which has never been a problem
with the reffiles backend.

To fix this, we'll have to change git-clone(1) to be more careful and
only create the reference backend once it learned about the remote hash
function. This creates a problem for git-remote-curl(1), which will then
be spawned at a time where the repository is not yet fully-initialized.
Consequentially, git-remote-curl(1) will fail to detect the repository,
which eventually causes it to error out once it is asked to fetch remote
objects.

We can address this issue by trying to re-discover the Git repository in
case none was detected at startup time. With this change, the clone will
look as following:

  1. git-clone(1) sets up the initial repository, excluding the
     reference database.

  2. git-clone(1) spawns git-remote-curl(1), which will be unable to
     detect the repository due to a missing "HEAD".

  3. git-clone(1) asks git-remote-curl(1) to list remote references.
     This works just fine as this step does not require a local
     repository

  4. git-clone(1) creates the reference database as it has now learned
     about the hash function.

  5. git-clone(1) asks git-remote-curl(1) to fetch the remote packfile.
     The latter notices that it doesn't have a repository available, but
     it now knows to try and re-discover it.

If the re-discovery succeeds in the last step we can continue with the
clone.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

setup: allow skipping creation of the refdb

Allow callers to skip creation of the reference database via a new flag
`INIT_DB_SKIP_REFDB`, which is required for git-clone(1) so that we can
create it at a later point once the object format has been discovered
from the remote repository.

Note that we also uplift the call to `create_reference_database()` into
`init_db()`, which makes it easier to handle the new flag for us. This
changes the order in which we do initialization so that we now set up
the Git configuration before we create the reference database. In
practice this move should not result in any change in behaviour.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

setup: extract function to create the refdb

We're about to let callers skip creation of the reference database when
calling `init_db()`. Extract the logic into a standalone function so
that it becomes easier to do this refactoring.

While at it, expand the comment that explains why we always create the
"refs/" directory.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Start the 2.44 cycle

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'tz/send-email-negatable-options'

Newer versions of Getopt::Long started giving warnings against our
(ab)use of it in "git send-email".  Bump the minimum version
requirement for Perl to 5.8.1 (from September 2002) to allow
simplifying our implementation.

* tz/send-email-negatable-options:
  send-email: avoid duplicate specification warnings
  perl: bump the required Perl version to 5.8.1 from 5.8.0

Merge branch 'ak/rebase-autosquash'

"git rebase --autosquash" is now enabled for non-interactive rebase,
but it is still incompatible with the apply backend.

* ak/rebase-autosquash:
  rebase: rewrite --(no-)autosquash documentation
  rebase: support --autosquash without -i
  rebase: fully ignore rebase.autoSquash without -i

Merge branch 'vd/for-each-ref-unsorted-optimization'

"git for-each-ref --no-sort" still sorted the refs alphabetically
which paid non-trivial cost.  It has been redefined to show output
in an unspecified order, to allow certain optimizations to take
advantage of.

* vd/for-each-ref-unsorted-optimization:
  t/perf: add perf tests for for-each-ref
  ref-filter.c: use peeled tag for '*' format fields
  for-each-ref: clean up documentation of --format
  ref-filter.c: filter & format refs in the same callback
  ref-filter.c: refactor to create common helper functions
  ref-filter.c: rename 'ref_filter_handler()' to 'filter_one()'
  ref-filter.h: add functions for filter/format & format-only
  ref-filter.h: move contains caches into filter
  ref-filter.h: add max_count and omit_empty to ref_format
  ref-filter.c: really don't sort when using --no-sort

Merge branch 'ps/ban-a-or-o-operator-with-test'

Test and shell scripts clean-up.

* ps/ban-a-or-o-operator-with-test:
  Makefile: stop using `test -o` when unlinking duplicate executables
  contrib/subtree: convert subtree type check to use case statement
  contrib/subtree: stop using `-o` to test for number of args
  global: convert trivial usages of `test <expr> -a/-o <expr>`

Merge branch 'ss/format-patch-use-encode-headers-for-cover-letter'

"git format-patch --encode-email-headers" ignored the option when
preparing the cover letter, which has been corrected.

* ss/format-patch-use-encode-headers-for-cover-letter:
format-patch: fix ignored encode_email_headers for cover letter

Merge branch 'ps/ref-tests-update'

Update ref-related tests.

* ps/ref-tests-update:
  t: mark several tests that assume the files backend with REFFILES
  t7900: assert the absence of refs via git-for-each-ref(1)
  t7300: assert exact states of repo
  t4207: delete replace references via git-update-ref(1)
  t1450: convert tests to remove worktrees via git-worktree(1)
  t: convert tests to not access reflog via the filesystem
  t: convert tests to not access symrefs via the filesystem
  t: convert tests to not write references via the filesystem
  t: allow skipping expected object ID in `ref-store update-ref`

Merge branch 'jw/git-add-attr-pathspec'

"git add" and "git stash" learned to support the ":(attr:...)"
magic pathspec.

* jw/git-add-attr-pathspec:
attr: enable attr pathspec magic for git-add and git-stash

Merge branch 'jk/chunk-bounds-more'

Code clean-up for jk/chunk-bounds topic.

* jk/chunk-bounds-more:
  commit-graph: mark chunk error messages for translation
  commit-graph: drop verify_commit_graph_lite()
  commit-graph: check order while reading fanout chunk
  commit-graph: use fanout value for graph size
  commit-graph: abort as soon as we see a bogus chunk
  commit-graph: clarify missing-chunk error messages
  commit-graph: drop redundant call to "lite" verification
  midx: check consistency of fanout table
  commit-graph: handle overflow in chunk_size checks

Merge branch 'js/ci-discard-prove-state'

The way CI testing used "prove" could lead to running the test
suite twice needlessly, which has been corrected.

* js/ci-discard-prove-state:
ci: avoid running the test suite _twice_

Merge branch 'ps/ci-gitlab'

Add support for GitLab CI.

* ps/ci-gitlab:
  ci: add support for GitLab CI
  ci: install test dependencies for linux-musl
  ci: squelch warnings when testing with unusable Git repo
  ci: unify setup of some environment variables
  ci: split out logic to set up failed test artifacts
  ci: group installation of Docker dependencies
  ci: make grouping setup more generic
  ci: reorder definitions for grouping functions

Merge branch 'js/doc-unit-tests-with-cmake'

Update the base topic to work with CMake builds.

* js/doc-unit-tests-with-cmake:
  cmake: handle also unit tests
  cmake: use test names instead of full paths
  cmake: fix typo in variable name
  artifacts-tar: when including `.dll` files, don't forget the unit-tests
  unit-tests: do show relative file paths
  unit-tests: do not mistake `.pdb` files for being executable
  cmake: also build unit tests

Merge branch 'js/doc-unit-tests'

Process to add some form of low-level unit tests has started.

* js/doc-unit-tests:
  ci: run unit tests in CI
  unit tests: add TAP unit test framework
  unit tests: add a project plan document

Merge branch 'ps/httpd-tests-on-nixos'

Portability tweak.

* ps/httpd-tests-on-nixos:
  t9164: fix inability to find basename(1) in Subversion hooks
  t/lib-httpd: stop using legacy crypt(3) for authentication
  t/lib-httpd: dynamically detect httpd and modules path

revision: parse integer arguments to --max-count, --skip, etc., more carefully

The "rev-list" and other commands in the "log" family, being the
oldest part of the system, use their own custom argument parsers,
and integer values of some options are parsed with atoi(), which
allows a non-digit after the number (e.g., "1q") to be silently
ignored. As a natural consequence, an argument that does not begin
with a digit (e.g., "q") silently becomes zero, too.

Switch to use strtol_i() and parse_timestamp() appropriately to
catch bogus input.

Note that one may naïvely expect that --max-count, --skip, etc., to
only take non-negative values, but we must allow them to also take
negative values, as an escape hatch to countermand a limit set by an
earlier option on the command line; the underlying variables are
initialized to (-1) and "--max-count=-1", for example, is a
legitimate way to reinitialize the limit.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

sequencer: simplify away extra git_config_string() call

In our config callback, we call git_config_string() to copy the incoming
value string into a local string. But we don't modify or store that
string; we just look at it and then free it. We can make the code
simpler by just looking at the value passed into the callback.

Note that we do need to check for NULL, which is the one bit of logic
git_config_string() did for us. And I could even see an argument that we
are abstracting any error-checking of the value behind the
git_config_string() layer. But in practice no other callbacks behave
this way; it is standard to check for NULL and then just look at the
string directly.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

gpg-interface: drop pointless config_error_nonbool() checks

Config callbacks which use git_config_string() or git_config_pathname()
have no need to check for a NULL value. This is handled automatically
by those helpers.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

push: drop confusing configset/callback redundancy

We parse push config by calling git_config() with our git_push_config()
callback. But inside that callback, when we see "push.gpgsign", we
ignore the value passed into the callback and instead make a new call to
git_config_get_value().

This is unnecessary at best, and slightly wrong at worst (if there are
multiple instances, get_value() only returns one; both methods end up
with last-one-wins, but we'd fail to report errors if earlier
incarnations were bogus).

The call was added by 68c757f219 (push: add a config option push.gpgSign
for default signed pushes, 2015-08-19). That commit doesn't give any
reason to deviate from the usual strategy here; it was probably just
somebody unfamiliar with our config API and its conventions.

It also added identical code to builtin/send-pack.c, which also handles
push.gpgsign.

And then the same issue spread to its neighbor in b33a15b081 (push: add
recurseSubmodules config option, 2015-11-17), presumably via
cargo-culting.

This patch fixes all three to just directly use the value provided to
the callback. While I was adjusting the code to do so, I noticed that
push.gpgsign is overly careful about a NULL value. After
git_parse_maybe_bool() has returned anything besides 1, we know that the
value cannot be NULL (if it were, it would be an implicit "true", and
many callers of maybe_bool rely on that). Here that lets us shorten "if
(v && !strcasecmp(v, ...))" to just "if (!strcasecmp(v, ...))".

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config: use git_config_string() for core.checkRoundTripEncoding

Since this code path was recently converted to check for a NULL value,
it now behaves exactly like git_config_string(). We can shorten the code
a bit by using that helper.

Note that git_config_string() takes a const pointer, but our storage
variable is non-const. We're better off making this "const", though,
since the default value points to a string literal (and thus it would be
an error if anybody tried to write to it).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

diff: give more detailed messages for bogus diff.* config

The config callbacks for a few diff.* variables simply return -1 when we
encounter an error. The message you get mentions the offending location,
like:

  fatal: bad config variable 'diff.algorithm' in file '.git/config' at line 7

but is vague about "bad" (as it must be, since the message comes from
the generic config code). Most callbacks add their own messages here, so
let's do the same. E.g.:

  error: unknown value for config 'diff.algorithm': foo
  fatal: bad config variable 'diff.algorithm' in file '.git/config' at line 7

I've written the string in a way that should be reusable for
translators, and matches another similar message in transport.c (there
doesn't yet seem to be a popular generic message to reuse here, so
hopefully this will get the ball rolling).

Note that in the case of diff.algorithm, our parse_algorithm_value()
helper does detect a NULL value string. But it's still worth detecting
it ourselves here, since we can give a more specific error message (and
which is the usual one for unexpected implicit-bool values).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config: use config_error_nonbool() instead of custom messages

A few config callbacks use their own custom messages to report an
unexpected implicit bool like:

[merge "foo"]
driver

These should just use config_error_nonbool(), so the user sees
consistent messages.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

imap-send: don't use git_die_config() inside callback

The point of git_die_config() is to let configset users mention the
file/line info for invalid config, like:

  if (!git_config_get_int("foo.bar", &value)) {
if (!is_ok(value))
git_die_config("foo.bar");
  }

Using it from within a config callback is unnecessary, because we can
simply return an error, at which point the config machinery will mention
the file/line of the offending variable. Worse, using git_die_config()
can actually produce the wrong location when the key is found in
multiple spots. For instance, with config like:

  [imap]
  host
  host = foo

we'll report the line number of the "host = foo" line, but the problem
is on the implicit-bool "host" line.

We can fix it by just returning an error code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

git_xmerge_config(): prefer error() to die()

When parsing merge config, a few code paths die on error. It's
preferable for us to call error() here, because the resulting error
message from the config parsing code contains much more detail.

For example, before:

  fatal: unknown style 'bogus' given for 'merge.conflictstyle'

and after:

  error: unknown style 'bogus' given for 'merge.conflictstyle'
  fatal: bad config variable 'merge.conflictstyle' in file '.git/config' at line 7

Since we're touching these lines, I also marked them for translation.
There's no reason they shouldn't behave like most other config-parsing
errors.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config: reject bogus values for core.checkstat

If you feed nonsense config like:

git -c core.checkstat=foobar status

we'll silently ignore the unknown value, rather than reporting an error.
This goes all the way back to c08e4d5b5c (Enable minimal stat checking,
2013-01-22).

Detecting and complaining now is technically a backwards-incompatible
change, but I don't think anybody has any reason to use an invalid value
here. There are no historical values we'd want to allow for backwards
compatibility or anything like that. We are better off loudly telling
the user that their config may not be doing what they expect.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fsck: handle NULL value when parsing message config

When parsing fsck.*, receive.fsck.*, or fetch.fsck.*, we don't check for
an implicit bool. So any of:

  [fsck]
  badTree
  [receive "fsck"]
  badTree
  [fetch "fsck"]
  badTree

will cause us to segfault. We can fix it with config_error_nonbool() in
the usual way, but we have to make a few more changes to get good error
messages. The problem is that all three spots do:

  if (skip_prefix(var, "fsck.", &var))

to match and parse the actual message id. But that means that "var" now
just says "badTree" instead of "receive.fsck.badTree", making the
resulting message confusing. We can fix that by storing the parsed
message id in its own separate variable.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

trailer: handle NULL value when parsing trailer-specific config

When parsing the "key", "command", and "cmd" trailer config, we just
make a copy of the value string.  If we see an implicit bool like:

  [trailer "foo"]
  key

we'll segfault trying to copy a NULL pointer. We can fix this with the
usual config_error_nonbool() check.

I split this out from the other vanilla cases, because at first glance
it looks like a better fix here would be to move the NULL check out of
the switch statement. But it would change the behavior of other keys
like trailer.*.ifExists, where an implicit bool is interpreted as
EXISTS_DEFAULT.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

submodule: handle NULL value when parsing submodule.*.branch

We record the submodule branch config value as a string, so config that
uses an implicit bool like:

[submodule "foo"]
branch

will cause us to segfault. Note that unlike most other config-parsing
bugs of this class, this can be triggered by parsing a bogus .gitmodules
file (which we might do after cloning a malicious repository).

I don't think the security implications are important, though. It's
always a strict NULL dereference, not an out-of-bounds read or write. So
we should reliably kill the process. That may be annoying, but the
impact is limited to the attacker preventing the victim from
successfully using "git clone --recurse-submodules", etc, on the
malicious repo.

The "branch" entry is the only one with this problem; other strings like
"path" and "url" already check for NULL.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

help: handle NULL value for alias.* config

When showing all config with "git help --all", we print the list of
defined aliases. But our config callback to do so does not check for a
NULL value, meaning a config block like:

[alias]
foo

will cause us to segfault. We should detect and complain about this in
the usual way.

Since this command is purely informational (and we aren't trying to run
the alias), we could perhaps just generate a warning and continue. But
this sort of misconfiguration should be pretty rare, and the error
message we will produce points directly to the line of config that needs
to be fixed. So just generating the usual error should be OK.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

trace2: handle NULL values in tr2_sysenv config callback

If you have config with an implicit bool like:

[trace2]
envvars

we'll segfault, as we unconditionally try to xstrdup() the value. We
should instead detect and complain, as a boolean value has no meaning
here. The same is true for every variable in tr2_sysenv_settings (and
this patch covers them all, as we check them in a loop).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

setup: handle NULL value when parsing extensions

The "partialclone" extension config records a string, and hence it is an
error to have an implicit bool like:

[extensions]
partialclone

in your config. We should recognize and reject this, rather than
segfaulting (which is the current behavior). Note that it's OK to use
config_error_nonbool() here, even though the return value is an enum. We
explicitly document EXTENSION_ERROR as -1 for compatibility with
error(), etc.

This is the only extension value that has this problem. Most of the
others are bools that interpret this value naturally. The exception is
extensions.objectformat, which does correctly check for NULL.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config: handle NULL value when parsing non-bools

When the config parser sees an "implicit" bool like:

  [core]
  someVariable

it passes NULL to the config callback. Any callback code which expects a
string must check for NULL. This usually happens via helpers like
git_config_string(), etc, but some custom code forgets to do so and will
segfault.

These are all fairly vanilla cases where the solution is just the usual
pattern of:

  if (!value)
        return config_error_nonbool(var);

though note that in a few cases we have to split initializers like:

  int some_var = initializer();

into:

  int some_var;
  if (!value)
        return config_error_nonbool(var);
  some_var = initializer();

There are still some broken instances after this patch, which I'll
address on their own in individual patches after this one.

Reported-by: Carlos Andrés Ramírez Cataño <antaigroupltda@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

bisect: always clean on reset

Usually "bisect reset" cleans up any refs/bisect/ refs, along with
meta-files like .git/BISECT_LOG. But it only does so after deciding that
a bisection is active, which it does by reading BISECT_START. This is
usually fine, but it's possible to get into a confusing state if the
BISECT_START file is gone, but other cruft is left (this might be due to
a bug, or a system crash, etc).

And since "bisect reset" refuses to do anything in this state, the user
has no easy way to clean up the leftover cruft. While another "bisect
start" would clear the state, in the interim it can be annoying, as
other tools (like our bash prompt code) think we are bisecting, and
for-each-ref output may be polluted with refs/bisect/ entries.

Further adding to the confusion is that running "bisect reset $some_ref"
skips the BISECT_START check. So it never realizes that there's no
bisection active and does the cleanup anyway!

So let's just make sure we always do the cleanup, whether we looked at
BISECT_START or not. If the user doesn't give us a commit to reset to,
we'll still say "We are not bisecting" and skip the call to "git
checkout".

Reported-by: Janik Haag <janik@aq0.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

parse-options: decouple "--end-of-options" and "--"

When we added generic end-of-options support in 51b4594b40
(parse-options: allow --end-of-options as a synonym for "--",
2019-08-06), we made them true synonyms. They both stop option parsing,
and they are both returned in the resulting argv if the KEEP_DASHDASH
flag is used.

The hope was that this would work for all callers:

  - most generic callers would not pass KEEP_DASHDASH, and so would just
    do the right thing (stop parsing there) without needing to know
    anything more.

  - callers with KEEP_DASHDASH were generally going to rely on
    setup_revisions(), which knew to handle --end-of-options specially

But that turned out miss quite a few cases that pass KEEP_DASHDASH but
do their own manual parsing. For example, "git reset", "git checkout",
and so on want pass KEEP_DASHDASH so they can support:

  git reset $revs -- $paths

but of course aren't going to actually do a traversal, so they don't
call setup_revisions(). And those cases currently get confused by
--end-of-options being left in place, like:

   $ git reset --end-of-options HEAD
   fatal: option '--end-of-options' must come before non-option arguments

We could teach each of these callers to handle the leftover option
explicitly. But let's try to be a bit more clever and see if we can
solve it centrally in parse-options.c.

The bogus assumption here is that KEEP_DASHDASH tells us the caller
wants to see --end-of-options in the result. But really, the callers
which need to know that --end-of-options was reached are those that may
potentially parse more options from argv. In other words, those that
pass the KEEP_UNKNOWN_OPT flag.

If such a caller is aware of --end-of-options (e.g., because they call
setup_revisions() with the result), then this will continue to do the
right thing, treating anything after --end-of-options as a non-option.

And if the caller is not aware of --end-of-options, they are better off
keeping it intact, because either:

  1. They are just passing the options along to somebody else anyway, in
     which case that somebody would need to know about the
     --end-of-options marker.

  2. They are going to parse the remainder themselves, at which point
     choking on --end-of-options is much better than having it silently
     removed. The point is to avoid option injection from untrusted
     command line arguments, and bailing is better than quietly treating
     the untrusted argument as an option.

This fixes bugs with --end-of-options across several commands, but I've
focused on two in particular here:

  - t7102 confirms that "git reset --end-of-options --foo" now works.
    This checks two things. One, that we no longer barf on
    "--end-of-options" itself (which previously we did, even if the rev
    was something vanilla like "HEAD" instead of "--foo"). And two, that
    we correctly treat "--foo" as a revision rather than an option.

    This fix applies to any other cases which pass KEEP_DASHDASH but not
    KEEP_UNKNOWN_OPT, like "git checkout", "git check-attr", "git grep",
    etc, which would previously choke on "--end-of-options".

  - t9350 shows the opposite case: fast-export passed KEEP_UNKNOWN_OPT
    but not KEEP_DASHDASH, but then passed the result on to
    setup_revisions(). So it never saw --end-of-options, and would
    erroneously parse "fast-export --end-of-options --foo" as having a
    "--foo" option. This is now fixed.

Note that this does shut the door for callers which want to know if we
hit end-of-options, but don't otherwise need to keep unknown opts. The
obvious thing here is feeding it to the DWIM verify_filename()
machinery. And indeed, this is a problem even for commands which do
understand --end-of-options already. For example, without this patch,
you get:

  $ git log --end-of-options --foo
  fatal: option '--foo' must come before non-option arguments

because we refuse to accept "--foo" as a filename (because it starts
with a dash) even though we could know that we saw end-of-options. The
verify_filename() function simply doesn't accept this extra information.

So that is the status quo, and this patch doubles down further on that.
Commands like "git reset" have the same problem, but they won't even
know that parse-options saw --end-of-options! So even if we fixed
verify_filename(), they wouldn't have anything to pass to it.

But in practice I don't think this is a big deal. If you are being
careful enough to use --end-of-options, then you should also be using
"--" to disambiguate and avoid the DWIM behavior in the first place. In
other words, doing:

  git log --end-of-options --this-is-a-rev -- --this-is-a-path

works correctly, and will continue to do so. And likewise, with this
patch now:

  git reset --end-of-options --this-is-a-rev -- --this-is-a-path

will work, as well.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

worktree: simplify incompatibility message for --orphan and commit-ish

Use a single translatable string to report that the worktree add option
--orphan is incompatible with a commit-ish instead of having the
commit-ish in a separate translatable string. This reduces the number
of strings to translate and gives translators the full context.

A similar message is used in builtin/describe.c, but with the plural of
commit-ish, and here we need the singular form.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

worktree: standardize incompatibility messages

Use the standard parameterized message for reporting incompatible
options for worktree add. This reduces the number of strings to
translate and makes the UI slightly more consistent.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

clean: factorize incompatibility message

Use the standard parameterized message for reporting incompatible
options to inform users that they can't use -x and -X together. This
reduces the number of strings to translate and makes the UI slightly
more consistent.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

revision, rev-parse: factorize incompatibility messages about - -exclude-hidden

Use the standard parameterized message for reporting incompatible
options to report options that are not accepted in combination with
--exclude-hidden. This reduces the number of strings to translate and
makes the UI a bit more consistent.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

revision: use die_for_incompatible_opt3() for - -graph/--reverse/--walk-reflogs

The revision option --reverse is incompatible with --walk-reflogs and
--graph is incompatible with both --reverse and --walk-reflogs. So they
are all incompatible with each other.

Use the function for checking three mutually incompatible options,
die_for_incompatible_opt3(), to perform this check in one place and
without repetition. This is shorter and clearer.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

repack: use die_for_incompatible_opt3() for -A/-k/--cruft

The repack option --keep-unreachable is incompatible with -A, --cruft is
incompatible with -A and -k, and -k is short for --keep-unreachable. So
they are all incompatible with each other.

Use the function for checking three mutually incompatible options,
die_for_incompatible_opt3(), to perform this check in one place and
without repetition. This is shorter and clearer.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

push: use die_for_incompatible_opt4() for - -delete/--tags/--all/--mirror

The push option --delete is incompatible with --all, --mirror, and
--tags; --tags is incompatible with --all and --mirror; --all is
incompatible with --mirror.  This means they are all incompatible with
each other.  And --branches is an alias for --all.

Use the function for checking four mutually incompatible options,
die_for_incompatible_opt4(), to perform this check in one place and
without repetition.  This is shorter and clearer.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

completion: avoid user confusion in non-cone mode

It is tempting to think of "files and directories" of the current
directory as valid inputs to the add and set subcommands of git
sparse-checkout.  However, in non-cone mode, they often aren't and using
them as potential completions leads to *many* forms of confusion:

Issue #1. It provides the *wrong* files and directories.

For
    git sparse-checkout add
we always want to add files and directories not currently in our sparse
checkout, which means we want file and directories not currently present
in the current working tree.  Providing the files and directories
currently present is thus always wrong.

For
    git sparse-checkout set
we have a similar problem except in the subset of cases where we are
trying to narrow our checkout to a strict subset of what we already
have.  That is not a very common scenario, especially since it often
does not even happen to be true for the first use of the command; for
years we required users to create a sparse-checkout via
    git sparse-checkout init
    git sparse-checkout set <args...>
(or use a clone option that did the init step for you at clone time).
The init command creates a minimal sparse-checkout with just the
top-level directory present, meaning the set command has to be used to
expand the checkout.  Thus, only in a special and perhaps unusual cases
would any of the suggestions from normal file and directory completion
be appropriate.

Issue #2: Suggesting patterns that lead to warnings is unfriendly.

If the user specifies any regular file and omits the leading '/', then
the sparse-checkout command will warn the user that their command is
problematic and suggest they use a leading slash instead.

Issue #3: Completion gets confused by leading '/', and provides wrong paths.

Users often want to anchor their patterns to the toplevel of the
repository, especially when listing individual files.  There are a
number of reasons for this, but notably even sparse-checkout encourages
them to do so (as noted above).  However, if users do so (via adding a
leading '/' to their pattern), then bash completion will interpret the
leading slash not as a request for a path at the toplevel of the
repository, but as a request for a path at the root of the filesytem.
That means at best that completion cannot help with such paths, and if
it does find any completions, they are almost guaranteed to be wrong.

Issue #4: Suggesting invalid patterns from subdirectories is unfriendly.

There is no per-directory equivalent to .gitignore with
sparse-checkouts.  There is only a single worktree-global
$GIT_DIR/info/sparse-checkout file.  As such, paths to files must be
specified relative to the toplevel of a repository.  Providing
suggestions of paths that are relative to the current working directory,
as bash completion defaults to, is wrong when the current working
directory is not the worktree toplevel directory.

Issue #5: Paths with special characters will be interpreted incorrectly

The entries in the sparse-checkout file are patterns, not paths.  While
most paths also qualify as patterns (though even in such cases it would
be better for users to not use them directly but prefix them with a
leading '/'), there are a variety of special characters that would need
special escaping beyond the normal shell escaping: '*', '?', '\', '[',
']', and any leading '#' or '!'.  If completion suggests any such paths,
users will likely expect them to be treated as an exact path rather than
as a pattern that might match some number of files other than 1.

However, despite the first four issues, we can note that _if_ users are
using tab completion, then they are probably trying to specify a path in
the index.  As such, we transform their argument into a top-level-rooted
pattern that matches such a file.  For example, if they type:
   git sparse-checkout add Make<TAB>
we could "complete" to
   git sparse-checkout add /Makefile
or, if they ran from the Documentation/technical/ subdirectory:
   git sparse-checkout add m<TAB>
we could "complete" it to:
   git sparse-checkout add /Documentation/technical/multi-pack-index.txt
Note in both cases I use "complete" in quotes, because we actually add
characters both before and after the argument in question, so we are
kind of abusing "bash completions" to be "bash completions AND
beginnings".

The fifth issue is a bit stickier, especially when you consider that we
not only need to deal with escaping issues because of special meanings
of patterns in sparse-checkout & gitignore files, but also that we need
to consider escaping issues due to ls-files needing to sometimes quote
or escape characters, and because the shell needs to escape some
characters.  The multiple interacting forms of escaping could get ugly;
this patch makes no attempt to do so and simply documents that we
decided to not deal with those corner cases for now but at least get the
common cases right.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

completion: avoid misleading completions in cone mode

The "set" and "add" subcommands of "sparse-checkout", when in cone mode,
should only complete on directories.  For bash_completion in general,
when no completions are returned for any subcommands, it will often fall
back to standard completion of files and directories as a substitute.
That is not helpful here.  Since we have already looked for all valid
completions, if none are found then falling back to standard bash file
and directory completion is at best actively misleading.  In fact, there
are three different ways it can be actively misleading.  Add a long
comment in the code about how that fallback behavior can deceive, and
disable the fallback by returning a fake result as the sole completion.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

completion: fix logic for determining whether cone mode is active

_git_sparse_checkout() was checking whether we were in cone mode by
checking whether either:

    A) core.sparseCheckoutCone was "true"
    B) "--cone" was specified on the command line

This code has 2 bugs I didn't catch in my review at the time

    1) core.sparseCheckout must be "true" for core.sparseCheckoutCone to
       be relevant (which matters since "git sparse-checkout disable"
       only unsets core.sparseCheckout, not core.sparseCheckoutCone)
    2) The presence of "--no-cone" should override any config setting

Further, I forgot to update this logic as part of 2d95707a02
("sparse-checkout: make --cone the default", 2022-04-22) for the new
default.

Update the code for the new default and make it be more careful in
determining whether to complete based on cone mode or non-cone mode.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

completion: squelch stray errors in sparse-checkout completion

If, in the root of a project, one types

git sparse-checkout set --cone ../<TAB>

then an error message of the form

fatal: ../: '../' is outside repository at '/home/newren/floss/git'

is written to stderr, which munges the users view of their own command.
Squelch such messages by using the __git() wrapper, designed for this
purpose; see commit e15098a314 (completion: consolidate silencing errors
from git commands, 2017-02-03) for more on the wrapper.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

hooks--pre-commit: detect non-ASCII when renaming

When diff.renames is turned on, the diff-filter will not return renamed
files (or copied ones with diff.renames=copy) and potential non-ASCII
characters would not be caught by this hook.

Use the plumbing command diff-index instead of the porcelain one to not
be affected by diff.rename.

Signed-off-by: Julian Prein <druckdev@protonmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t6301: write invalid object ID via `test-tool ref-store`

One of the tests in t6301 verifies that the reference backend correctly
warns about the case where a reference points to a non-existent object.
This is done by writing the object ID into the loose reference directly,
which is quite intimate with how the files backend works.

Refactor the code to instead use `test-tool ref-store` to write the
reference, which is backend-agnostic.

There are two more tests in this file which write loose files directly,
as well. But both of them are indeed quite specific to the loose files
backend and cannot be easily ported to other backends. We thus mark them
as requiring the REFFILES prerequisite.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t5551: stop writing packed-refs directly

We have multiple tests in t5551 that write thousands of tags. To do so
efficiently we generate the tags by writing the `packed-refs` file
directly, which of course assumes that the reference database is backed
by the files backend.

Refactor the code to instead use a single `git update-ref --stdin`
command to write the tags. While the on-disk end result is not the same
as we now have a bunch of loose refs instead of a single packed-refs
file, the distinction shouldn't really matter for any of the tests that
use this helper.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t5401: speed up creation of many branches

One of the tests in t5401 creates a bunch of branches by calling
git-branch(1) for every one of them. This is quite inefficient and takes
a comparatively long time even on Unix systems where spawning processes
is comparatively fast. Refactor it to instead use git-update-ref(1),
which leads to an almost 10-fold speedup:

```
Benchmark 1: ./t5401-update-hooks.sh (rev = HEAD)
  Time (mean ± σ):     983.2 ms ±  97.6 ms    [User: 328.8 ms, System: 679.2 ms]
  Range (min … max):   882.9 ms … 1078.0 ms    3 runs

Benchmark 2: ./t5401-update-hooks.sh (rev = HEAD~)
  Time (mean ± σ):      9.312 s ±  0.398 s    [User: 2.766 s, System: 6.617 s]
  Range (min … max):    8.885 s …  9.674 s    3 runs

Summary
  ./t5401-update-hooks.sh (rev = HEAD) ran
    9.47 ± 1.02 times faster than ./t5401-update-hooks.sh (rev = HEAD~)
```

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t4013: simplify magic parsing and drop "failure"

In t14013, we have various different tests that verify whether certain
diffs are generated as expected. As much of the logic is the same across
many of the tests we some common code in there that generates the actual
test cases for us.

As some diffs are more special than others depending on the command line
parameters passed to git-diff(1), these tests need to adapt behaviour to
the specific test case sometimes. This is done via colon-prefixed magic
commands, of which we currently know "failure" and "noellipses". The
logic to parse this magic is a bit convoluted though and hard to grasp,
also due to the rather unnecessary nesting.

Un-nest the cases so that it becomes a bit more straightfoward. The
logic is further simplified by removing support for the "failure" magic,
which is not actually used anymore.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t3310: stop checking for reference existence via `test -f`

One of the tests in t3310 exercises whether the special references
`NOTES_MERGE_PARTIAL` and `NOTES_MERGE_REF` exist as expected when the
notes subsystem runs into a merge conflict. This is done by checking
on-disk data structures directly though instead of asking the reference
backend.

Refactor the test to use git-rev-parse(1) instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>