git-prompt: stop manually parsing HEAD with unknown ref formats
We're manually parsing the HEAD reference in git-prompt to figure out
whether it is a symbolic or direct reference. This makes it intimately
tied to the on-disk format we use to store references and will stop
working once we gain additional reference backends in the Git project.
Ideally, we would refactor the code to exclusively use plumbing tools to
read refs such that we do not have to care about the on-disk format at
all. Unfortunately though, spawning processes can be quite expensive on
some systems like Windows. As the Git prompt logic may be executed quite
frequently we try very hard to spawn as few processes as possible. This
refactoring is thus out of question for now.
Instead, condition the logic on the repository's ref format: if the repo
uses the the "files" backend we can continue to use the old logic and
read the respective files from disk directly. If it's anything else,
then we use git-symbolic-ref(1) to read the value of HEAD.
This change makes the Git prompt compatible with the upcoming "reftable"
format.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Mon, 8 Jan 2024 19:21:18 +0000 (11:21 -0800)]
Merge branch 'ps/refstorage-extension' into ps/prompt-parse-HEAD-futureproof
* ps/refstorage-extension:
t9500: write "extensions.refstorage" into config
builtin/clone: introduce `--ref-format=` value flag
builtin/init: introduce `--ref-format=` value flag
builtin/rev-parse: introduce `--show-ref-format` flag
t: introduce GIT_TEST_DEFAULT_REF_FORMAT envvar
setup: introduce GIT_DEFAULT_REF_FORMAT envvar
setup: introduce "extensions.refStorage" extension
setup: set repository's formats on init
setup: start tracking ref storage format
refs: refactor logic to look up storage backends
worktree: skip reading HEAD when repairing worktrees
t: introduce DEFAULT_REPO_FORMAT prereq
Junio C Hamano [Tue, 2 Jan 2024 21:51:30 +0000 (13:51 -0800)]
Merge branch 'ps/pseudo-refs'
Assorted changes around pseudoref handling.
* ps/pseudo-refs:
bisect: consistently write BISECT_EXPECTED_REV via the refdb
refs: complete list of special refs
refs: propagate errno when reading special refs fails
wt-status: read HEAD and ORIG_HEAD via the refdb
Junio C Hamano [Tue, 2 Jan 2024 21:51:29 +0000 (13:51 -0800)]
Merge branch 'la/trailer-cleanups'
Code clean-up.
* la/trailer-cleanups:
trailer: use offsets for trailer_start/trailer_end
trailer: find the end of the log message
commit: ignore_non_trailer computes number of bytes to ignore
In t9500 we're writing a custom configuration that sets up gitweb. This
requires us to manually ensure that the repository format is configured
as required, including both the repository format version and
extensions. With the introduction of the "extensions.refStorage"
extension we need to update the test to also write this new one.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/clone: introduce `--ref-format=` value flag
Introduce a new `--ref-format` value flag for git-clone(1) that allows
the user to specify the ref format that is to be used for a newly
initialized repository.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/init: introduce `--ref-format=` value flag
Introduce a new `--ref-format` value flag for git-init(1) that allows
the user to specify the ref format that is to be used for a newly
initialized repository.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new GIT_TEST_DEFAULT_REF_FORMAT environment variable that
lets developers run the test suite with a different default ref format
without impacting the ref format used by non-test Git invocations. This
is modeled after GIT_TEST_DEFAULT_OBJECT_FORMAT, which does the same
thing for the repository's object format.
Adapt the setup of the `REFFILES` test prerequisite to be conditionally
set based on the default ref format.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new GIT_DEFAULT_REF_FORMAT environment variable that lets
users control the default ref format used by both git-init(1) and
git-clone(1). This is modeled after GIT_DEFAULT_OBJECT_FORMAT, which
does the same thing for the repository's object format.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new "extensions.refStorage" extension that allows us to
specify the ref storage format used by a repository. For now, the only
supported format is the "files" format, but this list will likely soon
be extended to also support the upcoming "reftable" format.
There have been discussions on the Git mailing list in the past around
how exactly this extension should look like. One alternative [1] that
was discussed was whether it would make sense to model the extension in
such a way that backends are arbitrarily stackable. This would allow for
a combined value of e.g. "loose,packed-refs" or "loose,reftable", which
indicates that new refs would be written via "loose" files backend and
compressed into "packed-refs" or "reftable" backends, respectively.
It is arguable though whether this flexibility and the complexity that
it brings with it is really required for now. It is not foreseeable that
there will be a proliferation of backends in the near-term future, and
the current set of existing formats and formats which are on the horizon
can easily be configured with the much simpler proposal where we have a
single value, only.
Furthermore, if we ever see that we indeed want to gain the ability to
arbitrarily stack the ref formats, then we can adapt the current
extension rather easily. Given that Git clients will refuse any unknown
value for the "extensions.refStorage" extension they would also know to
ignore a stacked "loose,packed-refs" in the future.
So let's stick with the easy proposal for the time being and wire up the
extension.
The proper hash algorithm and ref storage format that will be used for a
newly initialized repository will be figured out in `init_db()` via
`validate_hash_algorithm()` and `validate_ref_storage_format()`. Until
now though, we never set up the hash algorithm or ref storage format of
`the_repository` accordingly.
There are only two callsites of `init_db()`, one in git-init(1) and one
in git-clone(1). The former function doesn't care for the formats to be
set up properly because it never access the repository after calling the
function in the first place.
For git-clone(1) it's a different story though, as we call `init_db()`
before listing remote refs. While we do indeed have the wrong hash
function in `the_repository` when `init_db()` sets up a non-default
object format for the repository, it never mattered because we adjust
the hash after learning about the remote's hash function via the listed
refs.
So the current state is correct for the hash algo, but it's not for the
ref storage format because git-clone(1) wouldn't know to set it up
properly. But instead of adjusting only the `ref_storage_format`, set
both the hash algo and the ref storage format so that `the_repository`
is in the correct state when `init_db()` exits. This is fine as we will
adjust the hash later on anyway and makes it easier to reason about the
end state of `the_repository`.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to discern which ref storage format a repository is supposed to
use we need to start setting up and/or discovering the format. This
needs to happen in two separate code paths.
- The first path is when we create a repository via `init_db()`. When
we are re-initializing a preexisting repository we need to retain
the previously used ref storage format -- if the user asked for a
different format then this indicates an error and we error out.
Otherwise we either initialize the repository with the format asked
for by the user or the default format, which currently is the
"files" backend.
- The second path is when discovering repositories, where we need to
read the config of that repository. There is not yet any way to
configure something other than the "files" backend, so we can just
blindly set the ref storage format to this backend.
Wire up this logic so that we have the ref storage format always readily
available when needed. As there is only a single backend and because it
is not configurable we cannot yet verify that this tracking works as
expected via tests, but tests will be added in subsequent commits. To
countermand this ommission now though, raise a BUG() in case the ref
storage format is not set up properly in `ref_store_init()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to look up ref storage backends, we're currently using a linked
list of backends, where each backend is expected to set up its `next`
pointer to the next ref storage backend. This is kind of a weird setup
as backends need to be aware of other backends without much of a reason.
Refactor the code so that the array of backends is centrally defined in
"refs.c", where each backend is now identified by an integer constant.
Expose functions to translate from those integer constants to the name
and vice versa, which will be required by subsequent patches.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
worktree: skip reading HEAD when repairing worktrees
When calling `git init --separate-git-dir=<new-path>` on a preexisting
repository, we move the Git directory of that repository to the new path
specified by the user. If there are worktrees present in the repository,
we need to repair the worktrees so that their gitlinks point to the new
location of the repository.
This repair logic will load repositories via `get_worktrees()`, which
will enumerate up and initialize all worktrees. Part of initialization
is logic that we resolve their respective worktree HEADs, even though
that information may not actually be needed in the end by all callers.
Although not a problem presently with the file-based reference backend,
it will become a problem with the upcoming reftable backend. In the
context of git-init(1) we do not have a fully-initialized repository set
up via `setup_git_directory()` or friends. Consequently, we do not know
about the repository format when `repair_worktrees()` is called, and
properly setting up all parts of the repositroy in `init_db()` before we
try to repair worktrees is not an easy task. With the introduction of
the reftable backend, we would ultimately try to look up the worktree
HEADs before we have figured out the reference format, which does not
work.
We do not require the worktree HEADs at all to repair worktrees. So
let's fix this issue by skipping over the step that reads them.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
A limited number of tests require repositories to have the default
repository format or otherwise they would fail to run, e.g. because they
fail to detect the correct hash function. While the hash function is the
only extension right now that creates problems like this, we are about
to add a second extension for the ref format.
Introduce a new DEFAULT_REPO_FORMAT prereq that can easily be amended
whenever we add new format extensions. Next to making any such changes
easier on us, the prerequisite's name should also help to clarify the
intent better.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 27 Dec 2023 22:52:28 +0000 (14:52 -0800)]
Merge branch 'ps/clone-into-reftable-repository'
"git clone" has been prepared to allow cloning a repository with
non-default hash function into a repository that uses the reftable
backend.
* ps/clone-into-reftable-repository:
builtin/clone: create the refdb with the correct object format
builtin/clone: skip reading HEAD when retrieving remote
builtin/clone: set up sparse checkout later
builtin/clone: fix bundle URIs with mismatching object formats
remote-curl: rediscover repository when fetching refs
setup: allow skipping creation of the refdb
setup: extract function to create the refdb
* jc/doc-most-refs-are-not-that-special:
docs: MERGE_AUTOSTASH is not that special
docs: AUTO_MERGE is not that special
refs.h: HEAD is not that special
git-bisect.txt: BISECT_HEAD is not that special
git.txt: HEAD is not that special
Junio C Hamano [Wed, 27 Dec 2023 22:52:24 +0000 (14:52 -0800)]
Merge branch 'jc/checkout-B-branch-in-use'
"git checkout -B <branch> [<start-point>]" allowed a branch that is
in use in another worktree to be updated and checked out, which
might be a bit unexpected. The rule has been tightened, which is a
breaking change. "--ignore-other-worktrees" option is required to
unbreak you, if you are used to the current behaviour that "-B"
overrides the safety.
* jc/checkout-B-branch-in-use:
checkout: forbid "-B <branch>" from touching a branch used elsewhere
checkout: refactor die_if_checked_out() caller
Linus Arver [Fri, 20 Oct 2023 19:01:35 +0000 (19:01 +0000)]
trailer: use offsets for trailer_start/trailer_end
Previously these fields in the trailer_info struct were of type "const
char *" and pointed to positions in the input string directly (to the
start and end positions of the trailer block).
Use offsets to make the intended usage less ambiguous. We only need to
reference the input string in format_trailer_info(), so update that
function to take a pointer to the input.
While we're at it, rename trailer_start to trailer_block_start to be
more explicit about these offsets (that they are for the entire trailer
block including other trailers). Ditto for trailer_end.
Reported-by: Glen Choo <glencbz@gmail.com> Signed-off-by: Linus Arver <linusa@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Linus Arver [Fri, 20 Oct 2023 19:01:34 +0000 (19:01 +0000)]
trailer: find the end of the log message
Previously, trailer_info_get() computed the trailer block end position
by
(1) checking for the opts->no_divider flag and optionally calling
find_patch_start() to find the "patch start" location (patch_start), and
(2) calling find_trailer_end() to find the end of the trailer block
using patch_start as a guide, saving the return value into
"trailer_end".
The logic in (1) was awkward because the variable "patch_start" is
misleading if there is no patch in the input. The logic in (2) was
misleading because it could be the case that no trailers are in the
input (yet we are setting a "trailer_end" variable before even searching
for trailers, which happens later in find_trailer_start()). The name
"find_trailer_end" was misleading because that function did not look for
any trailer block itself --- instead it just computed the end position
of the log message in the input where the end of the trailer block (if
it exists) would be (because trailer blocks must always come after the
end of the log message).
Combine the logic in (1) and (2) together into find_patch_start() by
renaming it to find_end_of_log_message(). The end of the log message is
the starting point which find_trailer_start() needs to start searching
backward to parse individual trailers (if any).
Helped-by: Jonathan Tan <jonathantanmy@google.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Linus Arver <linusa@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 20 Dec 2023 18:19:58 +0000 (10:19 -0800)]
Merge branch 'ps/clone-into-reftable-repository' into ps/refstorage-extension
* ps/clone-into-reftable-repository:
builtin/clone: create the refdb with the correct object format
builtin/clone: skip reading HEAD when retrieving remote
builtin/clone: set up sparse checkout later
builtin/clone: fix bundle URIs with mismatching object formats
remote-curl: rediscover repository when fetching refs
setup: allow skipping creation of the refdb
setup: extract function to create the refdb
Junio C Hamano [Wed, 20 Dec 2023 18:14:54 +0000 (10:14 -0800)]
Merge branch 'jk/config-cleanup'
Code clean-up around use of configuration variables.
* jk/config-cleanup:
sequencer: simplify away extra git_config_string() call
gpg-interface: drop pointless config_error_nonbool() checks
push: drop confusing configset/callback redundancy
config: use git_config_string() for core.checkRoundTripEncoding
diff: give more detailed messages for bogus diff.* config
config: use config_error_nonbool() instead of custom messages
imap-send: don't use git_die_config() inside callback
git_xmerge_config(): prefer error() to die()
config: reject bogus values for core.checkstat
Junio C Hamano [Wed, 20 Dec 2023 18:14:54 +0000 (10:14 -0800)]
Merge branch 'jk/implicit-true'
Some codepaths did not correctly parse configuration variables
specified with valueless "true", which has been corrected.
* jk/implicit-true:
fsck: handle NULL value when parsing message config
trailer: handle NULL value when parsing trailer-specific config
submodule: handle NULL value when parsing submodule.*.branch
help: handle NULL value for alias.* config
trace2: handle NULL values in tr2_sysenv config callback
setup: handle NULL value when parsing extensions
config: handle NULL value when parsing non-bools
Junio C Hamano [Wed, 20 Dec 2023 18:14:54 +0000 (10:14 -0800)]
Merge branch 'jk/end-of-options'
"git $cmd --end-of-options --rev -- --path" for some $cmd failed
to interpret "--rev" as a rev, and "--path" as a path. This was
fixed for many programs like "reset" and "checkout".
* jk/end-of-options:
parse-options: decouple "--end-of-options" and "--"
Junio C Hamano [Wed, 20 Dec 2023 18:14:53 +0000 (10:14 -0800)]
Merge branch 'rs/incompatible-options-messages'
Clean-up code that handles combinations of incompatible options.
* rs/incompatible-options-messages:
worktree: simplify incompatibility message for --orphan and commit-ish
worktree: standardize incompatibility messages
clean: factorize incompatibility message
revision, rev-parse: factorize incompatibility messages about - -exclude-hidden
revision: use die_for_incompatible_opt3() for - -graph/--reverse/--walk-reflogs
repack: use die_for_incompatible_opt3() for -A/-k/--cruft
push: use die_for_incompatible_opt4() for - -delete/--tags/--all/--mirror
Junio C Hamano [Wed, 20 Dec 2023 18:14:53 +0000 (10:14 -0800)]
Merge branch 'jc/revision-parse-int'
The command line parser for the "log" family of commands was too
loose when parsing certain numbers, e.g., silently ignoring the
extra 'q' in "git log -n 1q" without complaining, which has been
tightened up.
* jc/revision-parse-int:
revision: parse integer arguments to --max-count, --skip, etc., more carefully
Junio C Hamano [Wed, 20 Dec 2023 18:14:53 +0000 (10:14 -0800)]
Merge branch 'ps/ref-tests-update-more'
Tests update.
* ps/ref-tests-update-more:
t6301: write invalid object ID via `test-tool ref-store`
t5551: stop writing packed-refs directly
t5401: speed up creation of many branches
t4013: simplify magic parsing and drop "failure"
t3310: stop checking for reference existence via `test -f`
t1417: make `reflog --updateref` tests backend agnostic
t1410: use test-tool to create empty reflog
t1401: stop treating FETCH_HEAD as real reference
t1400: split up generic reflog tests from the reffile-specific ones
t0410: mark tests to require the reffiles backend
The sample pre-commit hook that tries to catch introduction of new
paths that use potentially non-portable characters did not notice
an existing path getting renamed to such a problematic path, when
rename detection was enabled.
* jp/use-diff-index-in-pre-commit-sample:
hooks--pre-commit: detect non-ASCII when renaming
Junio C Hamano [Wed, 20 Dec 2023 18:14:52 +0000 (10:14 -0800)]
Merge branch 'en/complete-sparse-checkout'
Command line completion (in contrib/) learned to complete path
arguments to the "add/set" subcommands of "git sparse-checkout"
better.
* en/complete-sparse-checkout:
completion: avoid user confusion in non-cone mode
completion: avoid misleading completions in cone mode
completion: fix logic for determining whether cone mode is active
completion: squelch stray errors in sparse-checkout completion
René Scharfe [Tue, 19 Dec 2023 07:42:18 +0000 (08:42 +0100)]
rebase: use strvec_pushf() for format-patch revisions
In run_am(), a strbuf is used to create a revision argument that is then
added to the argument list for git format-patch using strvec_push().
Use strvec_pushf() to add it directly instead, simplifying the code
and plugging a small leak on the error code path.
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stan Hu [Tue, 19 Dec 2023 22:14:18 +0000 (14:14 -0800)]
completion: support pseudoref existence checks for reftables
In contrib/completion/git-completion.bash, there are a bunch of
instances where we read pseudorefs, such as HEAD, MERGE_HEAD,
REVERT_HEAD, and others via the filesystem. However, the upcoming
reftable refs backend won't use '.git/HEAD' at all but instead will
write an invalid refname as placeholder for backwards compatibility,
which will break the git-completion script.
Update the '__git_pseudoref_exists' function to:
1. Recognize the placeholder '.git/HEAD' written by the reftable
backend (its content is specified in the reftable specs).
2. If reftable is in use, use 'git rev-parse' to determine whether the
given ref exists.
3. Otherwise, continue to use 'test -f' to check for the ref's filename.
Signed-off-by: Stan Hu <stanhu@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stan Hu [Tue, 19 Dec 2023 22:14:17 +0000 (14:14 -0800)]
completion: refactor existence checks for pseudorefs
In preparation for the reftable backend, this commit introduces a
'__git_pseudoref_exists' function that continues to use 'test -f' to
determine whether a given pseudoref exists in the local filesystem.
Signed-off-by: Stan Hu <stanhu@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 19 Dec 2023 19:26:25 +0000 (11:26 -0800)]
remote.h: retire CAS_OPT_NAME
When the "--force-with-lease" option was introduced in 28f5d176
(remote.c: add command line option parser for "--force-with-lease",
2013-07-08), the design discussion revolved around the concept of
"compare-and-swap", and it can still be seen in the name used for
variables and helper functions. The end-user facing option name
ended up to be a bit different, so during the development iteration
of the feature, we used this C preprocessor macro to make it easier
to rename it later.
All of that happened more than 10 years ago, and the flexibility
afforded by the CAS_OPT_NAME macro outlived its usefulness. Inline
the constant string for the option name, like all other option names
in the code.
Junio C Hamano [Mon, 18 Dec 2023 22:10:13 +0000 (14:10 -0800)]
Merge branch 'jh/trace2-redact-auth'
trace2 streams used to record the URLs that potentially embed
authentication material, which has been corrected.
* jh/trace2-redact-auth:
t0212: test URL redacting in EVENT format
t0211: test URL redacting in PERF format
trace2: redact passwords from https:// URLs by default
trace2: fix signature of trace2_def_param() macro
Junio C Hamano [Mon, 18 Dec 2023 22:10:12 +0000 (14:10 -0800)]
Merge branch 'js/update-urls-in-doc-and-comment'
Stale URLs have been updated to their current counterparts (or
archive.org) and HTTP links are replaced with working HTTPS links.
* js/update-urls-in-doc-and-comment:
doc: refer to internet archive
doc: update links for andre-simon.de
doc: switch links to https
doc: update links to current pages
Junio C Hamano [Mon, 18 Dec 2023 22:10:11 +0000 (14:10 -0800)]
Merge branch 'ps/commit-graph-less-paranoid'
Earlier we stopped relying on commit-graph that (still) records
information about commits that are lost from the object store,
which has negative performance implications. The default has been
flipped to disable this pessimization.
* ps/commit-graph-less-paranoid:
commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default
Junio C Hamano [Mon, 18 Dec 2023 22:10:11 +0000 (14:10 -0800)]
Merge branch 'cc/git-replay'
Introduce "git replay", a tool meant on the server side without
working tree to recreate a history.
* cc/git-replay:
replay: stop assuming replayed branches do not diverge
replay: add --contained to rebase contained branches
replay: add --advance or 'cherry-pick' mode
replay: use standard revision ranges
replay: make it a minimal server side command
replay: remove HEAD related sanity check
replay: remove progress and info output
replay: add an important FIXME comment about gpg signing
replay: change rev walking options
replay: introduce pick_regular_commit()
replay: die() instead of failing assert()
replay: start using parse_options API
replay: introduce new builtin
t6429: remove switching aspects of fast-rebase
Junio C Hamano [Mon, 18 Dec 2023 22:10:11 +0000 (14:10 -0800)]
Merge branch 'ps/ref-deletion-updates'
Simplify API implementation to delete references by eliminating
duplication.
* ps/ref-deletion-updates:
refs: remove `delete_refs` callback from backends
refs: deduplicate code to delete references
refs/files: use transactions to delete references
t5510: ensure that the packed-refs file needs locking
In the recent commit 2e87fca189 (test framework: further deprecate
test_i18ngrep, 2023-10-31), the test_i18ngrep function was
deprecated, and all the callers were updated to call the test_grep
function instead. But test_grep inherited an error message that
still refers to test_i18ngrep by mistake. Correct it so that a
broken call to the test_grep will identify itself as such.
Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Sat, 16 Dec 2023 10:47:21 +0000 (11:47 +0100)]
git-compat-util: convert skip_{prefix,suffix}{,_mem} to bool
Use the data type bool and its values true and false to document the
binary return value of skip_prefix() and friends more explicitly.
This first use of stdbool.h, introduced with C99, is meant to check
whether there are platforms that claim support for C99, as tested by 7bc341e21b (git-compat-util: add a test balloon for C99 support,
2021-12-01), but still lack that header for some reason.
A fallback based on a wider type, e.g. int, would have to deal with
comparisons somehow to emulate that any non-zero value is true:
bool b1 = 1;
bool b2 = 2;
if (b1 == b2) puts("This is true.");
int i1 = 1;
int i2 = 2;
if (i1 == i2) puts("Not printed.");
#define BOOLEQ(a, b) (!(a) == !(b))
if (BOOLEQ(i1, i2)) puts("This is true.");
So we'd be better off using bool everywhere without a fallback, if
possible. That's why this patch doesn't include any.
Signed-off-by: René Scharfe <l.s.r@web.de> Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jiang Xin [Sun, 17 Dec 2023 14:11:34 +0000 (22:11 +0800)]
fetch: no redundant error message for atomic fetch
If an error occurs during an atomic fetch, a redundant error message
will appear at the end of do_fetch(). It was introduced in b3a804663c
(fetch: make `--atomic` flag cover backfilling of tags, 2022-02-17).
Because a failure message is displayed before setting retcode in the
function do_fetch(), calling error() on the err message at the end of
this function may result in redundant or empty error message to be
displayed.
We can remove the redundant error() function, because we know that
the function ref_transaction_abort() never fails. While we can find a
common pattern for calling ref_transaction_abort() by running command
"git grep -A1 ref_transaction_abort", e.g.:
if (ref_transaction_abort(transaction, &error))
error("abort: %s", error.buf);
Following this pattern, we can tolerate the return value of the function
ref_transaction_abort() being changed in the future. We also delay the
output of the err message to the end of do_fetch() to reduce redundant
code. With these changes, the test case "fetch porcelain output
(atomic)" in t5574 will also be fixed.
Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jiang Xin [Sun, 17 Dec 2023 14:11:33 +0000 (22:11 +0800)]
t5574: test porcelain output of atomic fetch
The test case "fetch porcelain output" checks output of the fetch
command. The error output must be empty with the follow assertion:
test_must_be_empty stderr
But this assertion fails if using atomic fetch. Refactor this test case
to use different fetch options by splitting it into three test cases.
1. "setup for fetch porcelain output".
2. "fetch porcelain output": for non-atomic fetch.
3. "fetch porcelain output (atomic)": for atomic fetch.
Add new command "test_commit ..." in the first test case, so that if we
run these test cases individually (--run=4-6), "git rev-parse HEAD~"
command will work properly. Run the above test cases, we can find that
one test case has a known breakage, as shown below:
ok 4 - setup for fetch porcelain output
ok 5 - fetch porcelain output # TODO known breakage vanished
not ok 6 - fetch porcelain output (atomic) # TODO known breakage
The failed test case has an error message with only the error prompt but
no message body, as follows:
'stderr' is not empty, it contains:
error:
In a later commit, we will fix this issue.
Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Fri, 15 Dec 2023 20:32:42 +0000 (12:32 -0800)]
git-bisect.txt: BISECT_HEAD is not that special
The description of "git bisect --no-checkout" called BISECT_HEAD a
"special ref", but there is nothing special about it. It merely is
yet another pseudoref.
Junio C Hamano [Fri, 15 Dec 2023 20:32:41 +0000 (12:32 -0800)]
git.txt: HEAD is not that special
The introductory text in "git help git" that describes HEAD called
it "a special ref". It is special compared to the more regular refs
like refs/heads/master and refs/tags/v1.0.0, but not that special,
unlike truly special ones like FETCH_HEAD.
Rewrite a few sentences to also introduce the distinction between a
regular ref that contain the object name and a symbolic ref that
contain the name of another ref. Update the description of HEAD
that point at the current branch to use the more correct term, a
"symbolic ref".
This was found as part of auditing the documentation and in-code
comments for uses of "special ref" that refer merely a "pseudo ref".
Eric Sunshine [Fri, 15 Dec 2023 20:43:33 +0000 (15:43 -0500)]
git-add.txt: add missing short option -A to synopsis
With one exception, the synopsis for `git add` consistently lists the
short counterpart alongside the long-form of each option (for instance,
"[--edit | -e]"). The exception is that -A is not mentioned alongside
--all. Fix this inconsistency
Reported-by: Benjamin Lehmann <ben.lehmann@gmail.com> Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
tests: adjust whitespace in chainlint expectations
The "check-chainlint" target runs automatically when running tests and
performs self-checks to verify that the chainlinter itself produces the
expected output. Originally, the chainlinter was implemented via sed,
but the infrastructure has been rewritten in fb41727b7e (t: retire
unused chainlint.sed, 2022-09-01) to use a Perl script instead.
The rewrite caused some slight whitespace changes in the output that are
ultimately not of much importance. In order to be able to assert that
the actual chainlinter errors match our expectations we thus have to
ignore whitespace characters when diffing them. As the `-w` flag is not
in POSIX we try to use `git diff -w --no-index` before we fall back to
`diff -w -u`.
To accomodate for cases where the host system has no Git installation we
use the locally-compiled version of Git. This can result in problems
though when the Git project's repository is using extensions that the
locally-compiled version of Git doesn't understand. It will refuse to
run and thus cause the checks to fail.
Instead of improving the detection logic, fix our ".expect" files so
that we do not need any post-processing at all anymore. This allows us
to drop the `-w` flag when diffing so that we can always use diff(1)
now.
Note that we keep some of the post-processing of `chainlint.pl` output
intact to strip leading line numbers generated by the script. Having
these would cause a rippling effect whenever we add a new test that
sorts into the middle of existing tests and would require us to
renumerate all subsequent lines, which seems rather pointless.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 14 Dec 2023 21:48:59 +0000 (16:48 -0500)]
mailinfo: avoid recursion when unquoting From headers
Our unquote_comment() function is recursive; when it sees a comment
within a comment, like:
(this is an (embedded) comment)
it recurses to handle the inner comment. This is fine for practical use,
but it does mean that you can easily run out of stack space with a
malicious header. For example:
segfaults on my system. And since mailinfo is likely to be fed untrusted
input from the Internet (if not by human users, who might recognize a
garbage header, but certainly there are automated systems that apply
patches from a list) it may be possible for an attacker to trigger the
problem.
That said, I don't think there's an interesting security vulnerability
here. All an attacker can do is make it impossible to parse their email
and apply their patch, and there are lots of ways to generate bogus
emails. So it's more of an annoyance than anything.
But it's pretty easy to fix it. The recursion is not helping us preserve
any particular state from each level. The only flag in our parsing is
take_next_literally, and we can never recurse when it is set (since the
start of a new comment implies it was not backslash-escaped). So it is
really only useful for finding the end of the matched pair of
parentheses. We can do that easily with a simple depth counter.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 14 Dec 2023 21:47:46 +0000 (16:47 -0500)]
t5100: make rfc822 comment test more careful
When processing "From" headers in an email, mailinfo "unquotes" quoted
strings and rfc822 parenthesized comments. For quoted strings, we
actually remove the double-quotes, so:
From: "A U Thor" <someone@example.com>
become:
Author: A U Thor
Email: someone@example.com
But for comments, we leave the outer parentheses in place, so:
From: A U (this is a comment) Thor <someone@example.com>
becomes:
Author: A U (this is a comment) Thor
Email: someone@example.com
So what is the comment "unquoting" actually doing? In our code, being in
a comment section has exactly two effects:
1. We'll unquote backslash-escaped characters inside a comment
section.
2. We _won't_ unquote double-quoted strings inside a comment section.
Our test for comments in t5100 checks this:
From: "A U Thor" <somebody@example.com> (this is \(really\) a comment (honestly))
So it is covering (1), but not (2). Let's add in a quoted string to
cover this.
Moreover, because the comment appears at the end of the From header,
there's nothing to confirm that we correctly found the end of the
comment section (and not just the end-of-string). Let's instead move it
to the beginning of the header, which means we can confirm that the
existing quoted string is detected (which will only happen if we know
we've left the comment block).
As expected, the test continues to pass, but this will give us more
confidence as we refactor the code in the next patch.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
bisect: consistently write BISECT_EXPECTED_REV via the refdb
We're inconsistently writing BISECT_EXPECTED_REV both via the filesystem
and via the refdb, which violates the newly established rules for how
special refs must be treated. This works alright in practice with the
reffiles reference backend, but will cause bugs once we gain additional
backends.
Fix this issue and consistently write BISECT_EXPECTED_REV via the refdb
so that it is no longer a special ref.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have some references that are more special than others. The reason
for them being special is that they either do not follow the usual
format of references, or that they are written to the filesystem
directly by the respective owning subsystem and thus circumvent the
reference backend.
This works perfectly fine right now because the reffiles backend will
know how to read those refs just fine. But with the prospect of gaining
a new reference backend implementation we need to be a lot more careful
here:
- We need to make sure that we are consistent about how those refs are
written. They must either always be written via the filesystem, or
they must always be written via the reference backend. Any mixture
will lead to inconsistent state.
- We need to make sure that such special refs are always handled
specially when reading them.
We're already mostly good with regard to the first item, except for
`BISECT_EXPECTED_REV` which will be addressed in a subsequent commit.
But the current list of special refs is missing some refs that really
should be treated specially. Right now, we only treat `FETCH_HEAD` and
`MERGE_HEAD` specially here.
Introduce a new function `is_special_ref()` that contains all current
instances of special refs to fix the reading path.
Note that this is only a temporary measure where we record and rectify
the current state. Ideally, the list of special refs should in the end
only contain `FETCH_HEAD` and `MERGE_HEAD` again because they both may
reference multiple objects and can contain annotations, so they indeed
are special.
Based-on-patch-by: Han-Wen Nienhuys <hanwenn@gmail.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs: propagate errno when reading special refs fails
Some refs in Git are more special than others due to reasons explained
in the next commit. These refs are read via `refs_read_special_head()`,
but this function doesn't behave the same as when we try to read a
normal ref. Most importantly, we do not propagate `failure_errno` in the
case where the reference does not exist, which is behaviour that we rely
on in many parts of Git.
Fix this bug by propagating errno when `strbuf_read_file()` fails.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We read both the HEAD and ORIG_HEAD references directly from the
filesystem in order to figure out whether we're currently splitting a
commit. If both of the following are true:
- HEAD points to the same object as "rebase-merge/amend".
- ORIG_HEAD points to the same object as "rebase-merge/orig-head".
Then we are currently splitting commits.
The current code only works by chance because we only have a single
reference backend implementation. Refactor it to instead read both refs
via the refdb layer so that we'll also be compatible with alternate
reference backends.
There are some subtleties involved here:
- We pass `RESOLVE_REF_READING` so that a missing ref will cause
`read_ref_full()` to return an error.
- We pass `RESOLVE_REF_NO_RECURSE` so that we do not try to resolve
symrefs. The old code didn't resolve symrefs either, and we only
ever write object IDs into the refs in "rebase-merge/".
- In the same spirit we verify that successfully-read refs are not
symbolic refs.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Thu, 23 Nov 2023 06:00:31 +0000 (15:00 +0900)]
checkout: forbid "-B <branch>" from touching a branch used elsewhere
"git checkout -B <branch> [<start-point>]", being a "forced" version
of "-b", switches to the <branch>, after optionally resetting its
tip to the <start-point>, even if the <branch> is in use in another
worktree, which is somewhat unexpected.
Protect the <branch> using the same logic that forbids "git checkout
<branch>" from touching a branch that is in use elsewhere.
This is a breaking change that may deserve backward compatibliity
warning in the Release Notes. The "--ignore-other-worktrees" option
can be used as an escape hatch if the finger memory of existing
users depend on the current behaviour of "-B".
Reported-by: Willem Verstraeten <willem.verstraeten@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Tue, 12 Dec 2023 17:04:55 +0000 (18:04 +0100)]
t6300: avoid hard-coding object sizes
f4ee22b526 (ref-filter: add tests for objectsize:disk, 2018-12-24)
hard-coded the expected object sizes. Coincidentally the size of commit
and tag is the same with zlib at the default compression level.
1f5f8f3e85 (t6300: abstract away SHA-1-specific constants, 2020-02-22)
encoded the sizes as a single value, which coincidentally also works
with sha256.
Different compression libraries like zlib-ng may arrive at different
values. Get them from the file system instead of hard-coding them to
make switching the compression library (or changing the compression
level) easier.
Reported-by: Ondrej Pohorelsky <opohorel@redhat.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 12 Dec 2023 22:12:43 +0000 (17:12 -0500)]
mailinfo: fix out-of-bounds memory reads in unquote_quoted_pair()
When processing a header like a "From" line, mailinfo uses
unquote_quoted_pair() to handle double-quotes and rfc822 parenthesized
comments. It takes a NUL-terminated string on input, and loops over the
"in" pointer until it sees the NUL. When it finds the start of an
interesting block, it delegates to helper functions which also increment
"in", and return the updated pointer.
But there's a bug here: the helpers find the NUL with a post-increment
in the loop condition, like:
while ((c = *in++) != 0)
So when they do see a NUL (rather than the correct termination of the
quote or comment section), they return "in" as one _past_ the NUL
terminator. And thus the outer loop in unquote_quoted_pair() does not
realize we hit the NUL, and keeps reading past the end of the buffer.
We should instead make sure to return "in" positioned at the NUL, so
that the caller knows to stop their loop, too. A hacky way to do this is
to return "in - 1" after leaving the inner loop. But a slightly cleaner
solution is to avoid incrementing "in" until we are sure it contained a
non-NUL byte (i.e., doing it inside the loop body).
The two tests here show off the problem. Since we check the output,
they'll _usually_ report a failure in a normal build, but it depends on
what garbage bytes are found after the heap buffer. Building with
SANITIZE=address reliably notices the problem. The outcome (both the
exit code and the exact bytes) are just what Git happens to produce for
these cases today, and shouldn't be taken as an endorsement. It might be
reasonable to abort on an unterminated string, for example. The priority
for this patch is fixing the out-of-bounds memory access.
Reported-by: Carlos Andrés Ramírez Cataño <antaigroupltda@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/clone: create the refdb with the correct object format
We're currently creating the reference database with a potentially
incorrect object format when the remote repository's object format is
different from the local default object format. This works just fine for
now because the files backend never records the object format anywhere.
But this logic will fail with any new reference backend that encodes
this information in some form either on-disk or in-memory.
The preceding commits have reshuffled code in git-clone(1) so that there
is no code path that will access the reference database before we have
detected the remote's object format. With these refactorings we can now
defer initialization of the reference database until after we have
learned the remote's object format and thus initialize it with the
correct format from the get-go.
These refactorings are required to make git-clone(1) work with the
upcoming reftable backend when cloning repositories with the SHA256
object format.
This change breaks a test in "t5550-http-fetch-dumb.sh" when cloning an
empty repository with `GIT_TEST_DEFAULT_HASH=sha256`. The test expects
the resulting hash format of the empty cloned repository to match the
default hash, but now we always end up with a sha1 repository. The
problem is that for dumb HTTP fetches, we have no easy way to figure out
the remote's hash function except for deriving it based on the hash
length of refs in `info/refs`. But as the remote repository is empty we
cannot rely on this detection mechanism.
Before the change in this commit we already initialized the repository
with the default hash function and then left it as-is. With this patch
we always use the hash function detected via the remote, where we fall
back to "sha1" in case we cannot detect it.
Neither the old nor the new behaviour are correct as we second-guess the
remote hash function in both cases. But given that this is a rather
unlikely edge case (we use the dumb HTTP protocol, the remote repository
uses SHA256 and the remote repository is empty), let's simply adapt the
test to assert the new behaviour. If we want to properly address this
edge case in the future we will have to extend the dumb HTTP protocol so
that we can properly detect the hash function for empty repositories.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/clone: skip reading HEAD when retrieving remote
After we have set up the remote configuration in git-clone(1) we'll call
`remote_get()` to read the remote from the on-disk configuration. But
next to reading the on-disk configuration, `remote_get()` will also
cause us to try and read the repository's HEAD reference so that we can
figure out the current branch. Besides being pointless in git-clone(1)
because we're operating in an empty repository anyway, this will also
break once we move creation of the reference database to a later point
in time.
Refactor the code to introduce a new `remote_get_early()` function that
will skip reading the HEAD reference to address this issue.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When asked to do a sparse checkout, then git-clone(1) will spawn
`git sparse-checkout set` to set up the configuration accordingly. This
requires a proper Git repository or otherwise the command will fail. But
as we are about to move creation of the reference database to a later
point, this prerequisite will not hold anymore.
Move the logic to a later point in time where we know to have created
the reference database already.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/clone: fix bundle URIs with mismatching object formats
We create the reference database in git-clone(1) quite early before
connecting to the remote repository. Given that we do not yet know about
the object format that the remote repository uses at that point in time
the consequence is that the refdb may be initialized with the wrong
object format.
This is not a problem in the context of the files backend as we do not
encode the object format anywhere, and furthermore the only reference
that we write between initializing the refdb and learning about the
object format is the "HEAD" symref. It will become a problem though once
we land the reftable backend, which indeed does require to know about
the proper object format at the time of creation. We thus need to
rearrange the logic in git-clone(1) so that we only initialize the refdb
once we have learned about the actual object format.
As a first step, move listing of remote references to happen earlier,
which also allow us to set up the hash algorithm of the repository
earlier now. While we aim to execute this logic as late as possible
until after most of the setup has happened already, detection of the
object format and thus later the setup of the reference database must
happen before any other logic that may spawn Git commands or otherwise
these Git commands may not recognize the repository as such.
The first Git step where we expect the repository to be fully initalized
is when we fetch bundles via bundle URIs. Funny enough, the comments
there also state that "the_repository must match the cloned repo", which
is indeed not necessarily the case for the hash algorithm right now. So
in practice it is the right thing to detect the remote's object format
before downloading bundle URIs anyway, and not doing so causes clones
with bundle URIs to fail when the local default object format does not
match the remote repository's format.
Unfortunately though, this creates a new issue: downloading bundles may
take a long time, so if we list refs beforehand they might've grown
stale meanwhile. It is not clear how to solve this issue except for a
second reference listing though after we have downloaded the bundles,
which may be an expensive thing to do.
Arguably though, it's preferable to have a staleness issue compared to
being unable to clone a repository altogether.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
remote-curl: rediscover repository when fetching refs
The reftable format encodes the hash function used by the repository
inside of its tables. The reftable backend thus needs to be initialized
with the correct hash function right from the start, or otherwise we may
end up writing tables with the wrong hash function. But git-clone(1)
initializes the reference database before learning about the hash
function used by the remote repository, which has never been a problem
with the reffiles backend.
To fix this, we'll have to change git-clone(1) to be more careful and
only create the reference backend once it learned about the remote hash
function. This creates a problem for git-remote-curl(1), which will then
be spawned at a time where the repository is not yet fully-initialized.
Consequentially, git-remote-curl(1) will fail to detect the repository,
which eventually causes it to error out once it is asked to fetch remote
objects.
We can address this issue by trying to re-discover the Git repository in
case none was detected at startup time. With this change, the clone will
look as following:
1. git-clone(1) sets up the initial repository, excluding the
reference database.
2. git-clone(1) spawns git-remote-curl(1), which will be unable to
detect the repository due to a missing "HEAD".
3. git-clone(1) asks git-remote-curl(1) to list remote references.
This works just fine as this step does not require a local
repository
4. git-clone(1) creates the reference database as it has now learned
about the hash function.
5. git-clone(1) asks git-remote-curl(1) to fetch the remote packfile.
The latter notices that it doesn't have a repository available, but
it now knows to try and re-discover it.
If the re-discovery succeeds in the last step we can continue with the
clone.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow callers to skip creation of the reference database via a new flag
`INIT_DB_SKIP_REFDB`, which is required for git-clone(1) so that we can
create it at a later point once the object format has been discovered
from the remote repository.
Note that we also uplift the call to `create_reference_database()` into
`init_db()`, which makes it easier to handle the new flag for us. This
changes the order in which we do initialization so that we now set up
the Git configuration before we create the reference database. In
practice this move should not result in any change in behaviour.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're about to let callers skip creation of the reference database when
calling `init_db()`. Extract the logic into a standalone function so
that it becomes easier to do this refactoring.
While at it, expand the comment that explains why we always create the
"refs/" directory.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/block: reuse buffer to compute record keys
When iterating over entries in the block iterator we compute the key of
each of the entries and write it into a buffer. We do not reuse the
buffer though and thus re-allocate it on every iteration, which is
wasteful.
Refactor the code to reuse the buffer.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/block: introduce macro to initialize `struct block_iter`
There are a bunch of locations where we initialize members of `struct
block_iter`, which makes it harder than necessary to expand this struct
to have additional members. Unify the logic via a new `BLOCK_ITER_INIT`
macro that initializes all members.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/merged: reuse buffer to compute record keys
When iterating over entries in the merged iterator's queue, we compute
the key of each of the entries and write it into a buffer. We do not
reuse the buffer though and thus re-allocate it on every iteration,
which is wasteful given that we never transfer ownership of the
allocated bytes outside of the loop.
Refactor the code to reuse the buffer. This also fixes a potential
memory leak when `merged_iter_advance_subiter()` returns an error.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>