git.ipfire.org Git - thirdparty/git.git/log

setup: introduce explicit repository discovery

When setting up the global repository we intermix repository discovery
and repository configuration: we repeatedly call `set_git_work_tree()`
and `apply_and_export_relative_gitdir()` until we're happy with the
result. The result of this is then a partially-configured repository
that we use for further setup.

This process is quite hard to follow, as it's never quite clear which
parts of the repository have been configured already and which haven't.
Furthermore, it means that the repository configuration is distributed
across many different places instead of having it neatly contained in a
single location. Ultimately, this is the reason that we cannot use a
central function like `repo_init()`.

Refactor the logic so that we stop partially-configuring a repository
and instead populate a new `struct repo_discovery`. This allows us to
essentially split repository setup into two phases:

  - The first phase only figures out parameters required to configure
    the repository.

  - The second phase then takes these parameters and applies them to the
    repository.

Like this, we'll never end up with a partially-configured repository and
can eventually extend `repo_init()` to handle the full initialization
for us.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

setup: split up concerns of `setup_git_env_internal()`

The function `setup_git_env_internal()` does two completely unrelated
things:

  - It configures the repository's gitdir and propagates environment
    variables into it.

  - It configures a couple of global parameters via environment
    variables.

The function is called when we initialize the repository's path, but
it's also called via `chdir_notify_register()` whenever we change the
current working directory. While we indeed have to reconfigure the
gitdir in case it's a relative path, it doesn't make sense to reapply
the global environment variables.

Split up concerns of this function along the above delineation. Handling
of the global environment variables is moved into `init_git()`, as they
can be considered part of our setup procedure.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

setup: unify setup of shallow file

It is possible to configure an arbitrary "shallow" file via two
mechanisms, and the respective logic to handle these is split across two
locations:

  - Via the "GIT_SHALLOW_FILE" environment variable, which is handled in
    `setup_git_env_internal()`.

  - Via the global "--shallow-file=" command line option, which is
    handled in `handle_options()`.

We can rather easily unify this logic by not configuring the shallow
file in `handle_options()`, but instead overwriting the environment
variable. The environment variable itself is then handled inside of
`apply_repository_format()`, which is responsible for configuring a
discovered Git directory.

This new logic is similar in nature to how we handle the other global
options already, all of which end up setting an environment variable.
So for one this gives us more consistency. But more importantly, this
change means that `the_repository` will not contain any relevant state
anymore before we hit `apply_repository_format()` once we're at the end
of this patch series. Consequently, it will become possible for us to
completely discard `the_repository` and populate it anew.

Note that on first sight, this change looks like it might change the
precedence order. Before this change, we used to configure the shallow
file in the arguments handler first, and then it looks like we override
it via the environment variable. What's important to note though is the
last parameter to `set_alternate_shallow_file()`, which tells us whether
we want to overwrite a preexisting value, and when applying the value
from the environment we tell it not to overwrite preexisting values. So
in effect, the command line has precedence over the environment. After
this change, we now overwrite preexisting environment variables when we
see the argument, and consequently we keep the precedence order intact.
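
The precedence argument can be illustrated with a small shell
simulation. This is only a sketch: `handle_option` and `apply_format`
are hypothetical stand-ins for the C functions `handle_options()` and
`apply_repository_format()`, and `SHALLOW_FILE_ENV` is a made-up
variable standing in for "GIT_SHALLOW_FILE" so that the sketch does not
touch a real Git environment.

```shell
#!/bin/sh
# Made-up stand-in for "GIT_SHALLOW_FILE"; not a real Git variable.
SHALLOW_FILE_ENV=from-environment
export SHALLOW_FILE_ENV

# Hypothetical stand-in for handle_options(): instead of configuring
# the shallow file directly, overwrite the environment variable.
handle_option () {
	case "$1" in
	--shallow-file=*)
		SHALLOW_FILE_ENV=${1#--shallow-file=}
		;;
	esac
}

# Hypothetical stand-in for apply_repository_format(): the single
# place that reads the environment variable.
apply_format () {
	shallow_file=$SHALLOW_FILE_ENV
}

handle_option --shallow-file=from-command-line
apply_format
echo "shallow file: $shallow_file"
```

Because the option handler runs first and overwrites the variable, the
value from the command line is the one that the later step picks up,
which matches the old precedence order.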

With this change though we don't need the final parameter anymore that
tells `set_alternate_shallow_file()` whether or not to overwrite. We
only have a single callsite for this function now, and that function is
itself only ever called exactly once. Remove that parameter.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

setup: mark bogus worktree in `apply_repository_format()`

When a repository is configured to have both "core.worktree" and
"core.bare" we emit a warning and mark the worktree configuration as
bogus so that the next call to `setup_work_tree()` will cause us to die.
This allows us to still use the misconfigured repository, at least as
long as we don't try to use its worktree.

This condition is handled in `setup_explicit_git_dir()`. In a subsequent
commit we'll refactor this function so that it doesn't receive a repo as
input anymore though, and consequently we cannot set the "bogus" bit
anymore.

Move the logic into `apply_repository_format()` instead to prepare for
this. While at it, fix up formatting a bit.

Note that this change requires us to also explicitly unset the value of
"core.worktree" in case we have the "GIT_WORK_TREE" environment variable
set. This is because the environment variable overrides the repository's
configuration, and we don't want to warn or die in case the work tree
has been configured explicitly regardless of whether or not "core.bare"
is set.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

setup: rename `check_repository_format_gently()`

The function `check_repository_format_gently()` receives a format as
input. An unknowing reader may thus suspect that this function actually
checks the passed-in format for consistency. While the function indeed
checks the repository format, it actually serves two purposes:

  - It reads the repository's format and populates the passed-in format
    with that information.

  - It then indeed checks whether the format is consistent.

Rename the function to `read_and_verify_repository_format()` to clarify
its functionality. While at it, reorder the parameters so that the
format comes first to better match other functions that pass around the
format.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Start Git 2.56 cycle

This time, do not forget to update GIT-VERSION-GEN to say 2.55.GIT

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'sg/t3420-do-not-grep-in-missing-file'

A test checking interactions between git rebase --quit and
autostash in t3420-rebase-autostash.sh has been corrected to use
test_path_is_missing instead of ! grep on a file that shouldn't
exist in the conflicted state.

* sg/t3420-do-not-grep-in-missing-file:
  t3420-rebase-autostash: don't try to grep non-existing files

Merge branch 'ps/connected-generic-promisor-checks'

The connectivity check has been refactored to search for promisor
objects in a generic way using the object database interface,
rather than iterating packfiles directly. This allows connectivity
checks to work properly in repositories that do not use packfiles.

* ps/connected-generic-promisor-checks:
  connected: search promisor objects generically
  connected: split out promisor-based connectivity check
  odb/source-packed: support flags when iterating an object prefix
  odb/source-packed: extract logic to skip certain packs

Merge branch 'ps/refs-onbranch-fixes'

Reference backend configuration has been updated to load lazily to
avoid recursive calls during repository initialization when 'onbranch'
configuration conditions are evaluated. This has also fixed a memory
leak and allowed the unused `chdir_notify_reparent()` machinery to be
dropped.

* ps/refs-onbranch-fixes:
  refs: protect against chicken-and-egg recursion
  refs/reftable: lazy-load configuration to fix chicken-and-egg
  reftable: split up write options
  refs/files: lazy-load configuration to fix chicken-and-egg
  refs: move parsing of "core.logAllRefUpdates" back into ref stores
  repository: free main reference database
  chdir-notify: drop unused `chdir_notify_reparent()`
  refs: unregister reference stores from "chdir_notify"
  setup: don't apply "GIT_REFERENCE_BACKEND" without a repository
  setup: stop applying repository format twice
  setup: inline `check_and_apply_repository_format()`

Merge branch 'wy/doc-clarify-review-replies'

Documentation on community contribution guidelines has been updated to
encourage replying to review comments before rerolling, and to advise
a default limit of at most one reroll per day to give reviewers across
different time zones enough time to participate.

* wy/doc-clarify-review-replies:
  doc: advise batching patch rerolls
  doc: encourage review replies before rerolling

Merge branch 'jk/repo-info-path-keys'

The "git repo info" command has been taught new keys to output both
absolute and relative paths for "gitdir" and "commondir", supported by
a new path-formatting helper extracted from "git rev-parse".

* jk/repo-info-path-keys:
  repo: add path.gitdir with absolute and relative suffix formatting
  repo: add path.commondir with absolute and relative suffix formatting
  path: extract format_path() and use in rev-parse

Merge branch 'mv/log-follow-mergy'

"git log --follow" has been updated to better handle non-linear
history, in which the path being tracked gets renamed differently in
multiple history lines.

* mv/log-follow-mergy:
  log: improve --follow following renames for non-linear history

Merge branch 'pw/status-rebase-todo'

The display of the rebase todo list in "git status" has been
improved to correctly abbreviate object IDs for more commands and
avoid misinterpreting refs as object IDs.

* pw/status-rebase-todo:
  status: improve rebase todo list parsing
  sequencer: factor out parsing of todo commands

Merge branch 'tb/pack-path-walk-bitmap-delta-islands'

The pack-objects command has been updated to support reachability
bitmaps and delta-islands concurrently with the `--path-walk` option,
allowing faster packaging by falling back to path-walk when bitmaps
cannot fully satisfy the request.

* tb/pack-path-walk-bitmap-delta-islands:
  pack-objects: support `--delta-islands` with `--path-walk`
  pack-objects: extract `record_tree_depth()` helper
  pack-objects: support reachability bitmaps with `--path-walk`
  t/perf: drop p5311's lookup-table permutation

Merge branch 'jc/submittingpatches-design-critiques'

The documentation in SubmittingPatches has been updated to clarify how
patch contributors should respond to design and viability critiques,
and how the resolution of such critiques should be recorded in the
final commit messages.

* jc/submittingpatches-design-critiques:
  SubmittingPatches: address design critiques

Merge branch 'kh/submittingpatches-trailers'

The trailer sections in SubmittingPatches have been updated to
encourage use of standard trailers.

* kh/submittingpatches-trailers:
  SubmittingPatches: note that trailer order matters
  SubmittingPatches: be consistent with trailer markup
  SubmittingPatches: document Based-on-patch-by trailer
  SubmittingPatches: discourage common Linux trailers
  SubmittingPatches: encourage trailer use for substantial help

Merge branch 'mh/fetch-follow-remote-head-config'

The `fetch.followRemoteHEAD` configuration variable has been added to
provide a default for the per-remote `remote.<name>.followRemoteHEAD`
setting.

* mh/fetch-follow-remote-head-config:
  fetch: fixup a misaligned comment
  fetch: add configuration variable fetch.followRemoteHEAD
  fetch: refactor do_fetch handling of followRemoteHEAD
  fetch: return 0 on known git_fetch_config
  fetch: rename function report_set_head
  t5510: cleanup remote in followRemoteHEAD dangling ref test
  doc: explain fetchRemoteHEADWarn advice
  fetch: fixup set_head advice for warn-if-not-branch

Merge branch 'po/hash-object-size-t'

Support for hashing loose or packed objects larger than 4GB on Windows
and other LLP64 platforms has been improved by converting object header
buffers and data-handling functions from 'unsigned long' to 'size_t'.

* po/hash-object-size-t:
  hash-object: add a >4GB/LLP64 test case using filtered input
  hash-object: add another >4GB/LLP64 test case
  hash-object --stdin: verify that it works with >4GB/LLP64
  hash algorithms: use size_t for section lengths
  object-file.c: use size_t for header lengths
  hash-object: demonstrate a >4GB/LLP64 problem

Merge branch 'ty/move-protect-hfs-ntfs'

The global configuration variables protect_hfs and protect_ntfs have
been migrated into struct repo_config_values to tie them to
per-repository configuration state.

* ty/move-protect-hfs-ntfs:
  environment: use 'repo->initialized' for repo_protect_hfs() and repo_protect_ntfs()
  environment: move 'protect_hfs' and 'protect_ntfs' into 'repo_config_values'

Merge branch 'ps/odb-source-packed'

The packed object source has been refactored into a proper struct
odb_source.

* ps/odb-source-packed:
  odb/source-packed: drop pointer to "files" parent source
  midx: refactor interfaces to work on "packed" source
  odb/source-packed: stub out remaining functions
  odb/source-packed: wire up `freshen_object()` callback
  odb/source-packed: wire up `find_abbrev_len()` callback
  odb/source-packed: wire up `count_objects()` callback
  odb/source-packed: wire up `for_each_object()` callback
  odb/source-packed: wire up `read_object_stream()` callback
  odb/source-packed: wire up `read_object_info()` callback
  packfile: use higher-level interface to implement `has_object_pack()`
  odb/source-packed: wire up `reprepare()` callback
  odb/source-packed: wire up `close()` callback
  odb/source-packed: start converting to a proper `struct odb_source`
  odb/source-packed: store pointer to "files" instead of generic source
  packfile: move packed source into "odb/" subsystem
  packfile: split out packfile list logic
  packfile: rename `struct packfile_store` to `odb_source_packed`

Merge branch 'td/ref-filter-restore-prefix-iteration'

Commands that list branches and tags (like git branch and git tag)
have been optimized to pass the namespace prefix when initializing
their ref iterator, avoiding a loose-ref scaling regression in
repositories with many unrelated loose references.

* td/ref-filter-restore-prefix-iteration:
  ref-filter: restore prefix-scoped iteration

Merge branch 'en/ort-harden-against-corrupt-trees'

The 'ort' merge backend has been hardened against corrupt trees by
ensuring it aborts under appropriate error conditions.

* en/ort-harden-against-corrupt-trees:
  cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts
  merge-ort: abort merge when trees have duplicate entries
  merge-ort: free diff pairs queue in clear_or_reinit_internal_opts()
  merge-ort: drop unnecessary show_all_errors from collect_merge_info()
  merge-ort: propagate callback errors from traverse_trees_wrapper()

Merge branch 'jk/setup-gitfile-diag-fix'

A regression in the error diagnosis code for invalid .git files has
been fixed, avoiding a potential NULL-pointer crash when reporting
that a .git file does not point to a valid repository.

* jk/setup-gitfile-diag-fix:
  read_gitfile(): simplify NOT_A_REPO error message

Merge branch 'rs/cat-file-default-format-optim'

The default format path of git cat-file --batch has been optimized
to use strbuf_add_oid_hex() and strbuf_add_uint() instead of
strbuf_addf(), yielding a noticeable speedup.

* rs/cat-file-default-format-optim:
  cat-file: speed up default format

Merge branch 'ps/doc-recommend-b4'

Project-specific configuration for b4 has been introduced, and the
documentation has been updated to recommend using it as a
streamlined method for submitting patches.

* ps/doc-recommend-b4:
  b4: introduce configuration for the Git project
  MyFirstContribution: recommend the use of b4
  MyFirstContribution: recommend shallow threading of cover letters

Merge branch 'ps/setup-drop-global-state'

The refactoring of 'setup.c' has been continued to drop remaining
global state (`git_work_tree_cfg`, `is_bare_repository_cfg`), updating
`is_bare_repository()` to no longer implicitly rely on
`the_repository`.

* ps/setup-drop-global-state:
  treewide: drop USE_THE_REPOSITORY_VARIABLE
  environment: stop using `the_repository` in `is_bare_repository()`
  environment: split up concerns of `is_bare_repository_cfg`
  builtin/init: stop modifying `is_bare_repository_cfg`
  setup: remove global `git_work_tree_cfg` variable
  builtin/init: simplify logic to configure worktree
  builtin/init: stop modifying global `git_work_tree_cfg` variable

Merge branch 'cc/promisor-auto-config-url-more'

The handling of promisor-remote protocol capability has been updated
to allow the other side to add to the list of promisor remotes via the
'promisor.acceptFromServerURL' configuration variable.

* cc/promisor-auto-config-url-more:
  doc: promisor: improve acceptFromServer entry
  promisor-remote: auto-configure unknown remotes
  promisor-remote: trust known remotes matching acceptFromServerUrl
  promisor-remote: introduce promisor.acceptFromServerUrl
  promisor-remote: add 'local_name' to 'struct promisor_info'
  urlmatch: add url_normalize_pattern() helper
  urlmatch: change 'allow_globs' arg to bool
  t5710: simplify 'mkdir X' followed by 'git -C X init'

Merge branch 'hn/status-pull-advice-qualified'

Advice shown by "git status" when the local branch is behind or has
diverged from its push branch has been updated to suggest "git pull
<remote> <branch>".

* hn/status-pull-advice-qualified:
  remote: qualify "git pull" advice for non-upstream compareBranches

t: add greplint to detect bare grep assertions

Without a lint guard, bare grep assertions will creep back into
tests over time, defeating the previous commit's conversion.

Add greplint.pl to catch bare 'grep' used as a test assertion
(where 'test_grep' should be used) and '! test_grep' (where
'test_grep !' should be used).

greplint.pl reuses the shared shell parser from lib-shell-parser.pl
to tokenize test bodies.  The Lexer collapses heredocs, command
substitutions, and quoted strings into single tokens, so 'grep'
appearing inside these contexts is not flagged.  A flat walk over
the token stream tracks command position and pipeline state to
distinguish assertion greps from filter greps.

For double-quoted test bodies, a source-line walk counts
backslash-continuation lines that the Lexer consumes without
emitting into the body text, adjusting the reported line number
accordingly.

Add test fixtures in greplint/ (modeled on chainlint/) covering
detection of bare grep assertions, correct skipping of filters,
pipelines, redirects, command substitutions, and lint-ok annotations.

Wire into the Makefile as:
  - test-greplint: runs greplint.pl on $(T) $(THELPERS) $(TPERF)
  - check-greplint: runs greplint.pl on fixtures, diffs against expected
  - clean-greplint: removes temp dir

Add eol=lf entries in t/.gitattributes for greplint fixtures,
matching chainlint, so that check-greplint passes on Windows
where core.autocrlf would otherwise cause CRLF mismatches
between expected and actual output.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t: convert grep assertions to test_grep

Replace bare grep with test_grep in test assertions across the
suite, including sourced test helpers (lib-*.sh, *-tests.sh).
test_grep prints the contents of the file being searched on
failure, making debugging easier than a bare grep which fails
silently.

Only assertion-style greps are converted: grep used as a filter
in pipelines, command substitutions, conditionals, or with
redirected I/O is left as-is with a "# lint-ok" annotation.
Existing '! test_grep' calls are rewritten to 'test_grep !' so
that the diagnostic output is preserved on failure.

test_grep requires the file it reads to exist, so '! grep'
assertions that inspect a file whose presence is conditional need
care.  In t5537 the '.git/shallow' file is still present after the
repack (the client remains shallow), so the assertion is
converted like any other.  In t1400 the '.git/packed-refs' file
exists only with the files backend, so its check is guarded with a
REFFILES prerequisite; the backend-agnostic 'git show-ref' check
that follows still runs under every backend.  In t7450 'git~2' is
the NTFS 8.3 short name of a '..git' file and only exists
when 8.3 short-name generation is enabled, so its check is guarded
with a 'test -f' on the path and uses test_grep inside the guard,
the same shape as t1400 (a plain test_grep would BUG when the
short name is absent).

The conversion was generated using a grep-assertion linter
(greplint.pl, added in the following commit) to identify bare
grep calls at command position.  To reproduce, from the t/
directory:

    # Step 1: annotate the two data-filter greps (grep produces
    # data, not a verdict) so the linter skips them.
    sed -i '/grep -vf before commits\.raw/s/$/ # lint-ok: data filter/' \
        t5326-multi-pack-bitmaps.sh
    sed -i '/grep -E "^\[0-9a-f\].*|| :/s/$/ # lint-ok: data filter/' \
        t5702-protocol-v2.sh

    # Step 1b: two '! grep' assertions need more than a mechanical
    # conversion; handle them by hand before the linter-driven steps
    # below so it leaves them alone.
    #
    # t1400: '.git/packed-refs' is absent under reftable, so guard the
    # check with REFFILES (a plain test_grep would BUG on the missing
    # file):
    #
    #      git update-ref -d HEAD $B &&
    #  -   ! grep "$m" .git/packed-refs &&
    #  +   if test_have_prereq REFFILES
    #  +   then
    #  +           test_grep ! "$m" .git/packed-refs
    #  +   fi &&
    #      test_must_fail git show-ref --verify -q $m
    #
    # t7450: git~2 is an NTFS 8.3 short name that exists only when
    # short-name generation is enabled, so guard the check on its
    # presence with 'test -f' and note in a comment why the path can
    # be absent (a plain test_grep would BUG when it is):
    #
    #  -   ! grep gitdir squatting-clone/d/a/git~2
    #  +   if test -f squatting-clone/d/a/git~2
    #  +   then
    #  +           test_grep ! gitdir squatting-clone/d/a/git~2
    #  +   fi

    # Step 2: reorder pre-existing '! test_grep' to 'test_grep !'
    # (must come before steps 3-4 so greplint does not see them)
    sed -i 's/! test_grep/test_grep !/' t0031-lockfile-pid.sh
    sed -i 's/! test_grep/test_grep !/' t5300-pack-object.sh
    sed -i 's/! test_grep/test_grep !/' t5319-multi-pack-index.sh

    # Step 3: convert '! grep' -> 'test_grep !'
    perl greplint.pl *.sh 2>&1 | cut -d: -f1,2 |
    while IFS=: read f l; do
        sed -i "${l}s/! *grep/test_grep !/" "$f"
    done

    # Step 4: convert remaining 'grep' -> 'test_grep'
    perl greplint.pl *.sh 2>&1 | cut -d: -f1,2 |
    while IFS=: read f l; do
        sed -i "${l}s/grep/test_grep/" "$f"
    done

To verify, run: make -C t test-greplint
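
To see how steps 3 and 4 consume the linter output, here is a
self-contained toy run. It is a sketch only: the "file:line" reports are
hard-coded instead of being produced by greplint.pl, the helper
`convert` and the sample file are made up, and it writes through a
temporary file rather than using `sed -i` for portability.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
f=$dir/sample.sh

cat >"$f" <<\EOF
git log >out &&
! grep secret out &&
grep subject out
EOF

# Apply the sed substitution in $1 only to the lines named by the
# "file:line: message" reports read from stdin.
convert () {
	cut -d: -f1,2 |
	while IFS=: read -r file line
	do
		sed "${line}$1" "$file" >"$file.tmp" &&
		mv "$file.tmp" "$file"
	done
}

# Step 3 equivalent: the (pretend) linter flagged line 2.
echo "$f:2: negated bare grep" | convert 's/! *grep/test_grep !/'

# Step 4 equivalent: a second (pretend) linter run flags only line 3,
# because line 2 has already been converted.
echo "$f:3: bare grep" | convert 's/grep/test_grep/'

cat "$f"
```

Running the linter again between the two steps matters: applying the
step 4 substitution to an already converted line would turn "test_grep"
into "test_test_grep".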

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t: fix Lexer line count for $() inside double-quoted strings

scan_dqstring's post-loop newline counter re-counts newlines that
were already counted during recursive parsing of $() bodies. This
happens because scan_dollar returns text containing newlines (from
multi-line command substitutions), and the catch-all counter at the
end of scan_dqstring counts all of them again.

Fix this by counting newlines inline as non-special characters are
consumed, and removing the post-loop catch-all. Each newline is
now counted exactly once: literal newlines at the inline match,
line splices at the backslash handler, and $() newlines by
scan_token during the recursive parse.

This is a latent bug: any consumer that relies on token line
numbers rather than byte offsets would get incorrect results for
tokens following a multi-line $() inside a double-quoted string.
chainlint is not affected because it annotates the original body
text using byte offsets, not token line numbers.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t: extract chainlint's parser into shared module

Move chainlint.pl's Lexer, ShellParser, and ScriptParser into a
shared module (lib-shell-parser.pl) so other lint tools can reuse
the same shell parsing infrastructure. A subsequent commit adds
greplint.pl, which needs the same tokenizer to correctly identify
command boundaries.

ScriptParser's check_test() becomes a no-op in the shared module.
chainlint.pl defines ChainlintParser (extending ScriptParser)
with the &&-chain check_test() implementation.

No functional change: chainlint produces the same output and
check-chainlint self-tests pass.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t: fix grep assertions missing file arguments

Three grep assertions were missing their file arguments, causing
them to read from empty stdin instead of the intended file:

- t2402: '! grep ...' should read from 'out', matching the
  grep on the preceding line.
- t7507: the closing quote is in the wrong place, making the
  entire 'diff --git actual' a single pattern with no file
  argument instead of pattern 'diff --git' and file 'actual'.
- t7700: '! grep ...' should read from 'packlist', matching
  the redirect on the preceding line.

Without file arguments these greps always succeed (empty stdin
matches nothing), so the assertions were not actually checking
anything.  All three tests pass with the corrected file arguments,
confirming the intended behavior is sound.
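
The failure mode is easy to reproduce outside of the test suite. In the
sketch below, `</dev/null` stands in for the empty stdin mentioned
above; the file name "out" and its contents are made up for
illustration.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf 'unexpected error\n' >"$dir/out"

# Buggy form: no file argument, so grep reads the empty stdin, finds
# nothing, and the negated assertion passes even though "out" matches.
if ! grep "unexpected" </dev/null
then
	buggy=passed
else
	buggy=failed
fi

# Fixed form: with the file argument the assertion correctly fails.
if ! grep "unexpected" "$dir/out" >/dev/null
then
	fixed=passed
else
	fixed=failed
fi

echo "without file argument: $buggy"
echo "with file argument: $fixed"
rm -rf "$dir"
```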

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t/README: document test_grep helper

test_grep is a wrapper around grep for test assertions that prints
the file contents on failure for easier debugging. It also accepts
'!' as its first argument for negation, which preserves the
diagnostic output that '! test_grep' would suppress.

Despite being widely used (and the preferred replacement for bare
grep in assertions), test_grep has no entry in t/README alongside
the other documented helpers like test_cmp and test_line_count.
Add one.
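
The behavior described above can be approximated in a few lines of
shell. This is a simplified sketch and not Git's implementation: the
name `sketch_test_grep` is made up, diagnostics go to stderr, and the
argument checks of the real helper are left out.

```shell
#!/bin/sh
# Simplified approximation of a test_grep-like helper.
sketch_test_grep () {
	negate=
	if test "x$1" = "x!"
	then
		negate=t
		shift
	fi

	# The file to search is the last argument.
	eval "file=\${$#}"

	if test -n "$negate"
	then
		! grep "$@" >/dev/null && return 0
		echo "error: '! grep $*' did find a match in:" >&2
	else
		grep "$@" >/dev/null && return 0
		echo "error: 'grep $*' didn't find a match in:" >&2
	fi

	# Show what was searched so that the failure is easy to debug.
	cat "$file" >&2
	return 1
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'hello world\n' >"$dir/actual"

sketch_test_grep hello "$dir/actual" && echo "positive match: ok"
sketch_test_grep ! goodbye "$dir/actual" && echo "negated match: ok"
sketch_test_grep goodbye "$dir/actual" 2>"$dir/diag" ||
	echo "failure printed: $(tail -n 1 "$dir/diag")"
```

With an outer negation such as '! sketch_test_grep hello actual', the
failing case is the one where the pattern is found; the helper then
returns success without printing anything, which is why the negation
has to be passed as an argument instead.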

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/refs: add "rename" subcommand

Add a "rename" subcommand to git-refs(1) with the syntax:

$ git refs rename <oldref> <newref>

It renames <oldref> together with its reflog to <newref>; even when used
on a local branch ref, the current value and the reflog of the ref are
the only things that are renamed. Document it and redirect casual users
to "git branch -m" if that is what they wanted to do.

Co-authored-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/refs: add "create" subcommand

The "update" subcommand can not only update an existing reference, but it
can also create new branches and delete existing branches by specifying
the all-zeroes object ID as either old or new value. Despite that, we
already have the "delete" subcommand as a handy shortcut so that a user
can easily delete a branch. This relieves them of needing to understand
the more arcane uses of the "update" command, and of counting the number
of zeroes they need to pass.

But while we have a "delete" subcommand, we don't have an equivalent
that would allow the user to create a new branch, which creates a
certain asymmetry.

Add a new "create" subcommand to plug this gap.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/refs: add "update" subcommand

Add a new "update" subcommand which mirrors `git update-ref <refname>
<oldoid> <newoid>`. This follows the same reasoning as the preceding
commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/refs: add "delete" subcommand

Reference-related functionality in Git is currently spread across many
different commands: git-update-ref(1), git-for-each-ref(1),
git-show-ref(1), git-pack-refs(1) and git-symbolic-ref(1). This makes it
hard for users to discover what functionality we have available to work
with references.

We have thus started to consolidate this functionality into git-refs(1),
which is a toolbox of everything related to references. So far, the
command doesn't cover the functionality of git-update-ref(1).

Fix this gap by introducing a new "delete" subcommand, which is the
equivalent of `git update-ref -d`.

Note that we're intentionally not using a generic "write" subcommand
with a "-d" flag. This is rather harder to discover, and subcommands
that are implemented as flags tend to be hard to reason about in the code
as we'd have to handle mutually-exclusive flags that stem from the other
subcommand-like modes.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/refs: drop `the_repository`

We still have a couple of uses of `the_repository` in "builtin/refs.c".
All of those are trivial to convert though as the command always
requires a repository to exist.

Convert them to use the passed-in repository and drop
`USE_THE_REPOSITORY_VARIABLE`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

blame: reserve mark column only if necessary

git blame prepends commit hashes of boundary commits with "^", ignored
commits with "?" and unblamable commits with "*" and reserves one column
for them by extending the hash abbreviation, to avoid showing ambiguous
hashes.

This reserved column wastes precious screen space, which can be
especially irritating when using the option -b to blank out boundary
commit hashes and not ignoring any commits. Reserve it only as needed,
i.e. if any of those cases are actually shown.

Pointed-out-by: Laszlo Ersek <laszlo.ersek@posteo.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

gitlab-ci: enable "GIT_TEST_LONG"

Starting with 7a094d68a2 (ci: run expensive tests on push builds to
integration branches, 2026-05-08) we run expensive tests in our CI for
certain events. So far, this has only been wired up for GitHub Workflows
though, which creates a test gap for GitLab CI.

Plug this gap by also making this work for the latter.

Note that these tests cannot be run on the Windows runners, as they only
have 7.5GB of RAM. This is insufficient for some of the EXPENSIVE tests,
so we explicitly disable "GIT_TEST_LONG" on these jobs.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

gitlab-ci: disable RAM disk on macOS jobs

When we added the macOS jobs to GitLab CI in 56090a35ab (ci: add macOS
jobs to GitLab CI, 2024-01-18) we had to work around some very slow
disks. This workaround essentially creates a RAM disk that we mount,
where all test data is being written into RAM instead of the real disk.

In the next commit though we're about to enable "GIT_TEST_LONG", which
will make tests run that are marked with the "EXPENSIVE" prerequisite.
This change will make a couple of tests run that write up to 8GB of data
into the test output directory. As our RAM disk is only 4GB in size,
this change will cause ENOSPC errors.

We could accommodate this by increasing the size of the RAM disk.
In c9d708b7fc (gitlab-ci: upgrade macOS runners, 2026-05-21) we have
upgraded our runners to use the "large" runners, which have 16GB of RAM
available. So we could easily expand the RAM disk to a capacity of for
example 12GB. But some test runs have shown that this is still quite
flaky overall, as we get quite close to our limits.

Instead, drop the workaround completely. This does indeed slow down
execution of the test jobs:

  - osx-clang goes from 18 minutes to 25 minutes

  - osx-meson goes from 21 minutes to 33 minutes

  - osx-reftable stays at 21 minutes

The last one seems like an outlier. The only explanation that I have is
that we end up writing significantly fewer files with the reftable
backend, which ultimately causes less I/O.

Overall though, it's preferable to have something that is slower but
stable than something that is faster but flaky. Besides, the macOS jobs
aren't even the slowest jobs, so this doesn't extend the overall
pipeline's length.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t: use `test_bool_env` to parse GIT_TEST_LONG

It's currently hard to explicitly disable GIT_TEST_LONG by setting it to
`false`. Fix this by using `test_bool_env` instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t7900: clean up large EXPENSIVE repository

One of the tests in t7900 is marked with EXPENSIVE because we create a
repository with 2GB of data that we end up repacking. We never clean up
that repository though, so we occupy the full 2GB of data until the end
of the test suite.

Besides clogging our disk, having an EXPENSIVE test that alters the
repository's state used by subsequent tests is also a bad idea, as it
can easily have an impact on the heuristics used by other maintenance
tasks.

Adapt the test so that we create the data in a standalone repository
that we clean up at the end of the test. While at it, also disable
auto-maintenance so that it does not race with our manual maintenance.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t7508: skip EXPENSIVE test that is broken without SIZE_T_IS_64BIT

One of the tests in t7508 is marked as EXPENSIVE because it ends up
creating and adding files that are multiple gigabytes in size. This
takes a while to complete, hence the EXPENSIVE prerequisite.

Besides being expensive though the test can only work on systems where
`size_t` is at least 64 bit. This is because one of the created files
is larger than 4GB, and because Git tracks object size via `size_t` it
will eventually blow up.
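
As a rough illustration of that limit, the following sketch checks
whether a byte count can be represented in an unsigned integer of a
given width. The helper `fits_in_uint_bits()` is hypothetical and not
part of Git.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of Git: report whether `len` bytes can be
 * represented in an unsigned integer that is `bits` wide.
 */
static int fits_in_uint_bits(uint64_t len, unsigned bits)
{
	assert(bits >= 1 && bits <= 64);
	if (bits == 64)
		return 1;
	return len < (UINT64_C(1) << bits);
}
```

A file of exactly 4GB (4294967296 bytes) does not fit into 32 bits, but
it does fit into 64 bits.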

This test has also been blowing up in the "linux32" CI job in GitHub
Workflows since 7a094d68a2 (ci: run expensive tests on push builds to
integration branches, 2026-05-08). But that job doesn't only fail, it
also hangs, and that has been concealing the failure.

Fix the issue by marking the test as requiring 64 bit `size_t`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t5608: reduce maximum disk usage

The tests in t5608 perform a couple of clones of repositories that are
somewhat large. Ultimately, we end up creating:

  - A setup repository that contains 2GB of uncompressed pack data.

  - A bare clone that contains the same 2GB of data.

  - A clone with worktree that writes a 2GB packfile and a 2GB worktree.

  - A second setup repository that contains a 4GB packfile.

  - Two 4GB clones of that repository.

Some of these clones ultimately hardlink files, which ensures that we at
least don't end up with more than 20GB of data. But at the end of the
test we still have around 16GB of data, which is only a tiny bit better.

Refactor the test to prune repositories once they are no longer needed.
This reduces the peak disk usage of this test to 8GB.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t4141: fix inefficient use of dd(1)

In t4141 we generate a patch that is roughly 1GB in size to verify that
git-apply(1) indeed rejects that patch. We generate that patch by
prepending a patch header and then executing `test-tool genzeros`
without a limit. This causes us to print infinitely many zeros, and we
limit the overall amount of generated bytes via `test_copy_bytes`.

This test setup is extremely expensive, as `test_copy_bytes` is
implemented via `dd ibs=1 count="$1"`, which copies data one byte at a
time. So as we write 1GB of data, we end up doing 1 billion reads and
writes. This naturally takes a while: it takes 6 minutes on my system,
and around 40 minutes in some CI jobs!
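
To put a number on this, the sketch below counts how many reads are
needed to copy a given amount of data with a given block size. It
assumes that every read returns a full block, and the 64KiB block size
used in the comparison is an arbitrary choice for illustration, not
something dd(1) or Git prescribes.

```c
#include <assert.h>
#include <stdint.h>

/* Number of read(2) calls needed, assuming each call returns a full block. */
static uint64_t reads_needed(uint64_t total, uint64_t block_size)
{
	assert(block_size > 0);
	/* Round up so that a trailing partial block is counted as well. */
	return total / block_size + (total % block_size != 0);
}
```

For 1000000000 bytes this yields one billion reads with a block size of
1, but only 15259 reads with a block size of 65536.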

We can do much better though, as genzeros already knows to handle an
optional limit of how much data it is supposed to write, which allows us
to remove the call to `test_copy_bytes`. Furthermore, it has already
been optimized to generate the data fast.

And indeed, doing this conversion drops the test execution to less than
a second on my machine. That means that in theory it becomes feasible to
drop the EXPENSIVE prerequisite now. But git-apply(1) still soaks up 1GB
of data into memory, which may count as being expensive. Consequently,
we keep the prerequisite intact.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t0021: skip EXPENSIVE test that is broken without SIZE_T_IS_64BIT

One of the tests in t0021 writes a 2GB file and then roundtrips it
through the clean/smudge filters. This test is broken on 32 bit
platforms because they typically don't handle files larger than
`SSIZE_MAX` well at all.

While our CI has a "linux32" job that should in theory hit this issue,
we never noticed it because we didn't use to run EXPENSIVE tests until
7a094d68a2 (ci: run expensive tests on push builds to integration
branches, 2026-05-08). And after that commit, the test does not fail but
instead hangs completely.

Ideally, we'd of course properly detect this situation and then test for
it. In practice, this turns out to be hard as the test failures are not
reliable: they often (but not always) run into ENOMEM errors.

Instead, skip the test altogether.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

README: add GitLab CI badge to make it more discoverable

The Git project uses CI systems from both GitHub and GitLab. While both
of these systems are extensively used in day-to-day work, we only have a
link to the GitHub Workflows in our README, which makes the GitLab CI
hard to discover.

Improve the situation by adding a second badge for GitLab CI to our
README.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mingw: make `exit_process()` own the process handle on all paths

After "mingw: kill child processes in a gentler way", the ownership of
the HANDLE passed to `exit_process()` and `terminate_process_tree()` is
inconsistent. `terminate_process_tree()` always closes the handle;
`exit_process()` closes it on success and on the terminate-tree
fallback, but leaks it on the early return where GetExitCodeProcess()
fails or reports the process is no longer STILL_ACTIVE.

`mingw_kill()` compensated by closing the handle on its own error path,
which is a double-close on every error path that does not hit that one
leaky branch -- the callee has already closed the handle by then.
Coverity flagged the resulting use-after-free as CID 1437238.

Pin down the invariant that `exit_process()` and
`terminate_process_tree()` own the handle from the call onward and close
it on every return path; with that, the bogus close in `mingw_kill()`
goes away.

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fsmonitor: plug token-data leak on early daemon-startup failures

`fsmonitor_run_daemon()` allocates `state.current_token_data`
before any subordinate setup step that may fail (alias resolution,
listener/health constructors, asynchronous IPC server init). On
the successful path the listener thread takes ownership and clears
the field during its teardown, so the `done:` cleanup block sees a
NULL pointer. On every early-error path, however, control jumps
straight to `done:` with the freshly allocated token data still
referenced, and it is never freed, as Coverity flagged.

Free it at the top of `done:` and clear the pointer. The success
path is a no-op (the pointer is already NULL there); the error
paths now drop the otherwise-leaked allocation.
`fsmonitor_free_token_data()` is NULL-safe and asserts
`client_ref_count == 0`, which holds trivially here because the
IPC server has not yet begun accepting clients when these failures
occur.
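
The pattern can be sketched as follows. All names below are simplified,
hypothetical stand-ins rather than the actual fsmonitor code, and the
`live_tokens` counter only exists so that the sketch can show that
nothing is leaked.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the daemon state; not Git's actual types. */
struct token_data { int client_ref_count; };
struct daemon_state { struct token_data *current_token_data; };

static int live_tokens; /* counts allocations that were not freed yet */

static void free_token_data(struct token_data *t)
{
	if (!t)
		return; /* NULL-safe, so the success path is a no-op */
	assert(t->client_ref_count == 0);
	free(t);
	live_tokens--;
}

static int run_daemon(int fail_early)
{
	struct daemon_state state = { NULL };
	int err = 0;

	state.current_token_data = calloc(1, sizeof(*state.current_token_data));
	if (!state.current_token_data)
		return -1;
	live_tokens++;

	if (fail_early) {
		err = -1;
		goto done; /* the token data is still referenced here */
	}

	/* Success: model the listener taking ownership and clearing the field. */
	free_token_data(state.current_token_data);
	state.current_token_data = NULL;

done:
	free_token_data(state.current_token_data);
	state.current_token_data = NULL;
	return err;
}
```

Both `run_daemon(1)` and `run_daemon(0)` leave `live_tokens` at zero.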

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/table: release filter on error path

`reftable_table_refs_for_unindexed()` allocates a filtering_ref_iterator
and then calls `reftable_buf_add()` to populate its oid buffer. On
success ownership is transferred to the output iterator, but if
`reftable_buf_add()` fails, the goto-out cleanup only frees the table
iterator and walks away from both the filter allocation and the oid
buffer that `reftable_buf_add()` may have grown.

Release filter->oid and free filter alongside the existing table
iterator cleanup.

Reported by Coverity as CID 1671512 ("Resource leak").

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

imap-send: avoid leaking the IMAP upload buffer

When uploading messages via libcurl, `curl_append_msgs_to_imap()`
accumulates each one in a strbuf that grows across loop iterations but
is never released before the function returns.

Release it alongside the existing libcurl cleanup.

Reported by Coverity as CID 1671507 ("Resource leak").

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

worktree: fix resource leaks when branch creation fails

In the "add" subcommand, when `run_command()` fails while creating a new
branch (line 948), the function returns -1 immediately without freeing
the allocations made earlier: path (from prefix_filename at line 858),
opt_track, branch_to_free, and new_branch_to_free.

Redirect the error return through the existing cleanup block at the end
of the function so all four allocations are properly freed.

Pointed out by Coverity.

Assisted-by: Claude Opus 4.6
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

submodule: fix cwd leak in `get_superproject_working_tree()`

`get_superproject_working_tree()` allocates cwd via `xgetcwd()` at the
top of the function, but two early-return paths (when not inside a work
tree, and when strbuf_realpath for "../" fails) return 0 without freeing
it.

Redirect these early returns through a cleanup label that frees cwd
before returning.
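
A minimal sketch of the cleanup-label idiom, using hypothetical names
(`find_superproject()`, `dup_string()`) and a `live_allocs` counter that
is only there to make the leak observable:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int live_allocs; /* tracks outstanding allocations in this sketch */

static char *dup_string(const char *s)
{
	size_t n = strlen(s) + 1;
	char *copy = malloc(n);

	if (copy) {
		memcpy(copy, s, n);
		live_allocs++;
	}
	return copy;
}

/* Hypothetical stand-in: returns 1 if a superproject was found, else 0. */
static int find_superproject(const char *cwd_in, int inside_work_tree)
{
	char *cwd;
	int ret = 0;

	assert(cwd_in);
	cwd = dup_string(cwd_in); /* plays the role of xgetcwd() */
	if (!cwd)
		return 0;
	if (!inside_work_tree)
		goto cleanup; /* a bare "return 0" here would leak cwd */

	ret = 1;
cleanup:
	free(cwd);
	live_allocs--;
	return ret;
}
```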

Pointed out by Coverity.

Assisted-by: Claude Opus 4.6
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

dir: free allocations on parse-error paths in `read_one_dir()`

Two of `read_one_dir()`'s parse-error early returns leak ud.untracked
and ud.dirs. Plug them.

The other early returns in the same function are fine: they occur after
the `xmalloc()`+`memcpy()` that copies ud into `*untracked_`, at which
point ownership is transferred to the caller.
`read_untracked_extension()` then releases everything via
`free_untracked_cache()` on failure.

Pointed out by Coverity.

Assisted-by: Claude Opus 4.6
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

line-log: avoid redundant copy that leaks in process_ranges

When `bloom_filter_check()` indicates that a commit does not touch any
of the tracked paths, `line_log_process_ranges_arbitrary_commit()`
propagates the current ranges to the parent by calling
`line_log_data_copy()` and passing the copy to `add_line_range()`.
However, `add_line_range()` always makes its own copy internally (via
`line_log_data_copy()` or `line_log_data_merge()`), so the caller's copy is
never freed and leaks every time this path is taken.

Pass range directly to `add_line_range()` instead of making a redundant
intermediate copy. The callee's internal copy handles ownership
correctly.
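
The ownership problem can be modelled with a much simpler data
structure. The `struct range` and the helpers below are hypothetical and
only mimic the "callee always copies" contract described above.

```c
#include <assert.h>
#include <stdlib.h>

static int live_ranges; /* outstanding allocations in this sketch */

struct range { long start, end; };

static struct range *range_copy(const struct range *r)
{
	struct range *copy = malloc(sizeof(*copy));

	if (!copy)
		abort();
	*copy = *r;
	live_ranges++;
	return copy;
}

static void range_free(struct range *r)
{
	if (r) {
		free(r);
		live_ranges--;
	}
}

/* Like the callee described above: always stores its own copy. */
static void add_range(struct range **slot, const struct range *r)
{
	range_free(*slot);
	*slot = range_copy(r);
}

/* Leaky caller: makes an intermediate copy that nobody frees. */
static void propagate_leaky(struct range **parent, const struct range *r)
{
	add_range(parent, range_copy(r));
}

/* Fixed caller: passes the range directly. */
static void propagate_fixed(struct range **parent, const struct range *r)
{
	add_range(parent, r);
}
```

After freeing the parent's range, `propagate_fixed()` leaves no
allocation behind, whereas `propagate_leaky()` leaves one.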

Pointed out by Coverity.

Assisted-by: Claude Opus 4.6
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

run-command: avoid `close(-1)` in `start_command()` error paths

When `start_command()` fails to set up a pipe partway through, it rolls
back by closing the pipe ends it has already opened. For descriptors
supplied by the caller rather than allocated locally, that rollback
tested `if (cmd->in)` / `if (cmd->out)` before calling close(). The
CHILD_PROCESS_INIT default of -1 ("no descriptor") is non-zero and so
passes the test, meaning a caller that sets cmd->no_stdin or
cmd->no_stdout without supplying a real fd ends up triggering close(-1)
on the error path.

The stdin-pipe failure branch a few lines above already uses the right
idiom, `if (cmd->out > 0)`, which rejects both the -1 sentinel and 0
(the parent's own standard streams). Apply it to the three remaining
rollback sites.
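
The difference between the two guards can be shown in isolation. The
function names are hypothetical; only the two conditions are taken from
the description above.

```c
#include <assert.h>

/* Old rollback test: any non-zero value passes, including the -1 sentinel. */
static int old_should_close(int fd)
{
	return fd != 0;
}

/* Corrected idiom: only close real descriptors other than 0. */
static int new_should_close(int fd)
{
	return fd > 0;
}
```

With the old test a value of -1 would be passed on to close(), while the
new test rejects both -1 and 0 and still accepts a real descriptor such
as 7.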

Reported by Coverity as CID 1049722 ("Argument cannot be negative").

Assisted-by: Opus 4.7
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

download_https_uri_to_file(): do not leak fd upon failure

When the `git-remote-https` command fails, we do not want to leak
`child_out`.

Pointed out by Coverity.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

loose: avoid closing invalid fd on error path

`write_one_object()` opens a file at line 186 and jumps to the errout
label on failure. The errout cleanup unconditionally calls `close(fd)`,
but when `open()` itself failed, fd is -1. Calling `close(-1)` is
harmless in practice (POSIX specifies that it fails with EBADF), but it
is pointless and can confuse fd tracking in sanitizer builds.

Guard the close with fd >= 0.
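
A minimal sketch of the guarded cleanup, assuming a POSIX system. The
function `probe_file()` is a hypothetical stand-in and does far less
than `write_one_object()`.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical stand-in: returns 0 if `path` can be opened for reading and
 * -1 otherwise, and only closes a descriptor that was actually opened.
 */
static int probe_file(const char *path)
{
	int ret = -1;
	int fd;

	assert(path);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		goto errout;
	ret = 0;
errout:
	if (fd >= 0)
		close(fd);
	return ret;
}
```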

Pointed out by Coverity.

Assisted-by: Claude Opus 4.6
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

load_one_loose_object_map(): fix resource leak

Pointed out by Coverity.

While at it, reduce near-duplicate clean-up code at the end of the
function.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

precompose_utf8: use a flex array for d_name

On macOS, git status may abort while reading a directory entry
whose UTF-8 name grows past NAME_MAX bytes:

  __chk_fail_overflow
  __strlcpy_chk
  precompose_utf8_readdir
  read_directory_recursive
  wt_status_collect
  cmd_status

The precompose wrapper already reallocates dirent_prec_psx for
long names, but d_name is declared as char[NAME_MAX + 1]. A
fortified libc can still see that declared object size and reject a
larger strlcpy bound, even though the allocation was grown.

Make d_name a FLEX_ARRAY and size allocations from offsetof(). That
matches the actual object layout with the dynamic allocation, so the
fortified copy sees a destination whose size can grow with max_name_len.
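
The allocation scheme looks roughly like this. The sketch uses a C99
flexible array member directly instead of Git's FLEX_ARRAY macro, and
`struct name_entry` is a hypothetical, reduced stand-in for
dirent_prec_psx.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry type; the real structure has other members too. */
struct name_entry {
	unsigned char d_type;
	char d_name[]; /* C99 flexible array member */
};

/* Allocate an entry sized for `name`, however long it is. */
static struct name_entry *name_entry_new(const char *name)
{
	size_t len;
	struct name_entry *e;

	assert(name);
	len = strlen(name);
	e = malloc(offsetof(struct name_entry, d_name) + len + 1);
	if (!e)
		return NULL;
	e->d_type = 0;
	memcpy(e->d_name, name, len + 1); /* includes the trailing NUL */
	return e;
}
```

Because d_name has no declared size, the size of the destination is
determined by the allocation rather than by a fixed array bound.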

Add a regression test that creates an over-NAME_MAX non-ASCII basename
and runs status with core.precomposeunicode enabled.

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci(dockerized): raise the PID limit for private repositories

Every once in a while I need to verify that Microsoft Git's test suite
passes for changes that are not yet meant for public consumption, and
since it was (made) too difficult to keep up a working Azure Pipeline
definition, I have to use GitHub Actions in a private GitHub repository
for that purpose.

In these tests, basically all Dockerized CI jobs fail consistently. The
symptom is something like:

  error: cannot create async thread: Resource temporarily unavailable

in the middle of a test, typically in the t5xxx-t6xxx range. The first
such error is immediately followed by plenty more of these errors, and
not a single test succeeds afterwards.

At first, I thought that maybe the massive parallelism I enjoy there is
the problem, and I thought that the cgroups limits might be shared
between the many containers that run on essentially the same physical
machine. But even reducing the matrix to just a single one of those
Dockerized jobs runs into the very same problems.

The underlying reason seems to be a substantial difference in the hosted
runners that execute these Dockerized jobs: forcing the PID limit of the
container to a high number lets the jobs pass, even when running the
complete matrix of all 13 Dockerized jobs concurrently. But that's not
the only difference: The jobs seem to take a lot longer in these
containers than, say, in the containers made available to
https://github.com/git/git.

When forcing a PID limit of 64k in that private repository, the jobs
completed successfully, but they also took a lot longer, between 2x and
2.5x longer, i.e. painfully much longer. Reducing the PID limit to 16k,
the CI jobs still passed, but took an equally long amount of time.
Reducing the PID limit to 8k caused the errors to reappear.

Here are the numbers from three example runs, the first one forcing the
PID and nproc limit to 65536, the second one to 16384, the third run is
from the public git/git repository:

Job                           | 64k     | 16k     | reference
------------------------------|---------|---------|---------
almalinux-8                   | 19m 3s  | 16m 0s  | 9m 36s
debian-11                     | 20m 31s | 20m 3s  | 8m 5s
fedora-breaking-changes-meson | 16m 29s | 19m 19s | 9m 40s
linux-asan-ubsan              | 1h 10m  | 1h 11m  | 34m 36s
linux-breaking-changes        | 25m 39s | 25m 58s | 13m 15s
linux-leaks                   | 1h 9m   | 1h 10m  | 33m 30s
linux-meson                   | 28m 9s  | 27m 4s  | 13m 45s
linux-musl-meson              | 16m 32s | 13m 39s | 8m 6s
linux-reftable-leaks          | 1h 13m  | 1h 13m  | 34m 34s
linux-reftable                | 26m 2s  | 25m 48s | 13m 31s
linux-sha256                  | 26m 12s | 26m 3s  | 12m 36s
linux-TEST-vars               | 26m 5s  | 25m 21s | 13m 25s
linux32                       | 21m 16s | 19m 57s | 10m 44s

It does not look as if the PID limit is the reason for the longer
runtime, seeing as the 64k vs 16k timings deviate no more than as is
usual with GitHub workflows. So let's go for 16k.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/table: fix OOB read on truncated table

When opening a table we compute the size of its data section by
subtracting the footer size from the file size. We do not verify that
the file is actually large enough to contain both the header and the
footer though. With a truncated table the subtraction can thus
underflow, causing us to read the footer out of bounds:

  SUMMARY: AddressSanitizer: heap-buffer-overflow (/home/pks/Development/git/build/t/unit-tests+0x2479a4) in __asan_memcpy
  Shadow bytes around the buggy address:
    0x7ccff6e0de80: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
    0x7ccff6e0df00: fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa
    0x7ccff6e0df80: fa fa fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x7ccff6e0e000: fd fd fd fd fa fa fa fa fa fa fa fa fd fd fd fd
    0x7ccff6e0e080: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa fa
  =>0x7ccff6e0e100: fa fa fa fa fa[fa]00 00 00 00 00 00 00 00 00 00
    0x7ccff6e0e180: 00 00 00 00 00 00 00 04 fa fa fa fa fa fa fa fa
    0x7ccff6e0e200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x7ccff6e0e280: 00 00 fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7ccff6e0e300: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7ccff6e0e380: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  Shadow byte legend (one shadow byte represents 8 application bytes):
    Addressable:           00
    Partially addressable: 01 02 03 04 05 06 07
    Heap left redzone:       fa
    Freed heap region:       fd
    Stack left redzone:      f1
    Stack mid redzone:       f2
    Stack right redzone:     f3
    Stack after return:      f5
    Stack use after scope:   f8
    Global redzone:          f9
    Global init order:       f6
    Poisoned by user:        f7
    Container overflow:      fc
    Array cookie:            ac
    Intra object redzone:    bb
    ASan internal:           fe
    Left alloca redzone:     ca
    Right alloca redzone:    cb
  ==1500371==ABORTING

Verify that the file is large enough to contain both the header and the
footer before computing the table size.
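
A sketch of such a check, written so that the check itself cannot
wrap around. The helper `table_data_size()` is hypothetical, and the
header and footer sizes are passed in as parameters instead of being
taken from the reftable code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the size of the data section, or return -1
 * if the file cannot hold both the header and the footer.
 */
static int table_data_size(uint64_t file_size, uint64_t header_size,
			   uint64_t footer_size, uint64_t *out)
{
	assert(out);
	if (file_size < header_size || file_size - header_size < footer_size)
		return -1; /* truncated: the subtraction below would wrap */
	*out = file_size - footer_size;
	return 0;
}
```

With example sizes of 24 bytes for the header and 68 bytes for the
footer, a 91-byte file is rejected, while a 92-byte file yields a data
section of 24 bytes.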

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/table: fix NULL pointer access when seeking to bogus offsets

When seeking an iterator to an arbitrary offset we may return a positive
value in case the offset points beyond the block. This makes sense when
iterating through multiple blocks of the same section, as the positive
value indicates to us that we're at the end of the table.

But when the offset originates from a section or index offset it is
supposed to point at a valid block, so an out-of-bounds value means that
the table is corrupt. Treating it as a normal end-of-iteration causes us
to silently report an empty section instead of surfacing the corruption,
and we are left with a partially-initialized block. This may later on
cause a NULL pointer dereference:

  ==1486841==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x55555598e02c bp 0x7fffffff4eb0 sp 0x7fffffff4e70 T0)
  ==1486841==The signal is caused by a READ memory access.
  ==1486841==Hint: address points to the zero page.
      #0 0x55555598e02c in reftable_block_type ./git/build/../reftable/block.c:392:9
      #1 0x55555598ee6e in block_iter_seek_key ./git/build/../reftable/block.c:536:35
      #2 0x5555559ae553 in table_iter_seek_linear ./git/build/../reftable/table.c:344:8
      #3 0x5555559adbff in table_iter_seek ./git/build/../reftable/table.c:450:9
      #4 0x5555559ada9c in table_iter_seek_void ./git/build/../reftable/table.c:460:9
      #5 0x555555992872 in reftable_iterator_seek_log_at ./git/build/../reftable/iter.c:281:9
      #6 0x555555992953 in reftable_iterator_seek_log ./git/build/../reftable/iter.c:287:9
      #7 0x55555583aa78 in test_reftable_table__seek_invalid_log_offset ./git/build/../t/unit-tests/u-reftable-table.c:257:20
      #8 0x5555557f684e in clar_run_test ./git/build/../t/unit-tests/clar/clar.c:335:3
      #9 0x5555557f2e69 in clar_run_suite ./git/build/../t/unit-tests/clar/clar.c:431:3
      #10 0x5555557f2882 in clar_test_run ./git/build/../t/unit-tests/clar/clar.c:636:4
      #11 0x5555557f375f in clar_test ./git/build/../t/unit-tests/clar/clar.c:687:11
      #12 0x5555557fa49d in cmd_main ./git/build/../t/unit-tests/unit-test.c:62:8
      #13 0x55555584cffa in main ./git/build/../common-main.c:9:11
      #14 0x7ffff7a2b284 in __libc_start_call_main (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b284) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #15 0x7ffff7a2b337 in __libc_start_main@GLIBC_2.2.5 (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b337) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #16 0x555555694c24 in _start (./git/build/t/unit-tests+0x140c24)

  ==1486841==Register values:
  rax = 0x0000000000000000  rbx = 0x00007fffffff4ec0  rcx = 0x0000000000000000  rdx = 0x00007cfff6e2bd58
  rdi = 0x00007cfff6e2bd58  rsi = 0x00007bfff5da1020  rbp = 0x00007fffffff4eb0  rsp = 0x00007fffffff4e70
   r8 = 0x0000000000000000   r9 = 0x0000000000000002  r10 = 0x0000000000000000  r11 = 0x0000000000000017
  r12 = 0x00007fffffff5908  r13 = 0x0000000000000001  r14 = 0x00007ffff7ffd000  r15 = 0x0000555556056e90
  AddressSanitizer can not provide additional info.
  SUMMARY: AddressSanitizer: SEGV ./git/build/../reftable/block.c:392:9 in reftable_block_type
  ==1486841==ABORTING

Fix this by returning a proper error in `table_iter_seek_to()` when the
offset ranges beyond the block.
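
The distinction between "end of iteration" and "corrupt table" can be
modelled as below. This is a simplified sketch and not the actual
`table_iter_seek_to()`: the function name, the error constant and the
`from_section_offset` flag are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_FORMAT_ERROR (-1) /* hypothetical error code */

/*
 * Hypothetical model of seeking to a block offset. `from_section_offset`
 * is non-zero when the offset came from a section or index pointer and is
 * thus required to point at a valid block.
 */
static int seek_to_offset(uint64_t off, uint64_t table_size,
			  int from_section_offset)
{
	if (off >= table_size) {
		if (from_section_offset)
			return SKETCH_FORMAT_ERROR; /* corrupt table */
		return 1; /* plain end of iteration */
	}
	return 0; /* offset is in range */
}
```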

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/block: fix OOB read with bogus restart offset

Restart points encode records in a given block that do not use prefix
compression and that can thus immediately be seeked to. These offsets
are encoded in the restart table, where each offset needs to point at
one of the records of the block. We do not verify this though, so a
bogus restart offset may cause an out-of-bounds read:

  ==1472280==ERROR: AddressSanitizer: SEGV on unknown address 0x7d8ff7de5f7f (pc 0x55555599502b bp 0x7fffffff4df0 sp 0x7fffffff4d40 T0)
  ==1472280==The signal is caused by a READ memory access.
      #0 0x55555599502b in get_var_int ./git/build/../reftable/record.c:30:6
      #1 0x555555995c2a in reftable_decode_keylen ./git/build/../reftable/record.c:177:6
      #2 0x55555598e85c in restart_needle_less ./git/build/../reftable/block.c:455:6
      #3 0x55555598895f in binsearch ./git/build/../reftable/basics.c:175:9
      #4 0x55555598e189 in block_iter_seek_key ./git/build/../reftable/block.c:543:6
      #5 0x555555814aee in test_reftable_block__corrupt_restart_offset ./git/build/../t/unit-tests/u-reftable-block.c:636:20
      #6 0x5555557f684e in clar_run_test ./git/build/../t/unit-tests/clar/clar.c:335:3
      #7 0x5555557f2e69 in clar_run_suite ./git/build/../t/unit-tests/clar/clar.c:431:3
      #8 0x5555557f2882 in clar_test_run ./git/build/../t/unit-tests/clar/clar.c:636:4
      #9 0x5555557f375f in clar_test ./git/build/../t/unit-tests/clar/clar.c:687:11
      #10 0x5555557fa49d in cmd_main ./git/build/../t/unit-tests/unit-test.c:62:8
      #11 0x55555584c25a in main ./git/build/../common-main.c:9:11
      #12 0x7ffff7a2b284 in __libc_start_call_main (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b284) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #13 0x7ffff7a2b337 in __libc_start_main@GLIBC_2.2.5 (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b337) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #14 0x555555694c24 in _start (./git/build/t/unit-tests+0x140c24)

  ==1472280==Register values:
  rax = 0x00007d8ff7de5f7f  rbx = 0x00007fffffff4e00  rcx = 0x00007d8ff7de5f80  rdx = 0x00007bfff5b6af60
  rdi = 0x00007bfff5b6af40  rsi = 0x00007bfff592dfa0  rbp = 0x00007fffffff4df0  rsp = 0x00007fffffff4d40
   r8 = 0x00000000ff00002b   r9 = 0x00007d8ff7de5f7f  r10 = 0x00000f7ffeb25bf0  r11 = 0xf3f30000f1f1f1f1
  r12 = 0x00007fffffff58f8  r13 = 0x0000000000000001  r14 = 0x00007ffff7ffd000  r15 = 0x0000555556055fd0
  AddressSanitizer can not provide additional info.
  SUMMARY: AddressSanitizer: SEGV ./git/build/../reftable/record.c:30:6 in get_var_int

Guard against such restart offsets and signal an error to the caller via
`args.error`.
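
A bounds check of this kind could look as follows. The helper is
hypothetical and only verifies that the offset lies within the record
area of the block. It does not verify that the offset lands on the
start of a record.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical guard: a restart offset must lie inside the record area,
 * which spans [records_start, records_end) within the block.
 */
static int restart_offset_in_bounds(uint32_t off, uint32_t records_start,
				    uint32_t records_end)
{
	assert(records_start <= records_end);
	return off >= records_start && off < records_end;
}
```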

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/block: fix use of uninitialized memory when binsearch fails

When doing the binary search through our restart offsets we may hit an
error in case `restart_needle_less()` fails to decode the record at the
given offset. While we correctly detect this case and error out, it will
cause us to call `reftable_record_release()` on the yet-uninitialized
record.

Fix this by initializing the record earlier.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/block: fix OOB read with bogus restart count

The restart count is stored in the last two bytes of a block. We use it
without verification to compute the offset of the restart table. With a
bogus restart count that is large enough this computation underflows,
and the subsequent reads via the restart table access out-of-bounds
memory:

  ==129439==ERROR: AddressSanitizer: SEGV on unknown address 0x7d90f6dcd0ad (pc 0x55555598ce89 bp 0x7fffffff4ed0 sp 0x7fffffff4e80 T0)
  ==129439==The signal is caused by a READ memory access.
      #0 0x55555598ce89 in reftable_get_be24 ./git/build/../reftable/basics.h:125:9
      #1 0x55555598eabf in block_restart_offset ./git/build/../reftable/block.c:407:9
      #2 0x55555598e5d5 in restart_needle_less ./git/build/../reftable/block.c:431:17
      #3 0x5555559887e2 in binsearch ./git/build/../reftable/basics.c:165:13
      #4 0x55555598dfec in block_iter_seek_key ./git/build/../reftable/block.c:529:6
      #5 0x555555814517 in test_reftable_block__corrupt_restart_count ./git/build/../t/unit-tests/u-reftable-block.c:593:15
      #6 0x5555557f684e in clar_run_test ./git/build/../t/unit-tests/clar/clar.c:335:3
      #7 0x5555557f2e69 in clar_run_suite ./git/build/../t/unit-tests/clar/clar.c:431:3
      #8 0x5555557f2882 in clar_test_run ./git/build/../t/unit-tests/clar/clar.c:636:4
      #9 0x5555557f375f in clar_test ./git/build/../t/unit-tests/clar/clar.c:687:11
      #10 0x5555557fa49d in cmd_main ./git/build/../t/unit-tests/unit-test.c:62:8
      #11 0x55555584c12a in main ./git/build/../common-main.c:9:11
      #12 0x7ffff7a2b284 in __libc_start_call_main (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b284) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #13 0x7ffff7a2b337 in __libc_start_main@GLIBC_2.2.5 (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b337) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #14 0x555555694c24 in _start (./git/build/t/unit-tests+0x140c24)

  ==129439==Register values:
  rax = 0x00007d90f6dcd0ad  rbx = 0x00007fffffff4f20  rcx = 0xf2f2f2f8f2f2f2f8  rdx = 0x0000000000000000
  rdi = 0x00007d90f6dcd0ad  rsi = 0x0000000000007fff  rbp = 0x00007fffffff4ed0  rsp = 0x00007fffffff4e80
   r8 = 0x0000000000000000   r9 = 0x0000000000000000  r10 = 0x0000000000000000  r11 = 0x0000000000000017
  r12 = 0x00007fffffff58e8  r13 = 0x0000000000000001  r14 = 0x00007ffff7ffd000  r15 = 0x00005555560550b0
  AddressSanitizer can not provide additional info.
  SUMMARY: AddressSanitizer: SEGV ./git/build/../reftable/basics.h:125:9 in reftable_get_be24

Verify that the restart table actually fits into the block.
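
A sketch of that verification. It assumes that each restart offset
occupies three bytes, as suggested by the `reftable_get_be24()` call in
the trace above, and that the two-byte restart count directly follows
the restart table. The helper name and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: store the offset of the restart table in `out` and
 * return 0, or return -1 if the table does not fit into the block.
 */
static int restart_table_offset(uint32_t block_size, uint32_t header_size,
				uint16_t restart_count, uint32_t *out)
{
	/* Two bytes for the count plus three bytes per restart offset. */
	uint32_t needed = 2 + 3 * (uint32_t)restart_count;

	assert(out);
	if (block_size < header_size || block_size - header_size < needed)
		return -1; /* bogus count: the subtraction below would wrap */
	*out = block_size - needed;
	return 0;
}
```

For a 100-byte block with a 4-byte header, a restart count of 2 places
the table at offset 92, whereas a count of 0x7fff is rejected.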

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/block: fix OOB read with bogus block size

The block size is read from the block header, which is untrusted data.
We use it without verification to access the restart count at the end of
the block as well as to compute the restart table offset. With a bogus
block size that exceeds the data we have actually read this can lead to
an out-of-bounds read:

  ==2274138==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7c3ff6de2e3f at pc 0x55555598c6ea bp 0x7fffffff4ee0 sp 0x7fffffff4ed8
  READ of size 1 at 0x7c3ff6de2e3f thread T0
      #0 0x55555598c6e9 in reftable_get_be16 /home/pks/Development/git/build/../reftable/basics.h:119:20
      #1 0x55555598c252 in reftable_block_init /home/pks/Development/git/build/../reftable/block.c:343:18
      #2 0x555555813c70 in test_reftable_block__corrupt_block_size /home/pks/Development/git/build/../t/unit-tests/u-reftable-block.c:531:20
      #3 0x5555557f684e in clar_run_test /home/pks/Development/git/build/../t/unit-tests/clar/clar.c:335:3
      #4 0x5555557f2e69 in clar_run_suite /home/pks/Development/git/build/../t/unit-tests/clar/clar.c:431:3
      #5 0x5555557f2882 in clar_test_run /home/pks/Development/git/build/../t/unit-tests/clar/clar.c:636:4
      #6 0x5555557f375f in clar_test /home/pks/Development/git/build/../t/unit-tests/clar/clar.c:687:11
      #7 0x5555557fa49d in cmd_main /home/pks/Development/git/build/../t/unit-tests/unit-test.c:62:8
      #8 0x55555584b8aa in main /home/pks/Development/git/build/../common-main.c:9:11
      #9 0x7ffff7a2b284 in __libc_start_call_main (/nix/store/8kvxvr3pmsypxiypq4g8zy13glnfr7nx-glibc-2.42-67/lib/libc.so.6+0x2b284) (BuildId: 5a702452a01df1d7d50ce0663acec7be3c71fd4d)
      #10 0x7ffff7a2b337 in __libc_start_main@GLIBC_2.2.5 (/nix/store/8kvxvr3pmsypxiypq4g8zy13glnfr7nx-glibc-2.42-67/lib/libc.so.6+0x2b337) (BuildId: 5a702452a01df1d7d50ce0663acec7be3c71fd4d)
      #11 0x555555694c24 in _start (/home/pks/Development/git/build/t/unit-tests+0x140c24)

  0x7c3ff6de2e3f is located 0 bytes after 47-byte region [0x7c3ff6de2e10,0x7c3ff6de2e3f)
  allocated by thread T0 here:
      #0 0x55555579e95b in malloc (/home/pks/Development/git/build/t/unit-tests+0x24a95b)
      #1 0x5555559871c2 in reftable_malloc /home/pks/Development/git/build/../reftable/basics.c:24:9
      #2 0x5555559872e8 in reftable_calloc /home/pks/Development/git/build/../reftable/basics.c:54:6
      #3 0x55555598f0d3 in reftable_buf_read_data /home/pks/Development/git/build/../reftable/blocksource.c:67:2
      #4 0x55555598ea7e in block_source_read_data /home/pks/Development/git/build/../reftable/blocksource.c:41:19
      #5 0x55555598c555 in read_block /home/pks/Development/git/build/../reftable/block.c:224:9
      #6 0x55555598b69e in reftable_block_init /home/pks/Development/git/build/../reftable/block.c:258:9
      #7 0x555555813c70 in test_reftable_block__corrupt_block_size /home/pks/Development/git/build/../t/unit-tests/u-reftable-block.c:531:20
      #8 0x5555557f684e in clar_run_test /home/pks/Development/git/build/../t/unit-tests/clar/clar.c:335:3
      #9 0x5555557f2e69 in clar_run_suite /home/pks/Development/git/build/../t/unit-tests/clar/clar.c:431:3
      #10 0x5555557f2882 in clar_test_run /home/pks/Development/git/build/../t/unit-tests/clar/clar.c:636:4
      #11 0x5555557f375f in clar_test /home/pks/Development/git/build/../t/unit-tests/clar/clar.c:687:11
      #12 0x5555557fa49d in cmd_main /home/pks/Development/git/build/../t/unit-tests/unit-test.c:62:8
      #13 0x55555584b8aa in main /home/pks/Development/git/build/../common-main.c:9:11
      #14 0x7ffff7a2b284 in __libc_start_call_main (/nix/store/8kvxvr3pmsypxiypq4g8zy13glnfr7nx-glibc-2.42-67/lib/libc.so.6+0x2b284) (BuildId: 5a702452a01df1d7d50ce0663acec7be3c71fd4d)
      #15 0x7ffff7a2b337 in __libc_start_main@GLIBC_2.2.5 (/nix/store/8kvxvr3pmsypxiypq4g8zy13glnfr7nx-glibc-2.42-67/lib/libc.so.6+0x2b337) (BuildId: 5a702452a01df1d7d50ce0663acec7be3c71fd4d)
      #16 0x555555694c24 in _start (/home/pks/Development/git/build/t/unit-tests+0x140c24)

  SUMMARY: AddressSanitizer: heap-buffer-overflow /home/pks/Development/git/build/../reftable/basics.h:119:20 in reftable_get_be16
  Shadow bytes around the buggy address:
    0x7c3ff6de2b80: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
    0x7c3ff6de2c00: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
    0x7c3ff6de2c80: fa fa fd fd fd fd fd fd fa fa fd fd fd fd fd fa
    0x7c3ff6de2d00: fa fa fd fd fd fd fd fd fa fa fd fd fd fd fd fa
    0x7c3ff6de2d80: fa fa 00 00 00 00 00 00 fa fa fd fd fd fd fd fd
  =>0x7c3ff6de2e00: fa fa 00 00 00 00 00[07]fa fa fa fa fa fa fa fa
    0x7c3ff6de2e80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7c3ff6de2f00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7c3ff6de2f80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7c3ff6de3000: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7c3ff6de3080: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  Shadow byte legend (one shadow byte represents 8 application bytes):
    Addressable:           00
    Partially addressable: 01 02 03 04 05 06 07
    Heap left redzone:       fa
    Freed heap region:       fd
    Stack left redzone:      f1
    Stack mid redzone:       f2
    Stack right redzone:     f3
    Stack after return:      f5
    Stack use after scope:   f8
    Global redzone:          f9
    Global init order:       f6
    Poisoned by user:        f7
    Container overflow:      fc
    Array cookie:            ac
    Intra object redzone:    bb
    ASan internal:           fe
    Left alloca redzone:     ca
    Right alloca redzone:    cb

Verify that the claimed block size fits into the block data before using
it.
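
As a rough illustration of such a check, here is a minimal sketch. It
assumes a 4-byte block header (one type byte followed by a 24-bit
big-endian size) and a 2-byte restart count at the end of the block. All
`sketch_` names and the error code are hypothetical and not taken from
the reftable library, and a file header preceding the first block is
ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error code; the real library defines its own. */
#define SKETCH_FORMAT_ERROR (-1)

/* Decode a 24-bit big-endian integer. */
static uint32_t sketch_get_be24(const unsigned char *p)
{
    return ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | (uint32_t)p[2];
}

/*
 * Validate the block size claimed by the header against the `data_len`
 * bytes that were actually read. Returns 0 and stores the size on
 * success, SKETCH_FORMAT_ERROR otherwise.
 */
static int sketch_check_block_size(const unsigned char *data, size_t data_len,
                                   uint32_t *block_size)
{
    uint32_t claimed;

    assert(data && block_size);
    if (data_len < 4)
        return SKETCH_FORMAT_ERROR;

    claimed = sketch_get_be24(data + 1);
    /* The block must hold the header and the restart count ... */
    if (claimed < 4 + 2)
        return SKETCH_FORMAT_ERROR;
    /* ... and must not extend past the data we have read. */
    if (claimed > data_len)
        return SKETCH_FORMAT_ERROR;

    *block_size = claimed;
    return 0;
}
```

Only after such a check succeeds is it safe to read the restart count
at `data + block_size - 2`.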

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/block: fix OOB write with bogus inflated log size

The "log" reftable block stores reflog information. This information is
compressed using zlib. The inflated size is stored in the block header
so that callers can easily learn ahead of time how large of a buffer
they have to allocate to inflate the data in a single pass. So to
reconstruct the full inflated block we:

  - Copy over the header as-is, as it's not deflated.

  - Append the inflated data to the buffer.

The inflated block size stored in the header also includes the length of
the header itself. So to figure out the bytes that should be inflated by
zlib we need to subtract the header size, which is trusted data, from
the block size, which is untrusted data derived from the block header.

While we do verify that we were able to inflate all data as expected, we
don't verify ahead of time that the encoded block length is larger than
the header length. This can lead to an underflow, which makes zlib
assume that it can write more data into the target buffer than we have
allocated. The result is an out-of-bounds write:

  ==1422297==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7c1ff6de5231 at pc 0x55555579a628 bp 0x7fffffff4f10 sp 0x7fffffff46d0
  WRITE of size 4 at 0x7c1ff6de5231 thread T0
      #0 0x55555579a627 in __asan_memcpy (./build/t/unit-tests+0x246627)
      #1 0x55555598b093 in reftable_block_init ./build/../reftable/block.c:277:3
      #2 0x555555813701 in test_reftable_block__corrupt_log_block_size ./build/../t/unit-tests/u-reftable-block.c:495:20
      #3 0x5555557f684e in clar_run_test ./build/../t/unit-tests/clar/clar.c:335:3
      #4 0x5555557f2e69 in clar_run_suite ./build/../t/unit-tests/clar/clar.c:431:3
      #5 0x5555557f2882 in clar_test_run ./build/../t/unit-tests/clar/clar.c:636:4
      #6 0x5555557f375f in clar_test ./build/../t/unit-tests/clar/clar.c:687:11
      #7 0x5555557fa49d in cmd_main ./build/../t/unit-tests/unit-test.c:62:8
      #8 0x55555584af4a in main ./build/../common-main.c:9:11
      #9 0x7ffff7a2b284 in __libc_start_call_main (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b284) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #10 0x7ffff7a2b337 in __libc_start_main@GLIBC_2.2.5 (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b337) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #11 0x555555694c24 in _start (./build/t/unit-tests+0x140c24)

  0x7c1ff6de5231 is located 0 bytes after 1-byte region [0x7c1ff6de5230,0x7c1ff6de5231)
  allocated by thread T0 here:
      #0 0x55555579db1b in realloc.part.0 asan_malloc_linux.cpp.o
      #1 0x5555559868d7 in reftable_realloc ./build/../reftable/basics.c:36:9
      #2 0x55555598a98f in reftable_alloc_grow ./build/../reftable/basics.h:229:10
      #3 0x55555598ae58 in reftable_block_init ./build/../reftable/block.c:269:3
      #4 0x555555813701 in test_reftable_block__corrupt_log_block_size ./build/../t/unit-tests/u-reftable-block.c:495:20
      #5 0x5555557f684e in clar_run_test ./build/../t/unit-tests/clar/clar.c:335:3
      #6 0x5555557f2e69 in clar_run_suite ./build/../t/unit-tests/clar/clar.c:431:3
      #7 0x5555557f2882 in clar_test_run ./build/../t/unit-tests/clar/clar.c:636:4
      #8 0x5555557f375f in clar_test ./build/../t/unit-tests/clar/clar.c:687:11
      #9 0x5555557fa49d in cmd_main ./build/../t/unit-tests/unit-test.c:62:8
      #10 0x55555584af4a in main ./build/../common-main.c:9:11
      #11 0x7ffff7a2b284 in __libc_start_call_main (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b284) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #12 0x7ffff7a2b337 in __libc_start_main@GLIBC_2.2.5 (/nix/store/57iz36553175g3178pvxjij8z5rcsd4n-glibc-2.42-61/lib/libc.so.6+0x2b337) (BuildId: 8ae0b698f2d4e727f569f64bb166e08ae30bd077)
      #13 0x555555694c24 in _start (./build/t/unit-tests+0x140c24)

  SUMMARY: AddressSanitizer: heap-buffer-overflow (./build/t/unit-tests+0x246627) in __asan_memcpy
  Shadow bytes around the buggy address:
    0x7c1ff6de4f80: fa fa fd fd fa fa fd fd fa fa fd fd fa fa fd fd
    0x7c1ff6de5000: fa fa fd fd fa fa fd fd fa fa fd fd fa fa fd fd
    0x7c1ff6de5080: fa fa fd fd fa fa fd fd fa fa fd fd fa fa fd fd
    0x7c1ff6de5100: fa fa fd fd fa fa fd fd fa fa fd fd fa fa fd fd
    0x7c1ff6de5180: fa fa fd fd fa fa fd fd fa fa fd fa fa fa fd fd
  =>0x7c1ff6de5200: fa fa 04 fa fa fa[01]fa fa fa fa fa fa fa fa fa
    0x7c1ff6de5280: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7c1ff6de5300: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7c1ff6de5380: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7c1ff6de5400: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x7c1ff6de5480: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  Shadow byte legend (one shadow byte represents 8 application bytes):
    Addressable:           00
    Partially addressable: 01 02 03 04 05 06 07
    Heap left redzone:       fa
    Freed heap region:       fd
    Stack left redzone:      f1
    Stack mid redzone:       f2
    Stack right redzone:     f3
    Stack after return:      f5
    Stack use after scope:   f8
    Global redzone:          f9
    Global init order:       f6
    Poisoned by user:        f7
    Container overflow:      fc
    Array cookie:            ac
    Intra object redzone:    bb
    ASan internal:           fe
    Left alloca redzone:     ca
    Right alloca redzone:    cb

Fix the bug by adding a sanity check and add a unit test.
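
The underflow described above can be sketched as follows. The helper
name and its signature are hypothetical; the point is only that the
untrusted block size has to be compared against the header size before
the subtraction happens.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Compute how many bytes zlib may write behind the header. Returns 0 on
 * success and -1 if the claimed block size cannot even hold the header.
 */
static int sketch_inflated_payload_size(size_t block_size, size_t header_size,
                                        size_t *payload_size)
{
    assert(payload_size);

    /*
     * Without this check `block_size - header_size` wraps around to a
     * huge value, and zlib would be told it may write far more data
     * than the buffer holds.
     */
    if (block_size < header_size)
        return -1;

    *payload_size = block_size - header_size;
    return 0;
}
```

Whether a block whose size equals the header size (an empty payload)
should be rejected as well is left open in this sketch.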

Reported-by: oxsignal <awo@kakao.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t/unit-tests: introduce test helper to write reftable blocks

Introduce a new test helper that allows us to write reftable blocks.
This helper will be used by subsequent commits.

Suggested-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/record: don't abort when decoding invalid ref value type

When decoding a ref record we read its value type from the block. In
case the type itself is invalid we call `abort()`. This is rather
heavy-handed though: the data we're reading is untrusted, so we should
treat the issue as a normal error and not as a programming error.

Fix this by handling the error gracefully. Note that this also requires
us to set the value type later, as otherwise we might store an invalid
type in the record.
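
A minimal sketch of the pattern, using hypothetical `sketch_` types
rather than the actual reftable structures: the raw value is validated
first, and the record is only modified once the type is known to be
valid.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ref value types. */
enum sketch_ref_value_type {
    SKETCH_REF_DELETION = 0,
    SKETCH_REF_VAL1 = 1,
    SKETCH_REF_VAL2 = 2,
    SKETCH_REF_SYMREF = 3,
};

struct sketch_ref_record {
    uint8_t value_type;
};

/*
 * Decode an untrusted value type. Returns 0 on success and -1 on an
 * invalid type; the record is left untouched in the error case.
 */
static int sketch_decode_value_type(struct sketch_ref_record *rec, uint8_t raw)
{
    assert(rec); /* a NULL record would be a programming error */

    switch (raw) {
    case SKETCH_REF_DELETION:
    case SKETCH_REF_VAL1:
    case SKETCH_REF_VAL2:
    case SKETCH_REF_SYMREF:
        break;
    default:
        /* Untrusted input: report the error instead of abort(). */
        return -1;
    }

    /* Set the type late so that we never store an invalid value. */
    rec->value_type = raw;
    return 0;
}
```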

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/basics: fix OOB read on binary search of empty range

`binsearch()` performs a binary search over a range of `sz` elements by
repeatedly calling the comparison function with indices into that range.
When the range is empty though, there is no valid index to call the
comparison function with. We still end up executing the comparison
function though with an index of 0, which of course will cause an
out-of-bounds read.

Return early when the range is empty.
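
To illustrate, here is a sketch of a binary search of this shape. It is
not the reftable implementation: the names are hypothetical, and it
assumes a predicate that is false for a prefix of the range and true
for the rest. The final `f(0, args)` call is what would touch index 0
of an empty range if the early return were missing.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the smallest index in [0, sz) for which `f` is true, or `sz`
 * if there is none.
 */
static size_t sketch_binsearch(size_t sz, int (*f)(size_t k, void *args),
                               void *args)
{
    size_t lo = 0, hi = sz;

    assert(f);

    /* An empty range has no valid index to probe. */
    if (!sz)
        return 0;

    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (f(mid, args))
            hi = mid;
        else
            lo = mid;
    }

    if (lo)
        return hi;
    return f(0, args) ? 0 : 1;
}

/* Example predicate: is values[k] >= key? Counts its invocations. */
struct sketch_search_args {
    const int *values;
    int key;
    size_t calls;
};

static int sketch_value_ge_key(size_t k, void *args)
{
    struct sketch_search_args *a = args;
    a->calls++;
    return a->values[k] >= a->key;
}
```

With the early return in place, searching an empty range never invokes
the predicate, so `values` is not dereferenced at all.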

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

oss-fuzz: add fuzzer for parsing reftables

Add a new fuzzer that exercises our parsing of reftables. Fallout from
this fuzzer will be fixed over subsequent commits.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

meson: support building fuzzers with libFuzzer

To support fuzzing via libFuzzer one has to pass a couple of compiler
options:

  - It is mandatory to enable the "fuzzer-no-link" sanitizer for
    coverage feedback.

  - It is recommended to enable at least one more sanitizer to catch
    issues, like the "address" sanitizer.

  - The fuzzing executables need to be linked with "-fsanitize=fuzzer"
    to wire up libFuzzer itself.

The first two items can already be achieved via the "-Db_sanitize="
option. But the last item cannot easily be achieved, as we can only
configure global link arguments.

Introduce a new "-Dfuzzers_link_args=" build option to plug this gap.
Add documentation so that users know how to set up libFuzzer.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/history: implement "drop" subcommand

A common operation when editing the commit history is to drop a specific
commit from the history entirely, but this operation is not currently
covered by git-history(1).

A couple of noteworthy bits:

  - This is the first git-history(1) command that will ultimately result
    in changes to both the index and the working tree. We thus have to
    add logic to merge resulting changes into those.

  - It is still not possible to replay merge commits, so this limitation
    is inherited for the new "drop" command.

  - For now we refuse to drop root commits. While we _can_ indeed drop
    root commits in the general case, there are edge cases where the
    resulting history would become completely empty. This is thus left
    to a subsequent patch series.

Other than that, the logic is rather straightforward, as we can largely
build on the preexisting logic in git-history(1).

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/history: split handling of ref updates into two phases

The function `handle_reference_updates()` is used by git-history(1) to
update all references that refer to commits that have been rewritten. As
such, it performs two steps:

  - It gathers the references that need to be updated in the first
    place.

  - It prepares and commits the reference transaction.

In a subsequent commit we'll want to handle those two steps separately.
Prepare for this by splitting up the function into two.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

replay: expose `replay_result_queue_update()`

Expose `replay_result_queue_update()`, which is used to append another
reference update to the replay result. This function will be used in a
subsequent commit.

Suggested-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reset: stop assuming that the caller passes in a clean index

In 652bd0211d (rebase: use 'skip_cache_tree_update' option, 2022-11-10),
we updated `reset_working_tree()` to stop updating the index tree cache.
This was done as a performance optimization: the function is only called
by "sequencer.c" and "rebase.c", both of which assume a clean index
before they perform their operation, so we know that the end result will
be a clean index, too. Consequently, we can skip recomputing the cache
as we can instead use `prime_cache_tree()` directly.

In a subsequent commit we're about to add a new caller though where the
assumption doesn't hold anymore: the index may be dirty before calling
`reset_working_tree()`, and consequently we cannot prime the cache with
a given tree anymore as the index and tree will mismatch.

Adapt the logic so that we only skip the cache tree update in case we're
doing a hard reset. While we could introduce logic that only skips the
update in case the incoming index was dirty already, that doesn't really
feel worth it: after all, the mentioned commit says itself that the
performance improvement was negligible anyway.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reset: allow the caller to specify the current HEAD object

When calling `reset_working_tree()` we automatically derive the commit
that the caller wants to move from by reading the HEAD commit. Some
callers may already have resolved it, or they may want to move from a
different commit that doesn't match HEAD.

Introduce a new `oid_from` option that lets the caller specify the
commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reset: introduce ability to skip updating HEAD

In a subsequent commit we'll introduce a new caller to
`reset_working_tree()` that really only wants to update the index and
working tree, without updating any references. Introduce a new flag that
makes the caller opt in to updating HEAD and adapt all callers to set
that flag.

Note that in a previous iteration we instead introduced a flag that made
callers opt out of updating any references. This was somewhat awkward
though because we already have the `UPDATE_ORIG_HEAD` flag, so the
result was somewhat inconsistent.

Suggested-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
[jc: fixed-up a typo pointed out by Christian]
Signed-off-by: Junio C Hamano <gitster@pobox.com>

meson: restore hook-list.h to builtin_sources

This fixes a racy build failure.

```
builtin/bugreport.c:12:10: fatal error: hook-list.h: No such file or directory
   12 | #include "hook-list.h"
      |          ^~~~~~~~~~~~~
```

hook-list.h must be generated before builtin/bugreport.c is compiled.

Bug: https://bugs.gentoo.org/978326
Fixes: 2eb541e8f2a9 (hook: move is_known_hook() to hook.c for wider use, 2026-04-10)
Signed-off-by: Mike Gilbert <floppym@gentoo.org>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

odb: document object info fields

Some of the fields in `struct object_info` are undocumented. Add these
missing comments.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

odb: drop `whence` field from object info

In the preceding commits we have migrated all callers that need to know
how a specific object is stored to use the new object info source
instead, and hence the `whence` field is now unused. Drop it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

treewide: convert users of `whence` to the new source field

The `whence` field has become redundant now that callers can learn about
the exact source an object has been looked up from via the `struct
object_info_source::source` field.

Adapt callers to use the new field. Note that all callsites already set
up the `info.sourcep` request pointer, so the conversion is rather
straight-forward.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

odb: add `source` field to struct object_info_source

The previous commit introduced `struct object_info_source` as an opt-in
container for backend-specific information, but for now we only moved
preexisting data into this structure. Most importantly, the caller has
no way yet to learn about which source an object was actually looked up
from. Instead, callers have to rely on the `whence` enum to distinguish
the object type, but cannot use that enum to tell the object source.

Add a `struct odb_source *source` field to the structure and populate it
from each backend's lookup path.

The `whence` enum is still set and used by callers; it will be removed
in a subsequent commit now that `sourcep->source` can identify the
backend on its own.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

odb: make backend-specific fields optional

The `struct object_info` carries two pieces of information
about how an object was looked up:

  - The `whence` enum identifying the backend.

  - The backend-tagged union `u` exposing backend-specific details
    (currently only the packed-source case, which records the owning
    pack, offset and packed object type).

The union is populated unconditionally, even though most callers don't
care about provenance at all.

Split the backend-specific union out into a new public type, `struct
object_info_source`, and make the object info structure carry it via
just another opt-in request pointer. As with all the other requestable
information, callers that need source info allocate a `struct
object_info_source` on the stack and point `sourcep` at it; callers that
don't care about it simply leave the field as a `NULL` pointer. Adapt
callers accordingly.
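
The opt-in request pointer pattern can be sketched like this. The
structures below are simplified, hypothetical stand-ins and not the
actual `struct object_info` layout; the lookup function fills in fixed
dummy values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified source information. */
struct sketch_source_info {
    unsigned long offset;
    int is_packed;
};

/* Each field is a request: NULL means "caller does not care". */
struct sketch_object_info {
    unsigned long *sizep;
    struct sketch_source_info *sourcep;
};

/* Pretend lookup that only populates what was requested. */
static int sketch_lookup(struct sketch_object_info *oi)
{
    assert(oi);

    if (oi->sizep)
        *oi->sizep = 42;
    if (oi->sourcep) {
        oi->sourcep->is_packed = 1;
        oi->sourcep->offset = 12;
    }
    return 0;
}
```

A caller interested in provenance declares a `struct
sketch_source_info` on the stack and points `sourcep` at it before the
lookup; everyone else leaves `sourcep` as `NULL` and pays nothing for
it.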

Note that the `whence` enum is strictly-speaking also backend-specific
information, so it would be another good candidate to be moved into the
`struct object_info_source`. For now though it is left alone, as it will
be replaced by a `struct odb_source` pointer in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

packfile: thread odb_source_packed through packed_object_info()

Add an optional `struct odb_source_packed *source` parameter to
`packed_object_info()` and `packed_object_info_with_index_pos()`. This
parameter is unused at this point in time, but it will be used in a
follow-up commit so that we can record the source of a specific object.

Note that callers in "odb/source-packed.c" pass the already-available
source, but all other callers pass `NULL` instead. This is fine though,
as we only care about populating this info when called via the packed
store.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

hash: add platform-specific discard functions

Our git_hash_discard() is a bit hacky: it just calls git_hash_final()
into a dummy result buffer, using the side effect that each
implementation's Final() function will also free any resources.

This is probably not too terrible, since generating the final hash is
not that expensive and we'd mostly call discard on unusual or error code
paths. But we can do better by widening the platform API a bit to add an
explicit discard function.

This requires an annoying amount of boilerplate:

  - Each algorithm needs a git_$ALGO_discard() wrapper that dereferences
    the union'd git_hash_ctx into the type-safe field. So sha1 + sha256
    + sha1-unsafe, plus a BUG() for the unknown algo. And then these all
    need to be referenced in the git_hash_algo structs.

  - Platforms which don't do anything special to discard now need a
    fallback function which does nothing. And we need this for each algo
    (sha1, sha256, and sha1-unsafe).

  - Platforms which do need to discard must define their discard
    functions. This includes sha1/openssl, sha256/openssl, and
    sha256/gcrypt (no sha1-unsafe here as it sits atop the sha1/openssl
    functions).

  - Algo selection needs to point platform_*_Discard to the appropriate
    underlying macro, or indicate that the fallback should be used. We
    have a similar situation for the Clone function (where a straight
    memcpy() of the context struct is not enough for some platforms).
    I've tied Discard to the same flag used by Clone here, since they
    are basically the same problem: is the hash context a sequence of
    bytes, or does it need smart copying/discarding?

It's easy to miss a case here since we don't even compile the
implementations we aren't using. I've tested with each of:

  - no flags, which uses our internal sha1/sha256 implementations, both
    of which exercise the noop fallback function

  - OPENSSL_SHA1_UNSAFE=1, which checks that our unsafe macro
    redirections work

  - OPENSSL_SHA1=1, though you should not do that in real life!

  - OPENSSL_SHA256=1, passes tests with GIT_TEST_DEFAULT_HASH=sha256

  - GCRYPT_SHA256=1, which likewise passes

The other implementations do not set the CLONE_HELPER flag, so they
treat the context as bytes and should be fine with the fallback.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

hash: fix memory leak copying sha256 gcrypt handles

Our abstracted hash-algorithm API allows for cloning a hash context. By
default this just memcpy()s the bytes, but specific implementations can
provide a custom clone function.

Our API is based around the way that OpenSSL works, which is that you
first initialize the destination context, then copy into it. In our code
that is this:

  algo->init_fn(&dst);
  git_hash_clone(&dst, src);

and that translates into OpenSSL calls like:

  /* init_fn */
  dst->ectx = EVP_MD_CTX_new();
  EVP_DigestInit_ex(dst->ectx, EVP_sha256(), NULL);
  /* clone */
  EVP_MD_CTX_copy_ex(dst->ectx, src->ectx);

So the allocation happens in the first step, and then the clone is just
copying values (the DigestInit is initializing values that just get
overwritten, but that's not wrong, just a little inefficient).

But libgcrypt doesn't work like that! Its copy function initializes dst
from scratch. So when using the sha256 gcrypt backend, that becomes:

  /* init_fn; this allocates */
  gcry_md_open(&dst, GCRY_MD_SHA256, 0);
  /* clone; this also allocates, leaking the previous value! */
  gcry_md_copy(&dst, src);

You can see the leaks in the test suite by running:

  make \
    SANITIZE=leak \
    GCRYPT_SHA256=1 \
    GIT_TEST_DEFAULT_HASH=sha256 \
    test

which has many failures, as opposed to building with OPENSSL_SHA256,
which is leak-free.

The easy fix here is for the clone function to close the open context
we're about to overwrite. It's a little inefficient (we did a pointless
open in the init function), but probably not a big deal in practice.
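
The shape of that fix can be sketched with a mock. The `sketch_md_*`
functions below are hypothetical stand-ins that merely mimic the
allocation behavior described above (open allocates, copy allocates a
fresh handle); they are not the libgcrypt API. A counter of live
handles makes the leak observable.

```c
#include <assert.h>
#include <stdlib.h>

static int sketch_live_handles;

struct sketch_md {
    int state;
};

static struct sketch_md *sketch_md_open(void)
{
    struct sketch_md *h = calloc(1, sizeof(*h));
    if (h)
        sketch_live_handles++;
    return h;
}

static void sketch_md_close(struct sketch_md *h)
{
    if (!h)
        return;
    free(h);
    sketch_live_handles--;
}

/* Allocates a fresh handle and overwrites *dst without freeing it. */
static int sketch_md_copy(struct sketch_md **dst, const struct sketch_md *src)
{
    struct sketch_md *h = sketch_md_open();
    if (!h)
        return -1;
    h->state = src->state;
    *dst = h;
    return 0;
}

/* Clone into an already-initialized destination without leaking it. */
static int sketch_clone(struct sketch_md **dst, const struct sketch_md *src)
{
    assert(dst && src);

    /* Close the handle that the copy is about to overwrite. */
    sketch_md_close(*dst);
    *dst = NULL;
    return sketch_md_copy(dst, src);
}
```

Dropping the `sketch_md_close()` call in `sketch_clone()` leaves one
extra live handle per clone, which corresponds to the leak reported by
the sanitizer.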

If our API went the other way, assuming that we're always cloning into
garbage bytes, then we could be more efficient. We'd teach OpenSSL's
clone function to do its own new(), skip the DigestInit, and then copy
into it. And gcrypt could stick with just the copy() call.

But look again at the asymmetry in the very first code example. We call
the init function straight from the git_hash_algo struct, and then
subsequent calls are dispatched through our git_hash_* wrappers. If you
wanted to clone into an uninitialized destination, you'd do something
like:

  algo->clone_fn(&dst, src);

instead. That would require changing all of the callers. There's not
that many of them, but I don't know that it's worth changing our calling
conventions to try to reclaim this tiny bit of efficiency.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

http: discard hash in dumb-http http_object_request

Usually an object request results in finish_http_object_request()
calling git_hash_final_oid(), after we've received all of the data. But
if we hit an error, we'll bail early and free the http_object_request,
dropping the git_hash_ctx entirely. This can cause a leak for hash
implementations that allocate memory in their context, like OpenSSL >=
3.0.

The obvious fix is for abort_http_object_request() to call
git_hash_discard(), under the assumption that every request is either
finished or aborted. But that's not quite true:

  1. Not everybody calls the abort function. Sometimes they jump
     straight to release_http_object_request(). So we'd have to put it
     there.

  2. After the finish function finalizes the hash, we can still
     encounter errors! In that case we end up aborting or releasing,
     and they must not discard that hash (since that would be a
     double-free).

So we'll keep a flag marking the validity of the hash_ctx field of the
request. The lifetime is simple: it is valid immediately after creation,
up until we call finalize. And then our release function can just
conditionally discard the hash based on that flag.
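
A sketch of that lifetime, using a mock hash context. All names are
hypothetical, and the mock simply allocates on init and frees on
final/discard in order to stand in for an allocating hash
implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of contexts that have been initialized but not yet cleaned up. */
static int sketch_outstanding;

struct sketch_ctx {
    void *state;
};

struct sketch_request {
    struct sketch_ctx ctx;
    int hash_ctx_valid;
};

static void sketch_ctx_init(struct sketch_ctx *ctx)
{
    ctx->state = malloc(1);
    sketch_outstanding++;
}

/* Both final and discard free the state; neither is idempotent. */
static void sketch_ctx_final(struct sketch_ctx *ctx)
{
    free(ctx->state);
    sketch_outstanding--;
}

static void sketch_ctx_discard(struct sketch_ctx *ctx)
{
    free(ctx->state);
    sketch_outstanding--;
}

static void sketch_request_init(struct sketch_request *req)
{
    sketch_ctx_init(&req->ctx);
    req->hash_ctx_valid = 1; /* valid immediately after creation */
}

static void sketch_request_finish(struct sketch_request *req)
{
    assert(req->hash_ctx_valid);
    sketch_ctx_final(&req->ctx);
    req->hash_ctx_valid = 0; /* finalized: must not be discarded */
}

static void sketch_request_release(struct sketch_request *req)
{
    /* Only discard a context that was never finalized. */
    if (req->hash_ctx_valid) {
        sketch_ctx_discard(&req->ctx);
        req->hash_ctx_valid = 0;
    }
}
```

Both paths end up balanced: a request that is released without being
finished discards its context, while a finished request skips the
discard and thus avoids the double-free.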

This fixes test failures in t5550 and t5619 when run with:

  make SANITIZE=leak \
       OPENSSL_SHA256=1 \
       GIT_TEST_DEFAULT_HASH=sha256 \
       test

The flag handling could be removed if the hash-discard function were
idempotent. This could be done easily-ish by having the underlying
hash functions (like the ones in sha256/openssl.h) set the context
pointer to NULL after free-ing. But it's something that every platform
implementation would have to remember to do, and the benefit for the
callers is not that huge (it would let us shave a few lines here and
probably in a few other spots).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

check_stream_oid(): discard hash on read error

The happy path of check_stream_oid() is to initialize a hash, feed the
loose object zlib stream into it, and then get the final result. But if
we hit a zlib error or see extra cruft we'll bail early with an error.

Since we never call git_hash_final() in these cases, any resources held
by the git_hash_ctx may be leaked. Our default hash algorithms don't
allocate anything in the hash_ctx, but some implementations do. For
example, running:

  make SANITIZE=leak \
       OPENSSL_SHA256=1 \
       GIT_TEST_DEFAULT_HASH=sha256 \
       test

will fail t1450, since it feeds corrupted objects that cause us to bail
from check_stream_oid(). This patch fixes it by discarding the hash in
those early return paths. Trying to jump to a common "out:" label is not
worth it here, as we must _not_ discard a hash that was already fed to
git_hash_final(). And the hash_ctx itself does not carry any information
(so we cannot check for a NULL pointer, etc).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

patch-id: discard hash when done

When computing a patch-id, we have a flush_one_hunk() helper that calls
git_hash_final() on our running hunk git_hash_ctx, and then
reinitializes that context for the next hunk.

When we run out of hunks to look at, we return, discarding the
git_hash_ctx. This can cause a leak if the hash implementation we are
using allocates any memory during its initialization. This includes
OpenSSL >= 3.0, for both SHA-1 and SHA-256. Normally we would not use
SHA-1 here at all, as we only recommend using non-DC implementations for
the "unsafe" variant (and patch-id computation, though it probably
_could_ use the unsafe variant, was never taught to do so).

But it is certainly a problem for SHA-256, which you can see with:

  make SANITIZE=leak \
       OPENSSL_SHA256=1 \
       GIT_TEST_DEFAULT_HASH=sha256 \
       test

That results in leak failures of 60 scripts, 57 of which are fixed by
this patch (basically anything which runs rebase will hit this case).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

csum-file: provide a function to release checkpoints

A hashfile_checkpoint struct is basically just a copy of the hash_ctx
state at a given point in the file. As such, it contains its own
git_hash_ctx which may (depending on the underlying hash implementation)
need to be discarded when we're done with it.

Let's add a "release" function which cleans up the hash context it
holds. I chose "release" here and not "discard" because you'd use this
to clean up every checkpoint, whether you used it or not. As opposed to
git_hash_discard(), which is needed only if you didn't call
git_hash_final().

There are only two callers which use hashfile_checkpoints, and we can
add release calls to both. When built with "SANITIZE=leak
OPENSSL_SHA1_UNSAFE=1", this makes both t1050 and t9300 leak-free.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

csum-file: always finalize or discard hash

When a hashfile struct is created, we always initialize the git_hash_ctx
inside it. We usually end up in hashfile_finalize(), which passes that
ctx to git_hash_final(), cleaning it up.

But a few code paths don't do so:

  1. If we bail on the hashfile and call free_hashfile() directly rather
     than finalizing.

  2. If the skip_hash flag is set, the hashfile_finalize() call will
     never call git_hash_final(). (You might think that we should just
     avoid git_hash_init() entirely in this case, but the skip_hash flag
     is set by the caller after the hashfile is initialized).

For most hash implementations this is OK, but for ones that allocate on
initialization it causes a memory leak. You can see many failures by
running:

  make SANITIZE=leak OPENSSL_SHA1_UNSAFE=1 test

since OpenSSL >= 3.0 is such an allocating hash implementation (and
csum-file uses the "unsafe" algorithm variant).

We can solve this by calling git_hash_discard() as appropriate.

Note that free_hashfile() is used both directly by callers to abort
without finalizing, and by finalize_hashfile() to free memory. In the
latter case we _don't_ want to call git_hash_discard(), because we'll
already have either finalized or discarded it. So we'll push that to an
internal "free_memory" function, and keep free_hashfile() as the public
interface to abort a hashfile without finalizing.
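
The split between the public abort function and the internal
memory-only helper can be sketched as follows. This is a simplified
model under stated assumptions, not the actual csum-file code: every
toy_* name is hypothetical, and the hash context is reduced to a flag
plus two counters so that each path can be checked.

```c
#include <assert.h>
#include <stdlib.h>

int toy_finals;   /* contexts cleaned up by finalizing */
int toy_discards; /* contexts cleaned up by discarding */

struct toy_hashfile {
	int ctx_live;  /* nonzero while the hash context holds resources */
	int skip_hash; /* set by the caller after creation */
};

struct toy_hashfile *toy_hashfile_new(void)
{
	struct toy_hashfile *f = calloc(1, sizeof(*f));
	if (!f)
		abort();
	f->ctx_live = 1; /* the context is always initialized on creation */
	return f;
}

void toy_ctx_final(struct toy_hashfile *f)
{
	assert(f->ctx_live);
	f->ctx_live = 0;
	toy_finals++;
}

void toy_ctx_discard(struct toy_hashfile *f)
{
	assert(f->ctx_live);
	f->ctx_live = 0;
	toy_discards++;
}

/* Internal helper: frees memory only; the context must already be gone. */
void toy_free_memory(struct toy_hashfile *f)
{
	assert(!f->ctx_live);
	free(f);
}

/* Public abort path: discard the context, then free. */
void toy_free_hashfile(struct toy_hashfile *f)
{
	toy_ctx_discard(f);
	toy_free_memory(f);
}

/* Normal path: finalize, or discard when the hash is skipped. */
void toy_finalize_hashfile(struct toy_hashfile *f)
{
	if (f->skip_hash)
		toy_ctx_discard(f);
	else
		toy_ctx_final(f);
	toy_free_memory(f); /* not toy_free_hashfile(): no second discard */
}

/* Exercise all three paths once. */
void toy_run_all_paths(void)
{
	struct toy_hashfile *f;

	f = toy_hashfile_new();
	toy_finalize_hashfile(f);

	f = toy_hashfile_new();
	f->skip_hash = 1;
	toy_finalize_hashfile(f);

	f = toy_hashfile_new();
	toy_free_hashfile(f);
}
```

If the finalize path called the public abort function instead of the
memory-only helper, the assertion in toy_ctx_discard() would trip on the
second cleanup of the same context.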

This fix makes several scripts leak-free with the command above: t1600,
t1601, t2107, t7008, t9210, t9211.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

hash: add discard primitive

The usual life-cycle for a git_hash_ctx is calling git_hash_init(),
adding some data, and then using git_hash_final() to get the output
digest and free any resources.

Sometimes we decide to abort the operation without the final() call
(e.g., due to errors or other reasons). In that case we just abandon the
hash_ctx completely and let it go out of scope. For most hash
implementations this is fine; they were just holding values directly in
the struct.

But some implementations do allocate memory, and in these cases we leak
the memory. Notably OpenSSL >= 3.0 requires us to allocate the digest
context on the heap with EVP_MD_CTX_new().

Let's provide a git_hash_discard() function that can be used in these
code paths to free any resources. For now we'll implement it by just
calling git_hash_final() into a dummy output, relying on its side effect
of freeing the resources. Our view of the underlying hash implementation
is abstracted behind the platform_SHA_* macros, so that's the best we
can do without widening that interface.

It's a little inefficient, but probably not noticeably so in practice,
especially as we'd usually hit this on an error code path. And by
abstracting it in this function, we can later swap it out when the
platform_SHA interface lets us do so.
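
A minimal sketch of the "finalize into a dummy output" idea, assuming a
toy backend that allocates on init. None of the toy_* names exist in
Git; they only model an allocating implementation whose final() call
also frees its resources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

int toy_live_allocations; /* outstanding backend allocations */

/* Hypothetical allocating backend, in the spirit of EVP_MD_CTX_new(). */
struct toy_ctx {
	unsigned char *state;
};

void toy_init(struct toy_ctx *ctx)
{
	ctx->state = calloc(1, 1);
	if (!ctx->state)
		abort();
	toy_live_allocations++;
}

void toy_update(struct toy_ctx *ctx, const void *data, size_t len)
{
	const unsigned char *p = data;
	size_t i;

	for (i = 0; i < len; i++)
		ctx->state[0] ^= p[i]; /* toy "digest": XOR of all bytes */
}

/* Writes the one-byte digest and, as a side effect, frees the state. */
void toy_final(unsigned char *out, struct toy_ctx *ctx)
{
	assert(ctx->state); /* finalizing twice would be a bug */
	out[0] = ctx->state[0];
	free(ctx->state);
	ctx->state = NULL;
	toy_live_allocations--;
}

/* Discard by finalizing into a throwaway buffer. */
void toy_discard(struct toy_ctx *ctx)
{
	unsigned char dummy[1];

	toy_final(dummy, ctx);
}

/* Abort a hash operation midway; returns the outstanding allocations. */
int toy_demo_abort(void)
{
	struct toy_ctx ctx;

	toy_init(&ctx);
	toy_update(&ctx, "abc", 3);
	toy_discard(&ctx); /* error path: we never want the digest */
	return toy_live_allocations;
}
```

Because callers only see toy_discard(), its body could later be replaced
by a direct "free the state" call without touching any call site.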

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

csum-file: drop discard_hashfile()

Commit c3d034df16 (csum-file: introduce discard_hashfile(), 2024-07-25)
added a cleanup function that no longer has any callers. In that commit
we adjusted do_write_index() to use the new function. But a similar fix
occurred on a parallel branch, making free_hashfile() public, and the
merge resolution in 1b6b2bfae5 (Merge branch 'ps/leakfixes-part-4',
2024-08-23) took the free_hashfile() version.

So now we have two functions, discard_hashfile() and free_hashfile(),
and we only need one. Which one do we want to keep?

The only difference between them is that the discard variant also closes
the descriptors held in the struct. Let's look at the three callers:

  1. In finalize_hashfile() we've either already closed the descriptors
     (if the CSUM_CLOSE flag is passed) or the caller didn't want them
     closed (if it didn't pass that flag). So we want the more limited
     free_hashfile().

  2. In object-file.c:flush_packfile_transaction() we close the
     descriptor ourselves. So discard_hashfile() could save us a line of
     code.

  3. In do_write_index() we don't close the descriptor. This was the spot
     for which c3d034df16 added the discard function in the first place,
     but I'm skeptical that closing the descriptor here is the right
     thing. It is true that we are done with the descriptor at this
     point and closing it would be ideal. But we don't really own it!

     The descriptor comes from a tempfile struct (as part of a lock) and
     that tempfile will hold on to the descriptor and try to close it
     when it is deleted. This might happen at the end of the program, in
     which case the double-close is mostly harmless (we might
     accidentally close some other open descriptor, but at that point
     we're just closing and unlinking everything we can).

     But in theory it could also cause subtle bugs. If do_write_index()
     fails, we return the error up the stack and would eventually end up
     in write_locked_index(). There we roll back the lock file on error,
     which will close the descriptor. So now we get our double close,
     and we might actually close something else that was opened in the
     interim.

     This is probably unlikely in practice (as soon as we see the error
     we'd mostly be unwinding the stack, not opening new files). But it
     highlights a potential problem with the discard_hashfile()
     interface: the hashfile doesn't necessarily own that descriptor.
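
The double-close hazard described in the third item can be reproduced
in isolation. The sketch below is not taken from Git; the function name
is hypothetical, and it assumes a single-threaded POSIX process where
/dev/null is readable. It relies on open() returning the lowest unused
descriptor number.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Returns 1 if a second close() of an already-closed descriptor ended
 * up closing an unrelated file opened in the interim, 0 if not, and -1
 * if the setup failed.
 */
int toy_double_close_hits_other_fd(void)
{
	int owned, other, reused, other_still_open;

	owned = open("/dev/null", O_RDONLY);
	if (owned < 0)
		return -1;

	close(owned); /* a helper closes a descriptor it does not own */

	other = open("/dev/null", O_RDONLY); /* opened in the interim */
	if (other < 0)
		return -1;
	reused = (other == owned); /* same number, different file */

	close(owned); /* the real owner cleans up: the double close */

	other_still_open = (fcntl(other, F_GETFD) != -1);
	if (other_still_open)
		close(other);
	return reused && !other_still_open;
}
```

The second close() succeeds silently, which is why this class of bug is
hard to notice: nothing fails until the unrelated file is used again.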

Note that I said "descriptors" plural above. Those callers all care
about the "fd" member of the struct. But discard_hashfile() also closes
check_fd. That is only used if the struct is initialized with
hashfd_check(), and neither of its two callers call either discard or
free (they always "finalize" instead). So closing it is irrelevant for
the current callers.

I think we're better off sticking with the simpler free_hashfile()
interface, and the handful of callers can decide how to handle the
descriptors themselves.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reset: introduce dry-run mode

In a subsequent commit we'll add another caller to `reset_working_tree()`
that wants to perform a dry-run check of whether it would be possible to
update the index and working tree when moving to a new commit. Introduce
a new flag that lets the caller perform this operation.
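
As a rough sketch of what a dry-run flag means for a caller, assuming a
toy "working tree" that is a single integer: the names below are
hypothetical and do not reflect the actual flag or the signature of
`reset_working_tree()`.

```c
/* Hypothetical flag set; only the dry-run bit is modeled. */
enum toy_reset_flags {
	TOY_RESET_DRY_RUN = (1 << 0),
};

struct toy_tree {
	int value;  /* stands in for the index and checked-out files */
	int locked; /* nonzero if an update would fail */
};

/*
 * Returns 0 if the update succeeded (or would succeed in dry-run mode)
 * and -1 otherwise. In dry-run mode the tree is never modified.
 */
int toy_reset(struct toy_tree *tree, int new_value, enum toy_reset_flags flags)
{
	if (tree->locked)
		return -1; /* same feasibility check in both modes */
	if (flags & TOY_RESET_DRY_RUN)
		return 0;  /* feasible, but change nothing */
	tree->value = new_value;
	return 0;
}
```

The useful property is that both modes share the same checks, so a
successful dry run predicts the outcome of the real call as long as the
state does not change in between.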

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reset: modernize flags passed to `reset_working_tree()`

The flags passed to `reset_working_tree()` are declared as defines. This
has fallen a bit out of practice nowadays, where we instead prefer to
use enums. Furthermore, the prefix of those flags does not match the
function name anymore after the rename in the preceding commit.

Convert the defines into an enum and rename the flags so that their
prefix matches the new function name.
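
For illustration, the general shape of such a conversion looks like
this. The flag names are made up for the example and are not the ones
used in the patch.

```c
/*
 * Before (hypothetical): untyped defines whose prefix no longer
 * matches the function they are passed to.
 *
 *   #define TOY_OLD_HARD     (1 << 0)
 *   #define TOY_OLD_RUN_HOOK (1 << 1)
 */

/* After: a named enum with a prefix that matches the function. */
enum toy_tree_flags {
	TOY_TREE_HARD = (1 << 0),
	TOY_TREE_RUN_HOOK = (1 << 1),
};

/* The parameter type now documents which flags are accepted. */
int toy_tree_wants_hook(enum toy_tree_flags flags)
{
	return (flags & TOY_TREE_RUN_HOOK) != 0;
}
```

Besides readability, the enum gives debuggers symbolic names for the
values, which plain defines do not.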

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reset: rename `reset_head()`

In a subsequent commit we're about to adapt `reset_head()` so that the
reference update to HEAD becomes optional. At this point the function
starts to feel misnamed, as it doesn't necessarily have anything to do
with the HEAD reference anymore. The gist of the function then is that
we reset the working tree to a specific new commit, updating both the
index and the checked-out files.

Rename it to `reset_working_tree()` to better reflect that.

Note that we don't adjust the flags yet. This will happen in a
subsequent commit.

Suggested-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>