git.ipfire.org Git - thirdparty/git.git/log

builtin/fsck: drop `fsck_head_link()`

The function `fsck_head_link()` was historically used to perform a
couple of consistency checks for refs. (Almost) all of these checks have
now been moved into the refs subsystem. There's only a single check
remaining that verifies whether `refs_resolve_ref_unsafe()` returns a
`NULL` pointer. This may happen in a couple of cases:

  - When `refs_is_safe()` declares the ref to be unsafe. We already have
    checks for this as we verify refnames with `check_refname_format()`.

  - When the ref doesn't exist. A repository without "HEAD" is
    completely broken though, and we would notice this error ahead of
    time already.

  - In case the caller passes `RESOLVE_REF_READING` and the ref is a
    symref that doesn't resolve. We don't pass this flag though.

As such, this check doesn't cover anything anymore that isn't already
covered by `refs_fsck()`. Drop it, which also allows us to inline the
call to `refs_resolve_ref_unsafe()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/fsck: move generic HEAD check into `refs_fsck()`

Move the check that detects "HEAD" refs that do not point at a branch
into `refs_fsck()`. This follows the same motivation as the preceding
commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/fsck: move generic object ID checks into `refs_fsck()`

While most of the logic that verifies the consistency of refs is
driven by `refs_fsck()`, we still have a small handful of checks in
`fsck_head_link()`. These checks don't use the git-fsck(1) reporting
infrastructure, and as such it's impossible to for example disable
some of those checks.

One such check detects refs that point to the all-zeroes object ID.
Extract this check into the generic `refs_fsck_ref()` function that is
used by both the "files" and "reftable" backends.

Note that this will cause us to not return an error code from
`fsck_head_link()` anymore in case this error was detected. This is fine
though: the only caller of this function does not check the error code
anyway. To demonstrate this, adapt the function to drop its return value
altogether. The function will be removed in a subsequent commit anyway.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/reftable: introduce generic checks for refs

In a preceding commit we have extracted generic checks for both direct
and symbolic refs that apply for all backends. Wire up those checks for
the "reftable" backend.

Note that this is done by iterating through all refs manually with the
low-level reftable ref iterator. We explicitly don't want to use the
higher-level iterator that is exposed to users of the reftable backend
as that iterator may swallow for example broken refs.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/reftable: fix consistency checks with worktrees

The ref consistency checks are driven via `cmd_refs_verify()`. That
function loops through all worktrees (including the main worktree) and
then checks the ref store for each of them individually. It follows that
the backend is expected to only verify refs that belong to the specified
worktree.

While the "files" backend handles this correctly, the "reftable" backend
doesn't. In fact, it completely ignores the passed worktree and instead
verifies refs of _all_ worktrees. The consequence is that we'll end up
every ref store N times, where N is the number of worktrees.

Or rather, that would be the case if we actually iterated through the
worktree reftable stacks correctly. But we use `strmap_for_each_entry()`
to iterate through the stacks, but the map is in fact not even properly
populated. So instead of checking stacks N^2 times, we actually only end
up checking the reftable stack of the main worktree.

Fix this bug by only verifying the stack of the passed-in worktree and
constructing the backends via `backend_for_worktree()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/reftable: extract function to retrieve backend for worktree

Pull out the logic to retrieve a backend for a given worktree. This
function will be used in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/reftable: adapt includes to become consistent

Adapt the includes to be sorted and to use include paths that are
relative to the "refs/" directory.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/files: introduce function to perform normal ref checks

In a subsequent commit we'll introduce new generic checks for direct
refs. These checks will be independent of the actual backend.

Introduce a new function `refs_fsck_ref()` that will be used for this
purpose. At the current point in time it's still empty, but it will get
populated in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/files: extract generic symref target checks

The consistency checks for the "files" backend contain a couple of
verifications for symrefs that verify generic properties of the target
reference. These properties need to hold for every backend, no matter
whether it's using the "files" or "reftable" backend.

Reimplementing these checks for every single backend doesn't really make
sense. Extract it into a generic `refs_fsck_symref()` function that can
be used by other backends, as well. The "reftable" backend will be wired
up in a subsequent commit.

While at it, improve the consistency checks so that we don't complain
about refs pointing to a non-ref target in case the target refname
format does not verify. Otherwise it's very likely that we'll generate
both error messages, which feels somewhat redundant in this case.

Note that the function has a couple of `UNUSED` parameters. These will
become referenced in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fsck: drop unused fields from `struct fsck_ref_report`

The `struct fsck_ref_report` has a couple fields that are intended to
improve the error reporting for broken ref reports by showing which
object ID or target reference the ref points to. These fields are never
set though and are thus essentially unused.

Remove them.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/files: perform consistency checks for root refs

While the "files" backend already knows to perform consistency checks
for the "refs/" hierarchy, it doesn't verify any of its root refs. Plug
this omission.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/files: improve error handling when verifying symrefs

The error handling when verifying symbolic refs is a bit on the wild
side:

  - `fsck_report_ref()` can be told to ignore specific errors. If an
    error has been ignored and a previous check raised an unignored
    error, then assigning `ret = fsck_report_ref()` will cause us to
    swallow the previous error.

  - When the target reference is not valid we bail out early without
    checking for other errors.

Fix both of these issues by consistently or'ing the return value and not
bailing out early.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/files: extract function to check single ref

When checking the consistency of references we create a directory
iterator and then verify each single reference in a loop. The logic to
perform the actual checks is embedded into that loop, which makes it
hard to reuse. But In a subsequent commit we're about to introduce a
second path that wants to verify references.

Prepare for this by extracting the logic to check a single reference
into a standalone function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/files: remove useless indirection

The function `files_fsck_refs()` only has a single callsite and forwards
all of its arguments as-is, so it's basically a useless indirection.
Inline the function call.

While at it, also remove the bitwise or that we have for return values.
We don't really want to or them at all, but rather just want to return
an error in case either of the functions has failed.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/files: remove `refs_check_dir` parameter

The parameter `refs_check_dir` determines which directory we want to
check references for. But as we always want to check the complete
refs hierarchy, this parameter is always set to "refs".

Drop the parameter and hardcode it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/files: move fsck functions into global scope

When performing consistency checks we pass the functions that perform
the verification down the calling stack. This is somewhat unnecessary
though, as the set of functions doesn't ever change.

Simplify the code by moving the array into global scope and remove the
parameter.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/files: simplify iterating through root refs

When iterating through root refs we first need to determine the
directory in which the refs live. This is done by retrieving the root of
the loose refs via `refs->loose->root->name`, and putting it through
`files_ref_path()` to derive the final path.

This is somewhat redundant though: the root name of the loose files
cache is always going to be the empty string. As such, we always end up
passing that empty string to `files_ref_path()` as the ref hierarchy we
want to start. And this actually makes sense: `files_ref_path()` already
computes the location of the root directory, so of course we need to
pass the empty string for the ref hierarchy itself. So going via the
loose ref cache to figure out that the root of a ref hierarchy is empty
is only causing confusion.

But next to the added confusion, it can also lead to a segfault. The
loose ref cache is populated lazily, so it may not always be set. It
seems to be sheer luck that this is a condition we do not currently hit.
The right thing to do would be to call `get_loose_ref_cache()`, which
knows to populate the cache if required.

Simplify the code and fix the potential segfault by simply removing the
indirection via the loose ref cache completely.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

packfile: drop repository parameter from `packed_object_info()`

The function `packed_object_info()` takes a packfile and offset and
returns the object info for the corresponding object. Despite these two
parameters though it also takes a repository pointer. This is redundant
information though, as `struct packed_git` already has a repository
pointer that is always populated.

Drop the redundant parameter.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

packfile: skip unpacking object header for disk size requests

While most of the object info requests for a packed object require us to
unpack its headers, reading its disk size doesn't. We still unpack the
object header in that case though, which is unnecessary work.

Skip reading the header if only the disk size is requested. This leads
to a small speedup when reading disk size, only. The following benchmark
was done in the Git repository:

    Benchmark 1: ./git rev-list --disk-usage HEAD (rev = HEAD~)
      Time (mean ± σ):     105.2 ms ±   0.6 ms    [User: 91.4 ms, System: 13.3 ms]
      Range (min … max):   103.7 ms … 106.0 ms    27 runs

    Benchmark 2: ./git rev-list --disk-usage HEAD (rev = HEAD)
      Time (mean ± σ):      96.7 ms ±   0.4 ms    [User: 86.2 ms, System: 10.0 ms]
      Range (min … max):    96.2 ms …  98.1 ms    30 runs

    Summary
      ./git rev-list --disk-usage HEAD (rev = HEAD) ran
        1.09 ± 0.01 times faster than ./git rev-list --disk-usage HEAD (rev = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

packfile: disentangle return value of `packed_object_info()`

The `packed_object_info()` function returns the type of the packed
object. While we use an `enum object_type` to store the return value,
this type is not to be confused with the actual object type. It _may_
contain the object type, but it may just as well encode that the given
packed object is stored as a delta.

We have removed the only caller that relied on this returned object type
in the preceding commit, so let's simplify semantics and return either 0
on success or a negative error code otherwise.

This unblocks a small optimization where we can skip reading the object
type altogether.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

packfile: always populate pack-specific info when reading object info

When reading object information via `packed_object_info()` we may not
populate the object info's packfile-specific fields. This leads to
inconsistent object info depending on whether the info was populated via
`packfile_store_read_object_info()` or `packed_object_info()`.

Fix this inconsistency so that we can always assume the pack info to be
populated when reading object info from a pack.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

packfile: extend `is_delta` field to allow for "unknown" state

The `struct object_info::u::packed::is_delta` field determines whether
or not a specific object is stored as a delta. It only stores whether or
not the object is stored as delta, so it is treated as a boolean value.

This boolean is insufficient though: when reading a packed object via
`packfile_store_read_object_info()` we know to skip parsing the actual
object when the user didn't request any object-specific data. In that
case we won't read the object itself, but will only look up its position
in the packfile. Consequently, we do not know whether it is a delta or
not.

This isn't really an issue right now, as the check for an empty request
is broken. But a subsequent commit will fix it, and once we do we will
have the need to also represent an "unknown" delta state.

Prepare for this change by introducing a new enum that encodes the
object type. We don't use the "unknown" state just yet, but will start
to do so in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

packfile: always declare object info to be OI_PACKED

When reading object info via a packfile we yield one of two types:

  - The object can either be OI_PACKED, which is what a caller would
    typically expect.

  - Or it can be OI_DBCACHED if it is stored in the delta base cache.

The latter really is an implementation detail though, and callers
typically don't care at all about the difference. Furthermore, the
information whether or not it is part of the delta base cache can
already be derived via the `is_delta` field, so the fact that we discern
between OI_PACKED and OI_DBCACHED only further complicates the
interface.

There aren't all that many callers that care about the `whence` field in
the first place. In fact, there's only three:

  - `packfile_store_read_object_info()` checks for `whence == OI_PACKED`
    and then populates the packfile information of the object info
    structure. We now start to do this also for deltified objects, which
    gives its callers strictly more information.

  - `repack_local_links()` wants to determine whether the object is part
    of a promisor pack and checks for `whence == OI_PACKED`. If so, it
    verifies that the packfile is a promisor pack. It's arguably wrong
    to declare that an object is not part of a promisor pack only
    because it is stored in the delta base cache.

  - `is_not_in_promisor_pack_obj()` does the same, but checks that a
    specific object is _not_ part of a promisor pack. The same reasoning
    as above applies.

Drop the OI_DBCACHED enum completely. None of the callers seem to care
about the distinction.

Note that this also fixes a segfault introduced in 8c1b84bc97
(streaming: move logic to read packed objects streams into backend,
2025-11-23), which refactors how we stream packed objects. The intent is
to only read packed objects in case they are stored non-deltified as
we'd otherwise have to deflate them first. But the check for whether or
not the object is stored as a delta was unconditionally done via
`oi.u.packed.is_delta`, which is only valid in case `oi.whence` is
`OI_PACKED`. But under some circumstances we got `OI_DBCACHED` here,
which means that none of the `oi.u.packed` fields were initialized at
all. Consequently, we assumed the object was not stored as a delta, and
then try to read the object from `oi.u.packed.pack`, which is a `NULL`
pointer and thus causes a segfault.

Add a test case for this issue so that this cannot regress in the
future anymore.

Reported-by: Matt Smiley <msmiley@gitlab.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

object-file: always set OI_LOOSE when reading object info

There are some early returns in `odb_source_loose_read_object_info()`
in cases where we don't have to open the loose object. These return
paths do not set `struct object_info::whence` to `OI_LOOSE` though, so
it becomes impossible for the caller to tell the format of such an
object.

The root cause of this really is that we have so many different return
paths in the function. As a consequence, it's harder than necessary to
make sure that all successful exit paths sot up the `whence` field as
expected.

Address this by refactoring the function to have a single exit path.
Like this, we can trivially set up the `whence` field when we exit
successfully from the function.

Note that we also:

  - Rename `status` to `ret` to match our usual coding style, but also
    to show that the old `status` variable is now always getting the
    expected value. Furthermore, the value is not initialized anymore,
    which has the consequence that most compilers will warn for exit
    paths where we forgot to set it.

  - Move the setup of scratch pointers closer to `parse_loose_header()`
    to show where it's needed.

  - Guard a couple of variables on cleanup so that they only get
    released in case they have been set up.

  - Reset `oi->delta_base_oid` towards the end of the function, together
    with all the other object info pointers.

Overall, all these changes result in a diff that is somewhat hard to
read. But the end result is significantly easier to read and reason
about, so I'd argue this one-time churn is worth it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

The 17th batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'js/mailmap-karsten-blees'

Mailmap update for Karsten

* js/mailmap-karsten-blees:
.mailmap: replace Karsten Blees' default address

Merge branch 'ps/t1300-2021-use-test-path-is-helpers'

Test updates.

* ps/t1300-2021-use-test-path-is-helpers:
t1300: use test helpers instead of `test` command

Merge branch 'rs/commit-stack'

Code clean-up, unifying various hand-rolled "list of commit
objects" and use the commit_stack API.

* rs/commit-stack:
  commit-reach: use commit_stack
  commit-graph: use commit_stack
  commit: add commit_stack_grow()
  shallow: use commit_stack
  pack-bitmap-write: use commit_stack
  commit: add commit_stack_init()
  test-reach: use commit_stack
  remote: use commit_stack for src_commits
  remote: use commit_stack for sent_tips
  remote: use commit_stack for local_commits
  name-rev: use commit_stack
  midx: use commit_stack
  log: use commit_stack
  revision: export commit_stack

Merge branch 'sb/bundle-uri-without-uri'

Diagnose invalid bundle-URI that lack the URI entry, instead of
crashing.

* sb/bundle-uri-without-uri:
bundle-uri: validate that bundle entries have a uri

Merge branch 'ja/doc-synopsis-style-more'

More doc style updates.

* ja/doc-synopsis-style-more:
  doc: convert git-remote to synopsis style
  doc: convert git stage to use synopsis block
  doc: convert git-status tables to AsciiDoc format
  doc: convert git-status to synopsis style
  doc: fix t0450-txt-doc-vs-help to select only first synopsis block

t1410: use test helpers in reflog rewind test

Replace raw `test -f` and `! test -f` checks in the rewind test with
`test_path_is_file` and `test_path_is_missing`. This provides clearer
failure diagnostics and keeps the test consistent with the rest of
the test suite.

Signed-off-by: Pushkar Singh <pushkarkumarsingh1970@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

http-backend: write newlines to stderr when responding with errors

The not_found and forbidden methods currently do not write a newline to
stderr after the error message. This means that if git-http-backend is
invoked through something like fcgiwrap, and the stderr of that fcgiwrap
process is sent to a logging daemon (e.g. journald), the error messages
of several git-http-backend invocations will just get strung together,
e.g.

> Not a git repository: '/var/lib/git/foo.git'Not a git repository: '/var/lib/git/foo.git'Not a git repository: '/var/lib/git/foo.git'

I think it's git-http-backend's responsibility to format these messages
properly, rather than it being fcgiwrap's job to notice that the script
didn't terminate stderr with a newline and do so itself.

Signed-off-by: KJ Tsanaktsidis <kj@kjtsanaktsidis.id.au>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

.mailmap: replace Karsten Blees' default address

As per a recent email by Karsten, the @dcon.de address no longer works:
https://lore.kernel.org/git/77e768b2-6693-454f-9e11-fb0acdec703c@gmail.com

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

contrib/subtree: detect rewritten subtree commits

    git subtree split --prefix P

detects splits that are outside of path prefix `P` and prunes
them from history graph processing. This improves the performance
of repeated `split --rejoin` with many different prefixes.

Both before and after 83f9dad7d6 (contrib/subtree: fix split with
squashed subtrees, 2025-09-09), the pruning logic does not detect
**rebased** or **cherry-picked** git-subtree commits. If `split`
encounters any of these commits, the split output may have
incomplete history.

All commits authored by

    git subtree merge [--squash] --prefix Q

have a first or second parent that has *only* subtree commits
as ancestors. When splitting a completely different path `P/`,
it is safe to ignore:

1. the merged tree
2. the subtree parent
3. *all* of that parent's ancestry, which applies only to
   path `Q/` and not `P/`.

But this relationship no longer holds if the git-subtree commit
is rebased or otherwise reauthored. After a rebase, the former
git-subtree commit will have other unrelated commits as ancestors.
Ignoring these commits may exclude the history of `P/`,
leading to incomplete `subtree split` output.

The pruning logic relies solely on the `git-subtree-*:` trailers
to detect git-subtree commits, which it blindly accepts without
further validation. The split logic also takes its time about
being wrong: `cmd_split()` execs a `git show` for *every* commit
in the split range… twice. This is inefficient in a shell script.

Add a "reality check" to ignore rebased or rewritten commits:

* Rewrites of non-merge commits cannot be detected, so the new
  detector no longer looks for them.

* Merges carry a `git-subtree-mainline:` trailer with the hash of
  the **first parent**. If this hash differs, or if the "merge"
  commit no longer has multiple parents, a rewrite has occurred.

To increase speed, package this logic in a new method,
`find_other_splits()`. Perform the check up-front by iterating
over a single `git log`. Add ignored subtrees to:

1. the `notree` cache, which excludes them from the `split` history

2. a `prune` negative refs list. The negative refs prevent
   recursing into other subtrees. Since there are potentially a
   *lot* of these, cache them on disk and use rev-list's
   `--stdin` mode.

Reported-by: George <george@mail.dietrich.pub>
Signed-off-by: Colin Stagner <ask+git@howdoi.land>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

cocci: convert parse_tree functions to repo_ variants

Add and apply a semantic patch to convert calls to parse_tree() and
friends to the corresponding variant that takes a repository argument,
to allow the functions that implicitly use the_repository to be retired
once all potential in-flight topics are settled and converted as well.

The changes in .c files were generated by Coccinelle, but I fixed a
whitespace bug it would have introduced to builtin/commit.c.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

tree: stop using the_repository

Push the use of the_repository to the remaining callers by turning the
compatibility wrappers into macros, whose use still requires
USE_THE_REPOSITORY_VARIABLE to be defined.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

tree: use repo_parse_tree()

e092073d64 (tree.c: make read_tree*() take 'struct repository *',
2018-11-18) replaced explicit uses of the_repository. parse_tree() uses
it internally, though, so call repo_parse_tree() instead and hand it the
correct repository.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

path-walk: use repo_parse_tree_gently()

Use the passed in repository instead of the implicit the_repository when
parsing the tree.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-bitmap-write: use repo_parse_tree()

1a6768d1dd (pack-bitmap-write: stop depending on `the_repository`,
2025-03-10) replaced explicit uses of the_repository. parse_tree() uses
it internally, though, so call repo_parse_tree() instead and hand it the
correct repository.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

delta-islands: use repo_parse_tree()

19be71db9c (delta-islands: stop depending on `the_repository`,
2025-03-10) replaced explicit uses of the_repository. parse_tree() uses
it internally, though, so call repo_parse_tree() instead and hand it the
correct repository.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

bloom: use repo_parse_tree()

Use the passed in repository instead of the implicit the_repository when
parsing the tree.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

add-interactive: use repo_parse_tree_indirect()

1b374ad71f (add-interactive: stop using `the_repository`, 2024-12-17)
replaced explicit uses of the_repository. parse_tree_indirect() uses it
internally, though, so call repo_parse_tree_indirect() instead and hand
it the correct repository.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

tree: add repo_parse_tree*()

Add variants of parse_tree(), parse_tree_gently() and
parse_tree_indirect() that allow using an arbitrary repository.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

environment: move access to core.maxTreeDepth into repo settings

The config setting core.maxTreeDepth is stored in a global variable and
populated by the function git_default_core_config. This won't work if
we need to access multiple repositories with different values of that
setting in the same process. Store the setting in struct repo_settings
instead and track it separately for each repository.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mingw: special-case index entries for symlinks with buggy size

In https://github.com/git-for-windows/git/pull/2637, we fixed a bug
where symbolic links' target path sizes were recorded incorrectly in the
index. The downside of this fix was that every user with tracked
symbolic links in their checkouts would see them as modified in `git
status`, but not in `git diff`, and only a `git add <path>` (or `git add
-u`) would "fix" this.

Let's do better than that: we can detect that situation and simply
pretend that a symbolic link with a known bad size (or a size that just
happens to be that bad size, a _very_ unlikely scenario because it would
overflow our buffers due to the trailing NUL byte) means that it needs
to be re-checked as if we had just checked it out.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mingw: emulate `stat()` a little more faithfully

When creating directories via `safe_create_leading_directories()`, we
might encounter an already-existing directory which is not
readable by the current user. To handle that situation, Git's code calls
`stat()` to determine whether we're looking at a directory.

In such a case, `CreateFile()` will fail, though, no matter what, and
consequently `mingw_stat()` will fail, too. But POSIX semantics seem to
still allow `stat()` to go forward.

So let's call `mingw_lstat()` to the rescue if we fail to get a file
handle due to denied permission in `mingw_stat()`, and fill the stat
info that way.

We need to be careful to not allow this to go forward in case that we're
looking at a symbolic link: to resolve the link, we would still have to
create a file handle, and we just found out that we cannot. Therefore,
`stat()` still needs to fail with `EACCES` in that case.

This fixes https://github.com/git-for-windows/git/issues/2531.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mingw: try to create symlinks without elevated permissions

As of Windows 10 Build 14972 in Developer Mode, a new flag is supported
by `CreateSymbolicLink()` to create symbolic links even when running
outside of an elevated session (which was previously required).

This new flag is called `SYMBOLIC_LINK_FLAG_ALLOW_UNPRIVILEGED_CREATE`
and has the numeric value 0x02.

Previous Windows 10 versions will not understand that flag and return
an `ERROR_INVALID_PARAMETER`, therefore we have to be careful to try
passing that flag only when the build number indicates that it is
supported.

For more information about the new flag, see this blog post:
https://blogs.windows.com/buildingapps/2016/12/02/symlinks-windows-10/

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mingw: add support for symlinks to directories

Symlinks on Windows have a flag that indicates whether the target is a
file or a directory. Symlinks of wrong type simply don't work. This even
affects core Win32 APIs (e.g. `DeleteFile()` refuses to delete directory
symlinks).

However, `CreateFile()` with FILE_FLAG_BACKUP_SEMANTICS does work. Check
the target type by first creating a tentative file symlink, opening it,
and checking the type of the resulting handle. If it is a directory,
recreate the symlink with the directory flag set.

It is possible to create symlinks before the target exists (or in case
of symlinks to symlinks: before the target type is known). If this
happens, create a tentative file symlink and postpone the directory
decision: keep a list of phantom symlinks to be processed whenever a new
directory is created in `mingw_mkdir()`.

Limitations: This algorithm may fail if a link target changes from file
to directory or vice versa, or if the target directory is created in
another process. It's the best Git can do, though.

Signed-off-by: Karsten Blees <karsten.blees@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mingw: implement basic `symlink()` functionality (file symlinks only)

Implement `symlink()`. This implementation always creates _file_
symlinks (remember: Windows discerns between symlinks pointing to
directories and those pointing to files). Support for directory symlinks
will be added in a subseqeuent commit.

This implementation fails with `ENOSYS` if symlinks are disabled or
unsupported.

Signed-off-by: Karsten Blees <karsten.blees@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mingw: implement `readlink()`

Implement `readlink()` by reading NTFS reparse points via the
`read_reparse_point()` function that was introduced earlier to determine
the length of symlink targets. Works for symlinks and directory
junctions.

Signed-off-by: Karsten Blees <karsten.blees@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mingw: allow `mingw_chdir()` to change to symlink-resolved directories

If symlinks are enabled, resolve all symlinks when changing directories,
as required by POSIX.

Note: Git's `real_path()` function bases its link resolution algorithm
on this property of `chdir()`. Unfortunately, the current directory on
Windows is limited to only MAX_PATH (260) characters. Therefore using
symlinks and long paths in combination may be problematic.

Signed-off-by: Karsten Blees <karsten.blees@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mingw: support renaming symlinks

Older MSVCRT's `_wrename()` function cannot rename symlinks over
existing files: it returns success without doing anything. Newer
MSVCR*.dll versions probably do not share this problem: according to CRT
sources, they just call `MoveFileEx()` with the `MOVEFILE_COPY_ALLOWED`
flag.

Avoid the `_wrename()` call, and go with directly calling
`MoveFileEx()`, with proper error handling of course.

Signed-off-by: Karsten Blees <karsten.blees@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mingw: handle symlinks to directories in `mingw_unlink()`

The `_wunlink()` and `DeleteFileW()` functions refuse to delete symlinks
to directories on Windows; The error code would be `ERROR_ACCESS_DENIED`
in that case. Take that error code as an indicator that we need to try
`_wrmdir()` as well. In the best case, it will remove a symlink. In the
worst case, it will fail with the same error code again.

Signed-off-by: Karsten Blees <karsten.blees@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mingw: add symlink-specific error codes

The Win32 API calls do not set `errno`; Instead, error codes for failed
operations must be obtained via the `GetLastError()` function. Git would
not know what to do with those error values, though, which is why Git's
Windows compatibility layer translates them to `errno` values.

Let's handle a couple of symlink-related error codes that will become
relevant with the upcoming support for symlinks on Windows.

Signed-off-by: Karsten Blees <karsten.blees@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mingw: change default of `core.symlinks` to false

Symlinks on Windows don't work the same way as on Unix systems. For
example, there are different types of symlinks for directories and
files, and unless using a recent-ish Windows version in Developer Mode,
creating symlinks requires administrative privileges.

By default, disable symlink support on Windows. That is, users
explicitly have to enable it with `git config [--system|--global]
core.symlinks true`; For convenience, `git init` (and `git clone`)
will perform a test whether the current setup allows creating symlinks
and will configure that setting in the repository config.

The test suite ignores system / global config files. Allow
testing *with* symlink support by checking if native symlinks are
enabled in MSYS2 (via setting the special environment variable
`MSYS=winsymlinks:nativestrict` to ask the MSYS2 runtime to enable
creating symlinks).

Note: This assumes that Git's test suite is run in MSYS2's Bash, which
is true for the time being (an experiment to switch to BusyBox-w32
failed due to the experimental nature of BusyBox-w32).

Signed-off-by: Karsten Blees <karsten.blees@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mingw: factor out the retry logic

In several places, Git's Windows-specific code follows the pattern where
it tries to perform an operation, and retries several times when that
operation fails, sleeping an increasing amount of time, before finally
giving up and asking the user whether to rety (after, say, closing an
editor that held a handle to a file, preventing the operation from
succeeding).

This logic is a bit hard to use, and inconsistent:
`mingw_unlink()` and `mingw_rmdir()` duplicate the code to retry,
and both of them do so incompletely. They also do not restore `errno` if the
user answers 'no'.

Introduce a `retry_ask_yes_no()` helper function that handles retry with
small delay, asking the user, and restoring `errno`.

Note that in `mingw_unlink()`, we include the `_wchmod()` call in the
retry loop (which may fail if the file is locked exclusively).

In `mingw_rmdir()`, we include special error handling in the retry loop.

Signed-off-by: Karsten Blees <karsten.blees@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mingw: compute the correct size for symlinks in `mingw_lstat()`

POSIX specifies that upon successful return from `lstat()`: "the
value of the st_size member shall be set to the length of the pathname
contained in the symbolic link not including any terminating null byte".

Git typically doesn't trust the `stat.st_size` member of symlinks (e.g.
see `strbuf_readlink()`). Therefore, it is tempting to save on the extra
overhead of opening and reading the reparse point merely to calculate
the exact size of the link target.

This is, in fact, what Git for Windows did, from May 2015 to May 2020.
At least almost: some functions take shortcuts if `st_size` is 0 (e.g.
`diff_populate_filespec()`), hence Git for Windows hard-coded the length
of all symlinks to MAX_PATH.

This did cause problems, though, specifically in Git repositories
that were also accessed by Git for Cygwin or Git for WSL. For example,
doing `git reset --hard` using Git for Windows would update the size of
symlinks in the index to be MAX_PATH; at a later time Git for Cygwin
or Git for WSL would find that symlinks have changed size during `git
status` and update the index. And then Git for Windows would think that
the index needs to be updated. Even if the symlinks did not, in fact,
change. To avoid that, the correct size must be determined.

Signed-off-by: Bill Zissimopoulos <billziss@navimatics.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mingw: teach dirent about symlinks

Move the `S_IFLNK` detection to `file_attr_to_st_mode()`.

Implement `DT_LNK` detection in dirent.c's `readdir()` function.

Signed-off-by: Karsten Blees <karsten.blees@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mingw: let `mingw_lstat()` error early upon problems with reparse points

When obtaining lstat information for reparse points, we need to call
`FindFirstFile()` in addition to `GetFileInformationEx()` to obtain
the type of the reparse point (symlink, mount point etc.). However,
currently there is no error handling whatsoever if `FindFirstFile()`
fails.

Call `FindFirstFile()` before modifying the `stat *buf` output parameter
and error out if the call fails.

Note: The `FindFirstFile()` return value includes all the data
that we get from `GetFileAttributesEx()`, so we could replace
`GetFileAttributesEx()` with `FindFirstFile()`. We don't do that because
`GetFileAttributesEx()` is about twice as fast for single files. I.e.
we only pay the extra cost of calling `FindFirstFile()` in the rare case
that we encounter a reparse point.

Please also note that the indentation the remaining reparse point
code changed, and hence the best way to look at this diff is with
`--color-moved -w`. That code was _not_ moved because a subsequent
commit will move it to an altogether different function, anyway.

Signed-off-by: Karsten Blees <karsten.blees@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mingw: drop the separate `do_lstat()` function

With the new `mingw_stat()` implementation, `do_lstat()` is only called
from `mingw_lstat()` (with the function parameter `follow == 0`). Remove
the extra function and the old `mingw_stat()`-specific (`follow == 1`)
logic.

Signed-off-by: Karsten Blees <karsten.blees@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mingw: implement `stat()` with symlink support

With respect to symlinks, the current `mingw_stat()` implementation is
almost identical to `mingw_lstat()`: except for the file type (`st_mode
& S_IFMT`), it returns information about the link rather than the target.

Implement `mingw_stat()` by opening the file handle requesting minimal
permissions, and then calling `GetFileInformationByHandle()` on it. This
way, all links are resolved by the Windows file system layer.

Signed-off-by: Karsten Blees <karsten.blees@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mingw: don't call `GetFileAttributes()` twice in `mingw_lstat()`

The Win32 API function `GetFileAttributes()` cannot handle paths with
trailing dir separators. The current `mingw_stat()`/`mingw_lstat()`
implementation calls `GetFileAttributes()` twice if the path has
trailing slashes (first with the original path that was passed as
function parameter, and and a second time with a path copy with trailing
'/' removed).

With the conversion to wide Unicode, we get the length of the path for
free, and also have a (wide char) buffer that can be modified. This
makes it easy to avoid that extraneous Win32 API call.

Signed-off-by: Karsten Blees <karsten.blees@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'js/prep-symlink-windows' into js/symlink-windows

* js/prep-symlink-windows:
  trim_last_path_component(): avoid hard-coding the directory separator
  strbuf_readlink(): support link targets that exceed 2*PATH_MAX
  strbuf_readlink(): avoid calling `readlink()` twice in corner-cases
  init: do parse _all_ core.* settings early
  mingw: do resolve symlinks in `getcwd()`
  t7800: work around the MSYS path conversion on Windows
  t6423: introduce Windows-specific handling for symlinking to /dev/null
  t1305: skip symlink tests that do not apply to Windows
  t1006: accommodate for symlink support in MSYS2
  t0600: fix incomplete prerequisite for a test case
  t0301: another fix for Windows compatibility
  t0001: handle `diff --no-index` gracefully
  mingw: special-case `open(symlink, O_CREAT | O_EXCL)`
  apply: symbolic links lack a "trustable executable bit"
  t9700: accommodate for Windows paths

trim_last_path_component(): avoid hard-coding the directory separator

Currently, this function hard-codes the directory separator as the
forward slash.

However, on Windows the backslash character is valid, too. And we want
to call this function in the upcoming support for symlinks on Windows
with the symlink targets (which naturally use the canonical directory
separator on Windows, which is _not_ the forward slash).

Prepare that function to be useful also in that context.

Signed-off-by: Karsten Blees <karsten.blees@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

strbuf_readlink(): support link targets that exceed 2*PATH_MAX

The `strbuf_readlink()` function refuses to read link targets that
exceed 2*PATH_MAX (even if a sufficient size was specified by the
caller).

The reason that that limit is 2*PATH_MAX instead of PATH_MAX is that
the symlink targets do not need to be normalized. After running
`ln -s a/../a/../a/../a/../b c`, the target of the symlink `c` will not
be normalized to `b` but instead be much longer. As such, symlink
targets' lengths can far exceed PATH_MAX.

They are frequently much longer than 2*PATH_MAX on Windows, which
actually supports paths up to 32,767 characters, but sets PATH_MAX to
260 for backwards compatibility. For full details, see
https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation

Let's just hard-code the limit used by `strbuf_readlink()` to 32,767 and
make it independent of the current platform's PATH_MAX.

Based-on-a-patch-by: Karsten Blees <karsten.blees@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

strbuf_readlink(): avoid calling `readlink()` twice in corner-cases

The `strbuf_readlink()` function calls `readlink()`` twice if the hint
argument specifies the exact size of the link target (e.g. by passing
stat.st_size as returned by `lstat()`). This is necessary because
`readlink(..., hint) == hint` could mean that the buffer was too small.

Use `hint + 1` as buffer size to prevent this.

Signed-off-by: Karsten Blees <karsten.blees@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

init: do parse _all_ core.* settings early

In Git for Windows, `has_symlinks` is set to 0 by default. Therefore, we
need to parse the config setting `core.symlinks` to know if it has been
set to `true`. In `git init`, we must do that before copying the
templates because they might contain symbolic links.

Even if the support for symbolic links on Windows has not made it to
upstream Git yet, we really should make sure that all the `core.*`
settings are parsed before proceeding, as they might very well change
the behavior of `git init` in a way the user intended.

This fixes https://github.com/git-for-windows/git/issues/3414

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mingw: do resolve symlinks in `getcwd()`

As pointed out in https://github.com/git-for-windows/git/issues/1676,
the `git rev-parse --is-inside-work-tree` command currently fails when
the current directory's path contains symbolic links.

The underlying reason for this bug is that `getcwd()` is supposed to
resolve symbolic links, but our `mingw_getcwd()` implementation did not.

We do have all the building blocks for that, though: the
`GetFinalPathByHandleW()` function will resolve symbolic links. However,
we only called that function if `GetLongPathNameW()` failed, for
historical reasons: the latter function was supported for a long time,
but the former API function was introduced only with Windows Vista, and
we used to support also Windows XP. With that support having been
dropped, we are free to call the symbolic link-resolving function right
away.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fsck: snapshot default refs before object walk

Fsck has a race when operating on live repositories; consider the
following simple script that writes new commits as fsck runs:

    #!/bin/bash
    git fsck &
    PID=$!

    while ps -p $PID >/dev/null; do
        sleep 3
        git commit -q --allow-empty -m "Another commit"
    done

Since fsck walks objects for connectivity and then reads the refs at the
end to check, this can cause fsck to get confused and think that the new
refs refer to missing commits and that new reflog entries are invalid.
Running the above script in a clone of git.git results in the following
(output ellipsized to remove additional errors of the same type):

    $ ./fsck-while-writing.sh
    Checking ref database: 100% (1/1), done.
    Checking object directories: 100% (256/256), done.
    warning in tag d6602ec5194c87b0fc87103ca4d67251c76f233a: missingTaggerEntry: invalid format - expected 'tagger' line
    Checking objects: 100% (835091/835091), done.
    error: HEAD: invalid reflog entry 2aac9f9286e2164fbf8e4f1d1df53044ace2b310
    error: HEAD: invalid reflog entry 2aac9f9286e2164fbf8e4f1d1df53044ace2b310
    error: HEAD: invalid reflog entry da0f5b80d61844a6f0ad2ddfd57e4fdfa246ea68
    error: HEAD: invalid reflog entry da0f5b80d61844a6f0ad2ddfd57e4fdfa246ea68
    [...]
    error: HEAD: invalid reflog entry 87c8a5c2f6b79d9afa9e941590b9a097b6f7ac09
    error: HEAD: invalid reflog entry d80887a48865e6ad165274b152cbbbed29f8a55a
    error: HEAD: invalid reflog entry d80887a48865e6ad165274b152cbbbed29f8a55a
    error: HEAD: invalid reflog entry 6724f2dfede88bfa9445a333e06e78536c0c6c0d
    error: refs/heads/mybranch invalid reflog entry 2aac9f9286e2164fbf8e4f1d1df53044ace2b310
    error: refs/heads/mybranch: invalid reflog entry 2aac9f9286e2164fbf8e4f1d1df53044ace2b310
    error: refs/heads/mybranch: invalid reflog entry da0f5b80d61844a6f0ad2ddfd57e4fdfa246ea68
    error: refs/heads/mybranch: invalid reflog entry da0f5b80d61844a6f0ad2ddfd57e4fdfa246ea68
    [...]
    error: refs/heads/mybranch: invalid reflog entry 87c8a5c2f6b79d9afa9e941590b9a097b6f7ac09
    error: refs/heads/mybranch: invalid reflog entry d80887a48865e6ad165274b152cbbbed29f8a55a
    error: refs/heads/mybranch: invalid reflog entry d80887a48865e6ad165274b152cbbbed29f8a55a
    error: refs/heads/mybranch: invalid reflog entry 6724f2dfede88bfa9445a333e06e78536c0c6c0d
    Checking connectivity: 833846, done.
    missing commit 6724f2dfede88bfa9445a333e06e78536c0c6c0d
    Verifying commits in commit graph: 100% (242243/242243), done.

We can minimize the race opportunities by taking a snapshot of refs at
program invocation, doing the connectivity check, and then checking the
snapshotted refs afterward.  This avoids races with regular refs between
fsck and adding objects to the database, though it still leaves a race
between a gc and fsck.  We are less concerned about folks simultaneously
running gc with fsck; though, if it becomes an issue, we could lock fsck
during gc.  We definitely do not want to lock fsck during operations
that may add objects to the object store; that would be problematic for
forges.

Note that refs aren't the only problem, though; reflog entries and index
entries could be problematic as well.  For now we punt on index entries
just leaving a TODO comment, and for reflogs we use a coarse solution of
taking the time at the beginning of the program and ignoring reflog
entries newer than that time.  That may be imperfect if dealing with a
network filesystem, so we leave TODO comment for those that want to
improve that handling as well.

As a high level overview:
  * In addition to fsck_handle_ref(), which now is only a few lines long
    to process a ref, there's also a snapshot_ref() which is called
    early in the program for each ref and takes all the error checking
    logic.
  * The iterating over refs that used to be in get_default_heads() plus
    a loop over the arguments now appears in shapshot_refs().
  * There's a new process_refs() as well that kind of looks like the old
    get_default_heads() though it is streamlined due to the work done by
    snapshot_refs().

This combination of changes modifies the output of running the script
(from the beginning of this commit message) to:

    $ ./fsck-while-writing.sh
    Checking ref database: 100% (1/1), done.
    Checking object directories: 100% (256/256), done.
    warning in tag d6602ec5194c87b0fc87103ca4d67251c76f233a: missingTaggerEntry: invalid format - expected 'tagger' line
    Checking objects: 100% (835091/835091), done.
    Checking connectivity: 833846, done.
    Verifying commits in commit graph: 100% (242243/242243), done.

While worries about live updates while running fsck is likely of most
interest for forge operators, it may also benefit those with
automated jobs (such as git maintenance) or even casual users who want
to do other work in their clone while fsck is running.

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

packfile: move MIDX into packfile store

The multi-pack index still is tracked as a member of the object database
source, but ultimately the MIDX is always tied to one specific packfile
store.

Move the structure into `struct packfile_store` accordingly. This
ensures that the packfile store now keeps track of all data related to
packfiles.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

packfile: refactor `find_pack_entry()` to work on the packfile store

The function `find_pack_entry()` doesn't work on a specific packfile
store, but instead works on the whole repository. This causes a bit of a
conceptual mismatch in its callers:

  - `packfile_store_freshen_object()` supposedly acts on a store, and
    its callers know to iterate through all sources already.

  - `packfile_store_read_object_info()` behaves likewise.

The only exception that doesn't know to handle iteration through sources
is `has_object_pack()`, but that function is trivial to adapt.

Refactor the code so that `find_pack_entry()` works on the packfile
store level instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

packfile: inline `find_kept_pack_entry()`

The `find_kept_pack_entry()` function is only used in
`has_object_kept_pack()`, which is only a trivial wrapper itself. Inline
the latter into the former.

Furthermore, reorder the code so that we can drop the declaration of the
function in "packfile.h". This allows us to make the function file-local.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

packfile: only prepare owning store in `packfile_store_prepare()`

When calling `packfile_store_prepare()` we prepare not only the provided
packfile store, but also all those of all other sources part of the same
object database. This was required when the store was still sitting on
the object database level. But now that it sits on the source level it's
not anymore.

Refactor the code so that we only prepare the single packfile store
passed by the caller. Adapt callers accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

packfile: only prepare owning store in `packfile_store_get_packs()`

When calling `packfile_store_get_packs()` we prepare not only the
provided packfile store, but also all those of all other sources part of
the same object database. This was required when the store was still
sitting on the object database level. But now that it sits on the source
level it's not anymore.

Adapt the code so that we only prepare the MIDX of the provided store.
All callers only work in the context of a single store or call the
function in a loop over all sources, so this change shouldn't have any
practical effects.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

packfile: move packfile store into object source

The packfile store is a member of `struct object_database`, which means
that we have a single store per database. This doesn't really make much
sense though: each source connected to the database has its own set of
packfiles, so there is a conceptual mismatch here. This hasn't really
caused much of a problem in the past, but with the advent of pluggable
object databases this is becoming more of a problem because some of the
sources may not even use packfiles in the first place.

Move the packfile store down by one level from the object database into
the object database source. This ensures that each source now has its
own packfile store, and we can eventually start to abstract it away
entirely so that the caller doesn't even know what kind of store it
uses.

Note that we only need to adjust a relatively small number of callers,
way less than one might expect. This is because most callers are using
`repo_for_each_pack()`, which handles enumeration of all packfiles that
exist in the repository. So for now, none of these callers need to be
adapted. The remaining callers that iterate through the packfiles
directly and that need adjustment are those that are a bit more tangled
with packfiles. These will be adjusted over time.

Note that this patch only moves the packfile store, and there is still a
bunch of functions that seemingly operate on a packfile store but that
end up iterating over all sources. These will be adjusted in subsequent
commits.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

packfile: refactor misleading code when unusing pack windows

The function `unuse_one_window()` is responsible for unmapping one of
the packfile windows, which is done when we have exceeded the allowed
number of window.

The function receives a `struct packed_git` as input, which serves as an
additional packfile that should be considered to be closed. If not
given, we seemingly skip that and instead go through all of the
repository's packfiles. The conditional that checks whether we have a
packfile though does not make much sense anymore, as we dereference the
packfile regardless of whether or not it is a `NULL` pointer to derive
the repository's packfile store.

The function was originally introduced via f0e17e86e1 (pack: move
release_pack_memory(), 2017-08-18), and here we indeed had a caller that
passed a `NULL` pointer. That caller was later removed via 9827d4c185
(packfile: drop release_pack_memory(), 2019-08-12), so starting with
that commit we always pass a `struct packed_git`. In 9c5ce06d74
(packfile: use `repository` from `packed_git` directly, 2024-12-03) we
then inadvertently started to rely on the fact that the pointer is never
`NULL` because we use it now to identify the repository.

Arguably, it didn't really make sense in the first place that the caller
provides a packfile, as the selected window would have been overridden
anyway by the subsequent loop over all packfiles if there was an older
window. So the overall logic is quite misleading overall. The only case
where it _could_ make a difference is when there were two packfiles with
the same `last_used` value, but that case doesn't ever happen because
the `pack_used_ctr` is strictly increasing.

Refactor the code so that we instead pass in the object database to
help make the code less misleading.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

packfile: refactor kept-pack cache to work with packfile stores

The kept pack cache is a cache of packfiles that are marked as kept
either via an accompanying ".kept" file or via an in-memory flag. The
cache can be retrieved via `kept_pack_cache()`, where one needs to pass
in a repository.

Ultimately though the kept-pack cache is a property of the packfile
store, and this causes problems in a subsequent commit where we want to
move down the packfile store to be a per-object-source entity.

Prepare for this and refactor the kept-pack cache to work on top of a
packfile store instead. While at it, rename both the function and flags
specific to the kept-pack cache so that they can be properly attributed
to the respective subsystems.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

packfile: pass source to `prepare_pack()`

When preparing a packfile we pass various pieces attached to the pack's
object database source via the `struct prepare_pack_data`. Refactor this
code to instead pass in the source directly. This reduces the number of
variables we need to pass and allows for a subsequent refactoring where
we start to prepare the pack via the source.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

packfile: create store via its owning source

In subsequent patches we're about to move the packfile store from the
object database layer into the object database source layer. Once done,
we'll have one packfile store per source, where the source is owning the
store.

Prepare for this future and refactor `packfile_store_new()` to be
initialized via an object database source instead of via the object
database itself.

This refactoring leads to a weird in-between state where the store is
owned by the object database but created via the source. But this makes
subsequent refactorings easier because we can now start to access the
owning source of a given store.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin.h: update documentation

The documentation for the builtin API was moved from the technical
documentation and into a comment in builtin.h by ec14d4ecb5 (builtin.h: take
over documentation from api-builtin.txt, 2017-08-02). This documentation
wasn't updated as part of the major overhaul to include a repository struct
in 9b1cb5070f (builtin: add a repository parameter for builtin functions,
2024-09-13).

There was a brief update regarding the move from *.txt to *.adoc by
e8015223c7 (builtin.h: *.txt -> *.adoc fixes, 2025-03-03).

I noticed that there was quite a bit missing from the old documentation,
which is still visible on git-scm.com [1].

[1] https://github.com/git/git-scm.com/issues/2124

This change updates the documentation in the following ways:

1. Updates the cmd_foo() prototype to include a repository.
2. Adds some newlines to have uniformity in the list of flags.
3. Adds a description of the NO_PARSEOPT flag.
4. Describes the tests that perform checks on all builtins, which may trip
up a contributor working on a new builtin.

I double-checked these instructions against a toy example in my local branch
to be sure that it was complete.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t7101: modernize test path checks

Replace old-style `test -[df]` and `! test -[df]` assertions with
the modern `test_path_is_file`, `test_path_is_dir`, and
`test_path_is_missing` helpers.

These helpers provide more informative error messages in case of
failure (e.g., "File 'foo' is missing" instead of just exit code 1).

While at it, fix a typo and an incorrect path
reference in one of the test descriptions.

Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

gitfaq: document using stash import/export to sync working tree

Git 2.51 learned how to import and export stashes. This is a
secure and robust way to transfer working tree states across machines
and comes with almost none of the pitfalls of rsync or other tools.
Recommend this as an alternative in the FAQ.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

doc: git-blame: convert to new doc format

- Use _<placeholder>_ instead of <placeholder> in the description
- Use _underscores_ around math associated with <placeholders>
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.

Signed-off-by: Michael Lyons <git@michael.lyo.nz>
Acked-by: Jean-Noël AVILA <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

doc: blame-options: convert to new doc format

- Use _<placeholder>_ instead of <placeholder> in the description
- Modify some samples to use <placeholders>
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.

Signed-off-by: Michael Lyons <git@michael.lyo.nz>
Acked-by: Jean-Noël AVILA <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

add -p: show user's hunk decision when selecting hunks

When a user is interactively deciding which hunks to use or skip for
staging, unstaging, stashing etc, there is no way to know the
decision previously chosen for a hunk when navigating through the
previous and next hunks using K/J respectively.

Improve the UI to explicitly show if a user has previously decided to
use a hunk (by pressing 'y') or skip the hunk (by pressing 'n').
This will improve clarity when and aid the navigation process for the
user.

Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Abraham Samuel Adekunle <abrahamadekunle50@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

doc: patch-id: --verbatim locks in --stable

The default `--unstable` is a legacy format that predates `--stable`.
That’s why 2871f4d4 (builtin: patch-id: add --verbatim as a command mode,
2022-10-24) made `--verbatim` lock in[1] `--stable`:

    Users of --unstable mainly care about compatibility with old git
    versions, which unstripping the whitespace would break. Thus there
    isn't a usecase for the combination of --verbatim and --unstable,
    and we don't expose this so as to not add maintainence burden.

† 1: imply `--stable`, disallow `--unstable`

Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

doc: patch-id: spell out the git-diff-tree(1) form

You specifically need `--patch` since the default output is a raw diff.

Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

doc: patch-id: use definite article for the result

Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

patch-id: use “patch ID” throughout

The “Description” section decided to introduce and use the term “patch
ID” for the ID value itself. Let’s use the same term on the options as
well.

Also make to sure to use bare “ID” instead of “id”.

Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

doc: patch-id: capitalize Git version

Git versions are always capitalized.

Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

doc: patch-id: don’t use semicolon between bullet points

These bullet points are full-fledged paragraphs with sentences. It’s
best to restrict semicolon-termination to the case when the bullet list
amounts to a list of items.[1]

† 1: Like “List: ... • first; ... • second; and ... • third.”

Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

The 16th batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'en/ort-recursive-d-f-conflict-fix'

The ort merge machinery hit an assertion failure in a history with
criss-cross merges renamed a directory and a non-directory, which
has been corrected.

* en/ort-recursive-d-f-conflict-fix:
merge-ort: fix corner case recursive submodule/directory conflict handling

Merge branch 'dd/t5403-modernise'

Test micro-clean-up.

* dd/t5403-modernise:
t5403: use test_path_is_file instead of test -f

Merge branch 'ds/diff-lazy-fetch-with-name-only-fix'

Running "git diff" with "--name-only" and other options that allows
us not to look at the blob contents, while objects that are lazily
fetched from a promisor remote, caused use-after-free, which has
been corrected.

* ds/diff-lazy-fetch-with-name-only-fix:
diff: avoid segfault with freed entries

Merge branch 'rs/tag-wo-the-repository'

Code clean-up.

* rs/tag-wo-the-repository:
  tag: stop using the_repository
  tag: support arbitrary repositories in parse_tag()
  tag: support arbitrary repositories in gpg_verify_tag()
  tag: use algo of repo parameter in parse_tag_buffer()

t5550: add netrc tests for http 401/403

git allows using .netrc file to supply credentials for HTTP auth.
Three test cases are added in this patch to provide missing coverage
when cloning over HTTP using .netrc file:

  - First test case checks that the git clone is successful when credentials
    are provided via .netrc file
  - Second test case checks that the git clone fails when the .netrc file
    provides invalid credentials. The HTTP server is expected to return
    401 Unauthorized in such a case. The test checks that the user is
    provided with a prompt for username/password on 401 to provide
    the valid ones.
  - Third test case checks that the git clone fails when the .netrc file
    provides credentials that are valid but do not have permission for
    this user. For example one may have multiple tokens in GitHub
    and uses the one which was not authorized for cloning this repo.
    In such a case the HTTP server returns 403 Forbidden.
    For this test, the apache.conf is modified to return a 403
    on finding a forbidden-user. No prompt for username/password is
    expected after the 403 (unlike 401). This is because prompting may wipe
    out existing credentials or conflict with custom credential helpers.

Signed-off-by: Ashlesh Gawande <git@ashlesh.me>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'kh/replay-invalid-onto-advance' into ps/history

* kh/replay-invalid-onto-advance:
  t3650: add more regression tests for failure conditions
  replay: die if we cannot parse object
  replay: improve code comment and die message
  replay: die descriptively when invalid commit-ish is given
  replay: find *onto only after testing for ref name
  replay: remove dead code and rearrange

Merge branch 'ps/odb-misc-fixes' into ps/packfile-store-in-odb-source

* ps/odb-misc-fixes:
odb: properly close sources before freeing them
builtin/gc: fix condition for whether to write commit graphs

t1420: modernize the lost-found test

This test indirectly checks that the lost-found folder has 2 files in it
and then checks that the expected two files exist. Make this more
deliberate by removing the old test -f and compare the actual ls of the
lost-found directory with the expected files.

Signed-off-by: Andrew Chitester <andchi@fastmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>