git.ipfire.org Git - thirdparty/git.git/log

builtin/notes: fix leaking `struct notes_tree` when merging notes

We allocate a `struct notes_tree` in `merge_commit()` which we then
initialize via `init_notes()`. It's not really necessary to allocate the
structure though given that we never pass ownership to the caller.
Furthermore, the allocation leads to a memory leak because despite its
name, `free_notes()` doesn't free the `notes_tree` but only clears it.

Fix this issue by converting the code to use an on-stack variable.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/rebase: fix leaking `commit.gpgsign` value

In `get_replay_opts()`, we override the `gpg_sign` field that already
got populated by `sequencer_init_config()` in case the user has
"commit.gpgsign" set in their config. This creates a memory leak because
we overwrite the previously assigned value, which may have already
pointed to an allocated string.

Let's plug the memory leak by freeing the value before we overwrite it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config: fix leaking comment character config

When the comment line character has been specified multiple times in the
configuration, then `git_default_core_config()` will cause a memory leak
because it unconditionally copies the string into `comment_line_str`
without free'ing the previous value. In fact, it can't easily free the
value in the first place because it may contain a string constant.

Refactor the code such that we track allocated comment character strings
via a separate non-constant variable `comment_line_str_to_free`. Adapt
sites that set `comment_line_str` to set both and free the old value
that was stored in `comment_line_str_to_free`.

This memory leak is being hit in t3404. As there are still other memory
leaks in that file we cannot yet mark it as passing with leak checking
enabled.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

submodule-config: fix leaking name entry when traversing submodules

We traverse through submodules in the tree via `tree_entry()`, passing
to it a `struct name_entry` that it is supposed to populate with the
tree entry's contents. We unnecessarily allocate this variable instead
of passing a variable that is allocated on the stack, and the ultimately
don't even free that variable. This is unnecessary and leaks memory.

Convert the variable to instead be allocated on the stack to plug the
memory leak.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

read-cache: fix leaking hashfile when writing index fails

In `do_write_index()`, we use a `struct hashfile` to write the index
with a trailer hash. In case the write fails though, we never clean up
the allocated `hashfile` state and thus leak memory.

Refactor the code to have a common exit path where we can free this and
other allocated memory. While at it, refactor our use of `strbuf`s such
that we reuse the same buffer to avoid some unneeded allocations.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

bulk-checkin: fix leaking state TODO

When flushing a bulk-checking to disk we also reset the `struct
bulk_checkin_packfile` state. But while we free some of its members,
others aren't being free'd, leading to memory leaks:

  - The temporary packfile name is not getting freed.

  - The `struct hashfile` only gets freed in case we end up calling
    `finalize_hashfile()`. There are code paths though where that is not
    the case, namely when nothing has been written. For this, we need to
    make `free_hashfile()` public.

Fix those leaks.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

object-name: fix leaking symlink paths in object context

The object context may be populated with symlink contents when reading a
symlink, but the associated strbuf doesn't ever get released when
releasing the object context, causing a memory leak. Plug it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

object-file: fix memory leak when reading corrupted headers

When reading corrupt object headers in `read_loose_object()`, we bail
out immediately. This causes a memory leak though because we would have
already initialized the zstream in `unpack_loose_header()`, and it is
the callers responsibility to finish the zstream even on error. While
this feels weird, other callsites do it correctly already.

Fix this leak by ending the zstream even on errors. We may want to
revisit this interface in the future such that the callee handles this
for us already when there was an error.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

git: fix leaking system paths

Git has some flags to make it output system paths as they have been
compiled into Git. This is done by calling `system_path()`, which
returns an allocated string. This string isn't ever free'd though,
creating a memory leak.

Plug those leaks. While they are surfaced by t0211, there are more
memory leaks looming exposed by that test suite and it thus does not yet
pass with the memory leak checker enabled.

Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

remote: plug memory leak when aliasing URLs

When we have a `url.*.insteadOf` configuration, then we end up aliasing
URLs when populating remotes. One place where this happens is in
`alias_all_urls()`, where we loop through all remotes and then alias
each of their URLs. The actual aliasing logic is then contained in
`alias_url()`, which returns an allocated string that contains the new
URL. This URL replaces the old URL that we have in the strvec that
contains all remote URLs.

We replace the remote URLs via `strvec_replace()`, which does not hand
over ownership of the new string to the vector. Still, we didn't free
the aliased URL and thus have a memory leak here. Fix it by freeing the
aliased string.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t9001-send-email.sh: fix quoting for mailrc --dump-aliases test

The .mailrc alias file format documents that multiple addresses are
separated by spaces. The alias file used in the t9001 --dump-aliases
mailrc test have addresses which include both a name and email. These
are unquoted, so git send-email will parse this as an alias that
translates to multiple independent addresses.

The existing test does not care about this, as --dump-aliases only dumps
the alias and not the address. However, it is incorrect for a future
where --dump-aliases could also dump the mail addresses.

Fix the test to quote the aliases properly, so that they translate to a
single address.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: drop unused parameters from add_midx_to_chain()

When loading a chained midx, we build up an array of hashes, one per
layer of the chain. But since the chain is also represented by the
linked list of multi_pack_index structs, nobody actually reads this
array. We pass it to add_midx_to_chain(), but the parameters are
completely ignored.

So we can drop those unused parameters. And then we can see that its
sole caller, load_midx_chain_fd_st(), only cares about one layer hash at a
time (for parsing each line and feeding it to the single-layer midx
code). So we can replace the array with a single object_id on the stack.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

bundle: default to SHA1 when reading bundle headers

We hit a segfault when trying to open a bundle via `git bundle
list-heads` when running outside of a repository. This is caused by
c8aed5e8da (repository: stop setting SHA1 as the default object hash,
2024-05-07), which stopped setting the default object hash so that
`the_hash_algo` is a `NULL` pointer when running outside of any repo.

This is only a symptom of a deeper issue though. Bundles default to the
SHA1 object format unless they advertise an "@object-format=" header.
Consequently, it has been wrong in the first place to use the object
format used by the current repository when parsing bundles. The
consequence is that trying to open a bundle that uses a different object
hash than the current repository will fail:

$ git bundle list-heads sha1.bundle
error: unrecognized header: ee4b540943284700a32591ad09f7e15bdeb2a10c HEAD (45)

Fix the bug by defaulting to the SHA1 object hash. We already handle the
"@object-format=" header as expected, so we don't need to adapt this
part.

Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/bundle: have unbundle check for repo before opening its bundle

The `git bundle unbundle` subcommand requires a repository to unbundle
the contents into. As thus, the subcommand checks whether we have a
startup repository in the first place, and if not it dies.

This check happens after we have already opened the bundle though. This
causes a segfault when running outside of a repository starting with
c8aed5e8da (repository: stop setting SHA1 as the default object hash,
2024-05-07) because we have no hash function set up, but we do try to
parse refs advertised by the bundle's header.

The next commit will fix that underlying issue by defaulting to the SHA1
object format for bundles, which will also fix the described segfault here.
But as we know that we will die anyway, we can do better than that and
avoid some vain work by moving the check for a repository before we try
to open the bundle.

Reported-by: ArcticLampyrid <ArcticLampyrid@outlook.com>
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t-reftable-readwrite: add test for known error

When using reftable_writer_add_ref() to add a ref record to a
reftable writer, The update_index of the ref record must be within
the limits set by reftable_writer_set_limits(), or REFTABLE_API_ERROR
is returned. This scenario is currently left untested. Add a test
case for the same.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t-reftable-readwrite: use 'for' in place of infinite 'while' loops

Using a for loop with an empty conditional statement is more concise
and easier to read than an infinite 'while' loop in instances
where we need a loop variable. Hence, replace such instances of a
'while' loop with the equivalent 'for' loop.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t-reftable-readwrite: use free_names() instead of a for loop

free_names() as defined by reftable/basics.{c,h} frees a NULL
terminated array of malloced strings along with the array itself.
Use this function instead of a for loop to free such an array.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t: move reftable/readwrite_test.c to the unit testing framework

reftable/readwrite_test.c exercises the functions defined in
reftable/reader.{c,h} and reftable/writer.{c,h}. Migrate
reftable/readwrite_test.c to the unit testing framework. Migration
involves refactoring the tests to use the unit testing framework
instead of reftable's test framework and renaming the tests to
align with unit-tests' naming conventions.

Since some tests in reftable/readwrite_test.c use the functions
set_test_hash(), noop_flush() and strbuf_add_void() defined in
reftable/test_framework.{c,h} but these files are not #included
in the ported unit test, copy these functions in the new test file.

While at it, ensure structs are 0-initialized with '= { 0 }'
instead of '= { NULL }'.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config: hide functions using `the_repository` by default

The config subsystem provides a bunch of legacy functions that read or
set configuration for `the_repository`. The use of those functions is
discouraged, and it is easy to miss the implicit dependency on
`the_repository` that calls to those functions may cause.

Move all config-related functions that use `the_repository` into a block
that gets only conditionally compiled depending on whether or not the
macro has been defined. This also removes all dependencies on that
variable in "config.c", allowing us to remove the definition of said
preprocessor macro.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

global: prepare for hiding away repo-less config functions

We're about to hide config functions that implicitly depend on
`the_repository` behind the `USE_THE_REPOSITORY_VARIABLE` macro. This
will uncover a bunch of dependents that transitively relied on the
global variable, but didn't define the macro yet.

Adapt them such that we define the macro to prepare for this change.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config: don't depend on `the_repository` with branch conditions

When computing branch "includeIf" conditions we use `the_repository` to
obtain the main ref store. We really shouldn't depend on this global
repository though, but should instead use the repository that is being
passed to us via `struct config_include_data`. Otherwise, when parsing
configuration of e.g. submodules, we may end up evaluating the condition
the via the wrong refdb.

Fix this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config: don't have setters depend on `the_repository`

Some of the setters that accept a `struct repository` still implicitly
rely on `the_repository` via `git_config_set_multivar_in_file()`. While
this function would typically use the caller-provided path, it knows to
fall back to using the configuration path indicated by `the_repository`.

Adapt those functions to instead use the caller-provided repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config: pass repo to functions that rename or copy sections

Refactor functions that rename or copy config sections to accept a
`struct repository` such that we can get rid of the implicit dependency
on `the_repository`. Rename the functions accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config: pass repo to `git_die_config()`

Refactor `git_die_config()` to accept a `struct repository` such that we
can get rid of the implicit dependency on `the_repository`. Rename the
function accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config: pass repo to `git_config_get_expiry_in_days()`

Refactor `git_config_get_expiry_in_days()` to accept a `struct
repository` such that we can get rid of the implicit dependency on
`the_repository`. Rename the function accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config: pass repo to `git_config_get_expiry()`

Refactor `git_config_get_expiry()` to accept a `struct repository` such
that we can get rid of the implicit dependency on `the_repository`.
Rename the function accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config: pass repo to `git_config_get_max_percent_split_change()`

Refactor `git_config_get_max_percent_split_change()` to accept a `struct
repository` such that we can get rid of the implicit dependency on
`the_repository`. Rename the function accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config: pass repo to `git_config_get_split_index()`

Refactor `git_config_get_split_index()` to accept a `struct repository`
such that we can get rid of the implicit dependency on `the_repository`.
Rename the function accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config: pass repo to `git_config_get_index_threads()`

Refactor `git_config_get_index_threads()` to accept a `struct
repository` such that we can get rid of the implicit dependency on
`the_repository`. Rename the function accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config: expose `repo_config_clear()`

While we already have `repo_config_clear()` as an alternative to
`git_config_clear()` that doesn't rely on `the_repository`, it is not
exposed to callers outside of the config subsystem. Do so.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config: introduce missing setters that take repo as parameter

While we already provide some of the config-setting interfaces with a
`struct repository` as parameter, others only have a variant that
implicitly depends on `the_repository`. Fill in those gaps such that we
can start to deprecate the repo-less variants.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

path: hide functions using `the_repository` by default

The path subsystem provides a bunch of legacy functions that compute
paths relative to the "gitdir" and "commondir" directories of the global
`the_repository` variable. Use of those functions is discouraged, and it
is easy to miss the implicit dependency on `the_repository` that calls
to those functions may cause.

With `USE_THE_REPOSITORY_VARIABLE`, we have recently introduced a tool
that allows us to get rid of such functions over time. With this macro,
we can hide away functions that have such implicit dependency such that
other subsystems that want to be free of `the_repository` will not use
them by accident.

Move all path-related functions that use `the_repository` into a block
that gets only conditionally compiled depending on whether or not the
macro has been defined. This also removes all dependencies on that
variable in "path.c", allowing us to remove the definition of said
preprocessor macro.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

path: stop relying on `the_repository` in `worktree_git_path()`

When not provided a worktree, then `worktree_git_path()` will fall back
to returning a path relative to the main repository. In this case, we
implicitly rely on `the_repository` to derive the path. Remove this
dependency by passing a `struct repository` as parameter.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

path: stop relying on `the_repository` when reporting garbage

We access `the_repository` in `report_linked_checkout_garbage()` both
directly and indirectly via `get_git_dir()`. Remove this dependency by
instead passing a `struct repository` as parameter.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

hooks: remove implicit dependency on `the_repository`

We implicitly depend on `the_repository` in our hook subsystem because
we use `strbuf_git_path()` to compute hook paths. Remove this dependency
by accepting a `struct repository` as parameter instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

editor: do not rely on `the_repository` for interactive edits

We implicitly rely on `the_repository` when editing a file interactively
because we call `git_path()`. Adapt the function to instead take a
`struct repository` as a parameter so that we can remove this hidden
dependency.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

path: expose `do_git_common_path()` as `repo_common_pathv()`

With the same reasoning as the preceding commit, expose the function
`do_git_common_path()` as `repo_common_pathv()`. While at it, reorder
parameters such that they match the order we have in `repo_git_pathv()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

path: expose `do_git_path()` as `repo_git_pathv()`

We're about to move functions of the "path" subsytem that do not use a
`struct repository` into "path.h" as static inlined functions. This will
require us to call `do_git_path()`, which is internal to "path.c".

Expose the function as `repo_git_pathv()` to prepare for the change.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

remerge-diff: clean up temporary objdir at a central place

After running a diff between two things, or a series of diffs while
walking the history, the diff computation is concluded by a call to
diff_result_code() to extract the exit status of the diff machinery.

The function can work on "struct diffopt", but all the callers
historically and currently pass "struct diffopt" that is embedded in
the "struct rev_info" that is used to hold the remerge_diff bit and
the remerge_objdir variable that points at the temporary object
directory in use.

Redefine diff_result_code() to take the whole "struct rev_info" to
give it an access to these members related to remerge-diff, so that
it can get rid of the temporary object directory for any and all
callers that used the feature. We can lose the equivalent code to
do so from the code paths for individual commands, diff-tree, diff,
and log.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

remerge-diff: lazily prepare temporary objdir on demand

It is error prone for each caller that sets revs.remerge_diff bit
to be responsible for preparing a temporary object directory and
rotate it into the list of alternate object stores, making it the
primary object store.

Instead, remove the code to set up and arrange the temporary object
directory from the current callers and implement it in the code that
runs remerge-diff logic. The code to undo the futzing of the list
of alternate object store is still spread across the callers, but we
will deal with it in future steps.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

doc: grammofix in git-diff-tree

Describe in present tense what the option does when it is given.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

tutorial: grammofix

We say "these", so "range notations" must be plural.

Reported-by: Furkan Akkurt
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ref-filter: populate symref from iterator

With a previous commit, the reference the symbolic ref points to is saved
in the ref iterator records. Instead of making a separate call to
resolve_refdup() each time, we can just populate the ref_array_item with
the value from the iterator.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs: add referent to each_ref_fn

Add a parameter to each_ref_fn so that callers to the ref APIs
that use this function as a callback can have acess to the
unresolved value of a symbolic ref.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs: keep track of unresolved reference value in iterators

Since ref iterators do not hold onto the direct value of a reference
without resolving it, the only way to get ahold of a direct value of a
symbolic ref is to make a separate call to refs_read_symbolic_ref.

To make accessing the direct value of a symbolic ref more efficient,
let's save the direct value of the ref in the iterators for both the
files backend and the reftable backend.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

diff-tree: fix crash when used with --remerge-diff

When using "git-diff-tree" to get the tree diff for merge commits with
the diff format set to `remerge`, a bug is triggered as shown below:

  $ git diff-tree -r --remerge-diff 363337e6eb
  363337e6eb812d0c0d785ed4261544f35559ff8b
  BUG: log-tree.c:1006: did a remerge diff without remerge_objdir?!?

This bug is reported by `log-tree.c:do_remerge_diff`, where a bug check
added in commit 7b90ab467a (log: clean unneeded objects during log
--remerge-diff, 2022-02-02) detects the absence of `remerge_objdir` when
attempting to clean up temporary objects generated during the remerge
process.

After some further digging, I find that the remerge-related diff options
were introduced in db757e8b8d (show, log: provide a --remerge-diff
capability, 2022-02-02), which also affect the setup of `rev_info` for
"git-diff-tree", but were not accounted for in the original
implementation (inferred from the commit message).

Elijah Newren, the author of the remerge diff feature, notes that other
callers of `log-tree.c:log_tree_commit` (the only caller of
`log-tree.c:do_remerge_diff`) also exist, but:

  `builtin/am.c`: manually sets all flags; remerge_diff is not among them
  `sequencer.c`: manually sets all flags; remerge_diff is not among them

so `builtin/diff-tree.c` really is the only caller that was overlooked
when remerge-diff functionality was added.

This commit resolves the crash by adding `remerge_objdir` setup logic to
`builtin/diff-tree.c`, mirroring `builtin/log.c:cmd_log_walk_no_free`.
It also includes the necessary cleanup for `remerge_objdir`.

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

tests: drop use of 'tee' that hides exit status

A few tests have "| tee output" downstream of a git command, and
then inspect the contents of the file. The net effect is that we
use an extra process, and hide the exit status from the upstream git
command.

In any of these tests, I do not see a reason why we want to hide a
possible failure from these git commands. Replace the use of tee
with a plain simple redirection.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

The third batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'ps/p4-tests-updates'

Perforce tests have been updated.

* ps/p4-tests-updates:
  t98xx: mark Perforce tests as memory-leak free
  ci: update Perforce version to r23.2
  t98xx: fix Perforce tests with p4d r23 and newer

Merge branch 'dh/encoding-trace-optim'

An expensive operation to prepare tracing was done in re-encoding
code path even when the tracing was not requested, which has been
corrected.

* dh/encoding-trace-optim:
convert: return early when not tracing

Merge branch 'ps/doc-more-c-coding-guidelines'

Some project conventions have been added to CodingGuidelines.

* ps/doc-more-c-coding-guidelines:
  Documentation: consistently use spaces inside initializers
  Documentation: document idiomatic function names
  Documentation: document naming schema for structs and their functions
  Documentation: clarify indentation style for C preprocessor directives
  clang-format: fix indentation width for preprocessor directives

Merge branch 'rs/grep-omit-blank-lines-after-function-at-eof'

"git grep -W" omits blank lines that follow the found function at
the end of the file, just like it omits blank lines before the next
function.

* rs/grep-omit-blank-lines-after-function-at-eof:
grep: -W: skip trailing empty lines at EOF, too

Merge branch 'dd/notes-empty-no-edit-by-default'

"git notes add -m '' --allow-empty" and friends that take prepared
data to create notes should not invoke an editor, but it started
doing so since Git 2.42, which has been corrected.

* dd/notes-empty-no-edit-by-default:
notes: do not trigger editor when adding an empty note

Merge branch 'es/shell-check-updates'

Test script linter has been updated to catch an attempt to use
one-shot export construct "VAR=VAL func" for shell functions (which
does not work for some shells) better.

* es/shell-check-updates:
  check-non-portable-shell: improve `VAR=val shell-func` detection
  check-non-portable-shell: suggest alternative for `VAR=val shell-func`
  check-non-portable-shell: loosen one-shot assignment error message
  t4034: fix use of one-shot variable assignment with shell function
  t3430: drop unnecessary one-shot "VAR=val shell-func" invocation

Merge branch 'rj/add-p-pager'

A 'P' command to "git add -p" that passes the patch hunk to the
pager has been added.

* rj/add-p-pager:
  add-patch: render hunks through the pager
  pager: introduce wait_for_pager
  pager: do not close fd 2 unnecessarily
  add-patch: test for 'p' command

Merge branch 'ks/unit-test-comment-typofix'

Typofix.

* ks/unit-test-comment-typofix:
unit-tests/test-lib: fix typo in check_pointer_eq() description

t7004: make use of write_script

Use write_script which takes care of emitting the `#!/bin/sh` line
and the `chmod +x`.

Signed-off-by: AbdAlRahman Gad <abdobngad@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t7004: use single quotes instead of double quotes

Some test bodies and test description are surrounded with double
quotes instead of single quotes, violating our coding style.

Signed-off-by: AbdAlRahman Gad <abdobngad@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t7004: begin the test body on the same line as test_expect_success

Test body should begin with a single quote right after the test
description instead of backslash followed by new line.

Signed-off-by: AbdAlRahman Gad <abdobngad@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t7004: description on the same line as test_expect_success

There are several tests in t7004 where the test description that
follows `test_expect_success` is on a separate line, violating our
coding style. Adapt these to be on the same line.

Signed-off-by: AbdAlRahman Gad <abdobngad@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t7004: do not prepare things outside test_expect_success

Do not prepare expect and other things outside test_expect_success.
If such code fails for some reason, we won't necessarily hear about
it in a timely fashion (or perhaps at all). By placing all code inside
`test_expect_success` it ensures that we know immediately if it fails.

Also add '\' before EOF to avoid shell interpolation and '-' to allow
indentation of the body.

Signed-off-by: AbdAlRahman Gad <abdobngad@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t7004: use indented here-doc

Use <<-\EOF instead of <<\EOF where the latter allows us to indent
the body of the here-doc.

Signed-off-by: AbdAlRahman Gad <abdobngad@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t7004: one command per line

One of the tests in t7004 has multiple commands on a single line,
which is discouraged. Adapt these by splitting up these into one
line per command.

Signed-off-by: AbdAlRahman Gad <abdobngad@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t7004: remove space after redirect operators

Modernize 't7004' by removing whitespace after redirect operators.

Signed-off-by: AbdAlRahman Gad <abdobngad@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/stack: handle locked tables during auto-compaction

When compacting tables, it may happen that we want to compact a set of
tables which are already locked by a concurrent process that compacts
them. In the case where we wanted to perform a full compaction of all
tables it is sensible to bail out in this case, as we cannot fulfill the
requested action.

But when performing auto-compaction it isn't necessarily in our best
interest of us to abort the whole operation. For example, due to the
geometric compacting schema that we use, it may be that process A takes
a lot of time to compact the bulk of all tables whereas process B
appends a bunch of new tables to the stack. B would in this case also
notice that it has to compact the tables that process A is compacting
already and thus also try to compact the same range, probably including
the new tables it has appended. But because those tables are locked
already, it will fail and thus abort the complete auto-compaction. The
consequence is that the stack will grow longer and longer while A isn't
yet done with compaction, which will lead to a growing performance
impact.

Instead of aborting auto-compaction altogether, let's gracefully handle
this situation by instead compacting tables which aren't locked. To do
so, instead of locking from the beginning of the slice-to-be-compacted,
we start locking tables from the end of the slice. Once we hit the first
table that is locked already, we abort. If we succeeded to lock two or
more tables, then we simply reduce the slice of tables that we're about
to compact to those which we managed to lock.

This ensures that we can at least make some progress for compaction in
said scenario. It also helps in other scenarios, like for example when a
process died and left a stale lockfile behind. In such a case we can at
least ensure some compaction on a best-effort basis.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/stack: fix corruption on concurrent compaction

The locking employed by compaction uses the following schema:

  1. Lock "tables.list" and verify that it matches the version we have
     loaded in core.

  2. Lock each of the tables in the user-supplied range of tables that
     we are supposed to compact. These locks prohibit any concurrent
     process to compact those tables while we are doing that.

  3. Unlock "tables.list". This enables concurrent processes to add new
     tables to the stack, but also allows them to compact tables outside
     of the range of tables that we have locked.

  4. Perform the compaction.

  5. Lock "tables.list" again.

  6. Move the compacted table into place.

  7. Write the new order of tables, including the compacted table, into
     the lockfile.

  8. Commit the lockfile into place.

Letting concurrent processes modify the "tables.list" file while we are
doing the compaction is very much part of the design and thus expected.
After all, it may take some time to compact tables in the case where we
are compacting a lot of very large tables.

But there is a bug in the code. Suppose we have two processes which are
compacting two slices of the table. Given that we lock each of the
tables before compacting them, we know that the slices must be disjunct
from each other. But regardless of that, compaction performed by one
process will always impact what the other process needs to write to the
"tables.list" file.

Right now, we do not check whether the "tables.list" has been changed
after we have locked it for the second time in (5). This has the
consequence that we will always commit the old, cached in-core tables to
disk without paying to respect what the other process has written. This
scenario would then lead to data loss and corruption.

This can even happen in the simpler case of one compacting process and
one writing process. The newly-appended table by the writing process
would get discarded by the compacting process because it never sees the
new table.

Fix this bug by re-checking whether our stack is still up to date after
locking for the second time. If it isn't, then we adjust the indices of
tables to replace in the updated stack.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/stack: use lock_file when adding table to "tables.list"

When modifying "tables.list", we need to lock the list before updating
it to ensure that no concurrent writers modify the list at the same
point in time. While we do this via the `lock_file` subsystem when
compacting the stack, we manually handle the lock when adding a new
table to it. While not wrong, it is at least inconsistent.

Refactor the code to consistently lock "tables.list" via the `lock_file`
subsytem.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/stack: do not die when fsyncing lock file files

We use `fsync_component_or_die()` when committing an addition to the
"tables.list" lock file, which unsurprisingly dies in case the fsync
fails. Given that this is part of the reftable library, we should never
die and instead let callers handle the error.

Adapt accordingly and use `fsync_component()` instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/stack: simplify tracking of table locks

When compacting tables, we store the locks of all tables we are about to
compact in the `table_locks` array. As we currently only ever compact
all tables in the user-provided range or none, we simply track those
locks via the indices of the respective tables in the merged stack.

This is about to change though, as we will introduce a mode where auto
compaction gracefully handles the case of already-locked files. In this
case, it may happen that we only compact a subset of the user-supplied
range of tables. In this case, the indices will not necessarily match
the lock indices anymore.

Refactor the code such that we track the number of locks via a separate
variable. The resulting code is expected to perform the same, but will
make it easier to perform the described change.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/stack: update stats on failed full compaction

When auto-compaction fails due to a locking error, we update the
statistics to indicate this failure. We're not doing the same when
performing a full compaction.

Fix this inconsistency by using `stack_compact_range_stats()`, which
handles the stat update for us.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/stack: test compaction with already-locked tables

We're lacking test coverage for compacting tables when some of the
tables that we are about to compact are locked. Add two tests that
exercise this, one for auto-compaction and one for full compaction.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/stack: extract function to setup stack with N tables

We're about to add two tests, and both of them will want to initialize
the reftable stack with a set of N tables. Introduce a new function that
handles this and refactor existing tests that use such a setup to use
it.

Note that this changes the exact records contained in the preexisting
tests. This is fine though as we only care about the shape of the stack
here, not the shape of each table.

Furthermore, with this change we now start to disable auto compaction
when writing the tables, as otherwise we might not end up with the
expected amount of new tables added. This also slightly changes the
behaviour of these tests, but the properties we care for remain intact.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/stack: refactor function to gather table sizes

Refactor the function that gathers table sizes to be more idiomatic. For
one, use `REFTABLE_CALLOC_ARRAY()` instead of `reftable_calloc()`.
Second, avoid using an integer to iterate through the tables in the
reftable stack given that `stack_len` itself is using a `size_t`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fsck: add ref name check for files backend

The git-fsck(1) only implicitly checks the reference, it does not fully
check refs with bad format name such as standalone "@".

However, a file ending with ".lock" should not be marked as having a bad
ref name. It is expected that concurrent writers may have such lock files.
We currently ignore this situation. But for bare ".lock" file, we will
report it as error.

In order to provide such checks, add a new fsck message id "badRefName"
with default ERROR type. Use existing "check_refname_format" to explicit
check the ref name. And add a new unit test to verify the functionality.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

files-backend: add unified interface for refs scanning

For refs and reflogs, we need to scan its corresponding directories to
check every regular file or symbolic link which shares the same pattern.
Introduce a unified interface for scanning directories for
files-backend.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/refs: add verify subcommand

Introduce a new subcommand "verify" in git-refs(1) to allow the user to
check the reference database consistency and also this subcommand will
be used as the entry point of checking refs for "git-fsck(1)".

Add "verbose" field into "fsck_options" to indicate whether we should
print verbose messages when checking refs and objects consistency.

Remove bit-field for "strict" field, this is because we cannot take
address of a bit-field which makes it unhandy to set member variables
when parsing the command line options.

The "git-fsck(1)" declares "fsck_options" variable with "static"
identifier which avoids complaint by the leak-checker. However, in
"git-refs verify", we need to do memory clean manually. Thus add
"fsck_options_clear" function in "fsck.c" to provide memory clean
operation.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs: set up ref consistency check infrastructure

The "struct ref_store" is the base class which contains the "be" pointer
which provides backend-specific functions whose interfaces are defined
in the "ref_storage_be". We could reuse this polymorphism to define only
one interface. For every backend, we need to provide its own function
pointer.

The interfaces defined in the `ref_storage_be` are carefully structured
in semantic. It's organized as the five parts:

1. The name and the initialization interfaces.
2. The ref transaction interfaces.
3. The ref internal interfaces (pack, rename and copy).
4. The ref filesystem interfaces.
5. The reflog related interfaces.

To keep consistent with the git-fsck(1), add a new interface named
"fsck_refs_fn" to the end of "ref_storage_be". This semantic cannot be
grouped into any above five categories. Explicitly add blank line to
make it different from others.

Last, implement placeholder functions for each ref backends.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fsck: add refs report function

Introduce a new struct "fsck_ref_report" to contain the information we
need when reporting refs-related messages.

With the new "fsck_vreport" function, add a new function
"fsck_report_ref" to report refs-related fsck error message. Unlike
"report" function uses the exact parameters, we simply pass "struct
fsck_ref_report *report" as the parameter. This is because at current we
don't know exactly how many fields we need. By passing this parameter,
we don't need to change this function prototype when we want to add more
information into "fsck_ref_report".

We have introduced "fsck_report_ref" function to report the error
message for refs. We still need to add the corresponding callback
function. Create refs-specific "error_func" callback
"fsck_refs_error_function".

Last, add "FSCK_REFS_OPTIONS_DEFAULT" macro to create default options
when checking ref consistency.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fsck: add a unified interface for reporting fsck messages

The static function "report" provided by "fsck.c" aims at checking error
type and calling the callback "error_func" to report the message. Both
refs and objects need to check the error type of the current fsck
message. In order to extract this common behavior, create a new function
"fsck_vreport". Instead of using "...", provide "va_list" to allow more
flexibility.

Instead of changing "report" prototype to be align with the
"fsck_vreport" function, we leave the "report" prototype unchanged due
to the reason that there are nearly 62 references about "report"
function. Simply change "report" function to use "fsck_vreport" to
report objects related messages.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fsck: make "fsck_error" callback generic

The "fsck_error" callback is designed to report the objects-related
error messages. It accepts two parameter "oid" and "object_type" which
is not generic. In order to provide a unified callback which can report
either objects or refs, remove the objects-related parameters and add
the generic parameter "void *fsck_report".

Create a new "fsck_object_report" structure which incorporates the
removed parameters "oid" and "object_type". Then change the
corresponding references to adapt to new "fsck_error" callback.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fsck: rename objects-related fsck error functions

The names of objects-related fsck error functions are generic. It's OK
when there is only object database check. However, we are going to
introduce refs database check report function. To avoid ambiguity,
rename object-related fsck error functions to explicitly indicate these
functions are used to report objects-related messages.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fsck: rename "skiplist" to "skip_oids"

The "skiplist" field in "fsck_options" is related to objects. Because we
are going to introduce ref consistency check, the "skiplist" name is too
general which will make the caller think "skiplist" is related to both
the refs and objects.

It may seem that for both refs and objects, we should provide a general
"skiplist" here. However, the type for "skiplist" is `struct oidset`
which is totally unsuitable for refs.

To avoid above ambiguity, rename "skiplist" to "skip_oids".

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

object: fix leaking packfiles when closing object store

When calling `raw_object_store_clear()`, we close and free several
resources associated with the object store. Part of that is to close and
free all the packfiles, which is handled by `close_object_store()`.

That function really only ends up closing the packfiles though, but it
doesn't free them. And in fact it can't, as that function is being
called via `run_command()` when `close_object_store = 1`, which is done
e.g. when we execute git-maintenance(1). At that point, other structures
may still have references on those packfiles, and thus we cannot free
them here. So while it is in fact intentional that we really only close
them, the result is a memory leak because `raw_object_store_clear()`
does not free them, either.

Fix the leak by freeing the packfiles in `raw_object_store_clear()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

submodule: fix leaking seen submodule names

We keep track of submodules we have already seen via a string map such
that we don't process the same submodule twice. We never free that map
though, causing a memory leak.

Fix this leak by clearing the map.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

submodule: fix leaking fetch tasks

When done with a fetch task used for parallel fetches of submodules, we
need to both call `fetch_task_release()` to release the task's contents
and `free()` to release the task itself. Most sites do this already, but
some only call `fetch_task_release()` and thus leak memory.

While we could trivially fix this by adding the two missing calls to
free(3P), the result would be that we always call both functions. Let's
thus refactor the code such that `fetch_task_release()` also frees the
structure itself. Rename it to `fetch_task_free()` accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/submodule: allow "add" to use different ref storage format

Same as with "clone", users may want to add a submodule to a repository
with a non-default ref storage format. Wire up a new `--ref-format=`
option that works the same as for `git submodule clone`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs: fix ref storage format for submodule ref stores

When opening a submodule ref storage we accidentally use the ref storage
format of the owning repository, not of the submodule repository. As
submodules may have a different storage format than their parent repo
this can lead to bugs when trying to access the submodule ref storage
from the parent repository.

One such bug was reported when performing a recursive pull with mixed
ref stores, which fails with:

$ git pull --recursive
fatal: Unable to find current revision in submodule path 'path/to/sub'

The same issue occurs when adding a repository contained in the working
tree with a different ref storage format via `git submodule add`.

Fix the bug by using the submodule repository's ref storage format
instead and add some tests. Note that the test for `git submodule
status` was included as a precaution, only. The command worked alright
even without the bugfix.

Reported-by: Jeppe Øland <joland@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/clone: propagate ref storage format to submodules

When recursively cloning a repository with a non-default ref storage
format, e.g. by passing the `--ref-format=` option, then only the
top-level repository will end up using that ref storage format, and
all recursively cloned submodules will instead use the default format.

While mixed-format constellations are expected to work alright, the
outcome still is somewhat surprising as we have essentially ignored
the user's request.

Fix this by propagating the requested ref format to cloned submodules.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/submodule: allow cloning with different ref storage format

As submodules are proper self-contained repositories, it is perfectly
valid for them to have a different ref storage format than their parent
repository. There is no obvious way for users to ask for the ref storage
format when initializing submodules though. Whether the setup of such
mixed-ref-storage-format constellations is all that useful remains to be
seen. But there is no good reason to not expose such an option, and we
will require it in a subsequent patch.

Introduce a new `--ref-format=` option for git-submodule(1) that allows
the user to pick the ref storage format. This option will also be used
in a subsequent commit, where we start to propagate the same flag from
git-clone(1) to cloning submodules with the `--recursive` switch.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

git-submodule.sh: break overly long command lines

For most of the subcommands of git-submodule(1), we end up passing a
bunch of arguments to the submodule helper. This quickly leads to overly
long lines, where it becomes hard to spot what has changed when one
needs to modify them.

Break up these lines into one argument per line, similarly to how it is
done for the "clone" subcommand already.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

transport: mark more tests leak-free

After fixing a transport leak, a few more tests have become
leak-free. Mark them as such.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

transport: fix leak with transport helper URLs

Transport URLs can be prefixed with "foo::", which would tell us that
the transport uses a remote helper called "foo". We extract the helper
name by `xstrndup()`ing the prefix before the double-colons, but never
free that string.

Fix this leak by assigning the result to a separate local variable that
we can then free upon returning.

Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'ps/refs-wo-the-repository' into ps/config-wo-the-repository

* ps/refs-wo-the-repository:
  refs/reftable: stop using `the_repository`
  refs/packed: stop using `the_repository`
  refs/files: stop using `the_repository`
  refs/files: stop using `the_repository` in `parse_loose_ref_contents()`
  refs: stop using `the_repository`

Merge branch 'ps/leakfixes-part-3' into ps/leakfixes-part-4

* ps/leakfixes-part-3: (24 commits)
  commit-reach: fix trivial memory leak when computing reachability
  convert: fix leaking config strings
  entry: fix leaking pathnames during delayed checkout
  object-name: fix leaking commit list items
  t/test-repository: fix leaking repository
  builtin/credential-cache: fix trivial leaks
  builtin/worktree: fix leaking derived branch names
  builtin/shortlog: fix various trivial memory leaks
  builtin/rerere: fix various trivial memory leaks
  builtin/credential-store: fix leaking credential
  builtin/show-branch: fix several memory leaks
  builtin/rev-parse: fix memory leak with `--parseopt`
  builtin/stash: fix various trivial memory leaks
  builtin/remote: fix various trivial memory leaks
  builtin/remote: fix leaking strings in `branch_list`
  builtin/ls-remote: fix leaking `pattern` strings
  builtin/submodule--helper: fix leaking buffer in `is_tip_reachable`
  builtin/submodule--helper: fix leaking clone depth parameter
  builtin/name-rev: fix various trivial memory leaks
  builtin/describe: fix trivial memory leak when describing blob
  ...

midx: implement support for writing incremental MIDX chains

Now that the rest of the MIDX subsystem and relevant callers have been
updated to learn about how to read and process incremental MIDX chains,
let's finally update the implementation in `write_midx_internal()` to be
able to write incremental MIDX chains.

This new feature is available behind the `--incremental` option for the
`multi-pack-index` builtin, like so:

    $ git multi-pack-index write --incremental

The implementation for doing so is relatively straightforward, and boils
down to a handful of different kinds of changes implemented in this
patch:

  - The `compute_sorted_entries()` function is taught to reject objects
    which appear in any existing MIDX layer.

  - Functions like `write_midx_revindex()` are adjusted to write
    pack_order values which are offset by the number of objects in the
    base MIDX layer.

  - The end of `write_midx_internal()` is adjusted to move
    non-incremental MIDX files when necessary (i.e. when creating an
    incremental chain with an existing non-incremental MIDX in the
    repository).

There are a handful of other changes that are introduced, like new
functions to clear incremental MIDX files that are unrelated to the
current chain (using the same "keep_hash" mechanism as in the
non-incremental case).

The tests explicitly exercising the new incremental MIDX feature are
relatively limited for two reasons:

  1. Most of the "interesting" behavior is already thoroughly covered in
     t5319-multi-pack-index.sh, which handles the core logic of reading
     objects through a MIDX.

     The new tests in t5334-incremental-multi-pack-index.sh are mostly
     focused on creating and destroying incremental MIDXs, as well as
     stitching their results together across layers.

  2. A new GIT_TEST environment variable is added called
     "GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL", which modifies the
     entire test suite to write incremental MIDXs after repacking when
     combined with the "GIT_TEST_MULTI_PACK_INDEX" variable.

     This exercises the long tail of other interesting behavior that is
     defined implicitly throughout the rest of the CI suite. It is
     likewise added to the linux-TEST-vars job.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t/t5313-pack-bounds-checks.sh: prepare for sub-directories

Prepare for sub-directories to appear in $GIT_DIR/objects/pack by
adjusting the copy, remove, and chmod invocations to perform their
behavior recursively.

This prepares us for the new $GIT_DIR/objects/pack/multi-pack-index.d
directory which will be added in a following commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'

Two years ago, commit ff1e653c8e2 (midx: respect
'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP', 2021-08-31) introduced a new
environment variable which caused the test suite to write MIDX bitmaps
after any 'git repack' invocation.

At the time, this was done to help flush out any bugs with MIDX bitmaps
that weren't explicitly covered in the t5326-multi-pack-bitmap.sh
script.

Two years later, that flag has served us well and is no longer providing
meaningful coverage, as the script in t5326 has matured substantially
and covers many more interesting cases than it did back when ff1e653c8e2
was originally written.

Remove the 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' environment variable
as it is no longer serving a useful purpose. More importantly, removing
this variable clears the way for us to introduce a new one to help
similarly flush out bugs related to incremental MIDX chains.

Because these incremental MIDX chains are (for now) incompatible with
MIDX bitmaps, we cannot have both.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: implement verification support for incremental MIDXs

Teach the verification implementation used by `git multi-pack-index
verify` to perform verification for incremental MIDX chains by
independently validating each layer within the chain.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: support reading incremental MIDX chains

Now that the MIDX machinery's internals have been taught to understand
incremental MIDXs over the previous handful of commits, the MIDX
machinery itself can begin reading incremental MIDXs.

(Note that while the on-disk format for incremental MIDXs has been
defined, the writing end has not been implemented. This will take place
in the commit after next.)

The core of this change involves following the order specified in the
MIDX chain in reverse and opening up MIDXs in the chain one-by-one,
adding them to the previous layer's `->base_midx` pointer at each step.

In order to implement this, the `load_multi_pack_index()` function is
taught to call a new `load_multi_pack_index_chain()` function if loading
a non-incremental MIDX failed via `load_multi_pack_index_one()`.

When loading a MIDX chain, `load_midx_chain_fd_st()` reads each line in
the file one-by-one and dispatches calls to
`load_multi_pack_index_one()` to read each layer of the MIDX chain. When
a layer was successfully read, it is added to the MIDX chain by calling
`add_midx_to_chain()` which validates the contents of the `BASE` chunk,
performs some bounds checks on the number of combined packs and objects,
and attaches the new MIDX by assigning its `base_midx` pointer to the
existing part of the chain.

As a supplement to this, introduce a new mode in the test-read-midx
test-tool which allows us to read the information for a specific MIDX in
the chain by specifying its trailing checksum via the command-line
arguments like so:

$ test-tool read-midx .git/objects [checksum]

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx: teach `midx_fanout_add_midx_fanout()` about incremental MIDXs

The function `midx_fanout_add_midx_fanout()` is used to help construct
the fanout table when generating a MIDX by reusing data from an existing
MIDX.

Prepare this function to work with incremental MIDXs by making a few
changes:

  - The bounds checks need to be adjusted to start object lookups taking
    into account the number of objects in the previous MIDX layer (i.e.,
    by starting the lookups at position `m->num_objects_in_base` instead
    of position 0).

  - Likewise, the bounds checks need to end at `m->num_objects_in_base`
    objects after `m->num_objects`.

  - Finally, `midx_fanout_add_midx_fanout()` needs to recur on earlier
    MIDX layers when dealing with an incremental MIDX chain by calling
    itself when given a MIDX with a non-NULL `base_midx`.

Note that after 0c5a62f14b (midx-write.c: do not read existing MIDX with
`packs_to_include`, 2024-06-11), we do not use this function with an
existing MIDX (incremental or not) when generating a MIDX with
--stdin-packs, and likewise for incremental MIDXs.

But it is still used when adding the fanout table from an incremental
MIDX when generating a non-incremental MIDX (without --stdin-packs, of
course).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>