git.ipfire.org Git - thirdparty/git.git/log

hash: make `is_null_oid()` independent of `the_repository`

The function `is_null_oid()` uses `oideq(oid, null_oid())` to check
whether a given object ID is the all-zero object ID. `null_oid()`
implicitly relies on `the_repository` though to return the correct null
object ID.

Get rid of this dependency by always comparing the complete hash array
for being all-zeroes. This is possible due to the refactoring of object
IDs so that their hash arrays are always fully initialized.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

hash: convert `oidcmp()` and `oideq()` to compare whole hash

With the preceding commit, the hash array of object IDs is now fully
zero-padded even when the hash algorithm's output is smaller than the
array length. With that, we can now adapt both `oidcmp()` and `oideq()`
to unconditionally memcmp(3P) the whole array instead of depending on
the hash size.

While it may feel inefficient to compare unused bytes for e.g. SHA-1, in
practice the compiler should now be able to produce code that is better
optimized both because we have no branch anymore, but also because the
size to compare is now known at compile time. Goldbolt spits out the
following assembly on an x86_64 platform with GCC 14.1 for the old and
new implementations of `oidcmp()`:

    oidcmp_old:
            movsx   rax, DWORD PTR [rdi+32]
            test    eax, eax
            jne     .L2
            mov     rax, QWORD PTR the_repository[rip]
            cmp     QWORD PTR [rax+16], 32
            je      .L6
    .L4:
            mov     edx, 20
            jmp     memcmp
    .L2:
            lea     rdx, [rax+rax*2]
            lea     rax, [rax+rdx*4]
            lea     rax, hash_algos[0+rax*8]
            cmp     QWORD PTR [rax+16], 32
            jne     .L4
    .L6:
            mov     edx, 32
            jmp     memcmp

    oidcmp_new:
            mov     edx, 32
            jmp     memcmp

The new implementation gets ridi of all the branches and effectively
only ends setting up `edx` for `memcmp()` and then calling it.

And for `oideq()`:

    oideq_old:
            movsx   rcx, DWORD PTR [rdi+32]
            mov     rax, rdi
            mov     rdx, rsi
            test    ecx, ecx
            jne     .L2
            mov     rcx, QWORD PTR the_repository[rip]
            cmp     QWORD PTR [rcx+16], 32
            mov     rcx, QWORD PTR [rax]
            je      .L12
    .L4:
            mov     rsi, QWORD PTR [rax+8]
            xor     rcx, QWORD PTR [rdx]
            xor     rsi, QWORD PTR [rdx+8]
            or      rcx, rsi
            je      .L13
    .L8:
            mov     eax, 1
            test    eax, eax
            sete    al
            movzx   eax, al
            ret
    .L2:
            lea     rsi, [rcx+rcx*2]
            lea     rcx, [rcx+rsi*4]
            lea     rcx, hash_algos[0+rcx*8]
            cmp     QWORD PTR [rcx+16], 32
            mov     rcx, QWORD PTR [rax]
            jne     .L4
    .L12:
            mov     rsi, QWORD PTR [rax+8]
            xor     rcx, QWORD PTR [rdx]
            xor     rsi, QWORD PTR [rdx+8]
            or      rcx, rsi
            jne     .L8
            mov     rcx, QWORD PTR [rax+16]
            mov     rax, QWORD PTR [rax+24]
            xor     rcx, QWORD PTR [rdx+16]
            xor     rax, QWORD PTR [rdx+24]
            or      rcx, rax
            jne     .L8
            xor     eax, eax
    .L14:
            test    eax, eax
            sete    al
            movzx   eax, al
            ret
    .L13:
            mov     edi, DWORD PTR [rdx+16]
            cmp     DWORD PTR [rax+16], edi
            jne     .L8
            xor     eax, eax
            jmp     .L14

    oideq_new:
            mov     rax, QWORD PTR [rdi]
            mov     rdx, QWORD PTR [rdi+8]
            xor     rax, QWORD PTR [rsi]
            xor     rdx, QWORD PTR [rsi+8]
            or      rax, rdx
            je      .L5
    .L2:
            mov     eax, 1
            xor     eax, 1
            ret
    .L5:
            mov     rax, QWORD PTR [rdi+16]
            mov     rdx, QWORD PTR [rdi+24]
            xor     rax, QWORD PTR [rsi+16]
            xor     rdx, QWORD PTR [rsi+24]
            or      rax, rdx
            jne     .L2
            xor     eax, eax
            xor     eax, 1
            ret

Interestingly, the compiler decides to split the comparisons into two so
that it first compares the lower half of the object ID for equality and
then the upper half. If the first check shows a difference, then we
wouldn't even end up comparing the second half.

In both cases, the new generated code is significantly shorter and has
way less branches. While I didn't benchmark the change, I'd be surprised
if the new code was slower.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

global: ensure that object IDs are always padded

The `oidcmp()` and `oideq()` functions only compare the prefix length as
specified by the given hash algorithm. This mandates that the object IDs
have a valid hash algorithm set, or otherwise we wouldn't be able to
figure out that prefix. As we do not have a hash algorithm in many
cases, for example when handling null object IDs, this assumption cannot
always be fulfilled. We thus have a fallback in place that instead uses
`the_repository` to derive the hash function. This implicit dependency
is hidden away from callers and can be quite surprising, especially in
contexts where there may be no repository.

In theory, we can adapt those functions to always memcmp(3P) the whole
length of their hash arrays. But there exist a couple of sites where we
populate `struct object_id`s such that only the prefix of its hash that
is actually used by the hash algorithm is populated. The remaining bytes
are left uninitialized. The fact that those bytes are uninitialized also
leads to warnings under Valgrind in some places where we copy those
bytes.

Refactor callsites where we populate object IDs to always initialize all
bytes. This also allows us to get rid of `oidcpy_with_padding()`, for
one because the input is now fully initialized, and because `oidcpy()`
will now always copy the whole hash array.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

hash: require hash algorithm in `oidread()` and `oidclr()`

Both `oidread()` and `oidclr()` use `the_repository` to derive the hash
function that shall be used. Require callers to pass in the hash
algorithm to get rid of this implicit dependency.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()`

Many of our hash functions have two variants, one receiving a `struct
git_hash_algo` and one that derives it via `the_repository`. Adapt all
of those functions to always require the hash algorithm as input and
drop the variants that do not accept one.

As those functions are now independent of `the_repository`, we can move
them from "hash.h" to "hash-ll.h".

Note that both in this and subsequent commits in this series we always
just pass `the_repository->hash_algo` as input even if it is obvious
that there is a repository in the context that we should be using the
hash from instead. This is done to be on the safe side and not introduce
any regressions. All callsites should eventually be amended to use a
repo passed via parameters, but this is outside the scope of this patch
series.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions

The functions `is_empty_{blob,tree}_sha1()` are mostly unused, except
for a single callsite in "read-cache.c". Most callsites have long since
been converted to use the equivalents that accept a `struct object_id`
instead of a string.

Adapt the remaining callsite and drop those functions.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

remote: drop checks for zero-url case

Now that the previous commit removed the possibility that a "struct
remote" will ever have zero url fields, we can drop a number of
redundant checks and untriggerable code paths.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

remote: always require at least one url in a remote

When we return a struct from remote_get(), the result _almost_ always
has at least one url. In remotes_remote_get_1(), we do this:

        if (name_given && !valid_remote(ret))
                add_url_alias(remote_state, ret, name);
        if (!valid_remote(ret))
                return NULL;

So if the remote doesn't have a url, we give it one based on the name
(this is how unconfigured urls are used as remotes). And if that doesn't
work, we return NULL.

But there's a catch: valid_remote() checks that we have at least one url
_unless_ the remote.*.vcs field is set. This comes from c578f51d52 (Add
a config option for remotes to specify a foreign vcs, 2009-11-18), and
the whole idea was to support remote helpers that don't have their own
url.

However, that mode has been broken since 25d5cc488a (Pass unknown
protocols to external protocol handlers, 2009-12-09)! That commit
unconditionally looks at the url in get_helper(), causing a segfault
with something like:

  git -c remote.foo.vcs=bar fetch foo

We could fix that now, of course. But given that it has been broken for
almost 15 years and nobody noticed, there's a better option. This weird
"there might not be a url" special case requires checks all over the
code base, and it's not clear if there are other similar segfaults
lurking. It would be nice if we could drop that special case.

So instead, let's let the "the remote name is the url" code kick in. If
you have "remote.foo.vcs", then your url (unless otherwise configured)
is "foo". This does have a visible effect compared to what 25d5cc488a
was trying to do. The idea back then is that for a remote without a url,
we'd run:

   # only one command-line option!
   git-remote-bar foo

whereas with our default url, now we'll run:

  git-remote-bar foo foo

Again, in practice nobody can be relying on this because it has been
segfaulting for 15 years. We should consider just removing this "vcs"
config option entirely, but that would be a user-visible breakage. So by
fixing it this way, we can keep things working that have been working,
and simplify away one special case inside our code.

This fixes the segfault from 25d5cc488a (demonstrated by the test), and
we can build further cleanups on top.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t5801: test remote.*.vcs config

The usual way to trigger a remote helper is to use the "::" syntax from:
87422439d1 (Allow specifying the remote helper in the url, 2009-11-18).
Doing:

  git config remote.origin.url hg::https://example.com/repo

will run "git-remote-hg origin https://example.com/repo". Or you can
use the fallback handling from 25d5cc488a (Pass unknown protocols to
external protocol handlers, 2009-12-09):

  git config remote.origin.url "foo://bar"

which will run "git-remote-foo origin foo://bar".

But there's a third way, from c578f51d52 (Add a config option for
remotes to specify a foreign vcs, 2009-11-18):

  git config remote.origin.vcs foo
  git config remote.origin.url bar

which will run "git-remote-foo origin bar". This is mostly redundant
with the other methods, except that it is supposed to allow you to run
without a URL at all. So:

  git config remote.origin.vcs foo

would run "git-remote-foo origin" with no extra URL parameter (under the
assumption that the helper somehow knows how to access the remote repo).
However, this mode has been broken since 25d5cc488a, shortly after it
was added! That commit taught the transport code to always look at the
URL string to parse off the "foo::" bits, meaning it would always
segfault in the no-url case. You can see that with:

  git -c remote.foo.vcs=bar fetch foo

Nobody seems to have noticed in the almost 15 years since, so presumably
it's not a well-used feature. And without that, arguably the whole
remote.*.vcs feature could be removed entirely, as it isn't offering
anything you couldn't do with the "helper::" syntax. But it _does_ work
if you have a URL, and it has been advertised in the documentation for
all that time. So we shouldn't just remove it without warning.

Likewise, even if we were going to deprecate it, we should avoid
breaking it in the meantime. Since there are no tests for it at all,
let's add a few basic ones:

  - this syntax doesn't work well with "git clone" (another point
    against it versus "helper::"). But we can use "clone -c" to set up
    the config manually, passing the URL as usual to clone. This does
    work, though note that I had to use --no-local in the test to avoid
    broken interactions between the local code and the helper. In the
    real world this would be a non-issue, since the remote URL would
    generally not also be a local Git repo!

  - likewise, we should be able to set up the config manually and fetch
    into a repository. This also works.

  - we can simulate a vcs that has no URL support by stuffing the remote
    path into another environment variable. This should work, but
    doesn't (it hits the segfault mentioned above).

In the first two cases, I took the extra step of checking GIT_TRACE
output to confirm that we actually ran the helper (since the URL is a
valid Git repo, the clone/fetch would appear to work even if we
didn't use the helper at all!).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t5801: make remote-testgit GIT_DIR setup more robust

Our tests use a fake helper that just imports from an existing Git
repository. We're fed the path to that repo on the command line, and
derive the GIT_DIR by tacking on "/.git".

This is wrong if the path is a bare repository, but that's OK since this
is just a limited test. But it's also wrong if the transport code feeds
us the actual .git directory itself (i.e., we expect "/path/to/repo" but
it gives us "/path/to/repo/.git"). None of the current tests do that,
but let's future-proof ourselves against adding a test that does.

We can instead ask "rev-parse" to set our GIT_DIR. Note that we have to
first unset other git variables from our environment. Coming into this
script, we'll have GIT_DIR set to the fetching repository, and we need
to "switch" to the remote one.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

remote: allow resetting url list

Because remote.*.url is treated as a multi-valued key, there is no way
to override previous config. So for example if you have
remote.origin.url set to some wrong value, doing:

git -c remote.origin.url=right fetch

would not work. It would append "right" to the list, which means we'd
still fetch from "wrong" (since subsequent values are used only as push
urls).

Let's provide a mechanism to reset the list, like we do for other
multi-valued keys (e.g., credential.helper, http.extraheaders, and
merge.suppressDest all use this "empty string means reset" pattern).

Reported-by: Mathew George <mathewegeorge@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config: document remote.*.url/pushurl interaction

The documentation for these keys gives a very terse definition and
points you to the fetch/push manpages. But from reading those pages it
was not at all obvious to me that:

  - these are keys that can be defined multiple times with meaningful
    behavior (especially remote.*.url)

  - the way that pushurl overrides url (the git-push page does mention
    that "pushurl defaults to url", but it is not immediately clear what
    a multi-valued url would do in that situation).

Let's try to summarize the current behavior.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

remote: simplify url/pushurl selection

When we want to know the push urls for a remote, there is some simple
logic:

  - if the user configured any remote.*.pushurl keys, then those make
    the complete set of push urls

  - otherwise we push to all urls in remote.*.url

Many spots implement this with a level of indirection, assigning to a
local url/url_nr pair. But since both arrays are now strvecs, we can
just use a pointer to select the appropriate strvec, shortening the code
a bit.

Even though this is now a one-liner, since it is application logic that
is present in so many places, it's worth abstracting a helper function.
In fact, we already have such a function, but it's local to
builtin/push.c. So we'll just make it available everywhere via remote.h.

There are two spots to pay special attention to here:

  1. in builtin/remote.c's get_url(), we are selecting first based on
     push_mode and then falling back to "url" when we're in push_mode
     but no pushurl is defined. The updated code makes that much more
     clear, compared to the original which had an "else" fall-through.

  2. likewise in that file's set_url(), we _only_ respect push_mode,
     sine the point is that we are adding to pushurl in that case
     (whether it is empty or not). And thus it does not use our helper
     function.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

remote: use strvecs to store remote url/pushurl

Now that the url/pushurl fields of "struct remote" own their strings, we
can switch from bare arrays to strvecs. This has a few advantages:

  - push/clear are now one-liners

  - likewise the free+assigns in alias_all_urls() can use
    strvec_replace()

  - we now use size_t for storage, avoiding possible overflow

  - this will enable some further cleanups in future patches

There's quite a bit of fallout in the code that reads these fields, as
it tends to access these arrays directly. But it's mostly a mechanical
replacement of "url_nr" with "url.nr", and "url[i]" with "url.v[i]",
with a few variations (e.g. "*url" could become "*url.v", but I used
"url.v[0]" for consistency).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

remote: transfer ownership of memory in add_url(), etc

Many of the internal functions in remote.c take const strings and store
them forever in instances of "struct remote". Since the functions are
internal and callers are aware of the convention, this seems to mostly
work and not cause leaks. But there are some issues:

  - it's impossible to clear any of the arrays, because the data
    dependencies between them are too muddled (if you free() a string,
    it might also be referenced from another array, causing a
    user-after-free; but if you don't, that might be the last reference,
    causing a leak).

    This is mostly of interest for further refactoring and features, but
    there's at least one spot that's already a problem. In alias_all_urls(),
    we replace elements of remote->url and remote->pushurl with their
    aliased forms, dropping references to the original.

  - sometimes strings from outside callers make their way in. For
    example, calling remote_get("foo") when there is no configured "foo"
    remote will create a remote struct with the single url "foo". But
    we'll do so by holding on to the string passed to remote_get()
    forever.

    In practice I think this works out because we'd usually pass in a
    string that lasts the length of the program (a string literal, or
    argv reference, or other data structure allocated in the main
    function). But it's a rather subtle requirement.

Instead, let's have remote->url and remote->pushurl own their string
memory. They'll copy the const strings that are passed in, and callers
can stop making their own copies. Likewise, when we overwrite an entry,
we can free the memory it points to, fixing the leak mentioned above.

We'll leave the struct members as "const" since they are visible to the
outside world, and shouldn't usually be touched. This requires casting
on free() for now, but we'll clean that further in a future patch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

remote: refactor alias_url() memory ownership

The alias_url() function may return either a newly allocated string
(which the caller must take ownership of), or the original const "url"
parameter that was passed in.

This often works OK because callers are generally passing in a "url"
that they expect to retain ownership of anyway. So whether we got back
the original or a new string, we're always interested in storing it
forever. But I suspect there are some possible leaks here (e.g.,
add_url_alias() may end up discarding the original "url").

Whether there are active leaks or not, this is a confusing setup that
makes further refactoring of memory ownership harder. So instead of
returning the original string, return NULL, forcing callers to decide
what to do with it explicitly. We can then build further cleanups on top
of that.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

archive: fix check for missing url

Running "git archive --remote" checks that we have at least one url for
the remote. It does so by looking at remote.url[0], but that won't work;
if we have no url at all, then remote.url will be NULL, and we'll
segfault.

Check url_nr instead, which is a more direct way of asking what we
want.

You can trigger the segfault like this:

git -c remote.foo.vcs=bar archive --remote=foo

but I didn't bother adding a test. This is the tip of the iceberg for
no-url remotes, and a later patch will improve that situation. I just
wanted to clean up this bug so it didn't make further refactoring of
this code more confusing.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

BreakingChanges: document that we do not plan to deprecate git-checkout

The git-checkout(1) command is seen by many as hard to understand
because it connects two somewhat unrelated features: switching between
branches and restoring worktree files from arbitrary revisions. In 2019,
we thus implemented two new commands git-switch(1) and git-restore(1) to
split out these separate concerns into standalone functions.

This "replacement" of git-checkout(1) has repeatedly triggered concerns
for our userbase that git-checkout(1) will eventually go away. This is
not the case though: the use of that command is still widespread, and it
is not expected that this will change anytime soon.

Document that all three commands will remain for the foreseeable future.
This decision may be revisited in case we ever figure out that most
everyone has given up on any of the commands.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

BreakingChanges: document removal of grafting

The grafting mechanism for objects has been deprecated in e650d0643b
(docs: mark info/grafts as outdated, 2014-03-05), which is more than a
decade ago. The mechanism can lead to hard-to-debug issues and has a
superior replacement with replace refs.

Follow through with the deprecation and mark grafts for removal in Git
3.0.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

BreakingChanges: document upcoming change from "sha1" to "sha256"

Starting with 8e42eb0e9a (doc: sha256 is no longer experimental,
2023-07-31), the "sha256" object format is no longer considered to be
experimental. Furthermore, the SHA-1 hash function is actively
recommended against by for example NIST and FIPS 140-2, and attacks
against it are becoming more practical both due to new weaknesses
(SHAppening, SHAttered, Shambles) and due to the ever-increasing
computing power. It is only a matter of time before it can be considered
to be broken completely.

Let's plan for this event by being active instead of waiting for it to
happend and announce that the default object format is going to change
from "sha1" to "sha256" with Git 3.0.

All major Git implementations (libgit2, JGit, go-git) support the
"sha256" object format and are thus prepared for this change. The most
important missing piece in the puzzle is support in forges. But while
GitLab recently gained experimental support for the "sha256" object
format though, to the best of my knowledge GitHub doesn't support it
yet. Ideally, announcing this upcoming change will encourage forges to
start building that support.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

docs: introduce document to announce breaking changes

Over time, Git has grown quite a lot. With this evolution, many ideas
that were sensible at the time they were introduced are not anymore and
are thus considered to be deprecated. And while some deprecations may be
noted in manpages, most of them are actually deprecated in the "hive
mind" of the Git community, only.

Introduce a new document that tracks such breaking changes, but also
deprecations which we are not willing to go through with, to address
this issue. This document serves multiple purposes:

  - It is a way to facilitate discussion around proposed deprecations.

  - It allows users to learn about deprecations and speak up in case
    they have good reasons why a certain feature should not be
    deprecated.

  - It states intent and documents where the Git project wants to go,
    both in the case where we want to deprecate, but also in the case
    where we don't want to deprecate a specific feature.

The document is _not_ intended to cast every single discussion into
stone. It is supposed to be a living document that may change over time
when there are good reasons for it to change.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'gt/unit-test-oidtree' into ps/use-the-repository

* gt/unit-test-oidtree:
t/: migrate helper/test-oidtree.c to unit-tests/t-oidtree.c

Merge branch 'ps/ref-storage-migration' into ps/use-the-repository

* ps/ref-storage-migration:
  builtin/refs: new command to migrate ref storage formats
  refs: implement logic to migrate between ref storage formats
  refs: implement removal of ref storages
  worktree: don't store main worktree twice
  reftable: inline `merged_table_release()`
  refs/files: fix NULL pointer deref when releasing ref store
  refs/files: extract function to iterate through root refs
  refs/files: refactor `add_pseudoref_and_head_entries()`
  refs: allow to skip creation of reflog entries
  refs: pass storage format to `ref_store_init()` explicitly
  refs: convert ref storage format to an enum
  setup: unset ref storage when reinitializing repository version

commit-graph: increment progress indicator

This fixes a bug that was introduced by 368d19b0b7 (commit-graph:
refactor compute_topological_levels(), 2023-03-20): Previously, the
progress indicator was updated from `i + 1` where `i` is the loop
variable of the enclosing `for` loop. After this patch, the update used
`info->progress_cnt + 1` instead, however, unlike `i`, the
`progress_cnt` attribute was not incremented. Let's increment it.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
[jc: squashed in a test update from Patrick Steinhardt]
Signed-off-by: Junio C Hamano <gitster@pobox.com>

The thirteenth batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'gt/decorate-unit-test'

A test helper that essentially is unit tests on the "decorate"
logic has been rewritten using the unit-tests framework.

* gt/decorate-unit-test:
t/: migrate helper/test-example-decorate to the unit testing framework

Merge branch 'jk/sparse-leakfix'

Many memory leaks in the sparse-checkout code paths have been
plugged.

* jk/sparse-leakfix:
  sparse-checkout: free duplicate hashmap entries
  sparse-checkout: free string list after displaying
  sparse-checkout: free pattern list in sparse_checkout_list()
  sparse-checkout: free sparse_filename after use
  sparse-checkout: refactor temporary sparse_checkout_patterns
  sparse-checkout: always free "line" strbuf after reading input
  sparse-checkout: reuse --stdin buffer when reading patterns
  dir.c: always copy input to add_pattern()
  dir.c: free removed sparse-pattern hashmap entries
  sparse-checkout: clear patterns when init() sees existing sparse file
  dir.c: free strings in sparse cone pattern hashmaps
  sparse-checkout: pass string literals directly to add_pattern()
  sparse-checkout: free string list in write_cone_to_file()

Merge branch 'jk/cap-exclude-file-size'

An overly large ".gitignore" files are now rejected silently.

* jk/cap-exclude-file-size:
dir.c: reduce max pattern file size to 100MB
dir.c: skip .gitignore, etc larger than INT_MAX

Merge branch 'jc/safe-directory-leading-path'

The safe.directory configuration knob has been updated to
optionally allow leading path matches.

* jc/safe-directory-leading-path:
safe.directory: allow "lead/ing/path/*" match

Merge branch 'gt/t-hash-unit-test'

A pair of test helpers that essentially are unit tests on hash
algorithms have been rewritten using the unit-tests framework.

* gt/t-hash-unit-test:
t/: migrate helper/test-{sha1, sha256} to unit-tests/t-hash
strbuf: introduce strbuf_addstrings() to repeatedly add a string

Merge branch 'cp/reftable-unit-test'

Basic unit tests for reftable have been reimplemented under the
unit test framework.

* cp/reftable-unit-test:
  t: improve the test-case for parse_names()
  t: add test for put_be16()
  t: move tests from reftable/record_test.c to the new unit test
  t: move tests from reftable/stack_test.c to the new unit test
  t: move reftable/basics_test.c to the unit testing framework

Merge branch 'jc/t1517-more'

A new test was added to ensure git commands that are designed to
run outside repositories do work.

* jc/t1517-more:
imap-send: minimum leakfix
t1517: more coverage for commands that work without repository

t/: migrate helper/test-oidtree.c to unit-tests/t-oidtree.c

helper/test-oidtree.c along with t0069-oidtree.sh test the oidtree.h
library, which is a wrapper around crit-bit tree. Migrate them to
the unit testing framework for better debugging and runtime
performance. Along with the migration, add an extra check for
oidtree_each() test, which showcases how multiple expected matches can
be given to check_each() helper.

To achieve this, introduce a new library called 'lib-oid.h'
exclusively for the unit tests to use. It currently mainly includes
utility to generate object_id from an arbitrary hex string
(i.e. '12a' -> '12a0000000000000000000000000000000000000'). This also
handles the hash algo selection based on GIT_TEST_DEFAULT_HASH.
This library will also be helpful when we port other unit tests such
as oid-array, oidset etc.

Helped-by: Junio C Hamano <gitster@pobox.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
[jc: small fixlets squashed in]
Signed-off-by: Junio C Hamano <gitster@pobox.com>

object-name: don't try to abbreviate to lengths greater than hexsz

When given a length that equals the current hash algorithm's hex size,
then `repo_find_unique_abbrev_r()` exits early without trying to find an
abbreviation. This is only sensible because there is nothing to
abbreviate in the first place, so searching through objects to find a
unique prefix would be a waste of compute.

What we don't handle though is the case where the user passes a length
greater than the hash length. This is fine in practice as we still
compute the correct result. But at the very least, this is a waste of
resources as we try to abbreviate a value that cannot be abbreviated,
which causes us to hit the object database.

Start to explicitly handle values larger than hexsz to avoid this
performance penalty, which leads to a measureable speedup. The following
benchmark has been executed in linux.git:

  Benchmark 1: git -c core.abbrev=9000 log --abbrev-commit (revision = HEAD~)
    Time (mean ± σ):     12.812 s ±  0.040 s    [User: 12.225 s, System: 0.554 s]
    Range (min … max):   12.723 s … 12.857 s    10 runs

  Benchmark 2: git -c core.abbrev=9000 log --abbrev-commit (revision = HEAD)
    Time (mean ± σ):     11.095 s ±  0.029 s    [User: 10.546 s, System: 0.521 s]
    Range (min … max):   11.037 s … 11.122 s    10 runs

  Summary
    git -c core.abbrev=9000 log --abbrev-commit HEAD (revision = HEAD) ran
      1.15 ± 0.00 times faster than git -c core.abbrev=9000 log --abbrev-commit HEAD (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

parse-options-cb: stop clamping "--abbrev=" to hash length

The `OPT__ABBREV()` option allows the user to specify the length that
object hashes shall be abbreviated to. This length needs to be in the
range of `(MIN_ABBREV, the_hash_algo->hexsz)`, which is why we clamp the
value as required. While this makes sense in the case of `MIN_ABBREV`,
it is unnecessary for the upper boundary as the value is eventually
passed down to `repo_find_unnique_abbrev_r()`, which handles values
larger than the current hash length just fine.

In the preceding commit, we have changed parsing of the "core.abbrev"
config to stop clamping to the upper boundary. Let's do the same here so
that the code becomes simpler, we are consistent with how we treat the
"core.abbrev" config and so that we stop depending on `the_repository`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config: fix segfault when parsing "core.abbrev" without repo

The "core.abbrev" config allows the user to specify the minimum length
when abbreviating object hashes. Next to the values "auto" and "no",
this config also accepts a concrete length that needs to be bigger or
equal to the minimum length and smaller or equal to the hash algorithm's
hex length. While the former condition is trivial, the latter depends on
the object format used by the current repository. It is thus a variable
upper boundary that may either be 40 (SHA-1) or 64 (SHA-256).

This has two major downsides. First, the user that specifies this config
must be aware of the object hashes that its repository use. If they want
to configure the value globally, then they cannot pick any value in the
range `[41, 64]` if they have any repository that uses SHA-1. If they
did, Git would error out when parsing the config.

Second, and more importantly, parsing "core.abbrev" crashes when outside
of a Git repository because we dereference `the_hash_algo` to figure out
its hex length. Starting with c8aed5e8da (repository: stop setting SHA1
as the default object hash, 2024-05-07) though, we stopped initializing
`the_hash_algo` outside of Git repositories.

Fix both of these issues by not making it an error anymore when the
given length exceeds the hash length. Instead, leave the abbreviated
length intact. `repo_find_unique_abbrev_r()` handles this just fine
except for a performance penalty which we will fix in a subsequent
commit.

Reported-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Makefile: add ability to append to CFLAGS and LDFLAGS

There are some usecases where we may want to append CFLAGS to the
default CFLAGS set by Git. This could for example be to enable or
disable specific compiler warnings or to change the optimization level
that code is compiled with. This cannot be done without overriding the
complete CFLAGS value though and thus requires the user to redeclare the
complete defaults used by Git.

Introduce a new variable `CFLAGS_APPEND` that gets appended to the
default value of `CFLAGS`. As compiler options are last-one-wins, this
fulfills both of the usecases mentioned above. It's also common practice
across many other projects to have such a variable.

While at it, also introduce a matching `LDFLAGS_APPEND` variable. While
there isn't really any need for this variable as there are no default
`LDFLAGS`, users may expect this variable to exist, as well.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-revindex.c: guard against out-of-bounds pack lookups

The function midx_key_to_pack_pos() is a helper function used by
midx_to_pack_pos() and midx_pair_to_pack_pos() to translate a (pack,
offset) tuple into a position into the MIDX pseudo-pack order.

Ensure that the pack ID given to midx_pair_to_pack_pos() is bounded by
the number of packs within the MIDX to prevent, for instance,
uninitialized memory from being used as a pack ID.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pack-bitmap.c: avoid uninitialized `pack_int_id` during reuse

When performing multi-pack reuse, reuse_partial_packfile_from_bitmap()
is responsible for generating an array of bitmapped_pack structs from
which to perform reuse.

In the multi-pack case, we loop over the MIDXs packs and copy the result
of calling `nth_bitmapped_pack()` to construct the list of reusable
paths.

But we may also want to do pack-reuse over a single pack, either because
we only had one pack to perform reuse over (in the case of single-pack
bitmaps), or because we explicitly asked to do single pack reuse even
with a MIDX[^1].

When this is the case, the array we generate of reusable packs contains
only a single element, which is either (a) the pack attached to the
single-pack bitmap, or (b) the MIDX's preferred pack.

In 795006fff4 (pack-bitmap: gracefully handle missing BTMP chunks,
2024-04-15), we refactored the reuse_partial_packfile_from_bitmap()
function and stopped assigning the pack_int_id field when reusing only
the MIDX's preferred pack. This results in an uninitialized read down in
try_partial_reuse() like so:

    ==7474==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x55c5cd191dde in try_partial_reuse pack-bitmap.c:1887:8
    #1 0x55c5cd191dde in reuse_partial_packfile_from_bitmap_1 pack-bitmap.c:2001:8
    #2 0x55c5cd191dde in reuse_partial_packfile_from_bitmap pack-bitmap.c:2105:3
    #3 0x55c5cce0bd0e in get_object_list_from_bitmap builtin/pack-objects.c:4043:3
    #4 0x55c5cce0bd0e in get_object_list builtin/pack-objects.c:4156:27
    #5 0x55c5cce0bd0e in cmd_pack_objects builtin/pack-objects.c:4596:3
    #6 0x55c5ccc8fac8 in run_builtin git.c:474:11

which happens when try_partial_reuse() tries to call
midx_pair_to_pack_pos() when it tries to reject cross-pack deltas.

Avoid the uninitialized read by ensuring that the pack_int_id field is
set in the single-pack reuse case by setting it to either the MIDX
preferred pack's pack_int_id, or '-1', in the case of single-pack
bitmaps.  In the latter case, we never read the pack_int_id field, so
the choice of '-1' is intentional as a "garbage in, garbage out"
measure.

Guard against further regressions in this area by adding a test which
ensures that we do not throw out deltas from the preferred pack as
"cross-pack" due to an uninitialized pack_int_id.

[^1]: This can happen for a couple of reasons, either because the
  repository is configured with 'pack.allowPackReuse=(true|single)', or
  because the MIDX was generated prior to the introduction of the BTMP
  chunk, which contains information necessary to perform multi-pack
  reuse.

Reported-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

midx-write.c: do not read existing MIDX with `packs_to_include`

Commit d6a8c58675 (midx-write.c: support reading an existing MIDX with
`packs_to_include`, 2024-05-29) changed the MIDX generation machinery to
support reading from an existing MIDX when writing a new one.

Unfortunately, the rest of the MIDX generation machinery is not prepared
to deal with such a change. For instance, the function responsible for
adding to the object ID fanout table from a MIDX source
(midx_fanout_add_midx_fanout()) will gladly add objects from an existing
MIDX for some fanout level regardless of whether or not those objects
came from packs that are to be included in the subsequent MIDX write.

This results in broken pseudo-pack object order (leading to incorrect
object traversal results) and segmentation faults, like so (generated by
running the added test prior to the changes in midx-write.c):

    #0  0x000055ee31393f47 in midx_pack_order (ctx=0x7ffdde205c70) at midx-write.c:590
    #1  0x000055ee31395a69 in write_midx_internal (object_dir=0x55ee32570440 ".git/objects",
        packs_to_include=0x7ffdde205e20, packs_to_drop=0x0, preferred_pack_name=0x0,
        refs_snapshot=0x0, flags=15) at midx-write.c:1171
    #2  0x000055ee31395f38 in write_midx_file_only (object_dir=0x55ee32570440 ".git/objects",
        packs_to_include=0x7ffdde205e20, preferred_pack_name=0x0, refs_snapshot=0x0, flags=15)
        at midx-write.c:1274
    [...]

In stack frame #0, the code on midx-write.c:590 is using the new pack ID
corresponding to some object which was added from the existing MIDX.
Importantly, the pack from which that object was selected in the
existing MIDX does not appear in the new MIDX as it was excluded via
`--stdin-packs`.

In this instance, the pack in question had pack ID "1" in the existing
MIDX, but since it was excluded from the new MIDX, we never filled in
that entry in the pack_perm table, resulting in:

    (gdb) p *ctx->pack_perm@2
    $1 = {0, 1515870810}

Which is what causes the segfault above when we try and read:

    struct pack_info *pack = &ctx->info[ctx->pack_perm[i]];
    if (pack->bitmap_pos == BITMAP_POS_UNKNOWN)
        pack->bitmap_pos = 0;

Fundamentally, we should be able to read information from an existing
MIDX when generating a new one. But in practice the midx-write.c code
assumes that we won't run into issues like the above with incongruent
pack IDs, and often makes those assumptions in extremely subtle and
fragile ways.

Instead, let's avoid reading from an existing MIDX altogether, and stick
with the pre-d6a8c58675 implementation. Harden against any regressions
in this area by adding a test which demonstrates these issues.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/blame: fix leaking ignore revs files

When parsing the blame configuration we add "blame.ignoreRevsFile"
configs to a string list. This string list is declared as with `NODUP`,
and thus we hand over the allocated string to that list. We eventually
end up calling `string_list_clear()` on that list, but due to it being
declared as `NODUP` we will not release the associated strings and thus
leak memory.

Fix this issue by setting up the list as `DUP` instead and free the
config string after insertion.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/blame: fix leaking prefixed paths

In `cmd_blame()` we compute prefixed paths by calling `add_prefix()`,
which itself calls `prefix_path()`. While `prefix_path()` returns an
allocated string, `add_prefix()` pretends to return a constant string.
Consequently, this path never gets freed.

Fix the return type to be `char *` and free the path to plug the memory
leak.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

blame: fix leaking data for blame scoreboards

There are some memory leaks when cleaning up blame scoreboards. Fix
those.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

line-range: plug leaking find functions

In `parse_range_funcname()` we may end up allocating a "find function",
but never free it. Fix this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge: fix leaking merge bases

When calling either the recursive or the ORT merge machineries we need
to provide a list of merge bases. The ownership of that parameter is
then implicitly transferred to the callee, which is somewhat fishy.
Furthermore, that list may leak in some cases where the merge machinery
runs into an error, thus causing a memory leak.

Refactor the code such that we stop transferring ownership. Instead, the
merge machinery will now create its own local copies of the passed in
list as required if they need to modify the list. Free the list at the
callsites as required.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/merge: fix leaking `struct cmdnames` in `get_strategy()`

In "builtin/merge.c" we use the helper infrastructure to figure out what
merge strategies there are. We never free contents of the `cmdnames`
structures though and thus leak their memory.

Fix this by exposing the already existing `clean_cmdnames()` function to
release their memory. As this name isn't quite idiomatic, rename it to
`cmdnames_release()` while at it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sequencer: fix memory leaks in `make_script_with_merges()`

Fix some trivial memory leaks in `make_script_with_merges()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/clone: plug leaking HEAD ref in `wanted_peer_refs()`

In `wanted_peer_refs()` we first create a copy of the "HEAD" ref. This
copy may not actually be passed back to the caller, but is not getting
freed in this case. Fix this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

apply: fix leaking string in `match_fragment()`

Before calling `update_pre_post_images()`, we call `strbuf_detach()` to
put its buffer into a new string variable that we then pass to that
function. Besides being rather pointless, it also causes us to leak
memory of that variable because we never free it.

Get rid of the variable altogether and instead reach into the `strbuf`
directly. While at it, refactor the code to have a common exit path and
mark string that do not contain allocated memory as constant.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sequencer: fix leaking string buffer in `commit_staged_changes()`

We're leaking the `rev` string buffer in various call paths. Refactor
the function to have a common exit path so that we can release its
memory reliably.

This fixes a subset of tests failing with the memory sanitizer in t3404.
But as there are more failures, we cannot yet mark the whole test suite
as passing.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

commit: fix leaking parents when calling `commit_tree_extended()`

When creating commits via `commit_tree_extended()`, the caller passes in
a string list of parents. This call implicitly transfers ownership of
that list to the function, which is quite surprising to begin with. But
to make matters worse, `commit_tree_extended()` doesn't even bother to
free the list of parents in error cases. The result is a memory leak,
and one that the caller cannot fix by themselves because they do not
know whether parts of the string list have already been released.

Refactor the code such that callers can keep ownership of the list of
parents, which is getting indicated by parameter being a constant
pointer now. Free the lists at the calling site and add a common exit
path to those sites as required.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config: fix leaking "core.notesref" variable

The variable used to track the "core.notesref" config is not getting
freed before we assign to it and thus leaks. Fix this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

rerere: fix various trivial leaks

We leak various different string lists in the rerere code. Free those to
plug them.

Note that the `merge_rr` variable is intentionally being free'd with the
`free_util` parameter set to 1. The `util` field is used there to store
the IDs of every rerere item and thus needs to be freed, as well.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/stash: fix leak in `show_stash()`

We leak the `revision_args()` variable. Fix this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

revision: free diff options

There is a todo comment in `release_revisions()` that mentions that we
need to free the diff options, which was added via 54c8a7c379 (revisions
API: add a TODO for diff_free(&revs->diffopt), 2022-04-14). Releasing
the diff options wasn't quite feasible at that time because some call
sites rely on its contents to remain even after the revisions have been
released.

In fact, there really only are a couple of callsites that misbehave
here:

  - `cmd_shortlog()` releases the revisions, but continues to access its
    file pointer.

  - `do_diff_cache()` creates a shallow copy of `struct diff_options`,
    but does not set the `no_free` member. Consequently, we end up
    releasing resources of the caller-provided diff options.

  - `diff_free()` and friends do not play nice when being called
    multiple times as they don't unset data structures that they have
    just released.

Fix all of those cases and enable the call to `diff_free()`, which plugs
a bunch of memory leaks.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/log: fix leaking commit list in git-cherry(1)

We're storing the list of commits that git-cherry(1) is about to print
into a temporary list. This list is never getting free'd and thus leaks.
Fix this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-recursive: fix memory leak when finalizing merge

We do not free some members of `struct merge_options`' private data.
Fix this to plug those leaks.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/merge-recursive: fix leaking object ID bases

In `cmd_merge_recursive()` we have a static array of object ID bases
that we pass to `merge_recursive_generic()`. This interface is somewhat
weird though because the latter function accepts a pointer to a pointer
of object IDs, which requires us to allocate the object IDs on the heap.
And as we never free those object IDs, the end result is a leak.

While we can easily solve this leak by just freeing the respective
object IDs, the whole calling convention is somewhat weird. Instead,
refactor `merge_recursive_generic()` to accept a plain pointer to object
IDs so that we can avoid allocating them altogether.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/difftool: plug memory leaks in `run_dir_diff()`

We're leaking a bunch of memory leaks in `run_dir_diff()`. Plug them.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

object-name: free leaking object contexts

While it is documented in `struct object_context::path` that this
variable needs to be released by the caller, this fact is rather easy to
miss given that we do not ever provide a function to release the object
context. And of course, while some callers dutifully release the path,
many others don't.

Introduce a new `object_context_release()` function that releases the
path. Convert callsites that used to free the path to use that new
function and add missing calls to callsites that were leaking memory.
Refactor those callsites as required to have a single return path, only.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/rev-list: fix leaking bitmap index when calculating disk usage

git-rev-list(1) can speed up its object size calculations for reachable
objects via a bitmap walk, if there is any bitmap. This is done in
`try_bitmap_disk_usage()`, which tries to optimistically load the bitmap
and then use it, if available. It never frees it though, leading to a
memory leak. Fix this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

notes: fix memory leak when pruning notes

In `prune_notes()` we first store the notes that are to be deleted in a
local list, and then iterate through that list to delete those notes one
by one. We never free the list though and thus leak its memory. Fix
this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

revision: fix leaking display notes

We never free the display notes options embedded into `struct revision`.
Implement a new function `release_display_notes()` that we can call in
`release_revisions()` to fix this.

There is another gotcha here though: we play some games with the string
list used to track extra notes refs, where we sometimes set the bit that
indicates that strings should be strdup'd and sometimes unset it. This
dance is done to avoid a copy of an already-allocated string when we
call `enable_ref_display_notes()`. But this dance is rather pointless as
we can instead call `string_list_append_nodup()` to transfer ownership
of the allocated string to the list.

Refactor the code to do so and drop the `strdup_strings` dance.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

merge-recursive: fix leaking rename conflict info

When computing rename conflicts in our recursive merge algorithm we set
up `struct rename_conflict_info`s to track that information. We never
free those data structures though and thus leak memory.

We need to be a bit more careful here though because the same rename
conflict info can be assigned to multiple structures. Accommodate for
this by introducing a `rename_conflict_info_owned` bit that we can use
to steer whether or not the rename conflict info shall be free'd.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

biultin/rev-parse: fix memory leaks in `--parseopt` mode

We have a bunch of memory leaks in git-rev-parse(1)'s `--parseopt` mode.
Refactor the code to use `struct strvec`s to make it easier for us to
track the lifecycle of those leaking variables and then free them.

While at it, remove the unneeded static lifetime for some of the
variables.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

bundle: plug leaks in `create_bundle()`

When creating a bundle, we set up a revision walk, but never release
data associated with it. Furthermore, we create a mostly-shallow copy of
that revision walk where we only adapt its pending objects such that we
can reuse the walk. While that copy must not be released, the pending
objects array need to be.

Plug those memory leaks by releasing the revision walk and the pending
objects of the copied revision walk.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

notes-utils: free note trees when releasing copied notes

While we clear most of the members of `struct notes_rewrite_cfg` in
`finish_copy_notes_for_rewrite()`, we do not clear the notes tree. Fix
this to plug this memory leak.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

parse-options: fix leaks for users of OPT_FILENAME

The `OPT_FILENAME()` option will, if set, put an allocated string into
the user-provided variable. Consequently, that variable thus needs to be
free'd by the caller of `parse_options()`. Some callsites don't though
and thus leak memory. Fix those.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

revision: fix memory leak when reversing revisions

When reversing revisions in a rev walk, `get_revision()` will allocate a
new commit list and assign it to `revs->commits`. It does not free the
old list though, which makes it leak. Fix this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

The twelfth batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'jk/leakfixes'

Memory leaks in "git mv" has been plugged.

* jk/leakfixes:
  mv: replace src_dir with a strvec
  mv: factor out empty src_dir removal
  mv: move src_dir cleanup to end of cmd_mv()
  t-strvec: mark variable-arg helper with LAST_ARG_MUST_BE_NULL
  t-strvec: use va_end() to match va_start()

Merge branch 'iw/trace-argv-on-alias'

The alias-expanded command lines are logged to the trace output.

* iw/trace-argv-on-alias:
  run-command: show prepared command
  Documentation: alias: add notes on shell expansion
  Documentation: alias: rework notes into points

diff: let external diffs report that changes are uninteresting

The options --exit-code and --quiet instruct git diff to indicate
whether it found any significant changes by exiting with code 1 if it
did and 0 if there were none.  Currently this doesn't work if external
diff programs are involved, as we have no way to learn what they found.

Add that ability in the form of the new configuration options
diff.trustExitCode and diff.<driver>.trustExitCode and the environment
variable GIT_EXTERNAL_DIFF_TRUST_EXIT_CODE.  They pair with the config
options diff.external and diff.<driver>.command and the environment
variable GIT_EXTERNAL_DIFF, respectively.

The new options are off by default, keeping the old behavior.  Enabling
them indicates that the external diff returns exit code 1 if it finds
significant changes and 0 if it doesn't, like diff(1).

The name of the new options is taken from the git difftool and mergetool
options of similar purpose.  (There they enable passing on the exit code
of a diff tool and to infer whether a merge done by a merge tool is
successful.)

The new feature sets the diff flag diff_from_contents in
diff_setup_done() if we need the exit code and are allowed to call
external diffs.  This disables the optimization that avoids calling the
program with --quiet.  Add it back by skipping the call if the external
diff is not able to report empty diffs.  We can only do that check after
evaluating the file-specific attributes in run_external_diff().

If we do run the external diff with --quiet, send its output to
/dev/null.

I considered checking the output of the external diff to check whether
its empty.  It was added as 11be65cfa4 (diff: fix --exit-code with
external diff, 2024-05-05) and quickly reverted, as it does not work
with external diffs that do not write to stdout.  There's no reason why
a graphical diff tool would even need to write anything there at all.

I also considered using a non-zero exit code for empty diffs, which
could be done without adding new configuration options.  We'd need to
disable the optimization that allows git diff --quiet to skip calling
external diffs, though -- that might be quite surprising if graphical
diff programs are involved.  And assigning the opposite meaning of the
exit codes compared to diff(1) and git diff --exit-code to the external
diff can cause unnecessary confusion.

Suggested-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

userdiff: add and use struct external_diff

Wrap the string specifying the external diff command in a new struct to
simplify adding attributes, which the next patch will do.

Make sure external_diff() still returns NULL if neither the environment
variable GIT_EXTERNAL_DIFF nor the configuration option diff.external is
set, to continue allowing its use in a boolean context.

Use a designated initializer for the default builtin userdiff driver to
adjust to the type change of the second struct member. Spelling out
only the non-zero members improves readability as a nice side-effect.

No functional change intended.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t4020: test exit code with external diffs

Add tests to check the exit code of git diff with its options --quiet
and --exit-code when using an external diff program. Currently we
cannot tell whether it found significant changes or not.

While at it, document briefly that --quiet turns off execution of
external diff programs because that behavior surprised me for a moment
while writing the tests.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

__attribute__: add a few missing format attributes

A public function mem_pool_strfmt() takes printf like parameters,
but is not given an attribute as such. Also a few file-scope static
functions were missing their format attribute.

Add them.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

__attribute__: mark some functions with LAST_ARG_MUST_BE_NULL

Some varargs functions that use NULL-terminated parameter list were
missing __attributes__ ((sentinel)) aka LAST_ARG_MUST_BE_NULL.

Add them.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

__attribute__: remove redundant attribute declaration for git_die_config()

The convention is to declare the function attribute to an extern
function together with its declaration in the header file, without
repeating the attribute declaration with its definition in the .c
source file (a file-scope static function declares its attribute
together with its definition in the .c file it is defined, as there
is no other place to do so).

The definition of git_die_config() in config.c did not follow the
convention and had its attribute declared with both its declaration
in the header and its definition in the .c source file.

Remove the one in the config.c to match everybody else.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

__attribute__: trace2_region_enter_printf() is like "printf"

The last part of the parameter list the function takes is like
parameters to printf. Mark it as such.

An existing call that formats a value of type size_t using "%d" was
found by the compiler with the help with this annotation; fix it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

worktree_git_path(): move the declaration to path.h

The definition of this function is in path.c but its declaration is
in worktree.h, which is something unexpected. The function is
explained as "Similar to git_path()"; declaring it next to where
git_path() is declared would make more sense.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

doc: interactive.singleKey is disabled by default

Make it clear that the interactive.singleKey configuration option is
disabled by default, using rather subtle wording that avoids an
emphasis on the actual default value. This should eliminate any
associated doubts.

While there, touch up the remaining wording of the description a bit.

Signed-off-by: Dragan Simic <dsimic@manjaro.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

format-patch: assume --cover-letter for diff in multi-patch series

When we deal with a multi-patch series in git-format-patch(1), if we see
`--interdiff` or `--range-diff` but no `--cover-letter`, we return with
an error, saying:

fatal: --range-diff requires --cover-letter or single patch

or:

fatal: --interdiff requires --cover-letter or single patch

This makes sense because the cover-letter is where we place the diff
from the previous version.

However, considering that `format-patch` generates a multi-patch as
needed, let's adopt a similar "cover as necessary" approach when using
`--interdiff` or `--range-diff`.

Therefore, relax the requirement for an explicit `--cover-letter` in a
multi-patch series when the user says `--iterdiff` or `--range-diff`.

Still, if only to return the error, respect "format.coverLetter=no" and
`--no-cover-letter`.

Signed-off-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t4014: cleanups in a few tests

Arrange things we are going to create to be removed at end, and then
start creating them. That way, we will clean them up even if we fail
after creating some but before the end of the command.

Signed-off-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'tb/midx-write-cleanup'

Code clean-up around writing the .midx files.

* tb/midx-write-cleanup:
  pack-bitmap.c: reimplement `midx_bitmap_filename()` with helper
  midx: replace `get_midx_rev_filename()` with a generic helper
  midx-write.c: support reading an existing MIDX with `packs_to_include`
  midx-write.c: extract `fill_packs_from_midx()`
  midx-write.c: extract `should_include_pack()`
  midx-write.c: pass `start_pack` to `compute_sorted_entries()`
  midx-write.c: reduce argument count for `get_sorted_entries()`
  midx-write.c: tolerate `--preferred-pack` without bitmaps

imap-send: free all_msgs strbuf in "out" label

We read stdin into a strbuf, but most code paths never release it,
causing a leak (albeit a minor one, as we leak only when exiting from
the main function of the program).

Commit 56f4f4a29d (imap-send: minimum leakfix, 2024-06-04) did the
minimum to plug the one instance we see in the test suite, when we read
an empty input. But it was sufficient only because aside from this noop
invocation, we don't test imap-send at all!

The right spot to free is in the "out" label, which is hit by all code
paths before leaving the function. We couldn't do that in 56f4f4a29d
because there was no unified exit path. That came separately in
3aca5f7fb0 (imap-send: fix leaking memory in `imap_server_conf`,
2024-06-04), which cleaned up many other leaks (but not this one).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'jc/t1517-more' into jk/imap-send-plug-all-msgs-leak

* jc/t1517-more:
imap-send: minimum leakfix
t1517: more coverage for commands that work without repository

Merge branch 'ps/no-writable-strings' into jk/imap-send-plug-all-msgs-leak

* ps/no-writable-strings: (46 commits)
  config.mak.dev: enable `-Wwrite-strings` warning
  builtin/merge: always store allocated strings in `pull_twohead`
  builtin/rebase: always store allocated string in `options.strategy`
  builtin/rebase: do not assign default backend to non-constant field
  imap-send: fix leaking memory in `imap_server_conf`
  imap-send: drop global `imap_server_conf` variable
  mailmap: always store allocated strings in mailmap blob
  revision: always store allocated strings in output encoding
  remote-curl: avoid assigning string constant to non-const variable
  send-pack: always allocate receive status
  parse-options: cast long name for OPTION_ALIAS
  http: do not assign string constant to non-const field
  compat/win32: fix const-correctness with string constants
  pretty: add casts for decoration option pointers
  object-file: make `buf` parameter of `index_mem()` a constant
  object-file: mark cached object buffers as const
  ident: add casts for fallback name and GECOS
  entry: refactor how we remove items for delayed checkouts
  line-log: always allocate the output prefix
  line-log: stop assigning string constant to file parent buffer
  ...

config.mak.dev: enable `-Wwrite-strings` warning

Writing to string constants is undefined behaviour and must be avoided
in C. Even so, the compiler does not help us with this by default
because those constants are not in fact marked as `const`. This makes it
rather easy to accidentally assign a constant to a non-const variable or
field and then later on try to either free it or write to it.

Enable `-Wwrite-strings` to catch such mistakes. With this warning
enabled, the type of string constants is changed to `const char[]` and
will thus cause compiler warnings when being assigned to non-const
fields and variables.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/merge: always store allocated strings in `pull_twohead`

The `pull_twohead` configuration may sometimes contain an allocated
string, and sometimes it may contain a string constant. Refactor this to
instead always store an allocated string such that we can release its
resources without risk.

While at it, manage the lifetime of other config strings, as well. Note
that we explicitly don't free `cleanup_arg` here. This is because the
variable may be assigned a string constant via command line options.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/rebase: always store allocated string in `options.strategy`

The `struct rebase_options::strategy` field is a `char *`, but we do end
up assigning string constants to it in two cases:

  - When being passed a `--strategy=` option via the command line.

  - When being passed a strategy option via `--strategy-option=`, but
    not a strategy.

This will cause warnings once we enable `-Wwrite-strings`.

Ideally, we'd just convert the field to be a `const char *`. But we also
assign to this field via the GIT_TEST_MERGE_ALGORITHM envvar, which we
have to strdup(3P) into it.

Instead, refactor the code to make sure that we only ever assign
allocated strings to this field.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/rebase: do not assign default backend to non-constant field

The `struct rebase_options::default_backend` field is a non-constant
string, but is being assigned a constant via `REBASE_OPTIONS_INIT`.
Fix this by using `xstrdup()` to assign the variable and introduce a new
function `rebase_options_release()` that releases memory held by the
structure, including the newly-allocated variable.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

imap-send: fix leaking memory in `imap_server_conf`

We never free any of the config strings that we populate into the
`struct imap_server_conf`. Fix this by creating a common exit path where
we can free resources.

While at it, drop the unused member `imap_server_conf::name`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

imap-send: drop global `imap_server_conf` variable

In "imap-send.c", we have a global `sturct imap_server_conf` variable
that keeps track of the configuration of the IMAP server. This variable
is being populated mostly via the Git configuration.

Refactor the code to allocate the structure on the stack instead of
having it globally. This change allows us to track its lifetime more
closely.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mailmap: always store allocated strings in mailmap blob

Same as with the preceding commit, the `git_mailmap_blob` may sometimes
contain an allocated string and sometimes it may contain a string
constant. This is risky and can easily lead to bugs in case the variable
is getting re-assigned, where the code may then try to free the previous
value to avoid memory leaks.

Safeguard the code by always storing allocated strings in the variable.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

revision: always store allocated strings in output encoding

The `git_log_output_encoding` variable can be set via the `--encoding=`
option. When doing so, we conditionally either assign it to the passed
value, or if the value is "none" we assign it the empty string.
Depending on which of the both code paths we pick though, the variable
may end up being assigned either an allocated string or a string
constant.

This is somewhat risky and may easily lead to bugs when a different code
path may want to reassign a new value to it, freeing the previous value.
We already to this when parsing the "i18n.logoutputencoding" config in
`git_default_i18n_config()`. But because the config is typically parsed
before we parse command line options this has been fine so far.

Regardless of that, safeguard the code such that the variable always
contains an allocated string. While at it, also free the old value in
case there was any to plug a potential memory leak.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

remote-curl: avoid assigning string constant to non-const variable

When processing remote options, we split the option line into two by
searching for a space. If there is one, we replace the space with '\0',
otherwise we implicitly assume that the value is "true" and thus assign
a string constant.

As the return value of strchr(3P) weirdly enough is a `char *` even
though it gets a `const char *` as input, the assigned-to variable also
is a non-constant. This is fine though because the argument is in fact
an allocated string, and thus we are allowed to modify it. But this will
break once we enable `-Wwrite-strings`.

Refactor the code stop splitting the fields with '\0' altogether.
Instead, we can pass the length of the option name to `set_option()` and
then use strncmp(3P) instead of strcmp(3P).

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

send-pack: always allocate receive status

In `receive_status()`, we record the reason why ref updates have been
rejected by the remote via the `remote_status`. But while we allocate
the assigned string when a reason was given, we assign a string constant
when no reason was given.

This has been working fine so far due to two reasons:

- We don't ever free the refs in git-send-pack(1)'

- Remotes always give a reason, at least as implemented by Git proper.

Adapt the code to always allocate the receive status string and free the
refs.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

parse-options: cast long name for OPTION_ALIAS

We assign the long name for OPTION_ALIAS options to a non-constant value
field. We know that the variable will never be written to, but this will
cause warnings once we enable `-Wwrite-strings`.

Cast away the constness to be prepared for this change.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

http: do not assign string constant to non-const field

In `write_accept_language()`, we put all acceptable languages into an
array. While all entries in that array are allocated strings, the final
entry in that array is a string constant. This is fine because we
explicitly skip over the last entry when freeing the array, but will
cause warnings once we enable `-Wwrite-strings`.

Adapt the code to also allocate the final entry.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

compat/win32: fix const-correctness with string constants

Adjust various places in our Win32 compatibility layer where we are not
assigning string constants to `const char *` variables.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>