git.ipfire.org Git - thirdparty/git.git/log

http: allow authenticating proactively

When making a request over HTTP(S), Git only sends authentication if it
receives a 401 response.  Thus, if a repository is open to the public
for reading, Git will typically never ask for authentication for fetches
and clones.

However, there may be times when a user would like to authenticate
nevertheless.  For example, a forge may give higher rate limits to users
who authenticate because they are easier to contact in case of excessive
use.  Or it may be useful for a known heavy user, such as an internal
service, to proactively authenticate so its use can be monitored and, if
necessary, throttled.

Let's make this possible with a new option, "http.proactiveAuth".  This
option specifies a type of authentication which can be used to
authenticate against the host in question.  This is necessary because we
lack the WWW-Authenticate header to provide us details; similarly, we
cannot accept certain types of authentication because we require
information from the server, such as a nonce or challenge, to
successfully authenticate.

If we're in auto mode and we got a username and password, set the
authentication scheme to Basic.  libcurl will not send authentication
proactively unless there's a single choice of allowed authentication,
and we know in this case we didn't get an authtype entry telling us what
scheme to use, or we would have taken a different codepath and written
the header ourselves.  In any event, of the other schemes that libcurl
supports, Digest and NTLM require a nonce or challenge, which means that
they cannot work with proactive auth, and GSSAPI does not use a username
and password at all, so Basic is the only logical choice among the
built-in options.

Note that the existing http_proactive_auth variable signifies proactive
auth if there are already credentials, which is different from the
functionality we're adding, which always seeks credentials even if none
are provided.  Nonetheless, t5540 tests the existing behavior for
WebDAV-based pushes to an open repository without credentials, so we
preserve it.  While at first this may seem an insecure and bizarre
decision, it may be that authentication is done with TLS certificates,
in which case it might actually provide a quite high level of security.
Expand the variable to use an enum to handle the additional cases and a
helper function to distinguish our new cases from the old ones.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'ps/no-writable-strings'

Building with "-Werror -Wwrite-strings" is now supported.

* ps/no-writable-strings: (27 commits)
  config.mak.dev: enable `-Wwrite-strings` warning
  builtin/merge: always store allocated strings in `pull_twohead`
  builtin/rebase: always store allocated string in `options.strategy`
  builtin/rebase: do not assign default backend to non-constant field
  imap-send: fix leaking memory in `imap_server_conf`
  imap-send: drop global `imap_server_conf` variable
  mailmap: always store allocated strings in mailmap blob
  revision: always store allocated strings in output encoding
  remote-curl: avoid assigning string constant to non-const variable
  send-pack: always allocate receive status
  parse-options: cast long name for OPTION_ALIAS
  http: do not assign string constant to non-const field
  compat/win32: fix const-correctness with string constants
  pretty: add casts for decoration option pointers
  object-file: make `buf` parameter of `index_mem()` a constant
  object-file: mark cached object buffers as const
  ident: add casts for fallback name and GECOS
  entry: refactor how we remove items for delayed checkouts
  line-log: always allocate the output prefix
  line-log: stop assigning string constant to file parent buffer
  ...

Merge branch 'jk/am-retry'

"git am" has a safety feature to prevent it from starting a new
session when there already is a session going.  It reliably
triggers when a mbox is given on the command line, but it has to
rely on the tty-ness of the standard input.  Add an explicit way to
opt out of this safety with a command line option.

* jk/am-retry:
  test-terminal: drop stdin handling
  am: add explicit "--retry" option

Merge branch 'jc/varargs-attributes'

Varargs functions that are unannotated as printf-like or execl-like
have been annotated as such.

* jc/varargs-attributes:
  __attribute__: add a few missing format attributes
  __attribute__: mark some functions with LAST_ARG_MUST_BE_NULL
  __attribute__: remove redundant attribute declaration for git_die_config()
  __attribute__: trace2_region_enter_printf() is like "printf"

Merge branch 'ps/ref-storage-migration'

A new command has been added to migrate a repository that uses the
files backend for its ref storage to use the reftable backend, with
limitations.

* ps/ref-storage-migration:
  builtin/refs: new command to migrate ref storage formats
  refs: implement logic to migrate between ref storage formats
  refs: implement removal of ref storages
  worktree: don't store main worktree twice
  reftable: inline `merged_table_release()`
  refs/files: fix NULL pointer deref when releasing ref store
  refs/files: extract function to iterate through root refs
  refs/files: refactor `add_pseudoref_and_head_entries()`
  refs: allow to skip creation of reflog entries
  refs: pass storage format to `ref_store_init()` explicitly
  refs: convert ref storage format to an enum
  setup: unset ref storage when reinitializing repository version

Merge branch 'ps/check-docs-fix'

"make check-docs" noticed problems and reported to its output but
failed to signal its findings with its exit status, which has been
corrected.

* ps/check-docs-fix:
  ci/test-documentation: work around SyntaxWarning in Python 3.12
  gitlab-ci: add job to run `make check-docs`
  Documentation/lint-manpages: bubble up errors
  Makefile: extract script to lint missing/extraneous manpages

Merge branch 'ps/ci-fix-detection-of-ubuntu-20'

Fix for an embarrassing typo that prevented Python2 tests from running
anywhere.

* ps/ci-fix-detection-of-ubuntu-20:
ci: fix check for Ubuntu 20.04

Merge branch 'ap/credential-clear-fix'

Upon expiration event, the credential subsystem forgot to clear
in-core authentication material other than password (whose support
was added recently), which has been corrected.

* ap/credential-clear-fix:
credential: clear expired c->credential, unify secret clearing

Merge branch 'jc/format-patch-with-range-diff'

The inter/range-diff output has been moved to the end of the patch
when format-patch adds it to a single patch, instead of writing it
before the patch text, to be consistent with what is done for a
cover letter for a multi-patch series.

* jc/format-patch-with-range-diff:
format-patch: move range/inter diff at the end of a single patch output
show_log: factor out interdiff/range-diff generation

The thirteenth batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'gt/decorate-unit-test'

A test helper that essentially is unit tests on the "decorate"
logic has been rewritten using the unit-tests framework.

* gt/decorate-unit-test:
t/: migrate helper/test-example-decorate to the unit testing framework

Merge branch 'jk/sparse-leakfix'

Many memory leaks in the sparse-checkout code paths have been
plugged.

* jk/sparse-leakfix:
  sparse-checkout: free duplicate hashmap entries
  sparse-checkout: free string list after displaying
  sparse-checkout: free pattern list in sparse_checkout_list()
  sparse-checkout: free sparse_filename after use
  sparse-checkout: refactor temporary sparse_checkout_patterns
  sparse-checkout: always free "line" strbuf after reading input
  sparse-checkout: reuse --stdin buffer when reading patterns
  dir.c: always copy input to add_pattern()
  dir.c: free removed sparse-pattern hashmap entries
  sparse-checkout: clear patterns when init() sees existing sparse file
  dir.c: free strings in sparse cone pattern hashmaps
  sparse-checkout: pass string literals directly to add_pattern()
  sparse-checkout: free string list in write_cone_to_file()

Merge branch 'jk/cap-exclude-file-size'

An overly large ".gitignore" files are now rejected silently.

* jk/cap-exclude-file-size:
dir.c: reduce max pattern file size to 100MB
dir.c: skip .gitignore, etc larger than INT_MAX

Merge branch 'jc/safe-directory-leading-path'

The safe.directory configuration knob has been updated to
optionally allow leading path matches.

* jc/safe-directory-leading-path:
safe.directory: allow "lead/ing/path/*" match

Merge branch 'gt/t-hash-unit-test'

A pair of test helpers that essentially are unit tests on hash
algorithms have been rewritten using the unit-tests framework.

* gt/t-hash-unit-test:
t/: migrate helper/test-{sha1, sha256} to unit-tests/t-hash
strbuf: introduce strbuf_addstrings() to repeatedly add a string

Merge branch 'cp/reftable-unit-test'

Basic unit tests for reftable have been reimplemented under the
unit test framework.

* cp/reftable-unit-test:
  t: improve the test-case for parse_names()
  t: add test for put_be16()
  t: move tests from reftable/record_test.c to the new unit test
  t: move tests from reftable/stack_test.c to the new unit test
  t: move reftable/basics_test.c to the unit testing framework

Merge branch 'jc/t1517-more'

A new test was added to ensure git commands that are designed to
run outside repositories do work.

* jc/t1517-more:
imap-send: minimum leakfix
t1517: more coverage for commands that work without repository

The twelfth batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'jk/leakfixes'

Memory leaks in "git mv" has been plugged.

* jk/leakfixes:
  mv: replace src_dir with a strvec
  mv: factor out empty src_dir removal
  mv: move src_dir cleanup to end of cmd_mv()
  t-strvec: mark variable-arg helper with LAST_ARG_MUST_BE_NULL
  t-strvec: use va_end() to match va_start()

Merge branch 'iw/trace-argv-on-alias'

The alias-expanded command lines are logged to the trace output.

* iw/trace-argv-on-alias:
  run-command: show prepared command
  Documentation: alias: add notes on shell expansion
  Documentation: alias: rework notes into points

__attribute__: add a few missing format attributes

A public function mem_pool_strfmt() takes printf like parameters,
but is not given an attribute as such. Also a few file-scope static
functions were missing their format attribute.

Add them.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

__attribute__: mark some functions with LAST_ARG_MUST_BE_NULL

Some varargs functions that use NULL-terminated parameter list were
missing __attributes__ ((sentinel)) aka LAST_ARG_MUST_BE_NULL.

Add them.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

__attribute__: remove redundant attribute declaration for git_die_config()

The convention is to declare the function attribute to an extern
function together with its declaration in the header file, without
repeating the attribute declaration with its definition in the .c
source file (a file-scope static function declares its attribute
together with its definition in the .c file it is defined, as there
is no other place to do so).

The definition of git_die_config() in config.c did not follow the
convention and had its attribute declared with both its declaration
in the header and its definition in the .c source file.

Remove the one in the config.c to match everybody else.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

__attribute__: trace2_region_enter_printf() is like "printf"

The last part of the parameter list the function takes is like
parameters to printf. Mark it as such.

An existing call that formats a value of type size_t using "%d" was
found by the compiler with the help with this annotation; fix it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'tb/midx-write-cleanup'

Code clean-up around writing the .midx files.

* tb/midx-write-cleanup:
  pack-bitmap.c: reimplement `midx_bitmap_filename()` with helper
  midx: replace `get_midx_rev_filename()` with a generic helper
  midx-write.c: support reading an existing MIDX with `packs_to_include`
  midx-write.c: extract `fill_packs_from_midx()`
  midx-write.c: extract `should_include_pack()`
  midx-write.c: pass `start_pack` to `compute_sorted_entries()`
  midx-write.c: reduce argument count for `get_sorted_entries()`
  midx-write.c: tolerate `--preferred-pack` without bitmaps

config.mak.dev: enable `-Wwrite-strings` warning

Writing to string constants is undefined behaviour and must be avoided
in C. Even so, the compiler does not help us with this by default
because those constants are not in fact marked as `const`. This makes it
rather easy to accidentally assign a constant to a non-const variable or
field and then later on try to either free it or write to it.

Enable `-Wwrite-strings` to catch such mistakes. With this warning
enabled, the type of string constants is changed to `const char[]` and
will thus cause compiler warnings when being assigned to non-const
fields and variables.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/merge: always store allocated strings in `pull_twohead`

The `pull_twohead` configuration may sometimes contain an allocated
string, and sometimes it may contain a string constant. Refactor this to
instead always store an allocated string such that we can release its
resources without risk.

While at it, manage the lifetime of other config strings, as well. Note
that we explicitly don't free `cleanup_arg` here. This is because the
variable may be assigned a string constant via command line options.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/rebase: always store allocated string in `options.strategy`

The `struct rebase_options::strategy` field is a `char *`, but we do end
up assigning string constants to it in two cases:

  - When being passed a `--strategy=` option via the command line.

  - When being passed a strategy option via `--strategy-option=`, but
    not a strategy.

This will cause warnings once we enable `-Wwrite-strings`.

Ideally, we'd just convert the field to be a `const char *`. But we also
assign to this field via the GIT_TEST_MERGE_ALGORITHM envvar, which we
have to strdup(3P) into it.

Instead, refactor the code to make sure that we only ever assign
allocated strings to this field.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/rebase: do not assign default backend to non-constant field

The `struct rebase_options::default_backend` field is a non-constant
string, but is being assigned a constant via `REBASE_OPTIONS_INIT`.
Fix this by using `xstrdup()` to assign the variable and introduce a new
function `rebase_options_release()` that releases memory held by the
structure, including the newly-allocated variable.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

imap-send: fix leaking memory in `imap_server_conf`

We never free any of the config strings that we populate into the
`struct imap_server_conf`. Fix this by creating a common exit path where
we can free resources.

While at it, drop the unused member `imap_server_conf::name`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

imap-send: drop global `imap_server_conf` variable

In "imap-send.c", we have a global `sturct imap_server_conf` variable
that keeps track of the configuration of the IMAP server. This variable
is being populated mostly via the Git configuration.

Refactor the code to allocate the structure on the stack instead of
having it globally. This change allows us to track its lifetime more
closely.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

mailmap: always store allocated strings in mailmap blob

Same as with the preceding commit, the `git_mailmap_blob` may sometimes
contain an allocated string and sometimes it may contain a string
constant. This is risky and can easily lead to bugs in case the variable
is getting re-assigned, where the code may then try to free the previous
value to avoid memory leaks.

Safeguard the code by always storing allocated strings in the variable.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

revision: always store allocated strings in output encoding

The `git_log_output_encoding` variable can be set via the `--encoding=`
option. When doing so, we conditionally either assign it to the passed
value, or if the value is "none" we assign it the empty string.
Depending on which of the both code paths we pick though, the variable
may end up being assigned either an allocated string or a string
constant.

This is somewhat risky and may easily lead to bugs when a different code
path may want to reassign a new value to it, freeing the previous value.
We already to this when parsing the "i18n.logoutputencoding" config in
`git_default_i18n_config()`. But because the config is typically parsed
before we parse command line options this has been fine so far.

Regardless of that, safeguard the code such that the variable always
contains an allocated string. While at it, also free the old value in
case there was any to plug a potential memory leak.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

remote-curl: avoid assigning string constant to non-const variable

When processing remote options, we split the option line into two by
searching for a space. If there is one, we replace the space with '\0',
otherwise we implicitly assume that the value is "true" and thus assign
a string constant.

As the return value of strchr(3P) weirdly enough is a `char *` even
though it gets a `const char *` as input, the assigned-to variable also
is a non-constant. This is fine though because the argument is in fact
an allocated string, and thus we are allowed to modify it. But this will
break once we enable `-Wwrite-strings`.

Refactor the code stop splitting the fields with '\0' altogether.
Instead, we can pass the length of the option name to `set_option()` and
then use strncmp(3P) instead of strcmp(3P).

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

send-pack: always allocate receive status

In `receive_status()`, we record the reason why ref updates have been
rejected by the remote via the `remote_status`. But while we allocate
the assigned string when a reason was given, we assign a string constant
when no reason was given.

This has been working fine so far due to two reasons:

- We don't ever free the refs in git-send-pack(1)'

- Remotes always give a reason, at least as implemented by Git proper.

Adapt the code to always allocate the receive status string and free the
refs.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

parse-options: cast long name for OPTION_ALIAS

We assign the long name for OPTION_ALIAS options to a non-constant value
field. We know that the variable will never be written to, but this will
cause warnings once we enable `-Wwrite-strings`.

Cast away the constness to be prepared for this change.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

http: do not assign string constant to non-const field

In `write_accept_language()`, we put all acceptable languages into an
array. While all entries in that array are allocated strings, the final
entry in that array is a string constant. This is fine because we
explicitly skip over the last entry when freeing the array, but will
cause warnings once we enable `-Wwrite-strings`.

Adapt the code to also allocate the final entry.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

compat/win32: fix const-correctness with string constants

Adjust various places in our Win32 compatibility layer where we are not
assigning string constants to `const char *` variables.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

pretty: add casts for decoration option pointers

The `struct decoration_options` have a prefix and suffix field which are
both non-constant, but we assign a constant pointer to them. This is
safe to do because we pass them to `format_decorations()`, which never
modifies these pointers, and then immediately discard the structure. Add
explicit casts to avoid compilation warnings with `-Wwrite-strings`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

object-file: make `buf` parameter of `index_mem()` a constant

The `buf` parameter of `index_mem()` is a non-constant string. This will
break once we enable `-Wwrite-strings` because we also pass constants
from at least one callsite.

Adapt the parameter to be a constant. As we cannot free the buffer
without casting now, this also requires us to move the lifetime of the
nested buffer around.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

object-file: mark cached object buffers as const

The buffers of cached objects are never modified, but are still stored
as a non-constant pointer. This will cause a compiler warning once we
enable the `-Wwrite-strings` compiler warning as we assign an empty
constant string when initializing the static `empty_tree` cached object.

Convert the field to be constant. This requires us to shuffle around
the code a bit because we memcpy(3P) into the allocated buffer in
`pretend_object_file()`. This is easily fixed though by allocating the
buffer into a temporary variable first.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ident: add casts for fallback name and GECOS

In `xgetpwuid_self()`, we return a fallback identity when it was not
possible to look up the current identity. This fallback identity needs
to be internal and must never be written to by the calles as specified
by getpwuid(3P). As both the `pw_name` and `pw_gecos` fields are marked
as non-constant though, it will cause a warning to assign constant
strings to them once compiling with `-Wwrite-strings`.

Add explicit casts to avoid the warning.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

entry: refactor how we remove items for delayed checkouts

When finalizing a delayed checkout, we sort out several strings from the
passed-in string list by first assigning the empty string to those
filters and then calling `string_list_remove_empty_items()`. Assigning
the empty string will cause compiler warnings though as the string is
a `char *` once we enable `-Wwrite-strings`.

Refactor the code to use a `NULL` pointer with `filter_string_list()`
instead to avoid this warning.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

line-log: always allocate the output prefix

The returned string by `output_prefix()` is sometimes a string constant
and sometimes an allocated string. This has been fine until now because
we always leak the allocated strings, and thus we never tried to free
the string constant.

Fix the code to always return an allocated string and free the returned
value at all callsites.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

line-log: stop assigning string constant to file parent buffer

Stop assigning a string constant to the file parent buffer and instead
assign an allocated string. While the code is fine in practice, it will
break once we compile with `-Wwrite-strings`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

diff: cast string constant in `fill_textconv()`

The `fill_textconv()` function is responsible for converting an input
file with a textconv driver, which is then passed to the caller. Weirdly
though, the function also handles the case where there is no textconv
driver at all. In that case, it will return either the contents of the
populated filespec, or an empty string if the filespec is invalid.

These two cases have differing memory ownership semantics. When there is
a textconv driver, then the result is an allocated string. Otherwise,
the result is either a string constant or owned by the filespec struct.
All callers are in fact aware of this weirdness and only end up freeing
the output buffer when they had a textconv driver.

Ideally, we'd split up this interface to only perform the conversion via
the textconv driver, and BUG in case the caller didn't provide one. This
would make memory ownership semantics much more straight forward. For
now though, let's simply cast the empty string constant to `char *` to
avoid a warning with `-Wwrite-strings`. This is equivalent to the same
cast that we already have in `fill_mmfile()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/remote: cast away constness in `get_head_names()`

In `get_head_names()`, we assign the "refs/heads/*" string constant to
`struct refspec_item::{src,dst}`, which are both non-constant pointers.
Ideally, we'd refactor the code such that both of these fields were
constant. But `struct refspec_item` is used for two different usecases
with conflicting requirements:

  - To query for a source or destination based on the given refspec. The
    caller either sets `src` or `dst` as the branch that we want to
    search for, and the respective other field gets populated. The
    fields should be constant when being used as a query parameter,
    which is owned by the caller, and non-constant when being used as an
    out parameter, which is owned by the refspec item. This is is
    contradictory in itself already.

  - To store refspec items with their respective source and destination
    branches, in which case both fields should be owned by the struct.

Ideally, we'd split up this interface to clearly separate between
querying and storing, which would enable us to clarify lifetimes of the
strings. This would be a much bigger undertaking though.

Instead, accept the status quo for now and cast away the constness of
the source and destination patterns. We know that those are not being
written to or freed, so while this is ugly it certainly is fine for now.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refspec: remove global tag refspec structure

We have a global tag refspec structure that is used by both git-clone(1)
and git-fetch(1). Initialization of the structure will break once we
enable `-Wwrite-strings`, even though the breakage is harmless. While we
could just add casts, the structure isn't really required in the first
place as we can simply initialize the structures at the respective
callsites.

Refactor the code accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable: cast away constness when assigning constants to records

The reftable records are used in multiple ways throughout the reftable
library. In many of those cases they merely act as input to a function
without getting modified by it at all. Most importantly, this happens
when writing records and when querying for records.

We rely on this in our tests and thus assign string constants to those
fields, which is about to generate warnings as those fields are of type
`char *`. While we could go through the process and instead allocate
those strings in all of our tests, this feels quite unnecessary.

Instead, add casts to `char *` for all of those strings. As this is part
of our tests, this also nicely serves as a demonstration that nothing
writes or frees those string constants, which would otherwise lead to
segfaults.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/reftable: stop micro-optimizing refname allocations on copy

When copying refs, we execute `write_copy_table()` to write the new
table. As the names are given to us via `arg->newname` and
`arg->oldname`, respectively, we optimize away some allocations by
assigning those fields to the reftable records we are about to write
directly, without duplicating them. This requires us to cast the input
to `char *` pointers as they are in fact constant strings. Later on, we
then unset the refname for all of the records before calling
`reftable_log_record_release()` on them.

We also do this when assigning the "HEAD" constant, but here we do not
cast because its type is `char[]` by default. It's about to be turned
into `const char *` though once we enable `-Wwrite-strings` and will
thus cause another warning.

It's quite dubious whether this micro-optimization really helps. We're
about to write to disk anyway, which is going to be way slower than a
small handful of allocations. Let's drop the optimization altogther and
instead copy arguments to simplify the code and avoid the future warning
with `-Wwrite-strings`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

global: convert intentionally-leaking config strings to consts

There are multiple cases where we intentionally leak config strings:

  - `struct gpg_format` is used to track programs that can be used for
    signing commits, either via gpg(1), gpgsm(1) or ssh-keygen(1). The
    user can override the commands via several config variables. As the
    array is populated once, only, and the struct memers are never
    written to or free'd.

  - `struct ll_merge_driver` is used to track merge drivers. Same as
    with the GPG format, these drivers are populated once and then
    reused. Its data is never written to or free'd, either.

  - `struct userdiff_funcname` and `struct userdiff_driver` can be
    configured via `diff.<driver>.*` to add additional drivers. Again,
    these have a global lifetime and are never written to or free'd.

All of these are intentionally kept alive and are never written to.
Furthermore, all of these are being assigned both string constants in
some places, and allocated strings in other places. This will cause
warnings once we enable `-Wwrite-strings`, so let's mark the respective
fields as `const char *` and cast away the constness when assigning
those values.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

global: improve const correctness when assigning string constants

We're about to enable `-Wwrite-strings`, which changes the type of
string constants to `const char[]`. Fix various sites where we assign
such constants to non-const variables.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

The eleventh batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'mt/openindiana-scalar'

Avoid removing the $(cwd) for portability.

* mt/openindiana-scalar:
scalar: make enlistment delete to work on all POSIX platforms

Merge branch 'rs/difftool-env-simplify'

Code simplification.

* rs/difftool-env-simplify:
difftool: add env vars directly in run_file_diff()

Merge branch 'th/quiet-lazy-fetch-from-promisor'

The promisor.quiet configuration knob can be set to true to make
lazy fetching from promisor remotes silent.

* th/quiet-lazy-fetch-from-promisor:
promisor-remote: add promisor.quiet configuration option

Merge branch 'ps/leakfixes'

Leakfixes.

* ps/leakfixes:
  builtin/mv: fix leaks for submodule gitfile paths
  builtin/mv: refactor to use `struct strvec`
  builtin/mv duplicate string list memory
  builtin/mv: refactor `add_slash()` to always return allocated strings
  strvec: add functions to replace and remove strings
  submodule: fix leaking memory for submodule entries
  commit-reach: fix memory leak in `ahead_behind()`
  builtin/credential: clear credential before exit
  config: plug various memory leaks
  config: clarify memory ownership in `git_config_string()`
  builtin/log: stop using globals for format config
  builtin/log: stop using globals for log config
  convert: refactor code to clarify ownership of check_roundtrip_encoding
  diff: refactor code to clarify memory ownership of prefixes
  config: clarify memory ownership in `git_config_pathname()`
  http: refactor code to clarify memory ownership
  checkout: clarify memory ownership in `unique_tracking_name()`
  strbuf: fix leak when `appendwholeline()` fails with EOF
  transport-helper: fix leaking helper name

credential: clear expired c->credential, unify secret clearing

When a struct credential expires, credential_fill() clears c->password
so that clients don't try to use it later. However, a struct cred that
uses an alternate authtype won't have a password, but might have a
credential stored in c->credential.

This is a problem, for example, when an OAuth2 bearer token is used. In
the system I'm using, the OAuth2 configuration generates and caches a
bearer token that is valid for an hour. After the token expires, git
needs to call back into the credential helper to use a stored refresh
token to get a new bearer token. But if c->credential is still non-NULL,
git will instead try to use the expired token and fail with an error:

fatal: Authentication failed for 'https://<oauth2-enabled-server>/repository'

And on the server:

[auth_openidc:error] [client <ip>:34012] oidc_proto_validate_exp: "exp" validation failure (1717522989): JWT expired 224 seconds ago

Fix this by clearing both c->password and c->credential for an expired
struct credential. While we're at it, use credential_clear_secrets()
wherever both c->password and c->credential are being cleared.

Update comments in credential.h to mention the new struct fields.

Signed-off-by: Aaron Plattner <aplattner@nvidia.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

test-terminal: drop stdin handling

Since 18d8c26930 (test_terminal: redirect child process' stdin to a pty,
2015-08-04), we set up a pty and copy stdin to the child program. But
this ends up being racy; once we send all of the bytes and close the
descriptor, the child program will no longer see a terminal! isatty()
will return 0, and trying to read may return EIO, even if we didn't yet
get all of the bytes.

This was mentioned even in the commit message of 18d8c26930, but we
hacked around it by just sending an infinite input from /dev/zero (in
the intended case, we only cared about isatty(0), not reading actual
input).

And it came up again recently in:

https://lore.kernel.org/git/d42a55b1-1ba9-4cfb-9c3d-98ea4d86da33@gmail.com/

where we tried to actually send bytes, but they don't always all come
through. So this interface is somewhat of an accident waiting to happen;
a caller might not even care about stdin being a tty, but will get bit
by the flaky behavior.

One solution would probably be to avoid closing test_terminal's end of
the pty altogether. But then the other side would never see EOF on its
stdin. That may be OK for some cases, but it's another gotcha that
might cause races or deadlocks, depending on what the child expects to
read.

Let's instead just drop test_terminal's stdin feature completely. Since
the previous commit dropped the two cases from t4153 for which the
feature was originally added, there are no callers left that need it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

am: add explicit "--retry" option

After a patch fails, you can ask "git am" to try applying it again with
new options by running without any of the resume options. E.g.:

  git am <patch
  # oops, it failed; let's try again
  git am --3way

But since this second command has no explicit resume option (like
"--continue"), it looks just like an invocation to read a fresh patch
from stdin. To avoid confusing the two cases, there are some heuristics,
courtesy of 8d18550318 (builtin-am: reject patches when there's a
session in progress, 2015-08-04):

if (in_progress) {
/*
* Catch user error to feed us patches when there is a session
* in progress:
*
* 1. mbox path(s) are provided on the command-line.
* 2. stdin is not a tty: the user is trying to feed us a patch
*    from standard input. This is somewhat unreliable -- stdin
*    could be /dev/null for example and the caller did not
*    intend to feed us a patch but wanted to continue
*    unattended.
*/
if (argc || (resume_mode == RESUME_FALSE && !isatty(0)))
die(_("previous rebase directory %s still exists but mbox given."),
state.dir);

if (resume_mode == RESUME_FALSE)
resume_mode = RESUME_APPLY;
[...]

So if no resume command is given, then we require that stdin be a tty,
and otherwise complain about (potentially) receiving an mbox on stdin.
But of course you might not actually have a terminal available! And
sadly there is no explicit way to hit this same code path; this is the
only place that sets RESUME_APPLY. So you're stuck, and scripts like our
test suite have to bend over backwards to create a pseudo-tty.

Let's provide an explicit option to trigger this mode. The code turns
out to be quite simple; just setting "resume_mode" to RESUME_FALSE is
enough to dodge the tty check, and then our state is the same as it
would be with the heuristic case (which we'll continue to allow).

When we don't have a session in progress, there's already code to
complain when resume_mode is set (but we'll add a new test to cover
that).

To test the new option, we'll convert the existing tests that rely on
the fake stdin tty. That lets us test them on more platforms, and will
let us simplify test_terminal a bit in a future patch.

It does, however, mean we're not testing the tty heuristic at all.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: fix check for Ubuntu 20.04

In 5ca0c455f1 (ci: fix Python dependency on Ubuntu 24.04, 2024-05-06),
we made the use of Python 2 conditional on whether or not the CI job
runs Ubuntu 20.04. There was a brown-paper-bag-style bug though, where
the condition forgot to invoke the `test` builtin. The result of it is
that the check always fails, and thus all of our jobs run with Python 3
by accident.

Fix this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

builtin/refs: new command to migrate ref storage formats

Introduce a new command that allows the user to migrate a repository
between ref storage formats. This new command is implemented as part of
a new git-refs(1) executable. This is due to two reasons:

  - There is no good place to put the migration logic in existing
    commands. git-maintenance(1) felt unwieldy, and git-pack-refs(1) is
    not the correct place to put it, either.

  - I had it in my mind to create a new low-level command for accessing
    refs for quite a while already. git-refs(1) is that command and can
    over time grow more functionality relating to refs. This should help
    discoverability by consolidating low-level access to refs into a
    single executable.

As mentioned in the preceding commit that introduces the ref storage
format migration logic, the new `git refs migrate` command still has a
bunch of restrictions. These restrictions are documented accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs: implement logic to migrate between ref storage formats

With the introduction of the new "reftable" backend, users may want to
migrate repositories between the backends without having to recreate the
whole repository. Add the logic to do so.

The implementation is generic and works with arbitrary ref storage
formats so that a backend does not need to implement any migration
logic. It does have a few limitations though:

  - We do not migrate repositories with worktrees, because worktrees
    have separate ref storages. It makes the overall affair more complex
    if we have to migrate multiple storages at once.

  - We do not migrate reflogs, because we have no interfaces to write
    many reflog entries.

  - We do not lock the repository for concurrent access, and thus
    concurrent writes may end up with weird in-between states. There is
    no way to fully lock the "files" backend for writes due to its
    format, and thus we punt on this topic altogether and defer to the
    user to avoid those from happening.

In other words, this version is a minimum viable product for migrating a
repository's ref storage format. It works alright for bare repos, which
often have neither worktrees nor reflogs. But it will not work for many
other repositories without some preparations. These limitations are not
set into stone though, and ideally we will eventually address them over
time.

The logic is not yet used by anything, and thus there are no tests for
it. Those will be added in the next commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs: implement removal of ref storages

We're about to introduce logic to migrate ref storages. One part of the
migration will be to delete the files that are part of the old ref
storage format. We don't yet have a way to delete such data generically
across ref backends though.

Implement a new `delete` callback and expose it via a new
`ref_storage_delete()` function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

worktree: don't store main worktree twice

In `get_worktree_ref_store()` we either return the repository's main ref
store, or we look up the ref store via the map of worktree ref stores.
Which of these worktrees gets picked depends on the `is_current` bit of
the worktree, which indicates whether the worktree is the one that
corresponds to `the_repository`.

The bit is getting set in `get_worktrees()`, but only after we have
computed the list of all worktrees. This is too late though, because at
that time we have already called `get_worktree_ref_store()` on each of
the worktrees via `add_head_info()`. The consequence is that the current
worktree will not have been marked accordingly, which means that we did
not use the main ref store, but instead created a new ref store. We thus
have two separate ref stores now that map to the same ref database.

Fix this by setting `is_current` before we call `add_head_info()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable: inline `merged_table_release()`

The function `merged_table_release()` releases a merged table, whereas
`reftable_merged_table_free()` releases a merged table and then also
free's its pointer. But all callsites of `merged_table_release()` are in
fact followed by `reftable_merged_table_free()`, which is redundant.

Inline `merged_table_release()` into `reftable_merged_table_free()` to
get rid of this redundance.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/files: fix NULL pointer deref when releasing ref store

The `free_ref_cache()` function is not `NULL` safe and will thus
segfault when being passed such a pointer. This can easily happen when
trying to release a partially initialized "files" ref store. Fix this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/files: extract function to iterate through root refs

Extract a new function that can be used to iterate through all root refs
known to the "files" backend. This will be used in the next commit,
where we start to teach ref backends to remove themselves.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/files: refactor `add_pseudoref_and_head_entries()`

The `add_pseudoref_and_head_entries()` function accepts both the ref
store as well as a directory name as input. This is unnecessary though
as the ref store already uniquely identifies the root directory of the
ref store anyway.

Furthermore, the function is misnamed now that we have clarified the
meaning of pseudorefs as it doesn't add pseudorefs, but root refs.
Rename it accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs: allow to skip creation of reflog entries

The ref backends do not have any way to disable the creation of reflog
entries. This will be required for upcoming ref format migration logic
so that we do not create any entries that didn't exist in the original
ref database.

Provide a new `REF_SKIP_CREATE_REFLOG` flag that allows the caller to
disable reflog entry creation.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs: pass storage format to `ref_store_init()` explicitly

We're about to introduce logic to migrate refs from one storage format
to another one. This will require us to initialize a ref store with a
different format than the one used by the passed-in repository.

Prepare for this by accepting the desired ref storage format as
parameter.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs: convert ref storage format to an enum

The ref storage format is tracked as a simple unsigned integer, which
makes it harder than necessary to discover what that integer actually is
or where its values are defined.

Convert the ref storage format to instead be an enum.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

setup: unset ref storage when reinitializing repository version

When reinitializing a repository's version we may end up unsetting the
hash algorithm when it matches the default hash algorithm. If we didn't
do that then the previously configured value might remain intact.

While the same issue exists for the ref storage extension, we don't do
this here. This has been fine for most of the part because it is not
supported to re-initialize a repository with a different ref storage
format anyway. We're about to introduce a new command to migrate ref
storages though, so this is about to become an issue there.

Prepare for this and unset the ref storage format when reinitializing a
repository with the "files" format.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci/test-documentation: work around SyntaxWarning in Python 3.12

In Python 3.6, unrecognized escape sequences in regular expressions
started to produce a DeprecationWarning [1]. In Python 3.12, this was
upgraded to a SyntaxWarning and will eventually be raised even further
to a SyntaxError. We indirectly hit such unrecognized escape sequences
via Asciidoc, which results in a bunch of warnings:

    $ asciidoc -o /dev/null git-cat-file.txt
    <unknown>:1: SyntaxWarning: invalid escape sequence '\S'
    <unknown>:1: SyntaxWarning: invalid escape sequence '\S'

This in turn causes our "ci/test-documentation.sh" script to fail, as it
checks that stderr of `make doc` is empty.

These escape sequences seem to be part of Asciidoc itself. In the long
term, we should probably consider dropping support for Asciidoc in favor
of Asciidoctor. Upstream also considers itself to be legacy software and
recommends to move away from it [2]:

    It is suggested that unless you specifically require the AsciiDoc.py
    toolchain, you should find a processor that handles the modern
    AsciiDoc syntax.

For now though, let's expand its lifetime a little bit more by filtering
out these new warnings. We should probably reconsider once the warnings
are upgraded to errors by Python.

[1]: https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals
[2]: https://github.com/asciidoc-py/asciidoc-py/blob/6d9f76cff0dc3b7ca21bdd570200f8518464d99b/README.md#asciidocpy

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

gitlab-ci: add job to run `make check-docs`

Add another job to execute `make check-docs`, which lints our
documentation and makes sure that expected manpages exist. This job
mirrors the same job that we already have for GitHub Actions.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Documentation/lint-manpages: bubble up errors

The "lint-manpages.sh" script does not return an error in case any of
its checks fail. While this is faithful to the implementation that we
had as part of the "check-docs" target before the preceding commit, it
makes it hard to spot any violations of the rules via the corresponding
CI job, which will of course exit successfully, too.

Adapt the script to bubble up errors.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Makefile: extract script to lint missing/extraneous manpages

The "check-docs" target of our top-level Makefile fulfills two different
roles. For one it runs the "lint-docs" target of the "Documentation/"
Makefile. And second it performs some checks of whether there are any
manpages that are missing or extraneous via some inline scripts.

The second set of checks feels quite misplaced in the top-level Makefile
as it would fit in much better with our "lint-docs" target. Back when
the checks were introduced in 8c989ec528 (Makefile: $(MAKE) check-docs,
2006-04-13), that target did not yet exist though.

Furthermore, the script makes use of several Makefile variables which
are defined in the top-level Makefile, which makes it hard to access
their contents from elsewhere. There is a trick though that we already
use in "check-builtins.sh" to gain access: we can create an ad-hoc
Makefile that has an extra target to print those variables.

Pull out the script into a separate "lint-manpages.sh" script by using
that trick. Wire up that script via the "lint-docs" target. For one,
normal shell scripts are way easier to reason about than those which are
embedded in a Makefile. Second, it allows one to easily execute the
script standalone without any of the other checks.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sparse-checkout: free duplicate hashmap entries

In insert_recursive_pattern(), we create a new pattern_entry to insert
into the parent_hashmap. If we find that the same entry already exists
in the hashmap, we skip adding the new one. But we forget to free the new
one, creating a leak.

We can fix it by cleaning up the discarded entry. It would probably be
possible to avoid creating it in the first place, but it's non-trivial.
We'd have to define a "keydata" struct that lets us compare the existing
entries to the broken-out fields. It's probably not worth the
complexity, so we'll punt on that for now.

There is one subtlety here: our insertion is happening in a loop, with
each iteration looking at the pattern we just inserted (hence the
"recursive" in the name). So if we skip insertion, what do we look at?

The obvious answer is that we should remember the existing duplicate we
found and use that. But I _think_ in that case, we probably already have
all of the recursive bits already (from when the original entry was
added). And so just breaking out of the loop would be correct. But I'm
not 100% sure on that; after all, the original leaky code could have
done the same break, but it didn't.

So I went with the "obvious answer" above, which has no chance of
changing the behavior aside from fixing the leak.

With this patch, t1091 can now be marked leak-free.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sparse-checkout: free string list after displaying

In sparse_checkout_list(), we put the hashmap entries into a string_list
so we can sort them. But after printing, we forget to free the list.

This patch drops 5 leaks from t1091.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sparse-checkout: free pattern list in sparse_checkout_list()

In sparse_checkout_list(), we create a pattern_list that needs to
eventually be cleared. We remember to do so in the regular code path,
but the cone-mode path does an early return, and forgets to clean up.

We could fix the leak by adding a new call to clear_pattern_list(). But
we can simplify even further by just skipping the early return, pushing
the other code path (which consists now of only one line!) into an else
block. That also matches the same cone/non-cone if/else used in some
other functions.

This fixes 15 leaks found in t1091.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sparse-checkout: free sparse_filename after use

We allocate a heap buffer via get_sparse_checkout_filename(). Most calls
remember to free it, but sparse_checkout_init() forgets to, causing a
leak. Ironically, it remembers to do so in the error return paths, but
not in the path that makes it all the way to the function end!

Fixing this clears up 6 leaks from t1091.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sparse-checkout: refactor temporary sparse_checkout_patterns

In update_working_directory(), we take in a pattern_list, attach it to
the repository index by assigning it to index->sparse_checkout_patterns,
and then call unpack_trees. Afterwards, we remove it by setting
index->sparse_checkout_patterns back to NULL.

But there are two possible leaks here:

  1. If the index already had a populated sparse_checkout_patterns,
     we've obliterated it. We can fix this by saving and restoring it,
     rather than always setting it back to NULL.

  2. We may call the function with a NULL pattern_list, expecting it to
     use the on-disk sparse file. In that case, the index routines will
     lazy-load the sparse patterns automatically. But now at the end of
     the function when we restore the patterns, we'll leak those
     lazy-loaded ones!

     We can fix this by freeing the pattern list before overwriting its
     pointer whenever it does not match what was passed in (in practice
     this should only happen when the passed-in list is NULL, but this
     is erring on the defensive side).

Together these remove 48 indirect leaks found in t1091.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sparse-checkout: always free "line" strbuf after reading input

In add_patterns_from_input(), we may read lines from a file with a loop
like this:

  while (!strbuf_getline(&line, file)) {
...
strbuf_to_cone_pattern(&line, pl);
  }
  /* we don't strbuf_release(&line) here! */

This generally is OK because strbuf_to_cone_pattern() consumes the
buffer via strbuf_detach(). But we can leak in a few cases:

  1. We don't always consume the buffer! If the line ends up empty after
     trimming, we leave strbuf_to_cone_pattern() without detaching. In
     most cases this is OK, because a subsequent getline() call will use
     the same buffer. But if you had an empty line at the end of file,
     for example, it would leak.

  2. Even if strbuf_to_cone_pattern() always consumed the buffer,
     there's a subtle issue with strbuf_getline(). As we saw in
     94e2aa555e (strbuf: fix leak when `appendwholeline()` fails with
     EOF, 2024-05-27), it's possible for it to return EOF with an
     allocated buffer (e.g., if the underlying getdelim() call saw an
     error). So we should always strbuf_release() after finishing a read
     loop like this.

Note that even the code to read patterns from argv has the same problem.
Because that also uses strbuf_to_cone_pattern(), we stuff each argv
entry into a strbuf. It uses the same "line" strbuf as the getline code,
but we should position the strbuf_release() to cover both code paths.

This fixes at least 9 leaks found in t1091.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sparse-checkout: reuse --stdin buffer when reading patterns

When we read patterns from --stdin, we loop on strbuf_getline(), and
detach each line we read to pass into add_pattern(). This used to be
necessary because add_pattern() required that the pattern strings remain
valid while the pattern_list was in use. But it also created a leak,
since we didn't record the detached buffers anywhere else.

Now that add_pattern() has been modified to make its own copy of the
strings, we can stop detaching and fix the leak. This fixes 4 leaks
detected in t1091.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

dir.c: always copy input to add_pattern()

The add_pattern() function has a subtle and undocumented gotcha: the
pattern string you pass in must remain valid as long as the pattern_list
is in use (and nor do we take ownership of it). This is easy to get
wrong, causing either subtle bugs (because you free or reuse the string
buffer) or leaks (because you copy the string, but don't track ownership
separately).

All of this "pattern" code was originally the "exclude" mechanism. So
this _usually_ works OK because you add entries in one of two ways:

  1. From the command-line (e.g., "--exclude"), in which case we're
     pointing to an argv entry which remains valid for the lifetime of
     the program.

  2. From a file (e.g., ".gitignore"), in which case we read the whole
     file into a buffer, attach it to the pattern_list's "filebuf"
     entry, then parse the buffer in-place (adding NULs). The strings
     point into the filebuf, which is cleaned up when the whole
     pattern_list goes away.

But other code, like sparse-checkout, reads individual lines from stdin
and passes them one by one to add_pattern(), leaking each. We could fix
this by refactoring it to take in the whole buffer at once, like (2)
above, and stuff it in "filebuf". But given how subtle the interface is,
let's just fix it to always copy the string.

That seems at first like we'd be wasting extra memory, but we can
mitigate that:

  a. The path_pattern struct already uses a FLEXPTR, since we sometimes
     make a copy (when we see "foo/", we strip off the trailing slash,
     requiring a modifiable copy of the string).

     Since we'll now always embed the string inside the struct, we can
     switch to the regular FLEX_ARRAY pattern, saving us 8 bytes of
     pointer. So patterns with a trailing slash and ones under 8 bytes
     actually get smaller.

  b. Now that we don't need the original string to hang around, we can
     get rid of the "filebuf" mechanism entirely, and just free the file
     contents after parsing. Since files are the sources we'd expect to
     have the largest pattern sets, we should mostly break even on
     stuffing the same data into the individual structs.

This patch just adjusts the add_pattern() interface; it doesn't fix any
leaky callers yet.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

dir.c: reduce max pattern file size to 100MB

In a2bc523e1e (dir.c: skip .gitignore, etc larger than INT_MAX,
2024-05-31) we put capped the size of some files whose parsing code and
data structures used ints. Setting the limit to INT_MAX was a natural
spot, since we know the parsing code would misbehave above that.

But it also leaves the possibility of overflow errors when we multiply
that limit to allocate memory. For instance, a file consisting only of
"a\na\n..." could have INT_MAX/2 entries. Allocating an array of
pointers for each would need INT_MAX*4 bytes on a 64-bit system, enough
to overflow a 32-bit int.

So let's give ourselves a bit more safety margin by giving a much
smaller limit. The size 100MB is somewhat arbitrary, but is based on the
similar value for attribute files added by 3c50032ff5 (attr: ignore
overly large gitattributes files, 2022-12-01).

There's no particular reason these have to be the same, but the idea is
that they are in the ballpark of "so huge that nobody would care, but
small enough to avoid malicious overflow". So lacking a better guess, it
makes sense to use the same value. The implementation here doesn't share
the same constant, but we could change that later (or even give it a
runtime config knob, though nobody has complained yet about the
attribute limit).

And likewise, let's add a few tests that exercise the limits, based on
the attr ones. In this case, though, we never read .gitignore from the
index; the blob code is exercised only for sparse filters. So we'll
trigger it that way.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

imap-send: minimum leakfix

EVen with the minimum "no-op" invocation t1517 makes, "git imap-send"
leaks an empty strbuf it used to read a 0-byte string into.

There are a few other topics cooking in 'next' that plugs many
other leaks in this program, so let's minimally fix this one, barely
enough to make CI pass, leaving the rest for the other topic.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

dir.c: free removed sparse-pattern hashmap entries

In add_pattern_to_hashsets(), we remove entries from the
recursive_hashmap when adding similar ones to the parent_hashmap. I
won't pretend to understand all of what's going on here, but there's an
obvious leak: whatever we removed from recursive_hashmap is not
referenced anywhere else, and is never free()d.

We can easily fix this by asking the hashmap to return a pointer to the
old entry. This makes t7002 now completely leak-free.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sparse-checkout: clear patterns when init() sees existing sparse file

In sparse_checkout_init(), we first try to load patterns from an
existing file. If we found any, we return immediately, but end up
leaking the patterns we parsed. Fixing this reduces the number of leaks
in t7002 from 9 down to 5.

Note that there are two other exits from the function, but they don't
need the same treatment:

  - if we can't resolve HEAD, we write out a hard-coded sparse file and
    return. But we know the pattern list is empty there, since we didn't
    find any in the on-disk file and we haven't yet added any of our
    own.

  - otherwise, we do populate the list and then tail-call into
    write_patterns_and_update(). But that function frees the
    pattern_list itself, so we don't need to.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

dir.c: free strings in sparse cone pattern hashmaps

The pattern_list structs used for cone-mode sparse lookups use a few
extra hashmaps. These store pattern_entry structs, each of which has its
own heap-allocated pattern string. When we clean up the hashmaps, we
free the individual pattern_entry structs, but forget to clean up the
embedded strings, causing memory leaks.

We can fix this by iterating over the hashmaps to free the extra
strings. This reduces the numbers of leaks in t7002 from 22 to 9.

One alternative here would be to make the string a FLEX_ARRAY member of
the pattern_entry. Then there's no extra free() required, and as a bonus
it would be a little more efficient. However, some of the refactoring
gets awkward, as we are often assigning strings allocated by helper
functions. So let's just fix the leak for now, and we can explore bigger
refactoring separately.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sparse-checkout: pass string literals directly to add_pattern()

The add_pattern() function takes a pattern string, but neither makes a
copy of it nor takes ownership of the memory. So it is the caller's
responsibility to make sure the string hangs around as long as the
pattern_list which references it.

There are a few cases in sparse-checkout where we use string literal
patterns by stuffing them into a strbuf, detaching the buffer, and then
passing the result into add_pattern(). This creates a leak when the
pattern_list is eventually cleared, since we don't retain a copy of the
detached buffer to free.

But we can observe that the whole strbuf dance is unnecessary. The point
was presumably[1] to satisfy the lifetime requirement of the string. But
string literals have static duration; we can count on them lasting for
the whole program.

So we can fix the leak by just passing them directly. And as a bonus,
that simplifies the code. The leaks can be seen in t7002, which drops
from 25 leaks to 22 with this patch. It also makes t3602 and t1090
leak-free.

In the long run, we will also want to clean up this (undocumented!)
memory lifetime requirement of add_pattern(). But that can come in a
later patch; passing the string literals directly will be the right
thing either way.

[1] The code in question comes from 416adc8711 (sparse-checkout: update
    working directory in-process for 'init', 2019-11-21) and 99dfa6f970
    (sparse-checkout: use in-process update for disable subcommand,
    2019-11-21), but I didn't see anything in their commit messages or
    on the list explaining the strbufs.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

sparse-checkout: free string list in write_cone_to_file()

We use a string list to hold sorted and de-duped patterns, but don't
free it before leaving the function, causing a leak.

This drops the number of leaks found in t7002 from 27 to 25.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

The tenth batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'th/push-local-ff-check-without-lazy-fetch'

When "git push" notices that the commit at the tip of the ref on
the other side it is about to overwrite does not exist locally, it
used to first try fetching it if the local repository is a partial
clone. The command has been taught not to do so and immediately
fail instead.

* th/push-local-ff-check-without-lazy-fetch:
push: don't fetch commit object when checking existence

Merge branch 'ps/fix-reinit-includeif-onbranch'

"git init" in an already created directory, when the user
configuration has includeif.onbranch, started to fail recently,
which has been corrected.

* ps/fix-reinit-includeif-onbranch:
setup: fix bug with "includeIf.onbranch" when initializing dir

Sync with 'maint'

run-command: show prepared command

This adds a trace point in start_command so we can see the full
command invocation without having to resort to strace/code inspection.
For example:

$ GIT_TRACE=1 git test foo
git.c:755               trace: exec: git-test foo
run-command.c:657       trace: run_command: git-test foo
run-command.c:657       trace: run_command: 'echo $*' foo
run-command.c:749       trace: start_command: /bin/sh -c 'echo $* "$@"' 'echo $*' foo

Prior changes have made the documentation around the internals of the
alias command execution clearer, but I have still found this detailed
view of the aliased command being run helpful for debugging purposes.

A test case is added to ensure the full command output is present in
the execution flow.

Signed-off-by: Ian Wienand <iwienand@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Documentation: alias: add notes on shell expansion

When writing inline shell for shell-expansion aliases (i.e. prefixed
with "!"), there are some caveats around argument parsing to be aware
of. This series of notes attempts to explain what is happening more
clearly.

Signed-off-by: Ian Wienand <iwienand@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

dir.c: skip .gitignore, etc larger than INT_MAX

We use add_patterns() to read .gitignore, .git/info/exclude, etc, as
well as other pattern-like files like sparse-checkout. The parser for
these uses an "int" as an index, meaning that files over 2GB will
generally cause signed integer overflow and out-of-bounds access.

This is unlikely to happen in any real files, but we do read .gitignore
files from the tree. A malicious tree could cause an out-of-bounds read
and segfault (we also write NULs over newlines, so in theory it could be
an out-of-bounds write, too, but as we go char-by-char, the first thing
that happens is trying to read a negative 2GB offset).

We could fix the most obvious issue by replacing one "int" with a
"size_t". But there are tons of "int" sprinkled throughout this code for
things like pattern lengths, number of patterns, and so on. Since nobody
would actually want a 2GB .gitignore file, an easy defensive measure is
to just refuse to parse them.

The "int" in question is in add_patterns_from_buffer(), so we could
catch it there. But by putting the checks in its two callers, we can
produce more useful error messages.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Post 2.45.2 updates

Merge down a handful of topics to adjust tests and CI to make them
work better, without changing Git itself, and a bit of developer
docs update:

* Tests that try to corrupt in-repository files in chunked format did
   not work well on macOS due to its broken "mv", which has been
   worked around.

* Unbreak CI jobs so that we do not attempt to use Python 2 that has
   been removed from the platform.

* Git 2.43 started using the tree of HEAD as the source of attributes
   in a bare repository, which has severe performance implications.
   For now, revert the change, without ripping out a more explicit
   support for the attr.tree configuration variable.

* Windows CI running in GitHub Actions started complaining about the
   order of arguments given to calloc(); the imported regex code uses
   the wrong order almost consistently, which has been corrected.

Signed-off-by: Junio C Hamano <gitster@pobox.com>