git.ipfire.org Git - thirdparty/git.git/log

reftable/block: avoid copying block iterators on seek

When seeking a reftable record in a block we need to position the
iterator _before_ the sought-after record so that the next call to
`block_iter_next()` would yield that record. To achieve this, the loop
that performs the linear seeks to restore the previous position once it
has found the record.

This is done by advancing two `block_iter`s: one to check whether the
next record is our sought-after record, and one that we update after
every iteration. This of course involves quite a lot of copying and also
leads to needless memory allocations.

Refactor the code to get rid of the `next` iterator and the copying this
involves. Instead, we can restore the previous offset such that the call
to `next` will return the correct record.

Next to being simpler conceptually this also leads to a nice speedup.
The following benchmark parser 10k refs out of 100k existing refs via
`git-rev-list --no-walk`:

  Benchmark 1: rev-list: print many refs (HEAD~)
    Time (mean ± σ):     170.2 ms ±   1.7 ms    [User: 86.1 ms, System: 83.6 ms]
    Range (min … max):   166.4 ms … 180.3 ms    500 runs

  Benchmark 2: rev-list: print many refs (HEAD~)
    Time (mean ± σ):     161.6 ms ±   1.6 ms    [User: 78.1 ms, System: 83.0 ms]
    Range (min … max):   158.4 ms … 172.3 ms    500 runs

  Summary
    rev-list: print many refs (HEAD) ran
      1.05 ± 0.01 times faster than rev-list: print many refs (HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/block: reuse `zstream` state on inflation

When calling `inflateInit()` and `inflate()`, the zlib library will
allocate several data structures for the underlying `zstream` to keep
track of various information. Thus, when inflating repeatedly, it is
possible to optimize memory allocation patterns by reusing the `zstream`
and then calling `inflateReset()` on it to prepare it for the next chunk
of data to inflate.

This is exactly what the reftable code is doing: when iterating through
reflogs we need to potentially inflate many log blocks, but we discard
the `zstream` every single time. Instead, as we reuse the `block_reader`
for each of the blocks anyway, we can initialize the `zstream` once and
then reuse it for subsequent inflations.

Refactor the code to do so, which leads to a significant reduction in
the number of allocations. The following measurements were done when
iterating through 1 million reflog entries. Before:

  HEAP SUMMARY:
      in use at exit: 13,473 bytes in 122 blocks
    total heap usage: 23,028 allocs, 22,906 frees, 162,813,552 bytes allocated

After:

  HEAP SUMMARY:
      in use at exit: 13,473 bytes in 122 blocks
    total heap usage: 302 allocs, 180 frees, 88,352 bytes allocated

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/block: open-code call to `uncompress2()`

The reftable format stores log blocks in a compressed format. Thus,
whenever we want to read such a block we first need to decompress it.
This is done by calling the convenience function `uncompress2()` of the
zlib library, which is a simple wrapper that manages the lifecycle of
the `zstream` structure for us.

While nice for one-off inflation of data, when iterating through reflogs
we will likely end up inflating many such log blocks. This requires us
to reallocate the state of the `zstream` every single time, which adds
up over time. It would thus be great to reuse the `zstream` instead of
discarding it after every inflation.

Open-code the call to `uncompress2()` such that we can start reusing the
`zstream` in the subsequent commit. Note that our open-coded variant is
different from `uncompress2()` in two ways:

  - We do not loop around `inflate()` until we have processed all input.
    As our input is limited by the maximum block size, which is 16MB, we
    should not hit limits of `inflate()`.

  - We use `Z_FINISH` instead of `Z_NO_FLUSH`. Quoting the `inflate()`
    documentation: "inflate() should normally be called until it returns
    Z_STREAM_END or an error. However if all decompression is to be
    performed in a single step (a single call of inflate), the parameter
    flush should be set to Z_FINISH."

    Furthermore, "Z_FINISH also informs inflate to not maintain a
    sliding window if the stream completes, which reduces inflate's
    memory footprint."

Other than that this commit is expected to be functionally equivalent
and does not yet reuse the `zstream`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/block: reuse uncompressed blocks

The reftable backend stores reflog entries in a compressed format and
thus needs to uncompress blocks before one can read records from it.
For each reflog block we thus have to allocate an array that we can
decompress the block contents into. This block is being discarded
whenever the table iterator moves to the next block. Consequently, we
reallocate a new array on every block, which is quite wasteful.

Refactor the code to reuse the uncompressed block data when moving the
block reader to a new block. This significantly reduces the number of
allocations when iterating through many compressed blocks. The following
measurements are done with `git reflog list` when listing 100k reflogs.
Before:

  HEAP SUMMARY:
      in use at exit: 13,473 bytes in 122 blocks
    total heap usage: 45,755 allocs, 45,633 frees, 254,779,456 bytes allocated

After:

  HEAP SUMMARY:
      in use at exit: 13,473 bytes in 122 blocks
    total heap usage: 23,028 allocs, 22,906 frees, 162,813,547 bytes allocated

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/reader: iterate to next block in place

The table iterator has to iterate towards the next block once it has
yielded all records of the current block. This is done by creating a new
table iterator, initializing it to the next block, releasing the old
iterator and then copying over the data.

Refactor the code to instead advance the table iterator in place. This
is simpler and unlocks some optimizations in subsequent patches. Also,
it allows us to avoid some allocations.

The following measurements show a single matching ref out of 1 million
refs. Before this change:

  HEAP SUMMARY:
      in use at exit: 13,603 bytes in 125 blocks
    total heap usage: 7,235 allocs, 7,110 frees, 301,481 bytes allocated

After:

  HEAP SUMMARY:
      in use at exit: 13,603 bytes in 125 blocks
    total heap usage: 315 allocs, 190 frees, 107,027 bytes allocated

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/block: move ownership of block reader into `struct table_iter`

The table iterator allows the caller to iterate through all records in a
reftable table. To do so it iterates through all blocks of the desired
type one by one, where for each block it creates a new block iterator
and yields all its entries.

One of the things that is somewhat confusing in this context is who owns
the block reader that is being used to read the blocks and pass them to
the block iterator. Intuitively, as the table iterator is responsible
for iterating through the blocks, one would assume that this iterator is
also responsible for managing the lifecycle of the reader. And while it
somewhat is, the block reader is ultimately stored inside of the block
iterator.

Refactor the code such that the block reader is instead fully managed by
the table iterator. Instead of passing the reader to the block iterator,
we now only end up passing the block data to it. Despite clearing up the
lifecycle of the reader, it will also allow for better reuse of the
reader in subsequent patches.

The following benchmark prints a single matching ref out of 1 million
refs. Before:

  HEAP SUMMARY:
      in use at exit: 13,603 bytes in 125 blocks
    total heap usage: 6,607 allocs, 6,482 frees, 509,635 bytes allocated

After:

  HEAP SUMMARY:
      in use at exit: 13,603 bytes in 125 blocks
    total heap usage: 7,235 allocs, 7,110 frees, 301,481 bytes allocated

Note that while there are more allocation and free calls now, the
overall number of bytes allocated is significantly lower. The number of
allocations will be reduced significantly by the next patch though.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/block: introduce `block_reader_release()`

Introduce a new function `block_reader_release()` that releases
resources acquired by the block reader. This function will be extended
in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/block: better grouping of functions

Function definitions and declaration of `struct block_reader` and
`struct block_iter` are somewhat mixed up, making it hard to see which
functions belong together. Rearrange them.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/block: merge `block_iter_seek()` and `block_reader_seek()`

The function `block_iter_seek()` is merely a simple wrapper around
`block_reader_seek()`. Merge those two functions into a new function
`block_iter_seek_key()` that more clearly says what it is actually
doing.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/block: rename `block_reader_start()`

The function `block_reader_start()` does not really apply to the block
reader, but to the block iterator. It's name is thus somewhat confusing.
Rename it to `block_iter_seek_start()` to clarify.

We will rename `block_reader_seek()` in similar spirit in the next
commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

launch_editor: waiting message on error

When advice.waitingForEditor configuration is not set to false, we show
a hint telling that we are waiting for user's editor to close the file
when we launch an editor and wait for it to return control back to us.
We give the message on an incomplete line, expecting that we can go back
to the beginning of the line and clear the message when the editor returns.

However, it is possible that the editor exits with an error status, in
which case we show an error message and then return to our caller. In
such a case, the error message is given where the terminal cursor
happens to be, which is most likely after the "we are waiting for your
editor" message on the same line.

Clear the line before showing the error.

While we're here, make the error message follow our CodingGuideLines.

Signed-off-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fast-import: make comments more precise

The former is somewhat imprecise. The latter became out of sync with the
behavior in e814c39c2f (fast-import: refactor parsing of spaces,
2014-06-18).

Signed-off-by: Thalia Archibald <thalia@archibald.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fast-import: forbid escaped NUL in paths

NUL cannot appear in paths. Even disregarding filesystem path
limitations, the tree object format delimits with NUL, so such a path
cannot be encoded by Git.

When a quoted path is unquoted, it could possibly contain NUL from
"\000". Forbid it so it isn't truncated.

fast-import still has other issues with NUL, but those will be addressed
later.

Signed-off-by: Thalia Archibald <thalia@archibald.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fast-import: document C-style escapes for paths

Simply saying “C-style” string quoting is imprecise, as only a subset of
C escapes are supported. Document the exact escapes.

Signed-off-by: Thalia Archibald <thalia@archibald.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fast-import: improve documentation for path quoting

It describes what characters cannot be in an unquoted path, but not
their semantics. Reframe it as a definition of unquoted paths. From the
perspective of the parser, whether it starts with `"` is what defines
whether it will parse it as quoted or unquoted.

The restrictions on characters in unquoted paths (with starting-", LF,
and spaces) are explained in the quoted paragraph. Move it to the
unquoted paragraph and reword.

The restriction that the source paths of filecopy and filerename cannot
contain SP is only stated in their respective sections. Restate it in
the <path> section.

Signed-off-by: Thalia Archibald <thalia@archibald.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fast-import: remove dead strbuf

The strbuf in `note_change_n` is to copy the remainder of `p` before
potentially invalidating it when reading the next line. However, `p` is
not used after that point. It has been unused since the function was
created in a8dd2e7d2b (fast-import: Add support for importing commit
notes, 2009-10-09) and looks to be a fossil from adapting
`file_change_m`. Remove it.

Signed-off-by: Thalia Archibald <thalia@archibald.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fast-import: allow unquoted empty path for root

Ever since filerename was added in f39a946a1f (Support wholesale
directory renames in fast-import, 2007-07-09) and filecopy in b6f3481bb4
(Teach fast-import to recursively copy files/directories, 2007-07-15),
both have produced an error when the destination path is empty. Later,
when support for targeting the root directory with an empty string was
added in 2794ad5244 (fast-import: Allow filemodify to set the root,
2010-10-10), this had the effect of allowing the quoted empty string
(`""`), but forbidding its unquoted variant (``). This seems to have
been intended as simple data validation for parsing two paths, rather
than a syntax restriction, because it was not extended to the other
operations.

All other occurrences of paths (in filemodify, filedelete, the source of
filecopy and filerename, and ls) allow both.

For most of this feature's lifetime, the documentation has not
prescribed the use of quoted empty strings. In e5959106d6
(Documentation/fast-import: put explanation of M 040000 <dataref> "" in
context, 2011-01-15), its documentation was changed from “`<path>` may
also be an empty string (`""`) to specify the root of the tree” to “The
root of the tree can be represented by an empty string as `<path>`”.

Thus, we should assume that some front-ends have depended on this
behavior.

Remove this restriction for the destination paths of filecopy and
filerename and change tests targeting the root to test `""` and ``.

Signed-off-by: Thalia Archibald <thalia@archibald.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fast-import: directly use strbufs for paths

Previously, one case would not write the path to the strbuf: when the
path is unquoted and at the end of the string. It was essentially
copy-on-write. However, with the logic simplification of the previous
commit, this case was eliminated and the strbuf is always populated.

Directly use the strbufs now instead of an alias.

Since this already changes all the lines that use the strbufs, rename
them from `uq` to be more descriptive. That they are unquoted is not
their most important property, so name them after what they carry.

Additionally, `file_change_m` no longer needs to copy the path before
reading inline data.

Signed-off-by: Thalia Archibald <thalia@archibald.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

fast-import: tighten path unquoting

Path parsing in fast-import is inconsistent and many unquoting errors
are suppressed or not checked.

<path> appears in the grammar in these places:

    filemodify ::= 'M' SP <mode> (<dataref> | 'inline') SP <path> LF
    filedelete ::= 'D' SP <path> LF
    filecopy   ::= 'C' SP <path> SP <path> LF
    filerename ::= 'R' SP <path> SP <path> LF
    ls         ::= 'ls' SP <dataref> SP <path> LF
    ls-commit  ::= 'ls' SP <path> LF

and fast-import.c parses them in five different ways:

1. For filemodify and filedelete:
   Try to unquote <path>. If it unquotes without errors, use the
   unquoted version; otherwise, treat it as literal bytes to the end of
   the line (including any number of SP).
2. For filecopy (source) and filerename (source):
   Try to unquote <path>. If it unquotes without errors, use the
   unquoted version; otherwise, treat it as literal bytes up to, but not
   including, the next SP.
3. For filecopy (dest) and filerename (dest):
   Like 1., but an unquoted empty string is forbidden.
4. For ls:
   If <path> starts with `"`, unquote it and report parse errors;
   otherwise, treat it as literal bytes to the end of the line
   (including any number of SP).
5. For ls-commit:
   Unquote <path> and report parse errors.
   (It must start with `"` to disambiguate from ls.)

In the first three, any errors from trying to unquote a string are
suppressed, so a quoted string that contains invalid escapes would be
interpreted as literal bytes. For example, `"\xff"` would fail to
unquote (because hex escapes are not supported), and it would instead be
interpreted as the byte sequence '"', '\\', 'x', 'f', 'f', '"', which is
certainly not intended. Some front-ends erroneously use their language's
standard quoting routine instead of matching Git's, which could silently
introduce escapes that would be incorrectly parsed due to this and lead
to data corruption.

The documentation states “To use a source path that contains SP the path
must be quoted.”, so it is expected that some implementations depend on
spaces being allowed in paths in the final position. Thus we have two
documented ways to parse paths, so simplify the implementation to that.

Now we have:

1. `parse_path_eol` for filemodify, filedelete, filecopy (dest),
   filerename (dest), ls, and ls-commit:

   If <path> starts with `"`, unquote it and report parse errors;
   otherwise, treat it as literal bytes to the end of the line
   (including any number of SP).

2. `parse_path_space` for filecopy (source) and filerename (source):

   If <path> starts with `"`, unquote it and report parse errors;
   otherwise, treat it as literal bytes up to, but not including, the
   next SP. It must be followed by SP.

There remain two special cases: The dest <path> in filecopy and rename
cannot be an unquoted empty string (this will be addressed subsequently)
and <path> in ls-commit must be quoted to disambiguate it from ls.

Signed-off-by: Thalia Archibald <thalia@archibald.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

The eighteenth batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'tb/midx-write'

Code clean-up by splitting code responsible for writing midx files
into its own file.

* tb/midx-write:
  midx-write.c: use `--stdin-packs` when repacking
  midx-write.c: check count of packs to repack after grouping
  midx-write.c: factor out common want_included_pack() routine
  midx-write: move writing-related functions from midx.c

Merge branch 'rs/t-prio-queue-cleanup'

t-prio-queue test has been cleaned up by using C99 compound
literals; this is meant to also serve as a weather-balloon to smoke
out folks with compilers who have trouble compiling code that uses
the feature.

* rs/t-prio-queue-cleanup:
t-prio-queue: simplify using compound literals

Merge branch 'ps/reftable-binsearch-updates'

Reftable code clean-up and some bugfixes.

* ps/reftable-binsearch-updates:
  reftable/block: avoid decoding keys when searching restart points
  reftable/record: extract function to decode key lengths
  reftable/block: fix error handling when searching restart points
  reftable/block: refactor binary search over restart points
  reftable/refname: refactor binary search over refnames
  reftable/basics: improve `binsearch()` test
  reftable/basics: fix return type of `binsearch()` to be `size_t`

Merge branch 'jc/checkout-detach-wo-tracking-report'

"git checkout/switch --detach foo", after switching to the detached
HEAD state, gave the tracking information for the 'foo' branch,
which was pointless.

Tested-by: M Hickford <mirth.hickford@gmail.com>
cf. <CAGJzqsmE9FDEBn=u3ge4LA3ha4fDbm4OWiuUbMaztwjELBd7ug@mail.gmail.com>

* jc/checkout-detach-wo-tracking-report:
checkout: omit "tracking" information on a detached HEAD

Merge branch 'rs/imap-send-use-xsnprintf'

Code clean-up and duplicate reduction.

* rs/imap-send-use-xsnprintf:
imap-send: use xsnprintf to format command

Merge branch 'js/merge-tree-3-trees'

Match the option argument type in the help text to the correct type
updated by a recent series.

* js/merge-tree-3-trees:
merge-tree: fix argument type of the `--merge-base` option

merge-tree: fix argument type of the `--merge-base` option

In 5f43cf5b2e4 (merge-tree: accept 3 trees as arguments, 2024-01-28), I
taught `git merge-tree` to perform three-way merges on trees. This
commit even changed the manual page to state that the `--merge-base`
option takes a tree-ish rather than requiring a commit.

But I forgot to adjust the in-program help text. This patch fixes that.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Documentation: fix typos describing date format

This commit corrects a typographical error found in both
date-formats.txt and git-fast-import.txt documentation, where the term
`email format` was mistakenly used instead of `date format`.

Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t0612: add tests to exercise Git/JGit reftable compatibility

While the reftable format is a recent introduction in Git, JGit already
knows to read and write reftables since 2017. Given the complexity of
the format there is a very real risk of incompatibilities between those
two implementations, which is something that we really want to avoid.

Add some basic tests that verify that reftables written by Git and JGit
can be read by the respective other implementation. For now this test
suite is rather small, only covering basic functionality. But it serves
as a good starting point and can be extended over time.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t0610: fix non-portable variable assignment

Older versions of the Dash shell fail to parse `local var=val`
assignments in some cases when `val` is unquoted. Such failures can be
observed e.g. with Ubuntu 20.04 and older, which has a Dash version that
still has this bug.

Such an assignment has been introduced in t0610. The issue wasn't
detected for a while because this test used to only run when the
GIT_TEST_DEFAULT_REF_FORMAT environment variable was set to "reftable".
We have dropped that requirement now though, meaning that it runs
unconditionally, including on jobs which use such older versions of
Ubuntu.

We have worked around such issues in the past, e.g. in ebee5580ca
(parallel-checkout: avoid dash local bug in tests, 2021-06-06), by
quoting the `val` side. Apply the same fix to t0610.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t06xx: always execute backend-specific tests

The tests in t06xx exercise specific ref formats. Next to probing some
basic functionality, these tests also exercise other low-level details
specific to the format. Those tests are only executed though in case
`GIT_TEST_DEFAULT_REF_FORMAT` is set to the ref format of the respective
backend-under-test.

Ideally, we would run the full test matrix for ref formats such that our
complete test suite is executed with every supported format on every
supported platform. This is quite an expensive undertaking though, and
thus we only execute e.g. the "reftable" tests on macOS and Linux. As a
result, we basically have no test coverage for the "reftable" format at
all on other platforms like Windows.

Adapt these tests so that they override `GIT_TEST_DEFAULT_REF_FORMAT`,
which means that they'll always execute. This increases test coverage on
platforms that don't run the full test matrix, which at least gives us
some basic test coverage on those platforms for the "reftable" format.

This of course comes at the cost of running those tests multiple times
on platforms where we do run the full test matrix. But arguably, this is
a good thing because it will also cause us to e.g. run those tests with
the address sanitizer and other non-standard parameters.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: install JGit dependency

We have some tests in t5310 that use JGit to verify that bitmaps can be
read both by Git and by JGit. We do not execute these tests in our CI
jobs though because we don't make JGit available there. Consequently,
the tests basically bitrot because almost nobody is ever going to have
JGit in their path.

Install JGit to plug this test gap.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: make Perforce binaries executable for all users

The Perforce binaries are only made executable for the current user. On
GitLab CI though we execute tests as a different user than "root", and
thus these binaries may not be executable by that test user at all. This
has gone unnoticed so far because those binaries are optional -- in case
they don't exist we simply skip over tests requiring them.

Fix the setup so that we set the executable bits for all users.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: merge scripts which install dependencies

We have two different scripts which install dependencies, one for
dockerized jobs and one for non-dockerized ones. Naturally, these
scripts have quite some duplication. Furthermore, either of these
scripts is missing some test dependencies that the respective other
script has, thus reducing test coverage.

Merge those two scripts such that there is a single source of truth for
test dependencies, only.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: fix setup of custom path for GitLab CI

Part of "install-dependencies.sh" is to install some binaries required
for tests into a custom directory that gets added to the PATH. This
directory is located at "$HOME/path" and thus depends on the current
user that the script executes as.

This creates problems for GitLab CI, which installs dependencies as the
root user, but runs tests as a separate, unprivileged user. As their
respective home directories are different, we will end up using two
different custom path directories. Consequently, the unprivileged user
will not be able to find the binaries that were set up as root user.

Fix this issue by allowing CI to override the custom path, which allows
GitLab to set up a constant value that isn't derived from "$HOME".

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: merge custom PATH directories

We're downloading various executables required by our tests. Each of
these executables goes into its own directory, which is then appended to
the PATH variable. Consequently, whenever we add a new dependency and
thus a new directory, we would have to adapt to this change in several
places.

Refactor this to instead put all binaries into a single directory.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: convert "install-dependencies.sh" to use "/bin/sh"

We're about to merge the "install-docker-dependencies.sh" script into
"install-dependencies.sh". This will also move our Alpine-based jobs
over to use the latter script. This script uses the Bash shell though,
which is not available by default on Alpine Linux.

Refactor "install-dependencies.sh" to use "/bin/sh" instead of Bash.
This requires us to get rid of the pushd/popd invocations, which are
replaced by some more elaborate commands that download or extract
executables right to where they are needed.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: drop duplicate package installation for "linux-gcc-default"

The "linux-gcc-default" job installs common Ubuntu packages. This is
already done in the distro-specific switch, so we basically duplicate
the effort here.

Drop the duplicate package installations and inline the variable that
contains those common packages.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: skip sudo when we are already root

Our "install-dependencies.sh" script is executed by non-dockerized jobs
to install dependencies. These jobs don't run with "root" permissions,
but with a separate user. Consequently, we need to use sudo(8) there to
elevate permissions when installing packages.

We're about to merge "install-docker-dependencies.sh" into that script
though, and our Docker containers do run as "root". Using sudo(8) is
thus unnecessary there, even though it would be harmless. On some images
like Alpine Linux though there is no sudo(8) available by default, which
would consequently break the build.

Adapt the script to make "sudo" a no-op when running as "root" user.
This allows us to easily reuse the script for our dockerized jobs.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: expose distro name in dockerized GitHub jobs

Expose a distro name in dockerized jobs. This will be used in a
subsequent commit where we merge the installation scripts for dockerized
and non-dockerized jobs.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

ci: rename "runs_on_pool" to "distro"

The "runs_on_pool" environment variable is used by our CI scripts to
distinguish the different kinds of operating systems. It is quite
specific to GitHub Actions though and not really a descriptive name.

Rename the variable to "distro" to clarify its intent.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

repository: avoid leaking `fsmonitor` data

The `fsmonitor` repo-setting data is allocated lazily. It needs to be
released in `repo_clear()` along the rest of the allocated data in
`repository`.

This is needed because the next commit will merge v2.39.4, which will
add a `git checkout --recurse-modules` call (which would leak memory
without this here fix) to an otherwise leak-free test script.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

typo: replace 'commitish' with 'committish'

Across only three files, comments and a single function name used
'commitish' rather than 'commit-ish' or 'committish' as the spelling.
The git glossary accepts a hyphen or a double-t, but not a single-t.
Despite the typo in a translation file, none of the typos appear in
user-visible locations.

Signed-off-by: Pi Fisher <Pi.L.D.Fisher@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

remote-curl: add Transfer-Encoding header only for older curl

As of curl 7.66.0, we don't need to manually specify a "chunked"
Transfer-Encoding header. Instead, modern curl deduces the need for it
in a POST that has a POSTFIELDSIZE of -1 and uses READFUNCTION rather
than POSTFIELDS.

That version is recent enough that we can't just drop the header; we
need to do so conditionally. Since it's only a single line, it seems
like the simplest thing would just be to keep setting it unconditionally
(after all, the #ifdefs are much longer than the actual code). But
there's another wrinkle: HTTP/2.

Curl may choose to use HTTP/2 under the hood if the server supports it.
And in that protocol, we do not use the chunked encoding for streaming
at all. Most versions of curl handle this just fine by recognizing and
removing the header. But there's a regression in curl 8.7.0 and 8.7.1
where it doesn't, and large requests over HTTP/2 are broken (which t5559
notices). That regression has since been fixed upstream, but not yet
released.

Make the setting of this header conditional, which will let Git work
even with those buggy curl versions. And as a bonus, it serves as a
reminder that we can eventually clean up the code as we bump the
supported curl versions.

This is a backport of 92a209bf24 (remote-curl: add Transfer-Encoding
header only for older curl, 2024-04-05) into the `maint-2.39` branch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

INSTALL: bump libcurl version to 7.21.3

Our documentation claims we support curl versions back to 7.19.5. But we
can no longer compile with that version since adding an unconditional
use of CURLOPT_RESOLVE in 511cfd3bff (http: add custom hostname to IP
address resolutions, 2022-05-16). That feature wasn't added to libcurl
until 7.21.3.

We could add #ifdefs to make this work back to 7.19.5. But given that
nobody noticed the compilation failure in the intervening two years, it
makes more sense to bump the version in the documentation to 7.21.3
(which is itself over 13 years old).

We could perhaps go forward even more (which would let us drop some
cruft from git-curl-compat.h), but this should be an obviously safe
jump, and we can move forward later.

Note that user-visible syntax for CURLOPT_RESOLVE has grown new features
in subsequent curl versions. Our documentation mentions "+" and "-"
entries, which require more recent versions than 7.21.3. We could
perhaps clarify that in our docs, but it's probably not worth cluttering
them with restrictions of ancient curl versions.

This is a backport of c28ee09503 (INSTALL: bump libcurl version to
7.21.3, 2024-04-02) into the `maint-2.39` branch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

http: reset POSTFIELDSIZE when clearing curl handle

In get_active_slot(), we return a CURL handle that may have been used
before (reusing them is good because it lets curl reuse the same
connection across many requests). We set a few curl options back to
defaults that may have been modified by previous requests.

We reset POSTFIELDS to NULL, but do not reset POSTFIELDSIZE (which
defaults to "-1"). This usually doesn't matter because most POSTs will
set both fields together anyway. But there is one exception: when
handling a large request in remote-curl's post_rpc(), we don't set
_either_, and instead set a READFUNCTION to stream data into libcurl.

This can interact weirdly with a stale POSTFIELDSIZE setting, because
curl will assume it should read only some set number of bytes from our
READFUNCTION. However, it has worked in practice because we also
manually set a "Transfer-Encoding: chunked" header, which libcurl uses
as a clue to set the POSTFIELDSIZE to -1 itself.

So everything works, but we're better off resetting the size manually
for a few reasons:

  - there was a regression in curl 8.7.0 where the chunked header
    detection didn't kick in, causing any large HTTP requests made by
    Git to fail. This has since been fixed (but not yet released). In
    the issue, curl folks recommended setting it explicitly to -1:

      https://github.com/curl/curl/issues/13229#issuecomment-2029826058

    and it indeed works around the regression. So even though it won't
    be strictly necessary after the fix there, this will help folks who
    end up using the affected libcurl versions.

  - it's consistent with what a new curl handle would look like. Since
    get_active_slot() may or may not return a used handle, this reduces
    the possibility of heisenbugs that only appear with certain request
    patterns.

Note that the recommendation in the curl issue is to actually drop the
manual Transfer-Encoding header. Modern libcurl will add the header
itself when streaming from a READFUNCTION. However, that code wasn't
added until 802aa5ae2 (HTTP: use chunked Transfer-Encoding for HTTP_POST
if size unknown, 2019-07-22), which is in curl 7.66.0. We claim to
support back to 7.19.5, so those older versions still need the manual
header.

This is a backport of 3242311742 (http: reset POSTFIELDSIZE when
clearing curl handle, 2024-04-02) into the `maint-2.39` branch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

The seventeenth batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'jc/t2104-style-update'

Coding style fixes.

* jc/t2104-style-update:
t2104: style fixes

Merge branch 'kn/clarify-update-ref-doc'

Doc update, as a preparation to enhance "git update-ref --stdin".

* kn/clarify-update-ref-doc:
githooks: use {old,new}-oid instead of {old,new}-value
update-ref: use {old,new}-oid instead of {old,new}value

Merge branch 'vs/complete-with-set-u-fix'

Another "set -u" fix for the bash prompt (in contrib/) script.

* vs/complete-with-set-u-fix:
completion: protect prompt against unset SHOWUPSTREAM in nounset mode
completion: fix prompt with unset SHOWCONFLICTSTATE in nounset mode

Merge branch 'rs/mem-pool-size-t-safety'

size_t arithmetic safety.

* rs/mem-pool-size-t-safety:
mem-pool: use st_add() in mem_pool_strvfmt()

Merge branch 'ds/typofix-core-config-doc'

Typofix.

* ds/typofix-core-config-doc:
config: fix some small capitalization issues, as spotted

send-email: move newline characters out of a few translatable strings

Move the already existing newline characters out of a few translatable
strings, to help a bit with the translation efforts.

Signed-off-by: Dragan Simic <dsimic@manjaro.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t9604: Fix test for musl libc and new Debian

CST6CDT and the like are POSIX timezone, with no rule for transition.
And POSIX doesn't enforce how to interpret the rule if it's omitted.
Some libc (e.g. glibc) resorted back to IANA (formerly Olson) db rules
for those timezones. Some libc (e.g. FreeBSD) uses a fixed rule.
Other libc (e.g. musl) interpret that as no transition at all [1].

In addition, distributions (notoriously Debian-derived, which uses IANA
db for CST6CDT and the like) started to split "legacy" timezones
like CST6CDT, EST5EDT into `tzdata-legacy', which will not be installed
by default [2].

In those cases, t9604 will run into failure.

Let's switch to POSIX timezone with rules to change timezone.

1: http://mm.icann.org/pipermail/tz/2024-March/058751.html
2: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1043250

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t2104: style fixes

We use tabs to indent, not two or four spaces.

These days, even the test fixture preparation should be done inside
test_expect_success block.

Address these two style violations in this test.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

t3428: restore coverage for "apply" backend

This test file assumes the "apply" backend is the default which is not
the case since 2ac0d6273f (rebase: change the default backend from "am"
to "merge", 2020-02-15). Make sure the "apply" backend is tested by
specifying it explicitly.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t3428: use test_commit_message

Using a helper function makes the tests shorter and avoids running "git
cat-file" upstream of a pipe.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t3428: modernize test setup

Perform the setup in a dedicated test so the later tests can be run
independently. Also avoid running git upstream of a pipe and take
advantage of test_commit.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

The sixteenth batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Merge branch 'rj/use-adv-if-enabled'

Use advice_if_enabled() API to rewrite a simple pattern to
call advise() after checking advice_enabled().

* rj/use-adv-if-enabled:
  add: use advise_if_enabled for ADVICE_ADD_EMBEDDED_REPO
  add: use advise_if_enabled for ADVICE_ADD_EMPTY_PATHSPEC
  add: use advise_if_enabled for ADVICE_ADD_IGNORED_FILE

Merge branch 'ps/pack-refs-auto'

"git pack-refs" learned the "--auto" option, which is a useful
addition to be triggered from "git gc --auto".

Acked-by: Karthik Nayak <karthik.188@gmail.com>
cf. <CAOLa=ZRAEA7rSUoYL0h-2qfEELdbPHbeGpgBJRqesyhHi9Q6WQ@mail.gmail.com>

* ps/pack-refs-auto:
  builtin/gc: pack refs when using `git maintenance run --auto`
  builtin/gc: forward git-gc(1)'s `--auto` flag when packing refs
  t6500: extract objects with "17" prefix
  builtin/gc: move `struct maintenance_run_opts`
  builtin/pack-refs: introduce new "--auto" flag
  builtin/pack-refs: release allocated memory
  refs/reftable: expose auto compaction via new flag
  refs: remove `PACK_REFS_ALL` flag
  refs: move `struct pack_refs_opts` to where it's used
  t/helper: drop pack-refs wrapper
  refs/reftable: print errors on compaction failure
  reftable/stack: gracefully handle failed auto-compaction due to locks
  reftable/stack: use error codes when locking fails during compaction
  reftable/error: discern locked/outdated errors
  reftable/stack: fix error handling in `reftable_stack_init_addition()`

Merge branch 'es/test-cron-safety'

The test script had an incomplete and ineffective attempt to avoid
clobbering the testing user's real crontab (and its equivalents),
which has been completed.

* es/test-cron-safety:
test-lib: fix non-functioning GIT_TEST_MAINT_SCHEDULER fallback

Merge branch 'rj/add-p-explicit-reshow'

"git add -p" and other "interactive hunk selection" UI has learned to
skip showing the hunk immediately after it has already been shown, and
an additional action to explicitly ask to reshow the current hunk.

* rj/add-p-explicit-reshow:
add-patch: do not print hunks repeatedly
add-patch: introduce 'p' in interactive-patch

Merge branch 'mg/editorconfig-makefile'

The .editorconfig file has been taught that a Makefile uses HT
indentation.

* mg/editorconfig-makefile:
editorconfig: add Makefiles to "text files"

Merge branch 'ja/doc-markup-updates'

Documentation rules has been explicitly described how to mark-up
literal parts and a few manual pages have been updated as examples.

* ja/doc-markup-updates:
  doc: git-clone: do not autoreference the manpage in itself
  doc: git-clone: apply new documentation formatting guidelines
  doc: git-init: apply new documentation formatting guidelines
  doc: allow literal and emphasis format in doc vs help tests
  doc: rework CodingGuidelines with new formatting rules

Merge branch 'dg/myfirstobjectwalk-updates'

Update a more recent tutorial doc.

* dg/myfirstobjectwalk-updates:
  MyFirstObjectWalk: add stderr to pipe processing
  MyFirstObjectWalk: fix description for counting omitted objects
  MyFirstObjectWalk: fix filtered object walk
  MyFirstObjectWalk: fix misspelled "builtins/"
  MyFirstObjectWalk: use additional arg in config_fn_t

Merge branch 'jc/advice-sans-trailing-whitespace'

The "hint:" messages given by the advice mechanism, when given a
message with a blank line, left a line with trailing whitespace,
which has been cleansed.

* jc/advice-sans-trailing-whitespace:
advice: omit trailing whitespace

Merge branch 'jc/apply-parse-diff-git-header-names-fix'

"git apply" failed to extract the filename the patch applied to,
when the change was about an empty file created in or deleted from
a directory whose name ends with a SP, which has been corrected.

* jc/apply-parse-diff-git-header-names-fix:
  t4126: fix "funny directory name" test on Windows (again)
  t4126: make sure a directory with SP at the end is usable
  apply: parse names out of "diff --git" more carefully

t0610: execute git-pack-refs(1) with specified umask

The tests for git-pack-refs(1) with the `core.sharedRepository` config
execute git-pack-refs(1) outside of the shell that has the expected
umask set. This is wrong because we want to test the behaviour of that
command with different umasks. The issue went unnoticed because most
distributions have a default umask of 0022, and we only ever test with
`--shared=true`, which re-adds the group write bit.

Fix the issue by moving git-pack-refs(1) into the umask'd shell and add
a bunch of test cases that exercise behaviour more thoroughly.

Note that we drop the check for whether `core.sharedRepository` was set
to the correct value to make the test setup a bit easier. We should be
able to rely on git-init(1) doing its thing correctly. Furthermore, to
help readability, we convert tests that pass `--shared=true` to instead
pass the equivalent `--shared=group`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t0610: make `--shared=` tests reusable

We have two kinds of `--shared=` tests, one for git-init(1) and one for
git-pack-refs(1). Merge them into a reusable function such that we can
easily add additional testcases with different umasks and flags for the
`--shared=` switch.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/block: reuse compressed array

Similar to the preceding commit, let's reuse the `compressed` array that
we use to store compressed data in. This results in a small reduction in
memory allocations when writing many refs.

Before:

  HEAP SUMMARY:
      in use at exit: 671,931 bytes in 151 blocks
    total heap usage: 22,620,528 allocs, 22,620,377 frees, 1,245,549,984 bytes allocated

After:

  HEAP SUMMARY:
      in use at exit: 671,931 bytes in 151 blocks
    total heap usage: 22,618,257 allocs, 22,618,106 frees, 1,236,351,528 bytes allocated

So while the reduction in allocations isn't really all that big, it's a
low hanging fruit and thus there isn't much of a reason not to pick it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/block: reuse zstream when writing log blocks

While most reftable blocks are written to disk as-is, blocks for log
records are compressed with zlib. To compress them we use `compress2()`,
which is a simple wrapper around the more complex `zstream` interface
that would require multiple function invocations.

One downside of this interface is that `compress2()` will reallocate
internal state of the `zstream` interface on every single invocation.
Consequently, as we call `compress2()` for every single log block which
we are about to write, this can lead to quite some memory allocation
churn.

Refactor the code so that the block writer reuses a `zstream`. This
significantly reduces the number of bytes allocated when writing many
refs in a single transaction, as demonstrated by the following benchmark
that writes 100k refs in a single transaction.

Before:

  HEAP SUMMARY:
      in use at exit: 671,931 bytes in 151 blocks
    total heap usage: 22,631,887 allocs, 22,631,736 frees, 1,854,670,793 bytes allocated

After:

  HEAP SUMMARY:
      in use at exit: 671,931 bytes in 151 blocks
    total heap usage: 22,620,528 allocs, 22,620,377 frees, 1,245,549,984 bytes allocated

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/writer: reset `last_key` instead of releasing it

The reftable writer tracks the last key that it has written so that it
can properly compute the compressed prefix for the next record it is
about to write. This last key must be reset whenever we move on to write
the next block, which is done in `writer_reinit_block_writer()`. We do
this by calling `strbuf_release()` though, which needlessly deallocates
the underlying buffer.

Convert the code to use `strbuf_reset()` instead, which saves one
allocation per block we're about to write. This requires us to also
amend `reftable_writer_free()` to release the buffer's memory now as we
previously seemingly relied on `writer_reinit_block_writer()` to release
the memory for us. Releasing memory here is the right thing to do
anyway.

While at it, convert a callsite where we truncate the buffer by setting
its length to zero to instead use `strbuf_reset()`, too.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/writer: unify releasing memory

There are two code paths which release memory of the reftable writer:

  - `reftable_writer_close()` releases internal state after it has
    written data.

  - `reftable_writer_free()` releases the block that was written to and
    the writer itself.

Both code paths free different parts of the writer, and consequently the
caller must make sure to call both. And while callers mostly do this
already, this falls apart when a write failure causes the caller to skip
calling `reftable_write_close()`.

Introduce a new function `reftable_writer_release()` that releases all
internal state and call it from both paths. Like this it is fine for the
caller to not call `reftable_writer_close()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/writer: refactorings for `writer_flush_nonempty_block()`

Large parts of the reftable library do not conform to Git's typical code
style. Refactor `writer_flush_nonempty_block()` such that it conforms
better to it and add some documentation that explains some of its more
intricate behaviour.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/writer: refactorings for `writer_add_record()`

Large parts of the reftable library do not conform to Git's typical code
style. Refactor `writer_add_record()` such that it conforms better to it
and add some documentation that explains some of its more intricate
behaviour.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/reftable: don't recompute committer ident

In order to write reflog entries we need to compute the committer's
identity as it gets encoded in the log record itself. The reftable
backend does this via `git_committer_info()` and `split_ident_line()` in
`fill_reftable_log_record()`, which use the Git config as well as
environment variables to figure out the identity.

While most callers would only call `fill_reftable_log_record()` once or
twice, `write_transaction_table()` will call it as many times as there
are queued ref updates. This can be quite a waste of effort when writing
many refs with reflog entries in a single transaction.

Refactor the code to pre-compute the committer information. This results
in a small speedup when writing 100000 refs in a single transaction:

  Benchmark 1: update-ref: create many refs (HEAD~)
    Time (mean ± σ):      2.895 s ±  0.020 s    [User: 1.516 s, System: 1.374 s]
    Range (min … max):    2.868 s …  2.983 s    100 runs

  Benchmark 2: update-ref: create many refs (HEAD)
    Time (mean ± σ):      2.845 s ±  0.017 s    [User: 1.461 s, System: 1.379 s]
    Range (min … max):    2.803 s …  2.913 s    100 runs

  Summary
    update-ref: create many refs (HEAD) ran
      1.02 ± 0.01 times faster than update-ref: create many refs (HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable: remove name checks

In the preceding commit we have disabled name checks in the "reftable"
backend. These checks were responsible for verifying multiple things
when writing records to the reftable stack:

  - Detecting file/directory conflicts. Starting with the preceding
    commits this is now handled by the reftable backend itself via
    `refs_verify_refname_available()`.

  - Validating refnames. This is handled by `check_refname_format()` in
    the generic ref transacton layer.

The code in the reftable library is thus not used anymore and likely to
bitrot over time. Remove it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/reftable: skip duplicate name checks

All the callback functions which write refs in the reftable backend
perform D/F conflict checks via `refs_verify_refname_available()`. But
in reality we perform these D/F conflict checks a second time in the
reftable library via `stack_check_addition()`.

Interestingly, the code in the reftable library is inferior compared to
the generic function:

  - It is slower than `refs_verify_refname_available()`, even though
    this can probably be optimized.

  - It does not provide a proper error message to the caller, and thus
    all the user would see is a generic "file/directory conflict"
    message.

Disable the D/F conflict checks in the reftable library by setting the
`skip_name_check` write option. This results in a non-negligible speedup
when writing many refs. The following benchmark writes 100k refs in a
single transaction:

  Benchmark 1: update-ref: create many refs (HEAD~)
    Time (mean ± σ):      3.241 s ±  0.040 s    [User: 1.854 s, System: 1.381 s]
    Range (min … max):    3.185 s …  3.454 s    100 runs

  Benchmark 2: update-ref: create many refs (HEAD)
    Time (mean ± σ):      2.878 s ±  0.024 s    [User: 1.506 s, System: 1.367 s]
    Range (min … max):    2.838 s …  2.960 s    100 runs

  Summary
    update-ref: create many refs (HEAD~) ran
      1.13 ± 0.02 times faster than update-ref: create many refs (HEAD)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/reftable: perform explicit D/F check when writing symrefs

We already perform explicit D/F checks in all reftable callbacks which
write refs, except when writing symrefs. For one this leads to an error
message which isn't perfectly actionable because we only tell the user
that there was a D/F conflict, but not which refs conflicted with each
other. But second, once all ref updating callbacks explicitly check for
D/F conflicts, we can disable the D/F checks in the reftable library
itself and thus avoid some duplicated efforts.

Refactor the code that writes symref tables to explicitly call into
`refs_verify_refname_available()` when writing symrefs.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

refs/reftable: fix D/F conflict error message on ref copy

The `write_copy_table()` function is shared between the reftable
implementations for renaming and copying refs. The only difference
between those two cases is that the rename will also delete the old
reference, whereas copying won't.

This has resulted in a bug though where we don't properly verify refname
availability. When calling `refs_verify_refname_available()`, we always
add the old ref name to the list of refs to be skipped when computing
availability, which indicates that the name would be available even if
it already exists at the current point in time. This is only the right
thing to do for renames though, not for copies.

The consequence of this bug is quite harmless because the reftable
backend has its own checks for D/F conflicts further down in the call
stack, and thus we refuse the update regardless of the bug. But all the
user gets in this case is an uninformative message that copying the ref
has failed, without any further details.

Fix the bug and only add the old name to the skip-list in case we rename
the ref. Consequently, this error case will now be handled by
`refs_verify_refname_available()`, which knows to provide a proper error
message.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Makefile(s): do not enforce "all indents must be done with tab"

Our top-level Makefile follows our generic whitespace rule
established by the top-level .gitattributes file that does not
enforce indent-with-non-tab rule by default, but git-gui is set up
to enforce indent-with-non-tab by default. With the upcoming change
to GNU make, we no longer can reject (and worse, "fix") a patch that
adds whitespace indented lines to the Makefile, so loosen the rule
there for git-gui/Makefile, too.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

Makefile(s): avoid recipe prefix in conditional statements

In GNU Make commit 07fcee35 ([SV 64815] Recipe lines cannot contain
conditional statements, 2023-05-22) and following, conditional
statements may no longer be preceded by a tab character (which Make
refers to as the recipe prefix).

There are a handful of spots in our various Makefile(s) which will break
in a future release of Make containing 07fcee35. For instance, trying to
compile the pre-image of this patch with the tip of make.git results in
the following:

    $ make -v | head -1 && make
    GNU Make 4.4.90
    config.mak.uname:842: *** missing 'endif'.  Stop.

The kernel addressed this issue in 82175d1f9430 (kbuild: Replace tabs
with spaces when followed by conditionals, 2024-01-28). Address the
issues in Git's tree by applying the same strategy.

When a conditional word (ifeq, ifneq, ifdef, etc.) is preceded by one or
more tab characters, replace each tab character with 8 space characters
with the following:

    find . -type f -not -path './.git/*' -name Makefile -or -name '*.mak' |
      xargs perl -i -pe '
        s/(\t+)(ifn?eq|ifn?def|else|endif)/" " x (length($1) * 8) . $2/ge unless /\\$/
      '

The "unless /\\$/" removes any false-positives (like "\telse \"
appearing within a shell script as part of a recipe).

After doing so, Git compiles on newer versions of Make:

    $ make -v | head -1 && make
    GNU Make 4.4.90
    GIT_VERSION = 2.44.0.414.gfac1dc44ca9
    [...]

    $ echo $?
    0

Reported-by: Dario Gjorgjevski <dario.gjorgjevski@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

config: do not leak excludes_file

The excludes_file variable is marked "const char *", but all the
assignments to it are made with a piece of memory allocated just
for it, and the variable is responsible for owning it.

When "core.excludesfile" is read, the code just lost the previous
value, leaking memory. Plug it.

The real problem is that the variable is mistyped; our convention
is to never make a variable that owns the piece of memory pointed
by it as "const". Fixing that would reduce the chance of this kind
of bug happening, and also would make it unnecessary to cast the
constness away while free()ing it, but that would be a much larger
follow-up effort.

Reported-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/stack: use geometric table compaction

To reduce the number of on-disk reftables, compaction is performed.
Contiguous tables with the same binary log value of size are grouped
into segments. The segment that has both the lowest binary log value and
contains more than one table is set as the starting point when
identifying the compaction segment.

Since segments containing a single table are not initially considered
for compaction, if the table appended to the list does not match the
previous table log value, no compaction occurs for the new table. It is
therefore possible for unbounded growth of the table list. This can be
demonstrated by repeating the following sequence:

git branch -f foo
git branch -d foo

Each operation results in a new table being written with no compaction
occurring until a separate operation produces a table matching the
previous table log value.

Instead, to avoid unbounded growth of the table list, the compaction
strategy is updated to ensure tables follow a geometric sequence after
each operation by individually evaluating each table in reverse index
order. This strategy results in a much simpler and more robust algorithm
compared to the previous one while also maintaining a minimal ordered
set of tables on-disk.

When creating 10 thousand references, the new strategy has no
performance impact:

Benchmark 1: update-ref: create refs sequentially (revision = HEAD~)
  Time (mean ± σ):     26.516 s ±  0.047 s    [User: 17.864 s, System: 8.491 s]
  Range (min … max):   26.447 s … 26.569 s    10 runs

Benchmark 2: update-ref: create refs sequentially (revision = HEAD)
  Time (mean ± σ):     26.417 s ±  0.028 s    [User: 17.738 s, System: 8.500 s]
  Range (min … max):   26.366 s … 26.444 s    10 runs

Summary
  update-ref: create refs sequentially (revision = HEAD) ran
    1.00 ± 0.00 times faster than update-ref: create refs sequentially (revision = HEAD~)

Some tests in `t0610-reftable-basics.sh` assert the on-disk state of
tables and are therefore updated to specify the correct new table count.
Since compaction is more aggressive in ensuring tables maintain a
geometric sequence, the expected table count is reduced in these tests.
In `reftable/stack_test.c` tests related to `sizes_to_segments()` are
removed because the function is no longer needed. Also, the
`test_suggest_compaction_segment()` test is updated to better showcase
and reflect the new geometric compaction behavior.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/stack: add env to disable autocompaction

In future tests it will be neccesary to create repositories with a set
number of tables. To make this easier, introduce the
`GIT_TEST_REFTABLE_AUTOCOMPACTION` environment variable that, when set
to false, disables autocompaction of reftables.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

reftable/stack: expose option to disable auto-compaction

The reftable stack already has a variable to configure whether or not to
run auto-compaction, but it is inaccessible to users of the library.
There exist use cases where a caller may want to have more control over
auto-compaction.

Move the `disable_auto_compact` option into `reftable_write_options` to
allow external callers to disable auto-compaction. This will be used in
a subsequent commit.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t1016: local VAR="VAL" fix

The series was based on maint and fixes all the tests that exist
there, but we have acquired a few more.

I suspect that the values assigned in many of these places are $IFS
safe, and this is primarily to squelch the linter than adding a
necessary workaround for buggy dash.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

t0610: local VAR="VAL" fix

The series was based on maint and fixes all the tests that exist
there, but we have acquired a few more.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

t: teach lint that RHS of 'local VAR=VAL' needs to be quoted

Teach t/check-non-portable-shell.pl that right hand side of the
assignment done with "local VAR=VAL" need to be quoted. We
deliberately target only VAL that begins with $ so that we can catch

- $variable_reference and positional parameter reference like $4
- $(command substitution)
- ${variable_reference-with_magic}

while excluding

- $'\n' that is a bash-ism freely usable in t990[23]
- $(( arithmetic )) whose result should be $IFS safe.
- $? that also is $IFS safe

Signed-off-by: Junio C Hamano <gitster@pobox.com>

t: local VAR="VAL" (quote ${magic-reference})

Future-proof test scripts that do

local VAR=VAL

without quoting VAL (which is OK in POSIX but broken in some shells)
that is ${magic-"reference to a parameter"}.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

t: local VAR="VAL" (quote command substitution)

Future-proof test scripts that do

local VAR=VAL

without quoting VAL (which is OK in POSIX but broken in some shells)
that is a $(command substitution).

Signed-off-by: Junio C Hamano <gitster@pobox.com>

t: local VAR="VAL" (quote positional parameters)

Future-proof test scripts that do

local VAR=VAL

without quoting VAL (which is OK in POSIX but broken in some shells)
that is a positional parameter, e.g. $4.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

CodingGuidelines: quote assigned value in 'local var=$val'

Dash bug https://bugs.launchpad.net/ubuntu/+source/dash/+bug/139097
lets the shell erroneously perform field splitting on the expansion
of a command substitution during declaration of a local or an extern
variable.

The explanation was stolen from ebee5580 (parallel-checkout: avoid
dash local bug in tests, 2021-06-06).

Signed-off-by: Junio C Hamano <gitster@pobox.com>

CodingGuidelines: describe "export VAR=VAL" rule

https://lore.kernel.org/git/201307081121.22769.tboegi@web.de/
resulted in 9968ffff (test-lint: detect 'export FOO=bar',
2013-07-08) to add a rule to t/check-non-portable-shell.pl script to
reject

export VAR=VAL

and suggest us to instead write it as two statements, i.e.,

VAR=VAL
export VAR

This however was not spelled out in the CodingGuidelines document.

We may want to re-evaluate the rule since it is from ages ago, but
for now, let's make the written rule and what the automation
enforces consistent.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

userdiff: better method/property matching for C#

- Support multi-line methods by not requiring closing parenthesis.
- Support multiple generics (comma was missing before).
- Add missing `foreach`, `lock` and `fixed` keywords to skip over.
- Remove `instanceof` keyword, which isn't C#.
- Also detect non-method keywords not positioned at the start of a line.
- Added tests; none existed before.

The overall strategy is to focus more on what isn't expected for
method/property definitions, instead of what is, but is fully optional.

Signed-off-by: Steven Jeuris <steven.jeuris@gmail.com>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

date: make DATE_MODE thread-safe

date_mode_from_type() modifies a static variable and returns a pointer
to it.  This is not thread-safe.  Most callers of date_mode_from_type()
use it via the macro DATE_MODE and pass its result on to functions like
show_date(), which take a const pointer and don't modify the struct.

Avoid the static storage by putting the variable on the stack and
returning the whole struct date_mode.  Change functions that take a
constant pointer to expect the whole struct instead.

Reduce the cost of passing struct date_mode around on 64-bit systems
by reordering its members to close the hole between the 32-bit wide
.type and the 64-bit aligned .strftime_fmt as well as the alignment
hole at the end.  sizeof reports 24 before and 16 with this change
on x64.  Keep .type at the top to still allow initialization without
designator -- though that's only done in a single location, in
builtin/blame.c.

Signed-off-by: René Scharfe <l.s.r@web.de>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

usage: report vsnprintf(3) failure

vreportf(), which is used e.g. by die() and warning() by default, calls
vsnprintf(3) to format the message to report.  If that call fails, it
only prints the prefix, e.g. "fatal: " or "warning: ".  This at least
informs users that they were supposed to get a message and reveals its
severity, but leaves them wondering what it may have been about.

Here's an example where vreportf() tries to print a message with a 2GB
string, which is too much for vsnprintf(3):

  $ perl -le 'print "create refs/heads/", "a"x2**31' | git update-ref --stdin
  fatal:

At least report the formatting error along with the offending message
(unformatted) to indicate why that message is empty.  Use fprintf(3)
instead of error() to get the message out directly and avoid recursing
back into vreportf().

With this patch we get:

  $ perl -le 'print "create refs/heads/", "a"x2**31' | git update-ref --stdin
  error: unable to format message: invalid ref format: %s
  fatal:

... which allows users to at least get an idea of what went wrong.

Suggested-by: Jeff King <peff@peff.net>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

remote-curl: add Transfer-Encoding header only for older curl

As of curl 7.66.0, we don't need to manually specify a "chunked"
Transfer-Encoding header. Instead, modern curl deduces the need for it
in a POST that has a POSTFIELDSIZE of -1 and uses READFUNCTION rather
than POSTFIELDS.

That version is recent enough that we can't just drop the header; we
need to do so conditionally. Since it's only a single line, it seems
like the simplest thing would just be to keep setting it unconditionally
(after all, the #ifdefs are much longer than the actual code). But
there's another wrinkle: HTTP/2.

Curl may choose to use HTTP/2 under the hood if the server supports it.
And in that protocol, we do not use the chunked encoding for streaming
at all. Most versions of curl handle this just fine by recognizing and
removing the header. But there's a regression in curl 8.7.0 and 8.7.1
where it doesn't, and large requests over HTTP/2 are broken (which t5559
notices). That regression has since been fixed upstream, but not yet
released.

Make the setting of this header conditional, which will let Git work
even with those buggy curl versions. And as a bonus, it serves as a
reminder that we can eventually clean up the code as we bump the
supported curl versions.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

The fifteenth batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>