Older versions of the Dash shell fail to parse `local var=val`
assignments in some cases when `val` is unquoted. Such failures can be
observed e.g. with Ubuntu 20.04 and older, which has a Dash version that
still has this bug.
Such an assignment has been introduced in t0610. The issue wasn't
detected for a while because this test used to only run when the
GIT_TEST_DEFAULT_REF_FORMAT environment variable was set to "reftable".
We have dropped that requirement now though, meaning that it runs
unconditionally, including on jobs which use such older versions of
Ubuntu.
We have worked around such issues in the past, e.g. in ebee5580ca
(parallel-checkout: avoid dash local bug in tests, 2021-06-06), by
quoting the `val` side. Apply the same fix to t0610.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The tests in t06xx exercise specific ref formats. Next to probing some
basic functionality, these tests also exercise other low-level details
specific to the format. Those tests are only executed though in case
`GIT_TEST_DEFAULT_REF_FORMAT` is set to the ref format of the respective
backend-under-test.
Ideally, we would run the full test matrix for ref formats such that our
complete test suite is executed with every supported format on every
supported platform. This is quite an expensive undertaking though, and
thus we only execute e.g. the "reftable" tests on macOS and Linux. As a
result, we basically have no test coverage for the "reftable" format at
all on other platforms like Windows.
Adapt these tests so that they override `GIT_TEST_DEFAULT_REF_FORMAT`,
which means that they'll always execute. This increases test coverage on
platforms that don't run the full test matrix, which at least gives us
some basic test coverage on those platforms for the "reftable" format.
This of course comes at the cost of running those tests multiple times
on platforms where we do run the full test matrix. But arguably, this is
a good thing because it will also cause us to e.g. run those tests with
the address sanitizer and other non-standard parameters.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have some tests in t5310 that use JGit to verify that bitmaps can be
read both by Git and by JGit. We do not execute these tests in our CI
jobs though because we don't make JGit available there. Consequently,
the tests basically bitrot because almost nobody is ever going to have
JGit in their path.
Install JGit to plug this test gap.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
ci: make Perforce binaries executable for all users
The Perforce binaries are only made executable for the current user. On
GitLab CI though we execute tests as a different user than "root", and
thus these binaries may not be executable by that test user at all. This
has gone unnoticed so far because those binaries are optional -- in case
they don't exist we simply skip over tests requiring them.
Fix the setup so that we set the executable bits for all users.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have two different scripts which install dependencies, one for
dockerized jobs and one for non-dockerized ones. Naturally, these
scripts have quite some duplication. Furthermore, either of these
scripts is missing some test dependencies that the respective other
script has, thus reducing test coverage.
Merge those two scripts such that there is a single source of truth for
test dependencies, only.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Part of "install-dependencies.sh" is to install some binaries required
for tests into a custom directory that gets added to the PATH. This
directory is located at "$HOME/path" and thus depends on the current
user that the script executes as.
This creates problems for GitLab CI, which installs dependencies as the
root user, but runs tests as a separate, unprivileged user. As their
respective home directories are different, we will end up using two
different custom path directories. Consequently, the unprivileged user
will not be able to find the binaries that were set up as root user.
Fix this issue by allowing CI to override the custom path, which allows
GitLab to set up a constant value that isn't derived from "$HOME".
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're downloading various executables required by our tests. Each of
these executables goes into its own directory, which is then appended to
the PATH variable. Consequently, whenever we add a new dependency and
thus a new directory, we would have to adapt to this change in several
places.
Refactor this to instead put all binaries into a single directory.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
ci: convert "install-dependencies.sh" to use "/bin/sh"
We're about to merge the "install-docker-dependencies.sh" script into
"install-dependencies.sh". This will also move our Alpine-based jobs
over to use the latter script. This script uses the Bash shell though,
which is not available by default on Alpine Linux.
Refactor "install-dependencies.sh" to use "/bin/sh" instead of Bash.
This requires us to get rid of the pushd/popd invocations, which are
replaced by some more elaborate commands that download or extract
executables right to where they are needed.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
ci: drop duplicate package installation for "linux-gcc-default"
The "linux-gcc-default" job installs common Ubuntu packages. This is
already done in the distro-specific switch, so we basically duplicate
the effort here.
Drop the duplicate package installations and inline the variable that
contains those common packages.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our "install-dependencies.sh" script is executed by non-dockerized jobs
to install dependencies. These jobs don't run with "root" permissions,
but with a separate user. Consequently, we need to use sudo(8) there to
elevate permissions when installing packages.
We're about to merge "install-docker-dependencies.sh" into that script
though, and our Docker containers do run as "root". Using sudo(8) is
thus unnecessary there, even though it would be harmless. On some images
like Alpine Linux though there is no sudo(8) available by default, which
would consequently break the build.
Adapt the script to make "sudo" a no-op when running as "root" user.
This allows us to easily reuse the script for our dockerized jobs.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Expose a distro name in dockerized jobs. This will be used in a
subsequent commit where we merge the installation scripts for dockerized
and non-dockerized jobs.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "runs_on_pool" environment variable is used by our CI scripts to
distinguish the different kinds of operating systems. It is quite
specific to GitHub Actions though and not really a descriptive name.
Rename the variable to "distro" to clarify its intent.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `fsmonitor` repo-setting data is allocated lazily. It needs to be
released in `repo_clear()` along the rest of the allocated data in
`repository`.
This is needed because the next commit will merge v2.39.4, which will
add a `git checkout --recurse-modules` call (which would leak memory
without this here fix) to an otherwise leak-free test script.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Pi Fisher [Sun, 7 Apr 2024 21:21:08 +0000 (17:21 -0400)]
typo: replace 'commitish' with 'committish'
Across only three files, comments and a single function name used
'commitish' rather than 'commit-ish' or 'committish' as the spelling.
The git glossary accepts a hyphen or a double-t, but not a single-t.
Despite the typo in a translation file, none of the typos appear in
user-visible locations.
Signed-off-by: Pi Fisher <Pi.L.D.Fisher@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 5 Apr 2024 20:04:51 +0000 (16:04 -0400)]
remote-curl: add Transfer-Encoding header only for older curl
As of curl 7.66.0, we don't need to manually specify a "chunked"
Transfer-Encoding header. Instead, modern curl deduces the need for it
in a POST that has a POSTFIELDSIZE of -1 and uses READFUNCTION rather
than POSTFIELDS.
That version is recent enough that we can't just drop the header; we
need to do so conditionally. Since it's only a single line, it seems
like the simplest thing would just be to keep setting it unconditionally
(after all, the #ifdefs are much longer than the actual code). But
there's another wrinkle: HTTP/2.
Curl may choose to use HTTP/2 under the hood if the server supports it.
And in that protocol, we do not use the chunked encoding for streaming
at all. Most versions of curl handle this just fine by recognizing and
removing the header. But there's a regression in curl 8.7.0 and 8.7.1
where it doesn't, and large requests over HTTP/2 are broken (which t5559
notices). That regression has since been fixed upstream, but not yet
released.
Make the setting of this header conditional, which will let Git work
even with those buggy curl versions. And as a bonus, it serves as a
reminder that we can eventually clean up the code as we bump the
supported curl versions.
This is a backport of 92a209bf24 (remote-curl: add Transfer-Encoding
header only for older curl, 2024-04-05) into the `maint-2.39` branch.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Jeff King [Tue, 2 Apr 2024 20:06:00 +0000 (16:06 -0400)]
INSTALL: bump libcurl version to 7.21.3
Our documentation claims we support curl versions back to 7.19.5. But we
can no longer compile with that version since adding an unconditional
use of CURLOPT_RESOLVE in 511cfd3bff (http: add custom hostname to IP
address resolutions, 2022-05-16). That feature wasn't added to libcurl
until 7.21.3.
We could add #ifdefs to make this work back to 7.19.5. But given that
nobody noticed the compilation failure in the intervening two years, it
makes more sense to bump the version in the documentation to 7.21.3
(which is itself over 13 years old).
We could perhaps go forward even more (which would let us drop some
cruft from git-curl-compat.h), but this should be an obviously safe
jump, and we can move forward later.
Note that user-visible syntax for CURLOPT_RESOLVE has grown new features
in subsequent curl versions. Our documentation mentions "+" and "-"
entries, which require more recent versions than 7.21.3. We could
perhaps clarify that in our docs, but it's probably not worth cluttering
them with restrictions of ancient curl versions.
This is a backport of c28ee09503 (INSTALL: bump libcurl version to
7.21.3, 2024-04-02) into the `maint-2.39` branch.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Jeff King [Tue, 2 Apr 2024 20:05:17 +0000 (16:05 -0400)]
http: reset POSTFIELDSIZE when clearing curl handle
In get_active_slot(), we return a CURL handle that may have been used
before (reusing them is good because it lets curl reuse the same
connection across many requests). We set a few curl options back to
defaults that may have been modified by previous requests.
We reset POSTFIELDS to NULL, but do not reset POSTFIELDSIZE (which
defaults to "-1"). This usually doesn't matter because most POSTs will
set both fields together anyway. But there is one exception: when
handling a large request in remote-curl's post_rpc(), we don't set
_either_, and instead set a READFUNCTION to stream data into libcurl.
This can interact weirdly with a stale POSTFIELDSIZE setting, because
curl will assume it should read only some set number of bytes from our
READFUNCTION. However, it has worked in practice because we also
manually set a "Transfer-Encoding: chunked" header, which libcurl uses
as a clue to set the POSTFIELDSIZE to -1 itself.
So everything works, but we're better off resetting the size manually
for a few reasons:
- there was a regression in curl 8.7.0 where the chunked header
detection didn't kick in, causing any large HTTP requests made by
Git to fail. This has since been fixed (but not yet released). In
the issue, curl folks recommended setting it explicitly to -1:
and it indeed works around the regression. So even though it won't
be strictly necessary after the fix there, this will help folks who
end up using the affected libcurl versions.
- it's consistent with what a new curl handle would look like. Since
get_active_slot() may or may not return a used handle, this reduces
the possibility of heisenbugs that only appear with certain request
patterns.
Note that the recommendation in the curl issue is to actually drop the
manual Transfer-Encoding header. Modern libcurl will add the header
itself when streaming from a READFUNCTION. However, that code wasn't
added until 802aa5ae2 (HTTP: use chunked Transfer-Encoding for HTTP_POST
if size unknown, 2019-07-22), which is in curl 7.66.0. We claim to
support back to 7.19.5, so those older versions still need the manual
header.
This is a backport of 3242311742 (http: reset POSTFIELDSIZE when
clearing curl handle, 2024-04-02) into the `maint-2.39` branch.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Junio C Hamano [Wed, 10 Apr 2024 17:00:08 +0000 (10:00 -0700)]
Merge branch 'vs/complete-with-set-u-fix'
Another "set -u" fix for the bash prompt (in contrib/) script.
* vs/complete-with-set-u-fix:
completion: protect prompt against unset SHOWUPSTREAM in nounset mode
completion: fix prompt with unset SHOWCONFLICTSTATE in nounset mode
CST6CDT and the like are POSIX timezone, with no rule for transition.
And POSIX doesn't enforce how to interpret the rule if it's omitted.
Some libc (e.g. glibc) resorted back to IANA (formerly Olson) db rules
for those timezones. Some libc (e.g. FreeBSD) uses a fixed rule.
Other libc (e.g. musl) interpret that as no transition at all [1].
In addition, distributions (notoriously Debian-derived, which uses IANA
db for CST6CDT and the like) started to split "legacy" timezones
like CST6CDT, EST5EDT into `tzdata-legacy', which will not be installed
by default [2].
In those cases, t9604 will run into failure.
Let's switch to POSIX timezone with rules to change timezone.
This test file assumes the "apply" backend is the default which is not
the case since 2ac0d6273f (rebase: change the default backend from "am"
to "merge", 2020-02-15). Make sure the "apply" backend is tested by
specifying it explicitly.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Perform the setup in a dedicated test so the later tests can be run
independently. Also avoid running git upstream of a pipe and take
advantage of test_commit.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 9 Apr 2024 21:31:45 +0000 (14:31 -0700)]
Merge branch 'rj/use-adv-if-enabled'
Use advice_if_enabled() API to rewrite a simple pattern to
call advise() after checking advice_enabled().
* rj/use-adv-if-enabled:
add: use advise_if_enabled for ADVICE_ADD_EMBEDDED_REPO
add: use advise_if_enabled for ADVICE_ADD_EMPTY_PATHSPEC
add: use advise_if_enabled for ADVICE_ADD_IGNORED_FILE
* ps/pack-refs-auto:
builtin/gc: pack refs when using `git maintenance run --auto`
builtin/gc: forward git-gc(1)'s `--auto` flag when packing refs
t6500: extract objects with "17" prefix
builtin/gc: move `struct maintenance_run_opts`
builtin/pack-refs: introduce new "--auto" flag
builtin/pack-refs: release allocated memory
refs/reftable: expose auto compaction via new flag
refs: remove `PACK_REFS_ALL` flag
refs: move `struct pack_refs_opts` to where it's used
t/helper: drop pack-refs wrapper
refs/reftable: print errors on compaction failure
reftable/stack: gracefully handle failed auto-compaction due to locks
reftable/stack: use error codes when locking fails during compaction
reftable/error: discern locked/outdated errors
reftable/stack: fix error handling in `reftable_stack_init_addition()`
Junio C Hamano [Tue, 9 Apr 2024 21:31:44 +0000 (14:31 -0700)]
Merge branch 'es/test-cron-safety'
The test script had an incomplete and ineffective attempt to avoid
clobbering the testing user's real crontab (and its equivalents),
which has been completed.
Junio C Hamano [Tue, 9 Apr 2024 21:31:44 +0000 (14:31 -0700)]
Merge branch 'rj/add-p-explicit-reshow'
"git add -p" and other "interactive hunk selection" UI has learned to
skip showing the hunk immediately after it has already been shown, and
an additional action to explicitly ask to reshow the current hunk.
* rj/add-p-explicit-reshow:
add-patch: do not print hunks repeatedly
add-patch: introduce 'p' in interactive-patch
Junio C Hamano [Tue, 9 Apr 2024 21:31:44 +0000 (14:31 -0700)]
Merge branch 'ja/doc-markup-updates'
Documentation rules has been explicitly described how to mark-up
literal parts and a few manual pages have been updated as examples.
* ja/doc-markup-updates:
doc: git-clone: do not autoreference the manpage in itself
doc: git-clone: apply new documentation formatting guidelines
doc: git-init: apply new documentation formatting guidelines
doc: allow literal and emphasis format in doc vs help tests
doc: rework CodingGuidelines with new formatting rules
Junio C Hamano [Tue, 9 Apr 2024 21:31:43 +0000 (14:31 -0700)]
Merge branch 'jc/advice-sans-trailing-whitespace'
The "hint:" messages given by the advice mechanism, when given a
message with a blank line, left a line with trailing whitespace,
which has been cleansed.
"git apply" failed to extract the filename the patch applied to,
when the change was about an empty file created in or deleted from
a directory whose name ends with a SP, which has been corrected.
* jc/apply-parse-diff-git-header-names-fix:
t4126: fix "funny directory name" test on Windows (again)
t4126: make sure a directory with SP at the end is usable
apply: parse names out of "diff --git" more carefully
t0610: execute git-pack-refs(1) with specified umask
The tests for git-pack-refs(1) with the `core.sharedRepository` config
execute git-pack-refs(1) outside of the shell that has the expected
umask set. This is wrong because we want to test the behaviour of that
command with different umasks. The issue went unnoticed because most
distributions have a default umask of 0022, and we only ever test with
`--shared=true`, which re-adds the group write bit.
Fix the issue by moving git-pack-refs(1) into the umask'd shell and add
a bunch of test cases that exercise behaviour more thoroughly.
Note that we drop the check for whether `core.sharedRepository` was set
to the correct value to make the test setup a bit easier. We should be
able to rely on git-init(1) doing its thing correctly. Furthermore, to
help readability, we convert tests that pass `--shared=true` to instead
pass the equivalent `--shared=group`.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have two kinds of `--shared=` tests, one for git-init(1) and one for
git-pack-refs(1). Merge them into a reusable function such that we can
easily add additional testcases with different umasks and flags for the
`--shared=` switch.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to the preceding commit, let's reuse the `compressed` array that
we use to store compressed data in. This results in a small reduction in
memory allocations when writing many refs.
Before:
HEAP SUMMARY:
in use at exit: 671,931 bytes in 151 blocks
total heap usage: 22,620,528 allocs, 22,620,377 frees, 1,245,549,984 bytes allocated
After:
HEAP SUMMARY:
in use at exit: 671,931 bytes in 151 blocks
total heap usage: 22,618,257 allocs, 22,618,106 frees, 1,236,351,528 bytes allocated
So while the reduction in allocations isn't really all that big, it's a
low hanging fruit and thus there isn't much of a reason not to pick it.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/block: reuse zstream when writing log blocks
While most reftable blocks are written to disk as-is, blocks for log
records are compressed with zlib. To compress them we use `compress2()`,
which is a simple wrapper around the more complex `zstream` interface
that would require multiple function invocations.
One downside of this interface is that `compress2()` will reallocate
internal state of the `zstream` interface on every single invocation.
Consequently, as we call `compress2()` for every single log block which
we are about to write, this can lead to quite some memory allocation
churn.
Refactor the code so that the block writer reuses a `zstream`. This
significantly reduces the number of bytes allocated when writing many
refs in a single transaction, as demonstrated by the following benchmark
that writes 100k refs in a single transaction.
Before:
HEAP SUMMARY:
in use at exit: 671,931 bytes in 151 blocks
total heap usage: 22,631,887 allocs, 22,631,736 frees, 1,854,670,793 bytes allocated
After:
HEAP SUMMARY:
in use at exit: 671,931 bytes in 151 blocks
total heap usage: 22,620,528 allocs, 22,620,377 frees, 1,245,549,984 bytes allocated
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/writer: reset `last_key` instead of releasing it
The reftable writer tracks the last key that it has written so that it
can properly compute the compressed prefix for the next record it is
about to write. This last key must be reset whenever we move on to write
the next block, which is done in `writer_reinit_block_writer()`. We do
this by calling `strbuf_release()` though, which needlessly deallocates
the underlying buffer.
Convert the code to use `strbuf_reset()` instead, which saves one
allocation per block we're about to write. This requires us to also
amend `reftable_writer_free()` to release the buffer's memory now as we
previously seemingly relied on `writer_reinit_block_writer()` to release
the memory for us. Releasing memory here is the right thing to do
anyway.
While at it, convert a callsite where we truncate the buffer by setting
its length to zero to instead use `strbuf_reset()`, too.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two code paths which release memory of the reftable writer:
- `reftable_writer_close()` releases internal state after it has
written data.
- `reftable_writer_free()` releases the block that was written to and
the writer itself.
Both code paths free different parts of the writer, and consequently the
caller must make sure to call both. And while callers mostly do this
already, this falls apart when a write failure causes the caller to skip
calling `reftable_write_close()`.
Introduce a new function `reftable_writer_release()` that releases all
internal state and call it from both paths. Like this it is fine for the
caller to not call `reftable_writer_close()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/writer: refactorings for `writer_flush_nonempty_block()`
Large parts of the reftable library do not conform to Git's typical code
style. Refactor `writer_flush_nonempty_block()` such that it conforms
better to it and add some documentation that explains some of its more
intricate behaviour.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/writer: refactorings for `writer_add_record()`
Large parts of the reftable library do not conform to Git's typical code
style. Refactor `writer_add_record()` such that it conforms better to it
and add some documentation that explains some of its more intricate
behaviour.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to write reflog entries we need to compute the committer's
identity as it gets encoded in the log record itself. The reftable
backend does this via `git_committer_info()` and `split_ident_line()` in
`fill_reftable_log_record()`, which use the Git config as well as
environment variables to figure out the identity.
While most callers would only call `fill_reftable_log_record()` once or
twice, `write_transaction_table()` will call it as many times as there
are queued ref updates. This can be quite a waste of effort when writing
many refs with reflog entries in a single transaction.
Refactor the code to pre-compute the committer information. This results
in a small speedup when writing 100000 refs in a single transaction:
Benchmark 1: update-ref: create many refs (HEAD~)
Time (mean ± σ): 2.895 s ± 0.020 s [User: 1.516 s, System: 1.374 s]
Range (min … max): 2.868 s … 2.983 s 100 runs
Benchmark 2: update-ref: create many refs (HEAD)
Time (mean ± σ): 2.845 s ± 0.017 s [User: 1.461 s, System: 1.379 s]
Range (min … max): 2.803 s … 2.913 s 100 runs
Summary
update-ref: create many refs (HEAD) ran
1.02 ± 0.01 times faster than update-ref: create many refs (HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commit we have disabled name checks in the "reftable"
backend. These checks were responsible for verifying multiple things
when writing records to the reftable stack:
- Detecting file/directory conflicts. Starting with the preceding
commits this is now handled by the reftable backend itself via
`refs_verify_refname_available()`.
- Validating refnames. This is handled by `check_refname_format()` in
the generic ref transacton layer.
The code in the reftable library is thus not used anymore and likely to
bitrot over time. Remove it.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
All the callback functions which write refs in the reftable backend
perform D/F conflict checks via `refs_verify_refname_available()`. But
in reality we perform these D/F conflict checks a second time in the
reftable library via `stack_check_addition()`.
Interestingly, the code in the reftable library is inferior compared to
the generic function:
- It is slower than `refs_verify_refname_available()`, even though
this can probably be optimized.
- It does not provide a proper error message to the caller, and thus
all the user would see is a generic "file/directory conflict"
message.
Disable the D/F conflict checks in the reftable library by setting the
`skip_name_check` write option. This results in a non-negligible speedup
when writing many refs. The following benchmark writes 100k refs in a
single transaction:
Benchmark 1: update-ref: create many refs (HEAD~)
Time (mean ± σ): 3.241 s ± 0.040 s [User: 1.854 s, System: 1.381 s]
Range (min … max): 3.185 s … 3.454 s 100 runs
Benchmark 2: update-ref: create many refs (HEAD)
Time (mean ± σ): 2.878 s ± 0.024 s [User: 1.506 s, System: 1.367 s]
Range (min … max): 2.838 s … 2.960 s 100 runs
Summary
update-ref: create many refs (HEAD~) ran
1.13 ± 0.02 times faster than update-ref: create many refs (HEAD)
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs/reftable: perform explicit D/F check when writing symrefs
We already perform explicit D/F checks in all reftable callbacks which
write refs, except when writing symrefs. For one this leads to an error
message which isn't perfectly actionable because we only tell the user
that there was a D/F conflict, but not which refs conflicted with each
other. But second, once all ref updating callbacks explicitly check for
D/F conflicts, we can disable the D/F checks in the reftable library
itself and thus avoid some duplicated efforts.
Refactor the code that writes symref tables to explicitly call into
`refs_verify_refname_available()` when writing symrefs.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs/reftable: fix D/F conflict error message on ref copy
The `write_copy_table()` function is shared between the reftable
implementations for renaming and copying refs. The only difference
between those two cases is that the rename will also delete the old
reference, whereas copying won't.
This has resulted in a bug though where we don't properly verify refname
availability. When calling `refs_verify_refname_available()`, we always
add the old ref name to the list of refs to be skipped when computing
availability, which indicates that the name would be available even if
it already exists at the current point in time. This is only the right
thing to do for renames though, not for copies.
The consequence of this bug is quite harmless because the reftable
backend has its own checks for D/F conflicts further down in the call
stack, and thus we refuse the update regardless of the bug. But all the
user gets in this case is an uninformative message that copying the ref
has failed, without any further details.
Fix the bug and only add the old name to the skip-list in case we rename
the ref. Consequently, this error case will now be handled by
`refs_verify_refname_available()`, which knows to provide a proper error
message.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Mon, 8 Apr 2024 23:36:05 +0000 (16:36 -0700)]
Makefile(s): do not enforce "all indents must be done with tab"
Our top-level Makefile follows our generic whitespace rule
established by the top-level .gitattributes file that does not
enforce indent-with-non-tab rule by default, but git-gui is set up
to enforce indent-with-non-tab by default. With the upcoming change
to GNU make, we no longer can reject (and worse, "fix") a patch that
adds whitespace indented lines to the Makefile, so loosen the rule
there for git-gui/Makefile, too.
Taylor Blau [Mon, 8 Apr 2024 15:51:44 +0000 (11:51 -0400)]
Makefile(s): avoid recipe prefix in conditional statements
In GNU Make commit 07fcee35 ([SV 64815] Recipe lines cannot contain
conditional statements, 2023-05-22) and following, conditional
statements may no longer be preceded by a tab character (which Make
refers to as the recipe prefix).
There are a handful of spots in our various Makefile(s) which will break
in a future release of Make containing 07fcee35. For instance, trying to
compile the pre-image of this patch with the tip of make.git results in
the following:
$ make -v | head -1 && make
GNU Make 4.4.90
config.mak.uname:842: *** missing 'endif'. Stop.
The kernel addressed this issue in 82175d1f9430 (kbuild: Replace tabs
with spaces when followed by conditionals, 2024-01-28). Address the
issues in Git's tree by applying the same strategy.
When a conditional word (ifeq, ifneq, ifdef, etc.) is preceded by one or
more tab characters, replace each tab character with 8 space characters
with the following:
The "unless /\\$/" removes any false-positives (like "\telse \"
appearing within a shell script as part of a recipe).
After doing so, Git compiles on newer versions of Make:
$ make -v | head -1 && make
GNU Make 4.4.90
GIT_VERSION = 2.44.0.414.gfac1dc44ca9
[...]
$ echo $?
0
Reported-by: Dario Gjorgjevski <dario.gjorgjevski@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Sat, 6 Apr 2024 18:11:12 +0000 (11:11 -0700)]
config: do not leak excludes_file
The excludes_file variable is marked "const char *", but all the
assignments to it are made with a piece of memory allocated just
for it, and the variable is responsible for owning it.
When "core.excludesfile" is read, the code just lost the previous
value, leaking memory. Plug it.
The real problem is that the variable is mistyped; our convention
is to never make a variable that owns the piece of memory pointed
by it as "const". Fixing that would reduce the chance of this kind
of bug happening, and also would make it unnecessary to cast the
constness away while free()ing it, but that would be a much larger
follow-up effort.
Reported-by: Rubén Justo <rjusto@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
To reduce the number of on-disk reftables, compaction is performed.
Contiguous tables with the same binary log value of size are grouped
into segments. The segment that has both the lowest binary log value and
contains more than one table is set as the starting point when
identifying the compaction segment.
Since segments containing a single table are not initially considered
for compaction, if the table appended to the list does not match the
previous table log value, no compaction occurs for the new table. It is
therefore possible for unbounded growth of the table list. This can be
demonstrated by repeating the following sequence:
git branch -f foo
git branch -d foo
Each operation results in a new table being written with no compaction
occurring until a separate operation produces a table matching the
previous table log value.
Instead, to avoid unbounded growth of the table list, the compaction
strategy is updated to ensure tables follow a geometric sequence after
each operation by individually evaluating each table in reverse index
order. This strategy results in a much simpler and more robust algorithm
compared to the previous one while also maintaining a minimal ordered
set of tables on-disk.
When creating 10 thousand references, the new strategy has no
performance impact:
Benchmark 1: update-ref: create refs sequentially (revision = HEAD~)
Time (mean ± σ): 26.516 s ± 0.047 s [User: 17.864 s, System: 8.491 s]
Range (min … max): 26.447 s … 26.569 s 10 runs
Benchmark 2: update-ref: create refs sequentially (revision = HEAD)
Time (mean ± σ): 26.417 s ± 0.028 s [User: 17.738 s, System: 8.500 s]
Range (min … max): 26.366 s … 26.444 s 10 runs
Summary
update-ref: create refs sequentially (revision = HEAD) ran
1.00 ± 0.00 times faster than update-ref: create refs sequentially (revision = HEAD~)
Some tests in `t0610-reftable-basics.sh` assert the on-disk state of
tables and are therefore updated to specify the correct new table count.
Since compaction is more aggressive in ensuring tables maintain a
geometric sequence, the expected table count is reduced in these tests.
In `reftable/stack_test.c` tests related to `sizes_to_segments()` are
removed because the function is no longer needed. Also, the
`test_suggest_compaction_segment()` test is updated to better showcase
and reflect the new geometric compaction behavior.
Signed-off-by: Justin Tobler <jltobler@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In future tests it will be neccesary to create repositories with a set
number of tables. To make this easier, introduce the
`GIT_TEST_REFTABLE_AUTOCOMPACTION` environment variable that, when set
to false, disables autocompaction of reftables.
Signed-off-by: Justin Tobler <jltobler@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/stack: expose option to disable auto-compaction
The reftable stack already has a variable to configure whether or not to
run auto-compaction, but it is inaccessible to users of the library.
There exist use cases where a caller may want to have more control over
auto-compaction.
Move the `disable_auto_compact` option into `reftable_write_options` to
allow external callers to disable auto-compaction. This will be used in
a subsequent commit.
Signed-off-by: Justin Tobler <jltobler@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Sat, 6 Apr 2024 00:28:16 +0000 (17:28 -0700)]
t1016: local VAR="VAL" fix
The series was based on maint and fixes all the tests that exist
there, but we have acquired a few more.
I suspect that the values assigned in many of these places are $IFS
safe, and this is primarily to squelch the linter than adding a
necessary workaround for buggy dash.
Junio C Hamano [Sat, 6 Apr 2024 00:09:02 +0000 (17:09 -0700)]
t: teach lint that RHS of 'local VAR=VAL' needs to be quoted
Teach t/check-non-portable-shell.pl that right hand side of the
assignment done with "local VAR=VAL" need to be quoted. We
deliberately target only VAL that begins with $ so that we can catch
- $variable_reference and positional parameter reference like $4
- $(command substitution)
- ${variable_reference-with_magic}
while excluding
- $'\n' that is a bash-ism freely usable in t990[23]
- $(( arithmetic )) whose result should be $IFS safe.
- $? that also is $IFS safe
Junio C Hamano [Sat, 6 Apr 2024 00:08:58 +0000 (17:08 -0700)]
CodingGuidelines: quote assigned value in 'local var=$val'
Dash bug https://bugs.launchpad.net/ubuntu/+source/dash/+bug/139097
lets the shell erroneously perform field splitting on the expansion
of a command substitution during declaration of a local or an extern
variable.
The explanation was stolen from ebee5580 (parallel-checkout: avoid
dash local bug in tests, 2021-06-06).
Junio C Hamano [Sat, 6 Apr 2024 00:08:57 +0000 (17:08 -0700)]
CodingGuidelines: describe "export VAR=VAL" rule
https://lore.kernel.org/git/201307081121.22769.tboegi@web.de/
resulted in 9968ffff (test-lint: detect 'export FOO=bar',
2013-07-08) to add a rule to t/check-non-portable-shell.pl script to
reject
export VAR=VAL
and suggest us to instead write it as two statements, i.e.,
VAR=VAL
export VAR
This however was not spelled out in the CodingGuidelines document.
We may want to re-evaluate the rule since it is from ages ago, but
for now, let's make the written rule and what the automation
enforces consistent.
Steven Jeuris [Wed, 3 Apr 2024 21:42:44 +0000 (21:42 +0000)]
userdiff: better method/property matching for C#
- Support multi-line methods by not requiring closing parenthesis.
- Support multiple generics (comma was missing before).
- Add missing `foreach`, `lock` and `fixed` keywords to skip over.
- Remove `instanceof` keyword, which isn't C#.
- Also detect non-method keywords not positioned at the start of a line.
- Added tests; none existed before.
The overall strategy is to focus more on what isn't expected for
method/property definitions, instead of what is, but is fully optional.
Signed-off-by: Steven Jeuris <steven.jeuris@gmail.com> Acked-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Fri, 5 Apr 2024 17:44:59 +0000 (19:44 +0200)]
date: make DATE_MODE thread-safe
date_mode_from_type() modifies a static variable and returns a pointer
to it. This is not thread-safe. Most callers of date_mode_from_type()
use it via the macro DATE_MODE and pass its result on to functions like
show_date(), which take a const pointer and don't modify the struct.
Avoid the static storage by putting the variable on the stack and
returning the whole struct date_mode. Change functions that take a
constant pointer to expect the whole struct instead.
Reduce the cost of passing struct date_mode around on 64-bit systems
by reordering its members to close the hole between the 32-bit wide
.type and the 64-bit aligned .strftime_fmt as well as the alignment
hole at the end. sizeof reports 24 before and 16 with this change
on x64. Keep .type at the top to still allow initialization without
designator -- though that's only done in a single location, in
builtin/blame.c.
Signed-off-by: René Scharfe <l.s.r@web.de> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Fri, 5 Apr 2024 18:59:52 +0000 (20:59 +0200)]
usage: report vsnprintf(3) failure
vreportf(), which is used e.g. by die() and warning() by default, calls
vsnprintf(3) to format the message to report. If that call fails, it
only prints the prefix, e.g. "fatal: " or "warning: ". This at least
informs users that they were supposed to get a message and reveals its
severity, but leaves them wondering what it may have been about.
Here's an example where vreportf() tries to print a message with a 2GB
string, which is too much for vsnprintf(3):
At least report the formatting error along with the offending message
(unformatted) to indicate why that message is empty. Use fprintf(3)
instead of error() to get the message out directly and avoid recursing
back into vreportf().
... which allows users to at least get an idea of what went wrong.
Suggested-by: Jeff King <peff@peff.net> Helped-by: Jeff King <peff@peff.net> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 5 Apr 2024 20:04:51 +0000 (16:04 -0400)]
remote-curl: add Transfer-Encoding header only for older curl
As of curl 7.66.0, we don't need to manually specify a "chunked"
Transfer-Encoding header. Instead, modern curl deduces the need for it
in a POST that has a POSTFIELDSIZE of -1 and uses READFUNCTION rather
than POSTFIELDS.
That version is recent enough that we can't just drop the header; we
need to do so conditionally. Since it's only a single line, it seems
like the simplest thing would just be to keep setting it unconditionally
(after all, the #ifdefs are much longer than the actual code). But
there's another wrinkle: HTTP/2.
Curl may choose to use HTTP/2 under the hood if the server supports it.
And in that protocol, we do not use the chunked encoding for streaming
at all. Most versions of curl handle this just fine by recognizing and
removing the header. But there's a regression in curl 8.7.0 and 8.7.1
where it doesn't, and large requests over HTTP/2 are broken (which t5559
notices). That regression has since been fixed upstream, but not yet
released.
Make the setting of this header conditional, which will let Git work
even with those buggy curl versions. And as a bonus, it serves as a
reminder that we can eventually clean up the code as we bump the
supported curl versions.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Fri, 5 Apr 2024 17:49:49 +0000 (10:49 -0700)]
Merge branch 'jk/core-comment-string'
core.commentChar used to be limited to a single byte, but has been
updated to allow an arbitrary multi-byte sequence.
* jk/core-comment-string:
config: add core.commentString
config: allow multi-byte core.commentChar
environment: drop comment_line_char compatibility macro
wt-status: drop custom comment-char stringification
sequencer: handle multi-byte comment characters when writing todo list
find multi-byte comment chars in unterminated buffers
find multi-byte comment chars in NUL-terminated strings
prefer comment_line_str to comment_line_char for printing
strbuf: accept a comment string for strbuf_add_commented_lines()
strbuf: accept a comment string for strbuf_commented_addf()
strbuf: accept a comment string for strbuf_stripspace()
environment: store comment_line_char as a string
strbuf: avoid shadowing global comment_line_char name
commit: refactor base-case of adjust_comment_line_char()
strbuf: avoid static variables in strbuf_add_commented_lines()
strbuf: simplify comment-handling in add_lines() helper
config: forbid newline as core.commentChar
Junio C Hamano [Fri, 5 Apr 2024 17:49:49 +0000 (10:49 -0700)]
Merge branch 'rs/config-comment'
"git config" learned "--comment=<message>" option to leave a
comment immediately after the "variable = value" on the same line
in the configuration file.
* rs/config-comment:
config: allow tweaking whitespace between value and comment
config: fix --comment formatting
config: add --comment option to add a comment
Junio C Hamano [Fri, 5 Apr 2024 17:34:23 +0000 (10:34 -0700)]
Merge branch 'ps/pack-refs-auto' into jt/reftable-geometric-compaction
* ps/pack-refs-auto:
builtin/gc: pack refs when using `git maintenance run --auto`
builtin/gc: forward git-gc(1)'s `--auto` flag when packing refs
t6500: extract objects with "17" prefix
builtin/gc: move `struct maintenance_run_opts`
builtin/pack-refs: introduce new "--auto" flag
builtin/pack-refs: release allocated memory
refs/reftable: expose auto compaction via new flag
refs: remove `PACK_REFS_ALL` flag
refs: move `struct pack_refs_opts` to where it's used
t/helper: drop pack-refs wrapper
refs/reftable: print errors on compaction failure
reftable/stack: gracefully handle failed auto-compaction due to locks
reftable/stack: use error codes when locking fails during compaction
reftable/error: discern locked/outdated errors
reftable/stack: fix error handling in `reftable_stack_init_addition()`
When parsing config keys, the normal pattern is to return 0 after
completing the logic for a specific config key, since no other key will
match. One instance, for "submodule.recurse", was missing this case in
builtin/fetch.c.
This is a very minor change, and will have minimal impact to
performance. This particular block was edited recently in 56e8bb4fb4
(fetch: use `fetch_config` to store "fetch.recurseSubmodules" value,
2023-05-17), which led to some hesitation that perhaps this omission was
on purpose.
However, no later cases within git_fetch_config() will match the key if
equal to "submodule.recurse" and neither will any key matches within the
catch-all git_default_config().
Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Fri, 5 Apr 2024 10:53:23 +0000 (12:53 +0200)]
apply: avoid fixed-size buffer in create_one_file()
PATH_MAX is not always a hard limit and 'path' in create_one_file()
could be longer -- it's taken from the patch file and allocated
dynamically. Allocate the name of the temporary file on the heap as
well instead of using a fixed-size buffer to avoid that arbitrary limit.
Resist the temptation of using the more convenient mkpath() to avoid
introducing a dependency on a static variable deep inside the apply
machinery.
Take care to work around (arguably buggy) implementations of free(3)
that modify errno, by calling it only after using the errno value.
Suggested-by: Jeff King <peff@peff.net> Helped-by: Jeff King <peff@peff.net> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/add: error out when passing untracked path with -u
When passing untracked path with -u option, it silently succeeds.
There is no error message and the exit code is zero. This is
inconsistent with other instances of git commands where the expected
argument is a known path. In those other instances, we error out when
the path is not known.
Fix this by passing a character array to add_files_to_cache() to
collect the pathspec matching information and report the error if a
pathspec does not match any cache entry. Also add a testcase to cover
this scenario.
Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/commit: error out when passing untracked path with -i
When we provide a pathspec which does not match any tracked path
alongside --include, we do not error like without --include. If there
is something staged, it will commit the staged changes and ignore the
pathspec which does not match any tracked path. And if nothing is
staged, it will print the status. Exit code is 0 in both cases (unlike
without --include). This is also described in the TODO comment before
the relevant testcase.
Fix this by passing a character array to add_files_to_cache() to
collect the pathspec matching information and error out if the given
path is untracked. Also, amend the testcase to check for the error
message and remove the TODO comment.
Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 3 Apr 2024 18:14:48 +0000 (23:44 +0530)]
revision: optionally record matches with pathspec elements
Unlike "git add" and other end-user facing commands, where it is
diagnosed as an error to give a pathspec with an element that does
not match any path, the diff machinery does not care if some
elements of the pathspec do not match. Given that the diff
machinery is heavily used in pathspec-limited "git log" machinery,
and it is common for a path to come and go while traversing the
project history, this is usually a good thing.
However, in some cases we would want to know if all the pathspec
elements matched. For example, "git add -u <pathspec>" internally
uses the machinery used by "git diff-files" to decide contents from
what paths to add to the index, and as an end-user facing command,
"git add -u" would want to report an unmatched pathspec element.
Add a new .ps_matched member next to the .prune_data member in
"struct rev_info" so that we can optionally keep track of the use of
.prune_data pathspec elements that can be inspected by the caller.
Windows 10 build 17063 introduced support for unix sockets to Windows. bb390b1 (git-compat-util: include declaration for unix sockets in
windows, 2021-09-14) introduced a way to build git with unix socket
support on Windows, but you still had to decide at build time which
Windows version the compiled executable was supposed to run on.
We can detect at runtime wether the operating system supports unix
sockets and act accordingly for all supported Windows versions.
This fixes https://github.com/git-for-windows/git/issues/3892
Signed-off-by: Matthias Aßhauer <mha1993@live.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 3 Apr 2024 17:56:20 +0000 (10:56 -0700)]
Merge branch 'bl/cherry-pick-empty'
Allow git-cherry-pick(1) to automatically drop redundant commits via
a new `--empty` option, similar to the `--empty` options for
git-rebase(1) and git-am(1). Includes a soft deprecation of
`--keep-redundant-commits` as well as some related docs changes and
sequencer code cleanup.
* bl/cherry-pick-empty:
cherry-pick: add `--empty` for more robust redundant commit handling
cherry-pick: enforce `--keep-redundant-commits` incompatibility
sequencer: do not require `allow_empty` for redundant commit options
sequencer: handle unborn branch with `--allow-empty`
rebase: update `--empty=ask` to `--empty=stop`
docs: clean up `--empty` formatting in git-rebase(1) and git-am(1)
docs: address inaccurate `--empty` default with `--exec`
Junio C Hamano [Wed, 3 Apr 2024 17:56:20 +0000 (10:56 -0700)]
Merge branch 'bl/pretty-shorthand-config-fix'
The "--pretty=<shortHand>" option of the commands in the "git log"
family, defined as "[pretty] shortHand = <expansion>" should have
been looked up case insensitively, but was not, which has been
corrected.
* bl/pretty-shorthand-config-fix:
pretty: find pretty formats case-insensitively
pretty: update tests to use `test_config`
Junio C Hamano [Wed, 3 Apr 2024 17:56:19 +0000 (10:56 -0700)]
Merge branch 'ds/grep-doc-updates'
Documentation updates.
* ds/grep-doc-updates:
grep docs: describe --no-index further and improve formatting a bit
grep docs: describe --recurse-submodules further and improve formatting a bit
Junio C Hamano [Wed, 3 Apr 2024 17:56:19 +0000 (10:56 -0700)]
Merge branch 'jc/release-notes-entry-experiment'
Introduce an experimental protocol for contributors to propose the
topic description to be used in the "What's cooking" report, the
merge commit message for the topic, and in the release notes and
document it in the SubmittingPatches document.
The implementation and documentation of "object-format" option
exchange between the Git itself and its remote helpers did not
quite match, which has been corrected.
* jk/remote-helper-object-format-option-fix:
transport-helper: send "true" value for object-format option
transport-helper: drop "object-format <algo>" option
transport-helper: use write helpers more consistently
Taylor Blau [Tue, 2 Apr 2024 16:26:34 +0000 (12:26 -0400)]
t/t7700-repack.sh: fix test breakages with `GIT_TEST_MULTI_PACK_INDEX=1 `
There are a handful of related test breakages which are found when
running t/t7700-repack.sh with GIT_TEST_MULTI_PACK_INDEX set to "1" in
your environment.
Both test failures are the result of something like:
, where we repack instructing Git to write a new MIDX and corresponding
MIDX bitamp.
The error occurs when GIT_TEST_MULTI_PACK_INDEX=1 is found in the
enviornment. This causes Git to write out a second MIDX (after
processing the builtin's `--write-midx` argument) which is identical to
the first, but does not request a bitmap (since we did not set the
GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP variable in the environment).
Since c528e179662 (pack-bitmap: write multi-pack bitmaps, 2021-08-31),
the MIDX machinery will drop an existing MIDX bitmap when rewriting an
identical MIDX which does not itself request a corresponding bitmap,
which is similar to the way repack itself behaves in the pack-bitmap
case.
Correct these issues (which date back to [1] and [2], respectively) by
explicitly setting GIT_TEST_MULTI_PACK_INDEX to zero before running each
command.
In the future, we should consider removing GIT_TEST_MULTI_PACK_INDEX,
and in general clean up unused GIT_TEST_-variables. But that is a larger
effort, and this ensures that we can cleanly run:
$ GIT_TEST_MULTI_PACK_INDEX=1 make test
in the meantime.
[1]: 324efc90d1b (builtin/repack.c: pass `--refs-snapshot` when writing
bitmaps, 2021-10-01)
[2]: 197443e80ab (repack: don't remove .keep packs with
`--pack-kept-objects`, 2022-10-17).
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/block: avoid decoding keys when searching restart points
When searching over restart points in a block we decode the key of each
of the records, which results in a memory allocation. This is quite
pointless though given that records it restart points will never use
prefix compression and thus store their keys verbatim in the block.
Refactor the code so that we can avoid decoding the keys, which saves us
some allocations.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/record: extract function to decode key lengths
We're about to refactor the binary search over restart points so that it
does not need to fully decode the record keys anymore. To do so we will
need to decode the record key lengths, which is non-trivial logic.
Extract the logic to decode these lengths from `refatble_decode_key()`
so that we can reuse it.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/block: fix error handling when searching restart points
When doing the binary search over restart points in a block we need to
decode the record keys. This decoding step can result in an error when
the block is corrupted, which we indicate to the caller of the binary
search by setting `args.error = 1`. But the only caller that exists
mishandles this because it in fact performs the error check before
calling `binsearch()`.
Fix this bug by checking for errors at the right point in time.
Furthermore, refactor `binsearch()` so that it aborts the search in case
the callback function returns a negative value so that we don't
needlessly continue to search the block.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/block: refactor binary search over restart points
When seeking a record in our block reader we perform a binary search
over the block's restart points so that we don't have to do a linear
scan over the whole block. The logic to do so is quite intricate though,
which makes it hard to understand.
Improve documentation and rename some of the functions and variables so
that the code becomes easier to understand overall. This refactoring
should not result in any change in behaviour.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/refname: refactor binary search over refnames
It is comparatively hard to understand how exactly the binary search
over refnames works given that the function and variable names are not
exactly easy to grasp. Rename them to make this more obvious. This
should not result in any change in behaviour.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `binsearch()` test is somewhat weird in that it doesn't explicitly
spell out its expectations. Instead it does so in a rather ad-hoc way
with some hard-to-understand computations.
Refactor the test to spell out the needle as well as expected index for
all testcases. This refactoring highlights that the `binsearch_func()`
is written somewhat weirdly to find the first integer smaller than the
needle, not smaller or equal to it. Adjust the function accordingly.
While at it, rename the callback function to better convey its meaning.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/basics: fix return type of `binsearch()` to be `size_t`
The `binsearch()` function can be used to find the first element for
which a callback functions returns a truish value. But while the array
size is of type `size_t`, the function in fact returns an `int` that is
supposed to index into that array.
Fix the function signature to return a `size_t`. This conversion does
not change any semantics given that the function would only ever return
a value in the range `[0, sz]` anyway.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>