shejialuo [Thu, 27 Feb 2025 16:06:24 +0000 (00:06 +0800)]
packed-backend: check whether the "packed-refs" is regular file
Although "git-fsck(1)" and "packed-backend.c" will check some
consistency and correctness of "packed-refs" file, they never check the
filetype of the "packed-refs". Let's verify that the "packed-refs" has
the expected filetype, confirming it is created by "git pack-refs"
command.
We could use "open_nofollow" wrapper to open the raw "packed-refs" file.
If the returned "fd" value is less than 0, we could check whether the
"errno" is "ELOOP" to report an error to the user. And then we use
"fstat" to check whether the "packed-refs" file is a regular file.
Reuse "FSCK_MSG_BAD_REF_FILETYPE" fsck message id to report the error to
the user if "packed-refs" is not a regular file.
Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
shejialuo [Thu, 27 Feb 2025 16:06:06 +0000 (00:06 +0800)]
builtin/refs: get worktrees without reading head information
In "packed-backend.c", there are some functions such as "create_snapshot"
and "next_record" which would check the correctness of the content of
the "packed-ref" file. When anything is bad, the program will die.
It may seem that we have nothing relevant to above feature, because we
are going to read and parse the raw "packed-ref" file without creating
the snapshot and using the ref iterator to check the consistency.
However, when using "get_worktrees" in "builtin/refs", we would parse
the "HEAD" information. If the referent of the "HEAD" is inside the
"packed-ref", we will call "create_snapshot" function to parse the
"packed-ref" to get the information. No matter whether the entry of
"HEAD" in "packed-ref" is correct, "create_snapshot" would call
"verify_buffer_safe" to check whether there is a newline in the last
line of the file. If not, the program will die.
Although this behavior has no harm for the program, it will
short-circuit the program. When the users execute "git refs verify" or
"git fsck", we should avoid reading the head information, which may
execute the read operation in packed backend with stricter checks to die
the program. Instead, we should continue to check other parts of the
"packed-refs" file completely.
Fortunately, in 465a22b338 (worktree: skip reading HEAD when repairing
worktrees, 2023-12-29), we have introduced a function
"get_worktrees_internal" which allows us to get worktrees without
reading head information.
Create a new exposed function "get_worktrees_without_reading_head", then
replace the "get_worktrees" in "builtin/refs" with the new created
function.
Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
shejialuo [Thu, 27 Feb 2025 16:05:55 +0000 (00:05 +0800)]
t0602: use subshell to ensure working directory unchanged
For every test, we would execute the command "cd repo" in the first but
we never execute the command "cd .." to restore the working directory.
However, it's either not a good idea use above way. Because if any test
fails between "cd repo" and "cd ..", the "cd .." will never be reached.
And we cannot correctly restore the working directory.
Let's use subshell to ensure that the current working directory could be
restored to the correct path.
Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitlab-ci: fix "msvc-meson" test job succeeding despite test failures
We have recently noticed that the "msvc-meson" test job in GitLab CI
succeeds even if there are failures. This is somewhat puzzling because
we use exactly the same command as we do on GitHub Actions, and there
the jobs fail as exected.
As it turns out, this is another weirdness of the GitLab CI hosted
runner for Windows [1]: by default, even successful commands will not
make the job fail. Interestingly though, this depends on what exactly
the command is that you're running -- the MinGW-based job for example
works alright and does fail as expected.
The root cause here seems to be specific behaviour of PowerShell. The
invocation of `ForEach-Object` does not bubble up any errors in case the
invocation of `meson test` fails, and thus we don't notice the error.
This is specific to executing the command in a loop: other build steps
where we execute commands directly fail as expected.
This is because the specific version of PowerShell that we use in the
runner does not know about `PSNativeCommandUseErrorActionPreference`
yet, which controls whether native commands like "meson.exe" honor the
`ErrorActionPreference` variable. The preference has been introduced
with PowerShell 7.3 and is default-enabled since PowerShell 7.4, but
GitLab's hosted runners still seem to use PowerShell 5.1. Consequently,
when tests fail, we won't bubble up the error at all from the loop and
thus the job doesn't fail. This isn't an issue in other cases though
where we execute native commands directly, as the GitLab runner knows to
check the last error code after every command.
The same thing doesn't seem to be an issue on GitHub Actions, most
likely because it uses PowerShell 7.4. Curiously, the preference for
`PSNativeCommandUseErrorActionPreference` is disabled there, but the
jobs fail as expected regardless of that. It's puzzling, but I do not
have enough PowerShell expertise to give a definitive answer as to why
it works there.
In any case, Meson 1.8 will likely get support for slicing tests [1], so
we can eventually get rid of the whole PowerShell script. For now, work
around the issue by explicitly exiting out of the loop with a non-zero
error code if we see that Meson has failed.
gitlab-ci: restrict maximum number of link jobs on Windows
The hosted Windows runners on GitLab.com only have 7.5GB of RAM. Given
that "link.exe" provided by Microsoft Visual Studio is multi-threaded by
itself already and thus quite memory hungry this can quickly lead to
memory starvation, out-of-memory situations and thus failed CI jobs.
Fix the issue by limiting the number of concurrent linker jobs. The same
issue hasn't been observed on GitHub Actions yet, probably because it
got more than twice the amount of RAM with 16GB.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
meson: consistently use custom program paths to resolve programs
The calls to `find_program()` in our documentation don't use our custom
program path. This variable gets populated on Windows with the location
of Git for Windows so that we can use it to provide our build tools.
Consequently, we may not be able to find all necessary binaries on
Windows.
Adapt the calls to use the program path to fix this. While at it, drop
`required: true` arguments, which are the default anyway.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're assigning the `git` variable in three places:
- In "meson.build" to store the external Git executable.
- In "meson.build" to store the compiled Git executable.
- In "Documentation/meson.build" to store the external Git executable,
a second time.
The last case is only needed because we overwrite the original variable
with the built version. Rename the variable used for the built Git
executable so that we don't have to resolve the external Git executable
multiple times.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're searching for the sed(1) executable in a loop, which will make us
try to find it multiple times. Starting with the preceding commit we
already declare a variable for that program in the top-level build file.
Use it so that we only need to search for the program once.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
meson: improve handling of `sane_tool_path` option
The `sane_tool_path` option can be used to override the PATH variable
from which the build process, tests and ultimately Git will end up
picking programs from. It is currently lacking though because we only
use it to populate the PATH environment variable for executed scripts
and for the `BROKEN_PATH_FIX` mechanism, but we don't use it to find
programs used in the build process itself.
Fix this issue by treating it similar to the Windows-specific paths,
which will make us use it both to find programs and to populate the PATH
environment variable.
To help with this fix, change the type of the option to be an array of
paths, which makes the handling a bit easier for us. It's also the
correct thing to do as the input indeed is a list of paths.
Furthermore, the option now overrides the default behaviour on Windows,
which si to pick up tools from Git for Windows. This is done so that it
becomes easier to override that default behaviour in case it's not
desired.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When locating programs required for the build we give some special
treatment to Windows systems so that we know to also look up tools
provided by a Git for Windows installation. This ensures that the build
doesn't have any prerequisites other than Microsoft Visual Studio, Meson
and Git for Windows.
Consequently, some of the programs returned by `find_program()` may not
be found via PATH, but via these extra directories. But while Meson can
use these tools directly without any special treatment, any scripts that
we execute may not be able to find those programs. To help them we thus
prepend the directories of a subset of the found programs to PATH.
This doesn't make much sense though: we don't need to prepend PATH for
any program that was found via PATH, but we really only need to do so
for programs located via the extraneous Windows-specific paths. So
instead of prepending all programs paths, we really only need to prepend
the Windows-specific paths.
Adapt the code accordingly by only prepeding Windows-specific paths to
PATH, which both simplifies the code and clarifies intent.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When building `libgit.a` we link it against a `libgit_version.a` library
that contains the version information that we inject at build time. The
intent of this is to avoid rebuilding all of `libgit.a` whenever the
version changes. But that wouldn't happen in the first place, as we know
to just rebuild the files that depend on the generated "version-def.h"
file.
This is an artifact of an earlier version of the Meson build infra that
didn't ultimately land. We didn't yet have "version-def.h", and instead
injected the version via preprocessor directives. And here we would have
rebuilt all of `libgit.a` indeed in case the version changes, because
the preprocessor directive applied to all files.
Stop building the separate library and instead add "version-def.h" to
the list of source files directly.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We set up libcurl via the `libgit_dependencies` variable, which gets
propagated into every user of the `libgit` dependency. This is not
necessary though, as most of our executables aren't even supposed to
link against libcurl.
Fix this by only propagating include directories as a libgit dependency
and propagating the full curl dependency via `libgit_curl`.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We've got a set of common source files that we use for those executables
that link against libcurl. The setup is somewhat repetitive though.
Simplify it by declaring a `libgit_curl` dependency that bundles all of
it together.
Note that we don't include curl itself as a dependency. This is because
we already pull it in transitively via the libgit dependency, which is
unfortunate because libgit itself shouldn't actually link against curl
in the first place. This will get fixed in the next commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "common-main.c" file is used by multiple executables. In order to
make it easy to set it up we have created a separate library that these
executables can link against. All of these executables also want to link
against `libgit.a` though, which makes it necessary to specify both of
these as dependencies for every executable.
Simplify this a bit by declaring the library as a source dependency:
instead of creating a static library, we now instead compile the common
set of files into each executable separately.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When setting up `libgit.a` we first create the static library itself,
and then declare it as part of a dependency such that compile arguments,
include directories and transitive dependencies get propagated to the
users of that library. As such, the static library isn't expected to be
used by anything but the declared dependency.
Inline the static library so that we don't even use a separate variable
for it. This avoids any kind of confusion that may arise and clarifies
how the library is supposed to be used.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
meson: fix OpenSSL fallback when not explicitly required
When OpenSSL isn't provided by the system we know to fall back to the
subproject wrapper. This is especially helpful on Windows systems, where
you typically don't have OpenSSL available, in order to reduce the
number of required dependencies.
The fallback is broken though when the OpenSSL backend is set to 'auto'
as we end up calling `dependency('openssl', required: false)` in that
case, which implicitly disables falling back to the wrapper.
Fix the issue by re-allowing the fallback in case either OpenSSL is
required or in case the backend is set to 'auto'. While at it, fix
reporting of the backend in case the user asked us to pick no HTTPS
backend at all.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the runtime prefix option is enabled, Git is built such that it
knows to locate its binaries relative to the directory a binary is being
executed from. This requires us to figure out relative paths, which is
handled in `system_prefix()` by trying to strip a couple of well-known
paths.
One of these paths, GIT_EXEC_PATH, is expected to be absolute when
runtime prefixes are enabled, but relative otherwise. And while our
Makefile gets this correctly, in Meson we always wire up the absolute
path, which may result in us not being able to find binaries.
Fix this by conditionally injecting the paths depending on whether or
not the `runtime_prefix` option is enabled.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Submodule merges are, in general, similar to other merges based on oid
three-way-merge. When a conflict happens, however, Git has two special
cases (introduced in 68d03e4a6e44) on handling the conflict before
yielding it to the user. From the merge-ort and merge-recursive sources:
- "Case #1: a is contained in b or vice versa": both strategies try to
perform a fast-forward in the submodules if the commit referred by the
conflicted submodule is descendant of another;
- "Case #2: There are one or more merges that contain a and b in the
submodule. If there is only one, then present it as a suggestion to the
user, but leave it marked unmerged so the user needs to confirm the
resolution."
Add a small paragraph on merge-strategies.adoc describing this behavior.
Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 25 Feb 2025 23:45:42 +0000 (15:45 -0800)]
BreakingChanges: clarify branches/ and remotes/
As we have created an empty .git/branches/ hierarchy until fairly
recently, these directories may be found in modern repositories, but
it is highly unlikely that they are being used.
Reported-by: Jakub Wilk <jwilk@jwilk.net> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 25 Feb 2025 22:19:36 +0000 (14:19 -0800)]
Merge branch 'ad/set-default-target-in-makefiles'
Correct the default target in Documentation/Makefile, and
future-proof all Makefiles from similar breakages by declaring the
default target (which happens to be "all") upfront.
* ad/set-default-target-in-makefiles:
Makefile: set default goals in makefiles
Junio C Hamano [Tue, 25 Feb 2025 22:19:36 +0000 (14:19 -0800)]
Merge branch 'pw/merge-tree-stdin-deadlock-fix'
"git merge-tree --stdin" has been improved (including a workaround
for a deadlock).
* pw/merge-tree-stdin-deadlock-fix:
merge-tree: fix link formatting in html docs
merge-tree: improve docs for --stdin
merge-tree: only use basic merge config
merge-tree: remove redundant code
merge-tree --stdin: flush stdout to avoid deadlock
Junio C Hamano [Tue, 25 Feb 2025 22:19:35 +0000 (14:19 -0800)]
Merge branch 'da/xdiff-w-sign-compare-workaround'
Noises from "-Wsign-compare" in the borrowed xdiff code has been
squelched.
* da/xdiff-w-sign-compare-workaround:
xdiff: avoid signed vs. unsigned comparisons in xutils.c
xdiff: avoid signed vs. unsigned comparisons in xpatience.c
xdiff: avoid signed vs. unsigned comparisons in xhistogram.c
xdiff: avoid signed vs. unsigned comparisons in xemit.c
xdiff: avoid signed vs. unsigned comparisons in xdiffi.c
xdiff: move sign comparison warning guard into each file
Seyi Kuforiji [Tue, 25 Feb 2025 10:10:44 +0000 (11:10 +0100)]
t/unit-tests: convert oidtree test to use clar test framework
Adapt oidtree test script to clar framework by using clar assertions
where necessary. `cl_parse_any_oid()` ensures the hash algorithm is set
before parsing. This prevents issues from an uninitialized or invalid
hash algorithm.
Introduce 'test_oidtree__initialize` handles the to set up of the global
oidtree variable and `test_oidtree__cleanup` frees the oidtree when all
tests are completed.
With this change, `check_each` stops at the first error encountered,
making it easier to address it.
Mentored-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Seyi Kuforiji [Tue, 25 Feb 2025 10:10:43 +0000 (11:10 +0100)]
t/unit-tests: convert oidmap test to use clar test framework
Adapt oidmap test script to clar framework by using clar assertions
where necessary. `cl_parse_any_oid()` ensures the hash algorithm is set
before parsing. This prevents issues from an uninitialized or invalid
hash algorithm.
Introduce 'test_oidmap__initialize` handles the to set up of the global
oidmap map with predefined key-value pairs, and `test_oidmap__cleanup`
frees the oidmap and its entries when all tests are completed.
The test loops through all entries to detect multiple errors. With this
change, it stops at the first error encountered, making it easier to
address it.
Mentored-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Seyi Kuforiji [Tue, 25 Feb 2025 10:10:42 +0000 (11:10 +0100)]
t/unit-tests: convert oid-array test to use clar test framework
Adapt oid-array test script to clar framework by using clar assertions
where necessary. Remove descriptions from macros to reduce
redundancy, and move test input arrays to global scope for reuse across
multiple test functions. Introduce `test_oid_array__initialize()` to
explicitly initialize the hash algorithm.
These changes streamline the test suite, making individual tests
self-contained and reducing redundant code.
Mentored-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Seyi Kuforiji [Tue, 25 Feb 2025 10:10:41 +0000 (11:10 +0100)]
t/unit-tests: implement clar specific oid helper functions
`get_oid_arbitrary_hex()` and `init_hash_algo()` are both required for
oid-related tests to run without errors. In the current implementation,
both functions are defined and declared in the
`t/unit-tests/lib-oid.{c,h}` which is utilized by oid-related tests in
the homegrown unit tests structure.
Adapt functions in lib-oid.{c,h} to use clar. Both these functions
become available for oid-related test files implemented using the clar
testing framework, which requires them. This will be used by subsequent
commits.
Mentored-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 25 Feb 2025 06:34:21 +0000 (01:34 -0500)]
unpack_loose_rest(): rewrite return handling for clarity
We have a pattern like:
if (error1)
...handle error 1...
else if (error2)
...handle error 2...
else
...return buf...
...free buf and return NULL...
This is a little subtle because it is the return in the success block
that lets us skip the common error handling. Rewrite this instead to
free the buffer in each error path, marking it as NULL, and then all
code paths can use the common return.
This should make the logic a bit easier to follow. It does mean
duplicating the buf cleanup for errors, but it's a single line.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 25 Feb 2025 06:33:51 +0000 (01:33 -0500)]
unpack_loose_rest(): simplify error handling
Inflating a loose object is considered successful only if we got
Z_STREAM_END and there were no more bytes. We check both of those
conditions and return success, but then have to check them a second time
to decide which error message to produce.
I.e., we do something like this:
if (!error_1 && !error_2)
...return success...
if (error_1)
...handle error1...
else if (error_2)
...handle error2...
...common error handling...
This repetition was the source of a small bug fixed in an earlier commit
(our Z_STREAM_END check was not the same in the two conditionals).
Instead we can chain them all into a single if/else cascade, which
avoids repeating ourselves:
if (error_1)
...handle error1...
else if (error_2)
...handle error2....
else
...return success...
...common error handling...
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 25 Feb 2025 06:33:12 +0000 (01:33 -0500)]
unpack_loose_rest(): never clean up zstream
The unpack_loose_rest() function has funny ownership semantics: we pass
in a z_stream opened by the caller, but then only _sometimes_ close it.
This oddity has developed over time. When the function was originally
split out in 5180cacc20 (Split up unpack_sha1_file() some more,
2005-06-02), it always called inflateEnd() to clean up the stream
(though nowadays it is a git_zstream and we call git_inflate_end()).
But in 7efbff7531 (unpack_sha1_file(): detect corrupt loose object
files., 2007-03-05) we added error code paths which don't close the
stream. This makes some sense, as we'd still look at parts of the stream
struct to decide which error to show (though I am not sure in practice
if inflateEnd() even touches those fields).
This subtlety makes it hard to know when the caller has to clean up the
stream and when it does not. That led to the leak fixed by aa9ef614dc
(object-file: fix memory leak when reading corrupted headers,
2024-08-14).
Let's instead always leave the stream intact, forcing the caller to
clean it up. You might think that would create more work for the
callers, but it actually ends up simplifying them, since they can put
the call to git_inflate_end() in the common cleanup code path.
Two things to note, though:
- The check_stream_oid() function is used as a replacement for
unpack_loose_rest() in read_loose_object() to read blobs. It
inherited the same funny semantics, and we should fix it here, too
(to keep the cleanup in read_loose_object() consistent).
- In read_loose_object() we need a second "out" label, as we can jump
to the existing label before opening the stream at all (and since
the struct is opaque, there is no way to if it was initialized or
not, so we must not call git_inflate_end() in that case).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 25 Feb 2025 06:31:15 +0000 (01:31 -0500)]
unpack_loose_rest(): avoid numeric comparison of zlib status
When unpacking the actual content of a loose object file, we insist both
that the status code we got is Z_STREAM_END, and that we consumed all
bytes.
If we didn't, we'll return an error, but the specific error message we
produce depends on which of the two error conditions we saw. So we'll
check both a second time to decide which error to produce. But this
second time, our status code check is loose: it checks for a negative
status value.
This can get confused by zlib codes which are not negative, such as
Z_NEED_DICT. In this case we'd erroneously print nothing at all, when we
should say "corrupt loose object".
Instead, this second check should check explicitly against Z_STREAM_END.
Note that Z_OK is "0", so the existing code also produced no message for
Z_OK. But it's impossible to see that status, since we only break out of
the inflate loop when we stop seeing Z_OK (so a stream which has more
bytes than its object header claims would eventually yield Z_BUF_ERROR).
There's no test here, as it would require a loose object whose zlib
stream returns Z_NEED_DICT in the middle of the object content. I think
that is probably possible, but even our Z_NEED_DICT test in t1006 does
not trigger this, because we hit that error while reading the header. I
found this bug while reviewing all callers of git_inflate() for bugs
similar to the one we saw in unpack_loose_header(). This was the only
other case that did a numeric comparison rather than explicitly checking
for Z_STREAM_END.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 25 Feb 2025 06:30:56 +0000 (01:30 -0500)]
unpack_loose_header(): avoid numeric comparison of zlib status
When unpacking a loose header, we try to inflate the first 32 bytes.
We'd expect either Z_OK (we filled up the output buffer, but there are
more bytes in the object) or Z_STREAM_END (this is a tiny object whose
header and content fit in the buffer).
We check for that with "if (status < Z_OK)", making the assumption that
all of the errors we'd see have negative values (as Z_OK itself is "0",
and Z_STREAM_END is "1").
But there's at least one case this misses: Z_NEED_DICT is "2". This
isn't something we'd ever expect to see, but if we do see it, we should
consider it an error (since we have no dictionary to load).
Instead, the current code interprets Z_NEED_DICT as success and looks
for the object header's terminating NUL in the bytes we've read. This
will generaly be zero bytes if the dictionary is mentioned at the start
of the stream. So we'll fail to find it and complain "the header is too
long" (ULHR_LONG). But really, the problem is that the object is
malformed, and we should return ULHR_BAD.
This is a minor bug, as we consider both cases to be an error. But it
does mean we print the wrong error message. The test case added in the
previous patch triggers this code, so we can just confirm the error
message we see here.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 25 Feb 2025 06:30:26 +0000 (01:30 -0500)]
git_inflate(): skip zlib_post_call() sanity check on Z_NEED_DICT
This fixes a case where malformed object input can cause us to hit a
BUG() call in the git-zlib.c code.
The zlib format allows the use of preset dictionaries to reduce the size
of deflated data. The checksum of the dictionary is computed by the
deflate code and goes into the stream. On the inflating side, zlib sees
the dictionary checksum and returns Z_NEED_DICT, asking the caller to
provide the dictionary data via inflateSetDictionary().
This should never happen in Git, because we never provide a dictionary
for deflating (and if we get a stream that mentions a dictionary, we
have no idea how to provide it). So normally Z_NEED_DICT is a hard error
for us. But something interesting happens if we _do_ happen to see it
(e.g., because of a corrupt or malicious input).
In git_inflate() as we loop over calls to zlib's inflate(), we translate
between our large-integer git_zstream values and zlib's native z_stream
types, copying in and out with zlib_pre_call() and zlib_post_call(). In
zlib_post_call() we have a few sanity checks, including one that checks
that the number of bytes consumed by zlib (as measured by it moving the
"next_in" pointer) is equal to the movement of its "total_in" count.
But these do not correspond when we see Z_NEED_DICT! Zlib consumes the
bytes from the input buffer but it does not increment total_in. And so
we hit the BUG("total_in mismatch") call.
There are a few options here:
- We could ditch that BUG() check. It is making too many assumptions
about how zlib updates these values. But it does have value in most
cases as a sanity check on the values we're copying.
- We could skip the zlib_post_call() entirely when we see Z_NEED_DICT.
We know that it's hard error for us, so we should just send the
status up the stack and let the caller bail.
The downside is that if we ever did want to support dictionaries,
we couldn't (the git_zstream will be out of sync, since we never
copied its values back from the z_stream).
- We could continue to call zlib_post_call(), but skip just that BUG()
check if the status is Z_NEED_DICT. This keeps git_inflate() as a
thin wrapper around inflate(), and would let us later support
dictionaries for some calls if we wanted to.
This patch uses the third approach. It seems like the least-surprising
thing to keep git_inflate() a close to inflate() as possible. And while
it makes the diff a bit larger (since we have to pass the status down to
to the zlib_post_call() function), it's a static local function, and
every caller by definition will have just made a zlib call (and so will
have a status integer).
Co-authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 25 Feb 2025 06:29:58 +0000 (01:29 -0500)]
unpack_loose_header(): fix infinite loop on broken zlib input
When reading a loose object, we first try to expand the first 32 bytes
to read the type+size header. This is enough for any of the normal Git
types. But since 46f034483e (sha1_file: support reading from a loose
object of unknown type, 2015-05-03), the caller can also ask us to parse
any unknown names, which can be much longer. In this case we keep
inflating until we find the NUL at the end of the header, or hit
Z_STREAM_END.
But what if zlib can't make forward progress? For example, if the loose
object file is truncated, we'll have no more data to feed it. It will
return Z_BUF_ERROR, and we'll just loop infinitely, calling
git_inflate() over and over but never seeing new bytes nor an
end-of-stream marker.
We can fix this by only looping when we think we can make forward
progress. This will always be Z_OK in this case. In other code we might
also be able to continue on Z_BUF_ERROR, but:
- We will never see Z_BUF_ERROR because the output buffer is full; we
always feed a fresh 32-byte buffer on each call to git_inflate().
- We may see Z_BUF_ERROR if we run out of input. But since we've fed
the whole mmap'd buffer to zlib, if it runs out of input there is
nothing more we can do.
So if we don't see Z_OK (and didn't see the end-of-header NUL, otherwise
we'd have broken out of the loop), then we should stop looping and
return an error.
The test case shows an example where the input is truncated (which gives
us the input Z_BUF_ERROR case above).
Although we do operate on objects we might get from an untrusted remote,
I don't think the security implications of this bug are too great. It
can only trigger if both of these are true:
- You're reading a loose object whose on-disk representation was
written by an attacker. So fetching an object (or receiving a push)
are mostly OK, because even with unpack-objects it is our local,
trusted code that writes out the object file. The exception may be
fetching from an untrusted local repo, or using dumb-http, which
copies objects verbatim. But...
- The only code path which triggers the inflate loop is cat-file's
--allow-unknown-type option. This is unlikely to be called at all
outside of debugging. But I also suspect that objects with
non-standard types (or that are truncated) would not survive the
usual fetch/receive checks in the first place.
So I think it would be quite hard to trick somebody into running the
infinite loop, and we can just fix the bug.
Co-authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 25 Feb 2025 06:29:40 +0000 (01:29 -0500)]
unpack_loose_header(): report headers without NUL as "bad"
If a caller asks us to read the whole loose object header value into a
strbuf (e.g., via "cat-file --allow-unknown-type"), we'll keep reading
until we see a NUL byte marking the end of the header.
If we hit Z_STREAM_END before seeing the NUL, we obviously have to stop.
But we return ULHR_TOO_LONG, which doesn't make any sense. The "too
long" return code is used in the normal, 32-byte limited mode to
indicate that we stopped looking. There is no such thing as "too long"
here, as we'd keep reading forever until we see the end of stream or the
NUL.
Instead, we should return ULHR_BAD. The loose object has no NUL marking
the end of header, so it is malformed. The behavior difference is
slight; in either case we'd consider the object unreadable and refuse to
go further. The only difference is the specific error message we
produce.
There's no test case here, as we'd need to generate a valid zlib stream
without a NUL. That's not something Git will do without writing new
custom code. And in the next patch we'll fix another bug in this area
which will make this easier to do (and we will test it then).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using OBJECT_INFO_ALLOW_UNKNOWN_TYPE to unpack a header that
doesn't fit into our initial 32-byte buffer, we loop over calls
git_inflate(), feeding it our buffer to the "next_out" pointer each
time. As the code is written, we reset next_out after each inflate call
(and after reading the output), ready for the next loop.
This isn't wrong, but there are a few advantages to setting up
"next_out" right before each inflate call, rather than after:
1. It drops a few duplicated lines of code.
2. It makes it obvious that we always feed a fresh buffer on each call
(and thus can never see Z_BUF_ERROR due to due to a lack of output
space).
3. After we exit the loop, we'll leave stream->next_out pointing to
the end of the fetched data (this is how zlib callers find out how
much data is in the buffer). This doesn't matter in practice, since
nobody looks at it again. But it's probably the least-surprising
thing to do, as it matches how next_out is left when the whole
thing fits in the initial 32-byte buffer (and we don't enter the
loop at all).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 25 Feb 2025 06:28:24 +0000 (01:28 -0500)]
loose_object_info(): BUG() on inflating content with unknown type
After unpack_loose_header() returns, it will have inflated not only the
object header, but possibly some bytes of the object content. When we
call unpack_loose_rest() to extract the actual content, it finds those
extra bytes by skipping past the header's terminating NUL in the buffer.
Like this:
int bytes = strlen(buffer) + 1;
n = stream->total_out - bytes;
...
memcpy(buf, (char *) buffer + bytes, n);
This won't work with the OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag, as there
we allow a header of arbitrary size. We put into a strbuf, but feed only
the final 32-byte chunk we read to unpack_loose_rest(). In that case
stream->total_out may unexpectedly large, and thus our "n" will be
large, causing an out-of-bounds read (we do check it against our
allocated buffer size, which prevents an out-of-bounds write).
Probably this could be made to work by feeding the strbuf to
unpack_loose_rest(), along with adjusting some types (e.g., "bytes"
would need to be a size_t, since it is no longer operating on a 32-byte
buffer).
But I don't think it's possible to actually trigger this in practice.
The only caller who passes ALLOW_UNKNOWN_TYPE is cat-file, which only
allows it with the "-t" and "-s" options (neither of which access the
content). There is one way you can _almost_ trigger it: the oid compat
routines (i.e., accessing sha1 via sha256 names and vice versa) will
convert objects on the fly (which requires access to the content) using
the same flags that were passed in. So in theory this:
t='some very large type field that causes an extra inflate call'
sha1_oid=$(git hash-object -w -t "$t" file)
sha256_oid=$(git rev-parse --output-object-format=sha256 $sha1_oid)
git cat-file --allow-unknown-type -s $sha256_oid
would try to access the content. But it doesn't work, because using
compat objects requires an entry in the .git/objects/loose-object-idx
file, and we don't generate such an entry for non-standard types (see
the "compat" section of write_object_file_literally()).
If we use "t=blob" instead, then it does access the compat object, but
it doesn't trigger the problem (because "blob" is a standard short type
name, and it fits in the initial 32-byte buffer).
So given that this is almost a memory error bug, I think it's worth
addressing. But because we can't actually trigger the situation, I'm
hesitant to try a fix that we can't run. Instead let's document the
restriction and protect ourselves from the out-of-bounds read by adding
a BUG() check.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Sun, 23 Feb 2025 08:26:04 +0000 (09:26 +0100)]
commit: avoid parent list buildup in clear_commit_marks_many()
clear_commit_marks_1() clears the marks of the first parent and its
first parent and so on, and saves the higher numbered parents in a list
for later. There is no benefit in keeping that list growing with each
handled commit. Clear it after each run to reduce peak memory usage.
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
brian m. carlson [Sun, 23 Feb 2025 01:53:31 +0000 (01:53 +0000)]
http: allow using netrc for WebDAV-based HTTP protocol
For an extended period of time, we've enabled libcurl's netrc
functionality, which will read credentials from the netrc file if none
are provided. Unfortunately, we have also not documented this fact or
written any tests for it, but people have come to rely on it.
In 610cbc1dfb ("http: allow authenticating proactively", 2024-07-10), we
accidentally broke the ability of users to use the netrc file for the
WebDAV-based HTTP protocol. Notably, it works on the initial request
but does not work on subsequent requests, which causes failures because
that version of the protocol will necessarily make multiple requests.
This happens because curl_empty_auth_enabled never returns -1, only 0 or
1, and so if http.proactiveAuth is not enabled, the username and
password are always set to empty credentials, which prevents libcurl's
fallback to netrc from working. However, in other cases, the server
continues to get a 401 response and the credential helper is invoked,
which is the normal behavior, so this was not noticed earlier.
To fix this, change the condition to check for enabling empty auth and
also not having proactive auth enabled, which should result in the
username and password not being set to a single colon in the typical
case, and thus the netrc file being used.
Reported-by: Peter Georg <peter.georg@physik.uni-regensburg.de> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This appears to happen because of the NULL pointer name passed into
map_user(). Fix this by passing "" instead of NULL so that we have a
valid pointer.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Eliah Kagan [Fri, 21 Feb 2025 12:01:36 +0000 (12:01 +0000)]
compat/mingw: rename the symlink, not the target
Since 183ea3ea (Merge branch 'ps/mingw-rename', 2024-11-13),
a new technique is used on Windows to rename files, where supported.
The first step of this technique is to open the file with
`CreateFileW`. At that time, `FILE_ATTRIBUTE_NORMAL` was passed as
the value of the `dwFlagsAndAttributes` argument. In b30404df [2], this
was improved by passing `FILE_FLAG_BACKUP_SEMANTICS`, to support
directories as well as regular files.
However, neither value of `dwFlagsAndAttributes` is sufficient to open
a symbolic link with the correct semantics to rename it. Symlinks on
Windows are reparse points. Attempting to open a reparse point with
`CreateFileW` dereferences the reparse point and opens the target
instead, unless `FILE_FLAG_OPEN_REPARSE_POINT` is included in
`dwFlagsAndAttributes`. This is documented for that flag and in the
"Symbolic Link Behavior" section of the `CreateFileW` docs [3].
This produces a regression where attempting to rename a symlink on
Windows renames its target to the intended new name and location of the
symlink. For example, if `symlink` points to `file`, then running
git mv symlink symlink-renamed
leaves `symlink` in place and unchanged, but renames `file` to
`symlink-renamed` [4].
This regression is detectable by existing tests in `t7001-mv.sh`, but
the tests must be run by a Windows user with the ability to create
symlinks, and the `ln -s` command used to create the initial symlink
must also be able to create a real symlink (such as by setting the
`MSYS` environment variable to `winsymlinks:nativestrict`). Then
these two tests fail if the regression is present, and pass otherwise:
38 - git mv should overwrite file with a symlink
39 - check moved symlink
Let's fix this, so that renaming a symlink again renames the symlink
itself and leaves the target unchanged, by passing
as the `dwFlagsAndAttributes` argument. This is sufficient (and safe)
because including `FILE_FLAG_OPEN_REPARSE_POINT` causes no harm even
when used to open a file or directory that is not a reparse point. In
that case, as noted in [3], this flag is simply ignored.
Signed-off-by: Eliah Kagan <eliah.kagan@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Karthik Nayak [Fri, 21 Feb 2025 10:04:23 +0000 (11:04 +0100)]
builtin/refs: add '--no-reflog' flag to drop reflogs
The "git refs migrate" subcommand converts the backend used for ref
storage. It always migrates reflog data as well as refs. Introduce an
option to exclude reflogs from migration, allowing them to be discarded
when they are unnecessary.
This is particularly useful in server-side repositories, where reflogs
are typically not expected. However, some repositories may still have
them due to historical reasons, such as bugs, misconfigurations, or
administrative decisions to enable reflogs for debugging. In such
repositories, it would be optimal to drop reflogs during the migration.
To address this, introduce the '--no-reflog' flag, which prevents reflog
migration. When this flag is used, reflogs from the original reference
backend are migrated. Since only the new reference backend remains in
the repository, all previous reflogs are permanently discarded.
Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
ci: fix propagating UTF-8 test locale in musl-based Meson job
The musl-based Meson job is supposed to explicitly specify the UTF-8
locale used for testing, which has been introduced with 84bb5eeace7 (ci:
switch linux-musl to use Meson, 2025-01-28). That commit had two issues
though:
- We continue to refer to "linux-musl", even though the job has been
renamed in the same commit to "linux-musl-meson".
- We use the wrong option name to specify the locale. This was not
noticed though due to the first issue.
Fix both of these issues by fixing both the job and option naems.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Thu, 20 Feb 2025 13:59:56 +0000 (05:59 -0800)]
Merge branch 'master' of https://github.com/j6t/gitk
* 'master' of https://github.com/j6t/gitk:
gitk: introduce support for the Meson build system
gitk: extract script to build executable
gitk: make the "list references" default window width wider
gitk: fix arrow keys in input fields with Tcl/Tk >= 8.6
gitk: Use an external icon file on Windows
gitk: Unicode file name support
gitk(Windows): avoid inadvertently calling executables in the worktree
Johannes Sixt [Thu, 20 Feb 2025 09:53:53 +0000 (10:53 +0100)]
Merge branch 'g4w-gitk' of https://github.com/dscho/gitk
* 'g4w-gitk' of https://github.com/dscho/gitk:
gitk: make the "list references" default window width wider
gitk: fix arrow keys in input fields with Tcl/Tk >= 8.6
gitk: Use an external icon file on Windows
gitk: Unicode file name support
gitk(Windows): avoid inadvertently calling executables in the worktree
gitk: introduce support for the Meson build system
Upstream Git has introduced support for the Meson build system.
Introduce support for Meson into gitk, as well, so that Git can easily
build its vendored copy of Gitk via a `subproject()` directive. The
instructions can be set up as follows:
Specific options, like for example where Gitk shall be installed to, can
be specified at setup time via `-D`. Available options can be discovered
by running `meson configure` either in the source or build directory.
Usman Akinyemi [Sat, 15 Feb 2025 15:50:52 +0000 (21:20 +0530)]
agent: advertise OS name via agent capability
As some issues that can happen with a Git client can be operating system
specific, it can be useful for a server to know which OS a client is
using. In the same way it can be useful for a client to know which OS
a server is using.
Our current agent capability is in the form of "package/version" (e.g.,
"git/1.8.3.1"). Let's extend it to include the operating system name (os)
i.e in the form "package/version-os" (e.g., "git/1.8.3.1-Linux").
Including OS details in the agent capability simplifies implementation,
maintains backward compatibility, avoids introducing a new capability,
encourages adoption across Git-compatible software, and enhances
debugging by providing complete environment information without affecting
functionality. The operating system name is retrieved using the 'sysname'
field of the `uname(2)` system call or its equivalent.
However, there are differences between `uname(1)` (command-line utility)
and `uname(2)` (system call) outputs on Windows. These discrepancies
complicate testing on Windows platforms. For example:
- `uname(1)` output: MINGW64_NT-10.0-20348.3.4.10-87d57229.x86_64\
.2024-02-14.20:17.UTC.x86_64
- `uname(2)` output: Windows.10.0.20348
On Windows, uname(2) is not actually system-supplied but is instead
already faked up by Git itself. We could have overcome the test issue
on Windows by implementing a new `uname` subcommand in `test-tool`
using uname(2), but except uname(2), which would be tested against
itself, there would be nothing platform specific, so it's just simpler
to disable the tests on Windows.
Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Peter Oliver [Tue, 18 Feb 2025 15:30:43 +0000 (15:30 +0000)]
meson: fix Perl version check for Meson versions before 1.7.0
Command `perl --version` says, e.g., “This is perl 5, version 26,
subversion 0 (v5.26.0)”, which older versions of Meson interpret as
version 26.
This will be fixed in Meson 1.7.0, but at the time of writing that isn’t
yet released.
If we run `perl -V:version` we get the unambiguous response
“version='5.26.0';”, but we need at least Meson 1.5.0 to be able to do that.
Note that Perl are seriously considering dropping the leading 5 entirely
in the near future (https://perl.github.io/PPCs/ppc0025-perl-version/),
but that shouldn’t affect us.
Signed-off-by: Peter Oliver <git@mavit.org.uk> Co-authored-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Peter Oliver [Tue, 18 Feb 2025 15:30:42 +0000 (15:30 +0000)]
meson: bump minimum required Perl version to 5.26.0
Commit 702d8c1f3b (Require Perl 5.26.0, 2024-10-23) dropped support
for Perl versions older than 5.26.0. The Meson build system, which
has been developed in parallel to that commit, hasn't been bumped
accordingly and thus still requires Perl 5.8.1 or newer.
Fix this by requiring Perl 5.26.0 or newer with Meson.
Signed-off-by: Peter Oliver <git@mavit.org.uk> Reviewed-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 18 Feb 2025 23:30:32 +0000 (15:30 -0800)]
Merge branch 'da/difftool-sans-the-repository'
"git difftool" code clean-up.
* da/difftool-sans-the-repository:
difftool: eliminate use of USE_THE_REPOSITORY_VARIABLE
difftool: eliminate use of the_repository
difftool: eliminate use of global variables
"git push --atomic --porcelain" used to ignore failures from the
other side, losing the error status from the child process, which
has been corrected.
* ps/send-pack-unhide-error-in-atomic-push:
send-pack: gracefully close the connection for atomic push
t5543: atomic push reports exit code failure
send-pack: new return code "ERROR_SEND_PACK_BAD_REF_STATUS"
t5548: add porcelain push test cases for dry-run mode
t5548: add new porcelain test cases
t5548: refactor test cases by resetting upstream
t5548: refactor to reuse setup_upstream() function
t5504: modernize test by moving heredocs into test bodies
Junio C Hamano [Tue, 18 Feb 2025 23:30:31 +0000 (15:30 -0800)]
Merge branch 'ds/backfill'
Lazy-loading missing files in a blobless clone on demand is costly
as it tends to be one-blob-at-a-time. "git backfill" is introduced
to help bulk-download necessary files beforehand.
* ds/backfill:
backfill: assume --sparse when sparse-checkout is enabled
backfill: add --sparse option
backfill: add --min-batch-size=<n> option
backfill: basic functionality and tests
backfill: add builtin boilerplate
reftable: ignore file-in-use errors when unlink(3p) fails on Windows
Unlinking a file may fail on Windows systems when the file is still held
open by another process. This is incompatible with POSIX semantics and
by extension with Git's assumed semantics when unlinking files, which
is that files can be unlinked regardless of whether they are still open
or not. To counteract this incompatibility, we have some custom error
handling in the `mingw_unlink()` wrapper that first retries the deletion
with some delay, and then asks the user whether we should continue to
retry.
While this logic might be sensible in many callsites throughout Git, it
is less when used in the reftable library. We only use unlink(3) there
to delete tables which aren't referenced anymore, and the code is very
aware of the limitations on Windows. As such, all calls to unlink(3p)
don't perform any error checking at all and are fine with the call
failing.
Instead, the library provides the `reftable_stack_clean()` function,
which Git knows to execute in git-pack-refs(1) after compacting a stack.
The effect of this function is that all stale tables will eventually get
deleted once they aren't kept open anymore.
So while we're fine with unlink(3p) failing, the Windows-emulation of
that function will still perform several sleeps and ultimately end up
asking the user:
$ git pack-refs
Unlink of file 'C:/temp/jgittest/jgit/.git/reftable/0x000000000002-0x000000000004-50486d0e.ref' failed. Should I try again? (y/n) n
Unlink of file 'C:/temp/jgittest/jgit/.git/reftable/0x000000000002-0x000000000004-50486d0e.ref' failed. Should I try again? (y/n) n
Unlink of file 'C:/temp/jgittest/jgit/.git/reftable/0x000000000002-0x000000000004-50486d0e.ref' failed. Should I try again? (y/n) n
It even asks multiple times, which is doubly annoying and puzzling to
the user:
1. It asks when trying to delete the old file after having written the
compacted stack.
2. It asks when reloading the stack, where it will try to unlink
now-unreferenced tables.
3. It asks when calling `reftable_stack_clean()`, where it will try to
unlink now-stale tables.
Fix the issue by making it possible to disable this behaviour with a
preprocessor define. As "git-compat-util.h" is only included from
"system.h", and given that "system.h" is only ever included by headers
and code that are internal to the reftable library, we can set that
macro in this header without impacting anything else but the reftable
library.
Reported-by: Christian Reich <Zottelbart@t-online.de> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 18 Feb 2025 22:29:04 +0000 (14:29 -0800)]
Merge branch 'ps/reftable-sans-compat-util' into ps/reftable-windows-unlink-fix
* ps/reftable-sans-compat-util:
Makefile: skip reftable library for Coccinelle
reftable: decouple from Git codebase by pulling in "compat/posix.h"
git-compat-util.h: split out POSIX-emulating bits
compat/mingw: split out POSIX-related bits
reftable/basics: introduce `REFTABLE_UNUSED` annotation
reftable/basics: stop using `SWAP()` macro
reftable/stack: stop using `sleep_millisec()`
reftable/system: introduce `reftable_rand()`
reftable/reader: stop using `ARRAY_SIZE()` macro
reftable/basics: provide wrappers for big endian conversion
reftable/basics: stop using `st_mult()` in array allocators
reftable: stop using `BUG()` in trivial cases
reftable/record: don't `BUG()` in `reftable_record_cmp()`
reftable/record: stop using `BUG()` in `reftable_record_init()`
reftable/record: stop using `COPY_ARRAY()`
reftable/blocksource: stop using `xmmap()`
reftable/stack: stop using `write_in_full()`
reftable/stack: stop using `read_in_full()`
Wire up static analysis via Coccinelle via a new test target
"coccicheck". This target can be executed via `meson compile coccicheck`
and generates the semantic patch for us.
Note that we don't hardcode the list of source and header files that
shall be analyzed, and instead use git-ls-files(1) to find them for us.
This is because we also want to analyze files that may not get built on
the current platform, so finding all sources at configure time is easier
than introducing a new variable that tracks all sources, including those
which aren't being built.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We've got a couple of credential helpers in "contrib/credential", all
of which aren't yet wired up via Meson. Do so.
Note that ideally, we'd also wire up t0303 to be executed with each of
the credential helpers to verify their functionality. Unfortunately
though, none of them pass the test suite right now, so this is left for
a future change.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
contrib/credential: fix compilation of "osxkeychain" helper
The "osxkeychain" helper does not compile due to a warning generated by
the unused `argc` parameter. Fix the warning by checking for the minimum
number of required arguments explicitly in the least restrictive way
possible.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "libsecret" credential helper does not compile when developer
warnings are enabled due to three warnings:
- contrib/credential/libsecret/git-credential-libsecret.c:78:1:
missing initializer for field ‘reserved’ of ‘SecretSchema’
[-Werror=missing-field-initializers]. This issue is fixed by using
designated initializers.
- contrib/credential/libsecret/git-credential-libsecret.c:171:43:
comparison of integer expressions of different signedness: ‘int’
and ‘guint’ {aka ‘unsigned int’} [-Werror=sign-compare]. This
issue is fixed by using an unsigned variable to iterate through
the string vector.
- contrib/credential/libsecret/git-credential-libsecret.c:420:14:
unused parameter ‘argc’ [-Werror=unused-parameter]. This issue is
fixed by checking the number of arguments, but in the least
restrictive way possible.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
M Hickford [Tue, 18 Feb 2025 07:45:51 +0000 (08:45 +0100)]
contrib/credential: fix compilation of wincred helper with MSVC
The git-credential-wincred helper does not compile on Windows with
Microsoft Visual Studio because of our use of `__attribute__()`, which
its compiler doesn't support. While the rest of our codebase would know
to handle this because we redefine the macro in "compat/msvc.h", this
stub isn't available here because we don't include "git-compat-util.h"
in the first place.
Fix the issue by making the attribute depend on the `_MSC_VER`
preprocessor macro.
Signed-off-by: M Hickford <mirth.hickford@gmail.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
contrib/credential: fix "netrc" tests with out-of-tree builds
Tests of the "netrc" credential helper aren't prepared to handle
out-of-tree builds:
- They expect the "test.pl" script to be located relative to the build
directory, even though it is located in the source directory.
- They expect the built "git-credential-netrc" helper to be located
relative to the "test.pl" file, evne though it is loated in the
build directory.
This works alright as long as source and build directories are the same,
but starts to break apart with Meson.
Fix these first issue by using the new "GIT_SOURCE_DIR" variable to
locate the test script itself. And fix the second issue by introducing a
new environment variable "CREDENTIAL_NETRC_PATH" that can be set for
out-of-tree builds to locate the built credential helper.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
A couple of our tests require knowledge around where to find the
project's source directory in order to locate files required for the
test itself. Until now we have been wiring these up ad-hoc via new,
specialized variables catered to the specific usecase. This is quite
awkward though, as every test that potentially needs to locate paths
relative to the source directory needs to grow another variable.
Introduce a new "GIT_SOURCE_DIR" variable into GIT-BUILD-OPTIONS to stop
this proliferation. Remove existing variables that can be derived from
it.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Christian Couder [Tue, 18 Feb 2025 11:32:03 +0000 (12:32 +0100)]
promisor-remote: check advertised name or URL
A previous commit introduced a "promisor.acceptFromServer" configuration
variable with only "None" or "All" as valid values.
Let's introduce "KnownName" and "KnownUrl" as valid values for this
configuration option to give more choice to a client about which
promisor remotes it might accept among those that the server advertised.
In case of "KnownName", the client will accept promisor remotes which
are already configured on the client and have the same name as those
advertised by the client. This could be useful in a corporate setup
where servers and clients are trusted to not switch names and URLs, but
where some kind of control is still useful.
In case of "KnownUrl", the client will accept promisor remotes which
have both the same name and the same URL configured on the client as the
name and URL advertised by the server. This is the most secure option,
so it should be used if possible.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Christian Couder [Tue, 18 Feb 2025 11:32:02 +0000 (12:32 +0100)]
Add 'promisor-remote' capability to protocol v2
When a server S knows that some objects from a repository are available
from a promisor remote X, S might want to suggest to a client C cloning
or fetching the repo from S that C may use X directly instead of S for
these objects.
Note that this could happen both in the case S itself doesn't have the
objects and borrows them from X, and in the case S has the objects but
knows that X is better connected to the world (e.g., it is in a
$LARGEINTERNETCOMPANY datacenter with petabit/s backbone connections)
than S. Implementation of the latter case, which would require S to
omit in its response the objects available on X, is left for future
improvement though.
Then C might or might not, want to get the objects from X. If S and C
can agree on C using X directly, S can then omit objects that can be
obtained from X when answering C's request.
To allow S and C to agree and let each other know about C using X or
not, let's introduce a new "promisor-remote" capability in the
protocol v2, as well as a few new configuration variables:
- "promisor.advertise" on the server side, and:
- "promisor.acceptFromServer" on the client side.
By default, or if "promisor.advertise" is set to 'false', a server S will
not advertise the "promisor-remote" capability.
If S doesn't advertise the "promisor-remote" capability, then a client C
replying to S shouldn't advertise the "promisor-remote" capability
either.
If "promisor.advertise" is set to 'true', S will advertise its promisor
remotes with a string like:
promisor-remote=<pr-info>[;<pr-info>]...
where each <pr-info> element contains information about a single
promisor remote in the form:
name=<pr-name>[,url=<pr-url>]
where <pr-name> is the urlencoded name of a promisor remote and
<pr-url> is the urlencoded URL of the promisor remote named <pr-name>.
For now, the URL is passed in addition to the name. In the future, it
might be possible to pass other information like a filter-spec that the
client may use when cloning from S, or a token that the client may use
when retrieving objects from X.
It is C's responsibility to arrange how it can reach X though, so pieces
of information that are usually outside Git's concern, like proxy
configuration, must not be distributed over this protocol.
It might also be possible in the future for "promisor.advertise" to have
other values. For example a value like "onlyName" could prevent S from
advertising URLs, which could help in case C should use a different URL
for X than the URL S is using. (The URL S is using might be an internal
one on the server side for example.)
By default or if "promisor.acceptFromServer" is set to "None", C will
not accept to use the promisor remotes that might have been advertised
by S. In this case, C will not advertise any "promisor-remote"
capability in its reply to S.
If "promisor.acceptFromServer" is set to "All" and S advertised some
promisor remotes, then on the contrary, C will accept to use all the
promisor remotes that S advertised and C will reply with a string like:
promisor-remote=<pr-name>[;<pr-name>]...
where the <pr-name> elements are the urlencoded names of all the
promisor remotes S advertised.
In a following commit, other values for "promisor.acceptFromServer" will
be implemented, so that C will be able to decide the promisor remotes it
accepts depending on the name and URL it received from S. So even if
that name and URL information is not used much right now, it will be
needed soon.
Helped-by: Taylor Blau <me@ttaylorr.com> Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable library does not use any of the common helpers that the Git
project has. Consequently, most of the rules that we have in Coccinelle
do not apply to the library at all and may even generate false positives
when a pattern can be converted to use a Git helper function.
Exclude reftable library sources from being checked by Coccinelle to
avoid such false positives.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable: decouple from Git codebase by pulling in "compat/posix.h"
The reftable library includes "git-compat-util.h" in order to get a
POSIX-like programming environment that papers over various differences
between platforms. The header also brings with it a couple of helpers
specific to the Git codebase though, and over time we have started to
use these helpers in the reftable library, as well.
This makes it very hard to use the reftable library as a standalone
library without the rest of the Git codebase, so other libraries like
e.g. libgit2 cannot easily use it. But now that we have removed all
calls to Git-specific functionality and have split out "compat/posix.h"
as a separate header we can address this.
Stop including "git-compat-util.h" and instead include "compat/posix.h"
to finalize the decoupling of the reftable library from the rest of the
Git codebase. The only bits which remain specific to Git are "system.h"
and "system.c", which projects will have to provide.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git-compat-util.h" header is a treasure trove of various bits and
pieces used throughout the project. It basically mixes two different
things into one:
- Providing a POSIX-like interface even on platforms that aren't
POSIX-compliant.
- Providing low-level functionality that is specific to Git.
This intermixing is a bit of a problem for the reftable library as we
don't want to recreate the POSIX-like interface there. But neither do we
want to pull in the Git-specific functionality, as it is otherwise quite
easy to start depending on the Git codebase again.
Split out a new header "compat/posix.h" that only contains the bits and
pieces relevant for the emulation of POSIX, which we will start using in
the next commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Split out POSIX-related bits from "compat/mingw.h" and "compat/msvc.h".
This is in preparation for splitting up "git-compat-utils.h" into a
header that provides POSIX-compatibility and a header that provides
common wrappers used by the Git project.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `SWAP()` macro in favor of an open-coded variant of it. Note
that this also requires us to open-code the build assert that `SWAP()`
itself uses to verify that the size of both variables matches.
This is done to reduce our dependency on the Git codebase.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor our use of `sleep_millisec()` by open-coding it with poll(3p),
which is the current implementation of this function. Ideally, we'd use
a more direct way to sleep, but there is no equivalent to sleep(3p) that
would accept milliseconds as input.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new system-level `reftable_rand()` function that generates a
single unsigned integer for us. The implementation of this function is
to be provided by the calling codebase, which allows us to more easily
hook into pre-seeded random number generators.
Adapt the two callsites where we generated random data.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/basics: provide wrappers for big endian conversion
We're using a mixture of big endian conversion functions provided by
both the reftable library, but also by the Git codebase. Refactor the
code so that we exclusively use reftable-provided wrappers in order to
untangle us from the Git codebase.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `BUG()` in the remaining trivial cases that we still have in
the reftable library. Instead of aborting the program, we'll now bubble
up a `REFTABLE_API_ERROR` to indicate misuse of the calling conventions.
Note that in both `reftable_reader_{inc,dec}ref()` we simply stop
calling `BUG()` altogether. The only situation where the counter should
be zero is when the structure has already been free'd anyway, so we
would run into undefined behaviour regardless of whether we try to abort
the program or not.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/record: don't `BUG()` in `reftable_record_cmp()`
The reftable library aborts with a bug in case `reftable_record_cmp()`
is invoked with two records of differing types. This would cause the
program to die without the caller being able to handle the error, which
is not something we want in the context of library code. And it ties us
to the Git codebase.
Refactor the code such that `reftable_record_cmp()` returns an error
code separate from the actual comparison result. This requires us to
also adapt some callers up the callchain in a similar fashion.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/record: stop using `BUG()` in `reftable_record_init()`
We're aborting the program via `BUG()` in case `reftable_record_init()`
was invoked with an unknown record type. This is bad because we may now
die in library code, and because it makes us depend on the Git codebase.
Refactor the code such that `reftable_record_init()` can return an error
code to the caller. Adapt any callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Drop our use of `COPY_ARRAY()`, replacing it with an open-coded variant
thereof. This is done to reduce our dependency on the Git library.
While at it, guard the whole array copy logic so that we only copy it in
case there actually is anything to be copied. Otherwise, we may end up
trying to allocate a zero-sized array, which will return a NULL pointer
and thus cause us to return an `REFTABLE_OUT_OF_MEMORY_ERROR`.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>