Todd Zullinger [Mon, 3 Mar 2025 20:44:07 +0000 (15:44 -0500)]
howto/new-command: update reference to builtin docs
Commit ec14d4ecb5 (builtin.h: take over documentation from
api-builtin.txt, 2017-08-02) deleted api-builtin.txt and moved the
contents into builtin.h. Most of the references were fixed in d85e9448dd (new-command.txt: update reference to builtin docs,
2023-02-04), but one remained. Fix it.
Signed-off-by: Todd Zullinger <tmz@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Todd Zullinger [Mon, 3 Mar 2025 20:44:01 +0000 (15:44 -0500)]
doc: remove unneeded .gitattributes
The top-level .gitattributes file contains entries for the Documentation
tree. Documentation/.gitattributes has not been touched since it was
added in 14f9e128d3 (Define the project whitespace policy, 2008-02-10).
Signed-off-by: Todd Zullinger <tmz@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Todd Zullinger [Mon, 3 Mar 2025 20:44:00 +0000 (15:44 -0500)]
.gitattributes: more *.txt -> *.adoc updates
All Documentation files now end in .adoc. Update the entries for
git-merge.adoc, gitk.adoc, and user-manual.adoc to properly set the
conflict-marker-size attribute.
Signed-off-by: Todd Zullinger <tmz@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Todd Zullinger [Mon, 3 Mar 2025 20:43:59 +0000 (15:43 -0500)]
t0450: *.txt -> *.adoc fixes
After 1f010d6bdf (doc: use .adoc extension for AsciiDoc files,
2025-01-20), we no longer matched any files in this test. The result is
that we did not test for mismatches in the documentation and --help
output.
Adjust the test to look at the renamed *.adoc files.
Signed-off-by: Todd Zullinger <tmz@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Fri, 28 Feb 2025 17:28:21 +0000 (09:28 -0800)]
BreakingChanges: clarify the procedure
The point behind a compile-time switch is to ensure that we have a
mechanism to hide myriad of backward incompatible changes that may
be prepared and accumulated over time, yet make them available for
testing any time during the development toward the big version
boundary. Add a few words to stress that point.
Since the document was first written, we have added the CI job that
the document anticipated us to have. Rephrase to state the current
status.
The discussion in [*1*] made us abandon the "feature.git3" based
runtime switching of behaviour and instead adopt the compile-time
switching mechanism, but a stray sentence about runtime switching
still remained in the final text by mistake. Remove it.
Christian Couder [Tue, 18 Feb 2025 11:32:04 +0000 (12:32 +0100)]
doc: add technical design doc for large object promisors
Let's add a design doc about how we could improve handling liarge blobs
using "Large Object Promisors" (LOPs). It's a set of features with the
goal of using special dedicated promisor remotes to store large blobs,
and having them accessed directly by main remotes and clients.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Mon, 3 Mar 2025 16:53:01 +0000 (08:53 -0800)]
Merge branch 'ps/build-meson-fixes-0130'
Assorted fixes and improvements to the build procedure based on
meson.
* ps/build-meson-fixes-0130:
gitlab-ci: restrict maximum number of link jobs on Windows
meson: consistently use custom program paths to resolve programs
meson: fix overwritten `git` variable
meson: prevent finding sed(1) in a loop
meson: improve handling of `sane_tool_path` option
meson: improve PATH handling
meson: drop separate version library
meson: stop linking libcurl into all executables
meson: introduce `libgit_curl` dependency
meson: simplify use of the common-main library
meson: inline the static 'git' library
meson: fix OpenSSL fallback when not explicitly required
meson: fix exec path with enabled runtime prefix
Phillip Wood [Sun, 2 Mar 2025 16:02:30 +0000 (16:02 +0000)]
meson: fix building technical and howto docs
When our asciidoc files were renamed from "*.txt" to "*.adoc" in 1f010d6bdf7 (doc: use .adoc extension for AsciiDoc files, 2025-01-20)
the "meson.build" file in "Documentation" was updated but the
"meson.build" files in the "technical" and "howto" subdirectories were
not. This causes the meson build to fail when configured with
-Ddocs=html. Fix this by updating the relevant "meson.build" files.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The diffs queued from git-diff-pairs(1) are flushed when stdin is
closed. To enable greater flexibility, allow control over when the diff
queue is flushed by writing a single NUL byte on stdin between input
file pairs. Diff output between flushes is separated by a single NUL
byte.
Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Justin Tobler [Fri, 28 Feb 2025 21:33:45 +0000 (15:33 -0600)]
builtin: introduce diff-pairs command
Through git-diff(1), a single diff can be generated from a pair of blob
revisions directly. Unfortunately, there is not a mechanism to compute
batches of specific file pair diffs in a single process. Such a feature
is particularly useful on the server-side where diffing between a large
set of changes is not feasible all at once due to timeout concerns.
To facilitate this, introduce git-diff-pairs(1) which acts as a backend
passing its NUL-terminated raw diff format input from stdin through diff
machinery to produce various forms of output such as patch or raw.
The raw format was originally designed as an interchange format and
represents the contents of the diff_queued_diff list making it possible
to break the diff pipeline into separate stages. For example,
git-diff-tree(1) can be used as a frontend to compute file pairs to
queue and feed its raw output to git-diff-pairs(1) to compute patches.
With this, batches of diffs can be progressively generated without
having to recompute renames or retrieve object context. Something like
the following:
should generate the same output as `git diff-tree -p -M`. Furthermore,
each line of raw diff formatted input can also be individually fed to a
separate git-diff-pairs(1) process and still produce the same output.
Based-on-patch-by: Jeff King <peff@peff.net> Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Justin Tobler [Fri, 28 Feb 2025 21:33:44 +0000 (15:33 -0600)]
diff: add option to skip resolving diff statuses
By default, `diffcore_std()` resolves the statuses for queued diff file
pairs by calling `diff_resolve_rename_copy()`. If status information is
already manually set, invoking `diffcore_std()` may change the status
value.
Introduce the `skip_resolving_statuses` diff option that prevents
`diffcore_std()` from resolving file pair statuses when enabled.
Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Justin Tobler [Fri, 28 Feb 2025 21:33:43 +0000 (15:33 -0600)]
diff: return diff_filepair from diff queue helpers
The `diff_addremove()` and `diff_change()` functions set up and queue
diffs, but do not return the `diff_filepair` added to the queue. In a
subsequent commit, modifications to `diff_filepair` need to occur in
certain cases after being queued.
Since the existing `diff_addremove()` and `diff_change()` are also used
for callbacks in `diff_options` as types `add_remove_fn_t` and
`change_fn_t`, modifying the existing function signatures requires
further changes. The diff options for pruning use `file_add_remove()`
and `file_change()` where file pairs do not even get queued. Thus,
separate functions are implemented instead.
Split out the queuing operations into `diff_queue_addremove()` and
`diff_queue_change()` which also return a handle to the queued
`diff_filepair`. Both `diff_addremove()` and `diff_change()` are
reimplemented as thin wrappers around the new functions.
Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Sat, 1 Mar 2025 18:25:10 +0000 (10:25 -0800)]
doc: fix build-docdep.perl
We renamed from .txt to .adoc all the asciidoc source files and
necessary includes. We also need to adjust the build-docdep tool to
work on files whose suffix is .adoc when computing the documentation
dependencies.
Todd Zullinger [Sat, 1 Mar 2025 15:36:02 +0000 (10:36 -0500)]
doc: update howto-index.sh for .adoc extensions
The .txt extensions were changed to .adoc in 1f010d6bdf (doc: use .adoc
extension for AsciiDoc files, 2025-01-20). This left broken links in
the generated howto-index.html.
Signed-off-by: Todd Zullinger <tmz@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
path: adjust last remaining users of `the_repository`
With the preceding refactorings we now only have a couple of implicit
users of `the_repository` left in the "path" subsystem, all of which
depend on global state via `calc_shared_perm()`. Make the dependency on
`the_repository` explicit by passing the repo as a parameter instead and
adjust callers accordingly.
Note that this change bubbles up into a couple of subsystems that were
previously declared as free from `the_repository`. Instead of marking
all of them as `the_repository`-dependent again, we instead use the
repository that is available in the calling context. There are three
exceptions though with "copy.c", "pack-write.c" and "tempfile.c".
Adjusting these would require us to adapt callsites all over the place,
so this is left for a future iteration.
Mark "path.c" as free from `the_repository`.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
environment: move access to "core.sharedRepository" into repo settings
Similar as with the preceding commit, we track "core.sharedRepository"
via a pair of global variables. Move them into `struct repo_settings` so
that we can instead track them per-repository.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
environment: move access to "core.hooksPath" into repo settings
The "core.hooksPath" setting is stored in a global variable and
populated via the `git_default_core_config`. This may cause issues in
the case where one is handling multiple different repositories in a
single process with different values for that config key, as we may or
may not see the correct value in that case. Furthermore, global state
blocks our path towards libification.
Refactor the code so that we instead store the value in `struct
repo_settings`. The value is computed as-needed and cached. The result
should be functionally the same as there aren't ever any code paths
where we'd execute hooks outside the context of a repository.
Note that this requires us to change the passed-in repository in the
`repo_git_path()` family of functions to be non-constant, as we call
`adjust_git_path()` there.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't provide a way to clear a `struct repo_settings`, and instead
open-code this in `repo_clear()`. This is mixing up concerns and means
that developers have to touch multiple files whenever they add a new
field to the structure in case the associated resources need to be
released.
Provide a new `repo_settings_clear()` function to improve this.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
path: drop `git_path()` in favor of `repo_git_path()`
Remove `git_path()` in favor of the `repo_git_path()` family of
functions, which makes the implicit dependency on `the_repository` go
away.
Note that `git_path()` returned a string allocated via `get_pathname()`,
which uses a rotating set of statically allocated buffers. Consequently,
callers didn't have to free the returned string. The same isn't true for
`repo_common_path()`, so we also have to add logic to free the returned
strings.
This refactoring also allows us to remove `repo_common_pathv()` as well
as `get_pathname()` from the public interface.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
rerere: let `rerere_path()` write paths into a caller-provided buffer
Same as with `get_worktree_git_dir()` a couple of commits ago, the
`rerere_path()` function returns paths that need not be free'd by the
caller because `git_path()` internally uses `get_pathname()`.
Refactor the function to instead accept a caller-provided buffer that
the path will be written into, passing on ownership to the caller. This
refactoring prepares us for the removal of `git_path()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Thu, 27 Feb 2025 23:22:59 +0000 (15:22 -0800)]
Merge branch 'ua/os-version-capability'
The value of "uname -s" is by default sent over the wire as a part
of the "version" capability.
* ua/os-version-capability:
agent: advertise OS name via agent capability
t5701: add setup test to remove side-effect dependency
version: extend get_uname_info() to hide system details
version: refactor get_uname_info()
version: refactor redact_non_printables()
version: replace manual ASCII checks with isprint() for clarity
shejialuo [Thu, 27 Feb 2025 16:07:48 +0000 (00:07 +0800)]
builtin/fsck: add `git refs verify` child process
At now, we have already implemented the ref consistency checks for both
"files-backend" and "packed-backend". Although we would check some
redundant things, it won't cause trouble. So, let's integrate it into
the "git-fsck(1)" command to get feedback from the users. And also by
calling "git refs verify" in "git-fsck(1)", we make sure that the new
added checks don't break.
Introduce a new function "fsck_refs" that initializes and runs a child
process to execute the "git refs verify" command. In order to provide
the user interface create a progress which makes the total task be 1.
It's hard to know how many loose refs we will check now. We might
improve this later.
Then, introduce the option to allow the user to disable checking ref
database consistency. Put this function in the very first execution
sequence of "git-fsck(1)" due to that we don't want the existing code of
"git-fsck(1)" which would implicitly check the consistency of refs to
die the program.
Last, update the test to exercise the code.
Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
shejialuo [Thu, 27 Feb 2025 16:07:40 +0000 (00:07 +0800)]
packed-backend: check whether the "packed-refs" is sorted
When there is a "sorted" trait in the header of the "packed-refs" file,
it means that each entry is sorted increasingly by comparing the
refname. We should add checks to verify whether the "packed-refs" is
sorted in this case.
Update the "packed_fsck_ref_header" to know whether there is a "sorted"
trail in the header. It may seem that we could record all refnames
during the parsing process and then compare later. However, this is not
a good design due to the following reasons:
1. Because we need to store the state across the whole checking
lifetime, we would consume a lot of memory if there are many entries
in the "packed-refs" file.
2. We cannot reuse the existing compare function "cmp_packed_ref_records"
which cause repetition.
Because "cmp_packed_ref_records" needs an extra parameter "struct
snaphost", extract the common part into a new function
"cmp_packed_ref_records" to reuse this function to compare.
Then, create a new function "packed_fsck_ref_sorted" to parse the file
again and user the new fsck message "packedRefUnsorted(ERROR)" to report
to the user if the file is not sorted.
Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
"packed-backend.c::next_record" will parse the ref entry to check the
consistency. This function has already checked the following things:
1. Parse the main line of the ref entry to inspect whether the oid is
not correct. Then, check whether the next character is oid. Then
check the refname.
2. If the next line starts with '^', it would continue to parse the
peeled oid and check whether the last character is '\n'.
As we decide to implement the ref consistency check for "packed-refs",
let's port these two checks and update the test to exercise the code.
Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
shejialuo [Thu, 27 Feb 2025 16:07:00 +0000 (00:07 +0800)]
packed-backend: check whether the refname contains NUL characters
"packed-backend.c::next_record" will use "check_refname_format" to check
the consistency of the refname. If it is not OK, the program will die.
However, it is reported in [1], we cannot catch some corruption. But we
already have the code path and we must miss out something.
In the above code, `p` is the start pointer of the refname and `eol` is
the next newline pointer. We calculate the length of the refname by
subtracting the two pointers. Then we add the memory range between `p`
and `eol` to get the refname.
However, if there are some NUL characters in the memory range between `p`
and `eol`, we will see the refname as a valid ref name as long as the
memory range between `p` and first occurred NUL character is valid.
In order to catch above corruption, create a new function
"refname_contains_nul" by searching the first NUL character. If it is
not at the end of the string, there must be some NUL characters in the
refname.
Use this function in "next_record" function to die the program if
"refname_contains_nul" returns true.
Reported-by: R. Diez <rdiez-temp3@rd10.de> Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In "packed-backend.c::create_snapshot", if there is a header (the line
which starts with '#'), we will check whether the line starts with "#
pack-refs with: ". However, we need to consider other situations and
discuss whether we need to add checks.
1. If the header does not exist, we should not report an error to the
user. This is because in older Git version, we never write header in
the "packed-refs" file. Also, we do allow no header in "packed-refs"
in runtime.
2. If the header content does not start with "# packed-ref with: ", we
should report an error just like what "create_snapshot" does. So,
create a new fsck message "badPackedRefHeader(ERROR)" for this.
3. If the header content is not the same as the constant string
"PACKED_REFS_HEADER". This is expected because we make it extensible
intentionally and runtime "create_snapshot" won't complain about
unknown traits. In order to align with the runtime behavior. There is
no need to report.
As we have analyzed, we only need to check the case 2 in the above. In
order to do this, use "open_nofollow" function to get the file
descriptor and then read the "packed-refs" file via "strbuf_read". Like
what "create_snapshot" and other functions do, we could split the line
by finding the next newline in the buffer. When we cannot find a
newline, we could report an error.
So, create a function "packed_fsck_ref_next_line" to find the next
newline and if there is no such newline, use
"packedRefEntryNotTerminated(ERROR)" to report an error to the user.
Then, parse the first line to apply the checks. Update the test to
exercise the code.
Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
shejialuo [Thu, 27 Feb 2025 16:06:40 +0000 (00:06 +0800)]
packed-backend: check if header starts with "# pack-refs with: "
We always write a space after "# pack-refs with:" but we don't align
with this rule in the "create_snapshot" method where we would check
whether header starts with "# pack-refs with:". It might seem that we
should undoubtedly tighten this rule, however, we don't have any
technical documentation about this and there is a possibility that we
would break the compatibility for other third-party libraries.
By investigating influential third-party libraries, we could conclude
how these libraries handle the header of "packed-refs" file:
1. libgit2 is fine and always writes the space. It also expects the
whitespace to exist.
2. JGit does not expect th header to have a trailing space, but expects
the "peeled" capability to have a leading space, which is mostly
equivalent because that capability is typically the first one we
write. It always writes the space.
3. gitoxide expects the space t exist and writes it.
4. go-git doesn't create the header by default.
As many third-party libraries expect a single space after "# pack-refs
with:", if we forget to write the space after the colon,
"create_snapshot" won't catch this. And we would break other
re-implementations. So, we'd better tighten the rule by checking whether
the header starts with "# pack-refs with: ".
Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
shejialuo [Thu, 27 Feb 2025 16:06:24 +0000 (00:06 +0800)]
packed-backend: check whether the "packed-refs" is regular file
Although "git-fsck(1)" and "packed-backend.c" will check some
consistency and correctness of "packed-refs" file, they never check the
filetype of the "packed-refs". Let's verify that the "packed-refs" has
the expected filetype, confirming it is created by "git pack-refs"
command.
We could use "open_nofollow" wrapper to open the raw "packed-refs" file.
If the returned "fd" value is less than 0, we could check whether the
"errno" is "ELOOP" to report an error to the user. And then we use
"fstat" to check whether the "packed-refs" file is a regular file.
Reuse "FSCK_MSG_BAD_REF_FILETYPE" fsck message id to report the error to
the user if "packed-refs" is not a regular file.
Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
shejialuo [Thu, 27 Feb 2025 16:06:06 +0000 (00:06 +0800)]
builtin/refs: get worktrees without reading head information
In "packed-backend.c", there are some functions such as "create_snapshot"
and "next_record" which would check the correctness of the content of
the "packed-ref" file. When anything is bad, the program will die.
It may seem that we have nothing relevant to above feature, because we
are going to read and parse the raw "packed-ref" file without creating
the snapshot and using the ref iterator to check the consistency.
However, when using "get_worktrees" in "builtin/refs", we would parse
the "HEAD" information. If the referent of the "HEAD" is inside the
"packed-ref", we will call "create_snapshot" function to parse the
"packed-ref" to get the information. No matter whether the entry of
"HEAD" in "packed-ref" is correct, "create_snapshot" would call
"verify_buffer_safe" to check whether there is a newline in the last
line of the file. If not, the program will die.
Although this behavior has no harm for the program, it will
short-circuit the program. When the users execute "git refs verify" or
"git fsck", we should avoid reading the head information, which may
execute the read operation in packed backend with stricter checks to die
the program. Instead, we should continue to check other parts of the
"packed-refs" file completely.
Fortunately, in 465a22b338 (worktree: skip reading HEAD when repairing
worktrees, 2023-12-29), we have introduced a function
"get_worktrees_internal" which allows us to get worktrees without
reading head information.
Create a new exposed function "get_worktrees_without_reading_head", then
replace the "get_worktrees" in "builtin/refs" with the new created
function.
Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
shejialuo [Thu, 27 Feb 2025 16:05:55 +0000 (00:05 +0800)]
t0602: use subshell to ensure working directory unchanged
For every test, we would execute the command "cd repo" in the first but
we never execute the command "cd .." to restore the working directory.
However, it's either not a good idea use above way. Because if any test
fails between "cd repo" and "cd ..", the "cd .." will never be reached.
And we cannot correctly restore the working directory.
Let's use subshell to ensure that the current working directory could be
restored to the correct path.
Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitlab-ci: fix "msvc-meson" test job succeeding despite test failures
We have recently noticed that the "msvc-meson" test job in GitLab CI
succeeds even if there are failures. This is somewhat puzzling because
we use exactly the same command as we do on GitHub Actions, and there
the jobs fail as exected.
As it turns out, this is another weirdness of the GitLab CI hosted
runner for Windows [1]: by default, even successful commands will not
make the job fail. Interestingly though, this depends on what exactly
the command is that you're running -- the MinGW-based job for example
works alright and does fail as expected.
The root cause here seems to be specific behaviour of PowerShell. The
invocation of `ForEach-Object` does not bubble up any errors in case the
invocation of `meson test` fails, and thus we don't notice the error.
This is specific to executing the command in a loop: other build steps
where we execute commands directly fail as expected.
This is because the specific version of PowerShell that we use in the
runner does not know about `PSNativeCommandUseErrorActionPreference`
yet, which controls whether native commands like "meson.exe" honor the
`ErrorActionPreference` variable. The preference has been introduced
with PowerShell 7.3 and is default-enabled since PowerShell 7.4, but
GitLab's hosted runners still seem to use PowerShell 5.1. Consequently,
when tests fail, we won't bubble up the error at all from the loop and
thus the job doesn't fail. This isn't an issue in other cases though
where we execute native commands directly, as the GitLab runner knows to
check the last error code after every command.
The same thing doesn't seem to be an issue on GitHub Actions, most
likely because it uses PowerShell 7.4. Curiously, the preference for
`PSNativeCommandUseErrorActionPreference` is disabled there, but the
jobs fail as expected regardless of that. It's puzzling, but I do not
have enough PowerShell expertise to give a definitive answer as to why
it works there.
In any case, Meson 1.8 will likely get support for slicing tests [1], so
we can eventually get rid of the whole PowerShell script. For now, work
around the issue by explicitly exiting out of the loop with a non-zero
error code if we see that Meson has failed.
gitlab-ci: restrict maximum number of link jobs on Windows
The hosted Windows runners on GitLab.com only have 7.5GB of RAM. Given
that "link.exe" provided by Microsoft Visual Studio is multi-threaded by
itself already and thus quite memory hungry this can quickly lead to
memory starvation, out-of-memory situations and thus failed CI jobs.
Fix the issue by limiting the number of concurrent linker jobs. The same
issue hasn't been observed on GitHub Actions yet, probably because it
got more than twice the amount of RAM with 16GB.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
meson: consistently use custom program paths to resolve programs
The calls to `find_program()` in our documentation don't use our custom
program path. This variable gets populated on Windows with the location
of Git for Windows so that we can use it to provide our build tools.
Consequently, we may not be able to find all necessary binaries on
Windows.
Adapt the calls to use the program path to fix this. While at it, drop
`required: true` arguments, which are the default anyway.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're assigning the `git` variable in three places:
- In "meson.build" to store the external Git executable.
- In "meson.build" to store the compiled Git executable.
- In "Documentation/meson.build" to store the external Git executable,
a second time.
The last case is only needed because we overwrite the original variable
with the built version. Rename the variable used for the built Git
executable so that we don't have to resolve the external Git executable
multiple times.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're searching for the sed(1) executable in a loop, which will make us
try to find it multiple times. Starting with the preceding commit we
already declare a variable for that program in the top-level build file.
Use it so that we only need to search for the program once.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
meson: improve handling of `sane_tool_path` option
The `sane_tool_path` option can be used to override the PATH variable
from which the build process, tests and ultimately Git will end up
picking programs from. It is currently lacking though because we only
use it to populate the PATH environment variable for executed scripts
and for the `BROKEN_PATH_FIX` mechanism, but we don't use it to find
programs used in the build process itself.
Fix this issue by treating it similar to the Windows-specific paths,
which will make us use it both to find programs and to populate the PATH
environment variable.
To help with this fix, change the type of the option to be an array of
paths, which makes the handling a bit easier for us. It's also the
correct thing to do as the input indeed is a list of paths.
Furthermore, the option now overrides the default behaviour on Windows,
which si to pick up tools from Git for Windows. This is done so that it
becomes easier to override that default behaviour in case it's not
desired.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When locating programs required for the build we give some special
treatment to Windows systems so that we know to also look up tools
provided by a Git for Windows installation. This ensures that the build
doesn't have any prerequisites other than Microsoft Visual Studio, Meson
and Git for Windows.
Consequently, some of the programs returned by `find_program()` may not
be found via PATH, but via these extra directories. But while Meson can
use these tools directly without any special treatment, any scripts that
we execute may not be able to find those programs. To help them we thus
prepend the directories of a subset of the found programs to PATH.
This doesn't make much sense though: we don't need to prepend PATH for
any program that was found via PATH, but we really only need to do so
for programs located via the extraneous Windows-specific paths. So
instead of prepending all programs paths, we really only need to prepend
the Windows-specific paths.
Adapt the code accordingly by only prepeding Windows-specific paths to
PATH, which both simplifies the code and clarifies intent.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When building `libgit.a` we link it against a `libgit_version.a` library
that contains the version information that we inject at build time. The
intent of this is to avoid rebuilding all of `libgit.a` whenever the
version changes. But that wouldn't happen in the first place, as we know
to just rebuild the files that depend on the generated "version-def.h"
file.
This is an artifact of an earlier version of the Meson build infra that
didn't ultimately land. We didn't yet have "version-def.h", and instead
injected the version via preprocessor directives. And here we would have
rebuilt all of `libgit.a` indeed in case the version changes, because
the preprocessor directive applied to all files.
Stop building the separate library and instead add "version-def.h" to
the list of source files directly.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We set up libcurl via the `libgit_dependencies` variable, which gets
propagated into every user of the `libgit` dependency. This is not
necessary though, as most of our executables aren't even supposed to
link against libcurl.
Fix this by only propagating include directories as a libgit dependency
and propagating the full curl dependency via `libgit_curl`.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We've got a set of common source files that we use for those executables
that link against libcurl. The setup is somewhat repetitive though.
Simplify it by declaring a `libgit_curl` dependency that bundles all of
it together.
Note that we don't include curl itself as a dependency. This is because
we already pull it in transitively via the libgit dependency, which is
unfortunate because libgit itself shouldn't actually link against curl
in the first place. This will get fixed in the next commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "common-main.c" file is used by multiple executables. In order to
make it easy to set it up we have created a separate library that these
executables can link against. All of these executables also want to link
against `libgit.a` though, which makes it necessary to specify both of
these as dependencies for every executable.
Simplify this a bit by declaring the library as a source dependency:
instead of creating a static library, we now instead compile the common
set of files into each executable separately.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When setting up `libgit.a` we first create the static library itself,
and then declare it as part of a dependency such that compile arguments,
include directories and transitive dependencies get propagated to the
users of that library. As such, the static library isn't expected to be
used by anything but the declared dependency.
Inline the static library so that we don't even use a separate variable
for it. This avoids any kind of confusion that may arise and clarifies
how the library is supposed to be used.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
meson: fix OpenSSL fallback when not explicitly required
When OpenSSL isn't provided by the system we know to fall back to the
subproject wrapper. This is especially helpful on Windows systems, where
you typically don't have OpenSSL available, in order to reduce the
number of required dependencies.
The fallback is broken though when the OpenSSL backend is set to 'auto'
as we end up calling `dependency('openssl', required: false)` in that
case, which implicitly disables falling back to the wrapper.
Fix the issue by re-allowing the fallback in case either OpenSSL is
required or in case the backend is set to 'auto'. While at it, fix
reporting of the backend in case the user asked us to pick no HTTPS
backend at all.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the runtime prefix option is enabled, Git is built such that it
knows to locate its binaries relative to the directory a binary is being
executed from. This requires us to figure out relative paths, which is
handled in `system_prefix()` by trying to strip a couple of well-known
paths.
One of these paths, GIT_EXEC_PATH, is expected to be absolute when
runtime prefixes are enabled, but relative otherwise. And while our
Makefile gets this correctly, in Meson we always wire up the absolute
path, which may result in us not being able to find binaries.
Fix this by conditionally injecting the paths depending on whether or
not the `runtime_prefix` option is enabled.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Submodule merges are, in general, similar to other merges based on oid
three-way-merge. When a conflict happens, however, Git has two special
cases (introduced in 68d03e4a6e44) on handling the conflict before
yielding it to the user. From the merge-ort and merge-recursive sources:
- "Case #1: a is contained in b or vice versa": both strategies try to
perform a fast-forward in the submodules if the commit referred by the
conflicted submodule is descendant of another;
- "Case #2: There are one or more merges that contain a and b in the
submodule. If there is only one, then present it as a suggestion to the
user, but leave it marked unmerged so the user needs to confirm the
resolution."
Add a small paragraph on merge-strategies.adoc describing this behavior.
Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 25 Feb 2025 23:45:42 +0000 (15:45 -0800)]
BreakingChanges: clarify branches/ and remotes/
As we have created an empty .git/branches/ hierarchy until fairly
recently, these directories may be found in modern repositories, but
it is highly unlikely that they are being used.
Reported-by: Jakub Wilk <jwilk@jwilk.net> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 25 Feb 2025 22:19:36 +0000 (14:19 -0800)]
Merge branch 'ad/set-default-target-in-makefiles'
Correct the default target in Documentation/Makefile, and
future-proof all Makefiles from similar breakages by declaring the
default target (which happens to be "all") upfront.
* ad/set-default-target-in-makefiles:
Makefile: set default goals in makefiles
Junio C Hamano [Tue, 25 Feb 2025 22:19:36 +0000 (14:19 -0800)]
Merge branch 'pw/merge-tree-stdin-deadlock-fix'
"git merge-tree --stdin" has been improved (including a workaround
for a deadlock).
* pw/merge-tree-stdin-deadlock-fix:
merge-tree: fix link formatting in html docs
merge-tree: improve docs for --stdin
merge-tree: only use basic merge config
merge-tree: remove redundant code
merge-tree --stdin: flush stdout to avoid deadlock
Junio C Hamano [Tue, 25 Feb 2025 22:19:35 +0000 (14:19 -0800)]
Merge branch 'da/xdiff-w-sign-compare-workaround'
Noises from "-Wsign-compare" in the borrowed xdiff code has been
squelched.
* da/xdiff-w-sign-compare-workaround:
xdiff: avoid signed vs. unsigned comparisons in xutils.c
xdiff: avoid signed vs. unsigned comparisons in xpatience.c
xdiff: avoid signed vs. unsigned comparisons in xhistogram.c
xdiff: avoid signed vs. unsigned comparisons in xemit.c
xdiff: avoid signed vs. unsigned comparisons in xdiffi.c
xdiff: move sign comparison warning guard into each file
Seyi Kuforiji [Tue, 25 Feb 2025 10:10:44 +0000 (11:10 +0100)]
t/unit-tests: convert oidtree test to use clar test framework
Adapt oidtree test script to clar framework by using clar assertions
where necessary. `cl_parse_any_oid()` ensures the hash algorithm is set
before parsing. This prevents issues from an uninitialized or invalid
hash algorithm.
Introduce 'test_oidtree__initialize` handles the to set up of the global
oidtree variable and `test_oidtree__cleanup` frees the oidtree when all
tests are completed.
With this change, `check_each` stops at the first error encountered,
making it easier to address it.
Mentored-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Seyi Kuforiji [Tue, 25 Feb 2025 10:10:43 +0000 (11:10 +0100)]
t/unit-tests: convert oidmap test to use clar test framework
Adapt oidmap test script to clar framework by using clar assertions
where necessary. `cl_parse_any_oid()` ensures the hash algorithm is set
before parsing. This prevents issues from an uninitialized or invalid
hash algorithm.
Introduce 'test_oidmap__initialize` handles the to set up of the global
oidmap map with predefined key-value pairs, and `test_oidmap__cleanup`
frees the oidmap and its entries when all tests are completed.
The test loops through all entries to detect multiple errors. With this
change, it stops at the first error encountered, making it easier to
address it.
Mentored-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Seyi Kuforiji [Tue, 25 Feb 2025 10:10:42 +0000 (11:10 +0100)]
t/unit-tests: convert oid-array test to use clar test framework
Adapt oid-array test script to clar framework by using clar assertions
where necessary. Remove descriptions from macros to reduce
redundancy, and move test input arrays to global scope for reuse across
multiple test functions. Introduce `test_oid_array__initialize()` to
explicitly initialize the hash algorithm.
These changes streamline the test suite, making individual tests
self-contained and reducing redundant code.
Mentored-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Seyi Kuforiji [Tue, 25 Feb 2025 10:10:41 +0000 (11:10 +0100)]
t/unit-tests: implement clar specific oid helper functions
`get_oid_arbitrary_hex()` and `init_hash_algo()` are both required for
oid-related tests to run without errors. In the current implementation,
both functions are defined and declared in the
`t/unit-tests/lib-oid.{c,h}` which is utilized by oid-related tests in
the homegrown unit tests structure.
Adapt functions in lib-oid.{c,h} to use clar. Both these functions
become available for oid-related test files implemented using the clar
testing framework, which requires them. This will be used by subsequent
commits.
Mentored-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 25 Feb 2025 06:34:21 +0000 (01:34 -0500)]
unpack_loose_rest(): rewrite return handling for clarity
We have a pattern like:
if (error1)
...handle error 1...
else if (error2)
...handle error 2...
else
...return buf...
...free buf and return NULL...
This is a little subtle because it is the return in the success block
that lets us skip the common error handling. Rewrite this instead to
free the buffer in each error path, marking it as NULL, and then all
code paths can use the common return.
This should make the logic a bit easier to follow. It does mean
duplicating the buf cleanup for errors, but it's a single line.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 25 Feb 2025 06:33:51 +0000 (01:33 -0500)]
unpack_loose_rest(): simplify error handling
Inflating a loose object is considered successful only if we got
Z_STREAM_END and there were no more bytes. We check both of those
conditions and return success, but then have to check them a second time
to decide which error message to produce.
I.e., we do something like this:
if (!error_1 && !error_2)
...return success...
if (error_1)
...handle error1...
else if (error_2)
...handle error2...
...common error handling...
This repetition was the source of a small bug fixed in an earlier commit
(our Z_STREAM_END check was not the same in the two conditionals).
Instead we can chain them all into a single if/else cascade, which
avoids repeating ourselves:
if (error_1)
...handle error1...
else if (error_2)
...handle error2....
else
...return success...
...common error handling...
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 25 Feb 2025 06:33:12 +0000 (01:33 -0500)]
unpack_loose_rest(): never clean up zstream
The unpack_loose_rest() function has funny ownership semantics: we pass
in a z_stream opened by the caller, but then only _sometimes_ close it.
This oddity has developed over time. When the function was originally
split out in 5180cacc20 (Split up unpack_sha1_file() some more,
2005-06-02), it always called inflateEnd() to clean up the stream
(though nowadays it is a git_zstream and we call git_inflate_end()).
But in 7efbff7531 (unpack_sha1_file(): detect corrupt loose object
files., 2007-03-05) we added error code paths which don't close the
stream. This makes some sense, as we'd still look at parts of the stream
struct to decide which error to show (though I am not sure in practice
if inflateEnd() even touches those fields).
This subtlety makes it hard to know when the caller has to clean up the
stream and when it does not. That led to the leak fixed by aa9ef614dc
(object-file: fix memory leak when reading corrupted headers,
2024-08-14).
Let's instead always leave the stream intact, forcing the caller to
clean it up. You might think that would create more work for the
callers, but it actually ends up simplifying them, since they can put
the call to git_inflate_end() in the common cleanup code path.
Two things to note, though:
- The check_stream_oid() function is used as a replacement for
unpack_loose_rest() in read_loose_object() to read blobs. It
inherited the same funny semantics, and we should fix it here, too
(to keep the cleanup in read_loose_object() consistent).
- In read_loose_object() we need a second "out" label, as we can jump
to the existing label before opening the stream at all (and since
the struct is opaque, there is no way to if it was initialized or
not, so we must not call git_inflate_end() in that case).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 25 Feb 2025 06:31:15 +0000 (01:31 -0500)]
unpack_loose_rest(): avoid numeric comparison of zlib status
When unpacking the actual content of a loose object file, we insist both
that the status code we got is Z_STREAM_END, and that we consumed all
bytes.
If we didn't, we'll return an error, but the specific error message we
produce depends on which of the two error conditions we saw. So we'll
check both a second time to decide which error to produce. But this
second time, our status code check is loose: it checks for a negative
status value.
This can get confused by zlib codes which are not negative, such as
Z_NEED_DICT. In this case we'd erroneously print nothing at all, when we
should say "corrupt loose object".
Instead, this second check should check explicitly against Z_STREAM_END.
Note that Z_OK is "0", so the existing code also produced no message for
Z_OK. But it's impossible to see that status, since we only break out of
the inflate loop when we stop seeing Z_OK (so a stream which has more
bytes than its object header claims would eventually yield Z_BUF_ERROR).
There's no test here, as it would require a loose object whose zlib
stream returns Z_NEED_DICT in the middle of the object content. I think
that is probably possible, but even our Z_NEED_DICT test in t1006 does
not trigger this, because we hit that error while reading the header. I
found this bug while reviewing all callers of git_inflate() for bugs
similar to the one we saw in unpack_loose_header(). This was the only
other case that did a numeric comparison rather than explicitly checking
for Z_STREAM_END.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 25 Feb 2025 06:30:56 +0000 (01:30 -0500)]
unpack_loose_header(): avoid numeric comparison of zlib status
When unpacking a loose header, we try to inflate the first 32 bytes.
We'd expect either Z_OK (we filled up the output buffer, but there are
more bytes in the object) or Z_STREAM_END (this is a tiny object whose
header and content fit in the buffer).
We check for that with "if (status < Z_OK)", making the assumption that
all of the errors we'd see have negative values (as Z_OK itself is "0",
and Z_STREAM_END is "1").
But there's at least one case this misses: Z_NEED_DICT is "2". This
isn't something we'd ever expect to see, but if we do see it, we should
consider it an error (since we have no dictionary to load).
Instead, the current code interprets Z_NEED_DICT as success and looks
for the object header's terminating NUL in the bytes we've read. This
will generaly be zero bytes if the dictionary is mentioned at the start
of the stream. So we'll fail to find it and complain "the header is too
long" (ULHR_LONG). But really, the problem is that the object is
malformed, and we should return ULHR_BAD.
This is a minor bug, as we consider both cases to be an error. But it
does mean we print the wrong error message. The test case added in the
previous patch triggers this code, so we can just confirm the error
message we see here.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 25 Feb 2025 06:30:26 +0000 (01:30 -0500)]
git_inflate(): skip zlib_post_call() sanity check on Z_NEED_DICT
This fixes a case where malformed object input can cause us to hit a
BUG() call in the git-zlib.c code.
The zlib format allows the use of preset dictionaries to reduce the size
of deflated data. The checksum of the dictionary is computed by the
deflate code and goes into the stream. On the inflating side, zlib sees
the dictionary checksum and returns Z_NEED_DICT, asking the caller to
provide the dictionary data via inflateSetDictionary().
This should never happen in Git, because we never provide a dictionary
for deflating (and if we get a stream that mentions a dictionary, we
have no idea how to provide it). So normally Z_NEED_DICT is a hard error
for us. But something interesting happens if we _do_ happen to see it
(e.g., because of a corrupt or malicious input).
In git_inflate() as we loop over calls to zlib's inflate(), we translate
between our large-integer git_zstream values and zlib's native z_stream
types, copying in and out with zlib_pre_call() and zlib_post_call(). In
zlib_post_call() we have a few sanity checks, including one that checks
that the number of bytes consumed by zlib (as measured by it moving the
"next_in" pointer) is equal to the movement of its "total_in" count.
But these do not correspond when we see Z_NEED_DICT! Zlib consumes the
bytes from the input buffer but it does not increment total_in. And so
we hit the BUG("total_in mismatch") call.
There are a few options here:
- We could ditch that BUG() check. It is making too many assumptions
about how zlib updates these values. But it does have value in most
cases as a sanity check on the values we're copying.
- We could skip the zlib_post_call() entirely when we see Z_NEED_DICT.
We know that it's hard error for us, so we should just send the
status up the stack and let the caller bail.
The downside is that if we ever did want to support dictionaries,
we couldn't (the git_zstream will be out of sync, since we never
copied its values back from the z_stream).
- We could continue to call zlib_post_call(), but skip just that BUG()
check if the status is Z_NEED_DICT. This keeps git_inflate() as a
thin wrapper around inflate(), and would let us later support
dictionaries for some calls if we wanted to.
This patch uses the third approach. It seems like the least-surprising
thing to keep git_inflate() a close to inflate() as possible. And while
it makes the diff a bit larger (since we have to pass the status down to
to the zlib_post_call() function), it's a static local function, and
every caller by definition will have just made a zlib call (and so will
have a status integer).
Co-authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 25 Feb 2025 06:29:58 +0000 (01:29 -0500)]
unpack_loose_header(): fix infinite loop on broken zlib input
When reading a loose object, we first try to expand the first 32 bytes
to read the type+size header. This is enough for any of the normal Git
types. But since 46f034483e (sha1_file: support reading from a loose
object of unknown type, 2015-05-03), the caller can also ask us to parse
any unknown names, which can be much longer. In this case we keep
inflating until we find the NUL at the end of the header, or hit
Z_STREAM_END.
But what if zlib can't make forward progress? For example, if the loose
object file is truncated, we'll have no more data to feed it. It will
return Z_BUF_ERROR, and we'll just loop infinitely, calling
git_inflate() over and over but never seeing new bytes nor an
end-of-stream marker.
We can fix this by only looping when we think we can make forward
progress. This will always be Z_OK in this case. In other code we might
also be able to continue on Z_BUF_ERROR, but:
- We will never see Z_BUF_ERROR because the output buffer is full; we
always feed a fresh 32-byte buffer on each call to git_inflate().
- We may see Z_BUF_ERROR if we run out of input. But since we've fed
the whole mmap'd buffer to zlib, if it runs out of input there is
nothing more we can do.
So if we don't see Z_OK (and didn't see the end-of-header NUL, otherwise
we'd have broken out of the loop), then we should stop looping and
return an error.
The test case shows an example where the input is truncated (which gives
us the input Z_BUF_ERROR case above).
Although we do operate on objects we might get from an untrusted remote,
I don't think the security implications of this bug are too great. It
can only trigger if both of these are true:
- You're reading a loose object whose on-disk representation was
written by an attacker. So fetching an object (or receiving a push)
are mostly OK, because even with unpack-objects it is our local,
trusted code that writes out the object file. The exception may be
fetching from an untrusted local repo, or using dumb-http, which
copies objects verbatim. But...
- The only code path which triggers the inflate loop is cat-file's
--allow-unknown-type option. This is unlikely to be called at all
outside of debugging. But I also suspect that objects with
non-standard types (or that are truncated) would not survive the
usual fetch/receive checks in the first place.
So I think it would be quite hard to trick somebody into running the
infinite loop, and we can just fix the bug.
Co-authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 25 Feb 2025 06:29:40 +0000 (01:29 -0500)]
unpack_loose_header(): report headers without NUL as "bad"
If a caller asks us to read the whole loose object header value into a
strbuf (e.g., via "cat-file --allow-unknown-type"), we'll keep reading
until we see a NUL byte marking the end of the header.
If we hit Z_STREAM_END before seeing the NUL, we obviously have to stop.
But we return ULHR_TOO_LONG, which doesn't make any sense. The "too
long" return code is used in the normal, 32-byte limited mode to
indicate that we stopped looking. There is no such thing as "too long"
here, as we'd keep reading forever until we see the end of stream or the
NUL.
Instead, we should return ULHR_BAD. The loose object has no NUL marking
the end of header, so it is malformed. The behavior difference is
slight; in either case we'd consider the object unreadable and refuse to
go further. The only difference is the specific error message we
produce.
There's no test case here, as we'd need to generate a valid zlib stream
without a NUL. That's not something Git will do without writing new
custom code. And in the next patch we'll fix another bug in this area
which will make this easier to do (and we will test it then).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using OBJECT_INFO_ALLOW_UNKNOWN_TYPE to unpack a header that
doesn't fit into our initial 32-byte buffer, we loop over calls
git_inflate(), feeding it our buffer to the "next_out" pointer each
time. As the code is written, we reset next_out after each inflate call
(and after reading the output), ready for the next loop.
This isn't wrong, but there are a few advantages to setting up
"next_out" right before each inflate call, rather than after:
1. It drops a few duplicated lines of code.
2. It makes it obvious that we always feed a fresh buffer on each call
(and thus can never see Z_BUF_ERROR due to due to a lack of output
space).
3. After we exit the loop, we'll leave stream->next_out pointing to
the end of the fetched data (this is how zlib callers find out how
much data is in the buffer). This doesn't matter in practice, since
nobody looks at it again. But it's probably the least-surprising
thing to do, as it matches how next_out is left when the whole
thing fits in the initial 32-byte buffer (and we don't enter the
loop at all).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 25 Feb 2025 06:28:24 +0000 (01:28 -0500)]
loose_object_info(): BUG() on inflating content with unknown type
After unpack_loose_header() returns, it will have inflated not only the
object header, but possibly some bytes of the object content. When we
call unpack_loose_rest() to extract the actual content, it finds those
extra bytes by skipping past the header's terminating NUL in the buffer.
Like this:
int bytes = strlen(buffer) + 1;
n = stream->total_out - bytes;
...
memcpy(buf, (char *) buffer + bytes, n);
This won't work with the OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag, as there
we allow a header of arbitrary size. We put into a strbuf, but feed only
the final 32-byte chunk we read to unpack_loose_rest(). In that case
stream->total_out may unexpectedly large, and thus our "n" will be
large, causing an out-of-bounds read (we do check it against our
allocated buffer size, which prevents an out-of-bounds write).
Probably this could be made to work by feeding the strbuf to
unpack_loose_rest(), along with adjusting some types (e.g., "bytes"
would need to be a size_t, since it is no longer operating on a 32-byte
buffer).
But I don't think it's possible to actually trigger this in practice.
The only caller who passes ALLOW_UNKNOWN_TYPE is cat-file, which only
allows it with the "-t" and "-s" options (neither of which access the
content). There is one way you can _almost_ trigger it: the oid compat
routines (i.e., accessing sha1 via sha256 names and vice versa) will
convert objects on the fly (which requires access to the content) using
the same flags that were passed in. So in theory this:
t='some very large type field that causes an extra inflate call'
sha1_oid=$(git hash-object -w -t "$t" file)
sha256_oid=$(git rev-parse --output-object-format=sha256 $sha1_oid)
git cat-file --allow-unknown-type -s $sha256_oid
would try to access the content. But it doesn't work, because using
compat objects requires an entry in the .git/objects/loose-object-idx
file, and we don't generate such an entry for non-standard types (see
the "compat" section of write_object_file_literally()).
If we use "t=blob" instead, then it does access the compat object, but
it doesn't trigger the problem (because "blob" is a standard short type
name, and it fits in the initial 32-byte buffer).
So given that this is almost a memory error bug, I think it's worth
addressing. But because we can't actually trigger the situation, I'm
hesitant to try a fix that we can't run. Instead let's document the
restriction and protect ourselves from the out-of-bounds read by adding
a BUG() check.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>