Jeff King [Wed, 2 Oct 2024 23:26:18 +0000 (19:26 -0400)]
hash.h: set NEEDS_CLONE_HELPER_UNSAFE in fallback mode
Commit 253ed9ecff (hash.h: scaffolding for _unsafe hashing variants,
2024-09-26) introduced the concept of having two hash algorithms: a safe
and an unsafe one. When the Makefile knobs do not explicitly request an
unsafe one, we fall back to using the safe algorithm.
However, the fallback to do so forgot one case: we should inherit the
NEEDS_CLONE_HELPER flag from the safe variant. Failing to do so means
that we'll end up defining two clone functions (the algorithm specific
one, and the generic one that just calls memcpy). You'll see an error
like this:
$ make OPENSSL_SHA1=1
[...]
sha1/openssl.h:46:29: error: redefinition of ‘openssl_SHA1_Clone’
46 | #define platform_SHA1_Clone openssl_SHA1_Clone
| ^~~~~~~~~~~~~~~~~~
hash.h:83:40: note: in expansion of macro ‘platform_SHA1_Clone’
83 | # define platform_SHA1_Clone_unsafe platform_SHA1_Clone
| ^~~~~~~~~~~~~~~~~~~
hash.h:101:33: note: in expansion of macro ‘platform_SHA1_Clone_unsafe’
101 | # define git_SHA1_Clone_unsafe platform_SHA1_Clone_unsafe
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
hash.h:133:20: note: in expansion of macro ‘git_SHA1_Clone_unsafe’
133 | static inline void git_SHA1_Clone_unsafe(git_SHA_CTX_unsafe *dst,
| ^~~~~~~~~~~~~~~~~~~~~
sha1/openssl.h:37:20: note: previous definition of ‘openssl_SHA1_Clone’ with type ‘void(struct openssl_SHA1_CTX *, const struct openssl_SHA1_CTX *)’
37 | static inline void openssl_SHA1_Clone(struct openssl_SHA1_CTX *dst,
| ^~~~~~~~~~~~~~~~~~
This only matters when compiling with openssl as the "safe" variant,
since it's the only algorithm that requires a clone helper (and even
then, only if you are using openssl 3.0+). And you should never do that,
because it's not safe. But still, the invocation above used to work and
should continue to do so until we decide to require a
collision-detecting variant for the safe algorithm entirely.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Thu, 3 Oct 2024 15:51:01 +0000 (17:51 +0200)]
archive: fix misleading error message
The error message added by 296743a7ca (archive: load index before
pathspec checks, 2024-09-21) is misleading: unpack_trees() is not
touching the working tree at all here, but just loading a tree into
the index. Correct it.
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Thu, 3 Oct 2024 11:58:42 +0000 (11:58 +0000)]
line-log: protect inner strbuf from free
The output_prefix() method in line-log.c may call a function pointer via
the diff_options struct. This function pointer returns a strbuf struct
and then its buffer is passed back. However, that implies that the
consumer is responsible to free the string. This is especially true
because the default behavior is to duplicate the empty string.
The existing functions used in the output_prefix pointer include:
1. idiff_prefix_cb() in diff-lib.c. This returns the data pointer, so
the value exists across multiple calls.
2. diff_output_prefix_callback() in graph.c. This uses a static strbuf
struct, so it reuses buffers across calls. These should not be
freed.
3. output_prefix_cb() in range-diff.c. This is similar to the
diff-lib.c case.
In each case, we should not be freeing this buffer. We can convert the
output_prefix() function to return a const char pointer and stop freeing
the result.
This choice is essentially the opposite of what was done in 394affd46d
(line-log: always allocate the output prefix, 2024-06-07).
This was discovered via 'valgrind' while investigating a public report
of a bug in 'git log --graph -L' [1].
This issue would have been caught by the new test, when Git is compiled
with ASan to catch these double frees.
Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mike Hommey [Wed, 2 Oct 2024 20:01:40 +0000 (05:01 +0900)]
utf8.h: squelch unused-parameter warnings with NO_ICONV
Since DEVELOPER=YesPlease build enables -Wunused-parameter warnings
these days, the fallback definition for reencode_string_len() that
did not touch any of its parameters but one needs to be annotated
properly.
Signed-off-by: Mike Hommey <mh@glandium.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable library uses pluggable allocators, which means that we
shouldn't ever use the standard allocator functions. But it is an easy
mistake to make to accidentally use e.g. free(3P) instead of the
reftable-specific `reftable_free()` function, and we do not have any
mechanism to detect this misuse right now.
Introduce a couple of macros that ban the standard allocators, similar
to how we do it in "banned.h".
Note that we do not ban the following two classes of functions:
- Macros like `FREE_AND_NULL()` or `REALLOC_ARRAY()`. As those expand
to code that contains already-banned functions we'd get a compiler
error even without banning those macros explicitly.
- Git-specific allocators like `xmalloc()` and friends. The primary
reason is that there are simply too many of them, so we're rather
aiming for best effort here. Furthermore, the eventual goal is to
make them unavailable in the reftable library place by not pulling
them in via "git-compat-utils.h" anymore.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have several calls to `FREE_AND_NULL()` in the reftable library,
which of course uses free(3P). As the reftable allocators are pluggable
we should rather call the reftable specific function, which is
`reftable_free()`.
Introduce a new macro `REFTABLE_FREE_AND_NULL()` and adapt the callsites
accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a small set of calls to free(3P) in the reftable library. As
the reftable allocators are pluggable we should rather call the reftable
specific function, which is `reftable_free()`.
Convert the code accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The tree interfaces of the reftable library handle both insertion and
searching of tree nodes with a single function, where the behaviour is
altered between the two via an `insert` bit. This makes it quit awkward
to handle allocation failures because on inserting we'd have to check
for `NULL` pointers and return an error, whereas on searching entries we
don't have to handle it as an allocation error.
Split up concerns of this function into two separate functions, one for
inserting entries and one for searching entries. This makes it easy for
us to check for allocation errors as `tree_insert()` should never return
a `NULL` pointer now. Adapt callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation failures in `block_writer_init()` and
`block_reader_init()`. This requires us to bubble up error codes into
`writer_reinit_block_writer()`. Adapt call sites accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/reader: handle allocation failures in `reader_init_iter()`
Handle allocation failures in `reader_init_iter()`. This requires us to
also adapt `reftable_reader_init_*_iterator()` to bubble up the new
error codes. Adapt callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/merged: handle allocation failures in `merged_table_init_iter()`
Handle allocation failures in `merged_table_init_iter()`. While at it,
merge `merged_iter_init()` into the function. It only has a single
caller and merging them makes it easier to handle allocation failures
consistently.
This change also requires us to adapt `reftable_stack_init_*_iterator()`
to bubble up the new error codes of `merged_table_iter_init()`. Adapt
callsites accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/writer: handle allocation failures in `reftable_new_writer()`
Handle allocation failures in `reftable_new_writer()`. Adapt the
function to return an error code to return such failures. While at it,
rename it to match our code style as we have to touch up every callsite
anyway.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/record: handle allocation failures on copy
Handle allocation failures when copying records. While at it, convert
from `xstrdup()` to `reftable_strdup()`. Adapt callsites to check for
error codes.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/basics: handle allocation failures in `parse_names()`
Handle allocation failures in `parse_names()` by returning `NULL` in
case any allocation fails. While at it, refactor the function to return
the array directly instead of assigning it to an out-pointer.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/basics: handle allocation failures in `reftable_calloc()`
Handle allocation failures in `reftable_calloc()`.
While at it, remove our use of `st_mult()` that would cause us to die on
an overflow. From the caller's point of view there is not much of a
difference between arguments that are too large to be multiplied and a
request that is too big to handle by the allocator: in both cases the
allocation cannot be fulfilled. And in neither of these cases do we want
the reftable library to die.
While we could use `unsigned_mult_overflows()` to handle the overflow
gracefully, we instead open-code it to further our goal of converting
the reftable codebase to become a standalone library that can be reused
by external projects.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable library provides the ability to swap out allocators. There
is a gap here though, because we continue to use `xstrdup()` even in the
case where all the other allocators have been swapped out.
Introduce `reftable_strdup()` that uses `reftable_malloc()` to do the
allocation.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/basics: merge "publicbasics" into "basics"
The split between "basics" and "publicbasics" is somewhat arbitrary and
not in line with how we typically structure code in the reftable
library. While we do indeed split up headers into a public and internal
part, we don't do that for the compilation unit itself. Furthermore, the
declarations for "publicbasics.c" are in "reftable-malloc.h", which
isn't in line with our naming schema, either.
Fix these inconsistencies by:
- Merging "publicbasics.c" into "basics.c".
- Renaming "reftable-malloc.h" to "reftable-basics.h" as the public
header.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable library does not use the same memory allocation functions
as the rest of the Git codebase. Instead, as the reftable library is
supposed to be usable as a standalone library without Git, it provides a
set of pluggable memory allocators.
Compared to `xmalloc()` and friends these allocators are _not_ expected
to die when an allocation fails. This design choice is concious, as a
library should leave it to its caller to handle any kind of error. While
it is very likely that the caller cannot really do much in the case of
an out-of-memory situation anyway, we are not the ones to make that
decision.
Curiously though, we never handle allocation errors even though memory
allocation functions are allowed to fail. And as we do not plug in Git's
memory allocator via `reftable_set_alloc()` either the consequence is
that we'd instead segfault as soon as we run out of memory.
While the easy fix would be to wire up `xmalloc()` and friends, it
would only fix the usage of the reftable library in Git itself. Other
users like libgit2, which is about to revive its efforts to land a
backend for reftables, wouldn't be able to benefit from this solution.
Instead, we are about to do it the hard way: adapt all allocation sites
to perform error checking. Introduce a new error code for out-of-memory
errors that we will wire up in subsequent steps.
This commit also serves as the motivator for all the remaining steps in
this series such that we do not have to repeat the same arguments in
every single subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 2 Oct 2024 14:46:27 +0000 (07:46 -0700)]
Merge branch 'tb/weak-sha1-for-tail-sum'
The checksum at the tail of files are now computed without
collision detection protection. This is safe as the consumer of
the information to protect itself from replay attacks checks for
hash collisions independently.
* tb/weak-sha1-for-tail-sum:
csum-file.c: use unsafe SHA-1 implementation when available
Makefile: allow specifying a SHA-1 for non-cryptographic uses
hash.h: scaffolding for _unsafe hashing variants
sha1: do not redefine `platform_SHA_CTX` and friends
pack-objects: use finalize_object_file() to rename pack/idx/etc
finalize_object_file(): implement collision check
finalize_object_file(): refactor unlink_or_warn() placement
finalize_object_file(): check for name collision before renaming
Junio C Hamano [Wed, 2 Oct 2024 14:46:26 +0000 (07:46 -0700)]
Merge branch 'jk/http-leakfixes'
Leakfixes.
* jk/http-leakfixes: (28 commits)
http-push: clean up local_refs at exit
http-push: clean up loose request when falling back to packed
http-push: clean up objects list
http-push: free xml_ctx.cdata after use
http-push: free remote_ls_ctx.dentry_name
http-push: free transfer_request strbuf
http-push: free transfer_request dest field
http-push: free curl header lists
http-push: free repo->url string
http-push: clear refspecs before exiting
http-walker: free fake packed_git list
remote-curl: free HEAD ref with free_one_ref()
http: stop leaking buffer in http_get_info_packs()
http: call git_inflate_end() when releasing http_object_request
http: fix leak of http_object_request struct
http: fix leak when redacting cookies from curl trace
transport-helper: fix leak of dummy refs_list
fetch-pack: clear pack lockfiles list
fetch: free "raw" string when shrinking refspec
transport-helper: fix strbuf leak in push_refs_with_push()
...
When "git sparse-checkout disable" turns a sparse checkout into a
regular checkout, the index is fully expanded. This totally
expected behaviour however had an "oops, we are expanding the
index" advice message, which has been corrected.
* ds/sparse-checkout-expansion-advice:
sparse-checkout: disable advice in 'disable'
Derrick Stolee [Tue, 1 Oct 2024 17:37:44 +0000 (17:37 +0000)]
read-cache: free threaded memory pool
In load_cache_entries_threaded(), each thread allocates its own memory
pool. This pool needs to be cleaned up while closing the threads down,
or it will be leaked.
This ce_mem_pool pointer could theoretically be converted to an inline
copy of the struct, but the use of a pointer helps with existing lazy-
initialization logic. Adjusting that behavior only to avoid this pointer
would be a much bigger change.
Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git --git-dir=nowhere cmd" failed to properly notice that it
wasn't in any repository while processing includeIf.onbranch
configuration and instead crashed.
* ps/includeif-onbranch-cornercase-fix:
config: fix evaluating "onbranch" with nonexistent git dir
t1305: exercise edge cases of "onbranch" includes
Background tasks "git maintenance" runs may need to use credential
information when going over the network, but a credential helper
may work only in an interactive environment, and end up blocking a
scheduled task waiting for UI. Credential helpers can now behave
differently when they are not running interactively.
* ds/background-maintenance-with-credential:
scalar: configure maintenance during 'reconfigure'
maintenance: add custom config to background jobs
credential: add new interactive config option
Junio C Hamano [Mon, 30 Sep 2024 23:16:14 +0000 (16:16 -0700)]
Merge branch 'pw/submodule-process-sigpipe'
When a subprocess to work in a submodule spawned by "git submodule"
fails with SIGPIPE, the parent Git process caught the death of it,
but gave a generic "failed to work in that submodule", which was
misleading. We now behave as if the parent got SIGPIPE and die.
The push reports that report failures to the user when pushing a
reference leak in several places. Plug these leaks by introducing a new
function `ref_push_report_free()` that frees the list of reports and
call it as required. While at it, fix a trivially leaking error string
in the vicinity.
These leaks get hit in t5411, but plugging them does not make the whole
test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-write: fix return parameter of `write_rev_file_order()`
While the return parameter of `write_rev_file_order()` is a string
constant, the function may indeed return an allocated string when its
first parameter is a `NULL` pointer. This makes for a confusing calling
convention, where callers need to be aware of these intricate ownership
rules and cast away the constness to free the string in some cases.
Adapt the function and its caller `write_rev_file()` to always return an
allocated string and adapt callers to always free the return value.
Note that this requires us to also adapt `rename_tmp_packfile()`, which
compares the pointers to packfile data with each other. Now that the
path of the reverse index file gets allocated unconditionally the check
will always fail. This is fixed by using strcmp(3P) instead, which also
feels way less fragile.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `saved_parents` slab is used by `--full-diff` to save parents of a
commit which we are about to rewrite. We do not release its contents
once it's not used anymore, causing a memory leak. Plug it.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both `rewrite_parents()` and `remove_duplicate_parents()` may end up
dropping some parents from a commit without freeing the respective
`struct commit_list` items. This causes a bunch of memory leaks. Plug
these.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When creating a new pseudo-merge group we collect a set of matchnig
commits and put them into a string map. This strmap is initialized such
that it does not allocate its keys, and instead we try to pass ownership
of the keys to it via `strmap_put()`. This isn't how it works though:
the strmap will never try to release these keys, and consequently they
end up leaking.
Fix this leak by initializing the strmap as duplicating its keys and not
trying to hand over ownership.
The leak is exposed by t5333, but plugging it does not yet make the full
test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix various memory leaks hit by the pseudo-merge machinery. These leaks
are exposed by t5333, but plugging them does not yet make the whole test
suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
As described in "line-log.c" itself, the code is "leaking like a sieve".
These leaks are all of rather trivial nature, so this commit plugs them
without going too much into details for each of those leaks.
The leaks are hit by t4211, but plugging them alone does not make the
full test suite pass. The remaining leaks are unrelated to the line-log
subsystem.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The lifecycle management of diff queues is somewhat confusing:
- For most of the part this can be attributed to `DIFF_QUEUE_CLEAR()`,
which does not release any memory but rather initializes the queue,
only. This is in contrast to our common naming schema, where
"clearing" means that we release underlying memory and then
re-initialize the data structure such that it is ready to use.
- A second offender is `diff_free_queue()`, which does not free the
queue structure itself. It is rather a release-style function.
Refactor the code to make things less confusing. `DIFF_QUEUE_CLEAR()` is
replaced by `DIFF_QUEUE_INIT` and `diff_queue_init()`, while
`diff_free_queue()` is replaced by `diff_queue_release()`. While on it,
adapt callsites where we call `DIFF_QUEUE_CLEAR()` with the intent to
release underlying memory to instead call `diff_queue_clear()` to fix
memory leaks.
This memory leak is exposed by t4211, but plugging it alone does not
make the whole test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/revert: fix leaking `gpg_sign` and `strategy` config
We leak the config values when `gpg_sign` or `strategy` options are
being overridden via the command line. To fix this we need to free the
old value, which requires us to figure out whether the value was changed
via an option in the first place. The easy way to do this, which is to
initialize local variables with `NULL`, doesn't work because we cannot
tell the case where the user has passed e.g. `--no-gpg-sign`. Instead,
we use a sentinel value for both values that we can compare against to
check whether the user has passed the option.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/clone: fix leaking repo state when cloning with bundle URIs
When cloning with bundle URIs we re-initialize `the_repository` after
having fetched the bundle. This causes a bunch of memory leaks though
because we do not release its previous state.
These leaks can be plugged by calling `repo_clear()` before we call
`repo_init()`. But this causes another issue because the remote that we
used is tied to the lifetime of the repository's remote state, which
would also get released. We thus have to make sure that it does not get
free'd under our feet.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `OPT_PATHSPEC_FROM_FILE()` option maps to `OPT_FILENAME()`, which we
know will always allocate memory when passed. We never free the memory
though, causing a memory leak. Plug it.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The submodule entry list returned by `submodules_of_tree()` is never
completely free'd by its only caller. Introduce a new function that
free's the list for us and call it.
While at it, also fix the leaking `branch_point` string.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
wt-status: fix leaking buffer with sparse directories
When hitting a sparse directory in `wt_status_collect_changes_initial()`
we use a `struct strbuf` to assemble the directory's name. We never free
that buffer though, causing a memory leak.
Fix the leak by releasing the buffer. While at it, move the buffer
outside of the loop and reset it to save on some wasteful allocations.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two memory leaks in "shell.c". The first one in `run_shell()`
is trivial and fixed without further explanation. The second one in
`cmd_main()` happens because we overwrite the `prog` variable, which
contains an allocated string. In fact though, the memory pointed to by
that variable is still in use because we use `split_cmdline()`, which
may create pointers into the middle of that string. But as we do not
have a direct pointer to the head of the allocated string anymore, we
get a complaint by the leak checker.
Address this by not overwriting the `prog` pointer.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're leaking the args vector in git-annotate(1) because we never clear
it. Fixing it isn't as easy as calling `strvec_clear()` though because
calling `cmd_blame()` will cause the underlying array to be modified.
Instead, we also need to pass a shallow copy of the argv array to the
function.
Do so to plug the memory leaks.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Mon, 30 Sep 2024 18:22:21 +0000 (11:22 -0700)]
Merge branch 'jk/http-leakfixes' into ps/leakfixes-part-8
* jk/http-leakfixes: (28 commits)
http-push: clean up local_refs at exit
http-push: clean up loose request when falling back to packed
http-push: clean up objects list
http-push: free xml_ctx.cdata after use
http-push: free remote_ls_ctx.dentry_name
http-push: free transfer_request strbuf
http-push: free transfer_request dest field
http-push: free curl header lists
http-push: free repo->url string
http-push: clear refspecs before exiting
http-walker: free fake packed_git list
remote-curl: free HEAD ref with free_one_ref()
http: stop leaking buffer in http_get_info_packs()
http: call git_inflate_end() when releasing http_object_request
http: fix leak of http_object_request struct
http: fix leak when redacting cookies from curl trace
transport-helper: fix leak of dummy refs_list
fetch-pack: clear pack lockfiles list
fetch: free "raw" string when shrinking refspec
transport-helper: fix strbuf leak in push_refs_with_push()
...
Taylor Blau [Thu, 26 Sep 2024 15:22:53 +0000 (11:22 -0400)]
csum-file.c: use unsafe SHA-1 implementation when available
Update hashwrite() and friends to use the unsafe_-variants of hashing
functions, calling for e.g., "the_hash_algo->unsafe_update_fn()" instead
of "the_hash_algo->update_fn()".
These callers only use the_hash_algo to produce a checksum, which we
depend on for data integrity, but not for cryptographic purposes, so
these callers are safe to use the unsafe (non-collision detecting) SHA-1
implementation.
To time this, I took a freshly packed copy of linux.git, and ran the
following with and without the OPENSSL_SHA1_UNSAFE=1 build-knob. Both
versions were compiled with -O3:
Without OPENSSL_SHA1_UNSAFE=1 (that is, using the collision-detecting
SHA-1 implementation for both cryptographic and non-cryptographic
purposes), we spend a significant amount of our instruction count in
hashwrite():
, and generate the resulting "clone" much faster, in only 11.597 seconds
of wall time, 11.37 seconds of user time, and 0.23 seconds of system
time, for a ~40% speed-up.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 26 Sep 2024 15:22:50 +0000 (11:22 -0400)]
Makefile: allow specifying a SHA-1 for non-cryptographic uses
Introduce _UNSAFE variants of the OPENSSL_SHA1, BLK_SHA1, and
APPLE_COMMON_CRYPTO_SHA1 compile-time knobs which indicate which SHA-1
implementation is to be used for non-cryptographic uses.
There are a couple of small implementation notes worth mentioning:
- There is no way to select the collision detecting SHA-1 as the
"fast" fallback, since the fast fallback is only for
non-cryptographic uses, and is meant to be faster than our
collision-detecting implementation.
- There are no similar knobs for SHA-256, since no collision attacks
are presently known and thus no collision-detecting implementations
actually exist.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 26 Sep 2024 15:22:47 +0000 (11:22 -0400)]
hash.h: scaffolding for _unsafe hashing variants
Git's default SHA-1 implementation is collision-detecting, which hardens
us against known SHA-1 attacks against Git objects. This makes Git
object writes safer at the expense of some speed when hashing through
the collision-detecting implementation, which is slower than
non-collision detecting alternatives.
Prepare for loading a separate "unsafe" SHA-1 implementation that can be
used for non-cryptographic purposes, like computing the checksum of
files that use the hashwrite() API.
This commit does not actually introduce any new compile-time knobs to
control which implementation is used as the unsafe SHA-1 variant, but
does add scaffolding so that the "git_hash_algo" structure has five new
function pointers which are "unsafe" variants of the five existing
hashing-related function pointers:
Taylor Blau [Thu, 26 Sep 2024 15:22:44 +0000 (11:22 -0400)]
sha1: do not redefine `platform_SHA_CTX` and friends
Our in-tree SHA-1 wrappers all define platform_SHA_CTX and related
macros to point at the opaque "context" type, init, update, and similar
functions for each specific implementation.
In hash.h, we use these platform_ variables to set up the function
pointers for, e.g., the_hash_algo->init_fn(), etc.
But while these header files have a header-specific macro that prevents
them declaring their structs / functions multiple times, they
unconditionally define the platform variables, making it impossible to
load multiple SHA-1 implementations at once.
As a prerequisite for loading a separate SHA-1 implementation for
non-cryptographic uses, only define the platform_ variables if they have
not already been defined.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 26 Sep 2024 15:22:41 +0000 (11:22 -0400)]
pack-objects: use finalize_object_file() to rename pack/idx/etc
In most places that write files to the object database (even packfiles
via index-pack or fast-import), we use finalize_object_file(). This
prefers link()/unlink() over rename(), because it means we will prefer
data that is already in the repository to data that we are newly
writing.
We should do the same thing in pack-objects. Even though we don't think
of it as accepting outside data (and thus not being susceptible to
collision attacks), in theory a determined attacker could present just
the right set of objects to cause an incremental repack to generate
a pack with their desired hash.
This has some test and real-world fallout, as seen in the adjustment to
t5303 below. That test script assumes that we can "fix" corruption by
repacking into a good state, including when the pack generated by that
repack operation collides with a (corrupted) pack with the same hash.
This violates our assumption from the previous adjustments to
finalize_object_file() that if we're moving a new file over an existing
one, that since their checksums match, so too must their contents.
This makes "fixing" corruption like this a more explicit operation,
since the test (and users, who may fix real-life corruption using a
similar technique) must first move the broken contents out of the way.
Note also that we now call adjust_shared_perm() twice. We already call
adjust_shared_perm() in stage_tmp_packfiles(), and now call it again in
finalize_object_file(). This is somewhat wasteful, but cleaning up the
existing calls to adjust_shared_perm() is tricky (because sometimes
we're writing to a tmpfile, and sometimes we're writing directly into
the final destination), so let's tolerate some minor waste until we can
more carefully clean up the now-redundant calls.
Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 26 Sep 2024 15:22:38 +0000 (11:22 -0400)]
finalize_object_file(): implement collision check
We've had "FIXME!!! Collision check here ?" in finalize_object_file()
since aac1794132 (Improve sha1 object file writing., 2005-05-03). That
is, when we try to write a file with the same name, we assume the
on-disk contents are the same and blindly throw away the new copy.
One of the reasons we never implemented this is because the files it
moves are all named after the cryptographic hash of their contents
(either loose objects, or packs which have their hash in the name these
days). So we are unlikely to see such a collision by accident. And even
though there are weaknesses in sha1, we assume they are mitigated by our
use of sha1dc.
So while it's a theoretical concern now, it hasn't been a priority.
However, if we start using weaker hashes for pack checksums and names,
this will become a practical concern. So in preparation, let's actually
implement a byte-for-byte collision check.
The new check will cause the write of new differing content to be a
failure, rather than a silent noop, and we'll retain the temporary file
on disk. If there's no collision present, we'll clean up the temporary
file as usual after either rename()-ing or link()-ing it into place.
Note that this may cause some extra computation when the files are in
fact identical, but this should happen rarely.
Loose objects are exempt from this check, and the collision check may be
skipped by calling the _flags variant of this function with the
FOF_SKIP_COLLISION_CHECK bit set. This is done for a couple of reasons:
- We don't treat the hash of the loose object file's contents as a
checksum, since the same loose object can be stored using different
bytes on disk (e.g., when adjusting core.compression, using a
different version of zlib, etc.).
This is fundamentally different from cases where
finalize_object_file() is operating over a file which uses the hash
value as a checksum of the contents. In other words, a pair of
identical loose objects can be stored using different bytes on disk,
and that should not be treated as a collision.
- We already use the path of the loose object as its hash value /
object name, so checking for collisions at the content level doesn't
add anything.
Adding a content-level collision check would have to happen at a
higher level than in finalize_object_file(), since (avoiding race
conditions) writing an object loose which already exists in the
repository will prevent us from even reaching finalize_object_file()
via the object freshening code.
There is a collision check in index-pack via its `check_collision()`
function, but there isn't an analogous function in unpack-objects,
which just feeds the result to write_object_file().
So skipping the collision check here does not change for better or
worse the hardness of loose object writes.
As a small note related to the latter bullet point above, we must teach
the tmp-objdir routines to similarly skip the content-level collision
checks when calling migrate_one() on a loose object file, which we do by
setting the FOF_SKIP_COLLISION_CHECK bit when we are inside of a loose
object shard.
Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
As soon as we've tried to link() a temporary object into place, we then
unlink() the tempfile immediately, whether we were successful or not.
For the success case, this is because we no longer need the old file
(it's now linked into place).
For the error case, there are two outcomes. Either we got EEXIST, in
which case we consider the collision to be a noop. Or we got a system
error, in which we case we are just cleaning up after ourselves.
Using a single line for all of these cases has some problems:
- in the error case, our unlink() may clobber errno, which we use in
the error message
- for the collision case, there's a FIXME that indicates we should do
a collision check. In preparation for implementing that, we'll need
to actually hold on to the file.
Split these three cases into their own calls to unlink_or_warn(). This
is more verbose, but lets us do the right thing in each case.
Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 26 Sep 2024 15:22:32 +0000 (11:22 -0400)]
finalize_object_file(): check for name collision before renaming
We prefer link()/unlink() to rename() for object files, with the idea
that we should prefer the data that is already on disk to what is
incoming. But we may fall back to rename() if the user has configured us
to do so, or if the filesystem seems not to support cross-directory
links. This loses the "prefer what is on disk" property.
We can mitigate this somewhat by trying to stat() the destination
filename before doing the rename. This is racy, since the object could
be created between the stat() and rename() calls. But in practice it is
expanding the definition of "what is already on disk" to be the point
that the function is called. That is enough to deal with any potential
attacks where an attacker is trying to collide hashes with what's
already in the repository.
Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
diffcore-break: fix leaking filespecs when merging broken pairs
When merging file pairs after they have been broken up we queue a new
file pair and discard the broken-up ones. The newly-queued file pair
reuses one filespec of the broken up pairs each, where the respective
other filespec gets discarded. But we only end up freeing the filespec's
data, not the filespec itself, and thus leak memory.
Fix these leaks by using `free_filespec()` instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
revision: fix leaking parents when simplifying commits
When simplifying commits, e.g. because they are treesame with their
parents, we unset the commit's parent pointers but never free them. Plug
the resulting memory leaks.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/maintenance: fix leak in `get_schedule_cmd()`
The `get_schedule_cmd()` function allows us to override the schedule
command with a specific test command such that we can verify the
underlying logic in a platform-independent way. Its memory management is
somewhat wild though, because it basically gives up and assigns an
allocated string to the string constant output pointer. While this part
is marked with `UNLEAK()` to mask this, we also leak the local string
lists.
Rework the function such that it has a separate out parameter. If set,
we will assign it the final allocated command. Plug the other memory
leaks and create a common exit path.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When parsing the maintenance strategy from config we allocate a config
string, but do not free it after parsing it. Plug this leak by instead
using `git_config_get_string_tmp()`, which does not allocate any memory.
This leak is exposed by t7900, but plugging it alone does not make the
test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The partial clone filter of a promisor remote is never free'd, causing
memory leaks. Furthermore, in case multiple partial clone filters are
defined for the same remote, we'd overwrite previous values without
freeing them.
Fix these leaks.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When creating a pattern via `create_grep_pat()` we allocate the pattern
member of the structure regardless of the token type. But later, when we
try to free the structure, we free the pattern member conditionally on
the token type and thus leak memory.
Plug this leak. The leak is exposed by t7814, but plugging it alone does
not make the whole test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `add_submodule_odb_by_path()` we add a path into a global string
list. The list is initialized with `NODUP`, which means that we do not
pass ownership of strings to the list. But we use `xstrdup()` when we
insert a path, with the consequence that the string will never get
free'd.
Plug the leak by marking the list as `DUP`. There is only a single
callsite where we insert paths anyway, and as explained above that
callsite was mishandling the allocation.
This leak is exposed by t7814, but plugging it does not make the whole
test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
trace2: destroy context stored in thread-local storage
Each thread may have a specific context in the trace2 subsystem that we
set up via thread-local storage. We do not set up a destructor for this
data though, which means that the context data will leak.
Plug this leak by installing a destructor. This leak is exposed by
t7814, but plugging it alone does not make the whole test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When repacking, we assemble git-pack-objects(1) arguments both for the
"normal" pack and for the cruft pack. This configuration gets populated
with a bunch of `OPT_PASSTHRU` options that we end up passing to the
child process. These options are allocated, but never free'd.
Create a new `pack_objects_args_release()` function that releases the
memory for us and call it for both sets of options.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
diffcore-order: fix leaking buffer when parsing orderfiles
In `prepare_order()` we parse an orderfile and assign it to a global
array. In order to save on some allocations, we replace newlines with
NUL characters and then assign pointers into the allocated buffer to
that array. This can cause the buffer to be completely unreferenced
though in some cases, e.g. because the order file is empty or because we
had to use `xmemdupz()` to copy the lines instead of NUL-terminating
them.
Refactor the code to always `xmemdupz()` the strings. This is a bit
simpler, and it is rather unlikely that saving a handful of allocations
really matters. This allows us to release the string buffer and thus
plug the memory leak.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
parse-options: free previous value of `OPTION_FILENAME`
The `OPTION_FILENAME` option always assigns either an allocated string
or `NULL` to the value. In case it is passed multiple times it does not
know to free the previous value though, which causes a memory leak.
Refactor the function to always free the previous value. None of the
sites where this option is used pass a string constant, so this change
is safe.
While at it, fix the argument of `fix_filename()` to be a string
constant. The only reason why it's not is because we use it as an
in-out-parameter, where the input is a constant and the output is not.
This is weird and unnecessary, as we can just return the result instead
of using the parameter for this.
This leak is being hit in t7621, but plugging it alone does not make the
test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `orderfile` diff option is being assigned via `OPT_FILENAME()`,
which assigns an allocated string to the variable. We never free it
though, causing a memory leak.
Change the type of the string to `char *` and free it to plug the leak.
This also requires us to use `xstrdup()` to assign the global config to
it in case it is set.
This leak is being hit in t7621, but plugging it alone does not make the
test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `opt_ff` field gets populated either via `OPT_PASSTHRU` via
`config_get_ff()` or when `--rebase` is passed. So we sometimes end up
overriding the value in `opt_ff` with another value, but we do not free
the old value, causing a memory leak.
Adapt the type of the variable to be `char *` and consistently assign
allocated strings to it such that we can easily free it when it is being
overridden.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
dir: fix off by one errors for ignored and untracked entries
In `treat_directory()` we perform some logic to handle ignored and
untracked entries. When populating a directory with entries we first
save the current number of ignored/untracked entries and then populate
new entries at the end of our arrays that keep track of those entries.
When we figure out that all entries have been ignored/are untracked we
then remove this tail of entries from those vectors again. But there is
an off by one error in both paths that causes us to not free the first
ignored and untracked entries, respectively.
Fix these off-by-one errors to plug the resulting leak. While at it,
massage the code a bit to match our modern code style.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/submodule--helper: fix leaking remote ref on errors
When `update_submodule()` fails we return with `die_message()`, which
only causes us to print the same message as `die()` would without
actually causing the process to die. We don't free memory in that case
and thus leak memory.
Fix the leak by freeing the remote ref.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t/helper: fix leaking subrepo in nested submodule config helper
In the "submodule-nested-repo-config" helper we create a submodule
repository and print its configuration. We do not clear the repo,
causing a memory leak. Plug it.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/submodule--helper: clear child process when not running it
In `runcommand_in_submodule_cb()` we may end up not executing the child
command when `argv` is empty. But we still populate the command with
environment variables and other things, which needs cleanup. This leads
to a memory leak because we do not call `finish_command()`.
Fix this by clearing the child process when we don't execute it.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `handle_builtin()` we may end up creating an ad-hoc argv array in
case we see that the command line contains the "--help" parameter. In
this case we observe two memory leaks though:
- We leak the `struct strvec` itself because we directly exit after
calling `run_builtin()`, without bothering about any cleanups.
- Even if we free'd that vector we'd end up leaking some of its
strings because `run_builtin()` will modify the array.
Plug both of these leaks.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/help: fix leaking `html_path` when reading config multiple times
The `html_path` variable gets populated via `git_help_config()`, which
puts an allocated string into it if its value has been configured. We do
not clear the old value though, which causes a memory leak in case the
config exists multiple times.
Plug this leak. The leak is exposed by t0012, but plugging it alone is
not sufficient to make the test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/help: fix dangling reference to `html_path`
In `get_html_page_path()` we may end up assigning the return value of
`system_path()` to the global `html_path` variable. But as we also
assign the returned value to `to_free`, we will deallocate its memory
upon returning from the function. Consequently, `html_path` will now
point to deallocated memory.
Fix this issue by instead assigning the value to a separate local
variable.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>