The speed of the merged_iter_pqueue_add() can be improved by using a
pointer to the pq_entry struct, which is 96 bytes. Since the pq_entry
param is worked directly on the stack and does not currently have a
pointer to it, the merged_iter_pqueue_add() function is slightly
slower.
References to pq_entry in reftable have typically included pointers,
such as both of the params for pq_less().
Since we are working with pointers in the pq_entry param, as keenly
pointed out, the pq_entry param has also been made into a const since
the contents of the pq_entry param are copied and not manipulated.
Signed-off-by: Elijah Conners <business@elijahpepe.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 14 Sep 2022 19:56:40 +0000 (12:56 -0700)]
Merge branch 'ab/submodule-helper-leakfix'
Plugging leaks in submodule--helper.
* ab/submodule-helper-leakfix:
submodule--helper: fix a configure_added_submodule() leak
submodule--helper: free rest of "displaypath" in "struct update_data"
submodule--helper: free some "displaypath" in "struct update_data"
submodule--helper: fix a memory leak in print_status()
submodule--helper: fix a leak in module_add()
submodule--helper: fix obscure leak in module_add()
submodule--helper: fix "reference" leak
submodule--helper: fix a memory leak in get_default_remote_submodule()
submodule--helper: fix a leak with repo_clear()
submodule--helper: fix "sm_path" and other "module_cb_list" leaks
submodule--helper: fix "errmsg_str" memory leak
submodule--helper: add and use *_release() functions
submodule--helper: don't leak {run,capture}_command() cp.dir argument
submodule--helper: "struct pathspec" memory leak in module_update()
submodule--helper: fix most "struct pathspec" memory leaks
submodule--helper: fix trivial get_default_remote_submodule() leak
submodule--helper: fix a leak in "clone_submodule"
Junio C Hamano [Wed, 14 Sep 2022 19:56:38 +0000 (12:56 -0700)]
Merge branch 'jk/unused-annotation'
Annotate function parameters that are not used (but cannot be
removed for structural reasons), to prepare us to later compile
with -Wunused warning turned on.
* jk/unused-annotation:
is_path_owned_by_current_uid(): mark "report" parameter as unused
run-command: mark unused async callback parameters
mark unused read_tree_recursive() callback parameters
hashmap: mark unused callback parameters
config: mark unused callback parameters
streaming: mark unused virtual method parameters
transport: mark bundle transport_options as unused
refs: mark unused virtual method parameters
refs: mark unused reflog callback parameters
refs: mark unused each_ref_fn parameters
git-compat-util: add UNUSED macro
Junio C Hamano [Tue, 13 Sep 2022 19:18:30 +0000 (12:18 -0700)]
Merge a handful of topics from the 'master' front
As the 'master' front will soon tag a preview and then release
candidates for 2.38, it is unknown if we are going to issue another
maintenance release on the 2.37.x track, but as we have accumulated
enough material there, let's prepare a draft for it.
Even if we end up not tagging 2.37.4, it would help motivated distro
packagers to maintain their slightly older and "more stable" versions.
Junio C Hamano [Tue, 13 Sep 2022 19:21:11 +0000 (12:21 -0700)]
Merge branch 'en/merge-unstash-only-on-clean-merge' into maint
The auto-stashed local changes created by "git merge --autostash"
was mixed into a conflicted state left in the working tree, which
has been corrected.
* en/merge-unstash-only-on-clean-merge:
merge: only apply autostash when appropriate
Junio C Hamano [Tue, 13 Sep 2022 19:21:09 +0000 (12:21 -0700)]
Merge branch 'sg/xcalloc-cocci-fix' into maint
xcalloc(), imitating calloc(), takes "number of elements of the
array", and "size of a single element", in this order. A call that
does not follow this ordering has been corrected.
* sg/xcalloc-cocci-fix:
promisor-remote: fix xcalloc() argument order
Junio C Hamano [Tue, 13 Sep 2022 19:21:08 +0000 (12:21 -0700)]
Merge branch 'jk/pipe-command-nonblock' into maint
Fix deadlocks between main Git process and subprocess spawned via
the pipe_command() API, that can kill "git add -p" that was
reimplemented in C recently.
* jk/pipe-command-nonblock:
pipe_command(): mark stdin descriptor as non-blocking
pipe_command(): handle ENOSPC when writing to a pipe
pipe_command(): avoid xwrite() for writing to pipe
git-compat-util: make MAX_IO_SIZE define globally available
nonblock: support Windows
compat: add function to enable nonblocking pipes
Junio C Hamano [Tue, 13 Sep 2022 18:38:24 +0000 (11:38 -0700)]
Merge branch 'js/builtin-add-p-portability-fix'
More fixes to "add -p"
* js/builtin-add-p-portability-fix:
t6132(NO_PERL): do not run the scripted `add -p`
t3701: test the built-in `add -i` regardless of NO_PERL
add -p: avoid ambiguous signed/unsigned comparison
Junio C Hamano [Tue, 13 Sep 2022 18:38:24 +0000 (11:38 -0700)]
Merge branch 'sg/parse-options-subcommand'
The codepath for the OPT_SUBCOMMAND facility has been cleaned up.
* sg/parse-options-subcommand:
notes, remote: show unknown subcommands between `'
notes: simplify default operation mode arguments check
test-parse-options.c: fix style of comparison with zero
test-parse-options.c: don't use for loop initial declaration
t0040-parse-options: remove leftover debugging
Junio C Hamano [Tue, 13 Sep 2022 18:38:24 +0000 (11:38 -0700)]
Merge branch 'jk/rev-list-verify-objects-fix'
"git rev-list --verify-objects" ought to inspect the contents of
objects and notice corrupted ones, but it didn't when the commit
graph is in use, which has been corrected.
* jk/rev-list-verify-objects-fix:
rev-list: disable commit graph with --verify-objects
lookup_commit_in_graph(): use prepare_commit_graph() to check for graph
Junio C Hamano [Tue, 13 Sep 2022 18:38:23 +0000 (11:38 -0700)]
Merge branch 'jk/upload-pack-skip-hash-check'
The server side that responds to "git fetch" and "git clone"
request has been optimized by allowing it to send objects in its
object store without recomputing and validating the object names.
* jk/upload-pack-skip-hash-check:
t1060: check partial clone of misnamed blob
parse_object(): check commit-graph when skip_hash set
upload-pack: skip parse-object re-hashing of "want" objects
parse_object(): allow skipping hash check
Junio C Hamano [Tue, 13 Sep 2022 18:38:23 +0000 (11:38 -0700)]
Merge branch 'rs/diff-no-index-cleanup'
"git diff --no-index A B" managed its the pathnames of its two
input files rather haphazardly, sometimes leaking them. The
command line argument processing has been straightened out to clean
it up.
* rs/diff-no-index-cleanup:
diff-no-index: simplify argv index calculation
diff-no-index: release prefixed filenames
diff-no-index: release strbuf on queue error
Junio C Hamano [Fri, 9 Sep 2022 19:02:25 +0000 (12:02 -0700)]
Merge branch 'jc/format-patch-force-in-body-from'
"git format-patch --from=<ident>" can be told to add an in-body
"From:" line even for commits that are authored by the given
<ident> with "--force-in-body-from"option.
* jc/format-patch-force-in-body-from:
format-patch: learn format.forceInBodyFrom configuration variable
format-patch: allow forcing the use of in-body From: header
pretty: separate out the logic to decide the use of in-body from
Junio C Hamano [Fri, 9 Sep 2022 19:02:24 +0000 (12:02 -0700)]
Merge branch 'js/add-p-diff-parsing-fix'
Those who use diff-so-fancy as the diff-filter noticed a regression
or two in the code that parses the diff output in the built-in
version of "add -p", which has been corrected.
* js/add-p-diff-parsing-fix:
add -p: ignore dirty submodules
add -p: gracefully handle unparseable hunk headers in colored diffs
add -p: detect more mismatches between plain vs colored diffs
After 2d893dff4c (rev-parse --parseopt: allow [*=?!] in argument hints,
2015-07-14) updated the parser, a line in parseopts's input can start
with one of the flag characters and be erroneously parsed as a opt-spec
where the short name of the option is the flag character itself and the
long name is after the end of the string. This makes Git want to
allocate SIZE_MAX bytes of memory at this line:
o->long_name = xmemdupz(sb.buf + 2, s - sb.buf - 2);
Since s and sb.buf are equal the second argument is -2 (except unsigned)
and xmemdupz allocates len + 1 bytes, ie. -1 meaning SIZE_MAX.
Avoid this by checking whether a flag character was found in the zeroth
position.
Reported-by: Ingy dot Net <ingy@ingy.net> Signed-off-by: Øystein Walle <oystwa@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 8 Sep 2022 05:02:50 +0000 (01:02 -0400)]
prepare_repo_settings(): plug leak of config values
We call repo_config_get_string() for fetch.negotiationAlgorithm, which
allocates a copy of the string, but we never free it.
We could add a call to free(), but there's an even simpler solution: we
don't need the string to persist beyond a few strcasecmp() calls, so we
can instead use the "_tmp" variant which gives us a const pointer to the
cached value.
We need to switch the type of "strval" to "const char *" for this to
work, which affects a similar call that checks core.untrackedCache. But
it's in the same boat! It doesn't actually need the value to persist
beyond a maybe_bool() check (though it does remember to correctly free
the string afterwards). So we can simplify it at the same time.
Note that this core.untrackedCache check arguably should be using
repo_config_get_maybe_bool(), but there are some subtle behavior
changes. E.g., it doesn't currently allow a value-less "true". Arguably
it should, but let's avoid lumping further changes in what should be a
simple leak cleanup.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 8 Sep 2022 05:01:23 +0000 (01:01 -0400)]
list_objects_filter_options: plug leak of filter_spec strings
The list_objects_filter_options struct contains a string_list to store
the filter_spec. Because we allow the options struct to be
zero-initialized by callers, the string_list's strdup_strings field is
generally not set.
Because we don't want to depend on the memory lifetimes of any strings
passed in to the list-objects API, everything we add to the string_list
is duplicated (either via xstrdup(), or via strbuf_detach()). So far so
good, but now we have a problem at cleanup time: when we clear the
list, the string_list API doesn't realize that it needs to free all of
those strings, and we leak them.
One option would be to set strdup_strings right before clearing the
list. But this is tricky for two reasons:
1. There's one spot, in partial_clone_get_default_filter_spec(),
that fails to duplicate its argument. We could fix that, but...
2. We clear the list in a surprising number of places. As you might
expect, we do so in list_objects_filter_release(). But we also
clear and rewrite it in expand_list_objects_filter_spec(),
list_objects_filter_spec(), and transform_to_combine_type().
We'd have to put the same hack in all of those spots.
Instead, let's just set strdup_strings before adding anything. That lets
us drop the extra manual xstrdup() calls, fixes the spot mentioned
in (1) above that _should_ be duplicating, and future-proofs further
calls. We do have to switch the strbuf_detach() calls to use the nodup
form, but that's an easy change, and the resulting code more clearly
shows the expected ownership transfer.
This also resolves a weird inconsistency: when we make a deep copy with
list_objects_filter_copy(), it initializes the copy's filter_spec with
string_list_init_dup(). So the copy frees its string_list memory
correctly, but accidentally leaks the extra manual-xstrdup()'d strings!
There is one hiccup, though. In an ideal world, everyone would allocate
the list_objects_filter_options struct with an initializer which used
STRING_LIST_INIT_DUP under the hood. But there are a bunch of existing
callers which think that zero-initializing is good enough. We can leave
them as-is by noting that the list is always initially populated via
parse_list_objects_filter(). So we can just initialize the
strdup_strings flag there.
This is arguably a band-aid, but it works reliably. And it doesn't make
anything harder if we want to switch all the callers later to a new
LIST_OBJECTS_FILTER_INIT.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 8 Sep 2022 04:58:11 +0000 (00:58 -0400)]
transport: free filter options in disconnect_git()
If a user of the transport API calls transport_set_option() with
TRANS_OPT_LIST_OBJECTS_FILTER, it doesn't pass a struct, but rather a
string with the filter-spec, which the transport code then stores in its
own list_objects_filter_options struct.
When the caller is done and we call transport_disconnect(), the contents
of that filter struct are then leaked. We should release it before
freeing the transport struct.
Another way to solve this would be for transport_set_option() to pass a
pointer to the struct. But that's awkward, because there's a generic
transport-option interface that always takes a string. Plus it opens up
questions of memory lifetimes; by storing its own filter-options struct,
the transport code remains self-contained.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 8 Sep 2022 04:57:29 +0000 (00:57 -0400)]
transport: deep-copy object-filter struct for fetch-pack
When the transport code for the git protocol calls into fetch_pack(), it
has to fill out a fetch_pack_args struct that is mostly taken from the
transport options. We pass along any object-filter data by doing a
struct assignment of the list_objects_filter_options struct. But doing
so isn't safe; it contains allocated pointers in its filter_spec
string_list, which could lead to a double-free if one side mutates or
frees the string_list.
And indeed, the fetch-pack code does clear and rewrite the list via
expand_list_objects_filter_spec(), leaving the transport code with
dangling pointers.
This hasn't been a problem so far, though, because the transport code
doesn't look further at the filter struct. But it should, because in
some cases (when fetch-pack doesn't rewrite the list), it ends up
leaking the string_list.
So let's start by turning this shallow copy into a deep one, which
should let us fix the transport leak in a subsequent patch. Likewise,
we'll free the deep copy we made here when we're done with it (to avoid
leaking).
Note that it would also work to pass fetch-pack a pointer to our filter
struct, rather than a copy. But it's awkward for fetch-pack to take a
pointer in its arg struct; the actual git-fetch-pack command allocates a
fetch_pack_args struct on the stack and expects it to contain the filter
options. It could be rewritten to avoid this, but a deep copy serves our
purposes just as well.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 8 Sep 2022 04:54:29 +0000 (00:54 -0400)]
list_objects_filter_copy(): deep-copy sparse_oid_name field
The purpose of our copy function is to do a deep copy of each field so
that the source and destination structs become independent. We correctly
copy the filter_spec string list, but we forgot the sparse_oid_name
field. By doing a shallow copy of the pointer, that puts us at risk for
a use-after-free if one or both of the structs is cleaned up.
I don't think this can be triggered in practice, because we tend to leak
the structs rather than actually clean them up. But this should
future-proof us for plugging those leaks.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Wed, 7 Sep 2022 21:02:20 +0000 (17:02 -0400)]
t1060: check partial clone of misnamed blob
A recent commit (upload-pack: skip parse-object re-hashing of "want"
objects, 2022-09-06) loosened the behavior of upload-pack so that it
does not verify the sha1 of objects it receives directly via "want"
requests. The existing corruption tests in t1060 aren't affected by
this: the corruptions are blobs reachable from commits, and the client
requests the commits.
The more interesting case here is a partial clone, where the client will
directly ask for the corrupted blob when it does an on-demand fetch of
the filtered object. And that is not covered at all, so let's add a
test.
It's important here that we use the "misnamed" corruption and not
"bit-error". The latter is sufficiently corrupted that upload-pack
cannot even figure out the type of the object, so it bails identically
both before and after the recent change. But with "misnamed", with the
hash-checks enabled it sees the problem (though the error messages are a
bit confusing because of the inability to create a "struct object" to
store the flags):
After the change to skip the hash check, the server side happily sends
the bogus object, but the client correctly realizes that it did not get
the necessary data:
remote: Enumerating objects: 1, done.
remote: Counting objects: 100% (1/1), done.
remote: Total 1 (delta 0), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (1/1), 49 bytes | 49.00 KiB/s, done.
fatal: bad revision 'd95f3ad14dee633a758d2e331151e950dd13e4ed'
error: [...]/misnamed did not send all necessary objects
which is exactly what we expect to happen.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Wed, 7 Sep 2022 11:45:42 +0000 (13:45 +0200)]
diff-no-index: simplify argv index calculation
Since 16bb3d714d (diff --no-index: use parse_options() instead of
diff_opt_parse(), 2019-03-24) argc must be 2 if we reach the loop, i.e.
argc - 2 == 0. Remove that inconsequential term.
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Wed, 7 Sep 2022 11:36:53 +0000 (13:36 +0200)]
diff-no-index: release strbuf on queue error
The strbuf is small and we are about to exit, so we could leave its
cleanup to the OS. If we release it explicitly at all, however, then we
should do it on early exit as well. Move the strbuf_release call to a
new cleanup section at the end and make sure all execution paths go
through it.
Suggested-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 6 Sep 2022 23:06:25 +0000 (19:06 -0400)]
parse_object(): check commit-graph when skip_hash set
If the caller told us that they don't care about us checking the object
hash, then we're free to implement any optimizations that get us the
parsed value more quickly. An obvious one is to check the commit graph
before loading an object from disk. And in fact, both of the callers who
pass in this flag are already doing so before they call parse_object()!
So we can simplify those callers, as well as any possible future ones,
by moving the logic into parse_object().
There are two subtle things to note in the diff, but neither has any
impact in practice:
- it seems least-surprising here to do the graph lookup on the
git-replace'd oid, rather than the original. This is in theory a
change of behavior from the earlier code, as neither caller did a
replace lookup itself. But in practice it doesn't matter, as we
disable the commit graph entirely if there are any replace refs.
- the caller in get_reference() passes the skip_hash flag only if
revs->verify_objects isn't set, whereas it would look in the commit
graph unconditionally. In practice this should not matter as we
should disable the commit graph entirely when using verify_objects
(and that was done recently in another patch).
So this should be a pure cleanup with no behavior change.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 6 Sep 2022 23:05:42 +0000 (19:05 -0400)]
upload-pack: skip parse-object re-hashing of "want" objects
Imagine we have a history with commit C pointing to a large blob B.
If a client asks us for C, we can generally serve both objects to them
without accessing the uncompressed contents of B. In upload-pack, we
figure out which commits we have and what the client has, and feed those
tips to pack-objects. In pack-objects, we traverse the commits and trees
(or use bitmaps!) to find the set of objects needed, but we never open
up B. When we serve it to the client, we can often pass the compressed
bytes directly from the on-disk packfile over the wire.
But if a client asks us directly for B, perhaps because they are doing
an on-demand fetch to fill in the missing blob of a partial clone, we
end up much slower. Upload-pack calls parse_object() on the oid we
receive, which opens up the object and re-checks its hash (even though
if it were a commit, we might skip this parse entirely in favor of the
commit graph!). And then we feed the oid directly to pack-objects, which
again calls parse_object() and opens the object. And then finally, when
we write out the result, we may send bytes straight from disk, but only
after having unnecessarily uncompressed and computed the sha1 of the
object twice!
This patch teaches both code paths to use the new SKIP_HASH_CHECK flag
for parse_object(). You can see the speed-up in p5600, which does a
blob:none clone followed by a checkout. The savings for git.git are
modest:
Test HEAD^ HEAD
----------------------------------------------------------------------
5600.3: checkout of result 2.23(4.19+0.24) 1.72(3.79+0.18) -22.9%
But the savings scale with the number of bytes. So on a repository like
linux.git with more files, we see more improvement (in both absolute and
relative numbers):
Test HEAD^ HEAD
----------------------------------------------------------------------------
5600.3: checkout of result 51.62(77.26+2.76) 34.86(61.41+2.63) -32.5%
And here's an even more extreme case. This is the android gradle-plugin
repository, whose tip checkout has ~3.7GB of files:
Test HEAD^ HEAD
--------------------------------------------------------------------------
5600.3: checkout of result 79.51(90.84+5.55) 40.28(51.88+5.67) -49.3%
Keep in mind that these timings are of the whole checkout operation. So
they count the client indexing the pack and actually writing out the
files. If we want to see just the server's view, we can hack up the
GIT_TRACE_PACKET output from those operations and replay it via
upload-pack. For the gradle example, that gives me:
Benchmark 1: GIT_PROTOCOL=version=2 git.old upload-pack ../gradle-plugin <input
Time (mean ± σ): 50.884 s ± 0.239 s [User: 51.450 s, System: 1.726 s]
Range (min … max): 50.608 s … 51.025 s 3 runs
Benchmark 2: GIT_PROTOCOL=version=2 git.new upload-pack ../gradle-plugin <input
Time (mean ± σ): 9.728 s ± 0.112 s [User: 10.466 s, System: 1.535 s]
Range (min … max): 9.618 s … 9.842 s 3 runs
Summary
'GIT_PROTOCOL=version=2 git.new upload-pack ../gradle-plugin <input' ran
5.23 ± 0.07 times faster than 'GIT_PROTOCOL=version=2 git.old upload-pack ../gradle-plugin <input'
So a server would see an 80% reduction in CPU serving the initial
checkout of a partial clone for this repository. Or possibly even more
depending on the packing; most of the time spent in the faster one were
objects we had to open during the write phase.
In both cases skipping the extra hashing on the server should be pretty
safe. The client doesn't trust the server anyway, so it will re-hash all
of the objects via index-pack. There is one thing to note, though: the
change in get_reference() affects not just pack-objects, but rev-list,
git-log, etc. We could use a flag to limit to index-pack here, but we
may already skip hash checks in this instance. For commits, we'd skip
anything we load via the commit-graph. And while before this commit we
would check a blob fed directly to rev-list on the command-line, we'd
skip checking that same blob if we found it by traversing a tree.
The exception for both is if --verify-objects is used. In that case,
we'll skip this optimization, and the new test makes sure we do this
correctly.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 6 Sep 2022 23:01:34 +0000 (19:01 -0400)]
parse_object(): allow skipping hash check
The parse_object() function checks the object hash of any object it
parses. This is a nice feature, as it means we may catch bit corruption
during normal use, rather than waiting for specific fsck operations.
But it also can be slow. It's particularly noticeable for blobs, where
except for the hash check, we could return without loading the object
contents at all. Now one may wonder what is the point of calling
parse_object() on a blob in the first place then, but usually it's not
intentional: we were fed an oid from somewhere, don't know the type, and
want an object struct. For commits and trees, the parsing is usually
helpful; we're about to look at the contents anyway. But this is less
true for blobs, where we may be collecting them as part of a
reachability traversal, etc, and don't actually care what's in them. And
blobs, of course, tend to be larger.
We don't want to just throw out the hash-checks for blobs, though. We do
depend on them in some circumstances (e.g., rev-list --verify-objects
uses parse_object() to check them). It's only the callers that know
how they're going to use the result. And so we can help them by
providing a special flag to skip the hash check.
We could just apply this to blobs, as they're going to be the main
source of performance improvement. But if a caller doesn't care about
checking the hash, we might as well skip it for other object types, too.
Even though we can't avoid reading the object contents, we can still
skip the actual hash computation.
If this seems like it is making Git a little bit less safe against
corruption, it may be. But it's part of a series of tradeoffs we're
already making. For instance, "rev-list --objects" does not open the
contents of blobs it prints. And when a commit graph is present, we skip
opening most commits entirely. The important thing will be to use this
flag in cases where it's safe to skip the check. For instance, when
serving a pack for a fetch, we know the client will fully index the
objects and do a connectivity check itself. There's little to be gained
from the server side re-hashing a blob itself. And indeed, most of the
time we don't! The revision machinery won't open up a blob reached by
traversal, but only one requested directly with a "want" line. So
applied properly, this new feature shouldn't make anything less safe in
practice.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
SZEDER Gábor [Mon, 5 Sep 2022 18:50:07 +0000 (20:50 +0200)]
notes, remote: show unknown subcommands between `'
Update the "unknown subcommand" error message in 'git notes' and 'git
remote' to wrap the offending argument between `', to make it
consistent with the "unknown switch/option/subcommand" error messages
in parse-options.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git notes' has a default operation mode, but when invoked without a
subcommand it doesn't accept any arguments (although the 'list'
subcommand implementing the default operation mode does accept
arguments). The condition checking this ended up a bit awkward, so
let's make it clearer.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
docs: add CONFIGURATION sections that fuzzy map to built-ins
Add a CONFIGURATION section to the documentation of various built-ins,
for those cases where the relevant config/NAME.txt doesn't map only to
one git-NAME.txt. In particular:
* config/blame.txt: used by git-{blame,annotate}.txt. Since the
git-annotate(1) documentation refers to git-blame(1) don't add a
"CONFIGURATION" section to git-annotate(1), only to git-blame(1).
* config/checkout.txt: maps to both git-checkout.txt and
git-switch.txt (but nothing else).
* config/init.txt: should be included in git-init(1) and
git-clone(1).
* config/column.txt: We should ideally mention the relevant subset of
this in git-{branch,clean,status,tag}.txt, but let's punt on it for
now. We will when we eventually split these sort of files into
e.g. config/column.txt and
config/column/{branch,clean,status,tag}.txt, with the former
including the latter set.
Things that are being left out, and why:
* config/{remote,remotes,credential}.txt: Configuration that affects
how we talk to remote repositories is harder to untangle. We'll need
to include some of this in git-{fetch,remote,push,ls-remote}.txt
etc., but some of those only use a small subset of these
options. Let's leave this for now.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
docs: add CONFIGURATION sections that map to a built-in
Add a CONFIGURATION section to the documentation of various built-ins,
for those cases where the relevant config/NAME.txt describes
configuration that is only used by the relevant built-in documented in
git-NAME.txt. Subsequent commits will handle more complex cases.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Include the "config/difftool.txt" file in "git-difftool.txt", and move
the relevant part of git-difftool(1) configuration from
"config/diff.txt" to config/difftool.txt".
Doing this is slightly odd, as we usually discuss configuration in
alphabetical order, but by doing it we're able to include the full set
of configuration used by git-difftool(1) (and only that configuration)
in its own documentation.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
notes docs: de-duplicate and combine configuration sections
Combine the various "notes" configuration sections spread across
Documentation/config/notes.txt and Documentation/git-notes.txt to live
in the former, and to be included in the latter.
We'll now forward link from "git notes" to the "CONFIGURATION" section
below, rather than to "git-config(1)" when discussing configuration
variables that are (also) discussed in that section.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
De-duplicate the discussion of "send-email" configuration, such that
the "git-config(1)" manual page becomes the source of truth, and
"git-send-email(1)" includes the relevant part.
Most commands that suffered from such duplication had diverging text
discussing the same variables, but in this case some config was also
only discussed in one or the other.
This is mostly a move-only change, the exception is a minor rewording
of changing wording like "see above" to "see linkgit:git-config[1]",
as well as a clarification about the big section of command-line
option tweaking config being discussed in git-send-email(1)'s main
docs.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Include the "config/grep.txt" file in "git-grep.txt", instead of
repeating an almost identical description of the "grep" configuration
variables in two places.
There is no loss of information here that isn't shown in the addition
to "grep.txt". This change was made by copying the contents of
"git-grep.txt"'s version over the "grep.txt" version. Aside from the
change "grep.txt" being made here the two were identical.
This documentation started being copy/pasted around in b22520a37c8 (grep: allow -E and -n to be turned on by default via
configuration, 2011-03-30). After that in e.g. 6453f7b3486 (grep: add
grep.fullName config variable, 2014-03-17) they started drifting
apart, with only grep.fullName being described in the command
documentation.
In 434e6e753fe (config.txt: move grep.* to a separate file,
2018-10-27) we gained the include, but didn't do this next step, let's
do it now.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
docs: add and use include template for config/* includes
In b6a8d09f6d8 (gc docs: include the "gc.*" section from "config" in
"gc", 2019-04-07) the "git gc" documentation was made to include the
config/gc.txt in its "CONFIGURATION" section. We do that in several
other places, but "git gc" was the only one with a blurb above the
include to orient the reader.
We don't want readers to carefully scrutinize "git-config(1)" and
"git-gc(1)" looking for discrepancies, instead we should tell them
that the latter includes a part of the former.
This change formalizes that wording in two new templates to be
included, one for the "git gc" case where the entire section is
included from "git-config(1)", and another for when the inclusion of
"git-config(1)" follows discussion unique to that documentation. In
order to use that re-arrange the order of those being discussed in the
"git-merge(1)" documentation.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 6 Sep 2022 21:04:35 +0000 (17:04 -0400)]
rev-list: disable commit graph with --verify-objects
Since the point of --verify-objects is to actually load and checksum the
bytes of each object, optimizing out reads using the commit graph runs
contrary to our goal.
The most targeted way to implement this would be for the revision
traversal code to check revs->verify_objects and avoid using the commit
graph. But it's difficult to be sure we've hit all of the correct spots.
For instance, I started this patch by writing the first of the included
test cases, where the corrupted commit is directly on rev-list's command
line. And that is easy to fix by teaching get_reference() to check
revs->verify_objects before calling lookup_commit_in_graph().
But that doesn't cover the second test case: when we traverse to a
corrupted commit, we'd parse the parent in process_parents(). So we'd
need to check there, too. And it keeps going. In handle_commit() we
sometimes parses commits, too, though I couldn't figure out a way to
trigger it that did not already parse via get_reference() or tag
peeling. And try_to_simplify_commit() has its own parse call, and so on.
So it seems like the safest thing is to just disable the commit graph
for the whole process when we see the --verify-objects option. We can do
that either in builtin/rev-list.c, where we use the option, or in
revision.c, where we parse it. There are some subtleties:
- putting it in rev-list.c is less surprising in some ways, because
there we know we are just doing a single traversal. In a command
which does multiple traversals in a single process, it's rather
unexpected to globally disable the commit graph.
- putting it in revision.c is less surprising in some ways, because
the caller does not have to remember to disable the graph
themselves. But this is already tricky! The verify_objects flag in
rev_info doesn't do anything by itself. The caller has to provide an
object callback which does the right thing.
- for that reason, in practice nobody but rev-list uses this option in
the first place. So the distinction is probably not important either
way. Arguably it should just be an option of rev-list, and not the
general revision machinery; right now you can run "git log
--verify-objects", but it does not actually do anything useful.
- checking for a parsed revs.verify_objects flag in rev-list.c is too
late. By that time we've already passed the arguments to
setup_revisions(), which will have parsed the commits using the
graph.
So this commit disables the graph as soon as we see the option in
revision.c. That's a pretty broad hammer, but it does what we want, and
in practice nobody but rev-list is using this flag anyway.
The tests cover both the "tip" and "parent" cases. Obviously our hammer
hits them both in this case, but it's good to check both in case
somebody later tries the more focused approach.
Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 6 Sep 2022 21:02:13 +0000 (17:02 -0400)]
lookup_commit_in_graph(): use prepare_commit_graph() to check for graph
We exit early from lookup_commit_in_graph() if the commit_graph pointer
is NULL, under the assumption that we don't have a graph to look at. But
the graph pointer is lazy-loaded; if no other code happens to have
called prepare_commit_graph(), we'll incorrectly assume that one isn't
available at all.
This has a pretty small performance impact in practice, because the
fallback will generally be to call parse_object() instead. That ends up
in parse_commit_buffer(), which loads the graph data itself. So the
first commit we see won't use the graph, but subsequent ones will. Since
using the graph is just an optimization there's generally no
user-visible difference, but if you instrument rev-list like so:
diff --git a/revision.c b/revision.c
index ee702e498a..63c488ffb6 100644
--- a/revision.c
+++ b/revision.c
@@ -381,6 +381,9 @@ static struct object *get_reference(struct rev_info *revs, const char *name,
* parsing commit data from disk.
*/
commit = lookup_commit_in_graph(revs->repo, oid);
+ warning("%s %s in commit graph",
+ commit ? "found" : "did not find",
+ name);
if (commit)
object = &commit->object;
else
warning: did not find origin/master in commit graph
warning: found origin/next in commit graph
After this patch, you'll see that we find both:
warning: found origin/master in commit graph
warning: found origin/next in commit graph
Even though the performance implication is small here, there are two
important reasons to do this:
- it's downright confusing if you are hunting a bug triggered by the
use of the commit graph. It may or may not trigger depending on the
number and ordering of tips you ask for.
- prepare_commit_graph() has other policy logic, too. In particular,
if we've loaded a commit graph and then disabled the graph via
disable_commit_graph(), that should take precedence.
I'm not sure if this can trigger bad behavior in practice. The only
caller there is upload-pack's deepen_by_rev_list(), which should be
avoiding the commit graph for its traversal tips, but probably
wasn't before this patch. Whether you could come up with a case
where that mattered is unclear. Still, this is obviously the right
thing to be doing.
Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 6 Sep 2022 01:33:40 +0000 (18:33 -0700)]
Merge branch 'es/t4301-sed-portability-fix'
Test clean-up.
* es/t4301-sed-portability-fix:
t4301: emit blank line in more idiomatic fashion
t4301: fix broken &&-chains and add missing loop termination
t4301: account for behavior differences between sed implementations
Junio C Hamano [Tue, 6 Sep 2022 01:33:40 +0000 (18:33 -0700)]
Merge branch 'rs/tempfile-cleanup-race-fix'
The clean-up of temporary files created via mks_tempfile_dt() was
racy and attempted to unlink() the leading directory when signals
are involved, which has been corrected.
Ensure 'is_sparse_directory_entry()' receives a valid 'name_entry *' if one
exists in the list of tree(s) being unpacked in 'unpack_callback()'.
Currently, 'is_sparse_directory_entry()' is called with the first
'name_entry' in the 'names' list of entries on 'unpack_callback()'. However,
this entry may be empty even when other elements of 'names' are not (such as
when switching from an orphan branch back to a "normal" branch). As a
result, 'is_sparse_directory_entry()' could incorrectly indicate that a
sparse directory is *not* actually sparse because the name of the index
entry does not match the (empty) 'name_entry' path.
Fix the issue by using the existing 'name_entry *p' value in
'unpack_callback()', which points to the first non-empty entry in 'names'.
Because 'p' is 'const', also update 'is_sparse_directory_entry()'s
'name_entry *' argument to be 'const'.
Finally, add a regression test case.
Reported-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule--helper: fix a configure_added_submodule() leak
Fix config API a memory leak added in a452128a36c (submodule--helper:
introduce add-config subcommand, 2021-08-06) by using the *_tmp()
variant of git_config_get_string().
In this case we're only checking whether
the (repo|git)_config_get_string() call is telling us that the
"submodule.active" key exists.
As with the preceding commit we'll find many other such patterns in
the codebase if we go fishing. E.g. "git gc" leaks in the code added
in 61f7a383d3b (maintenance: use 'incremental' strategy by default,
2020-10-15). Similar code in "git gc" added in b08ff1fee00 (maintenance: add --schedule option and config,
2020-09-11) doesn't leak, but we could avoid the malloc() & free() in
that case.
A coccinelle rule to find those would find and fix some leaks, and
cases where we're doing needless malloc() + free()'s but only care
about the key existence, or are copying
the (repo|git)_config_get_string() return value right away.
But as with the preceding commit let's punt on all of that for now,
and just narrowly fix this specific case in submodule--helper.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule--helper: free rest of "displaypath" in "struct update_data"
Fix a leak in code added in c51f8f94e5b (submodule--helper: run update
procedures from C, 2021-08-24), we clobber the "displaypath" member of
the passed-in "struct update_data" both so that die() messages in this
update_submodule() function itself can use it, and for the
run_update_procedure() called within this function.
Fix a leak in code added in 51f8f94e5b (submodule--helper: run update
procedures from C, 2021-08-24). We'd always clobber the old
"displaypath" member of the previously passed-in "struct update_data".
A better fix for this would be to remove the "displaypath" member from
the "struct update_data" entirely. Along with "oid", "suboid",
"just_cloned" and "sm_path" it's managing members that mainly need to
be passed between 1-3 stack frames of functions adjacent to this
code. But doing so would be a much larger change (I have it locally,
and fully untangling that in an incremental way is a 10 patch
journey).
So let's go for this much more isolated fix suggested by Glen. We
FREE_AND_NULL() the "update_data->displaypath", the "AND_NULL()" part
of that is needed due to the later "free(ud->displaypath)" in
"update_data_release()" introduced in the preceding commit
Moving ensure_core_worktree() out of update_submodule() may not be
strictly required, but in doing so we are left with the exact same
ordering as before, making this a smaller functional change.
Helped-by: Glen Choo <chooglen@google.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule--helper: free some "displaypath" in "struct update_data"
Make the update_data_release() function free "displaypath" member when
appropriate. The "displaypath" member is always ours, the "const" on
the "char *" was wrong to begin with.
This leaves a leak of "displaypath" in update_submodule(), which as
we'll see in subsequent commits is harder to deal with than this
trivial fix.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule--helper: fix a memory leak in print_status()
Fix a leak in print_status(), the compute_rev_name() function
implemented in this file will return a strbuf_detach()'d value, or
NULL.
This leak has existed since this code was added in a9f8a37584a (submodule: port submodule subcommand 'status' from shell
to C, 2017-10-06), but in 0b5e2ea7cf3 (submodule--helper: don't print
null in 'submodule status', 2018-04-18) we added a "const"
intermediate variable for the return value, that "const" should be
removed.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a leak in module_path(), since a6226fd772b (submodule--helper:
convert the bulk of cmd_add() to C, 2021-08-10), we've been freeing
add_data.sm_path, but in this case we clobbered it, and didn't free
the value we clobbered.
This makes test 28 of "t/t7400-submodule-basic.sh" ("submodule add in
subdirectory") pass when we're compiled with SANITIZE=leak..
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule--helper: fix obscure leak in module_add()
Fix an obscure leak in module_add(), if the "git add" command we were
piping to failed we'd fail to strbuf_release(&sb). This fixes a leak
introduced in a6226fd772b (submodule--helper: convert the bulk of
cmd_add() to C, 2021-08-10).
In fixing it move to a "goto cleanup" pattern, and since we need to
introduce a "ret" variable to do that let's also get rid of the
intermediate "exit_code" variable. The initialization to "-1" in a6226fd772b has always been redundant, we'd only use the "exit_code"
value after assigning the return value of pipe_command() to it.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix leaks in the "reference" variable declared in add_submodule() and
module_clone().
In preceding commits this variable was refactored out of the "struct
module_clone_data", but the leak has been with us since 31224cbdc72 (clone: recursive and reference option triggers submodule
alternates, 2016-08-17) and 8c8195e9c3e (submodule--helper: introduce
add-clone subcommand, 2021-07-10).
Those commits added an xstrdup()'d member of the
STRING_LIST_INIT_NODUP'd "struct string_list". We need to free()
those, but not the ones we get from argv, let's make use of the "util"
member, if it has a pointer it's the pointer we'll need to free,
otherwise it'll be NULL (i.e. from argv).
Note that the free() of the "util" member is needed in both
module_clone() and add_submodule(). The module_clone() function itself
doesn't populate the "util" pointer as add_submodule() does, but
module_clone() is upstream of the
add_possible_reference_from_superproject() caller we're modifying
here, which does do that.
This does preclude the use of the "util" pointer for any other reasons
for now, but that's OK. If we ever need to use it for something else
we could turn it into a small "struct" with an optional "to_free"
member, and switch to using string_list_clear_func().
Alternatively we could have another "struct string_list to_free" which
would keep a copy of the strings we've dup'd to free(). But for now
this is perfectly adequate.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule--helper: fix a memory leak in get_default_remote_submodule()
Fix a memory leak in the get_default_remote_submodule() function added
in a77c3fcb5ec (submodule--helper: get remote names from any
repository, 2022-03-04), we need to repo_clear() the submodule we
initialize.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Call repo_clear() in ensure_core_worktree() to free the "struct
repository". Fixes a leak that's been here since 74d4731da1f (submodule--helper: replace connect-gitdir-workingtree by
ensure-core-worktree, 2018-08-13).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule--helper: fix "sm_path" and other "module_cb_list" leaks
Fix leaks in "struct module_cb_list" and the "struct module_cb" which
it contains, these fix leaks in e83e3333b57 (submodule: port submodule
subcommand 'summary' from shell to C, 2020-08-13).
The "sm_path" should always have been a "char *", not a "const
char *", we always create it with xstrdup().
We can't mark any tests passing passing with SANITIZE=leak using
"TEST_PASSES_SANITIZE_LEAK=true" as a result of this change, but
"t7401-submodule-summary.sh" gets closer to passing as a result of
this change.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a memory leak introduced in e83e3333b57 (submodule: port submodule
subcommand 'summary' from shell to C, 2020-08-13), we sometimes append
to the "errmsg", and need to free the "struct strbuf".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a memory leak in c51f8f94e5b (submodule--helper: run update
procedures from C, 2021-08-24) and 3c3558f0953 (submodule--helper: run
update using child process struct, 2022-03-15) by not allocating
memory in the first place.
The "dir" member of "struct child_process" will not be modified by
that API, and it's declared to be "const char *". So let's not
needlessly duplicate these strings.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule--helper: "struct pathspec" memory leak in module_update()
The module_update() function calls module_list_compute() twice, which
in turn will reset the "struct pathspec" passed to it. Let's instead
track two of them, and clear them both.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule--helper: fix most "struct pathspec" memory leaks
Call clear_pathspec() at the end of various functions that work with
and allocate a "struct pathspec".
In some cases the zero-initialization here isn't strictly needed, but
as we're moving to a "goto cleanup" pattern let's make sure that it's
safe to call clear_pathspec(), we don't want the data to be
uninitialized.
E.g. for module_foreach() we can see from looking at
module_list_compute() that if it returns non-zero that the "pathspec"
will always have been initialized. But relying on that both assumes
knowledge about parse_pathspec(), and would set up a fragile pattern
going forward.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a leak in code added in 1012a5cbc3f (submodule--helper
run-update-procedure: learn --remote, 2022-03-04), we need to free()
the xstrdup()'d string. This gets e.g. t/t7419-submodule-set-branch.sh
closer to passing under SANITIZE=leak.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule--helper: fix a leak in "clone_submodule"
Fix a memory leak of the "clone_data_path" variable that we copy or
derive from the "struct module_clone_data" in clone_submodule(). This
code was refactored in preceding commits, but the leak has been with
us since f8eaa0ba98b (submodule--helper, module_clone: always operate
on absolute paths, 2016-03-31).
For the "else" case we don't need to xstrdup() the "clone_data->path",
and we don't need to free our own "clone_data_path". We can therefore
assign the "clone_data->path" to our own "clone_data_path" right away,
and only override it (and remember to free it!) if we need to
xstrfmt() a replacement.
In the case of the module_clone() caller it's from "argv", and doesn't
need to be free'd, and in the case of the add_submodule() caller we
get a pointer to "sm_path", which doesn't need to be directly free'd
either.
Fixing this leak makes several tests pass, so let's mark them as
passing with TEST_PASSES_SANITIZE_LEAK=true.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix bad config API usage added in a452128a36c (submodule--helper:
introduce add-config subcommand, 2021-08-06). After
git_config_get_string() returns successfully we know the "char **dest"
will be non-NULL.
A coccinelle patch that transforms this turns up a couple of other
such issues, one in fetch-pack.c, and another in upload-pack.c:
submodule--helper: libify even more "die" paths for module_update()
As noted in a preceding commit the get_default_remote_submodule() and
remote_submodule_branch() functions would invoke die(), and thus leave
update_submodule() only partially lib-ified. We've addressed the
former of those in a preceding commit, let's now address the latter.
In addition to lib-ifying the function this fixes a potential (but
obscure) segfault introduced by a logic error in 1012a5cbc3f (submodule--helper run-update-procedure: learn --remote,
2022-03-04):
We were assuming that remote_submodule_branch() would always return
non-NULL, but if the submodule_from_path() call in that function fails
we'll return NULL. See its introduction in 92bbe7ccf1f (submodule--helper: add remote-branch helper,
2016-08-03). I.e. we'd previously have segfaulted in the xstrfmt()
call in update_submodule() seen in the context.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule--helper: libify more "die" paths for module_update()
As noted in a preceding commit the get_default_remote_submodule() and
remote_submodule_branch() functions would invoke die(), and thus leave
update_submodule() only partially lib-ified. Let's address the former
of those cases.
Change the functions to return an int exit code (non-zero on failure),
while leaving the get_default_remote() function for the callers that
still want the die() semantics.
This change addresses 1/2 of the "die" issue in these two lines in
update_submodule():
We can safely remove the "!default_remote" case from sync_submodule(),
because our get_default_remote_submodule() function now returns a
die_message() on failure, so we can have it and other callers check if
the exit code should be non-zero instead.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix code added in ce125d431aa (submodule: extract path to submodule
gitdir func, 2021-09-15) and a77c3fcb5ec (submodule--helper: get
remote names from any repository, 2022-03-04) which failed to check
the return values of repo_init() and repo_submodule_init(). If we
failed to initialize the repository or submodule we could segfault
when trying to access the invalid repository structs.
Let's also check that these were the only such logic errors in the
codebase by making use of the "warn_unused_result" attribute. This is
valid as of GCC 3.4.0 (and clang will catch it via its faking of
__GNUC__ ).
As the comment being added to git-compat-util.h we're piggy-backing on
the LAST_ARG_MUST_BE_NULL version check out of lazyness. See 9fe3edc47f1 (Add the LAST_ARG_MUST_BE_NULL macro, 2013-07-18) for its
addition. The marginal benefit of covering gcc 3.4.0..4.0.0 is
near-zero (or zero) at this point. It mostly matters that we catch
this somewhere.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Continue the libification of codepaths that previously relied on
"must_die_on_failure". In these cases we've always been early aborting
by calling die(), but as we know that these codepaths will properly
handle return codes of 128 to mean an early abort let's have them use
die_message() instead.
This still isn't a complete migration away from die() for these
codepaths, in particular this code in update_submodule() will still call die() in some cases:
When "git submodule update" runs it might call "checkout", "merge",
"rebase", or a custom command. Ever since run_update_command() was
added in c51f8f94e5b (submodule--helper: run update procedures from C,
2021-08-24) we'd either exit immediately if the
"submodule.<name>.update" method failed, or in the case of "checkout"
continue trying to update other submodules.
This code used to use the magical "2" return code, but in 55b3f12cb54 (submodule update: use die_message(), 2022-03-15) it was
made to exit(128), which in preceding commits has been changed to
return that 128 code to the top-level.
Let's "libify" this code even more by not having it arbitrarily
override the return code. In practice this doesn't change anything as
the code "git checkout" would return on any normal failure is "1", but
we'll now in principle properly abort the operation if "git checkout"
were to exit with 128.
It would make sense to follow-up this change with a change to allow
the "submodule.<name>.update = !..." (SM_UPDATE_COMMAND) method the
same liberties as "checkout", and perhaps to do the same with a failed
"merge" or "rebase". But let's leave that for now.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preceding commits the codepaths around update_submodules() were
changed from using exit() or die() to ferrying up a
"must_die_on_failure" in the cases where we'd exit(), and in most
cases where we'd die().
We needed to do this this to ensure that we'd early exit or otherwise
abort the update_submodules() processing before it was completed.
Now that those preceding changes have shown that we've converted those
paths, we can remove the remaining "ret == 128" special-cases, leaving
the only such special-case in update_submodules(). I.e. we now know
after having gone through the various codepaths that we were only
returning 128 if we meant to early abort.
In update_submodules() we'll for now set any non-zero non-128 exit
codes to "1", but will start ferrying up the exit code as-is in a
subsequent commit.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Libify the determine_submodule_update_strategy() by having it invoke
die_message() rather than die(), and returning the code die_message()
returns on failure.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule--helper: don't exit() on failure, return
Change code downstream of module_update() to short-circuit and return
to the top-level on failure, rather than calling exit().
To do so we need to diligently check whether we "must_die_on_failure",
which is a pattern started in c51f8f94e5b (submodule--helper: run
update procedures from C, 2021-08-24), but which hadn't been completed
to the point where we could avoid calling exit() here.
This introduces no functional changes, but makes it easier to both
call these routines as a library in the future, and to eventually
avoid leaking memory.
This and similar control flow in submodule--helper.c could be made
simpler by properly "libifying" it, i.e. to have it consistently
return -1 on failures, and to early return on any non-success.
But let's leave that larger project for now, and (mostly) emulate what
were doing with the "exit(128)" before this change.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule--helper: use "code" in run_update_command()
Apply some DRY principles in run_update_command() and don't have two
"switch" statements over "ud->update_strategy.type" determine the same
thing.
First we were setting "must_die_on_failure = 1" in all cases except
"SM_UPDATE_CHECKOUT" (and we'd BUG(...) out on the rest). This code
was added in c51f8f94e5b (submodule--helper: run update procedures
from C, 2021-08-24).
Then we'd duplicate same "switch" logic when we were using the
"must_die_on_failure" variable.
Let's instead have the "case" branches in that inner "switch"
determine whether or not the "update must continue" by picking an exit
code.
This also mostly avoids hardcoding the "128" exit code, instead we can
make use of the return value of the die_message() function, which
we've been calling here since 55b3f12cb54 (submodule update: use
die_message(), 2022-03-15). We're still hardcoding it to determine if
we "exit()", but subsequent commit(s) will address that.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule API: don't handle SM_..{UNSPECIFIED,COMMAND} in to_string()
Change the submodule_strategy_to_string() function added in 3604242f080 (submodule: port init from shell to C, 2016-04-15) to
really return a "const char *". In the "SM_UPDATE_COMMAND" case it
would return a strbuf_detach().
Furthermore, this function would return NULL on SM_UPDATE_UNSPECIFIED,
so it wasn't safe to xstrdup() its return value in the general case,
or to use it in a sprintf() format as the code removed in the
preceding commit did.
But its callers would never call it with either SM_UPDATE_UNSPECIFIED
or SM_UPDATE_COMMAND. Let's have its behavior reflect how its only
user expects it to behave, and BUG() out on the rest.
By doing this we can also stop needlessly xstrdup()-ing and free()-ing
the memory for the config we're setting. We can instead always use
constant strings. We can also use the *_tmp() variant of
git_config_get_string().
Let's also rename this submodule_strategy_to_string() function to
submodule_update_type_to_string(). Now that it's only tasked with
returning a string version of the "enum submodule_update_type type".
Before it would look at the "command" field in "struct
submodule_update_strategy".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>