Jeff King [Tue, 4 Jun 2024 10:13:35 +0000 (06:13 -0400)]
sparse-checkout: free pattern list in sparse_checkout_list()
In sparse_checkout_list(), we create a pattern_list that needs to
eventually be cleared. We remember to do so in the regular code path,
but the cone-mode path does an early return, and forgets to clean up.
We could fix the leak by adding a new call to clear_pattern_list(). But
we can simplify even further by just skipping the early return, pushing
the other code path (which consists now of only one line!) into an else
block. That also matches the same cone/non-cone if/else used in some
other functions.
This fixes 15 leaks found in t1091.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 4 Jun 2024 10:13:32 +0000 (06:13 -0400)]
sparse-checkout: free sparse_filename after use
We allocate a heap buffer via get_sparse_checkout_filename(). Most calls
remember to free it, but sparse_checkout_init() forgets to, causing a
leak. Ironically, it remembers to do so in the error return paths, but
not in the path that makes it all the way to the function end!
Fixing this clears up 6 leaks from t1091.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In update_working_directory(), we take in a pattern_list, attach it to
the repository index by assigning it to index->sparse_checkout_patterns,
and then call unpack_trees. Afterwards, we remove it by setting
index->sparse_checkout_patterns back to NULL.
But there are two possible leaks here:
1. If the index already had a populated sparse_checkout_patterns,
we've obliterated it. We can fix this by saving and restoring it,
rather than always setting it back to NULL.
2. We may call the function with a NULL pattern_list, expecting it to
use the on-disk sparse file. In that case, the index routines will
lazy-load the sparse patterns automatically. But now at the end of
the function when we restore the patterns, we'll leak those
lazy-loaded ones!
We can fix this by freeing the pattern list before overwriting its
pointer whenever it does not match what was passed in (in practice
this should only happen when the passed-in list is NULL, but this
is erring on the defensive side).
Together these remove 48 indirect leaks found in t1091.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 4 Jun 2024 10:13:27 +0000 (06:13 -0400)]
sparse-checkout: always free "line" strbuf after reading input
In add_patterns_from_input(), we may read lines from a file with a loop
like this:
while (!strbuf_getline(&line, file)) {
...
strbuf_to_cone_pattern(&line, pl);
}
/* we don't strbuf_release(&line) here! */
This generally is OK because strbuf_to_cone_pattern() consumes the
buffer via strbuf_detach(). But we can leak in a few cases:
1. We don't always consume the buffer! If the line ends up empty after
trimming, we leave strbuf_to_cone_pattern() without detaching. In
most cases this is OK, because a subsequent getline() call will use
the same buffer. But if you had an empty line at the end of file,
for example, it would leak.
2. Even if strbuf_to_cone_pattern() always consumed the buffer,
there's a subtle issue with strbuf_getline(). As we saw in 94e2aa555e (strbuf: fix leak when `appendwholeline()` fails with
EOF, 2024-05-27), it's possible for it to return EOF with an
allocated buffer (e.g., if the underlying getdelim() call saw an
error). So we should always strbuf_release() after finishing a read
loop like this.
Note that even the code to read patterns from argv has the same problem.
Because that also uses strbuf_to_cone_pattern(), we stuff each argv
entry into a strbuf. It uses the same "line" strbuf as the getline code,
but we should position the strbuf_release() to cover both code paths.
This fixes at least 9 leaks found in t1091.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 4 Jun 2024 10:13:25 +0000 (06:13 -0400)]
sparse-checkout: reuse --stdin buffer when reading patterns
When we read patterns from --stdin, we loop on strbuf_getline(), and
detach each line we read to pass into add_pattern(). This used to be
necessary because add_pattern() required that the pattern strings remain
valid while the pattern_list was in use. But it also created a leak,
since we didn't record the detached buffers anywhere else.
Now that add_pattern() has been modified to make its own copy of the
strings, we can stop detaching and fix the leak. This fixes 4 leaks
detected in t1091.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 4 Jun 2024 10:13:22 +0000 (06:13 -0400)]
dir.c: always copy input to add_pattern()
The add_pattern() function has a subtle and undocumented gotcha: the
pattern string you pass in must remain valid as long as the pattern_list
is in use (and nor do we take ownership of it). This is easy to get
wrong, causing either subtle bugs (because you free or reuse the string
buffer) or leaks (because you copy the string, but don't track ownership
separately).
All of this "pattern" code was originally the "exclude" mechanism. So
this _usually_ works OK because you add entries in one of two ways:
1. From the command-line (e.g., "--exclude"), in which case we're
pointing to an argv entry which remains valid for the lifetime of
the program.
2. From a file (e.g., ".gitignore"), in which case we read the whole
file into a buffer, attach it to the pattern_list's "filebuf"
entry, then parse the buffer in-place (adding NULs). The strings
point into the filebuf, which is cleaned up when the whole
pattern_list goes away.
But other code, like sparse-checkout, reads individual lines from stdin
and passes them one by one to add_pattern(), leaking each. We could fix
this by refactoring it to take in the whole buffer at once, like (2)
above, and stuff it in "filebuf". But given how subtle the interface is,
let's just fix it to always copy the string.
That seems at first like we'd be wasting extra memory, but we can
mitigate that:
a. The path_pattern struct already uses a FLEXPTR, since we sometimes
make a copy (when we see "foo/", we strip off the trailing slash,
requiring a modifiable copy of the string).
Since we'll now always embed the string inside the struct, we can
switch to the regular FLEX_ARRAY pattern, saving us 8 bytes of
pointer. So patterns with a trailing slash and ones under 8 bytes
actually get smaller.
b. Now that we don't need the original string to hang around, we can
get rid of the "filebuf" mechanism entirely, and just free the file
contents after parsing. Since files are the sources we'd expect to
have the largest pattern sets, we should mostly break even on
stuffing the same data into the individual structs.
This patch just adjusts the add_pattern() interface; it doesn't fix any
leaky callers yet.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In add_pattern_to_hashsets(), we remove entries from the
recursive_hashmap when adding similar ones to the parent_hashmap. I
won't pretend to understand all of what's going on here, but there's an
obvious leak: whatever we removed from recursive_hashmap is not
referenced anywhere else, and is never free()d.
We can easily fix this by asking the hashmap to return a pointer to the
old entry. This makes t7002 now completely leak-free.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 4 Jun 2024 10:13:17 +0000 (06:13 -0400)]
sparse-checkout: clear patterns when init() sees existing sparse file
In sparse_checkout_init(), we first try to load patterns from an
existing file. If we found any, we return immediately, but end up
leaking the patterns we parsed. Fixing this reduces the number of leaks
in t7002 from 9 down to 5.
Note that there are two other exits from the function, but they don't
need the same treatment:
- if we can't resolve HEAD, we write out a hard-coded sparse file and
return. But we know the pattern list is empty there, since we didn't
find any in the on-disk file and we haven't yet added any of our
own.
- otherwise, we do populate the list and then tail-call into
write_patterns_and_update(). But that function frees the
pattern_list itself, so we don't need to.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 4 Jun 2024 10:13:14 +0000 (06:13 -0400)]
dir.c: free strings in sparse cone pattern hashmaps
The pattern_list structs used for cone-mode sparse lookups use a few
extra hashmaps. These store pattern_entry structs, each of which has its
own heap-allocated pattern string. When we clean up the hashmaps, we
free the individual pattern_entry structs, but forget to clean up the
embedded strings, causing memory leaks.
We can fix this by iterating over the hashmaps to free the extra
strings. This reduces the numbers of leaks in t7002 from 22 to 9.
One alternative here would be to make the string a FLEX_ARRAY member of
the pattern_entry. Then there's no extra free() required, and as a bonus
it would be a little more efficient. However, some of the refactoring
gets awkward, as we are often assigning strings allocated by helper
functions. So let's just fix the leak for now, and we can explore bigger
refactoring separately.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 4 Jun 2024 10:13:10 +0000 (06:13 -0400)]
sparse-checkout: pass string literals directly to add_pattern()
The add_pattern() function takes a pattern string, but neither makes a
copy of it nor takes ownership of the memory. So it is the caller's
responsibility to make sure the string hangs around as long as the
pattern_list which references it.
There are a few cases in sparse-checkout where we use string literal
patterns by stuffing them into a strbuf, detaching the buffer, and then
passing the result into add_pattern(). This creates a leak when the
pattern_list is eventually cleared, since we don't retain a copy of the
detached buffer to free.
But we can observe that the whole strbuf dance is unnecessary. The point
was presumably[1] to satisfy the lifetime requirement of the string. But
string literals have static duration; we can count on them lasting for
the whole program.
So we can fix the leak by just passing them directly. And as a bonus,
that simplifies the code. The leaks can be seen in t7002, which drops
from 25 leaks to 22 with this patch. It also makes t3602 and t1090
leak-free.
In the long run, we will also want to clean up this (undocumented!)
memory lifetime requirement of add_pattern(). But that can come in a
later patch; passing the string literals directly will be the right
thing either way.
[1] The code in question comes from 416adc8711 (sparse-checkout: update
working directory in-process for 'init', 2019-11-21) and 99dfa6f970
(sparse-checkout: use in-process update for disable subcommand,
2019-11-21), but I didn't see anything in their commit messages or
on the list explaining the strbufs.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Fri, 31 May 2024 15:55:34 +0000 (08:55 -0700)]
Merge branch 'jk/leakfixes' into jk/sparse-leakfix
* jk/leakfixes:
mv: replace src_dir with a strvec
mv: factor out empty src_dir removal
mv: move src_dir cleanup to end of cmd_mv()
t-strvec: mark variable-arg helper with LAST_ARG_MUST_BE_NULL
t-strvec: use va_end() to match va_start()
Jeff King [Thu, 30 May 2024 06:46:38 +0000 (02:46 -0400)]
mv: replace src_dir with a strvec
We manually manage the src_dir array with ALLOC_GROW. Using a strvec is
a little more ergonomic, and makes the memory ownership more clear. It
does mean that we copy the strings (which were otherwise just pointers
into the "sources" strvec), but using the same rationale as 9fcd9e4e72
(builtin/mv duplicate string list memory, 2024-05-27), it's just not
enough to be worth worrying about here.
As a bonus, this gets rid of some "int"s used for allocation management
(though in practice these were limited to command-line sizes and thus
not overflowable).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 30 May 2024 06:45:21 +0000 (02:45 -0400)]
mv: factor out empty src_dir removal
This pulls the loop added by b6f51e3db9 (mv: cleanup empty
WORKING_DIRECTORY, 2022-08-09) into a sub-function. That reduces clutter
in cmd_mv() and makes it easier to see that the lifetime of the
a_src_dir strbuf is limited to this code (and thus its cleanup doesn't
need to go after the "out" label).
Another option would be to just declare the strbuf inside the loop,
since it is only used there. But this refactor retains the existing
property that we can reuse the allocated buffer for each iteration of
the loop. That optimization is probably overkill, but I think the
sub-function is more readable anyway, and then keeping the optimization
is basically free.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 30 May 2024 06:44:22 +0000 (02:44 -0400)]
mv: move src_dir cleanup to end of cmd_mv()
Commit b6f51e3db9 (mv: cleanup empty WORKING_DIRECTORY, 2022-08-09)
added an auxiliary array where we store directory arguments that we see
while processing the incoming arguments. After actually moving things,
we then use that array to remove now-empty directories, and then
immediately free the array.
But if the actual move queues any errors in only_match_skip_worktree,
that can cause us to jump straight to the "out" label to clean up,
skipping the free() and leaking the array.
Let's push the free() down past the "out" label so that we always clean
up (the array is initialized to NULL, so this is always safe). We'll
hold on to the memory a little longer than necessary, but clarity is
more important than micro-optimizing here.
Note that the adjacent "a_src_dir" strbuf does not suffer the same
problem; it is only allocated during the removal step.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 30 May 2024 06:39:32 +0000 (02:39 -0400)]
t-strvec: use va_end() to match va_start()
Our check_strvec_loc() helper uses a variable argument list. When we
va_start(), we must be sure to va_end() before leaving the function.
This is required by the standard (though the effect of forgetting will
vary between platforms).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Thu, 30 May 2024 15:54:58 +0000 (08:54 -0700)]
Merge branch 'ps/leakfixes' into jk/leakfixes
* ps/leakfixes:
builtin/mv: fix leaks for submodule gitfile paths
builtin/mv: refactor to use `struct strvec`
builtin/mv duplicate string list memory
builtin/mv: refactor `add_slash()` to always return allocated strings
strvec: add functions to replace and remove strings
submodule: fix leaking memory for submodule entries
commit-reach: fix memory leak in `ahead_behind()`
builtin/credential: clear credential before exit
config: plug various memory leaks
config: clarify memory ownership in `git_config_string()`
builtin/log: stop using globals for format config
builtin/log: stop using globals for log config
convert: refactor code to clarify ownership of check_roundtrip_encoding
diff: refactor code to clarify memory ownership of prefixes
config: clarify memory ownership in `git_config_pathname()`
http: refactor code to clarify memory ownership
checkout: clarify memory ownership in `unique_tracking_name()`
strbuf: fix leak when `appendwholeline()` fails with EOF
transport-helper: fix leaking helper name
The credential helper that talks with osx keychain learned to avoid
storing back the authentication material it just got received from
the keychain.
* kn/osxkeychain-skip-idempotent-store:
osxkeychain: state to skip unnecessary store operations
osxkeychain: exclusive lock to serialize execution of operations
Junio C Hamano [Tue, 28 May 2024 18:17:07 +0000 (11:17 -0700)]
Merge branch 'ps/builtin-config-cleanup'
Code clean-up to reduce inter-function communication inside
builtin/config.c done via the use of global variables.
* ps/builtin-config-cleanup: (21 commits)
builtin/config: pass data between callbacks via local variables
builtin/config: convert flags to a local variable
builtin/config: track "fixed value" option via flags only
builtin/config: convert `key` to a local variable
builtin/config: convert `key_regexp` to a local variable
builtin/config: convert `regexp` to a local variable
builtin/config: convert `value_pattern` to a local variable
builtin/config: convert `do_not_match` to a local variable
builtin/config: move `respect_includes_opt` into location options
builtin/config: move default value into display options
builtin/config: move type options into display options
builtin/config: move display options into local variables
builtin/config: move location options into local variables
builtin/config: refactor functions to have common exit paths
config: make the config source const
builtin/config: check for writeability after source is set up
builtin/config: move actions into `cmd_config_actions()`
builtin/config: move legacy options into `cmd_config()`
builtin/config: move subcommand options into `cmd_config()`
builtin/config: move legacy mode into its own function
...
Junio C Hamano [Tue, 28 May 2024 18:17:06 +0000 (11:17 -0700)]
Merge branch 'ps/pseudo-ref-terminology'
Terminology to call various ref-like things are getting
straightened out.
* ps/pseudo-ref-terminology:
refs: refuse to write pseudorefs
ref-filter: properly distinuish pseudo and root refs
refs: pseudorefs are no refs
refs: classify HEAD as a root ref
refs: do not check ref existence in `is_root_ref()`
refs: rename `is_special_ref()` to `is_pseudo_ref()`
refs: rename `is_pseudoref()` to `is_root_ref()`
Documentation/glossary: define root refs as refs
Documentation/glossary: clarify limitations of pseudorefs
Documentation/glossary: redefine pseudorefs as special refs
Similar to the preceding commit, we have effectively given tracking
memory ownership of submodule gitfile paths. Refactor the code to start
tracking allocated strings in a separate `struct strvec` such that we
can easily plug those leaks. Mark now-passing tests as leak free.
Note that ideally, we wouldn't require two separate data structures to
track those paths. But we do need to store `NULL` pointers for the
gitfile paths such that we can indicate that its corresponding entries
in the other arrays do not have such a path at all. And given that
`struct strvec`s cannot store `NULL` pointers we cannot use them to
store this information.
There is another small gotcha that is easy to miss: you may be wondering
why we don't want to store `SUBMODULE_WITH_GITDIR` in the strvec. This
is because this is a mere sentinel value and not actually a string at
all.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Memory allocation patterns in git-mv(1) are extremely hard to follow:
We copy around string pointers into manually-managed arrays, some of
which alias each other, but only sometimes, while we also drop some of
those strings at other times without ever daring to free them.
While this may be my own subjective feeling, it seems like others have
given up as the code has multiple calls to `UNLEAK()`. These are not
sufficient though, and git-mv(1) is still leaking all over the place
even with them.
Refactor the code to instead track strings in `struct strvec`. While
this has the effect of effectively duplicating some of the strings
without an actual need, it is way easier to reason about and fixes all
of the aliasing of memory that has been going on. It allows us to get
rid of the `UNLEAK()` calls and also fixes leaks that those calls did
not paper over.
Mark tests which are now leak-free accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
makes the next patch easier, where we will migrate to the paths being
owned by a strvec. given that we are talking about command line
parameters here it's also not like we have tons of allocations that this
would save
while at it, fix a memory leak
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/mv: refactor `add_slash()` to always return allocated strings
The `add_slash()` function will only conditionally return an allocated
string when the passed-in string did not yet have a trailing slash. This
makes the memory ownership harder to track than really necessary.
It's dubious whether this optimization really buys us all that much. The
number of times we execute this function is bounded by the number of
arguments to git-mv(1), so in the typical case we may end up saving an
allocation or two.
Simplify the code to unconditionally return allocated strings.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule: fix leaking memory for submodule entries
In `free_one_config()` we never end up freeing the `url` and `ignore`
fields and thus leak memory. Fix those leaks and mark now-passing tests
as leak free.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use a priority queue in `ahead_behind()` to compute the ahead/behind
count for commits. We may not iterate through all commits part of that
queue though in case all of its entries are stale. Consequently, as we
never make the effort to release the remaining commits, we end up
leaking bit arrays that we have allocated for each of the contained
commits.
Plug this leak and mark the corresponding test as leak free.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that memory ownership rules around `git_config_string()` and
`git_config_pathname()` are clearer, it also got easier to spot that
the returned memory needs to be free'd. Plug a subset of those cases and
mark now-passing tests as leak free.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
config: clarify memory ownership in `git_config_string()`
The out parameter of `git_config_string()` is a `const char **` even
though we transfer ownership of memory to the caller. This is quite
misleading and has led to many memory leaks all over the place. Adapt
the parameter to instead be `char **`.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're using global variables to store the log configuration. Many of
these can be set both via the command line and via the config, and
depending on how they are being set, they may contain allocated strings.
This leads to hard-to-track memory ownership and memory leaks.
Refactor the code to instead use a `struct log_config` that is being
allocated on the stack. This allows us to more clearly scope the
variables, track memory ownership and ultimately release the memory.
This also prepares us for a change to `git_config_string()`, which will
be adapted to have a `char **` out parameter instead of `const char **`.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
convert: refactor code to clarify ownership of check_roundtrip_encoding
The `check_roundtrip_encoding` variable is tracked in a `const char *`
even though it may contain allocated strings at times. The result is
that those strings may be leaking because we never free them.
Refactor the code to always store allocated strings in this variable.
The default value is handled in `check_roundtrip()` now, which is the
only user of the variable.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
diff: refactor code to clarify memory ownership of prefixes
The source and destination prefixes are tracked in a `const char *`
array, but may at times contain allocated strings. The result is that
those strings may be leaking because we never free them.
Refactor the code to always store allocated strings in those variables,
freeing them as required. This requires us to handle the default values
a bit different compared to before. But given that there is only a
single callsite where we use the variables to `struct diff_options` it's
easy to handle the defaults there.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
config: clarify memory ownership in `git_config_pathname()`
The out parameter of `git_config_pathname()` is a `const char **` even
though we transfer ownership of memory to the caller. This is quite
misleading and has led to many memory leaks all over the place. Adapt
the parameter to instead be `char **`.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are various variables assigned via `git_config_string()` and
`git_config_pathname()` which are never free'd. This bug is relatable
because the out parameter of those functions are a `const char **`, even
though memory ownership is transferred to the caller.
We're about to adapt the functions to instead use `char **`. Prepare the
code accordingly. Note that the `(const char **)` casts will go away
once we have adapted the functions.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
checkout: clarify memory ownership in `unique_tracking_name()`
The function `unique_tracking_name()` returns an allocated string, but
does not clearly indicate this because its return type is `const char *`
instead of `char *`. This has led to various callsites where we never
free its returned memory at all, which causes memory leaks.
Plug those leaks and mark now-passing tests as leak free.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
strbuf: fix leak when `appendwholeline()` fails with EOF
In `strbuf_appendwholeline()` we call `strbuf_getwholeline()` with a
temporary buffer. In case the call returns an error we indicate this by
returning EOF, but never release the temporary buffer. This can cause a
leak though because `strbuf_getwholeline()` calls getline(3). Quoting
its documentation:
If *lineptr was set to NULL before the call, then the buffer
should be freed by the user program even on failure.
Consequently, the temporary buffer may hold allocated memory even when
the call to `strbuf_getwholeline()` fails.
Fix this by releasing the temporary buffer on error.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
- t7423: Introduced via b20c10fd9b (t7423: add tests for symlinked
submodule directories, 2024-01-28), passes since e8d0608944
(submodule: require the submodule path to contain directories only,
2024-03-26). The fix is not obviously related, but probably works
because we now die early in many code paths.
- t9xxx: All of these are exercising CVS-related tooling and pass
since at least Git v2.40. It's likely that these pass for a long
time already, but nobody ever noticed because Git developers do not
tend to have CVS on their machines.
Mark all of these tests as passing.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When initializing the transport helper in `transport_get()`, we
allocate the name of the helper. We neither end up transferring
ownership of the name, nor do we free it. The associated memory thus
leaks.
Fix this memory leak by freeing the string at the calling side in
`transport_get()`. `transport_helper_init()` now creates its own copy of
the string and thus can free it as required.
An alterantive way to fix this would be to transfer ownership of the
string passed into `transport_helper_init()`, which would avoid the call
to xstrdup(1). But it does make for a more surprising calling convention
as we do not typically transfer ownership of strings like this.
Mark now-passing tests as leak free.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In "t/lib-terminal.sh", we declare a lazy prerequisite for tests that
require a TTY. The prerequisite uses a Perl script to figure out whether
we do have a usable TTY or not and thus implicitly depends on the PERL
prerequisite, as well. Furthermore though, the script requires another
dependency that is easy to miss, namely on the IO::Pty module. If that
module is not installed, then the script will exit early due to an
reason unrelated to missing TTYs.
This easily leads to missing test coverage. But most importantly, our CI
systems are missing this dependency and thus don't execute those tests
at all. Fix this.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Thu, 23 May 2024 18:04:29 +0000 (11:04 -0700)]
Merge branch 'mt/openindiana-portability'
Portability updates to various uses of grep and sed.
* mt/openindiana-portability:
t/t9001-send-email.sh: sed - remove the i flag for s
t/t9118-git-svn-funky-branch-names.sh: sed needs semicolon
t/t1700-split-index.sh: mv -v is not portable
t/t4202-log.sh: fix misspelled variable
t/t0600-reffiles-backend.sh: rm -v is not portable
t/t9902-completion.sh: backslashes in echo
Switch grep from non-portable BRE to portable ERE
Junio C Hamano [Thu, 23 May 2024 18:04:26 +0000 (11:04 -0700)]
Merge branch 'la/hide-trailer-info'
The trailer API has been reshuffled a bit.
* la/hide-trailer-info:
trailer unit tests: inspect iterator contents
trailer: document parse_trailers() usage
trailer: retire trailer_info_get() from API
trailer: make trailer_info struct private
trailer: make parse_trailers() return trailer_info pointer
interpret-trailers: access trailer_info with new helpers
sequencer: use the trailer iterator
trailer: teach iterator about non-trailer lines
trailer: add unit tests for trailer iterator
Makefile: sort UNIT_TEST_PROGRAMS
Junio C Hamano [Mon, 20 May 2024 18:20:04 +0000 (11:20 -0700)]
Merge branch 'jc/compat-regex-calloc-fix'
Windows CI running in GitHub Actions started complaining about the
order of arguments given to calloc(); the imported regex code uses
the wrong order almost consistently, which has been corrected.
* jc/compat-regex-calloc-fix:
compat/regex: fix argument order to calloc(3)
Junio C Hamano [Mon, 20 May 2024 18:20:04 +0000 (11:20 -0700)]
Merge branch 'kn/ref-transaction-symref'
Updates to symbolic refs can now be made as a part of ref
transaction.
* kn/ref-transaction-symref:
refs: remove `create_symref` and associated dead code
refs: rename `refs_create_symref()` to `refs_update_symref()`
refs: use transaction in `refs_create_symref()`
refs: add support for transactional symref updates
refs: move `original_update_refname` to 'refs.c'
refs: support symrefs in 'reference-transaction' hook
files-backend: extract out `create_symref_lock()`
refs: accept symref values in `ref_transaction_update()`
Marcel Telka [Fri, 17 May 2024 15:27:41 +0000 (17:27 +0200)]
t/t1700-split-index.sh: mv -v is not portable
The -v option for mv is not specified by POSIX. The illumos
implementation of mv does not support -v. Since we do not need the
verbose mv output we just drop -v for mv.
Signed-off-by: Marcel Telka <marcel@telka.sk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Marcel Telka [Fri, 17 May 2024 13:19:00 +0000 (15:19 +0200)]
t/t0600-reffiles-backend.sh: rm -v is not portable
The -v option for rm is not specified by POSIX. The illumos
implementation of rm does not support -v. Since we do not need the
verbose rm output we just drop -v for rm.
Signed-off-by: Marcel Telka <marcel@telka.sk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Marcel Telka [Fri, 17 May 2024 14:08:45 +0000 (16:08 +0200)]
t/t9902-completion.sh: backslashes in echo
The usage of backslashes in echo is not portable. Since some tests
tries to output strings containing '\b' it is safer to use printf
here. The usage of printf instead of echo is also preferred by POSIX.
Signed-off-by: Marcel Telka <marcel@telka.sk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Fri, 17 May 2024 17:14:46 +0000 (10:14 -0700)]
diff: document what --name-only shows
The "--name-only" option is about showing the name of each file in
the post-image tree that got changed and nothing else (like "was it
created?"). Unlike the "--name-status" option that tells how the
change happened (e.g., renamed with similarity), it does not give
anything else, like the name of the corresponding file in the old
tree.
For example, if you start from a clean checkout that has a file
whose name is COPYING, here is what you would see:
Lack of the description of this fact has confused readers in the
past. Even back when dda2d79a ([PATCH] Clean up diff option
descriptions., 2005-07-13) documented "--name-only", "git diff"
already supported the renames, so in a sense, from day one, this
should have been documented more clearly but it wasn't.
Karthik Nayak [Fri, 17 May 2024 12:27:24 +0000 (14:27 +0200)]
SubmittingPatches: add section for iterating patches
Add a section to explain how to work around other in-flight patches and
how to navigate conflicts which arise as a series is being iterated.
This provides the necessary steps that users can follow to reduce
friction with other ongoing topics and also provides guidelines on how
the users can also communicate this to the list efficiently.
Co-authored-by: Junio C Hamano <gitster@pobox.com> Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
completion: adapt git-config(1) to complete subcommands
With fe3ccc7aab (Merge branch 'ps/config-subcommands', 2024-05-15),
git-config(1) has gained support for subcommands. These subcommands live
next to the old, action-based mode, so that both the old and new way
continue to work.
The manpage for this command has been updated to prominently show the
subcommands, and the action-based modes are marked as deprecated. Update
Bash completion scripts accordingly to advertise subcommands instead of
actions.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 15 May 2024 19:32:42 +0000 (12:32 -0700)]
t0017: clarify dubious test set-up
1ff750b1 (tests: make GIT_TEST_GETTEXT_POISON a boolean, 2019-06-21)
added this test, in which "test-tool -C" is fed a name of a
directory that does not exist, and expects that it dies because of a
failure to read the configuration file(s), because the configuration
setting is screwed up to contain mutual inclusion loop, before it
notices that the directory to chdir into does not exist and dies.
It is of dubious value to etch the current order of events, i.e.,
the configuration needs to be read that early (for initializing
trace2 subsystem) before we even notice the lack of the directory
and have a chance to fail, into stone. Indeed, if you completely
compile out trace2 subsystem so that it does not even attempt to
read the configuration that early, we would die with a different
error message (i.e. "unable to chdir to 'cycle'") and this test will
fail.
At least give a bogus argument to "test-tool -C" a name that is
clearly bogus to make sure we can more easily see what is going on
with plenty of comments.
We may want to remove this test altogether, instead, though.
Junio C Hamano [Thu, 16 May 2024 17:10:13 +0000 (10:10 -0700)]
Merge branch 'ps/refs-without-the-repository'
The refs API lost functions that implicitly assumes to work on the
primary ref_store by forcing the callers to pass a ref_store as an
argument.
* ps/refs-without-the-repository:
refs: remove functions without ref store
cocci: apply rules to rewrite callers of "refs" interfaces
cocci: introduce rules to transform "refs" to pass ref store
refs: add `exclude_patterns` parameter to `for_each_fullref_in()`
refs: introduce missing functions that accept a `struct ref_store`
Junio C Hamano [Thu, 16 May 2024 17:10:13 +0000 (10:10 -0700)]
Merge branch 'jl/git-no-advice'
A new global "--no-advice" option can be used to disable all advice
messages, which is meant to be used only in scripts.
* jl/git-no-advice:
t0018: two small fixes
advice: add --no-advice global option
doc: add spacing around paginate options
doc: clean up usage documentation for --no-* opts
Koji Nakamaru [Wed, 15 May 2024 19:21:07 +0000 (19:21 +0000)]
osxkeychain: state to skip unnecessary store operations
git passes a credential that has been used successfully to the helpers
to record. If a credential is already stored,
"git-credential-osxkeychain store" just records the credential returned
by "git-credential-osxkeychain get", and unnecessary (sometimes
problematic) SecItemAdd() and/or SecItemUpdate() are performed.
We can skip such unnecessary operations by marking a credential returned
by "git-credential-osxkeychain get". This marking can be done by
utilizing the "state[]" feature:
- The "get" command sets the field "state[]=osxkeychain:seen=1".
- The "store" command skips its actual operation if the field
"state[]=osxkeychain:seen=1" exists.
Introduce a new state "state[]=osxkeychain:seen=1".
Suggested-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Koji Nakamaru <koji.nakamaru@gree.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Koji Nakamaru [Wed, 15 May 2024 19:21:06 +0000 (19:21 +0000)]
osxkeychain: exclusive lock to serialize execution of operations
git passes a credential that has been used successfully to the helpers
to record. If "git-credential-osxkeychain store" commands run in
parallel (with fetch.parallel configuration and/or by running multiple
git commands simultaneously), some of them may exit with the error
"failed to store: -25299". This is because SecItemUpdate() in
add_internet_password() may return errSecDuplicateItem (-25299) in this
situation. Apple's documentation [1] also states as below:
In macOS, some of the functions of this API block while waiting for
input from the user (for example, when the user is asked to unlock a
keychain or give permission to change trust settings). In general, it
is safe to use this API in threads other than your main thread, but
avoid calling the functions from multiple operations, work queues, or
threads concurrently. Instead, serialize function calls or confine
them to a single thread.
The error has not been noticed before, because the former implementation
ignored the error.
Introduce an exclusive lock to serialize execution of operations.
The "whitespace check" task that was enabled for GitHub Actions CI
has been ported to GitLab CI.
* jt/port-ci-whitespace-check-to-gitlab:
gitlab-ci: add whitespace error check
ci: make the whitespace report optional
ci: separate whitespace check script
github-ci: fix link to whitespace error
ci: pre-collapse GitLab CI sections
Junio C Hamano [Wed, 15 May 2024 16:52:52 +0000 (09:52 -0700)]
Merge branch 'js/unit-test-suite-runner'
The "test-tool" has been taught to run testsuite tests in parallel,
bypassing the need to use the "prove" tool.
* js/unit-test-suite-runner:
cmake: let `test-tool` run the unit tests, too
ci: use test-tool as unit test runner on Windows
t/Makefile: run unit tests alongside shell tests
unit tests: add rule for running with test-tool
test-tool run-command testsuite: support unit tests
test-tool run-command testsuite: remove hardcoded filter
test-tool run-command testsuite: get shell from env
t0080: turn t-basic unit test into a helper
Pseudorefs are not stored in the ref database as by definition, they
carry additional metadata that essentially makes them not a ref. As
such, writing pseudorefs via the ref backend does not make any sense
whatsoever as the ref backend wouldn't know how exactly to store the
data.
Restrict writing pseudorefs via the ref backend.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
ref-filter: properly distinuish pseudo and root refs
The ref-filter interfaces currently define root refs as either a
detached HEAD or a pseudo ref. Pseudo refs aren't root refs though, so
let's properly distinguish those ref types.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `is_root_ref()` function will happily clarify a pseudoref as a root
ref, even though pseudorefs are no refs. Next to being wrong, it also
leads to inconsistent behaviour across ref backends: while the "files"
backend accidentally knows to parse those pseudorefs and thus yields
them to the caller, the "reftable" backend won't ever see the pseudoref
at all because they are never stored in the "reftable" backend.
Fix this issue by filtering out pseudorefs in `is_root_ref()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Root refs are those refs that live in the root of the ref hierarchy.
Our old and venerable "HEAD" reference falls into this category, but we
don't yet classify it as such in `is_root_ref()`.
Adapt the function to also treat "HEAD" as a root ref. This change is
safe to do for all current callers:
- `ref_kind_from_refname()` already handles "HEAD" explicitly before
calling `is_root_ref()`.
- The "files" and "reftable" backends explicitly call both
`is_root_ref()` and `is_headref()` together.
This also aligns behaviour or `is_root_ref()` and `is_headref()` such
that we stop checking for ref existence. This changes semantics for our
backends:
- In the reftable backend we already know that the ref must exist
because `is_headref()` is called as part of the ref iterator. The
existence check is thus redundant, and the change is safe to do.
- In the files backend we use it when populating root refs, where we
would skip adding the "HEAD" file if it was not possible to resolve
it. The new behaviour is to instead mark "HEAD" as broken, which
will cause us to emit warnings in various places.
As there are no callers of `is_headref()` left afer the refactoring, we
can absorb it completely into `is_root_ref()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs: do not check ref existence in `is_root_ref()`
Before this patch series, root refs except for "HEAD" and our special
refs were classified as pseudorefs. Furthermore, our terminology
clarified that pseudorefs must not be symbolic refs. This restriction
is enforced in `is_root_ref()`, which explicitly checks that a supposed
root ref resolves to an object ID without recursing.
This has been extremely confusing right from the start because (in old
terminology) a ref name may sometimes be a pseudoref and sometimes not
depending on whether it is a symbolic or regular ref. This behaviour
does not seem reasonable at all and I very much doubt that it results in
anything sane.
Last but not least, the current behaviour can actually lead to a
segfault when calling `is_root_ref()` with a reference that either does
not exist or that is a symbolic ref because we never initialized `oid`,
but then read it via `is_null_oid()`.
We have now changed terminology to clarify that pseudorefs are really
only "MERGE_HEAD" and "FETCH_HEAD", whereas all the other refs that live
in the root of the ref hierarchy are just plain refs. Thus, we do not
need to check whether the ref is symbolic or not. In fact, we can now
avoid looking up the ref completely as the name is sufficient for us to
figure out whether something would be a root ref or not.
This change of course changes semantics for our callers. As there are
only three of them we can assess each of them individually:
- "ref-filter.c:ref_kind_from_refname()" uses it to classify refs.
It's clear that the intent is to classify based on the ref name,
only.
- "refs/reftable_backend.c:reftable_ref_iterator_advance()" uses it to
filter root refs. Again, using existence checks is pointless here as
the iterator has just surfaced the ref, so we know it does exist.
- "refs/files_backend.c:add_pseudoref_and_head_entries()" uses it to
determine whether it should add a ref to the root directory of its
iterator. This had the effect that we skipped over any files that
are either a symbolic ref, or which are not a ref at all.
The new behaviour is to include symbolic refs know, which aligns us
with the adapted terminology. Furthermore, files which look like
root refs but aren't are now mark those as "broken". As broken refs
are not surfaced by our tooling, this should not lead to a change in
user-visible behaviour, but may cause us to emit warnings. This
feels like the right thing to do as we would otherwise just silently
ignore corrupted root refs completely.
So in all cases the existence check was either superfluous, not in line
with the adapted terminology or masked potential issues. This commit
thus changes the behaviour as proposed and drops the existence check
altogether.
Add a test that verifies that this does not change user-visible
behaviour. Namely, we still don't want to show broken refs to the user
by default in git-for-each-ref(1). What this does allow though is for
internal callers to surface dangling root refs when they pass in the
`DO_FOR_EACH_INCLUDE_BROKEN` flag.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>