Jeff King [Mon, 9 Oct 2023 20:59:19 +0000 (16:59 -0400)]
midx: stop ignoring malformed oid fanout chunk
When we load the oid-fanout chunk, our callback checks that its size is
reasonable and returns an error if not. However, the caller only checks
our return value against CHUNK_NOT_FOUND, so we end up ignoring the
error completely! Using a too-small fanout table means we end up
accessing random memory for the fanout and segfault.
We can fix this by checking for any non-zero return value, rather than
just CHUNK_NOT_FOUND, and adjusting our error message to cover both
cases. We could handle each error code individually, but there's not
much point for such a rare case. The extra message produced in the
callback makes it clear what is going on.
The same pattern is used in the adjacent code. Those cases are actually
OK for now because they do not use a custom callback, so the only error
they can get is CHUNK_NOT_FOUND. But let's convert them, as this is an
accident waiting to happen (especially as we convert some of them away
from pair_chunk). The error messages are more verbose, but it should be
rare for a user to see these anyway.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Mon, 9 Oct 2023 20:58:38 +0000 (16:58 -0400)]
t: add library for munging chunk-format files
When testing corruption of files using the chunk format (like
commit-graphs and midx files), it's helpful to be able to modify bytes
in specific chunks. This requires being able both to read the
table-of-contents (to find the chunk to modify) but also to adjust it
(to account for size changes in the offsets of subsequent chunks).
We have some tests already which corrupt chunk files, but they have some
downsides:
1. They are very brittle, as they manually compute the expected size
of a particular instance of the file (e.g., see the definitions
starting with NUM_OBJECTS in t5319).
2. Because they rely on manual offsets and don't read the
table-of-contents, they're limited to overwriting bytes. But there
are many interesting corruptions that involve changing the sizes of
chunks (especially smaller-than-expected ones).
This patch adds a perl script which makes such corruptions easy. We'll
use it in subsequent patches.
Note that we could get by with just a big "perl -e" inside the helper
function. I chose to put it in a separate script for two reasons. One,
so we don't have to worry about the extra layer of shell quoting. And
two, the script is kind of big, and running the tests with "-x" would
repeatedly dump it into the log output.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Mon, 9 Oct 2023 20:58:23 +0000 (16:58 -0400)]
chunk-format: note that pair_chunk() is unsafe
The pair_chunk() function is provided as an easy helper for parsing
chunks that just want a pointer to a set of bytes. But every caller has
a hidden bug: because we return only the pointer without the matching
chunk size, the callers have no clue how many bytes they are allowed to
look at. And as a result, they may read off the end of the mmap'd data
when the on-disk file does not match their expectations.
Since chunk files are typically used for local-repository data like
commit-graph files and midx's, the security implications here are pretty
mild. The worst that can happen is that you hand somebody a corrupted
repository tarball, and running Git on it does an out-of-bounds read and
crashes. So it's worth being more defensive, but we don't need to drop
everything and fix every caller immediately.
I noticed the problem because the pair_chunk_fn() callback does not look
at its chunk_size argument, and wanted to annotate it to silence
-Wunused-parameter. We could do that now, but we'd lose the hint that
this code should be audited and fixed.
So instead, let's set ourselves up for going down that path:
1. Provide a pair_chunk() function that does return the size, which
prepares us for fixing these cases.
2. Rename the existing function to pair_chunk_unsafe(). That gives us
an easy way to grep for cases which still need to be fixed, and the
name should cause anybody adding new calls to think twice before
using it.
There are no callers of the "safe" version yet, but we'll add some in
subsequent patches.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Victoria Dye [Mon, 9 Oct 2023 21:58:56 +0000 (21:58 +0000)]
files-backend.c: avoid stat in 'loose_fill_ref_dir'
Modify the 'readdir' loop in 'loose_fill_ref_dir' to, rather than 'stat' a
file to determine whether it is a directory or not, use 'get_dtype'.
Currently, the loop uses 'stat' to determine whether each dirent is a
directory itself or not in order to construct the appropriate ref cache
entry. If 'stat' fails (returning a negative value), the dirent is silently
skipped; otherwise, 'S_ISDIR(st.st_mode)' is used to check whether the entry
is a directory.
On platforms that include an entry's d_type in in the 'dirent' struct, this
extra 'stat' check is redundant. We can use the 'get_dtype' method to
extract this information on platforms that support it (i.e. where
NO_D_TYPE_IN_DIRENT is unset), and derive it with 'stat' on platforms that
don't. Because 'stat' is an expensive call, this confers a
modest-but-noticeable performance improvement when iterating over large
numbers of refs (approximately 20% speedup in 'git for-each-ref' in a 30k
ref repo).
Unlike other existing usage of 'get_dtype', the 'follow_symlinks' arg is set
to 1 to replicate the existing handling of symlink dirents. This
unfortunately requires calling 'stat' on the associated entry regardless of
platform, but symlinks in the loose ref store are highly unlikely since
they'd need to be created manually by a user.
Note that this patch also changes the condition for skipping creation of a
ref entry from "when 'stat' fails" to "when the d_type is anything other
than DT_REG or DT_DIR". If a dirent's d_type is DT_UNKNOWN (either because
the platform doesn't support d_type in dirents or some other reason) or
DT_LNK, 'get_dtype' will try to derive the underlying type with 'stat'. If
the 'stat' fails, the d_type will remain 'DT_UNKNOWN' and dirent will be
skipped. However, it will also be skipped if it is any other valid d_type
(e.g. DT_FIFO for named pipes, DT_LNK for a nested symlink). Git does not
handle these properly anyway, so we can safely constrain accepted types to
directories and regular files.
Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Victoria Dye [Mon, 9 Oct 2023 21:58:55 +0000 (21:58 +0000)]
dir.[ch]: add 'follow_symlink' arg to 'get_dtype'
Add a 'follow_symlink' boolean option to 'get_type()'. If 'follow_symlink'
is enabled, DT_LNK (in addition to DT_UNKNOWN) d_types triggers the
stat-based d_type resolution, using 'stat' instead of 'lstat' to get the
type of the followed symlink. Note that symlinks are not followed
recursively, so a symlink pointing to another symlink will still resolve to
DT_LNK.
Update callers in 'diagnose.c' to specify 'follow_symlink = 0' to preserve
current behavior.
Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Victoria Dye [Mon, 9 Oct 2023 21:58:54 +0000 (21:58 +0000)]
dir.[ch]: expose 'get_dtype'
Move 'get_dtype()' from 'diagnose.c' to 'dir.c' and add its declaration to
'dir.h' so that it is accessible to callers in other files. The function and
its documentation are moved verbatim except for a small addition to the
description clarifying what the 'path' arg represents.
Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Modify the logic in 'cache_ref_iterator_advance' to reflect these
expectations:
1. If 'overlaps_prefix' returns 'PREFIX_EXCLUDES_DIR', then the prefix and
ref cache entry do not overlap at all. Skip this entry.
2. If 'overlaps_prefix' returns 'PREFIX_WITHIN_DIR', then the prefix matches
inside this entry if it is a directory. Skip if the entry is not a
directory, otherwise iterate over it.
3. Otherwise, 'overlaps_prefix' returned 'PREFIX_CONTAINS_DIR', indicating
that the cache entry (directory or not) is fully contained by or equal to
the prefix. Iterate over this entry.
Note that condition 2 relies on the names of directory entries having the
appropriate trailing slash. The existing function documentation of
'create_dir_entry' explicitly calls out the trailing slash requirement, so
this is a safe assumption to make.
This bug generally doesn't have any user-facing impact, since it requires:
1. using a non-empty prefix without a trailing slash in an iteration like
'for_each_fullref_in',
2. the callback to said iteration not reapplying the original filter (as
for-each-ref does) to ensure unmatched refs are skipped, and
3. the repository having one or more refs that match part of, but not all
of, the prefix.
However, there are some niche scenarios that meet those criteria
(specifically, 'rev-parse --bisect' and '(log|show|shortlog) --bisect'). Add
tests covering those cases to demonstrate the fix in this patch.
Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
John Cai [Mon, 9 Oct 2023 13:21:00 +0000 (13:21 +0000)]
merge-ort: initialize repo in index state
initialize_attr_index() does not initialize the repo member of
attr_index. Starting in 44451a2e5e (attr: teach "--attr-source=<tree>"
global option to "git", 2023-05-06), this became a problem because
istate->repo gets passed down the call chain starting in
git_check_attr(). This gets passed all the way down to
replace_refs_enabled(), which segfaults when accessing r->gitdir.
Fix this by initializing the repository in the index state.
Signed-off-by: John Cai <johncai86@gmail.com> Helped-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sergey Organov [Mon, 9 Oct 2023 16:05:35 +0000 (19:05 +0300)]
completion: complete '--dd'
'--dd' only makes sense for 'git log' and 'git show', so add it to
__git_log_show_options which is referenced in the completion for these
two commands.
Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sergey Organov [Mon, 9 Oct 2023 16:05:34 +0000 (19:05 +0300)]
diff-merges: introduce '--dd' option
This option provides a shortcut to request diff with respect to first
parent for any kind of commit, universally. It's implemented as pure
synonym for "--diff-merges=first-parent --patch".
Gives user quick and universal way to see what changes, exactly, were
brought to a branch by merges as well as by regular commits.
Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sergey Organov [Mon, 9 Oct 2023 16:05:33 +0000 (19:05 +0300)]
diff-merges: improve --diff-merges documentation
* Put descriptions of convenience shortcuts first, so they are the
first things reader observes rather than lengthy detailed stuff.
* Get rid of very long line containing all the --diff-merges formats
by replacing them with <format>, and putting each supported format
on its own line.
Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Štěpán Němec [Mon, 9 Oct 2023 17:56:04 +0000 (19:56 +0200)]
doc/cat-file: make synopsis and description less confusing
The DESCRIPTION's "first form" is actually the 1st, 2nd, 3rd and 5th
form in SYNOPSIS, the "second form" is the 4th one.
Interestingly, this state of affairs was introduced in 97fe7250753b (cat-file docs: fix SYNOPSIS and "-h" output, 2021-12-28)
with the claim of "Now the two will match again." ("the two" being
DESCRIPTION and SYNOPSIS)...
The description also suffers from other correctness and clarity issues,
e.g., the "first form" paragraph discusses -p, -s and -t, but leaves out
-e, which is included in the corresponding SYNOPSIS section; the second
paragraph mentions <format>, which doesn't occur in SYNOPSIS at all, and
of the three batch options, really only describes the behavior of
--batch-check. Also the mention of "drivers" seems an implementation
detail not adding much clarity in a short summary (and isn't expanded
upon in the rest of the man page, either).
Rather than trying to maintain one-to-one (or N-to-M) correspondence
between the DESCRIPTION and SYNOPSIS forms, creating duplication and
providing opportunities for error, shorten the former into a concise
summary describing the two general modes of operation: batch and
non-batch, leaving details to the subsequent manual sections.
While here, fix a grammar error in the description of -e and make the
following further minor improvements:
NAME:
shorten ("content or type and size" isn't the whole story; say
"details" and leave the actual details to later sections)
SYNOPSIS and --help:
move the (--textconv | --filters) form before --batch, closer
to the other non-batch forms
Signed-off-by: Štěpán Němec <stepnem@smrk.net> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The soft limit of the first line of the commit message should be
"no more than 50 characters" or "50 characters or less", but not
"less than 50 character".
This is an addition to commit c2c349a15c (doc: correct the 50 characters
soft limit, 2023-09-28).
Signed-off-by: 谢致邦 (XIE Zhibang) <Yeking@Red54.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Andy Koppe [Sun, 8 Oct 2023 20:23:07 +0000 (21:23 +0100)]
pretty: fix ref filtering for %(decorate) formats
Mark pretty formats containing "%(decorate" as requiring decoration in
userformat_find_requirements(), same as "%d" and "%D".
Without this, cmd_log_init_finish() didn't invoke load_ref_decorations()
with the decoration_filter it puts together, and hence filtering options
such as --decorate-refs were quietly ignored.
Amend one of the %(decorate) checks in t4205-log-pretty-formats.sh to
test this.
Signed-off-by: Andy Koppe <andy.koppe@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Sat, 7 Oct 2023 17:20:31 +0000 (13:20 -0400)]
repack: free existing_cruft array after use
We allocate an array of packed_git pointers so that we can sort the list
of cruft packs, but we never free the array, causing a small leak. Note
that we don't need to free the packed_git structs themselves; they're
owned by the repository object.
Signed-off-by: Jeff King <peff@peff.net> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Fri, 6 Oct 2023 22:57:03 +0000 (15:57 -0700)]
doc: update list archive reference to use lore.kernel.org
No disrespect to other mailing list archives, but the local part of
their URLs will become pretty much meaningless once the archives go
out of service, and we learned the lesson hard way when $gmane
stopped serving.
Let's point into https://lore.kernel.org/ for an article that can be
found there, because the local part of the URL has the Message-Id:
that can be used to find the same message in other archives, even if
lore goes down.
Jeff King [Thu, 5 Oct 2023 21:33:26 +0000 (17:33 -0400)]
daemon: free listen_addr before returning
We build up a string list of listen addresses from the command-line
arguments, but never free it. This causes t5811 to complain of a leak
(though curiously it seems to do so only when compiled with gcc, not
with clang).
To handle this correctly, we have to do a little refactoring:
- there are two exit points from the main function, depending on
whether we are entering the main loop or serving a single client
(since rather than a traditional fork model, we re-exec ourselves
with the extra "--serve" argument to accommodate Windows).
We don't need --listen at all in the --serve case, of course, but it
is passed along by the parent daemon, which simply copies all of the
command-line options it got.
- we just "return serve()" to run the main loop, giving us no chance
to do any cleanup
So let's use a "ret" variable to store the return code, and give
ourselves a single exit point at the end. That gives us one place to do
cleanup.
Note that this code also uses the "use a no-dup string-list, but
allocate strings we add to it" trick, meaning string_list_clear() will
not realize it should free them. We can fix this by switching to a "dup"
string-list, but using the "append_nodup" function to add to it (this is
preferable to tweaking the strdup_strings flag before clearing, as it
puts all the subtle memory-ownership code together).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 5 Oct 2023 21:30:14 +0000 (17:30 -0400)]
revision: clear decoration structs during release_revisions()
The point of release_revisions() is to free memory associated with the
rev_info struct, but we have several "struct decoration" members that
are left untouched. Since the previous commit introduced a function to
do that, we can just call it.
We do have to provide some specialized callbacks to map the void
pointers onto real ones (the alternative would be casting the existing
function pointers; this generally works because "void *" is usually
interchangeable with a struct pointer, but it is technically forbidden
by the standard).
Since the line-log code does not expose the type it stores in the
decoration (nor of course the function to free it), I put this behind a
generic line_log_free() entry point. It's possible we may need to add
more line-log specific bits anyway (running t4211 shows a number of
other leaks in the line-log code).
While this doubtless cleans up many leaks triggered by the test suite,
the only script which becomes leak-free is t4217, as it does very little
beyond a simple traversal (its existing leak was from the use of
--children, which is now fixed).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 5 Oct 2023 21:29:02 +0000 (17:29 -0400)]
decorate: add clear_decoration() function
There's not currently any way to free the resources associated with a
decoration struct. As a result, we have several memory leaks which
cannot easily be plugged.
Let's add a "clear" function and make use of it in the example code of
t9004. This removes the only leak from that script, so we can mark it as
passing the leak sanitizer.
Curiously this leak is found only when running SANITIZE=leak with clang,
but not with gcc. But it is a bog-standard leak: we allocate some
memory in a local variable struct, and then exit main() without
releasing it. I'm not sure why gcc doesn't find it. After this
patch, both compilers report it as leak-free.
Note that the clear function takes a callback to free the individual
entries. That's not needed for our example (which is just decorating
with ints), but will be for real callers.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 3 Oct 2023 21:54:19 +0000 (17:54 -0400)]
builtin/repack.c: avoid making cruft packs preferred
When doing a `--geometric` repack, we make sure that the preferred pack
(if writing a MIDX) is the largest pack that we *didn't* repack. That
has the effect of keeping the preferred pack in sync with the pack
containing a majority of the repository's reachable objects.
But if the repository happens to double in size, we'll repack
everything. Here we don't specify any `--preferred-pack`, and instead
let the MIDX code choose.
In the past, that worked fine, since there would only be one pack to
choose from: the one we just wrote. But it's no longer necessarily the
case that there is one pack to choose from. It's possible that the
repository also has a cruft pack, too.
If the cruft pack happens to come earlier in lexical order (and has an
earlier mtime than any non-cruft pack), we'll pick that pack as
preferred. This makes it impossible to reuse chunks of the reachable
pack verbatim from pack-objects, so is sub-optimal.
Luckily, this is a somewhat rare circumstance to be in, since we would
have to repack the entire repository during a `--geometric` repack, and
the cruft pack would have to sort ahead of the pack we just created.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 3 Oct 2023 00:44:32 +0000 (20:44 -0400)]
builtin/repack.c: implement support for `--max-cruft-size`
Cruft packs are an alternative mechanism for storing a collection of
unreachable objects whose mtimes are recent enough to avoid being
pruned out of the repository.
When cruft packs were first introduced back in b757353676
(builtin/pack-objects.c: --cruft without expiration, 2022-05-20) and a7d493833f (builtin/pack-objects.c: --cruft with expiration,
2022-05-20), the recommended workflow consisted of:
- Repacking periodically, either by packing anything loose in the
repository (via `git repack -d`) or producing a geometric sequence
of packs (via `git repack --geometric=<d> -d`).
- Every so often, splitting the repository into two packs, one cruft
to store the unreachable objects, and another non-cruft pack to
store the reachable objects.
Repositories may (out of band with the above) choose periodically to
prune out some unreachable objects which have aged out of the grace
period by generating a pack with `--cruft-expiration=<approxidate>`.
This allowed repositories to maintain relatively few packs on average,
and quarantine unreachable objects together in a cruft pack, avoiding
the pitfalls of holding unreachable objects as loose while they age out
(for more, see some of the details in 3d89a8c118
(Documentation/technical: add cruft-packs.txt, 2022-05-20)).
This all works, but can be costly from an I/O-perspective when
frequently repacking a repository that has many unreachable objects.
This problem is exacerbated when those unreachable objects are rarely
(if every) pruned.
Since there is at most one cruft pack in the above scheme, each time we
update the cruft pack it must be rewritten from scratch. Because much of
the pack is reused, this is a relatively inexpensive operation from a
CPU-perspective, but is very costly in terms of I/O since we end up
rewriting basically the same pack (plus any new unreachable objects that
have entered the repository since the last time a cruft pack was
generated).
At the time, we decided against implementing more robust support for
multiple cruft packs. This patch implements that support which we were
lacking.
Introduce a new option `--max-cruft-size` which allows repositories to
accumulate cruft packs up to a given size, after which point a new
generation of cruft packs can accumulate until it reaches the maximum
size, and so on. To generate a new cruft pack, the process works like
so:
- Sort a list of any existing cruft packs in ascending order of pack
size.
- Starting from the beginning of the list, group cruft packs together
while the accumulated size is smaller than the maximum specified
pack size.
- Combine the objects in these cruft packs together into a new cruft
pack, along with any other unreachable objects which have since
entered the repository.
Once a cruft pack grows beyond the size specified via `--max-cruft-size`
the pack is effectively frozen. This limits the I/O churn up to a
quadratic function of the value specified by the `--max-cruft-size`
option, instead of behaving quadratically in the number of total
unreachable objects.
When pruning unreachable objects, we bypass the new code paths which
combine small cruft packs together, and instead start from scratch,
passing in the appropriate `--max-pack-size` down to `pack-objects`,
putting it in charge of keeping the resulting set of cruft packs sized
correctly.
This may seem like further I/O churn, but in practice it isn't so bad.
We could prune old cruft packs for whom all or most objects are removed,
and then generate a new cruft pack with just the remaining set of
objects. But this additional complexity buys us relatively little,
because most objects end up being pruned anyway, so the I/O churn is
well contained.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 3 Oct 2023 00:44:29 +0000 (20:44 -0400)]
builtin/repack.c: parse `--max-pack-size` with OPT_MAGNITUDE
The repack builtin takes a `--max-pack-size` command-line argument which
it uses to feed into any of the pack-objects children that it may spawn
when generating a new pack.
This option is parsed with OPT_STRING, meaning that we'll accept
anything as input, punting on more fine-grained validation until we get
down into pack-objects.
This is fine, but it's wasteful to spend an entire sub-process just to
figure out that one of its option is bogus. Instead, parse the value of
`--max-pack-size` with OPT_MAGNITUDE in 'git repack', and then pass the
known-good result down to pack-objects.
Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Štěpán Němec [Thu, 5 Oct 2023 09:00:55 +0000 (11:00 +0200)]
t/README: fix multi-prerequisite example
With the broken quoting the test wouldn't even parse correctly, but
there's also the '==' instead of POSIX '=' (of the shells I tested,
busybox ash, bash and ksh (93 and OpenBSD) accept '==', dash and zsh do
not), and 'print 2' from Python 2 days.
(I assume the test failing due to 3 != 4 is intentional or immaterial.)
Fixes: 93a572461386 ("test-lib: Add support for multiple test prerequisites") Signed-off-by: Štěpán Němec <stepnem@smrk.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Štěpán Němec [Thu, 5 Oct 2023 09:00:52 +0000 (11:00 +0200)]
doc/diff-options: improve wording of the log.diffMerges mention
Fix the grammar ("which default value is") and reword to match other
similar descriptions (say "configuration variable" instead of
"parameter", link to git-config(1)).
Signed-off-by: Štěpán Němec <stepnem@smrk.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
coverity: detect and report when the token or project is incorrect
When trying to obtain the MD5 of the Coverity Scan Tool (in order to
decide whether a cached version can be used or a new version has to be
downloaded), it is possible to get a 401 (Authorization required) due to
either an incorrect token, or even more likely due to an incorrect
Coverity project name.
Seeing an authorization failure that is caused by an incorrect project
name was somewhat surprising to me when developing the Coverity
workflow, as I found such a failure suggestive of an incorrect token
instead.
So let's provide a helpful error message about that specifically when
encountering authentication issues.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 4 Oct 2023 20:28:52 +0000 (13:28 -0700)]
Merge branch 'ps/revision-cmdline-stdin-not'
"git rev-list --stdin" learned to take non-revisions (like "--not")
recently from the standard input, but the way such a "--not" was
handled was quite confusing, which has been rethought. This is
potentially a change that breaks backward compatibility.
* ps/revision-cmdline-stdin-not:
revision: make pseudo-opt flags read via stdin behave consistently
Javier Mora [Wed, 4 Oct 2023 02:22:45 +0000 (02:22 +0000)]
git-status.txt: fix minor asciidoc format issue
The list of additional XY values for submodules in short format
isn't formatted consistently with the rest of the document.
Format as list for consistency.
Signed-off-by: Javier Mora <cousteaulecommandant@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t7420: test that we correctly handle renamed submodules
Create a second submodule with a name that differs from its path. Test
that calling set-url modifies the correct .gitmodules entries. Make sure
we don't create a section named after the path instead of the name.
Signed-off-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t7419: test that we correctly handle renamed submodules
Add the submodule again with an explicitly different name and path. Test
that calling set-branch modifies the correct .gitmodules entries. Make
sure we don't create a section named after the path instead of the name.
Signed-off-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The submodule repo the test set up had the 'topic' branch checked out,
meaning the repo's default branch (HEAD) is the 'topic' branch.
The following tests then pretended to switch between the default branch
and the 'topic' branch. This was papered over by continually adding
commits to the 'topic' branch and checking if the submodule gets updated
to this new commit.
Return the submodule repo to the 'main' branch after setup so we can
actually test the switching behavior.
Signed-off-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule--helper: use submodule_from_path in set-{url,branch}
The commands need a path to a submodule but treated it as the name when
modifying the .gitmodules file, leading to confusion when a submodule's
name does not match its path.
Because calling submodule_from_path initializes the submodule cache, we
need to manually trigger a reread before syncing, as the cache is
missing the config change we just made.
Signed-off-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 3 Oct 2023 20:31:30 +0000 (16:31 -0400)]
commit-graph: clear oidset after finishing write
In graph_write() we store commits in an oidset, but never clean it up,
leaking the contents. We should clear it in the cleanup section.
The oidset comes from 6830c36077 (commit-graph.h: replace 'commit_hex'
with 'commits', 2020-04-13), but it was just replacing a string_list
that was also leaked. Curiously, we fixed the leak of some adjacent
variables in commit fa8953cb40 (builtin/commit-graph.c: extract
'read_one_commit()', 2020-05-18), but the oidset wasn't included for
some reason.
In combination with the preceding commits, this lets us mark t5324 as
leak-free.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 3 Oct 2023 20:31:11 +0000 (16:31 -0400)]
commit-graph: free write-context base_graph_name during cleanup
Commit 6c622f9f0b (commit-graph: write commit-graph chains, 2019-06-18)
added a base_graph_name string to the write_commit_graph_context struct.
But the end-of-function cleanup forgot to free it, causing a leak.
This (presumably in combination with the preceding leak-fixes) lets us
mark t5328 as leak-free.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 3 Oct 2023 20:30:55 +0000 (16:30 -0400)]
commit-graph: free write-context entries before overwriting
When writing a split graph file, we replace the final element of the
commit_graph_hash_after and commit_graph_filenames_after arrays. But
since these are allocated strings, we need to free them before
overwriting to avoid leaking the old string.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 3 Oct 2023 20:30:44 +0000 (16:30 -0400)]
commit-graph: free graph struct that was not added to chain
When reading the graph chain file, we open (and allocate) each
individual slice it mentions and then add them to a linked-list chain.
But if adding to the chain fails (e.g., because the base-graph chunk it
contains didn't match what we expected), we leave the function without
freeing the graph struct that caused the failure, leaking it.
We can fix it by calling free_graph_commit().
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 3 Oct 2023 20:30:04 +0000 (16:30 -0400)]
commit-graph: delay base_graph assignment in add_graph_to_chain()
When adding a graph to a chain, we do some consistency checks and then
if everything looks good, set g->base_graph to add a link to the chain.
But when we added a new consistency check in 209250ef38 (commit-graph.c:
prevent overflow in add_graph_to_chain(), 2023-07-12), it comes _after_
we've already set g->base_graph. So we might return failure, even though
we actually added to the chain.
This hasn't caused a bug yet, because after failing to add to the chain,
we discard the failed graph struct completely, leaking it. But in order
to fix that, it's important that the struct be in a consistent and
predictable state after the failure.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 3 Oct 2023 20:29:30 +0000 (16:29 -0400)]
commit-graph: free all elements of graph chain
When running "commit-graph verify", we call free_commit_graph(). That's
sufficient for the case of a single graph file, but if we loaded a chain
of split graph files, they form a linked list via the base_graph
pointers. We need to free all of them, or we leak all but the first
struct.
We can make this work by teaching free_commit_graph() to walk the
base_graph pointers and free each element. This in turn lets us simplify
close_commit_graph(), which does the same thing by recursion (we cannot
just use close_commit_graph() in "commit-graph verify", as the function
takes a pointer to an object store, and the verify command creates a
single one-off graph struct).
While indenting the code in free_commit_graph() for the loop, I noticed
that setting g->data to NULL is rather pointless, as we free the struct
a few lines later. So I cleaned that up while we're here.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 3 Oct 2023 20:27:52 +0000 (16:27 -0400)]
commit-graph: move slab-clearing to close_commit_graph()
When closing and freeing a commit-graph, the main entry point is
close_commit_graph(), which then uses close_commit_graph_one() to
recurse through the base_graph links and free each one.
Commit 957ba814bf (commit-graph: when closing the graph, also release
the slab, 2021-09-08) put the call to clear the slab into the recursive
function, but this is pointless: there's only a single global slab
variable. It works OK in practice because clearing the slab is
idempotent, but it makes the code harder to reason about and refactor.
Move it into the parent function so it's only called once (and there are
no other direct callers of the recursive close_commit_graph_one(), so we
are not hurting them).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 3 Oct 2023 20:27:24 +0000 (16:27 -0400)]
merge: free result of repo_get_merge_bases()
We call repo_get_merge_bases(), which allocates a commit_list, but never
free the result, causing a leak.
The obvious solution is to free it, but we need to look at the contents
of the first item to decide whether to leave the loop. One option is to
free it in both code paths. But since the commit that the list points to
is longer-lived than the list itself, we can just dereference it
immediately, free the list, and then continue with the existing logic.
This is about the same amount of code, but keeps the list management all
in one place.
This lets us mark a number of merge-related test scripts as leak-free.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 3 Oct 2023 20:26:30 +0000 (16:26 -0400)]
commit-reach: free temporary list in get_octopus_merge_bases()
We loop over the set of commits to merge, and for each one compute the
merge base against the existing set of merge base candidates we've
found. Then we replace the candidate set with a simple assignment of the
list head, leaking the old list. We should free it first before
assignment.
This makes t5521 leak-free, so mark it as such.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 3 Oct 2023 20:26:09 +0000 (16:26 -0400)]
t6700: mark test as leak-free
This test has never leaked since it was added. Let's annotate it to make
sure it stays that way (and to reduce noise when looking for other
leak-free scripts after we fix some leaks).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Tue, 3 Oct 2023 08:55:21 +0000 (10:55 +0200)]
parse-options: drop unused parse_opt_ctx_t member
5c387428f1 (parse-options: don't emit "ambiguous option" for aliases,
2019-04-29) added "updated_options" to struct parse_opt_ctx_t, but it
has never been used. Remove it.
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t1016-compatObjectFormat: add tests to verify the conversion between objects
For now my strategy is simple. Create two identical repositories one
in each format. Use fixed timestamps. Verify the dynamically computed
compatibility objects from one repository match the objects stored in
the other repository.
A general limitation of this strategy is that the git when generating
signed tags and commits with compatObjectFormat enabled will generate
a signature for both formats. To overcome this limitation I have
added "test-tool delete-gpgsig" that when fed an signed commit or tag
with two signatures deletes one of the signatures.
With that in place I can have "git commit" and "git tag" generate
signed objects, have my tool delete one, and feed the new object
into "git hash-object" to create the kinds of commits and tags
git without compatObjectFormat enabled will generate.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before I extend this test, changing the naming of the relevant
hash from sha1 to oid. Calling the hash sha1 is incorrect today
as it can be either sha1 or sha256 depending on the value of
GIT_DEFAULT_HASH_FUNCTION when the test is called.
I plan to test sha1 and sha256 simultaneously in the same repository.
Having a name like sha1 will be even more confusing.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
test-lib: compute the compatibility hash so tests may use it
Inspired-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/ls-tree: let the oid determine the output algorithm
Update cmd_ls_tree to call get_oid_with_context and pass
GET_OID_HASH_ANY instead of calling the simpler repo_get_oid.
This implments in ls-tree the behavior that asking to display a sha1
hash displays the corrresponding sha1 encoded object and asking to
display a sha256 hash displayes the corresponding sha256 encoded
object.
This is useful for testing the conversion of an object to an
equivlanet object encoded with a different hash function.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
object-file: handle compat objects in check_object_signature
Update check_object_signature to find the hash algorithm the exising
signature uses, and to use the same hash algorithm when recomputing it
to check the signature is valid.
This will be useful when teaching git ls-tree to display objects
encoded with the compat hash algorithm.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
tree-walk: init_tree_desc take an oid to get the hash algorithm
To make it possible for git ls-tree to display the tree encoded
in the hash algorithm of the oid specified to git ls-tree, update
init_tree_desc to take as a parameter the oid of the tree object.
Update all callers of init_tree_desc and init_tree_desc_gently
to pass the oid of the tree object.
Use the oid of the tree object to discover the hash algorithm
of the oid and store that hash algorithm in struct tree_desc.
Use the hash algorithm in decode_tree_entry and
update_tree_entry_internal to handle reading a tree object encoded in
a hash algorithm that differs from the repositories hash algorithm.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/cat-file: let the oid determine the output algorithm
Use GET_OID_HASH_ANY when calling get_oid_with_context. This
implements the semi-obvious behaviour that specifying a sha1 oid shows
the output for a sha1 encoded object, and specifying a sha256 oid
shows the output for a sha256 encoded object.
This is useful for testing the the conversion of an object to an
equivalent object encoded with a different hash function.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
object-file: update object_info_extended to reencode objects
oid_object_info_extended is updated to detect an oid encoding that
does not match the current repository, use repo_oid_to_algop to find
the correspoding oid in the current repository and to return the data
for the oid.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
object-file-convert: convert commits that embed signed tags
As mentioned in the hash function transition plan commit mergetag
lines need to be handled. The commit mergetag lines embed an entire
tag object in a commit object.
Keep the implementation sane if not fast by unembedding the tag
object, converting the tag object, and embedding the new tag object,
in the new commit object.
In the long run I don't expect any other approach is maintainable, as
tag objects may be extended in ways that require additional
translation.
To keep the implementation of convert_commit_object maintainable I
have modified convert_commit_object to process the lines in any order,
and to fail on unknown lines. We can't know ahead of time if a new
line might embed something that needs translation or not so it is
better to fail and require the code to be updated instead of silently
mistranslating objects.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
object-file-convert: convert commit objects when writing
When writing a commit object in a repository with both SHA-1 and
SHA-256, we'll need to convert our commit objects so that we can write
the hash values for both into the repository. To do so, let's add a
function to convert commit objects.
Read the commit object and map the tree value and any of the parent
values, and copy the rest of the commit through unmodified. Note that
we don't need to modify the signature headers, because they are the
same under both algorithms.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
object-file-convert: convert tag objects when writing
When writing a tag object in a repository with both SHA-1 and SHA-256,
we'll need to convert our commit objects so that we can write the hash
values for both into the repository. To do so, let's add a function to
convert tag objects.
Note that signatures for tag objects in the current algorithm trail the
message, and those for the alternate algorithm are in headers.
Therefore, we parse the tag object for both a trailing signature and a
header and then, when writing the other format, swap the two around.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
object-file-convert: add a function to convert trees between algorithms
In the future, we're going to want to provide SHA-256 repositories that
have compatibility support for SHA-1 as well. In order to do so, we'll
need to be able to convert tree objects from SHA-256 to SHA-1 by writing
a tree with each SHA-256 object ID mapped to a SHA-1 object ID.
We implement a function, convert_tree_object, that takes an existing
tree buffer and writes it to a new strbuf, converting between
algorithms. Let's make this function generic, because while we only
need it to convert from the main algorithm to the compatibility
algorithm now, we may need to do the other way around in the future,
such as for transport.
We avoid reusing the code in decode_tree_entry because that code
normalizes data, and we don't want that here. We want to produce a
complete round trip of data, so if, for example, the old entry had a
wrongly zero-padded mode, we'd want to preserve that when converting to
ensure a stable hash value.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
object: factor out parse_mode out of fast-import and tree-walk into in object.h
builtin/fast-import.c and tree-walk.c have almost identical version of
get_mode. The two functions started out the same but have diverged
slightly. The version in fast-import changed mode to a uint16_t to
save memory. The version in tree-walk started erroring if no mode was
present.
As far as I can tell both of these changes are valid for both of the
callers, so add the both changes and place the common parsing helper
in object.h
Rename the helper from get_mode to parse_mode so it does not
conflict with another helper named get_mode in diff-no-index.c
This will be used shortly in a new helper decode_tree_entry_raw
which is used to compute cmpatibility objects as part of
the sha256 transition.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
cache: add a function to read an OID of a specific algorithm
Currently, we always read a object ID of the current algorithm with
oidread. However, once we start converting objects, we'll need to
consider what happens when we want to read an object ID of a specific
algorithm, such as the compatibility algorithm. To make this easier,
let's define oidread_algop, which specifies which algorithm we should
use for our object ID, and define oidread in terms of it.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we write a tag the object oid is specific to the hash algorithm.
This matters when a tag is signed. The hash transition plan calls for
signatures on both the sha1 form and the sha256 form of the object,
and for both of those signatures to live in the tag object.
To generate tag object with multiple signatures, first compute the
unsigned form of the tag, and then if the tag is being signed compute
the unsigned form of the tag with the compatibilityr hash. Then
compute compute the signatures of both buffers.
Once the signatures are computed add them to both buffers. This
allows computing the compatibility hash in do_sign, saving
write_object_file the expense of recomputing the compatibility tag
just to compute it's hash.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit: export add_header_signature to support handling signatures on tags
Rename add_commit_signature as add_header_signature, and expose it so
that it can be used for converting tags from one object format to
another.
Inspired-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit: convert mergetag before computing the signature of a commit
It so happens that commit mergetag lines embed a tag object. So to
compute the compatible signature of a commit object that has mergetag
lines the compatible embedded tag must be computed first.
Implement this by duplicating and converting the commit extra headers
into the compatible version of the commit extra headers, that need
to be passed to commit_tree_extended.
To handle merge tags only the compatible extra headers need to be
computed.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree. In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.
However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data. While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.
Consequently, we don't want to use the standard mapping that occurs when
we write an object. Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.
We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents. If we're signing
the commit, we then sign both versions and append both signatures to
both buffers. To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.
In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>