Junio C Hamano [Sat, 23 Sep 2023 00:01:36 +0000 (17:01 -0700)]
Merge branch 'la/trailer-cleanups'
Code clean-up.
Keep only the first three clean-ups, and discard the rest to be replaced later.
cf. <owly1qetjqo1.fsf@fine.c.googlers.com>
cf. <owlyzg1dsswr.fsf@fine.c.googlers.com>
* la/trailer-cleanups:
trailer: split process_command_line_args into separate functions
trailer: split process_input_file into separate pieces
trailer: separate public from internal portion of trailer_iterator
Junio C Hamano [Wed, 20 Sep 2023 17:45:12 +0000 (10:45 -0700)]
Merge branch 'jc/update-index-show-index-version'
"git update-index" learns "--show-index-version" to inspect
the index format version used by the on-disk index file.
* jc/update-index-show-index-version:
test-tool: retire "index-version"
update-index: add --show-index-version
update-index doc: v4 is OK with JGit and libgit2
Junio C Hamano [Wed, 20 Sep 2023 17:44:57 +0000 (10:44 -0700)]
Merge branch 'js/diff-cached-fsmonitor-fix'
"git diff --cached" codepath did not fill the necessary stat
information for a file when fsmonitor knows it is clean and ended
up behaving as if it is not clean, which has been corrected.
* js/diff-cached-fsmonitor-fix:
diff-lib: fix check_removed when fsmonitor is on
Junio C Hamano [Mon, 18 Sep 2023 20:53:13 +0000 (13:53 -0700)]
Merge branch 'js/complete-checkout-t'
The completion script (in contrib/) has been taught to treat the
"-t" option to "git checkout" and "git switch" just like the
"--track" option, to complete remote-tracking branches.
* js/complete-checkout-t:
completion(switch/checkout): treat --track and -t the same
Junio C Hamano [Thu, 14 Sep 2023 18:17:00 +0000 (11:17 -0700)]
Merge branch 'pw/rebase-i-after-failure'
Various fixes to the behaviour of "rebase -i" when the command got
interrupted by conflicting changes.
* pw/rebase-i-after-failure:
rebase -i: fix adding failed command to the todo list
rebase --continue: refuse to commit after failed command
rebase: fix rewritten list for failed pick
sequencer: factor out part of pick_commits()
sequencer: use rebase_path_message()
rebase -i: remove patch file after conflict resolution
rebase -i: move unlink() calls
Junio C Hamano [Thu, 14 Sep 2023 18:16:59 +0000 (11:16 -0700)]
Merge branch 'ak/pretty-decorate-more'
"git log --format" has been taught the %(decorate) placeholder.
* ak/pretty-decorate-more:
decorate: use commit color for HEAD arrow
pretty: add pointer and tag options to %(decorate)
pretty: add %(decorate[:<options>]) format
decorate: color each token separately
decorate: avoid some unnecessary color overhead
decorate: refactor format_decorations()
pretty-formats: enclose options in angle brackets
pretty-formats: define "literal formatting code"
Junio C Hamano [Thu, 14 Sep 2023 18:16:59 +0000 (11:16 -0700)]
Merge branch 'ks/ref-filter-sort-numerically'
"git for-each-ref --sort='contents:size'" sorts the refs according
to size numerically, listing a ref that points at a 12-byte blob
before one that points at a 100-byte blob.
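To make the difference concrete, here is a small C sketch (not the ref-filter code; the comparator names are hypothetical) that compares size values first as strings and then as numbers, using `qsort()` from the C standard library:

```c
#include <stdlib.h>
#include <string.h>

/* String order: "100" sorts before "12" because '0' < '2'. */
static int cmp_as_string(const void *a, const void *b)
{
	const char *x = *(const char *const *)a;
	const char *y = *(const char *const *)b;
	return strcmp(x, y);
}

/* Numeric order: parse the sizes first, so 12 sorts before 100. */
static int cmp_as_number(const void *a, const void *b)
{
	unsigned long x = strtoul(*(const char *const *)a, NULL, 10);
	unsigned long y = strtoul(*(const char *const *)b, NULL, 10);
	return (x > y) - (x < y);
}
```

Sorting `{"12", "100"}` with `cmp_as_string` puts "100" first, while `cmp_as_number` puts "12" first. The second ordering is the one the change aims for.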
* ks/ref-filter-sort-numerically:
ref-filter: sort numerically when ":size" is used
Taylor Blau [Wed, 13 Sep 2023 19:18:03 +0000 (15:18 -0400)]
builtin/repack.c: extract common cruft pack loop
When generating the list of packs to store in a MIDX (when given the
`--write-midx` option), we include any cruft packs both during
--geometric and non-geometric repacks.
But the rules for when we do and don't have to check whether any of
those cruft packs were queued for deletion differ slightly between the
two cases.
However, the two can be unified, provided a little extra detail is
added to the comment to clarify when it is safe to avoid checking for
pending deletions (and why it is OK to check even when it is not
required).
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `->util` field of each string_list_item that tracks the existence
of some pack at the beginning of a repack operation was originally
intended to be used as a bitfield.
This bitfield tracked:
- (1 << 0): whether or not the pack should be deleted
- (1 << 1): whether or not the pack is cruft
The previous commit removed the use of the second bit, but a future
patch (from a different series than this one) will introduce a new use
of it.
So we could stop treating the util pointer as a bitfield and instead
start treating it as if it were a boolean. But this would require some
backtracking when that later patch is applied.
Instead, let's avoid touching the ->util field directly, and instead
introduce convenience functions like:
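A minimal sketch of what such helpers could look like, assuming a simplified stand-in for `string_list_item` and made-up names (`struct pack_item`, `PACK_BIT_DELETE`, `mark_for_deletion()`, `is_marked_for_deletion()`). The helpers actually used in builtin/repack.c may differ:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for git's string_list_item; only ->util matters here. */
struct pack_item {
	const char *string;
	void *util;
};

/* Bit assignments follow the description above; the macro names are made up. */
#define PACK_BIT_DELETE ((uintptr_t)1 << 0)
#define PACK_BIT_CRUFT  ((uintptr_t)1 << 1)

/* Set the deletion bit without disturbing any other bits stored in ->util. */
static void mark_for_deletion(struct pack_item *item)
{
	item->util = (void *)((uintptr_t)item->util | PACK_BIT_DELETE);
}

/* Callers ask this helper instead of testing ->util themselves. */
static bool is_marked_for_deletion(const struct pack_item *item)
{
	return ((uintptr_t)item->util & PACK_BIT_DELETE) != 0;
}
```

Because the bit layout is confined to the helpers, a later series can reuse or reassign bits without touching every caller.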
Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Jeff King <peff@peff.net> Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Wed, 13 Sep 2023 19:17:57 +0000 (15:17 -0400)]
builtin/repack.c: store existing cruft packs separately
When repacking with the `--write-midx` option, we invoke the function
`midx_included_packs()` in order to produce the list of packs we want to
include in the resulting MIDX.
This list consists of:
- existing .keep packs
- any pack(s) which were written earlier in the same process
- any unchanged packs when doing a `--geometric` repack
- any cruft packs
Prior to this patch, we stored pre-existing cruft and non-cruft packs
together (provided those packs are non-kept). This meant we needed an
additional bit to indicate which non-kept pack(s) were cruft versus
those that aren't.
But alternatively we can store cruft packs in a separate list, avoiding
the need for this extra bit, and simplifying the code below.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When all of the following are true:
- there is at least one pre-existing packfile (which is not marked as kept),
- we are repacking with the `-d` flag, and
- we are not doing a cruft repack,
then we pass a handful of additional options to the inner
`pack-objects` process, like `--unpack-unreachable`,
`--keep-unreachable`, and `--pack-loose-unreachable`, in addition to
marking any packs we just wrote for promisor remotes as kept in-core
(with `--keep-pack`, as opposed to the presence of a ".keep" file on
disk).
Because we store both cruft and non-cruft packs together in the same
`existing.non_kept_packs` list, it suffices to check its `nr` member to
see if it is zero or not.
But a following change will store cruft- and non-cruft packs separately,
meaning this check would break as a result. Prepare for this by
extracting this part of the check into a new helper function called
`has_existing_non_kept_packs()`.
This patch does not introduce any functional changes, but prepares us to
make a more isolated change in a subsequent patch.
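As an illustration only, a helper of this kind might look like the following C sketch. The struct layout and the `nr`-only list type are assumptions, not the definitions in builtin/repack.c:

```c
#include <stddef.h>

/* Hypothetical stand-in: only the element count of git's string_list is modeled. */
struct name_list {
	size_t nr;
};

/* Hypothetical shape of the existing-packs bookkeeping before the split. */
struct existing_packs {
	struct name_list kept_packs;
	struct name_list non_kept_packs;
};

/*
 * Callers ask this question instead of reading ->nr directly, so a later
 * change that moves cruft packs into their own list only has to update
 * this one function.
 */
static int has_existing_non_kept_packs(const struct existing_packs *existing)
{
	return existing->non_kept_packs.nr > 0;
}
```

After such a split, the body would presumably need to consult both lists (for example `non_kept_packs.nr || cruft_packs.nr`, where `cruft_packs` is a hypothetical field), while its callers stay unchanged.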
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Wed, 13 Sep 2023 19:17:51 +0000 (15:17 -0400)]
builtin/repack.c: extract redundant pack cleanup for existing packs
To remove redundant packs at the end of a repacking operation, Git uses
its `remove_redundant_pack()` function in a loop over the set of
pre-existing, non-kept packs.
In a later commit, we will split this list into two, one for
pre-existing cruft pack(s), and another for non-cruft pack(s). Prepare
for this by factoring out the routine to loop over and delete redundant
packs into its own function.
Instead of calling `remove_redundant_pack()` directly, we will now call
`remove_redundant_existing_packs()`, which itself dispatches a call to
`remove_redundant_packs_1()`. Note that the geometric repacking code
will still call `remove_redundant_pack()` directly, but see the previous
commit for more details.
Having `remove_redundant_packs_1()` exist as a separate function may
seem like overkill in this patch. However, a later patch will call
`remove_redundant_packs_1()` once for each of two separate lists, so this
refactoring sets us up for that.
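A hedged C sketch of the dispatch described above. Toy types stand in for git's string lists, and a counter replaces actually deleting files. All types and the `delete_me` flag are hypothetical:

```c
#include <stddef.h>

/* Hypothetical stand-in: a pack name plus a "marked for deletion" flag. */
struct pack_entry {
	const char *name;
	int delete_me;
};

struct pack_list {
	struct pack_entry *items;
	size_t nr;
};

/* Counts removals instead of unlinking files, so the sketch has no side effects. */
static size_t removed;

static void remove_redundant_pack(const char *name)
{
	(void)name; /* a real implementation would delete the pack's files */
	removed++;
}

/* Worker: walk one list and remove every pack marked for deletion. */
static void remove_redundant_packs_1(const struct pack_list *list)
{
	for (size_t i = 0; i < list->nr; i++)
		if (list->items[i].delete_me)
			remove_redundant_pack(list->items[i].name);
}

/* Entry point: today one list; a later change could call the worker twice. */
static void remove_redundant_existing_packs(const struct pack_list *non_kept)
{
	remove_redundant_packs_1(non_kept);
}
```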
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Wed, 13 Sep 2023 19:17:49 +0000 (15:17 -0400)]
builtin/repack.c: extract redundant pack cleanup for --geometric
To reduce the complexity of the already quite-long `cmd_repack()`
implementation, extract the parts responsible for deleting redundant
packs from a geometric repack into their own sub-routine.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Wed, 13 Sep 2023 19:17:46 +0000 (15:17 -0400)]
builtin/repack.c: extract marking packs for deletion
At the end of a repack (when given `-d`), Git attempts to remove any
packs which have been made "redundant" as a result of the repacking
operation. For example, an all-into-one (`-A` or `-a`) repack makes
every pre-existing pack which is not marked as kept redundant. Geometric
repacks (with `--geometric=<n>`) make any packs which were rolled up
redundant, and so on.
But before deleting the set of packs we think are redundant, we first
check to see whether or not we just wrote a pack which is identical to
any one of the packs we were going to delete. When this is the case, Git
must avoid deleting that pack, since it matches a pack we just wrote
(so deleting it may cause the repository to become corrupt).
Right now we only process the list of non-kept packs in a single pass.
But a future change will split the existing non-kept packs further into
two lists: one for cruft packs, and another for non-cruft packs.
Factor out this routine to prepare for calling it twice on two separate
lists in a future patch.
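The check can be sketched in C as follows. As a simplification, it assumes a pack identical to one we just wrote can be recognized by its name. The struct and function names are hypothetical:

```c
#include <stddef.h>
#include <string.h>

struct pack_entry {
	const char *name;
	int delete_me;
};

/* Does this existing pack match one of the packs we just wrote? */
static int was_just_written(const char *name, const char **written, size_t nr_written)
{
	for (size_t i = 0; i < nr_written; i++)
		if (!strcmp(name, written[i]))
			return 1;
	return 0;
}

/* Mark every existing pack for deletion unless it matches a freshly written pack. */
static void mark_packs_for_deletion(struct pack_entry *existing, size_t nr,
				    const char **written, size_t nr_written)
{
	for (size_t i = 0; i < nr; i++)
		existing[i].delete_me =
			!was_just_written(existing[i].name, written, nr_written);
}
```

Packs matched this way stay unmarked, so a later cleanup pass will not remove them.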
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Wed, 13 Sep 2023 19:17:35 +0000 (15:17 -0400)]
builtin/repack.c: extract structure to store existing packs
The repack machinery needs to keep track of which packfiles were present
in the repository at the beginning of a repack, segmented by whether or
not each pack is marked as kept.
The names of these packs are stored in two `string_list`s, corresponding
to kept- and non-kept packs, respectively. As a consequence, many
functions within the repack code need to take both `string_list`s as
arguments, leading to code like this:
Wrap up this pair of `string_list`s into a single structure that stores
both. This saves us from having to pass both string lists separately,
and prepares for adding additional fields to this structure.
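A rough C sketch of the wrapping idea, using a simplified list type. `struct pack_names`, `struct existing_packs` and `count_existing_packs()` are illustrative names, not necessarily the ones the patch introduces:

```c
#include <stddef.h>

/* Hypothetical minimal list of pack names; git's struct string_list has more fields. */
struct pack_names {
	const char **items;
	size_t nr;
};

/* One structure carrying both lists, so helpers take a single argument. */
struct existing_packs {
	struct pack_names kept_packs;
	struct pack_names non_kept_packs;
};

/* Before: f(&kept, &non_kept). After: f(&existing). */
static size_t count_existing_packs(const struct existing_packs *existing)
{
	return existing->kept_packs.nr + existing->non_kept_packs.nr;
}
```

Adding another list to the structure later would not change the signature of any function that takes it.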
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 13 Sep 2023 17:07:56 +0000 (10:07 -0700)]
Merge branch 'jk/unused-post-2.42-part2'
Unused parameters to functions are marked as such, and/or removed,
in order to bring us closer to -Wunused-parameter clean.
* jk/unused-post-2.42-part2:
parse-options: mark unused parameters in noop callback
interpret-trailers: mark unused "unset" parameters in option callbacks
parse-options: add more BUG_ON() annotations
merge: do not pass unused opt->value parameter
parse-options: mark unused "opt" parameter in callbacks
parse-options: prefer opt->value to globals in callbacks
checkout-index: delay automatic setting of to_tempfile
format-patch: use OPT_STRING_LIST for to/cc options
merge: simplify parsing of "-n" option
merge: make xopts a strvec
Philippe Blain [Tue, 12 Sep 2023 17:02:15 +0000 (17:02 +0000)]
completion: improve doc for complex aliases
The completion code can be told to use a particular completion for
aliases that shell out by using ': git <cmd> ;' as the first command of
the alias. This only works if <cmd> and the semicolon are separated by a
space, since if the space is missing __git_aliased_command returns (for
example) 'checkout;' instead of just 'checkout', and then
__git_complete_command fails to find a completion for 'checkout;'.
The examples have that space but it's not clear if it's just for
style or if it's mandatory. Explicitly mention it.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Philippe Blain [Tue, 12 Sep 2023 17:30:27 +0000 (17:30 +0000)]
completion: commit: complete trailers tokens more robustly
In the previous commit, we added support for completing configured
trailer tokens in 'git commit --trailer'.
Make the implementation more robust by:
- using '__git' instead of plain 'git', as the rest of the completion
script does
- using a stricter pattern for --get-regexp to avoid false hits
- using 'cut' and 'rev' instead of 'awk' to account for tokens including
dots.
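The dot-handling point can be illustrated with a small C function. It is hypothetical and not part of the completion script. It turns a config key of the form `trailer.<token>.key` into a `<token>:` completion word by stripping a fixed prefix and suffix instead of splitting on every dot:

```c
#include <stdlib.h>
#include <string.h>

/*
 * From "trailer.<token>.key", return a malloc'd "<token>:" word, or NULL
 * if the key does not have that shape. Only the fixed prefix and suffix
 * are removed, so a token such as "co.author" survives intact.
 */
static char *trailer_completion_word(const char *key)
{
	static const char prefix[] = "trailer.";
	static const char suffix[] = ".key";
	size_t plen = sizeof(prefix) - 1, slen = sizeof(suffix) - 1;
	size_t len = strlen(key), tlen;
	char *word;

	if (len <= plen + slen || strncmp(key, prefix, plen) ||
	    strcmp(key + len - slen, suffix))
		return NULL;
	tlen = len - plen - slen;
	word = malloc(tlen + 2);
	if (!word)
		return NULL;
	memcpy(word, key + plen, tlen);
	word[tlen] = ':';      /* user only has to type the value after this */
	word[tlen + 1] = '\0';
	return word;
}
```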
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
sequencer: remove unreachable exit condition in pick_commits()
This was introduced by 56dc3ab04 ("sequencer (rebase -i): implement the
'edit' command", 2017-01-02), and was pointless from the get-go: all
early exits from the loop above are returns, so todo_list->current ==
todo_list->nr is an invariant after the loop.
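The invariant argument can be seen in a toy C loop of the same shape. This is hypothetical code, not the sequencer's. The only early exit is a `return`, so any statement after the loop runs with `current == nr`, and testing that condition there can never fail:

```c
#include <stddef.h>

static int run_all(size_t *current, size_t nr, size_t fail_at)
{
	while (*current < nr) {
		if (*current == fail_at)
			return -1;      /* early exit leaves the function entirely */
		(*current)++;
	}
	/* invariant: *current == nr here, so re-checking it is dead code */
	return 0;
}
```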
Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de> Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t3404-rebase-interactive.sh: fix typos in title of a rewording test
This test was introduced by commit 0c164ae7a ("rebase -i: add another
reword test", 2021-08-20). I didn't quite get what it was meant to do,
so here's an explanation from Phillip:
The purpose of the test is to ensure that
(i) That there are no uncommitted changes when the editor runs, i.e., we
commit without running the editor and then reword by amending
that commit. This ensures that we have the same user experience
whether or not the commit was fast-forwarded [1].
(ii) That the todo list is re-read after the commit has been reworded.
This is to allow the user to update the todo list while the rebase
is paused for editing the commit message.
Junio C Hamano [Tue, 12 Sep 2023 19:32:35 +0000 (12:32 -0700)]
test-tool: retire "index-version"
As "git update-index --show-index-version" can do the same thing,
the 'index-version' subcommand in the test-tool lost its reason to
exist. Remove it and replace its use with the end-user facing
'git update-index --show-index-version'.
Helped-by: Linus Arver <linusa@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 12 Sep 2023 19:32:34 +0000 (12:32 -0700)]
update-index: add --show-index-version
"git update-index --index-version N" is used to set the index format
version to a specific version, but there was no way to query the
current version used in the on-disk index file.
Teach the command a new "--show-index-version" option, and also
teach the "--index-version N" option to report what the version was
when run with the "--verbose" option.
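For illustration, the value this option reports lives in the 12-byte header of the index file: the signature "DIRC", a 32-bit version, and a 32-bit entry count, all big-endian. The following is a minimal C sketch that reads the version from a buffer. The function name is hypothetical, and in practice you would simply run `git update-index --show-index-version`:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return the format version from an index header, or 0 if the buffer
 * does not start with a valid "DIRC" header.
 */
static uint32_t index_header_version(const unsigned char *buf, size_t len)
{
	if (len < 12 || memcmp(buf, "DIRC", 4) != 0)
		return 0;
	/* bytes 4..7 hold the version in network byte order */
	return ((uint32_t)buf[4] << 24) | ((uint32_t)buf[5] << 16) |
	       ((uint32_t)buf[6] << 8) | (uint32_t)buf[7];
}
```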
Helped-by: Linus Arver <linusa@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 12 Sep 2023 19:32:33 +0000 (12:32 -0700)]
update-index doc: v4 is OK with JGit and libgit2
Being invented in late 2012 no longer makes the index v4 format
"relatively young".
The support for the index version 4 was added to libgit2 with their 5625d86b (index: support index v4, 2016-05-17) and to JGit with
their e9cb0a8e (DirCache: support index V4, 2020-08-10).
Let's update the paragraph that discouraged its use for folks overly
cautious about cross-tool compatibility.
Helped-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de> Helped-by: Linus Arver <linusa@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Josip Sokcevic [Mon, 11 Sep 2023 17:09:02 +0000 (10:09 -0700)]
diff-lib: fix check_removed when fsmonitor is on
`git diff-index` may return incorrect deleted entries when fsmonitor
is used in a repository with git submodules. This can be observed on
Mac machines, but it can affect all other supported platforms too.
If fsmonitor is used, `stat *st` is not initialized if cache_entry has
CE_FSMONITOR_VALID set. However, three call sites rely on stat
afterwards, which can produce incorrect results.
This change partially reverts commit 4f3d6d02 (fsmonitor: skip lstat
deletion check during git diff-index, 2021-03-17).
Signed-off-by: Josip Sokcevic <sokcevic@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
maintenance(systemd): support the Windows Subsystem for Linux
When running in the Windows Subsystem for Linux (WSL), it is usually
necessary to use the Git Credential Manager for authentication when
performing the background fetches.
This requires interoperability between the Windows Subsystem for Linux
and the Windows host to work, which uses so-called vsocks, i.e. sockets
intended for communication between virtual machines and the host they
are running on.
However, when Git is configured to run background maintenance via
`systemd`, the address families available to those maintenance processes
are restricted, and did not include `AF_VSOCK`. This leads to problems
e.g. when a background fetch tries to access github.com:
systemd[437]: Starting Optimize Git repositories data...
git[747387]: WSL (747387) ERROR: UtilBindVsockAnyPort:285: socket failed 97
git[747381]: fatal: could not read Username for 'https://github.com': No such device or address
git[747381]: error: failed to prefetch remotes
git[747381]: error: task 'prefetch' failed
systemd[437]: git-maintenance@hourly.service: Main process exited, code=exited, status=1/FAILURE
systemd[437]: git-maintenance@hourly.service: Failed with result 'exit-code'.
systemd[437]: Failed to start Optimize Git repositories data.
Address this (pun intended) by adding the `AF_VSOCK` address family to
the allow list.
This fixes https://github.com/microsoft/git/issues/604.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When -R is given, queue_diff() swaps the mode and name variables of the
two files to produce a reverse diff. 1e3f26542a (diff --no-index:
support reading from named pipes, 2023-07-05) added variables that
indicate whether files are special, i.e. named pipes or "-" for stdin.
These new variables were not swapped, though, which broke the handling
of stdin with -R. Swap them like the other metadata variables.
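A hedged C sketch of the idea behind the fix. A hypothetical `struct side` bundles the per-file metadata, and the point is that the `special` flag must be swapped along with the name and mode:

```c
#include <stdbool.h>

/* Hypothetical per-file metadata for one side of a no-index diff. */
struct side {
	const char *name;
	unsigned mode;
	bool special;   /* named pipe, or "-" for stdin */
};

#define SWAP_VALS(type, a, b) do { type tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

/* With -R every piece of per-file metadata must trade places, including "special". */
static void reverse_sides(struct side *one, struct side *two)
{
	SWAP_VALS(const char *, one->name, two->name);
	SWAP_VALS(unsigned, one->mode, two->mode);
	SWAP_VALS(bool, one->special, two->special);
}
```

Swapping the whole struct (`struct side tmp = *one; *one = *two; *two = tmp;`) would make it harder to forget a newly added field.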
trailer: split process_input_file into separate pieces
Currently, process_input_file does three things:
(1) parse the input string for trailers,
(2) print text before the trailers, and
(3) calculate the position of the input where the trailers end.
Rename this function to parse_trailers(), and make it only do
(1). The caller of this function, process_trailers, becomes responsible
for (2) and (3). These items belong inside process_trailers because they
are both concerned with printing the surrounding text around
trailers (which is already one of the immediate concerns of
process_trailers).
Signed-off-by: Linus Arver <linusa@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
trailer: separate public from internal portion of trailer_iterator
The fields here are not meant to be used by downstream callers, so put
them inside an anonymous struct member named "internal" to warn against
their use. This follows the pattern in 576de3d956 (unpack_trees: start
splitting internal fields from public API, 2023-02-27).
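A small C sketch of the pattern, using a hypothetical key/value iterator rather than git's actual `trailer_iterator`. Public fields sit at the top level, and bookkeeping lives in an anonymous struct member called `internal`:

```c
#include <stddef.h>

struct kv {
	const char *key;
	const char *value;
};

struct kv_iterator {
	/* public: read these after each successful kv_iter_next() */
	const char *key;
	const char *value;
	/* private: callers should not touch anything in here */
	struct {
		const struct kv *items;
		size_t nr;
		size_t cur;
	} internal;
};

static void kv_iter_init(struct kv_iterator *it, const struct kv *items, size_t nr)
{
	it->key = it->value = NULL;
	it->internal.items = items;
	it->internal.nr = nr;
	it->internal.cur = 0;
}

/* Advance to the next pair; returns 0 when the list is exhausted. */
static int kv_iter_next(struct kv_iterator *it)
{
	if (it->internal.cur >= it->internal.nr)
		return 0;
	it->key = it->internal.items[it->internal.cur].key;
	it->value = it->internal.items[it->internal.cur].value;
	it->internal.cur++;
	return 1;
}
```

Nothing stops a caller from reaching into `it.internal`. The name only signals that doing so is unsupported.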
Signed-off-by: Linus Arver <linusa@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
completion(switch/checkout): treat --track and -t the same
When `git switch --track ` is to be completed, only remote refs are
eligible because that is what the `--track` option targets.
And when the short-hand `-t` is used instead, the same _should_ happen.
Let's make it so.
Note that the bug exists both in the completions of `switch` and
`checkout`, even if it manifests in slightly different ways: While
the completion of `git switch -t ` will not even look at remote refs,
the completion of `git checkout -t ` will look at both remote _and_
local refs. Both should look only at remote refs.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Thu, 7 Sep 2023 22:06:08 +0000 (15:06 -0700)]
Merge branch 'dd/format-patch-rfc-updates'
"git format-patch --rfc --subject-prefix=<foo>" used to ignore the
"--subject-prefix" option and used "[RFC PATCH]"; now we will add
"RFC" prefix to whatever subject prefix is specified.
This is a backward compatible change that may deserve a note.
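A minimal C sketch of the new behavior as described above. `build_subject_prefix()` is a hypothetical helper, and "PATCH" is used as the default prefix:

```c
#include <stdio.h>

/*
 * --rfc now prepends "RFC " to whatever --subject-prefix chose,
 * instead of replacing it with a fixed "RFC PATCH".
 */
static void build_subject_prefix(char *out, size_t size, const char *prefix, int rfc)
{
	if (!prefix)
		prefix = "PATCH";   /* default when --subject-prefix is not given */
	if (rfc)
		snprintf(out, size, "RFC %s", prefix);
	else
		snprintf(out, size, "%s", prefix);
}
```

With `--rfc --subject-prefix="PATCH v2"` this yields "RFC PATCH v2" rather than a bare "RFC PATCH".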
* dd/format-patch-rfc-updates:
format-patch: --rfc honors what --subject-prefix sets
Junio C Hamano [Thu, 7 Sep 2023 22:06:07 +0000 (15:06 -0700)]
Merge branch 'jk/unused-post-2.42'
Unused parameters to functions are marked as such, and/or removed,
in order to bring us closer to -Wunused-parameter clean.
* jk/unused-post-2.42: (22 commits)
update-ref: mark unused parameter in parser callbacks
gc: mark unused descriptors in scheduler callbacks
bundle-uri: mark unused parameters in callbacks
fetch: mark unused parameter in ref_transaction callback
credential: mark unused parameter in urlmatch callback
grep: mark unused parmaeters in pcre fallbacks
imap-send: mark unused parameters with NO_OPENSSL
worktree: mark unused parameters in noop repair callback
negotiator/noop: mark unused callback parameters
add-interactive: mark unused callback parameters
grep: mark unused parameter in output function
test-trace2: mark unused argv/argc parameters
trace2: mark unused config callback parameter
trace2: mark unused us_elapsed_absolute parameters
stash: mark unused parameter in diff callback
ls-tree: mark unused parameter in callback
commit-graph: mark unused data parameters in generation callbacks
worktree: mark unused parameters in each_ref_fn callback
pack-bitmap: mark unused parameters in show_object callback
ref-filter: mark unused parameters in parser callbacks
...
Junio C Hamano [Thu, 7 Sep 2023 22:06:07 +0000 (15:06 -0700)]
Merge branch 'tb/multi-cruft-pack'
Use of --max-pack-size to allow multiple packfiles to be created is
now supported even when we are sending unreachable objects to cruft
packs.
* tb/multi-cruft-pack:
Documentation/gitformat-pack.txt: drop mixed version section
Documentation/gitformat-pack.txt: remove multi-cruft packs alternative
builtin/pack-objects.c: support `--max-pack-size` with `--cruft`
builtin/pack-objects.c: remove unnecessary strbuf_reset()
Since 3e230fa1b2 (grep: use parseopt, 2009-05-07) git grep has been
accepting the option --no-or. It does the same as --or: nothing.
That's confusing and unintended. Forbid negating --or.
Since 2daae3d1d1 (commit: add --trailer option, 2021-03-23), 'git
commit' can add trailers to commit messages. To make that feature more
pleasant to use at the command line, update the Bash completion code to
offer configured trailer tokens.
Add a __git_trailer_tokens function to list the configured trailer
tokens, and use it in _git_commit to suggest the configured tokens,
suffixing the completion words with ':' so that the user only has to add
the trailer value.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
rebase -i: fix adding failed command to the todo list
When rebasing, commands are moved from the todo list in "git-rebase-todo"
to the "done" file (which is used by "git status" to show the recently
executed commands) just before they are executed. This means that if a
command fails because it would overwrite an untracked file it has to be
added back into the todo list before the rebase stops for the user to
fix the problem.
Unfortunately, when a failed command is added back into the todo list, the
command preceding it is erroneously appended to the "done" file. This
means that when rebase stops after "pick B" fails, the "done" file
contains
pick A
pick B
pick A
instead of
pick A
pick B
This happens because save_todo() updates the "done" file with the
previous command whenever "git-rebase-todo" is updated. When we add the
failed pick back into "git-rebase-todo" we do not want to update
"done". Fix this by adding a "reschedule" parameter to save_todo() which
prevents the "done" file from being updated when adding a failed command
back into the "git-rebase-todo" file. A couple of the existing tests are
modified to improve their coverage as none of them trigger this bug or
check the "done" file.
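The bug and the fix can be modeled with a toy C sketch. The names are hypothetical, and the real save_todo() also rewrites "git-rebase-todo". Here the "done" file is a list, and saving the todo list appends the previous command unless `reschedule` is set:

```c
#include <stddef.h>

#define MAX_DONE 8

/* Toy model of the "done" file as a list of executed commands. */
struct done_file {
	const char *lines[MAX_DONE];
	size_t nr;
};

static void append_done(struct done_file *done, const char *cmd)
{
	if (done->nr < MAX_DONE)
		done->lines[done->nr++] = cmd;
}

/*
 * Toy stand-in for save_todo(): normally the previous command is
 * recorded in "done"; when a failed command is merely being put back
 * into the todo list (reschedule), "done" is left alone.
 */
static void save_todo_model(struct done_file *done, const char *prev, int reschedule)
{
	if (!reschedule && prev)
		append_done(done, prev);
}
```

Suppose "done" contains `pick A` and `pick B`. A save with `reschedule` set leaves it unchanged. A plain save appends a second `pick A`, which reproduces the erroneous contents shown above.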
Reported-by: Stefan Haller <lists@haller-berlin.de> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
rebase --continue: refuse to commit after failed command
If a commit cannot be picked because it would overwrite an untracked
file then "git rebase --continue" should refuse to commit any staged
changes as the commit was not picked. This is implemented by refusing to
commit if the message file is missing. The message file is chosen for
this check because it is only written when "git rebase" stops for the
user to resolve merge conflicts.
Existing commands that refuse to commit staged changes when continuing,
such as a failed "exec", rely on checking for the absence of the author
script in run_git_commit(). This prevents the staged changes from being
committed but prints
error: could not open '.git/rebase-merge/author-script' for
reading
before the message about not being able to commit. This is confusing to
users and so checking for the message file instead improves the user
experience. The existing test for refusing to commit after a failed exec
is updated to check that we do not print the error message about a
missing author script anymore.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
git rebase keeps a list that maps the OID of each commit before it was
rebased to the OID of the equivalent commit after the rebase. This list
is used to drive the "post-rewrite" hook that is called at the end of a
successful rebase. When a rebase stops for the user to resolve merge
conflicts the OID of the commit being picked is written to
".git/rebase-merge/stopped-sha". Then when the rebase is continued that
OID is added to the list of rewritten commits. Unfortunately, if a commit
cannot be picked because it would overwrite an untracked file, we still
write the "stopped-sha" file. This means that when the rebase is
continued, the commit is added to the list of rewritten commits even
though it has not been picked yet.
Fix this by not calling error_with_patch() for failed commands. The pick
has failed so there is nothing to commit and therefore we do not want to
set up the state files for committing staged changes when the rebase
continues. This change means we no longer write a patch for the failed
command or display the error message printed by error_with_patch(). As
the command has failed, the patch isn't really useful, and in any case the
user can inspect the commit associated with the failed command by
inspecting REBASE_HEAD. Unless the user has disabled it, we already print
an advice message, which the user will still see and which is more
helpful than the message from error_with_patch(). Even if the advice is
disabled, the user will see the messages from the merge machinery
detailing the problem.
The code to add a failed command back into the todo list is duplicated
between pick_one_commit() and the loop in pick_commits(). Both sites
print advice about the command being rescheduled, decrement the current
item and save the todo list. To avoid duplicating this code,
pick_one_commit() is modified to set a flag to indicate that the command
should be rescheduled in the main loop. This simplifies things, as only
the remaining copy of the code needs to be modified to set REBASE_HEAD
rather than calling error_with_patch().
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This simplifies the next commit. If a pick fails, we now return the error
at the end of the loop body rather than returning early; a successful
"edit" command continues to return early. There are three things to
check to ensure that removing the early return for an error does not
change the behavior of the code:
(1) We could enter the block guarded by "if (reschedule)". This block
is not entered because "reschedule" is always zero when picking a
commit.
(2) We could enter the block guarded by
"else if (is_rebase_i(opts) && check_todo && !res)". This block is
not entered when returning an error because "res" is non-zero in
that case.
(3) todo_list->current could be incremented before returning. That is
avoided by moving the increment, which is of course a potential
change in behavior itself. The move is safe because none of the
callers look at todo_list after this function returns. Moving the
increment makes it clear we only want to advance the current item
if the command was successful.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rather than constructing the path in a struct strbuf, use the ready-made
function to get the path name instead. This was the last
remaining use of the strbuf, so remove it as well.
As with the previous patch, we now use a hard-coded string rather than
get_dir() when constructing the path. This is safe for the same
reason (make_patch() is only called when rebasing) and is protected by
the assertion added in the previous patch.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
rebase -i: remove patch file after conflict resolution
When a rebase stops for the user to resolve conflicts it writes a patch
for the conflicting commit to .git/rebase-merge/patch. This file has
been written since the introduction of "git-rebase-interactive.sh" in 1b1dce4bae7 (Teach rebase an interactive mode, 2007-06-25). I assume the
idea was to enable the user to inspect the conflicting commit in the same
way as they could for the patch-based rebase. This file should be
deleted when the rebase continues because, if the rebase stops for a
failed "exec" command or a "break" command, it is confusing to the user
if there is a stale patch lying around from an unrelated command. As the
path is now used in two different places, rebase_path_patch() is added
and used to obtain the path for the patch.
To construct the path write_patch() previously used get_dir() which
returns different paths depending on whether we're rebasing or
cherry-picking/reverting. As this function is only called when
rebasing it is safe to use a hard coded string for the directory
instead. An assertion is added to make sure we don't start calling
this function when cherry-picking in the future.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
At the start of each iteration the loop that picks commits removes the
state files from the previous pick. However, some of these files are only
written if there are conflicts, in which case we exit the loop before the
end of the loop body. Therefore they only need to be removed when the
rebase continues, not at the start of each iteration.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
doc/diff-options: fix link to generating patch section
When formatted as a man page, the section title is rendered
"GENERATING PATCH TEXT WITH -P" whereas the reference still reads
"Generating patch text with -p"; that is inconsistent and makes
searching harder than it needs to be.
Fix this by getting rid of custom reference text.
Also, documentation for every command that describes the `-p` option by
including the "diff-options.txt" file does include the
"diff-generate-patch.txt" file as well (as it should), so the internal
link is in fact useful for any of them.
Fix this by getting rid of conditionals around the reference.
Fixes: ebdc46c242 (docs: link generating patch sections) Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
var: avoid a segmentation fault when `HOME` is unset
The code introduced in 576a37fccbf (var: add attributes files locations,
2023-06-27) paid careful attention to use `xstrdup()` for pointers known
never to be `NULL`, and `xstrdup_or_null()` otherwise.
One spot was missed, though: `git_attr_global_file()` can return `NULL`
when the `HOME` variable is not set (nor `XDG_CONFIG_HOME`), a
situation that is not too uncommon in certain server setups.
Fix this, and add a test case to avoid future regressions.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Acked-by: brian m. carlson <bk2204@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
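A minimal sketch of why the `_or_null` variant matters, assuming simplified stand-ins: `sketch_xstrdup_or_null()` and `sketch_global_location()` are hypothetical helpers, not Git's actual implementations. A location derived from `HOME`/`XDG_CONFIG_HOME` may legitimately be NULL, and passing NULL to `strdup()` is undefined behavior (typically a segfault).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Git's xstrdup_or_null(): NULL passes through. */
static char *sketch_xstrdup_or_null(const char *s)
{
	char *copy;

	if (!s)
		return NULL;
	copy = strdup(s);
	if (!copy) {
		fprintf(stderr, "fatal: out of memory\n");
		exit(128);
	}
	return copy;
}

/*
 * Hypothetical, simplified stand-in for a lookup such as
 * git_attr_global_file(): it yields NULL when neither
 * XDG_CONFIG_HOME nor HOME is set.
 */
static const char *sketch_global_location(void)
{
	const char *xdg = getenv("XDG_CONFIG_HOME");

	return (xdg && *xdg) ? xdg : getenv("HOME");
}

/* Safe: the location may be NULL, so use the _or_null variant. */
static char *sketch_copy_global_location(void)
{
	return sketch_xstrdup_or_null(sketch_global_location());
}
```

A plain `strdup()` would be fine only for values that can never be NULL; anything that depends on the environment should go through the NULL-tolerant copy.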
sequencer: fix error message on failure to copy SQUASH_MSG
The message talked about renaming, while the actual action is copying.
This was introduced by 6e98de72c ("sequencer (rebase -i): add support
for the 'fixup' and 'squash' commands", 2017-01-02).
Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de> Acked-by: Phillip Wood <phillip.wood123@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
a91f453f64 (grep: Add --max-depth option., 2009-07-22) added the option
--max-depth, defining it using a positional struct option initializer of
type OPTION_INTEGER. It also sets defval to 1 for some reason, but that
value would only be used if the flag PARSE_OPT_OPTARG was given.
Use the macro OPT_INTEGER_F instead to standardize the definition and
specify only the necessary values. This also normalizes argh to N_("n")
as a side-effect, which is OK.
adfc1857bd (describe: fix --contains when a tag is given as input,
2013-07-18) added the option --peel-tag, defining it using a positional
struct option initializer and a comment indicating that it's intended to
be a hidden OPT_BOOL. 4741edd549 (Remove deprecated OPTION_BOOLEAN for
parsing arguments, 2013-08-03) added the macro OPT_HIDDEN_BOOL, which
expresses this more succinctly. Use it.
Atoms like "raw" and "contents" have a ":size" option which can be used
to know the size of the data. Since these atoms have the cmp_type
FIELD_STR, they are sorted alphabetically from 'a' to 'z' and '0' to
'9'. This means that even when the ":size" option is used and what we
ultimately have are numbers, we still sort alphabetically.
For example, consider the following case in a repo
which is a numeric sort (that is, a "$ sort -n file" as opposed to a
"$ sort file", where "file" contains only the "contents:size" or
"raw:size" info, each of which is on a newline).
The same is true of "--sort=raw:size".
So, sort numerically whenever the sort is done with "contents:size" or
"raw:size", and sort alphabetically as usual when "contents" or "raw"
are used with some other option (they are FIELD_STR anyway).
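To illustrate the difference, here is a small sketch comparing the same size strings alphabetically and numerically. The comparators are hypothetical and are not ref-filter.c's actual code; they only show why "100" sorts before "9" under a string comparison.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Alphabetic comparison, as applied to FIELD_STR atoms. */
static int cmp_as_string(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Numeric comparison, as wanted for the ":size" variants. */
static int cmp_as_number(const void *a, const void *b)
{
	unsigned long x = strtoul(*(const char *const *)a, NULL, 10);
	unsigned long y = strtoul(*(const char *const *)b, NULL, 10);

	return (x > y) - (x < y);
}
```

With sizes {"9", "100", "25"}, `cmp_as_string` yields "100", "25", "9", while `cmp_as_number` yields "9", "25", "100".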
Helped-by: Jeff King <peff@peff.net> Signed-off-by: Kousik Sanagavarapu <five231003@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 31 Aug 2023 21:22:20 +0000 (17:22 -0400)]
parse-options: mark unused parameters in noop callback
Unsurprisingly, the noop options callback doesn't bother to look at any
of its parameters. Let's mark them so that -Wunused-parameter does not
complain.
Another option would be to drop the callback and have parse-options
itself recognize OPT_NOOP_NOARG. But that seems like extra work for no
real benefit.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
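A sketch of what "marking" a parameter looks like, assuming a GCC/Clang toolchain. `MY_UNUSED` is a hypothetical, simplified stand-in for Git's own UNUSED macro, and `struct sketch_option` is a placeholder for Git's struct option; only the attribute placement is the point here.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
#define MY_UNUSED __attribute__((unused))
#else
#define MY_UNUSED
#endif

struct sketch_option; /* opaque: the callback never dereferences it */

/* A no-op callback: every parameter is deliberately ignored. */
static int sketch_noop_cb(const struct sketch_option *opt MY_UNUSED,
			  const char *arg MY_UNUSED,
			  int unset MY_UNUSED)
{
	return 0;
}
```

Compiled with `-Wunused-parameter`, this produces no warnings, while an unmarked version would warn once per parameter.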
Jeff King [Thu, 31 Aug 2023 21:22:15 +0000 (17:22 -0400)]
interpret-trailers: mark unused "unset" parameters in option callbacks
There are a few parse-option callbacks that do not look at their "unset"
parameters, but also do not set PARSE_OPT_NONEG. At first glance this
seems like a bug, as we'd ignore "--no-if-exists", etc.
But they do work fine, because when "unset" is true, then "arg" is NULL.
And all three functions pass "arg" on to helper functions which do the
right thing with the NULL.
Note that this shortcut would not be correct if any callback used
PARSE_OPT_NOARG (in which case "arg" would be NULL but "unset" would be
false). But none of these do.
So the code is fine as-is. But we'll want to mark the unused "unset"
parameters to quiet -Wunused-parameter. I've also added a comment to
make this rather subtle situation more explicit.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
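The shortcut can be sketched as follows. The names and the reduced set of values are hypothetical (the real callbacks take a `const struct option *` and accept more modes); the point is that a helper which treats NULL as "reset to default" already gives the right `--no-...` behavior, provided the option does not use PARSE_OPT_NOARG.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum sketch_if_exists { IF_EXISTS_DEFAULT, IF_EXISTS_ADD, IF_EXISTS_REPLACE };

/* Helper: NULL means "reset to the default", which is what --no-... wants. */
static int sketch_set_if_exists(enum sketch_if_exists *item, const char *value)
{
	if (!value)
		*item = IF_EXISTS_DEFAULT;
	else if (!strcmp(value, "add"))
		*item = IF_EXISTS_ADD;
	else if (!strcmp(value, "replace"))
		*item = IF_EXISTS_REPLACE;
	else
		return -1;
	return 0;
}

static int sketch_parse_if_exists(enum sketch_if_exists *where,
				  const char *arg, int unset)
{
	/* Without PARSE_OPT_NOARG, unset implies arg == NULL. */
	assert(!unset || arg == NULL);
	(void)unset; /* Git marks this parameter UNUSED instead */
	return sketch_set_if_exists(where, arg);
}
```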
Jeff King [Thu, 31 Aug 2023 21:21:49 +0000 (17:21 -0400)]
parse-options: add more BUG_ON() annotations
These callbacks are similar to the ones touched by 517fe807d6 (assert
NOARG/NONEG behavior of parse-options callbacks, 2018-11-05), but were
either missed in that commit (the one in add.c) or were added later (the
one in log.c).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 31 Aug 2023 21:21:46 +0000 (17:21 -0400)]
merge: do not pass unused opt->value parameter
The option_parse_strategy() callback does not look at opt->value;
instead it calls append_strategy(), which manipulates the global
use_strategies array directly. But the OPT_CALLBACK declaration assigns
"&use_strategies" to opt->value.
One could argue this is good, as it tells the reader what we generally
expect the callback to do. But it is also bad, because it can mislead
you into thinking that swapping out "&use_strategies" there might have
any effect. Let's switch it to pass NULL (which is what every other
"does not bother to look at opt->value" callback does). If you want to
know what the callback does, it's easy to read the function itself.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 31 Aug 2023 21:21:28 +0000 (17:21 -0400)]
parse-options: mark unused "opt" parameter in callbacks
The previous commit argued that parse-options callbacks should try to
use opt->value rather than touching globals directly. In some cases,
however, that's awkward to do. Some callbacks touch multiple variables,
or may even just call into an abstracted function that does so.
In some of these cases we _could_ convert them by stuffing the multiple
variables into a single struct and passing the struct pointer through
opt->value. But that may make other parts of the code less readable,
as the struct relationship has to be mentioned everywhere.
Let's just accept that these cases are special and leave them as-is. But
we do need to mark their "opt" parameters to satisfy -Wunused-parameter.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 31 Aug 2023 21:21:07 +0000 (17:21 -0400)]
parse-options: prefer opt->value to globals in callbacks
We have several parse-options callbacks that ignore their "opt"
parameters entirely. This is a little unusual, as we'd normally put the
result of the parsing into opt->value. In the case of these callbacks,
though, they directly manipulate global variables instead (and in
most cases the caller sets opt->value to NULL in the OPT_CALLBACK
declaration).
The immediate symptom we'd like to deal with is that the unused "opt"
variables trigger -Wunused-parameter. But how to fix that is debatable.
One option is to annotate them with UNUSED. But another is to have the
caller pass in the appropriate variable via opt->value, and use it. That
has the benefit of making the callbacks reusable (in theory at least),
and makes it clear from the OPT_CALLBACK declaration which variables
will be affected (doubly so for the cases in builtin/fast-export.c,
where we do set opt->value, but it is completely ignored!).
The slight downside is that we lose type safety, since the values now
pass through void pointers.
I went with the "just use them" approach here. The loss of type safety
is unfortunate, but that is already an issue with most of the other
callbacks. If we want to try to address that, we should do so more
consistently (and this patch would prepare these callbacks for whatever
we choose to do there).
Note that in the cases in builtin/fast-export.c, we are passing
anonymous enums. We'll have to give them names so that we can declare
the appropriate pointer type within the callbacks.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
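A sketch of the "just use opt->value" approach, assuming a heavily simplified, hypothetical `struct mini_option` in place of Git's struct option and a reduced set of modes. It shows why the enum needs a name (so the callback can declare a typed pointer) and where type safety is lost (the `void *` conversion is unchecked).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for Git's struct option. */
struct mini_option {
	const char *long_name;
	void *value;
};

/* Named so that the callback can declare a pointer to it. */
enum sketch_signed_tag_mode { SIGNED_TAG_ABORT, SIGNED_TAG_VERBATIM, SIGNED_TAG_STRIP };

static int parse_signed_tag_mode(const struct mini_option *opt, const char *arg)
{
	/* Unchecked: nothing stops a caller from storing some other type here. */
	enum sketch_signed_tag_mode *mode = opt->value;

	if (!strcmp(arg, "abort"))
		*mode = SIGNED_TAG_ABORT;
	else if (!strcmp(arg, "verbatim"))
		*mode = SIGNED_TAG_VERBATIM;
	else if (!strcmp(arg, "strip"))
		*mode = SIGNED_TAG_STRIP;
	else
		return -1;
	return 0;
}

static enum sketch_signed_tag_mode signed_tag_mode = SIGNED_TAG_ABORT;

/* The declaration now says exactly which variable the callback changes. */
static const struct mini_option signed_tags_opt = { "signed-tags", &signed_tag_mode };
```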
Jeff King [Tue, 5 Sep 2023 07:12:59 +0000 (03:12 -0400)]
checkout-index: delay automatic setting of to_tempfile
Using --stage=all requires writing to tempfiles, since we cannot put
multiple stages into a single file. So --stage=all implies --temp.
But we do so by setting to_tempfile in the options callback for --stage,
rather than after all options have been parsed. This leads to two bugs:
1. If you run "checkout-index --stage=all --stage=2", this should not
imply --temp, but it currently does. The callback cannot just unset
to_tempfile when it sees the "2" value, because it no longer knows
if its value was from the earlier --stage call, or if the user
specified --temp explicitly.
2. If you run "checkout-index --stage=all --no-temp", the --no-temp
will overwrite the earlier implied --temp. But this mode of
operation cannot work, and the command will fail with "<path>
already exists" when trying to write the higher stages.
We can fix both by lazily setting to_tempfile. We'll make it a tristate,
with -1 as "not yet given", and have --stage=all enable it only after
all options are parsed. Likewise, after all options are parsed we can
detect and reject the bogus "--no-temp" case.
Note that this does technically change the behavior for "--stage=all
--no-temp" for paths which have only one stage present (which
accidentally worked before, but is now forbidden). But this behavior was
never intended, and you'd have to go out of your way to try to trigger
it.
The new tests cover both cases, as well as the general "--stage=all implies
--temp" case, since most of the other tests explicitly say "--temp". Ironically,
the test "checkout --temp within subdir" is the only one that _doesn't_
use "--temp", and so was implicitly covering this case. But it seems
reasonable to have a more explicit test alongside the other related
ones.
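The lazy tristate can be sketched like this. The argument parser, `struct co_opts`, the `CHECKOUT_ALL_STAGES` sentinel and the error text are all hypothetical simplifications, not checkout-index's actual code; what matters is that the "--stage=all implies --temp" decision happens only after every option has been seen.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CHECKOUT_ALL_STAGES 4 /* hypothetical sentinel for --stage=all */

struct co_opts {
	int stage;       /* 0 = default, 1..3 = one stage, or CHECKOUT_ALL_STAGES */
	int to_tempfile; /* -1 = not yet given, 0 = --no-temp, 1 = --temp */
};

/* Returns 0 on success, -1 for an unknown option or a rejected combination. */
static int parse_co_opts(int argc, const char **argv, struct co_opts *o)
{
	o->stage = 0;
	o->to_tempfile = -1;

	for (int i = 0; i < argc; i++) {
		if (!strcmp(argv[i], "--temp")) {
			o->to_tempfile = 1;
		} else if (!strcmp(argv[i], "--no-temp")) {
			o->to_tempfile = 0;
		} else if (!strcmp(argv[i], "--stage=all")) {
			o->stage = CHECKOUT_ALL_STAGES; /* no side effect on to_tempfile */
		} else if (!strncmp(argv[i], "--stage=", 8)) {
			char c = argv[i][8];

			if (c < '1' || c > '3' || argv[i][9] != '\0')
				return -1;
			o->stage = c - '0';
		} else {
			return -1;
		}
	}

	/* Resolve the implication only now that all options are known. */
	if (o->stage == CHECKOUT_ALL_STAGES) {
		if (o->to_tempfile == 0) {
			fprintf(stderr, "--stage=all and --no-temp are incompatible\n");
			return -1;
		}
		o->to_tempfile = 1;
	}
	if (o->to_tempfile < 0)
		o->to_tempfile = 0;
	return 0;
}
```

With this shape, "--stage=all --stage=2" ends up with `to_tempfile == 0`, and "--stage=all --no-temp" is rejected instead of failing later with "<path> already exists".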
Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 5 Sep 2023 21:38:56 +0000 (14:38 -0700)]
Merge branch 'jk/test-lsan-denoise-output'
Tests run with LSan sometimes emit a harmless message that makes
our tests unnecessarily flaky; we work around it by filtering out
the uninteresting output.
Junio C Hamano [Tue, 5 Sep 2023 21:38:56 +0000 (14:38 -0700)]
Merge branch 'tb/mark-more-tests-as-leak-free'
Tests that are known to pass with LSan are now marked as such.
* tb/mark-more-tests-as-leak-free:
leak tests: mark t5583-push-branches.sh as leak-free
leak tests: mark t3321-notes-stripspace.sh as leak-free
leak tests: mark a handful of tests as leak-free
It may be tempting to leave the help text NULL for a command line
option that is either hidden or too obvious, but "git subcmd -h"
and "git subcmd --help-all" would segfault if that were done. Now
the help text is optional.
* rs/parse-options-help-text-is-optional:
parse-options: allow omitting option help text
Instead of generating a silly-looking `Revert "Revert "foo""`, make it
a more humane `Reapply "foo"`.
This is done for two reasons:
- To cover the actually common case of just a double revert.
- To encourage people to rewrite summaries of recursive reverts by
setting an example (a subsequent commit will also do this explicitly
in the documentation).
To achieve these goals, the mechanism does not need to be particularly
sophisticated. Therefore, more complicated alternatives which would
"compress more efficiently" have not been implemented.
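A minimal sketch of the rule, assuming the input is the subject of the commit being reverted. `revert_subject()` is a hypothetical helper, not the sequencer's actual code; it special-cases only the single `Revert "X"` shape, in the same unsophisticated spirit described above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the subject for reverting a commit whose subject is `subject`.
 * Writes into buf (size n); returns 0 on success, -1 if it didn't fit.
 */
static int revert_subject(const char *subject, char *buf, size_t n)
{
	static const char prefix[] = "Revert \"";
	size_t plen = sizeof(prefix) - 1;
	size_t slen = strlen(subject);
	int ret;

	if (slen > plen + 1 && !strncmp(subject, prefix, plen) &&
	    subject[slen - 1] == '"')
		/* Revert "X" -> Reapply "X" (the closing quote is reused) */
		ret = snprintf(buf, n, "Reapply \"%s", subject + plen);
	else
		ret = snprintf(buf, n, "Revert \"%s\"", subject);
	return (ret < 0 || (size_t)ret >= n) ? -1 : 0;
}
```

Reverting `foo` gives `Revert "foo"`; reverting that commit gives `Reapply "foo"` instead of `Revert "Revert "foo""`.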
Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Fri, 1 Sep 2023 18:26:28 +0000 (11:26 -0700)]
Merge branch 'ob/format-patch-description-file'
"git format-patch" learns a way to feed cover letter description,
that (1) can be used on a detached HEAD, where no branch
description is available, and (2) can also override the branch
description if there is one.
Junio C Hamano [Fri, 1 Sep 2023 18:26:28 +0000 (11:26 -0700)]
Merge branch 'jk/diff-result-code-cleanup'
"git diff --no-such-option" and other corner cases around the exit
status of the "diff" command have been corrected.
* jk/diff-result-code-cleanup:
diff: drop useless "status" parameter from diff_result_code()
diff: drop useless return values in git-diff helpers
diff: drop useless return from run_diff_{files,index} functions
diff: die when failing to read index in git-diff builtin
diff: show usage for unknown builtin_diff_files() options
diff-files: avoid negative exit value
diff: spell DIFF_INDEX_CACHED out when calling run_diff_index()
Eric Wong [Fri, 1 Sep 2023 02:09:28 +0000 (02:09 +0000)]
treewide: fix various bugs w/ OpenSSL 3+ EVP API
The OpenSSL 3+ EVP API for SHA-* cannot support our prior use cases
supported by other SHA-* implementations. It has the following
differences:
1. ->init_fn is required before all use
2. struct assignments don't work and require ->clone_fn
3. can't support ->update_fn after ->final_*fn
While fixing cases 1 and 2 is merely a matter of calling ->init_fn and
->clone_fn as appropriate, fixing case 3 requires calling ->final_*fn on
a temporary context that's cloned from the primary context.
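The fix for case 3 can be sketched with a toy context. This uses 64-bit FNV-1a, not SHA and not OpenSSL; the `toy_*` functions are hypothetical and only imitate the EVP restrictions (init before use, no update after final). With real EVP contexts, the clone step would be `EVP_MD_CTX_copy_ex()` rather than struct assignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_ctx {
	uint64_t state;
	int initialized;
	int finalized;
};

static void toy_init(struct toy_ctx *c)
{
	c->state = 14695981039346656037ULL; /* FNV-1a 64-bit offset basis */
	c->initialized = 1;
	c->finalized = 0;
}

/* Struct copy is fine for this toy; EVP contexts need EVP_MD_CTX_copy_ex(). */
static void toy_clone(struct toy_ctx *dst, const struct toy_ctx *src)
{
	*dst = *src;
}

static void toy_update(struct toy_ctx *c, const void *data, size_t len)
{
	const unsigned char *p = data;

	assert(c->initialized && !c->finalized); /* EVP-style restriction */
	for (size_t i = 0; i < len; i++) {
		c->state ^= p[i];
		c->state *= 1099511628211ULL; /* FNV-1a 64-bit prime */
	}
}

static uint64_t toy_final(struct toy_ctx *c)
{
	assert(c->initialized && !c->finalized);
	c->finalized = 1;
	return c->state;
}

/* Case 3: finalize a temporary clone so the primary context stays usable. */
static uint64_t toy_peek(const struct toy_ctx *c)
{
	struct toy_ctx tmp;

	toy_clone(&tmp, c);
	return toy_final(&tmp);
}
```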
Reported-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://lore.kernel.org/ZPCL11k38PXTkFga@debian.me/ Helped-by: brian m. carlson <sandals@crustytoothpaste.net> Fixes: 3e440ea0aba0 ("sha256: avoid functions deprecated in OpenSSL 3+") Fixes: bda9c12073e7 ("avoid SHA-1 functions deprecated in OpenSSL 3+") Signed-off-by: Eric Wong <e@80x24.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 31 Aug 2023 06:23:20 +0000 (02:23 -0400)]
lower core.maxTreeDepth default to 2048
On my Linux system, all of our recursive tree walking algorithms can run
up to the 4096 default limit without segfaulting. But not all platforms
will have stack sizes as generous (nor might even Linux if we kick off a
recursive walk within a thread).
In particular, several of the tests added in the previous few commits
fail in our Windows CI environment. Through some guess-and-check
pushing, I found that 3072 is still too much, but 2048 is OK.
These are obviously vague heuristics, and there is nothing to promise
that another system might not have trouble at even lower values. But it
seems unlikely anybody will be too angry about a 2048-depth limit (this
is close to the default max-pathname limit on Linux even for a
pathological path like "a/a/a/..."). So let's just lower it.
Some alternatives are:
- configure separate defaults for Windows versus other platforms.
- just skip the tests on Windows. This leaves Windows users with the
annoying case that they can be crashed by running out of stack
space, but there shouldn't be any security implications (they can't
go deep enough to hit integer overflow problems).
Since the original default was arbitrary, it seems less confusing to
just lower it, keeping behavior consistent across platforms.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 31 Aug 2023 06:22:08 +0000 (02:22 -0400)]
tree-diff: respect max_allowed_tree_depth
When diffing trees, we recurse to handle subtrees. That means we may run
out of stack space and segfault. Let's teach this code path about
core.maxTreeDepth in order to fail more gracefully.
As with the previous patch, we have no way to return an error (and other
tree-loading problems would just cause us to die()). So we'll likewise
call die() if we exceed the maximum depth.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 31 Aug 2023 06:22:03 +0000 (02:22 -0400)]
list-objects: respect max_allowed_tree_depth
The tree traversal in list-objects.c, which is used by "rev-list
--objects", etc., uses recursion and may run out of stack space. Let's
teach it about the new core.maxTreeDepth config option.
We unfortunately can't return an error here, as this code doesn't
produce an error return at all. We'll die() instead, which matches the
behavior when we see an otherwise broken tree.
Note that this will also generally reject such deep trees from entering
the repository from a fetch or push, due to the use of rev-list in the
connectivity check. But it's not foolproof! We stop traversing when we
see an UNINTERESTING object, and the connectivity check marks existing
ref tips as UNINTERESTING. So imagine commit X has a tree
with maximum depth N. If you then create a new commit Y with a tree
entry "Y:subdir" that points to "X^{tree}", then the depth of Y will be
N+1. But a connectivity check running "git rev-list --objects Y --not X"
won't realize that; it will stop traversing at X^{tree}, since that was
already reachable.
So this will stop naive pushes of too-deep trees, but not carefully
crafted malicious ones. Doing it robustly and efficiently would require
caching the maximum depth of each tree (i.e., the longest path to any
leaf entry). That's much more complex and not strictly needed. If each
recursive algorithm limits itself already, then that's sufficient.
Blocking the objects from entering the repo would be a nice
belt-and-suspenders addition, but it's not worth the extra cost.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
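The X/Y scenario can be modeled in a few lines. `struct toy_tree` is a hypothetical in-memory model with one subtree per level, not Git's object representation; `visible_depth()` mimics a traversal that stops at UNINTERESTING trees, while `true_depth()` ignores the marks.

```c
#include <assert.h>
#include <stddef.h>

struct toy_tree {
	struct toy_tree *subdir; /* NULL for a leaf tree */
	int uninteresting;       /* already reachable from an existing ref */
};

/* Depth seen by a walk that stops at UNINTERESTING trees. */
static int visible_depth(const struct toy_tree *t)
{
	if (!t || t->uninteresting)
		return 0;
	return 1 + visible_depth(t->subdir);
}

/* Actual depth, ignoring the UNINTERESTING marks. */
static int true_depth(const struct toy_tree *t)
{
	return t ? 1 + true_depth(t->subdir) : 0;
}
```

If X's tree is a chain of N = 3 levels (all marked UNINTERESTING) and Y's tree has one entry pointing at it, `true_depth` reports 4 (N+1) but `visible_depth` reports only 1.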
Jeff King [Thu, 31 Aug 2023 06:21:55 +0000 (02:21 -0400)]
read_tree(): respect max_allowed_tree_depth
The read_tree() function reads trees recursively (via its read_tree_at()
helper). This can cause it to run out of stack space on very deep trees.
Let's teach it about the new core.maxTreeDepth option.
The easiest way to demonstrate this is via "ls-tree -r", which the test
covers. Note that I needed a tree depth of ~30k to trigger a segfault on
my Linux system, not the 4100 used by our "big" test in t6700. However,
that test still tells us what we want: that the default 4096 limit is
enough to prevent segfaults on all platforms. We could bump it, but that
increases the cost of the test setup for little gain.
As an interesting side note: when I originally wrote this patch about 4
years ago, I needed a depth of ~50k to segfault. But porting it forward,
the number is much lower. Seemingly little things like cf0983213c (hash:
add an algo member to struct object_id, 2021-04-26) take it from 32,722
to 29,080.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 31 Aug 2023 06:21:40 +0000 (02:21 -0400)]
traverse_trees(): respect max_allowed_tree_depth
The tree-walk.c code walks trees recursively, and may run out of stack
space. The easiest way to see this is with git-archive; on my 64-bit
Linux system it runs out of stack trying to generate a tarfile with a
tree depth of 13,772.
I've picked 4100 as the depth for our "big" test. I ran it with a much
higher value to confirm that we do get a segfault without this patch.
But really anything over 4096 is sufficient for its stated purpose,
which is to find out if our default limit of 4096 is low enough to
prevent segfaults on all platforms. Keeping it small saves us time on
the test setup.
The tree-walk code that's touched here underlies unpack_trees(), so this
protects any programs which use it, not just git-archive (but archive is
easy to test, and was what alerted me to this issue in a real-world
case).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 31 Aug 2023 06:21:00 +0000 (02:21 -0400)]
add core.maxTreeDepth config
Most of our tree traversal algorithms use recursion to visit sub-trees.
For pathologically large trees, this can cause us to run out of stack
space and abort in an uncontrolled way. Let's put our own limit here so
that we can fail gracefully rather than segfaulting.
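A sketch of a recursive walk that enforces such a limit. `max_allowed_tree_depth` mirrors the variable name used in the following commits, but `struct toy_tree`, `walk_tree()` and the error text here are hypothetical and illustrative; Git's converted code paths die() rather than returning -1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the value read from core.maxTreeDepth. */
static int max_allowed_tree_depth = 2048;

struct toy_tree {
	struct toy_tree **children;
	size_t nr;
};

/*
 * Count trees reachable from t, where `depth` counts trees from the root
 * down to t inclusive (call with depth = 1). Returns -1 past the limit.
 */
static long walk_tree(const struct toy_tree *t, int depth)
{
	long total = 1;

	if (depth > max_allowed_tree_depth) {
		fprintf(stderr, "exceeded maximum allowed tree depth\n");
		return -1;
	}
	for (size_t i = 0; i < t->nr; i++) {
		long sub = walk_tree(t->children[i], depth + 1);

		if (sub < 0)
			return -1;
		total += sub;
	}
	return total;
}
```

The check costs one comparison per level and turns a stack overflow into an ordinary error.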
In similar cases where we recursed along the commit graph, we rewrote
the algorithms to avoid recursion and keep any stack data on the heap.
But the commit graph is meant to grow without bound, whereas it's not an
imposition to put a limit on the maximum size of tree we'll handle.
And this has a bonus side effect: coupled with a limit on individual
tree entry names, this limits the total size of a path we may encounter.
This gives us an extra protection against code handling long path names
which may suffer from integer overflows in the size (which could then be
exploited by malicious trees).
The default of 4096 is set to be much longer than anybody would care
about in the real world. Even with single-letter interior tree names
(like "a/b/c"), such a path is at least 8191 bytes. While most operating
systems will let you create such a path incrementally, trying to
reference the whole thing in a system call (as Git would do when
actually trying to access it) will result in ENAMETOOLONG. Coupled with
the recent fsck.largePathname warning, the maximum total pathname Git
will handle is (by default) 16MB.
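The figures above follow from simple arithmetic, assuming one-byte interior tree names for the shortest case and the default 4096-byte per-name limit of fsck.largePathname for the largest case:

```latex
\begin{aligned}
L(d) &= \underbrace{d}_{\text{one-byte names}} + \underbrace{(d-1)}_{\text{slashes}} = 2d - 1,
  \qquad L(4096) = 8191 \text{ bytes} \\
L_{\max} &\approx 4096 \text{ levels} \times 4096 \text{ bytes per name} = 2^{24} \text{ bytes} = 16\ \text{MiB}
\end{aligned}
```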
This config option doesn't do anything yet; future patches will convert
various algorithms to respect the limit.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 31 Aug 2023 06:20:01 +0000 (02:20 -0400)]
fsck: detect very large tree pathnames
In general, Git tries not to arbitrarily limit what it will store, and
there are currently no limits at all on the size of the path we find in
a tree. In theory you could have one that is gigabytes long.
But in practice this freedom is not really helping anybody, and is
potentially harmful:
1. Most operating systems have much lower limits for the size of a
single pathname component (e.g., on Linux you'll generally get
ENAMETOOLONG for anything over 255 bytes). And while you _can_ use
Git in a way that never touches the filesystem (manipulating the
index and trees directly), it's still probably not a good idea to
have gigantic tree names. Many operations load and traverse them,
so any clever Git-as-a-database scheme is likely to perform poorly
in that case.
2. We still have a lot of code which assumes strings are reasonably
sized, and I won't be at all surprised if you can trigger some
interesting integer overflows with gigantic pathnames. Stopping
malicious trees from entering the repository provides an extra line
of defense, protecting downstream code.
This patch implements an fsck check so that such trees can be rejected
by transfer.fsckObjects. I've picked a reasonably high maximum length
here (4096 bytes) that hopefully should not bother anybody in practice.
I've also made it configurable, as an escape hatch.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 31 Aug 2023 06:19:22 +0000 (02:19 -0400)]
tree-walk: rename "error" variable
The "error" variable in traverse_trees() shadows the global error()
function (meaning we can't call error() from here). Let's call the local
variable "ret" instead, which matches the idiom in other functions.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 31 Aug 2023 06:19:16 +0000 (02:19 -0400)]
tree-walk: drop MAX_TRAVERSE_TREES macro
Since the previous commit dropped the hard-coded limit in
traverse_trees(), we don't need this macro there anymore (the code can
handle any number of trees in parallel).
We do define MAX_UNPACK_TREES using MAX_TRAVERSE_TREES, due to 5290d45134 (tree-walk.c: break circular dependency with unpack-trees,
2020-02-01). So we can just directly define that as "8" now; we know
traverse_trees() can handle whatever we throw at it.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 31 Aug 2023 06:17:54 +0000 (02:17 -0400)]
tree-walk: reduce stack size for recursive functions
The traverse_trees() and traverse_trees_recursive() functions call each
other recursively. In a deep tree, this can result in running out of
stack space and crashing.
There's obviously going to be some limit here based on available stack,
but the problem is exacerbated by a few large structs, many of which we
over-allocate. For example, in traverse_trees() we store a name_entry
and tree_desc_x per tree, both of which contain an object_id (which is
now 32 bytes). And we allocate 8 of them (from MAX_TRAVERSE_TREES), even
though many traversals will only look at 1 or 2.
Interestingly, we used to allocate these on the heap, prior to 8dd40c0472 (traverse_trees(): use stack array for name entries,
2020-01-30). That commit was trying to simplify away allocation size
computations, and naively assumed that the sizes were small enough not
to matter. And they don't in normal cases, but on my stock Debian system
I see a crash running "git archive" on a tree nested ~3600 levels deep.
That's deep enough that we wouldn't see it in practice, but probably shallow
enough that we'd prefer not to make it a hard limit. Especially because
other systems may have even smaller stacks.
We can replace these stack variables with a few malloc invocations. This
reduces the stack sizes for the two functions from 1128 and 752 bytes,
respectively, down to 40 and 92 bytes. That allows a depth of ~13000 on
my machine (the improvement isn't in linear proportion because my
numbers don't count the size of parameters and other function overhead).
The possible downsides are:
1. We now have to remember to free(). But both functions have an easy
single exit (and already had to clean up other bits anyway).
2. The extra malloc()/free() overhead might be measurable. I tested
this by setting up a 3000-depth tree with a single blob and running
"git archive" on it. After switching to the heap, it consistently
runs 2-3% faster! Presumably this is because the 1K+ of wasted
stack space penalized memory caches.
On a more real-world case like linux.git, the speed difference isn't
measurable at all, simply because most trees aren't that deep and
there's so much other work going on (like accessing the objects
themselves). So the improvement I saw should be taken as evidence that
we're not making anything worse, but isn't really that interesting on
its own. The main motivation here is that we're now less likely to run
out of stack space and crash.
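The stack-to-heap change can be sketched as follows. The structs are hypothetical stand-ins sized like a 32-byte object_id plus a little metadata, not Git's actual name_entry or tree_desc_x. Each recursive frame keeps only a pointer, allocates exactly `n` entries instead of a fixed 8, and frees them on a single exit path.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_oid { unsigned char hash[32]; };
struct sketch_entry { struct sketch_oid oid; const char *path; unsigned mode; };

/*
 * Recurse `depth` levels over `n` parallel trees. Returns the number of
 * levels visited, or -1 on allocation failure.
 */
static int sketch_walk(size_t n, int depth)
{
	struct sketch_entry *entry;
	int ret;

	if (depth <= 0)
		return 0;
	entry = calloc(n, sizeof(*entry)); /* only n entries, not a fixed 8 */
	if (n && !entry)
		return -1;
	for (size_t i = 0; i < n; i++)
		entry[i].mode = 040000; /* pretend every entry is a subtree */

	ret = sketch_walk(n, depth - 1);
	if (ret >= 0)
		ret++;

	free(entry); /* single exit point: one place to clean up */
	return ret;
}
```

The trade-off matches the one described above: one malloc/free pair per level in exchange for a frame that no longer carries a worst-case array.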
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 31 Aug 2023 21:17:33 +0000 (17:17 -0400)]
format-patch: use OPT_STRING_LIST for to/cc options
The to_callback() and cc_callback() functions are identical to the
generic parse_opt_string_list() function (except that they don't handle
optional arguments, but that's OK because their callers do not use the
OPTARG flag).
Let's simplify the code by using OPT_STRING_LIST.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>