Jeff King [Thu, 31 Aug 2023 06:23:20 +0000 (02:23 -0400)]
lower core.maxTreeDepth default to 2048
On my Linux system, all of our recursive tree walking algorithms can run
up to the 4096 default limit without segfaulting. But not all platforms
will have stack sizes as generous (nor might even Linux if we kick off a
recursive walk within a thread).
In particular, several of the tests added in the previous few commits
fail in our Windows CI environment. Through some guess-and-check
pushing, I found that 3072 is still too much, but 2048 is OK.
These are obviously vague heuristics, and there is nothing to promise
that another system might not have trouble at even lower values. But it
seems unlikely anybody will be too angry about a 2048-depth limit (this
is close to the default max-pathname limit on Linux even for a
pathological path like "a/a/a/..."). So let's just lower it.
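The arithmetic behind that comparison can be sketched in C. It assumes every component is a single byte, so a path nested d levels deep needs d names plus d - 1 slashes. The reference point, a PATH_MAX of 4096 bytes that counts the terminating NUL, is the usual Linux value and is not stated in this commit.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Shortest possible path for a tree nested `depth` levels deep, assuming
 * every component is a single byte ("a/a/a/..."): `depth` names plus
 * `depth - 1` separating slashes.
 */
static size_t min_path_len(size_t depth)
{
	return depth ? 2 * depth - 1 : 0;
}
```

With d = 2048 this gives 4095 bytes, so the path plus its NUL just fits in 4096. The old default of 4096 gives 8191 bytes.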
Some alternatives are:
- configure separate defaults for Windows versus other platforms.
- just skip the tests on Windows. This leaves Windows users with the
annoying case that they can be crashed by running out of stack
space, but there shouldn't be any security implications (they can't
go deep enough to hit integer overflow problems).
Since the original default was arbitrary, it seems less confusing to
just lower it, keeping behavior consistent across platforms.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 31 Aug 2023 06:22:08 +0000 (02:22 -0400)]
tree-diff: respect max_allowed_tree_depth
When diffing trees, we recurse to handle subtrees. That means we may run
out of stack space and segfault. Let's teach this code path about
core.maxTreeDepth in order to fail more gracefully.
As with the previous patch, we have no way to return an error (and other
tree-loading problems would just cause us to die()). So we'll likewise
call die() if we exceed the maximum depth.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 31 Aug 2023 06:22:03 +0000 (02:22 -0400)]
list-objects: respect max_allowed_tree_depth
The tree traversal in list-objects.c, which is used by "rev-list
--objects", etc, uses recursion and may run out of stack space. Let's
teach it about the new core.maxTreeDepth config option.
We unfortunately can't return an error here, as this code doesn't
produce an error return at all. We'll die() instead, which matches the
behavior when we see an otherwise broken tree.
Note that this will also generally reject such deep trees from entering
the repository from a fetch or push, due to the use of rev-list in the
connectivity check. But it's not foolproof! We stop traversing when we
see an UNINTERESTING object, and the connectivity check marks existing
ref tips as UNINTERESTING. So imagine commit X has a tree
with maximum depth N. If you then create a new commit Y with a tree
entry "Y:subdir" that points to "X^{tree}", then the depth of Y will be
N+1. But a connectivity check running "git rev-list --objects Y --not X"
won't realize that; it will stop traversing at X^{tree}, since that was
already reachable.
So this will stop naive pushes of too-deep trees, but not carefully
crafted malicious ones. Doing it robustly and efficiently would require
caching the maximum depth of each tree (i.e., the longest path to any
leaf entry). That's much more complex and not strictly needed. If each
recursive algorithm limits itself already, then that's sufficient.
Blocking the objects from entering the repo would be a nice
belt-and-suspenders addition, but it's not worth the extra cost.
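The robust alternative mentioned above, caching each tree's maximum depth, can be sketched as a memoized walk over an in-memory DAG. The struct tree_node type and the tree_max_depth() helper are hypothetical illustrations, not Git's data structures. The sketch itself recurses, so a real implementation would still need a depth limit or an explicit stack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory tree node; real Git trees are objects in the odb. */
struct tree_node {
	struct tree_node **subtrees; /* child trees (blobs are leaves, omitted) */
	size_t nr_subtrees;
	int cached_depth;            /* -1 until computed */
};

/*
 * Depth of the longest chain of trees rooted at `t` (a tree with no
 * subtrees has depth 1). Caching the result means a subtree shared by
 * several parents is walked only once, and a new parent that points at
 * an already-known tree still sees that tree's full depth.
 */
static int tree_max_depth(struct tree_node *t)
{
	int best = 0;
	size_t i;

	if (t->cached_depth >= 0)
		return t->cached_depth;
	for (i = 0; i < t->nr_subtrees; i++) {
		int d = tree_max_depth(t->subtrees[i]);
		if (d > best)
			best = d;
	}
	t->cached_depth = best + 1;
	return t->cached_depth;
}
```

Suppose X's tree already has cached depth N. A new root that points at it gets N+1 from the cache without re-walking X. This is exactly the case the UNINTERESTING cut-off misses.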
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 31 Aug 2023 06:21:55 +0000 (02:21 -0400)]
read_tree(): respect max_allowed_tree_depth
The read_tree() function reads trees recursively (via its read_tree_at()
helper). This can cause it to run out of stack space on very deep trees.
Let's teach it about the new core.maxTreeDepth option.
The easiest way to demonstrate this is via "ls-tree -r", which the test
covers. Note that I needed a tree depth of ~30k to trigger a segfault on
my Linux system, not the 4100 used by our "big" test in t6700. However,
that test still tells us what we want: that the default 4096 limit is
enough to prevent segfaults on all platforms. We could bump it, but that
increases the cost of the test setup for little gain.
As an interesting side-note: when I originally wrote this patch about 4
years ago, I needed a depth of ~50k to segfault. But porting it forward,
the number is much lower. Seemingly little things like cf0983213c (hash:
add an algo member to struct object_id, 2021-04-26) take it from 32,722
to 29,080.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 31 Aug 2023 06:21:40 +0000 (02:21 -0400)]
traverse_trees(): respect max_allowed_tree_depth
The tree-walk.c code walks trees recursively, and may run out of stack
space. The easiest way to see this is with git-archive; on my 64-bit
Linux system it runs out of stack trying to generate a tarfile with a
tree depth of 13,772.
I've picked 4100 as the depth for our "big" test. I ran it with a much
higher value to confirm that we do get a segfault without this patch.
But really anything over 4096 is sufficient for its stated purpose,
which is to find out if our default limit of 4096 is low enough to
prevent segfaults on all platforms. Keeping it small saves us time on
the test setup.
The tree-walk code that's touched here underlies unpack_trees(), so this
protects any programs which use it, not just git-archive (but archive is
easy to test, and was what alerted me to this issue in a real-world
case).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 31 Aug 2023 06:21:00 +0000 (02:21 -0400)]
add core.maxTreeDepth config
Most of our tree traversal algorithms use recursion to visit sub-trees.
For pathologically large trees, this can cause us to run out of stack
space and abort in an uncontrolled way. Let's put our own limit here so
that we can fail gracefully rather than segfaulting.
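Here is a minimal sketch of the idea, using a hypothetical in-memory tree_node type. The recursive walker carries its current depth and returns an error once it exceeds a limit, instead of recursing until the stack is exhausted. The variable name max_allowed_tree_depth follows the commit subjects in this series. Everything else is illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for the limit configured by core.maxTreeDepth. */
static int max_allowed_tree_depth = 4096;

struct tree_node {
	struct tree_node **subtrees;
	size_t nr_subtrees;
};

/*
 * Visit every tree below `t` (the root is depth 1), counting them in
 * *count. Returns 0 on success, or -1 as soon as the nesting exceeds
 * max_allowed_tree_depth, rather than running out of stack.
 */
static int walk_tree(const struct tree_node *t, int depth, size_t *count)
{
	size_t i;

	if (depth > max_allowed_tree_depth) {
		fprintf(stderr, "exceeded maximum allowed tree depth\n");
		return -1;
	}
	(*count)++;
	for (i = 0; i < t->nr_subtrees; i++)
		if (walk_tree(t->subtrees[i], depth + 1, count) < 0)
			return -1;
	return 0;
}
```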
In similar cases where we recursed along the commit graph, we rewrote
the algorithms to avoid recursion and keep any stack data on the heap.
But the commit graph is meant to grow without bound, whereas it's not an
imposition to put a limit on the maximum size of tree we'll handle.
And this has a bonus side effect: coupled with a limit on individual
tree entry names, this limits the total size of a path we may encounter.
This gives us an extra protection against code handling long path names
which may suffer from integer overflows in the size (which could then be
exploited by malicious trees).
The default of 4096 is set to be much longer than anybody would care
about in the real world. Even with single-letter interior tree names
(like "a/b/c"), such a path is at least 8191 bytes. While most operating
systems will let you create such a path incrementally, trying to
reference the whole thing in a system call (as Git would do when
actually trying to access it) will result in ENAMETOOLONG. Coupled with
the recent fsck.largePathname warning, the maximum total pathname Git
will handle is (by default) 16MB.
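The 16MB figure comes from multiplying the two limits. The hypothetical helper below computes that bound with an explicit size_t overflow check, which is the kind of arithmetic that long-pathname code has to get right. It also counts the separating slashes, so 4096 components of 4096 bytes each come to 16,781,311 bytes, just over 16MiB.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/*
 * Upper bound on a full pathname: `depth` components of at most
 * `max_component` bytes each, plus `depth - 1` slashes. Stores the
 * result in *out and returns 0, or returns -1 if
 * depth * (max_component + 1) would overflow size_t.
 */
static int max_total_path(size_t depth, size_t max_component, size_t *out)
{
	if (!depth) {
		*out = 0;
		return 0;
	}
	/* Safe iff max_component + 1 <= SIZE_MAX / depth. */
	if (max_component >= SIZE_MAX / depth)
		return -1;
	*out = depth * (max_component + 1) - 1;
	return 0;
}
```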
This config option doesn't do anything yet; future patches will convert
various algorithms to respect the limit.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 31 Aug 2023 06:20:01 +0000 (02:20 -0400)]
fsck: detect very large tree pathnames
In general, Git tries not to arbitrarily limit what it will store, and
there are currently no limits at all on the size of the path we find in
a tree. In theory you could have one that is gigabytes long.
But in practice this freedom is not really helping anybody, and is
potentially harmful:
1. Most operating systems have much lower limits for the size of a
single pathname component (e.g., on Linux you'll generally get
ENAMETOOLONG for anything over 255 bytes). And while you _can_ use
Git in a way that never touches the filesystem (manipulating the
index and trees directly), it's still probably not a good idea to
have gigantic tree names. Many operations load and traverse them,
so any clever Git-as-a-database scheme is likely to perform poorly
in that case.
2. We still have a lot of code which assumes strings are reasonably
sized, and I won't be at all surprised if you can trigger some
interesting integer overflows with gigantic pathnames. Stopping
malicious trees from entering the repository provides an extra line
of defense, protecting downstream code.
This patch implements an fsck check so that such trees can be rejected
by transfer.fsckObjects. I've picked a reasonably high maximum length
here (4096 bytes) that hopefully should not bother anybody in practice. I've
also made it configurable, as an escape hatch.
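A per-entry check of this kind can be sketched as follows. The function name and the LARGE_PATHNAME_LIMIT macro are hypothetical. The only detail taken from the text is the 4096 threshold, which the sketch applies to each tree entry name.

```c
#include <assert.h>
#include <string.h>
#include <stddef.h>

/* Hypothetical default mirroring the 4096 threshold described above. */
#define LARGE_PATHNAME_LIMIT 4096

/*
 * Return how many of the `nr` entry names are longer than `limit` bytes.
 * An fsck-style checker would report each one, and with
 * transfer.fsckObjects enabled the tree would then be rejected.
 */
static size_t count_large_names(const char *const *names, size_t nr,
				size_t limit)
{
	size_t i, bad = 0;

	for (i = 0; i < nr; i++)
		if (strlen(names[i]) > limit)
			bad++;
	return bad;
}
```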
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 31 Aug 2023 06:19:22 +0000 (02:19 -0400)]
tree-walk: rename "error" variable
The "error" variable in traverse_trees() shadows the global error()
function (meaning we can't call error() from here). Let's call the local
variable "ret" instead, which matches the idiom in other functions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 31 Aug 2023 06:19:16 +0000 (02:19 -0400)]
tree-walk: drop MAX_TRAVERSE_TREES macro
Since the previous commit dropped the hard-coded limit in
traverse_trees(), we don't need this macro there anymore (the code can
handle any number of trees in parallel).
We do define MAX_UNPACK_TREES using MAX_TRAVERSE_TREES, due to
5290d45134 (tree-walk.c: break circular dependency with unpack-trees,
2020-02-01). So we can just directly define that as "8" now; we know
traverse_trees() can handle whatever we throw at it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 31 Aug 2023 06:17:54 +0000 (02:17 -0400)]
tree-walk: reduce stack size for recursive functions
The traverse_trees() and traverse_trees_recursive() functions call each
other recursively. In a deep tree, this can result in running out of
stack space and crashing.
There's obviously going to be some limit here based on available stack,
but the problem is exacerbated by a few large structs, many of which we
over-allocate. For example, in traverse_trees() we store a name_entry
and tree_desc_x per tree, both of which contain an object_id (which is
now 32 bytes). And we allocate 8 of them (from MAX_TRAVERSE_TREES), even
though many traversals will only look at 1 or 2.
Interestingly, we used to allocate these on the heap, prior to
8dd40c0472 (traverse_trees(): use stack array for name entries,
2020-01-30). That commit was trying to simplify away allocation size
computations, and naively assumed that the sizes were small enough not
to matter. And they don't in normal cases, but on my stock Debian system
I see a crash running "git archive" on a tree nested ~3600 levels deep.
That's deep enough we wouldn't see it in practice, but probably shallow
enough that we'd prefer not to make it a hard limit. Especially because
other systems may have even smaller stacks.
We can replace these stack variables with a few malloc invocations. This
reduces the stack sizes for the two functions from 1128 and 752 bytes,
respectively, down to 40 and 92 bytes. That allows a depth of ~13000 on
my machine (the improvement isn't in linear proportion because my
numbers don't count the size of parameters and other function overhead).
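The stack-to-heap change can be sketched as follows. struct entry_sketch and traverse_sketch() are hypothetical stand-ins for name_entry and traverse_trees(). The point is only that a pointer on the stack plus an exact-size calloc() replaces a fixed array of 8 entries, and that the function frees it at a single exit.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-tree state; 32 bytes stands in for a SHA-256 object_id. */
struct entry_sketch {
	unsigned char oid[32];
	const char *path;
	unsigned mode;
};

/*
 * Process `n` trees in parallel. Instead of a fixed
 * `struct entry_sketch entries[8]` in the stack frame (mostly wasted
 * when n is 1 or 2), only a pointer lives on the stack and exactly n
 * entries are allocated on the heap.
 */
static int traverse_sketch(size_t n)
{
	struct entry_sketch *entries;
	size_t i;

	if (!n)
		return 0;
	entries = calloc(n, sizeof(*entries));
	if (!entries)
		return -1;

	/* ... real code would fill entries[] from each tree and recurse ... */
	for (i = 0; i < n; i++)
		entries[i].mode = 040000; /* pretend every entry is a subtree */

	free(entries);
	return 0;
}
```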
The possible downsides are:
1. We now have to remember to free(). But both functions have an easy
single exit (and already had to clean up other bits anyway).
2. The extra malloc()/free() overhead might be measurable. I tested
this by setting up a 3000-depth tree with a single blob and running
"git archive" on it. After switching to the heap, it consistently
runs 2-3% faster! Presumably this is because the 1K+ of wasted
stack space penalized memory caches.
On a more real-world case like linux.git, the speed difference isn't
measurable at all, simply because most trees aren't that deep and
there's so much other work going on (like accessing the objects
themselves). So the improvement I saw should be taken as evidence that
we're not making anything worse, but isn't really that interesting on
its own. The main motivation here is that we're now less likely to run
out of stack space and crash.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Sat, 19 Aug 2023 23:53:42 +0000 (19:53 -0400)]
fsck: use enum object_type for fsck_walk callback
We switched the function interface for fsck callbacks in a1aad71601
(fsck.h: use "enum object_type" instead of "int", 2021-03-28). However,
we accidentally flipped the type back to "int" as part of 0b4e9013f1
(fsck: mark unused parameters in various fsck callbacks, 2023-07-03).
The mistake happened because that commit was written before a1aad71601
and rebased forward, and I screwed up while resolving the conflict.
Curiously, the compiler does not warn about this mismatch, at least not
when using gcc and clang on Linux (nor in any of our CI environments).
Based on 28abf260a5 (builtin/fsck.c: don't conflate "int" and "enum" in
callback, 2021-06-01), I'd guess that this would cause the AIX xlc
compiler to complain. I noticed because clang-18's UBSan now identifies
mis-matched function calls at runtime, and does complain of this case
when running the test suite.
I'm not entirely clear on whether this mismatch is a problem in
practice. Compilers are certainly free to make enums smaller than "int"
if they don't need the bits, but I suspect that they have to promote
back to int for function calls (though I didn't dig in the standard, and
I won't be surprised if I'm simply wrong and the real-world impact would
depend on the ABI).
Regardless, switching it back to enum is obviously the right thing to do
here; the switch to "int" was simply a mistake.
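The shape of the fix can be sketched as follows: the callback implementation declares its parameter as enum object_type, exactly as the typedef does. The enum values mirror Git's object type numbering, but fsck_walk_func, count_trees() and walk() are hypothetical simplifications. Calling a function through a pointer whose type does not match its definition is what clang's -fsanitize=function check reports at runtime. As the commit notes, whether a compiler also warns at compile time varies.

```c
#include <assert.h>
#include <stddef.h>

/* Object type numbering as in Git's enum object_type (subset). */
enum object_type { OBJ_COMMIT = 1, OBJ_TREE = 2, OBJ_BLOB = 3, OBJ_TAG = 4 };

/* Hypothetical callback type: the parameter is an enum, not an int. */
typedef int (*fsck_walk_func)(void *obj, enum object_type type, void *data);

/* Matches the typedef exactly, so calls through the pointer are well-typed. */
static int count_trees(void *obj, enum object_type type, void *data)
{
	(void)obj;
	if (type == OBJ_TREE)
		(*(int *)data)++;
	return 0;
}

/* Invoke the callback through the function pointer type. */
static int walk(fsck_walk_func fn, enum object_type type, void *data)
{
	return fn(NULL, type, data);
}
```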
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Martin Ågren [Wed, 16 Aug 2023 14:24:35 +0000 (16:24 +0200)]
show-ref doc: fix carets in monospace
When commit 00bf685975 (show-ref doc: update for internal consistency,
2023-05-19) switched from double quotes to backticks around our {caret}
macro, we started rendering "{caret}" literally. Fix this by replacing it
with a "^" character.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Martin Ågren [Wed, 16 Aug 2023 14:24:34 +0000 (16:24 +0200)]
notes doc: tidy up `--no-stripspace` paragraph
Where we document the `--no-stripspace` option, remove a superfluous
"For" to fix the grammar. Mark option names and command names using
`backticks` to set them in monospace.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Martin Ågren [Wed, 16 Aug 2023 14:24:33 +0000 (16:24 +0200)]
notes doc: split up run-on sentences
When commit c4e2aa7d45 (notes.c: introduce "--[no-]stripspace" option,
2023-05-27) mentioned the new `--no-stripspace` in the documentation for
`-m` and `-F`, it created run-on sentences. It also used slightly
different language in the two sections for no apparent reason. Split the
sentences in two to improve readability, and while touching the two
sites, make them more similar.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jiang Xin [Tue, 15 Aug 2023 23:24:56 +0000 (07:24 +0800)]
Merge branch 'master' of github.com:git/git
* 'master' of github.com:git/git: (34 commits)
Git 2.42-rc2
t4053: avoid writing to unopened pipe
t4053: avoid race when killing background processes
Git 2.42-rc1
git maintenance: avoid console window in scheduled tasks on Windows
win32: add a helper to run `git.exe` without a foreground window
t9001: remove excessive GIT_SEND_EMAIL_NOTTY=1
mv: handle lstat() failure correctly
parse-options: disallow negating OPTION_SET_INT 0
repack: free geometry struct
send-email: avoid creating more than one Term::ReadLine object
send-email: drop FakeTerm hack
t0040: declare non-tab indentation to be okay in this script
advice: handle "rebase" in error_resolve_conflict()
A few more topics before -rc1
mailmap: change primary address for Glen Choo
gitignore: ignore clangd .cache directory
docs: update when `git bisect visualize` uses `gitk`
compat/mingw: implement a native locate_in_PATH()
run-command: conditionally define locate_in_PATH()
...
Junio C Hamano [Tue, 15 Aug 2023 17:19:47 +0000 (10:19 -0700)]
Merge branch 'ds/maintenance-on-windows-fix'
Windows updates.
* ds/maintenance-on-windows-fix:
git maintenance: avoid console window in scheduled tasks on Windows
win32: add a helper to run `git.exe` without a foreground window
Jeff King [Sun, 13 Aug 2023 16:24:40 +0000 (12:24 -0400)]
t4053: avoid writing to unopened pipe
This fixes an occasional hang I see when running t4053 with
--verbose-log using dash.
Commit 1e3f26542a (diff --no-index: support reading from named pipes,
2023-07-05) added a test that "diff --no-index" will complain when
comparing a named pipe and a directory. The minimum we need to test this
is to mkfifo the pipe, and then run "git diff --no-index pipe some_dir".
But the test does one thing more: it spawns a background shell process
that opens the pipe for writing, like this:
    {
        (>pipe) &
    } &&
This extra writer _could_ be useful if Git misbehaves and tries to open
the pipe for reading. Without the writer, Git would block indefinitely
and the test would never end. But since we do not have such a bug, Git
does not open the pipe and it is the writing process which will block
indefinitely, since there are no readers. The test addresses this by
running "kill $!" in a test_when_finished block. Since the writer should
be blocking forever, this kill command will reliably find it waiting.
However, this seems to be somewhat racy, in that the writing process
sometimes hangs around even after the "kill". In a normal run of the
test script without options, this doesn't have any effect; the
main test script completes anyway. But with --verbose-log, we spawn a
"tee" process that reads the script output, and it won't end until all
descriptors pointing to its input pipe are closed. And the background
process that is hanging around still has its stderr, etc, pointed into
that pipe.
You can reproduce the situation like this:
    cd t
    ./t4053-diff-no-index.sh --verbose-log --stress
Let that run for a few minutes, and then you'll find that some of the
runs have hung. For example, at 11:53, I ran:
    $ ps xk start o pid,start,command | grep tee | head
    713459 11:48:06 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-9.out
    713527 11:48:06 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-15.out
    719434 11:48:07 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-1.out
    728117 11:48:08 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-5.out
    738738 11:48:09 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-31.out
    739457 11:48:09 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-27.out
    744432 11:48:10 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-21.out
    744471 11:48:10 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-29.out
    761961 11:48:12 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-0.out
    812299 11:48:19 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-8.out
All of these have been hung for several minutes. We can investigate one
and see that it's waiting to get EOF on its input:
    $ strace -p 713459
    strace: Process 713459 attached
    read(0,
    ^C
Who else has that descriptor open?
    $ lsof -a -p 713459 -d 0 +E
    COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
    tee 713459 peff 0r FIFO 0,13 0t0 3943636 pipe 719203,sh,5w 719203,sh,7w 719203,sh,12w 719203,sh,13w
    sh 719203 peff 5w FIFO 0,13 0t0 3943636 pipe 713459,tee,0r 719203,sh,7w 719203,sh,12w 719203,sh,13w
    sh 719203 peff 7w FIFO 0,13 0t0 3943636 pipe 713459,tee,0r 719203,sh,5w 719203,sh,12w 719203,sh,13w
    sh 719203 peff 12w FIFO 0,13 0t0 3943636 pipe 713459,tee,0r 719203,sh,5w 719203,sh,7w 719203,sh,13w
    sh 719203 peff 13w FIFO 0,13 0t0 3943636 pipe 713459,tee,0r 719203,sh,5w 719203,sh,7w 719203,sh,12w
It's a shell, presumably a subshell spawned by the main script. Though
it may seem odd, having the same descriptor open several times is not
unreasonable (they're all basically the original stdout/stderr of the
script that has been copied). And they should all close when the process
exits. So what's it doing? Curiously, it will exit as soon as we strace
it:
- it is blocking in the openat() call for the pipe, as we expect (so
this is definitely the backgrounded subshell mentioned above)
- strace sends signals (probably STOP/CONT); those cause the kernel to
stop blocking, but libc will restart the system call automatically
- by this time, the "pipe" fifo is gone, so we'll actually try to
create a regular file. But of course the surrounding directory is
gone, too! So we get ENOENT, and then exit as normal.
So the blocking is something we expect to happen. But what we didn't
expect is for the process to still exist at all! It should have been
killed earlier when the parent process called "kill", but it wasn't. And
we can't catch the race at this point, because it happened much earlier.
One can guess, though, that there is some race with the shell setting up
the signal handling in the backgrounded subshell, and possibly blocking
or ignoring signals at the time that the "kill" is received. Curiously,
the race does not seem to happen if I use "bash" instead of "dash", so
presumably bash's setup here is more atomic.
One fix might be to try killing the subshell more aggressively, either
using SIGKILL, or looping on kill/wait. But that seems complex and
likely to introduce new problems/races. Instead, we can observe that the
writer is not needed at all. Git will notice the pipe via stat() before
it is ever opened. So we can simply drop the writer subshell entirely.
If we ever changed Git to open the path and fstat() it, this would
result in the test hanging. But we're not likely to do that. After all,
we have to stat() paths to see if they are openable at all (e.g., it
could be a directory), so this seems like a low risk. And anybody who
does make such a change will immediately see the issue, as Git would
hang consistently.
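The property the fix relies on is that stat() can classify a FIFO without opening it. This can be sketched in C with a hypothetical classify_path() helper, which is not Git's code. Unlike open() for reading, stat() does not wait for a writer, so no background writer is needed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Classify a path without opening it, so a FIFO with no writer cannot
 * block us. Returns 1 for a FIFO, 2 for a directory, 0 for anything
 * else, and -1 if stat() fails.
 */
static int classify_path(const char *path)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return -1;
	if (S_ISFIFO(st.st_mode))
		return 1;
	if (S_ISDIR(st.st_mode))
		return 2;
	return 0;
}
```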
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Phillip Wood [Thu, 10 Aug 2023 14:33:13 +0000 (14:33 +0000)]
t4053: avoid race when killing background processes
The test 'diff --no-index reads from pipes' starts a couple of
background processes that write to the pipes that are passed to "diff
--no-index". If the test passes, then we expect these processes to exit
as all their output will have been read. However, if the test fails,
then we want to make sure they do not hang about on the user's machine,
and the test remembers they should be killed by calling
    test_when_finished "! kill $!"
after each background process is created. Unfortunately, there is a
race where test_when_finished may run before the background process
exits, even when all its output has been read, resulting in the kill
command succeeding, which causes the test to fail. Fix this by ignoring
the exit status of the kill command. If the diff is successful, we
could instead wait for the background processes to exit and check their
status, but that feels like it is testing the platform's printf
implementation rather than git's code.
Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 9 Aug 2023 23:18:16 +0000 (16:18 -0700)]
Merge branch 'pw/rebase-skip-commit-message-fix'
"git rebase -i" with a series of squash/fixup, when one of the
steps stopped in conflicts and ended up getting skipped, did not
handle the accumulated commit log messages, which has been
corrected.
* pw/rebase-skip-commit-message-fix:
rebase --skip: fix commit message clean up when skipping squash
Junio C Hamano [Wed, 9 Aug 2023 23:18:15 +0000 (16:18 -0700)]
Merge branch 'ma/locate-in-path-for-windows'
"git bisect visualize" stopped running "gitk" on Git for Windows
when the command was reimplemented in C around the Git 2.34 timeframe.
This has been corrected.
* ma/locate-in-path-for-windows:
docs: update when `git bisect visualize` uses `gitk`
compat/mingw: implement a native locate_in_PATH()
run-command: conditionally define locate_in_PATH()
git maintenance: avoid console window in scheduled tasks on Windows
We just introduced a helper to avoid showing a console window when the
scheduled task runs `git.exe`. Let's actually use it.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
win32: add a helper to run `git.exe` without a foreground window
On Windows, there are two kinds of executables, console ones and
non-console ones. Git's executables are all console ones.
When launching a console executable, e.g. in a scheduled task, a CMD window pops
up. This is not what we want for the tasks installed via the `git
maintenance` command.
To work around this, let's introduce `headless-git.exe`, which is a
non-console program that does _not_ pop up any window. All it does is
re-launch `git.exe`, suppressing that console window and passing through
all command-line arguments as-is.
Helped-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Helped-by: Yuyi Wang <Strawberry_Str@hotmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
t9001: remove excessive GIT_SEND_EMAIL_NOTTY=1
This was added by 3ece9bf0f9 (send-email: clear the $message_id after
validation, 2023-05-17) for no apparent reason, as this is required only
in cases when git's stdin is (must be) redirected, which isn't the case
here.
Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sebastian Thiel [Wed, 9 Aug 2023 07:47:41 +0000 (07:47 +0000)]
mv: handle lstat() failure correctly
When moving a directory onto another with `git mv`, various checks are
performed. One of these validates that the destination does not exist.
When calling `lstat` on the destination path and it fails as the path
doesn't exist, some environments seem to overwrite the passed-in
`stat` memory nonetheless (I observed this issue on Debian 12 on x86_64,
running on OrbStack on ARM, emulated with Rosetta).
This would affect the code that followed, as it would still access a now
modified `st` structure, which now seems to contain uninitialized memory.
`S_ISDIR(st_dir_mode)` would then typically return false, causing the code
to run into a bad case.
The fix avoids overwriting the existing `st` structure, providing an
alternative that exists only for that purpose.
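Here is a minimal sketch of the pattern, using a hypothetical can_move_dir() helper. The source's struct stat is filled in beforehand. The existence probe for the destination goes into a separate scratch buffer, so a failing lstat() cannot clobber the data that later checks depend on.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>

/*
 * `src_st` was filled in earlier by lstat() on the source. Probe `dst`
 * with a separate scratch buffer, then keep relying on `src_st`.
 * Returns 1 if a directory can be moved to the free destination, 0 if
 * the source is not a directory, and -1 if the destination exists or
 * lstat() failed for another reason.
 */
static int can_move_dir(const struct stat *src_st, const char *dst)
{
	struct stat dest_st; /* used only to probe the destination */

	if (lstat(dst, &dest_st) == 0)
		return -1;          /* destination already exists */
	if (errno != ENOENT)
		return -1;          /* some other lstat() failure */
	return S_ISDIR(src_st->st_mode) ? 1 : 0;
}
```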
Note that this patch minimizes complexity instead of stack-frame size.
Signed-off-by: Sebastian Thiel <sebastian.thiel@icloud.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Tue, 8 Aug 2023 20:05:57 +0000 (22:05 +0200)]
parse-options: disallow negating OPTION_SET_INT 0
An option of type OPTION_SET_INT can be defined to set its variable to
zero. Its negated variant will do the same, though, which is
confusing. Several such options were fixed by disabling negation,
changing the value to set or using a different option type:
Jeff King [Tue, 8 Aug 2023 18:50:23 +0000 (14:50 -0400)]
repack: free geometry struct
When the program is ending, we call clear_pack_geometry() to free any
resources in the pack_geometry struct. But the struct itself is
allocated on the heap, and leak-checkers will complain about the
resulting small leak.
This one was marked by Coverity as a "new" leak, though it has existed
since 0fabafd0b9 (builtin/repack.c: add '--geometric' option,
2021-02-22). This might be because recent unrelated changes in the file
confused it about what is new and what is not. But regardless, it is
worth addressing.
We can fix it easily by free-ing the struct. We'll convert our "clear"
function to "free", since the allocation happens in the matching init()
function (though since there is only one call to each, and the struct is
local to this file, it's mostly academic).
Another option would be to put the struct on the stack rather than the
heap. However, this gets tricky, as we check the pointer against NULL in
several places to decide whether we're in geometric mode.
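The init/free pairing can be sketched as follows. struct geometry_sketch, geometry_init() and geometry_free() are hypothetical. The sketch takes a pointer-to-pointer and resets it to NULL. That design choice keeps the "NULL means not in geometric mode" checks meaningful after freeing.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the pack_geometry struct. */
struct geometry_sketch {
	unsigned *pack_sizes;
	size_t nr;
};

/* Allocates the struct itself plus its contents. */
static struct geometry_sketch *geometry_init(size_t nr)
{
	struct geometry_sketch *g = calloc(1, sizeof(*g));

	if (!g)
		return NULL;
	g->pack_sizes = calloc(nr ? nr : 1, sizeof(*g->pack_sizes));
	if (!g->pack_sizes) {
		free(g);
		return NULL;
	}
	g->nr = nr;
	return g;
}

/*
 * The matching "free": release the contents *and* the struct. A mere
 * "clear" that freed only pack_sizes would leak the struct itself.
 */
static void geometry_free(struct geometry_sketch **gp)
{
	if (!*gp)
		return;
	free((*gp)->pack_sizes);
	free(*gp);
	*gp = NULL;
}
```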
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 8 Aug 2023 18:15:31 +0000 (14:15 -0400)]
send-email: avoid creating more than one Term::ReadLine object
Every time git-send-email calls its ask() function to prompt the user,
we call term(), which instantiates a new Term::ReadLine object. But in
v1.46 of Term::ReadLine::Gnu (which provides the Term::ReadLine
interface on some platforms), its constructor refuses to create a second
instance[1]. So on systems with that version of the module, most
git-send-email instances will fail (as we usually prompt for both "to"
and "in-reply-to" unless the user provided them on the command line).
We can fix this by keeping a single instance variable and returning it
for each call to term(). In perl 5.10 and up, we could do that with a
"state" variable. But since we only require 5.008, we'll do it the
old-fashioned way, with a lexical "my" in its own scope.
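Because the rest of this page is C, here is the same lazy single-instance idea as a C analogy, not the actual Perl fix. A function-scoped static pointer stands in for the lexical "my" variable kept in its own scope. struct term_sketch and term() are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Term::ReadLine object. */
struct term_sketch {
	int id;
};

static int terms_created; /* how many times the "constructor" ran */

/*
 * Return the one shared instance, creating it on first use. Every later
 * call hands back the same object instead of constructing a second one.
 */
static struct term_sketch *term(void)
{
	static struct term_sketch *instance;

	if (!instance) {
		instance = malloc(sizeof(*instance));
		if (!instance)
			return NULL;
		instance->id = ++terms_created;
	}
	return instance;
}
```

The instance is intentionally never freed. Like the Perl variable, it lives for the rest of the process.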
Note that the tests in t9001 detect this problem as-is, since the
failure mode is for the program to die. But let's also beef up the
"Prompting works" test to check that it correctly handles multiple
inputs (if we had chosen to keep our FakeTerm hack in the previous
commit, then the failure mode would be incorrectly ignoring prompts
after the first).
[1] For discussion of why multiple instances are forbidden, see:
https://github.com/hirooih/perl-trg/issues/16
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 8 Aug 2023 18:14:36 +0000 (14:14 -0400)]
send-email: drop FakeTerm hack
Back in 280242d1cc (send-email: do not barf when Term::ReadLine does not
like your terminal, 2006-07-02), we added a fallback for when
Term::ReadLine's constructor failed: we'd have a FakeTerm object
instead, which would then die if anybody actually tried to call
readline() on it. Since we instantiated the $term variable at program
startup, we needed this workaround to let the program run in modes when
we did not prompt the user.
But later, in f4dc9432fd (send-email: lazily load modules for a big
speedup, 2021-05-28), we started loading Term::ReadLine lazily only when
ask() is called. So at that point we know we're trying to prompt the
user, and we can just die if ReadLine instantiation fails, rather than
making this fake object to lazily delay showing the error.
This should be OK even if there is no tty (e.g., we're in a cron job),
because Term::ReadLine will return a stub object in that case whose "IN"
and "OUT" functions return undef. And since 5906f54e47 (send-email:
don't attempt to prompt if tty is closed, 2009-03-31), we check for that
case and skip prompting.
And we can be sure that FakeTerm was not kicking in for such a
situation, because it has actually been broken since that commit! It
does not define "IN" or "OUT" methods, so perl would barf with an error.
If FakeTerm was in use, we were neither honoring what 5906f54e47 tried
to do, nor producing the readable message that 280242d1cc intended.
So we're better off just dropping FakeTerm entirely, and letting the
error from constructing Term::ReadLine propagate.
Signed-off-by: Jeff King <peff@peff.net> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t0040: declare non-tab indentation to be okay in this script
By necessity, this script needs to verify that certain Git output
matches expectations, including text indented with spaces instead of
tabs.
Most recently, such a check was introduced in 448abbba6347 (short help:
allow multi-line opthelp, 2023-07-18) which is reported by `git diff
--check 448abbba6347^!` as having whitespace issues.
Let's not complain about this because it is intentional.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
advice: handle "rebase" in error_resolve_conflict()
This makes sure that we get a properly translated message rather than
inserting the command (which we failed to translate) into a generic
fallback message.
The function is called indirectly via die_resolve_conflict() with fixed
strings, and directly with the string obtained via action_name(), which
in turn returns a string from a fixed set. Hence we know that the now
covered set of strings is exhaustive, and the function will therefore BUG()
out when encountering an unexpected string. We also know that all covered strings
are actually used.
Arguably, the above suggests that it would be cleaner to pass the
command as an enum in the first place, but that's left for another time.
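To illustrate the exhaustive-mapping approach, here is a minimal sketch in C. The helper name conflict_message() is hypothetical, the message texts are illustrative rather than git's actual translated strings, and abort() stands in for git's BUG().

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: map each known command name to a complete
 * message, so each message can be translated as a whole instead of
 * splicing an untranslated command name into a generic template.
 */
static const char *conflict_message(const char *me)
{
	if (!strcmp(me, "cherry-pick"))
		return "Cherry-picking is not possible because you have unmerged files.";
	if (!strcmp(me, "commit"))
		return "Committing is not possible because you have unmerged files.";
	if (!strcmp(me, "merge"))
		return "Merging is not possible because you have unmerged files.";
	if (!strcmp(me, "pull"))
		return "Pulling is not possible because you have unmerged files.";
	if (!strcmp(me, "revert"))
		return "Reverting is not possible because you have unmerged files.";
	if (!strcmp(me, "rebase"))
		return "Rebasing is not possible because you have unmerged files.";
	/*
	 * Callers only pass strings from a fixed set, so reaching this
	 * point is a programming error; abort() stands in for BUG().
	 */
	abort();
}
```

Because the set of inputs is closed, there is no generic fallback message left to translate badly.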
Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Mon, 7 Aug 2023 18:57:18 +0000 (11:57 -0700)]
Merge branch 'ew/sha256-gcrypt-leak-fixes'
Leakfixes.
* ew/sha256-gcrypt-leak-fixes:
sha256/gcrypt: die on gcry_md_open failures
sha256/gcrypt: fix memory leak with SHA-256 repos
sha256/gcrypt: fix build with SANITIZE=leak
Junio C Hamano [Mon, 7 Aug 2023 18:57:18 +0000 (11:57 -0700)]
Merge branch 'am/doc-sha256'
Tone down the warning on SHA-256 repositories being an experimental
curiosity. We do not have support for them to interoperate with
traditional SHA-1 repositories, but at this point, we do not plan
to make breaking changes to SHA-256 repositories and there is no
longer a need for such a strongly phrased warning.
* am/doc-sha256:
doc: sha256 is no longer experimental
In at least some versions of clangd, including version 15 in Ubuntu
23.04, a directory, .cache, is written in the root of the repository
with index information about the files in the repository. Since clangd
is the most common language server protocol (LSP) implementation for C,
and we already support it using the GENERATE_COMPILATION_DATABASE flags
to make it functional, it's likely many users are using or will want to
use it.
As a result, ignore the ".cache" directory to help avoid users
accidentally committing the data.
Signed-off-by: brian m. carlson <bk2204@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Fri, 4 Aug 2023 17:52:31 +0000 (10:52 -0700)]
Merge branch 'jc/branch-in-use-error-message'
"git branch -f X" to repoint the branch X said that X was "checked
out" in another worktree, even when branch X was not checked out but
was instead being bisected or rebased. The message was reworded to
say the branch was "in use".
* jc/branch-in-use-error-message:
branch: update the message to refuse touching a branch in-use
Junio C Hamano [Fri, 4 Aug 2023 17:52:31 +0000 (10:52 -0700)]
Merge branch 'jc/parse-options-short-help'
Command line parser fix, and a small parse-options API update.
* jc/parse-options-short-help:
short help: allow a gap smaller than USAGE_GAP
remote: simplify "remote add --tags" help text
short help: allow multi-line opthelp
Junio C Hamano [Fri, 4 Aug 2023 17:52:30 +0000 (10:52 -0700)]
Merge branch 'la/doc-choose-starting-point-fixup'
Clarify how to pick a starting point for a new topic in the
SubmittingPatches document.
* la/doc-choose-starting-point-fixup:
SubmittingPatches: use of older maintenance tracks is an exception
SubmittingPatches: explain why 'next' and above are inappropriate base
SubmittingPatches: choice of base for fixing an older maintenance track
Junio C Hamano [Fri, 4 Aug 2023 17:52:30 +0000 (10:52 -0700)]
Merge branch 'ja/worktree-orphan-fix'
Fix tests with unportable regex patterns.
* ja/worktree-orphan-fix:
t2400: rewrite regex to avoid unintentional PCRE
builtin/worktree.c: convert tab in advice to space
t2400: drop no-op `--sq` from rev-parse call
Junio C Hamano [Fri, 4 Aug 2023 17:52:30 +0000 (10:52 -0700)]
Merge branch 'jc/retire-get-sha1-hex'
The implementation of "get_sha1_hex()" that reads a hexadecimal
string that spells a full object name has been extended to cope
with any hash function used in the repository, but the "sha1" in
its name survived. Rename it to get_hash_hex(), a name that is
more consistent with its friends like get_hash_hex_algop().
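As a rough illustration of why the name no longer needs to mention SHA-1, the sketch below parses a full hex object name whose length depends only on the raw hash size passed in (20 bytes for SHA-1, 32 for SHA-256). parse_hex_hash() is a hypothetical helper, not the get_hash_hex() API itself.

```c
#include <stddef.h>

/* Decode one hex digit; returns -1 for a non-hex character. */
static int hexval_of(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Hypothetical, hash-agnostic parser: reads exactly 2 * rawsz hex
 * digits into out. Returns 0 on success, or -1 as soon as a non-hex
 * character (including a premature NUL) is seen, so it never reads
 * past the end of a short string.
 */
static int parse_hex_hash(const char *hex, unsigned char *out, size_t rawsz)
{
	for (size_t i = 0; i < rawsz; i++) {
		int hi = hexval_of(hex[2 * i]);
		int lo;

		if (hi < 0)
			return -1;
		lo = hexval_of(hex[2 * i + 1]);
		if (lo < 0)
			return -1;
		out[i] = (unsigned char)((hi << 4) | lo);
	}
	return 0;
}
```

The same 40-digit string parses as a SHA-1 name (rawsz 20) but is rejected as a SHA-256 name (rawsz 32), since it runs out of digits.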
Junio C Hamano [Fri, 4 Aug 2023 17:52:29 +0000 (10:52 -0700)]
Merge branch 'la/doc-choose-starting-point'
Clarify how to choose the starting point for a new topic in the
developer guidance document.
* la/doc-choose-starting-point:
SubmittingPatches: simplify guidance for choosing a starting point
SubmittingPatches: emphasize need to communicate non-default starting points
SubmittingPatches: de-emphasize branches as starting points
SubmittingPatches: discuss subsystems separately from git.git
SubmittingPatches: reword awkward phrasing
docs: update when `git bisect visualize` uses `gitk`
The check for whether to use `gitk` has involved more environment
variables than just `DISPLAY` since 508e84a790 (bisect view: check for
MinGW32 and MacOSX in addition to X11, 2008-02-14), so let's update the
documentation accordingly.
Signed-off-by: Matthias Aßhauer <mha1993@live.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 5e1f28d (bisect--helper: reimplement `bisect_visualize()` shell
function in C, 2021-09-13), `git bisect visualize` uses exists_in_PATH()
to check whether it should call `gitk`, but exists_in_PATH() relies on
locate_in_PATH(), which currently only understands POSIX-ish PATH
variables (a list of paths, separated by colons). In native Windows
executables, however, we encounter Windows PATH variables (a list of
paths that often contain drive letters, and thus colons, separated by
semicolons). Luckily, we already have a function that can look up
executables on Windows PATHs: path_lookup(). Implement a small
replacement for the existing locate_in_PATH() based on path_lookup().
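The separator problem can be sketched in C. get_path_entry() is a hypothetical helper, not part of git, and it does not reproduce path_lookup(); it only shows that splitting a Windows PATH on ':' cuts entries at the drive letter, while splitting on ';' keeps them intact.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the n-th entry of a PATH-style list into
 * buf. POSIX PATH uses ':' as the separator; native Windows PATH uses
 * ';' because entries such as "C:\Git\cmd" contain colons themselves.
 * Returns the entry length, or -1 if there is no n-th entry or it does
 * not fit in buf (including the terminating NUL).
 */
static long get_path_entry(const char *path, char sep, size_t n,
			   char *buf, size_t bufsz)
{
	const char *start = path;

	/* Skip the first n entries. */
	while (n > 0) {
		const char *next = strchr(start, sep);
		if (!next)
			return -1;
		start = next + 1;
		n--;
	}

	/* The entry ends at the next separator or at the end of the string. */
	const char *end = strchr(start, sep);
	size_t len = end ? (size_t)(end - start) : strlen(start);

	if (len + 1 > bufsz)
		return -1;
	memcpy(buf, start, len);
	buf[len] = '\0';
	return (long)len;
}
```

With "C:\Git\cmd;C:\Windows", splitting on ';' yields "C:\Git\cmd" as the first entry, while splitting on ':' yields just "C".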
Reported-by: Louis Strous <Louis.Strous@intellimagic.com> Signed-off-by: Matthias Aßhauer <mha1993@live.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit doesn't change any behaviour by itself, but allows us to easily
define compat replacements for locate_in_PATH(). It prepares us for the next
commit that adds a native Windows implementation of locate_in_PATH().
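A minimal sketch of the conditional-definition pattern, assuming hypothetical names (locate_program, compat_locate_program): a port defines a macro naming its replacement, and the generic definition is compiled only when that macro is absent. The function bodies just return a tag showing which implementation was selected.

```c
/*
 * Simulated platform header (hypothetical names): a port that needs
 * its own implementation defines the macro before the generic code
 * is compiled. Removing the #define makes the generic fallback below
 * the one that gets used.
 */
static const char *compat_locate_program(const char *file)
{
	(void)file;
	return "compat";
}
#define locate_program compat_locate_program

/*
 * Generic code: only compiled when no port has already provided a
 * replacement under the same name.
 */
#ifndef locate_program
static const char *locate_program(const char *file)
{
	(void)file;
	return "generic";
}
#endif
```

Callers always write locate_program(); which body runs is decided entirely by the preprocessor, so the generic code needs no platform #ifdefs of its own.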
Signed-off-by: Matthias Aßhauer <mha1993@live.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Phillip Wood [Thu, 3 Aug 2023 13:09:35 +0000 (13:09 +0000)]
rebase --skip: fix commit message clean up when skipping squash
During a series of "fixup" and/or "squash" commands, the interactive
rebase accumulates a commit message from all the commits that are being
squashed together. If one of the commits has conflicts when it is picked
and the user chooses to skip that commit, then we need to remove that
commit's message from the accumulated message. To do this, 15ef69314d5
(rebase --skip: clean up commit message after a failed fixup/squash,
2018-04-27) updated commit_staged_changes() to reset the accumulated
message to the commit message of HEAD (which does not contain the
message from the skipped commit) when the last command was "fixup" or
"squash" and there are no staged changes. Unfortunately the code to do
this contains two bugs.
(1) If parse_head() fails we pass an invalid pointer to
unuse_commit_buffer().
(2) The reconstructed message uses the entire commit buffer from HEAD
including the headers, rather than just the commit message.
The first issue is fixed by splitting up the "if" condition into several
statements each with its own error handling. The second issue is fixed
by finding the start of the commit message within the commit buffer
using find_commit_subject().
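The idea of skipping the header block can be sketched as follows. message_start() is a hypothetical stand-in rather than git's find_commit_subject(), and it assumes headers never contain an empty line (continuation lines of multi-line headers begin with a space).

```c
#include <string.h>

/*
 * Hypothetical helper: return a pointer to the first byte of the
 * commit message, i.e. just past the blank line that ends the header
 * block ("tree ...", "parent ...", "author ...", ...). If there is no
 * blank line, the buffer has no message and a pointer to its
 * terminating NUL is returned.
 */
static const char *message_start(const char *commit_buffer)
{
	const char *sep = strstr(commit_buffer, "\n\n");

	if (!sep)
		return commit_buffer + strlen(commit_buffer);
	return sep + 2;
}
```

Rebuilding the squash message from this pointer, rather than from the start of the buffer, is what keeps "tree"/"parent"/"author" lines out of it.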
The existing test added by 15ef69314d5 is modified to show the effect of
this bug. The bug is triggered when skipping the first command in the
chain (as the test did before this commit), but the effect is hidden
because opts->current_fixup_count is set to zero, which leads
update_squash_messages() to recreate the squash message file from
scratch, overwriting the bad message created by
commit_staged_changes(). The test is also updated to explicitly check
the commit messages rather than relying on grep to ensure they do not
contain any stray commit headers.
To check the commit message the function test_commit_message() is moved
from t3437-rebase-fixup-options.sh to test-lib.sh. As the function is
now publicly available it is updated to provide better error detection
and avoid overwriting the commonly used files "actual" and "expect".
Support for reading the expected commit message from stdin is also
added.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we process a user's name (as in user.name), we strip all
leading and trailing crud from it. Right now, we consider a dot
a crud character, and strip it off.
However, this is unsuitable for many personal names because humans
frequently have abbreviated suffixes, such as "Jr." or "Sr." at the end
of their names, and this corrupts them. Some other users may wish to
use an abbreviated name or initial, which will pose a problem especially
in cultures that write the family name first, followed by the personal
name.
Since the current approach causes lots of practical problems, let's
avoid it by no longer considering a dot to be crud.
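A sketch of name trimming with the dot removed from the crud set. The exact set of crud characters here is an assumption modeled on git's ident.c, and strip_crud() is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/*
 * Characters treated as "crud" around a name (assumed set). The point
 * is that '.' is not in it, so "Jr." keeps its trailing dot.
 */
static int is_crud(unsigned char c)
{
	return c <= 32 || c == ',' || c == ':' || c == ';' || c == '<' ||
	       c == '>' || c == '"' || c == '\\' || c == '\'';
}

/*
 * Strip leading and trailing crud from src, copying the result into
 * dst, which must hold at least strlen(src) + 1 bytes.
 */
static void strip_crud(char *dst, const char *src)
{
	size_t len = strlen(src);

	while (len > 0 && is_crud((unsigned char)*src)) {
		src++;
		len--;
	}
	while (len > 0 && is_crud((unsigned char)src[len - 1]))
		len--;
	memcpy(dst, src, len);
	dst[len] = '\0';
}
```

Surrounding whitespace and quote characters are still removed, but a trailing "." now survives.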
Note that "." in the name forces the entire name to be quoted to
please mailers, but stripping "." only at the beginning and the end
does not help a name with "." in the middle (like "brian m. carlson")
so this change will not make it much worse. A name like "Given
Family, Jr." that did not have to be quoted now would need to be, in
order to be placed on the e-mail headers, though.
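A minimal check for whether a display name needs quoting, using the "specials" from RFC 5322, section 3.2.3. name_needs_quoting() is a hypothetical helper; git's own mail-header code may differ in detail.

```c
#include <string.h>

/*
 * Returns nonzero if the display name contains an RFC 5322 "special"
 * character, i.e. it must be written as a quoted-string in an e-mail
 * header. strcspn() stops at the first special, or at the NUL if
 * there is none.
 */
static int name_needs_quoting(const char *name)
{
	return name[strcspn(name, "()<>[]:;@\\,.\"")] != '\0';
}
```

Because "." is a special, keeping a trailing dot is by itself enough to require quoting.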
This is based on a weather-balloon patch by Jeff King sent in Aug 2021
https://lore.kernel.org/git/YSKm8Q8nyTavQaox@coredump.intra.peff.net/
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 2 Aug 2023 16:37:23 +0000 (09:37 -0700)]
Merge branch 'ah/sequencer-rewrite-todo-fix'
When the user edits "rebase -i" todo file so that it starts with a
"fixup", which would make it invalid, the command truncated the
rest of the file before giving an error and returning control to
the user. Stop truncating to make it easier to correct
such a malformed todo file.
* ah/sequencer-rewrite-todo-fix:
sequencer: finish parsing the todo list despite an invalid first line
Junio C Hamano [Wed, 2 Aug 2023 16:37:23 +0000 (09:37 -0700)]
Merge branch 'ah/autoconf-fixes'
"./configure --with-expat=no" did not work as a way to refuse use
of the expat library on a system with the library installed, which
has been corrected.
* ah/autoconf-fixes:
configure.ac: always save NO_ICONV to config.status
configure.ac: don't overwrite NO_CURL option
configure.ac: don't overwrite NO_EXPAT option
Junio C Hamano [Wed, 2 Aug 2023 16:37:23 +0000 (09:37 -0700)]
Merge branch 'jc/tree-walk-drop-base-offset'
Code simplification.
* jc/tree-walk-drop-base-offset:
tree-walk: drop unused base_offset from do_match()
tree-walk: lose base_offset that is never used in tree_entry_interesting