Junio C Hamano [Fri, 13 Sep 2024 22:26:49 +0000 (15:26 -0700)]
Merge branch 'jk/free-commit-buffer-of-skipped-commits' into maint-2.46
The code forgot to discard unnecessary in-core commit buffer data
for commits that "git log --skip=<number>" traversed but omitted
from the output, which has been corrected.
* jk/free-commit-buffer-of-skipped-commits:
revision: free commit buffers for skipped commits
Junio C Hamano [Thu, 12 Sep 2024 18:02:16 +0000 (11:02 -0700)]
Merge branch 'ps/bundle-outside-repo-fix' into maint-2.46
"git bundle unbundle" outside a repository triggered a BUG()
unnecessarily, which has been corrected.
* ps/bundle-outside-repo-fix:
bundle: default to SHA1 when reading bundle headers
builtin/bundle: have unbundle check for repo before opening its bundle
Junio C Hamano [Thu, 12 Sep 2024 18:02:16 +0000 (11:02 -0700)]
Merge branch 'jc/patch-id' into maint-2.46
The patch parser in "git patch-id" has been tightened to avoid
getting confused by lines that look like a patch header in the log
message.
cf. <Zqh2T_2RLt0SeKF7@tanuki>
* jc/patch-id:
patch-id: tighten code to detect the patch header
patch-id: rewrite code that detects the beginning of a patch
patch-id: make get_one_patchid() more extensible
patch-id: call flush_current_id() only when needed
t4204: patch-id supports various input format
builtin/index-pack: fix segfaults when running outside of a repo
It was reported that git-verify-pack(1) has started to crash with Git
v2.46.0 when run outside of a repository. This is another fallout from c8aed5e8da (repository: stop setting SHA1 as the default object hash,
2024-05-07), where we have stopped setting the default hash algorithm
for `the_repository`. Consequently, code that relies on `the_hash_algo`
will now crash when it hasn't explicitly been initialized, which may be
the case when running outside of a Git repository.
The crash is not in git-verify-pack(1) but instead in git-index-pack(1),
which gets called by the former. Ideally, both of these programs should
be able to identify the hash algorithm used by the packfile and index
without having to rely on external information. But unfortunately, the
format for neither of them is completely self-describing, so it is not
possible to derive that information. This is a design issue that we
should address by introducing a new packfile version that encodes its
object hash.
For now though the more important fix is to not make either of these
programs crash anymore, which we do by falling back to SHA1 when the
object hash is unconfigured. This pessimizes reading packfiles which
use a different hash than SHA1, but restores previous behaviour.
Reported-by: Ilya K <me@0upti.me> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 30 Aug 2024 20:53:31 +0000 (16:53 -0400)]
revision: free commit buffers for skipped commits
In git-log we leave the save_commit_buffer flag set to "1", which tells
the commit parsing code to store the object content after it has parsed
it to find parents, tree, etc. That lets us reuse the contents for
pretty-printing the commit in the output. And then after printing each
commit, we call free_commit_buffer(), since we don't need it anymore.
But some options may cause us to traverse commits which are not part of
the output. And so git-log does not see them at all, and doesn't free
them. One such case is something like:
which will churn through a million commits, before showing only a
thousand. We loop through these inside get_revision(), without freeing
the contents. As a result, we end up storing the object data for those
million commits simultaneously.
We should free the stored buffers (if any) for those commits as we skip
over them, which is what this patch does. Running the above command in
linux.git drops the peak heap usage from ~1.1GB to ~200MB, according to
valgrind/massif. (I thought we might get an even bigger improvement, but
the remaining memory is going to commit/tree structs, which we do hold
on to forever).
Note that this problem doesn't occur if:
- you're running a git-rev-list without a --format parameter; it turns
off save_commit_buffer by default, since it only output the object
id
- you've built a commit-graph file, since in that case we'd use the
optimized graph data instead of the initial parse, and then do a
lazy parse for commits we're actually going to output
There are probably some other option combinations that can likewise
end up with useless stored commit buffers. For example, if you ask for
"foo..bar", then we'll have to walk down to the merge base, and
everything on the "foo" side won't be shown. Tuning the "save" behavior
to handle that might be tricky (I guess maybe drop buffers for anything
we mark as UNINTERESTING?). And in the long run, the right solution here
is probably to make sure the commit-graph is built (since it fixes the
memory problem _and_ drastically reduces CPU usage).
But since this "--skip" case is an easy one-liner, it's worth fixing in
the meantime. It should be OK to make this call even if there is no
saved buffer (e.g., because save_commit_buffer=0, or because a
commit-graph was used), since it's O(1) to look up the buffer and is a
noop if it isn't present. I verified by running the above command after
"git commit-graph write --reachable", and it takes the same time with
and without this patch.
Reported-by: Yuri Karnilaev <karnilaev@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Mon, 26 Aug 2024 18:10:24 +0000 (11:10 -0700)]
Merge branch 'xx/diff-tree-remerge-diff-fix' into maint-2.46
"git rev-list ... | git diff-tree -p --remerge-diff --stdin" should
behave more or less like "git log -p --remerge-diff" but instead it
crashed, forgetting to prepare a temporary object store needed.
* xx/diff-tree-remerge-diff-fix:
diff-tree: fix crash when used with --remerge-diff
Junio C Hamano [Mon, 26 Aug 2024 18:10:18 +0000 (11:10 -0700)]
Merge branch 'tb/config-fixed-value-with-valueless-true' into maint-2.46
"git config --value=foo --fixed-value section.key newvalue" barfed
when the existing value in the configuration file used the
valueless true syntax, which has been corrected.
* tb/config-fixed-value-with-valueless-true:
config.c: avoid segfault with --fixed-value and valueless config
ci(win+VS): download the vcpkg artifacts using a dedicated GitHub Action
The Git for Windows project provides a GitHub Action to download and
cache Azure Pipelines artifacts (such as the `vcpkg` artifacts), hiding
gnarly internals, and also providing some robustness against network
glitches. Let's use it.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch was originally by GitHub's Dependabot, but I cannot attribute
that bot properly because it has no dedicated email address. Probably
because it hasn't reached legal age yet, or something.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Fri, 16 Aug 2024 19:50:56 +0000 (12:50 -0700)]
Merge branch 'ps/p4-tests-updates' into maint-2.46
Perforce tests have been updated.
cf. <na5mwletzpnacietbc7pzqcgb622mvrwgrkjgjosysz3gvjcso@gzxxi7d7icr7>
* ps/p4-tests-updates:
t98xx: mark Perforce tests as memory-leak free
ci: update Perforce version to r23.2
t98xx: fix Perforce tests with p4d r23 and newer
Junio C Hamano [Fri, 16 Aug 2024 19:50:55 +0000 (12:50 -0700)]
Merge branch 'dd/notes-empty-no-edit-by-default' into maint-2.46
"git notes add -m '' --allow-empty" and friends that take prepared
data to create notes should not invoke an editor, but it started
doing so since Git 2.42, which has been corrected.
* dd/notes-empty-no-edit-by-default:
notes: do not trigger editor when adding an empty note
Junio C Hamano [Fri, 16 Aug 2024 19:50:54 +0000 (12:50 -0700)]
Merge branch 'jc/doc-rebase-fuzz-vs-offset-fix' into maint-2.46
"git rebase --help" referred to "offset" (the difference between
the location a change was taken from and the change gets replaced)
incorrectly and called it "fuzz", which has been corrected.
* jc/doc-rebase-fuzz-vs-offset-fix:
doc: difference in location to apply is "offset", not "fuzz"
Junio C Hamano [Fri, 16 Aug 2024 19:50:53 +0000 (12:50 -0700)]
Merge branch 'pw/add-patch-with-suppress-blank-empty' into maint-2.46
"git add -p" by users with diff.suppressBlankEmpty set to true
failed to parse the patch that represents an unmodified empty line
with an empty line (not a line with a single space on it), which
has been corrected.
* pw/add-patch-with-suppress-blank-empty:
add-patch: use normalize_marker() when recounting edited hunk
add-patch: handle splitting hunks with diff.suppressBlankEmpty
Junio C Hamano [Fri, 16 Aug 2024 19:50:51 +0000 (12:50 -0700)]
Merge branch 'jc/checkout-no-op-switch-errors' into maint-2.46
"git checkout --ours" (no other arguments) complained that the
option is incompatible with branch switching, which is technically
correct, but found confusing by some users. It now says that the
user needs to give pathspec to specify what paths to checkout.
* jc/checkout-no-op-switch-errors:
checkout: special case error messages during noop switching
Jeff King [Thu, 15 Aug 2024 15:30:07 +0000 (11:30 -0400)]
t4129: fix racy index when calling chmod after git-add
This patch fixes a racy test failure in t4129.
The deletion test added by e95d515141 (apply: canonicalize modes read
from patches, 2024-08-05) wants to make sure that git-apply does not
complain about a non-canonical mode in the patch, even if that mode does
not match the working tree file. So it does this:
This is wrong, because running chmod will update the ctime on the file,
making it stat-dirty and causing git-apply to refuse to apply the patch.
But this only happens sometimes, since it depends on the timestamps
crossing a second boundary (but it triggers pretty quickly when run with
--stress).
We can fix this by doing the chmod before updating the index. The order
isn't important here, as the mode will be canonicalized to 100644 in the
index anyway (in fact, the chmod is not even that important in the first
place, since git-apply will only look at the index; I only added it as
an extra confirmation that git-apply would not be confused by it).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
bundle: default to SHA1 when reading bundle headers
We hit a segfault when trying to open a bundle via `git bundle
list-heads` when running outside of a repository. This is caused by c8aed5e8da (repository: stop setting SHA1 as the default object hash,
2024-05-07), which stopped setting the default object hash so that
`the_hash_algo` is a `NULL` pointer when running outside of any repo.
This is only a symptom of a deeper issue though. Bundles default to the
SHA1 object format unless they advertise an "@object-format=" header.
Consequently, it has been wrong in the first place to use the object
format used by the current repository when parsing bundles. The
consequence is that trying to open a bundle that uses a different object
hash than the current repository will fail:
Fix the bug by defaulting to the SHA1 object hash. We already handle the
"@object-format=" header as expected, so we don't need to adapt this
part.
Helped-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/bundle: have unbundle check for repo before opening its bundle
The `git bundle unbundle` subcommand requires a repository to unbundle
the contents into. As thus, the subcommand checks whether we have a
startup repository in the first place, and if not it dies.
This check happens after we have already opened the bundle though. This
causes a segfault when running outside of a repository starting with c8aed5e8da (repository: stop setting SHA1 as the default object hash,
2024-05-07) because we have no hash function set up, but we do try to
parse refs advertised by the bundle's header.
The next commit will fix that underlying issue by defaulting to the SHA1
object format for bundles, which will also fix the described segfault here.
But as we know that we will die anyway, we can do better than that and
avoid some vain work by moving the check for a repository before we try
to open the bundle.
Reported-by: ArcticLampyrid <ArcticLampyrid@outlook.com> Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This bug is reported by `log-tree.c:do_remerge_diff`, where a bug check
added in commit 7b90ab467a (log: clean unneeded objects during log
--remerge-diff, 2022-02-02) detects the absence of `remerge_objdir` when
attempting to clean up temporary objects generated during the remerge
process.
After some further digging, I find that the remerge-related diff options
were introduced in db757e8b8d (show, log: provide a --remerge-diff
capability, 2022-02-02), which also affect the setup of `rev_info` for
"git-diff-tree", but were not accounted for in the original
implementation (inferred from the commit message).
Elijah Newren, the author of the remerge diff feature, notes that other
callers of `log-tree.c:log_tree_commit` (the only caller of
`log-tree.c:do_remerge_diff`) also exist, but:
`builtin/am.c`: manually sets all flags; remerge_diff is not among them
`sequencer.c`: manually sets all flags; remerge_diff is not among them
so `builtin/diff-tree.c` really is the only caller that was overlooked
when remerge-diff functionality was added.
This commit resolves the crash by adding `remerge_objdir` setup logic to
`builtin/diff-tree.c`, mirroring `builtin/log.c:cmd_log_walk_no_free`.
It also includes the necessary cleanup for `remerge_objdir`.
Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Thu, 8 Aug 2024 21:19:25 +0000 (14:19 -0700)]
tests: drop use of 'tee' that hides exit status
A few tests have "| tee output" downstream of a git command, and
then inspect the contents of the file. The net effect is that we
use an extra process, and hide the exit status from the upstream git
command.
In any of these tests, I do not see a reason why we want to hide a
possible failure from these git commands. Replace the use of tee
with a plain simple redirection.
Kyle Lippincott [Mon, 5 Aug 2024 17:10:08 +0000 (17:10 +0000)]
t6421: fix test to work when repo dir contains d0
The `grep` statement in this test looks for `d0.*<string>`, attempting
to filter to only show lines that had tabular output where the 2nd
column had `d0` and the final column had a substring of
[`git -c `]`fetch.negotiationAlgorithm`. These lines also have
`child_start` in the 4th column, but this isn't part of the condition.
A subsequent line will have `d1` in the 2nd column, `start` in the 4th
column, and `/path/to/git/git -c fetch.negotiationAlgorihm` in the final
column. If `/path/to/git/git` contains the substring `d0`, then this
line is included by `grep` as well as the desired line, leading to an
effective doubling of the number of lines, and test failures.
Tighten the grep expression to require `d0` to be surrounded by spaces,
and to have the `child_start` label.
Signed-off-by: Kyle Lippincott <spectral@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Kyle Lippincott [Mon, 5 Aug 2024 17:10:07 +0000 (17:10 +0000)]
set errno=0 before strtoX calls
To detect conversion failure after calls to functions like `strtod`, one
can check `errno == ERANGE`. These functions are not guaranteed to set
`errno` to `0` on successful conversion, however. Manual manipulation of
`errno` can likely be avoided by checking that the output pointer
differs from the input pointer, but that's not how other locations, such
as parse.c:139, handle this issue; they set errno to 0 prior to
executing the function.
For every place I could find a strtoX function with an ERANGE check
following it, set `errno = 0;` prior to executing the conversion
function.
Signed-off-by: Kyle Lippincott <spectral@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Mon, 5 Aug 2024 06:00:10 +0000 (02:00 -0400)]
apply: canonicalize modes read from patches
Git stores only canonical modes for blobs. So for a regular file, we
care about only "100644" or "100755" (depending only on the executable
bit), but never modes where the group or other permissions are more
exotic. So never "100664", "100700", etc. When a file in the working
tree has such a mode, we quietly turn it into one of the two canonical
modes, and that's what is stored both in the index and in tree objects.
However, we don't canonicalize modes we read from incoming patches in
git-apply. These may appear in a few lines:
- "old mode" / "new mode" lines for mode changes
- "new file mode" lines for newly created files
- "deleted file mode" for removing files
For "new mode" and for "new file mode", this is harmless. The patch is
asking the result to have a certain mode, but:
- when we add an index entry (for --index or --cached), it is
canonicalized as we create the entry, via create_ce_mode().
- for a working tree file, try_create_file() passes either 0777 or
0666 to open(), so what you get depends only on your umask, not any
other bits (aside from the executable bit) in the original mode.
However, for "old mode" and "deleted file mode", there is a minor
annoyance. We compare the patch's expected preimage mode with the
current state. But that current state is always going to be a canonical
mode itself:
- updating an index entry via --cached will have the canonical mode in
the index
- for updating a working tree file, check_preimage() runs the mode
through ce_mode_from_stat(), which does the usual canonicalization
So if the patch feeds a non-canonical mode, it's impossible for it to
match, and we will always complain with something like:
file has type 100644, expected 100664
Since this is just a warning, the operation proceeds, but it's
confusing and annoying.
These cases should be pretty rare in practice. Git would never produce a
patch with non-canonical modes itself (since it doesn't store them).
And while we do accept patches from other programs, all of those lines
were invented by Git. So you'd need a program trying to be Git
compatible, but not handling canonicalization the same way. Reportedly
"quilt" is such a program.
We should canonicalize the modes as we read them so that the user never
sees the useless warning.
A few notes on the tests:
- I've covered instances of all lines for completeness, even though
the "new mode" / "new file mode" ones behave OK currently.
- the tests apply patches to both the index and working tree, and
check the result of both. Again, we know that all of these paths
canonicalize anyway, but it's giving us extra coverage (although we
are even less likely to have such a bug now since we canonicalize up
front).
- the test patches are missing "index" lines, which is also something
Git would never produce. But they don't matter for the test, they do
match the case from quilt we saw in the wild, and they avoid some
sha1/sha256 complexity.
Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/ls-remote: fall back to SHA1 outside of a repo
In c8aed5e8da (repository: stop setting SHA1 as the default object hash,
2024-05-07), we have stopped setting the default hash algorithm for
`the_repository`. Consequently, code that relies on `the_hash_algo` will
now crash when it hasn't explicitly been initialized, which may be the
case when running outside of a Git repository.
It was reported that git-ls-remote(1) may crash in such a way when using
a remote helper that advertises refspecs. This is because the refspec
announced by the helper will get parsed during capability negotiation.
At that point we haven't yet figured out what object format the remote
uses though, so when run outside of a repository then we will fail.
The course of action is somewhat dubious in the first place. Ideally, we
should only parse object IDs once we have asked the remote helper for
the object format. And if the helper didn't announce the "object-format"
capability, then we should always assume SHA256. But instead, we used to
take either SHA1 if there was no repository, or we used the hash of the
local repository, which is wrong.
Arguably though, crashing hard may not be in the best interest of our
users, either. So while the old behaviour was buggy, let's restore it
for now as a short-term fix. We should eventually revisit, potentially
by deferring the point in time when we parse the refspec until after we
have figured out the remote's object hash.
Reported-by: Mike Hommey <mh@glandium.org> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Thu, 1 Aug 2024 18:51:12 +0000 (11:51 -0700)]
t0018: remove leftover debugging cruft
The actual file is copied out to /tmp, presumably so that the tester
can inspect it after the test is done, which may have been a useful
debugging aid.
But in the final shape of the test suite, such a code should not
exist. We cannot even assume that we are allowed to write into /tmp
(our TMPDIR may not even be pointing at it) or read from it for that
matter.
Noticed-by: Randall S. Becker <rsbecker@nexbridge.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 1 Aug 2024 17:06:54 +0000 (13:06 -0400)]
config.c: avoid segfault with --fixed-value and valueless config
When using `--fixed-value` with a key whose value is left empty (implied
as being "true"), 'git config' may crash when invoked like either of:
$ git config set --file=config --value=value --fixed-value \
section.key pattern
$ git config --file=config --fixed-value section.key value pattern
The original bugreport[1] bisects to 00bbdde141 (builtin/config:
introduce "set" subcommand, 2024-05-06), which is a red-herring, since
the original bugreport uses the new 'git config set' invocation.
The behavior likely bisects back to c90702a1f6 (config: plumb
--fixed-value into config API, 2020-11-25), which introduces the new
--fixed-value option in the first place.
Looking at the relevant frame from a failed process's coredump, the
crash appears in config.c::matches() like so:
(gdb) up
#1 0x000055b3e8b06022 in matches (key=0x55b3ea894360 "section.key", value=0x0,
store=0x7ffe99076eb0) at config.c:2884
2884 return !strcmp(store->fixed_value, value);
where we are trying to compare the `--fixed-value` argument to `value`,
which is NULL.
Avoid attempting to match `--fixed-value` for configuration keys with no
explicit value. A future patch could consider the empty value to mean
"true", "yes", "on", etc. when invoked with `--type=bool`, but let's
punt on that for now in the name of avoiding the segfault.
Jeff King [Thu, 1 Aug 2024 08:25:56 +0000 (04:25 -0400)]
credential/osxkeychain: respect NUL terminator in username
This patch fixes a case where git-credential-osxkeychain might output
uninitialized bytes to stdout.
We need to get the username string from a system API using
CFStringGetCString(). To do that, we get the max size for the string
from CFStringGetMaximumSizeForEncoding(), allocate a buffer based on
that, and then read into it. But then we print the entire buffer to
stdout, including the trailing NUL and any extra bytes which were not
needed. Instead, we should stop at the NUL.
This code comes from 9abe31f5f1 (osxkeychain: replace deprecated
SecKeychain API, 2024-02-17). The bug was probably overlooked back then
because this code is only used as a fallback when we can't get the
string via CFStringGetCStringPtr(). According to Apple's documentation:
Whether or not this function returns a valid pointer or NULL depends
on many factors, all of which depend on how the string was created and
its properties.
So it's not clear how we could make a test for this, and we'll have to
rely on manually testing on a system that triggered the bug in the first
place.
Reported-by: Hong Jiang <ilford@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Tested-by: Hong Jiang <ilford@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
All the Perforce tests are free of memory leaks. This went unnoticed
because most folks do not have p4 and p4d installed on their computers.
Consequently, given that the prerequisites for running those tests
aren't fulfilled, `TEST_PASSES_SANITIZE_LEAK=check` won't notice that
those tests are indeed memory leak free.
Mark those tests accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update our Perforce version from r21.2 to r23.2. Note that the updated
version is not the newest version. Instead, it is the last version where
the way that Perforce is being distributed remains the same as in r21.2.
Newer releases stopped distributing p4 and p4d executables as well as
the macOS archives directly and would thus require more work.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some of the tests in t98xx modify the Perforce depot in ways that the
tool wouldn't normally allow. This is done to test behaviour of git-p4
in certain edge cases that we have observed in the wild, but which
should in theory not be possible.
Naturally, modifying the depot on disk directly is quite intimate with
the tool and thus prone to breakage when Perforce updates the way that
data is stored. And indeed, those tests are broken nowadays with r23 of
Perforce. While a file revision was previously stored as a plain file
"depot/file,v", it is now stored in a directory "depot/file,d" with
compression.
Adapt those tests to handle both old- and new-style depot layouts.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
D Harithamma [Wed, 31 Jul 2024 13:33:59 +0000 (13:33 +0000)]
convert: return early when not tracing
When Git adds a file requiring encoding conversion and tracing of encoding
conversion is not requested via the GIT_TRACE_WORKING_TREE_ENCODING
environment variable, the `trace_encoding()` function still allocates &
prepares "human readable" copies of the file contents before and after
conversion to show in the trace. This results in a high memory footprint
and increased runtime without providing any user-visible benefit.
This fix introduces an early exit from the `trace_encoding()` function
when tracing is not requested, preventing unnecessary memory allocation
and processing.
Signed-off-by: D Harithamma <harithamma.d@ibm.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Tue, 30 Jul 2024 14:00:20 +0000 (16:00 +0200)]
t-example-decorate: remove test messages
The test_msg() calls only repeat information already present in test
descriptions and check definitions, which are shown automatically if
the checks fail. Remove the redundant messages to simplify the tests
and their output. Here it is with all of them failing before:
# check "ret == NULL" failed at t/unit-tests/t-example-decorate.c:18
# when adding a brand-new object, NULL should be returned
# check "ret == NULL" failed at t/unit-tests/t-example-decorate.c:21
# when adding a brand-new object, NULL should be returned
not ok 1 - Add 2 objects, one with a non-NULL decoration and one with a NULL decoration.
# check "ret == &vars->decoration_a" failed at t/unit-tests/t-example-decorate.c:29
# when readding an already existing object, existing decoration should be returned
# check "ret == NULL" failed at t/unit-tests/t-example-decorate.c:32
# when readding an already existing object, existing decoration should be returned
not ok 2 - When re-adding an already existing object, the old decoration is returned.
# check "ret == NULL" failed at t/unit-tests/t-example-decorate.c:40
# lookup should return added declaration
# check "ret == &vars->decoration_b" failed at t/unit-tests/t-example-decorate.c:43
# lookup should return added declaration
# check "ret == NULL" failed at t/unit-tests/t-example-decorate.c:46
# lookup for unknown object should return NULL
not ok 3 - Lookup returns the added declarations, or NULL if the object was never added.
# check "objects_noticed == 2" failed at t/unit-tests/t-example-decorate.c:58
# left: 1
# right: 2
# should have 2 objects
not ok 4 - The user can also loop through all entries.
1..4
... and here with the patch applied:
# check "ret == NULL" failed at t/unit-tests/t-example-decorate.c:18
# check "ret == NULL" failed at t/unit-tests/t-example-decorate.c:20
not ok 1 - Add 2 objects, one with a non-NULL decoration and one with a NULL decoration.
# check "ret == &vars->decoration_a" failed at t/unit-tests/t-example-decorate.c:27
# check "ret == NULL" failed at t/unit-tests/t-example-decorate.c:29
not ok 2 - When re-adding an already existing object, the old decoration is returned.
# check "ret == NULL" failed at t/unit-tests/t-example-decorate.c:36
# check "ret == &vars->decoration_b" failed at t/unit-tests/t-example-decorate.c:38
# check "ret == NULL" failed at t/unit-tests/t-example-decorate.c:40
not ok 3 - Lookup returns the added declarations, or NULL if the object was never added.
# check "objects_noticed == 2" failed at t/unit-tests/t-example-decorate.c:51
# left: 1
# right: 2
not ok 4 - The user can also loop through all entries.
1..4
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 30 Jul 2024 18:43:52 +0000 (11:43 -0700)]
safe.directory: setting safe.directory="." allows the "current" directory
When "git daemon" enters a repository, it chdir's to the requested
repository and then uses "." (the curent directory) to consult the
"is this repository considered safe?" when it is not owned by the
same owner as the process.
Make sure this access will be allowed by setting safe.directory to
".", as that was once advertised on the list as a valid workaround
to the overly tight safe.directory settings introduced by 2.45.1
(cf. <834862fd-b579-438a-b9b3-5246bf27ce8a@gmail.com>).
Also add simlar test to show what happens in the same setting if the
safe.directory is set to "*" instead of "."; in short, "." is a bit
tighter (as it is custom designed for git-daemon situation) than
"anything goes" settings given by "*".
Junio C Hamano [Tue, 30 Jul 2024 18:43:51 +0000 (11:43 -0700)]
safe.directory: normalize the configured path
The pathname of a repository comes from getcwd() and it could be a
path aliased via symbolic links, e.g., the real directory may be
/home/u/repository but a symbolic link /home/u/repo may point at it,
and the clone request may come as "git clone file:///home/u/repo/"
A request to check if /home/u/repository is safe would be rejected
if the safe.directory configuration allows /home/u/repo/ but not its
alias /home/u/repository/. Normalize the paths configured for the
safe.directory configuration variable before comparing them with the
path being checked.
Two and a half things to note, compared to the previous step to
normalize the actual path of the suspected repository, are:
- A configured safe.directory may be coming from .gitignore in the
home directory that may be shared across machines. The path
meant to match with an entry may not necessarily exist on all of
such machines, so not being able to convert them to real path on
this machine is *not* a condition that is worthy of warning.
Hence, we ignore a path that cannot be converted to a real path.
- A configured safe.directory is essentially a random string that
user throws at us, written completely unrelated to the directory
the current process happens to be in. Hence it makes little
sense to give a non-absolute path. Hence we ignore any
non-absolute paths, except for ".".
- The safe.directory set to "." was once advertised on the list as
a valid workaround for the regression caused by the overly tight
safe.directory check introduced in 2.45.1; we treat it to mean
"if we are at the top level of a repository, it is OK".
(cf. <834862fd-b579-438a-b9b3-5246bf27ce8a@gmail.com>).
Suggested-by: Phillip Wood <phillip.wood123@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 30 Jul 2024 18:43:50 +0000 (11:43 -0700)]
safe.directory: normalize the checked path
The pathname of a repository comes from getcwd() and it could be a
path aliased via symbolic links, e.g., the real directory may be
/home/u/repository but a symbolic link /home/u/repo may point at it,
and the clone request may come as "git clone file:///home/u/repo/".
A request to check if /home/u/repo is safe would be rejected if the
safe.directory configuration allows /home/u/repository/ but not its
alias /home/u/repo/. Normalize the path being checked before
comparing with safe.directory value(s).
Suggested-by: Phillip Wood <phillip.wood123@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 30 Jul 2024 18:43:49 +0000 (11:43 -0700)]
safe.directory: preliminary clean-up
The paths given in the safe.directory configuration variable are
allowed to contain "~user" (which interpolates to user's home
directory) and "%(prefix)" (which interpolates to the installation
location in RUNTIME_PREFIX-enabled builds, and a call to the
git_config_pathname() function is tasked to obtain a copy of the
path with these constructs interpolated.
The function, when it succeeds, always yields an allocated string in
the location given as the out-parameter; even when there is nothing
to interpolate in the original, a literal copy is made. The code
path that contains this caller somehow made two contradicting and
incorrect assumptions of the behaviour when there is no need for
interpolation, and was written with extra defensiveness against
two phantom risks that do not exist.
One wrong assumption was that the function might yield NULL when
there is no interpolation. This led to the use of an extra "check"
variable, conditionally holding either the interpolated or the
original string. The assumption was with us since 8959555c
(setup_git_directory(): add an owner check for the top-level
directory, 2022-03-02) originally introduced the safe.directory
feature.
Another wrong assumption was that the function might yield the same
pointer as the input when there is no interpolation. This led to a
conditional free'ing of the interpolated copy, that the conditional
never skipped, as we always received an allocated string.
Simplify the code by removing the extra defensiveness.
Junio C Hamano [Tue, 30 Jul 2024 01:17:38 +0000 (18:17 -0700)]
patch-id: tighten code to detect the patch header
The get_one_patchid() function unconditionally takes a line that
matches the patch header (namely, a line that begins with a full
object name, possibly prefixed by "commit" or "From" plus a space)
as the beginning of a patch. Even when it is *not* looking for one
(namely, when the previous call found the patch header and returned,
and then we are called again to skip the log message and process the
patch whose header was found by the previous invocation).
As a consequence, a line in the commit log message that begins with
one of these patterns can be mistaken to start another patch, with
current message entirely skipped (because we haven't even reached
the patch at all).
Allow the caller to tell us if it called us already and saw the
patch header (in which case we shouldn't be looking for another one,
until we see the "diff" part of the patch; instead we simply should
be skipping these lines as part of the commit log message), and skip
the header processing logic when that is the case. In the helper
function, it also needs to flip this "are we looking for a header?"
bit, once it finished skipping the commit log message and started
processing the patches, as the patch header of the _next_ message is
the only clue in the input that the current patch is done.
Junio C Hamano [Tue, 30 Jul 2024 01:17:37 +0000 (18:17 -0700)]
patch-id: rewrite code that detects the beginning of a patch
The get_one_patchid() function reads input lines until it finds a
patch header (the line that begins a patch), whose beginning is one
of:
(1) an "<object name>", which is what "git diff-tree --stdin" shows;
(2) "commit <object name>", which is what "git log" shows; or
(3) "From <object name>", which is what "git log --format=email" shows.
When it finds such a line, it returns to the caller, reporting the
<object name> it found, and the size of the "patch" it processed.
The caller then calls the function again, which then ignores the
commit log message, and then processes the lines in the patch part
until it hits another "beginning of a patch".
The above logic was fairly easy to see until 2bb73ae8 (patch-id: use
starts_with() and skip_prefix(), 2016-05-28) reorganized the code,
which made another logic that has nothing to do with the "where does
the next patch begin?" logic, which came from 2485eab5
(git-patch-id: do not trip over "no newline" markers, 2011-02-17)
that ignores the "\ No newline at the end", rolled into the same
single if() statement.
Let's split it out. The "\ No newline at the end" marker is part of
the patch, should not appear before we start reading the patch part,
and does not belong to the detection of patch header.
Junio C Hamano [Tue, 30 Jul 2024 01:17:36 +0000 (18:17 -0700)]
patch-id: make get_one_patchid() more extensible
We pass two independent Boolean flags (i.e. do we want the stable
variant of patch-id? do we want to hash the stuff verbatim?) into
the function as two separate parameters. Before adding the third
one and make the interface even wider, let's consolidate them into
a single flag word.
No changes in behaviour. Just a trivial interface change.
Junio C Hamano [Tue, 30 Jul 2024 01:17:35 +0000 (18:17 -0700)]
patch-id: call flush_current_id() only when needed
The caller passes a flag that is used to become no-op when calling
flush_current_id(). Instead of calling something that becomes a
no-op, teach the caller not to call it in the first place.
Junio C Hamano [Tue, 30 Jul 2024 01:17:34 +0000 (18:17 -0700)]
t4204: patch-id supports various input format
"git patch-id" was first developed to read from "git diff-tree
--stdin -p" output. Later it was enhanced to read from "git
diff-tree --stdin -p -v", which was the downstream of an early
imitation of "git log" ("git rev-list" run in the upstream of a pipe
to feed the "diff-tree"). These days, we also read from "git
format-patch".
Their output begins slightly differently, but the patch-id computed
over them for the same commit should be the same. Ensure that we
won't accidentally break this expectation.
David Disseldorp [Mon, 29 Jul 2024 15:14:00 +0000 (17:14 +0200)]
notes: do not trigger editor when adding an empty note
With "git notes add -C $blob", the given blob contents are to be made
into a note without involving an editor. But when "--allow-empty" is
given, the editor is invoked, which can cause problems for
non-interactive callers[1].
This behaviour started with 90bc19b3ae (notes.c: introduce
'--separator=<paragraph-break>' option, 2023-05-27), which changed
editor invocation logic to check for a zero length note_data buffer.
Restore the original behaviour of "git note" that takes the contents
given via the "-m", "-C", "-F" options without invoking an editor, by
checking for any prior parameter callbacks, indicated by a non-zero
note_data.msg_nr. Remove the now-unneeded note_data.given flag.
Add a test for this regression by checking whether GIT_EDITOR is
invoked alongside "git notes add -C $empty_blob --allow-empty"
[1] https://github.com/ddiss/icyci/issues/12
Signed-off-by: David Disseldorp <ddiss@suse.de>
[jc: enhanced the test with -m/-F options] Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --format option on the git-ls-files man page states that `%xx`
interpolates to the character with hex code `xx`. This mirrors the
documentation and behavior of `git for-each-ref --format=...`. However,
in reality it requires the character with code `XX` to be specified as
`%xXX`, mirroring the behaviour of `git log --format`.
Signed-off-by: Jayson Rhynas <jayrhynas@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Thu, 25 Jul 2024 23:07:28 +0000 (16:07 -0700)]
csum-file: introduce discard_hashfile()
The hashfile API is used to write out a "hashfile", which has a
final checksum (typically SHA-1) at the end. An in-core hashfile
structure has up to two file descriptors and a few buffers that can
only be freed by calling a helper function that is private to the
csum-file implementation.
The usual flow of a user of the API is to first open a file
descriptor for writing, obtain a hashfile associated with that write
file descriptor by calling either hashfd() or hashfd_check(), call
hashwrite() number of times to write data to the file, and then call
finalize_hashfile(), which appends th checksum to the end of the
file, closes file descriptors and releases associated buffers.
But what if a caller finds some error after calling hashfd() to
start the process and/or hashwrite() to send some data to the file,
and wants to abort the operation? The underlying file descriptor is
often managed by the tempfile API, so aborting will clean the file
out of the filesystem, but the resources associated with the in-core
hashfile structure is lost.
Introduce discard_hashfile() API function to allow them to release
the resources held by a hashfile structure the callers want to
dispose of, and use that in read-cache.c:do_write_index(), which is
a central place that writes the index file.
Mark t2107 as leak-free, as this leak in "update-index --cacheinfo"
test that deliberately makes it fail is now plugged.
Junio C Hamano [Thu, 25 Jul 2024 17:27:29 +0000 (10:27 -0700)]
doc: difference in location to apply is "offset", not "fuzz"
The documentation to "git rebase" says that the line numbers (in the
rebased change) may not exactly be the same as the line numbers the
change gets replayed on top of the new base, but uses a wrong noun
"fuzz". It should have said "offset".
They are both terms of art. "fuzz" is about context lines not
exactly matching. "offset" is about the difference in the location
that a change was taken from the original and the change gets
replayed on the target. "offset" is often inevitable and part of
normal life. "fuzz" on the other hand is often a sign of trouble
(and indeed "Git" refuses to apply a change with "fuzz", except
there are options to be fuzzy about whitespaces).