Derrick Stolee [Tue, 9 Aug 2022 13:11:40 +0000 (13:11 +0000)]
bundle-uri: create basic file-copy logic
Before implementing a way to fetch bundles into a repository, create the
basic logic. Assume that the URI is actually a file path. Future logic
will make this more careful to other protocols.
For now, we also only succeed if the content at the URI is a bundle
file, not a bundle list. Bundle lists will be implemented in a future
change.
Note that the discovery of a temporary filename is slightly racy because
the odb_mkstemp() relies on the temporary file not existing. With the
current implementation being limited to file copies, we could replace
the copy_file() with copy_fd(). The tricky part comes in future changes
that send the filename to 'git remote-https' and its 'get' capability.
At that point, we need the file descriptor closed _and_ the file
unlinked. If we were to keep the file descriptor open for the sake of
normal file copies, then we would pollute the rest of the code for
little benefit. This is especially the case because we expect that most
bundle URI use will be based on HTTPS instead of file copies.
Reviewed-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Tue, 9 Aug 2022 13:11:39 +0000 (13:11 +0000)]
remote-curl: add 'get' capability
A future change will want a way to download a file over HTTP(S) using
the simplest of download mechanisms. We do not want to assume that the
server on the other side understands anything about the Git protocol but
could be a simple static web server.
Create the new 'get' capability for the remote helpers which advertises
that the 'get' command is avalable. A caller can send a line containing
'get <url> <path>' to download the file at <url> into the file at
<path>.
Reviewed-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 19 Jul 2022 23:40:19 +0000 (16:40 -0700)]
Merge branch 'll/curl-accept-language'
Earlier, HTTP transport clients learned to tell the server side
what locale they are in by sending Accept-Language HTTP header, but
this was done only for some requests but not others.
* ll/curl-accept-language:
remote-curl: send Accept-Language header to server
Junio C Hamano [Tue, 19 Jul 2022 23:40:17 +0000 (16:40 -0700)]
Merge branch 'jk/clone-unborn-confusion'
"git clone" from a repository with some ref whose HEAD is unborn
did not set the HEAD in the resulting repository correctly, which
has been corrected.
* jk/clone-unborn-confusion:
clone: move unborn head creation to update_head()
clone: use remote branch if it matches default HEAD
clone: propagate empty remote HEAD even with other branches
clone: drop extra newline from warning message
Junio C Hamano [Mon, 18 Jul 2022 20:31:56 +0000 (13:31 -0700)]
Merge branch 'ab/cocci-unused'
Add Coccinelle rules to detect the pattern of initializing and then
finalizing a structure without using it in between at all, which
happens after code restructuring and the compilers fail to
recognize as an unused variable.
* ab/cocci-unused:
cocci: generalize "unused" rule to cover more than "strbuf"
cocci: add and apply a rule to find "unused" strbufs
cocci: have "coccicheck{,-pending}" depend on "coccicheck-test"
cocci: add a "coccicheck-test" target and test *.cocci rules
Makefile & .gitignore: ignore & clean "git.res", not "*.res"
Makefile: remove mandatory "spatch" arguments from SPATCH_FLAGS
Junio C Hamano [Mon, 18 Jul 2022 20:31:56 +0000 (13:31 -0700)]
Merge branch 'en/merge-dual-dir-renames-fix'
Fixes a long-standing corner case bug around directory renames in
the merge-ort strategy.
* en/merge-dual-dir-renames-fix:
merge-ort: fix issue with dual rename and add/add conflict
merge-ort: shuffle the computation and cleanup of potential collisions
merge-ort: make a separate function for freeing struct collisions
merge-ort: small cleanups of check_for_directory_rename
t6423: add tests of dual directory rename plus add/add conflict
Junio C Hamano [Mon, 18 Jul 2022 20:31:55 +0000 (13:31 -0700)]
Merge branch 'ab/test-without-templates'
Tweak tests so that they still work when the "git init" template
did not create .git/info directory.
* ab/test-without-templates:
tests: don't assume a .git/info for .git/info/sparse-checkout
tests: don't assume a .git/info for .git/info/exclude
tests: don't assume a .git/info for .git/info/refs
tests: don't assume a .git/info for .git/info/attributes
tests: don't assume a .git/info for .git/info/grafts
tests: don't depend on template-created .git/branches
t0008: don't rely on default ".git/info/exclude"
Junio C Hamano [Mon, 18 Jul 2022 20:31:55 +0000 (13:31 -0700)]
Merge branch 'ab/build-gitweb'
Teach "make all" to build gitweb as well.
* ab/build-gitweb:
gitweb/Makefile: add a "NO_GITWEB" parameter
Makefile: build 'gitweb' in the default target
gitweb/Makefile: include in top-level Makefile
gitweb: remove "test" and "test-installed" targets
gitweb/Makefile: prepare to merge into top-level Makefile
gitweb/Makefile: clear up and de-duplicate the gitweb.{css,js} vars
gitweb/Makefile: add a $(GITWEB_ALL) variable
gitweb/Makefile: define all .PHONY prerequisites inline
René Scharfe [Fri, 15 Jul 2022 03:58:50 +0000 (05:58 +0200)]
mingw: avoid mktemp() in mkstemp() implementation
The implementation of mkstemp() for MinGW uses mktemp() and open()
without the flag O_EXCL, which is racy. It's not a security problem
for now because all of its callers only create files within the
repository (incl. worktrees). Replace it with a call to our more
secure internal function, git_mkstemp_mode(), to prevent possible
future issues.
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is a known social engineering attack that takes advantage of the
fact that a working tree can include an entire bare repository,
including a config file. A user could run a Git command inside the bare
repository thinking that the config file of the 'outer' repository would
be used, but in reality, the bare repository's config file (which is
attacker-controlled) is used, which may result in arbitrary code
execution. See [1] for a fuller description and deeper discussion.
A simple mitigation is to forbid bare repositories unless specified via
`--git-dir` or `GIT_DIR`. In environments that don't use bare
repositories, this would be minimally disruptive.
Create a config variable, `safe.bareRepository`, that tells Git whether
or not to die() when working with a bare repository. This config is an
enum of:
- "all": allow all bare repositories (this is the default)
- "explicit": only allow bare repositories specified via --git-dir
or GIT_DIR.
If we want to protect users from such attacks by default, neither value
will suffice - "all" provides no protection, but "explicit" is
impractical for bare repository users. A more usable default would be to
allow only non-embedded bare repositories ([2] contains one such
proposal), but detecting if a repository is embedded is potentially
non-trivial, so this work is not implemented in this series.
Use git_protected_config() to read `safe.directory` instead of
read_very_early_config(), making it 'protected configuration only'.
As a result, `safe.directory` now respects "-c", so update the tests and
docs accordingly. It used to ignore "-c" due to how it was implemented,
not because of security or correctness concerns [1].
`uploadpack.packObjectsHook` is the only 'protected configuration only'
variable today, but we've noted that `safe.directory` and the upcoming
`safe.bareRepository` should also be 'protected configuration only'. So,
for consistency, we'd like to have a single implementation for protected
configuration.
The primary constraints are:
1. Reading from protected configuration should be fast. Nearly all "git"
commands inside a bare repository will read both `safe.directory` and
`safe.bareRepository`, so we cannot afford to be slow.
2. Protected configuration must be readable when the gitdir is not
known. `safe.directory` and `safe.bareRepository` both affect
repository discovery and the gitdir is not known at that point [1].
The chosen implementation in this commit is to read protected
configuration and cache the values in a global configset. This is
similar to the caching behavior we get with the_repository->config.
Introduce git_protected_config(), which reads protected configuration
and caches them in the global configset protected_config. Then, refactor
`uploadpack.packObjectsHook` to use git_protected_config().
The protected configuration functions are named similarly to their
non-protected counterparts, e.g. git_protected_config_check_init() vs
git_config_check_init().
In light of constraint 1, this implementation can still be improved.
git_protected_config() iterates through every variable in
protected_config, which is wasteful, but it makes the conversion simple
because it matches existing patterns. We will likely implement constant
time lookup functions for protected configuration in a future series
(such functions already exist for non-protected configuration, i.e.
repo_config_get_*()).
An alternative that avoids introducing another configset is to continue
to read all config using git_config(), but only accept values that have
the correct config scope [2]. This technically fulfills constraint 2,
because git_config() simply ignores the local and worktree config when
the gitdir is not known. However, this would read incomplete config into
the_repository->config, which would need to be reset when the gitdir is
known and git_config() needs to read the local and worktree config.
Resetting the_repository->config might be reasonable while we only have
these 'protected configuration only' variables, but it's not clear
whether this extends well to future variables.
[1] In this case, we do have a candidate gitdir though, so with a little
refactoring, it might be possible to provide a gitdir.
[2] This is how `uploadpack.packObjectsHook` was implemented prior to
this commit.
Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
For security reasons, there are config variables that are only trusted
when they are specified in certain configuration scopes, which are
sometimes referred to on-list as 'protected configuration' [1]. A future
commit will introduce another such variable, so let's define our terms
so that we can have consistent documentation and implementation.
In our documentation, define 'protected configuration' as the system,
global and command config scopes. As a shorthand, I will refer to
variables that are only respected in protected configuration as
'protected configuration only', but this term is not used in the
documentation.
This definition of protected configuration is based on whether or not
Git can reasonably protect the user by ignoring the configuration scope:
- System, global and command line config are considered protected
because an attacker who has control over any of those can do plenty of
harm without Git, so we gain very little by ignoring those scopes.
- On the other hand, local (and similarly, worktree) config are not
considered protected because it is relatively easy for an attacker to
control local config, e.g.:
- On some shared user environments, a non-admin attacker can create a
repository high up the directory hierarchy (e.g. C:\.git on
Windows), and a user may accidentally use it when their PS1
automatically invokes "git" commands.
`safe.directory` prevents attacks of this form by making sure that
the user intended to use the shared repository. It obviously
shouldn't be read from the repository, because that would end up
trusting the repository that Git was supposed to reject.
- "git upload-pack" is expected to run in repositories that may not be
controlled by the user. We cannot ignore all config in that
repository (because "git upload-pack" would fail), but we can limit
the risks by ignoring `uploadpack.packObjectsHook`.
Only `uploadpack.packObjectsHook` is 'protected configuration only'. The
following variables are intentionally excluded:
- `safe.directory` should be 'protected configuration only', but it does
not technically fit the definition because it is not respected in the
"command" scope. A future commit will fix this.
- `trace2.*` happens to read the same scopes as `safe.directory` because
they share an implementation. However, this is not for security
reasons; it is because we want to start tracing so early that
repository-level config and "-c" are not available [2].
This requirement is unique to `trace2.*`, so it does not makes sense
for protected configuration to be subject to the same constraints.
[1] For example,
https://lore.kernel.org/git/6af83767-576b-75c4-c778-0284344a8fe7@github.com/
[2] https://lore.kernel.org/git/a0c89d0d-669e-bf56-25d2-cbb09b012e70@jeffhostetler.com/
Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a subsequent commit, we will introduce "protected configuration",
which is easiest to describe in terms of configuration scopes (i.e. it's
the union of the 'system', 'global', and 'command' scopes). This
description is fine for ML discussions, but it's inadequate for end
users because we don't provide a good description of "configuration
scopes" in the public docs.
145d59f482 (config: add '--show-scope' to print the scope of a config
value, 2020-02-10) introduced the word "scope" to our public docs, but
that only enumerates the scopes and assumes the user can figure out
what those values mean.
Add a SCOPES section to Documentation/git-config.txt that describes the
configuration scopes, their corresponding CLI options, and mentions that
some configuration options are only respected in certain scopes. Then,
use the word "scope" to simplify the FILES section and change some
confusing wording.
Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Thu, 14 Jul 2022 22:03:59 +0000 (15:03 -0700)]
Merge branch 'sy/mv-out-of-cone'
"git mv A B" in a sparsely populated working tree can be asked to
move a path between directories that are "in cone" (i.e. expected
to be materialized in the working tree) and "out of cone"
(i.e. expected to be hidden). The handling of such cases has been
improved.
* sy/mv-out-of-cone:
mv: add check_dir_in_index() and solve general dir check issue
mv: use flags mode for update_mode
mv: check if <destination> exists in index to handle overwriting
mv: check if out-of-cone file exists in index with SKIP_WORKTREE bit
mv: decouple if/else-if checks using goto
mv: update sparsity after moving from out-of-cone to in-cone
t1092: mv directory from out-of-cone to in-cone
t7002: add tests for moving out-of-cone file/directory
Junio C Hamano [Thu, 14 Jul 2022 22:03:59 +0000 (15:03 -0700)]
Merge branch 'hx/unpack-streaming'
Allow large objects read from a packstream to be streamed into a
loose object file straight, without having to keep it in-core as a
whole.
* hx/unpack-streaming:
unpack-objects: use stream_loose_object() to unpack large objects
core doc: modernize core.bigFileThreshold documentation
object-file.c: add "stream_loose_object()" to handle large object
object-file.c: factor out deflate part of write_loose_object()
object-file.c: refactor write_loose_object() to several steps
unpack-objects: low memory footprint for get_data() in dry_run mode
Junio C Hamano [Thu, 14 Jul 2022 22:03:58 +0000 (15:03 -0700)]
Merge branch 'en/merge-tree'
"git merge-tree" learned a new mode where it takes two commits and
computes a tree that would result in the merge commit, if the
histories leading to these two commits were to be merged.
* en/merge-tree:
git-merge-tree.txt: add a section on potentional usage mistakes
merge-tree: add a --allow-unrelated-histories flag
merge-tree: allow `ls-files -u` style info to be NUL terminated
merge-ort: optionally produce machine-readable output
merge-ort: store more specific conflict information
merge-ort: make `path_messages` a strmap to a string_list
merge-ort: store messages in a list, not in a single strbuf
merge-tree: provide easy access to `ls-files -u` style info
merge-tree: provide a list of which files have conflicts
merge-ort: remove command-line-centric submodule message from merge-ort
merge-ort: provide a merge_get_conflicted_files() helper function
merge-tree: support including merge messages in output
merge-ort: split out a separate display_update_messages() function
merge-tree: implement real merges
merge-tree: add option parsing and initial shell for real merge function
merge-tree: move logic for existing merge into new function
merge-tree: rename merge_trees() to trivial_merge_trees()
Junio C Hamano [Thu, 14 Jul 2022 22:03:58 +0000 (15:03 -0700)]
Merge branch 'gg/worktree-from-the-above'
In a non-bare repository, the behavior of Git when the
core.worktree configuration variable points at a directory that has
a repository as its subdirectory, regressed in Git 2.27 days.
* gg/worktree-from-the-above:
dir: minor refactoring / clean-up
dir: traverse into repository
When sorting the output of `git shortlog` by count, a list of authors in
alphabetical order is then sorted by contribution count. Obviously, the
idea is to maintain the alphabetical order for items with identical
contribution count.
At the moment, this job is performed by `qsort()`. As that function is
not guaranteed to implement a stable sort algorithm, this can lead to
inconsistent and/or surprising behavior: items with identical
contribution count could lose their alphabetical sub-order.
The `qsort()` in MS Visual C's runtime does _not_ implement a stable
sort algorithm, and under certain circumstances this even causes a test
failure in t4201.21 "shortlog can match multiple groups", where two
authors both are listed with 2 contributions, and are listed in inverse
alphabetical order.
Let's instead use the stable sort provided by `git_stable_qsort()` to
avoid this inconsistency.
This is a companion to 2049b8dc65 (diffcore_rename(): use a stable sort,
2019-09-30).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
mergetool(vimdiff): allow paths to contain spaces again
In 0041797449d (vimdiff: new implementation with layout support,
2022-03-30), we introduced a completely new implementation of the
`vimdiff` backend for `git mergetool`.
In this implementation, we no longer call `vim` directly but we
accumulate in the variable `FINAL_CMD` an arbitrary number of commands
for `vim` to execute, which necessitates the use of `eval` to split the
commands properly into multiple command-line arguments.
That same `eval` command also needs to pass the paths to `vim`, and
while it looks as if they are quoted correctly, that quoting only
reaches the `eval` instruction and is lost after that, therefore paths
that contain whitespace characters (or other characters that are
interpreted by the POSIX shell) are handled incorrectly.
This is a simple reproducer:
git init -b main bam-merge-fail
cd bam-merge-fail
echo a>"a file.txt"
git add "a file.txt"
git commit -m "added 'a file.txt'"
echo b>"a file.txt"
git add "a file.txt"
git commit -m "diverged b 'a file.txt'"
git checkout -b c HEAD~
echo c>"a file.txt"
git add "a file.txt"
git commit -m "diverged c 'a file.txt'"
git checkout main
git merge c
git mergetool --tool=vimdiff
With Git v2.37.0/v2.37.1, this will open 7 buffers, not four, and not
display the correct contents at all.
To fix this, let's not expand the variables containing the path
parameters before passing them to the `eval` command, but let that
command expand the variables instead.
This fixes https://github.com/git-for-windows/git/issues/3945
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 78d5e4cfb4b (tests: refactor --write-junit-xml code, 2022-05-21),
this developer refactored the `--write-junit-xml` code a bit, including
the part where the current test case's title was used in a `set`
invocation, but failed to account for the fact that some test cases'
titles start with a long option, which the `set` misinterprets as being
intended for parsing.
Let's fix this by using the `set -- <...>` form.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 13 Jul 2022 21:54:54 +0000 (14:54 -0700)]
Merge branch 'zk/push-use-bitmaps'
"git push" sometimes perform poorly when reachability bitmaps are
used, even in a repository where other operations are helped by
bitmaps. The push.useBitmaps configuration variable is introduced
to allow disabling use of reachability bitmaps only for "git push".
Junio C Hamano [Wed, 13 Jul 2022 21:54:53 +0000 (14:54 -0700)]
Merge branch 'ro/mktree-allow-missing-fix'
"git mktree --missing" lazily fetched objects that are missing from
the local object store, which was totally unnecessary for the purpose
of creating the tree object(s) from its input.
* ro/mktree-allow-missing-fix:
mktree: do not check type of remote objects
Han Xin [Tue, 12 Jul 2022 08:01:43 +0000 (16:01 +0800)]
t5330: remove run_with_limited_processses()
run_with_limited_processses() is used to end the loop faster when an
infinite loop happen. But "ulimit" is tied to the entire development
station, and the test will fail due to too many other processes or using
"--stress".
Without run_with_limited_processses() the infinite loop can also be
stopped due to global configrations or quotas, and the verification
still works fine. So let's remove run_with_limited_processses().
Signed-off-by: Han Xin <hanxin.hx@bytedance.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 12 Jul 2022 07:03:45 +0000 (03:03 -0400)]
diff-files: move misplaced cleanup label
Commit 0139c58ab9 (revisions API users: add "goto cleanup" for
release_revisions(), 2022-04-13) converted an early return in
cmd_diff_files() into a goto. But it put the cleanup label too early: if
read_cache_preload() returns an error, we'll set result to "-1", but
then jump to calling run_diff_files(), overwriting our result.
We should jump past the call to run_diff_files(). Likewise, we should go
past diff_result_code(), which is expecting to see a code from an actual
diff, not a negative error code.
In practice, I suspect this bug cannot actually be triggered, because
read_cache_preload() does not seem to ever return an error. Its return
value (eventually) comes from do_read_index(), which gives the number of
cache entries found, and calls die() on error. Still, it makes sense to
fix the inadvertent change from 0139c58ab9 first, and we can look into
the overall error handling of read_cache() separately (which is present
in many other callsites).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Mon, 11 Jul 2022 23:25:14 +0000 (16:25 -0700)]
fsck: do not dereference NULL while checking resolve-undo data
When we found an invalid object recorded in the resolve-undo data,
we would have ended up dereferencing NULL while fsck. Reporting the
problem and going on to the next object is the right thing to do
here.
Junio C Hamano [Mon, 11 Jul 2022 22:38:51 +0000 (15:38 -0700)]
Merge branch 'rs/archive-with-internal-gzip'
Teach "git archive" to (optionally and then by default) avoid
spawning an external "gzip" process when creating ".tar.gz" (and
".tgz") archives.
* rs/archive-with-internal-gzip:
archive-tar: use internal gzip by default
archive-tar: use OS_CODE 3 (Unix) for internal gzip
archive-tar: add internal gzip implementation
archive-tar: factor out write_block()
archive: rename archiver data field to filter_command
archive: update format documentation
Junio C Hamano [Mon, 11 Jul 2022 22:38:51 +0000 (15:38 -0700)]
Merge branch 'ds/branch-checked-out'
Introduce a helper to see if a branch is already being worked on
(hence should not be newly checked out in a working tree), which
performs much better than the existing find_shared_symref() to
replace many uses of the latter.
* ds/branch-checked-out:
branch: drop unused worktrees variable
fetch: stop passing around unused worktrees variable
branch: fix branch_checked_out() leaks
branch: use branch_checked_out() when deleting refs
fetch: use new branch_checked_out() and add tests
branch: check for bisects and rebases
branch: add branch_checked_out() helper
Junio C Hamano [Mon, 11 Jul 2022 22:38:50 +0000 (15:38 -0700)]
Merge branch 'ac/bitmap-format-doc'
Adjust technical/bitmap-format to be formatted by AsciiDoc, and
add some missing information to the documentation.
* ac/bitmap-format-doc:
bitmap-format.txt: add information for trailing checksum
bitmap-format.txt: fix some formatting issues
bitmap-format.txt: feed the file to asciidoc to generate html
Junio C Hamano [Mon, 11 Jul 2022 22:38:49 +0000 (15:38 -0700)]
Merge branch 'pb/diff-doc-raw-format'
Update "git diff/log --raw" format documentation.
* pb/diff-doc-raw-format:
diff-index.txt: update raw output format in examples
diff-format.txt: correct misleading wording
diff-format.txt: dst can be 0* SHA-1 when path is deleted, too
Jeff King [Mon, 11 Jul 2022 14:48:06 +0000 (10:48 -0400)]
ref-filter: disable save_commit_buffer while traversing
Various ref-filter options like "--contains" or "--merged" may cause us
to traverse large segments of the history graph. It's counter-productive
to have save_commit_buffer turned on, as that will instruct the commit
code to cache in-memory the object contents for each commit we traverse.
This increases the amount of heap memory used while providing little or
no benefit, since we're not actually planning to display those commits
(which is the usual reason that tools like git-log want to keep them
around). We can easily disable this feature while ref-filter is running.
This lowers peak heap (as measured by massif) for running:
in linux.git from ~100MB to ~20MB. It also seems to improve runtime by
4-5% (600ms vs 630ms).
A few points to note:
- it should be safe to temporarily disable save_commit_buffer like
this. The saved buffers are accessed through get_commit_buffer(),
which treats the saved ones like a cache, and loads on-demand from
the object database on a cache miss. So any code that was using this
would not be wrong, it might just incur an extra object lookup for
some objects. But...
- I don't think any ref-filter related code is using the cache. While
it's true that an option like "--format=%(*contents:subject)" or
"--sort=*authordate" will need to look at the commit contents,
ref-filter doesn't use get_commit_buffer() to do so! It always reads
the objects directly via read_object_file(), though it does avoid
re-reading objects if the format can be satisfied without them.
Timing "git tag --format=%(*authordate)" shows that we're the same
before and after, as expected.
- Note that all of this assumes you don't have a commit-graph file. if
you do, then the heap usage is even lower, and the runtime is 10x
faster. So in that sense this is not urgent, as there's a much
better solution. But since it's such an obvious and easy win for
fallback cases (including commits which aren't yet in the graph
file), there's no reason not to.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Mon, 11 Jul 2022 09:21:52 +0000 (05:21 -0400)]
clone: move unborn head creation to update_head()
Prior to 4f37d45706 (clone: respect remote unborn HEAD, 2021-02-05),
creation of the local HEAD was always done in update_head(). That commit
added code to handle an unborn head in an empty repository, and just did
all symref creation and config setup there.
This makes the code flow a little bit confusing, especially as new
corner cases have been covered (like the previous commit to match our
default branch name to a non-HEAD remote branch).
Let's move the creation of the unborn symref into update_head(). This
matches the other HEAD-creation cases, and now the logic is consistently
separated: the main cmd_clone() function only examines the situation and
sets variables based on what it finds, and update_head() actually
performs the update.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Li Linchao [Mon, 11 Jul 2022 05:58:54 +0000 (05:58 +0000)]
remote-curl: send Accept-Language header to server
Git server end's ability to accept Accept-Language header was introduced
in f18604bbf2 (http: add Accept-Language header if possible, 2015-01-28),
but this is only used by very early phase of the transfer, which is HTTP
GET request to discover references. For other phases, like POST request
in the smart HTTP, the server does not know what language the client
speaks.
Teach git client to learn end-user's preferred language and throw
accept-language header to the server side. Once the server gets this header,
it has the ability to talk to end-user with language they understand.
This would be very helpful for many non-English speakers.
Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Li Linchao <lilinchao@oschina.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jaydeep Das [Mon, 11 Jul 2022 05:00:50 +0000 (05:00 +0000)]
gpg-interface: add function for converting trust level to string
Add new helper function `gpg_trust_level_to_str()` which will
convert a given member of `enum signature_trust_level` to its
corresponding string (in lowercase). For example, `TRUST_ULTIMATE`
will yield the string "ultimate".
This will abstract out some code in `pretty.c` relating to gpg
signature trust levels.
Mentored-by: Christian Couder <chriscool@tuxfamily.org> Mentored-by: Hariom Verma <hariom18599@gmail.com> Signed-off-by: Jaydeep Das <jaydeepjd.8914@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
SZEDER Gábor [Sun, 10 Jul 2022 15:16:45 +0000 (17:16 +0200)]
multi-pack-index: simplify handling of unknown --options
Although parse_options() can handle unknown --options just fine, none
of 'git multi-pack-index's subcommands rely on it, but do it on their
own: they invoke parse_options() with the PARSE_OPT_KEEP_UNKNOWN flag,
then check whether there are any unparsed arguments left, and print
usage and quit if necessary.
Drop that PARSE_OPT_KEEP_UNKNOWN flag to let parse_options() handle
unknown options instead, which has the additional benefit that it
prints not only the usage but an "error: unknown option `foo'" message
as well.
Do leave the unparsed arguments check to catch any unexpected
non-option arguments, though, e.g. 'git multi-pack-index write foo'.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Sun, 10 Jul 2022 10:05:51 +0000 (12:05 +0200)]
cocci: avoid normalization rules for memcpy
Some of the rules for using COPY_ARRAY instead of memcpy with sizeof are
intended to reduce the number of sizeof variants to deal with. They can
have unintended side effects if only they match, but not the one for the
COPY_ARRAY conversion at the end.
Avoid these side effects by instead using a self-contained rule for each
combination of array and pointer for source and destination which lists
all sizeof variants inline.
This lets "make contrib/coccinelle/array.cocci.patch" take 15% longer on
my machine, but gives peace of mind that no incomplete transformation
will be generated.
Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
brian m. carlson [Sun, 10 Jul 2022 13:29:07 +0000 (13:29 +0000)]
sha256: add support for Nettle
For SHA-256, we currently have support for OpenSSL and libgcrypt because
these two libraries contain optimized implementations that can take
advantage of native processor instructions. However, OpenSSL is not
suitable for linking against for Linux distros due to licensing
incompatibilities with the GPLv2, and libgcrypt has been less favored by
cryptographers due to some security-related implementation issues,
which, while not affecting our use of hash algorithms, has affected its
reputation.
Let's add another option that's compatible with the GPLv2, which is
Nettle. This is an option which is generally better than libgcrypt
because on many distros GnuTLS (which uses Nettle) is used for HTTPS and
therefore as a practical matter it will be available on most systems.
As a result, prefer it over libgcrypt and our built-in implementation.
Nettle also has recently gained support for Intel's SHA-NI instructions,
which compare very favorably to other implementations, as well as
assembly implementations for when SHA-NI is not available.
A git gc on git.git sees a 12% performance improvement with Nettle over
our block SHA-256 implementation due to general assembly improvements.
With SHA-NI, the performance of raw SHA-256 on a 2 GiB file goes from
7.296 seconds with block SHA-256 to 1.523 seconds with Nettle.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Sun, 10 Jul 2022 01:33:54 +0000 (18:33 -0700)]
builtin/mv.c: use the MOVE_ARRAY() macro instead of memmove()
The variables 'source', 'destination', and 'submodule_gitfile' are
all of type "const char **", and an element of such an array is of
"type const char *", but these memmove() calls were written as if
these variables are of type "char **".
Once these memmove() calls are fixed to use the correct type to
compute the number of bytes to be moved, e.g.
- memmove(source + i, source + i + 1, n * sizeof(char *));
+ memmove(source + i, source + i + 1, n * sizeof(const char *));
existing contrib/coccinelle/array.cocci rules can recognize them as
candidates for turning into MOVE_ARRAY().
While at it, use CALLOC_ARRAY() instead of xcalloc() to allocate the
modes[] array that is involved in the change.
Fernando Ramos [Fri, 8 Jul 2022 18:10:24 +0000 (20:10 +0200)]
vimdiff: make layout engine more robust against user vim settings
'vim' has two configuration options ('splitbelow' and 'splitright') that
change the way the 'split' command behaves. When they are set, the
commands that the layout engine generates no longer work as expected.
In order to fix this we can append special keyword 'leftabove' to each
'split' and 'vertical split' subcommand found inside the command string
generated by the layout engine.
This works because whatever comes after 'leftabove' will temporally
ignore settings 'splitbelow' and 'splitright'.
Reported-by: Matthew Klein <mklein994@gmail.com> Signed-off-by: Fernando Ramos <greenfoo@u92.eu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 7 Jul 2022 23:59:35 +0000 (19:59 -0400)]
clone: use remote branch if it matches default HEAD
Usually clone tries to use the same local HEAD as the remote (unless the
user has given --branch explicitly). Even if the remote HEAD is detached
or unborn, we can detect those situations with modern versions of Git.
If the remote is too old to support the "unborn" extension (or it has
been disabled via config), then we can't know the name of the remote's
unborn HEAD, and we fall back whatever the local default branch name is
configured to be.
But that leads to one weird corner case. It's rare because it needs a
number of factors:
- the remote has an unborn HEAD
- the remote is too old to support "unborn", or has disabled it
- the remote has another branch "foo"
- the local default branch name is "foo"
In that case you end up with a local clone on an unborn "foo" branch,
disconnected completely from the remote's "foo". This is rare in
practice, but the result is quite confusing.
When choosing "foo", we can double check whether the remote has such a
name, and if so, start our local "foo" at the same spot, rather than
making it unborn.
Note that this causes a test failure in t5605, which is cloning from a
bundle that doesn't contain HEAD (so it behaves like a remote that
doesn't support "unborn"), but has a single "main" branch. That test
expects that we end up in the weird "unborn main" case, where we don't
actually check out the remote branch of the same name. Even though we
have to update the test, this seems like an argument in favor of this
patch: checking out main is what I'd expect from such a bundle.
So this patch updates the test for the new behavior and adds an adjacent
one that checks what the original was going for: if there's no HEAD and
the bundle _doesn't_ have a branch that matches our local default name,
then we end up with nothing checked out.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 7 Jul 2022 23:57:45 +0000 (19:57 -0400)]
clone: propagate empty remote HEAD even with other branches
Unless "--branch" was given, clone generally tries to match the local
HEAD to the remote one. For most repositories, this is easy: the remote
tells us which branch HEAD was pointing to, and we call our local
checkout() function on that branch.
When cloning an empty repository, it's a little more tricky: we have
special code that checks the transport's "unborn" extension, or falls back
to our local idea of what the default branch should be. In either case,
we point the new HEAD to that, and set up the branch.* config.
But that leaves one case unhandled: when the remote repository _isn't_
empty, but its HEAD is unborn. The checkout() function is smart enough
to realize we didn't fetch the remote HEAD and it bails with a warning.
But we'll have ignored any information the remote gave us via the unborn
extension. This leads to nonsense outcomes:
- If the remote has its HEAD pointing to an unborn "foo" and contains
another branch "bar", cloning will get branch "bar" but leave the
local HEAD pointing at "master" (or whatever our local default is),
which is useless. The project does not use "master" as a branch.
- Worse, if the other branch "bar" is instead called "master" (but
again, the remote HEAD is not pointing to it), then we end up with a
local unborn branch "master", which is not connected to the remote
"master" (it shares no history, and there's no branch.* config).
Instead, we should try to use the remote's HEAD, even if its unborn, to
be consistent with the other cases.
The reason this case was missed is that cmd_clone() handles empty and
non-empty repositories on two different sides of a conditional:
if (we have any refs) {
fetch refs;
check for --branch;
otherwise, try to point our head at remote head;
otherwise, our head is NULL;
} else {
check for --branch;
otherwise, try to use "unborn" extension;
otherwise, fall back to our default name name;
}
So the smallest change would be to repeat the "unborn" logic at the end
of the first block. But we can note some other overlaps and
inconsistencies:
- both sides have to handle --branch (though note that it's always an
error for the empty repo case, since an empty repo by definition
does not have a matching branch)
- the fall back to the default name is much more explicit in the
empty-repo case. The non-empty case eventually ends up bailing
from checkout() with a warning, which produces a similar result, but
fails to set up the branch config we do in the empty case.
So let's pull the HEAD setup out of this conditional entirely. This
de-duplicates some of the code and the result is easy to follow, because
helper functions like find_ref_by_name() do the right thing even in the
empty-repo case (i.e., by returning NULL).
There are two subtleties:
- for a remote with a detached HEAD, it will advertise an oid for HEAD
(which we store in our "remote_head" variable), but we won't find a
matching refname (so our "remote_head_points_at" is NULL). In this
case we make a local detached HEAD to match. Right now this happens
implicitly by reaching update_head() with a non-NULL remote_head
(since we skip all of the unborn-fallback). We'll now need to
account for it explicitly before doing the fallback.
- for an empty repo, we issue a warning to the user that they've
cloned an empty repo. The text of that warning doesn't make sense
for a non-empty repo with an unborn HEAD, so we'll have to
differentiate the two cases there. We could just use different text,
but instead let's allow the code to continue down to checkout(),
which will issue an appropriate warning, like:
remote HEAD refers to nonexistent ref, unable to checkout
Continuing down to checkout() will make it easier to do more fixes
on top (see below).
Note that this patch fixes the case where the other side reports an
unborn head to us using the protocol extension. It _doesn't_ fix the
case where the other side doesn't tell us, we locally guess "master",
and the other side happens to have a "master" which its HEAD doesn't
point. But it doesn't make anything worse there, and it should actually
make it easier to fix that problem on top.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 7 Jul 2022 23:54:51 +0000 (19:54 -0400)]
clone: drop extra newline from warning message
We don't need to put a "\n" in calls to warning(), since it adds one
itself (and the user sees an extra blank line). Drop it, and while we're
here, drop the full-stop from the message, which goes against our
guidelines.
This bug dates all the way back to 8434c2f1af (Build in clone,
2008-04-27), but presumably nobody noticed because it's hard to trigger:
you have to clone a repository whose HEAD is unborn, but which is not
otherwise empty.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
cocci: generalize "unused" rule to cover more than "strbuf"
Generalize the newly added "unused.cocci" rule to find more than just
"struct strbuf", let's have it find the same unused patterns for
"struct string_list", as well as other code that uses
similar-looking *_{release,clear,free}() and {release,clear,free}_*()
functions.
We're intentionally loose in accepting e.g. a "strbuf_init(&sb)"
followed by a "string_list_clear(&sb, 0)". It's assumed that the
compiler will catch any such invalid code, i.e. that our
constructors/destructors don't take a "void *".
See [1] for example of code that would be covered by the
"get_worktrees()" part of this rule. We'd still need work that the
series is based on (we were passing "worktrees" to a function), but
could now do the change in [1] automatically.
cocci: add and apply a rule to find "unused" strbufs
Add a coccinelle rule to remove "struct strbuf" initialization
followed by calling "strbuf_release()" function, without any uses of
the strbuf in the same function.
See the tests in contrib/coccinelle/tests/unused.{c,res} for what it's
intended to find and replace.
The inclusion of "contrib/scalar/scalar.c" is because "spatch" was
manually run on it (we don't usually run spatch on contrib).
Per the "buggy code" comment we also match a strbuf_init() before the
xmalloc(), but we're not seeking to be so strict as to make checks
that the compiler will catch for us redundant. Saying we'll match
either "init" or "xmalloc" lines makes the rule simpler.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
cocci: have "coccicheck{,-pending}" depend on "coccicheck-test"
Have the newly introduced "coccicheck-test" target run implicitly when
"coccicheck" itself is run. As with e.g. the "check-chainlint"
target (see [1]) it makes sense to run this unconditionally before we
run other "spatch" rules as a basic sanity check. See
1. 803394459d4 (t/Makefile: add machinery to check correctness of
chainlint.sed, 2018-07-11)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
cocci: add a "coccicheck-test" target and test *.cocci rules
Add a "coccicheck-test" target to test our *.cocci rules, and as a
demonstration add tests for the rules added in 39ea59a2570 (remove
unnecessary NULL check before free(3), 2016-10-08) and 1b83d1251ed (coccinelle: add a rule to make "expression" code use
FREE_AND_NULL(), 2017-06-15).
I considered making use of the "spatch --test" option, and the choice
of a "tests" over a "t" directory is to make these tests compatible
with such a future change.
Unfortunately "spatch --test" doesn't return meaningful exit codes,
AFAICT you need to "grep" its output to see if the *.res is what you
expect. There's "--test-okfailed", but I didn't find a way to sensibly
integrate those (it relies on some in-between status files, but
doesn't help with the status codes).
Instead let's use a "--sp-file" pattern similar to the main
"coccicheck" rule, with the difference that we use and compare the
two *.res files with cmp(1).
The --very-quiet and --no-show-diff options ensure that we don't need
to pipe stdout and stderr somewhere. Unlike the "%.cocci.patch" rule
we're not using the diff.
The "cmp || git diff" is optimistically giving us better output on
failure, but even if we only have POSIX cmp and no system git
installed we'll still fail with the "cmp", just with an error message
that isn't as friendly. The "2>/dev/null" is in case we don't have a
"git" installed.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Makefile & .gitignore: ignore & clean "git.res", not "*.res"
Adjust the overly broad .gitignore and "make clean" rule added in ce39c2e04ce (Provide a Windows version resource for the git
executables., 2012-05-24).
For now this is merely a correctness fix, but needed because a
subsequent commit will want to check in *.res files elsewhere in the
tree, which we shouldn't have to "git add -f".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Makefile: remove mandatory "spatch" arguments from SPATCH_FLAGS
The "--patch ." part of SPATCH_FLAGS added in f57d11728d1 (coccinelle:
put sane filenames into output patches, 2018-07-23) should have been
added unconditionally to the "spatch" invocation instead, using it
isn't optional.
Let's also move the other mandatory flag to come after
$(SPATCH_FLAGS), to ensure that our "--sp-file" overrides any provided
in the environment, both --sp-file <arg> and --patch <arg> are
last-option-wins as far as spatch(1) option parsing is concerned.
The environment variable override was initially added in a9a884aea57 (coccicheck: use --all-includes by default,
2016-09-30). In practice there's probably nobody that's using
SPATCH_FLAGS to try to intentionally break our invocations, but since
we're changing this let's make it clear what (if anything) we expect
to be overridden by user-supplied flags.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
merge-ort: fix issue with dual rename and add/add conflict
There is code in both merge-recursive and merge-ort for avoiding doubly
transitive renames (i.e. one side renames directory A/ -> B/, and the
other side renames directory B/ -> C/), because this combination would
otherwise make a mess for new files added to A/ on the first side and
wondering which directory they end up in -- especially if there were
even more renames such as the first side renaming C/ -> D/. In such
cases, it just turns "off" directory rename detection for the higher
order transitive cases.
The testcases added in t6423 a couple commits ago are slightly different
but similar in principle. They involve a similar case of paired
renaming but instead of A/ -> B/ and B/ -> C/, the second side renames
a leading directory of B/ to C/. And both sides add a new file
somewhere under the directory that the other side will rename. While
the new files added start within different directories and thus could
logically end up within different directories, it is weird for a file
on one side to end up where the other one started and not move along
with it. So, let's just turn off directory rename detection in this
case as well.
Another way to look at this is that if the source name involved in a
directory rename on one side is the target name of a directory rename
operation for a file from the other side, then we avoid the doubly
transitive rename. (More concretely, if a directory rename on side D
wants to rename a file on side E from OLD_NAME -> NEW_NAME, and side D
already had a file named NEW_NAME, and a directory rename on side E
wants to rename side D's NEW_NAME -> NEWER_NAME, then we turn off the
directory rename detection for NEW_NAME to prevent the
NEW_NAME -> NEWER_NAME rename, and instead end up with an add/add
conflict on NEW_NAME.)
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
merge-ort: shuffle the computation and cleanup of potential collisions
Run compute_collisions() for renames on both sides of history before
any calls to collect_renames(), and do not free the computed collisions
until after both calls to collect_renames(). This is just a code
reorganization at this point that doesn't make sense on its own, but
will permit us to use the computed collision info from both sides
within each call to collect_renames() in a subsequent commit.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t6423: add tests of dual directory rename plus add/add conflict
This is an attempt at minimalizing a testcase reported by Glen Choo
with tensorflow where merge-ort would report an assertion failure:
Assertion failed: (ci->filemask == 2 || ci->filemask == 4), function apply_directory_rename_modifications, file merge-ort.c, line 2410
reversing the direction of the merge provides a different error:
error: cache entry has null sha1: ...
fatal: unable to write .git/index
so we add testcases for both. With these new testcases, the
recursive strategy differs in that it returns the latter error for
both merge directions.
These testcases are somehow a little different than Glen's original
tensorflow testcase in that these ones trigger a bug with the recursive
algorithm whereas his testcase didn't. I figure that means these
testcases somehow manage to be more comprehensive.
Reported-by: Glen Choo <chooglen@google.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Mon, 4 Jul 2022 20:40:31 +0000 (13:40 -0700)]
Merge 'js/add-i-delete' into maint-2.37
Rewrite of "git add -i" in C that appeared in Git 2.25 didn't
correctly record a removed file to the index, which is an old
regression but has become widely known because the C version
has become the default in the latest release.
Shaoxuan Yuan [Thu, 30 Jun 2022 02:37:37 +0000 (10:37 +0800)]
mv: add check_dir_in_index() and solve general dir check issue
Originally, moving a <source> directory which is not on-disk due
to its existence outside of sparse-checkout cone, "giv mv" command
errors out with "bad source".
Add a helper check_dir_in_index() function to see if a directory
name exists in the index. Also add a SKIP_WORKTREE_DIR bit to mark
such directories.
Change the checking logic, so that such <source> directory makes
"giv mv" command warns with "advise_on_updating_sparse_paths()"
instead of "bad source"; also user now can supply a "--sparse" flag so
this operation can be carried out successfully.
Helped-by: Victoria Dye <vdye@github.com> Helped-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>