Junio C Hamano [Mon, 29 Jun 2020 21:17:24 +0000 (14:17 -0700)]
Merge branch 'xl/upgrade-repo-format'
Allow runtime upgrade of the repository format version, which needs
to be done carefully.
There is a rather unpleasant backward compatibility worry with the
last step of this series, but it is the right thing to do in the
longer term.
* xl/upgrade-repo-format:
check_repository_format_gently(): refuse extensions for old repositories
sparse-checkout: upgrade repository to version 1 when enabling extension
fetch: allow adding a filter after initial clone
repository: add a helper function to perform repository format upgrade
Junio C Hamano [Thu, 25 Jun 2020 19:27:47 +0000 (12:27 -0700)]
Merge branch 'jt/cdn-offload'
The "fetch/clone" protocol has been updated to allow the server to
instruct the clients to grab pre-packaged packfile(s) in addition
to the packed object data coming over the wire.
* jt/cdn-offload:
upload-pack: fix a sparse '0 as NULL pointer' warning
upload-pack: send part of packfile response as uri
fetch-pack: support more than one pack lockfile
upload-pack: refactor reading of pack-objects out
Documentation: add Packfile URIs design doc
Documentation: order protocol v2 sections
http-fetch: support fetching packfiles by URL
http-fetch: refactor into function
http: refactor finish_http_pack_request()
http: use --stdin when indexing dumb HTTP pack
Junio C Hamano [Thu, 25 Jun 2020 19:27:46 +0000 (12:27 -0700)]
Merge branch 'cc/upload-pack-data-3'
Code clean-up in the codepath that serves "git fetch" continues.
* cc/upload-pack-data-3:
upload-pack: refactor common code into do_got_oid()
upload-pack: move oldest_have to upload_pack_data
upload-pack: pass upload_pack_data to got_oid()
upload-pack: pass upload_pack_data to ok_to_give_up()
upload-pack: pass upload_pack_data to send_acks()
upload-pack: pass upload_pack_data to process_haves()
upload-pack: change allow_unadvertised_object_request to an enum
upload-pack: move allow_unadvertised_object_request to upload_pack_data
upload-pack: move extra_edge_obj to upload_pack_data
upload-pack: move shallow_nr to upload_pack_data
upload-pack: pass upload_pack_data to send_unshallow()
upload-pack: pass upload_pack_data to deepen_by_rev_list()
upload-pack: pass upload_pack_data to deepen()
upload-pack: pass upload_pack_data to send_shallow_list()
"git diff" used to take arguments in random and nonsense range
notation, e.g. "git diff A..B C", "git diff A..B C...D", etc.,
which has been cleaned up.
* ct/diff-with-merge-base-clarification:
Documentation: usage for diff combined commits
git diff: improve range handling
t/t3430: avoid undefined git diff behavior
Junio C Hamano [Thu, 25 Jun 2020 19:27:45 +0000 (12:27 -0700)]
Merge branch 'en/clean-cleanups'
Code clean-up of "git clean" resulted in a fix of recent
performance regression.
* en/clean-cleanups:
clean: optimize and document cases where we recurse into subdirectories
clean: consolidate handling of ignored parameters
dir, clean: avoid disallowed behavior
dir: fix a few confusing comments
Denton Liu [Tue, 23 Jun 2020 09:19:49 +0000 (05:19 -0400)]
builtin/diff: fix botched update of usage comment
In the previous commit, an attempt was made to correct the "N=1, M=0"
case. However, the fix was botched and it introduced two half-correct
sections by mistake. Combine these half-correct sections into one fully
correct section.
Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Mon, 22 Jun 2020 22:55:03 +0000 (15:55 -0700)]
Merge branch 'es/worktree-duplicate-paths'
The same worktree directory must be registered only once, but
"git worktree move" allowed this invariant to be violated, which
has been corrected.
* es/worktree-duplicate-paths:
worktree: make "move" refuse to move atop missing registered worktree
worktree: generalize candidate worktree path validation
worktree: prune linked worktree referencing main worktree path
worktree: prune duplicate entries referencing same worktree path
worktree: make high-level pruning re-usable
worktree: give "should be pruned?" function more meaningful name
worktree: factor out repeated string literal
Junio C Hamano [Mon, 22 Jun 2020 22:55:02 +0000 (15:55 -0700)]
Merge branch 'cc/upload-pack-data-2'
Further code clean-up.
* cc/upload-pack-data-2:
upload-pack: move pack_objects_hook to upload_pack_data
upload-pack: move allow_sideband_all to upload_pack_data
upload-pack: move allow_ref_in_want to upload_pack_data
upload-pack: move allow_filter to upload_pack_data
upload-pack: move keepalive to upload_pack_data
upload-pack: pass upload_pack_data to upload_pack_config()
upload-pack: change multi_ack to an enum
upload-pack: move multi_ack to upload_pack_data
upload-pack: move filter_capability_requested to upload_pack_data
upload-pack: move use_sideband to upload_pack_data
upload-pack: move static vars to upload_pack_data
upload-pack: annotate upload_pack_data fields
upload-pack: actually use some upload_pack_data bitfields
strbuf_write_fd was only used in bugreport.c. Since that file now uses
write_in_full, this method is no longer needed. In addition, strbuf_write_fd
did not guard against exceeding MAX_IO_SIZE for the platform, nor
provided error handling in the event of a failure if only partial data
was written to the file descriptor. Since already write_in_full has this
capability and is in general use, it should be used instead. The change
impacts strbuf.c and strbuf.h.
Signed-off-by: Randall S. Becker <rsbecker@nexbridge.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
bugreport.c: replace strbuf_write_fd with write_in_full
The strbuf_write_fd method did not provide checks for buffers larger
than MAX_IO_SIZE. Replacing with write_in_full ensures the entire
buffer will always be written to disk or report an error and die.
Signed-off-by: Randall S. Becker <rsbecker@nexbridge.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Fri, 19 Jun 2020 13:14:19 +0000 (15:14 +0200)]
pull: plug minor memory leak after using is_descendant_of()
cmd_pull() builds a commit_list to pass a single potential ancestor to
is_descendant_of(). The latter leaves the list intact. Release the
allocated memory after the call.
Leaking in cmd_*() isn't a big deal, but sets a bad example for other
users of is_descendant_of().
Signed-off-by: René Scharfe <l.s.r@web.de> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Paolo Bonzini [Fri, 19 Jun 2020 09:32:10 +0000 (11:32 +0200)]
t4014: do not use "slave branch" nomenclature
Git branches have been qualified as topic branches, integration branches,
development branches, feature branches, release branches and so on.
Git has a branch that is the master *for* development, but it is not
the master *of* any "slave branch": Git does not have slave branches,
and has never had, except for a single testcase that claims otherwise. :)
Independent of any future change to the naming of the "master" branch,
removing this sole appearance of the term is a strict improvement: it
avoids divisive language, and talking about "feature branch" clarifies
which developer workflow the test is trying to emulate.
Reported-by: Till Maas <tmaas@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Denton Liu [Thu, 18 Jun 2020 10:43:34 +0000 (06:43 -0400)]
builtin/diff: update usage comment
A comment in cmd_diff() states that if one tree-ish and no blobs are
provided, (the "N=1, M=0" case), it will provide a diff between the tree
and the cache. This is incorrect because a diff happens between the
tree-ish and the working tree. Remove the `--cached` in the comment so
that the correct behavior is shown. Add a new section describing the
"N=1, M=0, --cached" behavior.
Next, describe the "N=0, M=0, --cached" case, similar to the above since
it is undocumented.
Finally, fix some spacing issues. Add spaces between each section for
consistency and readability. Also, change tabs within the comment into
spaces.
Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Wed, 17 Jun 2020 17:24:29 +0000 (17:24 +0000)]
commit-reach: use fast logic in repo_in_merge_base
The repo_is_descendant_of() method is aware of the existence of the
commit-graph file. It checks for generation_numbers_enabled() before
deciding on using can_all_from_reach() or repo_in_merge_bases()
depending on the situation. The reason here is that can_all_from_reach()
uses a depth-first search that is limited by the minimum generation
number of the target commits, and that algorithm can be very slow when
generation numbers are not present. The alternative uses
paint_down_to_common() which will walk the entire merge-base boundary,
which is typically slower.
This method is used by commands like "git tag --contains" and "git
branch --contains" for very fast results when a commit-graph file
exists. Unfortunately, it is _not_ used in commands like "git merge-base
--is-ancestor" which is doing an even simpler request.
This issue was raised recently [1] with respect to a change to how
generation numbers are stored, but was also reported much earlier [2]
before commit-reach.c existed to simplify these reachability queries.
The root cause is that builtin/merge-base.c has a method
handle_is_ancestor() that calls in_merge_bases(), an older version of
repo_in_merge_bases(). It would be better if we have every caller to
in_merge_bases() use the logic in can_all_from_reach() when possible.
This is where things get a little tricky: repo_is_descendant_of() calls
repo_in_merge_bases() in the non-generation numbers enabled case! If we
simply update repo_in_merge_bases() to call repo_is_descendant_of()
instead of repo_in_merge_bases_many(), then we will get a recursive call
loop. Thankfully, this is caught by the test suite in the default mode
(i.e. GIT_TEST_COMMIT_GRAPH=0).
The trick, then, is to make the non-generation number case for
repo_is_descendant_of() call repo_in_merge_bases_many() directly,
skipping the non-_many version. This allows us to take advantage of this
faster code path, when possible.
The easiest way to measure the performance impact is to test the
following command on the Linux kernel repository:
git merge-base --is-ancestor <A> <B>
| A | B | Time Before | Time After |
|------|------|-------------|------------|
| v3.0 | v5.7 | 0.459s | 0.028s |
| v4.0 | v5.7 | 0.267s | 0.021s |
| v5.0 | v5.7 | 0.074s | 0.013s |
Note that each of these samples return success. The old code performed
the same operation when <A> and <B> are swapped. However,
can_all_from_reach() will return immediately if the generation numbers
show that <A> has larger generation number than <B>. Thus, the time for
the swapped case is universally 0.004s in each case.
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reported-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Wed, 17 Jun 2020 17:24:28 +0000 (17:24 +0000)]
commit-reach: create repo_is_descendant_of()
The next change will make repo_in_merge_bases() depend on the logic in
is_descendant_of(), but we need to make the method independent of
the_repository first.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Denton Liu [Mon, 15 Jun 2020 11:53:20 +0000 (07:53 -0400)]
branch: don't mix --edit-description
`git branch` accepts `--edit-description` in conjunction with other
arguments. However, `--edit-description` is its own mode, similar to
`--set-upstream-to`, which is also made mutually exclusive with other
modes. Prevent `--edit-description` from being mixed with other modes.
Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Denton Liu [Mon, 15 Jun 2020 11:53:19 +0000 (07:53 -0400)]
t3200: test for specific errors
In the "--set-upstream-to" and "--unset-upstream" tests, specific error
conditions are being tested. However, there is no way of ensuring that a
test case is failing because of some specific error.
Check stderr of failing commands to ensure that they are failing in the
expected way.
Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Thu, 11 Jun 2020 06:59:33 +0000 (06:59 +0000)]
clean: optimize and document cases where we recurse into subdirectories
Commit 6b1db43109 ("clean: teach clean -d to preserve ignored paths",
2017-05-23) added the following code block (among others) to git-clean:
if (remove_directories)
dir.flags |= DIR_SHOW_IGNORED_TOO | DIR_KEEP_UNTRACKED_CONTENTS;
The reason for these flags is well documented in the commit message, but
isn't obvious just from looking at the code. Add some explanations to
the code to make it clearer.
Further, it appears git-2.26 did not correctly handle this combination
of flags from git-clean. With both these flags and without
DIR_SHOW_IGNORED_TOO_MODE_MATCHING set, git is supposed to recurse into
all untracked AND ignored directories. git-2.26.0 clearly was not doing
that. I don't know the full reasons for that or whether git < 2.27.0
had additional unknown bugs because of that misbehavior, because I don't
feel it's worth digging into. As per the huge changes and craziness
documented in commit 8d92fb2927 ("dir: replace exponential algorithm
with a linear one", 2020-04-01), the old algorithm was a mess and was
thrown out. What I can say is that git-2.27.0 correctly recurses into
untracked AND ignored directories with that combination.
However, in clean's case we don't need to recurse into ignored
directories; that is just a waste of time. Thus, when git-2.27.0
started correctly handling those flags, we got a performance regression
report. Rather than relying on other bugs in fill_directory()'s former
logic to provide the behavior of skipping ignored directories, make use
of the DIR_SHOW_IGNORED_TOO_MODE_MATCHING value specifically added in
commit eec0f7f2b7 ("status: add option to show ignored files
differently", 2017-10-30) for this purpose.
Reported-by: Brian Malehorn <bmalehorn@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Thu, 11 Jun 2020 06:59:32 +0000 (06:59 +0000)]
clean: consolidate handling of ignored parameters
I spent a long time trying to figure out how and whether the code worked
with different values of ignore, ignore_only, and remove_directories.
After lots of time setting up lots of testcases, sifting through lots of
print statements, and walking through the debugger, I finally realized
that one piece of code related to how it was all setup was found in
clean.c rather than dir.c. Make a change that would have made it easier
for me to do the extra testing by putting this handling in one spot.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Thu, 11 Jun 2020 06:59:31 +0000 (06:59 +0000)]
dir, clean: avoid disallowed behavior
dir.h documented quite clearly that DIR_SHOW_IGNORED and
DIR_SHOW_IGNORED_TOO are mutually exclusive, with a big comment to this
effect by the definition of both enum values. However, a command like
git clean -fx $DIR
would set both values for dir.flags. I _think_ it happened to work
because:
* As dir.h points out, DIR_KEEP_UNTRACKED_CONTENTS only takes effect
if DIR_SHOW_IGNORED_TOO is set.
* As coded, I believe DIR_SHOW_IGNORED would just happen to take
precedence over DIR_SHOW_IGNORED_TOO in the code as currently
constructed.
Which is a long way of saying "we just got lucky".
Fix clean.c to avoid setting these mutually exclusive values at the same
time, and add a check to dir.c that will throw a BUG() to prevent anyone
else from making this mistake.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Wed, 10 Jun 2020 23:16:49 +0000 (23:16 +0000)]
git-sparse-checkout: clarify interactions with submodules
Ignoring the sparse-checkout feature momentarily, if one has a submodule and
creates local branches within it with unpushed changes and maybe adds some
untracked files to it, then we would want to avoid accidentally removing such
a submodule. So, for example with git.git, if you run
git checkout v2.13.0
then the sha1collisiondetection/ submodule is NOT removed even though it
did not exist as a submodule until v2.14.0. Similarly, if you only had
v2.13.0 checked out previously and ran
git checkout v2.14.0
the sha1collisiondetection/ submodule would NOT be automatically
initialized despite being part of v2.14.0. In both cases, git requires
submodules to be initialized or deinitialized separately. Further, we
also have special handling for submodules in other commands such as
clean, which requires two --force flags to delete untracked submodules,
and some commands have a --recurse-submodules flag.
sparse-checkout is very similar to checkout, as evidenced by the similar
name -- it adds and removes files from the working copy. However, for
the same avoid-data-loss reasons we do not want to remove a submodule
from the working copy with checkout, we do not want to do it with
sparse-checkout either. So submodules need to be separately initialized
or deinitialized; changing sparse-checkout rules should not
automatically trigger the removal or vivification of submodules.
I believe the previous wording in git-sparse-checkout.txt about
submodules was only about this particular issue. Unfortunately, the
previous wording could be interpreted to imply that submodules should be
considered active regardless of sparsity patterns. Update the wording
to avoid making such an implication. It may be helpful to consider two
example situations where the differences in wording become important:
In the future, we want users to be able to run commands like
git clone --sparse=moduleA --recurse-submodules $REPO_URL
and have sparsity paths automatically set up and have submodules *within
the sparsity paths* be automatically initialized. We do not want all
submodules in any path to be automatically initialized with that
command.
Similarly, we want to be able to do things like
git -c sparse.restrictCmds grep --recurse-submodules $REV $PATTERN
and search through $REV for $PATTERN within the recorded sparsity
patterns. We want it to recurse into submodules within those sparsity
patterns, but do not want to recurse into directories that do not match
the sparsity patterns in search of a possible submodule.
Signed-off-by: Elijah Newren <newren@gmail.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Fri, 12 Jun 2020 20:57:13 +0000 (13:57 -0700)]
Merge branch 'hn/refs-cleanup'
Preliminary clean-ups around refs API, plus file format
specification documentation for the reftable backend.
* hn/refs-cleanup:
reftable: define version 2 of the spec to accomodate SHA256
reftable: clarify how empty tables should be written
reftable: file format documentation
refs: improve documentation for ref iterator
t: use update-ref and show-ref to reading/writing refs
refs.h: clarify reflog iteration order
Chris Torek [Fri, 12 Jun 2020 16:19:59 +0000 (16:19 +0000)]
git diff: improve range handling
When git diff is given a symmetric difference A...B, it chooses
some merge base from the two specified commits (as documented).
This fails, however, if there is *no* merge base: instead, you
see the differences between A and B, which is certainly not what
is expected.
Moreover, if additional revisions are specified on the command
line ("git diff A...B C"), the results get a bit weird:
* If there is a symmetric difference merge base, this is used
as the left side of the diff. The last final ref is used as
the right side.
* If there is no merge base, the symmetric status is completely
lost. We will produce a combined diff instead.
Similar weirdness occurs if you use, e.g., "git diff C A...B D".
Likewise, using multiple two-dot ranges, or tossing extra
revision specifiers into the command line with two-dot ranges,
or mixing two and three dot ranges, all produce nonsense.
To avoid all this, add a routine to catch the range cases and
verify that that the arguments make sense. As a side effect,
produce a warning showing *which* merge base is being used when
there are multiple choices; die if there is no merge base.
Signed-off-by: Chris Torek <chris.torek@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Christian Couder [Thu, 11 Jun 2020 12:05:16 +0000 (14:05 +0200)]
upload-pack: pass upload_pack_data to got_oid()
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's pass that struct to got_oid(), so that
this function can use all the fields of the struct.
This will be used in followup commits to move a static variable
into 'upload_pack_data'.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Christian Couder [Thu, 11 Jun 2020 12:05:15 +0000 (14:05 +0200)]
upload-pack: pass upload_pack_data to ok_to_give_up()
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's pass that struct to ok_to_give_up(), so
that this function can use all the fields of the struct.
This will be used in followup commits to move a static variable
into 'upload_pack_data'.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Christian Couder [Thu, 11 Jun 2020 12:05:14 +0000 (14:05 +0200)]
upload-pack: pass upload_pack_data to send_acks()
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's pass that struct to send_acks(), so
that this function can use all the fields of the struct.
This will be used in followup commits to move a static variable
into 'upload_pack_data'.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Christian Couder [Thu, 11 Jun 2020 12:05:13 +0000 (14:05 +0200)]
upload-pack: pass upload_pack_data to process_haves()
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's pass that struct to process_haves(), so
that this function can use all the fields of the struct.
This will be used in followup commits to move a static variable
into 'upload_pack_data'.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Christian Couder [Thu, 11 Jun 2020 12:05:12 +0000 (14:05 +0200)]
upload-pack: change allow_unadvertised_object_request to an enum
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's change allow_unadvertised_object_request,
which is now part of 'upload_pack_data', from an 'unsigned int'
to an enum.
This will make it clear which values this variable can take.
While at it let's change this variable name to 'allow_uor' to
make it shorter.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Christian Couder [Thu, 11 Jun 2020 12:05:11 +0000 (14:05 +0200)]
upload-pack: move allow_unadvertised_object_request to upload_pack_data
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's move the 'allow_unadvertised_object_request'
static variable into this struct.
It is used by code common to protocol v0 and protocol v2.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Christian Couder [Thu, 11 Jun 2020 12:05:08 +0000 (14:05 +0200)]
upload-pack: pass upload_pack_data to send_unshallow()
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's pass that struct to send_unshallow(), so
that this function can use all the fields of the struct.
This will be used in followup commits to move static variables
into 'upload_pack_data'.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Christian Couder [Thu, 11 Jun 2020 12:05:07 +0000 (14:05 +0200)]
upload-pack: pass upload_pack_data to deepen_by_rev_list()
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's pass that struct to deepen_by_rev_list(),
so that this function can use all the fields of the struct.
This will be used in followup commits to move static variables
into 'upload_pack_data'.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Christian Couder [Thu, 11 Jun 2020 12:05:06 +0000 (14:05 +0200)]
upload-pack: pass upload_pack_data to deepen()
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's pass that struct to deepen(), so that
this function can use all the fields of the struct.
This will be used in followup commits to move static variables
into 'upload_pack_data'.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Christian Couder [Thu, 11 Jun 2020 12:05:05 +0000 (14:05 +0200)]
upload-pack: pass upload_pack_data to send_shallow_list()
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's pass that struct to send_shallow_list(),
so that this function can use all the fields of the struct.
This will be used in followup commits to move static variables
into 'upload_pack_data'.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jonathan Tan [Wed, 10 Jun 2020 20:57:23 +0000 (13:57 -0700)]
upload-pack: send part of packfile response as uri
Teach upload-pack to send part of its packfile response as URIs.
An administrator may configure a repository with one or more
"uploadpack.blobpackfileuri" lines, each line containing an OID, a pack
hash, and a URI. A client may configure fetch.uriprotocols to be a
comma-separated list of protocols that it is willing to use to fetch
additional packfiles - this list will be sent to the server. Whenever an
object with one of those OIDs would appear in the packfile transmitted
by upload-pack, the server may exclude that object, and instead send the
URI. The client will then download the packs referred to by those URIs
before performing the connectivity check.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jonathan Tan [Wed, 10 Jun 2020 20:57:22 +0000 (13:57 -0700)]
fetch-pack: support more than one pack lockfile
Whenever a fetch results in a packfile being downloaded, a .keep file is
generated, so that the packfile can be preserved (from, say, a running
"git repack") until refs are written referring to the contents of the
packfile.
In a subsequent patch, a successful fetch using protocol v2 may result
in more than one .keep file being generated. Therefore, teach
fetch_pack() and the transport mechanism to support multiple .keep
files.
Implementation notes:
- builtin/fetch-pack.c normally does not generate .keep files, and thus
is unaffected by this or future changes. However, it has an
undocumented "--lock-pack" feature, used by remote-curl.c when
implementing the "fetch" remote helper command. In keeping with the
remote helper protocol, only one "lock" line will ever be written;
the rest will result in warnings to stderr. However, in practice,
warnings will never be written because the remote-curl.c "fetch" is
only used for protocol v0/v1 (which will not generate multiple .keep
files). (Protocol v2 uses the "stateless-connect" command, not the
"fetch" command.)
- connected.c has an optimization in that connectivity checks on a ref
need not be done if the target object is in a pack known to be
self-contained and connected. If there are multiple packfiles, this
optimization can no longer be done.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jonathan Tan [Wed, 10 Jun 2020 20:57:21 +0000 (13:57 -0700)]
upload-pack: refactor reading of pack-objects out
Subsequent patches will change how the output of pack-objects is
processed, so extract that processing into its own function.
Currently, at most 1 character can be buffered (in the "buffered" local
variable). One of those patches will require a larger buffer, so replace
that "buffered" local variable with a buffer array.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jonathan Tan [Wed, 10 Jun 2020 20:57:19 +0000 (13:57 -0700)]
Documentation: order protocol v2 sections
The current C Git implementation expects Git servers to follow a
specific order of sections when transmitting protocol v2 responses, but
this is not explicit in the documentation. Make the order explicit.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jonathan Tan [Wed, 10 Jun 2020 20:57:18 +0000 (13:57 -0700)]
http-fetch: support fetching packfiles by URL
Teach http-fetch the ability to download packfiles directly, given a
URL, and to verify them.
The http_pack_request suite has been augmented with a function that
takes a URL directly. With this function, the hash is only used to
determine the name of the temporary file.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jonathan Tan [Wed, 10 Jun 2020 20:57:16 +0000 (13:57 -0700)]
http: refactor finish_http_pack_request()
finish_http_pack_request() does multiple tasks, including some
housekeeping on a struct packed_git - (1) closing its index, (2)
removing it from a list, and (3) installing it. These concerns are
independent of fetching a pack through HTTP: they are there only because
(1) the calling code opens the pack's index before deciding to fetch it,
(2) the calling code maintains a list of packfiles that can be fetched,
and (3) the calling code fetches it in order to make use of its objects
in the same process.
In preparation for a subsequent commit, which adds a feature that does
not need any of this housekeeping, remove (1), (2), and (3) from
finish_http_pack_request(). (2) and (3) are now done by a helper
function, and (1) is the responsibility of the caller (in this patch,
done closer to the point where the pack index is opened).
Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jonathan Tan [Wed, 10 Jun 2020 20:57:15 +0000 (13:57 -0700)]
http: use --stdin when indexing dumb HTTP pack
When Git fetches a pack using dumb HTTP, (among other things) it invokes
index-pack on a ".pack.temp" packfile, specifying the filename as an
argument.
A future commit will require the aforementioned invocation of index-pack
to also generate a "keep" file. To use this, we either have to use
index-pack's naming convention (because --keep requires the pack's
filename to end with ".pack") or to pass the pack through stdin. Of the
two, it is simpler to pass the pack through stdin.
Thus, teach http to pass --stdin to index-pack. As a bonus, the code is
now simpler.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Eric Sunshine [Wed, 10 Jun 2020 06:30:49 +0000 (02:30 -0400)]
worktree: make "move" refuse to move atop missing registered worktree
"git worktree add" takes special care to avoid creating a new worktree
at a location already registered to an existing worktree even if that
worktree is missing (which can happen, for instance, if the worktree
resides on removable media). "git worktree move", however, is not so
careful when validating the destination location and will happily move
the source worktree atop the location of a missing worktree. This leads
to the anomalous situation of multiple worktrees being associated with
the same path, which is expressly forbidden by design. For example:
$ git clone foo.git
$ cd foo
$ git worktree add ../bar
$ git worktree add ../baz
$ rm -rf ../bar
$ git worktree move ../baz ../bar
$ git worktree list
.../foo beefd00f [master]
.../bar beefd00f [bar]
.../bar beefd00f [baz]
$ git worktree remove ../bar
fatal: validation failed, cannot remove working tree:
'.../bar' does not point back to '.git/worktrees/bar'
Fix this shortcoming by enhancing "git worktree move" to perform the
same additional validation of the destination directory as done by "git
worktree add".
While at it, add a test to verify that "git worktree move" won't move a
worktree atop an existing (non-worktree) path -- a restriction which has
always been in place but was never tested.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git worktree add" checks that the specified path is a valid location
for a new worktree by ensuring that the path does not already exist and
is not already registered to another worktree (a path can be registered
but missing, for instance, if it resides on removable media). Since "git
worktree add" is not the only command which should perform such
validation ("git worktree move" ought to also), generalize the the
validation function for use by other callers, as well.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Eric Sunshine [Wed, 10 Jun 2020 06:30:47 +0000 (02:30 -0400)]
worktree: prune linked worktree referencing main worktree path
"git worktree prune" detects when multiple entries are associated with
the same path and prunes the duplicates, however, it does not detect
when a linked worktree points at the path of the main worktree.
Although "git worktree add" disallows creating a new worktree with the
same path as the main worktree, such a case can arise outside the
control of Git even without the user mucking with .git/worktree/<id>/
administrative files. For instance:
$ git clone foo.git
$ git -C foo worktree add ../bar
$ rm -rf bar
$ mv foo bar
$ git -C bar worktree list
.../bar deadfeeb [master]
.../bar deadfeeb [bar]
Help the user recover from such corruption by extending "git worktree
prune" to also detect when a linked worktree is associated with the path
of the main worktree.
Reported-by: Jonathan Müller <jonathanmueller.dev@gmail.com> Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Eric Sunshine [Wed, 10 Jun 2020 06:30:46 +0000 (02:30 -0400)]
worktree: prune duplicate entries referencing same worktree path
A fundamental restriction of linked working trees is that there must
only ever be a single worktree associated with a particular path, thus
"git worktree add" explicitly disallows creation of a new worktree at
the same location as an existing registered worktree. Nevertheless,
users can still "shoot themselves in the foot" by mucking with
administrative files in .git/worktree/<id>/. Worse, "git worktree move"
is careless[1] and allows a worktree to be moved atop a registered but
missing worktree (which can happen, for instance, if the worktree is on
removable media). For instance:
Eric Sunshine [Wed, 10 Jun 2020 06:30:45 +0000 (02:30 -0400)]
worktree: make high-level pruning re-usable
The low-level logic for removing a worktree is well encapsulated in
delete_git_dir(). However, high-level details related to pruning a
worktree -- such as dealing with verbosity and dry-run mode -- are not
encapsulated. Factor out this high-level logic into its own function so
it can be re-used as new worktree corruption detectors are added.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Eric Sunshine [Wed, 10 Jun 2020 06:30:44 +0000 (02:30 -0400)]
worktree: give "should be pruned?" function more meaningful name
Readers of the name prune_worktree() are likely to expect the function
to actually prune a worktree, however, it only answers the question
"should this worktree be pruned?". Give it a name more reflective of its
true purpose to avoid such confusion.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Chris Torek [Tue, 9 Jun 2020 19:00:21 +0000 (19:00 +0000)]
t/t3430: avoid undefined git diff behavior
The autosquash-and-exec test used "git diff HEAD^!" to mean
"git diff HEAD^ HEAD". Use these directly instead of relying
on the undefined but actual-current behavior of "HEAD^!".
Signed-off-by: Chris Torek <chris.torek@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Han-Wen Nienhuys [Wed, 20 May 2020 17:36:11 +0000 (17:36 +0000)]
reftable: clarify how empty tables should be written
The format allows for some ambiguity, as a lone footer also starts
with a valid file header. However, the current JGit code will barf on
this. This commit codifies this behavior into the standard.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jonathan Nieder [Wed, 20 May 2020 17:36:10 +0000 (17:36 +0000)]
reftable: file format documentation
Shawn Pearce explains:
Some repositories contain a lot of references (e.g. android at 866k,
rails at 31k). The reftable format provides:
- Near constant time lookup for any single reference, even when the
repository is cold and not in process or kernel cache.
- Near constant time verification if a SHA-1 is referred to by at least
one reference (for allow-tip-sha1-in-want).
- Efficient lookup of an entire namespace, such as `refs/tags/`.
- Support atomic push `O(size_of_update)` operations.
- Combine reflog storage with ref storage.
This file format spec was originally written in July, 2017 by Shawn
Pearce. Some refinements since then were made by Shawn and by Han-Wen
Nienhuys based on experiences implementing and experimenting with the
format. (All of this was in the context of our work at Google and
Google is happy to contribute the result to the Git project.)
Imported from JGit[1]'s current version (c217d33ff,
"Documentation/technical/reftable: improve repo layout", 2020-02-04)
of Documentation/technical/reftable.md and converted to asciidoc by
running
using pandoc 2.2.1. The result required the following additional
minor changes:
- removed the [TOC] directive to add a table of contents, since
asciidoc does not support it
- replaced git-scm.com/docs links with linkgit: directives that link
to other pages within Git's documentation
[1] https://eclipse.googlesource.com/jgit/jgit
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 9 Jun 2020 01:06:32 +0000 (18:06 -0700)]
Merge branch 'cc/upload-pack-data'
Code clean-up.
* cc/upload-pack-data:
upload-pack: use upload_pack_data fields in receive_needs()
upload-pack: pass upload_pack_data to create_pack_file()
upload-pack: remove static variable 'stateless_rpc'
upload-pack: pass upload_pack_data to check_non_tip()
upload-pack: pass upload_pack_data to send_ref()
upload-pack: move symref to upload_pack_data
upload-pack: use upload_pack_data writer in receive_needs()
upload-pack: pass upload_pack_data to receive_needs()
upload-pack: pass upload_pack_data to get_common_commits()
upload-pack: use 'struct upload_pack_data' in upload_pack()
upload-pack: move 'struct upload_pack_data' around
upload-pack: move {want,have}_obj to upload_pack_data
upload-pack: remove unused 'wants' from upload_pack_data
Junio C Hamano [Tue, 9 Jun 2020 01:06:30 +0000 (18:06 -0700)]
Merge branch 'dl/remote-curl-deadlock-fix'
On-the-wire protocol v2 easily falls into a deadlock between the
remote-curl helper and the fetch-pack process when the server side
prematurely throws an error and disconnects. The communication has
been updated to make it more robust.
* dl/remote-curl-deadlock-fix:
stateless-connect: send response end packet
pkt-line: define PACKET_READ_RESPONSE_END
remote-curl: error on incomplete packet
pkt-line: extern packet_length()
transport: extract common fetch_pack() call
remote-curl: remove label indentation
remote-curl: fix typo
Junio C Hamano [Tue, 9 Jun 2020 01:06:29 +0000 (18:06 -0700)]
Merge branch 'vs/complete-stash-show-p-fix'
The command line completion script (in contrib/) tried to complete
"git stash -p" as if it were "git stash push -p", but it was too
aggressive and also affected "git stash show -p", which has been
corrected.
* vs/complete-stash-show-p-fix:
completion: don't override given stash subcommand with -p
Junio C Hamano [Tue, 9 Jun 2020 01:06:28 +0000 (18:06 -0700)]
Merge branch 'rs/fsck-duplicate-names-in-trees'
The check in "git fsck" to ensure that the tree objects are sorted
still had corner cases it missed unsorted entries.
* rs/fsck-duplicate-names-in-trees:
fsck: detect more in-tree d/f conflicts
t1450: demonstrate undetected in-tree d/f conflict
t1450: increase test coverage of in-tree d/f detection
fsck: fix a typo in a comment
Junio C Hamano [Tue, 9 Jun 2020 01:06:27 +0000 (18:06 -0700)]
Merge branch 'cb/t4210-illseq-auto-detect'
As FreeBSD is not the only platform whose regexp library reports
a REG_ILLSEQ error when fed invalid UTF-8, add logic to detect that
automatically and skip the affected tests.
* cb/t4210-illseq-auto-detect:
t4210: detect REG_ILLSEQ dynamically and skip affected tests
t/helper: teach test-regex to report pattern errors (like REG_ILLSEQ)
Emily Shaffer [Mon, 8 Jun 2020 21:11:32 +0000 (14:11 -0700)]
docs: mention MyFirstContribution in more places
While the MyFirstContribution guide exists and has received some use and
positive reviews, it is still not as discoverable as it could be. Add a
reference to it from the GitHub pull request template, where many
brand-new contributors may look. Also add a reference to it in
SubmittingPatches, which is the central source of guidance for patch
contribution.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Reviewed-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Eric Sunshine [Mon, 8 Jun 2020 06:23:49 +0000 (02:23 -0400)]
worktree: factor out repeated string literal
For each worktree removed by "git worktree prune", it reports the reason
for the removal. All reasons share the common prefix "Removing
worktrees/%s:". As new removal reasons are added, this prefix needs to
be duplicated, which is error-prone and potentially cumbersome.
Therefore, factor out the common prefix.
Although this change seems to increase the "sentence lego quotient", it
should be reasonably safe, as the reason for removal is a distinct
clause, not strictly related to the prefix. Moreover, the "worktrees" in
"Removing worktrees/%s:" is a path literal which ought not be localized,
so by factoring it out, we can more easily avoid exposing that path
fragment to translators.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Denton Liu [Sun, 7 Jun 2020 10:21:06 +0000 (06:21 -0400)]
CodingGuidelines: specify Python 2.7 is the oldest version
In 0b4396f068 (git-p4: make python2.7 the oldest supported version,
2019-12-13), git-p4 was updated to only support 2.7 and newer. Since
Python 2.6 is pretty much ancient history, update CodingGuidelines to
show that 2.7 is the oldest version supported.
Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Denton Liu [Sun, 7 Jun 2020 09:48:26 +0000 (05:48 -0400)]
t/README: avoid poor-man's small caps GIT
In 48a8c26c62 (Documentation: avoid poor-man's small caps GIT,
2013-01-21), the documentation was amended to spell Git's name as Git
when talking about the system as a whole. However, t/README was skipped
over when the treatment was applied.
Bring t/README into conformance with the CodingGuidelines by casing
"Git" properly.
While we're at it, fix a small typo. Change "the git internal" to "the
Git internals".
Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>