We use `xmmap()` to map reftables into memory. This function has two
problems:
- It causes us to die in case the mmap fails.
- It ties us to the Git codebase.
Refactor the code to use mmap(3p) instead with manual error checking.
Note that this function may not be the system-provided mmap(3p), but may
point to our `git_mmap()` wrapper that emulates the syscall on systems
that do not have mmap(3p) available.
Fix `reftable_block_source_from_file()` to properly bubble up the error
code in case the map(3p) call fails.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to the preceding commit, drop our use of `write_in_full()` and
implement a new wrapper `reftable_write_full()` that handles this logic
for us. This is done to reduce our dependency on the Git library.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is a single callsite of `read_in_full()` in the reftable library.
Open-code the function to reduce our dependency on the Git library.
Note that we only partially port over the logic from `read_in_full()`
and its underlying `xread()` helper. Most importantly, the latter also
knows to handle `EWOULDBLOCK` via `handle_nonblock()`. This logic is
irrelevant for us though because the reftable library never sets the
`O_NONBLOCK` option in the first place.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
brian m. carlson [Mon, 17 Feb 2025 17:57:59 +0000 (17:57 +0000)]
diff: don't crash with empty argument to -G or -S
The pickaxe options, -G and -S, need either a regex or a string to look
through the history for. An empty value isn't very useful since it
would either match everything or nothing, and what's worse, we presently
crash with a BUG like so when the user provides one:
BUG: diffcore-pickaxe.c:241: should have needle under -G or -S
Since it's not very nice of us to crash and this wouldn't do anything
useful anyway, let's simply inform the user that they must provide a
non-empty argument and exit with an error if they provide an empty one
instead.
Reported-by: Jared Van Bortel <cebtenzzre@gmail.com> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Phillip Wood [Tue, 18 Feb 2025 16:24:39 +0000 (16:24 +0000)]
merge-tree: fix link formatting in html docs
In the html documentation the link to the "OUTPUT" section is surrounded
by square brackets. Fix this by adding explicit link text to the cross
reference.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Phillip Wood [Tue, 18 Feb 2025 16:24:38 +0000 (16:24 +0000)]
merge-tree: improve docs for --stdin
Add a section for --stdin in the list of options and document that it
implies -z so readers know how to parse the output. Also correct the
merge status documentation for --stdin as if the status is less than
zero "git merge-tree" dies before printing it.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Phillip Wood [Tue, 18 Feb 2025 16:24:37 +0000 (16:24 +0000)]
merge-tree: only use basic merge config
Commit 9c93ba4d0ae (merge-recursive: honor diff.algorithm, 2024-07-13)
replaced init_merge_options() with init_basic_merge_config() for use in
plumbing commands and init_ui_merge_config() for use in porcelain
commands. As "git merge-tree" is a plumbing command it should call
init_basic_merge_config() rather than init_ui_merge_config(). The merge
ort machinery ignores "diff.algorithm" so the behavior is unchanged by
this commit but it future proofs us against any future changes to
init_ui_merge_config().
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Phillip Wood [Tue, 18 Feb 2025 16:24:36 +0000 (16:24 +0000)]
merge-tree: remove redundant code
real_merge() only ever returns "0" or "1" as it dies if the merge status
is less than zero. Therefore the check for "result < 0" is redundant and
the result variable is not needed. The return value of real_merge() is
ignored because exit status of "git merge-tree --stdin" is "0" for both
successful and conflicted merges (the status of each merge is written to
stdout). The return type of real_merge() is not changed as it is used
for the program's exit status when "--stdin" is not given.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Phillip Wood [Tue, 18 Feb 2025 16:24:35 +0000 (16:24 +0000)]
merge-tree --stdin: flush stdout to avoid deadlock
If a process tries to read the output from "git merge-tree --stdin"
before it closes merge-tree's stdin then it deadlocks. This happens
because merge-tree does not flush its output before trying to read
another line of input and means that it is not possible to cherry-pick a
sequence of commits using "git merge-tree --stdin". Fix this by calling
maybe_flush_or_die() before trying to read the next line of
input. Flushing the output after each merge does not seem to affect the
performance, any difference is lost in the noise even after increasing
the number of runs.
$ git rev-list --merges --parents -n100 origin/master |
sed 's/^[^ ]* //' >/tmp/merges
$ hyperfine -L flush 0,1 --warmup 1 --runs 30 \
'GIT_FLUSH={flush} ./git merge-tree --stdin </tmp/merges'
Benchmark 1: GIT_FLUSH=0 ./git merge-tree --stdin </tmp/merges
Time (mean ± σ): 546.6 ms ± 11.7 ms [User: 503.2 ms, System: 40.9 ms]
Range (min … max): 535.9 ms … 567.7 ms 30 runs
Benchmark 2: GIT_FLUSH=1 ./git merge-tree --stdin </tmp/merges
Time (mean ± σ): 546.9 ms ± 12.0 ms [User: 505.9 ms, System: 38.9 ms]
Range (min … max): 529.8 ms … 570.0 ms 30 runs
Summary
'GIT_FLUSH=0 ./git merge-tree --stdin </tmp/merges' ran
1.00 ± 0.03 times faster than 'GIT_FLUSH=1 ./git merge-tree --stdin </tmp/merges'
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Meet Soni [Sat, 15 Feb 2025 08:45:39 +0000 (14:15 +0530)]
refspec: clarify function naming and documentation
Rename `match_name_with_pattern()` to `match_refname_with_pattern()` to
better reflect its purpose and improve documentation comment clarity.
The previous function name and parameter names were inconsistent, making
it harder to understand their roles in refspec matching.
Usman Akinyemi [Sat, 15 Feb 2025 15:50:51 +0000 (21:20 +0530)]
t5701: add setup test to remove side-effect dependency
Currently, the "test capability advertisement" test creates some files
with expected content which are used by other tests below it.
To remove that side-effect from this test, let's split up part of
it into a "setup"-type test which creates the files with expected content
which gets reused by multiple tests. This will be useful in a following
commit.
Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Usman Akinyemi [Sat, 15 Feb 2025 15:50:50 +0000 (21:20 +0530)]
version: extend get_uname_info() to hide system details
Currently, get_uname_info() function provides the full OS information.
In a following commit, we will need it to provide only the OS name.
Let's extend it to accept a "full" flag that makes it switch between
providing full OS information and providing only the OS name.
We may need to refactor this function in the future if an
`osVersion.format` is added.
Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Usman Akinyemi [Sat, 15 Feb 2025 15:50:49 +0000 (21:20 +0530)]
version: refactor get_uname_info()
Some code from "builtin/bugreport.c" uses uname(2) to get system
information.
Let's refactor this code into a new get_uname_info() function, so
that we can reuse it in a following commit.
Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Usman Akinyemi [Sat, 15 Feb 2025 15:50:48 +0000 (21:20 +0530)]
version: refactor redact_non_printables()
The git_user_agent_sanitized() function performs some sanitizing to
avoid special characters being sent over the line and possibly messing
up with the protocol or with the parsing on the other side.
Let's extract this sanitizing into a new redact_non_printables() function,
as we will want to reuse it in a following patch.
For now the new redact_non_printables() function is still static as
it's only needed locally.
While at it, let's use strbuf_detach() to explicitly detach the string
contained by the 'buf' strbuf.
Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Usman Akinyemi [Sat, 15 Feb 2025 15:50:47 +0000 (21:20 +0530)]
version: replace manual ASCII checks with isprint() for clarity
Since the isprint() function checks for printable characters, let's
replace the existing hardcoded ASCII checks with it. However, since
the original checks also handled spaces, we need to account for spaces
explicitly in the new check.
Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adam Dinwoodie [Sat, 15 Feb 2025 21:19:03 +0000 (21:19 +0000)]
Makefile: set default goals in makefiles
Explicitly set the default goal at the very top of various makefiles.
This is already present in some makefiles, but not all of them.
In particular, this corrects a regression introduced in a38edab7c8
(Makefile: generate doc versions via GIT-VERSION-GEN, 2024-12-06). That
commit added some config files as build targets for the Documentation
directory, and put the target configuration in a sensible place.
Unfortunately, that sensible place was above any other build target
definitions, meaning the default goal changed to being those
configuration files only, rather than the HTML and man page
documentation.
Signed-off-by: Adam Dinwoodie <adam@dinwoodie.org> Helped-by: Junio C Hamano <gitster@pobox.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Sat, 15 Feb 2025 01:53:48 +0000 (17:53 -0800)]
Merge branch 'kn/reflog-migration-fix-followup'
Code clean-up.
* kn/reflog-migration-fix-followup:
reftable: prevent 'update_index' changes after adding records
refs: use 'uint64_t' for 'ref_update.index'
refs: mark `ref_transaction_update_reflog()` as static
Junio C Hamano [Sat, 15 Feb 2025 01:53:48 +0000 (17:53 -0800)]
Merge branch 'bf/fetch-set-head-fix'
Fetching into a bare repository incorrectly assumed it always used
a mirror layout when deciding to update remote-tracking HEAD, which
has been corrected.
* bf/fetch-set-head-fix:
fetch set_head: fix non-mirror remotes in bare repositories
fetch set_head: refactor to use remote directly
Junio C Hamano [Sat, 15 Feb 2025 01:53:48 +0000 (17:53 -0800)]
Merge branch 'op/worktree-is-main-bare-fix'
Going into a secondary worktree and asking "is the main worktree
bare?" did not work correctly when per-worktree configuration
option was in use, which has been corrected.
* op/worktree-is-main-bare-fix:
worktree: detect from secondary worktree if main worktree is bare
Junio C Hamano [Sat, 15 Feb 2025 01:53:47 +0000 (17:53 -0800)]
Merge branch 'tc/clone-single-revision'
"git clone" learned to make a shallow clone for a single commit
that is not necessarily be at the tip of any branch.
* tc/clone-single-revision:
builtin/clone: teach git-clone(1) the --revision= option
parse-options: introduce die_for_incompatible_opt2()
clone: introduce struct clone_opts in builtin/clone.c
clone: add tags refspec earlier to fetch refspec
clone: refactor wanted_peer_refs()
clone: make it possible to specify --tags
clone: cut down on global variables in clone.c
When 'remote.<name>.followRemoteHEAD' was added in b7f7d16562 (fetch:
add configuration for set_head behaviour, 2024-11-29), its description
was added to remote.txt in between the two paragraphs describing
'remote.<name>.serverOption'. Reunite these two paragraphs.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Meet Soni [Fri, 14 Feb 2025 04:41:29 +0000 (10:11 +0530)]
merge-recursive: optimize time complexity for process_renames
Avoid O(n^2) complexity in `process_renames()` when building a sorted
`string_list` by constructing it unsorted and sorting it afterward,
reducing the complexity to O(n log n).
Signed-off-by: Meet Soni <meetsoni3017@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Thu, 13 Feb 2025 20:25:50 +0000 (15:25 -0500)]
Makefile: remove accidental recipe prefix in conditional
Back in 728b9ac0c3 (Makefile(s): avoid recipe prefix in conditional
statements, 2024-04-08), we prepared our Makefiles for a forthcoming
change in upstream Make that would ban the recipe prefix within a
conditional statement by replacing tabs (the prefix) with eight spaces.
In b9d6f64393 (compat/zlib: allow use of zlib-ng as backend,
2025-01-28), a handful of recipe prefix characters were introduced in a
conditional statement ('ifdef ZLIB_NG'), causing 'make' to fail on my
system, which uses GNU Make 4.4.90.
Remove the recipe prefix characters by replacing them with the same
script as is mentioned in 728b9ac0c3.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 12 Feb 2025 18:08:54 +0000 (10:08 -0800)]
Merge branch 'sc/help-autocorrect-one'
"[help] autocorrect = 1" used to be a way to say "please wait for
0.1 second after suggesting a typofix of the command name before
running that command"; now it means "yes, if there is a plausible
typofix for the command name, please run it immediately".
* sc/help-autocorrect-one:
help: interpret boolean string values for help.autocorrect
Junio C Hamano [Wed, 12 Feb 2025 18:08:53 +0000 (10:08 -0800)]
Merge branch 'js/libgit-rust'
Foreign language interface for Rust into our code base has been added.
* js/libgit-rust:
libgit: add higher-level libgit crate
libgit-sys: also export some config_set functions
libgit-sys: introduce Rust wrapper for libgit.a
common-main: split init and exit code into new files
"git repack --keep-unreachable" to send unreachable objects to the
main pack "git repack -ad" produces did not work when there is no
existing packs, which has been corrected.
* ps/repack-keep-unreachable-in-unpacked-repo:
builtin/repack: fix `--keep-unreachable` when there are no packs
Junio C Hamano [Wed, 12 Feb 2025 18:08:51 +0000 (10:08 -0800)]
Merge branch 'ds/name-hash-tweaks'
"git pack-objects" and its wrapper "git repack" learned an option
to use an alternative path-hash function to improve delta-base
selection to produce a packfile with deeper history than window
size.
* ds/name-hash-tweaks:
pack-objects: prevent name hash version change
test-tool: add helper for name-hash values
p5313: add size comparison test
pack-objects: add GIT_TEST_NAME_HASH_VERSION
repack: add --name-hash-version option
pack-objects: add --name-hash-version option
pack-objects: create new name-hash function version
Elijah Newren [Tue, 11 Feb 2025 21:01:52 +0000 (21:01 +0000)]
doc: clarify the intent of the renormalize option in the merge machinery
The -X renormalize (or merge.renormalize config) option is intended to
reduce conflicts due to normalization of newer versions of history. It
does so by renormalizing files that it is about to do a three-way
content merge on. Some folks thought it would renormalize all files
throughout the tree, and the previous wording wasn't clear enough to
dispell that misconception. Update the docs to make it clear that the
merge machinery will only apply renormalization to files which need a
three-way content merge.
(Technically, the merge machinery also does renormalization on
modify/delete conflicts, in order to see if the modification was merely
a normalization; if so, it can accept the delete and not report a
conflict. But it's not clear that this piece needs to be explained to
users, and trying to distinguish it might feel like splitting hairs and
overcomplicating the explanation, so we leave it out.)
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 11 Feb 2025 17:20:07 +0000 (09:20 -0800)]
doc: centrally document various ways tospell `true` and `false`
We do not seem to centrally document exhaustively ways to spell
Boolean values.
The description in the Environment Variables of git(1) section
assumes that the reader is already familiar with how "Boolean valued
configuration variables" are specified, without referring to
anything, so there is no way for the readers to find out more.
The description of `bool` in the section on "--type
<type>" in "git config --help" might be the place to do so, but it
is not telling us all that much.
The description of Boolean valued placeholders in the pretty formats
section of "git log --help" enumerates the possible values with "etc."
implying there may be other synonyms; shrink the list of samples and
instead refer to the canonical and authoritative source of truth, which
now is git-config(1).
Phillip Wood [Tue, 11 Feb 2025 15:59:08 +0000 (15:59 +0000)]
rebase -i: reword empty commit after fast-forward
When rebase rewords a commit it picks the commit and then runs "git
commit --amend" to reword it. When the commit is picked the sequencer
tries to reuse existing commits by fast-forwarding if the parents are
unchanged. Rewording an empty commit that has been fast-forwarded fails
because "git commit --amend" is called without "--allow-empty". This
happens because when a commit is fast-forwarded the logic that checks
whether we should pass "--allow-empty" is skipped. Fix this by always
passing "--allow-empty" when rewording a commit. This is safe because we
are amending a commit that has already been picked so if it had become
empty when it was picked we'd have already returned an error.
As "git commit" will happily create empty merge commits without
"--allow-empty" we do not need to pass that flag when rewording merge
commits.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Usman Akinyemi [Mon, 10 Feb 2025 18:10:30 +0000 (23:40 +0530)]
builtin/update-server-info: remove the_repository global variable
Remove the_repository global variable in favor of the repository
argument that gets passed in "builtin/update-server-info.c".
When `-h` is passed to the command outside a Git repository, the
`run_builtin()` will call the `cmd_update_server_info()` function
with `repo` set to NULL and then early in the function, "parse_options()"
call will give the options help and exit, without having to consult much
of the configuration file. So it is safe to omit reading the config when
`repo` argument the caller gave us is NULL.
Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
brian m. carlson [Mon, 10 Feb 2025 23:49:47 +0000 (23:49 +0000)]
thunderbird-patch-inline: avoid bashism
The use of "echo -e" is not portable and not specified by POSIX. dash
does not support any options except "-n", and so this script will not
work on operating systems which use that as /bin/sh.
Fortunately, the solution is easy: switch to printf(1), which is
specified by POSIX and allows the escape sequences we want to use. This
will allow the script to work with any POSIX shell.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Mon, 10 Feb 2025 18:18:31 +0000 (10:18 -0800)]
Merge branch 'sk/unit-tests-0130'
Convert a handful of unit tests to work with the clar framework.
* sk/unit-tests-0130:
t/unit-tests: convert strcmp-offset test to use clar test framework
t/unit-tests: convert strbuf test to use clar test framework
t/unit-tests: adapt example decorate test to use clar test framework
t/unit-tests: convert hashmap test to use clar test framework
Junio C Hamano [Mon, 10 Feb 2025 18:18:30 +0000 (10:18 -0800)]
Merge branch 'ps/hash-cleanup'
Further code clean-up on the use of hash functions. Now the
context object knows what hash function it is working with.
* ps/hash-cleanup:
global: adapt callers to use generic hash context helpers
hash: provide generic wrappers to update hash contexts
hash: stop typedeffing the hash context
hash: convert hashing context to a structure
Junio C Hamano [Mon, 10 Feb 2025 18:18:30 +0000 (10:18 -0800)]
Merge branch 'jt/gitlab-ci-base-fix'
Two CI tasks, whitespace check and style check, work on the
difference from the base version and the version being checked, but
the base was computed incorrectly in GitLab CI in some cases, which
has been corrected.
* jt/gitlab-ci-base-fix:
ci: fix base commit fallback for check-whitespace and check-style
Junio C Hamano [Mon, 10 Feb 2025 18:18:30 +0000 (10:18 -0800)]
Merge branch 'pw/apply-ulong-overflow-check'
"git apply" internally uses unsigned long for line numbers and uses
strtoul() to parse numbers on the hunk headers. It however forgot
to check parse errors.
* pw/apply-ulong-overflow-check:
apply: detect overflow when parsing hunk header
Junio C Hamano [Mon, 10 Feb 2025 18:18:29 +0000 (10:18 -0800)]
Merge branch 'ps/setup-reinit-fixes'
"git init" to reinitialize a repository that already exists cannot
change the hash function and ref backends; such a request is
silently ignored now.
* ps/setup-reinit-fixes:
setup: fix reinit of repos with incompatible GIT_DEFAULT_HASH
setup: fix reinit of repos with incompatible GIT_DEFAULT_REF_FORMAT
t0001: remove duplicate test
Lucas Oshiro [Sat, 8 Feb 2025 16:57:31 +0000 (13:57 -0300)]
t7603: replace test -f by test_path_is_file
`test_path_is_file` provides a better output when asserting whether a
file exists. Replace the occurrences of `test -f` in t7603 with it,
facilitating the trace of possible test failures.
Signed-off-by: Lucas Oshiro <lucasseikioshiro@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
path: drop `git_common_path()` in favor of `repo_common_path()`
Remove `git_common_path()` in favor of the `repo_common_path()` family
of functions, which makes the implicit dependency on `the_repository` go
away.
Note that `git_common_path()` used to return a string allocated via
`get_pathname()`, which uses a rotating set of statically allocated
buffers. Consequently, callers didn't have to free the returned string.
The same isn't true for `repo_common_path()`, so we also have to add
logic to free the returned strings.
This refactoring also allows us to remove `repo_common_pathv()` from the
public interface.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
worktree: return allocated string from `get_worktree_git_dir()`
The `get_worktree_git_dir()` function returns a string constant that
does not need to be free'd by the caller. This string is computed for
three different cases:
- If we don't have a worktree we return a path into the Git directory.
The returned string is owned by `the_repository`, so there is no
need for the caller to free it.
- If we have a worktree, but no worktree ID then the caller requests
the main worktree. In this case we return a path into the common
directory, which again is owned by `the_repository` and thus does
not need to be free'd.
- In the third case, where we have an actual worktree, we compute the
path relative to "$GIT_COMMON_DIR/worktrees/". This string does not
need to be released either, even though `git_common_path()` ends up
allocating memory. But this doesn't result in a memory leak either
because we write into a buffer returned by `get_pathname()`, which
returns one out of four static buffers.
We're about to drop `git_common_path()` in favor of `repo_common_path()`,
which doesn't use the same mechanism but instead returns an allocated
string owned by the caller. While we could adapt `get_worktree_git_dir()`
to also use `get_pathname()` and print the derived common path into that
buffer, the whole schema feels a lot like premature optimization in this
context. There are some callsites where we call `get_worktree_git_dir()`
in a loop that iterates through all worktrees. But none of these loops
seem to be even remotely in the hot path, so saving a single allocation
there does not feel worth it.
Refactor the function to instead consistently return an allocated path
so that we can start using `repo_common_path()` in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
path: drop `git_path_buf()` in favor of `repo_git_path_replace()`
Remove `git_path_buf()` in favor of `repo_git_path_replace()`. The
latter does essentially the same, with the only exception that it does
not rely on `the_repository` but takes the repo as separate parameter.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
path: drop `git_pathdup()` in favor of `repo_git_path()`
Remove `git_pathdup()` in favor of `repo_git_path()`. The latter does
essentially the same, with the only exception that it does not rely on
`the_repository` but takes the repo as separate parameter.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
path: refactor `repo_submodule_path()` family of functions
As explained in an earlier commit, we're refactoring path-related
functions to provide a consistent interface for computing paths into the
commondir, gitdir and worktree. Refactor the "submodule" family of
functions accordingly.
Note that in contrast to the other `repo_*_path()` families, we have to
pass in the repository as a non-constant pointer. This is because we end
up calling `repo_read_gitmodules()` deep down in the callstack, which
may end up modifying the repository.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule: refactor `submodule_to_gitdir()` to accept a repo
The `submodule_to_gitdir()` function implicitly uses `the_repository` to
resolve submodule paths. Refactor the function to instead accept a repo
as parameter to remove the dependency on global state.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
path: refactor `repo_worktree_path()` family of functions
As explained in an earlier commit, we're refactoring path-related
functions to provide a consistent interface for computing paths into the
commondir, gitdir and worktree. Refactor the "worktree" family of
functions accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
path: refactor `repo_git_path()` family of functions
As explained in an earlier commit, we're refactoring path-related
functions to provide a consistent interface for computing paths into the
commondir, gitdir and worktree. Refactor the "gitdir" family of
functions accordingly.
Note that the `repo_git_pathv()` function is converted into an internal
implementation detail. It is only used to implement `the_repository`
compatibility shims and will eventually be removed from the public
interface.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
path: refactor `repo_common_path()` family of functions
The functions provided by the "path" subsystem to derive repository
paths for the commondir, gitdir, worktrees and submodules are quite
inconsistent. Some functions have a `strbuf_` prefix, others have
different return values, some don't provide a variant working on top of
`strbuf`s.
We're thus about to refactor all of these family of functions so that
they follow a common pattern:
- `repo_*_path()` returns an allocated string.
- `repo_*_path_append()` appends the path to the caller-provided
buffer while returning a constant pointer to the buffer. This
clarifies whether the buffer is being appended to or rewritten,
which otherwise wasn't immediately obvious.
- `repo_*_path_replace()` replaces contents of the buffer with the
computed path, again returning a pointer to the buffer contents.
The returned constant pointer isn't being used anywhere yet, but it will
be used in subsequent commits. Its intent is to allow calling patterns
like the following somewhat contrived example:
if (!stat(&st, repo_common_path_replace(repo, &buf, ...)) &&
!unlink(repo_common_path_replace(repo, &buf, ...)))
...
Refactor the commondir family of functions accordingly and adapt all
callers.
Note that `repo_common_pathv()` is converted into an internal
implementation detail. It is only used to implement `the_repository`
compatibility shims and will eventually be removed from the public
interface.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Thu, 6 Feb 2025 22:56:45 +0000 (14:56 -0800)]
Merge branch 'ps/zlib-ng'
The code paths to interact with zlib has been cleaned up in
preparation for building with zlib-ng.
* ps/zlib-ng:
ci: make "linux-musl" job use zlib-ng
ci: switch linux-musl to use Meson
compat/zlib: allow use of zlib-ng as backend
git-zlib: cast away potential constness of `next_in` pointer
compat/zlib: provide stubs for `deflateSetHeader()`
compat/zlib: provide `deflateBound()` shim centrally
git-compat-util: move include of "compat/zlib.h" into "git-zlib.h"
compat: introduce new "zlib.h" header
git-compat-util: drop `z_const` define
compat: drop `uncompress2()` compatibility shim
Junio C Hamano [Thu, 6 Feb 2025 22:56:44 +0000 (14:56 -0800)]
Merge branch 'js/bundle-unbundle-fd-reuse-fix'
The code path used when "git fetch" fetches from a bundle file
closed the same file descriptor twice, which sometimes broke things
unexpectedly when the file descriptor was reused, which has been
corrected.
Junio C Hamano [Thu, 6 Feb 2025 22:56:44 +0000 (14:56 -0800)]
Merge branch 'ps/ci-misc-updates'
CI updates (containerization, dropping stale ones, etc.).
* ps/ci-misc-updates:
ci: remove stale code for Azure Pipelines
ci: use latest Ubuntu release
ci: stop special-casing for Ubuntu 16.04
gitlab-ci: add linux32 job testing against i386
gitlab-ci: remove the "linux-old" job
github: simplify computation of the job's distro
github: convert all Linux jobs to be containerized
github: adapt containerized jobs to be rootless
t7422: fix flaky test caused by buffered stdout
t0060: fix EBUSY in MinGW when setting up runtime prefix
Toon Claes [Thu, 6 Feb 2025 06:33:35 +0000 (07:33 +0100)]
builtin/clone: teach git-clone(1) the --revision= option
The git-clone(1) command has the option `--branch` that allows the user
to select the branch they want HEAD to point to. In a non-bare
repository this also checks out that branch.
Option `--branch` also accepts a tag. When a tag name is provided, the
commit this tag points to is checked out and HEAD is detached. Thus
`--branch` can be used to clone a repository and check out a ref kept
under `refs/heads` or `refs/tags`. But some other refs might be in use
as well. For example Git forges might use refs like `refs/pull/<id>` and
`refs/merge-requests/<id>` to track pull/merge requests. These refs
cannot be selected upon git-clone(1).
Add option `--revision` to git-clone(1). This option accepts a fully
qualified reference, or a hexadecimal commit ID. This enables the user
to clone and check out any revision they want. `--revision` can be used
in conjunction with `--depth` to do a minimal clone that only contains
the blob and tree for a single revision. This can be useful for
automated tests running in CI systems.
Using option `--branch` and `--single-branch` together is a similar
scenario, but serves a different purpose. Using these two options, a
singlet remote tracking branch is created and the fetch refspec is set
up so git-fetch(1) will receive updates on that branch from the remote.
This allows the user work on that single branch.
Option `--revision` on contrary detaches HEAD, creates no tracking
branches, and writes no fetch refspec.
Signed-off-by: Toon Claes <toon@iotcl.com> Acked-by: Patrick Steinhardt <ps@pks.im>
[jc: removed unnecessary TEST_PASSES_SANITIZE_LEAK from the test] Signed-off-by: Junio C Hamano <gitster@pobox.com>
The functions die_for_incompatible_opt3() and
die_for_incompatible_opt4() already exist to die whenever a user
specifies three or four options respectively that are not compatible.
Introduce die_for_incompatible_opt2() which dies when two options that
are incompatible are set.
Signed-off-by: Toon Claes <toon@iotcl.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Toon Claes [Thu, 6 Feb 2025 06:33:33 +0000 (07:33 +0100)]
clone: introduce struct clone_opts in builtin/clone.c
There is a lot of state stored in global variables in builtin/clone.c.
In the long run we'd like to remove many of those.
Introduce `struct clone_opts` in this file. This struct will be used to
contain all details needed to perform the clone. The struct object can
be thrown around to all the functions that need these details.
The first field we're adding is `wants_head`. In some scenarios
(specifically when both `--single-branch` and `--branch` are given) we
are not interested in `HEAD` on the remote. The field `wants_head` in
`struct clone_opts` will hold this information. We could have put
`option_branch` and `option_single_branch` into that struct instead, but
in a following commit we'll be using `wants_head` as well.
Signed-off-by: Toon Claes <toon@iotcl.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Toon Claes [Thu, 6 Feb 2025 06:33:30 +0000 (07:33 +0100)]
clone: make it possible to specify --tags
Option --no-tags was added in 0dab2468ee (clone: add a --no-tags option
to clone without tags, 2017-04-26). At the time there was no need to
support --tags as well, although there was some conversation about
it[1].
To simplify the code and to prepare for future commits, invert the flag
internally. Functionally there is no change, because the flag is
default-enabled passing `--tags` has no effect, so there's no need to
add tests for this.
Toon Claes [Thu, 6 Feb 2025 06:33:32 +0000 (07:33 +0100)]
clone: add tags refspec earlier to fetch refspec
In clone.c we call refspec_ref_prefixes() to copy the fetch refspecs
from the `remote->fetch` refspec into `ref_prefixes` of
`transport_ls_refs_options`. Afterwards we add the tags prefix
`refs/tags/` prefix as well. At a later point, in wanted_peer_refs() we
process refs using both `remote->fetch` and `TAG_REFSPEC`.
Simplify the code by appending `TAG_REFSPEC` to `remote->fetch` before
calling refspec_ref_prefixes().
To be able to do this, we set `option_tags` to 0 when --mirror is given.
This is because --mirror mirrors (hence the name) all the refs,
including tags and they do not need to be treated separately.
Signed-off-by: Toon Claes <toon@iotcl.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Toon Claes [Thu, 6 Feb 2025 06:33:29 +0000 (07:33 +0100)]
clone: cut down on global variables in clone.c
In clone.c the `struct option` which is used to parse the input options
for git-clone(1) is a global variable. Due to this, many variables that
are used to parse the value into, are also global.
Make `builtin_clone_options` a local variable in cmd_clone() and carry
along all variables that are only used in that function.
Signed-off-by: Toon Claes <toon@iotcl.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Toon Claes [Thu, 6 Feb 2025 06:33:31 +0000 (07:33 +0100)]
clone: refactor wanted_peer_refs()
The function wanted_peer_refs() is used to map the refs returned by the
server to refs we will save in our clone.
Over time this function grown to be very complex. Refactor it.
Previously, there was a separate code path for when
`option_single_branch` was set. It resulted in duplicated code and
deeper nested conditions. After this refactor the code path for when
`option_single_branch` is truthy modifies `refs` and then falls through
to the common code path. This approach relies on the `refspec` being set
correctly and thus only mapping refs that are relevant.
Signed-off-by: Toon Claes <toon@iotcl.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Olga Pilipenco [Wed, 5 Feb 2025 06:30:13 +0000 (06:30 +0000)]
worktree: detect from secondary worktree if main worktree is bare
When extensions.worktreeConfig is true and the main worktree is
bare -- that is, its config.worktree file contains core.bare=true
-- commands run from secondary worktrees incorrectly see the main
worktree as not bare. As such, those commands incorrectly think
that the repository's default branch (typically "main" or
"master") is checked out in the bare repository even though it's
not. This makes it impossible, for instance, to checkout or delete
the default branch from a secondary worktree, among other
shortcomings.
This problem occurs because, when extensions.worktreeConfig is
true, commands run in secondary worktrees only consult
$commondir/config and $commondir/worktrees/<id>/config.worktree,
thus they never see the main worktree's core.bare=true setting in
$commondir/config.worktree.
Fix this problem by consulting the main worktree's config.worktree
file when checking whether it is bare. (This extra work is
performed only when running from a secondary worktree.)
Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Olga Pilipenco <olga.pilipenco@shopify.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Justin Tobler [Wed, 5 Feb 2025 00:41:47 +0000 (18:41 -0600)]
rev-list: extend print-info to print missing object type
Additional information about missing objects found in git-rev-list(1)
can be printed by specifying the `print-info` missing action for the
`--missing` option. Extend this action to also print missing object type
information inferred from its containing object. This token follows the
form `type=<type>` and specifies the expected object type of the missing
object.
Signed-off-by: Justin Tobler <jltobler@gmail.com> Acked-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Justin Tobler [Wed, 5 Feb 2025 00:41:46 +0000 (18:41 -0600)]
rev-list: add print-info action to print missing object path
Missing objects identified through git-rev-list(1) can be printed by
setting the `--missing=print` option. Additional information about the
missing object, such as its path and type, may be present in its
containing object.
Add the `print-info` missing action for the `--missing` option that,
when set, prints additional insight about the missing object inferred
from its containing object. Each line of output for a missing object is
in the form: `?<oid> [<token>=<value>]...`. The `<token>=<value>` pairs
containing additional information are separated from each other by a SP.
The value is encoded in a token specific fashion, but SP or LF contained
in value are always expected to be represented in such a way that the
resulting encoded value does not have either of these two problematic
bytes. This format is kept generic so it can be extended in the future
to support additional information.
For now, only a missing object path info is implemented. It follows the
form `path=<path>` and specifies the full path to the object from the
top-level tree. A path containing SP or special characters is enclosed
in double-quotes in the C style as needed. In a subsequent commit,
missing object type info will also be added.
Signed-off-by: Justin Tobler <jltobler@gmail.com> Acked-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/repack: fix `--keep-unreachable` when there are no packs
The "--keep-unreachable" flag is supposed to append any unreachable
objects to the newly written pack. This flag is explicitly documented as
appending both packed and loose unreachable objects to the new packfile.
And while this works alright when repacking with preexisting packfiles,
it stops working when the repository does not have any packfiles at all.
The root cause are the conditions used to decide whether or not we want
to append "--pack-loose-unreachable" to git-pack-objects(1). There are
a couple of conditions here:
- `has_existing_non_kept_packs()` checks whether there are existing
packfiles. This condition makes sense to guard "--keep-pack=",
"--unpack-unreachable" and "--keep-unreachable", because all of
these flags only make sense in combination with existing packfiles.
But it does not make sense to disable `--pack-loose-unreachable`
when there aren't any preexisting packfiles, as loose objects can be
packed into the new packfile regardless of that.
- `delete_redundant` checks whether we want to delete any objects or
packs that are about to become redundant. The documentation of
`--keep-unreachable` explicitly says that `git repack -ad` needs to
be executed for the flag to have an effect.
It is not immediately obvious why such redundant objects need to be
deleted in order for "--pack-unreachable-objects" to be effective.
But as things are working as documented this is nothing we'll change
for now.
- `pack_everything & PACK_CRUFT` checks that we're not creating a
cruft pack. This condition makes sense in the context of
"--pack-loose-unreachable", as unreachable objects would end up in
the cruft pack anyway.
So while the second and third condition are sensible, it does not make
any sense to condition `--pack-loose-unreachable` on the existence of
packfiles.
Fix the bug by splitting out the "--pack-loose-unreachable" and only
making it depend on the second and third condition. Like this, loose
unreachable objects will be packed regardless of any preexisting
packfiles.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Meet Soni [Tue, 4 Feb 2025 04:05:58 +0000 (09:35 +0530)]
refspec: relocate apply_refspecs and related funtions
Move the functions `apply_refspecs()` and `apply_negative_refspecs()`
from `remote.c` to `refspec.c`. These functions focus on applying
refspecs, so centralizing them in `refspec.c` improves code organization
by keeping refspec-related logic in one place.
Signed-off-by: Meet Soni <meetsoni3017@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Meet Soni [Tue, 4 Feb 2025 04:05:57 +0000 (09:35 +0530)]
refspec: relocate matching related functions
Move the functions `refspec_find_match()`, `refspec_find_all_matches()`
and `refspec_find_negative_match()` from `remote.c` to `refspec.c`.
These functions focus on matching refspecs, so centralizing them in
`refspec.c` improves code organization by keeping refspec-related logic
in one place.
Signed-off-by: Meet Soni <meetsoni3017@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Meet Soni [Tue, 4 Feb 2025 04:05:56 +0000 (09:35 +0530)]
remote: rename query_refspecs functions
Rename functions related to handling refspecs in preparation for their
move from `remote.c` to `refspec.c`. Update their names to better
reflect their intent:
- `query_refspecs()` -> `refspec_find_match()` for clarity, as it
finds a single matching refspec.
- `query_refspecs_multiple()` -> `refspec_find_all_matches()` to
better reflect that it collects all matching refspecs instead of
returning just the first match.
- `query_matches_negative_refspec()` ->
`refspec_find_negative_match()` for consistency with the
updated naming convention, even though this static function
didn't strictly require renaming.
Signed-off-by: Meet Soni <meetsoni3017@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the functions `refname_matches_negative_refspec_item()`,
`refspec_match()`, and `match_name_with_pattern()` from `remote.c` to
`refspec.c`. These functions focus on refspec matching, so placing them
in `refspec.c` aligns with the separation of concerns. Keep
refspec-related logic in `refspec.c` and remote-specific logic in
`remote.c` for better code organization.
Signed-off-by: Meet Soni <meetsoni3017@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Meet Soni [Tue, 4 Feb 2025 04:05:54 +0000 (09:35 +0530)]
remote: rename function omit_name_by_refspec
Rename the function `omit_name_by_refspec()` to
`refname_matches_negative_refspec_item()` to provide clearer intent.
The previous function name was vague and did not accurately describe its
purpose. By using `refname_matches_negative_refspec_item`, make the
function's purpose more intuitive, clarifying that it checks if a
reference name matches any negative refspec.
Rename function parameters for consistency with existing naming
conventions. Use `refname` instead of `name` to align with terminology
in `refs.h`.
Remove the redundant doc comment since the function name is now
self-explanatory.
Signed-off-by: Meet Soni <meetsoni3017@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Mon, 3 Feb 2025 17:11:07 +0000 (17:11 +0000)]
backfill: assume --sparse when sparse-checkout is enabled
The previous change introduced the '--[no-]sparse' option for the 'git
backfill' command, but did not assume it as enabled by default. However,
this is likely the behavior that users will most often want to happen.
Without this default, users with a small sparse-checkout may be confused
when 'git backfill' downloads every version of every object in the full
history.
However, this is left as a separate change so this decision can be reviewed
independently of the value of the '--[no-]sparse' option.
Add a test of adding the '--sparse' option to a repo without sparse-checkout
to make it clear that supplying it without a sparse-checkout is an error.
Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Mon, 3 Feb 2025 17:11:06 +0000 (17:11 +0000)]
backfill: add --sparse option
One way to significantly reduce the cost of a Git clone and later fetches is
to use a blobless partial clone and combine that with a sparse-checkout that
reduces the paths that need to be populated in the working directory. Not
only does this reduce the cost of clones and fetches, the sparse-checkout
reduces the number of objects needed to download from a promisor remote.
However, history investigations can be expensive as computing blob diffs
will trigger promisor remote requests for one object at a time. This can be
avoided by downloading the blobs needed for the given sparse-checkout using
'git backfill' and its new '--sparse' mode, at a time that the user is
willing to pay that extra cost.
Note that this is distinctly different from the '--filter=sparse:<oid>'
option, as this assumes that the partial clone has all reachable trees and
we are using client-side logic to avoid downloading blobs outside of the
sparse-checkout cone. This avoids the server-side cost of walking trees
while also achieving a similar goal. It also downloads in batches based on
similar path names, presenting a resumable download if things are
interrupted.
This augments the path-walk API to have a possibly-NULL 'pl' member that may
point to a 'struct pattern_list'. This could be more general than the
sparse-checkout definition at HEAD, but 'git backfill --sparse' is currently
the only consumer.
Be sure to test this in both cone mode and not cone mode. Cone mode has the
benefit that the path-walk can skip certain paths once they would expand
beyond the sparse-checkout. Non-cone mode can describe the included files
using both positive and negative patterns, which changes the possible return
values of path_matches_pattern_list(). Test both kinds of matches for
increased coverage.
To test this, we can create a blobless sparse clone, expand the
sparse-checkout slightly, and then run 'git backfill --sparse' to see
how much data is downloaded. The general steps are
This case is more meaningful because a full clone of the Linux
repository is currently over 6 GB, so this is a valuable way to download
a fraction of the repository and no longer need network access for all
reachable objects within the sparse-checkout.
Choosing a batch size will depend on a lot of factors, including the
user's network speed or reliability, the repository's file structure,
and how many versions there are of the file within the sparse-checkout
scope. There will not be a one-size-fits-all solution.
Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Mon, 3 Feb 2025 17:11:05 +0000 (17:11 +0000)]
backfill: add --min-batch-size=<n> option
Users may want to specify a minimum batch size for their needs. This is only
a minimum: the path-walk API provides a list of OIDs that correspond to the
same path, and thus it is optimal to allow delta compression across those
objects in a single server request.
We could consider limiting the request to have a maximum batch size in the
future. For now, we let the path-walk API batches determine the
boundaries.
To get a feeling for the value of specifying the --min-batch-size parameter,
I tested a number of open source repositories available on GitHub. The
procedure was generally:
Here, we don't have much difference in the size of the repo, though the
500K batch size results in a few MB gained. That comes at a cost of a
much longer time. This extra time is due to server-side delta
compression happening as the on-disk deltas don't appear to be reusable
all the time. But for smaller batch sizes, the server is able to find
reasonable deltas partly because we are asking for objects that appear
in the same region of the directory tree and include all versions of a
file at a specific path.
To contrast this example, I tested the microsoft/fluentui repo, which
has been known to have inefficient packing due to name hash collisions.
These results are found before GitHub had the opportunity to repack the
server with more advanced name hash versions:
Here, a larger variety of batch sizes were chosen because of the great
variation in results. By asking the server to download small batches
corresponding to fewer paths at a time, the server is able to provide
better compression for these batches than it would for a regular clone.
A typical full clone for this repository would require 738 MB.
This example justifies the choice to batch requests by path name,
leading to improved communication with a server that is not optimally
packed.
Finally, the same experiment for the Linux repository had these results:
Even in this example, where the default name hash algorithm leads to
decent compression of the Linux kernel repository, there is value for
selecting a smaller batch size, to a limit. The 25K batch size has the
fastest time, but uses 250 MB more than the 50K batch size. The 500K
batch size took much more time due to server compression time and thus
we should avoid large batch sizes like this.
Based on these experiments, a batch size of 50,000 was chosen as the
default value.
Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Mon, 3 Feb 2025 17:11:04 +0000 (17:11 +0000)]
backfill: basic functionality and tests
The default behavior of 'git backfill' is to fetch all missing blobs that
are reachable from HEAD. Document and test this behavior.
The implementation is a very simple use of the path-walk API, initializing
the revision walk at HEAD to start the path-walk from all commits reachable
from HEAD. Ignore the object arrays that correspond to tree entries,
assuming that they are all present already.
The path-walk API provides lists of objects in batches according to a
common path, but that list could be very small. We want to balance the
number of requests to the server with the ability to have the process
interrupted with minimal repeated work to catch up in the next run.
Based on some experiments (detailed in the next change) a minimum batch
size of 50,000 is selected for the default.
This batch size is a _minimum_. As the path-walk API emits lists of blob
IDs, they are collected into a list of objects for a request to the
server. When that list is at least the minimum batch size, then the
request is sent to the server for the new objects. However, the list of
blob IDs from the path-walk API could be much longer than the batch
size. At this moment, it is unclear if there is a benefit to split the
list when there are too many objects at the same path.
Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>