Elijah Newren [Tue, 21 Mar 2023 06:26:02 +0000 (06:26 +0000)]
treewide: remove unnecessary includes of cache.h
The last several commits were geared at replacing the include of cache.h
in strbuf.c with an include of git-compat-util.h. Unfortunately, I had
to drop a patch moving some functions from cache.h to object-name.h, due
to excessive conflicts with other in-flight topics.
However, even without that patch, the series of patches so far allows us
to modify a number of C files to replace an include of cache.h with
git-compat-util.h. Do that to reduce our dependencies.
(If we could have kept our object-name.h patch in this series, it would
have also let us reduce the includes in checkout.c and fmt-merge-msg.c
in addition to strbuf.c).
Just to ensure that nothing else was bringing in cache.h, all of the
affected files have been checked to ensure that
gcc -E -I. $SOURCE_FILE | grep '"cache.h"'
found no hits and that
make DEVELOPER=1 ${OBJECT_FILE_FOR_SOURCE_FILE}
successfully compiles without warnings.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Tue, 21 Mar 2023 06:25:59 +0000 (06:25 +0000)]
cache.h: remove expand_user_path()
expand_user_path() was renamed to interpolate_path() back in mid-2021,
but reinstated with a #define and a NEEDSWORK comment that we would
eventually want to get rid of it. Do so now.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Tue, 21 Mar 2023 06:25:58 +0000 (06:25 +0000)]
abspath.h: move absolute path functions from cache.h
This is another step towards letting us remove the include of cache.h in
strbuf.c. It does mean that we also need to add includes of abspath.h
in a number of C files.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Tue, 21 Mar 2023 06:25:56 +0000 (06:25 +0000)]
treewide: remove unnecessary cache.h inclusion from several sources
A number of files were apparently including cache.h solely to get
gettext.h. By making those files explicitly include gettext.h, we can
already drop the include of cache.h in these files. On top of that,
there were some files using cache.h that didn't need to for any reason.
Remove these unnecessary includes.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Tue, 21 Mar 2023 06:25:55 +0000 (06:25 +0000)]
treewide: remove unnecessary inclusion of gettext.h
Looking at things from the opposite angle of the last patch, we had a
few files that were including gettext.h and perhaps needed it at some
point in history, but no longer require it. Remove the include.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Tue, 21 Mar 2023 06:25:54 +0000 (06:25 +0000)]
treewide: be explicit about dependence on gettext.h
Dozens of files made use of gettext functions, without explicitly
including gettext.h. This made it more difficult to find which files
could remove a dependence on cache.h. Make C files explicitly include
gettext.h if they are using it.
However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an
include of gettext.h, it was left out to avoid conflicting with an
in-flight topic.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren [Tue, 21 Mar 2023 06:25:53 +0000 (06:25 +0000)]
treewide: remove unnecessary cache.h inclusion from a few headers
Ever since a64215b6cd ("object.h: stop depending on cache.h; make
cache.h depend on object.h", 2023-02-24), we have a few headers that
could have replaced their include of cache.h with an include of
object.h. Make that change now.
Some C files had to start including cache.h after this change (or some
smaller header it had brought in), because the C files were depending
on things from cache.h but were only formerly implicitly getting
cache.h through one of these headers being modified in this patch.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Mon, 20 Mar 2023 11:26:55 +0000 (11:26 +0000)]
commit-reach: add tips_reachable_from_bases()
Both 'git for-each-ref --merged=<X>' and 'git branch --merged=<X>' use
the ref-filter machinery to select references or branches (respectively)
that are reachable from a set of commits presented by one or more
--merged arguments. This happens within reach_filter(), which uses the
revision-walk machinery to walk history in a standard way.
However, the commit-reach.c file is full of custom searches that are
more efficient, especially for reachability queries that can terminate
early when reachability is discovered. Add a new
tips_reachable_from_bases() method to commit-reach.c and call it from
within reach_filter() in ref-filter.c. This affects both 'git branch'
and 'git for-each-ref' as tested in p1500-graph-walks.sh.
For the Linux kernel repository, we take an already-fast algorithm and
make it even faster:
Test HEAD~1 HEAD
-------------------------------------------------------------------
1500.5: contains: git for-each-ref --merged 0.13 0.02 -84.6%
1500.6: contains: git branch --merged 0.14 0.02 -85.7%
1500.7: contains: git tag --merged 0.15 0.03 -80.0%
(Note that we remove the iterative 'git rev-list' test from p1500
because it no longer makes sense as a comparison to 'git for-each-ref'
and would just waste time running it for these comparisons.)
The algorithm is implemented in commit-reach.c in the method
tips_reachable_from_base(). This method takes a string_list of tips and
assigns the 'util' for each item with the value 1 if the base commit can
reach those tips.
Like other reachability queries in commit-reach.c, the fastest way to
search for "can A reach B?" is to do a depth-first search up to the
generation number of B, preferring to explore first parents before later
parents. While we must walk all reachable commits up to that generation
number when the answer is "no", the depth-first search can answer "yes"
much faster than other approaches in most cases.
This search becomes trickier when there are multiple targets for the
depth-first search. The commits with lower generation number are more
likely to be within the history of the start commit, but we don't want
to waste time searching commits of low generation number if the commit
target with lowest generation number has already been found.
The trick here is to take the input commits and sort them by generation
number in ascending order. Track the index within this order as
min_generation_index. When we find a commit, if its index in the list is
equal to min_generation_index, then we can increase the generation
number boundary of our search to the next-lowest value in the list.
With this mechanism, the number of commits to search is minimized with
respect to the depth-first search heuristic. We will walk all commits up
to the minimum generation number of a commit that is _not_ reachable
from the start, but we will walk only the necessary portion of the
depth-first search for the reachable commits of lower generation.
Add extra tests for this behavior in t6600-test-reach.sh as the
interesting data shape of that repository can sometimes demonstrate
corner case bugs.
Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Mon, 20 Mar 2023 11:26:54 +0000 (11:26 +0000)]
for-each-ref: add ahead-behind format atom
The previous change implemented the ahead_behind() method, including an
algorithm to compute the ahead/behind values for a number of commit tips
relative to a number of commit bases. Now, integrate that algorithm as
part of 'git for-each-ref' hidden behind a new format atom,
ahead-behind. This naturally extends to 'git branch' and 'git tag'
builtins, as well.
This format allows specifying multiple bases, if so desired, and all
matching references are compared against all of those bases. For this
reason, failing to read a reference provided from these atoms results in
an error.
In order to translate the ahead_behind() method information to the
format output code in ref-filter.c, we must populate arrays of
ahead_behind_count structs. In struct ref_array, we store the full array
that will be passed to ahead_behind(). In struct ref_array_item, we
store an array of pointers that point to the relvant items within the
full array. In this way, we can pull all relevant ahead/behind values
directly when formatting output for a specific item. It also ensures the
lifetime of the ahead_behind_count structs matches the time that the
array is being used.
Add specific tests of the ahead/behind counts in t6600-test-reach.sh, as
it has an interesting repository shape. In particular, its merging
strategy and its use of different commit-graphs would demonstrate over-
counting if the ahead_behind() method did not already account for that
possibility.
Also add tests for the specific for-each-ref, branch, and tag builtins.
In the case of 'git tag', there are intersting cases that happen when
some of the selected tips are not commits. This requires careful logic
around commits_nr in the second loop of filter_ahead_behind(). Also, the
test in t7004 is carefully located to avoid being dependent on the GPG
prereq. It also avoids using the test_commit helper, as that will add
ticks to the time and disrupt the expected timestamps in later tag
tests.
Also add performance tests in a new p1300-graph-walks.sh script. This
will be useful for more uses in the future, but for now compare the
ahead-behind counting algorithm in 'git for-each-ref' to the naive
implementation by running 'git rev-list --count' processes for each
input.
For the Git source code repository, the improvement is already obvious:
Test this tree
---------------------------------------------------------------
1500.2: ahead-behind counts: git for-each-ref 0.07(0.07+0.00)
1500.3: ahead-behind counts: git branch 0.07(0.06+0.00)
1500.4: ahead-behind counts: git tag 0.07(0.06+0.00)
1500.5: ahead-behind counts: git rev-list 1.32(1.04+0.27)
But the standard performance benchmark is the Linux kernel repository,
which demosntrates a significant improvement:
Test this tree
---------------------------------------------------------------
1500.2: ahead-behind counts: git for-each-ref 0.27(0.24+0.02)
1500.3: ahead-behind counts: git branch 0.27(0.24+0.03)
1500.4: ahead-behind counts: git tag 0.28(0.27+0.01)
1500.5: ahead-behind counts: git rev-list 4.57(4.03+0.54)
The 'git rev-list' test exists in this change as a demonstration, but it
will be removed in the next change to avoid wasting time on this
comparison.
Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Mon, 20 Mar 2023 11:26:53 +0000 (11:26 +0000)]
commit-reach: implement ahead_behind() logic
Fully implement the commit-counting logic required to determine
ahead/behind counts for a batch of commit pairs. This is a new library
method within commit-reach.h. This method will be linked to the
for-each-ref builtin in the next change.
The interface for ahead_behind() uses two arrays. The first array of
commits contains the list of all starting points for the walk. This
includes all tip commits _and_ base commits. The second array specifies
base/tip pairs by pointing to commits within the first array, by index.
The second array also stores the resulting ahead/behind counts for each
of these pairs.
This implementation of ahead_behind() allows multiple bases, if desired.
Even with multiple bases, there is only one commit walk used for
counting the ahead/behind values, saving time when the base/tip ranges
overlap significantly.
This interface for ahead_behind() also makes it very easy to call
ensure_generations_valid() on the entire array of bases and tips. This
call is necessary because it is critical that the walk that counts
ahead/behind values never walks a commit more than once. Without
generation numbers on every commit, there is a possibility that a
commit date skew could cause the walk to revisit a commit and then
double-count it. For this reason, it is strongly recommended that 'git
ahead-behind' is only run in a repository with a commit-graph file that
covers most of the reachable commits, storing precomputed generation
numbers. If no commit-graph exists, this walk will be much slower as it
must walk all reachable commits in ensure_generations_valid() before
performing the counting logic.
It is possible to detect if generation numbers are available at run time
and redirect the implementation to another algorithm that does not
require this property. However, that implementation requires a commit
walk per base/tip pair _and_ can be slower due to the commit date
heuristics required. Such an implementation could be considered in the
future if there is a reason to include it, but most Git hosts should
already be generating a commit-graph file as part of repository
maintenance. Most Git clients should also be generating commit-graph
files as part of background maintenance or automatic GCs.
Now, let's discuss the ahead/behind counting algorithm.
The first array of commits are considered the starting commits. The
index within that array will play a critical role.
We create a new commit slab that maps commits to a bitmap. For a given
commit (anywhere in the history), its bitmap stores information relative
to which of the input commits can reach that commit. The ith bit will be
on if the ith commit from the starting list can reach that commit. It is
important to notice that these bitmaps are not the typical "reachability
bitmaps" that are stored in .bitmap files. Instead of signalling which
objects are reachable from the current commit, they instead signal
"which starting commits can reach me?" It is also important to know that
the bitmap is not necessarily "complete" until we walk that commit. We
will perform a commit walk by generation number in such a way that we
can guarantee the bitmap is correct when we visit that commit.
At the beginning of the ahead_behind() method, we initialize the bitmaps
for each of the starting commits. By enabling the ith bit for the ith
starting commit, we signal "the ith commit can reach itself."
We walk commits by popping the commit with maximum generation number out
of the queue, guaranteeing that we will never walk a child of that
commit in any future steps.
As we walk, we load the bitmap for the current commit and perform two
main steps. The _second_ step examines each parent of the current commit
and adds the current commit's bitmap bits to each parent's bitmap. (We
create a new bitmap for the parent if this is our first time seeing that
parent.) After adding the bits to the parent's bitmap, the parent is
added to the walk queue. Due to this passing of bits to parents, the
current commit has a guarantee that the ith bit is enabled on its bitmap
if and only if the ith commit can reach the current commit.
The first step of the walk is to examine the bitmask on the current
commit and decide which ranges the commit is in or not. Due to the "bit
pushing" in the second step, we have a guarantee that the ith bit of the
current commit's bitmap is on if and only if the ith starting commit can
reach it. For each ahead_behind_count struct, check the base_index and
tip_index to see if those bits are enabled on the current bitmap. If
exactly one bit is enabled, then increment the corresponding 'ahead' or
'behind' count. This increment is the reason we _absolutely need_ to
walk commits at most once.
The only subtle thing to do with this walk is to check to see if a
parent has all bits on in its bitmap, in which case it becomes "stale"
and is marked with the STALE bit. This allows queue_has_nonstale() to be
the terminating condition of the walk, which greatly reduces the number
of commits walked if all of the commits are nearby in history. It avoids
walking a large number of common commits when there is a deep history.
We also use the helper method insert_no_dup() to add commits to the
priority queue without adding them multiple times. This uses the PARENT2
flag. Thus, we must clear both the STALE and PARENT2 bits of all
commits, in case ahead_behind() is called multiple times in the same
process.
Co-authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the just-introduced compute_reachable_generation_numbers_1() to
implement a function which dynamically computes topological levels (or
corrected commit dates) for out-of-graph commits.
This will be useful for the ahead-behind algorithm we are about to
introduce, which needs accurate topological levels on _all_ commits
reachable from the tips in order to avoid over-counting.
Co-authored-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Mon, 20 Mar 2023 11:26:51 +0000 (11:26 +0000)]
commit-graph: return generation from memory
The commit_graph_generation() method used to report a value of
GENERATION_NUMBER_INFINITY if the commit_graph_data_slab had an instance
for the given commit but the graph_pos indicated the commit was not in
the commit-graph file.
However, an upcoming change will introduce the ability to set generation
values in-memory without writing the commit-graph file. Thus, we can no
longer trust 'graph_pos' to indicate whether or not the generation
member can be trusted.
Instead, trust the 'generation' member if the commit has a value in the
slab _and_ the 'generation' member is non-zero. Otherwise, treat it as
GENERATION_NUMBER_INFINITY.
This only makes a difference for a very old case for the commit-graph:
the very first Git release to write commit-graph files wrote zeroes in
the topological level positions. If we are parsing a commit-graph with
all zeroes, those commits will now appear to have
GENERATION_NUMBER_INFINITY (as if they were not parsed from the
commit-graph).
I attempted several variations to work around the need for providing an
uninitialized 'generation' member, but this was the best one I found. It
does require a change to a verification test in t5318 because it reports
a different error than the one about non-zero generation numbers.
Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous change introduced the generic algorithm
compute_reachable_generation_numbers() and used it as the core
functionality of compute_topological_levels(). Now, use it as the core
functionality of compute_generation_numbers().
The main difference here is that we use generation version 2, which is
used in to toggle the logic in compute_generation_from_max() for
computing the corrected commit date based on the corrected commit dates
of the parent commits (and the commit date of the current commit). It
also uses different methods for (get|set)_generation in the vtable in
order to store and access the value in the correct places.
Co-authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch extracts the common code used to compute topological levels
and corrected committer dates into a common routine,
compute_reachable_generation_numbers(). For ease of reading, it only
modifies compute_topological_levels() to use this new routine, leaving
compute_generation_numbers() to be modified in the next change.
This new routine dispatches to call the necessary functions to get and
set the generation number for a given commit through a vtable (the
compute_generation_info struct).
Computing the generation number itself is done in
compute_generation_from_max(), which dispatches its implementation based
on the generation version requested, or issuing a BUG() for unrecognized
generation versions. This does not use a vtable because the logic
depends only on the generation number version, not where the data is
being loaded from or being stored to. This is a subtle point that will
make more sense in a future change that modifies the in-memory
generation values instead of just preparing values for writing to a
commit-graph file.
This change looks like it adds a lot of new code. However, two upcoming
changes will be quite small due to the work being done in this change.
Co-authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Mon, 20 Mar 2023 11:26:48 +0000 (11:26 +0000)]
for-each-ref: explicitly test no matches
The for-each-ref builtin can take a list of ref patterns, but if none
match, it still succeeds (but with no output). Add an explicit test that
demonstrates that behavior.
Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Derrick Stolee [Mon, 20 Mar 2023 11:26:47 +0000 (11:26 +0000)]
for-each-ref: add --stdin option
When a user wishes to input a large list of patterns to 'git
for-each-ref' (likely a long list of exact refs) there are frequently
system limits on the number of command-line arguments.
Add a new --stdin option to instead read the patterns from standard
input. Add tests that check that any unrecognized arguments are
considered an error when --stdin is provided. Also, an empty pattern
list is interpreted as the complete ref set.
When reading from stdin, we populate the filter.name_patterns array
dynamically as opposed to pointing to the 'argv' array directly. This is
simple when using a strvec, as it is NULL-terminated in the same way. We
then free the memory directly from the strvec.
Helped-by: Phillip Wood <phillip.wood123@gmail.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
SZEDER Gábor [Sun, 19 Mar 2023 16:56:48 +0000 (17:56 +0100)]
parse-options.h: use designated initializers in OPT_* macros
Use designated initializers in the expansions of the OPT_* macros to
make it more readable which one-letter macro parameter initializes
which field in the resulting 'struct option'.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename the 'help' parameter as it matches one of the fields in 'struct
option', and, while at it, rename all other parameters to the usual
one-letter name used in similar macro definitions.
Furthermore, put all parameters in the replacement list between
parentheses, like all other OPT_* macros do.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
SZEDER Gábor [Sun, 19 Mar 2023 16:56:46 +0000 (17:56 +0100)]
parse-options.h: use consistent name for the callback parameters
In the various OPT_* macros the 'f' parameter is usually used to
specify flags, while the 'cb' parameter is used to specify a callback
function. OPT_CALLBACK and OPT_NUMBER_CALLBACKS, however, are
inconsistent with the rest, as they use 'f' to specify their callback
function.
Rename their callback macro parameters to 'cb' to avoid the
inconsistency.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
SZEDER Gábor [Sun, 19 Mar 2023 16:27:12 +0000 (17:27 +0100)]
treewide: remove unnecessary inclusions of parse-options.h from headers
The headers 'diagnose.h', 'list-objects-filter-options.h',
'ref-filter.h' and 'remote.h' declare option parsing callback
functions with a 'struct option*' parameter, and 'revision.h' declares
an option parsing helper function taking 'struct parse_opt_ctx_t*' and
'struct option*' parameters. These headers all include
'parse-options.h', although they don't need any of the type
definitions from that header file. Furthermore,
'list-objects-filter-options.h' and 'ref-filter.h' also define some
OPT_* macros to initialize a 'struct option', but these don't
necessitate the inclusion of parse-options.h in these headers either,
because these macros are only expanded in source files.
Remove these unnecessary inclusions of parse-options.h and use forward
declarations to declare the necessary types.
After this patch none of the header files include parse-options.h
anymore.
With these changes, the build time after modifying only
parse-options.h is reduced by about 30%, and the number of targets
built is almost 20% less:
Before:
$ touch parse-options.h && time make -j4 |wc -l
353
real 1m1.527s
user 3m32.205s
sys 0m15.903s
After:
289
real 0m39.285s
user 2m12.540s
sys 0m11.164s
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
SZEDER Gábor [Sun, 19 Mar 2023 16:27:11 +0000 (17:27 +0100)]
treewide: include parse-options.h in source files
The builtins 'ls-remote', 'pack-objects', 'receive-pack', 'reflog' and
'send-pack' use parse_options(), but their source files don't directly
include 'parse-options.h'. Furthermore, the source files
'diagnose.c', 'list-objects-filter-options.c', 'remote.c' and
'send-pack.c' define option parsing callback functions, while
'revision.c' defines an option parsing helper function, and thus need
access to various fields in 'struct option' and 'struct
parse_opt_ctx_t', but they don't directly include 'parse-options.h'
either. They all can still be built, of course, because they include
one of the header files that does include 'parse-options.h' (though
unnecessarily, see the next commit).
Add those missing includes to these files, as our general rule is that
"a C file must directly include the header files that declare the
functions and the types it uses".
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to print updated references during a fetch, the two different
call sites that do this will first call `format_display()` followed by a
call to `fputs()`. This is needlessly roundabout now that we have the
`display_state` structure that encapsulates all of the printing logic
for references.
Move displaying the reference updates into `format_display()` and rename
it to `display_ref_update()` to better match its new purpose, which
finalizes the conversion to make both the formatting and printing logic
of reference updates self-contained. This will make it easier to add new
output formats and printing to a different file descriptor than stderr.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When fetching from a remote, we not only print the actual references
that have changed, but will also print the URL from which we have
fetched them to standard output. The logic to handle this is duplicated
across two different callsites with some non-trivial logic to compute
the anonymized URL. Furthermore, we're using global state to track
whether we have already shown the URL to the user or not.
Refactor the code by moving it into `format_display()`. Like this, we
can convert the global variable into a member of `display_state`. And
second, we can deduplicate the logic to compute the anonymized URL.
This also works as expected when fetching from multiple remotes, for
example via a group of remotes, as we do this by forking a standalone
git-fetch(1) process per remote that is to be fetched.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
fetch: centralize handling of per-reference format
The function `format_display()` is used to print a single reference
update to a buffer which will then ultimately be printed by the caller.
This architecture causes us to duplicate some logic across the different
callsites of this function. This makes it hard to follow the code as
some parts of the logic are located in one place, while other parts of
the logic are located in a different place. Furthermore, by having the
logic scattered around it becomes quite hard to implement a new output
format for the reference updates.
We can make the logic a whole lot easier to understand by making the
`format_display()` function self-contained so that it handles formatting
and printing of the references. This will eventually allow us to easily
implement a completely different output format, but also opens the door
to conditionally print to either stdout or stderr depending on the
output format.
As a first step towards that goal we move the formatting directive used
by both callers to print a single reference update into this function.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
fetch: pass the full local reference name to `format_display`
Before printing the name of the local references that would be updated
by a fetch we first prettify the reference name. This is done at the
calling side so that `format_display()` never sees the full name of the
local reference. This restricts our ability to introduce new output
formats that might want to print the full reference name.
Right now, all callsites except one are prettifying the reference name
anyway. And the only callsite that doesn't passes `FETCH_HEAD` as the
hardcoded reference name to `format_display()`, which would never be
changed by a call to `prettify_refname()` anyway. So let's refactor the
code to pass in the full local reference name and then prettify it in
the formatting code.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git-fetch(1) command supports printing references either in "full"
or "compact" format depending on the `fetch.ouput` config key. The
format that is to be used is tracked in a global variable.
De-globalize the variable by moving it into the `display_state`
structure.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
fetch: move reference width calculation into `display_state`
In order to print references in proper columns we need to calculate the
width of the reference column before starting to print the references.
This is done with the help of a global variable `refcol_width`.
Refactor the code to instead use a new structure `display_state` that
contains the computed width and plumb it through the stack as required.
This is only the first step towards de-globalizing the state required to
print references.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Phillip Wood [Mon, 20 Mar 2023 16:10:02 +0000 (16:10 +0000)]
wildmatch: hide internal return values
WM_ABORT_ALL and WM_ABORT_TO_STARSTAR are used internally to limit
backtracking when a match fails, they are not of interest to the caller
and so should not be public.
Suggested-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Phillip Wood [Mon, 20 Mar 2023 16:10:01 +0000 (16:10 +0000)]
wildmatch: avoid undefined behavior
The code changed in this commit is designed to check if the pattern
starts with "**/" or contains "/**/" (see 3a078dec33 (wildmatch: fix
"**" special case, 2013-01-01)). Unfortunately when the pattern begins
with "**/" `prev_p = p - 2` is evaluated when `p` points to the second
"*" and so the subtraction is undefined according to section 6.5.6 of
the C standard because the result does not point within the same object
as `p`. Fix this by avoiding the subtraction unless it is well defined.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Phillip Wood [Mon, 20 Mar 2023 16:10:00 +0000 (16:10 +0000)]
wildmatch: fix exponential behavior
When dowild() cannot match a '*' or '/**/' wildcard then it must return
WM_ABORT_TO_STARSTAR or WM_ABORT_ALL respectively. Failure to observe
this results in unnecessary backtracking and the time taken for a failed
match increases exponentially with the number of wildcards in the
pattern [1]. Unfortunately in some instances dowild() returns WM_NOMATCH
for a failed match resulting in long match times for patterns containing
multiple wildcards as can be seen in the following benchmark.
(Note that the timings in the Benchmark 1 are really measuring the time
to execute test-tool rather than the time to match the pattern)
Benchmark 1: t/helper/test-tool wildmatch wildmatch aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab "*a"
Time (mean ± σ): 22.8 ms ± 1.7 ms [User: 12.1 ms, System: 10.6 ms]
Range (min … max): 19.4 ms … 26.9 ms 113 runs
Warning: Ignoring non-zero exit code.
Benchmark 2: t/helper/test-tool wildmatch wildmatch aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab "*a*a*a*a*a*a*a*a*a"
Time (mean ± σ): 5.244 s ± 0.228 s [User: 5.229 s, System: 0.010 s]
Range (min … max): 4.969 s … 5.707 s 10 runs
The security implications are limited as it only affects operations that
are potentially DoS vectors. For example by creating a blob containing
such a pattern a malicious user can exploit this behavior to use large
amounts of CPU time on a remote server by pushing the blob and then
creating a new clone with --filter=sparse:oid. However this filter type
is usually disabled as it is known to consume large amounts of CPU time
even without this bug.
The WM_MATCH changed in the first hunk of this patch comes from the
original implementation imported from rsync in 5230f605e1 (Import
wildmatch from rsync, 2012-10-15). Compared to the others converted here
it is fairly harmless as it only triggers at the end of the pattern and
so will only cause a single unnecessary backtrack. The others introduced
by 6f1a31f0aa (wildmatch: advance faster in <asterisk> + <literal>
patterns, 2013-01-01) and 46983441ae (wildmatch: make a special case for
"*/" with FNM_PATHNAME, 2013-01-01) are more pernicious and will cause
exponential behavior.
A new test is added to protect against future regressions.
[1] https://research.swtch.com/glob
Helped-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Andrei Rybak [Sat, 18 Mar 2023 15:46:46 +0000 (16:46 +0100)]
t1507: assert output of rev-parse
Tests in t1507-rev-parse-upstream.sh compare files "expect" and "actual"
to assert the output of "git rev-parse", "git show", and "git log".
However, two of the tests '@{reflog}-parsing does not look beyond colon'
and '@{upstream}-parsing does not look beyond colon' don't inspect the
contents of the created files.
Assert output of "git rev-parse" in tests in t1507-rev-parse-upstream.sh
to improve test coverage.
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Andrei Rybak [Sat, 18 Mar 2023 15:46:45 +0000 (16:46 +0100)]
t1404: don't create unused file
Some tests in file t1404-update-ref-errors.sh create file "unchanged" as
the expected side for a test_cmp assertion at the end of the test for
output of "git for-each-ref". Test 'no bogus intermediate values during
delete' also creates a file named "unchanged" using "git for-each-ref".
However, the file isn't used for any assertions in the test. Instead,
"git rev-parse" is used to compare the reference with variable $D.
Don't create unused file "unchanged" in test 'no bogus intermediate
values during delete' of t1404-update-ref-errors.sh.
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Andrei Rybak [Sat, 18 Mar 2023 15:46:44 +0000 (16:46 +0100)]
t1400: assert output of update-ref
In t1400-update-ref.sh test 'transaction can create and delete' creates
files "expect" and "actual", but doesn't compare them. Similarly, test
'transaction cannot restart ongoing transaction' redirects output of
"git update-ref" to file "actual", but doesn't check its contents with
any assertions.
Assert output of "git update-ref" in tests to improve test coverage in
t1400-update-ref.sh.
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Andrei Rybak [Sat, 18 Mar 2023 15:46:43 +0000 (16:46 +0100)]
t1302: don't create unused file
Test 'gitdir selection on unsupported repo' in t1302-repo-version.sh
writes output of a "git config" invocation to file "actual". However,
the test doesn't have any assertions for the file. The file was used by
this test until commit b9605bc4f2 (config: only read .git/config from
configured repos, 2016-09-12), before which "git config" was expected to
print the bogus value of "core.repositoryformatversion" to standard
output.
Don't redirect output of "git config" to file "actual" in test 'gitdir
selection on unsupported repo'.
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Andrei Rybak [Sat, 18 Mar 2023 15:46:42 +0000 (16:46 +0100)]
t1010: don't create unused files
Builtin "git mktree" writes the the object name of the tree object built
to the standard output. Tests 'mktree refuses to read ls-tree -r output
(1)' and 'mktree refuses to read ls-tree -r output (2)' in
"t1010-mktree.sh" redirect output of "git mktree" to a file, but don't
use its contents in assertions.
Don't redirect output of "git mktree" to file "actual" in tests that
assert that an invocation of "git mktree" must fail.
Output of "git mktree" is empty when it refuses to build a tree object.
So, alternatively, the test could assert that the output is empty.
However, there isn't a good reason for the user to expect the command to
be silent in such cases, so we shouldn't enforce it. The user shouldn't
use the output of a failing command anyway.
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Andrei Rybak [Sat, 18 Mar 2023 15:46:41 +0000 (16:46 +0100)]
t1006: assert error output of cat-file
Test "cat-file $arg1 $arg2 error on missing full OID" in
t1006-cat-file.sh compares files "expect.err" and "err.actual" to assert
the expected error output of "git cat-file". A similar test in the same
file named "cat-file $arg1 $arg2 error on missing short OID" also
creates these two files, but doesn't use them in assertions.
Assert error output of "git cat-file" in test "cat-file $arg1 $arg2
error on missing short OID" of t1006-cat-file.sh to improve test
coverage.
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Andrei Rybak [Sat, 18 Mar 2023 15:46:40 +0000 (16:46 +0100)]
t1005: assert output of ls-files
Test 'reset should work' in t1005-read-tree-reset.sh compares two files
"expect" and "actual" to assert the expected output of "git ls-files".
Several other tests in the same file also create files "expect" and
"actual", but don't use them in assertions.
Assert output of "git ls-files" in t1005-read-tree-reset.sh to improve
test coverage.
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Sun, 19 Mar 2023 22:03:12 +0000 (15:03 -0700)]
Merge branch 'fc/advice-diverged-history'
After "git pull" that is configured with pull.rebase=false
merge.ff=only fails due to our end having our own development, give
advice messages to get out of the "Not possible to fast-forward"
state.
* fc/advice-diverged-history:
advice: add diverging advice for novices
Once we start running, we assumed that the list of alternate object
databases would never change. Hook into the machinery used to
update the list of packfiles during runtime to update this list as
well.
* ds/reprepare-alternates-when-repreparing-packfiles:
object-file: reprepare alternates when necessary
Junio C Hamano [Sun, 19 Mar 2023 22:03:12 +0000 (15:03 -0700)]
Merge branch 'jk/bundle-use-dash-for-stdfiles'
"git bundle" learned that "-" is a common way to say that the input
comes from the standard input and/or the output goes to the
standard output. It used to work only for output and only from the
root level of the working tree.
* jk/bundle-use-dash-for-stdfiles:
parse-options: use prefix_filename_except_for_dash() helper
parse-options: consistently allocate memory in fix_filename()
bundle: don't blindly apply prefix_filename() to "-"
bundle: document handling of "-" as stdin
bundle: let "-" mean stdin for reading operations
A few subcommands have been taught to stop users from working on a
branch that is being used in another worktree linked to the same
repository.
* rj/avoid-switching-to-already-used-branch:
switch: reject if the branch is already checked out elsewhere (test)
rebase: refuse to switch to a branch already checked out elsewhere (test)
branch: fix die_if_checked_out() when ignore_current_worktree
worktree: introduce is_shared_symref()
Junio C Hamano [Sun, 19 Mar 2023 22:03:11 +0000 (15:03 -0700)]
Merge branch 'rj/bisect-already-used-branch'
Allow "git bisect reset" to check out the original branch when the
branch is already checked out in a different worktree linked to the
same repository.
* rj/bisect-already-used-branch:
bisect: fix "reset" when branch is checked out elsewhere
Junio C Hamano [Sun, 19 Mar 2023 22:03:10 +0000 (15:03 -0700)]
Merge branch 'zh/push-to-delete-onelevel-ref'
"git push" has been taught to allow deletion of refs with one-level
names to help repairing a repository who acquired such a ref by
mistake. In general, we don't encourage use of such a ref, and
creation or update to such a ref is rejected as before.
"git restore" supports options like "--ours" that are only
meaningful during a conflicted merge, but these options are only
meaningful when updating the working tree files. These options are
marked to be incompatible when both "--staged" and "--worktree" are
in effect.
* ak/restore-both-incompatible-with-conflicts:
restore: fault --staged --worktree with merge opts
Jeff King [Fri, 17 Mar 2023 19:08:51 +0000 (15:08 -0400)]
git_connect(): fix corner cases in downgrading v2 to v0
There's code in git_connect() that checks whether we are doing a push
with protocol_v2, and if so, drops us to protocol_v0 (since we know
how to do v2 only for fetches). But it misses some corner cases:
1. it checks the "prog" variable, which is actually the path to
receive-pack on the remote side. By default this is just
"git-receive-pack", but it could be an arbitrary string (like
"/path/to/git receive-pack", etc). We'd accidentally stay in v2
mode in this case.
2. besides "receive-pack" and "upload-pack", there's one other value
we'd expect: "upload-archive" for handling "git archive --remote".
Like receive-pack, this doesn't understand v2, and should use the
v0 protocol.
In practice, neither of these causes bugs in the real world so far. We
do send a "we understand v2" probe to the server, but since no server
implements v2 for anything but upload-pack, it's simply ignored. But
this would eventually become a problem if we do implement v2 for those
endpoints, as older clients would falsely claim to understand it,
leading to a server response they can't parse.
We can fix (1) by passing in both the program path and the "name" of the
operation. I treat the name as a string here, because that's the pattern
set in transport_connect(), which is one of our callers (we were simply
throwing away the "name" value there before).
We can fix (2) by allowing only known-v2 protocols ("upload-pack"),
rather than blocking unknown ones ("receive-pack" and "upload-archive").
That will mean whoever eventually implements v2 push will have to adjust
this list, but that's reasonable. We'll do the safe, conservative thing
(sticking to v0) by default, and anybody working on v2 will quickly
realize this spot needs to be updated.
The new tests cover the receive-pack and upload-archive cases above, and
re-confirm that we allow v2 with an arbitrary "--upload-pack" path (that
already worked before this patch, of course, but it would be an easy
thing to break if we flipped the allow/block logic without also handling
"name" separately).
Here are a few miscellaneous implementation notes, since I had to do a
little head-scratching to understand who calls what:
- transport_connect() is called only for git-upload-archive. For
non-http git remotes, that resolves to the virtual connect_git()
function (which then calls git_connect(); confused yet?). So
plumbing through "name" in connect_git() covers that.
- for regular fetches and pushes, callers use higher-level functions
like transport_fetch_refs(). For non-http git remotes, that means
calling git_connect() under the hood via connect_setup(). And that
uses the "for_push" flag to decide which name to use.
- likewise, plumbing like fetch-pack and send-pack may call
git_connect() directly; they each know which name to use.
- for remote helpers (including http), we already have separate
parameters for "name" and "exec" (another name for "prog"). In
process_connect_service(), we feed the "name" to the helper via
"connect" or "stateless-connect" directives.
There's also a "servpath" option, which can be used to tell the
helper about the "exec" path. But no helpers we implement support
it! For http it would be useless anyway (no reasonable server
implementation will allow you to send a shell command to run the
server). In theory it would be useful for more obscure helpers like
remote-ext, but even there it is not implemented.
It's tempting to get rid of it simply to reduce confusion, but we
have publicly documented it since it was added in fa8c097cc9
(Support remote helpers implementing smart transports, 2009-12-09),
so it's possible some helper in the wild is using it.
- So for v2, helpers (again, including http) are mainly used via
stateless-connect, driven by the main program. But they do still
need to decide whether to do a v2 probe. And so there's similar
logic in remote-curl.c's discover_refs() that looks for
"git-receive-pack". But it's not buggy in the same way. Since it
doesn't support servpath, it is always dealing with a "service"
string like "git-receive-pack". And since it doesn't support
straight "connect", it can't be used for "upload-archive".
So we could leave that spot alone. But I've updated it here to match
the logic we're changing in connect_git(). That seems like the least
confusing thing for somebody who has to touch both of these spots
later (say, to add v2 push support). I didn't add a new test to make
sure this doesn't break anything; we already have several tests (in
t5551 and elsewhere) that make sure we are using v2 over http.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 17 Mar 2023 19:17:18 +0000 (15:17 -0400)]
transport: mark unused parameters in fetch_refs_from_bundle()
We don't look at the "to_fetch" or "nr_heads" parameters at all. At
first glance this seems like a bug (or at least pessimisation), because
it means we fetch more objects from the bundle than we actually need.
But the bundle does not have any way of computing the set of reachable
objects itself (we'd have to pull all of the objects out to walk them).
And anyway, we've probably already paid most of the cost of grabbing the
objects, since we must copy the bundle locally before accessing it.
So it's perfectly reasonable for the bundle code to just pull everything
into the local object store. Unneeded objects can be dropped later via
gc, etc.
But we should mark these unused parameters as such to avoid the wrath of
-Wunused-parameter.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 17 Mar 2023 19:17:08 +0000 (15:17 -0400)]
http: mark unused parameter in fill_active_slot() callbacks
We have a generic "fill" function that is used by both the dumb http
push and fetch code paths. It takes a void parameter in case the caller
wants to pass along extra data, but (since the previous commit) neither
does so.
So we could simply drop the extra parameter. But since it's good
practice to provide a void pointer for in callback functions, we'll
leave it here for the future, and just annotate it as unused (to appease
-Wunused-parameter).
While we're marking it, let's also fix the type in http-walker's
function to have the correct "void" type. The original had to cast the
function pointer and was technically undefined behavior (though
generally OK in practice).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 17 Mar 2023 19:16:34 +0000 (15:16 -0400)]
http: drop unused parameter from start_object_request()
We take a "walker" parameter for the request, but don't actually look at
it. This is due to 5424bc557f (http*: add helper methods for fetching
objects (loose), 2009-06-06). Before then, we consulted the "walker"
struct to tell us if we should be verbose, but now those messages are
printed elsewhere.
Let's drop the unused parameter to make -Wunused-parameter happy.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Fri, 17 Mar 2023 19:16:26 +0000 (15:16 -0400)]
mailmap: drop debugging code
There's some debugging code in mailmap.c which is only compiled if you
manually tweak the source to set DEBUG_MAILMAP. When it's not set, the
fallback noop uses static inline functions; we couldn't use macros here
because one of the functions is variadic (and variadic macros were
forbidden back then, but aren't now). As a result, this triggers
a -Wunused-parameter warning.
We have a few options here:
1. Leave it be. Just mark it as UNUSED, or switch to a variadic macro.
2. Assume the debugging code is useful, compile it always, and trigger
it with a run-time flag (e.g., with a trace key). This is pretty
easy to do, and carries a pretty small runtime cost.
3. Assume the debugging is not very useful, and just rip it out. This
matches what we did with a similar case in 69c5f17f11 (attr: drop
DEBUG_ATTR code, 2022-10-06).
The debugging flag has been mentioned only three times on the list.
Once, when it was added in 2009:
So it's not totally without value. On the other hand, it's not likely to
be useful to non-developers (and certainly isn't if you have to
recompile). And using a debugger or adding your own inspection code is
likely to be as useful. So I've just dropped the code entirely here.
Note that we do still have to mark a few parameters unused in callback
functions which are passed to string_list_clear_func(). Those get an
extra pointer with the string being cleared, which we previously fed to
the debugging code.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Fri, 17 Mar 2023 21:03:10 +0000 (14:03 -0700)]
Merge branch 'ew/fetch-hiderefs'
A new "fetch.hideRefs" option can be used to exclude specified refs
from "rev-list --objects --stdin --not --all" traversal for
checking object connectivity, most useful when there are many
unrelated histories in a single repository.
* ew/fetch-hiderefs:
fetch: support hideRefs to speed up connectivity checks
Junio C Hamano [Fri, 17 Mar 2023 21:03:08 +0000 (14:03 -0700)]
Merge branch 'en/header-cleanup'
Code clean-up to clarify the rule that "git-compat-util.h" must be
the first to be included.
* en/header-cleanup:
diff.h: remove unnecessary include of object.h
Remove unnecessary includes of builtin.h
treewide: replace cache.h with more direct headers, where possible
replace-object.h: move read_replace_refs declaration from cache.h to here
object-store.h: move struct object_info from cache.h
dir.h: refactor to no longer need to include cache.h
object.h: stop depending on cache.h; make cache.h depend on object.h
ident.h: move ident-related declarations out of cache.h
pretty.h: move has_non_ascii() declaration from commit.h
cache.h: remove dependence on hex.h; make other files include it explicitly
hex.h: move some hex-related declarations from cache.h
hash.h: move some oid-related declarations from cache.h
alloc.h: move ALLOC_GROW() functions from cache.h
treewide: remove unnecessary cache.h includes in source files
treewide: remove unnecessary cache.h includes
treewide: remove unnecessary git-compat-util.h includes in headers
treewide: ensure one of the appropriate headers is sourced first
Junio C Hamano [Fri, 17 Mar 2023 21:03:09 +0000 (14:03 -0700)]
Merge branch 'jk/unused-post-2.39-part2'
More work towards -Wunused.
* jk/unused-post-2.39-part2: (21 commits)
help: mark unused parameter in git_unknown_cmd_config()
run_processes_parallel: mark unused callback parameters
userformat_want_item(): mark unused parameter
for_each_commit_graft(): mark unused callback parameter
rewrite_parents(): mark unused callback parameter
fetch-pack: mark unused parameter in callback function
notes: mark unused callback parameters
prio-queue: mark unused parameters in comparison functions
for_each_object: mark unused callback parameters
list-objects: mark unused callback parameters
mark unused parameters in signal handlers
run-command: mark error routine parameters as unused
mark "pointless" data pointers in callbacks
ref-filter: mark unused callback parameters
http-backend: mark unused parameters in virtual functions
http-backend: mark argc/argv unused
object-name: mark unused parameters in disambiguate callbacks
serve: mark unused parameters in virtual functions
serve: use repository pointer to get config
ls-refs: drop config caching
...
Junio C Hamano [Fri, 17 Mar 2023 21:03:08 +0000 (14:03 -0700)]
Merge branch 'en/dir-api-cleanup'
Code clean-up to clarify directory traversal API.
* en/dir-api-cleanup:
unpack-trees: add usage notices around df_conflict_entry
unpack-trees: special case read-tree debugging as internal usage
unpack-trees: rewrap a few overlong lines from previous patch
unpack-trees: mark fields only used internally as internal
unpack_trees: start splitting internal fields from public API
sparse-checkout: avoid using internal API of unpack-trees, take 2
sparse-checkout: avoid using internal API of unpack-trees
unpack-trees: clean up some flow control
dir: mark output only fields of dir_struct as such
dir: add a usage note to exclude_per_dir
dir: separate public from internal portion of dir_struct
unpack-trees: heed requests to overwrite ignored files
t2021: fix platform-specific leftover cruft
Junio C Hamano [Fri, 17 Mar 2023 21:03:08 +0000 (14:03 -0700)]
Merge branch 'jk/fsck-indices-in-worktrees'
"git fsck" learned to check the index files in other worktrees,
just like "git gc" honors them as anchoring points.
* jk/fsck-indices-in-worktrees:
fsck: check even zero-entry index files
fsck: mention file path for index errors
fsck: check index files in all worktrees
fsck: factor out index fsck
Felipe Contreras [Tue, 28 Feb 2023 14:59:34 +0000 (08:59 -0600)]
completion: prompt: use generic colors
When the prompt command mode was introduced in 1bfc51ac81 (Allow
__git_ps1 to be used in PROMPT_COMMAND, 2012-10-10), the assumption was
that it was necessary in order to properly add colors to PS1 in bash,
but this wasn't true.
It's true that the \[ \] markers add the information needed to properly
calculate the width of the prompt, and they have to be added directly to
PS1, a function returning them doesn't work.
But that is because bash coverts the \[ \] markers in PS1 to \001 \002,
which is what readline ultimately needs in order to calculate the width.
We don't need bash to do this conversion, we can use \001 \002
ourselves, and then the prompt command mode is not necessary to display
colors.
This is what functions returning colors are supposed to do [1].
[1] http://mywiki.wooledge.org/BashFAQ/053
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Tested-by: Joakim Petersen <joak-pet@online.no> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Mon, 13 Mar 2023 19:54:11 +0000 (15:54 -0400)]
rebase: prefer --default-prefix to --{src,dst}-prefix for format-patch
When git-rebase invokes format-patch, it wants to make sure we use the
normal prefixes, and are not confused by diff.noprefix or similar. When
this was added in 5b220a6876f (Add --src/dst-prefix to git-formt-patch
in git-rebase.sh, 2010-09-09), we only had --src-prefix and --dst-prefix
to do so, which requires re-specifying the prefixes we expect to see.
These days we can say what we want more directly: just use the defaults.
This is a minor cleanup that should have no behavior change, but
hopefully the result expresses more clearly what the code is trying to
accomplish.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In fade728df122 (apply: fix writing behind newly created symbolic links,
2023-02-02), we backported a patch onto v2.30.* that was originally
based on a much newer version. The v2.30.* release train still has the
GETTEXT_POISON CI job, though, and hence needs `test_i18n*` in its
tests.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In bffc762f87ae (dir-iterator: prevent top-level symlinks without
FOLLOW_SYMLINKS, 2023-01-24), we backported a patch onto v2.30.* that
was originally based on a much newer version. The v2.30.* release train
still has the GETTEXT_POISON CI job, though, and hence needs
`test_i18n*` in its tests.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In cf8f6ce02a13 (clone: delay picking a transport until after
get_repo_path(), 2023-01-24), we backported a patch onto v2.30.* that
was originally based on a much newer version. The v2.30.* release train
still has the GETTEXT_POISON CI job, though, and hence needs
`test_i18n*` in its tests.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Jeff King [Mon, 9 Aug 2021 22:48:48 +0000 (18:48 -0400)]
range-diff: use ssize_t for parsed "len" in read_patches()
As we iterate through the buffer containing git-log output, parsing
lines, we use an "int" to store the size of an individual line. This
should be a size_t, as we have no guarantee that there is not a
malicious 2GB+ commit-message line in the output.
Overflowing this integer probably doesn't do anything _too_ terrible. We
are not using the value to size a buffer, so the worst case is probably
an out-of-bounds read from before the array. But it's easy enough to
fix.
Note that we have to use ssize_t here, since we also store the length
result from parse_git_diff_header(), which may return a negative value
for error. That function actually returns an int itself, which has a
similar overflow problem, but I'll leave that for another day. Much
of the apply.c code uses ints and should be converted as a whole; in the
meantime, a negative return from parse_git_diff_header() will be
interpreted as an error, and we'll bail (so we can't handle such a case,
but given that it's likely to be malicious anyway, the important thing
is we don't have any memory errors).
Signed-off-by: Jeff King <peff@peff.net> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 3c50032ff528 (attr: ignore overly large gitattributes files,
2022-12-01), we backported a patch onto v2.30.* that was originally
based on a much newer version. The v2.30.* release train still has the
GETTEXT_POISON CI job, though, and hence needs `test_i18n*` in its
tests.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Jeff King [Mon, 9 Aug 2021 22:48:39 +0000 (18:48 -0400)]
range-diff: handle unterminated lines in read_patches()
When parsing our buffer of output from git-log, we have a
find_end_of_line() helper that finds the next newline, and gives us the
number of bytes to move past it, or the size of the whole remaining
buffer if there is no newline.
But trying to handle both those cases leads to some oddities:
- we try to overwrite the newline with NUL in the caller, by writing
over line[len-1]. This is at best redundant, since the helper will
already have done so if it saw a newline. But if it didn't see a
newline, it's actively wrong; we'll overwrite the byte at the end of
the (unterminated) line.
We could solve this just dropping the extra NUL assignment in the
caller and just letting the helper do the right thing. But...
- if we see a "diff --git" line, we'll restore the newline on top of
the NUL byte, so we can pass the string to parse_git_diff_header().
But if there was no newline in the first place, we can't do this.
There's no place to put it (the current code writes a newline
over whatever byte we obliterated earlier). The best we can do is
feed the complete remainder of the buffer to the function (which is,
in fact, a string, by virtue of being a strbuf).
To solve this, the caller needs to know whether we actually found a
newline or not. We could modify find_end_of_line() to return that
information, but we can further observe that it has only one caller.
So let's just inline it in that caller.
Nobody seems to have noticed this case, probably because git-log would
never produce input that doesn't end with a newline. Arguably we could
just return an error as soon as we see that the output does not end in a
newline. But the code to do so actually ends up _longer_, mostly because
of the cleanup we have to do in handling the error.
Signed-off-by: Jeff King <peff@peff.net> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In dfa6b32b5e59 (attr: ignore attribute lines exceeding 2048 bytes,
2022-12-01), we backported a patch onto v2.30.* that was originally
based on a much newer version. The v2.30.* release train still has the
GETTEXT_POISON CI job, though, and hence needs `test_i18n*` in its
tests.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In e47363e5a8bd (t0033: add tests for safe.directory, 2022-04-13), we
backported a patch onto v2.30.* that was originally based on a much
newer version. The v2.30.* release train still has the GETTEXT_POISON
CI job, though, and hence needs `test_i18n*` in its tests.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Jeff King [Tue, 17 Jan 2023 03:04:48 +0000 (22:04 -0500)]
http: support CURLOPT_PROTOCOLS_STR
The CURLOPT_PROTOCOLS (and matching CURLOPT_REDIR_PROTOCOLS) flag was
deprecated in curl 7.85.0, and using it generate compiler warnings as of
curl 7.87.0. The path forward is to use CURLOPT_PROTOCOLS_STR, but we
can't just do so unilaterally, as it was only introduced less than a
year ago in 7.85.0.
Until that version becomes ubiquitous, we have to either disable the
deprecation warning or conditionally use the "STR" variant on newer
versions of libcurl. This patch switches to the new variant, which is
nice for two reasons:
- we don't have to worry that silencing curl's deprecation warnings
might cause us to miss other more useful ones
- we'd eventually want to move to the new variant anyway, so this gets
us set up (albeit with some extra ugly boilerplate for the
conditional)
There are a lot of ways to split up the two cases. One way would be to
abstract the storage type (strbuf versus a long), how to append
(strbuf_addstr vs bitwise OR), how to initialize, which CURLOPT to use,
and so on. But the resulting code looks pretty magical:
GIT_CURL_PROTOCOL_TYPE allowed = GIT_CURL_PROTOCOL_TYPE_INIT;
if (...http is allowed...)
GIT_CURL_PROTOCOL_APPEND(&allowed, "http", CURLOPT_HTTP);
and you end up with more "#define GIT_CURL_PROTOCOL_TYPE" macros than
actual code.
On the other end of the spectrum, we could just implement two separate
functions, one that handles a string list and one that handles bits. But
then we end up repeating our list of protocols (http, https, ftp, ftp).
This patch takes the middle ground. The run-time code is always there to
handle both types, and we just choose which one to feed to curl.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Jiang Xin [Fri, 25 Nov 2022 09:59:54 +0000 (17:59 +0800)]
ci: install python on ubuntu
Python is missing from the default ubuntu-22.04 runner image, which
prevents git-p4 from working. To install python on ubuntu, we need
to provide the correct package names:
* On Ubuntu 18.04 (bionic), "/usr/bin/python2" is provided by the
"python" package, and "/usr/bin/python3" is provided by the "python3"
package.
* On Ubuntu 20.04 (focal) and above, "/usr/bin/python2" is provided by
the "python2" package which has a different name from bionic, and
"/usr/bin/python3" is provided by "python3".
Since the "ubuntu-latest" runner image has a higher version, its
safe to use "python2" or "python3" package name.
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Mon, 9 Aug 2021 22:47:42 +0000 (18:47 -0400)]
range-diff: drop useless "offset" variable from read_patches()
The "offset" variable was was introduced in 44b67cb62b (range-diff:
split lines manually, 2019-07-11), but it has never done anything
useful. We use it to count up the number of bytes we've consumed, but we
never look at the result. It was probably copied accidentally from an
almost-identical loop in apply.c:find_header() (and the point of that
commit was to make use of the parse_git_diff_header() function which
underlies both).
Because the variable was set but not used, most compilers didn't seem to
notice, but the upcoming clang-14 does complain about it, via its
-Wunused-but-set-variable warning.
Signed-off-by: Jeff King <peff@peff.net> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Tue, 17 Jan 2023 03:04:44 +0000 (22:04 -0500)]
http: prefer CURLOPT_SEEKFUNCTION to CURLOPT_IOCTLFUNCTION
The IOCTLFUNCTION option has been deprecated, and generates a compiler
warning in recent versions of curl. We can switch to using SEEKFUNCTION
instead. It was added in 2008 via curl 7.18.0; our INSTALL file already
indicates we require at least curl 7.19.4.
But there's one catch: curl says we should use CURL_SEEKFUNC_{OK,FAIL},
and those didn't arrive until 7.19.5. One workaround would be to use a
bare 0/1 here (or define our own macros). But let's just bump the
minimum required version to 7.19.5. That version is only a minor version
bump from our existing requirement, and is only a 2 month time bump for
versions that are almost 13 years old. So it's not likely that anybody
cares about the distinction.
Switching means we have to rewrite the ioctl functions into seek
functions. In some ways they are simpler (seeking is the only
operation), but in some ways more complex (the ioctl allowed only a full
rewind, but now we can seek to arbitrary offsets).
Curl will only ever use SEEK_SET (per their documentation), so I didn't
bother implementing anything else, since it would naturally be
completely untested. This seems unlikely to change, but I added an
assertion just in case.
Likewise, I doubt curl will ever try to seek outside of the buffer sizes
we've told it, but I erred on the defensive side here, rather than do an
out-of-bounds read.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Jeff King [Tue, 17 Jan 2023 03:04:38 +0000 (22:04 -0500)]
http-push: prefer CURLOPT_UPLOAD to CURLOPT_PUT
The two options do exactly the same thing, but the latter has been
deprecated and in recent versions of curl may produce a compiler
warning. Since the UPLOAD form is available everywhere (it was
introduced in the year 2000 by curl 7.1), we can just switch to it.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
compat/nedmalloc/nedmalloc.c: In function 'DestroyCaches':
compat/nedmalloc/nedmalloc.c:326:12: error: the comparison will always
evaluate as 'true' for the address of 'caches'
will never be NULL [-Werror=address]
326 | if(p->caches)
| ^
compat/nedmalloc/nedmalloc.c:196:22: note: 'caches' declared here
196 | threadcache *caches[THREADCACHEMAXCACHES];
| ^~~~~~
... and it is correct, of course.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jiang Xin [Fri, 25 Nov 2022 09:59:53 +0000 (17:59 +0800)]
ci: use the same version of p4 on both Linux and macOS
There would be a segmentation fault when running p4 v16.2 on ubuntu
22.04 which is the latest version of ubuntu runner image for github
actions.
By checking each version from [1], p4d version 21.1 and above can work
properly on ubuntu 22.04. But version 22.x will break some p4 test
cases. So p4 version 21.x is exactly the version we can use.
With this update, the versions of p4 for Linux and macOS happen to be
the same. So we can add the version number directly into the "P4WHENCE"
variable, and reuse it in p4 installation for macOS.
By removing the "LINUX_P4_VERSION" variable from "ci/lib.sh", the
comment left above has nothing to do with p4, but still applies to
git-lfs. Since we have a fixed version of git-lfs installed on Linux,
we may have a different version on macOS.
[1]: https://cdist2.perforce.com/perforce/
Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de> Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jiang Xin [Fri, 25 Nov 2022 09:59:52 +0000 (17:59 +0800)]
ci: remove the pipe after "p4 -V" to catch errors
When installing p4 as a dependency, we used to pipe output of "p4 -V"
and "p4d -V" to validate the installation and output a condensed version
information. But this would hide potential errors of p4 and would stop
with an empty output. E.g.: p4d version 16.2 running on ubuntu 22.04
causes sigfaults, even before it produces any output.
By removing the pipe after "p4 -V" and "p4d -V", we may get a
verbose output, and stop immediately on errors because we have "set
-e" in "ci/lib.sh". Since we won't look at these trace logs unless
something fails, just including the raw output seems most sensible.
Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de> Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jiang Xin [Fri, 25 Nov 2022 09:59:51 +0000 (17:59 +0800)]
github-actions: run gcc-8 on ubuntu-20.04 image
GitHub starts to upgrade its runner image "ubuntu-latest" from version
"ubuntu-20.04" to version "ubuntu-22.04". It will fail to find and
install "gcc-8" package on the new runner image.
Change the runner image of the `linux-gcc` job from "ubuntu-latest" to
"ubuntu-20.04" in order to install "gcc-8" as a dependency.
Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de> Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git for Windows' SDK recently upgraded to GCC v12.x which points out
that the `pos` variable might be used even after the corresponding
memory was `realloc()`ed and therefore potentially no longer valid.
Since a subset of this SDK is used in Git's CI/PR builds, we need to fix
this to continue to be able to benefit from the CI/PR runs.
Note: This bug has been with us since 2a6b149c64f6 (mingw: avoid using
strbuf in syslog, 2011-10-06), and while it looks tempting to replace
the hand-rolled string manipulation with a `strbuf`-based one, that
commit's message explains why we cannot do that: The `syslog()` function
is called as part of the function in `daemon.c` which is set as the
`die()` routine, and since `strbuf_grow()` can call that function if it
runs out of memory, this would cause a nasty infinite loop that we do
not want to re-introduce.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adam Johnson [Fri, 10 Mar 2023 13:04:33 +0000 (13:04 +0000)]
ls-files: fix "--format" output of relative paths
Fix a bug introduced with the "--format" option in ce74de93 (ls-files: introduce "--format" option, 2022-07-23),
where relative paths were computed using the output buffer,
which could lead to random garbage data in the output.
Signed-off-by: Adam Johnson <me@adamj.eu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When accepting a packfile in git-receive-pack(1), we feed that packfile
into git-index-pack(1) to generate the packfile index. As the packfile
would often only contain unreachable objects until the references have
been updated, concurrently running garbage collection might be tempted
to delete the packfile right away and thus cause corruption. To fix
this, we ask git-index-pack(1) to create a `.keep` file before moving
the packfile into place, which is getting deleted again once all of the
reference updates have been processed.
Now in production systems we have observed that those `.keep` files are
sometimes not getting deleted as expected, where the result is that
repositories tend to grow packfiles that are never deleted over time.
This seems to be caused by a race when git-receive-pack(1) is killed
after we have migrated the kept packfile from the quarantine directory
into the main object database. While this race window is typically small
it can be extended for example by installing a `proc-receive` hook.
Fix this race by registering the lockfile as a tempfile so that it will
automatically be removed at exit or when receiving a signal.
Helped-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the list of files as input was implemented in 6508eedf67
(t/aggregate-results: accomodate systems with small max argument list
length, 2010-06-01), a much simpler solution wasn't considered.
Let's just pass the directory as an argument.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>