Bo Anderson [Thu, 15 Feb 2024 01:03:56 +0000 (01:03 +0000)]
t/lib-credential: clean additional credential
71201ab0e5 (t/lib-credential.sh: ensure credential helpers handle long
headers, 2023-05-01) added a test which stores credentials with the host
victim.example.com but this was never cleaned up, leaving residual data
in the credential store after running the tests.
Add a cleanup call for this credential to resolve this issue.
Signed-off-by: Bo Anderson <mail@boanderson.me> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t7003: ensure filter-branch prunes reflogs with the reftable backend
In t7003 we conditionally check whether the reflog for branches pruned
by git-filter-branch(1) get deleted based on whether or not we use the
"files" backend. Same as with the preceding commit, this condition was
added because in its initial iteration the "reftable" backend did not
delete reflogs when their corresponding ref was deleted. Since then, the
backend has been aligned to behave the same as the "files" backend
though, which makes this check unnecessary.
Remove it.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t2011: exercise D/F conflicts with HEAD with the reftable backend
Some of the tests in t2011 exercise whether it is possible to move away
from a symbolic HEAD ref whose target ref has a directory-file conflict
with another, preexisting ref. These tests don't use git-symbolic-ref(1)
but manually write HEAD. This is supposedly done to avoid using logic
that we're about to exercise, but it makes it impossible to verify
whether the logic also works for ref backends other than "files".
Refactor the code to use git-symbolic-ref(1) instead so that the tests
work with the "reftable" backend, as well. We already have lots of tests
in t1404 that ensure that both git-update-ref(1) and git-symbolic-ref(1)
work in such a scenario, so it should be safe to rely on it here.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 5e00514745 (t1405: explictly delete reflogs for reftable, 2022-01-31)
we have added a test that explicitly deletes the reflog when not using
the "files" backend. This was required because back then, the "reftable"
backend didn't yet delete reflogs when deleting their corresponding
branches, and thus subsequent tests would fail because some unexpected
reflogs still exist.
The "reftable" backend was eventually changed though so that it behaves
the same as the "files" backend and deletes reflogs when deleting refs.
This was done to make the "reftable" backend behave like the "files"
backend as closely as possible so that it can act as a drop-in
replacement.
The cleanup-style test is thus not required anymore. Remove it.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t1404: make D/F conflict tests compatible with reftable backend
Some of the tests in t1404 exercise whether Git correctly aborts
transactions when there is a directory/file conflict with ref names.
While these tests are all marked to require the "files" backend, they do
in fact apply to the "reftable" backend as well.
This may not make much sense on the surface: D/F conflicts only exist
because the "files" backend uses the filesystem to store loose refs, and
thus the restriction theoretically shouldn't apply to the "reftable"
backend. But for now, the "reftable" backend artificially restricts the
creation of such conflicting refs so that it is a drop-in replacement
for the "files" backend. This also ensures that the "reftable" backend
can easily be used on the server side without causing issues for clients
which only know to use the "files" backend.
The only difference between the "files" and "reftable" backends is a
slightly different error message. Adapt the tests to accomodate for this
difference and remove the REFFILES prerequisite so that we start testing
with both backends.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t1400: exercise reflog with gaps with reftable backend
In t1400, we have a test that exercises whether we print a warning
message as expected when the reflog contains entries which have a gap
between the old entry's new object ID and the new entry's old object ID.
While the logic should apply to all ref backends, the test setup writes
into `.git/logs` directly and is thus "files"-backend specific.
Refactor the test to instead use `git reflog delete` to create the gap
and drop the REFFILES prerequisite.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t0410: convert tests to use DEFAULT_REPO_FORMAT prereq
In t0410 we have two tests which exercise how partial clones behave in
the context of a repository with extensions. These tests are marked to
require a repository using SHA1 and the "files" backend because we
explicitly set the repository format version to 0, and setting up either
the "objectFormat" or "refStorage" extensions requires a repository
format version of 1.
We have recently introduced a new DEFAULT_REPO_FORMAT prerequisite.
Despite capturing the intent more directly, it also has the added
benefit that it can easily be extended in the future in case we add new
repository extensions. Adapt the tests to use it.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jiang Xin [Thu, 15 Feb 2024 01:48:25 +0000 (09:48 +0800)]
Merge branch 'master' of github.com:git/git
* 'master' of github.com:git/git: (51 commits)
Hopefully the last batch of fixes before 2.44 final
Git 2.43.2
A few more fixes before -rc1
write-or-die: fix the polarity of GIT_FLUSH environment variable
A few more topics before -rc1
completion: add and use __git_compute_second_level_config_vars_for_section
completion: add and use __git_compute_first_level_config_vars_for_section
completion: complete 'submodule.*' config variables
completion: add space after config variable names also in Bash 3
receive-pack: use find_commit_header() in check_nonce()
ci(linux32): add a note about Actions that must not be updated
ci: bump remaining outdated Actions versions
unit-tests: do show relative file paths on non-Windows, too
receive-pack: use find_commit_header() in check_cert_push_options()
prune: mark rebase autostash and orig-head as reachable
sequencer: unset GIT_CHERRY_PICK_HELP for 'exec' commands
ref-filter.c: sort formatted dates by byte value
ssh signing: signal an error with a negative return value
bisect: document command line arguments for "bisect start"
bisect: document "terms" subcommand more fully
...
Junio C Hamano [Wed, 14 Feb 2024 23:36:06 +0000 (15:36 -0800)]
Merge branch 'pb/complete-config'
The command line completion script (in contrib/) learned to
complete configuration variable names better.
* pb/complete-config:
completion: add and use __git_compute_second_level_config_vars_for_section
completion: add and use __git_compute_first_level_config_vars_for_section
completion: complete 'submodule.*' config variables
completion: add space after config variable names also in Bash 3
Junio C Hamano [Wed, 14 Feb 2024 23:36:05 +0000 (15:36 -0800)]
Merge branch 'rs/receive-pack-remove-find-header'
Code simplification.
* rs/receive-pack-remove-find-header:
receive-pack: use find_commit_header() in check_nonce()
receive-pack: use find_commit_header() in check_cert_push_options()
Junio C Hamano [Wed, 14 Feb 2024 23:36:05 +0000 (15:36 -0800)]
Merge branch 'pw/gc-during-rebase'
The sequencer machinery does not use the ref API and instead
records names of certain objects it needs for its correct operation
in temporary files, which makes these objects susceptible to loss
by garbage collection. These temporary files have been added as
starting points for reachability analysis to fix this.
* pw/gc-during-rebase:
prune: mark rebase autostash and orig-head as reachable
Chandra Pratap [Wed, 14 Feb 2024 17:50:48 +0000 (17:50 +0000)]
t9146: replace test -d/-e/-f with appropriate test_path_is_* function
The helper functions test_path_is_* provide better debugging
information than test -d/-e/-f.
Replace "if ! test -d then <error message>" and "test -d" with
"test_path_is_dir" at places where we check for existent directories.
Replace "test -f" with "test_path_is_file" at places where we check
for existent files.
Replace "test ! -e" and "if test -d then <error message>" with
"test_path_is_missing" where we check for non-existent directories.
Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 14 Feb 2024 18:44:01 +0000 (10:44 -0800)]
doc: add shortcut to "am --whitespace=<action>"
We refer readers of "git am --help" to "git apply --help" for many
options that are passed through, and most of them are simple
booleans, but --whitespace takes from a set of actions whose names
may slip users' minds. Give a list of them in "git am --help" to
reduce one level of redirection only to find out what they are.
In the helper function to parse the available options, there was a
helpful comment reminding the developer to update list of <action>s
in the completion script. Mention the two documentation pages there
as well.
Christian Couder [Wed, 14 Feb 2024 14:25:13 +0000 (15:25 +0100)]
rev-list: allow missing tips with --missing=[print|allow*]
In 9830926c7d (rev-list: add commit object support in `--missing`
option, 2023-10-27) we fixed the `--missing` option in `git rev-list`
so that it works with with missing commits, not just blobs/trees.
Unfortunately, such a command would still fail with a "fatal: bad
object <oid>" if it is passed a missing commit, blob or tree as an
argument (before the rev walking even begins).
When such a command is used to find the dependencies of some objects,
for example the dependencies of quarantined objects (see the
"QUARANTINE ENVIRONMENT" section in the git-receive-pack(1)
documentation), it would be better if the command would instead
consider such missing objects, especially commits, in the same way as
other missing objects.
If, for example `--missing=print` is used, it would be nice for some
use cases if the missing tips passed as arguments were reported in
the same way as other missing objects instead of the command just
failing.
We could introduce a new option to make it work like this, but most
users are likely to prefer the command to have this behavior as the
default one. Introducing a new option would require another dumb loop
to look for that option early, which isn't nice.
Also we made `git rev-list` work with missing commits very recently
and the command is most often passed commits as arguments. So let's
consider this as a bug fix related to these recent changes.
While at it let's add a NEEDSWORK comment to say that we should get
rid of the existing ugly dumb loops that parse the
`--exclude-promisor-objects` and `--missing=...` options early.
Helped-by: Linus Arver <linusa@google.com> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Christian Couder [Wed, 14 Feb 2024 14:25:11 +0000 (15:25 +0100)]
oidset: refactor oidset_insert_from_set()
In a following commit, we will need to add all the oids from a set into
another set. In "list-objects-filter.c", there is already a static
function called add_all() to do that.
Let's rename this function oidset_insert_from_set() and move it into
oidset.{c,h} to make it generally available.
While at it, let's remove a useless `!= NULL`.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Christian Couder [Wed, 14 Feb 2024 14:25:10 +0000 (15:25 +0100)]
revision: clarify a 'return NULL' in get_reference()
When we know a pointer variable is NULL, it's clearer to
explicitly return NULL than to return that variable.
In get_reference(), when 'object' is NULL, we already return NULL
when 'revs->exclude_promisor_objects && is_promisor_object(oid)' is
true, but we return 'object' when 'revs->ignore_missing' is true.
Let's make the code clearer and more uniform by also explicitly
returning NULL when 'revs->ignore_missing' is true.
Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jiang Xin [Wed, 14 Feb 2024 08:46:41 +0000 (16:46 +0800)]
diff: mark param1 and param2 as placeholders
Some l10n translators translated the parameters "files", "param1" and
"param2" in the following message:
"synonym for --dirstat=files,param1,param2..."
Translating "param1" and "param2" is OK, but changing the parameter
"files" is wrong. The parameters that are not meant to be used verbatim
should be marked as placeholders, but the verbatim parameter not marked
as a placeholder should be left as is.
This change is a complement for commit 51e846e673 (doc: enforce
placeholders in documentation, 2023-12-25).
With the help of Jean-Noël,some parameter combinations in one
placeholder (e.g. "<param1,param2>...") are splited into seperate
placeholders.
Helped-by: Jean-Noël Avila <jn.avila@free.fr> Signed-off-by: Jiang Xin <worldhello.net@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 13 Feb 2024 22:31:11 +0000 (14:31 -0800)]
Merge branch 'jc/unit-tests-make-relative-fix'
The mechanism to report the filename in the source code, used by
the unit-test machinery, assumed that the compiler expanded __FILE__
to the path to the source given to the $(CC), but some compilers
give full path, breaking the output. This has been corrected.
* jc/unit-tests-make-relative-fix:
unit-tests: do show relative file paths on non-Windows, too
The Perl version of the add -i/-p commands has been removed since 20b813d (add: remove "add.interactive.useBuiltin" & Perl "git
add--interactive", 2023-02-07)
Therefore, Perl prerequisite in the test scripts which use the patch
mode functionality is not neccessary.
Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, (restore, checkout, reset) commands correctly take '@' as a
synonym for 'HEAD'. However, in patch mode different prompts/messages
are given on command line due to patch mode machinery not considering
'@' to be a synonym for 'HEAD' due to literal string comparison with
the word 'HEAD', and therefore assigning patch_mode_($command)_nothead
and triggering reverse mode (-R in diff-index). The NEEDSWORK comment
suggested comparing commit objects to get around this. However, doing
so would also take a non-checked out branch pointing to the same commit
as HEAD, as HEAD. This would cause confusion to the user.
Therefore, after parsing '@', replace it with 'HEAD' as reasonably
early as possible. This also solves another problem of disparity
between 'git checkout HEAD' and 'git checkout @' (latter detaches at
the HEAD commit and the former does not).
Trade-offs:
- Some of the errors would show the revision argument as 'HEAD' when
given '@'. This should be fine, as most users who probably use '@'
would be aware that it is a shortcut for 'HEAD' and most probably
used to use 'HEAD'. There is also relevant documentation in
'gitrevisions' manpage about '@' being the shortcut for 'HEAD'. Also,
the simplicity of the solution far outweighs this cost.
- Consider '@' as a shortcut for 'HEAD' even if 'refs/heads/@' exists
at a different commit. Naming a branch '@' is an obvious foot-gun and
many existing commands already take '@' for 'HEAD' even if
'refs/heads/@' exists at a different commit or does not exist at all
(e.g. 'git log @', 'git push origin @' etc.). Therefore this is an
existing assumption and should not be a problem.
Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Phillip Wood <phillip.wood123@gmail.com> Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Thu, 8 Feb 2024 23:17:31 +0000 (15:17 -0800)]
git: --no-lazy-fetch option
Sometimes, especially during tests of low level machinery, it is
handy to have a way to disable lazy fetching of objects. This
allows us to say, for example, "git cat-file -e <object-name>", to
see if the object is locally available.
Junio C Hamano [Tue, 13 Feb 2024 19:48:15 +0000 (11:48 -0800)]
write-or-die: fix the polarity of GIT_FLUSH environment variable
When GIT_FLUSH is set to 1, true, on, yes, then we should disable
skip_stdout_flush, but the conversion somehow did the opposite.
With the understanding of the original motivation behind "skip" in 06f59e9f (Don't fflush(stdout) when it's not helpful, 2007-06-29),
we can sympathize with the current naming (we wanted to avoid
useless flushing of stdout by default, with an escape hatch to
always flush), but it is still not a good excuse.
Retire the "skip_stdout_flush" variable and replace it with "flush_stdout"
that tells if we do or do not want to run fflush().
Reported-by: Xiaoguang WANG <wxiaoguang@gmail.com> Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git branch" and friends learned to use the formatted text as
sorting key, not the underlying timestamp value, when the --sort
option is used with author or committer timestamp with a format
specifier (e.g., "--sort=creatordate:format:%H:%M:%S").
* vd/for-each-ref-sort-with-formatted-timestamp:
ref-filter.c: sort formatted dates by byte value
Junio C Hamano [Mon, 12 Feb 2024 21:16:10 +0000 (13:16 -0800)]
Merge branch 'bk/complete-bisect'
Command line completion support (in contrib/) has been
updated for "git bisect".
* bk/complete-bisect:
completion: bisect: recognize but do not complete view subcommand
completion: bisect: complete log opts for visualize subcommand
completion: new function __git_complete_log_opts
completion: bisect: complete missing --first-parent and - -no-checkout options
completion: bisect: complete custom terms and related options
completion: bisect: complete bad, new, old, and help subcommands
completion: tests: always use 'master' for default initial branch name
Junio C Hamano [Mon, 12 Feb 2024 21:16:10 +0000 (13:16 -0800)]
Merge branch 'ps/reftable-styles'
Code clean-up in various reftable code paths.
* ps/reftable-styles:
reftable/record: improve semantics when initializing records
reftable/merged: refactor initialization of iterators
reftable/merged: refactor seeking of records
reftable/stack: use `size_t` to track stack length
reftable/stack: use `size_t` to track stack slices during compaction
reftable/stack: index segments with `size_t`
reftable/stack: fix parameter validation when compacting range
reftable: introduce macros to allocate arrays
reftable: introduce macros to grow arrays
Write multi-level indices for reftable has been corrected.
* ps/reftable-multi-level-indices-fix:
reftable: document reading and writing indices
reftable/writer: fix writing multi-level indices
reftable/writer: simplify writing index records
reftable/writer: use correct type to iterate through index entries
reftable/reader: be more careful about errors in indexed seeks
Junio C Hamano [Mon, 12 Feb 2024 18:09:19 +0000 (10:09 -0800)]
Merge branch 'ps/reftable-backend' into kn/for-all-refs
* ps/reftable-backend:
refs/reftable: fix leak when copying reflog fails
ci: add jobs to test with the reftable backend
refs: introduce reftable backend
Philippe Blain [Sat, 10 Feb 2024 18:32:23 +0000 (18:32 +0000)]
completion: add and use __git_compute_second_level_config_vars_for_section
In a previous commit we removed some hardcoded config variable names from
function __git_complete_config_variable_name in the completion script by
introducing a new function,
__git_compute_first_level_config_vars_for_section.
The remaining hardcoded config variables are "second level"
configuration variables, meaning 'branch.<name>.upstream',
'remote.<name>.url', etc. where <name> is a user-defined name.
Making use of the new existing --config flag to 'git help', add a new
function, __git_compute_second_level_config_vars_for_section. This
function takes as argument a config section name and computes the
corresponding second-level config variables, i.e. those that contain a
'<' which indicates the start of a placeholder. Note that as in
__git_compute_first_level_config_vars_for_section added previsouly, we
use indirect expansion instead of associative arrays to stay compatible
with Bash 3 on which macOS is stuck for licensing reasons.
As explained in the previous commit, we use the existing pattern in the
completion script of using global variables to cache the list of
variables for each section.
Use this new function and the variables it defines in
__git_complete_config_variable_name to remove hardcoded config
variables, and add a test to verify the new function. Use a single
'case' for all sections with second-level variables names, since the
code for each of them is now exactly the same.
Adjust the name of a test added in a previous commit to reflect that it
now tests the added function.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Philippe Blain [Sat, 10 Feb 2024 18:32:22 +0000 (18:32 +0000)]
completion: add and use __git_compute_first_level_config_vars_for_section
The function __git_complete_config_variable_name in the Bash completion
script hardcodes several config variable names. These variables are
those in config sections where user-defined names can appear, such as
"branch.<name>". These sections are treated first by the case statement,
and the two last "catch all" cases are used for other sections, making
use of the __git_compute_config_vars and __git_compute_config_sections
function, which omit listing any variables containing wildcards or
placeholders. Having hardcoded config variables introduces the risk of
the completion code becoming out of sync with the actual config
variables accepted by Git.
To avoid these hardcoded config variables, introduce a new function,
__git_compute_first_level_config_vars_for_section, making use of the
existing __git_config_vars variable. This function takes as argument a
config section name and computes the matching "first level" config
variables for that section, i.e. those _not_ containing any placeholder,
like 'branch.autoSetupMerge, 'remote.pushDefault', etc. Use this
function and the variables it defines in the 'branch.*', 'remote.*' and
'submodule.*' switches of the case statement instead of hardcoding the
corresponding config variables. Note that we use indirect expansion to
create a variable for each section, instead of using a single
associative array indexed by section names, because associative arrays
are not supported in Bash 3, on which macOS is stuck for licensing
reasons.
Use the existing pattern in the completion script of using global
variables to cache the list of config variables for each section. The
rationale for such caching is explained in eaa4e6ee2a (Speed up bash
completion loading, 2009-11-17), and the current approach to using and
defining them via 'test -n' is explained in cf0ff02a38 (completion: work
around zsh option propagation bug, 2012-02-02).
Adjust the name of one of the tests added in the previous commit,
reflecting that it now also tests the new function.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the Bash completion script, function
__git_complete_config_variable_name completes config variables and has
special logic to deal with config variables involving user-defined
names, like branch.<name>.* and remote.<name>.*.
This special logic is missing for submodule-related config variables.
Add the appropriate branches to the case statement, making use of the
in-tree '.gitmodules' to list relevant submodules.
Add corresponding tests in t9902-completion.sh, making sure we complete
both first level submodule config variables as well as second level
variables involving submodule names.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Philippe Blain [Sat, 10 Feb 2024 18:32:20 +0000 (18:32 +0000)]
completion: add space after config variable names also in Bash 3
In be6444d1ca (completion: bash: add correct suffix in variables,
2021-08-16), __git_complete_config_variable_name was changed to use
"${sfx- }" instead of "$sfx" as the fourth argument of _gitcomp_nl and
_gitcomp_nl_append, such that this argument evaluates to a space if sfx
is unset. This was to ensure that e.g.
git config branch.autoSetupMe[TAB]
correctly completes to 'branch.autoSetupMerge ' with the trailing space.
This commits notes that the fix only works in Bash 4 because in Bash 3
the 'local sfx' construct at the beginning of
__git_complete_config_variable_name creates an empty string.
Make the fix also work for Bash 3 by using the "unset or null' parameter
expansion syntax ("${sfx:- }"), such that the parameter is also expanded
to a space if it is set but null, as is the behaviour of 'local sfx' in
Bash 3.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Sat, 10 Feb 2024 07:43:01 +0000 (08:43 +0100)]
use xstrncmpz()
Add and apply a semantic patch for calling xstrncmpz() to compare a
NUL-terminated string with a buffer of a known length instead of using
strncmp() and checking the terminating NUL explicitly. This simplifies
callers by reducing code duplication.
I had to adjust remote.c manually because Coccinelle inexplicably
changed the indent of the else branches.
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Fri, 9 Feb 2024 20:41:47 +0000 (21:41 +0100)]
receive-pack: use find_commit_header() in check_nonce()
Use the public function find_commit_header() and remove find_header(),
as it becomes unused. This is safe and appropriate because we pass the
NUL-terminated payload buffer to check_nonce() instead of its start and
length. The underlying strbuf push_cert cannot contain NULs, as it is
built using strbuf_addstr(), only.
We no longer need to call strlen(), as find_commit_header() returns the
length of nonce already.
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/reader: add comments to `table_iter_next()`
While working on the optimizations in the preceding patches I stumbled
upon `table_iter_next()` multiple times. It is quite easy to miss the
fact that we don't call `table_iter_next_in_block()` twice, but that the
second call is in fact `table_iter_next_block()`.
Add comments to explain what exactly is going on here to make things
more obvious. While at it, touch up the code to conform to our code
style better.
Note that one of the refactorings merges two conditional blocks into
one. Before, we had the following code:
reftable/record: don't try to reallocate ref record name
When decoding reftable ref records we first release the pointer to the
record passed to us and then use realloc(3P) to allocate the refname
array. This is a bit misleading though as we know at that point that the
refname will always be `NULL`, so we would always end up allocating a
new char array anyway.
Refactor the code to use `REFTABLE_ALLOC_ARRAY()` instead. As the
following benchmark demonstrates this is a tiny bit more efficient. But
the bigger selling point really is the gained clarity.
Benchmark 1: show-ref: single matching ref (revision = HEAD~)
Time (mean ± σ): 150.1 ms ± 4.1 ms [User: 146.6 ms, System: 3.3 ms]
Range (min … max): 144.5 ms … 180.5 ms 1000 runs
Benchmark 2: show-ref: single matching ref (revision = HEAD)
Time (mean ± σ): 148.9 ms ± 4.5 ms [User: 145.2 ms, System: 3.4 ms]
Range (min … max): 143.0 ms … 185.4 ms 1000 runs
Summary
show-ref: single matching ref (revision = HEAD) ran
1.01 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)
Ideally, we should try and reuse the memory of the old record instead of
first freeing and then immediately reallocating it. This requires some
more surgery though and is thus left for a future iteration.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When iterating towards the next record in a reftable block we need to
keep track of the key that the last record had. This is required because
reftable records use prefix compression, where subsequent records may
reuse parts of their preceding record's key.
This key is stored in the `block_iter::last_key`, which we update after
every call to `block_iter_next()`: we simply reset the buffer and then
add the current key to it.
This is a bit inefficient though because it requires us to copy over the
key on every iteration, which adds up when iterating over many records.
Instead, we can make use of the fact that the `block_iter::key` buffer
is basically only a scratch buffer. So instead of copying over contents,
we can just swap both buffers.
The following benchmark prints a single ref matching a specific pattern
out of 1 million refs via git-show-ref(1):
Benchmark 1: show-ref: single matching ref (revision = HEAD~)
Time (mean ± σ): 155.7 ms ± 5.0 ms [User: 152.1 ms, System: 3.4 ms]
Range (min … max): 150.8 ms … 185.7 ms 1000 runs
Benchmark 2: show-ref: single matching ref (revision = HEAD)
Time (mean ± σ): 150.8 ms ± 4.2 ms [User: 147.1 ms, System: 3.5 ms]
Range (min … max): 145.1 ms … 180.7 ms 1000 runs
Summary
show-ref: single matching ref (revision = HEAD) ran
1.03 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/pq: allocation-less comparison of entry keys
The priority queue is used by the merged iterator to iterate over
reftable records from multiple tables in the correct order. The queue
ends up having one record for each table that is being iterated over,
with the record that is supposed to be shown next at the top. For
example, the key of a ref record is equal to its name so that we end up
sorting the priority queue lexicographically by ref name.
To figure out the order we need to compare the reftable record keys with
each other. This comparison is done by formatting them into a `struct
strbuf` and then doing `strbuf_strcmp()` on the result. We then discard
the buffers immediately after the comparison.
This ends up being very expensive. Because the priority queue usually
contains as many records as we have tables, we call the comparison
function `O(log($tablecount))` many times for every record we insert.
Furthermore, when iterating over many refs, we will insert at least one
record for every ref we are iterating over. So ultimately, this ends up
being called `O($refcount * log($tablecount))` many times.
Refactor the code to use the new `refatble_record_cmp()` function that
has been implemented in a preceding commit. This function does not need
to allocate memory and is thus significantly more efficient.
The following benchmark prints a single ref matching a specific pattern
out of 1 million refs via git-show-ref(1), where the reftable stack
consists of three tables:
Benchmark 1: show-ref: single matching ref (revision = HEAD~)
Time (mean ± σ): 224.4 ms ± 6.5 ms [User: 220.6 ms, System: 3.6 ms]
Range (min … max): 216.5 ms … 261.1 ms 1000 runs
Benchmark 2: show-ref: single matching ref (revision = HEAD)
Time (mean ± σ): 172.9 ms ± 4.4 ms [User: 169.2 ms, System: 3.6 ms]
Range (min … max): 166.5 ms … 204.6 ms 1000 runs
Summary
show-ref: single matching ref (revision = HEAD) ran
1.30 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/merged: skip comparison for records of the same subiter
When retrieving the next entry of a merged iterator we need to drop all
records of other sub-iterators that would be shadowed by the record that
we are about to return. We do this by comparing record keys, dropping
all keys that are smaller or equal to the key of the record we are about
to return.
There is an edge case here where we can skip that comparison: when the
record in the priority queue comes from the same subiterator as the
record we are about to return then we know that its key must be larger
than the key of the record we are about to return. This property is
guaranteed by the sub-iterators, and if it didn't hold then the whole
merged iterator would return records in the wrong order, too.
While this may seem like a very specific edge case it's in fact quite
likely to happen. For most repositories out there you can assume that we
will end up with one large table and several smaller ones on top of it.
Thus, it is very likely that the next entry will sort towards the top of
the priority queue.
Special case this and break out of the loop in that case. The following
benchmark uses git-show-ref(1) to print a single ref matching a pattern
out of 1 million refs:
Benchmark 1: show-ref: single matching ref (revision = HEAD~)
Time (mean ± σ): 162.6 ms ± 4.5 ms [User: 159.0 ms, System: 3.5 ms]
Range (min … max): 156.6 ms … 188.5 ms 1000 runs
Benchmark 2: show-ref: single matching ref (revision = HEAD)
Time (mean ± σ): 156.8 ms ± 4.7 ms [User: 153.0 ms, System: 3.6 ms]
Range (min … max): 151.4 ms … 188.4 ms 1000 runs
Summary
show-ref: single matching ref (revision = HEAD) ran
1.04 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/merged: allocation-less dropping of shadowed records
The purpose of the merged reftable iterator is to iterate through all
entries of a set of tables in the correct order. This is implemented by
using a sub-iterator for each table, where the next entry of each of
these iterators gets put into a priority queue. For each iteration, we
do roughly the following steps:
1. Retrieve the top record of the priority queue. This is the entry we
want to return to the caller.
2. Retrieve the next record of the sub-iterator that this record came
from. If any, add it to the priority queue at the correct position.
The position is determined by comparing the record keys, which e.g.
corresponds to the refname for ref records.
3. Keep removing the top record of the priority queue until we hit the
first entry whose key is larger than the returned record's key.
This is required to drop "shadowed" records.
The last step will lead to at least one comparison to the next entry,
but may lead to many comparisons in case the reftable stack consists of
many tables with shadowed records. It is thus part of the hot code path
when iterating through records.
The code to compare the entries with each other is quite inefficient
though. Instead of comparing record keys with each other directly, we
first format them into `struct strbuf`s and only then compare them with
each other. While we already optimized this code path to reuse buffers
in 829231dc20 (reftable/merged: reuse buffer to compute record keys,
2023-12-11), the cost to format the keys into the buffers still adds up
quite significantly.
Refactor the code to use `reftable_record_cmp()` instead, which has been
introduced in the preceding commit. This function compares records with
each other directly without requiring any memory allocations or copying
and is thus way more efficient.
The following benchmark uses git-show-ref(1) to print a single ref
matching a pattern out of 1 million refs. This is the most direct way to
exercise ref iteration speed as we remove all overhead of having to show
the refs, too.
Benchmark 1: show-ref: single matching ref (revision = HEAD~)
Time (mean ± σ): 180.7 ms ± 4.7 ms [User: 177.1 ms, System: 3.4 ms]
Range (min … max): 174.9 ms … 211.7 ms 1000 runs
Benchmark 2: show-ref: single matching ref (revision = HEAD)
Time (mean ± σ): 162.1 ms ± 4.4 ms [User: 158.5 ms, System: 3.4 ms]
Range (min … max): 155.4 ms … 189.3 ms 1000 runs
Summary
show-ref: single matching ref (revision = HEAD) ran
1.11 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/record: introduce function to compare records by key
In some places we need to sort reftable records by their keys to
determine their ordering. This is done by first formatting the keys into
a `struct strbuf` and then using `strbuf_cmp()` to compare them. This
logic is needlessly roundabout and can end up costing quite a bit of CPU
cycles, both due to the allocation and formatting logic.
Introduce a new `reftable_record_cmp()` function that knows how to
compare two records with each other without requiring allocations.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
ci(linux32): add a note about Actions that must not be updated
The Docker container used by the `linux32` job comes without Node.js,
and therefore the `actions/checkout` and `actions/upload-artifact`
Actions cannot be upgraded to the latest versions (because they use
Node.js).
One time too many, I accidentally tried to update them, where
`actions/checkout` at least fails immediately, but the
`actions/upload-artifact` step is only used when any test fails, and
therefore the CI run usually passes even though that Action was updated
to a version that is incompatible with the Docker container in which
this job runs.
So let's add a big fat warning, mainly for my own benefit, to avoid
running into the very same issue over and over again.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
After activating automatic Dependabot updates in the
git-for-windows/git repository, Dependabot noticed a couple
of yet-unaddressed updates. They avoid "Node.js 16 Actions"
deprecation messages by bumping the following Actions'
versions:
- actions/upload-artifact from 3 to 4
- actions/download-artifact from 3 to 4
- actions/cache from 3 to 4
Helped-by: Matthias Aßhauer <mha1993@live.de> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Sun, 11 Feb 2024 15:58:04 +0000 (07:58 -0800)]
unit-tests: do show relative file paths on non-Windows, too
There are compilers other than Visual C that want to show absolute
paths. Generalize the helper introduced by a2c5e294 (unit-tests: do
show relative file paths, 2023-09-25) so that it can also work with
a path that uses slash as the directory separator, and becomes
almost no-op once one-time preparation finds out that we are using a
compiler that already gives relative paths. Incidentally, this also
should do the right thing on Windows with a compiler that shows
relative paths but with backslash as the directory separator (if
such a thing exists and is used to build git).
Reported-by: Randall S. Becker <rsbecker@nexbridge.com> Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
René Scharfe [Fri, 9 Feb 2024 20:36:40 +0000 (21:36 +0100)]
receive-pack: use find_commit_header() in check_cert_push_options()
Use the public function find_commit_header() instead of find_header() to
simplify the code. This is possible and safe because we're operating on
a strbuf, which is always NUL-terminated, so there is no risk of running
over the end of the buffer. It cannot contain NUL within the buffer, as
it is built using strbuf_addstr(), only.
The string comparison becomes more complicated because we need to check
for NUL explicitly after comparing the length-limited option, but on the
flip side we don't need to clean up allocations or track the remaining
buffer length.
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>