Junio C Hamano [Wed, 15 May 2024 16:52:52 +0000 (09:52 -0700)]
Merge branch 'js/unit-test-suite-runner'
The "test-tool" has been taught to run testsuite tests in parallel,
bypassing the need to use the "prove" tool.
* js/unit-test-suite-runner:
cmake: let `test-tool` run the unit tests, too
ci: use test-tool as unit test runner on Windows
t/Makefile: run unit tests alongside shell tests
unit tests: add rule for running with test-tool
test-tool run-command testsuite: support unit tests
test-tool run-command testsuite: remove hardcoded filter
test-tool run-command testsuite: get shell from env
t0080: turn t-basic unit test into a helper
Pseudorefs are not stored in the ref database as by definition, they
carry additional metadata that essentially makes them not a ref. As
such, writing pseudorefs via the ref backend does not make any sense
whatsoever as the ref backend wouldn't know how exactly to store the
data.
Restrict writing pseudorefs via the ref backend.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
ref-filter: properly distinuish pseudo and root refs
The ref-filter interfaces currently define root refs as either a
detached HEAD or a pseudo ref. Pseudo refs aren't root refs though, so
let's properly distinguish those ref types.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `is_root_ref()` function will happily clarify a pseudoref as a root
ref, even though pseudorefs are no refs. Next to being wrong, it also
leads to inconsistent behaviour across ref backends: while the "files"
backend accidentally knows to parse those pseudorefs and thus yields
them to the caller, the "reftable" backend won't ever see the pseudoref
at all because they are never stored in the "reftable" backend.
Fix this issue by filtering out pseudorefs in `is_root_ref()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Root refs are those refs that live in the root of the ref hierarchy.
Our old and venerable "HEAD" reference falls into this category, but we
don't yet classify it as such in `is_root_ref()`.
Adapt the function to also treat "HEAD" as a root ref. This change is
safe to do for all current callers:
- `ref_kind_from_refname()` already handles "HEAD" explicitly before
calling `is_root_ref()`.
- The "files" and "reftable" backends explicitly call both
`is_root_ref()` and `is_headref()` together.
This also aligns behaviour or `is_root_ref()` and `is_headref()` such
that we stop checking for ref existence. This changes semantics for our
backends:
- In the reftable backend we already know that the ref must exist
because `is_headref()` is called as part of the ref iterator. The
existence check is thus redundant, and the change is safe to do.
- In the files backend we use it when populating root refs, where we
would skip adding the "HEAD" file if it was not possible to resolve
it. The new behaviour is to instead mark "HEAD" as broken, which
will cause us to emit warnings in various places.
As there are no callers of `is_headref()` left afer the refactoring, we
can absorb it completely into `is_root_ref()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs: do not check ref existence in `is_root_ref()`
Before this patch series, root refs except for "HEAD" and our special
refs were classified as pseudorefs. Furthermore, our terminology
clarified that pseudorefs must not be symbolic refs. This restriction
is enforced in `is_root_ref()`, which explicitly checks that a supposed
root ref resolves to an object ID without recursing.
This has been extremely confusing right from the start because (in old
terminology) a ref name may sometimes be a pseudoref and sometimes not
depending on whether it is a symbolic or regular ref. This behaviour
does not seem reasonable at all and I very much doubt that it results in
anything sane.
Last but not least, the current behaviour can actually lead to a
segfault when calling `is_root_ref()` with a reference that either does
not exist or that is a symbolic ref because we never initialized `oid`,
but then read it via `is_null_oid()`.
We have now changed terminology to clarify that pseudorefs are really
only "MERGE_HEAD" and "FETCH_HEAD", whereas all the other refs that live
in the root of the ref hierarchy are just plain refs. Thus, we do not
need to check whether the ref is symbolic or not. In fact, we can now
avoid looking up the ref completely as the name is sufficient for us to
figure out whether something would be a root ref or not.
This change of course changes semantics for our callers. As there are
only three of them we can assess each of them individually:
- "ref-filter.c:ref_kind_from_refname()" uses it to classify refs.
It's clear that the intent is to classify based on the ref name,
only.
- "refs/reftable_backend.c:reftable_ref_iterator_advance()" uses it to
filter root refs. Again, using existence checks is pointless here as
the iterator has just surfaced the ref, so we know it does exist.
- "refs/files_backend.c:add_pseudoref_and_head_entries()" uses it to
determine whether it should add a ref to the root directory of its
iterator. This had the effect that we skipped over any files that
are either a symbolic ref, or which are not a ref at all.
The new behaviour is to include symbolic refs know, which aligns us
with the adapted terminology. Furthermore, files which look like
root refs but aren't are now mark those as "broken". As broken refs
are not surfaced by our tooling, this should not lead to a change in
user-visible behaviour, but may cause us to emit warnings. This
feels like the right thing to do as we would otherwise just silently
ignore corrupted root refs completely.
So in all cases the existence check was either superfluous, not in line
with the adapted terminology or masked potential issues. This commit
thus changes the behaviour as proposed and drops the existence check
altogether.
Add a test that verifies that this does not change user-visible
behaviour. Namely, we still don't want to show broken refs to the user
by default in git-for-each-ref(1). What this does allow though is for
internal callers to surface dangling root refs when they pass in the
`DO_FOR_EACH_INCLUDE_BROKEN` flag.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs: rename `is_special_ref()` to `is_pseudo_ref()`
Rename `is_special_ref()` to `is_pseudo_ref()` to adapt to the newly
defined terminology in our gitglossary(7). Note that in the preceding
commit we have just renamed `is_pseudoref()` to `is_root_ref()`, where
there may be confusion for in-flight patch series that add new calls to
`is_pseudoref()`. In order to intentionally break such patch series we
have thus picked `is_pseudo_ref()` instead of `is_pseudoref()` as the
new name.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Except for the pseudorefs MERGE_HEAD and FETCH_HEAD, all refs that live
in the root of the ref hierarchy behave the exact same as normal refs.
They can be symbolic refs or direct refs and can be read, iterated over
and written via normal tooling. All of these refs are stored in the ref
backends, which further demonstrates that they are just normal refs.
Extend the definition of "ref" to also cover such root refs. The only
additional restriction for root refs is that they must conform to a
specific naming schema.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Documentation/glossary: clarify limitations of pseudorefs
Clarify limitations that pseudorefs have:
- They can be read via git-rev-parse(1) and similar tools.
- They are not surfaced when iterating through refs, like when using
git-for-each-ref(1). They are not refs, so iterating through refs
should not surface them.
- They cannot be written via git-update-ref(1) and related commands.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Documentation/glossary: redefine pseudorefs as special refs
Nowadays, Git knows about three different kinds of refs. As defined in
gitglossary(7):
- Regular refs that start with "refs/", like "refs/heads/main".
- Pseudorefs, which live in the root directory. These must have
all-caps names and must be a file that start with an object hash.
Consequently, symbolic refs are not pseudorefs because they do not
start with an object hash.
- Special refs, of which we only have "FETCH_HEAD" and "MERGE_HEAD".
This state is extremely confusing, and I would claim that most folks
don't fully understand what is what here. The current definitions also
have several problems:
- Where does "HEAD" fit in? It's not a pseudoref because it can be
a symbolic ref. It's not a regular ref because it does not start
with "refs/". And it's not a special ref, either.
- There is a strong overlap between pseudorefs and special refs. The
pseudoref section for example mentions "MERGE_HEAD", even though it
is a special ref. Is it thus both a pseudoref and a special ref?
- Why do we even need to distinguish refs that live in the root from
other refs when they behave just like a regular ref anyway?
In other words, the current state is quite a mess and leads to wild
inconsistencies without much of a good reason.
The original reason why pseudorefs were introduced is that there are
some refs that sometimes behave like a ref, even though they aren't a
ref. And we really only have two of these nowadays, namely "MERGE_HEAD"
and "FETCH_HEAD". Those files are never written via the ref backends,
but are instead written by git-fetch(1), git-pull(1) and git-merge(1).
They contain additional metadata that highlights where a ref has been
fetched from or the list of commits that have been merged.
This original intent in fact matches the definition of special refs that
we have recently introduced in 8df4c5d205 (Documentation: add "special
refs" to the glossary, 2024-01-19). Due to the introduction of the new
reftable backend we were forced to distinguish those refs more clearly
such that we don't ever try to read or write them via the reftable
backend. In the same series, we also addressed all the other cases where
we used to write those special refs via the filesystem directly, thus
circumventing the ref backend, to instead write them via the backends.
Consequently, there are no other refs left anymore which are special.
Let's address this mess and return the pseudoref terminology back to its
original intent: a ref that sometimes behave like a ref, but which isn't
really a ref because it gets written to the filesystem directly. Or in
other words, let's redefine pseudorefs to match the current definition
of special refs. As special refs and pseudorefs are now the same per
definition, we can drop the "special refs" term again. It's not exposed
to our users and thus they wouldn't ever encounter that term anyway.
Refs that live in the root of the ref hierarchy but which are not
pseudorefs will be further defined in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/config: pass data between callbacks via local variables
We use several global variables to pass data between callers and
callbacks in `get_color()` and `get_colorbool()`. Convert those to use
callback data structures instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both the `do_all` and `use_key_regexp` bits essentially act like flags
to `get_value()`. Let's convert them to actual flags so that we can get
rid of the last two remaining global variables that track options.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/config: track "fixed value" option via flags only
We track the "fixed value" option via two separate bits: once via the
global variable `fixed_value`, and once via the CONFIG_FLAGS_FIXED_VALUE
bit in `flags`. This is confusing and may easily lead to issues when one
is not aware that this is tracked via two separate mechanisms.
Refactor the code to use the flag exclusively. We already pass it to all
the required callsites anyway, except for `collect_config()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `key` variable is used by the `get_value()` function for two
purposes:
- It is used to store the result of `git_config_parse_key()`, which is
then passed on to `collect_config()`.
- It is used as a store to convert the provided key to an
all-lowercase key when `use_key_regexp` is set.
Neither of these cases warrant a global variable at all. In the former
case we can pass the key via `struct collect_config_data`. And in the
latter case we really only want to have it as a temporary local variable
such that we can free associated memory.
Refactor the code accordingly to reduce our reliance on global state.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/config: convert `key_regexp` to a local variable
The `key_regexp` variable is used by the `format_config()` callback when
`use_key_regexp` is set. It is only ever set up by its only caller,
`collect_config()` and can thus easily be moved into the
`collect_config_data` structure.
Do so to remove our reliance on global state.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/config: convert `regexp` to a local variable
The `regexp` variable is used by the `format_config()` callback when
`CONFIG_FLAGS_FIXED_VALUE` is not set. It is only ever set up by its
only caller, `collect_config()` and can thus easily be moved into the
`collect_config_data` structure.
Do so to remove our reliance on global state.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/config: convert `value_pattern` to a local variable
The `value_pattern` variable is used by the `format_config()` callback
when `CONFIG_FLAGS_FIXED_VALUE` is used. It is only ever set up by its
only caller, `collect_config()` and can thus easily be moved into the
`collect_config_data` structure.
Do so to remove our reliance on global state.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/config: convert `do_not_match` to a local variable
The `do_not_match` variable is used by the `format_config()` callback as
an indicator whether or not the passed regular expression is negated. It
is only ever set up by its only caller, `collect_config()` and can thus
easily be moved into the `collect_config_data` structure.
Do so to remove our reliance on global state.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/config: move display options into local variables
The display options are tracked via a set of global variables. Move
them into a self-contained structure so that we can easily parse all
relevant options and hand them over to the various functions that
require them.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/config: move location options into local variables
The location options are tracked via a set of global variables. Move
them into a self-contained structure so that we can easily parse all
relevant options and hand them over to the various functions that
require them.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/config: check for writeability after source is set up
The `check_write()` function verifies that we do not try to write to a
config source that cannot be written to, like for example stdin. But
while the new subcommands do call this function, they do so before
calling `handle_config_location()`. Consequently, we only end up
checking the default config location for writeability, not the location
that was actually specified by the caller of git-config(1).
Fix this by calling `check_write()` after `handle_config_location()`. We
will further clarify the relationship between those two functions in a
subsequent commit where we remove the global state that both implicitly
rely on.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/config: move actions into `cmd_config_actions()`
We only use actions in the legacy mode. Convert them to an enum and move
them into `cmd_config_actions()` to clearly demonstrate that they are
not used anywhere else.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/config: move legacy mode into its own function
In `cmd_config()` we first try to parse the provided arguments as
subcommands and, if this is successful, call the respective functions
of that subcommand. Otherwise we continue with the "legacy" mode that
uses implicit actions and/or flags.
Disentangle this by moving the legacy mode into its own function. This
allows us to move the options into the respective functions and clearly
separates concerns.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/config: stop printing full usage on misuse
When invoking git-config(1) with a wrong set of arguments we end up
calling `usage_builtin_config()` after printing an error message that
says what was wrong. As that function ends up printing the full list of
options, which is quite long, the actual error message will be buried by
a wall of text. This makes it really hard to figure out what exactly
caused the error.
Furthermore, now that we have recently introduced subcommands, the usage
information may actually be misleading as we unconditionally print
options of the subcommand-less mode.
Fix both of these issues by just not printing the options at all
anymore. Instead, we call `usage()` that makes us report in a single
line what has gone wrong. This should be way more discoverable for our
users and addresses the inconsistency.
Furthermore, this change allow us to inline the options into the
respective functions that use them to parse the command line.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 14 May 2024 19:57:06 +0000 (15:57 -0400)]
pack-bitmap: introduce `bitmap_writer_free()`
Now that there is clearer memory ownership around the bitmap_writer
structure, introduce a bitmap_writer_free() function that callers may
use to free any memory associated with their instance of the
bitmap_writer structure.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 14 May 2024 19:57:03 +0000 (15:57 -0400)]
pack-bitmap-write.c: avoid uninitialized 'write_as' field
Prepare to free() memory associated with bitmapped_commit structs by
zero'ing the 'write_as' field.
In ideal cases, it is fine to do something like:
for (i = 0; i < writer->selected_nr; i++) {
struct bitmapped_commit *bc = &writer->selected[i];
if (bc->write_as != bc->bitmap)
ewah_free(bc->write_as);
ewah_free(bc->bitmap);
}
but if not all of the 'write_as' fields were populated (e.g., because
the packing_data given does not form a reachability closure), then we
may attempt to free uninitialized memory.
Guard against this by preemptively zero'ing this field just in case.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 14 May 2024 19:57:00 +0000 (15:57 -0400)]
pack-bitmap: drop unused `max_bitmaps` parameter
The `max_bitmaps` parameter in `bitmap_writer_select_commits()` was
introduced back in 7cc8f97108 (pack-objects: implement bitmap writing,
2013-12-21), making it original to the bitmap implementation in Git
itself.
When that patch was merged via 0f9e62e084 (Merge branch
'jk/pack-bitmap', 2014-02-27), its sole caller in builtin/pack-objects.c
passed a value of "-1" for `max_bitmaps`, indicating no limit.
Since then, the only other caller (in midx.c, added via c528e17966
(pack-bitmap: write multi-pack bitmaps, 2021-08-31)) also uses a value
of "-1" for `max_bitmaps`.
Since no callers have needed a finite limit for the `max_bitmaps`
parameter in the nearly decade that has passed since 0f9e62e084, let's
remove the parameter and any dead pieces of code connected to it.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 14 May 2024 19:56:56 +0000 (15:56 -0400)]
pack-bitmap: avoid use of static `bitmap_writer`
The pack-bitmap machinery uses a structure called 'bitmap_writer' to
collect the data necessary to write out .bitmap files. Since its
introduction in 7cc8f971085 (pack-objects: implement bitmap writing,
2013-12-21), there has been a single static bitmap_writer structure,
which is responsible for all bitmap writing-related operations.
In practice, this is OK, since we are only ever writing a single .bitmap
file in a single process (e.g., `git multi-pack-index write --bitmap`,
`git pack-objects --write-bitmap-index`, `git repack -b`, etc.).
However, having a single static variable makes issues like data
ownership unclear, when to free variables, what has/hasn't been
initialized unclear.
Refactor this code to be written in terms of a given bitmap_writer
structure instead of relying on a static global.
Note that this exposes the structure definition of the bitmap_writer at
the pack-bitmap.h level. We could work around this by, e.g., forcing
callers to declare their writers as:
which would avoid us having to expose the definition of the structure
itself. This patch takes a different approach, since future patches
(like for the ongoing pseudo-merge bitmaps work) will want to modify the
innards of this structure (in the previous example, via pseudo-merge.c).
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 14 May 2024 19:56:53 +0000 (15:56 -0400)]
pack-bitmap-write.c: move commit_positions into commit_pos fields
In 7cc8f971085 (pack-objects: implement bitmap writing, 2013-12-21), the
bitmapped_commit struct was introduced, including the 'commit_pos'
field, which has been unused ever since its introduction more than a
decade ago.
Instead, we have used the nearby `commit_positions` array leaving the
bitmapped_commit struct with an unused 4-byte field.
We could drop the `commit_pos` field as unused, and continue to store
the values in the auxiliary array. But we could also drop the array and
store the data for each bitmapped_commit struct inside of the structure
itself, which is what this patch does.
In any spot that we previously read `commit_positions[i]`, we can now
instead read `writer.selected[i].commit_pos`. There are a few spots that
need changing as a result:
- write_selected_commits_v1() is a simple transformation, since we're
just reading the field. As a result, the function no longer needs an
explicit argument to pass the commit_positions array.
- write_lookup_table() also no longer needs the explicit
commit_positions array passed in as an argument. But it still needs
to sort an array of indices into the writer.selected array to read
them in commit_pos order, so table_cmp() is adjusted accordingly.
- bitmap_writer_finish() no longer needs to allocate, populate, and
free the commit_positions table. Instead, we can just write the data
directly into each struct bitmapped_commit.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taylor Blau [Tue, 14 May 2024 19:56:50 +0000 (15:56 -0400)]
object.h: add flags allocated by pack-bitmap.h
In commit 7cc8f971085 (pack-objects: implement bitmap writing,
2013-12-21) the NEEDS_BITMAP flag was introduced into pack-bitmap.h, but
no object flags allocation table existed at the time.
In 208acbfb82f (object.h: centralize object flag allocation, 2014-03-25)
when that table was first introduced, we never added the flags from 7cc8f971085, which has remained the case since.
Rectify this by including the flag bit used by pack-bitmap.h into the
centralized table in object.h.
Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 14 May 2024 01:29:15 +0000 (18:29 -0700)]
Sync with Git 2.45.1
* tag 'v2.45.1': (42 commits)
Git 2.45.1
Git 2.44.1
Git 2.43.4
Git 2.42.2
Git 2.41.1
Git 2.40.2
Git 2.39.4
fsck: warn about symlink pointing inside a gitdir
core.hooksPath: add some protection while cloning
init.templateDir: consider this config setting protected
clone: prevent hooks from running during a clone
Add a helper function to compare file contents
init: refactor the template directory discovery into its own function
find_hook(): refactor the `STRIP_EXTENSION` logic
clone: when symbolic links collide with directories, keep the latter
entry: report more colliding paths
t5510: verify that D/F confusion cannot lead to an RCE
submodule: require the submodule path to contain directories only
clone_submodule: avoid using `access()` on directories
submodules: submodule paths must not contain symlinks
...
reftable/merged: adapt interface to allow reuse of iterators
Refactor the interfaces exposed by `struct reftable_merged_table` and
`struct merged_iter` such that they support iterator reuse. This is done
by separating initialization of the iterator and seeking on it.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/stack: provide convenience functions to create iterators
There exist a bunch of call sites in the reftable backend that want to
create iterators for a reftable stack. This is rather convoluted right
now, where you always have to go via the merged table. And it is about
to become even more convoluted when we split up iterator initialization
and seeking in the next commit.
Introduce convenience functions that allow the caller to create an
iterator from a reftable stack directly without going through the merged
table. Adapt callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/reader: adapt interface to allow reuse of iterators
Refactor the interfaces exposed by `struct reftable_reader` and `struct
table_iterator` such that they support iterator reuse. This is done by
separating initialization of the iterator and seeking on it.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/generic: adapt interface to allow reuse of iterators
Refactor the interfaces exposed by `struct reftable_table` and `struct
reftable_iterator` such that they support iterator reuse. This is done
by separating initialization of the iterator and seeking on it.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/generic: move seeking of records into the iterator
Reftable iterators are created by seeking on the parent structure of a
corresponding record. For example, to create an iterator for the merged
table you would call `reftable_merged_table_seek_ref()`. Most notably,
it is not posible to create an iterator and then seek it afterwards.
While this may be a bit easier to reason about, it comes with two
significant downsides. The first downside is that the logic to find
records is split up between the parent data structure and the iterator
itself. Conceptually, it is more straight forward if all that logic was
contained in a single place, which should be the iterator.
The second and more significant downside is that it is impossible to
reuse iterators for multiple seeks. Whenever you want to look up a
record, you need to re-create the whole infrastructure again, which is
quite a waste of time. Furthermore, it is impossible to optimize seeks,
such as when seeking the same record multiple times.
To address this, we essentially split up the concerns properly such that
the parent data structure is responsible for setting up the iterator via
a new `init_iter()` callback, whereas the iterator handles seeks via a
new `seek()` callback. This will eventually allow us to call `seek()` on
the iterator multiple times, where every iterator can potentially
optimize for certain cases.
Note that at this point in time we are not yet ready to reuse the
iterators. This will be left for a future patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/merged: simplify indices for subiterators
When seeking on a merged table, we perform the seek for each of the
subiterators. If the subiterator has the desired record we add it to the
priority queue, otherwise we skip it and don't add it to the stack of
subiterators hosted by the merged table.
The consequence of this is that the index of the subiterator in the
merged table does not necessarily correspond to the index of it in the
merged iterator. Next to being potentially confusing, it also means that
we won't easily be able to re-seek the merged iterator because we have
no clear connection between both of the data structures.
Refactor the code so that the index stays the same in both structures.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/merged: split up initialization and seeking of records
To initialize a `struct merged_iter`, we need to seek all subiterators
to the wanted record and then add their results to the priority queue
used to sort the records. This logic is split up across two functions,
`merged_table_seek_record()` and `merged_iter_init()`. The scope of
these functions is somewhat weird though, where `merged_iter_init()` is
only responsible for adding the records of the subiterators to the
priority queue.
Clarify the scope of those functions such that `merged_iter_init()` is
only responsible for initializing the iterator's structure. Performing
the subiterator seeks are now part of `merged_table_seek_record()`.
This step is required to move seeking of records into the generic
`struct reftable_iterator` infrastructure.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/reader: set up the reader when initializing table iterator
All the seeking functions accept a `struct reftable_reader` as input
such that they can use the reader to look up the respective blocks.
Refactor the code to instead set up the reader as a member of `struct
table_iter` during initialization such that we don't have to pass the
reader on every single call.
This step is required to move seeking of records into the generic
`struct reftable_iterator` infrastructure.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have both `reader_seek()` and `reader_seek_internal()`, where the
former function only exists so that we can exit early in case the given
table has no records of the sought-after type.
Merge these two functions into one.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/reader: separate concerns of table iter and reftable reader
In "reftable/reader.c" we implement two different interfaces:
- The reftable reader contains the logic to read reftables.
- The table iterator is used to iterate through a single reftable read
by the reader.
The way those two types are used in the code is somewhat confusing
though because seeking inside a table is implemented as if it was part
of the reftable reader, even though it is ultimately more of a detail
implemented by the table iterator.
Make the boundary between those two types clearer by renaming functions
that seek records in a table such that they clearly belong to the table
iterator's logic.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `reader_seek_internal()` we either end up doing an indexed seek when
there is one or a linear seek otherwise. These two code paths are
disjunct without a good reason, where the indexed seek will cause us to
exit early.
Refactor the two code paths such that it becomes possible to share a bit
more code between them.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When doing an indexed seek we need to walk down the multi-level index
until we finally hit a record of the desired indexed type. This loop
performs a copy of the index iterator on every iteration, which is both
hard to understand and completely unnecessary.
Refactor the code so that we use a single iterator to walk down the
indices, only.
Note that while this should improve performance, the improvement is
negligible in all but the most unreasonable repositories. This is
because the effect is only really noticeable when we have to walk down
many levels of indices, which is not something that a repository would
typically have. So the motivation for this change is really only about
readability.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/block: use `size_t` to track restart point index
The function `block_reader_restart_offset()` gets the offset of the
`i`th restart point. `i` is a signed integer though, which is certainly
not the correct type to track indices like this. Furthermore, both
callers end up passing a `size_t`.
Refactor the code to use a `size_t` instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When auto-compacting, the reftable library packs references such that
the sizes of the tables form a geometric sequence. The factor for this
geometric sequence is hardcoded to 2 right now. We're about to expose
this as a config option though, so let's expose the factor via write
options.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs/reftable: allow disabling writing the object index
Besides the expected "ref" and "log" records, the reftable library also
writes "obj" records. These are basically a reverse mapping of object
IDs to their respective ref records so that it becomes efficient to
figure out which references point to a specific object. The motivation
for this data structure is the "uploadpack.allowTipSHA1InWant" config,
which allows a client to fetch any object by its hash that has a ref
pointing to it.
This reverse index is not used by Git at all though, and the expectation
is that most hosters nowadays use "uploadpack.allowAnySHA1InWant". It
may thus be preferable for many users to disable writing these optional
object indices altogether to safe some precious disk space.
Add a new config "reftable.indexObjects" that allows the user to disable
the object index altogether.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new option `reftable.restartInterval` that allows the user to
control the restart interval when writing reftable records used by the
reftable library.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable: use `uint16_t` to track restart interval
The restart interval can at most be `UINT16_MAX` as specified in the
technical documentation of the reftable format. Furthermore, it cannot
ever be negative. Regardless of that we use an `int` to track the
restart interval.
Change the type to use an `uint16_t` instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/dump: support dumping a table's block structure
We're about to introduce new configs that will allow users to have more
control over how exactly reftables are written. To verify that these
configs are effective we will need to take a peak into the actual blocks
written by the reftable backend.
Introduce a new mode to the dumping logic that prints out the block
structure. This logic can be invoked via `test-tool dump-reftables -b`.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/writer: improve error when passed an invalid block size
The reftable format only supports block sizes up to 16MB. When the
writer is being passed a value bigger than that it simply calls
abort(3P), which isn't all that helpful due to the lack of a proper
error message.
Improve this by calling `BUG()` instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/writer: drop static variable used to initialize strbuf
We have a static variable in the reftable writer code that is merely
used to initialize the `last_key` of the writer. Convert the code to
instead use `strbuf_init()` and drop the variable.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We sometimes pass the refatble write options as value and sometimes as a
pointer. This is quite confusing and makes the reader wonder whether the
options get modified sometimes.
In fact, `reftable_new_writer()` does cause the caller-provided options
to get updated when some values aren't set up. This is quite unexpected,
but didn't cause any harm until now.
Adapt the code so that we do not modify the caller-provided values
anymore. While at it, refactor the code to code to consistently pass the
options as a constant pointer to clarify that the caller-provided opts
will not ever get modified.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable: consistently refer to `reftable_write_options` as `opts`
Throughout the reftable library the `reftable_write_options` are
sometimes referred to as `cfg` and sometimes as `opts`. Unify these to
consistently use `opts` to avoid confusion.
While at it, touch up the coding style a bit by removing unneeded braces
around one-line statements and newlines between variable declarations.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Dov Murik [Sun, 12 May 2024 03:14:00 +0000 (23:14 -0400)]
documentation: git-update-index: add --show-index-version to synopsis
In 606e088d5d (update-index: add --show-index-version, 2023-09-12), we
added the new '--show-index-version' option to 'git-update-index' and
documented it, but forgot to add it to the synopsis section.
Add '--show-index-version' to the synopsis of 'git-update-index'.
Signed-off-by: Dov Murik <dov.murik@linux.dev> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Mon, 13 May 2024 19:24:54 +0000 (12:24 -0700)]
Merge branch 'ps/undecided-is-not-necessarily-sha1' into jc/undecided-is-not-necessarily-sha1-fix
* ps/undecided-is-not-necessarily-sha1:
repository: stop setting SHA1 as the default object hash
oss-fuzz/commit-graph: set up hash algorithm
builtin/shortlog: don't set up revisions without repo
builtin/diff: explicitly set hash algo when there is no repo
builtin/bundle: abort "verify" early when there is no repository
builtin/blame: don't access potentially unitialized `the_hash_algo`
builtin/rev-parse: allow shortening to more than 40 hex characters
remote-curl: fix parsing of detached SHA256 heads
attr: fix BUG() when parsing attrs outside of repo
attr: don't recompute default attribute source
parse-options-cb: only abbreviate hashes when hash algo is known
path: move `validate_headref()` to its only user
path: harden validation of HEAD with non-standard hashes
Junio C Hamano [Mon, 13 May 2024 17:19:47 +0000 (10:19 -0700)]
Merge branch 'jk/ci-macos-gcc13-fix'
CI fix.
* jk/ci-macos-gcc13-fix:
ci: stop installing "gcc-13" for osx-gcc
ci: avoid bare "gcc" for osx-gcc job
ci: drop mention of BREW_INSTALL_PACKAGES variable
Junio C Hamano [Mon, 13 May 2024 17:19:46 +0000 (10:19 -0700)]
Merge branch 'jc/no-default-attr-tree-in-bare'
Git 2.43 started using the tree of HEAD as the source of attributes
in a bare repository, which has severe performance implications.
For now, revert the change, without ripping out a more explicit
support for the attr.tree configuration variable.
* jc/no-default-attr-tree-in-bare:
stop using HEAD for attributes in bare repository by default
Junio C Hamano [Sun, 12 May 2024 06:25:04 +0000 (23:25 -0700)]
compat/regex: fix argument order to calloc(3)
Windows compiler suddenly started complaining that calloc(3) takes
its arguments in <nmemb, size> order. Indeed, there are many calls
that has their arguments in a _wrong_ order.
Junio C Hamano [Fri, 10 May 2024 16:55:26 +0000 (09:55 -0700)]
SubmittingPatches: extend the "flow" section
Explain a full lifecycle of a patch series upfront, so that it is
clear when key decisions to "accept" a series is made and how a new
patch series becomes a part of a new release.
Fold the "you need to monitor the progress of your topic" section
into the primary "patch lifecycle" section, as that is one of the
things the patch submitter is responsible for. It is not like "I
sent a patch and responded to review messages, and now it is their
problem". They need to see their patch through the patch life
cycle.
Earlier versions of this document outlined a slightly different
patch flow in an idealized world, where the original submitter
gathered agreements from the participants of the discussion and sent
the final "we all agreed that this is the good version--please
apply" patches to the maintainer. In practice, this almost never
happened. Instead, describe what flow was used in practice for the
past decade that worked well for us.
Junio C Hamano [Fri, 10 May 2024 16:55:25 +0000 (09:55 -0700)]
SubmittingPatches: move the patch-flow section earlier
Before discussing the small details of how the patch gets sent, we'd
want to give people a larger picture first to set the expectation
straight. The existing patch-flow section covers materials that are
suitable for that purpose, so move it to the beginning of the
document. We'll update the contents of the section to clarify what
goal the patch submitter is working towards in the next step, which
will make it easier to understand the reason behind the individual
rules presented in latter parts of the document.
This step only moves two sections (patch-flow and patch-status)
without changing their contents, except that their section levels
are demoted from Level 1 to Level 2 to fit better in the document
structure at their new place.
Jeff King [Thu, 9 May 2024 16:25:44 +0000 (12:25 -0400)]
ci: stop installing "gcc-13" for osx-gcc
Our osx-gcc job explicitly asks to install gcc-13. But since the GitHub
runner image already comes with gcc-13 installed, this is mostly doing
nothing (or in some cases it may install an incremental update over the
runner image). But worse, it recently started causing errors like:
==> Fetching gcc@13
==> Downloading https://ghcr.io/v2/homebrew/core/gcc/13/blobs/sha256:fb2403d97e2ce67eb441b54557cfb61980830f3ba26d4c5a1fe5ecd0c9730d1a
==> Pouring gcc@13--13.2.0.ventura.bottle.tar.gz
Error: The `brew link` step did not complete successfully
The formula built, but is not symlinked into /usr/local
Could not symlink bin/c++-13
Target /usr/local/bin/c++-13
is a symlink belonging to gcc. You can unlink it:
brew unlink gcc
which cause the whole CI job to bail.
I didn't track down the root cause, but I suspect it may be related to
homebrew recently switching the "gcc" default to gcc-14. And it may even
be fixed when a new runner image is released. But if we don't need to
run brew at all, it's one less thing for us to worry about.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 9 May 2024 16:24:15 +0000 (12:24 -0400)]
ci: avoid bare "gcc" for osx-gcc job
On macOS, a bare "gcc" (without a version) will invoke a wrapper for
clang, not actual gcc. Even when gcc is installed via homebrew, that
only provides version-specific links in /usr/local/bin (like "gcc-13"),
and never a version-agnostic "gcc" wrapper.
As far as I can tell, this has been the case for a long time, and this
osx-gcc job has largely been doing nothing. We can point it at "gcc-13",
which will pick up the homebrew-installed version.
The fix here is specific to the github workflow file, as the gitlab one
does not have a matching job.
It's a little unfortunate that we cannot just ask for the latest version
of gcc which homebrew provides, but as far as I can tell there is no
easy alias (you'd have to find the highest number gcc-* in
/usr/local/bin yourself).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 9 May 2024 16:23:05 +0000 (12:23 -0400)]
ci: drop mention of BREW_INSTALL_PACKAGES variable
The last user of this variable went away in 4a6e4b9602 (CI: remove
Travis CI support, 2021-11-23), so it's doing nothing except making it
more confusing to find out which packages _are_ installed.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Jeff King [Thu, 9 May 2024 16:15:06 +0000 (12:15 -0400)]
ci: update coverity runs_on_pool reference
Commit 2d65e5b6a6 (ci: rename "runs_on_pool" to "distro", 2024-04-12)
renamed this variable for the main CI workflow, as well as in the ci/
scripts. Because the coverity workflow also relies on those scripts to
install dependencies, it needs to be updated, too. Without this patch,
the coverity build fails because we lack libcurl.
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitlab-ci: fix installing dependencies for fuzz smoke tests
There was a semantic merge conflict between 9cdeb34b96 (ci: merge
scripts which install dependencies, 2024-04-12), which has merged
"ci/install-docker-dependencies.sh" into "ci/install-dependencies.sh"
and c7b228e000 (gitlab-ci: add smoke test for fuzzers, 2024-04-29),
which has added a new fuzz smoke test job that makes use of the
now-removed script.
Adapt the job to instead use the new script to install dependencies.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fahad Alrashed [Wed, 8 May 2024 22:11:05 +0000 (22:11 +0000)]
git-p4: show Perforce error to the user
During "git p4 clone" if p4 process returns an error from the server,
it will store the message in the 'err' variable. Then it will send a
text command "die-now" to git-fast-import. However, git-fast-import
raises an exception: "fatal: Unsupported command: die-now" and err is
never displayed. This patch ensures that err is shown to the end user.
Signed-off-by: Fahad Alrashed <fahad@keylock.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 8 May 2024 17:18:47 +0000 (10:18 -0700)]
Merge branch 'bb/rgb-12-bit-colors'
The color parsing code learned to handle 12-bit RGB colors, spelled
as "#RGB" (in addition to "#RRGGBB" that is already supported).
* bb/rgb-12-bit-colors:
color: add support for 12-bit RGB colors
t/t4026-color: add test coverage for invalid RGB colors
t/t4026-color: remove an extra double quote character
Junio C Hamano [Wed, 8 May 2024 17:18:46 +0000 (10:18 -0700)]
Merge branch 'bc/zsh-compatibility'
zsh can pretend to be a normal shell pretty well except for some
glitches that we tickle in some of our scripts. Work them around
so that "vimdiff" and our test suite works well enough with it.
* bc/zsh-compatibility:
vimdiff: make script and tests work with zsh
t4046: avoid continue in &&-chain for zsh
Junio C Hamano [Wed, 8 May 2024 17:18:45 +0000 (10:18 -0700)]
Merge branch 'rj/add-p-typo-reaction'
When the user responds to a prompt given by "git add -p" with an
unsupported command, list of available commands were given, which
was too much if the user knew what they wanted to type but merely
made a typo. Now the user gets a much shorter error message.
* rj/add-p-typo-reaction:
add-patch: response to unknown command
add-patch: do not show UI messages on stderr
Junio C Hamano [Wed, 8 May 2024 17:18:45 +0000 (10:18 -0700)]
Merge branch 'rh/complete-symbolic-ref'
Command line completion script (in contrib/) learned to complete
"git symbolic-ref" a bit better (you need to enable plumbing
commands to be completed with GIT_COMPLETION_SHOW_ALL_COMMANDS).
* rh/complete-symbolic-ref:
completion: add docs on how to add subcommand completions
completion: improve docs for using __git_complete
completion: add 'symbolic-ref'
Junio C Hamano [Wed, 8 May 2024 17:18:44 +0000 (10:18 -0700)]
Merge branch 'ps/the-index-is-no-more'
The singleton index_state instance "the_index" has been eliminated
by always instantiating "the_repository" and replacing references
to "the_index" with references to its .index member.
* ps/the-index-is-no-more:
repository: drop `initialize_the_repository()`
repository: drop `the_index` variable
builtin/clone: stop using `the_index`
repository: initialize index in `repo_init()`
builtin: stop using `the_index`
t/helper: stop using `the_index`