remote: announce removal of "branches/" and "remotes/"
Back when Git was in its infancy, remotes were configured via separate
files in "branches/" (back in 2005). This mechanism was replaced later
that year with the "remotes/" directory. Both mechanisms have eventually
been replaced by config-based remotes, and it is very unlikely that
anybody still uses these directories to configure their remotes.
Both of these directories have been marked as deprecated, one in 2005
and the other one in 2011. Follow through with the deprecation and
finally announce the removal of these features in Git 3.0.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
[jc: with a small tweak to the help message] Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/pack-redundant: remove subcommand with breaking changes
The git-pack-redundant(1) subcommand has been castrated to require
the "--i-still-use-this" option to do anything since 4406522b
(pack-redundant: escalate deprecation warning to an error,
2023-03-23), which appeared in Git 2.41 and was announced for
removal with 53a92c9552 (Documentation/BreakingChanges: announce
removal of git-pack-redundant(1), 2024-09-02). Stop compiling the
subcommand in case the `WITH_BREAKING_CHANGES` build flag is set.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "linux-gcc" job isn't all that interesting by itself and can be
considered more or less the "standard" job: it is running with a
reasonably up-to-date image and uses GCC as a compiler, both of which we
already cover in other jobs.
There is one exception though: we change the default branch to be "main"
instead of "master", so it is forging ahead a bit into the future to
make sure that this change does not cause havoc. So let's expand on this
a bit and also add the new "WITH_BREAKING_CHANGES" flag to the mix.
Rename the job to "linux-breaking-changes" accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "linux-gcc-default" job is mostly doing the same as the "linux-gcc"
job, except for a couple of minor differences:
- We use an explicit GCC version instead of the default version
provided by the distribution. We have other jobs that test with
"gcc-8", making this distinction pointless.
- We don't set up the Python version explicitly, and instead use the
default Python version. Python 2 has been end-of-life for quite a
while now though, making this distinction less interesting.
- We set up the default branch name to be "main" in "linux-gcc". We
have other testcases that don't and also some that explicitly use
"master".
- We use "ubuntu:20.04" in one job and "ubuntu:latest" in another. We
already have a couple other jobs testing these respectively.
So overall, the job does not add much to our test coverage.
Drop the "linux-gcc-default" job and adapt "linux-gcc" to start using
the default GCC compiler, effectively merging those two jobs into one.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Makefile: wire up build option for deprecated features
With 57ec9254eb (docs: introduce document to announce breaking changes,
2024-06-14), we have introduced a new document that tracks upcoming
breaking changes in the Git project. In 2454970930 (BreakingChanges:
early adopter option, 2024-10-11) we have amended the document a bit to
mention that any introduced breaking changes must be accompanied by
logic that allows us to enable the breaking change at compile-time.
While we already have two breaking changes lined up, neither of them has
such a switch because they predate those instructions.
Introduce the proposed `WITH_BREAKING_CHANGES` preprocessor macro and
wire it up with both our Makefiles and Meson. This does not yet wire up
the build flag for existing deprecations.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Thu, 12 Dec 2024 07:55:41 +0000 (16:55 +0900)]
Merge branch 'ps/build' into ps/3.0-remote-deprecation
* ps/build: (24 commits)
Introduce support for the Meson build system
Documentation: add comparison of build systems
t: allow overriding build dir
t: better support for out-of-tree builds
Documentation: extract script to generate a list of mergetools
Documentation: teach "cmd-list.perl" about out-of-tree builds
Documentation: allow sourcing generated includes from separate dir
Makefile: simplify building of templates
Makefile: write absolute program path into bin-wrappers
Makefile: allow "bin-wrappers/" directory to exist
Makefile: refactor generators to be PWD-independent
Makefile: extract script to generate gitweb.js
Makefile: extract script to generate gitweb.cgi
Makefile: extract script to massage Python scripts
Makefile: extract script to massage Shell scripts
Makefile: use "generate-perl.sh" to massage Perl library
Makefile: extract script to massage Perl scripts
Makefile: consistently use PERL_PATH
Makefile: generate doc versions via GIT-VERSION-GEN
Makefile: generate "git.rc" via GIT-VERSION-GEN
...
Junio C Hamano [Tue, 10 Dec 2024 01:04:57 +0000 (10:04 +0900)]
Merge branch 'ps/reftable-iterator-reuse'
Optimize reading random references out of the reftable backend by
allowing reuse of iterator objects.
* ps/reftable-iterator-reuse:
refs/reftable: reuse iterators when reading refs
reftable/merged: drain priority queue on reseek
reftable/stack: add mechanism to notify callers on reload
refs/reftable: refactor reflog expiry to use reftable backend
refs/reftable: refactor reading symbolic refs to use reftable backend
refs/reftable: read references via `struct reftable_backend`
refs/reftable: figure out hash via `reftable_stack`
reftable/stack: add accessor for the hash ID
refs/reftable: handle reloading stacks in the reftable backend
refs/reftable: encapsulate reftable stack
Junio C Hamano [Tue, 10 Dec 2024 01:04:56 +0000 (10:04 +0900)]
Merge branch 'ps/reftable-detach'
Isolates the reftable subsystem from the rest of Git's codebase by
using fewer pieces of Git's infrastructure.
* ps/reftable-detach:
reftable/system: provide thin wrapper for lockfile subsystem
reftable/stack: drop only use of `get_locked_file_path()`
reftable/system: provide thin wrapper for tempfile subsystem
reftable/stack: stop using `fsync_component()` directly
reftable/system: stop depending on "hash.h"
reftable: explicitly handle hash format IDs
reftable/system: move "dir.h" to its only user
Loosen overly strict ownership check introduced in the recent past,
to keep the promise "cloning a suspicious repository is a safe
first step to inspect it".
* bc/allow-upload-pack-from-other-people:
Allow cloning from repositories owned by another user
Introduce support for the Meson build system, a "modern" meta build
system that supports many different platforms, including Linux, macOS,
Windows and BSDs. Meson supports different backends, including Ninja,
Xcode and Microsoft Visual Studio. Several common IDEs provide an
integration with it.
The biggest contender compared to Meson is probably CMake as outlined in
our "Documentation/technical/build-systems.txt" file. Based on my own
personal experience from working with both build systems extensively I
strongly favor Meson over CMake. In my opinion, it feels significantly
easier to use with a syntax that feels more like a "real" programming
language. The second big reason is that Meson supports Rust natively,
which may prove to be important given that the project may pick up Rust
as another language eventually.
Using Meson is rather straight-forward. An example:
```
# Meson uses out-of-tree builds. You can set up multiple build
# directories, how you name them is completely up to you.
$ mkdir build
$ cd build
$ meson setup .. -Dprefix=/tmp/git-installation
# Build the project. This also provides several other targets like
e.g. `install` or `test`.
$ ninja
# Meson has been wired up to support execution of our test suites.
# Both our unit tests and our integration tests are supported.
# Running `meson test` without any arguments will execute all tests,
# but the syntax supports globbing to select only some tests.
$ meson test 't-*'
# Execute single test interactively to allow for debugging.
$ meson test 't0000-*' --interactive --test-args=-ix
```
The build instructions have been successfully tested on the following
systems, tests are passing:
- Apple macOS 10.15.
- FreeBSD 14.1.
- NixOS 24.11.
- OpenBSD 7.6.
- Ubuntu 24.04.
- Windows 10 with Cygwin.
- Windows 10 with MinGW64, except for t9700, which is also broken with
our Makefile.
- Windows 10 with Visual Studio 2022 toolchain, using the Native Tools
Command Prompt with `meson setup --vsenv`. Tests pass, except for
t9700.
- Windows 10 with Visual Studio 2022 solution, using the Native Tools
Command Prompt with `meson setup --backend vs2022`. Tests pass,
except for t9700.
- Windows 10 with VS Code, using the Meson plug-in.
It is expected that there will still be rough edges in the current
version. If this patch lands the expectation is that it will coexist
with our other build systems for a while. Like this, distributions can
slowly migrate over to Meson and report any findings they have to us
such that we can continue to iterate. A potential cutoff date for other
build systems may be Git 3.0.
Some notes:
- The installed distribution is structured somewhat differently than
how it used to be the case. All of our binaries are installed into
`$libexec/git-core`, while all binaries part of `$bindir` are now
symbolic links pointing to the former. This rule is consistent in
itself and thus easier to reason about.
- We do not install dashed binaries into `$libexec/git-core` anymore,
so there won't e.g. be a symlink for git-add(1). These are not
required by modern Git and there isn't really much of a use case for
those anymore. By not installing those symlinks we thus start the
deprecation of this layout.
- We're targeting Meson 1.3.0, which has been released relatively
recently November 2023. The only feature we use from that version is
`fs.relative_to()`, which we could replace if necessary. If so, we
could start to target Meson 1.0.0 and newer, released in December
2022.
- The whole build instructions count around 3300 lines, half of which
is listing all of our code and test files. Our Makefiles are around
5000 lines, autoconf adds another 1300 lines. CMake in comparison
has only 1200 linescode, but it avoids listing individual files and
does not wire up auto-configuration as extensively as the Meson
instructions do.
- We bundle a set of subproject wrappers for curl, expat, openssl,
pcre2 and zlib. This allows developers to build Git without these
dependencies preinstalled, and Meson will fetch and build them
automatically. This is especially helpful on Windows.
Helped-by: Eli Schwartz <eschwartz@gentoo.org> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're contemplating whether to eventually replace our build systems with
a build system that is easier to use. Add a comparison of build systems
to our technical documentation as a baseline for discussion.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our "test-lib.sh" assumes that our build directory is the parent
directory of "t/". While true when using our Makefile, it's not when
using build systems that support out-of-tree builds.
In commit ee9e66e4e7 (cmake: avoid editing t/test-lib.sh, 2022-10-18),
we have introduce support for overriding the GIT_BUILD_DIR by creating
the file "$GIT_BUILD_DIR/GIT-BUILD-DIR" with its contents pointing to
the location of the build directory. The intent was to stop modifying
"t/test-lib.sh" with the CMake build systems while allowing out-of-tree
builds. But "$GIT_BUILD_DIR" is somewhat misleadingly named, as it in
fact points to the _source_ directory. So while that commit solved part
of the problem for out-of-tree builds, CMake still has to write files
into the source tree.
Solve the second part of the problem, namely not having to write any
data into the source directory at all, by also supporting an environment
variable that allows us to point to a different build directory. This
allows us to perform properly self-contained out-of-tree builds.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our in-tree builds used by the Makefile use various different build
directories scattered around different locations. The paths to those
build directories have to be propagated to our tests such that they can
find the contained files. This is done via a mixture of hardcoded paths
in our test library and injected variables in our bin-wrappers or
"GIT-BUILD-OPTIONS".
The latter two mechanisms are preferable over using hardcoded paths. For
one, we have all paths which are subject to change stored in a small set
of central files instead of having the knowledge of build paths in many
files. And second, it allows build systems which build files elsewhere
to adapt those paths based on their own needs. This is especially nice
in the context of build systems that use out-of-tree builds like CMake
or Meson.
Remove hardcoded knowledge of build paths from our test library and move
it into our bin-wrappers and "GIT-BUILD-OPTIONS".
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Documentation: extract script to generate a list of mergetools
We include the list of available mergetools into our manpages. Extract
the script that performs this logic such that we can reuse it in other
build systems.
While at it, refactor the Makefile targets such that we don't create
"mergetools-list.made" anymore. It shouldn't be necessary, as we can
instead have other targets depend on "mergetools-{diff,merge}.txt"
directly.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Documentation: teach "cmd-list.perl" about out-of-tree builds
The "cmd-list.perl" script generates a list of commands that can be
included into our manpages. The script doesn't know about out-of-tree
builds and instead writes resulting files into the source directory.
Adapt it such that we can read data from the source directory and write
data into the build directory.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Documentation: allow sourcing generated includes from separate dir
Our documentation uses "include::" directives to include parts that are
either reused across multiple documents or parts that we generate at
build time. Unfortunately, top-level includes are only ever resolved
relative to the base directory, which is typically the directory of the
including document. Most importantly, it is not possible to have either
asciidoc or asciidoctor search multiple directories.
It follows that both kinds of includes must live in the same directory.
This is of course a bummer for out-of-tree builds, because here the
dynamically-built includes live in the build directory whereas the
static includes live in the source directory.
Introduce a `build_dir` attribute and prepend it to all of our includes
for dynamically-built files. This attribute gets set to the build
directory and thus converts the include path to an absolute path, which
asciidoc and asciidoctor know how to resolve.
Note that this change also requires us to update "build-docdep.perl",
which tries to figure out included files such our Makefile can set up
proper build-time dependencies. This script simply scans through the
source files for any lines that match "^include::" and treats the
remainder of the line as included file path. But given that those may
now contain the "{build_dir}" variable we have to teach the script to
replace that attribute with the actual build directory.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we install Git we also install a set of default templates that both
git-init(1) and git-clone(1) populate into our build directories. The
way the pristine templates are laid out in our source directory is
somewhat weird though: instead of reconstructing the actual directory
hierarchy in "templates/", we represent directory separators with "--".
The only reason I could come up with for why we have this is the
"branches/" directory, which is supposed to be empty when installing it.
And as Git famously doesn't store empty directories at all we have to
work around this limitation.
Now the thing is that the "branches/" directory is a leftover to how
branches used to be stored in the dark ages. gitrepository-layout(5)
lists this directory as "slightly deprecated", which I would claim is a
strong understatement. I have never encountered anybody using it today
and would be surprised if it even works as expected. So having the "--"
hack in place for an item that is basically unused, unmaintained and
deprecated doesn't only feel unreasonable, but installing that entry by
default may also cause confusion for users that do not know what this is
supposed to be in the first place.
Remove this directory from our templates and, now that we do not require
the workaround anymore, restructure the templates to form a proper
hierarchy. This makes it way easier for build systems to install these
templates into place.
We should likely think about removing support for "branch/" altogether,
but that is outside of the scope of this patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Makefile: allow "bin-wrappers/" directory to exist
The "bin-wrappers/" directory gets created by our build system and is
populated with one script for each of our binaries. There isn't anything
inherently wrong with the current layout, but it is somewhat hard to
adapt for out-of-tree build systems.
Adapt the layout such that our "bin-wrappers/" directory always exists
and contains our "wrap-for-bin.sh" script to make things a little bit
easier for subsequent steps.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Makefile: refactor generators to be PWD-independent
We have multiple scripts that generate headers from other data. All of
these scripts have the assumption built-in that they are executed in the
current source directory, which makes them a bit unwieldy to use during
out-of-tree builds.
Refactor them to instead take the source directory as well as the output
file as arguments.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to the preceding commit, also extract the script to generate the
"gitweb.js" file. While the logic itself is trivial, it helps us avoid
duplication of logic across build systems and ensures that the build
systems will remain in sync with each other in case the logic ever needs
to change.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to generate "gitweb.cgi" we have to replace various different
placeholders. This is done ad-hoc and is thus not easily reusable across
different build systems.
Introduce a new GITWEB-BUILD-OPTIONS.in template that we populate at
configuration time with the expected options. This script is then used
as input for a new "generate-gitweb.sh" script that generates the final
"gitweb.cgi" file. While this requires us to repeat the options multiple
times, it is in line to how we generate other build options like our
GIT-BUILD-OPTIONS file.
While at it, refactor how we replace the GITWEB_PROJECT_MAXDEPTH. Even
though this variable is supposed to be an integer, the source file has
the value quoted. The quotes are eventually stripped via sed(1), which
replaces `"@GITWEB_PROJECT_MAXDEPTH@"` with the actual value, which is
rather nonsensical. This is made clearer by just dropping the quotes in
the source file.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extract the script to inject various build-time parameters into our Perl
scripts into a standalone script. This is done such that we can reuse it
in other build systems.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When injecting the Perl path into our scripts we sometimes use '@PERL@'
while we othertimes use '@PERL_PATH@'. Refactor the code use the latter
consistently, which makes it easier to reuse the same logic for multiple
scripts.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Makefile: generate doc versions via GIT-VERSION-GEN
The documentation we generate embeds information for the exact Git
version used as well as the date of the commit. This information is
injected by injecting attributes into the build process via command line
argument.
Refactor the logic so that we write the information into "asciidoc.conf"
and "asciidoctor-extensions.rb" via `GIT-VERSION-GEN` for AsciiDoc and
AsciiDoctor, respectively.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git.rc" is used on Windows to embed information like the project
name and version into the resulting executables. As such we need to
inject the version information, which we do by using preprocessor
defines. The logic to do so is non-trivial and needs to be kept in sync
with the different build systems.
Refactor the logic so that we generate "git.rc" via `GIT-VERSION-GEN`.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Makefile: propagate Git version via generated header
We set up a couple of preprocessor macros when compiling Git that
propagate the version that Git was built from to `git version` et al.
The way this is set up makes it harder than necessary to reuse the
infrastructure across the different build systems.
Refactor this such that we generate a "version-def.h" header via
`GIT-VERSION-GEN` instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our "GIT-VERSION-GEN" script always writes the "GIT-VERSION-FILE" into
the current directory, where the expectation is that it should exist in
the source directory. But other build systems that support out-of-tree
builds may not want to do that to keep the source directory pristine,
even though CMake currently doesn't care.
Refactor the script such that it won't write the "GIT-VERSION-FILE"
directly anymore, but instead knows to replace @PLACEHOLDERS@ in an
arbitrary input file. This allows us to simplify the logic in CMake to
determine the project version, but can also be reused later on in order
to generate other files that need to contain version information like
our "git.rc" file.
While at it, change the format of the version file by removing the
spaces around the equals sign. Like this we can continue to include the
file in our Makefiles, but can also start to source it in shell scripts
in subsequent steps.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Makefile: consistently use @PLACEHOLDER@ to substitute
We have a bunch of placeholders in our scripts that we replace at build
time, for example by using sed(1). These placeholders come in three
different formats: @PLACEHOLDER@, @@PLACEHOLDER@@ and ++PLACEHOLDER++.
Next to being inconsistent it also creates a bit of a problem with
CMake, which only supports the first syntax in its `configure_file()`
function. To work around that we instead manually replace placeholders
via string operations, which is a hassle and removes safeguards that
CMake has to verify that we didn't forget to replace any placeholders.
Besides that, other build systems like Meson also support the CMake
syntax.
Unify our codebase to consistently use the syntax supported by such
build systems.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Makefile: use common template for GIT-BUILD-OPTIONS
The "GIT-BUILD-OPTIONS" file is generated by our build systems to
propagate built-in features and paths to our tests. The generation is
done ad-hoc, where both our Makefile and the CMake build instructions
simply echo a bunch of strings into the file. This makes it very hard to
figure out what variables are expected to exist and what format they
have, and the written variables can easily get out of sync between build
systems.
Introduce a new "GIT-BUILD-OPTIONS.in" template to address this issue.
This has multiple advantages:
- It demonstrates which built options exist in the first place.
- It can serve as a spot to document the build options.
- Some build systems complain when not all variables could be
substituted, alerting us of mismatches. Others don't, but if we
forgot to substitute such variables we now have a bogus string that
will likely cause our tests to fail, if they have any meaning in the
first place.
Backfill values that we didn't yet set in our CMake build instructions.
While at it, remove the `SUPPORTS_SIMPLE_IPC` variable that we only set
up in CMake as it isn't used anywhere.
This change requires us to adapt the setup of TEST_OUTPUT_DIRECTORY in
"test-lib.sh" such that it does not get overwritten after sourcing when
it has been set up via the environment. This is the only instance I
could find where we rely on ordering on variables.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Wed, 4 Dec 2024 01:14:49 +0000 (10:14 +0900)]
Merge branch 'ja/git-diff-doc-markup'
Documentation mark-up updates.
* ja/git-diff-doc-markup:
doc: git-diff: apply format changes to config part
doc: git-diff: apply format changes to diff-generate-patch
doc: git-diff: apply format changes to diff-format
doc: git-diff: apply format changes to diff-options
doc: git-diff: apply new documentation guidelines
Junio C Hamano [Wed, 4 Dec 2024 01:14:42 +0000 (10:14 +0900)]
Merge branch 'sj/ref-contents-check'
"git fsck" learned to issue warnings on "curiously formatted" ref
contents that have always been taken valid but something Git
wouldn't have written itself (e.g., missing terminating end-of-line
after the full object name).
* sj/ref-contents-check:
ref: add symlink ref content check for files backend
ref: check whether the target of the symref is a ref
ref: add basic symref content check for files backend
ref: add more strict checks for regular refs
ref: port git-fsck(1) regular refs check for files backend
ref: support multiple worktrees check for refs
ref: initialize ref name outside of check functions
ref: check the full refname instead of basename
ref: initialize "fsck_ref_report" with zero
Junio C Hamano [Wed, 4 Dec 2024 01:14:41 +0000 (10:14 +0900)]
Merge branch 'ps/ref-backend-migration-optim'
The migration procedure between two ref backends has been optimized.
* ps/ref-backend-migration-optim:
reftable: rename scratch buffer
refs: adapt `initial_transaction` flag to be unsigned
reftable/block: optimize allocations by using scratch buffer
reftable/block: rename `block_writer::buf` variable
reftable/writer: optimize allocations by using a scratch buffer
refs: don't normalize log messages with `REF_SKIP_CREATE_REFLOG`
refs: skip collision checks in initial transactions
refs: use "initial" transaction semantics to migrate refs
refs/files: support symbolic and root refs in initial transaction
refs: introduce "initial" transaction flag
refs/files: move logic to commit initial transaction
refs: allow passing flags when setting up a transaction
Junio C Hamano [Wed, 4 Dec 2024 01:14:38 +0000 (10:14 +0900)]
Merge branch 'ps/leakfixes-part-10'
Leakfixes.
* ps/leakfixes-part-10: (27 commits)
t: remove TEST_PASSES_SANITIZE_LEAK annotations
test-lib: unconditionally enable leak checking
t: remove unneeded !SANITIZE_LEAK prerequisites
t: mark some tests as leak free
t5601: work around leak sanitizer issue
git-compat-util: drop now-unused `UNLEAK()` macro
global: drop `UNLEAK()` annotation
t/helper: fix leaking commit graph in "read-graph" subcommand
builtin/branch: fix leaking sorting options
builtin/init-db: fix leaking directory paths
builtin/help: fix leaks in `check_git_cmd()`
help: fix leaking return value from `help_unknown_cmd()`
help: fix leaking `struct cmdnames`
help: refactor to not use globals for reading config
builtin/sparse-checkout: fix leaking sanitized patterns
split-index: fix memory leak in `move_cache_to_base_index()`
git: refactor builtin handling to use a `struct strvec`
git: refactor alias handling to use a `struct strvec`
strvec: introduce new `strvec_splice()` function
line-log: fix leak when rewriting commit parents
...
Junio C Hamano [Wed, 4 Dec 2024 01:14:37 +0000 (10:14 +0900)]
Merge branch 'ps/gc-stale-lock-warning'
Give a bit of advice/hint message when "git maintenance" stops finding a
lock file left by another instance that still is potentially running.
* ps/gc-stale-lock-warning:
t7900: fix host-dependent behaviour when testing git-maintenance(1)
builtin/gc: provide hint when maintenance hits a stale schedule lock
Junio C Hamano [Tue, 26 Nov 2024 22:57:06 +0000 (07:57 +0900)]
Merge branch 'jk/gcc15'
GCC 15 compatibility updates.
* jk/gcc15:
object-file: inline empty tree and blob literals
object-file: treat cached_object values as const
object-file: drop oid field from find_cached_object() return value
object-file: move empty_tree struct into find_cached_object()
object-file: drop confusing oid initializer of empty_tree struct
object-file: prefer array-of-bytes initializer for hash literals
Junio C Hamano [Tue, 26 Nov 2024 22:57:04 +0000 (07:57 +0900)]
Merge branch 'ps/clar-build-improvement'
Fix for clar unit tests to support CMake build.
* ps/clar-build-improvement:
Makefile: let clar header targets depend on their scripts
cmake: use verbatim arguments when invoking clar commands
cmake: use SH_EXE to execute clar scripts
t/unit-tests: convert "clar-generate.awk" into a shell script
Junio C Hamano [Tue, 26 Nov 2024 22:57:03 +0000 (07:57 +0900)]
Merge branch 'kh/bundle-docs'
Documentation for "git bundle" saw improvements to more prominently
call out the use of '--all' when creating bundles.
* kh/bundle-docs:
Documentation/git-bundle.txt: discuss naïve backups
Documentation/git-bundle.txt: mention --all in spec. refs
Documentation/git-bundle.txt: remove old `--all` example
Documentation/git-bundle.txt: mention full backup example
shejialuo [Tue, 26 Nov 2024 14:40:57 +0000 (22:40 +0800)]
ref-cache: fix invalid free operation in `free_ref_entry`
In cfd971520e (refs: keep track of unresolved reference value in
iterators, 2024-08-09), we added a new field "referent" into the "struct
ref" structure. In order to free the "referent", we unconditionally
freed the "referent" by simply adding a "free" statement.
However, this is a bad usage. Because when ref entry is either directory
or loose ref, we will always execute the following statement:
free(entry->u.value.referent);
This does not make sense. We should never access the "entry->u.value"
field when "entry" is a directory. However, the change obviously doesn't
break the tests. Let's analysis why.
The anonymous union in the "ref_entry" has two members: one is "struct
ref_value", another is "struct ref_dir". On a 64-bit machine, the size
of "struct ref_dir" is 32 bytes, which is smaller than the 48-byte size
of "struct ref_value". And the offset of "referent" field in "struct
ref_value" is 40 bytes. So, whenever we create a new "ref_entry" for a
directory, we will leave the offset from 40 bytes to 48 bytes untouched,
which means the value for this memory is zero (NULL). It's OK to free a
NULL pointer, but this is merely a coincidence of memory layout.
To fix this issue, we now ensure that "free(entry->u.value.referent)" is
only called when "entry->flag" indicates that it represents a loose
reference and not a directory to avoid the invalid memory operation.
Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reading references the reftable backend has to:
1. Create a new ref iterator.
2. Seek the iterator to the record we're searching for.
3. Read the record.
We cannot really avoid the last two steps, but re-creating the iterator
every single time we want to read a reference is kind of expensive and a
waste of resources. We couldn't help it in the past though because it
was not possible to reuse iterators. But starting with 5bf96e0c39
(reftable/generic: move seeking of records into the iterator,
2024-05-13) we have split up the iterator lifecycle such that creating
the iterator and seeking are two different concerns.
Refactor the code such that we cache iterators in the reftable backend.
This cache is invalidated whenever the respective stack is reloaded such
that we know to recreate the iterator in that case. This leads to a
sizeable speedup when creating many refs, which requires a lot of random
reference reads:
Benchmark 1: update-ref: create many refs (refcount = 100000, revision = master)
Time (mean ± σ): 1.793 s ± 0.010 s [User: 0.954 s, System: 0.835 s]
Range (min … max): 1.781 s … 1.811 s 10 runs
Benchmark 2: update-ref: create many refs (refcount = 100000, revision = HEAD)
Time (mean ± σ): 1.680 s ± 0.013 s [User: 0.846 s, System: 0.831 s]
Range (min … max): 1.664 s … 1.702 s 10 runs
Summary
update-ref: create many refs (refcount = 100000, revision = HEAD) ran
1.07 ± 0.01 times faster than update-ref: create many refs (refcount = 100000, revision = master)
While 7% is not a huge win, you have to consider that the benchmark is
_writing_ data, so _reading_ references is only one part of what we do.
Flame graphs show that we spend around 40% of our time reading refs, so
the speedup when reading refs is approximately ~2.5x that. I could not
find better benchmarks where we perform a lot of random ref reads.
You can also see a sizeable impact on memory usage when creating 100k
references. Before this change:
HEAP SUMMARY:
in use at exit: 19,112,538 bytes in 200,170 blocks
total heap usage: 8,400,426 allocs, 8,200,256 frees, 454,367,048 bytes allocated
After this change:
HEAP SUMMARY:
in use at exit: 674,416 bytes in 169 blocks
total heap usage: 7,929,872 allocs, 7,929,703 frees, 281,509,985 bytes allocated
As an additional factor, this refactoring opens up the possibility for
more performance optimizations in how we re-seek iterators. Any change
that allows us to optimize re-seeking by e.g. reusing data structures
would thus also directly speed up random reads.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 5bf96e0c39 (reftable/generic: move seeking of records into the
iterator, 2024-05-13) we have refactored the reftable codebase such that
iterators can be initialized once and then re-seeked multiple times.
This feature is used by 1869525066 (refs/reftable: wire up support for
exclude patterns, 2024-09-16) in order to skip records based on exclude
patterns provided by the caller.
The logic to re-seek the merged iterator is insufficient though because
we don't drain the priority queue on a re-seek. This means that the
queue may contain stale entries and thus reading the next record in the
queue will return the wrong entry. While this is an obvious bug, it is
harmless in the context of above exclude patterns:
- If the queue contained stale entries that match the pattern then the
caller would already know to filter out such refs. This is because
our codebase is prepared to handle backends that don't have a way to
efficiently implement exclude patterns.
- If the queue contained stale entries that don't match the pattern
we'd eventually filter out any duplicates. This is because the
reftable code discards items with the same ref name and sorts any
remaining entries properly.
So things happen to work in this context regardless of the bug, and
there is no other use case yet where we re-seek iterators. We're about
to introduce a caching mechanism though where iterators are reused by
the reftable backend, and that will expose the bug.
Fix the issue by draining the priority queue when seeking and add a
testcase that surfaces the issue.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
reftable/stack: add mechanism to notify callers on reload
Reftable stacks are reloaded in two cases:
- When calling `reftable_stack_reload()`, if the stat-cache tells us
that the stack has been modified.
- When committing a reftable addition.
While callers can figure out the second case, they do not have a
mechanism to figure out whether `reftable_stack_reload()` led to an
actual reload of the on-disk data. All they can do is thus to assume
that data is always being reloaded in that case.
Improve the situation by introducing a new `on_reload()` callback to the
reftable options. If provided, the function will be invoked every time
the stack has indeed been reloaded. This allows callers to invalidate
data that depends on the current stack data.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs/reftable: refactor reflog expiry to use reftable backend
Refactor the callback function that expires reflog entries in the
reftable backend to use `reftable_backend_read_ref()` instead of
accessing the reftable stack directly. This ensures that the function
will benefit from the new caching layer that we're about to introduce.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs/reftable: refactor reading symbolic refs to use reftable backend
Refactor the callback function that reads symbolic references in the
reftable backend to use `reftable_backend_read_ref()` instead of
accessing the reftable stack directly. This ensures that the function
will benefit from the new caching layer that we're about to introduce.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs/reftable: read references via `struct reftable_backend`
Refactor `read_ref_without_reload()` to accept `struct reftable_backend`
as parameter instead of `struct reftable_stack`. Rename the function to
`reftable_backend_read_ref()` to clarify its scope and move it close to
other functions operating on `struct reftable_backend`.
This change allows us to implement an additional caching layer when
reading refs where we can reuse reftable iterators.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs/reftable: figure out hash via `reftable_stack`
The function `read_ref_without_reload()` accepts a ref store as input
only so that we can figure out the hash function used by it. This is
duplicate information though because the reftable stack knows about its
hash function, too.
Drop the superfluous parameter to simplify the calling convention a bit.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs/reftable: handle reloading stacks in the reftable backend
When accessing a stack we almost always have to reload the stack before
reading data from it. This is mostly because Git does not have a
notification mechanism for when underlying data has been changed, and
thus we are forced to opportunistically reload the stack every single
time to account for any changes that may have happened concurrently.
Handle the reload internally in `backend_for()`. For one this forces
callsites to think about whether or not they need to reload the stack.
But second this makes the logic to access stacks more self-contained by
letting the `struct reftable_backend` manage themselves.
Update callsites where we don't reload the stack to document why we
don't. In some cases it's unclear whether it is the right thing to do in
the first place, but fixing that is outside of the scope of this patch
series.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable ref store needs to keep track of multiple stacks, one for
the main worktree and an arbitrary number of stacks for worktrees. This
is done by storing pointers to `struct reftable_stack`, which we then
access directly.
Wrap the stack in a new `struct reftable_backend`. This will allow us to
attach more data to each respective stack in subsequent commits.
Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Karthik Nayak [Mon, 25 Nov 2024 14:55:30 +0000 (15:55 +0100)]
builtin: pass repository to sub commands
In 9b1cb5070f (builtin: add a repository parameter for builtin
functions, 2024-09-13) the repository was passed down to all builtin
commands. This allowed the repository to be passed down to lower layers
without depending on the global `the_repository` variable.
Continue this work by also passing down the repository parameter from
the command to sub-commands. This will help pass down the repository to
other subsystems and cleanup usage of global variables like
'the_repository' and 'the_hash_algo'.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
bisect: address Coverity warning about potential double free
Coverity has started to warn about a potential double-free in
`find_bisection()`. This warning is triggered because we may modify the
list head of the passed-in `commit_list` in case it is an UNINTERESTING
commit, but still call `free_commit_list()` on the original variable
that points to the now-freed head in case where `do_find_bisection()`
returns a `NULL` pointer.
As far as I can see, this double free cannot happen in practice, as
`do_find_bisection()` only returns a `NULL` pointer when it was passed a
`NULL` input. So in order to trigger the double free we would have to
call `find_bisection()` with a commit list that only consists of
UNINTERESTING commits, but I have not been able to construct a case
where that happens.
Drop the `else` branch entirely as it seems to be a no-op anyway.
Another option might be to instead call `free_commit_list()` on `list`,
which is the modified version of `commit_list` and thus wouldn't cause a
double free. But as mentioned, I couldn't come up with any case where a
passed-in non-NULL list becomes empty, so this shouldn't be necessary.
And if it ever does become necessary we'd notice anyway via the leak
sanitizer.
Interestingly enough we did not have a single test exercising this
branch: all tests pass just fine even when replacing it with a call to
`BUG()`. Add a test that exercises it.
Reported-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Tue, 26 Nov 2024 01:21:58 +0000 (10:21 +0900)]
Merge branch 'ps/leakfixes-part-10' into ps/bisect-double-free-fix
* ps/leakfixes-part-10: (27 commits)
t: remove TEST_PASSES_SANITIZE_LEAK annotations
test-lib: unconditionally enable leak checking
t: remove unneeded !SANITIZE_LEAK prerequisites
t: mark some tests as leak free
t5601: work around leak sanitizer issue
git-compat-util: drop now-unused `UNLEAK()` macro
global: drop `UNLEAK()` annotation
t/helper: fix leaking commit graph in "read-graph" subcommand
builtin/branch: fix leaking sorting options
builtin/init-db: fix leaking directory paths
builtin/help: fix leaks in `check_git_cmd()`
help: fix leaking return value from `help_unknown_cmd()`
help: fix leaking `struct cmdnames`
help: refactor to not use globals for reading config
builtin/sparse-checkout: fix leaking sanitized patterns
split-index: fix memory leak in `move_cache_to_base_index()`
git: refactor builtin handling to use a `struct strvec`
git: refactor alias handling to use a `struct strvec`
strvec: introduce new `strvec_splice()` function
line-log: fix leak when rewriting commit parents
...
This says that hash2 and hash3 should be squashed into hash1 and
that hash3’s commit message should be used for the resulting commit.
So the user is presented with an editor where the two first commit
messages are commented out and the third is not. However this does
not work if `core.commentChar`/`core.commentString` is in use since
the comment char is hardcoded (#) in this `sequencer.c` function.
As a result the first commit message will not be commented out.
sequencer: comment `--reference` subject line properly
`git revert --reference <commit>` leaves behind a comment in the
first line:[1]
# *** SAY WHY WE ARE REVERTING ON THE TITLE LINE ***
Meaning that the commit will just consist of the next line if the user
exits the editor directly:
This reverts commit <--format=reference commit>
But the comment char here is hardcoded (#). Which means that the
comment line will inadvertently be included in the commit message if
`core.commentChar`/`core.commentString` is in use.
† 1: See 43966ab3156 (revert: optionally refer to commit in the
"reference" format, 2022-05-26)
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
`git rebase --update-ref` does not insert commands for dependent/sub-
branches which are checked out.[1] Instead it leaves a comment about
that fact. The comment char is hardcoded (#). In turn the comment
line gets interpreted as an invalid command when `core.commentChar`/
`core.commentString` is in use.
† 1: See 900b50c242 (rebase: add --update-refs option, 2022-07-19)
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both `struct block_writer` and `struct reftable_writer` have a `buf`
member that is being reused to optimize the number of allocations.
Rename the variable to `scratch` to clarify its intend and provide a
comment explaining why it exists.
Suggested-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
refs: adapt `initial_transaction` flag to be unsigned
The `initial_transaction` flag is tracked as a signed integer, but we
typically pass around flags via unsigned integers. Adapt the type
accordingly.
Suggested-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
t7900: fix host-dependent behaviour when testing git-maintenance(1)
We have recently added a new test to t7900 that exercises whether
git-maintenance(1) fails as expected when the "schedule.lock" file
exists. The test depends on whether or not the host has the required
executables present to schedule maintenance tasks in the first place,
like systemd or launchctl -- if not, the test fails with an unrelated
error before even checking for the lock file. This fails for example in
our CI systems, where macOS images do not have launchctl available.
Fix this issue by creating a stub systemctl(1) binary and using the
systemd scheduler.
Reported-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Mon, 25 Nov 2024 03:14:01 +0000 (12:14 +0900)]
doc: option value may be separate for valid reasons
Even though `git help cli` recommends users to prefer using
"--option=value" over "--option value", there can be reasons why
giving them separately is a good idea. One reason is that shells do
not perform tilde expansion for `--option=~/path/name` but they
expand `--options ~/path/name` just fine.
This is not a problem for many options whose option parsing is
properly written using OPT_FILENAME(), because the value given to
OPT_FILENAME() is tilde-expanded internally by us, but some commands
take a pathname as a mere string, which needs this trick to have the
shell help us.
I think the reason we originally decided to recommend the stuck form
was because an option that takes an optional value requires you to
use it in the stuck form, and it is one less thing for users to
worry about if they get into the habit to always use the stuck form.
But we should be discouraging ourselves from adding an option with
an optional value in the first place, and we might want to weaken
the current recommendation.
In any case, let's describe this one case where it is necessary to
use the separate form, with an example.
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio C Hamano [Mon, 25 Nov 2024 03:29:39 +0000 (12:29 +0900)]
Merge branch 'jk/output-prefix-cleanup' into maint-2.47
Code clean-up.
* jk/output-prefix-cleanup:
diff: store graph prefix buf in git_graph struct
diff: return line_prefix directly when possible
diff: return const char from output_prefix callback
diff: drop line_prefix_length field
line-log: use diff_line_prefix() instead of custom helper
Junio C Hamano [Mon, 25 Nov 2024 03:20:42 +0000 (12:20 +0900)]
Merge branch 'master' of https://github.com/j6t/gitk into maint-2.47
* 'master' of https://github.com/j6t/gitk:
Makefile(s): avoid recipe prefix in conditional statements
doc: switch links to https
doc: update links to current pages
Philippe Blain [Fri, 22 Nov 2024 19:50:22 +0000 (19:50 +0000)]
git-difftool--helper.sh: exit upon initialize_merge_tool errors
Since the introduction of 'initialize_merge_tool' in de8dafbada
(mergetool: break setup_tool out into separate initialization function,
2021-02-09), any errors from this function are ignored in
git-difftool--helper.sh::launch_merge_tool, which is not the case for
its call in git-mergetool.sh::merge_file.
Despite the in-code comment, initialize_merge_tool (via its call to
setup_tool) does different checks than run_merge_tool, so it makes sense
to abort early if it encounters errors. Add exit calls if
initialize_merge_tool fails.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Philippe Blain [Fri, 22 Nov 2024 19:50:21 +0000 (19:50 +0000)]
git-mergetool--lib.sh: add error message for unknown tool variant
In setup_tool, we check if the given tool is a known variant of a tool,
and quietly return with an error if not. This leads to the following
invocation quietly failing:
git mergetool --tool=vimdiff4
Add an error message before returning in this case.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Philippe Blain [Fri, 22 Nov 2024 19:50:20 +0000 (19:50 +0000)]
git-mergetool--lib.sh: add error message if 'setup_user_tool' fails
In git-mergetool--lib.sh::setup_tool, we check if the given tool is a
known builtin tool, a known variant, or a user-defined tool by calling
setup_user_tool, and we return with the exit code from setup_user_tool
if it was called. setup_user_tool checks if {diff,merge}tool.$tool.cmd
is set and quietly returns with an error if not.
This leads to the following invocation quietly failing:
git mergetool --tool=unknown
which is not very user-friendly. Adjust setup_tool to output an error
message before returning if setup_user_tool returned with an error.
Note that we do not check the result of the second call to
setup_user_tool in setup_tool, as this call is only meant to allow users
to redefine 'cmd' for a builtin tool; it is not an error if they have
not done so.
Note that this behaviour of quietly failing is a regression dating back
to de8dafbada (mergetool: break setup_tool out into separate
initialization function, 2021-02-09), as before this commit an unknown
mergetool would be diagnosed in get_merge_tool_path when called from
run_merge_tool.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Philippe Blain [Fri, 22 Nov 2024 19:50:19 +0000 (19:50 +0000)]
git-mergetool--lib.sh: use TOOL_MODE when erroring about unknown tool
In git-mergetool--lib.sh::get_merge_tool_path, we check if the chosen
tool is valid via valid_tool and exit with an error message if not. This
error message mentions "Unknown merge tool", even if the command the
user tried was 'git difftool --tool=unknown'. Use the global 'TOOL_MODE'
variable for a more correct error message.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>