Daan De Meyer [Mon, 15 Jan 2024 21:24:08 +0000 (22:24 +0100)]
Add PackageDirectories=
Let's make it possible to serve local packages as a local repository
so that users don't have to put local paths in their Packages= setting.
We'll also allow adding more packages to this local repository in the
build script so that these can be installed in the initrd when we build
it or in a postinst or finalize script.
Daan De Meyer [Mon, 15 Jan 2024 19:59:03 +0000 (20:59 +0100)]
Fix --mirror for CentOS and Fedora
Let's unify the interface for --mirror and only require users to
specify a url and add the entire path ourselves in mkosi. This is
required to use EPEL repositories with --mirror= as the EPEL
repositories are mirrored under <url>/fedora/epel whereas the CentOS
Stream repositories are under <url>/centos-stream.
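As a rough illustration of the unified interface, the sketch below appends the
repository sub-path to a user-supplied mirror URL. Only the two sub-paths come
from the message above; the helper name repo_base(), the mapping and the
example URL are assumptions made for illustration, not mkosi's actual code.

```python
def repo_base(mirror: str, repo: str) -> str:
    """Append the repository-specific sub-path to a user-supplied mirror URL."""
    # Sub-paths as described in the commit message; the mapping is illustrative.
    subpaths = {
        "epel": "fedora/epel",
        "centos-stream": "centos-stream",
    }
    # Strip a trailing slash so we never produce "//" in the result.
    return f"{mirror.rstrip('/')}/{subpaths[repo]}"
```

With this shape, the user only ever passes the top-level URL, for example
repo_base("https://mirror.example.org", "epel").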
Daan De Meyer [Sun, 14 Jan 2024 20:53:06 +0000 (21:53 +0100)]
Run systemd-tmpfiles as part of the build
Let's make sure we take user provided tmpfiles snippets into account
as well. Since systemd now mounts the initramfs read-only by default,
we need to make sure all tmpfiles snippets that copy to /etc have
already been processed during the image build itself as they won't be
able to run during the initramfs stage.
Daan De Meyer [Sun, 14 Jan 2024 17:02:39 +0000 (18:02 +0100)]
Make sure /etc/mtab exists in sandbox
Required for pacman's CheckSpace option. To avoid messing with the
package manager tree /etc too much, we bind mount individual
subdirectories of it instead of the entire directory.
Daan De Meyer [Sun, 14 Jan 2024 16:16:05 +0000 (17:16 +0100)]
Make sure we don't build the same tools tree more than once
We can do this by simply checking if the output path already exists
instead of relying on needs_build(). This allows us to refactor
needs_build() to needs_clean(). We also move some prechecks into
run_build() and run_clean() so as to not duplicate them and improve
the logging messages in run_clean().
Daan De Meyer [Fri, 12 Jan 2024 14:28:41 +0000 (15:28 +0100)]
Add BuildSources=. to the default image configuration
If we enable the rpm build, we set BuildSources= which means we
override the default build sources. However we still want the source
directory to be used as BuildSources= as well, so configure it explicitly.
Daan De Meyer [Fri, 12 Jan 2024 11:30:41 +0000 (12:30 +0100)]
Verify that output path is not a symlink in needs_build()
Otherwise if we first build a disk image and then try to run
"mkosi -t directory qemu" we won't actually rebuild the image as it
will think the output already exists and we'll try to boot a disk
image as a directory.
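A simplified sketch of the idea, assuming a hypothetical stand-in for
needs_build() that only looks at the output path (the real function may
consider more than this):

```python
from pathlib import Path


def needs_build_sketch(output: Path) -> bool:
    """Return True when the output has to be (re)built."""
    # A symlink at the output path may point at an image of another format,
    # so it is not treated as an existing build.
    return not output.exists() or output.is_symlink()
```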
Daan De Meyer [Fri, 12 Jan 2024 09:15:53 +0000 (10:15 +0100)]
Improve SELinux binary policy selection
Let's deal with the possibility that there might be more than one
policy in the binary policy directory. Let's also make sure that we
consider other files in the directory that might not be policies.
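One way this selection could look is sketched below. It assumes binary
policies are named policy.<version> and that the highest version should win;
both the naming assumption and the selection rule are illustrative guesses,
and pick_binary_policy() is a hypothetical name rather than mkosi's function.

```python
import re
from pathlib import Path
from typing import Optional


def pick_binary_policy(policy_dir: Path) -> Optional[Path]:
    """Pick one binary policy from a directory that may hold several files."""
    candidates = []
    for p in policy_dir.iterdir():
        # Ignore anything that is not a regular file named policy.<number>.
        m = re.fullmatch(r"policy\.(\d+)", p.name)
        if m and p.is_file():
            candidates.append((int(m.group(1)), p))
    if not candidates:
        return None
    # Assumption: prefer the policy with the highest version number.
    return max(candidates, key=lambda c: c[0])[1]
```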
Daan De Meyer [Thu, 11 Jan 2024 13:07:20 +0000 (14:07 +0100)]
Use grub binaries from tools tree instead of from image
Let's give this another try and use grub tools from the tools tree
instead of from the image.
We also hardcode the grub prefix per distribution because if we use
grub binaries from the tools tree there might not be any installed
in the image itself which means we can't derive the prefix from the
binaries in the image.
Daan De Meyer [Wed, 10 Jan 2024 15:47:58 +0000 (16:47 +0100)]
Check for all required setfiles inputs in want_selinux_relabel()
On Debian, when policycoreutils is installed, a policy is configured
without a matching binary policy being installed, so we have to
check that all parts are there.
Daan De Meyer [Wed, 10 Jan 2024 09:58:29 +0000 (10:58 +0100)]
Copy nspawn settings to the output directory again
machinectl pull-tar looks for a settings file so let's make sure
the output directory can be used directly for this purpose by copying
the nspawn settings file to the output directory again.
Daan De Meyer [Tue, 9 Jan 2024 19:22:59 +0000 (20:22 +0100)]
Use the directory mkosi was invoked in as the default for BuildSources=
While parsing config, we use chdir(). Also, when a BuildSources=
match is found, BuildSources= is initialized to its default value
which is Path.cwd(). However, we want the default value to be the
top level directory that mkosi was invoked in, not the current working
directory that we happen to be in while parsing configuration. Let's
fix this by using the directory mkosi was invoked in instead of Path.cwd().
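The difference can be shown with a small sketch. The helper names chdir()
and resolve_defaults() are hypothetical; the point is only that Path.cwd()
evaluated during parsing reflects the temporary chdir(), while a directory
captured at invocation time does not.

```python
import contextlib
import os
from pathlib import Path
from typing import Iterator


@contextlib.contextmanager
def chdir(path: Path) -> Iterator[None]:
    """Temporarily change the working directory, as config parsing does."""
    old = Path.cwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(old)


def resolve_defaults(invoked_in: Path, config_dir: Path) -> dict[str, Path]:
    """Compare the naive default with the fixed one while parsing config_dir."""
    with chdir(config_dir):
        return {
            "naive": Path.cwd(),  # wherever parsing happens to be right now
            "fixed": invoked_in,  # captured once, before any chdir()
        }
```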
Daan De Meyer [Tue, 9 Jan 2024 10:24:18 +0000 (11:24 +0100)]
Only run mount --make-rslave / if we didn't unshare a user namespace
When unsharing a mount namespace in a different user namespace than
the parent mount namespace, all mounts are marked as slave by default
so we don't need to explicitly mark all of them as slave mounts.
Daan De Meyer [Mon, 8 Jan 2024 22:31:37 +0000 (23:31 +0100)]
Simplify apivfs_cmd() and chroot_cmd()
We move the setpgid logic to run(), avoiding the need to pass a tools
argument to chroot_cmd() and apivfs_cmd().
We also try to remove as much logic from these functions as possible.
We can't really assume that any logic we execute during the
function will still hold true in the sandbox, so it's best to delay
any logic execution until we're already in the sandbox (using the
--ro-bind-try options of bubblewrap).
We also rework the /etc/resolv.conf handling to simply make sure that
/run/systemd/resolve exists in the chroot since if /etc/resolv.conf
points to /run it'll almost certainly be to
/run/systemd/resolve/stub-resolv.conf.
Daan De Meyer [Mon, 8 Jan 2024 15:56:31 +0000 (16:56 +0100)]
Use /work for host scripts as well
Now that everything runs sandboxed, /work is free to use for host
scripts as well. At the same time, let's stop unconditionally
mounting the current working directory when running build scripts.
To keep things working smoothly, we'll make mounting the current
working directory the default value for BuildSources= instead.
Daan De Meyer [Mon, 8 Jan 2024 14:52:15 +0000 (15:52 +0100)]
Don't use host's /var/tmp in sandbox
Instead, use a subdirectory of the host's /var/tmp. Because we want
to limit the lifetime of this directory to the lifetime of the sandbox,
we use a shell command to create and remove the directory.
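A minimal sketch of tying the directory's lifetime to the command via a shell
wrapper. The function name with_private_vartmp() is hypothetical, and for
simplicity the sketch exports the directory as TMPDIR instead of mounting it
over /var/tmp in the sandbox, which is an assumption that differs from what
the commit describes.

```python
def with_private_vartmp(cmd: list[str], base: str = "/var/tmp") -> list[str]:
    """Wrap cmd so it runs with a private scratch directory under base."""
    script = (
        # $1 is the base directory; create a unique subdirectory in it.
        'd=$(mktemp -d "$1/sandbox-XXXXXX") || exit 1; shift; '
        # Run the wrapped command with the subdirectory as its TMPDIR.
        'TMPDIR="$d" "$@"; rc=$?; '
        # Remove the subdirectory once the command is done, keep its exit code.
        'rm -rf "$d"; exit $rc'
    )
    return ["sh", "-c", script, "sh", base, *cmd]
```

The returned list can be passed directly to subprocess.run(). Note that this
simplified version does not clean up if the shell itself is killed.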
Daan De Meyer [Mon, 8 Jan 2024 14:21:01 +0000 (15:21 +0100)]
Put tmpfs on /tmp in sandbox when not in relaxed mode
Let's sandbox more by not using the host's /tmp but instead putting
a fresh tmpfs on /tmp. We used the host's /tmp before because the
definitions could potentially be in the host's /tmp but now that we
mount everything in explicitly that isn't a problem anymore.
Daan De Meyer [Tue, 2 Jan 2024 07:37:40 +0000 (08:37 +0100)]
Use bubblewrap to set up the tools tree instead of doing it ourselves
The problem with overmounting the host's /usr (in a private mount
namespace) is that we have no control over the symlinks in the root
directory (/lib, /bin, /lib64) and if these symlinks don't match
between the host distribution and the tools tree distribution, all
kinds of weird breakage starts happening. For example, using Fedora
tools trees on Arch Linux is currently broken because /lib64 on Arch
Linux points to /usr/lib whereas on Fedora it points to /usr/lib64.
Because we can't (and shouldn't) modify the symlinks of the host's
root filesystem, we need to set up the tools tree in a sandbox that
we chroot into, so that we have full control over the rootfs of the
sandbox and can make sure the symlinks are correct. Luckily, we
already do just that with bubblewrap, except that currently we mount
the tools tree over /usr ourselves and then just carry that over into
the bubblewrap sandbox.
Instead, we stop mounting over the host's /usr ourselves and have
bubblewrap pick the right /usr itself. We also copy the symlinks from
the tools tree or the host if there is no tools tree.
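The symlink copying can be sketched as follows, using bubblewrap's
--symlink SRC DEST option. The function name usr_symlink_args() is
hypothetical and the list of names is limited to the three mentioned above;
the actual implementation may handle more entries.

```python
import os
from pathlib import Path


def usr_symlink_args(root: Path) -> list[str]:
    """Build bubblewrap arguments recreating root's top-level symlinks."""
    args: list[str] = []
    for name in ("lib", "bin", "lib64"):
        link = root / name
        # Only replicate entries that are symlinks in the chosen root
        # (the tools tree, or the host when there is no tools tree).
        if link.is_symlink():
            args += ["--symlink", os.readlink(link), f"/{name}"]
    return args
```

Reading the link targets from the chosen root is what keeps /lib64 consistent
with the /usr that ends up in the sandbox.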
Because we don't mount over the host's /usr anymore, we have to run
every tool that should come from the tools tree with bubblewrap now.
The side effect of this is that almost all of our tools now run
sandboxed. We also have to make use of find_binary() everywhere
instead of shutil.which() to make sure we look for binaries in the
tools tree when required. Various other codepaths that look into /usr
also have to be modified to look into the tools tree when needed.
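To illustrate why shutil.which() alone is not enough, here is a hedged sketch
of a root-aware lookup. It is named find_binary_sketch() to make clear it is
not mkosi's find_binary(); the searched directories are an assumption.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_binary_sketch(name: str, root: Path = Path("/")) -> Optional[Path]:
    """Look for an executable below root instead of in the host's PATH."""
    # Assumption: only usr/bin and usr/sbin below the given root are searched.
    search = os.pathsep.join(str(root / d) for d in ("usr/bin", "usr/sbin"))
    found = shutil.which(name, path=search)
    return Path(found) if found else None
```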
Also, because we don't unshare the user namespace in the main mkosi
process anymore, we can get rid of a lot of chown()'s in qemu.py
and opening the qemu device file descriptors can be moved into
run_qemu() itself.
We also don't have to make sure all python modules are loaded anymore
as the host's /usr is never overmounted so the required python modules
will be available for the entire runtime of the mkosi process.
Because virtiofsd is now executed with bubblewrap, we use bubblewrap
to set up the required uidmap instead of relying on virtiofsd to do it
with newuidmap/newgidmap. Note that this breaks RuntimeTrees= as
virtiofsd unconditionally tries to drop groups with setgroups() which
fails with EPERM in an unprivileged user namespace set up by bubblewrap.
This is fixed by https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/207
which is still awaiting review.
To make this work codewise, this commit renames the bwrap() function
to sandbox_cmd() (similar to chroot_cmd() and apivfs_cmd()) which now
returns a command line instead of executing the command itself. run()
is modified to take an extra "sandbox" argument, which is simply the
part of the full command that sets up the sandbox. Context and Config
both learn new sandbox() methods which set up the sandbox for each
object respectively (mostly by adding extra bind mounts).
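The split between building the sandbox command line and executing it can be
sketched like this. The names sandbox_cmd_sketch() and run_sketch() and their
parameters are illustrative assumptions; only the general shape (a function
returning a command prefix, and run() accepting it) comes from the message.

```python
import subprocess
from collections.abc import Sequence


def sandbox_cmd_sketch(
    ro_binds: Sequence[str] = (),
    binds: Sequence[str] = (),
) -> list[str]:
    """Return a bubblewrap command prefix instead of executing anything."""
    cmd = ["bwrap", "--dev", "/dev", "--proc", "/proc", "--tmpfs", "/tmp"]
    for path in ro_binds:
        cmd += ["--ro-bind", path, path]  # read-only inputs
    for path in binds:
        cmd += ["--bind", path, path]  # writable outputs
    return cmd


def run_sketch(
    cmd: Sequence[str],
    sandbox: Sequence[str] = (),
) -> subprocess.CompletedProcess:
    """Run cmd, prefixed by whatever command line sets up the sandbox."""
    return subprocess.run([*sandbox, *cmd], check=True)
```

Because the sandbox is just a list, each call site can add exactly the bind
mounts its tool needs before passing it to run_sketch().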
Because almost every call to run() now takes a sandbox, this gives us
a lot of control over the individual environment for each tool we run.
We make use of this to restrict each tool we run to the minimal possible
sandbox that the tool needs to run. By specifically mounting in the
required paths for each tool we run, we also make sure these are always
available instead of relying on some other mount happening to contain
the input.
Because we allow passing arbitrary options to mkosi qemu, mkosi boot and
various other verbs, we run these verbs with a relaxed sandbox, where we
mount in most directories from the host. This means that whatever
directories users specify will be available.
In terms of CI, the extra sandboxing means that our previous approach of
building various systemd binaries from source and symlinking them to
/usr/bin doesn't work anymore. Instead, we opt to always use tools trees
and drop the host builds from the testing matrix. This also simplifies
and speeds up the github action as we don't have to compile systemd and
xfsprogs from source and we have to install fewer packages.
Daan De Meyer [Fri, 5 Jan 2024 13:01:26 +0000 (14:01 +0100)]
Don't copy xattrs from mkosi.extra and friends
These directories and files might have SELinux xattrs and such that
we don't want to end up in the image so let's make sure that we don't
copy xattrs from skeleton and extra trees.
Daan De Meyer [Thu, 4 Jan 2024 12:17:27 +0000 (13:17 +0100)]
Add RuntimeScratch= setting
When booting output formats that reside almost entirely in memory
(initrd, UKI, ESP), doing any kind of write heavy operation in the
booted VM has a high chance of leading to OOM errors as all files
will be written in memory.
When booting disk images, unless one is using RuntimeSize=, one will
often run into disk space issues when writing lots of data.
When booting off virtiofs and doing write heavy operations, virtiofsd
can run out of file descriptors or become very slow.
To allow doing write heavy operations in all these scenarios, let's
add RuntimeScratch= which mounts extra scratch space to /var/tmp that
can be used for write heavy operations.
Daan De Meyer [Fri, 5 Jan 2024 08:23:55 +0000 (09:23 +0100)]
Fix importlib usage
We have to use as_file() on the final path, not the module path.
Because as_file() only learned to support directories in Python 3.12,
we backport the 3.12 implementation temporarily in mkosi itself.
Because as_file() does not apply the executable bit, we apply it
ourselves after parsing the config. This requires delaying the check
if scripts are executable to some later point so we can parse the
config without failing because scripts are not executable.
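Applying the executable bit after the fact can be sketched as below. The
helper name make_executable_sketch() is hypothetical and the sketch adds
execute permission for user, group and other, which may differ from what
mkosi actually does.

```python
import stat
from pathlib import Path


def make_executable_sketch(path: Path) -> None:
    """Add the executable bits to an already extracted script."""
    mode = path.stat().st_mode
    # Keep the existing permissions and add execute for user, group, other.
    path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```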
Daan De Meyer [Thu, 4 Jan 2024 14:59:24 +0000 (15:59 +0100)]
ci: Disable jobs with arch linux tools trees for now.
Arch has qemu 8.2 which has severely broken TCG acceleration
(see https://gitlab.com/qemu-project/qemu/-/issues/2070). Let's disable
the jobs with arch tools trees until the bug is fixed.