Azure Linux looks a lot like Fedora Linux so we opt to share configuration
between Azure and Fedora/CentOS and inherit the Azure definition from
Fedora.
Daan De Meyer [Fri, 16 Aug 2024 21:41:49 +0000 (23:41 +0200)]
Introduce mkosi-sandbox and stop using subuids for image builds
Over the last few years, we've accumulated a rather nasty set of workarounds
for various issues in bubblewrap:
- We contributed setpgid to util-linux and use it if available because
bubblewrap does not support making its child process the foreground
process.
- We added the innerpid logic to run() because bubblewrap does not forward
signals to the separate child process it runs in the sandbox which meant
they were getting SIGKILLed when we killed bubblewrap, preventing proper
cleanup from happening.
- bubblewrap does not provide a proper way to detect whether the command
was found in the sandbox or not, which meant we had to execute command -v
within the sandbox separately to check whether the command exists or not.
- We had to add extra logic to make sure / was a mount in the initramfs to
allow running mkosi in the initramfs as bubblewrap does not fall back to
MS_MOVE if pivot_root() doesn't work.
- We had to stitch together shell invocations after bubblewrap but before
executing the actual command we want to run to make sure directories had
the correct mode as bubblewrap creates everything with mode 0700 which was
too restrictive in many cases for us. This was fixed with new --perms and
--chmod options in bubblewrap 0.5 but we had to keep compat with 0.4
because that's what's shipped in CentOS Stream 9.
- We had to figure out a shell hack to do overlayfs mounts as these are not
supported by bubblewrap (even though a PR for the feature has been open for
years).
- We had to introduce a Mount struct to pass around mounts so we could deduplicate
and sort them before passing them to bubblewrap as bubblewrap did not do this
itself.
- Debugging all the above was made all the harder by the fact that bubblewrap's
source code is full of tech debt from its history of being a setuid tool
instead of using user namespaces. Getting any fixes into upstream is almost
impossible as the tool is practically unmaintained.
Aside from bubblewrap, our other source of troubles has been newuidmap/newgidmap.
Running as a user within the subuid range configured in /etc/sub{u,g}id has
meant we're constantly fixing ownership and permissions issues where stuff needs
to be chowned and chmodded everywhere to make sure the current user and the
subuid user can access the proper files. Another unfortunate side effect is that
users end up with many files owned by the subuid root user in their home
directories when building images with mkosi.
Let's fix all these issues at once by getting rid of bubblewrap and
newuidmap/newgidmap.
bubblewrap is replaced with a new tool mkosi-sandbox. It looks and behaves a
lot like bubblewrap, except it's much less code and much more flexible to fit
our needs, allowing us to get rid of all the hacks we've built up over the years to
work around issues that didn't get fixed in bubblewrap.
To get rid of newuidmap/newgidmap, a rework of our user namespacing was needed.
The need to use newuidmap/newgidmap came from the assumption that we need a full
65k subuid range to do unprivileged image builds, as distributions ship packages
containing files and directories that are not owned by the root user. After some
investigation, it turns out that there are very few files and directories not owned
by root in distribution packages if you ignore /var. If we could temporarily
ignore the ownership on these files and directories until we can get distributions
to only ship root owned files in /usr and /etc of their packages, we could simply
map the current user to root in a user namespace and get rid of the subuid range
completely.
Turns out that's possible with a seccomp filter. seccomp allows you to make all
chown() syscalls succeed without actually doing anything. The files and directories
end up owned by the root user instead. If we assume this is OK and are OK with
instructing users to use tmpfiles to fix up the permissions on first boot if needed,
a seccomp filter like this is sufficient to allow us to get rid of doing image
builds within a subuid user namespace.
We can go one step further: for the majority of
the image build, one doesn't actually need to be the root user. Only package
managers and systemd-repart need the current user to be mapped to root to do their
job correctly. The reason we did the entire build mapped to root until now was
that we need to do a few mounts as part of the image build process and until now
I was under the assumption that you needed to be root for that. It turns out that
when you unshare a user namespace, you get a full set of capabilities regardless
of whether you're root or some other uid in the user namespace. The only difference
is that when you exec a subprocess as root, the capabilities aren't lost, whereas
they are when you exec a subprocess as a non-root user. This can be avoided by
adding the capabilities of the non-root user to the inheritable and ambient set.
Once that's done, any subprocess exec'd by a non-root user in the user namespace
can mount as many bind and overlay mounts as they can think of.
The above allows us to run most of the image build under the current user uid
instead of root, only switching to root when running package managers, invoking
systemd-repart or systemd-tmpfiles, or when chroot-ing into the image. This allows
us to get rid of various hacks we had to look up the proper user name or home
directory.
Specifically, we can get rid of the following:
- mkosi-as-caller can become a noop since we now by default run the build as the
caller.
- Lots of chmod()'s and chown()'s can be removed.
- All uses of INVOKING_USER.uid/gid can be removed, and most can be replaced with
simple os.getuid()/os.getgid()
- We can use /etc/passwd and /etc/group from the host instead of building our own
- We can get rid of the Acl= option as the user will now be able to remove (almost)
all files written by mkosi.
- We don't have to rchown the package manager cache directory anymore after each
build. Root user builds will now use the system cache instead of the per user
cache.
- We can get rid of the Mount struct as mkosi-sandbox dedups and sorts operations
itself.
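To illustrate what "dedups and sorts" means for mount operations, here is a
minimal sketch. It is not mkosi-sandbox's actual code: the BindMount type and
the dedup_and_sort() helper are hypothetical names, and sorting by destination
path is an assumption about what a reasonable ordering looks like (parent
directories get mounted before anything nested below them).

```python
from typing import Iterable, NamedTuple


class BindMount(NamedTuple):
    # Hypothetical stand-in for a single mount operation.
    src: str
    dst: str
    readonly: bool = False


def dedup_and_sort(mounts: Iterable[BindMount]) -> list[BindMount]:
    # dict.fromkeys() drops exact duplicates while keeping first occurrences.
    unique = dict.fromkeys(mounts)
    # Comparing the split path components sorts /usr before /usr/lib, so a
    # parent mount never ends up shadowing a child mounted earlier.
    return sorted(unique, key=lambda m: m.dst.split("/"))
```

With something like this living in the sandbox tool, callers can pass mounts in
any order and as often as they like.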
One thing to note is that if we're invoked as root, none of the seccomp or capabilities
stuff applies and it is all skipped as it's not required in that case. This means that
when building as root it's still possible to have more than one user in the generated
image unlike when building unprivileged. Also note that users can still be added to
/etc/passwd and such, they just can't own any files or directories in the image itself
until the image is booted.
Michael Ferrari [Wed, 7 Aug 2024 09:37:48 +0000 (11:37 +0200)]
Add executable `mkosi.version` support
`mkosi.version` is executed during configuration parsing, as opposed
to reading the contents of `mkosi.version`. This allows querying the
version before the build without needing to manually adjust the version
beforehand.
This allows using date based versioning by writing a script outputting
`date '+%Y-%m-%d'` or using git tag based versioning by outputting
`git describe --tags`.
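As a minimal sketch of date based versioning, an executable `mkosi.version`
could also be a small Python script. The `version()` helper name is
hypothetical, and the sketch assumes the only contract is that the script
prints the version to stdout.

```python
#!/usr/bin/env python3
# Hypothetical executable mkosi.version: prints a date based version to stdout.
import datetime
from typing import Optional


def version(today: Optional[datetime.date] = None) -> str:
    # Same format as `date '+%Y-%m-%d'`.
    return (today or datetime.date.today()).strftime("%Y-%m-%d")


if __name__ == "__main__":
    print(version())
```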
kali: A distribution based on Debian: https://www.kali.org/
Kali includes many packages suitable for offensive security tasks.
It follows a rolling release model and serves fewer architectures
than Debian.
Building a kali image requires installing kali-archive-keyring:
- Source: https://gitlab.com/kalilinux/packages/kali-archive-keyring
- Packages: https://pkg.kali.org/pkg/kali-archive-keyring
Markus Weippert [Sat, 10 Aug 2024 07:36:56 +0000 (09:36 +0200)]
Fix loaded host modules filter
Module filenames might use dashes instead of underscores.
Also, anchoring the filename to a directory avoids including unrelated
modules (e.g. exfat vs fat).
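A minimal sketch of the two fixes, assuming the filter is a regular expression
matched against module file paths. The module_regex() helper and the example
paths are hypothetical and not taken from the mkosi source.

```python
import re


def module_regex(name: str) -> str:
    # Loaded module names use underscores, but the file on disk may use
    # dashes, so accept either character between the name's components.
    parts = [re.escape(part) for part in re.split(r"[-_]", name)]
    # Anchor the filename to a directory separator (or the start of the path)
    # so that "fat" does not also match "exfat".
    return r"(^|/)" + "[_-]".join(parts) + r"\.ko(\.\w+)?$"
```

For example, the pattern for `fat` matches `kernel/fs/fat/fat.ko.xz` but not
`kernel/fs/exfat/exfat.ko.xz`.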
Daan De Meyer [Fri, 9 Aug 2024 10:15:09 +0000 (12:15 +0200)]
Add --wipe-build-dir to allow clearing the build directory independently
Currently, to clear the build directory, -ff has to be used which
also clears the image cache. Let's add --wipe-build-dir (-w) to allow
clearing only the build directory without clearing the image cache.
Luca Boccassi [Wed, 7 Aug 2024 22:39:06 +0000 (23:39 +0100)]
distributions: drop Debian workaround for lack of VERSION_CODENAME
It has been present since Debian 9, so we can rely on it now.
It is wrong on sid, but that's a separate issue that this old
workaround doesn't solve anyway.
Daan De Meyer [Thu, 8 Aug 2024 11:09:46 +0000 (13:09 +0200)]
debian: Fix up os-release for unstable/sid builds
The version codename for unstable/sid builds is indistinguishable from
testing. Let's make sure we fix that up ourselves so that unstable image
builds can be properly distinguished from testing builds.
Daan De Meyer [Tue, 6 Aug 2024 07:57:46 +0000 (09:57 +0200)]
Don't pass down empty lists to subimages unless explicitly configured
This makes sure that subimages use default values for list based
settings unless they were explicitly configured in configuration or
on the command line or have a non-empty default value in the main
image.
Daan De Meyer [Fri, 2 Aug 2024 08:13:20 +0000 (10:13 +0200)]
Use debian as the default tools tree again on Ubuntu
Debian has distribution-gpg-keys which Ubuntu doesn't. As we'll likely
keep running into similar scenarios in the future, let's just stick with
Debian as Ubuntu's default tools tree.
Daan De Meyer [Fri, 2 Aug 2024 11:16:17 +0000 (13:16 +0200)]
Exit early if output format is none and there are no build scripts
In systemd, the build script is part of a subimage so the build is
done as part of the subimage and there's nothing to do for the main
image. To speed things up a bit, exit early if there are no build
scripts and the output format is none.
Daan De Meyer [Thu, 1 Aug 2024 10:37:50 +0000 (12:37 +0200)]
Introduce RepositoryKeyFetch=
This setting controls whether we'll fetch GPG keys remotely or not.
We disable it by default so that we only rely on locally available GPG
keys for checking package and repository metadata signatures.
This new setting only affects dnf/zypper based distributions as apt
and pacman do not support retrieving GPG keys remotely in the first
place.
zypper does not trust GPG keys listed in gpgkey= by default so we import
local GPG keys manually with rpm to work around that.
Daan De Meyer [Thu, 1 Aug 2024 14:35:36 +0000 (16:35 +0200)]
Only use unshare to become root if we're actually going to use a scope
If the relevant environment variables are not set, scope_cmd() will
return an empty list and we won't use a scope after all. In that case
we don't need to use unshare either to become root and can rely on our
own become_root() function so check whether we're actually going to use
a scope or not.
Daan De Meyer [Thu, 1 Aug 2024 13:26:32 +0000 (15:26 +0200)]
tests: Simplify initrd tests
Let's get rid of the fixtures and just rely on the default initrd
built as part of the image itself. This also means any settings
picked up from mkosi.local.conf are applied to the initrd build.
Let's require users to set these automatically if they want to
have autologin without enabling the Autologin= setting. This gives
more flexibility after https://github.com/systemd/systemd/pull/33873
is merged in systemd as users can choose to enable the settings
globally or per tty depending on what they need.
Set up proper environment variables for kernel-install
If we're not explicitly disabling kernel-install during package
manager invocations, let's set up the environment to make it do the
right thing instead.
Look for $USER for the username before reading /etc/passwd
Let's take $USER into account if set before reading /etc/passwd
for the username. This gives a way out for environments where the
uid of the user does not have an entry in /etc/passwd.
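The lookup order can be sketched as follows. The username() helper name is
hypothetical; the calls themselves are from the Python standard library.

```python
import os
import pwd


def username() -> str:
    # Prefer $USER when it is set and non-empty.
    user = os.environ.get("USER")
    if user:
        return user
    # Fall back to the passwd database; this raises KeyError when the
    # current uid has no entry there.
    return pwd.getpwuid(os.getuid()).pw_name
```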
Handle failure to detect the distribution in test_parse_config()
If /usr/lib/os-release isn't available, we can't detect the current
distribution, so let's make sure we handle that scenario as well by
checking for Distribution.custom instead of None.
Template the options definitions directly into the completion function. For
some weird scoping reason, the completion function still uses '' for
"${_mkosi_options[*]}", even though the script is read fine and running a
shell with set -x shows e.g. _mkosi_options being assigned the proper values.
This wasn't caught during development because the script works fine when
sourced.
Look up qemu and virt-fw-vars in extra search paths
Because qemu uses OVMF firmware descriptions from /usr, we look
those up in the same root that we'll be invoking qemu from. Because
virt-fw-vars operates on the same files, we also invoke it in the
same root that we find qemu in.
There could potentially be a huge amount of modules and firmware
which makes these log messages very noisy. Let's drop them to make
debug logs less annoying to parse.
Assign return code before calling sys.excepthook()
It seems sys.excepthook() can raise its own exception? I'm not entirely
sure what's going on, but as a safety measure, let's assign the correct
return code before we invoke sys.excepthook() so that we always exit with
the right return code.
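A minimal sketch of the ordering, not the actual mkosi code: run_and_exit() and
its exit parameter are hypothetical names, and 1 is an arbitrary failure code
chosen for illustration.

```python
import sys
from typing import Callable


def run_and_exit(main: Callable[[], None], exit: Callable[[int], object] = sys.exit) -> None:
    rc = 0
    try:
        main()
    except Exception:
        # Assign the return code first, so it is already correct if the
        # hook below raises an exception of its own.
        rc = 1
        sys.excepthook(*sys.exc_info())
    finally:
        # Runs even when sys.excepthook() raised, and always sees the right rc.
        exit(rc)
```

Had rc been assigned after the sys.excepthook() call, a failing hook would
leave rc at 0 and the process would exit as if nothing had gone wrong.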