On really old kernels (< 4.14+) a bug in /proc/1/sched handling in the
kernel could be used to determine whether we are running in a PID
namespace. This hasn't worked for a long time, and there's little point
in making things work on old kernels we can't make work on current
kernels, hence let's drop that old cruft.
Daan De Meyer [Mon, 7 Dec 2020 22:18:28 +0000 (22:18 +0000)]
Silence cgroups v1 read-only filesystem warning
Avoid warning messages when booting systemd-nspawn containers and using
hybrid or legacy cgroups. systemd-nspawn mounts the cgroups v1 controller
tree as read-only so these errors are expected and not problematic.
Partially fixes #17862.
```
‣ Processing default...
Spawning container image on /home/daan/projects/systemd/image.raw.
Press ^] three times within 1s to kill container.
systemd 247 running in system mode. (+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Detected virtualization systemd-nspawn.
Detected architecture x86-64.
Welcome to Fedora 33 (Thirty Three)!
Queued start job for default target Graphical Interface.
-.slice: Failed to migrate controller cgroups from , ignoring: Read-only file system
system.slice: Failed to delete controller cgroups /system.slice, ignoring: Read-only file system
[ OK ] Created slice system-getty.slice.
[ OK ] Created slice system-modprobe.slice.
user.slice: Failed to delete controller cgroups /user.slice, ignoring: Read-only file system
[ OK ] Created slice User and Session Slice.
[ OK ] Started Dispatch Password Requests to Console Directory Watch.
[ OK ] Started Forward Password Requests to Wall Directory Watch.
[ OK ] Reached target Local Encrypted Volumes.
[ OK ] Reached target Paths.
[ OK ] Reached target Remote File Systems.
[ OK ] Reached target Slices.
[ OK ] Reached target Swap.
[ OK ] Listening on Process Core Dump Socket.
[ OK ] Listening on initctl Compatibility Named Pipe.
[ OK ] Listening on Journal Socket (/dev/log).
[ OK ] Listening on Journal Socket.
[ OK ] Listening on User Database Manager Socket.
dev-hugepages.mount: Failed to delete controller cgroups /dev-hugepages.mount, ignoring: Read-only file system
Mounting Huge Pages File System...
sys-fs-fuse-connections.mount: Failed to delete controller cgroups /sys-fs-fuse-connections.mount, ignoring: Read-only file system
Mounting FUSE Control File System...
Starting Journal Service...
Starting Remount Root and Kernel File Systems...
system.slice: Failed to delete controller cgroups /system.slice, ignoring: Read-only file system
```
After: `mkosi --default .mkosi/mkosi.fedora boot`
```
‣ Processing default...
Spawning container image on /home/daan/projects/systemd/mkosi.output/image.raw.
Press ^] three times within 1s to kill container.
systemd 247 running in system mode. (+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Detected virtualization systemd-nspawn.
Detected architecture x86-64.
Welcome to Fedora 33 (Thirty Three)!
Queued start job for default target Graphical Interface.
[ OK ] Created slice system-getty.slice.
[ OK ] Created slice system-modprobe.slice.
[ OK ] Created slice User and Session Slice.
[ OK ] Started Dispatch Password Requests to Console Directory Watch.
[ OK ] Started Forward Password Requests to Wall Directory Watch.
[ OK ] Reached target Local Encrypted Volumes.
[ OK ] Reached target Paths.
[ OK ] Reached target Remote File Systems.
[ OK ] Reached target Slices.
[ OK ] Reached target Swap.
[ OK ] Listening on Process Core Dump Socket.
[ OK ] Listening on initctl Compatibility Named Pipe.
[ OK ] Listening on Journal Socket (/dev/log).
[ OK ] Listening on Journal Socket.
[ OK ] Listening on User Database Manager Socket.
Mounting Huge Pages File System...
Mounting FUSE Control File System...
Starting Journal Service...
Starting Remount Root and Kernel File Systems...
[ OK ] Mounted Huge Pages File System.
[ OK ] Mounted FUSE Control File System.
[ OK ] Finished Remount Root and Kernel File Systems.
Starting Create Static Device Nodes in /dev...
[ OK ] Finished Create Static Device Nodes in /dev.
[ OK ] Reached target Local File Systems (Pre).
[ OK ] Reached target Local File Systems.
Starting Restore /run/initramfs on shutdown...
[ OK ] Finished Restore /run/initramfs on shutdown.
[ OK ] Started Journal Service.
Starting Flush Journal to Persistent Storage...
[ OK ] Finished Flush Journal to Persistent Storage.
Starting Create Volatile Files and Directories...
[ OK ] Finished Create Volatile Files and Directories.
Starting Network Name Resolution...
Starting Update UTMP about System Boot/Shutdown...
[ OK ] Finished Update UTMP about System Boot/Shutdown.
[ OK ] Reached target System Initialization.
[ OK ] Started Daily Cleanup of Temporary Directories.
[ OK ] Reached target Timers.
[ OK ] Listening on D-Bus System Message Bus Socket.
[ OK ] Reached target Sockets.
[ OK ] Reached target Basic System.
Starting Home Area Manager...
Starting User Login Management...
Starting Permit User Sessions...
[ OK ] Finished Permit User Sessions.
[ OK ] Started Console Getty.
[ OK ] Reached target Login Prompts.
Starting D-Bus System Message Bus...
[ OK ] Started D-Bus System Message Bus.
[ OK ] Started Home Area Manager.
[ OK ] Started User Login Management.
[ OK ] Reached target Multi-User System.
[ OK ] Reached target Graphical Interface.
Starting Update UTMP about System Runlevel Changes...
[ OK ] Finished Update UTMP about System Runlevel Changes.
[ OK ] Started Network Name Resolution.
[ OK ] Reached target Host and Network Name Lookups.
Fedora 33 (Thirty Three) (built from systemd tree)
Kernel 5.9.11-arch2-1 on an x86_64 (console)
```
test: add test that dlopen()'s all our weak library deps once
This test should ensure we notice if distros update shared libraries
that broke so name, and we still use the old soname.
(In contrast to what the commit summary says, this currently doesn#t
cover really all such deps, specifically xkbcommon and PCRE are missing,
since they currently aren't loaded from src/shared/. This is stuff to
fix later)
Michael Marley [Tue, 8 Dec 2020 02:27:38 +0000 (21:27 -0500)]
manager: Fix HW watchdog when systemd starts before driver loaded
When manager_{set|override}_watchdog is called, set the watchdog timeout
regardless of whether the hardware watchdog was successfully initialized. If
the watchdog was requested but could not be initialized, then instead of
pinging it, attempt to initialize it again. This ensures that the hardware
watchdog is initialized even if the kernel module for it isn't loaded when
systemd starts (which is quite likely, unless it is compiled in).
This builds on work by @danc86 in https://github.com/systemd/systemd/pull/17460,
but fixes the issue of not updating the watchdog timeout with the actual value
from the hardware.
Quit frankly, I am not convinced we should do all this at all. If
close()ing of these input devices is really that slow, then this should
probably be fixed in the kernel, not worked around in userspace like
this.
Daan De Meyer [Mon, 7 Dec 2020 23:00:37 +0000 (23:00 +0000)]
mkosi: Enable --qemu-headless option for all distros
--qemu-headless configures the generated image and mkosi's qemu
command to connect to the VM via the serial port. This allows
spawning a qemu VM within the user's terminal instead of spawning
a graphical GTK GUI. --qemu-headless sets TERM, COLUMNS and LINES
in serial-getty@ttyS0.service in the container which makes the
terminal in the VM behave almost equivalent to the one on the host.
This change makes testing changes to systemd using mkosi + QEMU a
lot easier compared to before as commands can be executed in the VM
from the comfort of one's terminal compared to the Linux console
available when running via the GTK GUI.
systemd-nspawn: Allow setting ambient capability set
The old code was only able to pass the value 0 for the inheritable
and ambient capability set when a non-root user was specified.
However, sometimes it is useful to run a program in its own container
with a user specification and some capabilities set. This is needed
when the capabilities cannot be provided by file capabilities (because
the file system is mounted with MS_NOSUID for additional security).
This commit introduces the option --ambient-capability and the config
file option AmbientCapability=. Both are used in a similar way to the
existing Capability= setting. It changes the inheritable and ambient
set (which is 0 by default). The code also checks that the settings
for the bounding set (as defined by Capability= and DropCapability=)
and the setting for the ambient set (as defined by AmbientCapability=)
are compatible. Otherwise, the operation would fail in any way.
Due to the current use of -1 to indicate no support for ambient
capability set the special value "all" cannot be supported.
Also, the setting of ambient capability is restricted to running a
single program in the container payload.
Fedora will deprecate support for nscd in the upcoming release [1] and plans to
drop it in the next one [2]. At that point we might as well build systemd
without that support too, since there'll be nothing to talk too.
Daan De Meyer [Sun, 6 Dec 2020 18:16:59 +0000 (18:16 +0000)]
meson: Respect MESON_INSTALL_QUIET
MESON_INSTALL_QUIET is set when --quiet is passed to meson install.
Make sure we check the variable in our custom install scripts and
don't output anything if it is set.
Daan De Meyer [Sun, 6 Dec 2020 16:45:45 +0000 (16:45 +0000)]
mkosi: Add --quiet and --no-rebuild options to meson install in mkosi.build
By default, meson install prints a line for every file it installs.
This is verbose and doesn't provide much value. Let's silence the
meson install step to remove this output from the mkosi build step.
The --no-rebuild option removes some additional duplicate output
by the meson install step.
Ubuntu Focal still has meson 0.53.0 so we add a version check and
only use the new feature if the meson version supports it.
cryptsetup: give command line parameters proper names
It's highly confusing to reference the command line parameters via
argv[] indexes. Let's clean this up, and introduce properly named local
variables that make this easier to follow.
No actualy code changes, just some renaming of variables.
Vito Caputo [Tue, 1 Dec 2020 07:00:34 +0000 (23:00 -0800)]
mmap-cache: replace stats accessors with log func
In preparation for logging more mmap-cache statistics get rid of this
piecemeal stats accessor api and just have a debug log output function
for producing the stats.
Updates the one call site using these accessors, moving what that site
did into the new log function. So the output is unchanged for now,
just a trivial refactor.
Yu Watanabe [Thu, 3 Dec 2020 05:16:41 +0000 (14:16 +0900)]
test-network: sleep 1s after reloading configs
As interfaces will be reconfigured asynchronously after `networkctl reload`.
So, right after `networkctl reload` is finished, interfaces may be still
in 'configured' state with the old .network files.
shared/build: make the version string definition less terrible
The BLKID and ELFUTILS strings were present twice. Let's reaarange things so that
each times requires definition in exactly one place.
Also let's sort things a bit:
the "heavy hitters" like PAM/MAC first,
then crypto libs,
then other libs, alphabetically,
compressors,
and external compat integrations.
I think it's useful for users to group similar concepts together to some extent.
For example, when checking what compression is available, it helps a lot to have
them listed together.
FDISK is renamed to LIBFDISK to make it clear that this is about he library and
the executable.
Florian Westphal [Fri, 19 Jun 2020 11:33:19 +0000 (13:33 +0200)]
fw_add_local_dnat: remove unused function arguments
All users pass a NULL/0 for those, things haven't changed since 2015
when this was added originally, so remove the arguments.
THe paramters are re-added as local function variables, initalised
to NULL or 0. A followup patch can then manually remove all
if (NULL) rather than leaving dead-branch optimization to compiler.
Reason for not doing it here is to ease patch review.
Not requiring support for this will ease initial nftables backend
implementation.
In case a use-case comues up later this feature can be re-added.
Yu Watanabe [Wed, 2 Dec 2020 10:19:06 +0000 (19:19 +0900)]
network: do not set broadcast if prefixlen is 31 or 32
After fe841414ef157f7f01d339c5d5730126e7b5fe0a, broadcast address is
also compared with existing one to determine whether the address is
foregin or not. So, the address object should not contain unnecessary
information.
resolved: beef up logic for suppressing "localhost" entry in /etc/hosts
Either suppress the entry entirely, or not at all. But do not suppress
the "localhost" names we recognize, leaving the ones we do not in place.
On Fedora, where "localhost4.localdomain4" is among those listed in
/etc/hosts for 127.0.0.1 we'd thus otherwise drop the "localhost" but
keep the "localhost4.localdomain4" and then on reverse lookups only
return that, which is highly confusing.
resolved: never allow _gateway lookups to go to the network
Make them rather fail than go to the network.
Previously we'd filter them on LLMNR (explicitly) and MDNS (implicitly,
because it doesn't have .local suffix), but not on DNS.
In order to make _gateway truly reliable, let's not allow it to go to
DNS either, and keep it local.
This is particular relevant, as clients can now request lookups without
local RR synthesis, where we'd rather have NXDOMAIN returned for
_gateway than have it hit the network.
resolved: lower SERVFAIL cache timeout from 30s to 10s
Apparently 30s is a bit too long for some cases, see #5552. But not
caching SERVFAIL at all also breaks stuff, see explanation in 201d99584ed7af8078bb243ce2587e5455074713.
Let's try to find some middle ground, by lowering the cache timeout to
10s. This should be ample for the problem 201d99584ed7af8078bb243ce2587e5455074713 attackes, but not as long as
half a miute, as #5552 complains.