Yu Watanabe [Thu, 10 Dec 2020 23:34:13 +0000 (08:34 +0900)]
sd-device: make TAGS= property prefixed and suffixed with ":"
The commit 6f3ac0d51766b0b9101676cefe5c4ba81feba436 drops the prefix and
suffix in TAGS= property. But there exists several rules that have like
`TAGS=="*:tag:*"`. So, the property must be always prefixed and suffixed
with ":".
Vito Caputo [Sun, 6 Dec 2020 08:21:17 +0000 (00:21 -0800)]
mmap-cache: drop ret_size from mmap_cache_get()
The ret_size result is a bit of an awkward optimization that in a
sense enables bypassing the mmap-cache API, while encouraging
duplication of logic it already implements.
It's only utilized in one place; journal_file_move_to_object(),
apparently to avoid the overhead of remapping the whole object
again once its header, and thus its actual size, is known.
With mmap-cache's context cache, the overhead of simply
re-getting the object with the now known size should already be
negligible. So it's not clear what benefit this brings, unless
avoiding some function calls that do very little in the hot
context-cache hit case is of such a priority.
There's value in having all object-sized gets pass through
mmap_cache_get(), as it provides a single entrypoint for
instrumentation in profiling/statistics gathering. When
journal_file_move_to_object() bypasses getting the full object
size, you don't capture the full picture on the mmap-cache side
in terms of object sizes explicitly loaded from a journal file.
I'd like to see additional accounting in mmap_cache_get() in a
future commit, taking advantage of this change.
Quoting Andy Lutomirski:
> The upcoming Linux SGX driver has a device node /dev/sgx. User code opens
> it, does various setup things, mmaps it, and needs to be able to create
> PROT_EXEC mappings. This gets quite awkward if /dev is mounted noexec.
We already didn't use noexec in spawn, and this extends this behaviour to other
systems.
Afaik, the kernel would refuse execve() on a character or block device
anyway. Thus noexec on /dev matters only for actual binaries copied to /dev,
which requires root privileges in the first place.
We don't do noexec on either /tmp or /dev/shm (because that causes immediate
problems with stuff like Java and cffi). And if you have those two at your
disposal anyway, having noexec on /dev doesn't seem important. So the 'noexec'
attribute on /dev doesn't really mean much, since there are multiple other
similar directories which don't require root privileges to write to.
Yu Watanabe [Fri, 11 Dec 2020 03:15:45 +0000 (12:15 +0900)]
network: do not reconfigure interface when the link gains carrier but udev not initialized it yet
When an interface gains carrier but udev have not initialized the
interface or link_initialized_handler() has not been called yet,
then link_configure will be called twice. Thus LLDP client will be
configured twice, and triggers assertion.
When the .so module is loaded, it gets a separate copy of stuff in src/basic,
including the log level variables. So any logging settings are unaffected by
the loading program calling log_parse_environment() or such. Let's also parse
the environment here so that we can have nice logging.
Initialization is done from each exported function, and pthread_once_t is used
to avoid duplicate initialization. I didn't merge PROTECT_ERRNO into
NSS_ENTRYPOINT_BEGIN because UNPROTECT_ERRNO is called in a bunch of places
and it would feel strange to have PROTECT_ERRNO hidden, but not UNPROTECT_ERRNO.
The most interesting stuff in this module is the varlink messages, and any
potential errors in json. So let's enable json logging when debug messages are
enabled.
With those changes, figuring out the issue in
https://github.com/systemd/systemd/pull/17823 is trivial:
$ LD_LIBRARY_PATH=build/ SYSTEMD_LOG_COLOR=1 SYSTEMD_LOG_LOCATION=1 SYSTEMD_LOG_LEVEL=debug getent hosts mirrors.fedoraproject.org
src/shared/varlink.c:237: n/a: varlink: setting state idle-client
src/shared/varlink.c:1240: n/a: Sending message: {"method":"io.systemd.Resolve.ResolveHostname","parameters":{"name":"mirrors.fedoraproject.org","family":10}}
src/shared/varlink.c:240: n/a: varlink: changing state idle-client → calling
src/shared/varlink.c:588: n/a: New incoming message: {"parameters":{"addresses":[{"ifindex":0,"family":10,"address":[42,5,208,20,0,16,120,3,247,116,77,124,226,119,164,87]},{"ifindex":0,"family":10,"address":[42,5,208,28,12,106,204,3,38,58,132,9,185,97,126,2]},{"ifindex":0,"family":10,"address":[38,32,0,82,0,3,0,1,222,173,190,239,202,254,254,215]},{"ifindex":0,"family":10,"address":[38,5,188,128,48,16,6,0,222,173,190,239,202,254,254,217]},{"ifindex":0,"family":10,"address":[38,4,21,128,254,0,0,0,222,173,190,239,202,254,254,209]},{"ifindex":0,"family":10,"address":[38,32,0,82,0,3,0,1,222,173,190,239,202,254,254,214]},{"ifindex":0,"family":10,"address":[38,16,0,40,48,144,48,1,222,173,190,239,202,254,254,211]},{"ifindex":0,"family":10,"address":[32,1,65,120,0,2,18,105,0,0,0,0,0,0,254,210]}],"name":"wildcard.fedoraproject.org","flags":1}}
src/shared/varlink.c:240: n/a: varlink: changing state calling → called
src/shared/varlink.c:240: n/a: varlink: changing state called → idle-client
src/nss-resolve/nss-resolve.c:84: (string):1:40: JSON field 'ifindex' is out of bounds for an interface index.
Jinyuan Si [Fri, 4 Dec 2020 02:38:28 +0000 (10:38 +0800)]
cryptsetup: Fix crypto device missing issue after bootup
Normally, the udev rules operate on "change" events. But when
coldplugging, there's an "add" event present. The udev rules have to
recognize this and do some actions in this particular situation, too.
Also, we don't want the nodes to be created prematurely on "add"
events while not coldplugging. The udev rules will check
DM_UDEV_PRIMARY_SOURCE_FLAG to see if the device was activated
correctly before and if not, it ignore the "add" event totally.
This way the udev rules can support udev triggers generating "add"
events (e.g. "udevadm trigger --action=add" or
"echo add > /sys/block/<dm_device>/uevent").
In this case, the udevd service is started after
systemd-cryptsetup@config.service, is started, which will cause udevd
service to miss the "change" uevent with DM_UDEV_PRIMARY_SOURCE_FLAG
flag generated by systemd-cryptsetup@config.service. To solve this
issue, we let the cryptsetup service be started after the udevd
service.
Back in 5248e7e1f11aba6859de0b28f0dd3778b22842f2 (July 2017) we moved over to
"_gateway", with the old name declared to be temporary measure. Since we're
doing a bunch of changes to resolved now, it seems to be a good moment to make
this simplification and not add support for the compat name in new code.
seccomp: don't install filters for archs that can't use syscalls
When seccomp_restrict_archs is called, architectures that are blocked
are replaced by the SECCOMP_LOCAL_ARCH_BLOCKED marker so that they are
not disabled again and filters are not installed for them.
This can make some service that use SystemCallArchitecture= and
SystemCallFilter= start faster.
Vito Caputo [Thu, 3 Dec 2020 06:11:23 +0000 (22:11 -0800)]
mmap-cache: bind prot(ection) to MMapFileDescriptor
There are no mmap_cache_get() users that actually deviate prot
from the JournalFile's f->prot.
So there's no point in making this a separate parameter to
mmap_cache_get(), nor is there any need to store it in
JournalFile's f->prot.
Instead just pass it to mmap_cache_add_fd() at MMapFileDescriptor
creation, storing it in there for the mmap() callers, which
already receive MMapFileDescriptor *.
For functions receiving both an MMapFileDescriptor * and prot,
the prot argument has been simply removed and call sites updated.
Formalizing this fd:prot binding at the public API also enables
discarding the prot check in window_matches(), which is a hot
function on long window lists, so a minor CPU efficiency gain
should be had there as seen with the past removal of the fd
check. Unnoticable for uncached journals, but maybe a little
runtime improvement when cached in specific circumstances.
window_matches_fd() has also been simplified to treat the
MMapFileDescrptor * as equivalent to its fd and prot.
E.g. in nss-resolve it is still useful to print the location of the error:
src/test/test-nss.c:231: dlsym(0x0x1dc6fb0, _nss_resolve_gethostbyname2_r) → 0x0x7fdbfc53f626
(string):1:40: JSON field ifindex is out of bounds for an interface index.
I opted to use a partially duplicated if condition to avoid nesting. It's nice
to have the log calls vertically aligned. The compiler will optimize this nicely.
Takashi Iwai [Wed, 9 Dec 2020 09:56:51 +0000 (10:56 +0100)]
udev: Fix sound.target dependency
The recent bug report indicated a race at device creation and the
sound.target dependencies, and the cause turned out to be the condition
of the sound.target trigger. Currently it's set for "card*", but this
is actually the parent object; i.e. the sound.target is triggered before
the sound devices are created.
For assuring the whole sound device creations beforehand, we need to use
"controlC*" instead of "card*"; as already described in
78-sound-card.rules, this is guaranteed to be the last device, and can
be used as a synchronization point.
On really old kernels (< 4.14+) a bug in /proc/1/sched handling in the
kernel could be used to determine whether we are running in a PID
namespace. This hasn't worked for a long time, and there's little point
in making things work on old kernels we can't make work on current
kernels, hence let's drop that old cruft.
Daan De Meyer [Sun, 6 Dec 2020 11:42:45 +0000 (11:42 +0000)]
mkosi: Add gdb to final images
Let's add a debugger to the mkosi images so we can debug coredumps
from inside mkosi qemu VMs (and hopefully in the future from
mkosi systemd-nspawn containers as well).
Daan De Meyer [Mon, 7 Dec 2020 22:18:28 +0000 (22:18 +0000)]
Silence cgroups v1 read-only filesystem warning
Avoid warning messages when booting systemd-nspawn containers and using
hybrid or legacy cgroups. systemd-nspawn mounts the cgroups v1 controller
tree as read-only so these errors are expected and not problematic.
Partially fixes #17862.
```
‣ Processing default...
Spawning container image on /home/daan/projects/systemd/image.raw.
Press ^] three times within 1s to kill container.
systemd 247 running in system mode. (+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Detected virtualization systemd-nspawn.
Detected architecture x86-64.
Welcome to Fedora 33 (Thirty Three)!
Queued start job for default target Graphical Interface.
-.slice: Failed to migrate controller cgroups from , ignoring: Read-only file system
system.slice: Failed to delete controller cgroups /system.slice, ignoring: Read-only file system
[ OK ] Created slice system-getty.slice.
[ OK ] Created slice system-modprobe.slice.
user.slice: Failed to delete controller cgroups /user.slice, ignoring: Read-only file system
[ OK ] Created slice User and Session Slice.
[ OK ] Started Dispatch Password Requests to Console Directory Watch.
[ OK ] Started Forward Password Requests to Wall Directory Watch.
[ OK ] Reached target Local Encrypted Volumes.
[ OK ] Reached target Paths.
[ OK ] Reached target Remote File Systems.
[ OK ] Reached target Slices.
[ OK ] Reached target Swap.
[ OK ] Listening on Process Core Dump Socket.
[ OK ] Listening on initctl Compatibility Named Pipe.
[ OK ] Listening on Journal Socket (/dev/log).
[ OK ] Listening on Journal Socket.
[ OK ] Listening on User Database Manager Socket.
dev-hugepages.mount: Failed to delete controller cgroups /dev-hugepages.mount, ignoring: Read-only file system
Mounting Huge Pages File System...
sys-fs-fuse-connections.mount: Failed to delete controller cgroups /sys-fs-fuse-connections.mount, ignoring: Read-only file system
Mounting FUSE Control File System...
Starting Journal Service...
Starting Remount Root and Kernel File Systems...
system.slice: Failed to delete controller cgroups /system.slice, ignoring: Read-only file system
```
After: `mkosi --default .mkosi/mkosi.fedora boot`
```
‣ Processing default...
Spawning container image on /home/daan/projects/systemd/mkosi.output/image.raw.
Press ^] three times within 1s to kill container.
systemd 247 running in system mode. (+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Detected virtualization systemd-nspawn.
Detected architecture x86-64.
Welcome to Fedora 33 (Thirty Three)!
Queued start job for default target Graphical Interface.
[ OK ] Created slice system-getty.slice.
[ OK ] Created slice system-modprobe.slice.
[ OK ] Created slice User and Session Slice.
[ OK ] Started Dispatch Password Requests to Console Directory Watch.
[ OK ] Started Forward Password Requests to Wall Directory Watch.
[ OK ] Reached target Local Encrypted Volumes.
[ OK ] Reached target Paths.
[ OK ] Reached target Remote File Systems.
[ OK ] Reached target Slices.
[ OK ] Reached target Swap.
[ OK ] Listening on Process Core Dump Socket.
[ OK ] Listening on initctl Compatibility Named Pipe.
[ OK ] Listening on Journal Socket (/dev/log).
[ OK ] Listening on Journal Socket.
[ OK ] Listening on User Database Manager Socket.
Mounting Huge Pages File System...
Mounting FUSE Control File System...
Starting Journal Service...
Starting Remount Root and Kernel File Systems...
[ OK ] Mounted Huge Pages File System.
[ OK ] Mounted FUSE Control File System.
[ OK ] Finished Remount Root and Kernel File Systems.
Starting Create Static Device Nodes in /dev...
[ OK ] Finished Create Static Device Nodes in /dev.
[ OK ] Reached target Local File Systems (Pre).
[ OK ] Reached target Local File Systems.
Starting Restore /run/initramfs on shutdown...
[ OK ] Finished Restore /run/initramfs on shutdown.
[ OK ] Started Journal Service.
Starting Flush Journal to Persistent Storage...
[ OK ] Finished Flush Journal to Persistent Storage.
Starting Create Volatile Files and Directories...
[ OK ] Finished Create Volatile Files and Directories.
Starting Network Name Resolution...
Starting Update UTMP about System Boot/Shutdown...
[ OK ] Finished Update UTMP about System Boot/Shutdown.
[ OK ] Reached target System Initialization.
[ OK ] Started Daily Cleanup of Temporary Directories.
[ OK ] Reached target Timers.
[ OK ] Listening on D-Bus System Message Bus Socket.
[ OK ] Reached target Sockets.
[ OK ] Reached target Basic System.
Starting Home Area Manager...
Starting User Login Management...
Starting Permit User Sessions...
[ OK ] Finished Permit User Sessions.
[ OK ] Started Console Getty.
[ OK ] Reached target Login Prompts.
Starting D-Bus System Message Bus...
[ OK ] Started D-Bus System Message Bus.
[ OK ] Started Home Area Manager.
[ OK ] Started User Login Management.
[ OK ] Reached target Multi-User System.
[ OK ] Reached target Graphical Interface.
Starting Update UTMP about System Runlevel Changes...
[ OK ] Finished Update UTMP about System Runlevel Changes.
[ OK ] Started Network Name Resolution.
[ OK ] Reached target Host and Network Name Lookups.
Fedora 33 (Thirty Three) (built from systemd tree)
Kernel 5.9.11-arch2-1 on an x86_64 (console)
```
test: add test that dlopen()'s all our weak library deps once
This test should ensure we notice if distros update shared libraries
that broke so name, and we still use the old soname.
(In contrast to what the commit summary says, this currently doesn#t
cover really all such deps, specifically xkbcommon and PCRE are missing,
since they currently aren't loaded from src/shared/. This is stuff to
fix later)
Michael Marley [Tue, 8 Dec 2020 02:27:38 +0000 (21:27 -0500)]
manager: Fix HW watchdog when systemd starts before driver loaded
When manager_{set|override}_watchdog is called, set the watchdog timeout
regardless of whether the hardware watchdog was successfully initialized. If
the watchdog was requested but could not be initialized, then instead of
pinging it, attempt to initialize it again. This ensures that the hardware
watchdog is initialized even if the kernel module for it isn't loaded when
systemd starts (which is quite likely, unless it is compiled in).
This builds on work by @danc86 in https://github.com/systemd/systemd/pull/17460,
but fixes the issue of not updating the watchdog timeout with the actual value
from the hardware.
Quit frankly, I am not convinced we should do all this at all. If
close()ing of these input devices is really that slow, then this should
probably be fixed in the kernel, not worked around in userspace like
this.
Daan De Meyer [Mon, 7 Dec 2020 23:00:37 +0000 (23:00 +0000)]
mkosi: Enable --qemu-headless option for all distros
--qemu-headless configures the generated image and mkosi's qemu
command to connect to the VM via the serial port. This allows
spawning a qemu VM within the user's terminal instead of spawning
a graphical GTK GUI. --qemu-headless sets TERM, COLUMNS and LINES
in serial-getty@ttyS0.service in the container which makes the
terminal in the VM behave almost equivalent to the one on the host.
This change makes testing changes to systemd using mkosi + QEMU a
lot easier compared to before as commands can be executed in the VM
from the comfort of one's terminal compared to the Linux console
available when running via the GTK GUI.
Yu Watanabe [Fri, 4 Dec 2020 11:50:34 +0000 (20:50 +0900)]
network: fix SIGABRT related to unreachable route with DHCP6
After #17834, unreachable routes generated through DHCP6 are managed by
Manager. But they are referrenced by the DHCP6 uplink. So, the routes
managed by Manager must be freed after all Link objects are freed.
For IPv4, kernel compares the local address, prefix, and prefixlen.
For IPv6, kernel compares only the local address.
Let's follow the kernel's comparison way.
systemd-nspawn: Allow setting ambient capability set
The old code was only able to pass the value 0 for the inheritable
and ambient capability set when a non-root user was specified.
However, sometimes it is useful to run a program in its own container
with a user specification and some capabilities set. This is needed
when the capabilities cannot be provided by file capabilities (because
the file system is mounted with MS_NOSUID for additional security).
This commit introduces the option --ambient-capability and the config
file option AmbientCapability=. Both are used in a similar way to the
existing Capability= setting. It changes the inheritable and ambient
set (which is 0 by default). The code also checks that the settings
for the bounding set (as defined by Capability= and DropCapability=)
and the setting for the ambient set (as defined by AmbientCapability=)
are compatible. Otherwise, the operation would fail in any way.
Due to the current use of -1 to indicate no support for ambient
capability set the special value "all" cannot be supported.
Also, the setting of ambient capability is restricted to running a
single program in the container payload.
Fedora will deprecate support for nscd in the upcoming release [1] and plans to
drop it in the next one [2]. At that point we might as well build systemd
without that support too, since there'll be nothing to talk too.
Daan De Meyer [Sun, 6 Dec 2020 18:16:59 +0000 (18:16 +0000)]
meson: Respect MESON_INSTALL_QUIET
MESON_INSTALL_QUIET is set when --quiet is passed to meson install.
Make sure we check the variable in our custom install scripts and
don't output anything if it is set.