Martin Wilck [Tue, 12 Nov 2019 15:43:42 +0000 (16:43 +0100)]
udevd: fix crash when workers time out after exit is signal caught
If udevd receives an exit signal, it releases its reference on the udev
monitor in manager_exit(). If at this time a worker is hanging, and if
the event timeout for this worker expires before udevd exits, udevd
crashes in on_sigchld()->udev_monitor_send_device(), because the monitor
has already been freed.
Fix this by releasing the main process's monitor ref later, in
manager_free().
Franck Bui [Fri, 18 Oct 2019 10:44:51 +0000 (12:44 +0200)]
logind: fix (again) the race that might happen when logind restores VT
This patch is a new attempt to fix the race originally described in issue #9754.
The initial fix (commit ad96887a1205bad9656d280c5681f482e6d04838) consisted in
spawning a sub process that became the controlling process of the VT and hence
kicked the old controlling process off to make sure that the VT wouldn't have
entered in HUP state while logind restored the VT.
But it introduced a regression (see issue #11269) and thus was reverted. But
unlike it was described in the revert commit message, commit adb8688b3ff445d9c48ed0d72208c7844c2acc01 alone doen't fix the initial race.
This patch fixes the race in a simpler way by trying to restore the VT a second
time after making sure to re-open it if the first attempt fails.
Indeed if the old controlling process dies before or during the first attempt,
logind will fail to restore the VT. At this point the VT is in HUP state but
we're sure that it won't enter in a HUP state a second time. Therefore we will
retry by re-opening the VT to clear the HUP state and by restoring the VT a
second time, which should be safe this time.
Martin Wilck [Wed, 6 Nov 2019 11:24:41 +0000 (12:24 +0100)]
udevd: wait for workers to finish when exiting
On some systems with lots of devices, device probing for certain drivers can
take a very long time. If systemd-udevd detects a timeout and kills the worker
running modprobe using SIGKILL, some devices will not be probed, or end up in
unusable state. The --event-timeout option can be used to modify the maximum
time spent in an uevent handler. But if systemd-udevd exits, it uses a
different timeout, hard-coded to 30s, and exits when this timeout expires,
causing all workers to be KILLed by systemd afterwards. In practice, this may
lead to workers being killed after significantly less time than specified with
the event-timeout. This is particularly significant during initrd processing:
systemd-udevd will be stopped by systemd when initrd-switch-root.target is
about to be isolated, which usually happens quickly after finding and mounting
the root FS.
If systemd-udevd is started by PID 1 (i.e. basically always), systemd will
kill both udevd and the workers after expiry of TimeoutStopSec. This is
actually better than the built-in udevd timeout, because it's more transparent
and configurable for users. This way users can avoid the mentioned boot problem
by simply increasing StopTimeoutSec= in systemd-udevd.service.
If udevd is not started by systemd (standalone), this is still an
improvement. udevd will kill hanging workers when the event timeout is
reached, which is configurable via the udev.event_timeout= kernel
command line parameter. Before this patch, udevd would simply exit with
workers still running, which would then become zombie processes.
With the timeout removed, the sd_event_now() assertion in manager_exit() can be
dropped.
test-unit-name: add usual headers and add more verbose output
This makes it easier to see what unit_name_is_valid() returns at a glance.
The output is not whitespace clean, but I think it's good enough for a test.
We compile some c++ code for tests. We would simply use the default options for
those. When the previous commit raised the default warning level, we started
getting warnings from c++ code. Let's add the most important options to the c++
command, so that we get a compilation without any warnings again.
I don't think it makes sense to add *all* the options that we add for c to the
c++ flags, because testing them takes quite a while, and the c++ compilations
are for small amounts of code, mostly to check that the headers have compatible
syntax.
Let's bump up the warning level, and not add by -Wextra by hand. This is the
approach recommended by meson. The idea is that all projects should be as
similar as possible to make it easier for users to switch between projects.
With meson-0.52.0-1.module_f31+6771+f5d842eb.noarch I get:
src/test/meson.build:19: WARNING: Overriding previous value of environment variable 'PATH' with a new one
When we're using *prepend*, the whole point is to modify an existing variable,
so meson shouldn't warn. But let's set avoid the warning and shorten things by
setting the final value immediately.
(Yes, Ubuntu should install the UTC timezone data unconditionally: it
should not be an option, even if all other timezone data is excluded,
but since it's our business to validate user input but not out business
to validate distros, let's just accept "UTC" unconditionally, it's magic
after all)
Inspired by https://bugzilla.redhat.com/show_bug.cgi?id=1769299.
This change doesn't solve the issue, but makes it easier to whitelist the
syscall group.
From https://bugzilla.redhat.com/show_bug.cgi?id=1770154:
> utime is an obsolete system call. The current kernel interface is
> utimensat_time64. New 32-bit architectures do not even provide the utime
> system call.
Also add all other *time64 syscalls listed in
https://fedora.juszkiewicz.com.pl/syscalls.html.
Michal Suchanek [Mon, 4 Nov 2019 20:23:15 +0000 (21:23 +0100)]
libblkid: open device in nonblock mode.
When autoclose is set (kernel default but many distributions reverse the
setting) opening a CD-rom device causes the tray to close.
The function of blkid is to report the current state of the device and
not to change it. Hence it should use O_NONBLOCK when opening the
device to avoid closing a CD-rom tray.
blkid is used liberally in scripts so it can potentially interfere with
the user operating the CD-rom hardware.
[kzak@redhat.com: add O_NONBLOCK also to:
- wipefs
- blkid_new_probe_from_filename()
- blkid_evaluate_tag()]
Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Karel Zak <kzak@redhat.com>
(cherry picked from commit 39f5af25982d8b0244000e92a9d0e0e6557d0e17)
Anita Zhang [Thu, 7 Nov 2019 06:25:43 +0000 (22:25 -0800)]
include missing_fcntl.h where needed
f5947a5e925117c55b390460d592f57504277bf9 dropped missing.h and
replaced with the more specific headers but did not add
missing_fcntl.h in places that use O_TMPFILE. This is needed for
some older versions of glibc.
Anita Zhang [Tue, 5 Nov 2019 02:29:55 +0000 (18:29 -0800)]
core: change top-level drop-in from -.service.d to service.d
Discussed in #13743, the -.service semantic conflicts with the
existing root mount and slice names, making this feature not
uniformly extensible to all types. Change the name to be
<type>.d instead.
Updating to this format also extends the top-level dropin to
unit types.
We want users to use Wants, but we'd describe Requires first and ask users to
look for Wants instead. While at it, let's split the wall of text into sensible
paragraphs: syntax first, followed by semantics and longer description, and
finally hints and comparison to other configuration items last.
meson: remove strange dep that causes meson to enter infinite loop
The value is obviously bogus, but didn't seem to cause problems so far.
With meson-0.52.0, it causes a hang. The number of aliases is always rather
small (usually just one or two, possibly up to a dozen in a few cases), so
even if this causes some looping, it is strange that it has such a huge impact.
But let's just remove it.
Fixes #13742.
Tested with meson-0.52.0-1.module_f31+6771+f5d842eb.noarch,
meson-0.51.1-1.fc29.noarch.
Kevin Kuehler [Mon, 4 Nov 2019 22:48:06 +0000 (14:48 -0800)]
systemctl: Add TriggeredBy and Triggers to status
For all units that aren't timers, if it is activated by another unit,
add the triggering unit under the "TriggeredBy:" header. If a unit can
trigger other units, print the units it triggers other the "Triggers:"
header.
Lorenz Bauer [Mon, 4 Nov 2019 16:35:46 +0000 (16:35 +0000)]
journal: refresh cached credentials of stdout streams
journald assumes that getsockopt(SO_PEERCRED) correctly identifies the
process on the remote end of the socket. However, this is incorrect
according to man 7 socket:
The returned credentials are those that were in effect at the
time of the call to connect(2) or socketpair(2).
This becomes a problem when a new process inherits the stdout stream
from a parent. First, log messages from the child process will
be attributed to the parent. Second, the struct ucred used by journald
becomes invalid as soon as the parent exits. Further sendmsg calls then
fail with ENOENT. Logs for the child process then vanish from the journal.
Fix this by using recvmsg on the stdout stream, and refreshing the cached
struct ucred if SCM_CREDENTIALS indicate a new process.
Sebastian Wick [Thu, 31 Oct 2019 13:27:24 +0000 (14:27 +0100)]
hwdb: add XKB_FIXED_MODEL to the keyboard hwdb
Chromebook keyboards have a top row which generates f1-f10 key codes but
the keys have media symbols printed on them. A simple scan code to key
code mapping to the correct media keys makes the f1-f10 inaccessible. To
properly use the keyboard a custom key code to symbol mapping in xbk is
required (a variant of the chromebook xkb model is already upstream).
Other devices have similar problems.
This commit makes it possible to specify which xkb model should be used
for a specific device by setting XKB_FIXED_MODEL.
# LANG=C journalctl --no-pager -u A.service -u B.service -u C.target -b
-- Logs begin at Mon 2019-09-09 00:25:06 EDT, end at Thu 2019-10-24 22:28:47 EDT. --
Oct 24 22:27:47 localhost.localdomain systemd[1]: Starting A...
Oct 24 22:27:47 localhost.localdomain systemd[1]: A.service: Child 967 belongs to A.service.
Oct 24 22:27:47 localhost.localdomain systemd[1]: A.service: Main process exited, code=exited, status=0/SUCCESS
Oct 24 22:27:47 localhost.localdomain systemd[1]: A.service: Running next main command for state start.
Oct 24 22:27:47 localhost.localdomain systemd[1]: A.service: Passing 0 fds to service
Oct 24 22:27:47 localhost.localdomain systemd[1]: A.service: About to execute: /usr/bin/sleep 60
Oct 24 22:27:47 localhost.localdomain systemd[1]: A.service: Forked /usr/bin/sleep as 968
Oct 24 22:27:47 localhost.localdomain systemd[968]: A.service: Executing: /usr/bin/sleep 60
Oct 24 22:27:52 localhost.localdomain systemd[1]: A.service: Trying to enqueue job A.service/reload/replace
Oct 24 22:27:52 localhost.localdomain systemd[1]: A.service: Merged into running job, re-running: A.service/reload as 1288
Oct 24 22:27:52 localhost.localdomain systemd[1]: A.service: Enqueued job A.service/reload as 1288
Oct 24 22:27:52 localhost.localdomain systemd[1]: A.service: Unit cannot be reloaded because it is inactive.
Oct 24 22:27:52 localhost.localdomain systemd[1]: A.service: Job 1288 A.service/reload finished, result=invalid
Oct 24 22:27:52 localhost.localdomain systemd[1]: B.service: Passing 0 fds to service
Oct 24 22:27:52 localhost.localdomain systemd[1]: B.service: About to execute: /usr/bin/echo B
Oct 24 22:27:52 localhost.localdomain systemd[1]: B.service: Forked /usr/bin/echo as 970
Oct 24 22:27:52 localhost.localdomain systemd[970]: B.service: Executing: /usr/bin/echo B
Oct 24 22:27:52 localhost.localdomain systemd[1]: B.service: Failed to send unit change signal for B.service: Connection reset by peer
Oct 24 22:27:52 localhost.localdomain systemd[1]: B.service: Changed dead -> start
Oct 24 22:27:52 localhost.localdomain systemd[1]: Starting B...
Oct 24 22:27:52 localhost.localdomain echo[970]: B
Oct 24 22:27:52 localhost.localdomain systemd[1]: B.service: Child 970 belongs to B.service.
Oct 24 22:27:52 localhost.localdomain systemd[1]: B.service: Main process exited, code=exited, status=0/SUCCESS
Oct 24 22:27:52 localhost.localdomain systemd[1]: B.service: Changed start -> exited
Oct 24 22:27:52 localhost.localdomain systemd[1]: B.service: Job 1371 B.service/start finished, result=done
Oct 24 22:27:52 localhost.localdomain systemd[1]: Started B.
Oct 24 22:27:52 localhost.localdomain systemd[1]: C.target: Job 1287 C.target/start finished, result=done
Oct 24 22:27:52 localhost.localdomain systemd[1]: Reached target C.
Oct 24 22:27:52 localhost.localdomain systemd[1]: C.target: Failed to send unit change signal for C.target: Connection reset by peer
Oct 24 22:28:47 localhost.localdomain systemd[1]: A.service: Child 968 belongs to A.service.
Oct 24 22:28:47 localhost.localdomain systemd[1]: A.service: Main process exited, code=exited, status=0/SUCCESS
Oct 24 22:28:47 localhost.localdomain systemd[1]: A.service: Running next main command for state start.
Oct 24 22:28:47 localhost.localdomain systemd[1]: A.service: Passing 0 fds to service
Oct 24 22:28:47 localhost.localdomain systemd[1]: A.service: About to execute: /usr/bin/echo A2
Oct 24 22:28:47 localhost.localdomain systemd[1]: A.service: Forked /usr/bin/echo as 972
Oct 24 22:28:47 localhost.localdomain systemd[972]: A.service: Executing: /usr/bin/echo A2
Oct 24 22:28:47 localhost.localdomain echo[972]: A2
Oct 24 22:28:47 localhost.localdomain systemd[1]: A.service: Child 972 belongs to A.service.
Oct 24 22:28:47 localhost.localdomain systemd[1]: A.service: Main process exited, code=exited, status=0/SUCCESS
Oct 24 22:28:47 localhost.localdomain systemd[1]: A.service: Changed start -> exited
The issue occurs not only in reload command, i.e.:
Jan Kundrát [Sat, 2 Nov 2019 15:42:01 +0000 (16:42 +0100)]
journalctl: allow running vacuum on remote journals, too
Right now the `systemd-journal-remote` service does not constrain its
resource usage (I just run out of space on my 100GB partition, for
example). This patch does not change that, but it at least makes it
possible to run something like:
Jérémy Rosen [Fri, 1 Nov 2019 23:03:54 +0000 (00:03 +0100)]
allow an empty DefaultInstance= in configuration files
It is currently possible to override the DefaultInstance via drop-ins but
not remove it completely. Allow to do that by specifying an empty
DefaultInstance=