# LANG=C journalctl --no-pager -u A.service -u B.service -u C.target -b
-- Logs begin at Mon 2019-09-09 00:25:06 EDT, end at Thu 2019-10-24 22:28:47 EDT. --
Oct 24 22:27:47 localhost.localdomain systemd[1]: Starting A...
Oct 24 22:27:47 localhost.localdomain systemd[1]: A.service: Child 967 belongs to A.service.
Oct 24 22:27:47 localhost.localdomain systemd[1]: A.service: Main process exited, code=exited, status=0/SUCCESS
Oct 24 22:27:47 localhost.localdomain systemd[1]: A.service: Running next main command for state start.
Oct 24 22:27:47 localhost.localdomain systemd[1]: A.service: Passing 0 fds to service
Oct 24 22:27:47 localhost.localdomain systemd[1]: A.service: About to execute: /usr/bin/sleep 60
Oct 24 22:27:47 localhost.localdomain systemd[1]: A.service: Forked /usr/bin/sleep as 968
Oct 24 22:27:47 localhost.localdomain systemd[968]: A.service: Executing: /usr/bin/sleep 60
Oct 24 22:27:52 localhost.localdomain systemd[1]: A.service: Trying to enqueue job A.service/reload/replace
Oct 24 22:27:52 localhost.localdomain systemd[1]: A.service: Merged into running job, re-running: A.service/reload as 1288
Oct 24 22:27:52 localhost.localdomain systemd[1]: A.service: Enqueued job A.service/reload as 1288
Oct 24 22:27:52 localhost.localdomain systemd[1]: A.service: Unit cannot be reloaded because it is inactive.
Oct 24 22:27:52 localhost.localdomain systemd[1]: A.service: Job 1288 A.service/reload finished, result=invalid
Oct 24 22:27:52 localhost.localdomain systemd[1]: B.service: Passing 0 fds to service
Oct 24 22:27:52 localhost.localdomain systemd[1]: B.service: About to execute: /usr/bin/echo B
Oct 24 22:27:52 localhost.localdomain systemd[1]: B.service: Forked /usr/bin/echo as 970
Oct 24 22:27:52 localhost.localdomain systemd[970]: B.service: Executing: /usr/bin/echo B
Oct 24 22:27:52 localhost.localdomain systemd[1]: B.service: Failed to send unit change signal for B.service: Connection reset by peer
Oct 24 22:27:52 localhost.localdomain systemd[1]: B.service: Changed dead -> start
Oct 24 22:27:52 localhost.localdomain systemd[1]: Starting B...
Oct 24 22:27:52 localhost.localdomain echo[970]: B
Oct 24 22:27:52 localhost.localdomain systemd[1]: B.service: Child 970 belongs to B.service.
Oct 24 22:27:52 localhost.localdomain systemd[1]: B.service: Main process exited, code=exited, status=0/SUCCESS
Oct 24 22:27:52 localhost.localdomain systemd[1]: B.service: Changed start -> exited
Oct 24 22:27:52 localhost.localdomain systemd[1]: B.service: Job 1371 B.service/start finished, result=done
Oct 24 22:27:52 localhost.localdomain systemd[1]: Started B.
Oct 24 22:27:52 localhost.localdomain systemd[1]: C.target: Job 1287 C.target/start finished, result=done
Oct 24 22:27:52 localhost.localdomain systemd[1]: Reached target C.
Oct 24 22:27:52 localhost.localdomain systemd[1]: C.target: Failed to send unit change signal for C.target: Connection reset by peer
Oct 24 22:28:47 localhost.localdomain systemd[1]: A.service: Child 968 belongs to A.service.
Oct 24 22:28:47 localhost.localdomain systemd[1]: A.service: Main process exited, code=exited, status=0/SUCCESS
Oct 24 22:28:47 localhost.localdomain systemd[1]: A.service: Running next main command for state start.
Oct 24 22:28:47 localhost.localdomain systemd[1]: A.service: Passing 0 fds to service
Oct 24 22:28:47 localhost.localdomain systemd[1]: A.service: About to execute: /usr/bin/echo A2
Oct 24 22:28:47 localhost.localdomain systemd[1]: A.service: Forked /usr/bin/echo as 972
Oct 24 22:28:47 localhost.localdomain systemd[972]: A.service: Executing: /usr/bin/echo A2
Oct 24 22:28:47 localhost.localdomain echo[972]: A2
Oct 24 22:28:47 localhost.localdomain systemd[1]: A.service: Child 972 belongs to A.service.
Oct 24 22:28:47 localhost.localdomain systemd[1]: A.service: Main process exited, code=exited, status=0/SUCCESS
Oct 24 22:28:47 localhost.localdomain systemd[1]: A.service: Changed start -> exited
The issue occurs not only in reload command, i.e.:
Michal Sekletár [Wed, 27 Nov 2019 13:27:58 +0000 (14:27 +0100)]
cryptsetup: reduce the chance that we will be OOM killed
cryptsetup introduced optional locking scheme that should serialize
unlocking keyslots which use memory hard key derivation
function (argon2). Using the serialization should prevent OOM situation
in early boot while unlocking encrypted volumes.
Michal Sekletár [Mon, 18 Nov 2019 11:50:11 +0000 (12:50 +0100)]
core: introduce NUMAPolicy and NUMAMask options
Make possible to set NUMA allocation policy for manager. Manager's
policy is by default inherited to all forked off processes. However, it
is possible to override the policy on per-service basis. Currently we
support, these policies: default, prefer, bind, interleave, local.
See man 2 set_mempolicy for details on each policy.
Overall NUMA policy actually consists of two parts. Policy itself and
bitmask representing NUMA nodes where is policy effective. Node mask can
be specified using related option, NUMAMask. Default mask can be
overwritten on per-service level.
shared/cpu-set-util: only force range printing one time
The idea is to have at least one range to make the new format clearly
distinguishable from the old. But it is enough to just do it once.
In particular, in case the affinity would be specified like 0, 2, 4, 6…,
this gives much shorter output.
cpu_set_malloc() was the last user. It doesn't seem useful to keep
it just to save the allocation of a few hundred bytes in a test, so
it is dropped and a fixed maximum is allocated (1024 bytes).
pid1: when reloading configuration, forget old settings
If we had a configuration setting from a configuration file, and it was
removed, we'd still remember the old value, because there's was no mechanism to
"reset" everything, just to assign new values.
Note that the effect of this is limited. For settings that have an "ongoing" effect,
like systemd.confirm_spawn, the new value is simply used. But some settings can only
be set at start.
In particular, CPUAffinity= will be updated if set to a new value, but if
CPUAffinity= is fully removed, it will not be reset, simply because we don't
know what to reset it to. We might have inherited a setting, or we might have
set it ourselves. In principle we could remember the "original" value that was
set when we were executed, but propagate this over reloads and reexecs, but
that would be a lot of work for little gain. So this corner case of removal of
CPUAffinity= is not handled fully, and a reboot is needed to execute the
change. As a work-around, a full mask of CPUAffinity=0-8191 can be specified.
pid1: don't reset setting from /proc/cmdline upon restart
We have settings which may be set on the kernel command line, and also
in /proc/cmdline (for pid1). The settings in /proc/cmdline have higher priority
of course. When a reload was done, we'd reload just the configuration file,
losing the overrides.
So read /proc/cmdline again during reload.
Also, when initially reading the configuration file when program starts,
don't treat any errors as fatal. The configuration done in there doesn't
seem important enough to refuse boot.
This makes the handling of this option match what we do in unit files. I think
consistency is important here. (As it happens, it is the only option in
system.conf that is "non-atomic", i.e. where there's a list of things which can
be split over multiple assignments. All other options are single-valued, so
there's no issue of how to handle multiple assignments.)
The CPU_SET_S api is pretty bad. In particular, it has a parameter for the size
of the array, but operations which take two (CPU_EQUAL_S) or even three arrays
(CPU_{AND,OR,XOR}_S) still take just one size. This means that all arrays must
be of the same size, or buffer overruns will occur. This is exactly what our
code would do, if it received an array of unexpected size over the network.
("Unexpected" here means anything different from what cpu_set_malloc() detects
as the "right" size.)
Let's rework this, and store the size in bytes of the allocated storage area.
The code will now parse any number up to 8191, independently of what the current
kernel supports. This matches the kernel maximum setting for any architecture,
to make things more portable.
fuzz-journal-stream: avoid assertion failure on samples which don't fit in pipe
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=11587.
We had a sample which was large enough that write(2) failed to push all the
data into the pipe, and an assert failed. The code could be changed to use
a loop, but then we'd need to interleave writes and sd_event_run (to process
the journal). I don't think the complexity is worth it — fuzzing works best
if the sample is not too huge anyway. So let's just reject samples above 64k,
and tell oss-fuzz about this limit.
journald: check whether sscanf has changed the value corresponding to %n
It's possible for sscanf to receive strings containing all three fields
and not matching the template at the same time. When this happens the
value of k doesn't change, which basically means that process_audit_string
tries to access memory randomly. Sometimes it works and sometimes it doesn't :-)
See also https://bugzilla.redhat.com/show_bug.cgi?id=1059314.
test: initialize syslog_fd in fuzz-journald-kmsg too
This is a follow-up to 8857fb9beb9dcb that prevents the fuzzer from crashing with
```
==220==ERROR: AddressSanitizer: ABRT on unknown address 0x0000000000dc (pc 0x7ff4953c8428 bp 0x7ffcf66ec290 sp 0x7ffcf66ec128 T0)
SCARINESS: 10 (signal)
#0 0x7ff4953c8427 in gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x35427)
#1 0x7ff4953ca029 in abort (/lib/x86_64-linux-gnu/libc.so.6+0x37029)
#2 0x7ff49666503a in log_assert_failed_realm /work/build/../../src/systemd/src/basic/log.c:805:9
#3 0x7ff496614ecf in safe_close /work/build/../../src/systemd/src/basic/fd-util.c:66:17
#4 0x548806 in server_done /work/build/../../src/systemd/src/journal/journald-server.c:2064:9
#5 0x5349fa in LLVMFuzzerTestOneInput /work/build/../../src/systemd/src/fuzz/fuzz-journald-kmsg.c:26:9
#6 0x592755 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/libfuzzer/FuzzerLoop.cpp:571:15
#7 0x590627 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool*) /src/libfuzzer/FuzzerLoop.cpp:480:3
#8 0x594432 in fuzzer::Fuzzer::MutateAndTestOne() /src/libfuzzer/FuzzerLoop.cpp:708:19
#9 0x5973c6 in fuzzer::Fuzzer::Loop(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /src/libfuzzer/FuzzerLoop.cpp:839:5
#10 0x574541 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:764:6
#11 0x5675fc in main /src/libfuzzer/FuzzerMain.cpp:20:10
#12 0x7ff4953b382f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
#13 0x420f58 in _start (/out/fuzz-journald-kmsg+0x420f58)
```
The function takes a pointer to a random block of memory and
the length of that block. It shouldn't crash every time it sees
a zero byte at the beginning there.
This should help the dev-kmsg fuzzer to keep going.
resolved: allow access to Set*Link and Revert methods through polkit
This matches what is done in networkd very closely. In fact even the
policy descriptions are all identical (with s/network/resolve), except
for the last one:
resolved has org.freedesktop.resolve1.revert while
networkd has org.freedesktop.network1.revert-ntp and
org.freedesktop.network1.revert-dns so the description is a bit different.
This doesn't matter much, but let's just do the loop once and allocate
the populate the result set on the fly. If we find an error, it'll get
cleaned up automatically.
This only affects systemd-resolved. bus_open_system_watch_bind_with_description()
is also used in timesyncd, but it has no methods, only read-only properties, and
in networkd, but it annotates all methods with SD_BUS_VTABLE_UNPRIVILEGED and does
polkit checks.
Fabian Henneke [Wed, 21 Aug 2019 09:17:59 +0000 (11:17 +0200)]
udev: Add id program and rule for FIDO security tokens
Add a fido_id program meant to be run for devices in the hidraw
subsystem via an IMPORT directive. The program parses the HID report
descriptor and assigns the ID_SECURITY_TOKEN environment variable if a
declared usage matches the FIDO_CTAPHID_USAGE declared in the FIDO CTAP
specification. This replaces the previous approach of whitelisting all
known security token models manually.
This commit is accompanied by a test suite and a fuzzer target for the
descriptor parsing routine.
Michal Sekletar [Tue, 26 Feb 2019 16:33:27 +0000 (17:33 +0100)]
selinux: don't log SELINUX_INFO and SELINUX_WARNING messages to audit
Previously we logged even info message from libselinux as USER_AVC's to
audit. For example, setting SELinux to permissive mode generated
following audit message,
This is unnecessary and wrong at the same time. First, kernel already
records audit event that SELinux was switched to permissive mode, also
the type of the message really shouldn't be USER_AVC.
Let's ignore SELINUX_WARNING and SELINUX_INFO and forward to audit only
USER_AVC's and errors as these two libselinux message types have clear
mapping to audit message types.
Andrew Jorgensen [Wed, 25 Jul 2018 15:06:57 +0000 (08:06 -0700)]
shared/sleep-config: exclude zram devices from hibernation candidates
On a host with sufficiently large zram but with no actual swap, logind will
respond to CanHibernate() with yes. With this patch, it will correctly respond
no, unless there are other swap devices to consider.
Frantisek Sumsal [Mon, 21 Oct 2019 16:39:39 +0000 (18:39 +0200)]
test: bump the second partition's size to 50M
The former size (10M) caused systemd-journald to crash with SIGABRT when
used on a LUKS2 partition, as the LUKS2 metadata consume a significant
part of the 10M partition, thus leaving no space for the journal file
itself (relevant for TEST-02-CRYPTSETUP). This change has been present
in upstream for a while anyway.
Frantisek Sumsal [Fri, 15 Mar 2019 09:05:33 +0000 (10:05 +0100)]
test: use PBKDF2 instead of Argon2 in cryptsetup...
to reduce memory requirements for volume manipulation. Also,
to further improve the test performance, reduce number of PBKDF
iterations to 1000 (allowed minimum).
Anita Zhang [Mon, 8 Oct 2018 03:28:36 +0000 (20:28 -0700)]
core: implement per unit journal rate limiting
Add LogRateLimitIntervalSec= and LogRateLimitBurst= options for
services. If provided, these values get passed to the journald
client context, and those values are used in the rate limiting
function in the journal over the the journald.conf values.
Franck Bui [Tue, 19 Mar 2019 09:59:26 +0000 (10:59 +0100)]
core: only watch processes when it's really necessary
If we know that main pid is our child then it's unnecessary to watch all
other processes of a unit since in this case we will get SIGCHLD when the main
process will exit and will act upon accordingly.
So let's watch all processes only if the main process is not our child since in
this case we need to detect when the cgroup will become empty in order to
figure out when the service becomes dead. This is only needed by cgroupv1.
Thanks Renaud Métrich for backporting this to RHEL.
Resolves: #1744972
Franck Bui [Mon, 18 Mar 2019 19:59:36 +0000 (20:59 +0100)]
core: reduce the number of stalled PIDs from the watched processes list when possible
Some PIDs can remain in the watched list even though their processes have
exited since a long time. It can easily happen if the main process of a forking
service manages to spawn a child before the control process exits for example.
However when a pid is about to be mapped to a unit by calling unit_watch_pid(),
the caller usually knows if the pid should belong to this unit exclusively: if
we just forked() off a child, then we can be sure that its PID is otherwise
unused. In this case we take this opportunity to remove any stalled PIDs from
the watched process list.
If we learnt about a PID in any other form (for example via PID file, via
searching, MAINPID= and so on), then we can't assume anything.
Thanks Renaud Métrich for backporting this to RHEL.
Resolves: #1744972
Jan Synacek [Thu, 17 Oct 2019 07:37:35 +0000 (09:37 +0200)]
udev: introduce CONST key name
Currently, there is no way to match against system-wide constants, such
as architecture or virtualization type, without forking helper binaries.
That potentially results in a huge number of spawned processes which
output always the same answer.
This patch introduces a special CONST keyword which takes a hard-coded
string as its key and returns a value assigned to that key. Currently
implemented are CONST{arch} and CONST{virt}, which can be used to match
against the system's architecture and virtualization type.
Michal Sekletar [Tue, 3 Sep 2019 08:05:42 +0000 (10:05 +0200)]
buildsys: don't garbage collect sections while linking
gc-sections is actually very aggressive and garbage collects ELF
sections used by annobin gcc plugin and annocheck then reports gaps in
coverage. Let's drop that linker flag.
core: try to reopen /dev/kmsg again right after mounting /dev
I was debugging stuff during early boot, and was confused that I never
found the logs for it in kmsg. The reason for that was that /proc is
generally not mounted the first time we do log_open() and hence
log_set_target(LOG_TARGET_KMSG) we do when running as PID 1 had not
effect. A lot later during start-up we call log_open() again where this
is fixed (after the point where we close all remaining fds still open),
but in the meantime no logs every got written to kmsg. This patch fixes
that.
ask-password: prevent buffer overrow when reading from keyring
When we read from keyring, a temporary buffer is allocated in order to
determine the size needed for the entire data. However, when zeroing that area,
we use the data size returned by the read instead of the lesser size allocate
for the buffer.
That will cause memory corruption that causes systemd-cryptsetup to crash
either when a single large password is used or when multiple passwords have
already been pushed to the keyring.
kernel-install: do not require non-empty kernel cmdline
When booting with Fedora-Server-dvd-x86_64-30-20190411.n.0.iso,
/proc/cmdline is empty (libvirt, qemu host with bios, not sure if that
matters), after installation to disk, anaconda would "crash" in kernel-core
%posttrans, after calling kernel-install, because dracut would fail
with
> Could not determine the kernel command line parameters.
> Please specify the kernel command line in /etc/kernel/cmdline!
I guess it's legitimate, even if unusual, to have no cmdline parameters.
Two changes are done in this patch:
1. do not fail if the cmdline is empty.
2. if /usr/lib/kernel/cmdline or /etc/kernel/cmdline are present, but
empty, ignore /proc/cmdline. If there's explicit configuration to
have empty cmdline, don't ignore it.
The same change was done in dracut:
https://github.com/dracutdevs/dracut/pull/561.
Frantisek Sumsal [Mon, 14 Oct 2019 13:26:48 +0000 (15:26 +0200)]
travis: move to CentOS 8 docker images
As the CentOS 8 Docker images is finally out, we can use it and drop the
plethora of workarounds we had to implement to compile RHEL8 systemd on
CentOS 7.
man: reorder and add examples to systemd-analyze(1)
The number of verbs supported by systemd-analyze has grown quite a bit, and the
man page has become an unreadable wall of text. Let's put each verb in a
separate subsection, grouping similar verbs together, and add a lot of examples
to guide the user.
Change job mode of manager triggered restarts to JOB_REPLACE
Fixes: #11305 Fixes: #3260
Related: #11456
So, here's what happens in the described scenario in #11305. A unit goes
down, and that triggeres stop jobs for the other two units as they were
bound to it. Now, the timer for manager triggered restarts kicks in and
schedules a restart job with the JOB_FAIL job mode. This means there is
a stop job installed on those units, and now due to them being bound to
us they also get a restart job enqueued. This however is a conflicts, as
neither stop can merge into restart, nor restart into stop. However,
restart should be able to replace stop in any case. If the stop
procedure is ongoing, it can cancel the stop job, install itself, and
then after reaching dead finish and convert itself to a start job.
However, if we increase the timer, then it can always take those units
from inactive -> auto-restart.
We change the job mode to JOB_REPLACE so the restart job cancels the
stop job and installs itself.
Also, the original bug could be worked around by bumping RestartSec= to
avoid the conflicting.
This doesn't seem to be something that is going to break uses. That is
because for those who already had it working, there must have never been
conflicting jobs, as that would result in a desctructive transaction by
virtue of the job mode used.
After this change, the test case is able to work nicely without issues.
Milan Broz [Mon, 27 May 2019 07:44:14 +0000 (09:44 +0200)]
cryptsetup: Add LUKS2 token support.
LUKS2 supports so-called tokens. The libcryptsetup internally
support keyring token (it tries to open device using specified
keyring entry).
Only if all token fails (or are not available), it uses a passphrase.
This patch aligns the functionality with the cryptsetup utility
(cryptsetup luksOpen tries tokens first) but does not replace
the systemd native ask-password function (can be used the same in
combination with this patch).
Milan Broz [Mon, 27 May 2019 07:27:54 +0000 (09:27 +0200)]
cryptsetup: Do not fallback to PLAIN mapping if LUKS data device set fails.
If crypt_load() for LUKS succeeds, we know that it is a LUKS device.
Failure of data device setting should fail in this case; remapping
as a PLAIN device late could mean data corruption.
(If a user wants to map PLAIN device over a device with LUKS header,
it should be said explicitly with "plain" argument type.)
Also, if there is no explicit PLAIN type requested and crypt device
is already initialized (crypt_data_type() is set), do not run
the initialization again.
Apparently this happens IRL. Let's carefully deal with issues like this:
when we overrun, let's not go back to zero but instead leave the highest
cookie bit set. We use that as indication that we are in "overrun
territory", and then are particularly careful with checking cookies,
i.e. that they haven't been used for still outstanding replies yet. This
should retain the quick cookie generation behaviour we used to have, but
permits dealing with overruns.