Michal Koutný [Fri, 1 May 2020 16:38:10 +0000 (18:38 +0200)]
test: Fix build with !HAVE_LZ4 && HAVE_XZ
HUGE_SIZE was defined inconsistently.
> In file included from ../src/basic/alloc-util.h:9,
> from ../src/journal/test-compress.c:9:
> ../src/journal/test-compress.c: In function ‘main’:
> ../src/journal/test-compress.c:280:33: error: ‘HUGE_SIZE’ undeclared (first use in this function)
> 280 | assert_se(huge = malloc(HUGE_SIZE));
man: sd_notify() race is gone with sd_notify_barrier()
Add note for change of behaviour in systemd-notify, where parent pid trick
is only used when --no-block is passed, and with enough privileges ofcourse.
This adds the sd_notify_barrier function, to allow users to synchronize against
the reception of sd_notify(3) status messages. It acts as a synchronization
point, and a successful return gurantees that all previous messages have been
consumed by the manager. This can be used to eliminate race conditions where
the sending process exits too early for systemd to associate its PID to a
cgroup and attribute the status message to a unit correctly.
systemd-notify now uses this function for proper notification delivery and be
useful for NotifyAccess=all units again in user mode, or in cases where it
doesn't have a control process as parent.
Dan Streetman [Sun, 26 Apr 2020 15:19:55 +0000 (11:19 -0400)]
test: find path for systemd-journal-remote
As Debian/Ubuntu use /lib/systemd instead of /usr/lib/systemd,
add systemd-journal-remote to the list of programs that test-functions
detects the correct path to, and replace its direct usage with
$SYSTEMD_JOURNAL_REMOTE
Also use $JOURNALCTL instead of journalctl.
Also minor correction in install_plymouth() to look in /lib/... as
well as /usr/lib/... and /etc/...
A service can specify FDSTORE=1 FDPOLL=0 to request that PID1 does not
poll the fd to remove them on error. If set, fds will only be removed on
FDSTOREREMOVE=1 or when the service is done.
Michal Sekletár [Wed, 29 Apr 2020 15:53:43 +0000 (17:53 +0200)]
core: introduce support for cgroup freezer
With cgroup v2 the cgroup freezer is implemented as a cgroup
attribute called cgroup.freeze. cgroup can be frozen by writing "1"
to the file and kernel will send us a notification through
"cgroup.events" after the operation is finished and processes in the
cgroup entered quiescent state, i.e. they are not scheduled to
run. Writing "0" to the attribute file does the inverse and process
execution is resumed.
This commit exposes above low-level functionality through systemd's DBus
API. Each unit type must provide specialized implementation for these
methods, otherwise, we return an error. So far only service, scope, and
slice unit types provide the support. It is possible to check if a
given unit has the support using CanFreeze() DBus property.
Note that DBus API has a synchronous behavior and we dispatch the reply
to freeze/thaw requests only after the kernel has notified us that
requested operation was completed.
efi: cache test results of boolean EFI state functions
EFI variable access is nowadays subject to rate limiting by the kernel.
Thus, let's cache the results of checking them, in order to minimize how
often we access them.
link: Allow configuring RX mini and jumbo ring sizes, too
This now covers all ethtool_ringparam configurables (as of v5.6;
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/ethtool.h?h=v5.6#n488)
Callers of cg_get_keyed_attribute_full() can now specify via the flag whether the
missing keyes in cgroup attribute file are OK or not. Also the wrappers for both
strict and graceful version are provided.
selinux: do preprocessor check only in selinux-access.c
This has the advantage that mac_selinux_access_check() can be used as a
function in all contexts. For example, parameters passed to it won't be
reported as unused if the "function" call is replaced with 0 on SELinux
disabled builds.
Let's allow more memory to be locked on beefy machines than on small
ones. The previous limit of 64M is the lower bound still. This
effectively means on a 4GB machine we can lock 512M, which should be
more than enough, but still not lock up the machine entirely under
pressure.
Revert "detect-virt: also detect "microsoft" as WSL"
WSL2 will soon (TM) include the "WSL2" string in /proc/sys/kernel/osrelease
so the workaround will no longer be necessary.
We have several different cloud images which do include the "microsoft"
string already, which would break this detection. They are for internal
usage at the moment, but the userspace side can come from all over the
place so it would be quite hard to track and downstream-patch to avoid
breakages.
By using a newline after executable( and run_target(, we get less
indentation and the indentation level does not change when the returned
object is saved to a variable.
On my laptop (Lenovo X1carbo 4th) I very occasionally see test-boot-timestamps
fail with this tb:
262/494 test-boot-timestamps FAIL 0.7348453998565674 s (killed by signal 6 SIGABRT)
08:12:48 SYSTEMD_LANGUAGE_FALLBACK_MAP='/home/zbyszek/src/systemd/src/locale/language-fallback-map' SYSTEMD_KBD_MODEL_MAP='/home/zbyszek/src/systemd/src/locale/kbd-model-map' PATH='/home/zbyszek/src/systemd/build:/home/zbyszek/.local/bin:/usr/lib64/qt-3.3/bin:/usr/share/Modules/bin:/usr/condabin:/usr/lib64/ccache:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/home/zbyszek/bin:/var/lib/snapd/snap/bin' /home/zbyszek/src/systemd/build/test-boot-timestamps
--- stderr ---
Failed to read $container of PID 1, ignoring: Permission denied
Found container virtualization none.
Failed to get SystemdOptions EFI variable, ignoring: Interrupted system call
Failed to read ACPI FPDT: Permission denied
Failed to read LoaderTimeInitUSec: Interrupted system call
Failed to read EFI loader data: Interrupted system call
Assertion 'q >= 0' failed at src/test/test-boot-timestamps.c:84, function main(). Aborting.
Normally it takes ~0.02s, but here there's a slowdown to 0.73 and things fail with EINTR.
This happens only occasionally, and I haven't been able to capture a strace.
It would be to ignore that case in test-boot-timestamps or always translate
EINTR to -ENODATA. Nevertheless, I think it's better to retry, since this gives
as more resilient behaviour and avoids a transient failure.
The "preset" column introduced in b01c1f305c044a381ad110709a62507d74bf6d86 breaks zsh completion for
systemctl disable/enable. Fix by ignoring everything after the last
space in a line.
Don't assume that 4MB can be allocated from stack since there could be smaller
DefaultLimitSTACK= in force, so let's use malloc(). NUL terminate the huge
strings by hand, also ensure termination in test_lz4_decompress_partial() and
optimize the memset() for the string.
Some items in /proc and /etc may not be accessible to poor unprivileged users
due to e.g. SELinux, BOFH or both, so check for EACCES and EPERM.
/var/tmp may be a symlink to /tmp and then path_compare() will always fail, so
let's stick to /tmp like elsewhere.
/tmp may be mounted with noexec option and then trying to execute scripts from
there would fail.
Detect and warn if seccomp is already in use, which could make seccomp test
fail if the syscalls are already blocked.
Unset $TMPDIR so it will not break specifier tests where %T is assumed to be
/tmp and %V /var/tmp.
Dan Streetman [Sat, 21 Mar 2020 14:59:42 +0000 (10:59 -0400)]
test-cgroup: skip if /sys/fs/cgroup unknown fs
It's not always mounted, e.g. during the build-time tests, it's running inside
a chroot (that's how Debian/Ubuntu build packages, in chroots) so this test
always fails because /sys/fs/cgroup isn't mounted.
The submit phase of the Fuzzit Travis job has been spuriously failing
for some time with various (and usually pretty hidden) errors, like:
```
./fuzzit create job --type regression ...
2020/04/23 17:02:12 please set env variable FUZZIT_API_KEY or pass --api-key. API Key for you account: ...
```
man: add a description of handling of single-label names
It turns out that our man page didn't describe the handling of single-label
names almost at all. This probably adds to the confusion regarding the subject.
So let's first describe what our current implementation is doing.
Quoting https://www.iab.org/documents/correspondence-reports-documents/2013-2/iab-statement-dotless-domains-considered-harmful/:
> Applications and platforms that apply a suffix search list to a single-label
> name are in conformance with IETF standards track RFCs. Furthermore,
> applications and platforms that do not query DNS for a TLD are in conformance
> with IETF standards track recommendations
Current behaviour is in line with that recommendation.
We unregister binfmt_misc twice during shutdown with this change:
1. A previous commit added support for doing that in the final shutdown
phase, i.e. when we do the aggressive umount loop. This is the robust
thing to do, in case the earlier ("clean") shutdown phase didn't work
for some reason.
2. This commit adds support for doing that when systemd-binfmt.service
is stopped. This is a good idea so that people can order mounts
before the service if they want to register binaries from such
mounts, as in that case we'll undo the registration on shutdown
again, before unmounting those mounts.
And all that, just because of that weird "F" flag the kernel introduced
that can pin files...
Let's just copy out the bit of the string we need, and let's make sure
we refuse rules called "status" and "register", since those are special
files in binfmt_misc's file system.
shutdown: unregister all binfmt_misc entries before entering shutdown loop
Apparently if the new "F" flag is used they might pin files, which
blocks us from unmounting things. Let's hence clear this up explicitly.
Before entering our umount loop.
tmpfiles: if we get ENOENT when opening /proc/self/fd/, check if /proc is mounted
let's return ENOSYS in that case, to make things a bit less confusng.
Previously we'd just propagate ENOENT, which people might mistake as
applying to the object being modified rather than /proc/ just not being
there.
Let's return ENOSYS instead, i.e. an error clearly indicating that some
kernel API is not available. This hopefully should put people on a
better track.
Note that we only do the procfs check in the error path, which hopefully
means it's the less likely path.
We probably can add similar bits to more suitable codepaths dealing with
/proc/self/fd, but for now, let's pick to the ones noticed in #14745.