execute: don't honour PrivateNetwork() if we lack CAP_NET_ADMIN
Somehow the Linux kernel allows us to allocate a network namespace if we
possess CAP_SYS_ADMIN but doesn't allow us to configure it, unless we
also have CAP_NET_ADMIN.
Taking that into consideration let's avoid allocating a network
namespace we cannot even configure "lo" in.
This is common case if nspawn is invoked without userns and without
netns, because in that case it will have CAP_SYS_ADMIN but no
CAP_NET_ADMIN.
This also takes down a notch the messages about the automatic
downgrading. These have been LOG_WARNING so far, and I downgraded them
to LOG_NOTICE, since in an environment where CAP_NET_ADMIN is not
available this is really not something to be concerned about, but still
noticable. With that it's still more priorized than regular LOG_INFO.
Mike Yuan [Tue, 18 Apr 2023 16:09:08 +0000 (00:09 +0800)]
sleep: always write resume_offset if possible
There's no need to conditionalize this.
Setting resume_offset=0 doesn't harm, and can even help
by overriding potentially existing half-written settings.
Frantisek Sumsal [Fri, 23 Jun 2023 12:28:30 +0000 (14:28 +0200)]
test: make sure we get PID1's stack trace on ASan/UBSan errors
As hitting an ASan/UBSan error in PID1 results in a crash (and a kernel
panic when running under qemu), we usually lose the stack trace which
makes debugging quite painful. Let's mitigate this by forwarding the
stack trace to multiple places - namely to a file and the syslog.
async: stop using threads for asynchronous_close()
Let's work towards PID1 being purely single threaded again. Let's rework
asynchronous_close() on top of clone() with CLONE_FILES (so that we
can manipulate PID1's fd table correctly).
This wraps glibc's clone() but deals with the 'stack' parameter in a
sensible way. Only supports invocations without CLONE_VM, i.e. when
child is a CoW copy of parent.
tpm2-util: look for tpm2-pcr-signature.json directly in /.extra/
So far we relied on tmpfiles.d to copy tpm2-pcr-signature.json from
/.extra/ into /run/systemd/. This is racy however if cryptsetup runs too
early, and we cannot unconditionally run it after tmpfiles completed.
hence, let's teach cryptsetup to directly look for the file in /.extra/,
in order to simplify this, and remove the race. But do so only in the
initrd (as only there /.extra/ is a concept).
We generally prefer looking in /run/systemd/, since things are under
user control then. In the regular system we exclusively want that
userspace looks there.
userdbd: when we hit a flood of requests to start more workers, don't exit
Let's tweak what we do if we detect a flood of requests to start more
workers: if none of the workers ever sticks (i.e. the worker count is
zero) then let's just give up, as before.
Otherwise, let's just not start more workers for a while, and do so
again after a while. Thus spawning ofr workers will "cool off" for a
while.
userdbd: drastically raise ratelimit we apply on requests for more worker processes
These requests might come in during lookup floods very quickly, since
multiple worker processes might detect that things should be scaled up
at the same time. Hence, let's substantially raise the limit so that it
doesn't get hit in real-life scenarios and acts more like a safety net.
tests: teach tests boilerplate to run selected tests only
sometimes its useful to only run a specific test (or multiple) instead
of all implemented in a test. Allow the test name(s) to be specified on the
in a $TESTFUNCS env var, separated by colons.
Let's restrict how we apply credential globbing in ImportCredential=, so
that we have some flexibility in automatically extending the glob
expression with per-instance data eventually without getting into
conflict with the globbing parts.
In our current uses we only allow globbing at the end of the expression,
and this is a new, unreleased feature hence let's be restrictive on this
initially. We can still relax this later if we feel the need to after
all.
Yu Watanabe [Wed, 31 May 2023 23:31:25 +0000 (08:31 +0900)]
time-util: introduce usleep_safe()
We use usec_t for storing time value, which is 64bit.
However, usleep() takes useconds_t that is (typically?) 32bit.
Also, usleep() may only support the range [0, 1000000].
Ronan Pigott [Wed, 21 Jun 2023 02:47:47 +0000 (19:47 -0700)]
systemd-analyze: allow --quiet for condition checks
I figure these messages are rather unnecessary, so let the user quiet
them with the existing --quiet flag if desired. Makes systemd-analyze
condition a little more ergonomic in scripts.
Romain Geissler [Tue, 20 Jun 2023 16:06:31 +0000 (16:06 +0000)]
elf-util: discard PT_LOAD segment early based on the start address.
Indeed when iterating over all the PT_LOAD segment of the core dump
while trying to look for the elf headers of a given module, we iterate
over them all and try to use the first one for which we can parse a
package metadata, but the start address is never taken into account,
so absolutely nothing guarantees we actually parse the right ELF header
of the right module we are currently iterating on.
This was tested like this:
- Create a core dump using sleep on a fedora 37 container, with an
explicit LD_PRELOAD of a library having a valid package metadata:
- Then from a fedora 38 container with systemd installed, the resulting
core dump has been passed to systemd-coredump with and without this
patch. Without this patch, we get:
Module /usr/bin/sleep from rpm bash-5.2.15-3.fc38.x86_64
Module /usr/lib64/libtinfo.so.6.3 from rpm coreutils-9.1-8.fc37.x86_64
Module /usr/lib64/libc.so.6 from rpm coreutils-9.1-8.fc37.x86_64
Module /usr/lib64/libreadline.so.8.2 from rpm coreutils-9.1-8.fc37.x86_64
Module /usr/lib64/ld-linux-x86-64.so.2 from rpm coreutils-9.1-8.fc37.x86_64
While with this patch we get:
Module /usr/bin/sleep from rpm bash-5.2.15-3.fc38.x86_64
Module /usr/lib64/libtinfo.so.6.3 from rpm ncurses-6.3-5.20220501.fc37.x86_64
Module /usr/lib64/libreadline.so.8.2 from rpm readline-8.2-2.fc37.x86_64
So the parsed package metadata reported by systemd-coredump when the module
files are not found on the host (ie the case of crash inside a container) are
now correct. The inconsistency of the first module in the above example
(sleep is indeed not provided by the bash package) can be ignored as it
is a consequence of how this was tested.
In addition to this, this also fixes the performance issue of
systemd-coredump in case of the crashing process uses a large number of
shared libraries and having no package metadata, as reported in
https://sourceware.org/pipermail/elfutils-devel/2023q2/006225.html.
Daan De Meyer [Tue, 6 Jun 2023 15:44:09 +0000 (17:44 +0200)]
core: Add RootEphemeral= setting
This setting allows services to run in an ephemeral copy of the root
directory or root image. To make sure the ephemeral copies are always
cleaned up, we add a tmpfiles snippet to unconditionally clean up
/var/lib/systemd/ephemeral. To prevent in use ephemeral copies from
being cleaned up by tmpfiles, we use the newly added COPY_LOCK_BSD
and BTRFS_SNAPSHOT_LOCK_BSD flags to take a BSD lock on the ephemeral
copies which instruct tmpfiles to not touch those ephemeral copies as
long as the BSD lock is held.
It's highly interesting to see if tools such as systemd-sysupdate
consider a version valid, hence let's output that too (though
gracefully, not fatally)
string-util: move version_is_valid() into generic code
While we are at it, replace the sloppy use of filename_is_valid() by the
less sloppy filename_part_is_valid() (as added by the preceeding
commit), since we don#t want to be too restrictive here. (After all,
version strings invalid as standalone filenames might be valid as part
of filenames, and hence we should allow them).
Add a helper filename_part_is_valid() which does half of what
filename_is_valid() does: it checks for valid chars and length, but does
not filter out ".", ".." and "", as these are OK as parts of filenames,
just not alone.