nspawn: disable propagation for selected host API bind mounts
We bind mount two selected inodes from the host into our container.
Let's turn off propagation for that, since we just want those inodes,
nothing else.
With this change "grep master: /proc/self/mountinfo" should list only
the mount propagation "tunnel" dir, and nothing else anymore.
nspawn: disconnect mounts propagation from host on our container dir
@brauner noticed that in invoked containers the root directory is set to
still receive mounts from the host. We should disable that, and
guarantee we live in our own world, because that's what an
(nspawn-style) container *is* after all: a whole new world.
This hence mounts the container subtree to MS_PRIVATE after getting the
root dir in place. Note that this will later be set to MS_SHARED again.
The MS_PRIVATE disconnects mounts from the host, the MS_SHARED then
establishes a new peer group for mount propagation events, so that
payload service managers (such as systemd) can take benefit of
propagation further down the tree.
Michal Koutný [Wed, 1 Mar 2023 21:54:06 +0000 (22:54 +0100)]
meson: Copy files with git only in true git repository
When mkosi is run from git-worktree(1), the .git is not a repository
directory but a textfile pointing to the real git dir
(e.g. /home/user/systemd/.git/worktrees/systemd-worktree). This git dir
is not bind mounted into build environment and it fails with:
> fatal: not a git repository: /home/user/systemd/.git/worktrees/systemd-worktree
> test/meson.build:190:16: ERROR: Command `/usr/bin/env -u GIT_WORK_TREE /usr/bin/git --git-dir=/root/src/.git ls-files ':/test/dmidecode-dumps/*.bin'` failed with status 128.
There is already a fallback to use shell globbing instead of ls-files,
use it with git worktrees as well.
msizanoen1 [Wed, 1 Mar 2023 10:35:17 +0000 (17:35 +0700)]
escape: Ensure that output is always valid UTF-8
This ensures that shell string escape operations will not produce output
with invalid UTF-8 from the input by escaping invalid UTF-8 data as if
they were single byte characters.
test: skip the hwdb update related tests w/ sanitizers and w/o accel
systemd-hwdb update is an expensive operation by itself, and when
running with sanitizers and in a VM without acceleration this cost is
exacerbated even further, making the test run for a very long time.
For example, in the daily CentOS CI ppc64le job with ASan+UBSan one
systemd-hwdb update takes more than 7 minutes; in the regular Arch job
with KVM it takes over 2 minutes.
Since the hwdb update is also tested in other places (like
TEST-01-BASIC and the test-hwdb meson test), let's skip it if we detect
we run with sanitizers and with plain QEMU.
Since quite a while the propagation from the DDI arch into the
personality() wasn't hooked up anymore. Let's fix that: when the DDI has
a determined arch, automatically propagate this into the personality.
units: let systemd --user manage its own memory pressure handling
Let's make things systematic: the per-user and the per-system manager
should manage their own memory pressure, as they are, well, managers of
things.
This is particularly relevant and the per-user service manager should
watch its own "init.scope" subcgroup, instead of the main service unit
cgroup, and hence $MEMORY_PRESSURE_WATCH as set by the per-system
service manager would simply be wrong.
pam_systemd: process the two new capabilities user records fields in pam_systemd
And also: by default, for the systemd-user service and for local
sessions (i.e. those assigned to a seat): let's imply CAP_WAKE_SYSTEM
for them by default. Yes, let's pass one specific capability by default to local
unprivileged users.
The capability services exactly once purpose: to allow system wake-up
from suspend via alarm clocks, hence is relatively limited in focus. By
adding this tools such as GNOME's Alarm Clock app can simply allocate a
CLOCK_REALTIME_ALARM (or ask systemd --user to do this) timer and it
will wake up the system as necessary.
Note that systemd --user will not pass the ambient caps on by default,
so even with this change, individual services need to use
AmbientCapabilities= to pass this on to the individual programs.
mkosi: add some really basic tools to default mkosi image
"passwd" and "pscap" are extremely useful to debug basic OS behaviour,
and tiny. So let's add them to our default development images, just to
save us some headaches.
sd-event: handle kernels that set CONFIG_PSI_DEFAULT_DISABLED more gracefully
If CONFIG_PSI_DEFAULT_DISABLED is set in the kernel, then the PSI files
will be there, and you can open them, but read()/write() will fail.
Which is terrible, since that happens so late. But anyway, handle this
gracefully.
journald: remove triplicate logging about failure to write log lines
Let's log exactly at one place about failed writing of log lines to
journal file: in shall_try_append_again().
Then, if we decide to suppress a retry-after-vacuum because we already
vacuumed anyway then say this explicitly as "supressed rotation",
because that's what we do here.
This removes triplicate logging about the same error, and logs exactly
once, plus optional one "suppressed rotation" message. (plus more debug
output). The triplicate logging was bad in particular because it had no
understanding of the actual error codes and just showed generic UNIX
error strings ("Not a XENIX named type file"). By relying on
shall_try_append_again() to do all logging we now get very clean error
strings for all conditions.
journald: always pass error code to logging function, even if we don't use it with %m
We always want to pass the error code along with the log call, so that
it can add it to structured logging, even if the format string does not
contain %m.
journald: downgrade various log messages from LOG_WARNING to LOG_INFO
None of these conditions are real issues, but they can simply happen
because we just swtched from /run to /var as backend for logging and
there are old files from different boots with different systemd versions
and so on.
Let's not make more noise than necessary: still log, but not consider it
a warning, but just some normal thing.
We are handling these issues safely after all: by rotating and starting
anew, i.e. there's no reason to be concerned.
Reported-by: Alexey Gladkov <legion@kernel.org> Fixes: 17d97d4c90f8 ("udev: create disk/by-diskseq symlink only when the device has diskseq") Fixes: 583dc6d933d8 ("udev: also create partition /dev/disk/by-diskseq/ symlinks")
We should check alignment *after* determining the pointer points into
our pool, not before. Otherwise might might end up checking alignment of
the pointer relative to our base, even though it is taken relative to
some other base.
libsystemd: sd_journal_get_seqnum() must be tagged with 254 symver, not 253
This is a follow-up for b1712fabd1702640b04b0acdbba2d78294313a4d which
was prepped for 253, but merged into early 254 development cycle. It
thus had the symbol it adds at the wrong symver. Fix thta.
Yu Watanabe [Mon, 13 Feb 2023 18:39:15 +0000 (03:39 +0900)]
time-util: make parse_timestamp() use the RFC-822/ISO 8601 standard timezone spec
If the timezone is specified with a number e.g. +0900 or so, then
let's parse the time as a UTC, and adjust it with the specified time
shift.
Otherwise, if an area has timezone change, e.g.
---
Africa/Casablanca Sun Jun 17 01:59:59 2018 UT = Sun Jun 17 01:59:59 2018 +00 isdst=0 gmtoff=0
Africa/Casablanca Sun Jun 17 02:00:00 2018 UT = Sun Jun 17 03:00:00 2018 +01 isdst=1 gmtoff=3600
Africa/Casablanca Sun Oct 28 01:59:59 2018 UT = Sun Oct 28 02:59:59 2018 +01 isdst=1 gmtoff=3600
Africa/Casablanca Sun Oct 28 02:00:00 2018 UT = Sun Oct 28 03:00:00 2018 +01 isdst=0 gmtoff=3600
---
then we could not determine isdst from the timezone (+01 in the above)
and mktime() will provide wrong results.
shared: move cg_set_access() declaration to right header file
This function was moved from cgroup-util.c to cgroup-setup.c a while
back, but the prototype in the matching header files wasn't migrated.
Let's fix that.