systemctl: explicitly cast the constants to uint64_t
Otherwise under certain conditions `va_arg()` might get garbage instead
of the expected value, i.e.:
$ sudo build-o0/systemctl disable asdfasfaf
sd_bus_message_appendv: Got uint64_t: 0
Failed to disable unit: Unit file asdfasfaf.service does not exist.
Fixes: #21787
(Strictly speaking, this leaves a race window open: the the system is
powered off in the short interval when we linked in the prepared hwdb
file into the dir under a temporary name and are about to rename it to
the final name, then the file might be left over after all. But this
minimizes the window so much that this shouldn't be an issue in
real-life. Key after all is that with this change we'll build up the
hwdb file under O_TMPFILE, and thus are robust to power loss during the
slow operation)
Yu Watanabe [Fri, 3 Mar 2023 07:00:59 +0000 (16:00 +0900)]
time-util: refuse non-zero gmtoff with non-UTC timezone
Also this moves the range check for gmtoff to parse_timestamp_impl(), to
address the post-merge comment:
https://github.com/systemd/systemd/pull/26409#discussion_r1118650190
Yu Watanabe [Fri, 3 Mar 2023 06:51:56 +0000 (15:51 +0900)]
time-util: rename len -> tz_offset
And merge parse_timestamp_with_tz() with parse_timestamp_impl().
Addresses the post-merge comment:
https://github.com/systemd/systemd/pull/26409#discussion_r1118647440
nspawn: disable propagation for selected host API bind mounts
We bind mount two selected inodes from the host into our container.
Let's turn off propagation for that, since we just want those inodes,
nothing else.
With this change "grep master: /proc/self/mountinfo" should list only
the mount propagation "tunnel" dir, and nothing else anymore.
nspawn: disconnect mounts propagation from host on our container dir
@brauner noticed that in invoked containers the root directory is set to
still receive mounts from the host. We should disable that, and
guarantee we live in our own world, because that's what an
(nspawn-style) container *is* after all: a whole new world.
This hence mounts the container subtree to MS_PRIVATE after getting the
root dir in place. Note that this will later be set to MS_SHARED again.
The MS_PRIVATE disconnects mounts from the host, the MS_SHARED then
establishes a new peer group for mount propagation events, so that
payload service managers (such as systemd) can take benefit of
propagation further down the tree.
Michal Koutný [Wed, 1 Mar 2023 21:54:06 +0000 (22:54 +0100)]
meson: Copy files with git only in true git repository
When mkosi is run from git-worktree(1), the .git is not a repository
directory but a textfile pointing to the real git dir
(e.g. /home/user/systemd/.git/worktrees/systemd-worktree). This git dir
is not bind mounted into build environment and it fails with:
> fatal: not a git repository: /home/user/systemd/.git/worktrees/systemd-worktree
> test/meson.build:190:16: ERROR: Command `/usr/bin/env -u GIT_WORK_TREE /usr/bin/git --git-dir=/root/src/.git ls-files ':/test/dmidecode-dumps/*.bin'` failed with status 128.
There is already a fallback to use shell globbing instead of ls-files,
use it with git worktrees as well.
msizanoen1 [Wed, 1 Mar 2023 10:35:17 +0000 (17:35 +0700)]
escape: Ensure that output is always valid UTF-8
This ensures that shell string escape operations will not produce output
with invalid UTF-8 from the input by escaping invalid UTF-8 data as if
they were single byte characters.
This adds a test for checking we can safely order boot IDs via the
timestamp of their most recent known entry. It takes a set of journal
files (supplied by a user) and that are partially corrupted, and ensures
we get a clear, defined order of boot IDs out of it.
journal: use boot-id/timestamp info for odering entries
With this we should be able to determine on systems without
battery-backed RTC even during early boot whether a boot is supposed to
be earlier than another.
sd-journal: track newest open journal file per boot ID
This is useful to later order boot IDs by time, addressing #662.
Basically, this determines the most recently written for each boot ID
from all currently open journal files. This is then stored in a hash
table (which maps the boot ID to a prioq of journal files, ordered by
their timestamp).
Why is this useful? If systems lack a battery-buffered RTC they will
initially have a system clock basically starting at zero. Later they
might acquire an NTP fix, or at least roughly monotonic time via a
stored timestamp. Thus, log entries written early during boot tend to be
badly timestamped, and those written most recently are likely to have
most accurate timestamps. Thus, if we track the newest entry for each
boot ID we likely can order the boot ID via their timestamps.
This commit only add the logic to maintain the hash table/prioq. It
doesn't actually make use of this information for ordering yet. A later
patch adds that.
test: skip the hwdb update related tests w/ sanitizers and w/o accel
systemd-hwdb update is an expensive operation by itself, and when
running with sanitizers and in a VM without acceleration this cost is
exacerbated even further, making the test run for a very long time.
For example, in the daily CentOS CI ppc64le job with ASan+UBSan one
systemd-hwdb update takes more than 7 minutes; in the regular Arch job
with KVM it takes over 2 minutes.
Since the hwdb update is also tested in other places (like
TEST-01-BASIC and the test-hwdb meson test), let's skip it if we detect
we run with sanitizers and with plain QEMU.
Since quite a while the propagation from the DDI arch into the
personality() wasn't hooked up anymore. Let's fix that: when the DDI has
a determined arch, automatically propagate this into the personality.
units: let systemd --user manage its own memory pressure handling
Let's make things systematic: the per-user and the per-system manager
should manage their own memory pressure, as they are, well, managers of
things.
This is particularly relevant and the per-user service manager should
watch its own "init.scope" subcgroup, instead of the main service unit
cgroup, and hence $MEMORY_PRESSURE_WATCH as set by the per-system
service manager would simply be wrong.
pam_systemd: process the two new capabilities user records fields in pam_systemd
And also: by default, for the systemd-user service and for local
sessions (i.e. those assigned to a seat): let's imply CAP_WAKE_SYSTEM
for them by default. Yes, let's pass one specific capability by default to local
unprivileged users.
The capability services exactly once purpose: to allow system wake-up
from suspend via alarm clocks, hence is relatively limited in focus. By
adding this tools such as GNOME's Alarm Clock app can simply allocate a
CLOCK_REALTIME_ALARM (or ask systemd --user to do this) timer and it
will wake up the system as necessary.
Note that systemd --user will not pass the ambient caps on by default,
so even with this change, individual services need to use
AmbientCapabilities= to pass this on to the individual programs.
mkosi: add some really basic tools to default mkosi image
"passwd" and "pscap" are extremely useful to debug basic OS behaviour,
and tiny. So let's add them to our default development images, just to
save us some headaches.
sd-event: handle kernels that set CONFIG_PSI_DEFAULT_DISABLED more gracefully
If CONFIG_PSI_DEFAULT_DISABLED is set in the kernel, then the PSI files
will be there, and you can open them, but read()/write() will fail.
Which is terrible, since that happens so late. But anyway, handle this
gracefully.
journald: remove triplicate logging about failure to write log lines
Let's log exactly at one place about failed writing of log lines to
journal file: in shall_try_append_again().
Then, if we decide to suppress a retry-after-vacuum because we already
vacuumed anyway then say this explicitly as "supressed rotation",
because that's what we do here.
This removes triplicate logging about the same error, and logs exactly
once, plus optional one "suppressed rotation" message. (plus more debug
output). The triplicate logging was bad in particular because it had no
understanding of the actual error codes and just showed generic UNIX
error strings ("Not a XENIX named type file"). By relying on
shall_try_append_again() to do all logging we now get very clean error
strings for all conditions.
journald: always pass error code to logging function, even if we don't use it with %m
We always want to pass the error code along with the log call, so that
it can add it to structured logging, even if the format string does not
contain %m.
journald: downgrade various log messages from LOG_WARNING to LOG_INFO
None of these conditions are real issues, but they can simply happen
because we just swtched from /run to /var as backend for logging and
there are old files from different boots with different systemd versions
and so on.
Let's not make more noise than necessary: still log, but not consider it
a warning, but just some normal thing.
We are handling these issues safely after all: by rotating and starting
anew, i.e. there's no reason to be concerned.
Reported-by: Alexey Gladkov <legion@kernel.org> Fixes: 17d97d4c90f8 ("udev: create disk/by-diskseq symlink only when the device has diskseq") Fixes: 583dc6d933d8 ("udev: also create partition /dev/disk/by-diskseq/ symlinks")
We should check alignment *after* determining the pointer points into
our pool, not before. Otherwise might might end up checking alignment of
the pointer relative to our base, even though it is taken relative to
some other base.