Adrian Vovk [Sat, 13 Jan 2024 16:08:12 +0000 (11:08 -0500)]
basic: Add some sha256 helper functions
Adds a util function to sha256 an open fd (moved from dissect). Also
adds functions to check if a string contains a valid sha256 hash, and
parse it into a sha256 array.
Adrian Vovk [Wed, 17 Jan 2024 19:48:45 +0000 (14:48 -0500)]
fundamental: Add overflow-safe math helpers
ADD_SAFE/SUB_SAFE/MUL_SAFE do addition/subtraction/multiplication
respectively with an overflow check. If an overflow occurs these return
false, otherwise true. Example: (c = a + b) would become ADD_SAFE(&c, a,
b)
INC_SAFE/DEC_SAFE/MUL_ASSIGN_SAFE are like above but they also reassign
the first argument. Example: (a += b) would become INC_SAFE(&a, b)
Also update tools/meson-vcs-tag.sh to use carets instead of hyphens
for the git part of the version as carets are allowed to be part of
a version by pacman while hyphens are not and both sort higher than
a version without the git part.
Ondrej Kozina [Wed, 31 Jan 2024 12:11:21 +0000 (13:11 +0100)]
cryptsetup: Add optional support for linking volume key in keyring.
cryptsetup 2.7.0 adds feature to link effective volume key in custom
kernel keyring during device activation. It can be used later to pass
linked volume key to other services.
For example: kdump enabled systems installed on LUKS2 device.
This feature allows it to store volume key linked in a kernel keyring
to the kdump reserved memory and reuse it to reactivate LUKS2 device
in case of kernel crash.
cunshunxia [Wed, 24 Jan 2024 03:23:19 +0000 (11:23 +0800)]
Fix OOMPolicy= version in manpage of systemd.scope
OOMPolicy in scope units is separately supported in
version v253, so I think it cannot be directly used
in the manpage with the version from the service.
Franck Bui [Thu, 8 Feb 2024 15:12:41 +0000 (16:12 +0100)]
test/test-shutdown.py: optionally display the test I/Os in a dedicated log file
Given that the test involves screen(1), sending various control sequences to
resize/clear the screen, most of the logs sent from the python script were
nearly impossible to read or mixed with other messages sent to the console
hence making the debug harder when the test is run manually.
This patch introduces an option to redirect the pexpect IOs into a file (to be
used in $STATEDIR/TEST-69-SHUTDOWN/run-nspawn).
The pexpect logs are also enabled later so the boot logs are skipped since
those are already included in the journal.
btrfs-util: apparently btrfs ioctls return unaligned data. deal with it.
Kinda sad, that interfaces like this exist in 2024. But let's deal with
it: before we access "struct btrfs_ioctl_search_header" let's copy it
out, and access it only in the aligned copy.
btrfs-util: use memdup_suffix0() instead of strndup() at one more place
The structure we copy this out is a large (unaligned) binary blob, hence
let's better use the memdup_suffix0() so that gcc doesn't make
assumption about the source being a valid string.
missing: change our close_range() syscall wrapper to map glibc's
So glibc exposes a close_range() syscall wrapper now, but they decided
to use "unsigned" as type for the fds. Which is a bit weird, because fds
are universally understood to be "int". The kernel internally uses
"unsigned", both for close() and for close_range(), but weirdly,
userspace didn't fix that for close_range() unlike what they did for
close()... Weird.
But anyway, let's follow suit, and make our wrapper match glibc's.
The idea is simple: skip the final operation that creates or removes things
or changes the attributes, but otherwise go through the rest of the code.
This results in quite a lot of fairly repetitive conditions in the low-level
code. Another approach would be to print earlier, at a higher level, but then
we'd have less precise information about what is about to happen.
Michal Koutný [Fri, 9 Feb 2024 15:03:00 +0000 (16:03 +0100)]
service: Demote log level of NotifyAccess= messages to debug
The situation is a service like
Type=notify
NotifyAccess=main
and the service uses some of the systemd helper utilities, e.g.
coredumpctl. The service process will pass NOTIFY_SOCKET to the helper
child (accidentally) and the result is a spurious notification and
the warning message:
> Jan 18 09:38:01 host systemd[1]: sdnotify.service: Got notification message from PID 13736, but reception only permitted for main PID 13549
Notification from helpers seem like an unintentional composition of the
commit c118b577fa ("coredumpctl: define main through macro") and commit 6b636c2d27 ("main-func: send main exit code to parent via sd_notify() on
exit"). The former used the handy macro for a main function, the latter
equipped any main function with the notification. (Further extended in
the commit 623a00020f ("notify: Add EXIT_STATUS field").)
Since notification from systemd utitilities are meant to extend
rudimentary exit()/wait() pair generally, they may happen to land into
service's NOTIFY_SOCKET. Tone down messages of notification that won't
match NotifyAccess=.
For the other verbs turning off JSON mode makes sense, but for "call"
not so much, after all the contents of a method call reply is JSON we
couldn't really show any other way.
Hence, when JSON output was not configured otherwise in "call", default
to the same as -j.
It exposes the varlink_collect() call we internally provide: it collects
all responses of a method call that is issued with the "more" method
call flag. It then returns the result as a single JSON array.
This reworks varlink_collect() so that it is not just a wrapper around
varlink_observe(), varlink_bind_reply() and others. It becomes a first
class operation.
This has various benefits:
1. Memory management is normalized: the reply json variant is now
tracked as part of the varlink object, and thus we do not pass
ownership to the caller. This is just like we do it for simple method
calls and removes a lot of confusion.
2. The bind reply/user data pointer can be used for user stuff, we'll
not silently override this.
3. We enforce an overall time-out operation on the whole thing, so that
this synchronous operation does no longer block forever.
units: enable MaxConnectionsPerSocket= for all our Accept=yes units
Let's make sure that user's cannot DoS services for other users so
easily, and enable MaxConnectionsPerSocket= by default for all of them.
Note that this is mostly paranoia for systemd-pcrextend.socket and
systemd-sysext.socket: the socket is only accessible to root anyway,
hence the accounting shouldn#t change anything. But this is just a
safety net, in preparation that we open up some functionality of these
services sooner or later.
pid1: make MaxConnectionsPerSource= also work for AF_UNIX sockets
The setting currently puts limits on connections per IP address and
AF_UNIX CID. Let's extend it to cover AF_UNIX too, where it puts a limit
on connections per UID.
This is particularly useful for the various Accept=yes Varlink services
we now have, as it means, the number of per-user instance services
cannot grow without bounds.
Eric Daigle [Fri, 9 Feb 2024 07:09:34 +0000 (23:09 -0800)]
firstboot: validate keymap entry
As described in #30940, systemd-firstboot currently does not perform
any validation on keymap entry, allowing nonexistent keymaps to be
written to /etc/vconsole.conf. This commit adds validation checks
based on those already performed on locale entry, preventing invalid
keymaps from being set.
Franck Bui [Wed, 7 Feb 2024 12:41:48 +0000 (13:41 +0100)]
gpt-auto-generator: be more defensive when checking the presence of ESP in fstab
Looking for the ESP node is useful to shortcut things but if we're told that
the node is not referenced in fstab that doesn't necessarily mean that ESP is
not mounted via fstab. Indeed the check is not reliable in all cases. Firstly
because it assumes that udev already set the symlinks up. This is not the case
for initrd-less boots. Secondly the devname of the ESP partition can be wrongly
constructed by the dissect code. For example, the approach which consists in
appending "p<partnum>" suffix to construct the partition devname from the disk
devname doesn't work for DM devices.
Hence this patch makes the logic more defensive and do not mount neither ESP
nor XBOOTLDR automatically if any path in paths that starts with /efi or /boot
exists.
Yu Watanabe [Tue, 2 Jan 2024 19:28:25 +0000 (04:28 +0900)]
logs-show: get timestamp and boot ID only when necessary
Previously, get_display_timestamp() is unconditionally called even if we
will show logs in e.g. json format.
This drops unnecessary call of get_display_timestamp().
This also makes journal fields in each entry parsed only once in
output_short(). Still output_verbose() twice though.
This should improve performance of dumping journals.
Replaces #29365.
Co-authored-by: Costa Tsaousis <costa@netdata.cloud>
Yu Watanabe [Tue, 2 Jan 2024 19:28:11 +0000 (04:28 +0900)]
sd-journal: drop to use Hashmap to manage journal files per boot ID
As reported at https://github.com/systemd/systemd/pull/30209#issuecomment-1831344431,
using hashmap in frequently called function reduces performance.
Let's replace it with a single array and bsearch.
Replaces #29366.
Co-authored-by: Costa Tsaousis <costa@netdata.cloud>
Yu Watanabe [Sat, 20 Jan 2024 13:14:14 +0000 (22:14 +0900)]
journalctl: call all cleanup functions before raise()
Note, even with this, memory allocated internally by glibc is not freed.
But, at least, memory explicitly allocated by us is freed cleanly even
Ctrl-C is pressed during 'journalctl --follow'.
Yu Watanabe [Tue, 2 Jan 2024 19:27:59 +0000 (04:27 +0900)]
sd-journal: cache last entry offset and journal file state
When the offset of the last entry object (or last object for journal
files generated by an old journald) is not changed, the timestamps
should be updated by journal_file_read_tail_timestamp() are unchanged.
So, we can drop to call fstat() in the function.
As, the journal header is always mapped, so we can read the offset and
journal file state without calling fstat.
Still, when the last entry offset is changed, we may need to call fstat()
to read the entry object. But, hopefully the number of fstat() call
can be reduced.
Mike Yuan [Wed, 31 Jan 2024 17:25:49 +0000 (01:25 +0800)]
core/service: allow RestartForceExitStatus= for oneshot services
I think this was just overlooked in #13754, which removed
the restriction of Restart= on Type=oneshot services.
There's no reason to prevent RestartForceExitStatus=
now that Restart= has been allowed.
Mike Yuan [Wed, 31 Jan 2024 17:47:35 +0000 (01:47 +0800)]
core/service: make error msg match with conditions
This was discussed in
https://github.com/systemd/systemd/pull/13754#discussion_r333395362.
I think we should actually list "success" Restart= settings instead.
There are more error statuses than success ones after all, and this
list hasn't really changed for quite some time.
Daan De Meyer [Mon, 25 Dec 2023 22:11:22 +0000 (23:11 +0100)]
repart: Add --generate-fstab= and --generate-crypttab= options
These can be used along with two new settings MountPoint= and
EncryptedVolume= to write fstab and crypttab entries to the given
paths respectively in the root directory that repart is operating on.
This is useful to cover scenarios that aren't covered by the
Discoverable Partitions Spec. For example when one wants to mount
/home as a separate btrfs subvolume. Because multiple btrfs subvolumes
can be mounted from the same partition, we allow specifying MountPoint=
multiple times to add multiple entries for the same partition.
test: make the MemoryHigh= limit a bit more generous with sanitizers
When we're running with sanitizers, sd-executor might pull in a
significant chunk of shared libraries on startup, that can cause a lot
of memory pressure and put us in the front when sd-oomd decides to go on
a killing spree. This is exacerbated further on Arch Linux when built
with gcc, as Arch ships unstripped gcc-libs so sd-executor pulls in over
30M of additional shared libs on startup:
In interactive terminal mode, let's set a window title that reflects our
change of context to the target. Let's prefix it it with red/yellow
emoji dot in case we changed privileges.
Let's update the window title with an ANSI sequence if we can. We'll
insert a blue dot, to match the blue tinting of the terminal screen,
indicating that we are in a container.
1. it resets the background color not only on NL (aka LF) but also on
CR, but without erasing things to the end of the line. This increases
compatbility with tools such as "less" which use CR to jump back to
the beginning of the line.
2. previously we'd not process series of newlines or ansi sequences
without intermediate other characters correctly, we'd always assume
what follows is regular text. Fix that, and correctly determine the
right state from the subsequent character.
Franck Bui [Thu, 8 Feb 2024 15:11:21 +0000 (16:11 +0100)]
test-69: send SIGTERM to ask systemd-nspawn to properly stop the container
The terminate() method sends SIGHUP but this signal is not handled by
systemd-nspawn hence the process just exits leaving the container scope around
breaking futher test executions.
This patch sends SIGTERM instead which is a defined API to request
sytemd-nspawn to stop and release the container's resources properly.
... i.e. apply nested config (exclusions and such) when executing R and D.
This fixes a long-standing RFE. The existing logic seems to have been an
accident of implementation. After all, if somebody specifies a config with
'R /foo; x /tmp/bar', then probably the goal is to remove stuff from under /foo,
but keep /tmp/bar. If they just wanted to nuke everything, then would not specify
the second item.
This also makes R and D use O_NOATIME, i.e. the access times of the directories
that are accessed will not be changed by the cleanup.
Obviously, we'll have to add this to NEWS and such.
Looking at the whole tmpfiles.d config in Fedora, this change has no effect.
The test cases are adjusted as appropriate. I also added another test case for
'R'/'D' with a file, just to test this code path more.
One of the three must always be specified, but they buried in a long list of
options in the output of --help. Make them more visible to draw the eye.
Also, drop "marked" from the description. It's supposed to mean "configured",
but it's a strange way to say that, and also it's generally obvious that the
program does what its configuration tells it to, and it's not going to remove
all files found on the system.
Previously, if given an absolute path, we would open the file, but when given a
relative path, we'd attempt to search the directories. If the user wants to open
a file from the search path, allowing paths is very confusing. E.g. with a path
like 'sysusers/foo.conf', we'd try to open '/etc/sysusers.d/sysusers/foo.conf',
'/run/sysusers.d/sysusers/foo.conf', …, and with '../foo.conf', we'd try to open
'/etc/sysusers.d/../foo.conf', '/run/sysusers.d/../foo.conf', …. This just isn't
useful, and in fact for a scheme like sysusers.d and tmpfiles.d where there we
have a flat directory with config files, only searching for plain names can
result in success. When a user specifies a relative path, it's more likely that
they wanted to open some local file. OTOH, to correctly open a local file, e.g.
one that they're just writing, this interface is also awkward, because something
like '$PWD/file.conf' has to be used to open a file with a relative path.
This patch changes the interface so that any path (i.e. an argument with "/") is
used to open a file directly, and only plain basenames are used for searching.
(Note that tpmfiles and sysusers are somewhat special here: their "config files"
make sense without the other config and users are likely to want to test them
without the other config. I was trying to do just that when writing a spec file
for a package and attempting to convert the existing scripts to sysusers and
tmpfiles. The same logic wouldn't apply for example to units or udev rules,
because they generally can only be interpreted with the whole rest of config
also available.)
I was annoyed that systemd-sysusers doesn't print any info when it opens a
config file. Its read_config_file() started out the same as the one in tmpfiles,
and then they diverged. The one in tmpfiles has that logging, hence the rework
to use it here too and get better logging. The two programs should provide
similar functionality, so using a common helper will make it easier to extend
them in tandem later.
No functional change apart from the log info.
The userdata argument (Context) is moved to the last position as requested in
the review.
a3451c2c4ce7d3c02451f6ace4ee9f873880f78f added offline uid/gid support in a way
where the <root>/etc/passwd and <root>/etc/group would be read anew for each
configuration file that was parsed. The result would always be the same, so I
assume that this was an oversight. Let's use a global cache and and read the
file just once.