Yu Watanabe [Sat, 20 Jan 2024 13:14:14 +0000 (22:14 +0900)]
journalctl: call all cleanup functions before raise()
Note that even with this, memory allocated internally by glibc is not freed.
But at least memory explicitly allocated by us is freed cleanly, even when
Ctrl-C is pressed during 'journalctl --follow'.
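The ordering described above can be sketched roughly as follows. This is
only an illustration, not journalctl's actual code: struct follow_state,
follow_setup(), follow_state_done() and follow_finish() are hypothetical
names, and the sketch assumes nothing beyond standard C signal()/raise().

```c
#include <signal.h>
#include <stdlib.h>

/* Set by the handler; the main loop is assumed to poll it and stop. */
static volatile sig_atomic_t pending_signal = 0;

static void remember_signal(int sig) {
        pending_signal = sig;
}

/* Hypothetical stand-in for memory the program allocated itself. */
struct follow_state {
        char *line_buffer;
};

int follow_setup(struct follow_state *s) {
        s->line_buffer = malloc(4096);
        if (!s->line_buffer)
                return -1;

        if (signal(SIGINT, remember_signal) == SIG_ERR) {
                free(s->line_buffer);
                s->line_buffer = NULL;
                return -1;
        }

        return 0;
}

void follow_state_done(struct follow_state *s) {
        free(s->line_buffer);
        s->line_buffer = NULL;
}

/* Run every cleanup function first, and only then re-raise the signal
 * with its default disposition, so the process still dies from it. */
void follow_finish(struct follow_state *s) {
        int sig = (int) pending_signal;

        follow_state_done(s);

        if (sig != 0) {
                signal(sig, SIG_DFL);
                raise(sig);
        }
}
```

The point of the sketch is only the order inside follow_finish(): cleanup
first, raise() last. Memory held internally by the C library is outside
its reach, as noted above.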
Daan De Meyer [Mon, 25 Dec 2023 22:11:22 +0000 (23:11 +0100)]
repart: Add --generate-fstab= and --generate-crypttab= options
These can be used along with two new settings MountPoint= and
EncryptedVolume= to write fstab and crypttab entries to the given
paths respectively in the root directory that repart is operating on.
This is useful to cover scenarios that aren't covered by the
Discoverable Partitions Spec. For example when one wants to mount
/home as a separate btrfs subvolume. Because multiple btrfs subvolumes
can be mounted from the same partition, we allow specifying MountPoint=
multiple times to add multiple entries for the same partition.
test: make the MemoryHigh= limit a bit more generous with sanitizers
When we're running with sanitizers, sd-executor might pull in a
significant chunk of shared libraries on startup, which can cause a lot
of memory pressure and put us at the front of the queue when sd-oomd
decides to go on a killing spree. This is exacerbated further on Arch
Linux when built with gcc, as Arch ships unstripped gcc-libs, so
sd-executor pulls in over 30M of additional shared libs on startup.
Yu Watanabe [Tue, 2 Jan 2024 19:30:29 +0000 (04:30 +0900)]
sd-journal: do not read unnecessary object
In journal_file_next_entry(), if the passed offset matches an entry object,
then generic_array_bisect() returns the object, but the object we
requested is the next (or previous) object. Hence, we should not validate
the object returned by generic_array_bisect(), otherwise it may fail
when the journal is corrupted.
Note the validity of the entry object that should be returned by
journal_file_next_entry() will be checked in the following generic_array_get().
So, when journal_file_next_entry() succeeds, the returned object is
always validated.
Yu Watanabe [Tue, 2 Jan 2024 19:30:24 +0000 (04:30 +0900)]
sd-journal: always put verified object into the chain cache
Let's consider the case that
- the first array contains valid entries,
- all entries in the second array are corrupted.
Then, when we are going upwards and a call to generic_array_bisect()
matches the last entry of the first array, the second array was
cached with last_index == UINT64_MAX, instead of the first array with
its last entry.
Hence, the next time generic_array_bisect() is called, the call to
test() always fails, so the cache entry is mostly meaningless.
Luca Boccassi [Wed, 11 Oct 2023 18:23:40 +0000 (19:23 +0100)]
repart: support OpenSSL engines/providers for signing
The new provider API requires providers, which are not yet widely
available and don't work very well, so also fall back to the legacy
engine API.
bpf-devices: if a device node is referenced which doesn't exist, downgrade log message
Currently in many of our test cases you'll see a warning about a tun
device not being around. Let's make that quiet, since if there's no such
device there's no point in adding it to a policy anyway, and it makes
useless noise go away.
We keep the message at warning level if accessing a device node fails
with any error other than ENOENT.
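The rule above boils down to picking a log level from the error code. A
minimal sketch, assuming that "downgrade" means debug level (the message
does not say which level is used) and using a hypothetical helper name:

```c
#include <errno.h>
#include <syslog.h>

/* Picks the log level for a failure to access a device node.
 * "error" is a negative errno-style value such as -ENOENT. */
int device_node_error_log_level(int error) {
        if (error == -ENOENT)
                return LOG_DEBUG;   /* assumption: missing node is logged quietly */

        return LOG_WARNING;         /* every other failure stays a warning */
}
```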
bpf-devices: normalize the return handling of functions that put together policy
Under some conditions we suppress generating BPF programs. Let's
systematically return 0 when we do this, and 1 if we actually did
something, instead of second-guessing this in the caller.
This is not only more correct, but allows us to suppress BPF programs in
more cases in later commits.
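The return convention can be illustrated with a small sketch. The names
struct policy, policy_add_node() and policy_build() are hypothetical and
not the functions touched by this commit; negative values are assumed to
be errno-style errors.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical policy object: only counts the rules added so far. */
struct policy {
        size_t n_rules;
};

/* Returns < 0 on error, 0 if the rule was suppressed, 1 if it was added. */
int policy_add_node(struct policy *p, const char *node) {
        if (!p)
                return -EINVAL;

        if (!node || node[0] == '\0')
                return 0;   /* nothing to add, not an error */

        p->n_rules++;
        return 1;
}

/* Propagates the same convention: 1 only if at least one rule was added. */
int policy_build(struct policy *p, const char *const *nodes, size_t n_nodes) {
        int changed = 0;

        if (!p)
                return -EINVAL;

        for (size_t i = 0; i < n_nodes; i++) {
                int r = policy_add_node(p, nodes[i]);
                if (r < 0)
                        return r;
                if (r > 0)
                        changed = 1;
        }

        return changed;
}
```

With this shape a caller can skip installing a program whenever the
result is 0, without inspecting the policy itself.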
bpf-devices: normalize how we pass around major/minor values
There's some ambiguity about whether major/minor of device nodes are
supposed to be "unsigned" or "dev_t". Various codebases assume the
latter, but glibc's major()/minor() actually return a value typed as
"unsigned". On glibc dev_t is actually 64bit even if the kernel only
exposes 32bit. Hence this distinction kinda matters.
Let's clean up the handling a bit: let's follow glibc's type
system here, and use unsigned (and not int).
Also let's pass invalid major/minor values around as UINT_MAX rather
than via pointers, to match how we usually do this, and to shorten our
code a bit. This is safe: since the Linux dev_t space is 32bit only, we
can't possibly have a valid major or minor this high, given they must
be smaller in size. While other archs disagree on the types of
major/minor, they also tend to have similar limits. In fact, on FreeBSD
for example, major()/minor() return a signed int, which would hence also
mean that UINT_MAX cannot be a valid major or minor.
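A minimal sketch of the UINT_MAX sentinel idea follows. DEVNUM_UNSET,
devnum_is_set() and device_rule_matches() are hypothetical names, and
treating an unset value as "matches anything" is an assumption made for
the example, not something stated above.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical name for the "no value" marker, passed by value instead
 * of signalling absence through a NULL pointer. */
#define DEVNUM_UNSET UINT_MAX

static bool devnum_is_set(unsigned v) {
        return v != DEVNUM_UNSET;
}

/* Assumption for this sketch: an unset major or minor in a rule acts as
 * a wildcard for that component. */
bool device_rule_matches(unsigned rule_major, unsigned rule_minor,
                         unsigned dev_major, unsigned dev_minor) {
        if (devnum_is_set(rule_major) && rule_major != dev_major)
                return false;

        if (devnum_is_set(rule_minor) && rule_minor != dev_minor)
                return false;

        return true;
}
```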
dev-setup: normalize logging around lock_dev_console()
Previously this function would log loudly in some cases but not in
others. Clean this up, and don't log at all, matching our coding style,
which says we should either log in all error cases or in none.
Both callers of this function do logging already, hence no need to
duplicate it here.
test: adjust test-path to fail gracefully with the new pidfd_spawn stuff
Since 2e106312e2 the test unit fails with 'resources' result instead of
'exit-code', which the test didn't account for when running unprivileged.
Before 2e106312e2:
$ /root/systemd/build/test-path
Failed to start transient scope unit: Interactive authentication required.
Couldn't allocate a scope unit for this test, proceeding without.
...
-.slice: Failed to enable/disable controllers on cgroup /user.slice/user-1000.slice/session-1.scope, ignoring: Permission denied
app.slice: Failed to create cgroup /user.slice/user-1000.slice/session-1.scope/app.slice: Permission denied
-.slice: Failed to enable/disable controllers on cgroup /user.slice/user-1000.slice/session-1.scope, ignoring: Permission denied
app.slice: Failed to create cgroup /user.slice/user-1000.slice/session-1.scope/app.slice: Permission denied
...
line 151: path-exists.path: state = running; result = success (left: 29986250)
line 151: path-exists.service: state = start; result = success
path-exists.service: Main process exited, code=exited, status=219/CGROUP
path-exists.service: Failed with result 'exit-code'.
line 151: path-exists.path: state = running; result = success (left: 29985948)
line 151: path-exists.service: state = failed; result = exit-code
Failed to start service path-exists.service, aborting test: failed/exit-code
After 2e106312e2:
$ /root/systemd/build/test-path
Failed to start transient scope unit: Interactive authentication required.
Couldn't allocate a scope unit for this test, proceeding without.
...
-.slice: Failed to enable/disable controllers on cgroup /user.slice/user-1000.slice/session-1.scope, ignoring: Permission denied
app.slice: Failed to create cgroup /user.slice/user-1000.slice/session-1.scope/app.slice: Permission denied
-.slice: Failed to enable/disable controllers on cgroup /user.slice/user-1000.slice/session-1.scope, ignoring: Permission denied
app.slice: Failed to create cgroup /user.slice/user-1000.slice/session-1.scope/app.slice: Permission denied
path-exists.service: Failed to spawn executor: No such file or directory
path-exists.service: Failed to spawn 'start' task: No such file or directory
path-exists.service: Failed with result 'resources'.
packit: temporarily build systemd without BPF stuff
The kernel-tools meta-package was retired in Rawhide, but its
replacement has not landed yet. Until that happens, let's build without
the bpf-framework stuff.
Daan De Meyer [Thu, 8 Feb 2024 09:54:54 +0000 (10:54 +0100)]
Add systemd.default_debug_tty=
Let's allow configuring the debug tty independently of enabling/disabling
the debug shell. This allows mkosi to configure the correct tty while
leaving enabling/disabling the debug tty to the user.
sysext: rename "directory_name" field to "full_identifier"
So the field contains simply the full name of the command being invoked,
hence rename the field to match the contents, and to mirror the
"short_identifier" field.
Interestingly, the field is apparently not actually used by anything
though! But we are not going to remove it, since a follow-up commit will
start making use of it.
Yu Watanabe [Thu, 8 Feb 2024 03:47:39 +0000 (12:47 +0900)]
network: make Reload bus method synchronous
Prompted by https://github.com/systemd/systemd/pull/30085#discussion_r1401534107.
Note that, as with the Reconfigure bus method, even when reconfiguration
of an interface is triggered by the Reload method, the method only waits
until the link enters the configuring state (or the unmanaged state if no
matching .network file exists). Users still need to invoke
systemd-networkd-wait-online if it is necessary to wait for the
interface to enter the configured state after the Reload method.
As described in https://github.com/systemd/systemd/issues/31235, the preset
state for systemd-homed-activate.service was unclear. On the one hand, we have
a preset with 'enable systemd-homed.service', and systemd-homed.service has
'Also=systemd-homed-activate.service systemd-homed-firstboot.service', so
'preset systemd-homed.service' would also enable those two services, but
'preset systemd-homed-activate.service' would disable it, because the presets
don't say it is enabled. It seems that this configuration is internally
inconsistent. As described in the issue, maybe systemctl should be smarter
here, or warn about such configs. Either way, let's make our config consistent.
Luca Boccassi [Wed, 7 Feb 2024 00:36:39 +0000 (00:36 +0000)]
portable: add --copy=mixed to copy images and link profiles
This new mode copies resources provided by the client, so that they
remain available for inspect/detach even if the original images are
deleted, but symlinks the profile as that is owned by the OS, so that
updates are automatically applied.
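The per-resource decision behind the mixed mode can be sketched as a
small lookup. All enum and function names here are hypothetical, and the
behaviour of the two non-mixed modes is an assumption added only to make
the example complete.

```c
/* Hypothetical modes; only the "mixed" behaviour is taken from the text. */
enum copy_mode { COPY_MODE_COPY, COPY_MODE_SYMLINK, COPY_MODE_MIXED };
enum resource_kind { RESOURCE_IMAGE, RESOURCE_PROFILE };
enum install_action { INSTALL_COPY, INSTALL_SYMLINK };

enum install_action install_action_for(enum copy_mode mode, enum resource_kind kind) {
        switch (mode) {
        case COPY_MODE_COPY:
                return INSTALL_COPY;
        case COPY_MODE_SYMLINK:
                return INSTALL_SYMLINK;
        case COPY_MODE_MIXED:
                /* Client-provided images are copied so they survive deletion
                 * of the originals; OS-owned profiles are linked so that
                 * updates apply automatically. */
                return kind == RESOURCE_PROFILE ? INSTALL_SYMLINK : INSTALL_COPY;
        }

        return INSTALL_COPY;   /* defensive default for unknown modes */
}
```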
man: mention that preset-all is performed during early boot
The intro of systemd-firstboot is rewritten to make it clearer how it fits into
the big picture. Systemd handles some of the machine-id and preset setup, and
systemd-firstboot.service is used to interactively fill in the blanks.
sd-dhcp6-client: allow setting send-release when client is running
The send-release option only affects the client when it is stopping.
There is no reason to disallow setting this option while the client is
running.
A user might want to delay the decision of sending a RELEASE message to
a later stage where the client is already running.
process-util: use only the least significant byte from personality()
The personality() syscall returns a 32-bit value where the top three
bytes are reserved for flags that emulate historical or architectural
quirks, and only the least significant byte reflects the actual
personality we're interested in (in opinionated_personality()).
Use the newly defined mask in the corresponding test as well, otherwise
the test fails on some more "exotic" architectures that set some of the
"quirk" flags:
~# uname -m
armv7l
~# build/test-seccomp
...
/* test_lock_personality */
current personality=0x0
safe_personality(PERSONALITY_INVALID)=0x800000
Assertion '(unsigned long) safe_personality(current) == current' failed at src/test/test-seccomp.c:970, function test_lock_personality(). Aborting.
lockpersonalityseccomp terminated by signal ABRT.
Assertion 'wait_for_terminate_and_check("lockpersonalityseccomp", pid, WAIT_LOG) == EXIT_SUCCESS' failed at src/test/test-seccomp.c:996, function test_lock_personality(). Aborting.
Aborted (core dumped)
See: personality(2) and comments in sys/personality.h
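The masking itself is a one-liner. In this sketch PERSONALITY_BASE_MASK,
personality_base() and personality_same_base() are hypothetical names;
the 0x800000 value is the one printed in the test output above.

```c
#include <stdbool.h>

/* Hypothetical name: keeps only the least significant byte. */
#define PERSONALITY_BASE_MASK 0xFFUL

/* Strips the "quirk" flags stored in the upper three bytes. */
unsigned long personality_base(unsigned long raw) {
        return raw & PERSONALITY_BASE_MASK;
}

/* Compares two raw personality values while ignoring the flag bytes. */
bool personality_same_base(unsigned long a, unsigned long b) {
        return personality_base(a) == personality_base(b);
}
```

With the mask applied, the 0x800000 reported on armv7l compares equal to
the 0x0 that the test expected.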
Yu Watanabe [Fri, 2 Feb 2024 04:08:35 +0000 (13:08 +0900)]
network: set 'removing' flag to remembered object
Previously, if address_remove() or friends were called with a temporary
object, the removing flag was assigned to the temporary object and was
not set on the remembered object. Hence, e.g.
route_is_ready_to_configure() wrongly judged that an address required
for a route was (still) ready, and networkd failed to configure the route.
After the commit, Address objects remembered by Link are always given by
the kernel. Hence, it is not necessary to set the flag, as it is always
ignored by the kernel, and the kernel sets the flag on notification if it
is necessary.
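The temporary-versus-remembered distinction can be sketched like this.
struct address, address_lookup() and address_mark_removing() are
simplified, hypothetical stand-ins; networkd's real objects and lookup
are more involved.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in: "id" plays the role of whatever identifies an address. */
struct address {
        unsigned id;
        bool removing;
};

/* Finds the remembered object that corresponds to a (possibly temporary) key. */
struct address *address_lookup(struct address *remembered, size_t n,
                               const struct address *key) {
        for (size_t i = 0; i < n; i++)
                if (remembered[i].id == key->id)
                        return &remembered[i];

        return NULL;
}

/* Sets the flag on the remembered object and never on the temporary key.
 * Returns 1 if an object was marked, 0 if none is remembered. */
int address_mark_removing(struct address *remembered, size_t n,
                          const struct address *key) {
        struct address *a = address_lookup(remembered, n, key);
        if (!a)
                return 0;

        a->removing = true;
        return 1;
}
```

Any later readiness check that consults the remembered set then sees the
flag, which is the property the fix is after.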
This is in preparation for https://github.com/systemd/systemd/pull/30360 to be
merged in a future release. As described there:
nscd is known to be racy [1] and it was already deprecated and later dropped
in Fedora a while back [1,2]. We don't need to support obsolete stuff in
systemd, and the cache in systemd-resolved provides a better solution anyway.
Note that our "support" is only the signal to flush the cache that we send at
various points. Nscd itself may still exist; dropping it is a decision to be
made in glibc.
Mike Yuan [Sun, 4 Feb 2024 15:22:46 +0000 (23:22 +0800)]
core: reuse credential dir across start and start-post if populated,
fresh otherwise
Currently, exec_setup_credential() always rewrites all credentials
upon exec_invoke(), i.e. invocation of each ExecCommand, and within
a single tmpfs instance. This is problematic though:
* When writing each tmp cred file, we essentially double the size
of the credential. Therefore, if one cred is bigger than half
of CREDENTIALS_TOTAL_SIZE_MAX, confusing ENOSPC occurs (see also
https://github.com/systemd/systemd/pull/24734#issuecomment-1925440546)
* Credentials are a unit-wide thing and thus should not change
  during the whole lifetime of the main process. However, if e.g.
  an on-disk credential or SetCredential= in the unit file
  changes between ExecStart= and ExecStartPost=,
  the credentials are overwritten when the latter gets to run,
  and the already-running main process suddenly sees
  completely different creds.
So, let's try to reuse the final cred dir if the main process has started
and the tmpfs has been populated, so that the creds used are stable
across all ExecStart= and ExecStartPost=-s. We still want to retain
the ability to update creds through ExecStartPre= though, therefore
we forcibly use a fresh cred dir for those. 'Fresh' means to actually
unmount the old tmpfs first, so the first problem goes away, too.
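Both points can be put into a small model. This is a sketch under
simplifying assumptions: the tmpfs holds a single credential, "limit"
stands in for CREDENTIALS_TOTAL_SIZE_MAX, and all names (enum exec_phase,
cred_rewrite_fits(), cred_fresh_fits(), cred_dir_should_reuse()) are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Rewriting within the same tmpfs: the old copy and the new temporary
 * file exist at the same time, so twice the size must fit. */
bool cred_rewrite_fits(size_t limit, size_t cred_size) {
        return cred_size <= limit && cred_size <= limit - cred_size;
}

/* Fresh tmpfs: only the new copy has to fit. */
bool cred_fresh_fits(size_t limit, size_t cred_size) {
        return cred_size <= limit;
}

enum exec_phase { PHASE_START_PRE, PHASE_START, PHASE_START_POST };

/* Reuse the populated dir once the main process runs; ExecStartPre=
 * always gets a fresh one so that creds can still be updated there. */
bool cred_dir_should_reuse(enum exec_phase phase, bool main_started, bool populated) {
        if (phase == PHASE_START_PRE)
                return false;

        return main_started && populated;
}
```

For example, with a limit of 1000 bytes a 600 byte credential cannot be
rewritten in place (1200 bytes would be needed) but fits into a fresh
tmpfs.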