Mike Yuan [Tue, 30 Jul 2024 20:43:49 +0000 (22:43 +0200)]
cgroup-util: clean up cg_kill() and friends, completely split out cg_kill_kernel_sigkill()
cg_kill_kernel_sigkill() has a narrow use case, and currently
no code really reaches that branch. Let's detach it from
cg_kill_recursive() hence, and call it explicitly later
where appropriate.
Mike Yuan [Tue, 30 Jul 2024 20:29:00 +0000 (22:29 +0200)]
cgroup-util: drop unused cg_rmdir()
When removing a cgroup, we always want to eliminate subcgroups
first, i.e. use cg_trim(). And cg_rmdir() (along with
CGROUP_REMOVE flag) is simply unused. Kill it.
Ronan Pigott [Thu, 1 Aug 2024 17:59:12 +0000 (10:59 -0700)]
resolved: don't treat conn reset as packet loss
tcp reset / icmp port-unreachable are markedly different conditions than
packet loss. It doesn't make much sense to retry in this case. It's
actually not clear if there is any benefit at all retrying tcp
connections, which were presumably already retried as necessary by the
tcp stack.
Yu Watanabe [Mon, 27 May 2024 03:22:30 +0000 (12:22 +0900)]
sd-device-monitor: expose low-level functions
To make it work without sd-event.
Prompted by recent chat:
> Hey all!
> reading man libudev, it says to use sd-device instead now. I've read that
> APIs header file and it seems it no longer has an equivalent to libudev's
> udev_monitor_get_fd, which AFAICT means I have to use sd-event to watch
> for events I'm interested in. I know I can "embed" sd-event in other event
> loops I might already have, but that seems overkill when I'm only interested
> in this one type of event and don't need sd-event for anything else.
Yu Watanabe [Mon, 27 May 2024 03:05:24 +0000 (12:05 +0900)]
udev: manage only socket address of device monitor
Previously, the main process of systemd-udevd manages worker process
with their sd_device_monitor object to save the destination address.
Let's save only destination address, and drop worker's sd_device_monitor
object.
Yu Watanabe [Mon, 27 May 2024 01:43:54 +0000 (10:43 +0900)]
sd-device-monitor: bind socket in device_monitor_new_full()
Previously, device_monitor_enable_receiving() does
- update filter,
- bind socket.
But, binding socket can be done in when the socket is opened.
Let's remove device_monitor_enable_receiving() and bind the socket in
device_monitor_new_full().
Luca Boccassi [Thu, 1 Aug 2024 19:44:11 +0000 (20:44 +0100)]
os-release: break RELEASE_TYPE into paragraphs and clarify about rolling stable releases
Arch and Tumbleweed do not do EOLs but are still stable, so clarify the paragraph.
Also break the entry in paragraphs, to make it more readable when rendered.
Luca Boccassi [Thu, 1 Aug 2024 19:35:57 +0000 (20:35 +0100)]
os-release: change RELEASE_TYPE value from 'pre-release' to 'development'
The point was made on https://lists.debian.org/debian-ctte/2024/08/msg00005.html
that 'pre-release sounds' like an RC candidate, ie, something that will change
very slightly in the released version. But this is not necessarily the case
for example at the beginnig of a Fedora Rawhide or Debian Testing release cycle,
so change it to a more generic 'development'
Adrian Vovk [Fri, 24 May 2024 03:39:52 +0000 (23:39 -0400)]
os-release: Introduce experiment RELEASE_TYPE
This is for experimental builds of the OS made to test some specific WIP
feature.
For example, let's say the distro in question is Asahi Linux and Apple
just released the M3 SoC. The Asahi developers will start porting to the
M3, and will quickly generate builds of Asahi Linux that can technically
boot but aren't ready for any kind of daily use. These images are marked
as experimental, and can be shared among the developers. If a user
somehow stumbles upon one of these images and tries to install it,
they'll be warned that they're about to install an experimental Apple M3
port of Asahi Linux. Eventually, once the Asahi developers think that
their M3 port is ready for a wider audience, they can merge it into the
mainline Asahi repos, where it will be distributed through the usual
nightly CI builds (where RELEASE_TYPE=pre-release; M3 support is no
longer experimental).
Adrian Vovk [Wed, 22 May 2024 22:06:54 +0000 (18:06 -0400)]
os-release: Add RELEASE_TYPE=
This will allow GUIs to customize their behavior a little based on the
type of release.
For example, an OS installer may display a warning/disclaimer if
RELEASE_TYPE=prerelease. The software updates app might be a bit more
insistent about upgrading to the next major release if
RELEASE_TYPE=stable than if RELEASE_TYPE=lts
Yu Watanabe [Thu, 1 Aug 2024 02:40:20 +0000 (11:40 +0900)]
journalctl: fix compile error on i386
Fixes the following error:
===
In file included from ../src/basic/macro.h:13,
from ../src/basic/dirent-util.h:8,
from ../src/journal/journalctl-misc.c:3:
../src/journal/journalctl-misc.c: In function 'show_log_ids':
../src/journal/journalctl-misc.c:107:22: error: comparison is always true due to limited range of data type [-Werror=type-limits]
107 | assert(n_ids < INT64_MAX);
| ^
../src/fundamental/macro-fundamental.h:70:44: note: in definition of macro '_unlikely_'
70 | #define _unlikely_(x) (__builtin_expect(!!(x), 0))
| ^
../src/basic/macro.h:165:22: note: in expansion of macro 'assert_message_se'
165 | #define assert(expr) assert_message_se(expr, #expr)
| ^~~~~~~~~~~~~~~~~
../src/journal/journalctl-misc.c:107:9: note: in expansion of macro 'assert'
107 | assert(n_ids < INT64_MAX);
| ^~~~~~
cc1: all warnings being treated as errors
===
journalctl: add --list-invocations command and -I/--invocation options
The --list-invocations command is similar to --list-boots, but shows
invocation IDs of specified unit. This should be useful when showing
a specific invocation of a unit.
The --invocation option is similar to --boot, but takes a invocation ID
or an offset. The -I option is equivalent to --invocation=0.
The struct itself is generic, and can be used for other ID.
Let's rename it to more generic one.
No functional change, just refactoring and preparation for later
commits.
Let's use the newly added credentials to only enable autologin for
/dev/console (systemd-nspawn) and /dev/hvc0 (qemu) instead of enabling
autologin for every tty.
core: Add support for renaming credentials with ImportCredential=
This allows for "per-instance" credentials for units. The use case
is best explained with an example. Currently all our getty units
have the following stanzas in their unit file:
This means that setting agetty.autologin=root as a system credential
will make every instance of our all our getty units autologin as the
root user. This prevents us from doing autologin on /dev/hvc0 while
still requiring manual login on all other ttys.
To solve the issue, we introduce support for renaming credentials with
ImportCredential=. This will allow us to add the following to e.g.
serial-getty@.service:
which for serial-getty@hvc0.service will make the service manager read
all credentials of the form "tty.serial.hvc0.agetty.xxx" and pass them
to the service in the form "agetty.xxx" (same goes for login). We can
apply the same to each of the getty units to allow setting agetty and
login credentials for individual ttys instead of globally.
Credentials are written to a temporary file and renamed to the
destination with renameat() which will replace existing files so
EEXIST should not happen so drop the handling for EEXIST.
execute: Drop log level to unit log level in exec_spawn()
All messages logged from exec_spawn() are attributed to the unit
and as such we should set the log level to the unit's max log level
for the duration of the function.
Dan Nicholson [Tue, 30 Jul 2024 13:37:40 +0000 (07:37 -0600)]
firstboot: fix root params with creds and prompting disabled
Remove an early return that prevents --prompt-root-password or
--prompt-root-shell and systemd.firstboot=off using credentials. In that case,
arg_prompt_root_password and arg_prompt_root_shell will be false, but the
prompt helpers still need to be called to read the credentials. Furthermore, if
only the root shell has been set, don't overwrite the root password.
Dan Nicholson [Tue, 30 Jul 2024 19:42:26 +0000 (13:42 -0600)]
firstboot: handle missing root password entries
If /etc/passwd and/or /etc/shadow exist but don't have an existing root entry,
one needs to be added. Previously this only worked if the files didn't exist.
Łukasz Stelmach [Tue, 28 May 2024 14:56:03 +0000 (16:56 +0200)]
Revert "execute: Call capability_ambient_set_apply even if ambient set is 0"
With ambient capabilities being dropped at the start of process managers
(both system and user) as well as systemd-executor it isn't necessary
to drop them here. Moreover, at this point also the inheritable set can
be preserved. This makes it possible to assign a user session manager
inheritable capabilities which combined with file capabilites (ei sets)
of service executables enable running user services with capabilities
but only when started by the manager.
Łukasz Stelmach [Mon, 20 May 2024 14:51:55 +0000 (16:51 +0200)]
core: drop ambient capabilities in systemd-executor
Since the commit 963b6b906e ("core: drop ambient capabilities in
user manager") systemd running as the session manager has dropped ambient
capabilities retaining other sets allowing user services to be started
with elevated capabilities. This, worked fine until the introduction of
sd-executor. For a non-root process to be started with elevated
capabilities by a non-root parent it either needs file capabilities or
ambient capabilities in the parent process. Thus, systemd needs to allow
sd-executor to inherit its ambient capabilities and sd-executor should
drop them as systemd did before.
The ambient set is managed for both system and session managers, but
with the default set for PID#1 being empty, this code does not affect
operation of PID#1.
Dan Nicholson [Tue, 30 Jul 2024 17:11:11 +0000 (11:11 -0600)]
firstboot: create locked and empty root passwords consistently
Although locked and empty passwords in /etc/passwd are treated the same, in all
other cases the entry is configured to read the password from /etc/shadow.
network: request non-NULL SSID when a wlan interface is configured as station
To avoid conflicts with user .network file for the wlan interface with Bond=.
See https://github.com/systemd/systemd/issues/19832#issuecomment-857661200.
stub: allocate and zero enough space in legacy x86 handover protocol
A PE image's memory footprint might be larger than its file size due
to uninitialized memory sections. Normally all PE headers should be
parsed to check the actual required size, but the legacy EFI handover
protocol is only used for x86 Linux bzImages, so we know only the last
section will require extra memory. Use SizeOfImage from the PE header
and if it is larger than the file size, allocate and zero extra memory
before using it.
network: call link_handle_bound_by_list() before trying to reconfigure interface
Otherwise, when an interface gained its carrier, the interface may not
have matching .network file yet, then link_reconfigure_impl() returns
zero, and link_handle_bound_by_list() is skipped.
Similar to 2d393b1b6d8 ("network: IPv6 Compliance: Router Advertisement
Processing, Reachable Time [v6LC.2.2.15]"),
Extract from: https://www.ietf.org/rfc/rfc4861.html#section-4.2, p.21,
first paragraph:
The Router Lifetime applies only to
the router's usefulness as a default router; it
does not apply to information contained in other
message fields or options.
So it does not make sense to prevent DHCPv6 when Router Lifetime is 0.
Fix detection of TDX confidential VM on Azure platform
The original CVM detection logic for TDX assumes that the guest can see
the standard TDX CPUID leaf. This was true in Azure when this code was
originally written, however, current Azure now blocks that leaf in the
paravisor. Instead it is required to use the same Azure specific CPUID
leaf that is used for SEV-SNP detection, which reports the VM isolation
type.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>