Luca Boccassi [Wed, 6 Nov 2024 13:51:10 +0000 (13:51 +0000)]
introduce report_errno_and_exit() helper (#35028)
This is a follow for https://github.com/systemd/systemd/pull/34853. In
particular, this comment
https://github.com/systemd/systemd/pull/34853#discussion_r1825837705.
Since v256 we completely fail to boot if v1 is configured. Fedora 41 was just
released with v256.7 and this is probably the first major exposure of users to
this code. It turns out not work very well. Fedora switched to v2 as default in
F31 (2019) and at that time some people added configuration to use v1 either
because of Docker or for other reasons. But it's been long enough ago that
people don't remember this and are now very unhappy when the system refuses to
boot after an upgrade.
Refusing to boot is also unnecessarilly punishing to users. For machines that
are used remotely, this could mean somebody needs to physically access the
machine. For other users, the machine might be the only way to access the net
and help, and people might not know how to set kernel parameters without some
docs. And because this is in systemd, after an upgrade all boot choices are
affected, and it's not possible to e.g. select an older kernel for boot. And
crashing the machine doesn't really serve our goal either: we were giving a
hint how to continue using v1 and nothing else.
If the new override is configured, warn and immediately boot to v1.
If v1 is configured w/o the override, warn and wait 30 s and boot to v2.
Also give a hint how to switch to v2.
The advice is to set systemd.unified_cgroup_hierarchy=1 (instead of removing
systemd.unified_cgroup_hierarchy=0). I think this is easier to convey. Users
who are understand what is going on can just remove the option instead.
The caching is dropped in cg_is_legacy_wanted(). It turns out that the
order in which those functions are called during early setup is very fragile.
If cg_is_legacy_wanted() is called before we have set up the v2 hierarchy,
we incorrectly cache a true answer. The function is called just a handful
of times at most, so we don't really need to cache the response.
Let's systematically make sure that we link up the D-Bus interfaces from
the daemon man pages once in prose and once in short form at the bottom
("See Also"), for all daemons.
Also, add reverse links at the bottom of the D-Bus API docs.
We have this in a similar fashion for the other APIs libsystemd
provides. Add the same for sd-varlink. There isn't too much on it for
now, but at least it's a start.
man: tone down claims on processes having exited already in ExecStop=
Processes can easily survive the first kill operation we execute, hence
we shouldn't make strong claims about them having exited already. Let's
just say "likely" hence.
anonymix007 [Tue, 22 Oct 2024 11:38:00 +0000 (14:38 +0300)]
uki: add new .dtbauto PE section type
.dtbauto section contains DT blobs, just like .dtb, the difference is
that multiple .dtbauto sections are allowed to be in a UKI and only one
is selected automatically
Temporarily drop an assert_cc() check in systemd-measure to make it compilable before the next commit
Ronan Pigott [Mon, 4 Nov 2024 23:12:00 +0000 (16:12 -0700)]
network: handle ENODATA better with DNR
It is normal for DHCP leases not to have DNR options. We need to be less
verbose and more forgiving in these cases. Also, if either DHCP does not
have DNR options, make sure to still consider any DHCPv6/RA options.
Fixes: c7c9e3c7c016 (network: adjust log message about DNR)
Yu Watanabe [Tue, 5 Nov 2024 02:41:31 +0000 (11:41 +0900)]
network: introduce LINK_RECONFIGURE_CLEANLY flag
And use it when explicit reconfiguration is requested by Reconfigure() DBus method
or networkd certainly detects that connected network is changed.
Otherwise do not use the flag especially when we come back from sleep mode.
Yu Watanabe [Tue, 5 Nov 2024 02:39:31 +0000 (11:39 +0900)]
network: keep dynamic configurations as possible as we can on reconfigure
E.g. when a .network file is updated, but DHCP setting is unchanged, it
is not necessary to drop acquired DHCP lease.
So, let's not stop DHCP client and friends in link_reconfigure_impl(),
but stop them later when we know they are not necessary anymore.
Still DHCP clients and friends are stopped and leases are dropped when
the explicit reconfiguration is requested
Yu Watanabe [Tue, 5 Nov 2024 02:32:33 +0000 (11:32 +0900)]
network: merge link_foreignize_config() and link_drop_foreign_config()
When a reconfiguration of an interface is triggered, previously we
call link_foreignize_config(), which sets all static configurations as
foreign, then later call link_drop_foreign_config(), which drops
unnecessary foreign configurations.
This commit merges these two steps into one, link_drop_unmanaged_config(),
which drops unnecessary static and foreign configurations.
Also, this renames link_drop_managed_configs() to
link_drop_static_config(), as it only drops static configurations.
Note that dynamically aquired configurations are dropped by
link_stop_engines().
Yu Watanabe [Mon, 4 Nov 2024 19:04:33 +0000 (04:04 +0900)]
network: several cleanups for link_reconfigure()
Effectively no functional changes, just refactoring and preparation for
later changes.
- convert boolean flag 'force' to LinkReconfigurationFlag enum,
- merge link_reconfigure() and reconfigure_handler_on_bus_method_reload() as
link_reconfigure_full(),
- Rename ReconfigureData -> LinkReconfigurationData,
- make Reconfigure() DBus message wait for reconfiguration being
started before sending reply.
Daan De Meyer [Thu, 29 Aug 2024 15:10:46 +0000 (17:10 +0200)]
core: Introduce PrivatePIDs=
This new setting allows unsharing the pid namespace in a unit. Because
you have to fork to get a process into a pid namespace, we fork in
systemd-executor to get into the new pid namespace. The parent then
sends the pid of the child process back to the manager and exits while
the child process continues on with the rest of exec_invoke() and then
executes the actual payload.
Communicating the child pid is done via a new pidref socket pair that is
set up on manager startup.
We unshare the PID namespace right before the mount namespace so we
mount procfs correctly. Note PrivatePIDs=yes always implies MountAPIVFS=yes
to mount procfs.
When running unprivileged in a user session, user namespace is set up first
to allow for PID namespace to be unshared. However, when running in
privileged mode, we unshare the user namespace last to ensure the user
namespace does not own the PID namespace and cannot break out of the sandbox.
Note we disallow Type=forking services from using PrivatePIDs=yes since the
init proess inside the PID namespace must not exit for other processes in
the namespace to exist.
Note Daan De Meyer did the original work for this commit with Ryan Wilson
addressing follow-ups.
Co-authored-by: Daan De Meyer <daan.j.demeyer@gmail.com>
resolved: log error messages for openssl/gnutls context creation
In https://bugzilla.redhat.com/show_bug.cgi?id=2322937 we're getting
an error message:
Okt 29 22:21:03 fedora systemd-resolved[29311]: Could not create manager: Cannot allocate memory
I expect that this actually comes from dnstls_manager_init(), the
openssl version. But without real logs it's hard to know for sure.
Use EIO instead of ENOMEM, because the problem is unlikely to be actually
related to memory.
Daan De Meyer [Mon, 4 Nov 2024 11:21:21 +0000 (12:21 +0100)]
tmpfiles: Implement L? to only create symlinks if source exists
This allows a single tmpfiles snippet with lines to symlink directories
from /usr/share/factory to be shared across many different configurations
while making sure symlinks only get created if the source actually exists.
Yu Watanabe [Fri, 1 Nov 2024 21:31:25 +0000 (06:31 +0900)]
network: check if interface is initialized after enumeration completed
We enumerate interfaces at first, then enumerate other configurations
like addresses and so on. If we are running on a container, previously
we started to configure the enumerated interfaces before enumerating other
configurations.
Let's configure interfaces after all configurations are enumerated.
sd-daemon: add fd array size safety check to sd_notify_with_fds()
The previous commit removed the UINT_MAX check for the fd array. Let's
now re-add one, but at a better place, and with a more useful limit. As
it turns out the kernel does not allow passing more than 253 fds at the
same time, hence use that as limit. And do so immediately before
calculating the control buffer size, so that we catch multiplication
overflows.
We fucked that up in the original sd_listen() calls, and then we fixed
that on the newer flavours. But pour internal common implementation
should of course use the full range size_t, as it should be.
This then allows us to drop a redundant range check.
This cleans up the handling of the "unset_environment" parameter to
sd_listen() and related calls: the man pages claim we operate on it on
error too. Hence, actually do so in strictly all error paths. Previously
we'd miss out on some, because wrapper functions mishandled them.
This establishes a common pattern: a function to unset the relevant env
vars, that is called from a goto section at the botom on both success
and failure.
Martin Wilck [Wed, 30 Oct 2024 15:57:39 +0000 (16:57 +0100)]
udev-builtin-path_id: SAS wide ports must have num_phys > 1
Some kernel SAS drivers (e.g. smartpqi) expose ports with num_phys = 0. udev
shouldn't treat these ports as wide ports. SAS wide ports always have
num_phys > 1. See comments for sas_port_add_phy() in the kernel sources.
Sample data from a smartpqi system to illustrate the issue below.
Here the phy device is attached to port 0:0, which has no end devices attached
and the SAS end device (where sda is attached) is associated with SAS
port 0:1, which has no associated phy device. Thus num_phys for port-0:1 is 0.
This is arguably wrong, but it's how smartpqi has always set up its devices in
sysfs.