This extends the test framework a bit, and allows adding additional
initrds to the qemu invocation, which we use here to place credentials
in the new /run/systemd/@initrd/ credentials dir which are then passed
to the host.
Let's add two new helpers: mount_credentials_fs() and
credentials_fs_mount_flags(). The former mounts a file system suitable
for storing of unencrypted credentials at runtime (i.e. a ramfs or
tmpfs). The latter determines the right mount flags to use for such a
mount.
Both functions mostly just take code from execute.c, but make two
changes:
1. If the kernel supports it we'll use a tmpfs with the new "noswap"
mount option instead of ramfs. Was added in kernel 6.4, hence is very
recent, but tmpfs is so much less crappy than ramfs, hence worth it.
2. We'll set MS_NOSYMFOLLOW on the mounts if supported. These file
systems should only contain regulra files, hence no need to allow
symlinks.
creds-util: add new helper read_credential_with_decryption()
This is just like read_credential() but also looks into the encrypted
credential directory, not just the regular one.
Normally, we decrypt credentials at the moment we pass them to services.
From service PoV all credentials are hence decrypted credentials.
However, when we want to access credentials in a generator this logic
does not apply: here we have the regular and the encrypted credentials
directory. So far we didn't attempt to make use of credentials in
generators hence.
Let's address and add helper that looks into both directories, and talks
to the TPM if necessary to decrypt the credentials.
execute: fix credential dir handling for fs which support ACLs
When the credential dir is backed by an fs that supports ACLs we must be
more careful with adjusting the 'x' bit of the directory, as any chmod()
call on the dir will reset the mask entry of the ACL entirely which we
don't want. Hence, do a manual set of ACL changes, that only add/drop
the 'x' bit but otherwise leave the ACL as it is.
This matters if we use tmpfs rather than ramfs to store credentials.
This log message is shown pretty regular at boot in various scenarios
(such as CI builds), and it's not a reason for any concern, it's just the
immediate effect of explicit configuration. Hence let's downgrade from
LOG_NOTICE to LOG_INFO so that it is still usually in the boot output,
but not particularly highlighted, since there's really no reason to.
dhcp6: relax data assert in dhcp6_option_parse_string
dhcp6_option_parse_string is intended to clear strings with length 0,
for consistency. The data assert is too strict for this purpose, so we
will allow data || data_len == 0, similar to other dhcp6_option_parse*
helpers.
core: introduce a new job mode JOB_RESTART_DEPENDENCIES
This new job mode will enqueue a start job for a unit, and all units
depending on the unit will get a restart job enqueued. This is then used
for automatic sevice restarts: the unit itself is only started, the
depending units restarted. This way the unit will not go down
unnecessarily, triggering OnSuccess= needlessly.
This also introduces a new state SERVICE_AUTO_RESTART_QUEUED that is
entered once the restart jobs are enqueued. Previously we'd stay in
SERVICE_AUTO_RESTART, but that's problematic, since we'd lose
information whether we still need to enqueue the restart job during a
serialization/deserialization cycle or not. By having an explicit state
for this we know exactly whether we still need to enqueue the job or
not. It's also good since when we are in SERVICE_AUTO_RESTART_QUEUED we
want to act on unit_start(), but on SERVICE_AUTO_RESTART we want to wait
for the holdoff time to pass before we act on unit_start().
ndisc: reject malformed captive portal URI with EBADMSG
This allows the correct, gracious, error handling to follow up in the
ndisc handler. Otherwise, an internal error is assumed and the interface
disabled.
network: delay to configure address until it is removed on reconfigure
When we request an address that already exists and is under removing,
we need to wait for the address being removed. Otherwise, configuration
of a route whose preferred source is the address will fail.
Daan De Meyer [Fri, 30 Jun 2023 14:06:54 +0000 (16:06 +0200)]
dbus-cgroup: Make sure we overwrite cpuset properties in drop-in
The DBUS property setter overwrites the value of the property but
writes a drop-in that extends the value. Let's make sure the drop-in
overwrites the property value as well by assigning the empty string
first.
Let's rename the unit to systemd-battery-check.service. We usually want
to name our own unit files like our tools they wrap, in particular if
they are entirely defined by us (i.e. not just wrappers of foreign
concepts)
While we are at it, also hook this in from initrd.target, and order it
against initrd-root-device.target so that it runs before the root device
is possibly written to (i.e. mounted or fsck'ed).
This is heavily inspired by @aafeijoo-suse's PR #28208, but quite
different ;-)
static versions of libsystems.so are not really supportable, and
encourages mix&match which we cannot really support. Make the wording
about this stronger in the README, since people apparently don'd read to
the last paragraph.
Frantisek Sumsal [Thu, 29 Jun 2023 11:31:19 +0000 (13:31 +0200)]
core: reorder systemd arguments on reexec
When reexecuting system let's put our arguments carrying deserialization
info first followed by any existing arguments to make sure they get
parsed in case we get weird stuff from the kernel cmdline (like --).
Ivan Vecera [Thu, 22 Jun 2023 08:06:27 +0000 (10:06 +0200)]
udev-builtin-net_id: align VF representor names with VF names
Certain cards support to set their eswitch to switchdev mode. In this
mode for each created VF there is also created so called VF representor.
This representor is helper network interface used for configuration of
mentioned eswitch and belongs to an appropriate PF.
VF representors are identified by the specific value of phys_port_name
attribute and the value has format "pfMvfN" where M is PF function
number and N is VF number inside this PF.
As the VF representor interfaces belong to PF PCI device the naming
scheme used for them is the same like for other PCI devices. In this
case name of PF interface is used and phys_port_name suffix is appended.
E.g.
PF=enp65s0f0np0 # phys_port_name for PF interface is 'p0'
VF=enp65s0f0np0v0 # v0 is appended for VF0 in case of NAMING_SR_IOV_V
REP=enp65s0f0np0pf0vf0 # phys_port_name for VF0 representor is 'pf0vf0'
First as the phys_port_name for representors is long (6+ chars) then the
generated name does not fit into IFNAMSIZ so this name is used only as
alternate interface name and for the primary one is used generic one
like eth<N>. Second 'f0' and 'pf0' in REP name is redundant.
This patch fixes this issue by introducing another naming scheme for VF
representors and appending 'rN' suffix to PF interface name for them.
N is VF number so the name used for representor interface is similar to
VF interface and differs only by the suffix.
For the example above we get:
PF=enp65s0f0np0
VF=enp65s0f0np0v0
REP=enp65s0f0np0r0
This eases for userspace to determine which representor interface
represents particular VF.
mount-util: tweak flags decoding in mount_verbose_full()
Fine-tune the decoding of mount options in mount_verbose_full() to
provide more helpful log output:
1. decode changing of propagation changes
2. discern changing of superblock flags/mount option string from mount
flags
3. don't check secondary fields when deciding which mount op is
executed, only the flags decide that.
man: document vmm.notify_socket credential in systemd(1) man page
Let's move the long explanation to the man page of the component that
interprets the credential, and keep only a brief summary in
systemd.system-credentials(7).
Philipp Kern [Fri, 23 Jun 2023 08:39:52 +0000 (10:39 +0200)]
sd-dhcp6-lease: ignore invalid byte(s) at the end of the packet
Oracle Cloud sends malformed DHCPv6 replies that have an invalid
byte at the end, which cannot be parsed as an option code.
networkd currently can cope with the invalid option (it is ignored),
but the whole packet is ignored altogether because of the additional
null at the end.
It's better to be liberal in what we accept and actually assign an
address, given that the reply contains a valid IA_NA.
Daan De Meyer [Thu, 29 Jun 2023 11:35:03 +0000 (13:35 +0200)]
mkosi: Enable Incremental= mode by default
Since mkosi is now smart enough to drop the caches when the list of
packages changes, let's enable Incremental= mode by default to ensure
a good experience for anyone new to hacking on systemd with mkosi.
Yuxiang Zhu [Thu, 29 Jun 2023 10:11:52 +0000 (18:11 +0800)]
network: Add `IgnoreDdontFragment=` option for Fragmentation control (#28131)
From `ip-link(8)`:
> [no]ignore-df - enables/disables IPv4 DF
suppression on this tunnel. Normally datagrams
that exceed the MTU will be fragmented; the
presence of the DF flag inhibits this, resulting
instead in an ICMP Unreachable (Fragmentation
Required) message. Enabling this attribute causes
the DF flag to be ignored.
If this option is enabled for a GRE/GRETAP tunnel, the `DF` flag in the outer IP header
will not inherit the inner IP header's `DF` flag.
This is useful to transfer packets that exceed the MTU of the underlay
network.