Move the renaming function to reboot-util.h (since it writes out
/run/nologin at shutdown), and let's get rid of fileio-label.[ch] now
that it serves no purpose anymore.
fileio: port write_string_file_full() to openat_report_new()
This brings two benefits: we will label the created file only if it is
actually created, and we can correctly delete any file we create again
on failure.
fileio: port write_string_file() to LabelOps, and thus add WRITE_STRING_FILE_LABEL flag
Given that we have the LabelOps abstraction these days, we can teach
write_string_file() to use it, which means we can get rid of
fileio-label.[ch] as a separate concept.
(The only reason that fileio-label.[ch] exists independently of
fileio.[ch] was that the former linekd to libselinux potentially, and
thus had to be in src/shared/ while the other always was in src/basic/.
But the LabelOps vtable provides us with a nice work-around)
fs-util: tweak how openat_report_new() operates when O_CREAT is used on a dangling symlink
One of the big mistakes of Linux is that when you create a file with
open() and O_CREAT and the file already exists as dangling symlink that
the symlink will be followed and the file created that it points to.
This has resulted in many vulnerabilities, and triggered the creation of
the O_MOFOLLOW flag, addressing the problem.
O_NOFOLLOW is less than ideal in many ways, but in particular one: when
actually creating a file it makes sense to set, because it is a problem
to follow final symlinks in that case. But if the file is already
existing, it actually does make sense to follow the symlinks. With
openat_report_new() we distinguish these two cases anyway (the whole
function exists only to distinguish the create and the exists-already
case after all), hence let's do something about this: let's simply never
create files "through symlinks".
This can be implemented very easily: just pass O_NOFOLLOW to the 2nd
openat() call, where we actually create files.
And then basically remove 0dd82dab91eaac5e7b17bd5e9a1e07c6d2b78dca
again, because we don't need to care anymore, we already will see ELOOP
when we touch a symlink.
Note that this change means that openat_report_new() will thus start to
deviate from plain openat() behaviour in this one small detail: when
actually creating files we will *never* follow the symlink. That should
be a systematic improvement of security.
fs-util: always call label post ops in xopenat_full(), in both success and error path
For SELinux it is essential that we reset the file creation label both
in the success and in the error path, hence do so.
Moreover, when calling the label post ops do it if possible with the
opened fd of the inode itself, rather than always going via its path,
simply to reduce the attack surface.
fs-util: don't second guess openat_report_new() return values
If openat_report_new() fails, then 'made_file' will be false, as no file
was created, hence there's no need to skip the unlinkat() explicitly
early, given that we check for 'made_file' anyway in the error path. The
extra error code checks are hence entirely redundant.
label: tweak LabelOps post() hook to take "created" boolean
We have two distinct implementations of the post hook.
1. For SELinux we just reset the selinux label we told the kernel
earlier to use for new inodes.
2. For SMACK we might apply an xattr to the specified file.
The two calls are quite different: the first call we want to call in all
cases (failure or success), the latter only if we actually managed to
create an inode, in which case it is called on the inode.
Daan De Meyer [Tue, 22 Oct 2024 09:12:17 +0000 (11:12 +0200)]
bus-util: Special case when DBUS_SESSION_BUS_ADDRESS is set and XDG_RUNTIME_DIR isn't
We noticed some failures because we have code that connects to user
managers by setting DBUS_SESSION_BUS_ADDRESS without setting XDG_RUNTIME_DIR.
If that's the case, connect to the user session bus instead of the
private manager bus as we can't connect to the latter if XDG_RUNTIME_DIR
is not set.
Ronan Pigott [Thu, 25 Jan 2024 06:49:03 +0000 (23:49 -0700)]
test/fuzz: add dnr packets
The structure of DNR options is considerably more complicated than most
DHCP options, and as a result the fuzzer has poor coverage of these code
paths.
This adds some DNR packets to the fuzzing corpus, not with the intent of
capturing some specific edge case, but with the intent to rapidly
improve the fuzzers' coverage of these codepaths by giving it a valid
example to begin with.
Also include an ndisc router advert with a few Encrypted DNS options,
for the same purpose.
Ronan Pigott [Tue, 16 Jan 2024 07:04:41 +0000 (00:04 -0700)]
network: Serialize DNR servers
Implement serialization/deserialization for DNR servers. This re-uses
the string format in place for user configuration of DoT servers, and as
a consequence non-DoT servers are discarded when recording the link
configuration, for correctness.
This also enables sd-resolved to use these servers as it would other DNS
servers.
Ronan Pigott [Tue, 16 Jan 2024 07:04:07 +0000 (00:04 -0700)]
network: Introduce UseDNR DHCPv4 option
This option will control the use of DNR for choosing DNS servers on the
link. Defaults to the value of UseDNS so that in most cases they will be
toggled together.
Daan De Meyer [Fri, 4 Oct 2024 14:46:16 +0000 (16:46 +0200)]
Rework TEST-86-MULTI-PROFILE-UKI
Now that mkosi supports generating UKI profiles, let's make use of
that to generate the UKI profiles required for the test instead of
doing it within the test itself.
Daan De Meyer [Mon, 21 Oct 2024 13:01:59 +0000 (15:01 +0200)]
TEST-70-TPM2: Disable public key enrollment explicitly
Otherwise, when the test is executed on a system with signed PCRs,
cryptenroll will automatically pick up the public key from the UKI
which results in a volume that can't be unlocked because the pcrextend
tests appends extra things to pcr 11.
varlinkctl: respect $COLUMNS when rebreaking lines and we are not connected to a TTY
Let's provide a mechanism to select the number of screen columns for
rebreaking comments in Varlink IDL connected to a TTY, by honouring the
$COLUMNS env var then too. Previously we'd only honour when connected to
a TTY, but it's also useful otherwise for rebreaking ridiculously long
comments, hence honour it in this case too.
ask-password-api: don't accidentally create a dir, when we don't want one
Previously, we were using touch(), which usually works fine, because the
path should always refer to an existing directory, in which case it just
updates the timestamp. However, if the dir does not exist yet (which
shouldn't happen), it would be created as regular file, which is just
wrong.
Hence, let's instead create the dir as dir if it is missing, and then
update its timestamp.
Daan De Meyer [Fri, 4 Oct 2024 08:22:37 +0000 (10:22 +0200)]
measure: Take SizeOfImage into account as well for .linux section
Same change as https://github.com/systemd/systemd/pull/34583 but for
systemd-measure. Otherwise we end up with PCR policy digest mismatches
as systemd-stub will measure the full virtual size of the kernel image
after it has been loaded while systemd-measure will disregard the extra
size introduced by SizeOfImage.
While ideally the stub would only measure the data that's actually on
disk and not the uninitialized data introduced by VirtualSize > SizeOfRawData,
we want newer systemd-measure to work with older stubs so we have to fix
systemd-measure and can't fix this in the stub.
Ronan Pigott [Sat, 19 Oct 2024 04:10:57 +0000 (21:10 -0700)]
resolved: enable CD bit without DO set
This is useful for a validating resolver to indicate to a non-validating
resolver when checking was disabled for the query. This matches the
behavior of the major public resovlers in response to queries with CD bu
tnot DO set.
Ronan Pigott [Mon, 7 Oct 2024 18:05:18 +0000 (11:05 -0700)]
resolved: authenticate bypass queries
Following 13e15dae9f0b, resolved does not forward the AD bit for bypass
queries, but resolved also didn't do it's own validation, making these
replies appear to never be authentic. We should enable validation for
bypass queries.
Let's disable our own validation when processing a +cd query, and also
ensure that it skips the cache so that we don't accidentally fail to
return inauthentic replies from upstream.
Previously, when we had a bypass transaction without cd, a cached,
authenticated, reply with cd could be served, leaving the cd bit
erroneously set in the reply. Only reply with a CD bit if the client
requested it.
Fixes: 13e15dae9f0b (resolved: clear the AD bit for bypass packets)
Yu Watanabe [Fri, 11 Oct 2024 17:44:22 +0000 (02:44 +0900)]
TEST-55-OOMD: workaround for kernel regression in 6.12-rcX
This ignore failures when running on kernel-6.12-rcX, which has a
regression in the kernel scheduler that breaks PSI.
From https://github.com/systemd/systemd/issues/32730#issuecomment-2415312260
> There is a known scheduler bug in 6.12 that breaks psi. It leaks
> "running tasks" counts, which matches your symptoms of seeing partial
> pressure only.
>
> Do you see "inconsistent task state" warnings in dmesg | grep psi?
>
> A fix is queued in the scheduler tree, should be sent to Linus shortly:
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=c6508124193d42bbc3224571eb75bfa4c1821fbb
Adrian Vovk [Sat, 22 Jun 2024 18:48:50 +0000 (14:48 -0400)]
sysupdate: Introduce optional features
Optional features allow distros to define sets of transfers that can
be enabled or disabled by the system administrator. This is useful for
situations where a distro may want to ship some resources version-locked
to the core OS, but many people have no need for the resource, such as:
development tools/compilers, drivers for specialized hardware, language
packs, etc
We also rename sysupdate.d/*.conf -> sysupdate.d/*.transfer, because
now there are more than one type of definition in sysupdate.d/. For
backwards compat, we still load *.conf files as long as no *.transfer
files are found and the *.conf files don't try to declare themselves
as part of any features
Luca Boccassi [Fri, 18 Oct 2024 14:02:03 +0000 (15:02 +0100)]
test: customize /etc/os-release instead of /usr/lib/os-release
As per spec image builders can create a local /etc/os-release
with per-image IDs, so modify that one instead of the original
one in /usr/lib. For example we do this when we build debian
unstable images in mkosi.
Adrian Vovk [Wed, 4 Sep 2024 17:44:26 +0000 (13:44 -0400)]
GREEDY_REALLOC_APPEND: Make more type safe
Previously, GREEDY_REALLOC_APPEND would compile perfectly fine and cause
subtle memory corruption if the caller messes up the type they're passing
in (i.e. by forgetting to pass-by-reference when appending a Type* to an
array of Type*). Now this will lead to compilation failure
pid1: close fds we receive via sd_notify() and cannot make use of asynchronously
This addresses #11112 fully. It mostly was addressed by 99620f457ed0886852ba18c9093b59767299121c already, but for fds not
even passed to the fdstore, this adds the missing asynchronous close
codepath.
Ryan Wilson [Tue, 15 Oct 2024 03:49:54 +0000 (20:49 -0700)]
cgroup: Add ManagedOOMMemoryPressureDurationSec= override setting for units
This will allow units (scopes/slices/services) to override the default
systemd-oomd setting DefaultMemoryPressureDurationSec=.
The semantics of ManagedOOMMemoryPressureDurationSec= are:
- If >= 1 second, overrides DefaultMemoryPressureDurationSec= from oomd.conf
- If is empty, uses DefaultMemoryPressureDurationSec= from oomd.conf
- Ignored if ManagedOOMMemoryPressure= is not "kill"
- Disallowed if < 1 second
Note the corresponding dbus property is DefaultMemoryPressureDurationUSec
which is in microseconds. This is consistent with other time-based
dbus properties.
Ryan Wilson [Wed, 16 Oct 2024 17:40:30 +0000 (10:40 -0700)]
oomd: Refactor DefaultMemoryPressureDurationSec= to use conf parser
Parsing DefaultMemoryPressureDurationSec= is currently split between
conf parser, main() and manager_start() methods. This commit centralizes
parsing and bounds checking logic within a single custom conf parser
function.