Ronan Pigott [Fri, 22 Dec 2023 04:50:45 +0000 (21:50 -0700)]
resolved: add transaction result for upstream failures
This new transaction result is emitted when the upstream server
indicates a fatal error that we will not try to recover from.
Currently, it is emitted when a validating recursive resolver reports an
error validating DNSSEC records for a domain. The extended error message
should help give the admin some context.
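To illustrate the classification, here is a minimal sketch, assuming the upstream reply carries an RFC 8914 Extended DNS Error (EDE) option. The info codes 5 through 12 are the DNSSEC validation failures defined in that RFC. The `DnsReplyInfo` struct and `is_upstream_dnssec_failure()` are hypothetical and are not resolved's actual types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DNS_RCODE_SERVFAIL 2

/* DNSSEC-related Extended DNS Error info codes (RFC 8914). */
enum {
        EDE_DNSSEC_INDETERMINATE    = 5,
        EDE_DNSSEC_BOGUS            = 6,
        EDE_SIGNATURE_EXPIRED       = 7,
        EDE_SIGNATURE_NOT_YET_VALID = 8,
        EDE_DNSKEY_MISSING          = 9,
        EDE_RRSIGS_MISSING          = 10,
        EDE_NO_ZONE_KEY_BIT_SET     = 11,
        EDE_NSEC_MISSING            = 12,
};

/* Hypothetical summary of an upstream reply. */
typedef struct DnsReplyInfo {
        int rcode;
        bool have_ede;       /* reply carried an EDE option */
        uint16_t ede_code;   /* EDE INFO-CODE, valid only if have_ede */
} DnsReplyInfo;

/* True if a validating upstream resolver reported a DNSSEC validation
 * failure, i.e. a fatal error to surface rather than retry. */
static bool is_upstream_dnssec_failure(const DnsReplyInfo *r) {
        assert(r);

        if (r->rcode != DNS_RCODE_SERVFAIL || !r->have_ede)
                return false;

        return r->ede_code >= EDE_DNSSEC_INDETERMINATE &&
               r->ede_code <= EDE_NSEC_MISSING;
}
```

A SERVFAIL without an EDE option is not classified as a fatal upstream failure here, since nothing tells us why the upstream server gave up.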
Ronan Pigott [Wed, 20 Dec 2023 22:16:41 +0000 (15:16 -0700)]
resolved: delay server feature detection
Some fields of the DnsPacket, like p->opt, are not populated until we
extract an answer, despite being referenced by macros like
DNS_PACKET_RCODE. We can reorder some of the basic checks to follow
dns_packet_extract().
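Why p->opt matters: per RFC 6891, the 4-bit RCODE in the DNS header is only the low part of the response code, and the upper 8 bits live in the TTL field of the OPT pseudo-RR. The sketch below combines the two. `PacketView` and `packet_rcode()` are hypothetical stand-ins, not the real DnsPacket or DNS_PACKET_RCODE definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, minimal view of a parsed DNS packet. */
typedef struct PacketView {
        uint16_t header_flags;   /* QR|Opcode|AA|TC|RD|RA|Z|AD|CD|RCODE */
        bool has_opt;            /* only known once the packet is extracted */
        uint32_t opt_ttl;        /* TTL field of the OPT pseudo-RR */
} PacketView;

/* Full 12-bit RCODE: upper 8 bits from the OPT record (if present),
 * lower 4 bits from the header. */
static uint16_t packet_rcode(const PacketView *p) {
        assert(p);

        uint16_t rcode = p->header_flags & 0x000F;
        if (p->has_opt)
                rcode |= (uint16_t) (((p->opt_ttl >> 24) & 0xFF) << 4);
        return rcode;
}
```

With `has_opt` unset, as before extraction, an extended code such as BADVERS (16) silently collapses to NOERROR, which is why the checks have to run after dns_packet_extract().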
Rose [Tue, 2 Jan 2024 15:13:27 +0000 (10:13 -0500)]
basic: fix overflow detection in sigbus_pop
The current check tests whether n_sigbus_queue is greater than or equal
to SIGBUS_QUEUE_MAX, when it should only test for greater than:
n_sigbus_queue being equal to SIGBUS_QUEUE_MAX indicates that the queue
is full, not that it has overflowed.
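A simplified, single-threaded sketch of the intended semantics: when the queue is full, the push side moves the counter past SIGBUS_QUEUE_MAX to record that entries were lost, so only a value strictly greater than the maximum signals overflow. The real code uses lock-free atomics; the queue size of 64 and the plain counter are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SIGBUS_QUEUE_MAX 64   /* assumed size, for illustration */

static void *sigbus_queue[SIGBUS_QUEUE_MAX];
static unsigned n_sigbus_queue = 0;

/* Push side: if the queue is full, move the counter past
 * SIGBUS_QUEUE_MAX to record that an entry was lost. */
static void sigbus_push(void *addr) {
        assert(addr);

        if (n_sigbus_queue < SIGBUS_QUEUE_MAX) {
                sigbus_queue[n_sigbus_queue++] = addr;
                return;
        }

        n_sigbus_queue = SIGBUS_QUEUE_MAX + 1;
}

/* Returns 1 and stores an entry in *ret, 0 if the queue is empty, or
 * -EOVERFLOW if entries were lost. */
static int sigbus_pop(void **ret) {
        assert(ret);

        if (n_sigbus_queue == 0)
                return 0;

        /* Exactly SIGBUS_QUEUE_MAX entries means "full", not "overflowed",
         * hence ">" rather than ">=". */
        if (n_sigbus_queue > SIGBUS_QUEUE_MAX)
                return -EOVERFLOW;

        *ret = sigbus_queue[--n_sigbus_queue];
        return 1;
}
```

With the old `>=` comparison, popping from a queue holding exactly SIGBUS_QUEUE_MAX entries would wrongly report -EOVERFLOW.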
test: temporarily adjust the default mount rate limit
(Hopefully) a temporary workaround for #30573 where starting a user
session when PID 1 is rate limited stalls even after it leaves the rate
limited state:
[ 11.658201] H systemd[1]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=UnitRemoved cookie=4208 reply_cookie=0 signature=so error-name=n/a error-mes>
[ 11.658233] H systemd[1]: Event source 0x559babdd8bb0 (mount-monitor-dispatch) left rate limit state.
[ 101.562697] H busctl[784]: Failed to get credentials: Transport endpoint is not connected
[ 101.563480] H systemd[1]: systemd-journald.service: Got notification message from PID 300 (WATCHDOG=1)
[ 101.563725] H testsuite-74.sh[784]: BusAddress=unixexec:path=systemd-run,argv1=-M.host,argv2=-PGq,argv3=--wait,argv4=-pUser%3dtestuser,argv5=-pPAMName%3dlogin,argv6=systemd-stdio-bridge,argv7=-punix:path%3d%24%7bXDG_RUNTIME_DIR%7d/bus
[ 101.564136] H systemd[1]: Successfully forked off '(sd-expire)' as PID 787.
[ 101.564754] H systemd[1]: Successfully forked off '(sd-expire)' as PID 788.
[ 101.564831] H testsuite-74.sh[381]: + echo 'Subtest /usr/lib/systemd/tests/testdata/units/testsuite-74.busctl.sh failed'
The issue appeared after ee07fff03b, which does a bunch of mounts/umounts
that get PID 1 into a rate-limited state, and it is frequent enough to be
annoying, so let's temporarily bump the rate limit to alleviate that.
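The log above shows an event source entering and leaving a rate-limited state. A minimal interval/burst limiter in that spirit allows at most `burst` events per `interval` and refuses further events until the window expires. `RateLimitSketch` and `ratelimit_sketch_below()` are hypothetical, take the current time as a parameter for testability, and are not systemd's RateLimit API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t usec_t;

/* Hypothetical interval/burst limiter. */
typedef struct RateLimitSketch {
        usec_t interval;   /* window length */
        unsigned burst;    /* events allowed per window */
        unsigned num;      /* events seen in the current window */
        usec_t begin;      /* start of the current window, 0 = none yet */
} RateLimitSketch;

/* Records one event at time now; returns true while below the limit. */
static bool ratelimit_sketch_below(RateLimitSketch *r, usec_t now) {
        assert(r);
        assert(r->interval > 0 && r->burst > 0);

        if (r->begin == 0 || now - r->begin >= r->interval) {
                /* Window expired (or never started): open a new one. */
                r->begin = now;
                r->num = 1;
                return true;
        }

        if (r->num < r->burst) {
                r->num++;
                return true;
        }

        return false;   /* rate limited until the window expires */
}
```

Raising `burst` is what lets a longer run of mount/umount events through before the limit trips.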
When KeepCarrier is set, networkd doesn't close the tun/tap file
descriptor, preserving the active interface state, but it doesn't disable
its queue either, which makes the kernel think that the queue is still
active and send packets to it.
This patch disables the created queue right after the tun/tap interface
is created.
shared: add new "vpick" concept for ".v/" directories that contain versioned resources
This adds a new concept for handling paths. At appropriate places, if a
path such as /foo/bar/baz.v/ is specified, we'll
automatically enumerate all entries in /foo/bar/baz.v/baz* and then
do a version sort and pick the newest file.
A slightly more complex syntax is available, too:
/foo/bar/baz.v/quux___waldo
If that's used, then we'll look for all files matching
/foo/bar/baz.v/quux*waldo, split out the middle part, version-sort it,
and pick the newest.
The ___ wildcard stands for both a version string and, if needed, an
architecture ID, in case per-arch entries shall be supported.
This is a very simple way to maintain versioned resources in a dir, and
make systemd's components automatically pick the newest. Example:
Then it will automatically pick
/srv/myimages.v/foobar_1.33.45_x86-64.raw as the version to boot on
x86-64, and /srv/myimages.v/foobar_1.31.5_arm64.raw on arm64.
This commit only adds the basic implementation for picking files from a
dir; it is not hooked up anywhere yet.
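The selection step for the quux___waldo form can be sketched as follows: keep the names that start with the prefix and end with the suffix, split out the middle, and keep the one with the highest version. `pick_newest()`, `split_middle()` and the simplified `version_cmp()` are hypothetical. systemd has its own version comparison, and the architecture part of the ___ wildcard is left out here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified version comparison (a stand-in): runs of digits compare
 * numerically, everything else byte by byte. */
static int version_cmp(const char *a, const char *b) {
        while (*a && *b) {
                if (isdigit((unsigned char) *a) && isdigit((unsigned char) *b)) {
                        unsigned long long x = 0, y = 0;

                        while (isdigit((unsigned char) *a))
                                x = x * 10 + (unsigned) (*a++ - '0');
                        while (isdigit((unsigned char) *b))
                                y = y * 10 + (unsigned) (*b++ - '0');
                        if (x != y)
                                return x < y ? -1 : 1;
                        continue;
                }
                if (*a != *b)
                        return (unsigned char) *a < (unsigned char) *b ? -1 : 1;
                a++;
                b++;
        }
        return (*a != '\0') - (*b != '\0');
}

/* If name is prefix + <middle> + suffix with a non-empty middle, copy the
 * middle into buf and return true. */
static bool split_middle(const char *name, const char *prefix, const char *suffix,
                         char *buf, size_t bufsz) {
        size_t n = strlen(name), p = strlen(prefix), s = strlen(suffix);

        if (n <= p + s || strncmp(name, prefix, p) != 0 || strcmp(name + n - s, suffix) != 0)
                return false;
        if (n - p - s >= bufsz)
                return false;

        memcpy(buf, name + p, n - p - s);
        buf[n - p - s] = '\0';
        return true;
}

/* Among names matching prefix*suffix, return the one with the highest
 * version in the middle, or NULL if none matches. */
static const char *pick_newest(const char *const *names, size_t n_names,
                               const char *prefix, const char *suffix) {
        const char *best = NULL;
        char best_ver[256] = "", ver[256];

        assert(names || n_names == 0);
        assert(prefix && suffix);

        for (size_t i = 0; i < n_names; i++) {
                if (!split_middle(names[i], prefix, suffix, ver, sizeof(ver)))
                        continue;
                if (!best || version_cmp(ver, best_ver) > 0) {
                        best = names[i];
                        memcpy(best_ver, ver, sizeof(ver));
                }
        }
        return best;
}
```

A version sort rather than a plain string sort matters here: lexically "1.10" sorts before "1.9", but as versions it is newer.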
tpm2-util: handle TPMs gracefully that do not support ECC and return TPM2_RC_VALUES
If a TPM doesn't support ECC, it may either return zero curves when
asked for them, or simply fail with TPM2_RC_VALUES because it doesn't
recognize the capability at all.
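Both outcomes can be folded into a single "not supported" answer, as in this sketch. The `EccCapabilityResult` struct is a hypothetical summary of asking the TPM for TPM2_CAP_ECC_CURVES and does not correspond to a tpm2-util type.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a TPM2_CAP_ECC_CURVES capability query. */
typedef struct EccCapabilityResult {
        bool failed;         /* the query returned an error */
        bool rc_is_values;   /* ...and that error was TPM2_RC_VALUES */
        size_t n_curves;     /* number of curves reported on success */
} EccCapabilityResult;

/* 1 if ECC is supported, 0 if not, negative errno on a real failure. */
static int tpm2_ecc_supported(const EccCapabilityResult *r) {
        assert(r);

        if (r->failed)
                /* TPM2_RC_VALUES: the capability isn't recognized, i.e. no ECC. */
                return r->rc_is_values ? 0 : -EIO;

        /* Success with zero curves also means no ECC. */
        return r->n_curves > 0;
}
```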
Michal Sekletar [Mon, 30 Oct 2023 11:08:59 +0000 (12:08 +0100)]
core/manager: add dbus API to create auxiliary scope from running service
This commit introduces a new D-Bus API, StartAuxiliaryScope(). It may be
used by services as part of their restart procedure. The service sends an
array of PID file descriptors corresponding to processes that are part of
the service and must keep running after the service restarts, i.e. they
haven't yet finished the job they were spawned for in the first place
(e.g. a long-running video transcoding job). systemd creates a new scope
unit for these processes and migrates them into it. The cgroup properties
of the scope are copied from the service, so it retains the same cgroup
settings and limits the service had.
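The caller passes PID file descriptors rather than plain PIDs. Collecting them with pidfd_open(2) (Linux 5.3 and later) could look like the sketch below. `collect_pidfds()` is hypothetical, and the D-Bus call itself is not shown since its exact signature is not described here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434   /* generic syscall number; assumption for old headers */
#endif

/* Opens one pidfd per PID into ret_fds. On failure, closes the fds opened
 * so far and returns a negative errno; returns 0 on success. */
static int collect_pidfds(const pid_t *pids, size_t n, int *ret_fds) {
        assert(pids || n == 0);
        assert(ret_fds || n == 0);

        for (size_t i = 0; i < n; i++) {
                int fd = (int) syscall(SYS_pidfd_open, pids[i], 0);
                if (fd < 0) {
                        int r = -errno;

                        while (i > 0)
                                close(ret_fds[--i]);
                        return r;
                }
                ret_fds[i] = fd;
        }

        return 0;
}
```

Unlike a PID, a pidfd cannot silently start referring to a different process if the original one exits and its PID is reused before the manager handles the request.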
units: add a tpm2.target synchronization point and small generator that pulls in
Distributions apparently only compile a subset of TPM2 drivers into the
kernel. For those not compiled in but provided as kmods, we need a
synchronization point: before the first TPM2 interaction we must wait
until the driver is available and accessible.
This adds a tpm2.target unit as such a synchronization point. It's
ordered after /dev/tpmrm0, and is pulled in by a generator whenever we
detect that the kernel reported a TPM2 to exist but we have no device
for it yet.
This should solve the issue, but might create problems: if there are TPM
devices supported by firmware that we don't have Linux drivers for, we'll
hang for a bit. Hence, let's add a kernel cmdline switch to disable (or
alternatively force) this logic.
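The generator's decision, including the override switch, boils down to a small predicate. This is a sketch under assumptions: the tri-state encoding of the switch (-1 unset, 0 disable, 1 force) and `tpm2_should_wait()` are hypothetical. How the kernel's TPM2 report and the /dev/tpmrm0 check are obtained is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical generator decision: should tpm2.target be pulled in? */
static bool tpm2_should_wait(int cmdline_switch, bool kernel_reports_tpm2, bool have_tpmrm0) {
        assert(cmdline_switch >= -1 && cmdline_switch <= 1);

        /* An explicit kernel command line setting wins: 0 disables, 1 forces. */
        if (cmdline_switch >= 0)
                return cmdline_switch == 1;

        /* Device node already there: nothing to wait for. */
        if (have_tpmrm0)
                return false;

        /* TPM2 announced but no device yet: its driver may still be loading. */
        return kernel_reports_tpm2;
}
```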
Michal Koutný [Mon, 14 Aug 2023 17:59:57 +0000 (19:59 +0200)]
cgroup: Restrict effective limits with global resource provision
The global resource (the whole system, or the root cgroup's resources,
e.g. in a container) is also a well-defined limit for memory and tasks,
so take it into account when calculating effective limits.
Michal Koutný [Fri, 11 Aug 2023 11:51:20 +0000 (13:51 +0200)]
cgroup: Add EffectiveMemoryMax=, EffectiveMemoryHigh= and EffectiveTasksMax= properties
Users become perplexed when they run their workload in a unit with no
explicit limits configured (listing the limit property would even show
infinity), yet they experience unexpected resource limitations.
The memory and PID limits are the most visible ones, therefore add new
read-only unit properties:
- EffectiveMemoryMax=,
- EffectiveMemoryHigh=,
- EffectiveTasksMax=.
These properties represent the most stringent limit systemd is aware of
for the given unit -- and that is typically(*) the effective value.
Implement the properties by simply traversing all parents in the
leaf-slice tree and picking the minimum value. Note that effective
limits are thus defined even for units that don't enable explicit
accounting (because of the hierarchy).
(*) The exception is when systemd runs in a cgroup namespace and cannot
reason about the outer setup. A complete solution would need kernel
support.
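A minimal sketch of that traversal, also capping the result by the global resource as described in the previous commit. `UnitNode`, `effective_memory_max()` and the use of UINT64_MAX for infinity are assumptions for illustration, not systemd's unit structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LIMIT_INFINITY UINT64_MAX

/* Hypothetical unit node in the slice tree. */
typedef struct UnitNode {
        const struct UnitNode *parent;   /* NULL for the root slice */
        uint64_t memory_max;             /* LIMIT_INFINITY if unset */
} UnitNode;

/* Most stringent limit on the path from u up to the root, further capped
 * by the global resource (e.g. total RAM, or the root cgroup's limit). */
static uint64_t effective_memory_max(const UnitNode *u, uint64_t global) {
        uint64_t m = global;

        assert(u);

        for (; u; u = u->parent)
                if (u->memory_max < m)
                        m = u->memory_max;

        return m;
}
```

A unit with no limit of its own (infinity) still gets a finite effective value as soon as any ancestor or the global resource is finite.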
Yu Watanabe [Tue, 2 Jan 2024 19:36:47 +0000 (04:36 +0900)]
resolve/mdns: do not append goodbye packet entries to known answers section
When we receive a goodbye packet about a host and we have a cache entry
for the host, we do not remove the cache entry immediately, but update it
with TTL 1. See RFC 6762 section 10.1 and
3755027c2cada70345c96787a9b5569994dd23ed.
If we receive a request soon after the goodbye packet, previously the
entry was included in the known answers section of the reply. Such
information should not be appended.
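The fix amounts to skipping such lingering entries when building the known answers list. In this sketch the `goodbye` flag on the hypothetical `CacheEntry` marks entries kept at TTL 1 after a goodbye packet. This is not resolved's cache structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache entry. */
typedef struct CacheEntry {
        const char *name;
        uint32_t ttl;
        bool goodbye;   /* goodbye (TTL 0) received; entry kept with TTL 1 */
} CacheEntry;

/* Collects entries eligible for the known answers section into out,
 * skipping those that only linger because of a goodbye packet.
 * Returns the number of entries stored. */
static size_t collect_known_answers(const CacheEntry *entries, size_t n,
                                    const CacheEntry **out, size_t out_size) {
        size_t k = 0;

        assert(entries || n == 0);
        assert(out || out_size == 0);

        for (size_t i = 0; i < n && k < out_size; i++) {
                if (entries[i].goodbye)
                        continue;
                out[k++] = &entries[i];
        }
        return k;
}
```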
This does for configuration extensions what we already do for system
extensions.
This is complicated by the fact that we previously looked for
<uki-binary>.d/*.raw for system extensions. We want to measure sysexts
and confexts into different PCRs (13 vs. 12), hence we must distinguish
them, but *.raw would match both kinds.
This commit solves this via the following mechanism: we'll load confexts
from *.confext.raw and sysexts from *.raw, but will then exclude
*.confext.raw from the latter. This preserves compatibility but allows
us to reasonably distinguish both types of images.
The documentation is updated, but does not go into this detail; instead
it now claims that sysexts shall be *.sysext.raw and confexts
*.confext.raw, even though we are actually more lenient than this. This
is simply to push people towards using the longer, more descriptive
suffixes.
I added an XML comment (<!-- … -->) about this to the docs, so that
whoever notices the difference between code and docs understands why and
leaves it that way.
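The matching rule can be written as a short classifier. `classify_extension()`, `extension_pcr()` and `endswith()` are hypothetical names; the PCR numbers follow the commit message (13 for sysexts, 12 for confexts).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef enum ExtKind { EXT_NONE, EXT_SYSEXT, EXT_CONFEXT } ExtKind;

static bool endswith(const char *s, const char *suffix) {
        size_t n = strlen(s), m = strlen(suffix);
        return n >= m && strcmp(s + n - m, suffix) == 0;
}

/* *.confext.raw is a confext; any other *.raw (including *.sysext.raw)
 * is a sysext, which keeps older <uki-binary>.d/foo.raw names working. */
static ExtKind classify_extension(const char *filename) {
        assert(filename);

        if (endswith(filename, ".confext.raw"))
                return EXT_CONFEXT;
        if (endswith(filename, ".raw"))
                return EXT_SYSEXT;
        return EXT_NONE;
}

/* PCR each kind is measured into. */
static int extension_pcr(ExtKind k) {
        switch (k) {
        case EXT_SYSEXT:
                return 13;
        case EXT_CONFEXT:
                return 12;
        default:
                return -1;
        }
}
```

Checking the more specific suffix first is what implements "exclude *.confext.raw from the sysext pattern".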
Yu Watanabe [Tue, 2 Jan 2024 19:19:33 +0000 (04:19 +0900)]
network/queue: fix potential double-free on oom
Currently, link_queue_request_safe(), which is a wrapper of
request_new(), is called with a free function in
- link_request_stacked_netdev() in netdev/netdev.c,
- link_request_address() in networkd-address.c,
- link_request_nexthop() in networkd-nexthop.c,
- link_request_neighbor() in networkd-neighbor.c.
For the netdev case, the reference counter of the passed object is
increased only when the function returns 1. So, on failure (with
-ENOMEM), we previously dropped the reference of the NetDev object
unexpectedly.
Similarly, for Address and friends, the ownership of the object is moved
to the Request object only when the function returns 1. On failure, the
object was previously freed twice.
Also, netdev_queue_request(), which is another wrapper of request_new(),
potentially leaks memory when the same NetDev object is queued twice.
Fortunately, that should not happen, as the function is called only once
per object.
This fixes the above issues: the ownership or the reference counter of
the object is now changed only when the function succeeds with 1.
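The ownership rule the fix establishes can be sketched as follows: the constructor only takes over the object when it returns 1 and never frees it on failure, so the caller frees it exactly once. All names here (`link_request_address_sketch()`, the `simulate_oom` knob, the simplified `Request`) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct Address { int family; } Address;   /* hypothetical payload */

typedef struct Request {
        void *userdata;
        void (*free_func)(void *);
} Request;

static bool simulate_oom = false;   /* hypothetical test knob */
static unsigned n_freed = 0;        /* counts frees of real objects */

static void address_free(void *p) {
        if (!p)
                return;
        n_freed++;
        free(p);
}

/* Takes ownership of userdata only when returning 1. On failure it must
 * NOT call free_func: the caller still owns userdata. */
static int request_new(void *userdata, void (*free_func)(void *), Request **ret) {
        Request *req;

        assert(userdata);
        assert(ret);

        req = simulate_oom ? NULL : calloc(1, sizeof(*req));
        if (!req)
                return -ENOMEM;

        req->userdata = userdata;
        req->free_func = free_func;
        *ret = req;
        return 1;
}

static void request_free(Request *req) {
        if (!req)
                return;
        if (req->free_func)
                req->free_func(req->userdata);
        free(req);
}

/* Caller side: free the object exactly once, and only if the request did
 * not take it over. */
static int link_request_address_sketch(Address *a, Request **ret) {
        int r = request_new(a, address_free, ret);
        if (r < 0)
                address_free(a);
        return r;
}
```

If request_new() also called free_func on the -ENOMEM path, the object would be freed twice, which is the bug described above.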
Rewrite the test in bash and make it part of our integration test suite,
so it's actually executed in all our upstream CI environments.
The original test is flaky in environments where daemon-reload might
occur during the test runtime (e.g. when running the test in parallel
with the systemd-networkd test suite). Also, it was run only in CentOS
CI in a limited way (i.e. without sanitizers), since it tests the host's
systemd instead of the just-built one.
If an address is requested and the request has already been processed,
we may not have received its reply and notification from the kernel yet,
so the corresponding address object may not be remembered. Even in such
a case, we need to remove the address; otherwise the address will appear
later, after the function has been called.
Frantisek Sumsal [Thu, 28 Dec 2023 16:12:24 +0000 (17:12 +0100)]
coccinelle: drop a couple of FIXMEs
Turns out Coccinelle can handle compound literals just fine; the parsing
errors were caused by incorrectly parsed macros in code before the
literals, so let's just provide simplified versions of such macros.
The parsing error in `Type *foo[ELEMENTSOF(bar)] = {};` is actually
harmless: it occurs only when creating an array of pointers for a type
that's in an external header, and only on the parser's first pass;
subsequent passes resolve the type correctly.
Also, unset ENABLE_DEBUG_HASHMAP, so that Coccinelle doesn't expand the
hashmap debug macros.
As for the remaining FIXMEs, I opened a couple of issues in the
Coccinelle upstream to see if they can be fixed there (or at least
properly analyzed).
efi-loader: when detecting if we are booted in UKI measured boot mode, imply a check for TPM2
We simply don't carry any userspace support for TPM1.2 in our tree, and
we shouldn't, given it's too weak by today's standards. Hence, when we
check whether we are booted in UKI measured boot mode, don't just check
if we are booted via EFI, but also check that we have a TPM2 chip (as
opposed to none or only a TPM1.2 chip).
This is an alternative to #30652 but more comprehensive (and simpler),
since it covers all invocations of efi_measured_uki().
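Conceptually the check becomes a conjunction in which a TPM1.2 chip counts the same as no TPM at all. In this sketch `BootState`, `TpmKind` and `measured_uki_sketch()` are hypothetical, and `uki_measured` stands in for whatever else efi_measured_uki() inspects.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum TpmKind { TPM_NONE, TPM_1_2, TPM_2_0 } TpmKind;   /* hypothetical */

typedef struct BootState {
        bool efi_booted;
        bool uki_measured;   /* stand-in for the other conditions checked */
        TpmKind tpm;
} BootState;

static bool measured_uki_sketch(const BootState *s) {
        assert(s);

        if (!s->efi_booted)
                return false;

        /* No userspace TPM1.2 support: treat it like having no TPM at all. */
        if (s->tpm != TPM_2_0)
                return false;

        return s->uki_measured;
}
```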
Yu Watanabe [Fri, 8 Dec 2023 07:01:06 +0000 (16:01 +0900)]
sd-journal: introduce cleanup function and hash ops for Directory
This does the following:
- Each Directory object now has a reference to sd-journal.
- Hence, directory_free(), which is renamed from remove_directory(), can
be called without sd-journal as an argument.
- Introduce hash ops for Directory, so the finalization becomes
slightly simpler.
- Allocate hashmaps that store Directory objects when necessary.
- Split out add_directory_impl().
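The first two points, a back-reference plus a free function that no longer needs the journal, can be sketched with the GCC/Clang cleanup attribute. `Journal`, `directory_new()` and the `n_directories` counter are hypothetical simplifications, and the hash ops are not shown.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Journal {
        unsigned n_directories;   /* hypothetical bookkeeping */
} Journal;

/* Each Directory keeps a back-reference to its journal, so freeing it no
 * longer needs the journal as a separate argument. */
typedef struct Directory {
        Journal *journal;
        char *path;
} Directory;

static Directory *directory_free(Directory *d) {
        if (!d)
                return NULL;
        if (d->journal)
                d->journal->n_directories--;
        free(d->path);
        free(d);
        return NULL;
}

/* Cleanup helper for the GCC/Clang cleanup attribute. */
static void directory_freep(Directory **d) {
        directory_free(*d);
}

static Directory *directory_new(Journal *j, const char *path) {
        Directory *d;
        size_t len;

        assert(j);
        assert(path);

        d = calloc(1, sizeof(*d));
        if (!d)
                return NULL;

        len = strlen(path) + 1;
        d->path = malloc(len);
        if (!d->path) {
                free(d);
                return NULL;
        }
        memcpy(d->path, path, len);

        d->journal = j;
        j->n_directories++;
        return d;
}
```

Because directory_free() has the single-argument shape, the same function can serve as a cleanup helper and as the free callback of a hash ops table.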
Mike Yuan [Mon, 1 Jan 2024 12:08:11 +0000 (20:08 +0800)]
logind: use handle_action_to_string where appropriate
Since 138224fc807091d31f19a3b22f066d6044626001, HandleActionData
records the corresponding HandleAction. Let's use it instead of
relying on inhibit_what when mapping to string.
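Recording the action in the data avoids a lossy mapping: several actions share one inhibitor category, so inhibit_what alone cannot tell poweroff from reboot. The enum below is a subset of HandleAction whose strings match values accepted by logind.conf's Handle*= settings. `HandleActionData`'s fields and the `InhibitWhat` values are simplified assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A subset of logind's HandleAction values, for illustration. */
typedef enum HandleAction {
        HANDLE_IGNORE,
        HANDLE_POWEROFF,
        HANDLE_REBOOT,
        HANDLE_HALT,
        HANDLE_SUSPEND,
        HANDLE_HIBERNATE,
        _HANDLE_ACTION_MAX,
        _HANDLE_ACTION_INVALID = -1,
} HandleAction;

/* Hypothetical inhibitor categories: several actions share one. */
typedef enum InhibitWhat {
        INHIBIT_NONE     = 0,
        INHIBIT_SHUTDOWN = 1,
        INHIBIT_SLEEP    = 2,
} InhibitWhat;

/* Hypothetical per-action data that records its own HandleAction. */
typedef struct HandleActionData {
        HandleAction handle;
        InhibitWhat inhibit_what;
} HandleActionData;

static const char *const handle_action_table[_HANDLE_ACTION_MAX] = {
        [HANDLE_IGNORE]    = "ignore",
        [HANDLE_POWEROFF]  = "poweroff",
        [HANDLE_REBOOT]    = "reboot",
        [HANDLE_HALT]      = "halt",
        [HANDLE_SUSPEND]   = "suspend",
        [HANDLE_HIBERNATE] = "hibernate",
};

static const char *handle_action_to_string(HandleAction a) {
        if (a < 0 || a >= _HANDLE_ACTION_MAX)
                return NULL;
        return handle_action_table[a];
}

static HandleAction handle_action_from_string(const char *s) {
        assert(s);

        for (size_t i = 0; i < _HANDLE_ACTION_MAX; i++)
                if (strcmp(handle_action_table[i], s) == 0)
                        return (HandleAction) i;

        return _HANDLE_ACTION_INVALID;
}
```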