core: move pid watch/unwatch logic of the service manager to pidfd
This makes sure unit_watch_pid() and unit_unwatch_pid() will track
processes by pidfd if supported. Also ports over some related code.
Should not really change behaviour.
Note that this does *not* add support waiting for POLLIN on the pidfds
as additional exit notification. This is left for a later commit (this
commit is already large enough), in particular as that would add new
logic and not just convert existing logic.
This new helper can be used after reading process info from procfs, to
verify that the data that was just read actually matches the pidfd, and
does not belong to some new process that just reused the numeric PID of
the process we originally pinned.
pidref: add helpers for managing PidRef on the heap
Usually we want to embed PidRef in other structures, but sometimes it
makes sense to allocate it on the heap in case it should be used
standalone. Add helpers for that.
Primary usecase: use as key in Hashmap objects, that for example map
process to unit objects in PID 1.
This adds pidref_free()/pidref_freep() for freeing such an allocated
struct, as well as pidref_dup() (for duplicating an existing PidRef
on the heap 1:1), and pidref_new_pid() (for allocating a new PidRef from a
PID).
This helper truns a pid_t into a PidRef. It's different from
pidref_set_pid() in being "passive", i.e. it does not attempt to acquire
a pidfd for the pid.
This is useful when using the PidRef as a lookup key that shall also
work after a process is already dead, and hence no conversion to a pidfd
is possible anymore.
resolved: register ipv4only.arpa are private domain
From RFC 8880:
Because the 'ipv4only.arpa' zone has to be an insecure delegation,
DNSSEC cannot be used to protect these answers from tampering by
malicious devices on the path.
Consequently, the 'ipv4only.arpa' zone MUST be an insecure delegation to
give DNS64/NAT64 gateways the freedom to synthesize answers to those
queries at will, without the answers being rejected by DNSSEC-capable
resolvers. DNSSEC-capable resolvers that follow this specification MUST
NOT attempt to validate answers received in response to queries for the
IPv6 AAAA address records for 'ipv4only.arpa'. Note that the name
'ipv4only.arpa' has no use outside of being used for this special DNS
pseudo-query used to learn the DNS64/NAT64 address synthesis prefix, so
the lack of DNSSEC security for that name is not a problem.
There's no way for us to wait for specific virtiofs tags to appear,
so we have to try and make sure that the tags are all available by
the time we try to mount any virtiofs tag. Let's try to do that by
loading the necessary modules as early as we can.
Mike Yuan [Wed, 27 Sep 2023 10:38:10 +0000 (18:38 +0800)]
core: mark units as need daemon-reload if unit file operations are
performed
systemctl would issue daemon-reload after unit file operations
(enable/disable/preset/...) succeed. However, such operations
are not atomic, meaning that the unit file state could still change
even if the operation generally fails, and the unit_file_state
cached by manager becomes outdated.
pid1: add SurviveFinalKillSignal= to skip units on final sigterm/sigkill spree
Add a new boolean for units, SurviveFinalKillSignal=yes/no. Units that
set it will not have their process receive the final sigterm/sigkill in
the shutdown phase.
This is implemented by checking if a process is part of a cgroup marked
with a user.survive_final_kill_signal xattr (or a trusted xattr if we
can't set a user one, which were added only in kernel v5.7 and are not
supported in CentOS 8).
exec-util: print executed commands in do_execute()
kernel-install uses do_execute(). We would log whenever a spawned child
finished, but we would not log anything when the child is launched. When the
children log output without a prefix (as the kernel-install plugins do), it
is hard to see where that output is coming from.
For us, this is a compatibility mode, but most likely it is there to stay: the
kernel Makefile's install target expects to be able to call /bin/installkernel.
We want people who build their own kernels to use this, so that they use
kernel-install and get support for all the functionality provided by it,
including building of UKIs and other new features. So let's actually advertise
that this exists and works.
Because names beneath .alt are in an alternative namespace, they have no
significance in the regular DNS context. DNS stub and recursive
resolvers do not need to look them up in the DNS context.
Valentin David [Sun, 24 Sep 2023 12:35:59 +0000 (14:35 +0200)]
sysupdate: Allow patterns to match path with directories
`MatchPattern` for regular-file and directory as target can now match
subdirectories This is useful to install files for examples in `.extra.d`
directories:
Topi Miettinen [Sun, 22 May 2022 12:17:24 +0000 (15:17 +0300)]
core: add user and group to NFTSet=
The benefit of using this setting is that user and group IDs, especially dynamic and random
IDs used by DynamicUser=, can be used in firewall configuration easily.
core: firewall integration of cgroups with NFTSet=
New directive `NFTSet=` provides a method for integrating dynamic cgroup IDs
into firewall rules with NFT sets. The benefit of using this setting is to be
able to use control group as a selector in firewall rules easily and this in
turn allows more fine grained filtering. Also, NFT rules for cgroup matching
use numeric cgroup IDs, which change every time a service is restarted, making
them hard to use in systemd environment.
This option expects a whitespace separated list of NFT set definitions. Each
definition consists of a colon-separated tuple of source type (only "cgroup"),
NFT address family (one of "arp", "bridge", "inet", "ip", "ip6", or "netdev"),
table name and set name. The names of tables and sets must conform to lexical
restrictions of NFT table names. The type of the element used in the NFT filter
must be "cgroupsv2". When a control group for a unit is realized, the cgroup ID
will be appended to the NFT sets and it will be be removed when the control
group is removed. systemd only inserts elements to (or removes from) the sets,
so the related NFT rules, tables and sets must be prepared elsewhere in
advance. Failures to manage the sets will be ignored.
If the firewall rules are reinstalled so that the contents of NFT sets are
destroyed, command systemctl daemon-reload can be used to refill the sets.
Example:
```
table inet filter {
...
set timesyncd {
type cgroupsv2
}
In the only two codepaths we reach this place we know that cfd is
already invalidated. In the Accept=yes case there's already a
TAKE_FD() a few lines further up, and in the Accept=no case there is no
connection fd anyway.
This rearranges various cases of "goto fail" in service.c: sometimes the
whole "goto fail" logic was redundant, since only jumped to form a
single place. Sometimes the log message was generated in the fail
section, instead of the place jumped to from, which resulted in
duplicate or misleading error messages.
No real codeflow changes, just refactoring primarily around log
messages.
Mike Yuan [Thu, 21 Sep 2023 06:59:26 +0000 (14:59 +0800)]
sleep-config: several cleanups
* Rename free_sleep_config to sleep_config_free
* Rearrange functions
* Make SleepConfig.modes and .states only contain
operations that needs configuration
* Add missing assert
blockdev@.target is used as a synchronization point between
the mount unit and corresponding systemd-cryptsetup@.service.
After the mentioned commit, it doesn't get a stop job enqueued
during shutdown, and thus the stop job for systemd-cryptsetup@.service
could be run before the mount unit is stopped.
Therefore, let's make blockdev@.target conflict with umount.target,
which is also what systemd-cryptsetup@.service does.
EFI variable access is slow, hence let's avoid it if there's no need.
Let's cache the result of efi_measured_uki() so that we don't have to go
to the EFI variables each time.
This only caches in the yes/no case. If we encounter an error we don't
cache, so that we go to disk again.
This should optimize things a bit given we now have a bunch of services
which are conditioned with this at boot.
Let's say "uki" rather than "stub", since that is just too generic, and
we shouldn't limit us to our own stub anyway, but generally define a
concept of a "measured UKI", which is a UKI that measures its part to
PCR 11.
This is mostly preparation for exposing this check to the user via
ConditionSecurity=.
Balázs Úr [Wed, 27 Sep 2023 01:36:03 +0000 (03:36 +0200)]
po: Translated using Weblate (Hungarian)
Currently translated at 100.0% (227 of 227 strings)
Co-authored-by: Balázs Úr <balazs@urbalazs.hu>
Translate-URL: https://translate.fedoraproject.org/projects/systemd/master/hu/
Translation: systemd/main
Jan Janssen [Tue, 26 Sep 2023 13:14:38 +0000 (15:14 +0200)]
meson: Fix version script handling
Build targets should have a link dependency on the version scripts they
use. This also uses absolute paths in anticipation for meson 1.3
needlessly deprecating file to string conversions.
systemd-hwdb: fix unsigned and signed comparison problem
...
uint8_t c;
struct trie_node *child;
for (p = 0; (c = trie->strings->buf[node->prefix_off + p]); p++) {
_cleanup_free_ struct trie_node *new_child = NULL;
_cleanup_free_ char *s = NULL;
ssize_t off;
if (c == search[i + p])
continue;
...
When '®' is present in search, c is 194, search[i + p] is -62, c is not equal to search[i + p], but c should be equal to search[i + p].
This was requested, though I think an issue was never filed. If people are
supposed to invoke it, even for testing, then it's reasonable to make it
"public".
cryptsetup: add parse_argv() and implement --version
All public programs are expected to have that. The --help output is adjusted to
follow the usual style (highlighting, listing of options). The OPTIONS
positional argument is renamed to "CONFIG", because we now also have "OPTIONS…"
to describe the non-positional options.
I was missing an example of how to use cryptenroll. We have that, but in
another page. Instead of repeating, let's just direct the user to the right
place.
Also, reformat synopsis to the "official" non-nested syntax.