git.ipfire.org Git - thirdparty/systemd.git/log

tree-wide: get rid of backslashes in file names

File names containing backslashes cannot be checked out on Windows,
are not handled properly by build systems such as buck, and are awkward
to work with in general, so store the \x2d slice units under sanitized
file names, and rename them to the real unit name on install via a new
optional "name" key in the units list.

The fuzz-unit-file corpus sample is renamed as well; its file name is
not meaningful to the fuzzer, and dropping the backslash means it is no
longer skipped by the meson workaround for
https://github.com/mesonbuild/meson/issues/1564, so it runs as a
regression test again.

machined: drop superfluous 'supervisor' varlink input parameter for register method

The supervisor is derived from the caller's socket in D-Bus, and it is
not an input parameter. Do the same in varlink.

Follow-up for 97754cd14dc7b3630585383ecba92191667860e4

core: add all manager timestamps to metrics report (#42842)

These are all very useful for establishing the health of a fleet, so
export them too.

Follow-up for 0b0db27050595251b40b4e7cf56593a275eaf3c2

calendarspec: warn on weekday/date conflict in systemd-analyze and systemd-run

When a fixed date (e.g. 2027-01-01) is paired with a weekday constraint
(e.g. Thu) that does not match, the timer silently never elapses.

Add calendar_spec_from_string_full(..., warn_on_weekday_mismatch) so
user-facing tools can opt in to a log_warning() at parse time:
- systemd-analyze calendar: uses _full(true)
- systemd-run --on-calendar: uses _full(true)
- .timer OnCalendar=: uses log_syntax() with file/line context

Add test_calendar_spec_weekday_conflict(): forks a child with stderr
captured in a memfd via pidref_safe_fork_full(), verifies the warning
is emitted for conflicting specs and suppressed for valid ones.
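
The conflict described above boils down to plain calendar arithmetic. The
following is a minimal sketch, not the code added to calendarspec:
day_of_week() and date_matches_weekdays() are hypothetical helpers, and the
mask layout (bit 0 = Sunday) is an assumption made for this example only.

```c
#include <stdbool.h>

/* Day of week for a Gregorian date (Sakamoto's method): 0 = Sunday ... 6 = Saturday. */
int day_of_week(int year, int month, int day) {
        static const int offsets[12] = { 0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4 };

        if (month < 3)
                year -= 1;
        return (year + year / 4 - year / 100 + year / 400 + offsets[month - 1] + day) % 7;
}

/* Hypothetical mask layout for this sketch: bit 0 = Sunday ... bit 6 = Saturday. */
bool date_matches_weekdays(int year, int month, int day, unsigned weekday_mask) {
        return (weekday_mask >> day_of_week(year, month, day)) & 1u;
}
```

With this layout 2027-01-01 is a Friday (5), so a fixed date of 2027-01-01
combined with a Thu-only mask (1u << 4) can never match, which is the
situation the new warning is meant to flag.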

Fixes: #40350
Signed-off-by: dongshengyuan <dongshengyuan@uniontech.com>

tpm2: optionally disable TPMA_NV_ORDERLY for NvPCRs (#42848)

test: ignore fails when the formatted timezone differs from the current one

When formatting a timestamp the C API takes into account historical data
from tzdata, so it returns a date string with a historically-correct
timezone abbreviation. However, tzname[] doesn't do this and it returns
the most recent abbreviation for the given zone.

For example, according to tzdata America/Cancun switched from EST/EDT to
CST/CDT on 1998-08-02:

Zone America/Cancun     -5:47:04 -      LMT     1922 Jan  1  6:00u
                        -6:00   -       CST     1981 Dec 26  2:00
                        -5:00   -       EST     1983 Jan  4  0:00
                        -6:00   Mexico  C%sT    1997 Oct 26  2:00
                        -5:00   Mexico  E%sT    1998 Aug  2  2:00
                        -6:00   Mexico  C%sT    2015 Feb  1  2:00
                        -5:00   -       EST

So, formatting a timestamp from this time will yield a string with the
EDT timezone:

$ TZ=America/Cancun date -d "@902035565"
Sun Aug  2 01:26:05 EDT 1998

But using tzname[] (or strptime %z) shows the most recent data, where
America/Cancun uses EST (and doesn't use DST anymore, hence
tzname[1]=CDT that glibc remembers from the previous zone epoch):

$ TZ=America/Cancun ./tz
{EST, CDT}

This means that when we parse the formatted timestamp back we don't use
the historical timezone data, so we might end up with a different
offset:

TZ=America/Cancun, tzname[0]=EST, tzname[1]=CDT
@902035565603993 → Sun 1998-08-02 01:26:05 EDT → @902039165000000 → Sun 1998-08-02 01:26:05 CDT
src/test/test-time-util.c:452: Assertion failed: Expected "ignore" to be true
Aborted                    (core dumped) build-local/test-time-util

Instead of adding exceptions for every single timezone that switched
between different offsets in the past, let's address this a bit more
generally and skip the check if the parsed timezone doesn't match any of
the current timezones - this still keeps the check that the time
difference in such case is exactly one hour, so its effect should be
limited mostly to DST-related changes.
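
The relaxed check can be sketched as follows. This is an illustration of the
rule described above, not the code in test-time-util.c; the function name and
its parameters are hypothetical, and the comparison is done at second
granularity as an assumption.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define USEC_PER_SEC 1000000ULL
#define SEC_PER_HOUR 3600ULL

/* Returns true if a format/parse round-trip mismatch may be ignored:
 * the formatted abbreviation is not one of the current zone names,
 * and the two timestamps differ by exactly one hour. */
bool roundtrip_mismatch_is_ignorable(
                const char *formatted_tz,
                const char *tzname_std,
                const char *tzname_dst,
                uint64_t original_usec,
                uint64_t reparsed_usec) {

        uint64_t a = original_usec / USEC_PER_SEC, b = reparsed_usec / USEC_PER_SEC;
        uint64_t diff = a > b ? a - b : b - a;

        if (diff != SEC_PER_HOUR)
                return false;

        return strcmp(formatted_tz, tzname_std) != 0 &&
               strcmp(formatted_tz, tzname_dst) != 0;
}
```

For the America/Cancun case above, "EDT" is neither "EST" nor "CDT" and the
two timestamps are 3600 s apart, so the mismatch would be skipped.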

Resolves: #37684

Improve OpenSSL 4 support (#42843)

bootctl: expose --esp-path/--boot-path/--make-entry-directory via Varlink (#42838)

The Install/Unlink/Link/LinkAuto Varlink methods always auto-discover
the ESP and XBOOTLDR partitions and Install always runs
make-entry-directory in auto mode, so IPC callers cannot match what the
CLI verbs do with `--esp-path`, `--boot-path` and
`--make-entry-directory`. Add optional `espPath`, `xbootldrPath` and
`makeEntryDirectory` parameters that feed into the same code paths the
CLI already uses.

Allow systemd to be built as a single statically-linked binary (#42820)

The idea is that we can build a container by building a single-binary
systemd:
```console
$ meson setup build-static --default-library=static --prefer-static --auto-features=disabled -Dbuild-static=true -Dsystemd-multicall-binary=true && ninja -C build-static systemd
$ mkdir /var/tmp/container/usr/lib -p
$ cp build-static/systemd /var/tmp/container/usr/lib/
$ echo 'ID=quick' >/var/tmp/container/usr/lib/os-release
$ systemd-nspawn --restrict-address-families=af_unix --register=no --private-users=managed -D /var/tmp/container/ /usr/lib/systemd
░ Spawning container container on /var/tmp/container.
░ Press Ctrl-] three times within 1s to kill container; two times followed by r
░ to reboot container; two times followed by p to poweroff container.
Selected user namespace base 1855193088 and range 65536.
systemd 262~devel running in system mode (-PAM -AUDIT +SELINUX -APPARMOR +IMA +IPE +SMACK -SECCOMP -GCRYPT +GNUTLS +OPENSSL -ACL +BLKID +CURL -ELFUTILS -FIDO2 +IDN2 +KMOD +LIBCRYPTSETUP +LIBCRYPTSETUP_PLUGINS +LIBFDISK +PCRE2 -PWQUALITY +P11KIT +QRENCODE +TPM2 -BZIP2 -LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -BTF -XKBCOMMON +UTMP -LIBARCHIVE)
Detected virtualization systemd-nspawn.
Detected architecture x86-64.
Detected first boot.

Welcome to Linux!

Initializing machine ID from container UUID.
Failed to open netlink, ignoring: Address family not supported by protocol
Applying preset policy.
Populated /etc with preset unit settings.
Unit default.target not found.
Falling back to graphical.target.
Mount unit not supported, skipping *MountsFor= dependencies.
Queued start job for default target graphical.target.
[  OK  ] Reached target sysinit.target.
[  OK  ] Reached target basic.target.
System is tainted: unmerged-bin:var-run-bad
[  OK  ] Reached target multi-user.target.
[  OK  ] Reached target graphical.target.
Startup finished in 61ms.
```
The container can be reloaded with SIGTERM, powered off with SIGRTMIN+4,
etc. SIGRTMIN+5 should cause a reboot but it currently fails:
```
...
Rebooting.
Container container is being rebooted.
Failed to attach root directory: Invalid argument
Failed to receive mount namespace fd from outer child: Input/output error
```
It's a bug … somewhere, but probably not caused by the linking changes
being done here.

update TODO

tpm2: cache NvPCR NV space exhaustion via flag files in /run/

When we run out of NV index space while allocating an NvPCR, the
situation is unlikely to improve until (at least) reboot. Retrying the
(doomed) Esys_NV_DefineSpace call on every subsequent allocation attempt
is wasteful (and very slow), so remember the exhaustion in a flag file
under /run/ and fail early next time.

We use two separate flag files, one for orderly and one for non-orderly
NvPCRs, since the two draw on different TPM resources (RAM-backed vs.
NVRAM-backed): exhaustion of one doesn't imply exhaustion of the other.

The files live in /run/, hence are cleared on reboot, which is
potentially when NV space might become available again.
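
A minimal sketch of the caching idea, assuming a caller-supplied flag path
and an allocator callback. All names here are hypothetical, and returning
-ENOBUFS on the fast path is an assumption; the actual helpers in tpm2-util
may look different.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Record exhaustion by creating an empty flag file. Returns 0 or -errno. */
int mark_nv_exhausted(const char *flag_path) {
        int fd = open(flag_path, O_WRONLY | O_CREAT, 0644);
        if (fd < 0)
                return -errno;
        close(fd);
        return 0;
}

/* True if an earlier allocation attempt already recorded exhaustion. */
bool nv_known_exhausted(const char *flag_path) {
        return access(flag_path, F_OK) == 0;
}

/* Fail early if the flag exists; otherwise run the (slow) allocator and
 * remember an exhaustion result for next time. */
int allocate_with_cache(const char *flag_path, int (*allocate)(void)) {
        int r;

        if (nv_known_exhausted(flag_path))
                return -ENOBUFS;

        r = allocate();
        if (r == -ENOBUFS)
                (void) mark_nv_exhausted(flag_path);
        return r;
}

/* Test stand-in for the real TPM call: counts invocations, always reports exhaustion. */
int fake_calls = 0;
int fake_allocate_exhausted(void) {
        fake_calls++;
        return -ENOBUFS;
}
```

Using one flag path per NvPCR flavour (orderly and non-orderly) keeps the two
resource pools independent, as described above.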

tpm2: optionally disable TPMA_NV_ORDERLY for NvPCRs

NVIndexes in TPMs can operate in two modes:

1. Backed by TPM RAM. In this case they are only written to NVRAM on an
   orderly TPM shutdown when the system goes down. (TPMA_NV_ORDERLY flag
   is on)

2. Backed by TPM NVRAM. In this case the nvindex value is written to NVRAM
   on every write, and things are not delayed until orderly shutdown.

Normally mode 1 sounds like the obvious choice for NvPCRs, which reset
to zero anyway at boot. However, things are more complicated since
real-life TPMs tend to have a lot less RAM than NVRAM (both are
constrained but RAM even more than NVRAM). Hence there's value in using
NVRAM right-away. However, writing to NVRAM all the time means wearing
it out (since NVRAM is more vulnerable to that).

So far we unconditionally went for mode 1, but ran into space
constraints of RAM due to that.

Let's improve things a bit, and use orderly mode for NvPCRs we expect to
write many times, and non-orderly mode for those we expect to write only
a small, fixed number of times at boot, and not anymore during runtime.
Right now, this is only the "hardware" NvPCR, which measures hw identity
at boot.

Hopefully, this stretches available resources a bit further.

This also makes sure that if the flag was set differently at allocation
time than we would set it now, we accept it and don't complain, to make
upgrades safe.

Suggested by Andreas Fuchs.

pcrextend,tpm2-util,tpm2-setup: gracefully skip NvPCR when TPM NV space is exhausted

Map TPM2_RC_NV_SPACE to -ENOBUFS in tpm2_define_nvpcr_nv_index() rather
than -ENOSPC, giving it a dedicated errno distinct from the -ENOSPC that
write_string_file_at() can return when /run is full — a different failure
that occurs after the NV index is already allocated on the TPM.

In extend_nvpcr_now(), propagate -ENOBUFS as-is so callers can handle it
in a nicer fashion. In vl_method_extend(), map it to the new varlink error
io.systemd.PCRExtend.NvPCRSpaceExhausted. In run(), handle -ENOBUFS
explicitly under --graceful and print an appropriate message. Update
tpm2-setup.c accordingly.
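
The --graceful handling can be pictured roughly like this. It is a sketch
only: finish_extend() is a hypothetical helper and the message text is
illustrative, not the one printed by systemd-pcrextend.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Decide the final result of an NvPCR extension attempt.
 * -ENOBUFS (TPM NV space exhausted) is downgraded to success in graceful mode;
 * every other result, including -ENOSPC from a full /run, is passed through. */
int finish_extend(int r, bool graceful) {
        if (r == -ENOBUFS && graceful) {
                fprintf(stderr, "TPM NV index space exhausted, skipping NvPCR measurement.\n");
                return 0;
        }
        return r;
}
```

Keeping -ENOBUFS distinct from -ENOSPC is what makes this kind of selective
handling possible.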

Fixes #42725
Signed-off-by: dongshengyuan <dongshengyuan@uniontech.com>

import: Trust subkeys included in signature (#41860)

- import: Trust subkeys included in signature

    With gpg sub keys one can rotate signing keys while having a stable
    trust anchor. So far one still had to ship the sub key out of band but
    a newer gpg has the option to include the sub key in the signature and
    import it automatically. This is safe if we only allow importing a sub
    key signed by the top key we already have in the key ring.

    Add the --auto-key-import argument to gpg to import subkeys but also
    set --import-options=merge-only,import-clean to restrict what we import
    to only be sub keys signed by the top key we have in the keyring and
    discard any irrelevant parts.

- import: Support env var to override gpg keyring

    By default there is a fixed keyring in /usr or /etc. But when running
    systemd-pull unprivileged in the user context or with a custom transfer
    definition as in systemd-sysupdate --definitions=./... (e.g., for local
    ParticleOS updates) it is limiting to require that all keys have to be
    part of the OS keyring or otherwise no verification can be used. Also,
    for testing it is valuable to point it at a different keyring.

    Add a SYSTEMD_OPENPGP_KEYRING env var where the omission or empty
    assignment sticks to the current behavior of the global OS keyrings but
    a keyring path given will take precedence. While an env var can leak
    down the process tree and is more difficult to secure for being the
    trust anchor the advantage is that one can directly specify it in the
    service unit as drop-in instead of having to patch the command
    invocation. Anyway it's a niche use case and thus not part of the man
    page.

run: default run0 to root explicitly

When neither --user=, --area= nor --empower is given, run0 already
behaves as if root was requested, but only implicitly. That can be
misleading downstream.

Set arg_exec_user to "root" up front in parse_argv_sudo_mode(), so the
intent is visible and the rest of the code can rely on it.

Fixes #40468

Signed-off-by: Shihao Ren <renshihao.rsh@bytedance.com>

Fix: tmpfiles clean device node (#42791)

Two minor issues regarding the tmpfile.c file

measure: Support binding signed policies to individual phases

systemd-measure can produce multiple signed policies for different
phases. However, a policy for a TPM resource that includes these signed
policies can currently be satisfied by any policy that is signed
with the same key.

It can be desirable to bind a resource's policy to one or more
specific phases. One way to do this could be to sign policies for
different phases or phase combinations with different keys. Another
approach is to limit the scope of signed policies using a policy
reference.

Using a policy reference works because:
- The reference is included along with the approved policy digest in the
  digest that is signed.
- The reference argument is included in the authorization policy for a
  resource via the TPM2_PolicyAuthorize assertion.
- During execution of the TPM2_PolicyAuthorize assertion, the TPM checks
  that the session's current policy digest is the approved policy digest,
  computes a digest from the approved policy digest and the supplied policy
  reference, and checks that the resulting digest is the one that was
  verified by TPM2_VerifySignature (via the returned ticket).

This adds a new --policyref argument to systemd-measure which binds all
of the signed policies to the specified policy reference. I did consider
making this more intelligent by auto-generating policy references for each
phase, but this approach provides the most flexibility for now. By making
use of the existing --append argument, a signer can produce multiple
signed policies with the same key that are bound to any individual phases
or combinations of phases.

The policy reference is a string without the NUL terminator. It is
supplied to the TPM via the TPM2B_NONCE type, which has a maximum size
equivalent to the size of the largest digest supported by the TPM. As
the signer doesn't know the capabilities of the target TPM,
systemd-measure limits the size of the policy reference to 32 bytes, to
fit within the size of a SHA256 digest.
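
The size limit amounts to a simple length check. A minimal sketch, assuming
a hypothetical policyref_is_valid() helper; whether an empty reference is
accepted is not stated above and is left permissive here.

```c
#include <stdbool.h>
#include <string.h>

#define POLICYREF_MAX 32u /* size of a SHA256 digest in bytes */

/* The reference is passed without its NUL terminator, so only strlen() counts. */
bool policyref_is_valid(const char *ref) {
        return ref && strlen(ref) <= POLICYREF_MAX;
}
```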

This also includes the corresponding changes to systemd-cryptenroll and
systemd-repart (to add an equivalent --tpm2-public-key-policyref argument
so that the policy can be bound to the desired corresponding phase, and to
ensure that the policy reference is included in the LUKS2 token metadata),
and systemd-cryptsetup (to handle the policy reference stored in the LUKS2
token metadata).

This doesn't include policy reference support for credentials yet
because it requires a change to the credential headers.

bless-boot: avoid false maybe-uninitialized warning

Observed with GCC-11 on Ubuntu.
```
In file included from ../src/shared/format-table.h:7,
                 from ../src/bless-boot/bless-boot.c:11:
../src/bless-boot/bless-boot.c: In function ‘verb_set’:
../src/basic/log.h:187:27: error: ‘source2’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  187 |                         ? log_internal(_level, _e, PROJECT_FILE, __LINE__, __func__, __VA_ARGS__) \
      |                           ^~~~~~~~~~~~
../src/bless-boot/bless-boot.c:458:40: note: ‘source2’ was declared here
  458 |         const char *target, *source1, *source2;
      |                                        ^~~~~~~
In file included from ../src/shared/format-table.h:7,
                 from ../src/bless-boot/bless-boot.c:11:
../src/basic/log.h:187:27: error: ‘source1’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  187 |                         ? log_internal(_level, _e, PROJECT_FILE, __LINE__, __func__, __VA_ARGS__) \
      |                           ^~~~~~~~~~~~
../src/bless-boot/bless-boot.c:458:30: note: ‘source1’ was declared here
  458 |         const char *target, *source1, *source2;
      |                              ^~~~~~~
In file included from ../src/shared/format-table.h:7,
                 from ../src/bless-boot/bless-boot.c:11:
../src/basic/log.h:187:27: error: ‘target’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  187 |                         ? log_internal(_level, _e, PROJECT_FILE, __LINE__, __func__, __VA_ARGS__) \
      |                           ^~~~~~~~~~~~
../src/bless-boot/bless-boot.c:458:21: note: ‘target’ was declared here
  458 |         const char *target, *source1, *source2;
      |                     ^~~~~~
cc1: all warnings being treated as errors
```

ci/build-test: try to build with OPENSSL_NO_DEPRECATED

crypto-util: drop unused symbol

crypto-util: make OpenSSL UI API symbols optional during dlopen

Previously, if systemd was built with OpenSSL UI support, it would fail
to load libcrypto at runtime if the library lacked UI support, requiring
a recompilation of systemd to fix.

Let's relax this strict requirement by making the UI methods optional
during dlopen(). openssl_ui_supported() is added to dynamically check
if all required UI symbols were successfully loaded.

crypto-util: allow loading private keys from engine/provider without UI support

OpenSSL UI is not a mandatory feature to load private keys from an engine
or a provider. Let's allow loading private keys even if OpenSSL UI is not
supported.

Note that even if OPENSSL_NO_UI_CONSOLE is set, the type UI_METHOD is
always defined. Hence, the `#ifndef` condition in the definition of
struct OpenSSLAskPasswordUI is unnecessary and can be dropped.

crypto-util: drop redundant logs

The called functions already log errors internally.

crypto-util: move functions

Implementations for loading private/public keys and X.509 certificates
were scattered. Group them together to improve readability.

crypto-util: make OpenSSL ENGINE API symbols optional during dlopen

If systemd is compiled with OpenSSL 3 headers but executed in an environment
where OpenSSL 4 (libcrypto.so.4) is loaded, dlopen_many_sym_or_warn() will
fail because OpenSSL 4 completely removes the deprecated ENGINE API. This
breaks the ability to fall back dynamically and seamlessly upgrade OpenSSL
without recompiling systemd.

To fix this, drop the ENGINE API symbols from the mandatory DLSYM_ARG() list.
Instead, try to load them via DLSYM_OPTIONAL() after the library is opened.
load_key_from_engine() is updated to check for their presence and return
-EOPNOTSUPP if the loaded OpenSSL version does not provide them.
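
The pattern can be sketched without OpenSSL at all. In the following, the
'resolve' callback stands in for a dlsym()-style lookup on the already opened
library; the struct, the function names and the two test resolvers are
hypothetical, and only ENGINE_by_id/ENGINE_init are real OpenSSL symbol names.

```c
#include <errno.h>
#include <stddef.h>

/* Optional entry points; NULL means the loaded library does not export them. */
struct engine_syms {
        void *engine_by_id;
        void *engine_init;
};

/* Missing symbols are not an error here, they simply stay NULL. */
void load_optional_engine_syms(struct engine_syms *s, void *(*resolve)(const char *name)) {
        s->engine_by_id = resolve("ENGINE_by_id");
        s->engine_init = resolve("ENGINE_init");
}

int load_key_from_engine_sketch(const struct engine_syms *s) {
        if (!s->engine_by_id || !s->engine_init)
                return -EOPNOTSUPP;
        /* ... a real implementation would go on to use the ENGINE API here ... */
        return 0;
}

/* Test stand-ins: a library without, and one with, the ENGINE symbols. */
void *resolve_none(const char *name) {
        (void) name;
        return NULL;
}

void *resolve_all(const char *name) {
        static int dummy;
        (void) name;
        return &dummy;
}
```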

crypto-util: drop dlopen_libcrypto() from static functions

memory-util: drop unused memdup_reverse()

With the previous commit, the function is no longer used.
Let's drop it.

crypto-util: drop manual endianness handling in rsa_pkey_from_n_e()

Currently, rsa_pkey_from_n_e() uses architecture-specific `#if` branches
and memdup_reverse() to handle big-endian RSA components (n and e)
before passing them directly to OSSL_PARAM_construct_BN().

We can simplify this by parsing the raw big-endian bytes into BIGNUMs
first using BN_bin2bn(), which natively expects big-endian data. We
can then push these BIGNUMs into OSSL_PARAM_BLD. This delegates the
data format handling entirely to OpenSSL and successfully removes the
platform-specific code.

crypto-util: simplify openssl_extract_public_key()

Drop memstream and i2d_PUBKEY_fp(). We can simply use i2d_PUBKEY()
which automatically allocates the necessary buffer for us.

Note that dropping the secure erase (erase_and_freep()) in favor of
OPENSSL_free() is intentional and safe, as the buffer only holds
public key material which does not need to be securely wiped.

crypto-util: use correct cleanup function for OpenSSL buffers

Buffers allocated by OpenSSL must be freed with OPENSSL_free().
Fortunately, we do not enable the secure heap, so OPENSSL_free()
is currently equivalent to free(), but let's fix this for correctness.

crypto-util: simplify pubkey_fingerprint()

There is no need to call i2d_PublicKey() twice. Passing a pointer
to NULL allows OpenSSL to automatically allocate the necessary buffer.

crypto-util: drop unused deprecated symbols

resolved: migrate ECDSA verification to OpenSSL 3 EVP API

OpenSSL 3.0 deprecated low-level Elliptic Curve (EC) key manipulation
functions (EC_KEY, EC_POINT, EC_GROUP) and direct signature verification
functions like ECDSA_do_verify().

This commit modernizes dnssec_ecdsa_verify_raw() by transitioning to the
provider-aware EVP API:
* Uses OSSL_PARAM arrays and EVP_PKEY_fromdata() to construct the EC
  public key directly from the raw octet string and curve name, avoiding
  deprecated EC_POINT parsing.
* Converts the raw R and S signature components into a DER-encoded ASN.1
  signature using i2d_ECDSA_SIG(), as required by the modern EVP API.
* Uses EVP_PKEY_verify() for the actual signature validation.

Additionally, this drops an outdated TODO comment waiting for raw ECDSA
support in the EVP API, as well as the deprecated warning suppression
macros and fallback code blocks. Unit tests for ECDSA are now run
unconditionally.

resolve: make dnssec_ecdsa_verify_raw() take struct iovec

This also
- adds missing assertions,
- moves variable declarations,
- renames variables.

No functional change, just refactoring.

resolve: add unit test for dnssec_ecdsa_verify_raw()

resolved: migrate RSA key construction to OpenSSL 3 EVP API

OpenSSL 3.0 deprecated low-level key manipulation functions and direct
access to RSA structures (such as RSA_new(), RSA_set0_key(), and
RSA_size()).

This commit modernizes dnssec_rsa_verify_raw() by replacing these
deprecated functions with the provider-aware EVP API:
* Uses OSSL_PARAM_BLD and EVP_PKEY_fromdata() to construct the RSA
public key directly from the modulus and exponent BIGNUMs.
* Replaces RSA_size() with EVP_PKEY_get_size().

Consequently, the workaround macros suppressing deprecated warnings
(DISABLE_WARNING_DEPRECATED_DECLARATIONS) and the conditional fallback
blocks (#if !defined(OPENSSL_NO_DEPRECATED_3_0)) are no longer needed
and have been dropped. Unit tests are also updated to run unconditionally.

resolve: make dnssec_rsa_verify_raw() take struct iovec

This also
- adds missing assertions,
- moves variable declarations.

No functional change, just refactoring.

resolve: add unit test for dnssec_rsa_verify_raw()

crypto-util: load several more symbols

They will be used in later commits.

Note, ECDSA_SIG_new() and ECDSA_SIG_set0() are not deprecated.
They are moved to the section for non-deprecated symbols.

resolve: fix segfault when built with OPENSSL_NO_DEPRECATED_3_0

In that case, deprecated functions are not loaded from libcrypto.so,
and calling them causes a segfault.

coredumpctl: use break instead of continue for time bound checks

When iterating journal entries with --until (forward scan) or --since
(reverse scan), the code used continue instead of break after crossing
the time boundary.

Since sd_journal_seek_realtime_usec() is called before the loop to
position at the start of the range, sd_journal_next()/previous()
returns entries in monotonically increasing/decreasing time order.
Once an entry's timestamp exceeds arg_until (or falls below arg_since
in reverse), all subsequent entries will also be out of range.

Using continue caused the entire remaining journal to be scanned
unnecessarily. journalctl uses break for the identical pattern in
src/journal/journalctl-show.c.
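
The effect of break versus continue on a sorted sequence can be shown with a
plain array. This is an illustration with hypothetical names, not the
coredumpctl loop itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Count entries with timestamp <= until in an ascending array.
 * 'visited' receives how many entries were examined before stopping. */
size_t count_until(const uint64_t *ts, size_t n, uint64_t until, size_t *visited) {
        size_t matched = 0, i;

        for (i = 0; i < n; i++) {
                if (ts[i] > until)
                        break; /* sorted input: everything after this is out of range too */
                matched++;
        }

        *visited = i < n ? i + 1 : n;
        return matched;
}
```

With continue instead of break the result would be the same, but 'visited'
would always equal n, i.e. the whole remaining journal would be walked.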

Fixes: #42808
Signed-off-by: dongshengyuan <dongshengyuan@uniontech.com>

udev: Tag debug appliance nodes as xaccess-debug-appliance

This is especially helpful for people who are running headless (i.e.
SSH-only) Android platform development setups with testing devices
attached.

man: add thread-awareness note to sd_bus/sd_event manpages

This question comes up every now and then, and it is not clearly documented,
so include the thread-aware tag in all bus/event manpages.

man: note that sd-tmpfiles/sysusers --root is not a sandboxing feature

This seems to be causing enough confusion that it is worth explicitly
mentioning in the docs.

report: drop redundant .generate = NULL metric initializers

The .generate field of MetricFamily defaults to NULL when omitted from a
designated initializer, so spelling it out explicitly on the metric-family
entries (and the field-generating macros) that have no generator of their
own is just noise. Drop it.
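
This relies on a C guarantee: members omitted from a designated initializer
are zero-initialized. A small sketch with a reduced, hypothetical stand-in
for MetricFamily (the real struct has different fields):

```c
#include <stddef.h>

typedef int (*generate_fn)(void *userdata);

/* Hypothetical, reduced stand-in for MetricFamily. */
struct metric_family_sketch {
        const char *name;
        const char *description;
        generate_fn generate;
};

const struct metric_family_sketch families[] = {
        { .name = "example.a", .description = "explicit", .generate = NULL },
        { .name = "example.b", .description = "omitted" }, /* .generate is implicitly NULL */
};
```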

core: add all manager timestamps to metrics report

These are all very useful for establishing the health of a fleet,
so export them too.

Follow-up for 0b0db27050595251b40b4e7cf56593a275eaf3c2

bootctl: accept makeEntryDirectory in Install Varlink method

The CLI defaults --make-entry-directory to off and lets callers opt in
or request auto mode. The Varlink Install method always ran in auto
mode with no way to override it. Expose the tri-state so IPC callers
can match the CLI behaviour.

bootctl: accept espPath/xbootldrPath in Varlink methods

The CLI verbs have --esp-path/--boot-path but the Varlink methods
always auto-discover the partitions, so callers that mount the ESP
or XBOOTLDR at a non-standard location have to fall back to the
SYSTEMD_ESP_PATH/SYSTEMD_XBOOTLDR_PATH environment variables.
Allow to specify the paths when calling Install/Unlink/Link/LinkAuto
so the Varlink API is on par with the CLI.

vmspawn/nspawn: Always use a per-machine runtime subdirectory

Some state files were prefixed with the machine name which protected
against collisions when running concurrent machines in a systemd unit
with RuntimeDirectory= set. However, the socket files tpm.sock and
control were not, which caused a race. Prefixing these socket files
does not work because the path gets too long when both the subdirectory
and the socket file within it get unique identifiers.

Instead of prefixing all files, we can rather always create a
subdirectory and use simple names within it. This makes paths shorter
in the normal case and protects against races with RuntimeDirectory=,
where instead of directly reusing RUNTIME_DIRECTORY we also create the
normal per-machine subdirectory. Since this is about runtime state, it
should not impact any running VMs/containers. The ssh-proxy looks only
at the normal case and does not support RUNTIME_DIRECTORY, so there is
no impact there either.
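
The length constraint comes from the fixed size of sun_path in struct
sockaddr_un. A minimal sketch of building a per-machine socket path and
checking that it fits; the helper name and the directory layout
"<runtime_dir>/<machine>/<name>" are assumptions for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Build "<runtime_dir>/<machine>/<name>" and check it fits into sun_path. */
int build_machine_socket_path(
                char *buf, size_t size,
                const char *runtime_dir, const char *machine, const char *name) {

        int n = snprintf(buf, size, "%s/%s/%s", runtime_dir, machine, name);

        if (n < 0)
                return -EINVAL;
        if ((size_t) n >= size || (size_t) n >= sizeof(((struct sockaddr_un *) 0)->sun_path))
                return -ENAMETOOLONG;
        return 0;
}
```

With short, fixed file names such as tpm.sock inside the subdirectory, only
the machine name contributes a variable-length component to the path.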

NEWS,README: describe embedded files and static linking

manager: fall back to direct reboot() if shutdown binary is missing

When PID 1 reaches the shutdown phase, if systemd-shutdown was missing,
it'd complain loudly and freeze. Implement an internal fallback to
handle this case gracefully. The fallback is always in place, since it
makes things generally more robust and is only a little bit of code.

This shutdown is simplified: systemd issues the reboot() matching the
requested objective directly, mirroring what systemd-shutdown would do
as its final step. This skips all the cleanup systemd-shutdown normally
performs (unmounting file systems, detaching storage, killing remaining
processes, switching into the exitrd).

With SYSTEMD_LOG_LEVEL=debug:
...
[ OK ] Reached target poweroff.target.
Notify message sent to '/run/host/notify': "X_SYSTEMD_UNIT_ACTIVE=poweroff.target"
Shutting down.
OSC sequence for shutdown successfully written: ...
Failed to execute shutdown binary: No such file or directory
Powering off.
Container container has been shut down.
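
Picking the reboot() command for a given objective might look roughly like
this. The objective strings and the helper name are hypothetical; the RB_*
constants are the ones from <sys/reboot.h>. The sketch deliberately stops
short of calling reboot().

```c
#include <errno.h>
#include <string.h>
#include <sys/reboot.h>

/* Map a shutdown objective to a reboot() command. Returns 0 or -EINVAL. */
int objective_to_reboot_cmd(const char *objective, unsigned *ret) {
        if (strcmp(objective, "poweroff") == 0)
                *ret = RB_POWER_OFF;
        else if (strcmp(objective, "reboot") == 0)
                *ret = RB_AUTOBOOT;
        else if (strcmp(objective, "halt") == 0)
                *ret = RB_HALT_SYSTEM;
        else if (strcmp(objective, "kexec") == 0)
                *ret = RB_KEXEC;
        else
                return -EINVAL;
        return 0;
}
```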

manager: use reboot.target as the default for ctrl-alt-del.target

systemd tries to start ctrl-alt-del.target after receiving a request
to reboot in a container (or when ctrl-alt-del is pressed on a real
machine). If the alias is missing, this will fail. Let's provide a
fallback for this too. I don't think it makes sense to make it
configurable: it seems almost everybody is fine with the default
mapping. If not, people can always just create the symlink in the
filesystem.

core: rename and extend manager_add_job_by_name_and_warn

The "and" in the name was a misnomer. We do one *or* the other.
So rename 's/_and_/_or_/', and extend with the ability to load a second
unit. No functional change.

manager: try to reexecute as /proc/self/exe too

If we try to reexecute, and the binary was started from a non-standard
path, this would fail:
src/core/main.c:2396: Reexecuting.
src/core/main.c:2240: Failed to execute our own binary /usr/lib/systemd/systemd: No such file or directory
Let's make things more robust by trying /proc/self/exe too.
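
A minimal sketch of the "try several paths" idea, assuming a NULL-terminated
candidate list. The helper name is hypothetical and the order of candidates
in main.c is not spelled out above.

```c
#include <errno.h>
#include <unistd.h>

/* Try each candidate path in turn. Only returns if every execv() failed,
 * in which case the result is the negative errno of the last attempt. */
int exec_first_available(const char *const *candidates, char *const argv[]) {
        int r = -ENOENT;

        for (size_t i = 0; candidates[i]; i++) {
                execv(candidates[i], argv);
                r = -errno; /* execv() only returns on failure */
        }

        return r;
}
```

A caller could pass something like { "/usr/lib/systemd/systemd",
"/proc/self/exe", NULL } so that a binary started from a non-standard
location can still re-execute itself.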

core: let manager_load_startable_unit_or_warn take a log level

Add a log_level parameter so callers can control the severity of load
failures. During boot, when no unit was explicitly requested on the
kernel command line (arg_default_unit is unset), failing to load the
default unit is not fatal since we fall back to other targets, so log
those attempts at LOG_INFO instead of LOG_ERR. All other callers pass
LOG_ERR to keep their existing behaviour.

core: embed essential units as built-in fallbacks

Compile the contents of a few essential target unit files
(graphical.target, multi-user.target, rescue.target, etc.) into the
manager binary, with comments stripped, and fall back to those built-in
copies when no fragment is found on disk. This lets the manager reach a
usable state even on a system that ships none of these unit files, e.g.
in minimal environments or one-off chroots.

On-disk files always take precedence (masks included), as the fallback is
only consulted when nothing is found in the lookup paths. The unit file
list is defined in units/meson.build and embedded via a small generator
script; 'units' is now processed before 'src/core' so the list is
available there.
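
The precedence rule can be sketched as a table lookup that is consulted only
when nothing was found on disk. The table contents below are placeholders and
all names are hypothetical; the real embedded units are generated at build
time.

```c
#include <stddef.h>
#include <string.h>

struct builtin_unit {
        const char *name;
        const char *contents;
};

/* Placeholder contents, for illustration only. */
const struct builtin_unit builtin_units[] = {
        { "multi-user.target", "[Unit]\nDescription=placeholder\n" },
        { "rescue.target",     "[Unit]\nDescription=placeholder\n" },
};

const char *builtin_unit_lookup(const char *name) {
        for (size_t i = 0; i < sizeof(builtin_units) / sizeof(builtin_units[0]); i++)
                if (strcmp(builtin_units[i].name, name) == 0)
                        return builtin_units[i].contents;
        return NULL;
}

/* 'on_disk' is whatever the lookup paths produced (NULL if nothing was found). */
const char *unit_contents(const char *name, const char *on_disk) {
        return on_disk ? on_disk : builtin_unit_lookup(name);
}
```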

The set of units and services is selected to allow normal operation of
the machine: starting, stopping, and restarting. The services all use
SuccessAction=…, so they don't require any binaries to be installed.

sigpwr.target is included, even though it doesn't do anything useful.
The same is true in normal installations. We might want to change it
to do something useful there too.

core: add fallback-default-target build option

Add a 'fallback-default-target' meson option that configures which unit
to activate when default.target is not installed. This makes things more
resilient in general and is also useful for minimal/statically-linked
installations that may not ship a default.target symlink.

'graphical.target' is used as the default value of the setting, because
that's what we symlink as default.target in units/meson.build.

In the initrd, we had a fallback to start default.target if initrd.target
cannot be started. This fallback is changed to only do that if it is
not found, not on other errors. This seems more correct (and makes
the two fallbacks symmetrical.)

meson: allow manager to be linked statically

After stripping, in an unoptimized build with glibc, we get an 8.7 MB binary:
$ file build-static/systemd
build-static/systemd: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, BuildID[sha1]=1ec004bf61b775d284d909a818a3c4f63f83f055, for GNU/Linux 3.2.0, stripped
$ ls -lh build-static/systemd
-rwxr-xr-x 1 zbyszek zbyszek 8.7M Jun 30 14:07 build-static/systemd

core: convert exec-invoke to use the new static-friendly functions

The libc API is rather ugly: it requires the caller to pass an
array of the maximum possible size (or to repeatedly query the group
list to discover the actual size). With our "static-friendly" helpers,
we reallocate the array on the fly.
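
The "grow on the fly" idea can be sketched as a wrapper around the libc call.
Note the assumptions: getgrouplist_grow() is a hypothetical name, and unlike
the static-friendly helpers referred to above, this sketch still calls libc's
getgrouplist() and only illustrates the reallocation loop.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <grp.h>
#include <stdlib.h>
#include <sys/types.h>

/* Returns the number of groups and stores a malloc()ed array in *ret, or -ENOMEM. */
int getgrouplist_grow(const char *user, gid_t gid, gid_t **ret) {
        int n = 8;
        gid_t *groups = NULL;

        for (;;) {
                gid_t *p = realloc(groups, (size_t) n * sizeof(gid_t));
                if (!p) {
                        free(groups);
                        return -ENOMEM;
                }
                groups = p;

                int m = n;
                if (getgrouplist(user, gid, groups, &m) >= 0) {
                        *ret = groups;
                        return m;
                }

                /* Buffer too small: the required count was stored in m. */
                n = m > n ? m : n * 2;
        }
}
```

The caller owns the returned array and releases it with free().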

shared/creds-util: use getpwuid_malloc()

Static linking, i.e. without dlopen, sometimes fails when getpwuid
is used. It doesn't fail in all cases (glibc version?), so I didn't
do this initially.

user-util: add getgrouplist_malloc and initgroups_wrapper

various: add -Dbuild-static and alternatives to libc functions

When we try to build with -static, the linker refuses dlopen and
various functions related to nss. This commit adds the basic support
to compile with -static and allow the link to pass.

user-util: add lookup_groups_in_files

user-util: sysconf_ngroups_max

This hides the iffy interface of sysconf(_SC_NGROUPS_MAX) to simplify
the callers.

user-util: add lookup_pwent_in_files and lookup_grent_in_files

Those functions will be used later. This commit adds the
implementations and tests for them.

glibc functions require ability to open nss modules, which requires
dlopen. Linking will fail if any of the nss-using functions are used.
So the following few commits will provide replacements for all of
them.

meson: replace -Dbuild-executor-shared=single with -Dsystemd-multicall-binary=true

This effectively reverts the meson config changes introduced in
0c6186695a697831a7e5ccde0852ea35d4b2409b. I wasn't happy with that form
back when it was committed, and in hindsight, I like it even less. Let's
just add a new option that hopefully is easier to understand and will
work better when new functionality is added in the future.

Now there are two options:
link-executor-shared (boolean, defaults to true)
— "link (separate) systemd-executor to libsystemd-shared.so and libsystemd-core.so"
systemd-multicall-binary (also boolean, defaults to false)
— "link systemd+systemd-executor as a single binary"

Build directories will need to be recreated again, sorry!

meson: fix fs.exists() check for fuzz corpus samples

Commit 8355eb6e11 ("meson: Check if files returned by git ls-files
actually exist") checks the paths printed by git ls-files, which are
relative to the project root, with fs.exists(), which resolves them
relative to the test/fuzz subdirectory. As a result, every corpus
sample was silently skipped, and only the generated directives tests
were registered. Resolve the paths against the project source root
instead.

compress: handle ZSTD_CONTENTSIZE_UNKNOWN when decompressing blobs

The zstd blob decompression code assumed ZSTD_getFrameContentSize() always
returns the decompressed size, which holds for the journal and coredump blobs
systemd compresses itself with the one-shot ZSTD_compress(). Frames produced by
the streaming API, however, don't record the decompressed size in their header,
so ZSTD_getFrameContentSize() returns ZSTD_CONTENTSIZE_UNKNOWN. Per the zstd
documentation this is not an error, but decompress_blob_zstd() and
decompress_startswith_zstd() bailed out with -EBADMSG regardless.

This broke kexec of kernel images compressed with 'zstd' (e.g. zstd -22), whose
ZBOOT payload is decompressed via decompress_blob().

Treat only ZSTD_CONTENTSIZE_ERROR as fatal and, when the size is unknown, grow
the output buffer as we stream the frame out instead of relying on the recorded
size. Add a regression test that feeds a streaming-compressed (hence
unknown-size) zstd frame through decompress_blob() and decompress_startswith().
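
The buffer handling for the unknown-size case can be sketched roughly as
follows. This is not the decompress_blob_zstd() code and makes no zstd calls:
ChunkSource and chunk_source_read() are hypothetical stand-ins for a streaming
decoder that hands out output piece by piece, and the dst_max cap is an
assumption of this example. Only the grow-as-you-go pattern is shown.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a streaming decoder. */
typedef struct ChunkSource {
        const char *data;
        size_t left;
} ChunkSource;

/* Writes up to 'cap' bytes of pending output to 'out', returns the number
 * of bytes written, 0 once the stream is finished. */
static size_t chunk_source_read(ChunkSource *s, void *out, size_t cap) {
        size_t n = s->left < cap ? s->left : cap;

        memcpy(out, s->data, n);
        s->data += n;
        s->left -= n;
        return n;
}

/* Collects the whole stream without knowing its size up front. A dst_max
 * of 0 means "no limit". */
static int read_all_growing(ChunkSource *s, size_t dst_max, void **ret, size_t *ret_size) {
        size_t allocated = 16, used = 0;
        char *buf;

        assert(s);
        assert(ret);
        assert(ret_size);

        buf = malloc(allocated);
        if (!buf)
                return -ENOMEM;

        for (;;) {
                size_t n;

                if (used == allocated) {
                        /* Buffer is full: double it before asking for more. */
                        char *t;

                        if (allocated > SIZE_MAX / 2) {
                                free(buf);
                                return -ENOMEM;
                        }

                        t = realloc(buf, allocated * 2);
                        if (!t) {
                                free(buf);
                                return -ENOMEM;
                        }

                        buf = t;
                        allocated *= 2;
                }

                n = chunk_source_read(s, buf + used, allocated - used);
                if (n == 0)
                        break;

                used += n;
                if (dst_max > 0 && used > dst_max) {
                        free(buf);
                        return -ENOBUFS;
                }
        }

        *ret = buf;
        *ret_size = used;
        return 0;
}
```

Enforcing the cap inside the loop keeps memory use bounded even though the
final size is only known once the stream ends.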

docs: Update memory pressure docs for latest GLib support for it

As of GLib 2.90.0 (not yet released), GLib will fully support the spec,
including the `MEMORY_PRESSURE_WATCH` and `MEMORY_PRESSURE_WRITE`
environment variables, which it did not support previously.

See https://gitlab.gnome.org/GNOME/glib/-/merge_requests/5046

test: suppress fails on the Africa/Tripoli (Libya) timezone

As it switched between CET and EET multiple times in the past, causing
random fails with certain historical timestamps:

834/1524 test - systemd:test-time-util FAIL 0.96s killed by signal 6 SIGABRT
...
TZ=Africa/Tripoli, tzname[0]=EET, tzname[1]=CEST
@378687574661411 → Thu 1981-12-31 23:59:34 CET → @378683974000000 → Thu 1981-12-31 23:59:34 EET
src/test/test-time-util.c:450: Assertion failed: Expected "ignore" to be true
/builddir/build/BUILD/systemd-261.1-build/systemd-261.1/tools/test-crash-trace.sh: line 18: 87088 Aborted (core dumped) "$@"

ukify: show all sections and profiles in inspect JSON output

The JSON output keyed every section by name, so a UKI with repeated
sections only showed the last one: in a multi-profile UKI all but one
.profile/.cmdline/.pcrsig were dropped, as were extra .dtbauto/.efifw.

Report the shared base sections by name at the top level (so .cmdline
etc. stay where they were), each profile as its own by-name object under
a new "_profiles" array, and the alternative-set sections (.dtbauto,
.efifw) as arrays.

Co-developed-by: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Paul Meyer <katexochen0@gmail.com>

boot: measure SMBIOS via non-TPM runtime measurements

In commit 29c6d1c12549 ("boot: measure select SMBIOS objects
explicitly") we added an explicit SMBIOS measurement in systemd-boot
and systemd-stub, covering cases where firmware doesn't measure SMBIOS
itself. measure_smbios() bailed unless a TPM was present, so it skipped
the measurement on confidential guests that only expose a CC
measurement protocol (e.g. Intel TDX RTMRs).

We can have much higher expectations of the virtual firmware used for
confidential computing, and the firmware is attested, so we can expect
it to always measure SMBIOS in this case. We still do our own
measurement to get a measurement structure similar to that of
non-confidential guests, and as another line of defense.

Signed-off-by: Paul Meyer <katexochen0@gmail.com>

shell-completion: add tdx to systemd-vmspawn --coco values

Missed in a78afc16168 ("vmspawn: add Intel TDX confidential VM
support")

Signed-off-by: Paul Meyer <katexochen0@gmail.com>

terminal-util: tweaks for show_menu() (#42822)

firstboot: add new systemd.firstboot=headless mode

This adds a new systemd.firstboot=headless mode. It differs from the
existing systemd.firstboot=no mode in that it still performs the
non-interactive auto-configuration that requires no user input (such
as selecting the only installed locale, or applying settings provided
via credentials), and only skips the prompts that would otherwise
block waiting for user input. In contrast, =no disables that
auto-configuration along with the prompts.

The option is also honoured by homectl's firstboot logic and
systemd-cryptenroll, where headless behaves the same as no
(because there is no auto-configuration).

test: Add sysupdate gpg verification test

The test first checks that simple signatures work and that a foreign
signature is rejected. Then it also tests that sub key signatures are
accepted when the public sub key is included in the signature.

test: add test case for show_menu()

terminal-util: drop assert() on 'x'

We happily take a NULL strv as argument, equivalent to an empty one,
like most of our strv handling funcs.

terminal-util: make sure we never go below 10 characters line width

terminal-util: use LESS_BY() where appropriate

terminal-util: calculate array index only once

core: assorted hardening fixes flagged by kres (#42840)

bpf-restrict-fs: use a 32-bit magic key on big-endian too

The inner map is created with a uint32_t key, but the update passed
&magic[i] where magic is a (possibly 64-bit) statfs_f_type_t. On
little-endian the low 32 bits happen to be read; on big-endian 64-bit
(s390x, ppc64 BE) the zero high word is read instead, so every
filesystem collides on key 0 (the allow/deny selector) and
RestrictFileSystems= is silently broken. Pass a truncated copy.

Follow-up for 184b4f78cfbded54a6e06bbe1152256c204a7a73
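
The effect can be reproduced in isolation with the sketch below. The function
names are invented for this example and a uint64_t stands in for the possibly
64-bit statfs_f_type_t; read_key32() mimics a consumer that reads exactly four
bytes from the key pointer it is given. No BPF call is involved.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads the first four bytes at 'p', as a consumer expecting a uint32_t
 * key would. */
static uint32_t read_key32(const void *p) {
        uint32_t k;

        assert(p);

        memcpy(&k, p, sizeof(k));
        return k;
}

/* Problematic pattern: the address of the wide value is passed directly.
 * Little-endian yields the low word, big-endian the (zero) high word. */
static uint32_t key_from_wide_pointer(uint64_t magic) {
        return read_key32(&magic);
}

/* Fixed pattern: truncate into a 32-bit copy first, then pass its address.
 * The result no longer depends on byte order. */
static uint32_t key_from_truncated_copy(uint64_t magic) {
        uint32_t k = (uint32_t) magic;

        return read_key32(&k);
}
```

For example, with the ext4 magic 0xEF53, key_from_truncated_copy() returns
0xEF53 on any host, while key_from_wide_pointer() returns 0xEF53 on
little-endian and 0 on big-endian 64-bit machines.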

core/scope: don't assert when start is retried during cgroup chown

While a delegated scope waits for the async cgroup chown helper it sits
in SCOPE_START_CHOWN (UNIT_ACTIVATING). unit_start() forwards to
->start() while activating, so scope_start() could be re-entered in
that state and trip assert(s->state == SCOPE_DEAD), aborting PID 1.
Treat SCOPE_START_CHOWN as already-starting instead.

Follow-up for 03860190fefce8bbea3a6f0e77919b882ade517c

core: donate the fdset to do_reexecute() to avoid a double free

do_reexecute() freed the FDSet on the switch-root/soft-reboot fallback
but the caller's copy stayed live, so main() freed it again if every
fallback exec then failed. Donate the fdset instead: pass it with
TAKE_PTR() and take ownership via a _cleanup_ local, freeing it exactly
once on every exit path.

Follow-up for 3c7878f94b02f65676889fa58a937ff4d4de4a4d
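
The ownership hand-over can be illustrated with a minimal plain C sketch.
Resource, take_resource(), consume() and caller_sketch() are hypothetical
names for this example; the actual change uses TAKE_PTR() and a _cleanup_
local as described above, which are replaced here by explicit calls so the
example builds without compiler extensions.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Resource {
        int dummy;
} Resource;

/* Counts frees so that a double free would be visible in a test. */
static int n_freed = 0;

static Resource *resource_new(void) {
        return calloc(1, sizeof(Resource));
}

static void resource_free(Resource *r) {
        if (!r)
                return;

        n_freed++;
        free(r);
}

/* Returns the pointer and clears the caller's copy, so that only one
 * owner remains. */
static Resource *take_resource(Resource **p) {
        Resource *r;

        assert(p);

        r = *p;
        *p = NULL;
        return r;
}

/* The callee owns 'r' from the moment it is called and releases it on
 * every exit path, including the failure path. */
static int consume(Resource *r, int fail) {
        resource_free(r);
        return fail ? -1 : 0;
}

static int caller_sketch(int fail) {
        Resource *r = resource_new();
        int ret;

        if (!r)
                return -1;

        ret = consume(take_resource(&r), fail);

        /* 'r' is NULL here, so this is a no-op rather than a second free. */
        resource_free(r);
        return ret;
}
```

Because the caller's pointer is cleared at the moment of the hand-over, its
own cleanup stays in place and is harmless on both the success and the
failure path.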

run: refuse --no-block when combined with --scope

In systemd-run --scope mode, --no-block has no effect and is silently
ignored. Reject the combination explicitly so that users are not misled
into thinking it does something.

Fixes: #42806

crypto-util, ssl-util: set log level to debug in dlopen_many_sym_or_warn() (#42811)

Otherwise, on systems with OpenSSL 3, the journal contains multiple error
entries because the preferred library libcrypto.so.4 is not present.

Follow-up for ccdd42351f79cbb9c2e034a96280a1ded40a2f95

vmspawn: add Intel TDX confidential VM support (#42835)

mkfs-util: fixes for mtools invocation (#42839)

tree-wide: trivial sysconf() invocations tweaks (#42837)

shared/install: give the borrowed name back before bailing on error

In the unmask path, when install_changes_add() fails the borrowed
*name was not handed back via TAKE_PTR() before returning, so
install_info_clear() freed the caller's strv entry, leaving a dangling
pointer that is double-freed at the caller's strv_free().

Follow-up for f31f10a6207efc9ae9e0b1f73975b5b610914017

quotacheck: don't apply an invalid quotacheck.mode= value

quota_check_mode_from_string() returns -EINVAL on a bad value, and that
return value was stored in the global arg_mode. Only change arg_mode on
success.

Follow-up for d73691c64e05650d838aaeb7da94fd8bdfb60907
Follow-up for dba4fe9a60e8876addcd6a597c9e1d5f529309ca
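
The "only assign on success" pattern looks roughly like this. The Mode enum,
mode_from_string() and parse_mode() are simplified, hypothetical stand-ins
written for this example, not the quotacheck sources.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

typedef enum Mode {
        MODE_AUTO,
        MODE_FORCE,
        MODE_SKIP,
        _MODE_MAX,
} Mode;

/* Returns the enum value for 's', or -EINVAL if it is not recognized. */
static int mode_from_string(const char *s) {
        static const char *const table[_MODE_MAX] = {
                [MODE_AUTO]  = "auto",
                [MODE_FORCE] = "force",
                [MODE_SKIP]  = "skip",
        };

        assert(s);

        for (int i = 0; i < _MODE_MAX; i++)
                if (strcmp(s, table[i]) == 0)
                        return i;

        return -EINVAL;
}

static Mode arg_mode = MODE_AUTO;

static int parse_mode(const char *value) {
        /* Parse into a temporary int first: a negative error code must
         * never end up in the enum-typed global. */
        int m = mode_from_string(value);

        if (m < 0)
                return m; /* arg_mode keeps its previous value */

        arg_mode = (Mode) m;
        return 0;
}
```

Parsing into a temporary keeps the previous (or default) mode in effect when
the value is invalid.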

core: avoid using uninitialized buffer on bad systemd.random_seed=

unbase64mem() leaves p/sz uninitialized on failure, and the
deserialization doesn't bail out in that case, unlike other cases.

Follow-up for d247f232a8fd68f91769274f196566a6e9e75d15

tpm2-setup: Create and persist an endorsement key

This updates systemd-tpm2-setup to create and persist an endorsement
key if there isn't one already. For each supported EK template profile,
it will read the EK certificate from its NV index if there is one.
When there is an EK certificate present, a primary key is created using
the corresponding template. If the resulting EKpub matches the public
key in the certificate, the created EK is persisted and the process is
complete.

The low-range templates and the high-range RSA 2048/3072 and ECC NIST
P256/P384 storage templates are supported, as detailed in section
5.3 of the "TCG EK Credential Profile For TPM Family 2.0" spec v2.7.
High-range templates are preferred because these permit EK usage without
requiring knowledge of the authorization value for the endorsement
hierarchy, meaning that, like with the SRK, it is possible to restrict
the usage of the endorsement hierarchy whilst still permitting use of
the persistent EK.

The EK is always persisted at handle 0x81010001. This handle is
reserved in Table 2 of the "TCG TPM v2.0 Provisioning Guidance" spec
v1.0r1.0, although this is only a recommendation. This
handle is within the block of handles reserved for endorsement
primary keys in the "Registry of Reserved TPM 2.0 Handles and
Localities" spec v1.2r1.00. Section 2.3.2 of this specification also
makes a suggestion that there should be a relationship between the EK
certificate NV index and a corresponding persistent EK handle by
using handles at the same offsets within their respective ranges.
However, this contradicts the provisioning guidance spec which reserves
0x81010001 when there isn't a certificate at 0x01c00001. For simplicity,
I've chosen to use a single handle for the EK regardless of which profile
it is created with.

The "TCG EK Credential Profile For TPM Family 2.0" spec also provides a
way for endorsement keys to be certified with non-standard templates by
storing the template in a NV index. This is also supported.

The EK creation is not executed with tpm2-setup --early, as there's no
need for it to be created so early, unlike with the SRK. I also haven't
stored EKpub in /var/lib/systemd like with the SRKpub, as I'm not sure
there will be a use case for this yet.

A follow-up PR may be needed to add some internal helpers to make use
of the persisted EK, as use of low-range EKs requires a policy session.
High-range EKs can be used with a HMAC session because they have the
userWithAuth attribute set and we are creating them with an empty
authorization value.

https://trustedcomputinggroup.org/resource/http-trustedcomputinggroup-org-wp-content-uploads-tcg-ek-credential-profile/
https://trustedcomputinggroup.org/resource/tcg-tpm-v2-0-provisioning-guidance/
https://trustedcomputinggroup.org/resource/registry/

core: add absolute-path properties to varlink StartTransient

Similar to PR#42360, this commit adds missing properties for
absolute path handling in io.systemd.Unit.StartTransient for
the `Exec` context and a macro helper to share the common code.

The properties added are:
IPCNamespacePath, NetworkNamespacePath, RootDirectory, RootHashPath,
RootHashSignaturePath, RootImage, RootVerity, UserNamespacePath,
RootMStack.

Note that RootHashPath, RootHashSignaturePath need custom apply
functions because the varlink name "RootHashPath" differs from the
name that needs to be written into the unit file ("RootHash=")
and the iovec must be cleared.

tree-wide: fix return type of sysconf()

time-util: use sysconf_clock_ticks_cached() at all places we query _SC_CLK_TCK

memory-util: don't use 'r' for non-int returns

env-util: use our own sc_arg_max() helper at more places

mkfs-util: deal with mmd name clashes gracefully

In various cases we'll touch the same directories multiple times with
mtools: for example the /loader/ dir itself. Unfortunately mtools' mmd
does not implement a graceful "-p" switch like mkdir, but will do some
interactive name clash thing instead. We can turn this off via "-Ds
-DS", so let's do that. But that's not enough since the tool will still
fail with a non-explanatory exit status of 1. This hence ignores that
failure and proceeds anyway, under the assumption that failures to
create a directory will sooner or later be detected anyway once the
directory is to be populated and turns out not to exist because the
creation failed.

This makes some integration test/mkosi invocations work
non-interactively again.

Follow-up for f191ca982ce9eac85207807a65489909308f7d8f