It's explicitly intended for use in virtualization. Hence it's suitable for
detecting virtualization as a generic fallback.
This hence adds the check, similar to how we already look for one other
qemu-specific devicetree.
I ran into this while playing around with the new Pixel "Linux Terminal"
app from Google, which apparently runs Debian in crosvm. So far
systemd didn't recognize execution in it at all. Let's at least
recognize it as a VM, even if this doesn't recognize it as
crosvm.
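
A minimal sketch of the kind of check described above, not systemd's actual C code. A devicetree `compatible` property is a list of NUL-terminated strings, so detection boils down to looking for an entry that is only ever used for virtual machines. The marker string and all function names below are placeholders, since the text above does not name them.

```python
from pathlib import Path

# Hypothetical marker string; the real value is not named in the text above.
VM_ONLY_COMPATIBLES = frozenset({"example,virt-only-machine"})


def parse_compatible(raw: bytes) -> list[str]:
    """Split a devicetree 'compatible' property into its NUL-terminated entries."""
    return [part.decode("ascii", "replace") for part in raw.split(b"\0") if part]


def looks_like_vm(raw: bytes, markers=VM_ONLY_COMPATIBLES) -> bool:
    """Return True if any compatible entry is a known virtualization-only marker."""
    return any(entry in markers for entry in parse_compatible(raw))


def detect_from_devicetree(path="/proc/device-tree/compatible") -> bool:
    """Read the property from procfs; a missing file simply means 'not detected'."""
    try:
        return looks_like_vm(Path(path).read_bytes())
    except OSError:
        return False
```

Because this is only a generic fallback, a match says "some VM" and nothing about which hypervisor is in use.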
Currently, if you boot PID 1 in a container you always see a complaint
that BPF LSM won't work. That's fine, and log worthy, but probably not
above debug level. After all this is a really common case, and we should
gracefully adapt to our execution environment.
Luca Boccassi [Fri, 7 Mar 2025 11:58:13 +0000 (11:58 +0000)]
load-fragment: Fix config_parse_namespace_flags() for DelegateNamespaces= (#36633)
Boolean values have to be handled separately for RestrictNamespaces=
because they get stored in a field with reverse meaning (which namespaces
are retained), so let's check which field we're parsing and set the proper
value accordingly.
Daan De Meyer [Thu, 6 Mar 2025 16:31:49 +0000 (17:31 +0100)]
TEST-13-NSPAWN: Set TERM=dumb when calling machinectl shell
We only consider something not a tty if it's not connected to a tty
and not connected to /dev/null, so let's use the environment variable
instead to tell machinectl shell that it shouldn't do any of its TTY
stuff.
terminal-util: during terminal reset clear from beginning of line to end of screen
tianocore does some weird shit with its terminal emulation and regularly
fills half the terminal with grey background and then invokes us with
this not cleared up. Hence let us clear this up for it: as part of the
ansi sequence based reset let's position the cursor explicitly at the
beginning of the current line, and erase everything till the end of the
screen. This makes boot output in tianocore vms much much cleaner.
Note that this does *not* erase any terminal output *before* the cursor
position where we take over, because that typically contains valuable
information that we should not erase.
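
A minimal sketch of the two-step clear described above, using standard ANSI/ECMA-48 control sequences. The exact bytes systemd emits are not given in the text, so the sequence choice and the function name here are assumptions for illustration.

```python
CSI = "\x1b["  # Control Sequence Introducer


def clear_from_line_start() -> str:
    """Build: move to column 1 of the current line, then erase to end of screen."""
    cursor_to_column_1 = CSI + "1G"  # CHA: cursor horizontal absolute, column 1
    erase_below = CSI + "0J"         # ED 0: erase from cursor to end of screen
    return cursor_to_column_1 + erase_below
```

Writing the returned string to the terminal leaves everything above the current line untouched, which matches the note that earlier output is preserved.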
@poettering hrm, there's still one thing unclear to me: we currently
have no way for canceling factory reset via IPC. And adding that to
varlink service solely doesn't seem feasible either, since the state
departs from the active state of `factory-reset.target` and it would
become impossible to re-request it without restarting
`factory-reset.target` _and all dependencies_, which feels
unmaintainable.
Daan De Meyer [Thu, 6 Mar 2025 13:15:34 +0000 (14:15 +0100)]
load-fragment: Fix config_parse_namespace_flags() for DelegateNamespaces=
Boolean values have to be handled separately for RestrictNamespaces= because
they get stored in a field with reverse meaning (which namespaces are retained),
so let's check which field we're parsing and set the proper value accordingly.
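
A rough sketch of the inverted boolean handling, assuming the field for RestrictNamespaces= holds the namespaces that stay allowed while the field for DelegateNamespaces= holds the namespaces that are delegated. The flag values, helper names, and the simplified list parsing are illustrative only and are not taken from load-fragment.c.

```python
from enum import IntFlag


class Namespace(IntFlag):
    # Illustrative bit values for this sketch only.
    CGROUP = 1
    IPC = 2
    NET = 4
    MNT = 8
    PID = 16
    USER = 32
    UTS = 64


ALL_NAMESPACES = (Namespace.CGROUP | Namespace.IPC | Namespace.NET | Namespace.MNT
                  | Namespace.PID | Namespace.USER | Namespace.UTS)


def parse_boolean(value: str):
    """Return True/False for common boolean spellings, None otherwise."""
    v = value.strip().lower()
    if v in ("1", "yes", "y", "true", "t", "on"):
        return True
    if v in ("0", "no", "n", "false", "f", "off"):
        return False
    return None


def parse_namespace_flags(lvalue: str, rvalue: str) -> Namespace:
    """Return the value to store in the field that belongs to 'lvalue'."""
    b = parse_boolean(rvalue)
    if b is None:
        # Not a boolean: treat it as a space-separated list of namespace names.
        flags = Namespace(0)
        for name in rvalue.split():
            flags |= Namespace[name.upper()]
        return flags
    if lvalue == "RestrictNamespaces":
        # Reverse meaning: "yes" restricts everything, so nothing is retained.
        return Namespace(0) if b else ALL_NAMESPACES
    if lvalue == "DelegateNamespaces":
        # Direct meaning: "yes" delegates everything.
        return ALL_NAMESPACES if b else Namespace(0)
    raise ValueError(f"unexpected setting: {lvalue}")
```

The point of the fix is visible in the two boolean branches: the same "yes" maps to opposite bit masks depending on which setting is being parsed.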
gpt-auto-generator: do not apply image policy on the root fs and /usr/ fs
At the moment the gpt-auto generator does its things we already
transitioned into the host OS, i.e. the root fs and /usr/ are mounted.
Hence suppress image policy checks for those two partitions.
This actually matters, because the root hash/usr hash is taken into
consideration for the image policy checks, but we don't have that in
gpt-auto and hence would refuse operation claiming policy conflicts
even though we never actually operate on the root fs via the dissection
logic.
The partition enumeration only runs on the main system, and we test that
early, hence no point in repeating this in functions further down the
call chain. But let's keep it in place as assert()s, just in case.
Also, move the top-level in_initrd() into add_mounts(), so that the
tests are nicely encapsulated in the code they protect.
This new helper patches a provided image policy, setting the policy for
specified designators to "ignore".
This is useful for contexts where we only want to mount some subset of
the available partitions, and hence don't care about the parts of the
policy that cover the others. Specifically this is useful in
systemd-gpt-auto-generator, which runs at a moment the root file system
is already established, and hence the policy for the root file system
can be ignored, the facts are already established.
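
As a sketch of what such a helper does, assume an image policy can be modelled as a mapping from partition designator to policy string. That representation and the function name are assumptions for illustration, not the actual systemd data structure or API.

```python
def ignore_designators(policy: dict[str, str], designators) -> dict[str, str]:
    """Return a copy of 'policy' with the given designators set to "ignore"."""
    patched = dict(policy)  # leave the caller's policy untouched
    for designator in designators:
        patched[designator] = "ignore"
    return patched
```

For the gpt-auto case described above, the caller would pass the root and /usr/ designators, so that only the remaining partitions are still checked against the policy.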
Daan De Meyer [Wed, 5 Mar 2025 20:27:17 +0000 (21:27 +0100)]
mkosi: update fedora commit reference
* 4ab2a9e539 Drop old self-Obsoletes and provides
* ec182495e7 Drop libbpf versioned dependency version to 1.4.7
* 1f8d2b0ebd Make self-obsoletes for the sysusers split conditional
* 0d95af264f Include epoch in versioned libbpf dependency
* 8230f501b6 Make sure we pull in libbpf >= 1.5.0 if libbpf is installed
We do make use of the os-release ids to determine whether to initiate resume
if they're present, hence log at warning level if invalid. While at it,
raise the level for the kernel version too, which is generally interesting
to the user if something goes wrong.
Mike Yuan [Wed, 5 Mar 2025 16:01:04 +0000 (17:01 +0100)]
factory-reset-tool: error out if we can't cancel pending reset
First of all, it seems very unlikely that we'd be in the pending state
if not booted via EFI in the first place. Moreover, the operation didn't
work out, hence let's not spuriously report success.
Mike Yuan [Wed, 5 Mar 2025 14:54:26 +0000 (15:54 +0100)]
units: refuse manual operations on factory-reset-now.target and friends
It is strictly mandatory that this is done during initial
transaction, and not later when the system is already running.
Hence let's refuse manual start for all of the involved units.
Additionally, refuse manual stop for systemd-factory-reset-complete.service,
as it flags the factory reset completion through
/run/systemd/factory-reset-complete, which never gets removed
for the whole boot.
Thorsten Kukuk [Fri, 28 Feb 2025 13:01:16 +0000 (14:01 +0100)]
sysupdate: fix features and vacuum if all features are disabled
If all transfer definitions are features and disabled, a wrong error
is reported that there are no transfer definitions.
This breaks the features and vacuum verbs, as they work on disabled
features, too.
Luca Boccassi [Wed, 5 Mar 2025 12:36:45 +0000 (12:36 +0000)]
test-async: Wait for asynchronous_sync() to finish (#36611)
Otherwise, if the system is busy, TEST-02-UNITTESTS will fail as
systemd will time out trying to kill the transient unit that we're
running test-async in.
run0: run agents during setup, until pty forwarder takes over
When services start up they might query for passwords, or issue polkit
requests. Hence it makes sense to run the password query agent and
polkit agent from systemd-run. We already ran the polkit agent, this
also ensures we run the password query agent.
There's one tweak to the story though: running the agents and the pty
forwarder concurrently is messy, since they both try to read from stdin
(one potentially, the other definitely). Hence, let's time the agents
properly: invoke them when we initialize, but stop them once the start
job for the unit we are supposed to run is complete, and only then run
the pty forwarder.
With this in place, the following series of commands starts to work
really nicely (which previously deadlocked):
# homectl create foobar
# run0 -u foobar
What happens in the background in run0 is this: a new session is invoked
for "foobar", which pulls in the user@.service instance for the user.
That user@.service instance will need to unlock the homedir first. Since
8af1b296cb2cec8ddbb2cb47f4194269eb6cee2b this will happen via the askpw
logic. With this commit here this prompt will now be shown by run0. Once
the password is entered the directory is unlocked and the real session
begins. Nice!
This new behaviour is conditioned behind --pty-late (distinct from the
existing --pty switches). For systemd-run we will never enable this mode
by default, for compat with command lines that use ExecStartPre=
(because we won't process the pty anymore during that command). For
run0 however this changes the default to --pty-late (unless
--no-ask-password is specified). This reflects the fact that run0 is
more of an interactive tool and unlikely to be used in more complex
service start-up situations with ExecStartPre= and suchlike.
This also merges JobDoneContext into RunContext, since it doesn't really
make sense to have two contexts around to communicate between outer
stack frame and event handlers. Let's just have one, and pass it around
to all handlers the same way. In particular as we should delay exit only
until both the unit's job is complete *and* in case of --wait the unit
is exited, one of the two should not suffice.
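
The exit condition from the last paragraph can be sketched as follows. The field names are hypothetical and only mirror the prose; they are not the members of the real RunContext structure.

```python
from dataclasses import dataclass


@dataclass
class RunContext:
    # Hypothetical fields for this sketch.
    wait: bool = False         # --wait was requested
    job_done: bool = False     # the unit's start job has completed
    unit_exited: bool = False  # the unit itself has exited


def may_exit(ctx: RunContext) -> bool:
    """Exit only once the job is done and, with --wait, the unit has exited too."""
    if not ctx.job_done:
        return False
    return ctx.unit_exited if ctx.wait else True
```

Keeping both flags in a single context lets every event handler evaluate the same predicate, instead of each one tracking its own half of the condition.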
gpt-auto symlinks: take factory reset mode into consideration
In relevant factory reset situations the root disk itself is subject to
removal. This somewhat conflicts with automatic root disk discovery,
since the system first comes up with one candidate for the root disk,
which is then replaced by another.
Let's address this by determining at the moment of probing for the
gpt-root logic what the factory reset state currently is. This is then
used to maintain two distinct symlinks to the gpt auto root device: one
which is always available and one that is only available if factory
reset is off or complete.
The new symlink is not used by anything yet. This will be added in a
later commit.
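
A small sketch of the selection logic, assuming the factory reset state is available as a string at probe time. The symlink names and state spellings below are made up for illustration; the text above does not name them.

```python
# Hypothetical symlink names, for illustration only.
ALWAYS_LINK = "gpt-auto-root-example-always"
SETTLED_LINK = "gpt-auto-root-example-settled"


def root_symlinks(factory_reset_state: str) -> list[str]:
    """Return the symlinks to create for the discovered root device."""
    links = [ALWAYS_LINK]
    # The second link only appears once the root disk can no longer be replaced.
    if factory_reset_state in ("off", "complete"):
        links.append(SETTLED_LINK)
    return links
```

Consumers that must not bind to a root disk that is about to be wiped would wait for the second link, while everything else can keep using the first.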
units: also require /dev/tpm0 to be around before tpm2.target can be reached
While we typically just use /dev/tpmrm0 for accessing the TPM chip (i.e.
via the kernel's own resource manager), some sysfs properties that
matter are on /dev/tpm0 only (i.e. the version without the kernel TPM
resource manager). Hence, wait for both to show up in tpm2.target, so
that we can be sure the full API is available.
This matters because we want to access /sys/class/tpm/tpm0/ppi/request
in the next commit.
1. The factory-reset.target unit that requests a factory reset is now
complemented by factory-reset-now.target that executes it at next
boot.
2. This latter is added to the initial transaction via the new trivial
systemd-factory-reset-generator.
3. A tool systemd-factory-reset has been added to query, request,
cancel, complete factory reset operations (via EFI variables). Two of
these are wrapped into units that are plugged into
factory-reset.target and factory-reset-now.target respectively. The
tool also provides a simple Varlink API.
This should make things a lot cleaner, and both be useful as explicit
implementation on UEFI, and as template + hookpoints for alternative
implementations on non-UEFI.
Let's provide a generic implementation of the systemd.factory_reset
kernel cmdline checking that repart implements. Moreover add support for
leaving the factory reset state again.
This only establishes the basic APIs, it does not hook them up with
anything.
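
A minimal sketch of checking the systemd.factory_reset switch on a kernel command line string. The whitespace-based splitting, the "last occurrence wins" rule, and the treatment of a bare switch as enabled are simplifying assumptions of this sketch, not a description of the actual parser.

```python
def _parse_bool(value: str):
    """Return True/False for common boolean spellings, None otherwise."""
    v = value.strip().lower()
    if v in ("1", "yes", "y", "true", "t", "on"):
        return True
    if v in ("0", "no", "n", "false", "f", "off"):
        return False
    return None


def factory_reset_requested(cmdline: str) -> bool:
    """Scan a kernel command line for systemd.factory_reset[=BOOL]."""
    requested = False
    for word in cmdline.split():
        key, sep, value = word.partition("=")
        if key != "systemd.factory_reset":
            continue
        if not sep:
            requested = True  # assumption: a bare switch means "enabled"
            continue
        parsed = _parse_bool(value)
        if parsed is not None:  # ignore unparsable values
            requested = parsed
    return requested
```

On a running system the input would typically come from /proc/cmdline.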
Daan De Meyer [Tue, 4 Mar 2025 21:31:39 +0000 (22:31 +0100)]
test-async: Wait for asynchronous_sync() to finish
Otherwise, if the system is busy, TEST-02-UNITTESTS will fail as
systemd will time out trying to kill the transient unit that we're
running test-async in.
Mike Yuan [Tue, 4 Mar 2025 17:49:04 +0000 (18:49 +0100)]
missing_syscall: drop raw_getpid()
This used to be relevant since in old versions of glibc an internal
cache is maintained, while we might sidestep their invalidation
with raw_clone(). After glibc 2.25 getpid() is a trivial wrapper
for the syscall, and hence there's no need to have a separate
raw_getpid().
nspawn: add ability to poweroff container cleanly with ^]^]p
It's sometimes very useful to be able to terminate a container quickly
but cleanly while talking to it. Introduce a hotkey for that: ^]^]p for
powering it off. In similar style add ^]^]r for rebooting it.
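
The hotkey matching can be sketched as a tiny scanner over the bytes typed by the user. This ignores any timing window between key presses and only recognizes the sequence when exactly two ^] bytes precede the letter; both are simplifications of this sketch, and the function name is hypothetical.

```python
CTRL_RBRACKET = 0x1D  # the byte a terminal sends for ^]

ACTIONS = {ord("p"): "poweroff", ord("r"): "reboot"}


def find_hotkey(data: bytes):
    """Return 'poweroff' or 'reboot' for the first ^]^]p / ^]^]r in data, else None."""
    run = 0  # number of consecutive ^] bytes seen so far
    for byte in data:
        if byte == CTRL_RBRACKET:
            run += 1
            continue
        if run == 2 and byte in ACTIONS:
            return ACTIONS[byte]
        run = 0  # any other byte breaks the sequence
    return None
```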
We'll add another type of handler callback in the next commit, hence
rename the existing handler to be more precise about what it handles:
handling hangups (either inline via tty, or explicit via user request)
Michal Koutný [Mon, 17 Feb 2025 14:40:24 +0000 (15:40 +0100)]
path: Close inotify FD asynchronously
inotify FD may take several milliseconds to close. We measured
daemon-reload:
    default: (0.427 ± 0.05) s
    async:   (0.323 ± 0.02) s
with 5 path units out of 422 units. I.e. ~1% of units cause ~25% of
delay, hence this fix seems like low-hanging fruit on the daemon-reload
critical path.
Particular inotify slowness pointed out by @fbuihuu.
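
The "~1% of units cause ~25% of delay" figures follow directly from the measurements quoted above. A quick check using the mean values only (the stated uncertainties are ignored here):

```python
def relative_saving(before: float, after: float) -> float:
    """Fraction of the original duration that was saved."""
    return (before - after) / before


default_s, async_s = 0.427, 0.323   # mean daemon-reload times in seconds
path_units, total_units = 5, 422

saving = relative_saving(default_s, async_s)  # about 0.244
unit_share = path_units / total_units         # about 0.012

print(f"time saved: {saving:.1%}, share of path units: {unit_share:.1%}")
```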
We always validate that the target value is below _LOG_TARGET_SINGLE_MAX
before accessing it, but we don't actually size the array like that.
Let's fix that.
This doesn't effectively change anything, but it makes it more
explicit what the limit here is.
dns-stream: only read DNS packet data if we identified the peer properly
If we use TCP fastopen to connect to a DNS server via TCP, and it
responds really quickly between our connection attempt and our immediate
check back, then we have not identified the peer yet, and will not be
able to use the peer metadata to fill in our packet info.
Let's fix that, and simply not read from the socket until identification
is complete.
Yu Watanabe [Tue, 18 Feb 2025 18:09:38 +0000 (03:09 +0900)]
pe-binary: fix array overrun
This is a kind of paranoia, as memeqzero() does not read anything if
length is zero. But, strictly speaking, the C language does not allow this,
and Coverity warns about it.