We would execute up to four hwdb match patterns (+ the keyboard builtin):
After the first hit, we would skip the other patterns, because of the GOTO="evdev_end"
action.
57bb707d48131f4daad2b1b746eab586eb66b4f3 (rules: Add extended evdev/input match
rules for event nodes with the same name), added an additional match with
":phys:<phys>:ev:<ev>" inserted. This breaks backwards compatibility for user
hwdb patterns, because we quit after the first match.
In general hwdb properties are "additive". We often have a general rule that
matches a wider class and then some specific overrides. E.g. in this particular
case, we have a match for all trackpoints, and then a bunch of model-specific
settings.
So let's change the rules to try all the match patterns and combine the
received properties. We execute builtin-keyboard once at the end, if there was
at least one match.
This also impacts other cases which I think would be very confusing for users.
Since we quit after a first successful match, if we had e.g. a match for
'evdev:input:b*v*p*' in out database, and the user added a match using
'evdev:name:*', which is the approach we document in the .hwdb files and which
users quite often use, it would be silently ignored. What's worse, if we added
our 'evdev:input:b*v*p*' match at a later point, user's match would stop
working. If we combine all the properties, we get more stable behaviour.
Yu Watanabe [Tue, 6 Dec 2022 21:49:17 +0000 (06:49 +0900)]
hexdecoct: several cleanups for base64_append()
- add missing assertions,
- use size_t for buffser size or memory index,
- handle empty input more gracefully,
- return the length or the result string,
- fix off-by-one issue when the prefix is already long enough.
This drops the special casing for s390 and other archs, which was
cargo-culted from glibc. Given it's not obvious why it exists, and is at
best an optimization let's simply avoid it, in particular as the archs
are relatively non-mainstream.
msizanoen1 [Wed, 7 Dec 2022 16:22:05 +0000 (23:22 +0700)]
core/sleep: set timeout for freeze/thaw operation to 1.5 seconds
A FreezeUnit operation can hang due to the presence of kernel threads
(see last 2 commits). Keeping the default configuration will mean the
system will hang for 25 seconds in suspend waiting for the response. 1.5
seconds should be sufficient for most cases.
msizanoen1 [Wed, 7 Dec 2022 16:09:33 +0000 (23:09 +0700)]
core/cgroup: ignore kernel cgroup.events when thawing
The `frozen` state can be `0` while the processes are indeed frozen (see
last commit). Therefore do not respect cgroup.events when checking
whether thawing is necessary.
dissect: add a mode for operating on an in-memory copy of a DDI, instead of directly on it
This is useful for operating in ephemeral, writable mode on any image,
including read-only ones. It also has the benefit of not keeping the
image file's filesystem busy.
"pcrUpdateCounter – this parameter is updated by TPM2_PolicyPCR(). This value
may only be set once during a policy. Each time TPM2_PolicyPCR() executes, it
checks to see if policySession->pcrUpdateCounter has its default state,
indicating that this is the first TPM2_PolicyPCR(). If it has its default value,
then policySession->pcrUpdateCounter is set to the current value of
pcrUpdateCounter. If policySession->pcrUpdateCounter does not have its default
value and its value is not the same as pcrUpdateCounter, the TPM shall return
TPM_RC_PCR_CHANGED.
If this parameter and pcrUpdateCounter are not the same, it indicates that PCR
have changed since checked by the previous TPM2_PolicyPCR(). Since they have
changed, the previous PCR validation is no longer valid."
The TPM will return TPM_RC_PCR_CHANGED if any PCR value changes (no matter
which) between validating the PCRs binded to the enrollment and unsealing the
HMAC key, so this patch adds a retry mechanism in this case.
Space Meyer [Wed, 7 Dec 2022 13:11:30 +0000 (14:11 +0100)]
journald: prevent segfault on empty attr/current
getpidcon() might set con to NULL, even when it returned a 0 return
code[0]. The subsequent strlen(con) will then cause a segfault.
Alternatively the behaviour could also be changed in getpidcon. I
don't know whether the libselinux folks are comitted to the current
behaviour, but the getpidcon man page doesn't really make it obvious
this case could happen.
msizanoen1 [Wed, 7 Dec 2022 13:46:01 +0000 (20:46 +0700)]
core/unit: allow overriding an ongoing freeze operation
Sometimes a freeze operation can hang due to the presence of kernel
threads inside the unit cgroup (e.g. QEMU-KVM). This ensures that the
ThawUnit operation invoked by systemd-sleep at wakeup always thaws the
unit.
msizanoen1 [Wed, 7 Dec 2022 09:54:13 +0000 (16:54 +0700)]
sleep: always thaw user.slice even if freezing failed
`FreezeUnit` can fail even when some units did got frozen, causing some
user units to be frozen. A possible symptom is `user@.service` being
frozen while still being able to log in over SSH.
If given, multiple initrds are concatenated into a temporary file which then
becomes the .initrd section.
It is also possible to give no initrd. After all, some machines boot without an
initrd, and it should be possible to use the stub without requiring an initrd.
(The stub might not like this, but this is something to fix there.)
ukify: try to find the uname string in the linux image if not specified
The approach is based on mkinicpio's autodetection.
This is hacky as hell. Some cases are actually fairly nice: ppc64el images have
a note that contains 'uname -r'. (The note is not uniquely labeled at all, and
only contains the release part instead of the full version-hostname-release
string, and we don't actually care about ppc, and it's very hard to read the
note from Python, but in general that'd be the approach I'd like.)
I opted to simply read and decompress the full linux binary in some cases.
Python doesn't make it easy to do streaming decompression with regexp matching,
and it doesn't seem to matter much: the image decompresses in a fraction of a
second.
Some gymnastics were needed to import ukify as a module. Before the file
was templated, this was trivial: insert the directory in sys.path, call import.
But it's a real pain to import the unsuffixed file after processing. Instead,
the untemplated file is imported, which works well enough for tests and is
very simple.
The tests can be called via pytest:
PATH=build/:$PATH pytest -v src/ukify/test/test_ukify.py
or directly:
PATH=build/:$PATH src/ukify/test/test_ukify.py
or via the meson test machinery output:
meson test -C build test-ukify -v
or without verbose output:
meson test -C build test-ukify
The option is added because we have a similar one for kernel-install. This
program requires python, and some people might want to skip it because of this.
The tool is installed in /usr/lib/systemd for now, since the interface might
change.
A template file is used, but there is no .in suffix.
The problem is that we'll later want to import the file as a module
for tests, but recent Python versions make it annoyingly hard to import
a module from a file without a .py suffix. imp.load_sources() works, but it
is deprecated and throws warnings.
importlib.machinery.SourceFileLoader().load_module() works, but is also
deprecated. And the documented replacements are a maze of twisted little
callbacks that result in an empty module.
So let's take the easy way out, and skip the suffix which makes it easy
to import the template as a module after adding the directory to sys.path.
Features:
- adds sections .linux, .initrd, .uname, .osrel, .pcrpkey, .pcrsig, .cmdline, .splash
- multiple initrds can be concatenated
- section flags are set properly (READONLY, DATA or CODE)
- uses systemd-measure to precalculate pcr measurements and create a signed json policy
- the inner linux image will be signed automatically with sbsign if unsigned
- uses sbsign to sign the output image
- offsets are calculated so that sections are placed adjacent, with .linux last
- custom sections are possible
- multiple pcr signing keys can be specified and different boot phase paths can be
signed with different keys
- most things can be overriden (path to tools, stub file, signing keys, pcr banks,
boot phase paths, whether to sign things)
- superficial verification of slash bmp is done
- kernel uname "scraping" from the kernel if not specified (in a later patch)
TODO:
- change systemd-measure to not require a functional TPM2. W/o this, we'd need
to support all banks in the build machine, which is hard to guarantee.
- load signing keys from /etc/kernel/
- supress exceptions, so if something external fails, the user will not see a traceback
- conversion to BMP from other formats
selinux: accept the fact that getxyzcon() can return success and NULL
Inspired by #25664: let's check explicitly for NULL everywhere we do one
of those getXYZcon() calls.
We usually turn this into EOPNOTSUPP, as when selinux is off (which is
supposed to be the only case this can happen according to selinux docs)
we otherwise return EOPNOTSUPP in that case.
Note that in most cases we have an explicit mac_selinux_use() call
beforehand, hence this should mostly not be triggerable codepaths.
With the commit 5d0030310c134a016321ad8cf0b4ede8b1800d84, networkd manages
addresses with the detailed hash and compare functions. But that causes
networkd cannot detect address update by the kernel or an external tool.
See issue
https://github.com/systemd/systemd/issues/481#issuecomment-1328132401.
With this commit, networkd (again) manages addresses in the way that the
kernel does. Hence, we can correctly detect address update.
It may take a bit for newly introduced binaries/other files to get
properly integrated into the Rawhide specfile, so don't choke up in the
meantime when rpmbuild detects unpackaged files.
Systemd documents "halt" as the primary shutdown mechanism, redirecting
"reboot" and "shutdown" to the halt(8), but halt is a really strange and
obsolete concept. Who would want to really keep their machine running after
shutdown? I expect that halting is almost unused. Let's at least make it less
prominent in the docs.
While at it, use "power off" for a verb and "power-off" for noun (but "poweroff"
of the actual command name).
Yu Watanabe [Tue, 22 Nov 2022 04:03:55 +0000 (13:03 +0900)]
network: drop REMOVING flag when a netlink message is sent to kernel
When an interface goes to down, the kernel drops several routes
automatically, and at the same time networkd requests to remove
them, but the kernel sometimes does not respond the requests. Hence,
the routes cannot drop the REMOVING flag, and networkd will never try
to configure other routes which depend on the previously removed
routes even if they are already reconfigured.
With this patch, when networkd sends a request to configure a route
(or any other network settings), REMOVING flag for the route is dropped
without waiting for the reply about the previous remove request, as we
can expect it will appear even if it is already removed or under removing.
meson: build a standalone version of systemd-shutdown
I'd like to use this as a basis for an exitrd:
When compiled with -Dstandalone-binaries=true -Db_lto=true -Dbuildtype=release,
the new file is 800k. It's more than I'd like, but still quite a bit less
than libsystemd-shared.so, which is 3800k.
Yu Watanabe [Tue, 6 Dec 2022 04:06:57 +0000 (13:06 +0900)]
boot: cleanups for efivar_get() and friends
- rename function arguments for storing results, and support the case
that they are NULL,
- return earlier on error,
- always validate read size in efivar_get_uint32_le() and efivar_get_uint64_le().
Richard Phibel [Mon, 5 Dec 2022 12:40:41 +0000 (13:40 +0100)]
log: Switch logging to runtime when FS becomes read-only
The journal has a mechanism to log to the runtime journal if it fails to
log to the system journal. This mechanism is not triggered when the file
system becomes read-only. We enable it here.
When appending an entry fails if shall_try_append_again returns true,
the journal is rotated. If the FS is read-only, rotation will fail and
s->system_journal will be set to NULL. After that, when find_journal
will try to open the journal since s->system_journal will be NULL, it
will open the runtime journal.
Before we supported pivot_root() nspawn used to make the rootfs shared
before setting up the mount tunnel. So it was safe for it to just turn
it into a dependent mount during setup.
However, we cannot do this anymore because of the requirements
pivot_root() has. After the pivot_root() we will make the rootfs shared
recursively. If we turned the mount tunnel into dependent mount before
mount_switch_root() this will have the consequence that it becomes a
shared mount within the same peer group as the rootfs. So no mounts will
propagate into the container from the host anymore.
To fix this we split setting up the mount tunnel and making it active
into two steps. Setting up the mount tunnel is performed before
mount_switch_root() and activating it afterwards. Note that this works
because turning a shared mount into a shared mount is a nop. IOW, no new
peer group will be allocated.
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
nspawn: mount temporary visible procfs and sysfs instance
In order to mount procfs and sysfs in an unprivileged container the
kernel requires that a fully visible instance is already present in the
target mount namespace. Mount one here so the inner child can mount its
own instances. Later we umount the temporary instances created here
before we actually exec the payload. Since the rootfs is shared the
umount will propagate into the container. Note, the inner child wouldn't
be able to unmount the instances on its own since it doesn't own the
originating mount namespace. IOW, the outer child needs to do this.
So far nspawn didn't run into this issue because it used MS_MOVE which
meant that the shadow mount tree pinned a procfs and sysfs instance
which the kernel would find. The shadow mount tree is gone with proper
pivot_root() semantics.
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
In order to support pivot_root() we need to move mount propagation
changes after the pivot_root(). While MS_MOVE requires the source mount
to not be a shared mount pivot_root() also requires the target mount to
not be a shared mount. This guarantees that pivot_root() doesn't leak
any mounts.
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>