This adds 6 functions to implement RestrictFileSystems=
* lsm_bpf_supported() checks if LSM BPF is supported. It checks that
cgroupv2 is used, that BPF LSM is enabled, and tries to load the BPF
LSM program which makes sure BTF and hash of maps are supported, and
BPF LSM programs can be loaded.
* lsm_bpf_setup() loads and attaches the LSM BPF program.
* lsm_bpf_unit_restrict_filesystems() populates the hash of maps BPF map with the
cgroupID and the set of allowed or denied filesystems.
* lsm_bpf_cleanup() removes a cgroupID entry from the hash of maps.
* lsm_bpf_map_restrict_fs_fd() is a helper function to get the file
descriptor of the BPF map.
* lsm_bpf_destroy() is a wrapper around the destroy function of the BPF
skeleton file.
It hooks into the file_open LSM hook and allows only when the filesystem
where the open will take place is present in a BPF map for a particular
cgroup.
The BPF map used is a hash of maps with the following structure:
cgroupID -> (s_magic -> uint32)
The inner map is effectively a set.
The entry at key 0 in the inner map encodes whether the program behaves
as an allow list or a deny list: if its value is 0 it is a deny list,
otherwise it is an allow list.
When the cgroupID is present in the map, the program checks the inner
map for the magic number of the filesystem associated with the file
that's being opened. When the program behaves as an allow list, if that
magic number is present it allows the open to succeed, when the program
behaves as a deny list, it only allows access if the that magic number
is NOT present. When access is denied the program returns -EPERM.
The BPF program uses CO-RE (Compile-Once Run-Everywhere) to access
internal kernel structures without needing kernel headers present at
runtime.
basic: split out glyph/emoji related calls from locale-util.[ch] into glyph-util.[ch]
These functions are used pretty much independently of locale, i.e. the
only info relevant is whether th locale is UTF-8 or not. Hence let's
give this its own pair of .c/.h files.
Luca Boccassi [Sun, 3 Oct 2021 15:50:38 +0000 (16:50 +0100)]
test: create and merge code coverage reports in integration tests
If -Db_coverage=true is used at build time, then ARTIFACT_DIRECTORY/TEST-XX-FOO.coverage-info
files are created with code coverage data, and run-integration-test.sh also
merges them into ARTIFACT_DIRECTORY/merged.coverage-info since the coveralls.io
helpers accept only a single file.
units: run user service managers at OOM score adjustment 100
Let's make it slightly more likely that a per-user service manager is
killed than any system service. We use a conservative 100 (from a range
that goes all the way to 1000).
Replaces: #17426
Together with the previous commit this means: system manager and system
services are placed at OOM score adjustment 0 (specifically: they
inherit kernel default of 0). User service manager (both for root and
non-root) are placed at 100. User services for non-root are placed at
200, those for root inherit 100.
Note that processes forked off the user *sessions* (i.e. not forked off
the per-user service manager) remain at 0 (e.g. the shell process
created by a tty or ssh login). This probably should be
addressed too one day (maybe in pam_systemd?), but is not covered here.
core: add a new setting DefaultOOMScoreAdjust= and set it to 100 above service manager's by default
Let's make our service managers slightly less likely to be killed by the
OOM killer by adjusting our services' OOM score adjustment to 100 above
ours. Do this conservatively, i.e. only for regular user sessions.
Luca Boccassi [Sat, 2 Oct 2021 10:48:13 +0000 (11:48 +0100)]
man/glib-event-glue example: relicense to CC0-1.0
All other examples were relicensed to CC0-1.0 since they are intended
to be copied and pasted anywhere without any restrictions.
Relicense the last one too.
Luca Boccassi [Fri, 1 Oct 2021 10:44:33 +0000 (11:44 +0100)]
man: add licenses to all files that lack one
Documentation is licensed under LGPL-2.1-or-later.
Scripts are MIT to facilitate reuse.
Examples are relicensed to CC0-1.0 to maximise copy-and-paste
for users, with permission from authors.
licensing: add spdx header to chromiumos helper, move license file
It makes it easier to process the license automatically like other files.
The text of the license in tools/chromiumos/LICENSE matches
https://spdx.org/licenses/BSD-3-Clause.html exactly.
This is just a stupid file list, but without the header the file shows
up on the list of files without a header. I checked that 'systemd-update-po'
still works, so I think it's OK to add this.
mount-util: fix fd_is_mount_point() when both the parent and directory are network fs
The second call to name_to_handle_at_loop() didn't check for the specific
errors that can happen when the parent dir is mounted by nfs and instead of
falling back like it's done for the child dir, fd_is_mount_point() failed in
this case.
Michael Biebl [Thu, 30 Sep 2021 23:00:28 +0000 (01:00 +0200)]
networkd-test: fix resolved_domain_restricted_dns
megasearch.net was meant to be a non-existing bogus domain, and had been
for a long time. But it seems some domain grabber recently registered
it, and it's an actual thing now:
$ host megasearch.net
megasearch.net has address 207.148.248.143
This causes the test to fail randomly.
Use search.example.com instead which yields
$ host search.example.com
Host search.example.com not found: 3(NXDOMAIN)
test: use a less restrictive portable profile when running w/ sanitizers
Since f833df3 we now actually use the seccomp rules defined in portable
profiles. However, the default one is too restrictive for sanitizers, as
it blocks certain syscall required by LSan. Mitigate this by using the
'trusted' profile when running TEST-29-PORTABLE under sanitizers.
Benjamin Berg [Fri, 24 Sep 2021 11:35:34 +0000 (13:35 +0200)]
test: Add failing/non-failing syscall filter test setting architecture
This adds a high level test verifying that syscall filtering in
combination with a simple architecture filter for the "native"
architecture works fine.
Benjamin Berg [Fri, 17 Sep 2021 11:05:32 +0000 (13:05 +0200)]
seccomp: Always install filters for native architecture
The commit 6597686865ff ("seccomp: don't install filters for archs that
can't use syscalls") introduced a regression where filters may not be
installed for the "native" architecture. This means that setting
SystemCallArchitectures=native for a unit effectively disables the
SystemCallFilter= and SystemCallLog= options.
Conceptually, we have two filter stages:
1. architecture used for syscall (SystemCallArchitectures=)
2. syscall + architecture combination (SystemCallFilter=)
The above commit tried to optimize the filter generation by skipping the
second level filtering when it is not required.
However, systemd will never fully block the "native" architecture using
the first level filter. This makes the code a lot simpler, as systemd
can execve() the target binary using its own architecture. And, it
should be perfectly fine as the "native" architecture will always be the
one with the most restrictive seccomp filtering.
Said differently, the bug arises because (on x86_64):
1. x86_64 is permitted by libseccomp already
2. native != x86_64
3. the loop wants to block x86_64 because the permitted set only
contains "native" (i.e. "native" != "x86_64")
4. x86_64 is marked as blocked in seccomp_local_archs
Thereby we have an inconsistency, where it is marked as blocked in the
seccomp_local_archs array but it is allowed by libseccomp. i.e. we will
skip generating filter stage 2 without having stage 1 in place.
The fix is simple, we just skip the native architecture when looping
seccomp_local_archs. This way the inconsistency cannot happen.
Hans de Goede [Tue, 31 Aug 2021 13:49:33 +0000 (15:49 +0200)]
hwdb: sensors: Fix some modalias matches no longer working with newer kernels
Kernels >= 5.8 have added new fields to the dmi/id/modalias file in the
middle of the modalias (instead of adding them at the end).
Specifically new ":br<value>:" and (optional) ":efr<value>:" fields have
been added between the ":bd<value>:" and ":svn<value>:" fields.
Note the 5.13.0 and 5.14.0 kernels also added a new ":sku<value>:" field
between the ":pvr<value>:" and ":rvn<value>:" fields, this has been fixed
in later 5.13.y and 5.14.y releases, by moving the sku field to the end:
https://lore.kernel.org/lkml/20210831130508.14511-1-hdegoede@redhat.com/
Unfortunately the same cannot be done for the new br and efr fields since
those have been added more then a year ago and hwdb even already has some
newer entries relying on the new br field being there (and thus not working
with older kernels).
Fix the issue with the br and efr fields through the following changes:
1. Replace any matches on ":br<value>" from newer entries with an '*'
2. Replace "bd<value>:svn<value>" matches with: "bd<value>:*svn<value>"
inserting an '*' where newer kernels will have the new br + efr fields
This makes these matches working with old as well as new kernels.
Let's switch from the low-level SHA256 APIs to EVP APIs. The former are
deprecated on OpenSSL 3.0, the latter are supported both by old
OpenSSL and by OpenSSL 3.0, hence are the better choice.