core: add RestrictFileSystemAccess= BPF LSM for dm-verity execution enforcement

Add a new RestrictFileSystemAccess= boolean setting in the [Manager]
section of system.conf that restricts execution to files on signed
dm-verity block devices and, during early boot, to the initramfs.

When RestrictFileSystemAccess=yes is set, PID1 loads a BPF LSM program
early in boot that tracks block device integrity and denies execution
from anything that is not a trust anchor.

Integrity tracking (self-populating, no userspace involvement):
- bdev_setintegrity: records dm-verity signature status in a BPF hash
  map when the kernel signals device integrity via
  security_bdev_setintegrity()
- bdev_free_security: removes devices from the map on teardown

Trust anchors:
- Signed dm-verity volumes (sig_valid flag set in the BPF map)
- The initramfs (its s_dev is captured at load time and cleared after
  switch_root)

Everything else is denied, e.g. tmpfs, procfs, sysfs and anonymous
PROT_EXEC mappings.
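
To make the tracking and trust-anchor rules above concrete, here is a
minimal userspace model in C. It is a sketch under stated assumptions,
not the BPF program itself: a fixed-size table stands in for the BPF
hash map, a plain global stands in for initramfs_s_dev in .bss, and the
names struct verity_dev, integrity_set(), integrity_free() and
exec_allowed() are hypothetical. Device number 0 doubles as the
empty-slot marker and as "no backing file" (an anonymous mapping).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the BPF hash map keyed by device number. */
#define MAX_DEVS 64

struct verity_dev {
        uint32_t dev;    /* device number; 0 marks an empty slot */
        bool sig_valid;  /* dm-verity signature was verified */
};

static struct verity_dev integrity_map[MAX_DEVS];

/* Models the initramfs_s_dev .bss variable; 0 once cleared after switch_root. */
static uint32_t initramfs_s_dev;

static struct verity_dev *lookup(uint32_t dev) {
        for (size_t i = 0; i < MAX_DEVS; i++)
                if (integrity_map[i].dev == dev)
                        return &integrity_map[i];
        return NULL;
}

/* Models the bdev_setintegrity hook: record or update signature status. */
static int integrity_set(uint32_t dev, bool sig_valid) {
        struct verity_dev *e;

        if (dev == 0)
                return -1;
        e = lookup(dev);
        if (!e)
                e = lookup(0);  /* first free slot */
        if (!e)
                return -1;      /* table full */
        e->dev = dev;
        e->sig_valid = sig_valid;
        return 0;
}

/* Models the bdev_free_security hook: forget the device on teardown. */
static void integrity_free(uint32_t dev) {
        struct verity_dev *e;

        if (dev == 0)
                return;
        e = lookup(dev);
        if (e) {
                e->dev = 0;
                e->sig_valid = false;
        }
}

/* Execution decision for a file whose superblock has device number s_dev. */
static bool exec_allowed(uint32_t s_dev) {
        const struct verity_dev *e;

        if (s_dev == 0)
                return false;  /* no backing file, e.g. anonymous PROT_EXEC */
        if (initramfs_s_dev != 0 && s_dev == initramfs_s_dev)
                return true;   /* initramfs anchor, only until cleared */
        e = lookup(s_dev);
        return e && e->sig_valid;  /* signed dm-verity anchor */
}
```

The order of checks mirrors the list above: the initramfs anchor only
matches while initramfs_s_dev is non-zero, and a device that was freed
or never reported a valid signature falls through to a denial.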

PID1 requires the dm-verity require_signatures=1 module parameter and
refuses to load the BPF program if it is not enabled. This way the
kernel itself guarantees that every dm-verity device carries a valid
signature.
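
The load-time precondition could be checked roughly as follows. This is
a hedged sketch: it assumes the parameter is exposed at
/sys/module/dm_verity/parameters/require_signatures and reads back as Y
or N like other boolean module parameters, and
verity_signatures_required() is a hypothetical helper that takes the
path as an argument so the parsing can be exercised against an ordinary
file.

```c
#include <errno.h>
#include <stdio.h>

/* Assumed sysfs location of the dm-verity module parameter. */
#define VERITY_REQUIRE_SIGNATURES_PATH \
        "/sys/module/dm_verity/parameters/require_signatures"

/*
 * Hypothetical helper, not systemd's actual implementation.
 * Returns 1 if the boolean parameter at 'path' is enabled, 0 if it is
 * disabled, or a negative errno-style value if it cannot be read or parsed.
 */
static int verity_signatures_required(const char *path) {
        FILE *f = fopen(path, "r");
        int c, r;

        if (!f)
                return -errno;
        c = fgetc(f);
        fclose(f);

        switch (c) {
        case 'Y': case 'y': case '1':
                r = 1;
                break;
        case 'N': case 'n': case '0':
                r = 0;
                break;
        default:
                r = -EINVAL;  /* empty file or unexpected content */
        }
        return r;
}
```

A caller would pass VERITY_REQUIRE_SIGNATURES_PATH and refuse to load
the BPF program unless the result is 1, so a missing or unreadable
parameter is treated the same as require_signatures=0.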

After attaching, PID1 duplicates the link FDs and the .bss map FD out
of the skeleton so that it owns them, then destroys the skeleton. The
duplicated link FDs keep the programs attached through the kernel
reference chain (link FD -> bpf_link -> bpf_prog -> bpf_map).
Destroying the skeleton unmaps the .bss page from PID1's address space,
so no BPF state is readable via /proc/1/mem. The .bss map FD is kept
for targeted writes, namely clearing initramfs_s_dev via mmap after
switch_root.

Signed-off-by: Christian Brauner <brauner@kernel.org>