Daan De Meyer [Tue, 15 Aug 2023 10:10:14 +0000 (12:10 +0200)]
mkfs-util: Don't set MKE2FS_DEVICE_PHYS_SECTSIZE
We only care about the logical sector size and if the physical sector
size isn't set and we're operating on a file, mke2fs will default the
physical sector size to the logical block size anyway.
This change makes sure that if we're operating on a block device and
set an explicit logical sector size, that doesn't affect the physical
sector size.
Daan De Meyer [Mon, 14 Aug 2023 19:57:59 +0000 (21:57 +0200)]
fd-util: Use /proc/pid/fd instead of /proc/self/fd
Currently, we mount via file descriptors using /proc/self/fd. This
works, but it means that in /proc/mounts and various other files,
the source of the mount will be listed as /proc/self/fd/xxx. For other
software that parses these files, /proc/self/fd/xxx doesn't mean anything,
or worse, it means the completely wrong thing, as it will refer to one of
their own file descriptors instead.
Let's improve the situation by using /proc/pid/fd instead. This allows
processes parsing /proc/mounts to do the right thing more often than not.
One scenario where even this doesn't work is when containers are involved:
with the pid namespace unshared, even /proc/pid/fd will mean the wrong
thing, but it's no worse than /proc/self/fd, which always means the wrong
thing.
This also doesn't work if we mount via file descriptor and then exit, as the pid will
be gone, but it does work as long as the process that did the mount is alive, which
makes it useful for systemd-dissect --with, for example, if the program we run in the
image wants to parse /proc/mounts.
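A minimal sketch of the path formatting, assuming a hypothetical helper
format_proc_pid_fd_path() (not the actual fd-util API). The caller passes
its own pid, so the resulting string names the mounting process rather
than whichever process later reads it back from /proc/mounts.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: build "/proc/<pid>/fd/<fd>" for an fd owned by pid.
 * Unlike "/proc/self/fd/<fd>", the string still refers to the right process
 * when some other process parses it. Returns 0 on success, -1 if the
 * buffer is too small or formatting fails. */
static int format_proc_pid_fd_path(char *buf, size_t size, pid_t pid, int fd) {
        assert(buf);
        assert(pid > 0);
        assert(fd >= 0);

        int n = snprintf(buf, size, "/proc/%d/fd/%d", (int) pid, fd);
        if (n < 0 || (size_t) n >= size)
                return -1; /* formatting error or truncated output */
        return 0;
}
```

Called as format_proc_pid_fd_path(buf, sizeof(buf), getpid(), fd), this
yields a path of the form /proc/<pid>/fd/<fd> that stays meaningful only
while that pid is alive, matching the caveat above.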
Daan De Meyer [Mon, 14 Aug 2023 14:44:30 +0000 (16:44 +0200)]
repart: Add Subvolumes= setting
This setting indicates which directories in the target partition
should be btrfs subvolumes. If set, we'll try to create these
directories as subvolumes.
Note that this only works when running as root without --offline,
as mkfs.btrfs does not support creating subvolumes.
Daan De Meyer [Mon, 14 Aug 2023 13:33:15 +0000 (15:33 +0200)]
copy: Add support for creating subvolumes to copy_tree_at()
The subvolumes set is a set of source inodes similar to how the
denylist hashmap contains source inodes as keys. It indicates
directories in the source tree that should become subvolumes in
the target tree.
Daan De Meyer [Sat, 12 Aug 2023 11:30:46 +0000 (13:30 +0200)]
repart: Use 4096 as the fallback sector size for verity/luks/filesystems
When we don't know the sector size of the actual block device, because
we're building an image in a loopback file and no sector size was specified
explicitly, let's use 4096 as the sector size for filesystems, verity and
LUKS. This should be the most compatible option, since 4096 will also work
on devices with sector size 512 or 2048.
For the sector size of the GPT partition table itself, we stick with 512 as
the default value, since UEFI firmware and the kernel will only try to read the GPT
partition table from the first LBA on the device and the sector size for
most devices is still 512. It can also be trivially modified when copying
the image to another device using --copy-from + --sector-size.
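To illustrate why 4096 is the safe fallback, here is a sketch with
hypothetical helpers (not systemd-repart's code). The rule that a size
"works" on a device when it is a whole multiple of the device's logical
sector size is our simplifying assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fallback named in the commit message, used when neither the block device
 * nor an explicit option tells us the sector size. */
#define FALLBACK_SECTOR_SIZE UINT32_C(4096)

/* Hypothetical helper: 0 means "unknown", e.g. a loopback image file with
 * no explicit sector size. */
static uint32_t pick_sector_size(uint32_t known_sector_size) {
        return known_sector_size != 0 ? known_sector_size : FALLBACK_SECTOR_SIZE;
}

/* Assumed compatibility rule: an I/O unit is usable on a device if it is a
 * whole multiple of the device's logical sector size. */
static bool sector_size_fits_device(uint32_t sector_size, uint32_t device_sector_size) {
        assert(device_sector_size > 0);
        return sector_size % device_sector_size == 0;
}
```

Under this rule 4096 fits devices with 512, 2048 or 4096 byte sectors,
whereas a 512 byte choice would not fit a 4096 byte device.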
Luca Boccassi [Sat, 12 Aug 2023 14:15:55 +0000 (15:15 +0100)]
test: skip test-path on Salsa CI
Salsa is the Debian git forge. In the package build environment test-path
always fails as we cannot set up cgroups and so the path unit fails to
start. Skip the test in that environment.
Unfortunately, meson doesn't allow skipping individual tests by name.
Frantisek Sumsal [Fri, 11 Aug 2023 14:46:53 +0000 (16:46 +0200)]
test: introduce TEST-08-INITRD
And move the initrd related tests from TEST-01-BASIC there.
Additionally, this should provide coverage for recent shutdown initrd
related issues, see:
- https://github.com/systemd/systemd/issues/28645
- https://github.com/systemd/systemd/pull/28648
- https://github.com/systemd/systemd/pull/28793
This makes tmpfiles, sysusers, and udevd invoked in the following order:
1. systemd-tmpfiles-setup-dev-early.service
Create device nodes gracefully, that is, create device nodes anyway
by ignoring unknown users and groups.
2. systemd-sysusers.service
Create users and groups, so that later invocations of tmpfiles and
udevd can resolve the necessary users and groups.
3. systemd-tmpfiles-setup-dev.service
Adjust owners of previously created device nodes.
4. systemd-udevd.service
Process all devices, in particular to make block devices active and
mountable.
5. systemd-tmpfiles-setup.service
Set up the basic filesystem.
Commit 112a41b6ece19d03e951d886fe2f26512ab31fab introduced #28765,
as systemd-tmpfiles-setup.service is ordered after local-fs.target,
but that target usually requires block devices processed by udevd.
Hence, the service could only start after the block devices timed out.
(I guess in the original patch author's use case the root fs actually
*does* remain in memory, but that's a special case and does not belong
in the man pages this way).
This change makes sure a data copy using copy_bytes() does not exceed the
max_bytes value when using COPY_HOLES and max_bytes stops before the next
data section.
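The essence of the fix is a clamp. A minimal sketch, assuming a
hypothetical helper (not copy_bytes() itself) that is given the current
offset, the start of the next data section (which in practice could come
from lseek(fd, offset, SEEK_DATA)) and the remaining byte budget:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: when the copy position sits in a hole (COPY_HOLES),
 * decide how much of the hole to reproduce in the target. The hole ends at
 * next_data, but we must never go past the remaining budget, so clamp to
 * max_bytes_left. UINT64_MAX stands for "no limit" in this sketch. */
static uint64_t hole_bytes_to_copy(uint64_t offset, uint64_t next_data, uint64_t max_bytes_left) {
        uint64_t hole = next_data > offset ? next_data - offset : 0;

        assert(hole <= next_data);
        return hole < max_bytes_left ? hole : max_bytes_left;
}
```

Without the clamp, a budget that ends inside a hole would be overshot by
the remainder of that hole.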
Yu Watanabe [Thu, 10 Aug 2023 19:48:01 +0000 (04:48 +0900)]
coredump: fix various invalid memory access
Previously, we did not check the error returned by iovw_put(). If it fails, the
target iovw may contain no iovs, or only some of the iovs from the journal importer.
So, the finalization may underflow and access and free invalid
memory.
vconsole-setup: use "@kernel" rather than "kernel" as special string to leave keymap untouched
This is a magic string, and we should avoid stepping into the territory
of normal keymap names with that, given that users can pick names
otherwise freely.
Hence, prefix the name with a special char to avoid any namespace
issues.
file-io: let's use offsetof() rather than sizeof() for determining EFI_FILE_INFO prefix size
The gnu-efi definition of the struct uses [1], our local one [0] to size
the filename array. Let's avoid an ambiguity and use offsetof() so that
this difference doesn't matter. Also, doing it this way makes it very clear
to the reader what happens here: it's a structure with a variable-size
suffix.
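A small illustration of the difference, using trimmed, hypothetical
stand-ins for the two struct variants (not the real EFI_FILE_INFO, which
has more fields). A C99 flexible array member stands in for the local [0]
declaration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* gnu-efi style: the trailing name is declared with one element. */
struct file_info_one {
        uint64_t size;
        uint64_t file_size;
        uint16_t file_name[1];
};

/* Flexible array member, standing in for the local [0] style. */
struct file_info_flex {
        uint64_t size;
        uint64_t file_size;
        uint16_t file_name[];
};

/* The fixed prefix is everything before file_name. offsetof() agrees for
 * both layouts, while sizeof() counts the [1] element (plus padding) in one
 * case but not the other. */
static_assert(offsetof(struct file_info_one, file_name) ==
              offsetof(struct file_info_flex, file_name),
              "prefix size must not depend on how the name array is declared");

/* Bytes needed for a record whose name holds n_chars UCS-2 code units,
 * including the terminating NUL. */
static size_t file_info_alloc_size(size_t n_chars) {
        return offsetof(struct file_info_flex, file_name) + n_chars * sizeof(uint16_t);
}
```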
Daan De Meyer [Thu, 10 Aug 2023 15:05:55 +0000 (17:05 +0200)]
ukify: Use length= instead of ignore_padding= in inspect
ignore_padding= was only added in a recent version of pefile. Let's
set length= to the virtual size instead which is what ignore_padding
does behind the scenes so we're compatible with older versions of
pefile.
Michal Koutný [Wed, 9 Aug 2023 19:31:58 +0000 (21:31 +0200)]
mkosi: Copy sources under /usr in the image
Originally, the source code was copied under /root/src.
This home directory is part of the root FS, and the new mkosi building
paradigm only has an ephemeral root FS that is generated lazily.
Any files placed on the root FS in the build environment are thus
excluded from the final image.
It is useful to have the source code available in the image's runtime (not
build time) environment for debugging.
ExtraTrees= as currently used is ineffective, so change the destination
to copy files under /usr to achieve the intention.
gdb sees source files as:
> 1354 ../src/src/systemctl/systemctl.c: No such file or directory.
Modify the gdb configuration in the built image accordingly (that file cannot
be in /root either) so that it resolves to the moved sources.
(Commit fdecbf7 ("Enable unprivileged image builds") envisions bind
mounting or virtiofsd for nspawn or qemu containers respectively.)
journalctl: simplify handling of stdout being a regular file and epoll()
Let's not check the fd type beforehand, let's instead gracefully handle
if we get EPERM back from epoll_ctl() because the fd doesn't do epoll.
This should be safer and more generic.
The epoll_ctl(2) man page clearly documents that EPERM is returned in
this case, hence it's safe to check for exactly that case.
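A sketch of the "try, then handle EPERM" approach, with hypothetical
helper names (this is not journalctl's code). The choice of EPOLLOUT as
the watched event is also an assumption for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Try to watch fd with epoll. EPERM, documented by epoll_ctl(2) for file
 * types that don't support epoll (e.g. regular files), means "not
 * pollable" rather than failure. Returns 1 if added, 0 if the fd doesn't
 * do epoll, -errno on any other error. */
static int epoll_add_if_supported(int epfd, int fd, uint32_t events) {
        assert(epfd >= 0);
        assert(fd >= 0);

        struct epoll_event ev = { .events = events, .data.fd = fd };
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
                if (errno == EPERM)
                        return 0; /* e.g. stdout redirected to a regular file */
                return -errno;
        }
        return 1;
}

/* Example use: can we poll the fd behind a stdio stream such as stdout? */
static int stream_is_pollable(FILE *f) {
        int epfd = epoll_create1(EPOLL_CLOEXEC);
        if (epfd < 0)
                return -errno;

        int r = epoll_add_if_supported(epfd, fileno(f), EPOLLOUT);
        close(epfd);
        return r;
}
```

No fstat() check on the fd type is needed up front: the kernel tells us
directly whether the fd supports epoll.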