Luca Boccassi [Thu, 27 Jan 2022 14:10:34 +0000 (14:10 +0000)]
core: do not attempt to add 'private' symlinks when RootImage/RootDirectory are used
A bind mount is added directly from private on the host to the actual
destination directory, no need for the symlinks (which cannot be created
as the bind mount happens first and creates the target as an actual directory)
Luca Boccassi [Wed, 26 Jan 2022 19:00:25 +0000 (19:00 +0000)]
core: do not restart a service with Restart=always when ExecCondition fails
When a Condition*= fails, and a service has Restart=always,
the service is not restarted.
Follow the same behaviour for ExecCondition= to avoid inconsistencies.
Daan De Meyer [Wed, 26 Jan 2022 12:08:50 +0000 (12:08 +0000)]
shared: Ensure COPY_HOLES copies trailing holes
Previously, files with a hole at the end would get silently truncated
which breaks reading journal files. This commit makes sure that holes
are punched in existing space and if no more space is available, that
we grow the file and the hole by using ftruncate().
The corresponding test is extended to put a hole at the end of the file
and we make sure that hole is copied correctly.
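The trailing-hole handling can be illustrated with a minimal Python sketch. It is not systemd's C implementation: the copy_sparse() helper is hypothetical, and it assumes a Linux system where lseek() supports SEEK_DATA and SEEK_HOLE. Only data extents are copied, and the final ftruncate() grows the destination to the source size so that a hole at the end survives.

```python
import errno
import os


def copy_sparse(src_path, dst_path):
    """Copy only the data extents of src, then restore the full length."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        size = os.fstat(src).st_size
        offset = 0
        while offset < size:
            try:
                start = os.lseek(src, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno != errno.ENXIO:
                    raise
                break  # ENXIO: only a hole remains up to end of file
            end = os.lseek(src, start, os.SEEK_HOLE)
            pos = start
            while pos < end:
                chunk = os.pread(src, min(65536, end - pos), pos)
                if not chunk:
                    break  # source shrank while copying
                pos += os.pwrite(dst, chunk, pos)
            offset = end
        # Without this call the destination would end at the last data
        # byte and a trailing hole would be silently dropped.
        os.ftruncate(dst, size)
    finally:
        os.close(src)
        os.close(dst)
```

Dropping the os.ftruncate() call reproduces the truncation described above: the copy ends where the last data extent ends.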
Yu Watanabe [Wed, 26 Jan 2022 07:48:08 +0000 (16:48 +0900)]
wait-online: make manager_link_is_online() return 0 when in unmanaged state
Previously, even if a link was in the unmanaged state, the function could
return a positive value. So, even if all managed links were in the configured
state but did not satisfy the online criteria, e.g., IPv4 address state,
wait-online finished with a positive value.
This makes the function always return 0 for the unmanaged state. So, at
least one managed link must satisfy the online criteria.
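The return convention can be modelled with a short Python sketch. This is a simplified illustration, not the C code of wait-online: the Link class, its fields, and the any_link_online() helper are hypothetical, and error returns are left out.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    state: str                   # e.g. "unmanaged", "configuring", "configured"
    meets_online_criteria: bool  # e.g. required IPv4 address state reached


def link_is_online(link):
    """Return 1 if the link counts as online, 0 otherwise."""
    if link.state == "unmanaged":
        return 0  # unmanaged links never count as online
    if link.state != "configured":
        return 0
    return 1 if link.meets_online_criteria else 0


def any_link_online(links):
    """At least one managed link must satisfy the online criteria."""
    return any(link_is_online(link) > 0 for link in links)
```

With this model, an unmanaged link next to a configured link that fails the criteria no longer makes the overall check succeed.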
Jan Janssen [Thu, 20 Jan 2022 10:59:49 +0000 (11:59 +0100)]
meson: Remove test-efi-create-disk.sh
The script was probably not used for a very long time. It is currently
passed systemd_boot.so as boot loader, which cannot work. The test
entries it creates are all pointing at non-existent efi/linux binaries,
which means they would not even show up in the menu if the created image
were actually booted. There is also nothing that actually tries to run
the image in the first place.
If we end up creating a proper systemd-boot test suite, it would be
better to start from scratch. In the meantime, mkosi already covers
the bare minimum with a simple bootup test.
user-runtime-dir: error out immediately if mkdir fails
We try to create two directories: /run/user and /run/user/<UID>. For the
first we check the return value and error out if creation fails. But for
the second one we continued based on the assumption that the subsequent
mount will immediately fail anyway. But this has the disadvantage that we
get a somewhat confusing error message:
janv. 23 22:04:31 nsfw systemd-user-runtime-dir[1660]: Failed to mount per-user tmpfs directory /run/user/1000: No such file or directory
Let's instead fail immediately with a precise error message.
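The fail-early pattern looks roughly like the Python sketch below. It is an illustration only: the make_runtime_dirs() helper and its error text are hypothetical, and the real code is C. Both directory creations are checked, so the error names the path that could not be created instead of surfacing later as a mount failure.

```python
import os


def make_runtime_dirs(base, uid):
    """Create base and base/<uid>, failing at the first mkdir error."""
    user_dir = os.path.join(base, str(uid))
    for path in (base, user_dir):
        try:
            os.mkdir(path, 0o755)
        except FileExistsError:
            pass  # already present, nothing to do
        except OSError as e:
            # Report the precise path and reason right away.
            raise RuntimeError(f"Failed to create {path}: {e.strerror}") from e
    return user_dir
```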
For https://bugzilla.redhat.com/show_bug.cgi?id=2044100.
Rename the normalize_mounts() helper to drop_unused_mounts. All the
helpers called in there get rid of mounts that are unused for a variety
of reasons. And whereas the helpers are aptly prefixed with "drop" the
overall helper isn't and instead uses "normalize".
Make it more obvious what the helper actually does by renaming it from
normalize_mounts() to drop_unused_mounts(). Readers of code calling this
helper will immediately see that it will get rid of unused mounts.
core/namespace: allow using ProcSubset=pid and ProtectHostname=true together
If a service requests both ProcSubset=pid and ProtectHostname=true
then it will currently fail to start. The ProcSubset=pid option
instructs systemd to mount procfs for the service with subset=pid which
hides all entries other than /proc/<pid>. Consequently trying to
interact with the two files /proc/sys/kernel/{hostname,domainname}
covered by ProtectHostname=true will fail.
Fix this by only performing this check when ProcSubset=pid is not
requested. Essentially ProcSubset=pid implies/provides
ProtectHostname=true.
Yu Watanabe [Sat, 22 Jan 2022 17:27:26 +0000 (02:27 +0900)]
sd-dhcp-server: drop unnecessary buffer duplication
The block tries to find and remove the existing static lease which matches
the provided client ID, and the provided client ID will not be stored
anywhere. Hence, it is not necessary to duplicate it.
Remove incorrect claim that C escapes (such as \t and \n) are recognized and that control characters are disallowed. Specify the allowed characters and escapes with single quotes, with double quotes, and without quotes.
Thomas Haller [Sat, 22 Jan 2022 14:02:04 +0000 (15:02 +0100)]
sd-event: work around maybe-uninitialized warning in sd_event_add_inotify()
With LTO, the compiler might think that the variable is uninitialized
(from NetworkManager's fork, with gcc-11.2.1-1.fc35):
src/libnm-systemd-core/src/libsystemd/sd-event/sd-event.c: In function 'sd_event_add_inotify':
src/libnm-systemd-core/src/libsystemd/sd-event/sd-event.c:2120: error: 's' may be used uninitialized in this function [-Werror=maybe-uninitialized]
2120 | *ret = s;
|
src/libnm-systemd-core/src/libsystemd/sd-event/sd-event.c:2102: note: 's' was declared here
2102 | sd_event_source *s;
|
lto1: all warnings being treated as errors
In particular, that would happen for codepaths where event_add_inotify_fd_internal()
returns `-errno`, and the compiler cannot be sure that the returned value will
be negative. Technically, the compiler is right, but we rely on libc functions
to set errno correctly, so this only happens in code paths where something
bad already happened.
While LTO is prone to such false warnings, we are largely able to build systemd
without warnings. So it is feasible and we should make the effort of working
around warnings as they appear.
Yu Watanabe [Sat, 22 Jan 2022 01:44:50 +0000 (10:44 +0900)]
hostname: allow to override hardware vendor and model
Sometimes the hardware vendor does not set the DMI info correctly.
There is already a way to override the dbus properties by
using hwdb. But that is not user friendly.
Julia Kartseva [Sat, 22 Jan 2022 02:50:26 +0000 (18:50 -0800)]
bpf: name unnamed bpf programs
bpf-firewall and bpf-devices do not have names. This complicates
debugging with bpftool(8).
Assign names starting with 'sd_' prefix:
* firewall program names are 'sd_fw_ingress' for ingress attach
point and 'sd_fw_egress' for egress.
* 'sd_devices' for devices prog
'sd_' prefix is already used in source-compiled programs, e.g.
sd_restrictif_i, sd_restrictif_e, sd_bind6.
The name must not be longer than 15 characters or BPF_OBJ_NAME_LEN - 1.
Assign names only to programs loaded to kernel by systemd since
programs pinned to bpffs are already loaded.
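The length limit can be checked with a small Python sketch. The is_valid_bpf_name_length() helper is hypothetical. BPF_OBJ_NAME_LEN is taken to be 16, as defined in the Linux UAPI header linux/bpf.h, which leaves 15 characters plus the terminating NUL.

```python
BPF_OBJ_NAME_LEN = 16  # from linux/bpf.h; includes the terminating NUL


def is_valid_bpf_name_length(name):
    """True if the name fits into BPF_OBJ_NAME_LEN - 1 bytes."""
    return len(name.encode("ascii")) <= BPF_OBJ_NAME_LEN - 1


# Names mentioned in this commit message.
names = [
    "sd_fw_ingress",
    "sd_fw_egress",
    "sd_devices",
    "sd_restrictif_i",
    "sd_restrictif_e",
    "sd_bind6",
]

for name in names:
    print(name, len(name), is_valid_bpf_name_length(name))
```

A longer, purely hypothetical name such as 'sd_firewall_ingress' would not pass this check.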
YmrDtnJu [Fri, 21 Jan 2022 17:21:27 +0000 (18:21 +0100)]
Fix journald audit logging with fields > N_IOVEC_AUDIT_FIELDS.
ELEMENTSOF(iovec) is not the correct value for the newly introduced parameter m
to function map_all_fields because it is the maximum number of elements in the
iovec array, including those reserved for N_IOVEC_META_FIELDS. The correct
value is the current number of already used elements in the array plus the
maximum number to use for fields decoded from the kernel audit message.
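The difference between the two limits can be shown with a toy Python model. The constant values, the N_FIXED_FIELDS name, and the simplified map_all_fields() and fill() functions are hypothetical; only the arithmetic mirrors the description above.

```python
N_IOVEC_META_FIELDS = 4   # hypothetical value, slots reserved for metadata
N_IOVEC_AUDIT_FIELDS = 3  # hypothetical value, cap for decoded audit fields
N_FIXED_FIELDS = 2        # hypothetical, fields added before decoding

CAPACITY = N_IOVEC_META_FIELDS + N_FIXED_FIELDS + N_IOVEC_AUDIT_FIELDS


def map_all_fields(decoded, iovec, limit):
    """Append decoded fields until iovec holds `limit` entries."""
    for field in decoded:
        if len(iovec) >= limit:
            break
        iovec.append(field)


def fill(use_capacity_as_limit):
    iovec = ["fixed0", "fixed1"]  # entries already in use
    decoded = [f"field{i}" for i in range(10)]
    if use_capacity_as_limit:
        limit = CAPACITY  # the incorrect ELEMENTSOF(iovec)-style limit
    else:
        limit = len(iovec) + N_IOVEC_AUDIT_FIELDS  # used so far + audit cap
    map_all_fields(decoded, iovec, limit)
    return iovec
```

In this model, fill(True) consumes every slot including the ones reserved for metadata, while fill(False) leaves exactly N_IOVEC_META_FIELDS slots free.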
Jan Janssen [Fri, 21 Jan 2022 17:34:04 +0000 (18:34 +0100)]
boot: Only build with debug symbols in developer mode
The debug symbols are of very limited use in proper deployments
unlike with regular userspace. Unless someone goes through the pain
of setting up an EFI debugger (assuming their firmware even supports
this in the first place) any provided debug symbols will just be
useless.
Debugging under QEMU is possible, but even then it is non-trivial
to set up, so anyone willing to go that far can just build in
developer mode.
Meanwhile, at least x86 firmware tends to refuse binaries that contain
debug symbols. We do strip the files when converted to PE anyway, but
the elf file needs to stay around on other arches as objcopy does not
support PE as input there.
Also, the generated debug symbols do not seem to be reproducible when
building with LTO. Whether this is an issue in tooling or on our side
is unclear. This works around this issue.
Daan De Meyer [Fri, 21 Jan 2022 14:28:23 +0000 (14:28 +0000)]
meson: Add missing test dependencies
Currently, running "meson build" followed by "meson test -C build"
will result in many failed tests due to missing dependencies. This
commit adds the missing dependencies to make sure no tests fail.
Luca Boccassi [Mon, 17 Jan 2022 01:14:14 +0000 (01:14 +0000)]
core: add ExtensionDirectories= setting
Add a new setting that follows the same principle and implementation
as ExtensionImages, but using directories as sources.
It will be used to implement support for extending portable images
with directories, since portable services can already use a directory
as root.
Martin Wilck [Thu, 20 Jan 2022 13:31:45 +0000 (14:31 +0100)]
udevadm: cleanup-db: don't delete information for kept db entries
Devices with the db_persist property won't be deleted during database
cleanup. This applies to dm and md devices in particular.
For such devices, we should also keep the files under /run/udev/links,
/run/udev/tags, and /run/udev/watch, to make sure that after restart,
udevd has the same information about the devices as it did before
the cleanup.
If we don't do this, a lower-priority device that is discovered in
the coldplug phase may take over symlinks from a device that persisted.
Not removing the watches also enables udevd to resume watching a device
after restart.
core: add %y/%Y specifiers for the fragment path of the unit
Fixes #6308: people want to be able to link a unit file via 'systemctl enable'
from a git checkout or such and refer to other files in the same repo.
The new specifiers make that easy.
%y/%Y is used because other more obvious choices like %d/%D or %p/%P are
not available because at least one of the two letters is already used.
The new specifiers are only available in units. Technically it would be
trivial to add them in [Install] too, but I don't see how they could be
useful, so I didn't do that.
I added both %y and %Y because both were requested in the issue, and because I
think both could be useful, depending on the case. %Y to refer to other files
in the same repo, and %y in the case where a single repo has multiple unit files,
and e.g. each unit has some corresponding asset named after the unit file.
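The two specifiers can be illustrated with a small Python sketch. It is a simplified model, not systemd's specifier code: expand_specifiers() is hypothetical, handles only %y, %Y and %%, and the paths are made up. %y expands to the fragment path and %Y to the directory containing it.

```python
import os


def expand_specifiers(text, fragment_path):
    """Expand %y, %Y and %% in text; reject any other specifier."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "%" and i + 1 < len(text):
            spec = text[i + 1]
            if spec == "y":
                out.append(fragment_path)
            elif spec == "Y":
                out.append(os.path.dirname(fragment_path))
            elif spec == "%":
                out.append("%")
            else:
                raise ValueError(f"unsupported specifier %{spec}")
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical unit file linked from a git checkout.
unit = "/home/user/repo/units/foo.service"
print(expand_specifiers("%Y/scripts/run.sh", unit))
print(expand_specifiers("%y.d/asset", unit))
```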
Yu Watanabe [Thu, 20 Jan 2022 18:03:45 +0000 (03:03 +0900)]
resolve: refuse to resolve empty hostname
Previously, varlink or dbus methods returned
io.systemd.Resolve.NoNameServers or BUS_ERROR_NO_NAME_SERVERS if an
empty hostname was provided, and thus nss-resolve returned NSS_STATUS_TRYAGAIN.
That caused getaddrinfo() to return 'Temporary failure in name resolution'
instead of 'Name or service not known'.
This makes calling a varlink or dbus method with an empty hostname result
in -EINVAL, and hence nss-resolve returns NSS_STATUS_NOTFOUND.
Anita Zhang [Wed, 19 Jan 2022 21:26:01 +0000 (13:26 -0800)]
oomd: handle situations when no cgroups are killed
Currently if systemd-oomd doesn't kill anything in a selected cgroup, it
selects a new candidate immediately. But if a selected cgroup wasn't killed,
it is likely due to it disappearing or getting cleaned up between the time
it was selected as a candidate and getting sent SIGKILL(s). We should handle
it as though systemd-oomd did perform a kill so that it will check
swap/pressure again before it tries to select a new candidate.
Anita Zhang [Wed, 19 Jan 2022 18:40:46 +0000 (10:40 -0800)]
oomd: fix race with path unavailability when killing cgroups
There can be a situation where systemd-oomd would kill all of the processes
in a cgroup, pid1 would clean up that cgroup, and systemd-oomd would get
ENODEV trying to iterate the cgroup a final time to ensure it was empty.
systemd-oomd sees this as an error and immediately picks a new candidate even
though pressure may have recovered. To counter this, check and handle
path unavailability errnos specially.
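A minimal Python sketch of the idea follows. It is not systemd-oomd's C code. The cgroup_is_empty() helper is hypothetical, and treating ENOENT the same way as ENODEV is an assumption about which errnos count as "path unavailable". A cgroup that vanished after the kill is treated as empty rather than as an error.

```python
import errno
import os


def cgroup_is_empty(cgroup_path):
    """Return True if the cgroup has no processes or no longer exists."""
    try:
        with open(os.path.join(cgroup_path, "cgroup.procs")) as f:
            return f.read().strip() == ""
    except OSError as e:
        # The cgroup was cleaned up between the kill and this check.
        # ENODEV is the case described above; ENOENT is an assumption.
        if e.errno in (errno.ENODEV, errno.ENOENT):
            return True
        raise
```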
We would busily allocate an empty string to concatenate all of its
zero characters to the output. Let's make things a bit simpler by letting
the specifier functions return NULL to mean "nothing to append".