On typical systems, only few services need to create SUID/SGID files.
This often is limited to the user explicitly setting suid/sgid, the
`systemd-tmpfiles*` services, and the package manager. Allowing a default
to globally restrict creation of suid/sgid files makes it easier to apply
this restriction precisely.
I added them in 41afb5eb7214727301132aedc381831fbfc78e37 without too
much explanation. Most likely the idea was to get rid of unused code
in libsystemd.so [1]. But now that I'm testing this, it doesn't seem
to have an effect. LTO is needed to get rid of unused functions, and
it's enough to have LTO without those options. Those options might have
some downsides [2], so let's disable them since there are doubts and no
particularly good reason to have them.
But keep the -Wl,--gc-sections option. Without this, libsystemd.so
grows a little:
-rwxr-xr-x 1 zbyszek zbyszek 5532424 07-08 13:24 build/libsystemd.so.0.40.0-orig
-rwxr-xr-x 1 zbyszek zbyszek 5614472 07-08 13:26 build/libsystemd.so.0.40.0-no-sections
-rwxr-xr-x 1 zbyszek zbyszek 5532392 07-08 13:27 build/libsystemd.so.0.40.0
Let's apply the --gc-sections option always to make the debug and final
builds more similar.
We need to verify that distro packages don't unexpectedly grow after this.
We put the name of the variable in the message, but it is a local variable
and the name does not have global meaning. We end up with pointless copies
of the error string:
$ strings build/libsystemd.so.0.40.0 | grep 'big enough'
xsprintf: p[] must be big enough
xsprintf: error[] must be big enough
xsprintf: prefix[] must be big enough
xsprintf: pty[] must be big enough
xsprintf: mode[] must be big enough
xsprintf: t[] must be big enough
xsprintf: s[] must be big enough
xsprintf: spid[] must be big enough
xsprintf: header_priority[] must be big enough
xsprintf: header_pid[] must be big enough
xsprintf: path[] must be big enough
xsprintf: buf[] must be big enough
The error message already shows the file, line, and function name, which
is enough to identify the problem:
Assertion 'xsprintf: buffer too small' failed at src/test/test-string-util.c:20, function test_xsprintf(). Aborting.
Merge shared/exec-directory-util.? into basic/unit-def.?
Suggested in
https://github.com/systemd/systemd/pull/35892#discussion_r2180322856.
This is a tiny amount of code and does not warrant having a separate file
and spawning a separate instance of the compiler during the build.
Note: it took me a while to confirm that the contents of that table and
function don't end up in libsystemd.so. The issue is that they _are_ present in
it, unless LTO is used. We actually use link_whole[libbasic_static] for
libsystemd, so we end up with all that code there. LTO is needed to clean
that up.
As is usually the case, the bitfields don't create the expected space savings,
because the field that follows needs to be aligned. But we don't want to fully
drop the bitfields here, because then ConditionType and ConditionResult are
each 4 bytes, and the whole struct grows from 32 to 40 bytes (on amd64). We
potentially have lots of little Conditions and that'd waste some memory.
Make each of the four fields one byte. This still allows the compiler to
generate simpler code without changing the struct size:
nspawn: Support idmapped mounts on homed managed home directories (#38069)
Christian made this possible in Linux 6.15 with a new system call
open_tree_attr() that combines open_tree() and mount_setattr(). Because
idmapped mounts are (rightfully) not nested, we have to do some extra
shenanigans to make source we're putting the right source uid in the
userns for any idmapped mounts that we do in nspawn.
Of course we also add the necessary boilerplate to make open_tree_attr()
available in our code and wrap open_tree_attr() and the corresponding
fallback in a new function which we then use everywhere else.
The EXIT_STATUS is supposed to encapuslate an ANSI C process exit
status, which is 8bit unsigned. Hence parse it as such, do not accept
negative values, or values > 255.
All other status output lines use tabs, use that for the ID shift line
too. otherwise output will appear unaligned if log viewers have fixed
tab stop positions.
core: add quota support for State, Cache, and Log exec directories (#35892)
Based on https://github.com/systemd/systemd/issues/7820, this adds support for
quota enforcement to State, Cache, and Log exec directories.
* Add new directives, StateDirectoryQuota=, CacheDirectoryQuota=, and
LogDirectoryQuota=, to define quotas as percentages (hard limits for
blocks and inodes) or absolute values (hard limits for blocks only).
* Add new directives, StateDirectoryQuotaAccounting=,
CacheDirectoryQuotaAccounting= and LogDirectoryQuotaAccounting= to keep
track of storage quotas but not enforce them (effectively just assigning
a project ID to defined exec directories).
So we exposed different names for the entry types in JSON than we named
our enum values. Which is very confusing. Let's unify that. Given that
the JSON fields are externally visible let's stick to that naming, even
though I think "unified" and "conf" would have been more descriptive.
This ensures we follow our usual logic that the enum identifiers and the
strings they map to use the same naming.
bootspec: rename boot_entry_type_to_string() to boot_entry_type_description_to_string()
This helper does not translate BootEntryType to a string matching the
enum's value names, but instead returns a human readable descriptive
string. Let's make it clearer what this, by including "description" in
the name.
Mike Yuan [Tue, 27 May 2025 23:02:04 +0000 (01:02 +0200)]
core/cgroup: tweak unit_invalidate_cgroup_bpf() a bit
- Rename to unit_invalidate_cgroup_bpf_firewall() to make it clear
that this is about CGROUP_CONTROLLER_BPF_FIREWALL only
- Report whether things changed in unit_invalidate_cgroup()
to avoid duplicate checks
nspawn: Support idmapped mounts on homed managed home directories
Christian made this possible in Linux 6.15 with a new system call
open_tree_attr() that combines open_tree() and mount_setattr().
Because idmapped mounts are (rightfully) not nested, we have to do
some extra shenanigans to make source we're putting the right source
uid in the userns for any idmapped mounts that we do in nspawn.
Of course we also add the necessary boilerplate to make open_tree_attr()
available in our code and wrap open_tree_attr() and the corresponding
fallback in a new function which we then use everywhere else.
These source files uses symbols provided by sys/stat.h, e.g. struct stat,
S_IFREG, S_IFBLK, and so on. Let's explicitly include sys/stat.h where
necessary.
Glibc's fcntl.h includes bits/stat.h, which provides these symbols, so
these symbols can be used without explicitly including sys/stat.h. But,
based on the discussion in #37922, we should explicitly include relevant
headers, and should not rely on the indirect inclusion.
shared/bus-unit-util: fix PrivateTmp=/PrivateUsers=/ProtectControlGroups= and Ex variants
For some fields, we perform careful parsing and verification on the sender
side. For other fields, we accept any string or strv. I think that actually
this is fine: we should optimize for the correct case, i.e. the user runs a
command that is valid. The server must perform parsing in all cases, so doing
the verification on the sender side doesn't add value. When doing parsing
locally, in case of invalid or unsupported input, we would generate the error
message locally, so we would avoid the D-Bus call, but the message itself is
not better and from the user's point of view, the result is the same. And by
doing the parsing only on the server side, we deal better with the case where
the sender has an older version of the software. By not doing verification, we
implicitly "support" new values. And when the sender has a newer version that
supports additional fields, that does not help as long as the server uses an
older version. So in case of version mismatches, parsing on the server side is
as good or better.
shared/bus-unit-util: tweak bus_append_exec_command to use Ex prop only if necessary
This changes little in behaviour, the conceptual part is more important. The
non-Ex variant is the actual name on the command line, and we should use the
non-Ex D-Bus property too, if it works. This increases compatibility with old
versions. But the code was mostly doing the right thing. Even the tests tested
the right thing.
We generally want to have error messages with a fixed structure that convey the
important information, i.e. field name, error value, and the offending text for
options that take short values. (The text is not printed for strings encoded with
base64 and hexmem or for credentials.)
Let's use a helper that prints the message in a fixed format in the majority of
cases. In the few places where a custom message is useful, the helper is not
used. The helper:
- prints the field name, value, and error info,
- quotes the value,
- handles -ENOMEM, so we don't need to handle it separately everywhere.
When this code was originally written, parse functions would return -1
as error. Nowadays day all return a good errno, so it is fine if we print
the corresponding strerror.
shared/bus-unit-util: tweak error handling in bus_append_exec_command
exec_command_flags_to_strv() should not fail, unless we screwed up, so assert
instead of returning an error. Also, no need to strdup constant _PATH_BSHELL;
drop that so that we can get rid of the oom error handling. Finally, rename
l → cmdline for clarity.
basic/include: replace _Static_assert() with static_assert()
If one of the header is included in a C++ source file, then using
_Static_assert() triggers compile error for some reasons.
Let's use static_assert(), which can be used by both C and C++ code.
Yu Watanabe [Wed, 25 Jun 2025 16:03:26 +0000 (01:03 +0900)]
namespace-util,nsresource: explicitly include sched.h
These source files uses symbols provided by sched.h, e.g.
setns(), unshare(), CLONE_NEWNS, and friends, but they do not explicitly
include sched.h. Currently, it is included indirectly via missing_syscall.h,
which is included by e.g. pidfd-util.h.
Let's explicitly include headers that provides symbols used in the code.
tree-wide: several cleanups for reading/writing /proc/sys/fs/nr_open
- use unsigned for the return value of read_nr_open(), as it does not
fail, and the kernel internally uses unsigned for the value,
- when bumping the value by PID1, let's start from the kernel's maximum
value defined in fs/file.c. The maximum value should be mostly an API
of the kernel, but may changed in a future, hence still try several
times if we fail to bump the value.
Co-authored-by: Jared Baur <jaredbaur@fastmail.com> Co-authored-by: John Rinehart <johnrichardrinehart@gmail.com>
meson: do not reference variable unless feature that defines it is enabled
SYSTEMD_LANGUAGE_FALLBACK_MAP is used by the localed test, and
language_fallback_map is defined by the localed meson.
If the feature is disabled, the test is not built so the env var
is not needed, and the meson variable is not defined so the build
fails.
Enable nspawn job, as there's no nested kvm so VMs are too slow. Fix
some tests that fail in a VM anyway, might add a nightly job later that
runs them.
chase() is arguably a hot path in our code, hence it deserves
some caching whether open_tree() is available. Moreover,
the manual set of r to -EPERM feels kinda ugly. Let's
instead extract this bit into its own function.