shared: split out UID allocation range stuff from user-record.h
user-record.[ch] are about the UserRecord JSON stuff, and the UID
allocation range stuff (i.e. login.defs handling) is a very different
thing, and complex enough on its own, let's give it its own c/h files.
homed: include actual fs type + access mode as part of "status" section of user record
So far we have two properties for the intended fstype + access mode of
home dirs, but they might differ from what is actually used (because the
user record changed from the home dir, after it was created, or vice
versa). Let's hence add these props also to the "status" section of user
record, which report the status quo. That way we can always show the
correct, current settings.
homed: allow querying disk free status separetely from generating JSON from it
We later want to query per-home free status for implementing automatic
grow/shrink of home directories, hence let's separate the JSON
generation from the disk free status determination.
This adds to new helpers: keyring_read() for reading a key data from a
keyring entry, and TAKE_KEY_SERIAL which is what TAKE_FD is for fds, but
for key_serial_t.
The former is immediately used by ask-password-api.c
homed: add env var for overriding default mount options
This adds an esay way to override the default mount options to use for
LUKS home dirs via the env vars SYSTEMD_HOME_MOUNT_OPTIONS_EXT4,
SYSTEMD_HOME_MOUNT_OPTIONS_BTRFS, SYSTEMD_HOME_MOUNT_OPTIONS_XFS.
This follows what Fedora did with 34: enables compression by default,
lowering IO bandwidth and reducing disk space use, at the price of
slightly higher CPU use.
In delete_rule(), we already checked that the rule name is a valid file name
(i.e. no slashes), so we can just trivially append.
Also, let's always reject rules that we would later fail to delete. It's
probably better to avoid such confusion.
And print the operations we do with file name and line number. I hope this
helps with cases like https://github.com/systemd/systemd/pull/21178. At least
we'll know what rule failed.
$ sudo SYSTEMD_LOG_LEVEL=debug build/systemd-binfmt
Flushed all binfmt_misc rules.
Applying /etc/binfmt.d/kshcomp.conf…
/etc/binfmt.d/kshcomp.conf:1: binary format 'kshcomp' registered.
namespace: make tmp dir handling code independent of umask too
Let's make all code in namespace.c robust towards weird umask. This
doesn't matter too much given that the parent dirs we deal here almost
certainly exist anyway, but let's clean this up anyway and make it fully
clean.
namespace: make whole namespace_setup() work regardless of configured umask
Let's reset the umask during the whole namespace_setup() logic, so that
all our mkdir() + mknod() are not subjected to whatever umask might
currently be set.
This mostly moves the umask save/restore logic out of
mount_private_dev() and into the stack frame of namespace_setup() that
is further out.
Jonas Witschel [Thu, 11 Nov 2021 21:25:40 +0000 (22:25 +0100)]
test: add regression test for systemd-run --scope [--user]
systemd-run --scope --user failed to run in system 249.6, cf. #21297. Add tests
for systemd-run --scope and systemd-run --scope --user to make sure this does
not regress again.
Apparently version updates aren't always disabled on old forks,
which leads to new PRs opened there. To somewhat mitigate the
issue let's limit the number of PRs Dependabot can create.
It was reported in https://github.com/yuwata/systemd/pull/2#issuecomment-967737195
I think we should stick to the rule that stuff defined in
types-fundamental.h either:
1. adds a prefixed concept "sd_xyz" that maps differently in the two
environments
2. adds a non-prefixed concept "xyz" that adds a type otherwise missing
in one of the two environments but with the same definition as in the
other.
i.e. if have have some concept that might differ the way its set up in
the two environments it really should be prefixed by "sd_" to make clear
it has semantics we defined. Only drop the prefix if it really means the
exact same thin in all environments.
Now, sd_bool is defined prefixed, because its either mapped to "BOOLEAN"
(which is an integer) in UEFI or "bool" (which is C99 _Bool) in
userspace. size_t is not defined prefixed, because it's mapped to the
same thing ultimately (on the UEFI its mapped to UINTN, but that in turn
is defined as being the type for the size of memory objects, thus it's
really the same as userspace size_t).
So far "true" and "false" where defined unprefixed even though they map
to values of different types. typeof(true) in userspace would reveal
_Bool, but typeof(false) in UEFI would reveal BOOLEAN. The distinction
actually does matter in comparisons (i.e. (_Bool) 1 == (_Bool) 2 holds
while (BOOLEAN) 1 == (BOOLEAN) 2 does not hold).
Hence, let's add sd_true and sd_false, thus indicating we defined our
own concept here, and it has similar but different semantics in UEFI and
in userspace.
"type.h" is a very generic name, but this header is very specific to
making the "fundaemtnal" stuff work, it maps genric types in two
distinct ways. Hence let's make clear in the header name already what
this is about.
Unfortunately they forgot the "const" decoration on the MetaiMatch()
prototype, but let that omission not leak into our code, let's hide it
away in the innermost use.
Up until now the main reason why we didn't proceed with starting the
unit was exceed start limit burst. However, for unit types like mounts
the other reason could be effective ratelimit on /proc/self/mountinfo
event source. That means our mount unit state may not reflect current
kernel state. Hence, we need to attempt to re-run the start job again
after ratelimit on event source expires.
As we will be introducing another reason than start limit let's rename
the virtual function that implements the check.
Jonas Witschel [Wed, 10 Nov 2021 21:46:35 +0000 (22:46 +0100)]
scope: count successful cgroup additions when delegating via D-Bus
Since commit 8d3e4ac7cd37200d1431411a4b98925a24b7d9b3 ("scope: refuse
activation of scopes if no PIDs to add are left") all "systemd-run --scope
--user" calls fail because cgroup attachments delegated to the system instance
are not counted towards successful additions. Fix this by incrementing the
return value in case unit_attach_pid_to_cgroup_via_bus() succeeds, similar to
what happens when cg_attach() succeeds directly.
Note that this can *not* distinguish the case when
unit_attach_pid_to_cgroup_via_bus() has been run successfully, but all
processes to attach are gone in the meantime, unlike the checks that commit 8d3e4ac7cd37200d1431411a4b98925a24b7d9b3 adds for the system instance. This is
because even though unit_attach_pid_to_cgroup_via_bus() leads to an internal
unit_attach_pids_to_cgroup() call, the return value over D-Bus does not include
the number of successfully attached processes and is always NULL on success.
codeql-actions doesn't point to SHAs because it isn't clear
whether Dependabot supports their release cycle mentioned
at https://github.com/github/codeql-action/issues/307
Turns out GHActions where `pull_request_target` is used are capable
of pwning repositories: https://securitylab.github.com/research/github-actions-preventing-pwn-requests/
labeler doesn't check out the source code or build anything so
it's safe in its current form but to avoid surprises let's just pin
it to the latest version. It's annoying to manage dependencies like this
manually so additionally dependabot.yml is introduced to make it
easier to keep GHActions up to date more or less automatically:
https://docs.github.com/en/code-security/supply-chain-security/keeping-your-dependencies-updated-automatically/keeping-your-actions-up-to-date-with-dependabot
And while we are at it, make 'ssh-authorized-keys' verb properly
documented. Given that OpenSSH documents the interface in its man page
it's fine to just document our implementation of it too.
Julia Kartseva [Fri, 5 Nov 2021 01:52:02 +0000 (18:52 -0700)]
core: fix bpf-foreign cg controller realization
Requiring /sys/fs/bpf path to be a mount point at the moment of cgroup
controllers realization does more harm than good, because:
* Realization happens early on boot, the mount point may not be ready at
the time. That happens if mounts are made by a .mount unit (the issue we
encountered).
* BPF filesystem may be mounted on another point.
Remove the check. Instead verify that path provided by BPFProgram= is
within BPF fs when unit properties are parsed.