Anita Zhang [Tue, 9 Feb 2021 09:47:34 +0000 (01:47 -0800)]
oom: implement avoid/omit xattr support
There may be situations where a cgroup should be protected from killing
or deprioritized as a candidate. In FB oomd xattrs are used to bias oomd
away from supervisor cgroups and towards worker cgroups in container
tasks. On desktops this can be used to protect important units with
unpredictable resource consumption.
The patch allows systemd-oomd to understand 2 xattrs:
"user.oomd_avoid" and "user.oomd_omit". If systemd-oomd sees these
xattrs set to 1 on a candidate cgroup (i.e. while attempting to kill something)
AND the cgroup is owned by root, it will either deprioritize the cgroup as
a candidate (avoid) or remove it completely as a candidate (omit).
Usage is restricted to root owned cgroups to prevent situations where an
unprivileged user can set their own cgroups lower in the kill priority than
another user's (and prevent them from omitting their units from
systemd-oomd killing).
Antonius Frie [Mon, 8 Feb 2021 08:15:15 +0000 (09:15 +0100)]
Use correct config parser for MountAPIVFS (#18501)
As far as I can see, at some point the parser function for MountAPIVFS
was changed from the generic bool parser to a custom implementation, to
allow the context to keep track of whether MountAPIVFS had been set
explicitly. If not, exec_context_get_effective_mount_apivfs would fall
back to a default value. However, the corresponding entry in the big
parser table wasn't updated, meaning that the old bool parser was still
used, meaning that context->mount_apivfs_set remained at its default
value of false, meaning that the default value was always used and the
config option was effectively ignored.
When executed in test mode, "OUTDATED" is appropriate. But when executed
to actually update the text, after the tool executes, those pages are the
opposite, not outdated.
It happens too often that what people ask for already is implemented.
Let's help cut the noise a bit, and make people check things first
hopefully, and at least make it either for us to detect such cases.
resolved: suppress ifindex info in varlink JSON responses if zero
If we don't have ifindex info, don't set the field for it.
We already do that for parsed IP address replies, let's do it for all
cases: it's a bit nicer to suppress the ifindex prop if it doesn't apply
than to pass it invalid.
This is the other side of #18482, i.e. fixes things so that the parser
doesn't get tripped up by this.
(This too makes a problem go away we should track down properly, i.e.
figure out how the ifindex got lost in
https://github.com/systemd/systemd/pull/17823#issuecomment-742439422 )
nss-resolve: accept zero ifindex when parsing resolved reply
Sometimes a reply isn't associated to any specific interface, it might
be a general truth (for example served from /etc/hosts or so). In this
case the server might pass ifindex == 0. Accept that.
Since the test suite overhaul, the test units are now under
/usr/lib/systemd/tests/testdata/tetsuite-06.units with
system_u:object_r:lib_t context. This causes an AVC denial, since the
systemd unit files are expected to have the
system_u:object_r:systemd_unit_file_t context. Let's fix this by using a
custom file context definition.
Apparently the range is like that:
$ sudo bash -c 'echo "default 1001" >/sys/fs/cgroup/user.slice/io.bfq.weight'
bash: line 0: echo: write error: Numerical result out of range
test-fs-util: beef up test for conservative_renameat()
Instead of using a short fixed string, let's use a huge blob for
testing, with randomized size and contents, that definitely is above the
16K buffer size conservative_renameat() uses internally.
David Edmundson [Wed, 3 Feb 2021 12:29:28 +0000 (12:29 +0000)]
xdg-autostart: Generate autostart services with templated name
The "XDG standardization for applications" specification states that
services should be in the form:
app[-<launcher>]-<ApplicationID>[@<RANDOM>].service or
app[-<launcher>]-<ApplicationID>-<RANDOM>.scope
In this case "autostart" takes the place of [RANDOM] to provide a unique
identifier if the same app is launched elsewhere. As it is a service
that means it should be set as a template not using a hyphen delimiter.
Daan De Meyer [Wed, 3 Feb 2021 00:24:32 +0000 (00:24 +0000)]
boot: Replace efivar_set() persistent argument with flags argument
To add secure-boot enrolling support, we need to be able to specify
the EFI_VARIABLE_APPEND_WRITE flag so let's make the efivar_set()
methods more generic so we can set that flag.
Let's make sure we still look at the etags reported by http 304 (i.e.
the cache management code). Otherwise we won't properly realize we
already downloaded this before.
The old name originates when this was used to discover "machine" images,
as managed by machined/machinectl. But nowadays this is also used by
portable services and system extensions, hence let's use a more generic
name for this API. Taking inspiration from "dissect-image.[ch]", let's call
this "discover-image.[ch]".
Deprecate builds with split-usr, prepare for removal
There is no technical reason to support systems with split-usr, except for
backwards compatibility. Even though systemd itself makes an effort to support
this, many other tools aren't as careful. Despite those efforts, we
(collectively) get it wrong often, because doing it "wrong" on systems with
merged-usr has no consequences. Since almost all developers are on such
systems, any issues are only discovered late. Supporting this split-usr mode
makes both code and documentation more complicated. The split is purely
artificial and has no justification except to allow old installation to not
update. Mechanisms to update existing systems are available though: Fedora
did that in https://fedoraproject.org/wiki/Features/UsrMove, Debian has
the usrmerge package.
The next version of Debian will only support systems with split-usr=false,
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=978636#178:
The Technical Committee resolves that Debian 'bookworm' should
support only the merged-usr root filesystem layout, dropping support
for the non-merged-usr layout.
Let's start warning if split-usr mode is used, in preparation to removing the
split in one of the future releases.
Let's split out the two codepaths a bit, and emphasize which ones it the
new-style and which the old-style codepath, and let's clearly convert
the params of the old-stye into the new style for further processing, so
that the old style path is brief and isolated.
Yu Watanabe [Wed, 20 Jan 2021 06:50:01 +0000 (15:50 +0900)]
network,udev: move TransmitQueues=/ReceiveQueues= from .network to .link
As the settings are mostly hardware setup, and merely see from network
layer.
See also discussions in
https://github.com/systemd/systemd/pull/18170#issuecomment-758807497
https://github.com/orgs/systemd/teams/systemd/discussions/1
Anita Zhang [Tue, 2 Feb 2021 22:16:03 +0000 (14:16 -0800)]
oom: rework *MemoryPressureLimit= properties to have 1/10000 precision
Requested in
https://github.com/systemd/systemd/pull/15206#discussion_r505506657,
preserve the full granularity for memory pressure limits (permyriad)
instead of capping out at percent.
Let's tighten the logic behind path_extract_filename() a bit: first of
all, refuse all cases of invalid paths with -EINVAL. More importantly
though return a recognizable error when a valid path is specified that
does not contain any filename. Specifically, "/" will now result in
-EADDRNOTAVAIL.
This changes API, but none of the existing callers care about the return
value, hence the change should be fine.
man: move content from the wiki to systemd.preset(5)
The wiki was slightly stale, and almost all the information there
was already present in the man page. I moved the remaing part (discussion)
into the man page and adjusted all links to point to the man page instead.
daemon(7) has a some examples of packaging scriptlets… I don't think it fits
there very well. Most likely they should be moved to systemd.preset(5) or maybe
even removed, but I'm leaving that for later.
Yu Watanabe [Mon, 1 Feb 2021 17:16:01 +0000 (02:16 +0900)]
libudev: also drop the entry from LIST even if unique flag is set
Otherwise, the list becomes dirty when an entry is freed.
This also remove the entry from the hashmap only when its name is set.
The name should be always set, so that does not change anything. But
just for safety.
Yu Watanabe [Mon, 1 Feb 2021 17:18:49 +0000 (02:18 +0900)]
libudev: set entry->list after the entry is stored in the list
This should not change anything. As hashmap_remove() is called before
hashmap_ensure_put(). So, even if hashmap_ensure_put() fails, a wrong
entry will not removed from the hashmap by udev_list_entry_free().
But anyway, just for safety.
Yu Watanabe [Thu, 21 Jan 2021 07:08:33 +0000 (16:08 +0900)]
sd-event: retrieve more events when epoll_wait() returns number equivalent to the buffer size
When epoll_wait() returns number equivalent to the buffer size, there
may exist remaining events which may have higher priority. To make priority
sorting correctly, let's retrieve all events.
It should help to make it more clear what causes issues like
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=30140
and https://github.com/google/oss-fuzz/pull/5084
Anita Zhang [Mon, 1 Feb 2021 03:04:34 +0000 (19:04 -0800)]
tools: make update-dbus-docs compatible with Python 3.6
668b3a42fe9e250912bd3efa4460ed691452d9bf allowed update-dbus-docs.py to start
running on Cent OS 8 (instead of skipping). But subprocess.check_output()'s
text argument didn't exist until Python 3.7 and C8 is still running
Python 3.6. Use universal_newlines instead for backwards compatibility.
Daan De Meyer [Sat, 30 Jan 2021 23:25:24 +0000 (23:25 +0000)]
boot: Make all efivar util functions take the guid as an argument
Let's make these functions a little more generic so we can have
them work on more than one GUID. More specifically, this allows
using them with the global guid which will be used a bit more to
extend the secure boot support.
Daan De Meyer [Sat, 30 Jan 2021 23:02:24 +0000 (23:02 +0000)]
boot: Enable C99
Instead of using -nostdinc, we use -nostdlib. This is necessary
to allow moving to C99 as efibind.h includes stdint.h when C99
is enabled. It isn't necessarily problematic to use some standard
library headers as long as they don't contain functions defined in
libc or another system library (or in other words, header only
headers are fine to use in sd-boot).
The device is very similar to MACH-WX9 in many ways, including this
particular one. Adding these rules gets rid of evdev warnings as buttons
are being pressed on this device.
Yu Watanabe [Tue, 26 Jan 2021 07:12:41 +0000 (16:12 +0900)]
log: skip reading the kernel command line if the process is invoked by a script
CLI tools may be used in a script. E.g., a script for monitoring a
service may use `systemctl`. Previously, if the kernel command line has
e.g. systemd.log-level=debug, then systemctl in the script produces
debugging logs when the script is invoked by a .service unit, but does
not when the script is running in a terminal. Then,
https://github.com/systemd/systemd/pull/18281#discussion_r561697482,
> I expect users to be (negatively) surprised.
In the previous commit, $SYSTEMD_EXEC_PID= is introduced. Then, we can
now detect whether a command is directly invoked by systemd or through
a script. Let's skip reading the kernel command line when a command is
invoked through a script.