man: drop references to "syslog" and "syslog+console" from man page
These options are pretty much equivalent to "journal" and
"journal+console" anyway, let's simplify things, and drop them from the
documentation hence.
For compat reasons let's keep them in the code.
(Note that they are not 100% identical to 'journal', but I doubt the
distinction in behaviour is really relevant to keep this in the docs.
And we should probably should drop 'syslog' entirely from our codebase
eventually, but it's problematic as long as we semi-support udev on
non-systemd systems still.)
Anita Zhang [Mon, 20 May 2019 21:43:53 +0000 (14:43 -0700)]
bpf-firewall: optimization for IPAddressXYZ="any" (and unprivileged users)
This is a workaround to make IPAddressDeny=any/IPAddressAllow=any work
for non-root users that have CAP_NET_ADMIN. "any" was chosen since
all or nothing network access is one of the most common use cases for
isolation.
Allocating BPF LPM TRIE maps require CAP_SYS_ADMIN while BPF_PROG_TYPE_CGROUP_SKB
only needs CAP_NET_ADMIN. In the case of IPAddressXYZ="any" we can just
consistently return false/true to avoid allocating the map and limit the user
to having CAP_NET_ADMIN.
Topi Miettinen [Mon, 20 May 2019 09:20:58 +0000 (12:20 +0300)]
cgroup-util: kill also threads
It's possible for a zombie process to have live threads. These are not listed
in /sys in "cgroup.procs" for cgroupsv2, but they show up in
"cgroup.threads" (cgroupv2) or "tasks" (cgroupv1) nodes. When killing a
cgroup (v2 only) with SIGKILL, let's also kill threads after killing processes,
so the live threads of a zombie get killed too.
prefix_root() is equivalent to path_join() in almost all ways, hence
let's remove it.
There are subtle differences though: prefix_root() will try shorten
multiple "/" before and after the prefix. path_join() doesn't do that.
This means prefix_root() might return a string shorter than both its
inputs combined, while path_join() never does that. I like the
path_join() semantics better, hence I think dropping prefix_root() is
totally OK. In the end the strings generated by both functon should
always be identical in terms of path_equal() if not streq().
This leaves prefix_roota() in place. Ideally we'd have path_joina(), but
I don't think we can reasonably implement that as a macro. or maybe we
can? (if so, sounds like something for a later PR)
Anita Zhang [Mon, 3 Jun 2019 23:25:43 +0000 (16:25 -0700)]
nspawn: don't hard fail when setting capabilities
The OCI changes in #9762 broke a use case in which we use nspawn from
inside a container that has dropped capabilities from the bounding set
that nspawn expected to retain. In an attempt to keep OCI compliance
and support our use case, I made hard failing on setting capabilities
not in the bounding set optional (hard fail if using OCI and log only
if using nspawn cmdline).
cap_last_cap() returns the last valid cap (instead of the number of
valid caps). to iterate through all known caps we hence need to use a <=
check, and not a < check like for all other cases. We got this right
usually, but in three cases we did not.
Topi Miettinen [Wed, 1 May 2019 12:28:36 +0000 (15:28 +0300)]
units: deny access to block devices
While the need for access to character devices can be tricky to determine for
the general case, it's obvious that most of our services have no need to access
block devices. For logind and timedated this can be tightened further.
Donald Buczek [Thu, 25 Apr 2019 07:39:41 +0000 (09:39 +0200)]
cgroup: Continue unit reset if cgroup is busy
When part of the cgroup hierarchy cannot be deleted (e.g. because there
are still processes in it), do not exit unit_prune_cgroup early, but
continue so that u->cgroup_realized is reset.
Log the known case of non-empty cgroups at debug level and other errors
at warning level.
Iwan Timmer [Sat, 15 Jun 2019 20:54:41 +0000 (22:54 +0200)]
resolved: move TLS data shared by all servers to manager
Instead of having a context and/or trusted CA list per server this is now moved to the server. Ensures future TLS configuration options are global instead of per server.
Frantisek Sumsal [Tue, 18 Jun 2019 09:25:16 +0000 (11:25 +0200)]
hashmap: avoid using TLS in a destructor
Using C11 thread-local storage in destructors causes uninitialized
read. Let's avoid that using a direct comparison instead of using
the cached values. As this code path is taken only when compiled
with -DVALGRIND=1, the performance cost shouldn't matter too much.
Franck Bui [Fri, 7 Jun 2019 08:17:11 +0000 (10:17 +0200)]
terminal-util: introduce openpt_allocate()
Allocating a pty is done in a couple of places so let's introduce a new helper
which does the job.
Also the new function, as well as openpt_in_namespace(), returns both pty
master and slave so the callers don't need to know about the pty slave
allocation details.
For the same reasons machine_openpt() prototype has also been changed to return
both pty master and slave so callers don't need to allocate a pty slave which
might be in a different namespace.
Finally openpt_in_namespace() has been renamed into
openpt_allocate_in_namespace().
Franck Bui [Thu, 6 Jun 2019 08:05:33 +0000 (10:05 +0200)]
nspawn: allocate the pty used for /dev/console within the container
The console tty is now allocated from within the container so it's not
necessary anymore to allocate it from the host and bind mount the pty slave
into the container. The pty master is sent to the host.
/dev/console is now a symlink pointing to the pty slave.
This might also be less confusing for applications running inside the container
and the overall result looks cleaner (we don't need to apply manually the
passed selinux context, if any, to the allocated pty for instance).
core: set fs.file-max sysctl to LONG_MAX rather than ULONG_MAX
Since kernel 5.2 the kernel thankfully returns proper errors when we
write a value out of range to the sysctl. Which however breaks writing
ULONG_MAX to request the maximum value. Hence let's write the new
maximum value instead, LONG_MAX.
This is for 6d3646406560. It turns out that this is causing more problems than
expected. Let's retroactively introduce naming scheme v241 to conditionalize
this change.
Follow-up for #12792 and 6d36464065601f7. See also
https://bugzilla.suse.com/show_bug.cgi?id=1136600.
$ SYSTEMD_LOG_LEVEL=debug NET_NAMING_SCHEME=v240 build/udevadm test-builtin net_setup_link /sys/class/net/br11
$ SYSTEMD_LOG_LEVEL=debug NET_NAMING_SCHEME=v241 build/udevadm test-builtin net_setup_link /sys/class/net/br11
...
@@ -20,11 +20,13 @@
link_config: could not set ethtool features for br11
Could not set offload features of br11: Operation not permitted
br11: Device has name_assign_type=3
-Using interface naming scheme 'v240'.
+Using interface naming scheme 'v241'.
br11: Policy *keep*: keeping existing userspace name
br11: Device has addr_assign_type=1
-br11: No stable identifying information found
-br11: Could not generate persistent MAC: No data available
+br11: Using "br11" as stable identifying information
+br11: Using generated persistent MAC address
+Could not set Alias=, MACAddress= or MTU= on br11: Operation not permitted
+br11: Could not apply link config, ignoring: Operation not permitted
Unload module index
Unloaded link configuration context.
ID_NET_DRIVER=bridge
libsystemd-network: rename net_get_name() to net_get_name_persistent()
This reflect its role better.
(I didn't use …_persistent_name(), because which name is actually used
depends on the policy. So it's better not to make this sound like it returns
*the* persistent name.)
This is in preparation for later changes. Let's change the documentation of
net.naming-scheme= to also say that it applies to MAC addresses. This commit
doesn't actually implement that though.