man: adjust nspawn man page to follow same section/order as --help text
No other changes, just some reshuffling and adding of section headers
(well, admittedly, I changed some "see above" and "see below" in the
text to match the new order.)
David Michael [Wed, 20 Mar 2019 15:14:32 +0000 (15:14 +0000)]
shared/install: Preserve escape characters for escaped unit names
Since switching to extract_first_word with no flags for parsing
unit names in 4c9565eea534cd233a913c8c21f7920dba229743, escape
characters will be stripped from escaped unit names such as
"mnt-persistent\x2dvolume.mount" resulting in the unit not being
configured as defined. Preserve escape characters again for
compatibility with existing preset definitions.
test-fileio: avoid warning about ineffective comparison
On arm64 with gcc-8.2.1-5.fc29.aarch64:
../src/test/test-fileio.c:645:29: warning: comparison is always false due to limited range of data type [-Wtype-limits]
assert_se(c == EOF || safe_fgetc(f, &c) == 1);
^~
Casting c to int is not enough, gcc is able to figure out that the original
type was unsigned and still warns. So let's just silence the warning like
in test-sizeof.c.
sd-bus: avoid IN_SET() invocation with two identical values
Fixes #12036.
../../../src/systemd/src/libsystemd/sd-bus/bus-objects.c: In function ‘add_object_vtable_internal’:
../../../src/systemd/src/basic/macro.h:423:19: error: duplicate case value
Chris Morin [Wed, 20 Mar 2019 08:34:23 +0000 (01:34 -0700)]
journal-file: handle SIGBUS on offlining thread
The thread launched in journal_file_set_offline() accesses a memory
mapped file, so it needs to handle SIGBUS. Leave SIGBUS unblocked on the
offlining thread so that it uses the same handler as the main thread.
The result of triggering SIGBUS in a thread where it's blocked is
undefined in Linux. The tested implementations were observed to cause
the default handler to run, taking down the whole journald process.
We can leave SIGBUS unblocked in multiple threads since it's handler is
thread-safe. If SIGBUS is sent to the journald process asynchronously
(i.e. with kill, sigqueue, or raise), either thread handling it will
result in the same behavior: it will install the default handler and
reraise the signal, killing the process.
Franck Bui [Tue, 19 Mar 2019 09:59:26 +0000 (10:59 +0100)]
core: only watch processes when it's really necessary
If we know that main pid is our child then it's unnecessary to watch all
other processes of a unit since in this case we will get SIGCHLD when the main
process will exit and will act upon accordingly.
So let's watch all processes only if the main process is not our child since in
this case we need to detect when the cgroup will become empty in order to
figure out when the service becomes dead. This is only needed by cgroupv1.
Franck Bui [Mon, 18 Mar 2019 19:59:36 +0000 (20:59 +0100)]
core: reduce the number of stalled PIDs from the watched processes list when possible
Some PIDs can remain in the watched list even though their processes have
exited since a long time. It can easily happen if the main process of a forking
service manages to spawn a child before the control process exits for example.
However when a pid is about to be mapped to a unit by calling unit_watch_pid(),
the caller usually knows if the pid should belong to this unit exclusively: if
we just forked() off a child, then we can be sure that its PID is otherwise
unused. In this case we take this opportunity to remove any stalled PIDs from
the watched process list.
If we learnt about a PID in any other form (for example via PID file, via
searching, MAINPID= and so on), then we can't assume anything.
alloc-util: use malloc_usable_size() to determine allocated size
It's a glibc-specific API, but supported on FreeBSD and musl too at
least, hence fairly common. This way we can reduce our calls to
realloc() as much as possible.
units: turn off keyring handling for user@.service
This service uses PAM anyway, hence let pam_keyring set things up for
us. Moreover, this way we ensure that the invocation ID is not set for
this service as key, and thus can't confuse the user service's
invocation ID.
Claudius Ellsel [Tue, 19 Mar 2019 00:30:22 +0000 (01:30 +0100)]
Change Razer Abyssus DPI in 70-mouse.hwdb (#12029)
As discussed in https://gitlab.freedesktop.org/libinput/libinput/issues/198#note_100642 the DPI for the Razer Abyssus mouse is not 3500 by default, but around 1600-1700 when measured with the mouse-dpi-tool.
So I have done some measurements now and always got a value of about 21000 device units on a distance of 12.5 inch. This would result in a calculated resolution of about 1680 DPI. Since such an odd number does not occur in the hwdb file I decided to round to 1600 DPI.
The function so far always returned -ECANCELLED, which is ignored in all
cases the function is invoked, except one: in unit_test_start_limit()
where -ECANCELLED is returned when the start limit is hit, which is part
of unit_start()'s protocol of return values.
Since the emergency_action() logic should be relatively generic and is
used in many places, let's drop the return value from it, since it's
constant anyway, and in alll cases useless. Instead, let's return it in
unit_test_start_limit(), where it's part of the protocol.
Reproducer:
for i in `seq 1 100`; do gdbus call --session -d org.freedesktop.systemd1 -m org.freedesktop.systemd1.Manager.StartUnit -o "/$(for x in `seq 0 28000`; do echo -n $x; done)" & done
socket-util: add wrappers for binding socket to ifindex/ifname
socket_bind_to_ifindex() uses the the SO_BINDTOIFINDEX sockopt of kernel
5.0, with a fallback to SO_BINDTODEVICE on older kernels.
socket_bind_to_ifname() is a trivial wrapper around SO_BINDTODEVICE, the
only benefit of using it instead of SO_BINDTODEVICE directly is that it
determines the size of the interface name properly so that it also works
for unbinding. Moreover, it's an attempt to unify our invocations of the
sockopt with a size of strlen(ifname) rather than strlen(ifname)+1...
basic/fd-util: refuse "infinite" loop in close_all_fds()
I had a test machine with ulimit -n set to 1073741816 through pam
("session required pam_limits.so set_all", which copies the limits from PID 1,
left over from testing of #10921).
test-execute would "hang" and then fail with a timeout when running
exec-inaccessiblepaths-proc.service. It turns out that the problem was in
close_all_fds(), which would go to the fallback path of doing close() 1073741813 times. Let's just fail if we hit this case. This only matters
for cases where both /proc is inaccessible, and the *soft* limit has been
raised.
(gdb) bt
#0 0x00007f7e2e73fdc8 in close () from target:/lib64/libc.so.6
#1 0x00007f7e2e42cdfd in close_nointr ()
from target:/home/zbyszek/src/systemd-work3/build-rawhide/src/shared/libsystemd-shared-241.so
#2 0x00007f7e2e42d525 in close_all_fds ()
from target:/home/zbyszek/src/systemd-work3/build-rawhide/src/shared/libsystemd-shared-241.so
#3 0x0000000000426e53 in exec_child ()
#4 0x0000000000429578 in exec_spawn ()
#5 0x00000000004ce1ab in service_spawn ()
#6 0x00000000004cff77 in service_enter_start ()
#7 0x00000000004d028f in service_enter_start_pre ()
#8 0x00000000004d16f2 in service_start ()
#9 0x00000000004568f4 in unit_start ()
#10 0x0000000000416987 in test ()
#11 0x0000000000417632 in test_exec_inaccessiblepaths ()
#12 0x0000000000419362 in run_tests ()
#13 0x0000000000419632 in main ()
test-execute: allow filtering test cases by pattern
When debugging failure in one of the cases, it's annoying to have to wade
through the output from all the other cases. Let's allow picking select
cases.
seccomp: allow shmat to be a separate syscall on architectures which use a multiplexer
After
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0d6040d46817,
those syscalls have their separate numbers and we can block them.
But glibc might still use the old ones. So let's just do a best-effort
block and not assume anything about how effective it is.
nspawn: add support for executing OCI runtime bundles with nspawn
This is a pretty large patch, and adds support for OCI runtime bundles
to nspawn. A new switch --oci-bundle= is added that takes a path to an
OCI bundle. The JSON file included therein is read similar to a .nspawn
settings files, however with a different feature set.
Implementation-wise this mostly extends the pre-existing Settings object
to carry additional properties for OCI. However, OCI supports some
concepts .nspawn files did not support yet, which this patch also adds:
1. Support for "masking" files and directories. This functionatly is now
also available via the new --inaccesible= cmdline command, and
Inaccessible= in .nspawn files.
2. Support for mounting arbitrary file systems. (not exposed through
nspawn cmdline nor .nspawn files, because probably not a good idea)
3. Ability to configure the console settings for a container. This
functionality is now also available on the nspawn cmdline in the new
--console= switch (not added to .nspawn for now, as it is something
specific to the invocation really, not a property of the container)
4. Console width/height configuration. Not exposed through
.nspawn/cmdline, but this may be controlled through $COLUMNS and
$LINES like in most other UNIX tools.
5. UID/GID configuration by raw numbers. (not exposed in .nspawn and on
the cmdline, since containers likely have different user tables, and
the existing --user= switch appears to be the better option)
6. OCI hook commands (no exposed in .nspawn/cmdline, as very specific to
OCI)
7. Creation of additional devices nodes in /dev. Most likely not a good
idea, hence not exposed in .nspawn/cmdline. There's already --bind=
to achieve the same, which is the better alternative.
8. Explicit syscall filters. This is not a good idea, due to the skewed
arch support, hence not exposed through .nspawn/cmdline.
9. Configuration of some sysctls on a whitelist. Questionnable, not
supported in .nspawn/cmdline for now.
10. Configuration of all 5 types of capabilities. Not a useful concept,
since the kernel will reduce the caps on execve() anyway. Not
exposed through .nspawn/cmdline as this is not very useful hence.
Note that this only implements the OCI runtime logic itself. It does not
provide a runc-compatible command line tool. This is left for a later
PR. Only with that in place tools such as "buildah" can use the OCI
support in nspawn as drop-in replacement.
Currently still missing is OCI hook support, but it's already parsed and
everything, and should be easy to add. Other than that it's OCI is
implemented pretty comprehensively.
There's a list of incompatibilities in the nspawn-oci.c file. In a later
PR I'd like to convert this into proper markdown and add it to the
documentation directory.