Tejun Heo [Fri, 19 Aug 2016 02:57:53 +0000 (22:57 -0400)]
logind: update empty and "infinity" handling for [User]TasksMax (#3835)
The parsing functions for [User]TasksMax were inconsistent. Empty string and
"infinity" were interpreted as no limit for TasksMax but not accepted for
UserTasksMax. Update them so that they're consistent with other knobs.
* Empty string indicates the default value.
* "infinity" indicates no limit.
While at it, replace opencoded (uint64_t) -1 with CGROUP_LIMIT_MAX in TasksMax
handling.
v2: Update empty string to indicate the default value as suggested by Zbigniew
Jędrzejewski-Szmek.
Peter Hutterer [Tue, 16 Aug 2016 05:23:42 +0000 (15:23 +1000)]
hwdb: add a udev property for a wheel click angle on horiz wheels
The Logitech MX Master has a horizontal scroll wheel with a different click
angle than the vertical one. Add a new property for this case, we can't add
values to the normal one without risking upsetting existing parsers.
Let's make sure we don't clobber the input parameter args[1], following our
coding style to not clobber parameters unless explicitly indicated. (in
particular, as we don't want to have our changes appear in the command line
shown in "ps"...)
nss-mymachines: avoid connecting to dbus from inside dbus-daemon
Inspired from the new logic in nss-systemd let's make sure we don't end up
deadlocking in nss-mymachines either in case dbus-daemon tries to a look up a
name and we want to connect to the bus.
This case is much simpler though, as there's no point in resolving virtual
machine UIDs by dbus-daemon as those should never be able to connect to the
host's busses.
core: bypass dynamic user lookups from dbus-daemon
dbus-daemon does NSS name look-ups in order to enforce its bus policy. This
might dead-lock if an NSS module use wants to use D-Bus for the look-up itself,
like our nss-systemd does. Let's work around this by bypassing bus
communication in the NSS module if we run inside of dbus-daemon. To make this
work we keep a bit of extra state in /run/systemd/dynamic-uid/ so that we don't
have to consult the bus, but can still resolve the names.
Note that the normal codepath continues to be via the bus, so that resolving
works from all mount namespaces and is subject to authentication, as before.
This is a bit dirty, but not too dirty, as dbus daemon is kinda special anyway
for PID 1.
This adds the boolean RemoveIPC= setting to service, socket, mount and swap
units (i.e. all unit types that may invoke processes). if turned on, and the
unit's user/group is not root, all IPC objects of the user/group are removed
when the service is shut down. The life-cycle of the IPC objects is hence bound
to the unit life-cycle.
This is particularly relevant for units with dynamic users, as it is essential
that no objects owned by the dynamic users survive the service exiting. In
fact, this patch adds code to imply RemoveIPC= if DynamicUser= is set.
In order to communicate the UID/GID of an executed process back to PID 1 this
adds a new "user lookup" socket pair, that is inherited into the forked
processes, and closed before the exec(). This is needed since we cannot do NSS
from PID 1 due to deadlock risks, However need to know the used UID/GID in
order to clean up IPC owned by it if the unit shuts down.
core: move obsolete properties to the end of vtables
This makes it easier to discern the relevant and obsolete parts of the vtables,
and in particular helps when comparing introspection data with the actual
vtable definitions.
Felipe Sateler [Wed, 17 Aug 2016 18:11:27 +0000 (15:11 -0300)]
sysv-generator: better error reporting (#3977)
Currently in the journal you get messages without context like:
systemd-sysv-generator[$pid]: Failed to build name: Invalid argument
When parsing the init script, show the file and line number where the
error was found. At the same time, add more context information if
available.
Thus turning the message into something like:
systemd-sysv-generator[$pid]: [/etc/init.d/root-system-proofd:13] Could not build name for facility $network,: Invalid argument
Vito Caputo [Wed, 17 Aug 2016 12:51:07 +0000 (05:51 -0700)]
journal: ensure open journals from find_journal() (#3973)
If journals get into a closed state like when rotate fails due to
ENOSPC, when space is made available it currently goes unnoticed leaving
the journals in a closed state indefinitely.
By calling system_journal_open() on entry to find_journal() we ensure
the journal has been opened/created if possible.
Also moved system_journal_open() up to after open_journal(), before
find_journal().
Daniel Hahler [Tue, 16 Aug 2016 16:42:41 +0000 (18:42 +0200)]
zsh: _journalctl: do not complete exclusive modes (#3957)
After `journalctl -D /var/log/journal` "--directory", "--file",
"--machine" and "--root" should not be available for completion, because
they are exclusive. But multiple `--file` arguments are allowed.
However, it should not use `--system` in that case anyway, so this patch
removes the part that should cause a default to be used and adds some
comments.
Daniel Hahler [Sat, 13 Aug 2016 14:41:22 +0000 (16:41 +0200)]
zsh: _journalctl: improve support for handling mode args (#3952)
This only completes fields from `journalctl --user` in _journal_fields when `--user`
is used.
It also changes $_sys_service_mgr to include both `--system` and `--user`,
because `journalctl` behaves different from `systemctl` in this regard.
No attempt is made to filter out invalid combinations, e.g. when using both
`--directory` and `--system` (see https://github.com/systemd/systemd/issues/3949).
journalctl: allow --root argument for journal watching
It is useful to look at a (possibly inactive) container or other os tree
with --root=/path/to/container. This is similar to specifying
--directory=/path/to/container/var/log/journal --directory=/path/to/container/run/systemd/journal
(if using --directory multiple times was allowed), but doesn't require
as much typing.
sd-journal: fix sd_journal_open_directory with SD_JOURNAL_OS_ROOT
The directory argument that is given to sd_j_o_d was ignored when
SD_JOURNAL_OS_ROOT was given, and directories relative to the root of the host
file system were used. With that flag, sd_j_o_d should do the same as
sd_j_open_container: use the path as "prefix", i.e. the directory relative to
which everything happens.
Instead of touching sd_j_o_d, journal_new is fixed to do what sd_j_o_c
was doing, and treat the specified path as prefix when SD_JOURNAL_OS_ROOT is
specified.
Daniel Hahler [Thu, 11 Aug 2016 16:52:13 +0000 (18:52 +0200)]
zsh: _journalctl: handle --user in _journal_none
This uses the same mechanism from _systemctl to inject `--user` into the
`journalctrl -F _EXE` call to list executables.
Before this patch the "commands" section would list executables from
system units always.
Daniel Hahler [Thu, 11 Aug 2016 16:46:31 +0000 (18:46 +0200)]
zsh: _filter_units_by_property: respect --user
Use `$_sys_service_mgr` to handle `--user`, so that `systemctl --user
stop` will correctly filter the active (user) units. Before this patch,
only user units that also exist as system units and are stoppable there
would be listed.
coredump: treat RLIMIT_CORE below page size as disabling coredumps (#3932)
The kernel treats values below a certain threshold (minfmt->min_coredump
which is initialized do ELF_EXEC_PAGESIZE, which varies between architectures,
but is usually the same as PAGE_SIZE) as disabling coredumps [1].
Any core image below ELF_EXEC_PAGESIZE will yield an invalid backtrace anyway [2],
so follow the kernel and not try to parse or store such images.
[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/coredump.c#n660
[2] systemd-coredump[16260]: Process 16258 (sleep) of user 1002 dumped core.
Stack trace of thread 16258:
#0 0x00007f1d8b3d3810 n/a (n/a)
Rhys [Tue, 9 Aug 2016 13:33:46 +0000 (23:33 +1000)]
install: follow config_path symlink (#3362)
Under NixOS, the config_path /etc/systemd/system is a symlink to
/etc/static/systemd/system. Commands such as `systemctl list-unit-files`
and `systemctl is-enabled` did not work as the symlink was not followed.
This does not affect how symlinks are treated within the config_path
directory.
It's hard to say which one of the two mappings should stay. But the later
one would win (when both very present), and nobody complained, so let's
assume that that's the one.
hwdb: remove duplicated matches for old Logitech unifying receiver
Quoting https://github.com/systemd/systemd/pull/3906#discussion_r73828368:
> According to
> http://support.logitech.com/en_us/product/v220-cordless-optical-mouse-for-notebooks
> it seems the mouse is using a pre-version of the small unifying receiver we
> know now. If there are 2 mice with the same receiver, that means that the
> values should both be dropped IMO.
This works for hwdb/[67]0-*.hwdb. I also added code to parse hwdb/20-*, but those
files are huge, and parsing them using this parser is annoyingly slow (about one
minute for the biggest files). So I removed the support for hwdb/20-*, a much simpler
hand-generated parser should suffice for those.
Current output:
hwdb/60-evdev.hwdb: 24 match groups, 35 matches, 88 properties, 0.19323015213012695s to parse
Match 'evdev:input:b0003v05ACp0259*' is duplicated
Match 'evdev:input:b0003v05ACp025A*' is duplicated
Match 'evdev:input:b0003v05ACp025B*' is duplicated
hwdb/60-keyboard.hwdb: 122 match groups, 188 matches, 638 properties, 1.0906572341918945s to parse
Failed to parse: 'KEYBOARD_KEY_8F=switchvideomode'
Failed to parse: 'KEYBOARD_KEY_C0183=media'
Failed to parse: 'KEYBOARD_KEY_C0201=new'
Failed to parse: 'KEYBOARD_KEY_C0289=reply'
Failed to parse: 'KEYBOARD_KEY_C028B=forwardmail'
Failed to parse: 'KEYBOARD_KEY_C028C=send'
Failed to parse: 'KEYBOARD_KEY_C021A=undo'
Failed to parse: 'KEYBOARD_KEY_C0279=redo'
Failed to parse: 'KEYBOARD_KEY_C0208=print'
Failed to parse: 'KEYBOARD_KEY_C0207=save'
Failed to parse: 'KEYBOARD_KEY_C0194=file'
Failed to parse: 'KEYBOARD_KEY_C01A7=documents'
Failed to parse: 'KEYBOARD_KEY_C01B6=images'
Failed to parse: 'KEYBOARD_KEY_C01B7=sound'
Property KEYBOARD_KEY_c7 is duplicated
Failed to parse: 'KEYBOARD_KEY_cF=end'
hwdb/70-mouse.hwdb: 62 match groups, 93 matches, 68 properties, 0.34186625480651855s to parse
Match 'mouse:usb:v046dpc51b:name:Logitech USB Receiver:' is duplicated
hwdb/70-pointingstick.hwdb: 5 match groups, 14 matches, 7 properties, 0.06518816947937012s to parse
hwdb/70-touchpad.hwdb: 3 match groups, 5 matches, 3 properties, 0.039690494537353516s to parse
This way it's clear that the property block does not end at the comment.
The python checker will complain if this is not the case.
We had a few bugs before where two match blocks were merged by mistake,
and this change should help avoid that.
Tejun Heo [Sun, 7 Aug 2016 13:45:39 +0000 (09:45 -0400)]
core: add cgroup CPU controller support on the unified hierarchy
Unfortunately, due to the disagreements in the kernel development community,
CPU controller cgroup v2 support has not been merged and enabling it requires
applying two small out-of-tree kernel patches. The situation is explained in
the following documentation.
While it isn't clear what will happen with CPU controller cgroup v2 support,
there are critical features which are possible only on cgroup v2 such as
buffered write control making cgroup v2 essential for a lot of workloads. This
commit implements systemd CPU controller support on the unified hierarchy so
that users who choose to deploy CPU controller cgroup v2 support can easily
take advantage of it.
On the unified hierarchy, "cpu.weight" knob replaces "cpu.shares" and "cpu.max"
replaces "cpu.cfs_period_us" and "cpu.cfs_quota_us". [Startup]CPUWeight config
options are added with the usual compat translation. CPU quota settings remain
unchanged and apply to both legacy and unified hierarchies.
v2: - Error in man page corrected.
- CPU config application in cgroup_context_apply() refactored.
- CPU accounting now works on unified hierarchy.
buildsys,journal: allow -fsanitize=address without VALGRIND defined
Fixed (master) versions of libtool pass -fsanitize=address correctly
into CFLAGS and LDFLAGS allowing ASAN to be used without any special
configure tricks..however ASAN triggers in lookup3.c for the same
reasons valgrind does. take the alternative codepath if
__SANITIZE_ADDRESS__ is defined as well.
It was meant to write to q instead of t
FAIL: test-id128
================
=================================================================
==125770==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffd4615bd31 at pc 0x7a2f41b1bf33 bp 0x7ffd4615b750 sp 0x7ffd4615b748
WRITE of size 1 at 0x7ffd4615bd31 thread T0
#0 0x7a2f41b1bf32 in id128_to_uuid_string src/libsystemd/sd-id128/id128-util.c:42
#1 0x401f73 in main src/test/test-id128.c:147
#2 0x7a2f41336341 in __libc_start_main (/lib64/libc.so.6+0x20341)
#3 0x401129 in _start (/home/crrodriguez/scm/systemd/.libs/test-id128+0x401129)
Address 0x7ffd4615bd31 is located in stack of thread T0 at offset 1409 in frame
#0 0x401205 in main src/test/test-id128.c:37
This adds parse_nice() that parses a nice level and ensures it is in the right
range, via a new nice_is_valid() helper. It then ports over a number of users
to this.
core: remember first unit failure, not last unit failure
Previously, the result value of a unit was overriden with each failure that
took place, so that the result always reported the last failure that took
place.
With this commit this is changed, so that the first failure taking place is
stored instead. This should normally not matter much as multiple failures are
sufficiently uncommon. However, it improves one behaviour: if we send SIGABRT
to a service due to a watchdog timeout, then this currently would be reported
as "coredump" failure, rather than the "watchodg" failure it really is. Hence,
in order to report information about the type of the failure, and not about
the effect of it, let's change this from all unit type to store the first, not
the last failure.
Let's extend nss-systemd to also synthesize user/group entries for the
UIDs/GIDs 0 and 65534 which have special kernel meaning. Given that nss-systemd
is listed in /etc/nsswitch.conf only very late any explicit listing in
/etc/passwd or /etc/group takes precedence.
This functionality is useful in minimal container-like setups that lack
/etc/passwd files (or only have incompletely populated ones).
core: set $SERVICE_RESULT, $EXIT_CODE and $EXIT_STATUS in ExecStop=/ExecStopPost= commands
This should simplify monitoring tools for services, by passing the most basic
information about service result/exit information via environment variables,
thus making it unnecessary to retrieve them explicitly via the bus.
Vito Caputo [Thu, 4 Aug 2016 20:52:02 +0000 (13:52 -0700)]
fileio: fix read_full_stream() bugs (#3887)
read_full_stream() _always_ allocated twice the memory needed, due to
only breaking the realloc() && fread() loop when fread() returned 0,
requiring another iteration and exponentially enlarged buffer just to
discover the EOF condition.
This also caused file sizes >2MiB && <= 4MiB to erroneously be treated
as E2BIG, due to the inappropriately doubled buffer size exceeding
4*1024*1024.
Also made the 4*1024*1024 magic number a READ_FULL_BYTES_MAX constant.
core: turn various execution flags into a proper flags parameter
The ExecParameters structure contains a number of bit-flags, that were so far
exposed as bool:1, change this to a proper, single binary bit flag field. This
makes things a bit more expressive, and is helpful as we add more flags, since
these booleans are passed around in various callers, for example
service_spawn(), whose signature can be made much shorter now.
Not all bit booleans from ExecParameters are moved into the flags field for
now, but this can be added later.
Beef up the existing var_tmp() call, rename it to var_tmp_dir() and add a
matching tmp_dir() call (the former looks for the place for /var/tmp, the
latter for /tmp).
Both calls check $TMPDIR, $TEMP, $TMP, following the algorithm Python3 uses.
All dirs are validated before use. secure_getenv() is used in order to limite
exposure in suid binaries.
This also ports a couple of users over to these new APIs.
The var_tmp() return parameter is changed from an allocated buffer the caller
will own to a const string either pointing into environ[], or into a static
const buffer. Given that environ[] is mostly considered constant (and this is
exposed in the very well-known getenv() call), this should be OK behaviour and
allows us to avoid memory allocations in most cases.
Note that $TMPDIR and friends override both /var/tmp and /tmp usage if set.