Tejun Heo [Mon, 15 Aug 2016 22:13:36 +0000 (18:13 -0400)]
core: rename cg_unified() to cg_all_unified()
A following patch will update cgroup handling so that the systemd controller
(/sys/fs/cgroup/systemd) can use the unified hierarchy even if the kernel
resource controllers are on the legacy hierarchies. This would require
distinguishing whether all controllers are on cgroup v2 or only the systemd
controller is. In preparation, this patch renames cg_unified() to
cg_all_unified().
However, it should not use `--system` in that case anyway, so this patch
removes the part that should cause a default to be used and adds some
comments.
Daniel Hahler [Sat, 13 Aug 2016 14:41:22 +0000 (16:41 +0200)]
zsh: _journalctl: improve support for handling mode args (#3952)
This only completes fields from `journalctl --user` in _journal_fields when `--user`
is used.
It also changes $_sys_service_mgr to include both `--system` and `--user`,
because `journalctl` behaves different from `systemctl` in this regard.
No attempt is made to filter out invalid combinations, e.g. when using both
`--directory` and `--system` (see https://github.com/systemd/systemd/issues/3949).
Daniel Hahler [Thu, 11 Aug 2016 16:52:13 +0000 (18:52 +0200)]
zsh: _journalctl: handle --user in _journal_none
This uses the same mechanism from _systemctl to inject `--user` into the
`journalctrl -F _EXE` call to list executables.
Before this patch the "commands" section would list executables from
system units always.
Daniel Hahler [Thu, 11 Aug 2016 16:46:31 +0000 (18:46 +0200)]
zsh: _filter_units_by_property: respect --user
Use `$_sys_service_mgr` to handle `--user`, so that `systemctl --user
stop` will correctly filter the active (user) units. Before this patch,
only user units that also exist as system units and are stoppable there
would be listed.
coredump: treat RLIMIT_CORE below page size as disabling coredumps (#3932)
The kernel treats values below a certain threshold (minfmt->min_coredump
which is initialized do ELF_EXEC_PAGESIZE, which varies between architectures,
but is usually the same as PAGE_SIZE) as disabling coredumps [1].
Any core image below ELF_EXEC_PAGESIZE will yield an invalid backtrace anyway [2],
so follow the kernel and not try to parse or store such images.
[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/coredump.c#n660
[2] systemd-coredump[16260]: Process 16258 (sleep) of user 1002 dumped core.
Stack trace of thread 16258:
#0 0x00007f1d8b3d3810 n/a (n/a)
Rhys [Tue, 9 Aug 2016 13:33:46 +0000 (23:33 +1000)]
install: follow config_path symlink (#3362)
Under NixOS, the config_path /etc/systemd/system is a symlink to
/etc/static/systemd/system. Commands such as `systemctl list-unit-files`
and `systemctl is-enabled` did not work as the symlink was not followed.
This does not affect how symlinks are treated within the config_path
directory.
It's hard to say which one of the two mappings should stay. But the later
one would win (when both very present), and nobody complained, so let's
assume that that's the one.
hwdb: remove duplicated matches for old Logitech unifying receiver
Quoting https://github.com/systemd/systemd/pull/3906#discussion_r73828368:
> According to
> http://support.logitech.com/en_us/product/v220-cordless-optical-mouse-for-notebooks
> it seems the mouse is using a pre-version of the small unifying receiver we
> know now. If there are 2 mice with the same receiver, that means that the
> values should both be dropped IMO.
This works for hwdb/[67]0-*.hwdb. I also added code to parse hwdb/20-*, but those
files are huge, and parsing them using this parser is annoyingly slow (about one
minute for the biggest files). So I removed the support for hwdb/20-*, a much simpler
hand-generated parser should suffice for those.
Current output:
hwdb/60-evdev.hwdb: 24 match groups, 35 matches, 88 properties, 0.19323015213012695s to parse
Match 'evdev:input:b0003v05ACp0259*' is duplicated
Match 'evdev:input:b0003v05ACp025A*' is duplicated
Match 'evdev:input:b0003v05ACp025B*' is duplicated
hwdb/60-keyboard.hwdb: 122 match groups, 188 matches, 638 properties, 1.0906572341918945s to parse
Failed to parse: 'KEYBOARD_KEY_8F=switchvideomode'
Failed to parse: 'KEYBOARD_KEY_C0183=media'
Failed to parse: 'KEYBOARD_KEY_C0201=new'
Failed to parse: 'KEYBOARD_KEY_C0289=reply'
Failed to parse: 'KEYBOARD_KEY_C028B=forwardmail'
Failed to parse: 'KEYBOARD_KEY_C028C=send'
Failed to parse: 'KEYBOARD_KEY_C021A=undo'
Failed to parse: 'KEYBOARD_KEY_C0279=redo'
Failed to parse: 'KEYBOARD_KEY_C0208=print'
Failed to parse: 'KEYBOARD_KEY_C0207=save'
Failed to parse: 'KEYBOARD_KEY_C0194=file'
Failed to parse: 'KEYBOARD_KEY_C01A7=documents'
Failed to parse: 'KEYBOARD_KEY_C01B6=images'
Failed to parse: 'KEYBOARD_KEY_C01B7=sound'
Property KEYBOARD_KEY_c7 is duplicated
Failed to parse: 'KEYBOARD_KEY_cF=end'
hwdb/70-mouse.hwdb: 62 match groups, 93 matches, 68 properties, 0.34186625480651855s to parse
Match 'mouse:usb:v046dpc51b:name:Logitech USB Receiver:' is duplicated
hwdb/70-pointingstick.hwdb: 5 match groups, 14 matches, 7 properties, 0.06518816947937012s to parse
hwdb/70-touchpad.hwdb: 3 match groups, 5 matches, 3 properties, 0.039690494537353516s to parse
This way it's clear that the property block does not end at the comment.
The python checker will complain if this is not the case.
We had a few bugs before where two match blocks were merged by mistake,
and this change should help avoid that.
Tejun Heo [Sun, 7 Aug 2016 13:45:39 +0000 (09:45 -0400)]
core: add cgroup CPU controller support on the unified hierarchy
Unfortunately, due to the disagreements in the kernel development community,
CPU controller cgroup v2 support has not been merged and enabling it requires
applying two small out-of-tree kernel patches. The situation is explained in
the following documentation.
While it isn't clear what will happen with CPU controller cgroup v2 support,
there are critical features which are possible only on cgroup v2 such as
buffered write control making cgroup v2 essential for a lot of workloads. This
commit implements systemd CPU controller support on the unified hierarchy so
that users who choose to deploy CPU controller cgroup v2 support can easily
take advantage of it.
On the unified hierarchy, "cpu.weight" knob replaces "cpu.shares" and "cpu.max"
replaces "cpu.cfs_period_us" and "cpu.cfs_quota_us". [Startup]CPUWeight config
options are added with the usual compat translation. CPU quota settings remain
unchanged and apply to both legacy and unified hierarchies.
v2: - Error in man page corrected.
- CPU config application in cgroup_context_apply() refactored.
- CPU accounting now works on unified hierarchy.
buildsys,journal: allow -fsanitize=address without VALGRIND defined
Fixed (master) versions of libtool pass -fsanitize=address correctly
into CFLAGS and LDFLAGS allowing ASAN to be used without any special
configure tricks..however ASAN triggers in lookup3.c for the same
reasons valgrind does. take the alternative codepath if
__SANITIZE_ADDRESS__ is defined as well.
It was meant to write to q instead of t
FAIL: test-id128
================
=================================================================
==125770==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffd4615bd31 at pc 0x7a2f41b1bf33 bp 0x7ffd4615b750 sp 0x7ffd4615b748
WRITE of size 1 at 0x7ffd4615bd31 thread T0
#0 0x7a2f41b1bf32 in id128_to_uuid_string src/libsystemd/sd-id128/id128-util.c:42
#1 0x401f73 in main src/test/test-id128.c:147
#2 0x7a2f41336341 in __libc_start_main (/lib64/libc.so.6+0x20341)
#3 0x401129 in _start (/home/crrodriguez/scm/systemd/.libs/test-id128+0x401129)
Address 0x7ffd4615bd31 is located in stack of thread T0 at offset 1409 in frame
#0 0x401205 in main src/test/test-id128.c:37
This adds parse_nice() that parses a nice level and ensures it is in the right
range, via a new nice_is_valid() helper. It then ports over a number of users
to this.
core: remember first unit failure, not last unit failure
Previously, the result value of a unit was overriden with each failure that
took place, so that the result always reported the last failure that took
place.
With this commit this is changed, so that the first failure taking place is
stored instead. This should normally not matter much as multiple failures are
sufficiently uncommon. However, it improves one behaviour: if we send SIGABRT
to a service due to a watchdog timeout, then this currently would be reported
as "coredump" failure, rather than the "watchodg" failure it really is. Hence,
in order to report information about the type of the failure, and not about
the effect of it, let's change this from all unit type to store the first, not
the last failure.
Let's extend nss-systemd to also synthesize user/group entries for the
UIDs/GIDs 0 and 65534 which have special kernel meaning. Given that nss-systemd
is listed in /etc/nsswitch.conf only very late any explicit listing in
/etc/passwd or /etc/group takes precedence.
This functionality is useful in minimal container-like setups that lack
/etc/passwd files (or only have incompletely populated ones).
core: set $SERVICE_RESULT, $EXIT_CODE and $EXIT_STATUS in ExecStop=/ExecStopPost= commands
This should simplify monitoring tools for services, by passing the most basic
information about service result/exit information via environment variables,
thus making it unnecessary to retrieve them explicitly via the bus.
Vito Caputo [Thu, 4 Aug 2016 20:52:02 +0000 (13:52 -0700)]
fileio: fix read_full_stream() bugs (#3887)
read_full_stream() _always_ allocated twice the memory needed, due to
only breaking the realloc() && fread() loop when fread() returned 0,
requiring another iteration and exponentially enlarged buffer just to
discover the EOF condition.
This also caused file sizes >2MiB && <= 4MiB to erroneously be treated
as E2BIG, due to the inappropriately doubled buffer size exceeding
4*1024*1024.
Also made the 4*1024*1024 magic number a READ_FULL_BYTES_MAX constant.
core: turn various execution flags into a proper flags parameter
The ExecParameters structure contains a number of bit-flags, that were so far
exposed as bool:1, change this to a proper, single binary bit flag field. This
makes things a bit more expressive, and is helpful as we add more flags, since
these booleans are passed around in various callers, for example
service_spawn(), whose signature can be made much shorter now.
Not all bit booleans from ExecParameters are moved into the flags field for
now, but this can be added later.
Beef up the existing var_tmp() call, rename it to var_tmp_dir() and add a
matching tmp_dir() call (the former looks for the place for /var/tmp, the
latter for /tmp).
Both calls check $TMPDIR, $TEMP, $TMP, following the algorithm Python3 uses.
All dirs are validated before use. secure_getenv() is used in order to limite
exposure in suid binaries.
This also ports a couple of users over to these new APIs.
The var_tmp() return parameter is changed from an allocated buffer the caller
will own to a const string either pointing into environ[], or into a static
const buffer. Given that environ[] is mostly considered constant (and this is
exposed in the very well-known getenv() call), this should be OK behaviour and
allows us to avoid memory allocations in most cases.
Note that $TMPDIR and friends override both /var/tmp and /tmp usage if set.
journalctl: add new output mode "short-full" (#3880)
This new output mode formats all timestamps using the usual format_timestamp()
call we use pretty much everywhere else. Timestamps formatted this way are some
ways more useful than traditional syslog timestamps as they include weekday,
month and timezone information, while not being much longer. They are also not
locale-dependent. The primary advantage however is that they may be passed
directly to journalctl's --since= and --until= switches as soon as #3869 is
merged.
While we are at it, let's also add "short-unix" to shell completion.
util-lib: make timestamp generation and parsing reversible (#3869)
This patch improves parsing and generation of timestamps and calendar
specifications in two ways:
- The week day is now always printed in the abbreviated English form, instead
of the locale's setting. This makes sure we can always parse the week day
again, even if the locale is changed. Given that we don't follow locale
settings for printing timestamps in any other way either (for example, we
always use 24h syntax in order to make uniform parsing possible), it only
makes sense to also stick to a generic, non-localized form for the timestamp,
too.
- When parsing a timestamp, the local timezone (in its DST or non-DST name)
may be specified, in addition to "UTC". Other timezones are still not
supported however (not because we wouldn't want to, but mostly because libc
offers no nice API for that). In itself this brings no new features, however
it ensures that any locally formatted timestamp's timezone is also parsable
again.
These two changes ensure that the output of format_timestamp() may always be
passed to parse_timestamp() and results in the original input. The related
flavours for usec/UTC also work accordingly. Calendar specifications are
extended in a similar way.
The man page is updated accordingly, in particular this removes the claim that
timestamps systemd prints wouldn't be parsable by systemd. They are now.
The man page previously showed invalid timestamps as examples. This has been
removed, as the man page shouldn't be a unit test, where such negative examples
would be useful. The man page also no longer mentions the names of internal
functions, such as format_timestamp_us() or UNIX error codes such as EINVAL.
core: add new PrivateUsers= option to service execution
This setting adds minimal user namespacing support to a service. When set the invoked
processes will run in their own user namespace. Only a trivial mapping will be
set up: the root user/group is mapped to root, and the user/group of the
service will be mapped to itself, everything else is mapped to nobody.
If this setting is used the service runs with no capabilities on the host, but
configurable capabilities within the service.
This setting is particularly useful in conjunction with RootDirectory= as the
need to synchronize /etc/passwd and /etc/group between the host and the service
OS tree is reduced, as only three UID/GIDs need to match: root, nobody and the
user of the service itself. But even outside the RootDirectory= case this
setting is useful to substantially reduce the attack surface of a service.
This removes the --share-system switch: from the documentation, the --help text
as well as the command line parsing. It's an ugly option, given that it kinda
contradicts the whole concept of PID namespaces that nspawn implements. Since
it's barely ever used, let's just deprecate it and remove it from the options.
It might be useful as a debugging option, hence the functionality is kept
around for now, exposed via an undocumented $SYSTEMD_NSPAWN_SHARE_SYSTEM
environment variable.
nspawn: try to bind mount resolved's resolv.conf snippet into the container
This has the benefit that the container can follow the host's DNS server
changes without us having to constantly update the container's resolv.conf
settings.
Peter Hutterer [Wed, 3 Aug 2016 10:34:56 +0000 (20:34 +1000)]
hwdb: add ID_INPUT_TRACKBALL as additional identifier (#3872)
Whether a device is a trackball or not is a physical property so we should
store this globally, in one place. The new property must be set in addition to
ID_INPUT_MOUSE, otherwise existing clients won't detect the device.
No actual code changes required, the default match rule is simply checking for
"Trackball" in the name (in a few versions), other entries need to be added
manually.
Jakub Filak [Wed, 27 Apr 2016 13:23:49 +0000 (15:23 +0200)]
coredump: save process container parent cmdline
Process container parent is the process used to start processes with a new
user namespace - e.g systemd-nspawn, runc, lxc, etc.
There is not standard way how to find such a process - or I do not know
about it - hence I have decided to find the first process in the parent
process hierarchy with a different mount namespace and different
/proc/self/root's inode.
I have decided for this criteria because in ABRT we take special care
only if the crashed process runs different code than installed on the
host. Other processes with namespaces different than PID 1's namespaces
are just processes running code shipped by the OS vendor and bug
reporting tools can get information about the provider of the code
without the need to deal with changed root and so on.
Ismo Puustinen [Tue, 2 Aug 2016 12:58:30 +0000 (15:58 +0300)]
main: load Smack policy before IMA policy (#3859)
IMA wiki says: "If the IMA policy contains LSM labels, then the LSM
policy must be loaded prior to the IMA policy." Right now, in case of
Smack, the IMA policy is loaded before the Smack policy. Move the order
around to allow Smack labels to be used in IMA policy.
Martin Pitt [Tue, 2 Aug 2016 12:56:45 +0000 (14:56 +0200)]
units: add graphical-session-pre.target user unit (#3848)
This complements graphical-session.target for services which set up the
environment (e. g. dbus-update-activation-environment) and need to run before
the actual graphical session.