nspawn: decrease mkdir error logging in /sys to debug priority (#3748)
Such mkdir errors happen for example when trying to mkdir /sys/fs/selinux.
/sys is documented to be read-only in the container, so mkdir errors below /sys
can be expected.
They shouldn't be logged as warnings since they lead users to think that
there is something wrong.
basic/strv: add an extra NUL after strings in strv_make_nulstr
strv_make_nulstr was creating a nulstr which was not a valid nulstr,
because it was missing the terminating NUL. This didn't cause any issues,
because strv_parse_nulstr correctly parsed the result, using the
separately specified length.
But it's confusing to have something called nulstr which really isn't.
It is likely that somebody will try to use strv_make_nulstr() in
some other place, incorrectly.
This patch changes strv_make_nulstr() to produce a valid nulstr, and
changes the output length parameter to be the minimum number of bytes
which can be later on parsed by strv_parse_nulstr(). This allows the
only user in ask-password-api to be slightly simplified.
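For illustration, a minimal sketch of the nulstr layout described above. The helper names make_nulstr_sketch() and nulstr_count() are hypothetical and this is not the code from the patch; in particular, the size reported here is simply the total number of bytes written, which is an assumption of the sketch and not the length semantics the patch defines.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not systemd's strv_make_nulstr(): packs a
 * NULL-terminated string array into "s1\0s2\0...\0" plus one extra NUL,
 * so that the result can be walked without a separate length. */
static int make_nulstr_sketch(const char *const *l, char **ret, size_t *ret_size) {
        size_t n = 0;
        char *m, *p;

        assert(l);
        assert(ret);
        assert(ret_size);

        for (const char *const *i = l; *i; i++)
                n += strlen(*i) + 1;

        /* One extra byte for the terminating NUL; calloc() zero-fills, so
         * every separator and the terminator are already in place. */
        m = calloc(n + 1, 1);
        if (!m)
                return -1;

        p = m;
        for (const char *const *i = l; *i; i++) {
                size_t z = strlen(*i);

                memcpy(p, *i, z);
                p += z + 1;
        }

        *ret = m;
        *ret_size = n + 1; /* total bytes, including the extra NUL (sketch-only choice) */
        return 0;
}

/* Counts the entries of a valid nulstr, stopping at the first empty string. */
static size_t nulstr_count(const char *s) {
        size_t c = 0;

        assert(s);

        for (; *s; s += strlen(s) + 1)
                c++;

        return c;
}
```

Without the extra NUL, a walker like nulstr_count() would read past the end of the buffer, which is why a nulstr lacking it is only usable together with an explicit length.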
manager: don't skip sigchld handler for main and control pid for services (#3738)
During stop, when a service has one "regular" pid, one main pid and one
control pid, and the sigchld for the regular one is processed first,
unit_tidy_watch_pids() will skip the main and control pids and not
remove them from u->pids. But then we skip the sigchld event because we
already did one in this iteration and there are two pids in u->pids.
v2: Use general unit_main_pid() and unit_control_pid() instead of
reaching directly to service structure.
Michael Biebl [Sat, 16 Jul 2016 16:51:45 +0000 (18:51 +0200)]
man: mention system-shutdown hook directory in synopsis (#3741)
The distinction between systemd-shutdown the binary vs system-shutdown
the hook directory (without the 'd') is not immediately obvious and can
be quite confusing if you are looking for a directory which doesn't exist.
Therefore explicitly mention the hook directory in the synopsis with a
trailing slash to make it clearer which is which.
This adds a build script and a settings file for "mkosi", a tool for putting
together full, bootable disk images for container managers of EFI systems and
VMs.
With these files it's enough to type "mkosi" in the project directory to
generate a bootable Fedora 24 OS image with a version of systemd compiled fresh
from the working tree.
Sometimes, the persistent storage rules should be skipped for a subset
of devices. For example, the Qubes operating system prevents dom0 from
parsing untrusted block device content (such as filesystem metadata) by
shipping a custom 60-persistent-storage.rules, patched to bail out early
if the device name matches a hardcoded pattern.
As a less brittle and more flexible alternative, this commit adds a line
to the two relevant .rules files which makes them test the value of the
UDEV_DISABLE_PERSISTENT_STORAGE_RULES_FLAG device property, modeled
after the various DM_UDEV_DISABLE_*_RULES_FLAG properties.
Stef Walter [Fri, 15 Jul 2016 10:24:34 +0000 (12:24 +0200)]
udev: Line buffer 'udev monitor' output (#3733)
Callers of the 'udev monitor' tool expect to see output when
an event occurs. The stdio buffering defeats that. This patch
switches it to line buffering.
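A minimal sketch of switching a stdio stream to line buffering, using the standard setvbuf() call. The helper name use_line_buffering() is hypothetical, and the patch itself may use a different call to achieve the same effect.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: flush the stream whenever a newline is written, so
 * that a reader on the other end of a pipe sees each event line as soon as
 * it is printed. Returns 0 on success, non-zero on failure. */
static int use_line_buffering(FILE *f) {
        assert(f);

        return setvbuf(f, NULL, _IOLBF, BUFSIZ);
}
```

By default stdout is fully buffered when it is not connected to a terminal, which is why callers reading from a pipe saw no output until the buffer filled up.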
macros: provide %_systemdgeneratordir and %_systemdusergeneratordir (#3672)
... as requested in
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/DJ7HDNRM5JGBSA4HL3UWW5ZGLQDJ6Y7M/.
Adding the macro makes it marginally easier to create generators
for outside projects.
I opted for "generatordir" and "usergeneratordir" to match
%unitdir and %userunitdir. OTOH, "_systemd" prefix makes it obvious
that this is related to systemd. "%_generatordir" would be too generic
of a name.
Daniel Mack [Fri, 15 Jul 2016 02:56:11 +0000 (04:56 +0200)]
network-ndisc: avoid VLAs (#3725)
Do not allocate objects of dynamic and potentially large size on the stack
to avoid both clang compilation errors and unpredictable runtime behavior
on exotic platforms. Use the heap for that instead.
While at it, refactor the code a bit. Access 's->domain' via
NDISC_DNSSL_DOMAIN(), and refrain from allocating 'x' independently, but
rather reuse 's' if we're dealing with a new entry to the set.
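As a rough sketch of the general pattern (not the actual network-ndisc code), a buffer whose size is only known at runtime is taken from the heap instead of being declared as a variable-length array. The helper name dup_domain_sketch() is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a NUL-terminated heap copy of the first
 * 'len' bytes of 'domain', or NULL if allocation fails. The caller frees
 * the result. */
static char *dup_domain_sketch(const char *domain, size_t len) {
        char *copy;

        assert(domain);

        /* The VLA variant would be "char copy[len + 1];", which places an
         * object of unbounded size on the stack. */
        copy = malloc(len + 1);
        if (!copy)
                return NULL;

        memcpy(copy, domain, len);
        copy[len] = '\0';
        return copy;
}
```

Unlike a VLA, the heap allocation can fail in a way the caller is able to detect and handle.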
There's really no reason to use 10s here, let's instead default to 90s like we
do for everything else.
The SIGKILL during the final killing spree is in most regards the fourth level
of a safety net, after all: any normal service should have already been stopped
during the normal service shutdown logic, first via SIGTERM and then SIGKILL,
and then also via SIGTERM during the final killing spree before we send
SIGKILL. And as a fourth level safety net it should only be required in
exceptional cases, which means it's safe to raise the default timeout, as normal
shutdowns should never be delayed by it.
Note that journald excludes itself from the normal service shutdown, and relies
on the final killing spree to terminate it (this is because it wants to cover
the normal shutdown phase's complete logging). If the system's IO is
excessively slow, then the 10s might not be enough for journald to sync
everything to disk and logs might get lost during shutdown.
Luca Bruno [Tue, 12 Jul 2016 09:55:26 +0000 (11:55 +0200)]
seccomp: only abort on syscall name resolution failures (#3701)
seccomp_syscall_resolve_name() can return a mix of positive and negative
(pseudo-) syscall numbers, while errors are signaled via __NR_SCMP_ERROR.
This commit lets the syscall filter parser abort only on real parsing
failures, letting libseccomp handle pseudo-syscall numbers on its own
and allowing proper filtering of multiplexed syscalls.
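The distinction can be sketched as follows. To keep the example free of a libseccomp dependency, SKETCH_NR_SCMP_ERROR is a local stand-in for __NR_SCMP_ERROR and is assumed here to be -1; accept_syscall_id() is a hypothetical helper, not the parser from this commit.

```c
#include <assert.h>

/* Stand-in for libseccomp's __NR_SCMP_ERROR (assumed to be -1 here). */
#define SKETCH_NR_SCMP_ERROR (-1)

/* Hypothetical helper: takes the value a name lookup returned and rejects
 * it only if it is the dedicated error value. Other negative values are
 * pseudo-syscall numbers and are passed through unchanged.
 * Returns 0 on success, -1 on a real resolution failure. */
static int accept_syscall_id(int id, int *ret) {
        assert(ret);

        if (id == SKETCH_NR_SCMP_ERROR)
                return -1;

        *ret = id;
        return 0;
}
```

A check of the form "id < 0" would wrongly reject every pseudo-syscall number along with the error value.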
rules: block: add support for pmem devices (#3683)
Persistent memory devices can be exposed as block devices as /dev/pmemN
and /dev/pmemNs. pmemN is the raw device and is byte-addressable from
within the kernel and when mmapped by applications from a DAX-mounted
file system. pmemNs has the block translation table (BTT) layered on top,
offering atomic sector/block access. Both pmemN and pmemNs are expected
to contain file systems.
blkid(8) and lsblk(8) seem to correctly report on pmemN and pmemNs.
systemd v219 will populate /dev/disk/by-uuid/ when, for example, mkfs is
used on pmem, but systemd v228 does not.
David Michael [Fri, 8 Jul 2016 03:43:01 +0000 (20:43 -0700)]
core: queue loading transient units after setting their properties (#3676)
The unit load queue can be processed in the middle of setting the
unit's properties, so its load_state would no longer be UNIT_STUB
for the check in bus_unit_set_properties(), which would cause it to
incorrectly return an error.
Daniel Mack [Fri, 8 Jul 2016 02:29:35 +0000 (04:29 +0200)]
cgroup: fix memory cgroup limit regression on kernel 3.10 (#3673)
Commit da4d897e ("core: add cgroup memory controller support on the unified
hierarchy (#3315)") changed the code in src/core/cgroup.c to always write
the real numeric value from the cgroup parameters to the
"memory.limit_in_bytes" attribute file.
For parameters set to CGROUP_LIMIT_MAX, this results in the string
"18446744073709551615" being written into that file, which is UINT64_MAX.
Before that commit, CGROUP_LIMIT_MAX was special-cased to the string "-1".
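The special case described above can be sketched as follows. This is an illustration only, not the code in src/core/cgroup.c: SKETCH_LIMIT_MAX stands in for CGROUP_LIMIT_MAX, and format_memory_limit() is a hypothetical helper.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for CGROUP_LIMIT_MAX, which the text above equates with UINT64_MAX. */
#define SKETCH_LIMIT_MAX UINT64_MAX

/* Hypothetical helper: formats a memory limit for the attribute file.
 * "No limit" becomes "-1" instead of the 20-digit decimal form of
 * UINT64_MAX. Returns 0 on success, -1 if the buffer is too small. */
static int format_memory_limit(uint64_t limit, char *buf, size_t size) {
        int r;

        assert(buf);

        if (limit == SKETCH_LIMIT_MAX) {
                if (size < sizeof("-1"))
                        return -1;

                memcpy(buf, "-1", sizeof("-1"));
                return 0;
        }

        r = snprintf(buf, size, "%" PRIu64, limit);
        return r < 0 || (size_t) r >= size ? -1 : 0;
}
```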
This causes a regression on CentOS 7, which is based on kernel 3.10, as the
value is interpreted as *signed* 64 bit, and clamped to 0:
Hence, all units that are subject to the limits enforced by the memory
controller will crash immediately, even though they have no actual limit
set. This happens for the user.slice, for instance:
By cleaning up before setting up PAM we maintain control of overriding
behavior in setting variables. Otherwise, pam_putenv is in control.
This also makes sure we use a cleaned up environment in replacing
variables in argv.
Daniel Mack [Thu, 7 Jul 2016 04:30:34 +0000 (06:30 +0200)]
basic: log: Increase static buffer for source file location (#3674)
Commit d054f0a4 ("tree-wide: use xsprintf() where applicable") used a
semantic patch approach to change a number of locations from
snprintf(buf, sizeof(buf), FMT, ...)
to
xsprintf(buf, FMT, ...)
The problem is that xsprintf() wraps the snprintf() in an
assert_message_se(), so if snprintf() reports an overflow of the
destination buffer, the binary will now terminate.
This hit a user running a version of systemd that was built from a
deeply nested system path.
Fix this by
a) Switching back to snprintf() for this particular case. We should really
rather truncate the location string than crash in such situations.
b) Increasing the size of that static string buffer, to make the event more
unlikely.
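A small sketch of the truncating behavior chosen in a). The helper name format_location_sketch() and the "CODE_FILE=" prefix are illustrative assumptions, not necessarily what the logging code uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: formats a source location into a fixed-size buffer.
 * snprintf() always NUL-terminates within 'size' and reports the length it
 * would have needed, so an overlong path is truncated instead of aborting
 * the program. Returns true if the output was truncated or formatting
 * failed. */
static bool format_location_sketch(char *buf, size_t size, const char *file) {
        int r;

        assert(buf);
        assert(size > 0);
        assert(file);

        r = snprintf(buf, size, "CODE_FILE=%s", file);
        return r < 0 || (size_t) r >= size;
}
```

Wrapping the same call in an assertion on its return value turns the "true" case into a crash, which is the failure mode described above.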
tests: fix memory leak in test_strv_fnmatch (#3653)
==1447== 4 bytes in 1 blocks are definitely lost in loss record 1 of 1
==1447== at 0x4C2BBAD: malloc (vg_replace_malloc.c:299)
==1447== by 0x5350F19: strdup (in /usr/lib64/libc-2.23.so)
==1447== by 0x4E9D435: strv_new_ap (strv.c:166)
==1447== by 0x4E9D5FA: strv_new (strv.c:199)
==1447== by 0x10E665: test_strv_fnmatch (test-strv.c:693)
==1447== by 0x10EAD5: main (test-strv.c:763)
==1447==
Peter Hutterer [Fri, 1 Jul 2016 05:12:34 +0000 (15:12 +1000)]
rules: set ID_BUS for bluetooth, rmi and i8042
Something has to set it so that udev rules can rely on it. Right now the ID_BUS
setting is inconsistent: usb is set, ata and pci are set, bluetooth is not
set, rmi is too new to be featured.
70-mouse even relied on bluetooth even though it was never set.
When we encounter a check for an architecture we don't know we should not
let the condition check fail with an error code, but instead simply return
false. After all the architecture might just be newer than the ones we know, in
which case it's certainly not our local one.
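A minimal sketch of that behavior. The table is an illustrative subset and condition_architecture_sketch() is a hypothetical helper, not the actual condition code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Illustrative subset of architecture names; the real list is longer. */
static const char *const known_architectures[] = { "x86", "x86-64", "arm", "arm64", NULL };

static bool architecture_is_known(const char *name) {
        for (const char *const *a = known_architectures; *a; a++)
                if (strcmp(*a, name) == 0)
                        return true;

        return false;
}

/* Hypothetical helper: returns 1 if 'wanted' names the local architecture
 * and 0 otherwise. An unknown name yields 0 instead of a negative error
 * code, since an architecture we have never heard of cannot be the one we
 * are running on. */
static int condition_architecture_sketch(const char *wanted, const char *local) {
        assert(wanted);
        assert(local);

        if (!architecture_is_known(wanted))
                return 0; /* previously this case would have been an error */

        return strcmp(wanted, local) == 0;
}
```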
sd-event: expose the event loop iteration counter via sd_event_get_iteration() (#3631)
This extends the existing event loop iteration counter to 64bit, and exposes it
via a new function sd_event_get_iteration(). This is helpful for cases like
issue #3612. After all, since we maintain the counter anyway, we might as well
expose it.
(This also fixes an unrelated issue in the man page for sd_event_wait() where
micro and milliseconds got mixed up)
Kyle Walker [Thu, 30 Jun 2016 19:12:18 +0000 (15:12 -0400)]
manager: Only invoke a single sigchld per unit within a cleanup cycle
By default, each iteration of manager_dispatch_sigchld() results in a unit level
sigchld event being invoked. For scope units, this results in a scope_sigchld_event()
which can seemingly stall for workloads that have a large number of PIDs within the
scope. The stall exhibits itself as a SIG_0 being initiated for each u->pids entry
as a result of pid_is_unwaited().
v2:
This patch resolves this condition by only paying the cost of a sigchld in the underlying
scope unit once per sigchld iteration. A new "sigchldgen" member resides within the
Unit struct. The Manager's iteration value is incremented by the sd-event loop and read via
sd_event_get_iteration(), and the Unit member is set to the same value as the Manager's each
time that a sigchld event is invoked. If the Manager iteration value and Unit member
match, the sigchld event is not invoked for that iteration.
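The bookkeeping can be sketched roughly like this. UnitSketch and unit_claim_sigchld() are hypothetical names, and the iteration value is passed in as a plain integer rather than being read from a real event loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, stripped-down stand-in for the Unit struct. */
typedef struct UnitSketch {
        uint64_t sigchldgen; /* iteration in which sigchld was last dispatched */
} UnitSketch;

/* Returns true if the unit-level sigchld event should be invoked in this
 * event loop iteration, and records the iteration so that further calls
 * within the same iteration return false. */
static bool unit_claim_sigchld(UnitSketch *u, uint64_t iteration) {
        assert(u);

        if (u->sigchldgen == iteration)
                return false;

        u->sigchldgen = iteration;
        return true;
}
```

With this check, a scope with many PIDs pays for the expensive per-PID scan at most once per iteration, regardless of how many of its children are reaped in that iteration.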
Martin Pitt [Thu, 30 Jun 2016 13:44:22 +0000 (15:44 +0200)]
test: check resolved generated resolv.conf in networkd-test (#3628)
* test: check resolved generated resolv.conf in networkd-test
Directly verify the contents of /run/systemd/resolve/resolv.conf instead of
/etc/resolv.conf. The latter might be a plain file or a symlink to something
else (like Debian's resolvconf output), and in these cases we cannot make
strong assumptions about the contents.
Drop the "/etc/resolv.conf is a symlink" conditions and the "resolv.conf can
have at most three nameservers" alternatives, as we know that resolved always
adds all nameservers.
Explicitly start resolved at the start of a test to ensure that it is running.
* test: get along with existing system search domains in resolv.conf
The previous change has uncovered a bug in the tests: resolv.conf may already
contain search domains, which test_search_domains{,_too_long} didn't take into account.
As existing domains take up some of the "max 6 domains" and "max 255 chars" limits,
don't expect the last items from our test data to actually appear in the
output, just the first few.
journalctl: allow --boot=0 to DTRT with --file/--directory
--boot=0 magically meant "this boot", but when used with --file/--directory it
should simply refer to the last boot found in the specified journal. This way,
--boot and --list-boots are consistent.
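A sketch of how an offset could be mapped onto the list of boots found in a journal. The handling of 0 as "last boot found" follows the text above; the treatment of positive offsets (counting from the first boot) and negative offsets (counting back from the last) is an assumption of this sketch, and resolve_boot_offset() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: maps a boot offset onto an index into a
 * chronologically ordered list of 'n_boots' boots.
 *   0  -> last boot in the list
 *  -1  -> the boot before the last one, and so on
 *   1  -> first boot in the list, 2 -> second, and so on
 * Returns 0 on success, -1 if the offset points outside the list. */
static int resolve_boot_offset(size_t n_boots, int offset, size_t *ret) {
        assert(ret);

        if (offset <= 0) {
                /* Widen before negating so that INT_MIN cannot overflow. */
                size_t back = (size_t) -(long long) offset;

                if (back >= n_boots)
                        return -1;

                *ret = n_boots - 1 - back;
                return 0;
        }

        if ((size_t) offset > n_boots)
                return -1;

        *ret = (size_t) offset - 1;
        return 0;
}
```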
Luca Bruno [Tue, 28 Jun 2016 18:14:08 +0000 (20:14 +0200)]
man: clarify NotifyAccess overriding (#3620)
Type=notify has a magic overriding case where a NotifyAccess=none
is turned into a NotifyAccess=main for sanity purposes.
This makes docs more clear about such behavior:
https://github.com/systemd/systemd/blob/2787d83c28b7565ea6f80737170514e5e6186917/src/core/service.c#L650:L651
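The override can be sketched as follows. The enum and function names are hypothetical stand-ins chosen for this example and are not the identifiers used in src/core/service.c.

```c
#include <assert.h>

/* Hypothetical stand-ins for the service type and NotifyAccess= settings. */
typedef enum { SKETCH_TYPE_SIMPLE, SKETCH_TYPE_NOTIFY } ServiceTypeSketch;
typedef enum { SKETCH_NOTIFY_NONE, SKETCH_NOTIFY_MAIN, SKETCH_NOTIFY_ALL } NotifyAccessSketch;

/* Returns the NotifyAccess= value that is actually in effect. A Type=notify
 * service cannot work without notification access, so "none" is upgraded
 * to "main" in that case. Every other combination is left untouched. */
static NotifyAccessSketch effective_notify_access(ServiceTypeSketch type, NotifyAccessSketch configured) {
        if (type == SKETCH_TYPE_NOTIFY && configured == SKETCH_NOTIFY_NONE)
                return SKETCH_NOTIFY_MAIN;

        return configured;
}
```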
Martin Pitt [Tue, 28 Jun 2016 16:18:27 +0000 (18:18 +0200)]
resolved: add test for route-only domain filtering (#3609)
With commit 6f7da49d00 route-only domains do not get put into resolv.conf's
"search" list any more. Add a comment about the tri-state, to clarify its
semantics and why we are passing a bool parameter into an int type. Also add a
test case for it.
Tom Gundersen [Sun, 26 Jun 2016 21:05:27 +0000 (23:05 +0200)]
sd-device: device_id - set correctly for 'drivers'
The 'drivers' pseudo-subsystem needs special treatment. These pseudo-devices are
found under /sys/bus/drivers/, so they need the real subsystem encoded
in the device_id in order to be resolved.
The reader side already assumed this to be the case.
catalog: make support URL to show in shipped catalog entries configurable (#3597)
Let's allow distros to change the support URL to expose in catalog entries by
default. It doesn't make sense to direct end-users to the upstream project for
common errors.
This adds a --with-support-url= switch to configure, which allows overriding
the default at build-time.
man: document what Authenticated: in the systemd-resolve output actually means (#3571)
My educated guess is that #3561 was filed due to confusion around the
systemd-resolve "Data Authenticated:" output. Let's try to clean up the
confusion a bit, and document what it means in the man page.