units: disable /sys/fs/fuse/connections in private user namespaces (#4592)
The mount fails, even though CAP_SYS_ADMIN is granted.
Only file systems with FS_USERNS_MOUNT in .fs_flags may be mounted in userns,
and the patch to add that to fusectl was rejected [1]. It would be nice if we
could check if the kernel has FS_USERNS_MOUNT for a given fs type, since this
could change over time, but this information doesn't seem to be exported.
So let's just skip this mount in userns to avoid an error during boot.
akochetkov [Fri, 11 Nov 2016 17:50:46 +0000 (20:50 +0300)]
timesyncd: clear ADJ_MAXERROR to keep STA_UNSYNC cleared after jump adjust (#4626)
NTP uses a jump adjustment if the system has an incorrect time read from the RTC during boot.
It is desirable to update the RTC time as soon as NTP sets the correct system time.
Sometimes the kernel failed to update the RTC because STA_UNSYNC got set before the RTC
update finished. In that case the RTC time would not be updated for a long time.
This commit makes RTC updates stable.
When NTP does a jump time adjustment using ADJ_SETOFFSET, it clears the STA_UNSYNC flag.
If ADJ_MAXERROR is not cleared, STA_UNSYNC will be set again by the kernel within
1 second (by the second_overflow() function). The STA_UNSYNC flag prevents RTC updates
in the kernel. Sometimes the kernel is able to update the RTC within 1 second,
but sometimes it fails.
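As a rough illustration of the request described above, the sketch below fills in a struct timex for a jump adjustment with the error estimates reset. The helper name prepare_jump_adjust() and its parameters are assumptions made for this example and are not taken from timesyncd. It only prepares the structure; a caller would still have to hand it to clock_adjtime() or adjtimex() with sufficient privileges.

```c
#include <assert.h>
#include <string.h>
#include <sys/timex.h>
#include <time.h>

/* Hypothetical helper: prepare a one-shot jump adjustment request.
 * offset_nsec must already be normalized to [0, 1e9). */
static void prepare_jump_adjust(struct timex *tmx, time_t offset_sec, long offset_nsec) {
        assert(tmx);
        assert(offset_nsec >= 0 && offset_nsec < 1000000000L);

        memset(tmx, 0, sizeof(*tmx));

        /* ADJ_SETOFFSET steps the clock. ADJ_MAXERROR and ADJ_ESTERROR tell
         * the kernel to take maxerror and esterror from this request. */
        tmx->modes = ADJ_STATUS | ADJ_NANO | ADJ_SETOFFSET | ADJ_MAXERROR | ADJ_ESTERROR;

        /* With ADJ_NANO the kernel reads nanoseconds from the tv_usec field. */
        tmx->time.tv_sec = offset_sec;
        tmx->time.tv_usec = offset_nsec;

        /* Reset the error estimates so that STA_UNSYNC is not raised again
         * right after the jump. */
        tmx->maxerror = 0;
        tmx->esterror = 0;

        /* Request a status word without STA_UNSYNC. */
        tmx->status = 0;
}
```

Keeping the preparation separate from the actual system call makes the flag combination easy to inspect without needing CAP_SYS_TIME.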
basic/virt: fix userns check on CONFIG_USER_NS=n kernel (#4651)
ENOENT should be treated as "false", but because of the broken errno check it
was treated as an error. So ConditionVirtualization=user-namespaces probably
returned the correct answer, but only by accident.
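The intended errno handling can be sketched as follows. This is a minimal example and not the code from basic/virt: the function name probe_file() is hypothetical, and the path to probe is left to the caller because the commit message does not name it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the file can be opened, 0 if it does
 * not exist, and a negative errno value for any other failure. */
static int probe_file(const char *path) {
        int fd;

        assert(path);

        fd = open(path, O_RDONLY);
        if (fd < 0) {
                /* A missing file means "feature not available", not an error. */
                if (errno == ENOENT)
                        return 0;

                return -errno;
        }

        close(fd);
        return 1;
}
```

The point of the three-way return value is that a caller can distinguish "false" from a real failure instead of treating both as an error.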
This adds a new systemd fstab option x-systemd.mount-timeout. The option
adds a timeout value that specifies how long systemd waits for the mount
command to finish. It allows mounting huge btrfs volumes without issues.
This is equivalent to adding option TimeoutSec= to [Mount] section in a
mount unit file.
Djalal Harouni [Thu, 10 Nov 2016 17:11:37 +0000 (18:11 +0100)]
core:namespace: count and free failed paths inside chase_all_symlinks() (#4619)
This certainly fixes a bug that was introduced by PR
https://github.com/systemd/systemd/pull/4594 that intended to fix
https://github.com/systemd/systemd/issues/4567.
The fix was not complete. This patch makes sure that we count and free
all paths that fail inside chase_all_symlinks().
Susant Sahani [Fri, 4 Nov 2016 09:55:07 +0000 (15:25 +0530)]
Link: port to new ethtool ETHTOOL_xLINKSETTINGS
This patch defines a new ETHTOOL_GLINKSETTINGS/SLINKSETTINGS API,
handled by the new get_link_ksettings/set_link_ksettings.
This is a WIP version based on this [kernel
patch](https://patchwork.kernel.org/patch/8411401/).
So revert the default to the legacy hierarchy for now. Developers of the above
software can opt into the unified hierarchy with
"systemd.legacy_systemd_cgroup_controller=0".
busctl introspect: accept direction="out" for signals
According to the D-Bus spec (v0.29),
| The direction element on <arg> may be omitted, in which case it
| defaults to "in" for method calls and "out" for signals. Signals only
| allow "out" so while direction may be specified, it's pointless.
Therefore we still should accept a 'direction' attribute, even if it's
useless in reality.
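The rule quoted from the spec can be sketched as a small resolver. The function name resolve_arg_direction() and its signature are assumptions made for this example and are not busctl's actual parser.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: work out the effective direction of an <arg>.
 * attr is the value of the direction attribute, or NULL if it was omitted.
 * Returns 0 and stores "in" or "out" in *ret, or -EINVAL if the attribute
 * is not acceptable in this context. */
static int resolve_arg_direction(bool is_signal, const char *attr, const char **ret) {
        assert(ret);

        if (!attr) {
                /* Omitted: "in" for method calls, "out" for signals. */
                *ret = is_signal ? "out" : "in";
                return 0;
        }

        if (strcmp(attr, "out") == 0) {
                /* Redundant on signals, but still accepted. */
                *ret = "out";
                return 0;
        }

        if (strcmp(attr, "in") == 0 && !is_signal) {
                *ret = "in";
                return 0;
        }

        /* Signals only allow "out"; anything else is invalid. */
        return -EINVAL;
}
```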
Christian Hesse [Wed, 9 Nov 2016 03:01:26 +0000 (04:01 +0100)]
nspawn: fix condition for mounting resolv.conf (#4622)
The file /usr/lib/systemd/resolv.conf can be stale; it does not tell us
whether systemd-resolved is running.
So check for /run/systemd/resolve/resolv.conf as well, which is created
at runtime and hence is a better indication.
Djalal Harouni [Sun, 6 Nov 2016 22:31:55 +0000 (23:31 +0100)]
core: on DynamicUser= make sure that protecting sensitive paths is enforced (#4596)
This adds a variable that is always set to false to make sure that
protected paths inside the sandbox are always enforced and not ignored. The only
case when it is set to true is when DynamicUser=no and RootDirectory=/chroot
is set. This allows users to use more of our sandbox features inside RootDirectory=.
The only exception is ProtectSystem=full|strict and when DynamicUser=yes
is implied. Currently RootDirectory= is not fully compatible with these
for two reasons:
* /chroot/usr|etc has to be present on ProtectSystem=full
* /chroot// has to be a mount point on ProtectSystem=strict.
build-sys: fix appending of CFLAGS and define __SANE_USERSPACE_TYPES__
It's pointless to call AC_SUBST more than once on the same variable. Because
of all the copypasta, we were mixing CFLAGS and LDFLAGS.
… and the assertion in previous commit was wrong. PPC64 is a special snowflake.
__SANE_USERSPACE_TYPES__ is needed on PPC64 to make __u64 be llu, instead of
lu. Considering that both lu and llu are 64 bits, there's nothing sane about
this, maybe the flag should be called __INSANE_USERSPACE_TYPES__ instead. Sane
or not, this makes ppc64 kernel headers behave consistently with other
architectures. With this flag, no warnings are emitted at -O0 level.
Martin Pitt [Tue, 8 Nov 2016 04:31:55 +0000 (05:31 +0100)]
nspawn: fix exit code for --help and --version (#4609)
Commit b006762 inverted the initial exit code which is relevant for --help and
--version without a particular reason. For these special options, parse_argv()
returns 0 so that our main() immediately skips to the end without adjusting
"ret". Otherwise, if an actual container is being started, ret is set on error
in run(), which still provides the "non-zero exit on error" behaviour.
Martin Pitt [Mon, 7 Nov 2016 18:51:20 +0000 (19:51 +0100)]
tests: use less aggressive systemctl --wait timeout in TEST-03-JOBS (#4606)
If the "systemctl start" happens at an "unlucky" time such as 1000.9 seconds
and then e.g. runs for 2.6 s (sleep 2 plus the overhead of starting the unit
and waiting for it) the END_SEC would be 1003.5s which would round to 1004,
making the difference 4. On busier testbeds the overhead apparently can take a
bit more than 0.5s. The main point is really that it doesn't wait that much
longer, so "-le 4" seems perfectly fine. We allow up to 1.5s in the subsequent
"wait5fail" test below too.
In file included from ./src/basic/macro.h:415:0,
from ./src/shared/acl-util.h:28,
from src/coredump/coredump.c:36:
src/coredump/coredump.c: In function ‘submit_coredump’:
src/coredump/coredump.c:711:26: warning: format ‘%zu’ expects argument of type ‘size_t’, but argument 7 has type ‘uint64_t {aka long long unsigned int}’ [-Wformat=]
log_info("The core will not be stored: size %zu is greater than %zu (the configured maximum)",
^
./src/basic/log.h:175:82: note: in definition of macro ‘log_full_errno’
? log_internal(_level, _e, __FILE__, __LINE__, __func__, __VA_ARGS__) \
^~~~~~~~~~~
./src/basic/log.h:183:28: note: in expansion of macro ‘log_full’
#define log_info(...) log_full(LOG_INFO, __VA_ARGS__)
^~~~~~~~
src/coredump/coredump.c:711:17: note: in expansion of macro ‘log_info’
log_info("The core will not be stored: size %zu is greater than %zu (the configured maximum)",
^~~~~~~~
src/coredump/coredump.c:711:26: warning: format ‘%zu’ expects argument of type ‘size_t’, but argument 8 has type ‘uint64_t {aka long long unsigned int}’ [-Wformat=]
log_info("The core will not be stored: size %zu is greater than %zu (the configured maximum)",
^
./src/basic/log.h:175:82: note: in definition of macro ‘log_full_errno’
? log_internal(_level, _e, __FILE__, __LINE__, __func__, __VA_ARGS__) \
^~~~~~~~~~~
./src/basic/log.h:183:28: note: in expansion of macro ‘log_full’
#define log_info(...) log_full(LOG_INFO, __VA_ARGS__)
^~~~~~~~
src/coredump/coredump.c:711:17: note: in expansion of macro ‘log_info’
log_info("The core will not be stored: size %zu is greater than %zu (the configured maximum)",
^~~~~~~~
src/coredump/coredump.c:741:27: warning: format ‘%zu’ expects argument of type ‘size_t’, but argument 7 has type ‘uint64_t {aka long long unsigned int}’ [-Wformat=]
log_debug("Not generating stack trace: core size %zu is greater than %zu (the configured maximum)",
^
./src/basic/log.h:175:82: note: in definition of macro ‘log_full_errno’
? log_internal(_level, _e, __FILE__, __LINE__, __func__, __VA_ARGS__) \
^~~~~~~~~~~
./src/basic/log.h:182:28: note: in expansion of macro ‘log_full’
#define log_debug(...) log_full(LOG_DEBUG, __VA_ARGS__)
^~~~~~~~
src/coredump/coredump.c:741:17: note: in expansion of macro ‘log_debug’
log_debug("Not generating stack trace: core size %zu is greater than %zu (the configured maximum)",
^~~~~~~~~
src/coredump/coredump.c:741:27: warning: format ‘%zu’ expects argument of type ‘size_t’, but argument 8 has type ‘uint64_t {aka long long unsigned int}’ [-Wformat=]
log_debug("Not generating stack trace: core size %zu is greater than %zu (the configured maximum)",
^
./src/basic/log.h:175:82: note: in definition of macro ‘log_full_errno’
? log_internal(_level, _e, __FILE__, __LINE__, __func__, __VA_ARGS__) \
^~~~~~~~~~~
./src/basic/log.h:182:28: note: in expansion of macro ‘log_full’
#define log_debug(...) log_full(LOG_DEBUG, __VA_ARGS__)
^~~~~~~~
src/coredump/coredump.c:741:17: note: in expansion of macro ‘log_debug’
log_debug("Not generating stack trace: core size %zu is greater than %zu (the configured maximum)",
^~~~~~~~~
src/coredump/coredump.c:768:34: warning: format ‘%zu’ expects argument of type ‘size_t’, but argument 7 has type ‘uint64_t {aka long long unsigned int}’ [-Wformat=]
log_info("The core will not be stored: size %zu is greater than %zu (the configured maximum)",
^
./src/basic/log.h:175:82: note: in definition of macro ‘log_full_errno’
? log_internal(_level, _e, __FILE__, __LINE__, __func__, __VA_ARGS__) \
^~~~~~~~~~~
./src/basic/log.h:183:28: note: in expansion of macro ‘log_full’
#define log_info(...) log_full(LOG_INFO, __VA_ARGS__)
^~~~~~~~
src/coredump/coredump.c:768:25: note: in expansion of macro ‘log_info’
log_info("The core will not be stored: size %zu is greater than %zu (the configured maximum)",
^~~~~~~~
We don't have plural in the name of any other -util files and this
inconsistency trips me up every time I try to type this file name
from memory. "formats-util" is even hard to pronounce.
Djalal Harouni [Sun, 6 Nov 2016 21:51:49 +0000 (22:51 +0100)]
core: make RootDirectory= and ProtectKernelModules= work
Instead of having two fields inside the BindMount struct, where one is stack
based and the other one is on the heap, use one field to store the full path
and update it when we chase symlinks. This way we avoid dealing with
both at the same time.
This makes RootDirectory= work with ProtectHome= and ProtectKernelModules=yes.
Felipe Sateler [Sun, 6 Nov 2016 14:16:42 +0000 (11:16 -0300)]
delta: skip symlink paths when split-usr is enabled (#4591)
If systemd is built with --enable-split-usr, but the system is in fact a
merged-usr system, then systemd-delta gets all confused and reports
that all units and configuration files have been overridden.
Skip any prefix paths that are symlinks in this case.
core: add new RestrictNamespaces= unit file setting
This new setting permits restricting whether namespaces may be created and
managed by processes started by a unit. It installs a seccomp filter blocking
certain invocations of unshare(), clone() and setns().
RestrictNamespaces=no is the default, and does not restrict namespaces in any
way. RestrictNamespaces=yes takes away the ability to create or manage any kind
of namespace. "RestrictNamespaces=mnt ipc" restricts the creation of namespaces
so that only mount and IPC namespaces may be created/managed, but no other
kind of namespaces.
This setting should improve security quite a bit, as in particular user
namespacing was a major source of CVEs in the kernel in the past, and is
accessible to unprivileged processes. With this setting the entire attack
surface may be removed for system services that do not make use of namespaces.
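To make the three forms of the setting concrete, the sketch below turns a value such as "mnt ipc" into a mask of CLONE_NEW* flags that stay allowed. The function name, the lookup table and the choice of accepted namespace names are assumptions made for this example, not the parser used by systemd, and the seccomp filter itself is left out.

```c
#include <assert.h>
#include <errno.h>
#include <linux/sched.h>
#include <stddef.h>
#include <string.h>

#ifndef CLONE_NEWCGROUP
#define CLONE_NEWCGROUP 0x02000000 /* fallback for older kernel headers */
#endif

/* Illustrative name table; the set of accepted names is an assumption. */
static const struct {
        const char *name;
        unsigned long flag;
} namespace_table[] = {
        { "cgroup", CLONE_NEWCGROUP },
        { "ipc",    CLONE_NEWIPC    },
        { "mnt",    CLONE_NEWNS     },
        { "net",    CLONE_NEWNET    },
        { "pid",    CLONE_NEWPID    },
        { "user",   CLONE_NEWUSER   },
        { "uts",    CLONE_NEWUTS    },
};

#define N_NAMESPACES (sizeof(namespace_table) / sizeof(namespace_table[0]))

/* Hypothetical helper: computes the mask of namespace types that remain
 * allowed. "no" allows everything, "yes" allows nothing, and a list of
 * names allows only the listed types. */
static int parse_restrict_namespaces(const char *value, unsigned long *ret_allowed) {
        unsigned long allowed = 0;
        const char *p;
        size_t i;

        assert(value);
        assert(ret_allowed);

        if (strcmp(value, "yes") == 0) {
                *ret_allowed = 0;
                return 0;
        }

        if (strcmp(value, "no") == 0) {
                for (i = 0; i < N_NAMESPACES; i++)
                        allowed |= namespace_table[i].flag;
                *ret_allowed = allowed;
                return 0;
        }

        /* Walk the space-separated list word by word. */
        for (p = value;;) {
                size_t len;

                p += strspn(p, " ");
                if (*p == '\0')
                        break;

                len = strcspn(p, " ");
                for (i = 0; i < N_NAMESPACES; i++)
                        if (strlen(namespace_table[i].name) == len &&
                            strncmp(p, namespace_table[i].name, len) == 0)
                                break;
                if (i == N_NAMESPACES)
                        return -EINVAL; /* unknown namespace name */

                allowed |= namespace_table[i].flag;
                p += len;
        }

        *ret_allowed = allowed;
        return 0;
}
```

A filter built from such a mask would then reject unshare(), clone() and setns() calls that ask for any namespace type outside of it.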
Fixes:
$ ./libtool --mode execute valgrind --leak-check=full ./journalctl >/dev/null
==22309== Memcheck, a memory error detector
==22309== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==22309== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==22309== Command: /home/vagrant/systemd/.libs/lt-journalctl
==22309==
Hint: You are currently not seeing messages from other users and the system.
Users in groups 'adm', 'systemd-journal', 'wheel' can see all messages.
Pass -q to turn off this notice.
==22309==
==22309== HEAP SUMMARY:
==22309== in use at exit: 8,680 bytes in 4 blocks
==22309== total heap usage: 5,543 allocs, 5,539 frees, 9,045,618 bytes allocated
==22309==
==22309== 488 (56 direct, 432 indirect) bytes in 1 blocks are definitely lost in loss record 2 of 4
==22309== at 0x4C2BBAD: malloc (vg_replace_malloc.c:299)
==22309== by 0x6F37A0A: __new_var_obj_p (__libobj.c:36)
==22309== by 0x6F362F7: __acl_init_obj (acl_init.c:28)
==22309== by 0x6F37731: __acl_from_xattr (__acl_from_xattr.c:54)
==22309== by 0x6F36087: acl_get_file (acl_get_file.c:69)
==22309== by 0x4F15752: acl_search_groups (acl-util.c:172)
==22309== by 0x113A1E: access_check_var_log_journal (journalctl.c:1836)
==22309== by 0x113D8D: access_check (journalctl.c:1889)
==22309== by 0x115681: main (journalctl.c:2236)
==22309==
==22309== LEAK SUMMARY:
==22309== definitely lost: 56 bytes in 1 blocks
==22309== indirectly lost: 432 bytes in 1 blocks
==22309== possibly lost: 0 bytes in 0 blocks
==22309== still reachable: 8,192 bytes in 2 blocks
==22309== suppressed: 0 bytes in 0 blocks
Direct leak of 48492 byte(s) in 2694 object(s) allocated from:
#0 0x7fb4aba13e60 in malloc (/lib64/libasan.so.3+0xc6e60)
#1 0x7fb4ab5b2cc4 in malloc_multiply src/basic/alloc-util.h:70
#2 0x7fb4ab5b3194 in parse_field src/shared/logs-show.c:98
#3 0x7fb4ab5b4918 in output_short src/shared/logs-show.c:347
#4 0x7fb4ab5b7cb7 in output_journal src/shared/logs-show.c:977
#5 0x5650e29cd83d in main src/journal/journalctl.c:2581
#6 0x7fb4aabdb730 in __libc_start_main (/lib64/libc.so.6+0x20730)
SUMMARY: AddressSanitizer: 48492 byte(s) leaked in 2694 allocation(s).
Follow up for #4546:
> @@ -848,8 +848,7 @@ static int bus_kernel_make_message(sd_bus *bus, struct kdbus_msg *k) {
if (k->src_id == KDBUS_SRC_ID_KERNEL)
bus_message_set_sender_driver(bus, m);
else {
- xsprintf(m->sender_buffer, ":1.%llu",
- (unsigned long long)k->src_id);
+ xsprintf(m->sender_buffer, ":1.%"PRIu64, k->src_id);
This produces:
src/libsystemd/sd-bus/bus-kernel.c: In function ‘bus_kernel_make_message’:
src/libsystemd/sd-bus/bus-kernel.c:851:44: warning: format ‘%lu’ expects argument of type ‘long
unsigned int’, but argument 4 has type ‘__u64 {aka long long unsigned int}’ [-Wformat=]
xsprintf(m->sender_buffer, ":1.%"PRIu64, k->src_id);
^
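The warning above appears because __u64 and uint64_t need not be the same underlying type on every architecture, so PRIu64 may not match a __u64 argument. A minimal sketch of a type-correct variant follows, using the explicit cast that the removed lines of the diff relied on. The function name format_sender() and the error code are assumptions made for this example, and plain snprintf() stands in for xsprintf().

```c
#include <assert.h>
#include <errno.h>
#include <linux/types.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format a sender name from a kernel-provided id. */
static int format_sender(char *buf, size_t size, __u64 src_id) {
        int n;

        assert(buf);

        /* The cast makes the argument type match "%llu" no matter how
         * __u64 is defined on the target architecture. */
        n = snprintf(buf, size, ":1.%llu", (unsigned long long) src_id);
        if (n < 0 || (size_t) n >= size)
                return -ENOBUFS;

        return 0;
}
```

Casting to uint64_t and keeping PRIu64 would be equally type-correct; the important part is that the format specifier and the argument type are chosen together.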
If we encounter the (unlikely) situation where the path to the new root
combined with the path of a mount to be moved exceeds the maximum path length,
we shouldn't crash, but fail this path instead.
This reverts some changes introduced in d054f0a4d4.
xsprintf should be used in cases where we calculated the right buffer
size by hand (using DECIMAL_STRING_MAX and such), and never in cases where
we are printing externally specified strings of arbitrary length.
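For strings of arbitrary, externally supplied length, the alternative to an asserting formatter is to check for truncation and return an error. The sketch below shows that pattern; the function name join_path_checked() and the chosen error codes are assumptions made for this example, not the code touched by these commits.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: concatenate root and path into a fixed-size buffer.
 * Returns 0 on success and -ENAMETOOLONG if the result does not fit,
 * instead of aborting the process. */
static int join_path_checked(char *buf, size_t size, const char *root, const char *path) {
        int n;

        assert(buf);
        assert(root);
        assert(path);

        n = snprintf(buf, size, "%s%s", root, path);
        if (n < 0)
                return -EINVAL;

        /* snprintf() returns the length it would have written, so a value
         * of size or more means the output was truncated. */
        if ((size_t) n >= size)
                return -ENAMETOOLONG;

        return 0;
}
```

An assertion is reasonable when the buffer size was computed to fit a bounded value such as a decimal integer; a runtime check like this one is the safer choice when the inputs come from outside.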
Unfortunately, github drops the original committer when a PR is "squashed" (even
if it is only a single commit) and replaces it with some rubbish
github-specific user id. Thus, to make the contributors list somewhat useful,
update the .mailmap file and undo all the weirdness github applied there.
pid1: fix fd memleak when we hit FileDescriptorStoreMax limit
Since service_add_fd_store() already does the check, remove the redundant check
from service_add_fd_store_set().
Also, print a warning when repopulating FDStore after daemon-reexec and we hit
the limit. This is a user visible issue, so we should not discard fds silently.
(Note that service_deserialize_item is impacted by the return value from
service_add_fd_store(), but we rely on the general error message, so the caller
does not need to be modified, and does not show up in the diff.)
core: change mount_synthesize_root() return to int
Let's propagate the error here, instead of eating it up early.
In a later change we should probably also change mount_enumerate() to propagate
errors up, but that would mean we'd have to change the unit vtable, and thus
change all unit types, hence is quite an invasive change.
nspawn: if we set up a loopback device, try to mount it with "discard"
Let's make sure that our loopback files remain sparse, hence let's set
"discard" as mount option on file systems that support it if the backing device
is a loopback.