audit: introduce audit_session_is_valid() and make use of it everywhere
Let's add a proper validation function, since validation isn't entirely
trivial. Make use of it where applicable. Also make use of
AUDIT_SESSION_INVALID where we need a marker for an invalid audit
session.
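A minimal sketch of what such a validity check could look like. It assumes session IDs are `uint32_t`, that 0 is never allocated, and that `AUDIT_SESSION_INVALID` is the all-ones value the kernel reports when no session has been assigned. These are our assumptions, not details taken from the commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed marker for "no audit session": the kernel reports the all-ones
 * value (4294967295) when no session ID has been assigned. */
#define AUDIT_SESSION_INVALID ((uint32_t) -1)

/* True only for IDs that can name a real session. 0 is assumed never to be
 * allocated, and AUDIT_SESSION_INVALID means "unset". */
static inline bool audit_session_is_valid(uint32_t id) {
        return id > 0 && id != AUDIT_SESSION_INVALID;
}
```

Open-coded checks such as `id == 0 || id == (uint32_t) -1` in individual callers then collapse into a single call.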
The long man page paragraph got it right: the tool is for escaping systemd unit
names, not just system unit names. Also fix the short man page paragraph
and the --help text.
Nicolas Iooss [Mon, 31 Jul 2017 15:45:33 +0000 (17:45 +0200)]
namespace: keep selinuxfs mounted read-write with ProtectKernelTunables (#5741)
When a service unit uses "ProtectKernelTunables=yes", it currently
remounts /sys/fs/selinux read-only. This makes libselinux report SELinux
state as "disabled", because most SELinux features are not usable. For
example, it is not possible to validate security contexts (with
security_check_context_raw() or /sys/fs/selinux/context). This behavior
of libselinux has been described in
http://danwalsh.livejournal.com/73099.html and confirmed in a recent
email, https://marc.info/?l=selinux&m=149220233032594&w=2 .
Since commit 0c28d51ac849 ("units: further lock down our long-running
services"), the systemd-localed unit uses ProtectKernelTunables=yes.
Nevertheless, this service needs to use the libselinux API in order to create
/etc/vconsole.conf, /etc/locale.conf... with the right SELinux contexts.
This is broken when /sys/fs/selinux is mounted read-only in the mount
namespace of the service.
Make SELinux-aware systemd services work again when they are using
ProtectKernelTunables=yes by keeping selinuxfs mounted read-write.
core: Do not fail perpetual mount units without fragment (#6459)
mount_load does not require fragment files to be present in order to
load mount units which are perpetual, or come from /proc/self/mountinfo.
mount_verify should do the same, otherwise a synthesized '-.mount' would
be marked as failed with "No such file or directory", as it is perpetual
but not marked to come from /proc/self/mountinfo at this point.
This happens for the user instance, and I suspect it was the cause of #5375
for the system instance, without gpt-generator.
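The rule can be sketched as a small predicate that the verify step shares with the load step. The `MountUnit` struct and its fields are hypothetical stand-ins for systemd's internal unit state. The point is that a missing fragment is only an error for units that are neither perpetual nor known from /proc/self/mountinfo.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a mount unit's load state. */
typedef struct {
        const char *fragment_path;      /* NULL if no unit file was found */
        bool perpetual;                 /* e.g. the synthesized -.mount */
        bool from_proc_self_mountinfo;
} MountUnit;

/* Same condition the load step uses: only units that are neither
 * perpetual nor seen in /proc/self/mountinfo need a fragment. */
static bool mount_needs_fragment(const MountUnit *m) {
        return !m->perpetual && !m->from_proc_self_mountinfo;
}

/* Returns 0 if the unit is acceptable, -ENOENT ("No such file or
 * directory") if it should be marked as failed. */
static int mount_verify_sketch(const MountUnit *m) {
        if (!m->fragment_path && mount_needs_fragment(m))
                return -ENOENT;
        return 0;
}
```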
core: properly handle deserialization of unknown unit types (#6476)
We just abort startup, without printing any error. Make sure we always
print something, and when we cannot deserialize some unit, just ignore it and
continue.
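A minimal sketch of the "log and skip" behaviour. It assumes a hypothetical serialization in which each entry is a unit name whose type is the suffix after the last dot. The type list, the log text, and `deserialize_units()` are illustrative, not systemd's serialization code.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of known unit type suffixes. */
static const char *const known_types[] = {
        "service", "socket", "target", "device", "mount", "automount",
        "swap", "timer", "path", "slice", "scope",
};

static bool unit_type_known(const char *name) {
        const char *dot = strrchr(name, '.');

        if (!dot)
                return false;
        for (size_t i = 0; i < sizeof known_types / sizeof known_types[0]; i++)
                if (strcmp(dot + 1, known_types[i]) == 0)
                        return true;
        return false;
}

/* Deserializes what it can: an unknown unit is reported and skipped
 * instead of aborting startup. Returns the number of units accepted. */
static size_t deserialize_units(const char *const *names, size_t n) {
        size_t accepted = 0;

        for (size_t i = 0; i < n; i++) {
                if (!unit_type_known(names[i])) {
                        fprintf(stderr, "Failed to deserialize unit %s, ignoring.\n", names[i]);
                        continue;
                }
                accepted++;
        }
        return accepted;
}
```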
rescue.target does not work well, and we don't have a suitable emergency
shell unit that can be started on existing systems right now. So let's just
remove the recommendation for now.
journal-remote: use MHD_OPTION_STRICT_FOR_CLIENT if MHD_USE_PEDANTIC_CHECKS is deprecated
The option MHD_OPTION_STRICT_FOR_CLIENT has been available since libmicrohttpd-0.9.54, and
MHD_USE_PEDANTIC_CHECKS will be deprecated in the future.
This adds support for both options.
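One way to support both options is to select on libmicrohttpd's `MHD_VERSION`, which spells the version in hex digits (0.9.54 is `0x00095400`). The sketch expresses that choice as a runtime function so it can be exercised without the library. The enum and `strictness_mechanism()` are hypothetical. Real code could make the same decision at compile time with `#if MHD_VERSION >= 0x00095400` around the option list and the start flags.

```c
#include <stdint.h>

/* Hypothetical names for the two ways of requesting strict parsing. */
typedef enum {
        STRICT_VIA_OPTION,  /* pass MHD_OPTION_STRICT_FOR_CLIENT as an option */
        STRICT_VIA_FLAG,    /* OR MHD_USE_PEDANTIC_CHECKS into the start flags */
} StrictnessMechanism;

/* First libmicrohttpd version that provides MHD_OPTION_STRICT_FOR_CLIENT. */
#define MHD_VERSION_WITH_STRICT_FOR_CLIENT 0x00095400u

static StrictnessMechanism strictness_mechanism(uint32_t mhd_version) {
        return mhd_version >= MHD_VERSION_WITH_STRICT_FOR_CLIENT
                ? STRICT_VIA_OPTION
                : STRICT_VIA_FLAG;
}
```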
journal-gateway: use MHD_USE_POLL_INTERNAL_THREAD instead of MHD_USE_POLL
Since libmicrohttpd-0.9.53, the option MHD_USE_THREAD_PER_CONNECTION requires
MHD_USE_INTERNAL_POLLING_THREAD, which MHD_USE_POLL_INTERNAL_THREAD includes.
If MHD_USE_POLL is used instead of MHD_USE_POLL_INTERNAL_THREAD, then
the library outputs the following warning:
```
Warning: MHD_USE_THREAD_PER_CONNECTION must be used only with
MHD_USE_INTERNAL_POLLING_THREAD. Flag MHD_USE_INTERNAL_POLLING_THREAD was added.
Consider setting MHD_USE_INTERNAL_POLLING_THREAD explicitly.
```
The option MHD_USE_POLL_INTERNAL_THREAD is defined as
`MHD_USE_POLL_INTERNAL_THREAD = MHD_USE_POLL | MHD_USE_INTERNAL_POLLING_THREAD,`
So, let's use MHD_USE_POLL_INTERNAL_THREAD instead of MHD_USE_POLL.
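The relationship between the flags can be checked with plain bit arithmetic. The numeric values below are placeholders chosen for illustration, not libmicrohttpd's real constants. Only the composition `MHD_USE_POLL_INTERNAL_THREAD = MHD_USE_POLL | MHD_USE_INTERNAL_POLLING_THREAD` comes from the text above. `needs_implicit_polling_thread()` is a hypothetical helper that mirrors the condition behind the quoted warning.

```c
#include <stdbool.h>

/* Placeholder bit values; the real ones come from <microhttpd.h>. */
enum {
        EX_USE_THREAD_PER_CONNECTION   = 1 << 0,
        EX_USE_INTERNAL_POLLING_THREAD = 1 << 1,
        EX_USE_POLL                    = 1 << 2,
        /* Composition as quoted above. */
        EX_USE_POLL_INTERNAL_THREAD    = EX_USE_POLL | EX_USE_INTERNAL_POLLING_THREAD,
};

/* True when the library would have to add the polling-thread flag itself
 * and print the warning quoted above. */
static bool needs_implicit_polling_thread(unsigned flags) {
        return (flags & EX_USE_THREAD_PER_CONNECTION) &&
               !(flags & EX_USE_INTERNAL_POLLING_THREAD);
}
```

With `EX_USE_POLL` alone the check fires, and with `EX_USE_POLL_INTERNAL_THREAD` it does not. That is why switching flags silences the warning without changing the polling backend.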
the number of suggestions matches the one the function returns.
For consistency with the other internal functions, it should use the first argument
instead of the global variable $mode.
[zj: add commit message to make it sound like we know what we're doing]
README: document that max_bonds=0 is the way to go for bonding.ko
Anything else is just annoying, hence let's list this among the
requirements we make on the kernel in order to minimize confusion
leading to #6184 and suchlike.
DISTRO_PORTING: document that distros may/should change fallback DNS as well as fallback NTP if they wish
The DNS and NTP fallback server situation is pretty similar, and
downstream distros might want to change both to whatever they need,
hence mention them both.
This was done by autogen.sh previously and was dropped in 72cdb3e783174dcf9223a49f03e3b0e2ca95ddb8. Let's add it back.
The meson configuration step is the only reasonable place.
Note that this only works for the most standard git dirs, e.g.
the hook will not be installed if git worktree is used or if
$GIT_DIR is specified, etc. I think that's OK because most of
the time meson will be run at least once in the original cloned
dir.
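The install step and its limitation can be sketched with POSIX calls. This is only an illustration: the actual step lives in the meson configuration, and both the function name and the hook script path (`tools/git-pre-commit`) are hypothetical. Requiring `<srcdir>/.git/hooks` to be a directory is what makes a linked worktree, where `.git` is a plain file, fall through silently.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Links <srcdir>/.git/hooks/pre-commit to a hook script in the tree.
 * Returns 1 if the link was created, 0 if skipped, -errno on failure. */
static int install_pre_commit_hook(const char *srcdir) {
        char hooks[4096], hook[4096];
        struct stat st;

        if ((size_t) snprintf(hooks, sizeof hooks, "%s/.git/hooks", srcdir) >= sizeof hooks)
                return -ENAMETOOLONG;
        /* In a worktree .git is a file, so this check fails and we skip. */
        if (stat(hooks, &st) < 0 || !S_ISDIR(st.st_mode))
                return 0;

        if ((size_t) snprintf(hook, sizeof hook, "%s/pre-commit", hooks) >= sizeof hook)
                return -ENAMETOOLONG;
        /* Never overwrite a hook the developer already has. */
        if (lstat(hook, &st) == 0)
                return 0;

        /* Relative target, resolved from .git/hooks/ (hypothetical path). */
        if (symlink("../../tools/git-pre-commit", hook) < 0)
                return -errno;
        return 1;
}
```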
Some kdbus_flag and memfd related parts are left behind, because they
are entangled with the "legacy" dbus support.
test-bus-benchmark is switched to "manual". It was already broken before
(in the non-kdbus mode) but apparently nobody noticed. Hopefully it can
be fixed later.
Since busname units are only useful with kdbus, they weren't actively
used. This was dead code, only compile-tested. If busname units are
ever added back, it'll be cleaner to start from scratch (possibly reverting
parts of this patch).
When we add another name to a unit (by following an alias), we need to
reload all drop-ins. This is necessary to load any additional drop-ins
found in the directories named after the alias.
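Each name a unit is known under contributes at least its own `<name>.d/` drop-in directory. This is why following an alias can bring in new drop-ins. A minimal sketch with a single search path and hypothetical helpers `dropin_dir_for()` and `dropin_dirs()`:

```c
#include <stdio.h>
#include <string.h>

/* Builds "<search_path>/<unit_name>.d". Returns 0, or -1 if truncated. */
static int dropin_dir_for(const char *search_path, const char *unit_name,
                          char *buf, size_t size) {
        int n = snprintf(buf, size, "%s/%s.d", search_path, unit_name);
        return (n < 0 || (size_t) n >= size) ? -1 : 0;
}

/* Every name contributes one directory, so adding an alias adds a
 * directory that must be scanned: hence the reload. Returns the count. */
static size_t dropin_dirs(const char *search_path, const char *const *names,
                          size_t n_names, char out[][256], size_t max_out) {
        size_t n = 0;

        for (size_t i = 0; i < n_names && n < max_out; i++)
                if (dropin_dir_for(search_path, names[i], out[n], sizeof out[n]) == 0)
                        n++;
        return n;
}
```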
tree-wide: fput[cs]() → fput[cs]_unlocked() wherever that makes sense (#6396)
As a follow-up for db3f45e2d2586d78f942a43e661415bc50716d11 let's do the
same for all other cases where we create a FILE* with local scope and
know that hence no other threads can have access to it.
For most cases this shouldn't change much really, but this should speed
up dbus introspection and calendar time formatting a bit.
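The pattern looks roughly like this. A `FILE*` created by `open_memstream()` in local scope is never shared with another thread, so the unlocked variants can skip per-call stream locking. `fputs_unlocked()` is a GNU extension, hence `_GNU_SOURCE`. `join_words()` is a hypothetical example, not one of the functions the commit touched.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

/* Joins words with single spaces into a newly allocated string.
 * Returns NULL on failure; the caller frees the result. */
static char *join_words(const char *const *words, size_t n) {
        char *buf = NULL;
        size_t size = 0;
        FILE *f = open_memstream(&buf, &size);

        if (!f)
                return NULL;

        /* f is local to this function and no other thread can see it,
         * so taking the stream lock on every call would be wasted work. */
        for (size_t i = 0; i < n; i++) {
                if (i > 0)
                        fputc_unlocked(' ', f);
                fputs_unlocked(words[i], f);
        }

        /* fclose() finalizes buf and size. */
        if (fclose(f) != 0) {
                free(buf);
                return NULL;
        }
        return buf;
}
```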
nspawn: downgrade warning when we get sd_notify() message from unexpected process (#6416)
Given that we set NOTIFY_SOCKET unconditionally, it's not surprising that
processes way down the process tree think it's smart to send us a
notification message.
It's still useful to keep this message for debugging, but it
shouldn't be shown by default.
tree-wide: make use of getpid_cached() wherever we can
This moves pretty much all uses of getpid() over to getpid_cached(). I
didn't specifically check whether the optimization is worth it for each
replacement, but in order to keep things simple and systematic I
switched over everything at once.
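A minimal sketch of the caching idea, assuming a single-threaded caller. The PID is looked up once, and the cache is reset in forked children via `pthread_atfork()`, because a child must not report its parent's PID. This illustrates the technique only and is not systemd's implementation. A production version would also have to handle concurrent first calls.

```c
#include <pthread.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

static pid_t cached_pid = 0;          /* 0 means "not looked up yet" */
static bool atfork_installed = false;

/* Runs in the child after fork(): its PID differs, so drop the cache. */
static void reset_cached_pid(void) {
        cached_pid = 0;
}

/* Not thread-safe as written: concurrent first calls would race. */
static pid_t getpid_cached(void) {
        if (cached_pid != 0)
                return cached_pid;

        if (!atfork_installed) {
                if (pthread_atfork(NULL, NULL, reset_cached_pid) != 0)
                        return getpid();   /* cannot cache safely */
                atfork_installed = true;
        }

        cached_pid = getpid();
        return cached_pid;
}
```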
Harald Hoyer [Thu, 20 Jul 2017 17:13:09 +0000 (19:13 +0200)]
call chase_symlinks without the /sysroot prefix (#6411)
In case fstab-generator is called in the initrd, chase_symlinks()
returns the canonical path "/sysroot/sysroot/<mountpoint>" if the
"/sysroot" prefix is already present in the path.
This patch skips the "/sysroot" prefix for the chase_symlinks() call,
because "/sysroot" is already the root directory and chase_symlinks()
prepends the root directory to the canonical path it returns.
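The fix amounts to stripping the root prefix before handing the path to a resolver that prepends the root itself. A minimal sketch follows. `skip_root_prefix()` is a hypothetical helper, not the code in fstab-generator. It matches whole path components, so `/sysrootfoo` is not mistaken for a path under `/sysroot`.

```c
#include <stddef.h>
#include <string.h>

/* If `path` lies under `root` (by whole components), returns the remainder,
 * always starting with '/'; otherwise returns `path` unchanged. */
static const char *skip_root_prefix(const char *path, const char *root) {
        size_t n = strlen(root);

        if (strncmp(path, root, n) != 0)
                return path;
        if (path[n] == '\0')
                return "/";          /* the root directory itself */
        if (path[n] == '/')
                return path + n;     /* keep the leading slash */
        return path;                 /* e.g. "/sysrootfoo": not under root */
}
```

The resolver is then called with the root `/sysroot` and the stripped path, so the prefix appears exactly once in the result.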
test-unit-name: setup fake runtime directory before starting manager (#6401)
Since 3536f49e8fa281539798a7bc5004d73302f39673, manager_new() in
user mode requires XDG_RUNTIME_DIR to be set. So this commit adds a call
to setup_fake_runtime_directory() at the beginning of the test.
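A sketch of what such a helper might do: create a private temporary directory and export it as `XDG_RUNTIME_DIR` before `manager_new()` runs. The commit only names the function. The body below is our assumption for illustration. It uses POSIX `mkdtemp()` and `setenv()` and leaves cleanup to the caller.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical body: creates a private temp dir (mkdtemp() uses mode 0700)
 * and exports it as XDG_RUNTIME_DIR. Writes the path into buf.
 * Returns 0 on success, -1 on failure. */
static int setup_fake_runtime_directory(char *buf, size_t size) {
        static const char template[] = "/tmp/test-runtime-XXXXXX";

        if (size < sizeof template)
                return -1;
        memcpy(buf, template, sizeof template);
        if (!mkdtemp(buf))
                return -1;
        return setenv("XDG_RUNTIME_DIR", buf, 1);
}
```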
Also, it fixes the meson status output, which was looking only for HAVE_ and ENABLE_
prefixes (the define under meson was OK; just the summary message was
wrong).
build-sys: add basic support for ./configure && make && make install
This adds the basic make support required by
https://github.com/cgwalters/build-api. The CFLAGS, CXXFLAGS, and DESTDIR variables are
supported:
./configure CFLAGS=... CXXFLAGS=... && make && make install DESTDIR=
Automatic rebuilding is removed: it doesn't play well with ninja, because
ninja always writes logs, and even if nothing needs to be built, it will
make the log file owned by root. So let's just remove this, and say that
the user must always do the build first.
I'm also keeping make for the tests, because ninja doesn't play well with
sudo.
Since the build directory is arbitrary, it needs to be specified, e.g.
sudo make BUILD_DIR=/home/zbyszek/src/systemd/build1 -C test/TEST-01-BASIC/
This introduces {State,Cache,Log,Configuration}Directory=, which are
similar to RuntimeDirectory=. They create the directories under
/var/lib, /var/cache, /var/log, or /etc, respectively, with the mode
specified in {State,Cache,Log,Configuration}DirectoryMode=.
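The mapping from directive to base directory fits in a small table. The sketch below is hypothetical: `directory_for()` and the table are illustrative, they cover only the system-manager locations listed above plus /run for RuntimeDirectory=, and directory creation with the configured mode is left out. For example, it resolves `StateDirectory=foo` to `/var/lib/foo`.

```c
#include <stdio.h>
#include <string.h>

typedef struct {
        const char *directive;
        const char *base;
} DirectoryType;

/* System-manager base directories. */
static const DirectoryType directory_types[] = {
        { "RuntimeDirectory",       "/run"       },
        { "StateDirectory",         "/var/lib"   },
        { "CacheDirectory",         "/var/cache" },
        { "LogDirectory",           "/var/log"   },
        { "ConfigurationDirectory", "/etc"       },
};

/* Builds "<base>/<name>" for a directive such as StateDirectory=<name>.
 * Returns 0 on success, -1 for an unknown directive or a short buffer. */
static int directory_for(const char *directive, const char *name,
                         char *buf, size_t size) {
        for (size_t i = 0; i < sizeof directory_types / sizeof directory_types[0]; i++) {
                if (strcmp(directive, directory_types[i].directive) != 0)
                        continue;
                int n = snprintf(buf, size, "%s/%s", directory_types[i].base, name);
                return (n < 0 || (size_t) n >= size) ? -1 : 0;
        }
        return -1;
}
```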
Martin Pitt [Mon, 17 Jul 2017 22:06:35 +0000 (00:06 +0200)]
tests: ignore router state in networkd test (#6390)
In networkd-test.py, don't assert that the router state is "routable".
While it should eventually become that, we don't wait for it, and thus
at that point it is often still "carrier" or "degraded". It is also not
really relevant as this only tests the "client" side interface.
I think it would be confusing for JobRunningTimeoutSec= to have different
syntax than TimeoutSec= and JobTimeoutSec=, so this patch implements the
second option.
Michal Sekletar [Mon, 17 Jul 2017 08:04:37 +0000 (10:04 +0200)]
journald: make sure we retain all stream fds across restarts (#6348)
Currently we set 4096 as the maximum number of stream connections that
we accept. However, the maximum number of file descriptors that systemd is
willing to accept from us is just 1024. This means we can't retain all
the stream connections that we accepted. Hence, bump the fd limit in the
unit file so that systemd holds all stream fds open while we are
restarted.
I messed up when adding the definitions in 4278d1f5310f5acb4c6a6788233625234edb5145.
Unfortunately I didn't have the hardware at hand and went by
looking at the kernel headers.
So don't even try to add the filter, to reduce noise.
The test is updated to skip calling _sysctl because the kernel prints
an oops-like message that is confusing and unhelpful:
core: support "nsdelegate" cgroup v2 mount option (#6294)
cgroup namespace wasn't useful for delegation because it allowed resource
control interface files (e.g. memory.high) to be written from inside the
namespace; this allowed the namespace parent's resource distribution to be
disturbed by its namespace-scoped children.
A new mount option, "nsdelegate", was added to cgroup v2 to address this issue.
The flag is meaningful only when mounting cgroup v2 in the init namespace and
makes a cgroup namespace a delegation boundary. The kernel feature is pending
for v4.13.
This should have been the default behavior on cgroup namespaces and this commit
makes systemd try "nsdelegate" first when trying to mount cgroup v2 and fall
back if the option is not supported.
Note that this risks breaking usages which depend on modifying the
parent's resource settings from the namespace root. That isn't a valid thing
to do, but such usages may still exist.
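The "try nsdelegate first, fall back otherwise" logic can be sketched around `mount(2)`. To keep it testable without root, the mount function is passed in as a pointer with `mount(2)`'s signature, and two small test doubles are included. The sketch assumes that a kernel without the feature rejects the option with `EINVAL`. `mount_cgroup2()`, `MountFn`, and the fakes are hypothetical names, not systemd's mount-setup code.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mount.h>

/* Same signature as mount(2), so the real mount can be passed in. */
typedef int (*MountFn)(const char *source, const char *target,
                       const char *fstype, unsigned long flags,
                       const void *data);

/* Mounts cgroup2 at `where`, preferring "nsdelegate". Returns 1 if mounted
 * with nsdelegate, 0 if mounted without it, -errno on failure. */
static int mount_cgroup2(MountFn do_mount, const char *where) {
        const unsigned long flags = MS_NOSUID | MS_NOEXEC | MS_NODEV;

        if (do_mount("cgroup2", where, "cgroup2", flags, "nsdelegate") == 0)
                return 1;
        if (errno != EINVAL)        /* assumed errno for an unknown option */
                return -errno;

        /* Older kernel: mount without the delegation boundary. */
        if (do_mount("cgroup2", where, "cgroup2", flags, NULL) == 0)
                return 0;
        return -errno;
}

/* Test doubles: a kernel that knows "nsdelegate" and one that does not. */
static int fake_mount_new_kernel(const char *source, const char *target,
                                 const char *fstype, unsigned long flags,
                                 const void *data) {
        (void) source; (void) target; (void) fstype; (void) flags; (void) data;
        return 0;
}

static int fake_mount_old_kernel(const char *source, const char *target,
                                 const char *fstype, unsigned long flags,
                                 const void *data) {
        (void) source; (void) target; (void) fstype; (void) flags;
        if (data && strcmp((const char *) data, "nsdelegate") == 0) {
                errno = EINVAL;
                return -1;
        }
        return 0;
}
```

On a real system, running as root, the call would be `mount_cgroup2(mount, "/sys/fs/cgroup")`.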