When we add another name to a unit (by following an alias), we need to
reload all drop-ins. This is necessary to load any additional dropins
found in the dirs created from the alias name.
tree-wide: fput[cs]() → fput[cs]_unlocked() wherever that makes sense (#6396)
As a follow-up for db3f45e2d2586d78f942a43e661415bc50716d11 let's do the
same for all other cases where we create a FILE* with local scope and
know that no other threads hence can have access to it.
For most cases this shouldn't change much really, but this should speed
dbus introspection and calender time formatting up a bit.
nspawn: downgrade warning when we get sd_notify() message from unexpected process (#6416)
Given that we set NOTIFY_SOCKET unconditionally it's not surprising that
processes way down the process tree think it's smart to send us a
notification message.
It's still useful to keep this message, for debugging things, but it
shouldn't be generated by default.
tree-wide: make use of getpid_cached() wherever we can
This moves pretty much all uses of getpid() over to getpid_raw(). I
didn't specifically check whether the optimization is worth it for each
replacement, but in order to keep things simple and systematic I
switched over everything at once.
Harald Hoyer [Thu, 20 Jul 2017 17:13:09 +0000 (19:13 +0200)]
call chase_symlinks without the /sysroot prefix (#6411)
In case fstab-generator is called in the initrd, chase_symlinks()
returns with a canonical path "/sysroot/sysroot/<mountpoint>", if the
"/sysroot" prefix is present in the path.
This patch skips the "/sysroot" prefix for the chase_symlinks() call,
because "/sysroot" is already the root directory and chase_symlinks()
prepends the root directory in the canonical path returned.
test-unit-name: setup fake runtime directory before starting manager (#6401)
Since 3536f49e8fa281539798a7bc5004d73302f39673, manager_new() in
user mode requires XDG_RUNTIME_DIR is set. So, in this commit,
setup_fake_runtime_directory() is added in the beginning of test.
Also, it fixes meson status output which was looking for HAVE_ and ENABLE_
prefixes only (the define under meson was OK, just the summary message was
wrong.)
build-sys: add basic support for ./configure && make && make install
This adds the basic make support required by
https://github.com/cgwalters/build-api. CFLAGS, CXXFLAGS, DESTDIR variables are
supported:
./configure CFLAGS=... CXXFLAGS=... && make && make install DESTDIR=
Automatic rebuilding is removed: it doesn't play well with ninja, because
ninja always writes logs, and even if nothing needs to be built, it will
make the log file owned by root. So let's just remove this, and say that
the user must always do the build first.
I'm also keeping make for the tests, because ninja doesn't play well with
sudo.
Since the build directory is arbitrary, it needs to be specified, e.g.
sudo make BUILD_DIR=/home/zbyszek/src/systemd/build1 -C test/TEST-01-BASIC/
This introduces {State,Cache,Log,Configuration}Directory= those are
similar to RuntimeDirectory=. They create the directories under
/var/lib, /var/cache/, /var/log, or /etc, respectively, with the mode
specified in {State,Cache,Log,Configuration}DirectoryMode=.
Martin Pitt [Mon, 17 Jul 2017 22:06:35 +0000 (00:06 +0200)]
tests: ignore router state in networkd test (#6390)
In networkd-test.py, don't assert that the router state is "routable".
While it should eventually become that, we don't wait for it, and thus
at that point it often is "carrier" or "degrated" still. It is also not
really relevant as this only tests the "client" side interface.
I think it would be confusing for JobRunningTimeoutSec= to have different
syntax then TimeoutSec= and JobTimeoutSec=, so this patch implements the
second option.
Michal Sekletar [Mon, 17 Jul 2017 08:04:37 +0000 (10:04 +0200)]
journald: make sure we retain all stream fds across restarts (#6348)
Currently we set 4096 as maximum for number of stream connections that
we accept. However maximum number of file descriptors that systemd is
willing to accept from us is just 1024. This means we can't retain all
stream connections that we accepted. Hence bump the limit of fds in a
unit file so that systemd holds open all stream fds while we are
restarted.
I messed up when adding the definitions in 4278d1f5310f5acb4c6a6788233625234edb5145.
Unfortunately I didn't have the hardware at hand and went by
looking at the kernel headers.
So don't even try to added the filter to reduce noise.
The test is updated to skip calling _sysctl because the kernel prints
an oops-like message that is confusing and unhelpful:
core: support "nsdelegate" cgroup v2 mount option (#6294)
cgroup namespace wasn't useful for delegation because it allowed resource
control interface files (e.g. memory.high) to be written from inside the
namespace - this allowed the namespace parent's resource distribution to be
disturbed by its namespace-scoped children.
A new mount option, "nsdelegate", was added to cgroup v2 to address this issue.
The flag is meangingful only when mounting cgroup v2 in the init namespace and
makes a cgroup namespace a delegation boundary. The kernel feature is pending
for v4.13.
This should have been the default behavior on cgroup namespaces and this commit
makes systemd try "nsdelegate" first when trying to mount cgroup v2 and fall
back if the option is not supported.
Note that this has danger of breaking usages which depend on modifying the
parent's resource settings from the namespace root, which isn't a valid thing
to do, but such usages may still exist.
journal: use context_attach_window() in add_mmap() (#6339)
Instead of context_detach_window() and a manual attach of the new
window, simply call context_attach_window() which performs the
detach first if appropriate.
generator_add_symlink() is extended to ignore EEXIST. This should be fine
for all existing callers.
There's a small difference in behaviour when adding symlinks in sysv-generator:
the message is more generic and does not include ", ignored". But creation of
symlinks shouldn't ever fail except if things are very wrong, so in practice
this shouldn't matter.
Test needed updating: os.path.exists(os.readlink(link)) only works if the link
is absolute (or if we are in the right directory). Let's just use
os.path.exists(link), which properly tests that the symlink target exists.
journal_file_move_to_object() can skip the second
journal_file_move_to() call if the first one already mapped a
sufficiently large area.
Now that mmap_cache_get() returns the size of the mapped area
when asked, ask for the size and only perform the second call if
the required size exceeds the mapped size instead of the object
header size.
This results in a nice performance boost in my testing, even with
a corpus of many small logs burning much CPU time elsewhere:
Before:
# time ./journalctl -b -1 --no-pager > /dev/null
real 0m16.330s
user 0m16.281s
sys 0m0.046s
# time ./journalctl -b -1 --no-pager > /dev/null
real 0m16.409s
user 0m16.358s
sys 0m0.048s
# time ./journalctl -b -1 --no-pager > /dev/null
real 0m16.625s
user 0m16.558s
sys 0m0.061s
After:
# time ./journalctl -b -1 --no-pager > /dev/null
real 0m15.311s
user 0m15.257s
sys 0m0.046s
# time ./journalctl -b -1 --no-pager > /dev/null
real 0m15.201s
user 0m15.135s
sys 0m0.062s
# time ./journalctl -b -1 --no-pager > /dev/null
real 0m15.170s
user 0m15.113s
sys 0m0.053s
If requested, return the actual mapping size to the caller in
addition to the address.
journal_file_move_to_object() often performs two successive
mmap_cache_get() calls via journal_file_move_to(); one to get the
object header, then another to get the entire object when it's
larger than the header's size.
If mmap_cache_get() returned the actual mapping's size, it's
probable that the second mmap_cache_get() could be skipped when
the established mapping already encompassed the desired size.
core/load-fragment: refuse units with errors in certain directives
If an error is encountered in any of the Exec* lines, WorkingDirectory,
SELinuxContext, ApparmorProfile, SmackProcessLabel, Service (in .socket
units), User, or Group, refuse to load the unit. If the config stanza
has support, ignore the failure if '-' is present.
For those configuration directives, even if we started the unit, it's
pretty likely that it'll do something unexpected (like write files
in a wrong place, or with a wrong context, or run with wrong permissions,
etc). It seems better to refuse to start the unit and have the admin
clean up the configuration without giving the service a chance to mess
up stuff.
Note that all "security" options that restrict what the unit can do
(Capabilities, AmbientCapabilities, Restrict*, SystemCallFilter, Limit*,
PrivateDevices, Protect*, etc) are _not_ treated like this. Such options are
only supplementary, and are not always available depending on the architecture
and compilation options, so unit authors have to make sure that the service
runs correctly without them anyway.
Colin Walters [Tue, 11 Jul 2017 16:48:57 +0000 (12:48 -0400)]
fstab-generator: Chase symlinks where possible (#6293)
This has a long history; see see 5261ba901845c084de5a8fd06500ed09bfb0bd80
which originally introduced the behavior. Unfortunately that commit
doesn't include any rationale, but IIRC the basic issue is that
systemd wants to model the real mount state as units, and symlinks
make canonicalization much more difficult.
At the same time, on a RHEL6 system (upstart), one can make e.g. `/home` a
symlink, and things work as well as they always did; but one doesn't have
access to the sophistication of mount units (dependencies, introspection, etc.)
Supporting symlinks here will hence make it easier for people to do upgrades to
RHEL7 and beyond.
The `/home` as symlink case also appears prominently for OSTree; see
https://ostree.readthedocs.io/en/latest/manual/adapting-existing/
A basic limitation with doing this in the fstab generator (and that I hit while
doing some testing) is that we obviously can't chase symlinks into mounts,
since the generator runs early before mounts. Or at least - doing so would
require multiple passes over the fstab data (as well as looking at existing
mount units), and potentially doing multi-phase generation. I'm not sure it's
worth doing that without a real world use case. For now, this will fix at least
the OSTree + `/home` <https://bugzilla.redhat.com/show_bug.cgi?id=1382873> case
mentioned above, and in general anyone who for whatever reason has symlinks in
their `/etc/fstab`.
systemd: do not stop units bound to inactive units while coldplugging (#6316)
When running systemd-analyze verify I would get a random subset of warnings
(sometimes none, sometimes one or two):
dev-mapper-luks\x2d8db85dcf\x2d6230\x2d4e88\x2d940d\x2dba176d062b31.swap: Unit is bound to inactive unit dev-mapper-luks\x2d8db85dcf\x2d6230\x2d4e88\x2d940d\x2dba176d062b31.device. Stopping, too.
home.mount: Unit is bound to inactive unit dev-disk-by\x2duuid-75751556\x2d6e31\x2d438b\x2d99c9\x2dd626330d9a1b.device. Stopping, too.
boot.mount: Unit is bound to inactive unit dev-disk-by\x2duuid-56c56bfd\x2d93f0\x2d48fb\x2dbc4b\x2d90aa67144ea5.device. Stopping, too.
When running with debug on, it's pretty obvious what is happening:
home.mount: Changed dead -> mounted
home.mount: Unit is bound to inactive unit dev-disk-by\x2duuid-75751556\x2d6e31\x2d438b\x2d99c9\x2dd626330d9a1b.device. Stopping, too.
home.mount: Trying to enqueue job home.mount/stop/fail
home.mount: Installed new job home.mount/stop as 27
home.mount: Enqueued job home.mount/stop as 27
...
dev-disk-by\x2duuid-75751556\x2d6e31\x2d438b\x2d99c9\x2dd626330d9a1b.device: Installed new job dev-disk-by\x2duuid-75751556\x2d6e31\x2d438b\x2d99c9\x2dd626330d9a1b.device/start as 47
dev-disk-by\x2duuid-75751556\x2d6e31\x2d438b\x2d99c9\x2dd626330d9a1b.device: Changed dead -> plugged
dev-disk-by\x2duuid-75751556\x2d6e31\x2d438b\x2d99c9\x2dd626330d9a1b.device: Job dev-disk-by\x2duuid-75751556\x2d6e31\x2d438b\x2d99c9\x2dd626330d9a1b.device/start finished, result=done
resolved: allow resolution of names which libidn2 considers invalid (#6315)
https://tools.ietf.org/html/rfc5891#section-4.2.3.1 says that
> The Unicode string MUST NOT contain "--" (two consecutive hyphens) in the third
> and fourth character positions and MUST NOT start or end with a "-" (hyphen).
This means that libidn2 refuses to encode such names.
Let's just resolve them without trying to use IDN.
random-util: always cast from smaller to bigger type when comparing
When we compare two size values, let's make sure we cast from the
smaller to the bigger type first, if both types differ, rather than the
reverse in order to not run into overflows.
This way we have a MMapFileDescriptor reference external to the cache,
and can supply the handle directly to mmap_cache_get(), eliminating
hashmap lookups entirely from the hot path.
This should make output deterministic, and independent of the directory
layout on disk. Just using ordered hashmaps would be enough to make
the output deterministic on a specific machine, but to make it
identical on different machines with the same set of files and
directories, names are sorted after being use.
mount: change find_loop_device() error code when no loop device is found to ENXIO
ENOENT is a bit too likely to be returned for various reasons, for
example if /sys or /proc are not mounted and hence the files we need not
around. Hence, let's use ENXIO instead, which is equally fitting for the
purpose but has the benefit that the underlying calls won't generate
this error on their own, hence any ambiguity is removed.
mount: rework find_loop_device() to log about no errors
We should either log about all errors in a function, or about none (and
then leave the logging about it to the caller who we propagate the error
to). Given that the callers of find_loop_device() already log about the
returned errors let's hence suppress the log messages in
find_loop_device() itself.