This removes the current bus_init() call, as it had multiple problems:
it munged handling of the three bus connections we care about (private,
"api" and system) into one, even though the conditions when which was
ready are very different. It also added redundant logging, as the
individual calls it called all logged on their own anyway.
The three calls bus_init_api(), bus_init_private() and bus_init_system()
are now made public. A new call manager_dbus_is_running() is added that
works much like manager_journal_is_running() and is a lot more careful
when checking whether dbus is around. Optionally it checks the unit's
deserialized_state rather than state, in order to accomodate for cases
where we cant to connect to the bus before deserializing the
"subscribed" list, before coldplugging the units.
manager_recheck_dbus() is added, that works a lot like
manager_recheck_journal() and is invoked in unit_notify(), i.e. when
units change state.
All in all this should make handling a bit more alike to journal
handling, and it also fixes one major bug: when running in user mode
we'll now connect to the system bus early on, without conditionalizing
this in anyway.
I figure saying "systemd" here was a typo, and it should have been
"system". (Yes, it becomes very hard after a while typing "system"
correctly if you type "systemd" so often.) That said, "systemd" in some
ways is actually more correct, since BPF might be available for the
system instance but not in the user instance.
Either way, talking of "this systemd" is weird, let's reword this to be
"this manager", to emphasize that it's the local instance of systemd
where BPF is not available, but that it might be available otherwise.
process-util: be more careful in is_kernel_thread()
This reworks is_kernel_thread() a bit. Instead of checking whether
/proc/$pid/cmdline is entirely empty we now parse the 'flags' field from
/proc/$pid/stat and check the PF_KTHREAD flag, which directly encodes
whether something is a kernel thread.
Why all this? With current kernels userspace processes can set their
command line to empty too (through PR_SET_MM_ARG_START and friends), and
could potentially confuse us. Hence, let's use a more reliable way to
detect kernels like this.
core: fold manager_set_exec_params() into unit_set_exec_params()
Let's simplify things a bit: we so far called both functions every
single time, let's just merge one into the other, so that we have fewer
functions to call.
cgroup: add a new "can_delegate" flag to the unit vtable, and set it for scope and service units only
Currently we allowed delegation for alluntis with cgroup backing
except for slices. Let's make this a bit more strict for now, and only
allow this in service and scope units.
Let's also add a generic accessor unit_cgroup_delegate() for checking
whether a unit has delegation turned on that checks the new bool first.
Also, when doing transient units, let's explcitly refuse turning on
delegation for unit types that don#t support it. This is mostly
cosmetical as we wouldn't act on the delegation request anyway, but
certainly helpful for debugging.
This adds some paranoia code that moves some of the fds we allocate for
longer periods of times to fds > 2 if they are allocated below this
boundary. This is a paranoid safety thing, in order to avoid that
external code might end up erroneously use our fds under the assumption
they were valid stdin/stdout/stderr. Think: some app closes
stdin/stdout/stderr and then invokes 'fprintf(stderr, …' which causes
writes on our fds.
This both adds the helper to do the moving as well as ports over a
number of users to this new logic. Since we don't want to litter all our
code with invocations of this I tried to strictly focus on fds we keep
open for long periods of times only and only in code that is frequently
loaded into foreign programs (under the assumptions that in our own
codebase we are smart enough to always keep stdin/stdout/stderr
allocated to avoid this pitfall). Specifically this means all code used
by NSS and our sd-xyz API:
Simon Fowler [Fri, 9 Feb 2018 16:37:39 +0000 (02:37 +1000)]
Suspend on lid close based on power status. (#8016)
This change adds support for controlling the suspend-on-lid-close
behaviour based on the power status as well as whether the machine is
docked or has an external monitor. For backwards compatibility the new
configuration file variable is ignored completely by default, and must
be set explicitly before being considered in any decisions.
service: relax PID file symlink chain checks a bit (#8133)
Let's read the PID file after all if there's a potentially unsafe
symlink chain in place. But if we do, then refuse taking the PID if its
outside of the cgroup.
So far we didn't document control, transient, dbus config, or generator paths.
But those paths are visible to users, and they need to understand why systemd
loads units from those paths, and how the precedence hierarchy looks.
The whole thing is a bit messy, since the list of paths is quite long.
I made the tables a bit shorter by combining rows for the alternatives
where $XDG_* is set and the fallback.
In various places, tags are split like <element
param="blah">
this. This is necessary to keep everyting in one logical XML line so that
docbook renders the table properly.
Strictly speaking, online upgrades of user instances through daemon-reexec will
be broken. We can get away with this since
a) reexecs of the user instance are not commonly done, at least package upgrade
scripts don't do this afawk.
b) cgroups aren't delegateable on cgroupsv1 there's little reason to use "systemctl
set-property" for --user mode
shared/path-lookup: rearrange paths in --global mode to match --user mode
It's not good if the paths are in different order. With --user, we expect
more paths, but it must be a strict superset, and the order for the ones
that appear in both sets must be the same.
path-lookup: include paths from --global in --user search path too
This doesn't matter that much, because set-property --global does not work,
so at least those paths wouldn't be used automatically. It is still possible
to create such snippets manually, so we better fix this.
Mao [Thu, 1 Feb 2018 09:33:13 +0000 (17:33 +0800)]
udevadm: allow trigger command to be synchronous
There are cases that we want to trigger and settle only specific
commands. For example, let's say at boot time we want to make sure all
the graphics devices are working correctly because it's critical for
booting, but not the USB subsystem (we'll trigger USB events later). So
we do:
However, we cannot block the kernel from emitting kernel events from
discovering USB devices. So if any of the USB kernel event was emitted
before the settle command, the settle command would still wait for the
entire queue to complete. And if the USB event takes a long time to be
processed, the system slows down.
The new `settle` option allows the `trigger` command to wait for only
the triggered events, and effectively solves this problem.
man: fix capability name in man:systemd-tmpfiles(8) (#8139)
CAP_ADMIN does not exist (the closest existing capability name would be
CAP_SYS_ADMIN), and according to man:open(2) and man:capabilities(7),
the capability required to specify O_NOATIME is actually CAP_FOWNER.
Peter Portante [Sun, 28 Jan 2018 21:48:04 +0000 (16:48 -0500)]
Periodically call sd_journal_process in journalctl
If `journalctl` take a long time to process messages, and during that
time journal file rotation occurs, a `journalctl` client will keep
those rotated files open until it calls `sd_journal_process()`, which
typically happens as a result of calling `sd_journal_wait()` below in
the "following" case. By periodically calling `sd_journal_process()`
during the processing loop we shrink the window of time a client
instance has open file descriptors for rotated (deleted) journal
files.
**Warning**
This change does not appear to solve the case of a "paused" output
stream. If somebody is using `journalctl | less` and pauses the
output, then without a background thread periodically listening for
inotify delete events and cleaning up, journal logs will eventually
stop flowing in cases where a journal client with enough open files
causes the "free" disk space threshold to be crossed.
Shawn Landden [Sat, 3 Feb 2018 18:16:33 +0000 (10:16 -0800)]
sd-bus: cleanup ssh sessions (Closes: #8076)
we still invoke ssh unnecessarily when there in incompatible or erreneous input
The fallow-up to finish that would make the code a bit more verbose,
as it would require repeating this bit:
```
r = bus_connect_transport(arg_transport, arg_host, false, &bus);
if (r < 0) {
log_error_errno(r, "Failed to create bus connection: %m");
goto finish;
}
sd_bus_set_allow_interactive_authorization(bus, arg_ask_password);
```
in every verb, after parsing.
v2: add waitpid() to avoid a zombie process, switch to SIGTERM from SIGKILL
v3: refactor, wait in bus_start_address()
Susant Sahani [Thu, 8 Feb 2018 09:22:46 +0000 (14:52 +0530)]
networkd: vxlan require Remote= to be a non multicast address (#8117)
Remote= must be a non multicast address. ip-link(8) says:
> remote IPADDR - specifies the unicast destination IP address to
> use in outgoing packets when the destination link layer address
> is not known in the VXLAN device forwarding database.
Faalagorn [Thu, 8 Feb 2018 08:14:55 +0000 (09:14 +0100)]
man: .service <filename> to <literal> (#8126)
Changed <filename>.service</filename> to <literal>.service</literal> to match style in other manual pages: man 5 systemd.socket, device, mount, automount, swap, target path, timer, slice and scope.
Alan Jenkins [Thu, 8 Feb 2018 08:14:32 +0000 (08:14 +0000)]
journal: avoid code that relies on LOG_KERN == 0 (#8110)
LOG_FAC() is the general way to extract the logging facility (when it has
been combined with the logging priority).
LOG_FACMASK can be used to mask off the priority so you only have the
logging facility bits... but to get the logging facility e.g. LOG_USER,
you also have to bitshift it as well. (The priority is in the low bits,
and so only requires masking).
((priority & LOG_FACMASK) == LOG_KERN) happens to work only because
LOG_KERN is 0, and hence has the same value with or without the bitshift.
Code that relies on weird assumptions like this could make it harder to
realize how the logging values are treated.
Faalagorn [Wed, 7 Feb 2018 18:10:41 +0000 (19:10 +0100)]
man: "reboot" to "power off" in poweroff.target (#8124)
Changed "reboot" to "power off" in poweroff.target description. It was most likely copied and pasted from the reboot.target below, compare with e.g. halt.target
Franck Bui [Wed, 7 Feb 2018 13:08:02 +0000 (14:08 +0100)]
core: use id unit when retrieving unit file state (#8038)
Previous code was using the basename(id->fragment_path) which returned
incorrect result if the unit was an instance.
For example, assuming that no instances of "template" have been created so far:
$ systemctl enable template@1
Created symlink from /etc/systemd/system/multi-user.target.wants/template@1.service to /usr/lib/systemd/system/template@.service.
process-util: use raw_getpid() in getpid_cache() internally (#8115)
We have the raw_getpid() definition in place anyway, and it's certainly
beneficial to expose the same semantics on pre glibc 2.24 and after it
too, hence always bypass glibc for this, and always cache things on our
side.
Add more file triggers to handle more aspects of systemd (#8090)
For quite a while now, there have been file triggers to handle
automatically setting up service units in upstream systemd. However,
most of the actions being done by these macros upon files can be set up
as RPM file triggers.
In fact, in Mageia, we had been doing this for most of these. In particular,
we have file triggers in place for sysusers, tmpfiles, hwdb, and the journal.
This change adds Lua versions of the original file triggers used in Mageia,
based on the existing Lua-based file triggers for service units.
In addition, we can also have useful file triggers for udev rules, sysctl
directives, and binfmt directives. These are based on the other existing
file triggers.
Yu Watanabe [Tue, 6 Feb 2018 08:08:38 +0000 (17:08 +0900)]
nss-mymachines: add work-around to silence gcc warning
This is similar to 3c3d384ae93700ef08545b078c37065fdb98eee7 and
a workaround for the following warning.
```
In file included from ../src/basic/in-addr-util.h:28,
from ../src/nss-mymachines/nss-mymachines.c:31:
../src/nss-mymachines/nss-mymachines.c: In function '_nss_mymachines_getgrnam_r':
../src/nss-mymachines/nss-mymachines.c:653:32: warning: argument to 'sizeof' in 'memset' call is the same pointer type 'char *' as the destination; expected 'char' or an explicit length [-Wsizeof-pointer-memaccess]
memzero(buffer, sizeof(char*));
^~~~
../src/basic/util.h:118:39: note: in definition of macro 'memzero'
#define memzero(x,l) (memset((x), 0, (l)))
^
../src/nss-mymachines/nss-mymachines.c: In function '_nss_mymachines_getgrgid_r':
../src/nss-mymachines/nss-mymachines.c:730:32: warning: argument to 'sizeof' in 'memset' call is the same pointer type 'char *' as the destination; expected 'char' or an explicit length [-Wsizeof-pointer-memaccess]
memzero(buffer, sizeof(char*));
^~~~
../src/basic/util.h:118:39: note: in definition of macro 'memzero'
#define memzero(x,l) (memset((x), 0, (l)))
^
```
Yu Watanabe [Tue, 6 Feb 2018 08:05:58 +0000 (17:05 +0900)]
networkd: fix dhcp6_prefixes_compare_func()
Found by the following warning by gcc.
```
../src/network/networkd-manager.c: In function 'dhcp6_prefixes_compare_func':
../src/network/networkd-manager.c:1383:16: warning: 'memcmp' reading 16 bytes from a region of size 8 [-Wstringop-overflow=]
return memcmp(&a, &b, sizeof(*a));
^
```
Yu Watanabe [Tue, 6 Feb 2018 07:00:34 +0000 (16:00 +0900)]
core: make ExecRuntime be manager managed object
Before this, each ExecRuntime object is owned by a unit. However,
it may be shared with other units which enable JoinsNamespaceOf=.
Thus, by the serialization/deserialization process, its sharing
information, more specifically, reference counter is lost, and
causes issue #7790.
This makes ExecRuntime objects be managed by manager, and changes
the serialization/deserialization process.
Alan Jenkins [Mon, 5 Feb 2018 16:53:40 +0000 (16:53 +0000)]
journal: include kmsg lines from the systemd process which exec()d us (#8078)
Let the journal capture messages emitted by systemd, before it ran
exec("/usr/lib/systemd/systemd-journald"). Usually such messages will only
appear with `systemd.log_level=debug`. kmsg lines written after the exec()
will be ignored as before.
In other words, we are avoiding reading our own lines, which start
"systemd-journald[100]: " assuming we are PID 100. But now we will start
allowing ourself to read lines which start "systemd[100]: ", or any other
prefix which is not "systemd-journald[100]: ".
So this can't help you see messages when we fail to exec() journald :). But,
it makes it easier to see what the pre-exec() messages look like in
the successful case. Comparing messages like this can be useful when
debugging. Noticing weird omissions of messages, otoh, makes me anxious.
nss-systemd: add work-around to silence gcc warning
In file included from ../src/basic/fs-util.h:32,
from ../src/nss-systemd/nss-systemd.c:28:
../src/nss-systemd/nss-systemd.c: In function '_nss_systemd_getgrnam_r':
../src/nss-systemd/nss-systemd.c:416:32: warning: argument to 'sizeof' in 'memset' call is the same pointer type 'char *' as the destination; expected 'char' or an explicit length [-Wsizeof-pointer-memaccess]
memzero(buffer, sizeof(char*));
^~~~
../src/basic/util.h:118:39: note: in definition of macro 'memzero'
#define memzero(x,l) (memset((x), 0, (l)))
^
gcc is trying to be helpful, and it's not far from being right. It _looks_ like
sizeof(char*) is an error, but in this case we're really leaving a space empty
for a pointer, and our calculation is correct. Since this is a short file,
let's just use simplest option and turn off the warning above the two functions
that trigger it.
I expect that this will be mostly obsoleted by transfiletriggers that
(I hope) we will soon add. But let's do this for completeness anyway.
I'm keeping the description of the macro a bit vague, since I expect
that it'll be changed when transfiletriggers are added.
man: document meaning of age in tmpfiles.d (#8092)
This documents how the age of a file is determined, which previously was
only alluded to in other parts of the documentation. Fixes #8091.
The phrasings of “last modification timestamp” etc. are taken from
man:inode(7) (as of man-pages 4.14). The debug messages in tmpfiles.c
use different messages (“modify time”), which according to a code
comment follow man:stat(1); however, my copy of that manpage (from GNU
coreutils 8.29) documents %y as “time of last data modification”
instead.
test: sort imports and use "new" string formatting
Followed PEP8 and PEP3101 rules (#8079)
Imports re-ordered by Alphabetical Standarts for following PEP8
Old type string formattings (" example %s " % exampleVar ) re-writed as new type string
formattings ( " example {} ".format(exampleVar) ) for following PEP3101
Yu Watanabe [Thu, 1 Feb 2018 10:39:30 +0000 (19:39 +0900)]
systemctl: show: use EnvironmentFiles= instead of EnvironmentFile=
EnvironmentFile= is used in the unit file, but in the dbus,
the related field name is EnvironmentFiles=.
As the other variables, let's use the field name instead of the name
used in the unit file setting.
Alan Jenkins [Sun, 4 Feb 2018 20:46:27 +0000 (20:46 +0000)]
slice: system.slice should be perpetual like -.mount
`-.mount` is placed in `system.slice`, and hence depends on it.
`-.mount` is always active and can never be stopped. Therefore the same
should be true of `system.slice`.
Synthesize it as perpetual (unless systemd is running as a user manager).
Notice we also drop `Before=slices.target` as unnecessary.
AFAICS the justification for `perpetual` is to provide extra protection
against unintentionally stopping every single service. So adding
system.slice to the perpetual units is perfectly consistent.
I don't expect this will (or can) fix any other problem. And the
`perpetual` protection probably isn't formal enough to spend much time
thinking about. I've just noticed this a couple of times, as something
that looks strange.
Might be a bit surprising that we have user.slice on-disk but not
system.slice, but I think it's ok. `systemctl status system.slice` will
still point you towards `man systemd.special`. The only detail is that the
system slice disables `DefaultDependencies`. If you're worrying about how
system shutdown works when you read `man systemd.slice`, I think it is not
too hard to guess that system.slice might do this:
> Only slice units involved with early boot
> or late system shutdown should disable this option
(Docs are great. I really appreciate the systemd ones).
Alan Jenkins [Sun, 4 Feb 2018 20:16:50 +0000 (20:16 +0000)]
slice, scope: IgnoreOnIsolate=yes is already the default
`IgnoreOnIsolate=yes` is the default for slices and scopes. So it's not
essential to set it on root.slice or init.scope.
We don't need to worry about a bad unit file configuration. Any attempt
to stop these unit should fail, since we mark them as `perpetual`.
Also since init.scope cannot be stopped, there is no point setting
`KillSignal=SIGRTMIN+14`. According to both documentation and testing,
KillSignal= does not affect the behaviour of `systemctl kill`.
Alan Jenkins [Fri, 2 Feb 2018 16:06:32 +0000 (16:06 +0000)]
seccomp: allow x86-64 syscalls on x32, used by the VDSO (fix #8060)
The VDSO provided by the kernel for x32, uses x86-64 syscalls instead of
x32 ones.
I think we can safely allow this; the set of x86-64 syscalls should be
very similar to the x32 ones. The real point is not to allow *x86*
syscalls, because some of those are inconveniently multiplexed and we're
apparently not able to block the specific actions we want to.
Boucman [Fri, 2 Feb 2018 14:58:40 +0000 (15:58 +0100)]
do not report total time when kernel time is not provided (#8063)
the whole systemd-analyze time logic is based on the fact that monotonic
time 0 is the start of the kernel.
If the firmware does not provide a correct time, firmware_time degrades to
0, which is the start of the kernel. The diference between FinishTime and
firmware_time is thus correct.
That assumption is still true with containers, but the start time of the
kernel is not what the user expects : It's the time when the host booted.
The total is thus still correct, but highly misleading. Containers can be
easily detected (and, in fact, already are) by systemd not reporting any
kernel non-monotonic timestamp.
This patch simply avoids printing a misleading time when it can detect that
case
basic/hashmap: tweak code to avoid pointless gcc warning
gcc says:
[196/1142] Compiling C object 'src/basic/basic@sta/hashmap.c.o'.
../src/basic/hashmap.c: In function ‘cachemem_maintain’:
../src/basic/hashmap.c:1913:17: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
mem->active = r = true;
^~~
which conflates two things: the first is transitive assignent a = b = c = d;
the second is assignment of the value of an expression, which happens to be a
an assignment expression here, and boolean. While the second _should_ be
parenthesized, the first should _not_, and it's more natural to understand
our code as the first, and gcc should treat this as an exception and not emit
the warning. But since it's a while until this will be fixed, let's update
our code too.
This is close to %sysusers_create_inline and %sysusers_create that we had
already, but expects a file name and uses --replace= to implement proper
priority.
This is used like:
%sysusers_create_package %{name} %SOURCE1
where %SOURCE1 is a file with called %{name}.conf that will be installed
into /usr/lib/sysusers.d/.
The tough part is that the file needs to be available before %prep,
i.e. outside of the source tarball. This is because the spec file is
parsed (and any macros expanded), before the sources are unpackaged.
v2:
- disallow the case case when --config-name= is given but there are no
positional args. Most likely this would be a user error, so at least for now
forbid it.
v3:
- replace --config-name= with --target=
- drop quotes around %1 and %2 — if necessary, the caller should add
those.
v4:
- replace --target with --replace
- add a big comment
sysusers: allow admin/runtime overrides to command-line config
When used in a package installation script, we want to invoke systemd-sysusers
before that package is installed (so it can contain files owned by the newly
created user), so the configuration to use is specified on the command
line. This should be a copy of the configuration that will be installed as
/usr/lib/sysusers.d/package.conf. We still want to obey any overrides in
/etc/sysusers.d or /run/sysusers.d in the usual fashion. Otherwise, we'd get a
different result when systemd-sysusers is run with a copy of the new config on
the command line and when systemd-sysusers is run at boot after package
instalation. In the second case any files in /etc or /run have higher priority,
so the same should happen when the configuration is given on the command line.
More generally, we want the behaviour in this special case to be as close to
the case where the file is finally on disk as possible, so we have to read all
configuration files, since they all might contain overrides and additional
configuration that matters. Even files that have lower priority might specify
additional groups for the user we are creating. Thus, we need to read all
configuration, but insert our new configuration somewhere with the right
priority.
If --target=/path/to/file.conf is given on the command line, we gather the list
of files, and pretend that the command-line config is read from
/path/to/file.conf (doesn't matter if the file on disk actually exists or
not). All package scripts should use this option to obtain consistent and
idempotent behaviour.
The corner case when --target= is specified and there are no positional
arguments is disallowed.
v1:
- version with --config-name=
v2:
- disallow --config-name= and no positional args
v3:
- remove --config-name=
v4:
- add --target= and rework the code completely
v5:
- fix argcounting bug and add example in man page
v6:
- rename --target to --replace