journal: make sure the clock increases strictly monotonically
Let's work around crappy clocks in test-journal-interleaving.c too. This
does the same as 98d2a5341788b49e82d628dfdc2e241af6d70dcd but for
test-journal-interleaving.c rather than test-journal-stream.c.
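As an illustration only, one way to work around a clock that returns the same value twice is to nudge each reading past the previous one. The helper name strictly_increasing() is hypothetical and is not taken from either test file.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return a timestamp that is strictly greater than
 * *last, even if the clock reading 'now' did not advance or went backwards. */
static uint64_t strictly_increasing(uint64_t *last, uint64_t now) {
        assert(last);

        if (now <= *last)
                now = *last + 1;        /* nudge forward by one tick */

        *last = now;
        return now;
}
```

A caller would keep one 'last' value per sequence of entries and pass every clock reading through the helper before using it.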
load-fragment: reset the list on an ExecStart= containing only whitespace
This is consistent with how an empty string works in an ExecStart=
statement. We should not differentiate between an empty string and
whitespace only (since they look the same.)
Update the test case with whitespace only to reflect that the list is
reset.
Tested that `test-unit-file` passes and other test cases are not
affected. Installed the patched systemd binaries on a machine, booted
it, looked for out of the usual behavior but did not find any.
load-fragment: use unquote_first_word in config_parse_exec
Convert config_parse_exec() from using FOREACH_WORD_QUOTED into a loop
of unquote_first_word.
Loop through the arguments only once (the FOREACH_WORD_QUOTED
implementation did it twice, once to count them and another time to
process and store them.)
Use the newly introduced flag UNQUOTE_UNESCAPE_RELAX to preserve
unrecognized escape sequences, such as the regexp classes "\w" and
"\d". (Valid escape sequences such as "\s" or "\b" still need an
extra backslash if literals are desired for regexps.)
Differences in behavior:
- Handle ; (command separator) specially, so that only ; on its own is
valid for that purpose; a quoted semicolon ";" or ';' will now behave
as a literal semicolon. This is probably what was initially intended.
- Handle \; (to introduce a literal semicolon) specially, so that only \;
is turned into a semicolon, but not \\; or "\\;" or "\;", which are kept
as a literal \; in the output. This is probably what was initially
intended.
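The two rules above can be sketched as a classification of the raw, still-quoted token. This is a simplified illustration, not the code in config_parse_exec(); the enum and the function classify_raw_token() are hypothetical names.

```c
#include <assert.h>
#include <string.h>

enum token_kind {
        TOKEN_SEPARATOR,          /* a bare ;  ends the current command */
        TOKEN_LITERAL_SEMICOLON,  /* a bare \; becomes the argument ";" */
        TOKEN_WORD,               /* anything else goes through normal unquoting */
};

/* Hypothetical helper: look at the token exactly as written in the unit
 * file, before any quotes or escapes have been processed. */
static enum token_kind classify_raw_token(const char *raw) {
        assert(raw);

        if (strcmp(raw, ";") == 0)
                return TOKEN_SEPARATOR;

        if (strcmp(raw, "\\;") == 0)
                return TOKEN_LITERAL_SEMICOLON;

        return TOKEN_WORD;
}
```

Because the check runs on the raw text, quoted forms such as ";" or ';' never match the separator case and fall through to ordinary word handling.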
Known issues:
- Using an empty string (for example, ExecStartPre=<empty>) will empty
the list and remove the existing commands, but using whitespace only
(for example, ExecStartPre=<spaces>) will not. This is a pre-existing
issue and will be dealt with in a follow up commit.
Tested:
- Unit tests passing. Also `make distcheck` still works as expected.
- Installed it on a local machine and booted with it, checked console
output, systemctl and journalctl output, did not notice any issues
running the patched systemd binaries.
These tests will be useful to check the cases regarding quoted and
escaped semicolon when we switch to using unquote_first_word.
Additionally, convert some of the tests that have semicolons so that the
argument after the semicolon looks like a path (starts with /) so that
we can see the change of behavior when making config_parse_exec more
strict about what it accepts as a command separator.
It will try to unquote_first_word, but if it runs into escaping problems
it will retry it adding UNQUOTE_CUNESCAPE_RELAX to the flags. If it
succeeds on the second try, it will log a warning about it. If it fails
both times, it will log an error.
util: New flag UNQUOTE_UNESCAPE_RELAX for unquote_first_word
The new flag UNQUOTE_UNESCAPE_RELAX preserves unrecognized escape
sequences verbatim in unquote_first_word, either when it's a trailing
backslash (similar to UNQUOTE_RELAX, but in this case keep the extra
backslash in the output) or in the middle of a sequence string.
Add unit test cases to ensure the new flag works as expected and to
prevent regressions from being introduced.
Tested with a follow-up commit converting config_parse_exec() to start
using unquote_first_word, in which case this flag makes it possible to
preserve unrecognized escape sequences.
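To make the strict versus relaxed behavior concrete, here is a toy unescaper. It is not unquote_first_word(): the function name toy_unescape(), the boolean 'relax' parameter and the small set of recognized escapes (\n, \t, \\) are assumptions chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy unescaper. In strict mode an unknown escape or a trailing backslash
 * is an error; in relaxed mode both are copied to the output verbatim. */
static int toy_unescape(const char *in, char *out, size_t size, bool relax) {
        size_t j = 0;

        assert(in);
        assert(out);
        assert(size > 0);

        for (size_t i = 0; in[i]; i++) {
                char e;

                if (j + 2 >= size)      /* room for two chars plus NUL */
                        return -ENOBUFS;

                if (in[i] != '\\') {
                        out[j++] = in[i];
                        continue;
                }

                e = in[i + 1];
                if (e == 'n') {
                        out[j++] = '\n';
                        i++;
                } else if (e == 't') {
                        out[j++] = '\t';
                        i++;
                } else if (e == '\\') {
                        out[j++] = '\\';
                        i++;
                } else if (!relax)
                        return -EINVAL;         /* strict: reject */
                else if (e == '\0')
                        out[j++] = '\\';        /* relaxed: keep trailing backslash */
                else {
                        out[j++] = '\\';        /* relaxed: keep "\w" and friends */
                        out[j++] = e;
                        i++;
                }
        }

        out[j] = '\0';
        return 0;
}
```

With relax set, an input such as \w+ comes out unchanged, while the same input is rejected with -EINVAL in strict mode.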
David Herrmann [Wed, 17 Jun 2015 17:15:58 +0000 (19:15 +0200)]
bus: fix installing DRIVER matches on kdbus
In kdbus we still have to support org.freedesktop.DBus matches even though
there is no real bus driver. The reason is that bus-control.c turns
NameOwnerChanged matches into proper kdbus matches. If we drop DRIVER
matches early, we will never match on name-changes for kdbus.
Two ways to fix this:
1) Install DRIVER matches on kdbus (which is the simple way out and which
is what this patch does).
2) Properly fix the scope-detection to let NameOwnerChanged matches
through (or better: block anything with Member!=NameOwnerChanged).
Not all watchdog drivers implement WDIOC_SETOPTIONS. Drivers which do
not implement it have their device always enabled. So it's fine to
report an error if WDIOS_DISABLECARD is passed and the ioctl is not
implemented, however failing when WDIOS_ENABLECARD is passed and the
ioctl is not implemented is not good: if the device was already
enabled then WDIOS_ENABLECARD was a no-op and wasn't needed in the
first place. So we can just ignore the error and continue.
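The decision described above can be sketched as a small predicate. The function name is hypothetical, and the sketch assumes that a driver without WDIOC_SETOPTIONS reports the failure as ENOTTY, which is the usual errno for an unimplemented ioctl but is not stated in the text.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: may a failed WDIOC_SETOPTIONS call be ignored?
 * 'enabling' is true when WDIOS_ENABLECARD was requested and false when
 * WDIOS_DISABLECARD was requested; 'err' is the errno of the failed call.
 * Assumption: an unimplemented ioctl shows up as ENOTTY. */
static bool setoptions_failure_is_harmless(bool enabling, int err) {
        /* A device that cannot be switched is always enabled, so a
         * failed enable request changed nothing that mattered. */
        return enabling && err == ENOTTY;
}
```

A failed disable request, or any other errno, would still be reported to the caller.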
Simon McVittie [Wed, 17 Jun 2015 15:45:49 +0000 (16:45 +0100)]
logind: save /run/systemd/users/UID before starting user@.service
Previously, this had a race condition during a user's first login.
Some component calls CreateSession (most likely by a PAM service
other than 'systemd-user' running pam_systemd), with the following
results:
- logind:
* create the user's XDG_RUNTIME_DIR
* tell pid 1 to create user-UID.slice
* tell pid 1 to start user@UID.service
Then these two processes race:
- logind:
* save information including XDG_RUNTIME_DIR to /run/systemd/users/UID
- the subprocess of pid 1 responsible for user@service:
* start a 'systemd-user' PAM session, which reads XDG_RUNTIME_DIR
and puts it in the environment
* run systemd --user, which requires XDG_RUNTIME_DIR in the
environment
If logind wins the race, which usually happens, everything is fine;
but if the subprocesses of pid 1 win the race, which can happen
under load, then systemd --user exits unsuccessfully.
To avoid this race, we have to write out /run/systemd/users/UID
even though the service has not "officially" started yet;
previously this did an early-return without saving anything.
Record its state as OPENING in this case.
Bug: https://github.com/systemd/systemd/issues/232
Reviewed-by: Philip Withnall <philip.withnall@collabora.co.uk>
logind: rework display counting when detecting whether the system is docked
Previously, we'd just count connected displays, and if there were two or
more we assumed a "docked" state.
With this change we now:
- Only count external displays, ignore internal ones (which we detect by
checking the connector name against a whitelist of known external plug
types)
- Ignore connectors which are explicitly disabled
- Compare the count with >= 1 rather than >= 2 as before
This new logic has the benefit that systems that disconnect the internal
display when the lid is closed are better supported. Also, explicitly
disabled ports do not confuse the algorithm anymore.
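The new counting rule might look roughly like the sketch below. The struct, the function names and the prefix list are all assumptions for illustration; the text does not list the actual whitelist of external plug types.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct connector {
        const char *name;       /* connector name, e.g. "HDMI-A-1" or "eDP-1" */
        bool enabled;           /* false if explicitly disabled */
        bool connected;         /* true if a display is attached */
};

/* Illustrative whitelist only; the real list is not given in the text. */
static const char *const external_prefixes[] = { "HDMI-", "DP-", "VGA-", "DVI-" };

static bool connector_is_external(const char *name) {
        size_t n = sizeof(external_prefixes) / sizeof(external_prefixes[0]);

        for (size_t i = 0; i < n; i++)
                if (strncmp(name, external_prefixes[i], strlen(external_prefixes[i])) == 0)
                        return true;

        return false;
}

/* Docked if at least one external, enabled connector has a display. */
static bool is_docked(const struct connector *c, size_t n) {
        size_t count = 0;

        for (size_t i = 0; i < n; i++)
                if (c[i].connected && c[i].enabled && connector_is_external(c[i].name))
                        count++;

        return count >= 1;
}
```

Under this rule a laptop whose internal panel disappears when the lid is closed still counts as docked as long as one external display remains.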
David Herrmann [Wed, 17 Jun 2015 13:11:11 +0000 (15:11 +0200)]
man: add libudev man-pages (skeletons)
This adds man-pages for most of the libudev symbols we export. Similar
symbols are grouped together in a single man-page, with respective links
added. All man-pages contain the full skeleton including NAME, SYNOPSIS,
RETURN VALUE and SEE ALSO. However, most of them still lack the
DESCRIPTION part. This should be copied from the gtkdoc descriptions in
src/libudev/libudev*.[ch]. Any help is welcome! (the whole skeleton is
already done, so it's really just about the prose-part of the man-pages to
be written).
Missing from the man-pages are the following parts:
- udev_set_log_fn()
- udev_[gs]et_log_priority()
- udev_[gs]et_userdata()
- udev_list_entry_foreach()
- udev_device_get_seqnum()
- udev_device_get_usec_since_initialized()
- udev_util_encode_string()
These are considered legacy, afaik. If not, please feel free to add them
now!
Furthermore, udev-hwdb and udev-queue are not documented at all (for the
same reasons).
Daniel Mack [Wed, 17 Jun 2015 12:31:49 +0000 (14:31 +0200)]
core: execute: fix regression in pam_setup()
Commit 72c0a2c25 ("everywhere: port everything to sigprocmask_many()
and friends") reworked code tree-wide to use the new sigprocmask_many()
helper. In this, it caused a regression in pam_setup, because it
dropped a line to initialize the 'ss' signal mask which is later used
in sigwait().
While at it, move the variable declaration to an inner scope.
Eric Cook [Wed, 17 Jun 2015 11:41:24 +0000 (07:41 -0400)]
zsh-completion: _loginctl - general bug fixes
1) the iterator `fun' has a local scope. after running the completer,
it will no longer be defined.
2) use _describe instead of calling compadd. Using compadd without
calling _description or something similar before, restricts the
user's ability to customize what is presented to them.
zstyle ':completion:*' format 'Completing %d'
- now displays a header showing what is being completed.
zstyle ':completion::complete:loginctl-*::users' users user1 user2
- allows the user to manually specify which users are offered
zstyle :completion::complete:loginctl-kill-user:\* \
ignored-patterns '(100<0-4>|user1)'
- selectively ignore some users when completing loginctl kill-user
<tab>
Sessions, UIDs now have descriptions when selecting them.
3) removed the call to _loginctl_all_seats in _loginctl_attach(), since
_loginctl_seats calls it a second time, right before adding matches.
There isn't a noticeable difference doing this.
Simon McVittie [Wed, 17 Jun 2015 10:23:46 +0000 (11:23 +0100)]
Stop talking about the "XDG" version of basename()
XDG refers to X Desktop Group, a former name for freedesktop.org.
This group is responsible for specifications like basedirs,
.desktop files and icon naming, but as far as I know, it has never
tried to redefine basename().
I think these references were meant to say XPG (X/Open Portability
Guide), a precursor of POSIX. POSIX is better-known and less easily
confused with XDG, and is how the basename(3) man page describes
the libgen.h version of basename().
The other version of basename() is glibc-specific and is described
in basename(3) as "the GNU version"; specifically mention that
version, to disambiguate.
sd-bus: suppress installing local bus matches server side
Matches that can only match against messages from the
org.freedesktop.DBus.Local service (or the local interfaces or path)
should never be installed server side, suppress them hence.
Similarly, on kdbus matches that can only match driver messages shouldn't
be passed to the kernel.
David Herrmann [Tue, 16 Jun 2015 23:15:09 +0000 (01:15 +0200)]
sd-event: make errors on EPOLL_CTL_DEL pseudo-fatal
If we call EPOLL_CTL_DEL, we *REALLY* expect the file-descriptor to be
present in that given epoll-set. We actually track such state via our
s->io.registered flag, so it better be true.
Make sure if that's not true, we treat it similar to assert_return() (ie.,
print a loud warning).
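A rough sketch of the idea follows: treat a failing EPOLL_CTL_DEL as a bug worth a loud message rather than ignoring it. The wrapper epoll_del_checked() is hypothetical and is not sd-event's code; it logs with fprintf() where sd-event would use its own logging.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>

/* Hypothetical wrapper: remove fd from the epoll set and complain loudly
 * if the kernel does not know about it. Returns 0 or a negative errno. */
static int epoll_del_checked(int epfd, int fd) {
        if (epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL) < 0) {
                int saved = errno;

                /* Our bookkeeping said the fd was registered, so this
                 * indicates a programming error somewhere. */
                fprintf(stderr, "EPOLL_CTL_DEL failed for fd %d: %s\n",
                        fd, strerror(saved));
                return -saved;
        }

        return 0;
}
```

Removing the same fd twice is the simplest way to trigger the warning: the second call fails because the entry is already gone.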
David Herrmann [Tue, 16 Jun 2015 21:36:36 +0000 (23:36 +0200)]
udev: don't close FDs before dropping them from epoll
Make sure we never close fds before we drop their related event-source.
Closing first will cause horrible disruptions if the fd-num is re-used by someone
else. Under normal conditions, this should not cause any problems as the
close() will drop the fd from the epoll-set automatically. However, this
changes if you have any child processes with a copy of that fd.
This fixes issue #163.
Background:
If you create an epoll-set via epoll_create() (let's call it 'EFD')
you can add file-descriptors to it to watch for events. Whenever
you call EPOLL_CTL_ADD on a file-descriptor you want to watch, the
kernel looks up the attached "struct file" pointer, that this FD
refers to. This combination of the FD-number and the "struct file"
pointer is used as key to link it into the epoll-set (EFD).
This means, if you duplicate your file-descriptor, you can watch
this file-descriptor, too (because the duplicate will have a
different FD-number, hence, the combination of FD-number and
"struct file" differs from before).
If you want to stop watching an FD, you use EPOLL_CTL_DEL and pass
the FD to the kernel. The kernel again looks up your
file-descriptor in your FD-table to find the linked "struct file".
This FD-number and "struct file" combination is then dropped from
the epoll-set (EFD).
Last, but not least: If you close a file-descriptor that is linked
to an epoll-set, the kernel does *NOTHING* regarding the
epoll-set. This is a vital observation! Because this means, your
epoll_wait() calls will still return the metadata you used to
watch/subscribe your file-descriptor to events.
There is one exception to this rule: If the file-descriptor that
you just close()ed was the last FD that referred to the underlying
"struct file", then _all_ epoll-set watches/subscriptions are
destroyed. Hence, if you never dup()ed your FD, then a simple
close() will also unsubscribe it from any epoll-set.
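The close() behavior described in the last two paragraphs can be observed with a small experiment. This is only a sketch: the function events_after_close() is a hypothetical name, and a pipe stands in for whatever file-descriptor is being watched.

```c
#include <stdbool.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register the read end of a pipe, make it readable, then close() it
 * without EPOLL_CTL_DEL. Returns the number of events epoll_wait() still
 * reports, or -1 if the setup failed. With keep_dup, a dup() of the read
 * end stays open, so the "struct file" outlives the close(). */
static int events_after_close(bool keep_dup) {
        struct epoll_event ev = { .events = EPOLLIN }, out;
        int p[2], efd, copy = -1, n = -1;

        if (pipe(p) < 0)
                return -1;

        efd = epoll_create1(0);
        if (efd < 0)
                goto finish;

        ev.data.fd = p[0];
        if (epoll_ctl(efd, EPOLL_CTL_ADD, p[0], &ev) < 0)
                goto finish;

        /* Write while the read end is still open to avoid SIGPIPE. */
        if (write(p[1], "x", 1) != 1)
                goto finish;

        if (keep_dup) {
                copy = dup(p[0]);
                if (copy < 0)
                        goto finish;
        }

        close(p[0]);            /* no EPOLL_CTL_DEL beforehand */
        p[0] = -1;

        n = epoll_wait(efd, &out, 1, 0);

finish:
        if (copy >= 0)
                close(copy);
        if (p[0] >= 0)
                close(p[0]);
        close(p[1]);
        if (efd >= 0)
                close(efd);
        return n;
}
```

On Linux, events_after_close(true) reports one event for an fd number the caller has already closed, while events_after_close(false) reports none because the last reference to the "struct file" went away.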
With this in mind, let's look at fork():
Assume you have an epoll-set (EFD) and a bunch of FDs
subscribed to events on that EFD. If you now call fork(),
the new process gets a copy of your file-descriptor table.
This means, the whole table is copied and the "struct
file" reference of each FD is increased by 1. It is
important to notice that the FD-numbers in the child are
exactly the same as in the parent (eg., FD #5 in the child
refers to the same "struct file" as FD #5 in the parent).
This means, if the child calls EPOLL_CTL_DEL on an FD, the
kernel will look up the linked "struct file" and drop the
FD-number and "struct file" combination from the epoll-set
(EFD). However, this will effectively drop the
subscription that was installed by the parent.
To sum up: even though the child gets a duplicate of the
EFD and all FDs, the subscriptions in the EFD are *NOT*
duplicated!
Now, with this in mind, let's look at what udevd does:
Udevd has a bunch of file-descriptors that it watches in its
sd-event main-loop. Whenever a uevent is received, the event is
dispatched on its workers. If no suitable worker is present, a new
worker is fork()ed to handle the event. Inside of this worker, we
try to free all resources we inherited. However, the fork() call
is done from a call-stack that is never unwound. Therefore, this
call stack might own references that it drops once it is left.
Those references we cannot deduce from the fork()'ed process;
effectively causing us to leak objects in the worker (eg., the
call to sd_event_dispatch() that dispatched our uevent owns a
reference to the sd_event object it used; and drops it again once
the function is left).
(Another example is udev_monitor_ref() for each 'worker' that is
also inherited by all children; thus keeping the udev-monitor and
the uevent-fd alive in all children (which is the real cause for
bug #163))
(The extreme variant is sd_event_source_unref(), which explicitly
keeps event-sources alive, if they're currently dispatched,
knowing that the dispatcher will free the event once done. But
if the dispatcher is in the parent, the child will never ever
free that object, thus leaking it)
This is usually not an issue. However, if such an object has a
file-descriptor embedded, this FD is left open and never closed in
the child.
In manager_exit(), if we now destroy an object (i.e., close its embedded
file-descriptor) before we destroy its related sd_event_source, then
sd-event will not be able to drop the FD from the epoll-set (EFD). This
is, because the FD is no longer valid at the time we call EPOLL_CTL_DEL.
Hence, the kernel cannot figure out the linked "struct file" and thus
cannot remove the FD-number plus "struct file" combination; effectively
leaving the subscription in the epoll-set.
Since we leak the uevent-fd in the children, they retain a copy of the FD
pointing to the same "struct file". Thus, the EFD-subscriptions are not
automatically removed by close() (as described above). Therefore, the main
daemon will still get its metadata back on epoll_wait() whenever an event
occurs (even though it already freed the metadata). This then causes the
use-after-free bug described in #163.
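The safe teardown order can be sketched as follows. The struct watch and the function watch_release() are hypothetical and far simpler than the udev and sd-event objects involved; the point is only the sequence of the two steps.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

struct watch {
        int efd;                /* the epoll-set */
        int fd;                 /* the watched file-descriptor */
        bool registered;        /* true while fd is subscribed in efd */
};

/* Hypothetical teardown: unsubscribe first, close afterwards. */
static int watch_release(struct watch *w) {
        int r = 0;

        assert(w);

        /* Step 1: drop the subscription while the FD-number still
         * resolves to its "struct file". */
        if (w->registered) {
                if (epoll_ctl(w->efd, EPOLL_CTL_DEL, w->fd, NULL) < 0)
                        r = -errno;
                w->registered = false;
        }

        /* Step 2: only now is it safe to close, even if another
         * process still holds a copy of the fd. */
        if (w->fd >= 0) {
                close(w->fd);
                w->fd = -1;
        }

        return r;
}
```

With the steps reversed, EPOLL_CTL_DEL would fail on the already closed fd and the subscription would survive for as long as any copy of the fd stays open elsewhere.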
This patch fixes the order in which we destruct objects and related
sd-event-sources. Some open questions remain:
* Why does source_io_unregister() not warn on EPOLL_CTL_DEL failures?
This really needs to be turned into an assert_return().
* udevd really should not leak file-descriptors into its children. Fixing
this would *not* have prevented this bug, though (since the child-setup
is still async).
It's non-trivial to fix this, though. The stack-context of the caller
cannot be unwound, so we cannot figure out temporary refs. Maybe it's
time to exec() the udev-workers?
* Why does the kernel not copy FD-subscriptions across fork()?
Or at least drop subscriptions if you close() your FD (it uses the
FD-number as key, so it better subscribe to it)?
Or it better used
FD+"struct file_table*"+"struct file*"
as key to not allow the children to share the subscription table..
*sigh*
Seems like we have to live with that API forever.
Djalal Harouni [Tue, 16 Jun 2015 16:30:45 +0000 (17:30 +0100)]
nspawn: check if kernel supports userns as early as possible
If the kernel does not support user namespaces then one of the children
created by nspawn parent will fail at clone(CLONE_NEWUSER) with the
generic error EINVAL and without logging the error. At the same time
the parent may also try to setup the user namespace and will fail with
another error.
To improve this, check if the kernel supports user namespace as early
as possible.
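One possible shape for such an early check is sketched below. The probe path /proc/self/ns/user and the function name userns_supported() are assumptions; the text does not say how nspawn performs the test.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical early check. Returns 1 if the kernel appears to support
 * user namespaces, 0 if it does not, and a negative errno on other
 * failures. Assumption: the namespace link /proc/self/ns/user is only
 * present on kernels built with user namespace support. */
static int userns_supported(void) {
        if (access("/proc/self/ns/user", F_OK) < 0)
                return errno == ENOENT ? 0 : -errno;

        return 1;
}
```

Running a check like this before any child is cloned lets the parent fail with a clear message instead of the generic EINVAL from clone().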
Tom Gundersen [Tue, 16 Jun 2015 14:22:16 +0000 (16:22 +0200)]
tmpfiles: silently ignore failed removal of btrfs submount from non-dir
This fixes:
Jun 16 16:00:20 tomegun-x2402 systemd-tmpfiles[233]: rm_rf(/var/lib/machines/.#fedora.lck): Not a directory
Jun 16 16:00:20 tomegun-x2402 systemd-tmpfiles[233]: rm_rf(/var/lib/machines/.#Fedora-Cloud-Base-20141203-21.x86_64.raw.lck): Not a directory
tmpfiles: automatically remove old machine snapshots at boot
Remove old temporary snapshots, but only at boot. Ideally we'd have
"self-destroying" btrfs snapshots that go away when the last
reference to them does. To mimic a scheme like this at least remove the
old snapshots on fresh boots, where we know they cannot be referenced
anymore. Note that we actually remove all temporary files in
/var/lib/machines/ at boot, which should be safe since the directory has
defined semantics. In the root directory (where systemd-nspawn
--ephemeral places snapshots) we are more strict, to avoid removing
unrelated temporary files.
This also splits out nspawn/container related tmpfiles bits into a new
tmpfiles snippet, systemd-nspawn.conf.
util: when creating temporary file names, allow including extra id string in it
This adds a "char *extra" parameter to tempfn_xxxxxx(), tempfn_random(),
tempfn_random_child(). If non-NULL this string is included in the middle
of the newly created file name. This is useful for being able to
distinguish the kind of temporary file when we see one.
This also adds tests for the three calls.
For now, we don't make use of this at all, but port all users over.
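As a rough illustration of where the extra string ends up, the sketch below builds a template name for a given base name. The function temp_name_with_extra() is hypothetical and the layout ".#<extra><name>XXXXXX" is an assumption; the text only says that the string lands in the middle of the generated name.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: write ".#<extra><name>XXXXXX" into out.
 * 'extra' may be NULL, in which case nothing is inserted. */
static int temp_name_with_extra(const char *name, const char *extra,
                                char *out, size_t size) {
        int k;

        assert(name);
        assert(out);

        k = snprintf(out, size, ".#%s%sXXXXXX", extra ? extra : "", name);
        if (k < 0 || (size_t) k >= size)
                return -ENAMETOOLONG;   /* result did not fit */

        return 0;
}
```

Passing NULL for 'extra' yields the same name as before the parameter existed, which is how existing callers can be ported without a change in behavior.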
btrfs-util: when snapshotting make sure we don't descend into subvolumes we just created
We already had a safety check in place that we don't end up descending
to the original subvolume again, but we also should avoid descending in
the newly created one.
This is particularly important if we make a snapshot below its source,
like we do in "systemd-nspawn --ephemeral -D /".