Ivan Shapovalov [Wed, 30 Aug 2017 16:49:07 +0000 (19:49 +0300)]
cryptsetup-generator: do not bind to the decrypted device unit (#6538)
This breaks things when the decrypted device is not immediately
`SYSTEMD_READY=1` (e. g. when a multi-device btrfs system is placed on
multiple cryptsetup devices).
systemd-shutdown is run after the network is stopped,
so remounting a network filesystem read-only can hang.
A simple umount is the most useful thing that can
be done for a network filesystem once the network is down.
Alan Jenkins [Wed, 30 Aug 2017 16:47:40 +0000 (17:47 +0100)]
man: fix note for `systemctl enable --global` (#6592)
The last sentence in the paragraph described the behaviour of `--global`. But "the last case" we listed was "only this boot", which does not match... This was the fifth case described, but there are only _four_ different option names. Fix it.
Alan Jenkins [Wed, 30 Aug 2017 16:20:23 +0000 (17:20 +0100)]
units: starting suspend.target should not fail when suspend is successful (#6678)
and the same for hibernate.target and hybrid-sleep.target.
Tested with both sucessful and unsuccessful suspends. The result of the
start job was correct in both cases. Closes #6419 (a regression in v233
and v234).
> suspend is unsual for a target, because it has to stop itself once it's
> started. Otherwise you couldn't start it again, so you could only suspend
> once! Currently that's implemented using BindsTo=systemd-sleep.service.
> Meaning it pulls in systemd-sleep.service to do the actual suspend, and
> then de-activates afterwards. But the behaviour of BindsTo was changed
> recently (not without some issues during development) - maybe this bug
> is caused by poettering/systemd@631b676 which I think was added in
> release v233.
>
> sleep.target (see man systemd.special) has the same need, but it
> implements it differently. It simply has StopWhenUnneeded=yes.
This commit switches suspend.target etc. to the approach used by
sleep.target.
Alan Jenkins [Wed, 30 Aug 2017 16:11:31 +0000 (17:11 +0100)]
sulogin-shell: remove ineffective job mode option from `systemctl isolate` (#6627)
`systemctl default` uses job mode `isolate` (see `action_table`).
The job mode option is ignored.
Note that exiting the emergency shell service by using e.g.
`systemctl isolate multi-user` or `systemctl start multi-user.target`
already kills `emergency.service`. There's only a potential conflict
between your command and the command in systemd-sulogin-shell if you run
something like `systemctl start --no-block multi-user.target; exit`.
Which is nothing like what we told them to do :).
Franck Bui [Wed, 30 Aug 2017 15:16:16 +0000 (17:16 +0200)]
device: make sure to remove all device units sharing the same sysfs path (#6679)
When a device is unplugged all device units sharing the same sysfs path
pointing to that device are supposed to be removed.
However it didn't work since while iterating the device unit list containing
all the relevant units, each unit was removed during each iteration of
LIST_FOREACH. However LIST_FOREACH doesn't support this use case and
LIST_FOREACH_SAFE must be use instead.
Tom Gundersen [Wed, 30 Aug 2017 11:09:03 +0000 (13:09 +0200)]
sd-bus: socket - only transmit auxillary FDs once (#6603)
If a message is too large to fit into the output buffer, it will be
transmitted to the kernel in several chunks. However, the FDs must
only ever be transmitted once or they will bereceived by the remote
end repeatedly.
The D-Bus specification disallows several sets of FDs attached to
one message, however, the reference implementation of D-Bus will
not reject such a message, rather it will reassign the duplicate
FDs to subsequent FD-carrying messages.
This attaches the FD array only to the first byte of the message.
Michal Sekletar [Wed, 30 Aug 2017 11:07:43 +0000 (13:07 +0200)]
README: note that installing valgrind-devel maybe useful to developers (#6502)
Commit also mentions that when running under valgrind we actually don't
execve() systemd-shutdown. We have a comment about this in the code, but
being upfront about this change in behavior doesn't hurt.
Jon Ringle [Wed, 30 Aug 2017 09:38:00 +0000 (05:38 -0400)]
networkd: Honor configured DHCP ClientIdentifier on link_update (#6622)
We have an embedded board with a couple of ethernet ports. From the kernel
log, I can see that the ethernet drivers are obtaining their correct MAC
address, but for some reason, at first systemd-networkd doesn't see the
mac address for the ethernet port at the time that it looks at
dhcp_client_identifier configuration (it has 00:00:00:00:00:00 for mac).
Later on, systemd-networkd gets a link_update() call, and at this time, it
has the correct mac address for the ethernet port. However, in link_update()
the dhcp_client_identifier configuration is not being considered, and a call
to sd_dhcp_client_set_iaid_duid() is being done always
seccomp: default to something resembling the current personality when locking it
Let's lock the personality to the currently set one, if nothing is
specifically specified. But do so with a grain of salt, and never
default to any exotic personality here, but only PER_LINUX or
PER_LINUX32.
Add LockPersonality boolean to allow locking down personality(2)
system call so that the execution domain can't be changed.
This may be useful to improve security because odd emulations
may be poorly tested and source of vulnerabilities, while
system services shouldn't need any weird personalities.
Alan Jenkins [Tue, 29 Aug 2017 09:56:32 +0000 (10:56 +0100)]
fileio: rename function parameter to avoid masking global symbol
> glibc exports a function called sync(), we should probably avoid
> overloading that as a variable here locally (gcc even used to warn about
> that, not sure why it doesn't anymore), to avoid confusion around what
> "if (sync)" actually means
Felipe Sateler [Mon, 28 Aug 2017 16:49:03 +0000 (13:49 -0300)]
shared: Add a linker script so that all functions are tagget @SD_SHARED instead of @Base (#6669)
This helps prevent symbol collisions with other programs and libraries. In particular,
because PAM modules are loaded into the process that is creating the session, and
systemd creates PAM sessions, the potential for collisions is high.
Disambiguate all systemd calls by tagging a 'version' SD_SHARED.
Yu Watanabe [Sun, 27 Aug 2017 07:34:53 +0000 (16:34 +0900)]
journal-remote: show error message if output file name does not end with .journal
`journalctl -o export | systemd-journal-remote -o /tmp/dir -`
gives the following error messages.
```
Failed to open output journal /tmp/dir: Invalid argument
Failed to get writer for source stdin: Invalid argument
Failed to create source for fd:0 (stdin): Invalid argument
```
And these are hard to understand what is the problem.
This commit makes journal-remote check whether the output file name
ends with .journal suffix or not, and if not, output error message.
Michal Sekletar [Fri, 25 Aug 2017 13:36:10 +0000 (15:36 +0200)]
service: attempt to execute next main command only for oneshot services (#6619)
This commit fixes crash described in
https://github.com/systemd/systemd/issues/6533
Multiple ExecStart lines are allowed only for oneshot services
anyway so it doesn't make sense to call service_run_next_main() with
services of type other than SERVICE_ONESHOT.
Referring back to reproducer from the issue, previously we didn't observe
this problem because s->main_command was reset after daemon-reload hence
we never reached the assert statement in service_run_next_main().
Alan Jenkins [Thu, 17 Aug 2017 16:09:44 +0000 (17:09 +0100)]
"Don't fear the fsync()"
For files which are vital to boot
1. Avoid opening any window where power loss will zero them out or worse.
I know app developers all coded to the ext3 implementation, but
the only formal documentation we have says we're broken if we actually
rely on it. E.g.
* `man mount`, search for `auto_da_alloc`.
* http://www.linux-mtd.infradead.org/faq/ubifs.html#L_atomic_change
* https://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/
2. If we tell the kernel we're interested in writing them to disk, it will
tell us if that fails. So at minimum, this means we play our part in
notifying the user about errors.
I refactored error-handling in `udevadm-hwdb` a little. It turns out I did
exactly the same as had already been done in the `systemd-hwdb` version,
i.e. commit d702dcd.
Alan Jenkins [Tue, 15 Aug 2017 13:22:44 +0000 (14:22 +0100)]
units: order service(s) before udevd, not udev-trigger (coldplug)
Since hotplugs happen as soon as udevd is started, there is not much sense
in giving udev-trigger an After= dependency on any service. The device
could be hotplugged before coldplug starts.
This is intended to avoid the race window where we create the hwdb with
the wrong selinux context (then fix it up afterwards).
https://github.com/systemd/systemd/issues/3458#issuecomment-322444107
core: add two new special ExecStart= character prefixes
This patch adds two new special character prefixes to ExecStart= and
friends, in addition to the existing "-", "@" and "+":
"!" → much like "+", except with a much reduced effect as it only
disables the actual setresuid()/setresgid()/setgroups() calls, but
leaves all other security features on, including namespace
options. This is very useful in combination with
RuntimeDirectory= or DynamicUser= and similar option, as a user
is still allocated and used for the runtime directory, but the
actual UID/GID dropping is left to the daemon process itself.
This should make RuntimeDirectory= a lot more useful for daemons
which insist on doing their own privilege dropping.
"!!" → Similar to "!", but on systems supporting ambient caps this
becomes a NOP. This makes it relatively straightforward to write
unit files that make use of ambient capabilities to let systemd
drop all privs while retaining compatibility with systems that
lack ambient caps, where priv dropping is the left to the daemon
codes themselves.
This is an alternative approach to #6564 and related PRs.
This new group lists all UID/GID credential changing syscalls (which are
quite a number these days). This will become particularly useful in a
later commit, which uses this group to optionally permit user credential
changing to daemons in case ambient capabilities are not available.
These booleans simply store whether selinux/apparmor/smack are supposed
ot be used, and chache the various mac_xyz_use() calls before we
transition into the namespace, hence let's use the same verb for the
variables and the functions: "use"
Let's merge three if blocks that shall only run when sandboxing is applied
into one.
Note that this changes behaviour in one corner case: PrivateUsers=1 is
now honours both PermissionsStartOnly= and the "+" modifier in
ExecStart=, and not just the former, as before. This was an oversight,
so let's fix this now, at a point in time the option isn't used much
yet.
"Permissions" was a bit of a misnomer, as it suggests that UNIX file
permission bits are adjusted, which aren't really changed here. Instead,
this is about UNIX credentials such as users or groups, as well as
namespacing, hence let's use a more generic term here, without any
misleading reference to UNIX file permissions: "sandboxing", which shall
refer to all kinds of sandboxing technologies, including UID/GID
dropping, selinux relabelling, namespacing, seccomp, and so on.
The new unit_set_exec_params() call is to units what
manager_set_exec_params() is to the manager object: it initializes the
various fields from the relevant generic properties set.
execute: add one more ExecFlags flag, for controlling unconditional directory chowning
Let's decouple the Manager object from the execution logic a bit more
here too, and simply pass along the fact whether we should
unconditionally chown the runtime/... directories via the ExecFlags
field too.
execute: let's decouple execute.c a bit from the unit logic
Let's try to decouple the execution engine a bit from the Unit/Manager
concept, and hence pass one more flag as part of the ExecParameters flags
field.
This adds a per-service restart counter. Each time an automatic
restart is scheduled (due to Restart=) it is increased by one. Its
current value is exposed over the bus as NRestarts=. It is also logged
(in a structured, recognizable way) on each restart.
Note that this really only counts automatic starts triggered by Restart=
(which it nicely complements). Manual restarts will reset the counter,
as will explicit calls to "systemctl reset-failed". It's supposed to be
a tool for measure the automatic restart feature, and nothing else.
Alan Jenkins [Wed, 9 Aug 2017 17:56:26 +0000 (18:56 +0100)]
units: console-getty.service: use the default RestartSec
> Note that console-getty.service as more uses than just containers. The
> idea is that it may be used as alternative to the whole VC/logind stuff,
> if all you need is a console on /dev/console, even on physical devices.
This means we want to remove RestartSec=0, for serial systems.
See 4bf0432 "units/serial-getty@.service: use the default RestartSec".
Alan Jenkins [Mon, 7 Aug 2017 18:24:32 +0000 (19:24 +0100)]
units: add Conflicts=rescue.service to container-getty@.service
The traditional runlevel 1 is "single user mode", and shuts down all but
the main console. In systemd, rescue.target provides runlevel1.target.
But it did not shut down logins on secondary consoles... if systemd was
running in a container.
I don't think we strictly need to change this. But when you look at both
container-getty@.service and getty@.service, you see that both have
IgnoreOnIsolate, but only the latter has Conflicts=rescue.service.
This also makes rescue.target in a container consistent with
emergency.target. In the latter case, the gettys were already stopped,
because they have a Requires dependency on sysinit.target.
Alan Jenkins [Wed, 9 Aug 2017 13:43:41 +0000 (14:43 +0100)]
units/console-getty.service: comment reason for ConditionPathExists
Currently we have 4 getty services. 1 has a BindsTo dependency on a
device unit. 3 have ConditionPathExists, but the reason is different in
every single one.
* Add comment to console-getty@.service (see commit 1b41981d)
* getty@.service is already commented
* container-getty.service is not strictly correct, as I realized while
trying to compose a comment. Reported as #6584.
pam_logind: skip leading /dev/ from PAM_TTY field before passing it on
Apparently, PAM documents that the PAM_TTY should come with a /dev
prefix, but we don't expect it so far, except that Wayland ends up
setting it after all, the way the docs suggest. Hence, let's simply drop
the /dev prefix if it is there.
This new helper removes a leading /dev if there is one. We have code
doing this all over the place, let's unify this, and correct it while
we are at it, by using path_startswith() rather than startswith() to
drop the prefix.