seccomp: default to something resembling the current personality when locking it
Let's lock the personality to the currently set one, if nothing is
specifically specified. But do so with a grain of salt, and never
default to any exotic personality here, but only PER_LINUX or
PER_LINUX32.
Add LockPersonality boolean to allow locking down personality(2)
system call so that the execution domain can't be changed.
This may be useful to improve security because odd emulations
may be poorly tested and source of vulnerabilities, while
system services shouldn't need any weird personalities.
Alan Jenkins [Tue, 29 Aug 2017 09:56:32 +0000 (10:56 +0100)]
fileio: rename function parameter to avoid masking global symbol
> glibc exports a function called sync(), we should probably avoid
> overloading that as a variable here locally (gcc even used to warn about
> that, not sure why it doesn't anymore), to avoid confusion around what
> "if (sync)" actually means
Felipe Sateler [Mon, 28 Aug 2017 16:49:03 +0000 (13:49 -0300)]
shared: Add a linker script so that all functions are tagget @SD_SHARED instead of @Base (#6669)
This helps prevent symbol collisions with other programs and libraries. In particular,
because PAM modules are loaded into the process that is creating the session, and
systemd creates PAM sessions, the potential for collisions is high.
Disambiguate all systemd calls by tagging a 'version' SD_SHARED.
Michal Sekletar [Fri, 25 Aug 2017 13:36:10 +0000 (15:36 +0200)]
service: attempt to execute next main command only for oneshot services (#6619)
This commit fixes crash described in
https://github.com/systemd/systemd/issues/6533
Multiple ExecStart lines are allowed only for oneshot services
anyway so it doesn't make sense to call service_run_next_main() with
services of type other than SERVICE_ONESHOT.
Referring back to reproducer from the issue, previously we didn't observe
this problem because s->main_command was reset after daemon-reload hence
we never reached the assert statement in service_run_next_main().
Alan Jenkins [Thu, 17 Aug 2017 16:09:44 +0000 (17:09 +0100)]
"Don't fear the fsync()"
For files which are vital to boot
1. Avoid opening any window where power loss will zero them out or worse.
I know app developers all coded to the ext3 implementation, but
the only formal documentation we have says we're broken if we actually
rely on it. E.g.
* `man mount`, search for `auto_da_alloc`.
* http://www.linux-mtd.infradead.org/faq/ubifs.html#L_atomic_change
* https://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/
2. If we tell the kernel we're interested in writing them to disk, it will
tell us if that fails. So at minimum, this means we play our part in
notifying the user about errors.
I refactored error-handling in `udevadm-hwdb` a little. It turns out I did
exactly the same as had already been done in the `systemd-hwdb` version,
i.e. commit d702dcd.
core: add two new special ExecStart= character prefixes
This patch adds two new special character prefixes to ExecStart= and
friends, in addition to the existing "-", "@" and "+":
"!" → much like "+", except with a much reduced effect as it only
disables the actual setresuid()/setresgid()/setgroups() calls, but
leaves all other security features on, including namespace
options. This is very useful in combination with
RuntimeDirectory= or DynamicUser= and similar option, as a user
is still allocated and used for the runtime directory, but the
actual UID/GID dropping is left to the daemon process itself.
This should make RuntimeDirectory= a lot more useful for daemons
which insist on doing their own privilege dropping.
"!!" → Similar to "!", but on systems supporting ambient caps this
becomes a NOP. This makes it relatively straightforward to write
unit files that make use of ambient capabilities to let systemd
drop all privs while retaining compatibility with systems that
lack ambient caps, where priv dropping is the left to the daemon
codes themselves.
This is an alternative approach to #6564 and related PRs.
This new group lists all UID/GID credential changing syscalls (which are
quite a number these days). This will become particularly useful in a
later commit, which uses this group to optionally permit user credential
changing to daemons in case ambient capabilities are not available.
These booleans simply store whether selinux/apparmor/smack are supposed
ot be used, and chache the various mac_xyz_use() calls before we
transition into the namespace, hence let's use the same verb for the
variables and the functions: "use"
Let's merge three if blocks that shall only run when sandboxing is applied
into one.
Note that this changes behaviour in one corner case: PrivateUsers=1 is
now honours both PermissionsStartOnly= and the "+" modifier in
ExecStart=, and not just the former, as before. This was an oversight,
so let's fix this now, at a point in time the option isn't used much
yet.
"Permissions" was a bit of a misnomer, as it suggests that UNIX file
permission bits are adjusted, which aren't really changed here. Instead,
this is about UNIX credentials such as users or groups, as well as
namespacing, hence let's use a more generic term here, without any
misleading reference to UNIX file permissions: "sandboxing", which shall
refer to all kinds of sandboxing technologies, including UID/GID
dropping, selinux relabelling, namespacing, seccomp, and so on.
The new unit_set_exec_params() call is to units what
manager_set_exec_params() is to the manager object: it initializes the
various fields from the relevant generic properties set.
execute: add one more ExecFlags flag, for controlling unconditional directory chowning
Let's decouple the Manager object from the execution logic a bit more
here too, and simply pass along the fact whether we should
unconditionally chown the runtime/... directories via the ExecFlags
field too.
execute: let's decouple execute.c a bit from the unit logic
Let's try to decouple the execution engine a bit from the Unit/Manager
concept, and hence pass one more flag as part of the ExecParameters flags
field.
This adds a per-service restart counter. Each time an automatic
restart is scheduled (due to Restart=) it is increased by one. Its
current value is exposed over the bus as NRestarts=. It is also logged
(in a structured, recognizable way) on each restart.
Note that this really only counts automatic starts triggered by Restart=
(which it nicely complements). Manual restarts will reset the counter,
as will explicit calls to "systemctl reset-failed". It's supposed to be
a tool for measure the automatic restart feature, and nothing else.
Alan Jenkins [Wed, 9 Aug 2017 17:56:26 +0000 (18:56 +0100)]
units: console-getty.service: use the default RestartSec
> Note that console-getty.service as more uses than just containers. The
> idea is that it may be used as alternative to the whole VC/logind stuff,
> if all you need is a console on /dev/console, even on physical devices.
This means we want to remove RestartSec=0, for serial systems.
See 4bf0432 "units/serial-getty@.service: use the default RestartSec".
Alan Jenkins [Mon, 7 Aug 2017 18:24:32 +0000 (19:24 +0100)]
units: add Conflicts=rescue.service to container-getty@.service
The traditional runlevel 1 is "single user mode", and shuts down all but
the main console. In systemd, rescue.target provides runlevel1.target.
But it did not shut down logins on secondary consoles... if systemd was
running in a container.
I don't think we strictly need to change this. But when you look at both
container-getty@.service and getty@.service, you see that both have
IgnoreOnIsolate, but only the latter has Conflicts=rescue.service.
This also makes rescue.target in a container consistent with
emergency.target. In the latter case, the gettys were already stopped,
because they have a Requires dependency on sysinit.target.
Alan Jenkins [Wed, 9 Aug 2017 13:43:41 +0000 (14:43 +0100)]
units/console-getty.service: comment reason for ConditionPathExists
Currently we have 4 getty services. 1 has a BindsTo dependency on a
device unit. 3 have ConditionPathExists, but the reason is different in
every single one.
* Add comment to console-getty@.service (see commit 1b41981d)
* getty@.service is already commented
* container-getty.service is not strictly correct, as I realized while
trying to compose a comment. Reported as #6584.
pam_logind: skip leading /dev/ from PAM_TTY field before passing it on
Apparently, PAM documents that the PAM_TTY should come with a /dev
prefix, but we don't expect it so far, except that Wayland ends up
setting it after all, the way the docs suggest. Hence, let's simply drop
the /dev prefix if it is there.
This new helper removes a leading /dev if there is one. We have code
doing this all over the place, let's unify this, and correct it while
we are at it, by using path_startswith() rather than startswith() to
drop the prefix.
William Douglas [Wed, 9 Aug 2017 15:53:03 +0000 (08:53 -0700)]
tmpfiles: Allow create symlink on directories (#6039)
Currently if tmpfiles is run with force on symlink creation but there already
exists a directory at that location, the creation will fail. This change
updates the behavior to remove the directory with rm_fr and then attempts to
create the symlink again.
tests: when running a manager object in a test, migrate to private cgroup subroot first (#6576)
Without this "meson test" will end up running all tests in the same
cgroup root, and they all will try to manage it. Which usually isn't too
bad, except when they end up clearing up each other's cgroups. This race
is hard to trigger but has caused various CI runs to fail spuriously.
With this change we simply move every test that runs a manager object
into their own private cgroup. Note that we don't clean up the cgroup at
the end, we leave that to the cgroup manager around it.
This fixes races that become visible by test runs throwing out errors
like this:
```
exec-systemcallfilter-failing.service: Passing 0 fds to service
exec-systemcallfilter-failing.service: About to execute: /bin/echo 'This should not be seen'
exec-systemcallfilter-failing.service: Forked /bin/echo as 5693
exec-systemcallfilter-failing.service: Changed dead -> start
exec-systemcallfilter-failing.service: Failed to attach to cgroup /exec-systemcallfilter-failing.service: No such file or directory
Received SIGCHLD from PID 5693 ((echo)).
Child 5693 ((echo)) died (code=exited, status=219/CGROUP)
exec-systemcallfilter-failing.service: Child 5693 belongs to exec-systemcallfilter-failing.service
exec-systemcallfilter-failing.service: Main process exited, code=exited, status=219/CGROUP
exec-systemcallfilter-failing.service: Changed start -> failed
exec-systemcallfilter-failing.service: Unit entered failed state.
exec-systemcallfilter-failing.service: Failed with result 'exit-code'.
exec-systemcallfilter-failing.service: cgroup is empty
Assertion 'service->main_exec_status.status == status_expected' failed at ../src/src/test/test-execute.c:71, function check(). Aborting.
```
bengal [Tue, 8 Aug 2017 16:55:31 +0000 (18:55 +0200)]
dhcp-network: adjust sockaddr length for addresses longer than 8 bytes (#6527)
An infiniband hardware address is 20 bytes, but sockaddr_ll.sll_addr is only 8
bytes. Explicitly ensure that sockaddr_union has enough space for infiniband
addresses, even if they run over sockaddr_ll and add a macro to compute the
proper size to pass to kernel.
sd-bus: free everything when bus_set_address_user fails (#6552)
Fixes:
```
$ env -i valgrind --leak-check=full ./build/test-bus-chat
...
==7763== 1,888 (1,824 direct, 64 indirect) bytes in 1 blocks are
definitely lost in loss record 2 of 2
==7763== at 0x4C2FA50: calloc (vg_replace_malloc.c:711)
==7763== by 0x4F8FF9A: sd_bus_new (sd-bus.c:175)
==7763== by 0x4F938BF: sd_bus_open_user (sd-bus.c:1138)
==7763== by 0x109ACD: server_init (test-bus-chat.c:70)
==7763== by 0x10BCF8: main (test-bus-chat.c:526)
==7763==
```
The directories are only used by the specific services, and
created before the services are started. So, it is not necessary
to create them by systemd-tmpfiles.
Luca Bruno [Sun, 6 Aug 2017 13:24:24 +0000 (13:24 +0000)]
core: evaluate presets after generators have run (#6526)
This commit moves the first-boot system preset-settings evaluation out
of main and into the manager startup logic itself. Notably, it reverses
the order between generators and presets evaluation, so that any changes
performed by first-boot generators are taken into the account by presets
logic.
After this change, units created by a generator can be enabled as part
of a preset.