core: serialize/deserialize IP accounting across daemon reload/reexec
Make sure the current IP accounting counters aren't lost during
reload/reexec.
Note that we destroy all BPF file objects during a reload: the BPF
programs, the access and the accounting maps. The former two need to be
regenerated anyway with the newly loaded configuration data, but the
latter one needs to survive reloads/reexec. In this implementation I
opted to only save/restore the accounting map content instead of the map
itself. While this opens a (theoretic) window where IP traffic is still
accounted to the old map after we read it out, and we thus miss a few
bytes this has the benefit that we can alter the map layout between
versions should the need arise.
core: when creating the socket fds for a socket unit, join socket's cgroup first
Let's make sure that a socket unit's IPAddressAllow=/IPAddressDeny=
settings are in effect on all socket fds associated with it. In order to
make this happen we need to make sure the cgroup the fds are associated
with are the socket unit's cgroup. The only way to do that is invoking
socket()+accept() in them. Since we really don't want to migrate PID 1
around we do this by forking off a helper process, which invokes
socket()/accept() and sends the newly created fd to PID 1. Ugly, but
works, and there's apparently no better way right now.
This generalizes forking off per-unit helper processes in a new function
unit_fork_helper_process(), which is then also used by the NSS chown()
code of socket units.
Daniel Mack [Thu, 3 Nov 2016 16:30:06 +0000 (17:30 +0100)]
Add IP address address ACL representation and parser
Add a config directive parser that takes multiple space separated IPv4
or IPv6 addresses with optional netmasks in CIDR notation rvalue and
puts a parsed version of it to linked list of IPAddressAccessItem objects.
The code actually using this will be added later.
manager: initialize timeouts when allocating a naked Manager object
This way we can safely run manager objects from tests and good timeouts
apply. Without this all timeouts are set 0, which means they fire
instantly, when run from tests which do not explicitly configure them
(the way main.c does).
cgroup: always invalidate "cpu" and "cpuacct" together
This doesn't really matter, as we never invalidate cpuacct explicitly,
and there's no real reason to care for it explicitly, however it's
prettier if we always treat cpu and cpuacct as belonging together, the
same way we conisder "io" and "blkio" to belong together.
core: make sure to dump cgroup context when unit_dump() is called for all unit types
For some reason we didn't dump the cgroup context for a number of unit
types, including service units. Not sure how this wasn't noticed
before... Add this in.
journald: make maximum size of stream log lines configurable and bump it to 48K (#6838)
This adds a new setting LineMax= to journald.conf, and sets it by
default to 48K. When we convert stream-based stdout/stderr logging into
record-based log entries, read up to the specified amount of bytes
before forcing a line-break.
This also makes three related changes:
- When a NUL byte is read we'll not recognize this as alternative line
break, instead of silently dropping everything after it. (see #4863)
- The reason for a line-break is now encoded in the log record, if it
wasn't a plain newline. Specifically, we distuingish "nul",
"line-max" and "eof", for line breaks due to NUL byte, due to the
maximum line length as configured with LineMax= or due to end of
stream. This data is stored in the new implicit _LINE_BREAK= field.
It's not synthesized for plain \n line breaks.
- A randomized 128bit ID is assigned to each log stream.
With these three changes in place it's (mostly) possible to reconstruct
the original byte streams from log data, as (most) of the context of
the conversion from the byte stream to log records is saved now. (So,
the only bits we still drop are empty lines. Which might be something to
look into in a future change, and which is outside of the scope of this
work)
Michael Biebl [Tue, 19 Sep 2017 12:17:57 +0000 (14:17 +0200)]
tests: change dbus tests to use user bus (#6845)
This makes it possible to run more dbus tests in a build
environment/chroot where no system bus is available.
To run the dbus test one then can use dbus-run-session.
systemd-link: Add support to configure tx-tcp6-segmentation (#6859)
closes #6854
tcp-segmentation-offload: off
tx-tcp-segmentation: off
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp-mangleid-segmentation: off
tx-tcp6-segmentation: off <==========================
Typically when DHCP server sets MTU it is a lower one. And a lower than usual
MTU is then thus required on said network to have operational networking. This
makes networkd's dhcp client to work in more similar way to other dhcp-clients
(e.g. isc-dhcp). In particular, in a cloud setting, without this default
instances have resulted in timing out talking to cloud metadata source and
failing to provision.
This does not change this default for the Annonymize code path.
man: import "Multi-Seat on Linux" into sd-login(3)
The document on the wiki is partially outdated and not very visible. Let's
import the gist of it here. The original text is retained, with only grammar
and stylistic and formatting changes.
Alan Jenkins [Sat, 16 Sep 2017 11:32:59 +0000 (12:32 +0100)]
sd-bus: style nitpick node_vtable_get_userdata()
It's confusing to use a single void* to store data with two different
types, i.e. a userdata value which is safe to pass to ->find(), and a
userdata value which identifies the found object.
Name the latter `found_u`. This naming treats (!c->find) as a degenerate
case. (I.e. at that point, we know the object has already been found :).
Ivan Kurnosov [Sun, 17 Sep 2017 11:09:38 +0000 (23:09 +1200)]
Fix for dst/non-dst timezones
The problem was with the tm.tm_isdst that is set to the current environment
value: either DST or not. While the current state is not relevant to the state
in the desired date.
Hence — it should be reset so that the mktime_or_timegm could normalise it
later.
Christian Hesse [Fri, 15 Sep 2017 19:28:24 +0000 (21:28 +0200)]
fix path in btrfs rule (#6844)
Commit 0e8856d2 (assemble multidevice btrfs volumes without external
tools (#6607)) introduced a call to udevadm. That lives in @rootbindir@,
not @rootlibexecdir@. So fix the path.
We settled on "filename" and "file system", so change a couple of places for
consistency. The exception is when there's an adjective before "file" that
binds more strongly then "name": "password file name", "output file name", etc.
Those cases are left intact.
cryptsetup: make sure we invoke the cryptsetup tools with a shared keyring
We want that cryptsetup can cache keys between multiple invocations, and
it does so via the root user's user keyring, hence let's share it among
services.
core: add new per-unit setting KeyringMode= for controlling kernel keyring setup
Usually, it's a good thing that we isolate the kernel session keyring
for the various services and disconnect them from the user keyring.
However, in case of the cryptsetup key caching we actually want that
multiple instances of the cryptsetup service can share the keys in the
root user's user keyring, hence we need to be able to disable this logic
for them.
This adds KeyringMode=inherit|private|shared:
inherit: don't do any keyring magic (this is the default in systemd --user)
private: a private keyring as before (default in systemd --system)
shared: the new setting
Patrik Flykt [Mon, 21 Aug 2017 10:41:20 +0000 (13:41 +0300)]
sd-radv: Add Router Advertisement DNS Search List option
Add Router Advertisement DNS Search List option as specified
in RFC 8106. The search list option uses and identical option
header as the RDNSS option and therefore the option header
structure can be reused.
If systemd is compiled with IDNA support, internationalization
of the provided search domain is applied, after which the search
list is written in wire format into the DNSSL option.
man: delete note about propagating signal termination
That advice is generally apropriate for "user" programs, i.e. programs which
are run interactively and used pipelines and such. But it makes less sense for
daemons to propagate the exit signal. For example, if a process receives a SIGTERM,
it is apropriate for it to exit with 0 code. So let's just delete the whole
paragraph, since this page doesn't seem to be the right place for the longer
discussion which would be required to mention all the caveats and considerations.
Martin Pitt [Fri, 15 Sep 2017 07:21:49 +0000 (09:21 +0200)]
Revert "device : reload when udev generates a "changed" event" (#6836)
This reverts commit 0ffddc6e2c6e19e5dc81812aee9fbe964059f3aa. That
causes a rather severe disruption of D-Bus and other services when e. g.
restarting local-fs.target (as spotted by the "storage" test regression).
core: make sure that $JOURNAL_STREAM prefers stderr over stdout information (#6824)
If two separate log streams are connected to stdout and stderr, let's
make sure $JOURNAL_STREAM points to the latter, as that's the preferred
log destination, and the environment variable has been created in order
to permit services to automatically upgrade from stderr based logging to
native journal logging.
Martin Pitt [Fri, 15 Sep 2017 05:32:50 +0000 (07:32 +0200)]
cryptsetup: fix unused variable (#6833)
When building without veracrypt, gcc warns
../src/cryptsetup/cryptsetup.c:55:13: warning: ‘arg_tcrypt_veracrypt’ defined but not used [-Wunused-variable]
static bool arg_tcrypt_veracrypt = false;
networkd: add support to configure IP Rule (#5725)
Routing Policy rule manipulates rules in the routing policy database control the
route selection algorithm.
This work supports to configure Rule
```
[RoutingPolicyRule]
TypeOfService=0x08
Table=7
From= 192.168.100.18
```
```
ip rule show
0: from all lookup local
0: from 192.168.100.18 tos 0x08 lookup 7
```
V2 changes:
1. Added logic to handle duplicate rules.
2. If rules are changed or deleted and networkd restarted
then those are deleted when networkd restarts next time
Alan Jenkins [Thu, 14 Sep 2017 19:43:43 +0000 (20:43 +0100)]
units: don't kill the emergency shell when sysinit.target is triggered (#6765)
Why
---
The advantage of this is that starting sysinit.target from the emergency
shell will no longer kill the emergency shell and lock you out of the
system. Our docs already claimed that emergency.target was useful for
"starting individual units in order to continue the boot process in steps".
This resolves #6509 for my purposes.
Remaining limitation
--------------------
Starting getty.target will still kill the shell, and if you don't have a
root password you will then be locked out at that point. This is relevant
to distributions which patch the sulogin system to permit logins when the
root password is locked. Both Debian and RedHat used to follow this
behaviour! Debian have been discussing what they could replace it with at
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=806852
So this doesn't quite achieve perfection, but I think it's a worthwhile
change. It should be easier to understand the logic now it doesn't have
such a big hole in it. Repairing the sysinit stage of the boot is the main
reason we have emergency.target. And as discussed in the issue,
sysinit.target gets pulled in implicitly as soon as any DefaultDependencies
service is activated.
How
---
sysinit.target only needs to conflict with emergency.target. It didn't
need to conflict with emergency.service as well. In theory the conflicts
are pointless, we could just change the dependency of sysinit.target on
local-fs.target from Wants to Requires. However, doing so would mean that
when local-fs fails, the screen is flooded with yellow [DEPEND] failures.
That would hinder the poor unfortunate admin, so let's not do that.
There is no additional ordering requirement against emergency. If the
failure happens, the job for sysinit will be cancelled instantly. We don't
need to worry about when sysinit.target and its dependents would be
stopped, because sysinit waits for local-fs before it starts.
emergency.target is still necessarily stopped once we reach sysinit
(you can't express a one-way conflict in pure unit directives).
This is largely cosmetic... though perhaps it symbolizes that you're no
longer in Emergency Mode if System Initialization is successful ;-).
As a secondary advantage, the getty's which conflict on rescue.service now
need to conflict on emergency.service as well. This makes the system more
uniform and simpler to understand.
The only other effect this should have is that
`systemctl start emergency.target` is now practically the same as
`systemctl start rescue.target`. The only units this command will stop are
the conflicting getty units. Neither of those commands should ever be
used. E.g. they will not stop the gdm.service unit on Fedora 26.
Felipe Sateler [Thu, 14 Sep 2017 17:51:20 +0000 (14:51 -0300)]
shared: end string with % if one was found at the end of a expandible string (#6828)
Current behavior is that %X where X is an unidentified specifier, then the result is
the same %X string. This was not the case when the string ended with a stray %, where
the character would have not been output. Lets add that missing character.
core/manager: when running in test mode, use a temp dir for generated stuff
When running through systemd-analyze verify or with --test, we would
not run generators (environment or unit). But at the end, we would nuke
the generator dirs anyway.
Simplify things by actually running generators of both types, but redirecting
their output to a temporary directory. This has the advantage that we test more
code, and the verification is more complete.
Since now we are not touching the real generator directories, we also don't
delete them, which fixes #5609.
pid1: improve the check guarding unit_file_preset_all()
When running in systemd-analyze verify, first_boot was initialized to -1
and never changed, so we'd try to run unit_file_preset_all(). Change the
check to > 0 which is more correct. Also, add a separate test for !test_run,
since we wouldn't want to run presets even if we were in first boot
(or /etc was empty for whatever other reason).
timer: don't use persietent file timestamps from the future (#6823)
Also, use the mtime rather than the atime of the timestamp file. While
the atime is not completely wrong, the mtime appears more appropriate
as that's what we actually explicitly change, and is not effected by
mere reading.
core: don't synthesize empty list when empty string is read in config_parse_strv()
This was added to make
https://bugs.freedesktop.org/show_bug.cgi?id=62558 work, which has long
been removed, hence let's revert to the original behaviour and fully
flush out the list when an empty string is assigned.