core: make the StartLimitXYZ= settings generic and apply to any kind of unit, not just services
This moves the StartLimitBurst=, StartLimitInterval=, StartLimitAction=, RebootArgument= from the [Service] section
into the [Unit] section of unit files, and thus support it in all unit types, not just in services.
This way we can enforce the start limit much earlier, in particular before testing the unit conditions, so that
repeated start-up failure due to failed conditions is also considered for the start limit logic.
For compatibility the four options may also be configured in the [Service] section still, but we only document them in
their new section [Unit].
This also renamed the socket unit failure code "service-failed-permanent" into "service-start-limit-hit" to express
more clearly what it is about, after all it's only triggered through the start limit being hit.
Finally, the code in busname_trigger_notify() and socket_trigger_notify() is altered to become more alike.
Michal Sekletar [Tue, 9 Feb 2016 08:57:45 +0000 (09:57 +0100)]
path_id: reintroduce by-path links for virtio block devices
Enumeration of virtio buses is global and hence
non-deterministic. However, we are guaranteed there is never going to be
more than one virtio bus per parent PCI device. While populating
ID_PATH we simply skip virtio part of the syspath and we extend the path
using the sysname of the parent PCI device.
With this patch udev creates following by-path links for virtio-blk
device /dev/vda which contains two partitions.
ls -l /dev/disk/by-path/
total 0
lrwxrwxrwx 1 root root 9 Feb 9 10:47 virtio-pci-0000:00:05.0 -> ../../vda
lrwxrwxrwx 1 root root 10 Feb 9 10:47 virtio-pci-0000:00:05.0-part1 -> ../../vda1
lrwxrwxrwx 1 root root 10 Feb 9 10:47 virtio-pci-0000:00:05.0-part2 -> ../../vda2
journal: Drop monotonicity check when appending to journal file
Remove the check that triggers rotation of the journal file when the arriving log entry had a monotonic timestamp smaller that the previous log entry. This check causes unnecessary rotations when journal-remote was receiving from multiple senders, therefore monotonicity can not be guaranteed. Also, it does not offer any useful functionality for systemd-journald.
The dual_timestamp_from_realtime(), dual_timestamp_from_monotonic()
and dual_timestamp_from_boottime_or_monotonic() shares the same
code for comparison given ts with delta. Let's move it to the
separate inline function to prevent code duplication.
time-util: merge format_timestamp_internal() and format_timestamp_internal_us()
The time_util.c provides format_timestamp_internal() and
format_timestamp_internal_us() functions for a timestamp formating. Both
functions are very similar and differ only in formats handling.
We can add additional boolean parameter to the format_timestamp_internal()
function which will represent is a format for us timestamp or not.
This allows us to get rid of format_timestamp_internal_us() that is prevent
code duplication.
We can remove format_timestamp_internal_us() safely, because it is static and
has no users outside of the time_util.c. New fourth parameter will be passed
inside of the format_timestamp(), format_timestamp_us() and etc, functions,
but the public API is not changed.
random-seed: provide nicer error message when unable to open file
If /var is read-only, and the seed file does not exist, we would print
a misleading error message for ENOENT. Print both messages instead, to
make it easy to diagonose.
Also, treat the cases of missing seed file the same as empty seed file
and exit successfully. Initialize the return code properly.
Michał Górny [Sat, 6 Feb 2016 12:47:30 +0000 (13:47 +0100)]
build-sys: Perform flag tests in context to existing flags
Fix the CC_CHECK_FLAG_APPEND macro to test appended flags in context to
current flag values. Otherwise, it is possible to append flags colliding
with user's *FLAGS or even previously appended flags that will cause
the build to fail.
The time-util.c provides dual_timestamp_get() function for getting
realtime and monotonic timestamps. Let's use it instead of direct
realtime/monotonic calculation.
Vito Caputo [Fri, 5 Feb 2016 15:26:18 +0000 (07:26 -0800)]
journal: remove template from open_journal args
None of the callers take advantage of this parameter, it's always NULL,
this is just a private helper function to simplify the call sites so
drop the template parameter altogether. If a caller emerges later who
needs it, it can be restored.
Stef Walter [Sun, 8 Nov 2015 10:20:01 +0000 (11:20 +0100)]
journal: Combine journal catalog entries with the same id
Instead of discarding duplicate catalog entries, we now combine
them. This allows software or admins to add or override catalog
headers, or add additional text to the catalog message.
Set the MHD_OPTION_CONNECTION_MEMORY_LIMIT to 128KB. The precious value was DATA_SIZE_MAX, which was defined as 1024*1024*768. This caused journal-remote to allocate 756MB for each journal-upload connection, thus exhausting the available memory.
Let's use bitfields for our booleans, and don't try to apply binary OR or addition on them, because that's weird and we
should instead use logical OR only.
We really shouldn't fail silently, but print a log message about these errors. Also make sure to attach error codes to
all log messages where that makes sense.
(While we are at it, add a couple of (void) casts to functions where we knowingly ignore return values.)
core: when a service's ExecStartPre= times out, skip ExecStop=
This makes sure we never run two control processes at the same time, we cannot keep track off.
This introduces a slight change of behaviour but cleans up the definition of ExecStop= and ExecStopPost=. The former is
now invoked only if the service managed to start-up correctly. The latter is called even if start-up failed half-way.
Thus, ExecStopPost= may be used as clean-up step for both successful and failed start-up attempts, but ExecStop='s
purpose is clearly defined as being responsible for shutting down the service and nothing else.
The precise behaviour of this was not documented yet. This commit adds the necessary docs.
nspawn: optionally run a stub init process as PID 1
This adds a new switch --as-pid2, which allows running commands as PID 2, while a stub init process is run as PID 1.
This is useful in order to run arbitrary commands in a container, as PID1's semantics are different from all other
processes regarding reaping of unknown children or signal handling.
For use in timesyncd we already defined a compile-time "epoch" value, which is based on the mtime of the NEWS file, and
specifies a point in time we know lies in the past at runtime. timesyncd uses this to filter out nonsensical timestamp
file data, and bump the system clock to a time that is after the build time of systemd. This patch adds similar bumping
code to earliest PID 1 initialization, so that the system never continues operation with a clock that is in the 1970ies
or even 1930s.
Nicolas Iooss [Tue, 2 Feb 2016 19:07:46 +0000 (20:07 +0100)]
logind: load SELinux labelling system
systemd-logind uses mkdir_label and label_fix functions without calling
first mac_selinux_init. This makes /run/user/$UID/ directories not
labelled correctly on an Arch Linux system using SELinux.
Fix this by calling mac_selinux_init("/run") early in systemd-logind.
This makes files created in /etc/udev/rules.d and /var/lib/systemd to be
labelled through transitions in the SELinux policy instead of using
setfscreatecon (with mac_selinux_create_file_prepare).
The mount_setup_early() and mount_setup() contain almost the same
pieces of code which calls mount_one() for a certain mount point
from the mount_table. This patch introduces mount_points_setup()
helper to prevent code duplication.
Michal Sekletar [Mon, 1 Feb 2016 09:44:58 +0000 (10:44 +0100)]
journalctl: make "journalctl /dev/sda" work
Currently when journalctl is called with path to block device node we
add following match _KERNEL_DEVICE=b$MAJOR:$MINOR.
That is not sufficient to actually obtain logs about the disk because
dev_printk() kernel helper puts to /dev/kmsg information about the
device in following format, +$SUBSYSTEM:$ADDRESS,
e.g. "+pci:pci:0000:00:14.0".
Now we will walk upward the syspath and add match for every device in
format produced by dev_printk() as well as match for its device node if
it exists.
The server might answer to a DHCPREQUEST with a NAK and currently the
client restarts the configuration process immediately. It was
observed that this can easily generate loops in which the network is
flooded with DISCOVER,OFFER,REQUEST,NAK sequences.
RFC 2131 only states that "if the client receives a DHCPNAK message,
the client restarts the configuration process" without further
details.
Add a delay with exponential backoff between retries after NAKs to
limit the number of requests and cap the delay to 30 minutes.
core: rework unit timeout handling, and add new setting RuntimeMaxSec=
This clean-ups timeout handling in PID 1. Specifically, instead of storing 0 in internal timeout variables as
indication for a disabled timeout, use USEC_INFINITY which is in-line with how we do this in the rest of our code
(following the logic that 0 means "no", and USEC_INFINITY means "never").
This also replace all usec_t additions with invocations to usec_add(), so that USEC_INFINITY is properly propagated,
and sd-event considers it has indication for turning off the event source.
This also alters the deserialization of the units to restart timeouts from the time they were originally started from.
Before this patch timeouts would be restarted beginning with the time of the deserialization, which could lead to
artificially prolonged timeouts if a daemon reload took place.
Finally, a new RuntimeMaxSec= setting is introduced for service units, that specifies a maximum runtime after which a
specific service is forcibly terminated. This is useful to put time limits on time-intensive processing jobs.
This also simplifies the various xyz_spawn() calls of the various types in that explicit distruction of the timers is
removed, as that is done anyway by the state change handlers, and a state change is always done when the xyz_spawn()
calls fail.
core: fix support for transient resource limit properties
Make sure we can properly process resource limit properties. Specifically, allow transient configuration of both the
soft and hard limit, the same way from the unit files. Previously, only the the hard rlimits could be configured but
they'd implicitly spill into the soft hard rlimits.
This also updates the client-side code to be able to parse hard/soft resource limit specifications. Since we need to
serialize two properties in bus_append_unit_property_assignment() now, the marshalling of the container around it is
now moved into the function itself. This has the benefit of shortening the calling code.
As a side effect this now beefs up the rlimit parser of "systemctl set-property" to understand time and disk sizes
where that's appropriate.
clang is apparently not smart enough to detect when a switch statement contains case statements for all possible values
of the used type. Work around that.
(And while we are at it, normalize indentation a bit)