Until that commit, to determine whether a daemon reload was required we compared
the mtime of the main unit file we loaded with the mtime of it on disk for
equality, but for drop-ins we only stored the newest mtime of all of them and
then did a "newer-than" comparison. This was broken with the above commit,
when all checks were changed to be for equality.
With this change all checks are now done as "newer-than", fixing the drop-in
mtime case. Strictly speaking this will not detect a number of changes that the
code before above commit detected, but given that the mtime is unlikely to go
backwards, and this is just intended to be a helpful hint anyway, this looks OK
in order to keep things simple.
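The idea behind the "newer-than" check can be sketched like this (illustrative,
self-contained C with stand-in names, not the actual unit.c code):

    #include <stdbool.h>
    #include <stddef.h>
    #include <time.h>
    #include <sys/stat.h>

    /* Suggest a daemon reload if any fragment or drop-in on disk is newer
     * than the timestamp recorded when the unit was loaded. */
    static bool any_file_newer_than(time_t loaded, const char **paths, size_t n) {
            struct stat st;

            for (size_t i = 0; i < n; i++) {
                    if (stat(paths[i], &st) < 0)
                            continue;               /* ignore vanished files here */
                    if (st.st_mtime > loaded)       /* "newer-than", not equality */
                            return true;
            }

            return false;
    }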
core: move enforcement of the start limit into per-unit-type code again
Let's move the enforcement of the per-unit start limit from unit.c into the
type-specific files again. For unit types that know a concept of "result" codes
this allows us to hook up the start limit condition to it with an explicit
result code. Also, this makes sure that the state checks in calls like
service_start() may be done before the start limit is checked, as the start
limit really should be checked last, right before everything has been verified
to be in order.
The generic start limit logic is left in unit.c, but the invocation of it is
moved into the per-type files, in the various xyz_start() functions, so that
they may place the check at the right location.
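The resulting ordering in a type-specific start function can be sketched roughly
like this (illustrative, self-contained C with stand-in types and helpers, not
the real service.c code):

    #include <errno.h>
    #include <stdbool.h>

    struct unit {
            bool state_allows_start;
            unsigned start_count, start_burst;
    };

    static int unit_test_start_limit(struct unit *u) {
            return u->start_count >= u->start_burst ? -ECANCELED : 0;
    }

    static int xyz_start(struct unit *u) {
            int r;

            /* State checks come first... */
            if (!u->state_allows_start)
                    return -EAGAIN;

            /* ...and the start limit is enforced last, right before activation,
             * so it can be mapped to an explicit "start-limit-hit" result. */
            r = unit_test_start_limit(u);
            if (r < 0)
                    return r;

            u->start_count++;
            return 0;   /* proceed with activation */
    }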
Note that this change drops the enforcement entirely from device, slice, target
and scope units, since these unit types generally may not fail activation, or
may only be activated a single time. This is also documented now.
Note that this restores the "start-limit-hit" result code that already existed in the service code before 6bf0f408e4833152197fb38fb10a9989c89f3a59. However,
it's not introduced for all units that have a result code concept.
core: refuse merging on units when the unit type does not support alias
The concept of merging units exists so that we can create Unit objects for a
number of names early, and then load them only later, possibly merging units
which then turn out to be symlinked to other names. This of course only makes
sense for unit types where multiple names per unit are supported. For all
others, let's refuse the merge operation early.
core: merge service_connection_unref() into service_close_socket_fd()
We always call one after the other anyway, and this way service_set_socket_fd()
and service_close_socket_fd() nicely match each other as one undoes the effect
of the other.
There's no need to set the no_gc bit for service units that socket units
prepare, as we always keep a proper reference (as maintained by unit_ref_set())
on them, and such references are honoured by the GC logic anyway. Moreover,
explicitly setting the no_gc bit is problematic if the socket gets GC'ed for
some reason, as the service might then leak with the bit set.
In service_set_socket_fd(), let's make sure that if we can't add the requested
dependencies we take no possession of the passed connection fd.
This way, we follow the strict rule: we take possession of the passed fd on
success, but on failure we don't, and the fd remains in possession of the
caller.
core: rename StartLimitInterval= to StartLimitIntervalSec=
We generally follow the rule that for time settings we suffix the setting name
with "Sec" to indicate the default unit if none is specified. The only
exceptions were the rate limiting interval settings. Fix this, and keep the old
names for compatibility.
Do the same for journald's RateLimitInterval= setting
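Illustratively (example values only; nowadays the start limit settings are
parsed from the [Unit] section, and the old spellings remain understood for
compatibility):

    # some unit file
    [Unit]
    StartLimitIntervalSec=30s
    StartLimitBurst=5

    # /etc/systemd/journald.conf
    [Journal]
    RateLimitIntervalSec=30s
    RateLimitBurst=1000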
core: move start ratelimiting check after condition checks
With #2564, unit start rate limiting was moved from after the condition checks
to before they are made, in an attempt to fix #2467. This however resulted
in #2684. However, with a previous commit a concept of per socket unit trigger
rate limiting has been added, to fix #2467 more comprehensively, hence the
start limit can be moved after the condition checks again, thus fixing #2684.
core: introduce activation rate limiting for socket units
This adds two new settings TriggerLimitIntervalSec= and TriggerLimitBurst= that
define a rate limit for activation of socket units. When the limit is hit, the
socket is put into a failure mode. This is an alternative fix for #2467,
since the original fix resulted in issue #2684.
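An illustrative socket unit fragment using the new settings (example values
only):

    [Socket]
    ListenStream=1234
    TriggerLimitIntervalSec=2s
    TriggerLimitBurst=200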
In a later commit the StartLimitInterval=/StartLimitBurst= rate limiter will be
changed to be applied after any start conditions checks are made. This way,
there are two separate rate limiters enforced: one at triggering time, before
any jobs are queued with this patch, as well as the start limit that is moved
again to be run immediately before the unit is activated. Condition checks are
done in between the two, and thus no longer affect the start limit.
build-sys: improve compat with older kernel headers
In 4.2 kernel headers, some netlink defines that we need are missing. missing.h
can already add them in, but currently makes this dependent on a definition
that these kernels already have. Hence change the check to test for the newest
definition in the table, so that the whole set of definitions is added on all
kernels lacking it.
path-util: also support ".old" and ".new" suffixes and recommend them
The ~ suffix works fine, but looks too much like the file is supposed to be
automatically cleaned up. For new versions of configuration files installers
might want to use something that looks more permanent, like foobar.new.
So let's treat ".old" and ".new" as special as well.
We previously would fail with EOPNOTSUPP when encountering an AF_UNIX socket in
the directory tree to copy. Fix that, and copy them too (even if they are dead
in the result).
Let's move DUID configuration into the [DHCP] section, since it only makes
sense in a DHCP context, and should be close to the configuration of
ClientIdentifier= and suchlike.
This really shouldn't be a section of its own; we don't have one for any of our
other protocol-specific identifiers...
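Illustratively, a .network fragment would then keep all DHCP client identity
settings together (DUIDType=/DUIDRawData= as documented for the DHCP client
section; the values here are made up):

    [DHCP]
    ClientIdentifier=duid
    DUIDType=vendor
    DUIDRawData=00:00:ab:11:f9:2a:c2:77:29:f9:5c:00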
parse-util: fix conversion from size_t on s390 (#3147)
On s390 size_t is an unsigned long, not an unsigned int. They both are
of the same size and can be cast to each other safely, but the compiler
still seems unhappy about incompatible pointers.
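The usual fix is to parse into a variable of the exact expected type and assign
afterwards, roughly like this (illustrative only, not the actual parse-util
code):

    #include <stddef.h>
    #include <stdio.h>

    /* A parser that fills in an unsigned int... */
    static int parse_uint(const char *s, unsigned *ret) {
            return sscanf(s, "%u", ret) == 1 ? 0 : -1;
    }

    int main(void) {
            size_t value;
            unsigned tmp;

            /* ...must not be handed a size_t*: on s390 size_t is unsigned long,
             * so the pointer types are incompatible even though the sizes match.
             * Parse into the exact type first, then assign. */
            if (parse_uint("42", &tmp) < 0)
                    return 1;
            value = tmp;

            printf("%zu\n", value);
            return 0;
    }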
Running cgtop on a system which lacks the expected stat file results in a
segfault. For example, a system with blkio tree but without cfq io scheduler,
lacks "blkio.io_service_bytes".
When the targeted cgroup's file does not exist, process() returns 0 and
does not modify the `*ret' value (which is `*ours'). As a result,
callers of refresh_one() can end up with a bogus pointer, which results in a SEGV.
This patch properly initializes the variable to NULL.
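The pattern at play is roughly this (illustrative C with stand-in names, not the
real cgtop code):

    #include <stddef.h>

    struct cgroup_stats;

    /* May return 0 without touching *ret, e.g. when the stat file is missing. */
    static int process(const char *path, struct cgroup_stats **ret) {
            (void) path;
            (void) ret;
            return 0;
    }

    static int refresh_one(void) {
            struct cgroup_stats *ours = NULL;   /* must be initialized: process() may leave it alone */
            int r;

            r = process("blkio.io_service_bytes", &ours);
            if (r < 0)
                    return r;

            if (ours) {
                    /* ... only dereference when it was actually set ... */
            }

            return 0;
    }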
tree-wide: rename hidden_file to hidden_or_backup_file and optimize
In standard linux parlance, "hidden" usually means that the file name starts
with ".", and nothing else. Rename the function to convey what the function does
better to casual readers.
Stop exposing hidden_file_allow_backup which is rather ugly and rewrite
hidden_file to extract the suffix first. Note that hidden_file_allow_backup
excluded files with "~" at the end, which is quite confusing. Let's get
rid of it before it gets used in the wrong place.
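A simplified sketch of the resulting check (illustrative only; the actual suffix
list in path-util.c is longer):

    #include <stdbool.h>
    #include <string.h>

    static bool hidden_or_backup_file(const char *filename) {
            const char *dot;
            size_t len;

            if (filename[0] == '.' || strcmp(filename, "lost+found") == 0)
                    return true;

            len = strlen(filename);
            if (len > 0 && filename[len - 1] == '~')   /* classic editor backup */
                    return true;

            dot = strrchr(filename, '.');
            if (!dot)
                    return false;

            return strcmp(dot, ".bak") == 0 ||
                   strcmp(dot, ".old") == 0 ||
                   strcmp(dot, ".new") == 0 ||
                   strcmp(dot, ".rpmnew") == 0 ||
                   strcmp(dot, ".rpmsave") == 0 ||
                   strcmp(dot, ".dpkg-old") == 0 ||
                   strcmp(dot, ".dpkg-new") == 0 ||
                   strcmp(dot, ".swp") == 0;
    }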
basic/dirent-util: do not call hidden_file_allow_backup from dirent_is_file_with_suffix
If the file name is supposed to end in a suffix, there's no need to check the
name against a list of "special" file names, which is slow. Instead, just check
that the name doesn't start with a period.
Martin Pitt [Wed, 27 Apr 2016 08:34:24 +0000 (10:34 +0200)]
Stop syslog.socket when entering emergency mode (#3130)
With ForwardToSyslog=yes enabled, syslog.socket is active when entering
emergency mode. Any log message then triggers the start of rsyslog.service (or
other implementation) along with its dependencies such as local-fs.target and
sysinit.target. As these might fail themselves (e. g. faulty /etc/fstab), this
breaks the emergency mode.
This causes syslog.socket to fail with "Failed to queue service startup job:
Transition is destructive".
Add Conflicts=syslog.socket to emergency.service to make sure the socket is
stopped when emergency.service is started.
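In unit-file terms, the change amounts to this excerpt of emergency.service:

    [Unit]
    Conflicts=syslog.socket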
Martin Pitt [Wed, 27 Apr 2016 07:58:42 +0000 (09:58 +0200)]
path-util: Add hidden suffixes for ucf (#3131)
ucf is a standard Debian helper for managing configuration file upgrades which
need more interaction or elaborate merging than conffiles managed by dpkg.
Ignore its temporary and backup files similarly to the *.dpkg-* ones to avoid
creating units for them in generators.
journal: set STATE_ARCHIVED as part of offlining (#2740)
The only code path which makes a journal durable is via
journal_file_set_offline().
When we perform a rotate the journal's header->state is being set to
STATE_ARCHIVED prior to journal_file_set_offline() being called.
In journal_file_set_offline(), we short-circuit the entire offline when
f->header->state != STATE_ONLINE.
This all results in none of the journal_file_set_offline() fsync() calls
being reached when rotate archives a journal, so archived journals are
never explicitly made durable.
What we do now is instead of setting the f->header->state to
STATE_ARCHIVED directly in journal_file_rotate() prior to
journal_file_close(), we set an archive flag in f->archive for the
journal_file_set_offline() machinery to honor by committing
STATE_ARCHIVED instead of STATE_OFFLINE when set.
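Schematically (illustrative C with stand-in names, not the actual
journal-file.c code):

    #include <stdbool.h>

    enum state { STATE_OFFLINE, STATE_ONLINE, STATE_ARCHIVED };

    struct journal_file {
            enum state state;
            bool archive;   /* set by rotate instead of writing STATE_ARCHIVED directly */
    };

    static void offline_commit(struct journal_file *f) {
            /* The offline machinery runs in any case, so the fsync() calls are
             * reached for archived files too; only the final state written differs. */
            f->state = f->archive ? STATE_ARCHIVED : STATE_OFFLINE;
    }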
Prior to this, rotated journals were never getting fsync() explicitly
performed on them, since journal_file_set_offline() short-circuited.
Obviously this is undesirable, and depends entirely on the underlying
filesystem as to how much durability was achieved when simply closing
the file.
Note that this problem existed prior to the recent asynchronous fsync
changes, but those changes do facilitate our performing this durable
offline on rotate without blocking, regardless of the underlying
filesystem sync-on-close semantics.
* sd-journal: detect earlier if we try to read an object from an invalid offset
Specifically, detect early if we try to read from offset 0, i.e. are using
uninitialized offset data.
* journal: when dumping journal contents, react nicer to lines we can't read
If journal files are not cleanly closed it might happen that intermediary
journal entries cannot be read. Handle this nicely, skip over the unreadable
entries, and log a debug message about it; after all we generally follow the
logic that we try to make the best of corrupted files.
* journal-file: always generate the same error when encountering corrupted files
Let's make sure EBADMSG is the one error we throw when we encounter corrupted
data, so that we can neatly test for it.
* journal-file: when iterating through a partly corrupted journal file, treat error like EOF
When we linearly iterate through a corrupted journal file, and we encounter a
read error, don't consider this fatal, but merely as EOF condition (and log
about it).
* journal-file: make seeking in corrupted files work
Previously, when we used a bisection table for seeking through a corrupted
file, and the end of the bisection table was corrupted we'd most likely fail
the entire seek operation. Improve the situation: if we encounter invalid
entries in a bisection table, linearly go backwards until we find a working
entry again.
* man: elaborate on the automatic systemd-journald.socket service dependencies
journal-file: make seeking in corrupted files work
Previously, when we used a bisection table for seeking through a corrupted
file, and the end of the bisection table was corrupted we'd most likely fail
the entire seek operation. Improve the situation: if we encounter invalid
entries in a bisection table, linearly go backwards until we find a working
entry again.
journal-file: when iterating through a partly corrupted journal file, treat error like EOF
When we linearly iterate through a corrupted journal file, and we encounter a
read error, don't consider this fatal, but merely as EOF condition (and log
about it).
journal: when dumping journal contents, react nicer to lines we can't read
If journal files are not cleanly closed it might happen that intermediary
journal entries cannot be read. Handle this nicely, skip over the unreadable
entries, and log a debug message about it; after all we generally follow the
logic that we try to make the best of corrupted files.
systemd --user: call pam_loginuid when creating user@.service (#3120)
This way the user service will have a loginuid, and it will be inherited by
child services. This shouldn't change anything as far as systemd itself is
concerned, but is nice for various services spawned by systemd --user
that expect a loginuid.
pam_loginuid(8) says that it should be enabled for "..., crond and atd".
user@.service should behave similarly to those two as far as audit is
concerned.
machined: generate a nicer error when the user tries "machinectl clone" on non-btrfs file systems (#3117)
Fixes: #2060
(Of course, in the long run, we should probably add a copy-based fall-back. But
given how slow that is, this probably requires some asynchronous forking logic
like the CopyFrom() and CopyTo() method calls already implement.)
core: fix description of "resources" service error (#3119)
The "resources" error is really just the generic error we return when
we hit some kind of error and we have no more appropriate error for the case to
return, for example because of some OS error.
Hence, reword the explanation and don't claim any relation to resource limits.
Admittedly, the "resources" service error is a bit of a misnomer, but I figure
it's kind of API now.
journal: fix already offline check and thread leak (#2810)
Early in journal_file_set_offline() f->header->state is tested to see if
it's != STATE_ONLINE, and since there's no need to do anything if the
journal isn't online, the function simply returned here.
Since moving part of the offlining process to a separate thread, there
are two problems here:
1. We can't simply check f->header->state, because if there is an
offline thread active it may modify f->header->state.
2. Even if the journal is deemed offline, the thread responsible may
still need joining, so a bare return may leak the thread's resources
like its stack.
To address #1, the helper journal_file_is_offlining() is called prior to
accessing f->header->state.
If journal_file_is_offlining() returns true, f->header->state isn't even
checked, because an offlining journal is obviously online, and we'll
just continue with the normal set offline code path.
If journal_file_is_offlining() returns false, then it's safe to check
f->header->state, because the offline_state is beyond the point of
modifying f->header->state, and there's a memory barrier in the helper.
If we find f->header->state is != STATE_ONLINE, then we call the
idempotent journal_file_set_offline_thread_join() on the way out of the
function, to join a potential lingering offline thread.
I am pretty sure we shouldn't carry history sections in man pages, since it's
very hard to keep them correctly updated, the current ones are very
out-of-date, and they tend to make APIs appear unnecessarily complex.
journalctl: port --machine= switch to use machined's OpenMachineRootDirectory()
This way, the switch becomes compatible with nspawn containers using --image=,
and those which only store journal data in /run (i.e. have persistent logs
off).
journalctl: don't trust the per-field entry tables when looking for boot IDs
When appending to a journal file, journald will:
a) first, append the actual entry to the end of the journal file
b) second, add an offset reference to it to the global entry array stored at
the beginning of the file
c) third, add offset references to it to the per-field entry array stored at
various places of the file
The global entry array, maintained by b) is used when iterating through the
journal without matches applied.
The per-field entry array maintained by c) is used when iterating through the
journal with a match for that specific field applied.
In the wild, there are journal files where a) and b) were completed, but c)
was not before the files were abandoned. This means that in some cases log
entries are at the end of these files that appear in the global entry array,
but not in the per-field entry array of the _BOOT_ID= field. Now, the
"journalctl --list-boots" command alternatingly uses the global entry array
and the per-field entry array of the _BOOT_ID= field. It seeks to the last
entry of a specific _BOOT_ID= field by having the right match installed, and
then jumps to the next following entry with no match installed anymore, under
the assumption this would bring it to the next boot ID. However, if the
per-field entry wasn't written fully, it might actually turn out that the
global entry array might know one more entry with the same _BOOT_ID, thus
resulting in an indefinite loop around the same _BOOT_ID.
This patch fixes that, by updating the boot search logic to always continue
reading entries until the boot ID actually changed from the previous. Thus, the
per-field entry array is used as quick jump index (i.e. as an optimization),
but not trusted otherwise. Only the global entry array is trusted.
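A sketch of that loop in terms of the public sd-journal API (illustrative;
journalctl's internal code differs):

    #include <inttypes.h>
    #include <systemd/sd-journal.h>

    /* Advance until the boot ID stored with the entry really differs from the
     * current one, instead of trusting a single jump past the last matched entry. */
    static int skip_to_next_boot(sd_journal *j, sd_id128_t current) {
            int r;

            while ((r = sd_journal_next(j)) > 0) {
                    sd_id128_t id;
                    uint64_t t;

                    if (sd_journal_get_monotonic_usec(j, &t, &id) < 0)
                            continue;                /* unreadable entry, skip it */

                    if (!sd_id128_equal(id, current))
                            return 1;                /* reached the next boot */
            }

            return r;                                /* 0 on EOF, < 0 on error */
    }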
This replaces PR #1904, which is actually very similar to this one. However,
this one actually reads the boot ID directly from the entry header, and doesn't
try to read it at all until the read pointer is actually really located on the
first item to read.
Show the various timestamps in hexadecimal too. This is useful for matching the
timestamps included in cursor strings (which are encoded in hex, too), with the
references in the journal header.
Drop the "read_realtime" parameter. Getting the realtime timestamp from an
entry is cheap, as it is a normal header field, hence let's just get this
unconditionally, and simplify our code a bit.
sd-journal: add logic to open journal files of a specific OS tree
With this change a new flag SD_JOURNAL_OS_ROOT is introduced. If specified
while opening the journal with the per-directory calls (specifically:
sd_journal_open_directory() and sd_journal_open_directory_fd()) the passed
directory is assumed to be the root directory of an OS tree, and the journal
files are searched for in /var/log/journal, /run/log/journal relative to it.
This is useful to allow usage of sd-journal on file descriptors returned by the
OpenRootDirectory() call of machined.
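For example, to read the journal of an OS tree mounted at /mnt/image
(illustrative path, error handling trimmed):

    #include <systemd/sd-journal.h>

    int open_image_journal(sd_journal **ret) {
            /* With SD_JOURNAL_OS_ROOT the path is treated as an OS root, and
             * /var/log/journal and /run/log/journal are searched below it. */
            return sd_journal_open_directory(ret, "/mnt/image", SD_JOURNAL_OS_ROOT);
    }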
machined: add new OpenRootDirectory() call to Machine objects
This new call returns a file descriptor for the root directory of a container.
This file descriptor may then be used to access the rest of the container's
file system, via openat() and similar calls. Since the file descriptor returned
is for the file system namespace inside of the container it may be used to
access all files of the container exactly the way the container itself would
see them. This is particularly useful for containers run directly from
loopback media, for example via systemd-nspawn's --image= switch. It also
provides access to directories such as /run of a container that are normally
not accessible to the outside of a container.
sd-journal: add API for opening journal files or directories by fd
Also, expose this via the "journalctl --file=-" syntax for STDIN. This feature
remains undocumented though, as it is probably not too useful in real-life as
this still requires fds that support mmapping and seeking, i.e. does not work
for pipes, for which reading from STDIN is most commonly used.
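An illustrative invocation (this works because a redirected regular file, unlike
a pipe, is seekable and mmappable):

    journalctl --file=- < system.journal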
Michal Koutný [Mon, 25 Apr 2016 11:25:00 +0000 (13:25 +0200)]
Always create dependencies for loop device mounts
In case a file is on a networked filesystem, we may tag the fstab record with
the _netdev option; correct dependencies will nonetheless be created for this mount.
Michal Koutný [Tue, 19 Apr 2016 16:44:40 +0000 (18:44 +0200)]
Always create dependencies for bind mounts
Dependencies were not created for _netdev mountpoints; the reasoning for this
is in commit fc676b00, i.e. to avoid adding dependencies for network mountpoints
where What= looks like a path. Thus, this proposes a semantically more correct
condition, so that dependencies are added for _actual_ bind mounts irrespective
of the network flag.
Consequently, this allows adding the _netdev option to bind mounts, which
includes them in remote-fs.target and simplifies configuration.
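For example, an fstab line like this now gets proper dependencies and is pulled
into remote-fs.target (illustrative paths):

    /srv/nfs/export/data  /var/lib/app/data  none  bind,_netdev  0  0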