Rewrite the function to be slightly simpler. In particular, if a specific
match is found (like ConditionVirtualization=yes), simply return an answer
immediately, instead of relying that "yes" will not be matched by any of
the virtualization names below.
detect-virt: add --private-users switch to check if a userns is active
Various things don't work when we're running in a user namespace, but it's
pretty hard to reliably detect if that is true.
A function is added which looks at /proc/self/uid_map and returns false
if the default "0 0 UINT32_MAX" is found, and true if it finds anything else.
This misses the case where an 1:1 mapping with the full range was used, but
I don't know how to distinguish this case.
'systemd-detect-virt --private-users' is very similar to
'systemd-detect-virt --chroot', but we check for a user namespace instead.
Fixes:
```
==28075== 64 bytes in 1 blocks are definitely lost in loss record 2 of 3
==28075== at 0x4C2BAEE: malloc (vg_replace_malloc.c:298)
==28075== by 0x4C2DCA1: realloc (vg_replace_malloc.c:785)
==28075== by 0x4ED40A2: greedy_realloc (alloc-util.c:57)
==28075== by 0x4E90F87: extract_first_word (extract-word.c:78)
==28075== by 0x4E91813: extract_many_words (extract-word.c:270)
==28075== by 0x10FE93: parse_line (sysusers.c:1325)
==28075== by 0x11198B: read_config_file (sysusers.c:1640)
==28075== by 0x111EB8: main (sysusers.c:1773)
==28075==
```
sysctl: run sysctl service if /proc/sys/net is writable (#4425)
This simply changes this line:
ConditionPathIsReadWrite=/proc/sys/
to this:
ConditionPathIsReadWrite=/proc/sys/net/
The background for this is that the latter is namespaced through network
namespacing usually and hence frequently set as writable in containers, even
though the former is kept read-only. If /proc/sys is read-only but
/proc/sys/net is writable we should run the sysctl service, as useful settings
may be made in this case.
units: extend stop timeout for user@.service to 120s (#4426)
By default all user and all system services get stop timeouts for 90s. This is
problematic as the user manager of course is run as system service. Thus, if
the default time-out is hit for any user service, then it will also be hit for
user@.service as a whole, thus making the whole concept useless for user
services.
This patch extends the stop timeout to 120s for user@.service hence, so that
that the user service manager has ample time to process user services timing
out.
(The other option would have been to shorten the default user service timeout,
but I think a user service should get the same timeout by default as a system
service)
Fixes:
```
==10750==
==10750== HEAP SUMMARY:
==10750== in use at exit: 96 bytes in 3 blocks
==10750== total heap usage: 1,711 allocs, 1,708 frees, 854,545 bytes
allocated
==10750==
==10750== 96 (64 direct, 32 indirect) bytes in 1 blocks are definitely
lost in loss record 3 of 3
==10750== at 0x4C2DA60: calloc (vg_replace_malloc.c:711)
==10750== by 0x4EB3BDA: calendar_spec_from_string
(calendarspec.c:771)
==10750== by 0x109675: test_hourly_bug_4031 (test-calendarspec.c:118)
==10750== by 0x10A00E: main (test-calendarspec.c:202)
==10750==
==10750== LEAK SUMMARY:
==10750== definitely lost: 64 bytes in 1 blocks
==10750== indirectly lost: 32 bytes in 2 blocks
==10750== possibly lost: 0 bytes in 0 blocks
==10750== still reachable: 0 bytes in 0 blocks
==10750== suppressed: 0 bytes in 0 blocks
==10750==
==10750== For counts of detected and suppressed errors, rerun with: -v
==10750== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
```
Franck Bui [Wed, 12 Oct 2016 08:09:45 +0000 (10:09 +0200)]
journal: rename determine_space_for() into cache_space_refresh()
Now that determine_space_for() only deals with storage space (cached) values,
rename it so it reflects the fact that only the cached storage space values are
updated.
Franck Bui [Wed, 12 Oct 2016 07:58:10 +0000 (09:58 +0200)]
journal: introduce patch_min_use() helper
Updating min_use is rather an unusual operation that is limited when we first
open the journal files, therefore extracts it from determine_space_for() and
create a function of its own and call this new function when needed.
determine_space_for() is now dealing with storage space (cached) values only.
Franck Bui [Tue, 11 Oct 2016 14:51:37 +0000 (16:51 +0200)]
journal: don't emit space usage message when opening the journal (#4190)
This patch makes system_journal_open() stop emitting the space usage
message. The caller is now free to emit this message when appropriate.
When restarting the journal, we can now emit the message *after*
flushing the journal (if required) so that all flushed log entries are
written in the persistent journal *before* the status message.
This is required since the status message is always younger than the
flushed entries.
Franck Bui [Tue, 11 Oct 2016 14:46:16 +0000 (16:46 +0200)]
journal: introduce server_space_usage_message()
This commit simply extracts from determine_space_for() the code which emits the
storage usage message and put it into a function of its own so it can be reused
by others paths later.
Franck Bui [Mon, 3 Oct 2016 16:12:41 +0000 (18:12 +0200)]
journal: introduce determine_path_usage()
This commit simply extracts from determine_space_for() the code which
determines the FS usage where the passed path lives (statvfs(3)) and put it
into a function of its own so it can be reused by others paths later.
shared/install: report invalid unit files slightly better
When a unit file is invalid, we'd return an error without any details:
$ systemctl --root=/ enable testing@instance.service
Failed to enable: Invalid argument.
Fix things to at least print the offending file name:
$ systemctl enable testing@instance.service
Failed to enable unit: File testing@instance.service: Invalid argument
$ systemctl --root=/ enable testing@instance.service
Failed to enable unit, file testing@instance.service: Invalid argument.
A real fix would be to pass back a proper error message from conf-parser.
But this would require major surgery, since conf-parser functions now
simply print log errors, but we would need to return them over the bus.
So let's just print the file name, to indicate where the error is.
shared/install: in install_context_mark_for_removal ignore not found units
With the following test case:
[Install]
WantedBy= default.target
Also=foobar-unknown.service
disabling would fail with:
$ ./systemctl --root=/ disable testing.service
Cannot find unit foobar-unknown.service. # this is level debug
Failed to disable: No such file or directory. # this is the error
After the change we proceed:
$ ./systemctl --root=/ disable testing.service
Cannot find unit foobar-unknown.service.
Removed /etc/systemd/system/default.target.wants/testing.service.
This does not affect specifying a missing unit directly:
$ ./systemctl --root=/ disable nosuch.service
Failed to disable: No such file or directory.
Luca Bruno [Tue, 18 Oct 2016 00:05:49 +0000 (00:05 +0000)]
core/exec: add a named-descriptor option ("fd") for streams (#4179)
This commit adds a `fd` option to `StandardInput=`,
`StandardOutput=` and `StandardError=` properties in order to
connect standard streams to externally named descriptors provided
by some socket units.
This option looks for a file descriptor named as the corresponding
stream. Custom names can be specified, separated by a colon.
If multiple name-matches exist, the first matching fd will be used.
Let's avoid the overly abbreviated "cgroups" terminology. Let's instead write:
"Linux Control Groups (cgroups)" is the long form wherever the term is
introduced in prose. Use "control groups" in the short form wherever the term
is used within brief explanations.
llua [Mon, 17 Oct 2016 12:35:26 +0000 (08:35 -0400)]
zsh-completion: fix for #4318 (#4394)
Escape unit names for the eval call in _call_program
The value of the Id property is transformed back into a unit name
usable by systemctl.
system-systemd\x5cx2dcryptsetup.slice -> system-systemd\x2dcryptsetup.slice
Also filter units by property via parameter expansion, not a for loop
core/timer: reset next_elapse_*time when timer is not waiting
When the unit that is triggered by a timer is started and running,
we transition to "running" state, and the timer will not elapse again
until the unit has finished running. In this state "systemctl list-timers"
would display the previously calculated next elapse time, which would
now of course be in the past, leading to nonsensical values.
Simply set the next elapse to infinity, which causes list-timers to
show n/a. We cannot specify when the next elapse will happen, possibly
never.
pid1: do not use mtime==0 as sign of masking (#4388)
It is allowed for unit files to have an mtime==0, so instead of assuming that
any file that had mtime==0 was masked, use the load_state to filter masked
units.
It's a common pattern, so add a helper for it. A macro is necessary
because a function that takes a pointer to a pointer would be type specific,
similarly to cleanup functions. Seems better to use a macro.
man: drop discouragment of runtime and vendor drop-ins
In certain situations drop-ins in /usr/lib/ are useful, for example when one package
wants to modify the behaviour of another package, or the vendor wants to tweak some
upstream unit without patching.
Drop-ins in /run are useful for testing, and may also be created by systemd itself.
Tejun Heo [Sat, 15 Oct 2016 01:07:16 +0000 (21:07 -0400)]
core: make settings for unified cgroup hierarchy supersede the ones for legacy hierarchy (#4269)
There are overlapping control group resource settings for the unified and
legacy hierarchies. To help transition, the settings are translated back and
forth. When both versions of a given setting are present, the one matching the
cgroup hierarchy type in use is used. Unfortunately, this is more confusing to
use and document than necessary because there is no clear static precedence.
Update the translation logic so that the settings for the unified hierarchy are
always preferred. systemd.resource-control man page is updated to reflect the
change and reorganized so that the deprecated settings are at the end in its
own section.
journal: refuse opening journal files from the future for writing
Never permit that we write to journal files that have newer timestamps than our
local wallclock has. If we'd accept that, then the entries in the file might
end up not being ordered strictly.
Let's refuse this with ETXTBSY, and then immediately rotate to use a new file,
so that each file remains strictly ordered also be wallclock internally.
journald: automatically rotate journal files when the clock jumps backwards
As soon as we notice that the clock jumps backwards, rotate journal files. This
is beneficial, as this makes sure that the entries in journal files remain
strictly ordered internally, and thus the bisection algorithm applied on it is
not confused.
This should help avoiding borked wallclock-based bisection on journal files as
witnessed in #4278.
journald: use the event loop dispatch timestamp for journal entries
Let's use the earliest linearized event timestamp for journal entries we have:
the event dispatch timestamp from the event loop, instead of requerying the
timestamp at the time of writing.
This makes the time a bit more accurate, allows us to query the kernel time one
time less per event loop, and also makes sure we always use the same timestamp
for both attempts to write an entry to a journal file.
journal: when iterating through entry arrays and we hit an invalid one keep going
When iterating through partially synced journal files we need to be prepared
for hitting with invalid entries (specifically: non-initialized). Instead of
generated an error and giving up, let's simply try to preceed with the next one
that is valid (and debug log about this).
This reworks the logic introduced with caeab8f626e709569cc492b75eb7e119076059e7
to iteration in both directions, and tries to look for valid entries located
after the invalid one. It also extends the behaviour to both iterating through
the global entry array and per-data object entry arrays.
journal: add an explicit check for uninitialized objects
Let's make dissecting of borked journal files more expressive: if we encounter
an object whose first 8 bytes are all zeroes, then let's assume the object was
simply never initialized, and say so.
Previously, this would be detected as "overly short object", which is true too
in a away, but it's a lot more helpful printing different debug options for the
case where the size is not initialized at all and where the size is initialized
to some bogus value.
No function behaviour change, only a different log messages for both cases.