Tejun Heo [Sat, 15 Oct 2016 01:07:16 +0000 (21:07 -0400)]
core: make settings for unified cgroup hierarchy supersede the ones for legacy hierarchy (#4269)
There are overlapping control group resource settings for the unified and
legacy hierarchies. To help transition, the settings are translated back and
forth. When both versions of a given setting are present, the one matching the
cgroup hierarchy type in use is used. Unfortunately, this is more confusing to
use and document than necessary because there is no clear static precedence.
Update the translation logic so that the settings for the unified hierarchy are
always preferred. The systemd.resource-control man page is updated to reflect the
change and reorganized so that the deprecated settings are at the end in their
own section.
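The static precedence described above can be sketched roughly as follows. This is an illustration in Python, not the actual systemd code: the function names are hypothetical, CPUWeight=/CPUShares= are used only as an example pair, and the linear rescaling between the two value ranges is an assumption.

```python
from typing import Optional

# Assumed defaults and limits, for illustration only.
SHARES_DEFAULT = 1024
WEIGHT_DEFAULT = 100
WEIGHT_MIN, WEIGHT_MAX = 1, 10000


def shares_to_weight(shares: int) -> int:
    """Hypothetical legacy-to-unified translation: linear rescale, clamped."""
    weight = shares * WEIGHT_DEFAULT // SHARES_DEFAULT
    return max(WEIGHT_MIN, min(WEIGHT_MAX, weight))


def effective_cpu_weight(cpu_weight: Optional[int],
                         cpu_shares: Optional[int]) -> Optional[int]:
    """Pick the value to apply: the unified setting always wins."""
    if cpu_weight is not None:
        # Unified setting present: use it regardless of the hierarchy in use.
        return cpu_weight
    if cpu_shares is not None:
        # Only the legacy setting is present: translate it.
        return shares_to_weight(cpu_shares)
    return None  # neither setting configured
```

With both settings present, the legacy value is ignored entirely, which is what makes the precedence static and easy to document.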
journal: refuse opening journal files from the future for writing
Never permit that we write to journal files that have newer timestamps than our
local wallclock has. If we'd accept that, then the entries in the file might
end up not being ordered strictly.
Let's refuse this with ETXTBSY, and then immediately rotate to use a new file,
so that each file remains strictly ordered also by wallclock internally.
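A minimal sketch of the refuse-then-rotate flow, written in Python for illustration. The function names and the `rotate` callback are hypothetical; only the use of ETXTBSY as the refusal error comes from the commit message.

```python
import errno


def check_writable(newest_entry_usec: int, now_usec: int) -> None:
    """Refuse files whose newest entry is later than the local wallclock."""
    if newest_entry_usec > now_usec:
        raise OSError(errno.ETXTBSY, "journal file is from the future")


def open_for_writing(newest_entry_usec: int, now_usec: int, rotate) -> str:
    """Reuse the existing file if possible, otherwise rotate to a new one."""
    try:
        check_writable(newest_entry_usec, now_usec)
        return "reused"
    except OSError as e:
        if e.errno != errno.ETXTBSY:
            raise  # unrelated failure, do not hide it
        rotate()  # start a fresh file so ordering stays strict
        return "rotated"
```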
journald: automatically rotate journal files when the clock jumps backwards
As soon as we notice that the clock jumps backwards, rotate journal files. This
is beneficial, as this makes sure that the entries in journal files remain
strictly ordered internally, and thus the bisection algorithm applied on it is
not confused.
This should help avoid borked wallclock-based bisection on journal files as
witnessed in #4278.
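The detection itself can be sketched as below. This is a Python illustration with a hypothetical class name, not the journald implementation: it remembers the last wallclock value seen and reports a rotation whenever a new value is smaller.

```python
class ClockJumpDetector:
    """Remember the last realtime timestamp and flag backwards jumps."""

    def __init__(self):
        self.last_usec = None
        self.rotations = 0

    def observe(self, now_usec: int) -> bool:
        """Return True if the clock went backwards and a rotation is due."""
        jumped = self.last_usec is not None and now_usec < self.last_usec
        if jumped:
            # A real daemon would rotate its journal files here.
            self.rotations += 1
        self.last_usec = now_usec
        return jumped
```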
journald: use the event loop dispatch timestamp for journal entries
Let's use the earliest linearized event timestamp for journal entries we have:
the event dispatch timestamp from the event loop, instead of requerying the
timestamp at the time of writing.
This makes the time a bit more accurate, allows us to query the kernel time one
time less per event loop, and also makes sure we always use the same timestamp
for both attempts to write an entry to a journal file.
journal: when iterating through entry arrays and we hit an invalid one keep going
When iterating through partially synced journal files we need to be prepared
for hitting invalid entries (specifically: non-initialized ones). Instead of
generating an error and giving up, let's simply try to proceed with the next one
that is valid (and debug log about this).
This reworks the logic introduced with caeab8f626e709569cc492b75eb7e119076059e7
so that it applies to iteration in both directions, and tries to look for valid entries located
after the invalid one. It also extends the behaviour to both iterating through
the global entry array and per-data object entry arrays.
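The skip-and-continue idea, sketched in Python under simplifying assumptions: the entry array is modelled as a plain list, and `is_valid` is a hypothetical predicate standing in for the real object checks.

```python
def next_valid(entries, start, direction, is_valid):
    """Return the index of the first valid entry at or beyond `start`.

    `direction` is +1 for forward and -1 for backward iteration. Invalid
    entries are skipped instead of aborting the whole iteration.
    """
    i = start
    while 0 <= i < len(entries):
        if is_valid(entries[i]):
            return i
        # A real implementation would emit a debug log message here.
        i += direction
    return None  # ran off the end without finding a valid entry
```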
journal: add an explicit check for uninitialized objects
Let's make dissecting of borked journal files more expressive: if we encounter
an object whose first 8 bytes are all zeroes, then let's assume the object was
simply never initialized, and say so.
Previously, this would be detected as "overly short object", which is true too
in a way, but it's a lot more helpful to print different debug messages for the
case where the size is not initialized at all and the case where the size is
initialized to some bogus value.
No functional behaviour change, only different log messages for the two cases.
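A rough Python sketch of the two diagnostics. The 16-byte header size and the position of the size field (a little-endian 64-bit value at offset 8) are assumptions made for this example, and the function name is hypothetical.

```python
import struct

HEADER_SIZE = 16  # assumed minimum object size, for illustration only


def diagnose_object(buf: bytes) -> str:
    """Classify an object header for debug logging."""
    if buf[:8] == b"\0" * 8:
        # All-zero start: the object was most likely never written.
        return "uninitialized object"
    (size,) = struct.unpack_from("<Q", buf, 8)  # assumed size field
    if size < HEADER_SIZE:
        # Something was written, but the size is bogus.
        return "overly short object"
    return "ok"
```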
hese10 [Wed, 12 Oct 2016 16:40:28 +0000 (19:40 +0300)]
Avoid forever loop for journalctl --list-boots command (#4278)
When the system date is changed to the future and a normal user logs to a new journal file, and the date is then changed back to the present time, the "journalctl --list-boots" command loops forever. This commit tries to fix the problem by first checking whether the boot ID that was found is already in the boot ID list. If it is, the boot ID search loop stops.
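The loop guard can be sketched as follows. This is a Python illustration in which `find_next_boot` is a hypothetical callback standing in for the journal lookup; it returns the next boot ID after the given one, or None when there are no more.

```python
def list_boots(find_next_boot):
    """Collect boot IDs, stopping as soon as one repeats."""
    boots = []
    seen = set()
    cursor = None
    while True:
        boot_id = find_next_boot(cursor)
        if boot_id is None or boot_id in seen:
            # Either the end of the journal, or a cycle caused by
            # out-of-order timestamps: stop instead of looping forever.
            break
        seen.add(boot_id)
        boots.append(boot_id)
        cursor = boot_id
    return boots
```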
Ben Harris [Wed, 12 Oct 2016 13:41:56 +0000 (14:41 +0100)]
hwdb: Treat Latitude 2110 brightness keys like on Inspiron 1520 (#4355)
Like the Inspiron 1520, the Dell Latitude 2110 emits brightness-control
key events both through atkbd and acpi-video. This suppresses them on
the atkbd side.
Djalal Harouni [Wed, 12 Oct 2016 12:11:16 +0000 (14:11 +0200)]
core:sandbox: lets make /lib/modules/ inaccessible on ProtectKernelModules=
Let's go further and make /lib/modules/ inaccessible for services that have
no business with modules. This is a minor improvement, but it may
help on setups with custom modules and they are limited... with regard to the
kernel auto-load feature.
This change introduces the NameSpaceInfo struct, which we may embed later
inside ExecContext, but for now let's just reduce the number of arguments to
setup_namespace() and merge the ProtectKernelModules feature.
Djalal Harouni [Wed, 12 Oct 2016 11:31:21 +0000 (13:31 +0200)]
core:sandbox: Add ProtectKernelModules= option
This is useful to turn off explicit module load and unload operations on modular
kernels. This option removes CAP_SYS_MODULE from the capability bounding set for
the unit, and installs a system call filter to block module system calls.
This option will not prevent the kernel from loading modules using the module
auto-load feature, which is a system-wide operation.
Allow block and char classes in DeviceAllow bus properties (#4353)
Allowed paths are unified between the configuration file parser and the bus
property checker. The biggest change is that the bus code now allows "block-"
and "char-" classes. In addition, path_startswith("/dev") was used in the bus
code, and startswith("/dev") was used in the config file code. It seems
reasonable to use path_startswith() which allows a slightly broader class of
strings.
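To illustrate the difference between a plain string prefix check and a path-aware one, here is a small Python approximation. It is not systemd's path_startswith(); it only compares path components and ignores repeated slashes, which is an assumption about the relevant behaviour.

```python
def path_startswith(path: str, prefix: str) -> bool:
    """Component-wise prefix check that ignores duplicate slashes."""
    if path.startswith("/") != prefix.startswith("/"):
        return False  # do not mix absolute and relative paths
    path_parts = [c for c in path.split("/") if c]
    prefix_parts = [c for c in prefix.split("/") if c]
    return path_parts[:len(prefix_parts)] == prefix_parts
```

Under this approximation "//dev//null" is accepted for the prefix "/dev", while a component that merely shares the leading characters, such as "/device", is not.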
0xAX [Tue, 11 Oct 2016 21:30:04 +0000 (00:30 +0300)]
core/main: get rid from excess check of ACTION_TEST (#4350)
If the `--test` command line option was passed, systemd set skip_setup
to true during bootup. But after this we check again whether arg_action is
test or help, and open the pager depending on the result.
We should skip setup when `--test` is passed, but it is also
safe to set skip_setup in the case of `--help`. So let's remove the first
check and move skip_setup = true to the second check.
core: chown() any TTY used for stdin, not just when StandardInput=tty is used (#4347)
If stdin is supplied as an fd for transient units (using the
StandardInputFileDescriptor pseudo-property for transient units), then we
should also fix up the TTY ownership, not just when we opened the TTY
ourselves.
This simply drops the explicit is_terminal_input()-based check. Note that
chown_terminal() internally does a much more appropriate isatty()-based check
anyway, hence we can drop this without replacement.
Yu Watanabe [Tue, 11 Oct 2016 12:36:14 +0000 (21:36 +0900)]
units: add Wants=initrd-cleanup.service to initrd-switch-root.target (#4345)
`systemctl isolate initrd-switch-root.target`, called by initrd-cleanup.service,
kills initrd-cleanup.service itself. initrd-cleanup.service then fails and the
system drops to the emergency shell.
To prevent this problem, this commit adds `Wants=initrd-cleanup.service` to
initrd-switch-root.target.
r was not initialized and would be used if "tcp" was the only option
used for the stub. We should initialize it to 0 to indicate that no
error happened in the udp case.
core: when determining whether a process exit status is clean, consider whether it is a command or a daemon
SIGTERM should be considered a clean exit code for daemons (i.e. long-running
processes, as a daemon without SIGTERM handler may be shut down without issues
via SIGTERM still) while it should not be considered a clean exit code for
commands (i.e. short-running processes).
Let's add two different clean checking modes for this, and use the right one at
the appropriate places.
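The two checking modes can be sketched like this in Python. The names are hypothetical and only SIGTERM is modelled, since that is the signal the commit message discusses; the real check may treat further signals as clean.

```python
import signal
from enum import Enum


class ExitClean(Enum):
    COMMAND = "command"  # short-running process
    DAEMON = "daemon"    # long-running process


def is_clean_exit(exited_normally: bool, status: int, mode: ExitClean) -> bool:
    """Decide whether a process termination counts as clean.

    `status` is the exit code if the process exited normally, otherwise
    the number of the signal that killed it.
    """
    if exited_normally:
        return status == 0
    # Killed by a signal: only daemons may be shut down via SIGTERM.
    return mode is ExitClean.DAEMON and status == signal.SIGTERM
```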
When we print information about PID 1's crashdump subprocess failing. In this
case we *know* that we do not generate LSB exit codes, as it's basically PID 1
itself that exited there.
0xAX [Mon, 10 Oct 2016 20:11:36 +0000 (23:11 +0300)]
main: use strdup instead of free_and_strdup to initialize default unit (#4335)
Previously we used free_and_strdup() to fill arg_default_unit with the unit
name if no default unit name was passed through the kernel command line or
command line arguments. But we can use plain strdup() instead of
free_and_strdup() for this, because we only fill arg_default_unit
if it wasn't set before.
exit-status: kill is_clean_exit_lsb(), move logic to sysv-generator
Let's get rid of is_clean_exit_lsb(), let's move the logic for the special
handling of the two LSB exit codes into the sysv-generator by writing out
appropriate SuccessExitStatus= lines if the LSB header exists. This is not only
semantically more correct, but also fixes a bug, as the code in service.c that
chose between is_clean_exit_lsb() and is_clean_exit() based this check on
whether a native unit file was available for the unit. However, that check has
been bogus for a long time, ever since the SysV generator was introduced and
native SysV script support was removed from PID 1, as in that case a unit file
always exists.
Dan Dedrick [Wed, 4 May 2016 21:06:45 +0000 (17:06 -0400)]
journal-remote: make the child pipe non-blocking
We are going to add this child as a source to our event loop so we don't
want to block when reading data from it as this will prevent us from
processing other events. Specifically this will block the signalfds
which means if we are waiting for data from curl we won't handle SIGTERM
or SIGINT until we happen to get more data.
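The effect of a non-blocking pipe can be shown with a short Python sketch. It uses a plain os.pipe() rather than a curl child process, and the helper name is hypothetical: a read on an empty non-blocking pipe returns immediately instead of stalling the event loop.

```python
import os


def read_available(fd: int, limit: int = 4096):
    """Read whatever is ready on a non-blocking fd.

    Returns None when no data is available yet, so the caller can go
    back to its event loop and handle other sources such as signals.
    """
    try:
        return os.read(fd, limit)
    except BlockingIOError:
        return None


def make_nonblocking_pipe():
    """Create a pipe whose read end never blocks."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)
    return read_fd, write_fd
```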
Do not make up our own type for ExitStatus, but use the type used by POSIX for
this, which is "int". In particular as we never used that type outside of the
definition of exit_status_to_string(), where we internally cast the parameter to
(int) every single time we used it.
Hence, let's simplify things, drop the type and use the kernel type directly.
Felipe Sateler [Mon, 10 Oct 2016 13:40:05 +0000 (10:40 -0300)]
login: drop fedora-specific PAM config, add note to DISTRO_PORTING (#4314)
It is impossible to ship a fully generic PAM configuration upstream.
Therefore, ship a minimal configuration with the systemd --user requirements,
and add a note to DISTRO_PORTING documenting this.
Franck Bui [Mon, 10 Oct 2016 10:06:26 +0000 (12:06 +0200)]
unit: drop console-shell.service (#4298) (#4325)
console-shell.service was supposed to be useful for normal clean boots
(i.e. multi-user.target or so), as a replacement for logind/getty@.service for
simpler use cases.
But due to the lack of documentation and sanity checks one can easily get
confused and enable this service in parallel with getty@.service.
In this case we end up with both services sharing the same tty, which leads to
strange results.
Even worse, console-shell.service might fail while getty@.service tries
to acquire the terminal, which results in the system powering off since
console-shell.service uses:
"ExecStopPost=-/usr/bin/systemctl poweroff".
Another issue: this service doesn't work well if plymouth is also used, since it
lets the splash screen program run and mess up the tty (at least a "plymouth quit"
is missing).
0xAX [Mon, 10 Oct 2016 02:57:03 +0000 (05:57 +0300)]
main: initialize default unit little later (#4321)
systemd fills arg_default_unit during startup with the default.target
value. But arg_default_unit may be overwritten in parse_argv() or
parse_proc_cmdline_item().
Let's check the value of arg_default_unit after the calls to parse_argv()
and parse_proc_cmdline_item() and fill it with default.target if
it wasn't filled before. This way we do not spend unnecessary
time filling arg_default_unit with default.target.
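The late-initialization pattern, sketched in Python with hypothetical names. The order in which the parsers run is left to the caller here and is not taken from the commit.

```python
DEFAULT_UNIT = "default.target"


def resolve_default_unit(overrides):
    """Apply parsed overrides first, fall back to the default last.

    `overrides` holds the values produced by the individual parsers, in
    the order they ran; None means the parser did not set a unit.
    """
    unit = None
    for value in overrides:
        if value is not None:
            unit = value  # a later parser overwrites an earlier one
    if unit is None:
        # Nothing was set explicitly: only now fill in the default.
        unit = DEFAULT_UNIT
    return unit
```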
When running in a user namespace without private networking, resolved would
fail to start. There isn't much difference between EADDRINUSE and EPERM,
so treat them the same, except for the warning message text.
resolved: simplify error handling in manager_dns_stub_{udp,tcp}_fd()
Make sure an error is always printed… When systemd-resolved is started in a
user namespace without private network, it would fail on setsockopt, but the
error wouldn't be particularly informative:
"Failed to start manager: permission denied."
Lans Zhang [Sun, 9 Oct 2016 22:59:54 +0000 (06:59 +0800)]
sd-boot: trigger to record further logs to tcg 2.0 final event log area (#4302)
According to TCG EFI Protocol Specification for TPM 2.0 family,
all events generated after the invocation of EFI_TCG2_GET_EVENT_LOG
shall be stored in an instance of an EFI_CONFIGURATION_TABLE aka
EFI TCG 2.0 final events table. Hence, it is necessary to trigger the
internal switch by calling get_event_log() so that the logs can be
retrieved from the OS runtime.
msekletar:
> I've looked at EDK2 and indeed log entry is added to FinalEventsTable only after
> EFI_TCG2_PROTOCOL.GetEventLog was called[1][2]. Also, same patch was currently
> merged to shim by Peter Jones [3].
nspawn: fix parsing of numeric arguments for --private-users
The documentation lists "yes", "no", "pick", and numeric arguments.
But parse_boolean was attempted first, so various numeric arguments were
misinterpreted.
In particular, this fixes --private-users=0 to mean the same thing as
--private-users=0:65536.
While at it, use strndupa to avoid some error handling.
Also give a better error for an empty UID range. I think it's likely that
people will use --private-users=0:0 thinking that the argument means UID:GID.
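The corrected parsing order can be sketched as below. This is a Python illustration, not the nspawn code: the return format and mode names are hypothetical, the default range of 65536 comes from the equivalence stated above, and only the boolean spellings "yes" and "no" are handled.

```python
UID_RANGE_DEFAULT = 65536


def parse_private_users(arg: str):
    """Return (mode, uid_shift, uid_range), trying numeric forms first."""
    head, sep, tail = arg.partition(":")
    if head.isdigit():
        # Numeric argument: SHIFT or SHIFT:RANGE, checked before booleans
        # so that "0" and "1" are not mistaken for "no" and "yes".
        shift = int(head)
        uid_range = int(tail) if sep else UID_RANGE_DEFAULT
        if uid_range == 0:
            raise ValueError("UID range cannot be empty")
        return ("fixed", shift, uid_range)
    if arg == "pick":
        return ("pick", None, UID_RANGE_DEFAULT)
    if arg == "yes":
        return ("on", None, UID_RANGE_DEFAULT)
    if arg == "no":
        return ("off", None, None)
    raise ValueError("cannot parse --private-users=" + arg)
```

Rejecting an empty range explicitly gives a clear error to anyone who writes 0:0 expecting UID:GID semantics.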
nspawn: also fall back to legacy cgroup hierarchy for old containers
The current systemd version detection routine cannot detect systemd 230,
only systemd >= 231. This means that we'll still use the legacy hierarchy
in some cases where we wouldn't have to. If somebody figures out a nice
way to detect systemd 230, this can be improved later.
nspawn: use mixed cgroup hierarchy only when container has new systemd
systemd-soon-to-be-released-232 is able to deal with the mixed hierarchy.
So make an educated guess, and use the mixed hierarchy in that case.
Tested by running the host with mixed hierarchy (i.e. simply using a recent
kernel with systemd from git), and booting first a container with older systemd,
and then one with a newer systemd.