Previously --ephemeral was only supported with container trees in btrfs
subvolumes (i.e. in combination with --directory=). This adds support for
--ephemeral in conjunction with disk images (i.e. --image=) too.
As side effect this fixes that --ephemeral was accepted but ignored when using
-M on a container that turned out to be an image.
@filesystem groups various file system operations, such as opening files and
directories for read/write and stat()ing them, plus renaming, deleting,
symlinking, hardlinking.
Martin Pitt [Fri, 18 Nov 2016 15:17:01 +0000 (16:17 +0100)]
hostnamed: allow networkd to set the transient hostname
systemd-networkd runs as user "systemd-network" and thus is not privileged to
set the transient hostname:
systemd-networkd[516]: ens3: Could not set hostname: Interactive authentication required.
Standard polkit *.policy files do not have a syntax for granting privileges to
a user, so ship a pklocalauthority (for polkit < 106) and a JavaScript rules
file (for polkit >= 106) that grants the "systemd-network" system user that
privilege.
Add DnsmasqClientTest.test_transient_hostname() test to networkd-test.py to
cover this. Make do_test() a bit more flexible by interpreting "coldplug==None"
as "test sets up the interface by itself". Change DnsmasqClientTest to set up
test_eth42 with a fixed MAC address so that we can configure dnsmasq to send a
special host name for that.
Add MSI VR420 (model MS-1422) to the list of MSI models which need to
ignore brightness hotkey presses, as these are already reported through
the acpi-video interface.
This commit adds the possibility to leave /sys, and /proc/sys read-write.
It introduces a new (undocumented) env var SYSTEMD_NSPAWN_API_VFS_WRITABLE
to enable this feature.
If set to "yes", /sys, and /proc/sys will be read-write.
If set to "no", /sys, and /proc/sys will be read-only.
If set to "network" /proc/sys/net will be read-write. This is useful in
use-cases, where systemd-nspawn is used in an external network
namespace.
This adds the possibility to start privileged containers which need more
control over settings in the /proc, and /sys filesystem.
This is also a follow-up on the discussion from
https://github.com/systemd/systemd/pull/4018#r76971862 where an
introduction of a simple env var to enable R/W support for those
directories was already discussed.
basic/process-util: we need to take the shorter of two strings
==30496== Conditional jump or move depends on uninitialised value(s)
==30496== at 0x489F654: memcmp (vg_replace_strmem.c:1091)
==30496== by 0x49BF203: getenv_for_pid (process-util.c:678)
==30496== by 0x4993ACB: detect_container (virt.c:442)
==30496== by 0x182DFF: test_get_process_comm (test-process-util.c:98)
==30496== by 0x185847: main (test-process-util.c:368)
==30496==
Franck Bui [Mon, 7 Nov 2016 16:14:59 +0000 (17:14 +0100)]
core: limit the length of the confirmation question
When "confirmation_spawn=1", the confirmation question can look like:
Execute /usr/bin/kmod static-nodes --format=tmpfiles --output=/run/tmpfiles.d/kmod.conf? [Yes, No, Skip]
which is pretty verbose and might not fit in the console width size (which is
usually 80 chars) and thus question will be splitted into 2 consecutive lines.
However since the question is now refreshed every 2 secs, the reprinted
question will overwrite the second line of the previous one...
To prevent this, this patch makes sure that the command line won't be longer
than 60 chars by ellipsizing it if the command is longer:
Execute /usr/bin/kmod static-nodes --format=tmpfiles --output=/ru…nf? [Yes, No, View, Skip]
A following patch will introduce a new choice that will allow the user to get
details on the command to be executed so it will still be possible to see the
full command line.
Franck Bui [Mon, 7 Nov 2016 16:14:59 +0000 (17:14 +0100)]
core: reprint the question every 2 sec in ask_char()
ask_char() now reprints the question every 2sec automatically.
It prefixes its output with '\r' to to bring the cursor to the
beginning of the terminal line, and then print the message, redoing it
every 2sec.
As long as nothing interferes with out output this logic will have no
visible effect as we constantly overprint the visible text with the
exact same text.
However, if something is dumped in the middle, then our question won't
get lost, as we'll ask soon again.
This is useful if the question is asked to a terminal that is also
used to dump some other status messages/logs. For example when
confirmation messages are enabled during the boot
(systemd.confirm_spawn=1), the question can easily be lost if the
kernel logs are also enabled and both use the same console.
Franck Bui [Wed, 2 Nov 2016 12:51:02 +0000 (13:51 +0100)]
core: rework ask_for_confirmation()
Now the reponses are handled by ask_for_confirmation() as well as the report of
any errors occuring during the process of retrieving the confirmation response.
One benefit of this is that there's no need to open/close the console one more
time when reporting error/status messages.
The caller now just needs to care about the return values whose meanings are:
- don't execute and pretend that the command failed
- don't execute and pretend that the command succeeed
- positive answer, execute the command
Also some slight code reorganization and introduce write_confirm_error() and
write_confirm_error_fd(). write_confim_message becomes unneeded.
Franck Bui [Wed, 2 Nov 2016 09:38:22 +0000 (10:38 +0100)]
core: allow to redirect confirmation messages to a different console
It's rather hard to parse the confirmation messages (enabled with
systemd.confirm_spawn=true) amongst the status messages and the kernel
ones (if enabled).
This patch gives the possibility to the user to redirect the confirmation
message to a different virtual console, either by giving its name or its path,
so those messages are separated from the other ones and easier to read.
Franck Bui [Wed, 2 Nov 2016 09:50:20 +0000 (10:50 +0100)]
core: prevent the cylon when confirmation_spawn=yes (#2194)
When booting with systemd.confirm_spawn=true, the eye of cylon
animation kicks in pretty quickly so user doesn't have any chance to
answer the questions which services to start before the confirmation
message is screwed by the cylon.
This basically breaks the confirm_spawn functionality completely.
This patch prevents the cylon animation to kick in when
confirmation_spawn=yes.
namespace: simplify, optimize and extend handling of mounts for namespace
This changes a couple of things in the namespace handling:
It merges the BindMount and TargetMount structures. They are mostly the same,
hence let's just use the same structue, and rely on C's implicit zero
initialization of partially initialized structures for the unneeded fields.
This reworks memory management of each entry a bit. It now contains one "const"
and one "malloc" path. We use the former whenever we can, but use the latter
when we have to, which is the case when we have to chase symlinks or prefix a
root directory. This means in the common case we don't actually need to
allocate any dynamic memory. To make this easy to use we add an accessor
function bind_mount_path() which retrieves the right path string from a
BindMount structure.
While we are at it, also permit "+" as prefix for dirs configured with
ReadOnlyPaths= and friends: if specified the root directory of the unit is
implicited prefixed.
This also drops set_bind_mount() and uses C99 structure initialization instead,
which I think is more readable and clarifies what is being done.
This drops append_protect_kernel_tunables() and
append_protect_kernel_modules() as append_static_mounts() is now simple enough
to be called directly.
Prefixing with the root dir is now done in an explicit step in
prefix_where_needed(). It will prepend the root directory on each entry that
doesn't have it prefixed yet. The latter is determined depending on an extra
bit in the BindMount structure.
Merge pull request #4678 from poettering/gc-device
Automatically GC device jobs when there's no need to keep them in the job queue anymore.
Implement systemctl list-jobs --before/--after.
Allow systemd-run -p After/Before/Wants/Requires= ...
systemctl: shorter list-jobs --before/--after output a bit
(before)$ systemctl list-jobs --before --after
JOB UNIT TYPE STATE
8769 foobar.device start running
A job waits for this job: 8669 (run-rb6da596d0cfa4e36b7c594cd973e795a.service/start)
8669 run-rb6da596d0cfa4e36b7c594cd973e795a.service start waiting
This job waits for a job: 8769 (foobar.device/start)
2 jobs listed.
(after)$ systemctl list-jobs --before --after
JOB UNIT TYPE STATE
8769 foobar.device start running
waiting for job 8669 (run-rb6da596d0cfa4e36b7c594cd973e795a.service/start)
8669 run-rb6da596d0cfa4e36b7c594cd973e795a.service start waiting
blocking job 8769 (foobar.device/start)
systemctl: add env var to force connection to system manager via the bus
Sometimes it is useful for debugging purposes to force systemctl to connect to
PID 1 via the bus instead of direct connection, even if the direct connection
is possible.
In contrast to all other unit types device units when queued just track
external state, they cannot effect state changes on their own. Hence unless a
client or other job waits for them there's no reason to keep them in the job
queue. This adds a concept of GC'ing jobs of this type as soon as no client or
other job waits for them anymore.
To ensure this works correctly we need to track which clients actually
reference a job (i.e. which ones enqueued it). Unfortunately that's pretty
nasty to do for direct connections, as sd_bus_track doesn't work for
them. For now, work around this, by simply remembering in a boolean that a job
was requested by a direct connection, and reset it when we notice the direct
connection is gone. This means the GC logic works fine, except that jobs are
not immediately removed when direct connections disconnect.
In the longer term, a rework of the bus logic should fix this properly. For now
this should be good enough, as GC works for fine all cases except this one, and
thus is a clear improvement over the previous behaviour.
core: rename "clients" field of Job structure to "bus_track"
Let's make semantics of this field more similar to the same functionality in
the Unit object, in particular as we add new functionality to it later on.
Djalal Harouni [Mon, 14 Nov 2016 08:12:21 +0000 (09:12 +0100)]
core:gperf: pass the exec_context struct directly to parse restrict namespaces
The RestrictNamespaces= takes yes, no or a list of namespaces types,
therefor config_parse_restrict_namespaces() is a bit complex and it
operates on the ExecContext, fix this by passing the offset of
ExecContext directly otherwise restricting namespaces won't work.
Djalal Harouni [Tue, 15 Nov 2016 09:15:27 +0000 (10:15 +0100)]
core: improve the logic that implies no new privileges
The no_new_privileged_set variable is not used any more since commit 9b232d3241fcfbf60af that fixed another thing. So remove it. Also no
need to check if we are under user manager, remove that part too.
nspawn: restart the whole systemd-nspawn@.service unit on container reboot (#4613)
Since 133 is now used in a few places, add a #define for it.
Also make the status message a bit informative.
Another issue introduced in b006762. The logic was borked, we were supposed
to return 0 to break the loop, and 133 to restart the container, not the other
way around.
But this doesn't seem to work, reboot fails with:
Nov 08 00:41:32 laptop systemd-nspawn[26564]: Failed to register machine: Machine 'fedora-rawhide' already exists
So actually the version before this patch worked better, since 133 > 0 and we'd
at least loop internally.
build-sys: do not install ctrl-alt-del.target symlink twice
It was a harmless but pointless duplication. Fixes #4655.
Note: in general we try to install as little as possible in
/etc/systemd/{system,user}. We only install .wants links there for units which
are "user configurable", i.e. which have an [Install] section. Most our units
and aliases are not user configurable, do not have an [Install] section, and
must be symlinked statically during installation. A few units do have an
[Install] section, and are enabled through symlinks in /etc/ during
installation using GENERAL_ALIASES. It *would* be possible to not create those
symlinks, and instead require 'systemctl preset' to be invoked after
installation, but GENERAL_ALIASES works well enough.
Felipe Sateler [Sat, 12 Nov 2016 02:28:06 +0000 (23:28 -0300)]
systemctl: resolve symlinks when finding unit paths (#4545)
Otherwise we think the alias is the real unit, and may edit/cat the
wrong unit.
Before this patch:
$ systemctl edit autovt@ # creates dropin in /etc/systemd/system/autovt@.service.d
$ systemctl cat autovt@ | grep @.service
# /lib/systemd/system/autovt@.service
# that serial gettys are covered by serial-getty@.service, not this
# /etc/systemd/system/autovt@.service.d/override.conf
$ systemctl cat getty@ | grep @.service
# /lib/systemd/system/getty@.service
# that serial gettys are covered by serial-getty@.service, not this
After this patch
$ systemctl edit autovt@ # creates dropin in /etc/systemd/system/getty@.service.d
$ systemctl cat autovt@ | grep @.service
# /usr/lib/systemd/system/getty@.service
# that serial gettys are covered by serial-getty@.service, not this
# /etc/systemd/system/getty@.service.d/override.conf
systemctl cat getty@ | grep @.service
# /usr/lib/systemd/system/getty@.service
# that serial gettys are covered by serial-getty@.service, not this
# /etc/systemd/system/getty@.service.d/override.conf
units: disable /sys/fs/fuse/connections in private user namespaces (#4592)
The mount fails, even though CAP_SYS_ADMIN is granted.
Only file systems with FU_USERNS_MOUNT in .fs_flags may be mounted in userns,
and the patch to add that fusectl was rejected [1]. It would be nice if we
could check if the kernel has FU_USERNS_MOUNT for a given fs type, since this
could change over time, but this information doesn't seem to be exported.
So let's just skip this mount in userns to avoid an error during boot.
akochetkov [Fri, 11 Nov 2016 17:50:46 +0000 (20:50 +0300)]
timesyncd: clear ADJ_MAXERROR to keep STA_UNSYNC cleared after jump adjust (#4626)
NTP use jump adjust if system has incorrect time read from RTC during boot.
It is desireble to update RTC time as soon as NTP set correct system time.
Sometimes kernel failed to update RTC due to STA_UNSYNC get set before RTC
update finised. In that case RTC time wouldn't be updated within long time.
The commit makes RTC updates stable.
When NTP do jump time adjust using ADJ_SETOFFSET it clears STA_UNSYNC flag.
If don't clear ADJ_MAXERROR, STA_UNSYNC will be set again by kernel within
1 second (by second_overflow() function). STA_UNSYNC flag prevent RTC updates
in kernel. Sometimes the kernel is able to update RTC withing 1 second,
but sometimes it falied.
basic/virt: fix userns check on CONFIG_USER_NS=n kernel (#4651)
ENOENT should be treated as "false", but because of the broken errno check it
was treated as an error. So ConditionVirtualization=user-namespaces probably
returned the correct answer, but only by accident.
This adds a new systemd fstab option x-systemd.mount-timeout. The option
adds a timeout value that specifies how long systemd waits for the mount
command to finish. It allows to mount huge btrfs volumes without issues.
This is equivalent to adding option TimeoutSec= to [Mount] section in a
mount unit file.
Djalal Harouni [Thu, 10 Nov 2016 17:11:37 +0000 (18:11 +0100)]
core:namespace: count and free failed paths inside chase_all_symlinks() (#4619)
This certainly fixes a bug that was introduced by PR
https://github.com/systemd/systemd/pull/4594 that intended to fix
https://github.com/systemd/systemd/issues/4567.
The fix was not complete. This patch makes sure that we count and free
all paths that fail inside chase_all_symlinks().
Susant Sahani [Fri, 4 Nov 2016 09:55:07 +0000 (15:25 +0530)]
Link: port to new ethtool ETHTOOL_xLINKSETTINGS
Link: port to new ethtool ETHTOOL_xLINKSETTINGS
This patch defines a new ETHTOOL_GLINKSETTINGS/SLINKSETTINGS API,
handled by the new get_link_ksettings/set_link_ksettings .
This is a WIP version based on this [kernel
patch](https://patchwork.kernel.org/patch/8411401/).
So revert the default to the legacy hierarchy for now. Developers of the above
software can opt into the unified hierarchy with
"systemd.legacy_systemd_cgroup_controller=0".