shared/install: allow "enable" on linked unit files (#3790)
User expectations are broken when "systemctl enable /some/path/service.service"
behaves differently to "systemctl link ..." followed by "systemctl enable".
From user's POV, "enable" with the full path just combines the two steps into
one.
Michal Soltys [Mon, 25 Jul 2016 14:18:00 +0000 (16:18 +0200)]
getty@.service.m4: add Conflicts=/Before= against rescue.service (#3792)
If user isolates rescue target from multi-user or graphical target (or just
starts the service), IgnoreOnIsolate will cause issues with sulogin which is
directly started on current virtual console. This patch adds necessary
Conflicts= and Before= against rescue.service.
Note that this is not needed for emergency target, as implicit Requires= and
After= against sysinit.target is in effect for this service
(DefaultDependencies=yes).
Alban Crequy [Mon, 25 Jul 2016 13:39:46 +0000 (15:39 +0200)]
namespace: don't fail on masked mounts (#3794)
Before this patch, a service file with ReadWriteDirectories=/file...
could fail if the file exists but is not a mountpoint, despite being
listed in /proc/self/mountinfo. It could happen with masked mounts.
man: update systemctl man page for unit file commands, in particular "systemctl enable"
Clarify that "systemctl enable" can operate either on unit names or on unit
file paths (also, adjust the --help text to clarify this). Say that "systemctl
enable" on unit file paths also links the unit into the search path.
Many other fixes.
This should improve the documentation to avoid further confusion around #3706.
Let's not mention the supposed security benefit of turning off caching. It is
really questionnable, and I#d rather not create the impression that we actually
believed turning off caching would be a good idea.
Instead, mention that Cache=no is implicit if a DNS server on the local host is
used.
systemctl: never check inhibitors if -H or -M are used (#3781)
Don't check inhibitors when operating remotely. The interactivity inhibitors
imply can#t be provided anyway, and the current code checks for local sessions
directly, via various sd_session_xyz() APIs, hence bypass it entirely if we
operate on remote systems.
cgroup: whitelist inaccessible devices for "auto" and "closed" DevicePolicy.
https://github.com/systemd/systemd/pull/3685 introduced
/run/systemd/inaccessible/{chr,blk} to map inacessible devices,
this patch allows systemd running inside a nspawn container to create
/run/systemd/inaccessible/{chr,blk}.
namespace: ensure to return a valid inaccessible nodes (#3778)
Because /run/systemd/inaccessible/{chr,blk} are devices with
major=0 and minor=0 it might be possible that these devices cannot be created
so we use /run/systemd/inaccessible/sock instead to map them.
core: add a concept of "dynamic" user ids, that are allocated as long as a service is running
This adds a new boolean setting DynamicUser= to service files. If set, a new
user will be allocated dynamically when the unit is started, and released when
it is stopped. The user ID is allocated from the range 61184..65519. The user
will not be added to /etc/passwd (but an NSS module to be added later should
make it show up in getent passwd).
For now, care should be taken that the service writes no files to disk, since
this might result in files owned by UIDs that might get assigned dynamically to
a different service later on. Later patches will tighten sandboxing in order to
ensure that this cannot happen, except for a few selected directories.
Harald Hoyer [Fri, 22 Jul 2016 13:33:13 +0000 (15:33 +0200)]
macros.systemd.in: add %systemd_ordering (#3776)
To remove the hard dependency on systemd, for packages, which function
without a running systemd the %systemd_ordering macro can be used to
ensure ordering in the rpm transaction. %systemd_ordering makes sure,
the systemd rpm is installed prior to the package, so the %pre/%post
scripts can execute the systemd parts.
Installing systemd afterwards though, does not result in the same outcome.
core: change TasksMax= default for system services to 15%
As it turns out 512 is max number of tasks per service is hit by too many
applications, hence let's bump it a bit, and make it relative to the system's
maximum number of PIDs. With this change the new default is 15%. At the
kernel's default pids_max value of 32768 this translates to 4915. At machined's
default TasksMax= setting of 16384 this translates to 2457.
Why 15%? Because it sounds like a round number and is close enough to 4096
which I was going for, i.e. an eight-fold increase over the old 512
Summary:
| on the host | in a container
old default | 512 | 512
new default | 4915 | 2457
logind: change TasksMax= value for user logins to 33%
Let's change from a fixed value of 12288 tasks per user to a relative value of
33%, which with the kernel's default of 32768 translates to 10813. This is a
slight decrease of the limit, for no other reason than "33%" sounding like a nice
round number that is close enough to 12288 (which would translate to 37.5%).
(Well, it also has the nice effect of still leaving a bit of room in the PID
space if there are 3 cooperating evil users that try to consume all PIDs...
Also, I like my bikesheds blue).
Since the new value is taken relative, and machined's TasksMax= setting
defaults to 16384, 33% inside of containers is usually equivalent to 5406,
which should still be ample space.
To summarize:
| on the host | in the container
old default | 12288 | 12288
new default | 10813 | 5406
core: support percentage specifications on TasksMax=
This adds support for a TasksMax=40% syntax for specifying values relative to
the system's configured maximum number of processes. This is useful in order to
neatly subdivide the available room for tasks within containers.
With this change we'll no longer write to /etc/machine-id from nspawn, as that
breaks the --volatile= operation, as it ensures the image is never considered
in "first boot", since that's bound to the pre-existance of /etc/machine-id.
The new logic works like this:
- If /etc/machine-id already exists in the container, it is read by nspawn and
exposed in "machinectl status" and friends.
- If the file doesn't exist yet, but --uuid= is passed on the nspawn cmdline,
this UUID is passed in $container_uuid to PID 1, and PID 1 is then expected
to persist this to /etc/machine-id for future boots (which systemd already
does).
- If the file doesn#t exist yet, and no --uuid= is passed a random UUID is
generated and passed via $container_uuid.
The result is that /etc/machine-id is never initialized by nspawn itself, thus
unbreaking the volatile mode. However still the machine ID configured in the
machine always matches nspawn's and thus machined's idea of it.
sd-id128: split UUID file read/write code into new id128-util.[ch]
We currently have code to read and write files containing UUIDs at various
places. Unify this in id128-util.[ch], and move some other stuff there too.
The new files are located in src/libsystemd/sd-id128/ (instead of src/shared/),
because they are actually the backend of sd_id128_get_machine() and
sd_id128_get_boot().
In follow-up patches we can use this reduce the code in nspawn and
machine-id-setup by adopted the common implementation.
nspawn: enable major=0/minor=0 devices inside the container (#3773)
https://github.com/systemd/systemd/pull/3685 introduced
/run/systemd/inaccessible/{chr,blk} to map inacessible devices,
this patch allows systemd running inside a nspawn container to create
/run/systemd/inaccessible/{chr,blk}.
kernel-install: when searching for location to place kernel consider /efi
With this change kernel-install will now first look for an existing kernel
installation in /efi, /boot and /boot/efi. If none is found, /efi is used if it
is a mount point, otherwise /boot/efi if it is one. If nothing of that worked
/boot is used without further checking.
This means /boot should be the default unless something was installed before or
something else was explicitly mounted.
bootctl: move toupper() implementation to string-util.h
We already have tolower() calls there, hence let's unify this at one place.
Also, update the code to only use ASCII operations, so that we don't end up
being locale dependant.
bootctl: rework to use common verbs parsing, and add searching of ESP path
This rearranges bootctl a bit, so that it uses the usual verbs parsing
routines, and automatically searches the ESP in /boot, /efi or /boot/efi, thus
increasing compatibility with mainstream distros that insist on /boot/efi.
This also adds minimal support for running bootctl in a container environment:
when run inside a container verification of the ESP via raw block device
access, trusting the container manager to mount the ESP correctly. Moreover,
EFI variables are not accessed when running in the container.
nspawn: if an ESP is part of the disk image to operate on, mount it to /efi or /boot
Matching the behaviour of gpt-auto-generator, if we find an ESP while
dissecting a container image, mount it to /efi or /boot if those dirs exist and
are empty.
This should enable us to run "bootctl" inside a container and do the right
thing.
gpt-generator: use /efi as mount point for the ESP if it exists
Let's make the EFI generator a bit smarter: if /efi exists it is used as mount
point for the ESP, otherwise /boot is used. This should increase compatibility
with distros which use legacy boot loaders that insist on having /boot as
something that isn't the ESP.
Alexander Kurtz [Thu, 21 Jul 2016 00:29:54 +0000 (02:29 +0200)]
bootctl: Always use upper case for "/EFI/BOOT" and "/EFI/BOOT/BOOT*.EFI".
If the ESP is not mounted with "iocharset=ascii", but with "iocharset=utf8"
(which is for example the default in Debian), the file system becomes case
sensitive. This means that a file created as "FooBarBaz" cannot be accessed as
"foobarbaz" since those are then considered different files.
Moreover, a file created as "FooBar" can then also not be accessed as "foobar",
and it also prevents such a file from being created, as both would use the same
8.3 short name "FOOBAR".
Even though the UEFI specification [0] does give the canonical spelling for
the files mentioned above, not all implementations completely conform to that,
so it's possible that those files would already exist, but with a different
spelling, causing subtle bugs when scanning or modifying the ESP.
While the proper fix would of course be that everybody conformed to the
standard, we can work around this problem by just referencing the files by
their 8.3 short names, i.e. using upper case.
Fixes: #3740
[0] <http://www.uefi.org/specifications>, version 2.6, section 3.5.1.1
nspawn: when netns is on, mount /proc/sys/net writable
Normally we make all of /proc/sys read-only in a container, but if we do have
netns enabled we can make /proc/sys/net writable, as things are virtualized
then.
units: fix TasksMax=16384 for systemd-nspawn@.service
When a container scope is allocated via machined it gets 16K set already since cf7d1a30e44bf380027a2e73f9bf13f423a33cc1. Make sure when a container is run as
system service it gets the same values.
core: normalize header inclusion in execute.h a bit
We don't actually need any functionality from cgroup.h in execute.h, hence
don't include that. However, we do need the Unit structure from unit.h, hence
include that, and move it as late as possible, since it needs the definitions
from execute.h.
All other functions in execute.c that need the unit id take a Unit* parameter
as first argument. Let's change connect_logger_as() to follow a similar logic.
core: when forcibly killing/aborting left-over unit processes log about it
Let's lot at LOG_NOTICE about any processes that we are going to
SIGKILL/SIGABRT because clean termination of them didn't work.
This turns the various boolean flag parameters to cg_kill(), cg_migrate() and
related calls into a single binary flags parameter, simply because the function
now gained even more parameters and the parameter listed shouldn't get too
long.
Logging for killing processes is done either when the kill signal is SIGABRT or
SIGKILL, or on explicit request if KILL_TERMINATE_AND_LOG instead of LOG_TERMINATE
is passed. This isn't used yet in this patch, but is made use of in a later
patch.