sd-id128: be more liberal when reading files with 128bit IDs
Accept both files with and without trailing newlines. Apparently some rkt
releases generated them incorrectly, missing the trailing newlines, and we
shouldn't break that.
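A minimal sketch of the lenient read described above, in Python rather than the project's C. It assumes the file holds the plain 32-hex-digit form; the name parse_id128 is hypothetical and this is not the sd-id128 code itself.

```python
import re

# Assumption: the ID is stored as 32 hexadecimal digits, nothing else.
_ID128_RE = re.compile(r"[0-9a-fA-F]{32}")

def parse_id128(text):
    """Parse a 128-bit ID, with or without a single trailing newline."""
    if text.endswith("\n"):
        text = text[:-1]  # tolerate, but do not require, the newline
    if not _ID128_RE.fullmatch(text):
        raise ValueError("not a valid 128-bit ID")
    return bytes.fromhex(text)
```

Both spellings of the same file content then yield the same 16 bytes, while anything else (extra newlines, short input) is still rejected.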
Michael Olbrich [Mon, 25 Jul 2016 18:04:02 +0000 (20:04 +0200)]
automount: don't cancel mount/umount request on reload/reexec (#3670)
All pending tokens are already serialized correctly and will be handled
when the mount unit is done.
Without this a 'daemon-reload' cancels all pending tokens. Any process
waiting for the mount will continue with EHOSTDOWN.
This can happen when the mount unit waits for its dependencies, e.g.
network, devices, fsck, etc.
Michael Olbrich [Mon, 25 Jul 2016 18:02:55 +0000 (20:02 +0200)]
transaction: don't cancel jobs for units with IgnoreOnIsolate=true (#3671)
This is important if a job was queued for a unit but not yet started.
Without this, the job will be canceled and is never executed even though
IgnoreOnIsolate is set to 'true'.
systemctl: avoid "leaking" some strings in UnitStatusInfo
% valgrind --leak-check=full systemctl status multipathd.service --no-pager -n0
...
==431== 16 bytes in 2 blocks are definitely lost in loss record 1 of 2
==431== at 0x4C2BBAD: malloc (vg_replace_malloc.c:299)
==431== by 0x534AF19: strdup (in /usr/lib64/libc-2.23.so)
==431== by 0x4E81AEE: free_and_strdup (string-util.c:794)
==431== by 0x4EF66C1: map_basic (bus-util.c:1030)
==431== by 0x4EF6A8E: bus_message_map_all_properties (bus-util.c:1153)
==431== by 0x120487: show_one (systemctl.c:4672)
==431== by 0x1218F3: show (systemctl.c:4990)
==431== by 0x4EC359E: dispatch_verb (verbs.c:92)
==431== by 0x12A3AE: systemctl_main (systemctl.c:7742)
==431== by 0x12B1A8: main (systemctl.c:8011)
==431==
==431== LEAK SUMMARY:
==431== definitely lost: 16 bytes in 2 blocks
This happens because map_basic() strdups the strings. Other code in systemctl
assigns strings to UnitStatusInfo without copying them, relying on the fact
that the message is longer lived than UnitStatusInfo. Add a helper function
that is similar to map_basic, but only accepts strings and does not copy them.
The alternative of continuing to use map_basic() but adding proper cleanup
to free fields in UnitStatusInfo seems less attractive because it'd require
changing a lot of code and doing a lot more allocations for little gain.
(I put "leaking" in quotes, because systemctl is short lived anyway.)
core: change ExecStart=! syntax to ExecStart=+ (#3797)
As suggested by @mbiebl we already use the "!" special char in unit file
assignments for negation, hence we should not use it in a different context for
privileged execution. Let's use "+" instead.
To "search something", in the meaning of looking for it, is valid,
but "search _for_ something" is much more commonly used, especially when
the meaning could be confused with "looking _through_ something"
(for some other object).
(Cf. "the police search a person", "the police search for a person".)
Also reword the rest of the paragraph to avoid using "automatically"
three times.
"strict versioned dependency" suggests that version "231" of the library
is stable. But the ABI or API might be changed in any patch, so reword
the text to avoid using "version".
shared/install: allow "enable" on linked unit files (#3790)
User expectations are broken when "systemctl enable /some/path/service.service"
behaves differently to "systemctl link ..." followed by "systemctl enable".
From the user's POV, "enable" with the full path just combines the two steps into
one.
Michal Soltys [Mon, 25 Jul 2016 14:18:00 +0000 (16:18 +0200)]
getty@.service.m4: add Conflicts=/Before= against rescue.service (#3792)
If the user isolates the rescue target from the multi-user or graphical target
(or just starts the service), IgnoreOnIsolate will cause issues with sulogin,
which is started directly on the current virtual console. This patch adds the
necessary Conflicts= and Before= against rescue.service.
Note that this is not needed for the emergency target, as the implicit
Requires= and After= against sysinit.target are in effect for this service
(DefaultDependencies=yes).
Alban Crequy [Mon, 25 Jul 2016 13:39:46 +0000 (15:39 +0200)]
namespace: don't fail on masked mounts (#3794)
Before this patch, a service file with ReadWriteDirectories=/file...
could fail if the file exists but is not a mountpoint, despite being
listed in /proc/self/mountinfo. It could happen with masked mounts.
man: update systemctl man page for unit file commands, in particular "systemctl enable"
Clarify that "systemctl enable" can operate either on unit names or on unit
file paths (also, adjust the --help text to clarify this). Say that "systemctl
enable" on unit file paths also links the unit into the search path.
Many other fixes.
This should improve the documentation to avoid further confusion around #3706.
Let's not mention the supposed security benefit of turning off caching. It is
really questionable, and I'd rather not create the impression that we actually
believed turning off caching would be a good idea.
Instead, mention that Cache=no is implicit if a DNS server on the local host is
used.
systemctl: never check inhibitors if -H or -M are used (#3781)
Don't check inhibitors when operating remotely. The interactivity inhibitors
imply can't be provided anyway, and the current code checks for local sessions
directly, via various sd_session_xyz() APIs, hence bypass it entirely if we
operate on remote systems.
cgroup: whitelist inaccessible devices for "auto" and "closed" DevicePolicy.
https://github.com/systemd/systemd/pull/3685 introduced
/run/systemd/inaccessible/{chr,blk} to map inaccessible devices,
this patch allows systemd running inside a nspawn container to create
/run/systemd/inaccessible/{chr,blk}.
namespace: ensure to return valid inaccessible nodes (#3778)
Because /run/systemd/inaccessible/{chr,blk} are devices with
major=0 and minor=0, it might not be possible to create them. In that
case we use /run/systemd/inaccessible/sock instead to map them.
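The fallback can be pictured roughly as follows. This is a Python sketch under stated assumptions, not the namespace code: the function name inaccessible_node and the injectable exists parameter are hypothetical, and only the three paths come from the entry above.

```python
import os
import stat

def inaccessible_node(mode, base="/run/systemd/inaccessible",
                      exists=os.path.lexists):
    """Pick a placeholder node for a character or block device.

    Falls back to the socket node when the matching device node
    could not be created (i.e. does not exist).
    """
    if stat.S_ISCHR(mode):
        name = "chr"
    elif stat.S_ISBLK(mode):
        name = "blk"
    else:
        return None  # other file types are outside this sketch
    candidate = os.path.join(base, name)
    if exists(candidate):
        return candidate
    return os.path.join(base, "sock")
```

Passing exists as a parameter is only a convenience for trying the logic out without touching /run.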
Harald Hoyer [Fri, 22 Jul 2016 13:33:13 +0000 (15:33 +0200)]
macros.systemd.in: add %systemd_ordering (#3776)
To remove the hard dependency on systemd for packages that function
without a running systemd, the %systemd_ordering macro can be used to
ensure ordering in the rpm transaction. %systemd_ordering makes sure
the systemd rpm is installed prior to the package, so the %pre/%post
scripts can execute the systemd parts.
Installing systemd afterwards, though, does not result in the same outcome.
core: change TasksMax= default for system services to 15%
As it turns out, the limit of 512 tasks per service is hit by too many
applications, hence let's bump it a bit, and make it relative to the system's
maximum number of PIDs. With this change the new default is 15%. At the
kernel's default pids_max value of 32768 this translates to 4915. At machined's
default TasksMax= setting of 16384 this translates to 2457.
Why 15%? Because it sounds like a round number and is close enough to 4096
which I was going for, i.e. an eight-fold increase over the old 512.
Summary:
            | on the host | in a container
old default | 512         | 512
new default | 4915        | 2457
logind: change TasksMax= value for user logins to 33%
Let's change from a fixed value of 12288 tasks per user to a relative value of
33%, which with the kernel's default of 32768 translates to 10813. This is a
slight decrease of the limit, for no other reason than "33%" sounding like a nice
round number that is close enough to 12288 (which would translate to 37.5%).
(Well, it also has the nice effect of still leaving a bit of room in the PID
space if there are 3 cooperating evil users that try to consume all PIDs...
Also, I like my bikesheds blue).
Since the new value is taken relative, and machined's TasksMax= setting
defaults to 16384, 33% inside of containers is usually equivalent to 5406,
which should still be ample space.
To summarize:
            | on the host | in the container
old default | 12288       | 12288
new default | 10813       | 5406
core: support percentage specifications on TasksMax=
This adds support for a TasksMax=40% syntax for specifying values relative to
the system's configured maximum number of processes. This is useful in order to
neatly subdivide the available room for tasks within containers.
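The arithmetic behind the percentage syntax can be checked with a short Python sketch. It assumes whole-number percentages and rounding down, which matches the figures quoted in the TasksMax= entries above; resolve_tasks_max is a hypothetical name, not a systemd function.

```python
def resolve_tasks_max(value, limit):
    """Resolve a TasksMax=-style value against the task limit `limit`.

    `value` is either an absolute count ("512") or a percentage ("15%").
    """
    value = value.strip()
    if value.endswith("%"):
        percent = int(value[:-1])
        if not 0 <= percent <= 100:
            raise ValueError("percentage out of range")
        # Assumption: the result is rounded down to a whole task count.
        return limit * percent // 100
    return int(value)
```

With a limit of 32768, "15%" resolves to 4915 and "33%" to 10813; with 16384, the same settings give 2457 and 5406.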
With this change we'll no longer write to /etc/machine-id from nspawn, as that
breaks the --volatile= operation, as it ensures the image is never considered
in "first boot", since that's bound to the pre-existence of /etc/machine-id.
The new logic works like this:
- If /etc/machine-id already exists in the container, it is read by nspawn and
exposed in "machinectl status" and friends.
- If the file doesn't exist yet, but --uuid= is passed on the nspawn cmdline,
this UUID is passed in $container_uuid to PID 1, and PID 1 is then expected
to persist this to /etc/machine-id for future boots (which systemd already
does).
- If the file doesn't exist yet, and no --uuid= is passed, a random UUID is
generated and passed via $container_uuid.
The result is that /etc/machine-id is never initialized by nspawn itself, thus
unbreaking the volatile mode. Still, the machine ID configured in the
machine always matches nspawn's and thus machined's idea of it.
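The three cases listed above can be sketched in Python as follows. The function name and parameters are hypothetical. The entry does not say whether $container_uuid is also set when /etc/machine-id already exists, nor what happens when both the file and --uuid= are present; the sketch leaves the variable unset and prefers the file, purely as assumptions.

```python
import uuid

def pick_container_uuid(existing_machine_id=None, uuid_option=None,
                        random_uuid=uuid.uuid4):
    """Return (machine_id, extra_env) for the container's PID 1."""
    if existing_machine_id is not None:
        # Case 1: /etc/machine-id exists; read it and expose it as-is.
        return existing_machine_id, {}
    if uuid_option is not None:
        # Case 2: no file yet, but --uuid= was given on the command line.
        chosen = uuid_option
    else:
        # Case 3: no file and no --uuid=; generate a random ID.
        chosen = random_uuid().hex
    # PID 1 is expected to persist this value to /etc/machine-id itself.
    return chosen, {"container_uuid": chosen}
```

In no branch does the caller write /etc/machine-id, which is the point of the change.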
sd-id128: split UUID file read/write code into new id128-util.[ch]
We currently have code to read and write files containing UUIDs at various
places. Unify this in id128-util.[ch], and move some other stuff there too.
The new files are located in src/libsystemd/sd-id128/ (instead of src/shared/),
because they are actually the backend of sd_id128_get_machine() and
sd_id128_get_boot().
In follow-up patches we can use this to reduce the code in nspawn and
machine-id-setup by adopting the common implementation.
nspawn: enable major=0/minor=0 devices inside the container (#3773)
https://github.com/systemd/systemd/pull/3685 introduced
/run/systemd/inaccessible/{chr,blk} to map inaccessible devices,
this patch allows systemd running inside a nspawn container to create
/run/systemd/inaccessible/{chr,blk}.
Alexander Kurtz [Thu, 21 Jul 2016 00:29:54 +0000 (02:29 +0200)]
bootctl: Always use upper case for "/EFI/BOOT" and "/EFI/BOOT/BOOT*.EFI".
If the ESP is not mounted with "iocharset=ascii", but with "iocharset=utf8"
(which is for example the default in Debian), the file system becomes case
sensitive. This means that a file created as "FooBarBaz" cannot be accessed as
"foobarbaz" since those are then considered different files.
Moreover, a file created as "FooBar" can then also not be accessed as "foobar",
and it also prevents such a file from being created, as both would use the same
8.3 short name "FOOBAR".
Even though the UEFI specification [0] does give the canonical spelling for
the files mentioned above, not all implementations completely conform to that,
so it's possible that those files would already exist, but with a different
spelling, causing subtle bugs when scanning or modifying the ESP.
While the proper fix would of course be that everybody conformed to the
standard, we can work around this problem by just referencing the files by
their 8.3 short names, i.e. using upper case.
Fixes: #3740
[0] <http://www.uefi.org/specifications>, version 2.6, section 3.5.1.1
nspawn: when netns is on, mount /proc/sys/net writable
Normally we make all of /proc/sys read-only in a container, but if we do have
netns enabled we can make /proc/sys/net writable, as things are virtualized
then.
units: fix TasksMax=16384 for systemd-nspawn@.service
When a container scope is allocated via machined it gets 16K set already since
cf7d1a30e44bf380027a2e73f9bf13f423a33cc1. Make sure when a container is run as
a system service it gets the same values.
core: normalize header inclusion in execute.h a bit
We don't actually need any functionality from cgroup.h in execute.h, hence
don't include that. However, we do need the Unit structure from unit.h, hence
include that, and move it as late as possible, since it needs the definitions
from execute.h.
All other functions in execute.c that need the unit id take a Unit* parameter
as first argument. Let's change connect_logger_as() to follow a similar logic.
core: when forcibly killing/aborting left-over unit processes log about it
Let's log at LOG_NOTICE about any processes that we are going to
SIGKILL/SIGABRT because clean termination of them didn't work.
This turns the various boolean flag parameters to cg_kill(), cg_migrate() and
related calls into a single binary flags parameter, simply because the function
now gained even more parameters and the parameter list shouldn't get too
long.
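The general idea of folding several boolean parameters into one flags value looks roughly like this in Python. The flag names are hypothetical illustrations, not the identifiers used by cg_kill() or cg_migrate().

```python
import enum

class KillFlags(enum.IntFlag):
    """Hypothetical flags; each one replaces a former boolean argument."""
    NONE = 0
    SEND_SIGCONT = 1
    IGNORE_SELF = 2
    REMOVE_EMPTY = 4

_ALL_FLAGS = (KillFlags.SEND_SIGCONT, KillFlags.IGNORE_SELF,
              KillFlags.REMOVE_EMPTY)

def enabled_options(flags):
    """List the names of the options switched on in `flags`."""
    return [flag.name for flag in _ALL_FLAGS if flags & flag]
```

A call site then passes KillFlags.SEND_SIGCONT | KillFlags.REMOVE_EMPTY instead of three positional booleans, and adding another option does not lengthen the parameter list.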
Logging for killing processes is done either when the kill signal is SIGABRT or
SIGKILL, or on explicit request if KILL_TERMINATE_AND_LOG instead of KILL_TERMINATE
is passed. This isn't used yet in this patch, but is made use of in a later
patch.
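The logging rule in the last paragraph reduces to a small predicate. This Python sketch assumes a Unix platform (for signal.SIGKILL); should_log_kill and its log_requested parameter are hypothetical stand-ins for the explicit-request case.

```python
import signal

def should_log_kill(sig, log_requested=False):
    """Decide whether killing a left-over process gets logged.

    Logging happens for SIGKILL and SIGABRT, or when the caller
    explicitly asks for it.
    """
    return log_requested or sig in (signal.SIGKILL, signal.SIGABRT)
```

A plain SIGTERM is therefore silent unless logging was requested.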