core: add new "scope" unit type for making a unit of pre-existing processes
"Scope" units are very much like service units, however with the
difference that they are created from pre-existing processes, rather
than processes that systemd itself forks off. This means they are
generated programmatically via the bus API as transient units rather
than from static configuration read from disk. Also, they do not provide
execution-time parameters, as at the time systemd adds the processes to
the scope unit they already exist and the parameters cannot be applied
anymore.
The primary benefit of this new unit type is to create arbitrary cgroups
for worker-processes forked off an existing service.
This commit also adds a a new mode to "systemd-run" to run the specified
processes in a scope rather then a transient service.
This fix is more complicated than it should be due to the consecutive
XML elements separated by collapsible whitespace.
Merging the lines and separating the XML elements with an en space or a
non-breaking space is the only solution that results in one, and only
one, space being inserted between them when testing. An em space results
in two spaces being inserted.
Transient units can be created via the bus API. They are configured via
the method call parameters rather than on-disk files. They are subject
to normal GC. Transient units currently may only be created for
services (however, we will extend this), and currently only ExecStart=
and the cgroup parameters can be configured (also to be extended).
Transient units require a unique name, that previously had no
configuration file on disk.
A tool systemd-run is added that makes use of this functionality to run
arbitrary command lines as transient services:
$ systemd-run /bin/ping www.heise.de
Will cause systemd to create a new transient service and run ping in it.
dbus: add infrastructure for changing multiple properties at once on units and hook some cgroup attributes up to it
This introduces two bus calls to make runtime changes to selected bus
properties, optionally with persistence.
This currently hooks this up only for three cgroup atributes, but this
brings the infrastructure to add more changable attributes.
This allows setting multiple attributes at once, and takes an array
rather than a dictionary of properties, in order to implement simple
resetting of lists using the same approach as when they are sourced from
unit files. This means, that list properties are appended to by this
call, unless they are first reset via assigning the empty list.
Replace the very generic cgroup hookup with a much simpler one. With
this change only the high-level cgroup settings remain, the ability to
set arbitrary cgroup attributes is removed, so is support for adding
units to arbitrary cgroup controllers or setting arbitrary paths for
them (especially paths that are different for the various controllers).
This also introduces a new -.slice root slice, that is the parent of
system.slice and friends. This enables easy admin configuration of
root-level cgrouo properties.
This replaces DeviceDeny= by DevicePolicy=, and implicitly adds in
/dev/null, /dev/zero and friends if DeviceAllow= is used (unless this is
turned off by DevicePolicy=).
When manpages are displayed on a terminal, <literal>s are indistinguishable
from surrounding text. Add quotes everywhere, remove duplicate quotes,
and tweak a few lists for consistent formatting.
Corrupted empty files are relatively common. I think they are created
when a coredump for a user who never logged anything before is
attempted to be written, but the write does not succeed because the
coredump is too big, but there are probably other ways to create
those, especially if the machine crashes at the right time.
Non-corrupted empty files can also happen, e.g. if a journal file is
opened, but nothing is ever successfully written to it and it is
rotated because of MaxFileSec=. Either way, each "empty" journal file
costs around 3 MB, and there's little point in keeping them around.
Reporting of the free space was bogus, since the remaining space
was compared with the maximum allowed, instead of the current
use being compared with the maximum allowed. Simplify and fix
by reporting limits directly at the point where they are calculated.
Sometimes an entry is not successfully written, and we end up with
data items which are "unlinked", not connected to, and not used by any
entry. This will usually happen when we write to write a core dump,
and the initial small data fields are written successfully, but
the huge COREDUMP= field is not written. This situation is hard
to avoid, but the results are mostly harmless. Thus only warn about
unused data items.
Also, be more verbose about why journal files failed verification.
This should help diagnose journal failure modes without resorting
to a hexadecimal editor.
https://bugs.freedesktop.org/show_bug.cgi?id=65235 (esp. see
system.journal attached to the bug report).
Umut Tezduyar [Sun, 9 Jun 2013 05:08:46 +0000 (07:08 +0200)]
manager: add DefaultEnvironment option
This complements existing functionality of setting variables
through 'systemctl set-environment', the kernel command line,
and through normal environment variables for systemd in session
mode.
logind: add infrastructure to keep track of machines, and move to slices
- This changes all logind cgroup objects to use slice objects rather
than fixed croup locations.
- logind can now collect minimal information about running
VMs/containers. As fixed cgroup locations can no longer be used we
need an entity that keeps track of machine cgroups in whatever slice
they might be located. Since logind already keeps track of users,
sessions and seats this is a trivial addition.
- nspawn will now register with logind and pass various bits of metadata
along. A new option "--slice=" has been added to place the container
in a specific slice.
- loginctl gained commands to list, introspect and terminate machines.
- user.slice and machine.slice will now be pulled in by logind.service,
since only logind.service requires this slice.
core: add new .slice unit type for partitioning systems
In order to prepare for the kernel cgroup rework, let's introduce a new
unit type to systemd, the "slice". Slices can be arranged in a tree and
are useful to partition resources freely and hierarchally by the user.
Each service unit can now be assigned to one of these slices, and later
on login users and machines may too.
Slices translate pretty directly to the cgroup hierarchy, and the
various objects can be assigned to any of the slices in the tree.
Ross Lagerwall [Thu, 13 Jun 2013 09:45:12 +0000 (10:45 +0100)]
rules: only run systemd-sysctl when a network device is added
Otherwise, when a network device is renamed, systemd-sysctl is run twice
with the same network device name: once for ACTION="add" and once for
ACTION="move".
Jason St. John [Wed, 12 Jun 2013 17:45:14 +0000 (19:45 +0200)]
man: improve readability of "_TRANSPORT=" section in systemd.journal-fields(7)
The list and descriptions of valid transports was difficult to read, so
break the long sentence up into discrete man page list items to improve
readability.
Since the system journal wasn't open yet, available_space() returned 0.
Before:
systemd-journal[22170]: Allowing system journal files to grow to 4.0G.
systemd-journal[22170]: Journal size currently limited to 0B due to SystemKeepFree.
After:
systemd-journal[22178]: Allowing system journal files to grow to 4.0G.
systemd-journal[22178]: Journal size currently limited to 3.0G due to SystemKeepFree.
Also, when failing to write a message, show how much space was needed:
"Failed to write entry (26 items, 260123456 bytes) despite vacuuming, ignoring: ...".
Ross Burton [Tue, 11 Jun 2013 16:16:37 +0000 (17:16 +0100)]
build-sys: don't install quotaon.service twice
quotaon.service is already installed through dist_systemunit_DATA, so it doesn't
need to be added to nodist_systemunit_DATA. Installing the same file twice
results in a race condition where the install process can fail.
In the following scenario:
server creates system.journal
server creates user-1000.journal
both journals share the same seqnum_id.
Then
server writes to user-1000.journal first,
and server writes to system.journal a bit later,
and everything is fine.
The server then terminates (crash, reboot, rsyslog testing,
whatever), and user-1000.journal has entries which end with
a lower seqnum than system.journal. Now
server is restarted
server opens user-1000.journal and writes entries to it...
BAM! duplicate seqnums for the same seqnum_id.
Now, we usually don't see that happen, because system.journal
is closed last, and opened first. Since usually at least one
message is written during boot and lands in the system.journal,
the seqnum is initialized from it, and is set to a number higher
than than anything found in user journals. Nevertheless, if
system.journal is corrupted and is rotated, it can happen that
an entry is written to the user journal with a seqnum that is
a duplicate with an entry found in the corrupted system.journal~.
When browsing the journal, journalctl can fall into a loop
where it tries to follow the seqnums, and tries to go the
next location by seqnum, and is transported back in time to
to the older duplicate seqnum. There is not way to find
out the maximum seqnum used in a multiple files, without
actually looking at all of them. But we don't want to do
that because it would be slow, and actually it isn't really
possible, because a file might e.g. be temporarily unaccessible.
Fix the problem by using different seqnum series for user
journals. Using the same seqnum series for rotated journals
is still fine, because we know that nothing will write
to the rotated journal anymore.
This allows the caller to explicitly specify which journal files
should be opened. The same functionality could be achieved before
by creating a directory and playing around with symlinks. It
is useful to debug stuff and explore the journal, and has been
requested before.
Waiting is supported, the journal will notice modifications on
the files supplied when opening the journal, but will not add
any new files.