dbus: add infrastructure for changing multiple properties at once on units and hook some cgroup attributes up to it
This introduces two bus calls to make runtime changes to selected bus
properties, optionally with persistence.
This currently hooks this up only for three cgroup atributes, but this
brings the infrastructure to add more changable attributes.
This allows setting multiple attributes at once, and takes an array
rather than a dictionary of properties, in order to implement simple
resetting of lists using the same approach as when they are sourced from
unit files. This means, that list properties are appended to by this
call, unless they are first reset via assigning the empty list.
Replace the very generic cgroup hookup with a much simpler one. With
this change only the high-level cgroup settings remain, the ability to
set arbitrary cgroup attributes is removed, so is support for adding
units to arbitrary cgroup controllers or setting arbitrary paths for
them (especially paths that are different for the various controllers).
This also introduces a new -.slice root slice, that is the parent of
system.slice and friends. This enables easy admin configuration of
root-level cgrouo properties.
This replaces DeviceDeny= by DevicePolicy=, and implicitly adds in
/dev/null, /dev/zero and friends if DeviceAllow= is used (unless this is
turned off by DevicePolicy=).
When manpages are displayed on a terminal, <literal>s are indistinguishable
from surrounding text. Add quotes everywhere, remove duplicate quotes,
and tweak a few lists for consistent formatting.
Corrupted empty files are relatively common. I think they are created
when a coredump for a user who never logged anything before is
attempted to be written, but the write does not succeed because the
coredump is too big, but there are probably other ways to create
those, especially if the machine crashes at the right time.
Non-corrupted empty files can also happen, e.g. if a journal file is
opened, but nothing is ever successfully written to it and it is
rotated because of MaxFileSec=. Either way, each "empty" journal file
costs around 3 MB, and there's little point in keeping them around.
Reporting of the free space was bogus, since the remaining space
was compared with the maximum allowed, instead of the current
use being compared with the maximum allowed. Simplify and fix
by reporting limits directly at the point where they are calculated.
Sometimes an entry is not successfully written, and we end up with
data items which are "unlinked", not connected to, and not used by any
entry. This will usually happen when we write to write a core dump,
and the initial small data fields are written successfully, but
the huge COREDUMP= field is not written. This situation is hard
to avoid, but the results are mostly harmless. Thus only warn about
unused data items.
Also, be more verbose about why journal files failed verification.
This should help diagnose journal failure modes without resorting
to a hexadecimal editor.
https://bugs.freedesktop.org/show_bug.cgi?id=65235 (esp. see
system.journal attached to the bug report).
Umut Tezduyar [Sun, 9 Jun 2013 05:08:46 +0000 (07:08 +0200)]
manager: add DefaultEnvironment option
This complements existing functionality of setting variables
through 'systemctl set-environment', the kernel command line,
and through normal environment variables for systemd in session
mode.
logind: add infrastructure to keep track of machines, and move to slices
- This changes all logind cgroup objects to use slice objects rather
than fixed croup locations.
- logind can now collect minimal information about running
VMs/containers. As fixed cgroup locations can no longer be used we
need an entity that keeps track of machine cgroups in whatever slice
they might be located. Since logind already keeps track of users,
sessions and seats this is a trivial addition.
- nspawn will now register with logind and pass various bits of metadata
along. A new option "--slice=" has been added to place the container
in a specific slice.
- loginctl gained commands to list, introspect and terminate machines.
- user.slice and machine.slice will now be pulled in by logind.service,
since only logind.service requires this slice.
core: add new .slice unit type for partitioning systems
In order to prepare for the kernel cgroup rework, let's introduce a new
unit type to systemd, the "slice". Slices can be arranged in a tree and
are useful to partition resources freely and hierarchally by the user.
Each service unit can now be assigned to one of these slices, and later
on login users and machines may too.
Slices translate pretty directly to the cgroup hierarchy, and the
various objects can be assigned to any of the slices in the tree.
Ross Lagerwall [Thu, 13 Jun 2013 09:45:12 +0000 (10:45 +0100)]
rules: only run systemd-sysctl when a network device is added
Otherwise, when a network device is renamed, systemd-sysctl is run twice
with the same network device name: once for ACTION="add" and once for
ACTION="move".
Jason St. John [Wed, 12 Jun 2013 17:45:14 +0000 (19:45 +0200)]
man: improve readability of "_TRANSPORT=" section in systemd.journal-fields(7)
The list and descriptions of valid transports was difficult to read, so
break the long sentence up into discrete man page list items to improve
readability.
Since the system journal wasn't open yet, available_space() returned 0.
Before:
systemd-journal[22170]: Allowing system journal files to grow to 4.0G.
systemd-journal[22170]: Journal size currently limited to 0B due to SystemKeepFree.
After:
systemd-journal[22178]: Allowing system journal files to grow to 4.0G.
systemd-journal[22178]: Journal size currently limited to 3.0G due to SystemKeepFree.
Also, when failing to write a message, show how much space was needed:
"Failed to write entry (26 items, 260123456 bytes) despite vacuuming, ignoring: ...".
Ross Burton [Tue, 11 Jun 2013 16:16:37 +0000 (17:16 +0100)]
build-sys: don't install quotaon.service twice
quotaon.service is already installed through dist_systemunit_DATA, so it doesn't
need to be added to nodist_systemunit_DATA. Installing the same file twice
results in a race condition where the install process can fail.
In the following scenario:
server creates system.journal
server creates user-1000.journal
both journals share the same seqnum_id.
Then
server writes to user-1000.journal first,
and server writes to system.journal a bit later,
and everything is fine.
The server then terminates (crash, reboot, rsyslog testing,
whatever), and user-1000.journal has entries which end with
a lower seqnum than system.journal. Now
server is restarted
server opens user-1000.journal and writes entries to it...
BAM! duplicate seqnums for the same seqnum_id.
Now, we usually don't see that happen, because system.journal
is closed last, and opened first. Since usually at least one
message is written during boot and lands in the system.journal,
the seqnum is initialized from it, and is set to a number higher
than than anything found in user journals. Nevertheless, if
system.journal is corrupted and is rotated, it can happen that
an entry is written to the user journal with a seqnum that is
a duplicate with an entry found in the corrupted system.journal~.
When browsing the journal, journalctl can fall into a loop
where it tries to follow the seqnums, and tries to go the
next location by seqnum, and is transported back in time to
to the older duplicate seqnum. There is not way to find
out the maximum seqnum used in a multiple files, without
actually looking at all of them. But we don't want to do
that because it would be slow, and actually it isn't really
possible, because a file might e.g. be temporarily unaccessible.
Fix the problem by using different seqnum series for user
journals. Using the same seqnum series for rotated journals
is still fine, because we know that nothing will write
to the rotated journal anymore.
This allows the caller to explicitly specify which journal files
should be opened. The same functionality could be achieved before
by creating a directory and playing around with symlinks. It
is useful to debug stuff and explore the journal, and has been
requested before.
Waiting is supported, the journal will notice modifications on
the files supplied when opening the journal, but will not add
any new files.
AND term usually don't have many subterms (4 seems to be the maximum
sensible number, e.g. _BOOT_ID && _SYSTEMD_UNIT && _PID && MESSAGE_ID).
Nevertheless, the cost of checking each subterm can be relatively
high, especially when the nested terms are compound, and it
makes sense to minimize the number of checks.
Instead of looping to the end and then again over the whole list once
again after at least one term changed the offset, start the loop at
the term which caused the change. This way ½ terms in the AND match
are not checked unnecessarily again.
--user basically gives messages from your own systemd --user services.
--system basically gives messages from PID 1, kernel, and --system
services. Those two options are not exahustive, because a priviledged
user might be able to see messages from other users, and they will not
be shown with either or both of those flags.
SD_JOURNAL_CURRENT_USER flags is added to sd_j_open(), to open
files from current user.
SD_JOURNAL_SYSTEM_ONLY is renamed to SD_JOURNAL_SYSTEM,
and changed to mean to (also) open system files. This way various
flags can be combined, which gives them nicer semantics, especially
if other ones are added later.
Backwards compatibility is kept, because SD_JOURNAL_SYSTEM_ONLY
is equivalent to SD_JOURNAL_SYSTEM if used alone, and before there
we no other flags.
journalctl: fix verbose output when no logs are found
$ journalctl -o verbose _EXE=/quiet/binary -f
-- Logs begin at Sun 2013-03-17 17:28:22 EDT. --
Failed to get realtime timestamp: Cannot assign requested address
JOURNAL_FOREACH_DATA_RETVAL is added, which allows the caller
to get the return value from sd_journal_enumerate_data. I think
we might want to expose this macro like SD_JOURNAL_FOREACH_DATA,
but for now it is in journal-internal.h.
There's a change in behaviour for output_*, not only in
output_verbose, that errors in sd_j_enumerate_data are not silently
ignored anymore.
Ross Lagerwall [Sun, 9 Jun 2013 16:28:44 +0000 (17:28 +0100)]
service: don't report alien child as alive when it's not
When a sigchld is received from an alien child, main_pid is set to
0 then service_enter_running calls main_pid_good to check if the
child is running. This incorrectly returned true because
kill(main_pid, 0) would return >= 0.
This fixes an error where a service would die and the cgroup would
become empty but the service would still report as active (running).
systemctl: remove extra padding from status output
In 131601349 'systemctl: align all status fields to common column',
padding was calculated for 'ListenStream: ...', etc. Later on in 45a4f7233 'systemctl: tweak output of Listen: fields a bit' output
was changed to 'Listen: ... (stream)', but calculation didn't change.
Just remove the calculation, since now the result will be always 8,
and it it more important to have everything aligned to the widest
field ("Main-PID"), than to save a few columns, usually at most two
(e.g. "Listen").
Note: strlen is more natural, and is optimized to sizeof even
with -O0.
service: execute ExecStopPost= commands when the watchdog timeout hits
We can assume that a service for which a watchdog timeout was triggered
is unresponsive to a clean shutdown. However, it still makes sense to
execute the post-stop cleanup commands that can be configured with
ExecStopPost=. Hence, when the timeout is hit enter STOP_SIGKILL rather
than FINAL_SIGKILL.
Chengwei Yang [Mon, 20 May 2013 07:22:27 +0000 (15:22 +0800)]
manager: Do not handle SIGKILL since we can not
This is a minor fix because it's not a major issue, this fix just avoid
to get EINVAL error from sigaction(2).
There are two signals can not handled at user space, SIGKILL and
SIGSTOP even we're PID 1, trying to handle these two signals will get
EINVAL error.
There are two kinds of systemd instance, running as system manager or
user session manager, apparently, the latter is a general user space
process which can not handle SIGKILL. The special pid 1 also can not
do that refer to kernel/signal.c:do_sigaction().
However, pid 1 is unkillable because the kernel did attach
SIGNAL_UNKILLABLE to it at system boot up, refer to
init/main.c:start_kernel()
--> rest_init()
--> kernel_thread()
--> kernel_init()
--> init_post()
current->signal->flags |= SIGNAL_UNKILLABLE