Anita Zhang [Mon, 8 Oct 2018 03:28:36 +0000 (20:28 -0700)]
core: implement per unit journal rate limiting
Add LogRateLimitIntervalSec= and LogRateLimitBurst= options for
services. If provided, these values get passed to the journald
client context, and those values are used in the rate limiting
function in the journal over the the journald.conf values.
sulogin-shell: Use force if SYSTEMD_SULOGIN_FORCE set
When the root account is locked sulogin will either inform you of
this and not allow you in or if --force is used it will hand
you passwordless root (if using a recent enough version of util-linux).
Not being allowed a shell is ofcourse inconvenient, but at the same
time handing out passwordless root unconditionally is probably not
a good idea everywhere.
This patch thus allows to control which behaviour you want by
setting the SYSTEMD_SULOGIN_FORCE environment variable to true
or false to control the behaviour, eg. via adding this to
'systemctl edit rescue.service' (or emergency.service):
[Service]
Environment=SYSTEMD_SULOGIN_FORCE=1
Distributions who used locked root accounts and want the passwordless
behaviour could thus simply drop in the override file in
/etc/systemd/system/rescue.service.d/override.conf
core: do not "warn" about mundane emergency actions
For example in a container we'd log:
Oct 17 17:01:10 rawhide systemd[1]: Started Power-Off.
Oct 17 17:01:10 rawhide systemd[1]: Forcibly powering off: unit succeeded
Oct 17 17:01:10 rawhide systemd[1]: Reached target Power-Off.
Oct 17 17:01:10 rawhide systemd[1]: Shutting down.
and on the console we'd write (in red)
[ !! ] Forcibly powering off: unit succeeded
This is not useful in any way, and the fact that we're calling an "emergency action"
is an internal implementation detail. Let's log about c-a-d and the watchdog actions
only.
units: allow and use SuccessAction=exit-force in system systemd-exit.service
C.f. 287419c119ef961db487a281162ab037eba70c61: 'systemctl exit 42' can be
used to set an exit value and pulls in exit.target, which pulls in systemd-exit.service,
which calls org.fdo.Manager.Exit, which calls method_exit(), which sets the objective
to MANAGER_EXIT. Allow the same to happen through SuccessAction=exit.
units: use SuccessAction=poweroff-force in systemd-poweroff.service
Explicit systemctl calls remain in systemd-halt.service and the system
systemd-exit.service. To convert systemd-halt, we'd need to add
SuccessAction=halt-force. Halting doesn't make much sense, so let's just
leave that is. systemd-exit.service will be converted in the next commit.
core: limit service-watchdogs=no to actual "watchdog" commands
The setting is now only looked at when considering an action for a job timeout
or unit start limit. It is ignored for ctrl-alt-del, SuccessAction, SuccessFailure.
v2: turn the parameter into a flag field
v3: rename Options to Flags
core: accept system mode emergency action specifiers with a warning
Before we would only accept those "system" values, so there wasn't other
chocie. Let's provide backwards compatiblity in case somebody made use of
this functionality in user mode.
v2: use 'exit-force' not 'exit'
v3: use error value in log_syntax
core: define "exit" and "exit-force" actions for user units and only accept that
We would accept e.g. FailureAction=reboot-force in user units and then do an
exit in the user manager. Let's be stricter, and define "exit"/"exit-force" as
the only supported actions in user units.
v2:
- rename 'exit' to 'exit-force' and add new 'exit'
- add test for the parsing function
man: move description of *Action= modes to FailureAction=/SuccessAction=
FailureAction=/SuccessAction= were added later then StartLimitAction=, so it
was easiest to refer to the existing description. But those two settings are
somewhat simpler (they just execute the action unconditionally) while
StartLimitAction= has additional timing and burst parameters, and they are
about to take on a more prominent role, so let's move the description of
allowed values.
core: consider service with no start command immediately started
The service would always be in state == SERVICE_INACTIVE, but it needs to go
through state == SERVICE_START so that SuccessAction/FailureAction are executed.
main: bump fs.nr_open + fs.max-file to their largest possible values
After discussions with kernel folks, a system with memcg really
shouldn't need extra hard limits on file descriptors anymore, as they
are properly accounted for by memcg anyway. Hence, let's bump these
values to their maximums.
This also adds a build time option to turn thiss off, to cover those
users who do not want to use memcg.
sd-boot: write the IDs of all discovered entries to an EFI variable
This is primarily useful for debugging, but can be useful for other
purposes too. For example userspace could check whether "auto-windows"
is included in the list, before triggering a boot-into-windows
operation.
The field derives from a file name only in very specific cases, for
many cases it's a fixed string (for example, all "auto-" items are like
this). Also, even when it derives from a file name, it is processed a
bit, as suffixes are removed and the string is converted to lower case.
hence, let's name this field "id" instead, because that's what it is
used for: as general identification token.
Firmware implementations are generally pretty bad, hence let's better
add an explicit check for NULL before invokin FreePool(), in particular
is it doesn't appear to be documented whether FreePool() is supposed to
be happy with NULL.
units: bump the RLIMIT_NOFILE soft limit for all services that access the journal
This updates the unit files of all our serviecs that deal with journal
stuff to use a higher RLIMIT_NOFILE soft limit by default. The new value
is the same as used for the new HIGH_RLIMIT_NOFILE we just added.
With this we ensure all code that access the journal has higher
RLIMIT_NOFILE. The code that runs as daemon via the unit files, the code
that is run from the user's command line via C code internal to the
relevant tools. In some cases this means we'll redundantly bump the
limits as there are tools run both from the command line and as service.
core: raise the RLIMIT_NOFILE hard limit for all services by default
Following the discussions with the kernel folks, let's substantially
increase the hard limit (but not the soft limit) of RLIMIT_NOFILE to
256K for all services we start.
Note that PID 1 itself bumps the limit even further, to the max the
kernel allows. We can deal with that after all.
tree-wide: uniformly bump RLIMIT_NOFILE in all our tools that access the journal
This makes use of rlimit_nofile_bump() in all tools that access the
journal. In some cases this replaces older code to achieve this, and
others we add it in where it was missing.
core: add a new call for bumping RLIMIT_NOFILE to "high" values
Following discussions with some kernel folks at All Systems Go! it
appears that file descriptors are not really as expensive as they used
to be (both memory and performance-wise) and it should thus be OK to allow
programs (including unprivileged ones) to have more of them without ill
effects.
Unfortunately we can't just raise the RLIMIT_NOFILE soft limit
globally for all processes, as select() and friends can't handle fds
>= 1024, and thus unexpecting programs might fail if they accidently get
an fd outside of that range. We can however raise the hard limit, so
that programs that need a lot of fds can opt-in into getting fds beyond
the 1024 boundary, simply by bumping the soft limit to the now higher
hard limit.
This is useful for all our client code that accesses the journal, as the
journal merging logic might need a lot of fds. Let's add a unified
function for bumping the limit in a robust way.
This simply adds a new constant we can use for bumping RLIMIT_NOFILE to
a "high" value. It default to 256K for now, which is pretty high, but
smaller than the kernel built-in limit of 1M.
Previously, some tools that needed a higher RLIMIT_NOFILE bumped it to
16K. This new define goes substantially higher than this, following the
discussion with the kernel folks.