main: bump fs.nr_open + fs.max-file to their largest possible values
After discussions with kernel folks, a system with memcg really
shouldn't need extra hard limits on file descriptors anymore, as they
are properly accounted for by memcg anyway. Hence, let's bump these
values to their maximums.
This also adds a build time option to turn thiss off, to cover those
users who do not want to use memcg.
units: bump the RLIMIT_NOFILE soft limit for all services that access the journal
This updates the unit files of all our serviecs that deal with journal
stuff to use a higher RLIMIT_NOFILE soft limit by default. The new value
is the same as used for the new HIGH_RLIMIT_NOFILE we just added.
With this we ensure all code that access the journal has higher
RLIMIT_NOFILE. The code that runs as daemon via the unit files, the code
that is run from the user's command line via C code internal to the
relevant tools. In some cases this means we'll redundantly bump the
limits as there are tools run both from the command line and as service.
core: raise the RLIMIT_NOFILE hard limit for all services by default
Following the discussions with the kernel folks, let's substantially
increase the hard limit (but not the soft limit) of RLIMIT_NOFILE to
256K for all services we start.
Note that PID 1 itself bumps the limit even further, to the max the
kernel allows. We can deal with that after all.
tree-wide: uniformly bump RLIMIT_NOFILE in all our tools that access the journal
This makes use of rlimit_nofile_bump() in all tools that access the
journal. In some cases this replaces older code to achieve this, and
others we add it in where it was missing.
core: add a new call for bumping RLIMIT_NOFILE to "high" values
Following discussions with some kernel folks at All Systems Go! it
appears that file descriptors are not really as expensive as they used
to be (both memory and performance-wise) and it should thus be OK to allow
programs (including unprivileged ones) to have more of them without ill
effects.
Unfortunately we can't just raise the RLIMIT_NOFILE soft limit
globally for all processes, as select() and friends can't handle fds
>= 1024, and thus unexpecting programs might fail if they accidently get
an fd outside of that range. We can however raise the hard limit, so
that programs that need a lot of fds can opt-in into getting fds beyond
the 1024 boundary, simply by bumping the soft limit to the now higher
hard limit.
This is useful for all our client code that accesses the journal, as the
journal merging logic might need a lot of fds. Let's add a unified
function for bumping the limit in a robust way.
This simply adds a new constant we can use for bumping RLIMIT_NOFILE to
a "high" value. It default to 256K for now, which is pretty high, but
smaller than the kernel built-in limit of 1M.
Previously, some tools that needed a higher RLIMIT_NOFILE bumped it to
16K. This new define goes substantially higher than this, following the
discussion with the kernel folks.
systemctl: clean up start_unit_one() error handling
Let's split exit code handling in two: "r" is only used for errno-style
errors, and "ret" is used for exit() codes. Then, let's use EXIT_SUCCESS
for checking whether the latter is already used.
This way it should always be clear what kind of error we are processing,
and when we propaate one into the other.
Moreover this allows us to drop "q" form all inner loops, avoiding
confusion when to use "q" and when "r" to store received errors.
Let's be safe than sorry, in particular as logind doesn't set it up
anymore, but user-runtime-dir@.service does, and logind doesn't really
track success of that.
So far we always used "yes" instead of "true" in all our unit files,
except for one outlier. Let's do this here too. No change in behaviour
whatsoever, except that it looks prettier ;-)
logind: automatically GC lingering users for who now user@.service (nor slice, not runtime dir service) is running anymore
This heavily borrows from @intelfx' PR #5546, but watches all three
units that are associated with a user now: the slice, the user@.service
and user-runtime-dir@.service.
The logic and reasoning behind it is the same though: there's no value
in keeping lingering users around if all their three services are gone.
logind: add a RequiresMountsFor= dependency from the session scope unit to the home directory of the user
This is useful so that during shutdown scope units are always terminated
before the mounts necessary for the home directory.
(Ideally we'd also add a similar dependency from the user@.service
instance to the home directory, but this isn't as easy as that service
is defined statically and not dynamically, and hence not easy to modify
dynamically, in particular when it comes to deps)
logind: change user-runtime-dir to query runtime dir size from logind via the bus
I think this is a slightly cleaner approach than parsing the
configuration file at multiple places, as this way there's only a single
reload cycle for logind.conf, and that's systemd-logind.service's
runtime.
This means that logind and dbus become a requirement of
user-runtime-dir, but given that XDG_RUNTIME_DIR is not set anyway
without logind and dbus around this isn't really any limitation.
This also simplifies linking a bit as this means user-runtime-dir
doesn't have to link against any code of logind itself.
logind: rework how we manage the slice and user-runtime-dir@.service unit for each user
Instead of managing it explicitly, let's simplify things and rely on
regular Wants=/Requires= dependencies to pull in these units from
user@.service and the session scope, and StopWhenUneeded= to stop these
auxiliary units again. This way, they can be pulled in easily by
unrelated units too.
This simplifies things quite a bit: for each session we now only need to
manage the session scope, and for each user the user@.service, the other
units are not something we need to manage anymore.
This patch also makes sure that if user@.service of a user is masked we
will continue to work, and user-runtime-dir@.service will still be
correctly pulled in, as it is now a dependency of the scope unit.
Let's propagate errors from stopping sessions via seat_stop(). This is
similar to how we propagate such errors in user_stop() for all sessions
associated with a user.
Note that we propagate these errors, but we don't abort the function.
logind: improve logging in manager_connect_console()
let's make sure we log about every failure
Also, complain about systems where /dev/tty0 exists but
/sys/class/tty/tty0/active does not. Such systems (usually container
environments) are pretty broken as they mount something that is not a VC
to /dev/tty0 and they really shouldn't.
Systems should either have a VC or not, but not badly fake one by
mounting things wildly.
This just adds a warning message, as before we'll simply turn off VC
handling in this case.
Let's not use the word "wrapper", as it's not clear what that is, and in
some way any unit file is a "wrapper"... let's simply say that it's
about the runtime directory.
logind: fix serialization/deserialization of user's "display session"
Previously this was serialized as part of the user object. This didn't
work however, as we load users first, and sessions seconds and hence
referencing a session from the user load logic cannot work.
Fix this by storing an IS_DISPLAY property along with each session, and
make the session with this set display session when it is loaded.
man: systemctl: clarify that --lines=0 is allowed (#10375)
The term “positive” is often read to exclude 0 (though “strictly
positive” is sometimes used to clarify this), so let’s explicitly state
that --lines=0 is legal and completely disables journal output.
networkd: fix attribute length for wireguard (#10380)
This is actually a u16, not a u32, so the kernel complains:
kernel: netlink: 'systemd-network': attribute type 5 has an invalid length
This is due to:
if (nla_attr_len[pt->type] && attrlen != nla_attr_len[pt->type]) {
pr_warn_ratelimited("netlink: '%s': attribute type %d has an invalid length.\n",
current->comm, type);
}
Presumably this has been working fine in functionality on little-endian
systems, but nobody bothered to try on big-endian systems.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>