man: mention pages with more settings at end of each option list
For some unit types we have hundreds of options, and the reader may easily miss
that more options are described in other pages. We already mentioned this in
the introduction and then at the top of the option list, but it can't hurt to
repeat the information.
Also, add an (almost empty) Options section for the unit types which don't have
any custom options. It is nicer to have the same page structure in all cases,
so people can jump between pages for different types more easily.
I have no idea if this is going to cause rendering problems, and it is fairly
hard to check. So let's just merge this, and if it github markdown processor
doesn't like it, revert.
man: cross-reference DeviceAllow= and PrivateDevices=
They are somewhat similar, but not easy to discover, esp. considering that
they are described in different pages.
For PrivateDevices=, split out the first paragraph that gives the high-level
overview. (The giant second paragraph could also use some heavy editing to break
it up into more digestible chunks, alas.)
As far as I can see, we use this to get a list of ARPHRD_* defines (used in
particular for Type= in .link files). If we drop our copy, and build against
old kernel headers, the user will have a shorter list of types available. This
seems OK, and I don't think it's worth carrying our own version of this file
just to have newest possible entries.
7c5b9952c4f6e2b72f90edbe439982528b7cf223 recently updated this file, but we'd
have to update it every time the kernel adds new entries. But if we look at
the failure carefully:
src/basic/arphrd-from-name.gperf:65:16: error: ‘ARPHRD_MCTP’ undeclared (first use in this function); did you mean ‘ARPHRD_FCPP’?
65 | MCTP, ARPHRD_MCTP
| ^~
| ARPHRD_FCPP
we see that the list we were generating was from the system headers, so it was
only as good as the system headers anyway, without the newer entries in our
bundled copy, if there were any. So let's make things simpler by always using
system headers.
And if somebody wants to fix things so that we always have the newest list,
then we should just generate and store the converted list, not the full header.
boot: ReallocatePool() supports NULL pointers as first argument
Just like userspace realloc() the EFIlib ReallocatePool() function is
happy to use a NULL pointer as input, in which case it is equivalent to
AllocatePool(). See:
oom: Add support for user unit ManagedOOM property updates
Compared to PID1 where systemd-oomd has to be the client to PID1
because PID1 is a more privileged process than systemd-oomd, systemd-oomd
is the more privileged process compared to a user manager so we have
user managers be the client whereas systemd-oomd is now the server.
The same varlink protocol is used between user managers and systemd-oomd
to deliver ManagedOOM property updates. systemd-oomd now sets up a varlink
server that user managers connect to to send ManagedOOM property updates.
We also add extra validation to make sure that non-root senders don't
send updates for cgroups they don't own.
The integration test was extended to repeat the chill/bloat test using
a user manager instead of PID1.
Unfortunately, when checking the return/exit code using &&, ||, if,
while, etc., `set -e` is disabled for all nested functions as well,
which leads to incorrectly ignored errors, *sigh*.
Example:
```
set -eu
set -o pipefail
task() {
echo "task init"
echo "this should fail"
false
nonexistentcommand
echo "task end (we shouldn't be here)"
}
if ! task; then
echo >&2 "The task failed"
exit 1
else
echo "The task passed"
fi
```
```
$ bash test.sh
task init
this should fail
test.sh: line 10: nonexistentcommand: command not found
task end (we shouldn't be here)
The task passed
$ echo $?
0
```
But without the `if`, everything works "as expected":
```
set -eu
set -o pipefail
task() {
echo "task init"
echo "this should fail"
false
nonexistentcommand
echo "task end (we shouldn't be here)"
}
task
```
```
$ bash test.sh
task init
this should fail
$ echo $?
1
```
ci: temporarily set -Wno-deprecated-declarations in Packit
to suppress OpenSSL 3.0 deprecation warnings (until a proper solution
is deployed):
```
../src/shared/creds-util.c: In function ‘sha256_hash_host_and_tpm2_key’:
../src/shared/creds-util.c:412:9: error: ‘SHA256_Init’ is deprecated: Since OpenSSL 3.0 [-Werror=deprecated-declarations]
412 | if (SHA256_Init(&sha256_context) != 1)
| ^~
In file included from /usr/include/openssl/x509.h:41,
from ../src/shared/openssl-util.h:8,
from ../src/shared/creds-util.c:21:
/usr/include/openssl/sha.h:73:27: note: declared here
73 | OSSL_DEPRECATEDIN_3_0 int SHA256_Init(SHA256_CTX *c);
| ^~~~~~~~~~~
../src/shared/creds-util.c:415:9: error: ‘SHA256_Update’ is deprecated: Since OpenSSL 3.0 [-Werror=deprecated-declarations]
415 | if (host_key && SHA256_Update(&sha256_context, host_key, host_key_size) != 1)
| ^~
In file included from /usr/include/openssl/x509.h:41,
from ../src/shared/openssl-util.h:8,
from ../src/shared/creds-util.c:21:
/usr/include/openssl/sha.h:74:27: note: declared here
74 | OSSL_DEPRECATEDIN_3_0 int SHA256_Update(SHA256_CTX *c,
| ^~~~~~~~~~~~~
../src/shared/creds-util.c:418:9: error: ‘SHA256_Update’ is deprecated: Since OpenSSL 3.0 [-Werror=deprecated-declarations]
418 | if (tpm2_key && SHA256_Update(&sha256_context, tpm2_key, tpm2_key_size) != 1)
| ^~
In file included from /usr/include/openssl/x509.h:41,
from ../src/shared/openssl-util.h:8,
from ../src/shared/creds-util.c:21:
/usr/include/openssl/sha.h:74:27: note: declared here
74 | OSSL_DEPRECATEDIN_3_0 int SHA256_Update(SHA256_CTX *c,
| ^~~~~~~~~~~~~
../src/shared/creds-util.c:421:9: error: ‘SHA256_Final’ is deprecated: Since OpenSSL 3.0 [-Werror=deprecated-declarations]
421 | if (SHA256_Final(ret, &sha256_context) != 1)
| ^~
In file included from /usr/include/openssl/x509.h:41,
from ../src/shared/openssl-util.h:8,
from ../src/shared/creds-util.c:21:
/usr/include/openssl/sha.h:76:27: note: declared here
76 | OSSL_DEPRECATEDIN_3_0 int SHA256_Final(unsigned char *md, SHA256_CTX *c);
| ^~~~~~~~~~~~
cc1: all warnings being treated as errors
Gets rid of a few gotos, allows removing the extra ret variable and
will also be used in a future commit by the codepath that receives
cgroups from user instances of systemd.
test-fileio: test read_virtual_file() with more files from /proc
i.e. let's pick some files we know are too large, or where struct stat's
.st_size is zero even though non-empty, and test read_virtual_file()
with that, to ensure things are handled sensibly. Goal is to ensure all
three major codepaths in read_virtual_file() are tested.
fileio: fix truncated read handling in read_virtual_file()
We mishandled the case where the size we read from the file actually
matched the maximum size fully. In that case we cannot really make a
determination whether the file was fully read or only partially. In that
case let's do another loop, so that we operate with a buffer, and
we can detect the EOF (which will be signalled to us via a short read).
systemd-journald makes many calls to read /proc/PID/cmdline and
/proc/PID/status, both of which tend to be well under 4K. However the
combination of allocating 4M read buffers, then using `realloc()` to
shrink the buffer in `read_virtual_file()` appears to be creating
fragmentation in the heap (when combined with the other allocations
systemd-journald is doing).
To help mitigate this, try reading /proc with a 4K buffer as
`read_virtual_file()` did before 2ac67221bb6270f0fbe7cbd0076653832cd49de2.
If it isn't big enough then try again with the larger buffers.
sd-journal: Ignore data threshold if set to zero in sd_journal_enumerate_fields()
According to the documentation, Setting the data threshold to zero disables the
data threshold alltogether. Let's make sure we actually implement this behaviour
in sd_journal_enumerate_fields() by only applying the data threshold if it exceeds
zero.
hostnamed: add support for getting the chassis type from device-tree
Device-tree based devices can't get the chassis type from DMI or ACPI,
and so far need a custom `/etc/machine-info` to set this property right.
A new 'chassis-type' toplevel device tree property has recently been
approved into the DT specification, making it possible to automate
chassis type detection on such devices.
This patch therefore falls back to reading this device-tree property if
nothing is available through both DMI and ACPI.
basic: nulstr-util: add nulstr_get() returning the matching string
Currently `nulstr_contains` returns a boolean, making it difficult to
identify which of the input strings matches the "needle".
Adding a new `nulstr_get()` function, returning a const pointer to the
matching string, eases this process and allows us to directly re-use the
result of a call to this function without additional processing or
memory allocation.
sysctl-util: make sysctl_read_ip_property() a wrapper around sysctl_read()
let's do what we did for sysctl_write()/sysctl_write_ip_property() also
for the read paths: i.e. make one a wrapper of the other, and add more
careful input validation.
sysctl-util: make sysctl_write_ip_property() a wrapper around sysctl_write()
It does the same stuff, let's use the same codepaths as much as we can.
And while we are at it, let's generate good error codes in case we are
called with unsupported parameters/let's validate stuff more that might
originate from user input.
sysctl-util: rework sysctl_write() to wrap write_string_file()
The sysctl_write_ip_property() call already uses write_string_file(), so
let's do so here, too, to make the codepaths more uniform.
While we are at it, let's also validate the passed path a bit, since we
shouldn't allow sysctls with /../ or such in the name. Hence simplify
the path first, and then check if it is normalized, and refuse if not.
fileio: lower maximum virtual file buffer size by one byte
When reading virtual files (i.e. procfs, sysfs, …) we currently put a
limit of 4M-1 on that. We have to pick something, and we have to read
these files in a single read() (since the kernel generally doesn't
support continuation read()s for them). 4M-1 is actually the maximum
size the kernel allows for reads from files in /proc/sys/, all larger
reads will result in an ENOMEM error (which is really weird, but the
kernel does what the kernel does). Hence 4M-1 sounds like a smart
choice.
However, we made one mistake here: in order to be able to detect EOFs
properly we actually read one byte more than we actually intend to
return: if that extra byte can be read, then we know the file is
actually larger than our limit and we can generate an EFBIG error from
that. However, if it cannot be read then we know EOF was hit, and we are
good. So ultimately after all we issued a single 4M read, which the
kernel then responds with ENOMEM to. And that means read_virtual_file()
actually doesn't work properly right now on /proc/sys/. Let's fix that.
The fix is simple, lower the limit of the the buffer we intend to return
by one, i.e. 4M-2. That way, the read() we'll issue is exactly as large
as the limit the kernel allows, and we still get safely detect EOF from
it.
otherwise compilation with -Db_ndebug=true complains about a
set-but-not-used variable:
```
../src/partition/repart.c:907:33: error: variable 'left' set but not used [-Werror,-Wunused-but-set-variable]
uint64_t start, left;
^
1 error generated.
```
Franck Bui [Wed, 4 Aug 2021 09:20:07 +0000 (11:20 +0200)]
journalctl: never fail at flushing when the flushed flag is set
Even if journald was not running, flushing the volatile journal used to work if
the journal was already flushed (ie the flushed flag
/run/systemd/journald/flushed was created).
However since commit 4f413af2a0a, this behavior changed and now '--flush' fails
because it tries to contact journald without checking the presence of the
flushed flag anymore.
This patch restores the previous behavior since there's no reason to fail when
journalctl can figure out that the flush is not necessary.
tree-wide: mark set-but-not-used variables as unused to make LLVM happy
LLVM 13 introduced `-Wunused-but-set-variable` diagnostic flag, which
trips over some intentionally set-but-not-used variables or variables
attached to cleanup handlers with side effects (`_cleanup_umask_`,
`_cleanup_(notify_on_cleanup)`, `_cleanup_(restore_sigsetp)`, etc.):
```
../src/basic/process-util.c:1257:46: error: variable 'saved_ssp' set but not used [-Werror,-Wunused-but-set-variable]
_cleanup_(restore_sigsetp) sigset_t *saved_ssp = NULL;
^
1 error generated.
```
watchdog: update watchdog_timeout with the closest timeout found by the driver
Store the actual timeout value found by the driver in watchdog_timeout since
this value is more accurate for calculating the next time for pinging the
device.
This basically reverts commit 61927b9f116bf45bfdbf19dc2981d4a4f527ae5f and
relies on the fact that watchdog_ping() will open and setup the watchdog for us
in case the device appears later on.
Also unlike what is said in comment
https://github.com/systemd/systemd/pull/17460#pullrequestreview-517434377, both
m->watchdog[] and m->overriden_watchdog[] are not supposed to store the actual
timeout used by the watchdog device but stores the value defined by the user.
If the HW timeout value is really needed by the manager then it's probably
better to read it via an helper defined in watchdog.c instead. However the HW
timeout value is currently only needed by the watchdog code itself mainly when
it calculates the time for the next ping.
watchdog: make watchdog_ping() a NOP when the watchdog is disabled or closed
This patch allows watchdog_ping() to be used unconditionally regardless of
whether watchdog_set_timeout() or watchdog_close() has been previously called
or not and in both cases watchdog_ping() does nothing.
shutdown.c has been updated to cope with this change.