And this makes the new merged function `sd_nfnl_nft_message_new_setelems()`
not open container, as containers should be opened and closed in the
same function in general. Otherwise, it is hard to understand which
level we are in the nested attribute tree.
Yu Watanabe [Tue, 14 Jun 2022 13:22:54 +0000 (22:22 +0900)]
sd-netlink: several cleanups for netfilter
- rename family -> nfproto, and other arguments,
- check specified nfproto,
- change type of several function arguments that specify data length,
- add several assertions,
- drop unnecessary headers.
For the system manager, /run/systemd/private is publicly accessible, because
/run/systemd is 0755, and /run/systemd/private is 0777. For the user manager,
/run/user/<uid> is 0700, and /run/user/<uid>/systemd/private is 0777. This
does not directly cause any security issue because we check the sender in
bus_check_peercred (ucred.uid != 0 && ucred.uid != geteuid()).
But it makes sense to limit access to the socket to avoid wasting time in PID1.
Somebody could send messages there that'd we'd reject anyway. It also makes
things more explicit.
libblkid gained new tags - FSSIZE, FSLASTBLOCK and FSBLOCKSIZE.
These tags are filesystem related properties probed from superblock.
All of them are enabled by BLKID_SUBLKS_FSINFO flag.
Set the flag to allow these tags to be cached in udev db.
Kai Lueke [Fri, 22 Jul 2022 12:57:55 +0000 (14:57 +0200)]
man: Use correct target type for sysupdate entry
While Type=file works because it seems to be the default, the line gets
ignored as printed on the stderr output.
Use the correct value "regular-file" for the target type.
Kai Lueke [Fri, 22 Jul 2022 13:09:21 +0000 (15:09 +0200)]
man: Document mask workaround for sysext images
A read-only /usr may ship a sysext image by default and the user wants
to opt out. Currently it's not clear how to do this.
Document that a /dev/null symlink in /etc/extensions/ works to "mask" a
sysext image in a folder with lower precedence.
Kai Lueke [Fri, 22 Jul 2022 13:03:12 +0000 (15:03 +0200)]
man: Do not recommend to overlay files with sysext even if possible
While overlaying files with a sysext can be useful, it may lead to
unexpected problems depending on when a process got started and which
version of the file it gets.
Call out that overlaying files is possible but don't recommend to make
use of it.
Let's re-introduce the option in the correct way, as some servers seem
to return borked message when the solicit message contain the rapid
commit option.
test-network: do not stop/restart udevd and related socket units
That's not necessary. Moreover, if the socket units are stopped in
`setUpModule()`, then there exists a short timespan that we cannot call
`udevadm control`, as the control socket may not be opened yet.
If we run whole tests, then the first test is
NetworkctlTests.test_altname, and it calls `udevadm control` in `setUp()`.
Hence, the test may fail.
man: split out "Type Modifiers" section from "Types" section in tmpfiles.d docs
I had trouble finding the right paragraphs, so I guess others might have
too. Hence let's add a tiny bit more structure by separating these two
parts out.
core: cache unit file selinux label, and make decisions based on that
Do not go back to disk on each selinux access, but instead cache the
label off the inode we are actually reading. That way unit file contents
and unit file label we use for access checks are always in sync.
This changes behaviour a bit, because we'll reach and cache the label at
the moment of loading the unit (i.e. usually on boot and reload), but
not after relabelling. Thus, users must refresh the cache explicitly via
a "systemctl daemon-reload" if they relabelled things.
This makes the SELinux story a bit more debuggable, as it adds an
AccessSELinuxContext bus property to units that will report the label we are
using for a unit (or the empty string if not known).
This also drops using the "source" path of a unit as label source. if
there's value in it, then generators should manually copy the selinux
label from the source files onto the generated unit files, so that the
rule that "access labels are read when we read the definition files" is
upheld. But I am not convinced this is really a necessary, good idea.
sleep: store battery discharge rate/hour with hash
Estimated battery discharge rate per hour is stored in :
/var/lib/systemd/sleep/battery_discharge_percentage_rate_per_hour
This value is used to determine the initial suspend interval. In case
this file is not available or value is invalid, HibernateDelaySec
interval is used.
After wakeup from initial suspend, this value is again estimated and
written to file if value is in range of 1-199.
Logs for reference : HibernateDelaySec=15min
- Updated in /etc/systemd/sleep.conf
Jul 14 19:17:58 localhost systemd-sleep[567]: Current battery charge
percentage: 100%
Jul 14 19:17:58 localhost systemd-sleep[567]: Failed to read discharge
rate from /var/lib/systemd/sleep/batt
ery_discharge_percentage_rate_per_hour: No such file or directory
Jul 14 19:17:58 localhost systemd-sleep[567]: Set timerfd wake alarm
for 15min
Jul 14 19:33:00 localhost systemd-sleep[567]: Current battery charge
percentage after wakeup: 90%
Jul 14 19:33:00 localhost systemd-sleep[567]: Attempting to estimate
battery discharge rate after wakeup from 15min sleep
Jul 14 19:33:00 localhost systemd-sleep[567]: product_id does not
exist: No such file or directory
Jul 14 19:33:00 localhost systemd-sleep[567]: Estimated discharge rate
39 successfully updated to
/var/lib/systemd/sleep/battery_discharge_percentage_rate_per_hour
Jul 14 19:33:00 localhost systemd-sleep[567]: Current battery charge
percentage: 90%
Jul 14 19:33:00 localhost systemd-sleep[567]: product_id does not
exist: No such file or directory
Jul 14 19:33:00 localhost systemd-sleep[567]: Set timerfd wake alarm
for 1h 48min 27s
Jul 14 21:21:30 localhost systemd-sleep[567]: Current battery charge
percentage after wakeup: 90%
Jul 14 21:21:30 localhost systemd-sleep[567]: Battery was not
discharged during suspension
sleep: use current charge level to decide suspension
If battery current charge percentage is below 5% hibernate directly.
Else initial suspend interval is set for HibernateDelaySec. On wakeup
estimate battery discharge rate per hour and if battery charge
percentage is not below 5% system is suspended else hibernated.
mkosi: Changes to allow booting with sanitizers in mkosi
- Extra memory because ASAN needs it
- The environment variables to make the sanitizers more useful
- LD_PRELOAD because the ASAN DSO needs to be the first in the list
- The sanitizer library packages
- Disable syscall filters because they interfere with ASAN
- Disable systemd-hwdb-update because it's super slow when systemd-hwdb
is built with sanitizers
- Take the value for meson's b_sanitize option from the SANITIZERS
environment variable
man/system-or-user-ns.xml: explicitly refer to `PrivateUsers=` option
It is not clear what "unprivileged user namespaces are available" means.
It could mean either that they are only usable, that is, enabled in the kernel,
or they have been enabled for the specific service. Referring to the
`PrivateUsers=` options makes it clear that the latter is meant.
core: drop a stray %m specifier from a warning message
since in this specific case (r == 0) `errno` is irrelevant and most likely
set to zero, leading up to a confusing message:
```
[ 120.595085] H systemd[1]: session-5.scope: No PIDs left to attach to the scope's control group, refusing: Success
[ 120.595144] H systemd[1]: session-5.scope: Failed with result 'resources'.
```
coredump: Try to write journald coredump metadata to the journal
Currently, if journald coredumps, the coredump is written to
/var/lib/systemd/coredump but the coredump metadata is not written
to the journal meaning we can't find out about the coredump's
existence via the journal. This means that coredumpctl can't be
used to work with journald coredumps, as well as any other tools
that rely on journald to know about coredumps.
To solve the issue, let's have systemd-coredump try to write
systemd-journald coredump metadata to the journal. We have to be
careful though, since if journald coredumps, there's no active
reader on the receive end of the journal socket, so we have to make
sure we don't deadlock trying to write to the socket. To avoid the
deadlock, we put the socket in nonblocking mode before trying to
write to it.
fstab-generator: do not skip /sysroot prefix if the mount point is missing
When chase_symlinks() is called on something on a doesn't exist, it immediately
returns an error. But we were relying on it to prepend "/sysroot/". If it
fails, we need to do that ourselves.
For example, with /sysroot/etc/fstab containing a line for /foo, if /sysroot/foo
doesn't exist, we'd generate a mount point for /foo.
Originally (6b1dc2bd3cdb3bd932b0692be636ddd2879edb92) we had 'pre' and 'post'
to refer to remote-fs-pre.target and remote-fs.target or local-fs-pre.target
and local-fs.target. But 'pre' is long gone, and 'post' by itself doesn't
make much sense. Rename it for clarity.
fstab-generator: properly report the source of data
Mount information can come from /etc/fstab, /sysroot/etc/fstab, and
/proc/cmdline. Even when we had the path to the right source handy, we would
often write something inaccurate. In particular, in the initrd, we would
generally write "/etc/fstab" instead of "/sysroot/etc/fstab" for no good
reason.
generators: only redirect logging when invoked by systemd
We would always print output to the kmsg or journal, but that is only needed
and useful when invoked by systemd. So let's skip redirection unless we are
invoked by systemd. Otherwise, let's log normally. This makes test invocations
easier, and also helps when the generator is invoked by mistake. If redirection
is necessary, the generator can be invoked with SYSTEMD_LOG_TARGET=… even
during tests.
Ambient capabilities should not be passed implicitly to user
services. Dropping them does not affect the permitted and effective sets
which are important for the manager itself to operate.
cgroups-agent: connect stdin/stdout/stderr to /dev/null
Inspired by https://github.com/systemd/systemd/pull/24024 this is
another user mode helper, where this might be an issue. hence let's
rather be safe than sorry, and also connect stdin/stdout/stderr
explicitly with /dev/null.
generators: accept one or three args, do not write to /tmp
Since the general generator logic was established in the rewrite in 07719a21b6425d378b36bb8d7f47ad5ec5296d28, generators would always write to /tmp
by default. I think this not a good default at all, because generators write a
bunch of files and would create a mess in /tmp. And for debugging, one
generally needs to remove all the files in the output directory, because
generators will complain in the output paths are already present. Thus the
approach of disabling console logging and writing many files to /tmp when
invoked with no arguments is not nice, so let's disallow operation with no
args.
But when debugging, one generally does not care about the separate output dirs
(most generators use only one). Thus the general pattern I use is something
like:
rm -rf /tmp/x && mkdir /tmp/x && build/some-generator /tmp/{x,x,x}
This commit allows only one directory to be specified and simplifies this to:
rm -rf /tmp/x && mkdir /tmp/x && build/some-generator /tmp/x
tmpfiles: optionally, decode string to write to files with base64
This is useful to use "f" or "w" to write arbitrary binary files to
disk, or files with newlines and similar (for example to provision SSH
host keys and similar).
coredump: Connect stdout/stderr to /dev/null before doing anything
When invoked as the coredump handler by the kernel, systemd-coredump's
stdout and stderr streams are closed. This is dangerous as this means
the fd's can get reallocated, leading to hard to debug errors such as
log messages ending up being appended to a compressed coredump file.
To avoid such issues in the future, let's bind stdout/stderr to
/dev/null so the file descriptors can't get used for anything else.