In some kernels (specifically, 5.4) even though the clone3 syscall is
supported, setting CLONE_INTO_CGROUP is not. The error message returned
in this case is E2BIG.
If posix_spawn_wrapper encounters this error, it does not retry, and
cannot spawn any programs in said kernels.
This commit adds a check for the E2BIG error and retries pidfd_spawn()
without the POSIX_SPAWN_SETCGROUP flag.
If we encounter an E2BIG error, and the pidfd_spawn() succeeds after
removing the POSIX_SPAWN_SETCGROUP flag, then we cache the result so
that we do not retry every time.
Originally, this issue was reported in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1077204.
Ronan Pigott [Mon, 19 Aug 2024 20:18:10 +0000 (13:18 -0700)]
resolved: demote the global unicast scope
This will greatly reduce the number of cases where the global unicast
scope overlaps with link scopes configured as default-route, making it
feasible to use the global DNS setting in conjunction with per-link dns
servers configured by the network.
This change is preferred over demoting links to default-route=no where
the user prefers to use the network provided DNS servers, and I expect
it is non-disruptive in that it should not degrade the efficacy of any
existing configuration.
terminal-util: don't assume errno is correctly set when using isatty_safe()
let's instead generate ENOTTY on our own. This is more correct with out
coding style (since we generally do not propagate errors via errno), and
also addresses #34039 as side effect. (#34039 really needs to be fixed
in musl though, too, this is just a work-around as side-effect).
maia x. [Mon, 19 Aug 2024 19:47:21 +0000 (12:47 -0700)]
namespace: Fix extension release memory leak
In apply_one_mount(), in the MOUNT_EXTENSION_DIRECTORY case,
char **extension_release was used as a return pointer twice but only
cleaned up once in the end. Fix it by removing duplicate code that
was causing this issue.
Yu Watanabe [Sat, 17 Aug 2024 11:13:12 +0000 (20:13 +0900)]
network/routing-policy-rule: use int32_t for suppress_prefixlen
The kernel parses FRA_SUPPRESS_PREFIXLEN as uint32_t, but internally
handled as signed integer and negative values as unset. Let's explicitly
specify the size of the variable.
Daan De Meyer [Sun, 18 Aug 2024 11:20:14 +0000 (13:20 +0200)]
test: Gracefully handle running within user namespace with single user
Unprivileged users often make themselves root by unsharing a user namespace
and then mapping their current user to root which does not require privileges.
Let's make sure our tests don't fail in such an environment by adding checks
where required to see if we're not running in a user namespace with only a
single user.
Yu Watanabe [Sat, 17 Aug 2024 02:26:32 +0000 (11:26 +0900)]
analyze: introduce --instance= option to control instance name for template units
Note, `systemd-analyze foo@.service --instance=hoge` is equivalent to
`systemd-analyze foo@hoge.service`. But, the option may be useful when
e.g. passing multiple template units that have restriction on their
instance name:
```
$ ls
template_aaa@.service template_bbb@.service template_ccc@.service
$ systemd-analyze ./template_* --instance=hoge
```
Without the option, we need to embed an instance name into each unit
name, so cannot use globs.
Before this commit, the "Cannot raise nice level" branch
is rather confusing, as we're actually lowering the nice.
Also, it's better to log about the final nice value
for both cases, no matter whether we need to set to limit
or not.
The current semantics of "filtered" in unit_is_filtered()
are actually the contrary of ListUnitsFiltered(). Let's
make things consistent, i.e. return true when the unit
shall be included.
Mike Yuan [Tue, 23 Jul 2024 14:09:53 +0000 (16:09 +0200)]
core/dbus-service: refuse bind mounting over /run/credentials/
The credential mounts should be managed singlehandedly by pid1.
Preparation for the future introduction of RefreshOnReload=credential,
where refreshing creds will be properly supported on reload.
Mike Yuan [Mon, 10 Jun 2024 15:27:51 +0000 (17:27 +0200)]
core/dbus-service: some modernization for bus_service_method_mount()
Perform some checks earlier to avoid pointless polkit auth.
Plus, the missing unit_get_exec_context() shall not be
a formalized error. As it's our internal representation
and in the normal operation should never happen.
Yu Watanabe [Fri, 16 Aug 2024 15:00:32 +0000 (00:00 +0900)]
network: make IPMasquerade= imply global IP forwarding settings again
After 3976c430927e1bfefa0413f80ebac84ab9a64350 (#31423), IPMasquerade=
implies only per-interface IP forwarding. That means, nspawn users need
to manually enable IPv4/IPv6Forwarding= in networkd.conf when
--network-veth or friend is used. Even the change was announced in NEWS,
the change itself breaks backward compatibility and extremely reduces
usability.
Let's make the setting imply the global setting again.
Daan De Meyer [Wed, 14 Aug 2024 10:43:05 +0000 (12:43 +0200)]
Add $SYSTEMD_IN_CHROOT to override chroot detection
When running unprivileged, checking /proc/1/root doesn't work because
it requires privileges. Instead, let's add an environment variable so
the process that chroot's can tell (systemd) subprocesses whether
they're running in a chroot or not.
bryango [Thu, 15 Aug 2024 05:18:17 +0000 (13:18 +0800)]
shell-completion: zsh: fix incorrect unescaping
Previously the `_filter_units_by_property` completion function
outputs with a [zsh parameter expansion flag] `g:o:`. This means
that the returned result is unescaped as the zsh builtin `echo`,
except that octal escapes don’t take a leading zero. This seemed to
have worked back in the days when it was first introduced:
udev-builtin-net_id: add NAMING_DEVICETREE_PORT_ALIASES to check of_node of netdevs before their parents
The net_id builtin only checked the of_node of a netdev's parent device,
not that of the netdev itself. While it is common that netdevs don't have
an OF node assigned themselves, as they are derived from some parent
device, this is not always the case. In particular when a single
controller provides multiple ports that can be referenced indiviually in
the Device Tree (both for aliases/MAC address assignment and phandle
references), the correct of_node will be that of the netdev itself, not
that of the parent, so it needs to be checked, too.
A new naming scheme flag NAMING_DEVICETREE_PORT_ALIASES is added to
allow selecting the new behavior.
Yu Watanabe [Thu, 15 Aug 2024 07:33:51 +0000 (16:33 +0900)]
test: sync journal after all invocations finished
Otherwise, several messages for the last invocation have not been
stored to journal yet.
Hopefully fixes the following race:
===
[ 603.037765] H systemd-run[10503]: Running as unit: invocation-id-test-26448.service; invocation ID: 1a49edeb05a641aaa2def72411134822
[ 603.099587] H bash[10504]: invocation 10 1a49edeb05a641aaa2def72411134822
[ 603.212069] H systemd[1]: invocation-id-test-26448.service: Deactivated successfully.
[ 603.225092] H systemd-run[10503]: Finished with result: success
[ 603.225163] H TEST-04-JOURNAL.sh[10506]: + journalctl --list-invocation -u invocation-id-test-26448.service
[ 603.225318] H systemd-run[10503]: Main processes terminated with: code=exited, status=0/SUCCESS
[ 603.225357] H TEST-04-JOURNAL.sh[10507]: + tee /tmp/tmp.UzSmYamXyg/10
[ 603.225357] H TEST-04-JOURNAL.sh[10507]: IDX INVOCATION ID FIRST ENTRY LAST ENTRY
[ 603.225357] H TEST-04-JOURNAL.sh[10507]: -9 d6efabb546014027b6bd7ee3a78386d6 Wed 2024-08-14 22:12:16 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.225357] H TEST-04-JOURNAL.sh[10507]: -8 3e402b81c28d4a8fa2c5e8e31dffd9ee Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.225357] H TEST-04-JOURNAL.sh[10507]: -7 5ebd0ba07d4f4f52bc84275f55a3ee2e Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.225357] H TEST-04-JOURNAL.sh[10507]: -6 bc53c49d6ce24bb7acd438c3e61cfb23 Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.225357] H TEST-04-JOURNAL.sh[10507]: -5 24680907919e4839a75378117bb5a816 Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.225357] H TEST-04-JOURNAL.sh[10507]: -4 ec364ed7673c4a1fa22929f95ce7047b Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.225357] H TEST-04-JOURNAL.sh[10507]: -3 2e8a4dea43044d1a9faf922f7a2f3d42 Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.225357] H TEST-04-JOURNAL.sh[10507]: -2 ac610b6e6c9c4a29bf8947890685478b Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.225357] H TEST-04-JOURNAL.sh[10507]: -1 9b7d52c3620948f9831e323910f605f5 Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.225357] H TEST-04-JOURNAL.sh[10507]: 0 1a49edeb05a641aaa2def72411134822 Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.225823] H systemd-run[10503]: Service runtime: 174ms
[ 603.225866] H TEST-04-JOURNAL.sh[10508]: + journalctl --list-invocation -u invocation-id-test-26448.service --reverse
[ 603.226110] H systemd-run[10503]: CPU time consumed: 12ms
[ 603.226142] H TEST-04-JOURNAL.sh[10509]: + tee /tmp/tmp.UzSmYamXyg/10-r
[ 603.226378] H systemd-run[10503]: Memory peak: 1.4M (swap: 0B)
[ 603.230161] H TEST-04-JOURNAL.sh[10509]: IDX INVOCATION ID FIRST ENTRY LAST ENTRY
[ 603.230161] H TEST-04-JOURNAL.sh[10509]: 0 1a49edeb05a641aaa2def72411134822 Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:18 UTC
[ 603.230161] H TEST-04-JOURNAL.sh[10509]: -1 9b7d52c3620948f9831e323910f605f5 Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.230161] H TEST-04-JOURNAL.sh[10509]: -2 ac610b6e6c9c4a29bf8947890685478b Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.230161] H TEST-04-JOURNAL.sh[10509]: -3 2e8a4dea43044d1a9faf922f7a2f3d42 Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.230161] H TEST-04-JOURNAL.sh[10509]: -4 ec364ed7673c4a1fa22929f95ce7047b Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.230161] H TEST-04-JOURNAL.sh[10509]: -5 24680907919e4839a75378117bb5a816 Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.230161] H TEST-04-JOURNAL.sh[10509]: -6 bc53c49d6ce24bb7acd438c3e61cfb23 Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.230161] H TEST-04-JOURNAL.sh[10509]: -7 5ebd0ba07d4f4f52bc84275f55a3ee2e Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.230161] H TEST-04-JOURNAL.sh[10509]: -8 3e402b81c28d4a8fa2c5e8e31dffd9ee Wed 2024-08-14 22:12:17 UTC Wed 2024-08-14 22:12:17 UTC
[ 603.230161] H TEST-04-JOURNAL.sh[10509]: -9 d6efabb546014027b6bd7ee3a78386d6 Wed 2024-08-14 22:12:16 UTC Wed 2024-08-14 22:12:17 UTC
===
Ronan Pigott [Wed, 14 Aug 2024 18:42:03 +0000 (11:42 -0700)]
units: drop "-p" flag from agetty's login options
This flag was added in db6aedab9292 with the justification that locale
environment variables should be preserved by the user session. However,
the companion patch to drop the UnsetEnvironment= directive blocking
these variables was never merged, so the intended change was never
effected.
While the patch was ineffective toward its stated goal, the "-p" option
does have material negative consequences for the user session in
systemd — environment variables to support the use of
credentials and memory pressure directives, such as
$CREDENTIALS_DIRECTORY and $MEMORY_PRESSURE_WATCH, which are now
directly used by agetty and login, get leaked into the user session
potentially breaking applications that rely on these values.
E.g. systemd-ask-password fails from the tty when $CREDENTIALS_DIRECTORY
has been leaked from agetty, because it expects to be able to access
credentials in $CREDENTIALS_DIRECTORY.
Chengen Du [Mon, 12 Aug 2024 03:41:52 +0000 (11:41 +0800)]
udev: Handle PTP device symlink properly on udev action 'change'
PTP device symlink creation rules are currently executed only when the
udev action is 'add'. If a user reloads the rules and runs the udevadm
trigger command to reapply changes, the symlink may be deleted, which
can prevent the chronyd service from restarting properly.
Signed-off-by: Chengen Du <chengen.du@canonical.com>
Yu Watanabe [Wed, 7 Aug 2024 03:01:45 +0000 (12:01 +0900)]
sd-journal: fix sd_journal_seek_monotonic_usec()
This fixes the following issues:
- We have a journal file, which contains entries of boot A and B. Let T
be the timestamp of the _last_ entry of boot A.
If sd_journal_seek_monotonic_usec() is called for boot A with a timestamp
_after_ T, following sd_journal_next() will provide the _first_ entry of
boot A, rather than the first entry of boot B.
- We have two journal files X and Y. The file X contains entries of boot A.
Let T be the timestamp of the _last_ entry of boot A in file X. The file Y
contains entries of boot A after timestamp T.
If sd_journal_seek_monotonic_usec() is called for boot A with a
timestamp _after_ T, following sd_journal_next() will provide the
_first_ entry of boot A, whose timestamp is of course earlier than T.
In the next commit, we'll introduce a varlink server for the user
manager. As preparation for that, let's introduce a new function to
initialize only the managed OOM connection whenever we send a managed
OOM update.