Yu Watanabe [Thu, 15 May 2025 03:34:35 +0000 (12:34 +0900)]
core: introduce Unit.dependency_generation counter and restart loop when dependency is updated in the loop
When starting unit A, a dependent unit B may be loaded if it is not
loaded yet, and the dependencies in unit A may be updated.
As Hashmap does not allow a new entry to be added while it is being
iterated, we need to restart the loop in such a case.
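The restart-on-change pattern described above can be sketched as follows. This is a minimal illustration only, not the systemd implementation: `DepSet`, `dep_set_add()`, `dep_set_visit_all()` and `example_visit()` are hypothetical names, and a small fixed-size array stands in for the real Hashmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a unit's dependency set; not systemd's Hashmap. */
#define MAX_DEPS 8

typedef struct DepSet {
        int deps[MAX_DEPS];
        size_t n_deps;
        unsigned generation; /* bumped on every successful modification */
} DepSet;

static bool dep_set_add(DepSet *s, int dep) {
        assert(s);

        for (size_t i = 0; i < s->n_deps; i++)
                if (s->deps[i] == dep)
                        return false; /* already present, nothing changed */

        if (s->n_deps >= MAX_DEPS)
                return false;

        s->deps[s->n_deps++] = dep;
        s->generation++;
        return true;
}

/* Handling one dependency may add new entries to the set. */
typedef void (*visit_fn)(DepSet *s, int dep);

/* Visits every dependency once. If the generation counter changed during a
 * visit, the iteration is restarted from the beginning, skipping entries that
 * were already handled. Returns the number of restarts. */
static unsigned dep_set_visit_all(DepSet *s, visit_fn visit) {
        bool visited[MAX_DEPS] = { false };
        unsigned restarts = 0;

        assert(s);
        assert(visit);

restart:
        for (size_t i = 0; i < s->n_deps; i++) {
                unsigned before;

                if (visited[i])
                        continue;

                before = s->generation;
                visited[i] = true;
                visit(s, s->deps[i]);

                if (s->generation != before) {
                        restarts++;
                        goto restart;
                }
        }

        return restarts;
}

/* Example visitor: handling dependency N (N < 10) pulls in dependency N + 10. */
static void example_visit(DepSet *s, int dep) {
        if (dep < 10)
                (void) dep_set_add(s, dep + 10);
}
```

In this sketch the array is append-only, so a restart is not strictly required for memory safety; the counter is there to show how a modification made inside the loop body can be detected.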
Yu Watanabe [Tue, 20 May 2025 19:38:07 +0000 (04:38 +0900)]
core/transaction: do not override unit load state when unit_load() failed
When unit_load() failed for some reason, we previously overrode the
load state with UNIT_NOT_FOUND, but we did not update the
Unit.fragment_not_found_timestamp_hash. So, the unit may be loaded
multiple times when the unit is in a dependency list of another unit,
as manager_unit_cache_should_retry_load() will be true again even on
next call.
Let's not override the unit state set by unit_load().
Note, after unit_load(), the unit state should not be UNIT_STUB.
Let's also add the assertion about that.
This change is important when combined with the next commit, as with the
next commit we will restart the FOREACH_UNIT_DEPENDENCY() loop if a unit
is reloaded, hence overriding load state with UNIT_NOT_FOUND may cause
an infinite loop.
Yu Watanabe [Tue, 20 May 2025 19:32:09 +0000 (04:32 +0900)]
core/transaction: drop redundant call of bus_unit_validate_load_state()
The function manager_unit_cache_should_retry_load() returns true only
when the unit state is UNIT_NOT_FOUND. Hence, it is not necessary to
call bus_unit_validate_load_state() before checking
manager_unit_cache_should_retry_load().
console: when switching console modes and one doesn't work, always go for the next
So far we already had a logic in place to go for the next mode if some
mode doesn't work – but it was only applied if we'd actively cycle
through resolutions.
Let's extend the logic and always apply it: whenever we try to switch to
a mode, and it doesn't work, go to the next one until we find one that
works.
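The fallback logic can be sketched roughly as below. This is an illustration under assumptions, not the boot loader code: `switch_to_working_mode()`, `set_mode_fn` and `example_set_mode()` are hypothetical names, and wrapping around to mode 0 after the last mode is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical backend hook: returns true if switching to 'mode' worked. */
typedef bool (*set_mode_fn)(size_t mode, void *userdata);

/* Starting at 'want', try modes in order (wrapping around) until one works.
 * Returns the index of the mode that worked, or n_modes if none did. */
static size_t switch_to_working_mode(size_t want, size_t n_modes, set_mode_fn set_mode, void *userdata) {
        assert(set_mode);

        for (size_t tried = 0; tried < n_modes; tried++) {
                size_t mode = (want + tried) % n_modes;

                if (set_mode(mode, userdata))
                        return mode;
        }

        return n_modes;
}

/* Example backend: a mode works if its bit is set in the mask. */
static bool example_set_mode(size_t mode, void *userdata) {
        const unsigned *mask = userdata;

        return (*mask >> mode) & 1u;
}
```

Each mode is attempted at most once per call, so the loop terminates even when no mode works at all.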
Lukas Nykryn [Wed, 18 Jun 2025 11:33:25 +0000 (13:33 +0200)]
man: encourage the creation of empty machine-id instead of deleting it
Current text hints that machine-id in template image should be empty
if the system is read-only. But most of the bare metal systems and
regular VMs have /etc read-only at this phase of boot.
We now consider link-local addresses routable when we have configured
unicast link-local dns servers. This allows creating the DNS scope, even
when the interface doesn't get a routable address.
Luca Boccassi [Mon, 16 Jun 2025 22:28:57 +0000 (23:28 +0100)]
fstab-generator: set mode=0755 with root=tmpfs
If mode= is not set in rootflags= add mode=0755 when a tmpfs
is used on the rootfs, otherwise it will be group/world writable
as that's the default mode for tmpfs filesystems.
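A minimal sketch of the option handling described above, assuming the mount options arrive as a comma-separated string. The helper names `options_have_mode()` and `tmpfs_root_options()` are hypothetical and are not taken from fstab-generator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Returns true if the comma-separated option list contains a "mode=" entry. */
static bool options_have_mode(const char *options) {
        if (!options)
                return false;

        for (const char *p = options; *p; ) {
                size_t len = strcspn(p, ",");

                if (len >= 5 && strncmp(p, "mode=", 5) == 0)
                        return true;

                p += len;
                if (*p == ',')
                        p++;
        }

        return false;
}

/* Writes the options to use for a tmpfs root into buf, adding mode=0755 when
 * no mode= was given. Returns false if buf is too small. */
static bool tmpfs_root_options(const char *options, char *buf, size_t size) {
        int n;

        assert(buf);

        if (!options || !*options)
                n = snprintf(buf, size, "mode=0755");
        else if (options_have_mode(options))
                n = snprintf(buf, size, "%s", options);
        else
                n = snprintf(buf, size, "%s,mode=0755", options);

        return n >= 0 && (size_t) n < size;
}
```

An explicit mode= supplied by the user is left untouched; only the missing case gets the 0755 default.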
Yu Watanabe [Mon, 16 Jun 2025 08:55:11 +0000 (17:55 +0900)]
manager: also restart stub listener on reload
Previously, the extra stub listeners were stopped but new ones were not
started. Also, the main stub listeners were not restarted, hence the
new settings were not applied. This fixes the above two issues.
Note, to fix the issue, we need to keep the CAP_NET_BIND_SERVICE capability
so that the stub listener can be bound later.
Yu Watanabe [Thu, 12 Jun 2025 09:25:54 +0000 (18:25 +0900)]
udev/rules.d: import hwdb before calling net_id builtin
The commit cdcb1eeeb883b2ecb3992865f458f874900ddb87 adds
ID_NET_NAME_INCLUDE_DOMAIN property support in net_id builtin.
The property is basically set through hwdb. However, previously hwdb was
imported after calling net_id builtin, hence when net_id is called, the
property was never set.
This makes hwdb get imported before calling the net_id builtin, so that the
property is set when net_id is called if hwdb has an entry about that
for the interface.
Daan De Meyer [Thu, 5 Jun 2025 10:14:45 +0000 (12:14 +0200)]
meson: Don't fail install script if file doesn't exist
Depending on which optional features are enabled, the NSS module
might not have been built, which means the custom install script
will fail to remove the file. Let's pass -f so it succeeds regardless
of whether the file exists or not.
vmspawn: do not preserve access permissions and xattrs of template OVMF vars
This makes vmspawn work when /usr/share/qemu/edk2-i386-vars.fd is on
disk with 0444 permissions as is the case on NixOS.
The nix package manager does not store any access permissions, ownership,
timestamps, or extended attributes in its package format to increase
reproducibility. The only meta-data that is stored is the executable bit.
Thus when unpacking a nix package, the executable bit is preserved, but no other
access permissions are preserved and all files in /nix/store end up as
read-only.
This causes the template OVMF vars file to have 0444 permissions. If we preserve
those permissions when copying the template file to /tmp that means QEMU cannot
write to the file and fails.
So let's not preserve permissions and keep the 0600 permissions that are set by
default.
Alex [Mon, 2 Jun 2025 22:47:49 +0000 (18:47 -0400)]
network: fix a potential divide-by-zero (#37705)
In function `tc_init`, hz is parsed from the content of file
`"/proc/net/psched"` and can be 0.
In function `hierarchy_token_bucket_class_verify`, hz is directly used
as a divisor in
`htb->buffer = htb->rate / hz + htb->mtu;` without any check.
This adds a check on hz before using it as a divisor.
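The guard can be sketched as below. `HtbParams` and `htb_compute_buffer()` are hypothetical reduced forms used for illustration, and returning -EINVAL on a zero hz is an assumption of this sketch; the commit message does not say how the real code reacts.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical reduced form of the class parameters named in the text. */
typedef struct HtbParams {
        uint64_t rate;
        uint32_t mtu;
        uint64_t buffer;
} HtbParams;

/* Computes buffer = rate / hz + mtu, refusing a zero divisor.
 * The -EINVAL return value is an assumption, not taken from the commit. */
static int htb_compute_buffer(HtbParams *htb, uint32_t hz) {
        assert(htb);

        if (hz == 0)
                return -EINVAL;

        htb->buffer = htb->rate / hz + htb->mtu;
        return 0;
}
```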
mount-util: avoid unnecessary mount_setattr() call in make_fsmount()
If .attr_set is zero (and .attr_clr, .propagation too), then there's no
point in calling mount_setattr().
Fixes: #37062
Note that this optimization is not precisely load-bearing anymore, since
3cc23a2c2345eb188551565349c89ec1fa8f650f got merged, which removes the
only caller of make_fsmount() that might trigger it. But it's worth
fixing generic code anyway, in case it gets used like this later again.
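The check amounts to something like the following sketch. `MountAttr` is a local, hypothetical mirror of the three fields named above (the kernel's struct mount_attr has further fields, which are left out here), and `mount_attr_is_noop()` is an illustrative helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical local mirror of the fields named in the text. */
typedef struct MountAttr {
        uint64_t attr_set;
        uint64_t attr_clr;
        uint64_t propagation;
} MountAttr;

/* Returns true if applying these attributes would change nothing, in which
 * case the mount_setattr() call can be skipped. */
static bool mount_attr_is_noop(const MountAttr *a) {
        return !a || (a->attr_set == 0 && a->attr_clr == 0 && a->propagation == 0);
}
```

A caller would test `mount_attr_is_noop()` first and only issue the system call when it returns false.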
Yu Watanabe [Tue, 27 May 2025 17:09:52 +0000 (02:09 +0900)]
network/link: update state file when master ifindex is changed
If master ifindex is non-zero, then the carrier state and operational
state of the interface may be the enslaved state.
As the operational state is saved in link state file, and read by
wait-online, we need to update the state file when the master ifindex is
changed.
Yu Watanabe [Tue, 27 May 2025 14:17:40 +0000 (23:17 +0900)]
network/link: ENODATA from reading IFLA_MASTER when an interface has no master
When an interface has left its master interface, then reading the
IFLA_MASTER attribute causes ENODATA. When the interface was previously
enslaved to another interface, we need to remove reference to the
interface from the previous master interface.
This is especially important when
```
ip link set dev eth0 nomaster
```
is called.
Adrian Vovk [Tue, 18 Feb 2025 20:59:03 +0000 (15:59 -0500)]
man/systemd.timer: Correct inaccuracy in man page
The docs previously stated that RandomizedDelaySec is applied onto the
next scheduled time, but after 9fa326b18aef0c1e5c80e23a5b41de02155e6f7e
this is no longer the case.
I also reworded FixedRandomDelay= slightly, to make it a bit clearer.
The check looks plausible, but when I started checking whether it needs
to be lowered for the recent changes, I realized that it doesn't make
much sense.
context_parse_iovw() is called from a few places, e.g.:
- process_socket(), where the other side controls the contents of the
message. We already do other checks on the correctness of the message
and this assert is not needed.
- gather_pid_metadata_from_argv(), which is called after
inserting MESSAGE_ID= and PRIORITY= into the array, so there is no
direct relation between _META_ARGV_MAX and the number of args in the
iovw.
- gather_pid_metadata_from_procfs(), where we insert a bazillion fields,
but without any relation to _META_ARGV_MAX.
Since we already separately check if the required stuff was set, drop this
misleading check.
The kernel provides %d which is documented as
"dump mode—same as value returned by prctl(2) PR_GET_DUMPABLE".
We already query /proc/pid/auxv for this information, but unfortunately this
check is subject to a race, because the crashed process may be replaced by an
attacker before we read this data, for example replacing a SUID process that
was killed by a signal with another process that is not SUID, tricking us into
making the coredump of the original process readable by the attacker.
With this patch, we effectively add one more check to the list of conditions
that need to be satisfied if we are to make the coredump accessible to the user.
No functional change. This change is done in preparation for future changes.
Currently, the list of fields which are received on the command line is a
strict subset of the fields which are always expected to be received on a
socket. But when we add new kernel args in the future, we'll have two
non-overlapping sets and this approach will not work. Get rid of the variable
and enumerate the required fields. This set will never change, so this is
actually more maintainable.
The message with the hint where to add new fields is switched with
_META_ARGV_MAX. The new order is more correct.
unit_gc_sweep() might try to add the unit to gc queue again.
While that becomes no-op as Unit.in_gc_queue is not cleared
yet, it induces minor inconsistency of states.
TheHillBright [Wed, 21 May 2025 10:38:12 +0000 (18:38 +0800)]
journald: clarify doc for usage-related values cap (#37528)
The old description makes users wrongly assume that the cap of 4G
applied, even when the user specifies a value that will result in higher
than 4G. This commit avoids this misunderstanding.
coredump: restore compatibility with older patterns
This was broken in f45b8015513d38ee5f7cc361db9c5b88c9aae704. Unfortunately
the review does not talk about backward compatibility at all. There are
two places where it matters:
- During upgrades, the replacement of kernel.core_pattern is asynchronous.
For example, during rpm upgrades, it would be updated a post-transaction
file trigger. In other scenarios, the update might only happen after
reboot. We have a potentially long window where the old pattern is in
place. We need to capture coredumps during upgrades too.
- With --backtrace. The interface of --backtrace, in hindsight, is not
great. But there are users of --backtrace which were written to use
a specific set of arguments, and we can't just break compatibility.
One example is systemd-coredump-python, but there are also reports of
users using --backtrace to generate coredump logs.
Thus, we require the original set of args, and will use the additional args if
found.
A test is added to verify that --backtrace works with and without the optional
args.
This returns to the original approach proposed in
https://github.com/systemd/systemd/pull/17270. After review, the approach was
changed to use sd_pid_get_owner_uid() instead. Back then, when running in a
typical graphical session, sd_pid_get_owner_uid() would usually return the user
UID, and when running under sudo, geteuid() would return 0, so we'd trigger the
secure path.
sudo may allocate a new session if it is invoked outside of a session (depending
on the PAM config). Since nowadays desktop environments usually start the user
shell through user units, the typical shell in a terminal emulator is not part
of a session, and when sudo is invoked, a new session is allocated, and
sd_pid_get_owner_uid() returns 0 too. Technically, the code still works as
documented in the man page, but in the common case, it doesn't do the expected
thing.
$ build/test-sd-login |& rg 'get_(owner_uid|cgroup|session)'
sd_pid_get_session(0) → No data available
sd_pid_get_owner_uid(0) → 1000
sd_pid_get_cgroup(0) → /user.slice/user-1000.slice/user@1000.service/app.slice/app-ghostty-transient-5088.scope/surfaces/556FAF50BA40.scope
I think it's worth checking for sudo because it is a common case used by users.
There obviously are other mechanisms, so the man page is extended to say that
only some common mechanisms are supported, and to (again) recommend setting
SYSTEMD_LESSSECURE explicitly. The other option would be to set "secure mode"
by default. But this would create an inconvenience for users doing the right
thing, running systemctl and other tools directly, because then they can't run
privileged commands from the pager, e.g. to save the output to a file. (Or the
user would need to explicitly set SYSTEMD_LESSSECURE. One option would be to
set it always in the environment and to rely on sudo and other tools stripping
it from the environment before running privileged code. But that is also fairly
fragile and it obviously relies on the user doing a complicated setup to
support a fairly common use case. I think this decreases usability of the
system quite a bit. I don't think we should build solutions that work in
principle, but are painfully inconvenient in common cases.)
man: rework the description of $SYSTEMD_PAGER and $PAGER
$PAGER wasn't documented, but actually we treat it the same as $SYSTEMD_PAGER,
except for lower priority. And the two variables can be used to disable the
pager, even if $SYSTEMD_PAGERSECURE is not set.
Behaviour is (obviously) not changed by this patch, it intentionally just
updates the docs to match the code.
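The precedence described above can be sketched as follows. `pick_pager()` is a hypothetical helper, not the function used in the code base. Treating an empty string or "cat" as "pager disabled" is an assumption of this sketch, as is the caller-supplied fallback.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Picks the pager to run from the values of $SYSTEMD_PAGER and $PAGER.
 * $SYSTEMD_PAGER wins if set; $PAGER is only consulted otherwise.
 * Returns NULL if the pager is disabled. Assumption of this sketch: an
 * empty string or "cat" disables the pager. */
static const char *pick_pager(const char *systemd_pager, const char *pager, const char *fallback) {
        const char *choice = systemd_pager ? systemd_pager : pager;

        if (!choice)
                return fallback; /* neither variable is set */

        if (choice[0] == '\0' || strcmp(choice, "cat") == 0)
                return NULL; /* pager explicitly disabled */

        return choice;
}
```

A caller would pass the results of getenv("SYSTEMD_PAGER") and getenv("PAGER") directly, since getenv() returns NULL for unset variables.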
man: reword the description of "secure pager" handling
The existing description was not *wrong*, but it was a bit muddled. Let's
reorder the text to give a short intro and then describe what the options
actually do and the clear "true" and "false" cases first, and then describe
autodetection.
Related to https://yeswehack.com/vulnerability-center/reports/346802.