Bastien Nocera [Wed, 2 Dec 2020 11:40:42 +0000 (12:40 +0100)]
udev: Extract RAM properties from DMI information
Add memory_id program to set properties about the physical memory
devices in the system. This is useful on machines with removable memory
modules to show how the machine can be upgraded, and on all devices to
detect the actual RAM size, without relying on the OS accessible amount.
test-login: skip consistency checks when logind is not active
There are two ways in swich sd_login_* functions acquire data:
some are derived from the cgroup path, but others use the data serialized
by logind.
When the tests are executed under Fedora's mock, without systemd-spawn
but instead in a traditional chroot, test-login gets confused:
the "outside" cgroup path is visible, so sd_pid_get_unit() and
sd_pid_get_session() work, but sd_session_is_active() and other functions
that need logind data fail.
Such a buildroot setup is fairly bad, but it can be encountered in the wild, so
let's just skip the tests in that case.
/* Information printed is from the live system */
sd_pid_get_unit(0, …) → "session-237.scope"
sd_pid_get_user_unit(0, …) → "n/a"
sd_pid_get_slice(0, …) → "user-1000.slice"
sd_pid_get_session(0, …) → "237"
sd_pid_get_owner_uid(0, …) → 1000
sd_pid_get_cgroup(0, …) → "/user.slice/user-1000.slice/session-237.scope"
sd_uid_get_display(1000, …) → "(null)"
sd_uid_get_sessions(1000, …) → [0] ""
sd_uid_get_seats(1000, …) → [0] ""
Assertion 'r >= 0' failed at src/libsystemd/sd-login/test-login.c:104, function test_login(). Aborting.
Devon Pringle [Mon, 14 Dec 2020 04:22:18 +0000 (14:22 +1000)]
networkd: handle ignoring ll gateway being link ll
In the event where network discovery gets a route with the gateway being
the interfaces local link address, networkd will fail the interface.
systemd-networkd[44319]: br_lan: Configuring route: dst: fdcd:41a4:5559:ec03::/64, src: n/a, gw: fe80::e4da:7eff:fe77:5c5e, prefsrc: n/a, scope: global, table: main, proto: ra, type: unicast
systemd-networkd[44319]: br_lan: Could not set NDisc route or address: Gateway can not be a local address. Invalid argument
systemd-networkd[44319]: br_lan: Failed
systemd-networkd[44319]: br_lan: State changed: configuring -> failed
This patch, instead of allowing the interface to fail, will instead log
the event and skip setting the route.
hostnamed,shared/hostname-setup: expose the origin of the current hostname
In hostnamed this is exposed as a dbus property, and in the logs in both
places.
This is of interest to network management software and such: if the fallback
hostname is used, it's not as useful as the real configured thing. Right now
various programs try to guess the source of hostname by looking at the string.
E.g. "localhost" is assumed to be not the real hostname, but "fedora" is. Any
such attempts are bound to fail, because we cannot distinguish "fedora" (a
fallback value set by a distro), from "fedora" (received from reverse dns),
from "fedora" read from /etc/hostname.
/run/systemd/fallback-hostname is written with the fallback hostname when
either pid1 or hostnamed sets the kernel hostname to the fallback value. Why
remember the fallback value and not the transient hostname in /run/hostname
instead?
We have three hostname types: "static", "transient", fallback".
– Distinguishing "static" is easy: the hostname that is set matches what
is in /etc/hostname.
– Distingiushing "transient" and "fallback" is not easy. And the
"transient" hostname may be set outside of pid1+hostnamed. In particular,
it may be set by container manager, some non-systemd tool in the initramfs,
or even by a direct call. All those mechanisms count as "transient". Trying
to get those cases to write /run/hostname is futile. It is much easier to
isolate the "fallback" case which is mostly under our control.
And since the file is only used as a flag to mark the hostname as fallback,
it can be hidden inside of our /run/systemd directory.
For https://bugzilla.redhat.com/show_bug.cgi?id=1892235.
hostnamed: stop discriminating against "localhost" in /etc/hostname
We would sometimes ignore localhost-style names in /etc/hostname. That is
brittle. If the user configured some hostname, it's most likely because they
want to use that as the hostname. If they don't want to use such a hostname,
they should just not create the config. Everything becomes simples if we just
use the configured hostname as-is.
This behaviour seems to have been a workaround for Anaconda installer and other
tools writing out /etc/hostname with the default of "localhost.localdomain".
Anaconda PR to stop doing that: https://github.com/rhinstaller/anaconda/pull/3040.
That might have been useful as a work-around for other programs misbehaving if
/etc/hostname was not present, but nowadays it's not useful because systemd
mostly controls the hostname and it is perfectly happy without that file.
Apart from making things simpler, this allows users to set a hostname like
"localhost" and have it honoured, if such a whim strikes them.
shared/hostname-setup: leave the terminator byte alone
gethostname(3) says it's unspecified whether the string is properly terminated
when the hostname is too long. We created a buffer with one extra byte, and it
seems the intent was to let that byte serve as terminator even if we get an
unterminated string from gethostname().
Move hostname setup logic to new shared/hostname-setup.[ch]
No functional change, just moving a bunch of things around. Before
we needed a rather complicated setup to test hostname_setup(), because
the code was in src/core/. When things are moved to src/shared/
we can just test it as any function.
The test is still "unsafe" because hostname_setup() may modify the
hostname.
hostnamed: expose the fallback-hostname setting as a const dbus property
Various users want to know what the fallback hostname is. Since it was made
configurable in 8146c32b9264a6915d467a5cab1a24311fbede7e, we didn't expose this
nicely.
man/hostnamectl,hostaned,hostname1: adjust the docs to match reality
The semantics were significantly changed in c779a44222161155c039a7fd2fd304c006590ac7
("hostnamed: Fix the way that static and transient host names interact", Feb. 2014),
but when the dbus api documentation was imported much later, it wasn't properly
adjusted to describe those new semantics.
Let's ove various bits and pieces around so that they are in more appropriate
places. Drop recommendations to set the hostname for DHCP or mDNS purposes.
Nowadays we expect tools that want to expose some different hostname to the
outside to manage that internally without affecting visible state. Also drop
mentions of DHCP or mDNS directly setting the hostname, since nowadays network
management software is expected to (and does) go through hostnamed.
Also, add a high-level description of semantics. It glosses over the details of
handling of localhost-style names. Later commits will remove this special handling
anyway.
Upgrading to qemu 5.2 breaks TEST-36-NUMAPOLICY like:
qemu-system-x86_64: total memory for NUMA nodes (0x0) should
equal RAM size (0x20000000)
Use the new (as in >=2014) form of memdev in test 36:
-object memory-backend-ram,id=mem0,size=512M -numa node,memdev=mem0,nodeid=0
Since some target systems are as old as qemu 1.5.3 (CentOS7) but the new
kind to specify was added in qemu 2.1 this needs to add version parsing and
add the argument only when qemu is >=5.2.
Fixes #17986.
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
bus-util: improve logging when we can't connect to the bus
Previously, we'd already have explicit logging for the case where
$XDG_RUNTIME_DIR is not set. Let's also add some explicit logging for
the EPERM/ACCESS case. Let's also in both cases suggest the
--machine=<user>@.host syntax.
And while we are at it, let's remove side-effects from the macro.
By checking for both the EPERM/EACCES case and the $XDG_RUNTIME_DIR case
we will now catch both the cases where people use "su" to issue a
"systemctl --user" operation, and those where they (more correctly, but
still not good enough) call "su -".
So far, the bridge always acted as if "--system" was used, i.e. would
unconditionally connect to the system bus. Let's add "--user" too, to
connect to the users session bus.
This is mostly for completeness' sake.
I wanted to use this when making sd-bus's ability to connect to other
user's D-Bus busses work, but it didn't exist so far. In the interest of
keeping things compatible the implementation in sd-bus will not use the
new "--user" switch, and instead manually construct the right bus path
via "--path=", but we still should add the proper switches, as
preparation for a brighter future, one day.
sd-bus: add API for connecting to a specific user's user bus of a specific container
This is unfortunately harder to implement than it sounds. The user's bus
is bound a to the user's lifecycle after all (i.e. only exists as long
as the user has at least one PAM session), and the path dynamically (at
least theoretically, in practice it's going to be the same always)
generated via $XDG_RUNTIME_DIR in /run/.
To fix this properly, we'll thus go through PAM before connecting to a
user bus. Which is hard since we cannot just link against libpam in the
container, since the container might have been compiled entirely
differently. So our way out is to use systemd-run from outside, which
invokes a transient unit that does PAM from outside, doing so via D-Bus.
Inside the transient unit we then invoke systemd-stdio-bridge which
forwards D-Bus from the user bus to us. The systemd-stdio-bridge makes
up the PAM session and thus we can sure tht the bus exists at least as
long as the bus connection is kept.
Or so say this differently: if you use "systemctl -M lennart@foobar"
now, the bus connection works like this:
2. systemd-run gets a connection to the "foobar" container's
system bus, and invokes the "systemd-stdio-bridge" binary as
transient service inside a PAM session for the user "lennart"
3. The systemd-stdio-bridge then proxies our D-Bus traffic to
the user bus.
sd-bus (on host) → systemd-run (on host) → systemd-stdio-bridge (in container)
Complicated? Well, to some point yes, but otoh it's actually nice in
various other ways, primarily as it makes the -H and -M codepaths more
alike. In the -H case (i.e. connect to remote host via SSH) a very
similar three steps are used. The only difference is that instead of
"systemd-run" the "ssh" binary is used to invoke the stdio bridge in a
PAM session of some other system. Thus we get similar implementation and
isolation for similar operations.
So far when asked for augmented bus credentials and the process was
already gone we'd fail fatally. Let's make this graceful instead, and
never allow augmenting fail due to PID having vanished — unless the
augmenting is the explicit and only purpose of the requested operation.
This should be safe as clients have to explicitly query the acquired
creds anyway and handle if they couldn't be acquired. Moreover we
already handle permission problems gracefully, thus clients must be
ready to deal with missing creds.
This is useful to make selinux authorization work for short-lived client
proceses. PReviously we'd augment creds to have more info to log about
(the selinux decision would not be based on augmented data however,
because that'd be unsafe), and would fail if we couldn't get it. Now,
we'll try to acquire the data, but if we cannot acquire it, we'll still
do the selinux check, except that logging will be more limited.
hostname-util: flagsify hostname_is_valid(), drop machine_name_is_valid()
Let's clean up hostname_is_valid() a bit: let's turn the second boolean
argument into a more explanatory flags field, and add a flag that
accepts the special name ".host" as valid. This is useful for the
container logic, where the special hostname ".host" refers to the "root
container", i.e. the host system itself, and can be specified at various
places.
let's also get rid of machine_name_is_valid(). It was just an alias,
which is confusing and even more so now that we have the flags param.
logs-show: drop redundant validation of machine name
The immediately following container_get_leader() call validate the name
anyway, no need to twice exactly the same way twice immediately after
each other.
Gaurav [Fri, 4 Dec 2020 11:15:15 +0000 (16:45 +0530)]
Detect special character in dbus interface name
Added debug log to detect special character in dbus interface names.
Helps to detect a case mentioned in https://github.com/systemd/systemd/issues/14636
Dan Streetman [Wed, 9 Dec 2020 20:24:09 +0000 (15:24 -0500)]
test-network: increase wait_online timeout to handle longer dhcpv4 transient timeout
Previous commits changed the dhcpv4 retransmission algorithm to be
slightly slower, changing the amount of time it takes to notify
systemd-networkd that the dhcpv4 configuration has (transiently)
failed from around 14 second up to 28 seconds.
Since the test_dhcp_client_with_ipv4ll_without_dhcp_server test
configures an interface to use dhcpv4 without any operating dhcpv4
server running, it must increase the amount of time it waits for
the test interface to reach degraded state.
Dan Streetman [Wed, 9 Dec 2020 19:32:06 +0000 (14:32 -0500)]
sd-dhcp-client: correct retransmission timeout to match RFC
This changes the retransmission timeout algorithm for requests
other than RENEW and REBIND. Previously, the retransmission timeout
started at 2 seconds, then doubling each retransmission up to a max
of 64 seconds. This is changed to match what RFC2131 section 4.1 describes,
which skips the initial 2 second timeout and starts with a 4 second timeout
instead. Note that -1 to +1 seconds of random 'fuzz' is added to each
timeout, in previous and current behavior.
This change is therefore slightly slower than the previous behavior in
attempting retransmissions when no server response is received, since the
first transmission times out in 4 seconds instead of 2.
Since TRANSIENT_FAILURE_ATTEMPTS is set to 3, the previous length of time
before a transient failure was reported back to systemd-networkd was
2 + 4 + 8 = 14 seconds, plus, on average, 3 seconds of random 'fuzz' for
a transient failure timeout between 11 and 17 seconds. Now, since the
first timeout starts at 4, the transient failure will be reported at
4 + 8 + 16 = 28 seconds, again plus 3 random seconds for a transient
failure timeout between 25 and 31 seconds.
Additionally, if MaxAttempts= is set, it will take slightly longer to
reach than with previous behavior.
Use the request timeout algorithm specified in RFC2131 section 4.4.5 for
handling timed out RENEW and REBIND requests.
This changes behavior, as previously only 2 RENEW and 2 REBIND requests
were sent, no matter how long the lease lifetime. Now, requests are
send according to the RFC, which results in starting with a timeout
of 1/2 the t1 or t2 period, and halving the timeout for each retry
down to a minimum of 60 seconds.
Dan Streetman [Tue, 8 Dec 2020 20:36:19 +0000 (15:36 -0500)]
sd-dhcp-client: simplify dhcp4 t1/t2 parsing
The parsing of the dhcpv4 lease lifetime, as well as the t1/t2
times, is simplified by this commit.
This differs from previous behavior; previously, the lease lifetime and
t1/t2 values were modified by random 'fuzz' by subtracting 3, then adding
a random number between 0 and (slightly over) 2 seconds. The resulting
values were therefore always between 1-3 seconds shorter than the value
provided by the server (or the default, in case of t1/t2). Now, as
described in RFC2131, the random 'fuzz' is between -1 and +1 seconds,
meaning the actual t1 and t2 value will be up to 1 second earlier or
later than the server-provided (or default) t1/t2 value.
This also differs in handling the lease lifetime, as described above it
previously was adjusted by the random 'fuzz', but the RFC does not state
that the lease expiration time should be adjusted, so now the code uses
exactly the lease lifetime as provided by the server with no adjustment.
RFC2131, providing the details for dhcpv4, has specific retransmission
intervals that it outlines. This adds functions to compute the timeouts
as the RFC describes.
When something fails, we need some logs to figure out what happened.
This is primarily relevant for connection errors, but in general we
want to log about all errors, even if they are relatively unlikely.
We want one log on failure, and generally no logs on success.
The general idea is to not log in static functions, and to log in the
non-static functions. Non-static functions which call other functions
may thus log or not log as appropriate to have just one log entry in the
end.