All backslashes that should be single in shell syntax need to be written as "\\" because
our parser will remove one level of quoting. Also, single quotes were doubly nested, which
cannot work.
Yu Watanabe [Wed, 26 Aug 2020 13:31:01 +0000 (22:31 +0900)]
network: fixes gateway assignment through DHCPv4
This fixes the following issue:
- If a DHCP lease does not contains router option, then routes with
`Gateway=_dhcp` setting introduce unexpected results.
This also makes several failure paths critical. And adjust warnings when
classless routes are provided.
udev-test: make sure we run udev tests with selinux assumed off
This is cleaner that way given that we create our own half-virtualizes
device tree, and really shouldn't pull selinux labelling and access
control into that, we can only lose, in particular as our overmounted
/sys/ actually lacks /sys/fs/selinux.
(This fixes udev test woes introduced by #16821 where suddenly the test
would fail because libselinux assumed selinux was on, but selinuxfs
wasn't actually available)
core/socket: fold socket_instantiate_service() into socket_enter_running()
socket_instantiate_service() was doing unit_ref_set(), and the caller was
immediately doing unit_ref_unset(). After we get rid of this, it doesn't seem
worth it to have two functions.
core/socket: we may get ENOTCONN from socket_instantiate_service()
This means that the connection was aborted before we even got to figure out
what the service name will be. Let's treat this as a non-event and close the
connection fd without any further messages.
Code last changed in 934ef6a5. Reported-by: Thiago Macieira <thiago.macieira@intel.com>
With the patch:
systemd[1]: foobar.socket: Incoming traffic
systemd[1]: foobar.socket: Got ENOTCONN on incoming socket, assuming aborted connection attempt, ignoring.
...
Also, when we get ENOMEM, don't give the hint about missing unit.
Gibeom Gwon [Wed, 26 Aug 2020 13:56:01 +0000 (22:56 +0900)]
homed: remember the secret even when the for_state is FIXATING_FOR_ACQUIRE
Remember the secret if the for_state is FIXATING_FOR_ACTIVATION or
FIXATING_FOR_ACQUIRE. This fixes login failures when logging in
to an unfixated user.
Let's make libcryptsetup a dlopen() style dep for PID 1 (i.e. for
RootImage= and stuff), systemd-growfs and systemd-repart. (But leave to
be a regulra dep in systemd-cryptsetup, systemd-veritysetup and
systemd-homed since for them the libcryptsetup support is not auxiliary
but pretty much at the core of what they do.)
This should be useful for container images that want systemd in the
payload but don't care for the cryptsetup logic since dm-crypt and stuff
isn't available in containers anyway.
"crypt-util.c" is such a generic name, let's avoid that, in particular
as libc's/libcrypt's crypt() function is so generically named too that
one might thing this is about that. Let's hence be more precise, and
make clear that this is about cryptsetup, and nothing else.
We already had cryptsetup-util.[ch] in src/cryptsetup/ doing keyfile
management. To avoid the needless confusion, let's rename that file to
cryptsetup-keyfile.[ch].
import: make sure gnu tar complains on tar files with trailing garbage
By default GNU tar will only read the first archive if multiple archives
are concatenated and ignore the rest. If an archive contains trailing
garbage this will hence not be recognized by tar as error, it simply
stops reading when the first archive is done (which might escalate to
SIGPIPE when invoked via a pipe).
Let's add --ignore-zeros to the tar command line when extracting. This
means:
1) if a tar archive was concatenated (i.e. generated with tar -A) we'll
process it correctly.
2) if a tar archive contains trailing garbage tar will now generate an
error message about it, instead of just throwing EPIPE, which makes
things easier to debug as broken files are not silently processed.
I think it's OK for gnu tar to ignore trailing garbage when dealing with
classic tapes drives, i.e. mediums that do not have a size limit
built-in. However, this is not what we are dealing with: we are dealing
with OS images here, that hopefully someone generated with a clean build
system, that were signed and validated and hence should not contain
trailing garbage. Hence it's better to refuse and complain thant to
silently eat up like for classic tape drives.
nspawn: let's make LinkJournal an extended boolean
Let's accept the usual boolean parameters for LinkJournal. It's
confusing otherwise.
Previously we'd accept "no" but not the other values we typically accept
for "false". We'd not accept any values for "true".
With this change we'll accept all true and false values and will do
something somewhat reasonable: any false value is treated like "no"
previously was reated. And any true value is now treated like "auto".
We don't document the new values, since this logic is mostly redundant,
and it's probably better if people consider this an enum rather than a
bool.
logind: only apply ACLs for device currently tagged with "uaccess"
This is about security, hence let's be particularly careful here: only
devices currenlty tagged with "uaccess" will get ACL management, and
it's not sufficient if they once were (though that is used for
filtering).
core: make sure to recheck current udev tag "systemd" before considering a device ready
Let's ensure that a device once tagged can become active/inactive simply
by toggling the current tag.
Note that this means that a device once tagged with "systemd" will
always have a matching .device unit. However, the active/inactive state
of the unit reflects whether it is currently tagged that way (and
doesn't have SYSTEMD_READY=0 set).
This tries to address the "bind"/"unbind" uevent kernel API breakage, by
changing the semantics of device tags.
Previously, tags would be applied on uevents (and the database entries
they result in) only depending on the immediate context. This means that
if one uevent causes the tag to be set and the next to be unset, this
would immediately effect what apps would see and the database entries
would contain each time. This is problematic however, as tags are a
filtering concept, and if tags vanish then clients won't hence notice
when a device stops being relevant to them since not only the tags
disappear but immediately also the uevents for it are filtered including
the one necessary for the app to notice that the device lost its tag and
hence relevance.
With this change tags become "sticky". If a tag is applied is once
applied to a device it will stay in place forever, until the device is
removed. Tags can never be removed again. This means that an app
watching a specific set of devices by filtering for a tag is guaranteed
to not only see the events where the tag is set but also all follow-up
events where the tags might be removed again.
This change of behaviour is unfortunate, but is required due to the
kernel introducing new "bind" and "unbind" uevents that generally have
the effect that tags and properties disappear and apps hence don't
notice when a device looses relevance to it. "bind"/"unbind" events were
introduced in kernel 4.12, and are now used in more and more subsystems.
The introduction broke userspace widely, and this commit is an attempt
to provide a way for apps to deal with it.
While tags are now "sticky" a new automatic device property
CURRENT_TAGS is introduced (matching the existing TAGS property) that
always reflects the precise set of tags applied on the most recent
events. Thus, when subscribing to devices through tags, all devices that
ever had the tag put on them will be be seen, and by CURRENT_TAGS it may
be checked whether the device right at the moment matches the tag
requirements.
test-functions: make sure we test our own libudev instead of the host libudev
When invoking "ldd" to find dependency libraries we already set
$LD_LIBRARY_PATH to point to our own build tree, so that our libraries
are checked, not the host libraries. This is not sufficient howeever, as
libudev is built in a subdir. Add that, too.
user-record-nss: check if strings from pwd/spwd/grp/sgrp are valid utf-8
strv_extend_strv_utf8_only() uses a temporary buffer to make the implementation
conscise. Otherwise we'd have to rewrite all of strv_extend_strv() which didn't
seem worth the trouble for this one use outside of a hot path.
If the data is not serializable, we just pretend it doesn't exists.
This fixes #16683 and https://bugs.gentoo.org/735072 in a second way.
They both are both short and contain similar parts and various helper will be
shared between both parts of the code so it's easier to use a single file.
JSON strings must be utf-8-clean. We also verify this in json_parse_string()
so we would reject a message with invalid utf-8 anyway.
It would probably be slightly cheaper to detect non-conformaning strings in
serialization, but then we'd have to fail serialization. By doing this early,
we give the caller a chance to handle the error nicely.
The test is adjusted to contain a valid utf-8 string after decoding of the
utf-32 encoding in json ("विवेकख्यातिरविप्लवा हानोपायः।", something about the
cessation of ignorance).
basic/hashmap,set: move pointer symbol adjactent to the returned value
I think this is nicer in general, and here in particular we have a lot
of code like:
static inline IteratedCache* hashmap_iterated_cache_new(Hashmap *h) {
return (IteratedCache*) _hashmap_iterated_cache_new(HASHMAP_BASE(h));
}
and it's visually appealing to use the same whitespace in the function
signature and the cast in the body of the function.
The compiler would do this to, esp. with LTO, but we can short-circuit the
whole process and make everything a bit simpler by avoiding the separate
definition.
(It would be nice to do the same for _set_new(), _set_ensure_allocated()
and other similar functions which are one-line trivial wrappers too. Unfortunately
that would require enum HashmapType to be made public, which we don't want
to do.)
Tobias Kaufmann [Mon, 31 Aug 2020 11:48:31 +0000 (13:48 +0200)]
core: fix securebits setting
Desired functionality:
Set securebits for services started as non-root user.
Failure:
The starting of the service fails if no ambient capability shall be
raised.
... systemd[217941]: ...: Failed to set process secure bits: Operation
not permitted
... systemd[217941]: ...: Failed at step SECUREBITS spawning
/usr/bin/abc.service: Operation not permitted
... systemd[1]: abc.service: Failed with result 'exit-code'.
Reason:
For setting securebits the capability CAP_SETPCAP is required. However
the securebits (if no ambient capability shall be raised) are set after
setresuid.
When setresuid is invoked all capabilities are dropped from the
permitted, effective and ambient capability set. If the securebit
SECBIT_KEEP_CAPS is set the permitted capability set is retained, but
the effective and the ambient set are cleared.
If ambient capabilities shall be set, the securebit SECBIT_KEEP_CAPS is
added to the securebits configured in the service file and set together
with the securebits from the service file before setresuid is executed
(in enforce_user).
Before setresuid is executed the capabilities are the same as for pid1.
This means that all capabilities in the effective, permitted and
bounding set are set. Thus the capability CAP_SETPCAP is in the
effective set and the prctl(PR_SET_SECUREBITS, ...) succeeds.
However, if the secure bits aren't set before setresuid is invoked they
shall be set shortly after the uid change in enforce_user.
This fails as SECBIT_KEEP_CAPS wasn't set before setresuid and in
consequence the effective and permitted set was cleared, hence
CAP_SETPCAP is not set in the effective set (and cannot be raised any
longer) and prctl(PR_SET_SECUREBITS, ...) failes with EPERM.
Proposed solution:
The proposed solution consists of three parts
1. Check in enforce_user, if securebits are configured in the service
file. If securebits are configured, set SECBIT_KEEP_CAPS
before invoking setresuid.
2. Don't set any other securebits than SECBIT_KEEP_CAPS in enforce_user,
but set all requested ones after enforce_user.
This has the advantage that securebits are set at the same place for
root and non-root services.
3. Raise CAP_SETPCAP to the effective set (if not already set) before
setting the securebits to avoid EPERM during the prctl syscall.
For gaining CAP_SETPCAP the function capability_bounding_set_drop is
splitted into two functions:
- The first one raises CAP_SETPCAP (required for dropping bounding
capabilities)
- The second drops the bounding capabilities
Why are ambient capabilities not affected by this change?
Ambient capabilities get cleared during setresuid, no matter if
SECBIT_KEEP_CAPS is set or not.
For raising ambient capabilities for a user different to root, the
requested capability has to be raised in the inheritable set first. Then
the SECBIT_KEEP_CAPS securebit needs to be set before setresuid is
invoked. Afterwards the ambient capability can be raised, because it is
in the inheritable and permitted set.
Security considerations:
Although the manpage is ambiguous SECBIT_KEEP_CAPS is cleared during
execve no matter if SECBIT_KEEP_CAPS_LOCKED is set or not. If both are
set only SECBIT_KEEP_CAPS_LOCKED is set after execve.
Setting SECBIT_KEEP_CAPS in enforce_user for being able to set
securebits is no security risk, as the effective and permitted set are
set to the value of the ambient set during execve (if the executed file
has no file capabilities. For details check man 7 capabilities).
Remark:
In capability-util.c is a comment complaining about the missing
capability CAP_SETPCAP in the effective set, after the kernel executed
/sbin/init. Thus it is checked there if this capability has to be raised
in the effective set before dropping capabilities from the bounding set.
If this were true all the time, ambient capabilities couldn't be set
without dropping at least one capability from the bounding set, as the
capability CAP_SETPCAP would miss and setting SECBIT_KEEP_CAPS would
fail with EPERM.
Upon reception of a message which fails in json_parse(), we would proceed to
parse it again from a deferred callback and hang. Once we have realized that
the message is invalid, let's move the pointer in the buffer even if the
message is invalid. We don't want to look at this data again.
(before) $ build-rawhide/userdbctl --output=json user test.user
n/a: varlink: setting state idle-client
/run/systemd/userdb/io.systemd.Multiplexer: Sending message: {"method":"io.systemd.UserDatabase.GetUserRecord","parameters":{"userName":"test.user","service":"io.systemd.Multiplexer"}}
/run/systemd/userdb/io.systemd.Multiplexer: varlink: changing state idle-client → awaiting-reply
/run/systemd/userdb/io.systemd.Multiplexer: New incoming message: {...}
/run/systemd/userdb/io.systemd.Multiplexer: varlink: changing state awaiting-reply → pending-disconnect
/run/systemd/userdb/io.systemd.Multiplexer: New incoming message: {...}
/run/systemd/userdb/io.systemd.Multiplexer: varlink: changing state pending-disconnect → disconnected
^C
(after) $ n/a: varlink: setting state idle-client
/run/systemd/userdb/io.systemd.Multiplexer: Sending message: {"method":"io.systemd.UserDatabase.GetUserRecord","parameters":{"userName":"test.user","service":"io.systemd.Multiplexer"}}
/run/systemd/userdb/io.systemd.Multiplexer: varlink: changing state idle-client → awaiting-reply
/run/systemd/userdb/io.systemd.Multiplexer: New incoming message: {...}
/run/systemd/userdb/io.systemd.Multiplexer: Failed to parse JSON: Invalid argument
/run/systemd/userdb/io.systemd.Multiplexer: varlink: changing state awaiting-reply → pending-disconnect
/run/systemd/userdb/io.systemd.Multiplexer: varlink: changing state pending-disconnect → processing-disconnect
Got lookup error: io.systemd.Disconnected
/run/systemd/userdb/io.systemd.Multiplexer: varlink: changing state processing-disconnect → disconnected
Failed to find user test.user: Input/output error
This should fix #16683 and https://bugs.gentoo.org/735072.
man: add hint how to show password strings with userdbctl
I started working on a command-line switch to show passwords also in
"pretty" mode. I can submit that code for review if anyone thinks that
woul be useful, but after writing the man page I realized that it's a
fairly niche case, and the hint in the man page is a sufficient
replacement.
basic/escape: use consistent location for "*" in function declarations
I think it's nicer to move it to the left, since the function
is already a pointer by itself, and it just happens to return a pointer,
and the two concepts are completely separate.
shared/{user,group}-record-nss: adjust filtering of "valid" passwords
We would reject various passwords that glibc accepts, for example ""
or any descrypted password. Accounts with empty password are definitely
useful, for example for testing or in scenarios where a password is not
needed. Also, using weak encryption methods is probably not a good idea,
it's not the job of our nss helpers to decide that: they should just
faithfully forward whatever data is there.
Also rename the function to make it more obvious that the returned answer
is not in any way certain.
Rework how we cache mtime to figure out if units changed
Instead of assuming that more-recently modified directories have higher mtime,
just look for any mtime changes, up or down. Since we don't want to remember
individual mtimes, hash them to obtain a single value.
This should help us behave properly in the case when the time jumps backwards
during boot: various files might have mtimes that in the future, but we won't
care. This fixes the following scenario:
We have /etc/systemd/system with T1. T1 is initially far in the past.
We have /run/systemd/generator with time T2.
The time is adjusted backwards, so T2 will be always in the future for a while.
Now the user writes new files to /etc/systemd/system, and T1 is updated to T1'.
Nevertheless, T1 < T1' << T2.
We would consider our cache to be up-to-date, falsely.
This check was added in d904afc730268d50502f764dfd55b8cf4906c46f. It would only
apply in the case where the cache hasn't been loaded yet. I think we pretty
much always have the cache loaded when we reach this point, but even if we
didn't, it seems better to try to reload the unit. So let's drop this check.
pid1: use the cache mtime not clock to "mark" load attempts
We really only care if the cache has been reloaded between the time when we
last attempted to load this unit and now. So instead of recording the actual
time we try to load the unit, just store the timestamp of the cache. This has
the advantage that we'll notice if the cache mtime jumps forward or backward.
Also rename fragment_loadtime to fragment_not_found_time. It only gets set when
we failed to load the unit and the old name was suggesting it is always set.
In https://bugzilla.redhat.com/show_bug.cgi?id=1871327
(and most likely https://bugzilla.redhat.com/show_bug.cgi?id=1867930
and most likely https://bugzilla.redhat.com/show_bug.cgi?id=1872068) we try
to load a non-existent unit over and over from transaction_add_job_and_dependencies().
My understanding is that the clock was in the future during inital boot,
so cache_mtime is always in the future (since we don't touch the fs after initial boot),
so no matter how many times we try to load the unit and set
fragment_loadtime / fragment_not_found_time, it is always higher than cache_mtime,
so manager_unit_cache_should_retry_load() always returns true.
The name is misleading, since we aren't really loading the unit from cache — if
this function returns true, we'll try to load the unit from disk, updating the
cache in the process.
Florian Klink [Sat, 29 Aug 2020 17:57:24 +0000 (19:57 +0200)]
homed: fix log message to honor real homework path
This seems to be overridable by setting the SYSTEMD_HOMEWORK_PATH env
variable, but the error message always printed the SYSTEMD_HOMEWORK_PATH
constant.
However, this variable is only defined if HAVE_BLKID is set resulting in
the following build failure if cryptsetup is enabled but not libblkid:
../src/shared/dissect-image.c:1336:34: error: 'N_DEVICE_NODE_LIST_ATTEMPTS' undeclared (first use in this function)
1336 | for (unsigned i = 0; i < N_DEVICE_NODE_LIST_ATTEMPTS; i++) {
|