resolved: only attempt non-answer SOA RRs if they are parents of our query
There's no value in authenticating SOA RRs that are neither answer to
our question nor parent of our question (the latter being relevant so
that we have a TTL from the SOA field for negative caching of the actual
query).
By being to eager here, and trying to authenticate too much we run the
risk of creating cyclic deps between our transactions which then causes
the over-all authentication to fail.
KeyringMode option is useful for user services. Also, documentation for the
option suggests that the option applies to user services. However, setting the
option to any of its allowed values has no effect.
This commit fixes that and removes EXEC_NEW_KEYRING flag. The flag is no longer
necessary: instead of checking if the flag is set we can check if keyring_mode
is not equal to EXEC_KEYRING_INHERIT.
Michal Sekletar [Fri, 14 Dec 2018 14:17:27 +0000 (15:17 +0100)]
journald: correctly attribute log messages also with cgroupsv1
With cgroupsv1 a zombie process is migrated to root cgroup in all
hierarchies. This was changed for unified hierarchy and /proc/PID/cgroup
reports cgroup to which process belonged before it exited.
Be more suspicious about cgroup path reported by the kernel and use
unit_id provided by the log client if the kernel reports that process is
running in the root cgroup.
Users tend to care the most about 'log->unit_id' mapping so systemctl
status can correctly report last log lines. Also we wouldn't be able to
infer anything useful from "/" path anyway.
Tore Anderson [Mon, 17 Dec 2018 08:15:59 +0000 (09:15 +0100)]
resolve: enable EDNS0 towards the 127.0.0.53 stub resolver
This appears to be necessary for client software to ensure the reponse data
is validated with DNSSEC. For example, `ssh -v -o VerifyHostKeyDNS=yes -o
StrictHostKeyChecking=yes redpilllinpro01.ring.nlnog.net` fails if EDNS0 is
not enabled. The debugging output reveals that the `SSHFP` records were
found in DNS, but were considered insecure.
Note that the patch intentionally does *not* enable EDNS0 in the
`/run/systemd/resolve/resolv.conf` file (the one that contains `nameserver`
entries for the upstream DNS servers), as it is impossible to know for
certain that all the upstream DNS servers handles EDNS0 correctly.
dissect-image: wait for the main device and all partitions to be known by udev
Fixes #10526.
Even if we waited for the root device to appear, the mount could still fail if
we didn't wait for udev to initalize the device. In particular, the
/dev/block/n:m path used to mount the device is created by udev, and nspawn
would sometimes win the race and the mount would fail with -ENOENT.
The same wait is done for partitions, since if we try to mount them, the same
considerations apply.
Note: I first implemented a version which just does a loop (with a short wait).
In that approach, udev takes on average ~800 µs to initialize the loopback
device. The approach where we set up a monitor and avoid the loop is a bit
nicer. There doesn't seem to be a significant difference in speed.
With 1000 invocations of 'systemd-nspawn -i image.squashfs echo':
loop (previous approach):
real 4m52.625s
user 0m37.094s
sys 2m14.705s
monitor (this patch):
real 4m50.791s
user 0m36.619s
sys 2m14.039s
dissect-image would wait for the root device and paritions to appear. But if we
had an image with no partitions, we'd not wait at all. If the kernel or udev
were slow in creating device nodes or symlinks, subsequent mount attempt might
fail if nspawn won the race.
Calling wait_for_partitions_to_appear() in case of no partitions means that we
verify that the kernel agrees that there are no partitions. We verify that the
kernel sees the same number of partitions as blkid, so let's that also in this
case.
This makes the failure in #10526 much less likely, but doesn't eliminate it
completely. Stay tuned.
The function interface is the same, except that the output pointer may be NULL.
The implementation is slightly simplified by taking advantage of changes in
ancestor commit 'sd-device: attempt to read db again if it wasn't found', by
not creating a new sd_device object before re-checking the is_initialized
status.
v2:
- In v1, the old object was always used and the device received back from the
sd_device_monitor_start callback was ignored. I *think* the result will be
equivalent in both cases, because by the time we the callback gets called,
the db entry in the filesystem will also exist, and any subsequent access to
properties of the object would trigger a read of the database from disk. But
I'm not certain, and anyway, using the device object received in the callback
seems cleaner.
Normally, we don't care too much about what pahole reports. But this structure
could potentially be allocated for every device on the system, i.e. in a large
number of copies. 5 vs 7 cache lines is nice.
/* size: 400, cachelines: 7, members: 53 */
/* sum members: 330, holes: 12, sum holes: 70 */
/* last cacheline: 16 bytes */
/* size: 320, cachelines: 5, members: 53 */
/* bit holes: 1, sum bit holes: 6 bits */
/* bit_padding: 5 bits */
sd-device: reduce the number of implementations of device_read_db() we keep around
We had two very similar functions: device_read_db_aux and device_read_db,
and a number of wrappers for them:
device_read_db_aux
← device_read_db (in sd-device.c)
← all functions in sd-device.c, including sd_device_is_initialized
← device_read_db_force
← event_execute_rules_on_remove (in udev-event.c)
device_read_db (in device-private.c)
← functions in device_private.c (but not device_read_db_force):
device_get_devnode_{mode,uid,gid}
device_get_devlink_priority
device_get_watch_handle
device_clone_with_db
← called from udevadm, udev-{node,event,watch}.c
Before 7141e4f62c3f220872df3114c42d9e4b9525e43e (sd-device: don't retry loading
uevent/db files more than once), the two implementations were the same. In that
commit, device_read_db_aux was changed. Those changes were reverted in the parent
commit, so the two implementations are now again the same except for superficial
differences. This commit removes device_read_db (in sd-device.c), and renames
device_read_db_aux to device_read_db_internal and makes everyone use this one
implementation. There should be no functional change.
sd-device: attempt to read db again if it wasn't found
This mostly reverts "sd-device: don't retry loading uevent/db files more than
once", 7141e4f62c3f220872df3114c42d9e4b9525e43e. We will retry if we couldn't
access the file, but not if parsing failed.
Not re-reading the database at all just doesn't seem like a good idea. We have
two implementations of device_read_db, and one does that, and the other retries
to read the db. Re-reading seems more useful, since we can create the object
and then access properties as some later time when we know that the device has
been initialized and we can get useful results. Otherwise, we force the user to
destroy this object and create a new one.
This changes device_read_uevent_file() and device_read_db_aux(). See next
commit for description of where those functions are used.
NeilBrown [Thu, 4 Oct 2018 05:49:22 +0000 (15:49 +1000)]
core/mount: minimize impact on mount storm.
If we create 2000 mounts (on a 1-CPU qemu VM) with
mkdir -p /MNT/{1..2000}
time for i in {1..2000}; do mount --bind /etc /MNT/$i ; done
it takes around 20 seconds to complete. Much of this time is taken up
by systemd repeatedly processing /proc/self/mountinfo.
If I disable the processing, the time drops to about 4 seconds.
I have reports that on a larger system with multiple active user sessions, each
with it's own systemd, the impact can be higher.
One particular use-case where a large number of mounts can be expected in quick
succession is when the "clearcase" SCM starts up.
This patch modifies the handling up events from /proc/self/mountinfo so
that systemd backs off when a storm is detected. Specifically the time to process
mountinfo is measured, and the process will not be repeated until 10 times
that duration has passed. This ensures systemd won't use more than 10% of
real time processing mountinfo.
With this patch, my test above takes about 5 seconds.
Let's make sure the first hashmap we destroy also frees all machines,
because otherwise when freeing the other hashmaps we'll try to
deregister the contained machines from the hashmaps already destroyed.
Apparently, people do use nscd, hence play somewhat nice with it, and
let's explicitly flush nscd caches whenever we register a new
user/group.
This patch only adds the actual refresh request invocation. Later
commits then issue this call at appropriate moments.
Note that the nscd protocol is not officially documented though very
simple. This code is written very defensively so that incompatibilities
don't affect us much.
Given that glibc really has a duty to maintain compat between
differently compiled programs and their system nscd they can't break API
and thus it should be safe for us to implement an alternative,
minimalistic client.
Ideally this kind of explicit, global cache flushing would not be necessary.
However nscd currently has no cache coherency protocol, hence we can't
really implement this better. The only concept it knows is a TTL for
positive hosts lookups. Hoewver for negative lookups or any of the other
tables nothing is available.
lldp: simplify compare_func, using ?: to chain comparisons
The ?: operator is very useful for chaining comparison functions
(strcmp, memcmp, CMP), since its behavior is to return the result
of the comparison function call if non-zero, or continue evaluating
the chain of comparison functions.
This simplifies the code in that using a temporary `r` variable
to store the function results is no longer necessary and the checks
for non-zero to return are no longer needed either, resulting in a
typical three-fold reduction to the number of lines in the code.
Introduce a new memcmp_nn() to compare two memory buffers in
lexicographic order, taking length in consideration.
Tested: $ ninja -C build/ test
All test cases pass. In particular, test_multiple_neighbors_sorted()
in test-lldp would catch regressions introduced by this commit.
fileio: fail early if we can't return the number of bytes we read anymore in an int
This is mostly paranoia, but let's better be safer than sorry. This of
course means there's always an implicit limit to how much we can read at
a time of 2G. But that should be ample.
time-out
n 1: a brief suspension of play; "each team has two time-outs left"
From The Free On-line Dictionary of Computing (18 March 2015) [foldoc]:
timeout
A period of time after which an error condition is raised if
some event has not occured. A common example is sending a
message. If the receiver does not acknowledge the message
within some preset timeout period, a transmission error is
assumed to have occured.
Thomas Haller [Thu, 13 Dec 2018 18:59:46 +0000 (19:59 +0100)]
in-addr-util: fix undefined result for in4_addr_netmask_to_prefixlen(<0.0.0.0>)
u32ctz() was undefined for zero due to __builtin_ctz() [1].
Explicitly check for zero to make the behavior defined.
Note that this issue only affected in4_addr_netmask_to_prefixlen()
which is the only caller.
It may seem slightly odd, to return 32 (bits) for utz(0). But that
is what in4_addr_netmask_to_prefixlen() needs, and it probably makes
the most sense here.
Yu Watanabe [Fri, 14 Dec 2018 00:28:29 +0000 (09:28 +0900)]
sd-device: do not modify socket option(s) if socket is passed by PID1
If the socket fd is passed by PID1, then it is created by .socket unit
and we have already set sufficient option(s) for the socket.
So, let's not touch the passed socket.
units: replace symlinks in units/user/ by real files
We already *install* those as real files since de78fa9ba0be55b01066ca5a716c6673d76b817b.
Meson will start to copy symlinks as-is, so we would get dangling symlinks in
/usr/lib/systemd/user/.
I considered the layout in our sources to match the layout in the installation
filesystem (i.e. creating units/system/ and moving all files from units/ to
units/system/), but that seems overkill. By using normal files for both we get
some duplication, but those files change rarely, so it's not a big downside in
practice.