Sergey Bugaev [Mon, 22 Mar 2021 15:31:12 +0000 (18:31 +0300)]
log: protect errno in log_open()
Commit 0b1f3c768ce1bd1490a5e53f539976dcef8ca765 has introduced log_open()
calls after exec fails post-fork. However, the log_open() call itself could
change the value of errno, which, for me, manifested in:
$ coredumpctl gdb
...
Failed to invoke gdb: Success
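The fix is the usual "save errno, call, restore errno" pattern. Below is a minimal C sketch in which fake_log_open() is a hypothetical stand-in for log_open() that clobbers errno. Since strerror(0) is "Success", a clobbered errno of 0 produces exactly the message above.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for log_open(): like many functions, it may
 * overwrite errno as a side effect even when it succeeds. */
static void fake_log_open(void) {
    errno = 0;
}

/* Report an exec() failure: save errno before reopening the log, then
 * restore it so the caller still sees the original error. */
static int report_exec_failure(void) {
    int saved_errno = errno;   /* value left behind by the failed exec */
    fake_log_open();           /* may change errno */
    errno = saved_errno;       /* restore before anyone reads it */
    return -errno;
}
```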
repart: make sure to grow partition table after growing backing loopback file
This fixes the --size= switch, i.e. where we grow a disk image: after
growing it we need to expand the partition table so that its idea of the
medium size matches the new reality. Otherwise our disk size
calculations in the subsequent steps might still use the original
ungrown size.
(This used to work; I guess this got borked when libfdisk learnt the
concept of "minimized" partition tables.)
Michael Gisbers [Fri, 19 Mar 2021 10:38:53 +0000 (11:38 +0100)]
correct incorrect command in NEWS (#19048)
* for /dev/vsock a file permission of 0o666 was mentioned, but 0666 is probably better understood, so let's use that
* correct non-existent command 'ip dev'
Frantisek Sumsal [Thu, 18 Mar 2021 10:59:53 +0000 (11:59 +0100)]
coccinelle: filter out a couple of 'false-positive' transformations
* flag-set.cocci: perform the transformation only if the second
argument is a constant
* sd-journal/lookup3.c: skip the cocci completely for this file, since
it's not "ours"
* strjoina.cocci: skip the transformation on the "test_strjoina" test,
since it intentionally tests the "incorrect" expression we're trying to
transform (the same thing was already done in strjoin.cocci)
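The following sketch shows what the flag-set transformation does. It assumes the rule rewrites `(x & y) == y` into a FLAGS_SET(x, y) macro defined as below; this is a local stand-in, not copied from systemd's headers. With a constant mask the two forms agree. With an arbitrary expression as the second argument, the open-coded form evaluates it twice, which is one plausible reason to restrict the rule to constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in, assumed to mirror systemd's FLAGS_SET():
 * true if every bit of `flags` is also set in `v`. */
#define FLAGS_SET(v, flags) ((~(v) & (flags)) == 0)

/* The open-coded form that flag-set.cocci rewrites. */
static bool flags_set_open_coded(unsigned v, unsigned flags) {
    return (v & flags) == flags;
}
```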
Luca Boccassi [Wed, 17 Mar 2021 14:34:36 +0000 (14:34 +0000)]
resolved: simplify min_ttl check
rr is asserted a few lines above, so there is no need to check it for NULL.
Coverity-found issue, CID 1450844
CID 1450844: Null pointer dereferences (REVERSE_INULL)
Null-checking "rr" suggests that it may be null, but it has already
been dereferenced on all paths leading to the check.
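The sketch below shows the shape of the simplification, using a hypothetical, trimmed-down record type and helper. Once rr has been asserted and dereferenced, an extra `rr &&` test is dead code, and that is what REVERSE_INULL points out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down resource record. */
typedef struct ResourceRecord {
    uint32_t ttl;
} ResourceRecord;

/* Lower *min_ttl to the record's TTL if that is shorter. */
static void update_min_ttl(const ResourceRecord *rr, uint32_t *min_ttl) {
    assert(rr);
    assert(min_ttl);

    uint32_t ttl = rr->ttl;   /* rr is dereferenced here ... */

    /* ... so the old "if (rr && ttl < *min_ttl)" could never see NULL */
    if (ttl < *min_ttl)
        *min_ttl = ttl;
}
```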
fileio: don't use realloc() in read_full_virtual_file()
We aren't interested in the data previously read, hence free() followed
by malloc() is typically better, since it means libc doesn't have to
preserve the contained data needlessly.
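Below is a minimal sketch of the pattern using a hypothetical helper, grow_discarding(). realloc() must keep the old bytes, copying them if the block moves; free() plus malloc() does not.

```c
#include <assert.h>
#include <stdlib.h>

/* Grow a scratch buffer whose old contents are no longer needed.
 * Returns the new buffer (contents undefined), or NULL on allocation
 * failure, in which case *buf has been freed and set to NULL. */
static void *grow_discarding(void **buf, size_t new_size) {
    assert(buf);
    free(*buf);               /* drop old data instead of copying it */
    *buf = malloc(new_size);
    return *buf;
}
```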
systemctl: specify read_full_file() size argument as NULL
If it is specified as NULL, read_full_file() assumes the caller wants a C
string, and it looks for embedded NUL bytes to ensure that works. Given
we don't actually use the size argument here, let's drop it.
(In one case the size argument is used, but not for actually processing
the full returned data, just as a shortcut to compare things with
the original string. Let's drop its use there too, given the risk of
embedded NUL bytes in the data read.)
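The embedded NUL check can be sketched as follows. finish_read() is a hypothetical helper, and -EBADMSG is the error code assumed here for the rejection.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* If ret_size is NULL the caller will treat the data as a NUL-terminated
 * C string, so an embedded NUL byte would silently truncate it: reject. */
static int finish_read(const char *data, size_t len, size_t *ret_size) {
    if (ret_size) {
        *ret_size = len;      /* caller copes with binary data itself */
        return 0;
    }
    if (memchr(data, 0, len))
        return -EBADMSG;      /* error code assumed for this sketch */
    return 0;
}
```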
tree-wide: use read_full_virtual_file() where appropriate
Wherever we read virtual files we should use
read_full_virtual_file(), to make sure we get a consistent response,
given how weird the kernel's handling of partial reads on such file
systems is.
Anita Zhang [Tue, 16 Mar 2021 00:21:45 +0000 (17:21 -0700)]
oomd: sort by pgscan rate not pgscan
For pressure based killing we want to target who has the highest
increase in pgscan from the previous interval (vs. the previous logic
which used raw pgscan). This will prevent biasing towards long running
cgroups as mentioned in #19007.
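Sorting by rate can be sketched as a qsort() comparator over a hypothetical CgroupCandidate struct that keeps the previous interval's counter. The counter-reset guard is an added assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-cgroup record with the cumulative pgscan counter from
 * the current and the previous monitoring interval. */
typedef struct CgroupCandidate {
    const char *path;
    uint64_t pgscan;
    uint64_t last_pgscan;
} CgroupCandidate;

/* Increase since the previous interval; treat a counter reset as 0. */
static uint64_t pgscan_rate(const CgroupCandidate *c) {
    return c->pgscan >= c->last_pgscan ? c->pgscan - c->last_pgscan : 0;
}

/* qsort() comparator: highest pgscan increase first. Sorting by the raw
 * cumulative counter would favor long-running cgroups instead. */
static int compare_pgscan_rate(const void *a, const void *b) {
    uint64_t ra = pgscan_rate(a), rb = pgscan_rate(b);
    return ra < rb ? 1 : ra > rb ? -1 : 0;
}
```

With this ordering, a cgroup whose pgscan grew by 4000 in the last interval is picked before one with a much larger but nearly static total.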
Mike Gilbert [Tue, 9 Mar 2021 22:57:37 +0000 (17:57 -0500)]
cg_unified_cached: return ENOMEDIUM if we cannot find a known hierarchy
When the test suite is being run in a foreign environment,
/sys/fs/cgroup might not be set up in a way that we recognize.
Returning ENOMEDIUM causes the tests to be skipped in this case.
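Below is a simplified sketch of the classification, keyed on the f_type that statfs() reports for /sys/fs/cgroup. classify_cgroup_fs() and its return values are hypothetical. The magic numbers are those from <linux/magic.h>, redefined here so that the sketch is self-contained.

```c
#include <assert.h>
#include <errno.h>

#ifndef CGROUP2_SUPER_MAGIC
#define CGROUP2_SUPER_MAGIC 0x63677270
#endif
#ifndef TMPFS_MAGIC
#define TMPFS_MAGIC 0x01021994
#endif

enum { CG_UNIFIED = 1, CG_LEGACY_OR_HYBRID = 2 };

/* The real logic distinguishes more cases; the point here is the
 * fallback: an unrecognized layout yields -ENOMEDIUM so that callers
 * such as the test suite can skip instead of failing. */
static int classify_cgroup_fs(long f_type) {
    switch (f_type) {
    case CGROUP2_SUPER_MAGIC:
        return CG_UNIFIED;
    case TMPFS_MAGIC:
        return CG_LEGACY_OR_HYBRID;   /* would need further probing */
    default:
        return -ENOMEDIUM;
    }
}
```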
Yu Watanabe [Mon, 8 Mar 2021 06:39:53 +0000 (15:39 +0900)]
sd-event: re-check new epoll events when a child event is queued
Previously, when a process outputs something and exits just after
epoll_wait() but before process_child(), the IO event is ignored
even if the IO event has higher priority. See #18190.
This can be solved by checking epoll events again after process_child().
However, there exists a possibility that another process outputs and
exits just after process_child() but before the second epoll_wait().
In that case the IO event is still processed first, even if it has
lower priority than the child event.
So, this makes sd-event check new epoll events and child events in a
loop until no new event is detected. To prevent an infinite loop, the
maximum number of iterations is set to 10.
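The bounded loop can be modelled as follows. The two hooks are hypothetical stand-ins for epoll_wait() and process_child(). This sketch shows the loop structure only, not sd-event's actual code.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_RECHECKS 10   /* iteration cap described above */

/* Each hook returns true if it found (and queued) a new event. */
typedef struct EventSources {
    bool (*poll_io)(void *userdata);
    bool (*poll_children)(void *userdata);
    void *userdata;
} EventSources;

/* Re-check both kinds of sources until a round finds nothing new, so
 * events that raced with the previous check are queued before
 * dispatching by priority. Returns the number of rounds performed. */
static int collect_pending(const EventSources *s) {
    for (int round = 1; round <= MAX_RECHECKS; round++) {
        bool new_io = s->poll_io(s->userdata);
        bool new_child = s->poll_children(s->userdata);
        if (!new_io && !new_child)
            return round;      /* nothing new: safe to dispatch */
    }
    return MAX_RECHECKS;       /* give up rather than loop forever */
}

/* Test doubles: report a new event while a counter is positive. */
static bool poll_countdown(void *userdata) {
    int *left = userdata;
    if (*left <= 0)
        return false;
    (*left)--;
    return true;
}

static bool poll_nothing(void *userdata) {
    (void) userdata;
    return false;
}
```

For example, with poll_countdown() on a counter of 3 for IO and poll_nothing() for children, collect_pending() returns 4: three rounds that find something, then one quiet round.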
resolved: don't flush answer RRs on CNAME redirect too early
When doing a CNAME/DNAME redirect let's first check if the answer we
already have fully answers the redirected question. If so, let's
use that. If not, let's properly restart things.
This simply removes one call to dns_answer_reset() that was placed too
early: instead of resetting when we detect a CNAME/DNAME redirect, do so
only after checking if the answer we already have doesn't match the
reply, and then decide to *actually* follow it. Or in other words: rely
on the dns_answer_reset() call in dns_query_go() which we'll call to
actually begin with the redirected question.
(This doesn't really matter as much as one might think, since our cache
stepped in anyway and answered the questions before going back to the
network. However, this adds noise if RRs with very short TTLs are cached
– which some CDNs do – and is of course relevant when people turn off
the local cache.)
Previously by mistake we'd always match every single reply we get in a
CNAME chain to the original question from the stub client. That's
broken, we need to test it against the CNAME query we are currently
looking at.
The effect of this incorrect matching was that we'd assign the RRs to
the wrong section since we'd assume they'd be auxiliary answers instead
of primary answers.
When responding from DNS cache, let's slightly tweak how the TTL is
lowered: as before let's round down when converting from our internal µs
to the external seconds. (This is preferable, since records should
better be cached too short instead of too long.) Let's avoid rounding
down to zero though, since that has special semantics in many cases (in
particular mDNS). Let's just use 1s in that case.
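Below is a sketch of the conversion. ttl_from_remaining_usec() is a hypothetical name, and the sketch assumes that expired entries are never served from the cache.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_SEC UINT64_C(1000000)

/* Round the remaining lifetime down to whole seconds, so records are
 * cached too briefly rather than too long, but never down to 0, which
 * has special meaning (e.g. in mDNS). */
static uint32_t ttl_from_remaining_usec(uint64_t left_usec) {
    uint64_t s = left_usec / USEC_PER_SEC;
    if (s == 0)
        return 1;
    return s > UINT32_MAX ? UINT32_MAX : (uint32_t) s;
}
```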
resolved: take shortest TTL of all of RRs in answer as cache lifetime
We nowadays cache full answer RRset combinations instead of just the
exact matching rrset. This means we should not cache RRs that are not
immediate answers to our question for longer than their own TTLs. Or in
other words: let's determine the shortest TTL of all RRs in the whole
answer, and use that as cache lifetime.
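Determining the cache lifetime then amounts to taking the minimum over all TTLs in the answer. Below is a sketch over a hypothetical flat array of TTLs. How an empty answer is handled is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ttls holds the TTL of every RR in the reply, including auxiliary ones
 * (e.g. the CNAME that led to the final record). */
static uint32_t answer_cache_lifetime(const uint32_t *ttls, size_t n) {
    assert(ttls || n == 0);
    uint32_t min_ttl = UINT32_MAX;
    for (size_t i = 0; i < n; i++)
        if (ttls[i] < min_ttl)
            min_ttl = ttls[i];
    return n > 0 ? min_ttl : 0;   /* empty answer: nothing to cache */
}
```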
Luca Boccassi [Sun, 14 Mar 2021 12:36:15 +0000 (12:36 +0000)]
man: specify that ProtectProc= does not work with root/cap_sys_ptrace
When using hidepid=invisible on procfs, the kernel will check if the
gid of the process trying to access /proc is the same as the gid of
the process that mounted the /proc instance, or if it has the ptrace
capability.
Given we set up the /proc instance as root for system services, services
that run as root pass the gid check, so hidepid=invisible has no effect
on them. The same restriction applies to CAP_SYS_PTRACE: if a process
runs with it, hidepid=invisible has no effect either.
ProtectProc= can therefore only be used effectively with User= or
DynamicUser=yes, and without CAP_SYS_PTRACE.
Update the documentation to explicitly state these limitations.
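To make the two conditions concrete, below is a simplified C model of the rule as stated above. It is not the kernel's code and ignores other cases the kernel considers. proc_entry_visible() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* With hidepid=invisible, a process entry stays visible if the caller's
 * gid matches the one recorded for this /proc instance, or if the caller
 * has CAP_SYS_PTRACE. */
static bool proc_entry_visible(gid_t caller_gid, gid_t mount_gid,
                               bool has_cap_sys_ptrace) {
    return caller_gid == mount_gid || has_cap_sys_ptrace;
}
```

With the instance mounted as root (gid 0), a root service always passes, which is why ProtectProc= only has an effect with an unprivileged User= or DynamicUser=yes.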
Daan De Meyer [Fri, 12 Mar 2021 22:09:44 +0000 (22:09 +0000)]
boot: Move console declarations to missing_efi.h
These were added to eficonex.h in gnu-efi 3.0.13. Let's move them
to missing_efi.h behind an appropriate guard to fix the build with
recent versions of gnu-efi.
Kevin Backhouse [Fri, 12 Mar 2021 17:00:56 +0000 (18:00 +0100)]
ask-password-api: fix error handling on invalid unicode character
The integer overflow happens when utf8_encoded_valid_unichar() returns an error
code. The error code is a negative number: -22. This overflows when it is
assigned to `z` (type `size_t`). This can cause an infinite loop if the value
of `q` is 22 or larger.
To reproduce the bug, you need to run `systemd-ask-password` and enter an
invalid unicode character, followed by a backspace character.
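Below is a sketch of the bug and the fix. encoded_char_len() is a hypothetical, simplified validator using the same convention as utf8_encoded_valid_unichar() (byte length or negative errno), and strip_last_char() is a hypothetical model of the backspace handling.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the byte length (1..4) of the character at s, or -EINVAL.
 * Only lead and continuation bytes are checked. */
static int encoded_char_len(const unsigned char *s, size_t avail) {
    int len;

    if (s[0] < 0x80)
        len = 1;
    else if ((s[0] & 0xE0) == 0xC0)
        len = 2;
    else if ((s[0] & 0xF0) == 0xE0)
        len = 3;
    else if ((s[0] & 0xF8) == 0xF0)
        len = 4;
    else
        return -EINVAL;

    if ((size_t) len > avail)
        return -EINVAL;
    for (int i = 1; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)
            return -EINVAL;
    return len;
}

/* Return the length of buf[0..n) with its last character removed, or a
 * negative errno on invalid UTF-8. z stays an int and is checked; as a
 * size_t (the bug), -EINVAL becomes SIZE_MAX - 21, so "q += z" steps
 * back 22 bytes and the loop can run forever once q >= 22. */
static long strip_last_char(const unsigned char *buf, size_t n) {
    size_t q = 0, last = 0;

    while (q < n) {
        int z = encoded_char_len(buf + q, n - q);
        if (z < 0)
            return z;
        last = q;
        q += (size_t) z;
    }
    return (long) last;
}
```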
Frantisek Sumsal [Thu, 11 Mar 2021 11:49:00 +0000 (12:49 +0100)]
repart: fix the loop dev support check
Since f17bdf8264e231fa31c769bff2475ef698487d0b the test-repart was
effectively disabled, since `/dev/loop-control` is a character special
file, whereas `-f` works only on regular files. Even though we could use
`-c` to check specifically for character special files, let's use `-e`
just in case.
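The sketch below uses stat(2) to show why `-f` never matched. The example uses /dev/null because it is a character device on ordinary Linux systems and does not depend on loop device support.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>

/* stat(2) equivalents of the shell tests (which also follow symlinks):
 *   test -f  ->  exists and S_ISREG
 *   test -c  ->  exists and S_ISCHR
 *   test -e  ->  exists, any type */
static bool is_regular_file(const char *p) {
    struct stat st;
    return stat(p, &st) == 0 && S_ISREG(st.st_mode);
}

static bool is_char_device(const char *p) {
    struct stat st;
    return stat(p, &st) == 0 && S_ISCHR(st.st_mode);
}

static bool path_exists(const char *p) {
    struct stat st;
    return stat(p, &st) == 0;
}
```

On a system with loop support, /dev/loop-control gives the same results as /dev/null: is_char_device() and path_exists() hold, while is_regular_file() does not.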
Michal Sekletar [Tue, 9 Mar 2021 16:22:32 +0000 (17:22 +0100)]
install: refactor find_symlinks() and don't search for symlinks recursively
After all, we are only interested in symlinks either in the top-level config
directory or in .wants and .requires sub-directories.
As a bonus this should speed up ListUnitFiles() roughly 3-4x on systems
with a lot of units that use drop-ins (e.g. SSH jump hosts with a lot of
user session scopes).
Tables with only one column aren't really tables, they are lists. And if
each cell only consists of a single word, they are probably better
written in a single line. Hence, shorten the man page a bit, and list
boot loader spec partition types in a simple sentence.
Also, drop "root-secondary" from the list. When dissecting images we'll
upgrade "root-secondary" to "root" if we mount it, and do so only if
"root" doesn't exist. Hence, never mention "root-secondary", as we will
never mount a partition under that id.
This makes sure nspawn's --volatile=yes switch works again: there we
have a read-only image that is overmounted by a tmpfs (with the
exception of /usr). Thus we need to mkdir all mount points even though
the image is read-only.
Hence, let's drop the optimization of avoiding mkdir() on images that are
read-only, it's wrong and misleading here, since the image itself might
be read-only but our mounts are not.
dissect-image: clean up meaning of DISSECT_IMAGE_MKDIR
Previously handling of DISSECT_IMAGE_MKDIR was pretty weird and broken:
it would control both if we create the top-level mount point when
mounting an image, and the inner mount points for images that consist of
multiple file systems. However, the latter is redundant, since 1f0f82f1311e4c52152b8e2b6f266258709c137d does this too, a few lines
further up – unconditionally!
Hence, let's make the meaning of DISSECT_IMAGE_MKDIR more strict: it
shall be only about the top-level mount point, not about the inner ones
(where we'll always continue to create whatever is missing). Having a
separate flag for the top-level mount point is relevant, since the mount
point dir created by it will remain on the host fs – unlike the
directories we create inside the image, which will stay within the
image.
This slight change of meaning is actually in line with what the flag is
actually used for and documented in systemd-dissect.