Unfortunately the new syscall causes test-event to hang. 32 bit architectures
seem affected: i686 and arm32 in fedora koji. 32 bit build of test-event hangs
reliably under valgrind:
If I set epoll_pwait2_absent=true, so the new function is never called, then
the issue does not reproduce. It seems to be strictly tied to the syscall.
Yu Watanabe [Tue, 23 Mar 2021 05:38:18 +0000 (14:38 +0900)]
firewall-util: probe firewall backend in fw_ctx_new()
FirewallContext is used by networkd and nspawn. Both allocates the
context when it is really necessary. Hence, it is not necessary to delay
probing backend.
Moreover, if iptables backend is not enabled on build, and nftables is
not supported by kernel, previously `fw_nftables_init()` is called
everytime when we try to configure masquerade or dnat. It causes
significant performance loss.
We rely on mktime() normalizing the time. The man page does not say that it'll
move the time forward, but our algorithm relies on this. So let's catch this
case explicitly.
With this patch:
$ TZ=Europe/Dublin faketime 2021-03-21 build/systemd-analyze calendar --iterations=5 'Sun *-*-* 01:00:00'
Normalized form: Sun *-*-* 01:00:00
Next elapse: Sun 2021-03-21 01:00:00 GMT
(in UTC): Sun 2021-03-21 01:00:00 UTC
From now: 59min left
Iter. #2: Sun 2021-04-04 01:00:00 IST
(in UTC): Sun 2021-04-04 00:00:00 UTC
From now: 1 weeks 6 days left <---- note the 2 week jump here
Iter. #3: Sun 2021-04-11 01:00:00 IST
(in UTC): Sun 2021-04-11 00:00:00 UTC
From now: 2 weeks 6 days left
Iter. #4: Sun 2021-04-18 01:00:00 IST
(in UTC): Sun 2021-04-18 00:00:00 UTC
From now: 3 weeks 6 days left
Iter. #5: Sun 2021-04-25 01:00:00 IST
(in UTC): Sun 2021-04-25 00:00:00 UTC
From now: 1 months 4 days left
The syntax like "0666" is very unclear. It only makes sense for some subset of
people who do C programming. Let's use the much more sensible modern python
syntax instead.
test-calendarspec: do not convert timezone "" to ":"
I *think* it doesn't actually make any difference, because ":" will be ignored. 437f48a471f51ac9dd2697ee3b848a71b4f101df added prefixing with ":", but didn't
take into account the fact that we also use "" with a different meaning than
NULL here. But let's restore the original behaviour of specifying the empty
string.
The output is rather long at this makes it easier to jump to the right place.
Also use normal output routines and set_unset_env() to make things more
compact.
shared/calendarspec: constify parameter and simplify assignments to variable
The scope of start & stop is narrowed down, and they are assigned only once.
No functional change, but I think the code is easier to read this way.
Also add a comment to make the code easier to read.
resolved: don't accept responses to query unless they completely answer our questions
When we checking if the responses we collected for a DnsQuery are
sufficient to complete it we previously only check if one of the
collected response RRs matches at least one of the question RR keys.
This changes the logic to require that there must be at least one
response RR matched *each* of the question RR keys before considering
the answer complete.
Otherwise we might end up accepting an A reply as complete answer for an
A/AAAA query and vice versa, but we want to make sure we wait until we
get a reply on both types before returning this to the user in all
cases.
This has been broken for basically forever, but didn't surface until b1eea703e01da1e280e179fb119449436a0c9b8e since until then we'd basically
ignore the auxiliary RRs included in CNAME/DNAME replies. Once that
commit was made we'd start using the auxiliary RRs included in
CNAME/DNAME replies but those typically included only A or only AAAA
which we then took for complete.
Sergey Bugaev [Mon, 22 Mar 2021 15:31:12 +0000 (18:31 +0300)]
log: protect errno in log_open()
Commit 0b1f3c768ce1bd1490a5e53f539976dcef8ca765 has introduced log_open()
calls after exec fails post-fork. However, the log_open() call itself could
change the value of errno, which, for me, manifested in:
$ coredumpctl gdb
...
Failed to invoke gdb: Success
shared/calendarspec: abort calculation after 1000 iterations
We have a bug where we seem to enter an infinite loop when running in the
Europe/Dublin timezone. The timezone is "special" because it has negative SAVE
values. The handling of this should obviously be fixed, but let's use a
belt-and-suspenders approach, and gracefully fail if we fail to find an answer
within a specific number of attempts. The code in this function is rather
complex, and it's hard to rule out another bug in the future.
repart: make sure to grow partition table after growing backing loopback file
This fixes the --size= switch, i.e. where we grow a disk image: after
growing it we need to expand the partition table so that its idea of the
the medium size matches the new reality. Otherwise our disk size
calculations in the subsequent steps might still use the original
ungrown size.
(This used to work, I guess this was borked when libfdisk learnt the
concept of "minimized" partition tables)
Michael Gisbers [Fri, 19 Mar 2021 10:38:53 +0000 (11:38 +0100)]
correct incorrect command in NEWS (#19048)
* for /dev/vsock a file permission of 0o666 was mentioned but 0666 is probably better understood, so let's use that
* correct non existing command 'ip dev'
Frantisek Sumsal [Thu, 18 Mar 2021 10:59:53 +0000 (11:59 +0100)]
coccinelle: filter out a couple of 'false-positive' transformations
* flag-set.cocci: perform the transformation only if the second
argument is a constant
* sd-journal/lookup3.c: skip the cocci completely for this file, since
it's not "ours"
* strjoina.cocci: skip the transformation on the "test_strjoina" test,
since it intentionally tests the "incorrect" expression we're trying to
transform (the same thing was already done in strjoin.cocci)
Luca Boccassi [Wed, 17 Mar 2021 14:34:36 +0000 (14:34 +0000)]
resolved: simplify min_ttl check
rr is asserted upon a few lines above, no need to check for null.
Coverity-found issue, CID 1450844
CID 1450844: Null pointer dereferences (REVERSE_INULL)
Null-checking "rr" suggests that it may be null, but it has already
been dereferenced on all paths leading to the check.
fileio: don't use realloc() in read_full_virtual_file()
We aren't interested in the data previousl read, hence free() followed
by malloc() is typically better since it means libc doesn't have to
restore the contained data needlessly.
systemctl: pecify read_full_file() size argument as NULL
If it is specified as NULL read_full_file() assumes the caller wants a C
string, and it looks for embedded NUL bytes to ensure that works. Given
we don#t actually use the size argument here, let's drop it.
(in one case the size argument is used, but not for actually processing
the full returned data, but just as a shortcut to compare things with
the original string. Let's drop use of that there, too given the risk of
embedded NUL bytes in the data read.)
tree-wide: use read_full_virtual_file() where appropriate
Wherever we read virtual files we better should use
read_full_virtual_file(), to make sure we get a consistent response
given how weird the kernel's handling with partial read on such file
systems is.
Anita Zhang [Tue, 16 Mar 2021 00:21:45 +0000 (17:21 -0700)]
oomd: sort by pgscan rate not pgscan
For pressure based killing we want to target who has the highest
increase in pgscan from the previous interval (vs. the previous logic
which used raw pgscan). This will prevent biasing towards long running
cgroups as mentioned in #19007.
Mike Gilbert [Tue, 9 Mar 2021 22:57:37 +0000 (17:57 -0500)]
cg_unified_cached: return ENOMEDIUM if we cannot find a known hierarchy
When the test suite is being run in a foreign environment,
/sys/fs/cgroup might not be set up in a way that we recognize.
Returning ENOMEDIUM causes the tests to be skipped in this case.
execute: drop DissectImageFlags parameter from namespace_setup()
The function already has a ridiculous amount of paramaters, let's drop
one that is either not used at all or has a constant value and let's
pick it internally.
dissect-image: split DISSECT_IMAGE_REQUIRE_ROOT in two
Previously, the flag did two things at once: enable support for using
generic partitions as root fs if there were only one/allow use of
partition-table-less images as root fs. And secondly, insist that there
was a rootfs, and fail if not. Let's split these two in two separate
options so that they can be used independently of each other.
There are cases where one wants to use one without the other (i.e. when
inspecting things with systemd-dissect tool it should be OK to do so
even if image has no root fs), and it's cleaner anyway.
dissect-image: do an strverscmp() on the partition label of root/usr if multiple exist
Let's add a very simple mechanism for doing A/B updating of disk images:
for root + /usr and their verity partitions let's ue strverscmp() on the
label to determine which one to use when dissecting a disk image. That
way, if the root partition label contains a string such as "foo-0.15"
and another one "foo-0.16", the latter wins.
For other partition types let's stick to the logic of "first partition
found" win, as before. Versioning makes sense for partitions that
typically and primarily may carry software packages, but the other
partition types usuall don't.
tree-wide: make use of DISSECT_IMAGE_USR_NO_ROOT in various tools
Let's make use of the new dissection in all tools where this makes
sense, which are all tools that dissect images, except for those which
inherently operate on state/configuraiton and thus where an image
without state nor configuration is useless (e.g.
systemd-tmpfiles/systemd-firstboot/… --image= switch).
dissect-image: support images without rootfs but with /usr/
Let's add support for images that include an /usr/ file system but no
root fs. Mount a tmpfs as root for images like this, all controlled by a
new flag DISSECT_IMAGE_USR_NO_ROOT.
This is useful for entirely stateless images, that come up pristine on
every single boot.
Yu Watanabe [Mon, 8 Mar 2021 06:39:53 +0000 (15:39 +0900)]
sd-event: re-check new epoll events when a child event is queued
Previously, when a process outputs something and exit just after
epoll_wait() but before process_child(), then the IO event is ignored
even if the IO event has higher priority. See #18190.
This can be solved by checking epoll event again after process_child().
However, there exists a possibility that another process outputs and
exits just after process_child() but before the second epoll_wait().
When the IO event has lower priority than the child event, still IO
event is processed.
So, this makes new epoll events and child events are checked in a loop
until no new event is detected. To prevent an infinite loop, the number
of maximum trial is set to 10.
resolved: don't flush answer RRs on CNAME redirect too early
When doing a CNAME/DNAME redirect let's first check if the answer we
already have fully answers the redirected question already. If so, let's
use that. If not, let's properly restart things.
This simply removes one call to dns_answer_reset() that was placed too
early: instead of resetting when we detect a CNAME/DNAME redirect, do so
only after checking if the answer we already have doesn't match the
reply, and then decide to *actually* follow it. Or in other words: rely
on the dns_answer_reset() call in dns_query_go() which we'll call to
actually begin with the redirected question.
(This doesn't really matter as much as one might think, since our cache
stepped in anyway and answered the questions before going back to the
network. However, this adds noise if RRs with very short TTLs are cached
– which some CDNs do – and is of course relavant when people turn off
the local cache.)
Previously by mistake we'd always match every single reply we get in a
CNAME chain to the original question from the stub client. That's
broken, we need to test it against the CNAME query we are currently
looking at.
The effect of this incorrect matching was that we'd assign the RRs to
the wrong section since we'd assume they'd be auxiliary answers instead
of primary answers.