But it seems that we don't want those calls at all. The test was originally
added with the call in a6ee01caf3409ba9820e8824b9262fbac31a9f77, but I don't
see why we should override this. If the user wants to execute the test with
mempool disabled, we shouldn't ignore that.
basic/fileio: optimize buffer sizes in read_full_virtual_file()
We'd proceed rather inefficiently: the initial buffer size was LINE_MAX/2,
i.e. only 1k. We can read 4k at the same cost.
Also, we'd try to allocate 1025, 2049, 4097 bytes, i.e. always one higher than
the power-of-two size. Effectively the allocation would be bigger, and we'd
waste the additional space. So let's allocate aligned to the power-of-two size.
size=4095, 8191, 16383, so we allocate 4k, 8k, 16k.
basic/fileio: simplify calculation of buffer size in read_full_virtual_file()
We'd first assign a value up to SSIZE_MAX, and then immediately check if we
have a value bigger than READ_FULL_BYTES_MAX. This wasn't exactly wrong, but a
bit roundabout. Let's immediately assign the value from the appropriate range
or error out.
Anita Zhang [Tue, 23 Mar 2021 07:49:28 +0000 (00:49 -0700)]
process-util: dont allocate max length to read /proc/PID/cmdline
Alternative title: Replace get_process_cmdline()'s fopen()/fread() with
read_full_virtual_file().
When RLIMIT_STACK is set to infinity:infinity, _SC_ARG_MAX will
return 4611686018427387903 (depending on the system, but definitely
something larger than most systems have). It's impractical to allocate this
in one go when most cmdlines are much shorter than that.
Instead use read_full_virtual_file() which seems to increase the buffer
depending on the size of the contents.
Unfortunately the new syscall causes test-event to hang. 32 bit architectures
seem affected: i686 and arm32 in fedora koji. 32 bit build of test-event hangs
reliably under valgrind:
If I set epoll_pwait2_absent=true, so the new function is never called, then
the issue does not reproduce. It seems to be strictly tied to the syscall.
We rely on mktime() normalizing the time. The man page does not say that it'll
move the time forward, but our algorithm relies on this. So let's catch this
case explicitly.
With this patch:
$ TZ=Europe/Dublin faketime 2021-03-21 build/systemd-analyze calendar --iterations=5 'Sun *-*-* 01:00:00'
Normalized form: Sun *-*-* 01:00:00
Next elapse: Sun 2021-03-21 01:00:00 GMT
(in UTC): Sun 2021-03-21 01:00:00 UTC
From now: 59min left
Iter. #2: Sun 2021-04-04 01:00:00 IST
(in UTC): Sun 2021-04-04 00:00:00 UTC
From now: 1 weeks 6 days left <---- note the 2 week jump here
Iter. #3: Sun 2021-04-11 01:00:00 IST
(in UTC): Sun 2021-04-11 00:00:00 UTC
From now: 2 weeks 6 days left
Iter. #4: Sun 2021-04-18 01:00:00 IST
(in UTC): Sun 2021-04-18 00:00:00 UTC
From now: 3 weeks 6 days left
Iter. #5: Sun 2021-04-25 01:00:00 IST
(in UTC): Sun 2021-04-25 00:00:00 UTC
From now: 1 months 4 days left
The syntax like "0666" is very unclear. It only makes sense for some subset of
people who do C programming. Let's use the much more sensible modern python
syntax instead.
test-calendarspec: do not convert timezone "" to ":"
I *think* it doesn't actually make any difference, because ":" will be ignored. 437f48a471f51ac9dd2697ee3b848a71b4f101df added prefixing with ":", but didn't
take into account the fact that we also use "" with a different meaning than
NULL here. But let's restore the original behaviour of specifying the empty
string.
The output is rather long at this makes it easier to jump to the right place.
Also use normal output routines and set_unset_env() to make things more
compact.
shared/calendarspec: constify parameter and simplify assignments to variable
The scope of start & stop is narrowed down, and they are assigned only once.
No functional change, but I think the code is easier to read this way.
Also add a comment to make the code easier to read.
resolved: don't accept responses to query unless they completely answer our questions
When we checking if the responses we collected for a DnsQuery are
sufficient to complete it we previously only check if one of the
collected response RRs matches at least one of the question RR keys.
This changes the logic to require that there must be at least one
response RR matched *each* of the question RR keys before considering
the answer complete.
Otherwise we might end up accepting an A reply as complete answer for an
A/AAAA query and vice versa, but we want to make sure we wait until we
get a reply on both types before returning this to the user in all
cases.
This has been broken for basically forever, but didn't surface until b1eea703e01da1e280e179fb119449436a0c9b8e since until then we'd basically
ignore the auxiliary RRs included in CNAME/DNAME replies. Once that
commit was made we'd start using the auxiliary RRs included in
CNAME/DNAME replies but those typically included only A or only AAAA
which we then took for complete.
Sergey Bugaev [Mon, 22 Mar 2021 15:31:12 +0000 (18:31 +0300)]
log: protect errno in log_open()
Commit 0b1f3c768ce1bd1490a5e53f539976dcef8ca765 has introduced log_open()
calls after exec fails post-fork. However, the log_open() call itself could
change the value of errno, which, for me, manifested in:
$ coredumpctl gdb
...
Failed to invoke gdb: Success
shared/calendarspec: abort calculation after 1000 iterations
We have a bug where we seem to enter an infinite loop when running in the
Europe/Dublin timezone. The timezone is "special" because it has negative SAVE
values. The handling of this should obviously be fixed, but let's use a
belt-and-suspenders approach, and gracefully fail if we fail to find an answer
within a specific number of attempts. The code in this function is rather
complex, and it's hard to rule out another bug in the future.
repart: make sure to grow partition table after growing backing loopback file
This fixes the --size= switch, i.e. where we grow a disk image: after
growing it we need to expand the partition table so that its idea of the
the medium size matches the new reality. Otherwise our disk size
calculations in the subsequent steps might still use the original
ungrown size.
(This used to work, I guess this was borked when libfdisk learnt the
concept of "minimized" partition tables)
Michael Gisbers [Fri, 19 Mar 2021 10:38:53 +0000 (11:38 +0100)]
correct incorrect command in NEWS (#19048)
* for /dev/vsock a file permission of 0o666 was mentioned but 0666 is probably better understood, so let's use that
* correct non existing command 'ip dev'
Frantisek Sumsal [Thu, 18 Mar 2021 10:59:53 +0000 (11:59 +0100)]
coccinelle: filter out a couple of 'false-positive' transformations
* flag-set.cocci: perform the transformation only if the second
argument is a constant
* sd-journal/lookup3.c: skip the cocci completely for this file, since
it's not "ours"
* strjoina.cocci: skip the transformation on the "test_strjoina" test,
since it intentionally tests the "incorrect" expression we're trying to
transform (the same thing was already done in strjoin.cocci)
Luca Boccassi [Wed, 17 Mar 2021 14:34:36 +0000 (14:34 +0000)]
resolved: simplify min_ttl check
rr is asserted upon a few lines above, no need to check for null.
Coverity-found issue, CID 1450844
CID 1450844: Null pointer dereferences (REVERSE_INULL)
Null-checking "rr" suggests that it may be null, but it has already
been dereferenced on all paths leading to the check.
fileio: don't use realloc() in read_full_virtual_file()
We aren't interested in the data previousl read, hence free() followed
by malloc() is typically better since it means libc doesn't have to
restore the contained data needlessly.
systemctl: pecify read_full_file() size argument as NULL
If it is specified as NULL read_full_file() assumes the caller wants a C
string, and it looks for embedded NUL bytes to ensure that works. Given
we don#t actually use the size argument here, let's drop it.
(in one case the size argument is used, but not for actually processing
the full returned data, but just as a shortcut to compare things with
the original string. Let's drop use of that there, too given the risk of
embedded NUL bytes in the data read.)
tree-wide: use read_full_virtual_file() where appropriate
Wherever we read virtual files we better should use
read_full_virtual_file(), to make sure we get a consistent response
given how weird the kernel's handling with partial read on such file
systems is.
Anita Zhang [Tue, 16 Mar 2021 00:21:45 +0000 (17:21 -0700)]
oomd: sort by pgscan rate not pgscan
For pressure based killing we want to target who has the highest
increase in pgscan from the previous interval (vs. the previous logic
which used raw pgscan). This will prevent biasing towards long running
cgroups as mentioned in #19007.
Mike Gilbert [Tue, 9 Mar 2021 22:57:37 +0000 (17:57 -0500)]
cg_unified_cached: return ENOMEDIUM if we cannot find a known hierarchy
When the test suite is being run in a foreign environment,
/sys/fs/cgroup might not be set up in a way that we recognize.
Returning ENOMEDIUM causes the tests to be skipped in this case.
Yu Watanabe [Mon, 8 Mar 2021 06:39:53 +0000 (15:39 +0900)]
sd-event: re-check new epoll events when a child event is queued
Previously, when a process outputs something and exit just after
epoll_wait() but before process_child(), then the IO event is ignored
even if the IO event has higher priority. See #18190.
This can be solved by checking epoll event again after process_child().
However, there exists a possibility that another process outputs and
exits just after process_child() but before the second epoll_wait().
When the IO event has lower priority than the child event, still IO
event is processed.
So, this makes new epoll events and child events are checked in a loop
until no new event is detected. To prevent an infinite loop, the number
of maximum trial is set to 10.
resolved: don't flush answer RRs on CNAME redirect too early
When doing a CNAME/DNAME redirect let's first check if the answer we
already have fully answers the redirected question already. If so, let's
use that. If not, let's properly restart things.
This simply removes one call to dns_answer_reset() that was placed too
early: instead of resetting when we detect a CNAME/DNAME redirect, do so
only after checking if the answer we already have doesn't match the
reply, and then decide to *actually* follow it. Or in other words: rely
on the dns_answer_reset() call in dns_query_go() which we'll call to
actually begin with the redirected question.
(This doesn't really matter as much as one might think, since our cache
stepped in anyway and answered the questions before going back to the
network. However, this adds noise if RRs with very short TTLs are cached
– which some CDNs do – and is of course relavant when people turn off
the local cache.)
Previously by mistake we'd always match every single reply we get in a
CNAME chain to the original question from the stub client. That's
broken, we need to test it against the CNAME query we are currently
looking at.
The effect of this incorrect matching was that we'd assign the RRs to
the wrong section since we'd assume they'd be auxiliary answers instead
of primary answers.
When responding from DNS cache, let's slightly tweak how the TTL is
lowered: as before let's round down when converting from our internal µs
to the external seconds. (This is preferable, since records should
better be cached too short instead of too long.) Let's avoid rounding
down to zero though, since that has special semantics in many cases (in
particular mDNS). Let's just use 1s in that case.
resolved: take shortest TTL of all of RRs in answer as cache lifetime
We nowadays cache full answer RRset combinations instead of just the
exact matching rrset. This means we should not cache RRs that are not
immediate answers to our question for longer then their own RRs. Or in
other words: let's determine the shortest TTL of all RRs in the whole
answer, and use that as cache lifetime.