In glibc 2.36 was added the arc4random family of functions. However,
unlike on other supported systems, it is not a user-space PRNG
implementation. It just wraps the getrandom() system call with no
buffering, which causes a performance loss on NTP servers due to
the function being called twice for each response to add randomness
to the RX and TX timestamp below the clock precision.
Don't check for arc4random on Linux to keep using the buffered
getrandom().
Replace NULL in test code of functions which have (at least in glibc) or
could have arguments marked as nonnull to avoid the -Wnonnull warnings,
which breaks the detection with the -Werror option.
test: fix ntp_core unit test to disable source selection
If the randomly generated timestamps are close to the current time, the
source can be selected for synchronization, which causes a crash when
logging the source name due to uninitialized ntp_sources.
Specify the source with the noselect option to prevent selection.
Miroslav Lichvar [Thu, 21 Jul 2022 13:16:47 +0000 (15:16 +0200)]
ntp: add maxdelayquant option
Add a new test for maximum delay using a long-term estimate of a
p-quantile of the peer delay. If enabled, it replaces the
maxdelaydevratio test. It's main advantage is that it is not sensitive
to outliers corrupting the minimum delay.
As it can take a large number of samples for the estimate to reach the
expected value and adapt to a new value after a network change, the
option is recommended only for local networks with very short polling
intervals.
Miroslav Lichvar [Tue, 19 Jul 2022 14:28:32 +0000 (16:28 +0200)]
ntp: rework filter option to count missing samples
Instead of waiting for the sample filter to accumulate the specified
number of samples and then deciding if the result is acceptable, count
missing samples and get the result after the specified number of polls.
This should work better when samples are dropped at a high rate. The
source and clock update interval will be stable as long as at least
one sample can be collected.
Miroslav Lichvar [Mon, 18 Jul 2022 10:50:05 +0000 (12:50 +0200)]
ntp: enable sub-second poll sooner with filter option
When the minimum round-trip time is checked to enable a sub-second
polling interval, consider also the last sample in the filter to avoid
waiting for the first sample to be accumulated in sourcestats.
Miroslav Lichvar [Mon, 18 Jul 2022 10:43:13 +0000 (12:43 +0200)]
ntp: fix initial poll to follow non-LAN minimum
If a sub-second polling interval is configured, initialize the local
poll to 0 to avoid a shorter interval between the first and second
request in case no response to the first request is received (in time).
Miroslav Lichvar [Thu, 14 Jul 2022 12:51:24 +0000 (14:51 +0200)]
client: check for stdout errors
Return with an error code from chronyc if the command is expected to
print some data and fflush() or ferror() indicates an error. This should
make it easier for scripts to detect missing data when redirected to a
file.
Filtering was moved to a separate source file in commit c498c21fad35 ("refclock: split off median filter). It looks like
MedianFilter struct somehow survived the split. Remove it to reduce
confusion.
Miroslav Lichvar [Thu, 30 Jun 2022 08:18:48 +0000 (10:18 +0200)]
ntp: don't use first response in interleaved mode
With the first interleaved response coming after a basic response the
client is forced to select the four timestamps covering most of the last
polling interval, which makes measured delay very sensitive to the
frequency offset between server and client. To avoid corrupting the
minimum delay held in sourcestats (which can cause testC failures),
reject the first interleaved response in the client/server mode as
failing the test A.
This does not change anything for the symmetric mode, where both sets of
the four timestamps generally cover a significant part of the polling
interval.
Miroslav Lichvar [Tue, 14 Jun 2022 14:31:22 +0000 (16:31 +0200)]
sys_generic: damp slew oscillation due to delayed stop
If the computer is overloaded so much that chronyd cannot stop a slew
within one second of the scheduled end and the actual duration is more
than doubled (2 seconds with the minimum duration of 1 second), the
overshoot will be larger than the intended correction. If these
conditions persist, the oscillation will grow up to the maximum offset
allowed by maxslewrate and the delay in stopping.
Monitor the excess duration as an exponentially decaying maximum value
and don't allow any slews shorter than 5 times the value to damp the
oscillation. Ignore delays longer than 100 seconds, assuming they have a
different cause (e.g. the system was suspended and resumed) and are
already handled in the scheduler by triggering cancellation of the
ongoing slew.
This should also make it safer to shorten the minimum duration if
needed.
Estimate the 1st and 2nd 10-quantile of the reading delay and accept
only readings between them unless the error of the offset predicted from
previous samples is larger than the minimum reading error. With the 25
PHC readings per ioctl it should combine about 2-3 readings.
This should improve hwclock tracking and synchronization stability when
a PHC reading delay occasionally falls below the normal expected
minimum, or all readings in the batch are delayed significantly (e.g.
due to high PCIe load).
Miroslav Lichvar [Wed, 18 May 2022 10:16:33 +0000 (12:16 +0200)]
quantiles: add support for quantile estimation
Add estimation of quantiles using the Frugal-2U streaming algorithm
(https://arxiv.org/pdf/1407.1121v1.pdf). It does not need to save
previous samples and adapts to changes in the distribution.
Allow multiple estimates of the same quantile and select the median for
better stability.
Move processing of PHC readings from sys_linux to hwclock, where
statistics can be collected and filtering improved.
In the PHC refclock driver accumulate the samples even if not in the
external timestamping mode to update the context which will be needed
for improved filtering.
Increase the number of requested readings from 10 to 25 - the maximum
accepted by the PTP_SYS_OFFSET* ioctls. This should improve stability of
HW clock tracking and PHC refclock.
Miroslav Lichvar [Thu, 19 May 2022 08:09:08 +0000 (10:09 +0200)]
doc: improve hwtimestamp description
Latest versions of ethtool print only the shorter lower-case names of
capabilities and filters. Explain that chronyd doesn't synchronize the
PHC and refer to the new vclock feature of the kernel, which should be
used by applications that need a synchronized PHC (e.g. ptp4l and
phc2sys) in order to not interfere with chronyd.
Miroslav Lichvar [Thu, 12 May 2022 09:53:15 +0000 (11:53 +0200)]
local: cancel remaining correction after external step
Instead of the generic clock driver silently zeroing the remaining
offset after detecting an external step, cancel it properly with the
slew handlers in order to correct timestamps that are not reset in
handling of the unknown step (e.g. the NTP local TX).
Miroslav Lichvar [Wed, 11 May 2022 09:53:07 +0000 (11:53 +0200)]
refclock: set minimum maxlockage in local mode
Use 3 as the minimum maxlockage in the local mode to avoid disruptions
due to losing the lock when a single sample is missed, e.g. when the PPS
driver polling interval is slightly longer than the pulse interval and a
pulse is skipped.
Miroslav Lichvar [Wed, 11 May 2022 09:36:57 +0000 (11:36 +0200)]
refclock: restart local mode after losing lock
A refclock in the local mode is locked to itself. When the maxlockage
check failed after missing some samples, it failed permanently and the
refclock was not able to accumulate any new samples.
When the check fails, drop all samples and reset the source to start
from scratch.
Miroslav Lichvar [Wed, 11 May 2022 06:57:22 +0000 (08:57 +0200)]
siv: set key directly with gnutls
A new function is provided by the latest gnutls (should be in 3.7.5) to
set the key of an AEAD cipher. If available, use it to avoid destroying
and creating a new SIV instance with each key change.
This improves the server NTS-NTP performance if using gnutls for SIV.
nts: don't exit if initialization of priority cache fails
Initialization of the gnutls priority cache can fail depending on the
system crypto policy (e.g. disabled TLS1.3). Log an error mentioning
TLS, but continue to run without the server/client credentials.
Use snprintf() instead of strcat() and don't try to parse commands
longer than 2048 characters to make it consistent with the chrony.conf
parser, avoid memory allocation, and not rely on the system ARG_MAX to
keep the length sane.
Miroslav Lichvar [Wed, 23 Mar 2022 14:17:03 +0000 (15:17 +0100)]
examples: replace grep command in NM dispatcher script
Some grep implementations detect binary data and return success without
matching whole line. This might be an issue for the DHCPv6 NTP FQDN
check. The GNU grep in the C locale seems to check only for the NUL
character, which cannot be passed in an environment variable, but other
implementations might behave differently and there doesn't seem to be a
portable way to force matching the whole line.
Instead of the grep command, check for invalid characters by comparing
the length of the input passed through "tr -d -c".
When an added source is specified by IP address, save the original
string instead of formatting a new string from the parsed address, which
can be different (e.g. compressed vs expanded IPv6 address).
This fixes the chronyc sourcename command and -N option to print the IP
address exactly as it was specified in the configuration file or chronyc
add command.
Miroslav Lichvar [Wed, 23 Feb 2022 10:31:24 +0000 (11:31 +0100)]
sys_linux: don't require configurable pin for external PPS
Some PHCs that have a PPS input don't have configurable pins (their
function is hardcoded). Accept a negative pin index to skip the pin
configuration before requesting external timestamping.
Miroslav Lichvar [Wed, 23 Feb 2022 09:23:18 +0000 (10:23 +0100)]
refclock: improve precision with large offset
If a SHM or PHC refclock has a very large offset compensated by the
offset option, or ignored with the pps or local option, there is a
persistent loss of precision in the calculation of the sample offset
using the double format.
Rework the code to delay the calculation of the accumulated offset to
include the specificed compensation and remaining correction of the
system clock, where the calculation can be split to improve the
precision. In the pps mode ignore integer seconds competely.
The precision of the SOCK refclock is now limited to 1 nanosecond due to
the extra double->timespec->double conversion.
Miroslav Lichvar [Tue, 22 Feb 2022 10:24:00 +0000 (11:24 +0100)]
refclock: add local option
Add "local" option to specify that the reference clock is an
unsynchronized clock which is more stable than the system clock (e.g.
TCXO, OCXO, or atomic clock) and it should be used as a local standard
to stabilize the system clock.
Handle the local refclock as a PPS refclock locked to itself which gives
the unsynchronized status to be ignored in the source selection. Wait
for the refclock to get at least minsamples samples and adjust the clock
directly to follow changes in the refclock's sourcestats frequency and
offset.
There should be at most one refclock specified with this option.
Miroslav Lichvar [Mon, 14 Feb 2022 09:55:22 +0000 (10:55 +0100)]
sources: handle unsynchronized sources in selection
Allow sources to accumulate samples with the leap status set to not
synchronized. Define a new state for them to be ignored in the
selection. This is intended for sources that are never synchronized and
will be used only for stabilization.
Miroslav Lichvar [Thu, 10 Feb 2022 14:24:25 +0000 (15:24 +0100)]
sourcestats: clamp minsamples and maxsamples in initialization
Don't leave the variables set to values outside their effective range.
This has no functional impact, but makes it clear what is the precedence
of the two settings.
Tested-by: Christian Ehrhardt <christian.ehrhardt@canonical.com> Reviewed-by: Christian Ehrhardt <christian.ehrhardt@canonical.com> Signed-off-by: Michael Hudson-Doyle <michael.hudson@canonical.com> Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
examples: handle more actions in NM dispatcher script
Run the chronyc onoffline command also when the connectivity-change
and dhcp6-change actions are reported by the NetworkManager dispatcher.
The latter should not be necessary, but there currently doesn't seem to
be any action for IPv6 becoming routable after duplicate address
detection, so at least in networks using DHCPv6, IPv6 NTP servers should
not be stuck in the offline state from a previously reported action.
Miroslav Lichvar [Wed, 26 Jan 2022 15:00:36 +0000 (16:00 +0100)]
client: fix waitsync command to reconnect to server
If chronyc waitsync was started before chronyd, it would try all
addresses (Unix socket, IPv4, IPv6) and get stuck with no address, not
getting any response later when chronyd was running.
Reset the address index in open_io() when returning with failure to
allow the next call to start with the first address again.
Reported-by: Jan Mikkelsen <janm@transactionware.com>
Miroslav Lichvar [Thu, 16 Dec 2021 12:08:19 +0000 (13:08 +0100)]
ntp: set local address on PTP socket on FreeBSD
Fix the FreeBSD-specific code checking for a bound IPv4 socket to
include the new PTP port. This should fix a multihomed server to respond
to NTP-over-PTP requests from the address which received the request.
Fixes: be3158c4e5b2 ("ntp: add support for NTP over PTP")
Miroslav Lichvar [Thu, 16 Dec 2021 10:36:26 +0000 (11:36 +0100)]
cmdmon: fix transmit_reply() to not read uninitialized data
In the FreeBSD-specific code checking for a bound IPv4 socket, make
sure it is not a Unix domain address to avoid reading uninitialized
IP-specific fields.
Miroslav Lichvar [Tue, 14 Dec 2021 09:04:39 +0000 (10:04 +0100)]
ntp: avoid unnecessary source lookups
Avoid searching the hash table of sources when a packet in the client
mode is received. It cannot be a response from our source. Analogously,
avoid source lookups for transmitted packets in the server mode. This
doesn't change anything for packets in symmetric modes, which can be
requests and responses at the same time.
This slightly improves the maximum packet rate handled as a server.
For a long time, the Solaris support in chrony wasn't tested on a real
Solaris system, but on illumos/OpenIndiana, which was forked from
OpenSolaris when it was discontinued in 2010.
While Solaris and illumos might have not diverged enough to make a
difference for chrony, replace Solaris in the documentation with illumos
to make it clear which system is actually supported by the chrony
project.
The dosynctodr kernel variable needs to be set to 0 to block automatic
synchronization of the system clock to the hardware clock. chronyd used
to disable dosynctodr on Solaris versions before 2.6, but it seems it is
now needed even on current versions as the clock driver sets frequency
only without calling adjtime() or setting the ntp_adjtime() PLL offset.
This issue was reproduced and fix tested on current OpenIndiana.
Fixes: 8feb37df2b48 ("sys_solaris: use timex driver")
In addition to the 16s limit in per-response change in the monotonic
offset, don't allow the total accumulated offset injected in sourcestats
to be larger than 16 seconds.
reference: check for unset leap_when in is_leap_close()
Check that the leap_when variable is set before testing a timestamp for
being close to a leap second. This allows the first measurement to be
accepted if starting at the Unix epoch (e.g. in a test).
ntp: check for zero timestamp in initial TX timeout
Calculate the delay since the previous transmission only if the
TX timestamp is actually set. This removes an unnecessary delay when
starting at the Unix epoch in 1970 (e.g. in a test).
Miroslav Lichvar [Mon, 29 Nov 2021 11:30:09 +0000 (12:30 +0100)]
rtc: don't drop first sample after initial trim
It seems there is no longer an issue with the first sample after the
initial trim and it can be accumulated. It might have been a workaround
for an unrelated bug which was fixed since then.
This fixes the number of samples reported in rtcdata briefly jumping to
65535 and also brings back the expectation that n_samples is never
negative.
Miroslav Lichvar [Tue, 23 Nov 2021 13:41:08 +0000 (14:41 +0100)]
main: add assertions for timespec signedness
Some of the code (e.g. util and clientlog) may work with negative
values. Require that time_t and the tv_nsec types are signed. This seems
to be the case on all supported systems, but it it is not required by
POSIX.
Miroslav Lichvar [Tue, 23 Nov 2021 12:17:26 +0000 (13:17 +0100)]
util: reset GetRandom functions in helpers after fork
Close /dev/urandom and drop cached getrandom() data after forking helper
processes to avoid them getting the same sequence of random numbers
(e.g. two NTS-KE helpers generating cookies with identical nonces).
arc4random() is assumed to be able to detect forks and reseed
automatically.
This is not strictly necessary with the current code, which does not use
the GetRandom functions before the NTS-KE helper processes are forked,
but that could change in future.
Also, call the reset function before exit to close /dev/urandom in order
to avoid valgrind reporting the file object as "still reachable".
Miroslav Lichvar [Tue, 23 Nov 2021 09:35:22 +0000 (10:35 +0100)]
ntp: fix exp1 EF search in process_response()
Don't ignore the magic field when searching for the exp1 extension
field in a received response. If there were two exp1 fields in the
packet, and only one of them had the expected magic value, it should
pick the right one.
Fixes: 2319f72b29a9 ("ntp: add client support for experimental extension field")
Miroslav Lichvar [Mon, 22 Nov 2021 15:44:24 +0000 (16:44 +0100)]
ntp: make default NTP version with xleave to be always 4
If the xleave option is enabled, ignore the key option and the hash
length. Always use version 4 as the default to get interleaved responses
from new chrony servers.
Miroslav Lichvar [Mon, 22 Nov 2021 14:52:01 +0000 (15:52 +0100)]
ntp: suppress monotonic timestamp if smoothing is enabled
Frequency transfer and time smoothing are conflicting features. Set the
monotonic timestamp in the experimental extension field to zero
(invalid) if time smoothing is activated.
Miroslav Lichvar [Mon, 22 Nov 2021 10:39:29 +0000 (11:39 +0100)]
ntp: add special value to experimental root delay/disp
The maximum value of the new 32-bit fields is slightly less than 16,
which can cause the NTP test #7 to pass for a server which has a zero
root delay but maximum root dispersion.
Interpret the maximum value as the maximum value of the original 32-bit
fields (~65536.0 seconds) for better compatibility with NTPv4.