Miroslav Lichvar [Tue, 12 Sep 2023 08:18:01 +0000 (10:18 +0200)]
siv: add support for AES-GCM-SIV in gnutls
Add support for AES-128-GCM-SIV in the current development code of
gnutls. There doesn't seem to be an API to get the cipher's minimum and
maximum nonce length and it doesn't check for invalid lengths. Hardcode
and check the limits in chrony for now.
Miroslav Lichvar [Mon, 11 Sep 2023 13:29:04 +0000 (15:29 +0200)]
conf: fix reloading modified sources specified by IP address
When reloading a modified source from sourcedir which is ordered before
the original source (e.g. maxpoll was decreased), the new source is
added before the original one is removed. If the source is specified by
IP address, the addition fails due to the conflict with the original
source. Sources specified by hostname don't conflict. They are resolved
later (repeatedly if the resolver provides only conflicting addresses).
Split the processing of sorted source lists into two phases, so all
modified sources are removed before they are added again to avoid the
conflict.
socket: enable nanosecond resolution RX timestamp on FreeBSD
FreeBSD allows switching the receive timestamp format to struct timespec by
setting the SO_TS_CLOCK socket option to SO_TS_REALTIME after enabling
SO_TIMESTAMP. If successful, the kernel then starts adding SCM_REALTIME
control messages instead of SCM_TIMESTAMP.
Avoid frequently ending in the middle of a client/server exchange with
long delays. This changed after commit 4a11399c2ebb ("ntp: rework
calculation of transmit timeout").
ntp: don't require previous HW TX timestamp to wait for another
Clients sockets are closed immediately after receiving valid response.
Don't wait for the first early HW TX timestamp to enable waiting for
late timestamps. It may take a long time or never come if the HW/driver
is consistently slow. It's a chicken and egg problem.
Instead, simply check if HW timestamping is enabled on at least one
interface. Responses from NTP sources on other interfaces will always be
saved (for 1 millisecond by default).
If noselect is present in the configured options, don't assume it
cannot change and the effective options are equal. This fixes chronyc
selectopts +noselect command.
Fixes: 38777348143e ("sources: add function to modify selection options")
Miroslav Lichvar [Mon, 17 Jul 2023 14:22:19 +0000 (16:22 +0200)]
ntp: handle negotiated NTS-KE server in refreshment
When refreshing a source, compare the newly resolved addresses with the
originally resolved address instead of the current address to avoid
unnecessary replacements when the address is changed due to the NTS-KE
server negotiation.
Miroslav Lichvar [Mon, 26 Jun 2023 11:20:22 +0000 (13:20 +0200)]
makefile: compile getdate.o with -fwrapv option
The getdate code (copied from gnulib before it was switched to GPLv3)
has multiple issues with signed integer overflows. Use the -fwrapv
compiler option for this object to at least make the operations defined.
Miroslav Lichvar [Tue, 20 Jun 2023 14:23:34 +0000 (16:23 +0200)]
ntp: refresh IP addresses periodically
Refresh NTP sources specified by hostname periodically (every 2 weeks
by default) to avoid long-running instances using a server which is no
longer intended for service, even if it is still responding correctly
and would not be replaced as unreachable, and help redistributing load
in large pools like pool.ntp.org. Only one source is refreshed at a time
to not interrupt clock updates if there are multiple selectable servers.
The refresh directive configures the interval. A value of 0 disables
the periodic refreshment.
Miroslav Lichvar [Mon, 19 Jun 2023 14:10:45 +0000 (16:10 +0200)]
sched: reset timer queue in finalization
Don't leave dangling pointers to timer queue entries when they are
freed in the scheduler finalization in case some code tried to remove
a timer later.
Fixes: 6ea1082a72d8 ("sched: free timer blocks on exit")
Miroslav Lichvar [Thu, 15 Jun 2023 10:54:32 +0000 (12:54 +0200)]
sources: delay source replacement
Wait for four consecutive source selections giving a bad status
(falseticker, bad distance or jittery) before triggering the source
replacement. This should reduce the rate of unnecessary replacements
and shorten the time needed to find a solution when unreplaceable
falsetickers are preventing other sources from forming a majority due
to switching back and forth to unreachable servers.
Miroslav Lichvar [Wed, 14 Jun 2023 12:52:10 +0000 (14:52 +0200)]
sources: replace reachable sources in selection
Instead of waiting for the next update of reachability, trigger
replacement of falsetickers, jittery and distant sources as soon as
the selection status is updated in their SRC_SelectSource() call.
Miroslav Lichvar [Mon, 12 Jun 2023 14:11:10 +0000 (16:11 +0200)]
main: wait for parent process to terminate
When starting the daemon, wait in the grandparent process for the parent
process to terminate before exiting to avoid systemd logging a warning
"Supervising process $PID which is not our child". Waiting for the pipe
to be closed by the kernel when the parent process exits is not
sufficient.
Reported-by: Jan Pazdziora <jpazdziora@redhat.com>
Replacement attempts are globally rate limited to one per 7*2^8 seconds
to limit the rate of DNS requests for public servers like pool.ntp.org.
If multiple sources are repeatedly attempting replacement (at their
polling intervals), one source can be getting all attempts for periods
of time.
Use a randomly generated interval to randomize the order of source
replacements without changing the average rate.
client: avoid passing uninitialized address to format_name()
The clang memory sanitizer seems to trigger on an uninitialized value
passed to format_name() when the source is a refclock, even though the
value is not used for anything. Pass 0 in this case to avoid the error.
memory: use free() instead of realloc() for size 0
valgrind 3.21.0 reports realloc() of 0 bytes as an error due to having
different behavior on different systems. The only place where this can
happen in chrony is the array, which doesn't care what value realloc()
returns.
Modify the realloc wrapper to call free() in this case to make valgrind
happy.
Miroslav Lichvar [Mon, 29 May 2023 12:12:54 +0000 (14:12 +0200)]
nts: don't load zero-length keys with unsupported algorithm
Don't load keys and cookies from the client's dump file if it has an
unsupported algorithm and unparseable keys (matching the algorithm's
expected length of zero). They would fail all SIV operations and trigger
new NTS-KE session.
Miroslav Lichvar [Tue, 23 May 2023 13:37:06 +0000 (15:37 +0200)]
nts: initialize unused part of server key
Initialize the unused part of shorter server NTS keys (AES-128-GCM-SIV)
loaded from ntsdumpdir to avoid sending uninitialized data in requests
to the NTS-KE helper process.
Do that also for newly generated keys in case the memory will be
allocated dynamically.
Fixes: b1230efac333 ("nts: add support for encrypting cookies with AES-128-GCM-SIV")
Miroslav Lichvar [Mon, 22 May 2023 09:58:41 +0000 (11:58 +0200)]
ntp: randomize address selection on all source replacements
If the resolver orders addresses by IP family, there is more than one
address in the preferred IP family, and they are all reachable, but
not selectable (e.g. falsetickers in a small pool which cannot remove
them from DNS), chronyd is unable to switch to addresses in the other IP
family as it follows the resolver's order.
Enable randomization of the address selection for all source
replacements and not just replacement of (unreachable) tentative
sources. If the system doesn't have connectivity in the other family,
the addresses will be skipped and no change in behavior should be
observed.
Miroslav Lichvar [Wed, 17 May 2023 14:37:55 +0000 (16:37 +0200)]
ntp: set minimum polltarget
The polltarget value is used in a floating-point division in the
calculation of the poll adjustment. Set 1 as the minimum accepted
polltarget value to avoid working with infinite values.
Miroslav Lichvar [Tue, 16 May 2023 13:11:22 +0000 (15:11 +0200)]
ntp: reset polling interval when replacing sources
Set the polling interval to minpoll when changing address of a source,
but only if it is reachable to avoid increasing load on server or
network in case that is the reason for the source being unreachable.
This shortens the time needed to replace a falseticker or
unsynchronized source with a selectable source.
Miroslav Lichvar [Mon, 15 May 2023 14:26:21 +0000 (16:26 +0200)]
ntp: avoid unneccessary replacements on refresh command
When the refresh command is issued, instead of trying to replace all
NTP sources as if they were unreachable or falsetickers, keep using the
current address if it is still returned by the resolver for the name.
This avoids unnecessary loss of measurements and switching to
potentially unreachable addresses.
Miroslav Lichvar [Tue, 28 Mar 2023 13:33:50 +0000 (15:33 +0200)]
ntp: save response when waiting for HW TX timestamp
Rework handling of late HW TX timestamps. Instead of suspending reading
from client-only sockets that have HW TX timestamping enabled, save the
whole response if it is valid and a HW TX timestamp was received for the
source before. When the timestamp is received, or the configurable
timeout is reached, process the saved response again, but skip the
authentication test as the NTS code allows only one response per
request. Only one valid response per source can be saved. If a second
valid response is received while waiting for the timestamp, process both
responses immediately in the order they were received.
The main advantage of this approach is that it works on all sockets, i.e.
even in the symmetric mode and with NTP-over-PTP, and the kernel does
not need to buffer invalid responses.
Miroslav Lichvar [Thu, 23 Mar 2023 16:04:57 +0000 (17:04 +0100)]
ntp: rework calculation of transmit timeout
Previously, in the calculation of the next transmission time
corresponding to the current polling interval, the reference point was
the current time in the client mode (i.e. the time when the response is
processed) and the last transmission time in the symmetric mode.
Rework the code to use the last transmission in both modes and make it
independent from the time when the response is processed to avoid extra
delays due to waiting for HW TX timestamps.
Miroslav Lichvar [Thu, 23 Mar 2023 10:37:11 +0000 (11:37 +0100)]
cmdmon: define 64-bit integer
Add a structure for 64-bit integers without requiring 64-bit alignment
to be usable in CMD_Reply without struct packing.
Add utility functions for conversion to/from network order. Avoid using
be64toh() and htobe64() as they don't seem to be available on all
supported systems.
Miroslav Lichvar [Thu, 16 Mar 2023 15:56:28 +0000 (16:56 +0100)]
clientlog: count RX and TX timestamps for each source
Count served timestamps in all combinations of RX/TX and
daemon/kernel/hardware. Repurpose CLG_LogAuthNtpRequest() to update all
NTP-specific stats in one call per accepted request and response.
Miroslav Lichvar [Thu, 16 Mar 2023 16:19:33 +0000 (17:19 +0100)]
ntp: remove unnecessary check for NULL local timestamp
After 5f4cbaab7e0e ("ntp: optimize detection of clients using
interleaved mode") the local TX timestamp is saved for all requests
indicating interleaved mode even when no previous RX timestamp is found.
Miroslav Lichvar [Tue, 14 Mar 2023 11:23:21 +0000 (12:23 +0100)]
ntp: add maximum PHC poll interval
Specify maxpoll for HW timestamping (default minpoll + 1) to track the
PHC well even when there is little NTP traffic on the interface. After
each PHC reading schedule a timeout according to the maxpoll. Polling
between minpoll and maxpoll is still triggered by HW timestamps.
Wait for the first HW timestamp before adding the timeout to avoid
polling PHCs on interfaces that are enabled in the configuration but
not used for NTP. Add a new scheduling class to separate polling of
different PHCs to avoid too long intervals between processing I/O
events.
In some cases even the new timeout of 1 millisecond is not sufficient to
get all HW TX timestamps. Add a new directive to allow users to
specify longer timeouts.
With some hardware it takes milliseconds to get the HW TX timestamp.
Rework the code to handle multiple suspended client-only sockets at the
same time in order to allow longer timeouts, which may overlap for
different sources. Instead of waiting for the first read event simply
suspend the socket and create timeout when the HW TX timestamp is
requested.
refclock_phc: support multiple extpps refclocks on one PHC
The Linux kernel (as of 6.2) has a shared queue of external timestamps
for all descriptors of the same PHC. If multiple refclocks using the
same PHC and the same or different channels were specified, some
refclocks didn't receive any or most of their timestamps, depending on
the rate and timing of the events (with the previous commit avoiding
blocking reads).
Track extpps-enabled refclocks in an array. Add PHC index to the PHC
instance. When a timestamp is read from the descriptor, provide it to
all refclocks that have the same PHC index and a channel matching the
event.
Make sure the timestamp is different from the previous one in case the
kernel will be improved to duplicate the timestamps for different
descriptors.
Reported-by: Matt Corallo <ntp-lists@mattcorallo.com>
sys_linux: avoid blocking in reading of external PHC timestamp
The kernel has a common queue for all readers of a PHC device. With
multiple PHC refclocks using the same device some reads blocked. PHC
devices don't seem to support non-blocking reads. Use poll() to check if
a timestamp is available before reading from the descriptor.
Miroslav Lichvar [Mon, 27 Feb 2023 14:00:50 +0000 (15:00 +0100)]
ntp: don't adjust poll interval when waiting for NTS-KE
Don't adjust the NTP polling interval and decrement the burst count when
NAU_PrepareRequestAuth() fails (e.g. no NTS-KE response received yet,
network being down, or the server refusing connections), same as if an
NTP request could not be sent. Rely on the rate limiting implemented in
the NTS code.
Miroslav Lichvar [Thu, 23 Feb 2023 12:10:11 +0000 (13:10 +0100)]
nts: use shorter NTS-KE retry interval when network is down
When chronyd configured with an NTS source not specified as offline and
resolvable without network was started before the network was up, it was
using an unnecessarily long NTS-KE retry interval, same as if the server
was refusing the connections.
When the network is down, the connect() call made from NKC_Start() on
the non-blocking TCP socket should fail with a different error than
EINPROGRESS and cause NKC_Start() to return with failure. Add a constant
2-second retry interval (matching default iburst) for this case.
Miroslav Lichvar [Thu, 23 Feb 2023 13:58:29 +0000 (14:58 +0100)]
nts: destroy NTS-KE client right after failed start
When NKC_Start() fails (e.g. due to unreachable network), don't wait for
the next poll to destroy the client and another poll to create and start
it again.
In a non-tty session with chronyc it is not possible to detect the
end of the response without relying on timeouts, or separate responses
to a repeated command if using the -c option.
Add -e option to end each response with a line containing a single dot.
sourcestats: don't fudge refclock LastRx in sources report
The sample time used in calculation of the last_meas_ago (LastRx) value
in the sources report is aligned to the second to minimize the leak
of the NTP receive timestamp, which could be useful in some attacks.
There is no need to do that with reference clocks, which are often used
with very short polling intervals and an extra second in the LastRx
value can be misinterpreted as a missed sample.
Miroslav Lichvar [Thu, 26 Jan 2023 15:21:11 +0000 (16:21 +0100)]
sources: warn about detected falsetickers
Log a warning message for each detected falseticker, but only once
between changes in the selection of the best source. Don't print all
sources when no majority is reached as that case has its own warning
message.
Miroslav Lichvar [Wed, 25 Jan 2023 13:29:06 +0000 (14:29 +0100)]
conf: warn if not having read-only access to keys
After dropping root privileges, log a warning message if chronyd
doesn't have read access or has (unnecessary) write access to the
files containing symmetric and server NTS keys.