With some hardware it takes milliseconds to get the HW TX timestamp.
Rework the code to handle multiple suspended client-only sockets at the
same time in order to allow longer timeouts, which may overlap for
different sources. Instead of waiting for the first read event, simply
suspend the socket and create a timeout when the HW TX timestamp is
requested.
refclock_phc: support multiple extpps refclocks on one PHC
The Linux kernel (as of 6.2) has a shared queue of external timestamps
for all descriptors of the same PHC. If multiple refclocks using the
same PHC and the same or different channels were specified, some
refclocks didn't receive any or most of their timestamps, depending on
the rate and timing of the events (with the previous commit avoiding
blocking reads).
Track extpps-enabled refclocks in an array. Add PHC index to the PHC
instance. When a timestamp is read from the descriptor, provide it to
all refclocks that have the same PHC index and a channel matching the
event.
Make sure the timestamp is different from the previous one in case the
kernel is later improved to duplicate the timestamps for different
descriptors.
Reported-by: Matt Corallo <ntp-lists@mattcorallo.com>
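The dispatch described above can be sketched roughly as follows. This is
only an illustration: the structure, the array size, and the function names
are hypothetical and are not taken from the chrony sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not chrony's actual structures */
#define MAX_EXTPPS 16

struct extpps_refclock {
  int phc_index;      /* index of the PHC the refclock reads from */
  int channel;        /* external timestamping channel */
  long long last_ts;  /* last delivered timestamp (ns), to drop duplicates */
  int samples;        /* number of timestamps delivered so far */
};

static struct extpps_refclock *refclocks[MAX_EXTPPS];
static size_t n_refclocks;

/* Track an extpps-enabled refclock in the array */
void register_refclock(struct extpps_refclock *rc)
{
  assert(rc != NULL && n_refclocks < MAX_EXTPPS);
  refclocks[n_refclocks++] = rc;
}

/* Offer one timestamp, read from any descriptor of the PHC, to every
   tracked refclock with the same PHC index and a matching channel.
   Returns the number of refclocks that accepted it. */
int dispatch_timestamp(int phc_index, int channel, long long ts)
{
  int delivered = 0;
  size_t i;

  for (i = 0; i < n_refclocks; i++) {
    struct extpps_refclock *rc = refclocks[i];

    if (rc->phc_index != phc_index || rc->channel != channel)
      continue;
    /* Skip a timestamp identical to the previous one */
    if (rc->samples > 0 && rc->last_ts == ts)
      continue;
    rc->last_ts = ts;
    rc->samples++;
    delivered++;
  }

  return delivered;
}
```

With two refclocks on PHC 0 channel 0 and one on channel 1, a channel 0
event reaches the first two only, and the same timestamp offered again is
dropped by both.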
sys_linux: avoid blocking in reading of external PHC timestamp
The kernel has a common queue for all readers of a PHC device. With
multiple PHC refclocks using the same device some reads blocked. PHC
devices don't seem to support non-blocking reads. Use poll() to check if
a timestamp is available before reading from the descriptor.
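A minimal sketch of the poll()-before-read pattern, assuming a plain file
descriptor. The helper names are hypothetical; the actual code reads
PHC-specific event structures, which are left out here.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Return 1 if fd has data to read right now, 0 if not, -1 on error */
int fd_has_data(int fd)
{
  struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
  int r = poll(&pfd, 1, 0);   /* zero timeout: never block */

  if (r < 0)
    return -1;
  return (r > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Read only when poll() reports pending data. Returns the number of
   bytes read, 0 if nothing is pending, or -1 on error. */
ssize_t read_if_ready(int fd, void *buf, size_t len)
{
  int ready;

  assert(buf != NULL);
  ready = fd_has_data(fd);
  if (ready <= 0)
    return ready;
  return read(fd, buf, len);
}
```

The zero timeout makes the check return immediately, so a reader whose
event was already consumed by another descriptor skips the read instead of
blocking in it.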
Miroslav Lichvar [Mon, 27 Feb 2023 14:00:50 +0000 (15:00 +0100)]
ntp: don't adjust poll interval when waiting for NTS-KE
Don't adjust the NTP polling interval and decrement the burst count when
NAU_PrepareRequestAuth() fails (e.g. no NTS-KE response received yet,
network being down, or the server refusing connections), same as if an
NTP request could not be sent. Rely on the rate limiting implemented in
the NTS code.
Miroslav Lichvar [Thu, 23 Feb 2023 12:10:11 +0000 (13:10 +0100)]
nts: use shorter NTS-KE retry interval when network is down
When chronyd was started before the network was up, with an NTS source
that was not specified as offline and was resolvable without network, it
was using an unnecessarily long NTS-KE retry interval, the same as if the
server was refusing the connections.
When the network is down, the connect() call made from NKC_Start() on
the non-blocking TCP socket should fail with a different error than
EINPROGRESS and cause NKC_Start() to return with failure. Add a constant
2-second retry interval (matching default iburst) for this case.
Miroslav Lichvar [Thu, 23 Feb 2023 13:58:29 +0000 (14:58 +0100)]
nts: destroy NTS-KE client right after failed start
When NKC_Start() fails (e.g. due to unreachable network), don't wait for
the next poll to destroy the client and another poll to create and start
it again.
In a non-tty session with chronyc it is not possible to detect the
end of the response without relying on timeouts, or separate responses
to a repeated command if using the -c option.
Add -e option to end each response with a line containing a single dot.
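A program reading chronyc output from a pipe could detect the end of each
response along these lines. This is a sketch of the consumer side only; the
function names are hypothetical and lines are assumed to fit in the buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if the line consists of a single dot, with or without the
   trailing newline */
int is_end_of_response(const char *line)
{
  assert(line != NULL);
  return strcmp(line, ".\n") == 0 || strcmp(line, ".") == 0;
}

/* Consume one response from the stream and return its number of lines,
   not counting the terminating dot line. Returns -1 if the stream ended
   before the terminator was seen. */
int read_response(FILE *in)
{
  char line[1024];
  int lines = 0;

  while (fgets(line, sizeof line, in)) {
    if (is_end_of_response(line))
      return lines;
    lines++;
  }

  return -1;
}
```

Calling read_response() repeatedly on the same stream separates the
responses to a repeated command without relying on timeouts.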
sourcestats: don't fudge refclock LastRx in sources report
The sample time used in calculation of the last_meas_ago (LastRx) value
in the sources report is aligned to the second to minimize the leak
of the NTP receive timestamp, which could be useful in some attacks.
There is no need to do that with reference clocks, which are often used
with very short polling intervals and an extra second in the LastRx
value can be misinterpreted as a missed sample.
Miroslav Lichvar [Thu, 26 Jan 2023 15:21:11 +0000 (16:21 +0100)]
sources: warn about detected falsetickers
Log a warning message for each detected falseticker, but only once
between changes in the selection of the best source. Don't print all
sources when no majority is reached as that case has its own warning
message.
Miroslav Lichvar [Wed, 25 Jan 2023 13:29:06 +0000 (14:29 +0100)]
conf: warn if not having read-only access to keys
After dropping root privileges, log a warning message if chronyd
doesn't have read access or has (unnecessary) write access to the
files containing symmetric and server NTS keys.
Miroslav Lichvar [Thu, 19 Jan 2023 15:09:40 +0000 (16:09 +0100)]
keys+nts: warn if loading world-readable/writable key
Log a warning message if the file specified by the keyfile or
ntsserverkey directive is world-readable or writable, which is likely
an insecure misconfiguration. There is no check of directories
containing the file.
Miroslav Lichvar [Wed, 18 Jan 2023 15:14:10 +0000 (16:14 +0100)]
refclock: convert mismatched timeval in SOCK messages
On 32-bit glibc-based (>=2.34) systems, allow the SOCK client to send
messages with timevals using a different time_t size than chrony. If the
length of the received message corresponds to the other size, convert
the timeval and move the rest of the message before its processing.
This is needed for compatibility with the current development version of
gpsd, which forces 64-bit time_t on these systems, while chrony needs to
be compiled with the same time_t as gnutls.
Miroslav Lichvar [Thu, 12 Jan 2023 14:23:21 +0000 (15:23 +0100)]
doc: deprecate SHM refclocks in favor of SOCK
The NTP SHM refclock protocol has the following properties:
- the memory segments have a predictable key (first segment 0x4e545030)
- it's expected to work in any order of starting chronyd and the program
providing samples to chronyd, i.e. both the consumer and producer need
to be able to create the segment
- the producer and consumer generally don't know under which user the
other side is running (e.g. gpsd can create the segment as root and
also as nobody after it drops root privileges)
- there is no authentication of data provided via SHM
- there is no way to restart the protocol
This makes it difficult for chronyd to ensure it is receiving
measurements from the process that the admin expects it to and not some
other process that managed to create the segment before it was started.
It's up to the admin to configure the system so that chronyd or the
producer is started before untrusted applications or users can create
the segment, or at least verify at some point later that the segment was
created with the expected owner and permissions.
There doesn't seem to be a backward-compatible fix of the protocol. Even
if one side could detect the segment had a wrong owner or permissions,
it wouldn't be able to tell the other side to reattach after recreating
the segment with the expected owner and permissions, if it still had the
permissions to do that.
The protocol would need to specify which side is responsible for
creating the segment and the start order would need to strictly follow
that.
As the latest version of gpsd (likely the most common refclock source for
chronyd) now supports SOCK even for message-based timing, update the man
page and FAQ to deprecate SHM in favor of SOCK.
Miroslav Lichvar [Tue, 10 Jan 2023 14:02:49 +0000 (15:02 +0100)]
examples: add chronyd-restricted.service
This is a more restricted version of the chronyd service intended for
minimal NTP/NTS client configurations. The daemon is started without
root privileges and is allowed to write only to its own runtime, state,
and log directories. It cannot bind to privileged ports in order to
operate as an NTP server, or provide monitoring access over IPv4/IPv6.
It cannot use reference clocks, HW timestamping, RTC tracking, and other
features.
Miroslav Lichvar [Wed, 14 Dec 2022 14:15:41 +0000 (15:15 +0100)]
sources: add function to modify selection options
Add a function to add new selection options or remove existing options
specified in the configuration for both NTP sources and reference
clocks.
Provide a pair of IP address and reference ID to identify the source
depending on the type. Find the source directly in the array of sources
instead of going through the NSR hashtable for NTP sources to not
complicate it unnecessarily.
Log important changes from chronyc for auditing purposes.
Add log messages for:
- loaded symmetric keys and server NTS keys (logged also on start)
- modified maxupdateskew and makestep
- enabled/disabled local reference mode (logged also on start)
- reset time smoothing (logged also on clock steps)
- reset sources
Miroslav Lichvar [Tue, 15 Nov 2022 15:38:50 +0000 (16:38 +0100)]
ntp: log added and removed sources
Log a message when a single NTP source or pool of sources is added or
removed. Use the INFO severity if it's a result of a chronyc command or
(re)load of sourcefiles (which are assumed to change over time), and
DEBUG for other contexts, e.g. sources loaded from the config, sources
removed when pruning pools after reaching maxsources, and other parts of
normal operation.
Miroslav Lichvar [Tue, 15 Nov 2022 14:05:36 +0000 (15:05 +0100)]
logging: support context-specific severity
Allow messages to have severity set to INFO or DEBUG depending on the
context in which they are made to allow logging important changes made
from chronyc or sourcefile, but not spam the system log if those changes
are normally expected (e.g. specified in the config).
Miroslav Lichvar [Mon, 24 Oct 2022 14:14:35 +0000 (16:14 +0200)]
nts: warn if server started without ntsdumpdir
If an NTS server is configured without ntsdumpdir, keys will not be
saved and reloaded after restart, which will cause existing cookies
to be invalidated and can cause a short-term denial of service if
the server has so many clients that it cannot handle all of them
starting an NTS-KE session within one polling interval.
Log a warning message if a server key+certificate is specified without
ntsdumpdir.
Miroslav Lichvar [Wed, 19 Oct 2022 12:57:16 +0000 (14:57 +0200)]
nts: fix number of extension fields after failed encryption
If the authenticator SIV encryption fails (e.g. due to wrong nonce
length), decrement the number of extension fields to keep the packet
info consistent.
Miroslav Lichvar [Thu, 13 Oct 2022 13:35:53 +0000 (15:35 +0200)]
nts: change ntskeys format to support different algorithms
Specify the AEAD ID for each key saved in the ntskeys file instead of
one ID for all keys. Keep support for loading files in the old format.
This will allow servers to save their keys after upgrading to a new
version with AES-128-GCM-SIV support before the loaded AES-SIV-CMAC-256
keys are rotated out.
If an unsupported key is found, don't load any keys. Also, change the
severity of the error message from debug to error.
Miroslav Lichvar [Wed, 12 Oct 2022 14:46:56 +0000 (16:46 +0200)]
nts: add support for encrypting cookies with AES-128-GCM-SIV
If AES-128-GCM-SIV is available on the server, use it for encryption of
cookies. This makes them shorter by 4 bytes due to shorter nonce and it
might also improve the server performance.
After server upgrade and restart with ntsdumpdir, the switch will happen
on the second rotation of the server key. Clients should accept shorter
cookies without restarting NTS-KE. The first response will have extra
padding in the authenticator field to make the length symmetric.
Miroslav Lichvar [Tue, 11 Oct 2022 12:36:14 +0000 (14:36 +0200)]
nts: add server support for authentication with AES-128-GCM-SIV
Keep a server SIV instance for each available algorithm.
Select AES-128-GCM-SIV if requested by NTS-KE client as the first
supported algorithm.
Instead of encoding the AEAD ID in the cookie, select the algorithm
according to the length of decrypted keys. (This can work as long as
all supported algorithms use keys with different lengths.)
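The selection by length could look roughly like this. The AEAD IDs are the
values from the IANA AEAD registry and the key lengths are those of the two
algorithms. The function names are hypothetical, and the sketch assumes the
decrypted cookie holds just the C2S and S2C keys back to back.

```c
#include <assert.h>

/* AEAD algorithm identifiers from the IANA AEAD registry */
#define AEAD_AES_SIV_CMAC_256 15
#define AEAD_AES_128_GCM_SIV 30

/* Return the key length in bytes, or 0 for an unsupported algorithm */
int aead_key_length(int aead_id)
{
  switch (aead_id) {
    case AEAD_AES_SIV_CMAC_256:
      return 32;
    case AEAD_AES_128_GCM_SIV:
      return 16;
    default:
      return 0;
  }
}

/* Infer the algorithm from the total length of the two decrypted keys.
   Returns the AEAD ID, or 0 if no supported algorithm matches. */
int aead_from_keys_length(int keys_length)
{
  static const int ids[] = { AEAD_AES_128_GCM_SIV, AEAD_AES_SIV_CMAC_256 };
  unsigned int i;

  assert(keys_length >= 0);
  for (i = 0; i < sizeof ids / sizeof ids[0]; i++) {
    if (keys_length == 2 * aead_key_length(ids[i]))
      return ids[i];
  }

  return 0;
}
```

A 64-byte pair of keys maps to AES-SIV-CMAC-256 and a 32-byte pair to
AES-128-GCM-SIV. Adding an algorithm with the same key length as an
existing one would make this mapping ambiguous.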
Miroslav Lichvar [Mon, 10 Oct 2022 14:35:20 +0000 (16:35 +0200)]
nts: add client support for authentication with AES-128-GCM-SIV
If AES-128-GCM-SIV is available on the client, add it to the requested
algorithms in NTS-KE as the first (preferred) entry.
If supported on the server, it will make the cookies shorter, which
will get the length of NTP messages containing only one cookie below
200 octets. This should make NTS more reliable in networks where longer
NTP packets are filtered as a mitigation against amplification attacks
exploiting the ntpd mode 6/7 protocol.
Miroslav Lichvar [Tue, 11 Oct 2022 10:32:04 +0000 (12:32 +0200)]
nts: make sure encrypted S2C and C2S keys have equal length
Don't allow a cookie to contain keys with different lengths to not break
the assumption made in decoding, if there will ever be a case where this
could be requested.
Miroslav Lichvar [Mon, 10 Oct 2022 10:25:47 +0000 (12:25 +0200)]
siv: add functions to return min and max nonce length
While AES-SIV-CMAC allows nonces of any length, AES-GCM-SIV requires
exactly 12 bytes, which is less than the unpadded minimum length of 16
used in the NTS authenticator field. These functions will be needed to
support both ciphers in the NTS code.
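A sketch of how such a pair of functions might be used to pick a nonce
length. The names and the enum are hypothetical, a minimum of 1 for
AES-SIV-CMAC is an assumption, and INT_MAX only stands in for "any length".

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical cipher identifiers for illustration */
enum siv_cipher { CIPHER_AES_SIV_CMAC, CIPHER_AES_GCM_SIV };

/* Shortest nonce accepted by the cipher, in bytes */
int min_nonce_length(enum siv_cipher cipher)
{
  return cipher == CIPHER_AES_GCM_SIV ? 12 : 1;
}

/* Longest nonce accepted by the cipher, in bytes */
int max_nonce_length(enum siv_cipher cipher)
{
  return cipher == CIPHER_AES_GCM_SIV ? 12 : INT_MAX;
}

/* Clamp the preferred nonce length to what the cipher accepts */
int pick_nonce_length(enum siv_cipher cipher, int preferred)
{
  int min = min_nonce_length(cipher), max = max_nonce_length(cipher);

  assert(min <= max);
  if (preferred < min)
    return min;
  if (preferred > max)
    return max;
  return preferred;
}
```

With a preferred length of 16 bytes, the sketch returns 16 for AES-SIV-CMAC
and 12 for AES-GCM-SIV.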
The arc4random family of functions was added in glibc 2.36. However,
unlike on other supported systems, it is not a user-space PRNG
implementation. It just wraps the getrandom() system call with no
buffering, which causes a performance loss on NTP servers due to
the function being called twice for each response to add randomness
to the RX and TX timestamp below the clock precision.
Don't check for arc4random on Linux to keep using the buffered
getrandom().
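To illustrate why buffering matters, here is a minimal sketch of a buffered
getrandom() wrapper on Linux. The buffer size and all names are assumptions
made for the example and do not describe the actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/random.h>

#define RANDOM_BUF_SIZE 256   /* assumed size, one system call per refill */

static unsigned char random_buf[RANDOM_BUF_SIZE];
static size_t random_avail;   /* unused bytes left at the end of the buffer */

/* Copy len random bytes to out, calling getrandom() only when the buffer
   does not hold enough unused bytes. Returns 0 on success, -1 on error. */
int get_random_bytes(void *out, size_t len)
{
  assert(out != NULL && len <= RANDOM_BUF_SIZE);

  if (random_avail < len) {
    if (getrandom(random_buf, sizeof random_buf, 0) !=
        (ssize_t)sizeof random_buf)
      return -1;
    random_avail = sizeof random_buf;
  }

  memcpy(out, random_buf + sizeof random_buf - random_avail, len);
  random_avail -= len;

  return 0;
}
```

With 8-byte requests, this sketch makes one system call per 32 requests
instead of one per request.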
Replace NULL in test code of functions which have (at least in glibc) or
could have arguments marked as nonnull to avoid the -Wnonnull warnings,
which break the detection with the -Werror option.
test: fix ntp_core unit test to disable source selection
If the randomly generated timestamps are close to the current time, the
source can be selected for synchronization, which causes a crash when
logging the source name due to uninitialized ntp_sources.
Specify the source with the noselect option to prevent selection.
Miroslav Lichvar [Thu, 21 Jul 2022 13:16:47 +0000 (15:16 +0200)]
ntp: add maxdelayquant option
Add a new test for maximum delay using a long-term estimate of a
p-quantile of the peer delay. If enabled, it replaces the
maxdelaydevratio test. Its main advantage is that it is not sensitive
to outliers corrupting the minimum delay.
As it can take a large number of samples for the estimate to reach the
expected value and adapt to a new value after a network change, the
option is recommended only for local networks with very short polling
intervals.
Miroslav Lichvar [Tue, 19 Jul 2022 14:28:32 +0000 (16:28 +0200)]
ntp: rework filter option to count missing samples
Instead of waiting for the sample filter to accumulate the specified
number of samples and then deciding if the result is acceptable, count
missing samples and get the result after the specified number of polls.
This should work better when samples are dropped at a high rate. The
source and clock update interval will be stable as long as at least
one sample can be collected.
Miroslav Lichvar [Mon, 18 Jul 2022 10:50:05 +0000 (12:50 +0200)]
ntp: enable sub-second poll sooner with filter option
When the minimum round-trip time is checked to enable a sub-second
polling interval, consider also the last sample in the filter to avoid
waiting for the first sample to be accumulated in sourcestats.
Miroslav Lichvar [Mon, 18 Jul 2022 10:43:13 +0000 (12:43 +0200)]
ntp: fix initial poll to follow non-LAN minimum
If a sub-second polling interval is configured, initialize the local
poll to 0 to avoid a shorter interval between the first and second
request in case no response to the first request is received (in time).
Miroslav Lichvar [Thu, 14 Jul 2022 12:51:24 +0000 (14:51 +0200)]
client: check for stdout errors
Return with an error code from chronyc if the command is expected to
print some data and fflush() or ferror() indicates an error. This should
make it easier for scripts to detect missing data when redirected to a
file.
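The check amounts to something like the following sketch. The function name
is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Return 1 if all data written to the stream so far was flushed without
   an error, 0 otherwise */
int output_ok(FILE *out)
{
  assert(out != NULL);

  /* fflush() fails if the buffered data cannot be written, e.g. when the
     file system holding the redirected output is full */
  if (fflush(out) != 0)
    return 0;

  /* ferror() catches errors recorded by earlier writes */
  return !ferror(out);
}
```

A command that printed data would then exit with a non-zero code when
output_ok(stdout) returns 0.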
Filtering was moved to a separate source file in commit c498c21fad35
("refclock: split off median filter"). It looks like the MedianFilter
struct somehow survived the split. Remove it to reduce confusion.
Miroslav Lichvar [Thu, 30 Jun 2022 08:18:48 +0000 (10:18 +0200)]
ntp: don't use first response in interleaved mode
With the first interleaved response coming after a basic response the
client is forced to select the four timestamps covering most of the last
polling interval, which makes measured delay very sensitive to the
frequency offset between server and client. To avoid corrupting the
minimum delay held in sourcestats (which can cause testC failures),
reject the first interleaved response in the client/server mode as
failing the test A.
This does not change anything for the symmetric mode, where both sets of
the four timestamps generally cover a significant part of the polling
interval.
Miroslav Lichvar [Tue, 14 Jun 2022 14:31:22 +0000 (16:31 +0200)]
sys_generic: damp slew oscillation due to delayed stop
If the computer is overloaded so much that chronyd cannot stop a slew
within one second of the scheduled end and the actual duration is more
than doubled (2 seconds with the minimum duration of 1 second), the
overshoot will be larger than the intended correction. If these
conditions persist, the oscillation will grow up to the maximum offset
allowed by maxslewrate and the delay in stopping.
Monitor the excess duration as an exponentially decaying maximum value
and don't allow any slews shorter than 5 times the value to damp the
oscillation. Ignore delays longer than 100 seconds, assuming they have a
different cause (e.g. the system was suspended and resumed) and are
already handled in the scheduler by triggering cancellation of the
ongoing slew.
This should also make it safer to shorten the minimum duration if
needed.
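The damping can be sketched as below. The factor of 5 and the 100-second
limit come from the description above. The decay constant and the names are
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EXCESS_DURATION 100.0  /* longer delays are ignored (seconds) */
#define MIN_SLEW_FACTOR 5.0        /* minimum slew = 5 * tracked excess */
#define EXCESS_DECAY 0.9           /* assumed decay per update */

/* Track the excess slew duration as an exponentially decaying maximum */
void update_max_excess(double *max_excess, double excess)
{
  double decayed;

  assert(max_excess != NULL);

  /* Very long delays are assumed to have a different cause */
  if (excess > MAX_EXCESS_DURATION)
    return;

  decayed = *max_excess * EXCESS_DECAY;
  *max_excess = excess > decayed ? excess : decayed;
}

/* Lengthen a planned slew that would be too short to damp the oscillation */
double clamp_slew_duration(double duration, double max_excess)
{
  double min = MIN_SLEW_FACTOR * max_excess;

  return duration < min ? min : duration;
}
```

After a single 2-second overrun, a planned 1-second slew would be stretched
to 10 seconds, and the limit relaxes gradually as the tracked maximum
decays.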
Estimate the 1st and 2nd 10-quantile of the reading delay and accept
only readings between them unless the error of the offset predicted from
previous samples is larger than the minimum reading error. With the 25
PHC readings per ioctl it should combine about 2-3 readings.
This should improve hwclock tracking and synchronization stability when
a PHC reading delay occasionally falls below the normal expected
minimum, or all readings in the batch are delayed significantly (e.g.
due to high PCIe load).
Miroslav Lichvar [Wed, 18 May 2022 10:16:33 +0000 (12:16 +0200)]
quantiles: add support for quantile estimation
Add estimation of quantiles using the Frugal-2U streaming algorithm
(https://arxiv.org/pdf/1407.1121v1.pdf). It does not need to save
previous samples and adapts to changes in the distribution.
Allow multiple estimates of the same quantile and select the median for
better stability.
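To give an idea of how a frugal streaming estimate works, here is a sketch
of the simpler Frugal-1U variant from the same paper, not the Frugal-2U
algorithm added by this commit. Frugal-2U additionally adapts the step
size. The structure and names are hypothetical, and the random number is
passed in by the caller to keep the example deterministic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state of one estimate: only the current value and a fixed
   step are kept, no previous samples */
struct quantile_estimate {
  double value;
  double step;
};

/* Update the estimate of the p-quantile with one sample. r is a random
   number drawn uniformly from [0, 1) for this update. */
void update_estimate(struct quantile_estimate *q, double sample,
                     double p, double r)
{
  assert(q != NULL && p > 0.0 && p < 1.0);
  assert(r >= 0.0 && r < 1.0);

  /* Move up with probability p when the sample is above the estimate */
  if (sample > q->value && r > 1.0 - p)
    q->value += q->step;
  /* Move down with probability 1 - p when the sample is below it */
  else if (sample < q->value && r > p)
    q->value -= q->step;
}
```

The estimate drifts towards the value at which upward and downward moves
are equally likely, which is the requested quantile of the recent samples.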
Move processing of PHC readings from sys_linux to hwclock, where
statistics can be collected and filtering improved.
In the PHC refclock driver accumulate the samples even if not in the
external timestamping mode to update the context which will be needed
for improved filtering.
Increase the number of requested readings from 10 to 25 - the maximum
accepted by the PTP_SYS_OFFSET* ioctls. This should improve stability of
HW clock tracking and PHC refclock.
Miroslav Lichvar [Thu, 19 May 2022 08:09:08 +0000 (10:09 +0200)]
doc: improve hwtimestamp description
Latest versions of ethtool print only the shorter lower-case names of
capabilities and filters. Explain that chronyd doesn't synchronize the
PHC and refer to the new vclock feature of the kernel, which should be
used by applications that need a synchronized PHC (e.g. ptp4l and
phc2sys) in order to not interfere with chronyd.
Miroslav Lichvar [Thu, 12 May 2022 09:53:15 +0000 (11:53 +0200)]
local: cancel remaining correction after external step
Instead of the generic clock driver silently zeroing the remaining
offset after detecting an external step, cancel it properly with the
slew handlers in order to correct timestamps that are not reset in
handling of the unknown step (e.g. the NTP local TX).
Miroslav Lichvar [Wed, 11 May 2022 09:53:07 +0000 (11:53 +0200)]
refclock: set minimum maxlockage in local mode
Use 3 as the minimum maxlockage in the local mode to avoid disruptions
due to losing the lock when a single sample is missed, e.g. when the PPS
driver polling interval is slightly longer than the pulse interval and a
pulse is skipped.