From: Aurelien DARRAGON
Date: Thu, 19 Feb 2026 12:53:11 +0000 (+0100)
Subject: BUG/MEDIUM: stats-file: detect and fix inconsistent shared clock when resuming from...
X-Git-Tag: v3.4-dev5~2
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=09bf1162422c19a31443b1338e2520f8191fc960;p=thirdparty%2Fhaproxy.git

BUG/MEDIUM: stats-file: detect and fix inconsistent shared clock when resuming from shm-stats-file

When leveraging shm-stats-file, global_now_ms and global_now_ns are
stored (and thus shared) inside the shared map, so that all co-processes
share the same clock source.

Since the global_now_{ns,ms} clocks are derived from now_ns, and given
that now_ns is a monotonic clock (hence inconsistent from one host to
another, or reset after reboot), special care must be taken to detect
situations where the clock stored in the shared map is inconsistent
with the one from the local process during startup, and cannot be
relied upon anymore.

A common situation where the current implementation fails is resuming
from a shared file after reboot: the global_now_ns stored in the
shm-stats-file will be greater than the local now_ns after reboot, and
applying the shared offset doesn't help since it was only relevant to
processes prior to rebooting.

Haproxy's clock code doesn't expect that (once the now offset is
applied) global_now_ns > now_ns, and this creates an ambiguous situation
where the clock computations (both haproxy oriented and shm-stats-file
oriented) are broken.

To fix the issue, when we detect that the clock stored in the shm is
off by more than SHM_STATS_FILE_HEARTBEAT_TIMEOUT (60s) from the local
now_ns, since this situation is not supposed to happen in a normal
environment on the host, we assume that the shm file was previously
used on a different system (or that the current host rebooted).
In this case, we perform a manual adjustment of the now offset so that
the monotonic clock from the current host is consistent again with the
global_now_ns stored in the file. Doing so, we can ensure that
clock-dependent objects (such as freq_counters) stored within the map
will keep working as if we just (re)started where we left off when the
last process stopped updating the map.

Normally it is not expected that we update the now offset stored in the
map once the map was already created (because of concurrent accesses to
the file when multiple processes are attached to it), but in this
specific case, we know we are the first process on this host to start
working (again) on the file, thus we update the offset as if we created
the file ourselves, while keeping existing content.

It should be backported to 3.3.
---

diff --git a/src/stats-file.c b/src/stats-file.c
index 6942fe45f..2559277a1 100644
--- a/src/stats-file.c
+++ b/src/stats-file.c
@@ -925,6 +925,38 @@ int shm_stats_file_prepare(void)
 	 * is no longer relevant. So we fix it by applying the one from the
 	 * initial process instead
 	 */
+	if (HA_ATOMIC_LOAD(global_now_ns) >
+	    (now_ns + adjt_offset) +
+	    (unsigned long)SHM_STATS_FILE_HEARTBEAT_TIMEOUT * 1000 * 1000 * 1000) {
+		/* global_now_ns (which is supposed to be monotonic, as
+		 * with now_ns) is inconsistent with local now_ns (off by
+		 * more than SHM_STATS_FILE_HEARTBEAT_TIMEOUT seconds): global
+		 * is too far ahead of local while they are supposed to be close
+		 * to each other. A possible cause for that is that we are
+		 * resuming from a shm-stats-file which was generated on another
+		 * host or after a system reboot (monotonic clock is reset
+		 * between reboots)
+		 *
+		 * Since we cannot work with inconsistent global and local
+		 * now_ns, to prevent existing shared records that depend on
+		 * the shared global_now_ns from becoming obsolete, we manually
+		 * adjust the now_offset so that local now_ns is consistent
+		 * with the global one.
+		 */
+		now_ns -= clock_get_now_offset();
+		adjt_offset = HA_ATOMIC_LOAD(global_now_ns) - now_ns;
+
+		/*
+		 * While we are normally not supposed to change the shm-stats-file
+		 * offset once it was set, we make an exception here as we
+		 * can safely consider we are the only process working on the
+		 * file (after reboot or host migration). Doing this ensures
+		 * future processes from the same host will use the corrected
+		 * offset right away.
+		 */
+		shm_stats_file_hdr->now_offset = adjt_offset;
+	}
+
 	now_ns = now_ns + adjt_offset;
 	start_time_ns = start_time_ns + adjt_offset;
 	clock_set_now_offset(shm_stats_file_hdr->now_offset);
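
The offset arithmetic performed by the hunk above can be illustrated
outside of HAProxy with a minimal, standalone sketch. Everything in it
is hypothetical and simplified for illustration: struct clock_view,
shared_clock_is_stale(), resume_clock() and the HEARTBEAT_TIMEOUT_S
constant are not HAProxy names, and the sketch assumes a raw monotonic
reading with no local offset applied yet, whereas the real code strips
the local offset first through clock_get_now_offset().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for SHM_STATS_FILE_HEARTBEAT_TIMEOUT (60s). */
#define HEARTBEAT_TIMEOUT_S 60
#define NS_PER_S 1000000000ULL

/* Illustrative view of the clocks involved; not HAProxy's real layout. */
struct clock_view {
	uint64_t raw_now_ns;    /* local monotonic clock, no offset applied */
	uint64_t shared_now_ns; /* global_now_ns as found in the shared map */
	int64_t  offset_ns;     /* now offset stored in the shared map */
};

/* True when the shared clock is ahead of the offset-adjusted local clock
 * by more than the heartbeat timeout, which suggests the map comes from
 * a previous boot or from another host.
 */
static bool shared_clock_is_stale(const struct clock_view *c)
{
	uint64_t local = c->raw_now_ns + (uint64_t)c->offset_ns;

	return c->shared_now_ns > local + HEARTBEAT_TIMEOUT_S * NS_PER_S;
}

/* Returns the adjusted local clock. When the shared clock is stale, the
 * stored offset is recomputed so that the local clock resumes exactly
 * where the shared clock stopped.
 */
static uint64_t resume_clock(struct clock_view *c)
{
	if (shared_clock_is_stale(c)) {
		c->offset_ns = (int64_t)(c->shared_now_ns - c->raw_now_ns);
		assert(!shared_clock_is_stale(c));
	}
	return c->raw_now_ns + (uint64_t)c->offset_ns;
}
```

With a shared clock left at 500000s and a local monotonic clock reading
30s after a reboot, the check fires and the recomputed offset makes
resume_clock() return the shared value. When both clocks are within 60s
of each other, the stored offset is left untouched.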