From: Carl Vuosalo Date: Wed, 28 May 2025 19:17:38 +0000 (+0000) Subject: Fix SNMP cacheNumObjCount -- number of cached objects (#2053) X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=d048024f7d8e7c9819f52ba30625e1eb2be0f241;p=thirdparty%2Fsquid.git Fix SNMP cacheNumObjCount -- number of cached objects (#2053) SNMP counter cacheNumObjCount used StoreEntry::inUseCount() stats. For Squid instances using a rock cache_dirs or a shared memory cache, the number of StoreEntry objects in use is usually very different from the number of cached objects because these caches do not use StoreEntry objects as a part of their index. For all instances, inUseCount() also includes ongoing transactions and internal tasks that are not related to cached objects at all. We now use the sum of the counters already reported on "on-disk objects" and "Hot Object Cache Items" lines in "Internal Data Structures" section of `mgr:info` cache manager report. Due to floating-point arithmetic, these stats are approximate, but it is best to keep SNMP and cache manager reports consistent. This change does not fix SNMP Gauge32 overflow bug: Caches with 2^32 or more objects continue to report wrong/smaller cacheNumObjCount values. ### On MemStore::getStats() and StoreInfoStats changes To include the number of memory-cached objects while supporting SMP configurations with shared memory caches, we had to change how cache manager code aggregates StoreInfoStats::mem data collected from SMP worker processes. Before these changes, `StoreInfoStats::operator +=()` used a mem.shared data member to trigger special aggregation code hack, but * SNMP-specific code cannot benefit from that StoreInfoStats aggregation because SNMP code exchanges simple counters rather than StoreInfoStats objects. `StoreInfoStats::operator +=()` is never called by SNMP code. Instead, SNMP uses Snmp::Pdu::aggregate() and friends. * We could not accommodate SNMP by simply adding special aggregation hacks directly to MemStore::getStats() because that would break critical "all workers report about the same stats" expectations of the special hack in `StoreInfoStats::operator +=()`. To make both SNMP and cache manager use cases work, we removed the hack from StoreInfoStats::operator +=() and hacked MemStore::getStats() instead, making the first worker responsible for shared memory cache stats reporting (unlike SMP rock diskers, there is no single kid process dedicated to managing a shared memory cache). StoreInfoStats operator now uses natural aggregation logic without hacks. TODO: After these changes, StoreInfoStats::mem.shared becomes essentially unused because it was only used to enable special aggregation hack in StoreInfoStats that no longer exists. Remove? --- diff --git a/CONTRIBUTORS b/CONTRIBUTORS index d9156cae0f..51abcd99e8 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -88,6 +88,7 @@ Thank you! Brian Degenhardt Brian Denehy Bruce Murphy + Carl Vuosalo Carson Gaspar Carson Gaspar Carsten Grzemba diff --git a/src/MemStore.cc b/src/MemStore.cc index 660dd58cd2..97abd7ca5f 100644 --- a/src/MemStore.cc +++ b/src/MemStore.cc @@ -207,6 +207,14 @@ MemStore::getStats(StoreInfoStats &stats) const const size_t pageSize = Ipc::Mem::PageSize(); stats.mem.shared = true; + + // In SMP mode, only the first worker reports shared memory stats to avoid + // adding up same-cache positive stats (reported by multiple worker + // processes) when Coordinator aggregates worker-reported stats. + // See also: Store::Disk::doReportStat(). + if (UsingSmp() && KidIdentifier != 1) + return; + stats.mem.capacity = Ipc::Mem::PageLimit(Ipc::Mem::PageId::cachePage) * pageSize; stats.mem.size = diff --git a/src/StoreStats.cc b/src/StoreStats.cc index 305e74d636..dc4220e93a 100644 --- a/src/StoreStats.cc +++ b/src/StoreStats.cc @@ -23,22 +23,10 @@ StoreInfoStats::operator +=(const StoreInfoStats &stats) // Assume that either all workers use shared memory cache or none do. // It is possible but difficult to report correct stats for an arbitrary // mix, and only rather unusual deployments can benefit from mixing. - - // If workers share memory, we will get shared stats from those workers - // and non-shared stats from other processes. Ignore order and also - // ignore other processes stats because they are zero in most setups. - if (stats.mem.shared) { // workers share memory - // use the latest reported stats, they all should be about the same - mem.shared = true; - mem.size = stats.mem.size; - mem.capacity = stats.mem.capacity; - mem.count = stats.mem.count; - } else if (!mem.shared) { // do not corrupt shared stats, if any - // workers do not share so we must add everything up - mem.size += stats.mem.size; - mem.capacity += stats.mem.capacity; - mem.count += stats.mem.count; - } + mem.shared = mem.shared || stats.mem.shared; // TODO: Remove mem.shared as effectively unused? + mem.size += stats.mem.size; + mem.capacity += stats.mem.capacity; + mem.count += stats.mem.count; store_entry_count += stats.store_entry_count; mem_object_count += stats.mem_object_count; diff --git a/src/snmp_agent.cc b/src/snmp_agent.cc index 3b1e2df3dd..193238b799 100644 --- a/src/snmp_agent.cc +++ b/src/snmp_agent.cc @@ -25,6 +25,8 @@ #include "StatCounters.h" #include "StatHist.h" #include "Store.h" +#include "store/Controller.h" +#include "StoreStats.h" #include "tools.h" #include "util.h" @@ -409,11 +411,14 @@ snmp_prfSysFn(variable_list * Var, snint * ErrP) SMI_GAUGE32); break; - case PERF_SYS_NUMOBJCNT: + case PERF_SYS_NUMOBJCNT: { + StoreInfoStats stats; + Store::Root().getStats(stats); Answer = snmp_var_new_integer(Var->name, Var->name_length, - (snint) StoreEntry::inUseCount(), + (snint) (stats.mem.count + stats.swap.count), SMI_GAUGE32); break; + } default: *ErrP = SNMP_ERR_NOSUCHNAME;