From: Thomas Markwalder Date: Wed, 3 Apr 2024 17:31:48 +0000 (-0400) Subject: [#3297] Updated ARM, added a ChangeLog X-Git-Tag: Kea-2.5.8~94 X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=f39b4692f8f707cda35c34c2e839b8da34517e58;p=thirdparty%2Fkea.git [#3297] Updated ARM, added a ChangeLog --- diff --git a/ChangeLog b/ChangeLog index 1de4eef37e..b300a0d673 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,10 @@ +2216. [func] tmark + PerfMon hook library is now functional. It accumulates + and reports performance data, and supports alarms. Still + lacking are API commands. + (Gitlab #3297) + + 2215. [bug] tmark Corrected an issue that can cause kea-dhcp-ddns to stop processing queued requests. Thanks to Shawn diff --git a/doc/sphinx/arm/hooks-perfmon.rst b/doc/sphinx/arm/hooks-perfmon.rst index 224a0ad4d4..8e27f28724 100644 --- a/doc/sphinx/arm/hooks-perfmon.rst +++ b/doc/sphinx/arm/hooks-perfmon.rst @@ -9,7 +9,8 @@ to extend them with the ability to track and report performance related data. .. note:: - This library is currently under development and not fully functional. + This library is experimental and not recommended for use in production + environments. Overview ~~~~~~~~ @@ -93,23 +94,127 @@ uniquely identified by a "duration key" which consists of the following values: Once the server has finished constructing a response to a query, the query's event stack is processed into a series of updates to monitored durations. If upon updating, a duration's sample interval is found to have been completed, it is sent to reporting -and a new sample interval is begun. - -.. Note: - Monitoring is not yet functional. +and a new sample interval is begun. The interval width is dictacted by configuration +parameter ``interval-width-secs``. + +The event stack for the multi-threaded mode DHCPDISCOVER/DHCPOFFER cycle shown above +contains the following events: + + +-----------------------------+--------------------+ + | Event Timestamp | Event Label | + +=============================+====================+ + | 2024-Mar-20 14:52:20.067563 | socket_received | + +-----------------------------+--------------------+ + | 2024-Mar-20 14:52:20.067810 | buffer_read | + +-----------------------------+--------------------+ + | 2024-Mar-20 14:52:20.067897 | mt_queued | + +-----------------------------+--------------------+ + | 2024-Mar-20 14:52:20.067952 | process_started | + +-----------------------------+--------------------+ + | 2024-Mar-20 14:52:20.069614 | process_completed | + +-----------------------------+--------------------+ + +Assuming the selected subnet's ID was 100, the duration updates formed by PerfMon +from these events are shown below: + + +--------------------------------------------------------------+--------------+ + | Duration Keys for SubnetID 100 | Update in | + | | microseconds | + +==============================================================+==============+ + | DHCPDISCOVER.DHCPOFFER.socket_received-buffer_read.100 | 247 | + +--------------------------------------------------------------+--------------+ + | DHCPDISCOVER.DHCPOFFER.buffer_read-mt_queue.100 | 87 | + +--------------------------------------------------------------+--------------+ + | DHCPDISCOVER.DHCPOFFER.mt_queued-process_started.100 | 55 | + +--------------------------------------------------------------+--------------+ + | DHCPDISCOVER.DHCPOFFER.process_started-process_completed.100 | 1662 | + +--------------------------------------------------------------+--------------+ + | DHCPDISCOVER.DHCPOFFER.composite-total_response.100 | 2051 | + +--------------------------------------------------------------+--------------+ + +Notice that in addition to the adjacent event updates, there is an additional duration +update for the total duration of the entire stack whose key contains the event-pair +``composite-total_response``. This tracks the total time to responds from query +received until the response is ready to send. Finally, there would also be global +duration updates for each of the above: + + +--------------------------------------------------------------+--------------+ + | Global Duration Keys | Update in | + | | milliseconds | + +==============================================================+==============+ + | DHCPDISCOVER.DHCPOFFER.socket_received-buffer_read.0 | 247 | + +--------------------------------------------------------------+--------------+ + | DHCPDISCOVER.DHCPOFFER.buffer_read-mt_queue.0 | 87 | + +--------------------------------------------------------------+--------------+ + | DHCPDISCOVER.DHCPOFFER.mt_queued-process_started.0 | 55 | + +--------------------------------------------------------------+--------------+ + | DHCPDISCOVER.DHCPOFFER.process_started-process_completed.0 | 1662 | + +--------------------------------------------------------------+--------------+ + | DHCPDISCOVER.DHCPOFFER.composite-total_response.0 | 2051 | + +--------------------------------------------------------------+--------------+ + +Statistics Reporting +~~~~~~~~~~~~~~~~~~~~ + +When enabled (see ``stats-mgr-reporting``), PerfMon will report a duration's data +data each time the duration completes a sampling interval. Each statistic employs +the following naming convention: + +:: + + {subnet-id[x]}.perfmon.-.-. + +And there will be both a global and a subnet-specific value for each. Currently the only +value reported for a given duration key is "average-ms". This statistic is the average time +between the duration's event pair over the most recently completed interval. In other +words if during a given interval there were 7 occurrences (i.e. updates) totaling +350ms, the the average-ms reported would be 50ms. Continuing with example above, the +statistics reported would be named as follows for subnet level values: + +:: + + subnet[100].perfmon.DHCPDISCOVER.DHCPOFFER.socket_received-buffer_read.average-ms + subnet[100].perfmon.DHCPDISCOVER.DHCPOFFER.buffer_read-mt_queue.average-ms + subnet[100].perfmon.DHCPDISCOVER.DHCPOFFER.mt_queued-process_started.average-ms + subnet[100].perfmon.DHCPDISCOVER.DHCPOFFER.process_started-process_completed.average-ms + subnet[100].perfmon.DHCPDISCOVER.DHCPOFFER.composite-total_response.average-ms + +and as shown for global values: + +:: + + perfmon.DHCPDISCOVER.DHCPOFFER.socket_received-buffer_read.average-ms + perfmon.DHCPDISCOVER.DHCPOFFER.buffer_read-mt_queue.average-ms + perfmon.DHCPDISCOVER.DHCPOFFER.mt_queued-process_started.average-ms + perfmon.DHCPDISCOVER.DHCPOFFER.process_started-process_completed.average-ms + perfmon.DHCPDISCOVER.DHCPOFFER.composite-total_response.average-ms + +Since they are reported to StatsMgr they may be fetched using the commands :isccmd:`statistic-get-all` +or :isccmd:`statistic-get`. Alarms ~~~~~~ -Alarms may be defined to watch specific durations. Each alarm defines a high-water mark and a -low-water mark. When the reported average value for duration exceeds the high-water mark, a -WARN level alarm log will be emitted at which point the alarm is considered "triggered". Once -triggered the WARN level log will be repeated at a specified alarm report interval as long the -reported average for the duration remains above the low-water mark. Once the average falls -below the low-water mark the alarm is "cleared" and an INFO level log will be emitted. +Alarms may be defined to watch specific durations. Each alarm defines a high-water mark, +``high-water-ms`` and a low-water mark, ``low-water-ms``. When the reported average value +for duration exceeds the high-water mark, a WARN level alarm log will be emitted at which +point the alarm is considered "triggered". Once triggered the WARN level log will be +repeated at a specified alarm report interval, ``alarm-report-secs`` as long the reported +average for the duration remains above the low-water mark. Once the average falls below the +low-water mark the alarm is "cleared" and an INFO level log will be emitted. + +The alarm triggered WARN log will look similar to the following: + +:: + + 2024-03-20 10:22:14.030 WARN [kea-dhcp6.leases/47195.139913679886272] PERFMON_ALARM_TRIGGERED Alarm for DHCPDISCOVER.DHCPOFFER.composite-total_response.0 has been triggered since 2024-03-20 10:18:20.070000, reported average duration 00:00:00.700000 exceeds high-water-ms: 500 + + +and he alarm cleared INFO log will look similar to the following: + +:: -.. Note: - Alarms are not yet functional. + 2024-03-20 10:30:14.030 INFO [kea-dhcp6.leases/47195.139913679886272] PERFMON_ALARM_CLEARED Alarm for DHCPDISCOVER.DHCPOFFER.composite-total_response.0 has been cleared, reported average duration 00:00:00.010000 is now below low-water-ms: 25 API Commands ~~~~~~~~~~~~ @@ -153,7 +258,7 @@ An example of the anticipated configuration is shown below: Where: * enable-monitoring - Enables event data aggregation for reporting, statisitcs, and alarms. Defaults to false. + Enables event data aggregation for reporting, statistics, and alarms. Defaults to false. * interval-width-secs The amount of time, in seconds, that individual task durations are accumulated into an aggregate before it is reported. Default is 60 seconds. @@ -167,7 +272,7 @@ Where: defined by the following: * duration-key - Idenitifies the monitored duration to watch + Identifies the monitored duration to watch * query-type - Message type of the client query (e.g.DHCPDISCOVER, DHCPV6_REQUEST) * response-type - Message type of the server response (e.g. DHCPOFFER, DHCPV6_REPLY) diff --git a/src/hooks/dhcp/perfmon/perfmon_messages.cc b/src/hooks/dhcp/perfmon/perfmon_messages.cc index 32edeff45a..7c390503ca 100644 --- a/src/hooks/dhcp/perfmon/perfmon_messages.cc +++ b/src/hooks/dhcp/perfmon/perfmon_messages.cc @@ -20,7 +20,7 @@ extern const isc::log::MessageID PERFMON_INIT_OK = "PERFMON_INIT_OK"; namespace { const char* values[] = { - "PERFMON_ALARM_CLEARED", "Alarm for %1 has been cleared, reported average duration %2 is now beloe low-water-ms: %3", + "PERFMON_ALARM_CLEARED", "Alarm for %1 has been cleared, reported average duration %2 is now below low-water-ms: %3", "PERFMON_ALARM_TRIGGERED", "Alarm for %1 has been triggered since %2, reported average duration %3 exceeds high-water-ms: %4", "PERFMON_DEINIT_FAILED", "unloading PerfMon hooks library failed: %1", "PERFMON_DEINIT_OK", "unloading PerfMon hooks library successful",