Florian Forster [Mon, 22 Jan 2024 17:07:31 +0000 (18:07 +0100)]
disk plugin: Revamp the Linux code.
* Use `counter_diff` to calculate counter differences.
* Use `counter_t` as we actually want the counter overflow behavior for these metrics.
* Remove the `has_...` fields from the global data struct.
* Use `value_to_rate()` to calculate disk busyness.
* Use `strtoull()` to parse counter values.
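As an illustration of the counter handling and parsing named above, here is a minimal sketch. The names `sketch_counter_t`, `sketch_counter_diff` and `sketch_parse_counter` are hypothetical, and the wrap-around rule (32 bits if the old value fits into 32 bits, 64 bits otherwise) is an assumption, not necessarily what collectd's `counter_diff` does.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for collectd's counter type. */
typedef uint64_t sketch_counter_t;

/* Difference between two counter readings, tolerating one wrap-around.
 * Assumption: a counter that went "backwards" wrapped at 32 bits if the old
 * value fits into 32 bits, and at 64 bits otherwise. */
static sketch_counter_t sketch_counter_diff(sketch_counter_t old_value,
                                            sketch_counter_t new_value) {
  if (new_value >= old_value)
    return new_value - old_value;
  if (old_value <= UINT32_MAX)
    return (UINT32_MAX - old_value) + new_value + 1;
  return (UINT64_MAX - old_value) + new_value + 1;
}

/* Parse an unsigned decimal field with strtoull(). Returns 0 on success and
 * -1 if the field is empty, has trailing characters, or is out of range.
 * Note: strtoull() itself accepts a leading minus sign; this sketch does not
 * reject it. */
static int sketch_parse_counter(const char *field, sketch_counter_t *ret) {
  char *end = NULL;
  errno = 0;
  unsigned long long value = strtoull(field, &end, 10);
  if (errno != 0 || end == field || *end != '\0')
    return -1;
  *ret = (sketch_counter_t)value;
  return 0;
}
```

With an unsigned counter type, a reading that drops from 4294967290 to 5 is treated as a wrap and yields a difference of 11 rather than a huge or negative number.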
Florian Forster [Mon, 18 Dec 2023 13:40:06 +0000 (14:40 +0100)]
disk plugin: Align metrics with OpenTelemetry recommendations.
* Rename labels to `system.device` and `disk.io.direction`.
* Rename `system.disk.time` to `system.disk.operation_time`.
* Add descriptions and units to all metric families.
* Add the "utilization" metric to FreeBSD.
Florian Forster [Wed, 10 Jan 2024 16:57:15 +0000 (17:57 +0100)]
cpu plugin: Remove tests for `usage_global_ratio` and `usage_global_count`.
The functionality is already covered by the test cases for `usage_ratio` and
`usage_count`, so there is no need to test these separately. On the contrary:
the rest of the CPU plugin only uses `usage_ratio` and `usage_count`, so
testing the global variants leaks the abstraction.
Florian Forster [Thu, 4 Jan 2024 16:58:52 +0000 (17:58 +0100)]
cpu plugin: Simplify the configuration options available.
* The options `ReportUsage`, `ReportUtilization`, and `ReportNumCpu` control
which metrics are emitted on a high level. Other options no longer influence
*what* is being collected. This also makes it possible to report usage and
utilization metrics simultaneously.
* The documentation has been updated to reflect that the plugin no longer
emits a percentage, but a ratio for utilization metrics.
Florian Forster [Sun, 17 Dec 2023 13:50:20 +0000 (14:50 +0100)]
cpu plugin: Align metrics with OpenTelemetry recommendations.
* Add metric description and unit.
* Update label names (e.g. "cpu" → "system.cpu.logical_number")
* Divide rates by the number of CPUs, so that the sum of all rates equals 1.
(Previously the sum of all rates was equal to the number of logical CPUs.)
* Remove the "cpu=total" label when aggregating CPUs.
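The rate normalization above can be sketched as follows. The function name `sketch_normalize_rates` and the flat array layout are hypothetical and only serve to show the arithmetic, they are not taken from the plugin.

```c
#include <stddef.h>

/* Divide every per-CPU, per-state rate by the number of CPUs. Before the
 * division the rates of each CPU sum to 1, so all rates together sum to
 * cpu_num; afterwards all rates together sum to 1. */
static void sketch_normalize_rates(double *rates, size_t rates_num,
                                   size_t cpu_num) {
  if (cpu_num == 0)
    return; /* nothing sensible to do without CPUs */
  for (size_t i = 0; i < rates_num; i++)
    rates[i] /= (double)cpu_num;
}
```

For two CPUs with rates {0.25, 0.75} and {0.5, 0.5}, the sum is 2 before the call and 1 after it.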
Eero Tamminen [Wed, 17 Jan 2024 18:49:20 +0000 (20:49 +0200)]
gpu_sysman: change configure to use pkg-config for dependency
Also enable the Sysman plugin automatically when the dependency is found.
While the GPU plugin uses only the Sysman subset of the Level-Zero API
family, the library / loader it uses is named level-zero. The
"--with-sysman" option was therefore dropped and the configure variables
were renamed.
Also, now that Level-Zero packages are in the distributions, the Sysman
API GPU plugin can be enabled by default when pkg-config finds
level-zero. This simplifies the configure code.
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Florian Forster [Thu, 4 Jan 2024 07:58:14 +0000 (08:58 +0100)]
memory plugin: Remove "slab" and "available" memory.
Under Linux, the plugin reported "slab" (kernel data structures cache) and
"available" (estimate of how much memory might be available without swapping).
These metrics group memory along different dimensions than other reported
memory, leading to double counting of some memory. The result was that the sum
of all memory metrics was not constant.
Florian Forster [Thu, 11 Jan 2024 19:35:30 +0000 (20:35 +0100)]
write_riemann plugin: Terminate `riemann_event_set` arguments with `RIEMANN_EVENT_FIELD_NONE`.
`riemann_event_set` is a variadic function, i.e. it accepts a variable
number of arguments, so it needs some way to determine at runtime how
many arguments there are. It appears to do so by using
`RIEMANN_EVENT_FIELD_NONE` to mark the last element in the argument
list. Unfortunately I was unable to find the library's documentation and
code and could not verify this.
Previously, the argument list passed to `riemann_event_set` was not
always terminated, causing it to read past the end of the list and add
random garbage to the message it crafted.
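To illustrate the general mechanism (not the Riemann client library itself, whose behavior could not be verified), here is a minimal sketch of a sentinel-terminated variadic function. All names (`sketch_field`, `FIELD_NONE`, `sketch_count_fields`) are hypothetical.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical field tags; FIELD_NONE plays the role of the terminator. */
enum sketch_field { FIELD_NONE = 0, FIELD_TTL, FIELD_STATE };

/* Counts the (tag, value) pairs passed in. The function has no other way to
 * know where the argument list ends, so it keeps reading until it sees
 * FIELD_NONE. If the caller omits the terminator, va_arg() reads past the
 * real arguments, which is undefined behavior. */
static size_t sketch_count_fields(int first_tag, ...) {
  va_list ap;
  size_t n = 0;

  va_start(ap, first_tag);
  for (int tag = first_tag; tag != FIELD_NONE; tag = va_arg(ap, int)) {
    (void)va_arg(ap, int); /* consume the value that belongs to this tag */
    n++;
  }
  va_end(ap);
  return n;
}
```

A correct call looks like `sketch_count_fields(FIELD_TTL, 30, FIELD_STATE, 1, FIELD_NONE)`; dropping the final `FIELD_NONE` is the kind of bug described above.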
Florian Forster [Fri, 12 Jan 2024 16:51:07 +0000 (17:51 +0100)]
common: Overhaul the `sstrncpy` implementation.
Properly check all arguments and behave in a sane manner, i.e. don't crash.
I went back and forth a few times on whether to return `NULL` or `dest` when `n == 0`.
* On the one hand, `n == 0` is not really an error but a situation that can
occur naturally, e.g. when you're implementing code that appends to the end
of a string.
* On the other hand, if we return `NULL` when `n` is zero we can guarantee
that we will either return `NULL` or a null terminated string.
Ultimately I decided to go with the stronger guarantee, i.e. return `NULL` when `n == 0`.
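A minimal sketch of a copy function with the stronger guarantee, i.e. one that returns either `NULL` or a null terminated string, might look like this. The name `sketch_sstrncpy` is hypothetical, and treating a `NULL` source as an empty string is an assumption, not necessarily what the actual implementation does.

```c
#include <stddef.h>
#include <string.h>

/* Returns NULL if dest is NULL or n is zero. Otherwise copies at most n-1
 * bytes from src and returns dest, which is then always null terminated.
 * Assumption: a NULL src is treated like an empty string. */
static char *sketch_sstrncpy(char *dest, const char *src, size_t n) {
  if (dest == NULL || n == 0)
    return NULL;
  if (src == NULL)
    src = "";

  strncpy(dest, src, n - 1); /* leave room for the terminator */
  dest[n - 1] = '\0';
  return dest;
}
```

Callers can then rely on a non-`NULL` return value being safe to pass to functions that expect a terminated string.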
Florian Forster [Thu, 11 Jan 2024 20:44:39 +0000 (21:44 +0100)]
common: Reserve a null byte when calling `strncpy`.
While `sstrncpy` guarantees a null terminated string, some compilers don't get
the memo and complain about the buffer size being equal to the size provided to
*strncpy(3)*. This *is* a potential source of error with *strncpy(3)*, because
if the source string is longer than the buffer, the buffer is not null
terminated. That is the precise reason `sstrncpy` exists in the first place.
Make these compilers happy by decreasing the size passed to *strncpy(3)* by
one.
Florian Forster [Wed, 3 Jan 2024 17:39:52 +0000 (18:39 +0100)]
write_open_telemetry plugin: Fix minor memory leak.
Error reported by Valgrind:
```
==171106== 8 bytes in 1 blocks are definitely lost in loss record 58 of 228
==171106== at 0x48407B4: malloc (vg_replace_malloc.c:381)
==171106== by 0x492E7F9: strdup (strdup.c:42)
==171106== by 0x1131F9: cf_util_get_string (configfile.c:1127)
==171106== by 0x4EA3FD0: ot_config_node (write_open_telemetry.cc:206)
==171106== by 0x4EA3FD0: ??? (write_open_telemetry.cc:262)
==171106== by 0x111794: dispatch_block_plugin (configfile.c:464)
==171106== by 0x1119B9: dispatch_block (configfile.c:508)
==171106== by 0x11313D: cf_read (configfile.c:1100)
==171106== by 0x11034E: configure_collectd (collectd.c:356)
==171106== by 0x1104B6: init_config (collectd.c:406)
==171106== by 0x121F3F: main (cmd.c:167)
```
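The commit message does not describe the fix itself. As a rough illustration of the ownership pattern the trace points at (a getter that allocates a copy which the caller must release), here is a sketch. `sketch_get_string` and `sketch_use_config` are hypothetical names and do not reproduce the plugin's code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a config getter that hands ownership of a
 * heap-allocated copy to the caller. Returns 0 on success, -1 on failure. */
static int sketch_get_string(const char *value, char **ret_string) {
  size_t size = strlen(value) + 1;
  char *copy = malloc(size);
  if (copy == NULL)
    return -1;
  memcpy(copy, value, size);

  free(*ret_string); /* drop any previous value; free(NULL) is a no-op */
  *ret_string = copy;
  return 0;
}

/* The caller owns the string and has to free it once it is done. Without the
 * free() below, a leak checker would report the block as definitely lost. */
static size_t sketch_use_config(const char *value) {
  char *host = NULL;
  if (sketch_get_string(value, &host) != 0)
    return 0;

  size_t len = strlen(host);
  free(host);
  return len;
}
```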