Florian Forster [Sun, 17 Dec 2023 08:45:11 +0000 (09:45 +0100)]
resource_metrics: Skip duplicate metrics instead of erroring out.
The semantics have been changed to simply ignore metrics that are already
in the set. The previous semantic was optimized for a "add and if that
fails flush and try again" plugin behavior. With the support for multiple
instances of the same metric (at different times), there no longer is a
need to ensure metrics in the set are unique.
Florian Forster [Sat, 16 Dec 2023 11:58:37 +0000 (12:58 +0100)]
resource_metrics: Fix bugs surfaced by the unit test.
* Sort metric families after inserting.
* Dereference the pointer-pointer returned by `bsearch` in `lookup_family`.
* Delete the metrics created by `metric_family_clone`.
* Run ./contrib/format.sh
Florian Forster [Sat, 16 Dec 2023 07:26:26 +0000 (08:26 +0100)]
New utility: resource_metrics.
Resource metrics allows plugin to stage metric families, grouped by
resource. This is designed to work well with plugins that export the
OpenTelemetry protocol.
Florian Forster [Wed, 29 Nov 2023 12:41:32 +0000 (13:41 +0100)]
write_riemann plugin: use reference counting to when freeing user data.
While reference counting was present previously, it had problems:
* The reference passed to `plugin_register_flush()` was not counted.
* If a flush plugin was registered, `free_func` was set to NULL but
the reference passed to `plugin_register_notification()` was still
counted, meaning in that case the counter never went to zero.
* Mutexes must be unlocked when calling `pthread_mutex_destroy()`.
* The code limped on after an error, returning a failure eventually.
This is unnecessarily complex control flow that has been simplified.
Leonard Göhrs [Fri, 24 Mar 2023 09:49:24 +0000 (10:49 +0100)]
src/write_http.c: use reference counting to decide when to free user_data
The teardown code for the wh_callback_t struct previously relied on
the order in which the different callback functions are de-initialized
to be known and to never change.
This is prone to failure and is indeed currently broken, leading to a
segmentation fault on collectd exit.
Fix this by counting the active references to the user data and freeing
it once it reaches zero.
Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
Apparently we defined a bunch of `plugin_foo` variables that were never
used. The generated list from the arguments to `AC_PLUGIN` now appear as
duplicates. This removes the previously unused definitions and leaves
the generated ones.
Florian Forster [Tue, 28 Nov 2023 18:31:11 +0000 (19:31 +0100)]
procevent plugin: remove use of a nested flexible array member.
The previous code used an ad-hoc struct to construct or parse a Netlink
message. This relied on allocating a field _after_ the struct with the
flexible array member, which is prohibited by the C standard, leading to
compiler warnings.
Florian Forster [Sat, 25 Nov 2023 13:12:59 +0000 (14:12 +0100)]
SMART plugin: initialize struct passed to `ioctl(2)`.
Valgrind is complaining about a conditional jump based on uninitialized
memory:
```
==66462== Conditional jump or move depends on uninitialised value(s)
==66462== at 0x10C500: smart_read_nvme_intel_disk (in /__w/collectd/collectd/test_plugin_smart)
==66462== by 0x10D366: test_x (in /__w/collectd/collectd/test_plugin_smart)
==66462== by 0x10D638: main (in /__w/collectd/collectd/test_plugin_smart)
```
This may be due to the `struct nvme_additional_smart_log` being
uninitialized when it's being passed to `ioctl(2)`.
This there, this removed an unnecessary level of indentation.
Florian Forster [Sat, 25 Nov 2023 12:51:57 +0000 (13:51 +0100)]
Netlink plugin: complete initialize structs used for testing.
Valgrind complains about a conditional jump based on uninitialized
memory:
```
==66438== Conditional jump or move depends on uninitialised value(s)
==66438== at 0x10CA06: vf_info_submit (in /__w/collectd/collectd/test_plugin_netlink)
==66438== by 0x1110F2: test_vf_submit_test (in /__w/collectd/collectd/test_plugin_netlink)
==66438== by 0x112EAC: main (in /__w/collectd/collectd/test_plugin_netlink)
```
This is likely caused by the `vf_stats_t` being only partially
initialized. Using a struct initializer is not only cleaner, it also
ensures the remainder of the struct is initialized to zero.
Jim Klimov [Wed, 31 Aug 2022 13:32:46 +0000 (15:32 +0200)]
configure.ac: if neither UPSCONN{,_t} type was found, refuse to build NUT plugin
NOTE: src/nut.c also has pragmas to error out in this situation,
but that handling is compiler-dependent and happens too late in
the checkout/configure/build loop.
Presumably this inability to find the type in the earlier-found header file
is also triggered by build environment "inconsistencies" like lack of basic
types in the libc implementation (maybe highlighting the need for additional
headers or macros for the platform).
Jim Klimov [Wed, 31 Aug 2022 09:40:01 +0000 (11:40 +0200)]
configure.ac, src/nut.c: detect int types required by NUT API we build against
Either use the stricter int types required by NUT headers since v2.8.0 release,
or the relaxed (arch-dependent) types required by older NUT releases - depending
on which NUT API version the collectd is building against at the moment.
Inspired by discussion at https://github.com/networkupstools/nut/issues/1638
Florian Forster [Tue, 28 Nov 2023 13:42:54 +0000 (14:42 +0100)]
cpu plugin: Fix potential buffer overflow.
```
In function 'cpu_commit_without_aggregation',
inlined from 'cpu_commit' at src/cpu.c:563:5,
inlined from 'cpu_read' at src/cpu.c:925:3:
src/cpu.c:534:50: note: directive argument in the range [0, 18446744073709551614]
534 | snprintf(cpu_num_str, sizeof(cpu_num_str), "%zu", cpu_num);
| ^~~~~
src/cpu.c:534:7: note: 'snprintf' output between 2 and 21 bytes into a destination of size 16
534 | snprintf(cpu_num_str, sizeof(cpu_num_str), "%zu", cpu_num);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
Florian Forster [Fri, 24 Nov 2023 07:28:06 +0000 (08:28 +0100)]
curl_stats: fix compatibility with new versions of cURL.
Use integer based keys for metrics if available.
cURL ≥ 7.55.0 provides additional keys that allow getting certain
metrics as integers rather than doubles, e.g. content length. In some
newer versions of cURL, the original keys (using doubles) are marked as
deprecated.
ChangeLog: cURL, cURL-JSON, cURL-XML, Write HTTP plugins: fix compatibility with new versions of cURL.
Florian Forster [Tue, 28 Nov 2023 12:56:27 +0000 (13:56 +0100)]
configure: disable all plugins not yet supporting collectd 6.
This should allow us (and users) to just run `./configure` without
further flags, lowering the barrier to entry. It also allows us to
remove these configure flags from the CI configuration.
Eero Tamminen [Fri, 24 Nov 2023 17:05:51 +0000 (19:05 +0200)]
gpu_sysman: rename "counter" output variant to more generic "base"
And make it control output for all base metric values, not just
counters. That allows disabling output of values for:
- Memory usage
- Frequency
- Temperature
If one wants to see only their rates.
That will be useful with the new "LogMetrics" option in next commit.
Did also small optimization for output variant checks (no need for
free() if they're moved earlier).
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
- Move enabled/disabled metric reporting to a separate function
- Report metrics enabling and metric details enabling separately
- Error if all metrics are disabled, regardless of detail options
- Explicitly log what metrics are still being reported if any of
them were disabled at run-time
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Leonard Göhrs [Mon, 9 Jan 2023 13:43:44 +0000 (14:43 +0100)]
[collectd 6] exec: make PUTMETRIC work
The cmd_handle_putmetric function checks if the command actually is
a PUTMETRIC command, at least that is what is supposed to check.
Prior to this fix it actually checked for PUTVAL and always prints a
-1 Unexpected command: `PUTMETRIC'.
error. While at it also remove the development printf that results in
Leonard Göhrs [Thu, 5 Jan 2023 09:11:23 +0000 (10:11 +0100)]
[collectd 6] exec: add PUTMETRIC command
Most existing setups using exec will use PUTVAL, which should just continue
to work with collectd 6 due to the plugin_dispatch_values compatibility
function.
New plugins should however use the new PUTMETRIC.
The respective command handler already exists. This commit just pipes it
through.
Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
Leonard Göhrs [Fri, 24 Mar 2023 13:03:43 +0000 (14:03 +0100)]
src/daemon/plugin.c: init fam.{time,interval} before uc_update(fam) on dispatch
The changes in commit
55efb56a ([collectd 6] src/daemon/plugin.c: Use one thread per write plugin)
changed the order in which the time and interval on a to-be-dispatched metric
family were set and uc_update() was called on the metric family.
The time and interval inside a metric family have to be set before calling
uc_update() on it, otherwise warnings like these will be generated en masse[1]:
uc_update: Value too old: … value time = 0.000; last cache update = 0.000;
And the "missing" handlers for the metric family will be called resulting in
plugins like write_prometheus dropping the value, resulting in it not showing
up when queried [2].
This change requires the inroduction of a second metric_family_clone() in the
dispatch path:
plugin_dispatch_metric_family()
| \
| Has to modify the passed fam to initialize the correct time and interval.
| So it has to metric_family_clone() the input.
v
plugin_dispatch_metric_internal()
| \
| Calls the filter chains and uc_update(fam), which both require correct
| times and intervals to be set.
|
| plugin_dispatch_metric_internal() calls either fc_process_chain() or
v <-- fc_default_action() depending on if a chain is configured.
|\
| fc_process_chain()
| |
| v
| plugin_write()
| \
| A chain may call plugin_write() multiple times with the same fam.
| This means plugin_write() can not take "ownership" of the fam,
| put it in a queue and free it once it feels like it.
| It must instead create another clone to put into the queue.
v
fc_default_action()
|
v
plugin_write()
\
This is the much more common case, as filter chains are a niche feature,
and in this case we could actually transfer ownership of the fam to
plugin_write and save a clone, but we would have to tell plugin_write()
that it is responsible for freeing the passed fam.
The negative performance impact of the clone could be mitigated by adding a
reference count to the metric_family_t struct and only freeing it once all
references to it are gone. But that would be a larger change and not a bug fix.
Only fix the "uninitialized time and interval" bug for now.
Leonard Göhrs [Thu, 23 Mar 2023 12:12:27 +0000 (13:12 +0100)]
src/daemon/plugin.c: don't store references to stack allocated values
The changes in commit
55efb56a ([collectd 6] src/daemon/plugin.c: Use one thread per write plugin)
wrongly assume that the references passed in to plugin_register_write()
somehow outlive the spawned write thread.
While this is true for some plugins that pass staticly allocated strings
and global user_data_t structs to plugin_register_write() it is not correct
for all plugins.
See [1] for an example of a plugin (write_http) mis-behaving due to this.
Store owned versions of the passed values instead. For user_data this means
the content of the struct and for the name it means a strdup()ed version of
the string.
The behaviour of plugin_write with regards to fam being NULL has not changed
with the switch to one thread per write plugin, so the documentation should
not change as well.
Leonard Göhrs [Fri, 28 Oct 2022 08:02:01 +0000 (10:02 +0200)]
[collectd 6] src/collectd.conf.pod polish the WriteQueueLimitHigh/Low docs
The behaviour of LimitHight/LimitLow has changed with the switch to one
thread per write plugin. This warrants a rewrite of the respective
documentation. Thanks to @eero-t for the suggestion.
Leonard Göhrs [Mon, 15 Aug 2022 06:43:51 +0000 (08:43 +0200)]
[collectd 6] src/daemon/plugin.c: restore previous position of plugin_write
The new queue design resulted in plugin_write being based on
enqueue_metric_family, which resulted in the functions moving around in the
file. This made the diff harder to read. Restore old position.
Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>