git.ipfire.org Git - thirdparty/collectd.git/log
2 years ago [collectd 6] disk: migration to v6.0
Manuel Luis Sanmartín Rozada [Wed, 20 Jan 2021 22:35:43 +0000 (23:35 +0100)] 
[collectd 6] disk: migration to v6.0

2 years ago [collectd 6] contextswitch: migration to v6.0
Manuel Luis Sanmartín Rozada [Wed, 20 Jan 2021 23:02:18 +0000 (00:02 +0100)] 
[collectd 6] contextswitch: migration to v6.0

2 years ago [collectd 6] memory: add laundry and user wired pages (#3962)
François Charlier [Tue, 15 Feb 2022 12:02:55 +0000 (13:02 +0100)] 
[collectd 6] memory: add laundry and user wired pages (#3962)

This ports "add laundry and user wired pages (#3962)" (de33b26ba4d67)
from the main branch to collectd-6.0.

Changelog: memory: report for laundry and user_wire pages on FreeBSD

Add the `vm.stats.vm.v_laundry_count` and
`vm.stats.vm.v_user_wire_count` sysctl values, which have been present on
FreeBSD for a little while now.

2 years ago [collectd 6] port Do not account reclaimable slab as used
Weiping Zhang [Mon, 17 May 2021 07:59:50 +0000 (15:59 +0800)] 
[collectd 6] port Do not account reclaimable slab as used

This ports "Do not account reclaimable slab as used" (77e2fcd91e27)
from the main branch to collectd-6.0

ChangeLog: memory plugin: do not account reclaimable slab as used.

Align this counter with free(1).
https://gitlab.com/procps-ng/procps/-/blob/v3.3.17/proc/sysinfo.c#L789

2 years ago [collectd 6] port Report MemAvailable when present in meminfo (#3916)
Leonard Göhrs [Tue, 21 Sep 2021 06:32:57 +0000 (08:32 +0200)] 
[collectd 6] port Report MemAvailable when present in meminfo (#3916)

This ports "Report MemAvailable when present in meminfo (#3916)" (848b2394dc2)
from the main branch to collectd-6.0.

2 years ago [collectd 6] chrony: migration to v6.0
Manuel Luis Sanmartín Rozada [Tue, 26 Jan 2021 23:53:00 +0000 (00:53 +0100)] 
[collectd 6] chrony: migration to v6.0

2 years ago gpu_sysman: Add fabric port metric type and related metrics support
Eero Tamminen [Thu, 17 Feb 2022 18:02:50 +0000 (20:02 +0200)] 
gpu_sysman: Add fabric port metric type and related metrics support

Already in L0 spec v1.0.

Fabric ports have a lot of properties, which means a lot of labels and
extra functions.  This required increasing some things in the test
code too.

2 years ago [collectd 6] disable the same plugins in github worker as in cirrus CI
Leonard Göhrs [Mon, 30 Jan 2023 14:06:09 +0000 (15:06 +0100)] 
[collectd 6] disable the same plugins in github worker as in cirrus CI

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
2 years ago cirrus: drop CentOS6
Kentaro Hayashi [Sat, 5 Dec 2020 04:05:57 +0000 (13:05 +0900)] 
cirrus: drop CentOS6

It reached EOL on November 30th, 2020.
There are no security updates available anymore.

ref. https://wiki.centos.org/About/Product

Signed-off-by: Kentaro Hayashi <hayashi@clear-code.com>
2 years ago [collectd 6] trigger github worker on collectd-6.0 branch
Leonard Göhrs [Mon, 30 Jan 2023 13:09:28 +0000 (14:09 +0100)] 
[collectd 6] trigger github worker on collectd-6.0 branch

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
2 years ago Update build.yml
Alex [Mon, 29 Aug 2022 18:57:45 +0000 (19:57 +0100)] 
Update build.yml

Signed-off-by: sashashura <93376818+sashashura@users.noreply.github.com>
2 years ago [gha] Add el9_x86_64 to tested distros
Emma Foley [Thu, 9 Jun 2022 17:43:46 +0000 (18:43 +0100)] 
[gha] Add el9_x86_64 to tested distros

2 years ago [CI] Update Fedora versions used in GHA
Emma Foley [Thu, 9 Jun 2022 13:49:26 +0000 (14:49 +0100)] 
[CI] Update Fedora versions used in GHA

2 years ago [ci][gha] Miscellaneous improvements and sync with cirrus (#3976)
Emma Foley [Tue, 15 Feb 2022 14:14:15 +0000 (14:14 +0000)] 
[ci][gha] Miscellaneous improvements and sync with cirrus (#3976)

* [ci][gha] Rename tasks

* [ci][gha] Update and use MAKEFLAGS

* [ci][gha] Remove continue-on-error from ``make distcheck`` tasks

Installation of bzip2 and make distcheck were failing on el8.
This was resolved by updating it to use CentOS Stream 8 in [1]

[1] https://github.com/collectd/ci-docker/pull/55

2 years ago [gha] Add a test log when the tests fail (#3971)
Emma Foley [Tue, 15 Feb 2022 11:41:38 +0000 (11:41 +0000)] 
[gha] Add a test log when the tests fail (#3971)

* [ci][gha] Add a test log when the tests fail

* [ci][cirrus] Update CI to provide test logs on failure

Co-authored-by: Matthias Runge <mrunge@redhat.com>
2 years ago Fix CI failures caused by unsupported distros and updates to dependencies (#3975)
Emma Foley [Tue, 15 Feb 2022 07:46:21 +0000 (07:46 +0000)] 
Fix CI failures caused by unsupported distros and updates to dependencies (#3975)

* [ci][gha] Replace trusty with Bionic and Focal

Ubuntu 14.04 (Trusty) is out of standard support [1].
``make check`` fails for test_capabilities, as noted in [2].
[3] indicates that the cause is glibc, but that updates are not expected
to the version in trusty.

This PR replaces trusty with Ubuntu 18.04 (Bionic) and 20.04 (Focal).

[1] https://wiki.ubuntu.com/Releases
[2] #3936
[3] #3927 (comment)

* [ci][cirrus] Make Valgrind error on definite memory leaks only

Valgrind gives errors when it finds possible leaks;
update the options to only error on definite leaks.

This is done using the VALGRIND_OPTS env var, which is used by valgrind
when it is invoked.

* [ci][gha] Make Valgrind error on definite memory leaks only

Valgrind gives errors when it finds possible leaks;
update the options to only error on definite leaks.

This is done using the VALGRIND_OPTS env var, which is used by valgrind
when it is invoked.

2 years ago Replace travis CI with GHA (#3913)
Emma Foley [Mon, 4 Oct 2021 09:32:28 +0000 (10:32 +0100)] 
Replace travis CI with GHA (#3913)

* [githubactions] Use collectd-ci container to run tests

Uses containers for collectd provided by collectd/ci-docker [1]
Repeats what travis was using for building collectd

* checks out branch
* installs dependencies (already in containers)
* runs the script commands from travis (pkg-config, configure, make)

[1] https://github.com/collectd/ci-docker

[githubactions] Add config flags to builds

* [githubactions] Add a job for experimental OSes

* [GHA] update actions for new distro containers in collectd/ci-docker

* Mark ``make check`` as optional for now

it is not passing reliably, and is being marked as optional until it is

Co-authored-by: BarometerExperimental <barometer-experimental@container>
2 years ago Create build.yml (#3911)
Matthias Runge [Thu, 9 Sep 2021 16:33:31 +0000 (18:33 +0200)] 
Create build.yml (#3911)

* Create build.yml

2 years ago gpu_sysman: initialize struct .pNext members before use
Eero Tamminen [Thu, 17 Nov 2022 20:19:07 +0000 (22:19 +0200)] 
gpu_sysman: initialize struct .pNext members before use

The next Sysman spec will explicitly state that they need to be initialized:
https://github.com/oneapi-src/level-zero-spec/commit/98dfaaf041dedfd8c9bcf9a3957f334836e859e4

And latest Sysman backend versions corrupt memory / crash unless .pNext
values in some of the structs given to Get functions are initialized.

(Releases before fall 2022 did not use .pNext values in get* calls,
and worked fine. It just took a long time until I was able to verify
whether this was a regression that will be fixed, or intended change.)

Additionally, validate in test code that .pNext values are set to NULL
(because some structs lack those pointer members, ADD_METRIC() macro
cannot do that check for the <statename> functions given for it, but
otherwise everything is covered).

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2 years ago gpu_sysman: improve power limit handling
Eero Tamminen [Fri, 18 Nov 2022 12:26:06 +0000 (14:26 +0200)] 
gpu_sysman: improve power limit handling

Limits can be reported for only a subset of power domains. Therefore
querying limits (for a given GPU) should be disabled only when querying
fails for all domains.
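
The rule described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: `should_disable_limits()` and the `query_ok` array are hypothetical names, not taken from the plugin, and the real code works on Sysman power domain handles rather than a plain array of results.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not the plugin's code): query_ok[i] tells whether
 * the limit query succeeded for power domain i of one GPU. Limit querying
 * is disabled only when the query failed for every domain. */
static bool should_disable_limits(const bool *query_ok, size_t domain_count) {
  if (domain_count == 0)
    return false; /* nothing was queried, so nothing failed */

  for (size_t i = 0; i < domain_count; i++) {
    if (query_ok[i])
      return false; /* at least one domain reports limits: keep querying */
  }
  return true;
}
```

With results `{false, true, false}` the helper keeps limit querying enabled; only an all-`false` array disables it.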

Also added a TODO for an upcoming spec change I noticed in the spec tracker.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2 years ago gpu_sysman: do metric reset on every loop round
Eero Tamminen [Tue, 8 Nov 2022 16:34:31 +0000 (18:34 +0200)] 
gpu_sysman: do metric reset on every loop round

Not resetting the metric between loop rounds could result in an extra,
incorrect label being reported for a metric, when an earlier metric in
the loop had a conditional label, but a later metric does not satisfy
that condition (the Sysman call for the info failed but the failure is
ignored, or the Sysman struct value used for the given label is not set).

This can happen e.g. with the conditional memory "health", frequency
"throttled_by" and power "limit" labels.

The alternative would be either setting or removing (= using NULL)
values for each of the possible labels on every round.  Just resetting
metric labels on every round seemed more robust (easier to review),
and allowed simplifying the code slightly.

Looking at collectd metric implementation, it causes more allocs /
deallocs for the label array & label names though.
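
A minimal sketch of the stale-label problem, assuming a simplified fixed-size label table. `demo_metric_t` and the `demo_*` functions are hypothetical stand-ins for illustration, not collectd's metric API.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_LABELS 4

/* Hypothetical, simplified metric with a small label table. */
typedef struct {
  const char *name[DEMO_MAX_LABELS];
  const char *value[DEMO_MAX_LABELS];
  size_t num;
} demo_metric_t;

static void demo_metric_reset(demo_metric_t *m) { memset(m, 0, sizeof(*m)); }

/* Update an existing label, or append a new one if there is room. */
static void demo_label_set(demo_metric_t *m, const char *name,
                           const char *value) {
  for (size_t i = 0; i < m->num; i++) {
    if (strcmp(m->name[i], name) == 0) {
      m->value[i] = value;
      return;
    }
  }
  if (m->num < DEMO_MAX_LABELS) {
    m->name[m->num] = name;
    m->value[m->num] = value;
    m->num++;
  }
}

static const char *demo_label_get(const demo_metric_t *m, const char *name) {
  for (size_t i = 0; i < m->num; i++) {
    if (strcmp(m->name[i], name) == 0)
      return m->value[i];
  }
  return NULL;
}

/* One loop round: "health" is a conditional label, set only when known.
 * Without the reset, a "health" value from an earlier round survives. */
static void demo_round(demo_metric_t *m, const char *health, int reset_first) {
  if (reset_first)
    demo_metric_reset(m);
  demo_label_set(m, "location", "memory");
  if (health != NULL)
    demo_label_set(m, "health", health);
}
```

Calling `demo_round(&m, "ok", 1)` and then `demo_round(&m, NULL, 0)` leaves the old "health" label in place, while passing `reset_first = 1` on the second call removes it.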

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2 years ago gpu_sysman: make freq & mem handling more consistent
Eero Tamminen [Tue, 8 Nov 2022 17:06:54 +0000 (19:06 +0200)] 
gpu_sysman: make freq & mem handling more consistent

Readability/consistency improvement: change frequency and memory
metric handling to use new "reported" boolean instead of cache index,
for checking when metrics need to be submitted.  This is more
consistent with how other metric functions handle that.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2 years ago gpu_sysman: Minor improvements to test code
Eero Tamminen [Mon, 24 Oct 2022 14:25:23 +0000 (17:25 +0300)] 
gpu_sysman: Minor improvements to test code

Decrease max value and increase how many decimals are shown for metric
values, so that tests verbose logging shows useful values also for
ratios (which are in 0-1 range).

Rest of changes improve 'gpu_sysman.c' test coverage by 1%.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2 years ago gpu_sysman: Fix memory metric comments
Eero Tamminen [Mon, 3 Oct 2022 11:31:13 +0000 (14:31 +0300)] 
gpu_sysman: Fix memory metric comments

Caught by Ukri.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2 years ago gpu_sysman: add "throttled_by" label to frequency metric
Eero Tamminen [Fri, 16 Sep 2022 11:58:19 +0000 (14:58 +0300)] 
gpu_sysman: add "throttled_by" label to frequency metric

Which is empty/missing when frequency is not throttled.

Already in L0 spec v1.0.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2 years ago gpu_sysman: Provide returned error code when logging Sysman failures
Eero Tamminen [Wed, 16 Feb 2022 10:55:30 +0000 (12:55 +0200)] 
gpu_sysman: Provide returned error code when logging Sysman failures

To help in debugging issues with Sysman API usage.

(Includes minor stylistic improvements from Ukri & Tuomas)

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2 years ago gpu_sysman: Add memory "health" label if memory health is known
Eero Tamminen [Mon, 14 Feb 2022 17:23:52 +0000 (19:23 +0200)] 
gpu_sysman: Add memory "health" label if memory health is known

Already in L0 spec v1.0.

Included only in memory usage metrics, which are already querying
memory state (unlike memory BW metrics).

2 years ago gpu_sysman: Add "pci_dev" label
Eero Tamminen [Fri, 28 Jan 2022 16:06:50 +0000 (18:06 +0200)] 
gpu_sysman: Add "pci_dev" label

On a large cluster with different types of GPUs, it helps to know which
card is of which type, not just their metrics. "pci_dev" label adds
PCI device ID to the device metrics.

Because GPUs within each cluster node are normally supposed to be
identical i.e. differ only between nodes, and additional labels
increase processing load, this is enabled only with the GpuInfo
setting.

Getting additional strings out of gpu_info() function required
refactoring.  GPU index in errors is now output only by gpu_scan(),
and gpu_info() gets pointers to label string pointers instead.

2 years ago gpu_sysman: Avoid log warning when ratio output is disabled
Eero Tamminen [Tue, 18 Oct 2022 16:02:59 +0000 (19:02 +0300)] 
gpu_sysman: Avoid log warning when ratio output is disabled

Fixes: 75aeab3a42b5
Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2 years ago gpu_sysman: Give variant config examples in doc
Eero Tamminen [Mon, 3 Oct 2022 12:10:46 +0000 (15:10 +0300)] 
gpu_sysman: Give variant config examples in doc

With the given 2 example variant configs, one does not
miss any of the information that Sysman might provide.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2 years ago gpu_sysman: Improve metric variant handling code readability
Eero Tamminen [Mon, 3 Oct 2022 09:36:40 +0000 (12:36 +0300)] 
gpu_sysman: Improve metric variant handling code readability

Based on review comments by Tuomas

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2 years ago gpu_sysman: use sstrdup() instead of strdup()
Eero Tamminen [Mon, 19 Sep 2022 14:08:46 +0000 (17:08 +0300)] 
gpu_sysman: use sstrdup() instead of strdup()

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2 years ago gpu_sysman: log which metric is not found in test
Eero Tamminen [Fri, 16 Sep 2022 11:56:52 +0000 (14:56 +0300)] 
gpu_sysman: log which metric is not found in test

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2 years ago gpu_sysman: Add ratio variant for power metric type
Eero Tamminen [Thu, 8 Sep 2022 17:18:59 +0000 (20:18 +0300)] 
gpu_sysman: Add ratio variant for power metric type

Needs new internal disable flag because power limit requires new
Sysman call which can fail separately from others (or reported limits
could be disabled).  Because it's not called on first round, that
needs some changes to test checks too.

With all 3 metric variants being supported, variants check can be
removed.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2 years ago gpu_sysman: give more info on call count errors in tests
Eero Tamminen [Thu, 15 Sep 2022 13:51:36 +0000 (16:51 +0300)] 
gpu_sysman: give more info on call count errors in tests

Knowing ID for the function getting incorrectly called or not called,
helps debugging test failures a lot.  This helps with next commit.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2 years ago gpu_sysman: Add ratio variant for frequency metric type
Eero Tamminen [Tue, 13 Sep 2022 09:15:58 +0000 (12:15 +0300)] 
gpu_sysman: Add ratio variant for frequency metric type

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2 years ago gpu_sysman: Add ratio variant for temperature metric type
Eero Tamminen [Mon, 12 Sep 2022 11:02:03 +0000 (14:02 +0300)] 
gpu_sysman: Add ratio variant for temperature metric type

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2 years ago gpu_sysman: Pass properties variable to ADD_METRIC() test macro
Eero Tamminen [Tue, 13 Sep 2022 11:53:47 +0000 (14:53 +0300)] 
gpu_sysman: Pass properties variable to ADD_METRIC() test macro

Avoids need to clutter test code with separate fake properties
functions when overriding metric properties (in future commits).

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2 years ago gpu_sysman: Warn if enabled metric has no enabled output variant
Eero Tamminen [Fri, 9 Sep 2022 15:51:59 +0000 (18:51 +0300)] 
gpu_sysman: Warn if enabled metric has no enabled output variant

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2 years ago gpu_sysman: Report memory usage ratio only when ratios requested
Eero Tamminen [Fri, 9 Sep 2022 15:42:44 +0000 (18:42 +0300)] 
gpu_sysman: Report memory usage ratio only when ratios requested

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2 years ago gpu_sysman: Report rate variant for memory bandwidth
Eero Tamminen [Wed, 7 Sep 2022 17:05:56 +0000 (20:05 +0300)] 
gpu_sysman: Report rate variant for memory bandwidth

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2 years ago gpu_sysman: Use better memory test define names + sorting
Eero Tamminen [Tue, 15 Feb 2022 16:05:14 +0000 (18:05 +0200)] 
gpu_sysman: Use better memory test define names + sorting

Rename in preparation for adding more ratio checks, and sort
the counter metric names.

2 years ago gpu_sysman: Add "rate" variant to MetricsOutput option
Eero Tamminen [Wed, 7 Sep 2022 14:12:50 +0000 (17:12 +0300)] 
gpu_sysman: Add "rate" variant to MetricsOutput option

And make variants be handled as flags, so that multiple ones
can be selected.
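
Handling variants as flags can be sketched as a bitmask, roughly as follows. The `OUTPUT_*` names, the "base" variant name and the `:` separator are assumptions made for this illustration; only "rate" and "ratio" are named in this log, and the plugin's real option syntax may differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values, one bit per output variant. */
#define OUTPUT_BASE (1u << 0)
#define OUTPUT_RATE (1u << 1)
#define OUTPUT_RATIO (1u << 2)

/* Map one variant name of the given length to its flag, 0 if unknown. */
static unsigned variant_flag(const char *name, size_t len) {
  static const struct {
    const char *name;
    unsigned flag;
  } known[] = {
      {"base", OUTPUT_BASE}, {"rate", OUTPUT_RATE}, {"ratio", OUTPUT_RATIO}};

  for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
    if (strlen(known[i].name) == len && strncmp(known[i].name, name, len) == 0)
      return known[i].flag;
  }
  return 0;
}

/* Parse a ':'-separated list of variant names into a flag set.
 * Returns 0 when the list is empty or contains an unknown name. */
static unsigned parse_variants(const char *spec) {
  unsigned flags = 0;

  while (*spec != '\0') {
    size_t len = strcspn(spec, ":");
    unsigned flag = variant_flag(spec, len);
    if (flag == 0)
      return 0;
    flags |= flag;
    spec += len;
    if (*spec == ':')
      spec++;
  }
  return flags;
}
```

A metric function can then test `flags & OUTPUT_RATIO` to decide whether to emit the ratio variant, independently of the other variants.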

Also change other MetricsOutput option values to match the names of
actual metrics they gate + document the different variants better.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2 years ago cpython: fix build with Python 3.11
Đoàn Trần Công Danh [Wed, 21 Sep 2022 15:21:58 +0000 (22:21 +0700)] 
cpython: fix build with Python 3.11

Python 3.11 moves longintrepr.h into cpython sub-directory.
However, in this version, longintrepr.h is always included.

2 years ago write_prometheus: log which socket creation function failed
Eero Tamminen [Fri, 16 Sep 2022 10:46:45 +0000 (13:46 +0300)] 
write_prometheus: log which socket creation function failed

To ease debugging of users' listen socket creation failures.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2 years ago gpu_sysman: Replace strdup() calls with safer sstrdup()
Eero Tamminen [Wed, 8 Jun 2022 11:24:22 +0000 (14:24 +0300)] 
gpu_sysman: Replace strdup() calls with safer sstrdup()

These 2 calls were used only at plugin startup, but one lacked assert.
Update also comments on error handling.

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
2 years ago gpu_sysman: Fine-tune RAS error counter descriptions
Eero Tamminen [Wed, 23 Feb 2022 14:42:37 +0000 (16:42 +0200)] 
gpu_sysman: Fine-tune RAS error counter descriptions

* "number" -> "count" (as they are counters)
* "occurred" -> "that have occurred" (consistency with Sysman spec)

2 years ago gpu_sysman: Update test mockup functions to Level-Zero API v1.7.9
Eero Tamminen [Thu, 17 Feb 2022 13:33:02 +0000 (15:33 +0200)] 
gpu_sysman: Update test mockup functions to Level-Zero API v1.7.9

* Add checks on whether functions are called in right order, and
  if not, return ZE_RESULT_ERROR_UNINITIALIZED

* All functions now truncate the counts to max available, instead of
  returning ZE_RESULT_ERROR_INVALID_SIZE

* Add new dev_args_check() helper function to reduce amount of test code

2 years ago gpu_sysman: Reduce test code duplication with helper macro + function
Eero Tamminen [Wed, 16 Feb 2022 18:30:04 +0000 (20:30 +0200)] 
gpu_sysman: Reduce test code duplication with helper macro + function

2 years ago gpu_sysman: Log more exact sampling and interval information
Eero Tamminen [Fri, 28 Jan 2022 15:29:57 +0000 (17:29 +0200)] 
gpu_sysman: Log more exact sampling and interval information

About plugin settings when GpuInfo is enabled, and in metrics HELP

3 years ago [collectd 6] Fix some gcc warnings with more strict checks (#3970)
Eero Tamminen [Wed, 8 Jun 2022 15:23:34 +0000 (18:23 +0300)] 
[collectd 6] Fix some gcc warnings with more strict checks (#3970)

* Remove unused dummy meta data functions

"format_stackdriver" was migrated over a year ago.

* Fix rest of the GCC warnings from collectd core

* Properly initialize complex struct
* Fix signed vs unsigned comparisons
* Tell compiler which args are expected to be unused

Based on "-O3 -Werror -Wall -Wextra -Wformat-security" output.

* write_prometheus: fix static analysis warnings and comments

* Fix unused arguments reported by:
  "-O3 -Werror -Wall -Wextra -Wformat-security"
* Fix obsolete comment to match MHD docs:
  https://www.gnu.org/software/libmicrohttpd/ ("Queueing responses" section)
  https://git.gnunet.org/libmicrohttpd.git/tree/src/include/microhttpd.h#n2398
* Fix use after free reported by Klocwork, "prom_fam" cannot be used
  after it's been freed

* Fix signedness mismatch GCC warnings in a few of the plugins

Based on "-O3 -Werror -Wall -Wextra -Wformat-security" output.

* Remove unused function arguments from a few plugins

Based on "-O3 -Werror -Wall -Wextra -Wformat-security" output.

* Attribute unused function arguments as such in a few of the plugins

Based on "-O3 -Werror -Wall -Wextra -Wformat-security" output.

* turbostat: Satisfy clang-format CI check

Apparently CI has changed since this code was added to collectd.

Co-authored-by: Matthias Runge <mrunge@redhat.com>
3 years ago [collectd 6] Port #3938 for collectd 6 (#3999)
Matwey V. Kornilov [Wed, 8 Jun 2022 09:56:42 +0000 (12:56 +0300)] 
[collectd 6] Port #3938 for collectd 6 (#3999)

* write_http: Make use of CURLOPT_POSTFIELDSIZE

CURLOPT_POSTFIELDSIZE allows specifying the data size, which is known in
advance and equal to cb->send_buffer_fill. When CURLOPT_POSTFIELDSIZE is not
set (or set to -1), curl evaluates the data size using the strlen() function,
which has O(N) complexity, so we save a few CPU cycles here.

Signed-off-by: Matwey V. Kornilov <matwey.kornilov@gmail.com>
* write_influxdb_udp: Split formatting functions to format_influxdb

Signed-off-by: Matwey V. Kornilov <matwey.kornilov@gmail.com>
* write_http: Add influxdb format

Signed-off-by: Matwey V. Kornilov <matwey.kornilov@gmail.com>
* write_http: Enable using unix socket in libcurl

Signed-off-by: Matwey V. Kornilov <matwey.kornilov@gmail.com>
Co-authored-by: Matthias Runge <mrunge@redhat.com>
3 years ago [collectd 6] Add 'gpu_sysman' plugin for (Intel) GPU metrics (#3968)
Eero Tamminen [Tue, 7 Jun 2022 17:55:14 +0000 (20:55 +0300)] 
[collectd 6] Add 'gpu_sysman' plugin for (Intel) GPU metrics (#3968)

* Add 'gpu_sysman' plugin for (Intel) GPU metrics

Metrics data is provided by OneAPI Level Zero Sysman API.

* Add unit-testing for 'gpu_sysman' plugin

See comment at start of src/gpu_sysman_test.c for details.

* Integrate 'gpu_sysman' plugin and its unit-testing to collectd build

* Add 'gpu_sysman' plugin configuration and documentation

* gpu_sysman: use sizeof(*var) rather than sizeof(vartype) in var=calloc(...)

Except for gpu_subarray_alloc(), all allocs are done with calloc().
This way correctness of all of them is easy to check just by grepping
for calloc (especially now that clang-format does not wrap those lines
any more), and reviewing gpu_subarray_alloc().
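
The `sizeof(*var)` idiom mentioned above can be illustrated with a short sketch. `sample_t` and `samples_alloc()` are hypothetical names used only for this example.

```c
#include <stdlib.h>

/* Hypothetical element type, for illustration only. */
typedef struct {
  double value;
  unsigned flags;
} sample_t;

static sample_t *samples_alloc(size_t count) {
  /* sizeof(*samples) is derived from the pointer being assigned, so the
   * element size stays correct even if the pointer's type changes later.
   * With sizeof(sample_t) the two could silently drift apart. */
  sample_t *samples = calloc(count, sizeof(*samples));
  return samples; /* NULL on allocation failure; caller must check */
}
```

Because every call has the form `var = calloc(n, sizeof(*var))`, a reviewer can check each allocation on a single line without looking up the declared type.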

* gpu_sysman: minimal v6 API support + add units to metric names

Prometheus & OpenMetrics require metric names to be suffixed by the
metric unit, and ratios (0-1) to be used instead of percentages
(0-100).

* gpu_sysman: update test code for minimal v6 API support + new metric names

There's now also support for multiple metrics per family although they
are not used yet. "sstrncpy" is not needed any more.

* gpu_sysman: split metric properties from their names to separate labels

Following labels are used:
- sub_dev: subdevice ID (unsigned integer)
- location: e.g. "gpu" / "memory"
- type: e.g. "request" / "actual"
- direction: "read" / "write"

Additionally:

* Two location label values were fixed

* GPU engine indices are now per engine type
  (instead of single index being used for all types)

* All metric family and label names have been changed to use
  underscores instead of dashes to separate words, as required by
  Prometheus i.e. collectd does not need to convert them any more:
  https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels

* gpu_sysman: update test code to handle metrics split with labels

NOTE: providing NULL as label value to delete it is NOT supported.
Test code will assert on labels with NULL values.

* gpu_sysman: remove "GPU-" prefix from name and add it as "pci_bdf" label

Also rename GPU struct "name" member to more explicit "pci_bdf".

This allowed simplifying the code slightly.

Sysman API nowadays also supports devices other than GPUs, so the prefix
is removed to simplify code and to be more future-proof:
https://spec.oneapi.io/level-zero/latest/core/api.html#_CPPv416ze_device_type_t

(Plugin will still query only GPU devices from Sysman though.)

* gpu_sysman: fix test code for "pci_bdf" added to metrics family

- do not add "pci_bdf" to metric name for matching
- fix for adding metric labels to family copies of them

* gpu_sysman: improvements to reported metrics

* Fix memory "type" label overwrite

* Replace "free" memory metric with "memory_usage_ratio" one,
  and rename "memory_bytes" to "memory_used_bytes" metric

* Split metric value aggregate function name to a separate
  "function" label

* Have metric family declarations always in the same place in code

* Avoid both setting metric labels, and reporting empty metrics,
  when higher internal sampling rate is used or there are L0
  errors

* gpu_sysman: update tests for sysman plugin changes

* Add "memory_usage_ratio" checks

* Update validation for metrics that can be sampled at higher
  rate i.e. have now the new aggregate function label

* With empty metrics avoided, dispatch mock-up can assert on them

* With extra L0 calls being skipped when not needed, number of calls
  can differ between query rounds:
  - refactor multi-sampling test to handle count changes
  - change error handing checks to be done in single-sampled mode

* Debug output is needed to debug triggered multisample asserts,
  so do that when assert would have been triggered, then abort

* gpu_sysman: add help information for all metric families

And document why const-qual cast is safe, and why GCC does
not warn about other assignments to .name & .help members.

* gpu_sysman: option to disable utilization metrics for single engines

More powerful GPUs can have a large number of engines of a given type,
but the user may be interested only in the higher-level engine group
utilization.

"DisableEngineSingle" option allows skipping individual engine metrics.

* gpu_sysman: option for specifying metrics output type

This can be used to specify whether output metric values will be
raw, derived or both.

This commit adds support just for the configuration option itself;
adding / changing metrics to use it happens in the next commit.

* gpu_sysman: optional raw metrics output for already supported metrics

This adds new counter type metrics for:
* memory bandwidth
* frequency throttle time
* engine execution time (activity)
* energy usage

Because collectd internally handles counters as integers, not all units
can be the ones recommended by Prometheus; some are the microseconds and
microjoules reported by Sysman.

* gpu_sysman: skip metrics with div-by-zero or time wrap around issues

Zero time intervals or zero max bandwidth would cause div-by-zero issues,
and a (very rare) time wrap around would cause a bogus metric value.
Skip all of them.
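
A minimal sketch of such guards, assuming a bandwidth ratio computed from two counter samples with microsecond timestamps. The function name and parameters are hypothetical and chosen for this example; the plugin's own checks may be structured differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: computes used bandwidth as a 0-1 ratio of the
 * maximum. Returns false when the metric should be skipped, i.e. for a
 * zero or wrapped-around time interval, or a zero max bandwidth. */
static bool bandwidth_ratio(uint64_t prev_bytes, uint64_t cur_bytes,
                            uint64_t prev_ts_us, uint64_t cur_ts_us,
                            uint64_t max_bytes_per_sec, double *ratio) {
  if (cur_ts_us <= prev_ts_us)
    return false; /* zero interval (div-by-zero) or time wrap around */
  if (max_bytes_per_sec == 0)
    return false; /* div-by-zero on max bandwidth */

  double secs = (double)(cur_ts_us - prev_ts_us) / 1e6;
  double rate = (double)(cur_bytes - prev_bytes) / secs;
  *ratio = rate / (double)max_bytes_per_sec;
  return true;
}
```

For example, 500 bytes transferred over one second against a maximum of 1000 bytes per second yields a ratio of 0.5, while identical timestamps make the helper return false so the caller reports nothing.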

* gpu_sysman: fix test code -Wpedantic + -Wcast-qual warnings

* gpu_sysman: add 'sub_dev' and 'type' labels only when needed

An empty label is equivalent to a missing one, and Prometheus queries can check
for non-existence of a label, so let's just skip empty / unneeded ones.

The main difference from before is that LevelZero error categories that
provide non-zero values only for uncorrectable type (according to
spec), are now without a type label. Correctable i.e. zero metrics for
those categories were skipped already earlier.

* Add "dev_file" label support

And contrib/format.sh include re-order.

"dev_file" support is behind a define (enabled by default) because it
needs functions that are only part of POSIX, not C99.

Intel Kubernetes GPU plugin uses primary GPU node device file names
(card0, card1...) as its GPU identifiers.  This new label helps in
mapping Kubernetes custom metrics to them.

* Move test defines from Sysman plugin to its test code

And document with what GCC warning options the code is tested / passes.

* Change strcpy() in Sysman plugin to sstrncpy()

While for the plugin that change does not really help (as the target buffer is
always larger than the source), for test code it is useful. And it shuts
up static checking tools less capable than GCC.

As test code cannot use existing collectd functionality for this (test
code needs modified versions of some collectd functions, and not all
collectd code passes the GCC warnings I use), sstrncpy() is copied
to test code.

For test code there's also a fix to size given for snprintf(), and
removal of redundant string termination for modified plugin_log() copy
(vsnprintf() already terminates string).

* Pass clang-format check for gpu_sysman_test.c comments

* Add scalloc() wrapper similar to smalloc() to common utils

scalloc() wraps calloc() with exit on alloc failure,
similarly to what smalloc() does for malloc().
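
A wrapper of this kind could look roughly like the sketch below. `demo_scalloc()` is a hypothetical stand-in written for illustration; the log message and exit code are assumptions, not collectd's actual implementation.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical calloc() wrapper that exits on allocation failure.
 * Assumes callers request a non-zero size, since calloc() may
 * legitimately return NULL for zero-sized requests. */
static void *demo_scalloc(size_t nmemb, size_t size) {
  void *ptr = calloc(nmemb, size);
  if (ptr == NULL) {
    fprintf(stderr, "Not enough memory.\n");
    exit(EXIT_FAILURE);
  }
  return ptr; /* zero-initialized, never NULL */
}
```

Callers can then write `int *v = demo_scalloc(8, sizeof(*v));` and use `v` without a NULL check.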

* Replace Sysman plugin alloc+assert calls with smalloc/scalloc

If asserts were disabled, allocation failures would result in collectd
memory errors => replace alloc+assert in the plugin with collectd
smalloc/scalloc wrappers that exit after logging the allocation error.

Downsides are that this does not invoke debugger (which could be in a
different control group with plenty of memory), nor tell where / what
allocation failed, like enabled assert would, so test code variants of
the wrappers still do asserts.

* Pass clang-format check for gpu_sysman_test.c

3 years ago [ci][cirrus] Make Valgrind error on definite memory leaks only (#3977)
Emma Foley [Thu, 24 Feb 2022 07:25:47 +0000 (07:25 +0000)] 
[ci][cirrus] Make Valgrind error on definite memory leaks only (#3977)

* [ci][cirrus] Replace trusty with bionic/focal in debian_default_toolchain

Ubuntu 14.04 (Trusty) is out of standard support [1].
``make check`` fails for test_capabilities, as noted in [2].
[3] indicates that the cause is glibc, but that updates are not expected
to the version in trusty.

This PR replaces trusty with Ubuntu 18.04 (Bionic) and 20.04 (Focal).

[1] https://wiki.ubuntu.com/Releases
[2] https://github.com/collectd/collectd/pull/3936
[3] https://github.com/collectd/collectd/pull/3927#issuecomment-953350598

(cherry picked from commit b5d8ad13f4e300a7bfe342d27ad804003fcc9173)

* [ci][cirrus] Make Valgrind error on definite memory leaks only

Valgrind gives errors when it finds possible leaks;
update the options to only error on definite leaks.

This is done using the VALGRIND_OPTS env var, which is used by valgrind
when it is invoked.

Co-authored-by: Matthias Runge <mrunge@redhat.com>
3 years ago [ci][cirrus] Replace trusty with bionic/focal in debian_default_toolchain (#3972)
Matthias Runge [Tue, 15 Feb 2022 12:00:30 +0000 (13:00 +0100)] 
[ci][cirrus] Replace trusty with bionic/focal in debian_default_toolchain (#3972)

Ubuntu 14.04 (Trusty) is out of standard support [1].
``make check`` fails for test_capabilities, as noted in [2].
[3] indicates that the cause is glibc, but that updates are not expected
to the version in trusty.

This PR replaces trusty with Ubuntu 18.04 (Bionic) and 20.04 (Focal).

[1] https://wiki.ubuntu.com/Releases
[2] https://github.com/collectd/collectd/pull/3936
[3] https://github.com/collectd/collectd/pull/3927#issuecomment-953350598

(cherry picked from commit b5d8ad13f4e300a7bfe342d27ad804003fcc9173)

Co-authored-by: Emma Foley <efoley@redhat.com>
3 years ago [collectd 6] write_prometheus: migration to v6.0
Manuel Luis Sanmartín Rozada [Tue, 9 Mar 2021 23:11:48 +0000 (00:11 +0100)] 
[collectd 6] write_prometheus: migration to v6.0

3 years ago ethstat
Bartlomiej Kotlowski [Fri, 8 Oct 2021 10:42:31 +0000 (12:42 +0200)] 
ethstat

3 years ago [collectd 6] cpusleep: migration to v6.0
Manuel Luis Sanmartín Rozada [Wed, 20 Jan 2021 23:37:12 +0000 (00:37 +0100)] 
[collectd 6] cpusleep: migration to v6.0

3 years ago Merge pull request #3828 from manuelluis/mlsr/collectd6-ipc
Matthias Runge [Wed, 8 Sep 2021 16:04:03 +0000 (18:04 +0200)] 
Merge pull request #3828 from manuelluis/mlsr/collectd6-ipc

[collectd 6] ipc: migration to v6.0

3 years ago Merge branch 'collectd-6.0' into mlsr/collectd6-ipc [3828/head]
Matthias Runge [Wed, 8 Sep 2021 11:54:13 +0000 (13:54 +0200)] 
Merge branch 'collectd-6.0' into mlsr/collectd6-ipc

3 years ago Merge pull request #3893 from sonertari/patch-1
Matthias Runge [Mon, 6 Sep 2021 12:28:00 +0000 (14:28 +0200)] 
Merge pull request #3893 from sonertari/patch-1

Fix debug print of excluderegex if NULL

4 years agoFix debug print of excluderegex if NULL 3893/head
Soner Tari [Mon, 19 Jul 2021 14:31:51 +0000 (17:31 +0300)] 
Fix debug print of excluderegex if NULL

excluderegex in the logparser plugin is optional and can therefore be NULL. If debug output is enabled, the debug print then causes a CRITICAL error log like vfprintf %s NULL in "utils_match: match_create_callback: regex = %s, excluderegex = %s"
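
Passing a NULL pointer to a %s conversion is undefined behavior in C, which is why an optional string needs a guard before it is printed. The following is a minimal sketch of such a guard, not the actual patch: str_or_placeholder() and format_match_debug() are hypothetical names, and the "(none)" placeholder is an assumption.

```c
#include <stdio.h>

/* Hypothetical helper: substitute a printable placeholder for NULL. */
static const char *str_or_placeholder(const char *s) {
  return (s != NULL) ? s : "(none)";
}

/* Formats a debug line; excluderegex is optional and may be NULL.
 * Returns the number of characters snprintf() would have written. */
static int format_match_debug(char *buf, size_t buf_size, const char *regex,
                              const char *excluderegex) {
  return snprintf(buf, buf_size, "regex = %s, excluderegex = %s",
                  str_or_placeholder(regex), str_or_placeholder(excluderegex));
}
```

Calling format_match_debug(buf, sizeof(buf), "^foo", NULL) never hands a NULL pointer to the formatter, whatever the C library would do with one.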

4 years agoMerge pull request #3763 from carlospeon/collectd-6.0
Matthias Runge [Fri, 9 Jul 2021 18:39:26 +0000 (20:39 +0200)] 
Merge pull request #3763 from carlospeon/collectd-6.0

[collectd 6] write_influxdb_udp: migration to v6.0

4 years agoMerge branch 'collectd-6.0' into collectd-6.0 3763/head
Matthias Runge [Fri, 9 Jul 2021 13:01:03 +0000 (15:01 +0200)] 
Merge branch 'collectd-6.0' into collectd-6.0

4 years agoMerge pull request #3823 from manuelluis/mlsr/collectd6-ping
Matthias Runge [Mon, 15 Mar 2021 07:21:37 +0000 (08:21 +0100)] 
Merge pull request #3823 from manuelluis/mlsr/collectd6-ping

[collectd 6] ping: migration to v6.0

4 years agoMerge branch 'collectd-6.0' into mlsr/collectd6-ping 3823/head
Matthias Runge [Mon, 15 Mar 2021 07:04:43 +0000 (08:04 +0100)] 
Merge branch 'collectd-6.0' into mlsr/collectd6-ping

4 years agoMerge pull request #3851 from manuelluis/mlsr/collectd6-fix-protocols
Matthias Runge [Tue, 2 Mar 2021 17:14:13 +0000 (18:14 +0100)] 
Merge pull request #3851 from manuelluis/mlsr/collectd6-fix-protocols

[collectd 6] protocols: Initialize metric_family to zero

4 years agoChange metric_family declaration 3851/head
Manuel Luis Sanmartín Rozada [Fri, 26 Feb 2021 23:31:00 +0000 (00:31 +0100)] 
Change metric_family declaration

4 years agoInitialize metric_family to zero
Manuel Luis Sanmartín Rozada [Sat, 20 Feb 2021 23:01:43 +0000 (00:01 +0100)] 
Initialize metric_family to zero
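
A struct with automatic storage holds indeterminate values until it is initialized, so any member that is read before being assigned can contain garbage. The sketch below shows the general zero-initialization idiom only. example_family_t is a hypothetical stand-in, and its fields are assumptions rather than the layout of collectd's metric_family_t.

```c
#include <stddef.h>

/* Hypothetical stand-in; not collectd's real metric_family_t. */
typedef struct {
  const char *name;
  int type;
  size_t metric_num;
} example_family_t;

static example_family_t example_family_new(const char *name) {
  /* "= {0}" zero-initializes every member: pointers become NULL,
   * integers become 0. */
  example_family_t fam = {0};
  fam.name = name;
  return fam;
}
```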

4 years agoMerge pull request #3821 from manuelluis/mlsr/collectd6-protocols
Matthias Runge [Sat, 20 Feb 2021 11:01:55 +0000 (12:01 +0100)] 
Merge pull request #3821 from manuelluis/mlsr/collectd6-protocols

[collectd 6] protocols: migration to v6.0

4 years agoMerge branch 'collectd-6.0' into mlsr/collectd6-protocols 3821/head
Matthias Runge [Fri, 19 Feb 2021 20:48:59 +0000 (21:48 +0100)] 
Merge branch 'collectd-6.0' into mlsr/collectd6-protocols

4 years agoMerge pull request #3833 from manuelluis/mlsr/collectd6-buddyinfo
Matthias Runge [Mon, 15 Feb 2021 15:52:04 +0000 (16:52 +0100)] 
Merge pull request #3833 from manuelluis/mlsr/collectd6-buddyinfo

[collectd 6] buddyinfo: migration to v6.0

4 years agoMerge branch 'collectd-6.0' into mlsr/collectd6-buddyinfo 3833/head
Matthias Runge [Mon, 15 Feb 2021 15:22:27 +0000 (16:22 +0100)] 
Merge branch 'collectd-6.0' into mlsr/collectd6-buddyinfo

4 years agoFilter selected zones
Manuel Luis Sanmartín Rozada [Fri, 12 Feb 2021 21:09:06 +0000 (22:09 +0100)] 
Filter selected zones

4 years agoMerge pull request #3808 from manuelluis/mlsr/collectd6-users
Matthias Runge [Fri, 29 Jan 2021 14:28:27 +0000 (15:28 +0100)] 
Merge pull request #3808 from manuelluis/mlsr/collectd6-users

[collectd 6] users: migration to v6.0

4 years agoMerge branch 'collectd-6.0' into mlsr/collectd6-users 3808/head
Matthias Runge [Fri, 29 Jan 2021 14:15:18 +0000 (15:15 +0100)] 
Merge branch 'collectd-6.0' into mlsr/collectd6-users

4 years ago[collectd 6] buddyinfo: migration to v6.0
Manuel Luis Sanmartín Rozada [Wed, 27 Jan 2021 21:40:39 +0000 (22:40 +0100)] 
[collectd 6] buddyinfo: migration to v6.0

4 years agoMerge pull request #3807 from manuelluis/mlsr/collectd6-uptime
Matthias Runge [Mon, 25 Jan 2021 17:22:05 +0000 (18:22 +0100)] 
Merge pull request #3807 from manuelluis/mlsr/collectd6-uptime

[collectd 6] uptime: migration to v6.0

4 years ago[collectd 6] ipc: migration to v6.0
Manuel Luis Sanmartín Rozada [Sun, 24 Jan 2021 21:58:40 +0000 (22:58 +0100)] 
[collectd 6] ipc: migration to v6.0

4 years ago[collectd 6] ping: migration to v6.0
Manuel Luis Sanmartín Rozada [Wed, 20 Jan 2021 23:51:06 +0000 (00:51 +0100)] 
[collectd 6] ping: migration to v6.0

4 years ago[collectd 6] protocols: migration to v6.0
Manuel Luis Sanmartín Rozada [Wed, 20 Jan 2021 23:49:57 +0000 (00:49 +0100)] 
[collectd 6] protocols: migration to v6.0

4 years ago[collectd 6] users: migration to v6.0
Manuel Luis Sanmartín Rozada [Wed, 20 Jan 2021 22:29:30 +0000 (23:29 +0100)] 
[collectd 6] users: migration to v6.0

4 years ago[collectd 6] uptime: migration to v6.0 3807/head
Manuel Luis Sanmartín Rozada [Wed, 20 Jan 2021 22:19:31 +0000 (23:19 +0100)] 
[collectd 6] uptime: migration to v6.0

4 years agoMerge branches 'v6/interface' and 'v6/memory' into collectd-6.0 3561/head
Florian Forster [Mon, 19 Oct 2020 08:56:08 +0000 (10:56 +0200)] 
Merge branches 'v6/interface' and 'v6/memory' into collectd-6.0

4 years agoRe-enable the write_influxdb_udp plugin in the CI systems
carlospeon [Wed, 14 Oct 2020 08:04:26 +0000 (10:04 +0200)] 
Re-enable the write_influxdb_udp plugin in the CI systems

4 years agointerface plugin: Convert to the v6 API. 3765/head
Florian Forster [Sat, 3 Oct 2020 06:43:54 +0000 (08:43 +0200)] 
interface plugin: Convert to the v6 API.

Fixes: #3641

4 years ago* family loop write rework
carlospeon [Wed, 30 Sep 2020 14:11:18 +0000 (16:11 +0200)] 
* family loop write rework
* fix gauge printf format
* fix METRIC_TYPE_UNTYPED

4 years agowrite_influxdb_udp.c: migration to v6.0
Carlos Peón Costa [Tue, 29 Sep 2020 16:13:41 +0000 (18:13 +0200)] 
write_influxdb_udp.c: migration to v6.0

4 years agomemory plugin: Implement absolute/percentage reporting. 3762/head
Florian Forster [Mon, 28 Sep 2020 11:03:06 +0000 (13:03 +0200)] 
memory plugin: Implement absolute/percentage reporting.

This also migrates the new NetBSD code to using metric_family_t.

Fixes: #3667

4 years agoMerge pull request #3586 from octo/merge-main-into-6
Florian Forster [Tue, 22 Sep 2020 09:54:57 +0000 (11:54 +0200)] 
Merge pull request #3586 from octo/merge-main-into-6

Merge main into collectd-6.0

4 years agoCI: disable building the gRPC plugin. 3586/head
Florian Forster [Fri, 18 Sep 2020 21:06:45 +0000 (23:06 +0200)] 
CI: disable building the gRPC plugin.

4 years agostrbuf: Use <stdbool.h> instead of _Bool.
Florian Forster [Fri, 18 Sep 2020 19:32:15 +0000 (21:32 +0200)] 
strbuf: Use <stdbool.h> instead of _Bool.

C++, e.g. the gRPC plugin, cannot deal with _Bool.
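
_Bool is a C99 keyword that is not part of standard C++, whereas bool is a macro from <stdbool.h> in C and a built-in type in C++. A header that spells the type as bool can therefore be included from both languages. The sketch below illustrates this with a hypothetical struct: example_buf_t and example_buf_is_fixed() are made-up names and do not reproduce the strbuf API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer type; the field names are assumptions. */
typedef struct {
  char *ptr;
  size_t size;
  bool fixed; /* spelled "bool", not "_Bool", so C++ can parse it too */
} example_buf_t;

/* Reports whether the buffer is marked as fixed-size. */
static bool example_buf_is_fixed(const example_buf_t *buf) {
  return buf->fixed;
}
```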

4 years agoMerge branch 'main' into collectd-6.0
Florian Forster [Tue, 22 Sep 2020 09:01:50 +0000 (11:01 +0200)] 
Merge branch 'main' into collectd-6.0

4 years agoMerge pull request #3559 from octo/remove_label_t
Florian Forster [Mon, 21 Sep 2020 08:00:49 +0000 (10:00 +0200)] 
Merge pull request #3559 from octo/remove_label_t

daemon: Fix the build on Solaris.

4 years agoformat_graphite: Add special case for NAN to gr_format_values. 3559/head
Florian Forster [Sat, 19 Sep 2020 20:03:30 +0000 (22:03 +0200)] 
format_graphite: Add special case for NAN to gr_format_values.

4 years agocommon: Add special case for NAN to format_values.
Florian Forster [Sat, 19 Sep 2020 19:39:53 +0000 (21:39 +0200)] 
common: Add special case for NAN to format_values.
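
The commit messages do not say what the special case emits. As a rough sketch of the general technique, assuming the goal is a fixed textual form for NaN that does not depend on how printf() renders it, a formatter can test with isnan() first. format_gauge() is a hypothetical function, and the literal "nan" and the "%.15g" format are assumptions, not taken from format_values or gr_format_values.

```c
#include <math.h>
#include <stdio.h>

/* Hypothetical formatter: writes a fixed string for NaN and a
 * regular decimal representation for every other value. */
static int format_gauge(char *buf, size_t buf_size, double value) {
  if (isnan(value))
    return snprintf(buf, buf_size, "nan");
  return snprintf(buf, buf_size, "%.15g", value);
}
```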

4 years agoMerge pull request #3573 from octo/ci/buster coverity_scan
Matthias Runge [Tue, 15 Sep 2020 08:58:13 +0000 (10:58 +0200)] 
Merge pull request #3573 from octo/ci/buster

.cirrus.yml: Add Debian Buster.

4 years agoMerge pull request #3571 from carlospeon/buffer
Florian Forster [Tue, 15 Sep 2020 06:06:08 +0000 (08:06 +0200)] 
Merge pull request #3571 from carlospeon/buffer

write_influx_udp: build influxdb points outside the mutex.

4 years agoMerge pull request #3556 from octo/remove-xmms
Florian Forster [Mon, 14 Sep 2020 14:55:15 +0000 (16:55 +0200)] 
Merge pull request #3556 from octo/remove-xmms

Remove the "XMMS" plugin.

4 years ago.cirrus.yml: Add Debian Buster. 3573/head
Florian Forster [Mon, 14 Sep 2020 06:36:40 +0000 (08:36 +0200)] 
.cirrus.yml: Add Debian Buster.