Florian Forster [Fri, 2 Oct 2009 09:29:44 +0000 (11:29 +0200)]
df plugin: Implement the "ReportReserved" option.
When enabled, the reserved space is reported separately. The "df_complex"
type is used and the mount point or device name is used as plugin instance
(as it should be) instead of the type instance (which is now needed for
"free", "reserved" and "used").
The INode handling has been split up in the same manner.
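To illustrate the three-way split, here is a minimal sketch of how "free", "reserved" and "used" could be derived from the block counts that statvfs(3) provides. The "df_split_t" type and the "df_split" function are hypothetical names used for illustration only; the plugin's actual code may differ.

```c
#include <stdint.h>

/* Hypothetical container for the three values reported per file system. */
typedef struct {
  uint64_t free;     /* bytes available to unprivileged users */
  uint64_t reserved; /* bytes that are free but reserved for root */
  uint64_t used;     /* bytes currently in use */
} df_split_t;

/* Split block counts (as found in the f_blocks, f_bfree and f_bavail
 * fields of "struct statvfs") into free, reserved and used bytes. */
static df_split_t df_split (uint64_t blocks, uint64_t bfree,
    uint64_t bavail, uint64_t block_size)
{
  df_split_t s;

  /* Clamp inconsistent input so the subtractions below cannot wrap. */
  if (bfree > blocks)
    bfree = blocks;
  if (bavail > bfree)
    bavail = bfree;

  s.free     = bavail * block_size;
  s.reserved = (bfree - bavail) * block_size;
  s.used     = (blocks - bfree) * block_size;

  return s;
}
```

The three values always add up to the total size of the file system, which is why they fit naturally into three type instances of one type.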
Florian Forster [Fri, 2 Oct 2009 07:06:41 +0000 (09:06 +0200)]
processes plugin: Improve the error handling.
The fork-rate function now returns ULONG_MAX upon error. The error detection
when using strtoul has been improved (overflow is not the only possible error).
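A minimal sketch of what stricter strtoul error detection can look like. The helper name "parse_counter" is hypothetical and this is not the plugin's actual code; it only shows the kinds of errors besides overflow that need to be caught.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse an unsigned decimal number.
 * Returns ULONG_MAX on any error. */
static unsigned long parse_counter (const char *str)
{
  char *endptr = NULL;
  unsigned long value;

  /* strtoul silently accepts leading whitespace and a sign, so insist
   * that the string starts with a digit. This also rejects "". */
  if ((str == NULL) || !isdigit ((unsigned char) str[0]))
    return ULONG_MAX;

  errno = 0;
  value = strtoul (str, &endptr, 10);

  /* Overflow sets errno to ERANGE. */
  if (errno != 0)
    return ULONG_MAX;

  /* Anything but end-of-string or a newline after the digits is garbage. */
  if ((*endptr != '\0') && (*endptr != '\n'))
    return ULONG_MAX;

  return value;
}
```

One consequence of this convention is that a legitimate value of ULONG_MAX cannot be told apart from an error.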
Doug MacEachern [Thu, 1 Oct 2009 00:28:08 +0000 (17:28 -0700)]
include netinet/in.h for sockaddr_in on FreeBSD
Fixes:
common.c: In function 'service_name_to_port_number':
common.c:1112: error: dereferencing pointer to incomplete type
common.c:1119: error: dereferencing pointer to incomplete type
unixsock plugin: Fix a (well hidden) race condition.
Within the client handling thread, fdopen is called twice on the file
descriptor passed to the thread. Later those file handles are closed like:
fclose (fhin);
fclose (fhout);
This is a race condition, because the first call to fclose will close the file
descriptor. The second call to fclose will try the same. Usually, it would fail
silently and all is well. On a busy machine, however, another thread may just
have opened a file or accepted a socket. In that case an arbitrary file
descriptor is closed. If the file descriptor is opened yet again fast enough,
data may even end up in a totally wrong location.
As a work-around the file descriptor is now dup'ed so each fdopen operates on
its own file descriptor. As an alternative the "r+" mode and a single file
handle may be suitable, too.
Many thanks to Sven Trenkel for pointing me in the right direction :)
netapp plugin: Rename the “Capacity” and “Snapshot” options again.
They've been renamed to “GetCapacity” and “GetSnapshot” so the
names used within the “VolumeUsage” block are the same as the names
used elsewhere in the plugin.
netapp plugin: Refactor the VolumePerf collection.
Same procedure one last time. The “GetVolumePerfData” block has been
renamed to “VolumePerf”. The “Get{IO,Ops,Latency}” options now
use ignore lists, too. Appropriate “IgnoreSelected{IO,Ops,Latency}”
options have been introduced.
Much of this is like disk, wafl and system statistics before. The
“GetVolumeData” has been renamed to “VolumeUsage” and the
“GetDiskUtil” and “GetSnapUtil” options have been changed, too. The
configuration now looks like this:
The code now uses "ignore lists" to check whether capacity and/or
snapshot information should be collected for a volume. This means the
order in which volumes are listed no longer matters and that you can
use such advanced options as selecting volumes via regular expressions.
netapp plugin: Print a notice if all WAFL values have been disabled.
This message is printed if the user did supply a <WAFL /> block but
then disabled all supported values. WAFL collection will be disabled
in this case to increase performance.
Same procedure as before: Instead of using the “service handler”,
create a cfg_system_t pointer if the user wants system statistics. Then
call cna_query_system instead of the service handler.
The “GetSystemPerfData” block has been renamed to “System” and the
“Multiplier” option has been replaced by the “Interval” option.
netapp plugin: Refactor handling of the WAFL data.
Basically the same structure as for the Disk data has been used. The
service handler has been removed and replaced by a call to
“cna_query_wafl”.
The “GetWaflPerfData” block has been renamed to “WAFL” to make the
config file easier to read. The “GetBufCache” config option has been
renamed to “GetBufferCache”. Maybe it should be renamed to
“GetBufferHash”, because that's what the NetApp API uses…?
Instead of obscuring control-flow with generic function pointers, use a
clear and easy to read function hierarchy. All disk-related action now
starts with “cna_query_disk (host)” instead of
“service->handler (host, data, service->data)”.
The “GetDiskPerfData” block has been renamed to “Disks”. All those
blocks start with “Get” and most end with “PerfData”, distracting
from the actual relevant part.
The “Multiplier” option has been replaced by the “Interval” option,
which expects a time in seconds rather than a factor by which the host
interval is multiplied.
The structure is roughly like this: Structs that only hold flags to tell
the functions what data to submit are prefixed with "cfg_". Structs that
hold old values for counters are prefixed with "data_".
The "disk_t" type now includes flags, too, to indicate valid / invalid
values. The "query_submit_disk_data" function has been changed to honor
those flags.
Various "volume_data" stuff has been renamed to "volume_usage" to make
it more distinguishable from "volume_performance".
Various defines are now also prefixed with "CFG_" to show which flags
are used for configuration and which are used to mark counters valid.
The latter use the "HAVE_" prefix.
… into “query_volume_perf_data” and “submit_volume_perf_data”. The
functions use the “per_volume_perf_data_t” struct to pass the counters
from one function to the other. The flags have been extended to include
HAVE_* flags. This way we can reliably determine whether an “old”
counter is valid or not.
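A minimal sketch of how such validity flags can guard a counter difference. The struct, the flag name and the function below are hypothetical stand-ins, not the plugin's actual definitions.

```c
#include <stdint.h>

/* Hypothetical flag: set once "read_bytes" holds a value from the filer. */
#define HAVE_SKETCH_READ_BYTES 0x01

/* Hypothetical stand-in for a per-volume performance data struct. */
typedef struct {
  uint32_t flags;
  uint64_t read_bytes;
} perf_sketch_t;

/* Store the counter difference in *ret_delta and return 0 if both the old
 * and the new sample are valid. Return -1 otherwise. */
static int perf_sketch_delta (const perf_sketch_t *old_data,
    const perf_sketch_t *new_data, uint64_t *ret_delta)
{
  /* Without the flag, an "old" value of zero would be indistinguishable
   * from a counter that has never been read. */
  if ((old_data->flags & HAVE_SKETCH_READ_BYTES) == 0)
    return -1;
  if ((new_data->flags & HAVE_SKETCH_READ_BYTES) == 0)
    return -1;

  /* A counter that went backwards was most likely reset. */
  if (new_data->read_bytes < old_data->read_bytes)
    return -1;

  *ret_delta = new_data->read_bytes - old_data->read_bytes;
  return 0;
}
```

On the first read there is no valid "old" sample, so nothing is computed; from the second read onward the difference is available.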