Andrés J. Díaz [Mon, 31 Aug 2009 19:16:41 +0000 (21:16 +0200)]
src/utils_threshold.c: Implement the “Hits” and “Hysteresis” config options.
Hi all!
Based on Mariusz's idea, I attach a patch for thresholds (not for
filtering, yet) with basic hysteresis support, adding the keyword
Hysteresis to the configuration file, for example:
In this case the notification is raised when load (midterm data source)
is greater than 1, and comes back to OKAY when it is lower than 0.7 (1 - 0.3).
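The behaviour described here can be sketched in C as follows. This is only an illustration of the idea, not the code from the patch: the struct `hyst_threshold_t`, its fields and the function `hyst_update` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical per-threshold state; not the actual collectd structure. */
typedef struct {
  double warning_max; /* upper limit, e.g. 1.0 */
  double hysteresis;  /* width of the band below the limit, e.g. 0.3 */
  bool failing;       /* true while the notification is considered active */
} hyst_threshold_t;

/* Feed one value; returns true while the threshold counts as violated. */
static bool hyst_update(hyst_threshold_t *th, double value) {
  if (!th->failing) {
    /* Currently OKAY: only a value above the limit raises the state. */
    if (value > th->warning_max)
      th->failing = true;
  } else {
    /* Currently failing: the value must drop below (limit - hysteresis). */
    if (value < (th->warning_max - th->hysteresis))
      th->failing = false;
  }
  return th->failing;
}
```

With a limit of 1 and a hysteresis of 0.3, a load of 0.8 keeps an already raised state active, and only a value below 0.7 brings it back to OKAY.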
This is a proof of concept and I do not have a lot of time to test,
please use this patch with caution. Furthermore, the code is really hard
and dirty :)
Best regards,
Andres
P.S.: The patch also includes hits support, so to compile it you also
need to apply hits-cache.patch and, obviously, this patch is
incompatible with hits-threshold.patch.
I've attached a patch to add a hit counter to thresholds, that is, each
time a threshold is raised, an internal hit counter is incremented;
when the counter reaches a specific value set in the
configuration, the notification is generated and the counter is reset.
Here is an example of a threshold configuration with a hit counter:
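Independent of the configuration syntax, the counting logic described above can be sketched in C as follows. The names `hit_counter_t` and `hits_update` are hypothetical. Whether a non-violating value should reset the counter is not stated in the mail, so this sketch leaves the counter untouched in that case.

```c
#include <stdbool.h>

/* Hypothetical hit counter attached to one threshold. */
typedef struct {
  unsigned int hits_needed; /* number of violations required per notification */
  unsigned int hits_seen;   /* violations counted since the last notification */
} hit_counter_t;

/* Call once per evaluated value.
 * Returns true when a notification should be generated. */
static bool hits_update(hit_counter_t *hc, bool violated) {
  if (!violated)
    return false; /* assumption: the counter is kept, not reset */

  hc->hits_seen++;
  if (hc->hits_seen < hc->hits_needed)
    return false;

  hc->hits_seen = 0; /* reset after the notification is generated */
  return true;
}
```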
Florian Forster [Thu, 27 Aug 2009 07:06:16 +0000 (09:06 +0200)]
curl_json plugin: Renamed the “couchdb” plugin to “curl_json”.
On Thu, Aug 20, 2009 at 10:31:22AM -0700, Doug MacEachern wrote:
> Wanted to bring this up before 4.8..
> When I first started on the couchdb plugin, there were metrics
> specific to couchdb, but ended up making it generic and the metrics
> are all specified in the config. Since then, I've looked at Dynomite
> which has its own set of metrics exposed the same way:
> http://gist.github.com/137771
> Also noticed Hadoop 0.21 daemons now support: "/metrics?format=json to
> retrieve the data in a structured form.", but haven't had a chance to
> try yet. I'm sure there's more too. So I'm wondering if 'couchdb'
> should be renamed to something more generic, 'json' or 'yajl' maybe?
> And/or pushing the curl/yajl code out to util functions, then add the
> couchdb specific metrics to the couchdb plugin. Then also use the
> util functions for dynomite, hadoop, etc., specific plugins. Thoughts?
Florian Forster [Fri, 21 Aug 2009 09:34:05 +0000 (11:34 +0200)]
http plugin: http_write: Clean-up.
A couple of bugs have been fixed in the process. One error handling path
didn't release a mutex, for example. Also, the buffer may have been sent
truncated.
2009/8/18 Florian Forster <octo@verplant.org>:
> Hi Mariusz,
>
> On Mon, Aug 17, 2009 at 02:20:29AM +0200, Mariusz Gronczewski wrote:
>> i was thinking how to "spread out" writes to rrd files a bit, because
>> now its big spike every CacheTimeout or little smaller "square" on
>> graph if u use WritesPerSecond.
>
> in general I like your patch, thank you very much for posting it :)
> I have some doubts about calling rand() in such a busy place though,
> since getting random numbers is potentially costly. Also, rand(3) is not
> thread-safe, though I don't think that's really an issue for us.
Yeah, good point, but that would probably be noticeable only on very slow
(like PIII 800 slow) machines with tons of rrd files, and then the machine
would run out of disk bandwidth first.
> Maybe a solution would be to add a ‘random_timeout’ member to the
> ‘rrd_cache_t’ struct, too. This member is then set when creating the
> entry and set again right after the values have been removed. That way
> rand(3) is only called once for each write instead of calling for every
> check.
Yeah, very good idea, I didn't think about that (well, tbh. I didn't
look much into the "interiors" of the rrdtool plugin). I've implemented it in
the attached patch; so far I've been testing it for about 1 hour and it works
pretty well.
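A minimal sketch of the per-entry random timeout idea, assuming a simplified cache entry. The names `cache_entry_t`, `draw_offset`, `entry_is_due` and `entry_written` are hypothetical and do not come from the rrdtool plugin. The offset range of plus or minus the spread follows the description in the original mail and is an assumption here.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical, simplified stand-in for a cache entry. */
typedef struct {
  time_t last_write; /* when the values were last moved to the write queue */
  int random_offset; /* per-entry offset in seconds, drawn once per write */
} cache_entry_t;

/* Draw an offset in [-spread, +spread]; 0 disables the spreading. */
static int draw_offset(int spread) {
  if (spread <= 0)
    return 0;
  return (rand() % (2 * spread + 1)) - spread;
}

/* Cheap check done for every incoming value: rand() is not called here. */
static int entry_is_due(const cache_entry_t *e, time_t now, int cache_timeout) {
  return (now - e->last_write) >= (cache_timeout + e->random_offset);
}

/* Called right after the values were queued: pick the next cycle's offset. */
static void entry_written(cache_entry_t *e, time_t now, int spread) {
  e->last_write = now;
  e->random_offset = draw_offset(spread);
}
```

Because the offset is stored in the entry, rand(3) runs once per write rather than once per check, which is the point raised in the quoted review.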
> As an interesting sidenote: With the above approach, the random write
> times are distributed “uniform”, i. e. every delay from 0 to max-1
> seconds has the same probability. With your code, I think the actual
> time a value is written follows a “normal” distribution (you know, that
> famous bell curve). So I'd expect the above approach to spread the value
> quicker.
Yup, exactly as you said, it's much quicker like that.
I'm wondering what the config variable should be called; the name
"RandomTimeout" doesn't mean anything useful ("random timeout of what?"),
maybe TimeoutSpread? RandomizeTimeout?
I was thinking about how to "spread out" writes to rrd files a bit, because
right now there is a big spike every CacheTimeout, or a slightly smaller
"square" on the graph if you use WritesPerSecond. So I've written a little
patch which "spreads out" writing by changing the cache timeout every time
the rrdtool plugin finds data to save. Basically, instead of moving data
older than CacheTimeout to the write queue, it moves it if it's older than
CacheTimeout +- RandomTimeout. What does it change?
Without it, gathered data is "synchronised" with each other, for
example (CacheTimeout = 600):
1. collectd starts
2. after 10 minutes, data from all plugins gets "too old", is
pushed into the write queue and gets saved
3. after another 10 minutes, same thing: all data "ages" at the same time
and gets saved in one big chunk
With it (RandomTimeout=300) it works like this:
1. collectd starts
2. after 5 minutes some data (let's call it A) starts to go into the write queue
3. after 10 minutes from start about 50% (on average) of the data is saved
(let's call it B)
4. finally, after 15 minutes, all "leftover" data gets saved (let's call it C)
5. next "cycle"
6. data A ages first (because it was put to disk first) and, like before,
some of it gets written earlier, some of it gets written later
7. after that data B ages and, like before, writes are spread over 10 minutes
8. same with C
So the first cycle (looking at I/O) looks like a sine wave, the next 10 minute
cycle is the same sine wave but flattened a bit, and so on (it looks like a
fading sine wave), and after a few cycles it gives pretty much the same number
of writes per second, with no ugly spikes.
The effect looks like this:
http://img24.imageshack.us/img24/7294/drrawcgi.png
(after a few more hours it will be more "smooth")
Florian Forster [Mon, 17 Aug 2009 08:39:55 +0000 (10:39 +0200)]
java plugin: Wait with the configuration until the daemon has forked.
Passing the configuration to Java-based plugins requires the JVM to be
active and running. However, the JVM starts some threads that are lost
when the daemon forks to the background.
This patch changes the behavior of the Java plugin to copy the
configuration blocks found to a local variable and run the configuration
of the Java-based plugins from the `init' callback, because it is
invoked after the daemon has forked to the background.
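The "copy now, configure later" pattern described above can be sketched in C as follows. This is a simplified illustration: the configuration block is represented as a plain string rather than collectd's configuration tree, and the names `sketch_config`, `sketch_init`, `pending` and `applied_num` are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_PENDING 16

static char *pending[MAX_PENDING]; /* copies of configuration blocks */
static size_t pending_num = 0;
static size_t applied_num = 0;     /* counts blocks handed on in init */

/* Config callback: runs before the daemon forks, so it only stores a copy. */
static int sketch_config(const char *block) {
  if (pending_num >= MAX_PENDING)
    return -1;
  size_t len = strlen(block) + 1;
  char *copy = malloc(len);
  if (copy == NULL)
    return -1;
  memcpy(copy, block, len);
  pending[pending_num++] = copy;
  return 0;
}

/* Init callback: runs after the fork, so threads started here survive. */
static int sketch_init(void) {
  for (size_t i = 0; i < pending_num; i++) {
    /* A real plugin would start the JVM once and pass pending[i] to it. */
    applied_num++;
    free(pending[i]);
    pending[i] = NULL;
  }
  pending_num = 0;
  return 0;
}
```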
Andrés J. Díaz [Tue, 11 Aug 2009 19:57:34 +0000 (21:57 +0200)]
src/utils_cache.c: Update GETVAL output when missing state.
Hi
I think that I've found a bug when using the unixsock plugin. The problem is
related to the missing state: when no value is received by the daemon for a
while, the entry in the cache is marked as MISSING, but the last value is
still shown by the GETVAL and LISTVAL commands even when the machine is not
reporting. Some utilities like collectd-nagios do not work correctly
and report an OKAY value when the host has not been reporting for a long time.
I attach a patch which checks the state value of a cache entry in
uc_get_names and in uc_get_rate_by_name. This patch works for me, but
it's not well tested yet, and I am not very sure whether it's a good way
to fix the problem. The patch was tested on the 4.7.2 release version.
BTW a GETSTATE command would be a useful feature too :P
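The kind of state check described in this message might look like the following C sketch. It uses a flat array instead of the real cache, and the names `entry_t`, `STATE_MISSING` and `get_rate_by_name` are hypothetical stand-ins, not the functions in src/utils_cache.c.

```c
#include <stddef.h>
#include <string.h>

enum { STATE_OKAY, STATE_MISSING };

/* Hypothetical, simplified cache entry. */
typedef struct {
  const char *name;
  double rate;
  int state;
} entry_t;

/* Look up a rate by name.
 * Returns 0 on success, -1 if the entry is unknown or marked MISSING. */
static int get_rate_by_name(const entry_t *entries, size_t n,
                            const char *name, double *ret_rate) {
  for (size_t i = 0; i < n; i++) {
    if (strcmp(entries[i].name, name) != 0)
      continue;
    /* Do not hand out a stale value for a host that stopped reporting. */
    if (entries[i].state == STATE_MISSING)
      return -1;
    *ret_rate = entries[i].rate;
    return 0;
  }
  return -1;
}
```

A client such as a Nagios check can then treat the error as "no data" instead of evaluating the last known value.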
Florian Forster [Sun, 16 Aug 2009 07:27:44 +0000 (09:27 +0200)]
madwifi plugin: Rename the antenna stats.
The first part of the type instance is already something like `ast_ant_rx' -
using `antenna%i' as the second part is therefore redundant. Thanks to Ondrej
for the pointer.
Florian Forster [Sun, 16 Aug 2009 07:21:05 +0000 (09:21 +0200)]
madwifi plugin: Unify ioctl error handling.
If an ioctl fails, a debug message is generated rather than an error message.
There are several types of interfaces managed by the madwifi driver, and not
all interfaces support all ioctls. Thanks to Ondrej for pointing this out.
Florian Forster [Wed, 12 Aug 2009 13:08:40 +0000 (15:08 +0200)]
libvirt plugin: Further improve the connection handling.
Use the complaint mechanism for failed connection attempts and handle multiple
`Connection' configuration options like other options in other plugins (i. e.
later options overwrite earlier settings of the same name).
Ondrej Zajicek [Tue, 11 Aug 2009 09:44:28 +0000 (11:44 +0200)]
madwifi plugin: Plugin for detailed information from the MadWifi driver.
Hello
After some time I managed to make a new version of the Madwifi plugin. The
main change is that it is possible to finely tune the set of monitored
statistics and just the most important statistics are monitored by
default. Also the number of new data types is reduced (by using type
instances).
Florian Forster [Tue, 4 Aug 2009 11:02:57 +0000 (13:02 +0200)]
network plugin: Use the meta data to implement the `Forward' option.
Previously, a cache in the network plugin was used to keep track of
which values were received via the network in order to distinguish
between ``forwarded'' values and values that were received from
somewhere else.
The same cache was also used to avoid loops when forwarding packets by
keeping track of the highest timestamp that was sent by the plugin and
discarding received data that was older or as old as that.
This information is now kept in the meta data of the global cache (what
is the last timestamp sent) and the meta data of the value list (was
this value list received via the network?). The cache that was
maintained in the network plugin has been removed.
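The two checks described above can be sketched in C as follows. The structs stand in for the meta data of the value list and of the global cache. All names (`value_t`, `cache_meta_t`, `accept_received`, `should_send`) are hypothetical and do not reflect the plugin's actual code.

```c
#include <stdbool.h>
#include <time.h>

/* Stand-in for a value list and its meta data. */
typedef struct {
  time_t time;
  bool received_via_network;
} value_t;

/* Stand-in for the per-value meta data in the global cache. */
typedef struct {
  time_t last_sent; /* highest timestamp sent by the plugin */
} cache_meta_t;

/* Receive path: drop data that is older than or as old as what was sent. */
static bool accept_received(value_t *v, const cache_meta_t *meta) {
  if (v->time <= meta->last_sent)
    return false; /* likely our own data coming back: avoid a loop */
  v->received_via_network = true;
  return true;
}

/* Send path: forward network-received values only if forwarding is on. */
static bool should_send(const value_t *v, cache_meta_t *meta, bool forward) {
  if (v->received_via_network && !forward)
    return false;
  if (v->time > meta->last_sent)
    meta->last_sent = v->time;
  return true;
}
```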