2009/8/18 Florian Forster <octo@verplant.org>:
> Hi Mariusz,
>
> On Mon, Aug 17, 2009 at 02:20:29AM +0200, Mariusz Gronczewski wrote:
>> i was thinking how to "spread out" writes to rrd files a bit, because
>> now its big spike every CacheTimeout or little smaller "square" on
>> graph if u use WritesPerSecond.
>
> in general I like your patch, thank you very much for posting it :)
> I have some doubts about calling rand() in such a busy place though,
> since getting random numbers is potentially costly. Also, rand(3) is not
> thread-safe, though I don't think that's really an issue for us.
Yeah good point, but that would be probably noticable on very slow
(like PIII 800 slow) machines with tons of rrd, and then machine would
run out of disk bandwidth first.
> Maybe a solution would be to add a ‘random_timeout’ member to the
> ‘rrd_cache_t’ struct, too. This member is then set when creating the
> entry and set again right after the values have been removed. That way
> rand(3) is only called once for each write instead of calling for every
> check.
Yeah, very good idea, i didnt thougth about that (well tbh. i didnt
looked much into "interiors" of rrdtool plugin). Ive implemented it in
attached patch, so far ive been testing it for about 1 hour and works
pretty well.
> As an interesting sidenote: With the above approach, the random write
> times are distributed “uniform”, i. e. every delay from 0 to max-1
> seconds has the same probability. With your code, I think the actual
> time a value is written follows a “normal” distribution (you know, that
> famous bell curve). So I'd expect the above approach to spread the value
> quicker.
Yup, exactly as u said, its much quicker like that.
Im wondering how config variable should be called, name
"RandomTimeout" dont mean anything useful ("random timeout of what?"),
maybe TimeoutSpread ? RandomizeTimeout ?
i was thinking how to "spread out" writes to rrd files a bit, because
now its big spike every CacheTimeout or little smaller "square" on
graph if u use WritesPerSecond. So ive written little patch which
"spreads out" writing by changing Cache timeout every time rrdtool
plugin finds data to save. Basically instead of moving data older than
CacheTimeout to write queue it moves it if its older than CacheTimeout
+- RandomTimeout. What it changes?
Without it, gathered data is "synchronised" with eachother, for
example (CacheTimeout = 600):
1.collectd starts
2. after 10 minutes, data from all plugins get "too old" and get
pushed into write queue and get saved
3. after another 10 minutes, same thing, all data "ages" at same time
and get saved in one big chunk
With it (RandomTimeout=300) it works like that
1. collectd starts
2. after 5 minutes some data (lets call it A) starts to go into write queue
3. after 10 minutes from start about 50% (on average) data is saved
(lets call it B)
4. finally, after 15 minutes, all "leftover" data gets saved (lets call it C)
5. next "cycle"
6. data A ages first (cos it was put to disk first) and like before,
some of it gets writen earlier, some of it gets written later)
7. after that data B ages and like before writes are spread over 10 mins
8. same with C
so first cycle (looking at i/o) looks like sinus, next 10 minute cycle
is same sinus but flattened a bit and so on (looks like fading sinus),
and after few cycles it gives pretty much same amount on writes per
sec, no ugly spikes.
Effect looks like that:
http://img24.imageshack.us/img24/7294/drrawcgi.png
(after few more h it will be more "smooth")
Florian Forster [Sun, 16 Aug 2009 07:27:44 +0000 (09:27 +0200)]
madwifi plugin: Rename the antenna stats.
The first part of the type instance is already something like `ast_ant_rx' -
using `antenna%i' as the second part is therefore redundant. Thanks to Ondrej
for the pointer.
Florian Forster [Sun, 16 Aug 2009 07:21:05 +0000 (09:21 +0200)]
madwifi plugin: Unify ioctl error handling.
If an ioctl fails, a debug message is generated rather than an error message.
There are several types of interfaces manages by the madwifi driver, and not
all interfaces support all ioctls. Thanks to Ondrej for pointing this out.
Ondrej Zajicek [Tue, 11 Aug 2009 09:44:28 +0000 (11:44 +0200)]
madwifi plugin: Plugin for detailed information from the MadWifi driver.
Hello
After some time i managed to make a new version of Madwifi plugin. The
main change is that it is possible to finely tune the set of monitored
statistics and just the most important statistics are monitored by
default. Also the number of new data types is reduced (by using type
instances).
Florian Forster [Tue, 4 Aug 2009 11:02:57 +0000 (13:02 +0200)]
network plugin: Use the meta data to implement the `Forward' option.
Previously, a cache in the network plugin was used to keep track of
which values were received via the network in order to distinguish
between ``forwarded'' values and values that were received from
somewhere else.
The same cache was also used to avoid loops when forwarding packages by
keeping track of the highest timestamp that was sent by the plugin and
discard received data that was older or as old as that.
This information is not kept in the meta data of the global cache (what
is the last timestamp sent) and the meta data of the value list (was
this value list received via the network?). The cache that was
maintained in the network plugin has been removed.
These two new functions can be used to get historical data of values in
the cache. This can be used to calculate floating averages, hysteresis
and a shipload of other aggregation and consolidation functions.
The current implementation is probably not yet perfect:
- If not enough values are available to satisfy the request, the buffer
will be enlarged and NaNs will be returned in the newly allocated
cells. The caller has no way to recognize this case.
- If a value is missing, no NaNs will be added to the cache. It's
unclear if this was desirable.
- The returned values are reversed, i. e. val[0] will be the newest
value, val[n-1] will be the oldest. Here, too, I'm unsure which way
is easier to comprehend / use. I went for this implementation because
it was easier to write.
Paul Sadauskas [Wed, 8 Jul 2009 08:49:23 +0000 (10:49 +0200)]
src/utils_cache.c: Add a missing `continue'.
tokkee on IRC & I think we found a bug with utils_cache.c. The uc_check_timeout
function is missing a continue after the "uninteresting" service check, that
causes a key to be null.
This probably caused an assertion failure in cache_compare as reported by
Mariusz.
src/utils_threshold.c: Change the percentage code so it works with the DataSource option.
The percentage code used to *always* check the first data source. With this
patch, the code honors the `DataSource' option again, checking only the
configured data sources if applicable.
The percentage option works like collectd-nagios, that is, calculate the
percentage of the value of the first DS over the total. For df plugin,
for example,
calculate the percentage of the "used" DS.
Paul Sadauskas [Sat, 20 Jun 2009 20:50:36 +0000 (14:50 -0600)]
Changes suggested by Sebastian Harl.
* Separate Host and Port in config, report Host as hostname, and Port as
plugin instance.
* Submit before closing connection.
* Else-case in config, in case of invalid config params.
* Flounder around at using pkg-config in configure.in
* Remove forward declarations.
* Include plugin in config summary.
Introduce the DERIVE and ABSOLUTE data source types.
Hi,
i've updated my patch to 4.7.0, most of "data input" plugins (curl, java, exec,
perl, tail, couchdb) should work with derive. In case of couchdb and curl, if u
use absolute DS you can only "Set", no "Inc" or "Add" coz obviously that
wouldn't make much sense with it. Other plugins can be "enabled" globally to
use derive by changing "COUNTER" to "DERIVE" in types.db but that way is ugly
(but makes sense in some cases, like when u have lot of tunnels or ppp
interfaces) and either needs converting or recreating rrd files.
Regards
Mariusz
---
Hi,
ive been running my patch with 4.7.1, found a minor bug, but after repairing
that i didnt had any problems with it on my servers, im including patch
(against 4.7.1 from webpage),