Florian Forster [Fri, 21 Aug 2009 09:34:05 +0000 (11:34 +0200)]
http plugin: http_write: Clean-up.
A couple of bugs have been fixed in the process. One error handling path
didn't release a mutex, for example. Also, the buffer may have been sent
truncated.
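Both bugs named above follow a common pattern: an error path that leaves a mutex locked, and a buffer that is sent with the wrong length. Below is a minimal sketch of the corrected pattern. It assumes a hypothetical fixed-size send buffer and a hypothetical flush function that stands in for the actual HTTP request. None of these names come from the plugin itself.

```c
#include <pthread.h>
#include <string.h>

/* Hypothetical shared send buffer, protected by a mutex. */
static char send_buffer[64];
static size_t send_buffer_fill = 0;
static pthread_mutex_t send_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for the HTTP request: counts the bytes it "sends". */
static size_t flushed_bytes = 0;

/* Caller must hold send_lock. Sends the whole fill level, never a prefix. */
static int send_buffer_flush_locked(void) {
  flushed_bytes += send_buffer_fill;
  send_buffer_fill = 0;
  send_buffer[0] = '\0';
  return 0;
}

/* Appends one line, flushing first if it would not fit. Every return
 * path taken after locking also unlocks. */
static int send_buffer_append(const char *line) {
  size_t len = strlen(line);

  pthread_mutex_lock(&send_lock);
  if (len >= sizeof(send_buffer)) {
    /* Error path: the line can never fit. Release the lock before leaving. */
    pthread_mutex_unlock(&send_lock);
    return -1;
  }
  if (send_buffer_fill + len >= sizeof(send_buffer)) {
    if (send_buffer_flush_locked() != 0) {
      pthread_mutex_unlock(&send_lock);
      return -1;
    }
  }
  /* Copy including the terminating NUL; the checks above guarantee room. */
  memcpy(send_buffer + send_buffer_fill, line, len + 1);
  send_buffer_fill += len;
  pthread_mutex_unlock(&send_lock);
  return 0;
}
```

In this sketch, the leaked lock is avoided by pairing every early `return` with an unlock, or alternatively by routing those returns through a single cleanup label. Truncation is avoided because the flush uses the current fill level rather than a fixed or stale length.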
2009/8/18 Florian Forster <octo@verplant.org>:
> Hi Mariusz,
>
> On Mon, Aug 17, 2009 at 02:20:29AM +0200, Mariusz Gronczewski wrote:
>> I was thinking about how to "spread out" writes to RRD files a bit, because
>> right now there's a big spike every CacheTimeout, or a slightly smaller "square" on
>> the graph if you use WritesPerSecond.
>
> in general I like your patch, thank you very much for posting it :)
> I have some doubts about calling rand() in such a busy place though,
> since getting random numbers is potentially costly. Also, rand(3) is not
> thread-safe, though I don't think that's really an issue for us.
Yeah, good point, but that would probably only be noticeable on very slow
(like PIII 800 slow) machines with tons of RRD files, and such a machine
would run out of disk bandwidth first.
> Maybe a solution would be to add a ‘random_timeout’ member to the
> ‘rrd_cache_t’ struct, too. This member is then set when creating the
> entry and set again right after the values have been removed. That way
> rand(3) is only called once for each write instead of once for every
> check.
Yeah, very good idea, I didn't think about that (well, to be honest, I didn't
look much into the internals of the rrdtool plugin). I've implemented it in the
attached patch; so far I've been testing it for about an hour and it works
pretty well.
> As an interesting sidenote: With the above approach, the random write
> times are distributed “uniform”, i. e. every delay from 0 to max-1
> seconds has the same probability. With your code, I think the actual
> time a value is written follows a “normal” distribution (you know, that
> famous bell curve). So I'd expect the above approach to spread the values
> out more quickly.
Yup, exactly as you said, it's much quicker like that.
I'm wondering what the config variable should be called; the name
"RandomTimeout" doesn't mean anything useful ("random timeout of what?").
Maybe TimeoutSpread? RandomizeTimeout?
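Florian suggested storing the random offset in the cache entry. The offset is drawn once when the entry is created and again after its values have been written. The sketch below illustrates that idea. It is a hypothetical, simplified model, not the patch itself: `cache_entry_t`, `random_variation` and the helper functions are illustrative names. The offset is drawn from [-RandomTimeout, +RandomTimeout] to match the "CacheTimeout +/- RandomTimeout" description in the original mail. The sketch also assumes RandomTimeout < CacheTimeout. `rand() % n` is only approximately uniform because of modulo bias, which is acceptable here.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-ins for the CacheTimeout and RandomTimeout options. */
static int cache_timeout = 600;
static int random_timeout = 300;

typedef struct {
  time_t first_value;   /* time of the oldest value still in the cache */
  int random_variation; /* drawn once per write cycle, not on every check */
} cache_entry_t;

/* Draw an offset in [-random_timeout, +random_timeout]. */
static void entry_reset_variation(cache_entry_t *e) {
  if (random_timeout > 0)
    e->random_variation = (rand() % (2 * random_timeout + 1)) - random_timeout;
  else
    e->random_variation = 0;
}

/* Cheap check done on every pass over the cache: no rand() call here. */
static int entry_should_flush(const cache_entry_t *e, time_t now) {
  return (now - e->first_value) >= (time_t)(cache_timeout + e->random_variation);
}

/* Called right after the entry's values were moved to the write queue. */
static void entry_mark_written(cache_entry_t *e, time_t now) {
  e->first_value = now;
  entry_reset_variation(e);
}
```

The only per-check cost is a subtraction and a comparison. rand() runs once per write, which is the point of the suggestion.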
I was thinking about how to "spread out" writes to RRD files a bit, because
right now there's a big spike every CacheTimeout, or a slightly smaller "square" on
the graph if you use WritesPerSecond. So I've written a little patch which
"spreads out" writing by changing the cache timeout every time the rrdtool
plugin finds data to save. Basically, instead of moving data older than
CacheTimeout to the write queue, it moves it once it's older than CacheTimeout
+/- RandomTimeout. What does it change?
Without it, gathered data is "synchronised" with each other, for
example (CacheTimeout = 600):
1. collectd starts
2. after 10 minutes, data from all plugins gets "too old", is pushed
into the write queue, and gets saved
3. after another 10 minutes, the same thing happens: all data "ages" at the
same time and gets saved in one big chunk
With it (RandomTimeout=300) it works like this:
1. collectd starts
2. after 5 minutes some data (let's call it A) starts to go into the write queue
3. 10 minutes after the start, about 50% of the data (on average) has been saved
(let's call it B)
4. finally, after 15 minutes, all "leftover" data gets saved (let's call it C)
5. next "cycle"
6. data A ages first (because it was written to disk first) and, like before,
some of it gets written earlier and some of it gets written later
7. after that, data B ages and, like before, its writes are spread over 10 minutes
8. same with C
So the first cycle (looking at I/O) looks like a sine wave, the next 10-minute
cycle is the same sine wave but flattened a bit, and so on (like a fading sine
wave). After a few cycles the number of writes per second is pretty much
constant, with no ugly spikes.
The effect looks like this:
http://img24.imageshack.us/img24/7294/drrawcgi.png
(after a few more hours it will be even "smoother")
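The cycle-by-cycle description in the mail can be checked with a small simulation. This is a toy model, not collectd code. All entries start at the same moment. Each entry is written once it is CacheTimeout plus a fresh uniform offset in [-spread, +spread] seconds old. The simulation records the busiest minute. The entry count and the one-hour horizon are arbitrary choices.

```c
#include <stdlib.h>
#include <string.h>

#define N_ENTRIES 1000
#define SIM_MINUTES 60

/* Returns the largest number of writes that fall into any single minute,
 * i.e. the height of the worst spike, or -1 for unsupported parameters. */
static int simulate_peak(int cache_timeout, int spread, unsigned seed) {
  int per_minute[SIM_MINUTES];
  int peak = 0;

  if (cache_timeout <= 0 || spread < 0 || spread >= cache_timeout)
    return -1; /* keeps every interval strictly positive */

  memset(per_minute, 0, sizeof(per_minute));
  srand(seed);
  for (int i = 0; i < N_ENTRIES; i++) {
    long t = 0; /* every entry starts together, like at daemon start-up */
    for (;;) {
      int variation = (spread > 0) ? (rand() % (2 * spread + 1)) - spread : 0;
      t += cache_timeout + variation;
      if (t >= SIM_MINUTES * 60L)
        break;
      per_minute[t / 60]++;
    }
  }
  for (int m = 0; m < SIM_MINUTES; m++)
    if (per_minute[m] > peak)
      peak = per_minute[m];
  return peak;
}
```

With `spread = 0`, every entry is written in the same minute of each cycle, so the peak equals the number of entries. With a non-zero spread, the first cycle is already spread over ten minutes. Each later cycle is the sum of more random offsets and therefore flatter, which matches the "fading sine wave" observation.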
Florian Forster [Mon, 17 Aug 2009 08:39:55 +0000 (10:39 +0200)]
java plugin: Wait with the configuration until the daemon has forked.
Passing the configuration to Java-based plugins requires the JVM to be
active and running. However, the JVM starts some threads that are lost
when the daemon forks to the background.
This patch changes the behavior of the Java plugin to copy the
configuration blocks found to a local variable and run the configuration
of the Java-based plugins from the `init' callback, because it is
invoked after the daemon has forked to the background.
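The deferral described above amounts to "copy now, apply later". Below is a minimal sketch with a hypothetical, flattened key/value stand-in for a configuration block. The real plugin works on the daemon's nested configuration tree, which is not modelled here. `java_config`, `java_init` and `applied_num` are illustrative names only, and starting the JVM is left as a comment.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, flattened stand-in for a configuration block. */
typedef struct {
  char *key;
  char *value;
} config_block_t;

static config_block_t *deferred = NULL; /* blocks seen before the fork */
static size_t deferred_num = 0;
static size_t applied_num = 0; /* blocks handed to the Java side so far */

static char *copy_string(const char *s) {
  size_t len = strlen(s) + 1;
  char *copy = malloc(len);
  if (copy != NULL)
    memcpy(copy, s, len);
  return copy;
}

/* Config callback: runs before the daemon forks, so it only stores a copy.
 * Starting the JVM here would lose its threads in the fork. */
static int java_config(const char *key, const char *value) {
  config_block_t *tmp = realloc(deferred, (deferred_num + 1) * sizeof(*tmp));
  if (tmp == NULL)
    return -1;
  deferred = tmp;

  char *k = copy_string(key);
  char *v = copy_string(value);
  if (k == NULL || v == NULL) {
    free(k);
    free(v);
    return -1;
  }
  deferred[deferred_num].key = k;
  deferred[deferred_num].value = v;
  deferred_num++;
  return 0;
}

/* Init callback: invoked after the fork, so the JVM can be started here
 * (omitted) and the stored blocks passed on, then released. */
static int java_init(void) {
  for (size_t i = 0; i < deferred_num; i++) {
    applied_num++; /* a real plugin would configure the Java plugin here */
    free(deferred[i].key);
    free(deferred[i].value);
  }
  free(deferred);
  deferred = NULL;
  deferred_num = 0;
  return 0;
}
```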
Andrés J. Díaz [Tue, 11 Aug 2009 19:57:34 +0000 (21:57 +0200)]
src/utils_cache.c: Update GETVAL output when missing state.
Hi
I think that I've found a bug when using the unixsock plugin. The problem is
related to the missing state: when no value has been received by the daemon for
a while, the cache entry is marked as MISSING, but the last value is still
shown by the GETVAL and LISTVAL commands even when the machine is not
reporting. Some utilities, like collectd-nagios, do not work correctly
and report an OKAY value when a host has not been reporting for a long time.
I attach a patch which checks the state value of a cache entry in
uc_get_names and in uc_get_rate_by_name. This patch works for me, but
it's not very well tested yet, and I'm not very sure whether it's a good way
to address the problem. The patch was tested on the 4.7.2 release.
BTW, a GETSTATE command would be a useful feature too :P
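The proposed check boils down to treating a MISSING entry as having no current value. Below is a simplified sketch. It assumes a hypothetical flat array in place of the real cache. `get_rate_by_name` is a hypothetical function that plays the same role as `uc_get_rate_by_name`, but its signature and error convention are not reproduced here.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical, simplified state and cache-entry types. */
enum entry_state { STATE_OKAY, STATE_WARNING, STATE_ERROR, STATE_MISSING };

typedef struct {
  const char *name;
  double rate;
  enum entry_state state;
} cache_entry_t;

/* Returns 0 and stores the rate, or -ENOENT if the name is unknown or the
 * host stopped reporting (so GETVAL no longer shows a stale value). */
static int get_rate_by_name(const cache_entry_t *cache, size_t n,
                            const char *name, double *ret) {
  for (size_t i = 0; i < n; i++) {
    if (strcmp(cache[i].name, name) != 0)
      continue;
    if (cache[i].state == STATE_MISSING)
      return -ENOENT;
    *ret = cache[i].rate;
    return 0;
  }
  return -ENOENT;
}

/* LISTVAL counterpart: count only entries that are not MISSING. */
static size_t count_listed(const cache_entry_t *cache, size_t n) {
  size_t listed = 0;
  for (size_t i = 0; i < n; i++)
    if (cache[i].state != STATE_MISSING)
      listed++;
  return listed;
}
```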
Florian Forster [Sun, 16 Aug 2009 07:27:44 +0000 (09:27 +0200)]
madwifi plugin: Rename the antenna stats.
The first part of the type instance is already something like `ast_ant_rx' -
using `antenna%i' as the second part is therefore redundant. Thanks to Ondrej
for the pointer.
Florian Forster [Sun, 16 Aug 2009 07:21:05 +0000 (09:21 +0200)]
madwifi plugin: Unify ioctl error handling.
If an ioctl fails, a debug message is generated rather than an error message.
There are several types of interfaces managed by the madwifi driver, and not
all interfaces support all ioctls. Thanks to Ondrej for pointing this out.
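Treating an unsupported ioctl as a debug-level event rather than an error can be sketched as follows. `process_ioctl` and `debug_log` are hypothetical helpers; `debug_log` stands in for the daemon's own debug logging. `FIONREAD` is used only because it is a widely available request that exercises the failure path.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical stand-in for the daemon's debug-level logging. */
static void debug_log(const char *format, ...) {
  va_list ap;
  va_start(ap, format);
  fputs("debug: ", stderr);
  vfprintf(stderr, format, ap);
  va_end(ap);
}

/* Runs one ioctl on behalf of an interface. A failure is reported at debug
 * level only: not every interface type supports every request, so a
 * failure is expected rather than exceptional. */
static int process_ioctl(int fd, unsigned long request, void *arg,
                         const char *dev, const char *request_name) {
  if (ioctl(fd, request, arg) < 0) {
    int err = errno; /* save before any further library calls */
    debug_log("madwifi: ioctl %s on %s failed: %s\n", request_name, dev,
              strerror(err));
    return -1;
  }
  return 0;
}
```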
Florian Forster [Wed, 12 Aug 2009 13:08:40 +0000 (15:08 +0200)]
libvirt plugin: Further improve the connection handling.
Use the complaint mechanism for failed connection attempts and handle multiple
`Connection' configuration options like other options in other plugins (i. e.
later options overwrite earlier settings of the same name).
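Both changes can be sketched in a few lines. The complaint helper below is a deliberately simplified, hypothetical stand-in and not the daemon's actual complaint API, which may behave differently. It reports the first failure, counts repeats, and reports recovery once. `set_connection` shows the "later option wins" rule for a hypothetical `Connection` URI string.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified complaint state. */
typedef struct {
  int active;          /* non-zero while the failure is ongoing */
  unsigned suppressed; /* repeats not logged while active */
} complaint_t;

/* Logs only the first failure of a series; returns 1 if it logged. */
static int complain(complaint_t *c, const char *msg) {
  if (c->active) {
    c->suppressed++;
    return 0;
  }
  c->active = 1;
  fprintf(stderr, "libvirt: %s\n", msg);
  return 1;
}

/* Called after a successful connection: logs recovery once and resets. */
static void complaint_release(complaint_t *c, const char *msg) {
  if (!c->active)
    return;
  fprintf(stderr, "libvirt: %s (%u repeated failures suppressed)\n", msg,
          c->suppressed);
  c->active = 0;
  c->suppressed = 0;
}

/* Hypothetical `Connection' handler: a later option replaces an earlier one. */
static char *conn_uri = NULL;

static int set_connection(const char *value) {
  size_t len = strlen(value) + 1;
  char *copy = malloc(len);
  if (copy == NULL)
    return -1; /* keep the previous setting on failure */
  memcpy(copy, value, len);
  free(conn_uri);
  conn_uri = copy;
  return 0;
}
```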
Ondrej Zajicek [Tue, 11 Aug 2009 09:44:28 +0000 (11:44 +0200)]
madwifi plugin: Plugin for detailed information from the MadWifi driver.
Hello
After some time I managed to make a new version of the MadWifi plugin. The
main change is that it is possible to finely tune the set of monitored
statistics, and only the most important statistics are monitored by
default. Also, the number of new data types is reduced (by using type
instances).
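One way to express "finely tunable, with a small default set" is a bitmask of watched statistics. The sketch below is hypothetical. Apart from `ast_ant_rx`, which is mentioned in the antenna-stats commit, the statistic names are only examples. The default selection is an arbitrary choice, and `watch_add` / `watch_remove` are illustrative helpers rather than the plugin's configuration interface. A real list with more than 32 statistics would need a wider mask or an array of masks.

```c
#include <stdint.h>
#include <string.h>

/* Example statistic names; a real list would be much longer. */
static const char *const stat_names[] = {
  "ast_rx_crcerr", "ast_tx_xretries", "ast_rx_phyerr", "ast_ant_rx",
};
#define STAT_COUNT ((int)(sizeof(stat_names) / sizeof(stat_names[0])))

/* Default: watch only the first two (an assumed "most important" set). */
static uint32_t watch_mask = (UINT32_C(1) << 0) | (UINT32_C(1) << 1);

static int stat_index(const char *name) {
  for (int i = 0; i < STAT_COUNT; i++)
    if (strcmp(stat_names[i], name) == 0)
      return i;
  return -1;
}

static int watch_add(const char *name) {
  int i = stat_index(name);
  if (i < 0)
    return -1; /* unknown statistic */
  watch_mask |= UINT32_C(1) << i;
  return 0;
}

static int watch_remove(const char *name) {
  int i = stat_index(name);
  if (i < 0)
    return -1;
  watch_mask &= ~(UINT32_C(1) << i);
  return 0;
}

/* Checked when reading counters: unwatched statistics are skipped. */
static int is_watched(int i) {
  return (int)((watch_mask >> i) & 1u);
}
```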
Florian Forster [Tue, 4 Aug 2009 11:02:57 +0000 (13:02 +0200)]
network plugin: Use the meta data to implement the `Forward' option.
Previously, a cache in the network plugin was used to keep track of
which values were received via the network in order to distinguish
between ``forwarded'' values and values that were received from
somewhere else.
The same cache was also used to avoid loops when forwarding packets by
keeping track of the highest timestamp that was sent by the plugin and
discarding received data that was as old as or older than that.
This information is now kept in the meta data of the global cache (the
last timestamp sent) and in the meta data of the value list (was this
value list received via the network?). The cache that was maintained in
the network plugin has been removed.
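The two decisions that are now taken from meta data can be sketched as a single predicate. The types and field names below are hypothetical simplifications. `received_via_network` stands for the per-value-list meta data flag, and `last_sent` stands for the per-identifier timestamp kept in the global cache. The real plugin's meta data keys and exact forwarding rules are not reproduced.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical, simplified value list. */
typedef struct {
  time_t time;
  bool received_via_network; /* meta data: did this arrive via the network? */
} vl_sketch_t;

/* Hypothetical per-identifier meta data from the global cache. */
typedef struct {
  time_t last_sent; /* highest timestamp this plugin has sent */
} cache_meta_t;

static bool forward = false; /* stand-in for the `Forward' option */

/* Decides whether to send a value list, updating last_sent when it does. */
static bool should_send(const vl_sketch_t *vl, cache_meta_t *meta) {
  if (vl->received_via_network && !forward)
    return false; /* received values are only re-sent when forwarding */
  if (vl->time <= meta->last_sent)
    return false; /* as old as or older than what we sent: likely a loop */
  meta->last_sent = vl->time;
  return true;
}
```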
Sebastian Harl [Wed, 8 Jul 2009 11:19:57 +0000 (13:19 +0200)]
src/utils_cache.c: Make really sure to free the right cache entry.
Make sure we do not try to free a (possibly some random) cache entry after
removing it from the AVL tree. Potentially, this might have caused invalid
free()s in some rare situations.
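The safe pattern is to free exactly the key and value that the removal hands back. A pointer obtained from an earlier lookup may no longer refer to the removed node. Below is a minimal sketch that uses a tiny array-backed map as a hypothetical stand-in for the AVL tree. `map_remove` follows the same idea as a tree removal that returns the stored key and value through out-parameters.

```c
#include <stdlib.h>
#include <string.h>

/* Tiny array-backed map standing in for the AVL tree (hypothetical). */
typedef struct {
  char *key;
  void *value;
} pair_t;

typedef struct {
  pair_t items[16];
  size_t size;
} map_t;

/* Removes `key' and hands back the stored key and value to the caller. */
static int map_remove(map_t *m, const char *key, char **rkey, void **rvalue) {
  for (size_t i = 0; i < m->size; i++) {
    if (strcmp(m->items[i].key, key) != 0)
      continue;
    *rkey = m->items[i].key;
    *rvalue = m->items[i].value;
    m->size--;
    m->items[i] = m->items[m->size]; /* move the last pair into the gap */
    return 0;
  }
  return -1;
}

/* Frees what the container actually removed, never a stale pointer. */
static int cache_drop(map_t *m, const char *name) {
  char *key = NULL;
  void *value = NULL;
  if (map_remove(m, name, &key, &value) != 0)
    return -1;
  free(key);
  free(value);
  return 0;
}
```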
These two new functions can be used to get historical data of values in
the cache. This can be used to calculate floating averages, hysteresis
and a shipload of other aggregation and consolidation functions.
The current implementation is probably not yet perfect:
- If not enough values are available to satisfy the request, the buffer
will be enlarged and NaNs will be returned in the newly allocated
cells. The caller has no way to recognize this case.
- If a value is missing, no NaNs will be added to the cache. It's
unclear whether this is desirable.
- The returned values are reversed, i. e. val[0] will be the newest
value, val[n-1] will be the oldest. Here, too, I'm unsure which way
is easier to comprehend / use. I went for this implementation because
it was easier to write.
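A per-entry ring buffer gives the behaviour listed above. Values come back newest first, and a request for more values than are stored enlarges the buffer with NaN cells. The sketch below is hypothetical: `history_t` and its helpers are illustrative names, not the two cache functions this commit adds. It deliberately keeps the limitation noted in the first bullet, so the caller cannot tell padding NaNs from real ones.

```c
#include <math.h>
#include <stdlib.h>

/* Hypothetical per-entry history kept as a ring buffer. */
typedef struct {
  double *values;
  size_t size; /* capacity, always > 0 */
  size_t head; /* slot the next value will be written to (the oldest) */
} history_t;

static int history_init(history_t *h, size_t size) {
  if (size == 0)
    return -1;
  h->values = malloc(size * sizeof(*h->values));
  if (h->values == NULL)
    return -1;
  for (size_t i = 0; i < size; i++)
    h->values[i] = NAN; /* no value yet */
  h->size = size;
  h->head = 0;
  return 0;
}

static void history_push(history_t *h, double v) {
  h->values[h->head] = v;
  h->head = (h->head + 1) % h->size;
}

/* Enlarges to n cells; the new cells are NaN and count as the oldest. */
static int history_grow(history_t *h, size_t n) {
  double *grown = malloc(n * sizeof(*grown));
  if (grown == NULL)
    return -1;
  size_t pad = n - h->size;
  for (size_t i = 0; i < pad; i++)
    grown[i] = NAN;
  for (size_t k = 0; k < h->size; k++) /* copy oldest to newest */
    grown[pad + k] = h->values[(h->head + k) % h->size];
  free(h->values);
  h->values = grown;
  h->size = n;
  h->head = 0; /* grown[0] is now the oldest slot */
  return 0;
}

/* Copies the n newest values into out, newest first (out[0]). */
static int history_get(history_t *h, double *out, size_t n) {
  if (n > h->size && history_grow(h, n) != 0)
    return -1;
  for (size_t i = 0; i < n; i++)
    out[i] = h->values[(h->head + h->size - 1 - i) % h->size];
  return 0;
}
```

For example, a floating average over the last n values could call `history_get` and skip the NaN cells when summing.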