Tobias Oetiker [Fri, 7 Nov 2008 13:51:24 +0000 (13:51 +0000)]
Much simpler handling of timestamp errors. Return an error to the user
when any of the time stamp values are invalid. This is similar to
RRDTool's normal behavior. Removed the complex logic previously used to
return error codes to the user.
This solves a bug where non-advancing timestamps could have produced
incorrect error output during "BATCH" mode. The bug was caused by using
the sock->wbuf pointer for the error output. -- kevin brintnall
Tobias Oetiker [Tue, 4 Nov 2008 07:12:46 +0000 (07:12 +0000)]
I realize now that the problem is the line
test -f lua/Makefile && cd lua && $(MAKE) install || true
in the target "install-data-local", in bindings/Makefile.am. It forces
execution of bindings/lua/Makefile independently of lua being found or
not. I added that line in my first patch, following perl, python and
ruby build style, but it's not needed after I switched to automake. The
make recursion is controlled by SUBDIRS, which will only contain "lua"
if BUILD_LUA is true.
Tobias Oetiker [Tue, 28 Oct 2008 08:57:13 +0000 (08:57 +0000)]
- remove the spacing between the elements
- add xsd support to dump output
- change the argument "[--no-header|-n]" to "[--header|-h {xsd,dtd}]"
-- tobias.lindenmann 1und1.de
Tobias Oetiker [Wed, 22 Oct 2008 20:41:59 +0000 (20:41 +0000)]
The previous code relied on the assumption that pthread_cond_init(&cond)
was equivalent to memset(&cond,0). This may not be true on all platforms.
-- kevin
Tobias Oetiker [Wed, 22 Oct 2008 06:02:23 +0000 (06:02 +0000)]
remove_cache_item() did not check whether a file was in queue before
modifying the cache head/tail pointers. Therefore, the process of
flushing old files may perturb the cache_queue_head pointer. This caused
some nodes with CI_FLAGS_IN_QUEUE to be un-linked from the queue list.
Thereafter, they would not be flushed by any periodic process (although
they could be revived with FLUSH or UPDATE). This caused a slow memory
leak for files that are no longer updated. Pending updates for these
"abandoned" files would remain in memory ad infinitum.
With this patch, remove_from_queue() will check that the item is queued
before modifying the head/tail pointers. This restores the intended
behavior.
--kevin
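The guard described above can be sketched as follows. This is a minimal
illustration and not the daemon's actual code: the struct layouts, the
doubly-linked queue, and enqueue_tail() are assumptions made for the
example; only the names remove_from_queue() and CI_FLAGS_IN_QUEUE come from
the commit message.

```c
#include <stddef.h>

#define CI_FLAGS_IN_QUEUE 0x01   /* flag name taken from the commit message */

/* Hypothetical, simplified stand-ins for the daemon's cache structures. */
typedef struct cache_item {
    int flags;
    struct cache_item *prev;
    struct cache_item *next;
} cache_item_t;

typedef struct {
    cache_item_t *head;
    cache_item_t *tail;
} cache_queue_t;

/* Append an item to the tail of the queue and mark it as queued. */
void enqueue_tail(cache_queue_t *q, cache_item_t *ci)
{
    if (ci->flags & CI_FLAGS_IN_QUEUE)
        return;                      /* already queued */
    ci->prev = q->tail;
    ci->next = NULL;
    if (q->tail != NULL)
        q->tail->next = ci;
    else
        q->head = ci;
    q->tail = ci;
    ci->flags |= CI_FLAGS_IN_QUEUE;
}

/* Unlink an item, but only if it is actually queued. */
void remove_from_queue(cache_queue_t *q, cache_item_t *ci)
{
    if ((ci->flags & CI_FLAGS_IN_QUEUE) == 0)
        return;                      /* not queued: leave head/tail alone */

    if (ci->prev != NULL)
        ci->prev->next = ci->next;
    else
        q->head = ci->next;          /* item was the head */

    if (ci->next != NULL)
        ci->next->prev = ci->prev;
    else
        q->tail = ci->prev;          /* item was the tail */

    ci->prev = NULL;
    ci->next = NULL;
    ci->flags &= ~CI_FLAGS_IN_QUEUE;
}
```

Without the early return, an unqueued item (prev and next both NULL) would
take the "item was the head" and "item was the tail" branches and reset both
queue pointers to NULL, which is the kind of damage the commit describes.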
Tobias Oetiker [Tue, 21 Oct 2008 05:42:50 +0000 (05:42 +0000)]
* Open all listen sockets in daemonize(), while we still have stderr.
Changed open_listen_socket_* routines to complain to stderr. Now, any
errors in binding to the listen sockets are much more obvious.
* Simplified exit of parent after fork()
* PID file will be correctly cleaned up if there is a failure in daemonize().
* unlink the unix socket before trying to bind()
(after we're sure we have the PID file)
Tobias Oetiker [Mon, 20 Oct 2008 11:46:08 +0000 (11:46 +0000)]
rrd_notify_row patch:
- Delegate choice of starting row for newly created RRD files to the rrd_open.c API.
- Introduce the rrd_notify_row() function so that an implementation can choose to align the rows of new RRDs with existing RRDs, if desirable.
- Maintain the existing behaviour (random starting row) by default.
Tobias Oetiker [Sat, 18 Oct 2008 22:32:19 +0000 (22:32 +0000)]
rrd_open should not create files with restrictive masks
Removed unnecessary "mode" variable. The mode is only used when O_CREAT is
specified, where we want to use 0666 (as rrd_create_fn did in r<=1612).
--kevin
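For context, a small sketch of why 0666 is the permissive choice: open()
applies the process umask to the requested mode, so the user's umask still
decides the final permissions. Both function names below are hypothetical
and exist only for this illustration; ACLs and other platform specifics are
ignored.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Permission bits a new file ends up with when it is created with
 * `requested` under the given umask (plain POSIX semantics). */
mode_t effective_mode(mode_t requested, mode_t mask)
{
    return requested & ~mask & 0777;
}

/* Hypothetical helper: open a file, creating it with 0666 if needed and
 * leaving any restriction to the caller's umask.  The mode argument is
 * only consulted when the file does not exist yet.
 * Returns a file descriptor, or -1 on error. */
int open_or_create(const char *path)
{
    return open(path, O_RDWR | O_CREAT, 0666);
}
```

With a umask of 022 the file is created as 0644; with 077 it is created as
0600. Passing a tighter mode to open() could only remove further bits.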
Tobias Oetiker [Sat, 18 Oct 2008 15:50:07 +0000 (15:50 +0000)]
- encapsulate fd and mmap related variables within a private data
structure
- rrd_file_t keeps a pointer to the private data structure of type
void*, so that other block storage implementations can store their
internal data with rrd_file_t
-- Daniel.Pocock
Tobias Oetiker [Thu, 16 Oct 2008 21:12:27 +0000 (21:12 +0000)]
- rrd_open() calculates file size for new files and calls mmap once for
the whole file
- rrd_resize() cleaned up, no longer passing a size through the cookie
argument
- rrd_init(&my_rrd) must be called before rrd_open() - if people are
calling rrd_open directly from application code, this might be
troublesome. Alternative solutions: creating an additional function,
rrd_open_create(), or adding an extra argument to rrd_open() for setting
the file size
Tobias Oetiker [Tue, 14 Oct 2008 20:14:35 +0000 (20:14 +0000)]
This moves selection of the initial RRA row into the rrd_open.c API
The current implementation (random row) is used by default. However, it
now provides an opportunity for alternative implementations to integrate
with rrdtool in a single place.
Maybe there are other places in rrdtool where I should insert calls to
the function rrd_notify_row()?
This has been tested with rrdtool create and rrdtool info to verify that
random rows are selected by default (existing behaviour preserved).
Tobias Oetiker [Tue, 14 Oct 2008 19:23:24 +0000 (19:23 +0000)]
Under most circumstances, rrdcached can detect a stale pid file.
If the process in the pid file does not exist, or cannot be signalled by
the rrdcached owner, then rrdcached will replace the pid file and start
normally. Otherwise, it will complain verbosely to STDERR.
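One common way to make this kind of check is to send signal 0, which
performs the existence and permission checks without delivering anything.
The sketch below assumes that approach; the commit message does not say how
rrdcached implements it, and pid_looks_stale() is a hypothetical name.

```c
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if a PID read from a pid file should be treated as stale,
 * 0 if a process with that PID exists and can be signalled by us. */
int pid_looks_stale(pid_t pid)
{
    if (pid <= 0)
        return 1;    /* not a single-process PID (0 and negative values
                        address process groups); treated as stale here,
                        which is an assumption of this sketch */

    if (kill(pid, 0) == 0)
        return 0;    /* process exists and we may signal it */

    /* kill() failed: typically ESRCH (no such process) or EPERM (exists,
     * but we may not signal it).  Per the commit message, both cases
     * count as stale. */
    return 1;
}
```

A caller would read the PID from the file, call pid_looks_stale(), and then
either replace the file or report the conflict on stderr.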
Tobias Oetiker [Tue, 14 Oct 2008 19:08:56 +0000 (19:08 +0000)]
* this preserves principle of least surprise when dealing with files that
are reachable via many path strings. i.e. when $PWD=/base/dir the
following files are the same:
/base/dir/x.rrd
x.rrd
../dir/x.rrd
* for performance, absolute paths (starting with '/') are not resolved.
this reduces the number of stat(2) system calls.
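A rough sketch of the rule, under stated assumptions: make_absolute() is a
hypothetical helper, and it folds "." and ".." purely lexically instead of
consulting the file system, so symlinks are not followed. The real code may
resolve paths differently.

```c
#include <stdio.h>
#include <string.h>

/* Turn `path` into an absolute path string.  Absolute inputs are copied
 * unchanged; relative inputs are joined to `cwd` and normalized.
 * Returns 0 on success, -1 if the result does not fit. */
int make_absolute(const char *cwd, const char *path, char *out, size_t outlen)
{
    char joined[4096];
    size_t len = 0;                  /* length of the result built so far */
    const char *p;
    int n;

    if (outlen < 2)
        return -1;

    if (path[0] == '/') {            /* absolute: copied as-is, no lookups */
        if (strlen(path) >= outlen)
            return -1;
        strcpy(out, path);
        return 0;
    }

    n = snprintf(joined, sizeof(joined), "%s/%s", cwd, path);
    if (n < 0 || (size_t) n >= sizeof(joined))
        return -1;

    out[0] = '\0';
    p = joined;
    while (*p != '\0') {
        size_t clen;

        while (*p == '/')            /* skip separators */
            p++;
        clen = strcspn(p, "/");      /* length of the next component */
        if (clen == 0)
            break;

        if (clen == 1 && p[0] == '.') {
            /* ".": stay where we are */
        } else if (clen == 2 && p[0] == '.' && p[1] == '.') {
            /* "..": drop the last component, if there is one */
            while (len > 0 && out[len - 1] != '/')
                len--;
            if (len > 0)
                len--;               /* drop the separator as well */
            out[len] = '\0';
        } else {
            if (len + 1 + clen + 1 > outlen)
                return -1;
            out[len++] = '/';
            memcpy(out + len, p, clen);
            len += clen;
            out[len] = '\0';
        }
        p += clen;
    }

    if (len == 0) {                  /* everything cancelled out: root */
        out[0] = '/';
        out[1] = '\0';
    }
    return 0;
}
```

With cwd set to /base/dir, both "x.rrd" and "../dir/x.rrd" come out as
/base/dir/x.rrd, matching the three spellings listed in the commit message.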
Tobias Oetiker [Mon, 13 Oct 2008 22:07:14 +0000 (22:07 +0000)]
This patch reduces the number of time()/gettimeofday() system calls when
doing high volume processing. This enables about a 25% speed increase
during journal replay and "BATCH" processing. (this is a function of
syscall overhead).
* note when "BATCH" processing or journal replay starts, use that
timestamp for all commands
* use the batch start time to detect when we're in batch mode. no longer
need a separate boolean.
* pass the time_t into handle_request
* pass the time_t through to the commands that need it
Tobias Oetiker [Sat, 11 Oct 2008 09:37:53 +0000 (09:37 +0000)]
This patch introduces a feature whereby rrdcached will disallow updates
that do not advance the update time. This prevents the updates from being
discarded later by rrd_update_r.
This patch attempts to make the most of the protocol's limited ability to
return error text when using a -1 return code.
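The check can be pictured roughly as below. This is a sketch, not the
daemon's code: first_bad_timestamp() is a hypothetical helper, and it
assumes every value has the plain "timestamp:data" form with a numeric
timestamp. Other time formats accepted by rrdtool are not handled.

```c
#include <stdlib.h>
#include <time.h>

/* Returns -1 if every value's timestamp strictly advances past the
 * previous one (starting from `last_update`), otherwise the index of the
 * first value that is malformed or does not advance. */
int first_bad_timestamp(time_t last_update,
                        const char *const *values, int count)
{
    int i;

    for (i = 0; i < count; i++) {
        char *end;
        long stamp = strtol(values[i], &end, 10);

        if (end == values[i] || *end != ':')
            return i;                  /* no numeric timestamp before ':' */
        if ((time_t) stamp <= last_update)
            return i;                  /* does not advance the update time */

        last_update = (time_t) stamp;  /* the next value must pass this one */
    }
    return -1;
}
```

Rejecting the update up front lets the daemon report the problem to the
client instead of having the value dropped later at flush time.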
Tobias Oetiker [Fri, 10 Oct 2008 05:21:19 +0000 (05:21 +0000)]
This patch ensures that when rrdcached is stopped, it cleans up the
pid file. Apparently this is necessary if RRDCACHED_USER is not the
default "rrdcached".
-- Bernard Li
Tobias Oetiker [Fri, 10 Oct 2008 05:21:01 +0000 (05:21 +0000)]
The patch I submitted for rrdtool.spec introduced a bug where there
are two ldconfig calls in the %postun section. This patch fixes that.
-- Bernard Li
Tobias Oetiker [Tue, 7 Oct 2008 21:08:30 +0000 (21:08 +0000)]
This patch moves the permission handling code around a bit.
* moved privilege checks into the command handler functions
(possible now that we pass the sock data structures around)
* on UPDATE, delay journal_write until after check_file_access().
previously, it was possible for a high-priv socket to introduce
commands into the journal that could be replayed if they were
still in the journal at next startup.
* moved has_privilege() further up in the file to avoid need
for prototype.
Tobias Oetiker [Tue, 7 Oct 2008 16:28:24 +0000 (16:28 +0000)]
This patch introduces some extra safety checks in journal processing,
and cleans up the code a little bit.
* moved journal initialization to its own function; main() is cleaner
* any time we process a file, log the results
(previous code only logged if there was a valid entry)
* After reading journals at startup, only trigger full flush out to disk
if the user specified -F. Avoids unnecessary IO on startup unless the
user also wants unnecessary IO on shutdown.
* journal_replay is much more careful about files it will open
* must be a regular file
* must be owned by daemon user
* must not be group/other writable
* Ensure that the journal gets created with the right permissions.
... even when the daemon is invoked with a permissive umask.
equivalent to "chmod a-x,go-w"
Tobias Oetiker [Tue, 7 Oct 2008 15:37:34 +0000 (15:37 +0000)]
Daniel Pocock reported that the argument may be NULL in low-diskspace
situations, so check for that here to prevent a segmentation fault.
-- Florian Forster
Tobias Oetiker [Mon, 6 Oct 2008 19:05:47 +0000 (19:05 +0000)]
This patch introduces "BATCH" mode.
In this mode, a client can feed multiple commands to rrdcached without
waiting for acknowledgement. This permits multiple commands to be sent
for each read()/write(). This can dramatically increase the command
throughput by increasing the amount of work done per system call.
It enables over 100k updates/second with no CPU
utilization due to the reduced system calls.
Tobias Oetiker [Mon, 6 Oct 2008 19:04:48 +0000 (19:04 +0000)]
This patch introduces buffered I/O to rrdcached. Now, rrdcached can
interpret as many commands as arrive in a single read(), and it will use
fewer write()s when there are multiple output lines.
All routines now pass around listen_socket_t objects instead of file
descriptors.
All I/O is now contained in two routines. It's no longer necessary to
specify the line count in multi-line outputs, since that is calculated
automatically.
This is the foundation for accepting batched commands.
-- kevin brintnall
Tobias Oetiker [Wed, 1 Oct 2008 20:22:57 +0000 (20:22 +0000)]
Since rrdcached uses pthread functions, use the threadsafe version of librrd as well. This will
also resolve build problems on boxes where the pthread functions must be linked explicitly.
Tobias Oetiker [Wed, 1 Oct 2008 20:01:43 +0000 (20:01 +0000)]
Fixes for the following compiler warnings:
- unused variable
- unused parameter
- assignment / argument discards qualifiers from pointer target type
- comparison between signed and unsigned
- too many arguments to function
- assignment makes pointer from integer without a cast
- incompatible pointer type
- differ in signedness
- implicit declaration of function
- enumeration value not handled in switch
- value computed is not used
Most notably, a possible segfault in the Rrd_Lastupdate() code of the TCL
bindings has been fixed.
Also, -Wundef (warn if an undefined identifier is evaluated in an #if
directive) has been removed from CFLAGS. I don't see any problem with letting
undefined identifiers evaluate to "false" in rrdtool. Keeping that option
would produce a lot of (imho unnecessary) errors which would need to be fixed
using ugly preprocessor statements like '#if defined(FOO) && FOO'.
Tobias Oetiker [Wed, 1 Oct 2008 19:48:15 +0000 (19:48 +0000)]
I've adapted an init script for rrdcached, and also incorporated it into
the spec file so that it is deployed with the RPM.
There are also some other changes to the spec file so that I could build
an RPM successfully from trunk. I'm happy to tidy up the spec file some
more if no one else wants to mandate the best way to do it.
By default, rrdcached runs as nobody. I've tested this on a server
running Ganglia gmetad.
Tobias Oetiker [Wed, 1 Oct 2008 19:44:36 +0000 (19:44 +0000)]
Now, moving a value to the head of the queue is O(1). Before it was
O(queue size). This improves performance of individual flushes when
there is a large number of files in the queue. As a result, we don't
hold the cache_lock as much.
Revamped enqueue_cache_item to take advantage of the new structure.
Renamed _wipe_ci_values to look nicer with other code.
When -B is specified, the daemon will only operate on files within the
base directory. Symlink detection is omitted for performance reasons (if
a user can create a symlink, they can probably overwrite the RRDs anyway). -- kevin
This bug caused the last line in each journal file to be processed a
second time. Since it had been modified due to tokenizing, it failed
syntax check. The daemon would always record one failed line at
end-of-journal as a result. No data loss incurred by this bug. -- kevin
This patch introduces the concept of socket privilege levels. "UPDATE"
and "FLUSHALL" commands are restricted to high-privilege sockets. "FLUSH"
commands can be executed on any socket. This is ideal for multi-user
installations where only certain users need write access to the RRD files.
Now, nearly all socket information is passed around the daemon in
listen_socket_t data structures. In case there is other per-socket state
(i.e. if we add authentication) we can put it there.
Also, I created a new "open_listen_socket_network" and removed the network
setup from "open_listen_socket". -- kevin
This patch provides better error messages to the client when something
goes wrong with the daemon. When possible, the daemon error message is
passed through to rrd_set_error() on the client. Prior to this patch,
most error conditions would result in "Internal error", which is not very
helpful. -- kevin brintnall
This patch removes an extra "SIGNALS" section in the rrdcached.pod and
merges "[BUG] fixed hang in flush_file() introduced by per-file flush
condition". -- kevin brintnall
Moved signal handler setup out of daemonize(). Coalesced common code
in preparation for new signals. Documented behavior of existing signals.
-- kevin brintnall
Attached is a patch to lower the version requirements of libtool and
automake. I have tested this on CentOS 4.x with the specified
versions of libtool and automake and was able to build RRDTool fine.
I did *not* test building with PHP, tcl, ruby or Python though.
I also abstracted the version numbers of all the dependencies such
that editing them in the future will be easier.
-- Bernard Li
When -z <jitter> is specified, some updates may be timestamped up to
<jitter> seconds in the future. Therefore, a timeout of now+1 may not be
sufficient. Set abs_timeout past the point where any updates are
currently specified. -- kevin brintnall
The PID file is created with open() in the parent process, while we still
have STDERR open. If it cannot be created, it complains verbosely to
stderr.
The PID file is written in the child process. The only way the fdopen()
will fail on a fd that is already open is if you're completely out of
memory. As in other places in the code, I didn't consider this a case
that required a very verbose message. (Search for "strdup failed"). If
you still think a more verbose message is called for, please suggest one.
The attached patch corrects the error message to complain about fdopen()
vs fopen(). I hadn't noticed that until you brought it up.
vdef calc was using end_orig to determine for which range it should do its
calculations which is odd, since orig is only the requested range at
invocation time and not the data range delivered by fetch. It does fall
completely flat when shifting since shifting does not affect the original
data. Bug #177 reported by hokiel
This patch ensures that the "FLUSH" command will write the updates out to
RRD before returning to the user. Before, it returned when the update was
"dequeued"; updates were not necessarily on disk.
Also, for new nodes, the cache_lock is not held while we are setting up
the new node. We don't want to be holding the lock if the stat() blocks.
-- kevin brintnall
Support for IPv6 was broken by revision 1522: because IPv6 addresses
contain colons, simply checking for a colon and using everything after it
breaks correctly formatted IPv6 addresses.
This patch checks for dots '.' in the address. If the address contains at least
one dot, it is considered to be a hostname or an IPv4-address and a simple
search for a colon is done.
If no dot is found, the code will check for an opening square bracket '[' at
the beginning of the address. If one is found, the format
[address]:port
is assumed.
If neither applies, the default port will be used.
-- Florian Forster
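The decision procedure described above can be sketched like this.
split_address() and its parameters are hypothetical and not the client
library's API. Two details are assumptions of the sketch: a bracketed
address with no ":port" falls back to the default port, and in the dotted
case the first colon is taken as the separator.

```c
#include <string.h>

/* Split `in` into host and port strings following the rules above.
 * Returns 0 on success, -1 on malformed input or if a buffer is too small. */
int split_address(const char *in, const char *default_port,
                  char *host, size_t hostlen, char *port, size_t portlen)
{
    const char *host_start = in;
    size_t host_n = strlen(in);
    const char *port_str = default_port;

    if (strchr(in, '.') != NULL) {
        /* Hostname or IPv4 address: an optional ":port" may follow. */
        const char *colon = strchr(in, ':');
        if (colon != NULL) {
            host_n = (size_t) (colon - in);
            port_str = colon + 1;
        }
    } else if (in[0] == '[') {
        /* "[address]:port" */
        const char *closing = strchr(in, ']');
        if (closing == NULL)
            return -1;               /* unterminated bracket */
        host_start = in + 1;
        host_n = (size_t) (closing - host_start);
        if (closing[1] == ':')
            port_str = closing + 2;
        else if (closing[1] != '\0')
            return -1;               /* junk after the closing bracket */
    }
    /* Otherwise: bare address (for example "::1"), default port. */

    if (host_n >= hostlen || strlen(port_str) >= portlen)
        return -1;
    memcpy(host, host_start, host_n);
    host[host_n] = '\0';
    strcpy(port, port_str);
    return 0;
}
```

Note that the dot heuristic, as described, would also treat an IPv6 address
with an embedded dotted quad (such as ::ffff:192.0.2.1) as a hostname; such
addresses would need the bracketed form plus an extra check.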
This adds support for <address>:<port> in the rrd client library.
Obviously this is required to take advantage of the server's ability to
bind to a non-standard port -- kevin brintnall
I finally finished the first version of the patch (attached) -- Fidelis Assis fidelis pobox.com
(this does not seem to quite work yet at least not in my hardy setup)