Wolfgang Stöggl [Thu, 14 Sep 2017 09:07:21 +0000 (11:07 +0200)]
Add missing files from win32 to Makefile.am
- Missing files were added in the Makefile.am to EXTRA_DIST
- The extra files will be included in the tarball
- See [rrd-users] message:
https://lists.oetiker.ch/pipermail/rrd-users/2016-August/020443.html
Alan Jenkins [Tue, 12 Sep 2017 13:00:42 +0000 (14:00 +0100)]
src/rrd_daemon.c: "comparison between pointer and zero character constant"
rrd_daemon.c: In function ‘handle_request_list’:
rrd_daemon.c:2587:22: warning: comparison between pointer and zero character constant [-Wpointer-compare]
} while (start_ptr != '\0');
^~
rrd_daemon.c:2587:12: note: did you mean to dereference the pointer?
} while (start_ptr != '\0');
I thought this might be related to what travis picked up with valgrind
here, but I couldn't reproduce it. Also I don't think this commit fixes a
bug; all it's doing is cleaning up the compiler warning.
Starting rrdcached...
OK: empty directory ./list1_dir returns nothing
==23957== LEAK SUMMARY:
==23957== definitely lost: 0 bytes in 0 blocks
==23957== indirectly lost: 0 bytes in 0 blocks
==23957== possibly lost: 0 bytes in 0 blocks
==23957== still reachable: 14,712 bytes in 6 blocks
==23957== suppressed: 0 bytes in 0 blocks
==23957==
==23957== For counts of detected and suppressed errors, rerun with: -v
==23957== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2 from 2)
FAILED: (rc=valgrind error) single file /list1.rrd
FAIL: list1
Alan Jenkins [Tue, 12 Sep 2017 12:17:54 +0000 (13:17 +0100)]
Fix valgrind usage in tests (#804)
* Fix leak of socket address in rrd_daemon
==11140== 59 bytes in 1 blocks are definitely lost in loss record 132
==11140== at 0x4C2DB9D: malloc (vg_replace_malloc.c:299)
==11140== by 0x6AD8CC9: strdup (in /usr/lib64/libc-2.24.so)
==11140== by 0x40A8AF: read_options (rrd_daemon.c:4493)
==11140== by 0x403114: main (rrd_daemon.c:4882)
* create a file in case the caller has difficulty testing the rc
e.g. because they pipe the output of rrdtool though something else.
Additional valgrind suppressions were needed in 20 out of 21 tests.
* tests: don't try to remove $VALGRIND_ERR_FILE when it doesn't exist
Fix non-fatal warning.
rm: cannot remove `/home/travis/build/oetiker/rrdtool-1.x/tests/
dump-restore-valgrind-err.tmp': No such file or directory
* Link to gthread-2.0 when g_thread_init needs to be invoked
See https://developer.gnome.org/glib/stable/glib-Deprecated-Thread-APIs.html#g-thread-init:
> To use g_thread_init() in your program, you have to link
> with the libraries that the command `pkg-config --libs
> gthread-2.0` outputs. This is not the case for all the other
> thread-related functions of GLib. Those can be used without having to
> link with the thread libraries.
* Support building in a separate directory
* bindings/tcl/Makefile.am: use a relative path to the shlib in pkgIndex.tcl
Alan Jenkins [Sat, 8 Jul 2017 10:53:45 +0000 (11:53 +0100)]
Recently discovered fixes (#803)
* fix incorrect use of snprintf in rrd_cgi
DS_NAM_SIZE is used to size a buffer, but there is no relation to ds_nam.
Fortunately this lead to the buffer being over-sized.
Replace with code which does not rely on calculating the size of a
formatted string. (If the error message is around the maximum possible
size supported by librrd, we might drop a closing ']'. That won't cause
us a problem).
We don't have a shim for asprintf(). rrd_asprintf() and friends are for
writing e.g. XML etc without locale-specific formatting.
* refactor - header_len in rrd_open() is only valid for new files
this would be confusing and undesirable if it was used for anything other
than calculating newfile_size. Fortunately it is not. The variable can
easily be scoped to avoid confusion.
* rrd_open() should position "to the first cdp in the first rra"
but RRD_READVALUES was buggy. It left the fd positioned at the end of the
file, but with rrd_file->pos still set to just after the header.
(rados, which doesn't use the fd, would be unafffected by this bug, leading
to differing behaviour).
The comment for rrd_open implies we should rrd_seek() back to just after
the header. So let's resolve the inconsistency that way.
The old code with d_offset was completely pointless. It was used to
restore rrd_file->pos, but rrd_file->pos was not changed, even by the
__rrd_read macro magic.
We're only going to do something horrible later involving `(size_t) -1`.
Just dereference NULL and trigger the undefined behaviour now, rather
than confuse readers about how RRD_CREAT works when rrd->stat_head is not
provided.
(But if anyone wants to expand argument checking and make it not
conditional on DEBUG, I have no objection to that either :).
* remove unecessary strdup/free in rrd_flushcached()
* rrd_get_error() does not return NULL when there is no message
* don't silently lose error messages to ENOMEM
If we run out of memory, that's a catastrophic error; we want to report it
in some way. (Note there's a good chance the error message we lost was
caused by ENOMEM, directly or indirectly).
* rrd_client: request() never sets `res` on failure, don't bother checking it
When request() returns non-zero (failure), it does not set `ret_response`.
So there is no point trying to extract an error message from ret_response
when request() returns non-zero.
If rrdcached returned an error response (ret_response->status < 0), then
response_read() will already have passed it to rrd_set_error().
* rrd_client: add missing checks for error responses
I have no explanation for why we were chosing to leak rrd->stat_head
(and only rrd->stat_head). If it _was_ required for some reason, there was
a bug anyway, as we did not leak it if we fail to allocate live_hdr.
If it's required by legacy ABI, then we should remove rrd_open() from the
public API (and bump the soname). It was originally scheduled for removal
in v1.4.
* avoid unmapping pages that don't belong to us in rrd_close()
When a rados RRD is closed, don't `munmap(0, rrd_file->file_len)` !
Note munmap() does not return an error when there were no pages mapped, so
our callers wouldn't notice even if they check the result of rrd_close().
Also fix the code for munmap errors - so it doesn't format one message and
then immediately replace it with a less specific one. (If both munmap()
and close() fails though, we will describe the later error only).
Also, there's no point rrd_close() setting rrd_file to NULL at the end,
it won't affect the variable used by the caller.
* Add read locking through rrd_open()
Read lock or write lock is taken according to whether RRD_READONLY or
RRD_READWRITE is passed. For librrd compat, the lock is only taken
when explicitly requested by passing RRD_LOCK. Similarly rrd_lock()
is still available, but it should not be needed by new code.
So `RRD_LOCK|RRD_READONLY` is intended to take a read lock, however I must
admit to limitations of this new code:
* On rados, our write lock expires after 2 seconds. This write-lock is
probably useful as a way to detect conflicts. However I would not want
to rely on a lock that works like this to guarantee consistent reads for
a backup. I did not add read locking on rados.
If rados users need to e.g. obtain a consistent backup, they should
create a rados snapshot. (This is able to work because we use a single
atomic rados operation when when we *write* to RRDs).
* On the Windows _locking() function, read locks are treated the same as
write locks.
The Windows code required several changes, and has not been compiled
as of this moment.
With Windows _locking(), we should have been unlocking before close().
Apparently the automatic unlock on close() is not guaranteed to occur
synchronously. This isn't specific to _locking(), it's a limitation
of winapi LockFileEx().
To manually unlock, we need to know what we actually locked.
But on Windows, rrd_lock() was locking the bytes starting at the current
rrd_file->pos, instead of always starting from 0.
Also, I thought RRD_LOCK should lock the file immediately, to guarantee we
don't have conflicting writes. But on Windows we only tried to lock
the bytes that already existed in the file... which would be 0 bytes for
RRD_CREATE. Fortunately, even Windows recognizes this problem and allows
locking bytes that don't exist yet. So we can just lock the maximum number
of bytes allowed. Locking 1 byte would probably work equally well for us,
but using LONG_MAX seems cleaner to me.
Alan Jenkins [Fri, 7 Jul 2017 09:48:27 +0000 (10:48 +0100)]
Fix madvise (and fadvise) (#802)
* fadvise and madvise code in rrd_open can be kept together
There was no reason to break up fadvise and madvise.
Putting them together makes it _slightly_ clearer that the exising #ifdefs
are intended. (Documentation on the effect of fadvise() on mmap() is
hard to find, so it was simpler to avoid doing that).
Also, setvbuf() clearly doesn't apply to fd's. Buffering is a function of
stdio e.g. fread(), fwrite() etc., but rrd_open() does not (currently?)
use these, it uses fds directly. Hence, we have no buffering already :).
There must be some reason people use fread() but disable buffering,
but I can't think why rrd_open() might want to do that in future.
Remove the commented code which uses setvbuf().
* rrd_open(... RRD_COPY) does not want FADV_RANDOM
fix the !HAVE_MMAP case, to match the USE_MADVISE code.
# Conflicts:
# src/rrd_open.c
* Passing RRD_READAHEAD to rrd_open() had no effect on current linux, fix it
MAP_NONBLOCK (since Linux 2.5.46)
Only meaningful in conjunction with MAP_POPULATE. Don't perform
read-ahead: create page tables entries only for pages that are
already present in RAM. Since Linux 2.6.23, this flag causes
MAP_POPULATE to do nothing. One day, the combination of
MAP_POPULATE and MAP_NONBLOCK may be reimplemented.
RRD_READAHEAD has been "broken" for a looong time. But I don't think it's
a good idea to carry code for RRD_READAHEAD which does nothing, at the same
time as we disable the default readahead by passing MADV_RANDOM.
This solution would not necessarily be API-quality, as changing the kernel
access hint from RANDOM to SEQUENTIAL means it's allowed to do drop-behind.
I am crudely assuming the in-tree users wouldn't have a problem with this.
Test:
for i in $(seq 50); do
dd if=dns.rrd of=x/$i.rrd bs=1M oflag=direct 2>/dev/null
done
/usr/bin/time sh -c 'for i in x/*.rrd; do
rrdtool dump $i >/dev/null
done'
I do not add madvise() WILLNEED before or after SEQUENTIAL.
Counter-intuitively, the test shows this would decrease performance.
(WILLNEED was implemented simply as triggering kernel readahead.
I think the implementation fails to use the increased readahead window.
Compare the number of major page faults).
* rrd_open() does not want to enable drop-behind of memory-mapped header
MADV_SEQUENTIAL mentions drop-behind. Override it for the header now
we've read it, in case anyone implemented drop-behind.
Linux does not currently implement drop-behind. My performance test
confirmed this did not help, but it didn't hurt either.
Do *not* fall back to fadvise() for !HAVE_MMAP. We've already copied
the header in that case. Doing e.g. FADV_NORMAL on Linux (4.12) on
*any* region would negate the effect of previous FADV_SEQUENTIAL.
Alan Jenkins [Wed, 5 Jul 2017 22:52:17 +0000 (23:52 +0100)]
fix wrong printf specifier
$ cd tests && ./list1
...
ERROR: rrdcached@����������������������������������������������������������
��������������������������������������������正1: Usage: LIST [RECURSIVE]
/[<path>]
under valgrind:
==16450== Conditional jump or move depends on uninitialised value(s)
==16450== at 0x4C356A9: wcslen (vg_replace_strmem.c:1848)
==16450== by 0x535C953: wcsrtombs (in /usr/lib64/libc-2.24.so)
==16450== by 0x52FFFC4: vfprintf (in /usr/lib64/libc-2.24.so)
==16450== by 0x5328448: vsnprintf (in /usr/lib64/libc-2.24.so)
==16450== by 0x4E6C848: rrd_set_error (rrd_error.c:51)
==16450== by 0x4E6E113: response_read.constprop.5 (rrd_client.c:577)
==16450== by 0x4E6E251: request.constprop.3 (rrd_client.c:659)
==16450== by 0x4E6F319: rrd_client_list (rrd_client.c:1354)
==16450== by 0x4E6F567: rrdc_list (rrd_client.c:1412)
==16450== by 0x4E485C8: rrd_list (rrd_list.c:308)
==16450== by 0x402D07: HandleInputLine (rrd_tool.c:696)
==16450== by 0x40158B: main (rrd_tool.c:551)
==16450==
==16450== Invalid read of size 4
==16450== at 0x4C356A4: wcslen (vg_replace_strmem.c:1848)
==16450== by 0x535C953: wcsrtombs (in /usr/lib64/libc-2.24.so)
==16450== by 0x52FFFC4: vfprintf (in /usr/lib64/libc-2.24.so)
==16450== by 0x5328448: vsnprintf (in /usr/lib64/libc-2.24.so)
==16450== by 0x4E6C848: rrd_set_error (rrd_error.c:51)
==16450== by 0x4E6E113: response_read.constprop.5 (rrd_client.c:577)
==16450== by 0x4E6E251: request.constprop.3 (rrd_client.c:659)
==16450== by 0x4E6F319: rrd_client_list (rrd_client.c:1354)
==16450== by 0x4E6F567: rrdc_list (rrd_client.c:1412)
==16450== by 0x4E485C8: rrd_list (rrd_list.c:308)
==16450== by 0x402D07: HandleInputLine (rrd_tool.c:696)
==16450== by 0x40158B: main (rrd_tool.c:551)
...
$ man 3 printf
...
Conversion specifiers
...
S (Not in C99 or C11, but in SUSv2, SUSv3, and SUSv4.) Synonym for
ls. Don't use.
...
s If an l modifier is present: The const wchar_t * argument is
expected to be a pointer to an array of wide characters.
Sven Panne [Wed, 31 May 2017 12:14:42 +0000 (14:14 +0200)]
Improved buffering (#792)
* Encapsulate access to write buffer.
Fixed ssize_t vs. size_t confusion on the way.
* Improve wbuf_append's time complexity from linear to amortized constant.
This is done by the usual technique employed in many, may libraries out
there (C++'s vector, Java's ArrayList, Python's list, ...): Distinguish
between the size and the capacity of the underlying container. The capacity
grows by some *factor* (2 in our case) if it is too small, amortizing the
needed allocations/copies over time.
In a nutshell: Adding a single character to the buffer can now be done in
constant amortized time.
* Simplify and improve handle_request_fetch.
Now that add_response_info is efficient (thanks to the improved
wbuf_append), we can vastly simplify handle_request_fetch and even remove
some arbitrary length restrictions on the way.
Sven Panne [Tue, 30 May 2017 07:19:04 +0000 (09:19 +0200)]
Fixed declaration and export of rrd_dump_opt_r.
The previous state of affairs regarding rrd_dump_opt_r was a bit
inconsistent: The dynamic library didn't export it, while the static one
did. So the right way to fix it will be exporting it in both variants,
because removing it from the static library would require a major version
change. Furthermore, rrd_dump_opt_r is mentioned in the documentation at
librrd.pod, so this is another hint for the fix.
In addition, a C prototype was wmissing, so GCC complained about it.
Sven Panne [Tue, 30 May 2017 06:11:24 +0000 (08:11 +0200)]
Do not initialize Cairo/Pango on export.
On a heavily loaded rrdcached, initializing lots of unnecessary Cairo/Pango
data structures (surfaces, font-related structures, etc.) is not a good
idea. Do this only when we really want to draw something.
Sven Panne [Wed, 24 May 2017 13:18:54 +0000 (15:18 +0200)]
Added missing GLib thread initialization.
Older GLib versions (< 2.32, e.g. the ones shipped with CentOS 5 and 6,
SLES11, Debian 6) need an explicit initialization to make things
thread-safe, see:
RRDtools didn't do that, so you get random crashes in clients and the
rrdcached on those systems. Now we make sure that GLib is properly
initialized before calling into it.
make top_distdir="bxrrdtool-1.6.1" distdir="bxrrdtool-1.6.1" dist-hook
make[2]: Entering directory '/home/vficet/src/rrdtool-1.x'
cd bxrrdtool-1.6.1 && /usr/bin/perl -i -p -e 's/^\1ERSION.+/\1ERSION='1.6002';/' bindings/perl-*/*.pm
Reference to nonexistent group in regex; marked by <-- HERE in m/^\1 <-- HERE ERSION.+/ at -e line 1.
Makefile:933: recipe for target 'dist-hook' failed
Tomas Vestelind [Wed, 1 Mar 2017 15:15:53 +0000 (16:15 +0100)]
use systemd macros for rrdcached in rpm spec file
I've added:
* a variable to control whether or not to build for systemd
* systemd-provided macros in the RPM spec-file according what Fedora
recommends.
Signed-off-by: Tomas Vestelind <tomas.vestelind@gmail.com>
Tomas Vestelind [Wed, 25 Jan 2017 17:30:27 +0000 (18:30 +0100)]
restart rrdcached on upgrade, start and stop in install/uninstall (#764)
* restart rrdcached on upgrade, start and stop in install/uninstall
Start rrdcached and enable autostart when installing
Restart rrdcached and enable autostart when upgrading
Stop rrdcached and disable autostart when uninstalling
Catch non-zero return codes and return 0 in scriptlets
Signed-off-by: Tomas Vestelind <tomas.vestelind@gmail.com>
* enable and restart on both install and upgrade just before finishing the rpm installation
Signed-off-by: Tomas Vestelind <tomas.vestelind@gmail.com>
Marek Schimara [Mon, 24 Oct 2016 11:11:24 +0000 (13:11 +0200)]
src/rrd_daemon.c: fix incorrect line number reporting by rrdcached in fetch
Example of error:
rrdtool fetch --daemon 127.0.0.1:42218 <path to>.rrd --start -5min LAST
ERROR: rrdc_fetch: Got 34 lines, expected 37
While the same, without '--daemon <address>' works fine.
The bug manifests itself when the last element of line to write falls
within the last 10% of the length of 'char linebuf[1024]', ie between
922 and 1023 bytes; in that case the '\n' at the end of the line is
not written, and so not seen by count_lines(), wrong number of lines
given in the daemon's response -> 'rrdtool fetch' errors out.
Havard Eidnes [Tue, 18 Oct 2016 21:12:29 +0000 (23:12 +0200)]
Fix a signedness/unsignedness conversion bug in the expression
calculating "now" before printout of the timestamp. Negative values
are first cast to unsigned, it seems, causing overflow.
Sebastian Harl [Mon, 17 Oct 2016 07:40:22 +0000 (09:40 +0200)]
Improve the RRD client API (#742)
* RRD client: Add an interface allowing for multiple client connections.
Add a client object which resembles a persistent connection to a server.
Multiple such objects may be used in parallel.
The old, "simple" interface is a wrapper around the new interface now, using
a default client connection. As before, this can only be used for one
connection.
Thilo Bangert [Tue, 4 Oct 2016 23:33:50 +0000 (01:33 +0200)]
Read the last timestamp from the RRD file, instead of setting it to zero. (#744)
This fixes a dataloss problem when trying to insert old data. rrdcached
will accept the data at first, but then fail the whole batch when trying
to flush the data to disk.
The last timestamp is initially set to 0, and later updated to the
timestamp of the first update. However, if the RRD file already contains
newer data, fails later on. And since RRD cache writes all updates
in one go, the whole batch of samples fails and is discarded (even though,
just the first few updates are too old).
By reading in the last timestamp from the RRD file, we already fail
correctly on the individual update commands (ie. on the protocol layer).