From: Alan Jenkins Date: Fri, 7 Jul 2017 09:48:27 +0000 (+0100) Subject: Fix madvise (and fadvise) (#802) X-Git-Tag: v1.7.1~122 X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=da65effdeee895f3bbb7b41e5f8af77c7ab47afc;p=thirdparty%2Frrdtool-1.x.git Fix madvise (and fadvise) (#802) * fadvise and madvise code in rrd_open can be kept together There was no reason to break up fadvise and madvise. Putting them together makes it _slightly_ clearer that the exising #ifdefs are intended. (Documentation on the effect of fadvise() on mmap() is hard to find, so it was simpler to avoid doing that). Also, setvbuf() clearly doesn't apply to fd's. Buffering is a function of stdio e.g. fread(), fwrite() etc., but rrd_open() does not (currently?) use these, it uses fds directly. Hence, we have no buffering already :). There must be some reason people use fread() but disable buffering, but I can't think why rrd_open() might want to do that in future. Remove the commented code which uses setvbuf(). * rrd_open(... RRD_COPY) does not want FADV_RANDOM fix the !HAVE_MMAP case, to match the USE_MADVISE code. # Conflicts: # src/rrd_open.c * Passing RRD_READAHEAD to rrd_open() had no effect on current linux, fix it MAP_NONBLOCK (since Linux 2.5.46) Only meaningful in conjunction with MAP_POPULATE. Don't perform read-ahead: create page tables entries only for pages that are already present in RAM. Since Linux 2.6.23, this flag causes MAP_POPULATE to do nothing. One day, the combination of MAP_POPULATE and MAP_NONBLOCK may be reimplemented. RRD_READAHEAD has been "broken" for a looong time. But I don't think it's a good idea to carry code for RRD_READAHEAD which does nothing, at the same time as we disable the default readahead by passing MADV_RANDOM. This solution would not necessarily be API-quality, as changing the kernel access hint from RANDOM to SEQUENTIAL means it's allowed to do drop-behind. I am crudely assuming the in-tree users wouldn't have a problem with this. Test: for i in $(seq 50); do dd if=dns.rrd of=x/$i.rrd bs=1M oflag=direct 2>/dev/null done /usr/bin/time sh -c 'for i in x/*.rrd; do rrdtool dump $i >/dev/null done' Before: 6.52user 0.74system 0:09.46elapsed 76%CPU (0avgtext+0avgdata 9796maxresident)k 204800inputs+0outputs (25600major+25112minor)pagefaults 0swaps After: 5.82user 0.27system 0:06.57elapsed 92%CPU (0avgtext+0avgdata 9740maxresident)k 204800inputs+0outputs (600major+27020minor)pagefaults 0swaps I do not add madvise() WILLNEED before or after SEQUENTIAL. Counter-intuitively, the test shows this would decrease performance. (WILLNEED was implemented simply as triggering kernel readahead. I think the implementation fails to use the increased readahead window. Compare the number of major page faults). WILLNEED after SEQUENTIAL: 6.02user 0.30system 0:06.96elapsed 90%CPU (0avgtext+0avgdata 9528maxresident)k 204800inputs+0outputs (900major+26648minor)pagefaults 0swaps WILLNEED before SEQUENTIAL: 6.35user 0.27system 0:07.28elapsed 91%CPU (0avgtext+0avgdata 9548maxresident)k 204800inputs+0outputs (900major+26864minor)pagefaults 0swaps * rrd_open() does not want to enable drop-behind of memory-mapped header MADV_SEQUENTIAL mentions drop-behind. Override it for the header now we've read it, in case anyone implemented drop-behind. Linux does not currently implement drop-behind. My performance test confirmed this did not help, but it didn't hurt either. Do *not* fall back to fadvise() for !HAVE_MMAP. We've already copied the header in that case. Doing e.g. FADV_NORMAL on Linux (4.12) on *any* region would negate the effect of previous FADV_SEQUENTIAL. --- diff --git a/src/rrd_open.c b/src/rrd_open.c index 02d3fcb4..fef5482e 100644 --- a/src/rrd_open.c +++ b/src/rrd_open.c @@ -306,23 +306,6 @@ rrd_file_t *rrd_open( } } no_lseek_necessary: -#if !defined(HAVE_MMAP) && defined(HAVE_POSIX_FADVISE) - /* In general we need no read-ahead when dealing with rrd_files. - When we stop reading, it is highly unlikely that we start up again. - In this manner we actually save time and disk access (and buffer cache). - Thanks to Dave Plonka for the Idea of using POSIX_FADV_RANDOM here. */ - posix_fadvise(rrd_simple_file->fd, 0, 0, POSIX_FADV_RANDOM); -#endif - -/* - if (rdwr & RRD_READWRITE) - { - if (setvbuf((rrd_simple_file->fd),NULL,_IONBF,2)) { - rrd_set_error("failed to disable the stream buffer\n"); - return (-1); - } - } -*/ #ifdef HAVE_MMAP #ifndef HAVE_POSIX_FALLOCATE @@ -376,15 +359,32 @@ rrd_file_t *rrd_open( #endif if (rdwr & RRD_CREAT) goto out_done; + + if (rdwr & RRD_READAHEAD) { + /* If perfect READAHEAD is not achieved for whatever reason, caller + will not thank us for advising the kernel of RANDOM access below.*/ + rdwr |= RRD_COPY; + } + /* In general we need no read-ahead when dealing with rrd_files. + When we stop reading, it is highly unlikely that we start up again. + In this manner we actually save time and disk access (and buffer cache). + Thanks to Dave Plonka for the Idea of using POSIX_FADV_RANDOM here. */ #ifdef USE_MADVISE if (rdwr & RRD_COPY) { /* We will read everything in a moment (copying) */ - madvise(data, rrd_file->file_len, MADV_SEQUENTIAL ); + madvise(data, rrd_file->file_len, MADV_SEQUENTIAL); } else { /* We do not need to read anything in for the moment */ madvise(data, rrd_file->file_len, MADV_RANDOM); } #endif +#if !defined(HAVE_MMAP) && defined(HAVE_POSIX_FADVISE) + if (rdwr & RRD_COPY) { + posix_fadvise(rrd_simple_file->fd, 0, 0, POSIX_FADV_SEQUENTIAL); + } else { + posix_fadvise(rrd_simple_file->fd, 0, 0, POSIX_FADV_RANDOM); + } +#endif #ifdef HAVE_LIBRADOS read_check: @@ -443,6 +443,21 @@ read_check: rrd_file->header_len = offset; rrd_file->pos = offset; +#if defined(HAVE_MMAP) && defined(USE_MADVISE) + if (data != MAP_FAILED) { + /* MADV_SEQUENTIAL mentions drop-behind. Override it for the header + * now we've read it, in case anyone implemented drop-behind. + * + * Do *not* fall back to fadvise() for !HAVE_MMAP. In that case, + * we've copied the header and will not read it again. Doing e.g. + * FADV_NORMAL on Linux (4.12) on *any* region would negate the + * effect of previous FADV_SEQUENTIAL. + */ + madvise(data, sysconf(_SC_PAGESIZE), MADV_NORMAL); + madvise(data, sysconf(_SC_PAGESIZE), MADV_WILLNEED); + } +#endif + { unsigned long row_cnt = 0;