From: Dave Chinner <dchinner@redhat.com>
Date: Tue, 8 Jul 2014 00:36:39 +0000 (+1000)
Subject: libxfs: reused invalidated buffers leak state and data
X-Git-Tag: v3.2.1~6
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=6af7c1eacfc3bf4fb4b782f9ab926cc8263886d7;p=thirdparty%2Fxfsprogs-dev.git

libxfs: reused invalidated buffers leak state and data

When rebuilding a bad directory, repair first truncates away all the
blocks in the directory, good or bad. This removes blocks from the
bmap btree, and when those blocks are freed the bmap btree code
invalidates them. This marks the buffers LIBXFS_B_STALE so that we
don't try to write stale data from that buffer at a later time.

However, when rebuilding the directory, blocks may get reallocated
and we reuse the underlying buffers. This has two problems.

The first is that if the buffer was previously detected as having a
verifier error (i.e. an error that is leading to the block being
freed and the buffer being invalidated) then the error might still
be held in b_error. Hence the libxfs code needs to ensure that
b_error does not leak from one buffer usage context to another
after invalidation.

The second problem is that when new data is written into a buffer,
it no longer has stale contents. Hence when we write the buffer, we
need to clear the LIBXFS_B_STALE flag to ensure that the new data
gets written.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---

diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index 981f2ba52..9743d04a2 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -19,6 +19,37 @@
 #include <xfs/libxfs.h>
 #include "init.h"
 
+/*
+ * Important design/architecture note:
+ *
+ * The userspace code that uses the buffer cache is much less constrained than
+ * the kernel code. The userspace code is pretty nasty in places, especially
+ * when it comes to buffer error handling.  Very little of the userspace code
+ * outside libxfs clears bp->b_error - very little code even checks it - so the
+ * libxfs code is tripping on stale errors left by the userspace code.
+ *
+ * We can't clear errors or zero buffer contents in libxfs_getbuf-* like we do
+ * in the kernel, because those functions are used by the libxfs_readbuf_*
+ * functions and hence need to leave the buffers unchanged on cache hits. This
+ * is actually the only way to gather a write error from a libxfs_writebuf()
+ * call - you need to get the buffer again so you can check bp->b_error field -
+ * assuming that the buffer is still in the cache when you check, that is.
+ *
+ * This is very different to the kernel code which does not release buffers on a
+ * write so we can wait on IO and check errors. The kernel buffer cache also
+ * guarantees a buffer of a known initial state from xfs_buf_get() even on a
+ * cache hit.
+ *
+ * IOWs, userspace is behaving quite differently to the kernel and as a result
+ * it leaks errors from reads, invalidations and writes through
+ * libxfs_getbuf/libxfs_readbuf.
+ *
+ * The result of this is that until the userspace code outside libxfs is cleaned
+ * up, functions that release buffers from userspace control (i.e
+ * libxfs_writebuf/libxfs_putbuf) need to zero bp->b_error to prevent
+ * propagation of stale errors into future buffer operations.
+ */
+
 #define BDSTRAT_SIZE	(256 * 1024)
 
 #define IO_BCOMPARE_CHECK
@@ -632,6 +663,12 @@ libxfs_putbuf(xfs_buf_t *bp)
 			pthread_mutex_unlock(&bp->b_lock);
 		}
 	}
+	/*
+	 * ensure that any errors on this use of the buffer don't carry
+	 * over to the next user.
+	 */
+	bp->b_error = 0;
+
 	cache_node_put(libxfs_bcache, (struct cache_node *)bp);
 }
 
@@ -928,6 +965,7 @@ libxfs_writebuf_int(xfs_buf_t *bp, int flags)
 	 * subsequent reads after this write from seeing stale errors.
 	 */
 	bp->b_error = 0;
+	bp->b_flags &= ~LIBXFS_B_STALE;
 	bp->b_flags |= (LIBXFS_B_DIRTY | flags);
 	return 0;
 }
@@ -946,6 +984,7 @@ libxfs_writebuf(xfs_buf_t *bp, int flags)
 	 * subsequent reads after this write from seeing stale errors.
 	 */
 	bp->b_error = 0;
+	bp->b_flags &= ~LIBXFS_B_STALE;
 	bp->b_flags |= (LIBXFS_B_DIRTY | flags);
 	libxfs_putbuf(bp);
 	return 0;