--- /dev/null
+From stable-bounces@linux.kernel.org Thu Mar 20 16:13:46 2008
+Date: Thu, 20 Mar 2008 02:45:07 GMT
+Message-Id: <200803200245.m2K2j7jY024719@hera.kernel.org>
+From: jejb@kernel.org
+To: jejb@kernel.org, stable@kernel.org
+Subject: aio: bad AIO race in aio_complete() leads to process hang
+
+From: Quentin Barnes <qbarnes+linux@yahoo-inc.com>
+commit: 6cb2a21049b8990df4576c5fce4d48d0206c22d5
+
+My group ran into an AIO process hang on a 2.6.24 kernel, with the process
+sleeping indefinitely in io_getevents(2) waiting for a last wakeup that
+never came.
+
+We ran the tests on x86_64 SMP. The hang only occurred on a Xeon box
+("Clovertown") but not a Core2Duo ("Conroe"). On the Xeon, the L2 cache
+isn't shared between all eight processors, but the L2 is shared between
+the two processors on the Core2Duo we use.
+
+My analysis of the hang: if you go down to the second while-loop
+in read_events(), this is what happens on processor #1:
+ 1) add_wait_queue_exclusive() adds thread to ctx->wait
+ 2) aio_read_evt() to check tail
+ 3) if aio_read_evt() returned 0, call [io_]schedule() and sleep
+
+In aio_complete() with processor #2:
+ A) info->tail = tail;
+ B) waitqueue_active(&ctx->wait)
+ C) if waitqueue_active() returned non-0, call wake_up()
+
+For the code to work correctly, step 1 must be seen by all other processors
+before processor 1 checks for pending events in step 2 (events recorded by
+step A), and step A on processor 2 must be seen by all other processors
+(checked in step 2) before step B is done.
+
+The race I believed I was seeing is that steps 1 and 2 were
+effectively swapped due to the __list_add() being delayed in an L2
+cache not shared with some of the other processors. Imagine:
+proc 2: just before step A
+proc 1, step 1: adds to ctx->wait, but is not visible by other processors yet
+proc 1, step 2: checks tail and sees no pending events
+proc 2, step A: updates tail
+proc 1, step 3: calls [io_]schedule() and sleeps
+proc 2, step B: checks ctx->wait, but sees no one waiting, skips wakeup
+ so proc 1 sleeps indefinitely
+
+My patch adds a memory barrier between steps A and B. It ensures that the
+update in step 1 gets seen on processor 2 before continuing. If processor 1
+was just before step 1, the memory barrier makes sure that step A (update
+tail) gets seen by the time processor 1 makes it to step 2 (check tail).
+
+Before the patch our AIO process would hang virtually 100% of the time. After
+the patch, we have yet to see the process ever hang.
+
+Signed-off-by: Quentin Barnes <qbarnes+linux@yahoo-inc.com>
+Reviewed-by: Zach Brown <zach.brown@oracle.com>
+Cc: Benjamin LaHaise <bcrl@kvack.org>
+Cc: <stable@kernel.org>
+Cc: Nick Piggin <nickpiggin@yahoo.com.au>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+[ We should probably disallow that "if (waitqueue_active()) wake_up()"
+ coding pattern, because it's so often buggy wrt memory ordering ]
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Chris Wright <chrisw@sous-sol.org>
+---
+ fs/aio.c | 8 ++++++++
+ 1 file changed, 8 insertions(+)
+
+--- a/fs/aio.c
++++ b/fs/aio.c
+@@ -997,6 +997,14 @@ put_rq:
+ /* everything turned out well, dispose of the aiocb. */
+ ret = __aio_put_req(ctx, iocb);
+
++ /*
++ * We have to order our ring_info tail store above and test
++ * of the wait list below outside the wait lock. This is
++ * like in wake_up_bit() where clearing a bit has to be
++ * ordered with the unlocked test.
++ */
++ smp_mb();
++
+ if (waitqueue_active(&ctx->wait))
+ wake_up(&ctx->wait);
+
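The lost-wakeup pattern and its fix can be sketched in user space. The sketch below is illustrative only: "tail", "waiting" and "wakeups" are hypothetical stand-ins for info->tail, the wait-queue membership and wake_up(), the C11 seq_cst fences stand in for smp_mb(), and the reader spins instead of sleeping so the example stays self-contained. With both fences in place, the reader missing the event in step 2 implies the completer must observe it waiting in step B, so the handoff cannot be lost.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int tail, waiting, wakeups;

static void *completer(void *arg)
{
	/* step A: publish the new event */
	atomic_store_explicit(&tail, 1, memory_order_relaxed);
	/* the fix: a full barrier between the tail store and the
	   unlocked waiter test, like the smp_mb() added by the patch */
	atomic_thread_fence(memory_order_seq_cst);
	/* steps B/C: wake only if someone appears to be waiting */
	if (atomic_load_explicit(&waiting, memory_order_relaxed))
		atomic_fetch_add(&wakeups, 1);
	return (void *)0;
}

static void *reader(void *arg)
{
	/* step 1: announce ourselves on the "wait queue" */
	atomic_store_explicit(&waiting, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	/* steps 2/3: spin until the event or a wakeup is visible;
	   a real waiter would sleep here instead of spinning */
	while (!atomic_load_explicit(&tail, memory_order_relaxed) &&
	       !atomic_load_explicit(&wakeups, memory_order_relaxed))
		;
	return (void *)0;
}

/* run one handoff; returns the tail value the threads agreed on */
static int run_handoff(void)
{
	pthread_t r, c;

	atomic_store(&tail, 0);
	atomic_store(&waiting, 0);
	atomic_store(&wakeups, 0);
	pthread_create(&r, 0, reader, 0);
	pthread_create(&c, 0, completer, 0);
	pthread_join(r, 0);
	pthread_join(c, 0);
	return atomic_load(&tail);
}
```

Dropping either fence reintroduces the window described above, where the reader checks tail before its store to "waiting" is visible and the completer then skips the wakeup.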
--- /dev/null
+From stable-bounces@linux.kernel.org Thu Mar 20 16:11:35 2008
+Date: Wed, 19 Mar 2008 04:40:04 GMT
+Message-Id: <200803190440.m2J4e4Bk023448@hera.kernel.org>
+From: jejb@kernel.org
+To: jejb@kernel.org, stable@kernel.org
+Subject: async_tx: avoid the async xor_zero_sum path when src_cnt > device->max_xor
+
+From: Dan Williams <dan.j.williams@intel.com>
+commit: 8d8002f642886ae256a3c5d70fe8aff4faf3631a
+
+If the channel cannot perform the operation in one call to
+->device_prep_dma_zero_sum, then fallback to the xor+page_is_zero path.
+This only affects users with arrays larger than 16 devices on iop13xx or
+32 devices on iop3xx.
+
+Cc: <stable@kernel.org>
+Cc: Neil Brown <neilb@suse.de>
+Signed-off-by: Dan Williams <dan.j.williams@intel.com>
+[chrisw@sous-sol.org: backport to 2.6.24.3]
+Signed-off-by: Chris Wright <chrisw@sous-sol.org>
+---
+please verify the backport makes sense
+
+ crypto/async_tx/async_xor.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/crypto/async_tx/async_xor.c
++++ b/crypto/async_tx/async_xor.c
+@@ -264,7 +264,7 @@ async_xor_zero_sum(struct page *dest, st
+
+ BUG_ON(src_cnt <= 1);
+
+- if (tx) {
++ if (tx && src_cnt <= device->max_xor) {
+ dma_addr_t dma_addr;
+ enum dma_data_direction dir;
+
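The one-line fix amounts to tightening the dispatch predicate. The helper below is a hypothetical sketch of that decision (the name and signature are not from the kernel): offload to the DMA engine's zero-sum operation only when a channel exists and it can take all sources in a single call, otherwise fall back to the xor + page_is_zero software path.

```c
#include <stdbool.h>

/* hypothetical sketch of the fixed dispatch test; src_cnt is the
 * number of source pages, max_xor the channel's per-call limit
 * (16 on iop13xx, 32 on iop3xx per the changelog) */
static bool use_dma_zero_sum(int src_cnt, int max_xor, bool have_chan)
{
	return have_chan && src_cnt <= max_xor;
}
```

Before the fix the src_cnt <= max_xor half of the test was missing, so an array of 17+ devices on iop13xx would be handed to a channel that could not complete it in one call.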
--- /dev/null
+From stable-bounces@linux.kernel.org Thu Mar 20 16:13:23 2008
+Date: Thu, 20 Mar 2008 02:45:06 GMT
+Message-Id: <200803200245.m2K2j6jD024675@hera.kernel.org>
+From: jejb@kernel.org
+To: jejb@kernel.org, stable@kernel.org
+Subject: jbd: correctly unescape journal data blocks
+
+From: Duane Griffin <duaneg@dghda.com>
+commit: 439aeec639d7c57f3561054a6d315c40fd24bb74
+
+Fix a long-standing typo (predating git) that will cause data corruption if a
+journal data block needs unescaping. At the moment the wrong buffer head's
+data is being unescaped.
+
+To test this case mount a filesystem with data=journal, start creating and
+deleting a bunch of files containing only JFS_MAGIC_NUMBER (0xc03b3998), then
+pull the plug on the device. Without this patch the files will contain zeros
+instead of the correct data after recovery.
+
+Signed-off-by: Duane Griffin <duaneg@dghda.com>
+Acked-by: Jan Kara <jack@suse.cz>
+Cc: <linux-ext4@vger.kernel.org>
+Cc: <stable@kernel.org>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Chris Wright <chrisw@sous-sol.org>
+---
+ fs/jbd/recovery.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/fs/jbd/recovery.c
++++ b/fs/jbd/recovery.c
+@@ -478,7 +478,7 @@ static int do_one_pass(journal_t *journa
+ memcpy(nbh->b_data, obh->b_data,
+ journal->j_blocksize);
+ if (flags & JFS_FLAG_ESCAPE) {
+- *((__be32 *)bh->b_data) =
++ *((__be32 *)nbh->b_data) =
+ cpu_to_be32(JFS_MAGIC_NUMBER);
+ }
+
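The escape mechanism behind this fix (which applies identically to the jbd2 patch below) can be modeled outside the kernel. The sketch below is a simplified stand-in, not the kernel code: recover_block() and put_be32() are hypothetical helpers playing the roles of do_one_pass()'s copy step and cpu_to_be32(). When a data block's first word equals the journal magic, the journal zeroes it on write ("escapes" it) and must restore the magic during recovery -- into nbh, the home-location buffer that gets written back, not the journalled copy obh as the buggy line did.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

#define JFS_MAGIC_NUMBER 0xc03b3998u

/* store a 32-bit value big-endian, standing in for cpu_to_be32() */
static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = v >> 24;
	p[1] = v >> 16;
	p[2] = v >> 8;
	p[3] = v;
}

/* hypothetical model of the recovery step: copy the journalled
 * block (obh) to the home-location buffer (nbh) and, if it was
 * escaped when journalled, restore the magic into nbh -- the
 * buffer that is actually written back, which is the fix */
static void recover_block(uint8_t *nbh, const uint8_t *obh,
			  size_t blocksize, int escaped)
{
	memcpy(nbh, obh, blocksize);
	if (escaped)
		put_be32(nbh, JFS_MAGIC_NUMBER);
}
```

With the typo, the magic was restored into obh after the memcpy, so nbh kept the escaped zeros and the recovered file contained zeros instead of 0xc03b3998, exactly the corruption the changelog describes.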
--- /dev/null
+From stable-bounces@linux.kernel.org Thu Mar 20 16:12:49 2008
+Date: Thu, 20 Mar 2008 02:45:05 GMT
+Message-Id: <200803200245.m2K2j5Lw024656@hera.kernel.org>
+From: jejb@kernel.org
+To: jejb@kernel.org, stable@kernel.org
+Subject: jbd2: correctly unescape journal data blocks
+
+From: Duane Griffin <duaneg@dghda.com>
+commit: d00256766a0b4f1441931a7f569a13edf6c68200
+
+Fix a long-standing typo (predating git) that will cause data corruption if a
+journal data block needs unescaping. At the moment the wrong buffer head's
+data is being unescaped.
+
+To test this case mount a filesystem with data=journal, start creating and
+deleting a bunch of files containing only JBD2_MAGIC_NUMBER (0xc03b3998), then
+pull the plug on the device. Without this patch the files will contain zeros
+instead of the correct data after recovery.
+
+Signed-off-by: Duane Griffin <duaneg@dghda.com>
+Acked-by: Jan Kara <jack@suse.cz>
+Cc: <linux-ext4@vger.kernel.org>
+Cc: <stable@kernel.org>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Chris Wright <chrisw@sous-sol.org>
+---
+ fs/jbd2/recovery.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/fs/jbd2/recovery.c
++++ b/fs/jbd2/recovery.c
+@@ -488,7 +488,7 @@ static int do_one_pass(journal_t *journa
+ memcpy(nbh->b_data, obh->b_data,
+ journal->j_blocksize);
+ if (flags & JBD2_FLAG_ESCAPE) {
+- *((__be32 *)bh->b_data) =
++ *((__be32 *)nbh->b_data) =
+ cpu_to_be32(JBD2_MAGIC_NUMBER);
+ }
+
netfilter-xt_time-fix-failure-to-match-on-sundays.patch
netfilter-nfnetlink_queue-fix-computation-of-allocated-size-for-netlink-skb.patch
netfilter-nfnetlink_log-fix-computation-of-netlink-skb-size.patch
+zisofs-fix-readpage-outside-i_size.patch
+jbd2-correctly-unescape-journal-data-blocks.patch
+jbd-correctly-unescape-journal-data-blocks.patch
+aio-bad-aio-race-in-aio_complete-leads-to-process-hang.patch
+async_tx-avoid-the-async-xor_zero_sum-path-when-src_cnt-device-max_xor.patch
--- /dev/null
+From stable-bounces@linux.kernel.org Thu Mar 20 16:12:02 2008
+Date: Thu, 20 Mar 2008 02:45:04 GMT
+Message-Id: <200803200245.m2K2j46b024586@hera.kernel.org>
+From: jejb@kernel.org
+To: jejb@kernel.org, stable@kernel.org
+Subject: zisofs: fix readpage() outside i_size
+
+From: Dave Young <hidave.darkstar@gmail.com>
+commit: 08ca0db8aa2db4ddcf487d46d85dc8ffb22162cc
+
+A read request for a page wholly outside i_size will be handled in
+do_generic_file_read(), so we just return 0 instead of -EIO, as for a
+normal read, and let do_generic_file_read() do the rest.
+
+At the same time we need to unlock the page to avoid the system getting
+stuck.
+
+Fixes http://bugzilla.kernel.org/show_bug.cgi?id=10227
+
+Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
+Acked-by: Jan Kara <jack@suse.cz>
+Reported-by: Christian Perle <chris@linuxinfotag.de>
+Cc: <stable@kernel.org>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Chris Wright <chrisw@sous-sol.org>
+---
+ fs/isofs/compress.c | 11 +++++++++++
+ 1 file changed, 11 insertions(+)
+
+--- a/fs/isofs/compress.c
++++ b/fs/isofs/compress.c
+@@ -72,6 +72,17 @@ static int zisofs_readpage(struct file *
+ offset = index & ~zisofs_block_page_mask;
+ blockindex = offset >> zisofs_block_page_shift;
+ maxpage = (inode->i_size + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
++
++ /*
++ * If this page is wholly outside i_size we just return zero;
++ * do_generic_file_read() will handle this for us
++ */
++ if (page->index >= maxpage) {
++ SetPageUptodate(page);
++ unlock_page(page);
++ return 0;
++ }
++
+ maxpage = min(zisofs_block_pages, maxpage-offset);
+
+ for ( i = 0 ; i < maxpage ; i++, offset++ ) {
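The added bounds check is a small arithmetic predicate, sketched below under the assumption of a generic page shift (PAGE_CACHE_SHIFT is typically 12 on these kernels); the helper name is illustrative, not from the kernel. maxpage is the number of page-cache pages needed to cover i_size, so any page index at or beyond it lies wholly outside the file and can simply be zero-filled and marked up to date.

```c
#include <stdint.h>

/* hypothetical sketch of the out-of-range test added by the patch:
 * round i_size up to whole pages, then compare page indices */
static int page_outside_isize(unsigned long index, uint64_t i_size,
			      unsigned page_shift)
{
	uint64_t page_size = (uint64_t)1 << page_shift;
	uint64_t maxpage = (i_size + page_size - 1) >> page_shift;

	return index >= maxpage;
}
```

For a 5000-byte file with 4096-byte pages, maxpage is 2: pages 0 and 1 hold data, and a request for page 2 or beyond takes the new early-return path instead of hitting -EIO.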