--- /dev/null
+From stable-bounces@linux.kernel.org Thu Mar 20 16:13:46 2008
+Date: Thu, 20 Mar 2008 02:45:07 GMT
+Message-Id: <200803200245.m2K2j7jY024719@hera.kernel.org>
+From: jejb@kernel.org
+To: jejb@kernel.org, stable@kernel.org
+Subject: aio: bad AIO race in aio_complete() leads to process hang
+
+From: Quentin Barnes <qbarnes+linux@yahoo-inc.com>
+commit: 6cb2a21049b8990df4576c5fce4d48d0206c22d5
+
+My group ran into an AIO process hang on a 2.6.24 kernel, with the process
+sleeping indefinitely in io_getevents(2) waiting for a last wakeup that
+never came.
+
+We ran the tests on x86_64 SMP. The hang only occurred on a Xeon box
+("Clovertown") but not a Core2Duo ("Conroe"). On the Xeon, the L2 cache
+isn't shared between all eight processors, but the L2 is shared between
+the two processors on the Core2Duo we use.
+
+My analysis of the hang: if you go down to the second while-loop
+in read_events(), this is what happens on processor #1:
+ 1) add_wait_queue_exclusive() adds thread to ctx->wait
+ 2) aio_read_evt() to check tail
+ 3) if aio_read_evt() returned 0, call [io_]schedule() and sleep
+
+In aio_complete() with processor #2:
+ A) info->tail = tail;
+ B) waitqueue_active(&ctx->wait)
+ C) if waitqueue_active() returned non-0, call wake_up()
+
+For the code to work correctly, step 1 must be seen by all other processors
+before processor 1 checks for pending events in step 2 (events recorded by
+step A), and step A on processor 2 must be seen by all other processors
+(checked in step 2) before step B is done.
+
+The race I believed I was seeing is that steps 1 and 2 were
+effectively swapped due to the __list_add() being delayed in an L2
+cache not shared with some of the other processors. Imagine:
+proc 2: just before step A
+proc 1, step 1: adds to ctx->wait, but is not visible by other processors yet
+proc 1, step 2: checks tail and sees no pending events
+proc 2, step A: updates tail
+proc 1, step 3: calls [io_]schedule() and sleeps
+proc 2, step B: checks ctx->wait, but sees no one waiting, skips wakeup
+ so proc 1 sleeps indefinitely
+
+My patch adds a memory barrier between steps A and B. It ensures that the
+update in step 1 gets seen on processor 2 before continuing. If processor 1
+was just before step 1, the memory barrier makes sure that step A (update
+tail) gets seen by the time processor 1 makes it to step 2 (check tail).
+
+Before the patch our AIO process would hang virtually 100% of the time. After
+the patch, we have yet to see the process ever hang.
+
+Signed-off-by: Quentin Barnes <qbarnes+linux@yahoo-inc.com>
+Reviewed-by: Zach Brown <zach.brown@oracle.com>
+Cc: Benjamin LaHaise <bcrl@kvack.org>
+Cc: <stable@kernel.org>
+Cc: Nick Piggin <nickpiggin@yahoo.com.au>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+[ We should probably disallow that "if (waitqueue_active()) wake_up()"
+ coding pattern, because it's so often buggy wrt memory ordering ]
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Chris Wright <chrisw@sous-sol.org>
+---
+ fs/aio.c | 8 ++++++++
+ 1 file changed, 8 insertions(+)
+
+--- a/fs/aio.c
++++ b/fs/aio.c
+@@ -997,6 +997,14 @@ put_rq:
+ /* everything turned out well, dispose of the aiocb. */
+ ret = __aio_put_req(ctx, iocb);
+
++ /*
++ * We have to order our ring_info tail store above and test
++ * of the wait list below outside the wait lock. This is
++ * like in wake_up_bit() where clearing a bit has to be
++ * ordered with the unlocked test.
++ */
++ smp_mb();
++
+ if (waitqueue_active(&ctx->wait))
+ wake_up(&ctx->wait);
+
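The lost-wakeup pattern and its fix can be sketched in user space. The sketch below is illustrative only: "tail", "waiting" and "wakeups" are hypothetical stand-ins for info->tail, the wait-queue membership and wake_up(), the C11 seq_cst fences stand in for smp_mb(), and the reader spins instead of sleeping so the example stays self-contained. With both fences in place, the reader missing the event in step 2 implies the completer must observe it waiting in step B, so the handoff cannot be lost.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int tail, waiting, wakeups;

static void *completer(void *arg)
{
	/* step A: publish the new event */
	atomic_store_explicit(&tail, 1, memory_order_relaxed);
	/* the fix: a full barrier between the tail store and the
	   unlocked waiter test, like the smp_mb() added by the patch */
	atomic_thread_fence(memory_order_seq_cst);
	/* steps B/C: wake only if someone appears to be waiting */
	if (atomic_load_explicit(&waiting, memory_order_relaxed))
		atomic_fetch_add(&wakeups, 1);
	return (void *)0;
}

static void *reader(void *arg)
{
	/* step 1: announce ourselves on the "wait queue" */
	atomic_store_explicit(&waiting, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	/* steps 2/3: spin until the event or a wakeup is visible;
	   a real waiter would sleep here instead of spinning */
	while (!atomic_load_explicit(&tail, memory_order_relaxed) &&
	       !atomic_load_explicit(&wakeups, memory_order_relaxed))
		;
	return (void *)0;
}

/* run one handoff; returns the tail value the threads agreed on */
static int run_handoff(void)
{
	pthread_t r, c;

	atomic_store(&tail, 0);
	atomic_store(&waiting, 0);
	atomic_store(&wakeups, 0);
	pthread_create(&r, 0, reader, 0);
	pthread_create(&c, 0, completer, 0);
	pthread_join(r, 0);
	pthread_join(c, 0);
	return atomic_load(&tail);
}
```

Dropping either fence reintroduces the window described above, where the reader checks tail before its store to "waiting" is visible and the completer then skips the wakeup.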
--- /dev/null
+From stable-bounces@linux.kernel.org Thu Mar 20 16:11:35 2008
+Date: Wed, 19 Mar 2008 04:40:04 GMT
+Message-Id: <200803190440.m2J4e4Bk023448@hera.kernel.org>
+From: jejb@kernel.org
+To: jejb@kernel.org, stable@kernel.org
+Subject: async_tx: avoid the async xor_zero_sum path when src_cnt > device->max_xor
+
+From: Dan Williams <dan.j.williams@intel.com>
+commit: 8d8002f642886ae256a3c5d70fe8aff4faf3631a
+
+If the channel cannot perform the operation in one call to
+->device_prep_dma_zero_sum, then fallback to the xor+page_is_zero path.
+This only affects users with arrays larger than 16 devices on iop13xx or
+32 devices on iop3xx.
+
+Cc: <stable@kernel.org>
+Cc: Neil Brown <neilb@suse.de>
+Signed-off-by: Dan Williams <dan.j.williams@intel.com>
+[chrisw@sous-sol.org: backport to 2.6.24.3]
+Signed-off-by: Chris Wright <chrisw@sous-sol.org>
+---
+please verify the backport makes sense
+
+ crypto/async_tx/async_xor.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/crypto/async_tx/async_xor.c
++++ b/crypto/async_tx/async_xor.c
+@@ -264,7 +264,7 @@ async_xor_zero_sum(struct page *dest, st
+
+ BUG_ON(src_cnt <= 1);
+
+- if (tx) {
++ if (tx && src_cnt <= device->max_xor) {
+ dma_addr_t dma_addr;
+ enum dma_data_direction dir;
+
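The one-line fix amounts to tightening the dispatch predicate. The helper below is a hypothetical sketch of that decision (the name and signature are not from the kernel): offload to the DMA engine's zero-sum operation only when a channel exists and it can take all sources in a single call, otherwise fall back to the xor + page_is_zero software path.

```c
#include <stdbool.h>

/* hypothetical sketch of the fixed dispatch test; src_cnt is the
 * number of source pages, max_xor the channel's per-call limit
 * (16 on iop13xx, 32 on iop3xx per the changelog) */
static bool use_dma_zero_sum(int src_cnt, int max_xor, bool have_chan)
{
	return have_chan && src_cnt <= max_xor;
}
```

Before the fix the src_cnt <= max_xor half of the test was missing, so an array of 17+ devices on iop13xx would be handed to a channel that could not complete it in one call.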
--- /dev/null
+From stable-bounces@linux.kernel.org Thu Mar 20 16:13:23 2008
+Date: Thu, 20 Mar 2008 02:45:06 GMT
+Message-Id: <200803200245.m2K2j6jD024675@hera.kernel.org>
+From: jejb@kernel.org
+To: jejb@kernel.org, stable@kernel.org
+Subject: jbd: correctly unescape journal data blocks
+
+From: Duane Griffin <duaneg@dghda.com>
+commit: 439aeec639d7c57f3561054a6d315c40fd24bb74
+
+Fix a long-standing typo (predating git) that will cause data corruption if a
+journal data block needs unescaping. At the moment the wrong buffer head's
+data is being unescaped.
+
+To test this case mount a filesystem with data=journal, start creating and
+deleting a bunch of files containing only JFS_MAGIC_NUMBER (0xc03b3998), then
+pull the plug on the device. Without this patch the files will contain zeros
+instead of the correct data after recovery.
+
+Signed-off-by: Duane Griffin <duaneg@dghda.com>
+Acked-by: Jan Kara <jack@suse.cz>
+Cc: <linux-ext4@vger.kernel.org>
+Cc: <stable@kernel.org>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Chris Wright <chrisw@sous-sol.org>
+---
+ fs/jbd/recovery.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/fs/jbd/recovery.c
++++ b/fs/jbd/recovery.c
+@@ -478,7 +478,7 @@ static int do_one_pass(journal_t *journa
+ memcpy(nbh->b_data, obh->b_data,
+ journal->j_blocksize);
+ if (flags & JFS_FLAG_ESCAPE) {
+- *((__be32 *)bh->b_data) =
++ *((__be32 *)nbh->b_data) =
+ cpu_to_be32(JFS_MAGIC_NUMBER);
+ }
+
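The escape mechanism behind this fix (which applies identically to the jbd2 patch below) can be modeled outside the kernel. The sketch below is a simplified stand-in, not the kernel code: recover_block() and put_be32() are hypothetical helpers playing the roles of do_one_pass()'s copy step and cpu_to_be32(). When a data block's first word equals the journal magic, the journal zeroes it on write ("escapes" it) and must restore the magic during recovery -- into nbh, the home-location buffer that gets written back, not the journalled copy obh as the buggy line did.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

#define JFS_MAGIC_NUMBER 0xc03b3998u

/* store a 32-bit value big-endian, standing in for cpu_to_be32() */
static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = v >> 24;
	p[1] = v >> 16;
	p[2] = v >> 8;
	p[3] = v;
}

/* hypothetical model of the recovery step: copy the journalled
 * block (obh) to the home-location buffer (nbh) and, if it was
 * escaped when journalled, restore the magic into nbh -- the
 * buffer that is actually written back, which is the fix */
static void recover_block(uint8_t *nbh, const uint8_t *obh,
			  size_t blocksize, int escaped)
{
	memcpy(nbh, obh, blocksize);
	if (escaped)
		put_be32(nbh, JFS_MAGIC_NUMBER);
}
```

With the typo, the magic was restored into obh after the memcpy, so nbh kept the escaped zeros and the recovered file contained zeros instead of 0xc03b3998, exactly the corruption the changelog describes.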
--- /dev/null
+From stable-bounces@linux.kernel.org Thu Mar 20 16:12:49 2008
+Date: Thu, 20 Mar 2008 02:45:05 GMT
+Message-Id: <200803200245.m2K2j5Lw024656@hera.kernel.org>
+From: jejb@kernel.org
+To: jejb@kernel.org, stable@kernel.org
+Subject: jbd2: correctly unescape journal data blocks
+
+From: Duane Griffin <duaneg@dghda.com>
+commit: d00256766a0b4f1441931a7f569a13edf6c68200
+
+Fix a long-standing typo (predating git) that will cause data corruption if a
+journal data block needs unescaping. At the moment the wrong buffer head's
+data is being unescaped.
+
+To test this case mount a filesystem with data=journal, start creating and
+deleting a bunch of files containing only JBD2_MAGIC_NUMBER (0xc03b3998), then
+pull the plug on the device. Without this patch the files will contain zeros
+instead of the correct data after recovery.
+
+Signed-off-by: Duane Griffin <duaneg@dghda.com>
+Acked-by: Jan Kara <jack@suse.cz>
+Cc: <linux-ext4@vger.kernel.org>
+Cc: <stable@kernel.org>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Chris Wright <chrisw@sous-sol.org>
+---
+ fs/jbd2/recovery.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/fs/jbd2/recovery.c
++++ b/fs/jbd2/recovery.c
+@@ -488,7 +488,7 @@ static int do_one_pass(journal_t *journa
+ memcpy(nbh->b_data, obh->b_data,
+ journal->j_blocksize);
+ if (flags & JBD2_FLAG_ESCAPE) {
+- *((__be32 *)bh->b_data) =
++ *((__be32 *)nbh->b_data) =
+ cpu_to_be32(JBD2_MAGIC_NUMBER);
+ }
+
netfilter-xt_time-fix-failure-to-match-on-sundays.patch
netfilter-nfnetlink_queue-fix-computation-of-allocated-size-for-netlink-skb.patch
netfilter-nfnetlink_log-fix-computation-of-netlink-skb-size.patch
+zisofs-fix-readpage-outside-i_size.patch
+jbd2-correctly-unescape-journal-data-blocks.patch
+jbd-correctly-unescape-journal-data-blocks.patch
+aio-bad-aio-race-in-aio_complete-leads-to-process-hang.patch
+async_tx-avoid-the-async-xor_zero_sum-path-when-src_cnt-device-max_xor.patch
--- /dev/null
+From stable-bounces@linux.kernel.org Thu Mar 20 16:12:02 2008
+Date: Thu, 20 Mar 2008 02:45:04 GMT
+Message-Id: <200803200245.m2K2j46b024586@hera.kernel.org>
+From: jejb@kernel.org
+To: jejb@kernel.org, stable@kernel.org
+Subject: zisofs: fix readpage() outside i_size
+
+From: Dave Young <hidave.darkstar@gmail.com>
+commit: 08ca0db8aa2db4ddcf487d46d85dc8ffb22162cc
+
+A read request for a page wholly outside i_size will be handled in
+do_generic_file_read(), so we just return 0 instead of -EIO, as for a
+normal read, and let do_generic_file_read() do the rest.
+
+At the same time we need to unlock the page to avoid the system getting
+stuck.
+
+Fixes http://bugzilla.kernel.org/show_bug.cgi?id=10227
+
+Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
+Acked-by: Jan Kara <jack@suse.cz>
+Reported-by: Christian Perle <chris@linuxinfotag.de>
+Cc: <stable@kernel.org>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Chris Wright <chrisw@sous-sol.org>
+---
+ fs/isofs/compress.c | 11 +++++++++++
+ 1 file changed, 11 insertions(+)
+
+--- a/fs/isofs/compress.c
++++ b/fs/isofs/compress.c
+@@ -72,6 +72,17 @@ static int zisofs_readpage(struct file *
+ offset = index & ~zisofs_block_page_mask;
+ blockindex = offset >> zisofs_block_page_shift;
+ maxpage = (inode->i_size + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
++
++ /*
++ * If this page is wholly outside i_size we just return zero;
++ * do_generic_file_read() will handle this for us
++ */
++ if (page->index >= maxpage) {
++ SetPageUptodate(page);
++ unlock_page(page);
++ return 0;
++ }
++
+ maxpage = min(zisofs_block_pages, maxpage-offset);
+
+ for ( i = 0 ; i < maxpage ; i++, offset++ ) {
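The added bounds check is a small arithmetic predicate, sketched below under the assumption of a generic page shift (PAGE_CACHE_SHIFT is typically 12 on these kernels); the helper name is illustrative, not from the kernel. maxpage is the number of page-cache pages needed to cover i_size, so any page index at or beyond it lies wholly outside the file and can simply be zero-filled and marked up to date.

```c
#include <stdint.h>

/* hypothetical sketch of the out-of-range test added by the patch:
 * round i_size up to whole pages, then compare page indices */
static int page_outside_isize(unsigned long index, uint64_t i_size,
			      unsigned page_shift)
{
	uint64_t page_size = (uint64_t)1 << page_shift;
	uint64_t maxpage = (i_size + page_size - 1) >> page_shift;

	return index >= maxpage;
}
```

For a 5000-byte file with 4096-byte pages, maxpage is 2: pages 0 and 1 hold data, and a request for page 2 or beyond takes the new early-return path instead of hitting -EIO.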