From: Greg Kroah-Hartman Date: Mon, 5 Jun 2017 15:08:18 +0000 (+0200) Subject: 4.4-stable patches X-Git-Tag: v3.18.56~6 X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=ae1944c6d022b2722dd8eba6d1fe85f13799765a;p=thirdparty%2Fkernel%2Fstable-queue.git 4.4-stable patches added patches: xfs-bad-assertion-for-delalloc-an-extent-that-start-at-i_size.patch xfs-fix-indlen-accounting-error-on-partial-delalloc-conversion.patch xfs-fix-over-copying-of-getbmap-parameters-from-userspace.patch xfs-fix-unaligned-access-in-xfs_btree_visit_blocks.patch xfs-fix-up-quotacheck-buffer-list-error-handling.patch xfs-handle-array-index-overrun-in-xfs_dir2_leaf_readbuf.patch xfs-prevent-multi-fsb-dir-readahead-from-reading-random-blocks.patch xfs-support-ability-to-wait-on-new-inodes.patch xfs-update-ag-iterator-to-support-wait-on-new-inodes.patch xfs-wait-on-new-inodes-during-quotaoff-dquot-release.patch --- diff --git a/queue-4.4/series b/queue-4.4/series index 3dc8b748e38..40ec9ee83f6 100644 --- a/queue-4.4/series +++ b/queue-4.4/series @@ -39,3 +39,13 @@ mlock-fix-mlock-count-can-not-decrease-in-race-condition.patch pci-pm-add-needs_resume-flag-to-avoid-suspend-complete-optimization.patch xfs-fix-missed-holes-in-seek_hole-implementation.patch xfs-fix-off-by-one-on-max-nr_pages-in-xfs_find_get_desired_pgoff.patch +xfs-fix-over-copying-of-getbmap-parameters-from-userspace.patch +xfs-handle-array-index-overrun-in-xfs_dir2_leaf_readbuf.patch +xfs-prevent-multi-fsb-dir-readahead-from-reading-random-blocks.patch +xfs-fix-up-quotacheck-buffer-list-error-handling.patch +xfs-support-ability-to-wait-on-new-inodes.patch +xfs-update-ag-iterator-to-support-wait-on-new-inodes.patch +xfs-wait-on-new-inodes-during-quotaoff-dquot-release.patch +xfs-fix-indlen-accounting-error-on-partial-delalloc-conversion.patch +xfs-bad-assertion-for-delalloc-an-extent-that-start-at-i_size.patch +xfs-fix-unaligned-access-in-xfs_btree_visit_blocks.patch diff --git 
a/queue-4.4/xfs-bad-assertion-for-delalloc-an-extent-that-start-at-i_size.patch b/queue-4.4/xfs-bad-assertion-for-delalloc-an-extent-that-start-at-i_size.patch new file mode 100644 index 00000000000..d65afee243d --- /dev/null +++ b/queue-4.4/xfs-bad-assertion-for-delalloc-an-extent-that-start-at-i_size.patch @@ -0,0 +1,46 @@ +From 892d2a5f705723b2cb488bfb38bcbdcf83273184 Mon Sep 17 00:00:00 2001 +From: Zorro Lang +Date: Mon, 15 May 2017 08:40:02 -0700 +Subject: xfs: bad assertion for delalloc an extent that start at i_size + +From: Zorro Lang + +commit 892d2a5f705723b2cb488bfb38bcbdcf83273184 upstream. + +By running fsstress for long enough in RHEL-7, I found an +assertion failure (harder to reproduce on linux-4.11, but the problem +is still there): + + XFS: Assertion failed: (iflags & BMV_IF_DELALLOC) != 0, file: fs/xfs/xfs_bmap_util.c + +The assertion is in the xfs_getbmap() function: + + if (map[i].br_startblock == DELAYSTARTBLOCK && +--> map[i].br_startoff <= XFS_B_TO_FSB(mp, XFS_ISIZE(ip))) + ASSERT((iflags & BMV_IF_DELALLOC) != 0); + +When map[i].br_startoff == XFS_B_TO_FSB(mp, XFS_ISIZE(ip)), the +startoff is just at EOF. But we only need to make sure delalloc +extents are within EOF, not including EOF. + +Signed-off-by: Zorro Lang +Reviewed-by: Brian Foster +Reviewed-by: Darrick J. Wong +Signed-off-by: Darrick J. Wong +Signed-off-by: Greg Kroah-Hartman + +--- + fs/xfs/xfs_bmap_util.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +--- a/fs/xfs/xfs_bmap_util.c ++++ b/fs/xfs/xfs_bmap_util.c +@@ -682,7 +682,7 @@ xfs_getbmap( + * extents. 
+ */ + if (map[i].br_startblock == DELAYSTARTBLOCK && +- map[i].br_startoff <= XFS_B_TO_FSB(mp, XFS_ISIZE(ip))) ++ map[i].br_startoff < XFS_B_TO_FSB(mp, XFS_ISIZE(ip))) + ASSERT((iflags & BMV_IF_DELALLOC) != 0); + + if (map[i].br_startblock == HOLESTARTBLOCK && diff --git a/queue-4.4/xfs-fix-indlen-accounting-error-on-partial-delalloc-conversion.patch b/queue-4.4/xfs-fix-indlen-accounting-error-on-partial-delalloc-conversion.patch new file mode 100644 index 00000000000..dd43500ec42 --- /dev/null +++ b/queue-4.4/xfs-fix-indlen-accounting-error-on-partial-delalloc-conversion.patch @@ -0,0 +1,71 @@ +From 0daaecacb83bc6b656a56393ab77a31c28139bc7 Mon Sep 17 00:00:00 2001 +From: Brian Foster +Date: Fri, 12 May 2017 10:44:08 -0700 +Subject: xfs: fix indlen accounting error on partial delalloc conversion + +From: Brian Foster + +commit 0daaecacb83bc6b656a56393ab77a31c28139bc7 upstream. + +The delalloc -> real block conversion path uses an incorrect +calculation in the case where the middle part of a delalloc extent +is being converted. This is documented as a rare situation because +XFS generally attempts to maximize contiguity by converting as much +of a delalloc extent as possible. + +If this situation does occur, the indlen reservation for the two new +delalloc extents left behind by the conversion of the middle range +is calculated and compared with the original reservation. If more +blocks are required, the delta is allocated from the global block +pool. This delta value can be characterized as the difference +between the new total requirement (temp + temp2) and the currently +available reservation minus those blocks that have already been +allocated (startblockval(PREV.br_startblock) - allocated). + +The problem is that the current code does not account for previously +allocated blocks correctly. It subtracts the current allocation +count from the (new - old) delta rather than the old indlen +reservation. 
This means that more indlen blocks than have been +allocated end up stashed in the remaining extents and free space +accounting is broken as a result. + +Fix up the calculation to subtract the allocated block count from +the original extent indlen and thus correctly allocate the +reservation delta based on the difference between the new total +requirement and the unused blocks from the original reservation. +Also remove a bogus assert that contradicts the fact that the new +indlen reservation can be larger than the original indlen +reservation. + +Signed-off-by: Brian Foster +Reviewed-by: Darrick J. Wong +Signed-off-by: Darrick J. Wong +Signed-off-by: Greg Kroah-Hartman + +--- + fs/xfs/libxfs/xfs_bmap.c | 7 ++++--- + 1 file changed, 4 insertions(+), 3 deletions(-) + +--- a/fs/xfs/libxfs/xfs_bmap.c ++++ b/fs/xfs/libxfs/xfs_bmap.c +@@ -2179,8 +2179,10 @@ xfs_bmap_add_extent_delay_real( + } + temp = xfs_bmap_worst_indlen(bma->ip, temp); + temp2 = xfs_bmap_worst_indlen(bma->ip, temp2); +- diff = (int)(temp + temp2 - startblockval(PREV.br_startblock) - +- (bma->cur ? bma->cur->bc_private.b.allocated : 0)); ++ diff = (int)(temp + temp2 - ++ (startblockval(PREV.br_startblock) - ++ (bma->cur ? 
++ bma->cur->bc_private.b.allocated : 0))); + if (diff > 0) { + error = xfs_mod_fdblocks(bma->ip->i_mount, + -((int64_t)diff), false); +@@ -2232,7 +2234,6 @@ xfs_bmap_add_extent_delay_real( + temp = da_new; + if (bma->cur) + temp += bma->cur->bc_private.b.allocated; +- ASSERT(temp <= da_old); + if (temp < da_old) + xfs_mod_fdblocks(bma->ip->i_mount, + (int64_t)(da_old - temp), false); diff --git a/queue-4.4/xfs-fix-over-copying-of-getbmap-parameters-from-userspace.patch b/queue-4.4/xfs-fix-over-copying-of-getbmap-parameters-from-userspace.patch new file mode 100644 index 00000000000..032c0d5d3ce --- /dev/null +++ b/queue-4.4/xfs-fix-over-copying-of-getbmap-parameters-from-userspace.patch @@ -0,0 +1,38 @@ +From be6324c00c4d1e0e665f03ed1fc18863a88da119 Mon Sep 17 00:00:00 2001 +From: "Darrick J. Wong" +Date: Mon, 3 Apr 2017 15:17:57 -0700 +Subject: xfs: fix over-copying of getbmap parameters from userspace + +From: Darrick J. Wong + +commit be6324c00c4d1e0e665f03ed1fc18863a88da119 upstream. + +In xfs_ioc_getbmap, we should only copy the fields of struct getbmap +from userspace, or else we end up copying random stack contents into the +kernel. struct getbmap is a strict subset of getbmapx, so a partial +structure copy should work fine. + +Signed-off-by: Darrick J. Wong +Reviewed-by: Christoph Hellwig +Signed-off-by: Greg Kroah-Hartman + +--- + fs/xfs/xfs_ioctl.c | 5 +++-- + 1 file changed, 3 insertions(+), 2 deletions(-) + +--- a/fs/xfs/xfs_ioctl.c ++++ b/fs/xfs/xfs_ioctl.c +@@ -1379,10 +1379,11 @@ xfs_ioc_getbmap( + unsigned int cmd, + void __user *arg) + { +- struct getbmapx bmx; ++ struct getbmapx bmx = { 0 }; + int error; + +- if (copy_from_user(&bmx, arg, sizeof(struct getbmapx))) ++ /* struct getbmap is a strict subset of struct getbmapx. 
*/ ++ if (copy_from_user(&bmx, arg, offsetof(struct getbmapx, bmv_iflags))) + return -EFAULT; + + if (bmx.bmv_count < 2) diff --git a/queue-4.4/xfs-fix-unaligned-access-in-xfs_btree_visit_blocks.patch b/queue-4.4/xfs-fix-unaligned-access-in-xfs_btree_visit_blocks.patch new file mode 100644 index 00000000000..69a32e6be8a --- /dev/null +++ b/queue-4.4/xfs-fix-unaligned-access-in-xfs_btree_visit_blocks.patch @@ -0,0 +1,35 @@ +From a4d768e702de224cc85e0c8eac9311763403b368 Mon Sep 17 00:00:00 2001 +From: Eric Sandeen +Date: Mon, 22 May 2017 19:54:10 -0700 +Subject: xfs: fix unaligned access in xfs_btree_visit_blocks + +From: Eric Sandeen + +commit a4d768e702de224cc85e0c8eac9311763403b368 upstream. + +This structure copy was throwing unaligned access warnings on sparc64: + +Kernel unaligned access at TPC[1043c088] xfs_btree_visit_blocks+0x88/0xe0 [xfs] + +xfs_btree_copy_ptrs does a memcpy, which avoids it. + +Signed-off-by: Eric Sandeen +Reviewed-by: Darrick J. Wong +Signed-off-by: Darrick J. 
Wong +Signed-off-by: Greg Kroah-Hartman + +--- + fs/xfs/libxfs/xfs_btree.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +--- a/fs/xfs/libxfs/xfs_btree.c ++++ b/fs/xfs/libxfs/xfs_btree.c +@@ -4064,7 +4064,7 @@ xfs_btree_change_owner( + xfs_btree_readahead_ptr(cur, ptr, 1); + + /* save for the next iteration of the loop */ +- lptr = *ptr; ++ xfs_btree_copy_ptrs(cur, &lptr, ptr, 1); + } + + /* for each buffer in the level */ diff --git a/queue-4.4/xfs-fix-up-quotacheck-buffer-list-error-handling.patch b/queue-4.4/xfs-fix-up-quotacheck-buffer-list-error-handling.patch new file mode 100644 index 00000000000..c14070f260e --- /dev/null +++ b/queue-4.4/xfs-fix-up-quotacheck-buffer-list-error-handling.patch @@ -0,0 +1,96 @@ +From 20e8a063786050083fe05b4f45be338c60b49126 Mon Sep 17 00:00:00 2001 +From: Brian Foster +Date: Fri, 21 Apr 2017 12:40:44 -0700 +Subject: xfs: fix up quotacheck buffer list error handling + +From: Brian Foster + +commit 20e8a063786050083fe05b4f45be338c60b49126 upstream. + +The quotacheck error handling of the delwri buffer list assumes the +resident buffers are locked and doesn't clear the _XBF_DELWRI_Q flag +on the buffers that are dequeued. This can lead to assert failures +on buffer release and possibly other locking problems. + +Move this code to a delwri queue cancel helper function to +encapsulate the logic required to properly release buffers from a +delwri queue. Update the helper to clear the delwri queue flag and +call it from quotacheck. + +Signed-off-by: Brian Foster +Reviewed-by: Darrick J. Wong +Signed-off-by: Darrick J. 
Wong +Signed-off-by: Greg Kroah-Hartman + +--- + fs/xfs/xfs_buf.c | 24 ++++++++++++++++++++++++ + fs/xfs/xfs_buf.h | 1 + + fs/xfs/xfs_qm.c | 7 +------ + 3 files changed, 26 insertions(+), 6 deletions(-) + +--- a/fs/xfs/xfs_buf.c ++++ b/fs/xfs/xfs_buf.c +@@ -979,6 +979,8 @@ void + xfs_buf_unlock( + struct xfs_buf *bp) + { ++ ASSERT(xfs_buf_islocked(bp)); ++ + XB_CLEAR_OWNER(bp); + up(&bp->b_sema); + +@@ -1713,6 +1715,28 @@ error: + } + + /* ++ * Cancel a delayed write list. ++ * ++ * Remove each buffer from the list, clear the delwri queue flag and drop the ++ * associated buffer reference. ++ */ ++void ++xfs_buf_delwri_cancel( ++ struct list_head *list) ++{ ++ struct xfs_buf *bp; ++ ++ while (!list_empty(list)) { ++ bp = list_first_entry(list, struct xfs_buf, b_list); ++ ++ xfs_buf_lock(bp); ++ bp->b_flags &= ~_XBF_DELWRI_Q; ++ list_del_init(&bp->b_list); ++ xfs_buf_relse(bp); ++ } ++} ++ ++/* + * Add a buffer to the delayed write list. + * + * This queues a buffer for writeout if it hasn't already been. 
Note that +--- a/fs/xfs/xfs_buf.h ++++ b/fs/xfs/xfs_buf.h +@@ -304,6 +304,7 @@ extern void xfs_buf_iomove(xfs_buf_t *, + extern void *xfs_buf_offset(struct xfs_buf *, size_t); + + /* Delayed Write Buffer Routines */ ++extern void xfs_buf_delwri_cancel(struct list_head *); + extern bool xfs_buf_delwri_queue(struct xfs_buf *, struct list_head *); + extern int xfs_buf_delwri_submit(struct list_head *); + extern int xfs_buf_delwri_submit_nowait(struct list_head *); +--- a/fs/xfs/xfs_qm.c ++++ b/fs/xfs/xfs_qm.c +@@ -1355,12 +1355,7 @@ xfs_qm_quotacheck( + mp->m_qflags |= flags; + + error_return: +- while (!list_empty(&buffer_list)) { +- struct xfs_buf *bp = +- list_first_entry(&buffer_list, struct xfs_buf, b_list); +- list_del_init(&bp->b_list); +- xfs_buf_relse(bp); +- } ++ xfs_buf_delwri_cancel(&buffer_list); + + if (error) { + xfs_warn(mp, diff --git a/queue-4.4/xfs-handle-array-index-overrun-in-xfs_dir2_leaf_readbuf.patch b/queue-4.4/xfs-handle-array-index-overrun-in-xfs_dir2_leaf_readbuf.patch new file mode 100644 index 00000000000..3ff7a0f011c --- /dev/null +++ b/queue-4.4/xfs-handle-array-index-overrun-in-xfs_dir2_leaf_readbuf.patch @@ -0,0 +1,103 @@ +From 023cc840b40fad95c6fe26fff1d380a8c9d45939 Mon Sep 17 00:00:00 2001 +From: Eric Sandeen +Date: Thu, 13 Apr 2017 15:15:47 -0700 +Subject: xfs: handle array index overrun in xfs_dir2_leaf_readbuf() + +From: Eric Sandeen + +commit 023cc840b40fad95c6fe26fff1d380a8c9d45939 upstream. + +Carlos had a case where "find" seemed to start spinning +forever and never return. + +This was on a filesystem with non-default multi-fsb (8k) +directory blocks, and a fragmented directory with extents +like this: + +0:[0,133646,2,0] +1:[2,195888,1,0] +2:[3,195890,1,0] +3:[4,195892,1,0] +4:[5,195894,1,0] +5:[6,195896,1,0] +6:[7,195898,1,0] +7:[8,195900,1,0] +8:[9,195902,1,0] +9:[10,195908,1,0] +10:[11,195910,1,0] +11:[12,195912,1,0] +12:[13,195914,1,0] +... + +i.e. 
the first extent is a contiguous 2-fsb dir block, but +after that it is fragmented into 1 block extents. + +At the top of the readdir path, we allocate a mapping array +which (for this filesystem geometry) can hold 10 extents; see +the assignment to map_info->map_size. During readdir, we are +therefore able to map extents 0 through 9 above into the array +for readahead purposes. If we count by 2, we see that the last +mapped index (9) is the first block of a 2-fsb directory block. + +At the end of xfs_dir2_leaf_readbuf() we have 2 loops to fill +more readahead; the outer loop assumes one full dir block is +processed each loop iteration, and an inner loop that ensures +that this is so by advancing to the next extent until a full +directory block is mapped. + +The problem is that this inner loop may step past the last +extent in the mapping array as it tries to reach the end of +the directory block. This will read garbage for the extent +length, and as a result the loop control variable 'j' may +become corrupted and never fail the loop conditional. + +The number of valid mappings we have in our array is stored +in map->map_valid, so stop this inner loop based on that limit. + +There is an ASSERT at the top of the outer loop for this +same condition, but we never made it out of the inner loop, +so the ASSERT never fired. + +Huge appreciation for Carlos for debugging and isolating +the problem. + +Debugged-and-analyzed-by: Carlos Maiolino +Signed-off-by: Eric Sandeen +Tested-by: Carlos Maiolino +Reviewed-by: Carlos Maiolino +Reviewed-by: Bill O'Donnell +Reviewed-by: Darrick J. Wong +Signed-off-by: Darrick J. Wong +Signed-off-by: Greg Kroah-Hartman + +--- + fs/xfs/xfs_dir2_readdir.c | 10 ++++++++-- + 1 file changed, 8 insertions(+), 2 deletions(-) + +--- a/fs/xfs/xfs_dir2_readdir.c ++++ b/fs/xfs/xfs_dir2_readdir.c +@@ -406,6 +406,7 @@ xfs_dir2_leaf_readbuf( + + /* + * Do we need more readahead? ++ * Each loop tries to process 1 full dir blk; last may be partial. 
+ */ + blk_start_plug(&plug); + for (mip->ra_index = mip->ra_offset = i = 0; +@@ -437,9 +438,14 @@ xfs_dir2_leaf_readbuf( + } + + /* +- * Advance offset through the mapping table. ++ * Advance offset through the mapping table, processing a full ++ * dir block even if it is fragmented into several extents. ++ * But stop if we have consumed all valid mappings, even if ++ * it's not yet a full directory block. + */ +- for (j = 0; j < geo->fsbcount; j += length ) { ++ for (j = 0; ++ j < geo->fsbcount && mip->ra_index < mip->map_valid; ++ j += length ) { + /* + * The rest of this extent but not more than a dir + * block. diff --git a/queue-4.4/xfs-prevent-multi-fsb-dir-readahead-from-reading-random-blocks.patch b/queue-4.4/xfs-prevent-multi-fsb-dir-readahead-from-reading-random-blocks.patch new file mode 100644 index 00000000000..e2df196fab3 --- /dev/null +++ b/queue-4.4/xfs-prevent-multi-fsb-dir-readahead-from-reading-random-blocks.patch @@ -0,0 +1,80 @@ +From cb52ee334a45ae6c78a3999e4b473c43ddc528f4 Mon Sep 17 00:00:00 2001 +From: Brian Foster +Date: Thu, 20 Apr 2017 08:06:47 -0700 +Subject: xfs: prevent multi-fsb dir readahead from reading random blocks + +From: Brian Foster + +commit cb52ee334a45ae6c78a3999e4b473c43ddc528f4 upstream. + +Directory block readahead uses a complex iteration mechanism to map +between high-level directory blocks and underlying physical extents. +This mechanism attempts to traverse the higher-level dir blocks in a +manner that handles multi-fsb directory blocks and simultaneously +maintains a reference to the corresponding physical blocks. + +This logic doesn't handle certain (discontiguous) physical extent +layouts correctly with multi-fsb directory blocks. 
For example, +consider the case of a 4k FSB filesystem with a 2 FSB (8k) directory +block size and a directory with the following extent layout: + + EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL + 0: [0..7]: 88..95 0 (88..95) 8 + 1: [8..15]: 80..87 0 (80..87) 8 + 2: [16..39]: 168..191 0 (168..191) 24 + 3: [40..63]: 5242952..5242975 1 (72..95) 24 + +Directory block 0 spans physical extents 0 and 1, dirblk 1 lies +entirely within extent 2 and dirblk 2 spans extents 2 and 3. Because +extent 2 is larger than the directory block size, the readahead code +erroneously assumes the block is contiguous and issues a readahead +based on the physical mapping of the first fsb of the dirblk. This +results in read verifier failure and a spurious corruption or crc +failure, depending on the filesystem format. + +Further, the subsequent readahead code responsible for walking +through the physical table doesn't correctly advance the physical +block reference for dirblk 2. Instead of advancing two physical +filesystem blocks, the first iteration of the loop advances 1 block +(correctly), but the subsequent iteration advances 2 more physical +blocks because the next physical extent (extent 3, above) happens to +cover more than dirblk 2. At this point, the higher-level directory +block walking is completely off the rails of the actual physical +layout of the directory for the respective mapping table. + +Update the contiguous dirblock logic to consider the current offset +in the physical extent to avoid issuing directory readahead to +unrelated blocks. Also, update the mapping table advancing code to +consider the current offset within the current dirblock to avoid +advancing the mapping reference too far beyond the dirblock. + +Signed-off-by: Brian Foster +Reviewed-by: Darrick J. Wong +Signed-off-by: Darrick J. 
Wong +Signed-off-by: Greg Kroah-Hartman + +--- + fs/xfs/xfs_dir2_readdir.c | 5 +++-- + 1 file changed, 3 insertions(+), 2 deletions(-) + +--- a/fs/xfs/xfs_dir2_readdir.c ++++ b/fs/xfs/xfs_dir2_readdir.c +@@ -417,7 +417,8 @@ xfs_dir2_leaf_readbuf( + * Read-ahead a contiguous directory block. + */ + if (i > mip->ra_current && +- map[mip->ra_index].br_blockcount >= geo->fsbcount) { ++ (map[mip->ra_index].br_blockcount - mip->ra_offset) >= ++ geo->fsbcount) { + xfs_dir3_data_readahead(dp, + map[mip->ra_index].br_startoff + mip->ra_offset, + XFS_FSB_TO_DADDR(dp->i_mount, +@@ -450,7 +451,7 @@ xfs_dir2_leaf_readbuf( + * The rest of this extent but not more than a dir + * block. + */ +- length = min_t(int, geo->fsbcount, ++ length = min_t(int, geo->fsbcount - j, + map[mip->ra_index].br_blockcount - + mip->ra_offset); + mip->ra_offset += length; diff --git a/queue-4.4/xfs-support-ability-to-wait-on-new-inodes.patch b/queue-4.4/xfs-support-ability-to-wait-on-new-inodes.patch new file mode 100644 index 00000000000..0ec476f48dc --- /dev/null +++ b/queue-4.4/xfs-support-ability-to-wait-on-new-inodes.patch @@ -0,0 +1,71 @@ +From 756baca27fff3ecaeab9dbc7a5ee35a1d7bc0c7f Mon Sep 17 00:00:00 2001 +From: Brian Foster +Date: Wed, 26 Apr 2017 08:30:39 -0700 +Subject: xfs: support ability to wait on new inodes + +From: Brian Foster + +commit 756baca27fff3ecaeab9dbc7a5ee35a1d7bc0c7f upstream. + +Inodes that are inserted into the perag tree but still under +construction are flagged with the XFS_INEW bit. Most contexts either +skip such inodes when they are encountered or have the ability to +handle them. + +The runtime quotaoff sequence introduces a context that must wait +for construction of such inodes to correctly ensure that all dquots +in the fs are released. In anticipation of this, support the ability +to wait on new inodes. Wake the appropriate bit when XFS_INEW is +cleared. + +Signed-off-by: Brian Foster +Reviewed-by: Darrick J. Wong +Signed-off-by: Darrick J. 
Wong +Signed-off-by: Greg Kroah-Hartman + +--- + fs/xfs/xfs_icache.c | 5 ++++- + fs/xfs/xfs_inode.h | 4 +++- + 2 files changed, 7 insertions(+), 2 deletions(-) + +--- a/fs/xfs/xfs_icache.c ++++ b/fs/xfs/xfs_icache.c +@@ -210,14 +210,17 @@ xfs_iget_cache_hit( + + error = inode_init_always(mp->m_super, inode); + if (error) { ++ bool wake; + /* + * Re-initializing the inode failed, and we are in deep + * trouble. Try to re-add it to the reclaim list. + */ + rcu_read_lock(); + spin_lock(&ip->i_flags_lock); +- ++ wake = !!__xfs_iflags_test(ip, XFS_INEW); + ip->i_flags &= ~(XFS_INEW | XFS_IRECLAIM); ++ if (wake) ++ wake_up_bit(&ip->i_flags, __XFS_INEW_BIT); + ASSERT(ip->i_flags & XFS_IRECLAIMABLE); + trace_xfs_iget_reclaim_fail(ip); + goto out_error; +--- a/fs/xfs/xfs_inode.h ++++ b/fs/xfs/xfs_inode.h +@@ -208,7 +208,8 @@ xfs_get_initial_prid(struct xfs_inode *d + #define XFS_IRECLAIM (1 << 0) /* started reclaiming this inode */ + #define XFS_ISTALE (1 << 1) /* inode has been staled */ + #define XFS_IRECLAIMABLE (1 << 2) /* inode can be reclaimed */ +-#define XFS_INEW (1 << 3) /* inode has just been allocated */ ++#define __XFS_INEW_BIT 3 /* inode has just been allocated */ ++#define XFS_INEW (1 << __XFS_INEW_BIT) + #define XFS_ITRUNCATED (1 << 5) /* truncated down so flush-on-close */ + #define XFS_IDIRTY_RELEASE (1 << 6) /* dirty release already seen */ + #define __XFS_IFLOCK_BIT 7 /* inode is being flushed right now */ +@@ -453,6 +454,7 @@ static inline void xfs_finish_inode_setu + xfs_iflags_clear(ip, XFS_INEW); + barrier(); + unlock_new_inode(VFS_I(ip)); ++ wake_up_bit(&ip->i_flags, __XFS_INEW_BIT); + } + + static inline void xfs_setup_existing_inode(struct xfs_inode *ip) diff --git a/queue-4.4/xfs-update-ag-iterator-to-support-wait-on-new-inodes.patch b/queue-4.4/xfs-update-ag-iterator-to-support-wait-on-new-inodes.patch new file mode 100644 index 00000000000..7897913f312 --- /dev/null +++ b/queue-4.4/xfs-update-ag-iterator-to-support-wait-on-new-inodes.patch @@ 
-0,0 +1,188 @@ +From ae2c4ac2dd39b23a87ddb14ceddc3f2872c6aef5 Mon Sep 17 00:00:00 2001 +From: Brian Foster +Date: Wed, 26 Apr 2017 08:30:39 -0700 +Subject: xfs: update ag iterator to support wait on new inodes + +From: Brian Foster + +commit ae2c4ac2dd39b23a87ddb14ceddc3f2872c6aef5 upstream. + +The AG inode iterator currently skips new inodes as such inodes are +inserted into the inode radix tree before they are fully +constructed. Certain contexts require the ability to wait on the +construction of new inodes, however. The fs-wide dquot release from +the quotaoff sequence is an example of this. + +Update the AG inode iterator to support the ability to wait on +inodes flagged with XFS_INEW upon request. Create a new +xfs_inode_ag_iterator_flags() interface and support a set of +iteration flags to modify the iteration behavior. When the +XFS_AGITER_INEW_WAIT flag is set, include XFS_INEW flags in the +radix tree inode lookup and wait on them before the callback is +executed. + +Signed-off-by: Brian Foster +Reviewed-by: Darrick J. Wong +Signed-off-by: Darrick J. Wong +Signed-off-by: Greg Kroah-Hartman + +--- + fs/xfs/xfs_icache.c | 53 ++++++++++++++++++++++++++++++++++++++++++++-------- + fs/xfs/xfs_icache.h | 8 +++++++ + 2 files changed, 53 insertions(+), 8 deletions(-) + +--- a/fs/xfs/xfs_icache.c ++++ b/fs/xfs/xfs_icache.c +@@ -366,6 +366,22 @@ out_destroy: + return error; + } + ++static void ++xfs_inew_wait( ++ struct xfs_inode *ip) ++{ ++ wait_queue_head_t *wq = bit_waitqueue(&ip->i_flags, __XFS_INEW_BIT); ++ DEFINE_WAIT_BIT(wait, &ip->i_flags, __XFS_INEW_BIT); ++ ++ do { ++ prepare_to_wait(wq, &wait.wait, TASK_UNINTERRUPTIBLE); ++ if (!xfs_iflags_test(ip, XFS_INEW)) ++ break; ++ schedule(); ++ } while (true); ++ finish_wait(wq, &wait.wait); ++} ++ + /* + * Look up an inode by number in the given file system. + * The inode is looked up in the cache held in each AG. 
+@@ -470,9 +486,11 @@ out_error_or_again: + + STATIC int + xfs_inode_ag_walk_grab( +- struct xfs_inode *ip) ++ struct xfs_inode *ip, ++ int flags) + { + struct inode *inode = VFS_I(ip); ++ bool newinos = !!(flags & XFS_AGITER_INEW_WAIT); + + ASSERT(rcu_read_lock_held()); + +@@ -490,7 +508,8 @@ xfs_inode_ag_walk_grab( + goto out_unlock_noent; + + /* avoid new or reclaimable inodes. Leave for reclaim code to flush */ +- if (__xfs_iflags_test(ip, XFS_INEW | XFS_IRECLAIMABLE | XFS_IRECLAIM)) ++ if ((!newinos && __xfs_iflags_test(ip, XFS_INEW)) || ++ __xfs_iflags_test(ip, XFS_IRECLAIMABLE | XFS_IRECLAIM)) + goto out_unlock_noent; + spin_unlock(&ip->i_flags_lock); + +@@ -518,7 +537,8 @@ xfs_inode_ag_walk( + void *args), + int flags, + void *args, +- int tag) ++ int tag, ++ int iter_flags) + { + uint32_t first_index; + int last_error = 0; +@@ -560,7 +580,7 @@ restart: + for (i = 0; i < nr_found; i++) { + struct xfs_inode *ip = batch[i]; + +- if (done || xfs_inode_ag_walk_grab(ip)) ++ if (done || xfs_inode_ag_walk_grab(ip, iter_flags)) + batch[i] = NULL; + + /* +@@ -588,6 +608,9 @@ restart: + for (i = 0; i < nr_found; i++) { + if (!batch[i]) + continue; ++ if ((iter_flags & XFS_AGITER_INEW_WAIT) && ++ xfs_iflags_test(batch[i], XFS_INEW)) ++ xfs_inew_wait(batch[i]); + error = execute(batch[i], flags, args); + IRELE(batch[i]); + if (error == -EAGAIN) { +@@ -640,12 +663,13 @@ xfs_eofblocks_worker( + } + + int +-xfs_inode_ag_iterator( ++xfs_inode_ag_iterator_flags( + struct xfs_mount *mp, + int (*execute)(struct xfs_inode *ip, int flags, + void *args), + int flags, +- void *args) ++ void *args, ++ int iter_flags) + { + struct xfs_perag *pag; + int error = 0; +@@ -655,7 +679,8 @@ xfs_inode_ag_iterator( + ag = 0; + while ((pag = xfs_perag_get(mp, ag))) { + ag = pag->pag_agno + 1; +- error = xfs_inode_ag_walk(mp, pag, execute, flags, args, -1); ++ error = xfs_inode_ag_walk(mp, pag, execute, flags, args, -1, ++ iter_flags); + xfs_perag_put(pag); + if (error) { + last_error = 
error; +@@ -667,6 +692,17 @@ xfs_inode_ag_iterator( + } + + int ++xfs_inode_ag_iterator( ++ struct xfs_mount *mp, ++ int (*execute)(struct xfs_inode *ip, int flags, ++ void *args), ++ int flags, ++ void *args) ++{ ++ return xfs_inode_ag_iterator_flags(mp, execute, flags, args, 0); ++} ++ ++int + xfs_inode_ag_iterator_tag( + struct xfs_mount *mp, + int (*execute)(struct xfs_inode *ip, int flags, +@@ -683,7 +719,8 @@ xfs_inode_ag_iterator_tag( + ag = 0; + while ((pag = xfs_perag_get_tag(mp, ag, tag))) { + ag = pag->pag_agno + 1; +- error = xfs_inode_ag_walk(mp, pag, execute, flags, args, tag); ++ error = xfs_inode_ag_walk(mp, pag, execute, flags, args, tag, ++ 0); + xfs_perag_put(pag); + if (error) { + last_error = error; +--- a/fs/xfs/xfs_icache.h ++++ b/fs/xfs/xfs_icache.h +@@ -48,6 +48,11 @@ struct xfs_eofblocks { + #define XFS_IGET_UNTRUSTED 0x2 + #define XFS_IGET_DONTCACHE 0x4 + ++/* ++ * flags for AG inode iterator ++ */ ++#define XFS_AGITER_INEW_WAIT 0x1 /* wait on new inodes */ ++ + int xfs_iget(struct xfs_mount *mp, struct xfs_trans *tp, xfs_ino_t ino, + uint flags, uint lock_flags, xfs_inode_t **ipp); + +@@ -72,6 +77,9 @@ void xfs_eofblocks_worker(struct work_st + int xfs_inode_ag_iterator(struct xfs_mount *mp, + int (*execute)(struct xfs_inode *ip, int flags, void *args), + int flags, void *args); ++int xfs_inode_ag_iterator_flags(struct xfs_mount *mp, ++ int (*execute)(struct xfs_inode *ip, int flags, void *args), ++ int flags, void *args, int iter_flags); + int xfs_inode_ag_iterator_tag(struct xfs_mount *mp, + int (*execute)(struct xfs_inode *ip, int flags, void *args), + int flags, void *args, int tag); diff --git a/queue-4.4/xfs-wait-on-new-inodes-during-quotaoff-dquot-release.patch b/queue-4.4/xfs-wait-on-new-inodes-during-quotaoff-dquot-release.patch new file mode 100644 index 00000000000..82feeb46b40 --- /dev/null +++ b/queue-4.4/xfs-wait-on-new-inodes-during-quotaoff-dquot-release.patch @@ -0,0 +1,54 @@ +From 
e20c8a517f259cb4d258e10b0cd5d4b30d4167a0 Mon Sep 17 00:00:00 2001 +From: Brian Foster +Date: Wed, 26 Apr 2017 08:30:40 -0700 +Subject: xfs: wait on new inodes during quotaoff dquot release + +From: Brian Foster + +commit e20c8a517f259cb4d258e10b0cd5d4b30d4167a0 upstream. + +The quotaoff operation has a race with inode allocation that results +in a livelock. An inode allocation that occurs before the quota +status flags are updated acquires the appropriate dquots for the +inode via xfs_qm_vop_dqalloc(). It then inserts the XFS_INEW inode +into the perag radix tree, sometime later attaches the dquots to the +inode and finally clears the XFS_INEW flag. Quotaoff expects to +release the dquots from all inodes in the filesystem via +xfs_qm_dqrele_all_inodes(). This invokes the AG inode iterator, +which skips inodes in the XFS_INEW state because they are not fully +constructed. If the scan occurs after dquots have been attached to +an inode, but before XFS_INEW is cleared, the newly allocated inode +will continue to hold a reference to the applicable dquots. When +quotaoff invokes xfs_qm_dqpurge_all(), the reference count of those +dquot(s) remain elevated and the dqpurge scan spins indefinitely. + +To address this problem, update the xfs_qm_dqrele_all_inodes() scan +to wait on inodes marked on the XFS_INEW state. We wait on the +inodes explicitly rather than skip and retry to avoid continuous +retry loops due to a parallel inode allocation workload. Since +quotaoff updates the quota state flags and uses a synchronous +transaction before the dqrele scan, and dquots are attached to +inodes after radix tree insertion iff quota is enabled, one INEW +waiting pass through the AG guarantees that the scan has processed +all inodes that could possibly hold dquot references. + +Reported-by: Eryu Guan +Signed-off-by: Brian Foster +Reviewed-by: Darrick J. Wong +Signed-off-by: Darrick J. 
Wong +Signed-off-by: Greg Kroah-Hartman + +--- + fs/xfs/xfs_qm_syscalls.c | 3 ++- + 1 file changed, 2 insertions(+), 1 deletion(-) + +--- a/fs/xfs/xfs_qm_syscalls.c ++++ b/fs/xfs/xfs_qm_syscalls.c +@@ -764,5 +764,6 @@ xfs_qm_dqrele_all_inodes( + uint flags) + { + ASSERT(mp->m_quotainfo); +- xfs_inode_ag_iterator(mp, xfs_dqrele_inode, flags, NULL); ++ xfs_inode_ag_iterator_flags(mp, xfs_dqrele_inode, flags, NULL, ++ XFS_AGITER_INEW_WAIT); + }