From: Dave Chinner
Date: Fri, 9 May 2014 04:50:28 +0000 (+1000)
Subject: repair: don't grind CPUs with large extent lists
X-Git-Tag: v3.2.0-rc3~1
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=bb9ca6c89d8182d6debd76bd5c581c53d9b99e8d;p=thirdparty%2Fxfsprogs-dev.git

repair: don't grind CPUs with large extent lists

When repairing a large filesystem with fragmented files, xfs_repair can
grind to a halt, burning multiple CPUs in blkmap_set_ext().

blkmap_set_ext() inserts extents into the blockmap for the inode fork
and keeps them in order, even if the inserts are not done in order. The
ordered insert is highly inefficient: it starts at the first extent and
simply walks the array to find the insertion point, i.e. it is an O(n)
operation. When we have a fragmented file with a large number of
extents, the cost of building the entire mapping is substantial.

The thing is, we are doing the insertion from an *ordered btree scan*
which is inserting the extents in ascending offset order. IOWs, we are
always inserting the extent at the end of the array after searching the
entire array, i.e. the mapping operation cost is O(N^2).

Fix this simply by reversing the order of the insert slot search. Start
at the end of the blockmap array, so that almost all insertions find
their slot immediately, bringing the overhead of each insertion down to
O(1). This, in turn, reduces the overall map-building operation to an
O(N) operation, so performance degrades linearly with increasing extent
counts rather than quadratically.

While there, I noticed that the blkmap array was only being grown 4
extents at a time. When we are dealing with files that may have
hundreds of thousands of extents, growing the map only 4 extents at a
time requires excessive amounts of reallocation. Reduce the
reallocation rate by increasing the grow increment according to how
large the array currently is.
The result is that the test filesystem (27TB, 30M inodes, at ENOSPC)
takes 5m10s to *fully repair* on my test system, rather than getting 15
(of 60) AGs into phase three and sitting there burning 3-4 CPUs making
no progress for over half an hour.

Signed-off-by: Dave Chinner
Reviewed-by: Eric Sandeen
Signed-off-by: Dave Chinner
---

diff --git a/repair/bmap.c b/repair/bmap.c
index 14161cb93..3acf997f2 100644
--- a/repair/bmap.c
+++ b/repair/bmap.c
@@ -260,7 +260,15 @@ blkmap_grow(
 {
 	pthread_key_t	key = dblkmap_key;
 	blkmap_t	*new_blkmap;
-	int		new_naexts = blkmap->naexts + 4;
+	int		new_naexts;
+
+	/* reduce the number of reallocations for large files */
+	if (blkmap->naexts < 1000)
+		new_naexts = blkmap->naexts + 4;
+	else if (blkmap->naexts < 10000)
+		new_naexts = blkmap->naexts + 100;
+	else
+		new_naexts = blkmap->naexts + 1000;
 
 	if (pthread_getspecific(key) != blkmap) {
 		key = ablkmap_key;
@@ -318,15 +326,33 @@ blkmap_set_ext(
 	}
 
 	ASSERT(blkmap->nexts < blkmap->naexts);
-	for (i = 0; i < blkmap->nexts; i++) {
-		if (blkmap->exts[i].startoff > o) {
-			memmove(blkmap->exts + i + 1,
-				blkmap->exts + i,
-				sizeof(bmap_ext_t) * (blkmap->nexts - i));
+
+	if (blkmap->nexts == 0) {
+		i = 0;
+		goto insert;
+	}
+
+	/*
+	 * The most common insert pattern comes from an ascending offset order
+	 * bmapbt scan. In this case, the extent being added will end up at the
+	 * end of the array. Hence do a reverse order search for the insertion
+	 * point so we don't needlessly scan the entire array on every
+	 * insertion.
+	 *
+	 * Also, use "plus 1" indexing for the loop counter so when we break out
+	 * of the loop we are at the correct index for insertion.
+	 */
+	for (i = blkmap->nexts; i > 0; i--) {
+		if (blkmap->exts[i - 1].startoff < o)
 			break;
-		}
 	}
+
+	/* make space for the new extent */
+	memmove(blkmap->exts + i + 1,
+		blkmap->exts + i,
+		sizeof(bmap_ext_t) * (blkmap->nexts - i));
+
+insert:
 	blkmap->exts[i].startoff = o;
 	blkmap->exts[i].startblock = b;
 	blkmap->exts[i].blockcount = c;