git.ipfire.org Git - thirdparty/xfsprogs-dev.git/log

xfs: move local to extent inode logging into bmap helper

Source kernel commit: aeea4b75f045294e1c026acc380466daa43afc65

The callers of xfs_bmap_local_to_extents_empty() log the inode
external to the function, yet this function is where the on-disk
format value is updated. Push the inode logging down into the
function itself to help prevent future mistakes.

Note that internal bmap callers track the inode logging flags
independently and thus may log the inode core twice due to this
change. This is harmless, so leave this code around for consistency
with the other attr fork conversion functions.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove broken error handling on failed attr sf to leaf change

Source kernel commit: 603efebd6768356fb7cea02e4a822587c2dc5d7c

xfs_attr_shortform_to_leaf() attempts to put the shortform fork back
together after a failed attempt to convert from shortform to leaf
format. While this code reallocates and copies back the shortform
attr fork data, it never resets the inode format field back to local
format. Further, now that the inode is properly logged after the
initial switch from local format, any error that triggers the
recovery code will eventually abort the transaction and shutdown the
fs. Therefore, remove the broken and unnecessary error handling
code.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: log the inode on directory sf to block format change

Source kernel commit: 0b10d8a89f55c416f6a1f6a616669543fa8bdb69

When a directory changes from shortform (sf) to block format, the sf
format is copied to a temporary buffer, the inode format is modified
and the updated format filled with the dentries from the temporary
buffer. If the inode format is modified and the attempt to grow the
inode fails (due to I/O error, for example), it is possible to
return an error while leaving the directory in an inconsistent state
and with an otherwise clean transaction. This results in corruption
of the associated directory and leads to xfs_dabuf_map() errors as
subsequent lookups cannot accurately determine the format of the
directory. This problem is reproduced occasionally by generic/475.

The fundamental problem is that xfs_dir2_sf_to_block() changes the
on-disk inode format without logging the inode. The inode is
eventually logged by the bmapi layer in the common case, but error
checking introduces the possibility of failing the high level
request before this happens.

Update both of the dir2 and attr callers of
xfs_bmap_local_to_extents_empty() to log the inode core as
consistent with the bmap local to extent format change codepath.
This ensures that any subsequent errors after the format has changed
cause the transaction to abort.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove unused flags arg from xfs_get_aghdr_buf()

Source kernel commit: 6374ca03975ab0a2b1a5ced222e0ef2ea6e22f9e

The flags arg is always passed as zero, so remove it.

(xfs_buf_get_uncached takes flags to support XBF_NO_IOACCT for
the sb, but that should never be relevant for xfs_get_aghdr_buf)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: log proper length of superblock

Source kernel commit: 6f4ff81a4602dcfba436c6e2307d61ce9e9f652c

xfs_trans_log_buf takes first byte, last byte as args. In this
case, it should be from 0 to sizeof() - 1.
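
As an illustration of the inclusive first/last byte convention described
above, here is a minimal sketch. The structure and helper names are
hypothetical and are not part of xfsprogs; the point is only that the
last byte of an object starting at offset 0 is sizeof() - 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an on-disk structure (not the real superblock). */
struct demo_sb {
	unsigned char	bytes[264];
};

/*
 * Hypothetical helper: number of bytes covered by an inclusive
 * [first, last] byte range.
 */
static size_t
demo_inclusive_len(size_t first, size_t last)
{
	assert(first <= last);
	return last - first + 1;
}

/* Last byte offset to pass when logging the whole structure from 0. */
static size_t
demo_sb_last_byte(void)
{
	return sizeof(struct demo_sb) - 1;
}
```

Passing sizeof() itself as the last byte would cover one byte more than
the structure actually holds.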

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: revert 1baa2800e62d ("xfs: remove the unused XFS_ALLOC_USERDATA flag")

Source kernel commit: ce840429260a98bcfe4aaf487bb07fa346d86c41

Revert this commit, as it caused periodic regressions in xfs/173 w/
1k blocks [1].

[1] https://lore.kernel.org/lkml/20190919014602.GN15734@shao2-debian/

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: convert inode to extent format after extent merge due to shift

Source kernel commit: e20e174ca1bd98241b42d5ccfa228d8c6522e4e7

The collapse range operation can merge extents if two newly adjacent
extents are physically contiguous. If the extent count is reduced on
a btree format inode, a change to extent format might be necessary.
This format change currently occurs as a side effect of the file
size update after extents have been shifted for the collapse. This
codepath ultimately calls xfs_bunmapi(), which happens to check for
and execute the format conversion even if there were no blocks
removed from the mapping.

While this ultimately puts the inode into the correct state, the
fact the format conversion occurs in a separate transaction from the
change that called for it is a problem. If an extent shift
transaction commits and the filesystem happens to crash before the
format conversion, the inode fork is left in a corrupted state after
log recovery. The inode fork verifier fails and xfs_repair
ultimately nukes the inode. This problem was originally reproduced
by generic/388.

Similar to how the insert range extent split code handles extent to
btree conversion, update the collapse range extent merge code to
handle btree to extent format conversion in the same transaction
that merges the extents. This ensures that the inode fork format
remains consistent if the filesystem happens to crash in the middle
of a collapse range operation that changes the inode fork format.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: define a flags field for the AG geometry ioctl structure

Source kernel commit: 76f1793359db07205b9aefba66c7acbac988aaac

Define a flags field for the AG geometry ioctl structure.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: add a xfs_valid_startblock helper

Source kernel commit: eb77b23b565e7e1d2cc3a7d9e4cc186fb54e81a8

Add a helper that validates the startblock. It checks for a
non-zero block on the main device, but skips that check for blocks on
the realtime device.
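
The check described above could look roughly like the following sketch.
The inode structure and function name are simplified stand-ins chosen
for illustration, not the real libxfs definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced inode: only the realtime bit matters here. */
struct demo_inode {
	bool	is_realtime;
};

/*
 * Return true if @startblock is acceptable for @ip. A zero start block
 * is rejected on the main device, but the check is skipped for inodes
 * whose data lives on the realtime device.
 */
static bool
demo_valid_startblock(const struct demo_inode *ip, uint64_t startblock)
{
	assert(ip != NULL);
	if (ip->is_realtime)
		return true;
	return startblock != 0;
}
```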

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove the unused XFS_ALLOC_USERDATA flag

Source kernel commit: 1baa2800e62d7d58c04cb871a6e4189fc3e1ccf1

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: allocate xattr buffer on demand

Source kernel commit: ddbca70cc45c0ac97ff6d9529e45f10b8ae73ad4

When doing file lookups and checking for permissions, we end up in
xfs_get_acl() to see if there are any ACLs on the inode. This
requires an xattr lookup, and to do that we have to supply a buffer
large enough to hold a maximum-sized xattr.

On workloads where we are accessing a wide range of cache cold files
under memory pressure (e.g. NFS fileservers) we end up spending a
lot of time allocating the buffer. The buffer is 64k in length, so
is a contiguous multi-page allocation, and if that then fails we
fall back to vmalloc(). Hence the allocation here is /expensive/
when we are looking up hundreds of thousands of files a second.

Initial numbers from a bpf trace show average time in xfs_get_acl()
is ~32us, with ~19us of that in the memory allocation. Note these
are average times, so they are going to be affected by the worst
case allocations more than the common fast case...

To avoid this, we could just do a "null" lookup to see if the ACL
xattr exists and then only do the allocation if it exists. This,
however, optimises the path for the "no ACL present" case at the
expense of the "acl present" case. i.e. we can halve the time in
xfs_get_acl() for the no acl case (i.e. down to ~10-15us), but that
then increases the ACL case by 30% (i.e. up to 40-45us).

To solve this and speed up both cases, drive the xattr buffer
allocation into the attribute code once we know what the actual
xattr length is. For the no-xattr case, we avoid the allocation
completely, speeding up that case. For the common ACL case, we'll
end up with a fast heap allocation (because it'll be smaller than a
page), and only for the rarer "we have a remote xattr" will we have
a multi-page allocation occur. Hence the common ACL case will be
much faster, too.
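
A minimal userspace sketch of the on-demand idea follows. The attribute
table, function name and error codes are assumptions made for the
example; the real lookup walks the inode attr fork rather than an
in-memory array.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory attribute used only for this illustration. */
struct demo_attr {
	const char	*name;
	const void	*value;
	size_t		valuelen;
};

/*
 * Look up @name. The buffer is allocated only after the attribute has
 * been found and its length is known, so a miss costs no allocation.
 * Returns 0 on success, -ENODATA if the attribute does not exist, or
 * -ENOMEM if the allocation fails. The caller frees *bufp.
 */
static int
demo_attr_get(const struct demo_attr *attrs, size_t nattrs,
	      const char *name, void **bufp, size_t *lenp)
{
	size_t	i;

	assert(bufp != NULL && lenp != NULL);
	*bufp = NULL;
	*lenp = 0;

	for (i = 0; i < nattrs; i++) {
		if (strcmp(attrs[i].name, name) != 0)
			continue;
		/* Size the buffer to the actual value, not the maximum. */
		*bufp = malloc(attrs[i].valuelen ? attrs[i].valuelen : 1);
		if (*bufp == NULL)
			return -ENOMEM;
		memcpy(*bufp, attrs[i].value, attrs[i].valuelen);
		*lenp = attrs[i].valuelen;
		return 0;
	}
	return -ENODATA;
}
```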

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: consolidate attribute value copying

Source kernel commit: 9df243a1a9e607e7cf5d20ee46edd5ec84b7e400

The same code is used to do the attribute copying in three
different places. Consolidate them into a single function in
preparation for on-demand buffer allocation.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: move remote attr retrieval into xfs_attr3_leaf_getvalue

Source kernel commit: e3cc4554ce1b2cc43262c93ebe6d8cde780d29ee

We repeat exactly the same code to get the remote attribute value
after both calls to xfs_attr3_leaf_getvalue() if it's a remote
attr. Just do it in xfs_attr3_leaf_getvalue() so the callers don't
have to care about it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove unnecessary indenting from xfs_attr3_leaf_getvalue

Source kernel commit: a0e959d3c9d52a83e01a97ef352a1465550f9329

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: make attr lookup returns consistent

Source kernel commit: 728bcaa3e0f9820e923dabe5e4c34e87100bb995

Shortform, leaf and remote attr value retrieval return different
values for success. This makes it more complex to handle actual
errors in xfs_attr_get() as some errors mean success and some
mean failure. Make the return values for success and failure
consistent for all attribute formats.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: reverse search directory freespace indexes

Source kernel commit: 756c6f0f7efe8759ff6dda35d220e2e753e2b0e3

When a directory is growing rapidly, new blocks tend to get added at
the end of the directory. These end up at the end of the freespace
index, and when the directory gets large finding these new
freespaces gets expensive. The code does a linear search across the
freespace index from the first block in the directory to the last,
hence meaning the newly added space is the last index searched.

Instead, do a reverse order index search, starting from the last
block and index in the freespace index. This makes most lookups for
free space on rapidly growing directories O(1) instead of O(N), but
should not have any impact on random insert workloads because the
average search length is the same regardless of which end of the
array we start at.

The result is a major improvement in large directory grow rates:

create time(sec) / rate (files/s)
File count            vanilla        Prev commit            Patched
10k              0.41 / 24.3k       0.42 / 23.8k       0.41 / 24.3k
20k              0.74 / 27.0k       0.76 / 26.3k       0.75 / 26.7k
100k             3.81 / 26.4k       3.47 / 28.8k       3.27 / 30.6k
200k             8.58 / 23.3k       7.19 / 27.8k       6.71 / 29.8k
1M              85.69 / 11.7k      48.53 / 20.6k      37.67 / 26.5k
2M             280.31 /  7.1k     130.14 / 15.3k      79.55 / 25.2k
10M           3913.26 /  2.5k                        552.89 / 18.1k
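
The reverse-order search described above can be sketched as follows.
This is a simplified model: the array of per-block free sizes and the
function name are assumptions for the example, and it ignores any
sentinel values the real on-disk format may use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the index of a block with at least @needed free bytes, scanning
 * from the last entry towards the first, or -1 if no block qualifies.
 * When new blocks are appended at the end, the most recently added space
 * is examined first.
 */
static long
demo_find_free_reverse(const uint16_t *bests, size_t nbests, uint16_t needed)
{
	size_t	i;

	assert(bests != NULL || nbests == 0);
	for (i = nbests; i-- > 0; ) {
		if (bests[i] >= needed)
			return (long)i;
	}
	return -1;
}
```

For a directory that only grows at the tail, the first probe usually
succeeds, which is where the O(1) behaviour comes from.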

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: speed up directory bestfree block scanning

Source kernel commit: 610125ab1e4b1b48dcffe74d9d82b0606bf1b923

When running a "create millions inodes in a directory" test
recently, I noticed we were spending a huge amount of time
converting freespace block headers from disk format to in-memory
format:

31.47%  [kernel]  [k] xfs_dir2_node_addname
17.86%  [kernel]  [k] xfs_dir3_free_hdr_from_disk
 3.55%  [kernel]  [k] xfs_dir3_free_bests_p

We shouldn't be hitting the best free block scanning code so hard
when doing sequential directory creates, and it turns out there's
a highly suboptimal loop searching the best free array in
the freespace block - it decodes the block header before checking
each entry inside a loop, instead of decoding the header once before
running the entry search loop.

This makes a massive difference to create rates. Profile now looks
like this:

13.15%  [kernel]  [k] xfs_dir2_node_addname
 3.52%  [kernel]  [k] xfs_dir3_leaf_check_int
 3.11%  [kernel]  [k] xfs_log_commit_cil

And the wall time/average file create rate differences are
just as stark:

create time(sec) / rate (files/s)
File count            vanilla            patched
10k              0.41 / 24.3k       0.42 / 23.8k
20k              0.74 / 27.0k       0.76 / 26.3k
100k             3.81 / 26.4k       3.47 / 28.8k
200k             8.58 / 23.3k       7.19 / 27.8k
1M              85.69 / 11.7k      48.53 / 20.6k
2M             280.31 /  7.1k     130.14 / 15.3k

The larger the directory, the bigger the performance improvement.
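
To make the loop problem concrete, here is a small sketch contrasting
the two shapes of the scan. All names and the header layout are
hypothetical; a call counter stands in for the cost of the disk to
in-memory header conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk header: entry count stored big-endian. */
struct demo_disk_hdr { uint8_t nvalid_be[4]; };
/* Hypothetical in-memory header. */
struct demo_hdr { uint32_t nvalid; };

static unsigned long demo_decode_calls;	/* counts header conversions */

static void
demo_hdr_from_disk(struct demo_hdr *to, const struct demo_disk_hdr *from)
{
	demo_decode_calls++;
	to->nvalid = ((uint32_t)from->nvalid_be[0] << 24) |
		     ((uint32_t)from->nvalid_be[1] << 16) |
		     ((uint32_t)from->nvalid_be[2] << 8) |
		      (uint32_t)from->nvalid_be[3];
}

/* Slow shape: the header is decoded again on every loop pass. */
static long
demo_scan_slow(const struct demo_disk_hdr *dhdr, const uint16_t *bests,
	       uint16_t needed)
{
	struct demo_hdr	hdr;
	uint32_t	i;

	assert(dhdr != NULL && bests != NULL);
	for (i = 0; ; i++) {
		demo_hdr_from_disk(&hdr, dhdr);
		if (i >= hdr.nvalid)
			return -1;
		if (bests[i] >= needed)
			return (long)i;
	}
}

/* Fast shape: decode once, then run the entry search loop. */
static long
demo_scan_fast(const struct demo_disk_hdr *dhdr, const uint16_t *bests,
	       uint16_t needed)
{
	struct demo_hdr	hdr;
	uint32_t	i;

	assert(dhdr != NULL && bests != NULL);
	demo_hdr_from_disk(&hdr, dhdr);
	for (i = 0; i < hdr.nvalid; i++) {
		if (bests[i] >= needed)
			return (long)i;
	}
	return -1;
}
```

Both functions return the same index; only the number of header
conversions differs.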

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: factor free block index lookup from xfs_dir2_node_addname_int()

Source kernel commit: 0e822255f95db400e56e19773e04755d79b50bda

Simplify the logic in xfs_dir2_node_addname_int() by factoring out
the free block index lookup code that finds a block with enough free
space for the entry to be added. The code that is moved gets a major
cleanup at the same time, but there is no algorithm change here.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: factor data block addition from xfs_dir2_node_addname_int()

Source kernel commit: a07258a695281109422011182db069a0e6f8855e

Factor out the code that adds a data block to a directory from
xfs_dir2_node_addname_int(). This makes the code flow cleaner and
more obvious and provides clear isolation of upcoming optimisations.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: move xfs_dir2_addname()

Source kernel commit: aee7754bbeb1000992ed7e6945418a145d181074

This gets rid of the need for a forward declaration of the static
function xfs_dir2_addname_int() and readies the code for factoring
of xfs_dir2_addname_int().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove all *_ITER_CONTINUE values

Source kernel commit: 39ee2239a5a212cbba9d96050bd11cfd0f26634d

Iterator functions already use 0 to signal "continue iterating", so get
rid of the #defines and just do it directly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove all *_ITER_ABORT values

Source kernel commit: e7ee96dfb8c2687a29d2c5c3b06c967fa54b839c

Use -ECANCELED to signal "stop iterating" instead of these magical
*_ITER_ABORT values, since it's duplicative.
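
A minimal sketch of the convention, assuming a generic array iterator
(the iterator and callback below are illustrative and not taken from
libxfs): a callback returns 0 to keep going and -ECANCELED to stop.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*demo_iter_fn)(int item, void *priv);

/*
 * Call @fn for each item. Any nonzero return stops the walk and is
 * handed back to the caller unchanged.
 */
static int
demo_iterate(const int *items, size_t n, demo_iter_fn fn, void *priv)
{
	size_t	i;
	int	error;

	assert(fn != NULL);
	for (i = 0; i < n; i++) {
		error = fn(items[i], priv);
		if (error)
			return error;
	}
	return 0;
}

/* Example callback: record the first negative item, then stop. */
static int
demo_find_negative(int item, void *priv)
{
	int	*found = priv;

	if (item >= 0)
		return 0;		/* continue iterating */
	*found = item;
	return -ECANCELED;		/* stop iterating */
}
```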

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: reinitialize rm_flags when unpacking an offset into an rmap irec

Source kernel commit: ffb5696f7555e44c04b7c1212fcd83ddedc43498

In xfs_rmap_irec_offset_unpack, we should always clear the contents of
rm_flags before we begin unpacking the encoded (ondisk) offset into the
incore rm_offset and incore rm_flags fields. Remove the open-coded
field zeroing as this encourages api misuse.
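
A sketch of the unpacking pattern follows. The bit positions, flag names
and record layout are assumptions for illustration only and do not
reproduce the real rmap encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits packed into the top of the on-disk offset. */
#define DEMO_OFF_ATTR_FORK	(1ULL << 63)
#define DEMO_OFF_BMBT_BLOCK	(1ULL << 62)
#define DEMO_OFF_FLAGS		(DEMO_OFF_ATTR_FORK | DEMO_OFF_BMBT_BLOCK)

/* Hypothetical in-core flag values. */
#define DEMO_FLAG_ATTR_FORK	(1U << 0)
#define DEMO_FLAG_BMBT_BLOCK	(1U << 1)

struct demo_irec {
	uint64_t	rm_offset;
	unsigned int	rm_flags;
};

/*
 * Unpack an encoded offset. rm_flags is cleared here, so callers do not
 * need to zero the field themselves before calling.
 */
static void
demo_offset_unpack(uint64_t ondisk, struct demo_irec *irec)
{
	assert(irec != NULL);
	irec->rm_offset = ondisk & ~DEMO_OFF_FLAGS;
	irec->rm_flags = 0;
	if (ondisk & DEMO_OFF_ATTR_FORK)
		irec->rm_flags |= DEMO_FLAG_ATTR_FORK;
	if (ondisk & DEMO_OFF_BMBT_BLOCK)
		irec->rm_flags |= DEMO_FLAG_BMBT_BLOCK;
}
```

With the reset inside the helper, a record that is reused across calls
cannot carry stale flags from a previous unpack.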

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove unnecessary int returns from deferred bmap functions

Source kernel commit: 3e08f42ae7828bac2e7445a950f5de2b08203352

Remove the return value from the functions that schedule deferred bmap
operations since they never fail and do not return status.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove unnecessary int returns from deferred refcount functions

Source kernel commit: 74b4c5d4a9c073162a37d1c20c95cb33152ca474

Remove the return value from the functions that schedule deferred
refcount operations since they never fail and do not return status.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove unnecessary int returns from deferred rmap functions

Source kernel commit: bc46ac64713f11c86cbbe11a86abd2a71274b15f

Remove the return value from the functions that schedule deferred rmap
operations since they never fail and do not return status.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove unnecessary parameter from xfs_iext_inc_seq

Source kernel commit: 2ca09177ab9dae7cc0a888137bd2421f5829cd87

This function doesn't use the @state parameter, so get rid of it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: fix sign handling problem in xfs_bmbt_diff_two_keys

Source kernel commit: b521c89027f4b86af418083cb75db60a0e7a5997

In xfs_bmbt_diff_two_keys, we perform a signed int64_t subtraction with
two unsigned 64-bit quantities.  If the second quantity is actually the
"maximum" key (all ones) as used in _query_all, the subtraction
effectively becomes addition of two positive numbers and the function
returns incorrect results.  Fix this with explicit comparisons of the
unsigned values.  Nobody needs this now, but the online repair patches
will need this to work properly.
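
The failure mode can be shown with a small standalone sketch. The
function names are hypothetical, and the first variant only models the
sign problem (it uses a wrapping unsigned subtraction reinterpreted as
signed) rather than copying the original code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Problematic shape: treat the difference of two unsigned 64-bit keys
 * as a signed value. With b == UINT64_MAX the result comes out positive
 * even though a < b.
 */
static int64_t
demo_diff_signed(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b);
}

/* Fixed shape: compare the unsigned values explicitly. */
static int64_t
demo_diff_compare(uint64_t a, uint64_t b)
{
	if (a > b)
		return 1;
	if (a < b)
		return -1;
	return 0;
}

/* Usage: the all-ones "maximum" key must sort after a small key. */
static void
demo_diff_selfcheck(void)
{
	assert(demo_diff_compare(5, UINT64_MAX) < 0);
	assert(demo_diff_compare(UINT64_MAX, 5) > 0);
}
```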

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: don't return _QUERY_ABORT from xfs_rmap_has_other_keys

Source kernel commit: 7380e8fec16b3527baa1153255da9cc8d5d4d309

The xfs_rmap_has_other_keys helper aborts the iteration as soon as it
has an answer. Don't let this abort leak out to callers.
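
In sketch form, the helper swallows its own early-exit code before
returning. The function below is a simplified illustration over a plain
array of owners, not the real rmap query.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Set *has_other to true if any entry has an owner other than @owner.
 * Returns 0 on success. The -ECANCELED used to stop the scan early is
 * an internal detail and is never returned to the caller.
 */
static int
demo_has_other_owner(const int *owners, size_t n, int owner, bool *has_other)
{
	size_t	i;
	int	error = 0;

	assert(has_other != NULL);
	*has_other = false;
	for (i = 0; i < n; i++) {
		if (owners[i] != owner) {
			*has_other = true;
			error = -ECANCELED;	/* answer known, stop */
			break;
		}
	}
	if (error == -ECANCELED)
		error = 0;
	return error;
}
```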

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: fix maxicount division by zero error

Source kernel commit: c94613feefd7714ad0d03a1914632c15782114ff

In xfs_ialloc_setup_geometry, it's possible for a malicious/corrupt fs
image to set an unreasonably large value for sb_inopblog which will
cause ialloc_blks to be zero. If sb_imax_pct is also set, this results
in a division by zero error in the second do_div call. Therefore, force
maxicount to zero if ialloc_blks is zero.
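
The guard can be sketched as below. The function signature and the
simplified arithmetic are assumptions for the example; it does not
reproduce xfs_ialloc_setup_geometry and ignores multiplication overflow
for very large block counts.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute a maximum inode count from the data block count, the maximum
 * inode percentage and the inode allocation size in blocks. If either
 * the percentage or ialloc_blks is zero, return zero instead of
 * dividing.
 */
static uint64_t
demo_maxicount(uint64_t dblocks, uint8_t imax_pct, uint64_t ialloc_blks,
	       uint64_t inodes_per_block)
{
	uint64_t	icount;

	assert(imax_pct <= 100);
	if (imax_pct == 0 || ialloc_blks == 0)
		return 0;

	icount = dblocks * imax_pct / 100;	/* blocks usable for inodes */
	icount /= ialloc_blks;			/* whole allocation units */
	return icount * ialloc_blks * inodes_per_block;
}
```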

Note that the kernel metadata verifiers will catch the garbage inopblog
value and abort the fs mount long before it tries to set up the inode
geometry; this is needed to avoid a crash in xfs_db while setting up the
xfs_mount structure.

Found by fuzzing sb_inopblog to 122 in xfs/350.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove excess function parameter description in 'xfs_btree_sblock_v5hdr_verify'

Source kernel commit: 71912e08e06b7041ba3b40945328b84798dfe6fc

Fixes gcc warning:

fs/xfs/libxfs/xfs_btree.c:4475: warning: Excess function parameter 'max_recs' description in 'xfs_btree_sblock_v5hdr_verify'
fs/xfs/libxfs/xfs_btree.c:4475: warning: Excess function parameter 'pag_max_level' description in 'xfs_btree_sblock_v5hdr_verify'

Fixes: c5ab131ba0df ("libxfs: refactor short btree block verification")
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: add kmem allocation trace points

Source kernel commit: 0ad95687c3adb91e762d0f6ea50a6b1137300e19

When trying to correlate XFS kernel allocations to memory reclaim
behaviour, it is useful to know what allocations XFS is actually
attempting. This information is not directly available from
tracepoints in the generic memory allocation and reclaim
tracepoints, so these new trace points provide a high level
indication of what the XFS memory demand actually is.

There is no per-filesystem context in this code, so we just trace
the type of allocation, the size and the allocation constraints.
The kmem code also doesn't include much of the common XFS headers,
so there are a few definitions that need to be added to the trace
headers and a couple of types that need to be made common to avoid
needing to include the whole world in the kmem code.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

fs: xfs: Remove KM_NOSLEEP and KM_SLEEP.

Source kernel commit: 707e0ddaf67e8942448ebdd16b523e409ebe40ce

Since no caller is using KM_NOSLEEP and no callee branches on KM_SLEEP,
we can remove KM_NOSLEEP and replace KM_SLEEP with 0.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: Release v5.3.0

Update all the necessary files for a 5.3.0 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: Release v5.3.0-rc2

Update all the necessary files for a 5.3.0-rc2 prerelease.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: fix memory leak in add_enckey

Invalid arguments to add_enckey will leak the "arg" allocation,
so fix that.

Fixes: ba71de04 ("xfs_io/encrypt: add 'add_enckey' command")
Fixes-coverity-id: 1454644
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
[sandeen: incorporate improvements from Eric Biggers]
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: fix complaint about uninitialized ret

Coverity complained about the uninitialized ret in run_scrub_phases.
It's not sophisticated enough to realize that phase 1 and 7 are both
marked mustrun and are never the repair or datascan dummies and that
therefore ret is always initialized by the end of the for loop, but
OTOH there's no reason not to fix a trivial logic bomb if that ever
changes.

Coverity-id: 1455255
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libfrog: convert workqueue.c functions to negative error codes

Convert libfrog functions to return negative error codes like libxfs
does.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libfrog: convert scrub.c functions to negative error codes

Convert libfrog functions to return negative error codes like libxfs
does.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libfrog: convert ptvar.c functions to negative error codes

Convert libfrog functions to return negative error codes like libxfs
does.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libfrog: convert bulkstat.c functions to negative error codes

Convert libfrog functions to return negative error codes like libxfs
does.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libfrog: convert fsgeom.c functions to negative error codes

Convert libfrog functions to return negative error codes like libxfs
does.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libfrog: convert bitmap.c to negative error codes

Convert libfrog functions to return negative error codes like libxfs
does.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libfrog: print library errors

Add a libfrog library function that will print tagged error messages.
This will eliminate the need for a lot of open-coded:

errno = ret;
perror("...");
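
A helper of this kind might look like the following sketch. The function
names are hypothetical and are not the libfrog API; accepting either
sign for the error code is also an assumption of the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "<tag>: <error text>" into @buf. A negative code is negated
 * first so either sign convention works. Returns what snprintf returns.
 */
static int
demo_format_error(char *buf, size_t len, int error, const char *tag)
{
	assert(buf != NULL && tag != NULL);
	if (error < 0)
		error = -error;
	return snprintf(buf, len, "%s: %s", tag, strerror(error));
}

/* Print a tagged error message without the caller touching errno. */
static void
demo_perror(int error, const char *tag)
{
	char	msg[256];

	demo_format_error(msg, sizeof(msg), error, tag);
	fprintf(stderr, "%s\n", msg);
}
```

A caller then writes demo_perror(ret, "...") in place of the two-line
pattern shown above.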

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: remove moveon from main program

Replace the moveon returns in xfs_scrub.c with a direct integer
error return.
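
For readers unfamiliar with the pattern, here is a rough sketch of a
phase runner that uses integer error returns. Every name is
hypothetical, and the use of positive errno values is an assumption of
the example. With a boolean "moveon" return, the caller only learns
that something failed; with an integer return it also learns why.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_ctx {
	int	phases_run;
};

typedef int (*demo_phase_fn)(struct demo_ctx *ctx);

/* Example phase that succeeds. */
static int
demo_phase_ok(struct demo_ctx *ctx)
{
	ctx->phases_run++;
	return 0;
}

/* Example phase that fails with a specific error code. */
static int
demo_phase_fail(struct demo_ctx *ctx)
{
	ctx->phases_run++;
	return EIO;
}

/* Run phases in order; stop at the first error and return it. */
static int
demo_run_phases(const demo_phase_fn *phases, size_t n, struct demo_ctx *ctx)
{
	size_t	i;
	int	ret;

	assert(phases != NULL && ctx != NULL);
	for (i = 0; i < n; i++) {
		ret = phases[i](ctx);
		if (ret)
			return ret;
	}
	return 0;
}
```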

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: remove XFS_ITERATE_INODES_ABORT from inode iterator

Remove the _ABORT code since nobody uses it and we're slowly moving to
ECANCELED anyway.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: remove moveon from phase 1 functions

Replace the moveon returns in the phase 1 code with a direct integer
error return.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: remove moveon from phase 2 functions

Replace the moveon returns in the phase 2 code with a direct integer
error return.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: remove moveon from phase 3 functions

Replace the moveon returns in the phase 3 code with a direct integer
error return.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: remove moveon from phase 4 functions

Replace the moveon returns in the phase 4 code with a direct integer
error return.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: remove moveon from phase 5 functions

Replace the moveon returns in the phase 5 code with a direct integer
error return.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: remove moveon from phase 6 functions

Replace the moveon returns in the phase 6 code with a direct integer
error return.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: remove moveon from phase 7 functions

Replace the moveon returns in the phase 7 code with a direct integer
error return.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: remove moveon from repair action list helpers

Replace the moveon returns in the repair action list processing
functions with a direct integer error return.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: remove moveon from scrub ioctl wrappers

Replace the moveon returns in the scrub ioctl wrapper functions
with a direct integer error return.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: remove moveon from progress report helpers

Replace the moveon returns in the scrub progress reporting helpers
with a direct integer error return.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: remove moveon from unicode name collision helpers

Replace the moveon returns in the unicode name collision detector code
with a direct integer error return.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: remove moveon from spacemap

Replace the moveon returns in the space map iteration code with a direct
integer return.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: remove moveon from vfs directory tree iteration

Replace the moveon returns in the vfs directory tree walking functions
with a direct integer error return.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: remove moveon from inode iteration

Replace the moveon returns in the inode iteration functions with a direct
integer error return. While we're at it, drop the xfs_ prefix.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: remove moveon from the fscounters functions

Replace the moveon returns in the fscounters functions with direct error
returns. Drop the xfs_ prefixes while we're at it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: remove moveon from filemap iteration

Remove the moveon and descr clutter from filemap iteration in favor of
returning errors directly and passing error domain descriptions around
through the existing void *arg.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: implement background mode for phase 6

Phase 6 doesn't implement background mode, which means that it doesn't
run in single-threaded mode with one -b and it doesn't sleep between
calls with multiple -b like every other phase does. It also doesn't
restrict the amount of work per kernel call, which is a key part of
throttling. Wire up the necessary pieces to make it behave like the man
page says it should.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: adapt phase5 to deferred descriptions

Apply the deferred description mechanism to phase 5 so that we don't
build inode prefix strings unless we actually want to say something
about an inode's attributes or directory entries.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: implement deferred description string rendering

A flamegraph analysis of xfs_scrub runtimes showed that we spend 7-10%
of the program's userspace runtime rendering prefix strings in case we
want to show a message about something we're checking, whether or not
that string ever actually gets used.

For a non-verbose run on a clean filesystem, this work is totally
unnecessary. We could defer the message catalog lookup and snprintf
call until we actually need that message, so build enough of a function
closure mechanism that we can capture some location information when
it's convenient, push it all the way to the edge of the call graph, and
render it only when we need it.
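
A minimal sketch of the idea, not the actual xfs_scrub implementation:
struct descr, struct inode_loc, render_inode, descr_set and descr_render
are all hypothetical names, and the message format is made up. The point
is that capturing the location costs two pointer stores, while the
snprintf call runs only on the reporting path.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical location data captured while walking inodes. */
struct inode_loc {
	unsigned long long	ino;
	unsigned int		gen;
};

/* Hypothetical closure: a formatter plus the data it will format. */
struct descr {
	int		(*fn)(char *buf, size_t len, const void *where);
	const void	*where;
};

/* Formatter that does the comparatively expensive snprintf work. */
int render_inode(char *buf, size_t len, const void *where)
{
	const struct inode_loc *loc = where;

	return snprintf(buf, len, "inode %llu (gen %u)", loc->ino, loc->gen);
}

/* Capturing is just two pointer stores; nothing is formatted here. */
void descr_set(struct descr *d,
	       int (*fn)(char *, size_t, const void *),
	       const void *where)
{
	d->fn = fn;
	d->where = where;
}

/* Called only when a message is really going to be emitted. */
const char *descr_render(const struct descr *d, char *buf, size_t len)
{
	d->fn(buf, len, d->where);
	return buf;
}
```

On a clean filesystem descr_render would never be called, so the
formatting cost disappears from the common path.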

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[sandeen: make comment change suggested on list]
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: bump work_threads to include the controller thread

Bump @work_threads in the scrub phase setup function because we will
soon want the main thread (i.e. the one that coordinates workers) to be
factored into per-thread data structures. We'll need this in an
upcoming patch to render error string prefixes to preallocated
per-thread buffers.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io/lsattr: expose FS_XFLAG_HASATTR flag

Expose the flag to allow an efficient check of whether a file has xattrs.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
[sandeen: Add commented-out option to CHATTR_XFLAG_LIST]
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_growfs: allow mounted device node as argument

Up until:

b97815a0 xfs_growfs: ensure target path is an active xfs mountpoint

xfs_growfs actually accepted a mounted block device name as the
primary argument, because it could be found in the mount table.

It turns out that Ansible was making use of this undocumented behavior,
and it's trivial to allow it, so put it back in place and document
it this time.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: create a new category for unfixable errors

There's nothing that xfs_scrub (or XFS) can do about media errors for
data file blocks -- the data are gone. Create a new category for these
unfixable errors so that we don't advise the user to take further action
that won't fix the problem.

[sandeen: this error counter is only used for media errors today, but
there are tests in the code to accommodate potential future new types
of unfixable errors.]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>

xfs_scrub: refactor xfs_scrub_excessive_errors

Refactor this helper to avoid cycling the scrub context lock when the
user hasn't configured a maximum error count threshold.
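
Roughly, the shape of such a helper could look like the sketch below.
This is an assumption-laden illustration: struct scrub_state and its
fields are hypothetical, and it assumes the threshold is set once at
startup (so it can be read without the lock) and that 0 means "no
limit".

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the scrub context. */
struct scrub_state {
	pthread_mutex_t		lock;
	unsigned long long	max_errors;	/* 0 means no limit */
	unsigned long long	errors_found;	/* protected by lock */
};

bool excessive_errors(struct scrub_state *s)
{
	bool ret;

	/* No threshold configured: answer without touching the lock. */
	if (s->max_errors == 0)
		return false;

	pthread_mutex_lock(&s->lock);
	ret = s->errors_found >= s->max_errors;
	pthread_mutex_unlock(&s->lock);
	return ret;
}
```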

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
[sandeen: don't check unsigned max_errors for < 0]
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: promote some of the str_info to str_error calls

Now that str_error is only for runtime errors, we can promote a few of
the str_info calls that report runtime errors to str_error.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: explicitly track corruptions, not just errors

Rename the @errors_found variable to @corruptions_found to make it
more explicit that we're tracking fs corruption issues. Add a new
str_corrupt() function to handle communications that fall under this new
corruption classification. str_error() now exists to log runtime errors
that do not have an associated errno code.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: clean up error level table

Rework the error levels table in preparation for adding a few more error
categories that won't fit on a single line.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: simplify post-run reporting logic

Simplify the post-run error and warning reporting logic so that in
subsequent patches we can be more specific about what types of things
went wrong.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: fix misclassified error reporting

Fix a few places where we assign error reports to the wrong
classification.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: refactor xfs_iterate_inodes_range_check

Move all the bulkstat action into a single helper function. This gets
rid of the awkward name and increases cohesion.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: clean out the nproc global variable

Get rid of this global variable since we already have a libfrog function
that does exactly what it does.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libfrog: take over platform headers

Move all the declarations for platform-specific functions into
libfrog/platform.h, since they're a part of libfrog now.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libxfs: remove libxfs_physmem

Remove this thin wrapper too.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libxfs: remove libxfs_nproc

Remove libxfs_nproc since it's a wrapper around a libfrog function.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libfrog: clean up platform_nproc

The platform_nproc function should check for error returns and obviously
garbage values and deal with them appropriately. Fix the header
declaration since it's part of the libfrog platform support code, not
libxfs. xfs_scrub will make use of it in the next patch.
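
As a sketch of the kind of checking described here (clamp_nproc and
nproc_checked are hypothetical names, and falling back to one CPU on
error is an assumption), the raw sysconf() result can be clamped before
anyone uses it to size a thread pool:

```c
#include <limits.h>
#include <unistd.h>

/* Clamp a raw CPU count into a sane range. */
int clamp_nproc(long raw)
{
	if (raw < 1)
		return 1;	/* error return (-1) or garbage: assume one CPU */
	if (raw > INT_MAX)
		return INT_MAX;
	return (int)raw;
}

/* Query the number of online CPUs, never returning less than 1. */
int nproc_checked(void)
{
	return clamp_nproc(sysconf(_SC_NPROCESSORS_ONLN));
}
```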

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: fix media verification thread pool size calculations

The read verifier pool deals with two different thread counts -- there's
the submitter thread count that enables us to perform per-thread verify
request aggregation, and then there's the io thread pool count which is
the maximum number of IO requests we want to send to the disk at any
given time.

The io thread pool count should be derived from disk_heads() but instead
we bungle it by measuring and modifying(!) the nproc global variable.
Fix the derivation to use global variables correctly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: request fewer bmaps when we can

In xfs_iterate_filemaps, we query the number of bmaps for a given file
that we're going to iterate, so feed that information to bmap so that
the kernel won't waste time allocating in-kernel memory unnecessarily.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: reduce fsmap activity for media errors

Right now we rather foolishly query the fsmap data for every single
media error that we find. This is a silly waste of time since we
have yet to combine adjacent bad blocks into bad extents, so move the
rmap query until after we've constructed the bad block bitmap data.
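
To illustrate why the ordering matters, here is a sketch that merges
adjacent bad blocks into extents first, so that a mapping query would be
needed once per extent rather than once per block. It uses a sorted array
instead of the bitmap mentioned above purely for brevity; struct
bad_extent and coalesce_bad_blocks are hypothetical names.

```c
#include <stddef.h>

struct bad_extent {
	unsigned long long	start;	/* first bad block */
	unsigned long long	len;	/* number of blocks */
};

/*
 * Merge a sorted, duplicate-free list of bad block numbers into extents.
 * Returns the number of extents written to @out, which must have room
 * for @nr entries.
 */
size_t coalesce_bad_blocks(const unsigned long long *blocks, size_t nr,
			   struct bad_extent *out)
{
	size_t i, nr_out = 0;

	for (i = 0; i < nr; i++) {
		/* Extend the previous extent if this block is adjacent. */
		if (nr_out > 0 &&
		    out[nr_out - 1].start + out[nr_out - 1].len == blocks[i]) {
			out[nr_out - 1].len++;
			continue;
		}
		out[nr_out].start = blocks[i];
		out[nr_out].len = 1;
		nr_out++;
	}
	return nr_out;
}
```

For bad blocks 3, 4, 5, 9, 10 and 20 this yields three extents, so three
queries instead of six.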

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: don't report media errors on unwritten extents

Don't report media errors for unwritten extents since no data has been
lost.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: improve reporting of file metadata media errors

Report media errors that map to data and attr fork extent maps.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: better reporting of metadata media errors

When we report bad metadata, we inexplicably report the physical address
in units of sectors, whereas for file data we report file offsets in
units of bytes. Fix the metadata reporting units to match the file data
units (i.e. bytes) and skip the printf for all other cases.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: improve reporting of file data media errors

When we report media errors, we should tell the administrator the file
offset and length of the bad region, not just the offset of the entire
file extent record that overlaps a bad region.
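
The arithmetic can be sketched as an intersection of the extent's
physical range with the bad region, translated back to a file offset.
struct file_mapping and bad_file_range are hypothetical names, and all
values are assumed to be in bytes.

```c
#include <stdbool.h>

/* Hypothetical file extent record, all fields in bytes. */
struct file_mapping {
	unsigned long long	file_off;	/* logical offset in the file */
	unsigned long long	phys;		/* physical start on disk */
	unsigned long long	len;
};

/*
 * Compute the file offset and length of the part of @m that overlaps
 * the bad physical region.  Returns false if there is no overlap.
 */
bool bad_file_range(const struct file_mapping *m,
		    unsigned long long bad_start, unsigned long long bad_len,
		    unsigned long long *off, unsigned long long *len)
{
	unsigned long long m_end = m->phys + m->len;
	unsigned long long b_end = bad_start + bad_len;
	unsigned long long start = m->phys > bad_start ? m->phys : bad_start;
	unsigned long long end = m_end < b_end ? m_end : b_end;

	if (start >= end)
		return false;
	*off = m->file_off + (start - m->phys);
	*len = end - start;
	return true;
}
```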

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: separate media error reporting for attribute forks

Use different functions to warn about media errors that were detected in
underlying xattr data because logical offsets for attribute fork extents
have no meaning to users.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libfrog/xfs_scrub: improve iteration function documentation

Between libfrog and xfs_scrub, we have several item collection iteration
functions that take a pointer to a function that will be called for
every item in that collection. They're not well documented, so improve
the description of when they'll be called and what kinds of return
values they expect.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

mkfs: fix incorrect error message

If we encounter a failure while fixing the freelist during mkfs, we
shouldn't print a misleading message about space reservation. Fix it so
that we print something about what we were trying to do when the error
happened.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libxfs: fix typo in message about write verifier

Fix a silly typo.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: print better information when metadata updates fail

If a metadata update fails during phase 6, we should print an error
message that can be traced back to a specific line of code. Also,
res_failed spits out a general message about "xfs_trans_reserve failed",
which is probably not where the failure happened. Fix two incorrect
call sites.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libfrog: fix workqueue_add error out

Don't forget to unlock before erroring out.
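
The general pattern, sketched with a hypothetical fixed-size queue
rather than the real libfrog workqueue (struct wq, wq_add and
WQ_MAX_ITEMS are made up for this example): every error path taken while
the lock is held has to go through the unlock.

```c
#include <errno.h>
#include <pthread.h>

#define WQ_MAX_ITEMS	2	/* hypothetical limit, to force an error path */

struct wq {
	pthread_mutex_t	lock;
	int		items[WQ_MAX_ITEMS];
	int		nr_items;
};

int wq_add(struct wq *wq, int item)
{
	int ret = 0;

	pthread_mutex_lock(&wq->lock);
	if (wq->nr_items >= WQ_MAX_ITEMS) {
		ret = ENOSPC;
		goto out_unlock;	/* a bare "return ret" would leak the lock */
	}
	wq->items[wq->nr_items++] = item;
out_unlock:
	pthread_mutex_unlock(&wq->lock);
	return ret;
}
```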

Coverity-id: 1454843
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: don't allow zero or negative error injection interval

Don't allow zero or negative values from XFS_SCRUB_DISK_ERROR_INTERVAL
to slip into the system. This is a debugging knob so we don't need to
be rigorous, but we can at least take care of obvious garbage values.
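
A sketch of that sort of filtering, assuming the knob is parsed as a
decimal integer and that returning 0 means "injection disabled";
parse_interval is a hypothetical helper, not the function used in
xfs_scrub.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parse an error injection interval.  Returns the interval, or 0 if the
 * string is missing, malformed, out of range, zero, or negative.
 */
long parse_interval(const char *s)
{
	char *end;
	long v;

	if (!s || !*s)
		return 0;
	errno = 0;
	v = strtol(s, &end, 10);
	if (errno || *end != '\0' || v <= 0)
		return 0;
	return v;
}
```

A caller would feed it the environment variable, e.g.
parse_interval(getenv("XFS_SCRUB_DISK_ERROR_INTERVAL")).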

Coverity-id: 1454842
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
[sandeen: fix patch title]
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: report repair activities on stdout, not stderr

Reduce the severity of reports about successful metadata repairs. We
fixed the problem, so there's no action necessary on the part of the
system admin.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
[sandeen: put err_levels in enum order]
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: btheight should check geometry more carefully

The btheight command needs to check user-supplied geometry more
carefully so that we don't hit floating point exceptions.
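
On most platforms an integer division by zero is what raises the
"floating point exception" signal, so the checks amount to validating
every divisor before use. A minimal sketch, with a hypothetical
records_per_block helper and parameter names that are not taken from
xfs_db:

```c
#include <errno.h>

/*
 * Compute how many records fit in a btree block.  Returns 0 and sets
 * @nr_recs on success, or EINVAL if the geometry makes no sense.
 */
int records_per_block(unsigned int blocksize, unsigned int hdr_bytes,
		      unsigned int rec_bytes, unsigned int *nr_recs)
{
	if (rec_bytes == 0)
		return EINVAL;	/* would divide by zero */
	if (blocksize <= hdr_bytes)
		return EINVAL;	/* no room for records; subtraction would wrap */
	*nr_recs = (blocksize - hdr_bytes) / rec_bytes;
	if (*nr_recs == 0)
		return EINVAL;	/* later divisions by this value would trap */
	return 0;
}
```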

Coverity-id: 1453661, 1453659
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_spaceman: always report sick metadata, checked or not

If the kernel thinks a piece of metadata is bad, we must always report
it. This will happen with an upcoming series to mark things sick
whenever we return EFSCORRUPTED at runtime.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: simulate errors in the read-verify phase

Add a debugging hook so that we can simulate disk errors during the
media scan to test that the code works.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: fix read verify disk error handling strategy

The error handling strategy for media errors is totally bogus. First of
all, short reads are entirely unhandled -- when we encounter a short
read, we know the disk was able to feed us the beginning of what we
asked for, so we need to single-step through the remainder to try to
capture the exact error that we hit.

Second, an actual IO error causes the entire region to be marked bad
even though it could be just a few MB of a multi-gigabyte extent that's
bad. Therefore, single-step each block in the IO request until we stop
getting IO errors to find out if all the blocks are bad or if it's just
that extent.

Third, fix the fact that the loop updates its own counter variables with
the length fed to read(), which doesn't necessarily have anything to do
with the amount of data that the read actually produced.
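
The three points above can be sketched against a simulated disk. This is
an illustration under stated assumptions, not the xfs_scrub read
verifier: struct fake_disk, fake_read, struct bad_log and verify_range
are hypothetical, and the fake disk has exactly one block-aligned bad
block.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Simulated disk with one bad block; stands in for a real read() call. */
struct fake_disk {
	unsigned long long	bad_off;	/* byte offset of the bad block */
};

/* Read up to @len bytes: stop short before the bad block, fail on it. */
ssize_t fake_read(struct fake_disk *d, unsigned long long off, size_t len)
{
	if (off == d->bad_off) {
		errno = EIO;
		return -1;
	}
	if (d->bad_off > off && d->bad_off < off + len)
		return (ssize_t)(d->bad_off - off);
	return (ssize_t)len;
}

/* Small fixed-size record of the bad regions that were found. */
struct bad_log {
	unsigned long long	off[8];
	size_t			len[8];
	int			nr;
};

void verify_range(struct fake_disk *d, unsigned long long start,
		  unsigned long long length, size_t blocksize,
		  size_t io_max, struct bad_log *log)
{
	bool stepping = false;

	while (length > 0) {
		size_t want = stepping ? blocksize : io_max;
		ssize_t got;

		if (want > length)
			want = (size_t)length;
		got = fake_read(d, start, want);
		if (got < 0) {
			if (!stepping && want > blocksize) {
				/* Large IO failed: retry one block at a time. */
				stepping = true;
				continue;
			}
			/* A single block failed: record it and skip past it. */
			if (log->nr < 8) {
				log->off[log->nr] = start;
				log->len[log->nr] = want;
				log->nr++;
			}
			got = (ssize_t)want;
		} else if (got == 0) {
			break;	/* end of device */
		} else if ((size_t)got < want) {
			/* Short read: the next block is suspect. */
			stepping = true;
		} else if (stepping) {
			/* Clean single-block read: go back to large IOs. */
			stepping = false;
		}
		/* Advance by what the read produced, not by what was asked. */
		start += (unsigned long long)got;
		length -= (unsigned long long)got;
	}
}
```

With a 4096-byte block size, a 16384-byte maximum IO and a bad block at
offset 8192, a scan of the first 32768 bytes records only the single bad
block instead of the whole 16384-byte request.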

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[sandeen: change "io_error" to "read_error"]
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>