Simplify the logic in xfs_dir2_node_addname_int() by factoring out
the free block index lookup code that finds a block with enough free
space for the entry to be added. The code that is moved gets a major
cleanup at the same time, but there is no algorithm change here.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Factor out the code that adds a data block to a directory from
xfs_dir2_node_addname_int(). This makes the code flow cleaner and
more obvious and provides clear isolation of upcoming optimisations.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
This gets rid of the need for a forward declaration of the static
function xfs_dir2_addname_int() and readies the code for factoring
of xfs_dir2_addname_int().
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Iterator functions already use 0 to signal "continue iterating", so get
rid of the #defines and just do it directly.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Use -ECANCELED to signal "stop iterating" instead of these magical
*_ITER_ABORT values, since they're duplicative.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
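The two iterator changes above amount to one return-value convention for callbacks. A minimal sketch of that convention, assuming a hypothetical walk_items() iterator and find_first_negative() callback (neither is an XFS function): the callback returns 0 to keep going, and a nonzero value such as -ECANCELED stops the walk and is handed back to the caller.

```c
#include <errno.h>
#include <stddef.h>

/* Callback type: return 0 to continue iterating, nonzero to stop. */
typedef int (*item_fn)(int item, void *priv);

/* Hypothetical iterator: stops at the first nonzero callback return. */
static int walk_items(const int *items, size_t nr, item_fn fn, void *priv)
{
	size_t i;
	int error;

	for (i = 0; i < nr; i++) {
		error = fn(items[i], priv);
		if (error)
			return error;	/* includes -ECANCELED */
	}
	return 0;
}

/* Example callback: remember the first negative item, then stop. */
static int find_first_negative(int item, void *priv)
{
	if (item >= 0)
		return 0;		/* continue iterating */
	*(int *)priv = item;
	return -ECANCELED;		/* stop iterating */
}
```

A caller that uses -ECANCELED only as an "I have my answer" signal would typically translate it back to 0 before returning to its own caller.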
In xfs_rmap_irec_offset_unpack, we should always clear the contents of
rm_flags before we begin unpacking the encoded (ondisk) offset into the
incore rm_offset and incore rm_flags fields. Remove the open-coded
field zeroing, since relying on it encourages API misuse.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Remove the return value from the functions that schedule deferred bmap
operations since they never fail and do not return status.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Remove the return value from the functions that schedule deferred
refcount operations since they never fail and do not return status.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Remove the return value from the functions that schedule deferred rmap
operations since they never fail and do not return status.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
This function doesn't use the @state parameter, so get rid of it.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
In xfs_bmbt_diff_two_keys, we perform a signed int64_t subtraction with
two unsigned 64-bit quantities. If the second quantity is actually the
"maximum" key (all ones) as used in _query_all, the subtraction
effectively becomes addition of two positive numbers and the function
returns incorrect results. Fix this with explicit comparisons of the
unsigned values. Nobody needs this now, but the online repair patches
will need this to work properly.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
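To illustrate the problem described above, here is a small sketch with two hypothetical helpers (not the kernel functions, and the sign convention is only illustrative). It compares a signed-subtraction result with explicit unsigned comparisons when the second key is all ones.

```c
#include <stdint.h>

/* Signed-subtraction style comparison; misorders keys near UINT64_MAX. */
static int64_t diff_by_subtraction(uint64_t a, uint64_t b)
{
	/* With b == UINT64_MAX this is effectively a + 1, a positive value. */
	return (int64_t)(a - b);
}

/* Explicit unsigned comparisons: negative, zero, or positive. */
static int64_t diff_by_comparison(uint64_t a, uint64_t b)
{
	if (a > b)
		return 1;
	if (a < b)
		return -1;
	return 0;
}
```

With a = 5 and b = UINT64_MAX, the subtraction variant yields a positive number even though a is the smaller key, while the comparison variant yields a negative one.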
The xfs_rmap_has_other_keys helper aborts the iteration as soon as it
has an answer. Don't let this abort leak out to callers.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
In xfs_ialloc_setup_geometry, it's possible for a malicious/corrupt fs
image to set an unreasonably large value for sb_inopblog which will
cause ialloc_blks to be zero. If sb_imax_pct is also set, this results
in a division by zero error in the second do_div call. Therefore, force
maxicount to zero if ialloc_blks is zero.
Note that the kernel metadata verifiers will catch the garbage inopblog
value and abort the fs mount long before it tries to set up the inode
geometry; this is needed to avoid a crash in xfs_db while setting up the
xfs_mount structure.
Found by fuzzing sb_inopblog to 122 in xfs/350.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
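The guard described above can be sketched as follows. This is a simplified stand-in, not the libxfs code: max_inode_blocks() and its parameters are hypothetical names, and plain division replaces do_div. The point is only that the second division is skipped when ialloc_blks is zero.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of blocks usable for inodes, rounded down
 * to whole allocation chunks. Returns 0 when no percentage is set or
 * when ialloc_blks is zero, so we never divide by zero.
 */
static uint64_t max_inode_blocks(uint64_t dblocks, uint32_t imax_pct,
				 uint64_t ialloc_blks)
{
	uint64_t icount;

	if (imax_pct == 0 || ialloc_blks == 0)
		return 0;

	icount = dblocks * imax_pct / 100;	/* blocks allowed for inodes */
	icount /= ialloc_blks;			/* whole allocation chunks */
	return icount * ialloc_blks;
}
```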
fs/xfs/libxfs/xfs_btree.c:4475: warning: Excess function parameter 'max_recs' description in 'xfs_btree_sblock_v5hdr_verify'
fs/xfs/libxfs/xfs_btree.c:4475: warning: Excess function parameter 'pag_max_level' description in 'xfs_btree_sblock_v5hdr_verify'
Fixes: c5ab131ba0df ("libxfs: refactor short btree block verification") Signed-off-by: zhengbin <zhengbin13@huawei.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
When trying to correlate XFS kernel allocations to memory reclaim
behaviour, it is useful to know what allocations XFS is actually
attempting. This information is not directly available from the
generic memory allocation and reclaim tracepoints, so these new
tracepoints provide a high-level indication of what the XFS memory
demand actually is.
There is no per-filesystem context in this code, so we just trace
the type of allocation, the size and the allocation constraints.
The kmem code also doesn't include much of the common XFS headers,
so there are a few definitions that need to be added to the trace
headers and a couple of types that need to be made common to avoid
needing to include the whole world in the kmem code.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Since no caller is using KM_NOSLEEP and no callee branches on KM_SLEEP,
we can remove KM_NOSLEEP and replace KM_SLEEP with 0.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Mon, 11 Nov 2019 15:06:46 +0000 (10:06 -0500)]
xfs_scrub: fix complaint about uninitialized ret
Coverity complained about the uninitialized ret in run_scrub_phases.
It's not sophisticated enough to realize that phase 1 and 7 are both
marked mustrun and are never the repair or datascan dummies and that
therefore ret is always initialized by the end of the for loop, but
OTOH there's no reason not to fix a trivial logic bomb if that ever
changes.
Coverity-id: 1455255 Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 6 Nov 2019 22:30:57 +0000 (17:30 -0500)]
xfs_scrub: remove moveon from main program
Replace the moveon returns in xfs_scrub.c with a direct integer
error return.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
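The moveon conversions in this series all follow the same shape. A rough sketch of the pattern, using hypothetical check_value() helpers rather than any real xfs_scrub function (the positive error code is also an assumption made for illustration):

```c
#include <errno.h>
#include <stdbool.h>

/* Old style: true means "move on"; the cause of a failure is lost. */
static bool check_value_moveon(int value)
{
	return value >= 0;
}

/* New style: 0 on success, otherwise an error code the caller can report. */
static int check_value(int value)
{
	if (value < 0)
		return EINVAL;
	return 0;
}

/* A caller simply propagates the first error it sees. */
static int run_checks(const int *values, int nr)
{
	int i, ret;

	for (i = 0; i < nr; i++) {
		ret = check_value(values[i]);
		if (ret)
			return ret;
	}
	return 0;
}
```

The integer return lets the top-level caller distinguish why a step failed instead of only learning that it should stop.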
Darrick J. Wong [Wed, 6 Nov 2019 22:30:36 +0000 (17:30 -0500)]
xfs_scrub: remove XFS_ITERATE_INODES_ABORT from inode iterator
Remove the _ABORT code since nobody uses it and we're slowly moving to
ECANCELED anyway.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 6 Nov 2019 22:30:29 +0000 (17:30 -0500)]
xfs_scrub: remove moveon from phase 1 functions
Replace the moveon returns in the phase 1 code with a direct integer
error return.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 6 Nov 2019 22:30:22 +0000 (17:30 -0500)]
xfs_scrub: remove moveon from phase 2 functions
Replace the moveon returns in the phase 2 code with a direct integer
error return.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 6 Nov 2019 22:30:16 +0000 (17:30 -0500)]
xfs_scrub: remove moveon from phase 3 functions
Replace the moveon returns in the phase 3 code with a direct integer
error return.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 6 Nov 2019 22:30:09 +0000 (17:30 -0500)]
xfs_scrub: remove moveon from phase 4 functions
Replace the moveon returns in the phase 4 code with a direct integer
error return.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 6 Nov 2019 22:30:03 +0000 (17:30 -0500)]
xfs_scrub: remove moveon from phase 5 functions
Replace the moveon returns in the phase 5 code with a direct integer
error return.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 6 Nov 2019 22:29:55 +0000 (17:29 -0500)]
xfs_scrub: remove moveon from phase 6 functions
Replace the moveon returns in the phase 6 code with a direct integer
error return.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 6 Nov 2019 22:29:47 +0000 (17:29 -0500)]
xfs_scrub: remove moveon from phase 7 functions
Replace the moveon returns in the phase 7 code with a direct integer
error return.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 6 Nov 2019 22:29:29 +0000 (17:29 -0500)]
xfs_scrub: remove moveon from repair action list helpers
Replace the moveon returns in the repair action list processing
functions with a direct integer error return.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 6 Nov 2019 22:29:22 +0000 (17:29 -0500)]
xfs_scrub: remove moveon from scrub ioctl wrappers
Replace the moveon returns in the scrub ioctl wrapper functions
with a direct integer error return.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 6 Nov 2019 22:29:15 +0000 (17:29 -0500)]
xfs_scrub: remove moveon from progress report helpers
Replace the moveon returns in the scrub progress reporting helpers
with a direct integer error return.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 6 Nov 2019 22:29:08 +0000 (17:29 -0500)]
xfs_scrub: remove moveon from unicode name collision helpers
Replace the moveon returns in the unicode name collision detector code
with a direct integer error return.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 6 Nov 2019 22:28:45 +0000 (17:28 -0500)]
xfs_scrub: remove moveon from spacemap
Replace the moveon returns in the space map iteration code with a direct
integer return.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 6 Nov 2019 22:28:37 +0000 (17:28 -0500)]
xfs_scrub: remove moveon from vfs directory tree iteration
Replace the moveon returns in the vfs directory tree walking functions
with a direct integer error return.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 6 Nov 2019 22:28:21 +0000 (17:28 -0500)]
xfs_scrub: remove moveon from inode iteration
Replace the moveon returns in the inode iteration functions with a direct
integer error return. While we're at it, drop the xfs_ prefix.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 6 Nov 2019 22:27:19 +0000 (17:27 -0500)]
xfs_scrub: remove moveon from the fscounters functions
Replace the moveon returns in the fscounters functions with direct error
returns. Drop the xfs_ prefixes while we're at it.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 6 Nov 2019 22:27:17 +0000 (17:27 -0500)]
xfs_scrub: remove moveon from filemap iteration
Remove the moveon and descr clutter from filemap iteration in favor of
returning errors directly and passing error domain descriptions around
through the existing void *arg.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 6 Nov 2019 22:27:06 +0000 (17:27 -0500)]
xfs_scrub: implement background mode for phase 6
Phase 6 doesn't implement background mode, which means that it doesn't
run in single-threaded mode with one -b and it doesn't sleep between
calls with multiple -b like every other phase does. It also doesn't
restrict the amount of work per kernel call, which is a key part of
throttling. Wire up the necessary pieces to make it behave like the man
page says it should.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 6 Nov 2019 22:27:04 +0000 (17:27 -0500)]
xfs_scrub: adapt phase5 to deferred descriptions
Apply the deferred description mechanism to phase 5 so that we don't
build inode prefix strings unless we actually want to say something
about an inode's attributes or directory entries.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
A flamegraph analysis of xfs_scrub runtimes showed that we spend 7-10%
of the program's userspace runtime rendering prefix strings in case we
want to show a message about something we're checking, whether or not
that string ever actually gets used.
For a non-verbose run on a clean filesystem, this work is totally
unnecessary. We can defer the message catalog lookup and snprintf
call until we actually need the message, so build enough of a function
closure mechanism to capture some location information when it's
convenient, carry it all the way to the edge of the call graph, and
render it only when we need it.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[sandeen: make comment change suggested on list] Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
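The deferral idea can be sketched in a few lines of C. Everything here is hypothetical (struct lazy_descr, lazy_descr_set(), lazy_descr_get() and render_inode() are illustrative names, not the xfs_scrub implementation): the caller records a formatter and a pointer to the location data, and snprintf only runs if somebody asks for the string.

```c
#include <stddef.h>
#include <stdio.h>

struct lazy_descr;
typedef int (*lazy_descr_fn)(struct lazy_descr *d, char *buf, size_t buflen);

/* A tiny closure: a formatter plus the data it will format later. */
struct lazy_descr {
	lazy_descr_fn	fn;
	const void	*where;
	char		buf[64];
	int		rendered;
};

static int render_calls;	/* counts how often we actually format */

/* Example formatter: describe an inode number. */
static int render_inode(struct lazy_descr *d, char *buf, size_t buflen)
{
	const unsigned long long *ino = d->where;

	render_calls++;
	return snprintf(buf, buflen, "inode %llu", *ino);
}

/* Cheap: only remembers what to render, does no formatting. */
static void lazy_descr_set(struct lazy_descr *d, lazy_descr_fn fn,
			   const void *where)
{
	d->fn = fn;
	d->where = where;
	d->rendered = 0;
}

/* Expensive part happens here, at most once per lazy_descr_set(). */
static const char *lazy_descr_get(struct lazy_descr *d)
{
	if (!d->rendered) {
		d->fn(d, d->buf, sizeof(d->buf));
		d->rendered = 1;
	}
	return d->buf;
}
```

On a clean run nothing calls lazy_descr_get(), so the formatting cost is never paid.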
Darrick J. Wong [Wed, 6 Nov 2019 22:26:35 +0000 (17:26 -0500)]
xfs_scrub: bump work_threads to include the controller thread
Bump @work_threads in the scrub phase setup function because we will
soon want the main thread (i.e. the one that coordinates workers) to be
factored into per-thread data structures. We'll need this in an
upcoming patch to render error string prefixes to preallocated
per-thread buffers.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Amir Goldstein [Wed, 6 Nov 2019 22:26:29 +0000 (17:26 -0500)]
xfs_io/lsattr: expose FS_XFLAG_HASATTR flag
This allows an efficient check of whether a file has xattrs.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
[sandeen: Add commented-out option to CHATTR_XFLAG_LIST] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Sandeen [Mon, 4 Nov 2019 20:35:49 +0000 (15:35 -0500)]
xfs_growfs: allow mounted device node as argument
Up until:
b97815a0 xfs_growfs: ensure target path is an active xfs mountpoint
xfs_growfs actually accepted a mounted block device name as the
primary argument, because it could be found in the mount table.
It turns out that Ansible was making use of this undocumented behavior,
and it's trivial to allow it, so put it back in place and document
it this time.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 1 Nov 2019 22:04:21 +0000 (18:04 -0400)]
xfs_scrub: create a new category for unfixable errors
There's nothing that xfs_scrub (or XFS) can do about media errors for
data file blocks -- the data are gone. Create a new category for these
unfixable errors so that we don't advise the user to take further action
that won't fix the problem.
[sandeen: this error counter is only used for media errors today, but
there are tests in the code to accommodate potential future new types
of unfixable errors.]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Darrick J. Wong [Fri, 1 Nov 2019 21:58:14 +0000 (17:58 -0400)]
xfs_scrub: refactor xfs_scrub_excessive_errors
Refactor this helper to avoid cycling the scrub context lock when the
user hasn't configured a maximum error count threshold.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
[sandeen: don't check unsigned max_errors for < 0] Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 1 Nov 2019 21:55:06 +0000 (17:55 -0400)]
xfs_scrub: promote some of the str_info to str_error calls
Now that str_error is only for runtime errors, we can promote a few of
the str_info calls that report runtime errors to str_error.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 1 Nov 2019 21:55:06 +0000 (17:55 -0400)]
xfs_scrub: explicitly track corruptions, not just errors
Rename the @errors_found variable to @corruptions_found to make it
more explicit that we're tracking fs corruption issues. Add a new
str_corrupt() function to handle communications that fall under this new
corruption classification. str_error() now exists to log runtime errors
that do not have an associated errno code.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 1 Nov 2019 21:55:06 +0000 (17:55 -0400)]
xfs_scrub: clean up error level table
Rework the error levels table in preparation for adding a few more error
categories that won't fit on a single line.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 1 Nov 2019 21:55:06 +0000 (17:55 -0400)]
xfs_scrub: simplify post-run reporting logic
Simplify the post-run error and warning reporting logic so that in
subsequent patches we can be more specific about what types of things
went wrong.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 1 Nov 2019 21:55:06 +0000 (17:55 -0400)]
xfs_scrub: fix misclassified error reporting
Fix a few places where we assign error reports to the wrong
classification.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Move all the bulkstat action into a single helper function. This gets
rid of the awkward name and increases cohesion.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 1 Nov 2019 20:54:20 +0000 (16:54 -0400)]
xfs_scrub: clean out the nproc global variable
Get rid of this global variable since we already have a libfrog function
that does exactly what it does.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 1 Nov 2019 20:38:35 +0000 (16:38 -0400)]
libfrog: take over platform headers
Move all the declarations for platform-specific functions into
libfrog/platform.h, since they're a part of libfrog now.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 1 Nov 2019 20:38:35 +0000 (16:38 -0400)]
libxfs: remove libxfs_physmem
Remove this thin wrapper too.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 1 Nov 2019 20:38:35 +0000 (16:38 -0400)]
libxfs: remove libxfs_nproc
Remove libxfs_nproc since it's a wrapper around a libfrog function.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 1 Nov 2019 20:38:35 +0000 (16:38 -0400)]
libfrog: clean up platform_nproc
The platform_nproc function should check for error returns and obviously
garbage values and deal with them appropriately. Fix the header
declaration since it's part of the libfrog platform support code, not
libxfs. xfs_scrub will make use of it in the next patch.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
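A minimal sketch of the kind of checking described above, assuming a Linux-like system where sysconf(_SC_NPROCESSORS_ONLN) is available. The function name online_cpu_count() and the fallback of one CPU are assumptions for illustration, not the libfrog code.

```c
#include <limits.h>
#include <unistd.h>

/* Hypothetical helper: number of online CPUs, never less than 1. */
static int online_cpu_count(void)
{
	long nproc = sysconf(_SC_NPROCESSORS_ONLN);

	if (nproc < 1)
		return 1;	/* error (-1) or nonsense: assume one CPU */
	if (nproc > INT_MAX)
		return INT_MAX;	/* clamp values that will not fit an int */
	return (int)nproc;
}
```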
Darrick J. Wong [Fri, 1 Nov 2019 20:32:45 +0000 (16:32 -0400)]
xfs_scrub: fix media verification thread pool size calculations
The read verifier pool deals with two different thread counts -- there's
the submitter thread count that enables us to perform per-thread verify
request aggregation, and then there's the io thread pool count which is
the maximum number of IO requests we want to send to the disk at any
given time.
The io thread pool count should be derived from disk_heads() but instead
we bungle it by measuring and modifying(!) the nproc global variable.
Fix the derivation to use global variables correctly.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 1 Nov 2019 20:32:45 +0000 (16:32 -0400)]
xfs_scrub: request fewer bmaps when we can
In xfs_iterate_filemaps, we query the number of bmaps for a given file
that we're going to iterate, so feed that information to bmap so that
the kernel won't waste time allocating in-kernel memory unnecessarily.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 1 Nov 2019 20:29:57 +0000 (16:29 -0400)]
xfs_scrub: reduce fsmap activity for media errors
Right now we rather foolishly query the fsmap data for every single
media error that we find. This is a silly waste of time since we
have yet to combine adjacent bad blocks into bad extents, so move the
rmap query until after we've constructed the bad block bitmap data.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 1 Nov 2019 20:16:40 +0000 (16:16 -0400)]
xfs_scrub: don't report media errors on unwritten extents
Don't report media errors for unwritten extents since no data has been
lost.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 1 Nov 2019 20:16:40 +0000 (16:16 -0400)]
xfs_scrub: improve reporting of file metadata media errors
Report media errors that map to data and attr fork extent maps.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 1 Nov 2019 20:16:40 +0000 (16:16 -0400)]
xfs_scrub: better reporting of metadata media errors
When we report bad metadata, we inexplicably report the physical address
in units of sectors, whereas for file data we report file offsets in
units of bytes. Fix the metadata reporting units to match the file data
units (i.e. bytes) and skip the printf for all other cases.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 1 Nov 2019 20:16:40 +0000 (16:16 -0400)]
xfs_scrub: improve reporting of file data media errors
When we report media errors, we should tell the administrator the file
offset and length of the bad region, not just the offset of the entire
file extent record that overlaps a bad region.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 1 Nov 2019 20:16:40 +0000 (16:16 -0400)]
xfs_scrub: separate media error reporting for attribute forks
Use different functions to warn about media errors that were detected in
underlying xattr data because logical offsets for attribute fork extents
have no meaning to users.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 1 Nov 2019 20:16:40 +0000 (16:16 -0400)]
libfrog/xfs_scrub: improve iteration function documentation
Between libfrog and xfs_scrub, we have several item collection iteration
functions that take a pointer to a function that will be called for
every item in that collection. They're not well documented, so improve
the description of when they'll be called and what kinds of return
values they expect.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 1 Nov 2019 19:49:08 +0000 (15:49 -0400)]
mkfs: fix incorrect error message
If we encounter a failure while fixing the freelist during mkfs, we
shouldn't print a misleading message about space reservation. Fix it so
that we print something about what we were trying to do when the error
happened.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 1 Nov 2019 19:49:01 +0000 (15:49 -0400)]
libxfs: fix typo in message about write verifier
Fix a silly typo.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 1 Nov 2019 19:48:56 +0000 (15:48 -0400)]
xfs_repair: print better information when metadata updates fail
If a metadata update fails during phase 6, we should print an error
message that can be traced back to a specific line of code. Also,
res_failed spits out a general message about "xfs_trans_reserve failed",
which is probably not where the failure happened. Fix two incorrect
call sites.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 1 Nov 2019 19:48:52 +0000 (15:48 -0400)]
libfrog: fix workqueue_add error out
Don't forget to unlock before erroring out.
Coverity-id: 1454843 Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 1 Nov 2019 19:48:48 +0000 (15:48 -0400)]
xfs_scrub: don't allow zero or negative error injection interval
Don't allow zero or negative values from XFS_SCRUB_DISK_ERROR_INTERVAL
to slip into the system. This is a debugging knob so we don't need to
be rigorous, but we can at least take care of obvious garbage values.
Coverity-id: 1454842 Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
[sandeen: fix patch title] Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
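One way to filter such values is sketched below. parse_error_interval() is a hypothetical helper, and treating bad input as 0 (injection disabled) is an assumption made for this example rather than the behavior of xfs_scrub itself.

```c
#include <errno.h>
#include <stdlib.h>

/* Returns the interval, or 0 (injection disabled) for unusable input. */
static long long parse_error_interval(const char *value)
{
	char *end;
	long long interval;

	if (value == NULL || *value == '\0')
		return 0;

	errno = 0;
	interval = strtoll(value, &end, 10);
	if (errno != 0 || *end != '\0' || interval <= 0)
		return 0;	/* overflow, trailing junk, zero, or negative */
	return interval;
}
```

A caller would pass in the result of getenv("XFS_SCRUB_DISK_ERROR_INTERVAL"), which is NULL when the variable is unset.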
Darrick J. Wong [Fri, 1 Nov 2019 19:48:16 +0000 (15:48 -0400)]
xfs_scrub: report repair activities on stdout, not stderr
Reduce the severity of reports about successful metadata repairs. We
fixed the problem, so there's no action necessary on the part of the
system admin.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
[sandeen: put err_levels in enum order] Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 1 Nov 2019 19:46:12 +0000 (15:46 -0400)]
xfs_db: btheight should check geometry more carefully
The btheight command needs to check user-supplied geometry more
carefully so that we don't hit floating point exceptions.
Coverity-id: 1453661, 1453659 Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
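On Linux an integer division by zero is typically reported as a "floating point exception", so the checks have to happen before any division. A hedged sketch with a hypothetical records_per_block() helper (the real btheight code and its parameter names may differ):

```c
#include <stdint.h>

/* Returns records per block, or 0 if the geometry cannot hold any. */
static uint64_t records_per_block(uint64_t blocksize, uint64_t header_size,
				  uint64_t record_size)
{
	if (record_size == 0)
		return 0;		/* would divide by zero */
	if (header_size >= blocksize)
		return 0;		/* no room left after the header */
	return (blocksize - header_size) / record_size;
}
```

A caller can then reject the user-supplied geometry whenever the result is 0 instead of feeding it into later height calculations.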
Darrick J. Wong [Fri, 1 Nov 2019 19:46:12 +0000 (15:46 -0400)]
xfs_spaceman: always report sick metadata, checked or not
If the kernel thinks a piece of metadata is bad, we must always report
it. This will happen with an upcoming series to mark things sick
whenever we return EFSCORRUPTED at runtime.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 17 Oct 2019 02:35:26 +0000 (22:35 -0400)]
xfs_scrub: simulate errors in the read-verify phase
Add a debugging hook so that we can simulate disk errors during the
media scan to test that the code works.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 17 Oct 2019 02:35:26 +0000 (22:35 -0400)]
xfs_scrub: fix read verify disk error handling strategy
The error handling strategy for media errors is totally bogus. First of
all, short reads are entirely unhandled -- when we encounter a short
read, we know the disk was able to feed us the beginning of what we
asked for, so we need to single-step through the remainder to try to
capture the exact error that we hit.
Second, an actual IO error causes the entire region to be marked bad
even though it could be just a few MB of a multi-gigabyte extent that's
bad. Therefore, single-step each block in the IO request until we stop
getting IO errors to find out if all the blocks are bad or if it's just
that extent.
Third, fix the fact that the loop updates its own counter variables with
the length fed to read(), which doesn't necessarily have anything to do
with the amount of data that the read actually produced.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[sandeen: change "io_error" to "read_error"] Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
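The three points in the message above can be sketched as a small loop. This is an illustrative sketch only, not the xfs_scrub implementation: verify_range, verify_fn, fake_read, fake_bad_block and FAKE_BLKSZ are all hypothetical names, and the "disk" is simulated with a single bad block. The loop advances by the number of bytes the read actually returned, switches to one-block steps after a short read or an error, and returns to large requests once a single block reads cleanly.

```c
#include <stdint.h>

#define FAKE_BLKSZ	4096ULL

/* Hypothetical reader callback: bytes verified, or -1 on a read error. */
typedef int64_t (*verify_fn)(uint64_t start, uint64_t len);

/* Simulated device for the example: exactly one block is unreadable. */
uint64_t fake_bad_block = 2;

int64_t
fake_read(uint64_t start, uint64_t len)
{
	uint64_t	bad_start = fake_bad_block * FAKE_BLKSZ;
	uint64_t	bad_end = bad_start + FAKE_BLKSZ;

	if (start >= bad_start && start < bad_end)
		return -1;			/* request begins on the bad block */
	if (start < bad_start && start + len > bad_start)
		return bad_start - start;	/* short read up to the bad block */
	return len;
}

/* Verify [start, start + len); return the number of bytes found bad. */
uint64_t
verify_range(verify_fn fn, uint64_t start, uint64_t len, uint64_t blksz)
{
	uint64_t	bad_bytes = 0;
	int		single_step = 0;

	while (len > 0) {
		uint64_t	want = single_step ? blksz : len;
		int64_t		got;

		if (want > len)
			want = len;
		got = fn(start, want);
		if (got < 0) {
			/* Large request failed: retry one block at a time. */
			if (!single_step && want > blksz) {
				single_step = 1;
				continue;
			}
			/* A single block failed: record it and move on. */
			bad_bytes += want;
			start += want;
			len -= want;
			single_step = 1;
			continue;
		}
		if (got == 0)
			break;			/* no progress; avoid spinning */
		if ((uint64_t)got < want)
			single_step = 1;	/* short read: probe the rest */
		else if (single_step)
			single_step = 0;	/* clean block: resume big reads */
		/* Advance by what was actually read, not what was asked for. */
		start += got;
		len -= got;
	}
	return bad_bytes;
}
```

With the simulated bad block anywhere inside an eight-block range, verify_range(fake_read, 0, 8 * FAKE_BLKSZ, FAKE_BLKSZ) reports one block's worth of bad bytes instead of condemning the whole range.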
Darrick J. Wong [Thu, 17 Oct 2019 02:35:26 +0000 (22:35 -0400)]
xfs_scrub: return bytes verified from a SCSI VERIFY command
Since disk_scsi_verify and pread are interchangeably called from
disk_read_verify(), we must return the number of bytes verified (or -1)
just like what pread returns. This doesn't matter now due to bugs in
scrub, but we're about to fix those bugs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 17 Oct 2019 02:35:26 +0000 (22:35 -0400)]
xfs_scrub: enforce read verify pool minimum io size
Make sure we always issue media verification requests aligned to the
minimum IO size that the caller cares about. Concretely, this means
that we only care about doing IO in filesystem block-sized chunks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
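One way to picture the alignment requirement above is to widen each request outward to the minimum IO size. The sketch below assumes that rounding direction (start down, end up) and uses a hypothetical helper name, align_to_miniosz; the commit message does not say how xfs_scrub performs the alignment.

```c
#include <stdint.h>

/*
 * Hypothetical helper: widen [*start, *start + *len) so that both ends
 * fall on miniosz boundaries.  miniosz must be nonzero.
 */
void
align_to_miniosz(uint64_t *start, uint64_t *len, uint64_t miniosz)
{
	uint64_t	end = *start + *len;

	*start -= *start % miniosz;			/* round start down */
	end += (miniosz - end % miniosz) % miniosz;	/* round end up */
	*len = end - *start;
}
```

For example, with a 4096-byte minimum, a request for bytes 5000 through 5999 becomes a single request for the block starting at 4096.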
Darrick J. Wong [Thu, 17 Oct 2019 02:35:26 +0000 (22:35 -0400)]
xfs_scrub: record disk LBA size
Remember the size (in bytes) of a logical block on the disk. We'll use
this in subsequent patches to improve the ability of media scans to
report on which files are corrupt.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 17 Oct 2019 02:35:26 +0000 (22:35 -0400)]
xfs_scrub: refactor inode prefix rendering code
Refactor all the places in the code where we try to render an inode
number as a prefix for some sort of status message. This will help make
message prefixes more consistent, which should help users to locate
broken metadata.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[sandeen: rename functions, auto-add spaces, edit comments] Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 17 Oct 2019 02:35:26 +0000 (22:35 -0400)]
xfs_scrub: only call read_verify_force_io once per pool
There's no reason we need to call read_verify_force_io every AG; we can
just let the request aggregation code do its thing and push when we're
totally done browsing the fsmap information.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 17 Oct 2019 02:35:26 +0000 (22:35 -0400)]
xfs_scrub: fix queue-and-stash of non-contiguous verify requests
read_verify_schedule_io is supposed to have the ability to decide that a
retained aggregate extent verification request is not sufficiently
contiguous with the request that is being scheduled, and therefore it
needs to queue the retained request and use the new request to start
building a new aggregate request.
Unfortunately, it stupidly returns after queueing the IO, so we lose the
incoming request. Fix the code so we only return early if there's a
runtime error.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
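The queue-and-stash behavior described above can be sketched as follows. Everything here is a simplified assumption rather than the xfs_scrub code: struct pool, struct extent, queue_io and schedule_io are hypothetical names, "sufficiently contiguous" is reduced to "exactly adjacent", and queueing is simulated with counters. The key point is that after queueing the retained request, the function falls through and stashes the incoming one, returning early only on error.

```c
#include <stdint.h>

struct extent {
	uint64_t	start;
	uint64_t	len;		/* len == 0 means nothing retained */
};

struct pool {
	struct extent	retained;	/* aggregate request being built */
	unsigned int	queued;		/* requests handed to the fake queue */
	uint64_t	queued_bytes;	/* total bytes in queued requests */
};

/* Simulated submission: count the retained request and clear it. */
int
queue_io(struct pool *p)
{
	p->queued++;
	p->queued_bytes += p->retained.len;
	p->retained.len = 0;
	return 0;
}

int
schedule_io(struct pool *p, uint64_t start, uint64_t len)
{
	int	error;

	if (p->retained.len > 0) {
		/* Adjacent to the retained extent: just grow the aggregate. */
		if (p->retained.start + p->retained.len == start) {
			p->retained.len += len;
			return 0;
		}
		/* Not contiguous: push the old aggregate out... */
		error = queue_io(p);
		if (error)
			return error;	/* only bail out on a runtime error */
	}

	/* ...and keep the incoming request as the start of a new one. */
	p->retained.start = start;
	p->retained.len = len;
	return 0;
}
```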
Darrick J. Wong [Thu, 17 Oct 2019 02:35:26 +0000 (22:35 -0400)]
xfs_scrub: fix read-verify pool error communication problems
Fix all the places in the read-verify pool functions where we either
fail to check for runtime errors or fail to communicate them properly to
callers. Then fix all the callers to report the error messages instead
of hiding them.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 17 Oct 2019 02:35:25 +0000 (22:35 -0400)]
xfs_scrub: abort all read verification work immediately on error
Add a new abort function to the read verify pool code so that the caller
can immediately abort all pending verification work if things start
going wrong. There's no point in waiting for queued work to run if
we've already decided to bail.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 17 Oct 2019 02:35:25 +0000 (22:35 -0400)]
xfs_scrub: fix handling of read-verify pool runtime errors
Fix some bogosity with how we handle runtime errors in the read verify
pool functions. First of all, memory allocation failures shouldn't be
recorded as disk IO errors, they should just complain and abort the
phase. Second, we need to collect any other runtime errors in the IO
thread and abort the phase instead of silently ignoring them.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 17 Oct 2019 02:35:25 +0000 (22:35 -0400)]
xfs_scrub: fix error handling problems in vfs.c
Fix all the places where we drop or screw up error handling in
scan_fs_tree.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 17 Oct 2019 02:35:25 +0000 (22:35 -0400)]
xfs_scrub: move all the queue_subdir error reporting to callers
Change queue_subdir to return a positive error code to callers and move
the error reporting to the callers. This continues the process of
changing internal functions to return error codes.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 17 Oct 2019 02:35:25 +0000 (22:35 -0400)]
xfs_scrub: check progress bar timedwait failures
Check for failures in the timedwait for progressbar reporting.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 17 Oct 2019 02:35:25 +0000 (22:35 -0400)]
xfs_scrub: report all progressbar creation failures
Always report failures when creating progress bars.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 17 Oct 2019 02:35:25 +0000 (22:35 -0400)]
xfs_scrub: fix per-thread counter error communication problems
Fix all the places in the per-thread counter functions where we either
fail to check for runtime errors or fail to communicate them properly to
callers. Then fix all the callers to report the error messages instead
of hiding them.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 17 Oct 2019 02:35:25 +0000 (22:35 -0400)]
libfrog: fix missing error checking in bitmap code
Check library calls for error codes being returned.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 17 Oct 2019 02:35:25 +0000 (22:35 -0400)]
libfrog: fix bitmap error communication problems
Convert all the libfrog code and callers away from the libc-style
indirect errno returns to directly returning error codes to callers.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
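To make the difference between the two conventions concrete, here is a small sketch contrasting a libc-style function (returns -1 and sets errno) with one that returns the error code directly. The functions range_end_errno and range_end are hypothetical and are not part of libfrog; the positive sign of the returned code is also an assumption made for this example.

```c
#include <errno.h>
#include <stdint.h>

/* libc style: failure is -1, and the reason is left in errno. */
int
range_end_errno(uint64_t start, uint64_t len, uint64_t *end)
{
	if (len > UINT64_MAX - start) {
		errno = ERANGE;
		return -1;
	}
	*end = start + len;
	return 0;
}

/* Direct style: the return value is 0 or the error code itself. */
int
range_end(uint64_t start, uint64_t len, uint64_t *end)
{
	if (len > UINT64_MAX - start)
		return ERANGE;
	*end = start + len;
	return 0;
}
```

With the direct style a caller can pass the return value straight up the stack without worrying that an intervening library call has overwritten errno.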
Add missing return value checks for everything that the per-thread
variable code calls.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>