Prevent kmemcheck from throwing warnings about reading uninitialised
memory when formatting inodes into the incore log buffer. There are
several issues here - we don't always log all the fields in the
inode log format item, and we never log the inode's
di_next_unlinked field.
In the case of the inode log format item, this is exacerbated
by the old xfs_inode_log_format structure padding issue. Hence make
the padded, 64 bit aligned version of the structure the one we always
use for formatting the log and get rid of the 64 bit variant. This
means we'll always log the 64-bit version and so recovery only needs
to convert from the unpadded 32 bit version from older 32 bit
kernels.
Signed-Off-By: Dave Chinner <dchinner@redhat.com> Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
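The padding problem described above can be sketched as follows. This is a minimal illustration only: the two structs are hypothetical, cut-down stand-ins for the real log format structures, the helper name ilf_convert is invented here, and __attribute__((packed)) assumes a GCC or Clang compatible compiler.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified layout as written by an older 32-bit kernel:
 * no padding before the 64-bit inode number. */
struct ilf_unpadded32 {
	uint16_t type;
	uint16_t size;
	uint32_t fields;
	uint16_t asize;
	uint16_t dsize;
	uint64_t ino;
} __attribute__((packed));

/* Hypothetical padded, 64-bit aligned layout that is always used when
 * formatting the log. The padding is an explicit field so it can be zeroed. */
struct ilf_padded64 {
	uint16_t type;
	uint16_t size;
	uint32_t fields;
	uint16_t asize;
	uint16_t dsize;
	uint32_t pad;
	uint64_t ino;
};

/* Recovery-side conversion: zero the whole destination first so that no
 * uninitialised bytes survive, then copy the fields one by one. */
static void ilf_convert(struct ilf_padded64 *dst,
			const struct ilf_unpadded32 *src)
{
	memset(dst, 0, sizeof(*dst));
	dst->type = src->type;
	dst->size = src->size;
	dst->fields = src->fields;
	dst->asize = src->asize;
	dst->dsize = src->dsize;
	dst->ino = src->ino;
}
```

With these stand-in layouts the unpadded form is 20 bytes and the padded form is 24, which is why only one direction of conversion is needed once the padded form is always written.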
In commit fd26a88093ba we added a worst case estimate for rmapbt blocks
needed to satisfy the block mapping request. Since then, we added the
ability to reserve enough space in each AG such that we should never run
out of blocks to grow the rmapbt, which makes this calculation
unnecessary. Revert the commit because it makes the extra delalloc
indlen accounting unnecessary and incorrect.
Reported-by: Eryu Guan <eguan@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
We call __xfs_ag_resv_init to make a per-AG reservation for each AG.
This makes the reservation per-AG, not per-filesystem. Therefore, it
is incorrect to adjust m_ag_max_usable for each AG. Adjust it only
when we're reserving AG 0's blocks so that we only do it once per fs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
In Christoph's patch to refactor xfs_bmse_merge, the updated rmap code
does more work than it needs to (because map-extent auto-merges
records). Remove the unnecessary unmap and save ourselves a deferred
op.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
This abstracts the function away from details of the low-level extent
list implementation.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
This abstracts the function away from details of the low-level extent
list implementation.
Note that it seems like the previous implementation of rmap for
the merge case was completely broken, but nothing appears to
trigger that.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
For the first right move we need to look up next_fsb. That means
our last fsb that contains next_fsb must also be the current extent,
so take advantage of that by moving the code around a bit.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Use the bmap abstraction instead of open-coding bmbt details here.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Use the helper instead of open coding it, to provide a better abstraction
for the scalable extent list work. This also gets an additional assert
and trace point for free.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
This helper is used to update an extent record based on the extent index,
and can be used to provide a level of abstractions between callers that
want to modify in-core extent records and the details of the extent list
implementation.
Also switch all users of the xfs_bmbt_set_all(xfs_iext_get_ext(...))
pattern to this new helper.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
The owner change bmbt scan that occurs during extent swap operations
does not handle ordered buffer failures. Buffers that cannot be
marked ordered must be physically logged so previously dirty ranges
of the buffer can be relogged in the transaction.
Since the bmbt scan may need to process and potentially log a large
number of blocks, we can't expect to complete this operation in a
single transaction. Update extent swap to use a permanent
transaction with enough log reservation to physically log a buffer.
Update the bmbt scan to physically log any buffers that cannot be
ordered and to terminate the scan with -EAGAIN. On -EAGAIN, the
caller rolls the transaction and restarts the scan. Finally, update
the bmbt scan helper function to skip bmbt blocks that already match
the expected owner so they are not reprocessed after scan restarts.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
[darrick: fix the xfs_trans_roll call] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[dchinner: proper userspace libxfs_trans_ordered_buf] Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
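The restartable scan described above might look roughly like this. It is a sketch under stated assumptions, not the actual implementation: struct blk_sketch, change_owner_scan and change_owner are hypothetical names, and incrementing a counter stands in for rolling the transaction.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a bmbt block. */
struct blk_sketch {
	uint64_t owner;
	bool can_order;	/* false: the buffer must be physically logged */
};

/*
 * Change the owner of every block. Blocks that already carry the new
 * owner are skipped, so a restarted scan does not reprocess them. A
 * block that cannot be ordered is treated as physically logged and the
 * scan stops with -EAGAIN so the caller can roll the transaction.
 */
static int change_owner_scan(struct blk_sketch *blks, size_t n,
			     uint64_t new_owner)
{
	for (size_t i = 0; i < n; i++) {
		if (blks[i].owner == new_owner)
			continue;
		blks[i].owner = new_owner;
		if (!blks[i].can_order)
			return -EAGAIN;
	}
	return 0;
}

/* Caller side: restart the scan after each -EAGAIN. */
static int change_owner(struct blk_sketch *blks, size_t n,
			uint64_t new_owner, int *rolls)
{
	int error;

	*rolls = 0;
	while ((error = change_owner_scan(blks, n, new_owner)) == -EAGAIN)
		(*rolls)++;	/* stand-in for rolling the transaction */
	return error;
}
```

Skipping blocks that already match is what guarantees forward progress: each restart physically logs at most one more buffer.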
Extent swap uses xfs_btree_visit_blocks() to fix up bmbt block
owners on v5 (!rmapbt) filesystems. The bmbt scan uses
xfs_btree_lookup_get_block() to read bmbt blocks which verifies the
current owner of the block against the parent inode of the bmbt.
This works during extent swap because the bmbt owners are updated to
the opposite inode number before the inode extent forks are swapped.
The modified bmbt blocks are marked as ordered buffers which allows
everything to commit in a single transaction. If recovery of the
extent swap is required, log recovery restarts the bmbt scan to fix
up any bmbt blocks that may have not been written back before the
crash. The log recovery bmbt scan occurs after the inode forks have
been swapped, however. This causes the bmbt block owner verification
to fail, leads to log recovery failure and requires xfs_repair to
zap the log to recover.
Define a new invalid inode owner flag to inform the btree block
lookup mechanism that the current inode may be invalid with respect
to the current owner of the bmbt block. Set this flag on the cursor
used for change owner scans to allow this operation to work at
runtime and during log recovery.
Signed-off-by: Brian Foster <bfoster@redhat.com> Fixes: bb3be7e7c ("xfs: check for bogus values in btree block headers") Cc: stable@vger.kernel.org Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Ordered buffers are attached to transactions and pushed through the
logging infrastructure just like normal buffers with the exception
that they are not actually written to the log. Therefore, we don't
need to log dirty ranges of ordered buffers. xfs_trans_log_buf() is
called on ordered buffers to set up all of the dirty state on the
transaction, buffer and log item and prepare the buffer for I/O.
Now that xfs_trans_dirty_buf() is available, call it from
xfs_trans_ordered_buf() so the latter is now mutually exclusive with
xfs_trans_log_buf(). This reflects the implementation of ordered
buffers and helps eliminate confusion over the need to log ranges of
ordered buffers just to set up internal log state.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
And instead require callers to explicitly join the inode using
xfs_defer_ijoin. Also consolidate the defer error handling in
a few places using a goto label.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Split xfs_trans_roll into a low-level helper that just rolls the
actual transaction and a new higher level xfs_trans_roll_inode
that takes care of logging and rejoining the inode. This gets
rid of the NULL inode case, and allows us to simplify the special
cases in the deferred operation code.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
In a filesystem without finobt, the space manager selects an AG in which to
allocate a new inode, where xfs_dialloc_ag_inobt() will search the AG for a
chunk with a free slot.
When the new inode is in the same AG as its parent, the btree will be searched
starting on the parent's record, and then retried from the top if no slot is
available beyond the parent's record.
To exit this loop though, xfs_dialloc_ag_inobt() relies on the fact that the
btree must have a free slot available, since its callers relied on
agi->freecount when deciding how/where to allocate this new inode.
In the case when the agi->freecount is corrupted, showing available inodes in an
AG when in fact there are none, this becomes an infinite loop.
Add a way to stop the loop when a free slot is not found in the btree, making
the function fall through to the whole AG scan, which will then be able to
detect the corruption and shut the filesystem down.
As pointed out by Brian, this might impact performance, given the fact that we
don't reset the search distance anymore when we reach the end of the
tree, giving it fewer tries before falling back to the whole AG search, but
it will only affect searches that start within 10 records of the end of the tree.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
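The loop termination described above can be sketched as a bounded nearby search followed by a full scan. All names here are hypothetical and the data structure is reduced to an array of per-record free counts; the budget of 10 simply mirrors the "within 10 records" remark above.

```c
#include <stddef.h>

#define SEARCH_TRIES	10	/* assumption: illustrative search budget */

/* free_count[i] is the number of free inodes in record i. */
static int find_free_near(const int *free_count, size_t nrecs, size_t start)
{
	int tries = SEARCH_TRIES;

	for (size_t i = start; i < nrecs && tries > 0; i++, tries--)
		if (free_count[i] > 0)
			return (int)i;
	/* Give up here; the budget is not reset at the end of the tree. */
	return -1;
}

/* Whole-AG scan: -1 means no record has a free inode at all. */
static int find_free_full(const int *free_count, size_t nrecs)
{
	for (size_t i = 0; i < nrecs; i++)
		if (free_count[i] > 0)
			return (int)i;
	return -1;
}

/*
 * claimed_free plays the role of the (possibly corrupt) per-AG free
 * count. If it promises free inodes but the full scan finds none,
 * report corruption instead of searching forever.
 */
static int alloc_record(const int *free_count, size_t nrecs, size_t start,
			int claimed_free, int *corrupt)
{
	int rec;

	*corrupt = 0;
	if (claimed_free <= 0)
		return -1;
	rec = find_free_near(free_count, nrecs, start);
	if (rec >= 0)
		return rec;
	rec = find_free_full(free_count, nrecs);
	if (rec < 0)
		*corrupt = 1;
	return rec;
}
```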
Darrick J. Wong [Wed, 27 Sep 2017 01:41:12 +0000 (20:41 -0500)]
mkfs: don't overflow the subopts array
The new -d cowextsize option overran the subopts array; make it larger.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 21 Sep 2017 22:00:08 +0000 (17:00 -0500)]
xfs_repair: handle missing extent states
Missed a couple of the new extent states in the bmbt processing, so add
them to avoid aborting xfs_repair.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
[sandeen: move XR_E_REFC case above fallthrough to emit both do_warn] Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 21 Sep 2017 22:00:01 +0000 (17:00 -0500)]
mkfs: pass a custom cowextsize into the created filesystem
Create a -d option to mkfs.xfs that enables administrators to set
the CoW extent size hint on the created files.
[sandeen: Note, the switch to xfs_flags2diflags looks like
a bugfix, but it's not - the flags set by mkfs up to this
point just happened to line up without any translation.]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Zorro Lang <zlang@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
xfs_repair: don't use do_warn for normal log message
In some cases, the exit status of xfs_repair -n differs for
the same file system depending on whether -v is specified. This
patch fixes this behavior.
If -v is specified, do_warn() is used in zero_log() for printing
a normal message. That sets the exit status to 1 even though there
is no dirtiness in the file system.
Signed-off-by: Masatake YAMATO <yamato@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
[sandeen: edit changelog for brevity] Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 15 Sep 2017 13:34:10 +0000 (08:34 -0500)]
libxfs: remove getcwd/chdir dance from initialization
Back in the old days of Irix, the library saved and restored chdir so
that (I guess?) check_open can change directory without screwing up our
ability to use relative paths. However, there's nothing in Linux that
actually does this, so just rip out the getcwd/chdir stuff since
absolute device paths work just fine without it.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 15 Sep 2017 13:34:10 +0000 (08:34 -0500)]
misc: fix more stupid compiler warnings
Fix more compiler warnings about pointless checks, unchecked return
values, brace problems, and missing parentheses.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 15 Sep 2017 13:33:45 +0000 (08:33 -0500)]
xfs_repair: take the ag_lock before recording rmap for a bmbt record
When the (threaded) inode scanner iterates the blocks of a bmbt tree and
wants to record the bmbt blocks in the in-core rmap database, we have to
take the ag_lock for the AG that the bmbt block is in, or else we can
accidentally corrupt the rmap slab by calling slab_add from two threads.
Reported-by: matorola@gmail.com Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Donald Douwsma [Fri, 15 Sep 2017 13:33:42 +0000 (08:33 -0500)]
mkfs.xfs: Don't stagger AG for a single disk
When sunit and swidth are used, mkfs.xfs tries to avoid all allocation
groups aligning on the same stripe and will attempt to stagger them
across the stripes that make up swidth. If there is only one stripe
then there is no benefit in this optimisation.
When we try to allocate a free inode by searching the inobt, we try to
find the inode nearest the parent inode by searching chunks both left
and right of the chunk containing the parent. As an optimization, we
cache the leftmost and rightmost records that we previously searched; if
we do another allocation with the same parent inode, we'll pick up the
search where it last left off.
There's a bug in the case where we found a free inode to the left of the
parent's chunk: we need to update the cached left and right records, but
because we already reassigned the right record to point to the left, we
end up assigning the left record to both the cached left and right
records.
This isn't a correctness problem strictly, but it can result in the next
allocation rechecking chunks unnecessarily or allocating inodes further
away from the parent than it needs to. Fix it by swapping the record
pointer after we update the cached left and right records.
Fixes: bd169565993b ("xfs: speed up free inode search") Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
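The ordering fix described above comes down to updating the cache before reassigning the record. A minimal sketch, with hypothetical struct and function names that do not appear in the original code:

```c
#include <stdint.h>

/* Hypothetical stand-ins for an inobt record and the cached search state. */
struct rec_sketch {
	uint64_t startino;
	int freecount;
};

struct cache_sketch {
	uint64_t leftrec;
	uint64_t rightrec;
};

/*
 * A free inode was found in the chunk to the left of the parent.
 * Returns the record to allocate from.
 */
static struct rec_sketch take_left(struct cache_sketch *cache,
				   struct rec_sketch left,
				   struct rec_sketch right)
{
	/* Update the cache first, while 'right' still names the right record. */
	cache->leftrec = left.startino;
	cache->rightrec = right.startino;

	/* Only now make the left record the one we allocate from. */
	right = left;
	return right;
}
```

Doing the assignment `right = left` before the cache update would store the left record in both cached slots, which is the behaviour the commit message describes.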
Darrick J. Wong [Thu, 24 Aug 2017 21:43:46 +0000 (16:43 -0500)]
xfs_io: clarify the fsmap documentation
Explicitly declare that the 'start' and 'end' arguments to fsmap require
one of -d, -l, or -r to select the data, log, or realtime device, and fix
a misspelled command name while we're at it.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Sandeen [Thu, 24 Aug 2017 21:43:45 +0000 (16:43 -0500)]
xfs_bmap: fix -n documentation in manpage
xfs_bmap's manpage mis-describes the behavior of the
-n option. xfs_io's fiemap command references the xfs_bmap
manpage, and has the same problem:
-n does not change the query batch size, it limits the number
of extents displayed.
This has been true for 15+ years, so change the documentation
to match reality.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Nikolay Borisov [Thu, 24 Aug 2017 21:43:44 +0000 (16:43 -0500)]
fiemap: Fix semantics of max_extents (-n arguments)
Currently the semantics of the -n argument are a bit idiosyncratic. We want the
argument to be the limit of extents that are going to be output by the tool. This
is clearly broken now as evident from the following example on a fragmented file:
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
[sandeen: fix initialization of max_extents] Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Nikolay Borisov [Thu, 24 Aug 2017 21:43:42 +0000 (16:43 -0500)]
fiemap: Factor out common code used for printing holes
The code responsible for printing holes is scattered across 3 places:
plain print function, verbose print function and in the block handling EOF hole.
Introduce a new function factoring out the common code and replace the 3 sites
where the code is used with it. This reduces duplication and makes it apparent
when we are printing holes. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Nikolay Borisov [Thu, 24 Aug 2017 21:43:41 +0000 (16:43 -0500)]
fiemap: De-obfuscate last_logical and cur_extent manipulation
last_logical and cur_extent are being passed by reference to the printing
functions and they in turn modify those variables. This makes it a bit harder to
reason about the code. So change the printing functions to take those 2 arguments
by value and move the manipulation logic into fiemap_f. Furthermore, the printing
functions now return the number of extents they have printed (either 1 or 2,
depending on whether we've hit the -n limit). No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Nikolay Borisov [Thu, 24 Aug 2017 21:43:40 +0000 (16:43 -0500)]
fiemap: Eliminate num_extents
Fiemap has this rather convoluted logic to calculate the number of extents to
query. This introduces needless complexity with no real benefit. Remove
num_extents and instead hardcode the number of extents we query for in a single
go to 32. No functional changes
Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Nikolay Borisov [Thu, 24 Aug 2017 21:43:39 +0000 (16:43 -0500)]
fiemap: Make max_extents a global var
Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Nikolay Borisov [Thu, 24 Aug 2017 21:43:36 +0000 (16:43 -0500)]
fiemap: Remove blocksize variable
The blocksize variable was hardcoded to 512 bytes and was passed to various
functions. This introduced a lot of redundancy since we can just as well use
the BTOBBT macro. So let's do that and eliminate all usage of the blocksize var.
No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
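For readers unfamiliar with the macro, the conversion amounts to a truncating shift by the 512-byte basic block size. The definitions below are an assumption written out locally for illustration (using uint64_t) and are not copied from the xfsprogs headers:

```c
#include <stdint.h>

/* Assumed definitions: a basic block is 512 bytes, i.e. 1 << 9. */
#define BBSHIFT		9
#define BTOBBT(bytes)	((uint64_t)(bytes) >> BBSHIFT)

/* What the removed variable effectively computed. */
static uint64_t bytes_to_blocks_by_division(uint64_t bytes)
{
	const uint64_t blocksize = 512;

	return bytes / blocksize;
}
```

Both forms truncate, so a byte count that is not a multiple of 512 rounds down in either case.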
Jeff Mahoney [Tue, 22 Aug 2017 15:01:31 +0000 (10:01 -0500)]
xfs_repair: fix thread creation failure recovery
When pf_create_prefetch_thread fails, it tears down the args struct
and frees it. This causes a use-after-free in prefetch_ag_range, which
then passes the now-invalid pointer to start_inode_prefetch. The struct
is only freed when the queuing thread can't be started. When we can't
start even one worker thread, we mark the args ready for processing and
allow it to proceed single-threaded. Unfortunately, this only marks
the current args ready for processing and since we return immediately,
the call to pf_create_prefetch_thread at the end of pf_queuing_worker
never gets called and we wait forever for prefetch to start on the
next AG.
This patch factors out the cleanup into a new pf_skip_prefetch_thread
that is called when we fail to create either the queuing thread or
the first of the workers. It marks the args ready for processing, marks
it done so start_inode_prefetch doesn't add another AG to the list, and
tries to start a new thread for the next AG in the list. We also clear
->next_args and check for it in cleanup_inode_prefetch so this condition
is easier to catch should it arise again.
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Jeff Mahoney [Tue, 22 Aug 2017 15:01:31 +0000 (10:01 -0500)]
xfs_repair: add prefetch trace calls to debug thread creation failures
When debugging prefetch failures, it's useful to have thread creation
failure messages that are output as warnings on stderr in the trace
log as well. It's also helpful to see when an AG gets queued behind
another one rather than having the thread started directly, which
has a separate trace line.
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Jeff Mahoney [Tue, 22 Aug 2017 15:01:30 +0000 (10:01 -0500)]
xfs_repair: clear pthread_t when pthread_create fails
pf_queuing_worker and pf_create_prefetch_thread both try to handle
thread creation failure gracefully, but assume that pthread_create
doesn't modify the pthread_t when it fails.
From the pthread_create man page:
On success, pthread_create() returns 0; on error, it returns an error
number, and the contents of *thread are undefined.
In fact, glibc's pthread_create writes the pthread_t value before
calling clone(). When we join the created threads in
cleanup_inode_prefetch and the cleanup stage of pf_queuing_worker, we
assume that if the pthread_t is nonzero, it's a valid thread handle
and end up crashing in pthread_join.
This patch zeros out the handle after pthread_create failure.
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
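The zero-on-failure pattern can be sketched as below. The wrapper name start_thread and the injected creation function are hypothetical; they exist only so the failure path can be exercised without exhausting system resources. The fake creator imitates a pthread_create() that writes to *thread before failing.

```c
#include <errno.h>
#include <pthread.h>
#include <string.h>

/* Same shape as pthread_create(), so the real function can be passed in. */
typedef int (*thread_create_fn)(pthread_t *, const pthread_attr_t *,
				void *(*)(void *), void *);

/*
 * Create a thread and, on failure, clear the handle so that later
 * "is this handle nonzero?" checks do not mistake garbage for a thread.
 */
static int start_thread(thread_create_fn create, pthread_t *thread,
			void *(*fn)(void *), void *arg)
{
	int err = create(thread, NULL, fn, arg);

	if (err)
		memset(thread, 0, sizeof(*thread));
	return err;
}

/* Test double: scribbles on *thread and then reports failure. */
static int failing_create(pthread_t *thread, const pthread_attr_t *attr,
			  void *(*fn)(void *), void *arg)
{
	(void)attr;
	(void)fn;
	(void)arg;
	memset(thread, 0xa5, sizeof(*thread));
	return EAGAIN;
}
```

In real use the first argument would be pthread_create itself; treating an all-zero pthread_t as "no thread" is the convention the surrounding text assumes, not something POSIX guarantees.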
When seeking for data and holes, the lseek result must be greater than
or equal to the start offset. Furthermore, assuming that the file
doesn't change under us, when switching between SEEK_HOLE and SEEK_DATA,
the seek position must increase monotonically. Warn and abort if this
is not the case.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
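The two sanity checks above can be expressed as small predicates. This is a sketch with hypothetical function names; whether two successive positions may be equal is not stated in the text, so the second check only rejects positions that move backwards.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* An lseek(SEEK_DATA/SEEK_HOLE) result must not lie before the start offset. */
static bool seek_result_ok(off_t start, off_t result)
{
	return result >= start;
}

/*
 * pos[] holds the successive results obtained while alternating between
 * SEEK_DATA and SEEK_HOLE on an unchanging file. Assumption: equal
 * neighbours are tolerated, going backwards is not.
 */
static bool seek_positions_monotonic(const off_t *pos, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (pos[i] < pos[i - 1])
			return false;
	return true;
}
```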
Jeff Mahoney [Tue, 22 Aug 2017 15:01:30 +0000 (10:01 -0500)]
fsr: fix uninitialized fs usage after timeout
In the main loop of fsrallfs, we exit when we've hit the timeout but
we increment fs before we get there. If we're operating on the last
file system in the array, we'll hit an uninitialized fsdesc and
crash in fsrall_cleanup.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
[sandeen: change Jeff's for(; loop] Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Tue, 22 Aug 2017 15:01:30 +0000 (10:01 -0500)]
xfs_db: bit fuzzing should read the right bit when flipping
The middle and last bit flip fuzz verbs need to read the same bit that
they're trying to set.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
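The point of the fix is that the bit being tested and the bit being written must be the same one. A minimal sketch with hypothetical helper names, assuming bit 0 is the least significant bit of the first byte (the numbering used by xfs_db may differ):

```c
#include <stddef.h>
#include <stdint.h>

/* Flip one bit: read and write go through the same byte index and mask. */
static void flip_bit(uint8_t *buf, size_t bitoff)
{
	size_t byte = bitoff / 8;
	uint8_t mask = (uint8_t)(1u << (bitoff % 8));

	if (buf[byte] & mask)
		buf[byte] &= (uint8_t)~mask;
	else
		buf[byte] |= mask;
}

/* Flip the middle bit of an nbits-wide field. */
static void flip_middle_bit(uint8_t *buf, size_t nbits)
{
	flip_bit(buf, nbits / 2);
}

/* Flip the last bit of an nbits-wide field (nbits must be nonzero). */
static void flip_last_bit(uint8_t *buf, size_t nbits)
{
	flip_bit(buf, nbits - 1);
}
```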
Darrick J. Wong [Fri, 18 Aug 2017 17:00:03 +0000 (12:00 -0500)]
xfs_db: make write/fuzz -c and -d work on non-crc filesystems
For a non-crc filesystem, make write/fuzz -c and -d work properly
instead of bailing out. Since there's no checksum to update, both
cases collapse to setting the field value without calling the write
verifier.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 18 Aug 2017 17:00:01 +0000 (12:00 -0500)]
xfs_db: free field list when failing out of fuzz
Fix a missed opportunity to free the field list when we fail out of the
fuzz command by refactoring the error clauses to use a common cleanup
clause.
Fixes-coverity-id: 1416141 Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 18 Aug 2017 17:00:00 +0000 (12:00 -0500)]
xfs_db: reset metadump output flag
On the off chance that someone runs metadump more than once with the
metadump file going to stdout and then not stdout, the stdout_metadump
variable will not be reset before the second invocation. Clear the
status variable when we undo the stdout redirection.
Fixes-coverity-id: 1416140 Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 4 Aug 2017 21:33:52 +0000 (16:33 -0500)]
xfs_db: btdump should avoid eval for push and pop of cursor
We can call the cursor push and pop functions directly from btdump,
so skip all the eval overhead.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Sandeen [Fri, 4 Aug 2017 21:33:52 +0000 (16:33 -0500)]
xfs_db: use TYP_F_CRC_FUNC for inodes & dquots
Now that typ_t has a ->set_crc method, use it for inodes & dquots
as well, rather than recognizing them as special types and calling
their crc functions directly by name.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 4 Aug 2017 21:33:52 +0000 (16:33 -0500)]
xfs_db: introduce fuzz command
Introduce a new 'fuzz' command to write creative values into
disk structure fields.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
[sandeen: tweak words in help a bit for consistency] Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 4 Aug 2017 21:33:52 +0000 (16:33 -0500)]
xfs_db: write values into dir/attr blocks and recalculate CRCs
Extend typ_t to (optionally) store a pointer to a function to calculate
the CRC of the block, provide functions to do this for the dir3 and
attr3 types, and then wire up the write command so that we can modify
directory and extended attribute block fields.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 4 Aug 2017 21:33:52 +0000 (16:33 -0500)]
xfs_db: print attribute remote value blocks
Teach xfs_db how to print the contents of xattr remote value blocks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 4 Aug 2017 21:33:52 +0000 (16:33 -0500)]
xfs_db: dump dir/attr btrees
Dump the directory or extended attribute btree contents.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 4 Aug 2017 21:33:51 +0000 (16:33 -0500)]
xfs_db: fix metadump redirection (again)
In patch 4944defad4 ("xfs_db: redirect printfs when metadumping to
stdout"), we solved the problem of xfs_db printfs ending up in the
metadump stream by reassigning stdout for the duration of a stdout
metadump. Unfortunately, musl doesn't allow stdout to be reassigned (in
their view "extern FILE *stdout" means "extern FILE * const stdout"), so
we abandon the old approach in favor of playing games with dup() to
switch the raw file descriptors.
While we're at it, fix a regression where an unconverted outf test
allows progress info to end up in the metadump stream.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
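Switching the raw file descriptors instead of reassigning stdout can be sketched with dup() and dup2(). The helper names below are hypothetical and the sketch only shows the general technique, not the exact sequence used by xfs_db:

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Point file descriptor 1 at target_fd without touching the stdout
 * FILE pointer. Returns a saved copy of the old descriptor, or -1.
 */
static int redirect_stdout(int target_fd)
{
	int saved;

	fflush(stdout);		/* do not let buffered output cross over */
	saved = dup(STDOUT_FILENO);
	if (saved < 0)
		return -1;
	if (dup2(target_fd, STDOUT_FILENO) < 0) {
		close(saved);
		return -1;
	}
	return saved;
}

/* Undo redirect_stdout() using the descriptor it returned. */
static int restore_stdout(int saved)
{
	int ret;

	fflush(stdout);
	ret = dup2(saved, STDOUT_FILENO);
	close(saved);
	return ret < 0 ? -1 : 0;
}
```

Because only descriptor 1 changes, this works on C libraries where stdout is not an assignable lvalue.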
Darrick J. Wong [Fri, 4 Aug 2017 21:33:51 +0000 (16:33 -0500)]
xfs_repair: fix symlink target length checks by changing MAXPATHLEN to XFS_SYMLINK_MAXLEN
XFS has a maximum symlink target length of 1024 bytes; this is a
holdover from the Irix days. Unfortunately, the constant establishing
this was 'MAXPATHLEN', and is /not/ the same as the Linux MAXPATHLEN,
which is 4096.
The kernel enforces its 1024 byte MAXPATHLEN on symlink targets, but
xfsprogs picks up the (Linux) system 4096 byte MAXPATHLEN, which means
that xfs_repair doesn't complain about oversized symlinks.
Since this is an on-disk format constraint, put the define in the XFS
namespace. As a side effect of the rename, xfs_repair will detect
oversized symlinks and clean them off the system.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
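The effect of the rename can be illustrated with a length check. XFS_SYMLINK_MAXLEN and its value come from the text above; the second constant, the helper name, and the choice to reject a length exactly equal to the limit are assumptions made for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

#define XFS_SYMLINK_MAXLEN	1024	/* on-disk format limit */
#define LINUX_MAXPATHLEN_SKETCH	4096	/* what the system header supplies */

/*
 * Validate a symlink target length against a given limit. Assumption:
 * the length must be nonzero and strictly below the limit.
 */
static bool symlink_len_ok(size_t len, size_t limit)
{
	return len > 0 && len < limit;
}
```

A 2000-byte target passes when checked against the 4096-byte system constant but fails against the 1024-byte on-disk limit, which is the class of oversized symlink that went undetected.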
Darrick J. Wong [Fri, 4 Aug 2017 21:33:51 +0000 (16:33 -0500)]
xfsprogs: remove double-underscore integer types
This is a purely mechanical patch that removes the private
__{u,}int{8,16,32,64}_t typedefs in favor of using the system
{u,}int{8,16,32,64}_t typedefs. This is the sed script used to perform
the transformation and fix the resulting whitespace and indentation
errors:
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
[sandeen: fix whitespace incidentals] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Sandeen [Fri, 4 Aug 2017 21:33:51 +0000 (16:33 -0500)]
xfs_metadump: properly handle obfuscation of all remote attribute blocks
add_remote_vals assumes that it can subtract blocksize
from each block that it processes, but with CRCs, there
is a header on each block, so the assumption that each
block consumes $BLOCKSIZE of the value length is incorrect.
This causes us to stop adding remote blocks too soon, and
the missed blocks do not get obfuscated.
Fix this by accounting for the header size as appropriate,
depending on whether or not we have a CRC filesystem.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
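The header accounting described in the metadump fix above can be sketched roughly as follows. This is only an illustration: the helper name remote_attr_blocks and its parameters are hypothetical and are not taken from add_remote_vals, and the header size is whatever the caller passes in (zero for a non-CRC filesystem).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from xfsprogs): number of remote attribute
 * blocks needed to hold valuelen bytes when each block loses hdrsize
 * bytes to a per-block header. Pass hdrsize == 0 for a non-CRC
 * filesystem, where the whole block carries value data.
 */
static size_t remote_attr_blocks(size_t valuelen, size_t blocksize,
				 size_t hdrsize)
{
	size_t payload;

	assert(hdrsize < blocksize);
	payload = blocksize - hdrsize;		/* usable bytes per block */
	return (valuelen + payload - 1) / payload;	/* round up */
}
```

With 4096-byte blocks, an 8192-byte value fits in two headerless blocks but needs a third block once every block carries a header; subtracting a full block size per block would stop one block early.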
Ross Zwisler [Fri, 4 Aug 2017 21:33:51 +0000 (16:33 -0500)]
xfs_io: allow lsattr & lsproj on foreign filesystems
The following commit:
commit 73b54bb6a2fb ("xfs_io: allow chattr & chproj on foreign filesystems")
allowed chattr and chproj to be run on non-xfs filesystems now that
FS_IOC_FSSETXATTR is a generic vfs call. It failed to enable the
corresponding lsattr and lsproj commands for those filesystems, though.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Fixes: 73b54bb6a2fb ("xfs_io: allow chattr & chproj on foreign filesystems") Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Brian Foster [Fri, 4 Aug 2017 21:33:51 +0000 (16:33 -0500)]
libxfs: init ->b_maps on contig buffers for uncached compatibility
There is a bit of an inconsistency in how ->b_maps is used for
contiguous buffers between kernel libxfs and xfsprogs due to the
independent buffer implementations. In the kernel, ->b_maps[0] is
always initialized to a valid range and in xfsprogs, ->b_maps is only
allocated for discontiguous buffers.
This can lead to confusion when dealing with uncached kernel buffers
in common libxfs code because xfsprogs has no concept of uncached
buffers. Kernel uncached buffers have ->b_bn == XFS_BUF_DADDR_NULL
and ->b_maps[0] points to the physical block address. Block address
checks in common code for kernel uncached buffers, such as in
xfs_sb_verify(), therefore would need to check both places for an
address or risk broken logic or userspace segfaults.
This problem currently manifests as an xfs_repair segfault due to a
NULL ->b_maps access in xfs_sb_verify(). Note that this problem is
only reproducible on builds with (-O2) optimization disabled, as the
affected parameter is currently unused and thus optimization
eliminates the problematic access.
To fix this problem and eliminate the incompatibility, update the
userspace xfs_buf with an internal ->__b_map field and point
->b_maps to it for contiguous buffers, similar to the kernel buffer
implementation. Set valid values in ->b_maps[0] for contiguous
buffers so common code will continue to work regardless of whether a
buffer is uncached in the kernel.
Signed-off-by: Brian Foster <bfoster@redhat.com> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
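The embedded-map arrangement described in the ->b_maps fix above might look roughly like this. The struct and function names are illustrative stand-ins and do not reproduce the real xfs_buf layout; only the ->b_maps and ->__b_map field names come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins only; not the real xfs_buf definitions. */
struct demo_buf_map {
	long long	bm_bn;		/* start block address */
	int		bm_len;		/* length of the mapping */
};

struct demo_buf {
	long long		b_bn;
	int			b_nmaps;
	struct demo_buf_map	*b_maps;	/* valid for every buffer */
	struct demo_buf_map	__b_map;	/* embedded map, contiguous case */
};

/*
 * Set up a contiguous buffer so that b_maps[0] always holds a valid
 * range and common code never dereferences a NULL b_maps.
 */
static void demo_buf_init_contig(struct demo_buf *bp, long long bno, int len)
{
	assert(bp != NULL);
	bp->b_bn = bno;
	bp->__b_map.bm_bn = bno;
	bp->__b_map.bm_len = len;
	bp->b_maps = &bp->__b_map;	/* point at the embedded map */
	bp->b_nmaps = 1;
}
```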
Eric Sandeen [Fri, 4 Aug 2017 21:33:45 +0000 (16:33 -0500)]
xfs_db: associate proper type with free inode btree root
When navigating to the free inode btree root, the wrong type
is set:
xfs_db> agi 0
xfs_db> addr free_root
xfs_db> type
current type is "inobt"
Change this to type finobt / TYP_FINOBT
(There seems to be no actual difference, but if we have an explicit type
name for the free inode btree, we should use it as appropriate)
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Carlos Maiolino [Fri, 4 Aug 2017 21:33:45 +0000 (16:33 -0500)]
xfs_io: Print filesystem statfs flags in 'statfs' command
Sometimes printing the flags from the statfs structure is useful, so
make the statfs command print them.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Just like in the allocator we must avoid touching multiple AGs out of
order when freeing blocks, as freeing still locks the AGF and can cause
the same AB-BA deadlocks as in the allocation path.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Nikolay Borisov <n.borisov.lkml@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
When we're checking the entries in a directory buffer, make sure that
the entry length doesn't push us off the end of the buffer. Found via
xfs/388 writing ones to the length fields.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
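A bounds check of the kind described in the directory buffer fix above could be sketched as below. The function name and the offset-based interface are assumptions made for illustration; the actual verifier works on pointers into the directory block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check: does an entry of entry_len bytes starting at
 * offset off lie entirely inside a buffer of buf_size bytes?
 * The comparison is arranged so that off + entry_len is never
 * computed and therefore cannot overflow.
 */
static bool dir_entry_in_bounds(size_t off, size_t entry_len, size_t buf_size)
{
	assert(buf_size > 0);
	if (entry_len == 0)
		return false;		/* a zero length would never advance */
	if (off >= buf_size)
		return false;		/* already past the end */
	return entry_len <= buf_size - off;
}
```

A length field overwritten with all ones (for example 0xFFFF) fails the last comparison for any realistic block size, so the walk stops instead of running off the end of the buffer.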
In some circumstances, _alloc_read_agf can return an error code of zero
but also a null AGF buffer pointer. Check for this and jump out.
Fixes-coverity-id: 1415250
Fixes-coverity-id: 1415320 Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
We must initialize the firstfsb parameter to _bmapi_write so that it
doesn't incorrectly treat stack garbage as a restriction on which AGs
it can search for free space.
Fixes-coverity-id: 1402025
Fixes-coverity-id: 1415167 Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Check the _btree_check_block return value for the firstrec and lastrec
functions, since we have the ability to signal that the repositioning
did not succeed.
Fixes-coverity-id: 114067
Fixes-coverity-id: 114068 Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
The new XFS_QMOPT_NOLOCK isn't used at all, and conditional locking based
on a flag is always the wrong thing to do - we should be having helpers
that can be called without the lock instead.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
The comment mentioned the wrong lock. Also add an ASSERT to assert
this locking precondition.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
In quite a few places we call xfs_da_read_buf with a mappedbno that we
don't control, then assume that the function passes back either an error
code or a buffer pointer. Unfortunately, if mappedbno == -2 and bno
maps to a hole, we get a return code of zero and a NULL buffer, which
means that we crash if we actually try to use that buffer pointer. This
happens immediately when we set the buffer type for transaction context.
Therefore, check that we have no error code and a non-NULL bp before
trying to use bp. This patch is a follow-up to an incomplete fix in
96a3aefb8ffde231 ("xfs: don't crash if reading a directory results in an
unexpected hole").
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
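The "no error and a non-NULL bp" check from the xfs_da_read_buf fix above can be illustrated with a small stand-in. Everything here is hypothetical: demo_read_buf only imitates a reader that may return zero together with a NULL buffer, and the type field stands in for setting the buffer type.

```c
#include <assert.h>
#include <stddef.h>

struct demo_dabuf {
	int	type;
};

/*
 * Hypothetical stand-in for a read that can report success (0) yet
 * hand back no buffer, e.g. when the requested block maps to a hole.
 */
static int demo_read_buf(int is_hole, struct demo_dabuf *backing,
			 struct demo_dabuf **bpp)
{
	assert(bpp != NULL);
	*bpp = is_hole ? NULL : backing;
	return 0;
}

/* Touch the buffer only when there is no error and bp is non-NULL. */
static int demo_read_and_set_type(int is_hole, struct demo_dabuf *backing,
				  int type, struct demo_dabuf **bpp)
{
	int error = demo_read_buf(is_hole, backing, bpp);

	if (!error && *bpp)
		(*bpp)->type = type;
	return error;
}
```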
XFS has a maximum symlink target length of 1024 bytes; this is a
holdover from the Irix days. Unfortunately, the constant establishing
this is 'MAXPATHLEN' and is /not/ the same as the Linux MAXPATHLEN,
which is 4096.
The kernel enforces its 1024 byte MAXPATHLEN on symlink targets, but
xfsprogs picks up the (Linux) system 4096 byte MAXPATHLEN, which means
that xfs_repair doesn't complain about oversized symlinks.
Since this is an on-disk format constraint, put the define in the XFS
namespace and move everything over to use the new name.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Add a new dqget flag that grabs the dquot without taking the ilock.
This will be used by the scrubber (which will have already grabbed
the ilock) to perform basic sanity checking of the quota data.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Since we moved the injected error frequency controls to the mountpoint,
we can get rid of the last argument to XFS_TEST_ERROR.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Teach the extended attribute reading functions to pass along a
transaction context if one was supplied. The extended attribute scrub
code will use transactions to lock buffers and avoid deadlocking with
itself in the case of loops; since it will already have the inode
locked, also create xattr get/list helpers that don't take locks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Teach the directory reading functions to pass along a transaction context
if one was supplied. The directory scrub code will use transactions to
lock buffers and avoid deadlocking with itself in the case of loops.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Modify the existing dir leafn lasthash function to enable us to
calculate the highest hash value of a leaf1 block. This will be used by
the directory scrubbing code to check the sanity of hashes in leaf1
directory blocks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Create a function to extract an in-core inobt record from a generic
btree_rec union so that scrub will be able to check inobt records
and check inode block alignment.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Plumb in the pieces (init_high_key, diff_two_keys) necessary to call
query_range on the inode space and block mapping btrees and to extract
raw btree records. This will eventually be used by the inobt and bmbt
scrubbers.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Export various internal functions so that the online scrubber can use
them to check the state of metadata.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
The btree record and key inorder check functions will be used by the
btree scrubber code, so make sure they're always built.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
This is a purely mechanical patch that removes the private
__{u,}int{8,16,32,64}_t typedefs in favor of using the system
{u,}int{8,16,32,64}_t typedefs. This is the sed script used to perform
the transformation and fix the resulting whitespace and indentation
errors:
Don't bother wandering our way through the leaf nodes when the caller
issues a query_all; just zoom down the left side of the tree and walk
rightwards along level zero.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
XFS_HSIZE is an extremely confusing way to calculate the size of handle_t.
Given that handle_t has only ever had two sizes, and one of them isn't
even covered by XFS_HSIZE to start with, just remove the macro and use
a constant sizeof expression.
Note that XFS_HSIZE isn't used in xfsprogs, xfsdump or xfstests either.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
In a pathological scenario where we are trying to bunmapi a single
extent in which every other block is shared, it's possible that trying
to unmap the entire large extent in a single transaction can generate so
many EFIs that we overflow the transaction reservation.
Therefore, use a heuristic to guess at the number of blocks we can
safely unmap from a reflink file's data fork in a single transaction.
This should prevent problems such as the log head slamming into the tail
and ASSERTs that trigger because we've exceeded the transaction
reservation.
Note that since bunmapi can fail to unmap the entire range, we must also
teach the deferred unmap code to roll into a new transaction whenever we
get low on reservation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[hch: random edits, all bugs are my fault] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 20 Jul 2017 15:51:46 +0000 (10:51 -0500)]
libxfs: propagate transaction block reservations
Certain parts of libxfs preemptively refuse to run if the
transaction block reservation has fallen to zero. We leave t_blk_res at
its default of zero, which means that these code paths always fail even
if the transaction was allocated with a non-zero block reservation. Set
t_blk_res and maintain it through transaction rolls, even though we
don't do much enforcement of the transaction block limits.
[sandeen: This broke during a libxfs sync to userspace, see Fixes:]
Fixes: 0268fdc3 ("xfs: remove xfs_trans_get_block_res") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
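One way to picture "set t_blk_res and maintain it through transaction rolls" is the sketch below. It assumes, purely for illustration, that a rolled transaction inherits whatever part of the reservation the old one did not consume; the struct and function names are hypothetical, and only the t_blk_res field name comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative transaction with only the block reservation fields. */
struct demo_trans {
	unsigned int	t_blk_res;	/* blocks reserved */
	unsigned int	t_blk_res_used;	/* blocks consumed so far */
};

/* Record the reservation at allocation time instead of leaving it zero. */
static void demo_trans_alloc(struct demo_trans *tp, unsigned int blocks)
{
	assert(tp != NULL);
	tp->t_blk_res = blocks;
	tp->t_blk_res_used = 0;
}

/* Roll: the new transaction inherits the unused part of the reservation. */
static void demo_trans_roll(const struct demo_trans *old,
			    struct demo_trans *ntp)
{
	assert(old != NULL && ntp != NULL);
	assert(old->t_blk_res_used <= old->t_blk_res);
	ntp->t_blk_res = old->t_blk_res - old->t_blk_res_used;
	ntp->t_blk_res_used = 0;
}
```

Code that refuses to run when the reservation is zero then sees a non-zero value both before and after a roll, as long as blocks remain.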
Eric Sandeen [Thu, 20 Jul 2017 15:51:46 +0000 (10:51 -0500)]
xfs_db: properly set inode type
When we set the type to "inode" the verifier validates multiple
inodes in the current fs block, so setting the buffer size to
that of just one inode is not sufficient and it'll emit spurious
verifier errors for all but the first, as we read off the end:
xfs_db> daddr 99
xfs_db> type inode
Metadata corruption detected at xfs_inode block 0x63/0x200
Metadata corruption detected at xfs_inode block 0x63/0x200
Metadata corruption detected at xfs_inode block 0x63/0x200
Metadata corruption detected at xfs_inode block 0x63/0x200
Metadata corruption detected at xfs_inode block 0x63/0x200
Metadata corruption detected at xfs_inode block 0x63/0x200
Metadata corruption detected at xfs_inode block 0x63/0x200
Use the special set_cur_inode() function for this purpose
as is done in inode_f().
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com>
[sandeen: remove nag/warning printf for now] Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 20 Jul 2017 15:51:37 +0000 (10:51 -0500)]
xfs_db: redirect printfs when metadumping to stdout
If we're metadumping to stdout, we don't want xfs_db's various dbprintf
statements dumping to stdout because that'll corrupt the metadump.
Therefore, let outf point to the existing stdout and redirect stdout to
stderr for the duration of the dump operation.
Reported-by: David Shaw <dshaw@jabberwocky.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
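The redirection idea in the metadump fix above can be sketched at the file descriptor level with POSIX dup() and dup2(). This is a swapped-in technique for illustration: the commit itself talks about the outf and stdout streams, and the helper name below is hypothetical.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Keep a private copy of target_fd, then make target_fd refer to the
 * same open file as redirect_to. Returns the saved descriptor, which
 * still reaches the original destination, or -1 on failure.
 */
static int save_and_redirect(int target_fd, int redirect_to)
{
	int saved;

	assert(target_fd >= 0 && redirect_to >= 0);
	saved = dup(target_fd);
	if (saved < 0)
		return -1;
	if (dup2(redirect_to, target_fd) < 0) {
		close(saved);
		return -1;
	}
	return saved;
}
```

For a dump to stdout, target_fd would be STDOUT_FILENO and redirect_to would be STDERR_FILENO: stray prints land on stderr, while the dump is written through the saved descriptor.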
Eric Sandeen [Thu, 20 Jul 2017 15:51:34 +0000 (10:51 -0500)]
mkfs.xfs: allow specification of 0 data stripe width & unit
The "noalign" option works for this too, but it seems reasonable
to allow explicit specification of stripe unit and stripe width
to 0; today, doing so makes the code think it's unspecified,
and so it goes ahead and detects stripe geometry and sets it in the
superblock. That's unexpected and surprising.
Create a new flag that tracks whether a geometry option has been
specified, and if it's set along with 0 values, treat it the
same as if "noalign" had been specified.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
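The flag logic described in the mkfs.xfs stripe change above might be expressed as in this sketch. The struct, its field names, and the function name are hypothetical and do not mirror the real mkfs option parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative option state; field names are assumptions. */
struct demo_stripe_opts {
	bool	noalign;	/* user passed "noalign" */
	bool	geom_specified;	/* a stripe geometry option was given */
	int	sunit;		/* requested stripe unit */
	int	swidth;		/* requested stripe width */
};

/*
 * True when stripe geometry detection should be skipped: either
 * "noalign" was given, or the user explicitly asked for 0/0.
 * A bare 0/0 with no option given still means "unspecified".
 */
static bool demo_skip_stripe_detection(const struct demo_stripe_opts *o)
{
	assert(o != NULL);
	if (o->noalign)
		return true;
	return o->geom_specified && o->sunit == 0 && o->swidth == 0;
}
```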
Darrick J. Wong [Thu, 13 Jul 2017 16:51:27 +0000 (11:51 -0500)]
mkfs: set inode alignment and cluster size for minimum log size estimation
In order for mkfs to calculate the minimum log size correctly, it must
be able to find the transaction type with the largest reservation. The
iunlink transaction reservation size calculation depends on having the
inode cluster size set correctly, which in turn depends on the inode
alignment parameters being set as they will be in the final filesystem.
Therefore we have to set up the inoalignmt field in max_trans_res.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 13 Jul 2017 16:51:25 +0000 (11:51 -0500)]
mkfs: set agblklog when we're verifying minimum log size
In e5cc9d560a ("mkfs: set agsize prior to calculating minimum log
size"), we set the ag size in the superblock structure so that we can
calculate the maximum btree height correctly. The btree heights are
used to calculate transaction reservation sizes; these sizes are used to
compute the minimum log length; and the minimum log length is checked by
the kernel.
Unfortunately, I didn't realize that some of the btree sizing functions
also depend on the agblklog (log2 of the ag size), so we've been
underestimating the minimum log length allowable, which results in mkfs
formatting filesystems that the kernel refuses to mount.
This can be trivially reproduced by formatting a small (~800M) volume
with rmap and reflink turned on.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
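Since agblklog is described above as the log2 of the AG size, a value of that kind can be computed as in the sketch below. Rounding up for sizes that are not a power of two is an assumption of this example, and demo_log2_roundup is a hypothetical name rather than the helper mkfs actually calls.

```c
#include <assert.h>
#include <limits.h>

/*
 * Smallest n such that 2^n >= i (log2 rounded up), for i > 0.
 * The bound on rval is checked first, so the shift count never
 * reaches the width of unsigned int.
 */
static unsigned int demo_log2_roundup(unsigned int i)
{
	const unsigned int bits = (unsigned int)(sizeof(i) * CHAR_BIT);
	unsigned int rval = 0;

	assert(i > 0);
	while (rval < bits && (1u << rval) < i)
		rval++;
	return rval;
}
```

An AG of 1024 blocks gives 10, while 1025 blocks gives 11; leaving the field at zero instead would make any sizing calculation that depends on it come out too small.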
Darrick J. Wong [Fri, 30 Jun 2017 18:56:29 +0000 (13:56 -0500)]
libxfs: fix fsmap.h inclusion
If we /do/ have HAVE_GETFSMAP defined, we need to include linux/fsmap.h.
Found-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 30 Jun 2017 16:02:46 +0000 (11:02 -0500)]
xfs_db: identify attr dabtree field types correctly
For whatever reason, the v5 xattr dabtree header fields are mapped to
the directory dabtree header fields, which means that the types are
wrong and hence we cannot use the 'addr' command to step through the
tree. Since the v4 attr dabtree does this correctly, simply port the v5
fields to the attr code too.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>