git.ipfire.org Git - thirdparty/xfsprogs-dev.git/log

xfs: Don't log uninitialised fields in inode structures

Source kernel commit: 20413e37d71befd02b5846acdaf5e2564dd1c38e

Prevent kmemcheck from throwing warnings about reading uninitialised
memory when formatting inodes into the incore log buffer. There are
several issues here - we don't always log all the fields in the
inode log format item, and we never log the inode's
di_next_unlinked field.

In the case of the inode log format item, this is exacerbated
by the old xfs_inode_log_format structure padding issue. Hence make
the padded, 64 bit aligned version of the structure the one we always
use for formatting the log and get rid of the 64 bit variant. This
means we'll always log the 64-bit version and so recovery only needs
to convert from the unpadded 32 bit version from older 32 bit
kernels.
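
A simplified sketch of the layout issue follows. The structures are hypothetical stand-ins, not the real xfs_inode_log_format definitions; the demo_ names, the reduced field list and the packed attribute (a GCC/Clang extension) are all assumptions made for illustration. The idea shown is that the padded layout makes the alignment hole an explicit, zeroed field, and that the unpadded 32 bit layout is converted field by field.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical padded, 64 bit aligned layout used for all new log writes. */
struct demo_log_format {
	uint16_t	type;
	uint16_t	size;
	uint32_t	fields;
	uint16_t	asize;
	uint16_t	dsize;
	uint32_t	pad;	/* explicit padding, always written as zero */
	uint64_t	ino;
};

/* Hypothetical unpadded layout, as an older 32 bit kernel might have logged it. */
struct demo_log_format_32 {
	uint16_t	type;
	uint16_t	size;
	uint32_t	fields;
	uint16_t	asize;
	uint16_t	dsize;
	uint64_t	ino;
} __attribute__((packed));

/* Convert the unpadded layout, zeroing the destination so no byte is left uninitialised. */
static void demo_convert_from_32(const struct demo_log_format_32 *src,
				 struct demo_log_format *dst)
{
	memset(dst, 0, sizeof(*dst));
	dst->type = src->type;
	dst->size = src->size;
	dst->fields = src->fields;
	dst->asize = src->asize;
	dst->dsize = src->dsize;
	dst->ino = src->ino;
}
```

In this sketch the two layouts differ only by the four padding bytes, which is why only one direction of conversion is needed once the padded form is always written.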

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: revert "xfs: factor rmap btree size into the indlen calculations"

Source kernel commit: 5e5c943c1f257c2b3424fc3f8a7b18570152dab3

In commit fd26a88093ba we added a worst case estimate for rmapbt blocks
needed to satisfy the block mapping request. Since then, we added the
ability to reserve enough space in each AG such that we should never run
out of blocks to grow the rmapbt, which makes this calculation
unnecessary. Revert the commit because it makes the extra delalloc
indlen accounting unnecessary and incorrect.

Reported-by: Eryu Guan <eguan@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: perag initialization should only touch m_ag_max_usable for AG 0

Source kernel commit: 9789dd9e1d939232e8ff4c50ef8e75aa6781b3fb

We call __xfs_ag_resv_init to make a per-AG reservation for each AG.
This makes the reservation per-AG, not per-filesystem. Therefore, it
is incorrect to adjust m_ag_max_usable for each AG. Adjust it only
when we're reserving AG 0's blocks so that we only do it once per fs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: fix compiler warnings

Source kernel commit: 7bf7a193a90cadccaad21c5970435c665c40fe27

Fix up all the compiler warnings that have crept in.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: simplify the rmap code in xfs_bmse_merge

Source kernel commit: 4cc1ee5e654114aa7fac6993488ad2cd0b3411bb

In Christoph's patch to refactor xfs_bmse_merge, the updated rmap code
does more work than it needs to (because map-extent auto-merges
records). Remove the unnecessary unmap and save ourselves a deferred
op.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: use xfs_iext_*_extent helpers in xfs_bmap_split_extent_at

Source kernel commit: 4c35445b591ee669097c5b98e4bb677808e9f582

This abstracts the function away from details of the low-level extent
list implementation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: use xfs_iext_*_extent helpers in xfs_bmap_shift_extents

Source kernel commit: 4da6b514eaa168c246fc5c1245c4f82084bcf24e

This abstracts the function away from details of the low-level extent
list implementation.

Note that it seems like the previous implementation of rmap for
the merge case was completely broken, but nothing seems to
trigger that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: move some code around inside xfs_bmap_shift_extents

Source kernel commit: 05b7c8ab2be71e6fef4615451e7af1bc79ffdf29

For the first right move we need to look up next_fsb. That means
our last fsb that contains next_fsb must also be the current extent,
so take advantage of that by moving the code around a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: use xfs_iext_get_extent in xfs_bmap_first_unused

Source kernel commit: f2285c148c4167337d12452bebccadd2ad821d5d

Use the bmap abstraction instead of open-coding bmbt details here.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: switch xfs_bmap_local_to_extents to use xfs_iext_insert

Source kernel commit: 50bb44c28614205def9e711190842b4c0242ae79

Use the helper instead of open coding it, to provide a better abstraction
for the scalable extent list work. This also gets an additional assert
and trace point for free.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: add a xfs_iext_update_extent helper

Source kernel commit: 67e4e69cb2a7afbffdefd1a0a23a94d1d706c38f

This helper is used to update an extent record based on the extent index,
and can be used to provide a level of abstractions between callers that
want to modify in-core extent records and the details of the extent list
implementation.

Also switch all users of the xfs_bmbt_set_all(xfs_iext_get_ext(...))
pattern to this new helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: relog dirty buffers during swapext bmbt owner change

Source kernel commit: 2dd3d709fc4338681a3aa61658122fa8faa5a437

The owner change bmbt scan that occurs during extent swap operations
does not handle ordered buffer failures. Buffers that cannot be
marked ordered must be physically logged so previously dirty ranges
of the buffer can be relogged in the transaction.

Since the bmbt scan may need to process and potentially log a large
number of blocks, we can't expect to complete this operation in a
single transaction. Update extent swap to use a permanent
transaction with enough log reservation to physically log a buffer.
Update the bmbt scan to physically log any buffers that cannot be
ordered and to terminate the scan with -EAGAIN. On -EAGAIN, the
caller rolls the transaction and restarts the scan. Finally, update
the bmbt scan helper function to skip bmbt blocks that already match
the expected owner so they are not reprocessed after scan restarts.
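
The restart protocol described above can be sketched as follows. This is a minimal model, not the libxfs code: the owner array, the demo_ function names and the fixed per-transaction budget are assumptions made for illustration (the commit itself terminates the scan when a buffer has to be physically logged).

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: how many blocks one transaction may physically log. */
#define DEMO_LOG_BUDGET	2

/*
 * Change the owner of each block.  Blocks that already carry the new
 * owner are skipped, so a restarted scan does not reprocess them.
 * Returns 0 when the scan completes, or -EAGAIN when the budget is
 * exhausted and the caller must roll the transaction and retry.
 */
static int demo_change_owner_scan(uint64_t *owners, size_t nblocks,
				  uint64_t new_owner)
{
	unsigned int	logged = 0;
	size_t		i;

	for (i = 0; i < nblocks; i++) {
		if (owners[i] == new_owner)
			continue;
		if (logged == DEMO_LOG_BUDGET)
			return -EAGAIN;
		owners[i] = new_owner;
		logged++;
	}
	return 0;
}

/* Caller side: restart the scan until it completes; returns the number of rolls. */
static unsigned int demo_change_owner(uint64_t *owners, size_t nblocks,
				      uint64_t new_owner)
{
	unsigned int	rolls = 0;

	while (demo_change_owner_scan(owners, nblocks, new_owner) == -EAGAIN)
		rolls++;	/* stand-in for rolling the transaction */
	return rolls;
}
```

Skipping blocks that already match is what keeps each restart cheap: the scan revisits earlier blocks but does not log them again.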

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[darrick: fix the xfs_trans_roll call]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[dchinner: proper userspace libxfs_trans_ordered_buf]
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: skip bmbt block ino validation during owner change

Source kernel commit: 99c794c639a65cc7b74f30a674048fd100fe9ac8

Extent swap uses xfs_btree_visit_blocks() to fix up bmbt block
owners on v5 (!rmapbt) filesystems. The bmbt scan uses
xfs_btree_lookup_get_block() to read bmbt blocks which verifies the
current owner of the block against the parent inode of the bmbt.
This works during extent swap because the bmbt owners are updated to
the opposite inode number before the inode extent forks are swapped.

The modified bmbt blocks are marked as ordered buffers which allows
everything to commit in a single transaction. If recovery of the
extent swap is required, log recovery restarts the bmbt scan to fix
up any bmbt blocks that may have not been written back before the
crash. The log recovery bmbt scan occurs after the inode forks have
been swapped, however. This causes the bmbt block owner verification
to fail, leads to log recovery failure and requires xfs_repair to
zap the log to recover.

Define a new invalid inode owner flag to inform the btree block
lookup mechanism that the current inode may be invalid with respect
to the current owner of the bmbt block. Set this flag on the cursor
used for change owner scans to allow this operation to work at
runtime and during log recovery.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Fixes: bb3be7e7c ("xfs: check for bogus values in btree block headers")
Cc: stable@vger.kernel.org
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: don't log dirty ranges for ordered buffers

Source kernel commit: 8dc518dfa7dbd079581269e51074b3c55a65a880

Ordered buffers are attached to transactions and pushed through the
logging infrastructure just like normal buffers with the exception
that they are not actually written to the log. Therefore, we don't
need to log dirty ranges of ordered buffers. xfs_trans_log_buf() is
called on ordered buffers to set up all of the dirty state on the
transaction, buffer and log item and prepare the buffer for I/O.

Now that xfs_trans_dirty_buf() is available, call it from
xfs_trans_ordered_buf() so the latter is now mutually exclusive with
xfs_trans_log_buf(). This reflects the implementation of ordered
buffers and helps eliminate confusion over the need to log ranges of
ordered buffers just to set up internal log state.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove the ip argument to xfs_defer_finish

Source kernel commit: 8ad7c629b18695ec1ee8654fb27599864049862b

And instead require callers to explicitly join the inode using
xfs_defer_ijoin. Also consolidate the defer error handling in
a few places using a goto label.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: rename xfs_defer_join to xfs_defer_ijoin

Source kernel commit: 882d8785fb87f691000a0b33c215364d74bd2ceb

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: refactor xfs_trans_roll

Source kernel commit: 411350df14a3d6f1c769ea64a8b43a71f8d9760e

Split xfs_trans_roll into a low-level helper that just rolls the
actual transaction and a new higher level xfs_trans_roll_inode
that takes care of logging and rejoining the inode. This gets
rid of the NULL inode case, and allows us to simplify the special
cases in the deferred operation code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: stop searching for free slots in an inode chunk when there are none

Source kernel commit: 2d32311cf19bfb8c1d2b4601974ddd951f9cfd0b

In a filesystem without finobt, the Space manager selects an AG to alloc a new
inode, where xfs_dialloc_ag_inobt() will search the AG for the free slot chunk.

When the new inode is in the same AG as its parent, the btree will be searched
starting on the parent's record, and then retried from the top if no slot is
available beyond the parent's record.

To exit this loop though, xfs_dialloc_ag_inobt() relies on the fact that the
btree must have a free slot available, since its callers relied on
agi->freecount when deciding how/where to allocate this new inode.

In the case when agi->freecount is corrupted, showing available inodes in an
AG when in fact there are none, this becomes an infinite loop.

Add a way to stop the loop when a free slot is not found in the btree, making
the function fall back to the whole AG scan, which will then be able to detect
the corruption and shut the filesystem down.

As pointed out by Brian, this might impact performance, given the fact that we
don't reset the search distance anymore when we reach the end of the
tree, giving it fewer tries before falling back to the whole AG search, but
it will only affect searches that start within 10 records of the end of the tree.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: Release v4.13.1

Update all the necessary files for a 4.13.1 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

mkfs: don't overflow the subopts array

The new -d cowextsize option overran the subopts array; make it larger.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: Release v4.13.0

Update all the necessary files for a 4.13.0 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: Release v4.13.0-rc2

Update all the necessary files for a 4.13.0-rc2 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: handle missing extent states

Missed a couple of the new extent states in the bmbt processing, so add
them to avoid aborting xfs_repair.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
[sandeen: move XR_E_REFC case above fallthrough to emit both do_warn]
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

mkfs: pass a custom cowextsize into the created filesystem

Create a -d option to mkfs.xfs that enables administrators to set
the CoW extent size hint on the created files.

[sandeen: Note, the switch to xfs_flags2diflags looks like
a bugfix, but it's not - the flags set by mkfs up to this
point just happened to line up without any translation.]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: version command misses RMAPBT feature string

When I create an XFS filesystem with "rmapbt=1,reflink=1", "xfs_db -c version"
doesn't show the RMAPBT feature, although REFLINK is reported.

  # mkfs/mkfs.xfs -f -m rmapbt=1,reflink=1 /dev/sda3
  # db/xfs_db -c version /dev/sda3
  versionnum [0xb4a5+0x18a] = V5,NLINK,DIRV2,ALIGN,LOGV2,EXTFLG,MOREBITS,
                  ATTR2,LAZYSBCOUNT,PROJID32BIT,CRC,FTYPE,FINOBT,REFLINK

Signed-off-by: Zorro Lang <zlang@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: Release v4.13.0-rc1

Update all the necessary files for a 4.13.0-rc1 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: don't use do_warn for normal log message

In some cases, the exit status of xfs_repair -n differs for the same
file system depending on whether -v is specified. This patch fixes
this behavior.

If -v is specified, do_warn() is used in zero_log() for printing
a normal message. That sets the exit status to 1 even though there
is no dirtiness in the file system.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
[sandeen: edit changelog for brevity]
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

mkfs: add documentation for forgotten mkfs flags

Add documentation for some undocumented mkfs -d and -l flags.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libxfs: remove getcwd/chdir dance from initialization

Back in the old days of Irix, the library saved and restored chdir so
that (I guess?) check_open can change directory without screwing up our
ability to use relative paths. However, there's nothing in Linux that
actually does this, so just rip out the getcwd/chdir stuff since
absolute device paths work just fine without it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

misc: fix more stupid compiler warnings

Fix more compiler warnings about pointless checks, unchecked return
values, brace problems, and missing parentheses.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: take the ag_lock before recording rmap for a bmbt record

When the (threaded) inode scanner iterates the blocks of a bmbt tree and
wants to record the bmbt blocks in the in-core rmap database, we have to
take the ag_lock for the AG that the bmbt block is in, or else we can
accidentally corrupt the rmap slab by calling slab_add from two threads.

Reported-by: matorola@gmail.com
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

mkfs.xfs: Don't stagger AG for a single disk

When sunit and swidth are used mkfs.xfs tries to avoid all allocation
groups aligning on the same stripe and will attempt to stagger them
across the stripes that make up swidth.  If there is only one stripe
then there is no benefit in this optimisation.

$ truncate -s10G xfs_10G_su256k_sw1.image
$ mkfs.xfs -d su=256k,sw=1 xfs_10G_su256k_sw1.image
meta-data=xfs_10G_su256k_sw1.image isize=512    agcount=16, agsize=163776 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=2620416, imaxpct=25
         =                       sunit=64     swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=64 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

A side effect of the optimisation is that the size adjustment used to stagger
the allocation groups causes the last sunit of storage to be unused.

$ echo $((2620416*4096))
10733223936
$ ls -l xfs_10G_su256k_sw1.image
-rw-rw-r--. 1 test test 10737418240 Aug 30 10:54 xfs_10G_su256k_sw1.image

Skip this optimisation when sunit == swidth.
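
The arithmetic in the example above can be checked with a small helper. The function name and parameters are hypothetical; the numbers are the ones from the mkfs output and the ls listing.

```c
#include <stdint.h>

/* Bytes of the backing device that the data section does not cover. */
static uint64_t demo_unused_bytes(uint64_t dev_bytes, uint64_t dblocks,
				  uint32_t blocksize)
{
	return dev_bytes - dblocks * blocksize;
}
```

For the 10737418240 byte image with 2620416 blocks of 4096 bytes this gives 4194304 bytes, i.e. 1024 blocks or 16 stripe units of 64 blocks. That matches agcount=16, so the numbers are consistent with one stripe unit being given up per AG; that reading is an inference from the output, not something the commit message states.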

Signed-off-by: Donald Douwsma <ddouwsma@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

Merge branch 'libxfs-4.13-sync' into for-next

xfs: fix inobt inode allocation search optimization

Source kernel commit: c44245b3d5435f533ca8346ece65918f84c057f9

When we try to allocate a free inode by searching the inobt, we try to
find the inode nearest the parent inode by searching chunks both left
and right of the chunk containing the parent. As an optimization, we
cache the leftmost and rightmost records that we previously searched; if
we do another allocation with the same parent inode, we'll pick up the
search where it last left off.

There's a bug in the case where we found a free inode to the left of the
parent's chunk: we need to update the cached left and right records, but
because we already reassigned the right record to point to the left, we
end up assigning the left record to both the cached left and right
records.

This isn't a correctness problem strictly, but it can result in the next
allocation rechecking chunks unnecessarily or allocating inodes further
away from the parent than it needs to. Fix it by swapping the record
pointer after we update the cached left and right records.
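
The ordering problem can be shown with a reduced model. The structures and function names below are hypothetical and much simpler than the real inobt code; they only illustrate why the cache must be updated before the working record is overwritten.

```c
#include <stdint.h>

struct demo_rec {
	uint32_t	startino;
};

struct demo_cache {
	uint32_t	leftrec;
	uint32_t	rightrec;
};

/* Buggy order: rec is overwritten first, so both cache slots get the left record. */
static void demo_pick_left_buggy(struct demo_cache *c, struct demo_rec *rec,
				 const struct demo_rec *trec)
{
	*rec = *trec;
	c->leftrec = trec->startino;
	c->rightrec = rec->startino;	/* now equal to the left record */
}

/* Fixed order: update the cache first, then switch to the left record. */
static void demo_pick_left_fixed(struct demo_cache *c, struct demo_rec *rec,
				 const struct demo_rec *trec)
{
	c->leftrec = trec->startino;
	c->rightrec = rec->startino;
	*rec = *trec;
}
```

In both variants the caller ends up working on the left record; only the cached right boundary differs, which is why the bug affects search efficiency rather than correctness.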

Fixes: bd169565993b ("xfs: speed up free inode search")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: clarify the fsmap documentation

Explicitly declare that the 'start' and 'end' arguments to fsmap require
one of -d, -l, or -r to select the data, log, or realtime device, and fix
misspelled command name while we're at it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_bmap: fix -n documentation in manpage

xfs_bmap's manpage mis-describes the behavior of the
-n option. xfs_io's fiemap command references the xfs_bmap
manpage, and has the same problem:

-n does not change the query batch size, it limits the number
of extents displayed.

This has been true for 15+ years, so change the documentation
to match reality.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

fiemap: Fix semantics of max_extents (-n arguments)

Currently the semantics of the -n argument are a bit idiosyncratic. We want the
argument to be the limit of extents that are going to be output by the tool. This
is clearly broken now as evident from the following example on a fragmented file:

xfs_io -c "fiemap -v -n 5" test-dir/fragmented-file
test-dir/fragmented-file:
EXT: FILE-OFFSET      BLOCK-RANGE          TOTAL FLAGS
   0: [0..15]:         hole                    16
   1: [16..23]:        897847296..897847303     8   0x0
   2: [24..31]:        hole                     8
   3: [32..39]:        897851392..897851399     8   0x0

So we want at most 5 extents printed, yet we get 4. So we always print n - 1
extents.

With this modification the output looks like:

xfs_io -c "fiemap -v -n 5" test-dir/fragmented-file
test-dir/fragmented-file:
EXT: FILE-OFFSET      BLOCK-RANGE          TOTAL FLAGS
   0: [0..15]:         hole                    16
   1: [16..23]:        897847296..897847303     8   0x0
   2: [24..31]:        hole                     8
   3: [32..39]:        897851392..897851399     8   0x0
   4: [40..47]:        hole                     8

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
[sandeen: fix initialization of max_extents]
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

fiemap: Factor out common code used for printing holes

The code responsible for printing holes is scattered across 3 places:
plain print function, verbose print function and in the block handling EOF hole.
Introduce a new function factoring out the common code and replace the 3 sites
where the code is used with it. This reduces duplication and makes it apparent
when we are printing holes. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

fiemap: De-obfuscate last_logical and cur_extent manipulation

last_logical and cur_extent are being passed by reference to the printing
functions and they in turn modify those variables. This makes it a bit harder to
reason about the code. So change the printing functions to take those 2 arguments
by value and move the manipulation logic into fiemap_f. Furthermore, the printing
functions now return the number of extents they have printed (either 1 or 2,
dependent on whether we've hit the -n limit). No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

fiemap: Eliminate num_extents

Fiemap has this rather convoluted logic to calculate the number of extents to
query. This introduces needless complexity with no real benefit. Remove
num_extents and instead hardcode the number of extents we query for in a single
go to 32. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

fiemap: Make max_extents a global var

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

fiemap: Remove blocksize variable

The blocksize variable was hardcoded to 512 bytes and was passed to various
functions. This introduced a lot of redundancy since we can just as well use
the BTOBBT macro. So let's do that and eliminate all usage of the blocksize var.
No functional changes.
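
For reference, the conversion that replaces the hardcoded blocksize can be sketched as below. The DEMO_ names are local stand-ins written on the assumption that BTOBBT truncates a byte count to 512-byte basic blocks; they are not copied from the XFS headers.

```c
#include <stdint.h>

#define DEMO_BBSHIFT		9	/* 512-byte basic blocks */
#define DEMO_BTOBBT(bytes)	((uint64_t)(bytes) >> DEMO_BBSHIFT)

/* Convert a byte offset or length to basic blocks, truncating. */
static uint64_t demo_bytes_to_bb(uint64_t bytes)
{
	return DEMO_BTOBBT(bytes);
}
```

With this definition a 4096 byte extent covers 8 basic blocks, and any partial trailing block is dropped rather than rounded up.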

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: fix thread creation failure recovery

When pf_create_prefetch_thread fails, it tears down the args struct
and frees it.  This causes a use-after-free in prefetch_ag_range, which
then passes the now-invalid pointer to start_inode_prefetch.  The struct
is only freed when the queuing thread can't be started.  When we can't
start even one worker thread, we mark the args ready for processing and
allow it to proceed single-threaded.  Unfortunately, this only marks
the current args ready for processing and since we return immediately,
the call to pf_create_prefetch_thread at the end of pf_queuing_worker
never gets called and we wait forever for prefetch to start on the
next AG.

This patch factors out the cleanup into a new pf_skip_prefetch_thread
that is called when we fail to create either the queuing thread or
the first of the workers.  It marks the args ready for processing, marks
it done so start_inode_prefetch doesn't add another AG to the list, and
tries to start a new thread for the next AG in the list.  We also clear
->next_args and check for it in cleanup_inode_prefetch so this condition
is easier to catch should it arise again.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: add prefetch trace calls to debug thread creation failures

When debugging prefetch failures, it's useful to have thread creation
failure messages that are output as warnings on stderr in the trace
log as well. It's also helpful to see when an AG gets queued behind
another one rather than having the thread started directly, which
has a separate trace line.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: clear pthread_t when pthread_create fails

pf_queuing_worker and pf_create_prefetch_thread both try to handle
thread creation failure gracefully, but assume that pthread_create
doesn't modify the pthread_t when it fails.

From the pthread_create man page:
On success, pthread_create() returns 0; on error, it returns an error
number, and the contents of *thread are undefined.

In fact, glibc's pthread_create writes the pthread_t value before
calling clone(). When we join the created threads in
cleanup_inode_prefetch and the cleanup stage of pf_queuing_worker, we
assume that if the pthread_t is nonzero that it's a valid thread handle
and end up crashing in pthread_join.

This patch zeros out the handle after pthread_create failure.
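
A minimal sketch of the pattern, assuming hypothetical demo_ helpers rather than the actual xfs_repair functions. The creation routine is passed in as a function pointer with the same shape as pthread_create() so that the failure path can be exercised with a fake that scribbles on the handle before failing.

```c
#include <errno.h>
#include <pthread.h>
#include <string.h>

/* Same shape as pthread_create(), so either it or a test fake can be passed in. */
typedef int (*demo_create_fn)(pthread_t *thread, const pthread_attr_t *attr,
			      void *(*start)(void *), void *arg);

/* Fake creator: writes garbage to *thread and then fails, as a real one may. */
static int demo_failing_create(pthread_t *thread, const pthread_attr_t *attr,
			       void *(*start)(void *), void *arg)
{
	(void)attr;
	(void)start;
	(void)arg;
	memset(thread, 0xab, sizeof(*thread));
	return EAGAIN;
}

/* Start a thread; on failure clear the handle so later cleanup can tell it is unused. */
static int demo_start_thread(demo_create_fn create, pthread_t *thread,
			     void *(*start)(void *), void *arg)
{
	int	err = create(thread, NULL, start, arg);

	if (err)
		memset(thread, 0, sizeof(*thread));
	return err;
}
```

Cleanup code that joins only handles that are not all-zero then never sees the undefined contents left behind by a failed creation.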

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: add seek consistency checks

When seeking for data and holes, the lseek result must be greater than
or equal to the start offset. Furthermore, assuming that the file
doesn't change under us, when switching between SEEK_HOLE and SEEK_DATA,
the seek position must increase monotonically. Warn and abort if this
is not the case.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

fsr: fix uninitialized fs usage after timeout

In the main loop of fsrallfs, we exit when we've hit the timeout but
we increment fs before we get there. If we're operating on the last
file system in the array, we'll hit an uninitialized fsdesc and
crash in fsrall_cleanup.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
[sandeen: change Jeff's for(; loop]
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: bit fuzzing should read the right bit when flipping

The middle and last bit flip fuzz verbs need to read the same bit that
they're trying to set.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

mkfs: add missing translation

Add a missing underscore where it was probably omitted by mistake.

Signed-off-by: Jan Tulak <jtulak@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: make write/fuzz -c and -d work on non-crc filesystems

For a non-crc filesystem, make write/fuzz -c and -d work properly
instead of bailing out. Since there's no checksum to update, both
cases collapse to setting the field value without calling the write
verifier.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: free field list when failing out of fuzz

Fix a missed opportunity to free the field list when we fail out of the
fuzz command by refactoring the error clauses to use a common cleanup
clause.

Fixes-coverity-id: 1416141
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: reset metadump output flag

On the off chance that someone runs metadump more than once with the
metadump file going to stdout and then not stdout, the stdout_metadump
variable will not be reset before the second invocation. Clear the
status variable when we undo the stdout redirection.

Fixes-coverity-id: 1416140
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: btdump should avoid eval for push and pop of cursor

We can call the cursor push and pop functions directly from btdump,
so skip all the eval overhead.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: use TYP_F_CRC_FUNC for inodes & dquots

Now that typ_t has a ->set_crc method, use it for inodes & dquots
as well, rather than recognizing them as special types and calling
their crc functions directly by name.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: introduce fuzz command

Introduce a new 'fuzz' command to write creative values into
disk structure fields.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
[sandeen: tweak words in help a bit for consistency]
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: write values into dir/attr blocks and recalculate CRCs

Extend typ_t to (optionally) store a pointer to a function to calculate
the CRC of the block, provide functions to do this for the dir3 and
attr3 types, and then wire up the write command so that we can modify
directory and extended attribute block fields.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: print attribute remote value blocks

Teach xfs_db how to print the contents of xattr remote value blocks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: dump dir/attr btrees

Dump the directory or extended attribute btree contents.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: fix metadump redirection (again)

In patch 4944defad4 ("xfs_db: redirect printfs when metadumping to
stdout"), we solved the problem of xfs_db printfs ending up in the
metadump stream by reassigning stdout for the duration of a stdout
metadump. Unfortunately, musl doesn't allow stdout to be reassigned (in
their view "extern FILE *stdout" means "extern FILE * const stdout"), so
we abandon the old approach in favor of playing games with dup() to
switch the raw file descriptors.

While we're at it, fix a regression where an unconverted outf test
allows progress info to end up in the metadump stream.
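
A minimal sketch of the dup()-based approach, assuming the goal is simply to make file descriptor 1 refer to stderr for a while and then put it back. The helper names are hypothetical and the actual xfs_db code may be organized differently.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: make file descriptor 1 refer to whatever stderr
 * refers to, and return a duplicate of the original descriptor so that
 * it can be restored later. Returns -1 on failure.
 */
static int sketch_stdout_to_stderr(void)
{
	int saved;

	fflush(stdout);			/* drain buffered output first */
	saved = dup(STDOUT_FILENO);
	if (saved < 0)
		return -1;
	if (dup2(STDERR_FILENO, STDOUT_FILENO) < 0) {
		close(saved);
		return -1;
	}
	return saved;
}

/* Undo sketch_stdout_to_stderr() using the descriptor it returned. */
static int sketch_restore_stdout(int saved)
{
	int ret;

	fflush(stdout);			/* flush anything printed meanwhile */
	ret = dup2(saved, STDOUT_FILENO);
	close(saved);
	return ret < 0 ? -1 : 0;
}
```

Because only the raw descriptors are swapped, the stdout FILE pointer itself is never reassigned, which is the property musl requires.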

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: fix symlink target length checks by changing MAXPATHLEN to XFS_SYMLINK_MAXLEN

XFS has a maximum symlink target length of 1024 bytes; this is a
holdover from the Irix days. Unfortunately, the constant establishing
this was 'MAXPATHLEN', and is /not/ the same as the Linux MAXPATHLEN,
which is 4096.

The kernel enforces its 1024 byte MAXPATHLEN on symlink targets, but
xfsprogs picks up the (Linux) system 4096 byte MAXPATHLEN, which means
that xfs_repair doesn't complain about oversized symlinks.

Since this is an on-disk format constraint, put the define in the XFS
namespace. As a side effect of the rename, xfs_repair will detect
oversized symlinks and clean them off the system.
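
To illustrate why the choice of constant matters, here is a small sketch. The two limits come from the text above; the helper name, the SKETCH_ macro, and the decision to treat a length equal to the limit as oversized are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* On-disk limit for XFS symlink targets, per the text above. */
#define XFS_SYMLINK_MAXLEN	1024

/* Linux's system path limit, shown only for contrast (sketch-local name). */
#define SKETCH_LINUX_MAXPATHLEN	4096

/*
 * Hypothetical check: is a symlink target length acceptable on disk?
 * Treating len == XFS_SYMLINK_MAXLEN as too long is an assumption here.
 */
static bool sketch_symlink_len_ok(size_t len)
{
	return len > 0 && len < XFS_SYMLINK_MAXLEN;
}
```

A 2000-byte target passes a check against the 4096-byte system limit but fails this one, which is the class of symlink that previously went unnoticed.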

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: remove double-underscore integer types

This is a purely mechanical patch that removes the private
__{u,}int{8,16,32,64}_t typedefs in favor of using the system
{u,}int{8,16,32,64}_t typedefs. This is the sed script used to perform
the transformation and fix the resulting whitespace and indentation
errors:

s/typedef\t__uint8_t/typedef __uint8_t\t/g
s/typedef\t__uint/typedef __uint/g
s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g
s/__uint8_t\t/__uint8_t\t\t/g
s/__uint/uint/g
s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g
s/__int/int/g
/^typedef.*int[0-9]*_t;$/d

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
[sandeen: fix whitespace incidentals]
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_metadump: properly handle obfuscation of all remote attribute blocks

add_remote_vals assumes that it can subtract blocksize
from each block that it processes, but with CRCs, there
is a header on each block, so the assumption that each
block consumes $BLOCKSIZE of the value length is incorrect.

This causes us to stop adding remote blocks too soon, and
the missed blocks do not get obfuscated.

Fix this by accounting for the header size as appropriate,
depending on whether or not we have a CRC filesystem.
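
The arithmetic can be sketched as below. The helper is hypothetical; the header size is passed in as a parameter rather than taken from the real on-disk structure, and 0 stands for the non-CRC case.

```c
#include <stddef.h>

/*
 * Hypothetical helper: number of filesystem blocks needed to hold a
 * remote attribute value of valuelen bytes when each block loses
 * hdr_size bytes to a per-block header (0 on a non-CRC filesystem).
 * Returns 0 if the header leaves no room for data.
 */
static size_t sketch_rmt_blocks(size_t valuelen, size_t blocksize,
				size_t hdr_size)
{
	size_t space;

	if (hdr_size >= blocksize)
		return 0;
	space = blocksize - hdr_size;		/* usable bytes per block */
	return (valuelen + space - 1) / space;	/* round up */
}
```

For an 8192-byte value and 4096-byte blocks, this gives two blocks with no header but three blocks once any header is present, so code that subtracts a full block size per block stops one block early.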

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: allow lsattr & lsproj on foreign filesystems

The following commit:

commit 73b54bb6a2fb ("xfs_io: allow chattr & chproj on foreign filesystems")

allowed chattr and chproj to be run on non-xfs filesystems now that
FS_IOC_FSSETXATTR is a generic vfs call. It failed to enable the
corresponding lsattr and lsproj commands for those filesystems, though.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Fixes: 73b54bb6a2fb ("xfs_io: allow chattr & chproj on foreign filesystems")
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libxfs: init ->b_maps on contig buffers for uncached compatibility

There is a bit of an inconsistency in how ->b_maps is used for
contiguous buffers between kernel libxfs and xfsprogs due to the
independent buffer implementations. In the kernel, ->b_maps[0] is
always initialized to a valid range and in xfsprogs, ->b_maps is only
allocated for discontiguous buffers.

This can lead to confusion when dealing with uncached kernel buffers
in common libxfs code because xfsprogs has no concept of uncached
buffers. Kernel uncached buffers have ->b_bn == XFS_BUF_DADDR_NULL
and ->b_maps[0] points to the physical block address. Block address
checks in common code for kernel uncached buffers, such as in
xfs_sb_verify(), therefore would need to check both places for an
address or risk broken logic or userspace segfaults.

This problem currently manifests as an xfs_repair segfault due to a
NULL ->b_maps access in xfs_sb_verify(). Note that this problem is
only reproducible on builds with (-O2) optimization disabled, as the
affected parameter is currently unused and thus optimization
eliminates the problematic access.

To fix this problem and eliminate the incompatibility, update the
userspace xfs_buf with an internal ->__b_map field and point
->b_maps to it for contiguous buffers, similar to the kernel buffer
implementation. Set valid values in ->b_maps[0] for contiguous
buffers so common code will continue to work regardless of whether a
buffer is uncached in the kernel.
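
The idea of pointing ->b_maps at storage embedded in the buffer can be sketched like this. The structures are simplified stand-ins with sketch-local names (the embedded field is called b_inline_map here instead of __b_map); they are not the real xfs_buf definitions.

```c
#include <stddef.h>

/* Simplified stand-in for a buffer map entry. */
struct sketch_buf_map {
	long long	bm_bn;		/* starting disk address */
	int		bm_len;		/* length in basic blocks */
};

/* Simplified stand-in for a userspace buffer. */
struct sketch_buf {
	long long		b_bn;
	struct sketch_buf_map	*b_maps;	/* always valid after init */
	int			b_nmaps;
	struct sketch_buf_map	b_inline_map;	/* storage for the contiguous case */
};

/* Set up a contiguous buffer so that b_maps[0] is always usable. */
static void sketch_buf_init_contig(struct sketch_buf *bp, long long bno,
				   int len)
{
	bp->b_bn = bno;
	bp->b_inline_map.bm_bn = bno;
	bp->b_inline_map.bm_len = len;
	bp->b_maps = &bp->b_inline_map;	/* no separate allocation needed */
	bp->b_nmaps = 1;
}
```

Common code can then read bp->b_maps[0].bm_bn without first checking whether the buffer is contiguous.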

Signed-off-by: Brian Foster <bfoster@redhat.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: associate proper type with free inode btree root

When navigating to the free inode btree root, the wrong type
is set:

xfs_db> agi 0
xfs_db> addr free_root
xfs_db> type
current type is "inobt"

Change this to type finobt / TYP_FINOBT

(There seems to be no actual difference, but if we have an explicit type
name for the free inode btree, we should use it as appropriate)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: Print filesystem statfs flags in 'statfs' command

Sometimes printing the flags from the statfs structure is useful, so
make the statfs command print them.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: fix multi-AG deadlock in xfs_bunmapi

Source kernel commit: 5b094d6dac0451ad89b1dc088395c7b399b7e9e8

Just like in the allocator we must avoid touching multiple AGs out of
order when freeing blocks, as freeing still locks the AGF and can cause
the same AB-BA deadlocks as in the allocation path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: check that dir block entries don't run off the end of the buffer

Source kernel commit: 6215894e11de224183c89b001f5363912442b489

When we're checking the entries in a directory buffer, make sure that
the entry length doesn't push us off the end of the buffer. Found via
xfs/388 writing ones to the length fields.
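
A rough sketch of this kind of bounds check follows. The record layout (a 2-byte big-endian length that covers the whole entry) is invented for the example and is not the XFS directory format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical record layout: each entry starts with a 2-byte big-endian
 * length that covers the whole entry, including the length field itself.
 * Returns true only if every entry lies entirely inside the buffer.
 */
static bool sketch_entries_in_bounds(const uint8_t *buf, size_t buflen)
{
	size_t off = 0;

	while (off < buflen) {
		size_t len;

		if (buflen - off < 2)
			return false;	/* length field itself is truncated */
		len = ((size_t)buf[off] << 8) | buf[off + 1];
		if (len < 2)
			return false;	/* would not advance; treat as corrupt */
		if (len > buflen - off)
			return false;	/* entry runs off the end of the buffer */
		off += len;
	}
	return true;
}
```

Comparing the length against the remaining space (buflen - off) rather than computing off + len avoids overflow when a length field has been filled with ones.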

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: check _alloc_read_agf buffer pointer before using

Source kernel commit: 10479e2dea83d4c421ad05dfc55d918aa8dfc0cd

In some circumstances, _alloc_read_agf can return an error code of zero
but also a null AGF buffer pointer. Check for this and jump out.

Fixes-coverity-id: 1415250
Fixes-coverity-id: 1415320
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: set firstfsb to NULLFSBLOCK before feeding it to _bmapi_write

Source kernel commit: 4c1a67bd3606540b9b42caff34a1d5cd94b1cf65

We must initialize the firstfsb parameter to _bmapi_write so that it
doesn't incorrectly treat stack garbage as a restriction on which AGs
it can search for free space.

Fixes-coverity-id: 1402025
Fixes-coverity-id: 1415167
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: check _btree_check_block value

Source kernel commit: 1e86eabe73b73c82e1110c746ed3ec6d5e1c0a0d

Check the _btree_check_block return value for the firstrec and lastrec
functions, since we have the ability to signal that the repositioning
did not succeed.

Fixes-coverity-id: 114067
Fixes-coverity-id: 114068
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

Revert "xfs: grab dquots without taking the ilock"

Source kernel commit: 0891f9971a3b00d243d5743cc78a628ad060adea

This reverts commit cfcce6478cee639b15cd50e68cf66f884e137312.

The new XFS_QMOPT_NOLOCK isn't used at all, and conditional locking based
on a flag is always the wrong thing to do - we should be having helpers
that can be called without the lock instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: fixup xfs_attr_get_ilocked

Source kernel commit: cf69f8248cc89c0a0e82f8332f9e7f13ab014c98

The comment mentioned the wrong lock. Also add an ASSERT to assert
this locking precondition.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: don't crash on unexpected holes in dir/attr btrees

Source kernel commit: cd87d867920155911d0d2e6485b769d853547750

In quite a few places we call xfs_da_read_buf with a mappedbno that we
don't control, then assume that the function passes back either an error
code or a buffer pointer.  Unfortunately, if mappedbno == -2 and bno
maps to a hole, we get a return code of zero and a NULL buffer, which
means that we crash if we actually try to use that buffer pointer.  This
happens immediately when we set the buffer type for transaction context.

Therefore, check that we have no error code and a non-NULL bp before
trying to use bp.  This patch is a follow-up to an incomplete fix in
96a3aefb8ffde231 ("xfs: don't crash if reading a directory results in an
unexpected hole").

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: rename MAXPATHLEN to XFS_SYMLINK_MAXLEN

Source kernel commit: 6eb0b8df9f74f33d1a69100117630a7a87a9cc96

XFS has a maximum symlink target length of 1024 bytes; this is a
holdover from the Irix days. Unfortunately, the constant establishing
this is 'MAXPATHLEN' and is /not/ the same as the Linux MAXPATHLEN,
which is 4096.

The kernel enforces its 1024 byte MAXPATHLEN on symlink targets, but
xfsprogs picks up the (Linux) system 4096 byte MAXPATHLEN, which means
that xfs_repair doesn't complain about oversized symlinks.

Since this is an on-disk format constraint, put the define in the XFS
namespace and move everything over to use the new name.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: grab dquots without taking the ilock

Source kernel commit: 50e0bdbe9f48f98bb02eac7030d682f4716884ae

Add a new dqget flag that grabs the dquot without taking the ilock.
This will be used by the scrubber (which will have already grabbed
the ilock) to perform basic sanity checking of the quota data.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove unneeded parameter from XFS_TEST_ERROR

Source kernel commit: 9e24cfd044853e0e46e7149b91b7bb09effb0a79

Since we moved the injected error frequency controls to the mountpoint,
we can get rid of the last argument to XFS_TEST_ERROR.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: pass along transaction context when reading xattr block buffers

Source kernel commit: ad017f6537dee30a67b89f937a16e2f6c82e3774

Teach the extended attribute reading functions to pass along a
transaction context if one was supplied. The extended attribute scrub
code will use transactions to lock buffers and avoid deadlocking with
itself in the case of loops; since it will already have the inode
locked, also create xattr get/list helpers that don't take locks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: pass along transaction context when reading directory block buffers

Source kernel commit: acb9553cab552cf17154814f079f54401eefa474

Teach the directory reading functions to pass along a transaction context
if one was supplied. The directory scrub code will use transactions to
lock buffers and avoid deadlocking with itself in the case of loops.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: return the hash value of a leaf1 directory block

Source kernel commit: 8e8877e6edf2b593fe488b6efa8b3b7cfba96738

Modify the existing dir leafn lasthash function to enable us to
calculate the highest hash value of a leaf1 block. This will be used by
the directory scrubbing code to check the sanity of hashes in leaf1
directory blocks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: export _inobt_btrec_to_irec and _ialloc_cluster_alignment for scrub

Source kernel commit: e936945ee49693f40217db82a7db55c94e34ce4c

Create a function to extract an in-core inobt record from a generic
btree_rec union so that scrub will be able to check inobt records
and check inode block alignment.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: plumb in needed functions for range querying of various btrees

Source kernel commit: 118bb47e281cde728608633f1a358fb9f2ac0adc

Plumb in the pieces (init_high_key, diff_two_keys) necessary to call
query_range on the inode space and block mapping btrees and to extract
raw btree records. This will eventually be used by the inobt and bmbt
scrubbers.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: export various function for the online scrubber

Source kernel commit: 2678809799e6e37db0800725157f5ebfc03a9df7

Export various internal functions so that the online scrubber can use
them to check the state of metadata.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: always compile the btree inorder check functions

Source kernel commit: 38dee376d67047e9877a34e408013852c9729eb8

The btree record and key inorder check functions will be used by the
btree scrubber code, so make sure they're always built.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove double-underscore integer types

Source kernel commit: c8ce540db5f67d254aafb14b5d76422c62a906df

This is a purely mechanical patch that removes the private
__{u,}int{8,16,32,64}_t typedefs in favor of using the system
{u,}int{8,16,32,64}_t typedefs. This is the sed script used to perform
the transformation and fix the resulting whitespace and indentation
errors:

s/typedef\t__uint8_t/typedef __uint8_t\t/g
s/typedef\t__uint/typedef __uint/g
s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g
s/__uint8_t\t/__uint8_t\t\t/g
s/__uint/uint/g
s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g
s/__int/int/g
/^typedef.*int[0-9]*_t;$/d

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: optimize _btree_query_all

Source kernel commit: 5a4c73342ad493c61f19a1406f47dcd35e18030f

Don't bother wandering our way through the leaf nodes when the caller
issues a query_all; just zoom down the left side of the tree and walk
rightwards along level zero.
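
The traversal can be sketched on a toy tree as follows. The node layout, with a leftmost-child pointer and a right-sibling pointer, is a simplification invented for the example and is not the xfs_btree cursor code.

```c
#include <stddef.h>

#define SKETCH_MAXRECS	4

/* Toy tree node: leaves hold records, interior nodes only point down. */
struct sketch_node {
	int			level;		/* 0 = leaf */
	int			nrecs;
	int			recs[SKETCH_MAXRECS];
	struct sketch_node	*first_child;	/* leftmost child, interior only */
	struct sketch_node	*right;		/* right sibling on the same level */
};

typedef int (*sketch_query_fn)(int rec, void *priv);

/* Visit every record: descend the left edge, then walk level zero. */
static int sketch_query_all(struct sketch_node *root, sketch_query_fn fn,
			    void *priv)
{
	struct sketch_node *n = root;
	int i, error;

	while (n && n->level > 0)
		n = n->first_child;

	for (; n; n = n->right) {
		for (i = 0; i < n->nrecs; i++) {
			error = fn(n->recs[i], priv);
			if (error)
				return error;	/* callback asked to stop */
		}
	}
	return 0;
}

/* Example callback: add each record to the running total in priv. */
static int sketch_sum(int rec, void *priv)
{
	*(int *)priv += rec;
	return 0;
}
```

No key comparisons are made at any level, since a query for everything never needs to decide which subtree to enter.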

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove XFS_HSIZE

Source kernel commit: 3398a4005f0c8ced67a9071475562d435d88b7a6

XFS_HSIZE is an extremely confusing way to calculate the size of handle_t.
Given that handle_t has only ever had two sizes, and one of them isn't
even covered by XFS_HSIZE to start with, just remove the macro and use
a constant sizeof expression.

Note that XFS_HSIZE isn't used in xfsprogs, xfsdump or xfstests either.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: try to avoid blowing out the transaction reservation when bunmaping a shared extent

Source kernel commit: e1a4e37cc7b665b6804fba812aca2f4d7402c249

In a pathological scenario where we are trying to bunmapi a single
extent in which every other block is shared, it's possible that trying
to unmap the entire large extent in a single transaction can generate so
many EFIs that we overflow the transaction reservation.

Therefore, use a heuristic to guess at the number of blocks we can
safely unmap from a reflink file's data fork in a single transaction.
This should prevent problems such as the log head slamming into the tail
and ASSERTs that trigger because we've exceeded the transaction
reservation.

Note that since bunmapi can fail to unmap the entire range, we must also
teach the deferred unmap code to roll into a new transaction whenever we
get low on reservation.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[hch: random edits, all bugs are my fault]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: Release v4.12.0

Update all the necessary files for a 4.12.0 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libxfs: propagate transaction block reservations

Certain parts of libxfs preemptively refuse to run if the
transaction block reservation has fallen to zero. We leave t_blk_res at
its default of zero, which means that these code paths always fail even
if the transaction was allocated with a non-zero block reservation. Set
t_blk_res and maintain it through transaction rolls, even though we
don't do much enforcement of the transaction block limits.

[sandeen: This broke during a libxfs sync to userspace, see Fixes:]

Fixes: 0268fdc3 ("xfs: remove xfs_trans_get_block_res")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: properly set inode type

When we set the type to "inode" the verifier validates multiple
inodes in the current fs block, so setting the buffer size to
that of just one inode is not sufficient and it'll emit spurious
verifier errors for all but the first, as we read off the end:

xfs_db> daddr 99
xfs_db> type inode
Metadata corruption detected at xfs_inode block 0x63/0x200
Metadata corruption detected at xfs_inode block 0x63/0x200
Metadata corruption detected at xfs_inode block 0x63/0x200
Metadata corruption detected at xfs_inode block 0x63/0x200
Metadata corruption detected at xfs_inode block 0x63/0x200
Metadata corruption detected at xfs_inode block 0x63/0x200
Metadata corruption detected at xfs_inode block 0x63/0x200

Use the special set_cur_inode() function for this purpose
as is done in inode_f().

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
[sandeen: remove nag/warning printf for now]
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: redirect printfs when metadumping to stdout

If we're metadumping to stdout, we don't want xfs_db's various dbprintf
statements dumping to stdout because that'll corrupt the metadump.
Therefore, let outf point to the existing stdout and redirect stdout to
stderr for the duration of the dump operation.

Reported-by: David Shaw <dshaw@jabberwocky.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

mkfs.xfs: allow specification of 0 data stripe width & unit

The "noalign" option works for this too, but it seems reasonable
to allow explicit specification of stripe unit and stripe width
to 0; today, doing so makes the code think it's unspecified,
and so it goes ahead and detects stripe geometry and sets it in the
superblock. That's unexpected and surprising.

Create a new flag that tracks whether a geometry option has been
specified, and if it's set along with 0 values, treat it the
same as if "noalign" had been specified.
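
A small sketch of the flag logic, assuming a simplified option structure. The structure and function names are hypothetical and do not match the mkfs.xfs source.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the stripe-related options. */
struct sketch_stripe_opts {
	bool	specified;	/* user gave a stripe geometry option */
	int	sunit;		/* stripe unit, 0 if none */
	int	swidth;		/* stripe width, 0 if none */
	bool	noalign;	/* "noalign" given or implied */
};

/* Returns true when stripe geometry should be detected from the device. */
static bool sketch_should_autodetect(struct sketch_stripe_opts *o)
{
	/* Explicit zeros behave as if "noalign" had been specified. */
	if (o->specified && o->sunit == 0 && o->swidth == 0)
		o->noalign = true;

	if (o->noalign)
		return false;
	return !o->specified;
}
```

Without the separate flag, a value of 0 cannot be told apart from "never set", which is why explicit zeros previously fell through to autodetection.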

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: Release v4.12.0-rc2

Update all the necessary files for a 4.12.0-rc2 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

mkfs: set inode alignment and cluster size for minimum log size estimation

In order for mkfs to calculate the minimum log size correctly, it must
be able to find the transaction type with the largest reservation. The
iunlink transaction reservation size calculation depends on having the
inode cluster size set correctly, which in turn depends on the inode
alignment parameters being set as they will be in the final filesystem.
Therefore we have to set up the inoalignmt field in max_trans_res.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

mkfs: set agblklog when we're verifying minimum log size

In e5cc9d560a ("mkfs: set agsize prior to calculating minimum log
size"), we set the ag size in the superblock structure so that we can
calculate the maximum btree height correctly. The btree heights are
used to calculate transaction reservation sizes; these sizes are used to
compute the minimum log length; and the minimum log length is checked by
the kernel.

Unfortunately, I didn't realize that some of the btree sizing functions
also depend on the agblklog (log2 of the ag size), so we've been
underestimating the minimum log length allowable, which results in mkfs
formatting filesystems that the kernel refuses to mount.

This can be trivially reproduced by formatting a small (~800M) volume
with rmap and reflink turned on.
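
For reference, the log2 value mentioned above can be computed as in the sketch below. It assumes the value is rounded up when the AG size is not a power of two; the function name is local to this example.

```c
#include <stdint.h>

/* Smallest n such that (1 << n) >= x; returns 0 for x <= 1. */
static unsigned int sketch_log2_roundup(uint32_t x)
{
	unsigned int n = 0;

	/* Use a 64-bit shift so that n == 32 cannot overflow. */
	while (n < 32 && ((uint64_t)1 << n) < x)
		n++;
	return n;
}
```

An AG of 204800 blocks, for example, gives 18. If the field is left at zero instead, any sizing function that depends on it works from the wrong AG geometry.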

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: Release v4.12.0-rc1

Update all the necessary files for a 4.12.0-rc1 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libxfs: fix fsmap.h inclusion

If we /do/ have HAVE_GETFSMAP defined, we need to include linux/fsmap.h.

Found-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: identify attr dabtree field types correctly

For whatever reason, the v5 xattr dabtree header fields are mapped to
the directory dabtree header fields, which means that the types are
wrong and hence we cannot use the 'addr' command to step through the
tree. Since the v4 attr dabtree does this correctly, simply port the v5
fields to the attr code too.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_spaceman: fix potential overflowing expression in trim_f()

Prevent the potential overflow in the expression calculating the offset
in trim_f() by casting the first variable to off64_t (64-bit signed).
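
The effect of the cast can be seen in a small sketch. It uses int64_t in place of off64_t to stay portable, and the function and parameter names are hypothetical rather than taken from trim_f().

```c
#include <stdint.h>

/* Multiplication carried out in 32 bits: the product wraps before widening. */
static int64_t sketch_offset_narrow(uint32_t agno, uint32_t aglen)
{
	return agno * aglen;
}

/* Casting the first operand makes the whole multiplication 64-bit. */
static int64_t sketch_offset_wide(uint32_t agno, uint32_t aglen)
{
	return (int64_t)agno * aglen;
}
```

With agno = 8 and aglen = 2^30, the narrow version yields 0 on platforms where int is 32 bits, while the wide version yields 2^33.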

Addresses-Coverity-Id: 1413771

Signed-off-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>