git.ipfire.org Git - thirdparty/xfsprogs-dev.git/log

xfs: pass bmapi flags through to bmap_del_extent

Source kernel commit: 4847acf868bb426455c8b703c80ed5fc5e2ee556

Pass BMAPI_ flags from bunmapi into bmap_del_extent and extend
BMAPI_REMAP (which means "don't touch the allocator or the quota
accounting") to apply to bunmapi as well. This will be used to
implement the unmap operation, which will be used by swapext.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: map an inode's offset to an exact physical block

Source kernel commit: f65306ea5246ef3ff68a6abf85f5a73a04903366

Teach the bmap routine to know how to map a range of file blocks to a
specific range of physical blocks, instead of simply allocating fresh
blocks. This enables reflink to map a file to blocks that are already
in use.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: log bmap intent items

Source kernel commit: 77d61fe45e720577a2cc0e9580fbc57d8faa7232

Provide a mechanism for higher levels to create BUI/BUD items, submit
them to the log, and a stub function to deal with recovered BUI items.
These parts will be connected to the rmapbt in a later patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: create bmbt update intent log items

Source kernel commit: 6413a01420c2fbf03b3d059795f541caeb962e86

Create bmbt update intent/done log items to record redo information in
the log. Because we roll transactions multiple times for reflink
operations, we also have to track the status of the metadata updates
that will be recorded in the post-roll transactions in case we crash
before committing the final transaction. This mechanism enables log
recovery to finish what was already started.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: introduce reflink utility functions

Source kernel commit: 350a27a6a65cc5dd2ba1b220e8641993414816d2

These functions will be used by the other reflink functions to find
the maximum length of a range of shared blocks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: reserve AG space for the refcount btree root

Source kernel commit: d0e853f3600cd2a3f7c4a067dc38155c77c51df9

Reduce the max AG usable space size so that we always have space for
the refcount btree root.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: adjust refcount when unmapping file blocks

Source kernel commit: 62aab20f08758b1b171a73a54e0c72dd12beb980

When we're unmapping blocks from a reflinked file, decrease the
refcount of the affected blocks and free the extents that are no
longer in use.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: connect refcount adjust functions to upper layers

Source kernel commit: 33ba6129208475ec3aeffe6e9dad9f9afe022405

Plumb in the upper level interface to schedule and finish deferred
refcount operations via the deferred ops mechanism.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: adjust refcount of an extent of blocks in refcount btree

Source kernel commit: 3172725814f9a689d6e8b3c7979b66403abf5dae

Provide functions to adjust the reference counts for an extent of
physical blocks stored in the refcount btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

xfs: log refcount intent items

Source kernel commit: f997ee2137175f5b2bd7ced52acf1ca51f04f420

Provide a mechanism for higher levels to create CUI/CUD items, submit
them to the log, and a stub function to deal with recovered CUI items.
These parts will be connected to the refcountbt in a later patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: create refcount update intent log items

Source kernel commit: baf4bcacb715cebd412b2f4bb69989ef24496523

Create refcount update intent/done log items to record redo
information in the log. Because we need to roll transactions between
updating the bmbt mapping and updating the reverse mapping, we also
have to track the status of the metadata updates that will be recorded
in the post-roll transactions, just in case we crash before committing
the final transaction. This mechanism enables log recovery to finish
what was already started.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: add refcount btree operations

Source kernel commit: bdf28630b72154e5766cbad5874576b6f22e7237

Implement the generic btree operations required to manipulate refcount
btree blocks. The implementation is similar to the bmapbt, though it
will only allocate and free blocks from the AG.

Since the refcount root and level fields are separate from the
existing roots and levels array, they need a separate logging flag.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[hch: fix logging of AGF refcount btree fields]
Signed-off-by: Christoph Hellwig <hch@lst.de>

xfs: account for the refcount btree in the alloc/free log reservation

Source kernel commit: f310bd2ecd37b17bf0042c9d1595329057970eb6

Every time we allocate or free a data extent, we might need to split
the refcount btree. Reserve some blocks in the transaction to handle
this possibility. Even though the deferred refcount code can roll a
transaction to avoid overloading the transaction, we can still exceed
the reservation.

Certain pathological workloads (1k blocks, no cowextsize hint, random
directio writes), cause a perfect storm wherein a refcount adjustment
of a large range of blocks causes full tree splits in two separate
extents in two separate refcount tree blocks; allocating new refcount
tree blocks causes rmap btree splits; and all the allocation activity
causes the freespace btrees to split, blowing the reservation.

(Reproduced by generic/167 over NFS atop XFS)

Signed-off-by: Christoph Hellwig <hch@lst.de>
[darrick.wong@oracle.com: add commit message]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

xfs: define the on-disk refcount btree format

Source kernel commit: 1946b91cee4fc8ae25450673e4d4f35e9b462e9e

Start constructing the refcount btree implementation by establishing
the on-disk format and everything needed to read, write, and
manipulate the refcount btree blocks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: refcount btree add more reserved blocks

Source kernel commit: af30dfa14411e9df0e69c6e46e8c6c467b88229d

Since XFS reserves a small amount of space in each AG as the minimum
free space needed for an operation, save some more space in case we
touch the refcount btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: introduce refcount btree definitions

Source kernel commit: 46eeb521b95247170d2db773bb4cc8fb3de1d85c

Add new per-AG refcount btree definitions to the per-AG structures.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: remote attribute blocks aren't really userdata

Source kernel commit: 292378edcb408c652e841fdc867fc14f8b4995fa

When adding a new remote attribute, we write the attribute to the
new extent before the allocation transaction is committed. This
means we cannot reuse busy extents as that violates crash
consistency semantics. Hence we currently treat remote attribute
extent allocation like userdata because it has the same overwrite
ordering constraints as userdata.

Unfortunately, this also allows the allocator to incorrectly apply
extent size hints to the remote attribute extent allocation. This
results in interesting failures, such as transaction block
reservation overruns and in-memory inode attribute fork corruption.

To fix this, we need to separate the busy extent reuse configuration
from the userdata configuration. This changes the definition of
XFS_BMAPI_METADATA slightly - it now means that allocation is
metadata and reuse of busy extents is acceptable due to the metadata
ordering semantics of the journal. If this flag is not set, it
means the allocation has unordered data writeback, and hence
busy extent reuse is not allowed. It no longer implies the
allocation is for user data, just that the data write will not be
strictly ordered. This matches the semantics for both user data
and remote attribute block allocation.

As such, this patch changes the "userdata" field to a "datatype"
field, and adds a "no busy reuse" flag to the field.
When we detect an unordered data extent allocation, we immediately set
the no reuse flag. We then set the "user data" flags based on the
inode fork we are allocating the extent to. Hence we only set
userdata flags on data fork allocations now and consider attribute
fork remote extents to be an unordered metadata extent.

The result is that remote attribute extents now have the expected
allocation semantics, and the data fork allocation behaviour is
completely unchanged.

It should be noted that there may be other ways to fix this (e.g.
use ordered metadata buffers for the remote attribute extent data
write) but they are more invasive and difficult to validate both
from a design and implementation POV. Hence this patch takes the
simple, obvious route to fixing the problem...

Reported-and-tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: rewrite and optimize the delalloc write path

Source kernel commit: 51446f5ba44874db4d2a93a6eb61b133e5ec1b3e

Currently xfs_iomap_write_delay does up to lookups in the inode
extent tree, which is rather costly especially with the new iomap
based write path and small write sizes.

But it turns out that the low-level xfs_bmap_search_extents gives us
all the information we need in the regular delalloc buffered write
path:

- it will return us an extent covering the block we are looking up
if it exists.  In that case we can simply return that extent to
the caller and are done
- it will tell us if we are beyond the last currently allocated
block with an eof return parameter.  In that case we can create a
delalloc reservation and use the also returned information about
the last extent in the file as the hint to size our delalloc
reservation.
- it can tell us that we are writing into a hole, but that there is
an extent beyond this hole.  In this case we can create a
delalloc reservation that covers the requested size (possibly
capped to the next existing allocation).

All that can be done in one single routine instead of bouncing up
and down a few layers.  This reduced the CPU overhead of the block
mapping routines and also simplified the code a lot.
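
The three outcomes listed above can be sketched as a small classifier over a
sorted extent array. This is only an illustration of the decision logic, not
the kernel's xfs_bmap_search_extents: struct ext, classify() and cap_to_next()
are hypothetical names, and the extent list is assumed to be sorted and
non-overlapping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified extent: covers file blocks [start, start + len). */
struct ext {
	unsigned long start;
	unsigned long len;
};

enum lookup_result {
	LOOKUP_MAPPED,	/* an extent covers the block: hand it back */
	LOOKUP_EOF,	/* past the last extent: size reservation from it */
	LOOKUP_HOLE,	/* in a hole: reserve, capped at the next extent */
};

/*
 * Classify file block 'off' against a sorted, non-overlapping extent array.
 * *idx is the covering extent (MAPPED), the extent after the hole (HOLE),
 * or the last extent in the file (EOF; 0 if the file has no extents).
 */
static enum lookup_result classify(const struct ext *e, size_t n,
				   unsigned long off, size_t *idx)
{
	for (size_t i = 0; i < n; i++) {
		if (off < e[i].start) {
			*idx = i;
			return LOOKUP_HOLE;
		}
		if (off < e[i].start + e[i].len) {
			*idx = i;
			return LOOKUP_MAPPED;
		}
	}
	*idx = n ? n - 1 : 0;
	return LOOKUP_EOF;
}

/* Length of a reservation of 'want' blocks at 'off', capped at 'next'. */
static unsigned long cap_to_next(const struct ext *next, unsigned long off,
				 unsigned long want)
{
	assert(off < next->start);	/* 'off' must lie in the hole */
	unsigned long room = next->start - off;

	return want < room ? want : room;
}
```

A caller would run classify() once per write and only build a delalloc
reservation for the LOOKUP_EOF and LOOKUP_HOLE results.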

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: set up per-AG free space reservations

Source kernel commit: 3fd129b63fd062a0d8f5d55994a6e98896c20fa7

One unfortunate quirk of the reference count and reverse mapping
btrees -- they can expand in size when blocks are written to *other*
allocation groups if, say, one large extent becomes a lot of tiny
extents.  Since we don't want to start throwing errors in the middle
of CoWing, we need to reserve some blocks to handle future expansion.
The transaction block reservation counters aren't sufficient here
because we have to have a reserve of blocks in every AG, not just
somewhere in the filesystem.

Therefore, create two per-AG block reservation pools.  One feeds the
AGFL so that rmapbt expansion always succeeds, and the other feeds all
other metadata so that refcountbt expansion never fails.

Use the count of how many reserved blocks we need to have on hand to
create a virtual reservation in the AG.  Through selective clamping of
the maximum length of allocation requests and of the length of the
longest free extent, we can make it look like there's less free space
in the AG unless the reservation owner is asking for blocks.

In other words, play some accounting tricks in-core to make sure that
we always have blocks available.  On the plus side, there's nothing to
clean up if we crash, which is in contrast to the strategy that the rough
draft used (actually removing extents from the freespace btrees).
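
The in-core accounting trick can be sketched roughly as follows. This is a
simplified model under stated assumptions, not the kernel implementation:
struct ag_counts and both helpers are hypothetical, and the real code keeps
separate pools for the AGFL and for other metadata.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical in-core per-AG counters (not the kernel's per-AG structure). */
struct ag_counts {
	unsigned long free_blocks;	/* blocks really free in this AG */
	unsigned long longest_free;	/* length of the longest free extent */
	unsigned long resv_needed;	/* blocks still owed to the reservation */
};

/* Free space as seen by a caller; only the reservation owner sees it all. */
static unsigned long ag_visible_free(const struct ag_counts *ag,
				     bool is_resv_owner)
{
	if (is_resv_owner)
		return ag->free_blocks;
	if (ag->free_blocks <= ag->resv_needed)
		return 0;
	return ag->free_blocks - ag->resv_needed;
}

/* Clamp an allocation request so it cannot eat into the reservation. */
static unsigned long ag_clamp_alloc_len(const struct ag_counts *ag,
					unsigned long want, bool is_resv_owner)
{
	assert(ag->longest_free <= ag->free_blocks);
	unsigned long limit = ag_visible_free(ag, is_resv_owner);

	if (limit > ag->longest_free)
		limit = ag->longest_free;
	return want < limit ? want : limit;
}
```

Because nothing is removed from the on-disk free space btrees, a crash leaves
no state to undo; the counters are simply recomputed at the next mount.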

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: defer should allow ->finish_item to request a new transaction

Source kernel commit: 385d655861d221bb43ae69a9cfa9adbefe31ad00

When xfs_defer_finish calls ->finish_item, it's possible that the
handler (e.g. refcount) won't be able to finish all the work in a single
transaction. When this happens, the ->finish_item handler should
shorten the log done item's list count, update the work item to
reflect where work should continue, and return -EAGAIN so that
defer_finish knows to retain the pending item on the pending list,
roll the transaction, and restart processing where we left off.

Plumb in the code and document how this mechanism is supposed to work.
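
A minimal sketch of the -EAGAIN protocol described above, assuming a
hypothetical work item and a per-transaction budget. It models only the
"update the work item, return -EAGAIN, roll, restart" part; shortening the
done item's list count is not modelled, and none of these names are the
kernel's.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical work item: 'remaining' blocks to process from 'start'. */
struct work_item {
	unsigned long start;
	unsigned long remaining;
};

/*
 * Hypothetical ->finish_item: handle at most 'budget' blocks per
 * transaction, record where to continue, and ask for a fresh transaction
 * with -EAGAIN if work is left over.
 */
static int finish_item(struct work_item *w, unsigned long budget)
{
	assert(budget > 0);
	unsigned long done = w->remaining < budget ? w->remaining : budget;

	w->start += done;
	w->remaining -= done;
	return w->remaining ? -EAGAIN : 0;
}

/*
 * Driver loop: on -EAGAIN keep the item pending, "roll" the transaction
 * (modelled here as a counter) and call the handler again.
 * Returns the number of rolls, or a negative errno.
 */
static int defer_finish(struct work_item *w, unsigned long budget)
{
	int rolls = 0;

	if (budget == 0)
		return -EINVAL;
	for (;;) {
		int error = finish_item(w, budget);

		if (error == -EAGAIN) {
			rolls++;	/* item stays on the pending list */
			continue;
		}
		return error ? error : rolls;
	}
}
```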

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

xfs: count the blocks in a btree

Source kernel commit: c611cc0360cd924448c23ccd70ce8be703fcb4a6

Provide a helper method to count the number of blocks in a short form
btree. The refcount and rmap btrees need to know the number of blocks
already in use to set up their per-AG block reservations during mount.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: create a standard btree size calculator code

Source kernel commit: 4ed3f68792f6a9c21a290ae777565e7562a09653

Create a helper to generate AG btree height calculator functions.
This will be used (much) later when we get to the refcount btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: remove xfs_btree_bigkey

Source kernel commit: a1d46cffaf40e04acb0ecab14980ece3ef1ab933

Remove the xfs_btree_bigkey mess and simply make xfs_btree_key big enough
to hold both keys in-core.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: convert RUI log formats to use variable length arrays

Source kernel commit: cd00158ce34d6e2c42d8892e8499779b8ac1d2bf

Use variable length array declarations for RUI log items,
and replace the open coded sizeof formulae with a single function.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: track log done items directly in the deferred pending work item

Source kernel commit: ea78d80866ce375defb2fdd1c8a3aafec95e0f85

Christoph reports slab corruption when a deferred refcount update
aborts during _defer_finish().  The cause of this was broken log item
state tracking in xfs_defer_pending -- upon an abort,
_defer_trans_abort() will call abort_intent on all intent items,
including the ones that have already had a done item attached.

This is incorrect because each intent item has 2 refcounts: the first
is released when the intent item is committed to the log; and the
second is released when the _done_ item is committed to the log, or
by the intent creator if there is no done item.  In other words, once
we log the done item, responsibility for releasing the intent item's
second refcount is transferred to the done item and /must not/ be
performed by anything else.

The dfp_committed flag should have been tracking whether or not we had
a done item so that _defer_trans_abort could decide if it needs to
abort the intent item, but due to a thinko this was not the case.  Rip
it out and track the done item directly so that we do the right thing
w.r.t. intent item freeing.
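
The ownership rule can be illustrated with a small model. The structures and
function names below are hypothetical stand-ins, not the kernel's
xfs_defer_pending; the point is only that the abort path checks for an
attached done item instead of a separate flag.

```c
#include <assert.h>
#include <stddef.h>

struct intent {
	int refcount;		/* starts at 2, as described above */
};

struct done {
	struct intent *intent;	/* owns the intent's second reference */
};

/* Hypothetical pending work item: 'done' stays NULL until one is logged. */
struct pending {
	struct intent *intent;
	struct done *done;
};

static void intent_release(struct intent *i)
{
	assert(i->refcount > 0);	/* a double release would trip here */
	i->refcount--;
}

/* Logging a done item hands it responsibility for the second reference. */
static void pending_log_done(struct pending *p, struct done *d)
{
	d->intent = p->intent;
	p->done = d;
}

/* Abort: only release the intent if no done item has taken it over. */
static void pending_abort(struct pending *p)
{
	if (p->done == NULL)
		intent_release(p->intent);
}

/* The done item drops the reference it owns when it is committed. */
static void done_release(struct done *d)
{
	intent_release(d->intent);
}
```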

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reported-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: fix superblock inprogress check

Source kernel commit: f3d7ebdeb2c297bd26272384e955033493ca291c

From inspection, the superblock sb_inprogress check is done in the
verifier and triggered only for the primary superblock via a
"bp->b_bn == XFS_SB_DADDR" check.

Unfortunately, the primary superblock is an uncached buffer, and
hence it is configured by xfs_buf_read_uncached() with:

bp->b_bn = XFS_BUF_DADDR_NULL; /* always null for uncached buffers */

And so this check never triggers. Fix it.
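
A small model of why the comparison can never be true. The struct and helper
names are hypothetical; comparing against the address the buffer is actually
mapped to is shown only as one plausible way to repair the check, not
necessarily what the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SB_DADDR	((int64_t)0)	/* primary superblock disk address */
#define DADDR_NULL	((int64_t)-1)	/* stand-in for XFS_BUF_DADDR_NULL */

/* Hypothetical buffer: cache key plus the disk address it was read from. */
struct buf {
	int64_t b_bn;	/* cache key; always null for uncached buffers */
	int64_t map_bn;	/* where the data really came from */
};

/* Model of an uncached read: the cache key is never filled in. */
static struct buf read_uncached(int64_t daddr)
{
	assert(daddr >= 0);
	struct buf bp = { .b_bn = DADDR_NULL, .map_bn = daddr };

	return bp;
}

/* The check as described above: never true for an uncached buffer. */
static bool is_primary_sb_by_key(const struct buf *bp)
{
	return bp->b_bn == SB_DADDR;
}

/* Assumed alternative: compare the mapped disk address instead. */
static bool is_primary_sb_by_map(const struct buf *bp)
{
	return bp->map_bn == SB_DADDR;
}
```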

cc: <stable@vger.kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs_apply: filterdiff can't handle /dev/null properly

Because we are mangling the diff source/destination locations, we
have to add prefixes to them to get them to apply cleanly as -p1
patches. This is all fine until we create or remove a file and
the src/dest is /dev/null. Applying a prefix here causes
the diff to be malformed and it won't apply.

Add another hack to work around this limitation of filterdiff when
reformatting the diff into readable format.

Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs_apply: filter commits from libxfs only

When pulling commits from the kernel, it's easy to specify a commit
range such as "v4.8..for-next" to indicate we want to pull all
commits for libxfs since the 4.8 kernel release. Unfortunately,
this pulls commits from all over the kernel tree, not just
fs/xfs/libxfs.

Filter the commit list retrieval to limit the commits to those that touch
fs/xfs/libxfs so that we only attempt to apply the relatively small
number of relevant commits.

Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: Release 4.8.0

Update all the necessary files for a 4.8.0 release.

Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: Release 4.8.0-rc3

Update all the necessary files for a 4.8.0-rc3 release.

Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_io: fix inode command with "-n" for bogus inode

If we ask for the next allocated inode after a number for which
no other inode exists, the bulkstat returns success, but with
count == 0. If we ignore this fact, we print a garbage result
from bstat.bs_ino in this case, so fix it.
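
The pattern being fixed looks roughly like this. fake_bulkstat() is a
hypothetical stand-in for the real bulkstat call (it scans a sorted array
instead of a filesystem); only the "success with count == 0" handling is the
point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct bstat {
	uint64_t bs_ino;
};

/*
 * Hypothetical stand-in for bulkstat: report the first allocated inode
 * greater than 'last'. Returns 0 (success) even when nothing is found,
 * in which case *count is 0 and *out is left untouched.
 */
static int fake_bulkstat(const uint64_t *inodes, size_t n, uint64_t last,
			 struct bstat *out, int *count)
{
	assert(out != NULL && count != NULL);
	*count = 0;
	for (size_t i = 0; i < n; i++) {
		if (inodes[i] > last) {
			out->bs_ino = inodes[i];
			*count = 1;
			break;
		}
	}
	return 0;
}

/* Next allocated inode after 'ino', or 0 if there is none. */
static uint64_t next_inode(const uint64_t *inodes, size_t n, uint64_t ino)
{
	struct bstat bstat;	/* deliberately not initialised */
	int count;

	if (fake_bulkstat(inodes, n, ino, &bstat, &count) != 0)
		return 0;
	if (count == 0)		/* success, but bstat holds garbage */
		return 0;
	return bstat.bs_ino;
}
```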

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_io: refactor inode command

The inode_f function is a bit convoluted; the default
find-last-inode case appears at the end, there are several return
points, we print the same basic information using 2 different
variables in 2 different locations depending on the mode we're in,
the "inode not found" was a printf & exit in the middle of the
function, etc.

Move the default case up to the top so it's more obvious, not
buried.

Make a new var, result_ino, which holds whatever we want to print
regardless of the mode, and then handle all the output at the end.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_io: move inode command arg handling to top

As it stands, collecting the inode number and testing args validity
is all tangled up; for example the test for "-n" having no inode is
buried in an else after a large code block which handles something
else.

Get inode number argument collection and testing out of the way
before doing anything else.

Clean up the error message if a non-numeric inode arg is given.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_io: factor out new get_last_inode() helper

The inode command by default finds the last allocated inode in the
filesystem via bulkstat, and this specific function is open-coded
after other cases are handled, leading to a fairly long inode_f
function and confusing code flow.

Clean it up by factoring it into a new function; more refactoring
will follow.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_io: fix inode command help and argsmax

The short help implied that -n and -v were exclusive, and the longer
help wasn't particularly clear.

Further, argsmax is wrong; "-n -v num" is 3, not 2.

# xfs_io -c "inode -n -v 123" /mnt/test2
bad argument count 3 to inode, expected between 0 and 2 arguments
# xfs_io -c "inode -vn 123" /mnt/test2
128:32

Fix up all of those issues.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: add freesp btree block overflow to the free space

If we overestimate the number of blocks needed to rebuild the free
space btrees to the point that we have more blocks than fit in the
AGFL, save those blocks and reinsert them into the free space at
the end of phase 5. Previously, the overflow blocks would simply
be lost.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: fix bogosity when rmapping new AGFL blocks

When repair rebuilds the AGFL, the blocks can come either from the
in-core free space tree or they can come as a result of overestimating
the number of blocks needed to rebuild the on-disk free space btree.
The code in here was trying to only create rmap records for AGFL blocks
that did /not/ come from free space btree rebuild overestimation, but
was totally broken. The initial and check conditions were totally wrong
if there was any overflow. Remove a stray debug printf too.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: remove unused libxfs_iget arg

libxfs_iget() is always called with bno == 0. Which is probably a
good thing, because it then passes bno to xfs_iread as iget_flags!

So remove the libxfs_iget arg, and explicitly pass 0 to xfs_iread
for flags.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxcmd: fix counting of xfs entries in fs_table_insert

Commit bb80e3d6cd04 ("libxcmd: populate fs table with xfs entries
first, foreign entries last") adds a new counter "xfs_fs_count" and
increases the counter when inserting an XFS entry.

But it missed incrementing the counter when fs_count is zero (inserting the first
path) and the entry has no FS_FOREIGN bit set, i.e. the first XFS
entry doesn't increase xfs_fs_count.

This results in a mess in args_command() and an infinite loop in xfs/244
when testing v4 XFS (xfs/244 is notrun on v5 XFS, but this bug still
reproduces on v5 XFS), e.g.:

  mkfs -t xfs -f /dev/sda5
  mount -o pquota /dev/sda5 /mnt/xfs
  mkdir /mnt/xfs/project
  touch /mnt/xfs/project/testfile
  xfs_quota -x -c "project -s -p /mnt/xfs/project/testfile 1" /dev/sda5

Fix it by increasing xfs_fs_count when flags has no FS_FOREIGN bit.
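
The counting bug can be modelled in a few lines. This is a sketch, not the
libxcmd code: struct fs_table and both insert functions are hypothetical, and
the FS_FOREIGN value here is illustrative only.

```c
#include <assert.h>
#include <stddef.h>

#define FS_FOREIGN	(1u << 0)	/* illustrative value only */

/* Hypothetical table bookkeeping: total entries and XFS-only entries. */
struct fs_table {
	unsigned int fs_count;
	unsigned int xfs_fs_count;
};

/* Buggy shape: the first-entry path returns before counting XFS entries. */
static void table_insert_buggy(struct fs_table *t, unsigned int flags)
{
	assert(t != NULL);
	if (t->fs_count == 0) {
		t->fs_count++;
		return;
	}
	if (!(flags & FS_FOREIGN))
		t->xfs_fs_count++;
	t->fs_count++;
}

/* Fixed shape: count every entry that lacks FS_FOREIGN, first or not. */
static void table_insert_fixed(struct fs_table *t, unsigned int flags)
{
	assert(t != NULL);
	if (!(flags & FS_FOREIGN))
		t->xfs_fs_count++;
	t->fs_count++;
}
```

With two XFS entries inserted, the buggy variant reports xfs_fs_count == 1
while fs_count == 2, which is the kind of mismatch that confuses later loops
over the table.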

Signed-off-by: Eryu Guan <eguan@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: Release 4.8.0-rc2

Update all the necessary files for a 4.8.0-rc2 release.

Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_copy: Fix meta UUID handling on multiple copies

Zorro reported that when making multiple copies of a V5
filesystem with xfs_copy while generating new UUIDs, all
but the first copy were corrupt.

Upon inspection, the corruption was related to incorrect UUIDs;
the original UUID, as stamped into every metadata structure,
was not preserved in the sb_meta_uuid field of the superblock
on any but the first copy.

This happened because sb_update_uuid was using the UUID present in
the ag_hdr structure as the unchanging meta-uuid which is to match
existing structures, but it also /updates/ that UUID with the
new identifying UUID present in tcarg. So the newly-generated
UUIDs moved transitively from tcarg->uuid to ag_hdr->xfs_sb->sb_uuid
to ag_hdr->xfs_sb->sb_meta_uuid each time the function got called.

Fix this by looking instead to the unchanging, original UUID
present in the xfs_sb_t we are given, which reflects the original
filesystem's metadata UUID, and copy /that/ UUID into each target
filesystem's meta_uuid field.

Most of this patch is changing comments and re-ordering tests
to match; the functional change is to simply use the *sb rather
than the *ag_hdr to identify the proper metadata UUID.
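
The transitive UUID drift, and the fix, can be reproduced with a toy model.
The types and function names below are hypothetical, and the sketch assumes
the source filesystem's sb_uuid is also its metadata UUID.

```c
#include <assert.h>
#include <string.h>

typedef struct { unsigned char b[16]; } uuid16;	/* stand-in UUID type */

struct sb {
	uuid16 sb_uuid;		/* identifying UUID, regenerated per copy */
	uuid16 sb_meta_uuid;	/* must match what the metadata was stamped with */
};

static int uuid_eq(const uuid16 *a, const uuid16 *b)
{
	return memcmp(a->b, b->b, sizeof(a->b)) == 0;
}

/* Buggy shape: reads the meta UUID from the header it also overwrites. */
static void update_uuid_buggy(struct sb *hdr_sb, const uuid16 *new_uuid)
{
	hdr_sb->sb_meta_uuid = hdr_sb->sb_uuid;
	hdr_sb->sb_uuid = *new_uuid;
}

/* Fixed shape: the meta UUID always comes from the unchanging original. */
static void update_uuid_fixed(const struct sb *orig, struct sb *hdr_sb,
			      const uuid16 *new_uuid)
{
	assert(orig != hdr_sb);	/* 'orig' must not be the copy being edited */
	hdr_sb->sb_meta_uuid = orig->sb_uuid;
	hdr_sb->sb_uuid = *new_uuid;
}
```

Calling the buggy variant once per target works for the first copy only; on
the second call the previous target's new UUID leaks into sb_meta_uuid.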

Reported-and-tested-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: fix segfault from uninitialized tp in mv_orphanage

After commit 9074815 ("xfs: better xfs_trans_alloc interface"), mv_orphanage
was passing an uninitialized *tp into libxfs_dir_lookup, because
the trans_alloc() call which was present prior to the call got
removed in that commit.

This ultimately led to testing an uninit tp var:

Conditional jump or move depends on uninitialised value(s)
   at 0x434D01: libxfs_trans_read_buf_map (trans.c:554)
   by 0x45152E: libxfs_da_read_buf (xfs_da_btree.c:2610)
   by 0x456ACB: xfs_dir3_block_read (xfs_dir2_block.c:136)
   by 0x4570A8: xfs_dir2_block_lookup_int (xfs_dir2_block.c:675)
   by 0x457DB7: xfs_dir2_block_lookup (xfs_dir2_block.c:623)
   by 0x455F54: libxfs_dir_lookup (xfs_dir2.c:399)
   by 0x421C46: mv_orphanage (phase6.c:1095)
   by 0x4222C2: check_for_orphaned_inodes (phase6.c:3108)
   by 0x423ABD: phase6 (phase6.c:3287)
   by 0x42E4B2: main (xfs_repair.c:933)

and ended with a segfault as we tried to use that tp when
searching for the buffer in xfs_trans_buf_item_match():

        list_for_each_entry(lidp, &tp->t_items, lid_trans) {

I think simply passing in NULL for this tp is sufficient to fix
this; we'll just go read the buffer from disk in
libxfs_trans_read_buf_map rather than trying to find it in an
existing transaction.

Reported-by: Consigliere <admin@russenmafia.at>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: factor mount checks into helper function

platform_check_ismounted switched to a getmntent() loop after
ustat disappeared on some new platforms.

We also use a similar mechanism for determining the
ro/rw-mounted status of a device in platform_check_iswritable.

Because the loops are essentially the same, factor them into a
single helper which accepts a VERBOSE flag to print info if the
device is found in the checked-for state, and a WRITABLE flag
which only checks specifically for a mounted and /writable/ device.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

mkfs.xfs: clarify ftype defaults in manpage

When CRCs were made default, a few leftovers related to its
prior non-default status remained in the manpage, in the ftype
section. Clean those up, stating the correct default for this
feature.

Reported-by: Chris Murphy <chris@cmurf.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_io: allow chattr & chproj on foreign filesystems

Now that FS_IOC_FSSETXATTR is a generic vfs call, these
functions can be used on non-xfs filesystems, and this is
needed for generic project quota testing.

(not all flags are valid on all filesystems.)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>

xfs_quota: fix free command for foreign fs

The "free" command is really just a fancy df that knows about
log space and realtime blocks for an xfs filesystem.

We can simply use statfs to get more or less the same thing
on a non-xfs filesystem, so, ah, do that I guess, and re-enable
it.

# quota/xfs_quota -f -x -c path -c free /mnt/test
          Filesystem          Pathname
[000] (F) /mnt/test           /dev/sdb1 (uquota)

Filesystem           1K-blocks       Used  Available  Use% Pathname
/dev/sdb1             20511356      45000   20466356    0% /mnt/test

Fix the short help text for -N while we're at it.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_quota: un-flag non-foreign-capable commands

The off command calls XFS_QUOTAOFF / Q_XQUOTAOFF, which calls
quota_disable in the kernel, which returns ENOSYS if the
->quota_enable quota op doesn't exist - and it does not exist
on any non-xfs filesystems.

We could get clever if we wanted it, and send Q_QUOTAOFF
instead for foreign filesystems, but for now it's broken
so just remove the flag.

The free command relies on XFS_IOC_FSGEOMETRY_V1, so unflag it
as well.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_quota: Enable 3 more foreign commands

Enable restore, limit, and timer.

Unsupported commands remain, for lack of kernel support, generally:
warn, quot, enable, disable, and remove.

xfs_quota> report
User quota on /mnt/test2/git/xfsprogs/mnt (/dev/loop0)
                               Blocks
User ID          Used       Soft       Hard    Warn/Grace
---------- --------------------------------------------------
root               13          0          0     00 [--------]

xfs_quota> restore -f quotadump
xfs_quota> report
User quota on /mnt/test2/git/xfsprogs/mnt (/dev/loop0)
                               Blocks
User ID          Used       Soft       Hard    Warn/Grace
---------- --------------------------------------------------
root               13          0          0     00 [--------]
testuser            0      16384      32768     00 [--------]
fsgqa               0     102400     112640     00 [--------]

xfs_quota> limit bsoft=200m fsgqa

xfs_quota> report
User quota on /mnt/test2/git/xfsprogs/mnt (/dev/loop0)
                               Blocks
User ID          Used       Soft       Hard    Warn/Grace
---------- --------------------------------------------------
root               13          0          0     00 [--------]
testuser            0      16384      32768     00 [--------]
fsgqa               0     204800     112640     00 [--------]

xfs_quota> state -u
User quota state on /mnt/test2/git/xfsprogs/mnt (/dev/loop0)
  Accounting: ON
  Enforcement: ON
  Inode: #12 (16 blocks, 1 extents)
Blocks grace time: [7 days]
Inodes grace time: [7 days]

xfs_quota> timer -b 3days
xfs_quota> state -u
User quota state on /mnt/test2/git/xfsprogs/mnt (/dev/loop0)
  Accounting: ON
  Enforcement: ON
  Inode: #12 (16 blocks, 1 extents)
Blocks grace time: [3 days]
Inodes grace time: [7 days]
Realtime Blocks grace time: [--------]

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_quota: add case for foreign fs, disabled regardless of foreign_allowed

Some commands are disallowed for foreign filesystems,
regardless of whether or not the -f flag is thrown.
Add a case for this condition and improve commenting
and output messaging accordingly in init_check_command.

Signed-off-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_quota: print and path output formatting: maintain reverse compatibility

This patch adjusts the formatting of the xfs_quota print and
path outputs in order to maintain reverse compatibility:
when the -f flag isn't used, the output must stay the same as in
the previous version.

Signed-off-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxcmd: populate fs table with xfs entries first, foreign entries last

Commits b20b6c2 and 29647c8 modified xfs_quota for use on
non-XFS filesystems. Modifications to fs_initialise_mounts
(paths.c) resulted in an xfstest fail (xfs/261), due to foreign
fs paths being picked up first from the fs table. The xfs_quota
print command then complained about not being able to print the
foreign paths, instead of previous behavior (quiet).

This patch restores correct behavior, sorting the table so that
xfs entries are first, followed by foreign fs entries. The patch
maintains the order of xfs entries and foreign entries in the
same order as mtab entries. Then, in functions which print all
paths we can simply break at the first foreign path if the -f
switch is not specified.
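
The ordering described above amounts to a stable partition of the table.
A minimal sketch follows, assuming a simplified entry type; struct
fs_entry, order_fs_table and count_printable are hypothetical names used
only for illustration and are not the libxcmd code:

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for an fs table entry. */
struct fs_entry {
	const char *dir;	/* mount point */
	int foreign;		/* 0 = XFS, 1 = non-XFS */
};

/*
 * Stable partition: copy XFS entries first, then foreign ones, keeping
 * the original (mtab) order within each group.  Returns the number of
 * XFS entries, i.e. the index of the first foreign entry in dst.
 */
size_t order_fs_table(const struct fs_entry *src, size_t n,
		      struct fs_entry *dst)
{
	size_t i, out = 0, nxfs;

	for (i = 0; i < n; i++)
		if (!src[i].foreign)
			dst[out++] = src[i];
	nxfs = out;
	for (i = 0; i < n; i++)
		if (src[i].foreign)
			dst[out++] = src[i];
	return nxfs;
}

/* Count the entries a "print all paths" loop would visit. */
size_t count_printable(const struct fs_entry *tab, size_t n,
		       int foreign_allowed)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tab[i].foreign && !foreign_allowed)
			break;	/* table is sorted: no XFS entry follows */
	return i;
}
```

Because the table is sorted, the print loop can stop at the first
foreign entry instead of testing and skipping entries one by one.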

Signed-off-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: exit with status 2 if log dirtiness is unknown

This new case is mostly like the known dirty log case; the log
is corrupt, dirtiness cannot be determined, and a mount/umount
cycle or an xfs_repair -L is required.

So exit with status 2 here as well.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_logprint: remove the printing of transaction type

The kernel stopped using meaningful types in transaction headers
when delayed logging was introduced. Since then the only transaction
type that reaches the journal is a "Checkpoint" type. As a result,
we've effectively broken the transaction type printing for newer
kernels, and the current kernels don't even have transaction types
internally. Hence this logprint function is stale, broken, and
causing us problems.

This patch removes the transaction type parsing. If a user needs
this information from logprint, we can still build a binary from a
prior version that correctly decoded the transaction type (e.g.
3.2.1) for that purpose.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: move iswritable "fatal" decision to caller

Simplify platform_check_iswritable by moving the
"fatal" decision up to the (one) caller. In other words,
simply return whether mounted+writable is true, and
return 1 if so. Caller decides what to do with that info
based on /its/ "fatal" argument.
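
As a rough illustration of the split described above, the check only
reports and the caller interprets "fatal". The function names, the
parameters and the return codes of the caller are assumptions made for
this sketch, not the libxfs signatures:

```c
/* Hypothetical stand-in for the platform check: report only, never exit. */
int check_iswritable(int mounted, int readonly)
{
	return mounted && !readonly;	/* 1 if mounted and writable */
}

/*
 * Hypothetical caller: it alone interprets its "fatal" argument.
 * Returns 0 to proceed, 1 to warn and continue, 2 to abort.
 */
int check_mount_policy(int mounted, int readonly, int fatal)
{
	if (!check_iswritable(mounted, readonly))
		return 0;
	return fatal ? 2 : 1;
}
```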

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: Release 4.8.0-rc1

Update all the necessary files for a 4.8.0-rc1 release. Also,
replace all the mailing list contact details with the new list
address, and the project website with xfs.org.

Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_db: pass the inode cluster offset when copying inodes

In copy_inode_chunk, we try to determine whether or not an inode is
free as part of copying the inode records. The macros involved in
testing ir_free require both the inode record and the offset of an
inode within that chunk. Prior to sparse inode support, the loop
index "i" was also the inode chunk offset; however, when sparse
support was added, "i" became the inode offset within a cluster and
"ioff" became the inode cluster offset within an inode chunk.
Therefore, it is necessary to pass "ioff + i" to do the free-ness
calculation correctly.
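
The offset arithmetic can be sketched as below. This assumes a
simplified record holding only a 64-bit free mask in which bit N is set
when inode N of the chunk is free; struct inorec and both helpers are
hypothetical and do not reproduce the real macros:

```c
#include <stdint.h>

/* Simplified stand-in for the inode record's 64-bit free mask. */
struct inorec {
	uint64_t ir_free;
};

/* Bit N of ir_free is set when inode N of the chunk is free (N < 64). */
int ino_is_free(const struct inorec *rec, unsigned int chunk_off)
{
	return (int)((rec->ir_free >> chunk_off) & 1);
}

/*
 * Count free inodes in one cluster of a chunk.  "ioff" is the cluster's
 * starting offset within the chunk and "i" indexes inodes inside the
 * cluster, so the chunk-relative offset is ioff + i, not i alone.
 */
unsigned int count_free_in_cluster(const struct inorec *rec,
				   unsigned int ioff,
				   unsigned int inodes_per_cluster)
{
	unsigned int i, nfree = 0;

	for (i = 0; i < inodes_per_cluster; i++)
		if (ino_is_free(rec, ioff + i))
			nfree++;
	return nfree;
}
```

Testing bit "i" alone would re-read the first cluster's bits for every
later cluster in the chunk.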

This was discovered while trying to take metadumps of fs images for
scrub testing.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: don't crash on ENOSPC rebuilding a btree

During btree rebuilding, the cursor setup function checks ext_ptr to
report ENOSPC problems when it grabs the first extent for the btree.
However, subsequent grabs for free space don't check ext_ptr and so we
segfault if there's no space. Therefore, move the ENOSPC check into
the loop so that we always complain about insufficient space instead
of just crashing.
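
A minimal sketch of checking inside the loop, assuming free space is
handed out from a plain array of extents; struct free_extent and
reserve_blocks are hypothetical and much simpler than the repair code:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical free-extent record. */
struct free_extent {
	unsigned int start;
	unsigned int len;
};

/*
 * Reserve "want" blocks from an array of free extents.  The NULL check
 * sits inside the loop, so running out of extents on any iteration is
 * reported as -ENOSPC instead of dereferencing a bad pointer.
 */
int reserve_blocks(const struct free_extent *exts, size_t nexts,
		   unsigned int want, unsigned int *blocks)
{
	size_t e = 0;
	unsigned int used = 0, got = 0;

	while (got < want) {
		const struct free_extent *ext_ptr =
			e < nexts ? &exts[e] : NULL;

		if (ext_ptr == NULL)
			return -ENOSPC;
		if (used >= ext_ptr->len) {	/* extent exhausted, move on */
			e++;
			used = 0;
			continue;
		}
		blocks[got++] = ext_ptr->start + used++;
	}
	return 0;
}
```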

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs/linux.c: Replace use of ustat by stat

ustat has been used to check whether a device file is mounted.
The function is deprecated and not supported by uclibc and musl.
Now do the check using the *mntent functions.
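
For illustration, a mount-table scan with the *mntent functions could
look like the sketch below. It compares device names only, which is a
simplifying assumption; the actual change may match devices differently
(for example by device number). device_is_mounted is a hypothetical
name:

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if "device" is listed as a mounted filesystem in the mount
 * table file "mtab" (e.g. /proc/mounts), 0 if it is not, and -1 if the
 * table cannot be opened.
 */
int device_is_mounted(const char *mtab, const char *device)
{
	FILE *f = setmntent(mtab, "r");
	struct mntent *mnt;
	int found = 0;

	if (f == NULL)
		return -1;
	while ((mnt = getmntent(f)) != NULL) {
		if (strcmp(mnt->mnt_fsname, device) == 0) {
			found = 1;
			break;
		}
	}
	endmntent(f);
	return found;
}
```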

Based on patch by Natanael Copa <ncopa@alpinelinux.org>.

Signed-off-by: Felix Janda <felix.janda@posteo.de>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

db: write via array indexing doesn't work

This command to write a specific AGFL index:

# xfs_db -x -c "agfl 0" -c "write -d bno[32] 78" /dev/ram0

doesn't write to array index 32 - it incorrectly writes to
/somewhere/ in the entire array range.

The issue here is that the write_struct() code assumes that the
object it is printing is always a structure member and any array
indexes will be exposed as children of the parent type. This works
just fine for structures with internal arrays, but when the type
being decoded is an array, we get a direct reference to the offset
to be written in the parent object.

Hence we need to take into account the array index returned by the
parent object parsing when calculating the size of the region to be
modified rather than using fcount() as that results in the size
always being set to the size of the entire array and the
modification being written to the wrong place.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_db: properly set dquot_buf when operating on dquot

The earlier commit:

66a40d02 db: verify and calculate dquot CRCs

added a "dquot_buf" to the iocur to specify when we were operating
on a dquot and thus handle dquot CRC updates - but nothing ever
actually set dquot_buf to a non-zero value.

Without doing so, we don't recalculate the dquot crc when
changing contents of a dquot:

# xfs_db -x -c "dquot -u 500" -c "p crc" -c "write diskdq.bcount 2" \
-c "p crc" crctestfile
crc = 0xfd293c68 (correct)
diskdq.bcount = 2
crc = 0xfd293c68 (correct)

[ the "(correct)" tag is another, different issue ]

# xfs_db -x -c "dquot -u 500" -c "p crc" crctestfile
Metadata CRC error detected at xfs_dquot block 0xd8/0x1000
crc = 0xfd293c68 (bad)

With this change, dquot CRCs are properly recalculated in write_cur.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_quota: fix missing break after foreign_allowed option

New quota "-f" has been brought in by:

29647c8 xfs_quota: add capabilities for use on non-XFS filesystems

But Coverity Scan found a missing break in quota/init.c: init()
function.

Signed-off-by: Zorro Lang <zlang@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_db: add crc manipulation commands

This adds a new "crc" command to xfs_db for CRC-enabled filesystems.

If a structure has a CRC field, we can validate it, invalidate/corrupt
it, or revalidate/rewrite it:

xfs_db> sb 0
xfs_db> crc -v
crc = 0x796c814f (correct)
xfs_db> crc -i
Metadata CRC error detected at block 0x0/0x200
crc = 0x796c8150 (bad)
xfs_db> crc -r
crc = 0x796c814f (correct)

(-i and -r require "expert" write-capable mode)

This requires temporarily replacing the write verifier with
a dummy which won't recalculate the CRC on the way to disk.

It also required me to write a new flist function, which is
totally foreign to me, so hopefully done right - but it seems
to work here.

[ dchinner: rewrite write_cur() to also skip CRC updates on dquots,
fix set-but-unused warnings, use iotop_cur safely. ]

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_quota: certain commands must always be available

Add CMD_ALL_FSTYPES to enable basic xfs_quota commands (e.g. help,
quit) to be run regardless of the filesystem type we are operating
on.

Use CMD_FLAG_FOREIGN_OK on commands suitable for foreign filesystems.
Refactor init_check_command in quota/init.c for clarity.

[ dchinner: CMD_SKIP_CHECK -> CMD_ALL_FSTYPES ]

Signed-off-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_quota: add capabilities for use on non-XFS filesystems

This patch allows xfs_quota to be used on ext4 for project quota
testing in xfstests. It was based on work originally from Dave
Chinner. As a part of its support for foreign filesystems xfs_quota
is modified with a "-f" command line flag to enable select commands
on those filesystems.

This requires us to discriminate different filesystem types when
walking the filesystem table during argument processing as the table
is now populated with mounted non-XFS filesystems. We should only
select a foreign filesystem mount point if the "-f" flag is present
on the command line.

This patch also updates the usage and man page information
for the new CLI flag appropriately.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_quota: wire up XFS_GETQSTATV

The new XFS_GETQSTATV quotactl, available since kernel v3.12,
was never implemented in xfs_quota, and the "state" command
continues to use XFS_GETQSTAT, which cannot report both
group & project quota on newer formats.

The new call has room for all 3 quota types (user, group, and
project), vs just two, where previously group and project
overlapped.

So:

First, try XFS_GETQSTATV.
If it passes, we have all the information we need, and we print
it. state_qfilestat() is modified to take the newer structure.

If it fails, try XFS_GETQSTAT. If that passes, we are on an
older kernel with neither XFS_GETQSTATV nor the on-disk project
quota inode. We copy the available information into the newer
statv structure, carefully determining whether group or project
(or neither) is actually active, and print it with the same
state_qfilestat routine.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_metadump: don't warn about unobfuscated log with -o

It makes no sense to warn about un-obfuscated logs
when we asked xfs_metadump to not obfuscate metadata:

# xfs_metadump -o /dev/loop2 bad.metadump
xfs_metadump: Filesystem log is dirty; image will contain unobfuscated metadata in log.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: fix naming problems in repair/rmap.c

The utility functions in repair/rmap.c should all have a prefix
of 'rmap_' so that they are easily identifiable.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

misc: fix libxfs api violations

Fix all the client programs to use 'libxfs_' prefixes for non-inline
function calls and to negate integer return codes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

misc: fix Coverity errors

Fix various code sloppinesses pointed out by Coverity,
and fix an incorrect comment/debug message.

Coverity-id: 1371628 - 1371638
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxcmd: fix mount option parsing to find rt/log devices

It turns out that glibc's hasmntopt implementation returns NULL
if the opt parameter ends with an equals ('='). Therefore, we
cannot directly search for the option 'rtdev='; we must instead
have hasmntopt look for 'rtdev' and look for the trailing equals
sign ourselves. This fixes xfs_info's reporting of external
log and realtime device paths, and xfs_scrub will need it for
data block scrubbing of realtime extents.
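
The workaround can be sketched with the standard hasmntopt() call:
search for the bare option name, then check for the '=' by hand. The
helper name get_mntopt_value and its buffer interface are assumptions
for this example, not the libxcmd function:

```c
#include <mntent.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the value of a "name=value" mount option into buf.
 * Returns 1 on success, 0 if the option is absent, has no value, or
 * does not fit in buf.
 */
int get_mntopt_value(const struct mntent *mnt, const char *name,
		     char *buf, size_t buflen)
{
	/* Search without the '='; the result points at the option name. */
	char *p = hasmntopt(mnt, name);
	size_t namelen = strlen(name);
	size_t vallen;

	if (p == NULL || strncmp(p, name, namelen) != 0 ||
	    p[namelen] != '=')
		return 0;
	p += namelen + 1;		/* skip "name=" */
	vallen = strcspn(p, ",");	/* value ends at ',' or end of string */
	if (vallen == 0 || vallen >= buflen)
		return 0;
	memcpy(buf, p, vallen);
	buf[vallen] = '\0';
	return 1;
}
```

A caller would pass "rtdev" or "logdev" as the name and receive the
device path in buf.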

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: simple btree query range should look right if LE lookup fails

If the initial LOOKUP_LE in the simple query range fails to find
anything, we should attempt to increment the btree cursor to see
if there actually /are/ records for what we're trying to find.
Without this patch, a bnobt range query of (0, $agsize) returns
no results because the leftmost record never has a startblock
of zero.
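
To make the failure mode concrete, here is a sketch over a sorted array
rather than a btree. struct rec, query_range and the step_right switch
(0 mimics the old behavior, 1 the fixed one) are hypothetical and exist
only for this example:

```c
#include <stddef.h>

/* Hypothetical free-space record; the array is sorted by startblock. */
struct rec {
	unsigned int startblock;
	unsigned int blockcount;
};

/* Count records overlapping [low, high]. */
size_t query_range(const struct rec *recs, size_t n, unsigned int low,
		   unsigned int high, int step_right)
{
	size_t i, cur = 0, hits = 0;
	int found = 0;

	/* "LE lookup": last record whose startblock is <= low. */
	for (i = 0; i < n && recs[i].startblock <= low; i++) {
		cur = i;
		found = 1;
	}
	if (!found) {
		if (!step_right)
			return 0;	/* old behavior: give up */
		cur = 0;		/* fixed: look right instead */
	}
	for (i = cur; i < n && recs[i].startblock <= high; i++) {
		/* Skip a record that ends before the range begins. */
		if (recs[i].startblock + recs[i].blockcount <= low)
			continue;
		hits++;
	}
	return hits;
}
```

With records starting at blocks 4, 10 and 20, a query of (0, 100) finds
nothing at or left of block 0, so only the step-right variant reports
the three records.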

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: fix some key handling problems in _btree_simple_query_range

We only need the record's high key for the first record that we look
at; for all records, we /definitely/ need the regular record key.
Therefore, fix how the simple range query function gets its keys.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: don't perform lookups on zero-height btrees

If the caller passes in a cursor to a zero-height btree (which is
impossible), we never set block to anything but NULL, which causes the
later dereference of it to crash. Instead, just return -EFSCORRUPTED.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

mkfs.xfs: create filesystems with reverse-mappings

Originally-From: Dave Chinner <dchinner@redhat.com>

Create v5 filesystems with rmapbt turned on. Document the rmapbt
options to mkfs, and initialize the extra field we added for reflink
support.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[darrick.wong@oracle.com: split patch, add commit message and extra fields]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

mkfs: set agsize prior to calculating minimum log size

Each btree has its own maxlevels variable. Since the level count of
certain btrees depends on agblocks, it's necessary to know the AG size
prior to calculating the log reservations. These reservations are
needed to calculate the log size and the kernel will refuse to mount
if we guess too low, so stuff in the real agsize when we're formatting
the log.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: check for impossible rmap record field combinations

Make sure there are no records or keys with impossible field
combinations, such as non-inode records with offsets or flags.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: look for mergeable rmaps

Check for adjacent mergeable rmaps; this is a sign that we've
screwed up somehow.
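
One plausible mergeability test is sketched below. The record layout,
the file_data field and the exact set of conditions are assumptions for
illustration; the checks in repair/rmap.c may differ:

```c
#include <stdint.h>

/* Simplified, hypothetical rmap record; not the on-disk layout. */
struct rmap {
	uint32_t startblock;
	uint32_t blockcount;
	uint64_t owner;
	uint64_t offset;	/* file offset in blocks, if file_data */
	unsigned int flags;
	int file_data;		/* 1 if offset is meaningful */
};

/* Return 1 if b could have been folded into a, 0 otherwise. */
int rmaps_mergeable(const struct rmap *a, const struct rmap *b)
{
	if (a->owner != b->owner || a->flags != b->flags ||
	    a->file_data != b->file_data)
		return 0;
	/* b must start exactly where a ends on disk. */
	if (a->startblock + a->blockcount != b->startblock)
		return 0;
	/* Metadata owners have no file offset to compare. */
	if (!a->file_data)
		return 1;
	/* File data must also be contiguous in the file. */
	return a->offset + a->blockcount == b->offset;
}
```

Two neighbouring records that pass such a test should already have been
stored as one, which is why finding them is treated as a warning sign.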

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: merge data & attr fork reverse mappings

Merge data and attribute fork reverse mappings.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: add per-AG btree blocks to rmap data and add to rmapbt

Since we can't know the location of the new per-AG btree blocks prior
to constructing the rmapbt, we must record raw reverse-mapping data for
btree blocks while the new btrees are under construction. After the
rmapbt has been rebuilt, merge the btree rmap entries into the rmapbt
with the libxfs code.

Also refactor the freelist fixing code since we need it to tidy up
the AGFL after each rmapbt allocation.

Use libxfs_rmap_alloc to add rmap records for AG metadata blocks
because it knows how to merge adjacent rmaps. This particular bug was
discovered while running xfs_repair twice on generic/175 wherein block
X was originally allocated to the rmapbt, then X+1 got allocated to
the rmapbt when we expanded it to hold all the entries for the rmapbt
blocks.

[dchinner: libxfs'ify the libxfs calls.]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: rebuild reverse-mapping btree

Rebuild the reverse-mapping btree with the rmap observations
corresponding to file extents, bmbt blocks, and fixed per-AG metadata.

Leave a few empty slots in each rmapbt leaf when we're rebuilding
the rmapbt so that we can insert records for the AG metadata blocks
without causing too many btree splits. This (hopefully) prevents the
situation where running xfs_repair greatly increases the size of the
btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: check existing rmapbt entries against observed rmaps

Once we've finished collecting reverse mapping observations from the
metadata scan, check those observations against the rmap btree
(particularly if we're in -n mode) to detect rmapbt problems.

[dchinner: libxfs'ify the various libxfs calls. ]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: add fixed-location per-AG rmaps

Add reverse-mappings for fixed-location per-AG metadata such as inode
chunks, superblocks, and the log to the raw rmap list, then merge the
raw rmap data (which also has the BMBT data) into the main rmap list.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: add inode bmbt block rmaps

Record BMBT blocks in the raw rmap list.

[dchinner: remove unused lastowner/lastoffset variables from scan.c]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: record and merge raw rmap data

Since we still allow merging of BMBT block, AG metadata, and AG btree
block rmaps, provide a facility to collect these raw observations and
merge them (with maximal length) into the main rmap list.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: collect reverse-mapping data for refcount/rmap tree rebuilding

Collect reverse-mapping data for the entire filesystem so that we can
later check and rebuild the reference count tree and the reverse mapping
tree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: create a slab API for allocating arrays in large chunks

Create a slab-based array and a bag-of-pointers data structure to
facilitate rapid linear scans of reverse-mapping data for later
reconstruction of the refcount and rmap btrees.
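
The general idea of a slab-based array (append into fixed-size chunks,
then scan linearly) can be sketched as follows. The names, the int
payload and the tiny chunk size are assumptions for this example and
do not mirror the actual slab API in xfs_repair:

```c
#include <stdlib.h>

#define SLAB_ITEMS 4	/* tiny chunk size, for illustration only */

struct slab_chunk {
	struct slab_chunk *next;
	size_t used;
	int items[SLAB_ITEMS];
};

struct slab {
	struct slab_chunk *head, *tail;
	size_t count;
};

/* Append one item, allocating a new chunk only when the last is full. */
int slab_add(struct slab *s, int value)
{
	if (s->tail == NULL || s->tail->used == SLAB_ITEMS) {
		struct slab_chunk *c = calloc(1, sizeof(*c));

		if (c == NULL)
			return -1;
		if (s->tail)
			s->tail->next = c;
		else
			s->head = c;
		s->tail = c;
	}
	s->tail->items[s->tail->used++] = value;
	s->count++;
	return 0;
}

/* Linear scan over every chunk, in insertion order. */
long slab_sum(const struct slab *s)
{
	const struct slab_chunk *c;
	long sum = 0;
	size_t i;

	for (c = s->head; c != NULL; c = c->next)
		for (i = 0; i < c->used; i++)
			sum += c->items[i];
	return sum;
}

/* Release all chunks and reset the slab to empty. */
void slab_free(struct slab *s)
{
	struct slab_chunk *c = s->head, *next;

	for (; c != NULL; c = next) {
		next = c->next;
		free(c);
	}
	s->head = s->tail = NULL;
	s->count = 0;
}
```

Allocating per chunk rather than per record keeps allocator overhead
low and makes scans mostly sequential in memory.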

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: fix fino_bno calculation when rmapbt is enabled

In xfs_repair, we calculate where we think mkfs put the root inode
block. However, the rmapbt component doesn't account for the fact
that mkfs reserved 2 AGFL blocks for the rmapbt, so its calculation
is off by a bit. This leads to it complaining (incorrectly) about the
root inode block being in the wrong place and blowing up.

[dchinner: small comment update to indicate AGFL block accounting]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: use rmap btree data to check block types

Originally-From: Dave Chinner <dchinner@redhat.com>

Use the rmap btree to pre-populate the block type information so that
when repair iterates the primary metadata, we can confirm the block
type.

Ensure that we remove the flag bits from blockcount before using the
length field.

When we're processing rmap records, we set the bmap state of
the entire extent, not just the first block of the extent. This
enables us to catch improperly overlapping rmap records and later to
ensure that the entire primary metadata extent matches (owner-wise)
the reverse mapping. It also enables us to catch the case where the
rmapbt maps something that isn't pointed to by primary metadata.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[darrick.wong@oracle.com: split patch, strip flag bits from blockcount]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_logprint: support rmap redo items

Print reverse mapping update redo items.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_io: add rmap-finish error injection type

Add XFS_ERRTAG_RMAP_FINISH_ONE to the types of errors we can inject.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_growfs: report rmapbt presence

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_db: introduce the 'fsmap' command to find what owns a set of fsblocks

Introduce a new 'fsmap' command to the fs debugger that will query the
rmap btree to report the file/metadata extents mapped to a range of
physical blocks.

[dchinner: xfs_rmap_query_range -> libxfs_rmap_query_range]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_db: copy the rmap btree

Copy the rmapbt when we're metadumping the filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_db: spot check rmapbt

Check the rmapbt for obvious errors. We're leaving thorough checks
such as comparing the primary metadata against the rmapbt contents
for newer things like xfs_repair.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_db: display rmap btree contents

Originally-From: Dave Chinner <dchinner@redhat.com>

Teach the debugger how to dump the reverse-mapping btree contents.
Decode the extra fields in the rmapbt records and keys now that we
support reflink.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[darrick: split patch, add commit message, decode extra fields]
[darrick: support overlapped interval btree fields]
[darrick: move unwritten bit to rm_offset]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: add deferred ops item handlers for userspace

Add deferred ops handlers for userspace, which simply call back
into the libxfs functions.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: fix various oddities in the kernel import

Fix some minor anomalies in the kernel -> xfsprogs import of the
4.8 libxfs code.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: remove OWN_AG rmap when allocating a block from the AGFL

When we're really tight on space, xfs_alloc_ag_vextent_small() can
allocate a block from the AGFL and give it to the caller. Since the
caller is never the AGFL-fixing method, we must remove the OWN_AG
reverse mapping because it will clash with whatever rmap the caller
wants to set up. This bug was discovered by running generic/299
repeatedly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: store rmapbt block count in the AGF

Track the number of blocks used for the rmapbt in the AGF. When we
get to the AG reservation code we need this counter to quickly
make our reservation during mount.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_io: add free-extent error injection type

Add XFS_ERRTAG_FREE_EXTENT to the types of errors we can inject.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>