git.ipfire.org Git - thirdparty/xfsprogs-dev.git/log

xfs: move growfs core to libxfs

Source kernel commit: b16817b66b6c97d2a812d663d26faed40079892a

So it can be shared with userspace (e.g. mkfs) easily.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: implement the metadata repair ioctl flag

Source kernel commit: 84d42ea6b6269aee7eb3d91a4425a08b8965fd4a

Plumb in the pieces necessary to make the "repair" subfunction of
the scrub ioctl actually work. This means that we make the IFLAG_REPAIR
flag to the scrub ioctl actually do something, and we add an errortag
knob so that xfstests can force the kernel to rebuild a metadata
structure even if there's nothing wrong with it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: teach xfs_bmapi_remap to accept some bmapi flags

Source kernel commit: 7644bd988d911168c80599bc034bb489dc851dcf

Teach xfs_bmapi_remap how to map in unwritten extents and to skip rmap
updates. This enables us to rebuild real and unwritten extents from the
rmapbt.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: make xfs_bmapi_remap work with attribute forks

Source kernel commit: 7cf199ba5a70dbc744276efc94442fb4436dac15

Add a new flags argument to xfs_bmapi_remap so that we can pass BMAPI
flags into the function. This enables us to pass in BMAPI_ATTRFORK so
that we can remap things into the attribute fork. Eventually the
online repair code will use this to rebuild attribute forks, so make it
non-static.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: hoist xfs_scrub_agfl_walk to libxfs as xfs_agfl_walk

Source kernel commit: 9f3a080ef19b1c182a8fb1edbfb707fdb811437c

This function is basically a generic AGFL block iterator, so promote it
to libxfs ahead of online repair wanting to use it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: superblock scrub should use short-lived buffers

Source kernel commit: 689e11c84b15866619e7582486acacaf79d7e3e2

Secondary superblocks are rarely used, so create a helper to read a
given non-primary AG's superblock and ensure that it won't stick around
hogging memory.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: factor out nodiscard helpers

Source kernel commit: 4e529339af15226a30e0ca044aa2d78ba3518494

The changes to skip discards of speculative preallocation and
unwritten extents introduced several new wrapper functions through
the bunmapi -> extent free codepath to reduce churn in all of the
associated callers. In several cases, these wrappers simply toggle a
single flag to skip or not skip discards for the resulting blocks.

The explicit _nodiscard() wrappers for such an isolated set of
callers are a bit overkill. Kill off these wrappers and replace them
with calls to the underlying functions in the contexts that need to
control discard behavior. Retain the wrappers that preserve the
original calling conventions to serve the original purpose of
reducing code churn.

This is a refactoring patch and does not change behavior.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: add BMAPI_NORMAP flag to perform block remapping without updating rmapbt

Source kernel commit: 95eb308caa0ff7c4a0a86053422934737e6e6dc7

Add a new flag, XFS_BMAPI_NORMAP, which will perform file block
remapping without updating the rmapbt. This will be used by the repair
code to reconstruct bmbts from the rmapbt, in which case we don't want
the rmapbt update.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: add repair helpers for the reference count btree

Source kernel commit: 08daa3ccf541b8cc59d198daaccefae17fe565ae

Add a couple of functions to the refcount btree and generic btree code
that will be used to repair the refcountbt.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: add repair helpers for the reverse mapping btree

Source kernel commit: 4d4f86b49fd0d88677ce45c9cc544cdf663bf047

Add a couple of functions to the reverse mapping btree that will be used
to repair the rmapbt.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: expose various functions to repair code

Source kernel commit: 7f8f1313d91a7db9546de6e5bfeb1a2eebb1fef5

Expose various helpers that the repair code will want to use.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: add helpers to calculate btree size

Source kernel commit: 14861c47400b4a1669956d8b027fe4b7855e39f1

Add a bunch of helper functions that calculate the sizes of various
btrees. These will be used to repair btrees and btree headers.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: replace XFS_QMOPT_DQALLOC with a simple boolean

Source kernel commit: 30ab2dcf2c0693e518b1920e6edc4212cba10d10

DQALLOC is only ever used with xfs_qm_dqget*, and the only flag that the
_dqget family of functions cares about is DQALLOC. Therefore, change
it to a boolean 'can alloc?' flag for the dqget interfaces where that
makes sense.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove unnecessary xfs_qm_dqattach parameter

Source kernel commit: c14cfccabe2af251388e20c1004ac5c6a970ba53

The flags argument is always zero, get rid of it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: refactor XFS_QMOPT_DQNEXT out of existence

Source kernel commit: 2e330e76e03dd0caee6804b49e9e49d7c3998867

There's only one caller of DQNEXT and its semantics can be moved into a
separate function, so create the function and get rid of the flag.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: don't discard on free of unwritten extents

Source kernel commit: 84ca484ecf2f8e1dc3afddc895cb9b62c531db49

Unwritten extents by definition have not been written to until they
are converted to normal written extents. If unwritten extents are
freed from a file, it is therefore guaranteed that the blocks have
not been written to since allocation (note that zero range punches
and reallocates blocks).

To cut down on online discards generated from workloads that make
use of preallocation, skip discards of extents if they are in the
unwritten state when the extent is freed.

Note that this optimization does not apply to log recovery, during
which all freed extents are discarded if online discard is enabled.
Also note that it may be possible for a filesystem crash to occur
after write completion of an unwritten extent but before unwritten
conversion such that the extent remains unwritten after log
recovery. Since this pseudo-inconsistency may already be possible
after a crash (consider writing to recently allocated blocks where
the allocation transaction is lost after a crash), this change
shouldn't introduce any fundamental limitations that don't already
exist. In short, on storage stacks where discards are important,
it's good practice to run an occasional fstrim even with online
discard enabled in the filesystem, particularly after a crash.
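
The decision described above can be sketched as a small predicate. This
is only an illustration under stated assumptions: the enum and the
function name are hypothetical and are not the kernel's types, flags,
or call chain.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; these are not the kernel's extent states. */
enum ext_state { EXT_WRITTEN, EXT_UNWRITTEN };

/* Decide whether a freed extent should generate an online discard. */
bool should_discard_on_free(bool online_discard, bool log_recovery,
			    enum ext_state state)
{
	assert(state == EXT_WRITTEN || state == EXT_UNWRITTEN);

	if (!online_discard)
		return false;
	if (log_recovery)
		return true;	/* recovery discards every freed extent */
	/* Unwritten blocks were never written, so skip the discard. */
	return state != EXT_UNWRITTEN;
}
```

Under this sketch, a preallocated-but-never-written extent freed at
runtime produces no discard, while the same extent freed during log
recovery still does.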

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: add bmapi nodiscard flag

Source kernel commit: fcb762f5de2e534ab47b5f034fe484c2b25b4d51

Freed extents are unconditionally discarded when online discard is
enabled. Define XFS_BMAPI_NODISCARD to allow callers to bypass
discards when unnecessary. For example, this will be useful for
eofblocks trimming.

This patch does not change behavior.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: get rid of the log item descriptor

Source kernel commit: e6631f85546c8ff8842f62c73be44ff502d4287a

It's just a connector between a transaction and a log item. There's
a 1:1 relationship between a log item descriptor and a log item,
and a 1:1 relationship between a log item descriptor and a
transaction. Both relationships are created and terminated at the
same time, so why do we even have the descriptor?

Replace it with a specific list_head in the log item and a new
log item dirtied flag to replace the XFS_LID_DIRTY flag.
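
A rough sketch of the resulting shape, assuming a minimal hand-rolled
list: the log item embeds its own list linkage and carries the dirty
bit itself, so no separate descriptor object is needed. All struct,
field, and flag names here are hypothetical, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>

struct list_node { struct list_node *prev, *next; };

#define ITEM_DIRTY (1u << 0)	/* hypothetical stand-in for the new dirty flag */

struct log_item {
	struct list_node trans_link;	/* links the item into its transaction */
	unsigned int flags;
};

struct transaction {
	struct list_node items;		/* circular list of attached log items */
};

void trans_init(struct transaction *tp)
{
	tp->items.prev = tp->items.next = &tp->items;
}

/* Attach an item at the tail of the transaction; it starts out clean. */
void trans_add_item(struct transaction *tp, struct log_item *lip)
{
	struct list_node *n;

	assert(tp != NULL && lip != NULL);
	n = &lip->trans_link;
	n->prev = tp->items.prev;
	n->next = &tp->items;
	tp->items.prev->next = n;
	tp->items.prev = n;
	lip->flags = 0;
}

/* Detach an item; the link and the dirty state go away together. */
void trans_del_item(struct log_item *lip)
{
	struct list_node *n = &lip->trans_link;

	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
	lip->flags &= ~ITEM_DIRTY;
}

/* Walk the transaction and count items marked dirty. */
int trans_count_dirty(const struct transaction *tp)
{
	int count = 0;

	for (const struct list_node *n = tp->items.next; n != &tp->items; n = n->next) {
		const struct log_item *lip = (const struct log_item *)
			((const char *)n - offsetof(struct log_item, trans_link));
		if (lip->flags & ITEM_DIRTY)
			count++;
	}
	return count;
}
```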

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[darrick: fix up deferred agfl intent finish_item use of LID_DIRTY]
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: add caller IP to xfs_defer* tracepoints

Source kernel commit: e632a5690c734a383a83272a502be79cb2c040e5

So it's clear in the trace where they are being called from.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: add missing rmap error return

Source kernel commit: 52101dfe56f71d8cb140c2440d95affa25a53746

xfs_rmap_lookup_le_range can return errors, so we need to check for
them and bail out.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: bmap debugging should never panic the system

Source kernel commit: cec572561a748396c783c1ea91a289816d3c4f18

Don't panic() the system if the bmap records are garbage, just call
ASSERT which gives us the same backtrace but enables developers to
control if the system goes down or not. This makes debugging with
generic/388 much easier because it won't reboot the machine midway
through a run just because btree_read_bufl returns EIO when the fs has
already shut down.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: defer agfl block frees from deferred ops processing context

Source kernel commit: 2bc5eba8b6957824631205aeffc75db8dbb5426a

Now that AGFL block frees are deferred when dfops is set in the
transaction, start deferring AGFL block frees from contexts that are
known to push the limits of existing log reservations.

The first such context is deferred operation processing itself. This
primarily targets deferred extent frees (such as file extents and
inode chunks), but in doing so covers all allocation operations that
occur in deferred operation processing context.

Update xfs_defer_finish() to set and reset ->t_agfl_dfops across the
processing sequence. This means that any AGFL block frees due to
allocation events result in the addition of new EFIs to the dfops
rather than being processed immediately. xfs_defer_finish() rolls
the transaction at least once more to process the frees of the AGFL
blocks back to the allocation btrees and returns once the AGFL is
rectified.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: defer agfl block frees when dfops is available

Source kernel commit: f8f2835a9cf300079835e1adb1d90f85033be04c

The AGFL fixup code executes before every block allocation/free and
rectifies the AGFL based on the current, dynamic allocation
requirements of the fs. The AGFL must hold a minimum number of
blocks to satisfy a worst case split of the free space btrees caused
by the impending allocation operation. The AGFL is also updated to
maintain the implicit requirement for a minimum number of free slots
to satisfy a worst case join of the free space btrees.

Since the AGFL caches individual blocks, AGFL reduction typically
involves multiple, single block frees. We've had reports of
transaction overrun problems during certain workloads that boil down
to AGFL reduction freeing multiple blocks and consuming more space
in the log than was reserved for the transaction.

Since the objective of freeing AGFL blocks is to ensure free AGFL
slots are available for the upcoming allocation, one way to
address this problem is to release surplus blocks from the AGFL
immediately but defer the free of those blocks (similar to how
file-mapped blocks are unmapped from the file in one transaction and
freed via a deferred operation) until the transaction is rolled.
This turns AGFL reduction into an operation with predictable log
reservation consumption.

Add the capability to defer AGFL block frees when a deferred ops
list is available to the AGFL fixup code. Add a dfops pointer to the
transaction to carry dfops through various contexts to the allocator
context. Deferring AGFL frees is conditional behavior based on
whether the transaction pointer is populated. The long term
objective is to reuse the transaction pointer to clean up all
unrelated callchains that pass dfops on the stack along with a
transaction and in doing so, consistently defer AGFL blocks from the
allocator.

A bit of customization is required to handle deferred completion
processing because AGFL blocks are accounted against a per-ag
reservation pool and AGFL blocks are not inserted into the extent
busy list when freed (they are inserted when used and released back
to the AGFL). Reuse the majority of the existing deferred extent
free infrastructure and customize it appropriately to handle AGFL
blocks.

Note that this patch only adds infrastructure. It does not change
behavior because no callers have been updated to pass ->t_agfl_dfops
into the allocation code.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: create agfl block free helper function

Source kernel commit: 4223f659dd3edd9e561d90488c6ae332a0a05148

Refactor the AGFL block free code into a new helper such that it can
be invoked from deferred context. No functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: print specific dqblk that failed verifiers

Source kernel commit: 72c5c5f6d01c859dfe16c4910a5222ed9393c37c

Rather than printing the top of the buffer that held a corrupted dqblk,
restructure things to print out the specific one that failed by pushing
the calls to the verifier_error function down into the verifier which
iterates over the buffer and detects the error.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: add full xfs_dqblk verifier

Source kernel commit: 7224fa482a6daa0558792e03a209e08d34690a26

Add an xfs_dqblk verifier so that it can check the uuid on V5 filesystems;
it calls the existing xfs_dquot_verify verifier to validate the
xfs_disk_dquot_t contained inside it. This lets us move the uuid
verification out of the crc verifier, which makes little sense.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: pass full xfs_dqblk to repair during quotacheck

Source kernel commit: 48fa1db87f730da1aed2d3df0cc8c33c7c133b4b

It's a bit dicey to pass in the smaller xfs_disk_dquot and then cast it to
something larger; pass in the full xfs_dqblk so we know the caller has sent
us the right thing. Rename the function to xfs_dqblk_repair for
clarity.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: check type in quota verifier during quotacheck

Source kernel commit: 57ab324553bbfedc8e732eb570edfac0f5cfe57e

During quotacheck we send in the quota type, so verify that as well.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove unused flags arg from xfs_dquot_verify

Source kernel commit: e381a0f6c28a3f2a452d5fba9b917f03e5dc4ffb

Long ago the flags argument was used to determine whether to issue warnings
about corruptions, but that's done elsewhere now and the flag is unused
here, so remove it.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: make xfs_buf_incore out of line

Source kernel commit: 8925a3dc4771004b3e697e7159fa87be2aa5dd43

Move xfs_buf_incore out of line and make it the only way to look up
a buffer in the buffer cache from outside the buffer cache. Convert
the external users of _xfs_buf_find() to xfs_buf_incore() and make
_xfs_buf_find() static.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: actually rename xfs_incore -> xfs_buf_incore]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: clear extent size hints when clearing inode core

In kernel 4.18 we become more strict about what can be in the extent
size hint fields, even for freed inodes. Therefore, if repair decides
to clear out an inode core, zero the hint fields and clear the flags so
that the kernel won't trip over the cleared inode if and when it tries
to read the chunk.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: Release v4.17.0

Update all the necessary files for a 4.17.0 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: Release v4.17.0-rc1

Update all the necessary files for a 4.17.0-rc1 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: clarify -c in bmap documentation

The bmap -c parameter displays the cow fork information for a file if
the kernel was built with CONFIG_XFS_DEBUG=y. Since xfs_bmap doesn't
support it and it doesn't work generally, remove it from the manpages.
However, xfstests relies on the -c command to be documented in the help
screen so leave it there with a warning about its use.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_bmap: remove -c from manpage

There is no -c switch so remove the documentation.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_bmap: don't reject -e

The xfs_io bmap command has an -e switch that prints delalloc extents
without fsync'ing the file first. The xfs_bmap manpage says it'll pass
-e through, but it doesn't. Fix the script and fix the weird manpage
discrepancy where it doesn't list -e in the available options but
discusses it anyway.

Fixes: 7536ce44f6 ("xfs_io: bmap should support querying CoW fork, shared blocks")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: Fix root inode's parent when it's bogus for sf directory

Currently, when the root inode is in short form and its parent ino
has an invalid value, process_sf_dir2() ends up not fixing it,
because if verify_inum() fails we never get to the next case which
would fix the root inode's parent pointer.

This behavior triggers the following assert in process_dir2():

   ASSERT((ino != mp->m_sb.sb_rootino && ino != *parent) ||
        (ino == mp->m_sb.sb_rootino &&
        (ino == *parent || need_root_dotdot == 1)));

This patch fixes this behavior by making sure we always properly
handle the rootino parent pointer in process_sf_dir2().

Signed-off-by: Marco Benatto <mbenatto@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: fix signed comparison problem in copy_file_range

cvtnum() returns a signed long long, so the type of 'len' should be a
signed type so that a user entering a negative length doesn't produce
some huge positive integer. The negative len check demands it anyway.
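
The pitfall can be sketched as follows. parse_len() is a hypothetical
stand-in for cvtnum() (the real function takes more arguments and
understands size suffixes); the other names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for cvtnum(): signed result, -1 on bad input. */
long long parse_len(const char *s)
{
	char *end;
	long long v;

	assert(s != NULL);
	v = strtoll(s, &end, 10);
	if (end == s || *end != '\0')
		return -1;
	return v;
}

/* Storing the result in an unsigned type turns -1 into a huge length. */
unsigned long long length_as_unsigned(const char *s)
{
	return (unsigned long long)parse_len(s);
}

/* With a signed length the negative check works as intended. */
bool length_is_valid(const char *s)
{
	long long len = parse_len(s);

	return len >= 0;
}
```

With an unsigned variable, a "len < 0" test can never be true, which is
why the check only becomes meaningful once the type is signed.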

Coverity-id: 1435895
Fixes: 25b4549 ("xfs_io: Make copy_range arguments understand *iB values")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: Don't ASSERT on unrecognized metadata

For some types (for example attr3), if the metadata is not recognized
as the requested type, we can hit an ASSERT when trying to print
the type:

xfs_db: print.c:164: print_flist_1: Assertion `fa->arg & 64' failed.
Aborted (core dumped)

This can happen for corrupted metadata or even just a misdirected
user command; there's no reason to ASSERT. If we get here, print
something helpful and carry on.

[sandeen: write the commit log]

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

mkfs.xfs: if either sunit or swidth is nonzero, the other must be as well

Don't allow the user to set one but not the other.
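
A minimal sketch of the rule, assuming the two values have already been
parsed into integers; the function name is hypothetical and this is not
the mkfs.xfs validation code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Stripe unit and stripe width must be both zero or both nonzero. */
bool stripe_geometry_ok(int sunit, int swidth)
{
	/* Negative values are assumed to be rejected before this point. */
	assert(sunit >= 0 && swidth >= 0);

	return (sunit == 0) == (swidth == 0);
}
```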

Reported-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: handle realtime bitmap / summary files as text

Use handle_text to print realtime bitmap / summary file blocks instead
of erroring out.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: Make copy_range arguments understand *iB values

Arguments such as 2MiB or 2M are converted to 2 because copy_range uses
strtoull(). Convert strtoull() to cvtnum().
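
The difference can be illustrated with a small sketch.
parse_with_suffix() is a hypothetical, simplified parser written in the
spirit of cvtnum(); it is not the xfsprogs implementation and only
knows k/m/g with an optional "iB" tail.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* strtoull() stops at the first non-digit, so "2MiB" parses as 2. */
unsigned long long parse_plain(const char *s)
{
	assert(s != NULL);
	return strtoull(s, NULL, 10);
}

/* Simplified suffix-aware parser; returns -1 on anything it rejects. */
long long parse_with_suffix(const char *s)
{
	char *end;
	long long v;
	int shift;

	assert(s != NULL);
	v = strtoll(s, &end, 10);
	if (end == s || v < 0)
		return -1;

	switch (*end) {
	case '\0':
		return v;
	case 'k': case 'K': shift = 10; break;
	case 'm': case 'M': shift = 20; break;
	case 'g': case 'G': shift = 30; break;
	default:
		return -1;
	}
	end++;
	if (end[0] == 'i' && end[1] == 'B')
		end += 2;		/* accept "MiB" as well as "M" */
	if (*end != '\0')
		return -1;
	if (v > (LLONG_MAX >> shift))
		return -1;		/* would overflow */
	return v << shift;
}
```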

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: fix null pointer deref when complaining about scrub command

Don't increment optind until we've validated that argv[optind] is a
valid scrub/repair subcommand and do not need to complain about
argv[optind].
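
The ordering issue can be sketched like this, assuming a made-up
subcommand table and helper names (the real xfs_io code and its list of
scrub types differ). The index is advanced only after the word has been
validated, so the error path can still name the offending argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subcommand table; the real xfs_io list differs. */
static const char *const subcommands[] = { "probe", "sb", "agf", NULL };

int lookup_subcommand(const char *name)
{
	for (int i = 0; subcommands[i] != NULL; i++)
		if (strcmp(subcommands[i], name) == 0)
			return i;
	return -1;
}

/*
 * Returns the subcommand index, or -1 on failure. On failure *bad is the
 * rejected word, or NULL if there was no word left to parse.
 */
int parse_subcommand(int argc, char **argv, int *idx, const char **bad)
{
	int type;

	assert(argv != NULL && idx != NULL && bad != NULL);
	*bad = NULL;
	if (*idx >= argc)
		return -1;		/* nothing left to parse */
	type = lookup_subcommand(argv[*idx]);
	if (type < 0) {
		*bad = argv[*idx];	/* still the offending word */
		return -1;
	}
	(*idx)++;			/* consume the word only once it is valid */
	return type;
}
```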

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: fix libxfs api violations in quota repair code

My "repair quotas" patch forgot about our libxfs API tricks,
fix that.

Fixes: 5857dce ("xfs_repair: check and repair quota metadata")
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: actually check for errors coming from close()

Report errors returned by close().
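
A minimal sketch of the pattern, with a hypothetical helper name; the
xfs_scrub code reports errors through its own logging routines rather
than plain fprintf().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Returns 0 on success or the errno value reported by close(). */
int close_and_report(int fd, const char *what)
{
	assert(what != NULL);

	if (close(fd) < 0) {
		int err = errno;	/* save before fprintf() can clobber it */

		fprintf(stderr, "%s: close failed: %s\n", what, strerror(err));
		return err;
	}
	return 0;
}
```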

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

fsck: fix more bashisms

command -v is a bashism, so we need to get rid of it. The shell returns
an error code of 127 if it couldn't invoke xfs_repair, so teach
repair2fsck_code to deal with this.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_buflock: record buffer initialization

Buffers are created locked, so we have to factor that into the buffer
state machine that the script utilizes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_buflock: record line number of trace where we locked the buffer

Enhance the debug output by reporting at which line in the trace output
we locked a particular buffer.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_buflock: ignore if buffer already locked

If the trace data says we ran trylock but we were already locked, don't
record another lock.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: fix integer handling issues

When we shift sb_logblocks to the left we need to ensure that we have
enough storage space to shift correctly. Cast logblocks to a 64-bit
type so that we don't screw up the check.
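
The problem can be shown with a short sketch. The function names are
hypothetical, and it assumes a platform where unsigned int is 32 bits
wide.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy: the shift happens in 32 bits, so high bits are lost before widening. */
uint64_t log_bytes_narrow(uint32_t logblocks, unsigned int blocklog)
{
	assert(blocklog < 32);
	return logblocks << blocklog;
}

/* Fixed: widen to 64 bits first, then shift. */
uint64_t log_bytes_wide(uint32_t logblocks, unsigned int blocklog)
{
	assert(blocklog < 32);
	return (uint64_t)logblocks << blocklog;
}
```

For example, 2^20 log blocks of 4096 bytes (blocklog 12) come to 2^32
bytes, which wraps to zero in the narrow version.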

Coverity-id: 1435810
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: Allow -P and -L to be given to open for O_PATH and O_NOFOLLOW

Allow "open -P" to specify O_PATH so that paths which would otherwise be
unopenable might be opened for stat()'ing.  Such things include files that
would incur an access error or device files for which no corresponding
driver is available.

Allow "-L" to be given in conjunction with O_PATH to specify O_NOFOLLOW
also.

We also have to avoid calling xfsctl() if O_PATH is given as ioctls are
forbidden on such fds.  This means we cannot retrieve the geometry
information on an XFS filesystem, so the record gets cleared instead.  For
the moment, only the xfsctl() calls in the 'open' command are
conditionalised.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: syncfs can fail

syncfs can return an error. Report one if it does. Also, ensure that
xfs_io will exit with a non-zero status in that case.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: add label command

This adds an online get/set/clear label command to xfs_io.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: check and repair quota metadata

Today, quota inodes are not checked at all in xfs_repair.  (This is
a little odd, because xfs_check used to do it in process_quota()).

The kernel has quota inode validation and repair routines, but it is
out of the ordinary for the kernel to be doing metadata repair.  And
now that we have metadata verifiers, this also yields a surprisingly
noisy mount if quota inodes are corrupted, even immediately after an
xfs_repair.

So this patch allows xfs_repair to fix the quota inode metadata.

Quotacheck is still left for the kernel.  After a few more releases,
I'll propose removing the repair calls from quotacheck.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: be careful about what we stat in platform_check_mount

After we lost ustat(2) in commit 4e7a824, we ended up with a slightly
bonkers method to determine if our target block device was mounted:
it goes through every entry returned by getmntent and stats the dir
to see if its underlying device matches ours.

Unfortunately that dir might be a hung nfs server and sadness ensues.

So just do a really simple sanity check before we try to stat the
mountpoint: does its device start with a / ? If not, skip it.

Fixes: 4e7a824 ("libxfs/linux.c: Replace use of ustat by stat")
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_growfs: refactor geometry reporting

Use the new geometry pretty-printing function.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_info: call xfs_db for offline filesystems

If the online filesystem geometry query doesn't work, try using xfs_db
to see if we can grab the information offline.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_info: move to xfs_spaceman

Move xfs_info to be under spaceman so that we can remove growfs -N.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_spaceman: add a superblock info command

Add an 'info' command to pretty-print the superblock geometry.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_spaceman: print a nicer message when file path isn't on xfs

If the file path passed in is not something on an xfs filesystem, print
a nice message about that instead of yelling about ioctls.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: add a superblock info command

Add an 'info' command to pretty-print the superblock geometry.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

mkfs: use geometry generation / helper functions

Since libxfs now has a function to fill out the geometry structure
and libfrog has a function to pretty-print the geometry, have mkfs
use the two helpers instead of open-coding it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libfrog: refactor fs geometry printing function

Move the fs geometry printing function to libfrog so that mkfs and others
can share.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libfrog: move platform specific runtime support code out of libxfs

Move the platform support code to libfrog, which should remove the final
dependency of libfrog on libxfs. libfrog is the runtime support
library, and these files provide platform support.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: don't assert on bad '.' entry in no-modify mode

During phase 3, we check the '.' entry of a directory inode and (in
modify mode) zap it if the name isn't valid. During phase 6, we assert
that the '.' entry now reflects the correct name. In no-modify mode
this is incorrect because we didn't actually fix anything, so repair
asserts and crashes.

Found by fuzzing bu[0].namelen = 4 in xfs/387 and running xfs_repair -n.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: check inode nsec for obviously garbage values

It makes no sense to have an nsec value that is larger than 1 second,
so zero the field if this is the case.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
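
A minimal sketch of the nsec check described above, under two assumptions
that are not stated in the commit message: the field is treated as an
unsigned 32-bit value already converted to CPU byte order, and a value of
exactly 1,000,000,000 is also considered garbage. The name zero_bad_nsec is
hypothetical.

```c
#include <stdint.h>

/* One second expressed in nanoseconds. */
#define NSEC_PER_SEC_SKETCH 1000000000u

/*
 * Hypothetical helper: zero the nanosecond field if it cannot be a valid
 * sub-second value. Returns 1 if the field was changed, 0 otherwise.
 */
static int zero_bad_nsec(uint32_t *nsec)
{
	if (*nsec < NSEC_PER_SEC_SKETCH)
		return 0;	/* plausible value, leave it alone */
	*nsec = 0;
	return 1;
}
```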

xfs_repair: actually fix .. entries that point to inode zero

If we encounter a directory with an entry that points to inode zero,
we'll crash due to an ASSERT during process_inode_chunk. This is due to
process_dir2_data not arranging for phase 6 to fix the parent pointer
when '..' -> 0, so do that. Found via xfs/386 fuzzing bu[1].inumber to
zero.

[sandeen: change "parent pointer" to parent directory for clarity]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: don't crash if da btree is corrupt

In the recursive verify_da_path call chain, we decide to examine the
next upper level if the current entry points past the end of the
entries. However, we don't check for a node with zero entries (which
should be impossible) so we run right off the end of the da cursor's
level array and crash. Found by fuzzing hdr.count in xfs/402.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: only update in-core extent state after scanning full extent

In process_bmbt_reclist_int, only update the in-core extent state after
clearing the entire extent for conflicts. If we encounter conflicts
we'll try rebuilding the fork from rmap data and rescanning the fork.
It is essential to avoid polluting the in-memory state with garbage
data so that we don't end up nuking other files needlessly. Found by
fuzzing recs[1].blockcount = middlebit in xfs/380.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: invalidate dirty dir buffers when we zap a directory

If we decide to rebuild a directory in phase 6, we need to find and
invalidate all of the old directory buffers so that they don't get
written out, which can trigger write verifier errors when we finish.
This fixes the write verifier errors in phase 7 that can occur via
xfs/382.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: treat zero da btree pointers as corruption

If a da btree pointer is zero (i.e. the beginning of the fork) report
this as a corrupt tree to the caller instead of telling it that
everything is good. Fixes assertion errors when fuzzing
nbtree[0].before to zero in xfs/394.

[sandeen: tweak comment above change for clarity]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: zap corrupt remote symlink

If a remote symlink has a corrupted remote block, just zap the symlink.
Fixes total lack of repair activity in xfs/382.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: validate some of the log space information

Validate the log space information in a similar manner to the kernel so
that repair will stumble over (and fix) broken log info that prevents
mounting. Fixes logsunit fuzz-and-fix failures in xfs/350.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: don't leak buffer on xattr remote buf verifier error

If we try to read an xattr remote buffer and hit a verifier error,
release the buffer instead of leaking it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: examine all remote attribute blocks

Examine all remote xattr values of a file, not just the XFS_ATTR_ROOT
values. This enables us to detect and zap corrupt user xattrs, as
tested by xfs/404.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

Merge branch 'libxfs-4.17-sync' into for-next

xfs: don't fail when converting shortform attr to long form during ATTR_REPLACE

Source kernel commit: 7b38460dc8e4eafba06c78f8e37099d3b34d473c

Kanda Motohiro reported that expanding a tiny xattr into a large xattr
fails on XFS because we remove the tiny xattr from a shortform fork and
then try to re-add it after converting the fork to extents format having
not removed the ATTR_REPLACE flag. This fails because the attr is no
longer present, causing a fs shutdown.

This is derived from the patch in his bug report, but we really
shouldn't ignore a nonzero retval from the remove call.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199119
Reported-by: kanda.motohiro@gmail.com
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: set format back to extents if xfs_bmap_extents_to_btree

Source kernel commit: 2c4306f719b083d17df2963bc761777576b8ad1b

If xfs_bmap_extents_to_btree fails in a mode where we call
xfs_iroot_realloc(-1) to de-allocate the root, set the
format back to extents.

Otherwise we can assume we can dereference ifp->if_broot
based on the XFS_DINODE_FMT_BTREE format, and crash.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199423
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: enhance dinode verifier

Source kernel commit: b42db0860e13067fcc7cbfba3966c9e652668bbc

Add several more validations to xfs_dinode_verify:

- For LOCAL data fork formats, di_nextents must be 0.
- For LOCAL attr fork formats, di_anextents must be 0.
- For inodes with no attr fork offset,
  - format must be XFS_DINODE_FMT_EXTENTS if set at all
  - di_anextents must be 0.

Thanks to dchinner for pointing out a couple related checks I had
forgotten to add.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199377
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
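
The extra checks listed in the dinode verifier change above could be
expressed roughly like this. It is a sketch over a simplified, hypothetical
structure (dinode_sketch) with hypothetical format constants, not the
on-disk xfs_dinode or the real XFS_DINODE_FMT_* values; "unset" is assumed
here to mean a zero attr format.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the fork format values. */
enum fork_fmt_sketch { FMT_UNSET = 0, FMT_LOCAL, FMT_EXTENTS, FMT_BTREE };

/* Hypothetical, already byte-swapped view of the fields being checked. */
struct dinode_sketch {
	enum fork_fmt_sketch	format;		/* data fork format */
	uint32_t		nextents;	/* data fork extent count */
	enum fork_fmt_sketch	aformat;	/* attr fork format */
	uint16_t		anextents;	/* attr fork extent count */
	uint8_t			forkoff;	/* attr fork offset, 0 = none */
};

/* Return 1 if the inode passes the additional checks, 0 otherwise. */
static int dinode_extra_checks_ok(const struct dinode_sketch *d)
{
	/* LOCAL data fork: no extents may be counted. */
	if (d->format == FMT_LOCAL && d->nextents != 0)
		return 0;
	/* LOCAL attr fork: no extents may be counted. */
	if (d->aformat == FMT_LOCAL && d->anextents != 0)
		return 0;
	/* No attr fork at all. */
	if (d->forkoff == 0) {
		if (d->aformat != FMT_UNSET && d->aformat != FMT_EXTENTS)
			return 0;
		if (d->anextents != 0)
			return 0;
	}
	return 1;
}
```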

Merge branch 'libxfs-4.17-sync' into for-next

xfsprogs: Release v4.16.1

Update all the necessary files for a 4.16.1 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

fsck.xfs: do not use 'function' keyword

It was pointed out on irc that fsck.xfs uses the 'function' keyword
although it invokes /bin/sh - 'function' is a bashism. It's not needed
here, so just remove it.

Fixes: 04a2d5d ("fsck.xfs: allow forced repairs using xfs_repair")
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: Release v4.16.0

Update all the necessary files for a 4.16.0 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: non-scrub - remove unused function parameters

Source kernel commit: a1f69417c6f4d1c5280ffb795da7778cba1e1451

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>

xfs: clean up xfs_mount allocation and dynamic initializers

Source kernel commit: 72c44e35f02a1cb4032e476c398a7234badcf49f

Most of the generic data structures embedded in xfs_mount are
dynamically initialized immediately after mp is allocated. A few
fields are left out and initialized during the xfs_mountfs()
sequence, after mp has been attached to the superblock.

To clean this up and help prevent premature access of associated
fields, refactor xfs_mount allocation and all dependent init calls
into a new helper. This self-documents that all low level data
structures (i.e., locks, trees, etc.) should be initialized before
xfs_mount is attached to the superblock.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>

xfs: remove dead inode version setting code

Source kernel commit: fa4493f0d9b3ac8f36743f1a26e2318b449ee4c8

We can only get into the branch if CRCs are enabled, so there's no
need to check inside the branch for CRCs being enabled....

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>

xfs: don't accept inode buffers with suspicious unlinked chains

Source kernel commit: 6a96c5650568a2218712d43ec16f3f82296a6c53

When we're verifying inode buffers, sanity-check the unlinked pointer.
We don't want to run the risk of trying to purge something that's
obviously broken.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>

xfs: move inode extent size hint validation to libxfs

Source kernel commit: 8bb82bc12a9e75dd47047d9a2e53135cc5e5787b

Extent size hint validation is used by scrub to decide if there's an
error, and it will be used by repair to decide to remove the hint.
Since these use the same validation functions, move them to libxfs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>

xfs: refactor inode buffer verifier error logging

Source kernel commit: 6edb181053f067cee64d4239830062cb40ddab00

When the inode buffer verifier encounters an error, it's much more
helpful to print a buffer from the offending inode instead of just the
start of the inode chunk buffer.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>

xfs: refactor inode verifier error logging

Source kernel commit: 90a58f95717b46f67756580ad5f8b698304e4bad

Refactor some of the inode verifier failure logging call sites to use
the new xfs_inode_verifier_error method which dumps the offending buffer
as well as the code location of the failed check. This trims the
output, makes it clearer to the admin that repair must be run, and gives
the developers more details to work from.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>

xfs: refactor bmap record validation

Source kernel commit: 30b0984d9117dd14c895265886d34335856b712b

Refactor the bmap validator into a more complete helper that looks for
extents that run off the end of the device, overflow into the next AG,
or have invalid flag states.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
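
To illustrate the three classes of bad extent named in the bmap validation
change above, here is a simplified sketch. It assumes a linear block address
space divided into equal-sized AGs, which is not how XFS actually encodes
block numbers, and it assumes that "invalid flag state" means an unwritten
extent in the attribute fork. The struct and function names are
hypothetical.

```c
#include <stdint.h>

/* Hypothetical in-core extent record. */
struct bmap_rec_sketch {
	uint64_t	startblock;	/* first block, linear numbering */
	uint64_t	blockcount;	/* length in blocks */
	int		unwritten;	/* nonzero if flagged unwritten */
};

/* Return 1 if the record looks sane, 0 if it should be rejected. */
static int bmap_rec_ok(const struct bmap_rec_sketch *r, uint64_t fs_blocks,
		       uint64_t ag_blocks, int is_attr_fork)
{
	uint64_t last;

	/* Guard for the arithmetic below; not a check named in the text. */
	if (r->blockcount == 0 || ag_blocks == 0)
		return 0;
	/* Extent starts or runs off the end of the device. */
	if (r->startblock >= fs_blocks)
		return 0;
	if (r->blockcount > fs_blocks - r->startblock)
		return 0;
	/* Extent overflows into the next AG. */
	last = r->startblock + r->blockcount - 1;
	if (r->startblock / ag_blocks != last / ag_blocks)
		return 0;
	/* Assumed flag rule: no unwritten extents in the attr fork. */
	if (r->unwritten && is_attr_fork)
		return 0;
	return 1;
}
```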

xfs: sanity-check the unused space before trying to use it

Source kernel commit: 6915ef35c0350e87a104cb4c4ab2121c81ca7a34

In xfs_dir2_data_use_free, we examine on-disk metadata and ASSERT if
it doesn't make sense. Since a carefully crafted fuzzed image can cause
the kernel to crash after blowing a bunch of assertions, let's move
those checks into a validator function and rig everything up to return
EFSCORRUPTED to userspace. Found by lastbit fuzzing ltail.bestcount via
xfs/391.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>

xfs: detect agfl count corruption and reset agfl

Source kernel commit: a27ba2607e60312554cbcd43fc660b2c7f29dc9c

The struct xfs_agfl v5 header was originally introduced with
unexpected padding that caused the AGFL to operate with one less
slot than intended. The header has since been packed, but the fix
left an incompatibility for users who upgrade from an old kernel
with the unpacked header to a newer kernel with the packed header
while the AGFL happens to wrap around the end. The newer kernel
recognizes one extra slot at the physical end of the AGFL that the
previous kernel did not. The new kernel will eventually attempt to
allocate a block from that slot, which contains invalid data, and
cause a crash.

This condition can be detected by comparing the active range of the
AGFL to the count. While this detects a padding mismatch, it can
also trigger false positives for unrelated flcount corruption. Since
we cannot distinguish a size mismatch due to padding from unrelated
corruption, we can't trust the AGFL enough to simply repopulate the
empty slot.

Instead, avoid unnecessarily complex detection logic and use a
solution that can handle any form of flcount corruption that slips
through read verifiers: distrust the entire AGFL and reset it to an
empty state. Any valid blocks within the AGFL are intentionally
leaked. This requires xfs_repair to rectify (which was already
necessary based on the state the AGFL was found in). The reset
mitigates the side effect of the padding mismatch problem from a
filesystem crash to a free space accounting inconsistency. The
generic approach also means that this patch can be safely backported
to kernels with or without a packed struct xfs_agfl.

Check the AGF for an invalid freelist count on initial read from
disk. If detected, set a flag on the xfs_perag to indicate that a
reset is required before the AGFL can be used. In the first
transaction that attempts to use a flagged AGFL, reset it to empty,
warn the user about the inconsistency and allow the freelist fixup
code to repopulate the AGFL with new blocks. The xfs_perag flag is
cleared to eliminate the need for repeated checks on each block
allocation operation.

This allows kernels that include the packing fix commit 96f859d52bcb
("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct")
to handle older unpacked AGFL formats without a filesystem crash.

Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chiluk <chiluk+linuxxfs@indeed.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
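
The detection step described above, comparing the active range of the AGFL
to the freelist count, can be sketched as below. The function name and
parameters are hypothetical, and the index bounds checks are an added
assumption; the commit text only describes the active-range comparison. The
list is treated as a circular buffer of agfl_size slots running from first
to last inclusive.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: return 1 if the freelist count disagrees with the
 * number of slots between `first` and `last`, meaning the AGFL should be
 * distrusted and reset, 0 if the two agree.
 */
static int agfl_needs_reset_sketch(uint32_t agfl_size, uint32_t first,
				   uint32_t last, uint32_t count)
{
	uint32_t active;

	/* Assumed extra guards: indexes and count must fit the list. */
	if (first >= agfl_size || last >= agfl_size)
		return 1;
	if (count > agfl_size)
		return 1;

	if (count == 0)
		active = 0;
	else if (last >= first)
		active = last - first + 1;		/* no wrap */
	else
		active = agfl_size - first + last + 1;	/* wrapped */

	return active != count;
}
```

With an illustrative list of 119 slots, first = 117 and last = 1 span four
slots (117, 118, 0, 1). A count of 3, as a kernel that never used slot 118
would have recorded, no longer matches and triggers the reset.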

xfs: account only rmapbt-used blocks against rmapbt perag res

Source kernel commit: 0ab32086d0becee56c75a8ba21f16ac08b80f304

The rmapbt perag metadata reservation reserves blocks for the
reverse mapping btree (rmapbt). Since the rmapbt uses blocks from
the agfl and perag accounting is updated as blocks are allocated
from the allocation btrees, the reservation actually accounts blocks
as they are allocated to (or freed from) the agfl rather than the
rmapbt itself.

While this works for blocks that are eventually used for the rmapbt,
not all agfl blocks are destined for the rmapbt. Blocks that are
allocated to the agfl (and thus "reserved" for the rmapbt) but then
used by another structure lead to a growing inconsistency over time
between the runtime tracking of rmapbt usage vs. actual rmapbt
usage. Since the runtime tracking thinks all agfl blocks are rmapbt
blocks, it essentially believes that less future reservation is
required to satisfy the rmapbt than what is actually necessary.

The inconsistency is rectified across mount cycles because the perag
reservation is initialized based on the actual rmapbt usage at mount
time. The problem, however, is that the excessive drain of the
reservation at runtime opens a window to allocate blocks for other
purposes that might be required for the rmapbt on a subsequent
mount. This problem can be demonstrated by a simple test that runs
an allocation workload to consume agfl blocks over time and then
observe the difference in the agfl reservation requirement across an
unmount/mount cycle:

mount ...: xfs_ag_resv_init: ... resv 3193 ask 3194 len 3194
...
... : xfs_ag_resv_alloc_extent: ... resv 2957 ask 3194 len 1
umount...: xfs_ag_resv_free: ... resv 2956 ask 3194 len 0
mount ...: xfs_ag_resv_init: ... resv 3052 ask 3194 len 3194

As the above tracepoints show, the reservation requirement reduces
from 3194 blocks to 2956 blocks as the workload runs. Without any
other changes in the filesystem, the same reservation requirement
jumps from 2956 to 3052 blocks over a umount/mount cycle.

To address this divergence, update the RMAPBT reservation to account
blocks used for the rmapbt only rather than all blocks filled into
the agfl. This patch makes several high-level changes toward that
end:

1.) Reintroduce an AGFL reservation type to serve as an accounting
no-op for blocks allocated to (or freed from) the AGFL.
2.) Invoke RMAPBT usage accounting from the actual rmapbt block
allocation path rather than the AGFL allocation path.

The first change is required because agfl blocks are considered free
blocks throughout their lifetime. The perag reservation subsystem is
invoked unconditionally by the allocation subsystem, so we need a
way to tell the perag subsystem (via the allocation subsystem) to
not make any accounting changes for blocks filled into the AGFL.

The second change causes the in-core RMAPBT reservation usage
accounting to remain consistent with the on-disk state at all times
and eliminates the risk of leaving the rmapbt reservation
underfilled.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>

xfs: rename agfl perag res type to rmapbt

Source kernel commit: 215928633502a7296fec42614463bb49859787d6

The AGFL perag reservation type accounts all allocations that feed
into (or are released from) the allocation group free list (agfl).
The purpose of the reservation is to support worst case conditions
for the reverse mapping btree (rmapbt). As such, the agfl
reservation usage accounting only considers rmapbt usage when the
in-core counters are initialized at mount time.

This implementation inconsistency leads to divergence of the in-core
and on-disk usage accounting over time. In preparation to resolve
this inconsistency and adjust the AGFL reservation into an rmapbt
specific reservation, rename the AGFL reservation type and
associated accounting fields to something more rmapbt-specific. Also
fix up a couple tracepoints that incorrectly use the AGFL
reservation type to pass the agfl state of the associated extent
where the raw reservation type is expected.

Note that this patch does not change perag reservation behavior.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>

xfs: convert XFS_AGFL_SIZE to a helper function

Source kernel commit: a78ee256c325ecfaec13cafc41b315bd4e1dd518

The AGFL size calculation is about to get more complex, so let's turn
the macro into a function first and remove the macro.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[darrick: forward port to newer kernel, simplify the helper]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
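
As a rough illustration of what such a size helper computes, the sketch
below divides the sector space left after the header by the size of one
32-bit block pointer. The function name is hypothetical and the header size
is passed in as a parameter; the 512-byte sector and the 36-byte and 40-byte
header sizes used in the note are illustrative assumptions, not figures from
this log.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: number of free list slots that fit in one sector
 * after a header of `header_bytes`, with one 32-bit entry per slot.
 */
static unsigned int agfl_slots_sketch(unsigned int sector_bytes,
				      unsigned int header_bytes)
{
	if (header_bytes > sector_bytes)
		return 0;
	return (sector_bytes - header_bytes) / (unsigned int)sizeof(uint32_t);
}
```

Under those assumptions, a 36-byte header leaves (512 - 36) / 4 = 119 slots,
while four bytes of padding leave (512 - 40) / 4 = 118, a difference of one
slot.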

xfs: convert a few more directory asserts to corruption

Source kernel commit: 3f883f5bb197b6fe4e6f461362782aa7b0e89cb6

Yet another round of playing whack-a-mole with directory code that
asserts on corrupt on-disk metadata when it really should be returning
-EFSCORRUPTED instead of ASSERTing. Found by an xfs/391 crash while
lastbit fuzzing of ltail.bestcount.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>

Cleanup old XFS_BTREE_* traces

Source kernel commit: e157ebdcb3acd16221f1e5f84c6e371e15d37b6e

Remove unused legacy btree traces from IRIX era.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>

xfsprogs: Release v4.16.0-rc1

Update all the necessary files for a 4.16.0-rc1 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: fix getsubopt name definitions to use enums

Convert the getsubopt usage in xfs_repair to use enums and explicitly
initialized array elements, similar to mkfs. This also fixes the hole
in the o_opts table caused by 42fa89bc1b8dc8 ("xfs_repair: remove
pre_65_beta option") that causes segfaults in xfs/179 and xfs/202.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Fixes: 42fa89bc1b ("xfs_repair: remove pre_65_beta option")
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
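
The pattern described above, an enum for the suboption indexes plus
explicitly initialized array elements, looks roughly like this. The option
names and identifiers are hypothetical, not the real xfs_repair table, and
the lookup helper stands in for getsubopt(3), which likewise walks a
NULL-terminated token array and returns the index of the match. Because
each slot is tied to an enum value, removing an option removes both the
enum member and its slot, so no NULL hole is left in the middle of the
table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical suboption indexes; contiguous by construction. */
enum o_opt_sketch {
	O_ALPHA = 0,
	O_BETA,
	O_GAMMA,
	O_MAX_OPTS,
};

/* Each name is tied to its enum value; the last slot terminates. */
static const char *const o_opts_sketch[] = {
	[O_ALPHA]	= "alpha",
	[O_BETA]	= "beta",
	[O_GAMMA]	= "gamma",
	[O_MAX_OPTS]	= NULL,
};

/*
 * Hypothetical stand-in for the table walk: return the index of `name`,
 * or -1 if it is not found before the terminating NULL.
 */
static int lookup_subopt_sketch(const char *name)
{
	for (int i = 0; o_opts_sketch[i] != NULL; i++)
		if (strcmp(o_opts_sketch[i], name) == 0)
			return i;
	return -1;
}
```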