git.ipfire.org Git - thirdparty/xfsprogs-dev.git/log

xfs_db: properly set inode type

When we set the type to "inode" the verifier validates multiple
inodes in the current fs block, so setting the buffer size to
that of just one inode is not sufficient and it'll emit spurious
verifier errors for all but the first, as we read off the end:

xfs_db> daddr 99
xfs_db> type inode
Metadata corruption detected at xfs_inode block 0x63/0x200
Metadata corruption detected at xfs_inode block 0x63/0x200
Metadata corruption detected at xfs_inode block 0x63/0x200
Metadata corruption detected at xfs_inode block 0x63/0x200
Metadata corruption detected at xfs_inode block 0x63/0x200
Metadata corruption detected at xfs_inode block 0x63/0x200
Metadata corruption detected at xfs_inode block 0x63/0x200

Use the special set_cur_inode() function for this purpose
as is done in inode_f().

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
[sandeen: remove nag/warning printf for now]
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: redirect printfs when metadumping to stdout

If we're metadumping to stdout, we don't want xfs_db's various dbprintf
statements dumping to stdout because that'll corrupt the metadump.
Therefore, let outf point to the existing stdout and redirect stdout to
stderr for the duration of the dump operation.

Reported-by: David Shaw <dshaw@jabberwocky.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

mkfs.xfs: allow specification of 0 data stripe width & unit

The "noalign" option works for this too, but it seems reasonable
to allow explicit specification of stripe unit and stripe width
to 0; today, doing so today makes the code think it's unspecified,
and so it goes ahead and detects stripe geometry and sets it in the
superblock. That's unexpected and surprising.

Create a new flag that tracks whtether a geometry option has been
specified, and if it's set along with 0 values, treat it the
same as if "noalign" had been specified.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: Release v4.12.0-rc2

Update all the necessary files for a 4.12.0-rc2 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

mkfs: set inode alignment and cluster size for minimum log size estimation

In order for mkfs to calculate the minimum log size correctly, it must
be able to find the transaction type with the largest reservation. The
iunlink transaction reservation size calculation depends on having the
inode cluster size set correctly, which in turn depends on the inode
alignment parameters being set as they will be in the final filesystem.
Therefore we have to set up the inoalignmt field in max_trans_res.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

mkfs: set agblklog when we're verifying minimum log size

In e5cc9d560a ("mkfs: set agsize prior to calculating minimum log
size"), we set the ag size in the superblock structure so that we can
calculate the maximum btree height correctly. The btree heights are
used to calculate transaction reservation sizes; these sizes are used to
compute the minimum log length; and the minimum log length is checked by
the kernel.

Unfortunately, I didn't realize that some of the btree sizing functions
also depend on the agblklog (log2 of the ag size), so we've been
underestimating the minimum log length allowable, which results in mkfs
formatting filesystems that the kernel refuses to mount.

This can be trivially reproduced by formatting a small (~800M) volume
with rmap and reflink turned on.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: Release v4.12.0-rc1

Update all the necessary files for a 4.12.0-rc1 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libxfs: fix fsmap.h inclusion

If we /do/ have HAVE_GETFSMAP defined, we need to include linux/fsmap.h.

Found-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: identify attr dabtree field types correctly

For whatever reason, the v5 xattr dabtree header fields are mapped to
the directory dabtree header fields, which means that the types are
wrong and hence we cannot use the 'addr' command to step through the
tree. Since the v4 attr dabtree does this correctly, simply port the v5
fields to the attr code too.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_spaceman: fix potential overflowing expression in trim_f()

Prevent the potential overflow in expression calculating offset
in trim_f(() by casting the first variable to off64_t (64bit signed).

Addresses-Coverity-Id: 1413771

Signed-off-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_spaceman: close open file for error case in openfile()

openfile() fails to close the open file in one error case.
Add close(fd) to correct the condition.

Addresses-Coverity-Id: 1413770

Signed-off-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_spaceman: fix potential memory leak by malloc in scan_ag

scan_ag mallocs memory that is potentially leaked. Add a free
to alleviate the potential leak.

Addresses-Coverity-Id: 1413772

Signed-off-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: improve argument naming in set_cur and set_iocur_type

In set_cur and set_iocur_type, the current naming for arguments
type, block number, and length are t, d, and c, respectively.
Replace these with more intuitive and descriptive names:
type, blknum, and len. Fix type of blknum (xfs_daddr_t) to be
consistent with that of libxfs_readbuf where it's used.
Additionally remove extra blank line in io.c.

Signed-off-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: update buffer size when new type is set

xfs_db doesn't take sector size into account when setting type.
This can result in an errant crc. For example, with a sector size
of 4096:

xfs_db> agi 0
xfs_db> p crc
crc = 0xab85043e (correct)
xfs_db> daddr
current daddr is 16
xfs_db> daddr 42
xfs_db> daddr 16
xfs_db> type agi
Metadata CRC error detected at xfs_agi block 0x10/0x200
xfs_db> p crc
crc = 0xab85043e (bad)

When xfs_db sets the new daddr in daddr_f, it does so with one
BBSIZE sector (512). Changing the type doesn't change the size
of the current buffer in iocur_top, so the checksum is calculated
on the wrong length for the type (when the actual sector size > BBSIZE (512).

For types with fields, reread the buffer to pick up the correct size for
the new type when it gets set. Facilitate the reread by setting the cursor
with set_cur().

Signed-off-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[sandeen: fix up long line, clarify subject & comments]
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_spaceman: add group summary mode

Add a -g switch to show only a per-group summary.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
[sandeen: reset global gflag to 0 for each call]
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_spaceman: add a man page

Add a manual page describing xfs_spaceman's behavior.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
[sandeen: minor edits]
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_spaceman: Free space mapping command

Add freespace mapping tool modelled on the xfs_db freesp command.
The advantage of this command over xfs_db is that it can be done
online and is coherent with concurrent modifications to the
filesystem.

This requires the kernel to support the XFS_IOC_GETFSMAP ioctl to map
free space indexes.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[darrick: port from FIEMAPFS to GETFSMAP]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_spaceman: add new speculative prealloc control

Add an control interface for purging speculative
preallocation via the new ioctls.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[darrick: change xfsctl to ioctl]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
[sandeen: change help to "removes" not "controls"]
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_spaceman: add FITRIM support

Add support for discarding free space extents via the FITRIM
command. Make it easy to discard a single range, an entire AG or all
the freespace in the filesystem.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
[sandeen: minor printf formatting fixup]
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_spaceman: space management tool

xfs_spaceman is intended as a diagnostic and control tool for space
management operations within XFS. Operations like examining free
space, managing allocation policies, issuing block discards on free
space, etc.

The tool is modelled on the xfs_io interface, allowing both
interactive and command line control of the tool, enabling it to be
used in scripts and automated management tools.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[darrick: change xfsctl to ioctl]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: replace rmap_compare with libxfs version

Now that libxfs has a function to compare rmaps, replace xfs_repair's
helper function with that.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: support the new getfsmap ioctl

Plumb in all the pieces we need to have xfs_io query the GETFSMAP ioctl
for an arbitrary filesystem.

[sandeen: minor doc fixes]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: introduce the XFS_IOC_GETFSMAP ioctl

Introduce a new ioctl that uses the reverse mapping btree to return
information about the physical layout of the filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[sandeen: rework to have less autoconf magic]
Modified-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libxfs: use crc32c slice-by-8 variant by default

The crc32c code used in xfsprogs was copied directly from the Linux
kernel. However, that code selects slice-by-4 by default, which isn't
the fastest -- that's slice-by-8, which trades table size for speed.
Fix some makefile dependency problems and explicitly select the
algorithm we want. With this patch applied, I see about a 10% drop in
CPU time running xfs_repair.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libxcmd: add cvt{int, long} to convert strings to int and long

Create some helper functions to convert strings to int or long
in a standard way and work around problems in the C library
atoi/strtol functions.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: refactor numlen into a library function

Refactor the competing numlen implementations into a single library function.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>

metadump: warn about corruption if log is dirty

Add a warning about possible corruption when exporting a dirty log, as
the log content does not agree with obfuscated metadata.

[sandeen: one more minor wording tweak]

Signed-off-by: Jan Tulak <jtulak@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_metadump: tag metadump image with informational flags

After the long discussion about warning the user and/or consumer
of xfs_metadumps about dirty logs, it crossed my mind that we
could use the reserved slot in the metadump header to tag the
file with attributes, so the consumer of the metadump knows how
it was created.

This patch adds 4 flags to describe the metadump: The first simply
indicates the presence of any (or no) informational flags.
The old mb_reserved field has been 0 on disk since inception, so
the presence of XFS_METADUMP_INFO_FLAGS indicates that this metadump
may contain the informational flags:

- dirty log
- obfuscated
- full blocks (unused portions of metadata blocks
are not zeroed out).

It then adds a new option to xfs_mdrestore, "-i" to show info,
which can be used with or without a target file:

# xfs_mdrestore -i metadumpfile
metadumpfile: not obfuscated, clean log, full metadata blocks

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: handle reading superblock from image on larger sector size filesystem

Due to xfs_repair uses direct IO, sometimes it can't read superblock
from an image file has smaller sector size than host filesystem.
Especially that superblock doesn't align with host filesystem's
sector size.

Fortunately, xfsprogs already has code to do isa_file and geometry
check in xfs_repair.c, it turns off O_DIRECT after phase1() if the
sector size is less than the host filesystem's geometry. So move
the isa_file auto detection over up the phase1(), and try to do
once geometry check before phase1() if get_sb() return OK.

Signed-off-by: Zorro Lang <zlang@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: add alignment filter to freesp command

This adds an alignment filter to the xfs_db freesp command, i.e.

xfs_db> freesp -A 4

will show only 4-block aligned free extents.

This can be useful for finding freespace suitable for allocations
with specific alignment requirements, such as inode allocation.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_growfs: ensure target path is an active xfs mountpoint

xfs_growfs manpage clearly states that the target path must be
an active xfs mountpoint.

Current behavior allows xfs_growfs to proceed if the target path
resides anywhere on a mounted xfs filesystem. This could lead to
unexpected results. Unless the target path is an active xfs
mountpoint, reject it. Create a new fs table lookup function which
matches only active xfs mount points, not any file residing within
those mountpoints.

Signed-off-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

mkfs: remove leftover blkid include

All topology handling moved to libxcmd a while ago.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

po/pl.po: update Polish translation for 4.11.0

Signed-off-by: Jakub Bogusz <qboosh@pld-linux.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libxfs: fix xfs_trans_alloc_empty namespace

Do all the right libxfs_ magic for this new function.

Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: fix unaligned access in xfs_btree_visit_blocks

Source kernel commit: a4d768e702de224cc85e0c8eac9311763403b368

This structure copy was throwing unaligned access warnings on sparc64:

Kernel unaligned access at TPC[1043c088] xfs_btree_visit_blocks+0x88/0xe0 [xfs]

xfs_btree_copy_ptrs does a memcpy, which avoids it.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: avoid mount-time deadlock in CoW extent recovery

Source kernel commit: 3ecb3ac7b950ff8f6c6a61e8b7b0d6e3546429a0

If a malicious user corrupts the refcount btree to cause a cycle between
different levels of the tree, the next mount attempt will deadlock in
the CoW recovery routine while grabbing buffer locks. We can use the
ability to re-grab a buffer that was previous locked to a transaction to
avoid deadlocks, so do that here.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: fix warnings about unused stack variables

Source kernel commit: 6e747506dde195d3d05fe2bb8ef78aceba28a5e3

Reduce stack usage and get rid of compiler warnings by eliminating
unused variables.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: fix indlen accounting error on partial delalloc conversion

Source kernel commit: 0daaecacb83bc6b656a56393ab77a31c28139bc7

The delalloc -> real block conversion path uses an incorrect
calculation in the case where the middle part of a delalloc extent
is being converted. This is documented as a rare situation because
XFS generally attempts to maximize contiguity by converting as much
of a delalloc extent as possible.

If this situation does occur, the indlen reservation for the two new
delalloc extents left behind by the conversion of the middle range
is calculated and compared with the original reservation. If more
blocks are required, the delta is allocated from the global block
pool. This delta value can be characterized as the difference
between the new total requirement (temp + temp2) and the currently
available reservation minus those blocks that have already been
allocated (startblockval(PREV.br_startblock) - allocated).

The problem is that the current code does not account for previously
allocated blocks correctly. It subtracts the current allocation
count from the (new - old) delta rather than the old indlen
reservation. This means that more indlen blocks than have been
allocated end up stashed in the remaining extents and free space
accounting is broken as a result.

Fix up the calculation to subtract the allocated block count from
the original extent indlen and thus correctly allocate the
reservation delta based on the difference between the new total
requirement and the unused blocks from the original reservation.
Also remove a bogus assert that contradicts the fact that the new
indlen reservation can be larger than the original indlen
reservation.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: abstract PF_FSTRANS to PF_MEMALLOC_NOFS

Source kernel commit: 9070733b4efac4bf17f299a81b01c15e206f9ff5

xfs has defined PF_FSTRANS to declare a scope GFP_NOFS semantic quite
some time ago.  We would like to make this concept more generic and use
it for other filesystems as well.  Let's start by giving the flag a more
generic name PF_MEMALLOC_NOFS which is in line with an exiting
PF_MEMALLOC_NOIO already used for the same purpose for GFP_NOIO
contexts.  Replace all PF_FSTRANS usage from the xfs code in the first
step before we introduce a full API for it as xfs uses the flag directly
anyway.

This patch doesn't introduce any functional change.

Link: http://lkml.kernel.org/r/20170306131408.9828-4-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Chris Mason <clm@fb.com>
Cc: David Sterba <dsterba@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Nikolay Borisov <nborisov@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: reserve enough blocks to handle btree splits when remapping

Source kernel commit: fe0be23e68200573de027de9b8cc2b27e7fce35e

In xfs_reflink_end_cow, we erroneously reserve only enough blocks to
handle adding 1 extent. This is problematic if we fragment free space,
have to do CoW, and then have to perform multiple bmap btree expansions.
Furthermore, the BUI recovery routine doesn't reserve /any/ blocks to
handle btree splits, so log recovery fails after our first error causes
the filesystem to go down.

Therefore, refactor the transaction block reservation macros until we
have a macro that works for our deferred (re)mapping activities, and fix
both problems by using that macro.

With 1k blocks we can hit this fairly often in g/187 if the scratch fs
is big enough.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: simplify validation of the unwritten extent bit

Source kernel commit: 0c1d9e4a61590c2a4d657d1deddd1674f1565097

XFS only supports the unwritten extent bit in the data fork, and only if
the file system has a version 5 superblock or the unwritten extent
feature bit.

We currently have two routines that validate the invariant:
xfs_check_nostate_extents which return -EFSCORRUPTED when it's not met,
and xfs_validate_extent that triggers and assert in debug build.

Both of them iterate over all extents of an inode fork when called,
which isn't very efficient.

This patch instead adds a new helper that verifies the invariant one
extent at a time, and calls it from the places where we iterate over
all extents to converted them from or two the in-memory format. The
callers then return -EFSCORRUPTED when reading invalid extents from
disk, or trigger an assert when writing them to disk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove unused values from xfs_exntst_t

Source kernel commit: 37f7f9bbf3b914e94f81426f6f59a3f97f4dc562

We only ever use the normal and unwritten states. And the actual
ondisk format (this enum isn't despite being in xfs_format.h) only
has space for the unwritten bit anyway.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove the unused XFS_MAXLINK_1 define

Source kernel commit: 895e9bfc9e36c8e39ea4d2129dc2cbde1aafec99

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: more do_div cleanups

Source kernel commit: 4f1adf3373f072246c14119b2aa6dfb4d6510a43

On some architectures do_div does the pointer compare
trick to make sure that we've sent it an unsigned 64-bit
number. (Why unsigned? I don't know.)

Fix up the few places that squawk about this; in
xfs_bmap_wants_extents() we just used a bare int64_t so change
that to unsigned.

In xfs_adjust_extent_unmap_boundaries() all we wanted was the
mod, and we have an xfs-specific function to handle that w/o
side effects, which includes proper casting for do_div.

In xfs_daddr_to_ag[b]no, we were using the wrong type anyway;
XFS_BB_TO_FSBT returns a block in the filesystem, so use
xfs_rfsblock_t not xfs_daddr_t, and gain the unsignedness
from that type as a bonus.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove bmap block allocation retries

Source kernel commit: 7590632a33ef2d264665576d3d54e50f906fa758

Now that reflink operations don't set the firstblock value we don't
need the workarounds for non-NULL firstblock values without a prior
allocation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove xfs_bmap_remap_alloc

Source kernel commit: bf8eadbacb24e321c99bbdd901589942712810d1

The main thing that xfs_bmap_remap_alloc does is fixing the AGFL, similar
to what we do in the space allocator. But the reflink code doesn't touch
the allocation btree unlike the normal space allocator, so we couldn't
care less about the state of the AGFL.

So remove xfs_bmap_remap_alloc and just handle the di_nblocks update in
the caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: introduce xfs_bmapi_remap

Source kernel commit: 6ebd5a4413e2afd1b1129135e1cf4a84092550e2

Add a new helper to be used for reflink extent list additions instead of
funneling them through xfs_bmapi_write and overloading the firstblock
member in struct xfs_bmalloca and struct xfs_alloc_args.

With some small changes to xfs_bmap_remap_alloc this also means we do
not need a xfs_bmalloca structure for this case at all.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: pass individual arguments to xfs_bmap_add_extent_hole_real

Source kernel commit: 6d04558f9fa9d16c4aba7243030f22ef0c1bbf32

For the reflink case we'd much rather pass the required arguments than
faking up a struct xfs_bmalloca.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove attr fork handling in xfs_bmap_finish_one

Source kernel commit: 39e07daa46e34c724ad33f903d166a0a62c20900

We never do COW operations for the attr fork, so don't pretend we handle
them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: fix integer truncation in xfs_bmap_remap_alloc

Source kernel commit: 52813fb13ff90bd9c39a93446cbf1103c290b6e9

bno should be a xfs_fsblock_t, which is 64-bit wides instead of a
xfs_aglock_t, which truncates the value to 32 bits.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: simplify xfs_calc_dquots_per_chunk

Source kernel commit: d956f813b6e731ef82699a18e468e37d59a1366d

ndquots is a 32-bit value, and we don't care
about the remainder; there is no reason to use do_div
here, it seems to be the result of a decade+ historical
accident.

Worse, the do_div implementation in userspace breaks
when fed a 32-bit dividend, so we commented it out there
in any case.

Change to simple division, and then we can change
userspace to match, and mandate a 64-bit dividend in
the do_div() in userspace as well.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: implement the GETFSMAP ioctl

Source kernel commit: e89c041338ed6ef2694e6465ca1ba033e0a2978c

Introduce a new ioctl that uses the reverse mapping btree to return
information about the physical layout of the filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: add a couple of queries to iterate free extents in the rtbitmap

Source kernel commit: fb3c3de2f65c007f3ee50538ea131f5c4603c7bc

Add _query_range and _query_all functions to the realtime bitmap
allocator. These two functions are similar in usage to the btree
functions with the same name and will be used for getfsmap and scrub.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: create a function to query all records in a btree

Source kernel commit: e9a2599a249ed7d31771985aea0e761f5680de64

Create a helper function that will query all records in a btree.
This will be used by the online repair functions to examine every
record in a btree to rebuild a second btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: provide a query_range function for freespace btrees

Source kernel commit: 2d520bfaa28b60401fbfe581f485a0bb952d881c

Implement a query_range function for the bnobt and cntbt. This will
be used for getfsmap fallback if there is no rmapbt and by the online
scrub and repair code.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: plumb in needed functions for range querying of the freespace btrees

Source kernel commit: 08438b1e386b7899e8a2ffef59d6ebf29aac530f

Plumb in the pieces (init_high_key, diff_two_keys) necessary to call
query_range on the free space btrees. Remove the debugging asserts
so that we can make queries starting from block 0.

While we're at it, merge the redundant "if (btnum ==" hunks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: fix up inode validation failure message

Source kernel commit: bc593eebfd66838a3022ac92782665fbcafc412e

"xfs_iread: validation failed for inode 96 failed"

One "failed" seems like enough.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove the ISUNWRITTEN macro

Source kernel commit: 63fbb4c18d6b04d5f376326395cddf6c2de2c965

Opencoding the trivial checks makes it much easier to read (and grep..).

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: factor out a xfs_bmap_is_real_extent helper

Source kernel commit: 9c4f29d39168bb9363189f0d54ee1202372e7d9b

This checks for all the non-normal extent types, including handling both
encodings of delayed allocations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: Release v4.11.0

Update all the necessary files for a 4.11.0 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: add missed quotation marks in man page

The description about set_encpolicy command in xfs_io man page missed
some quotation marks. It's a little picky, but it really causes the
quotation marks can't match each other.

Signed-off-by: Zorro Lang <zlang@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: add missed inode command into man page

There's an "inode" command in xfs_io, it's used to query physical
information about an inode. But there's not any information about
it in xfs_io and other related man pages. So document this command
in the xfs_io man page now.

[sandeen: include some of djwong's edits]

Signed-off-by: Zorro Lang <zlang@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: Release v4.11.0-rc2

Update all the necessary files for a 4.11.0-rc2 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: fix statx call for changed UAPI

Due to a late-breaking change in the statx UAPI in kernel commit

1e2f82d1 statx: Kill fd-with-NULL-path support in favour of AT_EMPTY_PATH

we'll need to fix the way we call the syscall in xfs_io.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: dump metadata btrees via 'btdump'

Introduce a new 'btdump' command that can print the contents of all
blocks of any metadata subtree in the filesystem. This enables
developers and forensic analyst to view a metadata structure without
having to navigate the btree manually.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: use iocursor type to guess btree geometry if bad magic

The function block_to_bt plays an integral role in determining the btree
geometry of a block that we want to manipulate with the debugger.
Normally we use the block magic to find the geometry profile, but if the
magic is bad we'll never find it and return NULL. The callers of this
function do not check for NULL and crash.

Therefore, if we can't find a geometry profile matching the magic
number, use the iocursor type to guess the profile and scowl about that
to stdout. This makes it so that even with a corrupt magic we can try
to print the fields instead of crashing the debugger.

[sandeen: comment changes, add magic ASSERT, keep other ASSERTs]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: don't print arrays off the end of a buffer

Before printing an array, clamp the array count against the size of the
buffer so that we don't print random heap contents.

[sandeen: re-use fsz variable in call to prfunc]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

mkfs.xfs: Assign proper defaults to rmapbt and reflink flags

The "defaultval" field in the options structure was a bit confusing,
so when the rmapbt & reflink options got added, the desire was
to keep them off by default, and "defaultval = 0" got set.

However, the purpose of this field is to define the default value
when the flag is specified with no associated value, i.e.

-m rmapbt vs. -m rmapbt=0 or -m rmapbt=1

Today, the resulting behavior is unexpected, and different from any
other mkfs flags; specifying "-m rmapbt,reflink" results in a
filesystem /without/ those features.

Fix these to be consistent with every other boolean flag in the
mkfs options, so that specifying the flag with no value will
enable the feature.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: Add statx support for PowerPC architecture

Linux kernel commit f717629c7f834ab2efa05c7dbf0826f1d7c32ade (powerpc:
Wire up statx() syscall) added support for statx syscall for PowerPC
architecture. This commit enables using 'statx' sub-command of xfs_io to
be used on PowerPC.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: fix statx definition for non-x86 architecture

Apply the same fix to xfs_io as Gwendal did for fstests:

Fix a compilation error for ARM:
__ILP32__ is defined but not __X32_SYSCALL_BIT.

The check should only apply for x86_64 architecture, statx for other
architectures is not implemented yet - see commit 7acc839c9e57
"statx: Add a system call to make enhanced file info available".

Signed-off-by: Gwendal Grignou <gwendal@chromium.org>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: allow write -d to dqblks

Allow write -d to write bad data and recalculate CRC
for dqblks.

Inspired-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: allow write -d to inodes

Add a helper function to xfs_db so that we can recalculate the CRC of an
inode whose field we just wrote. This enables us to write arbitrary
values with a good CRC for the purpose of checking the read verifiers on
a v5 filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db.8: document write -d option

The xfs_db write "-d" option allows us to write bad
data with a good CRC, as added in
86769b3 xfs_db: allow recalculating CRCs on invalid metadata
but never documented, so do that now.

[sandeen: split out doc patch]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: Release v4.11.0-rc1

Update all the necessary files for a 4.11.0-rc1 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs.5: document barrier deprecation in xfs

Since kernel v4.10, the barrier mount option is deprecated:

2291dab2 xfs: Always flush caches when integrity is required
4cf4573d xfs: deprecate barrier/nobarrier mount option

Document this fact in the xfs(5) manpage.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: hook up statx

Wire up the statx syscall to xfs_io.

xfs_io> help statx
statx [-v|-r][-m basic | -m all | -m <mask>][-FD] -- extended statistics on the currently open file

Display extended file status.

Options:
-v -- More verbose output
-r -- Print raw statx structure fields
-m mask -- Specify the field mask for the statx call
(can also be 'basic' or 'all'; default STATX_ALL)
-D -- Don't sync attributes with the server
-F -- Force the attributes to be sync'd with the server

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: refactor stat functions, add raw dump

This adds a "-r" raw structure dump to stat options, and
factors the code a bit; statx will also use print_file_info
and print_xfs_info.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: move stat functions to new file

Adding statx will add a bit of code, so break stat-related
functions out of open.c into their own new file.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: fix build dep on configure.ac

Zorro reported that this sequence:

# git checkout v4.9.0; make realclean; make
# git checkout v4.10.0; make clean; make

fails:

...
Building libxfs
[CC] gen_crc32table
gcc: error: @BUILD_CFLAGS@: No such file or directory
gmake[3]: *** No rule to make target `crc32table.h', needed by `crc32selftest'. Stop.

This is because

0a71e38 build: Allow compiling xfsprogs in a cross compile environment

added the new BUILD_CFLAGS to configure.ac, and unless we re-run
autotools, that variable does not get substituted when
include/builddefs gets built.

(This can be worked around by "make realclean" and then everything
gets regenerated.)

The configure script is generated from configure.ac, so adding
a Make dependency here should resolve such issues in the future.

Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: pass btnum not magic to phase5 functions

When ed849ef xfs: remove boilerplate around xfs_btree_init_block
was merged from kernelspace, I made only minimal changes at the
libxfs boundary to accommodate the new libxfs_btree_init_block
interface.

We can chase that up a bit higher and remove more code by
passing in btnum from the start; we can also remove the
"finobt" argument from build_ino_tree() because that is
known from type of tree passed in.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: Fix "falloc -p" to pass KEEP_SIZE

Otherwise, the syscall just returns -EOPNOTSUPP.

Signed-off-by: Calvin Owens <calvinowens@fb.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: warn about dirty log with -n option

When looking at xfs_repair -n output today, we have no idea if
reported errors may be due to an un-replayed dirty log. If this
is the case, mention it in the output.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: detect invalid zero-sized symlink inodes

Mathias Troiden reproduced a filesystem corruption that resulted in
a zero-sized local format symlink inode. This is invalid state and
results in an inode that cannot be accessed or modified.

The kernel detects this problem on inode access, fails and warns the
user to umount and run xfs_repair. Unfortunately, xfs_repair doesn't
even detect the problem. Thus the user has no path to recovery.

Update xfs_repair to check for invalid zero-sized symlinks and flag
them as corrupted. This results in tossing the inode, but returns
the fs to a valid state.

Reported-by: Mathias Troiden <mathias.troiden@gmail.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: support shutdown command on foreign fses

Since f2fs and ext4 support the shutdown ioctl, enable it there.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: remove unused libxfs helper #defines

There are several #defines which aren't used anywhere
in either xfsprogs or the kernel, so remove them from the
libxfs/libxfs_priv.h helper file.

This does leave a few (currently) unused defines in place
if they are still used in the kernel, in the hope that future
libxfs changes might Just Work.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: rework the inline directory verifiers

Source kernel commit: 78420281a9d74014af7616958806c3aba056319e

The inline directory verifiers should be called on the inode fork data,
which means after iformat_local on the read side, and prior to
ifork_flush on the write side.  This makes the fork verifier more
consistent with the way buffer verifiers work -- i.e. they will operate
on the memory buffer that the code will be reading and writing directly.

Furthermore, revise the verifier function to return -EFSCORRUPTED so
that we don't flood the logs with corruption messages and assert
notices.  This has been a particular problem with xfs/348, which
triggers the XFS_WANT_CORRUPTED_RETURN assertions, which halts the
kernel when CONFIG_XFS_DEBUG=y.  Disk corruption isn't supposed to do
that, at least not in a verifier.

Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libxfs: fix xfs_extent_busy_flush macro definition

xfs_extent_busy_flush is a void function, so don't reduce it to zero.
This shuts up gcc warnings about do-nothing statements.

[sandeen: switch to more common ((void)0) paradigm]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: verify inline directory data forks

Source kernel commit: 630a04e79dd41ff746b545d4fc052e0abb836120

When we're reading or writing the data fork of an inline directory,
check the contents to make sure we're not overflowing buffers or eating
garbage data. xfs/348 corrupts an inline symlink into an inline
directory, triggering a buffer overflow bug.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: try any AG when allocating the first btree block when reflinking

Source kernel commit: 2fcc319d2467a5f5b78f35f79fd6e22741a31b1e

When a reflink operation causes the bmap code to allocate a btree block
we're currently doing single-AG allocations due to having ->firstblock
set and then try any higher AG due a little reflink quirk we've put in
when adding the reflink code. But given that we do not have a minleft
reservation of any kind in this AG we can still not have any space in
the same or higher AG even if the file system has enough free space.
To fix this use a XFS_ALLOCTYPE_FIRST_AG allocation in this fall back
path instead.

[And yes, we need to redo this properly instead of piling hacks over
hacks. I'm working on that, but it's not going to be a small series.
In the meantime this fixes the customer reported issue]

Also add a warning for failing allocations to make it easier to debug.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: use iomap new flag for newly allocated delalloc blocks

Source kernel commit: f65e6fad293b3a5793b7fa2044800506490e7a2e

Commit fa7f138 ("xfs: clear delalloc and cache on buffered write
failure") fixed one regression in the iomap error handling code and
exposed another. The fundamental problem is that if a buffered write
is a rewrite of preexisting delalloc blocks and the write fails, the
failure handling code can punch out preexisting blocks with valid
file data.

This was reproduced directly by sub-block writes in the LTP
kernel/syscalls/write/write03 test. A first 100 byte write allocates
a single block in a file. A subsequent 100 byte write fails and
punches out the block, including the data successfully written by
the previous write.

To address this problem, update the ->iomap_begin() handler to
distinguish newly allocated delalloc blocks from preexisting
delalloc blocks via the IOMAP_F_NEW flag. Use this flag in the
->iomap_end() handler to decide when a failed or short write should
punch out delalloc blocks.

This introduces the subtle requirement that ->iomap_begin() should
never combine newly allocated delalloc blocks with existing blocks
in the resulting iomap descriptor. This can occur when a new
delalloc reservation merges with a neighboring extent that is part
of the current write, for example. Therefore, drop the
post-allocation extent lookup from xfs_bmapi_reserve_delalloc() and
just return the record inserted into the fork. This ensures only new
blocks are returned and thus that preexisting delalloc blocks are
always handled as "found" blocks and not punched out on a failed
rewrite.

Reported-by: Xiong Zhou <xzhou@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove XFS_ALLOCTYPE_ANY_AG and XFS_ALLOCTYPE_START_AG

Source kernel commit: 8d242e932fb7660c24b3a534197e69c241067e0d

XFS_ALLOCTYPE_ANY_AG was only used for the RT allocator and is unused
now, and XFS_ALLOCTYPE_START_AG has been unused for a while.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: tune down agno asserts in the bmap code

Source kernel commit: 410d17f67e583559be3a922f8b6cc336331893f3

In various places we currently assert that xfs_bmap_btalloc allocates
from the same as the firstblock value passed in, unless it's either
NULLAGNO or the dop_low flag is set.  But the reflink code does not
fully follow this convention as it passes in firstblock purely as
a hint for the allocator without actually having previous allocations
in the transaction, and without having a minleft check on the current
AG, leading to the assert firing on a very full and heavily used
file system.  As even the reflink code only allocates from equal or
higher AGs for now we can simply the check to always allow for equal
or higher AGs.

Note that we need to eventually split the two meanings of the firstblock
value.  At that point we can also allow the reflink code to allocate
from any AG instead of limiting it in any way.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: Use xfs_icluster_size_fsb() to calculate inode chunk alignment

Source kernel commit: 8ee9fdbebc84b39f1d1c201c5e32277c61d034aa

On a ppc64 system, executing generic/256 test with 32k block size gives the following call trace,

XFS: Assertion failed: args->maxlen > 0, file: /root/repos/linux/fs/xfs/libxfs/xfs_alloc.c, line: 2026

kernel BUG at /root/repos/linux/fs/xfs/xfs_message.c:113!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=2048
DEBUG_PAGEALLOC
NUMA
pSeries
Modules linked in:
CPU: 2 PID: 19361 Comm: mkdir Not tainted 4.10.0-rc5 #58
task: c000000102606d80 task.stack: c0000001026b8000
NIP: c0000000004ef798 LR: c0000000004ef798 CTR: c00000000082b290
REGS: c0000001026bb090 TRAP: 0700   Not tainted  (4.10.0-rc5)
MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI>
CR: 28004428  XER: 00000000
CFAR: c0000000004ef180 SOFTE: 1
GPR00: c0000000004ef798 c0000001026bb310 c000000001157300 ffffffffffffffea
GPR04: 000000000000000a c0000001026bb130 0000000000000000 ffffffffffffffc0
GPR08: 00000000000000d1 0000000000000021 00000000ffffffd1 c000000000dd4990
GPR12: 0000000022004444 c00000000fe00800 0000000020000000 0000000000000000
GPR16: 0000000000000000 0000000043a606fc 0000000043a76c08 0000000043a1b3d0
GPR20: 000001002a35cd60 c0000001026bbb80 0000000000000000 0000000000000001
GPR24: 0000000000000240 0000000000000004 c00000062dc55000 0000000000000000
GPR28: 0000000000000004 c00000062ecd9200 0000000000000000 c0000001026bb6c0
NIP [c0000000004ef798] .assfail+0x28/0x30
LR [c0000000004ef798] .assfail+0x28/0x30
Call Trace:
[c0000001026bb310] [c0000000004ef798] .assfail+0x28/0x30 (unreliable)
[c0000001026bb380] [c000000000455d74] .xfs_alloc_space_available+0x194/0x1b0
[c0000001026bb410] [c00000000045b914] .xfs_alloc_fix_freelist+0x144/0x480
[c0000001026bb580] [c00000000045c368] .xfs_alloc_vextent+0x698/0xa90
[c0000001026bb650] [c0000000004a6200] .xfs_ialloc_ag_alloc+0x170/0x820
[c0000001026bb7c0] [c0000000004a9098] .xfs_dialloc+0x158/0x320
[c0000001026bb8a0] [c0000000004e628c] .xfs_ialloc+0x7c/0x610
[c0000001026bb990] [c0000000004e8138] .xfs_dir_ialloc+0xa8/0x2f0
[c0000001026bbaa0] [c0000000004e8814] .xfs_create+0x494/0x790
[c0000001026bbbf0] [c0000000004e5ebc] .xfs_generic_create+0x2bc/0x410
[c0000001026bbce0] [c0000000002b4a34] .vfs_mkdir+0x154/0x230
[c0000001026bbd70] [c0000000002bc444] .SyS_mkdirat+0x94/0x120
[c0000001026bbe30] [c00000000000b760] system_call+0x38/0xfc
Instruction dump:
4e800020 60000000 7c0802a6 7c862378 3c82ffca 7ca72b78 38841c18 7c651b78
38600000 f8010010 f821ff91 4bfff94d <0fe00000> 60000000 7c0802a6 7c892378

When block size is larger than inode cluster size, the call to
XFS_B_TO_FSBT(mp, mp->m_inode_cluster_size) returns 0. Also, mkfs.xfs
would have set xfs_sb->sb_inoalignmt to 0. This causes
xfs_ialloc_cluster_alignment() to return 0.  Due to this
args.minalignslop (in xfs_ialloc_ag_alloc()) gets the unsigned
equivalent of -1 assigned to it. This later causes alloc_len in
xfs_alloc_space_available() to have a value of 0. In such a scenario
when args.total is also 0, the assert statement "ASSERT(args->maxlen >
0);" fails.

This commit fixes the bug by replacing the call to XFS_B_TO_FSBT() in
xfs_ialloc_cluster_alignment() with a call to xfs_icluster_size_fsb().

Suggested-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: split indlen reservations fairly when under reserved

Source kernel commit: 75d65361cf3c0dae2af970c305e19c727b28a510

Certain workoads that punch holes into speculative preallocation can
cause delalloc indirect reservation splits when the delalloc extent is
split in two. If further splits occur, an already short-handed extent
can be split into two in a manner that leaves zero indirect blocks for
one of the two new extents. This occurs because the shortage is large
enough that the xfs_bmap_split_indlen() algorithm completely drains the
requested indlen of one of the extents before it honors the existing
reservation.

This ultimately results in a warning from xfs_bmap_del_extent(). This
has been observed during file copies of large, sparse files using 'cp
--sparse=always.'

To avoid this problem, update xfs_bmap_split_indlen() to explicitly
apply the reservation shortage fairly between both extents. This smooths
out the overall indlen shortage and defers the situation where we end up
with a delalloc extent with zero indlen reservation to extreme
circumstances.

Reported-by: Patrick Dung <mpatdung@gmail.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: handle indlen shortage on delalloc extent merge

Source kernel commit: 0e339ef8556d9e567aa7925f8892c263d79430d9

When a delalloc extent is created, it can be merged with pre-existing,
contiguous, delalloc extents. When this occurs,
xfs_bmap_add_extent_hole_delay() merges the extents along with the
associated indirect block reservations. The expectation here is that the
combined worst case indlen reservation is always less than or equal to
the indlen reservation for the individual extents.

This is not always the case, however, as existing extents can less than
the expected indlen reservation if the extent was previously split due
to a hole punch. If a new extent merges with such an extent, the total
indlen requirement may be larger than the sum of the indlen reservations
held by both extents.

xfs_bmap_add_extent_hole_delay() assumes that the worst case indlen
reservation is always available and assigns it to the merged extent
without consideration for the indlen held by the pre-existing extent. As
a result, the subsequent xfs_mod_fdblocks() call can attempt an
unintentional allocation rather than a free (indicated by an ASSERT()
failure). Further, if the allocation happens to fail in this context,
the failure goes unhandled and creates a filesystem wide block
accounting inconsistency.

Fix xfs_bmap_add_extent_hole_delay() to function as designed. Cap the
indlen reservation assigned to the merged extent to the sum of the
indlen reservations held by each of the individual extents.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: improve handling of busy extents in the low-level allocator

Source kernel commit: ebf55872616c7d4754db5a318591a72a8d5e6896

Currently we force the log and simply try again if we hit a busy extent,
but especially with online discard enabled it might take a while after
the log force for the busy extents to disappear, and we might have
already completed our second pass.

So instead we add a new waitqueue and a generation counter to the pag
structure so that we can do wakeups once we've removed busy extents,
and we replace the single retry with an unconditional one - after
all we hold the AGF buffer lock, so no other allocations or frees
can be racing with us in this AG.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: go straight to real allocations for direct I/O COW writes

Source kernel commit: a14234c72bf41ac96bc8c98e96e2c84b6d4bd4f2

When we allocate COW fork blocks for direct I/O writes we currently first
create a delayed allocation, and then convert it to a real allocation
once we've got the delayed one.

As there is no good reason for that this patch instead makes use call
xfs_bmapi_write from the COW allocation path. The only interesting bits
are a few tweaks the low-level allocator to allow for this, most notably
the need to remove the call to xfs_bmap_extsize_align for the cowextsize
in xfs_bmap_btalloc - for the existing convert case it's a no-op, but
for the direct allocation case it would blow up our block reservation
way beyond what we reserved for the transaction.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: allow unwritten extents in the CoW fork

Source kernel commit: 05a630d76bd3f39baf0eecfa305bed2820796dee

In the data fork, we only allow extents to perform the following state
transitions:

delay -> real <-> unwritten

There's no way to move directly from a delalloc reservation to an
/unwritten/ allocated extent.  However, for the CoW fork we want to be
able to do the following to each extent:

delalloc -> unwritten -> written -> remapped to data fork

This will help us to avoid a race in the speculative CoW preallocation
code between a first thread that is allocating a CoW extent and a second
thread that is remapping part of a file after a write.  In order to do
this, however, we need two things: first, we have to be able to
transition from da to unwritten, and second the function that converts
between real and unwritten has to be made aware of the cow fork.  Do
both of those things.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: verify free block header fields

Source kernel commit: de14c5f541e78c59006bee56f6c5c2ef1ca07272

Perform basic sanity checking of the directory free block header
fields so that we avoid hanging the system on invalid data.

(Granted that just means that now we shutdown on directory write,
but that seems better than hanging...)

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: check for obviously bad level values in the bmbt root

Source kernel commit: b3bf607d58520ea8c0666aeb4be60dbb724cd3a2

We can't handle a bmbt that's taller than BTREE_MAXLEVELS, and there's
no such thing as a zero-level bmbt (for that we have extents format),
so if we see this, send back an error code.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>