git.ipfire.org Git - thirdparty/xfsprogs-dev.git/log
7 years ago xfs: rework the inline directory verifiers (libxfs-4.11-sync)
Darrick J. Wong [Mon, 10 Apr 2017 22:30:00 +0000 (17:30 -0500)] 
xfs: rework the inline directory verifiers

Source kernel commit: 78420281a9d74014af7616958806c3aba056319e

The inline directory verifiers should be called on the inode fork data,
which means after iformat_local on the read side, and prior to
ifork_flush on the write side.  This makes the fork verifier more
consistent with the way buffer verifiers work -- i.e. they will operate
on the memory buffer that the code will be reading and writing directly.

Furthermore, revise the verifier function to return -EFSCORRUPTED so
that we don't flood the logs with corruption messages and assert
notices.  This has been a particular problem with xfs/348, which
triggers the XFS_WANT_CORRUPTED_RETURN assertions, which halt the
kernel when CONFIG_XFS_DEBUG=y.  Disk corruption isn't supposed to do
that, at least not in a verifier.

Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago libxfs: fix xfs_extent_busy_flush macro definition
Darrick J. Wong [Mon, 10 Apr 2017 22:29:48 +0000 (17:29 -0500)] 
libxfs: fix xfs_extent_busy_flush macro definition

xfs_extent_busy_flush is a void function, so don't reduce it to zero.
This shuts up gcc warnings about do-nothing statements.

[sandeen: switch to more common ((void)0) paradigm]
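
A minimal sketch of the idea, using a hypothetical stub name rather than the
real libxfs definition: a stub for a void function expands to ((void)0), which
is a valid expression statement and does not trip the "statement with no
effect" warning that a bare 0 does.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a userspace stub of a void kernel function.
 * The arguments are discarded and the expansion has type void.
 */
#define demo_extent_busy_flush(mp, pag, gen)	((void)0)

/* Calls the stub twice and reports how many calls were made. */
static int demo_flush_twice(void)
{
	int	calls = 0;

	demo_extent_busy_flush(NULL, NULL, 0);
	calls++;
	demo_extent_busy_flush(NULL, NULL, 0);
	calls++;
	return calls;
}
```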

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago xfs: verify inline directory data forks
Darrick J. Wong [Tue, 4 Apr 2017 20:37:45 +0000 (15:37 -0500)] 
xfs: verify inline directory data forks

Source kernel commit: 630a04e79dd41ff746b545d4fc052e0abb836120

When we're reading or writing the data fork of an inline directory,
check the contents to make sure we're not overflowing buffers or eating
garbage data.  xfs/348 corrupts an inline symlink into an inline
directory, triggering a buffer overflow bug.
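
The kind of bounds checking described above can be sketched as follows.  This
is an illustration only: the toy layout (a count byte followed by
{namelen, name} entries), the function name and the DEMO_EFSCORRUPTED value
are assumptions, not the XFS shortform directory format or the real verifier.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in error value; the kernel has its own EFSCORRUPTED. */
#define DEMO_EFSCORRUPTED	117

/*
 * Verify a toy inline directory: buf[0] holds the entry count, followed by
 * entries of the form { namelen, name[namelen] }.  Every read is checked
 * against the fork size before it happens.
 */
static int demo_verify_inline_dir(const uint8_t *buf, size_t size)
{
	size_t	off = 1;
	uint8_t	i, count;

	if (size < 1)
		return -DEMO_EFSCORRUPTED;
	count = buf[0];
	for (i = 0; i < count; i++) {
		uint8_t	namelen;

		if (off >= size)	/* entry header past end of fork */
			return -DEMO_EFSCORRUPTED;
		namelen = buf[off++];
		if (namelen == 0 || namelen > size - off)
			return -DEMO_EFSCORRUPTED;
		off += namelen;
	}
	/* Trailing bytes after the last entry are also treated as corrupt. */
	return off == size ? 0 : -DEMO_EFSCORRUPTED;
}
```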

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago xfs: try any AG when allocating the first btree block when reflinking
Christoph Hellwig [Tue, 4 Apr 2017 20:37:44 +0000 (15:37 -0500)] 
xfs: try any AG when allocating the first btree block when reflinking

Source kernel commit: 2fcc319d2467a5f5b78f35f79fd6e22741a31b1e

When a reflink operation causes the bmap code to allocate a btree block,
we're currently doing single-AG allocations due to having ->firstblock
set, and then try any higher AG due to a little reflink quirk we've put in
when adding the reflink code.  But given that we do not have a minleft
reservation of any kind in this AG, we can still end up without any space
in the same or higher AGs even if the file system has enough free space.
To fix this, use an XFS_ALLOCTYPE_FIRST_AG allocation in this fallback
path instead.

[And yes, we need to redo this properly instead of piling hacks over
hacks.  I'm working on that, but it's not going to be a small series.
In the meantime this fixes the customer reported issue]

Also add a warning for failing allocations to make it easier to debug.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago xfs: use iomap new flag for newly allocated delalloc blocks
Brian Foster [Tue, 4 Apr 2017 20:37:44 +0000 (15:37 -0500)] 
xfs: use iomap new flag for newly allocated delalloc blocks

Source kernel commit: f65e6fad293b3a5793b7fa2044800506490e7a2e

Commit fa7f138 ("xfs: clear delalloc and cache on buffered write
failure") fixed one regression in the iomap error handling code and
exposed another. The fundamental problem is that if a buffered write
is a rewrite of preexisting delalloc blocks and the write fails, the
failure handling code can punch out preexisting blocks with valid
file data.

This was reproduced directly by sub-block writes in the LTP
kernel/syscalls/write/write03 test. A first 100 byte write allocates
a single block in a file. A subsequent 100 byte write fails and
punches out the block, including the data successfully written by
the previous write.

To address this problem, update the ->iomap_begin() handler to
distinguish newly allocated delalloc blocks from preexisting
delalloc blocks via the IOMAP_F_NEW flag. Use this flag in the
->iomap_end() handler to decide when a failed or short write should
punch out delalloc blocks.

This introduces the subtle requirement that ->iomap_begin() should
never combine newly allocated delalloc blocks with existing blocks
in the resulting iomap descriptor. This can occur when a new
delalloc reservation merges with a neighboring extent that is part
of the current write, for example. Therefore, drop the
post-allocation extent lookup from xfs_bmapi_reserve_delalloc() and
just return the record inserted into the fork. This ensures only new
blocks are returned and thus that preexisting delalloc blocks are
always handled as "found" blocks and not punched out on a failed
rewrite.
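
The begin/end contract described above can be sketched roughly like this.  The
struct, flag value and function names are hypothetical stand-ins for the iomap
descriptor, IOMAP_F_NEW and the ->iomap_begin()/->iomap_end() handlers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag and descriptor, for illustration only. */
#define DEMO_IOMAP_F_NEW	0x01

struct demo_iomap {
	unsigned int	flags;
};

/* begin: mark the mapping as new only if this call reserved the blocks. */
static void demo_iomap_begin(struct demo_iomap *iomap, bool newly_reserved)
{
	iomap->flags = newly_reserved ? DEMO_IOMAP_F_NEW : 0;
}

/* end: punch only blocks that this write created and then failed to fill. */
static bool demo_should_punch(const struct demo_iomap *iomap,
			      size_t length, size_t written)
{
	return (iomap->flags & DEMO_IOMAP_F_NEW) && written < length;
}
```

With this split, a failed rewrite of preexisting delalloc blocks never sets
the flag in begin, so end leaves those blocks and their data alone.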

Reported-by: Xiong Zhou <xzhou@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago xfs: remove XFS_ALLOCTYPE_ANY_AG and XFS_ALLOCTYPE_START_AG
Christoph Hellwig [Tue, 4 Apr 2017 20:37:44 +0000 (15:37 -0500)] 
xfs: remove XFS_ALLOCTYPE_ANY_AG and XFS_ALLOCTYPE_START_AG

Source kernel commit: 8d242e932fb7660c24b3a534197e69c241067e0d

XFS_ALLOCTYPE_ANY_AG was only used for the RT allocator and is unused
now, and XFS_ALLOCTYPE_START_AG has been unused for a while.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago xfs: tune down agno asserts in the bmap code
Christoph Hellwig [Tue, 4 Apr 2017 20:37:44 +0000 (15:37 -0500)] 
xfs: tune down agno asserts in the bmap code

Source kernel commit: 410d17f67e583559be3a922f8b6cc336331893f3

In various places we currently assert that xfs_bmap_btalloc allocates
from the same AG as the firstblock value passed in, unless it's either
NULLAGNO or the dop_low flag is set.  But the reflink code does not
fully follow this convention as it passes in firstblock purely as
a hint for the allocator without actually having previous allocations
in the transaction, and without having a minleft check on the current
AG, leading to the assert firing on a very full and heavily used
file system.  As even the reflink code only allocates from equal or
higher AGs for now, we can simplify the check to always allow for equal
or higher AGs.

Note that we need to eventually split the two meanings of the firstblock
value.  At that point we can also allow the reflink code to allocate
from any AG instead of limiting it in any way.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago xfs: Use xfs_icluster_size_fsb() to calculate inode chunk alignment
Chandan Rajendra [Tue, 4 Apr 2017 20:37:44 +0000 (15:37 -0500)] 
xfs: Use xfs_icluster_size_fsb() to calculate inode chunk alignment

Source kernel commit: 8ee9fdbebc84b39f1d1c201c5e32277c61d034aa

On a ppc64 system, executing generic/256 test with 32k block size gives the following call trace,

XFS: Assertion failed: args->maxlen > 0, file: /root/repos/linux/fs/xfs/libxfs/xfs_alloc.c, line: 2026

kernel BUG at /root/repos/linux/fs/xfs/xfs_message.c:113!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=2048
DEBUG_PAGEALLOC
NUMA
pSeries
Modules linked in:
CPU: 2 PID: 19361 Comm: mkdir Not tainted 4.10.0-rc5 #58
task: c000000102606d80 task.stack: c0000001026b8000
NIP: c0000000004ef798 LR: c0000000004ef798 CTR: c00000000082b290
REGS: c0000001026bb090 TRAP: 0700   Not tainted  (4.10.0-rc5)
MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI>
CR: 28004428  XER: 00000000
CFAR: c0000000004ef180 SOFTE: 1
GPR00: c0000000004ef798 c0000001026bb310 c000000001157300 ffffffffffffffea
GPR04: 000000000000000a c0000001026bb130 0000000000000000 ffffffffffffffc0
GPR08: 00000000000000d1 0000000000000021 00000000ffffffd1 c000000000dd4990
GPR12: 0000000022004444 c00000000fe00800 0000000020000000 0000000000000000
GPR16: 0000000000000000 0000000043a606fc 0000000043a76c08 0000000043a1b3d0
GPR20: 000001002a35cd60 c0000001026bbb80 0000000000000000 0000000000000001
GPR24: 0000000000000240 0000000000000004 c00000062dc55000 0000000000000000
GPR28: 0000000000000004 c00000062ecd9200 0000000000000000 c0000001026bb6c0
NIP [c0000000004ef798] .assfail+0x28/0x30
LR [c0000000004ef798] .assfail+0x28/0x30
Call Trace:
[c0000001026bb310] [c0000000004ef798] .assfail+0x28/0x30 (unreliable)
[c0000001026bb380] [c000000000455d74] .xfs_alloc_space_available+0x194/0x1b0
[c0000001026bb410] [c00000000045b914] .xfs_alloc_fix_freelist+0x144/0x480
[c0000001026bb580] [c00000000045c368] .xfs_alloc_vextent+0x698/0xa90
[c0000001026bb650] [c0000000004a6200] .xfs_ialloc_ag_alloc+0x170/0x820
[c0000001026bb7c0] [c0000000004a9098] .xfs_dialloc+0x158/0x320
[c0000001026bb8a0] [c0000000004e628c] .xfs_ialloc+0x7c/0x610
[c0000001026bb990] [c0000000004e8138] .xfs_dir_ialloc+0xa8/0x2f0
[c0000001026bbaa0] [c0000000004e8814] .xfs_create+0x494/0x790
[c0000001026bbbf0] [c0000000004e5ebc] .xfs_generic_create+0x2bc/0x410
[c0000001026bbce0] [c0000000002b4a34] .vfs_mkdir+0x154/0x230
[c0000001026bbd70] [c0000000002bc444] .SyS_mkdirat+0x94/0x120
[c0000001026bbe30] [c00000000000b760] system_call+0x38/0xfc
Instruction dump:
4e800020 60000000 7c0802a6 7c862378 3c82ffca 7ca72b78 38841c18 7c651b78
38600000 f8010010 f821ff91 4bfff94d <0fe00000> 60000000 7c0802a6 7c892378

When block size is larger than inode cluster size, the call to
XFS_B_TO_FSBT(mp, mp->m_inode_cluster_size) returns 0. Also, mkfs.xfs
would have set xfs_sb->sb_inoalignmt to 0. This causes
xfs_ialloc_cluster_alignment() to return 0.  Due to this
args.minalignslop (in xfs_ialloc_ag_alloc()) gets the unsigned
equivalent of -1 assigned to it. This later causes alloc_len in
xfs_alloc_space_available() to have a value of 0. In such a scenario
when args.total is also 0, the assert statement "ASSERT(args->maxlen >
0);" fails.

This commit fixes the bug by replacing the call to XFS_B_TO_FSBT() in
xfs_ialloc_cluster_alignment() with a call to xfs_icluster_size_fsb().
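
The arithmetic behind the failure can be sketched as follows.  The struct and
function names are simplified, hypothetical stand-ins, and the 8192-byte inode
cluster size used in the usage note is an assumed value for illustration.

```c
#include <stdint.h>

/* Hypothetical, simplified mount geometry; not the real struct xfs_mount. */
struct demo_mount {
	uint32_t	blocksize;		/* filesystem block size, bytes */
	uint8_t		blocklog;		/* log2(blocksize) */
	uint32_t	inode_cluster_size;	/* inode cluster size, bytes */
};

/* Truncating bytes-to-blocks conversion, in the manner of XFS_B_TO_FSBT. */
static uint32_t demo_b_to_fsbt(const struct demo_mount *mp, uint32_t bytes)
{
	return bytes >> mp->blocklog;
}

/* Cluster size in blocks, never less than one block. */
static uint32_t demo_icluster_size_fsb(const struct demo_mount *mp)
{
	if (mp->blocksize >= mp->inode_cluster_size)
		return 1;
	return mp->inode_cluster_size >> mp->blocklog;
}
```

With a 32k block size (blocklog 15) and an 8192-byte cluster, the truncating
conversion yields 0, and subtracting 1 from that alignment in unsigned
arithmetic wraps to the maximum value; the second helper yields 1, so the
slop becomes 0.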

Suggested-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago xfs: split indlen reservations fairly when under reserved
Brian Foster [Tue, 4 Apr 2017 20:37:44 +0000 (15:37 -0500)] 
xfs: split indlen reservations fairly when under reserved

Source kernel commit: 75d65361cf3c0dae2af970c305e19c727b28a510

Certain workloads that punch holes into speculative preallocation can
cause delalloc indirect reservation splits when the delalloc extent is
split in two. If further splits occur, an already short-handed extent
can be split into two in a manner that leaves zero indirect blocks for
one of the two new extents. This occurs because the shortage is large
enough that the xfs_bmap_split_indlen() algorithm completely drains the
requested indlen of one of the extents before it honors the existing
reservation.

This ultimately results in a warning from xfs_bmap_del_extent(). This
has been observed during file copies of large, sparse files using 'cp
--sparse=always.'

To avoid this problem, update xfs_bmap_split_indlen() to explicitly
apply the reservation shortage fairly between both extents. This smooths
out the overall indlen shortage and defers the situation where we end up
with a delalloc extent with zero indlen reservation to extreme
circumstances.
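
One way to picture a fair split is the sketch below.  It is an assumption-laden
illustration, not the algorithm in xfs_bmap_split_indlen(): the function name
is hypothetical, and it simply removes each missing block from whichever side
currently holds more.

```c
/*
 * Trim two requested indirect-block reservations so that together they fit
 * in the 'ores' blocks actually available, taking each missing block from
 * whichever side currently holds more.
 */
static void demo_split_indlen(unsigned int ores,
			      unsigned int *len1, unsigned int *len2)
{
	while (*len1 + *len2 > ores) {
		if (*len1 >= *len2)
			(*len1)--;
		else
			(*len2)--;
	}
}
```

Under this scheme one side only reaches zero when fewer than two blocks are
available in total, which matches the intent of deferring the zero-indlen
case to extreme shortages.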

Reported-by: Patrick Dung <mpatdung@gmail.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago xfs: handle indlen shortage on delalloc extent merge
Brian Foster [Tue, 4 Apr 2017 20:37:44 +0000 (15:37 -0500)] 
xfs: handle indlen shortage on delalloc extent merge

Source kernel commit: 0e339ef8556d9e567aa7925f8892c263d79430d9

When a delalloc extent is created, it can be merged with pre-existing,
contiguous, delalloc extents. When this occurs,
xfs_bmap_add_extent_hole_delay() merges the extents along with the
associated indirect block reservations. The expectation here is that the
combined worst case indlen reservation is always less than or equal to
the indlen reservation for the individual extents.

This is not always the case, however, as existing extents can have less than
the expected indlen reservation if the extent was previously split due
to a hole punch. If a new extent merges with such an extent, the total
indlen requirement may be larger than the sum of the indlen reservations
held by both extents.

xfs_bmap_add_extent_hole_delay() assumes that the worst case indlen
reservation is always available and assigns it to the merged extent
without consideration for the indlen held by the pre-existing extent. As
a result, the subsequent xfs_mod_fdblocks() call can attempt an
unintentional allocation rather than a free (indicated by an ASSERT()
failure). Further, if the allocation happens to fail in this context,
the failure goes unhandled and creates a filesystem wide block
accounting inconsistency.

Fix xfs_bmap_add_extent_hole_delay() to function as designed. Cap the
indlen reservation assigned to the merged extent to the sum of the
indlen reservations held by each of the individual extents.
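
The cap can be sketched as a simple minimum.  The function and parameter names
are hypothetical; worst_case stands in for whatever worst-case indlen the
merged extent would ideally get.

```c
/*
 * Pick the indlen for a merged delalloc extent: the worst-case value,
 * but never more than the two extents actually hold between them.
 */
static unsigned int demo_merged_indlen(unsigned int worst_case,
				       unsigned int held_old,
				       unsigned int held_new)
{
	unsigned int	held = held_old + held_new;

	return worst_case < held ? worst_case : held;
}
```

Because the result never exceeds what is already held, the difference handed
back to the free block counter is always zero or a free, never an allocation.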

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago xfs: improve handling of busy extents in the low-level allocator
Christoph Hellwig [Tue, 4 Apr 2017 20:37:44 +0000 (15:37 -0500)] 
xfs: improve handling of busy extents in the low-level allocator

Source kernel commit: ebf55872616c7d4754db5a318591a72a8d5e6896

Currently we force the log and simply try again if we hit a busy extent,
but especially with online discard enabled it might take a while after
the log force for the busy extents to disappear, and we might have
already completed our second pass.

So instead we add a new waitqueue and a generation counter to the pag
structure so that we can do wakeups once we've removed busy extents,
and we replace the single retry with an unconditional one - after
all we hold the AGF buffer lock, so no other allocations or frees
can be racing with us in this AG.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago xfs: go straight to real allocations for direct I/O COW writes
Christoph Hellwig [Tue, 4 Apr 2017 20:37:44 +0000 (15:37 -0500)] 
xfs: go straight to real allocations for direct I/O COW writes

Source kernel commit: a14234c72bf41ac96bc8c98e96e2c84b6d4bd4f2

When we allocate COW fork blocks for direct I/O writes we currently first
create a delayed allocation, and then convert it to a real allocation
once we've got the delayed one.

As there is no good reason for that, this patch instead makes us call
xfs_bmapi_write from the COW allocation path.  The only interesting bits
are a few tweaks to the low-level allocator to allow for this, most notably
the need to remove the call to xfs_bmap_extsize_align for the cowextsize
in xfs_bmap_btalloc - for the existing convert case it's a no-op, but
for the direct allocation case it would blow up our block reservation
way beyond what we reserved for the transaction.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago xfs: allow unwritten extents in the CoW fork
Darrick J. Wong [Tue, 4 Apr 2017 20:37:44 +0000 (15:37 -0500)] 
xfs: allow unwritten extents in the CoW fork

Source kernel commit: 05a630d76bd3f39baf0eecfa305bed2820796dee

In the data fork, we only allow extents to perform the following state
transitions:

delay -> real <-> unwritten

There's no way to move directly from a delalloc reservation to an
/unwritten/ allocated extent.  However, for the CoW fork we want to be
able to do the following to each extent:

delalloc -> unwritten -> written -> remapped to data fork

This will help us to avoid a race in the speculative CoW preallocation
code between a first thread that is allocating a CoW extent and a second
thread that is remapping part of a file after a write.  In order to do
this, however, we need two things: first, we have to be able to
transition from da to unwritten, and second the function that converts
between real and unwritten has to be made aware of the cow fork.  Do
both of those things.
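
The two sets of permitted transitions can be written down as a small
predicate.  The enum and function names are hypothetical; the sketch models
only the in-fork states and leaves out the final remap to the data fork.

```c
#include <stdbool.h>

enum demo_ext_state { DEMO_DELAY, DEMO_REAL, DEMO_UNWRITTEN };

/* Return true if an extent may move from 'from' to 'to' in the given fork. */
static bool demo_transition_ok(enum demo_ext_state from,
			       enum demo_ext_state to, bool cow_fork)
{
	switch (from) {
	case DEMO_DELAY:
		/* delay -> real everywhere; delay -> unwritten only for CoW */
		return to == DEMO_REAL || (cow_fork && to == DEMO_UNWRITTEN);
	case DEMO_REAL:
		return to == DEMO_UNWRITTEN;
	case DEMO_UNWRITTEN:
		return to == DEMO_REAL;
	}
	return false;
}
```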

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago xfs: verify free block header fields
Darrick J. Wong [Tue, 4 Apr 2017 20:37:43 +0000 (15:37 -0500)] 
xfs: verify free block header fields

Source kernel commit: de14c5f541e78c59006bee56f6c5c2ef1ca07272

Perform basic sanity checking of the directory free block header
fields so that we avoid hanging the system on invalid data.

(Granted that just means that now we shut down on directory write,
but that seems better than hanging...)

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago xfs: check for obviously bad level values in the bmbt root
Darrick J. Wong [Tue, 4 Apr 2017 20:37:43 +0000 (15:37 -0500)] 
xfs: check for obviously bad level values in the bmbt root

Source kernel commit: b3bf607d58520ea8c0666aeb4be60dbb724cd3a2

We can't handle a bmbt that's taller than BTREE_MAXLEVELS, and there's
no such thing as a zero-level bmbt (for that we have extents format),
so if we see this, send back an error code.
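
As a rough sketch of the check, with a hypothetical function name and assumed
values for the level limit and the error code:

```c
/* Assumed stand-in values, for illustration only. */
#define DEMO_BTREE_MAXLEVELS	9
#define DEMO_EFSCORRUPTED	117

/* Reject a bmbt root whose level is zero or taller than we can handle. */
static int demo_check_bmbt_root_level(unsigned int level)
{
	if (level == 0 || level > DEMO_BTREE_MAXLEVELS)
		return -DEMO_EFSCORRUPTED;
	return 0;
}
```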

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago xfs: filter out obviously bad btree pointers
Darrick J. Wong [Tue, 4 Apr 2017 20:37:43 +0000 (15:37 -0500)] 
xfs: filter out obviously bad btree pointers

Source kernel commit: d5a91baeb6033c3392121e4d5c011cdc08dfa9f7

Don't let anybody load an obviously bad btree pointer.  Since the values
come from disk, we must return an error, not just ASSERT.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago xfs: fail _dir_open when readahead fails
Darrick J. Wong [Tue, 4 Apr 2017 20:37:43 +0000 (15:37 -0500)] 
xfs: fail _dir_open when readahead fails

Source kernel commit: 7a652bbe366464267190c2792a32ce4fff5595ef

When we open a directory, we try to readahead block 0 of the directory
on the assumption that we're going to need it soon.  If the bmbt is
corrupt, the directory will never be usable and the readahead fails
immediately, so we might as well prevent the directory from being opened
at all.  This prevents a subsequent read or modify operation from
hitting it and taking the fs offline.

NOTE: We're only checking for early failures in the block mapping, not
the readahead directory block itself.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago xfs: fix toctou race when locking an inode to access the data map
Darrick J. Wong [Tue, 4 Apr 2017 20:37:43 +0000 (15:37 -0500)] 
xfs: fix toctou race when locking an inode to access the data map

Source kernel commit: 4b5bd5bf3fb182dc504b1b64e0331300f156e756

We use di_format and if_flags to decide whether we're grabbing the ilock
in btree mode (btree extents not loaded) or shared mode (anything else),
but the state of those fields can be changed by other threads that are
also trying to load the btree extents -- IFEXTENTS gets set before the
_bmap_read_extents call and cleared if it fails.

We don't actually need to have IFEXTENTS set until after the bmbt
records are successfully loaded and validated, which will fix the race
between multiple threads trying to read the same directory.  The next
patch strengthens directory bmbt validation by refusing to open the
directory if reading the bmbt to start directory readahead fails.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago xfs: remove unused struct declarations
Eric Sandeen [Tue, 4 Apr 2017 20:37:43 +0000 (15:37 -0500)] 
xfs: remove unused struct declarations

Source kernel commit: 64f61ab6040c9f04ba181cca7580212f23b89f74

After scratching my head looking for "xfs_busy_extent" I realized
it's not used; it's xfs_extent_busy, and the declaration for the
other name is bogus.  Remove that and a few others as well.

(struct xfs_log_callback is used, but the 2nd declaration is
unnecessary).

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago xfs: remove boilerplate around xfs_btree_init_block
Eric Sandeen [Tue, 4 Apr 2017 20:37:43 +0000 (15:37 -0500)] 
xfs: remove boilerplate around xfs_btree_init_block

Source kernel commit: b6f41e448277ff080fea734b93121e6cd7513f0c
(minimal changes made to mkfs & repair code for merge)

Now that xfs_btree_init_block_int is able to determine crc
status from the passed-in mp, we can determine the proper
magic as well if we are given a btree number, rather than
an explicit magic value.

Change xfs_btree_init_block[_int] callers to pass in the
btree number, and let xfs_btree_init_block_int use the
xfs_magics array via the xfs_btree_magic macro to determine
which magic value is needed.  This makes all of the
if (crc) / else stanzas identical, and the if/else can be
removed, leading to a single, common init_block call.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago xfs: make xfs_btree_magic more generic
Eric Sandeen [Tue, 4 Apr 2017 20:37:43 +0000 (15:37 -0500)] 
xfs: make xfs_btree_magic more generic

Source kernel commit: af7d20fd83d9e2b3111a847e4220bf943e2d531c

Right now the xfs_btree_magic() define takes only a cursor;
change this to take crc and btnum args to make it more generically
useful, and move to a function.

This will allow xfs_btree_init_block_int callers which don't
have a cursor to make use of the xfs_magics array, which will
happen in the next patch.
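
A cursor-free lookup of this shape might look like the sketch below.  The
enum, table and function names are hypothetical, and the table holds only two
btree types with illustrative ASCII-tag values rather than the full xfs_magics
array.

```c
#include <stdbool.h>
#include <stdint.h>

enum demo_btnum { DEMO_BTNUM_BNO, DEMO_BTNUM_CNT, DEMO_BTNUM_MAX };

/* Row 0: non-CRC magics; row 1: CRC magics.  Values are illustrative. */
static const uint32_t demo_magics[2][DEMO_BTNUM_MAX] = {
	{ 0x41425442, 0x41425443 },	/* "ABTB", "ABTC" */
	{ 0x41423342, 0x41423343 },	/* "AB3B", "AB3C" */
};

/* Look up the magic from the CRC status and btree number alone. */
static uint32_t demo_btree_magic(bool crc, enum demo_btnum btnum)
{
	return demo_magics[crc ? 1 : 0][btnum];
}
```

Since the lookup needs only a boolean and a btree number, a caller that has a
mount structure but no cursor can still obtain the right magic.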

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago xfs: glean crc status from mp not flags in xfs_btree_init_block_int
Eric Sandeen [Tue, 4 Apr 2017 20:37:32 +0000 (15:37 -0500)] 
xfs: glean crc status from mp not flags in xfs_btree_init_block_int

Source kernel commit: f88ae46b09e93ef07ac9efaf85df62adb5ba58e6

xfs_btree_init_block_int() can determine whether crcs are
in effect without the passed-in XFS_BTREE_CRC_BLOCKS flag;
the mp argument allows us to determine this from the
superblock.  Remove the flag from callers, and use
xfs_sb_version_hascrc(&mp->m_sb) internally instead.

This removes one difference between the if & else cases
in the callers.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago xfsprogs: Release v4.10.0 (v4.10.0)
Eric Sandeen [Sun, 26 Feb 2017 20:03:31 +0000 (14:03 -0600)] 
xfsprogs: Release v4.10.0

Update all the necessary files for a 4.10.0 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago include: don't collide __bitwise definitions in 4.10
Darrick J. Wong [Wed, 22 Feb 2017 20:39:01 +0000 (14:39 -0600)] 
include: don't collide __bitwise definitions in 4.10

Linux 4.10 changed the definition of __bitwise in such a way that
xfsprogs' definition is no longer a strict match for it.  This causes
gcc to complain, so only #define it here if the system hasn't already
done it for us.
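
The guard pattern looks roughly like this.  The empty fallback definition and
the demo_be32_t typedef are assumptions made for the sketch, not the exact
contents of the xfsprogs header.

```c
/* Only supply a fallback when no system header has defined __bitwise. */
#ifndef __bitwise
#define __bitwise
#endif

/* Hypothetical annotated type, shown only to exercise the macro. */
typedef unsigned int __bitwise demo_be32_t;
```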

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago xfs_metadump: ignore attr leaf with 0 entries (v4.10.0-rc1)
Eric Sandeen [Thu, 16 Feb 2017 03:48:31 +0000 (21:48 -0600)] 
xfs_metadump: ignore attr leaf with 0 entries

Another in the ongoing saga of attribute leaves with zero
entries; in this case, if we try to metadump an inode with
a zero-entries attribute leaf, the zeroing code will go off
the rails and segfault at:

                memset(&entries[nentries], 0,
                       first_name - (char *)&entries[nentries]);

because first_name is null, and we try to memset a large
(negative) number.
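
A guarded version of that zeroing step could be sketched as below.  The helper
name and its signature are hypothetical; the actual fix may simply skip such
leaves rather than use a helper like this.

```c
#include <stddef.h>
#include <string.h>

/*
 * Zero the unused gap between the end of the entry array and the first
 * name.  Returns the number of bytes cleared.
 */
static size_t demo_zero_gap(char *entries_end, char *first_name)
{
	size_t	len;

	/* No entries means no first name: nothing to zero. */
	if (first_name == NULL || first_name <= entries_end)
		return 0;
	len = (size_t)(first_name - entries_end);
	memset(entries_end, 0, len);
	return len;
}
```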

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago libxfs: sync up FSGETXATTR names and definitions with the kernel
Darrick J. Wong [Thu, 16 Feb 2017 03:04:03 +0000 (21:04 -0600)] 
libxfs: sync up FSGETXATTR names and definitions with the kernel

The rest of xfsprogs uses FS_XFLAG values for FSGETXATTR as defined in
the kernel, so we should do the same in io/cowextsize.c.  Also, move the
XFS_IOC_FSGETXATTR definition to the same part of xfs_fs.h as the
kernel.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago xfsprogs: Fix building xfsprogs on 32-bit platforms (again)
Eric Biggers [Thu, 16 Feb 2017 03:04:03 +0000 (21:04 -0600)] 
xfsprogs: Fix building xfsprogs on 32-bit platforms (again)

Building xfsprogs on 32-bit platforms was broken again by the recent
split of BUILD_CFLAGS from CFLAGS.  -D_FILE_OFFSET_BITS=64 was not added
to BUILD_CFLAGS, but in fact BUILD_CFLAGS is used to compile
crc32selftest, which includes xfs.h and therefore requires this
declaration.  Fix this by adding -D_FILE_OFFSET_BITS=64 to BUILD_CFLAGS.

Fixes: 0a71e3839630 ("build: Allow compiling xfsprogs in a cross compile environment")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago xfs: extsize hints are not unlikely in xfs_bmap_btalloc
Christoph Hellwig [Thu, 16 Feb 2017 03:04:03 +0000 (21:04 -0600)] 
xfs: extsize hints are not unlikely in xfs_bmap_btalloc

Source kernel commit: 493611ebd62673f39e2f52c2561182c558a21cb6

With COW files they are the hotpath, just like for files with the
extent size hint attribute.  We really shouldn't micro-manage anything
but failure cases with unlikely.

Additionally Arnd Bergmann recently reported that one of these two
unlikely annotations causes link failures together with an upcoming
kernel instrumentation patch, so let's get rid of it ASAP.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago xfs: remove racy hasattr check from attr ops
Brian Foster [Thu, 16 Feb 2017 03:04:03 +0000 (21:04 -0600)] 
xfs: remove racy hasattr check from attr ops

Source kernel commit: 5a93790d4e2df73e30c965ec6e49be82fc3ccfce

xfs_attr_[get|remove]() have unlocked attribute fork checks to optimize
away a lock cycle in cases where the fork does not exist or is otherwise
empty. This check is not safe, however, because an attribute fork short
form to extent format conversion includes a transient state that causes
the xfs_inode_hasattr() check to fail. Specifically,
xfs_attr_shortform_to_leaf() creates an empty extent format attribute
fork and then adds the existing shortform attributes to it.

This means that lookup of an existing xattr can spuriously return
-ENOATTR when racing against a setxattr that causes the associated
format conversion. This was originally reproduced by an untar on a
particularly configured glusterfs volume, but can also be reproduced on
demand with properly crafted xattr requests.

The format conversion occurs under the exclusive ilock. xfs_attr_get()
and xfs_attr_remove() already have the proper locking and checks further
down in the functions to handle this situation correctly. Drop the
unlocked checks to avoid the spurious failure and rely on the existing
logic.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago xfs: use per-AG reservations for the finobt
Christoph Hellwig [Thu, 16 Feb 2017 03:04:03 +0000 (21:04 -0600)] 
xfs: use per-AG reservations for the finobt

Source kernel commit: 76d771b4cbe33c581bd6ca2710c120be51172440

Currently we try to rely on the global reserved block pool for block
allocations for the free inode btree, but I have customer reports
(fairly complex workload, need to find an easier reproducer) where that
is not enough as the AG where we free an inode that requires a new
finobt block is entirely full.  This causes us to cancel a dirty
transaction and thus a file system shutdown.

I think the right way to guard against this is to treat the finobt the same
way as the refcount btree and have per-AG reservations for the possible
worst case size of it, and the patch below implements that.

Note that this could increase mount times with large finobt trees.  In
an ideal world we would have added a field for the number of finobt
blocks to the AGI, similar to what we did for the refcount blocks.
We should add it next time we rev the AGI or AGF format by adding
new fields.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years ago xfs: only update mount/resv fields on success in __xfs_ag_resv_init
Christoph Hellwig [Thu, 16 Feb 2017 03:04:02 +0000 (21:04 -0600)] 
xfs: only update mount/resv fields on success in __xfs_ag_resv_init

Source kernel commit: 4dfa2b84118fd6c95202ae87e62adf5000ccd4d0

Try to reserve the blocks first and only then update the fields in
or hanging off the mount structure.  This way we can call __xfs_ag_resv_init
again after a previous failure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: verify dirblocklog correctly
Darrick J. Wong [Thu, 16 Feb 2017 03:03:54 +0000 (21:03 -0600)] 
xfs: verify dirblocklog correctly

Source kernel commit: 83d230eb5c638949350f4761acdfc0af5cb1bc00

sb_dirblklog is added to sb_blocklog to compute the directory block size
in bytes.  Therefore, we must compare the sum of both those values
against XFS_MAX_BLOCKSIZE_LOG, not just dirblklog.
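
The check described above can be sketched roughly as follows. This is only
an illustration: dirblklog_ok() is a hypothetical helper, and the value 16
for XFS_MAX_BLOCKSIZE_LOG is an assumption about the on-disk format limit,
not something stated in this commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed limit: 2^16 bytes (64 KiB) is the largest block size. */
#define XFS_MAX_BLOCKSIZE_LOG	16

/*
 * Hypothetical helper, not the actual verifier code.  The directory
 * block size is 2^(sb_blocklog + sb_dirblklog) bytes, so the sum of
 * the two log values is what must be range checked.
 */
static bool dirblklog_ok(uint8_t sb_blocklog, uint8_t sb_dirblklog)
{
	return (unsigned int)sb_blocklog + sb_dirblklog <= XFS_MAX_BLOCKSIZE_LOG;
}
```

With 4 KiB blocks (sb_blocklog = 12), a dirblklog of 4 is accepted and a
dirblklog of 5 is rejected, even though 5 on its own is well below the limit.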

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: fix COW writeback race
Christoph Hellwig [Thu, 16 Feb 2017 03:03:54 +0000 (21:03 -0600)] 
xfs: fix COW writeback race

Source kernel commit: d2b3964a0780d2d2994eba57f950d6c9fe489ed8

Due to the way xfs_iomap_write_allocate tries to convert the whole
found extent from delalloc to real space we can run into a race
condition with multiple threads doing writes to this same extent.
For the non-COW case that is harmless as the only thing that can happen
is that we call xfs_bmapi_write on an extent that has already been
converted to a real allocation.  For COW writes where we move the extent
from the COW to the data fork after I/O completion the race is, however,
not quite as harmless.  In the worst case we are now calling
xfs_bmapi_write on a region that contains a hole in the COW fork, which
will trip up an assert in debug builds or lead to file system corruption
in non-debug builds.  This seems to be reproducible with workloads of
small O_DSYNC writes, although so far I've not managed to come up with
an isolated reproducer.

The fix for the issue is relatively simple:  tell xfs_bmapi_write
that we are only asked to convert delayed allocations and skip holes
in that case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: fix xfs_mode_to_ftype() prototype
Arnd Bergmann [Thu, 16 Feb 2017 03:03:54 +0000 (21:03 -0600)] 
xfs: fix xfs_mode_to_ftype() prototype

Source kernel commit: fd29f7af75b7adf250beccffa63746c6a88e2b74

A harmless warning just got introduced:

fs/xfs/libxfs/xfs_dir2.h:40:8: error: type qualifiers ignored on function return type [-Werror=ignored-qualifiers]

Removing the 'const' modifier avoids the warning and has no
other effect.
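
As a small illustration of why the qualifier can go (ftype_of() is a
made-up name, not the real prototype): a function returns its value by
copy, so 'const' on the return type changes nothing for the caller.

```c
/*
 * Declaring the function as
 *
 *     const unsigned char ftype_of(int is_dir);
 *
 * is what GCC reports under -Wignored-qualifiers.  The unqualified
 * form below behaves identically and compiles without the warning.
 */
static unsigned char ftype_of(int is_dir)
{
	return is_dir ? 2 : 1;
}
```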

Fixes: 1fc4d33fed12 ("xfs: replace xfs_mode_to_ftype table with switch statement")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs_repair: Fix uninit var in process_leaf_attr_level
Eric Sandeen [Thu, 26 Jan 2017 02:37:23 +0000 (20:37 -0600)] 
xfs_repair: Fix uninit var in process_leaf_attr_level

My unreviewed maintainer adjustment on the way in gets a
brown paper bag.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agotools/find-api-violations: fix fs -> fsr in the directory list
Darrick J. Wong [Thu, 26 Jan 2017 02:02:43 +0000 (20:02 -0600)] 
tools/find-api-violations: fix fs -> fsr in the directory list

Fix a stupid typo in the original commit.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs_db: Interpret inode's di_format field as unsigned
chandan [Thu, 26 Jan 2017 02:02:43 +0000 (20:02 -0600)] 
xfs_db: Interpret inode's di_format field as unsigned

On a ppc64 big endian system, xfs_db would print the following,

xfs_db> p
core.magic = 0x494e
core.mode = 0100600
core.version = 3
core.format = -253

This is due to fp_dinode_fmt() interpreting the di_format field as
signed. This commit fixes the bug by passing BVUNSIGNED (instead of
BVSIGNED) as the argument to getbitval(). With this commit applied, we
now get,

xfs_db> p
core.magic = 0x494e
core.mode = 0100600
core.version = 3
core.format = 3 (btree)

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs_repair: trash dirattr btrees that cycle to the root
Darrick J. Wong [Thu, 26 Jan 2017 02:02:43 +0000 (20:02 -0600)] 
xfs_repair: trash dirattr btrees that cycle to the root

If xfs_repair detects a dir/attr btree that cycles back to the root, the
tree should be cleared and/or rebuilt instead of simply aborting the
repair program.

[sandeen: move check outside main loop]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs_repair: zero shared_vn
Darrick J. Wong [Thu, 26 Jan 2017 02:02:43 +0000 (20:02 -0600)] 
xfs_repair: zero shared_vn

Since shared_vn always has to be zero, zero it at the start of repair.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs_repair: strengthen geometry checks
Darrick J. Wong [Thu, 26 Jan 2017 02:02:43 +0000 (20:02 -0600)] 
xfs_repair: strengthen geometry checks

In xfs_repair, the inodelog, sectlog, and dirblklog values are read
directly into the xfs_mount structure without any sanity checking by the
verifier.  This results in xfs_repair segfaulting when those fields have
ridiculously high values because the pointer arithmetic runs us off the
end of the metadata buffers.  Therefore, reject the superblock if these
values are garbage and try to find one of the other ones.  Clean up the
dblocks checking to use the relevant macros.

The superblock field fuzzer (xfs/1301) triggers all these segfaults.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs_db: fix the 'source' command when passed as a -c option
Darrick J. Wong [Thu, 26 Jan 2017 02:02:43 +0000 (20:02 -0600)] 
xfs_db: fix the 'source' command when passed as a -c option

The 'source' command is supposed to read commands out of a file and
execute them.  This works great when done from an interactive command
line, but it doesn't work at all when invoked from the command line
because we never actually do anything with the opened file.

So don't load stdin into the input stack when we're only executing
command line options, and use that to decide if source_f is executing
from the command line so that we can actually run the input loop.  We'll
use this for the per-field fuzzing xfstests.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agolibxfs: sanitize agcount on load
Eric Sandeen [Thu, 26 Jan 2017 02:02:43 +0000 (20:02 -0600)] 
libxfs: sanitize agcount on load

Before we get into libxfs_initialize_perag and try to blindly
allocate a perag struct for every (possibly corrupted number of)
AGs, see if we can read the last one.  If not, assume it's corrupt,
and load only the first AG.

Do this only for an arbitrarily high-ish agcount, so that normal-ish
geometry on a possibly truncated file or device will still do
its best to make all readable AGs available.

Set xfs_db's exitcode to 1 if this happens.

Also teach metadump to detect this and exit appropriately if
truncated, as it resets exitcode to 0 for its own purposes internally.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs_io: add DAX and CoW extent-size flags to chattr manpage
Eryu Guan [Thu, 26 Jan 2017 02:02:42 +0000 (20:02 -0600)] 
xfs_io: add DAX and CoW extent-size flags to chattr manpage

Manpage is not updated after adding set/clear DAX and CoW
extent-size flags support to xfs_io.

Signed-off-by: Eryu Guan <eguan@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs_io: fix missing syncfs command
Amir Goldstein [Thu, 26 Jan 2017 02:02:42 +0000 (20:02 -0600)] 
xfs_io: fix missing syncfs command

Fixes commit c7dd81c7cd ("xfs_io: add sync and syncfs commands")

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs_logprint: handle log operation split of inode item correctly
Hou Tao [Thu, 26 Jan 2017 02:02:42 +0000 (20:02 -0600)] 
xfs_logprint: handle log operation split of inode item correctly

If an inode log item has 4 log operations, and the 4th operation
(attr fork op) is split to the next log record due to the size
limitation of log record, xfs_logprint doesn't check whether or not
the 4th operation is in the current log record and prints invalid data.

xfs_logprint also needs to calculate the count of split log
operations correctly instead of just returning 1.

The following is a diff of the output before and after the patch
is applied:

  =====================================================================
  cycle: 120  version: 2      lsn: 120,11014  tail_lsn: 120,427
  length of Log Record: 32256 prev offset: 10984      num ops: 243
  ......
  h_size: 32768
  ---------------------------------------------------------------------
  Oper (0): tid: 2db4353b  len: 0  clientid: TRANS  flags: START
  ......
  ---------------------------------------------------------------------
  Oper (240): tid: 2db4353b  len: 56  clientid: TRANS  flags: none
  INODE: #regs: 4   ino: 0x200a4bf  flags: 0x45   dsize: 64
          blkno: 10506832  len: 16  boff: 7936
  Oper (241): tid: 2db4353b  len: 96  clientid: TRANS  flags: none
  INODE CORE
  ......
  Oper (242): tid: 2db4353b  len: 64  clientid: TRANS  flags: none
  EXTENTS inode data
 -Oper (243): tid: 150000  len: 83886080  clientid: ERROR  flags: none
 -LOCAL attr data
  =====================================================================
  cycle: 120  version: 2      lsn: 120,11078  tail_lsn: 120,427
  length of Log Record: 3584  prev offset: 11014      num ops: 44
  ......
  h_size: 32768
  ---------------------------------------------------------------------
  Oper (0): tid: 2db4353b  len: 52  clientid: TRANS  flags: none
 +Left over region from split log item
  ---------------------------------------------------------------------
  Oper (1): tid: 2db4353b  len: 56  clientid: TRANS  flags: none
  INODE: #regs: 3   ino: 0x100047b  flags: 0x5   dsize: 64
  ......
  ---------------------------------------------------------------------

Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfsprogs: remove irix support
Christoph Hellwig [Thu, 26 Jan 2017 02:02:42 +0000 (20:02 -0600)] 
xfsprogs: remove irix support

The port of the opensource xfsprogs to IRIX always was secondary to the
"real" tools in the IRIX tree.  IRIX has effectively been EOLed, and
dropping support for it will allow cleaning up various things in the XFS
tree where IRIX was oddly different from the other ports.  E.g. that the
xfsctl function needs a path and a fd, something we could replace with
just documenting to use ioctl in the future, or various ifdefs in the
headers shared with the kernel for structures provided by native headers
in IRIX.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: sanity check inode di_mode
Amir Goldstein [Thu, 26 Jan 2017 02:02:42 +0000 (20:02 -0600)] 
xfs: sanity check inode di_mode

Source kernel commit: a324cbf10a3c67aaa10c9f47f7b5801562925bc2

Check for invalid file type in xfs_dinode_verify()
and fail to load the inode structure from disk.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: replace xfs_mode_to_ftype table with switch statement
Amir Goldstein [Thu, 26 Jan 2017 02:02:41 +0000 (20:02 -0600)] 
xfs: replace xfs_mode_to_ftype table with switch statement

Source kernel commit: 1fc4d33fed124fb182e8e6c214e973a29389ae83

The size of the xfs_mode_to_ftype[] conversion table
was too small to handle an invalid value of mode=S_IFMT.

Instead of fixing the table size, replace the conversion table
with a conversion helper that uses a switch statement.

Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: add missing include dependencies to xfs_dir2.h
Amir Goldstein [Thu, 26 Jan 2017 02:02:41 +0000 (20:02 -0600)] 
xfs: add missing include dependencies to xfs_dir2.h

Source kernel commit: b597dd5373a1ccc08218665dc8417433b1c09550

xfs_dir2.h dereferences some data types in inline functions
and fails to include those type definitions, e.g.:
xfs_dir2_data_aoff_t, struct xfs_da_geometry.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: sanity check directory inode di_size
Amir Goldstein [Thu, 26 Jan 2017 02:02:41 +0000 (20:02 -0600)] 
xfs: sanity check directory inode di_size

Source kernel commit: 3c6f46eacd876bd723a9bad3c6882714c052fd8e

This change fixes an assertion hit when fuzzing on-disk
i_mode values.

The easy case to fix is when changing an empty file
i_mode to S_IFDIR. In this case, xfs_dinode_verify()
detects an illegal zero size for directory and fails
to load the inode structure from disk.

For the case of a non-empty file whose i_mode is changed
to S_IFDIR, the ASSERT() statement in xfs_dir2_isblock()
is replaced with return -EFSCORRUPTED, to avoid interacting
with corrupted junk also when XFS_DEBUG is disabled.

Suggested-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs_repair: update the manual content about xfs_repair exit status
Zirong Lang [Thu, 12 Jan 2017 20:12:42 +0000 (14:12 -0600)] 
xfs_repair: update the manual content about xfs_repair exit status

The xfs_repair(8) man page said "xfs_repair run without the -n option
will always return a status code of 0". That's not correct.

xfs_repair will return 2 if it finds a fs log which needs to be
replayed or cleared, 1 if runtime error is encountered, and 0 for
all other cases.

[ sandeen: editing for clarity ]

Signed-off-by: Zorro Lang <zlang@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs_repair.8: document dirty log conditions
Darrick J. Wong [Thu, 12 Jan 2017 20:12:42 +0000 (14:12 -0600)] 
xfs_repair.8: document dirty log conditions

Add a section describing what is a dirty log, why xfs_repair won't touch
such things, and what one can do to clear the condition and check the
filesystem.

[ sandeen: light editing, minor man-page-ification ]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agolibxfs-apply: minor improvements
Eric Sandeen [Thu, 12 Jan 2017 20:12:42 +0000 (14:12 -0600)] 
libxfs-apply: minor improvements

Three quick improvements to libxfs-apply:

- Skip already-cross-merged commits, based on the
  "Source XXX commit" line in the commitlog.

- Be clearer about which patch failed if it does

- Clean up guilt better after a failed application

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agobuild: add tar.xz target
Eric Sandeen [Thu, 12 Jan 2017 20:12:42 +0000 (14:12 -0600)] 
build: add tar.xz target

kup generates .xz files, and fedora RPMs now use that.
It'd be nice to have a handy target to generate .xz
files locally, so hack that in.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoUpdate licenses in COPYING file
Eric Sandeen [Thu, 12 Jan 2017 20:12:42 +0000 (14:12 -0600)] 
Update licenses in COPYING file

doc/COPYING was a mess, it included old versions of licenses
and included the GPLv2 license twice.  Update licenses,
URLs, and eliminate duplicate info.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfsprogs: fix a couple 32-bit build warnings
Eric Sandeen [Thu, 12 Jan 2017 20:12:41 +0000 (14:12 -0600)] 
xfsprogs: fix a couple 32-bit build warnings

mremap_f can't turn a long long into a pointer, and
dump_dirent needs proper %llx for u64 printing.
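
For the printing half, one portable pattern looks roughly like this
(format_ino() is a hypothetical helper, not the code in dump_dirent): a
64-bit type may be 'unsigned long' on one ABI and 'unsigned long long' on
another, so the argument is cast to match the %llx conversion.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a 64-bit value as hex on any ABI. */
static int format_ino(char *buf, size_t len, uint64_t ino)
{
	/* The cast makes the argument type agree with %llx everywhere. */
	return snprintf(buf, len, "0x%llx", (unsigned long long)ino);
}
```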

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs_io: implement 'utimes' command
Deepa Dinamani [Thu, 12 Jan 2017 20:12:41 +0000 (14:12 -0600)] 
xfs_io: implement 'utimes' command

Add the utimes command to provide a way to utilize
the futimens C library call. This is the
interface to the utimensat system call, which updates
the mtime and atime of a file.

[ sandeen: minor merge fixups, re-alphabetization ]

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agolibxcmd: add non-iterating user commands
Dave Chinner [Thu, 12 Jan 2017 20:12:41 +0000 (14:12 -0600)] 
libxcmd: add non-iterating user commands

Right now command iteration is not directly controllable by the
user; it is controlled entirely by the application command flag
setup. Sometimes we don't want commands to iterate but only operate
on the currently selected object.

For example, the stat command iterates:

$ xfs_io -c "open -r foo" -c "open bar" -c "file" -c "stat" foo
 000  foo            (foreign,non-sync,non-direct,read-write)
 001  foo            (foreign,non-sync,non-direct,read-only)
[002] bar            (foreign,non-sync,non-direct,read-write)
fd.path = "foo"
fd.flags = non-sync,non-direct,read-write
stat.ino = 462399
stat.type = regular file
stat.size = 776508
stat.blocks = 1528
fd.path = "foo"
fd.flags = non-sync,non-direct,read-only
stat.ino = 462399
stat.type = regular file
stat.size = 776508
stat.blocks = 1528
fd.path = "bar"
fd.flags = non-sync,non-direct,read-write
stat.ino = 475227
stat.type = regular file
stat.size = 0
stat.blocks = 0
$

To do this, add a function to supply a "non-iterating" user command
that will execute an iterating-capable command as though
CMD_FLAG_ONESHOT was set. Add a new command line option to xfs_io to
drive it (-C <command>) and connect it all up. Document it in the
xfs_io man page, too.

The result of "-C stat":

$ xfs_io -c "open -r foo" -c "open bar" -c "file" -C "stat" foo
 000  foo            (foreign,non-sync,non-direct,read-write)
 001  foo            (foreign,non-sync,non-direct,read-only)
[002] bar            (foreign,non-sync,non-direct,read-write)
fd.path = "bar"
fd.flags = non-sync,non-direct,read-write
stat.ino = 475227
stat.type = regular file
stat.size = 0
stat.blocks = 0
$

Is that we only see the stat output for the active open file
which is "bar".
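
The difference between the two modes can be sketched as below. This is a
simplified model, not the libxcmd implementation: struct cmd,
run_command() and record_last() are hypothetical, and only the flag name
CMD_FLAG_ONESHOT comes from the text (its value here is assumed).

```c
#define CMD_FLAG_ONESHOT	(1u << 0)	/* value assumed for the sketch */

struct cmd {
	unsigned int	flags;
	void		(*fn)(int target, void *ctx);
};

/* Example callback: remember the last target the command ran on. */
static void record_last(int target, void *ctx)
{
	*(int *)ctx = target;
}

/*
 * Run a command either once on the current target (one-shot command,
 * or forced by the caller as with -C) or once per target in the table.
 * Returns the number of times the command function was called.
 */
static int run_command(const struct cmd *c, int ntargets, int current,
		       int force_oneshot, void *ctx)
{
	int i;

	if (force_oneshot || (c->flags & CMD_FLAG_ONESHOT)) {
		c->fn(current, ctx);
		return 1;
	}
	for (i = 0; i < ntargets; i++)
		c->fn(i, ctx);
	return ntargets;
}
```

With three open files and the last one current, the iterating call runs
three times while the forced one-shot call runs once, on the current file
only.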

[ sandeen: fix arg in command_loop printf ]

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs_io: make various commands one-shot only
Dave Chinner [Thu, 12 Jan 2017 20:12:41 +0000 (14:12 -0600)] 
xfs_io: make various commands one-shot only

It makes no sense to iterate the file table for some xfs_io
commands. Some commands are already marked in this way, but lots of
them are not and this leads to bad behaviour. For example, the open
command will run until the process fd table is full and EMFILE is
returned rather than just opening the specified file once.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agolibxcmd: don't check generic library commands
Dave Chinner [Thu, 12 Jan 2017 20:12:41 +0000 (14:12 -0600)] 
libxcmd: don't check generic library commands

The generic "help" and "quit" commands have different methods of
skipping user provided command check functions that may prevent them
from running. xfs_quota uses CMD_ALL_FSTYPES and xfs_io uses
CMD_FLAG_ONESHOT.  Add a new CMD_FLAG_LIBRARY to indicate commands
that should not be checked against application specific check
functions so they are always present and can be run regardless of
the context in which they are run.

This gets rid of the CMD_ALL_FSTYPES flag, and enables us to remove
the ONESHOT check in xfs_io so we use only app specific flags for
determining if app commands should run or not.

[ sandeen: remove CMD_ALL_FSTYPES definition ]

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agolibxcmd: merge command() and iterate_command()
Dave Chinner [Thu, 12 Jan 2017 20:12:41 +0000 (14:12 -0600)] 
libxcmd: merge command() and iterate_command()

Simplify the command loop further by merging the command loop
iteration checks with the command execution function. This removes
all visibility of command iteration from the main command execution
loop, and enables us to factor and clean up the command loop
processing neatly.

[ sandeen: align process_input() args ]

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agolibxcmd: rename args_command to command_iterator
Dave Chinner [Thu, 12 Jan 2017 20:12:41 +0000 (14:12 -0600)] 
libxcmd: rename args_command to command_iterator

It is not particularly easy to understand the function of the
args_command abstraction. It's actually a command iterator interface
that allows callers to specify the target of the command and iterate
the command multiple times over different targets. Rename and
document the abstraction to make this functionality clear.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agolibxcmd: check CMD_FLAG_GLOBAL inside args_command()
Dave Chinner [Thu, 12 Jan 2017 20:12:41 +0000 (14:12 -0600)] 
libxcmd: check CMD_FLAG_GLOBAL inside args_command()

Rather than having multiple methods of executing commands from the
CLI, use CMD_FLAG_GLOBAL to indicate a one-shot command rather than
an iterative command from args_command(). This simplifies the main
loop processing.

To make it more obvious what this CMD_FLAG_GLOBAL flag does, rename
it to CMD_FLAG_ONESHOT to indicate that the command should only ever
be executed once and not iterated.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs_io: fix some documentation problems
Darrick J. Wong [Thu, 12 Jan 2017 20:12:41 +0000 (14:12 -0600)] 
xfs_io: fix some documentation problems

Describe the numberless (i.e. "reflink the whole file") behavior
in the xfs_io help system and since the clone/dedupe ioctls were
promoted to the VFS before XFS reflink landed, mention those in
the manpage.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs_io: fix the minimum arguments to the reflink command
Darrick J. Wong [Thu, 12 Jan 2017 20:12:41 +0000 (14:12 -0600)] 
xfs_io: fix the minimum arguments to the reflink command

The reflink command can reflink the entirety of two files if the
offsets and lengths are not specified... but we forgot to permit
that case.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs_io: prefix dedupe command error messages consistently
Darrick J. Wong [Thu, 12 Jan 2017 20:12:40 +0000 (14:12 -0600)] 
xfs_io: prefix dedupe command error messages consistently

Prefix the perror output of the dedupe command consistently.  All the
other perror calls reference the ioctl name directly, so we might as
well do that for all the dedupe cases.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs_io: Improvements to copy_range return code handling
Anna Schumaker [Thu, 12 Jan 2017 20:12:40 +0000 (14:12 -0600)] 
xfs_io: Improvements to copy_range return code handling

If copy_file_range() returns 0, then that means no data was copied.  We
should break out of the loop in this case to prevent looping
indefinitely.
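
The loop shape this implies is roughly the following. To keep the sketch
independent of any particular file descriptor setup, the copy call is
passed in as a callback; copy_step_fn, copy_loop() and drain_step() are
hypothetical names, and in xfs_io the step would be a copy_file_range()
call.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for a single copy_file_range() call. */
typedef ssize_t (*copy_step_fn)(size_t len, void *ctx);

/* Copy up to len bytes; returns bytes copied, or -1 on error. */
static long long copy_loop(copy_step_fn step, void *ctx, size_t len)
{
	long long total = 0;

	while (len > 0) {
		ssize_t ret = step(len, ctx);

		if (ret < 0)
			return -1;	/* caller reports the error string */
		if (ret == 0)
			break;		/* nothing copied: stop, don't spin */
		if ((size_t)ret > len)
			ret = (ssize_t)len;
		total += ret;
		len -= (size_t)ret;
	}
	return total;
}

/* Test stub: hands out at most 4 bytes per call from a fixed supply. */
static ssize_t drain_step(size_t len, void *ctx)
{
	size_t *remaining = ctx;
	size_t n = len < *remaining ? len : *remaining;

	if (n > 4)
		n = 4;
	*remaining -= n;
	return (ssize_t)n;
}
```

Without the zero check, a source that runs dry before len bytes have been
copied would keep the loop running forever.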

Additionally, if an error is returned by copy_file_range() then we need
to print out the string form to be used by error checking tests in
xfstests.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs_io: implement 'set_encpolicy' and 'get_encpolicy' commands
Eric Biggers [Thu, 12 Jan 2017 20:12:40 +0000 (14:12 -0600)] 
xfs_io: implement 'set_encpolicy' and 'get_encpolicy' commands

Add set_encpolicy and get_encpolicy commands to xfs_io so that xfstests
will be able to test filesystem encryption using the actual user API,
not just hacked in with a mount option.  These commands use the common
"fscrypt" API currently implemented by ext4 and f2fs, but it's also
under development for ubifs and planned for xfs.

Note that to get encrypted files to actually work, it's also necessary
to add a key to the kernel keyring.  This patch does not add a command
for this to xfs_io because it's possible to do it using keyctl.  keyctl
can also be used to remove keys, revoke keys, invalidate keys, etc.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs_logprint: di_gen is unsigned
Eric Sandeen [Thu, 12 Jan 2017 20:12:40 +0000 (14:12 -0600)] 
xfs_logprint: di_gen is unsigned

di_gen is unsigned:

        __uint32_t      di_gen;         /* generation number */

but we print it as a signed int in logprint, so see oddities like:

 forkoff:24  dmevmask:0x0  dmstate:0  flags:0x0  gen:-628807103

Fix this.
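
The effect is easy to reproduce outside logprint. In this sketch
(format_gen() is a hypothetical helper), the same 32-bit value is
formatted once through a signed conversion and once with the unsigned
conversion that matches the field type; the signed result assumes the
usual two's complement conversion.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a generation number either way. */
static void format_gen(char *buf, size_t len, uint32_t gen, int as_signed)
{
	if (as_signed)	/* old behaviour: large values show up negative */
		snprintf(buf, len, "gen:%" PRId32, (int32_t)gen);
	else		/* matches the unsigned on-disk field */
		snprintf(buf, len, "gen:%" PRIu32, gen);
}
```

For the value 3666160193 the signed form yields the gen:-628807103 seen
above, while the unsigned form prints gen:3666160193.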

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs_io.8: remove duplicate .TP commands
Eric Biggers [Thu, 12 Jan 2017 20:12:40 +0000 (14:12 -0600)] 
xfs_io.8: remove duplicate .TP commands

The right margin of the xfs_io man page was being pushed inwards due to
extra .TP commands, making all the text below hard to read.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agobuild: add missing AC_PROG_INSTALL
Jan Engelhardt [Thu, 12 Jan 2017 20:12:40 +0000 (14:12 -0600)] 
build: add missing AC_PROG_INSTALL

$ autoreconf -fi
$ ./configure
configure: error: cannot find install-sh, install.sh, or shtool in . "."/.

Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agobuild: resolve autoheader warning
Jan Engelhardt [Thu, 12 Jan 2017 20:12:40 +0000 (14:12 -0600)] 
build: resolve autoheader warning

$ autoreconf -fi
[...]
autoheader: warning: missing template: HAVE_UMODE_T
autoheader: Use AC_DEFINE([HAVE_UMODE_T], [], [Description])
autoreconf: /usr/bin/autoheader failed with exit status: 1

Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs_io: fix building with musl
Ralph Sennhauser [Thu, 12 Jan 2017 20:12:40 +0000 (14:12 -0600)] 
xfs_io: fix building with musl

The fallback in case the libc doesn't have or doesn't advertise the
existence of d_reclen in struct dirent uses d_namlen. Musl neither
advertises d_reclen nor does it have a d_namlen member.

Calculate the value for d_namlen from d_name in the fallback path.
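
A fallback of this kind might look like the sketch below.
dirent_namlen() is a hypothetical name; the only assumption is that
d_name is NUL-terminated, which POSIX requires.

```c
#include <dirent.h>
#include <string.h>

/* Hypothetical fallback: derive the name length from d_name itself. */
static size_t dirent_namlen(const struct dirent *d)
{
	return strlen(d->d_name);
}
```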

Signed-off-by: Ralph Sennhauser <ralph.sennhauser@gmail.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agobuild: Allow compiling xfsprogs in a cross compile environment
Gwendal Grignou [Thu, 12 Jan 2017 20:12:29 +0000 (14:12 -0600)] 
build: Allow compiling xfsprogs in a cross compile environment

Without this patch, we are using the same compiler and options for the host
compiler (BUILD_CC) and the target compiler (CC), and we would get error
messages at compilation:
x86_64-pc-linux-gnu-gcc -O2 -O2 -pipe -march=armv7-a -mtune=cortex-a15 ...
x86_64-pc-linux-gnu-gcc.real: error: unrecognized command line option
'-mfpu=neon'
'-mfloat-abi=hard'
'-clang-syntax'
'-mfpu=neon'
'-mfloat-abi=hard'
'-clang-syntax'

Add BUILD_CC and BUILD_CFLAGS as precious variables to allow setting them
up from the ebuild.

Signed-off-by: Gwendal Grignou <gwendal@chromium.org>
Reviewed-by: Mike Frysinger <vapier@gentoo.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: don't rely on ->total in xfs_alloc_space_available libxfs-4.10-sync
Christoph Hellwig [Wed, 11 Jan 2017 02:09:21 +0000 (20:09 -0600)] 
xfs: don't rely on ->total in xfs_alloc_space_available

Source kernel commit: 12ef830198b0d71668eb9b59f9ba69d32951a48a

->total is a bit of an odd parameter passed down to the low-level
allocator all the way from the high-level callers.  It's supposed to
contain the maximum number of blocks to be allocated for the whole
transaction [1].

But in xfs_iomap_write_allocate we only convert existing delayed
allocations and thus only have a minimal block reservation for the
current transaction, so xfs_alloc_space_available can't use it for
the allocation decisions.  Use the maximum of args->total and the
calculated block requirement to make a decision.  We probably should
get rid of args->total eventually and instead apply ->minleft more
broadly, but that will require some extensive changes all over.

[1] which creates lots of confusion as most callers don't decrement it
once doing a first allocation.  But that's for a separate series.
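
The decision described above could look roughly like the following sketch.
The struct and function names are hypothetical, simplified stand-ins and
not the kernel's xfs_alloc_arg_t or xfs_alloc_space_available.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the allocation arguments. */
struct alloc_args {
	unsigned int minlen;       /* minimum extent length */
	unsigned int alignment;    /* required alignment, at least 1 */
	unsigned int minalignslop; /* extra slop needed for alignment */
	unsigned int minleft;      /* blocks that must stay free afterwards */
	unsigned int total;        /* caller's estimate for the transaction */
};

static bool space_available(const struct alloc_args *args,
			    long long free_blocks, long long reserved)
{
	/* Blocks this single allocation needs in the worst case. */
	long long alloc_len = (long long)args->minlen +
			      (args->alignment - 1) + args->minalignslop;
	long long total = args->total;
	/* Use the larger of the caller's total and the calculated need. */
	long long needed = total > alloc_len ? total : alloc_len;
	long long available = free_blocks - reserved - args->minleft;

	return available >= needed;
}
```

With total left at 0, as in the delalloc conversion case, the calculated
requirement alone drives the decision.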

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: adjust allocation length in xfs_alloc_space_available
Christoph Hellwig [Wed, 11 Jan 2017 02:08:21 +0000 (20:08 -0600)] 
xfs: adjust allocation length in xfs_alloc_space_available

Source kernel commit: 54fee133ad59c87ab01dd84ab3e9397134b32acb

We must decide in xfs_alloc_fix_freelist whether an allocation from a
given AG is possible based on the available space, and should not fail
the allocation past that point on a healthy file system.

But currently we have two additional places that second-guess
xfs_alloc_fix_freelist: xfs_alloc_ag_vextent tries to adjust the
maxlen parameter to remove the reservation before doing the
allocation (but ignores the various minium freespace requirements),
and xfs_alloc_fix_minleft tries to fix up the allocated length
after we've found an extent, but ignores the reservations and also
doesn't take the AGFL into account (and thus fails allocations
for not matching minlen in some cases).

Remove all these later fixups and just correct the maxlen argument
inside xfs_alloc_fix_freelist once we have the AGF buffer locked.
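
As an illustration only, correcting maxlen in one place might be sketched
like this. clamp_maxlen is a hypothetical helper, and 'available' is
assumed to already exclude reservations, the free list and minleft.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: decide once whether the AG can satisfy the
 * request, and shrink maxlen to what is really available.
 */
static bool clamp_maxlen(unsigned int minlen, unsigned int *maxlen,
			 long long available)
{
	if (available < (long long)minlen)
		return false;	/* this AG cannot satisfy the request */
	if (available < (long long)*maxlen)
		*maxlen = (unsigned int)available;
	return true;
}
```

After this point no later stage needs to second-guess the length.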

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: fix bogus minleft manipulations
Christoph Hellwig [Wed, 11 Jan 2017 02:07:48 +0000 (20:07 -0600)] 
xfs: fix bogus minleft manipulations

Source kernel commit: 255c516278175a6dc7037d1406307f35237d8688

We can't just set minleft to 0 when we're low on space - that's exactly
what we need minleft for: to protect space in the AG for btree block
allocations when we are low on free space.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: bump up reserved blocks in xfs_alloc_set_aside
Christoph Hellwig [Wed, 11 Jan 2017 02:07:05 +0000 (20:07 -0600)] 
xfs: bump up reserved blocks in xfs_alloc_set_aside

Source kernel commit: 5149fd327f16e393c1d04fa5325ab072c32472bf

Setting aside 4 blocks globally for bmbt splits isn't all that useful,
as different threads can allocate space in parallel.  Bump it to 4
blocks per AG to allow each thread that is currently doing an
allocation to dip into it separately.  Without that we may not have
enough reserved blocks if there are enough parallel transactions
in an almost out of space file system that all run into bmap btree
splits.
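
The difference between a global and a per-AG set-aside can be sketched as
below. The constant names and the AGFL reserve value of 4 are assumptions
made for illustration.

```c
#include <stdint.h>

#define AGFL_RESERVE      4	/* assumed per-AG free list reserve */
#define BMBT_SPLIT_BLOCKS 4	/* blocks set aside for bmbt splits */

/* Before: the bmbt split blocks are set aside once for the whole fs. */
static uint64_t set_aside_global(uint32_t agcount)
{
	return (uint64_t)agcount * AGFL_RESERVE + BMBT_SPLIT_BLOCKS;
}

/* After: every AG gets its own bmbt split blocks. */
static uint64_t set_aside_per_ag(uint32_t agcount)
{
	return (uint64_t)agcount * (AGFL_RESERVE + BMBT_SPLIT_BLOCKS);
}
```

For a four-AG file system this raises the set-aside from 20 to 32 blocks.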

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: use the actual AG length when reserving blocks
Darrick J. Wong [Wed, 11 Jan 2017 02:06:43 +0000 (20:06 -0600)] 
xfs: use the actual AG length when reserving blocks

Source kernel commit: 20e73b000bcded44a91b79429d8fa743247602ad

We need to use the actual AG length when making per-AG reservations,
since we could otherwise end up reserving more blocks out of the last
AG than there are actual blocks.
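
A sketch of why the last AG needs special handling. The function and
parameter names are hypothetical; they stand for the total data block
count, the nominal AG size and the AG count from the superblock.

```c
#include <stdint.h>

/* Length of AG 'agno' in blocks; only the last AG may be shorter. */
static uint32_t ag_length(uint64_t dblocks, uint32_t agblocks,
			  uint32_t agcount, uint32_t agno)
{
	if (agno == agcount - 1)
		return (uint32_t)(dblocks - (uint64_t)agno * agblocks);
	return agblocks;
}
```

Sizing a reservation from the nominal AG size instead would overshoot in
a short final AG.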

Complained-about-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: use GFP_NOFS when allocating btree cursors
Darrick J. Wong [Tue, 10 Jan 2017 02:18:50 +0000 (20:18 -0600)] 
xfs: use GFP_NOFS when allocating btree cursors

Source kernel commit: b24a978c377be5f14e798cb41238e66fe51aab2f

Use NOFS for allocating btree cursors, since they can be called
under the ilock.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: ignore leaf attr ichdr.count in verifier during log replay
Eric Sandeen [Tue, 10 Jan 2017 02:18:50 +0000 (20:18 -0600)] 
xfs: ignore leaf attr ichdr.count in verifier during log replay

Source kernel commit: 2e1d23370e75d7d89350d41b4ab58c7f6a0e26b2

When we create a new attribute, we first create a shortform
attribute, and try to fit the new attribute into it.
If that fails, we copy the (empty) attribute into a leaf attribute,
and do the copy again.  Thus there can be a transient state where
we have an empty leaf attribute.

If we encounter this during log replay, the verifier will fail.
So add a test to ignore this part of the leaf attr verification
during log replay.

Thanks as usual to dchinner for spotting the problem.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: optimise CRC updates
Dave Chinner [Tue, 10 Jan 2017 02:18:50 +0000 (20:18 -0600)] 
xfs: optimise CRC updates

Source kernel commit: cae028df53449905c944603df624ac94bc619661

Nick Piggin reported that the CRC overhead in an fsync heavy
workload was higher than expected on a Power8 machine. Part of this
was to do with the fact that the power8 CRC implementation is not
efficient for CRC lengths of less than 512 bytes, and so the way we
split the CRCs over the CRC field means a lot of the CRCs are
reduced to being less than optimal size.

To optimise this, change the CRC update mechanism to zero the CRC
field first, and then compute the CRC in one pass over the buffer
and write the result back into the buffer. We can do this safely
because anything writing a CRC has exclusive access to the buffer
the CRC is being calculated over.

We leave the CRC verify code the same - it still splits the CRC
calculation - because we do not want read-only operations modifying
the underlying buffer. This is because read-only operations may not
have an exclusive access to the buffer guaranteed, and so temporary
modifications could leak out to other processes accessing the
buffer concurrently.
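
The two paths can be sketched as follows. This is an illustration, not the
kernel code: it uses a slow bitwise CRC32c, hypothetical function names,
and stores the CRC in host byte order rather than little-endian.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC32c update; slow but dependency-free. */
static uint32_t crc32c_update(uint32_t crc, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Writer has exclusive access: zero the field, checksum in one pass. */
static void update_cksum(unsigned char *buf, size_t len, size_t off)
{
	uint32_t crc;

	memset(buf + off, 0, sizeof(crc));
	crc = ~crc32c_update(~0u, buf, len);
	memcpy(buf + off, &crc, sizeof(crc));
}

/* Reader must not modify the buffer: checksum around the CRC field. */
static bool verify_cksum(const unsigned char *buf, size_t len, size_t off)
{
	uint32_t zero = 0, stored, crc;

	memcpy(&stored, buf + off, sizeof(stored));
	crc = crc32c_update(~0u, buf, off);
	crc = crc32c_update(crc, &zero, sizeof(zero));
	crc = crc32c_update(crc, buf + off + sizeof(zero),
			    len - (off + sizeof(zero)));
	return stored == (uint32_t)~crc;
}
```

Both paths produce the same value because the verifier feeds zeroes in
place of the stored CRC field.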

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: make xfs btree stats less huge
Dave Chinner [Tue, 10 Jan 2017 02:18:49 +0000 (20:18 -0600)] 
xfs: make xfs btree stats less huge

Source kernel commit: 11ef38afe98cc7ad1a46ef24945232ec1760d5e2

Embedding a switch statement in every btree stats inc/add adds a lot
of code overhead to the core btree infrastructure paths. Stats are
supposed to be small and lightweight, but the btree stats have
become big and bloated as we've added more btrees. It needs fixing
because the reflink code will just add more overhead again.

Convert the v2 btree stats to arrays instead of independent
variables, and instead use the type to index the specific btree
array via an enum. This allows us to use array based indexing
to update the stats, rather than having to dereference variables
specific to the btree type.

If we then wrap the xfsstats structure in a union and place a uint32_t
array beside it, and calculate the correct btree stats array base
index when creating a btree cursor, we can easily access
entries in the stats structure without having to switch names based
on the btree type.

We then replace the switch statement with a simple set of stats
wrapper macros, resulting in a significant simplification of the
btree stats code, and:

text    data     bss     dec     hex filename
48905     144       8   49057    bfa1 fs/xfs/libxfs/xfs_btree.o.old
36793     144       8   36945    9051 fs/xfs/libxfs/xfs_btree.o

it reduces the core btree infrastructure code size by close to 25%!
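
A cut-down sketch of the union-plus-array approach. All type, field and
macro names here are hypothetical and far smaller than the real xfsstats
layout.

```c
#include <stddef.h>
#include <stdint.h>

enum btree_stat { STAT_LOOKUP, STAT_COMPARE, STAT_INSREC, STAT_COUNT };
enum btree_type { BT_BNO, BT_CNT, BT_BMAP, BT_TYPES };

struct stats {
	uint32_t xs_other;			/* unrelated counter */
	uint32_t xs_bt[BT_TYPES][STAT_COUNT];	/* one group per btree */
};

/* Overlay a flat uint32_t array on the named structure. */
union stats_u {
	struct stats s;
	uint32_t a[sizeof(struct stats) / sizeof(uint32_t)];
};

/* Index of the first counter belonging to a btree type. */
#define BT_STAT_BASE(type) \
	(offsetof(struct stats, xs_bt) / sizeof(uint32_t) + \
	 (size_t)(type) * STAT_COUNT)

struct cursor {
	size_t stat_off;	/* computed once at cursor creation */
};

static void cursor_init(struct cursor *cur, enum btree_type type)
{
	cur->stat_off = BT_STAT_BASE(type);
}

/* No switch on the btree type: plain array indexing. */
#define BTREE_STATS_INC(st, cur, stat) \
	((st)->a[(cur)->stat_off + (stat)]++)
#define BTREE_STATS_ADD(st, cur, stat, n) \
	((st)->a[(cur)->stat_off + (stat)] += (n))
```

The per-type work moves to cursor creation, so each stats update becomes a
single indexed increment.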

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: don't allow di_size with high bit set
Darrick J. Wong [Tue, 10 Jan 2017 02:18:49 +0000 (20:18 -0600)] 
xfs: don't allow di_size with high bit set

Source kernel commit: ef388e2054feedaeb05399ed654bdb06f385d294

The on-disk field di_size is used to set i_size, which is a signed
integer of type loff_t.  If the high bit of di_size is set, we'll end up
with a negative i_size, which will cause all sorts of problems.  Since
the VFS won't let us create a file with such a length, we should catch
it here in the verifier too.
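
The check itself is small. A sketch, assuming di_size has already been
converted to host byte order and using a hypothetical function name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Reject sizes that would turn negative when stored in a signed 64-bit. */
static bool di_size_ok(uint64_t di_size)
{
	return (di_size & (UINT64_C(1) << 63)) == 0;
}
```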

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: error out if trying to add attrs and anextents > 0
Darrick J. Wong [Tue, 10 Jan 2017 02:18:49 +0000 (20:18 -0600)] 
xfs: error out if trying to add attrs and anextents > 0

Source kernel commit: 0f352f8ee8412bd9d34fb2a6411241da61175c0e

We shouldn't assert if somehow we end up trying to add an attr fork to
an inode that apparently already has attr extents because this is an
indication of on-disk corruption.  Instead, return an error code to
userspace.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: don't crash if reading a directory results in an unexpected hole
Darrick J. Wong [Tue, 10 Jan 2017 02:18:49 +0000 (20:18 -0600)] 
xfs: don't crash if reading a directory results in an unexpected hole

Source kernel commit: 96a3aefb8ffde23180130460b0b2407b328eb727

In xfs_dir3_data_read, we can encounter the situation where err == 0 and
*bpp == NULL if the given bno offset happens to be a hole; this leads to
a crash if we try to set the buffer type after the _da_read_buf call.
Holes can happen due to corrupt or malicious entries in the bmbt data,
so be a little more careful when we're handling buffers.
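
The pattern can be sketched with mock types. struct buf, read_buf and
BUF_TYPE_DIR_DATA are hypothetical stand-ins for the real buffer
structure, the _da_read_buf call and the buffer type tag.

```c
#include <stddef.h>

struct buf { int type; };	/* hypothetical stand-in for a buffer */
#define BUF_TYPE_DIR_DATA 1	/* hypothetical type tag */

/* Stand-in reader: a hole yields err == 0 with *bpp == NULL. */
static int read_buf(struct buf *backing, int is_hole, struct buf **bpp)
{
	*bpp = is_hole ? NULL : backing;
	return 0;
}

static int dir_data_read(struct buf *backing, int is_hole,
			 struct buf **bpp)
{
	int err = read_buf(backing, is_hole, bpp);

	/* Only tag the buffer if the read succeeded and returned one. */
	if (!err && *bpp)
		(*bpp)->type = BUF_TYPE_DIR_DATA;
	return err;
}
```

Without the *bpp test, the hole case would dereference a NULL pointer.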

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: complain if we don't get nextents bmap records
Darrick J. Wong [Tue, 10 Jan 2017 02:18:49 +0000 (20:18 -0600)] 
xfs: complain if we don't get nextents bmap records

Source kernel commit: 356a3225222e5bc4df88aef3419fb6424f18ab69

When reading into memory all extents of a btree-format inode fork,
complain if the number of extents we find is not the same as the number
of extents reported in the inode core.  This is needed to stop an IO
action from accessing the garbage areas of the in-core fork.
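
A simplified sketch of the count check. The function name and the flat
array of per-leaf record counts are hypothetical, and EFSCORRUPTED is
defined locally on the assumption that it mirrors Linux EUCLEAN.

```c
#include <stddef.h>

#define EFSCORRUPTED 117	/* assumption: same value as Linux EUCLEAN */

/*
 * Walk the leaf blocks of a fork and make sure the number of records
 * found matches the extent count recorded in the inode core.
 */
static int read_extents(const unsigned int *recs_per_leaf, size_t nleaves,
			unsigned int nextents)
{
	unsigned int found = 0;

	for (size_t i = 0; i < nleaves; i++) {
		/* More records than the in-core array has room for. */
		if (recs_per_leaf[i] > nextents - found)
			return -EFSCORRUPTED;
		found += recs_per_leaf[i];
	}
	/* Fewer records than promised would leave garbage entries. */
	if (found != nextents)
		return -EFSCORRUPTED;
	return 0;
}
```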

[dchinner: removed redundant assert]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: check for bogus values in btree block headers
Darrick J. Wong [Tue, 10 Jan 2017 02:18:49 +0000 (20:18 -0600)] 
xfs: check for bogus values in btree block headers

Source kernel commit: bb3be7e7c1c18e1b141d4cadeb98cc89ecf78099

When we're reading a btree block, make sure that what we retrieved
matches the owner and level; and has a plausible number of records.
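
In outline, such a header check might look like the sketch below. The
struct layout and names are hypothetical and do not match the on-disk
btree block format.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory copy of the header fields being checked. */
struct btree_hdr {
	uint64_t owner;
	uint16_t level;
	uint16_t numrecs;
};

static bool btree_hdr_ok(const struct btree_hdr *h, uint64_t owner,
			 unsigned int level, unsigned int maxrecs)
{
	if (h->owner != owner)
		return false;	/* block belongs to someone else */
	if (h->level != level)
		return false;	/* not the level we were descending to */
	if (h->numrecs > maxrecs)
		return false;	/* more records than fit in a block */
	return true;
}
```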

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: forbid AG btrees with level == 0
Darrick J. Wong [Tue, 10 Jan 2017 02:18:49 +0000 (20:18 -0600)] 
xfs: forbid AG btrees with level == 0

Source kernel commit: d2a047f31e86941fa896e0e3271536d50aba415e

There is no such thing as a zero-level AG btree since even a single-node
zero-records btree has one level.  Btree cursor constructors read
cur_nlevels straight from disk and then access things like
cur_bufs[cur_nlevels - 1] which is /really/ bad if cur_nlevels is zero!
Therefore, strengthen the verifiers to prevent this possibility.
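
A sketch of the strengthened check. MAX_LEVELS and the function name are
hypothetical; the point is only that zero is rejected before anything
indexes an array with level - 1.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_LEVELS 8	/* hypothetical upper bound on btree height */

/* A valid AG btree always has at least one level. */
static bool ag_btree_level_ok(uint32_t level)
{
	return level >= 1 && level <= MAX_LEVELS;
}
```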

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: several xattr functions can be void
Eric Sandeen [Tue, 10 Jan 2017 02:18:49 +0000 (20:18 -0600)] 
xfs: several xattr functions can be void

Source kernel commit: f7a136aee3c1c3f7daf87197b3b3c361744a2812

There are a handful of xattr functions which now return
nothing but zero.  They can be made void, chased through calling
functions, and error handling etc can be removed.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: handle cow fork in xfs_bmap_trace_exlist
Eric Sandeen [Tue, 10 Jan 2017 02:18:49 +0000 (20:18 -0600)] 
xfs: handle cow fork in xfs_bmap_trace_exlist

Source kernel commit: c44a1f22626c153976289e1cd67bdcdfefc16e1f

By inspection, xfs_bmap_trace_exlist isn't handling cow forks,
and will trace the data fork instead.

Fix this by setting state appropriately if whichfork
== XFS_COW_FORK.
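
Roughly, the mapping from fork to trace state could be sketched like
this. The enum and flag values are hypothetical placeholders rather than
the kernel's definitions.

```c
/* Hypothetical fork identifiers and state flags. */
enum { DATA_FORK, ATTR_FORK, COW_FORK };
#define BMAP_ATTRFORK (1 << 0)
#define BMAP_COWFORK  (1 << 1)

static int fork_to_state(int whichfork)
{
	int state = 0;

	if (whichfork == ATTR_FORK)
		state |= BMAP_ATTRFORK;
	else if (whichfork == COW_FORK)
		state |= BMAP_COWFORK;	/* previously fell through as data */
	return state;
}
```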

()___()
< @ @ >
 |   |
 {o_o}
  (|)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: pass state not whichfork to trace_xfs_extlist
Eric Sandeen [Tue, 10 Jan 2017 02:18:48 +0000 (20:18 -0600)] 
xfs: pass state not whichfork to trace_xfs_extlist

Source kernel commit: 7710517fc37b1899722707883b54694ea710b3c0

When xfs_bmap_trace_exlist called trace_xfs_extlist,
it sent in the "whichfork" var instead of the bmap "state"
as expected (even though state was already set up for this
purpose).

As a result, the xfs_bmap_class in tracing code used
"whichfork" not state in xfs_iext_state_to_fork(), and got
the wrong ifork pointer.  It all goes downhill from
there, including an ASSERT when ifp_bytes is empty
by the time it reaches xfs_iext_get_ext():

XFS: Assertion failed: idx < ifp->if_bytes / sizeof(xfs_bmbt_rec_t)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: Move AGI buffer type setting to xfs_read_agi
Eric Sandeen [Tue, 10 Jan 2017 02:18:48 +0000 (20:18 -0600)] 
xfs: Move AGI buffer type setting to xfs_read_agi

Source kernel commit: 200237d6746faaeaf7f4ff4abbf13f3917cee60a

We've missed properly setting the buffer type for
an AGI transaction in 3 spots now, so just move it
into xfs_read_agi() and set it if we are in a transaction
to avoid the problem in the future.

This is similar to how it is done in e.g. the dir3
and attr3 read functions.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: track preallocation separately in xfs_bmapi_reserve_delalloc()
Brian Foster [Tue, 10 Jan 2017 02:18:48 +0000 (20:18 -0600)] 
xfs: track preallocation separately in xfs_bmapi_reserve_delalloc()

Source kernel commit: 974ae922efd93b07b6cdf989ae959883f6f05fd8

Speculative preallocation is currently processed entirely by the callers
of xfs_bmapi_reserve_delalloc(). The caller determines how much
preallocation to include, adjusts the extent length and passes down the
resulting request.

While this works fine for post-eof speculative preallocation, it is not
as reliable for COW fork preallocation. COW fork preallocation is
implemented via the cowextszhint, which aligns the start offset as well
as the length of the extent. Further, it is difficult for the caller to
accurately identify when preallocation occurs because the returned
extent could have been merged with neighboring extents in the fork.

To simplify this situation and facilitate further COW fork preallocation
enhancements, update xfs_bmapi_reserve_delalloc() to take a separate
preallocation parameter to incorporate into the allocation request. The
preallocation blocks value is tacked onto the end of the request and
adjusted to accommodate neighboring extents and extent size limits.
Since xfs_bmapi_reserve_delalloc() now knows precisely how much
preallocation was included in the allocation, it can also tag the inodes
appropriately to support preallocation reclaim.

Note that xfs_bmapi_reserve_delalloc() callers are not yet updated to
use the preallocation mechanism. This patch should not change behavior
outside of correctly tagging reflink inodes when start offset
preallocation occurs (which the caller does not handle correctly).

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agofs: xfs: libxfs: constify xfs_nameops structures
Bhumika Goyal [Tue, 10 Jan 2017 02:18:48 +0000 (20:18 -0600)] 
fs: xfs: libxfs: constify xfs_nameops structures

Source kernel commit: cf7841c12d85d1fe0ad33fb8bc5746809a882010

Declare the structure xfs_nameops as const as it is only stored in the
m_dirnameops field of a xfs_mount structure. This field is of type
const struct xfs_nameops *, so xfs_nameops structures having this
property can be declared as const.
Done using Coccinelle:
@r1 disable optional_qualifier @
identifier i;
position p;
@@
static struct xfs_nameops i@p = {...};

@ok1@
identifier r1.i;
position p;
struct xfs_mount mp;
@@
mp.m_dirnameops=&i@p

@bad@
position p!={r1.p,ok1.p};
identifier r1.i;
@@
i@p

@depends on !bad disable optional_qualifier@
identifier r1.i;
@@
static
+const
struct xfs_nameops i={...};

@depends on !bad disable optional_qualifier@
identifier r1.i;
@@
+const
struct xfs_nameops i;

File size before:
text    data     bss     dec     hex filename
5302      85       0    5387    150b fs/xfs/libxfs/xfs_dir2.o

File size after:
text    data     bss     dec     hex filename
5318      69       0    5387    150b fs/xfs/libxfs/xfs_dir2.o

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: factor rmap btree size into the indlen calculations
Darrick J. Wong [Tue, 10 Jan 2017 02:18:48 +0000 (20:18 -0600)] 
xfs: factor rmap btree size into the indlen calculations

Source kernel commit: fd26a88093bab6529ea2de819114ca92dbd1d71d

When we're estimating the amount of space it's going to take to satisfy
a delalloc reservation, we need to include the space that we might need
to grow the rmapbt.  This helps us to avoid running out of space later
when _iomap_write_allocate needs more space than we reserved.  Eryu Guan
observed this happening on generic/224 when sunit/swidth were set.

Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: remove NULLEXTNUM
Christoph Hellwig [Tue, 10 Jan 2017 02:18:47 +0000 (20:18 -0600)] 
xfs: remove NULLEXTNUM

Source kernel commit: 0e8d630ba039d9976d250eedb82c3a423ad15447

We only ever set a field to this constant for an impossible to reach
error case in xfs_bmap_search_extents.  That function has been removed,
so we can remove the constant as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: remove xfs_bmap_search_extents
Christoph Hellwig [Tue, 10 Jan 2017 02:18:47 +0000 (20:18 -0600)] 
xfs: remove xfs_bmap_search_extents

Source kernel commit: 6edc977f775e5ac10655b03607ef091d2b06f2f6

Now that all users are gone.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agorepair: use new extent lookup helpers in bmap_next_offset
Eric Sandeen [Tue, 10 Jan 2017 02:18:47 +0000 (20:18 -0600)] 
repair: use new extent lookup helpers in bmap_next_offset

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
7 years agoxfs: remove prev argument to xfs_bmapi_reserve_delalloc
Christoph Hellwig [Tue, 10 Jan 2017 02:18:47 +0000 (20:18 -0600)] 
xfs: remove prev argument to xfs_bmapi_reserve_delalloc

Source kernel commit: 65c5f419788d623a0410eca1866134f5e4628594

We can easily look up the previous extent for the cases where we need it,
which saves the callers from looking it up for us later in the series.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>