git.ipfire.org Git - thirdparty/xfsprogs-dev.git/log
9 years ago  xfs_io: bmap should support querying CoW fork, shared blocks
Darrick J. Wong [Tue, 25 Oct 2016 22:14:32 +0000 (15:14 -0700)] 
xfs_io: bmap should support querying CoW fork, shared blocks

Teach the bmap command to report shared and delayed allocation
extents, and to be able to query the CoW fork.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
9 years ago  xfs_growfs: report the presence of the reflink feature
Darrick J. Wong [Tue, 25 Oct 2016 22:14:32 +0000 (15:14 -0700)] 
xfs_growfs: report the presence of the reflink feature

Report the presence of the reflink feature in xfs_info.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs_db: print one array element per line
Darrick J. Wong [Tue, 25 Oct 2016 22:14:32 +0000 (15:14 -0700)] 
xfs_db: print one array element per line

Print one array element per line so that the debugger output isn't
a gigantic pile of screen snow.

Before (inobt):

xfs_db> p recs
recs[1-55] = [startino,holemask,count,freecount,free]
1:[128,0,64,0,0] 2:[4288,0xff,32,0,0xffffffff] 3:[4352,0,64,0,0]
4:[4416,0,64,10,0x1f0003e000000000] 5:[4480,0,64,17,0xc00e1803c2007840]

After:

xfs_db> p recs
recs[1-55] = [startino,holemask,count,freecount,free]
1:[128,0,64,0,0]
2:[4288,0xff,32,0,0xffffffff]
3:[4352,0,64,0,0]
4:[4416,0,64,10,0x1f0003e000000000]
5:[4480,0,64,17,0xc00e1803c2007840]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
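The layout change in the xfs_db commit above can be modelled in a few lines. This is a minimal Python sketch, not the xfs_db C code; `format_records` and its parameters are hypothetical names chosen for illustration.

```python
def format_records(name, fields, records):
    """Render an array the way the 'After' sample does: header, then one record per line."""
    # Header line: array name, index range, and the field names.
    lines = ["%s[1-%d] = [%s]" % (name, len(records), ",".join(fields))]
    # One indexed record per line instead of several packed onto one line.
    for index, record in enumerate(records, start=1):
        lines.append("%d:[%s]" % (index, ",".join(str(v) for v in record)))
    return "\n".join(lines)
```

Values are passed through `str()`, so a caller that wants hex output (as in the `holemask` column) would hand in pre-formatted strings.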
9 years ago  xfs_db: deal with the CoW extent size hint
Darrick J. Wong [Tue, 25 Oct 2016 22:14:32 +0000 (15:14 -0700)] 
xfs_db: deal with the CoW extent size hint

Display the CoW extent size hint when dumping inodes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs_db: metadump should copy the refcount btree too
Darrick J. Wong [Tue, 25 Oct 2016 22:14:32 +0000 (15:14 -0700)] 
xfs_db: metadump should copy the refcount btree too

Teach metadump to copy the refcount btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs_db: add support for checking the refcount btree
Darrick J. Wong [Tue, 25 Oct 2016 22:14:31 +0000 (15:14 -0700)] 
xfs_db: add support for checking the refcount btree

Do some basic checks of the refcount btree.  xfs_repair will have to
check that the reference counts match the various bmbt mappings.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
9 years ago  xfs_db: dump refcount btree data
Darrick J. Wong [Tue, 25 Oct 2016 22:14:31 +0000 (15:14 -0700)] 
xfs_db: dump refcount btree data

Add the ability to walk and dump the refcount btree in xfs_db.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
9 years ago  libxfs: add fsxattr flags and fields for cowextsize
Darrick J. Wong [Tue, 25 Oct 2016 22:14:31 +0000 (15:14 -0700)] 
libxfs: add fsxattr flags and fields for cowextsize

Add the cowextsize field and flag to each platform's struct fsxattr
definitions.  We can compile these definitions into the xfsprogs
utilities if we don't pick them up from the system headers, such as on
kernels prior to 4.9.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
9 years ago  libxfs: free the CoW fork from an inode
Darrick J. Wong [Tue, 25 Oct 2016 22:14:31 +0000 (15:14 -0700)] 
libxfs: free the CoW fork from an inode

Clean up the CoW fork, should there ever be one.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
9 years ago  libxfs: plumb in bmap deferred op log items
Darrick J. Wong [Tue, 25 Oct 2016 22:14:31 +0000 (15:14 -0700)] 
libxfs: plumb in bmap deferred op log items

Add a deferred op handler for block mapping actions.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
9 years ago  libxfs: plumb in refcount deferred op log items
Darrick J. Wong [Tue, 25 Oct 2016 22:14:31 +0000 (15:14 -0700)] 
libxfs: plumb in refcount deferred op log items

Add a deferred op handler for refcount update actions.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
9 years ago  libxfs: add xfs_refcount.h to the standard include list
Darrick J. Wong [Tue, 25 Oct 2016 22:14:30 +0000 (15:14 -0700)] 
libxfs: add xfs_refcount.h to the standard include list

Pick up the definitions in xfs_refcount.h for all compilation units.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
9 years ago  libxfs: initialize the in-core mount context for refcount btrees
Darrick J. Wong [Tue, 25 Oct 2016 22:14:30 +0000 (15:14 -0700)] 
libxfs: initialize the in-core mount context for refcount btrees

Initialize the refcount btree maxlevel field of the mount context.
This helps us to detect overly tall trees.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs_buflock: handling parsing errors more gracefully
Darrick J. Wong [Tue, 25 Oct 2016 22:14:30 +0000 (15:14 -0700)] 
xfs_buflock: handling parsing errors more gracefully

Skip ftrace output lines that don't parse.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs_logprint: fix up the RUI printing code to reflect new format
Darrick J. Wong [Tue, 25 Oct 2016 22:14:30 +0000 (15:14 -0700)] 
xfs_logprint: fix up the RUI printing code to reflect new format

We changed the RUI format to use a variable length array, so update
the logprint code to reflect that.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs: defer should abort intent items if the trans roll fails libxfs-4.9-sync
Darrick J. Wong [Tue, 25 Oct 2016 02:00:12 +0000 (13:00 +1100)] 
xfs: defer should abort intent items if the trans roll fails

Source kernel commit: b77428b12b55437b28deae738d9ce8b2e0663b55

If the deferred ops transaction roll fails, we need to abort the intent
items if we haven't already logged a done item for it, regardless of
whether or not the deferred ops has had a transaction committed.  Dave
found this while running generic/388.

Move the tracepoint to make it easier to track object lifetimes.

Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years ago  xfs: remove xfs_bunmapi_cow
Christoph Hellwig [Tue, 25 Oct 2016 01:59:49 +0000 (12:59 +1100)] 
xfs: remove xfs_bunmapi_cow

Source kernel commit: 64e6428ddd00f864e3ca105f914a2b6920c2bc41

Since no one uses it anymore.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years ago  xfs: refactor xfs_bunmapi_cow
Christoph Hellwig [Tue, 25 Oct 2016 01:59:46 +0000 (12:59 +1100)] 
xfs: refactor xfs_bunmapi_cow

Source kernel commit: fa5c836ca8eb5bad6316ddfc066acbc4e2485356

Split out two helpers for deleting delayed or real extents from the COW fork.
This allows them to be called directly from xfs_reflink_cow_end_io once that
function is refactored to iterate the extent tree.  It will also allow
the delalloc deletion to be reused from xfs_bunmapi in the future.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years ago  xfs: add xfs_trim_extent
Darrick J. Wong [Tue, 25 Oct 2016 01:47:36 +0000 (12:47 +1100)] 
xfs: add xfs_trim_extent

Source kernel commit: 0a0af28cad9a43d90f13c2047bd8ee3d4cffb7f3

This helper allows trimming an extent to a subset of its original range
while making sure the block numbers in it remain valid.

In the future xfs_trim_extent and xfs_bmapi_trim_map should probably be
merged in some form.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[hch: split from a previous patch from Darrick, moved around and added
support for "raw" delayed extents]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
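The trimming idea in the xfs_trim_extent commit above can be sketched as follows. This is an illustrative Python model, not the kernel function: the `Extent` class, the sentinel values, and the `trim_extent` name are assumptions made for the example. The point it shows is that a real start block must advance together with the file offset, while sentinel start blocks (delayed allocation or hole) are left alone.

```python
from dataclasses import dataclass

# Hypothetical sentinels standing in for "no real physical block yet".
DELAYSTARTBLOCK = -1
HOLESTARTBLOCK = -2


@dataclass
class Extent:
    startoff: int    # first file block covered by the mapping
    startblock: int  # first physical block, or one of the sentinels
    blockcount: int  # length of the mapping in blocks


def trim_extent(ext, bno, length):
    """Trim ext in place so that it lies within [bno, bno + length)."""
    end = bno + length
    # No overlap at all: the result is an empty mapping.
    if ext.startoff + ext.blockcount <= bno or ext.startoff >= end:
        ext.blockcount = 0
        return ext
    # Cut off the front; a real start block moves by the same distance.
    if ext.startoff < bno:
        distance = bno - ext.startoff
        if ext.startblock not in (DELAYSTARTBLOCK, HOLESTARTBLOCK):
            ext.startblock += distance
        ext.startoff += distance
        ext.blockcount -= distance
    # Cut off the tail; the start block is unaffected.
    if end < ext.startoff + ext.blockcount:
        ext.blockcount = end - ext.startoff
    return ext
```

For example, trimming a mapping of file blocks 10 to 29 at physical block 1000 down to the range starting at file block 15 with length 5 yields a mapping at physical block 1005.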
9 years ago  libxfs: clean up _calc_dquots_per_chunk
Darrick J. Wong [Tue, 25 Oct 2016 01:47:36 +0000 (12:47 +1100)] 
libxfs: clean up _calc_dquots_per_chunk

Source kernel commit: 58d789678546d46d7bbd809dd7dab417c0f23655

The function xfs_calc_dquots_per_chunk takes a parameter in units
of basic blocks.  The kernel seems to get the units wrong, but
userspace got 'fixed' by commenting out the unnecessary conversion.
Fix both.

cc: <stable@vger.kernel.org>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
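The unit confusion described in the _calc_dquots_per_chunk commit above is easy to see with numbers. The Python sketch below is only a worked example: the 512-byte basic block is standard for XFS, but the 136-byte dquot record size and both function names are assumptions for illustration and should be checked against the actual headers.

```python
BASIC_BLOCK_BYTES = 512  # size of an XFS "basic block" (BB)
DQBLK_BYTES = 136        # ASSUMED on-disk dquot record size, for illustration only


def fsb_to_bb(fsblocks, fs_block_bytes):
    """Convert filesystem blocks to 512-byte basic blocks."""
    return fsblocks * fs_block_bytes // BASIC_BLOCK_BYTES


def dquots_per_chunk(nbblks, dqblk_bytes=DQBLK_BYTES):
    """Number of dquot records that fit in nbblks basic blocks."""
    if nbblks <= 0:
        raise ValueError("chunk must contain at least one basic block")
    return nbblks * BASIC_BLOCK_BYTES // dqblk_bytes
```

Under these assumptions, one 4096-byte filesystem block is 8 basic blocks and holds 30 records. Passing the count in filesystem blocks (1) instead of basic blocks (8) would report only 3, which is the kind of mismatch the commit fixes.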
9 years ago  xfs: remove pointless error goto in xfs_bmap_remap_alloc
Eric Sandeen [Tue, 25 Oct 2016 01:47:14 +0000 (12:47 +1100)] 
xfs: remove pointless error goto in xfs_bmap_remap_alloc

Source kernel commit: fe23759eaf2f6540de20c1623f066aad967ff9c9

The commit:

f65306ea xfs: map an inode's offset to an exact physical block

added a pointless error0: target; remove it.

Addresses-Coverity-Id: 1373865
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years ago  xfs: add some 'static' annotations
Eric Biggers [Tue, 25 Oct 2016 01:47:14 +0000 (12:47 +1100)] 
xfs: add some 'static' annotations

Source kernel commit: f1b8243c55ca6fd2a3898e2f586b8cfcfff684bb

sparse reported that several variables and a function were not
forward-declared anywhere and therefore should be 'static'.

Found with sparse by running 'make C=2 CF=-D__CHECK_ENDIAN__ fs/xfs/'

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years ago  xfs: remove redundant assignment of ifp
Colin Ian King [Tue, 25 Oct 2016 01:47:14 +0000 (12:47 +1100)] 
xfs: remove redundant assignment of ifp

Source kernel commit: 1d55a4bfd080ff4c6c96acfccfb7cdd2615ed6c2

Remove the redundant ifp = ifp statement; it does nothing.  Found with
static analysis by CoverityScan.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years ago  xfs: rework refcount cow recovery error handling
Darrick J. Wong [Tue, 25 Oct 2016 01:47:13 +0000 (12:47 +1100)] 
xfs: rework refcount cow recovery error handling

Source kernel commit: 6f97077ff6ef28e0f3b361b6ba9c95a222ef384b

The error handling in xfs_refcount_recover_cow_leftovers is confused
and can potentially leak memory, so rework it to release resources
correctly on error.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reported-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years ago  xfs: implement swapext for rmap filesystems
Darrick J. Wong [Tue, 25 Oct 2016 01:47:13 +0000 (12:47 +1100)] 
xfs: implement swapext for rmap filesystems

Source kernel commit: 1f08af52e7c981e9877796a2d90b0e0f08666945

Implement swapext for filesystems that have reverse mapping.  Back in
the reflink patches, we augmented the bmap code with a 'REMAP' flag
that updates only the bmbt and doesn't touch the allocator and
implemented log redo items for those two operations.  Now we can
rewrite extent swapping as a (looong) series of remap operations.

This is far less efficient than the fork swapping method implemented
in the past, so we only switch this on for rmap.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs: recognize the reflink feature bit
Darrick J. Wong [Tue, 25 Oct 2016 01:46:51 +0000 (12:46 +1100)] 
xfs: recognize the reflink feature bit

Source kernel commit: e54b5bf9d7aeb92d92c7f5115035e6a851d0f0c5

Add the reflink feature flag to the set of recognized feature flags.
This enables users to write to reflink filesystems.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs: simulate per-AG reservations being critically low
Darrick J. Wong [Tue, 25 Oct 2016 01:46:51 +0000 (12:46 +1100)] 
xfs: simulate per-AG reservations being critically low

Source kernel commit: a35eb41519ab8db90e87d375ee9362d6e080ca4c

Create an error injection point that enables us to simulate being
critically low on per-AG block reservations.  This should enable us to
simulate this specific ENOSPC condition so that we can test falling back
to a regular file copy.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs: don't mix reflink and DAX mode for now
Darrick J. Wong [Tue, 25 Oct 2016 01:46:51 +0000 (12:46 +1100)] 
xfs: don't mix reflink and DAX mode for now

Source kernel commit: 4f435ebe7d0422af61cdcddbbcc659888645a1e1

Since we don't have a strategy for handling both DAX and reflink,
for now we'll just prohibit both being set at the same time.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs: check for invalid inode reflink flags
Darrick J. Wong [Tue, 25 Oct 2016 01:46:51 +0000 (12:46 +1100)] 
xfs: check for invalid inode reflink flags

Source kernel commit: c8e156ac336d82f67d7adc014404a2251e9dad09

We don't support sharing blocks on the realtime device.  Flag inodes
with the reflink or cowextsize flags set when the reflink feature is
disabled.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs: convert unwritten status of reverse mappings for shared files
Darrick J. Wong [Tue, 25 Oct 2016 01:46:51 +0000 (12:46 +1100)] 
xfs: convert unwritten status of reverse mappings for shared files

Source kernel commit: 3f165b334e51477d2b33ac1c81b39927514daab7

Provide a function to convert an unwritten extent to a real one and
vice versa when shared extents are possible.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs: use interval query for rmap alloc operations on shared files
Darrick J. Wong [Tue, 25 Oct 2016 01:46:22 +0000 (12:46 +1100)] 
xfs: use interval query for rmap alloc operations on shared files

Source kernel commit: ceeb9c832eeca5c1c2efc54a38f67283ccb60288

When it's possible for reverse mappings to overlap (data fork extents
of files on reflink filesystems), use the interval query function to
find the left neighbor of an extent we're trying to add; and be
careful to use the lookup functions to update the neighbors and/or
add new extents.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs: add shared rmap map/unmap/convert log item types
Darrick J. Wong [Tue, 25 Oct 2016 01:43:48 +0000 (12:43 +1100)] 
xfs: add shared rmap map/unmap/convert log item types

Source kernel commit: 0e07c039bac5f6ce7e3bc512ab9efb4aaa76da94

Wire up some rmap log redo item type codes to map, unmap, or convert
shared data block extents.  The actual log item recovery comes in a
later patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs: increase log reservations for reflink
Darrick J. Wong [Tue, 25 Oct 2016 01:43:48 +0000 (12:43 +1100)] 
xfs: increase log reservations for reflink

Source kernel commit: 80de462e090c2c346ca6ec6344b326e81e8cef84

Increase the log reservations to handle the increased rolling that
happens at the end of copy-on-write operations.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs: try other AGs to allocate a BMBT block
Darrick J. Wong [Tue, 25 Oct 2016 01:43:48 +0000 (12:43 +1100)] 
xfs: try other AGs to allocate a BMBT block

Source kernel commit: 90e2056d76adc7894a019f5289d259de58065e13

Prior to the introduction of reflink, allocating a block and mapping
it into a file was performed in a single transaction with a single
block reservation, and the allocator was supposed to find enough
blocks to allocate the extent and any BMBT blocks that might be
necessary (unless we're low on space).

However, due to the way copy on write works, allocation and mapping
have been split into two transactions, which means that we must be
able to handle the case where we allocate an extent for CoW but that
AG runs out of free space before the blocks can be mapped into a file,
and the mapping requires a new BMBT block.  When this happens, look in
one of the other AGs for a BMBT block instead of taking the FS down.

The same applies to the functions that convert a data fork to extents
and later btree format.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
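The fallback described in the BMBT commit above amounts to searching the other allocation groups before giving up. The Python sketch below is a toy model only: the `allocate_bmbt_block` name, the per-AG free counters, and the wrap-around search order are assumptions, and it ignores the locking and reservation rules that a real allocator must follow.

```python
def allocate_bmbt_block(free_blocks_per_ag, preferred_ag):
    """Take one block from the preferred AG, or from the next AG that has space.

    Returns the index of the AG that supplied the block, or None if every AG
    is full (the case that would otherwise take the filesystem down).
    """
    ag_count = len(free_blocks_per_ag)
    for step in range(ag_count):
        # Assumed search order: start at the preferred AG and wrap around.
        ag = (preferred_ag + step) % ag_count
        if free_blocks_per_ag[ag] > 0:
            free_blocks_per_ag[ag] -= 1
            return ag
    return None
```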
9 years ago  xfs: preallocate blocks for worst-case btree expansion
Darrick J. Wong [Tue, 25 Oct 2016 01:43:48 +0000 (12:43 +1100)] 
xfs: preallocate blocks for worst-case btree expansion

Source kernel commit: 84d6961910ea7b3ae8d8338f5b4df25dea68cee9

To gracefully handle the situation where a CoW operation turns a
single refcount extent into a lot of tiny ones and then runs out of
space when a tree split has to happen, use the per-AG reserved block
pool to pre-allocate all the space we'll ever need for a maximal
btree.  For a 4K block size, this only costs an overhead of 0.3% of
available disk space.

When reflink is enabled, we have an unfortunate problem with rmap --
since we can share a block billions of times, this means that the
reverse mapping btree can expand basically infinitely.  When an AG is
so full that there are no free blocks with which to expand the rmapbt,
the filesystem will shut down hard.

This is rather annoying to the user, so use the AG reservation code to
reserve a "reasonable" amount of space for rmap.  We'll prevent
reflinks and CoW operations if we think we're getting close to
exhausting an AG's free space rather than shutting down, but this
permanent reservation should be enough for "most" users.  Hopefully.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[hch@lst.de: ensure that we invalidate the freed btree buffer]
Signed-off-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs: create a separate cow extent size hint for the allocator
Darrick J. Wong [Tue, 25 Oct 2016 01:43:42 +0000 (12:43 +1100)] 
xfs: create a separate cow extent size hint for the allocator

Source kernel commit: f7ca35227253dc8244fd908140b06010e67a31e5

Create a per-inode extent size allocator hint for copy-on-write.  This
hint is separate from the existing extent size hint so that CoW can
take advantage of the fragmentation-reducing properties of extent size
hints without disabling delalloc for regular writes.

The extent size hint that's fed to the allocator during a copy on
write operation is the greater of the cowextsize and regular extsize
hint.

During reflink, if we're sharing the entire source file to the entire
destination file and the destination file doesn't already have a
cowextsize hint, propagate the source file's cowextsize hint to the
destination file.

Furthermore, zero the bulkstat buffer prior to setting the fields
so that we don't copy kernel memory contents into userspace.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
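The two hint rules stated in the cowextsize commit above can be written down directly. This Python sketch uses hypothetical function names and assumes that a hint value of 0 means "no hint set"; it does not model any default value the kernel may apply when both hints are unset.

```python
def cow_allocation_hint(extsize_hint, cowextsize_hint):
    """Hint fed to the allocator for a CoW write: the greater of the two hints."""
    return max(extsize_hint, cowextsize_hint)


def propagate_cowextsize(src_hint, dst_hint, whole_file_to_whole_file):
    """Destination cowextsize hint after a reflink operation.

    The source hint is copied only when the entire source file is shared to
    the entire destination file and the destination has no hint (0) already.
    """
    if whole_file_to_whole_file and dst_hint == 0:
        return src_hint
    return dst_hint
```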
9 years ago  xfs: teach get_bmapx about shared extents and the CoW fork
Darrick J. Wong [Tue, 25 Oct 2016 01:42:12 +0000 (12:42 +1100)] 
xfs: teach get_bmapx about shared extents and the CoW fork

Source kernel commit: f86f403794b1446b68afb3c233d4c0bc0e93b654

Teach xfs_getbmapx how to report shared extents and CoW fork contents
accurately in the bmap output by querying the refcount btree
appropriately.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs: store in-progress CoW allocations in the refcount btree
Darrick J. Wong [Tue, 25 Oct 2016 01:41:50 +0000 (12:41 +1100)] 
xfs: store in-progress CoW allocations in the refcount btree

Source kernel commit: 174edb0e46e520230791a1a894397b7c824cefc4

Due to the way the CoW algorithm in XFS works, there's an interval
during which blocks allocated to handle a CoW can be lost -- if the FS
goes down after the blocks are allocated but before the block
remapping takes place.  This is exacerbated by the cowextsz hint --
allocated reservations can sit around for a while, waiting to get
used.

Since the refcount btree doesn't normally store records with refcount
of 1, we can use it to record these in-progress extents.  In-progress
blocks cannot be shared because they're not user-visible, so there
shouldn't be any conflicts with other programs.  This is a better
solution than holding EFIs during writeback because (a) EFIs can't be
relogged currently, (b) even if they could, EFIs are bound by
available log space, which puts an unnecessary upper bound on how much
CoW we can have in flight, and (c) we already have a mechanism to
track blocks.

At mount time, read the refcount records and free anything we find
with a refcount of 1 because those were in-progress when the FS went
down.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
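The mount-time cleanup described in the commit above can be sketched as a single pass over the refcount records. This is a simplified Python model, not the kernel recovery code: records are plain `(startblock, blockcount, refcount)` tuples, and `recover_cow_leftovers` and `free_extent` are hypothetical names.

```python
def recover_cow_leftovers(refcount_records, free_extent):
    """Free every record with refcount 1 and return the records that remain.

    A refcount of 1 marks an in-progress CoW allocation that was never
    remapped into a file before the filesystem went down.
    """
    kept = []
    for startblock, blockcount, refcount in refcount_records:
        if refcount == 1:
            # Leftover CoW staging extent: give the blocks back.
            free_extent(startblock, blockcount)
        else:
            kept.append((startblock, blockcount, refcount))
    return kept
```

Passing the "free" action in as a callback keeps the sketch independent of any particular allocator model.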
9 years ago  xfs: support removing extents from CoW fork
Darrick J. Wong [Tue, 25 Oct 2016 01:38:27 +0000 (12:38 +1100)] 
xfs: support removing extents from CoW fork

Source kernel commit: 4862cfe825c0087c14452b362e708a35da675f5e

Create a helper method to remove extents from the CoW fork without
any of the side effects (rmapbt/bmbt updates) of the regular extent
deletion routine.  We'll eventually use this to clear out the CoW fork
during ioend processing.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs: support allocating delayed extents in CoW fork
Darrick J. Wong [Tue, 25 Oct 2016 01:38:27 +0000 (12:38 +1100)] 
xfs: support allocating delayed extents in CoW fork

Source kernel commit: 60b4984fc3924bff292ec46b95a3e98b34b8e259

Modify xfs_bmap_add_extent_delay_real() so that we can convert delayed
allocation extents in the CoW fork to real allocations, and wire this
up all the way back to xfs_iomap_write_allocate().  In a subsequent
patch, we'll modify the writepage handler to call this.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs: support bmapping delalloc extents in the CoW fork
Darrick J. Wong [Tue, 25 Oct 2016 01:37:50 +0000 (12:37 +1100)] 
xfs: support bmapping delalloc extents in the CoW fork

Source kernel commit: be51f8119c2f5e27437d2c4271f6419f3b8e609f

Allow the creation of delayed allocation extents in the CoW fork.  In
a subsequent patch we'll wire up iomap_begin to actually do this via
reflink helper functions.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs: introduce the CoW fork
Darrick J. Wong [Tue, 25 Oct 2016 01:37:28 +0000 (12:37 +1100)] 
xfs: introduce the CoW fork

Source kernel commit: 3993baeb3c52f497d243a4a3b5510df97b22596b

Introduce a new in-core fork for storing copy-on-write delalloc
reservations and allocated extents that are in the process of being
written out.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs: don't allow reflinked dir/dev/fifo/socket/pipe files
Darrick J. Wong [Tue, 25 Oct 2016 01:37:28 +0000 (12:37 +1100)] 
xfs: don't allow reflinked dir/dev/fifo/socket/pipe files

Source kernel commit: 11715a21bc3035440b853a0334685f1a55ca8c3c

Only non-rt files can be reflinked, so check that when we load an
inode.  Also, don't leak the attr fork if there's a failure.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs: add reflink feature flag to geometry
Darrick J. Wong [Tue, 25 Oct 2016 01:37:28 +0000 (12:37 +1100)] 
xfs: add reflink feature flag to geometry

Source kernel commit: f0ec1b8ef11df0a51954df7e3ff3ca4aadb0d34b

Report the reflink feature in the XFS geometry so that xfs_info and
friends know the filesystem has this feature.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs: return work remaining at the end of a bunmapi operation
Darrick J. Wong [Tue, 25 Oct 2016 01:37:28 +0000 (12:37 +1100)] 
xfs: return work remaining at the end of a bunmapi operation

Source kernel commit: 4453593be6c54e7581467e80f4a2757be098a3a2

Return the range of file blocks that bunmapi didn't free.  This hint
is used by CoW and reflink to figure out what part of an extent
actually got freed so that it can set up the appropriate atomic
remapping of just the freed range.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
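The "work remaining" contract in the bunmapi commit above lets a caller loop until the whole range is handled. The Python sketch below is a toy model under stated assumptions: the file mapping is a set of block numbers, each call handles at most `max_blocks` blocks from the tail of the range, and both function names are hypothetical.

```python
def unmap_some(mapped, start, length, max_blocks):
    """Unmap up to max_blocks blocks from the tail of [start, start + length).

    Returns the number of blocks at the front of the range still to be done.
    """
    done = min(length, max_blocks)
    for file_block in range(start + length - done, start + length):
        mapped.discard(file_block)
    return length - done


def unmap_range(mapped, start, length, max_blocks=4):
    """Call unmap_some repeatedly until nothing remains; return the call count."""
    calls = 0
    while length > 0:
        # The returned remainder tells us exactly which part was not freed,
        # so the caller can act on just the freed part between calls.
        length = unmap_some(mapped, start, length, max_blocks)
        calls += 1
    return calls
```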
9 years ago  xfs: implement deferred bmbt map/unmap operations
Darrick J. Wong [Tue, 25 Oct 2016 01:37:20 +0000 (12:37 +1100)] 
xfs: implement deferred bmbt map/unmap operations

Source kernel commit: 9f3afb57d5f1e7145986132106c6ca91f8136cc2

Implement deferred versions of the inode block map/unmap functions.
These will be used in subsequent patches to make reflink operations
atomic.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs: pass bmapi flags through to bmap_del_extent
Darrick J. Wong [Tue, 25 Oct 2016 01:31:59 +0000 (12:31 +1100)] 
xfs: pass bmapi flags through to bmap_del_extent

Source kernel commit: 4847acf868bb426455c8b703c80ed5fc5e2ee556

Pass BMAPI_ flags from bunmapi into bmap_del_extent and extend
BMAPI_REMAP (which means "don't touch the allocator or the quota
accounting") to apply to bunmapi as well.  This will be used to
implement the unmap operation, which will be used by swapext.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs: map an inode's offset to an exact physical block
Darrick J. Wong [Tue, 25 Oct 2016 01:31:37 +0000 (12:31 +1100)] 
xfs: map an inode's offset to an exact physical block

Source kernel commit: f65306ea5246ef3ff68a6abf85f5a73a04903366

Teach the bmap routine to know how to map a range of file blocks to a
specific range of physical blocks, instead of simply allocating fresh
blocks.  This enables reflink to map a file to blocks that are already
in use.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs: log bmap intent items
Darrick J. Wong [Tue, 25 Oct 2016 01:30:19 +0000 (12:30 +1100)] 
xfs: log bmap intent items

Source kernel commit: 77d61fe45e720577a2cc0e9580fbc57d8faa7232

Provide a mechanism for higher levels to create BUI/BUD items, submit
them to the log, and a stub function to deal with recovered BUI items.
These parts will be connected to the rmapbt in a later patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs: create bmbt update intent log items
Darrick J. Wong [Tue, 25 Oct 2016 01:29:43 +0000 (12:29 +1100)] 
xfs: create bmbt update intent log items

Source kernel commit: 6413a01420c2fbf03b3d059795f541caeb962e86

Create bmbt update intent/done log items to record redo information in
the log.  Because we roll transactions multiple times for reflink
operations, we also have to track the status of the metadata updates
that will be recorded in the post-roll transactions in case we crash
before committing the final transaction.  This mechanism enables log
recovery to finish what was already started.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
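The intent/done pairing described in the commit above can be illustrated with a small replay loop. This Python sketch is a conceptual model, not the XFS log format: log items are plain tuples, and `pending_intents` is a hypothetical name. An intent with no matching done item is work that recovery still has to finish.

```python
def pending_intents(log_items):
    """Return {intent_id: payload} for intents that have no matching done item.

    log_items is an ordered iterable of ("intent", intent_id, payload) and
    ("done", intent_id) tuples, oldest first.
    """
    pending = {}
    for item in log_items:
        if item[0] == "intent":
            pending[item[1]] = item[2]
        elif item[0] == "done":
            # The update reached the log, so there is nothing left to redo.
            pending.pop(item[1], None)
    return pending
```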
9 years ago  xfs: introduce reflink utility functions
Darrick J. Wong [Tue, 25 Oct 2016 01:29:22 +0000 (12:29 +1100)] 
xfs: introduce reflink utility functions

Source kernel commit: 350a27a6a65cc5dd2ba1b220e8641993414816d2

These functions will be used by the other reflink functions to find
the maximum length of a range of shared blocks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs: reserve AG space for the refcount btree root
Darrick J. Wong [Tue, 25 Oct 2016 01:26:50 +0000 (12:26 +1100)] 
xfs: reserve AG space for the refcount btree root

Source kernel commit: d0e853f3600cd2a3f7c4a067dc38155c77c51df9

Reduce the max AG usable space size so that we always have space for
the refcount btree root.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs: adjust refcount when unmapping file blocks
Darrick J. Wong [Tue, 25 Oct 2016 01:26:48 +0000 (12:26 +1100)] 
xfs: adjust refcount when unmapping file blocks

Source kernel commit: 62aab20f08758b1b171a73a54e0c72dd12beb980

When we're unmapping blocks from a reflinked file, decrease the
refcount of the affected blocks and free the extents that are no
longer in use.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
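The unmap rule in the commit above (drop the refcount, free what is no longer used) can be sketched per block. This Python model makes two simplifying assumptions: it tracks individual blocks rather than extents, and a block absent from the `shared` map has an implicit refcount of 1. `decrease_refcount` is a hypothetical name.

```python
def decrease_refcount(shared, start, length):
    """Drop one reference from each block in [start, start + length).

    shared maps block number -> refcount for blocks with refcount >= 2.
    Returns the list of blocks whose last reference went away.
    """
    freed = []
    for block in range(start, start + length):
        refs = shared.get(block, 1) - 1
        if refs >= 2:
            shared[block] = refs
        else:
            # Refcount 1 is not stored; refcount 0 means the block is free.
            shared.pop(block, None)
            if refs == 0:
                freed.append(block)
    return freed
```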
9 years ago  xfs: connect refcount adjust functions to upper layers
Darrick J. Wong [Tue, 25 Oct 2016 01:24:46 +0000 (12:24 +1100)] 
xfs: connect refcount adjust functions to upper layers

Source kernel commit: 33ba6129208475ec3aeffe6e9dad9f9afe022405

Plumb in the upper level interface to schedule and finish deferred
refcount operations via the deferred ops mechanism.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs: adjust refcount of an extent of blocks in refcount btree
Darrick J. Wong [Tue, 25 Oct 2016 01:24:45 +0000 (12:24 +1100)] 
xfs: adjust refcount of an extent of blocks in refcount btree

Source kernel commit: 3172725814f9a689d6e8b3c7979b66403abf5dae

Provide functions to adjust the reference counts for an extent of
physical blocks stored in the refcount btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs: log refcount intent items
Darrick J. Wong [Tue, 25 Oct 2016 01:20:28 +0000 (12:20 +1100)] 
xfs: log refcount intent items

Source kernel commit: f997ee2137175f5b2bd7ced52acf1ca51f04f420

Provide a mechanism for higher levels to create CUI/CUD items, submit
them to the log, and a stub function to deal with recovered CUI items.
These parts will be connected to the refcountbt in a later patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs: create refcount update intent log items
Darrick J. Wong [Tue, 25 Oct 2016 01:20:28 +0000 (12:20 +1100)] 
xfs: create refcount update intent log items

Source kernel commit: baf4bcacb715cebd412b2f4bb69989ef24496523

Create refcount update intent/done log items to record redo
information in the log.  Because we need to roll transactions between
updating the bmbt mapping and updating the reverse mapping, we also
have to track the status of the metadata updates that will be recorded
in the post-roll transactions, just in case we crash before committing
the final transaction.  This mechanism enables log recovery to finish
what was already started.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years ago  xfs: add refcount btree operations
Darrick J. Wong [Tue, 25 Oct 2016 01:20:26 +0000 (12:20 +1100)] 
xfs: add refcount btree operations

Source kernel commit: bdf28630b72154e5766cbad5874576b6f22e7237

Implement the generic btree operations required to manipulate refcount
btree blocks.  The implementation is similar to the bmapbt, though it
will only allocate and free blocks from the AG.

Since the refcount root and level fields are separate from the
existing roots and levels array, they need a separate logging flag.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[hch: fix logging of AGF refcount btree fields]
Signed-off-by: Christoph Hellwig <hch@lst.de>
9 years agoxfs: account for the refcount btree in the alloc/free log reservation
Darrick J. Wong [Tue, 25 Oct 2016 01:01:06 +0000 (12:01 +1100)] 
xfs: account for the refcount btree in the alloc/free log reservation

Source kernel commit: f310bd2ecd37b17bf0042c9d1595329057970eb6

Every time we allocate or free a data extent, we might need to split
the refcount btree.  Reserve some blocks in the transaction to handle
this possibility.  Even though the deferred refcount code can roll a
transaction to avoid overloading the transaction, we can still exceed
the reservation.

Certain pathological workloads (1k blocks, no cowextsize hint, random
directio writes), cause a perfect storm wherein a refcount adjustment
of a large range of blocks causes full tree splits in two separate
extents in two separate refcount tree blocks; allocating new refcount
tree blocks causes rmap btree splits; and all the allocation activity
causes the freespace btrees to split, blowing the reservation.

(Reproduced by generic/167 over NFS atop XFS)

Signed-off-by: Christoph Hellwig <hch@lst.de>
[darrick.wong@oracle.com: add commit message]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
9 years agoxfs: define the on-disk refcount btree format
Darrick J. Wong [Tue, 25 Oct 2016 01:00:53 +0000 (12:00 +1100)] 
xfs: define the on-disk refcount btree format

Source kernel commit: 1946b91cee4fc8ae25450673e4d4f35e9b462e9e

Start constructing the refcount btree implementation by establishing
the on-disk format and everything needed to read, write, and
manipulate the refcount btree blocks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years agoxfs: refcount btree add more reserved blocks
Darrick J. Wong [Tue, 25 Oct 2016 00:49:22 +0000 (11:49 +1100)] 
xfs: refcount btree add more reserved blocks

Source kernel commit: af30dfa14411e9df0e69c6e46e8c6c467b88229d

Since XFS reserves a small amount of space in each AG as the minimum
free space needed for an operation, save some more space in case we
touch the refcount btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years agoxfs: introduce refcount btree definitions
Darrick J. Wong [Tue, 25 Oct 2016 00:48:45 +0000 (11:48 +1100)] 
xfs: introduce refcount btree definitions

Source kernel commit: 46eeb521b95247170d2db773bb4cc8fb3de1d85c

Add new per-AG refcount btree definitions to the per-AG structures.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
9 years agoxfs: remote attribute blocks aren't really userdata
Dave Chinner [Tue, 25 Oct 2016 00:46:35 +0000 (11:46 +1100)] 
xfs: remote attribute blocks aren't really userdata

Source kernel commit: 292378edcb408c652e841fdc867fc14f8b4995fa

When adding a new remote attribute, we write the attribute to the
new extent before the allocation transaction is committed. This
means we cannot reuse busy extents as that violates crash
consistency semantics. Hence we currently treat remote attribute
extent allocation like userdata because it has the same overwrite
ordering constraints as userdata.

Unfortunately, this also allows the allocator to incorrectly apply
extent size hints to the remote attribute extent allocation. This
results in interesting failures, such as transaction block
reservation overruns and in-memory inode attribute fork corruption.

To fix this, we need to separate the busy extent reuse configuration
from the userdata configuration. This changes the definition of
XFS_BMAPI_METADATA slightly - it now means that allocation is
metadata and reuse of busy extents is acceptable due to the metadata
ordering semantics of the journal. If this flag is not set, it
means the allocation has unordered data writeback, and hence
busy extent reuse is not allowed. It no longer implies the
allocation is for user data, just that the data write will not be
strictly ordered. This matches the semantics for both user data
and remote attribute block allocation.

As such, this patch changes the "userdata" field to a "datatype"
field, and adds a "no busy reuse" flag to the field.
When we detect an unordered data extent allocation, we immediately set
the no reuse flag. We then set the "user data" flags based on the
inode fork we are allocating the extent to. Hence we only set
userdata flags on data fork allocations now and consider attribute
fork remote extents to be an unordered metadata extent.

The result is that remote attribute extents now have the expected
allocation semantics, and the data fork allocation behaviour is
completely unchanged.

It should be noted that there may be other ways to fix this (e.g.
use ordered metadata buffers for the remote attribute extent data
write) but they are more invasive and difficult to validate both
from a design and implementation POV. Hence this patch takes the
simple, obvious route to fixing the problem...

Reported-and-tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agoxfs: rewrite and optimize the delalloc write path
Christoph Hellwig [Tue, 25 Oct 2016 00:46:35 +0000 (11:46 +1100)] 
xfs: rewrite and optimize the delalloc write path

Source kernel commit: 51446f5ba44874db4d2a93a6eb61b133e5ec1b3e

Currently xfs_iomap_write_delay does up to lookups in the inode
extent tree, which is rather costly especially with the new iomap
based write path and small write sizes.

But it turns out that the low-level xfs_bmap_search_extents gives us
all the information we need in the regular delalloc buffered write
path:

- it will return us an extent covering the block we are looking up
if it exists.  In that case we can simply return that extent to
the caller and are done
- it will tell us if we are beyond the last current allocated
block with an eof return parameter.  In that case we can create a
delalloc reservation and use the also returned information about
the last extent in the file as the hint to size our delalloc
reservation.
- it can tell us that we are writing into a hole, but that there is
an extent beyond this hole.  In this case we can create a
delalloc reservation that covers the requested size (possibly
capped to the next existing allocation).

All that can be done in one single routine instead of bouncing up
and down a few layers.  This reduced the CPU overhead of the block
mapping routines and also simplified the code a lot.
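
The three outcomes listed above can be sketched as a single pass over a
sorted extent list.  This is only an illustration: struct extent,
enum lookup_result and lookup_block() are hypothetical names, and the
flat array stands in for the real in-core extent tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent record; the list is sorted and non-overlapping. */
struct extent {
	uint64_t start;
	uint64_t len;
};

enum lookup_result { FOUND_EXTENT, BEYOND_EOF, IN_HOLE };

/*
 * Classify `block` in one pass.  *max_len is set to the number of blocks
 * a delalloc reservation could cover: the requested size, capped to the
 * start of the next extent when we are writing into a hole.
 */
static enum lookup_result lookup_block(const struct extent *ext, size_t n,
				       uint64_t block, uint64_t want,
				       uint64_t *max_len)
{
	assert(max_len != NULL);
	*max_len = want;
	for (size_t i = 0; i < n; i++) {
		if (block < ext[i].start) {
			/* Hole with an extent beyond it: cap the reservation. */
			uint64_t gap = ext[i].start - block;

			if (gap < want)
				*max_len = gap;
			return IN_HOLE;
		}
		if (block < ext[i].start + ext[i].len)
			return FOUND_EXTENT;
	}
	/* Past the last allocated block. */
	return BEYOND_EOF;
}
```

The point of the sketch is that one lookup yields enough information to
choose between the three paths without walking the list again.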

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agoxfs: set up per-AG free space reservations
Darrick J. Wong [Tue, 25 Oct 2016 00:46:32 +0000 (11:46 +1100)] 
xfs: set up per-AG free space reservations

Source kernel commit: 3fd129b63fd062a0d8f5d55994a6e98896c20fa7

One unfortunate quirk of the reference count and reverse mapping
btrees -- they can expand in size when blocks are written to *other*
allocation groups if, say, one large extent becomes a lot of tiny
extents.  Since we don't want to start throwing errors in the middle
of CoWing, we need to reserve some blocks to handle future expansion.
The transaction block reservation counters aren't sufficient here
because we have to have a reserve of blocks in every AG, not just
somewhere in the filesystem.

Therefore, create two per-AG block reservation pools.  One feeds the
AGFL so that rmapbt expansion always succeeds, and the other feeds all
other metadata so that refcountbt expansion never fails.

Use the count of how many reserved blocks we need to have on hand to
create a virtual reservation in the AG.  Through selective clamping of
the maximum length of allocation requests and of the length of the
longest free extent, we can make it look like there's less free space
in the AG unless the reservation owner is asking for blocks.

In other words, play some accounting tricks in-core to make sure that
we always have blocks available.  On the plus side, there's nothing to
clean up if we crash, which is in contrast to the strategy that the rough
draft used (actually removing extents from the freespace btrees).
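
A minimal sketch of the clamping idea, assuming a much simpler in-core
view than the real per-AG structures.  struct ag_view, visible_longest()
and clamp_request() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-core view of one allocation group. */
struct ag_view {
	uint32_t free_blocks;   /* blocks actually free in the AG */
	uint32_t longest_free;  /* length of the longest free extent */
	uint32_t reserved;      /* blocks still held back for the reservation */
};

/* Longest free extent a caller is allowed to see. */
static uint32_t visible_longest(const struct ag_view *ag, bool is_owner)
{
	assert(ag->longest_free <= ag->free_blocks);
	if (is_owner)
		return ag->longest_free;

	/* Everyone else only sees the space outside the reservation. */
	uint32_t avail = ag->free_blocks > ag->reserved ?
			 ag->free_blocks - ag->reserved : 0;
	return ag->longest_free < avail ? ag->longest_free : avail;
}

/* Clamp the maximum length of an allocation request. */
static uint32_t clamp_request(const struct ag_view *ag, uint32_t want,
			      bool is_owner)
{
	uint32_t limit = visible_longest(ag, is_owner);

	return want < limit ? want : limit;
}
```

Nothing is removed from the free space btrees in this model, which is
why there is no on-disk state to clean up after a crash.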

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agoxfs: defer should allow ->finish_item to request a new transaction
Darrick J. Wong [Tue, 25 Oct 2016 00:26:53 +0000 (11:26 +1100)] 
xfs: defer should allow ->finish_item to request a new transaction

Source kernel commit: 385d655861d221bb43ae69a9cfa9adbefe31ad00

When xfs_defer_finish calls ->finish_item, it's possible that
(refcount) won't be able to finish all the work in a single
transaction.  When this happens, the ->finish_item handler should
shorten the log done item's list count, update the work item to
reflect where work should continue, and return -EAGAIN so that
defer_finish knows to retain the pending item on the pending list,
roll the transaction, and restart processing where we left off.

Plumb in the code and document how this mechanism is supposed to work.
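
The -EAGAIN protocol described above can be modelled with a toy work
item.  This is a sketch only: struct work_item, the per-transaction
budget and both functions are hypothetical and do not mirror the real
xfs_defer code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical work item: `remaining` blocks still need adjusting. */
struct work_item {
	unsigned int remaining;
};

/* Toy ->finish_item: handles at most `budget` blocks per transaction. */
static int finish_item(struct work_item *w, unsigned int budget)
{
	assert(budget > 0);

	unsigned int done = w->remaining < budget ? w->remaining : budget;

	/* Update the work item to reflect where work should continue. */
	w->remaining -= done;
	return w->remaining ? -EAGAIN : 0;
}

/* Toy driver: returns how many times the transaction had to be rolled. */
static unsigned int defer_finish(struct work_item *w, unsigned int budget)
{
	unsigned int rolls = 0;

	/* On -EAGAIN keep the item pending, roll, and call it again. */
	while (finish_item(w, budget) == -EAGAIN)
		rolls++;
	return rolls;
}
```

With 10 blocks of work and a budget of 4 per transaction, the item is
finished on the third call, after two rolls.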

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
9 years agoxfs: count the blocks in a btree
Darrick J. Wong [Tue, 25 Oct 2016 00:26:53 +0000 (11:26 +1100)] 
xfs: count the blocks in a btree

Source kernel commit: c611cc0360cd924448c23ccd70ce8be703fcb4a6

Provide a helper method to count the number of blocks in a short form
btree.  The refcount and rmap btrees need to know the number of blocks
already in use to set up their per-AG block reservations during mount.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agoxfs: create a standard btree size calculator code
Darrick J. Wong [Tue, 25 Oct 2016 00:26:53 +0000 (11:26 +1100)] 
xfs: create a standard btree size calculator code

Source kernel commit: 4ed3f68792f6a9c21a290ae777565e7562a09653

Create a helper to generate AG btree height calculator functions.
This will be used (much) later when we get to the refcount btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agoxfs: remove xfs_btree_bigkey
Darrick J. Wong [Tue, 25 Oct 2016 00:26:51 +0000 (11:26 +1100)] 
xfs: remove xfs_btree_bigkey

Source kernel commit: a1d46cffaf40e04acb0ecab14980ece3ef1ab933

Remove the xfs_btree_bigkey mess and simply make xfs_btree_key big enough
to hold both keys in-core.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agoxfs: convert RUI log formats to use variable length arrays
Darrick J. Wong [Tue, 25 Oct 2016 00:26:50 +0000 (11:26 +1100)] 
xfs: convert RUI log formats to use variable length arrays

Source kernel commit: cd00158ce34d6e2c42d8892e8499779b8ac1d2bf

Use variable length array declarations for RUI log items,
and replace the open coded sizeof formulae with a single function.
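
As an illustration of the pattern, assuming simplified stand-in
structures: the field names below are hypothetical and are not the
on-disk RUI layout; only the "header plus nr array elements" shape is
the point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-extent record. */
struct map_extent {
	uint64_t owner;
	uint64_t startblock;
	uint64_t startoff;
	uint32_t len;
	uint32_t flags;
};

/* Hypothetical log format header ending in a variable length array. */
struct rui_log_format {
	uint16_t type;
	uint16_t size;
	uint32_t nextents;
	uint64_t id;
	struct map_extent extents[];  /* C99 flexible array member */
};

/* Single helper replacing open-coded "sizeof(hdr) + (n - 1) * ..." math. */
static size_t rui_log_format_sizeof(unsigned int nr)
{
	/* Guard against size_t overflow before multiplying. */
	assert(nr <= (SIZE_MAX - sizeof(struct rui_log_format)) /
		     sizeof(struct map_extent));
	return sizeof(struct rui_log_format) +
	       nr * sizeof(struct map_extent);
}
```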

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agoxfs: track log done items directly in the deferred pending work item
Darrick J. Wong [Tue, 25 Oct 2016 00:26:47 +0000 (11:26 +1100)] 
xfs: track log done items directly in the deferred pending work item

Source kernel commit: ea78d80866ce375defb2fdd1c8a3aafec95e0f85

Christoph reports slab corruption when a deferred refcount update
aborts during _defer_finish().  The cause of this was broken log item
state tracking in xfs_defer_pending -- upon an abort,
_defer_trans_abort() will call abort_intent on all intent items,
including the ones that have already had a done item attached.

This is incorrect because each intent item has 2 refcounts: the first
is released when the intent item is committed to the log; and the
second is released when the _done_ item is committed to the log, or
by the intent creator if there is no done item.  In other words, once
we log the done item, responsibility for releasing the intent item's
second refcount is transferred to the done item and /must not/ be
performed by anything else.

The dfp_committed flag should have been tracking whether or not we had
a done item so that _defer_trans_abort could decide if it needs to
abort the intent item, but due to a thinko this was not the case.  Rip
it out and track the done item directly so that we do the right thing
w.r.t. intent item freeing.
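
The ownership rule can be sketched with a toy model.  struct
intent_item, struct defer_pending and the helpers below are
hypothetical; the boolean stands in for tracking the done item directly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical intent item holding the two references described above. */
struct intent_item {
	int refs;
};

/* Hypothetical pending work item that records whether a done item exists. */
struct defer_pending {
	struct intent_item *intent;
	bool has_done;  /* true once a done item has been logged */
};

static void intent_release(struct intent_item *it)
{
	assert(it->refs > 0);  /* dropping below zero means a double release */
	it->refs--;
}

/* Abort path: only drop the second reference if no done item owns it. */
static void defer_abort(struct defer_pending *dfp)
{
	if (!dfp->has_done)
		intent_release(dfp->intent);
}

/* The done item releases the reference it inherited from the creator. */
static void done_item_release(struct defer_pending *dfp)
{
	assert(dfp->has_done);
	intent_release(dfp->intent);
}
```

In this model, aborting an intent that already has a done item leaves
the count untouched, so the done item's later release cannot underflow.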

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reported-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agoxfs: fix superblock inprogress check
Dave Chinner [Tue, 25 Oct 2016 00:26:46 +0000 (11:26 +1100)] 
xfs: fix superblock inprogress check

Source kernel commit: f3d7ebdeb2c297bd26272384e955033493ca291c

From inspection, the superblock sb_inprogress check is done in the
verifier and triggered only for the primary superblock via a
"bp->b_bn == XFS_SB_DADDR" check.

Unfortunately, the primary superblock is an uncached buffer, and
hence it is configured by xfs_buf_read_uncached() with:

bp->b_bn = XFS_BUF_DADDR_NULL;  /* always null for uncached buffers */

And so this check never triggers. Fix it.

cc: <stable@vger.kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agolibxfs_apply: filterdiff can't handle /dev/null properly
Dave Chinner [Mon, 24 Oct 2016 22:52:21 +0000 (09:52 +1100)] 
libxfs_apply: filterdiff can't handle /dev/null properly

Because we are mangling the diff source/destination locations, we
have to add prefixes to them to get them to apply cleanly as -p1
patches. This is all fine until we create or remove a file and
the src/dest is /dev/null. Applying a prefix here causes
the diff to be malformed and it won't apply.

Add another hack to work around this limitation of filterdiff when
reformatting the diff into readable format.

Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agolibxfs_apply: filter commits from libxfs only
Dave Chinner [Mon, 24 Oct 2016 22:04:33 +0000 (09:04 +1100)] 
libxfs_apply: filter commits from libxfs only

When pulling commits from the kernel, it's easy to specify a commit
range such as "v4.8..for-next" to indicate we want to pull all
commits for libxfs since the 4.8 kernel release. Unfortunately,
this pulls commits from all over the kernel tree, not just
fs/xfs/libxfs.

Filter the commit list retrieval to limit the commits to those that
touch fs/xfs/libxfs so that we only attempt to apply the relatively
small number of relevant commits.

Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agoxfsprogs: Release 4.8.0 v4.8.0
Dave Chinner [Mon, 17 Oct 2016 03:17:48 +0000 (14:17 +1100)] 
xfsprogs: Release 4.8.0

Update all the necessary files for a 4.8.0 release.

Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agoxfsprogs: Release 4.8.0-rc3 v4.8.0-rc3
Dave Chinner [Mon, 3 Oct 2016 03:25:45 +0000 (14:25 +1100)] 
xfsprogs: Release 4.8.0-rc3

Update all the necessary files for a 4.8.0-rc3 release.

Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agoxfs_io: fix inode command with "-n" for bogus inode
Eric Sandeen [Sun, 2 Oct 2016 23:56:00 +0000 (10:56 +1100)] 
xfs_io: fix inode command with "-n" for bogus inode

If we ask for the next allocated inode after a number for which
no other inode exists, the bulkstat returns success, but with
count == 0.  If we ignore this fact, we print a garbage result
from bstat.bs_ino in this case, so fix it.
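
A sketch of the check, using a hypothetical result structure rather
than the real bulkstat interface:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result of a "next inode" lookup. */
struct lookup_result {
	int      error;  /* 0 on success */
	int      count;  /* number of records returned */
	uint64_t ino;    /* only meaningful when count > 0 */
};

/* Returns the inode number to print, or 0 for "no such inode". */
static uint64_t next_inode_to_print(const struct lookup_result *r)
{
	assert(r->count >= 0);
	/* Success with count == 0 means nothing was found; ignore r->ino. */
	if (r->error || r->count == 0)
		return 0;
	return r->ino;
}
```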

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agoxfs_io: refactor inode command
Eric Sandeen [Sun, 2 Oct 2016 23:50:21 +0000 (10:50 +1100)] 
xfs_io: refactor inode command

The inode_f function is a bit convoluted; the default
find-last-inode case appears at the end, there are several return
points, we print the same basic information using 2 different
variables in 2 different locations depending on the mode we're in,
the "inode not found" was a printf & exit in the middle of the
function, etc.

Move the default case up to the top so it's more obvious, not
buried.

Make a new var, result_ino, which holds whatever we want to print
regardless of the mode, and then handle all the output at the end.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agoxfs_io: move inode command arg handling to top
Eric Sandeen [Sun, 2 Oct 2016 23:47:47 +0000 (10:47 +1100)] 
xfs_io: move inode command arg handling to top

As it stands, collecting the inode number and testing args validity
is all tangled up; for example the test for "-n" having no inode is
buried in an else after a large code block which handles something
else.

Get inode number argument collection and testing out of the way
before doing anything else.

Clean up the error message if a non-numeric inode arg is given.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agoxfs_io: factor out new get_last_inode() helper
Eric Sandeen [Sun, 2 Oct 2016 23:46:03 +0000 (10:46 +1100)] 
xfs_io: factor out new get_last_inode() helper

The inode command by default finds the last allocated inode in the
filesystem via bulkstat, and this specific function is open-coded
after other cases are handled, leading to a fairly long inode_f
function and confusing code flow.

Clean it up by factoring it into a new function; more refactoring
will follow.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agoxfs_io: fix inode command help and argsmax
Eric Sandeen [Sun, 2 Oct 2016 23:44:02 +0000 (10:44 +1100)] 
xfs_io: fix inode command help and argsmax

The short help implied that -n and -v were exclusive, and the longer
help wasn't particularly clear.

Further, argsmax is wrong; "-n -v num" is 3, not 2.

 # xfs_io -c "inode -n -v 123" /mnt/test2
 bad argument count 3 to inode, expected between 0 and 2 arguments
 # xfs_io -c "inode -vn 123" /mnt/test2
 128:32

Fix up all of those issues.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agoxfs_repair: add freesp btree block overflow to the free space
Darrick J. Wong [Sun, 2 Oct 2016 23:42:22 +0000 (10:42 +1100)] 
xfs_repair: add freesp btree block overflow to the free space

If we overestimate the number of blocks needed to rebuild the free
space btrees to the point that we have more blocks than fit in the
AGFL, save those blocks and reinsert them into the free space at
the end of phase 5.  Previously, the overflow blocks would simply
be lost.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agoxfs_repair: fix bogosity when rmapping new AGFL blocks
Darrick J. Wong [Sun, 2 Oct 2016 23:40:32 +0000 (10:40 +1100)] 
xfs_repair: fix bogosity when rmapping new AGFL blocks

When repair rebuilds the AGFL, the blocks can come either from the
in-core free space tree or they can come as a result of overestimating
the number of blocks needed to rebuild the on-disk free space btree.
The code in here was trying to only create rmap records for AGFL blocks
that did /not/ come from free space btree rebuild overestimation, but
was totally broken.  The initial and check conditions were totally wrong
if there was any overflow.  Remove a stray debug printf too.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agolibxfs: remove unused libxfs_iget arg
Eric Sandeen [Sun, 2 Oct 2016 23:36:40 +0000 (10:36 +1100)] 
libxfs: remove unused libxfs_iget arg

libxfs_iget() is always called with bno == 0.  Which is probably a
good thing, because it then passes bno to xfs_iread as iget_flags!

So remove the libxfs_iget arg, and explicitly pass 0 to xfs_iread
for flags.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agolibxcmd: fix counting of xfs entries in fs_table_insert
Eryu Guan [Sun, 2 Oct 2016 23:36:14 +0000 (10:36 +1100)] 
libxcmd: fix counting of xfs entries in fs_table_insert

Commit bb80e3d6cd04 ("libxcmd: populate fs table with xfs entries
first, foreign entries last") adds a new counter "xfs_fs_count" and
increases the counter when inserting an XFS entry.

But it fails to increment the counter when fs_count is zero (inserting
the first path) and the entry has no FS_FOREIGN bit set, i.e. the first
XFS entry doesn't increase xfs_fs_count.

This results in args_command() mess and infinite loop in xfs/244
when testing v4 XFS (xfs/244 notrun on v5 XFS, but this bug still
reproduces on v5 XFS). e.g.

  mkfs -t xfs -f /dev/sda5
  mount -o pquota /dev/sda5 /mnt/xfs
  mkdir /mnt/xfs/project
  touch /mnt/xfs/project/testfile
  xfs_quota -x -c "project -s -p /mnt/xfs/project/testfile 1" /dev/sda5

Fix it by increasing xfs_fs_count when flags has no FS_FOREIGN bit.
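
A minimal sketch of the counting rule, assuming a stripped-down table
that only tracks the two counters.  The FS_FOREIGN value and
fs_table_count_insert() are illustrative, not copied from the source.

```c
#include <assert.h>

#define FS_FOREIGN (1u << 2)  /* illustrative flag value */

/* Hypothetical table state: just the two counters. */
struct fs_table {
	unsigned int fs_count;
	unsigned int xfs_fs_count;
};

/* Count an inserted entry; the first entry gets no special treatment. */
static void fs_table_count_insert(struct fs_table *t, unsigned int flags)
{
	assert(t->xfs_fs_count <= t->fs_count);
	if (!(flags & FS_FOREIGN))
		t->xfs_fs_count++;
	t->fs_count++;
}
```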

Signed-off-by: Eryu Guan <eguan@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agoxfsprogs: Release 4.8.0-rc2 v4.8.0-rc2
Dave Chinner [Fri, 23 Sep 2016 00:22:15 +0000 (10:22 +1000)] 
xfsprogs: Release 4.8.0-rc2

Update all the necessary files for a 4.8.0-rc2 release.

Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agoxfs_copy: Fix meta UUID handling on multiple copies
Eric Sandeen [Thu, 22 Sep 2016 23:16:52 +0000 (09:16 +1000)] 
xfs_copy: Fix meta UUID handling on multiple copies

Zorro reported that when making multiple copies of a V5
filesystem with xfs_copy while generating new UUIDs, all
but the first copy were corrupt.

Upon inspection, the corruption was related to incorrect UUIDs;
the original UUID, as stamped into every metadata structure,
was not preserved in the sb_meta_uuid field of the superblock
on any but the first copy.

This happened because sb_update_uuid was using the UUID present in
the ag_hdr structure as the unchanging meta-uuid which is to match
existing structures, but it also /updates/ that UUID with the
new identifying UUID present in tcarg.  So the newly-generated
UUIDs moved transitively from tcarg->uuid to ag_hdr->xfs_sb->sb_uuid
to ag_hdr->xfs_sb->sb_meta_uuid each time the function got called.

Fix this by looking instead to the unchanging, original UUID
present in the xfs_sb_t we are given, which reflects the original
filesystem's metadata UUID, and copy /that/ UUID into each target
filesystem's meta_uuid field.
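
The fix can be illustrated with integers standing in for UUIDs.  This
is a toy model: struct sb and this sb_update_uuid() signature are
hypothetical simplifications of the real xfs_copy code.

```c
#include <assert.h>

/* Hypothetical superblock with integer stand-ins for the two UUIDs. */
struct sb {
	unsigned int uuid;       /* user-visible identifying UUID */
	unsigned int meta_uuid;  /* UUID stamped into metadata blocks */
};

/*
 * Stamp a new identifying UUID into a reusable header while taking the
 * metadata UUID from the unchanging original superblock, never from the
 * header that a previous call has already modified.
 */
static void sb_update_uuid(const struct sb *orig, struct sb *ag_hdr_sb,
			   unsigned int new_uuid)
{
	assert(new_uuid != 0);
	ag_hdr_sb->meta_uuid = orig->meta_uuid;
	ag_hdr_sb->uuid = new_uuid;
}
```

Calling this repeatedly on the same header, once per target, leaves
meta_uuid equal to the original value on every copy.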

Most of this patch is changing comments and re-ordering tests
to match; the functional change is to simply use the *sb rather
than the *ag_hdr to identify the proper metadata UUID.

Reported-and-tested-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agoxfs_repair: fix segfault from uninitialized tp in mv_orphanage
Eric Sandeen [Thu, 22 Sep 2016 23:16:47 +0000 (09:16 +1000)] 
xfs_repair: fix segfault from uninitialized tp in mv_orphanage

After 9074815 ("xfs: better xfs_trans_alloc interface"), mv_orphanage
was passing an uninitialized *tp into libxfs_dir_lookup, because
the trans_alloc() call which was present prior to the call got
removed in that commit.

This ultimately led to testing an uninit tp var:

Conditional jump or move depends on uninitialised value(s)
   at 0x434D01: libxfs_trans_read_buf_map (trans.c:554)
   by 0x45152E: libxfs_da_read_buf (xfs_da_btree.c:2610)
   by 0x456ACB: xfs_dir3_block_read (xfs_dir2_block.c:136)
   by 0x4570A8: xfs_dir2_block_lookup_int (xfs_dir2_block.c:675)
   by 0x457DB7: xfs_dir2_block_lookup (xfs_dir2_block.c:623)
   by 0x455F54: libxfs_dir_lookup (xfs_dir2.c:399)
   by 0x421C46: mv_orphanage (phase6.c:1095)
   by 0x4222C2: check_for_orphaned_inodes (phase6.c:3108)
   by 0x423ABD: phase6 (phase6.c:3287)
   by 0x42E4B2: main (xfs_repair.c:933)

and ended with a segfault as we tried to use that tp when
searching for the buffer in xfs_trans_buf_item_match():

        list_for_each_entry(lidp, &tp->t_items, lid_trans) {

I think simply passing in NULL for this tp is sufficient to fix
this; we'll just go read the buffer from disk in
libxfs_trans_read_buf_map rather than trying to find it in an
existing transaction.

Reported-by: Consigliere <admin@russenmafia.at>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agolibxfs: factor mount checks into helper function
Eric Sandeen [Mon, 19 Sep 2016 22:49:41 +0000 (08:49 +1000)] 
libxfs: factor mount checks into helper function

platform_check_ismounted switched to a getmntent() loop after
ustat disappeared on some new platforms.

We also use a similar mechanism for determining the
ro/rw-mounted status of a device in platform_check_iswritable.

Because the loops are essentially the same, factor them into a
single helper which accepts a VERBOSE flag to print info if the
device is found in the checked-for state, and a WRITABLE flag
which only checks specifically for a mounted and /writable/ device.
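
A sketch of such a combined helper, assuming an in-memory table in
place of the getmntent() loop.  The flag names, struct mount_entry and
check_mount() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define CHECK_VERBOSE  (1u << 0)  /* print a message on a match */
#define CHECK_WRITABLE (1u << 1)  /* only match read-write mounts */

/* Hypothetical stand-in for one mount table entry. */
struct mount_entry {
	const char *device;
	bool        read_only;
};

/* One loop serves both the "is mounted" and "is writable" checks. */
static bool check_mount(const struct mount_entry *tab, size_t n,
			const char *device, unsigned int flags)
{
	assert(device != NULL);
	for (size_t i = 0; i < n; i++) {
		if (strcmp(tab[i].device, device) != 0)
			continue;
		if ((flags & CHECK_WRITABLE) && tab[i].read_only)
			continue;
		if (flags & CHECK_VERBOSE)
			fprintf(stderr, "%s is mounted%s\n", device,
				(flags & CHECK_WRITABLE) ? " writable" : "");
		return true;
	}
	return false;
}
```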

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agomkfs.xfs: clarify ftype defaults in manpage
Eric Sandeen [Mon, 19 Sep 2016 22:48:54 +0000 (08:48 +1000)] 
mkfs.xfs: clarify ftype defaults in manpage

When CRCs were made default, a few leftovers related to its
prior non-default status remained in the manpage, in the ftype
section.  Clean those up, stating the correct default for this
feature.

Reported-by: Chris Murphy <chris@cmurf.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agoxfs_io: allow chattr & chproj on foreign filesystems
Eric Sandeen [Mon, 19 Sep 2016 06:07:55 +0000 (16:07 +1000)] 
xfs_io: allow chattr & chproj on foreign filesystems

Now that FS_IOC_FSSETXATTR is a generic vfs call, these
functions can be used on non-xfs filesystems, and this is
needed for generic project quota testing.

(not all flags are valid on all filesystems.)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
9 years agoxfs_quota: fix free command for foreign fs
Eric Sandeen [Mon, 19 Sep 2016 06:07:50 +0000 (16:07 +1000)] 
xfs_quota: fix free command for foreign fs

The "free" command is really just a fancy df that knows about
log space and realtime blocks for an xfs filesystem.

We can simply use statfs to get more or less the same thing
on a non-xfs filesystem, so, ah, do that I guess, and re-enable
it.

# quota/xfs_quota -f -x -c path -c free /mnt/test
          Filesystem          Pathname
[000] (F) /mnt/test           /dev/sdb1 (uquota)

Filesystem           1K-blocks       Used  Available  Use% Pathname
/dev/sdb1             20511356      45000   20466356    0% /mnt/test

Fix the short help text for -N while we're at it.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agoxfs_quota: un-flag non-foreign-capable commands
Eric Sandeen [Mon, 19 Sep 2016 06:06:57 +0000 (16:06 +1000)] 
xfs_quota: un-flag non-foreign-capable commands

The off command calls XFS_QUOTAOFF / Q_XQUOTAOFF, which calls
quota_disable in the kernel, which returns ENOSYS if the
->quota_enable quota op doesn't exist - and it does not exist
on any non-xfs filesystems.

We could get clever if we wanted it, and send Q_QUOTAOFF
instead for foreign filesystems, but for now it's broken
so just remove the flag.

The free command relies on XFS_IOC_FSGEOMETRY_V1, so unflag it
as well.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agoxfs_quota: Enable 3 more foreign commands
Eric Sandeen [Mon, 19 Sep 2016 06:06:36 +0000 (16:06 +1000)] 
xfs_quota: Enable 3 more foreign commands

Enable restore, limit, and timer.

Unsupported commands remain, for lack of kernel support, generally:
warn, quot, enable, disable, and remove.

xfs_quota> report
User quota on /mnt/test2/git/xfsprogs/mnt (/dev/loop0)
                               Blocks
User ID          Used       Soft       Hard    Warn/Grace
---------- --------------------------------------------------
root               13          0          0     00 [--------]

xfs_quota> restore -f quotadump
xfs_quota> report
User quota on /mnt/test2/git/xfsprogs/mnt (/dev/loop0)
                               Blocks
User ID          Used       Soft       Hard    Warn/Grace
---------- --------------------------------------------------
root               13          0          0     00 [--------]
testuser            0      16384      32768     00 [--------]
fsgqa               0     102400     112640     00 [--------]

xfs_quota> limit bsoft=200m fsgqa

xfs_quota> report
User quota on /mnt/test2/git/xfsprogs/mnt (/dev/loop0)
                               Blocks
User ID          Used       Soft       Hard    Warn/Grace
---------- --------------------------------------------------
root               13          0          0     00 [--------]
testuser            0      16384      32768     00 [--------]
fsgqa               0     204800     112640     00 [--------]

xfs_quota> state -u
User quota state on /mnt/test2/git/xfsprogs/mnt (/dev/loop0)
  Accounting: ON
  Enforcement: ON
  Inode: #12 (16 blocks, 1 extents)
Blocks grace time: [7 days]
Inodes grace time: [7 days]

xfs_quota> timer -b 3days
xfs_quota> state -u
User quota state on /mnt/test2/git/xfsprogs/mnt (/dev/loop0)
  Accounting: ON
  Enforcement: ON
  Inode: #12 (16 blocks, 1 extents)
Blocks grace time: [3 days]
Inodes grace time: [7 days]
Realtime Blocks grace time: [--------]

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agoxfs_quota: add case for foreign fs, disabled regardless of foreign_allowed
Bill O'Donnell [Mon, 19 Sep 2016 06:05:45 +0000 (16:05 +1000)] 
xfs_quota: add case for foreign fs, disabled regardless of foreign_allowed

Some commands are disallowed for foreign filesystems,
regardless of whether or not the -f flag is thrown.
Add a case for this condition and improve commenting
and output messaging accordingly in init_check_command.

Signed-off-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years agoxfs_quota: print and path output formatting: maintain reverse compatibility
Bill O'Donnell [Mon, 19 Sep 2016 06:02:41 +0000 (16:02 +1000)] 
xfs_quota: print and path output formatting: maintain reverse compatibility

This patch adjusts the formatting of the xfs_quota print and
path outputs in order to maintain reverse compatibility:
when the -f flag isn't used, the output must stay the same as
in the previous version.

Signed-off-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years ago libxcmd: populate fs table with xfs entries first, foreign entries last
Bill O'Donnell [Mon, 19 Sep 2016 06:02:22 +0000 (16:02 +1000)] 
libxcmd: populate fs table with xfs entries first, foreign entries last

Commits b20b6c2 and 29647c8 modified xfs_quota for use on
non-XFS filesystems. Modifications to fs_initialise_mounts
(paths.c) resulted in an xfstest fail (xfs/261), due to foreign
fs paths being picked up first from the fs table. The xfs_quota
print command then complained about not being able to print the
foreign paths, instead of previous behavior (quiet).

This patch restores correct behavior, sorting the table so that
xfs entries are first, followed by foreign fs entries. Within each
group, the patch keeps the entries in the same order as their mtab
entries. Then, in functions which print all paths, we can simply
break at the first foreign path if the -f switch is not specified.
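
The ordering described above amounts to a stable partition of the table. A minimal sketch, assuming a simplified table entry: struct fs_entry, sort_xfs_first() and print_paths() are hypothetical names made up for illustration and are not the libxcmd code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified table entry; not the real libxcmd structure. */
struct fs_entry {
	const char	*dir;		/* mount point */
	int		foreign;	/* nonzero for a non-XFS filesystem */
};

/*
 * Stable partition: XFS entries first, foreign entries last, each group
 * keeping its original (mtab) order.  Returns 0 on success, -1 on ENOMEM.
 */
static int sort_xfs_first(struct fs_entry *tab, size_t n)
{
	struct fs_entry	*tmp;
	size_t		i, out = 0;

	assert(tab != NULL || n == 0);
	if (n == 0)
		return 0;
	tmp = malloc(n * sizeof(*tmp));
	if (!tmp)
		return -1;
	for (i = 0; i < n; i++)		/* pass 1: XFS entries */
		if (!tab[i].foreign)
			tmp[out++] = tab[i];
	for (i = 0; i < n; i++)		/* pass 2: foreign entries */
		if (tab[i].foreign)
			tmp[out++] = tab[i];
	memcpy(tab, tmp, n * sizeof(*tmp));
	free(tmp);
	return 0;
}

/*
 * Print all paths; once the table is partitioned, the first foreign
 * entry marks the end of the XFS entries, so we can stop there unless
 * foreign filesystems are allowed.  Returns the number of paths printed.
 */
static size_t print_paths(const struct fs_entry *tab, size_t n,
			  int foreign_allowed)
{
	size_t	i;

	for (i = 0; i < n; i++) {
		if (tab[i].foreign && !foreign_allowed)
			break;
		printf("%s\n", tab[i].dir);
	}
	return i;
}
```

Two passes over a scratch copy keep the relative order inside each group, which a plain qsort() would not guarantee.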

Signed-off-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years ago xfs_repair: exit with status 2 if log dirtiness is unknown
Eric Sandeen [Mon, 19 Sep 2016 06:01:14 +0000 (16:01 +1000)] 
xfs_repair: exit with status 2 if log dirtiness is unknown

This new case is mostly like the known dirty log case; the log
is corrupt, dirtiness cannot be determined, and a mount/umount
cycle or an xfs_repair -L is required.

So exit with status 2 here as well.
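
As a rough sketch of the resulting policy (not the xfs_repair source): enum log_state, log_check_status() and the zero_log parameter standing in for -L are hypothetical names chosen for illustration.

```c
#include <assert.h>

/* Hypothetical log states; the new case is LOG_UNKNOWN. */
enum log_state {
	LOG_CLEAN,	/* nothing to replay */
	LOG_DIRTY,	/* log needs replay */
	LOG_UNKNOWN,	/* log corrupt, dirtiness cannot be determined */
};

/*
 * Returns 0 if repair may proceed, or 2 if the user must first run a
 * mount/umount cycle or rerun with -L.  zero_log models -L being given.
 */
static int log_check_status(enum log_state state, int zero_log)
{
	assert(state == LOG_CLEAN || state == LOG_DIRTY ||
	       state == LOG_UNKNOWN);
	if (state == LOG_CLEAN || zero_log)
		return 0;
	return 2;	/* dirty and unknown are treated alike */
}
```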

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years ago xfs_logprint: remove the printing of transaction type
Hou Tao [Mon, 19 Sep 2016 06:00:04 +0000 (16:00 +1000)] 
xfs_logprint: remove the printing of transaction type

The kernel stopped using meaningful types in transaction headers
when delayed logging was introduced. Since then the only transaction
type that reaches the journal is a "Checkpoint" type. Since then,
we've effectively broken the transaction type printing for newer
kernels, and the current kernels don't even have transaction types
internally. Hence this logprint function is stale, broken, and
causing us problems.

This patch removes the transaction type parsing. If a user needs
this information from logprint, we can still build a binary from a
prior version that correctly decoded the transaction type (e.g.
3.2.1) for that purpose.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
9 years ago libxfs: move iswritable "fatal" decision to caller
Eric Sandeen [Mon, 19 Sep 2016 05:53:52 +0000 (15:53 +1000)] 
libxfs: move iswritable "fatal" decision to caller

Simplify platform_check_iswritable by moving the
"fatal" decision up to the (one) caller.  In other words,
simply return whether mounted+writable is true, and
return 1 if so.  Caller decides what to do with that info
based on /its/ "fatal" argument.
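
The shape of the change can be sketched as follows. This is an illustration only: check_iswritable() and device_is_blocked() are hypothetical stand-ins with simplified arguments, not the libxfs functions, and what the caller returns in the non-fatal case is an assumption.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Callee: a pure predicate.  Returns 1 if the device is mounted and
 * writable, 0 otherwise.  It no longer knows about "fatal".
 */
static int check_iswritable(int mounted, int writable)
{
	return mounted && writable;
}

/*
 * Caller: owns the policy.  Warns whenever the predicate is true, but
 * only reports the condition as blocking when its own "fatal" argument
 * is set (assumed behavior for this sketch).
 */
static int device_is_blocked(const char *name, int mounted, int writable,
			     int fatal)
{
	assert(name != NULL);
	if (!check_iswritable(mounted, writable))
		return 0;
	fprintf(stderr, "%s contains a mounted and writable filesystem\n",
		name);
	return fatal ? 1 : 0;
}
```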

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>