git.ipfire.org Git - thirdparty/xfsprogs-dev.git/log
Eric Sandeen [Tue, 10 Jan 2017 02:16:33 +0000 (20:16 -0600)] 
xfs: set XFS_DA_OP_OKNOENT in xfs_attr_get

Source kernel commit: c400ee3ed1b13d45adde68e12254dc6ab6977b59

It's entirely possible for userspace to ask for an xattr which
does not exist.

Normally, there is no problem whatsoever when we ask for such
a thing, but when we look at an obfuscated metadump image
on a debug kernel with selinux, we trip over this ASSERT in
xfs_da3_path_shift():

	*result = -ENOENT;	/* we're out of our tree */
	ASSERT(args->op_flags & XFS_DA_OP_OKNOENT);

It (more or less) only shows up in the above scenario, because
xfs_metadump obfuscates attr names, but chooses names which
keep the same hash value - and xfs_da3_node_lookup_int does:

	if (((retval == -ENOENT) || (retval == -ENOATTR)) &&
	    (blk->hashval == args->hashval)) {
		error = xfs_da3_path_shift(state, &state->path, 1, 1,
						 &retval);

In other words, we only get down to the xfs_da3_path_shift() ASSERT
if we are looking for an xattr which doesn't exist, but we
find xattrs on disk which have the same hash.  That might be
a hash collision, so we try the path shift.  When *that*
fails to find what we're looking for, we hit the assert about
XFS_DA_OP_OKNOENT.

Simply setting XFS_DA_OP_OKNOENT in xfs_attr_get solves this
rather corner-case problem with no ill side effects.  It's
fine for an attr name lookup to fail.
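The minimal userspace model below makes the interaction concrete. It uses simplified stand-ins: `toy_da_args`, `TOY_DA_OP_OKNOENT` (with an arbitrary bit value), `toy_path_shift()` and `toy_attr_get()` are hypothetical names. They only mirror the roles of `struct xfs_da_args`, `XFS_DA_OP_OKNOENT`, `xfs_da3_path_shift()` and `xfs_attr_get()`, and none of this is the kernel code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag bit standing in for XFS_DA_OP_OKNOENT. */
#define TOY_DA_OP_OKNOENT 0x0008u

/* Hypothetical stand-in for struct xfs_da_args. */
struct toy_da_args {
	unsigned int op_flags;
};

/*
 * Stand-in for the path-shift step: when the walk runs off the end of
 * the tree it reports -ENOENT, and the debug assertion insists that the
 * caller declared "not found" to be an acceptable outcome.
 */
static int toy_path_shift(struct toy_da_args *args, int found)
{
	if (!found) {
		assert(args->op_flags & TOY_DA_OP_OKNOENT);
		return -ENOENT;
	}
	return 0;
}

/* Stand-in for an attr get: a missing name is a normal result. */
static int toy_attr_get(struct toy_da_args *args, int found)
{
	args->op_flags |= TOY_DA_OP_OKNOENT;	/* the fix */
	return toy_path_shift(args, found);
}
```

Without the `|=` line in `toy_attr_get()`, looking up a missing name trips the assertion, which is the debug-kernel symptom described above.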

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Brian Foster [Tue, 10 Jan 2017 02:16:33 +0000 (20:16 -0600)] 
xfs: fix btree cursor error cleanups

Source kernel commit: f307080a626569f89bc8fbad9f936b307aded877

The btree cursor cleanup function takes an error parameter that
affects how buffers are released from the cursor. All buffers are
released in the event of error. Several callers do not specify the
XFS_BTREE_ERROR flag in the event of error, however. This can cause
buffers to hang around locked or with an elevated hold count and
thus lead to umount hangs in the event of errors.

Fix up the xfs_btree_del_cursor() callers to pass XFS_BTREE_ERROR if
the cursor is being torn down due to error.
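As we understand the kernel's `xfs_btree_del_cursor()`, a clean teardown stops releasing buffers at the first empty level, while `XFS_BTREE_ERROR` makes it visit every level. The sketch below models that behaviour with hypothetical `toy_*` types and flag values. The part that matters is the `error ? ERROR : NOERROR` idiom at the call site.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAXLEVELS 4
/* Hypothetical stand-ins for XFS_BTREE_NOERROR / XFS_BTREE_ERROR. */
#define TOY_BTREE_NOERROR 0
#define TOY_BTREE_ERROR   1

struct toy_buf {
	int held;		/* 1 while the cursor still holds the buffer */
};

struct toy_cursor {
	int nlevels;
	struct toy_buf *bufs[TOY_MAXLEVELS];
};

static void toy_del_cursor(struct toy_cursor *cur, int error)
{
	int i;

	for (i = 0; i < cur->nlevels; i++) {
		if (cur->bufs[i]) {
			cur->bufs[i]->held = 0;	/* release the buffer */
			cur->bufs[i] = NULL;
		} else if (!error) {
			/* A clean teardown assumes nothing past the first NULL. */
			break;
		}
	}
}

/* Idiom from the fix: derive the flag from the caller's error state. */
static void toy_teardown(struct toy_cursor *cur, int error)
{
	toy_del_cursor(cur, error ? TOY_BTREE_ERROR : TOY_BTREE_NOERROR);
}
```

If an error path leaves a hole in the cursor and then tears it down with the NOERROR flag, the buffers above the hole stay held. That is the hang described above.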

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Tue, 10 Jan 2017 02:16:33 +0000 (20:16 -0600)] 
libxfs: fix line lengths

Fix some 80-char line length issues.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reported-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Tue, 10 Jan 2017 02:16:33 +0000 (20:16 -0600)] 
libxfs: remove useless stuff from the kernel

Evidently the libxfs-apply script sucked in some fs/xfs/ content from
the kernel patches and an extra redefinition of _bmap_search_extents.
We don't need this, so get rid of it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Tue, 10 Jan 2017 02:16:33 +0000 (20:16 -0600)] 
libxfs: return bool from sb_version_hasmetauuid

The kernel's version of this function returns bool, so do so here too.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Tue, 10 Jan 2017 02:16:33 +0000 (20:16 -0600)] 
libxfs: fix whitespace to match the kernel

Fix some minor whitespace errors so that my automated libxfs diff
scanning will stop reporting this.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Tue, 10 Jan 2017 02:16:33 +0000 (20:16 -0600)] 
libxfs: refactor btree crc verifier

In a65d8d293b ("libxfs: validate metadata LSNs against log on v5
superblocks") the hascrc check was modified to use the helper mp
variable in the kernel.  This was left out of the xfsprogs patch, so
change it here too.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Tue, 10 Jan 2017 02:16:33 +0000 (20:16 -0600)] 
libxfs: remove unnecessary hascrc test in btree verifiers

xfs_btree_sblock_v5hdr_verify already checks _hascrc, so we can
remove it from the verifier functions.  For whatever reason this
change made it into the kernel but not xfsprogs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Tue, 10 Jan 2017 02:16:33 +0000 (20:16 -0600)] 
xfs_io: fix libxfs naming violation

All the calls to libxfs code should start with 'libxfs'
per libxfs_api_defs.h.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Tue, 10 Jan 2017 02:16:33 +0000 (20:16 -0600)] 
libxfs-apply: port to stgit

Teach libxfs-apply how to talk to a stgit repository
and fix a minor typo in the guilt hunk of apply_patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Tue, 10 Jan 2017 02:16:33 +0000 (20:16 -0600)] 
tools: create libxfs-diff to compare libxfses

Create a script to compare every file in libxfs to the same file
in another libxfs.  This is useful for comparing the upstream kernel
and the userspace progs to look for unported changes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Sandeen [Thu, 5 Jan 2017 22:29:21 +0000 (16:29 -0600)] 
xfsprogs: Release v4.9.0

Update all the necessary files for a 4.9.0 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Sandeen [Thu, 22 Dec 2016 22:41:04 +0000 (16:41 -0600)] 
xfsprogs: Release v4.9.0-rc1

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Biggers [Thu, 22 Dec 2016 15:31:29 +0000 (09:31 -0600)] 
Fix building xfsprogs on 32-bit platforms

xfslibs now requires that its users enable transparent largefile
support.  This broke building xfsprogs on 32-bit Linux (with glibc)
because _FILE_OFFSET_BITS=64 was not getting defined.  Although the
autoconf macro AC_SYS_LARGEFILE was intended to define it, this didn't
work because AC_SYS_LARGEFILE will only define _FILE_OFFSET_BITS in a
config header, which doesn't work for xfsprogs because not all .c files
include platform_defs.h as their first include.  Also,
platform_defs.h.in is not generated by autoheader and didn't contain a
template for _FILE_OFFSET_BITS.

Therefore, to fix the problem remove the useless autoconf macros and
instead add -D_FILE_OFFSET_BITS=64 to CFLAGS in builddefs.in.  Use
CFLAGS rather than PCFLAGS because this definition could be needed by
platforms other than "linux", and it doesn't hurt to always define it.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Janda <felix.janda@posteo.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Sandeen [Thu, 22 Dec 2016 05:10:54 +0000 (23:10 -0600)] 
xfs_quota: Fix test for wrapped id from GETNEXTQUOTA

dump_file and report_mount can be called with null *oid if
we aren't asking for the GETNEXTQUOTA interface, so we
should only test for the GETNEXTQUOTA wrap if *oid is
non-null.  Otherwise we'll deref a null pointer in the
test.

This only happens for certain invocations of reporting,
which apparently are not covered by any regression tests
at this point, at least on new kernels which contain
GETNEXTQUOTA.

Addresses-Coverity-ID: 1397415
Addresses-Coverity-ID: 1397416
Brown-paper-bag-worn-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Sandeen [Wed, 21 Dec 2016 05:21:19 +0000 (23:21 -0600)] 
xfs_quota: handle wrapped id from GETNEXTQUOTA

The GETNEXTQUOTA interface in the kernel had a bug
(at least in xfs): if we passed in UINT_MAX as the
ID, it incremented, wrapped, and returned 0 for the next
id.  This would cause userspace to start querying
again at zero, and an xfs_quota "report" command would
loop forever.  This occurred if a quota ID near
UINT_MAX existed, and later offsets within the block
wrapped the xfs_dqid_t.

This will also be fixed in the kernel, but we should also
catch this in userspace, and stop the loop if it happens.
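Below is a sketch of the userspace guard, using assumed names. `toy_getnextquota()` is a hypothetical stand-in that returns the first existing id at or above the requested one. It is a sorted-array scan, not the quotactl interface. The loop stops once the next id to request wraps to 0, instead of restarting the report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a GETNEXTQUOTA query: store the first id
 * >= 'start' found in the sorted array 'ids' and return 0, or return -1
 * if there is none.
 */
static int toy_getnextquota(const uint32_t *ids, size_t n, uint32_t start,
			    uint32_t *found)
{
	for (size_t i = 0; i < n; i++) {
		if (ids[i] >= start) {
			*found = ids[i];
			return 0;
		}
	}
	return -1;
}

/* Report every id once, stopping if the next id to query wraps to 0. */
static size_t toy_report_all(const uint32_t *ids, size_t n)
{
	uint32_t next = 0, found;
	size_t reported = 0;

	while (toy_getnextquota(ids, n, next, &found) == 0) {
		reported++;
		next = found + 1;
		if (next == 0)	/* wrapped: without this we'd restart at 0 */
			break;
	}
	return reported;
}
```

With an id of `UINT32_MAX` present, removing the `next == 0` check makes `toy_report_all()` loop forever, which mirrors the endless "report" described above.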

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Sandeen [Wed, 21 Dec 2016 04:38:06 +0000 (22:38 -0600)] 
xfs_repair: don't indicate dirtiness if FSGEOMETRY fails

Today, pointing repair at an image hosted on a non-xfs
filesystem will result in an XFS_IOC_FSGEOMETRY_V1 failure,
but repair generally proceeds without further problems.

However, calling do_warn() sets fs_is_dirty to 1, so
xfs_repair -n exits with non-zero status, indicating
corruption.  This is incorrect.

Change the message to use do_log so that it does not
incorrectly indicate corruption.
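The toy model below illustrates the distinction using simplified versions of the two helpers. `toy_do_warn()` both prints and marks the filesystem dirty, while `toy_do_log()` only prints. The names and the `toy_fs_is_dirty` flag are hypothetical stand-ins, not the xfs_repair sources.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

static int toy_fs_is_dirty;	/* hypothetical stand-in for fs_is_dirty */

/* Warning: report a problem AND mark the filesystem as needing repair. */
static void toy_do_warn(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	toy_fs_is_dirty = 1;
}

/* Log: informational only, leaves the dirty flag alone. */
static void toy_do_log(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
}

/* In no-modify mode, a dirty filesystem turns into a nonzero exit status. */
static int toy_exit_status(void)
{
	return toy_fs_is_dirty ? 1 : 0;
}
```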

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Sandeen [Wed, 21 Dec 2016 04:38:01 +0000 (22:38 -0600)] 
xfs_repair: junk leaf attribute if count == 0

We have recently seen a case where, during log replay, the
attr3 leaf verifier reported corruption when encountering a
leaf attribute with a count of 0 in the header.

We chalked this up to a transient state when a shortform leaf
was created, the attribute didn't fit, and we promoted the
(empty) attribute to the larger leaf form.

I've recently been given a metadump of unknown provenance which actually
contains a leaf attribute with count 0 on disk.  This causes the
verifier to fire every time xfs_repair is run:

 Metadata corruption detected at xfs_attr3_leaf block 0x480988/0x1000

If this 0-count state is detected, we should just junk the leaf, same
as we would do if the count was too high.  With this change, we now
remedy the problem:
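The rule amounts to treating zero like any other out-of-range count. In this sketch, `max_entries` is a hypothetical bound standing in for whatever limit the real repair code derives from the block size. The function name is also hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Return nonzero if a leaf header's entry count is plausible. */
static int toy_attr_leaf_count_ok(uint16_t count, uint16_t max_entries)
{
	/* Both an empty leaf and an overfull leaf get junked. */
	return count != 0 && count <= max_entries;
}
```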

 Metadata corruption detected at xfs_attr3_leaf block 0x480988/0x1000
 bad attribute count 0 in attr block 0, inode 12587828
 problem with attribute contents in inode 12587828
 clearing inode 12587828 attributes
 correcting nblocks for inode 12587828, was 2 - counted 1

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 21 Dec 2016 04:35:47 +0000 (22:35 -0600)] 
xfs_repair: change null check to assertion

It /should/ be the case that we never run out of records
before we run out of btree blocks, so change the null check
(that was only to appease Coverity) to an assert.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 21 Dec 2016 04:29:01 +0000 (22:29 -0600)] 
xfs_repair: fix some potential null pointer dereferences

Fix some potential NULL pointer dereferences that Coverity pointed out,
and remove a trivial dead integer check.

Coverity-id: 1375789, 1375790, 1375791, 1375792
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 21 Dec 2016 04:28:01 +0000 (22:28 -0600)] 
xfs_repair: fix bogus rmapbt record owner check

Make the reverse mapping owner check actually validate inode numbers.

Coverity-id: 1371628
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Felix Janda [Tue, 1 Nov 2016 01:39:20 +0000 (12:39 +1100)] 
platform: remove use of off64_t

Since we force transparent LFS it can be replaced by off_t.

Signed-off-by: Felix Janda <felix.janda@posteo.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Felix Janda [Tue, 1 Nov 2016 01:38:42 +0000 (12:38 +1100)] 
xfs.h: require transparent LFS for all users

Since our interfaces depend on the consistent use of a 64bit offset
type, force downstreams to use transparent LFS (_FILE_OFFSET_BITS=64),
so that it becomes impossible for them to use 32bit interfaces.
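Here is a minimal C11 sketch of refusing to build without transparent LFS. It only illustrates the idea; the actual check in xfs.h may use a different mechanism.

```c
#define _FILE_OFFSET_BITS 64	/* must precede every system header */

#include <assert.h>
#include <sys/types.h>

/* Refuse to compile if off_t (and so lseek/pread/ftruncate) is 32-bit. */
_Static_assert(sizeof(off_t) == 8,
	       "transparent LFS (_FILE_OFFSET_BITS=64) is required");
```

Placing the check in a header every consumer includes turns a silent ABI mismatch into a build failure.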

Signed-off-by: Felix Janda <felix.janda@posteo.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Felix Janda [Tue, 1 Nov 2016 01:38:40 +0000 (12:38 +1100)] 
xfsprogs: replace statvfs64 by equivalent statvfs

Signed-off-by: Felix Janda <felix.janda@posteo.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Felix Janda [Tue, 1 Nov 2016 01:38:39 +0000 (12:38 +1100)] 
fsr: remove workaround for statvfs on Mac OS X

It can be removed since fsr is no longer built on Mac OS X.

Signed-off-by: Felix Janda <felix.janda@posteo.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Felix Janda [Tue, 1 Nov 2016 01:38:37 +0000 (12:38 +1100)] 
Makefile: disable fsr for Mac OS X

Since its kernel does not support XFS anyway, this utility is not
useful there, and with its removal the portability framework can be
simplified.

Signed-off-by: Felix Janda <felix.janda@posteo.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Felix Janda [Tue, 1 Nov 2016 01:38:36 +0000 (12:38 +1100)] 
xfs_io: replace posix_fadvise64 by equivalent posix_fadvise

This also fixes a compile failure on FreeBSD.

Signed-off-by: Felix Janda <felix.janda@posteo.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Felix Janda [Tue, 1 Nov 2016 01:38:34 +0000 (12:38 +1100)] 
xfsprogs: replace sendfile64 by equivalent sendfile

Signed-off-by: Felix Janda <felix.janda@posteo.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Felix Janda [Tue, 1 Nov 2016 01:38:33 +0000 (12:38 +1100)] 
xfsprogs: replace pread64/pwrite64 by equivalent pread/pwrite

Signed-off-by: Felix Janda <felix.janda@posteo.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Felix Janda [Tue, 1 Nov 2016 01:38:29 +0000 (12:38 +1100)] 
xfsprogs: replace lseek64 by equivalent lseek

Signed-off-by: Felix Janda <felix.janda@posteo.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Felix Janda [Tue, 1 Nov 2016 01:38:27 +0000 (12:38 +1100)] 
xfsprogs: replace ftruncate64 by equivalent ftruncate

Signed-off-by: Felix Janda <felix.janda@posteo.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Felix Janda [Tue, 1 Nov 2016 01:38:25 +0000 (12:38 +1100)] 
xfsprogs: replace [fl]stat64 by equivalent [fl]stat

Signed-off-by: Felix Janda <felix.janda@posteo.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Felix Janda [Mon, 31 Oct 2016 23:41:40 +0000 (10:41 +1100)] 
configure: remove unnecessary definitions of _FILE_OFFSET_BITS

Now that we use AC_SYS_LARGEFILE, there is no need to explicitly
define _FILE_OFFSET_BITS.

Signed-off-by: Felix Janda <felix.janda@posteo.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Felix Janda [Mon, 31 Oct 2016 23:40:40 +0000 (10:40 +1100)] 
configure: error out when LFS does not work

Signed-off-by: Felix Janda <felix.janda@posteo.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Felix Janda [Mon, 31 Oct 2016 23:39:40 +0000 (10:39 +1100)] 
configure: use AC_SYS_LARGEFILE

The autoconf macro AC_SYS_LARGEFILE defines _FILE_OFFSET_BITS=64
where necessary to ensure that off_t and all interfaces using off_t
are 64bit, even on 32bit systems.

Signed-off-by: Felix Janda <felix.janda@posteo.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Andreas Gruenbacher [Mon, 31 Oct 2016 23:38:40 +0000 (10:38 +1100)] 
xfs_io: Fix initial -m option

Like "open -m mode", the initial -m option requires a mode argument.

Document these options correctly as well.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Amir Goldstein [Mon, 31 Oct 2016 23:38:19 +0000 (10:38 +1100)] 
xfs_io: add command line option -i to start an idle thread

xfs_io -i starts by spawning an idle thread.

The purpose of this idle thread is to test I/O from a multi-threaded
process. In a single-threaded process, the file table is not shared
and file structs are not reference counted. Spawning an idle thread
can help detect file struct reference leaks.
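The following is a minimal POSIX-threads sketch of such an idle thread. The helper names are hypothetical, and this is not the xfs_io source. The thread only sleeps in `pause()` and does no I/O; its sole purpose is to make the process multi-threaded.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Idle thread body: sleep until a signal arrives, forever. */
static void *toy_idle_loop(void *arg)
{
	(void)arg;
	for (;;)
		pause();
	return NULL;	/* not reached */
}

/*
 * Spawn the idle thread and detach it, since nobody will ever join it.
 * Returns 0 on success or the error number from pthread_create().
 */
static int toy_start_idle_thread(void)
{
	pthread_t t;
	int err = pthread_create(&t, NULL, toy_idle_loop, NULL);

	if (err == 0)
		pthread_detach(t);
	return err;
}
```

The thread is never joined. It simply disappears when the process exits, which is all a test harness needs.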

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Grozdan [Mon, 31 Oct 2016 23:38:09 +0000 (10:38 +1100)] 
xfsprogs: Update FSF address in COPYING file

The FSF address in doc/COPYING needs an update. This was caught and
reported by the openSUSE build service while building the xfsprogs
package. The new address is taken directly from the license files
the FSF publishes on its site.

Signed-off-by: Grozdan Nikolov <neutrino8@gmail.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

Darrick J. Wong [Tue, 25 Oct 2016 22:14:36 +0000 (15:14 -0700)] 
mkfs.xfs: format reflink enabled filesystems

Create the refcount btree at mkfs time and set the feature flag.

v2: Turn on the reflink feature when calculating the minimum log size.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Tue, 25 Oct 2016 22:14:36 +0000 (15:14 -0700)] 
xfs_repair: use thread pools to sort rmap data

Since each slab is a collection of independent mini-slabs, we can
fire up a bunch of threads to sort the mini-slabs in parallel.
This speeds up the sorting phase of the rmapbt rebuilding if we
have a large number of mini-slabs.
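The commit uses xfs_repair's thread pools. The sketch below instead uses one plain POSIX thread per mini-slab, to show why the work parallelizes without any locking. `toy_minislab` and the helper names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical mini-slab: an independent array of records. */
struct toy_minislab {
	unsigned long *recs;
	size_t nr;
};

static int toy_cmp(const void *a, const void *b)
{
	unsigned long x = *(const unsigned long *)a;
	unsigned long y = *(const unsigned long *)b;

	return (x > y) - (x < y);
}

/* Worker: sort one mini-slab; no other thread touches its records. */
static void *toy_sort_worker(void *arg)
{
	struct toy_minislab *ms = arg;

	qsort(ms->recs, ms->nr, sizeof(ms->recs[0]), toy_cmp);
	return NULL;
}

static void toy_sort_minislabs(struct toy_minislab *slabs, size_t n)
{
	pthread_t *tids = calloc(n, sizeof(*tids));
	size_t started = 0;

	/* Start one worker per mini-slab (none if allocation failed). */
	while (tids && started < n &&
	       pthread_create(&tids[started], NULL, toy_sort_worker,
			      &slabs[started]) == 0)
		started++;
	/* Sort anything we could not hand off on the calling thread. */
	for (size_t i = started; i < n; i++)
		toy_sort_worker(&slabs[i]);
	for (size_t i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	free(tids);
}
```

A thread pool bounds the thread count when there are many mini-slabs; the per-slab threads here just keep the sketch short.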

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Tue, 25 Oct 2016 22:14:36 +0000 (15:14 -0700)] 
xfs_repair: check for mergeable refcount records

Make sure there aren't adjacent refcount records that could be merged;
this is a sign that the refcount tree algorithms aren't working
correctly.
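Below is a sketch of the check. It assumes records sorted by start block, with fields loosely modeled on the on-disk refcount record (start block, block count, reference count). The `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified refcount record: a run of blocks sharing one count. */
struct toy_refc_rec {
	uint32_t startblock;
	uint32_t blockcount;
	uint32_t refcount;
};

/* Two records could be merged if they abut and carry the same count. */
static bool toy_refc_mergeable(const struct toy_refc_rec *prev,
			       const struct toy_refc_rec *next)
{
	return prev->startblock + prev->blockcount == next->startblock &&
	       prev->refcount == next->refcount;
}

/* Index of the first record mergeable with its predecessor, or -1. */
static long toy_find_mergeable(const struct toy_refc_rec *recs, size_t nr)
{
	for (size_t i = 1; i < nr; i++)
		if (toy_refc_mergeable(&recs[i - 1], &recs[i]))
			return (long)i;
	return -1;
}
```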

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Tue, 25 Oct 2016 22:14:35 +0000 (15:14 -0700)] 
xfs_repair: use range query while checking rmaps

For shared extents, we ought to use a range query on the rmapbt to
find the corresponding rmap.  However, most of the time the observed
rmap will be an exact match for the rmapbt rmap, in which case we
could have used the (much faster) regular lookup.  Therefore, try the
regular lookup first and resort to the range lookup if that doesn't
get us what we want.  This can cut the run time of the rmap check of
xfs_repair in half.
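The sketch below shows the two-step lookup over a sorted in-memory array rather than a real rmapbt. The types and helpers are hypothetical. The fallback is a simplified containment scan standing in for the btree range query.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified rmap: 'owner' maps blocks [start, start + len). */
struct toy_rmap {
	uint32_t start;
	uint32_t len;
	uint64_t owner;
};

/* Fast path: binary search for an identical record (sorted by start). */
static const struct toy_rmap *toy_lookup_exact(const struct toy_rmap *tree,
					       size_t nr,
					       const struct toy_rmap *want)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tree[mid].start < want->start)
			lo = mid + 1;
		else
			hi = mid;
	}
	/* Shared extents: several records may start at the same block. */
	for (; lo < nr && tree[lo].start == want->start; lo++)
		if (tree[lo].len == want->len && tree[lo].owner == want->owner)
			return &tree[lo];
	return NULL;
}

/* Slow path: any record of the same owner that covers 'want'. */
static const struct toy_rmap *toy_lookup_range(const struct toy_rmap *tree,
					       size_t nr,
					       const struct toy_rmap *want)
{
	for (size_t i = 0; i < nr; i++)
		if (tree[i].owner == want->owner &&
		    tree[i].start <= want->start &&
		    want->start + want->len <= tree[i].start + tree[i].len)
			return &tree[i];
	return NULL;
}

/* Try the cheap exact match first; fall back to the range query. */
static const struct toy_rmap *toy_find_rmap(const struct toy_rmap *tree,
					    size_t nr,
					    const struct toy_rmap *want)
{
	const struct toy_rmap *r = toy_lookup_exact(tree, nr, want);

	return r ? r : toy_lookup_range(tree, nr, want);
}
```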

Theoretically, the only reason why an observed rmap wouldn't be an
exact match for an rmapbt rmap is because we modified some file on
account of a metadata error.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Tue, 25 Oct 2016 22:14:35 +0000 (15:14 -0700)] 
xfs_repair: check the CoW extent size hint

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Tue, 25 Oct 2016 22:14:35 +0000 (15:14 -0700)] 
xfs_repair: complain about copy-on-write leftovers

Complain about leftover CoW allocations that are hanging off the
refcount btree.  These are cleaned out at mount time, but we could be
louder about flagging down evidence of trouble.

Since these extents aren't "owned" by anything, we'll free them up by
reconstructing the free space btrees.

v2: When we're processing rmap records, we inadvertently forgot to
handle the CoW owner, so the leftover CoW staging blocks got marked as
file data.  These blocks will just get freed later, so mark them
"CoW".  When we process the refcountbt, complain about leftovers if
the type is unknown or "CoW".

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Tue, 25 Oct 2016 22:14:35 +0000 (15:14 -0700)] 
xfs_repair: rebuild the refcount btree

Rebuild the refcount btree with the reference count data we assembled
during phase 4.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Tue, 25 Oct 2016 22:14:35 +0000 (15:14 -0700)] 
xfs_repair: check the refcount btree against our observed reference counts when -n

Check the observed reference counts against whatever's in the refcount
btree for discrepancies.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Tue, 25 Oct 2016 22:14:35 +0000 (15:14 -0700)] 
xfs_repair: fix inode reflink flags

While we're computing reference counts, record which inodes actually
share blocks with other files and fix the flags as necessary.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Tue, 25 Oct 2016 22:14:34 +0000 (15:14 -0700)] 
xfs_repair: record reflink inode state

Record the state of the per-inode reflink flag, so that we can
compare against the rmap data and update the flags accordingly.
Clear the (reflink) state if we clear the inode.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Tue, 25 Oct 2016 22:14:34 +0000 (15:14 -0700)] 
xfs_repair: process reverse-mapping data into refcount data

Take all the reverse-mapping data we've acquired and use it to generate
reference count data.  This data is used in phase 5 to rebuild the
refcount btree.
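This is a deliberately naive sketch of the idea. It counts the rmaps covering each block of a tiny, hypothetical AG and emits runs of blocks that are shared (a count of 2 or more). As we understand it, the refcount btree does not record ordinary singly-owned data blocks, so the sketch skips them too. The real code works from sorted rmap records rather than a per-block array, and all names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_AGBLOCKS 64		/* hypothetical tiny AG */

struct toy_extent { uint32_t start, len; };		/* one rmap's blocks */
struct toy_refc { uint32_t start, len, refcount; };	/* output record */

/*
 * Count the rmaps covering each block, then emit maximal runs whose
 * count is >= 2 and constant.  Returns the number of records written
 * to 'out' (at most 'max_out').
 */
static size_t toy_rmaps_to_refcounts(const struct toy_extent *rmaps,
				     size_t nr, struct toy_refc *out,
				     size_t max_out)
{
	uint32_t count[TOY_AGBLOCKS];
	size_t nout = 0;

	memset(count, 0, sizeof(count));
	for (size_t i = 0; i < nr; i++)
		for (uint32_t b = rmaps[i].start;
		     b < rmaps[i].start + rmaps[i].len && b < TOY_AGBLOCKS; b++)
			count[b]++;

	for (uint32_t b = 0; b < TOY_AGBLOCKS; ) {
		uint32_t run = b;

		if (count[b] < 2) {	/* unshared: nothing to record */
			b++;
			continue;
		}
		while (run < TOY_AGBLOCKS && count[run] == count[b])
			run++;
		if (nout < max_out)
			out[nout++] = (struct toy_refc){ b, run - b, count[b] };
		b = run;
	}
	return nout;
}
```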

v2: Update to reflect separation of rmap_irec flags.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Tue, 25 Oct 2016 22:14:34 +0000 (15:14 -0700)] 
xfs_repair: handle multiple owners of data blocks

If reflink is enabled, don't freak out if there are multiple owners of
a given block; that's just a sign that each of those owners is a
reflinked file.

v2: owner and offset are unsigned types, so use those for in-order
comparison.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Tue, 25 Oct 2016 22:14:34 +0000 (15:14 -0700)] 
xfs_repair: check the existing refcount btree

Spot-check the refcount btree for obvious errors, and mark the
refcount btree blocks as such.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Tue, 25 Oct 2016 22:14:34 +0000 (15:14 -0700)] 
xfs_repair: fix get_agino_buf to avoid corrupting inodes

The inode buffering code tries to read inodes in units of chunks,
which are the larger of 8K or 1 FSB.  Each chunk gets its own xfs_buf,
which means that get_agino_buf must calculate the disk address of the
chunk and feed that to libxfs_readbuf in order to find the inode data
correctly.  The current code simply grabs the chunk for the start
inode and indexes from that, which corrupts memory because the start
inode and the target inode could be in different inode chunks.  That
causes the assert in rmap.c to blow when we clear the reflink flag.
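Below is a sketch of the corrected address arithmetic. It uses hypothetical names and assumes the first inode of the record sits on a chunk boundary. The fix amounts to choosing the chunk that contains the target inode, rather than always using the chunk of the start inode.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry, sizes in bytes. */
struct toy_geom {
	uint32_t blocksize;
	uint32_t inodesize;
};

/* Chunk size for inode buffering: the larger of 8 KiB and one block. */
static uint32_t toy_chunk_bytes(const struct toy_geom *g)
{
	return g->blocksize > 8192 ? g->blocksize : 8192;
}

/*
 * Locate 'agino' relative to the record's first inode 'start_agino'.
 * Returns the chunk index (which buffer to read) and stores the inode's
 * slot within that chunk in '*slot'.  The bug amounted to always reading
 * chunk 0 and indexing past its end.
 */
static uint32_t toy_inode_chunk(const struct toy_geom *g,
				uint32_t start_agino, uint32_t agino,
				uint32_t *slot)
{
	uint32_t per_chunk = toy_chunk_bytes(g) / g->inodesize;
	uint32_t rel;

	assert(agino >= start_agino);
	rel = agino - start_agino;
	*slot = rel % per_chunk;
	return rel / per_chunk;
}
```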

(Also fix some minor errors in the debugging printfs.)

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Tue, 25 Oct 2016 22:14:34 +0000 (15:14 -0700)] 
man: document the inode cowextsize flags & fields

Document the new copy-on-write extent size fields and inode flags
available in struct fsxattr.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Tue, 25 Oct 2016 22:14:33 +0000 (15:14 -0700)] 
xfs_logprint: support bmap redo items

Print block mapping update redo items.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Tue, 25 Oct 2016 22:14:33 +0000 (15:14 -0700)] 
xfs_logprint: support refcount redo items

Print reference count update redo items.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Tue, 25 Oct 2016 22:14:33 +0000 (15:14 -0700)] 
xfs_logprint: support cowextsize reporting in log contents

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Tue, 25 Oct 2016 22:14:33 +0000 (15:14 -0700)] 
xfs_io: try to unshare copy-on-write blocks via fallocate

Wire up the "unshare" flag to the xfs_io fallocate command.
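Here is a minimal sketch of the underlying system call. It assumes a Linux system whose headers define `FALLOC_FL_UNSHARE_RANGE`, and the helper name is hypothetical. Filesystems without shared-block support typically reject the mode, so callers should expect an error there.

```c
#define _GNU_SOURCE		/* for fallocate() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>

/*
 * Ask the filesystem to give [offset, offset + len) of 'fd' private
 * copies of any shared blocks.  Returns 0 or -errno.
 */
static int toy_unshare_range(int fd, off_t offset, off_t len)
{
	if (fallocate(fd, FALLOC_FL_UNSHARE_RANGE, offset, len) < 0)
		return -errno;
	return 0;
}
```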

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Tue, 25 Oct 2016 22:14:33 +0000 (15:14 -0700)] 
xfs_io: provide long-format help for falloc

Provide long-format help for falloc so that users can learn about
the command.

Note for xfstest writers: If you need to check that a particular
fallocate mode works (-c/-i/-p/-u) on a given filesystem, you should
call _require_xfs_io_command with the falloc subcommand directly (i.e.
_require_xfs_io_command funshare), because the subcommands are
special-cased to actually try the command.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Tue, 25 Oct 2016 22:14:33 +0000 (15:14 -0700)] 
xfs_io: support injecting the 'per-AG reservation critically low' error

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Tue, 25 Oct 2016 22:14:33 +0000 (15:14 -0700)] 
xfs_io: add refcount+bmap error injection types

Add refcount and bmap deferred finish to the types of errors we can
inject.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Tue, 25 Oct 2016 22:14:32 +0000 (15:14 -0700)] 
xfs_io: get and set the CoW extent size hint

Enable administrators to get or set the CoW extent size hint.
Report the hint when we run stat.  This also requires some
autoconf magic to detect whether or not fsx_cowextsize exists.
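The sketch below reads the hint through the generic `FS_IOC_FSGETXATTR` ioctl. It assumes kernel headers new enough to define `fsx_cowextsize` and `FS_XFLAG_COWEXTSIZE`; older headers are exactly the problem the autoconf override addresses. The helper name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Read the CoW extent size hint (bytes) of 'path' and whether the
 * per-inode hint flag is set.  Returns 0 or -errno; filesystems that do
 * not implement FS_IOC_FSGETXATTR typically fail with -ENOTTY.
 */
static int toy_get_cowextsize(const char *path, unsigned int *cowextsize,
			      int *hint_enabled)
{
	struct fsxattr fsx;
	int fd = open(path, O_RDONLY);
	int ret = 0;

	if (fd < 0)
		return -errno;
	if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0) {
		ret = -errno;
	} else {
		*cowextsize = fsx.fsx_cowextsize;
		*hint_enabled = !!(fsx.fsx_xflags & FS_XFLAG_COWEXTSIZE);
	}
	close(fd);
	return ret;
}
```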

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v2: Use the internal fsxattr override to guarantee that the cowextsize
field always exists inside of whatever struct fsxattr is.

Darrick J. Wong [Tue, 25 Oct 2016 22:14:32 +0000 (15:14 -0700)] 
libxfs: add autoconf mechanism to override system header fsxattr

By default, libxfs will use the kernel/system headers to define struct
fsxattr.  Unfortunately, this creates a problem for developers who are
writing new features but building xfsprogs on a stable system, because
the stable kernel's headers don't reflect the new feature.  In this
case, we want to be able to use the internal fsxattr definition while
the kernel headers catch up, so provide some configure magic to allow
further patches to force the use of the internal definition.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v2: Remove the user-visible configure option but leave the fsxattr
override ability so that subsequent patches can trigger it if
necessary.

Darrick J. Wong [Tue, 25 Oct 2016 22:14:32 +0000 (15:14 -0700)] 
xfs_io: bmap should support querying CoW fork, shared blocks

Teach the bmap command to report shared and delayed allocation
extents, and to be able to query the CoW fork.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Tue, 25 Oct 2016 22:14:32 +0000 (15:14 -0700)] 
xfs_growfs: report the presence of the reflink feature

Report the presence of the reflink feature in xfs_info.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Tue, 25 Oct 2016 22:14:32 +0000 (15:14 -0700)] 
xfs_db: print one array element per line

Print one array element per line so that the debugger output isn't
a gigantic pile of screen snow.

Before (inobt):

xfs_db> p recs
recs[1-55] = [startino,holemask,count,freecount,free]
1:[128,0,64,0,0] 2:[4288,0xff,32,0,0xffffffff] 3:[4352,0,64,0,0]
4:[4416,0,64,10,0x1f0003e000000000] 5:[4480,0,64,17,0xc00e1803c2007840]

After:

xfs_db> p recs
recs[1-55] = [startino,holemask,count,freecount,free]
1:[128,0,64,0,0]
2:[4288,0xff,32,0,0xffffffff]
3:[4352,0,64,0,0]
4:[4416,0,64,10,0x1f0003e000000000]
5:[4480,0,64,17,0xc00e1803c2007840]
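
A small sketch of the new formatting, assuming a simplified inobt record
whose fields follow the column order shown above (startino, holemask,
count, freecount, free). print_recs is a hypothetical helper, not the
xfs_db print code; it writes one 1-based element per line into a
caller-supplied buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified inobt record; field order mirrors the xfs_db output above. */
struct inobt_rec {
	unsigned long long	startino;
	unsigned int		holemask;
	unsigned int		count;
	unsigned int		freecount;
	unsigned long long	free;
};

/*
 * Print records one per line, numbered from 1, into buf.  Returns the
 * number of characters written (excluding the NUL); stops early rather
 * than truncating a record if buf is too small.
 */
static size_t print_recs(char *buf, size_t bufsz,
			 const struct inobt_rec *recs, int nrecs)
{
	size_t	used = 0;
	int	i;

	if (bufsz > 0)
		buf[0] = '\0';
	for (i = 0; i < nrecs; i++) {
		/* %#x prints 0 as "0" and nonzero values with a 0x prefix */
		int n = snprintf(buf + used, bufsz - used,
				 "%d:[%llu,%#x,%u,%u,%#llx]\n", i + 1,
				 recs[i].startino, recs[i].holemask,
				 recs[i].count, recs[i].freecount,
				 recs[i].free);
		if (n < 0 || (size_t)n >= bufsz - used)
			break;
		used += (size_t)n;
	}
	return used;
}
```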

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
8 years agoxfs_db: deal with the CoW extent size hint
Darrick J. Wong [Tue, 25 Oct 2016 22:14:32 +0000 (15:14 -0700)] 
xfs_db: deal with the CoW extent size hint

Display the CoW extent size hint when dumping inodes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
8 years agoxfs_db: metadump should copy the refcount btree too
Darrick J. Wong [Tue, 25 Oct 2016 22:14:32 +0000 (15:14 -0700)] 
xfs_db: metadump should copy the refcount btree too

Teach metadump to copy the refcount btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
8 years agoxfs_db: add support for checking the refcount btree
Darrick J. Wong [Tue, 25 Oct 2016 22:14:31 +0000 (15:14 -0700)] 
xfs_db: add support for checking the refcount btree

Do some basic checks of the refcount btree.  xfs_repair will have to
check that the reference counts match the various bmbt mappings.
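
The kind of basic record checks meant here can be sketched as follows,
assuming a simplified in-core record (startblock, blockcount, refcount)
supplied in key order. refc_recs_ok is a hypothetical helper; a real
checker also has to walk and verify the btree blocks themselves, and the
cross-check against bmbt mappings stays with xfs_repair as noted above.
Refcount-1 records are accepted because in-progress CoW extents may be
stored that way.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified in-core refcount record (per-AG block numbers). */
struct refc_rec {
	unsigned int	startblock;
	unsigned int	blockcount;
	unsigned int	refcount;
};

/*
 * Each record must be non-empty, have a nonzero refcount, lie inside
 * the AG, and start at or after the end of the previous record (no
 * overlaps, keys in order).
 */
static bool refc_recs_ok(const struct refc_rec *recs, int nr,
			 unsigned int agblocks)
{
	unsigned long long	next_free = 0;	/* first block after prev rec */
	int			i;

	for (i = 0; i < nr; i++) {
		unsigned long long end =
			(unsigned long long)recs[i].startblock +
			recs[i].blockcount;

		if (recs[i].blockcount == 0 || recs[i].refcount == 0)
			return false;
		if (recs[i].startblock < next_free || end > agblocks)
			return false;
		next_free = end;
	}
	return true;
}
```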

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
8 years agoxfs_db: dump refcount btree data
Darrick J. Wong [Tue, 25 Oct 2016 22:14:31 +0000 (15:14 -0700)] 
xfs_db: dump refcount btree data

Add the ability to walk and dump the refcount btree in xfs_db.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
8 years agolibxfs: add fsxattr flags and fields for cowextsize
Darrick J. Wong [Tue, 25 Oct 2016 22:14:31 +0000 (15:14 -0700)] 
libxfs: add fsxattr flags and fields for cowextsize

Add the cowextsize field and flag to each platform's struct fsxattr
definitions.  We can compile these definitions into the xfsprogs
utilities if we don't pick them up from the system headers, such as on
kernels prior to 4.9.
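
A sketch of the fallback pattern for the flag half of this change. It
assumes 0x00010000 as the value of FS_XFLAG_COWEXTSIZE, which is believed
to match the 4.9 kernel UAPI header; check it against the real header
rather than trusting this sketch. has_cowextsize_hint is a hypothetical
helper.

```c
#include <assert.h>

/* Fallback for system headers that predate the CoW extent size hint. */
#ifndef FS_XFLAG_COWEXTSIZE
# define FS_XFLAG_COWEXTSIZE	0x00010000	/* assumed 4.9 value */
#endif

/* Returns nonzero if an fsx_xflags value carries a CoW extent size hint. */
static inline int has_cowextsize_hint(unsigned int xflags)
{
	return (xflags & FS_XFLAG_COWEXTSIZE) != 0;
}
```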

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
8 years agolibxfs: free the CoW fork from an inode
Darrick J. Wong [Tue, 25 Oct 2016 22:14:31 +0000 (15:14 -0700)] 
libxfs: free the CoW fork from an inode

Clean up the CoW fork, should there ever be one.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
8 years agolibxfs: plumb in bmap deferred op log items
Darrick J. Wong [Tue, 25 Oct 2016 22:14:31 +0000 (15:14 -0700)] 
libxfs: plumb in bmap deferred op log items

Add a deferred op handler for block mapping actions.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
8 years agolibxfs: plumb in refcount deferred op log items
Darrick J. Wong [Tue, 25 Oct 2016 22:14:31 +0000 (15:14 -0700)] 
libxfs: plumb in refcount deferred op log items

Add a deferred op handler for refcount update actions.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
8 years agolibxfs: add xfs_refcount.h to the standard include list
Darrick J. Wong [Tue, 25 Oct 2016 22:14:30 +0000 (15:14 -0700)] 
libxfs: add xfs_refcount.h to the standard include list

Pick up the definitions in xfs_refcount.h for all compilation units.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
8 years agolibxfs: initialize the in-core mount context for refcount btrees
Darrick J. Wong [Tue, 25 Oct 2016 22:14:30 +0000 (15:14 -0700)] 
libxfs: initialize the in-core mount context for refcount btrees

Initialize the refcount btree maxlevel field of the mount context.
This helps us to detect overly tall trees.
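
How a maximum height follows from the geometry can be sketched like this,
assuming every block is only minimally full (limits[0] records per leaf,
limits[1] pointers per node). It is modeled on the approach of
xfs_btree_compute_maxlevels(), simplified to take the limits directly;
compute_maxlevels is a hypothetical name.

```c
#include <assert.h>

/*
 * Upper bound on tree height for len records: count leaf blocks, then
 * keep dividing by the node fanout until a single root remains.
 */
static unsigned int compute_maxlevels(const unsigned int limits[2],
				      unsigned long len)
{
	unsigned long	maxblocks;
	unsigned int	level;

	maxblocks = (len + limits[0] - 1) / limits[0];	/* leaf blocks */
	for (level = 1; maxblocks > 1; level++)
		maxblocks = (maxblocks + limits[1] - 1) / limits[1];
	return level;
}
```

With limits {10, 5} and 1000 records, the 100 leaves collapse to 20, 4,
then 1 node, so the bound is 4 levels; a walk that descends further has
found a corrupt tree.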

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
8 years agoxfs_buflock: handling parsing errors more gracefully
Darrick J. Wong [Tue, 25 Oct 2016 22:14:30 +0000 (15:14 -0700)] 
xfs_buflock: handling parsing errors more gracefully

Skip ftrace output lines that don't parse.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
8 years agoxfs_logprint: fix up the RUI printing code to reflect new format
Darrick J. Wong [Tue, 25 Oct 2016 22:14:30 +0000 (15:14 -0700)] 
xfs_logprint: fix up the RUI printing code to reflect new format

We changed the RUI format to use a variable length array, so update
the logprint code to reflect that.
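
With a variable-length array, the size of an RUI log item depends on its
extent count, so a log printer has to compute the expected length before
walking the array. A sketch, assuming field layouts modeled on struct
xfs_rui_log_format and struct xfs_map_extent; the struct and function
names below are local stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout modeled on struct xfs_map_extent. */
struct map_extent {
	uint64_t	me_owner;
	uint64_t	me_startblock;
	uint64_t	me_startoff;
	uint32_t	me_len;
	uint32_t	me_flags;
};

/* Layout modeled on struct xfs_rui_log_format. */
struct rui_log_format {
	uint16_t		rui_type;
	uint16_t		rui_size;
	uint32_t		rui_nextents;
	uint64_t		rui_id;
	struct map_extent	rui_extents[];	/* variable length */
};

/* Expected size of an RUI item carrying nextents mappings. */
static size_t rui_log_format_sizeof(unsigned int nextents)
{
	return offsetof(struct rui_log_format, rui_extents) +
	       nextents * sizeof(struct map_extent);
}
```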

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
8 years agoxfs: defer should abort intent items if the trans roll fails libxfs-4.9-sync
Darrick J. Wong [Tue, 25 Oct 2016 02:00:12 +0000 (13:00 +1100)] 
xfs: defer should abort intent items if the trans roll fails

Source kernel commit: b77428b12b55437b28deae738d9ce8b2e0663b55

If the deferred ops transaction roll fails, we need to abort the intent
items if we haven't already logged a done item for it, regardless of
whether or not the deferred ops has had a transaction committed.  Dave
found this while running generic/388.
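
The abort rule can be sketched with a toy work-item list, assuming each
pending item only tracks whether its intent and done items were logged.
struct pending_item and abort_pending_intents are hypothetical; in the
kernel the release would go through the deferred op type's abort hook
rather than a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for one pending deferred-op work item. */
struct pending_item {
	bool	has_intent;	/* intent item (e.g. an RUI) was logged */
	bool	has_done;	/* matching done item (e.g. an RUD) was logged */
	bool	aborted;	/* set when the intent is released here */
};

/*
 * On a failed roll, release every intent that never got a done item,
 * whether or not an earlier roll already committed it.  Returns the
 * number of intents aborted.
 */
static int abort_pending_intents(struct pending_item *items, size_t nr)
{
	int	aborted = 0;
	size_t	i;

	for (i = 0; i < nr; i++) {
		if (items[i].has_intent && !items[i].has_done) {
			items[i].aborted = true;
			items[i].has_intent = false;
			aborted++;
		}
	}
	return aborted;
}
```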

Move the tracepoint to make it easier to track object lifetimes.

Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
8 years agoxfs: remove xfs_bunmapi_cow
Christoph Hellwig [Tue, 25 Oct 2016 01:59:49 +0000 (12:59 +1100)] 
xfs: remove xfs_bunmapi_cow

Source kernel commit: 64e6428ddd00f864e3ca105f914a2b6920c2bc41

Remove xfs_bunmapi_cow, since no one uses it anymore.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
8 years agoxfs: refactor xfs_bunmapi_cow
Christoph Hellwig [Tue, 25 Oct 2016 01:59:46 +0000 (12:59 +1100)] 
xfs: refactor xfs_bunmapi_cow

Source kernel commit: fa5c836ca8eb5bad6316ddfc066acbc4e2485356

Split out two helpers for deleting delayed or real extents from the COW fork.
This allows them to be called directly from xfs_reflink_cow_end_io once
that function is refactored to iterate the extent tree.  It will also
allow xfs_bunmapi to reuse the delalloc deletion helper in the future.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
8 years agoxfs: add xfs_trim_extent
Darrick J. Wong [Tue, 25 Oct 2016 01:47:36 +0000 (12:47 +1100)] 
xfs: add xfs_trim_extent

Source kernel commit: 0a0af28cad9a43d90f13c2047bd8ee3d4cffb7f3

This helper trims an extent to a subset of its original range while
making sure the block numbers in it remain valid.

In the future xfs_trim_extent and xfs_bmapi_trim_map should probably be
merged in some form.
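
A simplified version of the trimming logic, assuming a single
DELAY_STARTBLOCK sentinel for delayed allocations (the real helper also
copes with encoded "raw" delalloc start blocks). struct irec and
trim_extent are local stand-ins for struct xfs_bmbt_irec and
xfs_trim_extent.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinel for mappings with no real block yet (delalloc). */
#define DELAY_STARTBLOCK	UINT64_MAX

/* Simplified mapping: file [startoff, startoff + blockcount) -> startblock. */
struct irec {
	uint64_t	startoff;
	uint64_t	startblock;
	uint64_t	blockcount;
};

/*
 * Trim *irec to the file range [bno, bno + len).  Cutting off the front
 * advances startblock by the same distance so the mapping stays valid;
 * a delalloc mapping has no real block to advance.  A mapping entirely
 * outside the range ends up with blockcount == 0.
 */
static void trim_extent(struct irec *irec, uint64_t bno, uint64_t len)
{
	uint64_t	end = bno + len;
	uint64_t	distance;

	if (irec->startoff + irec->blockcount <= bno ||
	    irec->startoff >= end) {
		irec->blockcount = 0;
		return;
	}
	if (irec->startoff < bno) {
		distance = bno - irec->startoff;
		if (irec->startblock != DELAY_STARTBLOCK)
			irec->startblock += distance;
		irec->startoff += distance;
		irec->blockcount -= distance;
	}
	if (end < irec->startoff + irec->blockcount)
		irec->blockcount = end - irec->startoff;
}
```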

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[hch: split from a previous patch from Darrick, moved around and added
support for "raw" delayed extents]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
8 years agolibxfs: clean up _calc_dquots_per_chunk
Darrick J. Wong [Tue, 25 Oct 2016 01:47:36 +0000 (12:47 +1100)] 
libxfs: clean up _calc_dquots_per_chunk

Source kernel commit: 58d789678546d46d7bbd809dd7dab417c0f23655

The function xfs_calc_dquots_per_chunk takes a parameter in units
of basic blocks.  The kernel seems to get the units wrong, but
userspace got 'fixed' by commenting out the unnecessary conversion.
Fix both.
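
The unit question is easiest to see with numbers. A sketch, assuming a
136-byte on-disk xfs_dqblk_t (an assumption; real code takes it from
sizeof). BBTOB converts 512-byte basic blocks to bytes, so a 4 KiB
filesystem block must be passed as 8, giving 30 dquots per chunk;
passing 1 by mistake would yield only 3. calc_dquots_per_chunk and
DQBLK_SIZE are local stand-ins.

```c
#include <assert.h>

#define BBSHIFT		9			/* 512-byte basic block */
#define BBTOB(bbs)	((bbs) << BBSHIFT)	/* basic blocks -> bytes */
#define DQBLK_SIZE	136			/* assumed sizeof(xfs_dqblk_t) */

/* nbblks is in basic blocks, NOT filesystem blocks. */
static int calc_dquots_per_chunk(unsigned int nbblks)
{
	assert(nbblks > 0);
	return BBTOB(nbblks) / DQBLK_SIZE;
}
```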

cc: <stable@vger.kernel.org>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
8 years agoxfs: remove pointless error goto in xfs_bmap_remap_alloc
Eric Sandeen [Tue, 25 Oct 2016 01:47:14 +0000 (12:47 +1100)] 
xfs: remove pointless error goto in xfs_bmap_remap_alloc

Source kernel commit: fe23759eaf2f6540de20c1623f066aad967ff9c9

The commit:

f65306ea xfs: map an inode's offset to an exact physical block

added a pointless error0: target; remove it.

Addresses-Coverity-Id: 1373865
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
8 years agoxfs: add some 'static' annotations
Eric Biggers [Tue, 25 Oct 2016 01:47:14 +0000 (12:47 +1100)] 
xfs: add some 'static' annotations

Source kernel commit: f1b8243c55ca6fd2a3898e2f586b8cfcfff684bb

sparse reported that several variables and a function were not
forward-declared anywhere and therefore should be 'static'.

Found with sparse by running 'make C=2 CF=-D__CHECK_ENDIAN__ fs/xfs/'

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
8 years agoxfs: remove redundant assignment of ifp
Colin Ian King [Tue, 25 Oct 2016 01:47:14 +0000 (12:47 +1100)] 
xfs: remove redundant assignment of ifp

Source kernel commit: 1d55a4bfd080ff4c6c96acfccfb7cdd2615ed6c2

Remove the redundant ifp = ifp statement; it does nothing. Found with
static analysis by CoverityScan.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
8 years agoxfs: rework refcount cow recovery error handling
Darrick J. Wong [Tue, 25 Oct 2016 01:47:13 +0000 (12:47 +1100)] 
xfs: rework refcount cow recovery error handling

Source kernel commit: 6f97077ff6ef28e0f3b361b6ba9c95a222ef384b

The error handling in xfs_refcount_recover_cow_leftovers is confused
and can potentially leak memory, so rework it to release resources
correctly on error.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reported-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
8 years agoxfs: implement swapext for rmap filesystems
Darrick J. Wong [Tue, 25 Oct 2016 01:47:13 +0000 (12:47 +1100)] 
xfs: implement swapext for rmap filesystems

Source kernel commit: 1f08af52e7c981e9877796a2d90b0e0f08666945

Implement swapext for filesystems that have reverse mapping.  Back in
the reflink patches, we augmented the bmap code with a 'REMAP' flag
that updates only the bmbt and doesn't touch the allocator and
implemented log redo items for those two operations.  Now we can
rewrite extent swapping as a (looong) series of remap operations.

This is far less efficient than the fork swapping method implemented
in the past, so we only switch this on for rmap.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
8 years agoxfs: recognize the reflink feature bit
Darrick J. Wong [Tue, 25 Oct 2016 01:46:51 +0000 (12:46 +1100)] 
xfs: recognize the reflink feature bit

Source kernel commit: e54b5bf9d7aeb92d92c7f5115035e6a851d0f0c5

Add the reflink feature flag to the set of recognized feature flags.
This enables users to write to reflink filesystems.
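
Recognizing a feature mostly means adding its bit to the set of known
read-only-compatible features. A sketch with local flag names whose
values are assumed to mirror the XFS_SB_FEAT_RO_COMPAT_* definitions
(finobt, rmapbt and reflink as bits 0 to 2); can_write is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed ro-compat feature bits of the v5 superblock. */
#define FEAT_RO_COMPAT_FINOBT	(1U << 0)
#define FEAT_RO_COMPAT_RMAPBT	(1U << 1)
#define FEAT_RO_COMPAT_REFLINK	(1U << 2)
#define FEAT_RO_COMPAT_ALL	(FEAT_RO_COMPAT_FINOBT | \
				 FEAT_RO_COMPAT_RMAPBT | \
				 FEAT_RO_COMPAT_REFLINK)

/*
 * Any unknown ro-compat bit means the filesystem must not be written.
 * Adding REFLINK to the known mask is what lets writes proceed.
 */
static bool can_write(uint32_t ro_compat)
{
	return (ro_compat & ~FEAT_RO_COMPAT_ALL) == 0;
}
```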

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
8 years agoxfs: simulate per-AG reservations being critically low
Darrick J. Wong [Tue, 25 Oct 2016 01:46:51 +0000 (12:46 +1100)] 
xfs: simulate per-AG reservations being critically low

Source kernel commit: a35eb41519ab8db90e87d375ee9362d6e080ca4c

Create an error injection point that enables us to simulate being
critically low on per-AG block reservations.  This should enable us to
simulate this specific ENOSPC condition so that we can test falling back
to a regular file copy.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
8 years agoxfs: don't mix reflink and DAX mode for now
Darrick J. Wong [Tue, 25 Oct 2016 01:46:51 +0000 (12:46 +1100)] 
xfs: don't mix reflink and DAX mode for now

Source kernel commit: 4f435ebe7d0422af61cdcddbbcc659888645a1e1

Since we don't have a strategy for handling both DAX and reflink,
for now we'll just prohibit both being set at the same time.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
8 years agoxfs: check for invalid inode reflink flags
Darrick J. Wong [Tue, 25 Oct 2016 01:46:51 +0000 (12:46 +1100)] 
xfs: check for invalid inode reflink flags

Source kernel commit: c8e156ac336d82f67d7adc014404a2251e9dad09

We don't support sharing blocks on the realtime device.  Flag inodes
with the reflink or cowextsize flags set when the reflink feature is
disabled.
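
Taken together with the DAX restriction above, the inode flag rules come
down to three checks. A sketch with hypothetical flag values and a
hypothetical inode_reflink_flags_ok helper; it illustrates the rules,
not the verifier's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch only. */
#define DIFLAG_REALTIME		(1U << 0)	/* di_flags: rt device data */
#define DIFLAG2_DAX		(1U << 0)	/* di_flags2 */
#define DIFLAG2_REFLINK		(1U << 1)
#define DIFLAG2_COWEXTSIZE	(1U << 2)

/* Returns true if the inode's flag combination is acceptable. */
static bool inode_reflink_flags_ok(bool fs_has_reflink, uint16_t flags,
				   uint64_t flags2)
{
	/* reflink/cowextsize flags require the reflink feature */
	if ((flags2 & (DIFLAG2_REFLINK | DIFLAG2_COWEXTSIZE)) &&
	    !fs_has_reflink)
		return false;
	/* no block sharing on the realtime device */
	if ((flags2 & DIFLAG2_REFLINK) && (flags & DIFLAG_REALTIME))
		return false;
	/* no strategy yet for DAX plus reflink */
	if ((flags2 & DIFLAG2_REFLINK) && (flags2 & DIFLAG2_DAX))
		return false;
	return true;
}
```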

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
8 years agoxfs: convert unwritten status of reverse mappings for shared files
Darrick J. Wong [Tue, 25 Oct 2016 01:46:51 +0000 (12:46 +1100)] 
xfs: convert unwritten status of reverse mappings for shared files

Source kernel commit: 3f165b334e51477d2b33ac1c81b39927514daab7

Provide a function to convert an unwritten extent to a real one and
vice versa when shared extents are possible.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
8 years agoxfs: use interval query for rmap alloc operations on shared files
Darrick J. Wong [Tue, 25 Oct 2016 01:46:22 +0000 (12:46 +1100)] 
xfs: use interval query for rmap alloc operations on shared files

Source kernel commit: ceeb9c832eeca5c1c2efc54a38f67283ccb60288

When it's possible for reverse mappings to overlap (data fork extents
of files on reflink filesystems), use the interval query function to
find the left neighbor of an extent we're trying to add; and be
careful to use the lookup functions to update the neighbors and/or
add new extents.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
8 years agoxfs: add shared rmap map/unmap/convert log item types
Darrick J. Wong [Tue, 25 Oct 2016 01:43:48 +0000 (12:43 +1100)] 
xfs: add shared rmap map/unmap/convert log item types

Source kernel commit: 0e07c039bac5f6ce7e3bc512ab9efb4aaa76da94

Wire up some rmap log redo item type codes to map, unmap, or convert
shared data block extents.  The actual log item recovery comes in a
later patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
8 years agoxfs: increase log reservations for reflink
Darrick J. Wong [Tue, 25 Oct 2016 01:43:48 +0000 (12:43 +1100)] 
xfs: increase log reservations for reflink

Source kernel commit: 80de462e090c2c346ca6ec6344b326e81e8cef84

Increase the log reservations to handle the increased rolling that
happens at the end of copy-on-write operations.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
8 years agoxfs: try other AGs to allocate a BMBT block
Darrick J. Wong [Tue, 25 Oct 2016 01:43:48 +0000 (12:43 +1100)] 
xfs: try other AGs to allocate a BMBT block

Source kernel commit: 90e2056d76adc7894a019f5289d259de58065e13

Prior to the introduction of reflink, allocating a block and mapping
it into a file was performed in a single transaction with a single
block reservation, and the allocator was supposed to find enough
blocks to allocate the extent and any BMBT blocks that might be
necessary (unless we're low on space).

However, due to the way copy on write works, allocation and mapping
have been split into two transactions, which means that we must be
able to handle the case where we allocate an extent for CoW but that
AG runs out of free space before the blocks can be mapped into a file,
and the mapping requires a new BMBT block.  When this happens, look in
one of the other AGs for a BMBT block instead of taking the FS down.

The same applies to the functions that convert a data fork to extents
and later btree format.
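
The fallback policy can be illustrated with a toy model, assuming per-AG
free block counts in an array and a preferred AG (the one holding the CoW
extent). alloc_bmbt_block is hypothetical and ignores everything the real
allocator weighs (minleft, alignment, locality); it only shows "try the
preferred AG, then the others, and fail only when all are empty".

```c
#include <assert.h>

/*
 * Allocate one block, starting at pref_ag and wrapping around the other
 * AGs.  Returns the AG that supplied the block, or -1 if every AG is full.
 */
static int alloc_bmbt_block(unsigned int *free, int agcount, int pref_ag)
{
	int	i;

	for (i = 0; i < agcount; i++) {
		int ag = (pref_ag + i) % agcount;

		if (free[ag] > 0) {
			free[ag]--;
			return ag;
		}
	}
	return -1;	/* genuinely out of space */
}
```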

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
8 years agoxfs: preallocate blocks for worst-case btree expansion
Darrick J. Wong [Tue, 25 Oct 2016 01:43:48 +0000 (12:43 +1100)] 
xfs: preallocate blocks for worst-case btree expansion

Source kernel commit: 84d6961910ea7b3ae8d8338f5b4df25dea68cee9

To gracefully handle the situation where a CoW operation turns a
single refcount extent into a lot of tiny ones and then runs out of
space when a tree split has to happen, use the per-AG reserved block
pool to pre-allocate all the space we'll ever need for a maximal
btree.  For a 4K block size, this costs only 0.3% of available disk
space.

When reflink is enabled, we have an unfortunate problem with rmap --
since we can share a block billions of times, this means that the
reverse mapping btree can expand basically infinitely.  When an AG is
so full that there are no free blocks with which to expand the rmapbt,
the filesystem will shut down hard.

This is rather annoying to the user, so use the AG reservation code to
reserve a "reasonable" amount of space for rmap.  We'll prevent
reflinks and CoW operations if we think we're getting close to
exhausting an AG's free space rather than shutting down, but this
permanent reservation should be enough for "most" users.  Hopefully.
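
The size of such a reservation comes from assuming the tree is as sparse
as possible, with every block minimally full. A sketch modeled on the idea
behind xfs_btree_calc_size(), with the limits passed directly and
btree_worst_case_blocks as a hypothetical name; it sums the block counts
of every level.

```c
#include <assert.h>

/*
 * Worst-case blocks needed to index len records with limits[0] records
 * per leaf and limits[1] pointers per node, all blocks minimally full.
 */
static unsigned long long
btree_worst_case_blocks(const unsigned int limits[2], unsigned long long len)
{
	unsigned long long	total = 0;
	unsigned int		perblock = limits[0];

	if (len == 0)
		return 0;
	do {
		len = (len + perblock - 1) / perblock;	/* blocks at this level */
		total += len;
		perblock = limits[1];
	} while (len > 1);
	return total;
}
```

With limits {10, 5} and 1000 records this gives 100 + 20 + 4 + 1 = 125
blocks. The do/while form also counts the single leaf that one record
still needs.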

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[hch@lst.de: ensure that we invalidate the freed btree buffer]
Signed-off-by: Christoph Hellwig <hch@lst.de>
8 years agoxfs: create a separate cow extent size hint for the allocator
Darrick J. Wong [Tue, 25 Oct 2016 01:43:42 +0000 (12:43 +1100)] 
xfs: create a separate cow extent size hint for the allocator

Source kernel commit: f7ca35227253dc8244fd908140b06010e67a31e5

Create a per-inode extent size allocator hint for copy-on-write.  This
hint is separate from the existing extent size hint so that CoW can
take advantage of the fragmentation-reducing properties of extent size
hints without disabling delalloc for regular writes.

The extent size hint that's fed to the allocator during a copy on
write operation is the greater of the cowextsize and regular extsize
hint.

During reflink, if we're sharing the entire source file to the entire
destination file and the destination file doesn't already have a
cowextsize hint, propagate the source file's cowextsize hint to the
destination file.
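
Both rules above reduce to a few lines. A sketch, assuming plain integer
hints in filesystem blocks and a flag saying whether a cowextsize hint is
set; cow_alloc_hint and propagate_hint are hypothetical helpers, and what
the real code does when neither hint is set is not modeled here (the
sketch simply returns 0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hint fed to the allocator for CoW: the larger of the two hints. */
static uint32_t cow_alloc_hint(bool cowextsize_set, uint32_t cowextsize,
			       uint32_t extsize)
{
	uint32_t	a = cowextsize_set ? cowextsize : 0;

	return a > extsize ? a : extsize;
}

/*
 * On a whole-file reflink, copy the source's cowextsize hint to the
 * destination only if the destination has no hint of its own.
 */
static void propagate_hint(bool whole_file, bool src_set, uint32_t src_hint,
			   bool *dst_set, uint32_t *dst_hint)
{
	if (whole_file && src_set && !*dst_set) {
		*dst_set = true;
		*dst_hint = src_hint;
	}
}
```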

Furthermore, zero the bulkstat buffer prior to setting the fields
so that we don't copy kernel memory contents into userspace.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
8 years agoxfs: teach get_bmapx about shared extents and the CoW fork
Darrick J. Wong [Tue, 25 Oct 2016 01:42:12 +0000 (12:42 +1100)] 
xfs: teach get_bmapx about shared extents and the CoW fork

Source kernel commit: f86f403794b1446b68afb3c233d4c0bc0e93b654

Teach xfs_getbmapx how to report shared extents and CoW fork contents
accurately in the bmap output by querying the refcount btree
appropriately.
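
Reporting sharing means asking which part of a mapping has a reference
count of at least two. A sketch over an in-memory sorted array instead of
a btree cursor; find_shared is hypothetical (the kernel has its own
lookup, xfs_refcount_find_shared()), and this version coalesces
contiguous shared records into one segment.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified refcount record; refcount >= 2 means shared. */
struct refc_rec {
	unsigned int	startblock;
	unsigned int	blockcount;
	unsigned int	refcount;
};

/*
 * Find the first shared sub-range of [bno, bno + len), given records
 * sorted by startblock and non-overlapping.  Returns true and sets
 * *fbno / *flen if any shared block was found.
 */
static bool find_shared(const struct refc_rec *recs, int nr,
			unsigned int bno, unsigned int len,
			unsigned int *fbno, unsigned int *flen)
{
	unsigned int	end = bno + len;
	bool		found = false;
	int		i;

	for (i = 0; i < nr; i++) {
		unsigned int rs = recs[i].startblock;
		unsigned int re = rs + recs[i].blockcount;

		if (recs[i].refcount < 2 || re <= bno || rs >= end)
			continue;
		if (rs < bno)
			rs = bno;	/* clip to the queried range */
		if (re > end)
			re = end;
		if (!found) {
			*fbno = rs;
			*flen = re - rs;
			found = true;
		} else if (rs == *fbno + *flen) {
			*flen += re - rs;	/* contiguous: extend */
		} else {
			break;			/* gap: first segment done */
		}
	}
	return found;
}
```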

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
8 years agoxfs: store in-progress CoW allocations in the refcount btree
Darrick J. Wong [Tue, 25 Oct 2016 01:41:50 +0000 (12:41 +1100)] 
xfs: store in-progress CoW allocations in the refcount btree

Source kernel commit: 174edb0e46e520230791a1a894397b7c824cefc4

Due to the way the CoW algorithm in XFS works, there's an interval
during which blocks allocated to handle a CoW can be lost -- if the FS
goes down after the blocks are allocated but before the block
remapping takes place.  This is exacerbated by the cowextsz hint --
allocated reservations can sit around for a while, waiting to get
used.

Since the refcount btree doesn't normally store records with refcount
of 1, we can use it to record these in-progress extents.  In-progress
blocks cannot be shared because they're not user-visible, so there
shouldn't be any conflicts with other programs.  This is a better
solution than holding EFIs during writeback because (a) EFIs can't be
relogged currently, (b) even if they could, EFIs are bound by
available log space, which puts an unnecessary upper bound on how much
CoW we can have in flight, and (c) we already have a mechanism to
track blocks.

At mount time, read the refcount records and free anything we find
with a refcount of 1 because those were in-progress when the FS went
down.
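
The mount-time rule can be sketched as a scan over records, assuming an
in-memory array rather than a btree walk. recover_cow_leftovers borrows
its name from the kernel's xfs_refcount_recover_cow_leftovers mentioned
earlier in this log, but it is a simplified stand-in; free_fn is a
hypothetical callback and may be NULL to just count.

```c
#include <assert.h>
#include <stddef.h>

struct refc_rec {
	unsigned int	startblock;
	unsigned int	blockcount;
	unsigned int	refcount;
};

/*
 * Any record with refcount == 1 is an in-progress CoW extent left over
 * from a crash: pass it to free_fn (if given).  Returns blocks freed.
 */
static unsigned long recover_cow_leftovers(const struct refc_rec *recs, int nr,
					   void (*free_fn)(unsigned int bno,
							   unsigned int len))
{
	unsigned long	freed = 0;
	int		i;

	for (i = 0; i < nr; i++) {
		if (recs[i].refcount != 1)
			continue;	/* genuinely shared: leave it alone */
		if (free_fn != NULL)
			free_fn(recs[i].startblock, recs[i].blockcount);
		freed += recs[i].blockcount;
	}
	return freed;
}
```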

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>