Eric Biggers [Thu, 12 Jan 2017 20:12:40 +0000 (14:12 -0600)]
xfs_io: implement 'set_encpolicy' and 'get_encpolicy' commands
Add set_encpolicy and get_encpolicy commands to xfs_io so that xfstests
will be able to test filesystem encryption using the actual user API,
not just hacked in with a mount option. These commands use the common
"fscrypt" API currently implemented by ext4 and f2fs, but it's also
under development for ubifs and planned for xfs.
Note that to get encrypted files to actually work, it's also necessary
to add a key to the kernel keyring. This patch does not add a command
for this to xfs_io because it's possible to do it using keyctl. keyctl
can also be used to remove keys, revoke keys, invalidate keys, etc.
Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Ralph Sennhauser [Thu, 12 Jan 2017 20:12:40 +0000 (14:12 -0600)]
xfs_io: fix building with musl
The fallback in case the libc doesn't have or doesn't advertise the
existence of d_reclen in struct dirent uses d_namlen. Musl neither
advertises d_reclen nor does it have a d_namlen member.
Calculate the value for d_namlen from d_name in the fallback path.
Signed-off-by: Ralph Sennhauser <ralph.sennhauser@gmail.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
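As a rough illustration of the musl fallback described above, the name length can be derived from d_name itself. The helper name dirent_namlen is hypothetical and is not the macro used in xfs_io; this is only a sketch of the idea.

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: derive the name length from d_name, which is
 * NUL-terminated, instead of relying on a d_namlen member that musl
 * does not provide.
 */
static size_t dirent_namlen(const struct dirent *d)
{
	assert(d != NULL);
	return strlen(d->d_name);
}
```

A record length for the fallback path could then be computed from this value plus the fixed part of the structure.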
Gwendal Grignou [Thu, 12 Jan 2017 20:12:29 +0000 (14:12 -0600)]
build: Allow compiling xfsprogs in a cross compile environment
Without this patch, we are using the same compiler and options for the host
compiler (BUILD_CC) and the target compiler (CC), and we would get error
messages at compilation:
x86_64-pc-linux-gnu-gcc -O2 -O2 -pipe -march=armv7-a -mtune=cortex-a15 ...
x86_64-pc-linux-gnu-gcc.real: error: unrecognized command line option
'-mfpu=neon'
'-mfloat-abi=hard'
'-clang-syntax'
'-mfpu=neon'
'-mfloat-abi=hard'
'-clang-syntax'
Add BUILD_CC and BUILD_CFLAGS as precious variables to allow setting them up
from the ebuild.
Signed-off-by: Gwendal Grignou <gwendal@chromium.org> Reviewed-by: Mike Frysinger <vapier@gentoo.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
->total is a bit of an odd parameter passed down to the low-level
allocator all the way from the high-level callers. It's supposed to
contain the maximum number of blocks to be allocated for the whole
transaction [1].
But in xfs_iomap_write_allocate we only convert existing delayed
allocations and thus only have a minimal block reservation for the
current transaction, so xfs_alloc_space_available can't use it for
the allocation decisions. Use the maximum of args->total and the
calculated block requirement to make a decision. We probably should
get rid of args->total eventually and instead apply ->minleft more
broadly, but that will require some extensive changes all over.
[1] which creates lots of confusion as most callers don't decrement it
once doing a first allocation. But that's for a separate series.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
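The decision described in the commit above might be sketched like this. The struct alloc_args layout, the space_available helper and the way the local requirement is calculated (minlen + minleft) are simplifying assumptions for illustration, not the kernel's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the allocation arguments. */
struct alloc_args {
	uint64_t total;    /* caller's estimate for the whole transaction */
	uint64_t minlen;   /* smallest extent the caller will accept */
	uint64_t minleft;  /* blocks that must stay free afterwards */
};

static uint64_t max_u64(uint64_t a, uint64_t b)
{
	return a > b ? a : b;
}

/*
 * Decide whether 'available' free blocks suffice, using the larger of
 * the caller-supplied total and the locally calculated requirement.
 */
static bool space_available(const struct alloc_args *args, uint64_t available)
{
	uint64_t required = args->minlen + args->minleft;

	return available >= max_u64(args->total, required);
}
```

Taking the maximum means a too-small total (as in the delalloc conversion case) can no longer let an allocation through that the AG cannot actually satisfy.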
We must decide in xfs_alloc_fix_freelist whether an allocation from a
given AG is possible or not based on the available space, and should
not fail the allocation past that point on a healthy file system.
But currently we have two additional places that second-guess
xfs_alloc_fix_freelist: xfs_alloc_ag_vextent tries to adjust the
maxlen parameter to remove the reservation before doing the
allocation (but ignores the various minimum freespace requirements),
and xfs_alloc_fix_minleft tries to fix up the allocated length
after we've found an extent, but ignores the reservations and also
doesn't take the AGFL into account (and thus fails allocations
for not matching minlen in some cases).
Remove all these later fixups and just correct the maxlen argument
inside xfs_alloc_fix_freelist once we have the AGF buffer locked.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
We can't just set minleft to 0 when we're low on space - that's exactly
what we need minleft for: to protect space in the AG for btree block
allocations when we are low on free space.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Setting aside 4 blocks globally for bmbt splits isn't all that useful,
as different threads can allocate space in parallel. Bump it to 4
blocks per AG to allow each thread that is currently doing an
allocation to dip into it separately. Without that we may not have
enough reserved blocks if there are enough parallel transactions
in an almost out of space file system that all run into bmap btree
splits.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
We need to use the actual AG length when making per-AG reservations,
since we could otherwise end up reserving more blocks out of the last
AG than there are actual blocks.
Complained-about-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
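To make the last-AG problem above concrete, here is a small sketch of how the real length of an AG could be derived. The helper ag_length and its parameters are hypothetical names; only the idea (the last AG may be shorter than the nominal AG size) comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: length in blocks of allocation group 'agno'.
 * Every AG is 'agblocks' long except possibly the last, which only
 * covers whatever remains of the 'dblocks' data blocks.
 */
static uint64_t ag_length(uint64_t dblocks, uint32_t agblocks,
			  uint32_t agcount, uint32_t agno)
{
	assert(agcount > 0 && agno < agcount);
	if (agno < agcount - 1)
		return agblocks;
	return dblocks - (uint64_t)agno * agblocks;
}
```

With 1000 data blocks and a nominal AG size of 300, the fourth AG holds only 100 blocks, so a reservation sized from the nominal value would exceed what exists.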
Use NOFS for allocating btree cursors, since they can be called
under the ilock.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
When we create a new attribute, we first create a shortform
attribute, and try to fit the new attribute into it.
If that fails, we copy the (empty) attribute into a leaf attribute,
and do the copy again. Thus there can be a transient state where
we have an empty leaf attribute.
If we encounter this during log replay, the verifier will fail.
So add a test to ignore this part of the leaf attr verification
during log replay.
Thanks as usual to dchinner for spotting the problem.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Nick Piggin reported that the CRC overhead in an fsync heavy
workload was higher than expected on a Power8 machine. Part of this
was to do with the fact that the power8 CRC implementation is not
efficient for CRC lengths of less than 512 bytes, and so the way we
split the CRCs over the CRC field means a lot of the CRCs are
reduced to being less than optimal size.
To optimise this, change the CRC update mechanism to zero the CRC
field first, and then compute the CRC in one pass over the buffer
and write the result back into the buffer. We can do this safely
because anything writing a CRC has exclusive access to the buffer
the CRC is being calculated over.
We leave the CRC verify code the same - it still splits the CRC
calculation - because we do not want read-only operations modifying
the underlying buffer. This is because read-only operations may not
have an exclusive access to the buffer guaranteed, and so temporary
modifications could leak out to other processes accessing the
buffer concurrently.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
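The two CRC paths described above can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, the CRC is a plain bitwise CRC32C rather than an optimised implementation, and the checksum is stored in native byte order for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC32C (Castagnoli) update step; slow but dependency-free. */
static uint32_t crc32c_update(uint32_t crc, const void *data, size_t len)
{
	const uint8_t *p = data;

	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Writer path: zero the CRC field, checksum the buffer in one pass, store. */
static void buf_update_crc(uint8_t *buf, size_t len, size_t crc_off)
{
	uint32_t crc;

	assert(crc_off + sizeof(crc) <= len);
	memset(buf + crc_off, 0, sizeof(crc));
	crc = ~crc32c_update(~0u, buf, len);
	memcpy(buf + crc_off, &crc, sizeof(crc));	/* native byte order */
}

/* Reader path: never modifies buf; feeds zeroes in place of the CRC field. */
static bool buf_verify_crc(const uint8_t *buf, size_t len, size_t crc_off)
{
	static const uint8_t zero[sizeof(uint32_t)];
	uint32_t stored, crc = ~0u;

	assert(crc_off + sizeof(stored) <= len);
	memcpy(&stored, buf + crc_off, sizeof(stored));
	crc = crc32c_update(crc, buf, crc_off);
	crc = crc32c_update(crc, zero, sizeof(zero));
	crc = crc32c_update(crc, buf + crc_off + sizeof(stored),
			    len - crc_off - sizeof(stored));
	return stored == ~crc;
}
```

The writer makes a single long pass, which suits implementations that are inefficient on short lengths, while the reader keeps the split calculation so that it never touches the shared buffer.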
Embedding a switch statement in every btree stats inc/add adds a lot
of code overhead to the core btree infrastructure paths. Stats are
supposed to be small and lightweight, but the btree stats have
become big and bloated as we've added more btrees. It needs fixing
because the reflink code will just add more overhead again.
Convert the v2 btree stats to arrays instead of independent
variables, and instead use the type to index the specific btree
array via an enum. This allows us to use array based indexing
to update the stats, rather than having to dereference variables
specific to the btree type.
If we then wrap the xfsstats structure in a union and place uint32_t
array beside it, and calculate the correct btree stats array base
array index when creating a btree cursor, we can easily access
entries in the stats structure without having to switch names based
on the btree type.
We then replace the switch statement with a simple set of stats
wrapper macros, resulting in a significant simplification of the
btree stats code, and:
   text    data     bss     dec     hex filename
  48905     144       8   49057    bfa1 fs/xfs/libxfs/xfs_btree.o.old
  36793     144       8   36945    9051 fs/xfs/libxfs/xfs_btree.o
it reduces the core btree infrastructure code size by close to 25%!
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
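The union-and-array scheme described above might look roughly like this. All type, field and macro names here are hypothetical and the layout is heavily cut down; it is a sketch of the indexing technique, not the real xfsstats structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stats layout; the real structure is much larger. */
enum btree_type { BT_ALLOC, BT_BMAP, BT_INODE, BT_NTYPES };
enum btree_stat { ST_LOOKUP, ST_INSERT, ST_DELETE, ST_NSTATS };

struct stats {
	uint32_t xs_other;				/* some unrelated counter */
	uint32_t xs_btree[BT_NTYPES][ST_NSTATS];	/* one row per btree type */
};

/* Overlay a flat uint32_t array so counters can be reached by index. */
union stats_u {
	struct stats s;
	uint32_t     a[sizeof(struct stats) / sizeof(uint32_t)];
};

/* Computed once, when the cursor is created, from the btree type. */
static size_t stats_base(enum btree_type type)
{
	return offsetof(struct stats, xs_btree) / sizeof(uint32_t) +
	       (size_t)type * ST_NSTATS;
}

/* No switch on the btree type: just index from the cursor's base offset. */
#define BTREE_STATS_INC(st, base, stat)	((st)->a[(base) + (stat)]++)
```

Because the base offset is stored with the cursor, each stats update becomes a single indexed increment instead of a switch over every btree type.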
The on-disk field di_size is used to set i_size, which is a signed
integer of type loff_t. If the high bit of di_size is set, we'll end up with
a negative i_size, which will cause all sorts of problems. Since the
VFS won't let us create a file with such length, we should catch them
here in the verifier too.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
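A minimal sketch of the check described above, assuming a 64-bit on-disk size that has already been converted to host byte order. The helper name di_size_ok is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical verifier fragment: reject an on-disk size whose top bit
 * is set, since it would become negative once stored in a signed
 * 64-bit file offset.
 */
static bool di_size_ok(uint64_t di_size)
{
	return (di_size & (UINT64_C(1) << 63)) == 0;
}
```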
We shouldn't assert if somehow we end up trying to add an attr fork to
an inode that apparently already has attr extents because this is an
indication of on-disk corruption. Instead, return an error code to
userspace.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
In xfs_dir3_data_read, we can encounter the situation where err == 0 and
*bpp == NULL if the given bno offset happens to be a hole; this leads to
a crash if we try to set the buffer type after the _da_read_buf call.
Holes can happen due to corrupt or malicious entries in the bmbt data,
so be a little more careful when we're handling buffers.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
When reading into memory all extents of a btree-format inode fork,
complain if the number of extents we find is not the same as the number
of extents reported in the inode core. This is needed to stop an IO
action from accessing the garbage areas of the in-core fork.
[dchinner: removed redundant assert]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
When we're reading a btree block, make sure that what we retrieved
matches the owner and level, and has a plausible number of records.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
There is no such thing as a zero-level AG btree since even a single-node
zero-records btree has one level. Btree cursor constructors read
cur_nlevels straight from disk and then access things like
cur_bufs[cur_nlevels - 1] which is /really/ bad if cur_nlevels is zero!
Therefore, strengthen the verifiers to prevent this possibility.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
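The strengthened check described above could be sketched as a simple range test. The helper name and the upper bound of 8 are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BTREE_MAXLEVELS 8	/* assumed bound, for illustration only */

/* Hypothetical verifier check: a btree always has at least one level. */
static bool btree_nlevels_ok(uint32_t nlevels)
{
	return nlevels >= 1 && nlevels <= BTREE_MAXLEVELS;
}
```

Rejecting zero here is what keeps later code from indexing an array with cur_nlevels - 1 when cur_nlevels came from a corrupt header.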
There are a handful of xattr functions which now return
nothing but zero. They can be made void, chased through calling
functions, and error handling etc can be removed.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
By inspection, xfs_bmap_trace_exlist isn't handling cow forks,
and will trace the data fork instead.
Fix this by setting state appropriately if whichfork
== XFS_COW_FORK.
()___()
< @ @ >
| |
{o_o}
(|)
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
When xfs_bmap_trace_exlist called trace_xfs_extlist,
it sent in the "whichfork" var instead of the bmap "state"
as expected (even though state was already set up for this
purpose).
As a result, the xfs_bmap_class in tracing code used
"whichfork" not state in xfs_iext_state_to_fork(), and got
the wrong ifork pointer. It all goes downhill from
there, including an ASSERT when ifp_bytes is empty
by the time it reaches xfs_iext_get_ext():
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
We've missed properly setting the buffer type for
an AGI transaction in 3 spots now, so just move it
into xfs_read_agi() and set it if we are in a transaction
to avoid the problem in the future.
This is similar to how it is done in e.g. the dir3
and attr3 read functions.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Speculative preallocation is currently processed entirely by the callers
of xfs_bmapi_reserve_delalloc(). The caller determines how much
preallocation to include, adjusts the extent length and passes down the
resulting request.
While this works fine for post-eof speculative preallocation, it is not
as reliable for COW fork preallocation. COW fork preallocation is
implemented via the cowextszhint, which aligns the start offset as well
as the length of the extent. Further, it is difficult for the caller to
accurately identify when preallocation occurs because the returned
extent could have been merged with neighboring extents in the fork.
To simplify this situation and facilitate further COW fork preallocation
enhancements, update xfs_bmapi_reserve_delalloc() to take a separate
preallocation parameter to incorporate into the allocation request. The
preallocation blocks value is tacked onto the end of the request and
adjusted to accommodate neighboring extents and extent size limits.
Since xfs_bmapi_reserve_delalloc() now knows precisely how much
preallocation was included in the allocation, it can also tag the inodes
appropriately to support preallocation reclaim.
Note that xfs_bmapi_reserve_delalloc() callers are not yet updated to
use the preallocation mechanism. This patch should not change behavior
outside of correctly tagging reflink inodes when start offset
preallocation occurs (which the caller does not handle correctly).
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Declare the structure xfs_nameops as const as it is only stored in the
m_dirnameops field of a xfs_mount structure. This field is of type
const struct xfs_nameops *, so xfs_nameops structures having this
property can be declared as const.
Done using Coccinelle:
@r1 disable optional_qualifier @
identifier i;
position p;
@@
static struct xfs_nameops i@p = {...};
@ok1@
identifier r1.i;
position p;
struct xfs_mount mp;
@@
mp.m_dirnameops=&i@p
@bad@
position p!={r1.p,ok1.p};
identifier r1.i;
@@
i@p
When we're estimating the amount of space it's going to take to satisfy
a delalloc reservation, we need to include the space that we might need
to grow the rmapbt. This helps us to avoid running out of space later
when _iomap_write_allocate needs more space than we reserved. Eryu Guan
observed this happening on generic/224 when sunit/swidth were set.
Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
We only ever set a field to this constant for an impossible to reach
error case in xfs_bmap_search_extents. That function has been removed,
so we can remove the constant as well.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
We can easily lookup the previous extent for the cases where we need it,
which saves the callers from looking it up for us later in the series.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Rewrite the function using xfs_iext_lookup_extent and xfs_iext_get_extent,
and massage the flow into something easily understandable.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
xfs_iext_lookup_extent looks up a single extent at the passed in offset.
It returns the extent covering that offset, or the next extent after it
in case of a hole, together with the index of the returned extent, via
arguments. The return value is a simple bool that is set to false if no
extent could be found because the offset is beyond EOF. It is a simpler
replacement for xfs_bmap_search_extent that leaves looking up the rarely
needed previous extent to the caller and has a nicer calling convention.
xfs_iext_get_extent is a helper for iterating over the extent list.
It takes an extent index as input and, if an extent exists at that
index, returns it in its expanded form in an argument. The actual return
value is a bool indicating whether the index is valid or not.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
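The calling convention of the two helpers described above can be modelled with a small sketch. The struct definitions and the linear search are simplifying assumptions; the function names are shortened on purpose to make clear that these are not the kernel's implementations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified in-core extent and fork. */
struct extent { uint64_t startoff; uint64_t blockcount; };
struct fork   { const struct extent *ext; size_t count; /* sorted by startoff */ };

/* Fetch the extent at index idx; false if idx is out of range. */
static bool iext_get_extent(const struct fork *f, size_t idx, struct extent *out)
{
	if (idx >= f->count)
		return false;
	*out = f->ext[idx];
	return true;
}

/*
 * Find the extent covering 'off', or the next one after it if 'off' is
 * in a hole. Returns false when 'off' lies beyond the last extent.
 */
static bool iext_lookup_extent(const struct fork *f, uint64_t off,
			       size_t *idx, struct extent *out)
{
	for (size_t i = 0; i < f->count; i++) {
		if (off < f->ext[i].startoff + f->ext[i].blockcount) {
			*idx = i;
			return iext_get_extent(f, i, out);
		}
	}
	*idx = f->count;
	return false;
}
```

A caller that needs the previous extent can fetch it itself with iext_get_extent(f, idx - 1, ...) when idx is greater than zero, which is the work the new interface leaves to the caller.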
[dchinner: cleaned up XFS_MIN_CRC_BLOCKSIZE check]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
is all over the xfs code; provide a new helper
xfs_iext_count(ifp) to count the number of inline extents
in an inode fork.
[dchinner: pick up several missed conversions]
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Check the return value of xfs_trans_reserve_quota_nblks for errors.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Refactor the implementations of xfs_dir2_data_freescan into a
routine that takes the raw directory block parameters and
a second function that figures out the raw parameters from the
directory inode. This enables us to use the exact same code
for both userspace and the kernel, since repair knows exactly
which directory block geometry parameters it needs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
The userspace version of _dinode_verify takes a raw inode number
instead of an inode itself. Since neither version actually needs
the inode, port the changes to the kernel. This will also reduce
the libxfs diff noise.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
It's entirely possible for userspace to ask for an xattr which
does not exist.
Normally, there is no problem whatsoever when we ask for such
a thing, but when we look at an obfuscated metadump image
on a debug kernel with selinux, we trip over this ASSERT in
xfs_da3_path_shift():
*result = -ENOENT; /* we're out of our tree */
ASSERT(args->op_flags & XFS_DA_OP_OKNOENT);
It (more or less) only shows up in the above scenario, because
xfs_metadump obfuscates attr names, but chooses names which
keep the same hash value - and xfs_da3_node_lookup_int does:
IOWS, we only get down to the xfs_da3_path_shift() ASSERT
if we are looking for an xattr which doesn't exist, but we
find xattrs on disk which have the same hash, and so might be
a hash collision, so we try the path shift. When *that*
fails to find what we're looking for, we hit the assert about
XFS_DA_OP_OKNOENT.
Simply setting XFS_DA_OP_OKNOENT in xfs_attr_get solves this
rather corner-case problem with no ill side effects. It's
fine for an attr name lookup to fail.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
The btree cursor cleanup function takes an error parameter that
affects how buffers are released from the cursor. All buffers are
released in the event of error. Several callers do not specify the
XFS_BTREE_ERROR flag in the event of error, however. This can cause
buffers to hang around locked or with an elevated hold count and
thus lead to umount hangs in the event of errors.
Fix up the xfs_btree_del_cursor() callers to pass XFS_BTREE_ERROR if
the cursor is being torn down due to error.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
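Why the error flag matters at teardown can be shown with a small model. The cursor and buffer types, the flag values and the function names below are illustrative stand-ins, not the kernel definitions; the sketch assumes that an error can leave empty slots below levels that still hold buffers.

```c
#include <assert.h>
#include <stddef.h>

#define BTREE_NOERROR 0	/* illustrative stand-ins for the real flags */
#define BTREE_ERROR   1
#define MAXLEVELS     4

/* Hypothetical cursor: one (possibly unset) buffer slot per level. */
struct buf    { int held; };
struct cursor { struct buf *bufs[MAXLEVELS]; int nlevels; };

static void buf_release(struct buf *bp)
{
	bp->held = 0;
}

/*
 * On a clean teardown, stop at the first empty slot. On error, scan
 * every level, because a failed operation may have left gaps below
 * buffers that are still held.
 */
static int del_cursor(struct cursor *cur, int error)
{
	int released = 0;

	for (int i = 0; i < cur->nlevels; i++) {
		if (cur->bufs[i]) {
			buf_release(cur->bufs[i]);
			cur->bufs[i] = NULL;
			released++;
		} else if (!error) {
			break;
		}
	}
	return released;
}

/* Caller pattern: pass the error flag whenever the operation failed. */
static int teardown(struct cursor *cur, int err)
{
	return del_cursor(cur, err ? BTREE_ERROR : BTREE_NOERROR);
}
```

If a failing caller passes the no-error flag, the scan stops early and the buffers at higher levels stay held, which matches the hang described above.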
Darrick J. Wong [Tue, 10 Jan 2017 02:16:33 +0000 (20:16 -0600)]
libxfs: fix line lengths
Fix some 80-char line length issues.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reported-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Tue, 10 Jan 2017 02:16:33 +0000 (20:16 -0600)]
libxfs: remove useless stuff from the kernel
Evidently the libxfs-apply script sucked in some fs/xfs/ content from
the kernel patches and an extra redefinition of _bmap_search_extents.
We don't need this, so get rid of it.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Tue, 10 Jan 2017 02:16:33 +0000 (20:16 -0600)]
libxfs: refactor btree crc verifier
In a65d8d293b ("libxfs: validate metadata LSNs against log on v5
superblocks") the hascrc check was modified to use the helper mp
variable in the kernel. This was left out of the xfsprogs patch, so
change it here too.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Tue, 10 Jan 2017 02:16:33 +0000 (20:16 -0600)]
libxfs: remove unnecessary hascrc test in btree verifiers
xfs_btree_sblock_v5hdr_verify already checks _hascrc, so we can
remove it from the verifier functions. For whatever reason this
change made it into the kernel but not xfsprogs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Tue, 10 Jan 2017 02:16:33 +0000 (20:16 -0600)]
libxfs-apply: port to stgit
Teach libxfs-apply how to talk to a stgit repository
and fix a minor typo in the guilt hunk of apply_patch.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Tue, 10 Jan 2017 02:16:33 +0000 (20:16 -0600)]
tools: create libxfs-diff to compare libxfses
Create a script to compare every file in libxfs to the same files
in another libxfs. This is useful for comparing upstream kernel
and user progs to look for unported changes.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Biggers [Thu, 22 Dec 2016 15:31:29 +0000 (09:31 -0600)]
Fix building xfsprogs on 32-bit platforms
xfslibs now requires that its users enable transparent largefile
support. This broke building xfsprogs on 32-bit Linux (with glibc)
because _FILE_OFFSET_BITS=64 was not getting defined. Although the
autoconf macro AC_SYS_LARGEFILE was intended to define it, this didn't
work because AC_SYS_LARGEFILE will only define _FILE_OFFSET_BITS in a
config header, which doesn't work for xfsprogs because not all .c files
include platform_defs.h as their first include. Also,
platform_defs.h.in is not generated by autoheader and didn't contain a
template for _FILE_OFFSET_BITS.
Therefore, to fix the problem remove the useless autoconf macros and
instead add -D_FILE_OFFSET_BITS=64 to CFLAGS in builddefs.in. Use
CFLAGS rather than PCFLAGS because this definition could be needed by
platforms other than "linux", and it doesn't hurt to always define it.
Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Janda <felix.janda@posteo.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Sandeen [Thu, 22 Dec 2016 05:10:54 +0000 (23:10 -0600)]
xfs_quota: Fix test for wrapped id from GETNEXTQUOTA
dump_file and report_mount can be called with null *oid if
we aren't asking for the GETNEXTQUOTA interface, so we
should only test for the GETNEXTQUOTA wrap if *oid is
non-null. Otherwise we'll deref a null pointer in the
test.
This only happens for certain invocations of reporting,
which apparently are not covered by any regression tests
at this point, at least on new kernels which contain
GETNEXTQUOTA.
Addresses-Coverity-ID: 1397415
Addresses-Coverity-ID: 1397416 Brown-paper-bag-worn-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Sandeen [Wed, 21 Dec 2016 05:21:19 +0000 (23:21 -0600)]
xfs_quota: handle wrapped id from GETNEXTQUOTA
The GETNEXTQUOTA interface in the kernel had a bug
(at least in xfs) where if we pass in UINT_MAX as the
ID, it incremented, wrapped, and returned 0 for the next
id. This would cause userspace to start querying
again at zero, and an xfs_quota "report" command would
loop forever. This occurred if a quota ID near
UINT_MAX existed, and later offsets within the block
wrapped the xfs_dqid_t.
This will also be fixed in the kernel, but we should also
catch this in userspace, and stop the loop if it happens.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
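The userspace guard described above can be illustrated with a toy model. The ID table and get_next_buggy are hypothetical stand-ins for the kernel interface, modelling only the wrap: when no ID at or above the requested one exists, the lowest ID is returned instead of a failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical set of existing quota IDs, sorted ascending. */
static const uint32_t ids[] = { 0, 1000, 4294967000u };

/*
 * Model of the buggy call: return the first ID >= id, but wrap around
 * to the lowest ID when nothing at or above 'id' exists.
 */
static uint32_t get_next_buggy(uint32_t id)
{
	for (size_t i = 0; i < sizeof(ids) / sizeof(ids[0]); i++) {
		if (ids[i] >= id)
			return ids[i];
	}
	return ids[0];
}

/* Walk all IDs; stop if the returned ID is lower than the one requested. */
static unsigned report_all(void)
{
	uint32_t id = 0;
	unsigned n = 0;

	for (;;) {
		uint32_t found = get_next_buggy(id);

		if (found < id)		/* wrapped: we are back at the start */
			break;
		n++;
		id = found + 1;
	}
	return n;
}
```

Without the found < id test, this loop would restart at ID 0 and never terminate, which is the hang the commit guards against.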
Eric Sandeen [Wed, 21 Dec 2016 04:38:06 +0000 (22:38 -0600)]
xfs_repair: don't indicate dirtiness if FSGEOMETRY fails
Today, pointing repair at an image hosted on a non-xfs
filesystem will result in a XFS_IOC_FSGEOMETRY_V1 failure,
but repair generally proceeds without further problems.
However, calling do_warn() sets fs_is_dirty to 1, so
xfs_repair -n exits with non-zero status, indicating
corruption. This is incorrect.
Change the message to use do_log so that it does not
incorrectly indicate corruption.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
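The difference between the two message paths described above can be modelled in a few lines. Only the names do_warn, do_log and fs_is_dirty come from the commit message; the bodies and the no_modify_exit_status helper are simplified assumptions, not xfs_repair's code.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Simplified model; only the names come from the commit message. */
static int fs_is_dirty;

/* Informational message: leaves the dirty flag alone. */
static void do_log(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
}

/* Warning: also marks the filesystem as needing repair. */
static void do_warn(const char *fmt, ...)
{
	va_list ap;

	fs_is_dirty = 1;
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
}

/* Hypothetical: in no-modify mode the exit status mirrors the flag. */
static int no_modify_exit_status(void)
{
	return fs_is_dirty ? 1 : 0;
}
```

Routing a harmless condition through do_log keeps the flag clear, so a no-modify run on a clean image still exits with status zero.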
Eric Sandeen [Wed, 21 Dec 2016 04:38:01 +0000 (22:38 -0600)]
xfs_repair: junk leaf attribute if count == 0
We have recently seen a case where, during log replay, the
attr3 leaf verifier reported corruption when encountering a
leaf attribute with a count of 0 in the header.
We chalked this up to a transient state when a shortform leaf
was created, the attribute didn't fit, and we promoted the
(empty) attribute to the larger leaf form.
I've recently been given a metadump of unknown provenance which actually
contains a leaf attribute with count 0 on disk. This causes the
verifier to fire every time xfs_repair is run:
Metadata corruption detected at xfs_attr3_leaf block 0x480988/0x1000
If this 0-count state is detected, we should just junk the leaf, same
as we would do if the count was too high. With this change, we now
remedy the problem:
Metadata corruption detected at xfs_attr3_leaf block 0x480988/0x1000
bad attribute count 0 in attr block 0, inode 12587828
problem with attribute contents in inode 12587828
clearing inode 12587828 attributes
correcting nblocks for inode 12587828, was 2 - counted 1
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 21 Dec 2016 04:35:47 +0000 (22:35 -0600)]
xfs_repair: change null check to assertion
It /should/ be the case that we never run out of records
before we run out of btree blocks, so change the null check
(that was only to appease Coverity) to an assert.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 21 Dec 2016 04:29:01 +0000 (22:29 -0600)]
xfs_repair: fix some potential null pointer dereferences
Fix some potential NULL pointer dereferences that Coverity pointed out,
and remove a trivial dead integer check.
Coverity-id: 1375789, 1375790, 1375791, 1375792 Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 21 Dec 2016 04:28:01 +0000 (22:28 -0600)]
xfs_repair: fix bogus rmapbt record owner check
Make the reverse mapping owner check actually validate inode numbers.
Coverity-id: 1371628 Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Felix Janda [Tue, 1 Nov 2016 01:38:42 +0000 (12:38 +1100)]
xfs.h: require transparent LFS for all users
Since our interfaces depend on the consistent use of a 64bit offset
type, force downstreams to use transparent LFS (_FILE_OFFSET_BITS=64),
so that it becomes impossible for them to use 32bit interfaces.
Signed-off-by: Felix Janda <felix.janda@posteo.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Felix Janda [Mon, 31 Oct 2016 23:39:40 +0000 (10:39 +1100)]
configure: use AC_SYS_LARGEFILE
The autoconf macro AC_SYS_LARGEFILE defines _FILE_OFFSET_BITS=64
where necessary to ensure that off_t and all interfaces using off_t
are 64bit, even on 32bit systems.
Signed-off-by: Felix Janda <felix.janda@posteo.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Like "open -m mode", the initial -m option requires a mode argument.
Document these options correctly as well.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Amir Goldstein [Mon, 31 Oct 2016 23:38:19 +0000 (10:38 +1100)]
xfs_io: add command line option -i to start an idle thread
xfs_io -i will start by spawning an idle thread.
The purpose of this idle thread is to test I/O from a multi-threaded
process. With a single-threaded process, the file table is not shared
and file structs are not reference counted. Spawning an idle thread
can help detect file struct reference leaks.
Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Grozdan [Mon, 31 Oct 2016 23:38:09 +0000 (10:38 +1100)]
xfsprogs: Update FSF address in COPYING file
The FSF address in doc/COPYING needs an update. This was caught and
reported by the openSUSE build service while building the xfsprogs
package. The new address is taken directly from FSF's license files
put on their site.
Signed-off-by: Grozdan Nikolov <neutrino8@gmail.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Tue, 25 Oct 2016 22:14:36 +0000 (15:14 -0700)]
xfs_repair: use thread pools to sort rmap data
Since each slab is a collection of independent mini-slabs, we can
fire up a bunch of threads to sort the mini-slabs in parallel.
This speeds up the sorting phase of the rmapbt rebuilding if we
have a large number of mini slabs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
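The parallel mini-slab sort described in the commit above can be pictured
with a small Python sketch. It is only an illustration of the structure
(one sort task per independent mini-slab), not the xfs_repair code: the
name sort_mini_slabs, the nr_threads parameter, and the use of plain lists
as mini-slabs are all assumptions made for the example. In CPython the
threads will not give the speedup the C thread pool does.

```python
from concurrent.futures import ThreadPoolExecutor


def sort_mini_slabs(mini_slabs, nr_threads=4):
    """Sort every mini-slab in place, one pool task per mini-slab.

    Hypothetical helper: each mini-slab is modelled as a plain list.
    The mini-slabs are independent, so the tasks share no state.
    """
    with ThreadPoolExecutor(max_workers=nr_threads) as pool:
        # list() waits for every task and re-raises any worker exception.
        list(pool.map(list.sort, mini_slabs))
    return mini_slabs
```

Because each task touches only its own mini-slab, no locking is needed
between workers.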
Darrick J. Wong [Tue, 25 Oct 2016 22:14:35 +0000 (15:14 -0700)]
xfs_repair: use range query when checking rmaps
For shared extents, we ought to use a range query on the rmapbt to
find the corresponding rmap. However, most of the time the observed
rmap will be an exact match for the rmapbt rmap, in which case we
could have used the (much faster) regular lookup. Therefore, try the
regular lookup first and resort to the range lookup if that doesn't
get us what we want. This can cut the run time of the rmap check of
xfs_repair in half.
Theoretically, the only reason why an observed rmap wouldn't be an
exact match for an rmapbt rmap is because we modified some file on
account of a metadata error.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
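The "cheap exact lookup first, range query as fallback" idea from the
commit above might look like the following Python sketch. Everything here
is an assumption made for illustration: the Rmap tuple, the RmapIndex
class, and the dictionary and linear scan that stand in for the rmapbt
lookup and range query. It is not how xfs_repair stores or searches
reverse mappings.

```python
from collections import namedtuple

# Hypothetical, simplified reverse-mapping record.
Rmap = namedtuple("Rmap", "startblock blockcount owner offset")


class RmapIndex:
    """Toy stand-in for the rmapbt."""

    def __init__(self, records):
        self.records = list(records)
        # Fast path: exact-match lookup by key.
        self.by_key = {(r.startblock, r.owner, r.offset): r
                       for r in self.records}

    def lookup_exact(self, want):
        rec = self.by_key.get((want.startblock, want.owner, want.offset))
        return rec if rec == want else None

    def lookup_range(self, want):
        # Slow path: find a record of the same owner that covers `want`.
        want_end = want.startblock + want.blockcount
        for rec in self.records:
            rec_end = rec.startblock + rec.blockcount
            same_file_pos = (want.offset - rec.offset
                             == want.startblock - rec.startblock)
            if (rec.owner == want.owner and same_file_pos
                    and rec.startblock <= want.startblock
                    and want_end <= rec_end):
                return rec
        return None


def find_rmap(index, want):
    """Try the exact lookup first; fall back to the range query."""
    rec = index.lookup_exact(want)
    if rec is None:
        rec = index.lookup_range(want)
    return rec
```

The fallback only runs when the observed rmap is not an exact match, which
the commit message expects to be the uncommon case.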
Darrick J. Wong [Tue, 25 Oct 2016 22:14:35 +0000 (15:14 -0700)]
xfs_repair: complain about copy-on-write leftovers
Complain about leftover CoW allocations that are hanging off the
refcount btree. These are cleaned out at mount time, but we could be
louder about flagging down evidence of trouble.
Since these extents aren't "owned" by anything, we'll free them up by
reconstructing the free space btrees.
v2: When we're processing rmap records, we inadvertently forgot to
handle the CoW owner, so the leftover CoW staging blocks got marked as
file data. These blocks will just get freed later, so mark them
"CoW". When we process the refcountbt, complain about leftovers if
the type is unknown or "CoW".
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Tue, 25 Oct 2016 22:14:34 +0000 (15:14 -0700)]
xfs_repair: record reflink inode state
Record the state of the per-inode reflink flag, so that we can
compare against the rmap data and update the flags accordingly.
Clear the (reflink) state if we clear the inode.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Darrick J. Wong [Tue, 25 Oct 2016 22:14:34 +0000 (15:14 -0700)]
xfs_repair: process reverse-mapping data into refcount data
Take all the reverse-mapping data we've acquired and use it to generate
reference count data. This data is used in phase 5 to rebuild the
refcount btree.
v2: Update to reflect separation of rmap_irec flags.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
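One way to picture turning reverse-mapping data into reference count data,
as the commit above describes, is a sweep over extent start and end
points. The Python sketch below is a generic sweep written for
illustration and is not claimed to be the algorithm xfs_repair uses; the
function name and the (startblock, blockcount) tuple format are
assumptions. It reports only ranges mapped by two or more owners.

```python
def refcounts_from_rmaps(rmaps):
    """Return (startblock, blockcount, refcount) for shared block ranges.

    rmaps: iterable of hypothetical (startblock, blockcount) mappings,
    one per owner of that range.
    """
    events = []
    for start, length in rmaps:
        events.append((start, 1))            # a mapping begins here
        events.append((start + length, -1))  # a mapping ends here
    events.sort()

    out = []
    depth = 0      # number of mappings covering the current position
    prev = None    # position of the previous event
    for pos, delta in events:
        if prev is not None and pos > prev and depth >= 2:
            last = out[-1] if out else None
            if last and last[0] + last[1] == prev and last[2] == depth:
                # Extend an adjacent extent with the same refcount.
                out[-1] = (last[0], pos - last[0], depth)
            else:
                out.append((prev, pos - prev, depth))
        depth += delta
        prev = pos
    return out
```

For example, mappings (0, 10), (5, 10) and (8, 4) share blocks 5-7 twice,
blocks 8-9 three times, and blocks 10-11 twice.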
Darrick J. Wong [Tue, 25 Oct 2016 22:14:34 +0000 (15:14 -0700)]
xfs_repair: fix get_agino_buf to avoid corrupting inodes
The inode buffering code tries to read inodes in units of chunks,
which are the larger of 8K or 1 FSB. Each chunk gets its own xfs_buf,
which means that get_agino_buf must calculate the disk address of the
chunk and feed that to libxfs_readbuf in order to find the inode data
correctly. The current code simply grabs the chunk for the start
inode and indexes from that, which corrupts memory because the start
inode and the target inode could be in different inode chunks. That
causes the assert in rmap.c to blow up when we clear the reflink flag.
(Also fix some minor errors in the debugging printfs.)
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
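The indexing problem described in the commit above can be shown with
simple arithmetic. The Python sketch below is a simplified model, not the
get_agino_buf code: the function name, the 512-byte inode size, and the
use of inode numbers instead of disk addresses are assumptions. It picks
the chunk that actually contains the target inode rather than always using
the chunk of the start inode.

```python
def locate_inode(start_agino, agino, inode_size=512, block_size=4096):
    """Return (first inode of the containing chunk, slot in that chunk).

    Hypothetical helper. A chunk is the larger of 8K or one filesystem
    block, and each chunk is assumed to have its own buffer.
    """
    chunk_bytes = max(8192, block_size)
    inodes_per_chunk = chunk_bytes // inode_size
    # Which chunk, counted from the start inode, holds the target?
    chunk_index = (agino - start_agino) // inodes_per_chunk
    chunk_first = start_agino + chunk_index * inodes_per_chunk
    return chunk_first, agino - chunk_first
```

With 4K blocks and 512-byte inodes a chunk holds 16 inodes, so a target 20
inodes past the start inode lives in the second chunk at slot 4. Indexing
slot 20 in the first chunk's buffer would run past its end, which is the
kind of memory corruption the fix avoids.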