Darrick J. Wong [Fri, 26 Aug 2016 01:17:46 +0000 (11:17 +1000)]
libxcmd: fix mount option parsing to find rt/log devices
It turns out that glibc's hasmntopt implementation returns NULL
if the opt parameter ends with an equals sign ('='). Therefore, we
cannot directly search for the option 'rtdev='; we must instead
have hasmntopt look for 'rtdev' and look for the trailing equals
sign ourselves. This fixes xfs_info's reporting of external
log and realtime device paths, and xfs_scrub will need it for
data block scrubbing of realtime extents.
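
A minimal sketch of the lookup described above, assuming glibc's hasmntopt() from <mntent.h>. The helper name mntopt_value is hypothetical and is not the function libxcmd actually uses; it only illustrates asking hasmntopt() for the bare option name and then checking for the trailing '=' by hand.

```c
#include <assert.h>
#include <mntent.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libxcmd): return a pointer to the text
 * that follows "name=" in the mount options, or NULL if the option is
 * absent or carries no value.
 */
static const char *
mntopt_value(const struct mntent *mnt, const char *name)
{
	/* Search for the bare name; a trailing '=' in 'name' would not match. */
	const char *p = hasmntopt(mnt, name);

	if (!p)
		return NULL;

	/* Look for the equals sign ourselves. */
	p += strlen(name);
	if (*p != '=')
		return NULL;
	return p + 1;
}
```

The returned pointer refers to the inside of mnt_opts, so the value runs up to the next comma or the end of the string; a caller that needs a standalone path has to copy it out.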
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Fri, 26 Aug 2016 01:17:21 +0000 (11:17 +1000)]
xfs: simple btree query range should look right if LE lookup fails
If the initial LOOKUP_LE in the simple query range fails to find
anything, we should attempt to increment the btree cursor to see
if there actually /are/ records for what we're trying to find.
Without this patch, a bnobt range query of (0, $agsize) returns
no results because the leftmost record never has a startblock
of zero.
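
The idea can be sketched on a sorted array instead of a real btree cursor. This is an illustration under stated assumptions, not the libxfs code: struct ext_rec, lookup_le and query_range are hypothetical names, records are sorted by start block, do not overlap, and have a length of at least one.

```c
#include <assert.h>

/* Hypothetical record: an extent starting at 'start', 'len' blocks long. */
struct ext_rec {
	unsigned int start;
	unsigned int len;
};

/* LE lookup: index of the last record with start <= key, or -1 if none. */
static int
lookup_le(const struct ext_rec *recs, int nr, unsigned int key)
{
	int i, found = -1;

	for (i = 0; i < nr && recs[i].start <= key; i++)
		found = i;
	return found;
}

/* Count the records that overlap the inclusive block range [lo, hi]. */
static int
query_range(const struct ext_rec *recs, int nr,
	    unsigned int lo, unsigned int hi)
{
	int i = lookup_le(recs, nr, lo);
	int hits = 0;

	/* The LE lookup found nothing: look right, at the leftmost record. */
	if (i < 0)
		i = 0;

	for (; i < nr && recs[i].start <= hi; i++) {
		/* Skip a record that ends before the range begins. */
		if (recs[i].start + recs[i].len - 1 < lo)
			continue;
		hits++;
	}
	return hits;
}
```

With records starting at blocks 4, 20 and 40, a query of (0, 100) fails the LE lookup because no record starts at or before block 0, yet all three records are still reported once the scan steps right.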
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Fri, 26 Aug 2016 01:16:52 +0000 (11:16 +1000)]
xfs: fix some key handling problems in _btree_simple_query_range
We only need the record's high key for the first record that we look
at; for all records, we /definitely/ need the regular record key.
Therefore, fix how the simple range query function gets its keys.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Fri, 26 Aug 2016 01:16:41 +0000 (11:16 +1000)]
xfs: don't perform lookups on zero-height btrees
If the caller passes in a cursor to a zero-height btree (which is
impossible), we never set block to anything but NULL, which causes the
later dereference of it to crash. Instead, just return -EFSCORRUPTED.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Fri, 19 Aug 2016 00:54:53 +0000 (10:54 +1000)]
mkfs.xfs: create filesystems with reverse-mappings
Originally-From: Dave Chinner <dchinner@redhat.com>
Create v5 filesystems with rmapbt turned on. Document the rmapbt
options to mkfs, and initialize the extra field we added for reflink
support.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
[darrick.wong@oracle.com: split patch, add commit message and extra fields] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Fri, 19 Aug 2016 00:53:14 +0000 (10:53 +1000)]
mkfs: set agsize prior to calculating minimum log size
Each btree has its own maxlevels variable. Since the level count of
certain btrees depends on agblocks, it's necessary to know the AG size
prior to calculating the log reservations. These reservations are
needed to calculate the log size and the kernel will refuse to mount
if we guess too low, so stuff in the real agsize when we're formatting
the log.
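
To illustrate why the level count depends on the AG size, here is a rough sketch of a worst-case height calculation. The helper btree_height is hypothetical, and the per-block minimums used in the note below (20 records per leaf, 11 key/pointer pairs per node) are illustrative assumptions rather than values taken from mkfs.

```c
#include <assert.h>

/*
 * Hypothetical helper: number of levels needed to index 'nr_recs' records
 * when every leaf holds at least 'min_leaf' records and every node holds
 * at least 'min_node' key/pointer pairs. Assumes min_leaf >= 1 and
 * min_node >= 2.
 */
static unsigned int
btree_height(unsigned long long nr_recs, unsigned int min_leaf,
	     unsigned int min_node)
{
	/* Leaf blocks needed, rounding up. */
	unsigned long long blocks = (nr_recs + min_leaf - 1) / min_leaf;
	unsigned int height = 1;

	/* Add node levels until a single root block remains. */
	while (blocks > 1) {
		blocks = (blocks + min_node - 1) / min_node;
		height++;
	}
	return height;
}
```

Under those assumed minimums, 1,073,741,824 records need 9 levels while 1,048,576 records need only 6, so a log reservation computed before the real agsize is known can come out too small.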
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Fri, 19 Aug 2016 00:52:41 +0000 (10:52 +1000)]
xfs_repair: check for impossible rmap record field combinations
Make sure there are no records or keys with impossible field
combinations, such as non-inode records with offsets or flags.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Fri, 19 Aug 2016 00:52:26 +0000 (10:52 +1000)]
xfs_repair: look for mergeable rmaps
Check for adjacent mergeable rmaps; this is a sign that we've
screwed up somehow.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Fri, 19 Aug 2016 00:52:06 +0000 (10:52 +1000)]
xfs_repair: merge data & attr fork reverse mappings
Merge data and attribute fork reverse mappings.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Fri, 19 Aug 2016 00:51:24 +0000 (10:51 +1000)]
xfs_repair: add per-AG btree blocks to rmap data and add to rmapbt
Since we can't know the location of the new per-AG btree blocks prior
to constructing the rmapbt, we must record raw reverse-mapping data for
btree blocks while the new btrees are under construction. After the
rmapbt has been rebuilt, merge the btree rmap entries into the rmapbt
with the libxfs code.
Also refactor the freelist fixing code since we need it to tidy up
the AGFL after each rmapbt allocation.
Use libxfs_rmap_alloc to add rmap records for AG metadata blocks
because it knows how to merge adjacent rmaps. This particular bug was
discovered while running xfs_repair twice on generic/175 wherein block
X was originally allocated to the rmapbt, then X+1 got allocated to
the rmapbt when we expanded it to hold all the entries for the rmapbt
blocks.
[dchinner: libxfs'ify the libxfs calls.]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Fri, 19 Aug 2016 00:43:24 +0000 (10:43 +1000)]
xfs_repair: rebuild reverse-mapping btree
Rebuild the reverse-mapping btree with the rmap observations
corresponding to file extents, bmbt blocks, and fixed per-AG metadata.
Leave a few empty slots in each rmapbt leaf when we're rebuilding
the rmapbt so that we can insert records for the AG metadata blocks
without causing too many btree splits. This (hopefully) prevents the
situation where running xfs_repair greatly increases the size of the
btree.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Fri, 19 Aug 2016 00:33:49 +0000 (10:33 +1000)]
xfs_repair: check existing rmapbt entries against observed rmaps
Once we've finished collecting reverse mapping observations from the
metadata scan, check those observations against the rmap btree
(particularly if we're in -n mode) to detect rmapbt problems.
[dchinner: libxfs'ify the various libxfs calls. ]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Fri, 19 Aug 2016 00:20:03 +0000 (10:20 +1000)]
xfs_repair: add fixed-location per-AG rmaps
Add reverse-mappings for fixed-location per-AG metadata such as inode
chunks, superblocks, and the log to the raw rmap list, then merge the
raw rmap data (which also has the BMBT data) into the main rmap list.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Fri, 19 Aug 2016 00:10:15 +0000 (10:10 +1000)]
xfs_repair: add inode bmbt block rmaps
Record BMBT blocks in the raw rmap list.
[dchinner: remove unused lastowner/lastoffset variables from scan.c]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Fri, 19 Aug 2016 00:09:23 +0000 (10:09 +1000)]
xfs_repair: record and merge raw rmap data
Since we still allow merging of BMBT block, AG metadata, and AG btree
block rmaps, provide a facility to collect these raw observations and
merge them (with maximal length) into the main rmap list.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Fri, 19 Aug 2016 00:05:56 +0000 (10:05 +1000)]
xfs_repair: collect reverse-mapping data for refcount/rmap tree rebuilding
Collect reverse-mapping data for the entire filesystem so that we can
later check and rebuild the reference count tree and the reverse mapping
tree.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Fri, 19 Aug 2016 00:03:21 +0000 (10:03 +1000)]
xfs_repair: create a slab API for allocating arrays in large chunks
Create a slab-based array and a bag-of-pointers data structure to
facilitate rapid linear scans of reverse-mapping data for later
reconstruction of the refcount and rmap btrees.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Thu, 18 Aug 2016 23:57:04 +0000 (09:57 +1000)]
xfs_repair: fix fino_bno calculation when rmapbt is enabled
In xfs_repair, we calculate where we think mkfs put the root inode
block. However, the rmapbt component doesn't account for the fact
that mkfs reserved 2 AGFL blocks for the rmapbt, so its calculation
is off by a bit. This leads to it complaining (incorrectly) about the
root inode block being in the wrong place and blowing up.
[dchinner: small comment update to indicate AGFL block accounting]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Thu, 18 Aug 2016 23:53:30 +0000 (09:53 +1000)]
xfs_repair: use rmap btree data to check block types
Originally-From: Dave Chinner <dchinner@redhat.com>
Use the rmap btree to pre-populate the block type information so that
when repair iterates the primary metadata, we can confirm the block
type.
Ensure that we remove the flag bits from blockcount before using the
length field.
When we're processing rmap records, we set the bmap state of
the entire extent, not just the first block of the extent. This
enables us to catch improperly overlapping rmap records and later to
ensure that the entire primary metadata extent matches (owner-wise)
the reverse mapping. It also enables us to catch the case where the
rmapbt maps something that isn't pointed to by primary metadata.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
[darrick.wong@oracle.com: split patch, strip flag bits from blockcount] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Thu, 18 Aug 2016 23:50:22 +0000 (09:50 +1000)]
xfs_logprint: support rmap redo items
Print reverse mapping update redo items.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Thu, 18 Aug 2016 23:48:44 +0000 (09:48 +1000)]
xfs_io: add rmap-finish error injection type
Add XFS_ERRTAG_RMAP_FINISH_ONE to the types of errors we can inject.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Thu, 18 Aug 2016 23:48:32 +0000 (09:48 +1000)]
xfs_growfs: report rmapbt presence
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Thu, 18 Aug 2016 23:47:57 +0000 (09:47 +1000)]
xfs_db: introduce the 'fsmap' command to find what owns a set of fsblocks
Introduce a new 'fsmap' command to the fs debugger that will query the
rmap btree to report the file/metadata extents mapped to a range of
physical blocks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Thu, 18 Aug 2016 23:39:03 +0000 (09:39 +1000)]
xfs_db: copy the rmap btree
Copy the rmapbt when we're metadumping the filesystem.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Thu, 18 Aug 2016 23:36:46 +0000 (09:36 +1000)]
xfs_db: spot check rmapbt
Check the rmapbt for obvious errors. We're leaving thorough checks
such as comparing the primary metadata against the rmapbt contents
for newer things like xfs_repair.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Thu, 18 Aug 2016 23:33:41 +0000 (09:33 +1000)]
xfs_db: display rmap btree contents
Originally-From: Dave Chinner <dchinner@redhat.com>
Teach the debugger how to dump the reverse-mapping btree contents.
Decode the extra fields in the rmapbt records and keys now that we
support reflink.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
[darrick: split patch, add commit message, decode extra fields]
[darrick: support overlapped interval btree fields]
[darrick: move unwritten bit to rm_offset] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Thu, 18 Aug 2016 23:32:31 +0000 (09:32 +1000)]
libxfs: add deferred ops item handlers for userspace
Add deferred ops handlers for userspace, which simply call back
into the libxfs functions.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Thu, 18 Aug 2016 23:31:29 +0000 (09:31 +1000)]
libxfs: fix various oddities in the kernel import
Fix some minor anomalies in the kernel -> xfsprogs import of the
4.8 libxfs code.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Thu, 18 Aug 2016 23:30:57 +0000 (09:30 +1000)]
xfs: remove OWN_AG rmap when allocating a block from the AGFL
When we're really tight on space, xfs_alloc_ag_vextent_small() can
allocate a block from the AGFL and give it to the caller. Since the
caller is never the AGFL-fixing method, we must remove the OWN_AG
reverse mapping because it will clash with whatever rmap the caller
wants to set up. This bug was discovered by running generic/299
repeatedly.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Thu, 18 Aug 2016 23:30:47 +0000 (09:30 +1000)]
xfs: store rmapbt block count in the AGF
Track the number of blocks used for the rmapbt in the AGF. When we
get to the AG reservation code we need this counter to quickly
make our reservation during mount.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Thu, 18 Aug 2016 23:29:55 +0000 (09:29 +1000)]
xfs_io: add free-extent error injection type
Add XFS_ERRTAG_FREE_EXTENT to the types of errors we can inject.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Thu, 18 Aug 2016 23:29:41 +0000 (09:29 +1000)]
man: document the DAX fsxattr inode flag
Document the new inode flag in struct fsxattr for DAX.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Thu, 18 Aug 2016 23:29:28 +0000 (09:29 +1000)]
xfs_logprint: fix formatting issues with the EFI printing code
Fix some formatting issues with the EFI handling functions.
This is a purely mechanical whitespace fix, no code changes
aside from adding 'static'.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Thu, 18 Aug 2016 23:28:41 +0000 (09:28 +1000)]
xfs_logprint: move the EFI copying/printing functions to a redo items file
Move the functions that handle EFI items into a separate file to
avoid cluttering up log_misc.c even more when we start adding the
other redo item types.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Thu, 18 Aug 2016 23:28:06 +0000 (09:28 +1000)]
mkfs: fix library ordering
libblkid depends on libuuid, so we must specify -lblkid before -luuid.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Thu, 18 Aug 2016 23:27:30 +0000 (09:27 +1000)]
xfs_repair: preserve in-core meta_uuid while zeroing unused sb portions
If we zero unused parts of the superblock, we must preserve meta_uuid
across the zeroing because meta_uuid is used to verify the v5 format
checksums even on non-metauuid filesystems. If we don't, the next
thing that happens is that all metadata fails in the verifier and the
whole filesystem is "toast".
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Thu, 18 Aug 2016 23:26:52 +0000 (09:26 +1000)]
xfs_io: bmap should print 'delalloc', not '-2'
The bmap command (without -v) should print 'delalloc' and not -2
for the physical block number of an extent.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Thu, 18 Aug 2016 23:26:19 +0000 (09:26 +1000)]
xfs_buflock: add a tool that can be used to find buffer deadlocks
Add a (rough) python script that can parse the output of:
# trace-cmd -e 'xfs_buf_*lock*' <other tracepoints>
to identify xfs_buf deadlocks between XFS threads.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Thu, 18 Aug 2016 23:20:46 +0000 (09:20 +1000)]
libxfs: fix xfs_isset pointer calculation
In the macro xfs_isset, the variable 'a' is a pointer to an array
type. However, the bit offset calculation uses sizeof(a), which
returns the size of the pointer, not the size of an array element.
Fix this, which also fixes the problem where xfs_check spits out
bogus "rtblock X expected type unknown, got rtfree" messages.
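
A small sketch of the pitfall, using hypothetical macro names rather than the real xfs_isset definition. The element width has to come from the pointed-to type, sizeof(*(a)); on a typical 64-bit target sizeof(a) for a pointer is 8 bytes regardless of the element size, so the word index and bit offset would be computed for the wrong width.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Bits held by one element of the array that 'a' points to. */
#define ELEM_BITS(a)	(sizeof(*(a)) * CHAR_BIT)

/*
 * Hypothetical stand-in for the fixed macro: evaluates to 1 if bit 'i'
 * of bitmap 'a' is set and to 0 otherwise.
 */
#define bitmap_isset(a, i) \
	((((a)[(i) / ELEM_BITS(a)]) >> ((i) % ELEM_BITS(a))) & 1u)
```

For a bitmap of 32-bit words, bit 35 lives in word 1 at bit position 3, which is what the element-based calculation selects.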
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Rename the deferred bmap-free to extent_free and make them only
trigger when we're really running deferred ops.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Nothing ever uses the extent array in the rmap update done redo
item, so remove it before it is fixed in the on-disk log format.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
We only need the temporary cursor in _btree_lshift if we're shifting
in an overlapped btree. Therefore, factor that into a single block
of code so we avoid unnecessary cursor duplication.
Also fix use of the wrong cursor when checking for corruption in
xfs_btree_rshift().
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
In the lshift/rshift functions we don't use the key variable for
anything now, so remove the variable and its initializer. The
update_keys functions figure out the key for a block on their own.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
These are internal btree functions; we don't need them to be
dispatched via function pointers. Make them static again and
just check the overlapped flag to figure out what we need to
do. The strategy behind this patch was suggested by Christoph.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Originally-From: Dave Chinner <dchinner@redhat.com>
Add the feature flag to the supported matrix so that the kernel can
mount and use rmap btree enabled filesystems.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
[darrick.wong@oracle.com: move the experimental tag] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Allow a caller of xfs_alloc_fix_freelist to disable rmapbt updates
when fixing the AG freelist. xfs_repair needs this during phase 5
to be able to adjust the freelist while it's reconstructing the rmap
btree; the missing entries will be added back at the very end of
phase 5 once the AGFL contents settle down.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
When we map, unmap, or convert an extent in a file's data or attr
fork, schedule a respective update in the rmapbt. Previous versions
of this patch required a 1:1 correspondence between bmap and rmap,
but this is no longer true as we now have the ability to make interval
queries against the rmapbt.
We use the deferred operations code to handle redo operations
atomically and deadlock free. This plumbs in all five rmap actions
(map, unmap, convert extent, alloc, free); we'll use the first three
now for file data, and reflink will want the last two. We also add
an error injection site to test log recovery.
Finally, we need to fix the bmap shift extent code to adjust the
rmaps correctly.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Connect the xfs_defer mechanism with the pieces that we'll need to
handle deferred rmap updates. We'll wire up the existing code to
our new deferred mechanism later.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Create rmap update intent/done log items to record redo information in
the log. Because we need to roll transactions between updating the
bmbt mapping and updating the reverse mapping, we also have to track
the status of the metadata updates that will be recorded in the
post-roll transactions, just in case we crash before committing the
final transaction. This mechanism enables log recovery to finish what
was already started.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Add a couple of helper functions to encapsulate rmap btree insert and
delete operations. Add tracepoints to the update function.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Provide a function to convert an unwritten rmap extent to a real one
and vice versa.
[ dchinner: Note that this algorithm and code was derived from the
existing bmapbt unwritten extent conversion code in
xfs_bmap_add_extent_unwritten_real(). ]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Originally-From: Dave Chinner <dchinner@redhat.com>
Now that we have records in the rmap btree, we need to remove them
when extents are freed. This needs to find the relevant record in
the btree and remove/trim/split it accordingly.
[darrick.wong@oracle.com: make rmap routines handle the enlarged keyspace]
[dchinner: remove remaining unused debug printks]
[darrick: fix a bug when growfs in an AG with an rmap ending at EOFS]
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Originally-From: Dave Chinner <dchinner@redhat.com>
Now all the btree, free space and transaction infrastructure is in
place, we can finally add the code to insert reverse mappings to the
rmap btree. Freeing will be done in a separate patch, so just the
addition operation can be focussed on here.
[darrick: handle owner offsets when adding rmaps]
[dchinner: remove remaining debug printk statements]
[darrick: move unwritten bit to rm_offset]
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Now that the generic btree code supports querying all records within a
range of keys, use that functionality to allow us to ask for all the
extents mapped to a range of physical blocks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Now that the generic btree code supports overlapping intervals, plug
in the rmap btree to this functionality. We will need it to find
potential left neighbors in xfs_rmap_{alloc,free} later in the patch
set.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Originally-From: Dave Chinner <dchinner@redhat.com>
Implement the generic btree operations needed to manipulate rmap
btree blocks. This is very similar to the per-ag freespace btree
implementation, and uses the AGFL for allocation and freeing of
blocks.
Adapt the rmap btree to store owner offsets within each rmap record,
and to handle the primary key being redefined as the tuple
[agblk, owner, offset]. The expansion of the primary key is crucial
to allowing multiple owners per extent.
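
As a rough illustration of the widened key, the sketch below orders keys lexicographically by [agblk, owner, offset]. The struct layout, field widths and function names are assumptions made for the example; they are not the on-disk rmapbt key format.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory key; field names and widths are illustrative. */
struct rmap_key {
	uint32_t agblk;		/* first block of the extent within the AG */
	uint64_t owner;		/* owner of the extent */
	uint64_t offset;	/* offset within the owner */
};

/* Three-way comparison of two unsigned values: -1, 0 or 1. */
static int
cmp_u64(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}

/* Compare agblk first, then owner, then offset. */
static int
rmap_key_cmp(const struct rmap_key *a, const struct rmap_key *b)
{
	int c = cmp_u64(a->agblk, b->agblk);

	if (c)
		return c;
	c = cmp_u64(a->owner, b->owner);
	if (c)
		return c;
	return cmp_u64(a->offset, b->offset);
}
```

Because owner and offset take part in the comparison, two records that start at the same AG block but belong to different owners get distinct, ordered keys.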
[darrick: adapt the btree ops to deal with offsets]
[darrick: remove init_rec_from_key]
[darrick: move unwritten bit to rm_offset]
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Originally-From: Dave Chinner <dchinner@redhat.com>
The rmap btree is allocated from the AGFL, which means we have to
ensure ENOSPC is reported to userspace before we run out of free
space in each AG. The last allocation in an AG can cause a full
height rmap btree split, and that means we have to reserve at least
this many blocks *in each AG* to be placed on the AGFL at ENOSPC.
Update the various space calculation functions to handle this.
Also, because the macros are now executing conditional code and are
called quite frequently, convert them to functions that initialise
variables in the struct xfs_mount, use the new variables everywhere
and document the calculations better.
[darrick.wong@oracle.com: don't reserve blocks if !rmap]
[dchinner@redhat.com: update m_ag_max_usable after growfs]
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
The rmap btrees will use the AGFL as the block allocation source, so
we need to ensure that the transaction reservations reflect the fact
this tree is modified by allocation and freeing. Hence we need to
extend all the extent allocation/free reservations used in
transactions to handle this.
Note that this also gets rid of the unused XFS_ALLOCFREE_LOG_RES
macro, as we now do buffer reservations based on the number of
buffers logged via xfs_calc_buf_res(). Hence we only need the buffer
count calculation now.
[darrick: use rmap_maxlevels when calculating log block resv]
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Originally-From: Dave Chinner <dchinner@redhat.com>
Now we have all the surrounding call infrastructure in place, we can
start filling out the rmap btree implementation. Start with the
on-disk btree format; add everything needed to read, write and
manipulate rmap btree blocks. This prepares the way for adding the
btree operations implementation.
[darrick: record owner and offset info in rmap btree]
[darrick: fork, bmbt and unwritten state in rmap btree]
[darrick: flags are a separate field in xfs_rmap_irec]
[darrick: calculate maxlevels separately]
[darrick: move the 'unwritten' bit into unused parts of rm_offset]
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Originally-From: Dave Chinner <dchinner@redhat.com>
Add the stubs into the extent allocation and freeing paths that the
rmap btree implementation will hook into. While doing this, add the
trace points that will be used to track rmap btree extent
manipulations.
[darrick.wong@oracle.com: Extend the stubs to take full owner info.]
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
For the rmap btree to work, we have to feed the extent owner
information to the allocation and freeing functions. This
information is what will end up in the rmap btree that tracks
allocated extents. While we technically don't need the owner
information when freeing extents, passing it allows us to validate
that the extent we are removing from the rmap btree actually
belonged to the owner we expected it to belong to.
We also define a special set of owner values for internal metadata
that would otherwise have no owner. This allows us to tell the
difference between metadata owned by different per-ag btrees, as
well as static fs metadata (e.g. AG headers) and internal journal
blocks.
There are also a couple of special cases we need to take care of -
during EFI recovery, we don't actually know who the original owner
was, so we need to pass a wildcard to indicate that we aren't
checking the owner for validity. We also need special handling in
growfs, as we "free" the space in the last AG when extending it, but
because it's new space it has no actual owner...
While touching the xfs_bmap_add_free() function, re-order the
parameters to put the struct xfs_mount first.
Extend the owner field to include both the owner type and some sort
of index within the owner. The index field will be used to support
reverse mappings when reflink is enabled.
When we're freeing extents from an EFI, we don't have the owner
information available (rmap updates have their own redo items).
xfs_free_extent therefore doesn't need to do an rmap update. Make
sure that the log replay code signals this correctly.
This is based upon a patch originally from Dave Chinner. It has been
extended to add more owner information with the intent of helping
recovery operations when things go wrong (e.g. offset of user data
block in a file).
[dchinner: de-shout the xfs_rmap_*_owner helpers]
[darrick: minor style fixes suggested by Christoph Hellwig]
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Originally-From: Dave Chinner <dchinner@redhat.com>
XFS reserves a small amount of space in each AG for the minimum
number of free blocks needed for operation. Adding the rmap btree
increases the number of reserved blocks, but it also increases the
complexity of the calculation as the free inode btree is optional
(like the rmapbt).
Rather than calculate the prealloc blocks every time we need to
check it, add a function to calculate it at mount time and store it
in the struct xfs_mount, and convert the XFS_PREALLOC_BLOCKS macro
to just use the xfs_mount variable directly.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Originally-From: Dave Chinner <dchinner@redhat.com>
Add new per-ag rmap btree definitions to the per-ag structures. The
rmap btree will sit in the empty slots on disk after the free space
btrees, and hence form a part of the array of space management
btrees. This requires the definition of the btree to be contiguous
with the free space btrees.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
By my calculations, the rmapbt in a 1,073,741,824 block AG with a 1k
block size can attain a maximum height of 9. Assuming a record size of 24
bytes, a key/ptr size of 44 bytes, and half-full btree nodes, we'd
need 53,687,092 blocks for the records and ~6 million blocks for the
keys. That requires a btree of height 9 based on the following
derivation:
Block size = 1024b
sblock CRC header = 56b
    == 1024 - 56 = 968 bytes for tree data
rmapbt record = 24b
    == 40 records per leaf block
rmapbt ptr/key = 44b
    == 22 ptr/keys per block
Worst case, each block is half full, so 20 records and 11 ptrs per block.
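The arithmetic in this derivation can be checked with a short calculation. What follows is a minimal sketch, not the kernel's implementation: demo_btree_maxlevels() and its parameters are hypothetical names, and the worst case is assumed to be one record per AG block.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not an XFS function): worst-case height of a btree
 * holding nrecords records when every leaf holds recs_per_leaf records and
 * every node holds ptrs_per_node pointers.
 */
static unsigned int
demo_btree_maxlevels(uint64_t nrecords, uint64_t recs_per_leaf,
		     uint64_t ptrs_per_node)
{
	uint64_t	nblocks;
	unsigned int	level;

	assert(recs_per_leaf > 0 && ptrs_per_node > 1);

	/* Leaf level: round up so a partially filled block still counts. */
	nblocks = (nrecords + recs_per_leaf - 1) / recs_per_leaf;

	/* Add node levels until a single root block remains. */
	for (level = 1; nblocks > 1; level++)
		nblocks = (nblocks + ptrs_per_node - 1) / ptrs_per_node;

	return level;
}
```

With the half-full figures (20 records per leaf, 11 pointers per node), demo_btree_maxlevels(1073741824ULL, 20, 11) returns 9: 53,687,092 leaf blocks, followed by eight node levels that each shrink by a factor of 11 down to the root.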
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Add a couple of tracepoints for the deferred extent free operation and
a site for injecting errors while finishing the operation. This makes
it easier to debug deferred ops and test log redo.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Mechanical change of flist/free_list to dfops, since they're now
deferred ops, not just a freeing list.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Drop the compatibility shims that we were using to integrate the new
deferred operation mechanism into the existing code. No new code.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Restructure everything that used xfs_bmap_free to use xfs_defer_ops
instead. For now we'll just remove the old symbols and play some
cpp magic to make it work; in the next patch we'll actually rename
everything.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Connect the xfs_defer mechanism with the pieces that we'll need to
handle deferred extent freeing. We'll wire up the existing code to
our new deferred mechanism later.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Add tracepoints for the internals of the deferred ops mechanism
and tracepoint classes for clients of the dops, to make debugging
easier.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
All the code around struct xfs_bmap_free basically implements a
deferred operation framework through which we can roll transactions
(to unlock buffers and avoid violating lock order rules) while
managing all the necessary log redo items. Previously we only used
this code to free extents after some sort of mapping operation, but
with the advent of rmap and reflink, we suddenly need to do more than
that.
With that in mind, xfs_bmap_free really becomes a deferred ops control
structure. Rename the structure and move the deferred ops into their
own file to avoid further bloating of the bmap code.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Refactor the btree_change_owner function into a more generic apparatus
which visits all blocks in a btree. We'll use this in a subsequent
patch for counting btree blocks for AG reservations.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Create a function to enable querying of btree records mapping to a
range of keys. This will be used in subsequent patches to allow
querying the reverse mapping btree to find the extents mapped to a
range of physical blocks, though the generic code can be used for
any range query.
The overlapped query range function needs to use the btree get_block
helper because the root block could be an inode, in which case
bc_bufs[nlevels-1] will be NULL.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
On a filesystem with both reflink and reverse mapping enabled, it's
possible to have multiple rmap records referring to the same blocks on
disk. When overlapping intervals are possible, querying a classic
btree to find all records intersecting a given interval is inefficient
because we cannot use the left side of the search interval to filter
out non-matching records the same way that we can use the existing
btree key to filter out records coming after the right side of the
search interval. This will become important once we want to use the
rmap btree to rebuild BMBTs, or implement the (future) fsmap ioctl.
(For the non-overlapping case, we can perform such queries trivially
by starting at the left side of the interval and walking the tree
until we pass the right side.)
Therefore, extend the btree code to come closer to supporting
intervals as a first-class record attribute. This involves widening
the btree node's key space to store both the lowest key reachable via
the node pointer (as the btree does now) and the highest key reachable
via the same pointer and teaching the btree modifying functions to
keep the highest-key records up to date.
This behavior can be turned on via a new btree ops flag so that btrees
that cannot store overlapping intervals don't pay the overhead costs
in terms of extra code and disk format changes.
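A minimal sketch of why storing both keys helps, assuming a simplified in-memory node entry (struct demo_keyptr and both functions are hypothetical, not XFS code). With only the low key, a query can rule out pointers that start past the right edge of the search interval; with the high key as well, it can also rule out pointers whose records all end before the left edge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node entry: the key range reachable through one pointer. */
struct demo_keyptr {
	uint64_t	low;	/* lowest key reachable via the pointer */
	uint64_t	high;	/* highest key reachable via the pointer */
};

/* A subtree can be skipped unless [low, high] intersects [qlow, qhigh]. */
static bool
demo_may_overlap(const struct demo_keyptr *kp, uint64_t qlow, uint64_t qhigh)
{
	return kp->low <= qhigh && kp->high >= qlow;
}

/* Count the pointers in a node that a range query would have to follow. */
static size_t
demo_count_descents(const struct demo_keyptr *kps, size_t nr,
		    uint64_t qlow, uint64_t qhigh)
{
	size_t	i, n = 0;

	assert(qlow <= qhigh);
	for (i = 0; i < nr; i++)
		if (demo_may_overlap(&kps[i], qlow, qhigh))
			n++;
	return n;
}
```

For a node whose pointers cover [0, 9], [5, 40], [20, 25] and [30, 35], a query for [26, 29] only has to follow the second pointer. Without the high keys, the first three would all have to be searched.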
When we're deleting a record in a btree that supports overlapped
interval records and the deletion results in two btree blocks being
joined, we defer updating the high/low keys until after all possible
joins (at higher levels in the tree) have finished. At this point,
the btree pointers at all levels have been updated to remove the empty
blocks and we can update the low and high keys.
When we're doing this, we must be careful to update the keys of all
node pointers up to the root instead of stopping at the first set of
keys that don't need updating. This is because it's possible for a
single deletion to cause joins at multiple levels of the tree, and so
we need to update everything going back to the root.
The diff_two_keys functions return < 0, 0, or > 0 if key1 is less than,
equal to, or greater than key2, respectively. This is consistent
with the rest of the kernel and the C library.
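The convention can be illustrated with a small sketch. The key type and function here are hypothetical stand-ins, not the XFS definitions; only the sign of the result is meant to match the description.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-field key, for illustration only. */
struct demo_key {
	uint64_t	startblock;
};

/*
 * Return < 0, 0, or > 0 if k1 is less than, equal to, or greater than k2.
 * Comparing rather than subtracting avoids truncating a 64-bit difference
 * into the int return value.
 */
static int
demo_diff_two_keys(const struct demo_key *k1, const struct demo_key *k2)
{
	assert(k1 && k2);
	return (k1->startblock > k2->startblock) -
	       (k1->startblock < k2->startblock);
}
```

This is the same ordering that strcmp() and qsort() comparators use, so the result can be tested directly against zero.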
In btree_updkeys(), we need to evaluate the force_all parameter before
running the key diff to avoid reading uninitialized memory when we're
forcing a key update. This happens when we've allocated an empty slot
at level N + 1 to point to a new block at level N and we're in the
process of filling out the new keys.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Add some function pointers to bc_ops to get the btree keys for
leaf and node blocks, and to update parent keys of a block.
Convert the _btree_updkey calls to use our new pointer, and
modify the tree shape changing code to call the appropriate
get_*_keys pointer instead of _btree_copy_keys because the
overlapping btree has to calculate high key values.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
When a btree block has to be split, we pass the new block's ptr from
xfs_btree_split() back to xfs_btree_insert() via a pointer parameter;
however, we pass the block's key through the cursor's record. It is a
little weird to "initialize" a record from a key since the non-key
attributes will have garbage values.
When we go to add support for interval queries, we have to be able to
pass the lowest and highest keys accessible via a pointer. There's no
clean way to pass this back through the cursor's record field.
Therefore, pass the key directly back to xfs_btree_insert() the same
way that we pass the btree_ptr.
As a bonus, we no longer need init_rec_from_key and can drop it from the
codebase.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
If we make the inode root block of a btree unfull by expanding the
root, we must set *stat to 1 to signal success, rather than leaving
it uninitialized.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
When we're deleting realtime extents, we need to lock the summary
inode in case we need to update the summary info to prevent an assert
on the rsumip inode lock on a debug kernel. While we're at it, fix
the locking annotations so that we avoid triggering lockdep warnings.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Apparently cris doesn't require structure stride to align with the
largest type in the struct, so list[0] isn't at offset 4 like it is
everywhere else. Fix this... insofar as existing XFSes on cris are
screwed.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Instead we always declare struct xfs_dir2_sf_hdr as packed. That's
the expected layout, and while most major architectures do the packing
by default the new structure size and offset checker showed that not
only the ARM old ABI got this wrong, but various minor embedded
architectures did as well.
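As a rough illustration of the technique, here is a packed header with compile-time layout checks. struct demo_sf_hdr is a hypothetical model with illustrative field names, not a copy of the real definition, and __attribute__((packed)) is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk header; packed so that no ABI may add padding. */
struct demo_sf_hdr {
	uint8_t		count;
	uint8_t		i8count;
	uint8_t		parent[8];
} __attribute__((packed));

/* Fail the build if any ABI lays the structure out differently. */
static_assert(sizeof(struct demo_sf_hdr) == 10, "unexpected header size");
static_assert(offsetof(struct demo_sf_hdr, parent) == 2,
	      "unexpected parent offset");
```

On architectures that already lay the structure out this way, the attribute changes nothing; elsewhere it removes the padding that the checks would otherwise flag.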
[Verified that no code change on x86-64 results from this change]
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
And use an array of unsigned char values directly to avoid problems
with architectures that pad the size of structures. This also gets
rid of the xfs_dir2_ino4_t and xfs_dir2_ino8_t types, and introduces
new constants for the size of 4 and 8 bytes as well as the size
difference between the two.
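A minimal sketch of the byte-array approach, assuming big-endian on-disk inode numbers. The DEMO_* constants and demo_get_ino() are illustrative names, not the ones introduced by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative names for the two encodings and their size difference. */
#define DEMO_INO4_SIZE	4
#define DEMO_INO8_SIZE	8
#define DEMO_INO_DIFF	(DEMO_INO8_SIZE - DEMO_INO4_SIZE)

/* Decode a big-endian inode number of 4 or 8 bytes from a raw byte array. */
static uint64_t
demo_get_ino(const unsigned char *from, size_t size)
{
	uint64_t	ino = 0;
	size_t		i;

	assert(size == DEMO_INO4_SIZE || size == DEMO_INO8_SIZE);
	for (i = 0; i < size; i++)
		ino = (ino << 8) | from[i];
	return ino;
}
```

Because the field is just an array of unsigned char, it has no alignment requirement and no architecture can pad it, which is the property the patch is after.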
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Create a common function to calculate the maximum height of a per-AG
btree. This will eventually be used by the rmapbt and refcountbt
code to calculate appropriate maxlevels values for each. This is
important because the verifiers and the transaction block
reservations depend on accurate estimates of how many blocks are
needed to satisfy a btree split.
We were mistakenly using the max bnobt height for all the btrees,
which creates a dangerous situation since the larger records and
keys in an rmapbt make it very possible that the rmapbt will be
taller than the bnobt and so we can run out of transaction block
reservation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
In struct xfs_bmap_free, convert the open-coded free extent list to
a regular list, then use list_sort to sort it prior to processing.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Break up xfs_free_extent() into a helper that fixes the freelist.
This helper will be used subsequently to ensure the freelist is fixed
up during deferred rmap processing.
[darrick: refactor to put this at the head of the patchset]
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Darrick J. Wong [Wed, 10 Aug 2016 01:29:35 +0000 (11:29 +1000)]
libxfs: add more list operations
Add some list operations that the deferred rmap code requires.
Code comes from the following kernel files:
lib/list_sort.c for all the list_sort stuff,
include/linux/list.h for the rest of the list_* stuff,
include/linux/kernel.h for container_of.
[ dchinner: move list_sort code to libxfs/list_sort.c ]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Wed, 10 Aug 2016 01:29:30 +0000 (11:29 +1000)]
libxfs: fix set-but unused warning in dir2 code
Fix these build warnings:
xfs_dir2_leaf.c: In function 'xfs_dir2_block_to_leaf':
xfs_dir2_leaf.c:389:16: warning: variable 'tp' set but not used [-Wunused-but-set-variable]
xfs_trans_t *tp; /* transaction pointer */
^
xfs_dir2_node.c: In function 'xfs_dir2_leaf_to_node':
xfs_dir2_node.c:302:16: warning: variable 'tp' set but not used [-Wunused-but-set-variable]
xfs_trans_t *tp; /* transaction pointer */
^
Zorro Lang [Thu, 4 Aug 2016 01:29:49 +0000 (11:29 +1000)]
xfs_quota: fall back silently if XFS_GETNEXTQUOTA fails
Since the XFS_GETNEXTQUOTA feature was merged into the Linux kernel and
xfsprogs, xfs_quota uses Q_XGETNEXTQUOTA for report and dump, and
falls back to the old XFS_GETQUOTA ioctl if XFS_GETNEXTQUOTA fails.
But when XFS_GETNEXTQUOTA fails, xfs_quota prints a warning such as
"XFS_GETQUOTA: Invalid argument". This happens because the kernel
doesn't recognize the XFS_GETNEXTQUOTA ioctl and returns EINVAL. In
that case the warning is unhelpful; xfs_quota just needs to fall back.
Signed-off-by: Zorro Lang <zlang@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Anna Schumaker [Thu, 4 Aug 2016 01:29:49 +0000 (11:29 +1000)]
xfs_io: Update man page for copy_range command
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Felix Janda [Thu, 4 Aug 2016 01:29:49 +0000 (11:29 +1000)]
mkfs: Remove workaround for getsubopt() on older glibc
The workaround addressed a const-correctness warning on glibc
versions older than 2.2. However, it also captures alternative C
libraries on Linux which it should not do. glibc 2.2 is really old, so
let's just remove the workaround.
Signed-off-by: Felix Janda <felix.janda@posteo.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Anna Schumaker [Wed, 20 Jul 2016 05:31:54 +0000 (15:31 +1000)]
xfs_io: implement 'copy_range' command
Implement a new xfs_io command, named 'copy_range', which copies a
range of data from one file to another.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Zorro Lang [Wed, 20 Jul 2016 05:31:18 +0000 (15:31 +1000)]
xfs_repair: low memory shouldn't indicate corruption on exit
When I run "xfs_repair -n" on a 500T device with 16G of memory,
xfs_repair prints the warning below:
Memory available for repair (11798MB) may not be sufficient.
At least 64048MB is needed to repair this filesystem efficiently
If repair fails due to lack of memory, please
turn prefetching off (-P) to reduce the memory footprint.
And it returned an exit value of 1. But xfs_repair didn't hit any
error, so there is no reason to mark the fs as corrupted just
because it thinks it might *possibly* not have enough memory to run
to completion.
do_warn() will set fs_is_dirty=1 and hence give a non-zero exit
status. If we only want to print an informational message (not a
real issue), then we should use do_log() instead.
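The difference between the two helpers can be sketched as follows. The demo_* functions and the demo_fs_is_dirty flag are hypothetical stand-ins written for illustration; they only model the behaviour described here and are not the xfs_repair code.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for the flag that drives the exit status. */
static int demo_fs_is_dirty;

/* Report a real problem: print it and remember that the fs is dirty. */
static void
demo_warn(const char *fmt, ...)
{
	va_list	args;

	assert(fmt);
	demo_fs_is_dirty = 1;
	va_start(args, fmt);
	vfprintf(stderr, fmt, args);
	va_end(args);
}

/* Informational only: print the message without touching the flag. */
static void
demo_log(const char *fmt, ...)
{
	va_list	args;

	assert(fmt);
	va_start(args, fmt);
	vfprintf(stderr, fmt, args);
	va_end(args);
}

/* Exit status in this model: non-zero only when something was flagged. */
static int
demo_exit_status(void)
{
	return demo_fs_is_dirty ? 1 : 0;
}
```

In this model, printing the low-memory notice through demo_log() leaves demo_exit_status() at 0, while the same message sent through demo_warn() would turn it into 1.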
Signed-off-by: Zorro Lang <zlang@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>