Kanda Motohiro reported that expanding a tiny xattr into a large xattr
fails on XFS because we remove the tiny xattr from a shortform fork and
then try to re-add it after converting the fork to extents format having
not removed the ATTR_REPLACE flag. This fails because the attr is no
longer present, causing a fs shutdown.
This is derived from the patch in his bug report, but we really
shouldn't ignore a nonzero retval from the remove call.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199119 Reported-by: kanda.motohiro@gmail.com Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
If xfs_bmap_extents_to_btree fails in a mode where we call
xfs_iroot_realloc(-1) to de-allocate the root, set the
format back to extents.
Otherwise we can assume we can dereference ifp->if_broot
based on the XFS_DINODE_FMT_BTREE format, and crash.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199423 Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Add several more validations to xfs_dinode_verify:
- For LOCAL data fork formats, di_nextents must be 0.
- For LOCAL attr fork formats, di_anextents must be 0.
- For inodes with no attr fork offset,
- format must be XFS_DINODE_FMT_EXTENTS if set at all
- di_anextents must be 0.
Thanks to dchinner for pointing out a couple related checks I had
forgotten to add.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199377 Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Most of the generic data structures embedded in xfs_mount are
dynamically initialized immediately after mp is allocated. A few
fields are left out and initialized during the xfs_mountfs()
sequence, after mp has been attached to the superblock.
To clean this up and help prevent premature access of associated
fields, refactor xfs_mount allocation and all dependent init calls
into a new helper. This self-documents that all low level data
structures (i.e., locks, trees, etc.) should be initialized before
xfs_mount is attached to the superblock.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
We can only get into the branch if CRCs are enabled, so there's no
need to check inside the branch for CRCs being enabled....
Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
When we're verifying inode buffers, sanity-check the unlinked pointer.
We don't want to run the risk of trying to purge something that's
obviously broken.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Extent size hint validation is used by scrub to decide if there's an
error, and it will be used by repair to decide to remove the hint.
Since these use the same validation functions, move them to libxfs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
When the inode buffer verifier encounters an error, it's much more
helpful to print a buffer from the offending inode instead of just the
start of the inode chunk buffer.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Refactor some of the inode verifier failure logging call sites to use
the new xfs_inode_verifier_error method which dumps the offending buffer
as well as the code location of the failed check. This trims the
output, makes it clearer to the admin that repair must be run, and gives
the developers more details to work from.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Refactor the bmap validator into a more complete helper that looks for
extents that run off the end of the device, overflow into the next AG,
or have invalid flag states.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
In xfs_dir2_data_use_free, we examine on-disk metadata and ASSERT if
it doesn't make sense. Since a carefully crafted fuzzed image can cause
the kernel to crash after blowing a bunch of assertions, let's move
those checks into a validator function and rig everything up to return
EFSCORRUPTED to userspace. Found by lastbit fuzzing ltail.bestcount via
xfs/391.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
The struct xfs_agfl v5 header was originally introduced with
unexpected padding that caused the AGFL to operate with one less
slot than intended. The header has since been packed, but the fix
left an incompatibility for users who upgrade from an old kernel
with the unpacked header to a newer kernel with the packed header
while the AGFL happens to wrap around the end. The newer kernel
recognizes one extra slot at the physical end of the AGFL that the
previous kernel did not. The new kernel will eventually attempt to
allocate a block from that slot, which contains invalid data, and
cause a crash.
This condition can be detected by comparing the active range of the
AGFL to the count. While this detects a padding mismatch, it can
also trigger false positives for unrelated flcount corruption. Since
we cannot distinguish a size mismatch due to padding from unrelated
corruption, we can't trust the AGFL enough to simply repopulate the
empty slot.
Instead, avoid unnecessarily complex detection logic and and use a
solution that can handle any form of flcount corruption that slips
through read verifiers: distrust the entire AGFL and reset it to an
empty state. Any valid blocks within the AGFL are intentionally
leaked. This requires xfs_repair to rectify (which was already
necessary based on the state the AGFL was found in). The reset
mitigates the side effect of the padding mismatch problem from a
filesystem crash to a free space accounting inconsistency. The
generic approach also means that this patch can be safely backported
to kernels with or without a packed struct xfs_agfl.
Check the AGF for an invalid freelist count on initial read from
disk. If detected, set a flag on the xfs_perag to indicate that a
reset is required before the AGFL can be used. In the first
transaction that attempts to use a flagged AGFL, reset it to empty,
warn the user about the inconsistency and allow the freelist fixup
code to repopulate the AGFL with new blocks. The xfs_perag flag is
cleared to eliminate the need for repeated checks on each block
allocation operation.
This allows kernels that include the packing fix commit 96f859d52bcb
("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct")
to handle older unpacked AGFL formats without a filesystem crash.
Suggested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by Dave Chiluk <chiluk+linuxxfs@indeed.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
The rmapbt perag metadata reservation reserves blocks for the
reverse mapping btree (rmapbt). Since the rmapbt uses blocks from
the agfl and perag accounting is updated as blocks are allocated
from the allocation btrees, the reservation actually accounts blocks
as they are allocated to (or freed from) the agfl rather than the
rmapbt itself.
While this works for blocks that are eventually used for the rmapbt,
not all agfl blocks are destined for the rmapbt. Blocks that are
allocated to the agfl (and thus "reserved" for the rmapbt) but then
used by another structure leads to a growing inconsistency over time
between the runtime tracking of rmapbt usage vs. actual rmapbt
usage. Since the runtime tracking thinks all agfl blocks are rmapbt
blocks, it essentially believes that less future reservation is
required to satisfy the rmapbt than what is actually necessary.
The inconsistency is rectified across mount cycles because the perag
reservation is initialized based on the actual rmapbt usage at mount
time. The problem, however, is that the excessive drain of the
reservation at runtime opens a window to allocate blocks for other
purposes that might be required for the rmapbt on a subsequent
mount. This problem can be demonstrated by a simple test that runs
an allocation workload to consume agfl blocks over time and then
observe the difference in the agfl reservation requirement across an
unmount/mount cycle:
mount ...: xfs_ag_resv_init: ... resv 3193 ask 3194 len 3194
...
... : xfs_ag_resv_alloc_extent: ... resv 2957 ask 3194 len 1
umount...: xfs_ag_resv_free: ... resv 2956 ask 3194 len 0
mount ...: xfs_ag_resv_init: ... resv 3052 ask 3194 len 3194
As the above tracepoints show, the reservation requirement reduces
from 3194 blocks to 2956 blocks as the workload runs. Without any
other changes in the filesystem, the same reservation requirement
jumps from 2956 to 3052 blocks over a umount/mount cycle.
To address this divergence, update the RMAPBT reservation to account
blocks used for the rmapbt only rather than all blocks filled into
the agfl. This patch makes several high-level changes toward that
end:
1.) Reintroduce an AGFL reservation type to serve as an accounting
no-op for blocks allocated to (or freed from) the AGFL.
2.) Invoke RMAPBT usage accounting from the actual rmapbt block
allocation path rather than the AGFL allocation path.
The first change is required because agfl blocks are considered free
blocks throughout their lifetime. The perag reservation subsystem is
invoked unconditionally by the allocation subsystem, so we need a
way to tell the perag subsystem (via the allocation subsystem) to
not make any accounting changes for blocks filled into the AGFL.
The second change causes the in-core RMAPBT reservation usage
accounting to remain consistent with the on-disk state at all times
and eliminates the risk of leaving the rmapbt reservation
underfilled.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
The AGFL perag reservation type accounts all allocations that feed
into (or are released from) the allocation group free list (agfl).
The purpose of the reservation is to support worst case conditions
for the reverse mapping btree (rmapbt). As such, the agfl
reservation usage accounting only considers rmapbt usage when the
in-core counters are initialized at mount time.
This implementation inconsistency leads to divergence of the in-core
and on-disk usage accounting over time. In preparation to resolve
this inconsistency and adjust the AGFL reservation into an rmapbt
specific reservation, rename the AGFL reservation type and
associated accounting fields to something more rmapbt-specific. Also
fix up a couple tracepoints that incorrectly use the AGFL
reservation type to pass the agfl state of the associated extent
where the raw reservation type is expected.
Note that this patch does not change perag reservation behavior.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
The AGFL size calculation is about to get more complex, so lets turn
the macro into a function first and remove the macro.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
[darrick: forward port to newer kernel, simplify the helper] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Yet another round of playing whack-a-mole with directory code that
asserts on corrupt on-disk metadata when it really should be returning
-EFSCORRUPTED instead of ASSERTing. Found by a xfs/391 crash while
lastbit fuzzing of ltail.bestcount.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Darrick J. Wong [Thu, 12 Apr 2018 15:34:11 +0000 (10:34 -0500)]
xfs_repair: fix getsubopt name definitions to use enums
Convert the getsubopt usage in xfs_repair to use enums and explicitly
initialized array elements, similar to mkfs. This also fixes the hole
in the o_opts table caused by 42fa89bc1b8dc8 ("xfs_repair: remove
pre_65_beta option") that causes segfaults in xfs/179 and xfs/202.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Fixes: 42fa89bc1b ("xfs_repair: remove pre_65_beta option") Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 12 Apr 2018 15:34:11 +0000 (10:34 -0500)]
xfs_scrub_all: use system encoding for lsblk output decoding
Don't hardcode utf-8 as the decoding scheme for lsblk output, since the
system could set it to anything else.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 12 Apr 2018 15:34:11 +0000 (10:34 -0500)]
xfs_scrub_all: escape paths being passed to systemd service instances
systemd doesn't like unit instance names with slashes in them, so it
replaces them with dashes when it invokes the service. However, it's
not smart enough to convert the dashes to something else, so when it
unescapes the instance name to feed to xfs_scrub, it turns all dashes
into slashes. "/moo-cow" becomes "-moo-cow" becomes "/moo/cow", which
is wrong. systemd actually /can/ escape the dashes correctly if it is
told that this is a path (and not a unit name), but it didn't do this
prior to January 2017, so fix this for them.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 12 Apr 2018 15:34:11 +0000 (10:34 -0500)]
xfs_scrub: disable private /tmp for scrub service
Don't make /tmp private when invoking xfs_scrub as a service, because
/tmp might contain or itself be an xfs filesystem mountpoint.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 12 Apr 2018 15:34:11 +0000 (10:34 -0500)]
xfs_scrub_all: report version
Make xfs_scrub_all -V report its version like the other xfs tools.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 12 Apr 2018 15:34:11 +0000 (10:34 -0500)]
xfs_scrub: refactor mountpoint finding code to use libfrog path code
Use the libfrog path finding code to determine if the argument being
passed in is a mountpoint, remove all mention of taking a block device
(we have never supported that) from the documentation, and fix some
potential memory leaks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 12 Apr 2018 15:34:11 +0000 (10:34 -0500)]
xfs_scrub: don't warn about confusing names if dir/file only writable by root
If we are scanning the directory entries or attribute names of a
dir/file and the inode can only be written by root, don't warn about
Unicode confusable names by default because the system administrator
presumably made the system like that. Also don't warn about really
short confusable names because of the high chance of collisions. If
the caller really wants all the output, they can run in verbose mode.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 12 Apr 2018 15:34:11 +0000 (10:34 -0500)]
xfs_scrub: use Unicode skeleton function to find confusing names
Drop the weak normalization-based Unicode name collision detection in
favor of the confusable name guidelines provided in Unicode TR36 & TR39.
This means that we transform the original name into its Unicode skeleton
in order to do hashing-based collision detection.
The Unicode skeleton is defined as nfd(translation(nfd(string))), which
is to say that it flattens sequences that render ambiguously into a
unambiguous format. For example, 'l' and '1' can render identically in
some typefaces, so they're both squashed to 'l'. From the skeletons we
can figure out if two names will look the same, and thereby complain
about them. The unicode spoofing is provided by libicu, hence the
switch away from libunistring.
Note that potentially confusable names are only worth an informational
warning, since it's entirely possible that with the system typefaces in
use, two names will render distinctly enough that users can tell the
difference.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 12 Apr 2018 15:34:11 +0000 (10:34 -0500)]
xfs_scrub: check name for suspicious characters
Look for suspicious characters in each name we process. This includes
control characters, text direction overrides, zero-width code points,
and names that mix characters from different directionalities.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 12 Apr 2018 15:34:11 +0000 (10:34 -0500)]
xfs_scrub: transition from libunistring to libicu for Unicode processing
Move off of libunistring and onto libicu for Unicode name scanning.
This will make it easy to warn about unicode code points that do not
belong in identifiers (directional overrides, zero width elements) and
warn about names that could render similarly enough to cause confusion.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 12 Apr 2018 15:34:11 +0000 (10:34 -0500)]
xfs_scrub: make name_entry a first class structure
Instead of open-coding the construction and hashtable insertion of name
entries, make name_entry a first class object. This means that we now
have name_entry_ prefix functions that take care of computing Unicode
normalized names as part of name_entry construction, and we pass around
the name_entries when we're looking for suspicious characters and
identically rendering names.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 12 Apr 2018 15:34:11 +0000 (10:34 -0500)]
xfs_scrub: communicate name problems via flagset instead of booleans
Use an unsigned int to pass around name error flags instead of booleans.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 12 Apr 2018 15:34:08 +0000 (10:34 -0500)]
xfs_scrub: don't complain about different normalization
Since there are different ways to normalize utf8 names, don't complain
when we find a name that is normalized in a different way than the NFKC
that we use to find duplicate names.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 12 Apr 2018 15:34:08 +0000 (10:34 -0500)]
xfs_scrub: only run ascii name checks if unicode name checker
Skip the ASCII name checks if the Unicode name checker is going to run,
since the latter covers everything that the former does.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 12 Apr 2018 15:34:08 +0000 (10:34 -0500)]
xfs_scrub: avoid buffer overflow when scanning attributes
Avoid a buffer overflow when we're formatting extended attribute names
for name checking. The kernel headers provide us with XATTR_NAME_MAX,
so we can rely on that.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Sandeen [Tue, 3 Apr 2018 16:13:58 +0000 (11:13 -0500)]
libxfs: warn about deprecation of irix, freebsd, darwin
It's not clear that anyone is using these platforms or if
they even build at this point. Get someone's attention if
they are trying to use it.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Sandeen [Tue, 3 Apr 2018 16:13:57 +0000 (11:13 -0500)]
xfs_repair: test XFS_SB_VERSION_SHAREDBIT only once
Remove 2 of the 3 identical tests for XFS_SB_VERSION_SHAREDBIT
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Sandeen [Tue, 3 Apr 2018 16:13:57 +0000 (11:13 -0500)]
xfsprogs: remove unused delete_attr_ok
delete_attr_ok is never set to anything but 1;
remove it and all associated code.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Sandeen [Tue, 3 Apr 2018 16:13:57 +0000 (11:13 -0500)]
xfs_repair: remove pre_65_beta option
Irix 6.5 was released 20 years ago. Remove this option from
the code. (nb: it's not present in the manpage.)
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
The fs_shared_allowed global was set to 1 and then ignored, and
in fact the feature is never actualy allowed. Remove it and
the last stragglers of the old features comment.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
fs_has_extflgbit_allowed is never set to anything but 1;
remove it and all associated code.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
fs_sb_feature_bits_allowed is never set to anything but 1;
remove it and all associated code.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
fs_aligned_inodes_allowed is never set to anything but 1;
remove it and all associated code.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
fs_has_extflgbit_allowed is never set to anything but 1;
remove it and all associated code.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Sandeen [Tue, 3 Apr 2018 16:13:56 +0000 (11:13 -0500)]
xfs_repair: remove unused fs_attributes2_allowed
fs_attributes2_allowed is never set to anything but 1;
remove it and all associated code.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Sandeen [Tue, 3 Apr 2018 16:13:49 +0000 (11:13 -0500)]
xfs_repair: remove unused fs_attributes_allowed
fs_attributes_allowed is never set to anything but 1;
remove it and all associated code.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Sandeen [Tue, 27 Mar 2018 22:43:37 +0000 (17:43 -0500)]
libfrog: handle NULL dir && blkdev in __fs_table_lookup_mount
If neither dir nor blkdev is set, dpath never gets set,
and then gets used (uninitalized) later on.
If we are asked where "nothing" is mounted, just return
"nowhere."
Fixes-coverity-id: 1433615
Fixes-coverity-id: 1433616 Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Sandeen [Tue, 27 Mar 2018 22:43:37 +0000 (17:43 -0500)]
xfs_scrub: initialize movon in xfs_scrub_connections
Given the logic in xfs_scrub_connections, it's possible to
fail all 3 tests and wind up checking an uninitialized moveon
variable at the end. Start out with "true" to avoid this and
move on even if all the conditions in the function are false.
Fixes-coverity-id: 1433617 Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Sandeen [Tue, 27 Mar 2018 22:43:37 +0000 (17:43 -0500)]
xfs_scrub: synchronize error levels & logging
Having only a subset of the five error_levels present in
the log_level[] array is asking for trouble when someone
tries to __str_log(S_PREEN ...) and overruns the array.
Tie it all together in a single structure that's
initialized together to make the mapping more obvious and
idiot-proof.
Fixes-coverity-id: 1433618 Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Tue, 27 Mar 2018 22:43:37 +0000 (17:43 -0500)]
xfs_spaceman: remove incorrect linux/fs.h include
Remove the direct linux/fs.h include from spaceman because all xfs
utilities should include xfs.h so that we can wrap missing kernel header
declarations properly.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Jan Tulak [Tue, 27 Mar 2018 02:27:31 +0000 (21:27 -0500)]
fsck.xfs: allow forced repairs using xfs_repair
The fsck.xfs script did nothing, because xfs doesn't need a fsck to be
run on every unclean shutdown. However, sometimes it may happen that the
root filesystem really requires the usage of xfs_repair and then it is a
hassle. This patch makes the situation a bit easier by detecting forced
checks (/forcefsck or fsck.mode=force) and invoking xfs_repair.
Signed-off-by: Jan Tulak <jtulak@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Jan Tulak [Tue, 27 Mar 2018 02:27:31 +0000 (21:27 -0500)]
xfs_repair: add flag -e to modify exit code for corrected errors
xfs_repair without -n ends with a return code 0 if it finished ok,
no matter if there were some errors in the fs, or not. The new flag
-e means that we can avoid screenscraping and parsing text output to
detect if an error was found (and corrected).
If something could not be corrected or in any other case than the "found
something but fixed it all," the behaviour with this flag is unchanged.
Signed-off-by: Jan Tulak <jtulak@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
[sandeen: make -e and -n exclusivity clear in manpage synopsis] Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Sandeen [Tue, 27 Mar 2018 02:27:31 +0000 (21:27 -0500)]
metadump/restore: don't use errno after fwrite/fread failures
fread/fwrite don't set errno, so printing out strerror(errno)
after a failure leads to incorrect and confusing messages:
# xfs_mdrestore pre_repair.meta pre_repair.img
xfs_mdrestore: error reading from file: Success
Don't return unset errno from write_index, either.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Tue, 27 Mar 2018 02:27:31 +0000 (21:27 -0500)]
mkfs: enable sparse inodes by default
Enable the sparse inode feature by default.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Tue, 27 Mar 2018 02:27:31 +0000 (21:27 -0500)]
xfs_fsr: refactor mountpoint finding to use libfrog paths functions
Refactor the mount-point finding code in fsr to use the libfrog helpers
instead of open-coding yet another routine.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Tue, 27 Mar 2018 02:27:28 +0000 (21:27 -0500)]
libfrog: fs_table_lookup_mount should realpath the argument
Call realpath on the dir argument so that we're comparing canonical
paths when looking for the mountpoint. This fixes the problem where
'/home/' doesn't match '/home' even though they refer to the same thing.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Sandeen [Tue, 27 Mar 2018 02:27:28 +0000 (21:27 -0500)]
xfs_repair: use custom ifork verifier in mv_orphanage
Now that we have a custom verifier which can ignore parent
inode numbers, use it in mv_orphanage() as well; orphan inodes
may have invalid parents, and we're about to reconnect
them anyway, so override that test when we get them off disk.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Tue, 27 Mar 2018 02:27:28 +0000 (21:27 -0500)]
xfs_repair: implement custom ifork verifiers
There are a few cases where an early stage of xfs_repair will write an
invalid inode fork buffer to signal to a later stage that it needs to
correct the value. This happens in phase 4 when we detect an inline
format directory with an invalid .. pointer. To avoid triggering the
ifork verifiers on this, inject a custom verifier for phase 6 that lets
this pass for now.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Baruch Siach [Tue, 27 Mar 2018 02:27:28 +0000 (21:27 -0500)]
xfs_scrub: add missing paths header
Fix the following build failure with musl libc:
xfs_scrub.c: In function ‘main’:
xfs_scrub.c:670:11: error: ‘_PATH_MOUNTED’ undeclared (first use in this function)
mtab = _PATH_MOUNTED;
^~~~~~~~~~~~~
Signed-off-by: Baruch Siach <baruch@tkos.co.il> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Baruch Siach [Tue, 27 Mar 2018 02:27:28 +0000 (21:27 -0500)]
workqueue: add missing pthreads header
Fix the following build failure with musl libc:
In file included from read_verify.c:25:0:
../include/workqueue.h:39:2: error: unknown type name 'pthread_t'
pthread_t *threads;
^~~~~~~~~
../include/workqueue.h:42:2: error: unknown type name 'pthread_mutex_t'
pthread_mutex_t lock;
^~~~~~~~~~~~~~~
../include/workqueue.h:43:2: error: unknown type name 'pthread_cond_t'
pthread_cond_t wakeup;
^~~~~~~~~~~~~~
Signed-off-by: Baruch Siach <baruch@tkos.co.il> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 9 Mar 2018 02:35:23 +0000 (20:35 -0600)]
xfs_repair: don't fail directory repairs when grabbing inodes
There are a few places where xfs_repair needs to be able to load a
damaged directory inode to perform repairs. Since inline data fork
verifiers can now be customized, refactor libxfs_iget to enable
repair to get at this so that we don't crash in phase 6.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 9 Mar 2018 02:35:23 +0000 (20:35 -0600)]
xfs_db: print transaction reservation type information
Create a new xfs_db command to print the transaction reservation info for
a given filesystem. This will make it easier to compare the calculations
made by the kernel and xfsprogs in case there is a discrepancy.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 9 Mar 2018 02:35:23 +0000 (20:35 -0600)]
xfs_scrub: don't try to scan xattrs if bstat says there aren't any
Only try to scan the extended attributes of a file if bstat says that
the file actually has any. Surprisingly, this reduces the phase 5
runtime by 40% if most of the files don't have attrs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 9 Mar 2018 02:35:23 +0000 (20:35 -0600)]
xfs_scrub: fix #include ordering to avoid build failure
Fix the ordering of the header includes in all the scrub source. We put
xfs.h first so that it will pull in include/linux.h which pulls in
linux/fs.h + whatever overrides are necessary (currently limited to
struct fsxattr) to make things work on this platform, and then we remove
the #includes for anything that will get pulled (directly or indirectly)
by xfs.h for cleanliness. Without this, a user compiling new xfsprogs
on a system with a 4.7 kernel gets this:
Building scrub
[CC] disk.o
In file included from ../include/xfs.h:37:0,
from disk.c:40:
../include/xfs/linux.h:185:8: error: redefinition of 'struct fsxattr'
struct fsxattr {
^~~~~~~
In file included from disk.c:31:0:
/usr/include/linux/fs.h:155:8: note: originally defined here
struct fsxattr {
^~~~~~~
gmake[2]: *** [../include/buildrules:60: disk.o] Error 1
gmake[1]: *** [include/buildrules:36: scrub] Error 2
make: *** [Makefile:77: default] Error 2
Reported-by: Mikael Magnusson <mikachu@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 9 Mar 2018 02:35:23 +0000 (20:35 -0600)]
xfs_scrub: don't ask user to run xfs_repair for only warnings
Don't advise the user to run xfs_repair on a filesystem that triggers
warnings but no errors; there's no corruption for it to fix.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 9 Mar 2018 02:35:23 +0000 (20:35 -0600)]
xfs_scrub: log operational messages when interactive
Record the summary of an interactive session in the system log so that
future support requests can get a better picture of what happened.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 9 Mar 2018 02:35:23 +0000 (20:35 -0600)]
xfs_db: don't crash in ablock if there's no inode
Make sure we actually have an inode selected before trying to unwrap its
attribute fork. Found via a crash in xfs/288 with project quotas
enabled.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 9 Mar 2018 02:35:22 +0000 (20:35 -0600)]
misc: fix gcc 7.3 warnings
Fix various compiler warnings that pop up in 7.3.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Sandeen [Fri, 9 Mar 2018 02:35:22 +0000 (20:35 -0600)]
xfsprogs: new libxfs-apply option for Signed-off-by: tag
Technically when a maintainer moves a patch from another project,
they should add their Signed-off-by: tag. Get that info automatically
from git-config, and add an option to to override it if desired,
to make that easy when cross-porting libxfs patches
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Sandeen [Fri, 9 Mar 2018 02:35:22 +0000 (20:35 -0600)]
xfsprogs: call libxfs_destroy from other utilities
Call libxfs_destroy() from xfs_copy, xfs_db, mkfs.xfs, and
xfs_repair to allow us to detect leaked items in these
utilities as well.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Sandeen [Fri, 9 Mar 2018 02:35:22 +0000 (20:35 -0600)]
libxfs: Catch non-empty zones on destroy
Create and use a kmem_zone_destroy which warns if we are
releasing a non-empty zone when the LIBXFS_LEAK_CHECK
environment variable is set, wire this into libxfs_destroy(),
and call that when various tools exit.
The LIBXFS_LEAK_CHECK environment variable also causes
the program to exit with failure when a leak is detected,
useful for failing automated tests if leaks are encountered.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Sandeen [Fri, 9 Mar 2018 02:35:22 +0000 (20:35 -0600)]
libxfs: move xfs_inode_zone to rdwr.c
The zone itself is created in rdwr.c, so define it there as
well, and add it to the list of externs in manage_zones along
with all the rest, for consistency.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Sandeen [Fri, 9 Mar 2018 02:35:22 +0000 (20:35 -0600)]
libxfs: add function to free all buffers in bcache
libxfs_bcache_purge simply moves all "free" buffers
onto the xfs_buf_freelist mru list; add a new function to
actually free them when we tear everything down, so leak
checkers don't go nuts about lots of unfreed xfs_bufs
at exit.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eric Sandeen [Fri, 9 Mar 2018 02:35:22 +0000 (20:35 -0600)]
libxfs: Replace XFS_BUF_SET_PTR with xfs_buf_associate_memory
We test the return value of the macro, but it returns
returns a side-effect which looks like failure. Write
a userspace-libxfs-specific version of xfs_buf_associate_memory
to make this code a tad more like the kernel, with a proper
return value to boot.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Using #!/usr/bin/env makes some package dependency tools
such as rpm complain given that it cannot verify package
dependencies. Making it explicit resolves this lint rant.
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Xiao Yang [Fri, 9 Mar 2018 02:35:20 +0000 (20:35 -0600)]
xfs_repair: Add missing braces to allow zeroing of corrupt log
When xlog_find_tail() fails to find the head or the tail, the missing
braces leads that an unparseable log always exits with status 2, even
if we've asked for -n or -L which should proceed. We can expose this
issue by xfstests case xfs/098.
Fixes: commit b04647edea32 ("xfs_repair: exit with status 2 if log dirtiness is unknown") Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Brian Foster [Fri, 9 Mar 2018 02:35:20 +0000 (20:35 -0600)]
xfs_io: support a basic extent swap command
Extent swap is a low level mechanism exported by XFS to facilitate
filesystem defragmentation. It is typically invoked by xfs_fsr under
conditions that will atomically adjust inode extent state without
loss of file data.
While xfs_fsr provides some debug capability to tailor its behavior,
it is not flexible enough to facilitate low level tests of the
extent swap mechanism. For example, xfs_fsr may skip swaps between
inodes that consist solely of preallocated extents because it
considers such files already 100% defragmented. Further, xfs_fsr
copies data between files where doing so may be unnecessary and thus
inefficient for lower level tests.
Add a basic swapext command to xfs_io that allows userspace
invocation of the command under more controlled conditions. This
facilites targeted tests without interference from xfs_fsr policy,
such as using files with only preallocated extents, known/expected
failure cases, etc. This command makes no effort to retain data
across the operation. As such, it is for testing purposes only.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 9 Mar 2018 02:35:20 +0000 (20:35 -0600)]
misc: enable link time optimization, if requested
Enable link time optimization (LTO) if the builder requests it. The
extra link optimization results in smaller binaries.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 9 Mar 2018 02:35:20 +0000 (20:35 -0600)]
misc: enable retpolines across all xfsprogs utilities
Detect and enable retpolines for all code, to mitigate Spectre v2
(branch target injection) on x86.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Don't use u32, use uint32_t, because this won't work in xfsprogs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
[sandeen: no-op commit, fixed previously to keep build working] Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
xfs_bmap_btalloc is given a range of file offset blocks that must be
allocated to some data/attr/cow fork. If the fork has an extent size
hint associated with it, the request will be enlarged on both ends to
try to satisfy the alignment hint. If free space is fragmentated,
sometimes we can allocate some blocks but not enough to fulfill any of
the requested range. Since bmapi_allocate always trims the new extent
mapping to match the originally requested range, this results in
bmapi_write returning zero and no mapping.
The consequences of this vary -- buffered writes will simply re-call
bmapi_write until it can satisfy at least one block from the original
request. Direct IO overwrites notice nmaps == 0 and return -ENOSPC
through the dio mechanism out to userspace with the weird result that
writes fail even when we have enough space because the ENOSPC return
overrides any partial write status. For direct CoW writes the situation
was disastrous because nobody notices us returning an invalid zero-length
wrong-offset mapping to iomap and the write goes off into space.
Therefore, if free space is so fragmented that we managed to allocate
some space but not enough to map into even a single block of the
original allocation request range, we should break the alignment hint in
order to guarantee at least some forward progress for the direct write.
If we return a short allocation to iomap_apply it'll call back about the
remaining blocks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Since the CoW fork only exists in memory, it is incorrect to update the
on-disk quota block counts when we modify the CoW fork. Unlike the data
fork, even real extents in the CoW fork are only delalloc-style
reservations (on-disk they're owned by the refcountbt) so they must not
be tracked in the on disk quota info. Ensure the i_delayed_blks
accounting reflects this too.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Move all the inode and quota accounting updates out of xfs_bmap_btalloc
in preparation for fixing some quota accounting problems with copy on
write.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Refactor inode verifier error reporting into a non-libxfs function so
that we aren't encoding the message format in libxfs. This also
changes the kernel dmesg output to resemble buffer verifier errors
more closely.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Remove the extent size hint and realtime inode relevant code from
the xfs_bmapi_reserve_delalloc since it is not called on the inode
with extent size hint set or on a realtime inode.
Signed-off-by: Shan Hai <shan.hai@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
By splitting the b_fspriv field into two different fields (b_log_item
and b_li_list). It's possible to get rid of an old ABI workaround, by
using the new b_log_item field to store xfs_buf_log_item separated from
the log items attached to the buffer, which will be linked in the new
b_li_list field.
This way, there is no more need to reorder the log items list to place
the buf_log_item at the beginning of the list, simplifying a bit the
logic to handle buffer IO.
This also opens the possibility to change buffer's log items list into a
proper list_head.
b_log_item field is still defined as a void *, because it is still used
by the log buffers to store xlog_in_core structures, and there is no
need to add an extra field on xfs_buf just for xlog_in_core.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: minor style changes] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[sandeen: b_li_list unused in userspace] Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Currently, we don't check sb_agblocks or sb_agblklog when we validate
the superblock, which means that we can fuzz garbage values into those
values and the mount succeeds. This leads to all sorts of UBSAN
warnings in xfs/350 since we can then coerce other parts of xfs into
shifting by ridiculously large values.
Once we've validated agblocks, make sure the agcount makes sense.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
[sandeen: fix up u32 usage now so we keep building] Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Eryu Guan reported seeing occasional hangs when running generic/269 with
a new fsstress that supports clonerange/deduperange. The cause of this
hang is an infinite loop when we convert the CoW fork extents from
unwritten to real just prior to writing the pages out; the infinite
loop happens because there's nothing in the CoW fork to convert, and so
it spins forever.
The fundamental issue here is that when we go to perform these CoW fork
conversions, we're supposed to have an extent waiting for us, but the
low space CoW reaper has snuck in and blown them away! There are four
conditions that can dissuade the reaper from touching our file -- no
reflink iflag; dirty page cache; writeback in progress; or directio in
progress. We check the four conditions prior to taking the locks, but
we neglect to recheck them once we have the locks, which is how we end
up whacking the writeback that's in progress.
Therefore, refactor the four checks into a helper function and call it
once again once we have the locks to make sure we really want to reap
the inode. While we're at it, add an ASSERT for this weird condition so
that we'll fail noisily if we ever screw this up again.
Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Tested-by: Eryu Guan <eguan@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
A btree format inode fork with zero records makes no sense, so reject it
if we see it, or else we can miscalculate memory allocations. Found by
zeroes fuzzing {a,u3}.bmbt.numrecs in xfs/{374,378,412} with KASAN.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
In the attribute leaf verifier, we can check for obviously bad values of
firstused and count so that later attempts at lasthash don't run off the
end of the memory buffer. Found by ones fuzzing hdr.count in xfs/400 with
KASAN.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
In xfs_scrub_dir_rec, we must walk through the directory block entries
to arrive at the offset given by the hash structure. If we blindly
trust the hash address, we can end up midway into a directory entry and
stray outside the block. Found by lastbit fuzzing lents[3].address in
xfs/390 with KASAN enabled.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
While we're scrubbing various btrees, cross-reference the records
with the other metadata.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Add a couple of functions to the refcount btrees that will be used
to cross-reference metadata against the refcountbt.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Add a couple of functions to the rmap btrees that will be used
to cross-reference metadata against the rmapbt.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Add a couple of functions to the inode btrees that will be used
to cross-reference metadata against the inobt.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Add a couple of functions to the free space btrees that will be used
to cross-reference metadata against the bnobt/cntbt, and a generic
btree function that provides the real implementation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Chris Dunlop reports a problem where an xattr operation fails,
reports the following error to syslog and hangs during unmount:
================================================
[ BUG: lock held when returning to user space! ]
...
------------------------------------------------
<PID> is leaving the kernel with locks still held!
1 lock held by <PID>:
#0: (sb_internal){......}, at: [<ffffffffa07692a3>] xfs_trans_alloc+0xe3/0x130 [xfs]
The failure/shutdown occurs during deferred ops processing which
leads to an error return from xfs_defer_finish() via
xfs_attr_leaf_addname(). While the root cause of the failure is
unknown corruption, the cause of the subsequent BUG above and
unmount hang is failure to cancel the transaction before returning
to userspace.
The transaction is not cancelled because the out_defer_cancel error
handling paths in the xfs_attr_[leaf|node]_[add|remove]name()
functions clear args.trans without releasing the transaction. The
callers therefore lose the reference to the transaction and fail to
cancel it.
Since xfs_attr_[set|remove]() always cancel args.trans when != NULL
and xfs_defer_finish()->...->xfs_trans_roll() should always return
with a valid transaction, update the leaf/node xattr functions to
not reset args.trans in the error path responsible for cancelling
deferred ops.
Reported-by: Chris Dunlop <chris@onthe.net.au> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>