git.ipfire.org Git - thirdparty/xfsprogs-dev.git/log
7 months ago libxfs: implement some sanity checking for enormous rgcount
Darrick J. Wong [Thu, 21 Nov 2024 00:24:33 +0000 (16:24 -0800)] 
libxfs: implement some sanity checking for enormous rgcount

Similar to what we do for suspiciously large sb_agcount values, if
someone tries to get libxfs to load a filesystem with a very large
realtime group count, let's do some basic checks of the rt device to
see if it's really that large.  If the read fails, only load the first
rtgroup and warn the user.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago libxfs: port userspace deferred log item to handle rtgroups
Darrick J. Wong [Thu, 21 Nov 2024 00:24:32 +0000 (16:24 -0800)] 
libxfs: port userspace deferred log item to handle rtgroups

Make the userspace log items handle rt groups correctly.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago man: document rgextents geom field
Christoph Hellwig [Fri, 20 Dec 2024 03:48:49 +0000 (19:48 -0800)] 
man: document rgextents geom field

Document the new rgextents geom field.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
7 months ago man: document the rt group geometry ioctl
Darrick J. Wong [Thu, 21 Nov 2024 00:24:32 +0000 (16:24 -0800)] 
man: document the rt group geometry ioctl

Document the new ioctl that retrieves realtime allocation group geometry
information.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago libfrog: scrub the realtime group superblock
Darrick J. Wong [Thu, 21 Nov 2024 00:24:30 +0000 (16:24 -0800)] 
libfrog: scrub the realtime group superblock

Enable scrubbing of realtime group superblocks in xfs_scrub, and
update the scrub ioctl documentation.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago libxfs: use correct rtx count to block count conversion
Darrick J. Wong [Thu, 21 Nov 2024 00:24:27 +0000 (16:24 -0800)] 
libxfs: use correct rtx count to block count conversion

Fix a place where we use the wrong conversion functions to convert
between a number of rt extents and a number of rt blocks.  This isn't
really necessary since userspace cannot allocate rt extents, but let's
not leave a logic bomb.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_repair,mkfs: port to libxfs_rt{bitmap,summary}_create
Darrick J. Wong [Thu, 21 Nov 2024 00:24:26 +0000 (16:24 -0800)] 
xfs_repair,mkfs: port to libxfs_rt{bitmap,summary}_create

Replace the open-coded rtbitmap and summary creation routines with the
ones in libxfs so that we share code.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago libxfs: adjust xfs_fsb_to_db to handle segmented rtblocks
Darrick J. Wong [Thu, 21 Nov 2024 00:24:26 +0000 (16:24 -0800)] 
libxfs: adjust xfs_fsb_to_db to handle segmented rtblocks

Update this function to handle segmented xfs_rtblock_t, just like we did
for the kernel.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago libxfs: remove XFS_ILOCK_RT*
Darrick J. Wong [Thu, 21 Nov 2024 00:24:25 +0000 (16:24 -0800)] 
libxfs: remove XFS_ILOCK_RT*

Now that we've centralized the realtime metadata locking routines, get
rid of the ILOCK subclasses since we now use explicit lockdep classes.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: return from xfs_symlink_verify early on V4 filesystems
Darrick J. Wong [Mon, 16 Dec 2024 02:18:44 +0000 (18:18 -0800)] 
xfs: return from xfs_symlink_verify early on V4 filesystems

Source kernel commit: 7f8b718c58783f3ff0810b39e2f62f50ba2549f6

V4 symlink blocks didn't have headers, so return early if this is a V4
filesystem.

Cc: <stable@vger.kernel.org> # v5.1
Fixes: 39708c20ab5133 ("xfs: miscellaneous verifier magic value fixups")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: fix sb_spino_align checks for large fsblock sizes
Darrick J. Wong [Mon, 16 Dec 2024 02:18:44 +0000 (18:18 -0800)] 
xfs: fix sb_spino_align checks for large fsblock sizes

Source kernel commit: 7f8a44f37229fc76bfcafa341a4b8862368ef44a

For a sparse inodes filesystem, mkfs.xfs computes the values of
sb_spino_align and sb_inoalignmt with the following code:

    int     cluster_size = XFS_INODE_BIG_CLUSTER_SIZE;

    if (cfg->sb_feat.crcs_enabled)
            cluster_size *= cfg->inodesize / XFS_DINODE_MIN_SIZE;

    sbp->sb_spino_align = cluster_size >> cfg->blocklog;
    sbp->sb_inoalignmt = XFS_INODES_PER_CHUNK *
                    cfg->inodesize >> cfg->blocklog;

On a V5 filesystem with 64k fsblocks and 512 byte inodes, this results
in cluster_size = 8192 * (512 / 256) = 16384.  As a result,
sb_spino_align and sb_inoalignmt are both set to zero.  Unfortunately,
this trips the new sb_spino_align check that was just added to
xfs_validate_sb_common, and the mkfs fails:

# mkfs.xfs -f -b size=64k, /dev/sda
meta-data=/dev/sda               isize=512    agcount=4, agsize=81136 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=1
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=1
         =                       exchange=0   metadir=0
data     =                       bsize=65536  blocks=324544, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=65536  ascii-ci=0, ftype=1, parent=0
log      =internal log           bsize=65536  blocks=5006, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=65536  blocks=0, rtextents=0
         =                       rgcount=0    rgsize=0 extents
Discarding blocks...Sparse inode alignment (0) is invalid.
Metadata corruption detected at 0x560ac5a80bbe, xfs_sb block 0x0/0x200
libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x1
mkfs.xfs: Releasing dirty buffer to free list!
found dirty buffer (bulk) on free list!
Sparse inode alignment (0) is invalid.
Metadata corruption detected at 0x560ac5a80bbe, xfs_sb block 0x0/0x200
libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x1
mkfs.xfs: writing AG headers failed, err=22

Prior to commit 59e43f5479cce1 this all worked fine, even if "sparse"
inodes are somewhat meaningless when everything fits in a single
fsblock.  Adjust the checks to handle existing filesystems.
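
The arithmetic above can be reproduced with a small standalone sketch.  The
three constants match the values in the XFS headers; the helper names
spino_align() and inoalignmt() are made up for this illustration and are not
part of mkfs.xfs or libxfs.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in the XFS headers. */
#define XFS_INODE_BIG_CLUSTER_SIZE 8192
#define XFS_DINODE_MIN_SIZE        256
#define XFS_INODES_PER_CHUNK       64

/* Hypothetical helper mirroring the sb_spino_align computation quoted above. */
static uint32_t spino_align(uint32_t inodesize, uint32_t blocklog, int crcs_enabled)
{
    uint32_t cluster_size = XFS_INODE_BIG_CLUSTER_SIZE;

    assert(blocklog < 32);
    if (crcs_enabled)
        cluster_size *= inodesize / XFS_DINODE_MIN_SIZE;
    /* Shifting by the block size log converts bytes to fsblocks. */
    return cluster_size >> blocklog;
}

/* Hypothetical helper mirroring the sb_inoalignmt computation quoted above. */
static uint32_t inoalignmt(uint32_t inodesize, uint32_t blocklog)
{
    assert(blocklog < 32);
    return (XFS_INODES_PER_CHUNK * inodesize) >> blocklog;
}
```

With 512 byte inodes and blocklog = 16 (64k fsblocks), both helpers return
zero, because 16384 and 32768 bytes are each smaller than one fsblock.  With
blocklog = 12 (4k fsblocks) they return 4 and 8.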

Cc: <stable@vger.kernel.org> # v6.13-rc1
Fixes: 59e43f5479cce1 ("xfs: sb_spino_align is not verified")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: update btree keys correctly when _insrec splits an inode root block
Darrick J. Wong [Mon, 16 Dec 2024 02:18:43 +0000 (18:18 -0800)] 
xfs: update btree keys correctly when _insrec splits an inode root block

Source kernel commit: 6d7b4bc1c3e00b1a25b7a05141a64337b4629337

In commit 2c813ad66a72, I partially fixed a bug wherein xfs_btree_insrec
would erroneously try to update the parent's key for a block that had
been split if we decided to insert the new record into the new block.
The solution was to detect this situation and update the in-core key
value that we pass up to the caller so that the caller will (eventually)
add the new block to the parent level of the tree with the correct key.

However, I missed a subtlety about the way inode-rooted btrees work.  If
the full block was a maximally sized inode root block, we'll solve that
fullness by moving the root block's records to a new block, resizing the
root block, and updating the root to point to the new block.  We don't
pass a pointer to the new block to the caller because that work has
already been done.  The new record will /always/ land in the new block,
so in this case we need to use xfs_btree_update_keys to update the keys.

This bug can theoretically manifest itself in the very rare case that we
split a bmbt root block and the new record lands in the very first slot
of the new block, though I've never managed to trigger it in practice.
However, it is very easy to reproduce by running generic/522 with the
realtime rmapbt patchset if rtinherit=1.

Cc: <stable@vger.kernel.org> # v4.8
Fixes: 2c813ad66a7218 ("xfs: support btrees with overlapping intervals for keys")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: fix error bailout in xfs_rtginode_create
Darrick J. Wong [Mon, 16 Dec 2024 02:18:42 +0000 (18:18 -0800)] 
xfs: fix error bailout in xfs_rtginode_create

Source kernel commit: 23bee6f390a12d0c4c51fefc083704bc5dac377e

smatch reported that we screwed up the error cleanup in this function.
Fix it.

Cc: <stable@vger.kernel.org> # v6.13-rc1
Fixes: ae897e0bed0f54 ("xfs: support creating per-RTG files in growfs")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: return a 64-bit block count from xfs_btree_count_blocks
Darrick J. Wong [Mon, 16 Dec 2024 02:18:41 +0000 (18:18 -0800)] 
xfs: return a 64-bit block count from xfs_btree_count_blocks

Source kernel commit: bd27c7bcdca25ce8067ebb94ded6ac1bd7b47317

With the nrext64 feature enabled, it's possible for a data fork to have
2^48 extent mappings.  Even with a 64k fsblock size, that maps out to
a bmbt containing more than 2^32 blocks.  Therefore, this predicate must
return a u64 count to avoid an integer wraparound that will cause scrub
to do the wrong thing.
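
A rough estimate shows why 32 bits are not enough.  The sketch below assumes
16-byte bmbt records and a 72-byte long-format block header, counts only
fully packed leaf blocks, and uses a hypothetical helper name; it is a lower
bound for illustration, not the kernel's counting code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch: record and header sizes on a V5 filesystem. */
#define BMBT_REC_SIZE   16u
#define BTREE_LHDR_SIZE 72u

/* Hypothetical helper: minimum number of bmbt leaf blocks for nextents mappings. */
static uint64_t bmbt_leaf_blocks(uint64_t nextents, uint32_t blocksize)
{
    uint64_t recs_per_block;

    assert(blocksize > BTREE_LHDR_SIZE + BMBT_REC_SIZE);
    recs_per_block = (blocksize - BTREE_LHDR_SIZE) / BMBT_REC_SIZE;
    /* Round up: a partially filled leaf still occupies a whole block. */
    return (nextents + recs_per_block - 1) / recs_per_block;
}
```

For 2^48 mappings and 64k blocks the result is well above UINT32_MAX, so
storing it in a 32-bit variable silently truncates the count.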

It's unlikely that any such filesystem currently exists, because the
incore bmbt would consume more than 64GB of kernel memory on its own,
and so far nobody except me has driven a filesystem that far, judging
from the lack of complaints.

Cc: <stable@vger.kernel.org> # v5.19
Fixes: df9ad5cc7a5240 ("xfs: Introduce macros to represent new maximum extent counts for data/attr forks")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: don't call xfs_bmap_same_rtgroup in xfs_bmap_add_extent_hole_delay
Christoph Hellwig [Mon, 16 Dec 2024 02:17:26 +0000 (18:17 -0800)] 
xfs: don't call xfs_bmap_same_rtgroup in xfs_bmap_add_extent_hole_delay

Source kernel commit: cc2dba08cc33daf8acd6e560957ef0e0f4d034ed

xfs_bmap_add_extent_hole_delay works entirely on delalloc extents, for
which xfs_bmap_same_rtgroup doesn't make sense.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
7 months ago xfs: switch to multigrain timestamps
Jeff Layton [Mon, 2 Dec 2024 19:03:30 +0000 (11:03 -0800)] 
xfs: switch to multigrain timestamps

Source kernel commit: 1cf7e834a6fb84de9d1e038d6cf4c5bd0d202ffa

Enable multigrain timestamps, which should ensure that there is an
apparent change to the timestamp whenever it has been written after
being actively observed via getattr.

Also, anytime the mtime changes, the ctime must also change, and those
are now the only two options for xfs_trans_ichgtime. Have that function
unconditionally bump the ctime, and ASSERT that XFS_ICHGTIME_CHG is
always set.

Finally, stop setting STATX_CHANGE_COOKIE in getattr, since the ctime
should give us better semantics now.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # documentation bits
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20241002-mgtime-v10-9-d1c4717f5284@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: fix sparse inode limits on runt AG
Dave Chinner [Mon, 25 Nov 2024 21:15:27 +0000 (13:15 -0800)] 
xfs: fix sparse inode limits on runt AG

Source kernel commit: 13325333582d4820d39b9e8f63d6a54e745585d9

The runt AG at the end of a filesystem is almost always smaller than
the mp->m_sb.sb_agblocks. Unfortunately, when setting the max_agbno
limit for the inode chunk allocation, we do not take this into
account. This means we can allocate a sparse inode chunk that
overlaps beyond the end of an AG. When we go to allocate an inode
from that sparse chunk, the irec fails validation because the
agbno of the start of the irec is beyond valid limits for the runt
AG.

Prevent this from happening by taking into account the size of the
runt AG when allocating inode chunks. Also convert the various
checks for valid inode chunk agbnos to use xfs_ag_block_count()
so that they will also catch such issues in the future.
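
As a minimal sketch of the idea, assuming a pared-down geometry structure:
the struct and both function names below are hypothetical stand-ins, and only
the notion of a per-AG block count comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry; the fields echo sb_dblocks, sb_agblocks, sb_agcount. */
struct geom {
    uint64_t dblocks;   /* total blocks on the data device */
    uint32_t agblocks;  /* nominal size of every AG but the last */
    uint32_t agcount;   /* number of AGs */
};

/* Size of a given AG; the last (runt) AG gets whatever is left over. */
static uint32_t ag_block_count(const struct geom *g, uint32_t agno)
{
    assert(agno < g->agcount);
    if (agno < g->agcount - 1)
        return g->agblocks;
    return (uint32_t)(g->dblocks - (uint64_t)agno * g->agblocks);
}

/* An inode chunk of len blocks starting at agbno must end inside this AG. */
static int chunk_fits(const struct geom *g, uint32_t agno,
                      uint32_t agbno, uint32_t len)
{
    uint32_t eoag = ag_block_count(g, agno);

    return agbno < eoag && len <= eoag - agbno;
}
```

Checking against the nominal AG size instead of the per-AG count would accept
a chunk that runs past the end of the runt AG, which is the failure described
above.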

Fixes: 56d1115c9bc7 ("xfs: allocate sparse inode chunks on full chunk allocation failure")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: remove unknown compat feature check in superblock write validation
Long Li [Mon, 25 Nov 2024 21:15:14 +0000 (13:15 -0800)] 
xfs: remove unknown compat feature check in superblock write validation

Source kernel commit: 652f03db897ba24f9c4b269e254ccc6cc01ff1b7

Compat features are new features that older kernels can safely ignore,
allowing read-write mounts without issues. The current sb write validation
implementation returns -EFSCORRUPTED for unknown compat features,
preventing filesystem write operations and contradicting the feature's
definition.

Additionally, if the mounted image is unclean, the log recovery may need
to write to the superblock. Returning an error for unknown compat features
during sb write validation can cause mount failures.

Although XFS currently does not use compat feature flags, this issue
affects current kernels' ability to mount images that may use compat
feature flags in the future.

Since superblock read validation already warns about unknown compat
features, it's unnecessary to repeat this warning during write validation.
Therefore, the relevant code in write validation is being removed.

Fixes: 9e037cb7972f ("xfs: check for unknown v5 feature bits in superblock write verifier")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
7 months ago xfs: port ondisk structure checks from xfs/122 to the kernel
Darrick J. Wong [Mon, 25 Nov 2024 21:14:27 +0000 (13:14 -0800)] 
xfs: port ondisk structure checks from xfs/122 to the kernel

Source kernel commit: 13877bc79d81354c53e91f3c86ac0f7bafe3ba7b

Check this with every kernel and userspace build, so we can drop the
nonsense in xfs/122.  Roughly drafted with:

sed -e 's/^offsetof/\tXFS_CHECK_OFFSET/g' \
    -e 's/^sizeof/\tXFS_CHECK_STRUCT_SIZE/g' \
    -e 's/ = \([0-9]*\)/,\t\t\t\1);/g' \
    -e 's/xfs_sb_t/struct xfs_dsb/g' \
    -e 's/),/,/g' \
    -e 's/xfs_\([a-z0-9_]*\)_t,/struct xfs_\1,/g' \
    < tests/xfs/122.out | sort

and then manual fixups.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: separate space btree structures in xfs_ondisk.h
Darrick J. Wong [Mon, 25 Nov 2024 21:14:27 +0000 (13:14 -0800)] 
xfs: separate space btree structures in xfs_ondisk.h

Source kernel commit: 131a883fffb1a194957dc0e400d9f627c7cd1924

Create a separate section for space management btrees so that they're
not mixed in with file structures.  Ignore the dsb stuff sprinkled
around for now, because we'll deal with that in a subsequent patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: convert struct typedefs in xfs_ondisk.h
Darrick J. Wong [Mon, 25 Nov 2024 21:14:27 +0000 (13:14 -0800)] 
xfs: convert struct typedefs in xfs_ondisk.h

Source kernel commit: 89b38282d1b0f34595f86193cb2bf96e6730060e

Replace xfs_foo_t with struct xfs_foo where appropriate.  The next patch
will import more checks from xfs/122, and it's easier to automate
deduplication if we don't have to reason about typedefs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: enable metadata directory feature
Darrick J. Wong [Mon, 25 Nov 2024 21:14:27 +0000 (13:14 -0800)] 
xfs: enable metadata directory feature

Source kernel commit: ea079efd365e60aa26efea24b57ced4c64640e75

Enable the metadata directory feature.  With this feature, all metadata
inodes are placed in the metadata directory, and the only inumbers in
the superblock are the roots of the two directory trees.

The RT device is now sharded into a number of rtgroups, where 0 rtgroups
mean that no RT extents are supported, and the traditional XFS stub RT
bitmap and summary inodes don't exist.  A single rtgroup gives roughly
identical behavior to the traditional RT setup, but now with checksummed
and self identifying free space metadata.

For quota, the quota options are read from the superblock unless
explicitly overridden via mount options.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: scrub quota file metapaths
Darrick J. Wong [Mon, 25 Nov 2024 21:14:26 +0000 (13:14 -0800)] 
xfs: scrub quota file metapaths

Source kernel commit: 128a055291ebbc156e219b83d03dc5e63e71d7ce

Enable online fsck for quota file metadata directory paths.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: use metadir for quota inodes
Darrick J. Wong [Mon, 25 Nov 2024 21:14:26 +0000 (13:14 -0800)] 
xfs: use metadir for quota inodes

Source kernel commit: e80fbe1ad8eff7d7d1363e14f1e493d84dd37c84

Store the quota inodes in the /quota metadata directory if metadir is
enabled.  This enables us to stop using the sb_[ugp]uotino fields in the
superblock.  From this point on, all metadata files will be children of
the metadata directory tree root.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: implement busy extent tracking for rtgroups
Darrick J. Wong [Mon, 25 Nov 2024 21:14:26 +0000 (13:14 -0800)] 
xfs: implement busy extent tracking for rtgroups

Source kernel commit: 7e85fc2394115db56be678b617ed646563926581

For rtgroups filesystems, track newly freed (rt) space through the log
until the rt EFIs have been committed to disk.  This way we ensure that
space cannot be reused until all traces of the old owner are gone.

As a fringe benefit, we now support -o discard on the realtime device.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: move the min and max group block numbers to xfs_group
Darrick J. Wong [Mon, 25 Nov 2024 21:14:26 +0000 (13:14 -0800)] 
xfs: move the min and max group block numbers to xfs_group

Source kernel commit: e0b5b97dde8e4737d06cb5888abd88373abc22df

Move the min and max agblock numbers to the generic xfs_group structure
so that we can start building validators for extents within an rtgroup.
While we're at it, use check_add_overflow for the extent length
computation because that has much better overflow checking.
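
In userspace, a comparable overflow-checked extent validator might look like
the sketch below.  It uses the GCC/Clang builtin __builtin_add_overflow in
place of the kernel's check_add_overflow macro, and the function name and
parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical validator: does the extent [bno, bno + len) lie within the
 * usable range [min_block, block_count) of a group?
 */
static bool extent_in_group(uint32_t min_block, uint32_t block_count,
                            uint32_t bno, uint32_t len)
{
    uint32_t end;

    assert(min_block <= block_count);
    if (len == 0 || bno < min_block)
        return false;
    /* The builtin returns true when bno + len wraps around 32 bits. */
    if (__builtin_add_overflow(bno, len, &end))
        return false;
    return end <= block_count;
}
```

Without the overflow check, an extent starting at 0xFFFFFFF0 with length 0x20
would wrap to an end of 0x10 and appear to fit inside a 1000-block group.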

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: adjust min_block usage in xfs_verify_agbno
Darrick J. Wong [Mon, 25 Nov 2024 21:14:26 +0000 (13:14 -0800)] 
xfs: adjust min_block usage in xfs_verify_agbno

Source kernel commit: ceaa0bd773e2d6d5726d6535f605ecd6b26d2fcc

There's some weird logic in xfs_verify_agbno -- min_block ought to be
the first agblock number in the AG that can be used by non-static
metadata.  However, we initialize it to the last agblock of the static
metadata, which works due to the <= check, even though this isn't
technically correct.

Change the check to < and set min_block to the next agblock past the
static metadata.  This hasn't been an issue up to now, but we're going
to move these things into the generic group struct, and this will cause
problems with rtgroups, where min_block can be zero for an rtgroup that
doesn't have a rt superblock.
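
The difference between the two conventions can be shown with a pair of
hypothetical predicates; the names and signatures are invented for this
sketch and do not match xfs_verify_agbno itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old convention: the bound is the LAST block of static metadata. */
static bool verify_bno_old(uint32_t last_static, uint32_t count, uint32_t bno)
{
    assert(last_static < count);
    return bno < count && !(bno <= last_static);
}

/* New convention: min_block is the FIRST block usable by other metadata. */
static bool verify_bno_new(uint32_t min_block, uint32_t count, uint32_t bno)
{
    assert(min_block <= count);
    return bno < count && !(bno < min_block);
}
```

For an AG whose static metadata ends at block 3, both forms reject block 3
and accept block 4.  The old form cannot express "every block is usable",
because a bound of zero still rejects block 0; the new form accepts block 0
when min_block is zero.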

Note that there's no user-visible impact with the old logic, so this
isn't a bug fix.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: make xfs_rtblock_t a segmented address like xfs_fsblock_t
Darrick J. Wong [Mon, 25 Nov 2024 21:14:25 +0000 (13:14 -0800)] 
xfs: make xfs_rtblock_t a segmented address like xfs_fsblock_t

Source kernel commit: 7195f240c6578caa9e24202a26aa612a7e8cba26

Now that we've finished adding allocation groups to the realtime volume,
let's make the file block mapping address (xfs_rtblock_t) a segmented
value just like we do on the data device.  This means that group number
and block number conversions can be done with shifting and masking
instead of integer division.
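
A segmented address of this kind can be sketched as follows.  The helper
names are hypothetical, and blklog stands for the log2 of the group size
rounded up to a power of two (the role that the rgblklog value plays).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: pack a group number and a block-within-group into one value. */
static uint64_t rtb_pack(uint32_t rgno, uint32_t rgbno, uint8_t blklog)
{
    assert(blklog > 0 && blklog < 32);
    assert(rgbno < (1u << blklog));
    return ((uint64_t)rgno << blklog) | rgbno;
}

/* Group number: the high bits, recovered with a shift. */
static uint32_t rtb_to_rgno(uint64_t rtb, uint8_t blklog)
{
    assert(blklog > 0 && blklog < 32);
    return (uint32_t)(rtb >> blklog);
}

/* Block within the group: the low bits, recovered with a mask. */
static uint32_t rtb_to_rgbno(uint64_t rtb, uint8_t blklog)
{
    assert(blklog > 0 && blklog < 32);
    return (uint32_t)(rtb & ((1ULL << blklog) - 1));
}
```

One consequence of this layout is that when the group size is not a power of
two, the address space has gaps between groups, so a segmented address is no
longer a linear block offset into the device.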

While in theory we could continue caching the rgno shift value in
m_rgblklog, the fact that we now always use the shift value means that
we have an opportunity to increase the redundancy of the rt geometry by
storing it in the ondisk superblock and adding more sb verifier code.
Extend the superblock to store the rgblklog value.

Now that we have segmented addresses, set the correct values in
m_groups[XG_TYPE_RTG] so that the xfs_group helpers work correctly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: create helpers to deal with rounding xfs_filblks_t to rtx boundaries
Darrick J. Wong [Mon, 25 Nov 2024 21:14:25 +0000 (13:14 -0800)] 
xfs: create helpers to deal with rounding xfs_filblks_t to rtx boundaries

Source kernel commit: 3f0205ebe71f92c1b98ca580de8df6eea631cfd2

We're about to segment xfs_rtblock_t addresses, so we must create
type-specific helpers to do rt extent rounding of file mapping block
lengths because the rtb helpers soon will not do the right thing there.
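
Rounding a plain block count to an rt extent boundary is ordinary modular
arithmetic, as in this sketch.  The helper names are hypothetical, and
rextsize stands for the rt extent size in blocks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: round a block count down to a multiple of the rt extent size. */
static uint64_t blen_rounddown_rtx(uint64_t blocks, uint32_t rextsize)
{
    assert(rextsize > 0);
    return blocks - blocks % rextsize;
}

/* Hypothetical: round a block count up to a multiple of the rt extent size. */
static uint64_t blen_roundup_rtx(uint64_t blocks, uint32_t rextsize)
{
    uint64_t rem;

    assert(rextsize > 0);
    rem = blocks % rextsize;
    return rem ? blocks + (rextsize - rem) : blocks;
}
```

This works on lengths and file offsets because they are linear quantities.
The same arithmetic applied to a segmented xfs_rtblock_t would mix the group
number bits into the remainder, which is presumably why separate
type-specific helpers are needed.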

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: create helpers to deal with rounding xfs_fileoff_t to rtx boundaries
Darrick J. Wong [Mon, 25 Nov 2024 21:14:25 +0000 (13:14 -0800)] 
xfs: create helpers to deal with rounding xfs_fileoff_t to rtx boundaries

Source kernel commit: fd7588fa6475771fe95f44011aea268c5d841da2

We're about to segment xfs_rtblock_t addresses, so we must create
type-specific helpers to do rt extent rounding of file block offsets
because the rtb helpers soon will not do the right thing there.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: mask off the rtbitmap and summary inodes when metadir in use
Darrick J. Wong [Mon, 25 Nov 2024 21:14:25 +0000 (13:14 -0800)] 
xfs: mask off the rtbitmap and summary inodes when metadir in use

Source kernel commit: ea99122b18ca6cf902417e1acbc19a197f662299

Set the rtbitmap and summary file inumbers to NULLFSINO in the
superblock and make sure they're zeroed whenever we write the superblock
to disk, to mimic mkfs behavior.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: scrub metadir paths for rtgroup metadata
Darrick J. Wong [Mon, 25 Nov 2024 21:14:24 +0000 (13:14 -0800)] 
xfs: scrub metadir paths for rtgroup metadata

Source kernel commit: a74923333d9c3bc7cae3f8820d5e80535dca1457

Add the code we need to scan the metadata directory paths of rt group
metadata files.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: scrub the realtime group superblock
Darrick J. Wong [Mon, 25 Nov 2024 21:14:24 +0000 (13:14 -0800)] 
xfs: scrub the realtime group superblock

Source kernel commit: 3f1bdf50ab1b9c94d0da010f8879895d29585fd9

Enable scrubbing of realtime group superblocks.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: make the RT allocator rtgroup aware
Christoph Hellwig [Mon, 25 Nov 2024 21:14:24 +0000 (13:14 -0800)] 
xfs: make the RT allocator rtgroup aware

Source kernel commit: d162491c5459f4dd72e65b72a2c864591668ec07

Make the allocator rtgroup aware by either picking a specific group if
there is a hint, or looping over all groups otherwise.  A simple rotor is
provided to pick the placement for initial allocations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: don't merge ioends across RTGs
Darrick J. Wong [Mon, 25 Nov 2024 21:14:24 +0000 (13:14 -0800)] 
xfs: don't merge ioends across RTGs

Source kernel commit: b91afef724710e3dc7d65a28105ffd7a4e861d69

Unlike AGs, RTGs don't always have metadata in their first blocks, and
thus we don't get automatic protection from merging I/O completions
across RTG boundaries.  Add code to set the IOMAP_F_BOUNDARY flag for
ioends that start at the first block of a RTG so that they never get
merged into the previous ioend.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: use realtime EFI to free extents when rtgroups are enabled
Darrick J. Wong [Mon, 25 Nov 2024 21:14:24 +0000 (13:14 -0800)] 
xfs: use realtime EFI to free extents when rtgroups are enabled

Source kernel commit: 44e69c9af159e61d4f765ff4805dd5b55f241597

When rmap is enabled, XFS expects a certain order of operations, which
is: 1) remove the file mapping, 2) remove the reverse mapping, and then
3) free the blocks.  When reflink is enabled, XFS replaces (3) with a
deferred refcount decrement operation that can schedule freeing the
blocks if that was the last refcount.

For realtime files, xfs_bmap_del_extent_real tries to do 1 and 3 in the
same transaction, which will break both rmap and reflink unless we
switch it to use realtime EFIs.  Both rmap and reflink depend on the
rtgroups feature, so let's turn on EFIs for all rtgroups filesystems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: support error injection when freeing rt extents
Darrick J. Wong [Mon, 25 Nov 2024 21:14:23 +0000 (13:14 -0800)] 
xfs: support error injection when freeing rt extents

Source kernel commit: fc91d9430e5dd2008ef6c1350fa15c1a0ed17f11

A handful of fstests expect to be able to test what happens when extent
free intents fail to actually free the extent.  Now that we're
supporting EFIs for realtime extents, add to xfs_rtfree_extent the same
injection point that exists in the regular extent freeing code.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: support logging EFIs for realtime extents
Darrick J. Wong [Mon, 25 Nov 2024 21:14:23 +0000 (13:14 -0800)] 
xfs: support logging EFIs for realtime extents

Source kernel commit: 4c8900bbf106592ce647285e308abd2a7f080d88

Teach the EFI mechanism how to free realtime extents.  We're going to
need this to enforce proper ordering of operations when we enable
realtime rmap.

Declare a new log intent item type (XFS_LI_EFI_RT) and a separate defer
ops for rt extents.  This keeps the ondisk artifacts and processing code
completely separate between the rt and non-rt cases.  Hopefully this
will make it easier to debug filesystem problems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: grow the realtime section when realtime groups are enabled
Darrick J. Wong [Mon, 25 Nov 2024 21:14:23 +0000 (13:14 -0800)] 
xfs: grow the realtime section when realtime groups are enabled

Source kernel commit: ee321351487ae00db147d570c8c2a43e10207386

Enable growing the rt section when realtime groups are enabled.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: encode the rtsummary in big endian format
Darrick J. Wong [Mon, 25 Nov 2024 21:14:23 +0000 (13:14 -0800)] 
xfs: encode the rtsummary in big endian format

Source kernel commit: a2c28367396a85f2d9cfb22acfcedcff08dd1c3c

Currently, the ondisk realtime summary file counters are accessed in
units of 32-bit words.  There's no endian translation of the contents of
this file, which means that Bad Things Happen(tm) if you go from
(say) x86 to powerpc.  Since we have a new feature flag, let's take the
opportunity to enforce an endianness on the file.  Encode the summary
information in big endian format, like most of the rest of the
filesystem.
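
Fixing the byte order means every reader and writer goes through a
conversion step regardless of host CPU.  A portable sketch of that step is
shown below; put_be32() and get_be32() are hypothetical names, not the
kernel's cpu_to_be32/be32_to_cpu helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: store a 32-bit counter most significant byte first. */
static void put_be32(uint8_t *p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Hypothetical: load a 32-bit counter stored most significant byte first. */
static uint32_t get_be32(const uint8_t *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

Because the functions work byte by byte, the bytes on disk are identical
whether the code runs on a little endian or a big endian machine.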

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: encode the rtbitmap in big endian format
Darrick J. Wong [Mon, 25 Nov 2024 21:14:23 +0000 (13:14 -0800)] 
xfs: encode the rtbitmap in big endian format

Source kernel commit: eba42c2c53c8b8905307b702c93dffef0719a896

Currently, the ondisk realtime bitmap file is accessed in units of
32-bit words.  There's no endian translation of the contents of this
file, which means that Bad Things Happen(tm) if you go from (say)
x86 to powerpc.  Since we have a new feature flag, let's take the
opportunity to enforce an endianness on the file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: add block headers to realtime bitmap and summary blocks
Darrick J. Wong [Mon, 25 Nov 2024 21:14:22 +0000 (13:14 -0800)] 
xfs: add block headers to realtime bitmap and summary blocks

Source kernel commit: 118895aa9513412b9077a8cae0bc63df8956f9b2

Upgrade rtbitmap and rtsummary blocks to have self describing metadata
like most every other thing in XFS.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: export the geometry of realtime groups to userspace
Darrick J. Wong [Mon, 25 Nov 2024 21:14:22 +0000 (13:14 -0800)] 
xfs: export the geometry of realtime groups to userspace

Source kernel commit: 3fa7a6d0c7eb264e469eaf1e3ef59b6793a853ee

Create an ioctl so that the kernel can report the status of realtime
groups to userspace.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: record rt group metadata errors in the health system
Darrick J. Wong [Mon, 25 Nov 2024 21:14:22 +0000 (13:14 -0800)] 
xfs: record rt group metadata errors in the health system

Source kernel commit: ab7bd650e17a392a205ec6b6c72b97cae18d43b4

Record the state of per-rtgroup metadata sickness in the rtgroup
structure for later reporting.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: add frextents to the lazysbcounters when rtgroups enabled
Darrick J. Wong [Mon, 25 Nov 2024 21:14:22 +0000 (13:14 -0800)] 
xfs: add frextents to the lazysbcounters when rtgroups enabled

Source kernel commit: 35537f25d23697716f0070ea0a6e8b3f1fe10196

Make the free rt extent count a part of the lazy sb counters when the
realtime groups feature is enabled.  This is possible because the patch
to recompute frextents from the rtbitmap during log recovery predates
the code adding rtgroup support, hence we know that the value will
always be correct during runtime.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: add a helper to prevent bmap merges across rtgroup boundaries
Christoph Hellwig [Mon, 25 Nov 2024 21:14:21 +0000 (13:14 -0800)] 
xfs: add a helper to prevent bmap merges across rtgroup boundaries

Source kernel commit: 8458c4944e10aa8119d9de88e257d60a3537263e

Except for the rt superblock, realtime groups do not store any metadata
at the start (or end) of the group.  There is nothing to prevent the
bmap code from merging allocations from multiple groups into a single
bmap record.  Add a helper to check for this case.
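
As a rough illustration of the check described above, here is a minimal
sketch.  It assumes fixed-size groups laid out back to back, and every
name in it (struct rt_extent, rt_group_of, rt_extents_mergeable) is
hypothetical rather than taken from libxfs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified extent record; not a libxfs structure. */
struct rt_extent {
	uint64_t start;	/* first rt block, counted from the start of the rt volume */
	uint64_t len;	/* length in rt blocks */
};

/* Assumption: groups are fixed-size and laid out back to back. */
static uint64_t rt_group_of(uint64_t rtbno, uint64_t blocks_per_group)
{
	return rtbno / blocks_per_group;
}

/* Return true if the two extents could share a single mapping record. */
static bool rt_extents_mergeable(const struct rt_extent *left,
				 const struct rt_extent *right,
				 uint64_t blocks_per_group)
{
	assert(blocks_per_group > 0 && left->len > 0 && right->len > 0);

	/* Not physically contiguous, so there is nothing to merge. */
	if (left->start + left->len != right->start)
		return false;

	/* Contiguous: the seam must not sit on a group boundary. */
	return rt_group_of(right->start - 1, blocks_per_group) ==
	       rt_group_of(right->start, blocks_per_group);
}
```

With 100 blocks per group, extents [90, 100) and [100, 105) are adjacent
on disk but belong to groups 0 and 1, so the sketch refuses to merge them.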

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: massage the commit message after pulling this into rtgroups]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: check that rtblock extents do not break rtsupers or rtgroups
Darrick J. Wong [Mon, 25 Nov 2024 21:14:21 +0000 (13:14 -0800)] 
xfs: check that rtblock extents do not break rtsupers or rtgroups

Source kernel commit: 9bb512734722d2815bb79e27850dddeeff10db90

Check that rt block pointers do not point to the realtime superblock and
that allocated rt space extents do not cross rtgroup boundaries.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: export realtime group geometry via XFS_FSOP_GEOM
Darrick J. Wong [Mon, 25 Nov 2024 21:14:21 +0000 (13:14 -0800)] 
xfs: export realtime group geometry via XFS_FSOP_GEOM

Source kernel commit: 8edde94d640153d645f85b94b2e1af8872c11ac8

Export the realtime geometry information so that userspace can query it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: update realtime super every time we update the primary fs super
Darrick J. Wong [Mon, 25 Nov 2024 21:14:21 +0000 (13:14 -0800)] 
xfs: update realtime super every time we update the primary fs super

Source kernel commit: 76d3be00df91a56f7c05142ed500f8f8544d5457

Every time we update parts of the primary filesystem superblock that are
echoed in the rt superblock, we must update the rt super.  Avoid
changing the log to support logging to the rt device by using ordered
buffers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: define the format of rt groups
Darrick J. Wong [Mon, 25 Nov 2024 21:14:21 +0000 (13:14 -0800)] 
xfs: define the format of rt groups

Source kernel commit: 96768e91511bfced6e9e537f4891157d909b13ee

Define the ondisk format of realtime group metadata, and a superblock
for realtime volumes.  rt supers are conditionally enabled by a
predicate function so that they can be disabled if we ever implement
zoned storage support for the realtime volume.

For rt group enabled file systems there is a separate bitmap and summary
file for each group and thus the number of bitmap and summary blocks
needs to be calculated differently.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago libfrog: add memchr_inv
Darrick J. Wong [Thu, 21 Nov 2024 00:24:27 +0000 (16:24 -0800)] 
libfrog: add memchr_inv

Add this kernel function so we can use it in userspace.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: make RT extent numbers relative to the rtgroup
Christoph Hellwig [Mon, 25 Nov 2024 21:14:20 +0000 (13:14 -0800)] 
xfs: make RT extent numbers relative to the rtgroup

Source kernel commit: f220f6da5f4ad7da538c39075cf57e829d5202f7

To prepare for adding per-rtgroup bitmap files, make the xfs_rtxnum_t
type encode the RT extent number relative to the rtgroup.  The biggest
part of this is to clearly distinguish between the relative extent number
that gets masked when converting from a global block number and length
values that just have a factor applied to them when converting from
file system blocks.
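
To make the distinction concrete, here is a small sketch.  It assumes
fixed-size groups and uses a modulo where the message says "masked"; the
struct and function names are hypothetical, not the libxfs ones.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry; not struct xfs_mount. */
struct rt_geom {
	uint64_t blocks_per_group;	/* rt blocks in each rtgroup */
	uint32_t blocks_per_rtx;	/* rt blocks in each rt extent */
};

/*
 * A position: strip the group part of the global block number first,
 * then scale what is left down to rt extents.
 */
static uint64_t rtb_to_group_rtx(const struct rt_geom *g, uint64_t rtbno)
{
	assert(g->blocks_per_group > 0 && g->blocks_per_rtx > 0);
	return (rtbno % g->blocks_per_group) / g->blocks_per_rtx;
}

/* A length: there is no group part to strip, so only the factor applies. */
static uint64_t blen_to_rtxlen(const struct rt_geom *g, uint64_t blen)
{
	assert(g->blocks_per_rtx > 0);
	return blen / g->blocks_per_rtx;
}
```

The same number gives different results depending on whether it is a
position or a length, which is why the two conversions must not be mixed
up.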

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: refactor xfs_rtsummary_blockcount
Christoph Hellwig [Mon, 25 Nov 2024 21:14:20 +0000 (13:14 -0800)] 
xfs: refactor xfs_rtsummary_blockcount

Source kernel commit: f8c5a8415f6e23fa5b6301635d8b451627efae1c

Make xfs_rtsummary_blockcount take all the required information from
the mount structure and return the number of summary levels from it
as well.  This cleans up many of the callers and prepares for making the
rtsummary files per-rtgroup where they need to look at different values.

This means we recalculate some values in some callers, but all these
calculations are cheap and outside the fast path, which seems like a
price worth paying.
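
A sketch of the shape of such a helper follows.  It assumes one summary
level per power of two of the extent count and one 32-bit counter per
level and bitmap block; struct rsum_geom and the unprefixed function
names are stand-ins, not the real libxfs definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields taken from the mount structure. */
struct rsum_geom {
	uint32_t blocksize;	/* filesystem block size in bytes */
	uint64_t rextents;	/* number of rt extents */
	uint64_t rbmblocks;	/* number of rt bitmap blocks */
};

static unsigned int floor_log2(uint64_t v)
{
	unsigned int n = 0;

	while (v >>= 1)
		n++;
	return n;
}

/* Return the summary size in blocks and the level count via rsumlevels. */
static uint64_t rtsummary_blockcount(const struct rsum_geom *mp,
				     unsigned int *rsumlevels)
{
	uint64_t bytes;

	assert(mp->blocksize > 0);
	*rsumlevels = floor_log2(mp->rextents) + 1;
	bytes = (uint64_t)*rsumlevels * mp->rbmblocks * sizeof(uint32_t);
	return (bytes + mp->blocksize - 1) / mp->blocksize;
}
```

Callers that need only one of the two results still pay for computing
both, which is the recalculation cost the message refers to.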

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: refactor xfs_rtbitmap_blockcount
Christoph Hellwig [Mon, 25 Nov 2024 21:14:20 +0000 (13:14 -0800)] 
xfs: refactor xfs_rtbitmap_blockcount

Source kernel commit: 5a7566c8d6b9b5c0aac34882f30448d29d9deafc

Rename the existing xfs_rtbitmap_blockcount to
xfs_rtbitmap_blockcount_len and add a new xfs_rtbitmap_blockcount wrapper
around it that takes the number of extents from the mount structure.

This will simplify the move to per-rtgroup bitmaps as those will need to
pass in the number of extents per rtgroup instead.
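
The split can be pictured roughly as follows.  This is a simplified
sketch that assumes one bit per rt extent and ignores any per-block
header; struct rbm_geom and the unprefixed function names are stand-ins
for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields taken from the mount structure. */
struct rbm_geom {
	uint32_t blocksize;	/* filesystem block size in bytes */
	uint64_t rextents;	/* number of rt extents in the volume */
};

/* The "_len" variant: the caller says how many extents to cover. */
static uint64_t rtbitmap_blockcount_len(const struct rbm_geom *mp,
					uint64_t rtextents)
{
	uint64_t bits_per_block;

	assert(mp->blocksize > 0);
	bits_per_block = (uint64_t)mp->blocksize * 8;
	return (rtextents + bits_per_block - 1) / bits_per_block;
}

/* The wrapper: take the extent count from the mount-like structure. */
static uint64_t rtbitmap_blockcount(const struct rbm_geom *mp)
{
	return rtbitmap_blockcount_len(mp, mp->rextents);
}
```

A per-group caller would use the "_len" variant with the extent count of
a single group instead of the volume-wide total.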

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: support creating per-RTG files in growfs
Christoph Hellwig [Mon, 25 Nov 2024 21:14:20 +0000 (13:14 -0800)] 
xfs: support creating per-RTG files in growfs

Source kernel commit: ae897e0bed0f5461a6b1c3259c7d899759ba2a62

To support adding new RT groups in growfs, we need to be able to create
the per-RT group files.  Add a new xfs_rtginode_create helper to create
a given per-RTG file.  Most of the code for that is shared, but the
details of the actual file are abstracted out using a new create method
in struct xfs_rtginode_ops.
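
The pattern of shared code plus a per-type create method might look like
the sketch below.  All names, types, and the placeholder sizes are
hypothetical; this is not the layout of struct xfs_rtginode_ops.

```c
#include <assert.h>
#include <stddef.h>

enum rtg_file_type { RTG_BITMAP, RTG_SUMMARY, RTG_TYPE_MAX };

/* Hypothetical stand-in for a per-group metadata file. */
struct rtg_file {
	enum rtg_file_type	type;
	unsigned int		rgno;	/* group that owns the file */
	unsigned long long	size;	/* initial size in bytes */
};

/* Per-type operations; only the create step differs between types. */
struct rtg_file_ops {
	const char	*name;
	int		(*create)(struct rtg_file *f);
};

/* Placeholder sizes, chosen only so the two methods differ visibly. */
static int bitmap_create(struct rtg_file *f)  { f->size = 4096; return 0; }
static int summary_create(struct rtg_file *f) { f->size = 8192; return 0; }

static const struct rtg_file_ops rtg_file_ops[RTG_TYPE_MAX] = {
	[RTG_BITMAP]  = { .name = "bitmap",  .create = bitmap_create },
	[RTG_SUMMARY] = { .name = "summary", .create = summary_create },
};

/* Shared part: validate, fill in common fields, then dispatch. */
static int rtg_file_create(struct rtg_file *f, enum rtg_file_type type,
			   unsigned int rgno)
{
	const struct rtg_file_ops *ops;

	assert(f != NULL);
	if ((unsigned int)type >= RTG_TYPE_MAX)
		return -1;

	ops = &rtg_file_ops[type];
	f->type = type;
	f->rgno = rgno;
	f->size = 0;
	return ops->create ? ops->create(f) : 0;
}
```

Adding another per-group file type then means adding one table entry and
one create method, without touching the shared path.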

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: move RT bitmap and summary information to the rtgroup
Christoph Hellwig [Mon, 25 Nov 2024 21:14:20 +0000 (13:14 -0800)] 
xfs: move RT bitmap and summary information to the rtgroup

Source kernel commit: e3088ae2dcae3c15d03d7970d4926c8095fd8c7c

Move the pointers to the RT bitmap and summary inodes as well as the
summary cache to the rtgroups structure to prepare for having
separate bitmap and summary inodes for each rtgroup.

Code using the inodes now needs to operate on a rtgroup.  Where easily
possible such code is converted to iterate over all rtgroups, else
rtgroup 0 (the only one that can currently exist) is hardcoded.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: add a xfs_bmap_free_rtblocks helper
Christoph Hellwig [Mon, 25 Nov 2024 21:14:19 +0000 (13:14 -0800)] 
xfs: add a xfs_bmap_free_rtblocks helper

Source kernel commit: 9c3cfb9c96eee7f1656ef165e1471e1778510f6f

Split the RT extent freeing logic from xfs_bmap_del_extent_real because
it will become more complicated when adding RT groups.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: support caching rtgroup metadata inodes
Darrick J. Wong [Mon, 25 Nov 2024 21:14:19 +0000 (13:14 -0800)] 
xfs: support caching rtgroup metadata inodes

Source kernel commit: 65b1231b8cea7fbe7362dceecfda76026d335536

Create the necessary per-rtgroup infrastructure that we need to load
metadata inodes into memory.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: add a lockdep class key for rtgroup inodes
Darrick J. Wong [Mon, 25 Nov 2024 21:14:19 +0000 (13:14 -0800)] 
xfs: add a lockdep class key for rtgroup inodes

Source kernel commit: c29237a65c8dbfade3c032763b66d495b8e8cb7a

Add a dynamic lockdep class key for rtgroup inodes.  This will enable
lockdep to deduce inconsistencies in the rtgroup metadata ILOCK locking
order.  Each class can have 8 subclasses, and for now we will only have
2 inodes per group.  This enables rtgroup order and inode order checks
when nesting ILOCKs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: define locking primitives for realtime groups
Darrick J. Wong [Mon, 25 Nov 2024 21:14:19 +0000 (13:14 -0800)] 
xfs: define locking primitives for realtime groups

Source kernel commit: 0e4875b3fb24c5bfdf685876c76713cda5a23b65

Define helper functions to lock all metadata inodes related to a
realtime group.  There's not much to look at now, but this will become
important when we add per-rtgroup metadata files and online fsck code
for them.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs: create incore realtime group structures
Darrick J. Wong [Mon, 25 Nov 2024 21:14:18 +0000 (13:14 -0800)] 
xfs: create incore realtime group structures

Source kernel commit: 87fe4c34a383d51ec75f254240bcd08828f4ce5a

Create an incore object that will contain information about a realtime
allocation group.  This will eventually enable us to shard the realtime
section in a similar manner to how we shard the data section, but for
now just a single object for the entire RT subvolume is created.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago mkfs: add a utility to generate protofiles
Darrick J. Wong [Thu, 21 Nov 2024 00:24:23 +0000 (16:24 -0800)] 
mkfs: add a utility to generate protofiles

Add a new utility to generate mkfs protofiles from a directory tree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago mkfs: support copying in xattrs
Darrick J. Wong [Thu, 21 Nov 2024 00:24:23 +0000 (16:24 -0800)] 
mkfs: support copying in xattrs

Update the protofile code to import extended attributes from the source
files.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago mkfs: support copying in large or sparse files
Darrick J. Wong [Thu, 21 Nov 2024 00:24:23 +0000 (16:24 -0800)] 
mkfs: support copying in large or sparse files

Restructure the protofile code to handle sparse files and files that are
larger than the program's address space.
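
One common way to handle both cases on Linux is to walk the source with
lseek(SEEK_DATA) and lseek(SEEK_HOLE) and stream each data region through
a fixed-size buffer.  The sketch below shows that general technique and
is not a copy of the protofile code; copy_sparse_fd and copy_sparse_file
are hypothetical names.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE		/* for SEEK_DATA and SEEK_HOLE on glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux values, in case _GNU_SOURCE was defined too late to take effect. */
#ifndef SEEK_DATA
# define SEEK_DATA 3
# define SEEK_HOLE 4
#endif

/* Copy only the data regions of in_fd to the same offsets in out_fd. */
static int copy_sparse_fd(int in_fd, int out_fd)
{
	char buf[65536];
	off_t pos = 0, hole, end;

	assert(in_fd >= 0 && out_fd >= 0);
	end = lseek(in_fd, 0, SEEK_END);
	if (end < 0)
		return -1;

	while (pos < end) {
		pos = lseek(in_fd, pos, SEEK_DATA);
		if (pos < 0) {
			if (errno == ENXIO)
				break;	/* nothing but a hole up to EOF */
			return -1;
		}
		hole = lseek(in_fd, pos, SEEK_HOLE);
		if (hole < 0)
			return -1;

		/* Stream the region; memory use does not depend on file size. */
		while (pos < hole) {
			size_t want = sizeof(buf);
			ssize_t got;

			if ((off_t)want > hole - pos)
				want = (size_t)(hole - pos);
			got = pread(in_fd, buf, want, pos);
			if (got <= 0)
				return -1;
			if (pwrite(out_fd, buf, (size_t)got, pos) != got)
				return -1;
			pos += got;
		}
	}
	/* Extend the output to the full size; skipped ranges stay unwritten. */
	return ftruncate(out_fd, end);
}

/* Convenience wrapper that opens both paths and copies src to dst. */
static int copy_sparse_file(const char *src, const char *dst)
{
	int in_fd, out_fd, ret;

	in_fd = open(src, O_RDONLY);
	if (in_fd < 0)
		return -1;
	out_fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (out_fd < 0) {
		close(in_fd);
		return -1;
	}
	ret = copy_sparse_fd(in_fd, out_fd);
	close(out_fd);
	close(in_fd);
	return ret;
}
```

Because the buffer is a fixed 64KiB, the file never has to be mapped or
read into memory in one piece.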

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago libxfs: resync libxfs_alloc_file_space interface with the kernel
Darrick J. Wong [Thu, 21 Nov 2024 00:24:22 +0000 (16:24 -0800)] 
libxfs: resync libxfs_alloc_file_space interface with the kernel

Make the userspace xfs_alloc_file_space behave (more or less) like the
kernel version, at least as far as the interface goes.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago mkfs.xfs: enable metadata directories
Darrick J. Wong [Thu, 21 Nov 2024 00:24:22 +0000 (16:24 -0800)] 
mkfs.xfs: enable metadata directories

Enable formatting filesystems with metadata directories.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_repair: refactor generate_rtinfo
Christoph Hellwig [Thu, 21 Nov 2024 00:24:22 +0000 (16:24 -0800)] 
xfs_repair: refactor generate_rtinfo

Move the allocation of the computed values into generate_rtinfo, and thus
make the variables holding them private in rt.c, and clean up a few
formatting nits.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
[djwong: move functions to fix build errors]
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_repair: do not count metadata directory files when doing quotacheck
Darrick J. Wong [Thu, 21 Nov 2024 00:24:21 +0000 (16:24 -0800)] 
xfs_repair: do not count metadata directory files when doing quotacheck

Previously, we stated that files in the metadata directory tree are not
counted in the dquot information.  Fix the offline quotacheck code in
xfs_repair and xfs_check to reflect this.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_repair: truncate and unmark orphaned metadata inodes
Darrick J. Wong [Thu, 21 Nov 2024 00:24:21 +0000 (16:24 -0800)] 
xfs_repair: truncate and unmark orphaned metadata inodes

If an inode claims to be a metadata inode but wasn't linked in either
directory tree, remove the attr fork and reset the data fork if the
contents weren't regular extent mappings before moving the inode to the
lost+found.

We don't ifree the inode, because it's possible that the inode was not
actually a metadata inode but simply got corrupted due to bitflips or
something, and we'd rather let the sysadmin examine what's left of the
file instead of photorec'ing it.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_repair: drop all the metadata directory files during pass 4
Darrick J. Wong [Thu, 21 Nov 2024 00:24:21 +0000 (16:24 -0800)] 
xfs_repair: drop all the metadata directory files during pass 4

Drop the entire metadata directory tree during pass 4 so that we can
reinitialize the entire tree in phase 6.  The existing metadata files
(rtbitmap, rtsummary, quotas) will be reattached to the newly rebuilt
directory tree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_repair: metadata dirs are never plausible root dirs
Darrick J. Wong [Thu, 21 Nov 2024 00:24:21 +0000 (16:24 -0800)] 
xfs_repair: metadata dirs are never plausible root dirs

Metadata directories are never candidates to be the root of the
user-accessible directory tree.  Update has_plausible_rootdir to ignore
them all, and to detect the case where the superblock incorrectly
thinks both trees have the same root.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_repair: adjust keep_fsinos to handle metadata directories
Darrick J. Wong [Thu, 21 Nov 2024 00:24:20 +0000 (16:24 -0800)] 
xfs_repair: adjust keep_fsinos to handle metadata directories

In keep_fsinos, mark the root of the metadata directory tree as inuse.
The realtime bitmap and summary files still come after the root
directories, so this is a fairly simple change to the loop test.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_repair: mark space used by metadata files
Darrick J. Wong [Thu, 21 Nov 2024 00:24:20 +0000 (16:24 -0800)] 
xfs_repair: mark space used by metadata files

Track space used by metadata files as a separate incore extent type.
This ensures that we can warn about cross-linked metadata files, even
though we are going to rebuild the entire metadata directory tree in the
end.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_repair: pass private data pointer to scan_lbtree
Darrick J. Wong [Thu, 21 Nov 2024 00:24:20 +0000 (16:24 -0800)] 
xfs_repair: pass private data pointer to scan_lbtree

Pass a private data pointer through scan_lbtree.  We'll use this
later when scanning the rtrmapbt to keep track of scan state.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_repair: update incore metadata state whenever we create new files
Darrick J. Wong [Thu, 21 Nov 2024 00:24:20 +0000 (16:24 -0800)] 
xfs_repair: update incore metadata state whenever we create new files

Make sure that we update our incore metadata inode bookkeeping whenever
we create new metadata files.  There will be many more of these later.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_repair: don't let metadata and regular files mix
Darrick J. Wong [Thu, 21 Nov 2024 00:24:20 +0000 (16:24 -0800)] 
xfs_repair: don't let metadata and regular files mix

Track whether or not inodes thought they were metadata inodes.  We
cannot allow metadata inodes to appear in the regular directory tree,
and we cannot allow regular inodes to appear in the metadata directory
tree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_repair: rebuild the metadata directory
Darrick J. Wong [Thu, 21 Nov 2024 00:24:19 +0000 (16:24 -0800)] 
xfs_repair: rebuild the metadata directory

Check the dirents in metadata directories for problems and repair them
if necessary.  Also make sure that the sb-rooted inodes (root, metadir
root, rt bitmap, rt summary) are always allocated in that order.

Note that xfs_repair will always rebuild the metadata directory tree
itself, so we only need to report problems, not fix them.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_repair: use libxfs_metafile_iget for quota/rt inodes
Darrick J. Wong [Thu, 21 Nov 2024 00:24:19 +0000 (16:24 -0800)] 
xfs_repair: use libxfs_metafile_iget for quota/rt inodes

Use the new iget function for these metadata files so that we can check
types, etc.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_repair: check metadata inode flag
Darrick J. Wong [Thu, 21 Nov 2024 00:24:19 +0000 (16:24 -0800)] 
xfs_repair: check metadata inode flag

Check whether or not the metadata inode flag is set appropriately.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_repair: refactor grabbing realtime metadata inodes
Darrick J. Wong [Thu, 21 Nov 2024 00:24:19 +0000 (16:24 -0800)] 
xfs_repair: refactor grabbing realtime metadata inodes

Create a helper function to grab a realtime metadata inode.  When
metadir arrives, the bitmap and summary inodes can float, so we'll
turn this function into a "load or allocate" function.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_repair: refactor root directory initialization
Darrick J. Wong [Thu, 21 Nov 2024 00:24:18 +0000 (16:24 -0800)] 
xfs_repair: refactor root directory initialization

Refactor root directory initialization into a separate function we can
call for both the root dir and the metadir.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_repair: refactor marking of metadata inodes
Darrick J. Wong [Thu, 21 Nov 2024 00:24:18 +0000 (16:24 -0800)] 
xfs_repair: refactor marking of metadata inodes

Refactor the mechanics of marking a metadata inode into a helper
function so that we don't have to open-code that for every single
metadata inode.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_repair: refactor fixing dotdot
Darrick J. Wong [Thu, 21 Nov 2024 00:24:18 +0000 (16:24 -0800)] 
xfs_repair: refactor fixing dotdot

Pull the code that fixes a directory's dot-dot entry into a separate
helper function so that we can call it on the rootdir and (later) the
metadir.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_repair: don't check metadata directory dirent inumbers
Darrick J. Wong [Thu, 21 Nov 2024 00:24:18 +0000 (16:24 -0800)] 
xfs_repair: don't check metadata directory dirent inumbers

Phase 6 always rebuilds the entire metadata directory tree, and repair
quietly ignores all the DIFLAG2_METADATA directory inodes that it finds.
As a result, none of the metadata directories are marked inuse in the
incore data.  Therefore, the is_inode_free checks are not valid for
anything we find in a metadata directory.

Therefore, avoid checking is_inode_free when scanning metadata directory
dirents.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_repair: handle sb_metadirino correctly when zeroing supers
Darrick J. Wong [Thu, 21 Nov 2024 00:24:17 +0000 (16:24 -0800)] 
xfs_repair: handle sb_metadirino correctly when zeroing supers

The metadata directory root inumber is now the last field in the
superblock, so extend the zeroing code to know about that.
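
The general idea of zeroing everything past the last known field can be
sketched as follows.  struct demo_sb is a made-up four-field layout, not
the real ondisk superblock, and zero_sb_tail is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Made-up layout for illustration; the real superblock is much larger. */
struct demo_sb {
	uint32_t magicnum;
	uint32_t blocksize;
	uint64_t rootino;
	uint64_t metadirino;	/* newest field, so it marks the end */
};

/*
 * Zero the part of the sector that lies past the last known field and
 * return how many bytes were cleared.
 */
static size_t zero_sb_tail(unsigned char *sector, size_t sector_size)
{
	size_t used = offsetof(struct demo_sb, metadirino) +
		      sizeof(((struct demo_sb *)0)->metadirino);

	assert(sector != NULL && used <= sector_size);
	memset(sector + used, 0, sector_size - used);
	return sector_size - used;
}
```

If the end offset were still computed from the previous last field, the
newest field itself would be wiped along with the padding, which is why
the boundary has to move whenever a field is appended.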

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_scrub: re-run metafile scrubbers during phase 5
Darrick J. Wong [Thu, 21 Nov 2024 00:24:17 +0000 (16:24 -0800)] 
xfs_scrub: re-run metafile scrubbers during phase 5

For metadata files on a metadir filesystem, re-run the scrubbers during
phase 5 to ensure that the metadata files are still connected.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_scrub: scan metadata directories during phase 3
Darrick J. Wong [Thu, 21 Nov 2024 00:24:17 +0000 (16:24 -0800)] 
xfs_scrub: scan metadata directories during phase 3

Scan metadata directories for correctness during phase 3.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_scrub: treat zero-length read verify as an IO error
Darrick J. Wong [Thu, 21 Nov 2024 00:24:17 +0000 (16:24 -0800)] 
xfs_scrub: treat zero-length read verify as an IO error

While doing some chaos testing on the xfs_scrub read verify code, I
noticed that if the device under a live filesystem gets resized while
scrub is running a media scan, reads will start returning 0.  This
causes read_verify() to run around in an infinite loop instead of
erroring out like it should.
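
The failure mode and the fix can be shown with a generic read loop.  This
is a sketch and not the xfs_scrub code; read_fully, read_fn, and the
fake_dev simulator are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Same shape as pread(), with a context pointer instead of an fd. */
typedef ssize_t (*read_fn)(void *ctx, void *buf, size_t len, off_t off);

/* Read exactly len bytes; return 0 on success or a negative errno. */
static int read_fully(read_fn rd, void *ctx, void *buf, size_t len, off_t off)
{
	char *p = buf;

	assert(rd != NULL);
	while (len > 0) {
		ssize_t got = rd(ctx, p, len, off);

		if (got < 0)
			return -errno;
		/*
		 * A zero-length read makes no progress.  Without this check
		 * the loop would retry the same offset forever.
		 */
		if (got == 0)
			return -EIO;
		p += got;
		off += got;
		len -= (size_t)got;
	}
	return 0;
}

/* Stand-in for a device that is only `size` bytes long. */
struct fake_dev {
	off_t size;
};

static ssize_t fake_read(void *ctx, void *buf, size_t len, off_t off)
{
	const struct fake_dev *d = ctx;

	if (off >= d->size)
		return 0;	/* past the end: reads return 0 */
	if ((off_t)len > d->size - off)
		len = (size_t)(d->size - off);
	memset(buf, 0, len);
	return (ssize_t)len;
}
```

Reading 64 bytes at offset 80 of a 100-byte fake device gets 20 bytes and
then a zero-length read, which the loop reports as -EIO instead of
spinning.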

Cc: <linux-xfs@vger.kernel.org> # v5.3.0
Fixes: 27464242956fac ("xfs_scrub: fix read verify disk error handling strategy")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_spaceman: report health of metadir inodes too
Darrick J. Wong [Thu, 21 Nov 2024 00:24:17 +0000 (16:24 -0800)] 
xfs_spaceman: report health of metadir inodes too

If the filesystem has a metadata directory tree, we should include those
inodes in the health report.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_io: support scrubbing metadata directory paths
Darrick J. Wong [Thu, 21 Nov 2024 00:24:16 +0000 (16:24 -0800)] 
xfs_io: support scrubbing metadata directory paths

Support invoking the metadata directory path scrubber from xfs_io for
testing.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_io: support flag for limited bulkstat of the metadata directory
Darrick J. Wong [Thu, 21 Nov 2024 00:24:16 +0000 (16:24 -0800)] 
xfs_io: support flag for limited bulkstat of the metadata directory

Support the new XFS_BULK_IREQ_METADIR flag for bulkstat commands.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_db: drop the metadata checking code from blockget
Darrick J. Wong [Wed, 11 Dec 2024 22:48:33 +0000 (14:48 -0800)] 
xfs_db: drop the metadata checking code from blockget

Drop the check subcommand and all the metadata checking code from
xfs_db.  We haven't shipped xfs_check in xfsprogs in a decade and the
last known user (fstests) stopped calling it back in July 2024.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_db: display di_metatype
Darrick J. Wong [Thu, 21 Nov 2024 00:24:16 +0000 (16:24 -0800)] 
xfs_db: display di_metatype

Print the metadata file type if available.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_db: show the metadata root directory when dumping superblocks
Darrick J. Wong [Thu, 21 Nov 2024 00:24:16 +0000 (16:24 -0800)] 
xfs_db: show the metadata root directory when dumping superblocks

Show the metadirino field when appropriate.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_db: support metadata directories in the path command
Darrick J. Wong [Thu, 21 Nov 2024 00:24:15 +0000 (16:24 -0800)] 
xfs_db: support metadata directories in the path command

Teach various directory tree debugger commands to traverse the metadata
directory tree by adding a -m switch to select that tree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_db: don't obfuscate metadata directories and attributes
Darrick J. Wong [Thu, 21 Nov 2024 00:24:15 +0000 (16:24 -0800)] 
xfs_db: don't obfuscate metadata directories and attributes

Don't obfuscate the directory and attribute names of metadata inodes.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_db: report metadir support for version command
Darrick J. Wong [Thu, 21 Nov 2024 00:24:15 +0000 (16:24 -0800)] 
xfs_db: report metadir support for version command

Report metadir support if we have it enabled.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_db: disable xfs_check when metadir is enabled
Darrick J. Wong [Thu, 21 Nov 2024 00:24:15 +0000 (16:24 -0800)] 
xfs_db: disable xfs_check when metadir is enabled

As of July 2024, xfs_repair can detect more types of corruptions than
xfs_check does.  I don't think it makes sense to maintain the xfs_check
code anymore, so let's just turn it off for any filesystem that has
metadata directory trees.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago xfs_io: support scrubbing metadata directory paths
Darrick J. Wong [Thu, 21 Nov 2024 00:24:14 +0000 (16:24 -0800)] 
xfs_io: support scrubbing metadata directory paths

Support invoking the metadata directory path scrubber from xfs_io for
testing.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
7 months ago libfrog: allow METADIR in xfrog_bulkstat_single5
Darrick J. Wong [Thu, 21 Nov 2024 00:24:14 +0000 (16:24 -0800)] 
libfrog: allow METADIR in xfrog_bulkstat_single5

This is a valid flag for a single-file bulkstat, so add that to the
filter.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>