git.ipfire.org Git - thirdparty/xfsprogs-dev.git/log
4 months ago  xfs_repair: check existing realtime refcountbt entries against observed refcounts
Darrick J. Wong [Mon, 24 Feb 2025 18:22:05 +0000 (10:22 -0800)] 
xfs_repair: check existing realtime refcountbt entries against observed refcounts

Once we've finished collecting reverse mapping observations from the
metadata scan, check those observations against the realtime refcount
btree (particularly if we're in -n mode) to detect rtrefcountbt
problems.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_repair: compute refcount data for the realtime groups
Darrick J. Wong [Mon, 24 Feb 2025 18:22:05 +0000 (10:22 -0800)] 
xfs_repair: compute refcount data for the realtime groups

At the end of phase 4, compute reference count information for realtime
groups from the realtime rmap information collected, just like we do for
AGs in the data section.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_repair: find and mark the rtrefcountbt inode
Darrick J. Wong [Mon, 24 Feb 2025 18:22:05 +0000 (10:22 -0800)] 
xfs_repair: find and mark the rtrefcountbt inode

Make sure that we find the realtime refcountbt inode and mark it
appropriately, just in case we find a rogue inode claiming to
be an rtrefcount, or just plain garbage in the superblock field.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_repair: use realtime refcount btree data to check block types
Darrick J. Wong [Mon, 24 Feb 2025 18:22:05 +0000 (10:22 -0800)] 
xfs_repair: use realtime refcount btree data to check block types

Use the realtime refcount btree to pre-populate the block type information
so that when repair iterates the primary metadata, we can confirm the
block type.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_repair: allow CoW staging extents in the realtime rmap records
Darrick J. Wong [Mon, 24 Feb 2025 18:22:05 +0000 (10:22 -0800)] 
xfs_repair: allow CoW staging extents in the realtime rmap records

Don't flag the rt rmap btree as having errors if there are CoW staging
extent records in it and the filesystem supports reflink.  Leftover
staging extents will be reported when we scan the rt refcount btree, in
a future patch.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_spaceman: report health of the realtime refcount btree
Darrick J. Wong [Mon, 24 Feb 2025 18:22:04 +0000 (10:22 -0800)] 
xfs_spaceman: report health of the realtime refcount btree

Report the health of the realtime reference count btree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_db: add rtrefcount reservations to the rgresv command
Darrick J. Wong [Mon, 24 Feb 2025 18:22:04 +0000 (10:22 -0800)] 
xfs_db: add rtrefcount reservations to the rgresv command

Report rt refcount btree reservations in the rgresv subcommand output.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_db: copy the realtime refcount btree
Darrick J. Wong [Mon, 24 Feb 2025 18:22:04 +0000 (10:22 -0800)] 
xfs_db: copy the realtime refcount btree

Copy the realtime refcountbt when we're metadumping the filesystem.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_db: support the realtime refcountbt
Darrick J. Wong [Mon, 24 Feb 2025 18:22:04 +0000 (10:22 -0800)] 
xfs_db: support the realtime refcountbt

Wire up various parts of xfs_db for realtime refcount support so that we
can dump the rt refcount btree contents.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_db: display the realtime refcount btree contents
Darrick J. Wong [Mon, 24 Feb 2025 18:22:03 +0000 (10:22 -0800)] 
xfs_db: display the realtime refcount btree contents

Implement all the code we need to dump rtrefcountbt contents, starting
from the inode root.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  man: document userspace API changes due to rt reflink
Darrick J. Wong [Mon, 24 Feb 2025 18:22:03 +0000 (10:22 -0800)] 
man: document userspace API changes due to rt reflink

Update documentation to describe userspace ABI changes made for realtime
reflink support.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  libfrog: enable scrubbing of the realtime refcount data
Darrick J. Wong [Mon, 24 Feb 2025 18:22:03 +0000 (10:22 -0800)] 
libfrog: enable scrubbing of the realtime refcount data

Add a new entry so that we can scrub the rtrefcountbt and its metadata
directory tree path.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  libxfs: apply rt extent alignment constraints to CoW extsize hint
Darrick J. Wong [Mon, 24 Feb 2025 18:22:03 +0000 (10:22 -0800)] 
libxfs: apply rt extent alignment constraints to CoW extsize hint

The copy-on-write extent size hint is subject to the same alignment
constraints as the regular extent size hint.  Since we're in the process
of adding reflink (and therefore CoW) to the realtime device, we must
apply the same scattered rextsize alignment validation strategies to
both hints to deal with the possibility of rextsize changing.

Therefore, fix the inode validator to perform rextsize alignment checks
on regular realtime files, and to remove misaligned directory hints.
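
The alignment rule described above can be illustrated with a minimal
sketch.  The function names and the choice to express both values in
filesystem blocks are assumptions made for illustration; this is not the
actual inode validator code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: a hint is acceptable on a realtime file only if
 * it is a whole multiple of the realtime extent size.  A hint of zero
 * means "no hint" and is always aligned.
 */
static bool hint_is_rext_aligned(uint32_t hint_fsb, uint32_t rextsize_fsb)
{
    assert(rextsize_fsb > 0);
    return hint_fsb % rextsize_fsb == 0;
}

/*
 * Hypothetical helper: a directory only carries the hint for its
 * children to inherit, so a misaligned value is dropped (set to zero)
 * rather than treated as corruption.
 */
static uint32_t sanitize_dir_hint(uint32_t hint_fsb, uint32_t rextsize_fsb)
{
    return hint_is_rext_aligned(hint_fsb, rextsize_fsb) ? hint_fsb : 0;
}
```

The same check would apply to both the regular and the CoW extent size
hint, which is the point of the change.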

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  libxfs: add a realtime flag to the refcount update log redo items
Darrick J. Wong [Mon, 24 Feb 2025 18:22:03 +0000 (10:22 -0800)] 
libxfs: add a realtime flag to the refcount update log redo items

Extend the refcount update (CUI) log items with a new realtime flag that
indicates that the updates apply against the realtime refcountbt.  We'll
wire up the actual refcount code later.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  libxfs: compute the rt refcount btree maxlevels during initialization
Darrick J. Wong [Mon, 24 Feb 2025 18:22:02 +0000 (10:22 -0800)] 
libxfs: compute the rt refcount btree maxlevels during initialization

Compute max rt refcount btree height information when we set up libxfs.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  mkfs: create the realtime rmap inode
Darrick J. Wong [Mon, 24 Feb 2025 18:22:02 +0000 (10:22 -0800)] 
mkfs: create the realtime rmap inode

Create a realtime rmapbt inode if we format the fs with realtime
and rmap.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_logprint: report realtime RUIs
Darrick J. Wong [Mon, 24 Feb 2025 18:22:02 +0000 (10:22 -0800)] 
xfs_logprint: report realtime RUIs

Decode the RUI format just enough to report if an RUI targets the
realtime device or not.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_repair: reserve per-AG space while rebuilding rt metadata
Darrick J. Wong [Mon, 24 Feb 2025 18:22:02 +0000 (10:22 -0800)] 
xfs_repair: reserve per-AG space while rebuilding rt metadata

Realtime metadata btrees can consume quite a bit of space on a full
filesystem.  Since the metadata are just regular files, we need to
make the per-AG reservations to avoid overfilling any of the AGs while
rebuilding metadata.  This avoids the situation where a filesystem comes
straight from repair and immediately trips over not having enough space
in an AG.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_repair: rebuild the bmap btree for realtime files
Darrick J. Wong [Mon, 24 Feb 2025 18:22:02 +0000 (10:22 -0800)] 
xfs_repair: rebuild the bmap btree for realtime files

Use the realtime rmap btree information to rebuild an inode's data fork
when appropriate.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_repair: check for global free space concerns with default btree slack levels
Darrick J. Wong [Mon, 24 Feb 2025 18:22:01 +0000 (10:22 -0800)] 
xfs_repair: check for global free space concerns with default btree slack levels

It's possible that before repair was started, the filesystem might have
been nearly full, and its metadata btree blocks could all have been
nearly full.  If we then rebuild the btrees with blocks that are only
75% full, that expansion might be enough to run out of free space.  The
solution to this is to pack the new blocks completely full if we fear
running out of space.

Previously, we only had to check and decide that on a per-AG basis.
However, now that XFS can have filesystems with metadata btrees rooted
in inodes, we have a global free space concern because there might be
enough space in each AG to regenerate the AG btrees at 75%, but that
might not leave enough space to regenerate the inode btrees, even if we
fill those blocks to 100%.

Hence we need to precompute the worst case space usage for all btrees in
the filesystem and compare /that/ against the global free space to
decide if we're going to pack the btrees maximally to conserve space.
That decision can override the per-AG determination.
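
A rough sketch of that global decision follows.  The structure, the
function names, and the restriction to leaf blocks are assumptions made
to keep the example short; the real code has to account for every level
of every btree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-btree estimate gathered before rebuilding. */
struct btree_estimate {
    uint64_t nr_recs;   /* records the rebuilt btree must hold */
    uint64_t max_recs;  /* records that fit in a completely full block */
};

/* Blocks needed to hold nr_recs at recs_per_block records per block. */
static uint64_t leaf_blocks(uint64_t nr_recs, uint64_t recs_per_block)
{
    assert(recs_per_block > 0);
    return (nr_recs + recs_per_block - 1) / recs_per_block;
}

/*
 * Sum the space every btree would need if rebuilt with blocks that are
 * only 75% full, and compare the total against the free space in the
 * whole filesystem.  Returns true if the btrees should be packed
 * completely full instead.
 */
static bool need_maximal_packing(const struct btree_estimate *bt, size_t nr,
                                 uint64_t global_free_blocks)
{
    uint64_t worst_case = 0;

    for (size_t i = 0; i < nr; i++) {
        uint64_t slack_recs = bt[i].max_recs * 3 / 4;

        if (slack_recs == 0)
            slack_recs = 1;
        worst_case += leaf_blocks(bt[i].nr_recs, slack_recs);
    }
    return worst_case > global_free_blocks;
}
```

A true result here would override any per-AG decision to leave slack.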

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_repair: rebuild the realtime rmap btree
Darrick J. Wong [Mon, 24 Feb 2025 18:22:01 +0000 (10:22 -0800)] 
xfs_repair: rebuild the realtime rmap btree

Rebuild the realtime rmap btree file from the reverse mapping records we
gathered from walking the inodes.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_repair: always check realtime file mappings against incore info
Darrick J. Wong [Mon, 24 Feb 2025 18:22:01 +0000 (10:22 -0800)] 
xfs_repair: always check realtime file mappings against incore info

Curiously, the xfs_repair code that processes data fork mappings of
realtime files doesn't actually compare the mappings against the incore
state map during the !check_dups phase (aka phase 3).  As a result, we
lose the opportunity to clear damaged realtime data forks before we get
to crosslinked file checking in phase 4, which results in ondisk
metadata errors calling do_error, which aborts repair.

Split the process_rt_rec_state code into two functions: one to check the
mapping, and another to update the incore state.  The first one can be
called to help us decide if we're going to zap the fork, and the second
one updates the incore state if we decide to keep the fork.  We already
do this for regular data files.
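
The check/update split can be sketched as below.  The state enum, the
flat array standing in for the incore state map, and both function names
are assumptions for illustration; they do not mirror the actual
process_rt_rec_state code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical incore state: one entry per realtime extent. */
enum blk_state { BLK_FREE, BLK_INUSE };

/*
 * Check only: report whether [start, start + len) can be claimed without
 * colliding with extents already in use.  The map is not modified, so the
 * caller can still decide to zap the fork.
 */
static bool mapping_is_clean(const enum blk_state *map, size_t map_len,
                             size_t start, size_t len)
{
    if (len == 0 || start >= map_len || len > map_len - start)
        return false;
    for (size_t i = start; i < start + len; i++)
        if (map[i] != BLK_FREE)
            return false;
    return true;
}

/* Update only: record the mapping once the fork is being kept. */
static void mapping_claim(enum blk_state *map, size_t map_len,
                          size_t start, size_t len)
{
    assert(start < map_len && len <= map_len - start);
    for (size_t i = start; i < start + len; i++)
        map[i] = BLK_INUSE;
}
```

Running the check over every mapping first, and claiming only after all
of them pass, keeps a damaged fork from leaving partial state behind.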

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_repair: check existing realtime rmapbt entries against observed rmaps
Darrick J. Wong [Mon, 24 Feb 2025 18:22:01 +0000 (10:22 -0800)] 
xfs_repair: check existing realtime rmapbt entries against observed rmaps

Once we've finished collecting reverse mapping observations from the
metadata scan, check those observations against the realtime rmap btree
(particularly if we're in -n mode) to detect rtrmapbt problems.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_repair: find and mark the rtrmapbt inodes
Darrick J. Wong [Mon, 24 Feb 2025 18:22:01 +0000 (10:22 -0800)] 
xfs_repair: find and mark the rtrmapbt inodes

Make sure that we find the realtime rmapbt inodes and mark them
appropriately, just in case we find a rogue inode claiming to be an
rtrmap, or garbage in the metadata directory tree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_repair: refactor realtime inode check
Darrick J. Wong [Mon, 24 Feb 2025 18:22:00 +0000 (10:22 -0800)] 
xfs_repair: refactor realtime inode check

Refactor the realtime bitmap and summary checks into a helper function.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_repair: create a new set of incore rmap information for rt groups
Darrick J. Wong [Mon, 24 Feb 2025 18:22:00 +0000 (10:22 -0800)] 
xfs_repair: create a new set of incore rmap information for rt groups

Create a parallel set of "xfs_ag_rmap" structures to cache information
about reverse mappings for the realtime groups.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_repair: use realtime rmap btree data to check block types
Darrick J. Wong [Mon, 24 Feb 2025 18:22:00 +0000 (10:22 -0800)] 
xfs_repair: use realtime rmap btree data to check block types

Use the realtime rmap btree to pre-populate the block type information
so that when repair iterates the primary metadata, we can confirm the
block type.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_repair: flag suspect long-format btree blocks
Darrick J. Wong [Mon, 24 Feb 2025 18:22:00 +0000 (10:22 -0800)] 
xfs_repair: flag suspect long-format btree blocks

Pass a "suspect" counter through scan_lbtree just like we do for
short-format btree blocks, and increment its value when we encounter
blocks with bad CRCs or outright corruption.  This makes it so that
repair actually catches bmbt blocks with bad CRCs or other verifier
errors.
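
The counter-passing pattern looks roughly like this.  The block
structure and function name are hypothetical stand-ins; the real
scan_lbtree walks on-disk btree blocks and has a different signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what the read verifier found for one block. */
struct blk_result {
    bool crc_ok;
    bool verifier_ok;
};

/*
 * Walk the blocks and bump the caller's counter for each one that has a
 * bad CRC or fails verification.  The caller decides what to do once the
 * whole tree has been scanned.
 */
static void scan_blocks(const struct blk_result *blocks, size_t nr,
                        int *suspect)
{
    assert(suspect != NULL);
    for (size_t i = 0; i < nr; i++)
        if (!blocks[i].crc_ok || !blocks[i].verifier_ok)
            (*suspect)++;
}
```

A nonzero count after the scan tells the caller that the btree cannot be
trusted as-is.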

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_repair: tidy up rmap_diffkeys
Darrick J. Wong [Mon, 24 Feb 2025 18:21:59 +0000 (10:21 -0800)] 
xfs_repair: tidy up rmap_diffkeys

Tidy up the comparison code in this function to match the kernel.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_spaceman: report health status of the realtime rmap btree
Darrick J. Wong [Mon, 24 Feb 2025 18:21:59 +0000 (10:21 -0800)] 
xfs_spaceman: report health status of the realtime rmap btree

Add reporting of the rt rmap btree health to spaceman.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_db: add an rgresv command
Darrick J. Wong [Mon, 24 Feb 2025 18:21:59 +0000 (10:21 -0800)] 
xfs_db: add an rgresv command

Create a command to dump rtgroup btree space reservations.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_db: make fsmap query the realtime reverse mapping tree
Darrick J. Wong [Mon, 24 Feb 2025 18:21:59 +0000 (10:21 -0800)] 
xfs_db: make fsmap query the realtime reverse mapping tree

Extend the 'fsmap' debugger command to support querying the realtime
rmap btree via a new -r argument.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_db: copy the realtime rmap btree
Darrick J. Wong [Mon, 24 Feb 2025 18:21:58 +0000 (10:21 -0800)] 
xfs_db: copy the realtime rmap btree

Copy the realtime rmapbt when we're metadumping the filesystem.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_db: support the realtime rmapbt
Darrick J. Wong [Mon, 24 Feb 2025 18:21:58 +0000 (10:21 -0800)] 
xfs_db: support the realtime rmapbt

Wire up various parts of xfs_db for realtime rmap support so that we can
dump the btree contents.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_db: display the realtime rmap btree contents
Darrick J. Wong [Mon, 24 Feb 2025 18:21:58 +0000 (10:21 -0800)] 
xfs_db: display the realtime rmap btree contents

Implement all the code we need to dump rtrmapbt contents, starting
from the inode root.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_db: don't abort when bmapping on a non-extents/bmbt fork
Darrick J. Wong [Mon, 24 Feb 2025 18:21:58 +0000 (10:21 -0800)] 
xfs_db: don't abort when bmapping on a non-extents/bmbt fork

We're going to introduce new fork formats, so let's fix the problem that
xfs_db's bmap command aborts when the fork format isn't one of the
existing ones.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs_db: compute average btree height
Darrick J. Wong [Mon, 24 Feb 2025 18:21:58 +0000 (10:21 -0800)] 
xfs_db: compute average btree height

Compute the btree height assuming that the blocks are 75% full.
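
As a minimal sketch of such a calculation, the height of a btree can be
estimated by repeatedly dividing the block count by the fanout.  The
function names and parameters below are assumptions for illustration and
are not the xfs_db implementation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Height of a btree holding nr_recs records when each leaf holds
 * leaf_recs records and each node block holds node_ptrs pointers.
 * An empty tree still has a root, so the minimum height is 1.
 */
static unsigned int btree_height(uint64_t nr_recs, uint64_t leaf_recs,
                                 uint64_t node_ptrs)
{
    assert(leaf_recs > 0 && node_ptrs > 1);

    uint64_t blocks = (nr_recs + leaf_recs - 1) / leaf_recs;
    unsigned int height = 1;

    while (blocks > 1) {
        blocks = (blocks + node_ptrs - 1) / node_ptrs;
        height++;
    }
    return height;
}

/* Same computation, but with every block assumed to be 75% full. */
static unsigned int btree_avg_height(uint64_t nr_recs, uint64_t max_leaf_recs,
                                     uint64_t max_node_ptrs)
{
    uint64_t leaf = max_leaf_recs * 3 / 4;
    uint64_t node = max_node_ptrs * 3 / 4;

    if (leaf < 1)
        leaf = 1;
    if (node < 2)
        node = 2;
    return btree_height(nr_recs, leaf, node);
}
```

With 10000 records and room for 100 entries per block, fully packed
blocks give a height of 2, while 75% full blocks give a height of 3.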

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  man: document userspace API changes due to rt rmap
Darrick J. Wong [Mon, 24 Feb 2025 18:21:57 +0000 (10:21 -0800)] 
man: document userspace API changes due to rt rmap

Update documentation to describe userspace ABI changes made for realtime
rmap support.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  libfrog: enable scrubbing of the realtime rmap
Darrick J. Wong [Mon, 24 Feb 2025 18:21:57 +0000 (10:21 -0800)] 
libfrog: enable scrubbing of the realtime rmap

Add a new entry so that we can scrub the rtrmapbt and its metadata
directory tree path too.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  libxfs: add a realtime flag to the rmap update log redo items
Darrick J. Wong [Mon, 24 Feb 2025 18:21:57 +0000 (10:21 -0800)] 
libxfs: add a realtime flag to the rmap update log redo items

Extend the rmap update (RUI) log items with a new realtime flag that
indicates that the updates apply against the realtime rmapbt.  We'll
wire up the actual rmap code later.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  libxfs: compute the rt rmap btree maxlevels during initialization
Darrick J. Wong [Mon, 24 Feb 2025 18:21:57 +0000 (10:21 -0800)] 
libxfs: compute the rt rmap btree maxlevels during initialization

Compute max rt rmap btree height information when we set up libxfs.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs/libxfs: replace kmalloc() and memcpy() with kmemdup()
Mirsad Todorovac [Mon, 24 Feb 2025 18:21:56 +0000 (10:21 -0800)] 
xfs/libxfs: replace kmalloc() and memcpy() with kmemdup()

Source kernel commit: 9d9b72472631262b35157f1a650f066c0e11c2bb

The source static analysis tool gave the following advice:

./fs/xfs/libxfs/xfs_dir2.c:382:15-22: WARNING opportunity for kmemdup

→ 382         args->value = kmalloc(len,
383                          GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_RETRY_MAYFAIL);
384         if (!args->value)
385                 return -ENOMEM;
386
→ 387         memcpy(args->value, name, len);
388         args->valuelen = len;
389         return -EEXIST;

Replacing kmalloc() + memcpy() with kmemdup() doesn't change semantics.
The original code works without fault, so this is not a bug fix but a
proposed improvement.
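
For readers outside the kernel, the transformation can be illustrated
with a userspace analogue.  Here malloc() stands in for kmalloc(), and
memdup_sketch() is a hypothetical stand-in for kmemdup(); the struct and
return values are simplified and do not match xfs_dir2.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the lookup arguments. */
struct args {
    void   *value;
    size_t  valuelen;
};

/* Hypothetical userspace stand-in for kmemdup(): allocate and copy. */
static void *memdup_sketch(const void *src, size_t len)
{
    assert(src != NULL);

    void *p = malloc(len);

    if (p)
        memcpy(p, src, len);
    return p;
}

/* Before: open-coded allocation followed by a copy. */
static int set_value_before(struct args *args, const void *name, size_t len)
{
    args->value = malloc(len);
    if (!args->value)
        return -ENOMEM;

    memcpy(args->value, name, len);
    args->valuelen = len;
    return 0;
}

/* After: one call does both steps; the error handling is unchanged. */
static int set_value_after(struct args *args, const void *name, size_t len)
{
    args->value = memdup_sketch(name, len);
    if (!args->value)
        return -ENOMEM;

    args->valuelen = len;
    return 0;
}
```

Both functions leave args in the same state on success and on failure,
which is why the change is a cleanup rather than a fix.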

Link: https://lwn.net/Articles/198928/
Fixes: 94a69db2367ef ("xfs: use __GFP_NOLOCKDEP instead of GFP_NOFS")
Fixes: 384f3ced07efd ("[XFS] Return case-insensitive match for dentry cache")
Fixes: 2451337dd0439 ("xfs: global error sign conversion")
Cc: Carlos Maiolino <cem@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Chandan Babu R <chandanbabu@kernel.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: linux-xfs@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Mirsad Todorovac <mtodorovac69@gmail.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
4 months ago  xfs: constify feature checks
Christoph Hellwig [Mon, 24 Feb 2025 18:21:56 +0000 (10:21 -0800)] 
xfs: constify feature checks

Source kernel commit: 183d988ae9e7ada9d7d4333e2289256e74a5ab5b

They will eventually need to be const for zoned growfs, but even now
making such simple helpers as const as possible is a good thing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
4 months ago  xfs: remove XFS_ILOG_NONCORE
Christoph Hellwig [Mon, 24 Feb 2025 18:21:56 +0000 (10:21 -0800)] 
xfs: remove XFS_ILOG_NONCORE

Source kernel commit: 415dee1e06da431f3d314641ceecb9018bb6fa53

XFS_ILOG_NONCORE is not used in the kernel code or xfsprogs, so remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
4 months ago  xfs: mark xfs_dir_isempty static
Christoph Hellwig [Mon, 24 Feb 2025 18:21:56 +0000 (10:21 -0800)] 
xfs: mark xfs_dir_isempty static

Source kernel commit: 23ebf63925989adbe4c4277c8e9b04e0a37f6005

Also return bool instead of returning a boolean condition as an int.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
4 months ago  xfs: fix the entry condition of exact EOF block allocation optimization
Jinliang Zheng [Mon, 24 Feb 2025 18:21:56 +0000 (10:21 -0800)] 
xfs: fix the entry condition of exact EOF block allocation optimization

Source kernel commit: 915175b49f65d9edeb81659e82cbb27b621dbc17

When we call create(), lseek() and write() sequentially, offset != 0
cannot be used to decide whether the file already has extents.

Furthermore, when xfs_bmap_adjacent() has not given a better blkno,
it is not necessary to use exact EOF block allocation.

Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
4 months ago  xfs: scrub the metadir path of rt refcount btree files
Darrick J. Wong [Mon, 24 Feb 2025 18:21:55 +0000 (10:21 -0800)] 
xfs: scrub the metadir path of rt refcount btree files

Source kernel commit: ca757af07fccf527f91ad49f3b6648e6783b0bc8

Add a new XFS_SCRUB_METAPATH subtype so that we can scrub the metadata
directory tree path to the refcount btree file for each rt group.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs: scrub the realtime refcount btree
Darrick J. Wong [Mon, 24 Feb 2025 18:21:55 +0000 (10:21 -0800)] 
xfs: scrub the realtime refcount btree

Source kernel commit: c27929670de144ec76a0dab2f3a168cb4897b314

Add code to scrub realtime refcount btrees.  Similar to the refcount
btree checking code for the data device, we walk the rmap btree for each
refcount record to confirm that the reference counts are correct.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs: report realtime refcount btree corruption errors to the health system
Darrick J. Wong [Mon, 24 Feb 2025 18:21:55 +0000 (10:21 -0800)] 
xfs: report realtime refcount btree corruption errors to the health system

Source kernel commit: 026c8ed8d4580228949f177445c605d475880c93

Whenever we encounter corrupt realtime refcount btree blocks, we should
report that to the health monitoring system for later reporting.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs: enable extent size hints for CoW operations
Darrick J. Wong [Mon, 24 Feb 2025 18:21:55 +0000 (10:21 -0800)] 
xfs: enable extent size hints for CoW operations

Source kernel commit: 8e84e8052bc283ebb37f929eb9fb97483ea7385e

Wire up the copy-on-write extent size hint for realtime files, and
connect it to the rt allocator so that we avoid fragmentation on rt
filesystems.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs: apply rt extent alignment constraints to CoW extsize hint
Darrick J. Wong [Mon, 24 Feb 2025 18:21:54 +0000 (10:21 -0800)] 
xfs: apply rt extent alignment constraints to CoW extsize hint

Source kernel commit: 4de1a7ba4171db681691bd80506d0cf43c5cb46a

The copy-on-write extent size hint is subject to the same alignment
constraints as the regular extent size hint.  Since we're in the process
of adding reflink (and therefore CoW) to the realtime device, we must
apply the same scattered rextsize alignment validation strategies to
both hints to deal with the possibility of rextsize changing.

Therefore, fix the inode validator to perform rextsize alignment checks
on regular realtime files, and to remove misaligned directory hints.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs: fix xfs_get_extsz_hint behavior with realtime alwayscow files
Darrick J. Wong [Mon, 24 Feb 2025 18:21:54 +0000 (10:21 -0800)] 
xfs: fix xfs_get_extsz_hint behavior with realtime alwayscow files

Source kernel commit: 6853d23badd0f1852d3b711128924e2456d27634

Currently, we (ab)use xfs_get_extsz_hint so that it always returns a
nonzero value for realtime files.  This apparently was done to disable
delayed allocation for realtime files.

However, once we enable realtime reflink, we can also turn on the
alwayscow flag to force CoW writes to realtime files.  In this case, the
logic will incorrectly send the write through the delalloc write path.

Fix this by adjusting the logic slightly.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs: recover CoW leftovers in the realtime volume
Darrick J. Wong [Mon, 24 Feb 2025 18:21:54 +0000 (10:21 -0800)] 
xfs: recover CoW leftovers in the realtime volume

Source kernel commit: 51e232674975ff138d0e892272fdde9bc444c572

Scan the realtime refcount tree at mount time to get rid of leftover
CoW staging extents.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs: allow inodes to have the realtime and reflink flags
Darrick J. Wong [Mon, 24 Feb 2025 18:21:54 +0000 (10:21 -0800)] 
xfs: allow inodes to have the realtime and reflink flags

Source kernel commit: c3d3605f9661a2451c437a037d338dc79fb78f37

Now that we can share blocks between realtime files, allow this
combination.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs: compute rtrmap btree max levels when reflink enabled
Darrick J. Wong [Mon, 24 Feb 2025 18:21:54 +0000 (10:21 -0800)] 
xfs: compute rtrmap btree max levels when reflink enabled

Source kernel commit: c2694ff678c9b667ab4cb7c0b45d45309c4dd64b

Compute the maximum possible height of the realtime rmap btree when
reflink is enabled.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs: update rmap to allow cow staging extents in the rt rmap
Darrick J. Wong [Mon, 24 Feb 2025 18:21:53 +0000 (10:21 -0800)] 
xfs: update rmap to allow cow staging extents in the rt rmap

Source kernel commit: 0bada82331238bd366aaa0566d125c6338b42590

Don't error out on CoW staging extent records when realtime reflink is
enabled.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs: create routine to allocate and initialize a realtime refcount btree inode
Darrick J. Wong [Mon, 24 Feb 2025 18:21:53 +0000 (10:21 -0800)] 
xfs: create routine to allocate and initialize a realtime refcount btree inode

Source kernel commit: 4ee3113aaf3f6a3c24fcf952d8489363f56ab375

Create a library routine to allocate and initialize an empty realtime
refcountbt inode.  We'll use this for growfs, mkfs, and repair.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs: wire up realtime refcount btree cursors
Darrick J. Wong [Mon, 24 Feb 2025 18:21:53 +0000 (10:21 -0800)] 
xfs: wire up realtime refcount btree cursors

Source kernel commit: e5a171729baf61b703069b11fa0d2955890e9b6b

Wire up realtime refcount btree cursors wherever they're needed
throughout the code base.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs: wire up a new metafile type for the realtime refcount
Darrick J. Wong [Mon, 24 Feb 2025 18:21:53 +0000 (10:21 -0800)] 
xfs: wire up a new metafile type for the realtime refcount

Source kernel commit: f0415af60f482a2192065be8b334b409495ca8a3

Plumb in the pieces we need to embed the root of the realtime refcount
btree in an inode's data fork, complete with metafile type and on-disk
interpretation functions.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs: add metadata reservations for realtime refcount btree
Darrick J. Wong [Mon, 24 Feb 2025 18:21:52 +0000 (10:21 -0800)] 
xfs: add metadata reservations for realtime refcount btree

Source kernel commit: bf0b99411335db18a9ed4fcef278ce9e313f6076

Reserve some free blocks so that we will always have enough free blocks
in the data volume to handle expansion of the realtime refcount btree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs: add realtime refcount btree inode to metadata directory
Darrick J. Wong [Mon, 24 Feb 2025 18:21:52 +0000 (10:21 -0800)] 
xfs: add realtime refcount btree inode to metadata directory

Source kernel commit: eaed472c40527e526217aff3737816b44b08b363

Add a metadir path to select the realtime refcount btree inode and load
it at mount time.  The rtrefcountbt inode will have a unique extent format
code, which means that we also have to update the inode validation and
flush routines to look for it.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs: add a realtime flag to the refcount update log redo items
Darrick J. Wong [Mon, 24 Feb 2025 18:21:52 +0000 (10:21 -0800)] 
xfs: add a realtime flag to the refcount update log redo items

Source kernel commit: fd9300679ccec20c6ee1b95458ab0bcf0db628d5

Extend the refcount update (CUI) log items with a new realtime flag that
indicates that the updates apply against the realtime refcountbt.  We'll
wire up the actual refcount code later.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs: prepare refcount functions to deal with rtrefcountbt
Darrick J. Wong [Mon, 24 Feb 2025 18:21:52 +0000 (10:21 -0800)] 
xfs: prepare refcount functions to deal with rtrefcountbt

Source kernel commit: 01cef1db246ee8b094fca6df23ea6d4335748181

Prepare the high-level refcount functions to deal with the new realtime
refcountbt and its slightly different conventions.  Provide the ability
to talk to either refcountbt or rtrefcountbt formats from the same high
level code.

Note that we leave the _recover_cow_leftovers functions for a separate
patch so that we can convert them all at once.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago  xfs: add realtime refcount btree operations
Darrick J. Wong [Mon, 24 Feb 2025 18:21:52 +0000 (10:21 -0800)] 
xfs: add realtime refcount btree operations

Source kernel commit: 1a6f88ea538db9b3d8aef86112894e7e6d098287

Implement the generic btree operations needed to manipulate rtrefcount
btree blocks. This is different from the regular refcountbt in that we
allocate space from the filesystem at large, and are neither constrained
to the free space nor any particular AG.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: realtime refcount btree transaction reservations
Darrick J. Wong [Mon, 24 Feb 2025 18:21:51 +0000 (10:21 -0800)] 
xfs: realtime refcount btree transaction reservations

Source kernel commit: 2003c6a8754e307970c101a20baf8fb67d0588f2

Make sure that there's enough log reservation to handle mapping
and unmapping realtime extents.  We have to reserve enough space
to handle a split in the rtrefcountbt to add the record and a second
split in the regular refcountbt to record the rtrefcountbt split.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: introduce realtime refcount btree ondisk definitions
Darrick J. Wong [Mon, 24 Feb 2025 18:21:51 +0000 (10:21 -0800)] 
xfs: introduce realtime refcount btree ondisk definitions

Source kernel commit: 9abe03a0e4f978615a2b1b484b8d09ca84c16ea0

Add the ondisk structure definitions for realtime refcount btrees. The
realtime refcount btree will be rooted from a hidden inode so it needs
to have a separate btree block magic and pointer format.

Next, add everything needed to read, write and manipulate refcount btree
blocks. This prepares the way for connecting the btree operations
implementation, though the changes to actually root the rtrefcount btree
in an inode come later.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: namespace the maximum length/refcount symbols
Darrick J. Wong [Mon, 24 Feb 2025 18:21:51 +0000 (10:21 -0800)] 
xfs: namespace the maximum length/refcount symbols

Source kernel commit: 70fcf6866578e69635399e806273376f5e0b8e2b

Actually namespace these variables properly, so that readers can tell
that this is an XFS symbol, and that it's for the refcount
functionality.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: create a shadow rmap btree during realtime rmap repair
Darrick J. Wong [Mon, 24 Feb 2025 18:21:51 +0000 (10:21 -0800)] 
xfs: create a shadow rmap btree during realtime rmap repair

Source kernel commit: 4a61f12eb11958f157e054d386466627445644cd

Create an in-memory btree of rmap records instead of an array.  This
enables us to do live record collection instead of freezing the fs.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: online repair of the realtime rmap btree
Darrick J. Wong [Mon, 24 Feb 2025 18:21:50 +0000 (10:21 -0800)] 
xfs: online repair of the realtime rmap btree

Source kernel commit: 6a849bd81b69ccbda5b766cc700f0be86194e4d1

Repair the realtime rmap btree while mounted.  Similar to the regular
rmap btree repair code, we walk the data fork mappings of every realtime
file in the filesystem to collect reverse-mapping records in an xfarray.
Then we sort the xfarray, and use the btree bulk loader to create a new
rtrmap btree ondisk.  Finally, we swap the btree roots, and reap the old
blocks in the usual way.
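
The collect-sort-load sequence described above can be sketched roughly as
follows.  This is only an illustration: struct demo_rmap, the sort key order,
and the fully-packed-leaf estimate are assumptions made for this example and
are not taken from the kernel's xfarray or bulk loader code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical in-memory reverse-mapping record (not the ondisk format). */
struct demo_rmap {
	uint64_t	startblock;
	uint64_t	blockcount;
	uint64_t	owner;
	uint64_t	offset;
};

/* Order records by start block, then owner, then file offset. */
static int demo_rmap_cmp(const void *a, const void *b)
{
	const struct demo_rmap *ra = a;
	const struct demo_rmap *rb = b;

	if (ra->startblock != rb->startblock)
		return ra->startblock < rb->startblock ? -1 : 1;
	if (ra->owner != rb->owner)
		return ra->owner < rb->owner ? -1 : 1;
	if (ra->offset != rb->offset)
		return ra->offset < rb->offset ? -1 : 1;
	return 0;
}

/* Sort the collected records so they can be loaded in key order. */
void demo_sort_rmaps(struct demo_rmap *recs, size_t nr)
{
	qsort(recs, nr, sizeof(*recs), demo_rmap_cmp);
}

/*
 * Estimate how many leaf blocks a loader would need if every leaf were
 * packed full.  A real loader may deliberately leave slack in each block.
 */
size_t demo_leaf_blocks(size_t nr, size_t recs_per_block)
{
	if (recs_per_block == 0)
		return 0;
	return (nr + recs_per_block - 1) / recs_per_block;
}
```

Sorting before loading matters because a bulk loader writes leaf blocks
sequentially and cannot reorder records once a block has been filled.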

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: online repair of realtime bitmaps for a realtime group
Darrick J. Wong [Mon, 24 Feb 2025 18:21:50 +0000 (10:21 -0800)] 
xfs: online repair of realtime bitmaps for a realtime group

Source kernel commit: 8defee8dff2b202702cdf33f6d8577adf9ad3e82

For a given rt group, regenerate the bitmap contents from the group's
realtime rmap btree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: scrub the metadir path of rt rmap btree files
Darrick J. Wong [Mon, 24 Feb 2025 18:21:50 +0000 (10:21 -0800)] 
xfs: scrub the metadir path of rt rmap btree files

Source kernel commit: 366243cc99b7e80236a19d7391b68d0f47677f4f

Add a new XFS_SCRUB_METAPATH subtype so that we can scrub the metadata
directory tree path to the rmap btree file for each rt group.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: scrub the realtime rmapbt
Darrick J. Wong [Mon, 24 Feb 2025 18:21:50 +0000 (10:21 -0800)] 
xfs: scrub the realtime rmapbt

Source kernel commit: 9a6cc4f6d081fddc0d5ff96744a2507d3559f949

Check the realtime reverse mapping btree against the rtbitmap, and
modify the rtbitmap scrub to check against the rtrmapbt.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: report realtime rmap btree corruption errors to the health system
Darrick J. Wong [Mon, 24 Feb 2025 18:21:50 +0000 (10:21 -0800)] 
xfs: report realtime rmap btree corruption errors to the health system

Source kernel commit: 6d4933c221958d1e1848d5092a3e3d1c6e4a6f92

Whenever we encounter corrupt realtime rmap btree blocks, we should
report that to the health monitoring system for later reporting.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: create routine to allocate and initialize a realtime rmap btree inode
Darrick J. Wong [Mon, 24 Feb 2025 18:21:49 +0000 (10:21 -0800)] 
xfs: create routine to allocate and initialize a realtime rmap btree inode

Source kernel commit: 71b8acb42be60e11810eb43a6f470589fcf7b7dd

Create a library routine to allocate and initialize an empty realtime
rmapbt inode.  We'll use this for mkfs and repair.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: wire up rmap map and unmap to the realtime rmapbt
Darrick J. Wong [Mon, 24 Feb 2025 18:21:49 +0000 (10:21 -0800)] 
xfs: wire up rmap map and unmap to the realtime rmapbt

Source kernel commit: 609a592865c9e66a1c00eb7b8ee7436eea3c39a3

Connect the map and unmap reverse-mapping operations to the realtime
rmapbt via the deferred operation callbacks.  This enables us to
perform rmap operations against the correct btree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: wire up a new metafile type for the realtime rmap
Darrick J. Wong [Mon, 24 Feb 2025 18:21:49 +0000 (10:21 -0800)] 
xfs: wire up a new metafile type for the realtime rmap

Source kernel commit: f33659e8a114e2c17108227d30a2bdf398e39bdb

Plumb in the pieces we need to embed the root of the realtime rmap btree
in an inode's data fork, complete with new metafile type and on-disk
interpretation functions.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: add metadata reservations for realtime rmap btrees
Darrick J. Wong [Mon, 24 Feb 2025 18:21:49 +0000 (10:21 -0800)] 
xfs: add metadata reservations for realtime rmap btrees

Source kernel commit: 8491a55cfc73ff5c2c637a70ade51d4d08abb90a

Reserve some free blocks so that we will always have enough free blocks
in the data volume to handle expansion of the realtime rmap btree.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: add realtime reverse map inode to metadata directory
Darrick J. Wong [Mon, 24 Feb 2025 18:21:49 +0000 (10:21 -0800)] 
xfs: add realtime reverse map inode to metadata directory

Source kernel commit: 6b08901a6e8fcda555f3ad39abd73bb0dd37f231

Add a metadir path to select the realtime rmap btree inode and load
it at mount time.  The rtrmapbt inode will have a unique extent format
code, which means that we also have to update the inode validation and
flush routines to look for it.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: support file data forks containing metadata btrees
Darrick J. Wong [Mon, 24 Feb 2025 18:21:48 +0000 (10:21 -0800)] 
xfs: support file data forks containing metadata btrees

Source kernel commit: 702c90f451622384d6c65897b619f647704b06a9

Create a new fork format type for metadata btrees.  This fork type
requires that the inode is in the metadata directory tree, and only
applies to the data fork.  The actual type of the metadata btree itself
is determined by the di_metatype field.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: pretty print metadata file types in error messages
Darrick J. Wong [Mon, 24 Feb 2025 18:21:48 +0000 (10:21 -0800)] 
xfs: pretty print metadata file types in error messages

Source kernel commit: 219ee99d3673ded7abbc13ddd4d7847e92661e2c

Create a helper function to turn a metadata file type code into a
printable string, and use this to complain about lockdep problems with
rtgroup inodes.  We'll use this more in the next patch.
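
A helper of this kind could look roughly like the sketch below.  The enum
values, the string table, and the name demo_metafile_type_str are hypothetical
stand-ins chosen for this example; the real type codes and helper name live in
the XFS sources and are not reproduced here.

```c
#include <stddef.h>

/* Hypothetical type codes for illustration; not the real ondisk values. */
enum demo_metafile_type {
	DEMO_METAFILE_UNKNOWN = 0,
	DEMO_METAFILE_RTBITMAP,
	DEMO_METAFILE_RTSUMMARY,
	DEMO_METAFILE_RTRMAP,
	DEMO_METAFILE_MAX
};

/* Table indexed by type code; any slot left unset stays NULL. */
static const char *const demo_metafile_names[DEMO_METAFILE_MAX] = {
	[DEMO_METAFILE_UNKNOWN]		= "unknown",
	[DEMO_METAFILE_RTBITMAP]	= "rtbitmap",
	[DEMO_METAFILE_RTSUMMARY]	= "rtsummary",
	[DEMO_METAFILE_RTRMAP]		= "rtrmap",
};

/* Turn a type code into a printable string; never returns NULL. */
const char *demo_metafile_type_str(unsigned int type)
{
	if (type >= DEMO_METAFILE_MAX || demo_metafile_names[type] == NULL)
		return "invalid";
	return demo_metafile_names[type];
}
```

Returning a fixed string for out-of-range codes keeps the helper safe to call
from error paths, where the type code may itself be the corrupt value.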

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: add a realtime flag to the rmap update log redo items
Darrick J. Wong [Mon, 24 Feb 2025 18:21:48 +0000 (10:21 -0800)] 
xfs: add a realtime flag to the rmap update log redo items

Source kernel commit: 9e823fc27419b09718fff74ae2297b25ae6fb317

Extend the rmap update (RUI) log items to handle realtime volumes by
adding a new log intent item type.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: prepare rmap functions to deal with rtrmapbt
Darrick J. Wong [Mon, 24 Feb 2025 18:21:48 +0000 (10:21 -0800)] 
xfs: prepare rmap functions to deal with rtrmapbt

Source kernel commit: adafb31c80e608e63adcf8cae5675db00c734149

Prepare the high-level rmap functions to deal with the new realtime
rmapbt and its slightly different conventions.  Provide the ability
to talk to either rmapbt or rtrmapbt formats from the same high
level code.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: add realtime rmap btree operations
Darrick J. Wong [Mon, 24 Feb 2025 18:21:47 +0000 (10:21 -0800)] 
xfs: add realtime rmap btree operations

Source kernel commit: d386b4024372ea2f06aaa0f2c6c380b45ba0536e

Implement the generic btree operations needed to manipulate rtrmap
btree blocks. This is different from the regular rmapbt in that we
allocate space from the filesystem at large, and are neither
constrained to the free space nor any particular AG.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: realtime rmap btree transaction reservations
Darrick J. Wong [Mon, 24 Feb 2025 18:21:47 +0000 (10:21 -0800)] 
xfs: realtime rmap btree transaction reservations

Source kernel commit: e1c76fce50bb750dff236aa51a3b698de4f7132c

Make sure that there's enough log reservation to handle mapping
and unmapping realtime extents.  We have to reserve enough space
to handle a split in the rtrmapbt to add the record and a second
split in the regular rmapbt to record the rtrmapbt split.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: introduce realtime rmap btree ondisk definitions
Darrick J. Wong [Mon, 24 Feb 2025 18:21:47 +0000 (10:21 -0800)] 
xfs: introduce realtime rmap btree ondisk definitions

Source kernel commit: fc6856c6ff08642e3e8437f0416d70a5e1807010

Add the ondisk structure definitions for realtime rmap btrees. The
realtime rmap btree will be rooted from a hidden inode so it needs to
have a separate btree block magic and pointer format.

Next, add everything needed to read, write and manipulate rmap btree
blocks. This prepares the way for connecting the btree operations
implementation, though embedding the rtrmap btree root in the inode
comes later in the series.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: allow inode-based btrees to reserve space in the data device
Darrick J. Wong [Mon, 24 Feb 2025 18:21:47 +0000 (10:21 -0800)] 
xfs: allow inode-based btrees to reserve space in the data device

Source kernel commit: 05290bd5c6236b8ad659157edb36bd2d38f46d3e

Create a new space reservation scheme so that btree metadata for the
realtime volume can reserve space in the data device to avoid space
underruns.

Back when we were testing the rmap and refcount btrees for the data
device, people observed occasional shutdowns when xfs_btree_split was
called for either of those two btrees.  This happened when certain
operations (mostly writeback ioends) created new rmap or refcount
records, which would expand the size of the btree.  If there were no
free blocks available the allocation would fail and the split would shut
down the filesystem.

I considered pre-reserving blocks for btree expansion at the time of a
write() call, but there wasn't any good way to attach the reservations
to an inode and keep them there all the way to ioend processing.  Unlike
delalloc reservations which have that indlen mechanism, there's no way
to do that for mapped extents; and indlen blocks are given back during
the delalloc -> unwritten transition.

The solution was to reserve sufficient blocks for rmap/refcount btree
expansion at mount time.  This is what the XFS_AG_RESV_* flags provide;
any expansion of those two btrees can come from the pre-reserved space.

This patch brings that pre-reservation ability to inode-rooted btrees so
that the rt rmap and refcount btrees can also save room for future
expansion.
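
The idea of setting blocks aside up front, so that a later btree split cannot
fail for lack of space, can be illustrated with a toy accounting model.  All
names here (struct demo_space, demo_resv_init, and so on) are hypothetical;
this is not the kernel's reservation code, only a sketch of the accounting
principle under simplified assumptions.

```c
#include <errno.h>
#include <stdint.h>

/* Toy free-space accounting for one device. */
struct demo_space {
	uint64_t	free_blocks;	/* blocks anyone may allocate */
	uint64_t	resv_asked;	/* blocks set aside for the btree */
	uint64_t	resv_used;	/* btree blocks allocated so far */
};

/* At "mount" time, move 'ask' blocks out of the general free pool. */
int demo_resv_init(struct demo_space *sp, uint64_t ask)
{
	if (sp->free_blocks < ask)
		return -ENOSPC;
	sp->free_blocks -= ask;
	sp->resv_asked = ask;
	sp->resv_used = 0;
	return 0;
}

/* Ordinary allocations only ever see the general free pool. */
int demo_alloc_data_block(struct demo_space *sp)
{
	if (sp->free_blocks == 0)
		return -ENOSPC;
	sp->free_blocks--;
	return 0;
}

/* Btree expansion draws from the reservation first, then the free pool. */
int demo_alloc_btree_block(struct demo_space *sp)
{
	if (sp->resv_used < sp->resv_asked) {
		sp->resv_used++;
		return 0;
	}
	if (sp->free_blocks == 0)
		return -ENOSPC;
	sp->free_blocks--;
	sp->resv_used++;
	return 0;
}
```

In this model, data writes can drain free_blocks to zero and a btree split
still succeeds as long as the reservation has not been used up.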

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: support storing records in the inode core root
Darrick J. Wong [Mon, 24 Feb 2025 18:21:46 +0000 (10:21 -0800)] 
xfs: support storing records in the inode core root

Source kernel commit: 2f63b20b7a26c9a7c76ea5a6565ca38cd9e31282

Add the necessary flags and code so that we can support storing leaf
records in the inode root block of a btree.  This hasn't been necessary
before, but the realtime rmapbt will need to be able to do this.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions
Darrick J. Wong [Mon, 24 Feb 2025 18:21:46 +0000 (10:21 -0800)] 
xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions

Source kernel commit: 953f76bf7a3622351f335c77c56ed7efb793e3e7

Simplify the calling conventions by allowing callers to pass a fsbno
(xfs_fsblock_t) directly into these functions, since we're just going to
set it in a struct anyway.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: prepare to reuse the dquot pointer space in struct xfs_inode
Darrick J. Wong [Mon, 24 Feb 2025 18:21:46 +0000 (10:21 -0800)] 
xfs: prepare to reuse the dquot pointer space in struct xfs_inode

Source kernel commit: 84140a96cf7a5b5b48b862a79c8322aa220ce591

Files participating in the metadata directory tree are not accounted to
the quota subsystem.  Therefore, the i_[ugp]dquot pointers in struct
xfs_inode are never used and should always be NULL.

In the next patch we want to add a u64 count of fs blocks reserved for
metadata btree expansion, but we don't want every inode in the fs to pay
the memory price for this feature.  The intent is to union those three
pointers with the u64 counter, but for that to work we must guard
against all access to the dquot pointers for metadata files.
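
The layout being prepared for might look something like the sketch below.
The struct, field, and function names are hypothetical stand-ins (the real
struct xfs_inode has many more members), and the is_metadir flag is an
assumption made for this example to show why every dquot access needs a guard.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct demo_dquot;	/* opaque stand-in for a dquot structure */

struct demo_inode {
	bool	is_metadir;	/* hypothetical: file is in the metadata directory tree */
	union {
		/* Regular files: quota pointers. */
		struct {
			struct demo_dquot	*udquot;
			struct demo_dquot	*gdquot;
			struct demo_dquot	*pdquot;
		};
		/* Metadata files: blocks reserved for btree expansion. */
		uint64_t	meta_resv_asked;
	};
};

/*
 * Guarded accessor: a metadata file's union holds a counter, so its bytes
 * must never be interpreted as a dquot pointer.
 */
struct demo_dquot *demo_inode_udquot(const struct demo_inode *ip)
{
	if (ip->is_metadir)
		return NULL;
	return ip->udquot;
}
```

Because the counter shares storage with the pointers, an unguarded read of
udquot on a metadata file would return part of the counter as an address.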

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: add some rtgroup inode helpers
Darrick J. Wong [Mon, 24 Feb 2025 18:21:46 +0000 (10:21 -0800)] 
xfs: add some rtgroup inode helpers

Source kernel commit: af32541081ed6b6ad49b1ea38b5128cb319841b0

Create some simple helpers to reduce the amount of typing whenever we
access rtgroup inodes.  Conversion was done with this spatch and some
minor reformatting:

@@
expression rtg;
@@

- rtg->rtg_inodes[XFS_RTGI_BITMAP]
+ rtg_bitmap(rtg)

@@
expression rtg;
@@

- rtg->rtg_inodes[XFS_RTGI_SUMMARY]
+ rtg_summary(rtg)

and the CLI command:

$ spatch --sp-file /tmp/moo.cocci --dir fs/xfs/ --use-gitgrep --in-place
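
For readers unfamiliar with the pattern, helpers matching the right-hand side
of that semantic patch could be written roughly like this.  The helper names
rtg_bitmap and rtg_summary and the XFS_RTGI_* index names come from the patch
above; the struct definitions and enum values are simplified stand-ins
invented for this sketch.

```c
#include <stddef.h>

/* Stand-in index values; the real enum is defined in the XFS headers. */
enum demo_rtg_inodes {
	XFS_RTGI_BITMAP,
	XFS_RTGI_SUMMARY,
	XFS_RTGI_MAX
};

/* Simplified stand-ins for the inode and rtgroup structures. */
struct demo_rtg_inode {
	unsigned long long	ino;
};

struct demo_rtgroup {
	struct demo_rtg_inode	*rtg_inodes[XFS_RTGI_MAX];
};

/* Shorthand for rtg->rtg_inodes[XFS_RTGI_BITMAP]. */
static inline struct demo_rtg_inode *rtg_bitmap(const struct demo_rtgroup *rtg)
{
	return rtg->rtg_inodes[XFS_RTGI_BITMAP];
}

/* Shorthand for rtg->rtg_inodes[XFS_RTGI_SUMMARY]. */
static inline struct demo_rtg_inode *rtg_summary(const struct demo_rtgroup *rtg)
{
	return rtg->rtg_inodes[XFS_RTGI_SUMMARY];
}
```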

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: hoist the node iroot update code out of xfs_btree_kill_iroot
Darrick J. Wong [Mon, 24 Feb 2025 18:21:45 +0000 (10:21 -0800)] 
xfs: hoist the node iroot update code out of xfs_btree_kill_iroot

Source kernel commit: 505248719fcbf2c76594fe2ef293680d97fe426c

In preparation for allowing records in an inode btree root, hoist the
code that copies keyptrs from an existing node child into the root block
to a separate function.  Remove some unnecessary conditionals and clean
up a few function calls in the new function.  Note that this change
reorders the ->free_block call with respect to the change in bc_nlevels
to make it easier to support inode root leaf blocks in the next patch.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: hoist the node iroot update code out of xfs_btree_new_iroot
Darrick J. Wong [Mon, 24 Feb 2025 18:21:45 +0000 (10:21 -0800)] 
xfs: hoist the node iroot update code out of xfs_btree_new_iroot

Source kernel commit: 7708951ae52132d3c4e05aee2e57d35f0d89bd49

In preparation for allowing records in an inode btree root, hoist the
code that copies keyptrs from an existing node root into a child block
to a separate function.  Note that the new function explicitly computes
the keys of the new child block and stores that in the root block; while
the bmap btree could rely on leaving the key alone, realtime rmap needs
to set the new high key.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: tidy up xfs_bmap_broot_realloc a bit
Darrick J. Wong [Mon, 24 Feb 2025 18:21:45 +0000 (10:21 -0800)] 
xfs: tidy up xfs_bmap_broot_realloc a bit

Source kernel commit: c914081775e2e39e4afa9b4bb9e5c98202110f51

Hoist out the code that migrates broot pointers during a resize
operation to avoid code duplication and streamline the caller.  Also
use the correct bmbt pointer type for the sizeof operation.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: make xfs_iroot_realloc a bmap btree function
Darrick J. Wong [Mon, 24 Feb 2025 18:21:45 +0000 (10:21 -0800)] 
xfs: make xfs_iroot_realloc a bmap btree function

Source kernel commit: eb9bff22311ca47ef4848bbdcf24dae06ae3f243

Make the inode fork btree root reallocation function part of the btree
ops because it's now mostly bmbt-specific code.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: make xfs_iroot_realloc take the new numrecs instead of deltas
Darrick J. Wong [Mon, 24 Feb 2025 18:21:45 +0000 (10:21 -0800)] 
xfs: make xfs_iroot_realloc take the new numrecs instead of deltas

Source kernel commit: 6a92924275ecdd768c8105f8975b971300c5ba7d

Change the calling signature of xfs_iroot_realloc to take the ifork and
the new number of records in the btree block, not a diff against the
current number.  This will make the callsites easier to understand.

Note that this function is misnamed because it is very specific to the
single type of inode-rooted btree supported.  This will be addressed in
a subsequent patch.

Return the new btree root to reduce the amount of code clutter.
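
The difference in calling convention can be shown with a small userspace
sketch.  Everything here (struct demo_broot, struct demo_fork,
demo_iroot_realloc) is hypothetical and uses plain realloc(); it is not the
xfs_iroot_realloc implementation, only an illustration of passing an absolute
record count and getting the new root back.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory btree root: a count followed by records. */
struct demo_broot {
	unsigned int		numrecs;
	unsigned long long	recs[];
};

/* Hypothetical inode fork holding the root. */
struct demo_fork {
	struct demo_broot	*broot;
};

/*
 * Resize the root to hold exactly new_numrecs records and return it.
 * The caller states the size it wants rather than a +/- delta.
 * Returns NULL when the root was freed (new_numrecs == 0) or when
 * allocation failed, in which case the old root is left untouched.
 */
struct demo_broot *demo_iroot_realloc(struct demo_fork *ifp,
				      unsigned int new_numrecs)
{
	unsigned int old = ifp->broot ? ifp->broot->numrecs : 0;
	struct demo_broot *nb;

	if (new_numrecs == 0) {
		free(ifp->broot);
		ifp->broot = NULL;
		return NULL;
	}

	nb = realloc(ifp->broot,
		     sizeof(*nb) + new_numrecs * sizeof(nb->recs[0]));
	if (!nb)
		return NULL;

	/* Zero any newly added record slots. */
	if (new_numrecs > old)
		memset(&nb->recs[old], 0,
		       (new_numrecs - old) * sizeof(nb->recs[0]));
	nb->numrecs = new_numrecs;
	ifp->broot = nb;
	return nb;
}
```

With a delta-based interface the same call sites would have to compute the
difference from the current count first, which is the clutter this change
removes.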

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: refactor the inode fork memory allocation functions
Darrick J. Wong [Mon, 24 Feb 2025 18:21:44 +0000 (10:21 -0800)] 
xfs: refactor the inode fork memory allocation functions

Source kernel commit: 6c1c55ac3c0512262817a088e805d99aad4c0867

Hoist the code that allocates, frees, and reallocates if_broot into a
single xfs_iroot_krealloc function.  Eventually we're going to push
xfs_iroot_realloc into the btree ops structure to handle multiple
inode-rooted btrees, but first let's separate out the bits that should
stay in xfs_inode_fork.c.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs: tidy up xfs_iroot_realloc
Darrick J. Wong [Mon, 24 Feb 2025 18:21:44 +0000 (10:21 -0800)] 
xfs: tidy up xfs_iroot_realloc

Source kernel commit: 4f13f0a3fc6ad193e4d144a5e001b7b8f1fc4b7f

Tidy up this function a bit before we start refactoring the memory
handling and move the function to the bmbt code.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs_scrub: try harder to fill the bulkstat array with bulkstat()
Darrick J. Wong [Mon, 24 Feb 2025 18:21:44 +0000 (10:21 -0800)] 
xfs_scrub: try harder to fill the bulkstat array with bulkstat()

Sometimes, the last bulkstat record returned by the first xfrog_bulkstat
call in bulkstat_for_inumbers will contain an inumber less than the
highest allocated inode mentioned in the inumbers record.  This happens
either because the inodes have been freed, or because the kernel
encountered a corrupt inode during bulkstat and stopped filling up the
array.

In both cases, we can call bulkstat again to try to fill up the rest of
the array.  If there are newly allocated inodes, they'll be returned; if
we've truly hit the end of the filesystem, the kernel will return zero
records; and if the first allocated inode is indeed corrupt, the kernel
will return EFSCORRUPTED.

As an optimization to avoid the single-step code, call bulkstat with an
increasing ino parameter until the bulkstat array is full or the kernel
tells us there are no bulkstat records to return.  This speeds things
up a bit in cases where the allocmask is all ones and only the second
inode is corrupt.
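
The retry loop described above might be sketched as follows.  This is a
self-contained model, not the xfs_scrub code: the callback type, struct
demo_bstat, demo_fill_bulkstat, and the fake filesystem used to exercise it
are all hypothetical, and the real xfrog_bulkstat interface differs.

```c
#include <stdint.h>

struct demo_bstat {
	uint64_t	ino;
};

/* Returns the number of records stored in buf, or a negative error. */
typedef int (*demo_bulkstat_fn)(void *ctx, uint64_t startino,
				struct demo_bstat *buf, unsigned int count);

/*
 * Keep calling bulkstat with an increasing start inode until the array is
 * full, the kernel returns no records, an error occurs, or we walk past
 * endino (exclusive).  *filled reports how many records were kept.
 */
int demo_fill_bulkstat(demo_bulkstat_fn fn, void *ctx, uint64_t startino,
		       uint64_t endino, struct demo_bstat *buf,
		       unsigned int cap, unsigned int *filled)
{
	uint64_t next = startino;
	unsigned int n = 0;
	int error = 0;

	while (n < cap && next < endino) {
		int ret = fn(ctx, next, buf + n, cap - n);
		int i;

		if (ret < 0) {
			error = ret;	/* caller may fall back to single-stepping */
			break;
		}
		if (ret == 0)
			break;		/* nothing left to return */

		/* Keep only the records that fall inside the inode range. */
		for (i = 0; i < ret && buf[n].ino < endino; i++) {
			next = buf[n].ino + 1;
			n++;
		}
		if (i < ret)
			break;		/* ran past the end of the range */
	}

	*filled = n;
	return error;
}

/* Fake "kernel" that returns at most max_per_call records per call. */
struct demo_fake_fs {
	const uint64_t	*inos;		/* sorted allocated inode numbers */
	unsigned int	nr;
	unsigned int	max_per_call;
};

int demo_fake_bulkstat(void *ctx, uint64_t startino,
		       struct demo_bstat *buf, unsigned int count)
{
	const struct demo_fake_fs *fs = ctx;
	unsigned int i, out = 0;

	for (i = 0; i < fs->nr && out < count && out < fs->max_per_call; i++)
		if (fs->inos[i] >= startino)
			buf[out++].ino = fs->inos[i];
	return (int)out;
}
```

The fake backend deliberately returns short batches so that the loop has to
resume from the inode after the last record it received.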

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs_scrub: ignore freed inodes when single-stepping during phase 3
Darrick J. Wong [Mon, 24 Feb 2025 18:21:44 +0000 (10:21 -0800)] 
xfs_scrub: ignore freed inodes when single-stepping during phase 3

For inodes that inumbers told us were allocated but weren't loaded by
the bulkstat call, we fall back to loading bulkstat data one inode at a
time to try to find the inodes that are too corrupt to load.

However, there are a couple of outcomes of the single bulkstat call that
clearly indicate that the inode is free, not corrupt.  In this case, the
phase 3 inode scan will try to scrub the inode, only to be told ENOENT
because it doesn't exist.

As an optimization here, don't increment ocount, just move on to the
next inode in the mask.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago xfs_scrub: hoist the phase3 bulkstat single stepping code
Darrick J. Wong [Mon, 24 Feb 2025 18:21:43 +0000 (10:21 -0800)] 
xfs_scrub: hoist the phase3 bulkstat single stepping code

We're about to make the bulkstat single step loading code more complex,
so hoist it into a separate function.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>