git.ipfire.org Git - thirdparty/xfsprogs-dev.git/log
15 months ago xfs: force small EFIs for reaping btree extents
Darrick J. Wong [Mon, 15 Apr 2024 23:07:34 +0000 (16:07 -0700)] 
xfs: force small EFIs for reaping btree extents

Source kernel commit: 3f3cec031099c37513727efc978a12b6346e326d

Introduce the concept of a defer ops barrier to separate consecutively
queued pending work items of the same type.  With a barrier in place,
the two work items will be tracked separately, and receive separate log
intent items.  The goal here is to prevent reaping of old metadata
blocks from creating unnecessarily huge EFIs that could then run the
risk of overflowing the scrub transaction.
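
As a rough illustration of the barrier idea, the sketch below queues work
items into pending groups and seals the newest group when a barrier is
added.  All names here (demo_queue, demo_defer_add, demo_defer_add_barrier)
are hypothetical stand-ins, and the fixed-size array is an assumption made
only to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_PENDING 8

/* One pending group; each group would receive its own log intent item. */
struct demo_pending {
	int type;       /* kind of deferred work, e.g. extent freeing */
	int count;      /* work items tracked by this group */
	bool barrier;   /* true: never append more work to this group */
};

struct demo_queue {
	struct demo_pending groups[DEMO_MAX_PENDING];
	size_t nr;
};

/* Queue one work item; returns false if the sketch's array is full. */
static bool demo_defer_add(struct demo_queue *q, int type)
{
	struct demo_pending *last;

	assert(q != NULL);
	last = q->nr ? &q->groups[q->nr - 1] : NULL;

	/* Reuse the newest group only if it matches and is not sealed. */
	if (last && last->type == type && !last->barrier) {
		last->count++;
		return true;
	}
	if (q->nr == DEMO_MAX_PENDING)
		return false;
	q->groups[q->nr++] = (struct demo_pending){ .type = type, .count = 1 };
	return true;
}

/* Seal the newest group so later items of the same type start a new one. */
static void demo_defer_add_barrier(struct demo_queue *q)
{
	assert(q != NULL);
	if (q->nr)
		q->groups[q->nr - 1].barrier = true;
}
```

Two items queued back to back share one group; an item queued after the
barrier lands in a second group, which keeps each intent item small.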

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago xfs: remove unused fields from struct xbtree_ifakeroot
Darrick J. Wong [Mon, 15 Apr 2024 23:07:34 +0000 (16:07 -0700)] 
xfs: remove unused fields from struct xbtree_ifakeroot

Source kernel commit: 4c8ecd1cfdd01fb727121035014d9f654a30bdf2

Remove these unused fields since nobody uses them.  They should have
been removed years ago in a different cleanup series from Christoph
Hellwig.

Fixes: daf83964a3681 ("xfs: move the per-fork nextents fields into struct xfs_ifork")
Fixes: f7e67b20ecbbc ("xfs: move the fork format fields into struct xfs_ifork")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago xfs: automatic freeing of freshly allocated unwritten space
Darrick J. Wong [Mon, 15 Apr 2024 23:07:34 +0000 (16:07 -0700)] 
xfs: automatic freeing of freshly allocated unwritten space

Source kernel commit: e3042be36c343207b7af249a09f50b4e37e9fda4

As mentioned in the previous commit, online repair wants to allocate
space to write out a new metadata structure, and it also wants to hedge
against system crashes during repairs by logging (and later cancelling)
EFIs to free the space if we crash before committing the new data
structure.

Therefore, create a trio of functions to schedule automatic reaping of
freshly allocated unwritten space.  xfs_alloc_schedule_autoreap creates
a paused EFI representing the space we just allocated.  Once the
allocations are made and the autoreaps scheduled, we can start writing
to disk.

If the writes succeed, xfs_alloc_cancel_autoreap marks the EFI work
items as stale and unpauses the pending deferred work item.  Assuming
that's done in the same transaction that commits the new structure into
the filesystem, we guarantee that either the new object is fully
visible, or that all the space gets reclaimed.

If the writes succeed but only part of an extent was used, repair must
call the same _cancel_autoreap function to kill the first EFI and then
log a new EFI to free the unused space.  The first EFI is already
committed, so it cannot be changed.

For full extents that aren't used, xfs_alloc_commit_autoreap will
unpause the EFI, which results in the space being freed during the next
_defer_finish cycle.
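
The three outcomes described above can be modelled as a small state
machine.  This is only a sketch under simplifying assumptions: the struct
and the demo_* functions are hypothetical, and a real EFI carries far more
state than two flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for one EFI covering a freshly allocated extent. */
struct demo_autoreap {
	bool paused;   /* skipped by deferred-work finishing while true */
	bool stale;    /* cancelled: finishing it frees nothing */
};

/* Schedule: log a paused EFI for space that was just allocated. */
static struct demo_autoreap demo_schedule_autoreap(void)
{
	return (struct demo_autoreap){ .paused = true, .stale = false };
}

/* Cancel: the new structure is being committed, so keep the space. */
static void demo_cancel_autoreap(struct demo_autoreap *ar)
{
	assert(ar->paused);
	ar->stale = true;
	ar->paused = false;
}

/* Commit: the extent went unused, so let the EFI run. */
static void demo_commit_autoreap(struct demo_autoreap *ar)
{
	assert(ar->paused);
	ar->paused = false;
}

/* Would the next finish cycle free the extent? */
static bool demo_finish_frees_space(const struct demo_autoreap *ar)
{
	return !ar->paused && !ar->stale;
}
```

A cancelled reservation is unpaused but stale, so nothing is freed; a
committed one is unpaused and live, so the next finish cycle frees it.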

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago xfs: remove __xfs_free_extent_later
Darrick J. Wong [Mon, 15 Apr 2024 23:07:34 +0000 (16:07 -0700)] 
xfs: remove __xfs_free_extent_later

Source kernel commit: 4c88fef3af4a51c2cdba6a28237e98da4873e8dc

xfs_free_extent_later is a trivial helper, so remove it to reduce the
amount of thinking required to understand the deferred freeing
interface.  This will make it easier to introduce automatic reaping of
speculative allocations in the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago xfs: allow pausing of pending deferred work items
Darrick J. Wong [Mon, 15 Apr 2024 23:07:34 +0000 (16:07 -0700)] 
xfs: allow pausing of pending deferred work items

Source kernel commit: 4dffb2cbb4839fd6f9bbac0b3fd06cc9015cbb9b

Traditionally, all pending deferred work attached to a transaction is
finished when one of the xfs_defer_finish* functions is called.
However, online repair wants to be able to allocate space for a new data
structure, format a new metadata structure into the allocated space, and
commit that into the filesystem.

As a hedge against system crashes during repairs, we also want to log
some EFI items for the allocated space speculatively, and cancel them if
we elect to commit the new data structure.

Therefore, introduce the idea of pausing a pending deferred work item.
Log intent items are still created for paused items and relogged as
necessary.  However, paused items are pushed onto a side list before we
start calling ->finish_item, and the whole list is reattached to the
transaction afterwards.  New work items are never attached to paused
pending items.
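
The side-list handling can be sketched with a singly linked list.  The
struct and function names below are hypothetical, and setting a flag
stands in for the real ->finish_item call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_pending {
	struct demo_pending *next;
	bool paused;     /* set by the caller to hold this item back */
	bool finished;   /* set once the stand-in finish step has run */
};

/*
 * Move paused items to a side list, finish everything else, then
 * reattach the side list so the paused items stay with the transaction.
 */
static void demo_defer_finish(struct demo_pending **pending)
{
	struct demo_pending *side = NULL, **side_tail = &side;
	struct demo_pending *run = NULL, **run_tail = &run;
	struct demo_pending *dfp, *next;

	assert(pending != NULL);

	for (dfp = *pending; dfp; dfp = next) {
		next = dfp->next;
		dfp->next = NULL;
		if (dfp->paused) {
			*side_tail = dfp;
			side_tail = &dfp->next;
		} else {
			*run_tail = dfp;
			run_tail = &dfp->next;
		}
	}

	/* Stand-in for calling ->finish_item on each runnable item. */
	for (dfp = run; dfp; dfp = dfp->next)
		dfp->finished = true;

	/* Only the paused items remain attached afterwards. */
	*pending = side;
}
```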

Modify xfs_defer_cancel to clean up pending deferred work items holding
a log intent item but not a log intent done item, since that is now
possible.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago xfs: don't append work items to logged xfs_defer_pending objects
Darrick J. Wong [Mon, 15 Apr 2024 23:07:33 +0000 (16:07 -0700)] 
xfs: don't append work items to logged xfs_defer_pending objects

Source kernel commit: 6b126139401a2284402d7c38fe3168d5a26da41d

When someone calls xfs_defer_add to queue a deferred work item, it will
try to attach the work item to the most recently added xfs_defer_pending
object attached to the transaction.  However, it doesn't check if the
pending object has a log intent item attached to it.  This is incorrect
behavior because we cannot add more work to an object that has already
been committed to the ondisk log.

Therefore, change the behavior not to append to pending items with a
non-null dfp_intent.  In practice this has not been an issue because the
only way xfs_defer_add gets called after log intent items have been
committed is from the deferred work finishing functions themselves, and
the @dop_pending isolation in xfs_defer_finish_noroll protects the
pending items that have already been logged.

However, the next patch will add the ability to pause a deferred extent
free object during online btree rebuilding, and any new extfree work
items need to have their own pending event.

While we're at it, hoist the predicate to its own static inline function
for readability.
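
A hoisted predicate of this kind might look roughly like the following.
The struct layout and the name demo_can_append are assumptions for
illustration; only the "no append once dfp_intent is set" rule comes from
the text above, and the type comparison is an assumed prerequisite.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_pending {
	int type;             /* which kind of deferred work this tracks */
	const void *intent;   /* non-NULL once a log intent item exists */
};

/* True if a new work item of @type may join @dfp. */
static inline bool demo_can_append(const struct demo_pending *dfp, int type)
{
	assert(dfp != NULL);

	if (dfp->type != type)
		return false;
	/* Already logged: the ondisk intent cannot grow any further. */
	if (dfp->intent != NULL)
		return false;
	return true;
}
```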

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago xfs: elide ->create_done calls for unlogged deferred work
Darrick J. Wong [Mon, 15 Apr 2024 23:07:33 +0000 (16:07 -0700)] 
xfs: elide ->create_done calls for unlogged deferred work

Source kernel commit: 9c07bca793b4ff9f0b7871e2a928a1b28b8fa4e3

Extended attribute updates use the deferred work machinery to manage
state across a chain of smaller transactions.  All previous deferred
work users have employed log intent items and log done items to manage
restarting of interrupted operations, which means that ->create_intent
sets dfp_intent to a log intent item and ->create_done uses that item to
create a log intent done item.

However, xattrs have used the INCOMPLETE flag to deal with the lack of
recovery support for an interrupted transaction chain.  Log items are
optional if the xattr update caller didn't set XFS_DA_OP_LOGGED to
require a restartable sequence.

In other words, ->create_intent can return NULL to say that there's no
log intent item.  If that's the case, no log intent done item should be
created.  Clean up xfs_defer_create_done not to do this, so that the
->create_done functions don't have to check for non-null dfp_intent
themselves.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago xfs: don't allow overly small or large realtime volumes
Darrick J. Wong [Mon, 15 Apr 2024 23:07:33 +0000 (16:07 -0700)] 
xfs: don't allow overly small or large realtime volumes

Source kernel commit: e14293803f4e84eb23a417b462b56251033b5a66

Don't allow realtime volumes that are less than one rt extent long.
This has been broken across 4 LTS kernels with nobody noticing, so let's
just disable it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago xfs: fix 32-bit truncation in xfs_compute_rextslog
Darrick J. Wong [Mon, 15 Apr 2024 23:07:33 +0000 (16:07 -0700)] 
xfs: fix 32-bit truncation in xfs_compute_rextslog

Source kernel commit: cf8f0e6c1429be7652869059ea44696b72d5b726

It's quite reasonable that some customer somewhere will want to
configure a realtime volume with more than 2^32 extents.  If they try to
do this, the highbit32() call will truncate the upper bits of the
xfs_rtbxlen_t and produce the wrong value for rextslog.  This in turn
causes the rsumlevels to be wrong, which results in a realtime summary
file that is the wrong length.  Fix that.
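
To make the truncation concrete, here is a small sketch comparing a
32-bit and a full-width computation.  The demo_* helpers are hypothetical
stand-ins for the highbit functions, and the full-width variant is just
one plausible shape for a fix; the message above does not spell out the
actual change.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the highest set bit, or -1 when v is zero. */
static int demo_highbit64(uint64_t v)
{
	int bit = -1;

	while (v) {
		v >>= 1;
		bit++;
	}
	return bit;
}

/* 32-bit variant: the caller's value is truncated to 32 bits on entry. */
static int demo_highbit32(uint32_t v)
{
	return demo_highbit64(v);
}

/* Old behaviour: the upper 32 bits of the extent count are dropped. */
static int demo_rextslog_truncating(uint64_t rtextents)
{
	return rtextents ? demo_highbit32((uint32_t)rtextents) : 0;
}

/* Full-width computation over all 64 bits. */
static int demo_rextslog_fullwidth(uint64_t rtextents)
{
	int log = rtextents ? demo_highbit64(rtextents) : 0;

	assert(log >= 0 && log < 64);
	return log;
}
```

For an extent count of 0x100000080, the truncating version sees only 0x80
and reports 7, while the full-width version reports 32.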

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago xfs: make rextslog computation consistent with mkfs
Darrick J. Wong [Mon, 15 Apr 2024 23:07:32 +0000 (16:07 -0700)] 
xfs: make rextslog computation consistent with mkfs

Source kernel commit: a6a38f309afc4a7ede01242b603f36c433997780

There's a weird discrepancy in xfsprogs dating back to the creation of
the Linux port -- if there are zero rt extents, mkfs will set
sb_rextents and sb_rextslog both to zero:

        sbp->sb_rextslog =
                (uint8_t)(rtextents ?
                        libxfs_highbit32((unsigned int)rtextents) : 0);

However, that's not the check that xfs_repair uses for nonzero rtblocks:

        if (sb->sb_rextslog !=
            libxfs_highbit32((unsigned int)sb->sb_rextents))

The difference here is that xfs_highbit32 returns -1 if its argument is
zero.  Unfortunately, this means that in the weird corner case of a
realtime volume shorter than 1 rt extent, xfs_repair will immediately
flag a freshly formatted filesystem as corrupt.  Because mkfs has been
writing ondisk artifacts like this for decades, we have to accept that
as "correct".  TBH, zero rextslog for zero rtextents makes more sense to
me anyway.
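
The mismatch for zero rt extents can be reproduced in a few lines.  The
demo_* functions below are hypothetical re-creations of the two snippets
quoted above, plus an assumed "new" check that reuses the mkfs rule.

```c
#include <stdbool.h>
#include <stdint.h>

/* Highest set bit, or -1 for zero, matching the behaviour described above. */
static int demo_highbit32(uint32_t v)
{
	int bit = -1;

	while (v) {
		v >>= 1;
		bit++;
	}
	return bit;
}

/* What mkfs writes: zero extents get a zero rextslog. */
static uint8_t demo_mkfs_rextslog(uint64_t rtextents)
{
	return (uint8_t)(rtextents ?
			demo_highbit32((unsigned int)rtextents) : 0);
}

/* The old repair-style comparison, with no special case for zero. */
static bool demo_old_check_ok(uint8_t sb_rextslog, uint64_t sb_rextents)
{
	return sb_rextslog == demo_highbit32((unsigned int)sb_rextents);
}

/* A check built on the same rule as mkfs accepts what mkfs wrote. */
static bool demo_new_check_ok(uint8_t sb_rextslog, uint64_t sb_rextents)
{
	return sb_rextslog == demo_mkfs_rextslog(sb_rextents);
}
```

With zero extents, mkfs stores 0 but the old comparison expects -1, so a
freshly formatted filesystem fails the check; sharing one rule between
both sides removes the disagreement.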

Regrettably, the superblock verifier checks created in commit
f8e566c0f5e1f copied xfs_repair even though mkfs has been writing out
such filesystems for ages.  Fix the superblock verifier to accept what
mkfs spits out; the userspace version of this patch will have to fix
xfs_repair as well.

Note that the new helper leaves the zeroday bug where the upper 32 bits
of sb_rextents are ripped off before the value is fed to highbit32.  This
leads to a seriously undersized rt summary file, which immediately breaks
mkfs:

$ hugedisk.sh foo /dev/sdc $(( 0x100000080 * 4096))B
$ /sbin/mkfs.xfs -f /dev/sda -m rmapbt=0,reflink=0 -r rtdev=/dev/mapper/foo
meta-data=/dev/sda               isize=512    agcount=4, agsize=1298176 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0    bigtime=1 inobtcount=1 nrext64=1
data     =                       bsize=4096   blocks=5192704, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =/dev/mapper/foo        extsz=4096   blocks=4294967424, rtextents=4294967424
Discarding blocks...Done.
mkfs.xfs: Error initializing the realtime space [117 - Structure needs cleaning]

The next patch will drop support for rt volumes with fewer than 1 or
more than 2^32-1 rt extents, since they've clearly been broken forever.

Fixes: f8e566c0f5e1f ("xfs: validate the realtime geometry in xfs_validate_sb_common")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago xfs: move ->iop_relog to struct xfs_defer_op_type
Darrick J. Wong [Mon, 15 Apr 2024 23:07:32 +0000 (16:07 -0700)] 
xfs: move ->iop_relog to struct xfs_defer_op_type

Source kernel commit: a49c708f9a445457f6a5905732081871234f61c6

The only log items that need relogging are the ones created for deferred
work operations, and the only part of the code base that relogs log
items is the deferred work machinery.  Move the function pointers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago xfs: hoist xfs_trans_add_item calls to defer ops functions
Darrick J. Wong [Mon, 15 Apr 2024 23:07:32 +0000 (16:07 -0700)] 
xfs: hoist xfs_trans_add_item calls to defer ops functions

Source kernel commit: b28852a5bd08654634e4e32eb072fba14c5fae26

Remove even more repeated boilerplate.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago xfs: clean out XFS_LI_DIRTY setting boilerplate from ->iop_relog
Darrick J. Wong [Mon, 15 Apr 2024 23:07:32 +0000 (16:07 -0700)] 
xfs: clean out XFS_LI_DIRTY setting boilerplate from ->iop_relog

Source kernel commit: 3e0958be2156d90ef908a1a547b4e27a3ec38da9

Hoist this dirty flag setting to the ->iop_relog callsite to reduce
boilerplate.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago xfs: use xfs_defer_create_done for the relogging operation
Darrick J. Wong [Mon, 15 Apr 2024 23:07:32 +0000 (16:07 -0700)] 
xfs: use xfs_defer_create_done for the relogging operation

Source kernel commit: bd3a88f6b71c7509566b44b7021581191cc11ae3

Now that we have a helper to handle creating a log intent done item and
updating all the necessary state flags, use it to reduce boilerplate in
the ->iop_relog implementations.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago xfs: hoist ->create_intent boilerplate to its callsite
Darrick J. Wong [Mon, 15 Apr 2024 23:07:31 +0000 (16:07 -0700)] 
xfs: hoist ->create_intent boilerplate to its callsite

Source kernel commit: f3fd7f6fce1cc9b8eb59705b27f823330207b7c9

Hoist the dirty flag setting code out of each ->create_intent
implementation up to the callsite to reduce boilerplate further.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago xfs: hoist intent done flag setting to ->finish_item callsite
Darrick J. Wong [Mon, 15 Apr 2024 23:07:31 +0000 (16:07 -0700)] 
xfs: hoist intent done flag setting to ->finish_item callsite

Source kernel commit: 3dd75c8db1c1675a26d3e228bab349c1fc065867

Each log intent item's ->finish_item call chain inevitably includes some
code to set the dirty flag of the transaction.  If there's an associated
log intent done item, it also sets the item's dirty flag and the
transaction's INTENT_DONE flag.  This is repeated throughout the
codebase.

Reduce the LOC by moving all that to xfs_defer_finish_one.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago xfs: move ->iop_recover to xfs_defer_op_type
Darrick J. Wong [Mon, 15 Apr 2024 23:07:31 +0000 (16:07 -0700)] 
xfs: move ->iop_recover to xfs_defer_op_type

Source kernel commit: db7ccc0bac2add5a41b66578e376b49328fc99d0

Finish off the series by moving the intent item recovery function
pointer to the xfs_defer_op_type struct, since this is really a deferred
work function now.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago xfs: use xfs_defer_finish_one to finish recovered work items
Darrick J. Wong [Mon, 15 Apr 2024 23:07:31 +0000 (16:07 -0700)] 
xfs: use xfs_defer_finish_one to finish recovered work items

Source kernel commit: e5f1a5146ec35f3ed5d7f5ac7807a10c0062b6b8

Get rid of the open-coded copies of xfs_defer_finish_one.  This also
means that the recovery transaction takes care of cleaning up the dfp,
and we have solved (I hope) all the ownership issues in recovery.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago xfs: recreate work items when recovering intent items
Darrick J. Wong [Mon, 15 Apr 2024 23:07:31 +0000 (16:07 -0700)] 
xfs: recreate work items when recovering intent items

Source kernel commit: e70fb328d5277297ea2d9169a3a046de6412d777

Recreate work items for each xfs_defer_pending object when we are
recovering intent items.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago xfs: use xfs_defer_pending objects to recover intent items
Darrick J. Wong [Mon, 15 Apr 2024 23:07:30 +0000 (16:07 -0700)] 
xfs: use xfs_defer_pending objects to recover intent items

Source kernel commit: 03f7767c9f6120ac933378fdec3bfd78bf07bc11

One thing I never quite got around to doing is porting the log intent
item recovery code to reconstruct the deferred pending work state.  As a
result, each intent item open codes xfs_defer_finish_one in its recovery
method, because that's what the EFI code did before xfs_defer.c even
existed.

This is a gross thing to have left unfixed -- if an EFI cannot proceed
due to busy extents, we end up creating separate new EFIs for each
unfinished work item, which is a change in behavior from what runtime
would have done.

Worse yet, Long Li pointed out that there's a UAF in the recovery code.
The ->commit_pass2 function adds the intent item to the AIL and drops
the refcount.  The one remaining refcount is now owned by the recovery
mechanism (aka the log intent items in the AIL) with the intent of
giving the refcount to the intent done item in the ->iop_recover
function.

However, if something fails later in recovery, xlog_recover_finish will
walk the recovered intent items in the AIL and release them.  If the CIL
hasn't been pushed before that point (which is possible since we don't
force the log until later) then the intent done release will try to free
its associated intent, which has already been freed.

This patch starts to address this mess by having the ->commit_pass2
functions recreate the xfs_defer_pending state.  The next few patches
will fix the recovery functions.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago xfs_{db,repair}: use m_blockwsize instead of sb_blocksize for rt blocks
Darrick J. Wong [Mon, 15 Apr 2024 23:07:30 +0000 (16:07 -0700)] 
xfs_{db,repair}: use m_blockwsize instead of sb_blocksize for rt blocks

In preparation to add block headers to rt bitmap and summary blocks,
convert all the relevant calculations in the userspace tools to use the
per-block word count instead of the raw blocksize.  This is key to
adding this support outside of libxfs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago xfs_{db,repair}: use accessor functions for summary info words
Darrick J. Wong [Mon, 15 Apr 2024 23:07:30 +0000 (16:07 -0700)] 
xfs_{db,repair}: use accessor functions for summary info words

Port xfs_db and xfs_repair to use get and set functions for rtsummary
words so that we can redefine the ondisk format with a specific
endianness.  Note that this requires the definition of a distinct type
for ondisk summary info words so that the compiler can perform proper
typechecking.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago xfs_{db,repair}: use helpers for rtsummary block/wordcount computations
Darrick J. Wong [Mon, 15 Apr 2024 23:07:30 +0000 (16:07 -0700)] 
xfs_{db,repair}: use helpers for rtsummary block/wordcount computations

Port xfs_db and xfs_repair to use the new helper functions that compute
the number of blocks or words necessary to store the rt summary file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago xfs_{db,repair}: use accessor functions for bitmap words
Darrick J. Wong [Mon, 15 Apr 2024 23:07:30 +0000 (16:07 -0700)] 
xfs_{db,repair}: use accessor functions for bitmap words

Port xfs_db and xfs_repair to use get and set functions for rtbitmap
words so that we can redefine the ondisk format with a specific
endianness.  Note that this requires the definition of a distinct type
for ondisk rtbitmap words so that the compiler can perform proper
typechecking as we go back and forth.

In the upcoming rtgroups feature, we're going to fix the problem that
rtwords are written in host endian order, which means we'll need the
distinct rtword/rtword_raw types.
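
A minimal sketch of the accessor pattern, assuming a union wrapper for the
ondisk word so the compiler rejects accidental mixing with in-memory
values.  The type and function names are hypothetical, and the host-endian
member reflects only the current behaviour described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* In-memory value of one bitmap word. */
typedef uint32_t demo_rtword_t;

/* Ondisk word: a distinct type, so it cannot be confused with the above. */
union demo_rtword_raw {
	uint32_t old;   /* host-endian layout, as the text describes */
};

/* Read word @index from a bitmap buffer. */
static demo_rtword_t demo_rtbitmap_getword(const union demo_rtword_raw *words,
					   size_t index)
{
	assert(words != NULL);
	return words[index].old;
}

/* Write @value to word @index of a bitmap buffer. */
static void demo_rtbitmap_setword(union demo_rtword_raw *words, size_t index,
				  demo_rtword_t value)
{
	assert(words != NULL);
	words[index].old = value;
}
```

Because callers only touch the buffer through these two functions, a later
change of ondisk byte order would be confined to their bodies.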

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago xfs_repair: convert helpers for rtbitmap block/wordcount computations
Darrick J. Wong [Mon, 15 Apr 2024 23:07:29 +0000 (16:07 -0700)] 
xfs_repair: convert helpers for rtbitmap block/wordcount computations

Port xfs_repair to use the new helper functions that compute the number
of blocks or words necessary to store the rt bitmap.
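
For orientation, the arithmetic behind such helpers can be sketched as
follows, assuming one bitmap bit per rt extent, 32-bit words, and blocks
that hold nothing but words.  The demo_* names are hypothetical and are
not the libxfs helpers themselves.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NBBY 8   /* bits per byte */

/* Blocks needed for a bitmap with one bit per rt extent. */
static uint64_t demo_rtbitmap_blockcount(uint64_t rtextents, uint32_t blocksize)
{
	uint64_t bits_per_block = (uint64_t)blocksize * DEMO_NBBY;

	assert(blocksize > 0);
	return (rtextents + bits_per_block - 1) / bits_per_block;
}

/* 32-bit words in those blocks, assuming the whole block holds words. */
static uint64_t demo_rtbitmap_wordcount(uint64_t rtextents, uint32_t blocksize)
{
	uint64_t words_per_block = blocksize / sizeof(uint32_t);

	return demo_rtbitmap_blockcount(rtextents, blocksize) * words_per_block;
}
```

Keeping the words-per-block figure in one place is what would let a later
format reserve part of each block for a header without touching callers.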

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago xfs_{db,repair}: convert open-coded xfs_rtword_t pointer accesses to helper
Darrick J. Wong [Mon, 15 Apr 2024 23:07:29 +0000 (16:07 -0700)] 
xfs_{db,repair}: convert open-coded xfs_rtword_t pointer accesses to helper

There are a bunch of places in xfs_db and xfs_repair where we use
open-coded logic to find a pointer to an xfs_rtword_t within a rt bitmap
buffer.  Convert all that to helper functions for better type safety.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago mkfs: convert utility to use new rt extent helpers and types
Darrick J. Wong [Mon, 15 Apr 2024 23:07:29 +0000 (16:07 -0700)] 
mkfs: convert utility to use new rt extent helpers and types

Convert mkfs to use the new realtime extent types and
helper functions instead of open-coding them.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago xfs_repair: convert utility to use new rt extent helpers and types
Darrick J. Wong [Mon, 15 Apr 2024 23:07:29 +0000 (16:07 -0700)] 
xfs_repair: convert utility to use new rt extent helpers and types

Convert the repair program to use the new realtime extent types and
helper functions instead of open-coding them.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago libxfs: use helpers to convert rt block numbers to rt extent numbers
Darrick J. Wong [Mon, 15 Apr 2024 23:07:28 +0000 (16:07 -0700)] 
libxfs: use helpers to convert rt block numbers to rt extent numbers

Now that we have helpers to do unit conversions of rt block numbers to
rt extent numbers, plug that into libxfs.
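
The unit conversions in question amount to a division, a remainder and a
multiplication by the rt extent size.  The sketch below uses hypothetical
demo_* names and takes the extent size as a plain parameter rather than
reading it from a mount structure.

```c
#include <assert.h>
#include <stdint.h>

/* rt block number -> rt extent number containing it */
static uint64_t demo_rtb_to_rtx(uint64_t rtbno, uint32_t rextsize)
{
	assert(rextsize > 0);
	return rtbno / rextsize;
}

/* rt block number -> block offset within its rt extent */
static uint32_t demo_rtb_to_rtxoff(uint64_t rtbno, uint32_t rextsize)
{
	assert(rextsize > 0);
	return (uint32_t)(rtbno % rextsize);
}

/* rt extent number -> first rt block number of that extent */
static uint64_t demo_rtx_to_rtb(uint64_t rtx, uint32_t rextsize)
{
	return rtx * rextsize;
}
```

With an extent size of 4 blocks, rt block 10 sits at offset 2 of rt
extent 2, whose first block is 8.  The two units only coincide when the
extent size is 1, which is why mixing them up tends to go unnoticed.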

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago libxfs: create a helper to compute leftovers of realtime extents
Darrick J. Wong [Mon, 15 Apr 2024 23:07:28 +0000 (16:07 -0700)] 
libxfs: create a helper to compute leftovers of realtime extents

Port the inode item precommit function to use a helper to compute the
misalignment between a file extent (xfs_extlen_t) and a realtime extent.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago xfs_repair: fix confusing rt space units in the duplicate detection code
Darrick J. Wong [Mon, 15 Apr 2024 23:07:28 +0000 (16:07 -0700)] 
xfs_repair: fix confusing rt space units in the duplicate detection code

Christoph Hellwig stumbled over the crosslinked file data detection code
in xfs_repair.  While trying to make sense of his fixpatch, I realized
that the variable names and unit types are very misleading.

The rt dup tree builder inserts records in units of realtime extents.
One query of the rt dup tree passes in a realtime extent number, but one
of them does not.  Confusingly, all the variable names have "block" even
though they really mean "extent".  This makes a real difference for
rextsize > 1 filesystems, though given the lack of complaints I'm
guessing there aren't many users.

Clean up this whole mess by fixing the variable names of the duplicates
tree and the state array to reflect the units that are stored in the
data structure, and fix the buggy query code.  Later on in this patchset
we'll fix the variable types too.

This seems to have been broken since before the start of the git repo.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago mkfs: fix log sunit rounding when external logs are in use
Darrick J. Wong [Mon, 15 Apr 2024 23:07:28 +0000 (16:07 -0700)] 
mkfs: fix log sunit rounding when external logs are in use

Due to my heinous nature, I set up an external log device with 4k LBAs
using this command:

# losetup -b 4096 -o 4096 --sizelimit $(( (128 * 1048576) - 4096 )) -f /dev/sdb
# blockdev --getsize64 /dev/loop0
134213632

This creates a log device that is slightly smaller than 128MB in size.
Next I ran generic/054, which sets the log sunit to 256k and fails:

# mkfs.xfs -f /dev/sda -l logdev=/dev/loop0,su=256k,version=2 -s size=4096
meta-data=/dev/sda               isize=512    agcount=4, agsize=72448 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=1
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=1
         =                       metadir=0
data     =                       bsize=4096   blocks=289792, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1, parent=0
log      =/dev/loop0             bsize=4096   blocks=32768, version=2
         =                       sectsz=4096  sunit=64 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
         =                       rgcount=0    rgsize=0 blks
Discarding blocks...Done.
Discarding blocks...Done.
mkfs.xfs: libxfs_device_zero write failed: No space left on device

Notice that mkfs thinks it should format a 32768-fsblock external log,
but the log device itself is 32767 fsblocks.  Hence the write goes off
the end of the device and we get ENOSPC.

I tracked this behavior down to align_log_size in mkfs, which first
tries to round the log size up to a stripe boundary, then tries to round
it down.  Unfortunately, in the case of an external log we call the
function with XFS_MAX_LOG_BLOCKS without accounting for the possibility
that the log device might be smaller.

Correct the callsite and clean up the open-coded rounding.
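
The rounding behaviour can be illustrated with the numbers from the
report.  This is a sketch, not the mkfs code: demo_align_log_size is a
hypothetical function, the starting size of 32767 blocks is assumed, and
1048576 is only a placeholder for a ceiling much larger than the device.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of align. */
static uint64_t demo_roundup(uint64_t x, uint64_t align)
{
	assert(align > 0);
	return ((x + align - 1) / align) * align;
}

/* Round x down to the previous multiple of align. */
static uint64_t demo_rounddown(uint64_t x, uint64_t align)
{
	assert(align > 0);
	return (x / align) * align;
}

/*
 * Round the log size up to a stripe unit boundary; if that exceeds the
 * ceiling, round the ceiling down to a stripe unit boundary instead.
 */
static uint64_t demo_align_log_size(uint64_t logblocks, uint64_t sunit,
				    uint64_t max_logblocks)
{
	uint64_t aligned = demo_roundup(logblocks, sunit);

	if (aligned > max_logblocks)
		aligned = demo_rounddown(max_logblocks, sunit);
	return aligned;
}
```

With a 64-block stripe unit and a ceiling far above the device size, a
32767-block log is rounded up to 32768 blocks, one more than the device
holds.  Passing the device size as the ceiling yields 32704 blocks.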

Fixes: 8d1bff2be336 ("mkfs: reduce internal log size when log stripe units are in play")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago libxfs: fix incorrect porting to 6.7
Darrick J. Wong [Mon, 15 Apr 2024 23:07:28 +0000 (16:07 -0700)] 
libxfs: fix incorrect porting to 6.7

Userspace libxfs is supposed to match the kernel libxfs except for the
preprocessor include directives.  Fix a few discrepancies that came up
for whatever reason.

To fix the build errors resulting from CONFIG_XFS_RT not being defined,
add it to libxfs.h and alter the Makefile to track xfs_rtbitmap.h.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
15 months ago debian: fix package configuration after removing platform_defs.h.in
Darrick J. Wong [Mon, 15 Apr 2024 23:07:27 +0000 (16:07 -0700)] 
debian: fix package configuration after removing platform_defs.h.in

In commit 0fa9dcb61b4f, we made platform_defs.h a static header file
instead of generating it from platform_defs.h.in.  Unfortunately, it
turns out that the debian packaging rules use "make
include/platform_defs.h" to run configure with the build options
set via LOCAL_CONFIGURE_OPTIONS.

Since platform_defs.h is no longer generated, the make command in
debian/rules does nothing, which means that the binaries don't get built
the way the packaging scripts specify.  This breaks multiarch for
libhandle.so, as well as libeditline and libblkid support for
xfs_db/io/spaceman.

Fix this by correcting debian/rules to make include/builddefs, which
will start ./configure with the desired options.

Fixes: 0fa9dcb61b4f ("include: stop generating platform_defs.h")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
15 months ago xfsprogs: Release v6.7.0 (tag: v6.7.0)
Carlos Maiolino [Wed, 17 Apr 2024 07:55:22 +0000 (09:55 +0200)] 
xfsprogs: Release v6.7.0

Update all the necessary files for a 6.7.0 release.

Signed-off-by: Carlos Maiolino <cem@kernel.org>
16 months ago configure: don't check for HDIO_GETGEO
Christoph Hellwig [Thu, 15 Feb 2024 06:54:24 +0000 (07:54 +0100)] 
configure: don't check for HDIO_GETGEO

HDIO_GETGEO has been around longer than XFS.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
16 months ago configure: don't check for SG_IO
Christoph Hellwig [Thu, 15 Feb 2024 06:54:23 +0000 (07:54 +0100)] 
configure: don't check for SG_IO

SG_IO has been around longer than XFS.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
16 months ago configure: don't check for fstatat
Christoph Hellwig [Thu, 15 Feb 2024 06:54:22 +0000 (07:54 +0100)] 
configure: don't check for fstatat

fstatat has been supported since Linux 2.6.16 and glibc 2.4.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
16 months ago configure: don't check for openat
Christoph Hellwig [Thu, 15 Feb 2024 06:54:21 +0000 (07:54 +0100)] 
configure: don't check for openat

openat has been supported since Linux 2.6.16 and glibc 2.4.

Note that xfs_db already uses it without the ifdef.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
16 months ago configure: don't check for the f_flags field in statfs
Christoph Hellwig [Thu, 15 Feb 2024 06:54:20 +0000 (07:54 +0100)] 
configure: don't check for the f_flags field in statfs

The f_flags field has been supported since Linux 2.6.36.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
16 months agoconfigure: don't check for fsetxattr
Christoph Hellwig [Thu, 15 Feb 2024 06:54:19 +0000 (07:54 +0100)] 
configure: don't check for fsetxattr

fsetxattr has been supported since Linux 2.4 and glibc 2.3.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
16 months agoconfigure: don't check for mremap
Christoph Hellwig [Thu, 15 Feb 2024 06:54:18 +0000 (07:54 +0100)] 
configure: don't check for mremap

mremap has been around since before the dawn of it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
16 months agoconfigure: don't check for preadv and pwritev
Christoph Hellwig [Thu, 15 Feb 2024 06:54:17 +0000 (07:54 +0100)] 
configure: don't check for preadv and pwritev

preadv and pwritev have been supported since Linux 2.6.30.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
16 months agoconfigure: don't check for syncfs
Christoph Hellwig [Thu, 15 Feb 2024 06:54:16 +0000 (07:54 +0100)] 
configure: don't check for syncfs

syncfs has been supported since Linux 2.6.39.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
16 months agoconfigure: don't check for fallocate
Christoph Hellwig [Thu, 15 Feb 2024 06:54:15 +0000 (07:54 +0100)] 
configure: don't check for fallocate

fallocate has been supported since Linux 2.6.23 and glibc 2.10.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
16 months agoconfigure: don't check for fls
Christoph Hellwig [Thu, 15 Feb 2024 06:54:14 +0000 (07:54 +0100)] 
configure: don't check for fls

fls should never be provided by system headers.  It seems like on MacOS
it did, but as we're not supporting MacOS anymore there is no need to
check for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
16 months agoconfigure: don't check for readdir
Christoph Hellwig [Thu, 15 Feb 2024 06:54:13 +0000 (07:54 +0100)] 
configure: don't check for readdir

readdir has been part of POSIX since the very beginning.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
16 months agoconfigure: don't check for sync_file_range
Christoph Hellwig [Thu, 15 Feb 2024 06:54:12 +0000 (07:54 +0100)] 
configure: don't check for sync_file_range

sync_file_range has been supported since Linux 2.6.17.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
16 months agoconfigure: don't check for fiemap
Christoph Hellwig [Thu, 15 Feb 2024 06:54:11 +0000 (07:54 +0100)] 
configure: don't check for fiemap

fiemap has been supported since Linux 2.6.28.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
16 months agoconfigure: don't check for mincor
Christoph Hellwig [Thu, 15 Feb 2024 06:54:10 +0000 (07:54 +0100)] 
configure: don't check for mincor

mincore has been supported since Linux 2.3.99pre1 and glibc 2.2.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
16 months agoconfigure: don't check for madvise
Christoph Hellwig [Thu, 15 Feb 2024 06:54:09 +0000 (07:54 +0100)] 
configure: don't check for madvise

madvise has been supported since before the dawn of it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
16 months agoconfigure: don't check for sendfile
Christoph Hellwig [Thu, 15 Feb 2024 06:54:08 +0000 (07:54 +0100)] 
configure: don't check for sendfile

sendfile has been supported since Linux 2.2 and glibc 2.1.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
16 months agoconfigure: don't check for fadvise
Christoph Hellwig [Thu, 15 Feb 2024 06:54:07 +0000 (07:54 +0100)] 
configure: don't check for fadvise

fadvise has been supported since Linux 2.5.60 and glibc 2.2.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
16 months agoconfigure: require libblkid
Christoph Hellwig [Thu, 15 Feb 2024 06:54:06 +0000 (07:54 +0100)] 
configure: require libblkid

We can't support block device access (which is the reason for xfsprogs
to exist) without blkid.  Make it a hard requirement and remove the
stubs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
16 months agoconfigure: don't check for getmntent
Christoph Hellwig [Thu, 15 Feb 2024 06:54:05 +0000 (07:54 +0100)] 
configure: don't check for getmntent

getmntent always exists on Linux (and always has), so don't bother
checking for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
16 months agoio: don't redefine SEEK_DATA and SEEK_HOLE
Christoph Hellwig [Thu, 15 Feb 2024 06:54:04 +0000 (07:54 +0100)] 
io: don't redefine SEEK_DATA and SEEK_HOLE

HAVE_SEEK_DATA is never defined, so the code in xfs_io just
unconditionally redefines SEEK_DATA and SEEK_HOLE.  Switch to the
system version instead, which has been around since Linux 3.1 and
glibc of similar vintage.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
16 months agoinclude: stop generating platform_defs.h
Christoph Hellwig [Thu, 15 Feb 2024 06:54:03 +0000 (07:54 +0100)] 
include: stop generating platform_defs.h

Now that the sizeof checks are gone, we can stop generating platform_defs.h.
The only caveat is that we need to stop undefining ENABLE_GETTEXT, which the
generation process had removed before.  The actual ENABLE_GETTEXT will be
passed on the compiler command line, just like other ENABLE or HAVE values
from autoconf.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
16 months agoinclude: stop using SIZEOF_LONG
Christoph Hellwig [Thu, 15 Feb 2024 06:54:02 +0000 (07:54 +0100)] 
include: stop using SIZEOF_LONG

SIZEOF_LONG together with the unused SIZEOF_CHAR_P is the last thing that
really needs a generated configuration header.  Switch to just using
sizeof(long) so that we can stop generating platform_defs.h.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
16 months agorepair: refactor the BLKMAP_NEXTS_MAX check
Christoph Hellwig [Thu, 15 Feb 2024 06:54:01 +0000 (07:54 +0100)] 
repair: refactor the BLKMAP_NEXTS_MAX check

Check the 32-bit limits using sizeof instead of cpp ifdefs so that we
can get rid of BITS_PER_LONG.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
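
The sizeof-based check described in the commit above can be sketched
roughly as follows.  This is a minimal illustration, not the code from
xfs_repair: the limit EX_NEXTS32_MAX and both function names are
hypothetical, and the width of long is passed in as a parameter only so
that both branches can be exercised on one machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap on the number of map entries when long is 32 bits. */
#define EX_NEXTS32_MAX	(1U << 24)

/*
 * Decide at run time, from the width of long, whether the 32-bit cap
 * applies.  The compiler folds the sizeof comparison to a constant, so
 * no BITS_PER_LONG preprocessor test is needed.
 */
static inline bool ex_nexts_ok(uint64_t nex, size_t long_size)
{
	assert(long_size == 4 || long_size == 8);
	if (long_size == 4 && nex > EX_NEXTS32_MAX)
		return false;
	return true;
}

/* Same check for the platform the code is actually compiled for. */
static inline bool ex_nexts_ok_here(uint64_t nex)
{
	return ex_nexts_ok(nex, sizeof(long));
}
```

Unlike an #if block, both branches are always parsed and type-checked,
so the 32-bit path cannot silently rot on 64-bit builds.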
16 months agoinclude: unconditionally define umode_t
Christoph Hellwig [Thu, 15 Feb 2024 06:54:00 +0000 (07:54 +0100)] 
include: unconditionally define umode_t

No system or kernel uapi header defines umode_t, so just define it
unconditionally.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
16 months agoinclude: remove the filldir_t typedef
Christoph Hellwig [Thu, 15 Feb 2024 06:53:59 +0000 (07:53 +0100)] 
include: remove the filldir_t typedef

Neither struct filldir, nor filldir_t is used anywhere in xfsprogs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
16 months agoxfs_db: don't hardcode 'type data' size at 512b
Darrick J. Wong [Thu, 22 Feb 2024 22:04:31 +0000 (14:04 -0800)] 
xfs_db: don't hardcode 'type data' size at 512b

On a disk with 4096-byte LBAs, the xfs_db 'type data' subcommand doesn't
work:

# xfs_db -c 'sb' -c 'type data' /dev/sda
xfs_db: read failed: Invalid argument
no current object

The cause of this is the hardcoded initialization of bb_count when we're
setting type data -- it should be the filesystem sector size, not just 1.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
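
As a rough illustration of the fix described above: bb_count is counted
in 512-byte basic blocks, so a read that covers one sector needs the
sector size divided by 512 rather than a fixed 1.  The helper and macro
names below are hypothetical and are not taken from xfs_db.

```c
#include <assert.h>
#include <stdint.h>

#define EX_BBSHIFT	9			/* 512-byte basic blocks */
#define EX_BBSIZE	(1U << EX_BBSHIFT)

/*
 * Hypothetical helper: number of basic blocks needed to cover one
 * filesystem sector.  Assumes the sector size is a power of two and at
 * least 512 bytes.
 */
static inline uint32_t ex_sector_to_bb_count(uint32_t sectsize)
{
	assert(sectsize >= EX_BBSIZE && (sectsize & (sectsize - 1)) == 0);
	return sectsize >> EX_BBSHIFT;
}
```

With 512-byte sectors this yields 1, which matches the old hardcoded
value; with 4096-byte sectors it yields 8.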
17 months agodebian: Increase build verbosity, add terse support
Bastian Germann [Mon, 12 Feb 2024 23:07:55 +0000 (00:07 +0100)] 
debian: Increase build verbosity, add terse support

Section 4.9 of the Debian Policy reads:

"The package build should be as verbose as reasonably possible,
except where the terse tag is included in DEB_BUILD_OPTIONS".

Implement such behavior for xfsprogs by passing V=1 to make by default.

Link: https://www.debian.org/doc/debian-policy/ch-source.html#main-building-script-debian-rules
Link: https://bugs.debian.org/1063774
Reported-by: Emanuele Rocca <ema@debian.org>
Signed-off-by: Bastian Germann <bage@debian.org>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months agobuild: Request 64-bit time_t where possible
Sam James [Mon, 5 Feb 2024 23:23:21 +0000 (23:23 +0000)] 
build: Request 64-bit time_t where possible

Suggested by Darrick during LFS review. We take the same approach as in
5c0599b721d1d232d2e400f357abdf2736f24a97 ('Fix building xfsprogs on 32-bit platforms')
to avoid autoconf hell - just take the tried & tested approach which is working
fine for us with LFS already.

Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sam James <sam@gentoo.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months agoio: Adapt to >= 64-bit time_t
Sam James [Mon, 5 Feb 2024 23:23:20 +0000 (23:23 +0000)] 
io: Adapt to >= 64-bit time_t

We now require (at least) 64-bit time_t, so we need to adjust some printf
specifiers accordingly.

Unfortunately, we've stumbled upon a ridiculous C moment whereby there's
no neat format specifier (not even one of the inttypes ones) for time_t, so
we cast to intmax_t and use %jd.

Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sam James <sam@gentoo.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
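
The cast-and-%jd approach from the commit above looks roughly like this.
It is a minimal sketch; ex_format_time is a hypothetical helper name and
does not appear in xfs_io.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper: print a time_t as a decimal number.  time_t has
 * no printf length modifier of its own, so widen it to intmax_t and use
 * %jd, which is correct whatever the width of time_t is.
 * Returns the value returned by snprintf().
 */
static inline int ex_format_time(char *buf, size_t len, time_t t)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%jd", (intmax_t)t);
}
```

Casting to long and using %ld would truncate on platforms where long is
32 bits but time_t is 64 bits, which is the case this series cares about.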
17 months agoRemove use of LFS64 interfaces
Violet Purcell [Mon, 5 Feb 2024 23:23:19 +0000 (23:23 +0000)] 
Remove use of LFS64 interfaces

LFS64 interfaces are non-standard and are being removed in the upcoming musl
1.2.5. Setting _FILE_OFFSET_BITS=64 (which is currently being done) makes all
interfaces on glibc 64-bit by default, so using the LFS64 interfaces is
redundant. This commit replaces all occurrences of off64_t with off_t,
stat64 with stat, and fstat64 with fstat.

Link: https://bugs.gentoo.org/907039
Cc: Felix Janda <felix.janda@posteo.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Violet Purcell <vimproved@inventati.org>
Signed-off-by: Sam James <sam@gentoo.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
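
A small sketch of what the replacement described above relies on.  The
helper name ex_file_size is hypothetical, and the macro is defined in
the source file here only to keep the example self-contained; a build
system would more typically pass -D_FILE_OFFSET_BITS=64 on the compiler
command line.

```c
#define _FILE_OFFSET_BITS 64	/* must precede every system header */
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* With the macro set, plain off_t is expected to be 64 bits wide. */
_Static_assert(sizeof(off_t) == 8, "off_t is expected to be 64-bit");

/*
 * Hypothetical helper: file size through plain fstat() and struct stat,
 * with no fstat64()/stat64/off64_t names.  Returns -1 on error.
 */
static inline off_t ex_file_size(int fd)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return -1;
	return st.st_size;
}
```

The compile-time assertion makes a misconfigured build fail early rather
than silently truncating offsets.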
17 months agoxfs: inode recovery does not validate the recovered inode
Dave Chinner [Thu, 15 Feb 2024 08:27:54 +0000 (09:27 +0100)] 
xfs: inode recovery does not validate the recovered inode

Source kernel commit: 038ca189c0d2c1570b4d922f25b524007c85cf94

Discovered when trying to track down a weird recovery corruption
issue that wasn't detected at recovery time.

The specific corruption was a zero extent count field when big
extent counts are in use, and it turns out the dinode verifier
doesn't detect that specific corruption case, either. So fix it too.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months agoxfs: fix internal error from AGFL exhaustion
Omar Sandoval [Thu, 15 Feb 2024 08:27:54 +0000 (09:27 +0100)] 
xfs: fix internal error from AGFL exhaustion

Source kernel commit: f63a5b3769ad7659da4c0420751d78958ab97675

We've been seeing XFS errors like the following:

XFS: Internal error i != 1 at line 3526 of file fs/xfs/libxfs/xfs_btree.c.  Caller xfs_btree_insert+0x1ec/0x280
...
Call Trace:
xfs_corruption_error+0x94/0xa0
xfs_btree_insert+0x221/0x280
xfs_alloc_fixup_trees+0x104/0x3e0
xfs_alloc_ag_vextent_size+0x667/0x820
xfs_alloc_fix_freelist+0x5d9/0x750
xfs_free_extent_fix_freelist+0x65/0xa0
__xfs_free_extent+0x57/0x180
...

This is the XFS_IS_CORRUPT() check in xfs_btree_insert() when
xfs_btree_insrec() fails.

After converting this into a panic and dissecting the core dump, I found
that xfs_btree_insrec() is failing because it's trying to split a leaf
node in the cntbt when the AG free list is empty. In particular, it's
failing to get a block from the AGFL _while trying to refill the AGFL_.

If a single operation splits every level of the bnobt and the cntbt (and
the rmapbt if it is enabled) at once, the free list will be empty. Then,
when the next operation tries to refill the free list, it allocates
space. If the allocation does not use a full extent, it will need to
insert records for the remaining space in the bnobt and cntbt. And if
those new records go in full leaves, the leaves (and potentially more
nodes up to the old root) need to be split.

Fix it by accounting for the additional splits that may be required to
refill the free list in the calculation for the minimum free list size.

P.S. As far as I can tell, this bug has existed for a long time -- maybe
back to xfs-history commit afdf80ae7405 ("Add XFS_AG_MAXLEVELS macros
...") in April 1994! It requires a very unlucky sequence of events, and
in fact we didn't hit it until a particular sparse mmap workload updated
from 5.12 to 5.19. But this bug existed in 5.12, so it must've been
exposed by some other change in allocation or writeback patterns. It's
also much less likely to be hit with the rmapbt enabled, since that
increases the minimum free list size and is unlikely to split at the
same time as the bnobt and cntbt.

Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
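
One way to read the accounting change described above is sketched below.
This is an assumption-laden illustration of the reasoning in the commit
message, not a copy of the kernel function: the helper names are
hypothetical, and the formula is derived here from the text (a full
split, followed by a second split of every level below the old root
while the free list is refilled).

```c
#include <assert.h>

static inline unsigned int ex_min(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Blocks reserved per btree for a single full split: one per level,
 * plus a new root if the tree can still grow.
 */
static inline unsigned int ex_split_blocks(unsigned int height,
					   unsigned int maxlevels)
{
	assert(maxlevels >= 2 && height >= 1 && height <= maxlevels);
	return ex_min(height + 1, maxlevels);
}

/*
 * Reservation that also covers the refill case.  For a tree below the
 * maximum height:
 *   (height + 1) + (height - 1) = 2 * new_height - 2
 * For a tree already at maximum height:
 *   2 * (height - 1)            = 2 * new_height - 2
 */
static inline unsigned int ex_split_and_refill_blocks(unsigned int height,
						      unsigned int maxlevels)
{
	unsigned int new_height = ex_split_blocks(height, maxlevels);

	return 2 * new_height - 2;
}
```

Under these assumptions, a two-level btree that may grow to five levels
needs four blocks rather than three; the same amount would be reserved
for each of the bnobt, the cntbt and, if enabled, the rmapbt.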
17 months agoxfs: abort intent items when recovery intents fail
Long Li [Thu, 15 Feb 2024 08:27:53 +0000 (09:27 +0100)] 
xfs: abort intent items when recovery intents fail

Source kernel commit: f8f9d952e42dd49ae534f61f2fa7ca0876cb9848

When recovering intents, we capture newly created intent items as part of
committing recovered intent items.  If intent recovery fails at a later
point, we forget to remove those newly created intent items from the AIL
and hang:

[root@localhost ~]# cat /proc/539/stack
[<0>] xfs_ail_push_all_sync+0x174/0x230
[<0>] xfs_unmount_flush_inodes+0x8d/0xd0
[<0>] xfs_mountfs+0x15f7/0x1e70
[<0>] xfs_fs_fill_super+0x10ec/0x1b20
[<0>] get_tree_bdev+0x3c8/0x730
[<0>] vfs_get_tree+0x89/0x2c0
[<0>] path_mount+0xecf/0x1800
[<0>] do_mount+0xf3/0x110
[<0>] __x64_sys_mount+0x154/0x1f0
[<0>] do_syscall_64+0x39/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

When newly created intent items fail to commit via transaction, intent
recovery hasn't created done items for these newly created intent items,
so the capture structure is the sole owner of the captured intent items.
We must release them explicitly or else they leak:

unreferenced object 0xffff888016719108 (size 432):
comm "mount", pid 529, jiffies 4294706839 (age 144.463s)
hex dump (first 32 bytes):
08 91 71 16 80 88 ff ff 08 91 71 16 80 88 ff ff  ..q.......q.....
18 91 71 16 80 88 ff ff 18 91 71 16 80 88 ff ff  ..q.......q.....
backtrace:
[<ffffffff8230c68f>] xfs_efi_init+0x18f/0x1d0
[<ffffffff8230c720>] xfs_extent_free_create_intent+0x50/0x150
[<ffffffff821b671a>] xfs_defer_create_intents+0x16a/0x340
[<ffffffff821bac3e>] xfs_defer_ops_capture_and_commit+0x8e/0xad0
[<ffffffff82322bb9>] xfs_cui_item_recover+0x819/0x980
[<ffffffff823289b6>] xlog_recover_process_intents+0x246/0xb70
[<ffffffff8233249a>] xlog_recover_finish+0x8a/0x9a0
[<ffffffff822eeafb>] xfs_log_mount_finish+0x2bb/0x4a0
[<ffffffff822c0f4f>] xfs_mountfs+0x14bf/0x1e70
[<ffffffff822d1f80>] xfs_fs_fill_super+0x10d0/0x1b20
[<ffffffff81a21fa2>] get_tree_bdev+0x3d2/0x6d0
[<ffffffff81a1ee09>] vfs_get_tree+0x89/0x2c0
[<ffffffff81a9f35f>] path_mount+0xecf/0x1800
[<ffffffff81a9fd83>] do_mount+0xf3/0x110
[<ffffffff81aa00e4>] __x64_sys_mount+0x154/0x1f0
[<ffffffff83968739>] do_syscall_64+0x39/0x80

Fix the problem above by aborting intent items that don't have a done
item when intent recovery fails.

Fixes: e6fff81e4870 ("xfs: proper replay of deferred ops queued during log recovery")
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months agoxfs: factor out xfs_defer_pending_abort
Long Li [Thu, 15 Feb 2024 08:27:53 +0000 (09:27 +0100)] 
xfs: factor out xfs_defer_pending_abort

Source kernel commit: 2a5db859c6825b5d50377dda9c3cc729c20cad43

Factor xfs_defer_pending_abort() out of xfs_defer_trans_abort().  The new
helper does not use a transaction parameter, so it can be used after the
transaction life cycle has ended.

Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months agoxfs: invert the realtime summary cache
Omar Sandoval [Thu, 15 Feb 2024 08:27:53 +0000 (09:27 +0100)] 
xfs: invert the realtime summary cache

Source kernel commit: e23aaf450de733044a74bc95528f728478b61c2a

In commit 355e3532132b ("xfs: cache minimum realtime summary level"), I
added a cache of the minimum level of the realtime summary that has any
free extents. However, it turns out that the _maximum_ level is more
useful for upcoming optimizations, and basically equivalent for the
existing usage. So, let's change the meaning of the cache to be the
maximum level + 1, or 0 if there are no free extents.

For example, if the cache contains:

{0, 4}

then there are no free extents starting in realtime bitmap block 0, and
there are no free extents larger than or equal to 2^4 blocks starting in
realtime bitmap block 1. The cache is a loose upper bound, so there may
or may not be free extents smaller than 2^4 blocks in realtime bitmap
block 1.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
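
The {0, 4} example above can be expressed as a small sketch.  The
function names and the one-byte-per-bitmap-block layout are assumptions
made for illustration; only the meaning of the cached value (maximum
level + 1, or 0 for no free extents) comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical query: may a free extent of summary level 'log' (at
 * least 2^log blocks) start in realtime bitmap block 'bbno'?  A false
 * result is definite; a true result only means "possibly", because the
 * cache is a loose upper bound.
 */
static inline bool ex_rsum_cache_may_have(const uint8_t *cache,
					  size_t nblocks, size_t bbno,
					  unsigned int log)
{
	assert(cache != NULL && bbno < nblocks);
	return log < cache[bbno];
}

/*
 * Hypothetical update: record that a free extent of level 'log' now
 * starts in block 'bbno'.  The bound is only ever raised here.
 */
static inline void ex_rsum_cache_note_free(uint8_t *cache, size_t nblocks,
					   size_t bbno, unsigned int log)
{
	assert(cache != NULL && bbno < nblocks && log < UINT8_MAX);
	if (cache[bbno] < log + 1)
		cache[bbno] = (uint8_t)(log + 1);
}
```

With the cache {0, 4}, every query on block 0 is false, a query for
level 3 on block 1 is true, and a query for level 4 on block 1 is false.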
17 months agoxfs: simplify rt bitmap/summary block accessor functions
Darrick J. Wong [Thu, 15 Feb 2024 08:27:53 +0000 (09:27 +0100)] 
xfs: simplify rt bitmap/summary block accessor functions

Source kernel commit: e2cf427c91494ea0d1173a911090c39665c5fdef

Simplify the calling convention of these functions since the
xfs_rtalloc_args structure contains the parameters we need.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months agoxfs: simplify xfs_rtbuf_get calling conventions
Darrick J. Wong [Thu, 15 Feb 2024 08:27:52 +0000 (09:27 +0100)] 
xfs: simplify xfs_rtbuf_get calling conventions

Source kernel commit: 5b1d0ae9753f0654ab56c1e06155b3abf2919d71

Now that xfs_rtalloc_args holds references to the last-read bitmap and
summary blocks, we don't need to pass the buffer pointer out of
xfs_rtbuf_get.

Callers no longer have to xfs_trans_brelse on their own, though they are
required to call xfs_rtbuf_cache_relse before the xfs_rtalloc_args goes
out of scope.

While we're at it, create some trivial helpers so that we don't have to
remember if "0" means "bitmap" and "1" means "summary".

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months agoxfs: cache last bitmap block in realtime allocator
Omar Sandoval [Thu, 15 Feb 2024 08:26:52 +0000 (09:26 +0100)] 
xfs: cache last bitmap block in realtime allocator

Source kernel commit: e94b53ff699c2674a9ec083342a5254866210ade

Profiling a workload on a highly fragmented realtime device showed a ton
of CPU cycles being spent in xfs_trans_read_buf() called by
xfs_rtbuf_get(). Further tracing showed that much of that was repeated
calls to xfs_rtbuf_get() for the same block of the realtime bitmap.
These come from xfs_rtallocate_extent_block(): as it walks through
ranges of free bits in the bitmap, each call to xfs_rtcheck_range() and
xfs_rtfind_{forw,back}() gets the same bitmap block. If the bitmap block
is very fragmented, then this is _a lot_ of buffer lookups.

The realtime allocator already passes around a cache of the last used
realtime summary block to avoid repeated reads (the parameters rbpp and
rsb). We can do the same for the realtime bitmap.

This replaces rbpp and rsb with a struct xfs_rtbuf_cache, which caches
the most recently used block for both the realtime bitmap and summary.
xfs_rtbuf_get() now handles the caching instead of the callers, which
requires plumbing xfs_rtbuf_cache to more functions but also makes sure
we don't miss anything.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months agoxfs: consolidate realtime allocation arguments
Dave Chinner [Thu, 15 Feb 2024 08:25:48 +0000 (09:25 +0100)] 
xfs: consolidate realtime allocation arguments

Source kernel commit: 41f33d82cfd310e344fc9183f02cc9e0d2d27663

Consolidate the arguments passed around the rt allocator into a
struct xfs_rtalloc_arg similar to how the btree allocator arguments
are consolidated in a struct xfs_alloc_arg....

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months agoxfs: use accessor functions for summary info words
Darrick J. Wong [Thu, 15 Feb 2024 08:25:48 +0000 (09:25 +0100)] 
xfs: use accessor functions for summary info words

Source kernel commit: 663b8db7b0256b81152b2f786e45ecf12bdf265f

Create get and set functions for rtsummary words so that we can redefine
the ondisk format with a specific endianness.  Note that this requires
the definition of a distinct type for ondisk summary info words so that
the compiler can perform proper typechecking.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months agoxfs: create helpers for rtsummary block/wordcount computations
Darrick J. Wong [Thu, 15 Feb 2024 08:25:48 +0000 (09:25 +0100)] 
xfs: create helpers for rtsummary block/wordcount computations

Source kernel commit: bd85af280de66a946022775a876edf0c553e3f35

Create helper functions that compute the number of blocks or words
necessary to store the rt summary file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months agoxfs: use accessor functions for bitmap words
Darrick J. Wong [Thu, 15 Feb 2024 08:25:47 +0000 (09:25 +0100)] 
xfs: use accessor functions for bitmap words

Source kernel commit: 97e993830a1cdd86ad7d207308b9f55a00660edd

Create get and set functions for rtbitmap words so that we can redefine
the ondisk format with a specific endianness.  Note that this requires
the definition of a distinct type for ondisk rtbitmap words so that the
compiler can perform proper typechecking as we go back and forth.

In the upcoming rtgroups feature, we're going to fix the problem that
rtwords are written in host endian order, which means we'll need the
distinct rtword/rtword_raw types.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months agoxfs: create a helper to handle logging parts of rt bitmap/summary blocks
Darrick J. Wong [Thu, 15 Feb 2024 08:25:47 +0000 (09:25 +0100)] 
xfs: create a helper to handle logging parts of rt bitmap/summary blocks

Source kernel commit: 312d61021b8947446aa9ec80b78b9230e8cb3691

Create an explicit helper function to log parts of rt bitmap and summary
blocks.  While we're at it, fix an off-by-one error in two of the
rtbitmap logging calls that led to unnecessarily large log items but was
otherwise benign.

Note that the upcoming rtgroups patchset will add block headers to the
rtbitmap and rtsummary files.  The helpers in this and the next few
patches take a less than direct route through xfs_rbmblock_wordptr and
xfs_rsumblock_infoptr to avoid helper churn in that patchset.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months agoxfs: create helpers for rtbitmap block/wordcount computations
Darrick J. Wong [Thu, 15 Feb 2024 08:25:47 +0000 (09:25 +0100)] 
xfs: create helpers for rtbitmap block/wordcount computations

Source kernel commit: d0448fe76ac1a9ccbce574577a4c82246d17eec4

Create helper functions that compute the number of blocks or words
necessary to store the rt bitmap.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months agoxfs: convert to new timestamp accessors
Jeff Layton [Thu, 15 Feb 2024 08:25:40 +0000 (09:25 +0100)] 
xfs: convert to new timestamp accessors

Source kernel commit: 75d1e312bbbd175fa27ffdd4c4fe9e8cc7d047ec

Convert to using the new inode timestamp accessor functions.

[Carlos: Also partially port 077c212f0344ae and 12cd4402365166]
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20231004185347.80880-75-jlayton@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months agoxfs: convert rt summary macros to helpers
Darrick J. Wong [Thu, 15 Feb 2024 08:25:37 +0000 (09:25 +0100)] 
xfs: convert rt summary macros to helpers

Source kernel commit: 097b4b7b64ef67a4703b89fd4064480b61557fd5

Convert the realtime summary file macros to helper functions so that we
can improve type checking.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months agoxfs: convert open-coded xfs_rtword_t pointer accesses to helper
Darrick J. Wong [Mon, 12 Feb 2024 14:27:20 +0000 (15:27 +0100)] 
xfs: convert open-coded xfs_rtword_t pointer accesses to helper

Source kernel commit: a9948626849c2c65dfd201b5e9d855e62937de61

There are a bunch of places where we use open-coded logic to find a
pointer to an xfs_rtword_t within a rt bitmap buffer.  Convert all that
to helper functions for better type safety.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months agoxfs: remove XFS_BLOCKWSIZE and XFS_BLOCKWMASK macros
Darrick J. Wong [Mon, 12 Feb 2024 14:26:20 +0000 (15:26 +0100)] 
xfs: remove XFS_BLOCKWSIZE and XFS_BLOCKWMASK macros

Source kernel commit: add3cddaea509071d01bf1d34df0d05db1a93a07

Remove these trivial macros since they're not even part of the ondisk
format.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months agoxfs: convert the rtbitmap block and bit macros to static inline functions
Darrick J. Wong [Mon, 12 Feb 2024 14:25:20 +0000 (15:25 +0100)] 
xfs: convert the rtbitmap block and bit macros to static inline functions

Source kernel commit: 90d98a6ada1da0f8797ff3f5adafd175dd8c0a81

Replace these macros with typechecked helper functions.  Eventually
we're going to add more logic to the helpers and it'll be easier if we
don't have to macro it up.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months agoxfs: use shifting and masking when converting rt extents, if possible
Darrick J. Wong [Mon, 12 Feb 2024 14:24:20 +0000 (15:24 +0100)] 
xfs: use shifting and masking when converting rt extents, if possible

Source kernel commit: ef5a83b7e597038d1c734ddb4bc00638082c2bf1

Avoid the costs of integer division (32-bit and 64-bit) if the realtime
extent size is a power of two.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
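
The optimization above can be sketched as follows.  The struct and
function names are hypothetical; the idea shown is to precompute the
log2 of the realtime extent size once and fall back to division when
the size is not a power of two.

```c
#include <assert.h>
#include <stdint.h>

struct ex_rtgeom {
	uint32_t	rextsize;	/* blocks per realtime extent */
	int		rextlog;	/* log2(rextsize), or -1 */
};

/* Compute the geometry once, for example at mount time. */
static inline struct ex_rtgeom ex_rtgeom_init(uint32_t rextsize)
{
	struct ex_rtgeom g = { .rextsize = rextsize, .rextlog = -1 };

	assert(rextsize > 0);
	if ((rextsize & (rextsize - 1)) == 0) {
		g.rextlog = 0;
		while ((UINT32_C(1) << g.rextlog) != rextsize)
			g.rextlog++;
	}
	return g;
}

/* rt block number -> rt extent number */
static inline uint64_t ex_rtb_to_rtx(const struct ex_rtgeom *g,
				     uint64_t rtbno)
{
	if (g->rextlog >= 0)
		return rtbno >> g->rextlog;	/* shift instead of divide */
	return rtbno / g->rextsize;
}

/* rt block number -> block offset within its rt extent */
static inline uint64_t ex_rtb_to_rtxoff(const struct ex_rtgeom *g,
					uint64_t rtbno)
{
	if (g->rextlog >= 0)
		return rtbno & (g->rextsize - 1);	/* mask instead of modulo */
	return rtbno % g->rextsize;
}
```

Both paths give the same answers; the shift and mask path simply avoids
a 64-bit division, which is comparatively slow on 32-bit machines.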
17 months agoxfs: create rt extent rounding helpers for realtime extent blocks
Darrick J. Wong [Mon, 12 Feb 2024 14:23:20 +0000 (15:23 +0100)] 
xfs: create rt extent rounding helpers for realtime extent blocks

Source kernel commit: 5f57f7309d9ab9d24d50c5707472b1ed8af4eabc

Create a pair of functions to round rtblock numbers up or down to the
nearest rt extent.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
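
A pair of rounding helpers of the kind described above might look like
this.  The names are hypothetical and the extent size is passed in
directly rather than read from a mount structure.

```c
#include <assert.h>
#include <stdint.h>

/* Round an rt block number down to the start of its rt extent. */
static inline uint64_t ex_rtb_rounddown_rtx(uint64_t rtbno,
					    uint32_t rextsize)
{
	assert(rextsize > 0);
	return rtbno - (rtbno % rextsize);
}

/* Round an rt block number up to the start of the next rt extent,
 * leaving already aligned block numbers unchanged. */
static inline uint64_t ex_rtb_roundup_rtx(uint64_t rtbno,
					  uint32_t rextsize)
{
	uint64_t rem;

	assert(rextsize > 0);
	rem = rtbno % rextsize;
	return rem ? rtbno + (rextsize - rem) : rtbno;
}
```

With an extent size of 8 blocks, block 27 rounds down to 24 and up to
32, while block 24 is returned unchanged by both helpers.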
17 months agoxfs: convert do_div calls to xfs_rtb_to_rtx helper calls
Darrick J. Wong [Mon, 12 Feb 2024 14:22:20 +0000 (15:22 +0100)] 
xfs: convert do_div calls to xfs_rtb_to_rtx helper calls

Source kernel commit: 055641248f649b52620a5fe8774bea253690e057

Convert these calls to use the helpers, and clean up all these places
where the same variable can have different units depending on where it
is in the function.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months agoxfs: create helpers to convert rt block numbers to rt extent numbers
Darrick J. Wong [Mon, 12 Feb 2024 14:21:20 +0000 (15:21 +0100)] 
xfs: create helpers to convert rt block numbers to rt extent numbers

Source kernel commit: 5dc3a80d46a450481df7f7e9fe673ba3eb4514c3

Create helpers to do unit conversions of rt block numbers to rt extent
numbers.  There are three variations -- one to compute the rt extent
number from an rt block number; one to compute the offset of an rt block
within an rt extent; and one to extract both.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months agoxfs: create a helper to convert extlen to rtextlen
Darrick J. Wong [Mon, 12 Feb 2024 14:20:20 +0000 (15:20 +0100)] 
xfs: create a helper to convert extlen to rtextlen

Source kernel commit: 2c2b981b737a519907429f62148bbd9e40e01132

Create a helper to compute the realtime extent length (xfs_rtxlen_t)
from an extent length (xfs_extlen_t) value.
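
As a rough sketch, such a conversion is an integer division by the rt extent size. The typedefs and the function name below are hypothetical stand-ins, and `rextsize` is passed explicitly as an assumption to keep the example self-contained.

```c
#include <stdint.h>

typedef uint32_t demo_extlen_t;	/* length in fs blocks */
typedef uint32_t demo_rtxlen_t;	/* length in rt extents */

/* Convert a length in fs blocks to whole rt extents (truncating). */
static inline demo_rtxlen_t
demo_extlen_to_rtxlen(demo_extlen_t len, uint32_t rextsize)
{
	return len / rextsize;
}
```

Any partial rt extent at the end of the range is dropped by the division.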

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months ago xfs: create a helper to compute leftovers of realtime extents
Darrick J. Wong [Mon, 12 Feb 2024 14:19:20 +0000 (15:19 +0100)] 
xfs: create a helper to compute leftovers of realtime extents

Source kernel commit: 68db60bf01c131c09bbe35adf43bd957a4c124bc

Create a helper to compute the misalignment between a file extent
(xfs_extlen_t) and a realtime extent.
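
One way to picture the misalignment is as the remainder left after dividing a block length by the rt extent size. The sketch below assumes exactly that; the names are hypothetical and the real helper's signature is not shown in this log.

```c
#include <stdint.h>

typedef uint32_t demo_extlen_t;	/* length in fs blocks */

/*
 * Number of fs blocks by which len overshoots a whole number of
 * rt extents; zero means len is rt-extent aligned.
 */
static inline demo_extlen_t
demo_extlen_to_rtxmod(demo_extlen_t len, uint32_t rextsize)
{
	return len % rextsize;
}
```

For example, a 13-block length with 4-block rt extents leaves one block over, while a 12-block length leaves none.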

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months ago xfs: create a helper to convert rtextents to rtblocks
Darrick J. Wong [Mon, 12 Feb 2024 14:18:20 +0000 (15:18 +0100)] 
xfs: create a helper to convert rtextents to rtblocks

Source kernel commit: fa5a387230861116c2434c20d29fc4b3fd077d24

Create a helper to convert a realtime extent to a realtime block.  Later
on we'll change the helper to use bit shifts when possible.
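
The sketch below illustrates the idea, including one possible form of the bit-shift variant mentioned above. It is an assumption-laden illustration: the `demo_` names are hypothetical, and the power-of-two check is computed on each call here, whereas a real implementation would more likely cache the shift amount.

```c
#include <stdint.h>

typedef uint64_t demo_rtblock_t;	/* position, in fs blocks */
typedef uint64_t demo_rtxnum_t;		/* position, in rt extents */

/* True if v is a nonzero power of two. */
static inline int
demo_is_pow2(uint32_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* Floor of log2(v) for nonzero v. */
static inline unsigned int
demo_log2(uint32_t v)
{
	unsigned int n = 0;

	while (v >>= 1)
		n++;
	return n;
}

/* Convert an rt extent number to the rt block number where it starts. */
static inline demo_rtblock_t
demo_rtx_to_rtb(demo_rtxnum_t rtx, uint32_t rextsize)
{
	/* Power-of-two extent sizes allow a shift instead of a multiply. */
	if (demo_is_pow2(rextsize))
		return rtx << demo_log2(rextsize);
	return rtx * rextsize;
}
```

Both paths give the same answer; the shift only avoids a 64-bit multiplication when the rt extent size permits it.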

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months ago xfs: convert rt extent numbers to xfs_rtxnum_t
Darrick J. Wong [Mon, 12 Feb 2024 14:17:20 +0000 (15:17 +0100)] 
xfs: convert rt extent numbers to xfs_rtxnum_t

Source kernel commit: 2d5f216b77e33f9b503bd42998271da35d4b7055

Further disambiguate the xfs_rtblock_t uses by creating a new type,
xfs_rtxnum_t, to store the position of an extent within the realtime
section, in units of rtextents.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months ago xfs: rename xfs_verify_rtext to xfs_verify_rtbext
Darrick J. Wong [Mon, 12 Feb 2024 14:16:20 +0000 (15:16 +0100)] 
xfs: rename xfs_verify_rtext to xfs_verify_rtbext

Source kernel commit: 3d2b6d034f0feb7741b313f978a2fe45e917e1be

This helper function validates that a range of *blocks* in the
realtime section is completely contained within the realtime section.
It does /not/ validate ranges of *rtextents*.  Rename the function to
avoid suggesting that it does, and change the type of the @len parameter
since xfs_rtblock_t is a position unit, not a length unit.
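
To make the block-range semantics concrete, here is a hedged standalone sketch of a containment check of this kind. It is not the kernel function: the name, the separate `rblocks` parameter (total blocks in the realtime section), and the exact rejection rules are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Check that the block range [rtbno, rtbno + len) lies entirely
 * inside a realtime section of rblocks blocks.  Note that the start
 * is a position while len is a length, both in units of fs blocks.
 */
static inline bool
demo_verify_rtb_range(uint64_t rtbno, uint64_t len, uint64_t rblocks)
{
	/* Reject empty ranges and ranges whose end wraps around. */
	if (rtbno + len <= rtbno)
		return false;

	/* The first block must be inside the section... */
	if (rtbno >= rblocks)
		return false;

	/* ...and so must the last one. */
	return rtbno + len - 1 < rblocks;
}
```

Nothing in this check divides by the rt extent size, which is why a name suggesting rt extent units would be misleading.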

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months ago xfs: convert rt bitmap extent lengths to xfs_rtbxlen_t
Darrick J. Wong [Mon, 12 Feb 2024 14:15:20 +0000 (15:15 +0100)] 
xfs: convert rt bitmap extent lengths to xfs_rtbxlen_t

Source kernel commit: f29c3e745dc253bf9d9d06ddc36af1a534ba1dd0

XFS uses xfs_rtblock_t for many different uses, which makes it much more
difficult to perform a unit analysis on the codebase.  One of these
(ab)uses is when we need to store the length of a free space extent as
stored in the realtime bitmap.  Because there can be up to 2^64 realtime
extents in a filesystem, we need a new type that is larger than
xfs_rtxlen_t for callers that are querying the bitmap directly.  This
means scrub and growfs.

Create this type as "xfs_rtbxlen_t" and use it to store 64-bit rtx
lengths.  'b' stands for 'bitmap' or 'big'; reader's choice.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months ago xfs: convert rt bitmap/summary block numbers to xfs_fileoff_t
Darrick J. Wong [Mon, 12 Feb 2024 14:14:20 +0000 (15:14 +0100)] 
xfs: convert rt bitmap/summary block numbers to xfs_fileoff_t

Source kernel commit: 03f4de332e2e79db36ed2156fb2350480f142bec

We should use xfs_fileoff_t to store the file block offset of any
location within the realtime bitmap or summary files.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months ago xfs: convert xfs_extlen_t to xfs_rtxlen_t in the rt allocator
Darrick J. Wong [Mon, 12 Feb 2024 14:13:20 +0000 (15:13 +0100)] 
xfs: convert xfs_extlen_t to xfs_rtxlen_t in the rt allocator

Source kernel commit: a684c538bc14410565e8939393089670fa1e19dd

In most of the filesystem, we use xfs_extlen_t to store the length of a
file (or AG) space mapping in units of fs blocks.  Unfortunately, the
realtime allocator also uses it to store the length of a rt space
mapping in units of rt extents.  This is confusing, since one rt extent
can consist of many fs blocks.

Separate the two by introducing a new type (xfs_rtxlen_t) to store the
length of a space mapping (in units of realtime extents) that would be
found in a file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months ago xfs: move the xfs_rtbitmap.c declarations to xfs_rtbitmap.h
Darrick J. Wong [Mon, 12 Feb 2024 14:13:16 +0000 (15:13 +0100)] 
xfs: move the xfs_rtbitmap.c declarations to xfs_rtbitmap.h

Source kernel commit: 13928113fc5b5e79c91796290a99ed991ac0efe2

Move all the declarations for functionality in xfs_rtbitmap.c into a
separate xfs_rtbitmap.h header file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months ago xfs: fix units conversion error in xfs_bmap_del_extent_delay
Darrick J. Wong [Mon, 12 Feb 2024 14:02:05 +0000 (15:02 +0100)] 
xfs: fix units conversion error in xfs_bmap_del_extent_delay

Source kernel commit: ddd98076d5c075c8a6c49d9e6e8ee12844137f23

The unit conversions in this function do not make sense.  First we
convert a block count to bytes, then divide that bytes value by
rextsize, which is in blocks, to get an rt extent count.  You can't
divide bytes by blocks to get a (possibly multiblock) extent value.
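
The mismatch described above can be reproduced with plain arithmetic. The following standalone sketch is illustrative only: the function names and the `blocklog` parameter (log2 of the block size in bytes) are assumptions, and it does not reproduce the actual kernel code before or after the fix.

```c
#include <stdint.h>

/*
 * Mixed units: blocks are converted to bytes first, then the byte
 * count is divided by rextsize, which is measured in blocks.
 */
static inline uint64_t
demo_rtx_count_mixed_units(uint64_t blocks, unsigned int blocklog,
			   uint32_t rextsize)
{
	uint64_t bytes = blocks << blocklog;

	return bytes / rextsize;	/* bytes / blocks: not rt extents */
}

/* Consistent units: blocks divided by blocks-per-rt-extent. */
static inline uint64_t
demo_rtx_count_consistent(uint64_t blocks, uint32_t rextsize)
{
	return blocks / rextsize;
}
```

For 8 blocks of 4096 bytes and an rt extent size of 4 blocks, the mixed-unit version yields 8192 while the consistent version yields 2 rt extents.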

Fortunately nobody uses delalloc on the rt volume so this hasn't
mattered.

Fixes: fa5c836ca8eb5 ("xfs: refactor xfs_bunmapi_cow")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
17 months ago xfs: hoist freeing of rt data fork extent mappings
Darrick J. Wong [Mon, 12 Feb 2024 14:01:33 +0000 (15:01 +0100)] 
xfs: hoist freeing of rt data fork extent mappings

Source kernel commit: 6c664484337b37fa0cf6e958f4019623e30d40f7

Currently, xfs_bmap_del_extent_real contains a bunch of code to convert
the physical extent of a data fork mapping for a realtime file into rt
extents and pass that to the rt extent freeing function.  Since the
details of this aren't needed when CONFIG_XFS_REALTIME=n, move it to
xfs_rtbitmap.c to reduce code size when realtime isn't enabled.

This will (one day) enable realtime EFIs to reuse the same
unit-converting call with less code duplication.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>