git.ipfire.org Git - thirdparty/xfsprogs-dev.git/log

xfs: check unused nlink fields in the ondisk inode

Source kernel commit: 40cb8613d6122ca80a9e42e4cecc4d308c3b80fb

v2/v3 inodes use di_nlink and not di_onlink; and v1 inodes use di_onlink
and not di_nlink. Whichever field is not in use, make sure its contents
are zero, and teach xfs_scrub to fix that if it is.

This clears a bunch of missing scrub failure errors in xfs/385 for
core.onlink.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: pass the owner to xfs_symlink_write_target

Source kernel commit: ea8214c3195c2ed3a205dea42bbe7746712fc461

Require callers of xfs_symlink_write_target to pass the owner number
explicitly. This sets us up for online repair to be able to write a
remote symlink target to sc->tempip with sc->ip's inumber in the block
heaader.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: expose xfs_bmap_local_to_extents for online repair

Source kernel commit: ef744be416b5c649d287604730400dfa728779fe

Allow online repair to call xfs_bmap_local_to_extents and add a void *
argument at the end so that online repair can pass its own context.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: repair extended attributes

Source kernel commit: e47dcf113ae348678143cc935a1183059c02c9ad

If the extended attributes look bad, try to sift through the rubble to
find whatever keys/values we can, stage a new attribute structure in a
temporary file and use the atomic extent swapping mechanism to commit
the results in bulk.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: use atomic extent swapping to fix user file fork data

Source kernel commit: 629fdaf5f5b1b7f7107ed4de04e0991a99501ced

Build on the code that was recently added to the temporary repair file
code so that we can atomically switch the contents of any file fork,
even if the fork is in local format.  The upcoming functions to repair
xattrs, directories, and symlinks will need that capability.

Repair can lock out access to these user files by holding IOLOCK_EXCL on
these user files.  Therefore, it is safe to drop the ILOCK of both the
file being repaired and the tempfile being used for staging, and cancel
the scrub transaction.  We do this so that we can reuse the resource
estimation and transaction allocation functions used by a regular file
exchange operation.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: validate explicit directory free block owners

Source kernel commit: fe6c9f8e48e0dcbfc3dba17edd88490c8579b34b

Port the existing directory freespace block header checking function to
accept an owner number instead of an xfs_inode, then update the
callsites to use xfs_da_args.owner when possible.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: validate explicit directory block buffer owners

Source kernel commit: 29b41ce919b7f0b0c2220e088e450d9b132bec36

Port the existing directory block header checking function to accept an
owner number instead of an xfs_inode, then update the callsites to use
xfs_da_args.owner when possible.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: validate explicit directory data buffer owners

Source kernel commit: cc6740ddb423db2066f7669eaaa377fdbf84ab1e

Port the existing directory data header checking function to accept an
owner number instead of an xfs_inode, then update the callsites to use
xfs_da_args.owner when possible.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: validate directory leaf buffer owners

Source kernel commit: 402eef10a1bab0b428c418cfbaaa0a62efc9c951

Check the owner field of directory leaf blocks.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: validate dabtree node buffer owners

Source kernel commit: d44bea9b41ca25f91fd9f25ed2cc3bb2f6dab4bc

Check the owner field of dabtree node blocks.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: validate attr remote value buffer owners

Source kernel commit: 8c25dc728bd1ca9344001aa6ef4556885572baa4

Check the owner field of xattr remote value blocks.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: validate attr leaf buffer owners

Source kernel commit: f4887fbc41dcb1560ec5da982ac7c6ad04b71de5

Create a leaf block header checking function to validate the owner field
of xattr leaf blocks.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: use the xfs_da_args owner field to set new dir/attr block owner

Source kernel commit: 17a85dc64ae0804d33a2293686fc987a830a462d

When we're creating leaf, data, freespace, or dabtree blocks for
directories and xattrs, use the explicit owner field (instead of the
xfs_inode) to set the owner field. This will enable online repair to
construct replacement data structures in a temporary file without having
to change the owner fields prior to swapping the new and old structures.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: add an explicit owner field to xfs_da_args

Source kernel commit: 9eef772f3a194f6841850e45dacdf4207ec7da84

Add an explicit owner field to xfs_da_args, which will make it easier
for online fsck to set the owner field of the temporary directory and
xattr structures that it builds to repair damaged metadata.

Note: I hopefully found all the xfs_da_args definitions by looking for
automatic stack variable declarations and xfs_da_args.dp assignments:

git grep -E '(args.*dp =|struct xfs_da_args[[:space:]]*[a-z0-9][a-z0-9]*)'

Note that callers of xfs_attr_{get,set,change} can set the owner to zero
(or leave it unset) to have the default set to args->dp.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: enable logged file mapping exchange feature

Source kernel commit: 0730e8d8ba1d1507f1d7fd719e1f835ce69961fe

Add the XFS_SB_FEAT_INCOMPAT_EXCHRANGE feature to the set of features
that we will permit when mounting a filesystem. This turns on support
for the file range exchange feature.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: capture inode generation numbers in the ondisk exchmaps log item

Source kernel commit: 14f19991020b3c712d626727c17599f134cc6efa

Per some very late review comments, capture the generation numbers of
both inodes involved in a file content exchange operation so that we
don't accidentally target files with have been reallocated.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: make file range exchange support realtime files

Source kernel commit: e62941103faa2eedba6a155316e059a490c743a6

Now that bmap items support the realtime device, we can add the
necessary pieces to the file range exchange code to support exchanging
mappings. All we really need to do here is adjust the blockcount
upwards to the end of the rt extent and remove the inode checks.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: condense symbolic links after a mapping exchange operation

Source kernel commit: 33a9be2b7016e79f47c4c1b523a0aa59d41893c0

The previous commit added a new file mapping exchange flag that enables
us to perform post-exchange processing on file2 once we're done
exchanging the extent mappings.  Now add this ability for symlinks.

This isn't used anywhere right now, but we need to have the basic ondisk
flags in place so that a future online symlink repair feature can
salvage the remote target in a temporary link and exchange the data fork
mappings when ready.  If one file is in extents format and the other is
inline, we will have to promote both to extents format to perform the
exchange.  After the exchange, we can try to condense the fixed symlink
down to inline format if possible.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: condense directories after a mapping exchange operation

Source kernel commit: da165fbde23b84591b6ccdf6749277d2d767b770

The previous commit added a new file mapping exchange flag that enables
us to perform post-swap processing on file2 once we're done exchanging
extent mappings.  Now add this ability for directories.

This isn't used anywhere right now, but we need to have the basic ondisk
flags in place so that a future online directory repair feature can
create salvaged dirents in a temporary directory and exchange the data
fork mappings when ready.  If one file is in extents format and the
other is inline, we will have to promote both to extents format to
perform the exchange.  After the exchange, we can try to condense the
fixed directory down to inline format if possible.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: condense extended attributes after a mapping exchange operation

Source kernel commit: 497d7a2608f8b7329e92bdaaf745ca127a582ad9

Add a new file mapping exchange flag that enables us to perform
post-exchange processing on file2 once we're done exchanging the extent
mappings. If we were swapping mappings between extended attribute
forks, we want to be able to convert file2's attr fork from block to
inline format.

(This implies that all fork contents are exchanged.)

This isn't used anywhere right now, but we need to have the basic ondisk
flags in place so that a future online xattr repair feature can create
salvaged attrs in a temporary file and exchange the attr fork mappings
when ready. If one file is in extents format and the other is inline,
we will have to promote both to extents format to perform the exchange.
After the exchange, we can try to condense the fixed file's attr fork
back down to inline format if possible.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: add error injection to test file mapping exchange recovery

Source kernel commit: 5fd022ec7d420dfca1eaaf997923a5d4dd0dcf62

Add an errortag so that we can test recovery of exchmaps log items.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: create deferred log items for file mapping exchanges

Source kernel commit: 966ceafc7a437105ecfe1cadb3747b2965a260ca

Now that we've created the skeleton of a log intent item to track and
restart file mapping exchange operations, add the upper level logic to
log. This builds on the existing bmap update intent items that have
been around for a while now.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: introduce a file mapping exchange log intent item

Source kernel commit: 6c08f434bd33f88cf169e9e43c7a5e42fb3f2118

Introduce a new intent log item to handle exchanging mappings between
the forks of two files.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: create a incompat flag for atomic file mapping exchanges

Source kernel commit: 1518646eef26c220e9256906260ecaaa64503522

Create a incompat flag so that we only attempt to process file mapping
exchange log items if the filesystem supports it, and a geometry flag to
advertise support if it's present.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: introduce new file range exchange ioctl

Source kernel commit: 9a64d9b3109d01cca0b83c1d36538b7a37c5284e

Introduce a new ioctl to handle exchanging ranges of bytes
between files. The goal here is to perform the exchange atomically with
respect to applications -- either they see the file contents before the
exchange or they see that A-B is now B-A, even if the kernel crashes.

My original goal with all this code was to make it so that online repair
can build a replacement directory or xattr structure in a temporary file
and commit the repair by atomically exchanging all the data blocks
between the two files. However, I needed a way to test this mechanism
thoroughly, so I've been evolving an ioctl interface since then.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: constify xfs_bmap_is_written_extent

Source kernel commit: 15f78aa3eb07645e7bef15a53b4ae1c757907d2c

This predicate doesn't modify the structure that's being passed in, so
we can mark it const.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: pass xfs_buf lookup flags to xfs_*read_agi

Source kernel commit: 549d3c9a29921f388ef3bcfd1d4f669b7dd4eed2

Allow callers to pass buffer lookup flags to xfs_read_agi and
xfs_ialloc_read_agi. This will be used in the next patch to fix a
deadlock in the online fsck inode scanner.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: don't crash on -vv

A user reported a crash in xfs_repair when they run it with -vv
specified on the command line. Ultimately this harks back to xfs_m in
main() containing uninitialized stack contents, and inadequate null
checks. Fix both problems in one go.

Reported-by: Santiago Kraus <santiago_kraus@yahoo.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: don't leak the rootdir inode when orphanage already exists

If repair calls mk_orphanage and the /lost+found directory already
exists, we need to irele the root directory before exiting the function.

Fixes: 6c39a3cbda32 ("Don't trash lost+found in phase 4 Merge of master-melb:xfs-cmds:29144a by kenmcd.")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfile: fix missing error unlock in xfile_fcb_find

Fix a missing mutex pthread_mutex_unlock and uninitialized return value
in xfile_fcb_find.

Coverity-id: 1604113
Coverity-id: 1604099
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

repair: btree blocks are never on the RT subvolume

scan_bmapbt tries to track btree blocks in the RT duplicate extent
AVL tree if the inode has the realtime flag set. Given that the
RT subvolume is only ever used for file data this is incorrect.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

[PATCH v3] Remove support for split-/usr installs

Always install binaries and other files under /usr, not /. This will
break any distribution that hasn't yet merged the two, which are
vanishingly small these days. This breaks the usecase of needing to
repair the /usr partition when there is no initramfs or livecd
available and / is the only option.

Signed-off-by: Chris Hofstaedtler <zeha@debian.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

xfsprogs: Release v6.9.0

Update all the necessary files for a 6.9.0 release.

Signed-off-by: Carlos Maiolino <cem@kernel.org>

xfs_db: Fix uninicialized error variable

To silence redhat's covscan checker:

Error: UNINIT (CWE-457): [#def1] [important]
xfsprogs-6.4.0/db/hash.c:308:2: var_decl: Declaring variable "error" without initializer.
xfsprogs-6.4.0/db/hash.c:353:2: uninit_use: Using uninitialized value "error".

Signed-off-by: Pavel Reichl <preichl@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: correct type of variable global_msgs.interval to time_t

Use time_t instead of int for interval field.

Coverity-id: 1596599

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

xfs_fsr: correct type in fsrprintf() call

Use %lld instead of %d for howlong variable.

Coverity-id: 1596598

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

xfs_db: fix unitialized automatic struct ifake to 0.

Ensure automatic struct ifake is properly initialized.

Coverity-id: 1596600, 1596597

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

mkfs.xfs: avoid potential overflowing expression in xfs_mkfs.c

Cast max_tx_bytes to uint64_t to avoid overflowing expression in
calc_concurrency_logblocks().

Coverity-id: 1596603

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

xfsprogs: remove platform_zero_range wrapper

Now that the HAVE_FALLOCATE guard around including
<linux/falloc.h> in linux/xfs.h has been removed via
15fb447f ("configure: don't check for fallocate"),
bad things can happen because we reference fallocate in
<xfs/linux.h> without defining _GNU_SOURCE:

$ cat test.c
#include <xfs/linux.h>

int main(void)
{
return 0;
}

$ gcc -o test test.c
In file included from test.c:1:
/usr/include/xfs/linux.h: In function ‘platform_zero_range’:
/usr/include/xfs/linux.h:186:15: error: implicit declaration of function ‘fallocate’ [-Wimplicit-function-declaration]
  186 |         ret = fallocate(fd, FALLOC_FL_ZERO_RANGE, start, len);
      |               ^~~~~~~~~

i.e. xfs/linux.h includes fcntl.h without _GNU_SOURCE, so we
don't get an fallocate prototype.

Rather than playing games with header files, just remove the
platform_zero_range() wrapper - we have only one platform, and
only one caller after all - and simply call fallocate directly
if we have the FALLOC_FL_ZERO_RANGE flag defined.

(LTP also runs into this sort of problem at configure time ...)

Darrick points out that this changes a public header, but
platform_zero_range() has only been exposed by default
(without the oddball / internal xfsprogs guard) for a couple
of xfsprogs releases, so it's quite unlikely that anyone is
using this oddball fallocate wrapper.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
V2: remove error variable, add to commit msg
V3: Drop FALLOC_FL_ZERO_RANGE #ifdef per hch's suggestion and
    add his RVB from V2, with changes.

NOTE: compile tested only

xfs_io: fix mread with length 1 mod page size

Fix a weird bug in mread where if you passed it a length that was 1
modulo the page size, for example

        xfs_io -r file -c "mmap -r 0 8192" -c "mread -v 0 4097"

... it never reset its pointer into the buffer into which it copies the
data from the memory map.  This caused an out-of-bounds write, which
depending on the length passed could be very large and reliably
segfault.  Also nothing was printed, despite the use of -v option.

(I don't know if this case gets reached by any existing xfstest, but
presumably not.  I noticed it while working on a patch to an xfstest.)

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

xfs_repair: detect null buf passed to duration

gcc 12.2 with ubsan and fortify turned on complains about this:

In file included from /usr/include/stdio.h:906,
                 from ../include/platform_defs.h:9,
                 from ../include/libxfs.h:16,
                 from progress.c:3:
In function ‘sprintf’,
    inlined from ‘duration’ at progress.c:443:4:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:30:10: error: null destination pointer [-Werror=format-overflow=]
   30 |   return __builtin___sprintf_chk (__s, __USE_FORTIFY_LEVEL - 1,
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   31 |                                   __glibc_objsize (__s), __fmt,
      |                                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   32 |                                   __va_arg_pack ());
      |                                   ~~~~~~~~~~~~~~~~~

I think this is a false negative since all callers are careful not to
pass in a null pointer.  Unfortunately the compiler cannot detect that
since this isn't a static function and complains.  Fix this by adding an
explicit declaration that buf isn't null.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

xfs_io: make MADV_SOFT_OFFLINE conditional

mips64el does not have the symbol MADV_SOFT_OFFLINE, so wrap it in an
ifdef.

Signed-off-by: Bastian Germann <bage@debian.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

Merge tag 'mkfs-cleanups-6.9_2024-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev into for-next

mkfs: cleanups for 6.9 [v30.5 11/35]

Clean up mkfs' open-coded symlink handling code.

This has been running on the djcloud for months with no problems. Enjoy!

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

Merge tag 'repair-refcount-scalability-6.9_2024-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev into for-next

xfs_repair: reduce refcount repair memory usage [v30.5 09/35]

The refcountbt repair code has serious memory usage problems when the
block sharing factor of the filesystem is very high.  This can happen if
a deduplication tool has been run against the filesystem, or if the fs
stores reflinked VM images that have been aging for a long time.

Recall that the original reference counting algorithm walks the reverse
mapping records of the filesystem to generate reference counts.  For any
given block in the AG, the rmap bag structure contains the all rmap
records that cover that block; the refcount is the size of that bag.

For online repair, the bag doesn't need the owner, offset, or state flag
information, so it discards those.  This halves the record size, but the
bag structure still stores one excerpted record for each reverse
mapping.  If the sharing count is high, this will use a LOT of memory
storing redundant records.  In the extreme case, 100k mappings to the
same piece of space will consume 100k*16 bytes = 1.6M of memory.

For offline repair, the bag stores the owner values so that we know
which inodes need to be marked as being reflink inodes.  If a
deduplication tool has been run and there are many blocks within a file
pointing to the same physical space, this will stll use a lot of memory
to store redundant records.

The solution to this problem is to deduplicate the bag records when
possible by adding a reference count to the bag record, and changing the
bag add function to detect an existing record to bump the refcount.  In
the above example, the 100k mappings will now use 24 bytes of memory.
These lookups can be done efficiently with a btree, so we create a new
refcount bag btree type (inside of online repair).  This is why we
refactored the btree code in the previous patchset.

The btree conversion also dramatically reduces the runtime of the
refcount generation algorithm, because the code to delete all bag
records that end at a given agblock now only has to delete one record
instead of (using the example above) 100k records.  As an added benefit,
record deletion now gives back the unused xfile space, which it did not
do previously.

This has been running on the djcloud for months with no problems.  Enjoy!

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

Merge tag 'repair-use-in-memory-btrees-6.9_2024-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev into for-next

xfs_repair: use in-memory rmap btrees [v30.5 08/35]

Now that we've ported support for in-memory btrees to userspace, port
xfs_repair to use them instead of the clunky slab interface that we
currently use.  This has the effect of moving memory consumption for
tracking reverse mappings into a memfd file, which means that we could
(theoretically) reduce the memory requirements by pointing it at an
on-disk file or something.  It also enables us to remove the sorting
step and to avoid having to coalesce adjacent contiguous bmap records
into a single rmap record.

This has been running on the djcloud for months with no problems.  Enjoy!

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

Merge tag 'repair-fixes-6.9_2024-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev into for-next

xfs_repair: minor fixes [v30.5 07/35]

Fix some random minor problems in xfs_repair.

This has been running on the djcloud for months with no problems. Enjoy!

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

Merge tag 'scrub-fixes-6.9_2024-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev into for-next

xfs_scrub: updates for 6.9 [v30.5 06/35]

Now that the kernel has the code for userspace to upload a clean bill of
health (which clears out all the secondary markers of ill health that
hint at forgotten sicknesses), let's make xfs_scrub do that if the
filesystem is actually clean.

Second, restructure the xfs_scrub program so that it scrubs file link
counts and quotacheck in parallel.

This has been running on the djcloud for months with no problems. Enjoy!

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

Merge tag 'spaceman-updates-6.9_2024-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev into for-next

xfs_spaceman: updates for 6.9 [v30.5 05/35]

Update xfs_spaceman to handle the new health reporting code that was
merged in 6.9.

This has been running on the djcloud for months with no problems. Enjoy!

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

Merge tag 'realtime-bmap-intents-6.9_2024-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev into for-next

xfsprogs: widen BUI formats to support realtime [v30.5 04/35]

Atomic extent swapping (and later, reverse mapping and reflink) on the
realtime device needs to be able to defer file mapping and extent
freeing work in much the same manner as is required on the data volume.
Make the BUI log items operate on rt extents in preparation for atomic
swapping and realtime rmap.

This has been running on the djcloud for months with no problems. Enjoy!

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

Merge tag 'bmap-intent-cleanups-6.9_2024-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev into for-next

xfsprogs: bmap log intent cleanups [v30.5 03/35]

The next major target of online repair are metadata that are persisted
in blocks mapped by a file fork.  In other words, we want to repair
directories, extended attributes, symbolic links, and the realtime free
space information.  For file-based metadata, we assume that the space
metadata is correct, which enables repair to construct new versions of
the metadata in a temporary file.  We then need to swap the file fork
mappings of the two files atomically.  With this patchset, we begin
constructing such a facility based on the existing bmap log items and a
new extent swap log item.

This series cleans up a few parts of the file block mapping log intent
code before we start adding support for realtime bmap intents.  Most of
it involves cleaning up tracepoints so that more of the data extraction
logic ends up in the tracepoint code and not the tracepoint call site,
which should reduce overhead further when tracepoints are disabled.
There is also a change to pass bmap intents all the way back to the bmap
code instead of unboxing the intent values and re-boxing them after the
_finish_one function completes.

This has been running on the djcloud for months with no problems.  Enjoy!

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

Merge tag 'libxfs-sync-6.9_2024-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev into for-next

libxfs: sync with 6.9 [v30.5 02/35]

Synchronize libxfs with the kernel.

This has been running on the djcloud for months with no problems. Enjoy!

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

Merge tag 'libxfs-6.9-sync-prep_2024-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev into for-next

libxfs: prepare to sync with 6.9 [v30.5 01/35]

Apply some cleanups to libxfs before we synchronize it with the kernel.

This has been running on the djcloud for months with no problems. Enjoy!

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

mkfs: use libxfs to create symlinks

Now that we've grabbed the kernel-side symlink writing function, use it
to create symbolic links from protofiles.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_io: print sysfs paths of mounted filesystems

Enable users to print the sysfs or debugfs path for the filesystems
backing the open files.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

xfs_io: fix gcc complaints about potentially uninitialized variables

When I turned on UBSAN on the userspace build with gcc 12.2, I get this:

bulkstat.c: In function ‘bulkstat_single_f’:
bulkstat.c:316:24: error: ‘ino’ may be used uninitialized [-Werror=maybe-uninitialized]
  316 |                 ret = -xfrog_bulkstat_single(&xfd, ino, flags, &bulkstat);
      |                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
bulkstat.c:293:41: note: ‘ino’ was declared here
  293 |                 uint64_t                ino;
      |                                         ^~~

I /think/ this is a failure of the gcc static checker to notice that sm
will always be set to the last element of the tags[] array if it didn't
set ino, but this code could be more explicit about deciding to
fallback to strtoul.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: remove the old bag implementation

Remove the old bag implementation.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: remove the old rmap collection slabs

Now that we've switched the offline repair code to use an in-memory
rmap btree for everything except recording the rmaps for the newly
generated per-AG btrees, get rid of all the old code.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: check num before bplist[num]

smatch complained about checking an array index before indexing the
array, so fix that.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_scrub: upload clean bills of health

If scrub terminates with a clean bill of health, tell the kernel that
the result of the scan is that everything's healthy.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_spaceman: report health of inode link counts

Report on the health of the inode link counts.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: port to the new refcount bag structure

Port the refcount record generating code to use the new refcount bag
data structure.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: reduce rmap bag memory usage when creating refcounts

The algorithm that computes reference count records uses a "bag"
structure to remember the rmap records corresponding to the current
block.  In the previous patch we converted the bag structure to store
actual rmap records instead of pointers to rmap records owned by another
structure as part of preparing for converting this algorithm to use
in-memory rmap btrees.

However, the memory usage of the bag structure is now excessive -- we
only need the physical extent and inode owner information to generate
refcount records and mark inodes that require the reflink flag.  IOWs,
the flags and offset fields are unnecessary.  Create a custom structure
for the bag, which halves its memory usage.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: log when buffers fail CRC checks even if we just recompute it

We should always log metadata block CRC validation errors, even if we
decide that the block contents are ok and that we'll simply recompute
the checksum. Without this patch, xfs_repair -n won't say anything
about crc errors on these blocks.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_scrub: use multiple threads to run in-kernel metadata scrubs that scan inodes

Instead of running the inode link count and quotacheck scanners in
serial, run them in parallel, with a slight delay to stagger the work to
reduce inode resource contention.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: create refcount bag

Create a bag structure for refcount information that uses the refcount
bag btree defined in the previous patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: compute refcount data from in-memory rmap btrees

Use the in-memory rmap btrees to compute the reference count
information. Convert the bag implementation to hold actual records
instead of pointers to slab objects.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_scrub: update health status if we get a clean bill of health

If we checked a filesystem and it turned out to be clean, upload that
information into the kernel.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: define an in-memory btree for storing refcount bag info

Create a new in-memory btree type so that we can store refcount bag info
in a much more memory-efficient format. Note that the xfs_repair rcbag
btree stores inode numbers (unlike the kernel rcbag btree) because
xfs_repair needs to compute the bitmap of inodes that must have the
reflink iflag set.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: verify on-disk rmap btrees with in-memory btree data

Check the on-disk reverse mappings with the observations we've recorded
in the in-memory btree during the filesystem walk.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_scrub: check file link counts

Check file link counts as part of checking a filesystem.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: convert regular rmap repair to use in-memory btrees

Convert the rmap btree repair code to use in-memory rmap btrees to store
the observed reverse mapping records. This will eliminate the need for
a separate record sorting step, as well as eliminate the need for all
the code that turns multiple consecutive bmap records into a single rmap
record.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_scrub: implement live quotacheck inode scan

Teach xfs_scrub to check quota resource usage counters when checking a
filesystem.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

libxfs: provide a kernel-compatible kasprintf

The kernel-like kasprintf will be used by the new metadir code, as well
as the rmap data structures in xfs_repair.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: tweak commit message]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

libxfs: add a realtime flag to the bmap update log redo items

Extend the bmap update (BUI) log items with a new realtime flag that
indicates that the updates apply against a realtime file's data fork.
We'll wire up the actual code later.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

libxfs: add a xattr_entry helper

Add a helper to translate from the item list head to the attr_intent
item structure and use it so shorten assignments and avoid the need for
extra local variables.

Inspired-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: allow sunit mount option to repair bad primary sb stripe values

Source kernel commit: 15922f5dbf51dad334cde888ce6835d377678dc9

If a filesystem has a busted stripe alignment configuration on disk
(e.g. because broken RAID firmware told mkfs that swidth was smaller
than sunit), then the filesystem will refuse to mount due to the
stripe validation failing. This failure is triggering during distro
upgrades from old kernels lacking this check to newer kernels with
this check, and currently the only way to fix it is with offline
xfs_db surgery.

This runtime validity checking occurs when we read the superblock
for the first time and causes the mount to fail immediately. This
prevents the rewrite of stripe unit/width via
mount options that occurs later in the mount process. Hence there is
no way to recover this situation without resorting to offline xfs_db
rewrite of the values.

However, we parse the mount options long before we read the
superblock, and we know if the mount has been asked to re-write the
stripe alignment configuration when we are reading the superblock
and verifying it for the first time. Hence we can conditionally
ignore stripe verification failures if the mount options specified
will correct the issue.

We validate that the new stripe unit/width are valid before we
overwrite the superblock values, so we can ignore the invalid config
at verification and fail the mount later if the new values are not
valid. This, at least, gives users the chance of correcting the
issue after a kernel upgrade without having to resort to xfs-db
hacks.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

xfs_spaceman: report the health of quota counts

Report the health of quota counts.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

libxfs: reuse xfs_bmap_update_cancel_item

Reuse xfs_bmap_update_cancel_item to put the AG/RTG and free the item in
a few places that currently open code the logic.

Inspired-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: shrink failure needs to hold AGI buffer

Source kernel commit: 75bcffbb9e7563259b7aed0fa77459d6a3a35627

Chandan reported a AGI/AGF lock order hang on xfs/168 during recent
testing. The cause of the problem was the task running xfs_growfs
to shrink the filesystem. A failure occurred trying to remove the
free space from the btrees that the shrink would make disappear,
and that meant it ran the error handling for a partial failure.

This error path involves restoring the per-ag block reservations,
and that requires calculating the amount of space needed to be
reserved for the free inode btree. The growfs operation hung here:

[18679.536829]  down+0x71/0xa0
[18679.537657]  xfs_buf_lock+0xa4/0x290 [xfs]
[18679.538731]  xfs_buf_find_lock+0xf7/0x4d0 [xfs]
[18679.539920]  xfs_buf_lookup.constprop.0+0x289/0x500 [xfs]
[18679.542628]  xfs_buf_get_map+0x2b3/0xe40 [xfs]
[18679.547076]  xfs_buf_read_map+0xbb/0x900 [xfs]
[18679.562616]  xfs_trans_read_buf_map+0x449/0xb10 [xfs]
[18679.569778]  xfs_read_agi+0x1cd/0x500 [xfs]
[18679.573126]  xfs_ialloc_read_agi+0xc2/0x5b0 [xfs]
[18679.578708]  xfs_finobt_calc_reserves+0xe7/0x4d0 [xfs]
[18679.582480]  xfs_ag_resv_init+0x2c5/0x490 [xfs]
[18679.586023]  xfs_ag_shrink_space+0x736/0xd30 [xfs]
[18679.590730]  xfs_growfs_data_private.isra.0+0x55e/0x990 [xfs]
[18679.599764]  xfs_growfs_data+0x2f1/0x410 [xfs]
[18679.602212]  xfs_file_ioctl+0xd1e/0x1370 [xfs]

trying to get the AGI lock. The AGI lock was held by a fstress task
trying to do an inode allocation, and it was waiting on the AGF
lock to allocate a new inode chunk on disk. Hence deadlock.

The fix for this is for the growfs code to hold the AGI over the
transaction roll it does in the error path. It already holds the AGF
locked across this, and that is what causes the lock order inversion
in the xfs_ag_resv_init() call.

Reported-by: Chandan Babu R <chandanbabu@kernel.org>
Fixes: 46141dc891f7 ("xfs: introduce xfs_ag_shrink_space()")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

libxfs: add a bi_entry helper

Add a helper to translate from the item list head to the bmap_intent
structure and use it so shorten assignments and avoid the need for extra
local variables.

Inspired-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: xfs_btree_bload_prep_block() should use __GFP_NOFAIL

Source kernel commit: 3aca0676a1141c4d198f8b3c934435941ba84244

This was missed in the conversion from KM* flags.

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Fixes: 10634530f7ba ("xfs: convert kmem_zalloc() to kzalloc()")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

libxfs: remove kmem_alloc, kmem_zalloc, and kmem_free

Remove all three of these helpers now that the kernel has dropped them.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: move symlink target write function to libxfs

Source kernel commit: b8102b61f7b8929ad8043e4574a1e26276398041

Move xfs_symlink_write_target to xfs_symlink_remote.c so that kernel and
mkfs can share the same function.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

xfs: move remote symlink target read function to libxfs

Source kernel commit: 376b4f0522484f43660dab8e4e92b471863b49f9

Move xfs_readlink_bmap_ilocked to xfs_symlink_remote.c so that the
swapext code can use it to convert a remote format symlink back to
shortform format after a metadata repair. While we're at it, fix a
broken printf prefix.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

xfs: move xfs_symlink_remote.c declarations to xfs_symlink_remote.h

Source kernel commit: 622d88e2ad7960b83af38dabf6b848a22a5a1c1f

Move declarations for libxfs symlink functions into a separate header
file like we do for most everything else.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

xfs: xfs_bmap_finish_one should map unwritten extents properly

Source kernel commit: 6c8127e93e3ac9c2cf6a13b885dd2d057b7e7d50

The deferred bmap work state and the log item can transmit unwritten
state, so the XFS_BMAP_MAP handler must map in extents with that
unwritten state.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

xfs: support deferred bmap updates on the attr fork

Source kernel commit: 52f807067ba4a122e75bf1e0e0595c78e6a3d8b6

The deferred bmap update log item has always supported the attr fork, so
plumb this in so that higher layers can access this.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

xfs: add a realtime flag to the bmap update log redo items

Source kernel commit: 7302cda7f8b08062b11d2ba9ae0b4f3871fe3d46

Extend the bmap update (BUI) log items with a new realtime flag that
indicates that the updates apply against a realtime file's data fork.
We'll wire up the actual code later.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

xfs: fix xfs_bunmapi to allow unmapping of partial rt extents

Source kernel commit: 2b6a5ec26887cba195022286b039f2cc0ec683b1

When XFS_BMAPI_REMAP is passed to bunmapi, that means that we want to
remove part of a block mapping without touching the allocator. For
realtime files with rtextsize > 1, that also means that we should skip
all the code that changes a partial remove request into an unwritten
extent conversion. IOWs, bunmapi in this mode should handle removing
the mapping from the rt file and nothing else.

Note that XFS_BMAPI_REMAP callers are required to decrement the
reference count and/or free the space manually.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

xfs: move xfs_bmap_defer_add to xfs_bmap_item.c

Source kernel commit: 80284115854e60686b2e0183b31bb303ae69aa8c

Move the code that adds the incore xfs_bmap_item deferred work data to a
transaction live with the BUI log item code. This means that the file
mapping code no longer has to know about the inner workings of the BUI
log items.

As a consequence, we can hide the _get_group helper.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

xfs: clean up bmap log intent item tracepoint callsites

Source kernel commit: 2a15e7686094d1362b5026533b96f57ec989a245

Pass the incore bmap structure to the tracepoints instead of open-coding
the argument passing.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

xfs: hook live rmap operations during a repair operation

Source kernel commit: 7e1b84b24d257700e417bc9cd724c1efdff653d7

Hook the regular rmap code when an rmapbt repair operation is running so
that we can unlock the AGF buffer to scan the filesystem and keep the
in-memory btree up to date during the scan.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

xfs: create a shadow rmap btree during rmap repair

Source kernel commit: 4787fc802752c9b73b28ff18860c0560bf4337f2

Create an in-memory btree of rmap records instead of an array. This
enables us to do live record collection instead of freezing the fs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

xfs: repair the rmapbt

Source kernel commit: 32080a9b9b2ef8f4089e8e28a2c307334431757e

Rebuild the reverse mapping btree from all primary metadata. This first
patch establishes the bare mechanics of finding records and putting
together a new ondisk tree; more complex pieces are needed to make it
work properly.

Link: Documentation/filesystems/xfs-online-fsck-design.rst
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

xfs: create a helper to decide if a file mapping targets the rt volume

Source kernel commit: 5049ff4d140c8f6545464811409302cab017321a

Create a helper so that we can stop open-coding this decision
everywhere.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

xfs: launder in-memory btree buffers before transaction commit

Source kernel commit: 0dc63c8a1ce39c1ac7da536ee9174cdc714afae2

As we've noted in various places, all current users of in-memory btrees
are online fsck.  Online fsck only stages a btree long enough to rebuild
an ondisk data structure, which means that the in-memory btree is
ephemeral.  Furthermore, if we encounter /any/ errors while updating an
in-memory btree, all we do is tear down all the staged data and return
an errno to userspace.  In-memory btrees need not be transactional, so
their buffers should not be committed to the ondisk log, nor should they
be checkpointed by the AIL.  That's just as well since the ephemeral
nature of the btree means that the buftarg and the buffers may disappear
quickly anyway.

Therefore, we need a way to launder the btree buffers that get attached
to the transaction by the generic btree code.  Because the buffers are
directly mapped to backing file pages, there's no need to bwrite them
back to the tmpfs file.  All we need to do is clean enough of the buffer
log item state so that the bli can be detached from the buffer, remove
the bli from the transaction's log item list, and reset the transaction
dirty state as if the laundered items had never been there.

For simplicity, create xfbtree transaction commit and cancel helpers
that launder the in-memory btree buffers for callers.  Once laundered,
call the write verifier on non-stale buffers to avoid integrity issues,
or punch a hole in the backing file for stale buffers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

xfs: support in-memory btrees

Source kernel commit: a095686a2383526d7315197e2419d84ee8470217

Adapt the generic btree cursor code to be able to create a btree whose
buffers come from a (presumably in-memory) buftarg with a header block
that's specific to in-memory btrees. We'll connect this to other parts
of online scrub in the next patches.

Note that in-memory btrees always have a block size matching the system
memory page size for efficiency reasons. There are also a few things we
need to do to finalize a btree update; that's covered in the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

xfs: add a xfs_btree_ptrs_equal helper

Source kernel commit: 8c1771c45dfa9dddd4569727c48204b66073d2c2

This only has a single caller and thus might be a bit questionable,
but I think it really improves the readability of
xfs_btree_visit_block.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

libxfs: support in-memory buffer cache targets

Allow the buffer cache to target in-memory files by connecting it to
xfiles. The next few patches will enable creating xfs_btrees in memory.
Unlike the kernel version of this patch, we use a partitioned xfile to
avoid overflowing the fd table instead of opening a separate memfd for
each target.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>

xfs: teach buftargs to maintain their own buffer hashtable

Source kernel commit: e7b58f7c1be20550d4f51cec6307b811e7555f52

Currently, cached buffers are indexed by per-AG hashtables.  This works
great for the data device, but won't work for in-memory btrees.  To
handle that use case, buftargs will need to be able to index buffers
independently of other data structures.

We accomplish this by hoisting the rhashtable and its lock into a
separate xfs_buf_cache structure, make the buftarg point to the
_buf_cache structure, and rework various functions to use it.  This
will enable the in-memory buftarg to come up with its own _buf_cache.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>