git.ipfire.org Git - thirdparty/xfsprogs-dev.git/log

xfs_scrub: remove flags argument from scrub_scan_all_inodes

Now that there's only one caller of scrub_scan_all_inodes, remove the
single defined flag because it can set the METADIR bulkstat flag if
needed. Clarify in the documentation that this is a special purpose
inode iterator that picks up things that don't normally happen.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_scrub: call bulkstat directly if we're only scanning user files

Christoph observed xfs_scrub phase 5 consuming a lot of CPU time on a
filesystem with a very large number of rtgroups.  He traced this to
bulkstat_for_inumbers spending a lot of time trying to single-step
through inodes that were marked allocated in the inumbers record but
didn't show up in the bulkstat data.  These correspond to files in the
metadata directory tree that are not returned by the regular bulkstat.

This complex machinery isn't necessary for the inode walk that occur
during phase 5 because phase 5 wants to open user files and check the
dirent/xattr names associated with that file.  It's not needed for phase
6 because we're only using it to report data loss in unlinked files when
parent pointers aren't enabled.

Furthermore, we don't need to do this inumbers -> bulkstat dance because
phase 3 and 4 supposedly fixed any inode that was to corrupt to be
igettable and hence reported on by bulkstat.

Fix this by creating a simpler user file iterator that walks bulkstat
across the filesystem without using inumbers.  While we're at it, fix
the obviously incorrect comments in inodes.h.

Cc: <linux-xfs@vger.kernel.org> # v4.15.0
Fixes: 372d4ba99155b2 ("xfs_scrub: add inode iteration functions")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_scrub: don't report data loss in unlinked inodes twice

If parent pointers are enabled, report_ioerr_fsmap will report lost file
data and xattrs for all files, having used the parent pointer ioctls to
generate the path of the lost file. For unlinked files, the path lookup
will fail, but we'll report the inumber of the file that lost data.

Therefore, we don't need to do a separate scan of the unlinked inodes
in report_all_media_errors after doing the fsmap scan.

Cc: <linux-xfs@vger.kernel.org> # v6.10.0
Fixes: 9b5d1349ca5fb1 ("xfs_scrub: use parent pointers to report lost file data")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

libfrog: wrap handle construction code

Clean up all the open-coded logic to construct a file handle from a
fshandle and some bulkstat/parent pointer information.  The new
functions are stashed in a private header file to avoid leaking the
details of xfs_handle construction in the public libhandle headers.

I tried moving the code to libhandle, but I don't entirely like the
result.  The libhandle functions pass around handles as arbitrary binary
blobs that come from and are sent to the kernel, meaning that the
interface is full of (void *, size_t) tuples.  Putting these new
functions in libhandle breaks that abstraction because now clients know
that they can deal with a struct xfs_handle.

We could fix that leak by changing it to a (void *, size_t) tuple, but
then we'd have to validate the size_t or returns -1 having set errno,
which then means that all the client code now has to have error handling
for a case that we're fairly sure can't be true.  This is overkill for
xfsprogs code that knows better, because we can trust ourselves to know
the exact layout of a handle.

So this nice compact code:

memcpy(&handle.ha_fsid, file->fshandle, file->fshandle_len);
handle.ha_fid.fid_len = sizeof(xfs_fid_t) -
sizeof(handle.ha_fid.fid_len);

becomes:

ret = handle_from_fshandle(&handle, file->fshandle,
file->fshandle_len);
if (ret) {
perror("what?");
return -1;
}

Which is much more verbose code, and right now it exists to handle an
exceptional condition that is not possible.  If someone outside of
xfsprogs would like this sort of functionality in libhandle I'm all for
adding it, but with zero demand from external users, I prefer to keep
things simple.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

man: document new XFS_BULK_IREQ_METADIR flag to bulkstat

Document this new flag.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

libxfs: mark xmbuf_{un,}map_page static

Not used outside of buf_mem.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>

libxfs: unmap xmbuf pages to avoid disaster

It turns out that there's a maximum mappings count, so we need to be
smartish about not overflowing that with too many xmbuf buffers. This
needs to be a global value because high-agcount filesystems will create
a large number of xmbuf caches but this is a process-global limit.

Cc: <linux-xfs@vger.kernel.org> # v6.9.0
Fixes: 124b388dac17f5 ("libxfs: support in-memory buffer cache targets")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_db: obfuscate rt superblock label when metadumping

Metadump can obfuscate the filesystem label on all the superblocks on
the data device, so it must perform the same transformation on the
realtime device superblock to avoid leaking information and so that the
mdrestored filesystem is consistent.

Found by running xfs/503 with realtime turned on and a patch to set
labels on common/populated filesystem images.

Cc: <linux-xfs@vger.kernel.org> # v6.13.0
Fixes: 6bc20c5edbab51 ("xfs_db: metadump realtime devices")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

mkfs,xfs_repair: don't pass a daddr as the flags argument

libxfs_buf_get_uncached doesn't take a daddr argument, so don't pass one
as the flags argument.

Cc: <linux-xfs@vger.kernel.org> # v6.13.0
Fixes: 0d7c490474e5e5 ("mkfs: format realtime groups")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfsprogs: Release v6.13.0

Update all the necessary files for a v6.13.0 release.

Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>

xfs_protofile: fix device number encoding

Actually crack major/minor device numbers from the stat results that we
get when we encounter a character/block device file.

Fixes: 6aace700b7b82d ("mkfs: add a utility to generate protofiles")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_protofile: fix mode formatting error

The protofile parser expects the mode to be specified with three octal
digits. Unfortunately, the generator doesn't get that right if the mode
doesn't have any of bits 8-11 (aka no owner access privileges) set.

Fixes: 6aace700b7b82d ("mkfs: add a utility to generate protofiles")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

mkfs: fix file size setting when interpreting a protofile

When we're copying a regular file into the filesystem, we should set the
size of the new file to the size indicated by the stat data, not the
highest offset written, because we now use SEEK_DATA/HOLE to ignore
sparse regions.

Fixes: 73fb78e5ee8940 ("mkfs: support copying in large or sparse files")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: require zeroed quota/rt inodes in metadir superblocks

If metadata directory trees are enabled, the superblock inode pointers
to quota and rt free space metadata must all be zero. The only inode
pointers in the superblock are sb_rootino and sb_metadirino.

Found by running xfs/418.

Fixes: b790ab2a303d58 ("xfs_repair: support quota inodes in the metadata directory")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

mkfs: use a default sector size that is also suitable for the rtdev

When creating a filesytem where the data device has a sector size
smalle than that of the RT device without further options, mkfs
currently fails with:

mkfs.xfs: error - cannot set blocksize 512 on block device $RTDEV: Invalid argument

This is because XFS sets the sector size based on logical block size
of the data device, but not that of the RT device. Change the code
so that is uses the larger of the two values.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

xfs_scrub_all.timer: don't run if /var/lib/xfsprogs is readonly

The xfs_scrub_all program wants to write a state file into the package
state dir to keep track of how recently it performed a media scan.
Don't allow the systemd timer to run if that path isn't writable.

Cc: linux-xfs@vger.kernel.org # v6.10.0
Fixes: 267ae610a3d90f ("xfs_scrub_all: enable periodic file data scrubs automatically")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_logprint: Fix super block buffer interpretation issue

When using xfs_logprint to interpret the buffer of the super block, the
icount will always be 6360863066640355328 (0x5846534200001000). This is
because the offset of icount is incorrect, causing xfs_logprint to
misinterpret the MAGIC number as icount.
This patch fixes the offset value of the SB counters in xfs_logprint.

Before this patch:
icount: 6360863066640355328 ifree: 5242880 fdblks: 0 frext: 0

After this patch:
icount: 10240 ifree: 4906 fdblks: 37 frext: 0

Suggested-by: Darrick J. Wong <djwong@kernel.org>
Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

mkfs: allow sizing realtime allocation groups for concurrency

Add a -r concurrency= option to mkfs so that sysadmins can configure the
filesystem so that there are enough rtgroups that the specified number
of threads can (in theory) can find an uncontended rtgroup from which to
allocate space. This has the exact same purpose as the -d concurrency
switch that was added for the data device.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

build: initialize stack variables to zero by default

Newer versions of gcc and clang can include the ability to zero stack
variables by default. Let's enable it so that we (a) reduce the risk of
writing stack contents to disk somewhere and (b) try to reduce
unpredictable program behavior based on random stack contents. The
kernel added this 6 years ago, so I think it's mature enough for
xfsprogs.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reluctantly-Reviewed-by: Christoph Hellwig <hch@lst.de>

m4: fix statx override selection if /usr/include doesn't define it

If the system headers (aka the ones in /usr/include) do not define
struct statx at all, we need to use our internal override. The m4 code
doesn't handle this admittedly corner case, but let's fix it for anyone
trying to build new xfsprogs on a decade-old distribution.

Fixes: 409477af604f46 ("xfs_io: add support for atomic write statx fields")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

mkfs: fix parsing of value-less -d/-l concurrency cli option

It's supposed to be possible to specify the -d concurrency option with
no value in order to get mkfs calculate the agcount from the number of
CPUs. Unfortunately I forgot to handle that case (optarg is null) so
mkfs crashes instead. Fix that.

Fixes: 9338bc8b1bf073 ("mkfs: allow sizing allocation groups for concurrency")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_db: improve error message when unknown btree type given to btheight

I found accidentally that if you do this (note 'rmap', not 'rmapbt'):

xfs_db /dev/sda -c 'btheight -n 100 rmap'

The program spits back "Numerical result out of range". That's the
result of it failing to match "rmap" against a known btree type, and
falling back to parsing the string as if it were a btree geometry
description.

Improve this a little by checking that there's at least one semicolon in
the string so that the error message improves to:

"rmap: expected a btree geometry specification"

Fixes: cb1e69c564c1e0 ("xfs_db: add a function to compute btree geometry")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

libxfs: fix uninit variable in libxfs_alloc_file_space

Fix this uninitialized variable.

Coverity-id: 1637359
Fixes: b48164b8cd7618 ("libxfs: resync libxfs_alloc_file_space interface with the kernel")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: don't obliterate return codes

Don't clobber error here, it's err2 that's the temporary variable.

Coverity-id: 1637363
Fixes: b790ab2a303d58 ("xfs_repair: support quota inodes in the metadata directory")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_db: fix multiple dblock commands

Tom Samstag reported that running the following sequence of commands no
longer works quite right:

> inode [inodenum]
> dblock 0
> p
> dblock 1
> p
> dblock 2
> p
> [etc]

Mr. Samstag looked into the source code and discovered that the
dblock_f is incorrectly accessing iocur_top->data outside of the
push_cur -> set_cur_inode -> pop_cur sequence that this function uses to
compute the type of the file data. In other words, it's using
whatever's on top of the stack at the start of the function. For the
"dblock 0" case above this is the inode, but for the "dblock 1" case
this is the contents of file data block 0, not an inode.

Fix this by relocating the check to the correct place.

Reported-by: tom.samstag@netrise.io
Tested-by: Tom Samstag <tom.samstag@netrise.io>
Cc: <linux-xfs@vger.kernel.org> # v6.12.0
Fixes: b05a31722f5d4c ("xfs_db: access realtime file blocks")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: don't return an error from xfs_update_last_rtgroup_size for !XFS_RT

Source kernel commit: 7ee7c9b39ed36caf983706f5b893cc5c37a79071

Non-rtg file systems have a fake RT group even if they do not have a RT
device, and thus an rgcount of 1. Ensure xfs_update_last_rtgroup_size
doesn't fail when called for !XFS_RT to handle this case.

Fixes: 87fe4c34a383 ("xfs: create incore realtime group structures")
Reported-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

xfs_io: add extsize command support

extsize command is currently only supported with XFS filesystem.
Lift this restriction now that ext4 is also supporting extsize hints.

Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

xfs_io: allow foreign FSes to show FS_IOC_FSGETXATTR details

Currently with stat we only show FS_IOC_FSGETXATTR details if the
filesystem is XFS. With extsize support also coming to ext4 and possibly
other filesystems, make sure to allow foreign FSes to display these details
when "stat" or "statx" is used.

(Thanks to Dave for suggesting implementation of print_extended_info())

Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

include/linux.h: use linux/magic.h to get XFS_SUPER_MAGIC

This avoids open coding the magic number

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

mkfs: enable rt quota options

Now that the kernel supports quota and realtime devices, allow people to
format filesystems with permanent quota options.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_quota: report warning limits for realtime space quotas

Report the number of warnings that a user will get for exceeding the
soft limit of a realtime volume.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

mkfs: add quota flags when setting up filesystem

If we're creating a metadir filesystem, the quota accounting and
enforcement flags persist until the sysadmin changes them. Add a means
to specify those qflags at format time.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: try not to trash qflags on metadir filesystems

Try to preserve the accounting and enforcement quota flags when
repairing filesystems.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: support quota inodes in the metadata directory

Handle quota inodes on metadir filesystems. This means that we have to
discover whatever quota inodes exist by looking in /quotas instead of
the superblock, and mend any broken metadir tree links might exist.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: hoist the secondary sb qflags handling

Hoist all the secondary superblock qflags and quota inode modification
code into a separate function so that we can disable it in the next
patch.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: refactor quota inumber handling

In preparation for putting quota files in the metadata directory tree,
refactor repair's quota inumber handling to use its own variables
instead of the xfs_mount's.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_db: support metadir quotas

Support finding the quota files in the metadata directory.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

libfrog: scrub quota file metapaths

Support scrubbing quota file metadir paths.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

mkfs: format realtime groups

Create filesystems with the realtime group feature enabled.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

mkfs: add headers to realtime bitmap blocks

When the rtgroups feature is enabled, format rtbitmap blocks with the
appropriate block headers. libxfs takes care of the actual writing for
us, so all we have to do is ensure that the bitmap is the correct size.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_scrub: use histograms to speed up phase 8 on the realtime volume

Use the same statistical methods that we use on the data volume to
compute the minimum threshold size for fstrims on the realtime volume.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_scrub: trim realtime volumes too

On the kernel side, the XFS realtime groups patchset added support for
FITRIM of the realtime volume. This support doesn't actually require
there to be any realtime groups, so teach scrub to run through the whole
region.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_scrub: call GETFSMAP for each rt group in parallel

If realtime groups are enabled, we should take advantage of the sharding
to speed up the spacemap scans. Do so by issuing per-rtgroup GETFSMAP
calls.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_scrub: cleanup fsmap keys initialization

Use the good old array notations instead of pointer arithmetics.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
[djwong: fold scan_rtg_rmaps cleanups into next patch]
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>

xfs_scrub: check rtgroup metadata directory connections

Run the rtgroup metapath scrubber during phase 5 to ensure that any
rtgroup metadata files are still connected to the metadir tree after
we've pruned any bad links.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_scrub: scrub realtime allocation group metadata

Scan realtime group metadata as part of phase 2, just like we do for AG
metadata. For pre-rtgroup filesystems, pretend that this is a "rtgroup
0" scrub request because the kernel expects that. Replace the old
cond_wait code with a scrub barrier because they're equivalent for two
items that cannot be scrubbed in parallel.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_spaceman: report on realtime group health

Add the realtime group status to the health reporting done by
xfs_spaceman.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_mdrestore: restore rt group superblocks to realtime device

Support restoring realtime device metadata to the realtime device, if
the dumped filesystem had one.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_mdrestore: refactor open-coded fd/is_file into a structure

Create an explicit object to track the fd and flags associated with a
device onto which we are restoring metadata, and use it to reduce the
amount of open-coded arguments to ->restore. This avoids some grossness
in the next patch.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_io: display rt group in verbose fsmap output

Display the rt group number in the fsmap output, just like we do for
regular data files.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_io: display rt group in verbose bmap output

Display the rt group number in the bmap -v output, just like we do for
regular data files.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_io: add a command to display realtime group information

Add a new 'rginfo' command to xfs_io so that we can display realtime
group geometry.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_io: add a command to display allocation group information

Add a new 'aginfo' command to xfs_io so that we can display allocation
group geometry.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_io: support scrubbing rtgroup metadata paths

Support scrubbing the metadata directory path of an rtgroup metadata
file.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_io: support scrubbing rtgroup metadata

Support scrubbing all rtgroup metadata with a scrubv call.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_db: report rt group and block number in the bmap command

The bmap command does not report startblocks for realtime files
correctly. If rtgroups are enabled, we need to use the appropriate
functions to crack the startblock into rtgroup and block numbers; if
not, then we need to report a linear address and not try to report a
group number.

Fix both of these issues.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_db: dump rt summary blocks

Now that rtsummary blocks have a header, make it so that xfs_db can
analyze the structure.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_db: dump rt bitmap blocks

Now that rtbitmap blocks have a header, make it so that xfs_db can
analyze the structure.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_db: metadump realtime devices

Teach the metadump device to dump the filesystem metadata of a realtime
device to the metadump file. Currently, this is limited to the realtime
superblock.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_db: metadump metadir rt bitmap and summary files

Don't skip dumping the data fork for regular files that are marked as
metadata inodes. This catches rtbitmap and summary inodes on rtgroup
enabled file systems where their inode numbers aren't recorded in the
superblock.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_db: enable conversion of rt space units

Teach the xfs_db convert function about realtime extents, blocks, and
realtime group numbers.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_db: support changing the label and uuid of rt superblocks

Update the label and uuid commands to change the rt superblocks along
with the filesystem superblocks.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_db: support dumping realtime group data and superblocks

Allow dumping of realtime device superblocks and the new fields in the
primary superblock that were added for rtgroups support.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_db: listify the definition of enum typnm

Convert the enum definition into a list so that future patches adding
things to enum typnm don't have to reflow the entire thing.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_db: enable rtconvert to handle segmented rtblocks

Now that we've turned xfs_rtblock_t into a segmented address and
xfs_rtxnum_t into a per-rtgroup address, port the rtconvert debugger
command to handle the unit conversions correctly. Also add an example
of the bitmap/summary-related conversion commands to the manpage.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_db: enable the rtblock and rtextent commands for segmented rt block numbers

Now that xfs_rtblock_t can be a segmented address, fix the validation in
rtblock_f to handle the inputs correctly; and fix rtextent_f to do all
of its conversions in linear address space.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: repair rtbitmap and rtsummary block headers

Check and repair the new block headers attached to rtbitmap and
rtsummary blocks.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: support realtime superblocks

Support the realtime superblock feature.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: find and clobber rtgroup bitmap and summary files

On a rtgroups filesystem, if the rtgroups bitmap or summary files are
garbage, we need to clear the dinode and update the incore bitmap so
that we don't bother to check the old rt freespace metadata.

However, we regenerate the entire rt metadata directory tree during
phase 6. If the bitmap and summary files are ok, we still want to clear
the dinode, but we can still use the incore inode to check the old
freespace contents. Split the clear_dinode function into two pieces,
one that merely zeroes the inode, and the old clear_dinode now turns off
checking.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: support realtime groups

Make repair aware of multiple rtgroups. This now uses the same code as
the AG-based data device for block usage tracking instead of the less
optimal AVL trees and bitmaps used for the traditonal RT device. This
is done by introducing similar per-rtgroup space tracking structures as
we have for the AGs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>

xfs_repair: add a real per-AG bitmap abstraction

Add a struct bmap that contains the btree root and the lock, and provide
helpers for loking instead of directly poking into the data structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: simplify rt_lock handling

No need to cacheline align rt_lock if we move it next to the data
it protects. Also reduce the critical section to just where those
data structures are accessed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: improve rtbitmap discrepancy reporting

Improve the reporting of discrepancies in the realtime bitmap and
summary files by creating a separate helper function that will pinpoint
the exact (word) locations of mismatches. This will help developers to
diagnose problems with the rtgroups feature and users to figure out
exactly what's bad in a filesystem.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: refactor offsetof+sizeof to offsetofend

Replace this open-coded logic with the kernel's offsetofend macro before
we start adding more in the next patch.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: refactor phase4

Split out helpers to process all duplicate extents in an AG and the RT
duplicate extents.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair: adjust rtbitmap/rtsummary word updates to handle big endian values

With rtgroups, the rt bitmap and summary file words are defined to be
be32 values. Adjust repair to handle the endianness correctly.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_logprint: report realtime EFIs

Decode the EFI format just enough to report if an EFI targets the
realtime device or not.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

libfrog: add bitmap_clear

Uncomment and fix bitmap_clear so that xfs_repair can start using it.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>

libfrog: report rt groups in output

Report realtime group geometry.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

libfrog: support scrubbing rtgroup metadata paths

Support scrubbing the metadata paths of rtgroup metadata inodes.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

libxfs: implement some sanity checking for enormous rgcount

Similar to what we do for suspiciously large sb_agcount values, if
someone tries to get libxfs to load a filesystem with a very large
realtime group count, let's do some basic checks of the rt device to
see if it's really that large. If the read fails, only load the first
rtgroup and warn the user.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

libxfs: port userspace deferred log item to handle rtgroups

Make the userspace log items to handle rt groups correctly.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

man: document rgextents geom field

Document the new rgextent geom field.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>

man: document the rt group geometry ioctl

Document the new ioctl that retrieves realtime allocation group geometry
information.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

libfrog: scrub the realtime group superblock

Enable scrubbing of realtime group superblocks in xfs_scrub, and
update the scrub ioctl documentation.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

libxfs: use correct rtx count to block count conversion

Fix a place where we use the wrong conversion functions to convert
between a number of rt extents and a number of rt blocks. This isn't
really necessary since userspace cannot allocate rt extents, but let's
not leave a logic bomb.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs_repair,mkfs: port to libxfs_rt{bitmap,summary}_create

Replace the open-coded rtbitmap and summary creation routines with the
ones in libxfs so that we share code.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

libxfs: adjust xfs_fsb_to_db to handle segmented rtblocks

Update this function to handle segmented xfs_rtblock_t, just like we did
for the kernel.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

libxfs: remove XFS_ILOCK_RT*

Now that we've centralized the realtime metadata locking routines, get
rid of the ILOCK subclasses since we now use explicit lockdep classes.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: return from xfs_symlink_verify early on V4 filesystems

Source kernel commit: 7f8b718c58783f3ff0810b39e2f62f50ba2549f6

V4 symlink blocks didn't have headers, so return early if this is a V4
filesystem.

Cc: <stable@vger.kernel.org> # v5.1
Fixes: 39708c20ab5133 ("xfs: miscellaneous verifier magic value fixups")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: fix sb_spino_align checks for large fsblock sizes

Source kernel commit: 7f8a44f37229fc76bfcafa341a4b8862368ef44a

For a sparse inodes filesystem, mkfs.xfs computes the values of
sb_spino_align and sb_inoalignmt with the following code:

int     cluster_size = XFS_INODE_BIG_CLUSTER_SIZE;

if (cfg->sb_feat.crcs_enabled)
cluster_size *= cfg->inodesize / XFS_DINODE_MIN_SIZE;

sbp->sb_spino_align = cluster_size >> cfg->blocklog;
sbp->sb_inoalignmt = XFS_INODES_PER_CHUNK *
cfg->inodesize >> cfg->blocklog;

On a V5 filesystem with 64k fsblocks and 512 byte inodes, this results
in cluster_size = 8192 * (512 / 256) = 16384.  As a result,
sb_spino_align and sb_inoalignmt are both set to zero.  Unfortunately,
this trips the new sb_spino_align check that was just added to
xfs_validate_sb_common, and the mkfs fails:

# mkfs.xfs -f -b size=64k, /dev/sda
meta-data=/dev/sda               isize=512    agcount=4, agsize=81136 blks
=                       sectsz=512   attr=2, projid32bit=1
=                       crc=1        finobt=1, sparse=1, rmapbt=1
=                       reflink=1    bigtime=1 inobtcount=1 nrext64=1
=                       exchange=0   metadir=0
data     =                       bsize=65536  blocks=324544, imaxpct=25
=                       sunit=0      swidth=0 blks
naming   =version 2              bsize=65536  ascii-ci=0, ftype=1, parent=0
log      =internal log           bsize=65536  blocks=5006, version=2
=                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=65536  blocks=0, rtextents=0
=                       rgcount=0    rgsize=0 extents
Discarding blocks...Sparse inode alignment (0) is invalid.
Metadata corruption detected at 0x560ac5a80bbe, xfs_sb block 0x0/0x200
libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x1
mkfs.xfs: Releasing dirty buffer to free list!
found dirty buffer (bulk) on free list!
Sparse inode alignment (0) is invalid.
Metadata corruption detected at 0x560ac5a80bbe, xfs_sb block 0x0/0x200
libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x1
mkfs.xfs: writing AG headers failed, err=22

Prior to commit 59e43f5479cce1 this all worked fine, even if "sparse"
inodes are somewhat meaningless when everything fits in a single
fsblock.  Adjust the checks to handle existing filesystems.

Cc: <stable@vger.kernel.org> # v6.13-rc1
Fixes: 59e43f5479cce1 ("xfs: sb_spino_align is not verified")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: update btree keys correctly when _insrec splits an inode root block

Source kernel commit: 6d7b4bc1c3e00b1a25b7a05141a64337b4629337

In commit 2c813ad66a72, I partially fixed a bug wherein xfs_btree_insrec
would erroneously try to update the parent's key for a block that had
been split if we decided to insert the new record into the new block.
The solution was to detect this situation and update the in-core key
value that we pass up to the caller so that the caller will (eventually)
add the new block to the parent level of the tree with the correct key.

However, I missed a subtlety about the way inode-rooted btrees work.  If
the full block was a maximally sized inode root block, we'll solve that
fullness by moving the root block's records to a new block, resizing the
root block, and updating the root to point to the new block.  We don't
pass a pointer to the new block to the caller because that work has
already been done.  The new record will /always/ land in the new block,
so in this case we need to use xfs_btree_update_keys to update the keys.

This bug can theoretically manifest itself in the very rare case that we
split a bmbt root block and the new record lands in the very first slot
of the new block, though I've never managed to trigger it in practice.
However, it is very easy to reproduce by running generic/522 with the
realtime rmapbt patchset if rtinherit=1.

Cc: <stable@vger.kernel.org> # v4.8
Fixes: 2c813ad66a7218 ("xfs: support btrees with overlapping intervals for keys")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: fix error bailout in xfs_rtginode_create

Source kernel commit: 23bee6f390a12d0c4c51fefc083704bc5dac377e

smatch reported that we screwed up the error cleanup in this function.
Fix it.

Cc: <stable@vger.kernel.org> # v6.13-rc1
Fixes: ae897e0bed0f54 ("xfs: support creating per-RTG files in growfs")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: return a 64-bit block count from xfs_btree_count_blocks

Source kernel commit: bd27c7bcdca25ce8067ebb94ded6ac1bd7b47317

With the nrext64 feature enabled, it's possible for a data fork to have
2^48 extent mappings. Even with a 64k fsblock size, that maps out to
a bmbt containing more than 2^32 blocks. Therefore, this predicate must
return a u64 count to avoid an integer wraparound that will cause scrub
to do the wrong thing.

It's unlikely that any such filesystem currently exists, because the
incore bmbt would consume more than 64GB of kernel memory on its own,
and so far nobody except me has driven a filesystem that far, judging
from the lack of complaints.

Cc: <stable@vger.kernel.org> # v5.19
Fixes: df9ad5cc7a5240 ("xfs: Introduce macros to represent new maximum extent counts for data/attr forks")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: don't call xfs_bmap_same_rtgroup in xfs_bmap_add_extent_hole_delay

Source kernel commit: cc2dba08cc33daf8acd6e560957ef0e0f4d034ed

xfs_bmap_add_extent_hole_delay works entirely on delalloc extents, for
which xfs_bmap_same_rtgroup doesn't make sense.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

xfs: switch to multigrain timestamps

Source kernel commit: 1cf7e834a6fb84de9d1e038d6cf4c5bd0d202ffa

Enable multigrain timestamps, which should ensure that there is an
apparent change to the timestamp whenever it has been written after
being actively observed via getattr.

Also, anytime the mtime changes, the ctime must also change, and those
are now the only two options for xfs_trans_ichgtime. Have that function
unconditionally bump the ctime, and ASSERT that XFS_ICHGTIME_CHG is
always set.

Finally, stop setting STATX_CHANGE_COOKIE in getattr, since the ctime
should give us better semantics now.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # documentation bits
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20241002-mgtime-v10-9-d1c4717f5284@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: fix sparse inode limits on runt AG

Source kernel commit: 13325333582d4820d39b9e8f63d6a54e745585d9

The runt AG at the end of a filesystem is almost always smaller than
the mp->m_sb.sb_agblocks. Unfortunately, when setting the max_agbno
limit for the inode chunk allocation, we do not take this into
account. This means we can allocate a sparse inode chunk that
overlaps beyond the end of an AG. When we go to allocate an inode
from that sparse chunk, the irec fails validation because the
agbno of the start of the irec is beyond valid limits for the runt
AG.

Prevent this from happening by taking into account the size of the
runt AG when allocating inode chunks. Also convert the various
checks for valid inode chunk agbnos to use xfs_ag_block_count()
so that they will also catch such issues in the future.

Fixes: 56d1115c9bc7 ("xfs: allocate sparse inode chunks on full chunk allocation failure")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: remove unknown compat feature check in superblock write validation

Source kernel commit: 652f03db897ba24f9c4b269e254ccc6cc01ff1b7

Compat features are new features that older kernels can safely ignore,
allowing read-write mounts without issues. The current sb write validation
implementation returns -EFSCORRUPTED for unknown compat features,
preventing filesystem write operations and contradicting the feature's
definition.

Additionally, if the mounted image is unclean, the log recovery may need
to write to the superblock. Returning an error for unknown compat features
during sb write validation can cause mount failures.

Although XFS currently does not use compat feature flags, this issue
affects current kernels' ability to mount images that may use compat
feature flags in the future.

Since superblock read validation already warns about unknown compat
features, it's unnecessary to repeat this warning during write validation.
Therefore, the relevant code in write validation is being removed.

Fixes: 9e037cb7972f ("xfs: check for unknown v5 feature bits in superblock write verifier")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

xfs: port ondisk structure checks from xfs/122 to the kernel

Source kernel commit: 13877bc79d81354c53e91f3c86ac0f7bafe3ba7b

Check this with every kernel and userspace build, so we can drop the
nonsense in xfs/122. Roughly drafted with:

sed -e 's/^offsetof/\tXFS_CHECK_OFFSET/g' \
-e 's/^sizeof/\tXFS_CHECK_STRUCT_SIZE/g' \
-e 's/ = $[0-9]*$/,\t\t\t\1);/g' \
-e 's/xfs_sb_t/struct xfs_dsb/g' \
-e 's/),/,/g' \
-e 's/xfs_$[a-z0-9_]*$_t,/struct xfs_\1,/g' \
< tests/xfs/122.out | sort

and then manual fixups.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: separate space btree structures in xfs_ondisk.h

Source kernel commit: 131a883fffb1a194957dc0e400d9f627c7cd1924

Create a separate section for space management btrees so that they're
not mixed in with file structures. Ignore the dsb stuff sprinkled
around for now, because we'll deal with that in a subsequent patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>