Darrick J. Wong [Mon, 24 Feb 2025 18:21:42 +0000 (10:21 -0800)]
xfs_scrub: remove flags argument from scrub_scan_all_inodes
Now that there's only one caller of scrub_scan_all_inodes, remove the
single defined flag because it can set the METADIR bulkstat flag if
needed. Clarify in the documentation that this is a special purpose
inode iterator that picks up things that don't normally happen.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 24 Feb 2025 18:21:42 +0000 (10:21 -0800)]
xfs_scrub: call bulkstat directly if we're only scanning user files
Christoph observed xfs_scrub phase 5 consuming a lot of CPU time on a
filesystem with a very large number of rtgroups. He traced this to
bulkstat_for_inumbers spending a lot of time trying to single-step
through inodes that were marked allocated in the inumbers record but
didn't show up in the bulkstat data. These correspond to files in the
metadata directory tree that are not returned by the regular bulkstat.
This complex machinery isn't necessary for the inode walks that occur
during phase 5 because phase 5 wants to open user files and check the
dirent/xattr names associated with that file. It's not needed for phase
6 because we're only using it to report data loss in unlinked files when
parent pointers aren't enabled.
Furthermore, we don't need to do this inumbers -> bulkstat dance because
phase 3 and 4 supposedly fixed any inode that was too corrupt to be
igettable and hence reported on by bulkstat.
Fix this by creating a simpler user file iterator that walks bulkstat
across the filesystem without using inumbers. While we're at it, fix
the obviously incorrect comments in inodes.h.
Cc: <linux-xfs@vger.kernel.org> # v4.15.0 Fixes: 372d4ba99155b2 ("xfs_scrub: add inode iteration functions") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 24 Feb 2025 18:21:42 +0000 (10:21 -0800)]
xfs_scrub: don't report data loss in unlinked inodes twice
If parent pointers are enabled, report_ioerr_fsmap will report lost file
data and xattrs for all files, having used the parent pointer ioctls to
generate the path of the lost file. For unlinked files, the path lookup
will fail, but we'll report the inumber of the file that lost data.
Therefore, we don't need to do a separate scan of the unlinked inodes
in report_all_media_errors after doing the fsmap scan.
Cc: <linux-xfs@vger.kernel.org> # v6.10.0 Fixes: 9b5d1349ca5fb1 ("xfs_scrub: use parent pointers to report lost file data") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 24 Feb 2025 18:21:41 +0000 (10:21 -0800)]
libfrog: wrap handle construction code
Clean up all the open-coded logic to construct a file handle from a
fshandle and some bulkstat/parent pointer information. The new
functions are stashed in a private header file to avoid leaking the
details of xfs_handle construction in the public libhandle headers.
I tried moving the code to libhandle, but I don't entirely like the
result. The libhandle functions pass around handles as arbitrary binary
blobs that come from and are sent to the kernel, meaning that the
interface is full of (void *, size_t) tuples. Putting these new
functions in libhandle breaks that abstraction because now clients know
that they can deal with a struct xfs_handle.
We could fix that leak by changing it to a (void *, size_t) tuple, but
then we'd have to validate the size_t and return -1 having set errno,
which then means that all the client code now has to have error handling
for a case that we're fairly sure can't be true. This is overkill for
xfsprogs code that knows better, because we can trust ourselves to know
the exact layout of a handle.
	ret = handle_from_fshandle(&handle, file->fshandle,
			file->fshandle_len);
	if (ret) {
		perror("what?");
		return -1;
	}
Which is much more verbose code, and right now it exists to handle an
exceptional condition that is not possible. If someone outside of
xfsprogs would like this sort of functionality in libhandle I'm all for
adding it, but with zero demand from external users, I prefer to keep
things simple.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 24 Feb 2025 18:21:41 +0000 (10:21 -0800)]
libxfs: unmap xmbuf pages to avoid disaster
It turns out that there's a maximum mappings count, so we need to be
smartish about not overflowing that with too many xmbuf buffers. This
needs to be a global value because high-agcount filesystems will create
a large number of xmbuf caches but this is a process-global limit.
Cc: <linux-xfs@vger.kernel.org> # v6.9.0 Fixes: 124b388dac17f5 ("libxfs: support in-memory buffer cache targets") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 24 Feb 2025 18:21:41 +0000 (10:21 -0800)]
xfs_db: obfuscate rt superblock label when metadumping
Metadump can obfuscate the filesystem label on all the superblocks on
the data device, so it must perform the same transformation on the
realtime device superblock to avoid leaking information and so that the
mdrestored filesystem is consistent.
Found by running xfs/503 with realtime turned on and a patch to set
labels on common/populated filesystem images.
Cc: <linux-xfs@vger.kernel.org> # v6.13.0 Fixes: 6bc20c5edbab51 ("xfs_db: metadump realtime devices") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 3 Feb 2025 22:40:55 +0000 (14:40 -0800)]
xfs_protofile: fix device number encoding
Actually crack major/minor device numbers from the stat results that we
get when we encounter a character/block device file.
Fixes: 6aace700b7b82d ("mkfs: add a utility to generate protofiles") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
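As a rough illustration of what "cracking" a device number means, here is a minimal C sketch using the glibc major()/minor() macros. The helper name crack_rdev is made up for this example, and the real xfs_protofile code may look quite different; this only shows the idea of splitting st_rdev into two fields before writing them out.

```c
#include <sys/types.h>
#include <sys/sysmacros.h>	/* major(), minor(), makedev() on Linux/glibc */

/*
 * Hypothetical helper (not from xfsprogs): split the st_rdev value of a
 * character or block device file into its major and minor numbers.
 */
void crack_rdev(dev_t rdev, unsigned int *maj, unsigned int *mnr)
{
	*maj = major(rdev);
	*mnr = minor(rdev);
}
```

A caller would pass the st_rdev field from a stat() result and emit the two numbers separately instead of the raw encoded value.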
Darrick J. Wong [Mon, 3 Feb 2025 22:40:39 +0000 (14:40 -0800)]
xfs_protofile: fix mode formatting error
The protofile parser expects the mode to be specified with three octal
digits. Unfortunately, the generator doesn't get that right if the mode
doesn't have any of the owner permission bits (0700) set.
Fixes: 6aace700b7b82d ("mkfs: add a utility to generate protofiles") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 3 Feb 2025 22:40:24 +0000 (14:40 -0800)]
mkfs: fix file size setting when interpreting a protofile
When we're copying a regular file into the filesystem, we should set the
size of the new file to the size indicated by the stat data, not the
highest offset written, because we now use SEEK_DATA/HOLE to ignore
sparse regions.
Fixes: 73fb78e5ee8940 ("mkfs: support copying in large or sparse files") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Wed, 29 Jan 2025 00:48:26 +0000 (16:48 -0800)]
xfs_repair: require zeroed quota/rt inodes in metadir superblocks
If metadata directory trees are enabled, the superblock inode pointers
to quota and rt free space metadata must all be zero. The only inode
pointers in the superblock are sb_rootino and sb_metadirino.
Found by running xfs/418.
Fixes: b790ab2a303d58 ("xfs_repair: support quota inodes in the metadata directory") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
mkfs: use a default sector size that is also suitable for the rtdev
When creating a filesystem where the data device has a sector size
smaller than that of the RT device without further options, mkfs
currently fails with:
mkfs.xfs: error - cannot set blocksize 512 on block device $RTDEV: Invalid argument
This is because XFS sets the sector size based on logical block size
of the data device, but not that of the RT device. Change the code
so that it uses the larger of the two values.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
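The selection rule described above can be sketched as follows. This is only an illustration under the assumption that both logical block sizes are already known; default_sector_size is a hypothetical name, and a value of 0 stands in for "no RT device".

```c
/*
 * Hypothetical helper (not from mkfs): choose a default sector size that
 * both the data device and the RT device can accept, i.e. the larger of
 * the two logical block sizes.
 */
unsigned int default_sector_size(unsigned int data_lbsize,
				 unsigned int rt_lbsize)
{
	return data_lbsize > rt_lbsize ? data_lbsize : rt_lbsize;
}
```

With a 512-byte data device and a 4096-byte RT device this picks 4096, which avoids the "cannot set blocksize 512" failure quoted above.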
Darrick J. Wong [Mon, 27 Jan 2025 21:36:34 +0000 (13:36 -0800)]
xfs_scrub_all.timer: don't run if /var/lib/xfsprogs is readonly
The xfs_scrub_all program wants to write a state file into the package
state dir to keep track of how recently it performed a media scan.
Don't allow the systemd timer to run if that path isn't writable.
Cc: linux-xfs@vger.kernel.org # v6.10.0 Fixes: 267ae610a3d90f ("xfs_scrub_all: enable periodic file data scrubs automatically") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Chi Zhiling [Thu, 16 Jan 2025 09:09:39 +0000 (17:09 +0800)]
xfs_logprint: Fix super block buffer interpretation issue
When using xfs_logprint to interpret the buffer of the super block, the
icount will always be 6360863066640355328 (0x5846534200001000). This is
because the offset of icount is incorrect, causing xfs_logprint to
misinterpret the MAGIC number as icount.
This patch fixes the offset value of the SB counters in xfs_logprint.
After this patch:
icount: 10240 ifree: 4906 fdblks: 37 frext: 0
Suggested-by: Darrick J. Wong <djwong@kernel.org> Suggested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
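To see where the bogus icount comes from, decode the first eight bytes of a superblock as one big-endian 64-bit value. The sketch below assumes a superblock that starts with the "XFSB" magic followed by a 32-bit block size of 4096 (0x1000); get_be64 and sb_head are names invented for this example.

```c
#include <stdint.h>
#include <stddef.h>

/* Decode a big-endian 64-bit value from a raw buffer. */
uint64_t get_be64(const unsigned char *p)
{
	uint64_t v = 0;

	for (size_t i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Assumed start of a superblock: magic "XFSB", then block size 4096. */
const unsigned char sb_head[8] = {
	'X', 'F', 'S', 'B', 0x00, 0x00, 0x10, 0x00
};
```

Reading sb_head with get_be64 yields 0x5846534200001000, which is the decimal value reported above: the magic and block size were being read as if they were the inode counter.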
Darrick J. Wong [Thu, 16 Jan 2025 21:22:05 +0000 (13:22 -0800)]
mkfs: allow sizing realtime allocation groups for concurrency
Add a -r concurrency= option to mkfs so that sysadmins can configure the
filesystem so that there are enough rtgroups that the specified number
of threads can (in theory) find an uncontended rtgroup from which to
allocate space. This has the exact same purpose as the -d concurrency
switch that was added for the data device.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 16 Jan 2025 21:22:05 +0000 (13:22 -0800)]
build: initialize stack variables to zero by default
Newer versions of gcc and clang can include the ability to zero stack
variables by default. Let's enable it so that we (a) reduce the risk of
writing stack contents to disk somewhere and (b) try to reduce
unpredictable program behavior based on random stack contents. The
kernel added this 6 years ago, so I think it's mature enough for
xfsprogs.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reluctantly-Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 16 Jan 2025 21:22:05 +0000 (13:22 -0800)]
m4: fix statx override selection if /usr/include doesn't define it
If the system headers (aka the ones in /usr/include) do not define
struct statx at all, we need to use our internal override. The m4 code
doesn't handle this admittedly corner case, but let's fix it for anyone
trying to build new xfsprogs on a decade-old distribution.
Fixes: 409477af604f46 ("xfs_io: add support for atomic write statx fields") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 16 Jan 2025 21:22:04 +0000 (13:22 -0800)]
mkfs: fix parsing of value-less -d/-l concurrency cli option
It's supposed to be possible to specify the -d concurrency option with
no value in order to get mkfs to calculate the agcount from the number of
CPUs. Unfortunately I forgot to handle that case (optarg is null) so
mkfs crashes instead. Fix that.
Fixes: 9338bc8b1bf073 ("mkfs: allow sizing allocation groups for concurrency") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
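The shape of the fix is simply to check for a missing value before parsing it. A minimal sketch, assuming the option parser hands over NULL when no "=value" was given; parse_concurrency is a hypothetical helper and not the actual mkfs function:

```c
#include <stdlib.h>

/*
 * Hypothetical helper (not from mkfs): interpret the value of a
 * "concurrency" suboption.  A missing value (NULL or empty string) means
 * "derive it from the number of CPUs" instead of dereferencing NULL.
 */
long parse_concurrency(const char *value, long nr_cpus)
{
	if (value == NULL || *value == '\0')
		return nr_cpus;
	return strtol(value, NULL, 10);
}
```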
Darrick J. Wong [Thu, 16 Jan 2025 21:22:04 +0000 (13:22 -0800)]
xfs_db: improve error message when unknown btree type given to btheight
I found accidentally that if you do this (note 'rmap', not 'rmapbt'):
xfs_db /dev/sda -c 'btheight -n 100 rmap'
The program spits back "Numerical result out of range". That's the
result of it failing to match "rmap" against a known btree type, and
falling back to parsing the string as if it were a btree geometry
description.
Improve this a little by checking that there's at least one colon in
the string so that the error message improves to:
"rmap: expected a btree geometry specification"
Fixes: cb1e69c564c1e0 ("xfs_db: add a function to compute btree geometry") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 16 Jan 2025 21:22:04 +0000 (13:22 -0800)]
libxfs: fix uninit variable in libxfs_alloc_file_space
Fix this uninitialized variable.
Coverity-id: 1637359 Fixes: b48164b8cd7618 ("libxfs: resync libxfs_alloc_file_space interface with the kernel") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 16 Jan 2025 21:22:03 +0000 (13:22 -0800)]
xfs_repair: don't obliterate return codes
Don't clobber error here, it's err2 that's the temporary variable.
Coverity-id: 1637363 Fixes: b790ab2a303d58 ("xfs_repair: support quota inodes in the metadata directory") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 16 Jan 2025 21:22:03 +0000 (13:22 -0800)]
xfs_db: fix multiple dblock commands
Tom Samstag reported that running the following sequence of commands no
longer works quite right:
> inode [inodenum]
> dblock 0
> p
> dblock 1
> p
> dblock 2
> p
> [etc]
Mr. Samstag looked into the source code and discovered that
dblock_f is incorrectly accessing iocur_top->data outside of the
push_cur -> set_cur_inode -> pop_cur sequence that this function uses to
compute the type of the file data. In other words, it's using
whatever's on top of the stack at the start of the function. For the
"dblock 0" case above this is the inode, but for the "dblock 1" case
this is the contents of file data block 0, not an inode.
Fix this by relocating the check to the correct place.
Reported-by: tom.samstag@netrise.io Tested-by: Tom Samstag <tom.samstag@netrise.io> Cc: <linux-xfs@vger.kernel.org> # v6.12.0 Fixes: b05a31722f5d4c ("xfs_db: access realtime file blocks") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Non-rtg file systems have a fake RT group even if they do not have a RT
device, and thus an rgcount of 1. Ensure xfs_update_last_rtgroup_size
doesn't fail when called for !XFS_RT to handle this case.
Fixes: 87fe4c34a383 ("xfs: create incore realtime group structures") Reported-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Ojaswin Mujoo [Thu, 19 Dec 2024 12:39:14 +0000 (18:09 +0530)]
xfs_io: allow foreign FSes to show FS_IOC_FSGETXATTR details
Currently with stat we only show FS_IOC_FSGETXATTR details if the
filesystem is XFS. With extsize support also coming to ext4 and possibly
other filesystems, make sure to allow foreign FSes to display these details
when "stat" or "statx" is used.
(Thanks to Dave for suggesting implementation of print_extended_info())
Suggested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 21 Nov 2024 00:24:43 +0000 (16:24 -0800)]
mkfs: add quota flags when setting up filesystem
If we're creating a metadir filesystem, the quota accounting and
enforcement flags persist until the sysadmin changes them. Add a means
to specify those qflags at format time.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 21 Nov 2024 00:24:42 +0000 (16:24 -0800)]
xfs_repair: support quota inodes in the metadata directory
Handle quota inodes on metadir filesystems. This means that we have to
discover whatever quota inodes exist by looking in /quotas instead of
the superblock, and mend any broken metadir tree links that might exist.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 21 Nov 2024 00:24:42 +0000 (16:24 -0800)]
xfs_repair: refactor quota inumber handling
In preparation for putting quota files in the metadata directory tree,
refactor repair's quota inumber handling to use its own variables
instead of the xfs_mount's.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 21 Nov 2024 00:24:41 +0000 (16:24 -0800)]
mkfs: add headers to realtime bitmap blocks
When the rtgroups feature is enabled, format rtbitmap blocks with the
appropriate block headers. libxfs takes care of the actual writing for
us, so all we have to do is ensure that the bitmap is the correct size.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 21 Nov 2024 00:24:40 +0000 (16:24 -0800)]
xfs_scrub: trim realtime volumes too
On the kernel side, the XFS realtime groups patchset added support for
FITRIM of the realtime volume. This support doesn't actually require
there to be any realtime groups, so teach scrub to run through the whole
region.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Use the good old array notation instead of pointer arithmetic.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
[djwong: fold scan_rtg_rmaps cleanups into next patch] Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Run the rtgroup metapath scrubber during phase 5 to ensure that any
rtgroup metadata files are still connected to the metadir tree after
we've pruned any bad links.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 21 Nov 2024 00:24:40 +0000 (16:24 -0800)]
xfs_scrub: scrub realtime allocation group metadata
Scan realtime group metadata as part of phase 2, just like we do for AG
metadata. For pre-rtgroup filesystems, pretend that this is a "rtgroup
0" scrub request because the kernel expects that. Replace the old
cond_wait code with a scrub barrier because they're equivalent for two
items that cannot be scrubbed in parallel.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Wed, 11 Dec 2024 22:00:47 +0000 (14:00 -0800)]
xfs_mdrestore: refactor open-coded fd/is_file into a structure
Create an explicit object to track the fd and flags associated with a
device onto which we are restoring metadata, and use it to reduce the
amount of open-coded arguments to ->restore. This avoids some grossness
in the next patch.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Andrey Albershteyn <aalbersh@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 21 Nov 2024 00:24:38 +0000 (16:24 -0800)]
xfs_db: report rt group and block number in the bmap command
The bmap command does not report startblocks for realtime files
correctly. If rtgroups are enabled, we need to use the appropriate
functions to crack the startblock into rtgroup and block numbers; if
not, then we need to report a linear address and not try to report a
group number.
Fix both of these issues.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 21 Nov 2024 00:24:38 +0000 (16:24 -0800)]
xfs_db: metadump realtime devices
Teach the metadump device to dump the filesystem metadata of a realtime
device to the metadump file. Currently, this is limited to the realtime
superblock.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
xfs_db: metadump metadir rt bitmap and summary files
Don't skip dumping the data fork for regular files that are marked as
metadata inodes. This catches rtbitmap and summary inodes on rtgroup
enabled file systems where their inode numbers aren't recorded in the
superblock.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 21 Nov 2024 00:24:36 +0000 (16:24 -0800)]
xfs_db: enable rtconvert to handle segmented rtblocks
Now that we've turned xfs_rtblock_t into a segmented address and
xfs_rtxnum_t into a per-rtgroup address, port the rtconvert debugger
command to handle the unit conversions correctly. Also add an example
of the bitmap/summary-related conversion commands to the manpage.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 21 Nov 2024 00:24:36 +0000 (16:24 -0800)]
xfs_db: enable the rtblock and rtextent commands for segmented rt block numbers
Now that xfs_rtblock_t can be a segmented address, fix the validation in
rtblock_f to handle the inputs correctly; and fix rtextent_f to do all
of its conversions in linear address space.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
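For readers unfamiliar with the distinction, the difference between a segmented and a linear block address can be sketched like this. The layout here (group number in the high bits, block offset within the group in the low bits) and the helper names are assumptions made for illustration; the real libxfs conversion helpers have different names and signatures.

```c
#include <stdint.h>

/*
 * Hypothetical layout: the rtgroup number lives in the high bits and the
 * block offset within the group lives in the low "shift" bits.
 */
uint64_t seg_encode(uint32_t rgno, uint64_t rgbno, unsigned int shift)
{
	return ((uint64_t)rgno << shift) | rgbno;
}

/* Convert a segmented address into a linear block number. */
uint64_t seg_to_linear(uint64_t seg, unsigned int shift, uint64_t rgblocks)
{
	uint64_t rgno = seg >> shift;
	uint64_t rgbno = seg & ((UINT64_C(1) << shift) - 1);

	return rgno * rgblocks + rgbno;
}
```

With 1000 blocks per group and a 10-bit shift, block 5 of group 3 has the segmented address 3077 but the linear address 3005, so arithmetic on extents has to happen in linear space whenever the group size is not a power of two.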
Darrick J. Wong [Thu, 21 Nov 2024 00:24:35 +0000 (16:24 -0800)]
xfs_repair: find and clobber rtgroup bitmap and summary files
On a rtgroups filesystem, if the rtgroups bitmap or summary files are
garbage, we need to clear the dinode and update the incore bitmap so
that we don't bother to check the old rt freespace metadata.
However, we regenerate the entire rt metadata directory tree during
phase 6. If the bitmap and summary files are ok, we still want to clear
the dinode, but we can still use the incore inode to check the old
freespace contents. Split the clear_dinode function into two pieces,
one that merely zeroes the inode, and the old clear_dinode now turns off
checking.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Wed, 11 Dec 2024 21:28:13 +0000 (13:28 -0800)]
xfs_repair: support realtime groups
Make repair aware of multiple rtgroups. This now uses the same code as
the AG-based data device for block usage tracking instead of the less
optimal AVL trees and bitmaps used for the traditional RT device. This
is done by introducing similar per-rtgroup space tracking structures as
we have for the AGs.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Add a struct bmap that contains the btree root and the lock, and provide
helpers for locking instead of directly poking into the data structure.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
No need to cacheline align rt_lock if we move it next to the data
it protects. Also reduce the critical section to just where those
data structures are accessed.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Improve the reporting of discrepancies in the realtime bitmap and
summary files by creating a separate helper function that will pinpoint
the exact (word) locations of mismatches. This will help developers to
diagnose problems with the rtgroups feature and users to figure out
exactly what's bad in a filesystem.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Split out helpers to process all duplicate extents in an AG and the RT
duplicate extents.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 21 Nov 2024 00:24:33 +0000 (16:24 -0800)]
libfrog: add bitmap_clear
Uncomment and fix bitmap_clear so that xfs_repair can start using it.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
[hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 21 Nov 2024 00:24:33 +0000 (16:24 -0800)]
libxfs: implement some sanity checking for enormous rgcount
Similar to what we do for suspiciously large sb_agcount values, if
someone tries to get libxfs to load a filesystem with a very large
realtime group count, let's do some basic checks of the rt device to
see if it's really that large. If the read fails, only load the first
rtgroup and warn the user.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 21 Nov 2024 00:24:27 +0000 (16:24 -0800)]
libxfs: use correct rtx count to block count conversion
Fix a place where we use the wrong conversion functions to convert
between a number of rt extents and a number of rt blocks. This isn't
really necessary since userspace cannot allocate rt extents, but let's
not leave a logic bomb.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
On a V5 filesystem with 64k fsblocks and 512 byte inodes, this results
in cluster_size = 8192 * (512 / 256) = 16384. As a result,
sb_spino_align and sb_inoalignmt are both set to zero. Unfortunately,
this trips the new sb_spino_align check that was just added to
xfs_validate_sb_common, and the mkfs fails:
Prior to commit 59e43f5479cce1 this all worked fine, even if "sparse"
inodes are somewhat meaningless when everything fits in a single
fsblock. Adjust the checks to handle existing filesystems.
Cc: <stable@vger.kernel.org> # v6.13-rc1 Fixes: 59e43f5479cce1 ("xfs: sb_spino_align is not verified") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
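The arithmetic above can be reproduced with a few lines of C. The constants come from the formula quoted in the message; the helper names, and the assumption that the alignment is the cluster size shifted down by the block size log, are illustrative only.

```c
/* Values taken from the arithmetic quoted in the commit message. */
#define BIG_CLUSTER_SIZE	8192u	/* base inode cluster size, bytes */
#define DINODE_MIN_SIZE		256u	/* smallest inode size, bytes */

/* Inode cluster size in bytes for a given inode size. */
unsigned int inode_cluster_size(unsigned int inodesize)
{
	return BIG_CLUSTER_SIZE * (inodesize / DINODE_MIN_SIZE);
}

/*
 * Assumed conversion to fsblocks: this rounds down to zero when the
 * whole cluster fits inside a single filesystem block.
 */
unsigned int cluster_align_blocks(unsigned int cluster_size,
				  unsigned int blocklog)
{
	return cluster_size >> blocklog;
}
```

For 512-byte inodes the cluster is 16384 bytes, so with 64k blocks (blocklog 16) the alignment comes out as zero, while with 4k blocks (blocklog 12) it would be 4.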
In commit 2c813ad66a72, I partially fixed a bug wherein xfs_btree_insrec
would erroneously try to update the parent's key for a block that had
been split if we decided to insert the new record into the new block.
The solution was to detect this situation and update the in-core key
value that we pass up to the caller so that the caller will (eventually)
add the new block to the parent level of the tree with the correct key.
However, I missed a subtlety about the way inode-rooted btrees work. If
the full block was a maximally sized inode root block, we'll solve that
fullness by moving the root block's records to a new block, resizing the
root block, and updating the root to point to the new block. We don't
pass a pointer to the new block to the caller because that work has
already been done. The new record will /always/ land in the new block,
so in this case we need to use xfs_btree_update_keys to update the keys.
This bug can theoretically manifest itself in the very rare case that we
split a bmbt root block and the new record lands in the very first slot
of the new block, though I've never managed to trigger it in practice.
However, it is very easy to reproduce by running generic/522 with the
realtime rmapbt patchset if rtinherit=1.
Cc: <stable@vger.kernel.org> # v4.8 Fixes: 2c813ad66a7218 ("xfs: support btrees with overlapping intervals for keys") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
With the nrext64 feature enabled, it's possible for a data fork to have
2^48 extent mappings. Even with a 64k fsblock size, that maps out to
a bmbt containing more than 2^32 blocks. Therefore, this predicate must
return a u64 count to avoid an integer wraparound that will cause scrub
to do the wrong thing.
It's unlikely that any such filesystem currently exists, because the
incore bmbt would consume more than 64GB of kernel memory on its own,
and so far nobody except me has driven a filesystem that far, judging
from the lack of complaints.
Cc: <stable@vger.kernel.org> # v5.19 Fixes: df9ad5cc7a5240 ("xfs: Introduce macros to represent new maximum extent counts for data/attr forks") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
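A back-of-the-envelope check of the wraparound, as a sketch: assume 16-byte mapping records, so a 64k block holds at most 4096 of them (ignoring the block header and all interior nodes, which only makes the real count larger). min_bmbt_leaf_blocks is a hypothetical helper, not the predicate being fixed.

```c
#include <stdint.h>

/*
 * Lower bound on the number of leaf blocks needed to hold nextents
 * mappings when each block holds at most max_recs_per_block records.
 */
uint64_t min_bmbt_leaf_blocks(uint64_t nextents, uint64_t max_recs_per_block)
{
	return (nextents + max_recs_per_block - 1) / max_recs_per_block;
}
```

For 2^48 mappings this gives 2^36 leaf blocks, which truncates to 0 if it is stored in a 32-bit count.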
Enable multigrain timestamps, which should ensure that there is an
apparent change to the timestamp whenever it has been written after
being actively observed via getattr.
Also, anytime the mtime changes, the ctime must also change, and those
are now the only two options for xfs_trans_ichgtime. Have that function
unconditionally bump the ctime, and ASSERT that XFS_ICHGTIME_CHG is
always set.
Finally, stop setting STATX_CHANGE_COOKIE in getattr, since the ctime
should give us better semantics now.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> # documentation bits Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20241002-mgtime-v10-9-d1c4717f5284@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
The runt AG at the end of a filesystem is almost always smaller than
the mp->m_sb.sb_agblocks. Unfortunately, when setting the max_agbno
limit for the inode chunk allocation, we do not take this into
account. This means we can allocate a sparse inode chunk that
overlaps beyond the end of an AG. When we go to allocate an inode
from that sparse chunk, the irec fails validation because the
agbno of the start of the irec is beyond valid limits for the runt
AG.
Prevent this from happening by taking into account the size of the
runt AG when allocating inode chunks. Also convert the various
checks for valid inode chunk agbnos to use xfs_ag_block_count()
so that they will also catch such issues in the future.
Fixes: 56d1115c9bc7 ("xfs: allocate sparse inode chunks on full chunk allocation failure") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
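The runt AG problem can be illustrated with a small sketch. The helpers below are hypothetical stand-ins written for this example (the kernel's xfs_ag_block_count() may differ in detail); they show why a limit based on sb_agblocks accepts a chunk that the real size of the last AG rejects.

```c
#include <stdint.h>
#include <stdbool.h>

/* Blocks in AG agno; only the last AG may be shorter than agblocks. */
uint32_t ag_block_count(uint32_t agno, uint32_t agcount,
			uint32_t agblocks, uint64_t dblocks)
{
	if (agno < agcount - 1)
		return agblocks;
	return (uint32_t)(dblocks - (uint64_t)agno * agblocks);
}

/* An inode chunk is only valid if it ends inside its AG. */
bool chunk_fits(uint32_t agbno, uint32_t len, uint32_t ag_blocks)
{
	return (uint64_t)agbno + len <= ag_blocks;
}
```

With 4 AGs of 1000 blocks and 3500 blocks in total, the last AG has only 500 blocks: a chunk at agbno 496 of length 8 passes a check against 1000 but fails against the true AG size.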
Compat features are new features that older kernels can safely ignore,
allowing read-write mounts without issues. The current sb write validation
implementation returns -EFSCORRUPTED for unknown compat features,
preventing filesystem write operations and contradicting the feature's
definition.
Additionally, if the mounted image is unclean, the log recovery may need
to write to the superblock. Returning an error for unknown compat features
during sb write validation can cause mount failures.
Although XFS currently does not use compat feature flags, this issue
affects current kernels' ability to mount images that may use compat
feature flags in the future.
Since superblock read validation already warns about unknown compat
features, it's unnecessary to repeat this warning during write validation.
Therefore, the relevant code in write validation is being removed.
Fixes: 9e037cb7972f ("xfs: check for unknown v5 feature bits in superblock write verifier") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Create a separate section for space management btrees so that they're
not mixed in with file structures. Ignore the dsb stuff sprinkled
around for now, because we'll deal with that in a subsequent patch.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>