Repair the realtime rmap btree while mounted. Similar to the regular
rmap btree repair code, we walk the data fork mappings of every realtime
file in the filesystem to collect reverse-mapping records in an xfarray.
Then we sort the xfarray, and use the btree bulk loader to create a new
rtrmap btree ondisk. Finally, we swap the btree roots, and reap the old
blocks in the usual way.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Connect the map and unmap reverse-mapping operations to the realtime
rmapbt via the deferred operation callbacks. This enables us to
perform rmap operations against the correct btree.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Plumb in the pieces we need to embed the root of the realtime rmap btree
in an inode's data fork, complete with new metafile type and on-disk
interpretation functions.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Add a metadir path to select the realtime rmap btree inode and load
it at mount time. The rtrmapbt inode will have a unique extent format
code, which means that we also have to update the inode validation and
flush routines to look for it.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Create a new fork format type for metadata btrees. This fork type
requires that the inode is in the metadata directory tree, and only
applies to the data fork. The actual type of the metadata btree itself
is determined by the di_metatype field.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Create a helper function to turn a metadata file type code into a
printable string, and use this to complain about lockdep problems with
rtgroup inodes. We'll use this more in the next patch.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Prepare the high-level rmap functions to deal with the new realtime
rmapbt and its slightly different conventions. Provide the ability
to talk to either rmapbt or rtrmapbt formats from the same high
level code.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Implement the generic btree operations needed to manipulate rtrmap
btree blocks. This is different from the regular rmapbt in that we
allocate space from the filesystem at large, and are neither
constrained to the free space nor any particular AG.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Make sure that there's enough log reservation to handle mapping
and unmapping realtime extents. We have to reserve enough space
to handle a split in the rtrmapbt to add the record and a second
split in the regular rmapbt to record the rtrmapbt split.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Add the ondisk structure definitions for realtime rmap btrees. The
realtime rmap btree will be rooted from a hidden inode so it needs to
have a separate btree block magic and pointer format.
Next, add everything needed to read, write and manipulate rmap btree
blocks. This prepares the way for connecting the btree operations
implementation, though embedding the rtrmap btree root in the inode
comes later in the series.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Create a new space reservation scheme so that btree metadata for the
realtime volume can reserve space in the data device to avoid space
underruns.
Back when we were testing the rmap and refcount btrees for the data
device, people observed occasional shutdowns when xfs_btree_split was
called for either of those two btrees. This happened when certain
operations (mostly writeback ioends) created new rmap or refcount
records, which would expand the size of the btree. If there were no
free blocks available the allocation would fail and the split would shut
down the filesystem.
I considered pre-reserving blocks for btree expansion at the time of a
write() call, but there wasn't any good way to attach the reservations
to an inode and keep them there all the way to ioend processing. Unlike
delalloc reservations which have that indlen mechanism, there's no way
to do that for mapped extents; and indlen blocks are given back during
the delalloc -> unwritten transition.
The solution was to reserve sufficient blocks for rmap/refcount btree
expansion at mount time. This is what the XFS_AG_RESV_* flags provide;
any expansion of those two btrees can come from the pre-reserved space.
This patch brings that pre-reservation ability to inode-rooted btrees so
that the rt rmap and refcount btrees can also save room for future
expansion.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Add the necessary flags and code so that we can support storing leaf
records in the inode root block of a btree. This hasn't been necessary
before, but the realtime rmapbt will need to be able to do this.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Simplify the calling conventions by allowing callers to pass a fsbno
(xfs_fsblock_t) directly into these functions, since we're just going to
set it in a struct anyway.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Files participating in the metadata directory tree are not accounted to
the quota subsystem. Therefore, the i_[ugp]dquot pointers in struct
xfs_inode are never used and should always be NULL.
In the next patch we want to add a u64 count of fs blocks reserved for
metadata btree expansion, but we don't want every inode in the fs to pay
the memory price for this feature. The intent is to union those three
pointers with the u64 counter, but for that to work we must guard
against all access to the dquot pointers for metadata files.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
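To illustrate why the guard matters, here is a minimal sketch of the layout described above. The names demo_inode, demo_udquot and meta_resv_blocks are made up for this example and are not the kernel's; the real struct xfs_inode has many more fields and its accessors may look quite different.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct dquot;	/* opaque in this sketch */

/* Hypothetical, cut-down inode; NOT the real struct xfs_inode. */
struct demo_inode {
	bool is_metadir;	/* file lives in the metadata directory tree */
	union {
		struct {
			struct dquot *udquot;
			struct dquot *gdquot;
			struct dquot *pdquot;
		} q;
		uint64_t meta_resv_blocks;	/* only meaningful if is_metadir */
	} u;
};

/*
 * Metadata files are never accounted to quota, so report "no dquot"
 * instead of interpreting the reservation counter as a pointer.
 */
struct dquot *demo_udquot(const struct demo_inode *ip)
{
	return ip->is_metadir ? NULL : ip->u.q.udquot;
}
```

Once every dquot access goes through a guard of this kind, the storage can be shared without a metadata file's counter ever being dereferenced as a pointer.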
Create some simple helpers to reduce the amount of typing whenever we
access rtgroup inodes. Conversion was done with this spatch and some
minor reformatting:
In preparation for allowing records in an inode btree root, hoist the
code that copies keyptrs from an existing node child into the root block
to a separate function. Remove some unnecessary conditionals and clean
up a few function calls in the new function. Note that this change
reorders the ->free_block call with respect to the change in bc_nlevels
to make it easier to support inode root leaf blocks in the next patch.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
In preparation for allowing records in an inode btree root, hoist the
code that copies keyptrs from an existing node root into a child block
to a separate function. Note that the new function explicitly computes
the keys of the new child block and stores that in the root block; while
the bmap btree could rely on leaving the key alone, realtime rmap needs
to set the new high key.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Hoist out the code that migrates broot pointers during a resize
operation to avoid code duplication and streamline the caller. Also
use the correct bmbt pointer type for the sizeof operation.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Change the calling signature of xfs_iroot_realloc to take the ifork and
the new number of records in the btree block, not a diff against the
current number. This will make the callsites easier to understand.
Note that this function is misnamed because it is very specific to the
single type of inode-rooted btree supported. This will be addressed in
a subsequent patch.
Return the new btree root to reduce the amount of code clutter.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Hoist the code that allocates, frees, and reallocates if_broot into a
single xfs_iroot_krealloc function. Eventually we're going to push
xfs_iroot_realloc into the btree ops structure to handle multiple
inode-rooted btrees, but first let's separate out the bits that should
stay in xfs_inode_fork.c.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 24 Feb 2025 18:21:44 +0000 (10:21 -0800)]
xfs_scrub: try harder to fill the bulkstat array with bulkstat()
Sometimes, the last bulkstat record returned by the first xfrog_bulkstat
call in bulkstat_for_inumbers will contain an inumber less than the
highest allocated inode mentioned in the inumbers record. This happens
either because the inodes have been freed, or because the kernel
encountered a corrupt inode during bulkstat and stopped filling up the
array.
In both cases, we can call bulkstat again to try to fill up the rest of
the array. If there are newly allocated inodes, they'll be returned; if
we've truly hit the end of the filesystem, the kernel will return zero
records; and if the first allocated inode is indeed corrupt, the kernel
will return EFSCORRUPTED.
As an optimization to avoid the single-step code, call bulkstat with an
increasing ino parameter until the bulkstat array is full or the kernel
tells us there are no bulkstat records to return. This speeds things
up a bit in cases where the allocmask is all ones and only the second
inode is corrupt.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
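The retry loop described above can be sketched roughly as follows. This is not the xfs_scrub code: fetch_fn stands in for a single bulkstat call, records are reduced to bare inode numbers, and demo_fetch is a toy "kernel" that returns at most two records per call. Error returns such as EFSCORRUPTED are left out of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for one bulkstat call: store up to max inode
 * numbers >= start_ino in out[] and return how many were stored.
 */
typedef size_t (*fetch_fn)(uint64_t start_ino, uint64_t *out, size_t max);

/* Keep asking until the array is full or nothing more comes back. */
size_t fill_records(fetch_fn fetch, uint64_t start_ino,
		    uint64_t *recs, size_t cap)
{
	size_t filled = 0;
	uint64_t next = start_ino;

	while (filled < cap) {
		size_t got = fetch(next, recs + filled, cap - filled);

		if (got == 0)
			break;		/* end of the filesystem */
		filled += got;
		next = recs[filled - 1] + 1;	/* resume after the last record */
	}
	return filled;
}

/* Toy data source: five allocated inodes, at most two returned per call. */
static const uint64_t demo_inodes[] = { 64, 65, 70, 71, 90 };

size_t demo_fetch(uint64_t start_ino, uint64_t *out, size_t max)
{
	size_t n = 0;

	for (size_t i = 0; i < sizeof(demo_inodes) / sizeof(demo_inodes[0]) &&
			   n < max && n < 2; i++) {
		if (demo_inodes[i] >= start_ino)
			out[n++] = demo_inodes[i];
	}
	return n;
}
```

With an eight-slot array, fill_records(demo_fetch, 64, recs, 8) needs four calls to the toy source and ends up with all five inodes; with a four-slot array it stops as soon as the array is full.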
Darrick J. Wong [Mon, 24 Feb 2025 18:21:44 +0000 (10:21 -0800)]
xfs_scrub: ignore freed inodes when single-stepping during phase 3
For inodes that inumbers told us were allocated but weren't loaded by
the bulkstat call, we fall back to loading bulkstat data one inode at a
time to try to find the inodes that are too corrupt to load.
However, there are a couple of outcomes of the single bulkstat call that
clearly indicate that the inode is free, not corrupt. In this case, the
phase 3 inode scan will try to scrub the inode, only to be told ENOENT
because it doesn't exist.
As an optimization here, don't increment ocount, just move on to the
next inode in the mask.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 24 Feb 2025 18:21:43 +0000 (10:21 -0800)]
xfs_scrub: don't blow away new inodes in bulkstat_single_step
bulkstat_single_step has an ugly misfeature -- given the inumbers
record, it expects to find bulkstat data for those inodes, in the exact
order that they were specified in inumbers. If a new inode is created
after inumbers but before bulkstat, bulkstat will return stat data for
that inode, only to have bulkstat_single_step obliterate it. Then we
fail to scan that inode.
Instead, we should use the returned bulkstat array to compute a bitmask
of inodes that bulkstat had to have seen while it was walking the inobt.
An important detail is that any inode between the @ino parameter passed
to bulkstat and the last bulkstat record it returns was seen, even if no
bstat record was produced.
Any inode set in xi_allocmask but not set in the seen_mask is missing
and needs to be loaded. Load bstat data for those inodes into the /end/
of the array so that we don't obliterate bstat data for a newly created
inode, then re-sort the array so we always scan in ascending inumber
order.
Cc: <linux-xfs@vger.kernel.org> # v5.18.0 Fixes: 245c72a6eeb720 ("xfs_scrub: balance inode chunk scan across CPUs") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
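A minimal sketch of the mask computation, assuming a 64-inode chunk, that bulkstat was called with @ino equal to the chunk's first inode, and that the returned records are sorted by inode number. The function names are invented for illustration and the records are reduced to bare inode numbers; this is not the code from xfs_scrub.

```c
#include <stddef.h>
#include <stdint.h>

#define INODES_PER_CHUNK 64	/* assumed size of one inobt record */

/*
 * Every inode from startino up to the last returned record counts as
 * "seen", whether or not a record was produced for it.
 */
uint64_t seen_mask(uint64_t startino, const uint64_t *inos, size_t nr)
{
	uint64_t last, span;

	if (nr == 0)
		return 0;
	last = inos[nr - 1];
	if (last < startino)
		return 0;
	span = last - startino + 1;
	if (span >= INODES_PER_CHUNK)
		return UINT64_MAX;
	return (UINT64_C(1) << span) - 1;
}

/* Allocated according to inumbers, but never seen by bulkstat. */
uint64_t missing_mask(uint64_t allocmask, uint64_t seen)
{
	return allocmask & ~seen;
}
```

For example, if inodes 128 through 133 are allocated (allocmask 0x3f) and bulkstat returned records for 128, 129 and 131, the seen mask is 0xf and the missing mask is 0x30, so only inodes 132 and 133 need to be loaded one at a time.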
Darrick J. Wong [Mon, 24 Feb 2025 18:21:43 +0000 (10:21 -0800)]
xfs_scrub: return early from bulkstat_for_inumbers if no bulkstat data
If bulkstat doesn't return an error code or any bulkstat records, we've
hit the end of the filesystem, so return early. This can happen if the
inumbers data came from the very last inobt record in the filesystem and
every inode in that inobt record is freed immediately after INUMBERS.
There's no bug here, it's just a minor optimization.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 24 Feb 2025 18:21:43 +0000 (10:21 -0800)]
xfs_scrub: don't (re)set the bulkstat request icount incorrectly
Don't change the bulkstat request icount in bulkstat_for_inumbers
because alloc_ichunk already set it to LIBFROG_BULKSTAT_CHUNKSIZE.
Lowering it to xi_alloccount here means that we can miss inodes at the
end of the inumbers chunk if any are allocated to the same inobt record
after the inumbers call but before the bulkstat call.
Cc: <linux-xfs@vger.kernel.org> # v5.3.0 Fixes: e3724c8b82a320 ("xfs_scrub: refactor xfs_iterate_inodes_range_check") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 24 Feb 2025 18:21:43 +0000 (10:21 -0800)]
xfs_scrub: don't double-scan inodes during phase 3
The bulkstat ioctl only allows us to specify the starting inode number
and the length of the bulkstat array. It is possible that a bulkstat
request for {startino = 30, icount = 10} will return stat data for inode
50. For most bulkstat users this is ok because they're marching
linearly across all inodes in the filesystem.
Unfortunately for scrub phase 3 this is undesirable because we only want
the inodes that belong to a specific inobt record because we need to
know about inodes that are marked as allocated but are too corrupt to
appear in the bulkstat output. Another worker will process the inobt
record(s) that correspond to the extra inodes, which means we can
double-scan some inodes.
Therefore, bulkstat_for_inumbers should trim out inodes that don't
correspond to the inumbers record that it is given.
Cc: <linux-xfs@vger.kernel.org> # v5.3.0 Fixes: e3724c8b82a320 ("xfs_scrub: refactor xfs_iterate_inodes_range_check") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
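The trimming step might look something like the sketch below. It assumes a 64-inode chunk and reduces each bulkstat record to its inode number; trim_to_chunk is a hypothetical name, not a function in xfs_scrub.

```c
#include <stddef.h>
#include <stdint.h>

#define INODES_PER_CHUNK 64	/* assumed size of one inobt record */

/*
 * Compact inos[] in place so that only inodes belonging to the chunk
 * starting at startino remain; returns the new record count.
 */
size_t trim_to_chunk(uint64_t *inos, size_t nr, uint64_t startino)
{
	size_t kept = 0;

	for (size_t i = 0; i < nr; i++) {
		if (inos[i] >= startino &&
		    inos[i] - startino < INODES_PER_CHUNK)
			inos[kept++] = inos[i];
	}
	return kept;
}
```

Given records for inodes 64, 70, 127, 128 and 300 and a chunk starting at inode 64, only the first three survive; the other two are left for whichever worker owns their inobt records.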
Darrick J. Wong [Mon, 24 Feb 2025 18:21:42 +0000 (10:21 -0800)]
xfs_scrub: actually iterate all the bulkstat records
In scan_ag_bulkstat, we have a for loop that iterates all the
xfs_bulkstat records in breq->bulkstat. The loop condition test should
test against the array length, not the number of bits set in an
unrelated data structure. If ocount > xi_alloccount then we miss some
inodes; if ocount < xi_alloccount then we've walked off the end of the
array.
Cc: <linux-xfs@vger.kernel.org> # v5.18.0 Fixes: 245c72a6eeb720 ("xfs_scrub: balance inode chunk scan across CPUs") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 24 Feb 2025 18:21:42 +0000 (10:21 -0800)]
xfs_scrub: selectively re-run bulkstat after re-running inumbers
In the phase 3 inode scan, don't bother retrying the inumbers ->
bulkstat conversion unless inumbers returns the same startino and there
are allocated inodes. If inumbers returns data for a totally different
inobt record, that means the whole inode chunk was freed.
Cc: <linux-xfs@vger.kernel.org> # v5.18.0 Fixes: 245c72a6eeb720 ("xfs_scrub: balance inode chunk scan across CPUs") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 24 Feb 2025 18:21:42 +0000 (10:21 -0800)]
xfs_scrub: remove flags argument from scrub_scan_all_inodes
Now that there's only one caller of scrub_scan_all_inodes, remove the
single defined flag because it can set the METADIR bulkstat flag if
needed. Clarify in the documentation that this is a special purpose
inode iterator that picks up things that don't normally happen.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 24 Feb 2025 18:21:42 +0000 (10:21 -0800)]
xfs_scrub: call bulkstat directly if we're only scanning user files
Christoph observed xfs_scrub phase 5 consuming a lot of CPU time on a
filesystem with a very large number of rtgroups. He traced this to
bulkstat_for_inumbers spending a lot of time trying to single-step
through inodes that were marked allocated in the inumbers record but
didn't show up in the bulkstat data. These correspond to files in the
metadata directory tree that are not returned by the regular bulkstat.
This complex machinery isn't necessary for the inode walk that occurs
during phase 5 because phase 5 wants to open user files and check the
dirent/xattr names associated with that file. It's not needed for phase
6 because we're only using it to report data loss in unlinked files when
parent pointers aren't enabled.
Furthermore, we don't need to do this inumbers -> bulkstat dance because
phase 3 and 4 supposedly fixed any inode that was too corrupt to be
igettable and hence reported on by bulkstat.
Fix this by creating a simpler user file iterator that walks bulkstat
across the filesystem without using inumbers. While we're at it, fix
the obviously incorrect comments in inodes.h.
Cc: <linux-xfs@vger.kernel.org> # v4.15.0 Fixes: 372d4ba99155b2 ("xfs_scrub: add inode iteration functions") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 24 Feb 2025 18:21:42 +0000 (10:21 -0800)]
xfs_scrub: don't report data loss in unlinked inodes twice
If parent pointers are enabled, report_ioerr_fsmap will report lost file
data and xattrs for all files, having used the parent pointer ioctls to
generate the path of the lost file. For unlinked files, the path lookup
will fail, but we'll report the inumber of the file that lost data.
Therefore, we don't need to do a separate scan of the unlinked inodes
in report_all_media_errors after doing the fsmap scan.
Cc: <linux-xfs@vger.kernel.org> # v6.10.0 Fixes: 9b5d1349ca5fb1 ("xfs_scrub: use parent pointers to report lost file data") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 24 Feb 2025 18:21:41 +0000 (10:21 -0800)]
libfrog: wrap handle construction code
Clean up all the open-coded logic to construct a file handle from a
fshandle and some bulkstat/parent pointer information. The new
functions are stashed in a private header file to avoid leaking the
details of xfs_handle construction in the public libhandle headers.
I tried moving the code to libhandle, but I don't entirely like the
result. The libhandle functions pass around handles as arbitrary binary
blobs that come from and are sent to the kernel, meaning that the
interface is full of (void *, size_t) tuples. Putting these new
functions in libhandle breaks that abstraction because now clients know
that they can deal with a struct xfs_handle.
We could fix that leak by changing it to a (void *, size_t) tuple, but
then we'd have to validate the size_t and return -1 with errno set on failure,
which then means that all the client code now has to have error handling
for a case that we're fairly sure can't be true. This is overkill for
xfsprogs code that knows better, because we can trust ourselves to know
the exact layout of a handle.
	ret = handle_from_fshandle(&handle, file->fshandle,
			file->fshandle_len);
	if (ret) {
		perror("what?");
		return -1;
	}
Which is much more verbose code, and right now it exists to handle an
exceptional condition that is not possible. If someone outside of
xfsprogs would like this sort of functionality in libhandle I'm all for
adding it, but with zero demand from external users, I prefer to keep
things simple.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 24 Feb 2025 18:21:41 +0000 (10:21 -0800)]
libxfs: unmap xmbuf pages to avoid disaster
It turns out that there's a maximum mappings count, so we need to be
smartish about not overflowing that with too many xmbuf buffers. This
needs to be a global value because high-agcount filesystems will create
a large number of xmbuf caches but this is a process-global limit.
Cc: <linux-xfs@vger.kernel.org> # v6.9.0 Fixes: 124b388dac17f5 ("libxfs: support in-memory buffer cache targets") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 24 Feb 2025 18:21:41 +0000 (10:21 -0800)]
xfs_db: obfuscate rt superblock label when metadumping
Metadump can obfuscate the filesystem label on all the superblocks on
the data device, so it must perform the same transformation on the
realtime device superblock to avoid leaking information and so that the
mdrestored filesystem is consistent.
Found by running xfs/503 with realtime turned on and a patch to set
labels on common/populated filesystem images.
Cc: <linux-xfs@vger.kernel.org> # v6.13.0 Fixes: 6bc20c5edbab51 ("xfs_db: metadump realtime devices") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 3 Feb 2025 22:40:55 +0000 (14:40 -0800)]
xfs_protofile: fix device number encoding
Actually crack major/minor device numbers from the stat results that we
get when we encounter a character/block device file.
Fixes: 6aace700b7b82d ("mkfs: add a utility to generate protofiles") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 3 Feb 2025 22:40:39 +0000 (14:40 -0800)]
xfs_protofile: fix mode formatting error
The protofile parser expects the mode to be specified with three octal
digits. Unfortunately, the generator doesn't get that right if the mode
doesn't have any of bits 8-11 (aka no owner access privileges) set.
Fixes: 6aace700b7b82d ("mkfs: add a utility to generate protofiles") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
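The underlying pitfall is ordinary zero padding. A small C sketch of the idea follows (the generator itself is not C, and format_mode is a name invented for this example): printing the permission bits with a plain %o drops leading zeroes, whereas %03o always yields three digits.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Write the permission bits of mode as exactly three octal digits,
 * e.g. 044 becomes "044" rather than "44".
 */
int format_mode(char *buf, size_t len, mode_t mode)
{
	return snprintf(buf, len, "%03o", (unsigned int)(mode & 0777));
}
```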
Darrick J. Wong [Mon, 3 Feb 2025 22:40:24 +0000 (14:40 -0800)]
mkfs: fix file size setting when interpreting a protofile
When we're copying a regular file into the filesystem, we should set the
size of the new file to the size indicated by the stat data, not the
highest offset written, because we now use SEEK_DATA/HOLE to ignore
sparse regions.
Fixes: 73fb78e5ee8940 ("mkfs: support copying in large or sparse files") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Wed, 29 Jan 2025 00:48:26 +0000 (16:48 -0800)]
xfs_repair: require zeroed quota/rt inodes in metadir superblocks
If metadata directory trees are enabled, the superblock inode pointers
to quota and rt free space metadata must all be zero. The only inode
pointers in the superblock are sb_rootino and sb_metadirino.
Found by running xfs/418.
Fixes: b790ab2a303d58 ("xfs_repair: support quota inodes in the metadata directory") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
mkfs: use a default sector size that is also suitable for the rtdev
When creating a filesystem where the data device has a sector size
smaller than that of the RT device without further options, mkfs
currently fails with:
mkfs.xfs: error - cannot set blocksize 512 on block device $RTDEV: Invalid argument
This is because XFS sets the sector size based on logical block size
of the data device, but not that of the RT device. Change the code
so that it uses the larger of the two values.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 27 Jan 2025 21:36:34 +0000 (13:36 -0800)]
xfs_scrub_all.timer: don't run if /var/lib/xfsprogs is readonly
The xfs_scrub_all program wants to write a state file into the package
state dir to keep track of how recently it performed a media scan.
Don't allow the systemd timer to run if that path isn't writable.
Cc: linux-xfs@vger.kernel.org # v6.10.0 Fixes: 267ae610a3d90f ("xfs_scrub_all: enable periodic file data scrubs automatically") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Chi Zhiling [Thu, 16 Jan 2025 09:09:39 +0000 (17:09 +0800)]
xfs_logprint: Fix super block buffer interpretation issue
When using xfs_logprint to interpret the buffer of the super block, the
icount will always be 6360863066640355328 (0x5846534200001000). This is
because the offset of icount is incorrect, causing xfs_logprint to
misinterpret the MAGIC number as icount.
This patch fixes the offset value of the SB counters in xfs_logprint.
After this patch:
icount: 10240 ifree: 4906 fdblks: 37 frext: 0
Suggested-by: Darrick J. Wong <djwong@kernel.org> Suggested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
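The bogus value is easy to reproduce. The on-disk superblock begins with the 4-byte magic "XFSB" followed by the 4-byte block size, both big-endian, so reading a 64-bit counter at offset zero of a 4096-byte-block filesystem gives exactly the number quoted above. The sketch below only demonstrates that arithmetic; get_be64 and sb_head are names made up for the example and have nothing to do with the xfs_logprint source.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode eight big-endian bytes into a host-order 64-bit value. */
uint64_t get_be64(const unsigned char *p)
{
	uint64_t v = 0;

	for (size_t i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* First eight superblock bytes: magic "XFSB", then block size 4096. */
const unsigned char sb_head[8] = {
	'X', 'F', 'S', 'B', 0x00, 0x00, 0x10, 0x00
};
```

get_be64(sb_head) evaluates to 0x5846534200001000, which is 6360863066640355328 in decimal.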
Darrick J. Wong [Thu, 16 Jan 2025 21:22:05 +0000 (13:22 -0800)]
mkfs: allow sizing realtime allocation groups for concurrency
Add a -r concurrency= option to mkfs so that sysadmins can configure the
filesystem so that there are enough rtgroups that the specified number
of threads can (in theory) find an uncontended rtgroup from which to
allocate space. This has the exact same purpose as the -d concurrency
switch that was added for the data device.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 16 Jan 2025 21:22:05 +0000 (13:22 -0800)]
build: initialize stack variables to zero by default
Newer versions of gcc and clang can include the ability to zero stack
variables by default. Let's enable it so that we (a) reduce the risk of
writing stack contents to disk somewhere and (b) try to reduce
unpredictable program behavior based on random stack contents. The
kernel added this 6 years ago, so I think it's mature enough for
xfsprogs.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reluctantly-Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 16 Jan 2025 21:22:05 +0000 (13:22 -0800)]
m4: fix statx override selection if /usr/include doesn't define it
If the system headers (aka the ones in /usr/include) do not define
struct statx at all, we need to use our internal override. The m4 code
doesn't handle this admittedly obscure corner case, but let's fix it for anyone
trying to build new xfsprogs on a decade-old distribution.
Fixes: 409477af604f46 ("xfs_io: add support for atomic write statx fields") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 16 Jan 2025 21:22:04 +0000 (13:22 -0800)]
mkfs: fix parsing of value-less -d/-l concurrency cli option
It's supposed to be possible to specify the -d concurrency option with
no value in order to get mkfs to calculate the agcount from the number of
CPUs. Unfortunately I forgot to handle that case (optarg is null) so
mkfs crashes instead. Fix that.
Fixes: 9338bc8b1bf073 ("mkfs: allow sizing allocation groups for concurrency") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
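The general shape of the fix is to test the suboption value for NULL before parsing it. A minimal sketch, assuming the value arrives as a possibly-NULL string the way getsubopt() delivers it; parse_concurrency and its nr_cpus argument are hypothetical and do not mirror the mkfs code.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Return the requested thread count. A missing value (NULL) means
 * "autodetect", for which the caller-supplied CPU count is used.
 * Returns -1 if the value is present but not a valid number.
 */
long parse_concurrency(const char *value, long nr_cpus)
{
	char *end;
	long n;

	if (value == NULL)
		return nr_cpus;		/* bare "concurrency" with no "=N" */

	errno = 0;
	n = strtol(value, &end, 10);
	if (errno || end == value || *end != '\0' || n < 0)
		return -1;
	return n;
}
```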
Darrick J. Wong [Thu, 16 Jan 2025 21:22:04 +0000 (13:22 -0800)]
xfs_db: improve error message when unknown btree type given to btheight
I found accidentally that if you do this (note 'rmap', not 'rmapbt'):
xfs_db /dev/sda -c 'btheight -n 100 rmap'
The program spits back "Numerical result out of range". That's the
result of it failing to match "rmap" against a known btree type, and
falling back to parsing the string as if it were a btree geometry
description.
Improve this a little by checking that there's at least one colon in
the string so that the error message improves to:
"rmap: expected a btree geometry specification"
Fixes: cb1e69c564c1e0 ("xfs_db: add a function to compute btree geometry") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 16 Jan 2025 21:22:04 +0000 (13:22 -0800)]
libxfs: fix uninit variable in libxfs_alloc_file_space
Fix this uninitialized variable.
Coverity-id: 1637359 Fixes: b48164b8cd7618 ("libxfs: resync libxfs_alloc_file_space interface with the kernel") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 16 Jan 2025 21:22:03 +0000 (13:22 -0800)]
xfs_repair: don't obliterate return codes
Don't clobber error here, it's err2 that's the temporary variable.
Coverity-id: 1637363 Fixes: b790ab2a303d58 ("xfs_repair: support quota inodes in the metadata directory") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
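The pattern being restored is the usual "first error wins" idiom, sketched below. The function and helper names are invented for the example and are unrelated to the xfs_repair function that was fixed: the result of the follow-up step goes into a temporary, and only replaces the primary error code if nothing had failed yet.

```c
#include <errno.h>

/* Toy steps standing in for real work and cleanup. */
int step_ok(void)     { return 0; }
int step_eio(void)    { return -EIO; }
int step_enomem(void) { return -ENOMEM; }

/*
 * Run work() and then cleanup(), which must run regardless.
 * err2 is the temporary; error always holds the first failure.
 */
int run_then_cleanup(int (*work)(void), int (*cleanup)(void))
{
	int error, err2;

	error = work();
	err2 = cleanup();
	if (!error)
		error = err2;	/* report cleanup failure only if work succeeded */
	return error;
}
```

So run_then_cleanup(step_eio, step_enomem) reports -EIO, not the later -ENOMEM.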
Darrick J. Wong [Thu, 16 Jan 2025 21:22:03 +0000 (13:22 -0800)]
xfs_db: fix multiple dblock commands
Tom Samstag reported that running the following sequence of commands no
longer works quite right:
> inode [inodenum]
> dblock 0
> p
> dblock 1
> p
> dblock 2
> p
> [etc]
Mr. Samstag looked into the source code and discovered that the
dblock_f is incorrectly accessing iocur_top->data outside of the
push_cur -> set_cur_inode -> pop_cur sequence that this function uses to
compute the type of the file data. In other words, it's using
whatever's on top of the stack at the start of the function. For the
"dblock 0" case above this is the inode, but for the "dblock 1" case
this is the contents of file data block 0, not an inode.
Fix this by relocating the check to the correct place.
Reported-by: tom.samstag@netrise.io Tested-by: Tom Samstag <tom.samstag@netrise.io> Cc: <linux-xfs@vger.kernel.org> # v6.12.0 Fixes: b05a31722f5d4c ("xfs_db: access realtime file blocks") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Non-rtg file systems have a fake RT group even if they do not have a RT
device, and thus an rgcount of 1. Ensure xfs_update_last_rtgroup_size
doesn't fail when called for !XFS_RT to handle this case.
Fixes: 87fe4c34a383 ("xfs: create incore realtime group structures") Reported-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Ojaswin Mujoo [Thu, 19 Dec 2024 12:39:14 +0000 (18:09 +0530)]
xfs_io: allow foreign FSes to show FS_IOC_FSGETXATTR details
Currently with stat we only show FS_IOC_FSGETXATTR details if the
filesystem is XFS. With extsize support also coming to ext4 and possibly
other filesystems, make sure to allow foreign FSes to display these details
when "stat" or "statx" is used.
(Thanks to Dave for suggesting implementation of print_extended_info())
Suggested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 21 Nov 2024 00:24:43 +0000 (16:24 -0800)]
mkfs: add quota flags when setting up filesystem
If we're creating a metadir filesystem, the quota accounting and
enforcement flags persist until the sysadmin changes them. Add a means
to specify those qflags at format time.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 21 Nov 2024 00:24:42 +0000 (16:24 -0800)]
xfs_repair: support quota inodes in the metadata directory
Handle quota inodes on metadir filesystems. This means that we have to
discover whatever quota inodes exist by looking in /quotas instead of
the superblock, and mend any broken metadir tree links that might exist.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 21 Nov 2024 00:24:42 +0000 (16:24 -0800)]
xfs_repair: refactor quota inumber handling
In preparation for putting quota files in the metadata directory tree,
refactor repair's quota inumber handling to use its own variables
instead of the xfs_mount's.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 21 Nov 2024 00:24:41 +0000 (16:24 -0800)]
mkfs: add headers to realtime bitmap blocks
When the rtgroups feature is enabled, format rtbitmap blocks with the
appropriate block headers. libxfs takes care of the actual writing for
us, so all we have to do is ensure that the bitmap is the correct size.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 21 Nov 2024 00:24:40 +0000 (16:24 -0800)]
xfs_scrub: trim realtime volumes too
On the kernel side, the XFS realtime groups patchset added support for
FITRIM of the realtime volume. This support doesn't actually require
there to be any realtime groups, so teach scrub to run through the whole
region.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Use good old array notation instead of pointer arithmetic.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
[djwong: fold scan_rtg_rmaps cleanups into next patch] Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Run the rtgroup metapath scrubber during phase 5 to ensure that any
rtgroup metadata files are still connected to the metadir tree after
we've pruned any bad links.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 21 Nov 2024 00:24:40 +0000 (16:24 -0800)]
xfs_scrub: scrub realtime allocation group metadata
Scan realtime group metadata as part of phase 2, just like we do for AG
metadata. For pre-rtgroup filesystems, pretend that this is a "rtgroup
0" scrub request because the kernel expects that. Replace the old
cond_wait code with a scrub barrier because they're equivalent for two
items that cannot be scrubbed in parallel.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Wed, 11 Dec 2024 22:00:47 +0000 (14:00 -0800)]
xfs_mdrestore: refactor open-coded fd/is_file into a structure
Create an explicit object to track the fd and flags associated with a
device onto which we are restoring metadata, and use it to reduce the
amount of open-coded arguments to ->restore. This avoids some grossness
in the next patch.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Andrey Albershteyn <aalbersh@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 21 Nov 2024 00:24:38 +0000 (16:24 -0800)]
xfs_db: report rt group and block number in the bmap command
The bmap command does not report startblocks for realtime files
correctly. If rtgroups are enabled, we need to use the appropriate
functions to crack the startblock into rtgroup and block numbers; if
not, then we need to report a linear address and not try to report a
group number.
Fix both of these issues.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 21 Nov 2024 00:24:38 +0000 (16:24 -0800)]
xfs_db: metadump realtime devices
Teach the metadump device to dump the filesystem metadata of a realtime
device to the metadump file. Currently, this is limited to the realtime
superblock.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
xfs_db: metadump metadir rt bitmap and summary files
Don't skip dumping the data fork for regular files that are marked as
metadata inodes. This catches rtbitmap and summary inodes on rtgroup
enabled file systems where their inode numbers aren't recorded in the
superblock.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>