Note that MIN-IO == PHY-SEC, so dsunit/dswidth are zero. With this
change, we no longer set the lsunit to the fsblock size if the log
sector size is greater than 512. Unfortunately, dsunit is also not set,
so mkfs never sets the log sunit and it remains zero. I think
this causes problems with the log roundoff computation in the kernel,
because now the roundoff factor is less than the log sector size. After
a while, the filesystem cannot be mounted anymore because:
XFS (sda3): Mounting V5 Filesystem 81b8ffa8-383b-4574-a68c-9b8202707a26
XFS (sda3): Corruption warning: Metadata has LSN (4:2729) ahead of current LSN (4:2727). Please unmount and run xfs_repair (>= v4.3) to resolve.
XFS (sda3): log mount/recovery failed: error -22
XFS (sda3): log mount failed
Reverting this patch makes the problem go away, but I think you're
trying to make it so that mkfs will set lsunit = dsunit if dsunit>0 and
the caller didn't specify any -lsunit= parameter, right?
But there's something that just seems off with this whole function. If
the user provided a -lsunit/-lsu option then we need to validate the
value and either use it if it makes sense, or complain if not. If the
user didn't specify any option, then we should figure it out
automatically from the other data device geometry options (internal) or
the external log device probing.
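A minimal sketch of the decision flow described above, written as a reader's illustration rather than the actual mkfs code. The struct, its field names, and pick_lsunit() are hypothetical; all sizes are in bytes, and the external log probing case is left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inputs; these are not the real mkfs structures. */
struct log_geom {
	bool     user_lsunit_set; /* user passed a log stripe unit option */
	uint32_t user_lsunit;     /* requested log stripe unit, in bytes */
	bool     log_internal;    /* log lives on the data device */
	int      log_version;
	uint32_t dsunit;          /* data stripe unit in bytes, 0 if none */
	uint32_t blocksize;       /* fs block size in bytes */
	uint32_t log_sectorsize;  /* log sector size in bytes */
};

/*
 * Store the chosen log stripe unit in *lsunit and return 0, or return -1
 * if the user-supplied value is not a multiple of the fs block size.
 */
static int pick_lsunit(const struct log_geom *g, uint32_t *lsunit)
{
	assert(g != NULL && lsunit != NULL && g->blocksize != 0);

	if (g->user_lsunit_set) {
		/* Explicit request: validate, then use it or complain. */
		if (g->user_lsunit % g->blocksize != 0)
			return -1;
		*lsunit = g->user_lsunit;
		return 0;
	}

	/* No request: derive the value from the data device geometry. */
	if (g->log_version == 2 && g->log_internal && g->dsunit != 0)
		*lsunit = g->dsunit;
	else if (g->log_sectorsize > 512)
		*lsunit = g->blocksize;	/* assumed fallback for large sectors */
	else
		*lsunit = 0;
	return 0;
}
```

The point of the ordering is that an explicit option is only validated, while the automatic choices are only considered when no option was given.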
But that's not what this function does. Why would you do this:
and then loudly validate that lsu (bytes) is congruent with the fsblock
size? This is trivially true, but then it disables the "make lsunit use
dsunit if set" logic below:
	} else if (cfg->sb_feat.log_version == 2 &&
		   cfg->loginternal && cfg->dsunit) {
		/* lsunit and dsunit now in fs blocks */
		cfg->lsunit = cfg->dsunit;
	}
AFAICT, the "lsunit matches fs block size" logic is buggy. This code
was added with no justification as part of a "reworking" commit
2f44b1b0e5adc4 ("mkfs: rework stripe calculations") back in 2017. I
think the correct logic is to move the "lsunit matches fs block size"
logic to the no-lsunit-option code after the validation code.
This seems to set sb_logsunit to 4096 on my test VM, to 0 on the even
more boring VMs with 512 physical sectors, and to 262144 with the
scsi_debug device that Lukas Herbolt created with:
Darrick J. Wong [Mon, 2 Mar 2026 20:46:56 +0000 (12:46 -0800)]
mkfs: fix protofile data corruption when in/out file block sizes don't match
As written in 73fb78e5ee8940, if libxfs_file_write is passed an
unaligned file range to write, it will zero the unaligned regions at the
head and tail of the block. This is what we want for a newly allocated
(and hence unwritten) block, but this is definitely not what we want
if some other part of the block has already been written.
Fix this by extending the data/hole_pos range to be aligned to the block
size of the new filesystem. This means we read slightly more, but we
never rewrite blocks in the new filesystem, sidestepping the behavior.
Found by xfs/841 when the test filesystem has a 1k fsblock size.
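A minimal sketch of the range extension described above, assuming a power-of-two block size. The helper name and signature are hypothetical and are not taken from the mkfs protofile code:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: widen [*pos, *end) so that both edges sit on
 * block boundaries of the new filesystem. blocksize must be a power
 * of two.
 */
static void align_copy_range(uint64_t *pos, uint64_t *end, uint64_t blocksize)
{
	uint64_t mask = blocksize - 1;

	assert(blocksize != 0 && (blocksize & mask) == 0);
	*pos &= ~mask;			/* round the start down */
	*end = (*end + mask) & ~mask;	/* round the end up */
}
```

With a 1k block size, a data range of [1536, 5000) becomes [1024, 5120), so every destination block is written in one piece.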
Cc: <linux-xfs@vger.kernel.org> # v6.13.0
Fixes: 73fb78e5ee8940 ("mkfs: support copying in large or sparse files")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 2 Mar 2026 20:55:34 +0000 (12:55 -0800)]
libxfs: fix data corruption bug in libxfs_file_write
libxfs_file_write tries to initialize the entire file block buffer,
which includes zeroing the head portion if @pos is not aligned to the
filesystem block size. However, @buf is the file data to copy in at
position @pos, not the position of the file block. Therefore, block_off
should be added to b_addr, not buf.
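The distinction can be sketched as follows. This is a simplified stand-in, not the libxfs code: fill_block() and its parameters are hypothetical, and only the pointer arithmetic mirrors the description above:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for filling one file block buffer.
 * b_addr:    start of the block-sized buffer
 * block_off: offset of the write position within that block
 * buf:       caller's data, which already starts at the write position
 * count:     number of bytes to copy
 */
static void fill_block(char *b_addr, size_t blocksize, size_t block_off,
		       const char *buf, size_t count)
{
	assert(block_off + count <= blocksize);

	/* Zero the head of the block that precedes the write position. */
	memset(b_addr, 0, block_off);
	/* Offset the destination buffer, not the source data. */
	memcpy(b_addr + block_off, buf, count);
	/* Zero whatever remains after the copied data. */
	memset(b_addr + block_off + count, 0, blocksize - block_off - count);
}
```

Adding block_off to buf instead would skip the first block_off bytes of the caller's data and copy the wrong bytes to the start of the block.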
Cc: <linux-xfs@vger.kernel.org> # v6.13.0
Fixes: 73fb78e5ee8940 ("mkfs: support copying in large or sparse files")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Bastian Germann [Fri, 20 Feb 2026 17:17:10 +0000 (18:17 +0100)]
debian: Drop Uploader: Bastian Germann
I am no longer uploading the package to Debian.
The package is the same except for debian/upstream/signing-key.asc
which I have kept on the actual signer's key for the releases.
Signed-off-by: Bastian Germann <bage@debian.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Sparse inode cluster allocation sets min/max agbno values to avoid
allocating an inode cluster that might map to an invalid inode
chunk. For example, we can't have an inode record mapped to agbno 0
or that extends past the end of a runt AG of misaligned size.
The initial calculation of max_agbno is unnecessarily conservative,
however. This has triggered a corner case allocation failure where a
small runt AG (i.e. 2063 blocks) is mostly full save for an extent
to the EOFS boundary: [2050,13]. max_agbno is set to 2048 in this
case, which happens to be the offset of the last possible valid
inode chunk in the AG. In practice, we should be able to allocate
the 4-block cluster at agbno 2052 to map to the parent inode record
at agbno 2048, but the max_agbno value precludes it.
Note that this can result in filesystem shutdown via dirty trans
cancel on stable kernels prior to commit 9eb775968b68 ("xfs: walk
all AGs if TRYLOCK passed to xfs_alloc_vextent_iterate_ags") because
the tail AG selection by the allocator sets t_highest_agno on the
transaction. If the inode allocator spins around and finds an inode
chunk with free inodes in an earlier AG, the subsequent dir name
creation path may still fail to allocate due to the AG restriction
and cancel.
To avoid this problem, update the max_agbno calculation to the agbno
prior to the last chunk aligned agbno in the AG. This is not
necessarily the last valid allocation target for a sparse chunk, but
since inode chunks (i.e. records) are chunk aligned and sparse
allocs are cluster sized/aligned, this allows the sb_spino_align
alignment restriction to take over and round down the max effective
agbno to within the last valid inode chunk in the AG.
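The arithmetic for the runt AG example can be sketched as below. The chunk size of 8 blocks is an assumption (it is not stated above), the "old" formula is a guess that happens to reproduce the 2048 figure, and all function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

static uint32_t round_down_u32(uint32_t x, uint32_t align)
{
	assert(align != 0);
	return x - (x % align);
}

/* Assumed old behaviour: start of the last chunk that fits in the AG. */
static uint32_t max_agbno_old(uint32_t agblocks, uint32_t chunk_blks)
{
	return round_down_u32(agblocks, chunk_blks) - chunk_blks;
}

/* New behaviour: the agbno just before the last chunk-aligned agbno. */
static uint32_t max_agbno_new(uint32_t agblocks, uint32_t chunk_blks)
{
	return round_down_u32(agblocks, chunk_blks) - 1;
}

/* The sparse alignment restriction then rounds the limit down. */
static uint32_t effective_max_agbno(uint32_t max_agbno, uint32_t spino_align)
{
	return round_down_u32(max_agbno, spino_align);
}
```

For a 2063-block AG with 8-block chunks and 4-block clusters, the old limit is 2048, while the new limit of 2055 rounds down to 2052, which is the cluster the example wants to allocate.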
Note that even though the allocator improvements in the
aforementioned commit seem to avoid this particular dirty trans
cancel situation, the max_agbno logic improvement still applies as
we should be able to allocate from an AG that has been appropriately
selected. The more important target for this patch however are
older/stable kernels prior to this allocator rework/improvement.
Fixes: 56d1115c9bc7 ("xfs: allocate sparse inode chunks on full chunk allocation failure")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
__xfs_rtgroup_extents is not used outside of xfs_rtgroup.c, so mark it
static. Move it and xfs_rtgroup_extents up in the file to avoid forward
declarations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Garbage collection assumes all zones contain the full amount of blocks.
Mkfs already ensures this happens, but make the kernel check it as well
to avoid getting into trouble due to fuzzers or mkfs bugs.
Fixes: 2167eaabe2fa ("xfs: define the zoned on-disk format")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
We can easily check if there are any reclaimable zones by just looking
at the used counters in the reclaim buckets, so do that to free up the
xarray mark we currently use for this purpose.
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
There are almost no users of the typedef left, kill it and switch the
remaining users to use the underlying struct.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
xlog_in_core_2_t is a really odd type: not only is it grossly
misnamed because it actually is an on-disk structure, but it also
represents the actual on-disk structure in a rather odd way.
I.e., the ext headers are a variable sized array at the end of the
header. So instead of declaring a union of xlog_rec_header,
xlog_rec_ext_header and padding to BBSIZE, add the proper padding to
struct xlog_rec_header and struct xlog_rec_ext_header, and
add a variable sized array of the latter to the former. This also
exposes the somewhat unusual scope of the log checksums, which is
made explicit now by adding proper padding and a macro designating
the actual payload length.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
The XLOG_HEADER_CYCLE_SIZE / BBSIZE expression is used a lot
in the log code, give it a symbolic name.
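As an illustration, the expression and one typical use might look like the sketch below. The constant values are restated locally, the macro name XLOG_CYCLE_DATA_SIZE is an assumption about what the symbolic name is, and cycle_slot() is a hypothetical helper:

```c
#include <assert.h>

#define BBSIZE			512		/* basic block size in bytes */
#define XLOG_HEADER_CYCLE_SIZE	(32 * 1024)
/* Assumed name for the shared expression. */
#define XLOG_CYCLE_DATA_SIZE	(XLOG_HEADER_CYCLE_SIZE / BBSIZE)

/*
 * Hypothetical helper: for the i-th basic block of a log record, work
 * out which header holds its saved cycle word and at which index.
 */
static void cycle_slot(unsigned int i, unsigned int *hdr, unsigned int *idx)
{
	assert(hdr && idx);
	*hdr = i / XLOG_CYCLE_DATA_SIZE;
	*idx = i % XLOG_CYCLE_DATA_SIZE;
}
```

Each header covers 64 basic blocks, so block 70 lands in header 1 at index 6.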
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
The xfs_dquot structure currently uses the anti-pattern of using the
in-object lock that protects the content to also serialize reference
count updates for the structure, leading to a cumbersome free path.
This is partially papered over by the fact that we never free the dquot
directly but always through the LRU. Switch to use a lockref instead and
move the reference counter manipulations out of q_qlock.
To make this work, xfs_qm_flush_one and xfs_qm_flush_one are converted to
acquire a dquot reference while flushing to integrate with the lockref
"get if not dead" scheme.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
iomap_zero_range() has to cover various corner cases that are
difficult to test on production kernels because it is used in fairly
limited use cases. For example, it is currently only used by XFS and
mostly only in partial block zeroing cases.
While it's possible to test most of these functional cases, we can
provide more robust test coverage by co-opting fallocate zero range
to invoke zeroing of the entire range instead of the more efficient
block punch/allocate sequence. Add an errortag to occasionally
invoke forced zeroing.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Lukas Herbolt [Thu, 19 Feb 2026 11:44:09 +0000 (12:44 +0100)]
mkfs.xfs fix sunit size on 512e and 4kN disks.
Creating an XFS filesystem on a 4kN or 512e disk results in a suboptimal
LSU/LSUNIT. As of now we check if the sector size is bigger than
XLOG_HEADER_SIZE and, if so, set lsu to the block size. But we do not
check whether lsunit can be made bigger to fit the disk geometry.
Signed-off-by: Lukas Herbolt <lukas@herbolt.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Mon, 2 Feb 2026 19:14:05 +0000 (11:14 -0800)]
xfs_scrub_all: fix non-service-mode arguments to xfs_scrub
Back in commit 7da76e2745d6a7, we changed the default arguments to
xfs_scrub for the xfs_scrub@ service to derive the fix/preen/check mode
from the "autofsck" filesystem property instead of hardcoding "-p".
Unfortunately, I forgot to make the same update for xfs_scrub_all being
run from the CLI and directly invoking xfs_scrub.
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1125314
Cc: linux-xfs@vger.kernel.org # v6.10.0
Fixes: 7da76e2745d6a7 ("xfs_scrub: use the autofsck fsproperty to select mode")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Modify the function xfrog_report_zones() to always try a cached zone
report first, using the BLKREPORTZONEV2 ioctl. If the kernel does not
support BLKREPORTZONEV2, fall back to the (slower) regular report zones
BLKREPORTZONE ioctl.
To enable this feature even if xfsprogs is compiled on a system where
linux/blkzoned.h does not define BLKREPORTZONEV2, this ioctl is defined
in libfrog/zones.h, together with the BLK_ZONE_REP_CACHED flag and the
BLK_ZONE_COND_ACTIVE zone condition.
Since a cached zone report always returns the condition
BLK_ZONE_COND_ACTIVE for any zone that is implicitly open, explicitly
open or closed, the function xfs_zone_validate_seq() is modified to
handle this new condition as being equivalent to the implicit open,
explicit open or closed conditions.
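The try-cached-then-fall-back pattern can be sketched in isolation as below. Nothing here is the libfrog code: the callbacks stand in for the two ioctl paths, the use of -ENOTTY to mean "not supported" is an assumption, and the stub functions exist only so the sketch can be exercised:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical callback standing in for one report-zones ioctl call. */
typedef int (*report_fn)(void *ctx);

struct zone_reporter {
	report_fn cached;		/* stand-in for the BLKREPORTZONEV2 path */
	report_fn regular;		/* stand-in for the BLKREPORTZONE path */
	bool cached_unsupported;	/* set once the cached path is rejected */
};

/* Returns 0 on success or a negative errno value. */
static int report_zones(struct zone_reporter *r, void *ctx)
{
	assert(r && r->cached && r->regular);

	if (!r->cached_unsupported) {
		int ret = r->cached(ctx);

		/* Assumption: -ENOTTY signals an unknown ioctl. */
		if (ret != -ENOTTY)
			return ret;
		/* Remember the result so the cached path is not retried. */
		r->cached_unsupported = true;
	}
	return r->regular(ctx);
}

/* Example stubs: ctx points at two call counters. */
static int stub_cached_unsupported(void *ctx)
{
	((int *)ctx)[0]++;
	return -ENOTTY;
}

static int stub_regular_ok(void *ctx)
{
	((int *)ctx)[1]++;
	return 0;
}
```

Calling report_zones() twice with these stubs tries the cached path once and the regular path twice.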
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
[hch: don't try cached reporting again if not supported]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Damien Le Moal [Wed, 28 Jan 2026 04:32:58 +0000 (05:32 +0100)]
libfrog: lift common zone reporting code from mkfs and repair
Define the new helper function xfrog_report_zones() to report zones of
a zoned block device. This function is implemented in the new file
libfrog/zones.c and declared in the header file libfrog/zones.h. Use it
from mkfs and repair instead of the previous open coded versions.
xfrog_report_zones() allocates and returns a struct blk_zone_report,
which can be reused by subsequent invocations. It is the
responsibility of the caller to free this structure after use.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
[hch: refactored to allow buffer reuse]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Damien Le Moal [Wed, 28 Jan 2026 04:32:57 +0000 (05:32 +0100)]
mkfs: remove unnecessary return value affectation
The function report_zones() in mkfs/xfs_mkfs.c is a void function. So
there is no need to set the variable ret to -EIO before returning if
fstat() fails.
Fixes: 2e5a737a61d3 ("xfs_mkfs: support creating file system with zoned RT devices")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Modify xfs_mount_zones() to replace the call to blkdev_report_zones()
with blkdev_report_zones_cached() to speed up mount operations.
Since this causes xfs_zone_validate_seq() to see zones with the
BLK_ZONE_COND_ACTIVE condition, this function is also modified to accept
this condition as valid.
With this change, mounting a freshly formatted large capacity (30 TB)
SMR HDD completes in under 2s, compared to over 4.7s before.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
We'll need to conditionally add definitions added in later versions of
blkzoned.h soon. The right place for that is platform_defs.h, which
means blkzoned.h needs to be included there for cpp trickery to work.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Darrick J. Wong [Tue, 20 Jan 2026 17:51:51 +0000 (09:51 -0800)]
debian: don't explicitly reload systemd from postinst
Now that we use dh_installsystemd, it's no longer necessary to run
systemctl daemon-reload explicitly from postinst because
dh_installsystemd will inject that into the DEBHELPER section on its
own.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Tue, 20 Jan 2026 17:51:35 +0000 (09:51 -0800)]
xfs_mdrestore: fix restoration on filesystems with 4k sectors
Running xfs/129 on a disk with 4k LBAs produces the following failure:
--- /run/fstests/bin/tests/xfs/129.out 2025-07-15 14:41:40.210489431 -0700
+++ /run/fstests/logs/xfs/129.out.bad 2026-01-05 21:43:08.814485633 -0800
@@ -2,3 +2,8 @@ QA output created by 129
Create the original file blocks
Reflink every other block
Create metadump file, restore it and check restored fs
+xfs_mdrestore: Invalid superblock disk address/length
+mount: /opt: can't read superblock on /dev/loop0.
+ dmesg(1) may have more information after failed mount system call.
+mount /dev/loop0 /opt failed
+(see /run/fstests/logs/xfs/129.full for details)
This is a failure to restore a v2 metadump to /dev/loop0. Looking at
the metadump itself, the first xfs_meta_extent contains:
{
.xme_addr = 0,
.xme_len = 8,
}
Hrm. This is the primary superblock on the data device, with a length
of 8x512B = 4K. The original filesystem has this geometry:
In other words, a sector size of 4k because the device's LBA size is 4k.
Regrettably, the metadump validation in mdrestore assumes that the
primary superblock is only 512 bytes long, which is not correct for this
scenario.
Fix this by allowing an xme_len value of up to the maximum sector size
for xfs, which is 32k. Also remove a redundant and confusing mask check
for the xme_addr.
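A minimal sketch of the relaxed check, assuming the extent fields have already been converted to host byte order. The function name is hypothetical, and the exact conditions in mdrestore may differ:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BBSHIFT			9		/* xme_len counts 512-byte units */
#define MAX_SECTOR_BYTES	(32 * 1024)	/* largest xfs sector size */

/*
 * Hypothetical validator for the first extent of a v2 metadump, which
 * should describe the primary superblock at disk address 0.
 */
static bool sb_extent_ok(uint64_t xme_addr, uint32_t xme_len)
{
	if (xme_addr != 0)
		return false;
	if (xme_len == 0)
		return false;
	/* Accept any length up to one maximum-sized sector. */
	return ((uint64_t)xme_len << BBSHIFT) <= MAX_SECTOR_BYTES;
}
```

Under this check a length of 1 (512 bytes) and a length of 8 (4k) are both accepted, while anything above 64 units (32k) is rejected.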
Note that this error was masked (at least on little-endian platforms
that most of us test on) until recent commit 98f05de13e7815 ("mdrestore:
fix restore_v2() superblock length check") which is why I didn't spot it
earlier.
Cc: linux-xfs@vger.kernel.org # v6.6.0
Fixes: fa9f484b79123c ("mdrestore: Define mdrestore ops for v2 format")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Tue, 20 Jan 2026 17:51:19 +0000 (09:51 -0800)]
mkfs: quiet down warning about insufficient write zones
xfs/067 fails with the following weird mkfs message:
--- tests/xfs/067.out 2025-07-15 14:41:40.191273467 -0700
+++ /run/fstests/logs/xfs/067.out.bad 2026-01-06 16:59:11.907677987 -0800
@@ -1,4 +1,8 @@
QA output created by 067
+Warning: not enough zones (134/133) for backing requested rt size due to
+over-provisioning needs, writable size will be less than (null)
+Warning: not enough zones (134/133) for backing requested rt size due to
+over-provisioning needs, writable size will be less than (null)
In this case, MKFS_OPTIONS is set to: "-rrtdev=/dev/sdb4 -m
metadir=1,autofsck=1,uquota,gquota,pquota -d rtinherit=1 -r zoned=1
/dev/sda4"
In other words, we didn't pass an explicit rt volume size to mkfs, so
the message is a bit bogus. Let's skip printing the message when
the user did not provide an explicit rtsize parameter.
Cc: linux-xfs@vger.kernel.org # v6.18.0
Fixes: b5d372d96db1ad ("mkfs: adjust_nr_zones for zoned file system on conventional devices")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Tue, 20 Jan 2026 17:51:04 +0000 (09:51 -0800)]
xfs_logprint: print log data to the screen in host-endian order
Add a cli option so that users won't have to byteswap u32 values when
they're digging through broken logs on little-endian systems. Also make
it more obvious which column is the offset and which are the byte(s).
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Tue, 20 Jan 2026 17:50:48 +0000 (09:50 -0800)]
mkfs: set rtstart from user-specified dblocks
generic/211 fails to format the disk on a system with an internal zoned
device. Poking through the shell scripts, it's apparently doing this:
# mkfs.xfs -d size=629145600 -r size=629145600 -b size=4096 -m metadir=1,autofsck=1,uquota,gquota,pquota, -r zoned=1 -d rtinherit=1 /dev/sdd
size 629145600 specified for data subvolume is too large, maximum is 131072 blocks
Strange -- we asked for 629M data and rt sections, the device is 20GB in
size, but it claims insufficient space in the data subvolume.
Further analysis shows that open_devices is setting rtstart to 1% of the
size of the data volume (or no less than 300M) and rounding that up to
the nearest power of two (512M). Hence the 131072 number.
But wait, we said that we wanted a 629M data section. Let's set rtstart
to the same value if the user didn't already provide one, instead of
using the default value.
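The numbers above can be reproduced with a short sketch. The function names are hypothetical, 300M is taken to mean 300 MiB, and the real open_devices logic handles more cases than this:

```c
#include <assert.h>
#include <stdint.h>

static uint64_t roundup_pow2(uint64_t v)
{
	uint64_t p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/* Default described above: max(1% of the device, 300M), rounded up. */
static uint64_t default_rtstart_blocks(uint64_t dev_bytes, uint32_t blocksize)
{
	const uint64_t floor_bytes = 300ULL << 20;
	uint64_t bytes = dev_bytes / 100;

	assert(blocksize != 0);
	if (bytes < floor_bytes)
		bytes = floor_bytes;
	return roundup_pow2(bytes) / blocksize;
}

/* Sketch of the fix: prefer the user's data size when one was given. */
static uint64_t pick_rtstart_blocks(uint64_t user_data_bytes,
				    uint64_t dev_bytes, uint32_t blocksize)
{
	assert(blocksize != 0);
	if (user_data_bytes != 0)
		return user_data_bytes / blocksize;
	return default_rtstart_blocks(dev_bytes, blocksize);
}
```

For a 20GB device with 4096-byte blocks the default works out to 131072 blocks, whereas the requested 629145600-byte data section needs 153600 blocks.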
Cc: linux-xfs@vger.kernel.org # v6.15.0
Fixes: 2e5a737a61d34e ("xfs_mkfs: support creating file system with zoned RT devices")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
mkfs: adjust_nr_zones for zoned file system on conventional devices
When creating zoned file systems on conventional devices, mkfs doesn't
currently align the RT device size to the zone size, which can create
unmountable file systems. Fix this by moving the rgcount modification
to account for reserved zones and then calling adjust_nr_zones
unconditionally, thus ensuring that the rtblocks and rtextents values
are always a multiple of the zone size.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Dec 2025 20:57:38 +0000 (12:57 -0800)]
xfs_logprint: fix pointer bug
generic/055 captures a crash in xfs_logprint due to an incorrect
refactoring trying to increment a pointer-to-pointer whereas before it
incremented a pointer.
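The class of bug can be illustrated with a hypothetical helper; this is not the logprint code, only the shape of the mistake:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper that advances the caller's cursor by len bytes.
 * After a refactoring turns "char *ptr" into "char **ptr", writing
 * "ptr += len" would move the pointer-to-pointer itself and leave the
 * caller's cursor untouched. The cursor has to be dereferenced first.
 */
static void skip_bytes(char **ptr, size_t len)
{
	assert(ptr != NULL && *ptr != NULL);
	*ptr += len;
}
```

A caller holding `char *p = buf;` and calling `skip_bytes(&p, 4)` ends up with `p == buf + 4`.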
Fixes: 5a9b7e95140893 ("logprint: factor out a xlog_print_op helper")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
On big-endian architectures (e.g. s390x), restoring a filesystem from a
v2 metadump fails with "Invalid superblock disk address/length". This is
caused by restore_v2() treating a superblock extent length of 1 as an
error, even though a length of 1 is expected because the superblock fits
within a 512-byte sector.
On little-endian systems, the same raw extent length bytes that represent
a value of 1 on big-endian are misinterpreted as 16777216 due to byte
ordering, so the faulty check never triggers there and the bug is hidden.
Fix the issue by using an endian-correct comparison of xme_len so that
the superblock extent length is validated properly and consistently on
all architectures.
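The byte-order effect is easy to reproduce outside of mdrestore. The two decoders below are hypothetical helpers, not the xfsprogs conversion macros:

```c
#include <assert.h>
#include <stdint.h>

/* Decode four bytes as a big-endian 32-bit value on any host. */
static uint32_t get_be32(const unsigned char *b)
{
	assert(b);
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Decode the same bytes the way a little-endian host reads them raw. */
static uint32_t get_le32(const unsigned char *b)
{
	assert(b);
	return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) |
	       ((uint32_t)b[1] << 8) | (uint32_t)b[0];
}
```

The on-disk bytes 00 00 00 01 decode to 1 as big-endian but to 16777216 when read as little-endian, which is why the raw comparison behaved differently on the two kinds of host.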
Signed-off-by: Pavel Reichl <preichl@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chandan Babu R <chandanbabu@kernel.org>
repair: add canonical names for the XR_INO_ constants
Add an array with the canonical name for each inode type so that code
doesn't have to implement switch statements for that, and remove the now
trivial process_misc_ino_types and process_misc_ino_types_blocks
functions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Dec 2025 16:16:08 +0000 (08:16 -0800)]
mkfs: enable new features by default
Since the LTS is coming up, enable parent pointers and exchange-range by
default for all users. Also fix up an out of date comment.
I created a really stupid benchmarking script that does:
#!/bin/bash
# pptr overhead benchmark
umount /opt /mnt
rmmod xfs
for i in 1 0; do
	umount /opt
	mkfs.xfs -f /dev/sdb -n parent=$i | grep -i parent=
	mount /dev/sdb /opt
	mkdir -p /opt/foo
	for ((i=0;i<5;i++)); do
		time fsstress -n 100000 -p 4 -z -f creat=1 -d /opt/foo -s 1
	done
done
This is the result of creating an enormous number of empty files in a
single directory:
# ./dumb.sh
naming =version 2 bsize=4096 ascii-ci=0, ftype=1, parent=0
real 0m18.807s
user 0m2.169s
sys 0m54.013s
naming =version 2 bsize=4096 ascii-ci=0, ftype=1, parent=1
real 0m20.654s
user 0m2.374s
sys 1m4.441s
As you can see, there's a 10% increase in runtime here. If I make the
workload a bit more representative by changing the -f argument to
include a directory tree workout:
libfrog: fix incorrect FS_IOC_FSSETXATTR argument to ioctl()
xfsprogs 6.17.0 has broken project quota due to incorrect argument
passed to FS_IOC_FSSETXATTR ioctl(). Instead of passing struct fsxattr,
struct file_attr was passed.
# LC_ALL=C /usr/sbin/xfs_quota -x -c "project -s -p /home/xxx 389701" /home
Setting up project 389701 (path /home/xxx)...
xfs_quota: cannot set project on /home/xxx: Invalid argument
Processed 1 (/etc/projects and cmdline) paths for project 389701 with
recursion depth infinite (-1).
There seems to be a double mistake that hides the original ioctl()
argument bug when xfsprogs is built against and run on an old kernel.
The size of fa_xflags was also wrong in xfsprogs's linux.h header. As a
result, the bug only surfaces when xfsprogs is compiled against a newer
kernel but used with an older kernel.
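For reference, a sketch of the calling convention the ioctl expects: FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR both take a pointer to struct fsxattr from linux/fs.h. The helper below is hypothetical and is not the libfrog code:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

/*
 * Hypothetical helper: set the project ID of a file or directory.
 * Returns 0 on success, or -1 with errno set on failure.
 */
static int set_project_id(const char *path, uint32_t projid)
{
	struct fsxattr fsx;	/* the argument type the ioctl expects */
	int fd, ret, saved;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* Read the current attributes so the other fields are preserved. */
	ret = ioctl(fd, FS_IOC_FSGETXATTR, &fsx);
	if (ret == 0) {
		fsx.fsx_projid = projid;
		ret = ioctl(fd, FS_IOC_FSSETXATTR, &fsx);
	}

	saved = errno;		/* close() may overwrite errno */
	close(fd);
	errno = saved;
	return ret ? -1 : 0;
}
```

Passing a differently sized or differently laid out structure to the same ioctl is the kind of mismatch that produces the "Invalid argument" failure shown above.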
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arkadiusz Miśkiewicz <arekm@maven.pl>
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
When we are picking a zone for gc it might already be in the pipeline,
which can lead to us moving the same data twice. This results in write
amplification and a very unfortunate case where we keep on garbage
collecting the zone we just filled with migrated data, stopping all
forward progress.
Fix this by introducing a count of on-going GC operations on a zone, and
skip any zone with ongoing GC when picking a new victim.
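A minimal sketch of victim selection with such a counter. The struct, its fields, and the "fewest used blocks wins" policy are all assumptions made for illustration; the kernel's bucket-based selection is more involved:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-zone state. */
struct zone {
	uint32_t used_blocks;	/* live blocks still in the zone */
	uint32_t gc_inflight;	/* count of ongoing GC operations */
};

/*
 * Return the index of the zone with the fewest used blocks among the
 * zones that have no GC in flight, or -1 if every zone is busy.
 */
static int pick_gc_victim(const struct zone *zones, size_t nr)
{
	int best = -1;

	assert(zones != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		if (zones[i].gc_inflight != 0)
			continue;	/* already in the GC pipeline */
		if (best < 0 || zones[i].used_blocks < zones[best].used_blocks)
			best = (int)i;
	}
	return best;
}
```

Without the gc_inflight check, the emptiest zone would be picked again even while its data is still being migrated.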
Fixes: 080d01c41 ("xfs: implement zoned garbage collection")
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Co-developed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Tested-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
For regular block devices using the zoned allocator, the default
maximum number of open zones is set to 1/4 of the number of realtime
groups. For a large capacity device, this leads to a very large limit.
E.g. with a 26 TB HDD:
mount /dev/sdb /mnt
...
XFS (sdb): 95836 zones of 65536 blocks size (23959 max open)
In turn, such a large limit on the number of open zones can lead, depending
on the workload, to a very large number of concurrent write streams,
which devices generally do not handle well, leading to poor performance.
Introduce the default limit XFS_DEFAULT_MAX_OPEN_ZONES, defined as 128
to match the hardware limit of most SMR HDDs available today, and use
this limit to set mp->m_max_open_zones in xfs_calc_open_zones() instead
of calling xfs_max_open_zones(), when the user did not specify a limit
with the max_open_zones mount option.
For the 26 TB HDD example, we now get:
mount /dev/sdb /mnt
...
XFS (sdb): 95836 zones of 65536 blocks (128 max open zones)
This change does not prevent the user from specifying a larger number
for the open zones limit. E.g.
mount -o max_open_zones=4096 /dev/sdb /mnt
...
XFS (sdb): 95836 zones of 65536 blocks (4096 max open zones)
Finally, since xfs_calc_open_zones() checks and caps the
mp->m_max_open_zones limit against the value calculated by
xfs_max_open_zones() for any type of device, this new default limit does
not increase m_max_open_zones for small capacity devices.
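The resulting limit can be sketched as below. This simplifies xfs_max_open_zones() to "one quarter of the realtime groups" as described above and ignores any other bounds the kernel applies; the function name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define XFS_DEFAULT_MAX_OPEN_ZONES	128

/*
 * Hypothetical calculation: user_limit is the max_open_zones mount
 * option, or 0 if the user did not specify one.
 */
static uint32_t calc_open_zones(uint32_t rgcount, uint32_t user_limit)
{
	uint32_t cap = rgcount / 4;	/* simplified xfs_max_open_zones() */
	uint32_t limit = user_limit ? user_limit : XFS_DEFAULT_MAX_OPEN_ZONES;

	/* The cap still applies, so small devices never gain open zones. */
	return limit < cap ? limit : cap;
}
```

With 95836 zones this gives 128 by default and 4096 when requested explicitly, while a device with only 100 realtime groups stays at 25.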
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Don't pass expr to XFS_TEST_ERROR. Most calls pass a constant false,
and the places that do pass an expression become cleaner by moving it
out.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
[aalbersh: remove argument from a macro and fix call in defer_item.c]
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
These are purely in-memory values and not used at all in xfsprogs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
When mounting file systems with a log that was dirtied on i386 on
other architectures or vice versa, log recovery is unhappy:
[ 11.068052] XFS (vdb): Torn write (CRC failure) detected at log block 0x2. Truncating head block from 0xc.
This is because the CRCs generated by i386 and other architectures
always differ. The reason for that is that sizeof(struct xlog_rec_header)
returns different values for i386 vs the rest (324 vs 328), because the
struct is not sizeof(uint64_t) aligned, and i386 has odd struct size
alignment rules.
This issue goes back to commit 13cdc853c519 ("Add log versioning, and new
super block field for the log stripe") in the xfs-import tree, which
adds log v2 support and the h_size field that causes the unaligned size.
At that time it only mattered for the crude debug-only log header
checksum, but with commit 0e446be44806 ("xfs: add CRC checks to the log")
it became a real issue for v5 file systems, because now there is a proper
CRC, and regular builds actually expect it to match.
Fix this by allowing checksums with and without the padding.
Fixes: 0e446be44806 ("xfs: add CRC checks to the log")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
There are almost no users of the typedef left, kill it and switch the
remaining users to use the underlying struct.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
There are almost no users of the typedef left, so kill it and switch the
remaining users to the underlying struct.
Also fix up the comment about the struct xfs_extent definition to be
correct and read more easily.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
These sysctl knobs were scheduled for removal in September 2025. That
time has come, so remove them.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
These four mount options were scheduled for removal in September 2025,
so remove them now.
Cc: preichl@redhat.com
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Darrick J. Wong [Tue, 2 Dec 2025 01:28:00 +0000 (17:28 -0800)]
man2: fix getparents ioctl manpage
Fix a silly typo in the manual page for the GETPARENTS ioctl.
Cc: linux-xfs@vger.kernel.org # v6.10.0
Fixes: a24294c252d4a6 ("man: document the XFS_IOC_GETPARENTS ioctl")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Tue, 2 Dec 2025 01:27:29 +0000 (17:27 -0800)]
libxfs: fix build warnings
gcc 14.2 with all the warnings turned on complains about missing
prototypes for these two functions:
util.c:147:1: error: no previous prototype for 'current_fixed_time' [-Werror=missing-prototypes]
147 | current_fixed_time(
| ^~~~~~~~~~~~~~~~~~
util.c:590:1: error: no previous prototype for 'get_deterministic_seed' [-Werror=missing-prototypes]
590 | get_deterministic_seed(
| ^~~~~~~~~~~~~~~~~~~~~~
Since they're not used outside of util.c, just make them static.
Fixes: 4a54700b4385bb ("libxfs: support reproducible filesystems using deterministic time/seed")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
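For readers unfamiliar with -Wmissing-prototypes, here is a minimal sketch of the pattern behind the fix above. The function names `demo_fixed_seed` and `demo_next_value` are hypothetical stand-ins, not the actual util.c code; the point is only that a function used in a single file can be given internal linkage, while a function meant for external use needs a visible prototype.

```c
#include <stdint.h>

/*
 * Without "static", a build with -Wmissing-prototypes would warn here,
 * because no earlier declaration of the function exists.  Internal
 * linkage avoids the warning and marks the helper as file-local.
 */
static uint32_t
demo_fixed_seed(void)
{
	return 0x5eedU;	/* arbitrary constant, for illustration only */
}

/*
 * A function with external linkage needs a prior prototype instead;
 * normally this declaration would live in a header.
 */
uint32_t demo_next_value(uint32_t x);

uint32_t
demo_next_value(
	uint32_t	x)
{
	/* The only caller of the static helper is in this same file. */
	return x ^ demo_fixed_seed();
}
```

Making the helper static also lets the compiler flag it if it ever becomes unused.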
Re-indent, drop typedefs and invert a conditional to allow for an early
return.
Signed-off-by: Christoph Hellwig <hch@lst.de>
[aalbersh: add one column of tabs to arguments and vars definitions]
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@kernel.org>
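The "invert a conditional to allow for an early return" refactoring mentioned above can be sketched as follows. The helper `demo_sum_positive` is hypothetical and unrelated to the function the commit actually touches; it only illustrates how flipping the test removes one level of nesting.

```c
#include <stddef.h>

/* Hypothetical helper: sums the positive entries of an array. */
static long
demo_sum_positive(
	const int	*vals,
	size_t		count)
{
	long		sum = 0;
	size_t		i;

	/*
	 * A nested version would read "if (vals && count) { ...loop... }".
	 * Inverting the test lets the function bail out early and keeps
	 * the main loop at the top indentation level.
	 */
	if (!vals || !count)
		return 0;

	for (i = 0; i < count; i++) {
		if (vals[i] > 0)
			sum += vals[i];
	}
	return sum;
}
```

The behavior is unchanged by the inversion; only the control flow is flattened.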
Toward the caller. And reindent it while we're at it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
[aalbersh: add one tab column to arguments to make it align with in_f32]
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@kernel.org>