mkfs: adjust_nr_zones for zoned file system on conventional devices
When creating zoned file systems on conventional devices, mkfs doesn't
currently align the RT device size to the zone size, which can create
unmountable file systems. Fix this by moving the rgcount modification
to account for reserved zoned and then calling adjust_nr_zones
unconditionally, and thus ensuring that the rtblocks and rtextents values
are guaranteed to always be a multiple of the zone size.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Dec 2025 20:57:38 +0000 (12:57 -0800)]
xfs_logprint: fix pointer bug
generic/055 captures a crash in xfs_logprint due to an incorrect
refactoring trying to increment a pointer-to-pointer whereas before it
incremented a pointer.
Fixes: 5a9b7e95140893 ("logprint: factor out a xlog_print_op helper") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
On big-endian architectures (e.g. s390x), restoring a filesystem from a
v2 metadump fails with "Invalid superblock disk address/length". This is
caused by restore_v2() treating a superblock extent length of 1 as an
error, even though a length of 1 is expected because the superblock fits
within a 512-byte sector.
On little-endian systems, the same raw extent length bytes that represent
a value of 1 on big-endian are misinterpreted as 16777216 due to byte
ordering, so the faulty check never triggers there and the bug is hidden.
Fix the issue by using an endian-correct comparison of xme_len so that
the superblock extent length is validated properly and consistently on
all architectures.
Signed-off-by: Pavel Reichl <preichl@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandanbabu@kernel.org>
repair: add canonical names for the XR_INO_ constants
Add an array with the canonical name for each inode type so that code
doesn't have to implement switch statements for that, and remove the now
trivial process_misc_ino_types and process_misc_ino_types_blocks
functions.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Tue, 9 Dec 2025 16:16:08 +0000 (08:16 -0800)]
mkfs: enable new features by default
Since the LTS is coming up, enable parent pointers and exchange-range by
default for all users. Also fix up an out of date comment.
I created a really stupid benchmarking script that does:
#!/bin/bash
# pptr overhead benchmark
umount /opt /mnt
rmmod xfs
for i in 1 0; do
umount /opt
mkfs.xfs -f /dev/sdb -n parent=$i | grep -i parent=
mount /dev/sdb /opt
mkdir -p /opt/foo
for ((i=0;i<5;i++)); do
time fsstress -n 100000 -p 4 -z -f creat=1 -d /opt/foo -s 1
done
done
This is the result of creating an enormous number of empty files in a
single directory:
# ./dumb.sh
naming =version 2 bsize=4096 ascii-ci=0, ftype=1, parent=0
real 0m18.807s
user 0m2.169s
sys 0m54.013s
naming =version 2 bsize=4096 ascii-ci=0, ftype=1, parent=1
real 0m20.654s
user 0m2.374s
sys 1m4.441s
As you can see, there's a 10% increase in runtime here. If I make the
workload a bit more representative by changing the -f argument to
include a directory tree workout:
libfrog: fix incorrect FS_IOC_FSSETXATTR argument to ioctl()
xfsprogs 6.17.0 has broken project quota due to incorrect argument
passed to FS_IOC_FSSETXATTR ioctl(). Instead of passing struct fsxattr,
struct file_attr was passed.
# LC_ALL=C /usr/sbin/xfs_quota -x -c "project -s -p /home/xxx 389701" /home
Setting up project 389701 (path /home/xxx)...
xfs_quota: cannot set project on /home/xxx: Invalid argument
Processed 1 (/etc/projects and cmdline) paths for project 389701 with
recursion depth infinite (-1).
There seems to be a double mistake which hides the original ioctl()
argument bug on old kernel with xfsprogs built against it. The size of
fa_xflags was also wrong in xfsprogs's linux.h header. This way when
xfsprogs is compiled on newer kernel but used with older kernel this bug
uncovers.
Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Arkadiusz Miśkiewicz <arekm@maven.pl> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
When we are picking a zone for gc it might already be in the pipeline
which can lead to us moving the same data twice resulting in in write
amplification and a very unfortunate case where we keep on garbage
collecting the zone we just filled with migrated data stopping all
forward progress.
Fix this by introducing a count of on-going GC operations on a zone, and
skip any zone with ongoing GC when picking a new victim.
Fixes: 080d01c41 ("xfs: implement zoned garbage collection") Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Co-developed-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Tested-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
For regular block devices using the zoned allocator, the default
maximum number of open zones is set to 1/4 of the number of realtime
groups. For a large capacity device, this leads to a very large limit.
E.g. with a 26 TB HDD:
mount /dev/sdb /mnt
...
XFS (sdb): 95836 zones of 65536 blocks size (23959 max open)
In turn such large limit on the number of open zones can lead, depending
on the workload, on a very large number of concurrent write streams
which devices generally do not handle well, leading to poor performance.
Introduce the default limit XFS_DEFAULT_MAX_OPEN_ZONES, defined as 128
to match the hardware limit of most SMR HDDs available today, and use
this limit to set mp->m_max_open_zones in xfs_calc_open_zones() instead
of calling xfs_max_open_zones(), when the user did not specify a limit
with the max_open_zones mount option.
For the 26 TB HDD example, we now get:
mount /dev/sdb /mnt
...
XFS (sdb): 95836 zones of 65536 blocks (128 max open zones)
This change does not prevent the user from specifying a lareger number
for the open zones limit. E.g.
mount -o max_open_zones=4096 /dev/sdb /mnt
...
XFS (sdb): 95836 zones of 65536 blocks (4096 max open zones)
Finally, since xfs_calc_open_zones() checks and caps the
mp->m_max_open_zones limit against the value calculated by
xfs_max_open_zones() for any type of device, this new default limit does
not increase m_max_open_zones for small capacity devices.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Don't pass expr to XFS_TEST_ERROR. Most calls pass a constant false,
and the places that do pass an expression become cleaner by moving it
out.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
[aalbersh: remove argument from a macro and fix call in defer_item.c] Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
These are purely in-memory values and not used at all in xfsprogs.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
When mounting file systems with a log that was dirtied on i386 on
other architectures or vice versa, log recovery is unhappy:
[ 11.068052] XFS (vdb): Torn write (CRC failure) detected at log block 0x2. Truncating head block from 0xc.
This is because the CRCs generated by i386 and other architectures
always diff. The reason for that is that sizeof(struct xlog_rec_header)
returns different values for i386 vs the rest (324 vs 328), because the
struct is not sizeof(uint64_t) aligned, and i386 has odd struct size
alignment rules.
This issue goes back to commit 13cdc853c519 ("Add log versioning, and new
super block field for the log stripe") in the xfs-import tree, which
adds log v2 support and the h_size field that causes the unaligned size.
At that time it only mattered for the crude debug only log header
checksum, but with commit 0e446be44806 ("xfs: add CRC checks to the log")
it became a real issue for v5 file system, because now there is a proper
CRC, and regular builds actually expect it match.
Fix this by allowing checksums with and without the padding.
Fixes: 0e446be44806 ("xfs: add CRC checks to the log") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
There are almost no users of the typedef left, kill it and switch the
remaining users to use the underlying struct.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
There are almost no users of the typedef left, kill it and switch the
remaining users to use the underlying struct.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
There are almost no users of the typedef left, kill it and switch the
remaining users to use the underlying struct.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
There are almost no users of the typedef left, kill it and switch the
remaining users to use the underlying struct.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
There are almost no users of the typedef left, kill it and switch the
remaining users to use the underlying struct.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
There are almost no users of the typedef left, kill it and switch the
remaining users to use the underlying struct.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
There are almost no users of the typedef left, kill it and switch the
remaining users to use the underlying struct.
Also fix up the comment about the struct xfs_extent definition to be
correct and read more easily.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
There are almost no users of the typedef left, kill it and switch the
remaining users to use the underlying struct.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
There are almost no users of the typedef left, kill it and switch the
remaining users to use the underlying struct.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
These sysctl knobs were scheduled for removal in September 2025. That
time has come, so remove them.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
These four mount options were scheduled for removal in September 2025,
so remove them now.
Cc: preichl@redhat.com Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Darrick J. Wong [Tue, 2 Dec 2025 01:28:00 +0000 (17:28 -0800)]
man2: fix getparents ioctl manpage
Fix a silly typo in the manual page for the GETPARENTS ioctl.
Cc: linux-xfs@vger.kernel.org # v6.10.0 Fixes: a24294c252d4a6 ("man: document the XFS_IOC_GETPARENTS ioctl") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Tue, 2 Dec 2025 01:27:29 +0000 (17:27 -0800)]
libxfs: fix build warnings
gcc 14.2 with all the warnings turn on complains about missing
prototypes for these two functions:
util.c:147:1: error: no previous prototype for 'current_fixed_time' [-Werror=missing-prototypes]
147 | current_fixed_time(
| ^~~~~~~~~~~~~~~~~~
util.c:590:1: error: no previous prototype for 'get_deterministic_seed' [-Werror=missing-prototypes]
590 | get_deterministic_seed(
| ^~~~~~~~~~~~~~~~~~~~~~
Since they're not used outside of util.c, just make them static.
Fixes: 4a54700b4385bb ("libxfs: support reproducible filesystems using deterministic time/seed") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Re-indent, drop typedefs and invert a conditional to allow for an early
return.
Signed-off-by: Christoph Hellwig <hch@lst.de>
[aalbersh: add one column of tabs to arguments and vars definitions] Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org> Reviewed-by: Andrey Albershteyn <aalbersh@kernel.org>
Toward the caller. And reindent it while we're at it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
[aalbersh: add one tab column to arguments to make it align with in_f32] Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org> Reviewed-by: Andrey Albershteyn <aalbersh@kernel.org>
The code has been stubbed out since the initial creation of the
xfsprogs repository. Open code the single-line printf in the
data fork caller (attr forks can't contain directories) and remove
the dead code.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andrey Albershteyn <aalbersh@kernel.org>
Darrick J. Wong [Fri, 21 Nov 2025 16:39:37 +0000 (08:39 -0800)]
xfs_scrub: fix null pointer crash in scrub_render_ino_descr
Starting in Debian 13's libc6, passing a NULL format string to vsnprintf
causes the program to segfault. Prior to this, the null format string
would be ignored. Because @format is optional, let's explicitly steer
around the vsnprintf if there is no format string. Also tidy whitespace
in the comment.
Found by generic/45[34] on Debian 13.
Cc: linux-xfs@vger.kernel.org # v6.10.0 Fixes: 9a8b09762f9a52 ("xfs_scrub: use parent pointers when possible to report file operations") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Andrey Albershteyn <aalbersh@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Carlos Maiolino [Thu, 13 Nov 2025 13:57:11 +0000 (14:57 +0100)]
metadump: catch used extent array overflow
An user reported a SIGSEGV when attempting to create a metadump image of
a filesystem.
The reason is because we fail to catch a possible overflow in the
used extents array in process_exinode() which may happen if the extent
count is corrupted.
This leads process_bmbt_reclist() to attempt to index into the array
using the bogus extent count with:
convert_extent(&rp[numrecs - 1], &o, &s, &c, &f);
Fix this by extending the used counter to uint64_t and
checking for the overflow possibility.
Reported-by: hubert . <hubjin657@outlook.com> Suggested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Carlos Maiolino [Thu, 13 Nov 2025 13:46:13 +0000 (14:46 +0100)]
mkfs: fix zone capacity check for sequential zones
Sequential zones can have a different, smaller capacity than
conventional zones.
Currently mkfs assumes both sequential and conventional zones will have
the same capacity and and set the zone_info to the capacity of the first
found zone and use that value to validate all the remaining zones's
capacity.
Because conventional zones can't have a different capacity than its
size, the first zone always have the largest possible capacity, so, mkfs
will fail to validate any consecutive sequential zone if its capacity is
smaller than the conventional zones.
What we should do instead, is set the zone info capacity accordingly to
the settings of first zone found of the respective type and validate
the capacity based on that instead of assuming all zones will have the
same capacity.
Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
Luca Di Maio [Sat, 8 Nov 2025 14:39:53 +0000 (15:39 +0100)]
libxfs: support reproducible filesystems using deterministic time/seed
Add support for reproducible filesystem creation through two environment
variables that enable deterministic behavior when building XFS filesystems.
SOURCE_DATE_EPOCH support:
When SOURCE_DATE_EPOCH is set, use its value for all filesystem timestamps
instead of the current time. This follows the reproducible builds
specification (https://reproducible-builds.org/specs/source-date-epoch/)
and ensures consistent inode timestamps across builds.
DETERMINISTIC_SEED support:
When DETERMINISTIC_SEED=1 is set, return a fixed seed value (0x53454544 =
"SEED") from get_random_u32() instead of reading from /dev/urandom.
get_random_u32() seems to be used mostly to set inode generation number, being
fixed should not be create collision issues at mkfs time.
The implementation introduces two helper functions to minimize changes
to existing code:
- current_fixed_time(): Parses and caches SOURCE_DATE_EPOCH on first
call. Returns fixed timestamp when set, falls back to gettimeofday() on
parse errors or when unset.
- get_deterministic_seed(): Checks for DETERMINISTIC_SEED=1 environment
variable on first call, and returns a fixed seed value (0x53454544).
Falls back to getrandom() when unset.
- Both helpers use one-time initialization to avoid repeated getenv() calls.
- Both quickly exit and noop if environment is not set or has invalid
variables, falling back to original behaviour.
This enables distributions and build systems to create bit-for-bit
identical XFS filesystems when needed for verification and debugging.
v1 -> v2:
- simplify deterministic seed by returning a fixed value instead
of using Middle Square Weyl Sequence PRNG
- fix timestamp type time_t -> time64_t
- fix timestamp initialization flag to allow negative epochs
- fix timestamp conversion type using strtoll
- fix timestamp conversion check to be sure the whole string was parsed
- print warning message when SOURCE_DATE_EPOCH is invalid
Signed-off-by: Luca Di Maio <luca.dimaio1@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Right now 5 places in the kernel and one in xfsprogs need to be updated
for each new error tag. Add a bit of macro magic so that only the
error tag definition and a single table, which reside next to each
other, need to be updated.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Chandan Babu R [Tue, 4 Nov 2025 09:14:37 +0000 (14:44 +0530)]
repair/prefetch.c: Create one workqueue with multiple workers
When xfs_repair is executed with a non-zero value for ag_stride,
do_inode_prefetch() create multiple workqueues with each of them having just
one worker thread.
Since commit 12838bda12e669 ("libfrog: fix overly sleep workqueues"), a
workqueue can process multiple work items concurrently. Hence, this commit
replaces the above logic with just one workqueue having multiple workers.
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Chandan Babu R [Tue, 4 Nov 2025 09:14:36 +0000 (14:44 +0530)]
libfrog: Prevent unnecessary waking of worker thread when using bounded workqueues
When woken up, a worker will pick a work item from the workqueue and wake up
another worker when the current workqueue is a bounded workqueue and if there
is atleast one more work item remains to be processed.
The commit 12838bda12e669 ("libfrog: fix overly sleep workqueues") prevented
single-threaded processing of work items by waking up sleeping workers when a
work item is added to the workqueue. Hence the earlier described mechanism of
waking workers is no longer required.
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Luca Di Maio <luca.dimaio1@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Fixes: 8a4ea72724930c ("proto: add ability to populate a filesystem from a directory")
Zone reset is a mandatory part of creating a file system on a zoned
device, unlike discard, which can be skipped. It also is implemented
a bit different, so just split the handling. This also means that we
can now support the -K option to skip discard on the data section for
zoned file systems.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Mon, 13 Oct 2025 23:34:24 +0000 (16:34 -0700)]
xfs_scrub_fail: reduce security lockdowns to avoid postfix problems
Iustin Pop reports that the xfs_scrub_fail service fails to email
problem reports on Debian when postfix is installed. This is apparently
due to several factors:
1. postfix's sendmail wrapper calling postdrop directly,
2. postdrop requiring the ability to write to the postdrop group,
3. lockdown preventing the xfs_scrub_fail@ service to have postdrop in
the supplemental group list or the ability to run setgid programs
Item (3) could be solved by adding the whole service to the postdrop
group via SupplementalGroups=, but that will fail if postfix is not
installed and hence there is no postdrop group.
It could also be solved by forcing msmtp to be installed, bind mounting
msmtp into the service container, and injecting a config file that
instructs msmtp to connect to port 25, but that in turn isn't compatible
with systems not configured to allow an smtp server to listen on ::1.
So we'll go with the less restrictive approach that e2scrub_fail@ does,
which is to say that we just turn off all the sandboxing. :( :(
Reported-by: iustin@debian.org Cc: linux-xfs@vger.kernel.org # v6.10.0 Fixes: 9042fcc08eed6a ("xfs_scrub_fail: tighten up the security on the background systemd service") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Andrey Albershteyn <aalbersh@kernel.org>
ENODATA (aka ENOATTR) has a very specific meaning in the xfs xattr code;
namely, that the requested attribute name could not be found.
However, a medium error from disk may also return ENODATA. At best,
this medium error may escape to userspace as "attribute not found"
when in fact it's an IO (disk) error.
At worst, we may oops in xfs_attr_leaf_get() when we do:
because an ENODATA/ENOATTR error from disk leaves us with a null bp,
and the xfs_trans_brelse will then null-deref it.
As discussed on the list, we really need to modify the lower level
IO functions to trap all disk errors and ensure that we don't let
unique errors like this leak up into higher xfs functions - many
like this should be remapped to EIO.
However, this patch directly addresses a reported bug in the xattr
code, and should be safe to backport to stable kernels. A larger-scope
patch to handle more unique errors at lower levels can follow later.
(Note, prior to 07120f1abdff we did not oops, but we did return the
wrong error code to userspace.)
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Fixes: 07120f1abdff ("xfs: Add xfs_has_attr and subroutines") Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Replace the deprecated strncpy() with memtostr_pad(). This also avoids
the need for separate zeroing using memset(). Mark sb_fname buffer with
__nonstring as its size is XFSLABEL_MAX and so no terminating NULL for
sb_fname.
Signed-off-by: Pranav Tyagi <pranav.tyagi03@gmail.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Split up the XFS_IS_CORRUPT statement so that it immediately shows
if the reference counter overflowed or underflowed.
I ran into this quite a bit when developing the zoned allocator, and had
to reapply the patch for some work recently. We might as well just apply
it upstream given that freeing group is far removed from performance
critical code.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
xfs_trans_alloc_empty can't return errors, so return the allocated
transaction directly instead of an output double pointer argument.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Use cmp_int() to yield the result of a three-way-comparison instead of
performing subtractions with extra casts. Thus also rename the function
to make its name clearer in purpose.
Found by Linux Verification Center (linuxtesting.org).
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Perhaps that's just my silly imagination but 'diff' doesn't look good for
the name of a variable to hold a result of a three-way-comparison
(-1, 0, 1) which is what ->cmp_key_with_cur() does. It implies to contain
an actual difference between the two integer variables but that's not true
anymore after recent refactoring.
Declaring it as int64_t is also misleading now. Plain integer type is
more than enough.
Found by Linux Verification Center (linuxtesting.org).
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
The net value of these functions is to determine the result of a
three-way-comparison between operands of the same type.
Simplify the code using cmp_int() to eliminate potential errors with
opencoded casts and subtractions. This also means we can change the return
value type of cmp_key_with_cur routines from int64_t to int and make the
interface a bit clearer.
Found by Linux Verification Center (linuxtesting.org).
Suggested-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
The net value of these functions is to determine the result of a
three-way-comparison between operands of the same type.
Simplify the code using cmp_int() to eliminate potential errors with
opencoded casts and subtractions. This also means we can change the return
value type of cmp_two_keys routines from int64_t to int and make the
interface a bit clearer.
Found by Linux Verification Center (linuxtesting.org).
Suggested-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>