Add an errortag mount option that enables an errortag with the default
injection frequency. This allows injecting errors into the mount
process instead of just on live file systems, and thus test mount
error handling.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
There is no synchronization for updating m_errortag, which is fine as
it's just a debug tool. It would still be nice to fully avoid the
theoretical case of torn values, so use WRITE_ONCE and READ_ONCE to
access the members.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
xfs: move the guts of XFS_ERRORTAG_DELAY out of line
Mirror what is done for the more common XFS_ERRORTAG_TEST version,
and also only look at the error tag value once now that we can
easily have a local variable.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
We can trust XFS developers enough to not pass random stuff to
XFS_ERROR_TEST/DELAY. Open code the validity check in xfs_errortag_add,
which is the only place that receives unvalidated error tag values from
user space, and drop the now pointless xfs_errortag_enabled helper.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Ensure the mount structure always has a valid m_errortag for debug
builds. This removes the NULL checking from the runtime code, and
prepares for allowing to set errortags from mount.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
xfs: fix the errno sign for the xfs_errortag_{add,clearall} stubs
All errno values should be negative in the kernel.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
xfs: validate log record version against superblock log version
Syzbot creates a fuzzed record where xfs_has_logv2() but the
xlog_rec_header h_version != XLOG_VERSION_2. This causes a
KASAN: slab-out-of-bounds read in xlog_do_recovery_pass() ->
xlog_recover_process() -> xlog_cksum().
Fix by adding a check to xlog_valid_rec_header() to abort journal
recovery if the xlog_rec_header h_version does not match the super
block log version.
A file system with a version 2 log will only ever set
XLOG_VERSION_2 in its headers (and v1 will only ever set V_1), so if
there is any mismatch, either the journal or the superblock has been
corrupted and therefore we abort processing with a -EFSCORRUPTED error
immediately.
Also, refactor the structure of the validity checks for better
readability. At the default error level (LOW), XFS_IS_CORRUPT() emits
the condition that failed, the file and line number it is
located at, then dumps the stack. This gives us everything we need
to know about the failure if we do a single validity check per
XFS_IS_CORRUPT().
Reported-by: syzbot+9f6d080dece587cfdd4c@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9f6d080dece587cfdd4c Tested-by: syzbot+9f6d080dece587cfdd4c@syzkaller.appspotmail.com Fixes: 45cf976008dd ("xfs: fix log recovery buffer allocation for the legacy h_size fixup") Signed-off-by: Raphael Pinsonneault-Thibeault <rpthibeault@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
xfs_zone_gc_space_available only has one caller left, so fold it into
that. Reorder the checks so that the cheaper scratch_available check
is done first.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
xfs: use a seprate member to track space availabe in the GC scatch buffer
When scratch_head wraps back to 0 and scratch_tail is also 0 because no
I/O has completed yet, the ring buffer could be mistaken for empty.
Fix this by introducing a separate scratch_available member in
struct xfs_zone_gc_data. This actually ends up simplifying the code as
well.
Reported-by: Chris Mason <clm@meta.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Carlos Maiolino [Wed, 28 Jan 2026 09:16:12 +0000 (10:16 +0100)]
Merge tag 'attr-pptr-speedup-7.0_2026-01-25' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-7.0-merge
xfs: improve shortform attr performance [2/3]
Improve performance of the xattr (and parent pointer) code when the
attr structure is in short format and we can therefore perform all
updates in a single transaction. Avoiding the attr intent code brings
a very nice speedup in those operations.
With a bit of luck, this should all go splendidly.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Carlos Maiolino [Wed, 28 Jan 2026 09:15:07 +0000 (10:15 +0100)]
Merge tag 'attr-leaf-freemap-fixes-7.0_2026-01-25' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-7.0-merge
xfs: fix problems in the attr leaf freemap code [1/3]
Running generic/753 for hours revealed data corruption problems in the
attr leaf block space management code. Under certain circumstances,
freemap entries are left with zero size but a nonzero offset. If that
offset happens to be the same offset as the end of the entries array
during an attr set operation, the leaf entry table expansion will push
the freemap record offset upwards without checking for overlap with any
other freemap entries. If there happened to be a second freemap entry
overlapping with the newly allocated leaf entry space, then the next
attr set operation might find that space and overwrite the leaf entry,
thereby corrupting the leaf block.
Fix this by zeroing the freemap offset any time we set the size to zero.
If a subsequent attr set operation finds no space in the freemap, it
will compact the block and regenerate the freemaps.
With a bit of luck, this should all go splendidly.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Carlos Maiolino [Wed, 28 Jan 2026 08:47:42 +0000 (09:47 +0100)]
Merge tag 'health-monitoring-7.0_2026-01-20' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-7.0-merge
xfs: autonomous self healing of filesystems [v7]
This patchset builds new functionality to deliver live information about
filesystem health events to userspace. This is done by creating an
anonymous file that can be read() for events by userspace programs.
Events are captured by hooking various parts of XFS and iomap so that
metadata health failures, file I/O errors, and major changes in
filesystem state (unmounts, shutdowns, etc.) can be observed by
programs.
When an event occurs, the hook functions queue an event object to each
event anonfd for later processing. Programs must have CAP_SYS_ADMIN
to open the anonfd and there's a maximum event lag to prevent resource
overconsumption. The events themselves can be read() from the anonfd
as C structs for the xfs_healer daemon.
In userspace, we create a new daemon program that will read the event
objects and initiate repairs automatically. This daemon is managed
entirely by systemd and will not block unmounting of the filesystem
unless repairs are ongoing. They are auto-started by a starter
service that uses fanotify.
This patchset depends on the new fserror code that Christian Brauner
has tentatively accepted for Linux 7.0:
https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=vfs-7.0.fserror
v7: more cleanups of the media verification ioctl, improve comments, and
reuse the bio
v6: fix pi-breaking bugs, make verify failures trigger health reports
and filter bio status flags better
v5: add verify-media ioctl, collapse small helper funcs with only
one caller
v4: drop multiple client support so we can make direct calls into
healthmon instead of chasing pointers and doing indirect calls
v3: drag out of rfc status
With a bit of luck, this should all go splendidly.
Conflicts:
This merge required an update on files:
- fs/xfs/xfs_healthmon.c
- fs/xfs/xfs_verify_media.c
Such change was required because a parallel developement changed
XFS header file xfs.h naming to xfs_platform.h, so the merge
required to update those includes in both files above
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Darrick J. Wong [Fri, 23 Jan 2026 17:27:40 +0000 (09:27 -0800)]
xfs: check for deleted cursors when revalidating two btrees
The free space and inode btree repair functions will rebuild both btrees
at the same time, after which it needs to evaluate both btrees to
confirm that the corruptions are gone.
However, Jiaming Zhang ran syzbot and produced a crash in the second
xchk_allocbt call. His root-cause analysis is as follows (with minor
corrections):
In xrep_revalidate_allocbt(), xchk_allocbt() is called twice (first
for BNOBT, second for CNTBT). The cause of this issue is that the
first call nullified the cursor required by the second call.
Let's first enter xrep_revalidate_allocbt() via following call chain:
xchk_allocbt() is called twice in this function. In the first call:
/* Note that sc->sm->sm_type is XFS_SCRUB_TYPE_BNOPT now */
xchk_allocbt() ->
xchk_btree() ->
`bs->scrub_rec(bs, recp)` ->
xchk_allocbt_rec() ->
xchk_allocbt_xref() ->
xchk_allocbt_xref_other()
since sm_type is XFS_SCRUB_TYPE_BNOBT, pur is set to &sc->sa.cnt_cur.
Kernel called xfs_alloc_get_rec() and returned -EFSCORRUPTED. Call
chain:
xfs_alloc_get_rec() ->
xfs_btree_get_rec() ->
xfs_btree_check_block() ->
(XFS_IS_CORRUPT || XFS_TEST_ERROR), the former is false and the latter
is true, return -EFSCORRUPTED. This should be caused by
ioctl$XFS_IOC_ERROR_INJECTION I guess.
Back to xchk_allocbt_xref_other(), after receiving -EFSCORRUPTED from
xfs_alloc_get_rec(), kernel called xchk_should_check_xref(). In this
function, *curpp (points to sc->sa.cnt_cur) is nullified.
Back to xrep_revalidate_allocbt(), since sc->sa.cnt_cur has been
nullified, it then triggered null-ptr-deref via xchk_allocbt() (second
call) -> xchk_btree().
So. The bnobt revalidation failed on a cross-reference attempt, so we
deleted the cntbt cursor, and then crashed when we tried to revalidate
the cntbt. Therefore, check for a null cntbt cursor before that
revalidation, and mark the repair incomplete. Also we can ignore the
second tree entirely if the first tree was rebuilt but is already
corrupt.
Apply the same fix to xrep_revalidate_iallocbt because it has the same
problem.
Darrick J. Wong [Fri, 23 Jan 2026 17:27:39 +0000 (09:27 -0800)]
xfs: fix UAF in xchk_btree_check_block_owner
We cannot dereference bs->cur when trying to determine if bs->cur
aliases bs->sc->sa.{bno,rmap}_cur after the latter has been freed.
Fix this by sampling before type before any freeing could happen.
The correct temporal ordering was broken when we removed xfs_btnum_t.
Darrick J. Wong [Fri, 23 Jan 2026 17:27:38 +0000 (09:27 -0800)]
xfs: check return value of xchk_scrub_create_subord
Fix this function to return NULL instead of a mangled ENOMEM, then fix
the callers to actually check for a null pointer and return ENOMEM.
Most of the corrections here are for code merged between 6.2 and 6.10.
Cc: r772577952@gmail.com Cc: <stable@vger.kernel.org> # v6.12 Fixes: 1a5f6e08d4e379 ("xfs: create subordinate scrub contexts for xchk_metadata_inode_subtype") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Jiaming Zhang <r772577952@gmail.com>
Darrick J. Wong [Fri, 23 Jan 2026 17:27:37 +0000 (09:27 -0800)]
xfs: only call xf{array,blob}_destroy if we have a valid pointer
Only call the xfarray and xfblob destructor if we have a valid pointer,
and be sure to null out that pointer afterwards. Note that this patch
fixes a large number of commits, most of which were merged between 6.9
and 6.10.
Darrick J. Wong [Fri, 23 Jan 2026 17:27:37 +0000 (09:27 -0800)]
xfs: get rid of the xchk_xfile_*_descr calls
The xchk_xfile_*_descr macros call kasprintf, which can fail to allocate
memory if the formatted string is larger than 16 bytes (or whatever the
nofail guarantees are nowadays). Some of them could easily exceed that,
and Jiaming Zhang found a few places where that can happen with syzbot.
The descriptions are debugging aids and aren't required to be unique, so
let's just pass in static strings and eliminate this path to failure.
Note this patch touches a number of commits, most of which were merged
between 6.6 and 6.14.
Darrick J. Wong [Fri, 23 Jan 2026 17:27:36 +0000 (09:27 -0800)]
xfs: add a method to replace shortform attrs
If we're trying to replace an xattr in a shortform attr structure and
the old entry fits the new entry, we can just memcpy and exit without
having to delete, compact, and re-add the entry (or worse use the attr
intent machinery). For parent pointers this only advantages renaming
where the filename length stays the same (e.g. mv autoexec.bat
scandisk.exe) but for regular xattrs it might be useful for updating
security labels and the like.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Fri, 23 Jan 2026 17:27:35 +0000 (09:27 -0800)]
xfs: speed up parent pointer operations when possible
After a recent fsmark benchmarking run, I observed that the overhead of
parent pointers on file creation and deletion can be a bit high. On a
machine with 20 CPUs, 128G of memory, and an NVME SSD capable of pushing
750000iops, I see the following results:
So we created 40 AGs, one per CPU. Now we create 40 directories and run
fsmark:
$ time fs_mark -D 10000 -S 0 -n 100000 -s 0 -L 8 -d ...
# Version 3.3, 40 thread(s) starting at Wed Dec 10 14:22:07 2025
# Sync method: NO SYNC: Test does not issue sync() or fsync() calls.
# Directories: Time based hash between directories across 10000 subdirectories with 180 seconds per subdirectory.
# File names: 40 bytes long, (16 initial bytes of time stamp with 24 random bytes at end of name)
# Files info: size 0 bytes, written with an IO size of 16384 bytes per write
# App overhead is time in microseconds spent in the test not doing file writing related system calls.
parent=0 parent=1
================== ==================
real 0m57.573s real 1m2.934s
user 3m53.578s user 3m53.508s
sys 19m44.440s sys 25m14.810s
$ time rm -rf ...
parent=0 parent=1
================== ==================
real 0m59.649s real 1m12.505s
user 0m41.196s user 0m47.489s
sys 13m9.566s sys 20m33.844s
Parent pointers increase the system time by 28% overhead to create 32
million files that are totally empty. Removing them incurs a system
time increase of 56%. Wall time increases by 9% and 22%.
For most filesystems, each file tends to have a single owner and not
that many xattrs. If the xattr structure is shortform, then all xattr
changes are logged with the inode and do not require the the xattr
intent mechanism to persist the parent pointer.
Therefore, we can speed up parent pointer operations by calling the
shortform xattr functions directly if the child's xattr is in short
format. Now the overhead looks like:
parent=0 parent=1
================== ==================
real 0m58.030s real 1m0.983s
user 3m54.141s user 3m53.758s
sys 19m57.003s sys 21m30.605s
$ time rm -rf ...
parent=0 parent=1
================== ==================
real 0m58.911s real 1m4.420s
user 0m41.329s user 0m45.169s
sys 13m27.857s sys 15m58.564s
Now parent pointers only increase the system time by 8% for creation and
19% for deletion. Wall time increases by 5% and 9% now.
Close the performance gap by creating helpers for the attr set, remove,
and replace operations that will try to make direct shortform updates,
and fall back to the attr intent machinery if that doesn't work. This
works for regular xattrs and for parent pointers.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Fri, 23 Jan 2026 17:27:33 +0000 (09:27 -0800)]
xfs: fix remote xattr valuelblk check
In debugging other problems with generic/753, it turns out that it's
possible for the system go to down in the middle of a remote xattr set
operation such that the leaf block entry is marked incomplete and
valueblk is set to zero. Make this no longer a failure.
Darrick J. Wong [Fri, 23 Jan 2026 17:27:33 +0000 (09:27 -0800)]
xfs: fix the xattr scrub to detect freemap/entries array collisions
In the previous patches, we observed that it's possible for there to be
freemap entries with zero size but a nonzero base. This isn't an
inconsistency per se, but older kernels can get confused by this and
corrupt the block, leading to corruption.
If we see this, flag the xattr structure for optimization so that it
gets rebuilt.
Darrick J. Wong [Fri, 23 Jan 2026 17:27:32 +0000 (09:27 -0800)]
xfs: strengthen attr leaf block freemap checking
Check for erroneous overlapping freemap regions and collisions between
freemap regions and the xattr leaf entry array.
Note that we must explicitly zero out the extra freemaps in
xfs_attr3_leaf_compact so that the in-memory buffer has a correctly
initialized freemap array to satisfy the new verification code, even if
subsequent code changes the contents before unlocking the buffer.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Upon enabling quite a lot more debugging code, I narrowed this down to
fsstress trying to set a local extended attribute with namelen=3 and
valuelen=71. This results in an entry size of 80 bytes.
At the start of xfs_attr3_leaf_add_work, the freemap looks like this:
i 0 base 448 size 0 rhs 448 count 46
i 1 base 388 size 132 rhs 448 count 46
i 2 base 2120 size 4 rhs 448 count 46
firstused = 520
where "rhs" is the first byte past the end of the leaf entry array.
This is inconsistent -- the entries array ends at byte 448, but
freemap[1] says there's free space starting at byte 388!
By the end of the function, the freemap is in worse shape:
i 0 base 456 size 0 rhs 456 count 47
i 1 base 388 size 52 rhs 456 count 47
i 2 base 2120 size 4 rhs 456 count 47
firstused = 440
Important note: 388 is not aligned with the entries array element size
of 8 bytes.
Based on the incorrect freemap, the name area starts at byte 440, which
is below the end of the entries array! That's why the assertion
triggers and the filesystem shuts down.
How did we end up here? First, recall from the previous patch that the
freemap array in an xattr leaf block is not intended to be a
comprehensive map of all free space in the leaf block. In other words,
it's perfectly legal to have a leaf block with:
* 376 bytes in use by the entries array
* freemap[0] has [base = 376, size = 8]
* freemap[1] has [base = 388, size = 1500]
* the space between 376 and 388 is free, but the freemap stopped
tracking that some time ago
If we add one xattr, the entries array grows to 384 bytes, and
freemap[0] becomes [base = 384, size = 0]. So far, so good. But if we
add a second xattr, the entries array grows to 392 bytes, and freemap[0]
gets pushed up to [base = 392, size = 0]. This is bad, because
freemap[1] hasn't been updated, and now the entries array and the free
space claim the same space.
The fix here is to adjust all freemap entries so that none of them
collide with the entries array. Note that this fix relies on commit 2a2b5932db6758 ("xfs: fix attr leaf header freemap.size underflow") and
the previous patch that resets zero length freemap entries to have
base = 0.
Cc: <stable@vger.kernel.org> # v2.6.12 Fixes: 1da177e4c3f415 ("Linux-2.6.12-rc2") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Fri, 23 Jan 2026 17:27:30 +0000 (09:27 -0800)]
xfs: delete attr leaf freemap entries when empty
Back in commit 2a2b5932db6758 ("xfs: fix attr leaf header freemap.size
underflow"), Brian Foster observed that it's possible for a small
freemap at the end of the end of the xattr entries array to experience
a size underflow when subtracting the space consumed by an expansion of
the entries array. There are only three freemap entries, which means
that it is not a complete index of all free space in the leaf block.
This code can leave behind a zero-length freemap entry with a nonzero
base. Subsequent setxattr operations can increase the base up to the
point that it overlaps with another freemap entry. This isn't in and of
itself a problem because the code in _leaf_add that finds free space
ignores any freemap entry with zero size.
However, there's another bug in the freemap update code in _leaf_add,
which is that it fails to update a freemap entry that begins midway
through the xattr entry that was just appended to the array. That can
result in the freemap containing two entries with the same base but
different sizes (0 for the "pushed-up" entry, nonzero for the entry
that's actually tracking free space). A subsequent _leaf_add can then
allocate xattr namevalue entries on top of the entries array, leading to
data loss. But fixing that is for later.
For now, eliminate the possibility of confusion by zeroing out the base
of any freemap entry that has zero size. Because the freemap is not
intended to be a complete index of free space, a subsequent failure to
find any free space for a new xattr will trigger block compaction, which
regenerates the freemap.
It looks like this bug has been in the codebase for quite a long time.
Cc: <stable@vger.kernel.org> # v2.6.12 Fixes: 1da177e4c3f415 ("Linux-2.6.12-rc2") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
The code passes 0 to memalloc_nofs_restore() when committing the original
transaction, but memalloc_nofs_restore() should always receive the
flags returned from the paired memalloc_nofs_save() call.
Before commit 3f6d5e6a468d ("mm: introduce memalloc_flags_{save,restore}"),
calling memalloc_nofs_restore(0) would unset the PF_MEMALLOC_NOFS flag,
which could cause memory allocation deadlocks[1].
Fortunately, after that commit, memalloc_nofs_restore(0) does nothing,
so this issue is currently harmless.
Hans Holmberg [Tue, 20 Jan 2026 08:57:46 +0000 (09:57 +0100)]
xfs: always allocate the free zone with the lowest index
Zones in the beginning of the address space are typically mapped to
higer bandwidth tracks on HDDs than those at the end of the address
space. So, in stead of allocating zones "round robin" across the whole
address space, always allocate the zone with the lowest index.
This increases average write bandwidth for overwrite workloads
when less than the full capacity is being used. At ~50% utilization
this improves bandwidth for a random file overwrite benchmark
with 128MiB files and 256MiB zone capacity by 30%.
Running the same benchmark with small 2-8 MiB files at 67% capacity
shows no significant difference in performance. Due to heavy
fragmentation the whole zone range is in use, greatly limiting the
number of free zones with high bw.
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Darrick J. Wong [Wed, 21 Jan 2026 06:45:40 +0000 (22:45 -0800)]
xfs: promote metadata directories and large block support
Large block support was merged upstream in 6.12 (Dec 2024) and metadata
directories was merged in 6.13 (Jan 2025). We've not received any
serious complaints about the ondisk formats of these two features in the
past year, so let's remove the experimental warnings.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
xfs: use blkdev_get_zone_info to simplify zone reporting
Unwind the callback based programming model by querying the cached
zone information using blkdev_get_zone_info.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
xfs: check that used blocks are smaller than the write pointer
Any used block must have been written, this reject used blocks > write
pointer.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Currently xfs_zone_validate mixes validating the software zone state in
the XFS realtime group with validating the hardware state reported in
struct blk_zone and deriving the write pointer from that.
Move all code that works on the realtime group to xfs_init_zone, and only
keep the hardware state validation in xfs_zone_validate. This makes the
code more clear, and allows for better reuse in userspace.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Move the two methods to query the write pointer out of xfs_init_zone into
the callers, so that xfs_init_zone doesn't have to bother with the
blk_zone structure and instead operates purely at the XFS realtime group
level.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Add a helper to figure the on-disk size of a group, accounting for the
XFS_SB_FEAT_INCOMPAT_ZONE_GAPS feature if needed.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Damien Le Moal [Wed, 14 Jan 2026 06:53:24 +0000 (07:53 +0100)]
xfs: add missing forward declaration in xfs_zones.h
Add the missing forward declaration for struct blk_zone in xfs_zones.h.
This avoids headaches with the order of header file inclusion to avoid
compilation errors.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
The calling convention of xfs_attr_leaf_hasname() is problematic, because
it returns a NULL buffer when xfs_attr3_leaf_read fails, a valid buffer
when xfs_attr3_leaf_lookup_int returns -ENOATTR or -EEXIST, and a
non-NULL buffer pointer for an already released buffer when
xfs_attr3_leaf_lookup_int fails with other error values.
Fix this by simply open coding xfs_attr_leaf_hasname in the callers, so
that the buffer release code is done by each caller of
xfs_attr3_leaf_read.
Cc: stable@vger.kernel.org # v5.19+ Fixes: 07120f1abdff ("xfs: Add xfs_has_attr and subroutines") Reported-by: Mark Tinguely <mark.tinguely@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Darrick J. Wong [Fri, 19 Dec 2025 02:40:50 +0000 (18:40 -0800)]
xfs: mark data structures corrupt on EIO and ENODATA
I learned a few things this year: first, blk_status_to_errno can return
ENODATA for critical media errors; and second, the scrub code doesn't
mark data structures as corrupt on ENODATA or EIO.
Currently, scrub failing to capture these errors isn't all that
impactful -- the checking code will exit to userspace with EIO/ENODATA,
and xfs_scrub will log a complaint and exit with nonzero status. Most
people treat fsck tools failing as a sign that the fs is corrupt, but
online fsck should mark the metadata bad and keep moving.
Cc: stable@vger.kernel.org # v4.15 Fixes: 4700d22980d459 ("xfs: create helpers to record and deal with scrub problems") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
The double buffering where just one scratch area is used at a time does
not efficiently use the available memory. It was originally implemented
when GC I/O could happen out of order, but that was removed before
upstream submission to avoid fragmentation. Now that all GC I/Os are
processed in order, just use a number of buffers as a simple ring buffer.
For a synthetic benchmark that fills 256MiB HDD zones and punches out
holes to free half the space this leads to a decrease of GC time by
a little more than 25%.
Thanks to Hans Holmberg <hans.holmberg@wdc.com> for testing and
benchmarking.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Replace our somewhat fragile code to reuse the bio, which caused a
regression in the past with the block layer bio_reuse helper.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Add a helper to allow an existing bio to be resubmitted without
having to re-add the payload.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
The xfs.h header conflicts with the public xfs.h in xfsprogs, leading
to a spurious difference in all shared libxfs files that have to
include libxfs_priv.h in userspace. Directly include xfs_platform.h so
that we can add a header of the same name to xfsprogs and remove this
major annoyance for the shared code.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Add a new xlog_write_space_advance that returns the current place in the
iclog that data is written to, and advances the various counters by the
amount taken from xlog_write_iovec, and also use it xlog_write_partial,
which open codes the counter adjustments, but misses the asserts.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
xfs: improve the calling convention for the xlog_write helpers
The xlog_write chain passes around the same seven variables that are
often passed by reference. Add a xlog_write_data structure to contain
them to improve code generation and readability.
This change increases the generated code size by about 140 bytes for my
x86_64 build, which is hopefully worth the much easier to follow code:
$ size fs/xfs/xfs_log.o*
text data bss dec hex filename
29300 1730 176 31206 79e6 fs/xfs/xfs_log.o
29160 1730 176 31066 795a fs/xfs/xfs_log.o.old
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
xfs: regularize iclog space accounting in xlog_write_partial
When xlog_write_partial splits a log region over multiple iclogs, it
has to include the continuation ophder in the length requested for the
new iclog. Currently is simply adds that to the request, which makes
the accounting of the used space below look slightly different from the
other users of iclog space that decrement it.
To prepare for more code sharing, add the ophdr size to the len variable
that tracks the number of bytes still are left in this xlog_write
operation before the calling xlog_write_get_more_iclog_space, and then
decrement it later when consuming that space.
This changes the value of len when xlog_write_get_more_iclog_space
returns an error, but as nothing looks at len in that case the
difference doesn't matter.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Export a higher level interface to format log items. The xlog_format_buf
structure is hidden inside xfs_log_cil.c and only accessed using two
helpers (and a wrapper build on top), hiding details of log iovecs from
the log items. This also allows simply using an index into lv_iovecp
instead of keeping a cursor vec.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
lv_bytes is mostly just use by the CIL code, but has crept into the
low-level log writing code to decide on a full or partial iclog
write. Ensure it is valid even for the special log writes that don't
go through the CIL by initializing it in xlog_write_one_vec.
Note that even without this fix, the checkpoint commits would never
trigger a partial iclog write, as they have no payload beyond the
opheader.
The unmount record on the other hand could in theory trigger a an
overflow of the iclog, but given that is has never been seen in
the wild this has probably been masked by the small size of it
and the fact that the unmount process does multiple log forces
before writing the unmount record and we thus usually operate on
an empty or almost empty iclog.
Fixes: 110dc24ad2ae ("xfs: log vector rounding leaks log space") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Darrick J. Wong [Wed, 21 Jan 2026 02:06:52 +0000 (18:06 -0800)]
xfs: add media verification ioctl
Add a new privileged ioctl so that xfs_scrub can ask the kernel to
verify the media of the devices backing an xfs filesystem, and have any
resulting media errors reported to fsnotify and xfs_healer.
To accomplish this, the kernel allocates a folio between the base page
size and 1MB, and issues read IOs to a gradually incrementing range of
one of the storage devices underlying an xfs filesystem. If any error
occurs, that raw error is reported to the calling process. If the error
happens to be one of the ones that the kernel considers indicative of
data loss, then it will also be reported to xfs_healthmon and fsnotify.
Driving the verification from the kernel enables xfs (and by extension
xfs_scrub) to have precise control over the size and error handling of
IOs that are issued to the underlying block device, and to emit
notifications about problems to other relevant kernel subsystems
immediately.
Note that the caller is also allowed to reduce the size of the IO and
to ask for a relaxation period after each IO.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Wed, 21 Jan 2026 02:06:51 +0000 (18:06 -0800)]
xfs: check if an open file is on the health monitored fs
Create a new ioctl for the healthmon file that checks that a given fd
points to the same filesystem that the healthmon file is monitoring.
This allows xfs_healer to check that when it reopens a mountpoint to
perform repairs, the file that it gets matches the filesystem that
generated the corruption report.
(Note that xfs_healer doesn't maintain an open fd to a filesystem that
it's monitoring so that it doesn't pin the mount.)
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Wed, 21 Jan 2026 02:06:51 +0000 (18:06 -0800)]
xfs: allow toggling verbose logging on the health monitoring file
Make it so that we can reconfigure the health monitoring device by
calling the XFS_IOC_HEALTH_MONITOR ioctl on it. As of right now we can
only toggle the verbose flag, but this is less annoying than having to
closing the monitor fd and reopen it.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Wed, 21 Jan 2026 02:06:50 +0000 (18:06 -0800)]
xfs: convey file I/O errors to the health monitor
Connect the fserror reporting to the health monitor so that xfs can send
events about file I/O errors to the xfs_healer daemon. These events are
entirely informational because xfs cannot regenerate user data, so
hopefully the fsnotify I/O error event gets noticed by the relevant
management systems.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Wed, 21 Jan 2026 02:06:49 +0000 (18:06 -0800)]
xfs: convey externally discovered fsdax media errors to the health monitor
Connect the fsdax media failure notification code to the health monitor
so that xfs can send events about that to the xfs_healer daemon.
Later on we'll add the ability for the xfs_scrub media scan (phase 6) to
report the errors that it finds to the kernel so that those are also
logged by xfs_healer.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Wed, 21 Jan 2026 02:06:47 +0000 (18:06 -0800)]
xfs: convey metadata health events to the health monitor
Connect the filesystem metadata health event collection system to the
health monitor so that xfs can send events to xfs_healer as it collects
information.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Wed, 21 Jan 2026 02:06:46 +0000 (18:06 -0800)]
xfs: create event queuing, formatting, and discovery infrastructure
Create the basic infrastructure that we need to report health events to
userspace. We need a compact form for recording critical information
about an event and queueing them; a means to notice that we've lost some
events; and a means to format the events into something that userspace
can handle. Make the kernel export C structures via read().
In a previous iteration of this new subsystem, I wanted to explore data
exchange formats that are more flexible and easier for humans to read
than C structures. The thought being that when we want to rev (or
worse, enlarge) the event format, it ought to be trivially easy to do
that in a way that doesn't break old userspace.
I looked at formats such as protobufs and capnproto. These look really
nice in that extending the wire format is fairly easy, you can give it a
data schema and it generates the serialization code for you, handles
endianness problems, etc. The huge downside is that neither support C
all that well.
Too hard, and didn't want to port either of those huge sprawling
libraries first to the kernel and then again to xfsprogs. Then I
thought, how about JSON? Javascript objects are human readable, the
kernel can emit json without much fuss (it's all just strings!) and
there are plenty of interpreters for python/rust/c/etc.
There's a proposed schema format for json, which means that xfs can
publish a description of the events that kernel will emit. Userspace
consumers (e.g. xfsprogs/xfs_healer) can embed the same schema document
and use it to validate the incoming events from the kernel, which means
it can discard events that it doesn't understand, or garbage being
emitted due to bugs.
However, json has a huge crutch -- javascript is well known for its
vague definitions of what are numbers. This makes expressing a large
number rather fraught, because the runtime is free to represent a number
in nearly any way it wants. Stupider ones will truncate values to word
size, others will roll out doubles for uint52_t (yes, fifty-two) with
the resulting loss of precision. Not good when you're dealing with
discrete units.
It just so happens that python's json library is smart enough to see a
sequence of digits and put them in a u64 (at least on x86_64/aarch64)
but an actual javascript interpreter (pasting into Firefox) isn't
necessarily so clever.
It turns out that none of the proposed json schemas were ever ratified
even in an open-consensus way, so json blobs are still just loosely
structured blobs. The parsing in userspace was also noticeably slow and
memory-consumptive.
Hence only the C interface survives.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Wed, 21 Jan 2026 02:06:45 +0000 (18:06 -0800)]
xfs: start creating infrastructure for health monitoring
Start creating helper functions and infrastructure to pass filesystem
health events to a health monitoring file. Since this is an
administrative interface, we only support a single health monitor
process per filesystem, so we don't need to use anything fancy such as
notifier chains (== tons of indirect calls).
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Linus Torvalds [Sun, 18 Jan 2026 23:15:47 +0000 (15:15 -0800)]
Merge tag 'landlock-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock fixes from Mickaël Salaün:
"This fixes TCP handling, tests, documentation, non-audit elided code,
and minor cosmetic changes"
* tag 'landlock-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
landlock: Clarify documentation for the IOCTL access right
selftests/landlock: Properly close a file descriptor
landlock: Improve the comment for domain_is_scoped
selftests/landlock: Use scoped_base_variants.h for ptrace_test
selftests/landlock: Fix missing semicolon
selftests/landlock: Fix typo in fs_test
landlock: Optimize stack usage when !CONFIG_AUDIT
landlock: Fix spelling
landlock: Clean up hook_ptrace_access_check()
landlock: Improve erratum documentation
landlock: Remove useless include
landlock: Fix wrong type usage
selftests/landlock: NULL-terminate unix pathname addresses
selftests/landlock: Remove invalid unix socket bind()
selftests/landlock: Add missing connect(minimal AF_UNSPEC) test
selftests/landlock: Fix TCP bind(AF_UNSPEC) test case
landlock: Fix TCP handling of short AF_UNSPEC addresses
landlock: Fix formatting
Linus Torvalds [Sun, 18 Jan 2026 21:18:40 +0000 (13:18 -0800)]
Merge tag 'phy-fixes-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy
Pull phy fixes from Vinod Koul:
"A bunch of driver fixes:
- Freescale typec orientation switch fix, clearing register fix,
assertion of phy reset during power on
- Qualcomm pcs register clear before using
- stm one off fix
- TI runtimepm error handling, regmap leak fixes
- Rockchip gadget mode disconnection and disruption fixes
- Tegra register level fix
- Broadcom pointer cast warning fix"
* tag 'phy-fixes-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
phy: freescale: imx8m-pcie: assert phy reset during power on
phy: rockchip: inno-usb2: Fix a double free bug in rockchip_usb2phy_probe()
phy: broadcom: ns-usb3: Fix Wvoid-pointer-to-enum-cast warning (again)
phy: tegra: xusb: Explicitly configure HS_DISCON_LEVEL to 0x7
phy: rockchip: inno-usb2: fix communication disruption in gadget mode
phy: rockchip: inno-usb2: fix disconnection in gadget mode
phy: ti: gmii-sel: fix regmap leak on probe failure
phy: sparx5-serdes: make it selectable for ARCH_LAN969X
phy: ti: da8xx-usb: Handle devm_pm_runtime_enable() errors
phy: stm32-usphyc: Fix off by one in probe()
phy: qcom-qusb2: Fix NULL pointer dereference on early suspend
phy: fsl-imx8mq-usb: Clear the PCS_TX_SWING_FULL field before using it
dt-bindings: phy: qcom,sc8280xp-qmp-pcie-phy: Update pcie phy bindings for qcs8300
phy: fsl-imx8mq-usb: fix typec orientation switch when built as module
Linus Torvalds [Sun, 18 Jan 2026 20:29:12 +0000 (12:29 -0800)]
Merge tag 'soundwire-6.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire
Pull soundwire fix from Vinod Koul:
- Single off-by-one fix for allocating slave id
* tag 'soundwire-6.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
soundwire: bus: fix off-by-one when allocating slave IDs
Linus Torvalds [Sun, 18 Jan 2026 20:09:13 +0000 (12:09 -0800)]
Merge tag 'usb-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB fixes and new device ids for 6.19-rc6
Included in here are:
- new usb-serial device ids
- dwc3-apple driver fixes to get things working properly on that
hardware platform
- ohci/uhci platfrom driver module soft-deps with ehci to remove a
runtime warning that sometimes shows up on some platforms.
- quirk for broken devices that can not handle reading the BOS
descriptor from them without going crazy.
- usb-serial driver fixes
- xhci driver fixes
- usb gadget driver fixes
All of these except for the last xhci fix has been in linux-next for a
while. The xhci fix has been reported by others to solve the issue for
them, so should be ok"
* tag 'usb-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
xhci: sideband: don't dereference freed ring when removing sideband endpoint
usb: gadget: uvc: retry vb2_reqbufs() with vb_vmalloc_memops if use_sg fail
usb: gadget: uvc: return error from uvcg_queue_init()
usb: gadget: uvc: fix interval_duration calculation
usb: gadget: uvc: fix req_payload_size calculation
usb: dwc3: apple: Ignore USB role switches to the active role
usb: host: xhci-tegra: Use platform_get_irq_optional() for wake IRQs
USB: OHCI/UHCI: Add soft dependencies on ehci_platform
usb: dwc3: apple: Set USB2 PHY mode before dwc3 init
USB: serial: f81232: fix incomplete serial port generation
USB: serial: ftdi_sio: add support for PICAXE AXE027 cable
USB: serial: option: add Telit LE910 MBIM composition
usb: core: add USB_QUIRK_NO_BOS for devices that hang on BOS descriptor
dt-bindings: usb: qcom,dwc3: Correct MSM8994 interrupts
dt-bindings: usb: qcom,dwc3: Correct IPQ5018 interrupts
tcpm: allow looking for role_sw device in the main node
usb: dwc3: Check for USB4 IP_NAME
Linus Torvalds [Sun, 18 Jan 2026 20:00:35 +0000 (12:00 -0800)]
Merge tag 'i2c-for-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
- riic, imx-lpi2c: suspend/resume fixes
- qcom-geni: DMA handling fix
- iproc: correct DT binding description
* tag 'i2c-for-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: imx-lpi2c: change to PIO mode in system-wide suspend/resume progress
i2c: qcom-geni: make sure I2C hub controllers can't use SE DMA
i2c: riic: Move suspend handling to NOIRQ phase
dt-bindings: i2c: brcm,iproc-i2c: Allow 2 reg entries for brcm,iproc-nic-i2c
Linus Torvalds [Sun, 18 Jan 2026 19:39:56 +0000 (11:39 -0800)]
Merge tag 'edac_urgent_for_v6.19_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fixes from Borislav Petkov:
"Make sure the memory-mapped memory controller registers BAR gets
unmapped when the driver memory allocation fails
Fix that in both x38 and i3200 EDAC drivers as former has copied the
bug from the latter, it looks like"
* tag 'edac_urgent_for_v6.19_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/x38: Fix a resource leak in x38_probe1()
EDAC/i3200: Fix a resource leak in i3200_probe1()
Linus Torvalds [Sun, 18 Jan 2026 19:34:11 +0000 (11:34 -0800)]
Merge tag 'x86-urgent-2026-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:
- Fix resctrl initialization on Hygon CPUs
- Fix resctrl memory bandwidth counters on Hygon CPUs
- Fix x86 self-tests build bug
* tag 'x86-urgent-2026-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftests/x86: Add selftests include path for kselftest.h after centralization
x86/resctrl: Fix memory bandwidth counter width for Hygon
x86/resctrl: Add missing resctrl initialization for Hygon
Linus Torvalds [Sun, 18 Jan 2026 18:17:40 +0000 (10:17 -0800)]
Merge tag 'sched-urgent-2026-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Misc deadline scheduler fixes, mainly for a new category of bugs that
were discovered and fixed recently:
- Fix a race condition in the DL server
- Fix a DL server bug which can result in incorrectly going idle when
there's work available
- Fix DL server bug which triggers a WARN() due to broken
get_prio_dl() logic and subsequent misbehavior
- Fix double update_rq_clock() calls
- Fix setscheduler() assumption about static priorities
- Make sure balancing callbacks are always called
- Plus a handful of preparatory commits for the fixes"
* tag 'sched-urgent-2026-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/deadline: Use ENQUEUE_MOVE to allow priority change
sched: Deadline has dynamic priority
sched: Audit MOVE vs balance_callbacks
sched: Fold rq-pin swizzle into __balance_callbacks()
sched/deadline: Avoid double update_rq_clock()
sched/deadline: Ensure get_prio_dl() is up-to-date
sched/deadline: Fix server stopping with runnable tasks
sched: Provide idle_rq() helper
sched/deadline: Fix potential race in dl_add_task_root_domain()
sched/deadline: Remove unnecessary comment in dl_add_task_root_domain()
Linus Torvalds [Sun, 18 Jan 2026 17:09:32 +0000 (09:09 -0800)]
Merge tag 'objtool-urgent-2026-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fixes from Ingo Molnar:
"Fix two objtool build failures that trigger in uncommon build
environments"
* tag 'objtool-urgent-2026-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: fix build failure due to missing libopcodes check
objtool: fix compilation failure with the x32 toolchain
Julian Sun [Mon, 8 Dec 2025 12:37:13 +0000 (20:37 +0800)]
ext4: add missing down_write_data_sem in mext_move_extent().
Commit 962e8a01eab9 ("ext4: introduce mext_move_extent()") attempts to
call ext4_swap_extents() on the failure path to recover the swapped
extents, but fails to acquire locks for the two inode->i_data_sem,
triggering the BUG_ON statement in ext4_swap_extents().
This issue can be fixed by calling ext4_double_down_write_data_sem()
before ext4_swap_extents().
Signed-off-by: Julian Sun <sunjunchao@bytedance.com> Reported-by: syzbot+4ea6bd8737669b423aae@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/69368649.a70a0220.38f243.0093.GAE@google.com/ Fixes: 962e8a01eab9 ("ext4: introduce mext_move_extent()") Reviewed-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://patch.msgid.link/20251208123713.1971068-1-sunjunchao@bytedance.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Arnd Bergmann [Thu, 4 Dec 2025 10:19:10 +0000 (11:19 +0100)]
ext4: fix ext4_tune_sb_params padding
The padding at the end of struct ext4_tune_sb_params is architecture
specific and in particular is different between x86-32 and x86-64,
since the __u64 member only enforces struct alignment on the latter.
This shows up as a new warning when test-building the headers with
-Wpadded:
include/linux/ext4.h:144:1: error: padding struct size to alignment boundary with 4 bytes [-Werror=padded]
All members inside the structure are naturally aligned, so the only
difference here is the amount of padding at the end. Make the padding
explicit, to have a consistent sizeof(struct ext4_tune_sb_params) of
232 on all architectures and avoid adding compat ioctl handling for
EXT4_IOC_GET_TUNE_SB_PARAM/EXT4_IOC_SET_TUNE_SB_PARAM.
This is an ABI break on x86-32 but hopefully this can go into 6.18.y early
enough as a fixup so no actual users will be affected. Alternatively, the
kernel could handle the ioctl commands for both sizes (232 and 228 bytes)
on all architectures.
Fixes: 04a91570ac67 ("ext4: implemet new ioctls to set and get superblock parameters") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20251204101914.1037148-1-arnd@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
Linus Torvalds [Sun, 18 Jan 2026 03:29:32 +0000 (19:29 -0800)]
Merge tag 'for-6.19-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- with large folios in use, fix partial incorrect update of a reflinked
range
- fix potential deadlock in iget when lookup fails and eviction is
needed
- in send, validate inline extent type while detecting file holes
- fix memory leak after an error when creating a space info
- remove zone statistics from sysfs again, the output size limitations
make it unusable, we'll do it in another way in another release
- test fixes:
- return proper error codes from block remapping tests
- fix tree root leaks in qgroup tests after errors
* tag 'for-6.19-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: remove zoned statistics from sysfs
btrfs: fix memory leaks in create_space_info() error paths
btrfs: invalidate pages instead of truncate after reflinking
btrfs: update the Kconfig string for CONFIG_BTRFS_EXPERIMENTAL
btrfs: send: check for inline extents in range_is_hole_in_parent()
btrfs: tests: fix return 0 on rmap test failure
btrfs: tests: fix root tree leak in btrfs_test_qgroups()
btrfs: release path before iget_failed() in btrfs_read_locked_inode()
Linus Torvalds [Sun, 18 Jan 2026 03:24:48 +0000 (19:24 -0800)]
Merge tag 'loongarch-fixes-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Remove redundant code in head.S, fix PMU counter allocation for mixed-
type event groups, fix a lot of dts build warnings, and fix kvm_device
memory leaks"
* tag 'loongarch-fixes-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: KVM: Fix kvm_device leak in kvm_pch_pic_destroy()
LoongArch: KVM: Fix kvm_device leak in kvm_eiointc_destroy()
LoongArch: KVM: Fix kvm_device leak in kvm_ipi_destroy()
LoongArch: dts: loongson-2k1000: Fix i2c-gpio node names
LoongArch: dts: loongson-2k2000: Add default interrupt controller address cells
LoongArch: dts: loongson-2k1000: Add default interrupt controller address cells
LoongArch: dts: loongson-2k0500: Add default interrupt controller address cells
LoongArch: dts: Describe PCI sideband IRQ through interrupt-extended
LoongArch: Fix PMU counter allocation for mixed-type event groups
LoongArch: Remove redundant code in head.S
Linus Torvalds [Sat, 17 Jan 2026 16:52:45 +0000 (08:52 -0800)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
"An arm64/mpam fix to use non-atomic bitops on struct mmap_props member
(atomicity not required).
For kunit testing, the structure is packed to avoid memcmp() errors
but this affects atomic bitops as they have strict alignment
requirements.
Also remove a duplicate include in the mpam driver"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm_mpam: Use non-atomic bitops when modifying feature bitmap
arm_mpam: Remove duplicate linux/srcu.h header
selftests/x86: Add selftests include path for kselftest.h after centralization
The previous change centralizing kselftest.h include path in lib.mk caused x86
selftests to fail, as x86 Makefile overwrites CFLAGS using ":=", dropping the
include path added in lib.mk. Therefore, helpers.h could not find kselftest.h
during compilation.
Fix this by adding the tools/testing/sefltest to CFLAGS in x86 Makefile.
Linus Torvalds [Sat, 17 Jan 2026 04:59:46 +0000 (20:59 -0800)]
Merge tag 'block-6.19-20260116' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:
- NVMe pull request via Keith:
- Device quirk to disable faulty temperature (Ilikara)
- TCP target null pointer fix from bad host protocol usage (Shivam)
- Add apple,t8103-nvme-ans2 as a compatible apple controller
(Janne)
- FC tagset leak fix (Chaitanya)
- TCP socket deadlock fix (Hannes)
- Target name buffer overrun fix (Shin'ichiro)
- Fix for an underflow for rnbd during device unmap
- Zero the non-PI part of the auto integrity buffer
- Fix for a configfs memory leak in the null block driver
* tag 'block-6.19-20260116' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
rnbd-clt: fix refcount underflow in device unmap path
nvme: fix PCIe subsystem reset controller state transition
nvmet: do not copy beyond sybsysnqn string length
nvmet-tcp: fixup hang in nvmet_tcp_listen_data_ready()
null_blk: fix kmemleak by releasing references to fault configfs items
block: zero non-PI portion of auto integrity buffer
nvme-fc: release admin tagset if init fails
nvme-apple: add "apple,t8103-nvme-ans2" as compatible
nvme-tcp: fix NULL pointer dereferences in nvmet_tcp_build_pdu_iovec
nvme-pci: disable secondary temp for Wodposit WPBSNM8
Qiang Ma [Sat, 17 Jan 2026 02:57:03 +0000 (10:57 +0800)]
LoongArch: KVM: Fix kvm_device leak in kvm_pch_pic_destroy()
In kvm_ioctl_create_device(), kvm_device has allocated memory,
kvm_device->destroy() seems to be supposed to free its kvm_device
struct, but kvm_pch_pic_destroy() is not currently doing this, that
would lead to a memory leak.
So, fix it.
Cc: stable@vger.kernel.org Reviewed-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Qiang Ma <maqianga@uniontech.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Qiang Ma [Sat, 17 Jan 2026 02:57:02 +0000 (10:57 +0800)]
LoongArch: KVM: Fix kvm_device leak in kvm_eiointc_destroy()
In kvm_ioctl_create_device(), kvm_device has allocated memory,
kvm_device->destroy() seems to be supposed to free its kvm_device
struct, but kvm_eiointc_destroy() is not currently doing this, that
would lead to a memory leak.
So, fix it.
Cc: stable@vger.kernel.org Reviewed-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Qiang Ma <maqianga@uniontech.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Qiang Ma [Sat, 17 Jan 2026 02:57:02 +0000 (10:57 +0800)]
LoongArch: KVM: Fix kvm_device leak in kvm_ipi_destroy()
In kvm_ioctl_create_device(), kvm_device has allocated memory,
kvm_device->destroy() seems to be supposed to free its kvm_device
struct, but kvm_ipi_destroy() is not currently doing this, that
would lead to a memory leak.
So, fix it.
Cc: stable@vger.kernel.org Reviewed-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Qiang Ma <maqianga@uniontech.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Add missing address-cells 0 to the Local I/O, Extend I/O and PCH-PIC
Interrupt Controller node to silence W=1 warning:
loongson-2k2000.dtsi:364.5-49: Warning (interrupt_map): /bus@10000000/pcie@1a000000/pcie@9,0:interrupt-map:
Missing property '#address-cells' in node /bus@10000000/interrupt-controller@10000000, using 0 as fallback
Value '0' is correct because:
1. The LIO/EIO/PCH interrupt controller does not have children,
2. interrupt-map property (in PCI node) consists of five components and
the fourth component "parent unit address", which size is defined by
'#address-cells' of the node pointed to by the interrupt-parent
component, is not used (=0)
Add missing address-cells 0 to the Local I/O interrupt controller node
to silence W=1 warning:
loongson-2k1000.dtsi:498.5-55: Warning (interrupt_map): /bus@10000000/pcie@1a000000/pcie@9,0:interrupt-map:
Missing property '#address-cells' in node /bus@10000000/interrupt-controller@1fe01440, using 0 as fallback
Value '0' is correct because:
1. The Local I/O interrupt controller does not have children,
2. interrupt-map property (in PCI node) consists of five components and
the fourth component "parent unit address", which size is defined by
'#address-cells' of the node pointed to by the interrupt-parent
component, is not used (=0)
Add missing address-cells 0 to the Local I/O and Extend I/O interrupt
controller node to silence W=1 warning:
loongson-2k0500.dtsi:513.5-51: Warning (interrupt_map): /bus@10000000/pcie@1a000000/pcie@0,0:interrupt-map:
Missing property '#address-cells' in node /bus@10000000/interrupt-controller@1fe11600, using 0 as fallback
Value '0' is correct because:
1. The Local I/O & Extend I/O interrupt controller do not have children,
2. interrupt-map property (in PCI node) consists of five components and
the fourth component "parent unit address", which size is defined by
'#address-cells' of the node pointed to by the interrupt-parent
component, is not used (=0)
Yao Zi [Sat, 17 Jan 2026 02:56:52 +0000 (10:56 +0800)]
LoongArch: dts: Describe PCI sideband IRQ through interrupt-extended
SoC integrated peripherals on LS2K1000 and LS2K2000 could be discovered
as PCI devices, but require sideband interrupts to function, which are
previously described by interrupts and interrupt-parent properties.
However, pci/pci-device.yaml allows interrupts property to only specify
PCI INTx interrupts, not sideband ones. Convert these devices to use
interrupt-extended property, which describes sideband interrupts used by
PCI devices since dt-schema commit e6ea659d2baa ("schemas: pci-device:
Allow interrupts-extended for sideband interrupts"), eliminating
dtbs_check warnings.
Cc: stable@vger.kernel.org Fixes: 30a5532a3206 ("LoongArch: dts: DeviceTree for Loongson-2K1000") Signed-off-by: Yao Zi <me@ziyao.cc> Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Lisa Robinson [Sat, 17 Jan 2026 02:56:43 +0000 (10:56 +0800)]
LoongArch: Fix PMU counter allocation for mixed-type event groups
When validating a perf event group, validate_group() unconditionally
attempts to allocate hardware PMU counters for the leader, sibling
events and the new event being added.
This is incorrect for mixed-type groups. If a PERF_TYPE_SOFTWARE event
is part of the group, the current code still tries to allocate a hardware
PMU counter for it, which can wrongly consume hardware PMU resources and
cause spurious allocation failures.
Fix this by only allocating PMU counters for hardware events during group
validation, and skipping software events.
Linus Torvalds [Fri, 16 Jan 2026 21:48:18 +0000 (13:48 -0800)]
Merge tag 'drm-fixes-2026-01-16' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Simona Vetter:
"We've had nothing aside of a compiler noise fix until today, when the
amd and drm-misc fixes showed up after Dave already went into weekend
mode. So it's on me to push these out, since there's a bunch of
important fixes in here I think that shouldn't be delayed for a week.
Core Changes:
- take gem lock when preallocating in gpuvm
- add single byte read fallback to dp for broken usb-c adapters
- remove duplicate drm_sysfb declarations
Driver Changes:
- i915: compiler noise fix
- amdgpu/amdkfd: pile of fixes all over
- vmwgfx:
- v10 cursor regression fix
- other fixes
- rockchip:
- waiting for cfgdone regression fix
- other fixes
- gud: fix oops on disconnect
- simple-panel:
- regression fix when connector is not set
- fix for DataImage SCF0700C48GGU18
- nouveau: cursor handling locking fix"
* tag 'drm-fixes-2026-01-16' of https://gitlab.freedesktop.org/drm/kernel: (33 commits)
drm/amd/display: Add an hdmi_hpd_debounce_delay_ms module
drm/amdgpu/userq: Fix fence reference leak on queue teardown v2
drm/amdkfd: No need to suspend whole MES to evict process
Revert "drm/amdgpu: don't attach the tlb fence for SI"
drm/amdgpu: validate the flush_gpu_tlb_pasid()
drm/amd/pm: fix smu overdrive data type wrong issue on smu 14.0.2
drm/amd/display: Initialise backlight level values from hw
drm/amd/display: Bump the HDMI clock to 340MHz
drm/amd/display: Show link name in PSR status message
drm/amdkfd: fix a memory leak in device_queue_manager_init()
drm/amdgpu: make sure userqs are enabled in userq IOCTLs
drm/amdgpu: Use correct address to setup gart page table for vram access
Revert duplicate "drm/amdgpu: disable peer-to-peer access for DCC-enabled GC12 VRAM surfaces"
drm/amd: Clean up kfd node on surprise disconnect
drm/amdgpu: fix drm panic null pointer when driver not support atomic
drm/amdgpu: Fix gfx9 update PTE mtype flag
drm/sysfb: Remove duplicate declarations
drm/nouveau/kms/nv50-: Assert we hold nv50_disp->lock in nv50_head_flush_*
drm/nouveau/disp/nv50-: Set lock_core in curs507a_prepare
drm/gud: fix NULL fb and crtc dereferences on USB disconnect
...