Jan Kara [Thu, 26 Mar 2026 09:54:34 +0000 (10:54 +0100)]
ext4: Track metadata bhs in fs-private inode part
Track metadata bhs for an inode in fs-private part of the inode. We need
the tracking only for nojournal mode so this is somewhat wasteful. We
can relatively easily make the mapping_metadata_bhs struct dynamically
allocated similarly to how we treat jbd2_inode but let's leave that for
ext4 specific series once the dust settles a bit.
Jan Kara [Thu, 26 Mar 2026 09:54:27 +0000 (10:54 +0100)]
fs: Provide functions for handling mapping_metadata_bhs directly
As part of transition toward moving mapping_metadata_bhs to fs-private
part of the inode, provide functions for operations on this list
directly instead of going through the inode / mapping.
Jan Kara [Thu, 26 Mar 2026 09:54:26 +0000 (10:54 +0100)]
fs: Switch inode_has_buffers() to take mapping_metadata_bhs
As part of a move towards placing mapping_metadata_bhs in fs-private
inode part, switch inode_has_buffers() to take mapping_metadata_bhs
and rename the function to mmb_has_buffers().
Jan Kara [Thu, 26 Mar 2026 09:54:25 +0000 (10:54 +0100)]
fs: Make bhs point to mapping_metadata_bhs
Make buffer heads point to mapping_metadata_bhs instead of struct
address_space. This makes the code more self contained. For the (only)
case of IO error handling where we really need to reach struct
address_space add a pointer to the mapping from mapping_metadata_bhs.
Jan Kara [Thu, 26 Mar 2026 09:54:24 +0000 (10:54 +0100)]
fs: Move metadata bhs tracking to a separate struct
Instead of tracking metadata bhs for a mapping using i_private_list and
i_private_lock create a dedicated mapping_metadata_bhs struct for it.
So far this struct is embedded in address_space but that will be
switched for per-fs private inode parts later in the series. This also
changes the locking from bdev mapping's i_private_lock to a new lock
embedded in mapping_metadata_bhs to untangle the i_private_lock locking
for maintaining lists of metadata bhs and the locking for looking up /
reclaiming bdev's buffer heads. The locking in remove_assoc_map() gets
more complex due to this but overall this looks like a reasonable
tradeoff.
Jan Kara [Thu, 26 Mar 2026 09:54:23 +0000 (10:54 +0100)]
fs: Fold fsync_buffers_list() into sync_mapping_buffers()
There's only single caller of fsync_buffers_list() so untangle the code
a bit by folding fsync_buffers_list() into sync_mapping_buffers(). Also
merge the comments and update them to reflect current state of code.
Jan Kara [Thu, 26 Mar 2026 09:54:22 +0000 (10:54 +0100)]
fs: Drop osync_buffers_list()
The function only waits for already locked buffers in the list of
metadata bhs. fsync_buffers_list() has just waited for all outstanding
IO on buffers so this isn't adding anything useful. Comment in front of
fsync_buffers_list() mentions concerns about buffers being moved out
from tmp list back to mappings i_private_list but these days
mark_buffer_dirty_inode() doesn't touch inodes with b_assoc_map set so
that cannot happen. Just delete the stale code.
Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260326095354.16340-70-jack@suse.cz Tested-by: syzbot@syzkaller.appspotmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
Jan Kara [Thu, 26 Mar 2026 09:54:21 +0000 (10:54 +0100)]
kvm: Use private inode list instead of i_private_list
Instead of using mapping->i_private_list use a list in private part of
the inode.
CC: kvm@vger.kernel.org CC: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260326095354.16340-69-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
Jan Kara [Thu, 26 Mar 2026 09:54:20 +0000 (10:54 +0100)]
fs: Remove i_private_data
Nobody is using it anymore.
Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260326095354.16340-68-jack@suse.cz Tested-by: syzbot@syzkaller.appspotmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
Jan Kara [Thu, 26 Mar 2026 09:54:19 +0000 (10:54 +0100)]
aio: Stop using i_private_data and i_private_lock
Instead of using i_private_data and i_private_lock, just create aio
inodes with appropriate necessary fields.
Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260326095354.16340-67-jack@suse.cz Tested-by: syzbot@syzkaller.appspotmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
Jan Kara [Thu, 26 Mar 2026 09:54:17 +0000 (10:54 +0100)]
fs: Stop using i_private_data for metadata bh tracking
All filesystem using generic metadata bh tracking are using bdev mapping
as a backing for these bhs. Stop using i_private_data for it and get to
bdev mapping directly.
Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260326095354.16340-65-jack@suse.cz Tested-by: syzbot@syzkaller.appspotmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
Jan Kara [Thu, 26 Mar 2026 09:54:16 +0000 (10:54 +0100)]
fs: Ignore inode metadata buffers in inode_lru_isolate()
There are only a few filesystems that use generic tracking of inode
metadata buffer heads. As such the logic to reclaim tracked metadata
buffer heads in inode_lru_isolate() doesn't bring a benefit big enough
to justify intertwining of inode reclaim and metadata buffer head
tracking. Just treat tracked metadata buffer heads as any other metadata
filesystem has to properly clean up on inode eviction and stop handling
it in inode_lru_isolate(). As a result filesystems using generic
tracking of metadata buffer heads may now see dirty metadata buffers in
their .evict methods more often which can slow down inode reclaim but
given these filesystems aren't used in performance demanding setups we
should be fine.
Jan Kara [Thu, 26 Mar 2026 09:54:15 +0000 (10:54 +0100)]
affs: Sync and invalidate metadata buffers from affs_evict_inode()
There are only very few filesystems using generic metadata buffer head
tracking and everybody is paying the overhead. When we remove this
tracking for inode reclaim code .evict will start to see inodes with
metadata buffers attached so write them out and prune them.
Jan Kara [Thu, 26 Mar 2026 09:54:14 +0000 (10:54 +0100)]
bfs: Sync and invalidate metadata buffers from bfs_evict_inode()
There are only very few filesystems using generic metadata buffer head
tracking and everybody is paying the overhead. When we remove this
tracking for inode reclaim code .evict will start to see inodes with
metadata buffers attached so write them out and prune them.
Jan Kara [Thu, 26 Mar 2026 09:54:13 +0000 (10:54 +0100)]
ext4: Sync and invalidate metadata buffers from ext4_evict_inode()
There are only very few filesystems using generic metadata buffer head
tracking and everybody is paying the overhead. When we remove this
tracking for inode reclaim code .evict will start to see inodes with
metadata buffers attached so write them out and prune them.
Acked-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260326095354.16340-61-jack@suse.cz Tested-by: syzbot@syzkaller.appspotmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
Jan Kara [Thu, 26 Mar 2026 09:54:12 +0000 (10:54 +0100)]
ext2: Sync and invalidate metadata buffers from ext2_evict_inode()
There are only very few filesystems using generic metadata buffer head
tracking and everybody is paying the overhead. When we remove this
tracking for inode reclaim code .evict will start to see inodes with
metadata buffers attached so write them out and prune them.
Jan Kara [Thu, 26 Mar 2026 09:54:11 +0000 (10:54 +0100)]
minix: Sync and invalidate metadata buffers from minix_evict_inode()
There are only very few filesystems using generic metadata buffer head
tracking and everybody is paying the overhead. When we remove this
tracking for inode reclaim code .evict will start to see inodes with
metadata buffers attached so write them out and prune them.
Jan Kara [Thu, 26 Mar 2026 09:54:10 +0000 (10:54 +0100)]
udf: Sync and invalidate metadata buffers from udf_evict_inode()
There are only very few filesystems using generic metadata buffer head
tracking and everybody is paying the overhead. When we remove this
tracking for inode reclaim code .evict will start to see inodes with
metadata buffers attached so write them out and prune them.
Jan Kara [Thu, 26 Mar 2026 09:54:09 +0000 (10:54 +0100)]
fat: Sync and invalidate metadata buffers from fat_evict_inode()
There are only very few filesystems using generic metadata buffer head
tracking and everybody is paying the overhead. When we remove this
tracking for inode reclaim code .evict will start to see inodes with
metadata buffers attached so write them out and prune them.
Jan Kara [Thu, 26 Mar 2026 09:54:07 +0000 (10:54 +0100)]
fs: Drop sync_mapping_buffers() from __generic_file_fsync()
No filesystem calling __generic_file_fsync() uses metadata bh tracking.
Drop sync_mapping_buffers() call from __generic_file_fsync() as it's
pointless now which untangles buffer head handling from fs/libfs.c.
Jan Kara [Thu, 26 Mar 2026 09:54:06 +0000 (10:54 +0100)]
fat: Switch to generic_buffers_fsync_noflush()
FAT uses a list of metadata bhs attached to an inode. Switch it to use
generic_buffers_fsync_noflush() instead of __generic_file_fsync() as
we'll be removing metadata bh handling from __generic_file_fsync().
Jan Kara [Thu, 26 Mar 2026 09:54:05 +0000 (10:54 +0100)]
bfs: Switch to generic_buffers_fsync()
BFS uses list of metadata bhs attached to an inode. Switch it to use
generic_buffers_fsync() instead of generic_file_fsync() as we'll be
removing metadata bh handling from generic_file_fsync().
Jan Kara [Thu, 26 Mar 2026 09:54:04 +0000 (10:54 +0100)]
minix: Switch to generic_buffers_fsync()
Minix uses list of metadata bhs attached to an inode. Switch it to
generic_buffers_fsync() instead of generic_file_fsync() as we'll be
removing metadata bh handling from generic_file_fsync().
Jan Kara [Thu, 26 Mar 2026 09:54:03 +0000 (10:54 +0100)]
udf: Switch to generic_buffers_fsync()
UDF uses metadata bh list attached to inode. Switch it to
generic_buffers_fsync() instead of generic_file_fsync() as we'll be
removing metadata bh handling from generic_file_fsync().
Jan Kara [Thu, 26 Mar 2026 09:54:02 +0000 (10:54 +0100)]
fs: Remove inode lock from __generic_file_fsync()
Inode lock in __generic_file_fsync() protects sync_mapping_buffers() and
sync_inode_metadata() calls. Neither sync_mapping_buffers() nor
sync_inode_metadata() themselves need the protection by inode_lock and
both metadata buffer head writeback and inode writeback can happen
without inode lock (either in case of background writeback or sync(2)
calls). The only protection inode_lock can possibly provide is that
write(2) or other inode modifying calls cannot happen in the middle of
bh+inode writeout and thus result in writeout of inconsistent metadata.
However if writes and fsyncs race, background writeback can submit
inconsistent metadata just after fsync completed even with inode_lock
protecting fsync so this seems moot as well. So let's remove the
apparently pointless inode_lock protection.
Jan Kara [Thu, 26 Mar 2026 09:53:59 +0000 (10:53 +0100)]
bdev: Drop pointless invalidate_inode_buffers() call
Nobody is calling mark_buffer_dirty_inode() with internal bdev inode and
it doesn't make sense for internal bdev inode to have any metadata
buffer heads. Just drop the pointless invalidate_inode_buffers() call
and consequently the whole bdev_evict_inode() because generic code takes
care of the rest.
Jan Kara [Thu, 26 Mar 2026 09:53:58 +0000 (10:53 +0100)]
ocfs2: Drop pointless sync_mapping_buffers() calls
ocfs2 never calls mark_buffer_dirty_inode() and thus its metadata
buffers list is always empty. Drop the pointless sync_mapping_buffers()
calls.
CC: Joel Becker <jlbec@evilplan.org> CC: Joseph Qi <joseph.qi@linux.alibaba.com> CC: ocfs2-devel@lists.linux.dev Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260326095354.16340-46-jack@suse.cz Tested-by: syzbot@syzkaller.appspotmail.com Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
Jan Kara [Thu, 26 Mar 2026 09:53:57 +0000 (10:53 +0100)]
ntfs3: Drop pointless sync_mapping_buffers() and invalidate_inode_buffers() calls
ntfs3 never calls mark_buffer_dirty_inode() and thus its metadata
buffers list is always empty. Drop the pointless sync_mapping_buffers()
and invalidate_inode_buffers() calls.
CC: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> CC: ntfs3@lists.linux.dev Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260326095354.16340-45-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
Jan Kara [Thu, 26 Mar 2026 09:53:56 +0000 (10:53 +0100)]
gfs2: Don't zero i_private_data
Remove the explicit zeroing of mapping->i_private_data since this
field is no longer used.
CC: Andreas Gruenbacher <agruenba@redhat.com> CC: gfs2@lists.linux.dev Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260326095354.16340-44-jack@suse.cz Tested-by: syzbot@syzkaller.appspotmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
Jihed Chaibi [Tue, 24 Mar 2026 22:45:30 +0000 (23:45 +0100)]
ASoC: fsl: mpc5200_dma: Convert to devm_ioremap()
Replace ioremap() with devm_ioremap() so the mapping is released
automatically when the device is unbound. Remove the corresponding
iounmap() calls from the error path in mpc5200_audio_dma_create() and
from mpc5200_audio_dma_destroy().
Since devm_ioremap() failure already returns directly and no other
cleanup is needed at that point, simplify the kzalloc error path to
return -ENOMEM directly instead of jumping to the now-removed out_unmap
label.
The kernel test robot reported that on hexagon with clang, several test
functions in fat_test.c exceed the 1280-byte stack frame limit.
The root cause is the compound literal assignment in
fat_test_set_time_offset():
*sbi = (struct msdos_sb_info){};
struct msdos_sb_info contains two hash tables of 256 hlist_head entries
(FAT_HASH_SIZE), making it several kilobytes. The compound literal
creates a temporary on the stack, and when clang inlines
fat_test_set_time_offset() into each test function, the large temporary
inflates every caller's stack frame beyond the limit.
Replace the compound literal with memset() which zeroes the struct
in-place without a stack temporary.
Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202603251755.4UYY1Rcd-lkp@intel.com/ Signed-off-by: Christian Brauner <brauner@kernel.org>
Frank Li [Mon, 16 Feb 2026 19:18:44 +0000 (14:18 -0500)]
media: synopsys: csi2rx: add i.MX93 support
The i.MX93 uses a newer version of the DW CSI-2 controller with a changed
register layout and an integrated Image Pixel Interface (IPI), which
converts the received CSI-2 packets from byte to pixel format and produces
a pixel data bus containing vertical and horizontal synchronization
information.
The reset flow also differs, so add the .assert_reset(), .deassert_reset(),
and .idi_enable() callbacks to support it.
Reviewed-by: Michael Riesch <michael.riesch@collabora.com> Signed-off-by: Frank Li <Frank.Li@nxp.com>
[Sakari Ailus: include missing linux/bitfield.h.] Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
The i.MX93 uses the DW CSI-2 RX controller, which is similar to the
Rockchip RK3568 implementation.
The i.MX93 variant provides one IRQ, two clocks, and no resets. Add the
"fsl,imx93-mipi-csi2" compatible string and keep the same constraints for
RK3568.
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Acked-by: Michael Riesch <michael.riesch@collabora.com> Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Frank Li [Mon, 16 Feb 2026 19:18:42 +0000 (14:18 -0500)]
media: synopsys: csi2rx: Use enum and u32 array for register offsets
Use enum dw_mipi_csi2rx_regs_index together with a u32 array to describe
register offsets. This allows supporting new IP versions with different
register layouts in a structured way.
Add rk3568_regs matching the previous macro definitions and pass it as
driver data during probe.
No functional change intended.
Reviewed-by: Michael Riesch <michael.riesch@collabora.com> Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Implement the .get_frame_desc() callback to fetch information from the
remote endpoint.
Signed-off-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Michael Riesch <michael.riesch@collabora.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Frank Li [Mon, 16 Feb 2026 19:18:40 +0000 (14:18 -0500)]
media: synopsys: csi2rx: only check errors from devm_clk_bulk_get_all()
devm_clk_bulk_get_all() returns all clocks described in the DT, which are
already validated by the binding. Do not need enforce an expected clock
count.
Only check for error returns (< 0) to support more SoCs.
Reviewed-by: Michael Riesch <michael.riesch@collabora.com> Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Frank Li [Mon, 16 Feb 2026 19:18:39 +0000 (14:18 -0500)]
media: synopsys: csi2rx: use devm_reset_control_get_optional_exclusive()
The DW MIPI CSI-2 RX is used on different SoCs, not all of which provide a
reset controller. Switch to devm_reset_control_get_optional_exclusive()
to support such platforms.
Reset presence and numbering are validated by the DT binding.
Reviewed-by: Michael Riesch <michael.riesch@collabora.com> Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Matthias Fend [Mon, 12 Jan 2026 14:59:28 +0000 (15:59 +0100)]
media: i2c: imx283: add support for non-continuous MIPI clock mode
Add support for selecting between continuous and non-continuous MIPI clock
mode.
Previously, the CSI-2 non-continuous clock endpoint flag was ignored and
the sensor was always configured for non-continuous clock mode. For
existing device tree nodes that do not have this property enabled, this
update will therefore change the actual clock mode.
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com> Signed-off-by: Matthias Fend <matthias.fend@emfend.at> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Matthias Fend [Tue, 24 Mar 2026 10:41:43 +0000 (11:41 +0100)]
media: i2c: ov08d10: add support for 24 MHz input clock
The sensor supports an input clock in the range of 6 to 27 MHz. Currently,
the driver only supports a 19.2 MHz clock. Extend the driver so that at
least 24 MHz, which is a typical frequency for this sensor, can also be
used.
Signed-off-by: Matthias Fend <matthias.fend@emfend.at> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
The current settings for the "image vertical start" register appear to be
incorrect. While this only results in an incorrect start line for native
modes, this faulty setting causes actual problems in binning mode. At least
on an i.MX8MP test system, only corrupted frames could be received.
To correct this, the recommended settings from the reference register sets
are used for all modes. Since this shifts the start by one line, the Bayer
pattern also changes, which has also been corrected.
Fixes: 7be91e02ed57 ("media: i2c: Add ov08d10 camera sensor driver") Cc: stable@vger.kernel.org Signed-off-by: Matthias Fend <matthias.fend@emfend.at> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Matthias Fend [Tue, 24 Mar 2026 10:41:35 +0000 (11:41 +0100)]
media: i2c: ov08d10: fix runtime PM handling in probe
Set the device's runtime PM status and enable runtime PM before registering
the async sub-device. This is needed to avoid the case where the device is
runtime PM resumed while runtime PM has not been enabled yet.
Remove the related, non-driver-specific comment while at it.
Fixes: 7be91e02ed57 ("media: i2c: Add ov08d10 camera sensor driver") Cc: stable@vger.kernel.org Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Matthias Fend <matthias.fend@emfend.at> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Darrick J. Wong [Mon, 23 Mar 2026 21:04:33 +0000 (14:04 -0700)]
xfs: remove file_path tracepoint data
The xfile/xmbuf shmem file descriptions are no longer as detailed as
they were when online fsck was first merged, because moving to static
strings in commit 60382993a2e180 ("xfs: get rid of the
xchk_xfile_*_descr calls") removed a memory allocation and hence a
source of failure.
However this makes encoding the description in the tracepoints sort of a
waste of memory. David Laight also points out that file_path doesn't
zero the whole buffer which causes exposure of stale trace bytes, and
Steven Rostedt wonders why we're not using a dynamic array for the file
path.
I don't think this is worth fixing, so let's just rip it out.
Cc: rostedt@goodmis.org Cc: david.laight.linux@gmail.com Link: https://lore.kernel.org/linux-xfs/20260323172204.work.979-kees@kernel.org/ Cc: stable@vger.kernel.org # v6.11 Fixes: 19ebc8f84ea12e ("xfs: fix file_path handling in tracepoints") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Darrick J. Wong [Mon, 23 Mar 2026 21:01:57 +0000 (14:01 -0700)]
xfs: don't irele after failing to iget in xfs_attri_recover_work
xlog_recovery_iget* never set @ip to a valid pointer if they return
an error, so this irele will walk off a dangling pointer. Fix that.
Cc: stable@vger.kernel.org # v6.10 Fixes: ae673f534a3097 ("xfs: record inode generation in xattr update log intent items") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Laurent Pinchart [Mon, 23 Mar 2026 16:45:26 +0000 (18:45 +0200)]
drm: rcar-du: Don't leak device_link to CMM
The DU driver creates device_link instances between the DU and CMMs, but
never deletes them. Fix it by introducing a rcar_du_cmm structure to
group the CMM device and device_link, and deleting the links at cleanup
time.
Laurent Pinchart [Mon, 23 Mar 2026 16:45:24 +0000 (18:45 +0200)]
drm: rcar-du: Store CMM device pointer instead of platform_device
The DU driver stores the CMM devices as pointers to struct
platform_device, and passes them to the API exposed by the CMM driver.
This is similar to how the VSP is handled, except that the VSP uses
struct device pointers. Replace the CMM platform_device pointers with
device pointers for consistency.
Laurent Pinchart [Mon, 23 Mar 2026 16:45:23 +0000 (18:45 +0200)]
drm: rcar-du: Ensure correct suspend/resume ordering with VSP
The VSP serves as an interface to memory and a compositor to the DU. It
therefore needs to be suspended after and resumed before the DU, to be
properly stopped and restarted in a controlled fashion driven by the DU
driver. This currently works by chance. Avoid relying on luck by
enforcing the correct suspend/resume ordering with device links.
Jens Axboe [Thu, 26 Mar 2026 13:02:53 +0000 (07:02 -0600)]
io_uring/fdinfo: fix SQE_MIXED SQE displaying
When displaying pending SQEs for a MIXED ring, each 128-byte SQE
increments sq_head to skip the second slot, but the loop counter is not
adjusted. This can cause the loop to read past sq_tail by one entry for
each 128-byte SQE encountered, displaying SQEs that haven't been made
consumable yet by the application.
Match the kernel's own consumption logic in io_init_req() which
decrements what's left when consuming the extra slot.
Fixes: 1cba30bf9fdd ("io_uring: add support for IORING_SETUP_SQE_MIXED") Signed-off-by: Jens Axboe <axboe@kernel.dk>
Den 2026-03-25 kl. 22:11, skrev Simona Vetter:
> On Wed, Mar 25, 2026 at 10:26:40AM -0700, Guenter Roeck wrote:
>> Hi,
>>
>> On Fri, Mar 13, 2026 at 04:17:27PM +0100, Maarten Lankhorst wrote:
>>> When trying to do a rather aggressive test of igt's "xe_module_load
>>> --r reload" with a full desktop environment and game running I noticed
>>> a few OOPSes when dereferencing freed pointers, related to
>>> framebuffers and property blobs after the compositor exits.
>>>
>>> Solve this by guarding the freeing in drm_file with drm_dev_enter/exit,
>>> and immediately put the references from struct drm_file objects during
>>> drm_dev_unplug().
>>>
>>
>> With this patch in v6.18.20, I get the warning backtraces below.
>> The backtraces are gone with the patch reverted.
>
> Yeah, this needs to be reverted, reasoning below. Maarten, can you please
> take care of that and feed the revert through the usual channels? I don't
> think it's critical enough that we need to fast-track this into drm.git
> directly.
>
> Quoting the patch here again:
>
>> drivers/gpu/drm/drm_file.c | 5 ++++-
>> drivers/gpu/drm/drm_mode_config.c | 9 ++++++---
>> 2 files changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>> index ec820686b3021..f52141f842a1f 100644
>> --- a/drivers/gpu/drm/drm_file.c
>> +++ b/drivers/gpu/drm/drm_file.c
>> @@ -233,6 +233,7 @@ static void drm_events_release(struct drm_file *file_priv)
>> void drm_file_free(struct drm_file *file)
>> {
>> struct drm_device *dev;
>> + int idx;
>>
>> if (!file)
>> return;
>> @@ -249,9 +250,11 @@ void drm_file_free(struct drm_file *file)
>>
>> drm_events_release(file);
>>
>> - if (drm_core_check_feature(dev, DRIVER_MODESET)) {
>> + if (drm_core_check_feature(dev, DRIVER_MODESET) &&
>> + drm_dev_enter(dev, &idx)) {
>
> This is misplaced for two reasons:
>
> - Even if we'd want to guarantee that we hold a drm_dev_enter/exit
> reference during framebuffer teardown, we'd need to do this
> _consistently over all callsites. Not ad-hoc in just one place that a
> testcase hits. This also means kerneldoc updates of the relevant hooks
> and at least a bunch of acks from other driver people to document the
> consensus.
>
> - More importantly, this is driver responsibilities in general unless we
> have extremely good reasons to the contrary. Which means this must be
> placed in xe.
>
>> drm_fb_release(file);
>> drm_property_destroy_user_blobs(dev, file);
>> + drm_dev_exit(idx);
>> }
>>
>> if (drm_core_check_feature(dev, DRIVER_SYNCOBJ))
>> diff --git a/drivers/gpu/drm/drm_mode_config.c b/drivers/gpu/drm/drm_mode_config.c
>> index 84ae8a23a3678..e349418978f79 100644
>> --- a/drivers/gpu/drm/drm_mode_config.c
>> +++ b/drivers/gpu/drm/drm_mode_config.c
>> @@ -583,10 +583,13 @@ void drm_mode_config_cleanup(struct drm_device *dev)
>> */
>> WARN_ON(!list_empty(&dev->mode_config.fb_list));
>> list_for_each_entry_safe(fb, fbt, &dev->mode_config.fb_list, head) {
>> - struct drm_printer p = drm_dbg_printer(dev, DRM_UT_KMS, "[leaked fb]");
>> + if (list_empty(&fb->filp_head) || drm_framebuffer_read_refcount(fb) > 1) {
>> + struct drm_printer p = drm_dbg_printer(dev, DRM_UT_KMS, "[leaked fb]");
>
> This is also wrong:
>
> - Firstly, it's a completely independent bug, we do not smash two bugfixes
> into one patch.
>
> - Secondly, it's again a driver bug: drm_mode_cleanup must be called when
> the last drm_device reference disappears (hence the existence of
> drmm_mode_config_init), not when the driver gets unbound. The fact that
> this shows up in a callchain from a devres cleanup means the intel
> driver gets this wrong (like almost everyone else because historically
> we didn't know better).
>
> If we don't follow this rule, then we get races with this code here
> running concurrently with drm_file fb cleanups, which just does not
> work. Review pointed that out, but then shrugged it off with a confused
> explanation:
>
> https://lore.kernel.org/all/e61e64c796ccfb17ae673331a3df4b877bf42d82.camel@linux.intel.com/
>
> Yes this also means a lot of the other drm_device teardown that drivers
> do happens way too early. There is a massive can of worms here of a
> magnitude that most likely is much, much bigger than what you can
> backport to stable kernels. Hotunplug is _hard_.
Back to the drawing board, and fixing it in the intel display driver
instead.
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: 6bee098b9141 ("drm: Fix use-after-free on framebuffers and property blobs when calling drm_dev_unplug") Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Simona Vetter <simona.vetter@ffwll.ch> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Link: https://patch.msgid.link/20260326082217.39941-2-dev@lankhorst.se
Daniel Almeida [Mon, 23 Mar 2026 23:27:02 +0000 (20:27 -0300)]
rust: drm: dispatch delayed work items to the private data
Much like the patch that dispatched (regular) work items, we also need to
dispatch delayed work items in order not to trigger the orphan rule. This
allows a drm::Device<T> to dispatch the delayed work to T::Data.
Daniel Almeida [Mon, 23 Mar 2026 23:27:01 +0000 (20:27 -0300)]
rust: workqueue: add delayed work support for ARef<T>
The preceding patches added support for ARef<T> work items. By the same
token, add support for delayed work items too.
The rationale is the same: it may be convenient or even necessary at times
to implement HasDelayedWork directly on ARef<T>. A follow up patch will
also implement support for drm::Device as the first user.
Daniel Almeida [Mon, 23 Mar 2026 23:27:00 +0000 (20:27 -0300)]
rust: drm: dispatch work items to the private data
This implementation dispatches any work enqueued on ARef<drm::Device<T>> to
its driver-provided handler. It does so by building upon the newly-added
ARef<T> support in workqueue.rs in order to call into the driver
implementations for work_container_of and raw_get_work.
This is notably important for work items that need access to the drm
device, as it was not possible to enqueue work on a ARef<drm::Device<T>>
previously without failing the orphan rule.
The current implementation needs T::Data to live inline with drm::Device in
order for work_container_of to function. This restriction is already
captured by the trait bounds. Drivers that need to share their ownership of
T::Data may trivially get around this:
Daniel Almeida [Mon, 23 Mar 2026 23:26:59 +0000 (20:26 -0300)]
rust: workqueue: add support for ARef<T>
Add support for the ARef<T> smart pointer. This allows an instance of
ARef<T> to handle deferred work directly, which can be convenient or even
necessary at times, depending on the specifics of the driver or subsystem.
The implementation is similar to that of Arc<T>, and a subsequent patch
will implement support for drm::Device as the first user. This is notably
important for work items that need access to the drm device, as it was not
possible to enqueue work on a ARef<drm::Device<T>> previously without
failing the orphan rule.
songxiebing [Wed, 25 Mar 2026 02:28:04 +0000 (10:28 +0800)]
ASoC: renesas: Fix non-static global variable
When using global variables in a .c file only,it is necessary to add
the keyword "static", so here fix the warning.
sparse warnings: (new ones prefixed by >>)
>> sound/soc/renesas/dma-sh7760.c:62:3: sparse: sparse: symbol
'cam_pcm_data' was not declared. Should it be static?
Signed-off-by: songxiebing <songxiebing@kylinos.cn> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202412171210.7a4vH3Ew-lkp@intel.com/ Link: https://patch.msgid.link/20260325022804.253353-1-songxiebing@kylinos.cn Signed-off-by: Mark Brown <broonie@kernel.org>
Paolo Valerio [Mon, 23 Mar 2026 19:16:34 +0000 (20:16 +0100)]
net: macb: use the current queue number for stats
There's a potential mismatch between the memory reserved for statistics
and the amount of memory written.
gem_get_sset_count() correctly computes the number of stats based on the
active queues, whereas gem_get_ethtool_stats() indiscriminately copies
data using the maximum number of queues, and in the case the number of
active queues is less than MACB_MAX_QUEUES, this results in a OOB write
as observed in the KASAN splat.
==================================================================
BUG: KASAN: vmalloc-out-of-bounds in gem_get_ethtool_stats+0x54/0x78
[macb]
Write of size 760 at addr ffff80008080b000 by task ethtool/1027
Paolo Abeni [Thu, 26 Mar 2026 12:46:55 +0000 (13:46 +0100)]
Merge tag 'for-net-2026-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- L2CAP: Fix deadlock in l2cap_conn_del()
- L2CAP: Fix ERTM re-init and zero pdu_len infinite loop
- L2CAP: Fix send LE flow credits in ACL link
- btintel: serialize btintel_hw_error() with hci_req_sync_lock
- btusb: clamp SCO altsetting table indices
* tag 'for-net-2026-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: btusb: clamp SCO altsetting table indices
Bluetooth: L2CAP: Fix ERTM re-init and zero pdu_len infinite loop
Bluetooth: L2CAP: Fix deadlock in l2cap_conn_del()
Bluetooth: btintel: serialize btintel_hw_error() with hci_req_sync_lock
Bluetooth: L2CAP: Fix send LE flow credits in ACL link
====================
Alex Williamson [Mon, 23 Mar 2026 21:56:58 +0000 (15:56 -0600)]
vfio/pci: Fix double free in dma-buf feature
The error path through vfio_pci_core_feature_dma_buf() ignores its
own advice to only use dma_buf_put() after dma_buf_export(), instead
falling through the entire unwind chain. In the unlikely event that
we encounter file descriptor exhaustion, this can result in an
unbalanced refcount on the vfio device and double free of allocated
objects.
Avoid this by moving the "put" directly into the error path and return
the errno rather than entering the unwind chain.
Reported-by: Renato Marziano <renato@marziano.top> Fixes: 5d74781ebc86 ("vfio/pci: Add dma-buf export support for MMIO regions") Cc: stable@vger.kernel.org Acked-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@nvidia.com> Link: https://lore.kernel.org/r/20260323215659.2108191-3-alex.williamson@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Alex Williamson <alex@shazbot.org>
kernel: Use trace_call__##name() at guarded tracepoint call sites
Replace trace_foo() with the new trace_call__foo() at sites already
guarded by trace_foo_enabled(), avoiding a redundant
static_branch_unlikely() re-evaluation inside the tracepoint.
trace_call__foo() calls the tracepoint callbacks directly without
utilizing the static branch again.
Cc: David Vernet <void@manifault.com> Cc: Andrea Righi <arighi@nvidia.com> Cc: Changwoo Min <changwoo@igalia.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Thomas Gleixner <tglx@kernel.org> Cc: "Yury Norov [NVIDIA]" <yury.norov@gmail.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: Roman Kisel <romank@linux.microsoft.com> Cc: Joel Fernandes <joelagnelf@nvidia.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Ulf Hansson <ulf.hansson@linaro.org> Link: https://patch.msgid.link/20260323160052.17528-3-vineeth@bitbyteword.org Suggested-by: Steven Rostedt <rostedt@goodmis.org> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org> Assisted-by: Claude:claude-sonnet-4-6 Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Thomas Gleixner <tglx@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
David Carlier [Wed, 25 Mar 2026 13:11:08 +0000 (14:11 +0100)]
netfilter: ctnetlink: use netlink policy range checks
Replace manual range and mask validations with netlink policy
annotations in ctnetlink code paths, so that the netlink core rejects
invalid values early and can generate extack errors.
- CTA_PROTOINFO_TCP_STATE: reject values > TCP_CONNTRACK_SYN_SENT2 at
policy level, removing the manual >= TCP_CONNTRACK_MAX check.
- CTA_PROTOINFO_TCP_WSCALE_ORIGINAL/REPLY: reject values > TCP_MAX_WSCALE
(14). The normal TCP option parsing path already clamps to this value,
but the ctnetlink path accepted 0-255, causing undefined behavior when
used as a u32 shift count.
- CTA_FILTER_ORIG_FLAGS/REPLY_FLAGS: use NLA_POLICY_MASK with
CTA_FILTER_F_ALL, removing the manual mask checks.
- CTA_EXPECT_FLAGS: use NLA_POLICY_MASK with NF_CT_EXPECT_MASK, adding
a new mask define grouping all valid expect flags.
Extracted from a broader nf-next patch by Florian Westphal, scoped to
ctnetlink for the fixes tree.
Fixes: c8e2078cfe41 ("[NETFILTER]: ctnetlink: add support for internal tcp connection tracking flags handling") Signed-off-by: David Carlier <devnexen@gmail.com> Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Weiming Shi [Wed, 25 Mar 2026 13:11:07 +0000 (14:11 +0100)]
netfilter: nf_conntrack_sip: fix use of uninitialized rtp_addr in process_sdp
process_sdp() declares union nf_inet_addr rtp_addr on the stack and
passes it to the nf_nat_sip sdp_session hook after walking the SDP
media descriptions. However rtp_addr is only initialized inside the
media loop when a recognized media type with a non-zero port is found.
If the SDP body contains no m= lines, only inactive media sections
(m=audio 0 ...) or only unrecognized media types, rtp_addr is never
assigned. Despite that, the function still calls hooks->sdp_session()
with &rtp_addr, causing nf_nat_sdp_session() to format the stale stack
value as an IP address and rewrite the SDP session owner and connection
lines with it.
With CONFIG_INIT_STACK_ALL_ZERO (default on most distributions) this
results in the session-level o= and c= addresses being rewritten to
0.0.0.0 for inactive SDP sessions. Without stack auto-init the
rewritten address is whatever happened to be on the stack.
Fix this by pre-initializing rtp_addr from the session-level connection
address (caddr) when available, and tracking via a have_rtp_addr flag
whether any valid address was established. Skip the sdp_session hook
entirely when no valid address exists.
Fixes: 4ab9e64e5e3c ("[NETFILTER]: nf_nat_sip: split up SDP mangling") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
spi: spi-fsl-lpspi: fix teardown order issue (UAF)
There is a teardown order issue in the driver. The SPI controller is
registered using devm_spi_register_controller(), which delays
unregistration of the SPI controller until after the fsl_lpspi_remove()
function returns.
As the fsl_lpspi_remove() function synchronously tears down the DMA
channels, a running SPI transfer triggers the following NULL pointer
dereference due to use after free:
Switch from devm_spi_register_controller() to spi_register_controller() in
fsl_lpspi_probe() and add the corresponding spi_unregister_controller() in
fsl_lpspi_remove().
Add trace_call__##name() as a companion to trace_##name(). When a
caller already guards a tracepoint with an explicit enabled check:
if (trace_foo_enabled() && cond)
trace_foo(args);
trace_foo() internally repeats the static_branch_unlikely() test, which
the compiler cannot fold since static branches are patched binary
instructions. This results in two static-branch evaluations for every
guarded call site.
trace_call__##name() calls __do_trace_##name() directly, skipping the
redundant static-branch re-check. This avoids leaking the internal
__do_trace_##name() symbol into call sites while still eliminating the
double evaluation:
if (trace_foo_enabled() && cond)
trace_invoke_foo(args); /* calls __do_trace_foo() directly */
Three locations are updated:
- __DECLARE_TRACE: invoke form omits static_branch_unlikely, retains
the LOCKDEP RCU-watching assertion.
- __DECLARE_TRACE_SYSCALL: same, plus retains might_fault().
- !TRACEPOINTS_ENABLED stub: empty no-op so callers compile cleanly
when tracepoints are compiled out.
Cc: Dmitry Ilvokhin <d@ilvokhin.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: Xin Long <lucien.xin@gmail.com> Cc: Jon Maloy <jmaloy@redhat.com> Cc: Aaron Conole <aconole@redhat.com> Cc: Eelco Chaudron <echaudro@redhat.com> Cc: Ilya Maximets <i.maximets@ovn.org> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Oded Gabbay <ogabbay@kernel.org> Cc: Koby Elbaz <koby.elbaz@intel.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: "Gautham R. Shenoy" <gautham.shenoy@amd.com> Cc: Huang Rui <ray.huang@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Len Brown <lenb@kernel.org> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: MyungJoo Ham <myungjoo.ham@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Chanwoo Choi <cw00.choi@samsung.com> Cc: Christian König <christian.koenig@amd.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Eddie James <eajames@linux.ibm.com> Cc: Andrew Jeffery <andrew@codeconstruct.com.au> Cc: Joel Stanley <joel@jms.id.au> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <phasta@kernel.org> Cc: Harry Wentland <harry.wentland@amd.com> Cc: Leo Li <sunpeng.li@amd.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Benjamin Tissoires <bentiss@kernel.org> Cc: Wolfram Sang <wsa+renesas@sang-engineering.com> Cc: Mark Brown <broonie@kernel.org> Cc: Michael Hennerich <michael.hennerich@analog.com> Cc: Nuno Sá <nuno.sa@analog.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Chris Mason <clm@fb.com> Cc: David Sterba <dsterba@suse.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: SeongJae Park <sj@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20260323160052.17528-2-vineeth@bitbyteword.org Suggested-by: Steven Rostedt <rostedt@goodmis.org> Suggested-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org> Assisted-by: Claude:claude-sonnet-4-6 Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
netfilter: nf_conntrack_expect: store netns and zone in expectation
__nf_ct_expect_find() and nf_ct_expect_find_get() are called under
rcu_read_lock() but they dereference the master conntrack via
exp->master.
Since the expectation does not hold a reference on the master conntrack,
this could be dying conntrack or different recycled conntrack than the
real master due to SLAB_TYPESAFE_RCU.
Store the netns, the master_tuple and the zone in struct
nf_conntrack_expect as a safety measure.
This patch is required by the follow up fix not to dump expectations
that do not belong to this netns.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: ctnetlink: ensure safe access to master conntrack
Holding reference on the expectation is not sufficient, the master
conntrack object can just go away, making exp->master invalid.
To access exp->master safely:
- Grab the nf_conntrack_expect_lock, this gets serialized with
clean_from_lists() which also holds this lock when the master
conntrack goes away.
- Hold reference on master conntrack via nf_conntrack_find_get().
Not so easy since the master tuple to look up for the master conntrack
is not available in the existing problematic paths.
This patch goes for extending the nf_conntrack_expect_lock section
to address this issue for simplicity, in the cases that are described
below this is just slightly extending the lock section.
The add expectation command already holds a reference to the master
conntrack from ctnetlink_create_expect().
However, the delete expectation command needs to grab the spinlock
before looking up for the expectation. Expand the existing spinlock
section to address this to cover the expectation lookup. Note that,
the nf_ct_expect_iterate_net() calls already grabs the spinlock while
iterating over the expectation table, which is correct.
The get expectation command needs to grab the spinlock to ensure master
conntrack does not go away. This also expands the existing spinlock
section to cover the expectation lookup too. I needed to move the
netlink skb allocation out of the spinlock to keep it GFP_KERNEL.
For the expectation events, the IPEXP_DESTROY event is already delivered
under the spinlock, just move the delivery of IPEXP_NEW under the
spinlock too because the master conntrack event cache is reached through
exp->master.
While at it, add lockdep notations to help identify what codepaths need
to grab the spinlock.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nf_conntrack_expect: use expect->helper
Use expect->helper in ctnetlink and /proc to dump the helper name.
Using nfct_help() without holding a reference to the master conntrack
is unsafe.
Use exp->master->helper in ctnetlink path if userspace does not provide
an explicit helper when creating an expectation to retain the existing
behaviour. The ctnetlink expectation path holds the reference on the
master conntrack and nf_conntrack_expect lock and the nfnetlink glue
path refers to the master ct that is attached to the skb.
Reported-by: Hyunwoo Kim <imv4bel@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nf_conntrack_expect: honor expectation helper field
The expectation helper field is mostly unused. As a result, the
netfilter codebase relies on accessing the helper through exp->master.
Always set on the expectation helper field so it can be used to reach
the helper.
nf_ct_expect_init() is called from packet path where the skb owns
the ct object, therefore accessing exp->master for the newly created
expectation is safe. This saves a lot of updates in all callsites
to pass the ct object as parameter to nf_ct_expect_init().
This is a preparation patches for follow up fixes.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Chris Arges reports high memory consumption with thousands of
containers, this patch revisits the array allocation logic.
For anonymous sets, start by 16 slots (which takes 256 bytes on x86_64).
Expand it by x2 until threshold of 512 slots is reached, over that
threshold, expand it by x1.5.
For non-anonymous set, start by 1024 slots in the array (which takes 16
Kbytes initially on x86_64). Expand it by x1.5.
Use set->ndeact to subtract deactivated elements when calculating the
number of the slots in the array, otherwise the array size array gets
increased artifically. Add special case shrink logic to deal with flush
set too.
The shrink logic is skipped by anonymous sets.
Use check_add_overflow() to calculate the new array size.
Add a WARN_ON_ONCE check to make sure elements fit into the new array
size.
Reported-by: Chris Arges <carges@cloudflare.com> Fixes: 7e43e0a1141d ("netfilter: nft_set_rbtree: translate rbtree to array for binary search") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Ren Wei [Wed, 25 Mar 2026 13:11:00 +0000 (14:11 +0100)]
netfilter: ip6t_rt: reject oversized addrnr in rt_mt6_check()
Reject rt match rules whose addrnr exceeds IP6T_RT_HOPS.
rt_mt6() expects addrnr to stay within the bounds of rtinfo->addrs[].
Validate addrnr during rule installation so malformed rules are rejected
before the match logic can use an out-of-range value.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Tested-by: Yuhang Zheng <z1652074432@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Weiming Shi [Wed, 25 Mar 2026 13:10:58 +0000 (14:10 +0100)]
netfilter: nfnetlink_log: fix uninitialized padding leak in NFULA_PAYLOAD
__build_packet_message() manually constructs the NFULA_PAYLOAD netlink
attribute using skb_put() and skb_copy_bits(), bypassing the standard
nla_reserve()/nla_put() helpers. While nla_total_size(data_len) bytes
are allocated (including NLA alignment padding), only data_len bytes
of actual packet data are copied. The trailing nla_padlen(data_len)
bytes (1-3 when data_len is not 4-byte aligned) are never initialized,
leaking stale heap contents to userspace via the NFLOG netlink socket.
Replace the manual attribute construction with nla_reserve(), which
handles the tailroom check, header setup, and padding zeroing via
__nla_reserve(). The subsequent skb_copy_bits() fills in the payload
data on top of the properly initialized attribute.
Fixes: df6fb868d611 ("[NETFILTER]: nfnetlink: convert to generic netlink attribute functions") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Sakari Ailus [Sat, 21 Mar 2026 21:21:44 +0000 (23:21 +0200)]
media: ccs: Avoid deadlock in ccs_init_state()
The sub-device state lock has been already acquired when ccs_init_state()
is called. Do not try to acquire it again.
Reported-by: David Heidelberg <david@ixit.cz> Fixes: a88883d1209c ("media: ccs: Rely on sub-device state locking") Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Ricardo Ribalda [Fri, 20 Mar 2026 07:49:10 +0000 (07:49 +0000)]
media: uvcvideo: Fix bug in error path of uvc_alloc_urb_buffers
Recent cleanup introduced a bug in the error path of
uvc_alloc_urb_buffers(). If there is not enough memory for the
allocation the following error will be triggered:
[ 739.196672] UBSAN: shift-out-of-bounds in mm/page_alloc.c:1403:22
[ 739.196710] shift exponent 52 is too large for 32-bit type 'int'
Resulting in:
[ 740.464422] BUG: unable to handle page fault for address: fffffac1c0800000
The reason for the bug is that usb_free_noncoherent is called with an
invalid size (0) instead of the actual size of the urb.
This patch takes care of that.
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Closes: https://lore.kernel.org/linux-media/abycbXzYupZpGkvR@hyeyoo/T/#t Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Fixes: c824345288d1 ("media: uvcvideo: Pass allocation size directly to uvc_alloc_urb_buffer") Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Link: https://patch.msgid.link/20260320-uvc-urb-free-error-v1-1-b12cc3762a19@chromium.org Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Bhargav Joshi [Wed, 25 Mar 2026 23:05:59 +0000 (04:35 +0530)]
regulator: dt-bindings: mps,mp8859: convert to DT schema
Convert the Monolithic Power Systems MP8859 voltage regulator binding
from legacy text format to DT schema. This patch does not change any
functionality, the bindings remain the same.
====================
net: dpaa2-mac: export standard statistics
This patch set adds support for standard ethtool statistics - rmon,
eth-ctrl, eth-mac and pause - to dpaa2-mac and its users dpaa2-eth and
dpaa2-switch.
The first patch extends the firmware APIs related to MAC counters and
adds dpmac_get_statistics() which can be used to retrieve multiple counter
values through a single firmware call.
This new API is put in use in the second patch by gathering all
previously exported ethtool statistics through a single MC firmware
call. In this patch we are also adding the setup and cleanup
infrastructure which will be also used for the standard ethtool
counters.
The third patch adds the actual suppord for rmon, eth-ctrl, eth-mac and
pause statistics in dpaa2-mac and its users.
====================
Ioana Ciornei [Mon, 23 Mar 2026 11:50:39 +0000 (13:50 +0200)]
net: dpaa2-mac: export standard statistics
Take advantage of all the newly added MAC counters available through the
MC API and export them through the standard statistics structures -
rmon, eth-ctrl, eth-mac and pause.
A new version based feature is added into dpaa2-mac -
DPAA2_MAC_FEATURE_STANDARD_STATS - and based on the two memory zones
needed for gathering the MAC counters are setup for each statistics
group.
The dpmac_counter structure is extended with a new field - size_t offset
- which is being used to instruct the dpaa2_mac_transfer_stats()
function where exactly to store a counter value inside the standard
statistics structure.
The newly added support is used both in the dpaa2-eth driver as well as
the dpaa2-switch one.
Ioana Ciornei [Mon, 23 Mar 2026 11:50:38 +0000 (13:50 +0200)]
net: dpaa2-mac: retrieve MAC statistics in one firmware command
The latest MC firmware version added a new command to retrieve all DPMAC
counters in a single firmware call. Use this new command, when possible,
in dpaa2-mac as well.
In order to use the dpmac_get_statistics() API, two DMA memory areas are
used: one to transmit what counters the driver is requesting and one to
receive the values of those counters. These memory areas are allocated
and DMA mapped at probe time so that we don't waste time at runtime.
And since we are planning to add rmon, eth-ctrl and other standard
statistics using the same infrastructure, make the setup and cleanup
processes as generic as possibile through the dpaa2_mac_setup_stats()
and dpaa2_mac_clear_stats() functions.
Ioana Ciornei [Mon, 23 Mar 2026 11:50:37 +0000 (13:50 +0200)]
net: dpaa2-mac: extend APIs related to statistics
Extend the dpmac_counter_id enum with the newly added counters which can
be interrogated through the MC firmware. Also add the
dpmac_get_statistics() API which can be used to retrieve multiple MAC
counters through a single firmware command.