Dinh Nguyen [Sun, 22 Jun 2025 11:52:49 +0000 (06:52 -0500)]
arm64: dts: socfpga: swvp: remove phy-addr in the GMAC node
This addresses this warning:
socfpga_stratix10_swvp.dtb: ethernet@ff800000 (altr,socfpga-stmmac-a10-s10):
'phy-addr' does not match any of the regexes: '^pinctrl-[0-9]+$'
Dinh Nguyen [Thu, 5 Jun 2025 18:19:20 +0000 (13:19 -0500)]
arm64: dts: socfpga: swvp: remove cpu1-start-addr
The cpu1-start-addr property is only applicable to 32-bit SoCFPGA
platforms.
Removing this property will take care of warnings like this:
socfpga_stratix10_swvp.dtb: sysmgr@ffd12000: cpu1-start-addr:
False schema does not allow 4291846704
Dinh Nguyen [Wed, 4 Jun 2025 18:44:08 +0000 (13:44 -0500)]
arm64: dts: socfpga: agilex: fix dtbs_check warning for f2s-free-clk
The f2s-free-clk requires a clock-frequency value. We put in an
arbitrary value of 100 MHz for a constant. The true clock frequency
would get generated in an FPGA design and the bootloader will populate
it in actual hardware designs.
This fixes warnings like this:
arch/arm64/boot/dts/intel:34:8
4 f2s-free-clk (fixed-clock): 'clock-frequency' is a required property
During RAID resync, faulty rdev cannot be removed and will result in
"Device or resource busy" error when attempting hot removal.
Reproduction steps:
mdadm -Cv /dev/md0 -l1 -n3 -e1.2 /dev/sd{b..d}
mdadm /dev/md0 -f /dev/sdb
mdadm /dev/md0 -r /dev/sdb
-> mdadm: hot remove failed for /dev/sdb: Device or resource busy
After commit 4b10a3bc67c1 ("md: ensure resync is prioritized over
recovery"), when a device becomes faulty during resync, the
md_choose_sync_action() function returns early without calling
remove_and_add_spares(), preventing faulty device removal.
This patch extracts a helper function remove_spares() to support
removing faulty devices during RAID resync operations.
Ryo Takakura [Sun, 1 Jun 2025 01:37:02 +0000 (10:37 +0900)]
md/raid5: unset WQ_CPU_INTENSIVE for raid5 unbound workqueue
When specified with WQ_CPU_INTENSIVE, the workqueue doesn't
participate in concurrency management. This behaviour already
applies to WQ_UNBOUND workqueues, given that they are assigned
to their own worker threads.
Unset WQ_CPU_INTENSIVE, as the flag has no effect when
used with WQ_UNBOUND.
Xiao Ni [Wed, 11 Jun 2025 07:31:07 +0000 (15:31 +0800)]
md: Don't clear MD_CLOSING until mddev is freed
UNTIL_STOP is used to keep mddev from being freed on the last close before
disks are added to it. It should be cleared when the array is stopped, as
mentioned in commit efeb53c0e572 ("md: Allow md devices to be created by
name."). So reset ->hold_active to 0 in md_clean.
MD_CLOSING should also be kept until mddev is freed, to avoid a reopen.
Xiao Ni [Wed, 11 Jun 2025 07:31:06 +0000 (15:31 +0800)]
md: call del_gendisk in control path
Currently del_gendisk and put_disk are called asynchronously from a
workqueue. The asynchronous approach has a problem: for a short window
after the mdadm --stop command returns, the device node can still exist.
A udev rule can then open this device node and create the struct mddev in
the kernel again. So call del_gendisk in the control path, and leave
put_disk in md_kobj_release to avoid a UAF of the gendisk.
del_gendisk can't be called with reconfig_mutex held, because that can
deadlock: del_gendisk waits for all sysfs file accesses to finish, and a
sysfs file access waits for reconfig_mutex. So call del_gendisk after
releasing reconfig_mutex.
But there is still a window between mddev_unlock and del_gendisk in which
sysfs can be accessed. So some actions (add disk, change level, etc.) can
happen and lead to unexpected results. MD_DELETED is used to resolve this
problem. MD_DELETED is set before releasing reconfig_mutex, and it should
be checked by the sysfs accesses that need reconfig_mutex. For sysfs
accesses that don't need reconfig_mutex, del_gendisk will wait for them to
finish.
But this check isn't needed in mddev_lock_nointr. There are ten places
that call it.
* Five of them are in dm raid, which we don't need to care about.
MD_DELETED is only used for md raid.
* stop_sync_thread, md_do_sync and md_start_sync are related to sync
requests, and stopping an array has to wait for the sync thread to finish.
* md_ioctl: md_open is called before md_ioctl, so ->openers is
incremented and stopping the array will fail. So there is no need to check
MD_DELETED here.
* md_set_readonly:
It needs to call mddev_set_closing_and_sync_blockdev when setting readonly
or read_auto. So stopping the array will fail here too, because MD_CLOSING
is already set.
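The MD_DELETED handling described above can be sketched as a small
user-space model in Python. This is not the md code: the class Mddev and
the names lock_for_sysfs, stop_array and sysfs_add_disk are hypothetical,
and a plain threading.Lock stands in for reconfig_mutex. It only shows the
ordering the message describes: set the flag while holding the mutex,
delete the disk after unlocking, and have sysfs handlers that take the
mutex check the flag.

```python
import threading


class Mddev:
    """Toy stand-in for struct mddev (hypothetical, user-space only)."""

    def __init__(self):
        self.reconfig_mutex = threading.Lock()
        self.deleted = False       # models the MD_DELETED flag
        self.disk_present = True   # models the gendisk and its sysfs files

    def lock_for_sysfs(self):
        """Take the mutex for a sysfs handler; refuse if the array is going away."""
        self.reconfig_mutex.acquire()
        if self.deleted:
            self.reconfig_mutex.release()
            return False
        return True

    def unlock(self):
        self.reconfig_mutex.release()

    def stop_array(self):
        """Control path: mark deleted under the mutex, delete the disk afterwards."""
        self.reconfig_mutex.acquire()
        self.deleted = True          # set before the mutex is released
        self.reconfig_mutex.release()
        # Window: sysfs handlers may run here, but lock_for_sysfs() now fails.
        self.disk_present = False    # models del_gendisk() outside the mutex


def sysfs_add_disk(mddev):
    """A sysfs store handler that needs the mutex (hypothetical name)."""
    if not mddev.lock_for_sysfs():
        return "rejected"
    try:
        return "added"
    finally:
        mddev.unlock()
```

In this model a handler that races with stop_array() either finishes
before the flag is set or is turned away once it gets the mutex.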
arm64: dts: allwinner: a133-liontron-h-a133l: Add Ethernet support
The Liontron H-A133L board features an Ethernet controller with a
JLSemi JL1101 PHY. Its reset pin is tied to the PH12 GPIO.
Note that the reset pin must be handled as a bus-wide reset GPIO in
order to let the MDIO core properly reset it before trying to read
its identification registers. There's no other device on the MDIO bus.
The datasheet of the PHY mentions that the reset signal must be held
for 1 ms to take effect. Make it 2 ms (and the same for post-delay) to
be on the safe side without wasting too much time during boot.
Signed-off-by: Paul Kocialkowski <paulk@sys-base.io>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Link: https://patch.msgid.link/20250707165155.581579-5-paulk@sys-base.io
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
The Allwinner A100/A133 Ethernet MAC (EMAC) is compatible with the A64
one and needs access to the syscon register for control of the
top-level integration of the unit.
Note that there are two such controllers on the sun50iw10 die, which are
the same unit with a different top-level syscon register offset.
arm64: dts: allwinner: a100: Add pin definitions for RGMII/RMII
The Allwinner A100/A133 supports both RGMII and RMII for its Ethernet
MAC (EMAC) controller. Add corresponding pin definitions.
Note that the sun50iw10 die actually includes two ethernet controllers,
the second of which is rarely exposed to pins. Call the first controller
"emac0" to distinguish it from the second that may be added later.
Detlev Casanova [Mon, 23 Jun 2025 16:07:21 +0000 (12:07 -0400)]
media: rkvdec: Remove TODO file
2 items are present in the TODO file:
- HEVC support
- Evaluate adding helper for rkvdec_request_validate
Missing HEVC support is not a reason for a driver to be in staging,
support for different features of the hardware can be added in drivers
in the main tree.
The rkvdec_request_validate function was simplified in
commit 54676d5f5630 ("media: rkvdec: Do not require all controls to be present in every request")
by not setting controls that have not changed.
As it now basically just calls vb2_request_validate(), there is no need
for a helper.
Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Detlev Casanova [Mon, 23 Jun 2025 16:07:16 +0000 (12:07 -0400)]
media: dt-bindings: rockchip: Add RK3576 Video Decoder bindings
The video decoder in RK3576 (vdpu383) is described the same way as the
one in RK3588 (vdpu381). A new compatible is added as the driver
implementation will be different.
Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Ming Qian [Thu, 24 Apr 2025 10:33:24 +0000 (18:33 +0800)]
media: amphion: Support dmabuf and v4l2 buffer without binding
When using VB2_DMABUF, the relationship between dma-buf and v4l2 buffer
may not be one-to-one: a single dma-buf may be queued via different
v4l2 buffers, and different dma-bufs may be queued via the same
v4l2 buffer, so it's not appropriate to use the v4l2 buffer index
as the frame store id.
We can generate a frame store id according to the dma address.
Then for a given dma-buf, the id is fixed.
The driver now manages the frame store and vb2-buffer states independently.
When a dmabuf is queued via another v4l2 buffer before the buffer is
released by firmware, it has to be kept pending until the firmware
releases it.
Signed-off-by: Ming Qian <ming.qian@oss.nxp.com>
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
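The frame store scheme in the amphion message above can be illustrated
with a minimal Python sketch. It is a toy model, not the driver: the class
FrameStoreTable, its methods, and the choice of a simple counter as the id
are all assumptions made for illustration. It shows an id derived from the
DMA address staying fixed for a given dma-buf, and a second queue attempt
being held until the firmware releases the frame store.

```python
class FrameStoreTable:
    """Toy model: frame store ids keyed by DMA address (names are hypothetical)."""

    def __init__(self):
        self._ids = {}       # dma address -> frame store id
        self._held = set()   # ids currently owned by the firmware
        self._pending = []   # (v4l2 index, id) waiting for a firmware release

    def frame_store_id(self, dma_addr):
        """Return a stable id for this DMA address, allocating one on first use."""
        if dma_addr not in self._ids:
            self._ids[dma_addr] = len(self._ids)
        return self._ids[dma_addr]

    def queue(self, v4l2_index, dma_addr):
        """Queue a buffer; defer it if the firmware still holds that frame store."""
        fs_id = self.frame_store_id(dma_addr)
        if fs_id in self._held:
            self._pending.append((v4l2_index, fs_id))
            return "pending"
        self._held.add(fs_id)
        return "queued"

    def firmware_release(self, fs_id):
        """Firmware gives a frame store back; hand over a pending buffer if any."""
        self._held.discard(fs_id)
        for i, (v4l2_index, pending_id) in enumerate(self._pending):
            if pending_id == fs_id:
                del self._pending[i]
                self._held.add(fs_id)
                return v4l2_index
        return None
```

Here the v4l2 index is carried along only as bookkeeping; the identity of
the frame store comes from the address alone.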
Marek Szyprowski [Fri, 11 Jul 2025 09:41:58 +0000 (11:41 +0200)]
media: v4l2: Add support for NV12M tiled variants to v4l2_format_info()
Commit 6f1466123d73 ("media: s5p-mfc: Add YV12 and I420 multiplanar
format support") added support for the new formats to s5p-mfc driver,
which in turn required some internal calls to the v4l2_format_info()
function while setting up formats. This in turn broke support for the
"old" tiled NV12MT* formats, which are not recognized by this function.
Fix this by adding those variants of NV12M pixel format to
v4l2_format_info() function database.
Fixes: 6f1466123d73 ("media: s5p-mfc: Add YV12 and I420 multiplanar format support")
Cc: stable@vger.kernel.org
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Al Viro [Sat, 12 Jul 2025 05:02:31 +0000 (06:02 +0100)]
habanalabs: fix UAF in export_dmabuf()
As soon as we'd inserted a file reference into descriptor table, another
thread could close it. That's fine for the case when all we are doing is
returning that descriptor to userland (it's a race, but it's a userland
race and there's nothing the kernel can do about it). However, if we
follow fd_install() with any kind of access to objects that would be
destroyed on close (be it the struct file itself or anything destroyed
by its ->release()), we have a UAF.
dma_buf_fd() is a combination of reserving a descriptor and fd_install().
habanalabs export_dmabuf() calls it and then proceeds to access the
objects destroyed on close. In particular, it grabs an extra reference to
another struct file that will be dropped as part of ->release() for ours;
that "will be" is actually "might have already been".
Fix that by reserving the descriptor before anything else and doing
fd_install() only when everything has been set up. As a side benefit, we
no longer
have the failure exit with file already created, but reference to
underlying file (as well as ->dmabuf_export_cnt, etc.) not grabbed yet;
unlike dma_buf_fd(), fd_install() can't fail.
Fixes: db1a8dd916aa ("habanalabs: add support for dma-buf exporter")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
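The ordering problem in the habanalabs message above can be shown with a
small Python model. Nothing here is kernel API: FdTable, ExportedBuf and
both export functions are hypothetical, and the "racing" close is simply a
callback invoked at the point where another thread could run. The sketch
contrasts publishing the descriptor first with publishing it as the last
step.

```python
class FdTable:
    """Toy descriptor table (hypothetical, user-space only)."""

    def __init__(self):
        self._slots = {}
        self._next = 3

    def reserve(self):
        """Reserve a descriptor number; nothing is visible to other threads yet."""
        fd = self._next
        self._next += 1
        self._slots[fd] = None
        return fd

    def install(self, fd, obj):
        """Publish the object; from here on another thread may close it."""
        self._slots[fd] = obj

    def close(self, fd):
        obj = self._slots.pop(fd)
        if obj is not None:
            obj.release()


class ExportedBuf:
    """Object that is destroyed when its descriptor is closed."""

    def __init__(self):
        self.alive = True
        self.backing_ref = False

    def grab_backing_ref(self):
        if not self.alive:
            raise RuntimeError("use after free")
        self.backing_ref = True

    def release(self):
        self.alive = False


def export_install_first(table, racing_close):
    """Problematic order: publish, then keep touching the object."""
    buf = ExportedBuf()
    fd = table.reserve()
    table.install(fd, buf)
    racing_close(fd)          # another thread may close the fd right here
    buf.grab_backing_ref()    # touches an object that may already be gone
    return fd


def export_install_last(table, racing_close):
    """Fixed order: reserve, finish all setup, publish as the final step."""
    buf = ExportedBuf()
    fd = table.reserve()
    buf.grab_backing_ref()
    table.install(fd, buf)
    racing_close(fd)          # a close after this point is only a userland race
    return fd
```

With the close callback wired to the table, the first variant raises in
the model while the second completes, since it never touches the object
after publishing it.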
Jan Kara [Fri, 11 Jul 2025 16:32:03 +0000 (18:32 +0200)]
loop: Avoid updating block size under exclusive owner
Syzbot came up with a reproducer where a loop device block size is
changed underneath a mounted filesystem. This causes a mismatch between
the block device block size and the block size stored in the superblock
causing confusion in various places such as fs/buffer.c. The particular
issue triggered by syzbot was a warning in __getblk_slow() due to
requested buffer size not matching block device block size.
Fix the problem by getting exclusive hold of the loop device to change
its block size. This fails if somebody (such as filesystem) has already
an exclusive ownership of the block device and thus prevents modifying
the loop device under some exclusive owner which doesn't expect it.
Reported-by: syzbot+01ef7a8da81a975e1ccd@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Tested-by: syzbot+01ef7a8da81a975e1ccd@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/20250711163202.19623-2-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
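The exclusive-ownership rule from the loop message above can be sketched
in Python as follows. This is a model under stated assumptions, not the
block layer: BlockDevice, DeviceBusy, claim, release and set_block_size
are hypothetical names, and a single holder token stands in for an
exclusive owner such as a mounted filesystem.

```python
class DeviceBusy(Exception):
    """Raised when the device already has a different exclusive holder."""


class BlockDevice:
    """Toy block device with a single exclusive holder (hypothetical model)."""

    def __init__(self, block_size=512):
        self.block_size = block_size
        self.holder = None

    def claim(self, who):
        """Take exclusive ownership; fail if someone else already has it."""
        if self.holder is not None and self.holder is not who:
            raise DeviceBusy("device has an exclusive owner")
        self.holder = who

    def release(self, who):
        if self.holder is who:
            self.holder = None


def set_block_size(dev, new_size):
    """Change the block size only while holding the device exclusively."""
    claimant = object()   # a temporary, unique holder token
    dev.claim(claimant)   # raises before anything is modified
    try:
        dev.block_size = new_size
    finally:
        dev.release(claimant)
```

A caller that finds the device already claimed gets an error and the
block size is left untouched, which is the property the fix relies on.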
Ming Lei [Fri, 11 Jul 2025 08:30:09 +0000 (16:30 +0800)]
block: fix kobject leak in blk_unregister_queue
The kobject for the queue, `disk->queue_kobj`, is initialized with a
reference count of 1 via `kobject_init()` in `blk_register_queue()`.
While `kobject_del()` is called during the unregister path to remove
the kobject from sysfs, the initial reference is never released.
Add a call to `kobject_put()` in `blk_unregister_queue()` to properly
decrement the reference count and fix the leak.
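The leak described above comes down to reference counting, which a short
Python model can make concrete. The Kobject class and the two unregister
functions are hypothetical stand-ins, not the kernel's kobject API; the
only behaviour taken from the message is that initialisation leaves the
count at 1 and that deleting the object from sysfs does not drop that
reference.

```python
class Kobject:
    """Toy reference-counted object (not the kernel's struct kobject)."""

    def __init__(self):
        self.refcount = 1       # initialisation hands out the first reference
        self.in_sysfs = False
        self.released = False

    def add(self):
        self.in_sysfs = True

    def delete(self):
        """Remove from sysfs; this does not drop the initial reference."""
        self.in_sysfs = False

    def put(self):
        """Drop one reference and release the object when none are left."""
        self.refcount -= 1
        if self.refcount == 0:
            self.released = True


def unregister_leaky(kobj):
    """Only removes the sysfs entry, so the count stays at 1."""
    kobj.delete()


def unregister_fixed(kobj):
    """Removes the sysfs entry and drops the initial reference."""
    kobj.delete()
    kobj.put()
```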
Jakub Kicinski [Sat, 12 Jul 2025 00:50:26 +0000 (17:50 -0700)]
Merge tag 'batadv-next-pullrequest-20250710' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- batman-adv: store hard_iface as iflink private data,
by Matthias Schiffer
* tag 'batadv-next-pullrequest-20250710' of git://git.open-mesh.org/linux-merge:
batman-adv: store hard_iface as iflink private data
batman-adv: Start new development cycle
====================
Jakub Kicinski [Sat, 12 Jul 2025 00:33:06 +0000 (17:33 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
ice: cleanups and preparation for live migration
Jake Keller says:
Various cleanups and preparation to the ice driver code for supporting
SR-IOV live migration.
The logic for unpacking Rx queue context data is added. This is the inverse
of the existing packing logic. Thanks to <linux/packing.h> this is trivial
to add.
Code to enable both reading and writing the Tx queue context for a queue
over a shared hardware register interface is added. Thanks to ice_adapter,
this is locked across all PFs that need to use it, preventing concurrency
issues with multiple PFs.
The RSS hash configuration requested by a VF is cached within the VF
structure. This will be used to track and restore the same configuration
during migration load.
ice_sriov_set_msix_vec_count() is updated to use pci_iov_vf_id() instead of
open-coding a worse equivalent, and checks to avoid rebuilding MSI-X if the
current request is for the existing amount of vectors.
A new ice_get_vf_by_dev() helper function is added to simplify accessing a
VF from its PCI device structure. This will be used more heavily within the
live migration code itself.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: introduce ice_get_vf_by_dev() wrapper
ice: avoid rebuilding if MSI-X vector count is unchanged
ice: use pci_iov_vf_id() to get VF ID
ice: expose VF functions used by live migration
ice: move ice_vsi_update_l2tsel to ice_lib.c
ice: save RSS hash configuration for migration
ice: add functions to get and set Tx queue context
ice: add support for reading and unpacking Rx queue context
====================
Merge tag 'pci-v6.16-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI fixes from Bjorn Helgaas:
- Track apple Root Ports explicitly and look up the driver data from
the struct device instead of using dev->driver_data, which is used by
pci_host_common_init() for the generic host bridge pointer (Marc
Zyngier)
- Set dev->driver_data before pci_host_common_init() calls
gen_pci_init() because some drivers need it to set up ECAM mappings;
this fixes a regression on MicroChip MPFS Icicle (Geert Uytterhoeven)
- Revert the now-unnecessary use of ECAM pci_config_window.priv to
store a copy of dev->driver_data (Marc Zyngier)
* tag 'pci-v6.16-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
Revert "PCI: ecam: Allow cfg->priv to be pre-populated from the root port device"
PCI: host-generic: Set driver_data before calling gen_pci_init()
PCI: apple: Add tracking of probed root ports
Core Changes:
- fix race in gem_handle_create_tail
- fixup handle_count fb refcount regression from -rc5, popular with
reports ...
- call rust dtor for drm_device release
Driver Changes:
- nouveau: magic 50ms suspend fix, acpi leak fix
- tegra: dma api error in nvdec
- pvr: fix device reset
- habanalabs maintainer update
- intel display: fix some dsi mipi sequences
- xe fixes: SRIOV fixes, small GuC fixes, disable indirect ring due
to issues, compression fix for fragmented BO, doc update
* tag 'drm-fixes-2025-07-12' of https://gitlab.freedesktop.org/drm/kernel: (22 commits)
drm/xe/guc: Default log level to non-verbose
drm/xe/bmg: Don't use WA 16023588340 and 22019338487 on VF
drm/xe/guc: Recommend GuC v70.46.2 for BMG, LNL, DG2
drm/xe/pm: Correct comment of xe_pm_set_vram_threshold()
drm/xe: Release runtime pm for error path of xe_devcoredump_read()
drm/xe/pm: Restore display pm if there is error after display suspend
drm/i915/bios: Apply vlv_fixup_mipi_sequences() to v2 mipi-sequences too
drm/gem: Fix race in drm_gem_handle_create_tail()
drm/framebuffer: Acquire internal references on GEM handles
agp/amd64: Check AGP Capability before binding to unsupported devices
drm/xe/bmg: fix compressed VRAM handling
Revert "drm/xe/xe2: Enable Indirect Ring State support for Xe2"
drm/xe: Allocate PF queue size on pow2 boundary
drm/xe/pf: Clear all LMTT pages on alloc
drm/nouveau/gsp: fix potential leak of memory used during acpi init
rust: drm: remove unnecessary imports
MAINTAINERS: Change habanalabs maintainer
drm/imagination: Fix kernel crash when hard resetting the GPU
drm/tegra: nvdec: Fix dma_alloc_coherent error check
rust: drm: device: drop_in_place() the drm::Device in release()
...
I haven't figured out what the actual bug in this commit is, but I did
spend a lot of time chasing it down and eventually succeeded in
bisecting it down to this.
For some reason, this eventpoll commit ends up causing delays and stuck
user space processes, but it only happens on one of my machines, and
only during early boot or during the flurry of initial activity when
logging in.
I must be triggering some very subtle timing issue, but once I figured
out the behavior pattern that made it reasonably reliable to trigger, it
did bisect right to this, and reverting the commit fixes the problem.
Of course, that was only after I had failed at bisecting it several
times, and had flailed around blaming both the drm people and the
netlink people for the odd problems. The most obvious of which happened
at the time of the first graphical login (the most common symptom being
that some gnome app aborted due to a 30s timeout, often leading to the
whole session then failing if it was some critical component like
gnome-shell or similar).
Acked-by: Nam Cao <namcao@linutronix.de>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
net: ll_temac: Fix incorrect PHY node reference in debug message
In temac_probe(), the debug message intended to print the resolved
PHY node was mistakenly using the controller node temac_np
instead of the actual PHY node lp->phy_node. This patch corrects
the log to reference the correct device tree node.
====================
netdevsim: support setting a permanent address
Network management daemons that match on the device permanent address
currently have no virtual interface types to test against.
NetworkManager, in particular, has carried an out of tree patch to set
the permanent address on netdevsim devices to use in its CI for this
purpose.
This series adds support to netdevsim to set a permanent address on port
creation, and adds a test script to test setting and getting of the
different L2 address types.
selftests: net: add netdev-l2addr.sh for testing L2 address functionality
Add a new test script to the network selftests which tests getting and
setting of layer 2 addresses through netlink, including the newly added
support for setting a permaddr on netdevsim devices.
net: netdevsim: Support setting dev->perm_addr on port creation
Network management daemons that match on the device permanent address
currently have no virtual interface types to test against.
NetworkManager, in particular, has carried an out of tree patch to set
the permanent address on netdevsim devices to use in its CI for this
purpose.
To support this use case, support setting netdev->perm_addr when
creating a netdevsim port.
selftests: flip local/remote endpoints in iou-zcrx.py
The iou-zcrx selftest currently runs the server on the remote host
and the client on the local host. This commit flips the endpoints
such that server runs on localhost and client on remote.
This change brings the iou-zcrx selftest in line with the convention
used by other selftests.
Drive-by fix for a missing import exception that happens when the
network interface has less than 2 combined channels.
Test plan: ran iou-zcrx.py selftest between 2 physical machines
Edward Cree [Thu, 10 Jul 2025 17:32:13 +0000 (18:32 +0100)]
sfc: falcon: refactor and document ef4_ethtool_get_rxfh_fields
The code had some rather odd control flow inherited from when it was
shared with siena and ef10 before this driver was split out.
Simplify that for easier reading.
Also add a comment explaining why we return the values we do, since
some Falcon documents and datasheets confusingly mention the part
supporting 4-tuple UDP hashing.
(I couldn't find any record of exactly what was "broken" about the
original Falcon A hash, I'm just trusting that falcon_init_rx_cfg()
had a good reason for not using it.)
net: emaclite: Fix missing pointer increment in aligned_read()
Add missing post-increment operators for byte pointers in the
loop that copies remaining bytes in xemaclite_aligned_read().
Without the increment, the same byte was written repeatedly
to the destination.
This update aligns with xemaclite_aligned_write()
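The effect of the missing increments can be reproduced with a small Python
model of the tail-copy loop. The two functions below are illustrative
only: they use list indices in place of the byte pointers, and their names
are hypothetical rather than taken from the driver.

```python
def copy_tail_buggy(src, dst, length):
    """Copy loop with the increments missing: both positions stay at 0."""
    s = d = 0
    for _ in range(length):
        dst[d] = src[s]   # always writes src[0] to dst[0]
    return dst


def copy_tail_fixed(src, dst, length):
    """Copy loop that advances both positions after every byte."""
    s = d = 0
    for _ in range(length):
        dst[d] = src[s]
        s += 1
        d += 1
    return dst
```

Copying three trailing bytes with the first variant fills only the first
destination byte; the second variant copies all three.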
====================
net_sched: act: extend RCU use in dump() methods
We are trying to get away from central RTNL in favor of fine-grained
mutexes. While looking at net/sched, I found that act already uses
RCU in the fast path in most cases, and that RCU could also be used
in dump() methods.
This series is not complete and will be followed by a second one.
Eric Dumazet [Wed, 9 Jul 2025 09:01:57 +0000 (09:01 +0000)]
net_sched: act_ctinfo: use atomic64_t for three counters
Commit 21c167aa0ba9 ("net/sched: act_ctinfo: use percpu stats")
missed that stats_dscp_set, stats_dscp_error and stats_cpmark_set
might be written (and read) locklessly.
Use atomic64_t for these three fields, I doubt act_ctinfo is used
heavily on big SMP hosts anyway.
Fixes: 24ec483cec98 ("net: sched: Introduce act_ctinfo action")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pedro Tammela <pctammela@mojatatu.com>
Link: https://patch.msgid.link/20250709090204.797558-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 9 Jul 2025 20:59:10 +0000 (13:59 -0700)]
eth: fbnic: fix ubsan complaints about OOB accesses
UBSAN complains that we reach beyond the end of the log entry:
UBSAN: array-index-out-of-bounds in drivers/net/ethernet/meta/fbnic/fbnic_fw_log.c:94:50
index 71 is out of range for type 'char [*]'
Call Trace:
<TASK>
ubsan_epilogue+0x5/0x2b
fbnic_fw_log_write+0x120/0x960
fbnic_fw_parse_logs+0x161/0x210
We're just taking the address of the character after the array,
so this really seems like something that should be legal.
But whatever, easy enough to silence by doing direct pointer math.
Fixes: c2b93d6beca8 ("eth: fbnic: Create ring buffer for firmware logs")
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250709205910.3107691-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
virtio_net: simplify tx queue wake condition check
Consolidate the two nested if conditions for checking tx queue wake
conditions into a single combined condition. This improves code
readability without changing functionality. Also move netif_tx_wake_queue
into the if condition to reduce unnecessary checks for queue stops.
William Liu [Tue, 8 Jul 2025 16:44:05 +0000 (16:44 +0000)]
selftests/tc-testing: Add tests for restrictions on netem duplication
Ensure that a duplicating netem cannot exist in a tree with other netems
in both qdisc addition and change. This is meant to prevent the soft
lockup and OOM loop scenario discussed in [1]. Also adjust a HFSC's
re-entrancy test case with netem for this new restriction - KASAN
still triggers upon its failure.
Signed-off-by: William Liu <will@willsroot.io>
Reviewed-by: Savino Dicanosa <savy@syst3mfailure.io>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20250708164219.875521-1-will@willsroot.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
William Liu [Tue, 8 Jul 2025 16:43:26 +0000 (16:43 +0000)]
net/sched: Restrict conditions for adding duplicating netems to qdisc tree
netem_enqueue's duplication prevention logic breaks when a netem
resides in a qdisc tree with other netems - this can lead to a
soft lockup and OOM loop in netem_dequeue, as seen in [1].
Ensure that a duplicating netem cannot exist in a tree with other
netems.
Previous approaches suggested in discussions in chronological order:
1) Track duplication status or ttl in the sk_buff struct. Considered
too specific a use case to extend such a struct, though this would
be a resilient fix and address other previous and potential future
DOS bugs like the one described in loopy fun [2].
2) Restrict netem_enqueue recursion depth like in act_mirred with a
per cpu variable. However, netem_dequeue can call enqueue on its
child, and the depth restriction could be bypassed if the child is a
netem.
3) Use the same approach as in 2, but add metadata in netem_skb_cb
to handle the netem_dequeue case and track a packet's involvement
in duplication. This is an overly complex approach, and Jamal
notes that the skb cb can be overwritten to circumvent this
safeguard.
4) Prevent the addition of a netem to a qdisc tree if its ancestral
path contains a netem. However, filters and actions can cause a
packet to change paths when re-enqueued to the root from netem
duplication, leading us to the current solution: prevent a
duplicating netem from inhabiting the same tree as other netems.
Fixes: 0afb51e72855 ("[PKT_SCHED]: netem: reinsert for duplication")
Reported-by: William Liu <will@willsroot.io>
Reported-by: Savino Dicanosa <savy@syst3mfailure.io>
Signed-off-by: William Liu <will@willsroot.io>
Signed-off-by: Savino Dicanosa <savy@syst3mfailure.io>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20250708164141.875402-1-will@willsroot.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
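The restriction the netem message above settles on (a duplicating netem
may not share a tree with any other netem) can be expressed as a small
tree check. The Python below is a sketch only: the Qdisc class, the walk
helper and tree_is_allowed are hypothetical and do not mirror how
net/sched walks or validates a qdisc hierarchy.

```python
class Qdisc:
    """Toy qdisc node (hypothetical; only the fields the check needs)."""

    def __init__(self, kind, duplicates=False, children=()):
        self.kind = kind
        self.duplicates = duplicates   # True for a netem with duplication enabled
        self.children = list(children)


def walk(root):
    """Yield every qdisc in the tree rooted at `root`."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(node.children)


def tree_is_allowed(root):
    """Reject a tree that mixes a duplicating netem with any other netem."""
    netems = [q for q in walk(root) if q.kind == "netem"]
    has_duplicating = any(q.duplicates for q in netems)
    return not (has_duplicating and len(netems) > 1)
```

Because the check looks at the whole tree rather than only the ancestral
path, it is not affected by filters or actions steering a re-enqueued
packet onto a different branch, which is the weakness noted for
approach 4.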
Jason Devers [Thu, 12 Dec 2024 15:47:53 +0000 (10:47 -0500)]
rust: sync: Add #[must_use] to Lock::try_lock()
The `Lock::try_lock()` function returns an `Option<Guard<...>>`, but it
currently does not issue a warning if the return value is unused.
To avoid potential bugs, the `#[must_use]` annotation is added to ensure
proper usage.
Note that `T` is `#[must_use]` but `Option<T>` is not.
For more context, see: https://github.com/rust-lang/rust/issues/71368.
locking/mutex: Mark devm_mutex_init() as __must_check
devm_mutex_init() can fail. With CONFIG_DEBUG_MUTEXES=y the mutex will be
marked as unusable and trigger errors on usage.
Enforce all callers check the return value through the compiler.
As devm_mutex_init() itself is a macro, it can not be annotated
directly. Annotate __devm_mutex_init() instead.
Unfortunately __must_check/warn_unused_result don't propagate through
statement expressions. So move the statement expression into the argument
list of the call to __devm_mutex_init() through a helper macro.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20250617-must_check-devm_mutex_init-v7-3-d9e449f4d224@weissschuh.net
Alex Williamson [Thu, 26 Jun 2025 22:56:18 +0000 (16:56 -0600)]
vfio/pci: Separate SR-IOV VF dev_set
In the below noted Fixes commit we introduced a reflck mutex to allow
better scaling between devices for open and close. The reflck was
based on the hot reset granularity, device level for root bus devices
which cannot support hot reset or bus/slot reset otherwise. Overlooked
in this were SR-IOV VFs, where there's also no bus reset option, but
the default for a non-root-bus, non-slot-based device is bus level
reflck granularity.
The reflck mutex has since become the dev_set mutex (via commit
2cd8b14aaa66 ("vfio/pci: Move to the device set infrastructure")) and
is our de facto serialization for various operations and ioctls. It
still seems to be the case though that sets of vfio-pci devices really
only need serialization relative to hot resets affecting the entire
set, which is not relevant to SR-IOV VFs. As described in the Closes
link below, this serialization contributes to startup latency when
multiple VFs sharing the same "bus" are opened concurrently.
Mark the device itself as the basis of the dev_set for SR-IOV VFs.
Reported-by: Aaron Lewis <aaronlewis@google.com>
Closes: https://lore.kernel.org/all/20250626180424.632628-1-aaronlewis@google.com
Tested-by: Aaron Lewis <aaronlewis@google.com>
Fixes: e309df5b0c9e ("vfio/pci: Parallelize device open and release")
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20250626225623.1180952-1-alex.williamson@redhat.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Eric Biggers [Sun, 6 Jul 2025 23:11:00 +0000 (16:11 -0700)]
lib/crypto: x86/poly1305: Fix performance regression on short messages
Restore the len >= 288 condition on using the AVX implementation, which
was incidentally removed by commit 318c53ae02f2 ("crypto: x86/poly1305 -
Add block-only interface"). This check took into account the overhead
in key power computation, kernel-mode "FPU", and tail handling
associated with the AVX code. Indeed, restoring this check slightly
improves performance for len < 256 as measured using poly1305_kunit on
an "AMD Ryzen AI 9 365" (Zen 5) CPU:
While the optimal threshold for this CPU might be slightly lower than
288 (see the len == 256 case), other CPUs would need to be tested too,
and these sorts of benchmarks can underestimate the true cost of
kernel-mode "FPU". Therefore, for now just restore the 288 threshold.
Eric Biggers [Sun, 6 Jul 2025 23:10:59 +0000 (16:10 -0700)]
lib/crypto: x86/poly1305: Fix register corruption in no-SIMD contexts
Restore the SIMD usability check and base conversion that were removed
by commit 318c53ae02f2 ("crypto: x86/poly1305 - Add block-only
interface").
This safety check is cheap and is well worth eliminating a footgun.
While the Poly1305 functions should not be called when SIMD registers
are unusable, if they are anyway, they should just do the right thing
instead of corrupting random tasks' registers and/or computing incorrect
MACs. Fixing this is also needed for poly1305_kunit to pass.
Just use irq_fpu_usable() instead of the original crypto_simd_usable(),
since poly1305_kunit won't rely on crypto_simd_disabled_for_test.
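The two x86 checks restored by this commit and the preceding one (the
len >= 288 threshold and the SIMD usability test) amount to a dispatch
decision, sketched here in Python. The function choose_impl, the constant
name AVX_MIN_LEN and the labels "avx" and "scalar" are assumptions for
illustration; the order of the two tests is also a guess, and the real
code selects between assembly routines rather than returning strings.

```python
AVX_MIN_LEN = 288  # threshold quoted in the preceding commit message


def choose_impl(length, simd_usable):
    """Return which code path a call of `length` bytes would take."""
    if not simd_usable:
        return "scalar"   # never touch SIMD registers when they are unusable
    if length < AVX_MIN_LEN:
        return "scalar"   # setup overhead outweighs the gain for short input
    return "avx"
```

For example, choose_impl(287, True) and choose_impl(4096, False) both fall
back to the non-SIMD path, while choose_impl(288, True) takes the AVX one.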
Eric Biggers [Sun, 6 Jul 2025 23:10:58 +0000 (16:10 -0700)]
lib/crypto: arm64/poly1305: Fix register corruption in no-SIMD contexts
Restore the SIMD usability check that was removed by commit a59e5468a921
("crypto: arm64/poly1305 - Add block-only interface").
This safety check is cheap and is well worth eliminating a footgun.
While the Poly1305 functions should not be called when SIMD registers
are unusable, if they are anyway, they should just do the right thing
instead of corrupting random tasks' registers and/or computing incorrect
MACs. Fixing this is also needed for poly1305_kunit to pass.
Just use may_use_simd() instead of the original crypto_simd_usable(),
since poly1305_kunit won't rely on crypto_simd_disabled_for_test.
Eric Biggers [Sun, 6 Jul 2025 23:10:57 +0000 (16:10 -0700)]
lib/crypto: arm/poly1305: Fix register corruption in no-SIMD contexts
Restore the SIMD usability check that was removed by commit 773426f4771b
("crypto: arm/poly1305 - Add block-only interface").
This safety check is cheap and is well worth eliminating a footgun.
While the Poly1305 functions should not be called when SIMD registers
are unusable, if they are anyway, they should just do the right thing
instead of corrupting random tasks' registers and/or computing incorrect
MACs. Fixing this is also needed for poly1305_kunit to pass.
Just use may_use_simd() instead of the original crypto_simd_usable(),
since poly1305_kunit won't rely on crypto_simd_disabled_for_test.
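The three Poly1305 fixes above share one pattern: check whether SIMD registers are usable and fall back to portable code when they are not. A minimal userspace sketch of that dispatch follows. Every name in it is hypothetical: `simd_usable_sketch` stands in for irq_fpu_usable() / may_use_simd(), and a toy byte sum stands in for the real Poly1305 block functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for irq_fpu_usable() / may_use_simd(). */
static bool simd_usable_sketch;

/* Counts SIMD-path invocations so the dispatch decision is observable. */
static unsigned int simd_calls_sketch;

/* Portable fallback; a toy byte sum, NOT real Poly1305. */
static uint32_t blocks_generic_sketch(const uint8_t *src, size_t len)
{
    uint32_t acc = 0;

    for (size_t i = 0; i < len; i++)
        acc += src[i];
    return acc;
}

/* Stand-in for the assembly routine that would clobber SIMD registers. */
static uint32_t blocks_simd_sketch(const uint8_t *src, size_t len)
{
    simd_calls_sketch++;
    return blocks_generic_sketch(src, len);
}

/* Dispatch: only touch SIMD state when the context allows it. */
static uint32_t poly1305_blocks_sketch(const uint8_t *src, size_t len)
{
    if (!simd_usable_sketch)
        return blocks_generic_sketch(src, len);
    return blocks_simd_sketch(src, len);
}
```

Both paths must produce the same result, so a caller in a no-SIMD context gets a correct MAC instead of corrupted registers.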
Jacob Pan [Wed, 18 Jun 2025 23:46:18 +0000 (16:46 -0700)]
vfio: Prevent open_count decrement to negative
When vfio_df_close() is called with open_count=0, it triggers a warning in
vfio_assert_device_open() but still decrements open_count to -1. This allows
a subsequent open to incorrectly pass the open_count == 0 check, leading to
unintended behavior, such as setting df->access_granted = true.
For example, running an IOMMUFD compat no-IOMMU device with VFIO tests
(https://github.com/awilliam/tests/blob/master/vfio-noiommu-pci-device-open.c)
results in a warning and a failed VFIO_GROUP_GET_DEVICE_FD ioctl on the first
run, but the second run succeeds incorrectly.
Add checks to avoid decrementing open_count below zero.
Fixes: 05f37e1c03b6 ("vfio: Pass struct vfio_device_file * to vfio_device_open/close()")
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250618234618.1910456-2-jacob.pan@linux.microsoft.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
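The guard described above can be sketched in userspace as follows. This is an illustration under assumptions, not the actual vfio code: `struct device_sketch` and `device_close_sketch()` are hypothetical, and a signed counter is used so that the "-1" state from the commit message is representable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device's open bookkeeping. */
struct device_sketch {
    int open_count;
};

/*
 * Returns false on an unbalanced close. The kernel would also emit a
 * warning at this point; the important part is that the counter is
 * left untouched instead of dropping below zero.
 */
static bool device_close_sketch(struct device_sketch *dev)
{
    if (dev->open_count <= 0)
        return false;
    dev->open_count--;
    return true;
}
```

With the early return in place, a later open still sees a count of zero rather than a stale negative value.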
Jacob Pan [Wed, 18 Jun 2025 23:46:17 +0000 (16:46 -0700)]
vfio: Fix unbalanced vfio_df_close call in no-iommu mode
For devices with no-iommu enabled in IOMMUFD VFIO compat mode, the group open
path skips vfio_df_open(), leaving open_count at 0. This causes a warning in
vfio_assert_device_open(device) when vfio_df_close() is called during group
close.
The correct behavior is to skip only the IOMMUFD bind in the device open path
for no-iommu devices. Commit 6086efe73498 omitted vfio_df_open() entirely,
which was too broad. This patch restores the previous behavior, ensuring
that vfio_df_open() is called in the group open path.
Fixes: 6086efe73498 ("vfio-iommufd: Move noiommu compat validation out of vfio_iommufd_bind()")
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jacob Pan <jacob.pan@linux.microsoft.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20250618234618.1910456-1-jacob.pan@linux.microsoft.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Fragments aren't limited by Z_EROFS_PCLUSTER_MAX_DSIZE. However, if
a fragment's logical length is larger than Z_EROFS_PCLUSTER_MAX_DSIZE
but the fragment is not the whole inode, it currently returns
-EOPNOTSUPP because m_flags has the wrong EROFS_MAP_ENCODED flag set.
It is not intended by design but should be rare, as it can only be
reproduced by mkfs with `-Eall-fragments` in a specific case.
Let's normalize fragment m_flags using the new EROFS_MAP_FRAGMENT.
Lucas De Marchi [Thu, 26 Jun 2025 21:25:53 +0000 (14:25 -0700)]
drm/xe: Normalize default param values
Document xe module params with the default values following a similar
strategy for all of them:
1) Define a DEFAULT_* macro with the default value. When the
value can't be directly stringified, also define a *_STR
variant
2) Use __stringify() or the _STR variant to make sure the
default value shows up in the param description
This allows us to show the correct default according to the
configuration. max_vfs for example was wrongly documented for
CONFIG_DRM_XE_DEBUG and svm_notifier_size didn't have its default
documented.
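The two-step scheme above can be illustrated with plain C preprocessor macros. This is a sketch only: the parameter names, default values and `CONFIG_DEBUG_SKETCH` switch are invented for illustration, and `STRINGIFY_SKETCH()` is a userspace stand-in for the kernel's __stringify().

```c
#include <assert.h>
#include <string.h>

/* Two-level expansion so the macro's value, not its name, is stringified. */
#define STRINGIFY_1_SKETCH(x) #x
#define STRINGIFY_SKETCH(x) STRINGIFY_1_SKETCH(x)

/* 1) A DEFAULT_* macro whose value may depend on the configuration. */
#ifdef CONFIG_DEBUG_SKETCH
#define DEFAULT_QUEUE_DEPTH_SKETCH 128
#else
#define DEFAULT_QUEUE_DEPTH_SKETCH 32
#endif

/* A default that does not stringify usefully gets a *_STR variant. */
#define DEFAULT_TIMEOUT_SKETCH (~0u)
#define DEFAULT_TIMEOUT_SKETCH_STR "unlimited"

/* 2) The description is built from the same macro as the default. */
static const char queue_depth_desc_sketch[] =
    "Queue depth (default: " STRINGIFY_SKETCH(DEFAULT_QUEUE_DEPTH_SKETCH) ")";

static const char timeout_desc_sketch[] =
    "Timeout (default: " DEFAULT_TIMEOUT_SKETCH_STR ")";
```

Because the description string is derived from the macro, the documented default cannot drift from the real one when the configuration changes.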
Lucas De Marchi [Thu, 10 Jul 2025 21:34:41 +0000 (14:34 -0700)]
drm/xe/migrate: Fix alignment check
The check would fail if the address is unaligned, but not when
accounting the offset. Instead of `buf | offset` it should have
been `buf + offset`. To make it more readable and also drop the
uintptr_t, just use the IS_ALIGNED() macro.
Fixes: 270172f64b11 ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access")
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250710-migrate-aligned-v1-1-44003ef3c078@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
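The difference between `buf | offset` and `buf + offset` is easy to see with concrete numbers. The sketch below is a userspace illustration, not the xe code: `IS_ALIGNED_SKETCH()` mimics the kernel's IS_ALIGNED() for power-of-two alignments, and the helper names and values are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for IS_ALIGNED(); 'a' must be a power of two. */
#define IS_ALIGNED_SKETCH(x, a) ((((uintptr_t)(x)) & ((uintptr_t)(a) - 1)) == 0)

/* Old-style check: true only if buf AND offset are each aligned. */
static bool aligned_with_or_sketch(uintptr_t buf, uintptr_t offset,
                                   uintptr_t align)
{
    return IS_ALIGNED_SKETCH(buf | offset, align);
}

/* Intended check: is the effective address buf + offset aligned? */
static bool aligned_with_sum_sketch(uintptr_t buf, uintptr_t offset,
                                    uintptr_t align)
{
    return IS_ALIGNED_SKETCH(buf + offset, align);
}
```

For example, with buf = 0x1004, offset = 0x4 and an 8-byte alignment, the effective address 0x1008 is aligned, but `buf | offset` is 0x1004 and the OR-based check reports it as unaligned.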
Ian Rogers [Thu, 10 Jul 2025 23:51:25 +0000 (16:51 -0700)]
perf python: Improve leader copying from evlist
The struct pyrf_evlist embeds the evlist, which requires copying evsels
from sources such as parsed events. The copying logic handles the case
where an evsel is its own leader, but if the group leader is a different
event in the list, the copied evsel ends up pointing at the evsel in the
list it was copied from, which is wrong. Fix this by adding another pass
over the evlist that rewrites leaders, simplified by the introduction of
two evlist helpers.
Ian Rogers [Thu, 10 Jul 2025 23:51:24 +0000 (16:51 -0700)]
perf python: Correct pyrf_evsel__read for tool PMUs
Tool PMUs assume that stat's process_counter_values is being used to
read the counters. Specifically they hold onto old values in
evsel->prev_raw_counts and give the cumulative count based off of this
value. Update pyrf_evsel__read to allocate counts and prev_raw_counts,
use evsel__read_counter rather than perf_evsel__read so that tool PMUs,
not just perf_event_open events, are read, and make the returned
pyrf_counts_values contain the delta value rather than the cumulative
value.
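The cumulative-versus-delta distinction can be sketched generically. The code below is not perf's implementation: `struct counter_sketch` and `read_delta_sketch()` are hypothetical, and only the idea of remembering the previous raw value (as evsel->prev_raw_counts does) is taken from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-counter state holding the last raw value seen. */
struct counter_sketch {
    uint64_t prev_raw;
};

/*
 * Given the counter's current cumulative value, return how much it
 * advanced since the previous read and remember the new raw value.
 */
static uint64_t read_delta_sketch(struct counter_sketch *c,
                                  uint64_t cumulative)
{
    uint64_t delta = cumulative - c->prev_raw;

    c->prev_raw = cumulative;
    return delta;
}
```

Reading a counter whose cumulative value goes 100, 250, 250 would yield deltas of 100, 150 and 0.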
Ian Rogers [Thu, 10 Jul 2025 23:51:22 +0000 (16:51 -0700)]
perf python: In str(evsel) use the evsel__pmu_name helper
The evsel__pmu_name helper will internally use evsel__find_pmu that
handles legacy events, extended types, etc. in determining a PMU and
will provide a better value than just trying to access the PMU's name
directly as the PMU may not have been computed.
Ian Rogers [Thu, 10 Jul 2025 23:51:21 +0000 (16:51 -0700)]
perf jevents: If the long_desc and desc are identical then drop the long_desc
If the short and long descriptions are the same then save space and
don't store both of them. When storing the desc in the perf_pmu_alias,
don't duplicate the desc into the long_desc.
By avoiding storing the duplicate the size of the events string in the
binary on x86 is reduced by 29,840 bytes.
Ian Rogers [Thu, 10 Jul 2025 23:51:20 +0000 (16:51 -0700)]
perf expr: Accumulate rather than replace in the context counts
Metrics will fill in the context to have mappings from an event to a
count. When counts are added they replace existing mappings which
generally shouldn't exist with aggregation. Switch to accumulating to
better support cases where perf stat's aggregation isn't used and we
may see a counter more than once.
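A minimal sketch of accumulate-on-insert, assuming a tiny fixed-size table in place of perf's real hashmap-based context. All names here are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_IDS_SKETCH 8

/* Hypothetical stand-in for the expression parse context. */
struct ctx_sketch {
    const char *ids[MAX_IDS_SKETCH];
    double vals[MAX_IDS_SKETCH];
    int nr;
};

/* Add a count for an event id; returns 0 on success, -1 if full. */
static int ctx_add_sketch(struct ctx_sketch *ctx, const char *id, double val)
{
    for (int i = 0; i < ctx->nr; i++) {
        if (strcmp(ctx->ids[i], id) == 0) {
            /* Accumulate; the old behavior corresponds to '='. */
            ctx->vals[i] += val;
            return 0;
        }
    }
    if (ctx->nr == MAX_IDS_SKETCH)
        return -1;
    ctx->ids[ctx->nr] = id;
    ctx->vals[ctx->nr] = val;
    ctx->nr++;
    return 0;
}

/* Look up the accumulated count; returns 0.0 for unknown ids. */
static double ctx_get_sketch(const struct ctx_sketch *ctx, const char *id)
{
    for (int i = 0; i < ctx->nr; i++) {
        if (strcmp(ctx->ids[i], id) == 0)
            return ctx->vals[i];
    }
    return 0.0;
}
```

Seeing the same counter twice, for instance once per CPU when aggregation is not used, now sums the two values instead of keeping only the last one.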
Ian Rogers [Thu, 10 Jul 2025 23:51:19 +0000 (16:51 -0700)]
perf stat: Move metric list from config to evlist
The rblist of metric_event entries, each of which has a list of
associated metric_expr, is moved out of the stat_config and into the
evlist. This is done as part of refactoring things for python, since
having the state split in two places complicates that implementation.
The evlist is doing the harder work of enabling and disabling events,
the metrics are needed to compute a value, and it doesn't seem
unreasonable to hang them from the evlist.
Ian Rogers [Thu, 10 Jul 2025 23:51:18 +0000 (16:51 -0700)]
perf metricgroup: Factor out for-each function and move out printing
Factor metricgroup__for_each_metric into its own function handling
regular and sys metrics. Make the metric adding and printing code use
it, move the printing code into print-events files.
Ian Rogers [Thu, 10 Jul 2025 23:51:17 +0000 (16:51 -0700)]
perf pmu: Tolerate failure to read the type for wellknown PMUs
If sysfs isn't mounted then we may fail to read a PMU's type. In this
situation, fall back to a lookup of well-known types. This only applies
to software, tracepoint and breakpoint PMUs.
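The fallback can be sketched as a small name-to-type table. The function names below are hypothetical and the real perf code is structured differently; the numeric values are those of PERF_TYPE_SOFTWARE, PERF_TYPE_TRACEPOINT and PERF_TYPE_BREAKPOINT in the perf_event UAPI header.

```c
#include <assert.h>
#include <string.h>

/* Values mirror enum perf_type_id from the perf_event UAPI header. */
enum {
    TYPE_SOFTWARE_SKETCH = 1,
    TYPE_TRACEPOINT_SKETCH = 2,
    TYPE_BREAKPOINT_SKETCH = 5,
};

/* Returns the well-known type for a PMU name, or -1 if there is none. */
static int wellknown_pmu_type_sketch(const char *name)
{
    if (strcmp(name, "software") == 0)
        return TYPE_SOFTWARE_SKETCH;
    if (strcmp(name, "tracepoint") == 0)
        return TYPE_TRACEPOINT_SKETCH;
    if (strcmp(name, "breakpoint") == 0)
        return TYPE_BREAKPOINT_SKETCH;
    return -1;
}

/*
 * sysfs_type is the value read from sysfs, or negative when the read
 * failed (for example because sysfs is not mounted).
 */
static int pmu_type_sketch(const char *name, int sysfs_type)
{
    if (sysfs_type >= 0)
        return sysfs_type;
    return wellknown_pmu_type_sketch(name);
}
```

A PMU such as "cpu" has no fixed type number, so a failed sysfs read for it still results in an error.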
Ian Rogers [Thu, 10 Jul 2025 23:51:14 +0000 (16:51 -0700)]
perf hwmon_pmu: Avoid shortening hwmon PMU name
Long names like ucsi_source_psy_USBC000:001 when prefixed with hwmon_
exceed the buffer size and the last digit is lost. This causes
confusion with similar names like ucsi_source_psy_USBC000:002. Extend
the buffer size to avoid this.
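The truncation is easy to reproduce with snprintf(). In the sketch below the 33-byte buffer is a hypothetical size chosen so that exactly one character is lost; the real buffer size in perf is not stated above. The helper name is likewise invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "hwmon_<name>" into buf. Returns true if the whole name fit,
 * false if snprintf() had to truncate it.
 */
static bool hwmon_pmu_name_sketch(char *buf, size_t size, const char *name)
{
    int len = snprintf(buf, size, "hwmon_%s", name);

    return len >= 0 && (size_t)len < size;
}
```

"hwmon_ucsi_source_psy_USBC000:001" is 33 characters long, so it needs 34 bytes including the terminator. In a 33-byte buffer the final digit is dropped and the :001 and :002 names become indistinguishable, while a larger buffer keeps them apart.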
Cross-merge networking fixes after downstream PR (net-6.16-rc6-2).
No conflicts.
Adjacent changes:
drivers/net/wireless/mediatek/mt76/mt7925/mcu.c
  c701574c5412 ("wifi: mt76: mt7925: fix invalid array index in ssid assignment during hw scan")
  b3a431fe2e39 ("wifi: mt76: mt7925: fix off by one in mt7925_mcu_hw_scan()")
drivers/net/wireless/mediatek/mt76/mt7996/mac.c
  62da647a2b20 ("wifi: mt76: mt7996: Add MLO support to mt7996_tx_check_aggr()")
  dc66a129adf1 ("wifi: mt76: add a wrapper for wcid access with validation")
drivers/net/wireless/mediatek/mt76/mt7996/main.c
  3dd6f67c669c ("wifi: mt76: Move RCU section in mt7996_mcu_add_rate_ctrl()")
  8989d8e90f5f ("wifi: mt76: mt7996: Do not set wcid.sta to 1 in mt7996_mac_sta_event()")
net/mac80211/cfg.c
  58fcb1b4287c ("wifi: mac80211: reject VHT opmode for unsupported channel widths")
  037dc18ac3fb ("wifi: mac80211: add support for storing station S1G capabilities")
Nam Cao [Fri, 11 Jul 2025 11:20:42 +0000 (13:20 +0200)]
objtool: Add vpanic() to the noreturn list
vpanic() does not return. However, objtool doesn't know this and gets
confused:
kernel/trace/rv/reactor_panic.o: warning: objtool: rv_panic_reaction(): unexpected end of section .text
Add vpanic() to the list of noreturn functions.
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/073f826ebec18b2bb59cba88606cd865d8039fd2.1752232374.git.namcao@linutronix.de
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202507110826.2ekbVdWZ-lkp@intel.com/
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
pinmux: fix race causing mux_owner NULL with active mux_usecount
Commit 5a3e85c3c397 ("pinmux: Use sequential access to access
desc->pinmux data") tried to address the case where two clients of the
same GPIO call pinctrl_select_state() for the same functionality, which
resulted in a NULL pointer dereference while accessing desc->mux_owner.
However, the issue was not completely fixed due to the way it was
handled, and the same NULL pointer dereference can still occur.
The issue occurs due to the following interleaving:
This sequence leads to a state where the pin appears to be in use
(`mux_usecount == 1`) but has no owner (`mux_owner == NULL`), which can
cause a NULL pointer dereference on the next pin_request on the same pin.
Ensure that updates to mux_usecount and mux_owner are performed
atomically under the same lock. Only clear mux_owner when mux_usecount
reaches zero and no new owner has been assigned.
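The locking rule above can be sketched in userspace with a pthread mutex. This is only an illustration of keeping the use count and the owner consistent under one lock: `struct pin_sketch` and both helpers are hypothetical and do not mirror the pinctrl core's actual structures or locking primitives.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the per-pin mux state. */
struct pin_sketch {
    pthread_mutex_t lock;
    unsigned int mux_usecount;
    const char *mux_owner;
};

/* Returns 0 on success, -1 if another owner already holds the pin. */
static int pin_request_sketch(struct pin_sketch *pin, const char *owner)
{
    int ret = 0;

    pthread_mutex_lock(&pin->lock);
    if (pin->mux_usecount && strcmp(pin->mux_owner, owner) != 0) {
        ret = -1;
    } else {
        /* Count and owner change together, under the same lock. */
        pin->mux_usecount++;
        pin->mux_owner = owner;
    }
    pthread_mutex_unlock(&pin->lock);
    return ret;
}

static void pin_free_sketch(struct pin_sketch *pin)
{
    pthread_mutex_lock(&pin->lock);
    /* Clear the owner only when the last user goes away. */
    if (pin->mux_usecount && --pin->mux_usecount == 0)
        pin->mux_owner = NULL;
    pthread_mutex_unlock(&pin->lock);
}
```

Because both fields are only ever modified inside the same critical section, no other thread can observe a non-zero use count together with a NULL owner.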
nouveau_drm_ioctl() only checks the _IOC_NR() bits in the
DRM_NOUVEAU_NVIF command, but ignores the type and direction bits, so any
command with '7' in the low eight bits gets passed into
nouveau_abi16_ioctl() instead of drm_ioctl().
Check for all the bits except the size that is handled inside of the
handler.
Fixes: 27111a23d01c ("drm/nouveau: expose the full object/event interfaces to userspace")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[ Fix up two checkpatch warnings and a typo. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://lore.kernel.org/r/20250711072458.2665325-1-arnd@kernel.org
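The difference between matching on the command number alone and matching on everything except the size can be sketched as follows. The field layout is modeled on asm-generic/ioctl.h (8-bit number, 8-bit type, 14-bit size, 2-bit direction), and the macro names, helper names and command values are all made up for illustration rather than taken from the nouveau driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field layout modeled on asm-generic/ioctl.h. */
#define NR_MASK_SKETCH    0xffu
#define TYPE_SHIFT_SKETCH 8
#define SIZE_SHIFT_SKETCH 16
#define DIR_SHIFT_SKETCH  30
#define SIZE_MASK_SKETCH  (0x3fffu << SIZE_SHIFT_SKETCH)

#define IOC_SKETCH(dir, type, nr, size)              \
    (((uint32_t)(dir) << DIR_SHIFT_SKETCH) |         \
     ((uint32_t)(size) << SIZE_SHIFT_SKETCH) |       \
     ((uint32_t)(type) << TYPE_SHIFT_SKETCH) |       \
     (uint32_t)(nr))

/* Old-style check: only the command number is compared. */
static bool match_nr_only_sketch(uint32_t cmd, uint32_t want)
{
    return (cmd & NR_MASK_SKETCH) == (want & NR_MASK_SKETCH);
}

/* Stricter check: number, type and direction must match; size may vary. */
static bool match_all_but_size_sketch(uint32_t cmd, uint32_t want)
{
    return ((cmd ^ want) & ~SIZE_MASK_SKETCH) == 0;
}
```

A command from an unrelated ioctl family that happens to share the same number passes the first check but is rejected by the second, while a differently sized variant of the intended command is still accepted and left for the handler to validate.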
====================
Move attach_type into bpf_link
Andrii suggested moving the attach_type into bpf_link, the previous discussion
is as follows:
https://lore.kernel.org/bpf/CAEf4BzY7TZRjxpCJM-+LYgEqe23YFj5Uv3isb7gat2-HU4OSng@mail.gmail.com
Patch 1 adds attach_type to bpf_link and passes it to bpf_link_init(),
which initializes the attach_type field.
Patches 2-7 remove attach_type from the struct bpf_xx_link variants and
update the link info from the bpf_link attach_type.
Some functions eventually call bpf_link_init() but either have no bpf_attr
from userspace or do not need to initialize attach_type from userspace, such
as bpf_raw_tracepoint_open(); these now use prog->expected_attach_type to
initialize attach_type.