git.ipfire.org Git - thirdparty/kernel/linux.git/log

ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7 14IRH8

The Lenovo Yoga Pro 7 14IRH8 (ALC287 codec, subsystem ID 0x17aa:0x38b1)
has bass speakers on pin 0x17 that are not routed through a DAC with
volume control. This causes the bass speakers to play at full volume
regardless of the volume slider position.

Apply ALC287_FIXUP_YOGA9_14IAP7_BASS_SPK_PIN which corrects the DAC
routing for pin 0x17, enabling proper volume control. This is the same
fix used for other Yoga Pro 7 models with identical audio topology
(14APH8, 14AHP9, 14ASP10, 14IAH10).

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217949
Co-developed-by: Felix Aljoscha Schnuell <felix.aljoscha.schnuell@stud.uni-hannover.de>
Signed-off-by: Felix Aljoscha Schnuell <felix.aljoscha.schnuell@stud.uni-hannover.de>
Signed-off-by: Moritz Baron <moritz.baron@stud.uni-hannover.de>
Link: https://patch.msgid.link/20260609141648.60608-1-moritz.baron@stud.uni-hannover.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>

iomap: pass the correct len to fserror_report_io in __iomap_write_begin

len is size of the (larger) write request, plen is the range for which
the read failed here.

Fixes: a9d573ee88af ("iomap: report file I/O errors to the VFS")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260610050642.1906695-1-hch@lst.de
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

m68k: Correct CONFIG_MVME16x macro name in #endif comment

A comment in arch/m68k/kernel/head.S incorrectly refers to
CONFIG_MVME162 and CONFIG_MVME167 instead of CONFIG_MVME16x. Correct it.

Discovered while searching for CONFIG_* symbols referenced in code but
not defined in any Kconfig file.

Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://patch.msgid.link/20260609201211.173438-1-enelsonmoore@gmail.com
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>

rust: make `build_assert` module the home of related macros

Given the macro scoping rules, all macros are rendered twice, in the
module and in the top-level of kernel crate.

Add `#[doc(hidden)]` to the macro definition and `#[doc(inline)]` to the
re-export inside `build_assert` module so the top-level items are hidden.

[ Sadly, because the definition is hidden, `rustdoc` decides to not list
  them as re-exports in the `prelude` page anymore, even if we refer to
  the not-actually-hidden item.

    - Miguel ]

Acked-by: Danilo Krummrich <dakr@kernel.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Alexandre Courbot <acourbot@nvidia.com>
Acked-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
Acked-by: Boqun Feng <boqun@kernel.org>
Signed-off-by: Gary Guo <gary@garyguo.net>
Link: https://patch.msgid.link/20260609142637.373347-1-gary@kernel.org
[ Kept a single declaration in the prelude, and reworded since they
  already had `no_inline`. Removed other imports from `predefine` since
  we now use the prelude. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

rust: str: clean unused import for Rust >= 1.98

Starting with Rust 1.98.0 (expected 2026-08-20), the compiler has changed
how the resolution algorithm works [1] in upstream commit c4d84db5f184
("Resolver: Batched import resolution."), and it now spots:

    error: unused import: `flags::*`
     --> rust/kernel/str.rs:7:9
      |
    7 |         flags::*,
      |         ^^^^^^^^
      |
      = note: `-D unused-imports` implied by `-D warnings`
      = help: to override `-D warnings` add `#[allow(unused_imports)]`

It happens to not be needed because the `prelude::*` already provides
the flags.

Thus clean it up.

Cc: stable@vger.kernel.org # Needed in 6.18.y and later (prelude added to `str`).
Link: https://github.com/rust-lang/rust/pull/145108
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20260609104152.261145-2-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

rust: str: use the "kernel vertical" imports style

Convert the imports to use the "kernel vertical" imports style [1].

No functional changes intended.

Link: https://docs.kernel.org/rust/coding-guidelines.html#imports
Reviewed-by: Gary Guo <gary@garyguo.net>
Link: https://patch.msgid.link/20260609104152.261145-1-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

rust: aref: use the "kernel vertical" imports style

Convert the imports to use the "kernel vertical" imports style [1].

No functional changes intended.

Link: https://docs.kernel.org/rust/coding-guidelines.html#imports
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
Link: https://patch.msgid.link/20260604-unique-ref-v17-8-7b4c3d2930b9@kernel.org
[ Picked from larger series and reworded. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

rust: page: use the "kernel vertical" imports style

Convert the imports to use the "kernel vertical" imports style [1].

No functional changes intended.

Link: https://docs.kernel.org/rust/coding-guidelines.html#imports
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
Link: https://patch.msgid.link/20260604-unique-ref-v17-4-7b4c3d2930b9@kernel.org
[ Picked from larger series and reworded. Adjusted the `error::`
block too. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

wifi: brcmfmac: flowring: simplify flow allocation

Use a flexible array member and kzalloc_flex to combine allocations.
Simplifies code slightly.

Add __counted_by for extra runtime analysis.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Link: https://patch.msgid.link/20260608051102.6698-1-rosenp@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: brcm80211: change current_bss to value

Change to a single allocation and remove some boilerplate.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Link: https://patch.msgid.link/20260608052854.11718-1-rosenp@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

Merge tag 'ath-next-20260609' of git://git.kernel.org/pub/scm/linux/kernel/git/ath/ath

Jeff Johnson says:
==================
ath.git patches for v7.2 (PR #4)

An assortment of cleanups and minor bug fixes across wcn36xx, ath9k,
ath10k, ath11k, and ath12k.
==================

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

xfs: add newly added RTGs to the free pool in growfs

When growing a zoned RT section, the newly added RTGs also need to be
tagged as free in the radix tree and add to the nr_free_zones counters.
Call xfs_add_free_zone to do that, otherwise using up the newly added
space will wait for free zones forever.

Fixes: 01b71e64bb87 ("xfs: support growfs on zoned file systems")
Cc: stable@vger.kernel.org # v6.15
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

xfs: factor out a xfs_zone_mark_free helper

Add a helper for adding a zone to the free pool in preparation of adding
another caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

accel/amdxdna: Clear sva pointer after unbind

Add client->sva = NULL after the unbind makes it consistent with how
amdxdna_sva_fini() already clears the pointer after unbinding. The
IS_ERR_OR_NULL guard in sva_fini will then correctly skip the second
unbind.

Fixes: 3cc5d7a59519 ("accel/amdxdna: Add carveout memory support for non-IOMMU systems")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260604202815.2425882-1-lizhi.hou@amd.com

can: virtio: Fix comment in UAPI header

When compile testing the UAPI headers with clang, there is an warning turned
error for using a C++ style ('//') comment, which is explicitly forbidden for
UAPI headers.

  In file included from <built-in>:1:
  ./usr/include/linux/virtio_can.h:29:35: error: // comments are not allowed in this language [-Werror,-Wcomment]
     29 | #define VIRTIO_CAN_MAX_DLEN    64 // this is like CANFD_MAX_DLEN
        |                                   ^
  1 error generated.

Switch to a standard C style comment.

Fixes: 2b6b4bb7d96f ("can: virtio: Add virtio CAN driver")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260604-virtio_can-fix-uapi-comment-v1-1-199fa96ec5f0@kernel.org>

can: virtio: Add virtio CAN driver

Add virtio CAN driver based on Virtio 1.4 specification (see
https://github.com/oasis-tcs/virtio-spec/tree/virtio-1.4). The driver
implements a complete CAN bus interface over Virtio transport,
supporting both CAN Classic and CAN-FD Ids. In term of frames, it
supports classic and CAN FD. RTR frames are only supported with classic
CAN.

Usage:
- "ip link set up can0" - start controller
- "ip link set down can0" - stop controller
- "candump can0" - receive frames
- "cansend can0 123#DEADBEEF" - send frames

Signed-off-by: Harald Mommer <harald.mommer@oss.qualcomm.com>
Co-developed-by: Harald Mommer <harald.mommer@oss.qualcomm.com>
Signed-off-by: Mikhail Golubev-Ciuchea <mikhail.golubev-ciuchea@oss.qualcomm.com>
Co-developed-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Damir Shaikhutdinov <Damir.Shaikhutdinov@opensynergy.com>
Reviewed-by: Francesco Valla <francesco@valla.it>
Tested-by: Francesco Valla <francesco@valla.it>
Signed-off-by: Matias Ezequiel Vara Larsen <mvaralar@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <ahXNb+KzuHYbS24+@fedora>

virtio: add num_vf callback to virtio_bus

Recent QEMU versions added support for virtio SR-IOV emulation,
allowing virtio devices to expose SR-IOV VFs to the guest.
However, virtio_bus does not implement the num_vf callback of bus_type,
causing dev_num_vf() to return 0 for virtio devices even when
SR-IOV VFs are active.

net/core/rtnetlink.c calls dev_num_vf(dev->dev.parent) to populate
IFLA_NUM_VF in RTM_GETLINK responses. For a virtio-net device,
dev.parent points to the virtio_device, whose busis virtio_bus.
Without num_vf, SR-IOV VF information is silently
omitted from tools that rely on rtnetlink, such as 'ip link show'.

Add a num_vf callback that delegates to dev_num_vf(dev->parent),
which in turn reaches the underlying transport (pci_bus_type for
virtio-pci) where the actual VF count is tracked. Non-PCI transports
are unaffected as dev_num_vf() returns 0 when no num_vf callback is
present.

Signed-off-by: Yui Washizu <yui.washidu@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260310061454.683894-1-yui.washidu@gmail.com>

fw_cfg: Add support for LoongArch architecture

Qemu fw_cfg support was missing for LoongArch, which made some functions
unusable in virtual machines. So add the missing LoongArch defines.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260529140559.1775511-1-chenhuacai@loongson.cn>

vdpa/octeon_ep: fix IRQ-to-ring mapping in interrupt handler

Look up the IRQ index in oct_hw->irqs instead of assuming
irq - irqs[0]. This supports non-contiguous IRQ numbers and
avoids incorrect ring indexing when irqs[0] is not the base.

Fixes: 26f8ce06af64 ("vdpa/octeon_ep: enable support for multiple interrupts per device")
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260224095226.1001151-5-schalla@marvell.com>

vdpa/octeon_ep: Add vDPA device event handling for firmware notifications

Handle vDPA device add and remove events from Octeon firmware. Use
irq 0 for event delivery as device interrupts are multiplexed.

Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260224095226.1001151-4-schalla@marvell.com>

vdpa/octeon_ep: Use 4 bytes for mailbox signature

The upper 4 bytes are reserved by the firmware for
storing meta data. Use only lower 4 bytes to update
the signature details.

Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260224095226.1001151-3-schalla@marvell.com>

vdpa/octeon_ep: Fix PF->VF mailbox data address calculation

The mailbox address was computed assuming 1 ring per VF. Read the
actual rings-per-VF from OCTEP_EPF_RINFO and use it when calculating
OCTEP_PF_MBOX_DATA offsets, fixing VF initialization when rings
per VF > 1.

Fixes: 8b6c724cdab8 ("virtio: vdpa: vDPA driver for Marvell OCTEON DPU devices")
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260224095226.1001151-2-schalla@marvell.com>

vhost_task_create: kill unnecessary .exit_signal initialization

The only reason for this janitorial change is that this initialization
adds unnecessary noise to "git grep exit_signal".

args.exit_signal has no effect with CLONE_THREAD, not to mention it is
zero-initialized by the compiler anyway.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <acAIm732QPFZs15C@redhat.com>

vhost: remove unnecessary module_init/exit functions

The vhost driver has unnecessary empty module_init and
module_exit functions. Remove them. Note that if a module_init function
exists, a module_exit function must also exist; otherwise, the module
cannot be unloaded.

Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260131020010.45647-1-enelsonmoore@gmail.com>

vdpa/mlx5: Use kvzalloc_flex() for MTT command memory

The create mkey command memory embeds the MTT array as a flexible array
member. Use kvzalloc_flex() to allocate it directly instead of open-coding
the struct_size() calculation with kvcalloc().

The MTT allocation still needs to be aligned to MLX5_VDPA_MTT_ALIGN bytes.
Since each MTT entry is __be64, align the entry count directly and avoid
carrying a separate byte length variable.

Assisted-by: Codex:GPT-5.5
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260508051837.1744409-1-rosenp@gmail.com>

vdpa_sim_net: switch to dynamic root device

Driver core expects devices to be dynamically allocated and will, for
example, complain loudly when no release function has been provided.

Use root_device_register() to allocate and register the root device
instead of open coding using a static device.

Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260424104703.2619093-3-johan@kernel.org>

vdpa_sim_blk: switch to dynamic root device

Driver core expects devices to be dynamically allocated and will, for
example, complain loudly when no release function has been provided.

Use root_device_register() to allocate and register the root device
instead of open coding using a static device.

Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260424104703.2619093-2-johan@kernel.org>

virtio-mem: Destroy mutex before freeing virtio_mem

Add a call to mutex_destroy in the error code path as well as in the
virtio_mem_remove code path.

Signed-off-by: Maurice Hieronymus <mhi@mailbox.org>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20251123175750.445461-3-mhi@mailbox.org>

virtio-balloon: Destroy mutex before freeing virtio_balloon

Add a call to mutex_destroy in the error code path as well as in the
virtballoon_remove code path.

Signed-off-by: Maurice Hieronymus <mhi@mailbox.org>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20251123175750.445461-2-mhi@mailbox.org>

tools/virtio: fix build for kmalloc_obj API and missing stubs

Add stubs for kmalloc_obj() and kmalloc_objs() to the tools/virtio
test harness, matching the new kernel allocator API. Also add the
DMA_ATTR_CPU_CACHE_CLEAN definition and include kernel.h from err.h
for the unlikely() macro.

Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <a0bd4b5bed56c49626c92a754d7aceab3325de25.1780520728.git.mst@redhat.com>

virtio_ring: Add READ_ONCE annotations for device-writable fields

KCSAN reports data races when accessing virtio ring fields that are
concurrently written by the device (host). These are legitimate
concurrent accesses where the CPU reads fields that the device updates
via DMA-like mechanisms.

Add accessor functions that use READ_ONCE() to properly annotate these
device-writable fields and prevent compiler optimizations that could in
theory break the code. This also serves as documentation showing which
fields are shared with the device.

The affected fields are:
- Split ring: used->idx, used->ring[].id, used->ring[].len
- Packed ring: desc[].flags, desc[].id, desc[].len

This patch was partially written using the help of Kiro, an
AI coding assistant, to automate the mechanical work of generating the
inline function definition.

Signed-off-by: Alexander Graf <graf@amazon.com>
[jth: Add READ_ONCE in virtqueue_kick_prepare_split ]
Co-developed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260131102810.1254845-1-johannes.thumshirn@wdc.com>

vduse: fix compat handling for VDUSE_IOTLB_GET_FD/VDUSE_VQ_GET_INFO

These two ioctls are incompatible on 32-bit x86 userspace, because
the data structures are shorter than they are on 64-bit.

Add a proper .compat_ioctl handler for x86 that reads the structures
with the smaller padding before calling the internal handlers. On
all other architectures, CONFIG_COMPAT_FOR_U64_ALIGNMENT is disabled
and no special handling is required.

Fixes: ad146355bfad ("vduse: Support querying information of IOVA regions")
Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace")
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260213154051.4172275-1-arnd@kernel.org>

tools/virtio: check mmap return value in vringh_test

In parallel_test(), the return values of mmap() for both host_map and
guest_map are not checked against MAP_FAILED. If mmap() fails, the
subsequent code will dereference the invalid pointer, leading to a
segmentation fault.

Add MAP_FAILED checks after both mmap() calls, using err() to report
the error and exit, consistent with the existing error handling style
in this file (e.g., the open() call on line 149).

Fixes: 1515c5ce26ae ("tools/virtio: add vring_test.")
Signed-off-by: longlong yan <yanlonglong@kylinos.cn>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260605021446.1611-1-yanlonglong@kylinos.cn>

vhost/net: complete zerocopy ubufs only once

vhost-net initializes one ubuf_info per outstanding zerocopy TX
descriptor and hands it to the backend socket.  The networking stack may
then clone a zerocopy skb before all skb references are released.  For
example, batman-adv fragmentation reaches skb_split(), which calls
skb_zerocopy_clone() and increments the same ubuf_info refcount.

vhost_zerocopy_complete() currently treats every ubuf callback as a
completed vhost descriptor.  It dereferences ubuf->ctx, writes the
descriptor completion state, and drops the vhost_net_ubuf_ref even when
the callback only releases a cloned skb reference.  A backend reset can
therefore wait for and free the vhost_net_ubuf_ref while another cloned
skb still carries the same ubuf_info.  A later completion then
dereferences the freed ubufs pointer.

KASAN reports the stale completion as:

  BUG: KASAN: slab-use-after-free in vhost_zerocopy_complete+0x1d7/0x1f0
  BUG: KASAN: slab-use-after-free in vhost_zerocopy_complete+0x101/0x1f0
  vhost_zerocopy_complete
  skb_copy_ubufs
  __dev_forward_skb2
  veth_xmit

The freed object was allocated from vhost_net_ioctl() while setting the
backend and freed through kfree_rcu()/kvfree_rcu_bulk after backend
removal, while delayed skb completion still reached
vhost_zerocopy_complete().

Honor the generic ubuf_info refcount before touching vhost state, and run
the vhost descriptor completion only for the final ubuf reference.  This
matches the msg_zerocopy_complete() ownership rule for cloned zerocopy
skbs.

Fixes: bab632d69ee4 ("vhost: vhost TX zero-copy support")
Signed-off-by: Qing Ming <a0yami@mailbox.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260601104300.197210-1-a0yami@mailbox.org>

VDUSE: avoid leaking information to userspace

The bounceing is not necessarily page aligned, so current VDUSE can
leak kernel information through mapping bounce pages to
userspace. Allocate bounce pages with __GFP_ZERO to avoid leaking
information to userspace.

Fixes: 8c773d53fb7b ("vduse: Implement an MMU-based software IOTLB")
Cc: stable@vger.kernel.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260130050750.4050-1-jasowang@redhat.com>

vduse: Fix race in vduse_dev_msg_sync and vduse_dev_read_iter

There is one race case in vduse_dev_msg_sync and vduse_dev_read_iter:

vduse_dev_read_iter():
    lock(msg_lock);
    dequeue_msg(send_list);
    unlock(msg_lock);
vduse_dev_msg_sync():
    wait_timeout() finish
    lock(msg_lock);
    check msg->complete is false
        list_del(msg);   <- double list_del() crash!

To fix this case, we shall ensure vduse_msg is on send_list or recv_list
outside the msg_lock critical section.

Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace")
Cc: stable@vger.kernel.org
Signed-off-by: Zhang Tianci <zhangtianci.1997@bytedance.com>
Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260226115550.1814-3-zhangtianci.1997@bytedance.com>

vduse: Requeue failed read to send_list head

When copy_to_iter() fails in vduse_dev_read_iter(), put the message back
at the head of send_list to preserve FIFO ordering and retry the oldest
pending request first.

Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace")
Reported-by: Michael S. Tsirkin <mst@redhat.com>
Suggested-by: Xie Yongji <xieyongji@bytedance.com>
Signed-off-by: Zhang Tianci <zhangtianci.1997@bytedance.com>
Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260226115550.1814-2-zhangtianci.1997@bytedance.com>

vdpa/mlx5: update MAC address handling in mlx5_vdpa_set_attr()

Improve MAC address handling in mlx5_vdpa_set_attr() to ensure that
old MAC entries are properly removed from the MPFS table before
adding a new one. The new MAC address is then added to both the MPFS
and VLAN tables.

This change fixes an issue where the updated MAC address would not
take effect until QEMU was rebooted.

Signed-off-by: Cindy Lu <lulu@redhat.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260126094848.9601-4-lulu@redhat.com>

vdpa/mlx5: update mlx_features with driver state check

Add logic in mlx5_vdpa_set_attr() to ensure the VIRTIO_NET_F_MAC
feature bit is properly set only when the device is not yet in
the DRIVER_OK (running) state.

This makes the MAC address visible in the output of:

vdpa dev config show -jp

when the device is created without an initial MAC address.

Signed-off-by: Cindy Lu <lulu@redhat.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260126094848.9601-2-lulu@redhat.com>

vdpa/ifcvf: handle dev_set_name() failure in ifcvf_vdpa_dev_add()

dev_set_name() may fail and return an error, but its return value
is currently ignored and overwritten by _vdpa_register_device().

Abort device creation if dev_set_name() fails and release the
device reference to avoid continuing with an improperly initialized
struct device.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Evgenii Burenchev <evg28bur@yandex.ru>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Zhu Lingshan <lingshan.zhu@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260226152924.38790-1-evg28bur@yandex.ru>

virtio_console: read size from config space during device init

Previously, the size was only read upon receiving the config interrupt.
This interrupt is sent when the size changes. However, we also need to
read the initial size.

Also make sure to only read the size from config if F_SIZE is enabled.

Fixes: 9778829cffd4 ("virtio: console: Store each console's size in the console structure")
Signed-off-by: Filip Hejsek <filip.hejsek@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260223-virtio-console-fix-v1-1-0cf08303b428@gmail.com>

virtio_console: Fix spelling mistake "colums" -> "columns"

There is a spelling mistake in a struct description. Fix it.

Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260418-virtio-typo-v1-1-0df6f943a79d@ethancedwards.com>

virtio: rtc: tear down old virtqueues before restore

virtio_device_restore() resets the device and restores the negotiated
features before calling ->restore(). viortc_freeze() intentionally
leaves the existing virtqueues in place so the alarm queue can still
wake the system, but viortc_restore() immediately calls
viortc_init_vqs() without first deleting those old queues.

If virtqueue reinitialization fails on virtio-pci, the transport error
path can run vp_del_vqs() against a newly allocated vp_dev->vqs array
while vdev->vqs still contains the old virtqueues. vp_del_vqs() then
looks up queue state through the new array and can dereference a NULL
info pointer in vp_del_vq(), crashing the guest kernel during restore.

This can also happen during a non-faulty reinitialization, when one of
the vp_find_vqs_msix() attempts is unsuccessful before a later attempt
would succeed.

Delete the stale virtqueues before rebuilding them. If restore fails
before virtio_device_ready(), reuse the remove path to stop the device.
Once the device is ready, return errors directly instead of deleting the
virtqueues again.

Fixes: 0623c7592768 ("virtio_rtc: Add module and driver core")
Signed-off-by: Jia Jia <physicalmtea@gmail.com>
Reviewed-by: Peter Hilber <peter.hilber@oss.qualcomm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260507120801.3677552-1-physicalmtea@gmail.com>

virtio-mmio: fix device release warning on module unload

Driver core expects devices to be allocated dynamically and complains
loudly when a device that lacks a release function is freed.

Use __root_device_register() to allocate and register the root device
instead of open coding using a static device.

Note that root_device_register(), which also creates a link to the
module, cannot be used as the device is registered when parsing the
module parameters which happens before the module kobject has been set
up.

Fixes: 81a054ce0b46 ("virtio-mmio: Devices parameter parsing")
Cc: stable@vger.kernel.org # 3.5
Cc: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260427143710.14702-1-johan@kernel.org>

vhost/vdpa: validate virtqueue index in mmap and fault paths

vhost_vdpa_mmap() and vhost_vdpa_fault() use vma->vm_pgoff as a
virtqueue index for get_vq_notification(), but they do not validate
that the index is smaller than v->nvqs.

The ioctl path already performs both a bounds check and
array_index_nospec(), but the mmap/fault path only checks that the
index fits in u16. This allows an out-of-range queue index to reach
driver-specific get_vq_notification() callbacks.

Fix this by extracting a unified vhost_vdpa_get_vq_notification()
helper that validates the queue index against v->nvqs and applies
array_index_nospec() before calling the driver callback. Both the
mmap and fault paths use this helper, and the bounds checking is
consolidated into a single location.

From source inspection, the most defensible impact is out-of-bounds
access in the callback path, potentially leading to invalid PFN
remaps and crash/DoS.

Fixes: ddd89d0a059d ("vhost_vdpa: support doorbell mapping via mmap")
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Qihang Tang <q.h.hack.winter@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260508075821.92656-1-q.h.hack.winter@gmail.com>

vduse: hold vduse_lock across IDR lookup in open path

vduse_dev_open() looks up struct vduse_dev through the IDR and then
acquires dev->lock only after vduse_lock has been dropped.

This leaves a window where a concurrent VDUSE_DESTROY_DEV can remove the
same object from the IDR and free it before the open path locks the
device, leading to a use-after-free.

Close this race by keeping vduse_lock held until dev->lock has been
acquired in the open path, matching the lock ordering already used by
the destroy path.

Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace")
Signed-off-by: Qihang Tang <q.h.hack.winter@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260508094659.94647-1-q.h.hack.winter@gmail.com>

vhost/vsock: Refuse the connection immediately when guest isn't ready

When the host initiates an AF_VSOCK connect() to a guest that has not
yet loaded the virtio-vsock transport (i.e. still booting), the caller
blocks for VSOCK_DEFAULT_CONNECT_TIMEOUT.

A caller that wants to know if the guest is up yet instead of waiting
could theoretically tune SO_VM_SOCKETS_CONNECT_TIMEOUT, but it's tricky
to find the right timeout, if not impossible: there's no way to
distinguish "guest won't reply because it's not up yet" vs "guest is up
and tried to reply, but was too slow".

Furthermore, this delay is pointless:
- If the guest doesn't initialize within this timeout, connect()
  returns ETIMEDOUT.
- If the guest **does** initialize, it'll reply with RST immediately,
  because there won't be a listener on the port yet; connect() returns
  ECONNRESET.

That's also inconsistent with the behavior at other initialization
stages: if a connection is attempted when the guest driver is already
loaded, but nothing is listening yet, we return ECONNRESET immediately
without waiting.

Fix this by checking the RX virtqueue backend in
vhost_transport_send_pkt() before queuing. If it's NULL, return
-EHOSTUNREACH immediately.

Callers that used to get ETIMEDOUT will now usually get EHOSTUNREACH.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Co-developed-by: Polina Vishneva <polina.vishneva@virtuozzo.com>
Signed-off-by: Polina Vishneva <polina.vishneva@virtuozzo.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260513145842.809404-1-polina.vishneva@virtuozzo.com>

virtio: add missing kernel-doc for map and vmap members

Commit bee8c7c24b73 ("virtio: introduce map ops in virtio core") and
commit b16060c5c7d5 ("virtio: introduce virtio_map container union")
added 'map' and 'vmap' members to struct virtio_device but did not
update the kernel-doc comment block. This caused 'make htmldocs' to
emit warnings:

./include/linux/virtio.h:188 struct member 'map' not described in 'virtio_device'
./include/linux/virtio.h:188 struct member 'vmap' not described in 'virtio_device'

Add the missing entries in struct-declaration order to match the
existing convention in the file. After this patch, 'make htmldocs'
no longer emits these warnings.

Fixes: bee8c7c24b73 ("virtio: introduce map ops in virtio core")
Fixes: b16060c5c7d5 ("virtio: introduce virtio_map container union")
Reported-by: Luis Felipe Hernandez <luis.hernandez093@gmail.com>
Signed-off-by: Christian Fontanez <christfontanez@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20260519013321.32511-1-christfontanez@gmail.com>

Input: ipaq-micro-keys - simplify allocation

Embed the keycode array in the struct to have a single allocation.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20260609213532.25181-1-rosenp@gmail.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

clocksource: move NXP timer selection to drivers/clocksource

The Kconfig logic for selecting the scheduler clocksource on
NXP Vybrid (VF610) uses a `choice` block restricted to 32-bit ARM. This
prevents 64-bit architectures, such as the NXP S32 family, from enabling
the NXP Periodic Interrupt Timer (PIT) driver (CONFIG_NXP_PIT_TIMER).

Relocate the NXP clocksource selection from arch/arm/mach-imx/Kconfig to
drivers/clocksource/Kconfig. This allows the configuration to be shared
across different architectures.

Update the selection to include support for ARCH_S32 and add a "None"
option restricted to ARCH_S32, since Vybrid lacks the ARM Architected
Timer. The Vybrid Global Timer option is restricted to ARCH_MULTI_V7
SOC_VF610 platforms to prevent it from being visible on Cortex-M4 builds,
which lack the ARM Global Timer hardware.

Fixes: bee33f22d7c3 ("clocksource/drivers/nxp-pit: Add NXP Automotive s32g2 / s32g3 support")
Signed-off-by: Enric Balletbo i Serra <eballetb@redhat.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260514-fix-nxp-timer-v3-1-a3e68fdb505e@redhat.com

clocksource/drivers/timer-tegra186: Reserve and service a kernel watchdog

Tegra SoCs supports multiple watchdog timers. If the kernel crashes or
hangs before userspace enables a watchdog, the system cannot recover and
may remain bricked, e.g. after a failed OTA update. The driver currently
leaves all watchdogs disabled until userspace configures them.

Reserve first available watchdog as a kernel-only watchdog for Tegra186
and Tegra234. Arm it during probe (120s timeout) and keep it alive in
the driver IRQ handler. Do not register it to userspace. Other available
watchdogs remain exposed to userspace. This guarantees the system can
reset itself in case of a hang or crash even when userspace never starts.

Signed-off-by: Kartik Rajput <kkartik@nvidia.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20260507154557.2082697-5-kkartik@nvidia.com

clocksource/drivers/timer-tegra186: Register all accessible watchdog timers

Tegra186+ SoCs expose multiple watchdog timers, but the driver only
registers WDT(0).

Iterate over num_wdts and, for each WDT, check the SCR (firewall) registers
in the TKE block to determine whether Linux has read and write access.
Register the watchdogs that are accessible.

Signed-off-by: Kartik Rajput <kkartik@nvidia.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20260507154557.2082697-4-kkartik@nvidia.com

clocksource/drivers/timer-tegra186: Correct num_wdts for Tegra186 and Tegra234

On Tegra186 and Tegra234, WDT2 is connected to the Audio Processing
Engine (APE) and cannot be accessed from Linux. Only WDT0 and WDT1
are accessible to Linux.

Update num_wdts from 3 to 2 for both Tegra186 and Tegra234 to reflect
the actual number of watchdogs available to Linux.

Signed-off-by: Kartik Rajput <kkartik@nvidia.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20260507154557.2082697-3-kkartik@nvidia.com

clocksource/drivers/timer-tegra186: Fix support for multiple watchdog instances

Tegra SoCs support multiple watchdogs; currently only one (WDT0) is
used. When multiple watchdogs are registered, tegra186_wdt_enable()
overwrites the TKEIE(x) register, discarding any existing watchdog
interrupt enable bits. As a result, enabling one watchdog inadvertently
disables interrupts for the others.

Fix this by preserving the existing TKEIE(x) value and updating it
using a read-modify-write sequence.

Fixes: 42cee19a9f83 ("clocksource: Add Tegra186 timers support")
Cc: stable@vger.kernel.org
Signed-off-by: Kartik Rajput <kkartik@nvidia.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20260507154557.2082697-2-kkartik@nvidia.com

Merge branch 'fix-kptr-dtor-deadlock'

Kumar Kartikeya Dwivedi says:

====================
Fix kptr dtor deadlock

Referenced kptr destruction can run from tracing/NMI contexts through
bpf_obj_drop() and map value update/delete paths, reaching NMI-unsafe
special field teardown and deadlocks. Justin reported the issue and
iterated on fixes in [0]-[2], and also confirmed the bpf_obj_drop()
reproducer in [3].

This series rejects unsafe obj drops from non-iterator tracing programs,
limits map value recycle to NMI-safe field cancellation, and adds
focused selftests for the obj_drop(), NMI delete, and recycle teardown
cases.

See patches for details.

  [0]: https://lore.kernel.org/bpf/20260505150851.3090688-1-utilityemal77@gmail.com
  [1]: https://lore.kernel.org/bpf/20260507175453.1140400-1-utilityemal77@gmail.com
  [2]: https://lore.kernel.org/bpf/20260519011450.1144935-1-utilityemal77@gmail.com
  [3]: https://lore.kernel.org/bpf/agyG3eQwgmoJwmj2@suesslenovo

Changelog:
----------
v2 -> v3
v2: https://lore.kernel.org/bpf/20260609093719.2858096-1-memxor@gmail.com

* Replace bpf_obj_cancel_fields() to use bpf_map_free_internal_structs(). (Mykyta)
* Fix CI failures.

v1 -> v2
v1: https://lore.kernel.org/bpf/20260608144841.1732406-1-memxor@gmail.com

* Drop is_tracing_prog_type() fix due to compat breakage, revisit separately.
* Rework bpf_obj_drop() fix to additionally reject non-iter tracing progs.
====================

Link: https://patch.msgid.link/20260609202548.3571690-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Exercise kptr map update lifetime

Add focused map_kptr coverage for BPF-side map updates that touch values
containing referenced kptrs.

The new syscall programs stash the testmod refcounted object in an array
map, a preallocated hash map, and a no-prealloc hash map, then update the
same map from BPF. The refcount must remain elevated after the update,
while the userspace runner destroys the skeleton and reuses the existing
refcount wait to confirm map teardown releases the kptr.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260609202548.3571690-5-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Exercise unsafe obj drops from tracing progs

Add task_kfunc failure cases for bpf_obj_drop() on local objects with
referenced kptr fields from tracing and NMI tracing programs. These programs
must be rejected because dropping the object would run full special-field
destruction synchronously in an unsafe context.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260609202548.3571690-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Cancel special fields on map value recycle

Map update and delete paths currently call bpf_obj_free_fields() when a
value is being replaced or recycled. That makes field destruction depend
on the context of the update/delete operation. For tracing programs this
can include NMI context, where referenced kptr destructors, uptr
unpinning, and graph root destruction are not generally safe.

Introduce bpf_obj_cancel_fields() for the reusable-value path. It only
performs NMI-safe cleanup for timer, workqueue, and task_work fields.
Fields that need full destruction are left attached to the recycled value
and are destroyed by the final cleanup path instead.

Switch array and hashtab update/delete/recycle paths to this cancel
helper. Keep bpf_obj_free_fields() for final map destruction and for
bpf_mem_alloc destructors. Preallocated hashtabs do not have allocator
destructors, so teardown continues to walk the normal and extra elements
and fully destroy their fields.

This deliberately relaxes the eager-free semantics of map update/delete
for special fields. Programs that relied on a recycled map slot becoming
empty immediately after update/delete were relying on behavior that
cannot be implemented safely from every BPF execution context without
offloading arbitrary destructors.

There is a chance this change breaks programs making assumptions
regarding the eager freeing of fields. If so, we can relax semantics to
cancellation only when irqs_disabled() is true in the future. However,
theoretically, map values that get reused eagerly already have weaker
guarantees as parallel users can recreate freed fields before the new
element becomes visible again.

Fixes: 14a324f6a67e ("bpf: Wire up freeing of referenced kptr")
Signed-off-by: Justin Suess <utilityemal77@gmail.com>
Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260609202548.3571690-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Reject bpf_obj_drop() from tracing progs

bpf_obj_drop() runs bpf_obj_free_fields() synchronously for
program-allocated objects. When such an object contains NMI unsafe
fields, tracing programs that can run from arbitrary instrumented
context can reach that destruction from unsafe contexts, including NMI.

NMI is likely one instance of this problem, and other instances would
include possible unsafe reentrancy. Deferring bpf_obj_drop() is not
appealing either: it would add delayed-free machinery to a release
operation that otherwise has straightforward synchronous ownership
semantics.

Reject bpf_obj_drop() and bpf_percpu_obj_drop() from tracing programs
that may run from unsafe contexts unless every field in the object's BTF
record is explicitly NMI safe. Do not reject sleepable
BPF_PROG_TYPE_TRACING programs, since they are not the arbitrary/NMI
contexts that motivate the restriction.

Note that while bpf_rb_root and bpf_list_head would be NMI safe on their
own to free, the objects recursively held by them may not be; be
conservative and just mark them as not NMI safe for now.

Use a whitelist for the NMI-safe field set instead of listing only known
NMI unsafe fields. Locks, async fields, unreferenced kptrs, and
refcounts are known to be NMI safe because their destruction is either a
no-op, simple state reset, or async cancellation. Referenced kptrs,
percpu referenced kptrs, uptrs, graph roots, graph nodes, and any future
field type are rejected until audited for arbitrary tracing and NMI
contexts. This is less susceptible to future changes in fields that were
previously safe by exclusion, and to new fields being added without
updating this check.

Convert the existing recursive local-object drop success case to a
syscall program in the same commit, since this verifier change makes the
old tracing program form invalid. The test still exercises
bpf_obj_drop() releasing a referenced task kptr from a safe program
type.

Fixes: ac9f06050a35 ("bpf: Introduce bpf_obj_drop")
Signed-off-by: Justin Suess <utilityemal77@gmail.com>
Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260609202548.3571690-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'selftests-bpf-fix-tests-for-llvm23-true-signature'

Yonghong Song says:

====================
selftests/bpf: Fix tests for llvm23 true signature

LLVM23 ([1]) records the 'true' function signature in BTF, i.e. the
signature inferred after optimization rather than the one written in C.
This caused two kinds of selftest failures (see below).

Case 1: keep int return type for tailcall subprogs

The verifier requires any subprog that issues a bpf_tail_call to return
an 'int' (see check_btf_func() in kernel/bpf/check_btf.c, which rejects
it with "tail_call is only allowed in functions that return 'int'").

Several tailcall subprogs do 'return 0' (or another constant) whose
result no caller uses. With llvm23 the compiler folds the constant and,
since the return value is dead, optimizes the subprog to effectively
return 'void' and records 'void' in BTF, so the program fails to load.

Use barrier_var() and __sink() to prevent returned value from being
optimized.

Case 2: adjust tracing prog ctx layout for the true signature

test_pkt_access_subprog2() has an unused argument that llvm optimizes
away. Before llvm23 the BTF signature did not match the optimized
assembly, so the verifier fell back to MAX_BPF_FUNC_REG_ARGS (5) u64
arguments and the fexit return value sat after args[5]. With llvm23 the
true signature has a single argument, so the return value moves to the
slot after args[1]. Select the matching ctx struct based on __clang_major__
so the test works with both old and new llvm.

  [1] https://github.com/llvm/llvm-project/pull/198426

Changelogs:
  v1 -> v2:
    - v1: https://lore.kernel.org/bpf/20260609163947.1717694-1-yonghong.song@linux.dev/
    - Do not use bpf array map or bpf global var. Use __sink() instead.
====================

Link: https://patch.msgid.link/20260609233402.2711071-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Adjust fexit_bpf2bpf ctx layout for llvm23 true signature

test_pkt_access_subprog2() is defined in C as

int test_pkt_access_subprog2(int val, volatile struct __sk_buff *skb)

but llvm optimizes away the unused 'int val' argument. Before llvm23 the
BTF signature did not match the optimized assembly, so the verifier set
attach_func_proto to NULL and fell back to MAX_BPF_FUNC_REG_ARGS (5) u64
arguments (see btf_ctx_access()). The fexit ctx struct therefore placed
the return value after args[5].

With llvm23 the 'true' signature

int test_pkt_access_subprog2(volatile struct __sk_buff *skb)

is recorded in BTF, so nr_args becomes 1 and the return value moves to
the slot right after args[1]. Select the matching args_subprog2 layout
based on __clang_major__ so the test works with both old and new llvm.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260609233412.2712178-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Keep int return type for tailcall subprogs

LLVM23 ([1]) supports 'true' function signature in BTF. The return type
of the caller of a tailcall must be an 'int'. Otherwise, verification will
fail (see check_btf_func() in check_btf.c). So with llvm23, it is possible
that the compiler may change the caller's return type from 'int' to 'void'.
To prevent this, barrier_var() and __sink() are used to avoid returning
a constant prone to be optimized.

[1] https://github.com/llvm/llvm-project/pull/198426

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260609233407.2711577-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'net-dsa-realtek-rtl8365mb-bridge-offloading-and-vlan-support'

Luiz Angelo Daros de Luca says:

====================
net: dsa: realtek: rtl8365mb: bridge offloading and VLAN support

This series introduces bridge offloading, FDB management, and VLAN support
for the Realtek rtl8365mb DSA switch driver. The primary goal is to
enable hardware frame forwarding between bridge ports, reducing CPU
overhead and providing advanced features like VLAN and FDB isolation.

Some of these patches are based on original work by Alvin Šipraga,
subsequently adapted and updated for the current net-next state.

I attempted to reach Alvin for review of the final version but was
unable to establish contact. Any regressions in this version are my
responsibility.
====================

Link: https://patch.msgid.link/20260606-realtek_forward-v13-0-b9e409687cbe@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: realtek: rtl8365mb: add bridge port flags

Implement support for bridge port flags to control learning and flooding
behavior. This patch maps hardware functionalities to the following
bridge flags:

- BR_LEARNING
- BR_FLOOD
- BR_MCAST_FLOOD
- BR_BCAST_FLOOD

By default, all flooding types are enabled during port setup to ensure
standard bridge behavior.

Reviewed-by: Linus Walleij <linusw@kernel.org>
Reviewed-by: Mieczyslaw Nalewaj <namiltd@yahoo.com>
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Link: https://patch.msgid.link/20260606-realtek_forward-v13-9-b9e409687cbe@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: realtek: rtl8365mb: add port_bridge_{join,leave}

Implement hardware offloading of bridge functionality. This is achieved
by using the per-port isolation registers, which contain a forwarding
port mask. The switch will refuse to forward packets ingressed on a
given port to a port which is not in its forwarding mask.

For each bridge that is offloaded, use the DSA-provided bridge number
for the Extended Filtering ID (EFID). When using Independent VLAN
Learning (IVL), the forwarding database is keyed with the tuple
{VID, MAC, EFID}. There are 8 EFIDs available (0~7), but we reserve the
default EFID 0 for standalone ports where learning is disabled. This
fits nicely because DSA indexes the bridge number starting from 1.

Because of the limited number of EFIDs, we have to set the
max_num_bridges property of our switch to 7: we can't offload more than
that or we will fail to offer IVL as at least two bridges would end up
having to share an EFID.

All ports start isolated, forwarding exclusively to CPU ports, and
with VLAN transparent, ignoring VLAN membership. Once a member in a
bridge, the port isolation is expanded to include the bridge members.
When that bridge enables VLAN filtering, the VLAN transparent feature is
disabled, letting the switch filter based on VLAN setup.

Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Reviewed-by: Mieczyslaw Nalewaj <namiltd@yahoo.com>
Co-developed-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Link: https://patch.msgid.link/20260606-realtek_forward-v13-8-b9e409687cbe@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: realtek: rtl8365mb: add FDB support

Implement support for FDB and MDB management for the RTL8365MB series
switches.

The hardware supports IVL by keying the unicast forwarding database with
the {MAC, VID, EFID} tuple. The Extended Filtering ID (EFID) is 3 bits
wide, providing 8 unique filtering domains. This driver reserves EFID 0
for standalone ports, effectively limiting the hardware offload to a
maximum of 7 bridges. The multicast database uses a {MAC, VID} key, with
ports from different bridges sharing the same multicast group.

Introduce a mutex lock (l2_lock) to protect concurrent L2 table updates.

Add support for forwarding database operations, including unicast and
multicast entry handling as well as fast aging support.

Set DSA switch flags assisted_learning_on_cpu_port and fdb_isolation.

Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Reviewed-by: Mieczyslaw Nalewaj <namiltd@yahoo.com>
Co-developed-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Link: https://patch.msgid.link/20260606-realtek_forward-v13-7-b9e409687cbe@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: realtek: rtl8365mb: add VLAN support

Realtek RTL8365MB switches (a.k.a. RTL8367C family) use two different
structures for VLANs:

- VLAN4K: A full table with 4096 entries defining port membership and
  tagging.
- VLANMC: A smaller table with 32 entries used primarily for PVID
  assignment.

In this hardware, a port's PVID must point to an index in the VLANMC
table rather than a VID directly. Since the VLANMC table is limited to
32 entries, the driver implements a dynamic allocation scheme to
maximize resource usage:

- VLAN4K is treated by the driver as the source of truth for membership.
- A VLANMC entry is only allocated when a port is configured to use a
  specific VID as its PVID.
- VLANMC entries are deleted when no longer needed as a PVID by any port.

Although VLANMC has a members field, the switch only checks membership
in the VLAN4K table. This driver will use VLANMC members field as way to
track which ports are using that entry as PVID.

VLANMC index 0, although a valid entry, is reserved in this driver as a
neutral PVID value for ports not using a specific PVID.

In the subsequent RTL8367D switch family, VLANMC table was
removed and PVID assignment was delegated to a dedicated set of
registers.

The use of FIELD_PREP for reconstructing LO/HI values was suggested by
Yury Norov.

Fix for vlan_setup and vlan_filtering was suggested by Abdulkader
Alrezej.

Suggested-by: Yury Norov <ynorov@nvidia.com>
Suggested-by: Abdulkader Alrezej <abdulkader.alrezej@gmail.com>
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Reviewed-by: Mieczyslaw Nalewaj <namiltd@yahoo.com>
Co-developed-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Link: https://patch.msgid.link/20260606-realtek_forward-v13-6-b9e409687cbe@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: realtek: rtl8365mb: add table lookup interface

Add a generic table lookup interface to centralize access to
the RTL8365MB internal tables.

This interface abstracts the low-level table access logic and
will be used by subsequent commits to implement FDB and VLAN
operations.

Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Reviewed-by: Mieczyslaw Nalewaj <namiltd@yahoo.com>
Co-developed-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Link: https://patch.msgid.link/20260606-realtek_forward-v13-5-b9e409687cbe@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: realtek: rtl8365mb: prepare for multiple source files

Rename rtl8365mb.c to rtl8365mb_main.c in preparation for subsequent
commits which add additional source files to the driver.

The trailing backslash in the Makefile is deliberate. It allows for new
files to be added without clobbering git history.

Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Reviewed-by: Mieczyslaw Nalewaj <namiltd@yahoo.com>
Co-developed-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Link: https://patch.msgid.link/20260606-realtek_forward-v13-4-b9e409687cbe@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: realtek: rtl8365mb: use dsa helpers for port iteration

Convert open-coded port iteration loops to use the DSA helpers and
restructure rtl8365mb_setup() into clear blocking, user, and
CPU port phases.

As part of this refactoring, unused ports are explicitly placed into a
blocked, isolated state with learning disabled, ensuring safe default
hardware behavior. The driver also does not allocate a virtual IRQ
mapping for unused ports. To accommodate this, a guard check is added to
the interrupt handler (rtl8365mb_irq) to safely skip ports without a
valid IRQ mapping. The irq domain teardown, however, does clean all
ports as external PHYs may still map the IRQ.

Furthermore, since the new initialization loop starts with all ports
administratively isolated by default, CPU port forwarding and isolation
masks are explicitly configured at the end of the setup phase to prevent
egress traffic from being blocked.

Suggested-by: Abdulkader Alrezej <abdulkader.alrezej@gmail.com>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Reviewed-by: Mieczyslaw Nalewaj <namiltd@yahoo.com>
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Link: https://patch.msgid.link/20260606-realtek_forward-v13-3-b9e409687cbe@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: realtek: rtl8365mb: reject unsupported topologies

Explicitly enforce the presence of a CPU port (-EINVAL) and reject DSA
cascade links (-EOPNOTSUPP) during setup to prevent silent failures.

These topologies were already non-functional. Without a CPU port, the
driver does not activate CPU tagging. Additionally, the switch hardware
was not designed to be cascaded, and DSA links never worked because
CPU tagging is not enabled for them.

Reviewed-by: Mieczyslaw Nalewaj <namiltd@yahoo.com>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Link: https://patch.msgid.link/20260606-realtek_forward-v13-2-b9e409687cbe@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: realtek: rtl8365mb: use ERR_PTR

Convert numeric error codes into human-readable strings by using %pe
together with ERR_PTR() in dev_err() messages. Also use dev_err_probe()
instead of checking for -EPROBE_DEFER.

Reviewed-by: Linus Walleij <linusw@kernel.org>
Reviewed-by: Mieczyslaw Nalewaj <namiltd@yahoo.com>
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Link: https://patch.msgid.link/20260606-realtek_forward-v13-1-b9e409687cbe@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: lan966x: restore RX state on reload failure

lan966x_fdma_reload() backs up rx->page_pool and rx->fdma before
reallocating the RX resources for the new MTU. If the allocation fails,
the restore path puts these fields back before restarting RX.

However, the reload path also updates rx->page_order and rx->max_mtu
before calling lan966x_fdma_rx_alloc(). These fields are not restored on
failure, so RX can be restarted with the old pages, old FDMA state and
old page pool, but with the page geometry from the failed new MTU.

This can make the XDP path advertise a frame size derived from the new
page_order while the actual RX pages still come from the old allocation.
For example, after a failed reload to a jumbo MTU, xdp_init_buff() may be
called with a frame size larger than the restored RX pages.

lan966x_fdma_rx_alloc_page_pool() also registers the newly allocated page
pool with each port's XDP RXQ before fdma_alloc_coherent() is called. If
fdma_alloc_coherent() fails, the new page pool is destroyed, but the
rollback path does not restore the per-port XDP RXQ mem model
registration either.

Save and restore rx->page_order and rx->max_mtu, and restore the old page
pool registration for each port's XDP RXQ before RX is started again.
This keeps the restored RX state consistent after a failed reload.

Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
Reviewed-by: David Carlier <devnexen@gmail.com>
Link: https://patch.msgid.link/20260607145747.1494514-1-lgs201920130244@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ptp: ocp: fix resource freeing order

Commit a60fc3294a37 ("ptp: rework ptp_clock_unregister() to disable
events") added a call to ptp_disable_all_events() which changes the
configuration of pins if they support EXTTS events. In ptp_ocp_detach()
pins resources are freed before ptp_clock_unregister() and it leads to
use-after-free during driver removal. Fix it by changing the order of
free/unregister calls. To avoid irq handler running on the other core
while ptp device unregistering, call synchronize_irq() after HW is
configured to stop producing irqs and no irqs are in-flight.

Fixes: a60fc3294a37 ("ptp: rework ptp_clock_unregister() to disable events")
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20260608155952.240304-1-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: pse-pd: pd692x0: support disabling disable ports GPIO

Microchip PSE controllers have a dedicated disable ports input that like it
name says disables PoE on all ports.

So lets support parsing that GPIO and using the GPIO flags to set it low
by default and enable PoE on all ports during probe.

Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Robert Marko <robert.marko@sartura.hr>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20260607165600.1260210-2-robert.marko@sartura.hr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: pse-pd: microchip,pd692x0: add port disable GPIO

Microchip PSE controllers have a dedicated port disable input that like it
name suggest, will disable PoE on all ports.

So, lets document that GPIO.

Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Robert Marko <robert.marko@sartura.hr>
Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20260607165600.1260210-1-robert.marko@sartura.hr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: add getsockopt_iter binary to .gitignore

The generated binary for getsockopt_iter.c shouldn't show up as an
untracked git file after running:

make -C tools/testing/selftests TARGETS=net install

Let's just ignore it.

Fixes: d39887f55d8e ("net: selftests: add getsockopt_iter regression tests")
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Link: https://patch.msgid.link/20260608112259.4022-1-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tun: zero the whole vnet header in tun_put_user()

tun_put_user() declares an on-stack struct virtio_net_hdr_v1_hash_tunnel
without zeroing it. For a non-tunnel skb, virtio_net_hdr_tnl_from_skb()
only initializes the first 10 bytes (sizeof(struct virtio_net_hdr)),
leaving bytes 10..23 (num_buffers and the hash/tunnel fields) as stack
garbage.

An unprivileged user can set the vnet header size to 24 with
TUNSETVNETHDRSZ, so __tun_vnet_hdr_put() copies all 24 bytes of the
partially-initialized struct to userspace, leaking 14 bytes of kernel
stack on every read of a non-tunnel packet.

Fix it the same way tun_get_user() already does by zeroing the whole
header right after declaration.

Fixes: 288f30435132 ("tun: enable gso over UDP tunnel support.")
Reported-by: Weiming Shi <bestswngs@gmail.com>
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260607054428.3050243-1-xmei5@asu.edu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/rds: fix NULL deref in rds_ib_send_cqe_handler() on masked atomic completion

rds_ib_xmit_atomic() always programs a masked atomic opcode
(IB_WR_MASKED_ATOMIC_CMP_AND_SWP or IB_WR_MASKED_ATOMIC_FETCH_AND_ADD)
for every RDS atomic cmsg.  But the completion-side switch in
rds_ib_send_unmap_op() only handles the non-masked opcodes, so a masked
atomic completion falls through to default and returns rm == NULL while
send->s_op is left set.  rds_ib_send_cqe_handler() then dereferences the
NULL rm via rm->m_final_op, oopsing in softirq context.  An unprivileged
AF_RDS sendmsg() of an atomic cmsg over an active RDS/IB connection
triggers it; on hardware that natively accepts masked atomics (mlx4,
mlx5) no extra setup is needed.

  RDS/IB: rds_ib_send_unmap_op: unexpected opcode 0xd in WR!
  Oops: general protection fault [#1] SMP KASAN
  KASAN: null-ptr-deref in range [0x0000000000000190-0x0000000000000197]
  RIP: rds_ib_send_cqe_handler+0x25c/0xb10 (net/rds/ib_send.c:282)
  Call Trace:
   <IRQ>
   rds_ib_send_cqe_handler (net/rds/ib_send.c:282)
   poll_scq (net/rds/ib_cm.c:274)
   rds_ib_tasklet_fn_send (net/rds/ib_cm.c:294)
   tasklet_action_common (kernel/softirq.c:943)
   handle_softirqs (kernel/softirq.c:573)
   run_ksoftirqd (kernel/softirq.c:479)
   </IRQ>
  Kernel panic - not syncing: Fatal exception in interrupt

Handle the masked atomic opcodes in the same case as the non-masked
ones: they map to the same struct rds_message.atomic union member, so
the existing container_of()/rds_ib_send_unmap_atomic() body is correct
for them.

Fixes: 20c72bd5f5f9 ("RDS: Implement masked atomic operations")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Reviewed-by: Allison Henderson <achender@kernel.org>
Link: https://patch.msgid.link/20260606192447.1179255-2-bestswngs@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: guard timestamp cmsgs to real error queue skbs

skb_is_err_queue() treats PACKET_OUTGOING as the sole marker for an skb
from sk_error_queue. That assumption is not true for AF_PACKET sockets:
outgoing packet taps are also delivered to packet sockets with
skb->pkt_type == PACKET_OUTGOING, but their skb->cb is owned by AF_PACKET
instead of struct sock_exterr_skb.

If such an skb is received with timestamping enabled, the generic
timestamp cmsg path can read AF_PACKET control-buffer state as
sock_exterr_skb::opt_stats. With SO_RXQ_OVFL enabled, the packet drop
counter overlaps opt_stats. An odd drop count makes the path emit
SCM_TIMESTAMPING_OPT_STATS with skb->len and skb->data. For non-linear
skbs this copies past the linear head and can trigger hardened usercopy or
disclose adjacent heap contents.

Keep skb_is_err_queue() local to net/socket.c, but make it verify that
the PACKET_OUTGOING marker is paired with the sock_rmem_free destructor
installed by sock_queue_err_skb(). AF_PACKET receive skbs use normal
receive ownership and no longer pass as error-queue skbs, while legitimate
sk_error_queue entries keep the PACKET_OUTGOING marker and sock_rmem_free
ownership.

Fixes: 8605330aac5a ("tcp: fix SCM_TIMESTAMPING_OPT_STATS for normal skbs")
Signed-off-by: Kyle Zeng <kylebot@openai.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260607021819.49698-1-kylebot@openai.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sctp: validate embedded INIT chunk and address list lengths in cookie

sctp_unpack_cookie() only checked that the embedded INIT chunk length
did not exceed the remaining cookie payload, but did not ensure that the
INIT chunk is large enough to contain a complete INIT header.

A malformed COOKIE_ECHO can therefore carry a truncated INIT chunk whose
length field is smaller than sizeof(struct sctp_init_chunk).  Later,
sctp_process_init() accesses INIT parameters unconditionally, which may
lead to out-of-bounds reads.

In addition, raw_addr_list_len is not fully validated against the
remaining cookie payload. When cookie authentication is disabled, an
attacker can supply an oversized raw_addr_list_len and cause
sctp_raw_to_bind_addrs() to read beyond the end of the cookie. The
address parser also lacks sufficient bounds checks for parameter headers
and lengths, allowing malformed address parameters to trigger
out-of-bounds reads.

Fix this by:

- requiring the embedded INIT chunk length to be at least sizeof(struct
  sctp_init_chunk);
- validating that the INIT chunk and raw address list together fit
  within the cookie payload;
- verifying sufficient data exists for each address parameter header and
  payload before parsing it.

Note that sctp_verify_init() must be called after sctp_unpack_cookie()
and before sctp_process_init() when cookie authentication is disabled.
This will be addressed in a separate patch.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Sashiko <sashiko-bot@kernel.org>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/75af23a89adf881a0895d511775e4770da367cbf.1780873427.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-add-retry-mechanism-to-ndo_set_rx_mode_async'

Stanislav Fomichev says:

====================
net: add retry mechanism to ndo_set_rx_mode_async

Original async ndo_set_rx_mode work left one place where we do netdev_WARN
in response to a ENOMEM. The intent was to see whether actual real
users can hit that (adding uc/mc under memory pressure seems like a
very unlikely thing to do). However, it was quickly triggered by
syzbot's failslab. Add a retry mechanism and downgrade netdev_WARN
to netdev_err. The retry logic is a typical exponential backoff:
1, 2, 4, 8 seconds, 15 in total, hopefully enough for a system to resolve
memory pressure.
====================

Link: https://patch.msgid.link/20260608154014.227538-1-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ip6_vti: set netns_immutable on the fallback device.

john1988 and Noam Rathaus reported that vti6_init_net() does not set the
netns_immutable flag on the per-netns fallback tunnel device (ip6_vti0).

Other similar tunnel drivers (like ip6_tunnel, sit, ip6_gre, and ip_tunnel)
correctly set this flag during their fallback device initialization to
prevent them from being moved to another network namespace.

Fixes: 61220ab34948 ("vti6: Enable namespace changing")
Reported-by: Noam Rathaus <noamr@ssd-disclosure.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://patch.msgid.link/20260608155918.787644-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt: convert to core rx_mode retry mechanism

Remove the driver-specific BNXT_STATE_L2_FILTER_RETRY + timer + sp_task
retry mechanism and rely on the core stack's ndo_set_rx_mode_async retry
instead.

bnxt_cfg_rx_mode() now returns errors instead of swallowing them. The
PF-unavailable case (-ENODEV from HWRM on a VF) is normalized to
-EAGAIN at the boundary so callers can match on a single "retry me"
errno without re-implementing the VF/-ENODEV check. Other errors
propagate unchanged.

This removes:
- BNXT_STATE_L2_FILTER_RETRY state bit
- BNXT_RX_MASK_SP_EVENT sp_event bit
- Retry trigger from bnxt_timer()
- BNXT_RX_MASK_SP_EVENT handling from bnxt_sp_task()

bnxt_init_chip() still calls bnxt_cfg_rx_mode() directly during open.
On a fresh open dev->uc is empty and the call effectively cannot fail
on the unicast path. But on FW reset reopen (bnxt_fw_reset_task ->
bnxt_open) a VF may have a populated dev->uc and the PF may be
transiently unavailable; since that path doesn't go through
__dev_open(), the follow-up rx_mode call that would otherwise drive
the core retry doesn't fire. On -EAGAIN, swallow the error and call
netif_rx_mode_schedule_retry() explicitly. The unicast filter loop
truncates vnic->uc_filter_count on failure, so the retry's delta check
sees pending work and reinstalls.

Cc: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20260608154014.227538-4-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: add retry mechanism to ndo_set_rx_mode_async

When ndo_set_rx_mode_async returns an error, schedule a retry with
exponential backoff (1s, 2s, 4s, 8s -- 15s total). Give up after the
4th retry and log an error via netdev_err().

This moves retry logic from individual drivers into the core stack.

Timer callback does not hold a ref on dev. Safe because the timer can
only be armed when dev is IFF_UP, and __dev_close_many runs
timer_delete_sync before clearing IFF_UP. Unregister always closes
IFF_UP devices first, so by the time dev can be freed the timer is
dead and cannot be re-armed.

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260608154014.227538-3-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: change ndo_set_rx_mode_async return type to int

Change the return type of ndo_set_rx_mode_async from void to int to
allow drivers to report failures back to the core stack. This is a
prerequisite for adding retry logic in the core when drivers fail to
program RX filters (e.g. bnxt VF when PF is unavailable).

All existing implementations return 0 for now, maintaining current
behavior.

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260608154014.227538-2-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sctp: fix uninit-value in __sctp_rcv_asconf_lookup()

__sctp_rcv_asconf_lookup() in net/sctp/input.c only checks that the ASCONF
chunk can hold the ADDIP header and a parameter header, then calls
af->from_addr_param(), which reads the full address (16 bytes for IPv6)
trusting the parameter's declared length.

An unauthenticated peer can send a truncated trailing ASCONF chunk that
declares an IPv6 address parameter but stops after the 4-byte parameter
header; reached from the no-association lookup path, from_addr_param() then
reads uninitialized bytes past the parameter.

Impact: an unauthenticated SCTP peer makes the receive path read up to 16
bytes of uninitialized memory past a truncated ASCONF address parameter.

The sibling __sctp_rcv_init_lookup() bounds parameters with
sctp_walk_params(); this path open-codes the fetch and omits the bound.
Verify the whole address parameter lies within the chunk before
from_addr_param() reads it, the same class of fix as commit 51e5ad549c43
("net: sctp: fix KMSAN uninit-value in sctp_inq_pop").

Fixes: df2185771439 ("[SCTP]: Update association lookup to look at ASCONF chunks as well")
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/20260608122234.459098-1-michael.bommarito@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

atm: drv: Replace strcpy() + strlcat() with snprintf()

Avoid string function that are due to be deprecated.

Signed-off-by: David Laight <david.laight.linux@gmail.com>
Link: https://patch.msgid.link/20260608095523.2606-36-david.laight.linux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: eth: benet: Use strscpy() to copy strings into arrays

Replacing strcpy() with strscpy() ensures that overflow of the target
buffer cannot happen.

Signed-off-by: David Laight <david.laight.linux@gmail.com>
Link: https://patch.msgid.link/20260608095500.2567-4-david.laight.linux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mana: Cache MANA_QUERY_LINK_CONFIG result to avoid repeated HWC queries

mana_query_link_cfg() sends an HWC command to firmware on every call,
but the link speed and QoS values it returns only change when the
driver explicitly calls mana_set_bw_clamp(). This function is called
not only by userspace via ethtool get_link_ksettings, but also
periodically by hv_netvsc through netvsc_get_link_ksettings and by
the sysfs speed_show attribute via dev_attr_show, resulting in
unnecessary HWC traffic every few minutes.

Add a link_cfg_error field to mana_port_context to cache the query
result. The field uses three states: 1 (not yet queried, initial
value set during mana_probe_port), 0 (success, speed/max_speed are
valid), or a negative errno for permanent errors like -EOPNOTSUPP
when the hardware does not support the command. Transient errors and
qos_unconfigured responses are not cached so that subsequent calls
will retry.

MANA is ops-locked because it implements net_shaper_ops, so the core
already takes netdev_lock() around all ethtool_ops and net_shaper_ops
entry points. Reuse that lock to serialize mana_query_link_cfg() and
mana_set_bw_clamp(). This prevents a concurrent mana_set_bw_clamp()
from racing with an in-flight query and publishing stale pre-clamp
speed/max_speed.

Invalidate the cache inside mana_set_bw_clamp() on success, so all
current and future callers that change the link configuration
automatically trigger a fresh query on the next mana_query_link_cfg()
call. Also reset link_cfg_error during resume in mana_probe() under
netdev_lock(), so that any query already in flight cannot later
store 0 and silently overwrite the post-resume invalidation.

Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Link: https://patch.msgid.link/20260606133301.2180073-1-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mscc: ocelot: validate netdev belongs to switch in .netdev_to_port()

The .netdev_to_port() currently takes only a net_device and returns the
port index, without verifying the netdev actually belongs to the switch
being operated on. This can cause flower rule parsing to silently
resolve to a wrong port on the local hardware.

Update both implementations felix_netdev_to_port() and
ocelot_netdev_to_port() to validate ownership. Also update the callers
in ocelot_flower.c to pass through the ocelot context.

Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20260606125247.305167-1-mmyangfl@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Fix NULL pointer dereference

PCIe errors detected by a Root Port or Downstream Port cause error
recovery services to run on all subordinate devices regardless of
administrative state.

The .error_detected() callback, bnxt_io_error_detected(), disables
and synchronizes IRQs via bnxt_disable_int_sync(), which calls
bnxt_cp_num_to_irq_num() to map completion rings to IRQs using
bp->bnapi.

Since bp->bnapi is allocated on NIC open and freed on NIC close, PCIe
error recovery on a closed NIC can dereference a NULL pointer.

Check if bp->bnapi is NULL before disabling and synchronizing IRQs.

Fixes: e5811b8c09df ("bnxt_en: Add IRQ remapping logic.")
Cc: stable@vger.kernel.org
Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/aiNM1CY2-StPilxW@hpe.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mana: Add support for PF device 0x00C1

Update the device id table to include the new device id 0x00C1.
This device's BAR layout is similar to VF's, update the function,
mana_gd_init_registers(), accordingly.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/20260605212302.2135499-1-haiyangz@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: qca8k: Add support for force mode for fixed link topology

A fixed link topology is commonly used to connect this switch (on port
0 or 6) to a SoC's MAC over SGMII. When inband negotiation is not used,
the switch needs to be configured to operate in force mode. As such,
enable support for force mode.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: George Moussalem <george.moussalem@outlook.com>
Link: https://patch.msgid.link/20260605-qca8337-force-mode-v2-1-d9a6b6545bfa@outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'add-motorcomm-8531s-set-ds-func-and-8522-driver'

Minda Chen says:

====================
Add motorcomm 8531s set ds func and 8522 driver

This patch is for Starfive JHB100 EVB board. JHB100 contain
1 RGMII/RMII and 1 RMII synopsys GMAC cores. In the EVB board, RGMII
interface connect with YT8531s Ethernet PHY. RMII interface connect
with YT8522 ethernet PHY. So patch 1-2 is for RGMII interface
patch 3 is RMII is for RMII interface.

JHB100 is a Starfive new RISC-V SoC for datacenter BMC (BaseBoard
Managent Controller). Similar with Aspeed 27x0.

The JHB100 minimal system upstream is in progress:
https://patchwork.kernel.org/project/linux-riscv/cover/20260508053632.818548-1-changhuang.liang@starfivetech.com/
====================

Link: https://patch.msgid.link/20260605060212.41895-1-minda.chen@starfivetech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: motorcomm: Add YT8522 100M RMII PHY support

Add YT8522 100M RMII ethernet PHY base driver support, including
PHY ID and base config init function.

Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260605060212.41895-4-minda.chen@starfivetech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: motorcomm: phy: set drive strength in YT8531s RGMII

Set RXD and RX CLK pin drive strength while in YT8531s connect
with RGMII. Need to check 8531s PHY ID because 8521 and 8531s
pin drive strength is different, 8521 can not call
yt8531_set_ds().

Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260605060212.41895-3-minda.chen@starfivetech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: motorcomm: move mdio lock out from yt8531_set_ds()

yt8531_set_ds() default set register with mdio lock and only called
with YT8531 PHY. But new type YT8531s support RGMII and has the same
pin strength setting with YT8531, YT8531s need to call yt8531_set_ds()
setting pin drive strength. But YT8531s config init function
yt8521_config_init() already get the mdio lock with phy_select_page().
If calling yt8521_config_init() with mdio lock will cause dead lock.

Need to get the lock before calling yt8531_set_ds() and move mdio
lock out from it for YT8531s.

Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260605060212.41895-2-minda.chen@starfivetech.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sctp: stream: fully roll back denied add-stream state

When ADD_OUT_STREAMS is denied, SCTP only shrinks the queued chunks and
then lowers outcnt. That leaves removed stream metadata behind, so a
later re-add can reuse a stale ext and hit a null-pointer dereference in
the scheduler get path.

Fix the rollback by tearing down the removed stream state the same way
other stream resizes do. Unschedule the current scheduler state, drop
the removed stream ext state with sctp_stream_outq_migrate(), and then
reschedule the remaining streams.

This keeps scheduler-private RR/FC/PRIO lists consistent while fully
rolling back denied outgoing stream additions.

Fixes: 637784ade221 ("sctp: introduce priority based stream scheduler")
Cc: stable@kernel.org
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Zhengchuan Liang <zcliangcn@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Signed-off-by: Wyatt Feng <bronzed_45_vested@icloud.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Acked-by: Xin Long <lucien.xin@gmail.com>
Link: https://patch.msgid.link/d78954ecd94954653ee299400e98d74a03a6f7d3.1780603399.git.bronzed_45_vested@icloud.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mana-per-vport-eq'

Long Li says:

====================
net: mana: Per-vPort EQ and MSI-X management

This series moves EQ ownership from the shared mana_context to per-vPort
mana_port_context, enabling each vPort to have dedicated MSI-X vectors
when the hardware provides enough vectors. When vectors are limited, the
driver falls back to sharing MSI-X among vPorts.

The series introduces a GDMA IRQ Context (GIC) abstraction with reference
counting to manage interrupt context lifecycle. This allows both Ethernet
and RDMA EQs to dynamically acquire dedicated or shared MSI-X vectors at
vPort creation time rather than pre-allocating all vectors at probe time.
====================

Link: https://patch.msgid.link/20260605005717.2059954-1-longli@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>