git.ipfire.org Git - thirdparty/kernel/linux.git/log

soc: qcom: socinfo: Add PMIV0102 & PMIV0104 PMICs

Add the PMIV0102 and PMIV0104 to the pmic_models array.

Signed-off-by: Alexander Koskovich <akoskovich@pm.me>
Link: https://lore.kernel.org/r/20260413-add-pmic-ids-v1-1-1f40b8773ef8@pm.me
Signed-off-by: Bjorn Andersson <andersson@kernel.org>

wifi: rtw89: phy: set BB wrap of QAM options

Apply these options to selected QAM to TX signal under requirements.

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20260511070148.25257-7-pkshih@realtek.com

wifi: rtw89: phy: set BB wrap of QAM threshold

Make hardware to consider which QAM (data rate) to apply BB wrapper
parameters, which are set by other registers.

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20260511070148.25257-6-pkshih@realtek.com

wifi: rtw89: phy: set BB wrap of control options

Set main options to control BB wrap functions. For example, enable options
by data bandwidth or channel bandwidth.

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20260511070148.25257-5-pkshih@realtek.com

wifi: rtw89: phy: set BB wrap of DPD by bandwidth

Apply different settings to out-of-band DPD (digital pre-distortion) by
bandwidth, as hardware defines separate registers.

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20260511070148.25257-4-pkshih@realtek.com

wifi: rtw89: phy: set BB wrap of out-of-band DPD

Control the out-of-band DPD (digital pre-distortion) to ensure out-of-band
signal under requirement.

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20260511070148.25257-3-pkshih@realtek.com

wifi: rtw89: phy: define BB wrap data for RTL8922D variants

The BB wrap is a hardware block to control TX power. Since RTL8922D has
many variants with different CID and RFE types, prepare flow and dummy
struct adopt to configuration functions for coming patches.

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20260511070148.25257-2-pkshih@realtek.com

selftests/bpf: Fix test for refinement of single-value tnum

This patch fixes the "bounds refinement with single-value tnum on umin"
verifier selftest. This selftest was introduced in commit e6ad477d1bf8
("selftests/bpf: Test refinement of single-value tnum") to cover the
logic from __update_reg64_bounds(), introduced in commit efc11a667878
("bpf: Improve bounds when tnum has a single possible value"). However,
the test still passes if that last commit is reverted.

The test is supposed to cover the case when the tnum and u64 range (or
cnum64 now) overlap in a single value. __update_reg64_bounds() detects
that case and refines the bounds to a known constant. However, the
constants for the test were poorly chosen and the bounds get refined to
a known constant even without __update_reg64_bounds(). The code is as
follows:

  0: call bpf_get_prandom_u32#7  ; R0=scalar()
  1: r0 |= 224                   ; R0=scalar(umin=umin32=224,var_off=(0xe0; 0xffffffffffffff1f))
  2: r0 &= 240                   ; R0=scalar(smin=umin=smin32=umin32=224,smax=umax=smax32=umax32=240,var_off=(0xe0; 0x10))
  3: if r0 == 0xf0 goto pc+2     ; R0=224

After instruction 3, we have u64=[0xe0; 0xef] and tnum=(0xe0; 0x10).
__reg_bound_offset() is able to deduce a new tnum from the u64,
tnum=(0xe0; 0x0f), which combined with the existing tnum gives us a
constant: 0xe0 or 224.

We can easily fix this by choosing different starting bounds. If we make
it u64=[0xe1; 0xf0], then __reg_bound_offset() doesn't have any impact.

Fixes: e6ad477d1bf8 ("selftests/bpf: Test refinement of single-value tnum")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/be2dc2c3d85120286e60b3029b3338fff339f942.1779121582.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Reject unsupported -k option in vmtest.sh

vmtest.sh does not document a -k option and does not handle it in the
getopts case statement. However, the getopts optstring includes k, which
causes the script to accept -k silently instead of reporting it as an
invalid option.

Remove k from the optstring so unsupported options are rejected through
the existing invalid-option path.

Fixes: c9709f52386d ("bpf: Helper script for running BPF presubmit tests")
Signed-off-by: Roman Kvasnytskyi <roman@kvasnytskyi.net>
Acked-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/20260516120625.80839-1-roman@kvasnytskyi.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Override EXTRA_LDFLAGS for static builds

When running vmtest.sh with static linking, the bpftool_map_access
selftests fail. These selftests are calling the bpftool binary in
tools/sbin/ directly, which results in the following error:

error while loading shared libraries: libLLVM.so.21.1:
cannot open shared object file: No such file or directory

To fix this, we need to also build bpftool statically. That can be done
by setting EXTRA_LDFLAGS=-static.

Fixes: 2d96bbdfd3b5 ("selftests/bpf: convert test_bpftool_map_access.sh into test_progs framework")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/r/714556da329c812988010ffe53173d9152570a78.1778669303.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'intel-wired-lan-driver-updates-2026-05-15-ice-ixgbevf-igc-e1000e'

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2026-05-15 (ice, ixgbevf, igc, e1000e)

For ice:
Jake fixes a mismatch in locking around wait queue usage.

Jose Ignacio Tornos Martinez adjusts allowed lower bound for VF data
buffer size to accommodate low MTU sizes.

Marcin adjusts for -EEXIST to not trigger error path when the promisc
filter already exists as part of adding VLAN Ids.

Grzegorz fixes a few issues related to PTP. He adds locking to
ice_start_phy_timer_eth56g() to protect proper register programming.
Fixes the PTP lock used in 2xNAC configuration to always be the primary
and restores PTP configuration on ethtool channel changes.

For ixgbevf:
Michael Bommarito sets freed skb pointer to NULL to prevent
use-after-free.

For igc:
Kohei Enju resolves a couple of issues reported by Sashiko; setting
buffer type for an SMD skb and freeing skb on error of
igc_fpe_init_tx_descriptor().
====================

Link: https://patch.msgid.link/20260515182419.1597859-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

igc: fix potential skb leak in igc_fpe_xmit_smd_frame()

When igc_fpe_init_tx_descriptor() fails, no one takes care of an
allocated skb, leaking it. [1]
Use dev_kfree_skb_any() on failure.

Tested on an I226 adapter with the following command, while injecting
faults in igc_fpe_init_tx_descriptor() to trigger the error path.
# ethtool --set-mm $DEV verify-enabled on tx-enabled on pmac-enabled on

[1]
unreferenced object 0xffff888113c6cdc0 (size 224):
...
  backtrace (crc be3d3fda):
    kmem_cache_alloc_node_noprof+0x3b1/0x410
    __alloc_skb+0xde/0x830
    igc_fpe_xmit_smd_frame.isra.0+0xad/0x1b0
    igc_fpe_send_mpacket+0x37/0x90
    ethtool_mmsv_verify_timer+0x15e/0x300

Cc: stable@vger.kernel.org
Fixes: 5422570c0010 ("igc: add support for frame preemption verification")
Signed-off-by: Kohei Enju <kohei@enjuk.jp>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260515182419.1597859-10-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

igc: set tx buffer type for SMD frames

Sashiko pointed out that igc_fpe_init_smd_frame() initializes
igc_tx_buffer fields for an SMD skb, but does not set the buffer type:
https://sashiko.dev/#/patchset/20260415025226.114115-1-kohei%40enjuk.jp

Since igc_tx_buffer entries are reused, a stale XDP or XSK type can
remain and make TX completion use the wrong cleanup path.

Set the buffer type to IGC_TX_BUFFER_TYPE_SKB.

Fixes: 5422570c0010 ("igc: add support for frame preemption verification")
Signed-off-by: Kohei Enju <kohei@enjuk.jp>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260515182419.1597859-9-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ixgbevf: fix use-after-free in VEPA multicast source pruning

ixgbevf_clean_rx_irq() prunes frames whose source MAC matches the VF's
own address (VEPA multicast workaround) by freeing the skb and
continuing to the next descriptor:

    dev_kfree_skb_irq(skb);
    continue;

The skb pointer is declared outside the while loop and persists across
iterations.  Because the continue skips the "skb = NULL" reset at the
bottom of the loop, the next iteration enters the "else if (skb)" path
and calls ixgbevf_add_rx_frag() on the freed skb, dereferencing
skb_shinfo(skb)->nr_frags - a use-after-free in NAPI softirq context.

The sibling driver iavf already handles this correctly by nulling the
pointer before continuing.  Apply the same pattern here.

I do not have ixgbevf hardware; the bug was found by static analysis
(scan_drop_continue_loops.py + semgrep drop_continue_in_loop, multi-tool
corroboration with the highest score in the scan).  The UAF was confirmed
under KASAN by loading a test module that reproduces the exact code
pattern (alloc skb, kfree_skb, then read skb_shinfo(skb)->nr_frags):

  BUG: KASAN: slab-use-after-free in ixgbevf_uaf_test_init+0x100/0x1000
  Read of size 8 at addr 000000006163ae78 by task insmod/30
  freed 208-byte region [000000006163adc0, 000000006163ae90)

QEMU emulates igb (82576) but not ixgbe (82599), and the igbvf VF
driver does not include the VEPA source pruning path, so a full
end-to-end reproduction with emulated hardware was not possible.

Fixes: bad17234ba70 ("ixgbevf: Change receive model to use double buffered page based receives")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260515182419.1597859-8-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ice: restore PTP Rx timestamp config after ethtool set-channels

When ethtool -L changes queue counts, ice_vsi_recfg_qs() closes and
rebuilds the VSI, reallocating Rx rings. The newly allocated rings have
ptp_rx cleared, so RX hardware timestamps are no longer attached to skb
until hwtstamp configuration is applied again.

Restore timestamp mode after ice_vsi_open() in the queue reconfiguration
path, matching reset/rebuild behavior and ensuring newly rebuilt Rx rings
have PTP RX timestamping re-enabled.

Testing hints:
- run ptp4l application in client synchronization mode:
ptp4l -i ethX -m -s
- run PTP traffic
- change queue number on ethX netdev interface:
ethtool -L ethX combined new_queue_size
- observe ptp4l output
- expected result: no "received DELAY_REQ without timestamp" messages

Fixes: 77a781155a65 ("ice: enable receive hardware timestamping")
Cc: stable@vger.kernel.org
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Alexander Nowlin <alexander.nowlin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260515182419.1597859-7-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ice: ptp: use primary NAC semaphore on E825

For E825 2xNAC configurations, PTP semaphore operations must hit the
primary NAC register block so both sides coordinate on the same lock.

Commit e2193f9f9ec9 ("ice: enable timesync operation on 2xNAC E825
devices") updated other primary-only PTP register accesses to
use the primary NAC on non-primary functions, but left ice_ptp_lock()
and ice_ptp_unlock() operating on the local NAC. As a result, secondary
NAC PTP paths can take a different semaphore than the primary side.

Select the primary hardware in ice_ptp_lock() and ice_ptp_unlock() when
the current function is not primary, keeping semaphore operations
symmetric and consistent with the rest of the 2xNAC PTP register access
path.

Fixes: e2193f9f9ec9 ("ice: enable timesync operation on 2xNAC E825 devices")
Reviewed-by: Arkadiusz Kubalewski <Arkadiusz.kubalewski@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Alexander Nowlin <alexander.nowlin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260515182419.1597859-6-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ice: ptp: serialize E825 PHY timer start with PTP lock

ice_start_phy_timer_eth56g() programs TIMETUS registers and issues
INIT_INCVAL without holding the global PTP semaphore.

This allows concurrent PTP command paths to interleave with PHY timer
start, which can make the sequence fail and leave timer initialization
inconsistent.

Take the PTP lock around TIMETUS registers programming and INIT_INCVAL
command execution, and make sure the lock is released on all error paths.

Keep the subsequent sync step outside of this critical section, since
ice_sync_phy_timer_eth56g() takes the same semaphore internally.

Fixes: 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products")
Reviewed-by: Arkadiusz Kubalewski <Arkadiusz.kubalewski@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Alexander Nowlin <alexander.nowlin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260515182419.1597859-5-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ice: fix setting promisc mode while adding VID filter

There are at least two paths through which VSI promiscuous mode can be
independently configured via ice_fltr_set_vsi_promisc():
- ice_vlan_rx_add_vid() (netdev op)
- ice_service_task() -> ... -> ice_set_promisc()

Both paths may try to program promiscuous mode concurrently. One such
scenario is:

1. Add ice netdev to bond
2. Add the bond netdev to bridge
3. ice netdev enters allmulticast mode (IFF_ALLMULTI)
4. Service task programs promisc mode filter
5. Bridge -> bond calls ice_vlan_rx_add_vid()

Crucially, ice_vlan_rx_add_vid() fails if ice_fltr_set_vsi_promisc()
returns any error, including -EEXIST. This causes VLAN filtering setup
to fail on the bond interface. ice_set_promisc() already handles -EEXIST
correctly.

Fix by adding the same -EEXIST check to ice_vlan_rx_add_vid(): if the
promisc filter is already programmed, continue without returning error.

Fixes: 1273f89578f2 ("ice: Fix broken IFF_ALLMULTI handling")
Cc: stable@vger.kernel.org
Signed-off-by: Marcin Szycik <marcin.szycik@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260515182419.1597859-4-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ice: fix VF queue configuration with low MTU values

The ice driver's VF queue configuration validation rejects
databuffer_size values below 1024 bytes, which prevents VFs from
using MTU values below 871 bytes.

The iavf driver calculates databuffer_size based on the MTU using:
  databuffer_size = ALIGN(MTU + LIBETH_RX_LL_LEN, 128)

where LIBETH_RX_LL_LEN = 26 (ETH_HLEN + 2*VLAN_HLEN + ETH_FCS_LEN).

For MTU values below 871:
  MTU 870: 870 + 26 = 896, aligned to 128 = 896 (< 1024, rejected)
  MTU 871: 871 + 26 = 897, aligned to 128 = 1024 (>= 1024, accepted)

The 1024-byte minimum seems unnecessarily restrictive, because the hardware
supports databuffer_size as low as 128 bytes (the alignment boundary),
which should allow MTU values down to the standard minimum of 68 bytes.

I haven't found the reason why the limit was configured in the commit
9c7dd7566d18 ("ice: add validation in OP_CONFIG_VSI_QUEUES VF message"), so
with no more information and since it is working, change the minimum
databuffer_size validation from 1024 to 128 bytes to allow standard low
MTU values while still preventing invalid configurations.

Fixes: 9c7dd7566d18 ("ice: add validation in OP_CONFIG_VSI_QUEUES VF message")
cc: stable@vger.kernel.org
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260515182419.1597859-3-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ice: fix locking around wait_event_interruptible_locked_irq

Commit 50327223a8bb ("ice: add lock to protect low latency interface")
introduced a wait queue used to protect the low latency timer interface.
The queue is used with the wait_event_interruptible_locked_irq macro, which
unlocks the wait queue lock while sleeping. The irq variant uses
spin_lock_irq and spin_unlock_irq to manage this. The wait queue lock was
previously locked using spin_lock_irqsave. This difference in lock variants
could lead to issues, since wait_event would unlock the wait queue and
restore interrupts while sleeping.

The ice_read_phy_tstamp_ll_e810() function is ultimately called through
ice_read_phy_tstamp, which is called from ice_ptp_process_tx_tstamp or
ice_ptp_clear_unexpected_tx_ready. The former is called through the
miscellaneous IRQ thread function, while the latter is called from the
service task work queue thread. Neither of these functions has interrupts
disabled, so use spin_lock_irq instead of spin_lock_irqsave.

Fixes: 50327223a8bb ("ice: add lock to protect low latency interface")
Cc: stable@vger.kernel.org
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20250109181823.77f44c69@kernel.org/
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260515182419.1597859-2-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ARM: dts: aspeed: anacapa: Correct SGPIO names for monitoring

Update several Anacapa SGPIO line names to match the existing platform
hardware design and the signal names consumed by userspace monitoring.

The previous names did not match the actual Anacapa SGPIO usage. Some
lines were named as CPU or CPU power-good signals, but they are wired
and used on Anacapa for EDSFF presence, EDSFF power-good, boot EDSFF
presence, and thermal-trip assertion monitoring.

Correct the mappings as follows:

  PWRGD_PVDDCR_SOC_P0     -> L_PRSNT_EDSFF0_N
  PWRGD_PVDDIO_P0         -> L_PRSNT_EDSFF1_N
  PWRGD_PVDDIO_MEM_S3_P0  -> R_PRSNT_EDSFF2_N
  PWRGD_CHMP_CPU0_FPGA    -> R_PRSNT_EDSFF3_N
  PWRGD_CHIL_CPU0_FPGA    -> HPM_EDSFF_PG
  EAM2_CPU_MOD_PWR_GD_R   -> PRSNT_EDSFF_BOOT_N
  CPU0_SP7R1              -> L_EDSFF0_PG
  CPU0_SP7R2              -> L_EDSFF1_PG
  CPU0_SP7R3              -> R_EDSFF2_PG
  CPU0_SP7R4              -> R_EDSFF3_PG
  HPM_AMC_THERMTRIP_R_L   -> AMC_THERMTRIP_ASSERT
  FM_CPU0_THERMTRIP_N     -> CPU_THERMTRIP_ASSERT

The left-side EDSFF slots are numbered as EDSFF0 and EDSFF1 to match
the platform slot numbering used by userspace. The thermtrip names are
also updated to describe the asserted condition monitored by userspace
instead of the raw active-low signal names.

This is a naming correction for the existing Anacapa hardware design.
There is no new board revision or underlying hardware change involved.

[arj: Tweak capitalisation in commit subject, rewrap paragraph]

Signed-off-by: Rex Fu <Rex.Fu@amd.com>
Link: https://patch.msgid.link/20260518-anacapa-sgpio-edsff-thermtrip-v2-1-e43b1847b2dc@amd.com
Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>

Merge branch 'bpf-follow-up-fixes-for-bpf-syscall-common-attributes'

Leon Hwang says:

====================
bpf: Follow-up fixes for BPF syscall common attributes

Address sashiko reviews for BPF syscall common attributes series.

1. The tailing padding bytes in struct bpf_common_attr should be
checked. [1]
2. There was a concurrent regression in syscall.c::map_create(). [2]
3. OPTS_VALID() was missing to validate the nested struct bpf_log_opts
in libbpf. [3]
4. The token_fd should be -1 to avoid a valid token fd in test. [4]

A test is added to verify the fix #1.

The fix #2 is hard to be verified by test. So, I decide not to add a test
for it to avoid over-engineering.

Decide not to add a test for fix #3 to avoid over-engineering, as the
fix looks really simple.

Links:
[1] https://lore.kernel.org/bpf/20260513224823.6494FC19425@smtp.kernel.org/
[2] https://lore.kernel.org/bpf/20260512233658.CEED7C2BCB0@smtp.kernel.org/
[3] https://lore.kernel.org/bpf/20260512235629.C5CABC2BCB0@smtp.kernel.org/
[4] https://lore.kernel.org/bpf/20260513003358.55836C2BCB0@smtp.kernel.org/
====================

Link: https://patch.msgid.link/20260518145446.6794-1-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add test to verify checking padding bytes for BPF syscall common attributes

Add a test to verify that the tailing padding 4 bytes are checked in
syscall.c::__sys_bpf() using bpf_check_uarg_tail_zero().

Without the fix, the test fails with:

test_common_attr_padding:FAIL:syscall unexpected syscall: actual 4 >= expected 0
#213/12 map_create_failure/common_attr_padding:FAIL

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260518145446.6794-6-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Use -1 as token_fd in map create failure test

Because 0xFF can be an open BPF token fd in the test runner that will fail
test_invalid_token_fd(), change token_fd from 0xFF to -1 to avoid such
test failure.

Fixes: f675483cac1d ("selftests/bpf: Add tests to verify map create failure log")
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260518145446.6794-5-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

libbpf: Add OPTS_VALID() for log_opts in bpf_map_create

There should be an OPTS_VALID() check for log_opts before extracting its
fields.

If no such OPTS_VALID() check and an application compiled against a future
libbpf header passes a log_opts with new, non-zero fields to libbpf.so,
those fields will be ignored silently.

Fixes: 702259006f93 ("libbpf: Add syscall common attributes support for map_create")
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260518145446.6794-4-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Check tail zero of bpf_common_attr using offsetofend

Because of the 8-byte alignment, the compiler will pad struct
bpf_common_attr to 24 bytes. That said, sizeof(attr_common) is 24 instead
of 20.

When check tail zero using sizeof(attr_common) in
bpf_check_uarg_tail_zero(), there will be 4 bytes that won't be checked.

To also check the padding 4 bytes, replace sizeof(attr_common) with
offsetofend(struct bpf_common_attr, log_true_size).

Fixes: f28771c0691b ("bpf: Extend BPF syscall with common attributes support")
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260518145446.6794-2-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'net-devmem-support-devmem-with-netkit-devices'

Bobby Eshleman says:

====================
net: devmem: support devmem with netkit devices

This series enables TCP devmem TX through netkit devices.

Netkit now supports queue leasing. A physical NIC's RX queue can be
leased to a netkit guest interface inside a container namespace. This
gives the container a devmem-capable data path on the RX side (bind-rx,
etc...). On the TX side, the container process binds to its netkit guest
interface and sends traffic that netkit redirects (via BPF or ip
forwarding) to the physical NIC for DMA.

Two things in the existing devmem TX path prevent this from working:

1. validate_xmit_unreadable_skb() requires dev->netmem_tx before it will
   forward a dmabuf-backed (unreadable) skb. This protects skbs from
   landing on devices that don't have the IOMMU mappings for the backing
   dmabuf or that don't speak netmem. Netkit, however, does not support
   DMA, doesn't attempt to read unreadable skb pages and so doesn't
   break netmem (it is pure skb routing and redirection). It is
   functionally capable of routing unreadable skbs, but there is no way
   for the TX validation pathway to distinguish between a device that
   will actually attempt DMA-ing the skb and another device
   (like netkit) that does not DMA but also does not break
   netmem.

2. bind_tx_doit uses the bound device as the DMA device.  When the user
   binds devmem TX to the netkit guest, the bind handler attempts to
   create DMA mappings against netkit, which has no DMA capability and
   no IOMMU mappings.

This series solves these problems as follows:

1. Extend netmem_tx to two bits, assigned to one of three values:

   NETMEM_TX_NONE   - netmem not supported
   NETMEM_TX_DMA    - netmem supported and performs DMA
   NETMEM_TX_NO_DMA - netmem supported, but does not DMA

   With these bits, phys devices can set NETMEM_TX_DMA and devices like
   netkit set NETMEM_TX_NO_DMA. The validation TX path ensures that any
   DMA-capable netdev exactly matches the bound device, guaranteeing the
   correct mapping of the bound dmabuf. The validation TX path also
   allows devices with NETMEM_TX_NO_DMA to pass, knowing these devices
   will not misuse netmem or run into IOMMU faults. After redirection or
   routing and the skb finally makes its way through the stack to a
   physical device's TX path, the above NETMEM_TX_DMA check is performed
   again to guarantee the device has the appropriate binding/mappings.

2. On TX bind, the bind handler recognizes NETMEM_TX_NO_DMA devices and
   finds the phys TX device and binds to that instead. For the netkit
   case, if it has been leased a queue from a DMA-capable device
   already, then the bind action is performed on the DMA-capable device
   instead and the dmabuf is mapped correctly.
====================

Link: https://patch.msgid.link/20260514-tcp-dm-netkit-v5-0-408c59b91e66@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: add netkit devmem tests

Add nk_devmem.py with four tests for TCP devmem through a netkit device.

These tests are just duplicates of the original devmem tests, with some
adjusted parameters such as telling ncdevmem to avoid device setup
(since it only has access to netkit, not a phys device).

Each test uses NetDrvContEnv with primary_rx_redirect=True to set up the
BPF redirect program on the primary netkit interface, then calls a
shared run_*() helper which probes for devmem support and configures
the NIC (HDS, RSS, queue lease) before driving the test. NIC state is
restored per-test via defer() callbacks registered inside the helper.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260514-tcp-dm-netkit-v5-8-408c59b91e66@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: add primary_rx_redirect support to NetDrvContEnv

When sending from a namespace that has access to a netkit device with a
leased queue, the nk primary in the host namespace needs to redirect its
RX to the physical device. This patch adds that redirection bpf program
and teaches the harness to install it.

Add primary_rx_redirect=False parameter to NetDrvContEnv.__init__().
When enabled, _attach_primary_rx_redirect_bpf() attaches a new BPF TC
program (nk_primary_rx_redirect.bpf.c) to the primary (host-side) netkit
interface. The program redirects non-ICMPv6 IPv6 packets to the physical
NIC via bpf_redirect_neigh(), with the physical ifindex configured via
the .bss map. ICMPv6 is left on the host's netkit primary so IPv6
neighbor discovery still work locally.

Extract _find_bss_map_id() from _attach_bpf() into a reusable helper so
other BPF attachment methods can use it.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260514-tcp-dm-netkit-v5-7-408c59b91e66@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: refactor devmem command builders into lib module

Adding netkit-based devmem tests is a straight-forward copy of devmem
test commands plus some args for the nk cases, so this patch breaks out
these command builders into helpers used by both.

Though we tried to avoid libraries to avoid increasing the barrier of
entry/complexity (see selftests/drivers/net/README.md, section "Avoid
libraries and frameworks"), factoring out these functions seemed like
the lesser of two evils in this case of using the same commands, just
with slightly different args per environment.

I experimented with just having all of the tests in the same file to
avoid having helpers in a library file, but because ksft_run() is
limited to a single call per file, and the new tests will require
different environments (NetDrvContEnv/NetDrvEpEnv), it would have been
necessary to have each test set up its own environment instead of
sharing one for the entire ksft_run() run. This came at the cost of
ballooning the test time (from under 5s to 30s on my test system), so to
strike a balance these tests were placed in separate files so they could
keep a shared environment across a single ksft_run() run shared across
all tests using the same env type (introduced in subsequent patches).

The helpers work transparently with both plain and netkit environments
by inspecting cfg for netkit-specific attributes (netns, nk_queue,
etc...).

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260514-tcp-dm-netkit-v5-6-408c59b91e66@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: make attr _nk_guest_ifname public

Subsequent patches will use the _nk_guest_ifname as a public attr for
setting up devmem. Rename to nk_guest_ifname to avoid angering the
linter about the '_' prefix being used for a non-private attr.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260514-tcp-dm-netkit-v5-5-408c59b91e66@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: ncdevmem: add -n flag to skip NIC configuration

Add a -n (skip_config) flag that causes ncdevmem to skip NIC
configuration when operating as an RX server. When -n is passed,
ncdevmem skips configuring header split, RSS, and flow steering, as well
as their teardown on exit.

This allows ksft tests to pre-configure the NIC in the host namespace
before launching ncdevmem in the guest namespace. This is needed for
netkit devmem tests where the test harness namespace has direct access
to the NIC and the ncdevmem namespace does not.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260514-tcp-dm-netkit-v5-4-408c59b91e66@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: devmem: support TX over NETMEM_TX_NO_DMA devices

When a netkit virtual device leases queues from a physical NIC, devmem
TX bindings created on the netkit device must still result in the dmabuf
being mapped for dma by the physical device. This patch accomplishes
this by teaching the bind handler to search for the underlying
DMA-capable device by looking it up via leased rx queues. The function
netdev_find_netmem_tx_dev(), used for finding the underlying DMA-capable
device, can be extended to support other non-netkit NETMEM_TX_NO_DMA
devices in the future if needed.

Additionally, this patch extends validate_xmit_unreadable_skb() to
support the netkit case, where the skb is validated twice: once on the
netkit guest device and again on the physical NIC after BPF redirect or
ip forwarding.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260514-tcp-dm-netkit-v5-3-408c59b91e66@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: netkit: declare NETMEM_TX_NO_DMA mode

Some virtual devices like netkit (or ifb) never DMA and never touch frag
contents, they just forward the skb to another device. They are unable
to forward unreadable skbs, however, because they fail to pass TX
validation checks on dev->netmem_tx. The existing two-state
NETMEM_TX_NONE / NETMEM_TX_DMA doesn't give the TX validator enough
information to differentiate devices that will attempt DMA on the
unreadable skb from those that will simply route it untouched.

Add a third mode to the enum so drivers can indicate 1) if they have
netmem TX support, and 2) if they do, whether they are DMA-capable:

NETMEM_TX_NO_DMA - pass-through, device never DMAs

Widen dev->netmem_tx from a 1-bit field to 2 bits to fit the new value,
and declare netkit as NETMEM_TX_NO_DMA. Devmem TX support over these
devices comes in a follow-up patch.

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260514-tcp-dm-netkit-v5-2-408c59b91e66@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: convert netmem_tx flag to enum

Devices that support netmem TX previously set dev->netmem_tx = true.
This was checked in validate_xmit_unreadable_skb() to drop unreadable
skbs (skbs with dmabuf-backed frags) before they reach drivers that
would mishandle them or devices that would not have the iommu mappings
for them.

A subsequent patch will introduce a third state for virtual devices
that forward unreadable skbs without ever performing DMA on them. To
prepare for that, convert the boolean dev->netmem_tx into an enum:

NETMEM_TX_NONE - no netmem TX support (drop unreadable skbs)
NETMEM_TX_DMA - full support, device does DMA

Update the existing NIC drivers (bnxt, gve, mlx5, fbnic) and the
validators in net/core to use the new enum. No functional change.

Acked-by: Harshitha Ramamurthy <hramamurthy@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260514-tcp-dm-netkit-v5-1-408c59b91e66@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-mlx5e-simplify-and-optimize-napi-poll-flow'

Tariq Toukan says:

====================
net/mlx5e: simplify and optimize napi poll flow

This series simplifies the mlx5e napi poll flow and reduces branching in
hot paths by leveraging existing dependencies between channel features.

Patch 1 avoids passing the full channel object to kTLS RX code paths
when only the async ICOSQ is needed, and slightly optimizes the common
flow in the pending resync handling logic.

Patch 2 further reduces branches in napi poll by relying on established
feature dependencies between XDP, async ICOSQ, XSK, and kTLS RX state.
====================

Link: https://patch.msgid.link/20260514111038.338251-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Let kTLS RX get async ICOSQ param in napi poll

Do not pass channel just to extract the async ICOSQ.
It's already extracted, pass it.

Re-order the checks in mlx5e_ktls_rx_pending_resync_list to optimize the
common flow.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Link: https://patch.msgid.link/20260514111038.338251-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Reduce branches in napi poll

Reduce the number of branches in napi poll, based on the following list
of dependencies:

1. xsk_open=t only if c->xdp and c->async_icosq.
2. c->xdpsq only if c->xdp.
3. c->xdp implies c->async_icosq.
4. ktls_rx_was_enabled implies c->async_icosq.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Link: https://patch.msgid.link/20260514111038.338251-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'perf-tools-fixes-for-v7.1-2026-05-18' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf-tools fixes
"An usual sync-up for the header files and related code:

   - copy headers that are used for perf trace syscall beautifier

   - update the beautifier scripts according to the changes

   - don't show differences in the headers by default"

* tag 'perf-tools-fixes-for-v7.1-2026-05-18' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
  perf trace: Update beautifier script for clone flags
  perf trace: Add beautifier script for fsmount flags
  perf build: Add make check-headers target
  perf trace: Sync uapi/linux/sched.h with the kernel source
  perf trace: Sync uapi/linux/mount.h with the kernel source
  perf trace: Sync uapi/linux/fs.h with the kernel source
  perf trace: Sync linux/socket.h with the kernel source

cifs: Fix undefined variables

Fix a couple of undefined variables introduced by the patch to fix tearing
on ->remote_i_size and ->zero_point. For some reason, make W=1 with gcc
doesn't give undefined variable warnings (but clang does).

Fixes: 2c8f4742bb76 ("netfs: Fix potential for tearing in ->remote_i_size and ->zero_point")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202605031459.eX5UbO3K-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202605021450.ca5QGqLH-lkp@intel.com/
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: Christian Brauner <brauner@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

include: Remove unused ks8851_mll.h

The last user was removed in commit 72628da6d634 ("net: ks8851:
Remove ks8851_mll.c") which consolidated the driver into a
common implementation. No file includes this header.

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Link: https://patch.msgid.link/20260515184531.1515418-1-costa.shul@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'nf-26-05-16' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes for net:

1) Fix small race windows in nf_ct_helper_log() when accessing helper,
   from Florian Westphal.

2) Fix potential infinite loop and race conditions in IPVS caused by
   frequent user-triggered service table changes, from Julia Anastasov.

3) Fix a race condition when dumping ipsets for restore,
   from Jozsef Kadlecsik.

4) Fix inner transport offset in IPv6 in nft_inner when extension
   headers come before the layer 4 transport header, from Yizhou Zhao.

5) Fix incorrect iteration over IPv4 ranges in several hash set types,
   from Nan Li.

6) Fix incorrect order when restoring BH in nft_inner_restore_tun_ctx(),
   from Florian Westphal.

7) Validate option array from ip6t_hbh checkpath() to fix an off-by-one
   access, from Zhengchuan Liang.

8) Fix race condition between ipset list -terse and concurrent updates,
   from Jozsef Kadlecisk.

9) Fix race condition when inserting elements into a hash bucket, also
   from Jozsef.

10) Annotate access to first free slot in hashtable, from Jozsef Kadlecsik.

11) Ensure sufficient headroom in br_netfilter neigh transmission,
    from Lorenzo Bianconi.

12) Hold reference on skb->dev in nfqueue exit path, bridge local input
    is speciall since skb->dev != state->indev, allowing for net_device
    to go away while packet is sitting in nfqueue. From Haoze Xie.

* tag 'nf-26-05-16' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_queue: hold bridge skb->dev while queued
  netfilter: br_netfilter: Reallocate headroom if necessary in neigh_hh_bridge()
  netfilter: ipset: annotate "pos" for concurrent readers/writers
  netfilter: ipset: Fix data race between add and dump in all hash types
  netfilter: ipset: Fix data race between add and list header in all hash types
  netfilter: ip6t_hbh: reject oversized option lists
  netfilter: nft_inner: release local_lock before re-enabling softirqs
  netfilter: ipset: stop hash:* range iteration at end
  netfilter: nft_inner: Fix IPv6 inner_thoff desync
  netfilter: ipset: fix a potential dump-destroy race
  ipvs: avoid possible loop in ip_vs_dst_event on resizing
  netfilter: nf_conntrack_helper: fix possible null deref during error log
====================

Link: https://patch.msgid.link/20260516115627.967773-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dibs: Improve DIBS prompts and help texts

The Kconfig prompts for the DIBS options read:

DIBS support (DIBS) [N/m/y/?]
intra-OS shortcut with dibs loopback (DIBS_LO) [N/y/?]

Clarify the DIBS prompt by expanding the acronym.
Capitalize the first character of the DIBS_LO prompt.
While at it, join the first two lines of the DIBS help text into a real
sentence.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://patch.msgid.link/9093259c43e5d1996965d1522562444419196a19.1778838921.git.geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

x86: Remove unnecessary architecture-specific <asm/device.h>

arch/x86/include/asm/device.h is identical to
include/asm-generic/device.h, and therefore the x86-specific version
is unnecessary. Remove it.

[ dhansen: Minor note: It looks like if asm/foo.h does not exist that
   the build system generates one that does a #include
   <asm-generic/foo.h>. Thus, all that needs to be done is
   remove the arch-specific one. ]

Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20260517025713.97791-1-enelsonmoore@gmail.com

Merge tag 'batadv-net-pullrequest-20260515' of https://git.open-mesh.org/batadv

Simon Wunderlich says:

====================
Here are various batman-adv bugfixes:

- fix tp_meter counter underflow during shutdown, by Luxiao Xu

- fix tp_meter tp_vars reference leak in receiver shutdown,
   by Sven Eckelmann

- fix various translation table integer handling issues,
   by Sven Eckelmann (3 patches)

- fix various translation table counter issues,
   by Sven Eckelmann (3 patches)

- fix fragment reassembly length accounting, by Ruide Cao

- clear current gateway during teardown, by Ruijie Li

- handle forward allocation error in DAT, by Sven Eckelmann

- tp_meter: avoid use of uninitialized sender variables in tp_meter,
   by Sven Eckelmann

- disallow unicast fragment in fragment, by Sven Eckelmann

- directly shut down tp_meter timer on cleanup, by Sven Eckelmann

* tag 'batadv-net-pullrequest-20260515' of https://git.open-mesh.org/batadv:
  batman-adv: tp_meter: directly shut down timer on cleanup
  batman-adv: frag: disallow unicast fragment in fragment
  batman-adv: tp_meter: avoid use of uninit sender vars
  batman-adv: dat: handle forward allocation error
  batman-adv: clear current gateway during teardown
  batman-adv: fix fragment reassembly length accounting
  batman-adv: tt: prevent TVLV entry number overflow
  batman-adv: tt: avoid empty VLAN responses
  batman-adv: tt: fix TOCTOU race for reported vlans
  batman-adv: tt: fix negative last_changeset_len
  batman-adv: tt: fix negative tt_buff_len
  batman-adv: tt: reject oversized local TVLV buffers
  batman-adv: tp_meter: fix tp_vars reference leak in receiver shutdown
  batman-adv: fix tp_meter counter underflow during shutdown
====================

Link: https://patch.msgid.link/20260515095540.325586-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'for-net-2026-05-14' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

- af_bluetooth: serialize accept_q access
- L2CAP: ecred_reconfigure: send packed pdu, not stack pointer
- btmtk: accept too short WMT FUNC_CTRL events
- hci_qca: Convert timeout from jiffies to ms

* tag 'for-net-2026-05-14' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: hci_qca: Convert timeout from jiffies to ms
  Bluetooth: L2CAP: ecred_reconfigure: send packed pdu, not stack pointer
  Bluetooth: btmtk: accept too short WMT FUNC_CTRL events
  Bluetooth: serialize accept_q access
====================

Link: https://patch.msgid.link/20260514172340.1515042-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

openvswitch: vport: fix race between linking and the device notifier

Sashiko reports that it is technically possible that we got the device
reference, but by the time we're linking it to the OVS datapath, it
may be already in the process of being deleted.  In this case if the
notifier wins the race for RTNL, it will see that the device is not
yet in the OVS datapath (ovs_netdev_get_vport() will fail in the
dp_device_event()) and will do nothing.  Then the ovs_netdev_link()
will take the RTNL and link the unregistering device to OVS datapath.

Eventually, netdev_wait_allrefs_any() will re-broadcast the event and
the device will be properly detached, but it will take at least a
second before that happens, so it's not something we should rely on.

Let's avoid linking the non-registered device in the first place.

Note: As per documentation, RTNL doesn't protect the reg_state, but
it actually does for all the state transitions we care about here,
so it should not be necessary to use READ_ONCE or taking the instance
lock.  We can still do that, but we have a few more places even in
this file where the reg_state is accessed without those while under
RTNL, and many more places like this across the kernel code, so it
might make more sense to change all of them in a more centralized
fashion in the future, if necessary.

Fixes: ccb1352e76cf ("net: Add Open vSwitch kernel components.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://patch.msgid.link/20260514184702.2461435-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: qualcomm: rmnet: fix endpoint use-after-free in rmnet_dellink()

rmnet_dellink() removes the endpoint from the hash table with
hlist_del_init_rcu() and then immediately frees it with kfree(). However,
RCU readers on the receive path (rmnet_rx_handler ->
__rmnet_map_ingress_handler) may still hold a reference to the endpoint and
dereference ep->egress_dev after the memory has been freed. The endpoint is
a kmalloc-32 object, and the stale read at offset 8 corresponds to the
egress_dev pointer.

  BUG: unable to handle page fault for address: ffffffffde942eef
  Oops: 0002 [#1] SMP NOPTI
  CPU: 1 UID: 0 PID: 137 Comm: poc_write Not tainted 7.0.0+ #4 PREEMPTLAZY
  RIP: 0010:rmnet_vnd_rx_fixup (rmnet_vnd.c:27)
  Call Trace:
   <TASK>
   __rmnet_map_ingress_handler (rmnet_handlers.c:48 rmnet_handlers.c:101)
   rmnet_rx_handler (rmnet_handlers.c:129 rmnet_handlers.c:235)
   __netif_receive_skb_core.constprop.0 (net/core/dev.c:6096)
   __netif_receive_skb_one_core (net/core/dev.c:6208)
   netif_receive_skb (net/core/dev.c:6467)
   tun_get_user (drivers/net/tun.c:1955)
   tun_chr_write_iter (drivers/net/tun.c:2003)
   vfs_write (fs/read_write.c:688)
   ksys_write (fs/read_write.c:740)
   </TASK>

Add an rcu_head field to struct rmnet_endpoint and replace kfree() with
kfree_rcu() so the endpoint memory remains valid through the RCU grace
period. Also remove the rmnet_vnd_dellink() call and inline only the
nr_rmnet_devs decrement, since rmnet_vnd_dellink() would set
ep->egress_dev to NULL during the grace period, creating a data race
with lockless readers.

Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Link: https://patch.msgid.link/20260514122511.3083479-2-bestswngs@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: appletalk: fix NULL pointer dereference in aarp_send_ddp()

aarp_send_ddp() calls atalk_find_dev_addr(dev) in the LocalTalk fast
path without checking for NULL. When the device has no AppleTalk
interface configured (dev->atalk_ptr == NULL), this leads to a NULL
pointer dereference at the at->s_net access.

KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
RIP: 0010:aarp_send_ddp (net/appletalk/aarp.c:552 (discriminator 2))
Call Trace:
  <TASK>
  atalk_sendmsg (net/appletalk/ddp.c:1715)
  __sys_sendto (net/socket.c:2265 (discriminator 1))
  __x64_sys_sendto (net/socket.c:2272)
  do_syscall_64 (arch/x86/entry/syscall_64.c:94)
  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)

Add a NULL check consistent with the other callers of
atalk_find_dev_addr().

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Link: https://patch.msgid.link/20260514123806.3085961-3-bestswngs@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netdevsim: psp: reset spi on key rotation and check for exhaustion on alloc

The PSP spec states that the lower 31b of the SPI need to be
non-zero. Though not in the spec, I think it is reasonable to reset
the lower 31b of the spi space after a key rotation, and to also
decline to generate session keys when the lower 31b saturate.

Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260515-spi-handle-v1-1-debf8cb467cb@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-mlx5-frag-buffer-improvements'

Tariq Toukan says:

====================
net/mlx5: frag buffer improvements

This series adds observability for mlx5 fragment buffer DMA pools and
improves the default NUMA placement policy for fragment buffer
allocations.

Patch 1 adds a debugfs interface exposing per-node DMA pool usage
statistics for mlx5_frag_buf allocations, helping with debugging and
visibility into pool utilization.

Patch 2 improves locality of default fragment buffer allocations by
using numa_mem_id() when no explicit NUMA node is requested, allowing
allocations to prefer the current CPU's local memory node.

Together, these changes improve both introspection and memory locality
behavior of mlx5 fragment buffer allocations.
====================

Link: https://patch.msgid.link/20260514104925.337570-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: add debugfs stats for frag buf dma pools

Add a debugfs file exposing per-node DMA pool usage for mlx5_frag_buf
allocations.

  # cat /sys/kernel/debug/mlx5/<dev>/frag_buf_dma_pools
  node  block_size  used_blocks  allocated_blocks
     0        4096            0                 0
     0        8192            0                 0
     0       16384            0                 0
     0       32768            0                 0
     0       65536            0                 0
     1        4096            0                 0
     1        8192            0                 0
     1       16384            0                 0
     1       32768            0                 0
     1       65536            0                 0

Signed-off-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260514104925.337570-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: use numa_mem_id() for default frag buf allocations

Use the current CPU's local memory node when callers do not request a
specific NUMA node for mlx5_frag_buf allocations.

Signed-off-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260514104925.337570-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: xsk: Fix unlocked writing to ICOSQ

During napi poll, when the affinity changes and there's still XSK work
to be done, we trigger an ICOSQ interrupt on the new CPU. However, this
triggering on the ICOSQ is done unprotected.

There are 2 such races:

A) mlx5e_trigger_irq() is called while mlx5e_xsk_alloc_rx_mpwqe() is
running from a different CPU due to affinity change. This can happen
because IRQ triggering is done after napi_complete_done(). At this point
the NAPI can be scheduled on a different CPU. Like this:

  CPU A (old affinity, NAPI tail)    CPU B (new affinity, fresh NAPI)
  -------------------------------    --------------------------------
  napi_complete_done()  clears SCHED
  mlx5e_cq_arm(...)
                                     napi_schedule_prep() sets SCHED
                                     mlx5e_napi_poll()
                                       mlx5e_xsk_alloc_rx_mpwqe()
                                         mlx5e_icosq_sync_lock() // noop
                                         memcpy 640 B UMR body
                                         advance sq->pc by 10
  mlx5e_trigger_irq(&c->icosq)
    wqe_info[pi] = {NOP, 1}
    mlx5e_post_nop() advances sq->pc

B) mlx5e_trigger_irq() is called on the ICOSQ when
mlx5e_trigger_napi_icosq() is running.

The obvious fix would be to lock the ICOSQ. But ICOSQ has an optimized
locking scheme that doesn't work for this scenario. Kick the async ICOSQ
instead which is always locked.

This issue was noticed in the wild with the following splat:

  netdevice: ge-0-0-1: Bad OP in ICOSQ CQE: 0xd
  WARNING: drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:826 [...]
  [...]
  Call Trace:
   <IRQ>
   mlx5e_napi_poll+0x11d/0x7f0 [mlx5_core]
   __napi_poll+0x30/0x200
   ? skb_defer_free_flush+0x9c/0xc0
   net_rx_action+0x2fe/0x3f0
   handle_softirqs+0xd8/0x340
   __irq_exit_rcu+0xbc/0xe0
   common_interrupt+0x85/0xa0
   </IRQ>
   <TASK>
   asm_common_interrupt+0x26/0x40
  [...]
  ---[ end trace 0000000000000000 ]---
  mlx5_core 0000:08:00.0 ge-0-0-1: Error cqe on cqn 0x548, ci 0x2022, qn 0x8f4,
  opcode 0xd, syndrome 0x2, vendor syndrome 0x68
  00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00000030: 00 00 00 00 01 00 68 02 01 00 08 f4 de 14 59 d2
  WQE DUMP: WQ size 16384 WQ cur size 0, WQE index 0x1e14, len: 64
  00000000: 00 00 00 01 d9 ed 80 02 00 00 00 01 d9 ed 90 02
  00000010: 00 00 00 01 d9 ed a0 02 00 00 00 01 d9 ed b0 02
  00000020: 00 00 00 01 d9 ed c0 02 00 00 00 01 d9 ed d0 02
  00000030: 00 00 00 01 d9 ed e0 02 00 00 00 01 d9 ed f0 02
  mlx5_core 0000:08:00.0 ge-0-0-1: Error cqe on cqn 0x548, ci 0x2023, qn 0x8f4,
  opcode 0xd, syndrome 0x5, vendor syndrome 0xf9
  00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00000030: 00 00 00 00 01 00 f9 05 01 00 08 f4 de 15 cf d2

Fixes: db05815b36cb ("net/mlx5e: Add XSK zero-copy support")
Reported-by: Paul Saab <ps@mu.org>
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260513064613.334602-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

accel/amdxdna: Add expandable device heap support

Introduce an expandable device heap to avoid allocating a large heap
upfront. Start with a smaller initial heap and grow it on demand.
Return -EAGAIN when BO allocation fails due to insufficient heap space,
allowing userspace to trigger heap expansion via a heap BO creation
IOCTL and retry the allocation.

Manage heap chunks using an xarray. On expansion, register new chunks
with the firmware via MSG_OP_ADD_HOST_BUFFER.

Since heap shrinking is not supported by the firmware, release all heap
chunks on device close.

Co-developed-by: Wendy Liang <wendy.liang@amd.com>
Signed-off-by: Wendy Liang <wendy.liang@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260515161922.744647-1-lizhi.hou@amd.com

drm/v3d: Release indirect CSD GEM reference on CPU job free

v3d_get_cpu_indirect_csd_params() takes a reference to the indirect BO via
drm_gem_object_lookup() and stashes it in cpu_job->indirect_csd.indirect,
but nothing on the CPU job teardown path ever drops that reference.

Drop the extra reference in v3d_cpu_job_free(). The NULL check covers ioctl
errors before the lookup ran and CPU job types other than
V3D_CPU_JOB_TYPE_INDIRECT_CSD, which leave the field zero-initialised.

Cc: stable@vger.kernel.org
Fixes: 18b8413b25b7 ("drm/v3d: Create a CPU job extension for a indirect CSD job")
Assisted-by: Claude:claude-opus-4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patch.msgid.link/20260515-v3d-cpu-job-leaks-v1-2-7f147cbbf935@igalia.com
Signed-off-by: Maíra Canal <mcanal@igalia.com>

drm/v3d: Fix use-after-free of CPU job query arrays on error path

The CPU job ioctl's fail label calls kvfree() on cpu_job's timestamp and
performance query arrays after v3d_job_cleanup(), which drops the job's
last reference and frees cpu_job. Reading cpu_job at that point is a
use-after-free. Also, on the early v3d_job_init() failure path, it is a
NULL dereference, since v3d_job_deallocate() zeroes the local pointer.

In the success path, the arrays are released from the scheduler's
.free_job callback, but on the error path, they are freed manually, as
the job was never pushed to the scheduler. While the success path deals
with this correctly, the fail path doesn't.

On top of that, the manual kvfree() calls only free the array storage;
they don't drm_syncobj_put() the per-query syncobjs that
v3d_timestamp_query_info_free() and v3d_performance_query_info_free()
release on the success path. So the same fail path that triggers the
use-after-free also leaks one syncobj reference per query.

Unify the CPU job teardown into the CPU job's kref destructor, mirroring
v3d_render_job_free(). The scheduler's .free_job slot reverts to the
generic v3d_sched_job_free() and the fail label drops the manual
kvfree() calls, leaving a single teardown path that is reached from both
the scheduler and the ioctl error path. That removes the use-after-free,
the NULL dereference, and the syncobj leak by construction.

Cc: stable@vger.kernel.org
Fixes: 9ba0ff3e083f ("drm/v3d: Create a CPU job extension for the timestamp query job")
Assisted-by: Claude:claude-opus-4.7
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patch.msgid.link/20260515-v3d-cpu-job-leaks-v1-1-7f147cbbf935@igalia.com
Signed-off-by: Maíra Canal <mcanal@igalia.com>

drm/amd/display: Fix eDP receiver ready status check in T7 sequence

[Why]
Some eDP panels return sinkstatus as 0x5, causing the original sinkstatus == 1
check to never match and resulting in unnecessary polling delay. The
equality check is too restrictive and doesn't properly validate the
specific status bit that indicates receiver readiness.

[How]
Replace direct value comparison with proper bitmask check using
DP_RECEIVE_PORT_0_STATUS constant.

Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Sung-huai Wang <Danny.Wang@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Revert "drm/amd/display: dmub_cmd.h: add missing kernel-doc for enums"

[Why & How]
This reverts commit f71d0b68ec58b781f4f44ea642846bedce075e85.

1. Auto-generated Header: The file 'dmub_cmd.h' is an auto-generated header
   managed in an external repository (dmu_stg). Manual changes made directly in
   this repository will be overwritten and lost during the next automated weekly
   synchronization.

2. Tooling Compatibility: This header is governed by internal AMD firmware
   standards which require Doxygen formatting for cross-team documentation.
   Moving to kernel-doc syntax may break internal documentation pipelines.

3. Suppressing Warnings: Current 'make htmldocs' and 'make W=1' builds
   do not actively scan 'dmub_cmd.h' for kernel-doc compliance, thus no warnings
   are triggered during standard compilation. To address warnings generated when
   manually running './scripts/kernel-doc', we have added a notice at the file
   header indicating that this is an auto-generated file that does not strictly
   follow kernel-doc formatting. This ensures that any future linting tools or
   manual checks recognize the formatting as intentional.

Acked-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: James Lin <pinglei.lin@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add some missing code for dcn42

[why & how]
Some DCN4.2 related code is missing from upstream

Fixes: e56e3cff2a1b ("drm/amd/display: Sync dcn42 with DC 3.2.373")
Acked-by: ChiaHsuan Chung <ChiaHsuan.Chung@amd.com>
Reviewed-by: Roman Li <Roman.Li@amd.com>
Signed-off-by: James Lin <pinglei.lin@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd: Reduce code duplication in runtime PM

[Why]
amdgpu_pmops_runtime_suspend() runs almost the same code that
amdgpu_pmops_runtime_idle() runs. That is there is pointless code
duplication.

[How]
Move amdgpu_pmops_runtime_idle() up, extract common code and then
call from both functions. No intended functional changes.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Check bounds for allocate_sdma_queue restore_sdma_id

allocate_sdma_queue has an option where the sdma queue id can be
specified (used by CRIU). We weren't bounds-checking that
value.

Confirm it's less than the maximum number of queues.

Signed-off-by: David Francis <David.Francis@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: use atomic operation to achieve lockless serialization

In amdgpu_seq64_alloc there is a possibility that two difference cores
from two separate NODES can try to and could get the same free slot.
So this fixes that race here using atomic test_and_set clear operations.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/vce3: Fix VCE 3 firmware size and offsets

The VCPU BO contains the actual FW at an offset, but
it was not calculated into the VCPU BO size.
Subtract this from the FW size to make sure there is
no out of bounds access.

This may fix VM faults when using VCE 3.

Cc: John Olender <john.olender@gmail.com>
Fixes: e98226221467 ("drm/amdgpu: recalculate VCE firmware BO size")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Check bounds on allocate_doorbell

allocated_doorbell has an option to set the doorbell id
to a specific value (used by CRIU). This value was not
bounds checked.

Check to confirm it's less than KFD_MAX_NUM_OF_QUEUES_PER_PROCESS.

Signed-off-by: David Francis <David.Francis@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/vce2: Fix VCE 2 firmware size and offsets

The VCPU BO contains the actual FW at an offset, but
it was not calculated into the VCPU BO size.
Subtract this from the FW size to make sure there is
no out of bounds access.

Additionally, increase the VCE_V2_0_DATA_SIZE to
have extra space after the VCE handles.

Also increase the data size used for each VCE handle.
The FW needs 23744 bytes, use 24K to be safe.

This fixes VM faults when using VCE 2.

Cc: John Olender <john.olender@gmail.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/4802
Fixes: e98226221467 ("drm/amdgpu: recalculate VCE firmware BO size")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/vce1: Stop using amdgpu_vce_resume

The VCE1 firmware works slightly differently and is already
loaded by vce_v1_0_load_fw(). It doesn't actually need to
call amdgpu_vce_resume().

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/vce1: Fix VCE 1 firmware size and offsets

The VCPU BO contains the actual FW at an offset, but
it was not calculated into the VCPU BO size.
Subtract this from the FW size to make sure there is
no out of bounds access.

Make sure the stack and data offsets are aligned to
the 32K TLB size.

Check that the FW microcode actually fits in the
space that is reserved for it.

Fixes: d4a640d4b9f3 ("drm/amdgpu/vce1: Implement VCE1 IP block (v2)")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/vce1: Don't repeat GTT MGR node allocation

Only allocate entries from the GTT manager when the
VCE GTT node is not allocated yet. This prevents the
possibility of allocating them multiple times, which
causes issues during GPU reset and suspend/resume.

Fixes: 71aec08f80e7 ("amdgpu/vce: use amdgpu_gtt_mgr_alloc_entries")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/vce1: Check if VRAM address is lower than GART.

Previously, I had assumed this was not possible
so it was OK to not handle it, but now we got a report
from a user who has a board that is configured this way.

When the VCPU BO is already located in a low 32-bit address
in VRAM (eg. when VRAM is mapped to the low address space),
don't do the workaround.

Fixes: 71aec08f80e7 ("amdgpu/vce: use amdgpu_gtt_mgr_alloc_entries")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/vce1: Remove superfluous address check

The same thing is already checked a few lines above.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/vce1: Check that the GPU address is < 128 MiB

When ensuring the low 32-bit address, make sure it is
less than 128 MiB, otherwise the VCE seems to fail to initialize.
This seems to be an undocumented limitation of the firmware
validation mechanism. Note that in case of VCE1 the BAR
address is zero and we can't change it also due to the
firmware validator.

When programming the mmVCE_VCPU_CACHE_OFFSETn registers,
don't AND them with a mask. This is incorrect because
the register mask is actually 0x0fffffff and useless because
we already ensure the addresses are below the limit.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Align amdgpu_gtt_mgr entries to TLB size on Tahiti (v2)

The TLB is organized in groups of 8 entries, each one is 4K.
On Tahiti, the HW requires these GART entries to be 32K-aligned.

This fixes a VCE 1 firmware validation failure that can happen
after suspend/resume since we use amdgpu_gtt_mgr for VCE 1.

v2:
- Change variable declaration order
- Add comment about "V bit HW bug"

Fixes: 698fa62f56aa ("drm/amdgpu: Add helper to alloc GART entries")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Fix OOB memory exposure in get_wave_state()

The get_wave_state() function for v9 trusts cp_hqd_cntl_stack_size and
cp_hqd_cntl_stack_offset values read directly from the MQD, which are
written by GPU microcode and fully attacker-controlled on the
CRIU-restore path (via AMDKFD_IOC_RESTORE_PROCESS with H3).

this leads to an unbounded copy_to_user() that can leak adjacent
GTT/kernel memory. If offset > size, integer underflow produces a ~4 GiB
read length, if size is set to 1 MiB against a 4 KiB allocation, we leak
1 MiB of adjacent kernel memory (other queues' MQDs, ring buffers, KASLR
pointers).

Fix by clamping both cp_hqd_cntl_stack_size to the actual allocated
buffer size (q->ctl_stack_size) and cp_hqd_cntl_stack_offset to the
clamped size before performing arithmetic and copy_to_user().

This ensures we never read beyond the allocated kernel BO regardless of
attacker-supplied MQD field values.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: fix memleak of dpm_policies on smu v15

In smu_v15_0_fini_smc_tables, dpm_policies was not freed or NULLed, causing a memory leak.
Add kfree() and NULL assignment to properly release memory and avoid dangling pointers.

Fixes: 2beedc3a92b7 ("drm/amd/pm: Add initial support for smu v15_0_8");
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Enable SDMA queue reset on gfx v12.1

After suspend/resume sdma_gang is supported on MES 12.1, SDMA queue reset
is supported too.

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Michael Chen<michael.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Support MES suspend_all_sdma_gangs

suspend_all_sdma_gangs is supported in new MES firmware for gfx 12.1

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michael Chen<michael.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix OOB risk parsing virt RAS batch trace replies on the VF

amdgpu_virt_ras_get_batch_records() indexed batchs[] and records[]
from ras_cmd_batch_trace_record_rsp copied out of shared memory without
fully bounding the cache window or per-batch offset/trace_num. A
tampered or corrupted buffer could set real_batch_num past the array,
make a naive start_batch_id + real_batch_num comparison wrap in
uint64_t, or point offset+trace_num past records[].

Add amdgpu_virt_ras_check_batch_cached() for a subtraction-based window
with a real_batch_num cap, re-run it after GET_BATCH_TRACE_RECORD, and
use an explicit batch index into batchs[]. Consolidate batch_id,
trace_num, and offset+trace_num checks; on any failure memset the cache
and return -EIO so the next call refetches.

Signed-off-by: Chenglei Xie <Chenglei.Xie@amd.com>
Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add guest driver CUID support

v3:
improve the coding style.

v2:
use debugfs_create_x64 and debugfs_create_x8 to create node.

v1:
1. Add guest driver CUID support
2. Do not expose vf index(variable "fcn_idx") to customers,
replace the fcn_idx with pad.
Only expose the unitid to customers.

background:
Change fcn_idx to pad, VF index won't expose to guest vm.
Introduce a new unitid field as the VF identifier to replace the VF index:
1).unitid is assigned by the host driver
2).It is delivered to the guest via the pf2vf message
3).The application or umd can retrieve united from the sysfs node

Signed-off-by: chong li <chongli2@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix discovery offset check under VF

Discovery table may be kept at offset 0 by host driver. Remove the
validation check.

Fixes: 01bdc7e219c4 ("drm/amdgpu: New interface to get IP discovery binary v3")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Ellen Pan <yunru.pan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: remove va cursors for all mappings

va_cursor struct needs to be cleaned even if the mapping
has been removed already.

Also simplify it by make it a void function as return value
check isn't needed as its called during tear down.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: reject non-user addresses early in GEM_USERPTR ioctl

amdgpu_gem_userptr_ioctl() currently accepts any value of args->addr
and only discovers an out-of-range pointer much later, inside
amdgpu_gem_object_create() and the HMM mirror registration path.
Userspace can drive that path with kernel-side virtual addresses;
the get_user_pages() layer rejects them, but only after the driver
has already allocated a GEM object and started wiring up notifier
state that then has to be torn down on failure.

Add an access_ok() guard at the top of the ioctl, right after the
existing page-alignment check and before flag validation, so any
address that does not lie within the calling task's user address
range is rejected with -EFAULT before any allocation occurs. No
legitimate ROCm/HSA userspace passes kernel-mode pointers through
this interface, so this is defense-in-depth rather than a behaviour
change for valid callers; -EFAULT matches the convention already
used by other uaccess-style rejections in the kernel.

Also add an explicit #include <linux/uaccess.h>; access_ok() is
otherwise only available transitively through other headers in
this translation unit.

Signed-off-by: Amir Shetaia <Amir.Shetaia@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

watchdog: ziirave_wdt: Use named initializers for struct i2c_device_id

While being less compact, using named initializers allows to more easily
see which members of the structs are assigned which value without having
to lookup the declaration of the struct. And it's also more robust
against changes to the struct definition.

This patch doesn't modify the compiled arrays, only their representation
in source form benefits. The former was confirmed with x86 and arm64
builds.

Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
Link: https://lore.kernel.org/r/20260518171901.904094-2-u.kleine-koenig@baylibre.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

drm/amdgpu/vpe: Force collaborate sync after TRAP

VPE1 could possibly hang and fail to power off at the end of commands in
collaboration mode. This workaround adds a COLLAB_SYNC after TRAP to
force instances synchronized to avoid VPE1 fail to power off.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Alan liu <haoping.liu@amd.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5171
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/userq: update the vm task info during signal ioctl

Pagefaults does not have process information correctly populated
as vm->task is not set during vm_init but should be updated while
real submission. So setting that up during signal_ioctl to get
the correct submission process details.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/userq: cancel reset work while tear down in progress

While tear down of a userq_mgr is happening when all the queues
are free we should cancel any reset work if pending before exiting.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Remove UML build exclusion from Kconfig

The depends on !UML was added in commit dffe68131707 ("amdgpu: Avoid
building on UML") to work around build failures with allyesconfig on
UML. The original errors were:

- smu7_hwmgr.c: incompatible pointer type 'struct cpuinfo_um *' vs
'struct cpuinfo_x86 *' in intel_core_rkl_chk()
- kfd_topology.c: 'struct cpuinfo_um' has no member named 'apicid'

Both issues have since been resolved independently:
- intel_core_rkl_chk() has been removed entirely.
- kfd_topology.c now uses a proper #ifdef CONFIG_X86_64 guard.
- All other cpuinfo_x86/cpu_data() references in the driver are
guarded by #if IS_ENABLED(CONFIG_X86) or #ifdef CONFIG_X86_64.

Removing this exclusion allows CONFIG_DRM_AMDGPU to be selected on UML,
which in turn enables running KUnit tests (such as amdgpu_dm_crc_test)
under UML without needing a full hardware-capable kernel build.

Reviewed-by: Alex Hung <alex.hung@amd.com>
Assisted-by: Claude:claude-opus-4.6
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: rework userq reset work handling

It is illegal to schedule reset work from another reset work!

Fix this by scheduling the userq reset work directly on the work queue
of the reset domain.

Not fully tested, I leave that to the IGT test cases.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/userq: pin mqd and fw object bo to avoid eviction

mqd and fw objects are queue core objects which should remain
valid and never be unmapped and evicted for user queues to work
properly.

During eviction if these buffers are evicted the hw continue to
use the invalid addresses and caused page faults and system hung.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/userq: use drm_exec in amdgpu_userq_fence_read_wptr

To access the bo from vm mapping first lock the root bo and
then the object bo of the mapping to make sure both locks
are taken safely.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/xe/guc: Use xe_device_is_l2_flush_optimized()

We encapsulate the logic to check if the platform has L2 flush
optimization feature in xe_device_is_l2_flush_optimized(), but
guc_ctl_feature_flags() is using an open-coded version of that same type
of check. Fix that by replacing the open-coded check with
xe_device_is_l2_flush_optimized().

Reviewed-by: Himanshu Girotra <himanshu.girotra@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://patch.msgid.link/20260513-guc-l2-flush-opt-use-xe_device_is_l2_flush_optimized-v1-1-36fa866d6ed8@intel.com
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>

HID: core: Fix size_t specifier in hid_report_raw_event()

When building for 32-bit platforms, for which 'size_t' is
'unsigned int', there are warnings around using the incorrect format
specifier to print bsize in hid_report_raw_event():

  drivers/hid/hid-core.c:2054:29: error: format specifies type 'long' but the argument has type 'size_t' (aka 'unsigned int') [-Werror,-Wformat]
   2053 |                 hid_warn_ratelimited(hid, "Event data for report %d is incorrect (%d vs %ld)\n",
        |                                                                                         ~~~
        |                                                                                         %zu
   2054 |                                      report->id, csize, bsize);
        |                                                         ^~~~~
  drivers/hid/hid-core.c:2076:29: error: format specifies type 'long' but the argument has type 'size_t' (aka 'unsigned int') [-Werror,-Wformat]
   2075 |                 hid_warn_ratelimited(hid, "Event data for report %d was too short (%d vs %ld)\n",
        |                                                                                          ~~~
        |                                                                                          %zu
   2076 |                                      report->id, rsize, bsize);
        |                                                         ^~~~~

Use the proper 'size_t' format specifier, '%zu', to clear up the
warnings.

Cc: stable@vger.kernel.org
Fixes: 2c85c61d1332 ("HID: pass the buffer size to hid_report_raw_event")
Reported-by: Miguel Ojeda <ojeda@kernel.org>
Closes: https://lore.kernel.org/20260516020430.110135-1-ojeda@kernel.org/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'perf-upstream'

sched/cache: Fix stale preferred_llc for a new task

On fork without CLONE_VM, the child gets a new mm,
the parent's preferred_llc value is stale for the
child.

Fix this by resetting the task's preferred_llc to -1.

This bug was reported by sashiko.

Fixes: 47d8696b95f7 ("sched/cache: Assign preferred LLC ID to processes")
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Co-developed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/0ec7309d0e24ede97656754d1505b7490403d966.1778703694.git.tim.c.chen@linux.intel.com

sched/cache: Fix has_multi_llcs iff at least one partition has multiple LLCs

sched_cache_present is a global static key, but build_sched_domains()
is called per partition from the "Build new domains" loop in
partition_sched_domains_locked(). Each call unconditionally sets the
key based solely on the has_multi_llcs local variable for that partition.

The call to the last partition set the value even when there
are previous partitions with multiple LLCs.

If partition A (multi-LLC) is built first, the key is enabled. Then
when partition B (single-LLC) is built, the key is disabled. The
multi-LLC partition A is still active but the key is now off.

Fix it by doing a similar thing as sched_energy_present: check the
multi-LLCs during the iteration over all the partitions rather than
checking it on a single partition.

This bug was reported by sashiko.

Fixes: d59f4fd1d303 ("sched/cache: Enable cache aware scheduling for multi LLCs NUMA node")
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Co-developed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/c541af2547d54509fbfd3b3a1e8072e2e5c7ff68.1778703694.git.tim.c.chen@linux.intel.com

sched/cache: Fix cache aware scheduling enabling for multi LLCs system

If there are multiple LLCs in the system, cache aware scheduling
should be enabled. However, there is a corner case where, if there
is a single NUMA node and a single LLC per node, cache aware
scheduling will be turned on in the current implementation -
because at this moment, the parent domain has not yet been
degenerated, and it is possible that the current domain has the
same cpu span as its parent. There is no need to turn cache aware
scheduling on in this scenario.

Fix it by iterating the parent domains to find a domain that is
a superset of the current sd_llc, so that later, after the duplicated
parent domains have been degenerated, cache aware scheduling will
take effect.

For example, the expected behavior would be:
2 sockets, 1 LLC per socket: MC span=0-3, PKG span=0-7, has_multi_llcs=true
1 socket, 2 LLCs per socket: MC span=0-3, PKG span=0-7, has_multi_llcs=true
2 sockets, 2 LLCs per socket: MC span=0-3, PKG span=0-7, has_multi_llcs=true
1 socket, 1 LLC per socket: MC span=0-3, PKG span=0-3, has_multi_llcs=false

This bug was reported by sashiko.

Fixes: d59f4fd1d303 ("sched/cache: Enable cache aware scheduling for multi LLCs NUMA node")
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Co-developed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/6328a8a7f40925cec2a712d81ee58128a4c4444a.1778703694.git.tim.c.chen@linux.intel.com

sched/cache: Fix race condition during sched domain rebuild

sched_cache_active_set_unlocked() checks hardware support without
locks:
static void sched_cache_active_set(bool locked)
{
        /* hardware does not support */
        if (!static_branch_likely(&sched_cache_present)) {
                _sched_cache_active_set(false, locked);
                return;
        }
    ...
If build_sched_domains() runs concurrently during CPU hotplug,
it can disable sched_cache_present under sched_domains_mutex
and the CPU hotplug lock. If a debugfs write thread evaluates
sched_cache_present as true right before that, and then blocks
or gets preempted, it might proceed to enable sched_cache_active
after the hardware support has been marked as absent. Make it
safer by acquiring cpus_read_lock() and sched_domains_mutex_lock()
when the user changes sched_cache_active via debugfs.

This bug was reported by sashiko.

Fixes: 067a31358143 ("sched/cache: Allow the user space to turn on and off cache aware scheduling")
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Co-developed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/9afddf439687f04bb56b46625bd9f153eb8abad5.1778703694.git.tim.c.chen@linux.intel.com

sched/cache: Fix checking active load balance by only considering the CFS task

The currently running task cur may not be a CFS task, such as
an RT or Deadline task. For non-CFS tasks, the task_util(cur)
utilization average is not maintained, so this might pass a
stale or meaningless value to can_migrate_llc().

Check if the task is CFS before getting its task_util().

This bug was reported by sashiko.

Fixes: 714059f79ff0 ("sched/cache: Handle moving single tasks to/from their preferred LLC")
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Co-developed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/f9161133cf040d286dca11344a112c5ef2a5253d.1778703694.git.tim.c.chen@linux.intel.com

sched/cache: Fix unpaired account_llc_enqueue/dequeue

There is a race condition that, after a task is enqueued
on a runqueue, task_llc(p) may change due to CPU hotplug,
because the llc_id is dynamically allocated and adjusted
at runtime.
Therefore, checking task_llc(p) to determine whether the
task is being dequeued from its preferred LLC is unreliable
and can cause inconsistent values.

To fix this problem, record whether p is enqueued on its
preferred LLC, in order to pair with account_llc_dequeue()
to maintain a consistent nr_pref_llc_running per runqueue.

This bug was reported by sashiko, and the solution was once
suggested by Prateek.

Fixes: 46afe3af7ead ("sched/cache: Track LLC-preferred tasks per runqueue")
Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Co-developed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/0c8c6a1571d66792a4d2ff0103ba3cc13e059046.1778703694.git.tim.c.chen@linux.intel.com

sched/cache: Annotate lockless accesses to mm->sc_stat.cpu

mm->sc_stat.cpu is written by task_cache_work() and could be read
locklessly by several functions on other CPUs. Use READ_ONCE and
WRITE_ONCE on mm->sc_stat.cpu access and write to prevent inconsistent
values from compiler optimizations when there are multiple accesses.

For example in get_pref_llc(), if the writer updated the field between
two compiler-generated loads, the validation (e.g., cpu != -1) and
subsequent use (e.g., llc_id(cpu)) could operate on different values,
allowing a negative CPU ID to be used as an index.

Leave plain write in mm_init_sched(), where the mm is not
yet visible to other CPUs.

This bug was reported by sashiko.

Fixes: 47d8696b95f7 ("sched/cache: Assign preferred LLC ID to processes")
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Co-developed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/63ea494f12efcf265d7134400a06cd75d7f2c310.1778703694.git.tim.c.chen@linux.intel.com