wifi: ath11k: HAL SRNG: don't deinitialize and re-initialize again
Don't deinitialize and reinitialize the HAL helpers. The DMA memory is
deallocated, and under high memory pressure there is a high possibility
that we will not be able to get the same memory allocated again.
wifi: ath12k: enforce CPU endian format for all QMI data
Currently, the QMI interface only works on little endian systems due to how
it encodes and decodes data. Most QMI related data structures do not use
endian specific types and are already defined in CPU native order. The
ath12k specific QMI structs are an exception: they use partially endian
specific types, which prevents the QMI interface from being extended to
support big endian systems.
Update the two affected ath12k QMI structs to use CPU order types instead.
This is required because the QMI interface is being extended to support big
endian systems, and that support depends on QMI data structures being
defined in CPU native order.
This change:
* preserves compatibility with existing kernels, which only support little
endian systems
* enables future support for big endian systems
* aligns ath12k QMI handling with the general QMI design
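The reason CPU-order structs matter can be sketched in userspace C. A generic encoder that takes native integers and emits little endian wire bytes gives the same result on any host, whereas a field that is already stored as little endian would be swapped a second time on a big endian host. The helper names below are hypothetical and are not the kernel QMI API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: serialize a CPU-order value into little endian
 * wire bytes. The shifts operate on the value, not on memory, so the
 * output is identical on little and big endian hosts.
 */
void wire_put_le32(uint8_t *out, uint32_t cpu_val)
{
	assert(out);
	out[0] = (uint8_t)(cpu_val & 0xff);
	out[1] = (uint8_t)((cpu_val >> 8) & 0xff);
	out[2] = (uint8_t)((cpu_val >> 16) & 0xff);
	out[3] = (uint8_t)((cpu_val >> 24) & 0xff);
}

/* Hypothetical helper: the inverse, producing a CPU-order value. */
uint32_t wire_get_le32(const uint8_t *in)
{
	assert(in);
	return (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
	       ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
}
```

With helpers of this shape, the struct fields themselves stay in CPU order and only the encode/decode step deals with wire byte order.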
wifi: ath12k: Use 1KB Cache Flush Command for QoS TID Descriptors
Currently, if the descriptor size exceeds 128 bytes, the total
descriptor is split into multiple 128-byte segments, each
requiring a separate flush cache queue command. This results in
multiple commands being issued to flush a single TID, which
negatively impacts performance. To avoid this, use the
_FLUSH_QUEUE_1K_DESC REO command to flush a 1KB descriptor in a single
operation.
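The saving can be illustrated with a small arithmetic sketch. It only models the command count described above; the function and macro names are hypothetical and do not come from the driver.

```c
#include <assert.h>
#include <stddef.h>

#define FLUSH_SEG_BYTES 128u	/* segment size stated in the commit text */
#define FLUSH_1K_BYTES  1024u	/* size covered by the 1KB flush command */

/* Commands needed when each flush covers one 128-byte segment
 * (rounded up to whole segments).
 */
unsigned int flush_cmds_segmented(size_t desc_size)
{
	return (unsigned int)((desc_size + FLUSH_SEG_BYTES - 1) / FLUSH_SEG_BYTES);
}

/* Commands needed when one flush covers up to 1KB. This sketch only
 * models descriptors that fit in 1KB.
 */
unsigned int flush_cmds_1k(size_t desc_size)
{
	assert(desc_size <= FLUSH_1K_BYTES);
	return desc_size ? 1 : 0;
}
```

Under this model a full 1KB descriptor needs eight segmented commands but only one 1KB command.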
wifi: ath12k: Fix flush cache failure during RX queue update
Flush cache failures were observed after RX queue update for TID
delete. This occurred because the queue was invalid during flush.
Set the VLD bit in the RX queue update command for TID delete.
This ensures the queue remains valid during the flush cache process.
During stress test scenarios, when the REO command ring becomes full,
the RX queue update command issued during peer deletion fails due to
insufficient space. In response, the host performs a dma_unmap and
frees the associated memory. However, the hardware still retains a
reference to the same memory address. If the kernel later reallocates
this address, unaware that the hardware is still using it, it can
lead to memory corruption, since the host might access or modify
memory that is still actively referenced by the hardware.
Implement a retry mechanism for the HAL_REO_CMD_UPDATE_RX_QUEUE
command during TID deletion to prevent memory corruption. Introduce
a new list, reo_cmd_update_rx_queue_list, in the struct ath12k_dp to
track pending RX queue updates. Protect this list with
reo_rxq_flush_lock, which also ensures synchronized access to
reo_cmd_cache_flush_list. Defer memory release until hardware
confirms the virtual address is no longer in use, avoiding immediate
deallocation on command failure. Release memory for pending RX queue
updates via ath12k_dp_rx_reo_cmd_list_cleanup() on system reset
if hardware confirmation is not received.
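The deferral idea can be sketched as a small userspace model. This is not the driver code: the struct and function names are hypothetical, locking (reo_rxq_flush_lock in the driver) is omitted, and a successfully sent command is treated as the hardware confirmation, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct pending_rxq {
	void *vaddr;			/* memory the hardware may still reference */
	struct pending_rxq *next;
};

struct dp_model {
	struct pending_rxq *pending;	/* models reo_cmd_update_rx_queue_list */
	unsigned int ring_free;		/* free slots in the modelled command ring */
};

/* Queue the buffer instead of freeing it, so the update can be retried. */
bool tid_delete_defer(struct dp_model *dp, void *vaddr)
{
	struct pending_rxq *e = malloc(sizeof(*e));

	if (!e)
		return false;
	e->vaddr = vaddr;
	e->next = dp->pending;
	dp->pending = e;
	return true;
}

/* Retry pending updates while the ring has space. Memory is released only
 * after the (modelled) command went out. Returns the number released.
 */
unsigned int tid_delete_retry(struct dp_model *dp)
{
	unsigned int released = 0;

	while (dp->pending && dp->ring_free > 0) {
		struct pending_rxq *e = dp->pending;

		dp->pending = e->next;
		dp->ring_free--;
		free(e->vaddr);
		free(e);
		released++;
	}
	return released;
}
```

The key property is that a full ring leaves the entry on the list rather than freeing memory the hardware may still reference.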
wifi: ath12k: Refactor REO command to use ath12k_dp_rx_tid_rxq
Introduce ath12k_dp_rx_tid_rxq as a lightweight structure to represent
only the necessary fields for REO command construction. Replace direct
usage of ath12k_dp_rx_tid in REO command paths with this new structure.
This decouples REO command logic from internal TID state representation,
improves modularity, and reduces unnecessary data dependencies.
wifi: ath12k: Refactor RX TID deletion handling into helper function
Refactor RX TID deletion handling by moving the REO command
setup and send sequence into a new helper function:
ath12k_dp_rx_tid_delete_handler().
This improves code readability and modularity, and prepares
the codebase for potential reuse of the REO command logic in
other contexts where RX TID deletion is required.
These were used by S1G for older chandef representation, but
are no longer needed. Clean them up, even if we can't drop
them from the userspace API entirely.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
wifi: rtw89: avoid circular locking dependency in ser_state_run()
Lockdep gives a splat [1] when ser_hdl_work item is executed. It is
scheduled at mac80211 workqueue via ieee80211_queue_work() and takes a
wiphy lock inside. However, this workqueue can be flushed when e.g.
closing the interface and wiphy lock is already taken in that case.
Choosing wiphy_work_queue() for SER is likely not suitable, so move back
to the global workqueue.
[1]:
WARNING: possible circular locking dependency detected
6.17.0-rc2 #17 Not tainted
------------------------------------------------------
kworker/u32:1/61 is trying to acquire lock: ffff88811bc00768 (&rdev->wiphy.mtx){+.+.}-{4:4}, at: ser_state_run+0x5e/0x180 [rtw89_core]
but task is already holding lock: ffffc9000048fd30 ((work_completion)(&ser->ser_hdl_work)){+.+.}-{0:0}, at: process_one_work+0x7b5/0x1450
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
wifi: rtw89: avoid possible TX wait initialization race
The value of skb_data->wait indicates whether skb is passed on to the
core mac80211 stack or released by the driver itself. Make sure that by
the time skb is added to txwd queue and becomes visible to the completing
side, it has already allocated and initialized TX wait related data (in
case it's needed).
This is found by code review and addresses a possible race scenario
described below:
     Waiting thread                            Completing thread

rtw89_core_send_nullfunc()
  rtw89_core_tx_write_link()
    ...
      rtw89_pci_txwd_submit()
        skb_data->wait = NULL
        /* add skb to the queue */
        skb_queue_tail(&txwd->queue, skb)

  /* another thread (e.g. rtw89_ops_tx) performs TX kick off for the same queue */

                                            rtw89_pci_napi_poll()
                                            ...
                                              rtw89_pci_release_txwd_skb()
                                                /* get skb from the queue */
                                                skb_unlink(skb, &txwd->queue)
                                                rtw89_pci_tx_status()
                                                  rtw89_core_tx_wait_complete()
                                                  /* use incorrect skb_data->wait */

  rtw89_core_tx_kick_off_and_wait()
  /* assign skb_data->wait but too late */

Found by Linux Verification Center (linuxtesting.org).
Fixes: 1ae5ca615285 ("wifi: rtw89: add function to wait for completion of TX skbs")
Cc: stable@vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250919210852.823912-3-pchelkin@ispras.ru
The completing side might proceed and free the underlying skb even before
the waiting side is fully awoken and run to execution. Actually the race
happens regardless of wait_for_completion_timeout() exit status, e.g.
the waiting side may hit a timeout and the concurrent completing side is
still able to free the skb.
Skbs which are sent by rtw89_core_tx_kick_off_and_wait() are owned by the
driver. They don't come from the core ieee80211 stack, so there is no need to
pass them to ieee80211_tx_status_ni() on the completing side.
Introduce a work function which will act as a garbage collector for
rtw89_tx_wait_info objects and the associated skbs. Thus no potentially
heavy locks are required on the completing side.
Found by Linux Verification Center (linuxtesting.org).
Fixes: 1ae5ca615285 ("wifi: rtw89: add function to wait for completion of TX skbs")
Cc: stable@vger.kernel.org
Suggested-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250919210852.823912-2-pchelkin@ispras.ru
wifi: ath12k: Fix peer lookup in ath12k_dp_mon_rx_deliver_msdu()
In ath12k_dp_mon_rx_deliver_msdu(), peer lookup fails because
rxcb->peer_id is not updated with a valid value. This is expected
in monitor mode, where RX frames bypass the regular RX
descriptor path that typically sets rxcb->peer_id.
As a result, the peer is NULL, and link_id and link_valid fields
in the RX status are not populated. This leads to a WARN_ON in
mac80211 when it receives data frame from an associated station
with invalid link_id.
Fix this potential issue by using ppduinfo->peer_id, which holds
the correct peer id for the received frame. This ensures that the
peer is correctly found and the associated link metadata is updated
accordingly.
Fixes: bd00cc7e8a4c ("wifi: ath12k: replace the usage of rx desc with rx_info")
Signed-off-by: Hari Chandrakanthan <quic_haric@quicinc.com>
Signed-off-by: Aishwarya R <aishwarya.r@oss.qualcomm.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com>
Link: https://patch.msgid.link/20250724040552.1170642-1-aishwarya.r@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
wifi: mac80211: fix Rx packet handling when pubsta information is not available
In ieee80211_rx_handle_packet(), if the caller does not provide pubsta
information, an attempt is made to find the station using the address 2
(source address) field in the header. Since pubsta is missing, link
information such as link_valid and link_id is also unavailable. In that
case, if a matching ML station entry is found based on the source address,
the packet is currently dropped because the link ID is missing from the
status field, which is not correct.
Hence, to fix this issue, if link_valid is not set and the station is an
ML station, make an attempt to find a link station entry using the source
address. If a valid link station is found, derive the link ID and proceed
with packet processing. Otherwise, drop the packet as per the existing
flow.
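The lookup described above can be sketched as follows. This is a simplified userspace model, not the mac80211 code: the struct layout, the constants and the function name are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAX_LINKS 15	/* illustrative upper bound on links */
#define MODEL_ADDR_LEN  6	/* length of a MAC address */

struct link_sta_model {
	int valid;
	uint8_t addr[MODEL_ADDR_LEN];	/* per-link station address */
};

struct ml_sta_model {
	struct link_sta_model link[MODEL_MAX_LINKS];
};

/* Return the link ID whose link address equals the frame's source
 * address, or -1 when no link matches (the caller then drops the frame).
 */
int find_link_id_by_addr(const struct ml_sta_model *sta, const uint8_t *src)
{
	int id;

	assert(sta && src);
	for (id = 0; id < MODEL_MAX_LINKS; id++) {
		if (sta->link[id].valid &&
		    !memcmp(sta->link[id].addr, src, MODEL_ADDR_LEN))
			return id;
	}
	return -1;
}
```

A non-negative result plays the role of the derived link ID; -1 corresponds to the existing drop path.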
wifi: mac80211: correctly initialise S1G chandef for STA
When moving to the AP's channel, ensure we correctly initialise the chandef
and perform the required validation. Additionally, if the AP is beaconing on a
2MHz primary, calculate the 2MHz primary center frequency by extracting
the sibling 1MHz primary and averaging the two 1MHz center frequencies.
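The averaging step amounts to taking the midpoint of the two 1MHz subchannel centers. A minimal sketch, with a hypothetical function name and frequencies expressed in kHz to avoid fractions:

```c
#include <assert.h>
#include <stdint.h>

/* Center of the 2MHz primary is the midpoint of the centers of its two
 * 1MHz subchannels (the primary and its sibling), all in kHz.
 */
uint32_t s1g_primary_2mhz_center_khz(uint32_t primary_1mhz_khz,
				     uint32_t sibling_1mhz_khz)
{
	return (uint32_t)(((uint64_t)primary_1mhz_khz + sibling_1mhz_khz) / 2);
}
```

For illustrative inputs of 902500 kHz and 903500 kHz the midpoint is 903000 kHz, independent of which of the two is the primary.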
wifi: cfg80211: correctly implement and validate S1G chandef
Currently, the S1G channelisation implementation differs from that of
VHT, which is the PHY that S1G is based on. The major difference between
the two is that the S1G clock rate is 1/10th of VHT. However, how their
channelisation is represented within cfg80211 and mac80211 differs vastly.
To rectify this, remove the use of the IEEE80211_CHAN_1/2/4.. flags that were
previously used to indicate the control channel width; in the case of S1G it
should be implied that the control channels are 1MHz. Additionally,
introduce the inverse, IEEE80211_CHAN_NO_4/8/16MHz, which implies that
the control channel may not be used for a certain bandwidth. With these
new flags, we can perform regulatory and chandef validation just as we would
for VHT.
To deal with the notion that S1G PHYs may contain a 2MHz primary channel,
introduce a new variable, s1g_primary_2mhz, which indicates whether we are
operating on a 2MHz primary channel. In this case, the chandef::chan points to
the 1MHz primary channel pointed to by the primary channel location. Alongside
this, introduce some new helper routines that can extract the sibling 1MHz
channel, the sibling being the alternate 1MHz primary subchannel within the
2MHz primary channel that is not pointed to by chandef::chan.
Furthermore, due to unique restrictions imposed on S1G PHYs, introduce
a new flag, IEEE80211_CHAN_S1G_NO_PRIMARY, which states that the 1MHz channel
cannot be used as a primary channel. This is assumed to be set by vendors
as it is hardware and regdom specific. When we validate a 2MHz primary channel,
we need to ensure that neither 1MHz subchannel contains this flag. If one or
both of the 1MHz subchannels contain this flag, then the 2MHz channel is not
permitted for use as a primary channel.
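The 2MHz primary rule reduces to a check on the flags of both 1MHz subchannels. A minimal sketch, where the bit value and function name are stand-ins chosen for illustration rather than the real cfg80211 definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in bit for IEEE80211_CHAN_S1G_NO_PRIMARY; the real value differs. */
#define MODEL_CHAN_S1G_NO_PRIMARY (1u << 0)

/* A 2MHz primary is allowed only if neither 1MHz subchannel carries the
 * NO_PRIMARY flag.
 */
bool s1g_2mhz_primary_allowed(uint32_t primary_flags, uint32_t sibling_flags)
{
	return !((primary_flags | sibling_flags) & MODEL_CHAN_S1G_NO_PRIMARY);
}
```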
Properly integrate S1G channel validation such that it is implemented
consistently with other PHY types such as VHT. Additionally, implement a new
S1G-specific regulatory flag to allow cfg80211 to understand specific
vendor requirements for S1G PHYs.
Signed-off-by: Arien Judge <arien.judge@morsemicro.com>
Signed-off-by: Andrew Pope <andrew.pope@morsemicro.com>
Signed-off-by: Lachlan Hodges <lachlan.hodges@morsemicro.com>
Link: https://patch.msgid.link/20250918051913.500781-2-lachlan.hodges@morsemicro.com
[remove redundant NL80211_ATTR_S1G_PRIMARY_2MHZ check]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Commit 906a5a8c7152 ("wifi: mac80211: add tx_handlers_drop statistics
to ethtool") added a tx_handlers_drop counter to ethtool stats.
During review [1], Johannes noted that the existing debugfs counter
is now redundant. Remove the debugfs stat to avoid duplication and
streamline statistics reporting.
wifi: mac80211: Remove redundant rcu_read_lock/unlock() in spin_lock
Since commit a8bb74acd8efe ("rcu: Consolidate RCU-sched update-side function definitions")
there is no difference between rcu_read_lock(), rcu_read_lock_bh() and
rcu_read_lock_sched() in terms of RCU read section and the relevant grace
period. That means that spin_lock(), which implies rcu_read_lock_sched(),
also implies rcu_read_lock().
There is no need to explicitly start an RCU read section if one has already
been started implicitly by spin_lock().
Simplify the code and remove the inner rcu_read_lock() invocation.
Ilan Peer [Mon, 8 Sep 2025 11:13:08 +0000 (14:13 +0300)]
wifi: mac80211_hwsim: Add simulation support for NAN device
Add support for simulating a NAN Device interface:
- Update interface limits to include support for NAN Device.
- Increase the number of supported HW addresses to allow unique
addresses for combinations such as: station interface + P2P
Device interface + NAN Device interface.
- Declare support for NAN capabilities, specifically support for
NAN synchronization offload and NAN DE user space support.
- Add the relevant callbacks to support start/stop NAN Device
operation.
- Use a timer to simulate starting a Discovery Window (currently
the timer doesn't do much).
- Update the Tx path to simulate that the channel used for NAN
Device is either channel 6 or channel 149.
- Send DW notification when DW starts.
- Send cluster join notification when new cluster starts, or when an
existing cluster is joined. "Joining" is implemented by reusing the
cluster id of any other existing NAN management interface.
Ilan Peer [Mon, 8 Sep 2025 11:13:05 +0000 (14:13 +0300)]
wifi: mac80211: Get the correct interface for non-netdev skb status
The function ieee80211_sdata_from_skb() always returned the P2P Device
interface in case the skb was not associated with a netdev and didn't
consider the possibility that a NAN Device interface is also enabled.
To support configurations where both P2P Device and a NAN Device
interface are active, extend the function to match the correct
interface based on address 2 in the 802.11 MAC header.
Since the 'p2p_sdata' field in struct ieee80211_local is no longer
needed, remove it.
Ilan Peer [Mon, 8 Sep 2025 11:13:03 +0000 (14:13 +0300)]
wifi: mac80211: Accept management frames on NAN interface
Accept Public Action frames and Authentication frames on
NAN Device interface to support flows that require these frames:
- SDFs: For user space Discovery Engine (DE) implementation.
- NAFs: For user space NAN Data Path (NDP) establishment.
- Authentication frames: For NAN Pairing and Verification.
Accept only frames from devices that are part of the NAN
cluster.
Ilan Peer [Mon, 8 Sep 2025 11:13:02 +0000 (14:13 +0300)]
wifi: mac80211: Support Tx of action frame for NAN
Add support for sending management frames over a NAN Device
interface:
- Declare support for the supported management frame types.
- Since action frame transmissions over a NAN Device interface
do not necessarily require a channel configuration, e.g., they
can be transmitted during DW, modify the Tx path to avoid
accessing channel information for NAN Device interface.
- In addition, modify the relevant points in the Tx path logic to
account for cases where a band is not specified in the Tx information.
Ilan Peer [Mon, 8 Sep 2025 11:13:01 +0000 (14:13 +0300)]
wifi: cfg80211: Store the NAN cluster ID
When the driver indicates that the device has joined
a cluster, store the cluster ID. This is needed for data
path operations, e.g., filtering received frames etc.
Ilan Peer [Mon, 8 Sep 2025 11:13:00 +0000 (14:13 +0300)]
wifi: cfg80211: Support Tx/Rx of action frame for NAN
Add support for sending and receiving action frames over a NAN Device
interface:
- For Synchronized NAN operation NAN Service Discovery
Frames (SDFs) and NAN Action Frames (NAFs) transmissions
over a NAN Device interface, a channel parameter is not
mandatory as the frame can be transmitted based on the NAN
Device schedule.
- For Unsynchronized NAN Discovery (USD) operation the
SDFs and NAFs could be transmitted using NL80211_CMD_FRAME
where a specific channel and dwell time are configured.
As Synchronized NAN Operation and USD can be done concurrently,
both modes need to be supported. Thus, allow sending NAN action
frames when user space handles the NAN Discovery Engine (DE) with
and without providing a channel as a parameter.
To support reception of NAN Action frames and Authentication
frames (used for NAN pairing and verification), allow registering
for management frame reception on a NAN Device interface
when user space handles the NAN DE.
Add a better breakdown of NAN capabilities, as NAN has multiple optional
features. This makes it possible to better indicate which features are
supported or offloaded to the device.
wifi: cfg80211: Add cluster joined notification APIs
The drivers should notify upper layers and user space when a NAN device
joins a cluster. This is needed, for example, to set the correct addr3
in SDF frames. Add API to report cluster join event.
wifi: nl80211: Add NAN Discovery Window (DW) notification
This notification will be used by the device to inform user space
about upcoming DW. When received, user space will be able to prepare
multicast Service Discovery Frames (SDFs) to be transmitted during the
next DW using %NL80211_CMD_FRAME command on the NAN management interface.
The device/driver will take care to transmit the frames with the correct
timing. This makes it possible to implement a synchronized Discovery Engine
(DE) in user space, if the device doesn't support DE offload.
Note that this notification can be sent before the actual DW starts as
long as the driver/device handles the actual timing of the SDF
transmission.
wifi: nl80211: Add more configuration options for NAN commands
Current NAN APIs have only basic configuration for master
preference and operating bands. Add and parse additional parameters
which provide more control over NAN synchronization. The newly added
attributes make it possible to publish additional NAN attributes and vendor
elements in NAN beacons, control scan and discovery beacon
periodicity, enable/disable DW notifications, etc.
wifi: ath12k: Extend beacon miss handling for MLO non-AP STA
Currently, ath12k_mac_handle_beacon_miss() does not handle the beacon
miss for the MLO case.
In MLO scenarios, the host fails to process the beacon miss because the
vdev_id comparison in ath12k_mac_handle_beacon_miss_iter() does not match.
This mismatch occurs since arvif always points to ahvif->deflink, which may
not correspond to the actual vdev_id associated with the event.
Fix this by retrieving arvif from vdev_id instead of ahvif->deflink, which
works for both the MLO and non-MLO cases.
Also refactor the ath12k_mac_handle_beacon_miss(), by passing arvif
directly instead of vdev_id and remove ath12k_mac_handle_beacon_miss_iter()
which is no longer needed.
ath12k_mac_handle_beacon_miss() is called from ath12k_roam_event() for WCN
chipsets and ath12k_peer_sta_kickout_event() for QCN chipsets.
So, refactor ath12k_roam_event() to pass arvif instead of vdev_id to the
ath12k_mac_handle_beacon_miss() function to align with
ath12k_peer_sta_kickout_event(), and change the rcu_read_lock() to
guard(rcu)() in the same function, ath12k_roam_event().
wifi: ath12k: Add support to handle reason inactivity STA kickout event for QCN9274/IPQ5332
Currently, when the non-AP STA is connected to the AP STA, and the AP STA goes
down or becomes inactive without indication, firmware detects the beacon
miss and sends the WMI event WMI_PEER_STA_KICKOUT_EVENTID with reason
INACTIVITY. The host driver handles this event as low ACK and reports it to
mac80211.
However, the expectation is that non-AP STA should be disconnected from
AP STA instantly once it receives the STA kickout event with reason of
inactivity.
Trigger a disconnect from AP STA through beacon miss handling upon
receiving non-AP STA peer kickout event with reason code inactivity.
Replace the helper function ath12k_mac_get_ar_by_vdev_id() with
ath12k_mac_get_arvif_by_vdev_id() for the following reasons:
1. Check the station VIF type for handling the beacon miss.
2. Retrieve the proper ar from the arvif by checking the vdev_id with
vdev_map and link_map lookup which is needed for the MLO case in the
following patch.
wifi: ath12k: enhance the WMI_PEER_STA_KICKOUT event with reasons and RSSI reporting
Enhance the WMI_PEER_STA_KICKOUT event by adding support for reporting the
kickout reason and RSSI value. The reason code will be used in the
following patches when the beacon miss handling is added.
Lingbo Kong [Tue, 12 Aug 2025 03:00:36 +0000 (11:00 +0800)]
wifi: ath12k: report station mode per-chain signal strength
Currently, command “iw wlan0 station dump” does not show per-chain signal
strength.
This is because ath12k does not handle the num_per_chain_rssi and
rssi_avg_beacon reported by firmware to ath12k.
To address this, update ath12k to send WMI_REQUEST_STATS_CMDID with the
flag WMI_REQUEST_RSSI_PER_CHAIN_STAT to the firmware. Then, add logic to
handle num_per_chain_rssi and rssi_avg_beacon in the
ath12k_wmi_tlv_fw_stats_parse(), and assign the resulting per-chain signal
strength to the chain_signal of struct station_info.
After that, "iw dev xxx station dump" shows the correct per-chain signal
strength.
For example:

Station AA:BB:CC:DD:EE:FF (on wlan0)
        inactive time:  212 ms
        rx bytes:       10398
        rx packets:     64
        tx bytes:       4362
        tx packets:     33
        tx retries:     49
        tx failed:      0
        beacon loss:    0
        beacon rx:      14
        rx drop misc:   16
        signal:         -45 [-51, -46] dBm
        beacon signal avg:      -44 dBm

It appears that not all hardware/firmware implementations support
group key deletion correctly, which can lead to connection hangs
and deauthentication following GTK rekeying (delete and install).
To avoid this issue, instead of attempting to delete the key using
the special WMI_CIPHER_NONE value, we now replace the key with an
invalid (random) value.
This behavior has been observed with WCN39xx chipsets.
Baochen Qiang [Mon, 11 Aug 2025 09:26:45 +0000 (17:26 +0800)]
wifi: ath10k: avoid unnecessary wait for service ready message
Commit e57b7d62a1b2 ("wifi: ath10k: poll service ready message before
failing") works around the failure in waiting for the service ready
message by active polling. Note that the polling is triggered only after the
initial wait times out, which means that the wait-till-timeout cannot be
avoided even if the message is ready.
A possible fix is to poll once before the wait as well; however, this
cannot handle the race where the message arrives right after polling.
So the solution is to poll periodically until timeout.
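The periodic polling pattern can be sketched in plain C. This is not the ath10k implementation: the names are hypothetical, and time is simulated with a counter so the sketch stays deterministic, whereas a driver would sleep or wait between polls.

```c
#include <assert.h>
#include <stdbool.h>

typedef bool (*poll_fn)(void *ctx);

/* Poll every interval_ms until the callback reports ready or timeout_ms
 * has elapsed. Returns the simulated elapsed time in ms, or -1 on timeout.
 */
int wait_with_periodic_poll(poll_fn poll, void *ctx,
			    int interval_ms, int timeout_ms)
{
	int elapsed = 0;

	assert(poll && interval_ms > 0);
	for (;;) {
		if (poll(ctx))
			return elapsed;
		if (elapsed >= timeout_ms)
			return -1;
		elapsed += interval_ms;	/* a driver would wait here */
	}
}

/* Example callback: reports ready once the counter in ctx reaches zero. */
bool ready_after_countdown(void *ctx)
{
	int *remaining = ctx;

	if (*remaining == 0)
		return true;
	(*remaining)--;
	return false;
}
```

Compared with a single poll after the timeout, the worst-case extra latency in this model is bounded by one polling interval rather than by the full timeout.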
wifi: ath11k: fix NULL dereference in ath11k_qmi_m3_load()
If ab->fw.m3_data points to data, then fw pointer remains null.
Further, if m3_mem is not allocated, then fw is dereferenced to be
passed to ath11k_err function.
Replace fw->size by m3_len.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
wifi: ath: Use of_reserved_mem_region_to_resource() for "memory-region"
Use the newly added of_reserved_mem_region_to_resource() function to
handle "memory-region" properties.
The error handling is a bit different for ath10k. "memory-region" is
optional, so failed lookup is not an error. But then an error in
of_address_to_resource() is treated as an error. However, that
distinction is not really important. Either the region is available
and usable or it is not. So now, it is just
of_reserved_mem_region_to_resource() which is checked for an error.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com>
Link: https://patch.msgid.link/20250813214933.897486-1-robh@kernel.org
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Arnd Bergmann [Fri, 8 Aug 2025 15:18:00 +0000 (17:18 +0200)]
wifi: ath10k: remove gpio number assignment
The leds-gpio traditionally takes a global gpio number in its platform
data, but the number assigned here is not actually such a number but
only meant to be used internally to this driver.
As part of the kernel-wide cleanup of the old gpiolib interfaces, the
'gpio' number field is going away, so to keep ath10k building, move
the assignment into a private structure instead.
Baochen Qiang [Fri, 15 Aug 2025 01:44:58 +0000 (09:44 +0800)]
wifi: ath12k: downgrade log level for CE buffer enqueue failure
There are two rings involved in the Copy Engine (CE) receive path
handling, the CE status (STS) ring and the CE destination (DST) ring.
Each time CE hardware needs to send an event (e.g. WMI event) to host,
CE hardware finds a buffer (to which the tail pointer (TP) points) in
DST ring and fills it with payload, then hardware fills meta data in
STS ring and fires interrupt to host. Please note the TP of DST ring is
expected to be advanced by CE hardware before interrupting host. While
handling the interrupt, host finds that DST ring buffers are used hence
increases rx_buf_needed to record the number of buffers to be replenished.
Note before that, host compares TP and head pointer (HP) of DST ring to
see if there is available space. Normally rx_buf_needed simply equals
available space. But sometimes CE hardware doesn't (for whatever reason)
update TP in time, making the comparison fail; the enqueue is then cancelled
and a warning is logged.
However, even if the enqueue fails this time, rx_buf_needed still records the
number of needed buffers. Later, when TP gets updated correctly, the
missing buffer will eventually be replenished. There is also no doubt about
the late update: it always comes (otherwise lots of such warnings would be seen).
Since this does not cause any functional issue, downgrade the logging level to
avoid misleading users.
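The behaviour described above can be modelled with a simple ring sketch. This is a simplified userspace model under stated assumptions, not the driver's ring code: the struct, the one-slot-unused convention and the function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

struct ce_ring_model {
	unsigned int nentries;		/* ring size */
	unsigned int hp;		/* head pointer, advanced by host on enqueue */
	unsigned int tp;		/* tail pointer, advanced by hardware */
	unsigned int rx_buf_needed;	/* buffers still to be replenished */
};

/* Free entries between HP and TP, keeping one slot unused so that a full
 * ring can be told apart from an empty one.
 */
unsigned int ce_ring_free(const struct ce_ring_model *r)
{
	return (r->tp + r->nentries - r->hp - 1) % r->nentries;
}

/* Try to replenish all needed buffers. On a full ring return -ENOSPC and
 * leave rx_buf_needed untouched so that a later call can catch up.
 */
int ce_replenish(struct ce_ring_model *r)
{
	assert(r && r->nentries > 1);
	while (r->rx_buf_needed) {
		if (ce_ring_free(r) == 0)
			return -ENOSPC;
		r->hp = (r->hp + 1) % r->nentries;
		r->rx_buf_needed--;
	}
	return 0;
}
```

In this model a stale TP makes the first attempt fail with -ENOSPC, and the same call succeeds once TP has advanced, which is why the failure is harmless.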
Baochen Qiang [Fri, 15 Aug 2025 01:44:56 +0000 (09:44 +0800)]
wifi: ath11k: downgrade log level for CE buffer enqueue failure
There are two rings involved in the Copy Engine (CE) receive path
handling, the CE status (STS) ring and the CE destination (DST) ring.
Each time CE hardware needs to send an event (e.g. WMI event) to host,
CE hardware finds a buffer (to which the tail pointer (TP) points) in
DST ring and fills it with payload, then hardware fills meta data in
STS ring and fires interrupt to host. Please note the TP of DST ring is
expected to be advanced by CE hardware before interrupting host. While
handling the interrupt, host finds that DST ring buffers are used hence
increases rx_buf_needed to record the number of buffers to be replenished.
Note before that, host compares TP and head pointer (HP) of DST ring to
see if there is available space. Normally rx_buf_needed simply equals
available space. But sometimes CE hardware doesn't (for whatever reason)
update TP in time, making the comparison fail; the enqueue is then cancelled
and a warning is logged:
ath11k_pci 0000:02:00.0: failed to enqueue rx buf: -28
However, even if the enqueue fails this time, rx_buf_needed still records the
number of needed buffers. Later, when TP gets updated correctly, the
missing buffer will eventually be replenished. There is also no doubt about
the late update: it always comes (otherwise lots of such warnings would be seen).
Since this does not cause any functional issue, downgrade the logging level to
avoid misleading users.
Sriram R [Wed, 23 Jul 2025 19:06:51 +0000 (00:36 +0530)]
wifi: ath12k: Add fallback for invalid channel number in PHY metadata
Currently, ath12k_dp_rx_h_ppdu() determines the band and frequency
based on the channel number and center frequency from the RX descriptor's
PHY metadata. However, in rare cases, the frequency retrieved from the
metadata has been observed to be invalid or unexpected, especially for
6 GHz frames.
This can result in a NULL sband, which prevents proper frequency assignment
in rx_status and potentially leads to incorrect RX packet classification.
To fix this potential issue, add a fallback mechanism that uses
ar->rx_channel to populate the band and frequency when the derived
sband is invalid or missing.
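The fallback can be sketched as below. The types, names and frequency ranges are illustrative assumptions made for this sketch; they are not taken from ath12k.

```c
#include <assert.h>
#include <stddef.h>

enum band_model { BAND_INVALID = -1, BAND_2GHZ, BAND_5GHZ, BAND_6GHZ };

struct rx_status_model {
	enum band_model band;
	unsigned int freq_mhz;
};

/* Illustrative frequency ranges only; a driver uses its own limits. */
enum band_model band_from_freq_mhz(unsigned int f)
{
	if (f >= 2412 && f <= 2484)
		return BAND_2GHZ;
	if (f >= 5180 && f <= 5885)
		return BAND_5GHZ;
	if (f >= 5955 && f <= 7115)
		return BAND_6GHZ;
	return BAND_INVALID;
}

/* Fill the status from the metadata frequency; when no band can be
 * derived, fall back to the currently configured RX channel
 * (models ar->rx_channel).
 */
void fill_rx_status(struct rx_status_model *st, unsigned int meta_freq_mhz,
		    const struct rx_status_model *rx_channel)
{
	assert(st);
	st->band = band_from_freq_mhz(meta_freq_mhz);
	st->freq_mhz = meta_freq_mhz;
	if (st->band == BAND_INVALID && rx_channel != NULL)
		*st = *rx_channel;
}
```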
Kang Yang [Tue, 22 Jul 2025 09:59:34 +0000 (17:59 +0800)]
wifi: ath12k: fix the fetching of combined rssi
Currently, host fetches combined rssi from rssi_comb in struct
hal_rx_phyrx_rssi_legacy_info.
rssi_comb occupies bits 8 to 15 of the second to last variable.
rssi_comb_ppdu occupies bits 0 to 7 of the last variable.
When bandwidth = 20MHz, rssi_comb = rssi_comb_ppdu. But when bandwidth >
20MHz, rssi_comb < rssi_comb_ppdu because rssi_comb only includes the power
of the primary 20 MHz while rssi_comb_ppdu includes the power of the active
RUs/subchannels. So the combined rssi should be fetched from rssi_comb_ppdu.
Also, the related macro definitions are too long, so rename them.
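The two bit positions named above can be illustrated with plain mask-and-shift helpers. The macro and function names are hypothetical, and the sketch assumes the descriptor words have already been converted to CPU order.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_RSSI_COMB_MASK		0x0000ff00u	/* bits 8..15, second to last word */
#define MODEL_RSSI_COMB_SHIFT		8
#define MODEL_RSSI_COMB_PPDU_MASK	0x000000ffu	/* bits 0..7, last word */

/* Power of the primary 20 MHz only. */
uint8_t get_rssi_comb(uint32_t second_to_last_word)
{
	return (uint8_t)((second_to_last_word & MODEL_RSSI_COMB_MASK) >>
			 MODEL_RSSI_COMB_SHIFT);
}

/* Power over the active RUs/subchannels; the value the fix switches to. */
uint8_t get_rssi_comb_ppdu(uint32_t last_word)
{
	return (uint8_t)(last_word & MODEL_RSSI_COMB_PPDU_MASK);
}
```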
Kang Yang [Tue, 22 Jul 2025 09:59:33 +0000 (17:59 +0800)]
wifi: ath12k: fix HAL_PHYRX_COMMON_USER_INFO handling in monitor mode
Currently, monitor mode parses the TLV HAL_PHYRX_OTHER_RECEIVE_INFO with
struct hal_phyrx_common_user_info.
These do not match. The original intention here was to parse
HAL_PHYRX_COMMON_USER_INFO. So fix it by correctly parsing
HAL_PHYRX_COMMON_USER_INFO instead.
Also add LTF parsing and report to radiotap along with GI.
Baochen Qiang [Mon, 4 Aug 2025 03:03:10 +0000 (11:03 +0800)]
wifi: ath12k: initialize eirp_power before use
Currently, at the end of ath12k_mac_fill_reg_tpc_info(), the
reg_tpc_info struct is populated, including the following:
reg_tpc_info->is_psd_power = is_psd_power;
reg_tpc_info->eirp_power = eirp_power;
Kernel test robot complains on uninitialized symbol:
drivers/net/wireless/ath/ath12k/mac.c:10069
ath12k_mac_fill_reg_tpc_info() error: uninitialized symbol 'eirp_power'
This is because there are some code paths that never set eirp_power, so
the assignment of reg_tpc_info->eirp_power can come from an
uninitialized variable. Functionally this is OK since the eirp_power
only has meaning when is_psd_power is true, and all code paths which set
is_psd_power to true also set eirp_power. However, to keep the robot
happy, always initialize eirp_power before use.
Fixes: aeda163bb0c7 ("wifi: ath12k: fill parameters for vdev set TPC power WMI command")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202505180927.tbNWr3vE-lkp@intel.com/
Signed-off-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com>
Link: https://patch.msgid.link/20250804-ath12k-fix-smatch-warning-on-6g-vlp-v1-1-56f1e54152ab@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
wifi: ath12k: Add support to set per-radio RTS threshold
Currently, the command to set the RTS threshold changes the threshold of
all radios in the multi-radio wiphy. But each radio in a multi-radio wiphy
can have different RTS threshold requirements.
To support this requirement, use radio_idx, the index passed by mac80211 of
the radio whose RTS threshold needs to be changed. Based on the value
passed, set the RTS threshold value for the corresponding radios. The
possible values of radio_idx and the corresponding behavior in
multi-radio wiphys are:
1. radio_idx is -1: consider RTS threshold as a global parameter, i.e.,
make changes to all the radios in a wiphy. If setting RTS threshold
fails for any radio, then the previous RTS threshold values of
respective radios will be restored.
2. radio_idx denotes a specific radio: make changes in RTS threshold to
that radio alone.
3. radio_idx is any other number: report it as an invalid number.
In case of single-radio wiphys, continue with the existing behavior, i.e.,
set the passed RTS threshold value to the radio present.
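The three radio_idx cases above, including the rollback on failure, can be sketched in user-space C. Everything here (demo_wiphy, demo_radio, the fail_set flag that simulates a failing radio, and the error codes) is a hypothetical model for illustration, not the ath12k implementation.

```c
#include <errno.h>

#define DEMO_MAX_RADIOS 4

/* Hypothetical per-radio state; fail_set simulates a firmware error. */
struct demo_radio {
	int rts_threshold;
	int fail_set;
};

struct demo_wiphy {
	struct demo_radio radios[DEMO_MAX_RADIOS];
	int n_radios;
};

int demo_radio_set_rts(struct demo_radio *radio, int value)
{
	if (radio->fail_set)
		return -EIO;
	radio->rts_threshold = value;
	return 0;
}

int demo_set_rts_threshold(struct demo_wiphy *wiphy, int radio_idx, int value)
{
	int old[DEMO_MAX_RADIOS];
	int i, ret;

	/* Case 2: a specific radio, change that radio alone. */
	if (radio_idx >= 0 && radio_idx < wiphy->n_radios)
		return demo_radio_set_rts(&wiphy->radios[radio_idx], value);

	/* Case 3: anything else except -1 is invalid. */
	if (radio_idx != -1)
		return -EINVAL;

	/* Case 1: -1 means global, so update every radio. */
	for (i = 0; i < wiphy->n_radios; i++) {
		old[i] = wiphy->radios[i].rts_threshold;
		ret = demo_radio_set_rts(&wiphy->radios[i], value);
		if (ret) {
			/* Restore the radios that were already updated. */
			while (--i >= 0)
				wiphy->radios[i].rts_threshold = old[i];
			return ret;
		}
	}
	return 0;
}
```

A single-radio wiphy falls out of the same code: with n_radios set to 1, the global case updates the one radio present.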
Cross-merge networking fixes after downstream PR (net-6.17-rc7).
No conflicts.
Adjacent changes:
drivers/net/ethernet/mellanox/mlx5/core/en/fs.h 9536fbe10c9d ("net/mlx5e: Add PSP steering in local NIC RX") 7601a0a46216 ("net/mlx5e: Add a miss level for ipsec crypto offload")
- net: clear sk->sk_ino in sk_set_socket(sk, NULL), fix CRIU
Previous releases - regressions:
- bonding: set random address only when slaves already exist
- rxrpc: fix untrusted unsigned subtract
- eth:
- ice: fix Rx page leak on multi-buffer frames
- mlx5: don't return mlx5_link_info table when speed is unknown
Previous releases - always broken:
- tls: make sure to abort the stream if headers are bogus
- tcp: fix null-deref when using TCP-AO with TCP_REPAIR
- dpll: fix skipping last entry in clock quality level reporting
- eth: qed: don't collect too many protection override GRC elements,
fix memory corruption"
* tag 'net-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (51 commits)
octeontx2-pf: Fix use-after-free bugs in otx2_sync_tstamp()
cnic: Fix use-after-free bugs in cnic_delete_task
devlink rate: Remove unnecessary 'static' from a couple places
MAINTAINERS: update sundance entry
net: liquidio: fix overflow in octeon_init_instr_queue()
net: clear sk->sk_ino in sk_set_socket(sk, NULL)
Revert "net/mlx5e: Update and set Xon/Xoff upon port speed set"
selftests: tls: test skb copy under mem pressure and OOB
tls: make sure to abort the stream if headers are bogus
selftest: packetdrill: Add tcp_fastopen_server_reset-after-disconnect.pkt.
tcp: Clear tcp_sk(sk)->fastopen_rsk in tcp_disconnect().
octeon_ep: fix VF MAC address lifecycle handling
selftests: bonding: add vlan over bond testing
bonding: don't set oif to bond dev when getting NS target destination
net: rfkill: gpio: Fix crash due to dereferencering uninitialized pointer
net/mlx5e: Add a miss level for ipsec crypto offload
net/mlx5e: Harden uplink netdev access against device unbind
MAINTAINERS: make the DPLL entry cover drivers
doc/netlink: Fix typos in operation attributes
igc: don't fail igc_probe() on LED setup error
...
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"These are mostly Oliver's Arm changes: lock ordering fixes for the
vGIC, and reverts for a buggy attempt to avoid RCU stalls on large
VMs.
Arm:
- Invalidate nested MMUs upon freeing the PGD to avoid WARNs when
visiting from an MMU notifier
- Fixes to the TLB match process and TLB invalidation range for
managing the VNCR pseudo-TLB
- Prevent SPE from erroneously profiling guests due to UNKNOWN reset
values in PMSCR_EL1
- Fix save/restore of host MDCR_EL2 to account for eagerly
programming at vcpu_load() on VHE systems
- Correct lock ordering when dealing with VGIC LPIs, avoiding
scenarios where an xarray's spinlock was nested with a *raw*
spinlock
- Permit stage-2 read permission aborts which are possible in the
case of NV depending on the guest hypervisor's stage-2 translation
- Call raw_spin_unlock() instead of the internal spinlock API
- Fix parameter ordering when assigning VBAR_EL1
- Reverted a couple of fixes for RCU stalls when destroying a stage-2
page table.
There appear to be some nasty refcounting / UAF issues lurking in
those patches and the band-aid we tried to apply didn't hold.
s390:
- mm fixes, including userfaultfd bug fix
x86:
- Sync the vTPR from the local APIC to the VMCB even when AVIC is
active.
This fixes a bug where host updates to the vTPR, e.g. via
KVM_SET_LAPIC or emulation of a guest access, are lost and result
in interrupt delivery issues in the guest"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR even if AVIC is active
Revert "KVM: arm64: Split kvm_pgtable_stage2_destroy()"
Revert "KVM: arm64: Reschedule as needed when destroying the stage-2 page-tables"
KVM: arm64: vgic: fix incorrect spinlock API usage
KVM: arm64: Remove stage 2 read fault check
KVM: arm64: Fix parameter ordering for VBAR_EL1 assignment
KVM: arm64: nv: Fix incorrect VNCR invalidation range calculation
KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take LPI xarray lock
KVM: arm64: vgic-v3: Don't require IRQs be disabled for LPI xarray lock
KVM: arm64: vgic-v3: Erase LPIs from xarray outside of raw spinlocks
KVM: arm64: Spin off release helper from vgic_put_irq()
KVM: arm64: vgic-v3: Use bare refcount for VGIC LPIs
KVM: arm64: vgic: Drop stale comment on IRQ active state
KVM: arm64: VHE: Save and restore host MDCR_EL2 value correctly
KVM: arm64: Initialize PMSCR_EL1 when in VHE
KVM: arm64: nv: fix VNCR TLB ASID match logic for non-Global entries
KVM: s390: Fix FOLL_*/FAULT_FLAG_* confusion
KVM: s390: Fix incorrect usage of mmu_notifier_register()
KVM: s390: Fix access to unavailable adapter indicator pages during postcopy
KVM: arm64: Mark freed S2 MMUs as invalid
Merge tag 'platform-drivers-x86-v6.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Ilpo Järvinen:
"Fixes and new HW support:
- amd/pmc: Add MECHREVO Yilong15Pro to spurious_8042 list
- amd/pmf: Support new ACPI ID AMDI0108
- asus-wmi: Re-add extra keys to ignore_key_wlan quirk
- oxpec: Add support for AOKZOE A1X and OneXPlayer X1Pro EVA-02"
* tag 'platform-drivers-x86-v6.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: asus-wmi: Re-add extra keys to ignore_key_wlan quirk
platform/x86/amd/pmf: Support new ACPI ID AMDI0108
platform/x86: oxpec: Add support for AOKZOE A1X
platform/x86: oxpec: Add support for OneXPlayer X1Pro EVA-02
platform/x86/amd/pmc: Add MECHREVO Yilong15Pro to spurious_8042 list
Merge tag 'uml-for-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux
Pull UML fixes from Johannes Berg:
"A few fixes for UML, which I'd meant to send earlier but then forgot.
All of them are pretty long-standing issues that are either not really
happening (the UAF), in rarely used code (the FD buffer issue), or an
issue only for some host configurations (the executable stack):
- mark stack not executable to work on more modern systems with
selinux"
* tag 'uml-for-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
um: Fix FD copy size in os_rcv_fd_msg()
um: virtio_uml: Fix use-after-free after put_device in probe
um: Don't mark stack executable
octeontx2-pf: Fix use-after-free bugs in otx2_sync_tstamp()
The original code relies on cancel_delayed_work() in otx2_ptp_destroy(),
which does not ensure that the delayed work item synctstamp_work has fully
completed if it was already running. This leads to use-after-free scenarios
where otx2_ptp is deallocated by otx2_ptp_destroy(), while synctstamp_work
remains active and attempts to dereference otx2_ptp in otx2_sync_tstamp().
Furthermore, because synctstamp_work is cyclic, the likelihood of
triggering the bug is non-negligible.
A typical race condition is illustrated below:
CPU 0 (cleanup)           | CPU 1 (delayed work callback)
otx2_remove()             |
  otx2_ptp_destroy()      | otx2_sync_tstamp()
    cancel_delayed_work() |
    kfree(ptp)            |
                          |   ptp = container_of(...); //UAF
                          |   ptp-> //UAF
Replace cancel_delayed_work() with cancel_delayed_work_sync() to ensure
that the delayed work item is properly canceled before the otx2_ptp is
deallocated.
This bug was initially identified through static analysis. To reproduce
and test it, I simulated the OcteonTX2 PCI device in QEMU and introduced
artificial delays within the otx2_sync_tstamp() function to increase the
likelihood of triggering the bug.
Fixes: 2958d17a8984 ("octeontx2-pf: Add support for ptp 1-step mode on CN10K silicon") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
cnic: Fix use-after-free bugs in cnic_delete_task
The original code uses cancel_delayed_work() in cnic_cm_stop_bnx2x_hw(),
which does not guarantee that the delayed work item 'delete_task' has
fully completed if it was already running. Additionally, the delayed work
item is cyclic, and the flush_workqueue() in cnic_cm_stop_bnx2x_hw() only
blocks and waits for work items that were already queued to the
workqueue prior to its invocation. Any work items submitted after
flush_workqueue() is called are not included in the set of tasks that the
flush operation awaits. This means that after the cyclic work items have
finished executing, a delayed work item may still exist in the workqueue.
This leads to use-after-free scenarios where the cnic_dev is deallocated
by cnic_free_dev(), while delete_task remains active and attempts to
dereference cnic_dev in cnic_delete_task().
A typical race condition is illustrated below:
CPU 0 (cleanup)              | CPU 1 (delayed work callback)
cnic_netdev_event()          |
  cnic_stop_hw()             | cnic_delete_task()
    cnic_cm_stop_bnx2x_hw()  | ...
      cancel_delayed_work()  | /* the queue_delayed_work()
      flush_workqueue()      |    executes after flush_workqueue()*/
                             | queue_delayed_work()
  cnic_free_dev(dev)//free   | cnic_delete_task() //new instance
                             |   dev = cp->dev; //use
Replace cancel_delayed_work() with cancel_delayed_work_sync() to ensure
that the cyclic delayed work item is properly canceled and that any
ongoing execution of the work item completes before the cnic_dev is
deallocated. Furthermore, since cancel_delayed_work_sync() uses
__flush_work(work, true) to synchronously wait for any currently
executing instance of the work item to finish, the flush_workqueue()
becomes redundant and should be removed.
This bug was identified through static analysis. To reproduce the issue
and validate the fix, I simulated the cnic PCI device in QEMU and
introduced intentional delays, such as inserting calls to ssleep()
within the cnic_delete_task() function, to increase the likelihood
of triggering the bug.
devlink rate: Remove unnecessary 'static' from a couple places
devlink_rate_node_get_by_name() and devlink_rate_nodes_destroy() have a
couple of unnecessary static variables for iterating over devlink rates.
This could lead to races/corruption/unhappiness if two concurrent
operations execute the same function.
Remove 'static' from both. It's amazing this was missed for 4+ years.
While at it, I confirmed there are no more examples of this mistake in
net/ with 1, 2 or 3 levels of indentation.
net: liquidio: fix overflow in octeon_init_instr_queue()
The expression `(conf->instr_type == 64) << iq_no` can overflow because
`iq_no` may be as high as 64 (`CN23XX_MAX_RINGS_PER_PF`). Casting the
operand to `u64` ensures correct 64-bit arithmetic.
Fixes: f21fb3ed364b ("Add support of Cavium Liquidio ethernet adapters") Signed-off-by: Alexey Nepomnyashih <sdl@nppct.ru> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
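A minimal sketch of why the cast matters: the comparison yields an int, so without widening the shift is performed in int width. The helper name demo_iq64_bit is hypothetical, and the sketch assumes the queue index passed in stays below 64, since shifting a 64-bit value by 64 or more is still undefined in C.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns a mask with bit iq_no set when the queue
 * uses 64-byte instructions, and 0 otherwise.
 * Assumes iq_no < 64.
 */
uint64_t demo_iq64_bit(int instr_type, unsigned int iq_no)
{
	/* Widen the int result of the comparison before shifting. */
	return (uint64_t)(instr_type == 64) << iq_no;
}
```

With the cast in place, a queue index such as 40 yields bit 40 of a 64-bit mask instead of an out-of-range shift on a 32-bit int.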
Revert "net/mlx5e: Update and set Xon/Xoff upon port speed set"
This reverts commit d24341740fe48add8a227a753e68b6eedf4b385a.
It causes errors when trying to configure QoS, as well as
loss of L2 connectivity (on multi-host devices).
Reported-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/20250910170011.70528106@kernel.org Fixes: d24341740fe4 ("net/mlx5e: Update and set Xon/Xoff upon port speed set") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patchset introduces a new dedicated ethtool_ops callback,
.get_rx_ring_count, which enables drivers to provide the number of RX
rings directly, improving efficiency and clarity in RX ring queries and
RSS configuration.
A number of drivers implement the .get_rxnfc callback just to report the
ring count, so having a dedicated callback makes sense and simplifies
.get_rxnfc (in some cases removing it completely).
This was suggested by Jakub, and follows the same idea as the RXFH
driver callbacks [1].
This also ports virtio_net to the new callback. Once there is consensus
on this approach, I can start moving the drivers to this new callback.
net: virtio_net: add get_rxrings ethtool callback for RX ring queries
Replace the existing virtnet_get_rxnfc callback with a dedicated
virtnet_get_rxrings implementation to provide the number of RX rings
directly via the new ethtool_ops get_rx_ring_count pointer.
This simplifies the RX ring count retrieval and aligns virtio_net with
the new ethtool API for querying RX ring parameters.
net: ethtool: use the new helper in rss_set_prep_indir()
Refactor rss_set_prep_indir() to utilize the new
ethtool_get_rx_ring_count() helper for determining the number of RX
rings, replacing the direct use of get_rxnfc with ETHTOOL_GRXRINGS.
This ensures compatibility with both legacy and new ethtool_ops
interfaces by transparently multiplexing between them.
net: ethtool: update set_rxfh_indir to use ethtool_get_rx_ring_count helper
Modify ethtool_set_rxfh_indir() to use the new ethtool_get_rx_ring_count()
helper function for retrieving the number of RX rings instead of
directly calling get_rxnfc with ETHTOOL_GRXRINGS.
This way, we can leverage the new helper if it is available in ethtool_ops.
net: ethtool: update set_rxfh to use ethtool_get_rx_ring_count helper
Modify ethtool_set_rxfh() to use the new ethtool_get_rx_ring_count()
helper function for retrieving the number of RX rings instead of
directly calling get_rxnfc with ETHTOOL_GRXRINGS.
This way, we can leverage the new helper if it is available in ethtool_ops.
net: ethtool: add get_rx_ring_count callback to optimize RX ring queries
Add a new optional get_rx_ring_count callback in ethtool_ops to allow
drivers to provide the number of RX rings directly without going through
the full get_rxnfc flow classification interface.
Create ethtool_get_rx_ring_count() to use .get_rx_ring_count if
available, falling back to get_rxnfc() otherwise. It needs to be
non-static, given that it will be called later by other ethtool functions,
such as those calling get_rxfh().
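The fallback described above can be sketched with mock types in user-space C. The demo_ structs, the DEMO_GRXRINGS constant and the two example driver callbacks are hypothetical stand-ins for ethtool_ops, struct ethtool_rxnfc and ETHTOOL_GRXRINGS; this is an illustration of the multiplexing idea, not the kernel source.

```c
#include <errno.h>

#define DEMO_GRXRINGS 1	/* stand-in for ETHTOOL_GRXRINGS */

/* Mock of the rxnfc request: a command and a result field. */
struct demo_rxnfc {
	unsigned int cmd;
	unsigned long long data;
};

struct demo_dev;

/* Mock ops table with the new optional callback and the legacy one. */
struct demo_ops {
	int (*get_rx_ring_count)(struct demo_dev *dev);
	int (*get_rxnfc)(struct demo_dev *dev, struct demo_rxnfc *info);
};

struct demo_dev {
	const struct demo_ops *ops;
	int rx_rings;
};

int demo_get_rx_ring_count(struct demo_dev *dev)
{
	const struct demo_ops *ops = dev->ops;
	struct demo_rxnfc info = { .cmd = DEMO_GRXRINGS };
	int ret;

	/* Prefer the dedicated callback when the driver provides it. */
	if (ops->get_rx_ring_count)
		return ops->get_rx_ring_count(dev);

	/* Otherwise fall back to the legacy flow classification path. */
	if (!ops->get_rxnfc)
		return -EOPNOTSUPP;

	ret = ops->get_rxnfc(dev, &info);
	if (ret < 0)
		return ret;
	return (int)info.data;
}

/* Example driver using the new callback. */
int demo_new_count(struct demo_dev *dev)
{
	return dev->rx_rings;
}

/* Example driver that only implements the legacy callback. */
int demo_legacy_rxnfc(struct demo_dev *dev, struct demo_rxnfc *info)
{
	if (info->cmd != DEMO_GRXRINGS)
		return -EOPNOTSUPP;
	info->data = (unsigned long long)dev->rx_rings;
	return 0;
}
```

Callers only see demo_get_rx_ring_count(), so a driver can move from the legacy callback to the new one without any change on the caller side.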
net: ethtool: add support for ETHTOOL_GRXRINGS ioctl
This patch adds handling for the ETHTOOL_GRXRINGS ioctl command in the
ethtool ioctl dispatcher. It introduces a new helper function
ethtool_get_rxrings() that calls the driver's get_rxnfc() callback with
appropriate parameters to retrieve the number of RX rings supported
by the device.
By explicitly handling ETHTOOL_GRXRINGS, userspace queries through
ethtool can now obtain RX ring information in a structured manner.
In this patch, ethtool_get_rxrings() is simply a copy of
ethtool_get_rxnfc().
net: ethtool: pass the num of RX rings directly to ethtool_copy_validate_indir
Modify ethtool_copy_validate_indir() and callers to validate indirection
table entries against the number of RX rings as an integer instead of
accessing rx_rings->data.
This will be useful in the future, given that struct ethtool_rxnfc might
not exist for native GRXRINGS call.
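Validating against a plain integer can be sketched as follows. The function name demo_validate_indir and its exact signature are assumptions made for this illustration; the point is that the check needs only the ring count, not a struct ethtool_rxnfc.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical validator: every indirection table entry must name an
 * existing RX ring, i.e. be smaller than num_rx_rings.
 */
int demo_validate_indir(const uint32_t *indir, size_t size, int num_rx_rings)
{
	size_t i;

	if (num_rx_rings <= 0)
		return -EINVAL;

	for (i = 0; i < size; i++)
		if (indir[i] >= (uint32_t)num_rx_rings)
			return -EINVAL;

	return 0;
}
```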
Eric Dumazet [Thu, 18 Sep 2025 11:35:46 +0000 (11:35 +0000)]
psp: rename our psp_dev_destroy()
psp_dev_destroy() was already used in drivers/crypto/ccp/psp-dev.c
Use psp_dev_free() instead, to avoid a link error when
CRYPTO_DEV_SP_CCP=y
Fixes: 00c94ca2b99e ("psp: base PSP device support") Closes: https://lore.kernel.org/netdev/CANn89i+ZdBDEV6TE=Nw5gn9ycTzWw4mZOpPuCswgwEsrgOyNnw@mail.gmail.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20250918113546.177946-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Thu, 18 Sep 2025 11:09:44 +0000 (13:09 +0200)]
Merge branch 'bnxt_en-updates-for-net-next'
Michael Chan says:
====================
bnxt_en: Updates for net-next
This series includes some code clean-ups and optimizations. New features
include 2 new backing store memory types to collect FW logs for core
dumps, dynamic SRIOV resource allocations for RoCE, and ethtool tunable
for PFC watchdog.
v2: Drop patch #4. The patch makes the code different from the original
bnxt_hwrm_func_backing_store_cfg_v2() that allows instance_bmap to have
bits that are not contiguous. It is safer to keep the original code.
Michael Chan [Wed, 17 Sep 2025 04:08:39 +0000 (21:08 -0700)]
bnxt_en: Implement ethtool .set_tunable() for ETHTOOL_PFC_PREVENTION_TOUT
Support the setting of the tunable if it is supported by firmware.
The supported range is 0 to the maximum msec value reported by
firmware. PFC_STORM_PREVENTION_AUTO is also supported and 0 means it
is disabled.
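The accepted range can be sketched as a small check. The function name, the -EINVAL return and the DEMO_ constants are assumptions of this sketch; the AUTO value is assumed to mirror PFC_STORM_PREVENTION_AUTO from the ethtool UAPI header, and fw_max_msec stands for whatever maximum the firmware reports.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_PFC_PREVENTION_DISABLE	0	/* 0 means disabled */
#define DEMO_PFC_PREVENTION_AUTO	0xffff	/* assumed AUTO marker */

/* Hypothetical range check for a requested PFC prevention timeout. */
int demo_check_pfc_tout(uint16_t val, uint16_t fw_max_msec)
{
	/* AUTO is accepted regardless of the firmware maximum. */
	if (val == DEMO_PFC_PREVENTION_AUTO)
		return 0;

	/* Otherwise the value must lie in 0..fw_max_msec. */
	if (val > fw_max_msec)
		return -EINVAL;

	return 0;
}
```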
bnxt_en: Support for RoCE resources dynamically shared within VFs.
Add support for dynamic RoCE SRIOV resource configuration. Instead of
statically dividing the RoCE resources by the number of VFs, provide
the maximum resources and let the FW dynamically distribute to the VFs
on the fly.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Anantha Prabhu <anantha.prabhu@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250917040839.1924698-8-michael.chan@broadcom.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
bnxt_en: Add fw log trace support for 5731X/5741X chips
These older chips now support the fw log traces via backing store
qcaps_v2. No other backing store memory types are supported besides
the fw trace types.
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Shruti Parab <shruti.parab@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250917040839.1924698-6-michael.chan@broadcom.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Michael Chan [Wed, 17 Sep 2025 04:08:33 +0000 (21:08 -0700)]
bnxt_en: Improve bnxt_backing_store_cfg_v2()
Improve the logic that determines the last_type in this function.
The different context memory types are configured in a loop. The
last_type signals the last context memory type to be configured
which requires the ALL_DONE flag to be set for the FW.
The existing logic assumes that TIM is the last_type
when RDMA is enabled and that FTQM is the last_type when only L2 is
enabled. Improve it to simply search for the last_type so that we
don't need to make assumptions that won't necessarily be true
for future devices.
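Searching for the last type, rather than assuming it, can be sketched like this. The demo_ctx_mem struct, its valid flag and the function name are hypothetical; the driver's own criteria for which context memory types get configured are not shown in this message.

```c
#include <stdbool.h>

#define DEMO_CTX_INVALID (-1)

/* Hypothetical per-type state: valid means the type will be configured. */
struct demo_ctx_mem {
	bool valid;
};

/*
 * Scan from the highest type index downward and return the first type
 * that will be configured; that type carries the ALL_DONE flag.
 */
int demo_find_last_type(const struct demo_ctx_mem *ctx, int n_types)
{
	int type;

	for (type = n_types - 1; type >= 0; type--)
		if (ctx[type].valid)
			return type;

	return DEMO_CTX_INVALID;
}
```

Because the result depends only on which types are marked valid, no per-feature assumption about RDMA or L2 is needed.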
Kalesh AP [Wed, 17 Sep 2025 04:08:32 +0000 (21:08 -0700)]
bnxt_en: Optimize bnxt_sriov_disable()
bnxt_sriov_disable() is invoked from 2 places:
1. When the user deletes the VFs.
2. During the unload of the PF driver instance.
Inside bnxt_sriov_disable(), driver invokes
bnxt_restore_pf_fw_resources() which in turn causes a close/open_nic().
There is no harm doing this in the unload path, although it is inefficient
and unnecessary.
Optimize the function to make it more efficient.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250917040839.1924698-4-michael.chan@broadcom.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Kalesh AP [Wed, 17 Sep 2025 04:08:31 +0000 (21:08 -0700)]
bnxt_en: Remove unnecessary VF check in bnxt_hwrm_nvm_req()
The driver registers the supported configuration parameters with the
devlink stack only on the PF using devlink_params_register().
Hence there is no need for a VF check inside bnxt_hwrm_nvm_req().
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250917040839.1924698-3-michael.chan@broadcom.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Kalesh AP [Wed, 17 Sep 2025 04:08:30 +0000 (21:08 -0700)]
bnxt_en: Drop redundant if block in bnxt_dl_flash_update()
The devlink stack has sanity checks and it invokes flash_update()
only if it is supported by the driver. The VF driver does not
advertise the support for flash_update in struct devlink_ops.
This makes the if condition inside bnxt_dl_flash_update() redundant.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250917040839.1924698-2-michael.chan@broadcom.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Wed, 17 Sep 2025 00:28:13 +0000 (17:28 -0700)]
tls: make sure to abort the stream if headers are bogus
Normally we wait for the socket to buffer up the whole record
before we service it. If the socket has a tiny buffer, however,
we read out the data sooner, to prevent connection stalls.
Make sure that we abort the connection when we find out late
that the record is actually invalid. Retrying the parsing is
fine in itself but since we copy some more data each time
before we parse we can overflow the allocated skb space.
Constructing a scenario in which we're under pressure without
enough data in the socket to parse the length upfront is quite
hard. syzbot figured out a way to do this by serving us the header
in small OOB sends, and then filling in the recvbuf with a large
normal send.
Make sure that tls_rx_msg_size() aborts strp; once we reach
an invalid record there's really no way to recover.
Reported-by: Lee Jones <lee@kernel.org> Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser") Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250917002814.1743558-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
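The idea of aborting instead of retrying can be illustrated with a loose user-space analogy. The demo_strp struct, the 16384-byte payload limit and the -EMSGSIZE error are assumptions of this sketch and do not reproduce the kernel's tls_rx_msg_size(); what it shows is a sticky error, so that a bogus header is never parsed again after more data has been copied in.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_HDR_LEN		5	/* type, version (2 bytes), length (2 bytes) */
#define DEMO_MAX_PAYLOAD	16384	/* hypothetical record size limit */

/* Hypothetical parser state; aborted is 0 while the stream is healthy. */
struct demo_strp {
	int aborted;
};

/*
 * Returns 0 when more data is needed, a positive full record size when
 * the header is valid, or a negative error once the stream is aborted.
 */
int demo_rx_msg_size(struct demo_strp *strp, const uint8_t *buf, size_t len)
{
	size_t payload;

	/* Once aborted, never retry parsing. */
	if (strp->aborted)
		return strp->aborted;

	if (len < DEMO_HDR_LEN)
		return 0;

	payload = ((size_t)buf[3] << 8) | buf[4];
	if (payload > DEMO_MAX_PAYLOAD) {
		strp->aborted = -EMSGSIZE;	/* latch the error */
		return strp->aborted;
	}

	return (int)(DEMO_HDR_LEN + payload);
}
```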
==================
add basic PSP encryption for TCP connections
This is v13 of the PSP RFC [1] posted by Jakub Kicinski one year
ago. General developments since v1 include a fork of packetdrill [2]
with support for PSP added, as well as some test cases, and an
implementation of PSP key exchange and connection upgrade [3]
integrated into the fbthrift RPC library. Both [2] and [3] have been
tested on server platforms with PSP-capable CX7 NICs. Below is the
cover letter from the original RFC:
Add support for PSP encryption of TCP connections.
PSP is a protocol out of Google:
https://github.com/google/psp/blob/main/doc/PSP_Arch_Spec.pdf
which shares some similarities with IPsec. I added some more info
in the first patch so I'll keep it short here.
The protocol can work in multiple modes including tunneling.
But I'm mostly interested in using it as TLS replacement because
of its superior offload characteristics. So this patch does three
things:
- it adds "core" PSP code
PSP is offload-centric, and requires some additional care and
feeding, so first chunk of the code exposes device info.
This part can be reused by PSP implementations in xfrm, tunneling etc.
- TCP integration TLS style
Reuse some of the existing concepts from TLS offload, such as
attaching crypto state to a socket, marking skbs as "decrypted",
egress validation. PSP does not prescribe key exchange protocols.
To use PSP as a more efficient TLS offload we intend to perform
a TLS handshake ("inline" in the same TCP connection) and negotiate
switching to PSP based on capabilities of both endpoints.
This is also why I'm not including a software implementation.
Nobody would use it in production, software TLS is faster,
it has larger crypto records.
- mlx5 implementation
That's mostly other people's work, not 100% sure those folks
consider it ready hence the RFC in the title. But it works :)
Not posted, but queued on a branch [4], are follow-up pieces:
- standard stats
- netdevsim implementation and tests
Comments we intend to defer to future series:
- we prefer to keep the version field in the tx-assoc netlink
request, because it makes parsing keys require less state early
on, but we are willing to change in the next version of this
series.
- using a static branch to wrap psp_enqueue_set_decrypted() and
other functions called from tcp.
- using INDIRECT_CALL for tls/psp in sk_validate_xmit_skb(). We
prefer to address this in a dedicated patch series, so that this
series does not need to modify the way tls_validate_xmit_skb() is
declared and stubbed out.
Links: https://patch.msgid.link/20250917000954.859376-1-daniel.zahka@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
* add-basic-psp-encryption-for-tcp-connections:
net/mlx5e: Implement PSP key_rotate operation
net/mlx5e: Add Rx data path offload
psp: provide decapsulation and receive helper for drivers
net/mlx5e: Configure PSP Rx flow steering rules
net/mlx5e: Add PSP steering in local NIC RX
net/mlx5e: Implement PSP Tx data path
psp: provide encapsulation helper for drivers
net/mlx5e: Implement PSP operations .assoc_add and .assoc_del
net/mlx5e: Support PSP offload functionality
psp: track generations of device key
net: psp: update the TCP MSS to reflect PSP packet overhead
net: psp: add socket security association code
net: tcp: allow tcp_timewait_sock to validate skbs before handing to device
net: move sk_validate_xmit_skb() to net/core/dev.c
psp: add op for rotation of device key
tcp: add datapath logic for PSP with inline key exchange
net: modify core data structures for PSP datapath support
psp: base PSP device support
psp: add documentation
Raed Salem [Wed, 17 Sep 2025 00:09:46 +0000 (17:09 -0700)]
net/mlx5e: Implement PSP key_rotate operation
Implement the .key_rotate operation which, when invoked, causes the HW to
use a new master key to derive PSP SPI/key pairs in compliance with the
PSP spec.
Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20250917000954.859376-20-daniel.zahka@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Raed Salem [Wed, 17 Sep 2025 00:09:45 +0000 (17:09 -0700)]
net/mlx5e: Add Rx data path offload
In the receive flow, inspect received packets for a PSP offload indication
using the CQE. For PSP-offloaded packets, set the SKB PSP metadata, i.e. SPI,
header length and key generation number, for the stack to process further.
Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20250917000954.859376-19-daniel.zahka@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Raed Salem [Wed, 17 Sep 2025 00:09:44 +0000 (17:09 -0700)]
psp: provide decapsulation and receive helper for drivers
Create psp_dev_rcv(), which drivers can call to psp decapsulate and attach
a psp_skb_ext to an skb.
psp_dev_rcv() only supports what the PSP architecture specification
refers to as "transport mode" packets, where the L3 header is either
IPv6 or IPv4.
Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Co-developed-by: Daniel Zahka <daniel.zahka@gmail.com> Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250917000954.859376-18-daniel.zahka@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Raed Salem [Wed, 17 Sep 2025 00:09:43 +0000 (17:09 -0700)]
net/mlx5e: Configure PSP Rx flow steering rules
Set the Rx PSP flow steering rule, in which a PSP packet is identified and
decrypted using the dedicated UDP destination port number 1000. If the packet
is decrypted, a PSP marker and syndrome are added to the metadata so SW can
use them later on in the Rx data path.
The rule is set as part of init_rx netdev profile implementation.
Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20250917000954.859376-17-daniel.zahka@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Raed Salem [Wed, 17 Sep 2025 00:09:42 +0000 (17:09 -0700)]
net/mlx5e: Add PSP steering in local NIC RX
Introduce decrypt FT, the RX error FT, and the default rules.
The PSP RX decrypt flow table is pointed to by the TTC
(Traffic Type Classifier) UDP steering rules.
The decrypt flow table has two flow groups. The first flow group
always keeps the decrypt steering rule programmed. A PSP packet is
recognized by the dedicated UDP destination port number 1000; if the
packet is decrypted, a PSP marker is set in metadata_regB[30].
The second flow group has a default rule that forwards all non-offloaded
PSP packets to the TTC UDP default RSS TIR.
The RX error flow table is the destination of the decrypt steering rules in
the PSP RX decrypt flow table. It has two fixed rules. The first has a single
copy action that copies psp_syndrome to metadata_regB[23:29]. The PSP marker
and syndrome are used to filter out non-PSP packets and to return the PSP
crypto offload status in the Rx flow. The marker is used to identify such
packets in the driver so that the driver can set the SKB PSP metadata. The
destination of the RX error flow table is the TTC UDP default RSS TIR. The
second rule drops packets that failed to be decrypted (for example, when an
illegal or expired SPI is used).
Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20250917000954.859376-16-daniel.zahka@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
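The bit layout named in the message above (marker in metadata_regB[30], syndrome in metadata_regB[23:29]) can be sketched as two extraction helpers. The macro and function names are hypothetical, and the sketch assumes the register is available as a host-order 32-bit value; how the driver actually reads it from the CQE is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PSP_MARKER_BIT	30	/* metadata_regB[30] */
#define DEMO_PSP_SYNDROME_SHIFT	23	/* metadata_regB[23:29] */
#define DEMO_PSP_SYNDROME_MASK	0x7fu	/* seven bits wide */

/* True when the PSP marker bit is set in the metadata register. */
bool demo_psp_marker(uint32_t reg_b)
{
	return (reg_b >> DEMO_PSP_MARKER_BIT) & 1u;
}

/* Extracts the seven-bit syndrome field from the metadata register. */
uint32_t demo_psp_syndrome(uint32_t reg_b)
{
	return (reg_b >> DEMO_PSP_SYNDROME_SHIFT) & DEMO_PSP_SYNDROME_MASK;
}
```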