Add a new getsockopt_iter callback to struct proto_ops that uses
sockopt_t, a type-safe wrapper around iov_iter. This provides a clean
interface for socket option operations that works with both user and
kernel buffers.
The sockopt_t type encapsulates an iov_iter and an optlen field.
The optlen field, although not suggested by Linus, serves as both input
(buffer size) and output (returned data size), allowing callbacks to
return random values independent of the bytes written via
copy_to_iter(), so, keep it separated from iov_iter.count.
This is preparatory work for removing the SOL_SOCKET level restriction
from io_uring getsockopt operations.
Keep in mind that both iter_out and iter_in always point to the same
data at all times, and we just have two of them to make the callback
implementation sane.
Jakub Kicinski [Mon, 13 Apr 2026 21:26:02 +0000 (14:26 -0700)]
Merge tag 'for-net-next-2026-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Luiz Augusto von Dentz says:
====================
bluetooth-next pull request for net-next:
core:
- hci_core: Rate limit the logging of invalid ISO handle
- hci_sync: make hci_cmd_sync_run_once return -EEXIST if exists
- hci_event: fix locking in hci_conn_request_evt() with HCI_PROTO_DEFER
- hci_event: fix potential UAF in SSP passkey handlers
- HCI: Avoid a couple -Wflex-array-member-not-at-end warnings
- L2CAP: CoC: Disconnect if received packet size exceeds MPS
- L2CAP: Add missing chan lock in l2cap_ecred_reconf_rsp
- L2CAP: Fix printing wrong information if SDU length exceeds MTU
- SCO: check for codecs->num_codecs == 1 before assigning to sco_pi(sk)->codec
drivers:
- btusb: MT7922: Add VID/PID 0489/e174
- btusb: Add Lite-On 04ca:3807 for MediaTek MT7921
- btusb: Add MT7927 IDs ASUS ROG Crosshair X870E Hero, Lenovo Legion Pro 7
16ARX9, Gigabyte Z790 AORUS MASTER X, MSI X870E Ace Max, TP-Link
Archer TBE550E, ASUS X870E / ProArt X870E-Creator.
- btusb: Add MT7902 IDs 13d3/3579, 13d3/3580, 13d3/3594, 13d3/3596, 0e8d/1ede
- btusb: Add MT7902 IDs 13d3/3579, 13d3/3580, 13d3/3594, 13d3/3596, 0e8d/1ede
- btusb: MediaTek MT7922: Add VID 0489 & PID e11d
- btintel: Add support for Scorpious Peak2 support
- btintel: Add support for Scorpious Peak2F support
- btintel_pcie: Add device id of Scorpius Peak2, Nova Lake-PCD-H
- btintel_pcie: Add device id of Scorpious2, Nova Lake-PCD-S
- btmtk: Add reset mechanism if downloading firmware failed
- btmtk: Add MT6639 (MT7927) Bluetooth support
- btmtk: fix ISO interface setup for single alt setting
- btmtk: add MT7902 SDIO support
- Bluetooth: btmtk: add MT7902 MCU support
- btbcm: Add entry for BCM4343A2 UART Bluetooth
- qca: enable pwrseq support for wcn39xx devices
- hci_qca: Fix BT not getting powered-off on rmmod
- hci_qca: disable power control for WCN7850 when bt_en is not defined
- hci_qca: Fix missing wakeup during SSR memdump handling
- hci_ldisc: Clear HCI_UART_PROTO_INIT on error
- mmc: sdio: add MediaTek MT7902 SDIO device ID
- hci_ll: Enable BROKEN_ENHANCED_SETUP_SYNC_CONN for WL183x
* tag 'for-net-next-2026-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (59 commits)
Bluetooth: hci_qca: Fix missing wakeup during SSR memdump handling
Bluetooth: btintel_pcie: use strscpy to copy plain strings
Bluetooth: hci_event: fix potential UAF in SSP passkey handlers
Bluetooth: hci.h: Avoid a couple -Wflex-array-member-not-at-end warnings
Bluetooth: SCO: check for codecs->num_codecs == 1 before assigning to sco_pi(sk)->codec
Bluetooth: btintel_pcie: Align shared DMA memory to 128 bytes
Bluetooth: l2cap: Add missing chan lock in l2cap_ecred_reconf_rsp
Bluetooth: hci_ll: Enable BROKEN_ENHANCED_SETUP_SYNC_CONN for WL183x
Bluetooth: btusb: MediaTek MT7922: Add VID 0489 & PID e11d
Bluetooth: btmtk: hide unused btmtk_mt6639_devs[] array
Bluetooth: btusb: Add MT7927 ID for ASUS X870E / ProArt X870E-Creator
Bluetooth: btusb: Add MT7927 ID for TP-Link Archer TBE550E
Bluetooth: btusb: Add MT7927 ID for MSI X870E Ace Max
Bluetooth: btusb: Add MT7927 ID for Gigabyte Z790 AORUS MASTER X
Bluetooth: btusb: Add MT7927 ID for Lenovo Legion Pro 7 16ARX9
Bluetooth: btusb: Add MT7927 ID for ASUS ROG Crosshair X870E Hero
Bluetooth: btmtk: fix ISO interface setup for single alt setting
Bluetooth: btmtk: Add MT6639 (MT7927) Bluetooth support
Bluetooth: fix locking in hci_conn_request_evt() with HCI_PROTO_DEFER
Bluetooth: btmtk: refactor endpoint lookup
...
====================
mlx4: correct error reporting in mlx4_master_process_vhcr()
mlx4_master_process_vhcr() logs vhcr->errno on failures, but this field
is never populated by the PF path. As a result, all failures are reported
with errno 0 and err print in status case which is misleading.
Use the actual return value (err) instead, translate it to FW status
before logging, and report both values.
Jakub Kicinski [Wed, 8 Apr 2026 00:14:38 +0000 (17:14 -0700)]
tcp: update window_clamp when SO_RCVBUF is set
Commit under Fixes moved recomputing the window clamp to
tcp_measure_rcv_mss() (when scaling_ratio changes).
I suspect it missed the fact that we don't recompute the clamp
when rcvbuf is set. Until scaling_ratio changes we are
stuck with the old window clamp which may be based on
the small initial buffer. scaling_ratio may never change.
Inspired by Eric's recent commit d1361840f8c5 ("tcp: fix
SO_RCVLOWAT and RCVBUF autotuning") plumb the user action
thru to TCP and have it update the clamp.
A smaller fix would be to just have tcp_rcvbuf_grow()
adjust the clamp even if SOCK_RCVBUF_LOCK is set.
But IIUC this is what we were trying to get away from
in the first place.
Fixes: a2cbb1603943 ("tcp: Update window clamping condition") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumaze@google.com> Link: https://patch.msgid.link/20260408001438.129165-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Bluetooth: hci_qca: Fix missing wakeup during SSR memdump handling
When a Bluetooth controller encounters a coredump, it triggers the
Subsystem Restart (SSR) mechanism. The controller first reports the
coredump data and, once the upload is complete, sends a hw_error
event. The host relies on this event to proceed with subsequent
recovery actions.
If the host has not finished processing the coredump data when the
hw_error event is received, it waits until either the processing is
complete or the 8-second timeout expires before handling the event.
The current implementation clears QCA_MEMDUMP_COLLECTION using
clear_bit(), which does not wake up waiters sleeping in
wait_on_bit_timeout(). As a result, the waiting thread may remain
blocked until the timeout expires even if the coredump collection
has already completed.
Fix this by clearing QCA_MEMDUMP_COLLECTION with
clear_and_wake_up_bit(), which also wakes up the waiting thread and
allows the hw_error handling to proceed immediately.
Test case:
- Trigger a controller coredump using:
hcitool cmd 0x3f 0c 26
- Tested on QCA6390.
- Capture HCI logs using btmon.
- Verify that the delay between receiving the hw_error event and
initiating the power-off sequence is reduced compared to the
timeout-based behavior.
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Shuai Zhang <shuai.zhang@oss.qualcomm.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Bluetooth: hci_event: fix potential UAF in SSP passkey handlers
hci_conn lookup and field access must be covered by hdev lock in
hci_user_passkey_notify_evt() and hci_keypress_notify_evt(), otherwise
the connection can be freed concurrently.
Extend the hci_dev_lock critical section to cover all conn usage in both
handlers.
Keep the existing keypress notification behavior unchanged by routing
the early exits through a common unlock path.
Fixes: 92a25256f142 ("Bluetooth: mgmt: Implement support for passkey notification") Cc: stable@vger.kernel.org Signed-off-by: Shuvam Pandey <shuvampandey1@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Bluetooth: hci.h: Avoid a couple -Wflex-array-member-not-at-end warnings
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.
struct hci_std_codecs and struct hci_std_codecs_v2 are flexible
structures, this is structures that contain a flexible-array member
(__u8 codec[]; and struct hci_std_codec_v2 codec[];, correspondingly.)
Since struct hci_rp_read_local_supported_codecs and struct
hci_rp_read_local_supported_codecs_v2 are defined by hardware, we
create the new struct hci_std_codecs_hdr and struct hci_std_codecs_v2_hdr
types, and use them to replace the object types causing trouble in
struct hci_rp_read_local_supported_codecs and struct
hci_rp_read_local_supported_codecs_v2, namely struct hci_std_codecs
std_codecs; and struct hci_std_codecs_v2_hdr std_codecs;.
Also, once -fms-extensions is enabled, we can use transparent struct
members in both struct hci_std_codecs and struct hci_std_codecs_v2_hdr.
Notice that the newly created types does not contain the flex-array
member `codec`, which is the object causing the -Wfamnae warnings.
After these changes, the size of struct hci_rp_read_local_supported_codecs
and struct hci_rp_read_local_supported_codecs_v2, along with their
member's offsets remain the same, hence the memory layouts don't
change:
include/net/bluetooth/hci.h:1490:31: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
include/net/bluetooth/hci.h:1525:34: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Bluetooth: SCO: check for codecs->num_codecs == 1 before assigning to sco_pi(sk)->codec
copy_struct_from_sockptr() fill 'buffer' in
sco_sock_setsockopt() with zeros, so there's no
real problem.
But it actually looks strange to do this,
without checking all of codecs->codecs[0]
really comes from userspace:
sco_pi(sk)->codec = codecs->codecs[0];
As only optlen < sizeof(struct bt_codecs) is checked
and codecs->num_codecs is not checked against != 1,
but only <= 1, and the space for the additional struct bt_codec
is not checked.
Note I don't understand bluetooth and I didn't do any runtime
tests with this! I just found it when debugging a problem
in copy_struct_from_sockptr().
I just added this to check the size is as expected:
make CF=-D__CHECK_ENDIAN__ W=1ce C=1 net/bluetooth/sco.o
Fixes: 3e643e4efa1e ("Bluetooth: Improve setsockopt() handling of malformed user input") Cc: Michal Luczaj <mhal@rbox.co> Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Cc: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: David Wei <dw@davidwei.uk> Cc: linux-bluetooth@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Kiran K [Tue, 7 Apr 2026 10:02:40 +0000 (15:32 +0530)]
Bluetooth: btintel_pcie: Align shared DMA memory to 128 bytes
Align each descriptor/index/context region to 128 bytes before
calculating the total DMA pool size. This ensures the memory layout
shared with firmware meets the 128-byte alignment requirement.
The DMA pool alignment is also set to 128 bytes to match the firmware
expectation for all shared structures.
Signed-off-by: Kiran K <kiran.k@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Dudu Lu [Sun, 5 Apr 2026 15:47:41 +0000 (23:47 +0800)]
Bluetooth: l2cap: Add missing chan lock in l2cap_ecred_reconf_rsp
l2cap_ecred_reconf_rsp() calls l2cap_chan_del() without holding
l2cap_chan_lock(). Every other l2cap_chan_del() caller in the file
acquires the lock first. A remote BLE device can send a crafted
L2CAP ECRED reconfiguration response to corrupt the channel list
while another thread is iterating it.
Add l2cap_chan_hold() and l2cap_chan_lock() before l2cap_chan_del(),
and l2cap_chan_unlock() and l2cap_chan_put() after, matching the
pattern used in l2cap_ecred_conn_rsp() and l2cap_conn_del().
Fixes: 15f02b910562 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode") Signed-off-by: Dudu Lu <phx0fer@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Bluetooth: hci_ll: Enable BROKEN_ENHANCED_SETUP_SYNC_CONN for WL183x
TI WL183x controllers advertise support for the HCI Enhanced Setup
Synchronous Connection command, but SCO setup fails when the enhanced
path is used. The only working configuration is to fall back to the
legacy HCI Setup Synchronous Connection (0x0028).
This matches the scenario described in commit 05abad857277
("Bluetooth: HCI: Add HCI_QUIRK_BROKEN_ENHANCED_SETUP_SYNC_CONN quirk").
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Kamiyama Chiaki <nercone@nercone.dev> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
When USB support is disabled, the array is not referenced anywhere,
causing a warning:
drivers/bluetooth/btmtk.c:35:3: error: 'btmtk_mt6639_devs' defined but not used [-Werror=unused-const-variable=]
35 | } btmtk_mt6639_devs[] = {
| ^~~~~~~~~~~~~~~~~
Move it into the #ifdef block.
Fixes: 28b7c5a6db74 ("Bluetooth: btmtk: Add MT6639 (MT7927) Bluetooth support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Javier Tia [Mon, 30 Mar 2026 20:39:30 +0000 (14:39 -0600)]
Bluetooth: btusb: Add MT7927 ID for ASUS X870E / ProArt X870E-Creator
Add USB device ID 13d3:3588 (IMC Networks/Azurewave) for the MediaTek
MT7927 (Filogic 380) Bluetooth interface found on the ASUS ROG STRIX
X870E-E GAMING WIFI and ASUS ProArt X870E-Creator WiFi motherboards.
The information in /sys/kernel/debug/usb/devices about the Bluetooth
device is listed as the below.
Javier Tia [Mon, 30 Mar 2026 20:39:29 +0000 (14:39 -0600)]
Bluetooth: btusb: Add MT7927 ID for TP-Link Archer TBE550E
Add USB device ID 0489:e116 (Foxconn/Hon Hai) for the MediaTek MT7927
(Filogic 380) Bluetooth interface found on the TP-Link Archer TBE550E
PCIe adapter.
The information in /sys/kernel/debug/usb/devices about the Bluetooth
device is listed as the below.
Javier Tia [Mon, 30 Mar 2026 20:39:27 +0000 (14:39 -0600)]
Bluetooth: btusb: Add MT7927 ID for Gigabyte Z790 AORUS MASTER X
Add USB device ID 0489:e10f (Foxconn/Hon Hai) for the MediaTek MT7927
(Filogic 380) Bluetooth interface found on the Gigabyte Z790 AORUS
MASTER X motherboard.
The information in /sys/kernel/debug/usb/devices about the Bluetooth
device is listed as the below.
Javier Tia [Mon, 30 Mar 2026 20:39:26 +0000 (14:39 -0600)]
Bluetooth: btusb: Add MT7927 ID for Lenovo Legion Pro 7 16ARX9
Add USB device ID 0489:e0fa (Foxconn/Hon Hai) for the MediaTek MT7927
(Filogic 380) Bluetooth interface found on the Lenovo Legion Pro 7
16ARX9 laptop.
The information in /sys/kernel/debug/usb/devices about the Bluetooth
device is listed as the below.
Javier Tia [Mon, 30 Mar 2026 20:39:25 +0000 (14:39 -0600)]
Bluetooth: btusb: Add MT7927 ID for ASUS ROG Crosshair X870E Hero
Add USB device ID 0489:e13a (Foxconn/Hon Hai) for the MediaTek MT7927
(Filogic 380) Bluetooth interface found on the ASUS ROG Crosshair X870E
Hero WiFi motherboard.
The information in /sys/kernel/debug/usb/devices about the Bluetooth
device is listed as the below.
Javier Tia [Mon, 30 Mar 2026 20:39:24 +0000 (14:39 -0600)]
Bluetooth: btmtk: fix ISO interface setup for single alt setting
Some MT6639 Bluetooth USB interfaces (e.g. IMC Networks 13d3:3588 on
ASUS ROG STRIX X870E-E and ProArt X870E-Creator boards) expose only a
single alternate setting (alt 0) on the ISO interface. The driver
unconditionally requests alt setting 1, which fails with EINVAL on
these devices, causing a ~20 second initialization delay and no LE
audio support.
Check the number of available alternate settings before selecting one.
If only alt 0 exists, use it; otherwise request alt 1 as before.
Closes: https://github.com/jetm/mediatek-mt7927-dkms/pull/39 Signed-off-by: Javier Tia <floss@jetm.me> Reported-by: Ryan Gilbert <xelnaga@gmail.com> Tested-by: Ryan Gilbert <xelnaga@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Javier Tia [Mon, 30 Mar 2026 20:39:23 +0000 (14:39 -0600)]
Bluetooth: btmtk: Add MT6639 (MT7927) Bluetooth support
The MediaTek MT7927 (Filogic 380) combo WiFi 7 + BT 5.4 module uses
hardware variant 0x6639 for its Bluetooth subsystem. Without this patch,
the chip fails with "Unsupported hardware variant (00006639)" or hangs
during firmware download.
Three changes are needed to support MT6639:
1. CHIPID workaround: On some boards the BT USB MMIO register reads
0x0000 for dev_id, causing the driver to skip the 0x6639 init path.
Force dev_id to 0x6639 only when the USB VID/PID matches a known
MT6639 device, avoiding misdetection if a future chip also reads
zero. This follows the WiFi-side pattern that uses PCI device IDs
to scope the same workaround.
2. Firmware naming: MT6639 uses firmware version prefix "2_1" instead of
"1_1" used by MT7925 and other variants. The firmware path is
mediatek/mt7927/BT_RAM_CODE_MT6639_2_1_hdr.bin, using the mt7927
directory to match the WiFi firmware convention. The filename will
likely change to use MT7927 once MediaTek submits a dedicated
Linux firmware binary.
3. Section filtering: The MT6639 firmware binary contains 9 sections, but
only sections with (dlmodecrctype & 0xff) == 0x01 are Bluetooth-related.
Sending the remaining WiFi/other sections causes an irreversible BT
subsystem hang requiring a full power cycle. This matches the Windows
driver behavior observed via USB captures.
Also add 0x6639 to the reset register (CONNV3) and firmware setup switch
cases alongside the existing 0x7925 handling.
Pauli Virtanen [Sun, 29 Mar 2026 13:42:59 +0000 (16:42 +0300)]
Bluetooth: fix locking in hci_conn_request_evt() with HCI_PROTO_DEFER
When protocol sets HCI_PROTO_DEFER, hci_conn_request_evt() calls
hci_connect_cfm(conn) without hdev->lock. Generally hci_connect_cfm()
assumes it is held, and if conn is deleted concurrently -> UAF.
Only SCO and ISO set HCI_PROTO_DEFER and only for defer setup listen,
and HCI_EV_CONN_REQUEST is not generated for ISO. In the non-deferred
listening socket code paths, hci_connect_cfm(conn) is called with
hdev->lock held.
Fix by holding the lock.
Fixes: 70c464256310 ("Bluetooth: Refactor connection request handling") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Bluetooth: hci_ldisc: Clear HCI_UART_PROTO_INIT on error
When hci_register_dev() fails in hci_uart_register_dev()
HCI_UART_PROTO_INIT is not cleared before calling hu->proto->close(hu)
and setting hu->hdev to NULL. This means incoming UART data will reach
the protocol-specific recv handler in hci_uart_tty_receive() after
resources are freed.
Clear HCI_UART_PROTO_INIT with a write lock before calling
hu->proto->close() and setting hu->hdev to NULL. The write lock ensures
all active readers have completed and no new reader can enter the
protocol recv path before resources are freed.
This allows the protocol-specific recv functions to remove the
"HCI_UART_REGISTERED" guard without risking a null pointer dereference
if hci_register_dev() fails.
Fixes: 5df5dafc171b ("Bluetooth: hci_uart: Fix another race during initialization") Signed-off-by: Jonathan Rissanen <jonathan.rissanen@axis.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Pauli Virtanen [Wed, 25 Mar 2026 19:07:45 +0000 (21:07 +0200)]
Bluetooth: hci_sync: make hci_cmd_sync_run_once return -EEXIST if exists
hci_cmd_sync_run_once() needs to indicate whether a queue item was
added, so caller can know if callbacks are called, so it can avoid
leaking resources.
Change the function to return -EEXIST if queue item already exists.
Modify all callsites vs. the changes. The only callsite is
hci_abort_conn().
Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Shuai Zhang [Tue, 24 Mar 2026 02:30:16 +0000 (10:30 +0800)]
Bluetooth: hci_qca: disable power control for WCN7850 when bt_en is not defined
On platforms using an M.2 slot with both UART and USB support, bt_en is
pulled high by hardware. In this case, software-based power control
should be disabled. The current platforms are Lemans-EVK and Monaco-EVK.
Add QCA_WCN7850 to the existing condition so that power_ctrl_enabled is
cleared when bt_en is not software-controlled (or absent), aligning its
behavior with WCN6750 and WCN6855
Signed-off-by: Shuai Zhang <shuai.zhang@oss.qualcomm.com> Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
====================
Decouple receive and transmit enablement in team driver
Allow independent control over receive and transmit enablement states
for aggregated ports in the team driver.
The motivation is that IEE 802.3ad LACP "independent control" can't
be implemented for the team driver currently. This was added to the
bonding driver in commit 240fd405528b ("bonding: Add independent
control state machine").
This series also has a few patches that add tests to show that the old
coupled enablement still works and that the new decoupled enablement
works as intended (4, 5, and 10).
There are three patches with small fixes as well, with the goal of
making the final decoupling patch clearer (1, 2, and 3).
Signed-off-by: Marc Harvey <marcharvey@google.com>
====================
Marc Harvey [Thu, 9 Apr 2026 02:59:31 +0000 (02:59 +0000)]
net: team: Add new tx_enabled team port option
This option allows independent control over tx enablement without
affecting rx enablement. Like the rx_enabled option, this also
implicitly affects the enabled option.
If this option is not used, then the enabled option will continue to
behave as it did before.
Marc Harvey [Thu, 9 Apr 2026 02:59:30 +0000 (02:59 +0000)]
net: team: Add new rx_enabled team port option
Allow independent control over rx enablement via the rx_enabled option
without affecting tx enablement. This affects the normal enabled
option since a port is only considered enabled if both tx and rx are
enabled.
If this option is not used, then the enabled option will continue to
behave exactly as it did before.
Marc Harvey [Thu, 9 Apr 2026 02:59:29 +0000 (02:59 +0000)]
net: team: Track rx enablement separately from tx enablement
Separate the rx and tx enablement/disablement into different
functions so that it is easier to interact with them independently
later.
Although this patch changes receive and transmit paths, the actual
behavior of the teaming driver should remain unchanged, since there
is no option introduced yet to change rx or tx enablement
independently. Those options will be added in follow-up patches.
Marc Harvey [Thu, 9 Apr 2026 02:59:28 +0000 (02:59 +0000)]
net: team: Rename enablement functions and struct members to tx
Add no functional changes, but rename enablement functions, variables
etc. that are used in teaming driver transmit decisions.
Since rx and tx enablement are still coupled, some of the variables
renamed in this patch are still used for the rx path, but that will
change in a follow-up patch.
Marc Harvey [Thu, 9 Apr 2026 02:59:27 +0000 (02:59 +0000)]
selftests: net: Add test for enablement of ports with teamd
There are no tests that verify enablement and disablement of team driver
ports with teamd. This should work even with changes to the enablement
option, so it is important to test.
This test sets up an active-backup network configuration across two
network namespaces, and tries to send traffic while changing which
link is the active one.
Also increase the team test timeout to 300 seconds, because gracefully
killing teamd can take 30 seconds for each instance.
Marc Harvey [Thu, 9 Apr 2026 02:59:26 +0000 (02:59 +0000)]
selftests: net: Add tests for failover of team-aggregated ports
There are currently no kernel tests that verify the effect of setting
the enabled team driver option. In a followup patch, there will be
changes to this option, so it will be important to make sure it still
behaves as it does now.
The test verifies that tcp continues to work across two different team
devices in separate network namespaces, even when member links are
manually disabled.
====================
net: fix skb_ext BUILD_BUG_ON failures with GCOV
This mini-series fixes build failures in net/core/skbuff.c when the
kernel is built with CONFIG_GCOV_PROFILE_ALL=y.
This is part of a larger effort to add -fprofile-update=atomic to
global CFLAGS_GCOV (posted earlier as a combined series):
https://lore.kernel.org/lkml/20260401142020.1434243-1-khorenko@virtuozzo.com/T/#t
That combined series was split per subsystem as requested by Jakub.
The companion patches are:
- iommu: use __always_inline for amdv1pt_install_leaf_entry()
(sent to iommu maintainers)
- gcov: add -fprofile-update=atomic globally (sent to gcov/kbuild
maintainers, depends on this series and the iommu patch)
Patch 1/2 fixes a pre-existing build failure with CONFIG_GCOV_PROFILE_ALL:
GCOV counters prevent GCC from constant-folding the skb_ext_total_length()
loop. It also removes the CONFIG_KCOV_INSTRUMENT_ALL preprocessor guard
from d6e5794b06c0: that guard was a precaution in case KCOV instrumentation
also prevented constant folding, but KCOV's -fsanitize-coverage=trace-pc
does not interfere with GCC's constant folding (verified experimentally
with GCC 14.2 and GCC 16.0.1), so the guard is unnecessary.
Patch 2/2 is an additional fix needed when -fprofile-update=atomic is
added to CFLAGS_GCOV: __no_profile on the __always_inline function alone
is insufficient because after inlining, the code resides in the caller's
profiled body. The caller (skb_extensions_init) needs __no_profile and
noinline to prevent re-exposure to GCOV instrumentation.
====================
net: add noinline __init __no_profile to skb_extensions_init() for GCOV compatibility
With -fprofile-update=atomic in global CFLAGS_GCOV, GCC still cannot
constant-fold the skb_ext_total_length() loop when it is inlined into a
profiled caller. The existing __no_profile on skb_ext_total_length()
itself is insufficient because after __always_inline expansion the code
resides in the caller's body, which still carries GCOV instrumentation.
Mark skb_extensions_init() with __no_profile so the BUILD_BUG_ON checks
can be evaluated at compile time. Also mark it noinline to prevent the
compiler from inlining it into skb_init() (which lacks __no_profile),
which would re-expose the function body to GCOV instrumentation.
Add __init since skb_extensions_init() is only called from __init
skb_init(). Previously it was implicitly inlined into the .init.text
section; with noinline it would otherwise remain in permanent .text,
wasting memory after boot.
Build-tested with both CONFIG_GCOV_PROFILE_ALL=y and
CONFIG_KCOV_INSTRUMENT_ALL=y.
net: fix skb_ext_total_length() BUILD_BUG_ON with CONFIG_GCOV_PROFILE_ALL
When CONFIG_GCOV_PROFILE_ALL=y is enabled, the kernel fails to build:
In file included from <command-line>:
In function 'skb_extensions_init',
inlined from 'skb_init' at net/core/skbuff.c:5214:2:
././include/linux/compiler_types.h:706:45: error: call to
'__compiletime_assert_1490' declared with attribute error:
BUILD_BUG_ON failed: skb_ext_total_length() > 255
CONFIG_GCOV_PROFILE_ALL adds -fprofile-arcs -ftest-coverage
-fno-tree-loop-im to CFLAGS globally. GCC inserts branch profiling
counters into the skb_ext_total_length() loop and, combined with
-fno-tree-loop-im (which disables loop invariant motion), cannot
constant-fold the result.
BUILD_BUG_ON requires a compile-time constant and fails.
The issue manifests in kernels with 5+ SKB extension types enabled
(e.g., after addition of SKB_EXT_CAN, SKB_EXT_PSP). With 4 extensions
GCC can still unroll and fold the loop despite GCOV instrumentation;
with 5+ it gives up.
Mark skb_ext_total_length() with __no_profile to prevent GCOV from
inserting counters into this function. Without counters the loop is
"clean" and GCC can constant-fold it even with -fno-tree-loop-im active.
This allows BUILD_BUG_ON to work correctly while keeping GCOV profiling
for the rest of the kernel.
This also removes the CONFIG_KCOV_INSTRUMENT_ALL preprocessor guard
introduced by d6e5794b06c0. That guard was added as a precaution because
KCOV instrumentation was also suspected of inhibiting constant folding.
However, KCOV uses -fsanitize-coverage=trace-pc, which inserts
lightweight trace callbacks that do not interfere with GCC's constant
folding or loop optimization passes. Only GCOV's -fprofile-arcs combined
with -fno-tree-loop-im actually prevents the compiler from evaluating
the loop at compile time. The guard is therefore unnecessary and can be
safely removed.
Fixes: 96ea3a1e2d31 ("can: add CAN skb extension infrastructure") Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> Reviewed-by: Thomas Weissschuh <linux@weissschuh.net> Link: https://patch.msgid.link/20260410162150.3105738-2-khorenko@virtuozzo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Qingfang Deng [Fri, 10 Apr 2026 05:49:49 +0000 (13:49 +0800)]
pppox: remove sk_pppox() helper
The sk member can be directly accessed from struct pppox_sock without
relying on type casting. Remove the sk_pppox() helper and update all
call sites to use po->sk directly.
Jakub Kicinski [Sun, 12 Apr 2026 21:34:27 +0000 (14:34 -0700)]
Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Tariq Toukan says:
====================
mlx5-next updates 2026-04-09
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: Add icm_mng_function_id_mode cap bit
net/mlx5: Rename MLX5_PF page counter type to MLX5_SELF
net/mlx5: Add vhca_id_type bit to alias context
mlx5: Remove redundant iseg base
====================
====================
Add support for PIC64-HPSC/HX MDIO controller
This series adds a driver for the two MDIO controllers of PIC64-HPSC/HX.
The hardware supports C22 and C45 but only C22 is implemented for now.
This MDIO hardware is based on a Microsemi design supported in Linux by
mdio-mscc-miim.c. However, The register interface is completely different
with pic64hpsc, hence the need for a separate driver.
The documentation recommends an input clock of 156.25MHz and a prescaler of
39, which yields an MDIO clock of 1.95MHz.
This was tested on Microchip HB1301 evalkit which has a VSC8574 and a
VSC8541. I've tested with bus frequencies of 0.6, 1.95 and 2.5 MHz.
This series also adds a PHY write barrier when disabling PHY interrupts as
discussed in: https://lore.kernel.org/acvUqDgepCIScs8M@shell.armlinux.org.uk
====================
Charles Perry [Wed, 8 Apr 2026 13:18:16 +0000 (06:18 -0700)]
net: phy: add a PHY write barrier when disabling interrupts
MDIO bus controllers are not required to wait for write transactions to
complete before returning as synchronization is often achieved by polling
status bits.
This can cause issues when disabling interrupts since an interrupt could
fire before the interrupt handler is unregistered and there's no status
bit to poll.
Add a phy_write_barrier() function and use it in phy_disable_interrupts()
to fix this issue. The write barrier just reads an MII register and
discards the value, which is enough to guarantee that previous writes have
completed.
Charles Perry [Wed, 8 Apr 2026 13:18:15 +0000 (06:18 -0700)]
net: mdio: add a driver for PIC64-HPSC/HX MDIO controller
This adds an MDIO driver for PIC64-HPSC/HX. The hardware supports C22
and C45 but only C22 is implemented in this commit.
This MDIO hardware is based on a Microsemi design supported in Linux by
mdio-mscc-miim.c. However, The register interface is completely
different with pic64hpsc, hence the need for a separate driver.
The documentation recommends an input clock of 156.25MHz and a prescaler
of 39, which yields an MDIO clock of 1.95MHz.
The hardware supports an interrupt pin or a "TRIGGER" bit that can be
polled to signal transaction completion. This commit uses polling.
This was tested on Microchip HB1301 evalkit with a VSC8574 and a
VSC8541.
This MDIO hardware is based on a Microsemi design supported in Linux by
mdio-mscc-miim.c. However, The register interface is completely different
with pic64hpsc, hence the need for separate documentation.
The hardware supports C22 and C45.
The documentation recommends an input clock of 156.25MHz and a prescaler
of 39, which yields an MDIO clock of 1.95MHz.
The hardware supports an interrupt pin to signal transaction completion
which is not strictly needed as the software can also poll a "TRIGGER"
bit for this.
====================
net: enetc: improve statistics for v1 and add statistics for v4
For ENETC v1, some standardized statistics were redundantly included in
the unstructured statistics, so remove these duplicated entries.
Previously, the unstructured statistics only contained eMAC data and
did not include pMAC data; add pMAC statistics to ensure completeness.
For ENETC v4, the driver previously reported MAC statistics only for the
internal ENETC (Pseudo MAC). Extend the implementation to provide
additional statistics for both the internal ENETC and the standalone
ENETC.
====================
net: enetc: add unstructured pMAC counters for ENETC v1
The ENETC v1 has two MACs (eMAC and pMAC) to support preemption. The
existing unstructured counters include the eMAC counters, but not the
pMAC counters. So add pMAC counters to improve statistical coverage.
net: enetc: remove standardized counters from enetc_pm_counters
The standardized counters are already exposed via the get_pause_stats(),
get_rmon_stats(), get_eth_ctrl_stats() and get_eth_mac_stats()
interfaces. Keeping the same counters in enetc_pm_counters results in
redundant output.
Remove these standardized counters from enetc_pm_counters and rely on
the existing statistics interfaces to report them.
net: enetc: show RX drop counters only for assigned RX rings
For ENETC v1, each SI provides 16 RBDCR registers for RX ring drop
counters, but this does not imply that an SI actually owns 16 RX rings.
The ENETC hardware supports a total of 16 RX rings, which are assigned
to 3 SIs (1 PSI and 2 VSIs), so each SI is assigned fewer than 16 RX
rings.
The current implementation always reports 16 RX drop counters per SI,
leading to redundant output for SIs with fewer RX rings. Update the
logic to display drop counters only for the RX rings that are actually
assigned to the SI.
net: enetc: add support for the standardized counters
ENETC v4 provides 64-bit counters for IEEE 802.3 basic and mandatory
managed objects, the IETF Management Information Database (MIB) package
(RFC2665), and Remote Network Monitoring (RMON) statistics. In addition,
some ENETCs support preemption, so these ENETCs have two MACs: MAC 0 is
the express MAC (eMAC), MAC 1 is the preemptible MAC (pMAC). Both MACs
support these statistics.
Gal Pressman [Thu, 9 Apr 2026 09:09:45 +0000 (12:09 +0300)]
gre: Count GRE packet drops
GRE is silently dropping packets without updating statistics.
In case of drop, increment rx_dropped counter to provide visibility into
packet loss. For the case where no GRE protocol handler is registered,
use rx_nohandler.
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Nimrod Oren <noren@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20260409090945.1542440-1-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
net: phy: add support for disabling autonomous EEE
Some PHYs implement autonomous EEE where the PHY manages EEE
independently, preventing the MAC from controlling LPI signaling.
This conflicts with MACs that implement their own LPI control.
This series adds a .disable_autonomous_eee callback to struct phy_driver
and calls it from phy_support_eee(). When a MAC indicates it supports
EEE, the PHY's autonomous EEE is automatically disabled. The setting is
persisted across suspend/resume by re-applying it in phy_init_hw() after
soft reset, following the same pattern suggested by Russell King for PHY
tunables [1].
Patch 1 adds the phylib infrastructure.
Patch 2 implements it for Broadcom BCM54xx (AutogrEEEn).
Patch 3 converts the Realtek RTL8211F, which previously unconditionally
disabled PHY-mode EEE in config_init.
This came up while adding EEE support to the Cadence macb driver (used
on Raspberry Pi 5 with a BCM54210PE PHY). The PHY's AutogrEEEn mode
prevented the MAC from tracking LPI state. The Realtek RTL8211F has
the same pattern, unconditionally disabling PHY-mode EEE with the
comment "Disable PHY-mode EEE so LPI is passed to the MAC".
Other BCM54xx PHYs likely have the same AutogrEEEn register layout,
but I only have access to the BCM54210PE/BCM54213PE datasheets. It
would be appreciated if Florian or others could confirm which other
BCM54xx variants share this register so we can wire them up too.
Tested on Raspberry Pi CM4 (bcmgenet + BCM54210PE),
Raspberry Pi CM5 (Cadence GEM + BCM54210PE) and
Raspberry Pi 5 (Cadence GEM + BCM54213PE).
net: phy: realtek: convert RTL8211F to .disable_autonomous_eee
The RTL8211F previously unconditionally disabled PHY-mode EEE in
config_init. Convert this to use the new .disable_autonomous_eee
callback so it is only disabled when the MAC indicates EEE support
via phy_support_eee().
This preserves PHY-autonomous EEE for MACs that do not support EEE,
while still disabling it when the MAC manages LPI.
net: phy: add support for disabling PHY-autonomous EEE
Some PHYs (e.g. Broadcom BCM54xx, Realtek RTL8211F) implement
autonomous EEE where the PHY manages LPI signaling without forwarding
it to the MAC. This conflicts with MAC drivers that implement their own
LPI control.
Add a .disable_autonomous_eee callback to struct phy_driver and call it
from phy_support_eee(). When a MAC driver indicates it supports EEE via
phy_support_eee(), the PHY's autonomous EEE is automatically disabled so
the MAC can manage LPI entry/exit.
====================
ynl/ethtool/netlink: fix nla_len overflow for large string sets
This series addresses a silent data corruption issue triggered when ynl
retrieves string sets from NICs with a large number of statistics entries
(e.g. mlx5_core with thousands of ETH_SS_STATS strings).
The root cause is that struct nlattr.nla_len is a __u16 (max 65535
bytes). When a NIC exports enough statistics strings, the
ETHTOOL_A_STRINGSET_STRINGS nest built by strset_fill_set() exceeds
this limit. nla_nest_end() silently truncates the length on assignment,
producing a corrupted netlink message.
Patch 1 moves ethtool.py to selftest.
Patch 2 improves the ethtool tool: rename the doit/dumpit helpers
to do_set/do_get and convert do_get to use ynl.do() with an
explicit device header instead of a full dump with client-side filtering.
Patch 3 adds a --dbg-small-recv option to the YNL ethtool tool,
matching the same option already present in cli.py, to help debug netlink
message size issues
Patch 4 adds a new helper nla_nest_end_safe() to check whether the nla_len
is overflow and return -EMSGSIZE early if so.
Patch 5 uses the new helper in ethtool to make sure the ethtool doesn't
reply a corrupted netlink message.
====================
Hangbin Liu [Wed, 8 Apr 2026 07:08:53 +0000 (15:08 +0800)]
ethtool: strset: check nla_len overflow
The netlink attribute length field nla_len is a __u16, which can only
represent values up to 65535 bytes. NICs with a large number of
statistics strings (e.g. mlx5_core with thousands of ETH_SS_STATS
entries) can produce a ETHTOOL_A_STRINGSET_STRINGS nest that exceeds
this limit.
When nla_nest_end() writes the actual nest size back to nla_len, the
value is silently truncated. This results in a corrupted netlink message
being sent to userspace: the parser reads a wrong (truncated) attribute
length and misaligns all subsequent attribute boundaries, causing decode
errors.
Fix this by using the new helper nla_nest_end_safe and error out if
the size exceeds U16_MAX.
Hangbin Liu [Wed, 8 Apr 2026 07:08:52 +0000 (15:08 +0800)]
netlink: add a nla_nest_end_safe() helper
The nla_len field in struct nlattr is a __u16, which can only hold
values up to 65535. If a nested attribute grows beyond this limit,
nla_nest_end() silently truncates the length, producing a corrupted
netlink message with no indication of the problem.
Since nla_nest_end() is used everywhere and this issue rarely happens,
let's add a new helper to check the length.
Hangbin Liu [Wed, 8 Apr 2026 07:08:51 +0000 (15:08 +0800)]
tools: ynl: ethtool: add --dbg-small-recv option
Add a --dbg-small-recv debug option to control the recv() buffer size
used by YNL, matching the same option already present in cli.py. This
is useful if user need to get large netlink message.
Hangbin Liu [Wed, 8 Apr 2026 07:08:50 +0000 (15:08 +0800)]
tools: ynl: ethtool: use doit instead of dumpit for per-device GET
Rename the local helper doit() to do_set() and dumpit() to do_get() to
better reflect their purpose.
Convert do_get() to use ynl.do() with an explicit device header instead
of ynl.dump() followed by client-side filtering. This is more efficient
as the kernel only processes and returns data for the requested device,
rather than dumping all devices across the netns.
====================
bng_en: add link management and statistics support
This series enhances the bng_en driver by adding:
1. Link/PHY support
a. Link query
b. Async Link events
c. Ethtool link set/get functionality
2. Hardware statistics reporting via ethtool -S
This version incorporates feedback received prior to splitting the
original series into two parts.
====================
Implement the legacy ethtool statistics interface (get_sset_count,
get_strings, get_ethtool_stats) to expose hardware counters not
available through standard kernel stats APIs.
Implement netdev_stat_ops to provide standardized per-queue
statistics via the Netlink API.
Below is the description of the hardware drop counters:
rx-hw-drop-overruns: Packets dropped by HW due to resource limitations
(e.g., no BDs available in the host ring).
rx-hw-drops: Total packets dropped by HW (sum of overruns and error
drops).
tx-hw-drop-errors: Packets dropped by HW because they were invalid or
malformed.
tx-hw-drops: Total packets dropped by HW (sum of resource limitations
and error drops).
The implementation was verified using the ynl tool:
Implement the ndo_get_stats64 callback to report aggregate network
statistics. The driver gathers these by accumulating the per-ring
counters into the provided rtnl_link_stats64 structure.
bng_en: periodically fetch and accumulate hardware statistics
Use the timer to schedule periodic stats collection via
the workqueue when the link is up. Fetch fresh counters from
hardware via DMA and accumulate them into 64-bit software
shadows, handling wrap-around for counters narrower than
64 bits.
bng_en: add HW stats infra and structured ethtool ops
Implement the hardware-level statistics foundation and modern structured
ethtool operations.
1. Infrastructure: Add HWRM firmware wrappers (FUNC_QSTATS_EXT,
PORT_QSTATS_EXT, and PORT_QSTATS) to query ring and port counters.
2. Structured ops: Implement .get_eth_phy_stats, .get_eth_mac_stats,
.get_eth_ctrl_stats, .get_pause_stats, and .get_rmon_stats.
Stats are initially reported as 0; accumulation logic is added
in a subsequent patch.
Register for firmware asynchronous events, including link-status,
link-speed, and PHY configuration changes. Upon event reception,
re-query the PHY and update ethtool settings accordingly.
Implement .get_pauseparam and .set_pauseparam to support flow control
configuration. This allows reporting and setting of autoneg, RX pause,
and TX pause states.
bng_en: add ethtool link settings, get_link, and nway_reset
Add get/set_link_ksettings, get_link, and nway_reset support.
Report supported, advertised, and link-partner speeds across NRZ,
PAM4, and PAM4-112 signaling modes. Enable lane count reporting.
bng_en: query PHY capabilities and report link status
Query PHY capabilities and supported speeds from firmware,
retrieve current link state (speed, duplex, pause, FEC),
and log the information. Seed initial link state during probe.
bng_en: add per-PF workqueue, timer, and slow-path task
Add a dedicated single-thread workqueue and a timer for each PF
to drive deferred slow-path work such as link event handling and
stats collection. The timer is stopped via timer_delete_sync()
when interrupts are disabled and restarted on open.
While the close path stops the timer to prevent new tasks from
being scheduled, the sp_task and workqueue are preserved to
maintain state continuity. Final draining and destruction of
the workqueue are handled during PCI remove.
====================
Add TSO map-once DMA helpers and bnxt SW USO support
Greetings:
This series extends net/tso to add a data structure and some helpers allowing
drivers to DMA map headers and packet payloads a single time. The helpers can
then be used to reference slices of shared mapping for each segment. This
helps to avoid the cost of repeated DMA mappings, especially on systems which
use an IOMMU. N per-packet DMA maps are replaced with a single map for the
entire GSO skb. As of v3, the series uses the DMA IOVA API (as suggested by
Leon [1]) and provides a fallback path when an IOMMU is not in use. The DMA
IOVA API provides even better efficiency than the v2; see below.
The added helpers are then used in bnxt to add support for software UDP
Segmentation Offloading (SW USO) for older bnxt devices which do not have
support for USO in hardware. Since the helpers are generic, other drivers
can be extended similarly.
The v2 showed a ~4x reduction in DMA mapping calls at the same wire packet
rate on production traffic with a bnxt device. The v3, however, shows a larger
reduction of about ~6x at the same wire packet rate. This is thanks to Leon's
suggestion of using the DMA IOVA API [1].
Special care is taken to make bnxt ethtool operations work correctly: the ring
size cannot be reduced below a minimum threshold while USO is enabled and
growing the ring automatically re-enables USO if it was previously blocked.
This v10 contains some cosmetic changes (wrapping long lines), moves the test
to the correct directory, and attempts to fix the slot availability check
added in the v9.
I re-ran the python test and the test passed on my bnxt system. I also ran
this on a production system.
====================
Joe Damato [Wed, 8 Apr 2026 23:05:58 +0000 (16:05 -0700)]
net: bnxt: Dispatch to SW USO
Wire in the SW USO path added in preceding commits when hardware USO is
not possible.
When a GSO skb with SKB_GSO_UDP_L4 arrives and the NIC lacks HW USO
capability, redirect to bnxt_sw_udp_gso_xmit() which handles software
segmentation into individual UDP frames submitted directly to the TX
ring.
Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260408230607.2019402-10-joe@dama.to Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joe Damato [Wed, 8 Apr 2026 23:05:57 +0000 (16:05 -0700)]
net: bnxt: Add SW GSO completion and teardown support
Update __bnxt_tx_int and bnxt_free_one_tx_ring_skbs to handle SW GSO
segments:
- MID segments: adjust tx_pkts/tx_bytes accounting and skip skb free
(the skb is shared across all segments and freed only once)
- LAST segments: call tso_dma_map_complete() to tear down the IOVA
mapping if one was used. On the fallback path, payload DMA unmapping
is handled by the existing per-BD dma_unmap_len walk.
Both MID and LAST completions advance tx_inline_cons to release the
segment's inline header slot back to the ring.
is_sw_gso is initialized to zero, so the new code paths are not run.
Add logic for feature advertisement and guardrails for ring sizing.
Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260408230607.2019402-9-joe@dama.to Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joe Damato [Wed, 8 Apr 2026 23:05:56 +0000 (16:05 -0700)]
net: bnxt: Implement software USO
Implement bnxt_sw_udp_gso_xmit() using the core tso_dma_map API and
the pre-allocated TX inline buffer for per-segment headers.
The xmit path:
1. Calls tso_start() to initialize TSO state
2. Stack-allocates a tso_dma_map and calls tso_dma_map_init() to
DMA-map the linear payload and all frags upfront.
3. For each segment:
- Copies and patches headers via tso_build_hdr() into the
pre-allocated tx_inline_buf (DMA-synced per segment)
- Counts payload BDs via tso_dma_map_count()
- Emits long BD (header) + ext BD + payload BDs
- Payload BDs use tso_dma_map_next() which yields (dma_addr,
chunk_len, mapping_len) tuples.
Header BDs set dma_unmap_len=0 since the inline buffer is pre-allocated
and unmapped only at ring teardown.
Completion state is updated by calling tso_dma_map_completion_save() for
the last segment.
Joe Damato [Wed, 8 Apr 2026 23:05:55 +0000 (16:05 -0700)]
net: bnxt: Add boilerplate GSO code
Add bnxt_gso.c and bnxt_gso.h with a stub bnxt_sw_udp_gso_xmit()
function, SW USO constants (BNXT_SW_USO_MAX_SEGS,
BNXT_SW_USO_MAX_DESCS), and the is_sw_gso field in bnxt_sw_tx_bd
with BNXT_SW_GSO_MID/LAST markers.
The full SW USO implementation will be added in a future commit.
Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260408230607.2019402-7-joe@dama.to Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joe Damato [Wed, 8 Apr 2026 23:05:54 +0000 (16:05 -0700)]
net: bnxt: Add TX inline buffer infrastructure
Add per-ring pre-allocated inline buffer fields (tx_inline_buf,
tx_inline_dma, tx_inline_size) to bnxt_tx_ring_info and helpers to
allocate and free them. A producer and consumer (tx_inline_prod,
tx_inline_cons) are added to track which slot(s) of the inline buffer
are in-use.
The inline buffer will be used by the SW USO path for pre-allocated,
pre-DMA-mapped per-segment header copies. In the future, this
could be extended to support TX copybreak.
Allocation helper is marked __maybe_unused in this commit because it
will be wired in later.
Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260408230607.2019402-6-joe@dama.to Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joe Damato [Wed, 8 Apr 2026 23:05:53 +0000 (16:05 -0700)]
net: bnxt: Use dma_unmap_len for TX completion unmapping
Store the DMA mapping length in each TX buffer descriptor via
dma_unmap_len_set at submit time, and use dma_unmap_len at completion
time.
This is a no-op for normal packets but prepares for software USO,
where header BDs set dma_unmap_len to 0 because the header buffer
is unmapped collectively rather than per-segment.
Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260408230607.2019402-5-joe@dama.to Signed-off-by: Jakub Kicinski <kuba@kernel.org>