This series includes several changes to the MPTCP RX path. The main
goals are improving the RX performances, and increase the long term
maintainability.
Some changes reflects recent(ish) improvements introduced in the TCP
stack: patch 1, 2 and 3 are the MPTCP counter part of SKB deferral free
and auto-tuning improvements. Note that patch 3 could possibly fix
additional issues, and overall such patch should protect from similar
issues to arise in the future.
Patches 4-7 are aimed at introducing the socket backlog usage which will
be done in a later series to process the packets received by the
different subflows while the msk socket is owned.
Patch 8 is not related to the RX path, but it contains additional tests
for new features recently introduced in net-next.
====================
Here are a few sub-tests for mptcp_join.sh, validating the new 'laminar'
endpoint type.
In a setup where subflows created using the routing rules would be
rejected by the listener, and where the latter announces one IP address,
some cases are verified:
- Without any 'laminar' endpoints: no new subflows are created.
- With one 'laminar' endpoint: a second subflow is created.
- With multiple 'laminar' endpoints: 2 IPv4 subflows are created.
- With one 'laminar' endpoint, but the server announcing a second IP
address, only one subflow is created.
- With one 'laminar' + 'subflow' endpoint, the same endpoint is only
used once.
Paolo Abeni [Sat, 27 Sep 2025 09:40:43 +0000 (11:40 +0200)]
mptcp: minor move_skbs_to_msk() cleanup
Such function is called only by __mptcp_data_ready(), which in turn
is always invoked when msk is not owned by the user: we can drop the
redundant, related check.
Additionally mptcp needs to propagate the socket error only for
current subflow.
Paolo Abeni [Sat, 27 Sep 2025 09:40:41 +0000 (11:40 +0200)]
mptcp: remove unneeded mptcp_move_skb()
Since commit b7535cfed223 ("mptcp: drop legacy code around RX EOF"),
sk_shutdown can't change during the main recvmsg loop, we can drop
the related race breaker.
Paolo Abeni [Sat, 27 Sep 2025 09:40:40 +0000 (11:40 +0200)]
mptcp: introduce the mptcp_init_skb helper
Factor out all the skb initialization step in a new helper and
use it. Note that this change moves the MPTCP CB initialization
earlier: we can do such step as soon as the skb leaves the
subflow socket receive queues.
Paolo Abeni [Sat, 27 Sep 2025 09:40:39 +0000 (11:40 +0200)]
mptcp: rcvbuf auto-tuning improvement
Apply to the MPTCP auto-tuning the same improvements introduced for the
TCP protocol by the merge commit 2da35e4b4df9 ("Merge branch
'tcp-receive-side-improvements'").
The main difference is that TCP subflow and the main MPTCP socket need
to account separately for OoO: MPTCP does not care for TCP-level OoO
and vice versa, as a consequence do not reflect MPTCP-level rcvbuf
increase due to OoO packets at the subflow level.
This refeactor additionally allow dropping the msk receive buffer update
at receive time, as the latter only intended to cope with subflow receive
buffer increase due to OoO packets.
Paolo Abeni [Sat, 27 Sep 2025 09:40:38 +0000 (11:40 +0200)]
tcp: make tcp_rcvbuf_grow() accessible to mptcp code
To leverage the auto-tuning improvements brought by commit 2da35e4b4df9
("Merge branch 'tcp-receive-side-improvements'"), the MPTCP stack need
to access the mentioned helper.
Paolo Abeni [Sat, 27 Sep 2025 09:40:37 +0000 (11:40 +0200)]
mptcp: leverage skb deferral free
Usage of the skb deferral API is straight-forward; with multiple
subflows actives this allow moving part of the received application
load into multiple CPUs.
Eric Dumazet [Sat, 27 Sep 2025 09:28:27 +0000 (09:28 +0000)]
tcp: use skb->len instead of skb->truesize in tcp_can_ingest()
Some applications are stuck to the 20th century and still use
small SO_RCVBUF values.
After the blamed commit, we can drop packets especially
when using LRO/hw-gro enabled NIC and small MSS (1500) values.
LRO/hw-gro NIC pack multiple segments into pages, allowing
tp->scaling_ratio to be set to a high value.
Whenever the receive queue gets full, we can receive a small packet
filling RWIN, but with a high skb->truesize, because most NIC use 4K page
plus sk_buff metadata even when receiving less than 1500 bytes of payload.
Even if we refine how tp->scaling_ratio is estimated,
we could have an issue at the start of the flow, because
the first round of packets (IW10) will be sent based on
the initial tp->scaling_ratio (1/2)
Relax tcp_can_ingest() to use skb->len instead of skb->truesize,
allowing the peer to use final RWIN, assuming a 'perfect'
scaling_ratio of 1.
Jakub Kicinski [Tue, 30 Sep 2025 01:13:51 +0000 (18:13 -0700)]
Merge tag 'for-net-next-2025-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Luiz Augusto von Dentz says:
====================
bluetooth-next pull request for net-next:
core:
- MAINTAINERS: add a sub-entry for the Qualcomm bluetooth driver
- Avoid a couple dozen -Wflex-array-member-not-at-end warnings
- bcsp: receive data only if registered
- HCI: Fix using LE/ACL buffers for ISO packets
- hci_core: Detect if an ISO link has stalled
- ISO: Don't initiate CIS connections if there are no buffers
- ISO: Use sk_sndtimeo as conn_timeout
drivers:
- btusb: Check for unexpected bytes when defragmenting HCI frames
- btusb: Add new VID/PID 13d3/3627 for MT7925
- btusb: Add new VID/PID 13d3/3633 for MT7922
- btusb: Add USB ID 2001:332a for D-Link AX9U rev. A1
- btintel: Add support for BlazarIW core
- btintel_pcie: Add support for _suspend() / _resume()
- btintel_pcie: Define hdev->wakeup() callback
- btintel_pcie: Add Bluetooth core/platform as comments
- btintel_pcie: Add id of Scorpious, Panther Lake-H484
- btintel_pcie: Refactor Device Coredump
* tag 'for-net-next-2025-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (30 commits)
Bluetooth: Avoid a couple dozen -Wflex-array-member-not-at-end warnings
Bluetooth: hci_sync: Fix using random address for BIG/PA advertisements
Bluetooth: ISO: don't leak skb in ISO_CONT RX
Bluetooth: ISO: free rx_skb if not consumed
Bluetooth: ISO: Fix possible UAF on iso_conn_free
Bluetooth: SCO: Fix UAF on sco_conn_free
Bluetooth: bcsp: receive data only if registered
Bluetooth: btusb: Add new VID/PID 13d3/3633 for MT7922
Bluetooth: btusb: Add new VID/PID 13d3/3627 for MT7925
Bluetooth: remove duplicate h4_recv_buf() in header
Bluetooth: btusb: Check for unexpected bytes when defragmenting HCI frames
Bluetooth: hci_core: Print information of hcon on hci_low_sent
Bluetooth: hci_core: Print number of packets in conn->data_q
Bluetooth: Add function and line information to bt_dbg
Bluetooth: MGMT: Fix not exposing debug UUID on MGMT_OP_READ_EXP_FEATURES_INFO
Bluetooth: hci_core: Detect if an ISO link has stalled
Bluetooth: ISO: Use sk_sndtimeo as conn_timeout
Bluetooth: HCI: Fix using LE/ACL buffers for ISO packets
Bluetooth: ISO: Don't initiate CIS connections if there are no buffers
MAINTAINERS: add a sub-entry for the Qualcomm bluetooth driver
...
====================
__ptr_ring_zero_tail currently does the - 1 operation twice:
- during initialization of head
- at each loop iteration
Let's just do it in one place, all we need to do
is adjust the loop condition. this is better:
- a slightly clearer logic with less duplication
- uses prefix -- we don't need to save the old value
- one less - 1 operation - for example, when ring is empty
we now don't do - 1 at all, existing code does it once
net: wangxun: add RSS reta and rxfh fields support
Add ethtool ops for Rx flow hashing, query and set RSS indirection table
and hash key. Disable UDP RSS by default, and support to configure L4
header fields with TCP/UDP/SCTP for flow hasing.
For global RSS and multiple RSS scheme, the RSS type fields are defined
identically in the registers. So they can be defined as the macros
WX_RSS_FIELD_* to cleanup the codes. And to prepare for the RXFH support
in the next patch, move the rss_field to struct wx.
net: libwx: support separate RSS configuration for every pool
For those devices which support 64 pools, they also support PF and VF
(i.e. different pools) to configure different RSS key and hash table.
Enable multiple RSS, use up to 64 RSS configurations and each pool has a
specific configuration.
Eric Dumazet [Thu, 25 Sep 2025 23:09:29 +0000 (23:09 +0000)]
net: remove one stac/clac pair from move_addr_to_user()
Convert the get_user() and __put_user() code to the
fast masked_user_access_begin()/unsafe_{get|put}_user()
variant.
This patch increases the performance of an UDP recvfrom()
receiver (netserver) on 120 bytes messages by 7 %
on an AMD EPYC 7B12 64-Core Processor platform.
Presence of audit_sockaddr() makes difficult
to avoid the stac/clac pair in the copy_to_user() call,
this is left for a future patch.
====================
net: stmmac: Drop frames causing HLBS error
This patchset consists of following patchset to avoid netdev watchdog
reset due to Head-of-Line Blocking due to EST scheduling error.
1. Drop those frames causing HLBS error
2. Add HLBS frame drops to taprio stats
Rohan G Thomas [Thu, 25 Sep 2025 14:06:14 +0000 (22:06 +0800)]
net: stmmac: tc: Add HLBS drop count to taprio stats
Add the count of the frames dropped by Head-Of-Line Blocking due to
Scheduling(HLBS) error to taprio window drop count stats.
Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com> Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com> Reviewed-by: Furong Xu <0x1207@gmail.com> Link: https://patch.msgid.link/20250925-hlbs_2-v3-2-3b39472776c2@altera.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rohan G Thomas [Thu, 25 Sep 2025 14:06:13 +0000 (22:06 +0800)]
net: stmmac: est: Drop frames causing HLBS error
Drop those frames causing Head-of-Line Blocking due to Scheduling
(HLBS) error to avoid HLBS interrupt flooding and netdev watchdog
timeouts due to blocked packets. Tx queues can be configured to drop
those blocked packets by setting Drop Frames causing Scheduling Error
(DFBS) bit of EST_CONTROL register.
Also, add per queue HLBS drop count.
Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com> Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com> Reviewed-by: Furong Xu <0x1207@gmail.com> Link: https://patch.msgid.link/20250925-hlbs_2-v3-1-3b39472776c2@altera.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Corrected function and variable name typos in comments and docstrings:
ixgbe_write_ee_hostif_X550 -> ixgbe_write_ee_hostif_data_X550
ixgbe_get_lcd_x550em -> ixgbe_get_lcd_t_x550em
"Determime" -> "Determine"
"point to hardware structure" -> "pointer to hardware structure"
"To turn on the LED" -> "To turn off the LED"
These changes improve readability, consistency.
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250929124427.79219-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: ethtool: remove duplicated mm.o from Makefile
Fixes: 2b30f8291a30 ("net: ethtool: add support for MAC Merge layer") Signed-off-by: Markus Heidelberg <m.heidelberg@cab.de> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250926131323.222192-1-m.heidelberg@cab.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net/mlx5: Expose uar access and odp page fault counters
Add three counters to vnic health reporter:
bar_uar_access, odp_local_triggered_page_fault, and
odp_remote_triggered_page_fault.
- bar_uar_access
number of WRITE or READ access operations to the UAR on the PCIe
BAR.
- odp_local_triggered_page_fault
number of locally-triggered page-faults due to ODP.
- odp_remote_triggered_page_fault
number of remotly-triggered page-faults due to ODP.
Bluetooth: Avoid a couple dozen -Wflex-array-member-not-at-end warnings
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.
Use the __struct_group() helper to fix 31 instances of the following
type of warnings:
30 net/bluetooth/mgmt_config.c:16:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
1 net/bluetooth/mgmt_config.c:22:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Bluetooth: hci_sync: Fix using random address for BIG/PA advertisements
When creating an advertisement for BIG the address shall not be
non-resolvable since in case of acting as BASS/Broadcast Assistant the
address must be the same as the connection in order to use the PAST
method and even when PAST/BASS are not in the picture a Periodic
Advertisement can still be synchronized thus the same argument as to
connectable advertisements still stand.
Fixes: eca0ae4aea66 ("Bluetooth: Add initial implementation of BIS connections") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Pauli Virtanen [Mon, 22 Sep 2025 18:11:22 +0000 (21:11 +0300)]
Bluetooth: ISO: don't leak skb in ISO_CONT RX
For ISO_CONT RX, the data from skb is copied to conn->rx_skb, but the
skb is leaked.
Free skb after copying its data.
Fixes: ccf74f2390d6 ("Bluetooth: Add BTPROTO_ISO socket type") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Pauli Virtanen [Mon, 22 Sep 2025 18:11:21 +0000 (21:11 +0300)]
Bluetooth: ISO: free rx_skb if not consumed
If iso_conn is freed when RX is incomplete, free any leftover skb piece.
Fixes: dc26097bdb86 ("Bluetooth: ISO: Use kref to track lifetime of iso_conn") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
BUG: KASAN: slab-use-after-free in sco_conn_free net/bluetooth/sco.c:87 [inline]
BUG: KASAN: slab-use-after-free in kref_put include/linux/kref.h:65 [inline]
BUG: KASAN: slab-use-after-free in sco_conn_put+0xdd/0x410
net/bluetooth/sco.c:107
Write of size 8 at addr ffff88811cb96b50 by task kworker/u17:4/352
Ivan Pravdin [Sat, 30 Aug 2025 20:03:40 +0000 (16:03 -0400)]
Bluetooth: bcsp: receive data only if registered
Currently, bcsp_recv() can be called even when the BCSP protocol has not
been registered. This leads to a NULL pointer dereference, as shown in
the following stack trace:
Calvin Owens [Tue, 26 Aug 2025 04:11:08 +0000 (21:11 -0700)]
Bluetooth: remove duplicate h4_recv_buf() in header
The "h4_recv.h" header contains a duplicate h4_recv_buf() that is nearly
but not quite identical to the h4_recv_buf() in hci_h4.c.
This duplicated header was added in commit 07eb96a5a7b0 ("Bluetooth:
bpa10x: Use separate h4_recv_buf helper"). I wasn't able to find any
explanation for duplicating the code in the discussion:
Unfortunately, in the years since, several other drivers have come to
also rely on this duplicated function, probably by accident. This is, at
the very least, *extremely* confusing. It's also caused real issues when
it's become out-of-sync, see the following:
ef564119ba83 ("Bluetooth: hci_h4: Add support for ISO packets") 61b27cdf025b ("Bluetooth: hci_h4: Add support for ISO packets in h4_recv.h")
This is the full diff between the two implementations today:
As I read this: If alignment is one, and padding is zero, padding
remains zero throughout the loop. So it seems to me that the two
functions behave strictly identically in that case. All the duplicated
defines are also identical, as is the duplicated h4_recv_pkt structure
declaration.
All four drivers which use the duplicated function use the default
alignment of one, and the default padding of zero. I therefore conclude
the duplicate function may be safely replaced with the core one.
I raised this in an RFC a few months ago, and didn't get much interest:
Arkadiusz Bokowy [Wed, 27 Aug 2025 16:40:16 +0000 (18:40 +0200)]
Bluetooth: btusb: Check for unexpected bytes when defragmenting HCI frames
Some Barrot based USB Bluetooth dongles erroneously send one extra
random byte for the HCI_OP_READ_LOCAL_EXT_FEATURES command. The
consequence of that is that the next HCI transfer is misaligned by one
byte causing undefined behavior. In most cases the response event for
the next command fails with random error code.
Since the HCI_OP_READ_LOCAL_EXT_FEATURES command is used during HCI
controller initialization, the initialization fails rendering the USB
dongle not usable.
> [59.464099] usb 1-1.3: new full-speed USB device number 11 using xhci_hcd
> [59.561617] usb 1-1.3: New USB device found, idVendor=33fa, idProduct=0012, bcdDevice=88.91
> [59.561642] usb 1-1.3: New USB device strings: Mfr=0, Product=2, SerialNumber=0
> [59.561656] usb 1-1.3: Product: UGREEN BT6.0 Adapter
> [61.720116] Bluetooth: hci1: command 0x1005 tx timeout
> [61.720167] Bluetooth: hci1: Opcode 0x1005 failed: -110
This patch was tested with the 33fa:0012 device. The info from the
/sys/kernel/debug/usb/devices is shown below:
> [43.329852] usb 1-1.4: new full-speed USB device number 4 using dwc_otg
> [43.446790] usb 1-1.4: New USB device found, idVendor=33fa, idProduct=0012, bcdDevice=88.91
> [43.446813] usb 1-1.4: New USB device strings: Mfr=0, Product=2, SerialNumber=0
> [43.446821] usb 1-1.4: Product: UGREEN BT6.0 Adapter
> [43.582024] Bluetooth: hci1: Unexpected continuation: 1 bytes
> [43.703025] Bluetooth: hci1: Unexpected continuation: 1 bytes
> [43.750141] Bluetooth: MGMT ver 1.23
Link: https://github.com/bluez/bluez/issues/1326 Signed-off-by: Arkadiusz Bokowy <arkadiusz.bokowy@gmail.com> Tested-by: Arkadiusz Bokowy <arkadiusz.bokowy@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Bluetooth: Add function and line information to bt_dbg
When enabling debug via CONFIG_BT_FEATURE_DEBUG include function and
line information by default otherwise it is hard to make any sense of
which function the logs comes from.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Bluetooth: MGMT: Fix not exposing debug UUID on MGMT_OP_READ_EXP_FEATURES_INFO
The debug UUID was only getting set if MGMT_OP_READ_EXP_FEATURES_INFO
was not called with a specific index which breaks the likes of
bluetoothd since it only invokes MGMT_OP_READ_EXP_FEATURES_INFO when an
adapter is plugged, so instead of depending hdev not to be set just
enable the UUID on any index like it was done with iso_sock_uuid.
Fixes: e625e50ceee1 ("Bluetooth: Introduce debug feature when dynamic debug is disabled") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Bluetooth: hci_core: Detect if an ISO link has stalled
This attempts to detect if an ISO link has been waiting for an ISO
buffer for longer than the maximum allowed transport latency then
proceed to use hci_link_tx_to which prints an error and disconnects.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
This aligns the usage of socket sk_sndtimeo as conn_timeout when
initiating a connection and then use it when scheduling the
resulting HCI command, similar to what has been done in bf98feea5b65
("Bluetooth: hci_conn: Always use sk_timeo as conn_timeout").
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
MAINTAINERS: add a sub-entry for the Qualcomm bluetooth driver
Patches modifying drivers/bluetooth/hci_qca.c should be Cc'ed to the
linux-arm-msm mailing list so that Qualcomm maintainers and reviewers
can get notified about proposed changes to it. Add a sub-entry that adds
the mailing list to the list of addresses returned by get_maintainer.pl.
Acked-by: Konrad Dybcio <konradybcio@kernel.org> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Kiran K [Wed, 6 Aug 2025 06:48:49 +0000 (12:18 +0530)]
Bluetooth: btintel_pcie: Refactor Device Coredump
As device coredumps are not HCI traces, maintain the device coredump at
the driver level and eliminate the dependency on hdev_devcd*()
Signed-off-by: Kiran K <kiran.k@intel.com> Fixes: 07e6bddb54b4 ("Bluetooth: btintel_pcie: Add support for device coredump") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Thorsten Blum [Mon, 11 Aug 2025 09:19:06 +0000 (11:19 +0200)]
Bluetooth: btintel_pcie: Use strscpy() instead of strscpy_pad()
kzalloc() already zero-initializes the destination buffer 'data', making
strscpy() sufficient for safely copying 'name'. The additional
NUL-padding performed by strscpy_pad() is unnecessary.
Add a new local variable to store the length of 'name' and reuse it
instead of recalculating the same length.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Thorsten Blum [Sun, 10 Aug 2025 21:53:20 +0000 (23:53 +0200)]
Bluetooth: Annotate struct hci_drv_rp_read_info with __counted_by_le()
Add the __counted_by_le() compiler attribute to the flexible array
member 'supported_commands' to improve access bounds-checking via
CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Implement hdev->wakeup() callback to support Wake On BT feature.
Test steps:
1. echo enabled > /sys/bus/pci/devices/0000:00:14.7/power/wakeup
2. connect bluetooth hid device
3. put the system to suspend - rtcwake -m mem -s 300
4. press any key on hid to wake up the system
Signed-off-by: Kiran K <kiran.k@intel.com> Signed-off-by: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Kiran K <kiran.k@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Kiran K [Mon, 28 Jul 2025 15:49:08 +0000 (21:19 +0530)]
Bluetooth: btintel_pcie: Add Bluetooth core/platform as comments
Add Bluetooth CNVi core and platform names to the PCI device table for
each device ID as a comment.
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Kiran K <kiran.k@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Bluetooth: btintel_pcie: Add support for _suspend() / _resume()
This patch implements _suspend() and _resume() functions for the
Bluetooth controller. When the system enters a suspended state, the
driver notifies the controller to perform necessary housekeeping tasks
by writing to the sleep control register and waits for an alive
interrupt. The firmware raises the alive interrupt when it has
transitioned to the D3 state. The same flow occurs when the system
resumes.
Command to test host initiated wakeup after 60 seconds
sudo rtcwake -m mem -s 60
dmesg log (tested on Whale Peak2 on Panther Lake platform)
On system suspend:
[Fri Jul 25 11:05:37 2025] Bluetooth: hci0: device entered into d3 state from d0 in 80 us
On system resume:
[Fri Jul 25 11:06:36 2025] Bluetooth: hci0: device entered into d0 state from d3 in 7117 us
Signed-off-by: Chandrashekar Devegowda <chandrashekar.devegowda@intel.com> Signed-off-by: Kiran K <kiran.k@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Vijay Satija <vijay.satija@intel.com> Signed-off-by: Kiran K <kiran.k@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
====================
selftests: Mark auto-deferring functions clearly
selftests/net/lib.sh contains a suite of iproute2 wrappers that
automatically schedule the corresponding cleanup through defer. The fact
they do so is however not immediately obvious, one needs to know which
functions are handling the deferral behind the scenes, and which expect the
caller to handle cleanups themselves.
A convention for these auto-deferring functions would help both writing and
patch review. This patchset does so by marking these functions with an adf_
prefix. We already have a few such functions: forwarding/lib.sh has
adf_mcd_start() and a few selftests add private helpers that conform to
this convention.
Patches #1 to #8 gradually convert individual functions, one per patch.
Patch #9 renames an auto-deferring private helpers named dfr_* to adf_*.
The plan is not to retro-rename all private helpers, but I happened to know
about this one.
Patches #10 to #12 introduce several autodefer helpers for commonly used
forwarding/lib.sh functions, and opportunistically convert straightforward
instances of 'action; defer counteraction' to the new helpers.
Patch #13 adds some README verbiage to pitch defer and the adf_*
convention.
====================
Petr Machata [Thu, 25 Sep 2025 17:31:55 +0000 (19:31 +0200)]
selftests: forwarding: lib: Add an autodefer variant of forwarding_enable()
Most forwarding tests invoke forwarding_enable() to enable the router and
forwarding_restore() to restore the original configuration. Add a helper,
adf_forwarding_enable(), which is like forwarding_enable(), but takes care
of scheduling the cleanup automatically.
Convert the tests that currently use defer to schedule the cleanup.
Petr Machata [Thu, 25 Sep 2025 17:31:54 +0000 (19:31 +0200)]
selftests: forwarding: lib: Add an autodefer variant of simple_if_init()
Most forwarding tests invoke simple_if_init() to set up a VRF-based "host"
and simple_if_fini() to tear it down again. Add a helper,
adf_simple_if_init(), which is like simple_if_fini(), but takes care of
scheduling the cleanup automatically.
Convert the tests that currently use defer to schedule the cleanup.
Petr Machata [Thu, 25 Sep 2025 17:31:53 +0000 (19:31 +0200)]
selftests: forwarding: lib: Add an autodefer variant of vrf_prepare()
Most forwarding tests invoke vrf_prepare() to set up VRF forwarding and
vrf_cleanup() to restore the original configuration. Add a helper,
adf_vrf_prepare(), which is like vrf_prepare(), but takes care of
scheduling the cleanup automatically.
Convert a number of tests that currently use defer to schedule the cleanup.
====================
mptcp: pm: special case for c-flag + luminar endp
Here are some patches for the MPTCP PM, including some refactoring that
I thought it would be best to send at the end of a cycle to avoid
conflicts between net and net-next that could last a few weeks.
The most interesting changes are in the first and last patch, the rest
are patches refactoring the code & tests to validate the modifications.
- Patches 1 & 2: When servers set the C-flag in their MP_CAPABLE to tell
clients not to create subflows to the initial address and port -- e.g.
a deployment behind a L4 load balancer like a typical CDN deployment
-- clients will not use their other endpoints when default settings
are used. That's because the in-kernel path-manager uses the 'subflow'
endpoints to create subflows only to the initial address and port. The
first patch fixes that (for >=v5.14), and the second one validates it.
- Patches 3-14: various patches refactoring the code around the
in-kernel PM (mainly): split too long functions, rename variables and
functions to avoid confusions, reduce structure size, and compare IDs
instead of IP addresses. Note that one patch modifies one internal
variable used in one BPF selftest.
- Patch 15: ability to control endpoints that are used in reaction to a
new address announced by the other peer. With that, endpoints can be
used only once.
====================
Currently, upon the reception of an ADD_ADDR (and when the fullmesh flag
is not used), the in-kernel PM will create new subflows using the local
address the routing configuration will pick.
It would be easier to pick local addresses from a selected list of
endpoints, and use it only once, than relying on routing rules.
Use case: both the client (C) and the server (S) have two addresses (a
and b). The client establishes the connection between C(a) and S(a).
Once established, the server announces its additional address S(b). Once
received, the client connects to it using its second address C(b).
Compared to a situation without the 'laminar' endpoint for C(b), the
client didn't use this address C(b) to establish a subflow to the
server's primary address S(a). So at the end, we have:
C S
C(a) --- S(a)
C(b) --- S(b)
In case of a 3rd address on each side (C(c) and S(c)), upon the
reception of an ADD_ADDR with S(c), the client should not pick C(b)
because it has already been used. C(c) should then be used.
Note that this situation is currently possible if C doesn't add any
endpoint, but configure the routing in order to pick C(b) for the route
to S(b), and pick C(c) for the route to S(c). That doesn't sound very
practical because it means knowing in advance the IP addresses that
will be used and announced by the server.
'laminar', like the idea of laminar flows: the different subflows don't
mix with each other on an endpoint, unlike the "turbulent" way traffic
is mixed by 'fullmesh'.
In the code, the new endpoint type is added. Similar to the other
subflow types, an MPTCP_INFO counter is added. While at it, hole are now
commented in struct mptcp_info, to remember next time that these holes
can no longer be used.
mptcp: pm: in-kernel: compare IDs instead of addresses
When receiving an ADD_ADDR right after the 3WHS, the connection will
switch to 'fully established'. It means the MPTCP worker will be called
to treat two events, in this order: ADD_ADDR_RECEIVED, PM_ESTABLISHED.
The MPTCP endpoints cannot have the ID 0, because it is reserved to the
address and port used by the initial subflow. To be able to deal with
this case in different places, msk->mpc_endpoint_id contains the
endpoint ID linked to the initial subflow. This variable was only set
when treating the first PM_ESTABLISHED event, after ADD_ADDR_RECEIVED.
That's why in fill_local_addresses_vec(), the endpoint addresses were
compared with the one of the initial subflow, instead of only comparing
the IDs.
Instead, msk->mpc_endpoint_id is now set when treating ADD_ADDR_RECEIVED
as well, if needed, then the IDs can be compared.
To be able to do so, the code doing that is now in a dedicated helper,
and called from the functions linked to the two actions.
While at it, mptcp_endp_get_local_id() has also been moved up, next to
this new helper, because they are linked, and to be able to use it in
fill_local_addresses_vec() in the next commit.
All the 'unsigned int' variables from the 'pm_nl_pernet' structure are
bounded to MPTCP_PM_ADDR_MAX, currently set to 8. The endpoint ID is
also bounded by the protocol to 8-bit. MPTCP_PM_ADDR_MAX, if extended
later, will never over 8-bit.
So no need to use 'unsigned int' variables, 'u8' is enough.
Note that the exposed counters in MPTCP_INFO are already limited to
8-bit, same for pm->extra_subflows, and others. So it seems even better
to limit them to 8-bit.
It was in fact never used since its introduction in commit ff5a0b421cb2
("mptcp: faster active backup recovery"). It was probably initially
added to struct pm_nl_pernet during the development of this commit,
before being added to struct mptcp_pernet in ctrl.c, but not removed
from the first place.
mptcp: pm: in-kernel: rename 'local_addr_max' to 'endp_subflow_max'
A few variables linked to the in-kernel Path-Manager are confusing, and
it would help current and future developers, to clarify them.
One of them is 'local_addr_max', which in fact represents the maximum
number of 'subflow' endpoints that can be used to create new subflows,
and not the number of local addresses that have been used to create
subflows.
While at it, add an additional name for the corresponding variable in
MPTCP INFO: mptcpi_endp_subflow_max. Not to break the current uAPI, the
new name is added as a 'define' pointing to the former name. This will
then also help userspace devs.
Also move the variable and function next to the other 'endp_X_max' ones.
mptcp: pm: in-kernel: rename 'add_addr_accept_max' to 'limit_add_addr_accepted'
A few variables linked to the in-kernel Path-Manager are confusing, and
it would help current and future developers, to clarify them.
One of them is 'add_addr_accept_max', which in fact represents the limit
of ADD_ADDR that can be accepted: the limit set via 'ip mptcp limit
add_addr_accepted X' for example. It is not linked to the maximum number
of accepted ADD_ADDR.
While at it, add an additional name for the corresponding variable in
MPTCP INFO: mptcpi_limit_add_addr_accepted. Not to break the current
uAPI, the new name is added as a 'define' pointing to the former name.
This will then also help userspace devs.
mptcp: pm: in-kernel: rename 'add_addr_signal_max' to 'endp_signal_max'
A few variables linked to the in-kernel Path-Manager are confusing, and
it would help current and future developers, to clarify them.
One of them is 'add_addr_signal_max', which in fact represents the
maximum number of 'signal' endpoints that can be used to announced
addresses, and not the number of ADD_ADDR that can be signalled.
While at it, add an additional name for the corresponding variable in
MPTCP INFO: mptcpi_endp_signal_max. Not to break the current uAPI, the
new name is added as a 'define' pointing to the former name. This will
then also help userspace devs.
mptcp: pm: in-kernel: rename 'subflows_max' to 'limit_extra_subflows'
A few variables linked to the in-kernel Path-Manager are confusing, and
it would help current and future developers, to clarify them.
One of them is 'subflows_max', which in fact represents the limit of
extra subflows: the limit set via 'ip mptcp limit subflows X' for
example. It is not linked to the maximum number of created / possible
subflows.
While at it, add an additional name for the corresponding variable in
MPTCP INFO: mptcpi_limit_extra_subflows. Not to break the current uAPI,
the new name is added as a 'define' pointing to the former name. This
will then also help userspace devs.
A few variables linked to the Path-Managers are confusing, and it would
help current and future developers, to clarify them.
One of them is 'subflows', which in fact represents the number of extra
subflows: all the additional subflows created after the initial one, and
not the total number of subflows.
While at it, add an additional name for the corresponding variable in
MPTCP INFO: mptcpi_extra_subflows. Not to break the current uAPI, the
new name is added as a 'define' pointing to the former name. This will
then also help userspace devs.
Before this modification, this function was quite long with many levels
of indentations.
Each case can be split in a dedicated function: fullmesh, non-fullmesh.
To remove one level of indentation, msk->pm.subflows >= subflows_max is
now checked after having added one subflow, and stops the loop if it is
no longer possible to add new subflows. This is fine to do this because
this function should only be called if msk->pm.subflows < subflows_max.
The previous commit adds an exception for the C-flag case. The
'mptcp_join.sh' selftest is extended to validate this case.
In this subtest, there is a typical CDN deployment with a client where
MPTCP endpoints have been 'automatically' configured:
- the server set net.mptcp.allow_join_initial_addr_port=0
- the client has multiple 'subflow' endpoints, and the default limits:
not accepting ADD_ADDRs.
Without the parent patch, the client is not able to establish new
subflows using its 'subflow' endpoints. The parent commit fixes that.
The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.
mptcp: pm: in-kernel: usable client side with C-flag
When servers set the C-flag in their MP_CAPABLE to tell clients not to
create subflows to the initial address and port, clients will likely not
use their other endpoints. That's because the in-kernel path-manager
uses the 'subflow' endpoints to create subflows only to the initial
address and port.
If the limits have not been modified to accept ADD_ADDR, the client
doesn't try to establish new subflows. If the limits accept ADD_ADDR,
the routing routes will be used to select the source IP.
The C-flag is typically set when the server is operating behind a legacy
Layer 4 load balancer, or using anycast IP address. Clients having their
different 'subflow' endpoints setup, don't end up creating multiple
subflows as expected, and causing some deployment issues.
A special case is then added here: when servers set the C-flag in the
MPC and directly sends an ADD_ADDR, this single ADD_ADDR is accepted.
The 'subflows' endpoints will then be used with this new remote IP and
port. This exception is only allowed when the ADD_ADDR is sent
immediately after the 3WHS, and makes the client switching to the 'fully
established' mode. After that, 'select_local_address()' will not be able
to find any subflows, because 'id_avail_bitmap' will be filled in
mptcp_pm_create_subflow_or_signal_addr(), when switching to 'fully
established' mode.
====================
Add support to retrieve hardware channel information
This patch series introduces support for retrieving hardware channel
configuration through the ethtool interface for both PF and VF.
====================
The blamed commit introduced the function lanphy_modify_page_reg which
as name suggests it, it modifies the registers. In the same commit we
have started to use this function inside the drivers. The problem is
that in the function lan8814_config_init we passed the wrong page number
when disabling the aneg towards host side. We passed extended page number
4(LAN8814_PAGE_COMMON_REGS) instead of extended page
5(LAN8814_PAGE_PORT_REGS)
Lorenzo Bianconi [Wed, 24 Sep 2025 21:14:53 +0000 (23:14 +0200)]
net: airoha: npu: Add a NPU callback to initialize flow stats
Introduce a NPU callback to initialize flow stats and remove NPU stats
initialization from airoha_npu_get routine. Add num_stats_entries to
airoha_npu_ppe_stats_setup routine.
This patch makes the code more readable since NPU statistic are now
initialized on demand by the NPU consumer (at the moment NPU statistic
are configured just by the airoha_eth driver).
Moreover this patch allows the NPU consumer (PPE module) to explicitly
enable/disable NPU flow stats.
We are reporting the lane count in the link settings but the flag is not
set to indicate that the driver supports lanes. Set the flag to report
lane count.
Wangxun: vf: Implement some ethtool apis for get_xxx
Implement some ethtool interfaces for obtaining the status of
Wangxun Virtual Function Ethernet.
Just like connection status, version information, queue depth and so on.
====================
add FEC bins histogram report via ethtool
IEEE 802.3ck-2022 defines counters for FEC bins and 802.3df-2024
clarifies it a bit further. Implement reporting interface through as
addition to FEC stats available in ethtool. NetDevSim driver has simple
implementation as an example while mlx5 has much more complex solution.
The example query is the same as usual FEC statistics while the answer
is a bit more verbose:
Simple tests to validate kernel's output. FEC bin range should be valid
means high boundary should be not less than low boundary. Bin boundaries
have to be provided as well as error counter value. Per-plane value
should match bin's value.
net/mlx5e: Add logic to read RS-FEC histogram bin ranges from PPHCR
Introduce support for querying the Ports Phy Histogram Configuration
Register (PPHCR) to retrieve RS-FEC histogram bin ranges. The ranges
are stored in a static array and will be used to map histogram counters
to error levels.
The actual RS-FEC histogram statistics are not yet reported in this
commit and will be handled in a downstream patch.
IEEE 802.3ck-2022 defines counters for FEC bins and 802.3df-2024
clarifies it a bit further. Implement reporting interface through as
addition to FEC stats available in ethtool. Drivers can leave bin
counter uninitialized if per-lane values are provided. In this case the
core will recalculate summ for the bin.
Zhen Ni [Wed, 24 Sep 2025 03:02:19 +0000 (11:02 +0800)]
net: qed: Remove redundant NULL checks after list_first_entry()
list_first_entry() never returns NULL — if the list is empty, it still
returns a pointer to an invalid object, leading to potential invalid
memory access when dereferenced.
The calls to list_first_entry() are always guarded by !list_empty(),
which guarantees a valid entry is returned. Therefore, the additional
`if (!p_buffer) break;` checks in qed_ooo_release_connection_isles(),
qed_ooo_release_all_isles(), and qed_ooo_free() are redundant and
unreachable.
Remove the dead code for clarity and consistency with common list
handling patterns in the kernel. No functional change intended.