Jakub Kicinski [Wed, 14 Jan 2026 01:46:19 +0000 (17:46 -0800)]
Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Tariq Toukan says:
====================
mlx5-next updates 2026-01-13
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: Add IFC bits for extended ETS rate limit bandwidth value
net/mlx5: Add support for querying bond speed
net/mlx5: Handle port and vport speed change events in MPESW
net/mlx5: Propagate LAG effective max_tx_speed to vports
net/mlx5: Add max_tx_speed and its CAP bit to IFC
====================
This is subset 1 of the RDS-TCP bug fix collection series I posted last
Oct. The greater series aims to correct multiple rds-tcp bugs that
can cause dropped or out of sequence messages. The set was starting to
get a bit large, so I've broken it down into smaller sets to make
reviews more manageable.
In this subset, we focus on work queue scalability. Message queues
are refactored to operate in parallel across multiple connections,
which improves response times and avoids timeouts.
The entire set can be viewed in the rfc here:
https://lore.kernel.org/netdev/20251022191715.157755-1-achender@kernel.org/
net/rds: Give each connection path its own workqueue
RDS was written to require ordered workqueues for "cp->cp_wq":
Work is executed in the order scheduled, one item at a time.
If these workqueues are shared across connections,
then work executed on behalf of one connection blocks work
scheduled for a different and unrelated connection.
Luckily we don't need to share these workqueues.
While it obviously makes sense to limit the number of
workers (processes) that ought to be allocated on a system,
a workqueue that doesn't have a rescue worker attached
has a tiny footprint compared to the connection as a whole:
A workqueue costs ~900 bytes, including the workqueue_struct,
pool_workqueue, workqueue_attrs, wq_node_nr_active and the
node_nr_active flex array. Each connection can have up to 8
(RDS_MPATH_WORKERS) paths for a worst case of ~7 KBytes per
connection, while an RDS/IB connection totals ~5 MBytes.
So we're getting a significant performance gain
(90% of connections fail over under 3 seconds vs. 40%)
for a less than 0.02% overhead.
RDS doesn't even benefit from the additional rescue workers:
of all the reasons that RDS blocks workers, allocation under
memory pressure is the least of our concerns. And even if RDS
was stalling due to the memory-reclaim process, the work
executed by the rescue workers is highly unlikely to free up
any memory. If anything, they might try to allocate even more.
By giving each connection path its own workqueues, we allow
RDS to better utilize the unbound workers that the system
has available.
This patch adds a per connection workqueue which can be initialized
and used independently of the globally shared rds_wq.
This patch is the first in a series that aims to address tcp ack
timeouts during the tcp socket shutdown sequence.
This initial refactoring lays the groundwork needed to alleviate
queue congestion during heavy reads and writes. The independently
managed queues will allow shutdowns and reconnects to respond more quickly
before the peer(s) time out waiting for the proper acks.
Paolo Abeni [Tue, 13 Jan 2026 10:54:31 +0000 (11:54 +0100)]
Merge branch 'multi-queue-aware-sch_cake'
says:
====================
Multi-queue aware sch_cake
This series adds a multi-queue aware variant of the sch_cake scheduler,
called 'cake_mq'. Using this makes it possible to scale the rate shaper
of sch_cake across multiple CPUs, while still enforcing a single global
rate on the interface.
The approach taken in this patch series is to implement a separate qdisc
called 'cake_mq', which is based on the existing 'mq' qdisc, but differs
in a couple of aspects:
- It will always install a cake instance on each hardware queue (instead
of using the default qdisc for each queue like 'mq' does).
- The cake instances on the queues will share their configuration, which
can only be modified through the parent cake_mq instance.
Doing things this way simplifies user configuration by centralising
all configuration through the cake_mq qdisc (which also serves as an
obvious way of opting into the multi-queue aware behaviour). The cake_mq
qdisc takes all the same configuration parameters as the cake qdisc.
An earlier version of this work was presented at this year's Netdevconf:
https://netdevconf.info/0x19/sessions/talk/mq-cake-scaling-software-rate-limiting-across-cpu-cores.html
The patch series is structured as follows:
- Patch 1 exports the mq qdisc functions for reuse.
- Patch 2 factors out the sch_cake configuration variables into a
separate struct that can be shared between instances.
- Patch 3 adds the basic cake_mq qdisc, reusing the exported mq code
- Patch 4 adds configuration sharing across the cake instances installed
under cake_mq
- Patch 5 adds the shared shaper state that enables the multi-core rate
shaping
- Patch 6 adds selftests for cake_mq
A patch to iproute2 to make it aware of the cake_mq qdisc was submitted
separately with a previous patch version:
Jonas Köppeler [Fri, 9 Jan 2026 13:15:35 +0000 (14:15 +0100)]
selftests/tc-testing: add selftests for cake_mq qdisc
Test 684b: Create CAKE_MQ with default setting (4 queues)
Test 7ee8: Create CAKE_MQ with bandwidth limit (4 queues)
Test 1f87: Create CAKE_MQ with rtt time (4 queues)
Test e9cf: Create CAKE_MQ with besteffort flag (4 queues)
Test 7c05: Create CAKE_MQ with diffserv8 flag (4 queues)
Test 5a77: Create CAKE_MQ with diffserv4 flag (4 queues)
Test 8f7a: Create CAKE_MQ with flowblind flag (4 queues)
Test 7ef7: Create CAKE_MQ with dsthost and nat flag (4 queues)
Test 2e4d: Create CAKE_MQ with wash flag (4 queues)
Test b3e6: Create CAKE_MQ with flowblind and no-split-gso flag (4 queues)
Test 62cd: Create CAKE_MQ with dual-srchost and ack-filter flag (4 queues)
Test 0df3: Create CAKE_MQ with dual-dsthost and ack-filter-aggressive flag (4 queues)
Test 9a75: Create CAKE_MQ with memlimit and ptm flag (4 queues)
Test cdef: Create CAKE_MQ with fwmark and atm flag (4 queues)
Test 93dd: Create CAKE_MQ with overhead 0 and mpu (4 queues)
Test 1475: Create CAKE_MQ with conservative and ingress flag (4 queues)
Test 7bf1: Delete CAKE_MQ with conservative and ingress flag (4 queues)
Test ee55: Replace CAKE_MQ with mpu (4 queues)
Test 6df9: Change CAKE_MQ with mpu (4 queues)
Test 67e2: Show CAKE_MQ class (4 queues)
Test 2de4: Change bandwidth of CAKE_MQ (4 queues)
Test 5f62: Fail to create CAKE_MQ with autorate-ingress flag (4 queues)
Test 038e: Fail to change setting of sub-qdisc under CAKE_MQ
Test 7bdc: Fail to replace sub-qdisc under CAKE_MQ
Test 18e0: Fail to install CAKE_MQ on single queue device
Jonas Köppeler [Fri, 9 Jan 2026 13:15:34 +0000 (14:15 +0100)]
net/sched: sch_cake: share shaper state across sub-instances of cake_mq
This commit adds shared shaper state across the cake instances beneath a
cake_mq qdisc. It works by periodically tracking the number of active
instances, and scaling the configured rate by the number of active
queues.
The scan is lockless and simply reads the qlen and the last_active state
variable of each of the instances configured beneath the parent cake_mq
instance. Locking is not required since the values are only updated by
the owning instance, and eventual consistency is sufficient for the
purpose of estimating the number of active queues.
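The scaling described above can be modelled in a few lines. This is a minimal Python sketch, not the kernel implementation: the names CakeInstance, count_active and per_instance_rate are hypothetical, and treating an instance as active when it has a backlog or was active within one scan window is an assumption about how qlen and last_active are combined.

```python
from dataclasses import dataclass

# Scan window in nanoseconds (200 us); used here as the "recently active" cutoff.
SCAN_WINDOW_NS = 200_000


@dataclass
class CakeInstance:
    """Hypothetical stand-in for one cake instance under cake_mq."""
    qlen: int = 0            # packets currently queued
    last_active_ns: int = 0  # timestamp of the last activity


def count_active(instances, now_ns, window_ns=SCAN_WINDOW_NS):
    """Count instances with a backlog or recent activity.

    No locking is modelled: each value is only written by its owner,
    and a slightly stale read merely skews the estimate briefly.
    """
    return sum(
        1 for q in instances
        if q.qlen > 0 or now_ns - q.last_active_ns < window_ns
    )


def per_instance_rate(configured_rate_bps, instances, now_ns):
    """Split the single global rate evenly across the active instances."""
    active = max(1, count_active(instances, now_ns))
    return configured_rate_bps // active
```

With four instances of which two are active, each active instance would shape to half of the configured rate, so the sum stays at the global limit.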
The interval for scanning the number of active queues is set to 200 us.
We found this to be a good tradeoff between overhead and response time.
For a detailed analysis of this aspect see the Netdevconf talk:
net/sched: sch_cake: Add cake_mq qdisc for using cake on mq devices
Add a cake_mq qdisc which installs cake instances on each hardware
queue on a multi-queue device.
This is just a copy of sch_mq that installs cake instead of the default
qdisc on each queue. Subsequent commits will add sharing of the config
between cake instances, as well as a multi-queue aware shaper algorithm.
net/sched: sch_cake: Factor out config variables into separate struct
Factor out all the user-configurable variables into a separate struct
and embed it into struct cake_sched_data. This is done in preparation
for sharing the configuration across multiple instances of cake in an mq
setup.
To enable the cake_mq qdisc to reuse code from the mq qdisc, export a
bunch of functions from sch_mq. Split common functionality out from some
functions so it can be composed with other code, and export other
functions wholesale. To discourage wanton reuse, put the symbols into a
new NET_SCHED_INTERNAL namespace, and a sch_priv.h header file.
Javen Xu [Fri, 9 Jan 2026 07:04:14 +0000 (15:04 +0800)]
r8169: add DASH support for RTL8127AP
This adds DASH support for the RTL8127AP chip. Its MAC version is
RTL_GIGA_MAC_VER_80 and its revision ID is 0x04. DASH is a standard for
remote management of network devices, allowing out-of-band control.
Alexei Lazar [Mon, 12 Jan 2026 06:50:08 +0000 (08:50 +0200)]
net/mlx5: Add IFC bits for extended ETS rate limit bandwidth value
Add hardware interface definitions to support extended bandwidth rate
limiting in the QoS Enhanced Transmission Selection (ETS) configuration.
The new fields include:
- max_bw_value: extended from 8-bit to 16-bit in ets_tcn_config_reg,
simplifying the implementation by using a single field instead of
separate MSB/LSB fields.
- qetcr_qshr_max_bw_val_msb: capability bit in qcam_qos_feature_cap_mask
indicating device support for the extended 16-bit max_bw_value field.
These interface additions are prerequisites for increasing the per-TC
rate limit beyond 255 Gbps to support higher-bandwidth NICs.
====================
net: stmmac: pcs: clean up pcs interrupt handling
Clean up the stmmac PCS interrupt handling:
- Avoid promotion to unsigned long from unsigned int by defining PCS
register bits/fields using u32 macros.
- Pass struct stmmac_priv into the host_irq_status MAC core method.
- Move the existing PCS interrupt handler (dwmac_pcs_isr) into
stmmac_pcs.c, change its arguments, and use dev_info() rather than
pr_info().
- Arrange to call phylink_pcs_change() on link state changes.
====================
Report PCS link changes to phylink, which will allow phylink's inband
support to respond to link events once the PCS is appropriately
configured.
An expected behavioural change is that should the PCS report that its
link has failed, but phylink is operating in outband mode and the PHY
reports that link is up, this event will cause the netdev's link to
momentarily drop, making the event more noticeable, rather than just
producing a "stmmac_pcs: Link Down" message.
net: stmmac: change arguments to PCS handler and use dev_info()
Change the arguments to the PCS handler so that it can access the
struct device pointer and integrated PCS pointers.
This allows us to use the PCS register offset stored in struct
stmmac_pcs rather than passing it into the function, and also allows
the messages to be printed using dev_info() rather than pr_info(),
thereby allowing the stmmac instance to be identified.
Finally, as dev_info() identifies the driver/device, prefixing with
"stmmac_pcs: " is now redundant, so replace this with just "PCS ".
net: stmmac: pass struct stmmac_priv to host_irq_status() method
Rather than passing struct mac_device_info to the host_irq_status()
method, pass struct stmmac_priv so that we can pass the integrated
PCS to the PCS interrupt handler.
dwmac_pcs_isr() doesn't need to be inlined into the MAC's
host_irq_status method, as handling PCS interrupts isn't performance
critical. However, there is little point calling this function unless
an interrupt is pending for the PCS.
Rename it to stmmac_integrated_pcs_irq() while moving it.
net: stmmac: use BIT_U32() and GENMASK_U32() for PCS registers
stmmac registers are 32-bit. u32 is unsigned int. The use of BIT() and
GENMASK() leads to integer promotion to unsigned long in expressions
such as:
u32 old = foo;
dev_info(dev, "%08x %08x\n", old, old & BIT(1));
resulting in arg2 being accepted as compatible with the format string
and arg3 warning that the argument does not match (because the former
is unsigned int, and the latter is unsigned long.)
Fix this by defining 32-bit register bits using BIT_U32() and
GENMASK_U32() macros.
====================
r8169: add support for RTL8127ATF (10G Fiber SFP)
RTL8127ATF supports a SFP+ port for fiber modules (10GBASE-SR/LR/ER/ZR and
DAC). The list of supported modes was provided by Realtek. According to the
r8127 vendor driver also 1G modules are supported, but this needs some more
complexity in the driver, and only 10G mode has been tested so far.
Therefore mainline support will be limited to 10G for now.
The SFP port signals are hidden in the chip IP and driven by firmware.
Therefore mainline SFP support can't be used here.
The PHY driver is used by the RTL8127ATF support in r8169.
RTL8127ATF reports the same PHY ID as the TP version. Therefore use a dummy
PHY ID.
====================
Heiner Kallweit [Sat, 10 Jan 2026 15:15:32 +0000 (16:15 +0100)]
r8169: add support for RTL8127ATF (Fiber SFP)
RTL8127ATF supports a SFP+ port for fiber modules (10GBASE-SR/LR/ER/ZR and
DAC). The list of supported modes was provided by Realtek. According to the
r8127 vendor driver also 1G modules are supported, but this needs some more
complexity in the driver, and only 10G mode has been tested so far.
Therefore mainline support will be limited to 10G for now.
The SFP port signals are hidden in the chip IP and driven by firmware.
Therefore mainline SFP support can't be used here.
Heiner Kallweit [Sat, 10 Jan 2026 15:14:05 +0000 (16:14 +0100)]
net: phy: realtek: add dummy PHY driver for RTL8127ATF
RTL8127ATF supports a SFP+ port for fiber modules (10GBASE-SR/LR/ER/ZR and
DAC). The list of supported modes was provided by Realtek. According to the
r8127 vendor driver also 1G modules are supported, but this needs some more
complexity in the driver, and only 10G mode has been tested so far.
Therefore mainline support will be limited to 10G for now.
The SFP port signals are hidden in the chip IP and driven by firmware.
Therefore mainline SFP support can't be used here.
This PHY driver is used by the RTL8127ATF support in r8169.
RTL8127ATF reports the same PHY ID as the TP version. Therefore use a dummy
PHY ID.
Jian Zhang [Thu, 8 Jan 2026 10:18:29 +0000 (18:18 +0800)]
net: mctp-i2c: fix duplicate reception of old data
The MCTP I2C slave callback did not handle I2C_SLAVE_READ_REQUESTED
events. As a result, an I2C read event triggered repeated reception of
old data. Fix this by resetting rx_pos when a read request is received.
====================
Add DWMAC glue driver for Motorcomm YT6801
This series adds a glue driver for the Motorcomm YT6801 PCIe ethernet
controller, which is considered mostly compatible with DWMAC-4 IP by
inspecting the register layout[1]. It integrates a Motorcomm YT8531S PHY
(confirmed by reading PHY ID) and GMII is used to connect the PHY to
MAC[2].
The initialization logic of the MAC is mostly based on previous upstream
effort for the controller[3] and the Deepin-maintained downstream Linux
driver[4] licensed under GPL-2.0 according to its SPDX headers. However,
this series is a complete rewrite of the previous patch series,
utilizing the existing DWMAC4 driver and introducing a glue driver only.
This series only aims to add basic networking functions for the
controller; features like WoL, RSS and LED control are omitted for now.
Testing is done on i3-4170, it reaches 939Mbps (TX)/933Mbps (RX) on
average,
Yao Zi [Fri, 9 Jan 2026 09:34:45 +0000 (09:34 +0000)]
net: stmmac: Add glue driver for Motorcomm YT6801 ethernet controller
Motorcomm YT6801 is a PCIe ethernet controller based on DWMAC4 IP. It
integrates a GbE PHY, supporting WOL, VLAN tagging and various types
of offloading. It ships an on-chip eFuse for storing various vendor
configuration, including MAC address.
This patch adds basic glue code for the controller, allowing it to be
set up and transmit data at a reasonable speed. Features like WOL could
be implemented in the future.
Signed-off-by: Yao Zi <me@ziyao.cc>
Tested-by: Mingcong Bai <jeffbai@aosc.io>
Tested-by: Runhua He <hua@aosc.io>
Tested-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Sai Krishna <saikrishnag@marvell.com>
Link: https://patch.msgid.link/20260109093445.46791-4-me@ziyao.cc
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yao Zi [Fri, 9 Jan 2026 09:34:44 +0000 (09:34 +0000)]
net: phy: motorcomm: Support YT8531S PHY in YT6801 Ethernet controller
YT6801's internal PHY is confirmed as a GMII-capable variant of YT8531S
by a previous series[1] and reading PHY ID. Add support for
PHY_INTERFACE_MODE_GMII for YT8531S to allow the Ethernet driver to
reuse the PHY code for its internal PHY.
Ankit Khushwaha [Fri, 9 Jan 2026 15:22:01 +0000 (20:52 +0530)]
selftests/net/ipsec: Fix variable size type not at the end of struct
The "struct alg" object contains a union of 3 xfrm structures:
union {
struct xfrm_algo;
struct xfrm_algo_aead;
struct xfrm_algo_auth;
}
All of them end with a flexible array member used to store key material,
but the flexible array appears at *different offsets* in each struct.
Because of this, the union itself is variable-sized, and placing it above
char buf[...] triggers:
ipsec.c:835:5: warning: field 'u' with variable sized type 'union
(unnamed union at ipsec.c:831:3)' not at the end of a struct or class
is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
835 | } u;
| ^
One fix is to use "TRAILING_OVERLAP()", which works with one flexible
array member only.
But in "struct alg" a flexible array member exists in all union members,
but not at the same offset, so TRAILING_OVERLAP cannot be applied.
So the fix is to explicitly overlay the key buffer at the correct offset
for the largest union member (xfrm_algo_auth). This ensures that the
flexible-array region and the fixed buffer line up.
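To make the "different offsets" point concrete, the fixed headers of the three structures can be mirrored with ctypes and measured. This is only an illustrative sketch: the field layouts below are an assumption based on the UAPI header linux/xfrm.h (a 64-byte name followed by one or two unsigned int lengths), and the class and function names are hypothetical.

```python
import ctypes


class XfrmAlgoHdr(ctypes.Structure):
    # Assumed fixed part of struct xfrm_algo (everything before alg_key[]).
    _fields_ = [("alg_name", ctypes.c_char * 64),
                ("alg_key_len", ctypes.c_uint)]


class XfrmAlgoAeadHdr(ctypes.Structure):
    # Assumed fixed part of struct xfrm_algo_aead.
    _fields_ = [("alg_name", ctypes.c_char * 64),
                ("alg_key_len", ctypes.c_uint),
                ("alg_icv_len", ctypes.c_uint)]


class XfrmAlgoAuthHdr(ctypes.Structure):
    # Assumed fixed part of struct xfrm_algo_auth.
    _fields_ = [("alg_name", ctypes.c_char * 64),
                ("alg_key_len", ctypes.c_uint),
                ("alg_trunc_len", ctypes.c_uint)]


def key_offsets():
    """Offset at which the flexible alg_key[] array would start.

    The flexible member is a char array (alignment 1), so it begins
    right after the fixed header, i.e. at sizeof(header).
    """
    return {
        "xfrm_algo": ctypes.sizeof(XfrmAlgoHdr),
        "xfrm_algo_aead": ctypes.sizeof(XfrmAlgoAeadHdr),
        "xfrm_algo_auth": ctypes.sizeof(XfrmAlgoAuthHdr),
    }


if __name__ == "__main__":
    for name, off in key_offsets().items():
        print(f"{name}: alg_key[] starts at byte {off}")
```

Under these assumptions the key material of xfrm_algo starts four bytes earlier than in the other two members, which is why a single trailing overlay cannot serve all of them.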
====================
net: stmmac: cleanups and low priority fixes
Further cleanups and a few low priority fixes:
- Remove duplicated register definitions from header files
- Fix harmless wrong definition used for PTP message type in
descriptors
- Fix norm_set_tx_desc_len_on_ring() off-by-one error (and make
enh_set_tx_desc_len_on_ring() follow a similar pattern.)
Document the buffer size limits. I believe we never call
norm_set_tx_desc_len_on_ring() with 2KiB lengths.
- Use u32 rather than unsigned int for 32-bit quantities in
descriptors
- Modernise: convert to use FIELD_PREP() rather than separate mask
and shift definitions.
- Reorganise register and register field definitions: registers
defined in address offset order followed by their register field
definitions.
- Remove lots of unused register definitions.
====================
net: stmmac: cores: remove many xxx_SHIFT definitions
We have many xxx_SHIFT definitions alongside their corresponding
xxx_MASK definitions for the various cores. Manually using the
shift and mask can be error-prone, as shown with the dwmac4 RXFSTS
fix patch.
Convert sites that use xxx_SHIFT and xxx_MASK directly to use
FIELD_GET(), FIELD_PREP(), and u32_replace_bits() as appropriate.
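For readers unfamiliar with these helpers, their arithmetic can be modelled as follows. This is a Python model of the semantics only, not the kernel macros from linux/bitfield.h: the real helpers also perform compile-time checks that are not reproduced here, and EXAMPLE_MASK is a hypothetical field chosen for illustration.

```python
U32_MAX = 0xFFFFFFFF


def genmask(high, low):
    """Contiguous mask covering bits high..low inclusive."""
    return ((1 << (high - low + 1)) - 1) << low


def _shift(mask):
    """Position of the least significant set bit of the mask."""
    return (mask & -mask).bit_length() - 1


def field_get(mask, reg):
    """Extract the field selected by mask, shifted down to bit 0."""
    return (reg & mask) >> _shift(mask)


def field_prep(mask, val):
    """Shift val up into the position selected by mask."""
    return (val << _shift(mask)) & mask


def u32_replace_bits(old, val, mask):
    """Return old with the masked field replaced by val."""
    return ((old & ~mask) | field_prep(mask, val)) & U32_MAX


# Hypothetical 4-bit field occupying bits 11:8.
EXAMPLE_MASK = genmask(11, 8)

if __name__ == "__main__":
    print(hex(field_get(EXAMPLE_MASK, 0xA34)))
    print(hex(field_prep(EXAMPLE_MASK, 0x5)))
    print(hex(u32_replace_bits(0xFFFFFFFF, 0x3, EXAMPLE_MASK)))
```

Because the shift is derived from the mask, there is no separate shift constant that could drift out of sync with it.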
net: stmmac: descs: remove many xxx_SHIFT definitions
Remove many xxx_SHIFT definitions for descriptors, instead using
FIELD_PREP(), FIELD_GET(), and u32_replace_bits() as appropriate to
manipulate the bitfields. This avoids potential errors where an
incorrect shift is used with a mask.
Use u32 rather than unsigned int for 32-bit descriptor variables.
This will allow the u32 bitfield helpers to be used. Note, we use
__le32 for the in-memory descriptor structures.
norm_set_tx_desc_len_on_ring() incorrectly tests the buffer length,
leading to a length of 2048 being squeezed into a bitfield covering
bits 10:0 - which results in the buffer 1 size being zero.
If this field is zero, buffer 1 is ignored, and thus is equivalent to
transmitting a zero length buffer.
The path to norm_set_tx_desc_len_on_ring() is only possible when the
hardware does not support enhanced descriptors (plat->enh_desc clear)
which is dependent on the hardware.
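The truncation is easy to reproduce with plain integer arithmetic. The sketch below only models the bitfield described above (bits 10:0); the constant and function names are hypothetical, and it does not show how the driver splits longer buffers.

```python
# Buffer 1 size field as described above: bits 10:0 of the descriptor word.
BUF1_SIZE_MASK = (1 << 11) - 1  # 0x7FF, largest representable value is 2047


def buffer1_size_field(length):
    """Value that ends up in the 11-bit field for a given byte length."""
    return length & BUF1_SIZE_MASK


if __name__ == "__main__":
    for length in (1500, 2047, 2048):
        print(length, "->", buffer1_size_field(length))
```

A length of 2047 fits, whereas 2048 needs bit 11 and therefore wraps to zero in the field.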
The correct definition is RDES1_PTP_MSG_TYPE_MASK, which is also
defined as:
#define RDES1_PTP_MSG_TYPE_MASK GENMASK(11, 8)
Use the correct definition, converting to use FIELD_GET() to extract
it without needing an open-coded shift right that is dependent on the
mask definition.
As this change has no effect on the generated code, there is no need
to treat this as a bug fix.
where rxfsts is tested against small integers 1 .. 3. This results in
the tests always failing, causing the "mtl_rx_fifo__fill_level_empty"
statistic counter to always be incremented no matter what the fill
level actually is.
Fix this by using FIELD_GET() and remove the unnecessary
MTL_DEBUG_RXFSTS_SHIFT definition as FIELD_GET() will shift according
to the least significant set bit in the supplied field mask.
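Why the unshifted comparison can never match is shown by the following model. It is a sketch under a stated assumption: the RXFSTS field is taken to be a 2-bit field at bits 5:4, which is not stated in this message, and the function names are hypothetical.

```python
# Assumed position of the RXFSTS field (bits 5:4); treat as illustrative.
RXFSTS_MASK = 0b11 << 4


def rxfsts_masked_only(reg):
    """Mask without shifting: possible results are 0, 16, 32 or 48."""
    return reg & RXFSTS_MASK


def rxfsts_field_get(reg):
    """Mask and shift by the mask's lowest set bit, as FIELD_GET() does."""
    shift = (RXFSTS_MASK & -RXFSTS_MASK).bit_length() - 1
    return (reg & RXFSTS_MASK) >> shift


def matches_fill_level(value):
    """The comparison against the small integers 1 .. 3."""
    return value in (1, 2, 3)


if __name__ == "__main__":
    for level in range(4):
        reg = level << 4
        print(level,
              matches_fill_level(rxfsts_masked_only(reg)),
              matches_fill_level(rxfsts_field_get(reg)))
```

With the masked-only value, no non-empty fill level ever matches, so every case falls through to the "empty" branch; after shifting, levels 1 to 3 are recognised.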
Bobby Eshleman [Thu, 8 Jan 2026 01:29:38 +0000 (17:29 -0800)]
net: devmem: convert binding refcount to percpu_ref
Convert net_devmem_dmabuf_binding refcount from refcount_t to percpu_ref
to optimize common-case reference counting on the hot path.
The typical devmem workflow involves binding a dmabuf to a queue
(acquiring the initial reference on binding->ref), followed by
high-volume traffic where every skb fragment acquires a reference.
Eventually traffic stops and the unbind operation releases the initial
reference. Additionally, the high traffic hot path is often multi-core.
This access pattern is ideal for percpu_ref as the first and last
reference during bind/unbind normally book-ends activity in the hot
path.
__net_devmem_dmabuf_binding_free becomes the percpu_ref callback invoked
when the last reference is dropped.
kperf test:
- 4MB message sizes
- 60s of workload each run
- 5 runs
- 4 flows
Jakub Kicinski [Tue, 13 Jan 2026 01:26:45 +0000 (17:26 -0800)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2026-01-09 (ice, ixgbe, idpf)
For ice:
Grzegorz commonizes the firmware loading process across all ice devices.
Michal adjusts default queue allocation to be based on
netif_get_num_default_rss_queues() rather than num_online_cpus().
For ixgbe:
Birger Koblitz adds support for 10G-BX modules.
For idpf:
Sreedevi converts always successful function to return void.
Andy Shevchenko fixes kdocs for missing 'Return:' in idpf_txrx.c file.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
idpf: Fix kernel-doc descriptions to avoid warnings
idpf: update idpf_up_complete() return type to void
ice: use netif_get_num_default_rss_queues()
ixgbe: Add 10G-BX support
ice: unify PHY FW loading status handler for E800 devices
====================
Jakub Kicinski [Tue, 13 Jan 2026 01:02:02 +0000 (17:02 -0800)]
Merge tag 'wireless-next-2026-01-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Johannes Berg says:
====================
First set of changes for the current -next cycle, of note:
- ath12k gets an overhaul to support multi-wiphy devices
and pave the way for future device support in
the same driver (rather than splitting to ath13k)
- mac80211 gets some better iteration macros
* tag 'wireless-next-2026-01-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (120 commits)
wifi: mac80211: remove width argument from ieee80211_parse_bitrates
wifi: mac80211_hwsim: remove NAN by default
wifi: mac80211: improve station iteration ergonomics
wifi: mac80211: improve interface iteration ergonomics
wifi: cfg80211: include S1G_NO_PRIMARY flag when sending channel
wifi: mac80211: unexport ieee80211_get_bssid()
wl1251: Replace strncpy with strscpy in wl1251_acx_fw_version
wifi: iwlegacy: 3945-rs: remove redundant pointer check in il3945_rs_tx_status() and il3945_rs_get_rate()
wifi: mac80211: don't send an unused argument to ieee80211_check_combinations
wifi: libertas: fix WARNING in usb_tx_block
wifi: mwifiex: Allocate dev name earlier for interface workqueue name
wifi: wlcore: sdio: Use pm_ptr instead of #ifdef CONFIG_PM
wifi: cfg80211: Fix use_for flag update on BSS refresh
wifi: brcmfmac: rename function that frees vif
wifi: brcmfmac: fix/add kernel-doc comments
wifi: mac80211: Update csa_finalize to use link_id
wifi: cfg80211: add cfg80211_stop_link() for per-link teardown
wifi: ath12k: Skip DP peer creation for scan vdev
wifi: ath12k: move firmware stats request outside of atomic context
wifi: ath12k: add the missing RCU lock in ath12k_dp_tx_free_txbuf()
...
====================
====================
tools: ynl: cli: improve the help and doc
I had some time on the plane to LPC, so here are improvements
to the --help and --list-attrs handling of YNL CLI which seem
in order given growing use of YNL as a real CLI tool.
====================
Jakub Kicinski [Sat, 10 Jan 2026 23:31:42 +0000 (15:31 -0800)]
tools: ynl: cli: print reply in combined format if possible
As pointed out during review of the --list-attrs support the GET
ops very often return the same attrs from do and dump. Make the
output more readable by combining the reply information, from:
Do request attributes:
- ifindex: u32
netdev ifindex
Do reply attributes:
- ifindex: u32
netdev ifindex
[ .. other attrs .. ]
Do request attributes:
- ifindex: u32
netdev ifindex
Do and Dump reply attributes:
- ifindex: u32
netdev ifindex
[ .. other attrs .. ]
Tested-by: Gal Pressman <gal@nvidia.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260110233142.3921386-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 10 Jan 2026 23:31:41 +0000 (15:31 -0800)]
tools: ynl: cli: extract the event/notify handling in --list-attrs
Event and notify handling is quite different from do / dump
handling. Forcing it into print_mode_attrs() doesn't really
buy us anything as events and notifications do not have requests.
Call print_attr_list() directly. Apart from subjective code
clarity this also removes the word "reply" from the output:
Before:
Event reply attributes:
Now:
Event attributes:
Tested-by: Gal Pressman <gal@nvidia.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260110233142.3921386-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 10 Jan 2026 23:31:40 +0000 (15:31 -0800)]
tools: ynl: cli: factor out --list-attrs / --doc handling
We'll soon add more code to the --doc handling. Factor it out
to avoid making main() too long.
Tested-by: Gal Pressman <gal@nvidia.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260110233142.3921386-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 10 Jan 2026 23:31:39 +0000 (15:31 -0800)]
tools: ynl: cli: add --doc as alias to --list-attrs
--list-attrs also provides information about the operation itself.
So --doc seems more appropriate. Add an alias.
Tested-by: Gal Pressman <gal@nvidia.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260110233142.3921386-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 10 Jan 2026 23:31:38 +0000 (15:31 -0800)]
tools: ynl: cli: improve --help
Improve the clarity of --help. Reorder, provide some grouping and
add help messages to most of the options.
No functional changes intended.
Tested-by: Gal Pressman <gal@nvidia.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260110233142.3921386-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 10 Jan 2026 23:31:37 +0000 (15:31 -0800)]
tools: ynl: cli: wrap the doc text if it's long
We already use textwrap when printing "doc" section about an attribute,
but only to indent the text. Switch to using fill() to split and indent
all the lines. While at it indent the text by 2 more spaces, so that it
doesn't align with the name of the attribute.
Before (I'm drawing a "box" at ~60 cols here, in an attempt for clarity):
| - irq-suspend-timeout: uint |
| The timeout, in nanoseconds, of how long to suspend irq|
|processing, if event polling finds events |
After:
| - irq-suspend-timeout: uint |
| The timeout, in nanoseconds, of how long to suspend |
| irq processing, if event polling finds events |
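The difference between indenting and filling can be reproduced with the standard textwrap module. This is a minimal sketch rather than the actual ynl CLI code: format_doc, the 4-space indent and the 60-column width are assumptions picked to mirror the box above.

```python
import textwrap


def format_doc(doc, width=60, indent=4):
    """Wrap doc to width columns, indenting every resulting line.

    textwrap.fill() counts the indent as part of the line width, so
    no output line exceeds width characters.
    """
    pad = " " * indent
    return textwrap.fill(doc, width=width,
                         initial_indent=pad, subsequent_indent=pad)


DOC = ("The timeout, in nanoseconds, of how long to suspend irq "
       "processing, if event polling finds events")

if __name__ == "__main__":
    print("  - irq-suspend-timeout: uint")
    print(format_doc(DOC))
```

Passing the same prefix as both initial_indent and subsequent_indent is what keeps continuation lines aligned with the first line instead of falling back to column zero.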
Tested-by: Gal Pressman <gal@nvidia.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260110233142.3921386-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 10 Jan 2026 23:31:36 +0000 (15:31 -0800)]
tools: ynl: cli: introduce formatting for attr names in --list-attrs
It's a little hard to make sense of the output of --list-attrs,
it looks like a wall of text. Sprinkle a little bit of formatting -
make op and attr names bold, and Enum: / Flags: keywords italics.
Tested-by: Gal Pressman <gal@nvidia.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20260110233142.3921386-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Miri Korenblit [Thu, 8 Jan 2026 13:32:57 +0000 (14:32 +0100)]
wifi: mac80211: remove width argument from ieee80211_parse_bitrates
The width parameter in ieee80211_parse_bitrates() is unused. Remove it.
While at it, use the already fetched sband pointer as an argument
instead of dereferencing it once again.
Johannes Berg [Thu, 8 Jan 2026 13:31:38 +0000 (14:31 +0100)]
wifi: mac80211_hwsim: remove NAN by default
We're improving NAN support, but NAN datapath support also
means we need to change some other things, e.g. related to
rate control. Remove NAN by default again from hwsim since
it's the much newer feature.
Johannes Berg [Thu, 8 Jan 2026 13:34:32 +0000 (14:34 +0100)]
wifi: mac80211: improve station iteration ergonomics
Right now, the only way to iterate stations is to declare an
iterator function, possibly a data structure to use, and pass all
that to the iteration helper function. This is annoying, and
there's really no inherent need for it.
Add a new for_each_station() macro that does the iteration in
a more ergonomic way. To avoid even more exported functions, do
the old ieee80211_iterate_stations_mtx() as an inline using the
new way, which may also let the compiler optimise it a bit more,
e.g. via inlining the iterator function.
Right now, the only way to iterate interfaces is to declare an
iterator function, possibly a data structure to use, and pass all
that to the iteration helper function. This is annoying, and
there's really no inherent need for it, except it was easier to
implement with the iflist mutex, but that's not used much now.
Add a new for_each_interface() macro that does the iteration in
a more ergonomic way. To avoid even more exported functions, do
the old ieee80211_iterate_active_interfaces_mtx() as an inline
using the new way, which may also let the compiler optimise it
a bit more, e.g. via inlining the iterator function.
Also provide for_each_active_interface() for the common case of
just iterating active interfaces.
Thorsten Blum [Sun, 11 Jan 2026 13:42:57 +0000 (14:42 +0100)]
wl1251: Replace strncpy with strscpy in wl1251_acx_fw_version
strncpy() is deprecated [1] for NUL-terminated destination buffers since
it does not guarantee NUL termination. Remove the manual NUL termination
and replace strncpy() with strscpy() to ensure NUL termination of the
destination buffer.
Using strscpy_pad() to retain the NUL-padding behavior of strncpy() is
not needed because ->fw_ver is only used as a C-string.
Jakub Kicinski [Sat, 10 Jan 2026 23:19:54 +0000 (15:19 -0800)]
Merge branch 'bnxt_en-updates-for-net-next'
Michael Chan says:
====================
bnxt_en: Updates for net-next
This patchset updates the driver with a FW interface update to support
FEC stats histogram and NVRAM defragmentation. Patch #2 adds PTP
cross timestamps [1]. Patch #3 adds FEC histogram stats. Patch #4 adds
NVRAM defragmentation support that prevents FW update failure when NVRAM
is fragmented. Patch #5 improves RSS distribution accuracy when a certain
number of rings is in use. The last patch adds ethtool
.get_link_ext_state() support.
====================
Map the link_down_reason from the FW to the ethtool link_ext_state
when it is available. Also log it to the link down dmesg when it is
available. Add 2 new link_ext_state enums to the UAPI:
Michael Chan [Thu, 8 Jan 2026 18:35:20 +0000 (10:35 -0800)]
bnxt_en: Use a larger RSS indirection table on P5_PLUS chips
The driver currently uses a chip supported RSS indirection table size
just big enough to cover the number of RX rings. Each table with 64
entries requires one HW RSS context. The HW supported table sizes are
64, 128, 256, and 512 entries. Using the smallest table size can cause
unbalanced RSS packet distributions. For example, if the number of
rings is 48, the table size using existing logic will be 64. 32 rings
will have a weight of 1 and 16 rings will have a weight of 2 when
set to default even distribution. This represents a 100% difference in
weights between some of the rings.
Newer FW has increased the RSS indirection table resource. When the
increased resource is detected, use the largest RSS indirection table
size (512 entries) supported by the chip. Using the same example
above, the weights of the 48 rings will be either 10 or 11 when set to
default even distribution. The weight difference is only 10%.
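The weights quoted in both examples can be checked with a short sketch. It assumes the default even distribution fills table entry i with ring (i % number of rings), as the generic ethtool default does; the function name is hypothetical and not taken from the driver.

```python
from collections import Counter

def ring_weights(table_size, num_rings):
    """Count how many indirection table entries point at each ring.

    Assumption: entry i maps to ring (i % num_rings), i.e. an even spread.
    """
    return Counter(i % num_rings for i in range(table_size))

for size in (64, 512):
    weights = ring_weights(size, 48)
    # Summarise as (weight, number of rings with that weight) pairs.
    print(size, sorted(Counter(weights.values()).items()))
# 64  -> [(1, 32), (2, 16)]
# 512 -> [(10, 16), (11, 32)]
```

With 64 entries the heaviest ring carries twice the share of the lightest one; with 512 entries the gap shrinks to one entry in ten.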
If there are thousands of VFs, there is a possibility that we may not
be able to allocate this larger RSS indirection table from the FW, so
we add a check to fall back to the legacy scheme.
Pavan Chebbi [Thu, 8 Jan 2026 18:35:19 +0000 (10:35 -0800)]
bnxt_en: Defrag the NVRAM region when resizing UPDATE region fails
When updating to a new firmware pkg, the driver checks if the UPDATE
region is big enough for the pkg and if it's not big enough, it
issues an NVM_WRITE cmd to update with the requested size.
This NVM_WRITE cmd can fail indicating fragmented region. Currently
the driver fails the fw update when this happens. We can improve the
situation by defragmenting the region and trying the NVM_WRITE cmd
again. This will make firmware update more reliable.
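The retry flow described above can be sketched as follows. This is only a model of the control flow: FragmentedError, FakeNvm and resize_update_region are hypothetical names, and the real driver issues HWRM commands rather than Python calls.

```python
class FragmentedError(Exception):
    """Hypothetical stand-in for the 'fragmented region' failure."""

def resize_update_region(nvm_write, nvm_defrag, size):
    """Try to resize; on a fragmentation failure, defragment and retry once."""
    try:
        nvm_write(size)
    except FragmentedError:
        nvm_defrag()
        nvm_write(size)  # a second failure propagates to the caller

class FakeNvm:
    """Toy device: writes fail until the region has been defragmented."""
    def __init__(self):
        self.fragmented = True
        self.size = 0

    def write(self, size):
        if self.fragmented:
            raise FragmentedError("region is fragmented")
        self.size = size

    def defrag(self):
        self.fragmented = False

nvm = FakeNvm()
resize_update_region(nvm.write, nvm.defrag, 4096)
print(nvm.size)
```

Retrying only once keeps the failure path bounded: if the write still fails after a defragmentation, the update is reported as failed, as before.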
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20260108183521.215610-5-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Thu, 8 Jan 2026 18:35:18 +0000 (10:35 -0800)]
bnxt_en: Add support for FEC bin histograms
Fill in the struct ethtool_fec_hist passed to the bnxt_get_fec_stats()
callback if the FW supports the feature. Bins 0 to 15 inclusive are
available when the feature is supported.
Pavan Chebbi [Thu, 8 Jan 2026 18:35:17 +0000 (10:35 -0800)]
bnxt_en: Add PTP .getcrosststamp() interface to get device/host times
.getcrosststamp() helps applications obtain a snapshot of
device and host time taken at almost the same time. This function
will report PCIe PTM device and host times to any application using
the ioctl PTP_SYS_OFFSET_PRECISE. The device time from the HW is
48-bit and needs to be converted to 64-bit.
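One generic way to widen a 48-bit timestamp is to borrow the upper bits from a recent full 64-bit reading and correct for a wrap in either direction. The sketch below shows that technique only; it is an assumption for illustration, not necessarily how bnxt_en performs the conversion, and the function name is hypothetical.

```python
MASK48 = (1 << 48) - 1
WRAP48 = 1 << 48
MASK64 = (1 << 64) - 1

def extend_48_to_64(ts48, ref64):
    """Extend a 48-bit timestamp using a nearby 64-bit reference time.

    Assumption: the true time lies within half a 48-bit period of ref64.
    """
    candidate = (ref64 & ~MASK48) | (ts48 & MASK48)
    if candidate + WRAP48 // 2 < ref64:
        candidate += WRAP48   # the sample wrapped after the reference was read
    elif candidate > ref64 + WRAP48 // 2:
        candidate -= WRAP48   # the sample predates the reference's last wrap
    return candidate & MASK64

ref = (3 << 48) | (MASK48 - 5)          # reference just before a 48-bit wrap
print(hex(extend_48_to_64(10, ref)))    # sample taken just after the wrap
```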
Michael Chan [Thu, 8 Jan 2026 18:35:16 +0000 (10:35 -0800)]
bnxt_en: Update FW interface to 1.10.3.151
The main changes are the new HWRM_PORT_PHY_FDRSTAT command to collect
FEC histogram bins and the new HWRM_NVM_DEFRAG command to defragment the
NVRAM. There is also a minor name change in struct hwrm_vnic_cfg_input
that requires updating the bnxt_re driver's main.c.
Jakub Kicinski [Thu, 8 Jan 2026 22:52:57 +0000 (14:52 -0800)]
selftests: net: py: ensure defer() is only used within a test case
I wasted a couple of hours recently after accidentally adding
a defer() from within a function which itself was called as
part of defer(). This leads to an infinite loop of defer().
Make sure this cannot happen and raise a helpful exception.
I understand that the pair of _ksft_defer_arm() calls may
not be the most Pythonic way to implement this, but it's
easy enough to understand.
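The guard can be pictured roughly like this. It is a minimal sketch with hypothetical names (DeferQueue, arm, flush); the real helper is _ksft_defer_arm(), whose exact signature is not shown here, and running the most recently deferred item first is an assumption of the sketch.

```python
class DeferQueue:
    """Toy defer queue that only accepts work while a test case is running."""

    def __init__(self):
        self._items = []
        self._armed = False

    def arm(self, armed):
        # Called in pairs: arm before the test case, disarm before cleanup.
        self._armed = armed

    def defer(self, func, *args):
        if not self._armed:
            raise RuntimeError("defer() called outside of a test case")
        self._items.append((func, args))

    def flush(self):
        self.arm(False)  # anything deferred from here on is a bug
        while self._items:
            func, args = self._items.pop()
            func(*args)

queue = DeferQueue()
queue.arm(True)
queue.defer(print, "cleanup ran")
queue.flush()
```

Because the queue is disarmed before the deferred functions run, a cleanup function that itself calls defer() raises immediately instead of looping forever.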
Jakub Kicinski [Thu, 8 Jan 2026 22:52:56 +0000 (14:52 -0800)]
selftests: net: py: capitalize defer queue and improve import
Import utils and refer to the global defer queue that way instead
of importing the queue. This will make it possible to assign value
to the global variable. While at it capitalize the name, to comply
with the Python coding style.
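The reason the import style matters is plain Python name binding: "from module import name" copies a reference at import time, so a later assignment to the module attribute is not seen through the imported name. A small self-contained illustration, using a stand-in module and hypothetical names rather than the real selftest library:

```python
import sys
import types

# Build a stand-in module holding a global queue (names are hypothetical).
demo_utils = types.ModuleType("demo_utils")
demo_utils.GLOBAL_DEFER_QUEUE = []
sys.modules["demo_utils"] = demo_utils

from demo_utils import GLOBAL_DEFER_QUEUE as imported_queue  # copies the reference
import demo_utils as utils_mod                               # keeps the module

# Rebinding the module attribute is only visible through the module.
utils_mod.GLOBAL_DEFER_QUEUE = ["new"]

print(imported_queue)                 # still the old, empty list
print(utils_mod.GLOBAL_DEFER_QUEUE)   # the newly assigned list
```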
Cosmin Ratiu [Fri, 9 Jan 2026 11:08:51 +0000 (13:08 +0200)]
selftests: drv-net: psp: Better control the used PSP dev
The PSP responder fails when zero or multiple PSP devices are detected.
There's an option to select the device id to use (-d) but it's
currently not used from the PSP self test. It's also hard to use because
the PSP test doesn't dump the PSP devices, so the user can't choose one.
When zero devices are detected, psp_responder fails which will cause the
parent test to fail as well instead of skipping PSP tests.
Fix both of these problems. Change psp_responder to:
- not fail when no PSP devs are detected.
- get an optional -i ifindex argument instead of -d.
- select the correct PSP dev from the dump corresponding to ifindex or
  select the first PSP dev when -i is not given.
- fail when multiple devs are found and -i is not given.
- warn and continue when the requested ifindex is not found.
Also plumb the ifindex from the Python test.
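The selection rules listed above can be modelled as below. psp_responder itself is a C program; this Python sketch only mirrors the decision logic, and the function name and the "id"/"ifindex" dictionary keys are hypothetical.

```python
import sys

def select_psp_dev(devs, ifindex=None):
    """Pick a PSP device from a dump, or None when PSP tests should be skipped.

    `devs` is a list of dicts with hypothetical "id" and "ifindex" keys.
    """
    if not devs:
        return None  # no PSP devices: carry on without failing
    if ifindex is None:
        if len(devs) > 1:
            raise RuntimeError("multiple PSP devices found, pass -i <ifindex>")
        return devs[0]
    for dev in devs:
        if dev["ifindex"] == ifindex:
            return dev
    # Requested ifindex has no PSP device: warn and continue.
    print(f"warning: no PSP device for ifindex {ifindex}", file=sys.stderr)
    return None

devs = [{"id": 1, "ifindex": 4}, {"id": 2, "ifindex": 7}]
print(select_psp_dev(devs, ifindex=7))
```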
With these, when there are no PSP devs found or the wrong one is chosen,
psp_responder opens the server socket, listens for control connections
normally, and leaves the skipping of the various test cases which
require a PSP device (~most, but not all of them) to the parent test.
This results in output like:
ok 1 psp.test_case # SKIP No PSP devices found
[...]
ok 12 psp.dev_get_device # SKIP No PSP devices found
ok 13 psp.dev_get_device_bad
ok 14 psp.dev_rotate # SKIP No PSP devices found
[...]
====================
net: convert drivers to .get_rx_ring_count()
Commit 84eaf4359c36 ("net: ethtool: add get_rx_ring_count callback to
optimize RX ring queries") added specific support for GRXRINGS callback,
simplifying .get_rxnfc.
Remove the handling of GRXRINGS in .get_rxnfc() by moving it to the new
.get_rx_ring_count().
This simplifies the RX ring count retrieval and aligns the following
drivers with the new ethtool API for querying RX ring parameters.
* hns3
* hns
* qede
* niu
* funeth
* enic
* hinic
* octeontx2
PS: all of these changes were compile-tested only.
====================
Breno Leitao [Thu, 8 Jan 2026 11:43:00 +0000 (03:43 -0800)]
net: stmmac: convert to use .get_rx_ring_count
Convert the stmmac driver to use the new .get_rx_ring_count
ethtool operation instead of implementing .get_rxnfc for handling
ETHTOOL_GRXRINGS command.
Since stmmac_get_rxnfc() only handled ETHTOOL_GRXRINGS (returning
-EOPNOTSUPP for all other commands), remove it entirely and replace
it with the simpler stmmac_get_rx_ring_count() callback.
Gal Pressman [Wed, 7 Jan 2026 09:18:47 +0000 (11:18 +0200)]
net/mlx5e: TSO for UDP over GRE over vlan packets
The hardware supports segmentation offload of UDP over GRE over vlan
packets, allow it by adding NETIF_F_GSO_UDP_L4 to hw_enc_features which
will make the vlan device inherit it to its own hw_enc_features.
Side note: it is quite confusing that this change wasn't needed to
offload encapsulated UDP packets regardless of vlan, but that's the way
that the stack handles gso partial features, it assumes they're
supported without caring if the feature is supported in hw_enc_features.
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20260107091848.621884-3-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
SGMII in-band autonegotiation was previously kept untouched (and restored
after switching back from 2500Base-X to SGMII). Now that the kernel offers
a way to announce in-band capabilities and enable/disable in-band AN,
implement the .inband_caps and .config_inband driver ops.
This moves the responsibility to configure SGMII in-band AN from the PHY
driver to phylink.
Jakub Kicinski [Wed, 7 Jan 2026 23:25:57 +0000 (15:25 -0800)]
selftests: drv-net: gro: increase the rcvbuf size
The gro.py test (testing software GRO) is slightly flaky when
running against fbnic. We see one flake per roughly 20 runs in NIPA,
mostly in ipip.large, and always including some EAGAIN:
# Shouldn't coalesce if exceed IP max pkt size: Test succeeded
# Expected {65475 899 }, Total 2 packets
# Received {65475 899 }, Total 2 packets.
# Expected {64576 900 900 }, Total 3 packets
# Received {64576 /home/virtme/testing/wt-24/tools/testing/selftests/drivers/net/gro: could not receive: Resource temporarily unavailable
The test sends 2 large frames (64k + change). Looks like the default
packet socket rcvbuf (~200kB) may not be large enough to hold them.
Bump the rcvbuf to 1MB.
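Raising a socket's receive buffer is a single setsockopt() call. The sketch below uses a UDP socket as a stand-in, since the packet socket used by the test needs extra privileges, and the helper name is hypothetical. On Linux an unprivileged request is capped by net.core.rmem_max, and the value read back is double the accepted one because the kernel reserves room for bookkeeping.

```python
import socket

def bump_rcvbuf(sock, size=1 << 20):
    """Request a larger receive buffer and return what the kernel granted."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
    print("granted rcvbuf:", bump_rcvbuf(s))
```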
Add a debug print showing socket statistics to make debugging this
issue easier in the future. Without the rcvbuf increase we see:
# Shouldn't coalesce if exceed IP max pkt size: Test succeeded
# Expected {65475 899 }, Total 2 packets
# Received {65475 899 }, Total 2 packets.
# Expected {64576 900 900 }, Total 3 packets
# Received {64576 Socket stats: packets=7, drops=3
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# /home/virtme/testing/wt-24/tools/testing/selftests/drivers/net/gro: could not receive: Resource temporarily unavailable
Jakub Kicinski [Tue, 6 Jan 2026 20:02:05 +0000 (12:02 -0800)]
selftests: tls: avoid flakiness in data_steal
We see the following failure a few times a week:
# RUN global.data_steal ...
# tls.c:3280:data_steal:Expected recv(cfd, buf2, sizeof(buf2), MSG_DONTWAIT) (10000) == -1 (-1)
# data_steal: Test failed
# FAIL global.data_steal
not ok 8 global.data_steal
The 10000 bytes read suggests that the child process did a recv()
of half of the data using the TLS ULP and we're now getting the
remaining half. The intent of the test is to get the child to
enter _TCP_ recvmsg handler, so it needs to enter the syscall before
parent installed the TLS recvmsg with setsockopt(SOL_TLS).
Instead of the 10msec sleep send 1 byte of data and wait for the
child to consume it.
Sreedevi Joshi [Wed, 26 Nov 2025 17:02:16 +0000 (11:02 -0600)]
idpf: update idpf_up_complete() return type to void
idpf_up_complete() function always returns 0 and no callers use this return
value. Although idpf_vport_open() checks the return value, it only handles
error cases which never occur. Change the return type to void to simplify
the code.
Signed-off-by: Sreedevi Joshi <sreedevi.joshi@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
On some high-core systems (like AMD EPYC Bergamo, Intel Clearwater
Forest) loading ice driver with default values can lead to queue/irq
exhaustion. It will result in no additional resources for SR-IOV.
In most cases there is no performance reason for more than half
num_cpus(). Limit the default value to it using generic
netif_get_num_default_rss_queues().
Still, using ethtool the number of queues can be changed up to
num_online_cpus(). It can be done by calling:
$ ethtool -L ethX combined $(nproc)
This change affects only the default queue amount.
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Birger Koblitz [Tue, 2 Dec 2025 05:57:48 +0000 (06:57 +0100)]
ixgbe: Add 10G-BX support
Add support for 10G-BX modules, i.e. 10GBit Ethernet over a single strand
Single-Mode fiber.
The initialization of a 10G-BX SFP+ is the same as for a 10G SX/LX module,
and is identified according to SFF-8472 table 5-3, footnote 3 by the
10G Ethernet Compliance Codes field being empty, the Nominal Bit
Rate being compatible with 12.5GBit, and the module being a fiber module
with a Single Mode fiber link length.
This was tested using a Lightron WSPXG-HS3LC-IEA 1270/1330nm 10km
transceiver:
$ sudo ethtool -m enp1s0f1
Identifier : 0x03 (SFP)
Extended identifier : 0x04 (GBIC/SFP defined by 2-wire interface ID)
Connector : 0x07 (LC)
Transceiver codes : 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Encoding : 0x01 (8B/10B)
BR Nominal : 10300MBd
Rate identifier : 0x00 (unspecified)
Length (SMF) : 10km
Length (OM2) : 0m
Length (OM1) : 0m
Length (Copper or Active cable) : 0m
Length (OM3) : 0m
Laser wavelength : 1330nm
Vendor name : Lightron Inc.
Vendor OUI : 00:13:c5
Vendor PN : WSPXG-HS3LC-IEA
Vendor rev : 0000
Option values : 0x00 0x1a
Option : TX_DISABLE implemented
BR margin max : 0%
BR margin min : 0%
Vendor SN : S142228617
Date code : 140611
Optical diagnostics support : Yes
Signed-off-by: Birger Koblitz <mail@birger-koblitz.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Grzegorz Nitka [Fri, 17 Oct 2025 08:42:28 +0000 (10:42 +0200)]
ice: unify PHY FW loading status handler for E800 devices
Unify handling of PHY firmware load delays across all E800 family
devices. There is an existing mechanism to poll GL_MNG_FWSM_FW_LOADING_M
bit of GL_MNG_FWSM register in order to verify whether PHY FW loading
completed or not. Previously, this logic was limited to E827 variants
only.
Also, inform a user of possible delay in initialization process, by
dumping informational message in dmesg log ("Link initialization is
blocked by PHY FW initialization. Link initialization will continue
after PHY FW initialization completes.").
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>