Arnd Bergmann [Tue, 5 May 2026 18:04:57 +0000 (20:04 +0200)]
w5100: remove MMIO support
This driver supports both SPI and MMIO based register access, but only
the former has devicetree support. While MMIO mode would have worked
with old-style board files, those have never defined such a device
upstream.
Remove the MMIO mode, leaving SPI as the only way to use this driver,
but leave it in two loadable modules. More cleanups can be done by
combining the two into one file.
====================
net/mlx5: Improve representor lifecycle and late IB representor loading
This series addresses two problems that have been present for years, and
fixes one representor reload error-unwind case exposed while making the
reload path reusable.
First, there is no coordination between E-Switch reconfiguration and
representor registration. The E-Switch can be mid-way through a mode
change or VF count update while mlx5_ib walks in and registers or
unregisters representors. Nothing stops them. The race window is small
and there is no field report, but it is clearly wrong.
Second, loading mlx5_ib while the device is already in switchdev mode
does not bring up the IB representors. mlx5_eswitch_register_vport_reps()
only stores callbacks; nobody triggers the actual load after registration.
The series fixes the registration race with a per-E-Switch representor
mutex. The lock is introduced first, then LAG shared-FDB and multiport
E-Switch transitions are adjusted so auxiliary device rescans and IB
representor reloads do not hold ldev->lock while taking the representor
lock. This keeps the intermediate commits bisectable before the stricter
E-Switch serialization and lock assertions are enabled.
After the LAG ordering is fixed, all E-Switch reconfiguration paths that
create, destroy, load, or unload representors take the representor mutex.
esw_mode_change() deliberately drops the mutex around
mlx5_rescan_drivers_locked(), because auxiliary probe and remove paths
re-enter mlx5_eswitch_register_vport_reps() and
mlx5_eswitch_unregister_vport_reps() on the same thread.
The shared-FDB peer IB registration path can hold one E-Switch
representor mutex and then register peer representor ops on another
E-Switch. The series annotates that case as nested locking so lockdep can
distinguish it from recursive locking on the same E-Switch.
For the missing IB representors, mlx5_eswitch_register_vport_reps() queues
a work item that acquires the devlink lock and loads all relevant
representors. This is the change that actually fixes the long-standing
bug.
The reload path also learns to track which representor types were loaded by
the current attempt, so an error does not unload representors that were
already active before the retry.
Patch 1 is cleanup. LAG and MPESW had the same representor reload
sequence duplicated in several places and the copies had started to
drift. This consolidates them into one helper.
Patch 3 adds the per-E-Switch representor lifecycle lock and helper APIs.
Patch 4 adjusts the LAG shared-FDB and multiport E-Switch transitions so
auxiliary device rescans and IB representor reloads run without
ldev->lock held while taking the representor lock.
Patch 5 protects the E-Switch reconfiguration, representor registration
and peer IB representor paths with the representor lock.
Patch 6 fixes representor load error unwind so only representor types
loaded by the current attempt are unloaded on failure.
Patch 7 moves the representor load triggered by
mlx5_eswitch_register_vport_reps() onto the work queue. This is the patch
that fixes IB representors not coming up when mlx5_ib is loaded while the
device is already in switchdev mode.
====================
Mark Bloch [Sun, 3 May 2026 20:27:26 +0000 (23:27 +0300)]
net/mlx5: E-Switch, load reps via work queue after registration
mlx5_eswitch_register_vport_reps() only installs representor callbacks and
marks the rep type as registered. If the E-Switch is already in switchdev
mode, the newly registered rep type must then be loaded for already enabled
vports.
That load path needs to run under the devlink lock, which is not held by
the auxiliary driver registration context. Queue the reload to the E-Switch
workqueue, whose handler acquires the devlink lock, and load the relevant
representors from there.
Since representor registration runs from sleepable auxiliary-driver
context, queue the late reload with GFP_KERNEL. The functions-change
notifier path remains the GFP_ATOMIC user of mlx5_esw_add_work().
The unregister path is unchanged and still unloads representors
synchronously while tearing down the registered callbacks.
Mark Bloch [Sun, 3 May 2026 20:27:25 +0000 (23:27 +0300)]
net/mlx5: E-Switch, unwind only newly loaded representor types
__esw_offloads_load_rep() may return success without invoking the
representor load callback when the representor type is already loaded.
On a later load failure, mlx5_esw_offloads_rep_load() unconditionally
unloaded all previously iterated representor types. This could unload
representor types that were already loaded before this load attempt.
Track which representor types were actually loaded by the current call and
unwind only those on error. Also restore the representor state back to
REP_REGISTERED when the load callback itself fails.
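The unwind rule can be illustrated with a minimal userspace sketch. It assumes one vport with two representor types and a test hook that forces a callback failure. struct rep, load_one() and rep_load() are hypothetical names and do not come from the mlx5 code. A bitmask records which types this call actually loaded, and the error path unloads only those.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not the mlx5 code: one vport, two rep types. */
enum rep_type { REP_ETH, REP_IB, NUM_REP_TYPES };
enum rep_state { REP_UNREGISTERED, REP_REGISTERED, REP_LOADED };

struct rep {
	enum rep_state state[NUM_REP_TYPES];
	int fail_type;	/* test hook: type whose load callback fails, or -1 */
};

/* Returns 0 on success; *loaded says whether the callback really ran. */
static int load_one(struct rep *rep, int type, bool *loaded)
{
	assert(type >= 0 && type < NUM_REP_TYPES);
	*loaded = false;
	if (rep->state[type] != REP_REGISTERED)
		return 0;	/* already loaded: success without a callback */
	rep->state[type] = REP_LOADED;		/* claim the type, then call back */
	if (rep->fail_type == type) {
		rep->state[type] = REP_REGISTERED;	/* restore on callback failure */
		return -1;
	}
	*loaded = true;
	return 0;
}

static int rep_load(struct rep *rep)
{
	unsigned int loaded_mask = 0;	/* types loaded by this call only */
	int type, err = 0;

	for (type = 0; type < NUM_REP_TYPES; type++) {
		bool loaded;

		err = load_one(rep, type, &loaded);
		if (err)
			break;
		if (loaded)
			loaded_mask |= 1u << type;
	}
	if (!err)
		return 0;

	/* Unwind only what this attempt loaded; earlier loads stay intact. */
	while (--type >= 0)
		if (loaded_mask & (1u << type))
			rep->state[type] = REP_REGISTERED;
	return err;
}
```

In this model, if REP_ETH was already loaded before the call and REP_IB fails, REP_ETH stays loaded. If REP_ETH was loaded by the same call, it is rolled back.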
Representor callbacks can be registered and unregistered while the
E-Switch is already in switchdev mode, and the same E-Switch may also be
reconfigured by devlink, VF changes and SF changes. Serialize these paths
with the per-E-Switch representor mutex instead of relying on ad-hoc bit
state and wait queues.
Take the representor lock around the mode transition, VF/SF representor
changes and representor ops registration. Keep mode_lock and the
representor lock unnested by using the operation flag while the mode lock
is dropped. During mode changes, drop the representor lock around the
auxiliary bus rescan because driver bind/unbind may register or unregister
representor ops.
Split representor ops registration into locked public wrappers and unlocked
internal helpers, clear the ops pointer on unregister, and add nested
wrappers for the shared-FDB master IB path that registers peer
representor ops while another E-Switch representor lock is already held.
On unregister, always call __unload_reps_all_vport() before marking reps
unregistered and clearing rep_ops. The per-representor state check makes
this a no-op for types that were not loaded, so unregister no longer has
to infer load state from esw->mode.
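The following userspace sketch shows why an unconditional unload is safe. It assumes a single representor type across four vports, and esw_model, unload_reps_all_vport() and unregister_reps() are hypothetical stand-ins. Because the per-representor state check skips anything not in the LOADED state, unregister can always call the unload helper without looking at the E-Switch mode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: one rep type across NUM_VPORTS vports. */
#define NUM_VPORTS 4

enum rep_state { REP_UNREGISTERED, REP_REGISTERED, REP_LOADED };

struct esw_model;
struct rep_ops {
	void (*unload)(struct esw_model *esw, int vport);
};

struct esw_model {
	enum rep_state state[NUM_VPORTS];
	const struct rep_ops *ops;
	int unload_calls;
};

static void unload_reps_all_vport(struct esw_model *esw)
{
	for (int v = 0; v < NUM_VPORTS; v++) {
		/* Per-representor check: reps that were never loaded are skipped. */
		if (esw->state[v] != REP_LOADED)
			continue;
		esw->ops->unload(esw, v);
		esw->state[v] = REP_REGISTERED;
	}
}

static void unregister_reps(struct esw_model *esw)
{
	assert(esw->ops != NULL);
	unload_reps_all_vport(esw);	/* always called, no esw->mode check */
	for (int v = 0; v < NUM_VPORTS; v++)
		esw->state[v] = REP_UNREGISTERED;
	esw->ops = NULL;		/* clear the ops pointer */
}

static void count_unload(struct esw_model *esw, int vport)
{
	(void)vport;
	esw->unload_calls++;
}

static const struct rep_ops counting_ops = { .unload = count_unload };
```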
Mark Bloch [Sun, 3 May 2026 20:27:23 +0000 (23:27 +0300)]
net/mlx5: Lag, avoid LAG and representor lock cycles
The LAG shared-FDB and multiport E-Switch transitions rescan auxiliary
devices and reload IB representors while holding ldev->lock. Driver
bind/unbind paths may register or unregister E-Switch representor ops, and
representor load paths may enter LAG code, so holding ldev->lock across
those calls creates lock-order cycles with the E-Switch representor lock.
Keep the devcom component locked for the transition, but drop ldev->lock
before rescanning auxiliary devices or reloading IB representors. Mark the
LAG transition as in progress while the lock is dropped and assert the
devcom lock where the helper relies on it. This preserves LAG serialization
while avoiding ldev->lock nesting under E-Switch representor registration.
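Here is a single-threaded userspace model of the ordering rule. It uses boolean flags in place of the real devcom lock, ldev->lock and representor mutex, and lag_model, lag_transition() and rescan_drivers() are hypothetical names. The transition keeps devcom held and marks itself in progress. It then drops ldev->lock before anything that may take the representor lock, and an assertion in the rescan stand-in plays the role of a lockdep check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags standing in for the three locks and the LAG flag. */
struct lag_model {
	bool devcom_locked;
	bool ldev_locked;
	bool rep_locked;
	bool transition_in_progress;	/* set while ldev->lock is dropped */
	int rescans;
};

/* Stand-in for a driver bind path that registers representor ops. */
static void rescan_drivers(struct lag_model *m)
{
	/* Rule: never take the representor lock under ldev->lock. */
	assert(!m->ldev_locked);
	m->rep_locked = true;
	m->rescans++;
	m->rep_locked = false;
}

static void lag_transition(struct lag_model *m)
{
	assert(m->devcom_locked);	/* the helper relies on devcom serialization */
	m->ldev_locked = true;
	/* ... update LAG state under ldev->lock ... */
	m->transition_in_progress = true;
	m->ldev_locked = false;		/* drop before rescan / IB rep reload */
	rescan_drivers(m);
	m->ldev_locked = true;
	m->transition_in_progress = false;
	m->ldev_locked = false;
}
```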
Add a per-E-Switch mutex for serializing representor lifecycle work and
provide small helpers for taking and dropping it. Initialize and destroy
the mutex with the E-Switch offloads state.
Add the lock and helper API first. Follow-up patches will take the lock in
the individual representor lifecycle components. This keeps the functional
changes split by component and leaves this patch without intended behavior
change, making the series easier to review and bisectable.
Mark Bloch [Sun, 3 May 2026 20:27:21 +0000 (23:27 +0300)]
net/mlx5: E-Switch, let esw work callers choose GFP flags
mlx5_esw_add_work() always allocates the queued work item with
GFP_ATOMIC. That is required for the E-Switch functions-change notifier,
but not every caller of this helper will run from atomic context.
Pass an allocation flag to mlx5_esw_add_work() and keep the notifier
caller using GFP_ATOMIC. This allows sleepable callers to use GFP_KERNEL
instead of unnecessarily relying on atomic reserves.
Representor reload during LAG/MPESW transitions has to be repeated in
several flows, and each open-coded loop was easy to get out of sync
when adding new flags or tweaking error handling. Move the sequencing
into a single helper so that all call sites share the same ordering
and checks.
====================
r8152: Add support for the RTL8159 10Gbit USB Ethernet chip
Add support for the RTL8159, which is a 10GBit USB-Ethernet adapter
chip in the RTL815x family of chips.
The RTL8159 re-uses the frame descriptor format and SRAM2 access introduced
with the RTL8157 as well as most of the setup and PM logic of the RTL8157.
The module was tested with a Lekuo DR59R11 USB-C 10GbE Ethernet Adapter:
[ 2502.906947] usb 2-1: new SuperSpeed USB device number 3 using xhci_hcd
[ 2502.927859] usb 2-1: New USB device found, idVendor=0bda, idProduct=815a, bcdDevice=30.00
[ 2502.927867] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=7
[ 2502.927871] usb 2-1: Product: USB 10/100/1G/2.5G/5G/10G LAN
[ 2502.927873] usb 2-1: Manufacturer: Realtek
[ 2502.927875] usb 2-1: SerialNumber: 000388C9B3B5XXXX
[ 2503.063745] r8152-cfgselector 2-1: reset SuperSpeed USB device number 3 using xhci_hcd
[ 2503.123876] r8152 2-1:1.0: Requesting firmware: rtl_nic/rtl8159-1.fw
[ 2503.126267] r8152 2-1:1.0: PHY firmware installed 0 to be loaded: 20
[ 2503.156265] r8152 2-1:1.0: load rtl8159-1 v1 2026/01/01 successfully
[ 2503.270729] r8152 2-1:1.0 eth0: v1.12.13
[ 2503.289349] r8152 2-1:1.0 enx88c9b3b5xxxx: renamed from eth0
[ 2507.777055] r8152 2-1:1.0 enx88c9b3b5xxxx: carrier on
The RTL8159 adapter was tested against an AQC107 PCIe-card supporting
10GBit/s and an RTL8157 5Gbit USB-Ethernet adapter supporting 5GBit/s for
performance, link speed and EEE negotiation. Using USB3.2 Gen 2 (20GBit) with
the RTL8159 USB adapter and running iperf3 against the AQC107 PCIe
card resulted in 8.96 Gbits/sec transfer speed.
The code is based on the out-of-tree r8152 driver published by Realtek under
the GPL.
The RTL8159 requires firmware for the PHY in order to achieve a 10GBit link
speed. Without firmware, only a 5GBit link was achieved. The firmware can be
extracted from the out-of-tree r8152 driver-code where it is stored in the
ram17 u8-array. Code is added to use the existing firmware upload mechanism
of the driver for the RTL8157/9 PHY firmware code. The firmware will be
submitted separately to linux-firmware.
====================
Birger Koblitz [Tue, 5 May 2026 15:56:35 +0000 (17:56 +0200)]
r8152: Add firmware upload capability for RTL8157/RTL8159
The RTL8159 (RTL_VER_17) requires firmware for its PHY in order to work
at connection speeds > 5GBit. Add support for uploading firmware for
the PHY using the existing rtl8152_apply_firmware() function
in r8157_hw_phy_cfg() and set up the correct names for the firmware
files.
This also adds support for uploading firmware for the RTL8157
(RTL_VER_16) PHY, although that chip does not strictly need firmware
to work. Still, this allows uploading newer versions of the firmware
used by this chip, e.g. to improve interoperability.
If no firmware is found, both the RTL8157 and the RTL8159 will continue
to work.
Birger Koblitz [Tue, 5 May 2026 15:56:34 +0000 (17:56 +0200)]
r8152: Add support for the RTL8159 chip
The RTL8159 re-uses the packet descriptor format introduced with the
RTL8157 and other hardware features of the RTL8157 (RTL_VER_16) such
as the SRAM access. The support therefore consists of extending the
existing RTL8157 code for initialization and USB power management
to also be used for the RTL8159 (RTL_VER_17).
Most of the additional code is added in r8157_hw_phy_cfg() to configure
the RTL8159 PHY.
Add support for the USB device ID of Realtek RTL8159-based adapters,
for which the product ID is 0x815a. Detect the RTL8159 as RTL_VER_17
and set it up.
Birger Koblitz [Tue, 5 May 2026 15:56:33 +0000 (17:56 +0200)]
r8152: Add support for 10Gbit Link Speeds and EEE
The RTL8159 supports 10GBit link speeds. Add support for this speed
to the link setup and to the ethtool get/set link settings methods.
Also add support for 10GBit EEE.
Jakub Kicinski [Thu, 7 May 2026 01:39:00 +0000 (18:39 -0700)]
Merge branch 'net-mlx5e-report-more-netdev-stats'
Tariq Toukan says:
====================
net/mlx5e: Report more netdev stats
This series by Gal extends the set of counters reported in netdev stats,
by adding:
- hw_gso_packets/bytes
- RX HW-GRO stats
- TX csum_none
- TX queue stop/wake
It also aligns the tso_bytes/tso_inner_bytes counters with the netdev
stats API and virtio spec definition.
====================
Gal Pressman [Mon, 4 May 2026 18:37:02 +0000 (21:37 +0300)]
net/mlx5e: Report RX HW-GRO netdev stats
Report RX hardware GRO statistics via the netdev queue stats API by
mapping the existing gro_packets, gro_bytes and gro_skbs counters to the
hw_gro_wire_packets, hw_gro_wire_bytes and hw_gro_packets fields.
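The mapping can be written as a small userspace sketch. The struct names below are hypothetical mirrors that contain only the counters and fields named above. They are not the mlx5e or netdev definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the per-RQ counters named in the commit. */
struct rq_gro_counters {
	uint64_t gro_packets;
	uint64_t gro_bytes;
	uint64_t gro_skbs;
};

/* Hypothetical subset of the queue stats fields named in the commit. */
struct rx_hw_gro_stats {
	uint64_t hw_gro_packets;
	uint64_t hw_gro_wire_packets;
	uint64_t hw_gro_wire_bytes;
};

static void report_hw_gro(const struct rq_gro_counters *rq,
			  struct rx_hw_gro_stats *st)
{
	assert(rq && st);
	st->hw_gro_wire_packets = rq->gro_packets;	/* packets as received */
	st->hw_gro_wire_bytes = rq->gro_bytes;
	st->hw_gro_packets = rq->gro_skbs;		/* coalesced skbs handed up */
}
```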
Gal Pressman [Mon, 4 May 2026 18:37:00 +0000 (21:37 +0300)]
net/mlx5e: Count full skb length in TSO byte counters
The tso_bytes and tso_inner_bytes counters currently subtract the header
length from skb->len, counting only the payload. This is confusing and
doesn't align with the behavior of other _bytes counters in the driver.
Report the full skb length to align with this expectation.
This also makes our behavior consistent with the netdev stats API and
virtio spec definition.
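The following sketch contrasts the two accounting rules. The helper names are hypothetical, and only the arithmetic reflects the change described above.

```c
#include <assert.h>
#include <stdint.h>

/* Before: only the payload after the headers was counted. */
static uint64_t tso_bytes_before(uint32_t skb_len, uint32_t hdr_len)
{
	assert(hdr_len <= skb_len);
	return skb_len - hdr_len;
}

/* After: the full skb length, like the driver's other _bytes counters. */
static uint64_t tso_bytes_after(uint32_t skb_len)
{
	return skb_len;
}
```

For example, take a 64066-byte TSO skb with 66 bytes of Ethernet, IPv4 and TCP headers (hypothetical values). The counter used to grow by 64000 and now grows by 64066.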
Justin Lai [Tue, 5 May 2026 06:41:21 +0000 (14:41 +0800)]
rtase: Fix flow control configuration
The hardware has two sets of registers controlling TX/RX flow control.
The effective flow control state is determined by the logical OR of
these two sets of bits.
RTASE_FORCE_TXFLOW_EN and RTASE_FORCE_RXFLOW_EN in RTASE_CPLUS_CMD are
the bits used by the driver to control TX/RX flow control according to
the ethtool pause configuration.
RTASE_TXFLOW_EN and RTASE_RXFLOW_EN in RTASE_GPHY_STD_00 are another
set of TX/RX flow control enable bits. Clear them by default so they do
not keep flow control enabled independently of the driver setting.
With the RTASE_GPHY_STD_00 bits cleared, the effective flow control
state is controlled through RTASE_CPLUS_CMD, so the ethtool setting can
take effect correctly.
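The OR behaviour can be modelled in userspace as shown below. The bit positions and helper names are hypothetical and stand in for the real RTASE_CPLUS_CMD and RTASE_GPHY_STD_00 layout. In the model, a set GPHY bit keeps pause enabled no matter what ethtool asks for, until the init path clears it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions standing in for the real register bits. */
#define FORCE_TXFLOW_EN	(1u << 0)	/* RTASE_FORCE_TXFLOW_EN, RTASE_CPLUS_CMD */
#define FORCE_RXFLOW_EN	(1u << 1)	/* RTASE_FORCE_RXFLOW_EN, RTASE_CPLUS_CMD */
#define TXFLOW_EN	(1u << 0)	/* RTASE_TXFLOW_EN, RTASE_GPHY_STD_00 */
#define RXFLOW_EN	(1u << 1)	/* RTASE_RXFLOW_EN, RTASE_GPHY_STD_00 */

struct fc_regs {
	uint16_t cplus_cmd;
	uint16_t gphy_std_00;
};

/* Effective state is the logical OR of the two register sets. */
static bool tx_pause_on(const struct fc_regs *r)
{
	return (r->cplus_cmd & FORCE_TXFLOW_EN) || (r->gphy_std_00 & TXFLOW_EN);
}

static bool rx_pause_on(const struct fc_regs *r)
{
	return (r->cplus_cmd & FORCE_RXFLOW_EN) || (r->gphy_std_00 & RXFLOW_EN);
}

/* Init-time fix: clear the GPHY bits so they cannot force pause on. */
static void fc_init(struct fc_regs *r)
{
	assert(r);
	r->gphy_std_00 &= (uint16_t)~(TXFLOW_EN | RXFLOW_EN);
}

/* The ethtool pause setting only drives the CPLUS_CMD bits. */
static void fc_set_pause(struct fc_regs *r, bool tx, bool rx)
{
	r->cplus_cmd &= (uint16_t)~(FORCE_TXFLOW_EN | FORCE_RXFLOW_EN);
	if (tx)
		r->cplus_cmd |= FORCE_TXFLOW_EN;
	if (rx)
		r->cplus_cmd |= FORCE_RXFLOW_EN;
}
```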
Jakub Kicinski [Wed, 6 May 2026 14:29:32 +0000 (07:29 -0700)]
Merge tag 'wireless-next-2026-05-06' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Johannes Berg says:
====================
Lots of new content in cfg80211/mac80211, notably
- more NAN work, mostly complete now (also hwsim)
- more UHR work (e.g. non-primary channel access),
this will continue for a while
- FTM ranging APIs
* tag 'wireless-next-2026-05-06' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (70 commits)
wifi: mac80211: explicitly disable FTM responder on AP stop
wifi: iwlwifi: don't blindly start the responder upon BSS_CHANGED_FTM_RESPONDER
wifi: mac80211_hwsim: claim HT STBC capability
wifi: mac80211_hwsim: enable NAN_DATA interface simulation support
wifi: mac80211_hwsim: Support Tx of multicast data on NAN
wifi: mac80211_hwsim: Do not declare support for NDPE
wifi: mac80211_hwsim: Declare support for secure NAN
wifi: mac80211_hwsim: add NAN data path TX/RX support
wifi: mac80211_hwsim: set HAS_RATE_CONTROL when using NAN
wifi: mac80211_hwsim: implement NAN schedule callbacks
wifi: mac80211_hwsim: add NAN PHY capabilities
wifi: mac80211_hwsim: add NAN_DATA interface limits
wifi: mac80211_hwsim: implement NAN synchronization
wifi: mac80211_hwsim: protect tsf_offset using a spinlock
wifi: mac80211_hwsim: only RX on NAN when active on a slot
wifi: mac80211_hwsim: select NAN TX channel based on current TSF
wifi: mac80211_hwsim: limit TX of frames to the NAN DW
wifi: cfg80211: don't allow NAN DATA on multi radio devices
wifi: mac80211: check AP using NPCA has NPCA capability
wifi: mac80211: don't parse full UHR operation from beacons
...
====================
Johannes Berg [Wed, 6 May 2026 09:32:32 +0000 (11:32 +0200)]
wifi: mac80211_hwsim: claim HT STBC capability
This is already claimed for VHT and HE, so it doesn't really
make sense not to claim it for HT. Not claiming it also causes
sigma-dut failures, since sigma-dut assumes VHT support implies
HT support.
Daniel Gabay [Wed, 6 May 2026 03:44:31 +0000 (06:44 +0300)]
wifi: mac80211_hwsim: enable NAN_DATA interface simulation support
Enable NAN_DATA interface simulation support by adding it to the
supported interface types. This completes the NAN Data Path
simulation introduced in the previous patches.
Ilan Peer [Wed, 6 May 2026 03:44:33 +0000 (06:44 +0300)]
wifi: mac80211_hwsim: Support Tx of multicast data on NAN
Add support for transmitting multicast data frames. These
frames can be transmitted when all the peer NDI stations
on the interface are available at the current slot.
Daniel Gabay [Wed, 6 May 2026 03:44:29 +0000 (06:44 +0300)]
wifi: mac80211_hwsim: add NAN data path TX/RX support
Implement TX and RX path handling for NAN Data Path (NDP) frames,
enabling data communication between NAN peers during scheduled
availability windows.
TX path:
- Select TX channel based on current time slot: use DW channel
during Discovery Windows, or FAW channel from local
schedule during Further Availability Windows.
- Verify peer availability before transmission by checking committed
DW schedule or FAW of the peer schedule.
RX path:
- Extend NAN receive filtering to handle NAN_DATA interface frames.
- Accept incoming frames during FAW slots when channel matches local
schedule.
Daniel Gabay [Wed, 6 May 2026 03:44:28 +0000 (06:44 +0300)]
wifi: mac80211_hwsim: set HAS_RATE_CONTROL when using NAN
- NAN switches between bands/channels per its schedule, so mac80211
rate control can't work, set HAS_RATE_CONTROL instead.
- Skip rate control checks for NAN interfaces in
mac80211_hwsim_sta_rc_update() as it's not relevant.
- Move set_rts_threshold stub to HWSIM_COMMON_OPS and return 0 instead
of -EOPNOTSUPP to prevent failures in non-MLO tests that set RTS
threshold (hwsim ignores the use_rts instruction from mac80211
anyway).
Daniel Gabay [Wed, 6 May 2026 03:44:27 +0000 (06:44 +0300)]
wifi: mac80211_hwsim: implement NAN schedule callbacks
Implement mac80211 schedule callbacks for NAN Data Path support:
- Track local schedule via BSS_CHANGED_NAN_LOCAL_SCHED, caching
the channel for each 16TU time slot.
- Copy peer schedule to driver-private storage in
nan_peer_sched_changed callback for use in TX availability
decisions.
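A per-slot channel cache can be indexed from the TSF as shown in the following userspace sketch. It assumes a schedule period of 512 TU, or 32 slots of 16 TU, with 1 TU = 1024 us. struct nan_local_sched, nan_slot() and nan_chan_at() are hypothetical names and do not come from mac80211_hwsim.

```c
#include <assert.h>
#include <stdint.h>

#define TU_USEC		1024u
#define SLOT_TU		16u
#define PERIOD_TU	512u			/* assumed schedule period */
#define NUM_SLOTS	(PERIOD_TU / SLOT_TU)	/* 32 slots */

/* Hypothetical cache: channel (center freq in MHz) per slot, 0 = none. */
struct nan_local_sched {
	uint32_t chan[NUM_SLOTS];
};

static unsigned int nan_slot(uint64_t tsf_usec)
{
	return (unsigned int)((tsf_usec / (SLOT_TU * TU_USEC)) % NUM_SLOTS);
}

static uint32_t nan_chan_at(const struct nan_local_sched *s, uint64_t tsf_usec)
{
	unsigned int slot = nan_slot(tsf_usec);

	assert(slot < NUM_SLOTS);
	return s->chan[slot];
}
```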
Daniel Gabay [Wed, 6 May 2026 03:44:26 +0000 (06:44 +0300)]
wifi: mac80211_hwsim: add NAN PHY capabilities
Add static HT, VHT and HE PHY capabilities to the NAN capabilities
structure. These are based on the existing band capability structures
and initialization in mac80211_hwsim.
The NAN PHY capabilities are used by mac80211 and nl80211 to
advertise device capabilities for NAN data interfaces.
Benjamin Berg [Wed, 6 May 2026 03:44:24 +0000 (06:44 +0300)]
wifi: mac80211_hwsim: implement NAN synchronization
Add all the handling to do NAN synchronization on 2.4 GHz including
sending out beacons. With this, the mac80211_hwsim NAN device also works
when used in conjunction with an external medium simulation.
Note that the TSF sync is not ideal in case of an external medium
simulation. This is because the mactime for received frames needs to be
estimated and the simulation may not update the timestamp of beacons
to the actual time that the frame was transmitted.
The implementation has an initial short phase where it scans for
clusters. This facilitates cluster joining and avoids creating a new
cluster immediately, which would result in two cluster join
notifications. It does not scan otherwise and will only see another
cluster appearing if a discovery beacon happens to be sent during the
2.4 GHz discovery window (DW).
Benjamin Berg [Wed, 6 May 2026 03:44:23 +0000 (06:44 +0300)]
wifi: mac80211_hwsim: protect tsf_offset using a spinlock
To implement NAN synchronization in hwsim, the TSF needs to be adjusted
regularly from the RX path. Add a spinlock so that this can be done in a
safe manner.
Benjamin Berg [Wed, 6 May 2026 03:44:22 +0000 (06:44 +0300)]
wifi: mac80211_hwsim: only RX on NAN when active on a slot
This moves the NAN receive into the main code and changes it so that
frame RX only happens when the device is active on the channel. This
limits RX to the DW slots as there is currently no datapath.
With this the globally stored channel is obsolete, remove it.
Benjamin Berg [Wed, 6 May 2026 03:44:20 +0000 (06:44 +0300)]
wifi: mac80211_hwsim: limit TX of frames to the NAN DW
Frames submitted on the NAN device interface should only be transmitted
during one of the discovery windows (DWs). It is assumed that software
submits frames from the DW end notifications for the next DW period.
Simulate this behaviour by checking that we are currently in a DW before
transmitting from ieee80211_hwsim_wake_tx_queue. As frames will be
queued up at the start of a DW, wake the management TX queue every time
a DW is started. Do so with a randomized offset just to avoid every
client transmitting at the same time.
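The following userspace sketch shows how a DW gate on the TX path could work. It assumes that the 2.4 GHz DW starts every 512 TU, when the low 19 bits of the TSF in microseconds are zero, and that it lasts 16 TU. in_24ghz_dw() and tx_dequeue() are hypothetical names, and the randomized wake offset is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DW_INTERVAL_USEC	(512u * 1024u)	/* 2^19 us, assumed DW period */
#define DW_LEN_USEC		(16u * 1024u)	/* assumed DW length */

static bool in_24ghz_dw(uint64_t tsf_usec)
{
	return (tsf_usec % DW_INTERVAL_USEC) < DW_LEN_USEC;
}

/* Hypothetical TX gate: queued frames are only sent inside a DW. */
static unsigned int tx_dequeue(unsigned int *queued, uint64_t tsf_usec)
{
	unsigned int sent;

	assert(queued);
	if (!in_24ghz_dw(tsf_usec))
		return 0;	/* hold frames until the next DW */
	sent = *queued;
	*queued = 0;
	return sent;
}
```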
Miri Korenblit [Tue, 5 May 2026 16:46:13 +0000 (19:46 +0300)]
wifi: cfg80211: don't allow NAN DATA on multi radio devices
The support for NAN DATA was added for single radio devices only. For
example, checking the interface combinations is done for a single radio.
Prevent registration with NAN DATA interface type for multi radio
devices.
The MANA driver can fail to load on systems with high memory
utilization because several allocations in the queue setup paths
require large physically contiguous blocks via kmalloc. Under memory
fragmentation these high-order allocations may fail, preventing the
driver from creating queues when opening the interface or when
reconfiguring channels, ring parameters or MTU at runtime.
This series addresses the issue by:
1. Converting the tx_qp flat array into an array of pointers with
per-queue kvzalloc (~35 KB each), replacing a single contiguous
allocation that can reach ~2.2 MB at 64 queues.
2. Switching rxbufs_pre, das_pre, and rxq allocations to
kvmalloc/kvzalloc so the allocator can fall back to vmalloc
when contiguous memory is unavailable.
Throughput testing confirms no regression. Since kvmalloc falls
back to vmalloc under memory fragmentation, all kvmalloc calls
were temporarily replaced with vmalloc to simulate the fallback
path (iperf3, GBits/sec):
Aditya Garg [Sat, 2 May 2026 07:45:34 +0000 (00:45 -0700)]
net: mana: Use kvmalloc for large RX queue and buffer allocations
The RX path allocations for rxbufs_pre, das_pre, and rxq scale with
queue count and queue depth. With high queue counts and depth, these can
exceed what kmalloc can reliably provide from physically contiguous
memory under fragmentation.
Switch these from kmalloc to kvmalloc variants so the allocator
transparently falls back to vmalloc when contiguous memory is scarce,
and update the corresponding frees to kvfree.
Aditya Garg [Sat, 2 May 2026 07:45:33 +0000 (00:45 -0700)]
net: mana: Use per-queue allocation for tx_qp to reduce allocation size
Convert tx_qp from a single contiguous array allocation to per-queue
individual allocations. Each mana_tx_qp struct is approximately 35KB.
With many queues (e.g., 32/64), the flat array requires a single
contiguous allocation that can fail under memory fragmentation.
Change mana_tx_qp *tx_qp to mana_tx_qp **tx_qp (array of pointers),
allocating each queue's mana_tx_qp individually via kvzalloc. This
reduces each allocation to ~35KB and provides vmalloc fallback,
avoiding allocation failure due to fragmentation.
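The array-of-pointers layout can be shown with a userspace sketch that uses calloc() where the driver uses kvzalloc(). struct tx_qp and struct port are hypothetical stand-ins sized to match the ~35KB figure above. In this layout the only multi-queue allocation is the small pointer array, and each queue gets its own allocation. A partial failure frees whatever was already allocated.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in; sized like the ~35KB mana_tx_qp. */
struct tx_qp {
	char payload[35 * 1024];
	int idx;
};

struct port {
	struct tx_qp **tx_qp;	/* array of pointers, not a flat array */
	unsigned int num_queues;
};

static void free_tx_qps(struct port *p)
{
	if (!p->tx_qp)
		return;
	for (unsigned int i = 0; i < p->num_queues; i++)
		free(p->tx_qp[i]);	/* free(NULL) is a no-op after partial setup */
	free(p->tx_qp);
	p->tx_qp = NULL;
}

static int alloc_tx_qps(struct port *p, unsigned int n)
{
	assert(n > 0);
	p->num_queues = n;
	p->tx_qp = calloc(n, sizeof(*p->tx_qp));	/* n pointers only */
	if (!p->tx_qp)
		return -1;
	for (unsigned int i = 0; i < n; i++) {
		p->tx_qp[i] = calloc(1, sizeof(**p->tx_qp));	/* one queue each */
		if (!p->tx_qp[i]) {
			free_tx_qps(p);
			return -1;
		}
		p->tx_qp[i]->idx = (int)i;
	}
	return 0;
}
```

With this layout, call sites that used `&apc->tx_qp[i]` would use `apc->tx_qp[i]` instead (apc is shown only as an example variable name).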
====================
selftests: rds: Log collection, TAP compliance and cleanups
This series is a set of bug fixes and improvements for the rds
selftests.
Patch 1 bumps the kselftest timeout from 400s to 800s. The original
limit was developed against a lean config, but the kselftest harness
counts boot time and gcov log collection against the limit, so a
default config with gcov enabled needs more headroom.
Patch 2 corrects some typos in the run.sh USAGE string and removes an
unused "-g" flag.
Patch 3 silences a handful of pylint warnings in test.py: it adds a
module docstring, suppresses the warnings tied to the sys.path.append
import trick, marks the long lived tcpdump Popen with disable-next
consider-using-with, and drops unused exception variables from two
BlockingIOError except clauses.
Patch 4 adds a -t flag to run.sh so the timeout can be overridden
if needed.
Patch 5 adds a RDS_LOG_DIR environment variable that specifies where
logs should be stored, or skips log collection if left unset.
Patch 6 adds a SUDO_USER environment variable that sets the user
for tcpdump --relinquish-privileges. This avoids the permissions
drop that would otherwise leave pcaps empty on 9pfs, since 9p does
not support chown.
Patch 7 removes the initial tmp tcpdumps and instead saves the pcaps
directly to the logdir if it is set.
Patch 8 hoists the tcpdump shutdown into a helper and calls it from the
timeout signal handler so that the processes are properly terminated
and dumps are flushed.
Patch 9 fixes gcov collection by ensuring debugfs is mounted, and
specifying the --root folder so that gcov can still find the kernel
source when it is run from the ksft test directory.
Patch 10 makes the test output TAP compliant so the kselftest runner
parses results correctly.
====================
This patch updates the rds selftests output to be TAP compliant.
Use ksft_pr() to mark debug output with a leading '# ' so that TAP
parsers treat it as commentary, and convert all informational print()
calls to use ksft_pr(). sys.exit(0) is changed to os._exit(0) to
avoid duplicate prints from the buffered TAP output. The console
output from the tcpdump subprocess is silenced, and the gcov console
output is redirected to a gcovr.log.
Finally, adjust the exit path so that the hash check loop sets a
return code instead of exiting directly. Then print the TAP results
and totals lines before exiting.
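The relevant TAP line shapes are shown in the following C sketch. The test names are hypothetical, and the real selftest emits these lines from Python via the kselftest helpers. Result lines start with "ok" or "not ok", the plan line announces the count, and any line starting with "# " is treated as commentary rather than a result.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One TAP result line, e.g. "ok 1 tcp" or "not ok 2 tcp_loss". */
static int tap_result(char *buf, size_t len, int num, int passed,
		      const char *name)
{
	assert(num > 0);
	return snprintf(buf, len, "%sok %d %s", passed ? "" : "not ", num, name);
}

/* Debug output gets a leading "# " so parsers treat it as commentary. */
static int tap_comment(char *buf, size_t len, const char *msg)
{
	return snprintf(buf, len, "# %s", msg);
}

/* The plan line announcing how many results to expect. */
static int tap_plan(char *buf, size_t len, int count)
{
	return snprintf(buf, len, "1..%d", count);
}

/* How a TAP parser tells commentary apart from results. */
static int is_tap_comment(const char *line)
{
	return strncmp(line, "# ", 2) == 0;
}
```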
debugfs may not be mounted automatically in a virtme-ng guest,
depending on whether the host OS mounts it by default. When it is not
mounted, the gcov data copy from /sys/kernel/debug/gcov/ silently finds
nothing.
Fix this by mounting debugfs in run.sh before copying the gcda
files.
Finally, when invoked through the kselftest runner, the working
directory is the test directory rather than the kernel source root.
gcovr defaults --root to the current working directory, which causes
it to filter out all coverage data for files under net/rds/ since
they are not under the test directory. Fix this by passing --root
to gcovr explicitly.
The timeout signal handler for the rds selftests currently just
exits when the time limit is exceeded, and forgets to stop the
network dumps. Fix this by hoisting the tcpdump terminate commands
into a helper function, and call it from the signal handler before
exiting.
Bound proc.wait() with a timeout (and fall back to proc.kill())
so an unresponsive tcpdump cannot hang the timeout path itself.
We also pop() tcpdump_procs as we iterate, so stop_pcaps() is safe
to call from both the normal cleanup path and the signal handler,
since the second invocation simply has nothing to do.
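The pop-as-you-go pattern is modelled in C below. struct capture, stop_one() and stop_pcaps() are hypothetical stand-ins for the Python helper and its tcpdump subprocesses. The "responsive" flag models whether the bounded wait succeeds before the kill fallback. Because each entry is removed as it is stopped, a second call finds an empty list.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CAPTURES 8

/* Hypothetical stand-in for one tcpdump subprocess. */
struct capture {
	bool responsive;	/* exits within the bounded wait after terminate */
	bool terminated;
	bool killed;
};

struct capture_list {
	struct capture *procs[MAX_CAPTURES];
	int count;
};

/* terminate(), bounded wait(), then kill() if the process did not exit. */
static void stop_one(struct capture *c)
{
	c->terminated = true;
	if (!c->responsive)
		c->killed = true;
}

/* Popping each entry makes a second call (signal handler, then cleanup) a no-op. */
static int stop_pcaps(struct capture_list *l)
{
	int stopped = 0;

	assert(l->count >= 0 && l->count <= MAX_CAPTURES);
	while (l->count > 0) {
		stop_one(l->procs[--l->count]);
		stopped++;
	}
	return stopped;
}
```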
This patch modifies rds selftests to use the environment variable
SUDO_USER for tcpdumps if it is set. This is needed to avoid chown
operations on the vng 9pfs, which does not support them. Passing a user
listed in sudoers avoids the tcpdump privilege drop, which may
otherwise create empty pcaps.
This patch modifies the rds selftest to look for an env variable
RDS_LOG_DIR, and log all traces, pcaps and gcov collections to
the folder specified in RDS_LOG_DIR. If RDS_LOG_DIR is unset,
logs are not collected.
Add a -t flag to run.sh to optionally override the default
timeout. The --timeout flag is already supported in test.py,
so just add the shorthand -t flag.
This patch fixes a few pylint errors in test.py. Remove unused exception
variables from except blocks, and disable warnings for imports that cannot
appear at the start of the module. Also disable warnings for the
tcpdump processes. The suggestion to use a with block does not apply
here since the process needs to outlive the parent to collect the dumps.
Lastly add the module docstring at the top of the module.
The 400s timeout was originally developed under a leaner
kernel config that booted much faster than a default config.
Boot up is included as part of the overall test runtime, as
well as any log collection done when the test is complete.
A slower config combined with the gcov enabled test means
we'll need more time to accommodate the boot up and log
collection. So, bump the timeout to 800s.
====================
Fixes for mv88e6xxx for 6320/6321 family
Five fixes for mv88e6xxx for 6320/6321 family, for net-next,
without Fixes tags, as per Andrew's request last year, see
https://lore.kernel.org/netdev/20250313134146.27087-1-kabel@kernel.org/
====================
Marek Behún [Mon, 4 May 2026 15:32:27 +0000 (17:32 +0200)]
net: dsa: mv88e6xxx: enable devlink ATU hash param for 6320 family
Commit 23e8b470c7788 ("net: dsa: mv88e6xxx: Add devlink param for ATU
hash algorithm.") introduced ATU hash algorithm access via devlink, but
did not enable it for the 6320 family. Do it now.
Marek Behún [Mon, 4 May 2026 15:32:25 +0000 (17:32 +0200)]
net: dsa: mv88e6xxx: define .pot_clear() for 6321
Commit 9e907d739cc3 ("net: dsa: mv88e6xxx: add POT operation") did not
add the .pot_clear() method to the 6321 switch operations structure.
Add it now.
====================
selftests: drv-net: convert so_txtime to drv-net
In preparation for extending to pacing hardware offload, convert the
so_txtime.sh test to a drv-net test that can be run against netdevsim
and real hardware.
Two preparatory patches:
1. support negative tests, where tests are expected to fail
2. add a tc helper
See the individual patches for details and the detailed changelog.
====================
In preparation for extending to pacing hardware offload, convert the
so_txtime.sh test to a drv-net test that can be run against netdevsim
and real hardware.
Also update so_txtime.c to not exit on first failure, but run to
completion and report exit code there. This helps with debugging
unexpected results, especially when processing multiple packets,
as happens in the "reverse_order" testcase.
Signed-off-by: Willem de Bruijn <willemb@google.com>
----
v6 -> v7
- update test to use new argument expect_fail
- v6 received Reviewed-by, but dropped due to above (minor) change
v5 -> v6
- fix order in tools/testing/selftests/drivers/net/config
v4 -> v5
- move qdisc setup/restore into each test
- add tc to utils.py (separate patch)
- test expected failure (separate patch)
- fix pylint
- convert fail to pass for timing errors if KSFT_MACHINE_SLOW
(cmd does not special case KSFT_SKIP process returncode yet)
Responses to sashiko review
- The test converts per packet failure to errors, to continue
testing other packets, but other error() cases are not in scope.
- The test starts sender and receiver at an absolute future time,
like the original test. This assumes ~msec scale sync'ed clocks.
- The tc qdisc replace command works fine with noqueue. Tested
manually.
v3 -> v4
- restore original qdisc after test
- drop unnecessary underscore in tap test names
v2 -> v3
- Makefile: so_txtime from YNL_GEN_FILES to TEST_GEN_FILES (Sashiko, NIPA)
v1 -> v2
- move so_txtime.c from net/lib to drivers/net (Jakub)
- fix drivers/net/config order (Jakub)
- detect passing when failure is expected (Jakub, Sashiko)
- pass pylint --disable=R (Jakub)
- only call ksft_run once (Jakub)
- do not sleep if waiting time is negative (Sashiko)
- add \n when converting error() to fprintf() (Sashiko)
- use 4-space indentation instead of 2-space
- increase sync delay from 100 to 200ms, to fix rare vng flakes
The .port_max_speed_mode() method is not used anymore since commit 40da0c32c3fc ("net: dsa: mv88e6xxx: remove handling for DSA and CPU ports").
Drop it.
====================
udp_tunnel: Speed up UDP tunnel device destruction (Part I)
Most of the UDP tunnel devices call synchronize_rcu() twice
during destruction; for example, vxlan has:
1) synchronize_rcu() in udp_tunnel_sock_release()
2) synchronize_net() in vxlan_sock_release()
The goal of this series is to remove the former; a follow-up
series will remove the latter.
synchronize_rcu() was added in udp_tunnel_sock_release() by
commit 3cf7203ca620 ("net/tunnel: wait until all sk_user_data
reader finish before releasing the sock").
This was intended to protect the fast path of a dying vxlan
from dereferencing vxlan_sock->sock->sk after sock_orphan()
has set sock->sk to NULL.
Most of the UDP tunnel devices store struct socket in their
private structs, but it is NOT needed in the fast paths;
struct sock is used there, while struct socket is only used
for tunnel setup / teardown.
This is probably because UDP tunnel functions accept struct
socket, but even such functions do not need it, except for
udp_tunnel_sock_release(), which can safely access sk->sk_socket.
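A minimal userspace sketch of the fast-path difference described above. The structs and functions (mock_socket, mock_sock, tunnel_priv_old, tunnel_priv_new, mock_release(), user_data_old(), user_data_new()) are hypothetical stand-ins, not kernel definitions; they only model the pointer chase. Once the release path clears the socket's sk pointer, a reader going through struct socket can observe NULL, while a reader holding struct sock directly is unaffected.

```c
#include <stddef.h>

/* Hypothetical stand-ins: only the fields needed for the pointer chase. */
struct mock_sock;

struct mock_socket {
	struct mock_sock *sk;
};

struct mock_sock {
	struct mock_socket *sk_socket;
	void *sk_user_data;
};

/* Old layout: the tunnel's private struct keeps struct socket. */
struct tunnel_priv_old {
	struct mock_socket *sock;
};

/* New layout: the tunnel's private struct keeps struct sock. */
struct tunnel_priv_new {
	struct mock_sock *sk;
};

/* Model of socket release clearing both back-pointers. */
static void mock_release(struct mock_socket *sock)
{
	struct mock_sock *sk = sock->sk;

	sk->sk_socket = NULL;	/* the sock is orphaned */
	sock->sk = NULL;	/* the socket drops its sock */
}

/* Old fast path: two dereferences; the middle pointer may be NULL.
 * The NULL check stands in for the crash a real fast path would hit. */
static void *user_data_old(const struct tunnel_priv_old *p)
{
	struct mock_sock *sk = p->sock->sk;

	return sk ? sk->sk_user_data : NULL;
}

/* New fast path: a single dereference that does not depend on the
 * socket's sk pointer. */
static void *user_data_new(const struct tunnel_priv_new *p)
{
	return p->sk->sk_user_data;
}
```

In this model, calling mock_release() makes user_data_old() return NULL, while user_data_new() still returns the user data.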
The overview of the series:
Patch 1 - 5 : Convert UDP tunnel helper to take struct sock
Patch 6 : Small fix for a 10-year-old bug
Patch 7 - 14 : Store struct sock in tunnel devices
Patch 15 : Remove synchronize_rcu() in udp_tunnel_sock_release()
With this change, a script creating/upping vxlan in 4000 netns
runs 10x faster.
====================
udp_tunnel: Remove synchronize_rcu() in udp_tunnel_sock_release().
Commit 3cf7203ca620 ("net/tunnel: wait until all sk_user_data
reader finish before releasing the sock") added synchronize_rcu()
in udp_tunnel_sock_release().
This was intended to protect the fast path of a dying vxlan device
from dereferencing vxlan_sock->sock->sk after sock_orphan() has set
sock->sk to NULL.
However, vxlan does not need to access struct socket itself
in the fast path; it only reads struct sock, and struct socket
is only used for tunnel setup and teardown.
This applies to all other UDP tunnel users, and they have been
converted to access struct sock directly.
In addition, each device-specific struct used in their fast paths
is freed after one RCU grace period. Since this occurs after
udp_tunnel_sock_release(), the struct is guaranteed to be freed
after struct udp_sock.
Therefore, synchronize_rcu() in udp_tunnel_sock_release() is
now redundant.
Let's remove it.
Tested:
A script creating/upping vxlan devices in 4000 netns runs 10x
faster with this change. We can see the same improvement with
other UDP tunnel devices as well.
$ cat vxlan.sh
for i in `seq 1 40`
do
    (for j in `seq 1 100` ; do
        unshare -n bash -c "ip link add vxlan0 type vxlan id 100 local 127.0.0.1 dstport 4789 && ip link set vxlan0 up";
    done) &
done
wait
With bpftrace, we can see vxlan_stop() is significantly faster.
tipc udp_bearer does not need to access struct socket itself in
the fast path; it only reads struct sock, and struct socket is
only used for tunnel setup and teardown.
Let's store struct sock directly in struct udp_bearer.
Note that cleanup_bearer() calls synchronize_net() after
udp_tunnel_sock_release(), so udp_bearer is not freed until
inflight fast paths finish.
Note also that synchronize_rcu() is added in the error path
of tipc_udp_enable() since udp_bearer will be kfree()d
immediately once we remove synchronize_rcu() in
udp_tunnel_sock_release().
pfcp does not need to access struct socket itself in the fast
path; it only reads struct sock, and struct socket is only used
for tunnel setup and teardown.
Let's store struct sock directly in struct pfcp_dev.
pfcp_del_sock() is called from dev->netdev_ops->ndo_uninit().
The 2nd synchronize_net() in unregister_netdevice_many_notify()
ensures that inflight pfcp RX fast paths finish before pfcp_dev
is freed.
Note that synchronize_rcu() is added in the error path of
pfcp_newlink() since free_netdev() will free pfcp_dev immediately
once we remove synchronize_rcu() in udp_tunnel_sock_release().
amt does not need to access struct socket itself in the fast path;
it only reads struct sock, and struct socket is only used for tunnel
setup and teardown.
Let's store struct sock directly in struct amt.
amt_dev_stop() is called as dev->netdev_ops->ndo_stop().
synchronize_net() in unregister_netdevice_many_notify() ensures
that inflight amt RX fast paths finish before amt_dev is freed.
amt no longer needs synchronize_rcu() in udp_tunnel_sock_release().
Note that amt_dev_stop() looks buggy; cancel_delayed_work_sync()
should be called after udp_tunnel_sock_release().
fou does not need to access struct socket itself in the fast
path; it only reads struct sock, and struct socket is only used
for tunnel setup and teardown.
Let's store struct sock directly in struct fou.
fou_release() frees struct fou with kfree_rcu(), so fou no
longer needs synchronize_rcu() in udp_tunnel_sock_release().
Note that the error path in fou_create() looks buggy; once the
tunnel is set up and fou_add_to_port_list() fails, struct fou
should be freed with kfree_rcu() _after_ udp_tunnel_sock_release().
bareudp does not need to access struct socket itself in the fast
path; it only reads struct sock, and struct socket is only used
for tunnel setup and teardown.
Let's store struct sock directly in struct bareudp_dev.
bareudp_sock_release() is called from dev->netdev_ops->ndo_stop().
synchronize_net() in unregister_netdevice_many_notify() ensures that
inflight bareudp RX fast paths finish before bareudp_dev is freed.
bareudp no longer needs synchronize_rcu() in udp_tunnel_sock_release().
geneve does not need to access struct socket itself in the fast
path; it only reads struct sock, and struct socket is only used for
tunnel setup and teardown.
Let's store struct sock directly in struct geneve_sock.
__geneve_sock_release() frees geneve_sock with kfree_rcu(), so
geneve no longer needs synchronize_rcu() in udp_tunnel_sock_release().
Commit 3cf7203ca620 ("net/tunnel: wait until all sk_user_data
reader finish before releasing the sock") added synchronize_rcu()
in udp_tunnel_sock_release().
This was intended to protect the fast path of a dying vxlan device
from dereferencing vxlan_sock->sock->sk after sock_orphan() has set
sock->sk to NULL.
However, vxlan does not need to access struct socket itself in the
fast path; it only reads struct sock, and struct socket is only
used for tunnel setup and teardown.
Let's store struct sock directly in struct vxlan_sock.
In the next patch, we will free vxlan_sock with kfree_rcu(), after which
vxlan no longer needs synchronize_rcu() in udp_tunnel_sock_release().
udp_tunnel: Pass struct sock to udp_tunnel_sock_release().
None of the udp_tunnel users need struct socket in their
fast paths; it is only used for tunnel setup / teardown.
Because the UDP tunnel interface accepts struct socket, it
encourages users to store that pointer unnecessarily. This
leads to extra dereferences when accessing struct sock fields
(e.g., sock->sk->sk_user_data instead of sk->sk_user_data).
Furthermore, these dereferences necessitate synchronize_rcu()
in udp_tunnel_sock_release() to protect the fast paths from
sock_orphan() setting sk->sk_socket to NULL.
This overhead can be avoided if users store the struct sock
pointer directly in their private structures.
As a prep, let's change udp_tunnel_sock_release() to take
struct sock instead of struct socket.
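A sketch of the calling-convention change, using hypothetical stand-in structs (release_by_socket() and release_by_sock() are illustrative names, not the real udp_tunnel_sock_release()). The helper takes struct sock and recovers struct socket through sk->sk_socket, so callers no longer need to keep a struct socket pointer around.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel's struct socket/struct sock. */
struct mock_sock;

struct mock_socket {
	struct mock_sock *sk;
	int released;			/* set once released */
};

struct mock_sock {
	struct mock_socket *sk_socket;
};

/* New style: the caller passes struct sock; the socket is still
 * reachable through sk->sk_socket at this point. */
static void release_by_sock(struct mock_sock *sk)
{
	struct mock_socket *sock = sk->sk_socket;

	assert(sock != NULL);
	sk->sk_socket = NULL;		/* orphan the sock */
	sock->sk = NULL;
	sock->released = 1;
}

/* Old style, expressed as a thin wrapper: callers had to keep a
 * struct socket pointer only to hand it to the release helper. */
static void release_by_socket(struct mock_socket *sock)
{
	release_by_sock(sock->sk);
}
```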
Johannes Berg [Tue, 28 Apr 2026 09:25:41 +0000 (11:25 +0200)]
wifi: mac80211: don't parse full UHR operation from beacons
Currently, as noted in the comment, ieee80211_uhr_oper_size_ok()
will reject the element coming from the beacon, since it's too
short. However, this cannot be relied on in general: the element
is extensible, so extensions could be present in a beacon, and
then the element might pass the size check anyway.
Using the frame type we now have in the element parse result,
check that it's not coming from a beacon. The size was already
checked (according to frame type) during parsing.
Johannes Berg [Tue, 28 Apr 2026 09:25:40 +0000 (11:25 +0200)]
wifi: cfg80211: separate NPCA validity from chandef validity
When considering both NPCA and DBE, it can appear that the
NPCA configuration is invalid, e.g. for an 80 MHz BSS channel
with DBE to 160 MHz:
  | primary channel
  |       | NPCA primary channel
  |       |
  V       V
| p |   | n |   |   |   |   |   |
|  BSS channel  |
|          DBE channel          |
Now the NPCA primary channel is in the same half as the primary
channel, and the NPCA puncturing bitmap could be completely
invalid as a puncturing bitmap when considering the overall
channel.
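The diagram can be checked with plain index arithmetic. In this sketch (same_half() is a hypothetical helper, not cfg80211 code), p is 20 MHz subchannel 0 and n is subchannel 2: they fall in different halves of the 4-subchannel BSS channel but in the same half of the 8-subchannel DBE channel, which is why checking NPCA against the overall channel can make the configuration look invalid.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not the cfg80211 code: subchannels are 20 MHz
 * units indexed from the lowest frequency, and the channel is n_sub
 * units wide (4 for 80 MHz, 8 for 160 MHz).
 */
static bool same_half(unsigned int a, unsigned int b, unsigned int n_sub)
{
	unsigned int half = n_sub / 2;

	assert(n_sub >= 2 && a < n_sub && b < n_sub);
	return a / half == b / half;
}
```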
Split out the validity checks from cfg80211_chandef_valid() to
a new cfg80211_chandef_npca_valid() function that just checks
the NPCA configuration against the BSS chandef.
Johannes Berg [Tue, 28 Apr 2026 09:25:38 +0000 (11:25 +0200)]
wifi: mac80211: mlme: use NPCA chandef if capable
If the device is capable, parse the AP chandef with NPCA.
Also advertise the other NPCA operational parameters to the
underlying driver and track whether they change (though not yet
with BSS critical update etc.).
Since NPCA can only be enabled when the chanctx isn't shared,
the channel context code needs to clear/set npca.enabled in
the per-link configuration, except during association since
we can't enable NPCA before having completed association. In
this case, set npca.enabled during the association process.
Johannes Berg [Tue, 28 Apr 2026 09:25:37 +0000 (11:25 +0200)]
wifi: mac80211: allow only AP chanctx sharing with NPCA
When two interfaces share a channel context, disable NPCA
unless both are AP interfaces that require NPCA. This way,
two AP interfaces can have identical chandefs set up and
share the channel context, but any non-APs cannot share a
chanctx with NPCA (they'd almost certainly have different
BSS colors).
This doesn't mean the chanctx cannot be shared but rather
that NPCA will be disabled on the shared channel context.
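A minimal sketch of the sharing rule, assuming hypothetical types (hyp_iftype, hyp_chanctx_user and shared_chanctx_npca() are illustrative, not mac80211 structures). It only models the interface-type check; the requirement that the APs use identical chandefs is handled separately and is omitted here.

```c
#include <stdbool.h>

/* Hypothetical types; mac80211's real structures differ. */
enum hyp_iftype { HYP_IFTYPE_AP, HYP_IFTYPE_STATION };

struct hyp_chanctx_user {
	enum hyp_iftype iftype;
	bool requires_npca;
};

/* NPCA stays enabled on a shared channel context only if both users
 * are AP interfaces that require NPCA; otherwise the context is still
 * shared, just with NPCA disabled. */
static bool shared_chanctx_npca(const struct hyp_chanctx_user *a,
				const struct hyp_chanctx_user *b)
{
	return a->iftype == HYP_IFTYPE_AP && a->requires_npca &&
	       b->iftype == HYP_IFTYPE_AP && b->requires_npca;
}
```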
Johannes Berg [Tue, 28 Apr 2026 09:25:34 +0000 (11:25 +0200)]
wifi: mac80211: use NPCA in chandef for validation
Put the NPCA parameters into a chandef when parsing data from
the AP to validate them using the cfg80211 code, rather than
implementing that in mac80211 directly.
Note that the parameters are not applied yet, since mac80211
doesn't yet have NPCA support.
Johannes Berg [Tue, 28 Apr 2026 09:25:33 +0000 (11:25 +0200)]
wifi: cfg80211: add helper for parsing NPCA to chandef
Add a cfg80211_chandef_add_npca() helper function that takes an
existing chandef without NPCA and sets the NPCA information from
the format used in UHR operation and UHR Parameters Update.
Johannes Berg [Tue, 28 Apr 2026 09:25:32 +0000 (11:25 +0200)]
wifi: cfg80211: allow representing NPCA in chandef
Add the necessary fields to the chandef data structure
to represent NPCA (the NPCA primary channel and NPCA
punctured/disabled subchannels bitmap), along with code
to check them for validity and compatibility and to
allow passing them in AP mode for capable devices.
Two NPCA configurations are only considered compatible
when they are identical. This enables later management
of NPCA in channel contexts in mac80211 for multiple APs,
but requires userspace to set up the identical chandef on
all AP interfaces that share a channel (and BSS color).
Johannes Berg [Tue, 28 Apr 2026 09:25:31 +0000 (11:25 +0200)]
wifi: mac80211: carry element parsing frame type/from_ap
Carry the frame type and from_ap indication in the parse
result. The caller should already have them, but we often
pass the resulting data structure around, so this saves
passing more parameters.
If the AP has extended MLD capa/ops, we may advertise our own
from userspace. Also add the driver's capa/ops in this case. This is
fine since the only one right now from the driver is UHR ML-PM
and that's only relevant if the AP already has it too.
Johannes Berg [Tue, 28 Apr 2026 09:06:59 +0000 (11:06 +0200)]
wifi: cfg80211: allow devices to advertise extended MLD capa/ops
For UHR, the multi-link power-management capability lives there,
so hostapd needs to know what to advertise, and on clients it
should be shown to userspace for information.
Repurpose the existing NL80211_ATTR_ASSOC_MLD_EXT_CAPA_OPS by
renaming it to NL80211_ATTR_EXT_MLD_CAPA_AND_OPS (with a define
for compatibility) and advertise the capabilities.
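The rename-with-compatibility-define pattern can be sketched as follows. The names are prefixed with HYP_ and the enum values are made up, so this is a hypothetical excerpt rather than the real nl80211.h; it only shows how the old name keeps resolving to the new attribute.

```c
/* Hypothetical excerpt; names are prefixed and values made up, so this
 * is not the real nl80211.h. */
enum hyp_nl80211_attrs {
	HYP_NL80211_ATTR_UNSPEC,
	HYP_NL80211_ATTR_EXT_MLD_CAPA_AND_OPS,	/* new name */
};

/* Old name kept as an alias so existing userspace still compiles. */
#define HYP_NL80211_ATTR_ASSOC_MLD_EXT_CAPA_OPS \
	HYP_NL80211_ATTR_EXT_MLD_CAPA_AND_OPS
```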
We can also later use the value, if needed, to set per-station
capabilities on STAs added to AP interfaces.
Johannes Berg [Tue, 28 Apr 2026 09:06:58 +0000 (11:06 +0200)]
wifi: cfg80211: ensure UHR ML-PM flag is consistent
We check that extended MLD capabilities and operations are
consistent across APs in an AP MLD, but didn't check reserved
fields since they could be defined to differ. Check bit 8 now
since it's defined by UHR to be consistent.
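A minimal sketch of the bit 8 consistency check across the APs of an AP MLD. The macro and function names are hypothetical, the field is treated here as 16 bits, and only the bit 8 comparison is modelled; the existing checks on the other fields are omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 8 of the extended MLD capabilities and operations field, the
 * UHR ML-PM flag; the macro name is hypothetical. */
#define HYP_EXT_MLD_UHR_MLPM	(1u << 8)

/* Return true if all affiliated APs agree on bit 8. */
static bool mlpm_consistent(const uint16_t *ext_mld_capa_ops, size_t n_aps)
{
	for (size_t i = 1; i < n_aps; i++) {
		if ((ext_mld_capa_ops[i] ^ ext_mld_capa_ops[0]) &
		    HYP_EXT_MLD_UHR_MLPM)
			return false;
	}
	return true;
}
```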
Johannes Berg [Tue, 28 Apr 2026 09:06:57 +0000 (11:06 +0200)]
wifi: mac80211: track AP's extended MLD capa/ops
For UHR multi-link power management, the driver/device needs
to know if the AP supports it, to be able to use it. Track
the AP's extended MLD capabilities and operations so it does.