Nidhish A N [Sun, 19 Oct 2025 08:45:14 +0000 (11:45 +0300)]
wifi: iwlwifi: mld: Move EMLSR prints to IWL_DL_EHT
Modify EMLSR debug prints to use IWL_DL_EHT instead
of IWL_DL_INFO. This will allow better communication
with validation as they might enable only IWL_DL_EHT
or IWL_DL_INFO as required.
Add prints to log attempt to switch links when
missed beacons exceed threshold.
Print both link ids and missed beacons when in EMLSR
mode.
Johannes Berg [Sun, 19 Oct 2025 08:45:10 +0000 (11:45 +0300)]
wifi: iwlwifi: mvm/mld: report non-HT frames as 20 MHz
Non-HT frames can only be encoded in 20 MHz, however, they
could be duplicated on all/some of the subchannels (mostly
used for RTS/CTS), in which case the firmware will report
and estimate of the overall used bandwidth based on energy
detected. This could be confusing so don't report it that
way, always use 20 MHz for non-HT/legacy frames instead.
Note that currently the value doesn't appear to be used by
mac80211, it never checks the bandwidth field for legacy
encodings.
Johannes Berg [Sun, 19 Oct 2025 08:45:09 +0000 (11:45 +0300)]
wifi: iwlwifi: bump core version for BZ/SC/DR
Start supporting Core 100 FW on those devices.
In addition, the move to the new Core scheme (instead of API scheme)
will start Core 100 and not 99, as planned. Adjust for that as well.
Johannes Berg [Sun, 19 Oct 2025 08:45:08 +0000 (11:45 +0300)]
wifi: iwlwifi: fix build when mvm/mld not configured
When neither mvm nor mld are configured, we don't have the
iwl_bz_mac_cfg symbol and thus cannot check for it. But in
that case the relevant device entries aren't and cannot be
present, so just ifdef the test code for that.
Miri Korenblit [Sun, 19 Oct 2025 08:45:07 +0000 (11:45 +0300)]
wifi: iwlwifi: mld: check the validity of noa_len
Validate iwl_probe_resp_data_notif::noa_attr::len_low since we are using
its value to determine the noa_len, which is later used for the NoA
attribute.
wifi: iwlwifi: stop checking the firmware's error pointer
It is not very clear what values we put in min_umac_error_event_table.
For Ma (Meteor Lake), this value is wrong and we get the print:
Not valid error log pointer ...
wifi: iwlwifi: be more chatty when we fail to find a wifi7 device
All wifi7 devices need CONFIG_IWLMLD to be enabled.
If we can't support the wifi7 device and the module is not enabled,
complain to the user.
The check in iwl_req_fw_callback is then no longer required.
Johannes Berg [Sun, 19 Oct 2025 08:45:02 +0000 (11:45 +0300)]
wifi: iwlwifi: mld: update to new sniffer API
This will break current sniffer functionality for firmware
versions that don't have the new API, but supporting both
would be very complex. Convert the code to use only the new
sniffer notification.
The iwlmvm op mode is used for pre EHT HWs. Those HWs doesn't have wider
OFDMA, so phy command versions 5+ (that added support for wider OFDMA)
are not supported. Remove support for them.
This means that we also don't need to set the
IEEE80211_VIF_IGNORE_OFDMA_WIDER_BW, as we don't care about the ap
chandef anyway.
Johannes Berg [Mon, 15 Sep 2025 08:34:28 +0000 (11:34 +0300)]
wifi: iwlwifi: tests: check listed PCI IDs have configs
Add a test that checks, for the old pre-CNVI devices, that
PCI IDs listed in the PCI IDs table will also match in the
config table. Newer ones we test against our database of
devices, but the current database doesn't go back that far,
so at least this checks against the PCI IDs the driver has.
wifi: iwlwifi: iwlmld is always used for wifi7 devices
iwlmld is used since API 97 and for wifi7 devices.
Since APIs < 97 are no longer supported on such devices,
we can remove the API check and always load iwlmld for the wifi7
devices.
wifi: iwlwifi: mld: reschedule check_tpt_wk also not in EMLSR
When the throughput count reaches the threshold, EMLSR is no longer
blocked by throughput.
This doesn't mean that EMLSR will be activated immediately, since there
might be other reasons that block EMLSR.
When the throughput blocker is not set, check_tpt_wk should run every 5
seconds and check if the throughput blocker should be set (if the
throughtput counter dropped).
If not, it should reschedule itself.
In the current code, the worker will reschedule itself only if we are in
EMLSR. This is wrong, since we might be in a case where the throughput
blocker is not set but we are not in EMLSR, and then we will never check
again the throughput counters (and block EMLSR if needed).
Fix this by rescheduling the worker also when EMLSR is not active.
wifi: cfg80211: Add parameters to radio-specific debugfs directories
In multi-radio wiphy architecture, where a single wiphy can have
multiple radios tied to it, radio specific configuration parameters
and global wiphy parameters are maintained for the entire physical
device and common to all radios. But, each radio in a wiphy can
have different values for each radio configuration parameter like
RTS threshold. With the current debugfs directory structure, the
values of global wiphy configuration parameters can be viewed, but,
values of individual radio configuration parameters cannot be viewed.
To address this requirement, maintain separate entries of each radio
configuration parameter i.e., RTS threshold in corresponding radio-
specific debugfs directory. This way, radio-specific configuration
parameters can be maintained along with global wiphy configuration
parameters. Whenever the values are changed for one radio, the values
for rest of the radios in the wiphy and the global wiphy parameter
value will remain intact.
Sample output:
/# iw phy#0 set rts 100 radio 1
/# iw phy#0 set rts 468 radio 0
/# cat /sys/kernel/debug/ieee80211/phy0/rts_threshold
-1
/# cat /sys/kernel/debug/ieee80211/phy0/radio0/radio_rts_threshold
468
/# cat /sys/kernel/debug/ieee80211/phy0/radio1/radio_rts_threshold
100
wifi: cfg80211: Add debugfs support for multi-radio wiphy
In multi-radio wiphy architecture, where a single wiphy can have
multiple radios tied to it, radio specific configuration parameters
and global wiphy parameters are maintained for the entire physical
device and common to all radios. But, each radio in a wiphy can have
different values for each radio configuration parameter, like RTS
threshold. With the current debugfs directory structure, the values
of global wiphy configuration parameters can be viewed, but, values
of individual radio configuration parameters cannot be viewed, as
radio specific configuration parameters are not maintained, separately.
To address this, in addition to maintaining global wiphy configuration
parameters common to all radios, create separate debugfs directories
for each radio in a wiphy to maintain parameters corresponding to that
radio in this directory.
In implementation, maintain a dentry structure in wiphy_radio_cfg, a
structure containing radio configurations of a wiphy. This struct is
maintained to denote per-radio configurations of a wiphy. Create
separate directories representing each radio within phy#X directory in
debugfs during wiphy registration.
Sample directory structure with this change:
ls /sys/kernel/debug/ieee80211/phy0/radio
radio0/ radio1/ radio2/
Currently, RX bitrate statistics are not updated for packets received
on the mesh forwarding path during fast RX processing. This results in
incomplete RX rate tracking in station dump outputs for mesh scenarios.
Update ieee80211_invoke_fast_rx() to record the RX rate using
sta_stats_encode_rate() and store it in the last_rate field of
ieee80211_sta_rx_stats when RX_QUEUED is returned from
ieee80211_rx_mesh_data(). This ensures that RX bitrate is properly
accounted for in both RSS and non-RSS paths.
Lachlan Hodges [Tue, 21 Oct 2025 06:12:01 +0000 (17:12 +1100)]
wifi: cfg80211: default S1G chandef width to 1MHz
When management frames are passed down to be transmitted by usermode, often
times the NL80211_ATTR_CHANNEL_WIDTH is not used as its implied to be
transmitted on the control width. This can lead to errors during chandef
validation as the offsets from the channel center are wrong. Ensure we
initialise S1G chandefs to a width of 1MHz rather then 20MHz.
Johannes Berg [Sun, 19 Oct 2025 08:50:49 +0000 (11:50 +0300)]
wifi: mac80211: reset CRC valid after CSA
While waiting for a beacon after CSA, reset the CRC valid
so that the next beacon is handled even if it happens to
be identical the last one on the old channel. This is an
AP bug either way, but it's better to disconnect cleanly
than to have lingering CSA state.
In the iwlwifi instantiation of this problem, mac80211 is
ignoring the beacon but the firmware creates a new CSA,
and then crashes later because mac80211/driver didn't do
anything about it.
wifi: mac80211_hwsim: advertise puncturing feature support
If userspace provides a puncturing bitmap via the NL80211_ATTR_PUNCT_BITMAP
attribute, the kernel with mac80211_hwsim driver currently rejects the
command with the error: "driver doesn't support puncturing", because the
driver does not advertise support for this feature.
At present, the following hwsim test cases utilize puncturing, but the
bitmap is not sent to the kernel. Instead, the puncturing information is
conveyed only through the beacon data:
* eht_5ghz_80mhz_puncturing_override_1
* eht_5ghz_80mhz_puncturing_override_2
* eht_5ghz_80mhz_puncturing_override_3
A future change in hostapd will begin configuring the puncturing bitmap
explicitly, which will cause these test cases to fail unless the driver
advertises support.
To address this, update mac80211_hwsim driver to advertise puncturing
feature support.
Ryder Lee [Tue, 23 Sep 2025 17:23:22 +0000 (17:23 +0000)]
wifi: cfg80211/mac80211: validate radio frequency range for monitor mode
In multi-radio devices, it is possible to have an MLD AP and a monitor
interface active at the same time. In such cases, monitor mode may not
be able to specify a fixed channel and could end up capturing frames
from all radios, including those outside the intended frequency bands.
This patch adds frequency validation for monitor mode. Received frames
are now only processed if their frequency fall within the allowed ranges
of the radios specified by the interface's radio_mask.
This prevents monitor mode from capturing frames outside the supported radio.
Linus Torvalds [Thu, 16 Oct 2025 16:41:21 +0000 (09:41 -0700)]
Merge tag 'net-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from CAN
Current release - regressions:
- udp: do not use skb_release_head_state() before
skb_attempt_defer_free()
- gro_cells: use nested-BH locking for gro_cell
- dpll: zl3073x: increase maximum size of flash utility
Previous releases - regressions:
- core: fix lockdep splat on device unregister
- tcp: fix tcp_tso_should_defer() vs large RTT
- tls:
- don't rely on tx_work during send()
- wait for pending async decryptions if tls_strp_msg_hold fails
- can: j1939: add missing calls in NETDEV_UNREGISTER notification
handler
- eth: lan78xx: fix lost EEPROM write timeout in
lan78xx_write_raw_eeprom
Previous releases - always broken:
- ip6_tunnel: prevent perpetual tunnel growth
- dpll: zl3073x: handle missing or corrupted flash configuration
- can: m_can: fix pm_runtime and CAN state handling
- eth:
- ixgbe: fix too early devlink_free() in ixgbe_remove()
- ixgbevf: fix mailbox API compatibility
- gve: Check valid ts bit on RX descriptor before hw timestamping
- idpf: cleanup remaining SKBs in PTP flows
- r8169: fix packet truncation after S4 resume on RTL8168H/RTL8111H"
* tag 'net-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
udp: do not use skb_release_head_state() before skb_attempt_defer_free()
net: usb: lan78xx: fix use of improperly initialized dev->chipid in lan78xx_reset
netdevsim: set the carrier when the device goes up
selftests: tls: add test for short splice due to full skmsg
selftests: net: tls: add tests for cmsg vs MSG_MORE
tls: don't rely on tx_work during send()
tls: wait for pending async decryptions if tls_strp_msg_hold fails
tls: always set record_type in tls_process_cmsg
tls: wait for async encrypt in case of error during latter iterations of sendmsg
tls: trim encrypted message to match the plaintext on short splice
tg3: prevent use of uninitialized remote_adv and local_adv variables
MAINTAINERS: new entry for IPv6 IOAM
gve: Check valid ts bit on RX descriptor before hw timestamping
net: core: fix lockdep splat on device unregister
MAINTAINERS: add myself as maintainer for b53
selftests: net: check jq command is supported
net: airoha: Take into account out-of-order tx completions in airoha_dev_xmit()
tcp: fix tcp_tso_should_defer() vs large RTT
r8152: add error handling in rtl8152_driver_init
usbnet: Fix using smp_processor_id() in preemptible code warnings
...
Linus Torvalds [Thu, 16 Oct 2025 16:39:29 +0000 (09:39 -0700)]
Merge tag 'ata-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fix from Niklas Cassel:
- Do not print an error message (and assume that the General Purpose
Log Directory log page is not supported) for a device that reports a
bogus General Purpose Logging Version.
Unsurprisingly, many vendors fail to report the only valid General
Purpose Logging Version (Damien)
* tag 'ata-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: libata-core: relax checks in ata_read_log_directory()
Eric Dumazet [Wed, 15 Oct 2025 05:27:15 +0000 (05:27 +0000)]
udp: do not use skb_release_head_state() before skb_attempt_defer_free()
Michal reported and bisected an issue after recent adoption
of skb_attempt_defer_free() in UDP.
The issue here is that skb_release_head_state() is called twice per skb,
one time from skb_consume_udp(), then a second time from skb_defer_free_flush()
and napi_consume_skb().
As Sabrina suggested, remove skb_release_head_state() call from
skb_consume_udp().
Add a DEBUG_NET_WARN_ON_ONCE(skb_nfct(skb)) in skb_attempt_defer_free()
Many thanks to Michal, Sabrina, Paolo and Florian for their help.
Fixes: 6471658dc66c ("udp: use skb_attempt_defer_free()") Reported-and-bisected-by: Michal Kubecek <mkubecek@suse.cz> Closes: https://lore.kernel.org/netdev/gpjh4lrotyephiqpuldtxxizrsg6job7cvhiqrw72saz2ubs3h@g6fgbvexgl3r/ Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Michal Kubecek <mkubecek@suse.cz> Cc: Sabrina Dubroca <sd@queasysnail.net> Cc: Florian Westphal <fw@strlen.de> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/20251015052715.4140493-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jiawen Wu [Tue, 14 Oct 2025 06:17:25 +0000 (14:17 +0800)]
net: txgbe: optimize the flow to setup PHY for AML devices
To adapt to new firmware for AML devices, the driver should send the
"SET_LINK_CMD" to the firmware only once when switching PHY interface
mode, and no longer needs to re-trigger PHY configuration based on the
RX signal interrupt (TXGBE_GPIOBIT_3).
In previous firmware versions, the PHY was configured only after receiving
"SET_LINK_CMD", and might remain incomplete if the RX signal was lost.
To handle this case, the driver used TXGBE_GPIOBIT_3 interrupt to resend
the command. This workaround is no longer necessary with the new firmware.
And the unknown link speed is permitted in the mailbox buffer.
Recent firmware updates introduce additional fields in the mailbox message
to provide more information for identifying 40G and 100G QSFP modules.
To accommodate these new fields, expand the mailbox buffer size by 4 bytes.
Without this change, drivers built against the updated firmware cannot
properly identify modules due to mismatched mailbox message lengths.
The old firmware version that used the smaller mailbox buffer has never
been publicly released, so there are no backward-compatibility concerns.
net: fbnic: Fix page chunking logic when PAGE_SIZE > 4K
The HW always works on a 4K page size. When the OS supports larger
pages, we fragment them across multiple BDQ descriptors.
We were not properly incrementing the descriptor, which resulted in us
specifying the last chunks id/addr and then 15 zero descriptors. This
would cause packet loss and driver crashes. This is not a fix since the
Kconfig prevents use outside of x86.
I Viswanath [Mon, 13 Oct 2025 18:16:48 +0000 (23:46 +0530)]
net: usb: lan78xx: fix use of improperly initialized dev->chipid in lan78xx_reset
dev->chipid is used in lan78xx_init_mac_address before it's initialized:
lan78xx_reset() {
lan78xx_init_mac_address()
lan78xx_read_eeprom()
lan78xx_read_raw_eeprom() <- dev->chipid is used here
dev->chipid = ... <- dev->chipid is initialized correctly here
}
Reorder initialization so that dev->chipid is set before calling
lan78xx_init_mac_address().
Fixes: a0db7d10b76e ("lan78xx: Add to handle mux control per chip id") Signed-off-by: I Viswanath <viswanathiyyappan@gmail.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Khalid Aziz <khalid@kernel.org> Link: https://patch.msgid.link/20251013181648.35153-1-viswanathiyyappan@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 16 Oct 2025 00:56:20 +0000 (17:56 -0700)]
Merge tag 'linux-can-fixes-for-6.18-20251014' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2025-10-14
The first 2 paches are by Celeste Liu and target the gS_usb driver.
The first patch remove the limitation to 3 CAN interface per USB
device. The second patch adds the missing population of
net_device->dev_port.
The next 4 patches are by me and fix the m_can driver. They add a
missing pm_runtime_disable(), fix the CAN state transition back to
Error Active and fix the state after ifup and suspend/resume.
Another patch by me targets the m_can driver, too and replaces Dong
Aisheng's old email address.
The next 2 patches are by Vincent Mailhol and update the CAN
networking Documentation.
Tetsuo Handa contributes the last patch that add missing cleanup calls
in the NETDEV_UNREGISTER notification handler.
* tag 'linux-can-fixes-for-6.18-20251014' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: j1939: add missing calls in NETDEV_UNREGISTER notification handler
can: add Transmitter Delay Compensation (TDC) documentation
can: remove false statement about 1:1 mapping between DLC and length
can: m_can: replace Dong Aisheng's old email address
can: m_can: fix CAN state in system PM
can: m_can: m_can_chip_config(): bring up interface in correct state
can: m_can: m_can_handle_state_errors(): fix CAN state transition to Error Active
can: m_can: m_can_plat_remove(): add missing pm_runtime_disable()
can: gs_usb: gs_make_candev(): populate net_device->dev_port
can: gs_usb: increase max interface to U8_MAX
====================
Lorenzo Bianconi [Mon, 13 Oct 2025 13:58:50 +0000 (15:58 +0200)]
net: airoha: npu: Add airoha_npu_soc_data struct
Introduce airoha_npu_soc_data structure in order to generalize per-SoC
NPU firmware info. Introduce airoha_npu_load_firmware utility routine.
This is a preliminary patch in order to introduce AN7583 NPU support.
====================
Preserve PSE PD692x0 configuration across reboots
Previously, the driver would always reconfigure the PSE hardware on
probe, causing a port matrix reflash that resulted in temporary power
loss to all connected devices. This change maintains power continuity
by preserving existing configuration when the PSE has been previously
initialized.
====================
net: pse-pd: pd692x0: Preserve PSE configuration across reboots
Detect when PSE hardware is already configured (user byte == 42) and
skip hardware initialization to prevent power interruption to connected
devices during system reboots.
Previously, the driver would always reconfigure the PSE hardware on
probe, causing a port matrix reflash that resulted in temporary power
loss to all connected devices. This change maintains power continuity
by preserving existing configuration when the PSE has been previously
initialized.
net: pse-pd: pd692x0: Separate configuration parsing from hardware setup
Cache the port matrix configuration in driver private data to enable
PSE controller reconfiguration. This refactoring separates device tree
parsing from hardware configuration application, allowing settings to be
reapplied without reparsing the device tree.
This refactoring is a prerequisite for preserving PSE configuration
across reboots to prevent power disruption to connected devices.
net: pse-pd: pd692x0: Replace __free macro with explicit kfree calls
Replace __free(kfree) with explicit kfree() calls to follow the net
subsystem policy of avoiding automatic cleanup macros as described in
the documentation.
Florian Fainelli [Mon, 13 Oct 2025 17:23:06 +0000 (10:23 -0700)]
net: bcmasp: Add support for PHY-based Wake-on-LAN
If available, interrogate the PHY to find out whether we can use it for
Wake-on-LAN. This can be a more power efficient way of implementing that
feature, especially when the MAC is powered off in low power states.
Breno Leitao [Tue, 14 Oct 2025 09:17:25 +0000 (02:17 -0700)]
netdevsim: set the carrier when the device goes up
Bringing a linked netdevsim device down and then up causes communication
failure because both interfaces lack carrier. Basically a ifdown/ifup on
the interface make the link broken.
Commit 3762ec05a9fbda ("netdevsim: add NAPI support") added supported
for NAPI, calling netif_carrier_off() in nsim_stop(). This patch
re-enables the carrier symmetrically on nsim_open(), in case the device
is linked and the peer is up.
Jakub Kicinski [Thu, 16 Oct 2025 00:41:47 +0000 (17:41 -0700)]
Merge branch 'tls-misc-bugfixes'
Sabrina Dubroca says:
====================
tls: misc bugfixes
Jann Horn reported multiple bugs in kTLS. This series addresses them,
and adds some corresponding selftests for those that are reproducible
(and without failure injection).
====================
Sabrina Dubroca [Tue, 14 Oct 2025 09:17:01 +0000 (11:17 +0200)]
selftests: net: tls: add tests for cmsg vs MSG_MORE
We don't have a test to check that MSG_MORE won't let us merge records
of different types across sendmsg calls.
Add new tests that check:
- MSG_MORE is only allowed for DATA records
- a pending DATA record gets closed and pushed before a non-DATA
record is processed
Sabrina Dubroca [Tue, 14 Oct 2025 09:17:00 +0000 (11:17 +0200)]
tls: don't rely on tx_work during send()
With async crypto, we rely on tx_work to actually transmit records
once encryption completes. But while send() is running, both the
tx_lock and socket lock are held, so tx_work_handler cannot process
the queue of encrypted records, and simply reschedules itself. During
a large send(), this could last a long time, and use a lot of memory.
Transmit any pending encrypted records before restarting the main
loop of tls_sw_sendmsg_locked.
Sabrina Dubroca [Tue, 14 Oct 2025 09:16:59 +0000 (11:16 +0200)]
tls: wait for pending async decryptions if tls_strp_msg_hold fails
Async decryption calls tls_strp_msg_hold to create a clone of the
input skb to hold references to the memory it uses. If we fail to
allocate that clone, proceeding with async decryption can lead to
various issues (UAF on the skb, writing into userspace memory after
the recv() call has returned).
In this case, wait for all pending decryption requests.
Sabrina Dubroca [Tue, 14 Oct 2025 09:16:58 +0000 (11:16 +0200)]
tls: always set record_type in tls_process_cmsg
When userspace wants to send a non-DATA record (via the
TLS_SET_RECORD_TYPE cmsg), we need to send any pending data from a
previous MSG_MORE send() as a separate DATA record. If that DATA record
is encrypted asynchronously, tls_handle_open_record will return
-EINPROGRESS. This is currently treated as an error by
tls_process_cmsg, and it will skip setting record_type to the correct
value, but the caller (tls_sw_sendmsg_locked) handles that return
value correctly and proceeds with sending the new message with an
incorrect record_type (DATA instead of whatever was requested in the
cmsg).
Always set record_type before handling the open record. If
tls_handle_open_record returns an error, record_type will be
ignored. If it succeeds, whether with synchronous crypto (returning 0)
or asynchronous (returning -EINPROGRESS), the caller will proceed
correctly.
Sabrina Dubroca [Tue, 14 Oct 2025 09:16:57 +0000 (11:16 +0200)]
tls: wait for async encrypt in case of error during latter iterations of sendmsg
If we hit an error during the main loop of tls_sw_sendmsg_locked (eg
failed allocation), we jump to send_end and immediately
return. Previous iterations may have queued async encryption requests
that are still pending. We should wait for those before returning, as
we could otherwise be reading from memory that userspace believes
we're not using anymore, which would be a sort of use-after-free.
This is similar to what tls_sw_recvmsg already does: failures during
the main loop jump to the "wait for async" code, not straight to the
unlock/return.
Sabrina Dubroca [Tue, 14 Oct 2025 09:16:56 +0000 (11:16 +0200)]
tls: trim encrypted message to match the plaintext on short splice
During tls_sw_sendmsg_locked, we pre-allocate the encrypted message
for the size we're expecting to send during the current iteration, but
we may end up sending less, for example when splicing: if we're
getting the data from small fragments of memory, we may fill up all
the slots in the skmsg with less data than expected.
In this case, we need to trim the encrypted message to only the length
we actually need, to avoid pushing uninitialized bytes down the
underlying TCP socket.
Justin Iurman [Tue, 14 Oct 2025 17:06:50 +0000 (19:06 +0200)]
MAINTAINERS: new entry for IPv6 IOAM
Create a maintainer entry for IPv6 IOAM. Add myself as I authored most
if not all of the IPv6 IOAM code in the kernel and actively participate
in the related IETF groups.
Heiner Kallweit [Tue, 14 Oct 2025 06:02:47 +0000 (08:02 +0200)]
net: bcmgenet: remove unused platform code
This effectively reverts b0ba512e25d7 ("net: bcmgenet: enable driver to
work without a device tree"). There has never been an in-tree user of
struct bcmgenet_platform_data, all devices use OF or ACPI.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/108b4e64-55d4-4b4e-9a11-3c810c319d66@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Abhishek Rawal [Tue, 14 Oct 2025 05:52:33 +0000 (11:22 +0530)]
r8152: Advertise software timestamp information.
Driver calls skb_tx_timestamp(skb) in rtl8152_start_xmit(), but does not advertise the capability in ethtool.
Advertise software timestamp capabilities on struct ethtool_ops.
Tim Hostetler [Tue, 14 Oct 2025 00:47:39 +0000 (00:47 +0000)]
gve: Check valid ts bit on RX descriptor before hw timestamping
The device returns a valid bit in the LSB of the low timestamp byte in
the completion descriptor that the driver should check before
setting the SKB's hardware timestamp. If the timestamp is not valid, do not
hardware timestamp the SKB.
Cc: stable@vger.kernel.org Fixes: b2c7aeb49056 ("gve: Implement ndo_hwtstamp_get/set for RX timestamping") Reviewed-by: Joshua Washington <joshwash@google.com> Signed-off-by: Tim Hostetler <thostet@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20251014004740.2775957-1-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 15 Oct 2025 16:04:23 +0000 (09:04 -0700)]
Merge branch 'net-deal-with-sticky-tx-queues'
Eric Dumazet says:
====================
net: deal with sticky tx queues
Back in 2010, Tom Herbert added skb->ooo_okay to TCP flows.
Extend the feature to connected flows for other protocols like UDP.
skb->ooo_okay might never be set for bulk flows that always
have at least one skb in a qdisc queue of NIC queue,
especially if TX completion is delayed because of a stressed cpu
or aggressive interrupt mitigation.
The so-called "strange attractors" has caused many performance
issues, we need to do better now that TCP reacts better to
potential reorders.
Add new net.core.txq_reselection_ms sysctl to let
flows follow XPS and select a more efficient queue.
After this series, we no longer have to make sure threads
are pinned to cpus, they can migrate without adding
too much [spinlock, qdisc, TX completion] pressure anymore.
====================
Eric Dumazet [Mon, 13 Oct 2025 15:22:34 +0000 (15:22 +0000)]
net: allow busy connected flows to switch tx queues
This is a followup of commit 726e9e8b94b9 ("tcp: refine
skb->ooo_okay setting") and of prior commit in this series
("net: control skb->ooo_okay from skb_set_owner_w()")
skb->ooo_okay might never be set for bulk flows that always
have at least one skb in a qdisc queue of NIC queue,
especially if TX completion is delayed because of a stressed cpu.
The so-called "strange attractors" has caused many performance
issues (see for instance 9b462d02d6dd ("tcp: TCP Small Queues
and strange attractors")), we need to do better.
We have tried very hard to avoid reorders because TCP was
not dealing with them nicely a decade ago.
Use the new net.core.txq_reselection_ms sysctl to let
flows follow XPS and select a more efficient queue.
After this patch, we no longer have to make sure threads
are pinned to cpus, they now can be migrated without
adding too much spinlock/qdisc/TX completion pressure anymore.
TX completion part was problematic, because it added false sharing
on various socket fields, but also added false sharing and spinlock
contention in mm layers. Calling skb_orphan() from ndo_start_xmit()
is not an option unfortunately.
Note for later:
1) move sk->sk_tx_queue_mapping closer
to sk_tx_queue_mapping_jiffies for better cache locality.
2) Study if 9b462d02d6dd ("tcp: TCP Small Queues
and strange attractors") could be revised.
Tested:
Used a host with 32 TX queues, shared by groups of 8 cores.
XPS setup :
Checked that only queues 0 and 1 are used as instructed by XPS :
tc -s qdisc show dev eth1|grep backlog|grep -v "backlog 0b 0p"
backlog 123489410b 1890p
backlog 69809026b 1064p
backlog 52401054b 805p
Then force each thread to run on cpu 1,9,17,25,33,41,49,57,65,73,81,89,97,105,113,121
C=1;PID=`pidof tcp_stream`;for P in `ls /proc/$PID/task`; do taskset -pc $C $P; C=$(($C + 8));done
Set txq_reselection_ms to 1000
echo 1000 > /proc/sys/net/core/txq_reselection_ms
Eric Dumazet [Mon, 13 Oct 2025 15:22:32 +0000 (15:22 +0000)]
net: control skb->ooo_okay from skb_set_owner_w()
15 years after Tom Herbert added skb->ooo_okay, only TCP transport
benefits from it.
We can support other transports directly from skb_set_owner_w().
If no other TX packet for this socket is in a host queue (qdisc, NIC queue)
there is no risk of self-inflicted reordering, we can set skb->ooo_okay.
This allows netdev_pick_tx() to choose a TX queue based on XPS settings,
instead of reusing the queue chosen at the time the first packet was sent
for connected sockets.
Add optional clock for OSC_IN and fix the below CHECK_DTBS warnings:
arch/arm/boot/dts/nxp/imx/imx6qp-prtwd3.dtb: switch@0 (nxp,sja1105q): Unevaluated properties are not allowed ('clocks' was unexpected)
Signed-off-by: Frank Li <Frank.Li@nxp.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20251010183418.2179063-1-Frank.Li@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Wed, 15 Oct 2025 14:57:28 +0000 (07:57 -0700)]
Remove long-stale ext3 defconfig option
Inspired by commit c065b6046b34 ("Use CONFIG_EXT4_FS instead of
CONFIG_EXT3_FS in all of the defconfigs") I looked around for any other
left-over EXT3 config options, and found some old defconfig files still
mentioned CONFIG_EXT3_DEFAULTS_TO_ORDERED.
That config option was removed a decade ago in commit c290ea01abb7 ("fs:
Remove ext3 filesystem driver"). It had a good run, but let's remove it
for good.
Linus Torvalds [Wed, 15 Oct 2025 14:51:57 +0000 (07:51 -0700)]
Merge tag 'ext4_for_linus-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 bug fixes from Ted Ts'o:
- Fix regression caused by removing CONFIG_EXT3_FS when testing some
very old defconfigs
- Avoid a BUG_ON when opening a file on a maliciously corrupted file
system
- Avoid mm warnings when freeing a very large orphan file metadata
- Avoid a theoretical races between metadata writeback and checkpoints
(it's very hard to hit in practice, since the race requires that the
writeback take a very long time)
* tag 'ext4_for_linus-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
Use CONFIG_EXT4_FS instead of CONFIG_EXT3_FS in all of the defconfigs
ext4: free orphan info with kvfree
ext4: detect invalid INLINE_DATA + EXTENTS flag combination
ext4, doc: fix and improve directory hash tree description
ext4: wait for ongoing I/O to complete before freeing blocks
jbd2: ensure that all ongoing I/O complete before freeing blocks
PM / devfreq: rockchip-dfi: switch to FIELD_PREP_WM16 macro
The era of hand-rolled HIWORD_UPDATE macros is over, at least for those
drivers that use constant masks.
Like many other Rockchip drivers, rockchip-dfi brings with it its own
HIWORD_UPDATE macro. This variant doesn't shift the value (and like the
others, doesn't do any checking).
Remove it, and replace instances of it with hw_bitfield.h's
FIELD_PREP_WM16. Since FIELD_PREP_WM16 requires contiguous masks and
shifts the value for us, some reshuffling of definitions needs to
happen.
This gives us better compile-time error checking, and in my opinion,
nicer code.
Tested on an RK3568 ODROID-M1 board (LPDDR4X at 1560 MHz, an RK3588
Radxa ROCK 5B board (LPDDR4X at 2112 MHz) and an RK3588 Radxa ROCK 5T
board (LPDDR5 at 2400 MHz). perf measurements were consistent with the
measurements of stress-ng --stream in all cases.
Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Florian Westphal [Mon, 13 Oct 2025 18:50:52 +0000 (20:50 +0200)]
net: core: fix lockdep splat on device unregister
Since blamed commit, unregister_netdevice_many_notify() takes the netdev
mutex if the device needs it.
If the device list is too long, this will lock more device mutexes than
lockdep can handle:
unshare -n \
bash -c 'for i in $(seq 1 100);do ip link add foo$i type dummy;done'
BUG: MAX_LOCK_DEPTH too low!
turning off the locking correctness validator.
depth: 48 max: 48!
48 locks held by kworker/u16:1/69:
#0: ..148 ((wq_completion)netns){+.+.}-{0:0}, at: process_one_work
#1: ..d40 (net_cleanup_work){+.+.}-{0:0}, at: process_one_work
#2: ..bd0 (pernet_ops_rwsem){++++}-{4:4}, at: cleanup_net
#3: ..aa8 (rtnl_mutex){+.+.}-{4:4}, at: default_device_exit_batch
#4: ..cb0 (&dev_instance_lock_key#3){+.+.}-{4:4}, at: unregister_netdevice_many_notify
[..]
Add a helper to close and then unlock a list of net_devices.
Devices that are not up have to be skipped - netif_close_many always
removes them from the list without any other actions taken, so they'd
remain in locked state.
Close devices whenever we've used up half of the tracking slots or we
processed entire list without hitting the limit.
Fixes: 7e4d784f5810 ("net: hold netdev instance lock during rtnetlink operations") Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20251013185052.14021-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alok Tiwari [Mon, 13 Oct 2025 10:01:16 +0000 (03:01 -0700)]
net: bridge: correct debug message function name in br_fill_ifinfo
The debug message in br_fill_ifinfo() incorrectly refers to br_fill_info
instead of the actual function name. Update it for clarity in debugging
output.
Linus Torvalds [Tue, 14 Oct 2025 16:15:45 +0000 (09:15 -0700)]
Merge tag 'for-linus-6.18-2' of https://github.com/cminyard/linux-ipmi
Pull IPMI fixes from Corey Minyard:
"A few bug fixes for patches that went in this release: a refcount
error and some missing or incorrect error checks"
* tag 'for-linus-6.18-2' of https://github.com/cminyard/linux-ipmi:
ipmi: Fix handling of messages with provided receive message pointer
mfd: ls2kbmc: check for devm_mfd_add_devices() failure
mfd: ls2kbmc: Fix an IS_ERR() vs NULL check in probe()
Wang Liang [Mon, 13 Oct 2025 08:00:39 +0000 (16:00 +0800)]
selftests: net: check jq command is supported
The jq command is used in vlan_bridge_binding.sh, if it is not supported,
the test will spam the following log.
# ./vlan_bridge_binding.sh: line 51: jq: command not found
# ./vlan_bridge_binding.sh: line 51: jq: command not found
# ./vlan_bridge_binding.sh: line 51: jq: command not found
# ./vlan_bridge_binding.sh: line 51: jq: command not found
# ./vlan_bridge_binding.sh: line 51: jq: command not found
# TEST: Test bridge_binding on->off when lower down [FAIL]
# Got operstate of , expected 0
The rtnetlink.sh has the same problem. It makes sense to check if jq is
installed before running these tests. After this patch, the
vlan_bridge_binding.sh skipped if jq is not supported:
# timeout set to 3600
# selftests: net: vlan_bridge_binding.sh
# TEST: jq not installed [SKIP]
Fixes: dca12e9ab760 ("selftests: net: Add a VLAN bridge binding selftest") Fixes: 6a414fd77f61 ("selftests: rtnetlink: Add an address proto test") Signed-off-by: Wang Liang <wangliang74@huawei.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20251013080039.3035898-1-wangliang74@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jan Vaclav [Thu, 9 Oct 2025 21:09:08 +0000 (23:09 +0200)]
net/hsr: add protocol version to fill_info output
Currently, it is possible to configure IFLA_HSR_VERSION, but
there is no way to check in userspace what the currently
configured HSR protocol version is.
Add it to the output of hsr_fill_info(), when the interface
is using the HSR protocol. Let's not expose it when using
the PRP protocol, since it only has one version and it's
not possible to set it from userspace.
This info could then be used by e.g. ip(8), like so:
$ ip -d link show hsr0
12: hsr0: <BROADCAST,MULTICAST> mtu ...
...
hsr slave1 veth0 slave2 veth1 ... proto 0 version 1 Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Jan Vaclav <jvaclav@redhat.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20251009210903.1055187-6-jvaclav@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Byungchul Park [Mon, 13 Oct 2025 04:41:33 +0000 (13:41 +0900)]
netmem: replace __netmem_clear_lsb() with netmem_to_nmdesc()
Now that we have struct netmem_desc, it'd better access the pp fields
via struct netmem_desc rather than struct net_iov.
Introduce netmem_to_nmdesc() for safely converting netmem_ref to
netmem_desc regardless of the type underneath e.i. netmem_desc, net_iov.
While at it, remove __netmem_clear_lsb() and make netmem_to_nmdesc()
used instead.
Suggested-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Byungchul Park <byungchul@sk.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20251013044133.69472-1-byungchul@sk.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Kriish Sharma [Mon, 13 Oct 2025 01:43:19 +0000 (01:43 +0000)]
hdlc_ppp: fix potential null pointer in ppp_cp_event logging
drivers/net/wan/hdlc_ppp.c: In function ‘ppp_cp_event’:
drivers/net/wan/hdlc_ppp.c:353:17: warning: ‘%s’ directive argument is null [-Wformat-overflow=]
353 | netdev_info(dev, "%s down\n", proto_name(pid));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/wan/hdlc_ppp.c:342:17: warning: ‘%s’ directive argument is null [-Wformat-overflow=]
342 | netdev_info(dev, "%s up\n", proto_name(pid));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Update proto_name() to return "LCP" by default instead of NULL.
This change silences the compiler without changing existing behavior
and removes the need for the local 'pname' variable in ppp_cp_event.