]> git.ipfire.org Git - thirdparty/kernel/stable.git/log
thirdparty/kernel/stable.git
5 months agor8152: use SHA-256 library API instead of crypto_shash API
Eric Biggers [Mon, 28 Apr 2025 19:16:06 +0000 (12:16 -0700)] 
r8152: use SHA-256 library API instead of crypto_shash API

This user of SHA-256 does not support any other algorithm, so the
crypto_shash abstraction provides no value.  Just use the SHA-256
library API instead, which is much simpler and easier to use.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://patch.msgid.link/20250428191606.856198-1-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: phy: realtek: Add support for WOL magic packet on RTL8211F
Daniel Braunwarth [Tue, 29 Apr 2025 11:33:37 +0000 (13:33 +0200)] 
net: phy: realtek: Add support for WOL magic packet on RTL8211F

The RTL8211F supports multiple WOL modes. This patch adds support for
magic packets.

The PHY notifies the system via the INTB/PMEB pin when a WOL event
occurs.

Signed-off-by: Daniel Braunwarth <daniel.braunwarth@kuka.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250429-realtek_wol-v2-1-8f84def1ef2c@kuka.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: drv-net: rss_input_xfrm: Check test prerequisites before running
Gal Pressman [Wed, 30 Apr 2025 05:48:01 +0000 (08:48 +0300)] 
selftests: drv-net: rss_input_xfrm: Check test prerequisites before running

Ensure the following prerequisites before executing the test:
1. 'socat' is installed on the remote host.
2. Python version supports socket.SO_INCOMING_CPU (available since v3.11).

Skip the test if either prerequisite is not met.

Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250430054801.750646-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agodt-bindings: net: sun8i-emac: Add A523 EMAC0 compatible
Yixun Lan [Wed, 30 Apr 2025 05:32:04 +0000 (13:32 +0800)] 
dt-bindings: net: sun8i-emac: Add A523 EMAC0 compatible

Allwinner A523 SoC variant (A527/T527) contains an "EMAC0" Ethernet
MAC compatible to the A64 version.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Yixun Lan <dlan@gentoo.org>
Link: https://patch.msgid.link/20250430-01-sun55i-emac0-v3-2-6fc000bbccbd@gentoo.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Jakub Kicinski [Fri, 2 May 2025 00:51:31 +0000 (17:51 -0700)] 
Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2025-04-29 (igb, igc, ixgbe, idpf)

For igb:
Kurt Kanzenbach adds linking of IRQs and queues to NAPI instances and
adds persistent NAPI config. Lastly, he removes undesired IRQs that
occur while busy polling.

For igc:
Kurt Kanzenbach switches the Tx mode for MQPRIO offload to harmonize the
current implementation with TAPRIO.

For ixgbe:
Jedrzej adds separate ethtool ops for E610 devices to account for device
differences.

Slawomir adds devlink region support for E610 devices.

For idpf:
Mateusz assigns and utilizes the ptype field out of libeth_rqe_info.

Michal removes unreachable code.

* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  idpf: remove unreachable code from setting mailbox
  idpf: assign extracted ptype to struct libeth_rqe_info field
  ixgbe: devlink: add devlink region support for E610
  ixgbe: add E610 .set_phys_id() callback implementation
  ixgbe: apply different rules for setting FC on E610
  ixgbe: add support for ACPI WOL for E610
  ixgbe: create E610 specific ethtool_ops structure
  igc: Change Tx mode for MQPRIO offloading
  igc: Limit netdev_tc calls to MQPRIO
  igb: Get rid of spurious interrupts
  igb: Add support for persistent NAPI config
  igb: Link queues to NAPI instances
  igb: Link IRQs to NAPI instances
====================

Link: https://patch.msgid.link/20250429234651.3982025-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 1 May 2025 22:11:17 +0000 (15:11 -0700)] 
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-6.15-rc5).

No conflicts or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge tag 'net-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 1 May 2025 17:37:49 +0000 (10:37 -0700)] 
Merge tag 'net-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Happy May Day.

  Things have calmed down on our end (knock on wood), no outstanding
  investigations. Including fixes from Bluetooth and WiFi.

  Current release - fix to a fix:

   - igc: fix lock order in igc_ptp_reset

  Current release - new code bugs:

   - Revert "wifi: iwlwifi: make no_160 more generic", fixes regression
     to Killer line of devices reported by a number of people

   - Revert "wifi: iwlwifi: add support for BE213", initial FW is too
     buggy

   - number of fixes for mld, the new Intel WiFi subdriver

  Previous releases - regressions:

   - wifi: mac80211: restore monitor for outgoing frames

   - drv: vmxnet3: fix malformed packet sizing in vmxnet3_process_xdp

   - eth: bnxt_en: fix timestamping FIFO getting out of sync on reset,
     delivering stale timestamps

   - use sock_gen_put() in the TCP fraglist GRO heuristic, don't assume
     every socket is a full socket

  Previous releases - always broken:

   - sched: adapt qdiscs for reentrant enqueue cases, fix list
     corruptions

   - xsk: fix race condition in AF_XDP generic RX path, shared UMEM
     can't be protected by a per-socket lock

   - eth: mtk-star-emac: fix spinlock recursion issues on rx/tx poll

   - btusb: avoid NULL pointer dereference in skb_dequeue()

   - dsa: felix: fix broken taprio gate states after clock jump"

* tag 'net-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
  net: vertexcom: mse102x: Fix RX error handling
  net: vertexcom: mse102x: Add range check for CMD_RTS
  net: vertexcom: mse102x: Fix LEN_MASK
  net: vertexcom: mse102x: Fix possible stuck of SPI interrupt
  net: hns3: defer calling ptp_clock_register()
  net: hns3: fixed debugfs tm_qset size
  net: hns3: fix an interrupt residual problem
  net: hns3: store rx VLAN tag offload state for VF
  octeon_ep: Fix host hang issue during device reboot
  net: fec: ERR007885 Workaround for conventional TX
  net: lan743x: Fix memleak issue when GSO enabled
  ptp: ocp: Fix NULL dereference in Adva board SMA sysfs operations
  net: use sock_gen_put() when sk_state is TCP_TIME_WAIT
  bnxt_en: fix module unload sequence
  bnxt_en: Fix ethtool -d byte order for 32-bit values
  bnxt_en: Fix out-of-bound memcpy() during ethtool -w
  bnxt_en: Fix coredump logic to free allocated buffer
  bnxt_en: delay pci_alloc_irq_vectors() in the AER path
  bnxt_en: call pci_alloc_irq_vectors() after bnxt_reserve_rings()
  bnxt_en: Add missing skb_mark_for_recycle() in bnxt_rx_vlan()
  ...

5 months agoMerge branch 'net-vertexcom-mse102x-fix-rx-handling'
Jakub Kicinski [Thu, 1 May 2025 14:24:08 +0000 (07:24 -0700)] 
Merge branch 'net-vertexcom-mse102x-fix-rx-handling'

Stefan Wahren says:

====================
net: vertexcom: mse102x: Fix RX handling

This series is the first part of two series for the Vertexcom driver.
It contains substantial fixes for the RX handling of the Vertexcom MSE102x.
====================

Link: https://patch.msgid.link/20250430133043.7722-1-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: vertexcom: mse102x: Fix RX error handling
Stefan Wahren [Wed, 30 Apr 2025 13:30:43 +0000 (15:30 +0200)] 
net: vertexcom: mse102x: Fix RX error handling

In case the CMD_RTS got corrupted by interferences, the MSE102x
doesn't allow a retransmission of the command. Instead the Ethernet
frame must be shifted out of the SPI FIFO. Since the actual length is
unknown, assume the maximum possible value.

Fixes: 2f207cbf0dd4 ("net: vertexcom: Add MSE102x SPI support")
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250430133043.7722-5-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: vertexcom: mse102x: Add range check for CMD_RTS
Stefan Wahren [Wed, 30 Apr 2025 13:30:42 +0000 (15:30 +0200)] 
net: vertexcom: mse102x: Add range check for CMD_RTS

Since there is no protection in the SPI protocol against electrical
interferences, the driver shouldn't blindly trust the length payload
of CMD_RTS. So introduce a bounds check for incoming frames.

Fixes: 2f207cbf0dd4 ("net: vertexcom: Add MSE102x SPI support")
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250430133043.7722-4-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: vertexcom: mse102x: Fix LEN_MASK
Stefan Wahren [Wed, 30 Apr 2025 13:30:41 +0000 (15:30 +0200)] 
net: vertexcom: mse102x: Fix LEN_MASK

The LEN_MASK for CMD_RTS doesn't cover the whole parameter mask.
The Bit 11 is reserved, so adjust LEN_MASK accordingly.

Fixes: 2f207cbf0dd4 ("net: vertexcom: Add MSE102x SPI support")
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250430133043.7722-3-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: vertexcom: mse102x: Fix possible stuck of SPI interrupt
Stefan Wahren [Wed, 30 Apr 2025 13:30:40 +0000 (15:30 +0200)] 
net: vertexcom: mse102x: Fix possible stuck of SPI interrupt

The MSE102x doesn't provide any SPI commands for interrupt handling.
So in case the interrupt fired before the driver requests the IRQ,
the interrupt will never fire again. In order to fix this always poll
for pending packets after opening the interface.

Fixes: 2f207cbf0dd4 ("net: vertexcom: Add MSE102x SPI support")
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250430133043.7722-2-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'
Jakub Kicinski [Thu, 1 May 2025 14:19:52 +0000 (07:19 -0700)] 
Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'

Jijie Shao says:

====================
There are some bugfix for the HNS3 ethernet driver
====================

Link: https://patch.msgid.link/20250430093052.2400464-1-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: hns3: defer calling ptp_clock_register()
Jian Shen [Wed, 30 Apr 2025 09:30:52 +0000 (17:30 +0800)] 
net: hns3: defer calling ptp_clock_register()

Currently the ptp_clock_register() is called before relative
ptp resource ready. It may cause unexpected result when upper
layer called the ptp API during the timewindow. Fix it by
moving the ptp_clock_register() to the function end.

Fixes: 0bf5eb788512 ("net: hns3: add support for PTP")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250430093052.2400464-5-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: hns3: fixed debugfs tm_qset size
Hao Lan [Wed, 30 Apr 2025 09:30:51 +0000 (17:30 +0800)] 
net: hns3: fixed debugfs tm_qset size

The size of the tm_qset file of debugfs is limited to 64 KB,
which is too small in the scenario with 1280 qsets.
The size needs to be expanded to 1 MB.

Fixes: 5e69ea7ee2a6 ("net: hns3: refactor the debugfs process")
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20250430093052.2400464-4-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: hns3: fix an interrupt residual problem
Yonglong Liu [Wed, 30 Apr 2025 09:30:50 +0000 (17:30 +0800)] 
net: hns3: fix an interrupt residual problem

When a VF is passthrough to a VM, and the VM is killed, the reported
interrupt may not been handled, it will remain, and won't be clear by
the nic engine even with a flr or tqp reset. When the VM restart, the
interrupt of the first vector may be dropped by the second enable_irq
in vfio, see the issue below:
https://gitlab.com/qemu-project/qemu/-/issues/2884#note_2423361621

We notice that the vfio has always behaved this way, and the interrupt
is a residue of the nic engine, so we fix the problem by moving the
vector enable process out of the enable_irq loop.

Fixes: 08a100689d4b ("net: hns3: re-organize vector handle")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20250430093052.2400464-3-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: hns3: store rx VLAN tag offload state for VF
Jian Shen [Wed, 30 Apr 2025 09:30:49 +0000 (17:30 +0800)] 
net: hns3: store rx VLAN tag offload state for VF

The VF driver missed to store the rx VLAN tag strip state when
user change the rx VLAN tag offload state. And it will default
to enable the rx vlan tag strip when re-init VF device after
reset. So if user disable rx VLAN tag offload, and trig reset,
then the HW will still strip the VLAN tag from packet nad fill
into RX BD, but the VF driver will ignore it for rx VLAN tag
offload disabled. It may cause the rx VLAN tag dropped.

Fixes: b2641e2ad456 ("net: hns3: Add support of hardware rx-vlan-offload to HNS3 VF driver")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250430093052.2400464-2-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net...
Jakub Kicinski [Thu, 1 May 2025 14:17:15 +0000 (07:17 -0700)] 
Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2025-04-29 (idpf, igc)

For idpf:
Michal fixes error path handling to remove memory leak.

Larysa prevents reset from being called during shutdown.

For igc:
Jake adjusts locking order to resolve sleeping in atomic context.

* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  igc: fix lock order in igc_ptp_reset
  idpf: protect shutdown from reset
  idpf: fix potential memory leak on kcalloc() failure
====================

Link: https://patch.msgid.link/20250429221034.3909139-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoocteon_ep: Fix host hang issue during device reboot
Sathesh B Edara [Tue, 29 Apr 2025 11:46:24 +0000 (04:46 -0700)] 
octeon_ep: Fix host hang issue during device reboot

When the host loses heartbeat messages from the device,
the driver calls the device-specific ndo_stop function,
which frees the resources. If the driver is unloaded in
this scenario, it calls ndo_stop again, attempting to free
resources that have already been freed, leading to a host
hang issue. To resolve this, dev_close should be called
instead of the device-specific stop function.dev_close
internally calls ndo_stop to stop the network interface
and performs additional cleanup tasks. During the driver
unload process, if the device is already down, ndo_stop
is not called.

Fixes: 5cb96c29aa0e ("octeon_ep: add heartbeat monitor")
Signed-off-by: Sathesh B Edara <sedara@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250429114624.19104-1-sedara@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: fec: ERR007885 Workaround for conventional TX
Mattias Barthel [Tue, 29 Apr 2025 09:08:26 +0000 (11:08 +0200)] 
net: fec: ERR007885 Workaround for conventional TX

Activate TX hang workaround also in
fec_enet_txq_submit_skb() when TSO is not enabled.

Errata: ERR007885

Symptoms: NETDEV WATCHDOG: eth0 (fec): transmit queue 0 timed out

commit 37d6017b84f7 ("net: fec: Workaround for imx6sx enet tx hang when enable three queues")
There is a TDAR race condition for mutliQ when the software sets TDAR
and the UDMA clears TDAR simultaneously or in a small window (2-4 cycles).
This will cause the udma_tx and udma_tx_arbiter state machines to hang.

So, the Workaround is checking TDAR status four time, if TDAR cleared by
    hardware and then write TDAR, otherwise don't set TDAR.

Fixes: 53bb20d1faba ("net: fec: add variable reg_desc_active to speed things up")
Signed-off-by: Mattias Barthel <mattias.barthel@atlascopco.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250429090826.3101258-1-mattiasbarthel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: lan743x: Fix memleak issue when GSO enabled
Thangaraj Samynathan [Tue, 29 Apr 2025 05:25:27 +0000 (10:55 +0530)] 
net: lan743x: Fix memleak issue when GSO enabled

Always map the `skb` to the LS descriptor. Previously skb was
mapped to EXT descriptor when the number of fragments is zero with
GSO enabled. Mapping the skb to EXT descriptor prevents it from
being freed, leading to a memory leak

Fixes: 23f0703c125b ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Thangaraj Samynathan <thangaraj.s@microchip.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250429052527.10031-1-thangaraj.s@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoptp: ocp: Fix NULL dereference in Adva board SMA sysfs operations
Sagi Maimon [Tue, 29 Apr 2025 07:33:20 +0000 (10:33 +0300)] 
ptp: ocp: Fix NULL dereference in Adva board SMA sysfs operations

On Adva boards, SMA sysfs store/get operations can call
__handle_signal_outputs() or __handle_signal_inputs() while the `irig`
and `dcf` pointers are uninitialized, leading to a NULL pointer
dereference in __handle_signal() and causing a kernel crash. Adva boards
don't use `irig` or `dcf` functionality, so add Adva-specific callbacks
`ptp_ocp_sma_adva_set_outputs()` and `ptp_ocp_sma_adva_set_inputs()` that
avoid invoking `irig` or `dcf` input/output routines.

Fixes: ef61f5528fca ("ptp: ocp: add Adva timecard support")
Signed-off-by: Sagi Maimon <maimon.sagi@gmail.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250429073320.33277-1-maimon.sagi@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: use sock_gen_put() when sk_state is TCP_TIME_WAIT
Jibin Zhang [Tue, 29 Apr 2025 01:59:48 +0000 (09:59 +0800)] 
net: use sock_gen_put() when sk_state is TCP_TIME_WAIT

It is possible for a pointer of type struct inet_timewait_sock to be
returned from the functions __inet_lookup_established() and
__inet6_lookup_established(). This can cause a crash when the
returned pointer is of type struct inet_timewait_sock and
sock_put() is called on it. The following is a crash call stack that
shows sk->sk_wmem_alloc being accessed in sk_free() during the call to
sock_put() on a struct inet_timewait_sock pointer. To avoid this issue,
use sock_gen_put() instead of sock_put() when sk->sk_state
is TCP_TIME_WAIT.

mrdump.ko        ipanic() + 120
vmlinux          notifier_call_chain(nr_to_call=-1, nr_calls=0) + 132
vmlinux          atomic_notifier_call_chain(val=0) + 56
vmlinux          panic() + 344
vmlinux          add_taint() + 164
vmlinux          end_report() + 136
vmlinux          kasan_report(size=0) + 236
vmlinux          report_tag_fault() + 16
vmlinux          do_tag_recovery() + 16
vmlinux          __do_kernel_fault() + 88
vmlinux          do_bad_area() + 28
vmlinux          do_tag_check_fault() + 60
vmlinux          do_mem_abort() + 80
vmlinux          el1_abort() + 56
vmlinux          el1h_64_sync_handler() + 124
vmlinux        > 0xFFFFFFC080011294()
vmlinux          __lse_atomic_fetch_add_release(v=0xF2FFFF82A896087C)
vmlinux          __lse_atomic_fetch_sub_release(v=0xF2FFFF82A896087C)
vmlinux          arch_atomic_fetch_sub_release(i=1, v=0xF2FFFF82A896087C)
+ 8
vmlinux          raw_atomic_fetch_sub_release(i=1, v=0xF2FFFF82A896087C)
+ 8
vmlinux          atomic_fetch_sub_release(i=1, v=0xF2FFFF82A896087C) + 8
vmlinux          __refcount_sub_and_test(i=1, r=0xF2FFFF82A896087C,
oldp=0) + 8
vmlinux          __refcount_dec_and_test(r=0xF2FFFF82A896087C, oldp=0) + 8
vmlinux          refcount_dec_and_test(r=0xF2FFFF82A896087C) + 8
vmlinux          sk_free(sk=0xF2FFFF82A8960700) + 28
vmlinux          sock_put() + 48
vmlinux          tcp6_check_fraglist_gro() + 236
vmlinux          tcp6_gro_receive() + 624
vmlinux          ipv6_gro_receive() + 912
vmlinux          dev_gro_receive() + 1116
vmlinux          napi_gro_receive() + 196
ccmni.ko         ccmni_rx_callback() + 208
ccmni.ko         ccmni_queue_recv_skb() + 388
ccci_dpmaif.ko   dpmaif_rxq_push_thread() + 1088
vmlinux          kthread() + 268
vmlinux          0xFFFFFFC08001F30C()

Fixes: c9d1d23e5239 ("net: add heuristic for enabling TCP fraglist GRO")
Signed-off-by: Jibin Zhang <jibin.zhang@mediatek.com>
Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250429020412.14163-1-shiming.cheng@mediatek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agobnxt_en: fix module unload sequence
Vadim Fedorenko [Wed, 30 Apr 2025 17:03:43 +0000 (10:03 -0700)] 
bnxt_en: fix module unload sequence

Recent updates to the PTP part of bnxt changed the way PTP FIFO is
cleared, skbs waiting for TX timestamps are now cleared during
ndo_close() call. To do clearing procedure, the ptp structure must
exist and point to a valid address. Module destroy sequence had ptp
clear code running before netdev close causing invalid memory access and
kernel crash. Change the sequence to destroy ptp structure after device
close.

Fixes: 8f7ae5a85137 ("bnxt_en: improve TX timestamping FIFO configuration")
Reported-by: Taehee Yoo <ap420073@gmail.com>
Closes: https://lore.kernel.org/netdev/CAMArcTWDe2cd41=ub=zzvYifaYcYv-N-csxfqxUvejy_L0D6UQ@mail.gmail.com/
Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Tested-by: Taehee Yoo <ap420073@gmail.com>
Link: https://patch.msgid.link/20250430170343.759126-1-vadfed@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agokbuild: Properly disable -Wunterminated-string-initialization for clang
Nathan Chancellor [Wed, 30 Apr 2025 22:56:34 +0000 (15:56 -0700)] 
kbuild: Properly disable -Wunterminated-string-initialization for clang

Clang and GCC have different behaviors around disabling warnings
included in -Wall and -Wextra and the order in which flags are
specified, which is exposed by clang's new support for
-Wunterminated-string-initialization.

  $ cat test.c
  const char foo[3] = "FOO";
  const char bar[3] __attribute__((__nonstring__)) = "BAR";

  $ clang -fsyntax-only -Wextra test.c
  test.c:1:21: warning: initializer-string for character array is too long, array size is 3 but initializer has size 4 (including the null terminating character); did you mean to use the 'nonstring' attribute? [-Wunterminated-string-initialization]
      1 | const char foo[3] = "FOO";
        |                     ^~~~~
  $ clang -fsyntax-only -Wextra -Wno-unterminated-string-initialization test.c
  $ clang -fsyntax-only -Wno-unterminated-string-initialization -Wextra test.c
  test.c:1:21: warning: initializer-string for character array is too long, array size is 3 but initializer has size 4 (including the null terminating character); did you mean to use the 'nonstring' attribute? [-Wunterminated-string-initialization]
      1 | const char foo[3] = "FOO";
        |                     ^~~~~

  $ gcc -fsyntax-only -Wextra test.c
  test.c:1:21: warning: initializer-string for array of ‘char’ truncates NUL terminator but destination lacks ‘nonstring’ attribute (4 chars into 3 available) [-Wunterminated-string-initialization]
      1 | const char foo[3] = "FOO";
        |                     ^~~~~
  $ gcc -fsyntax-only -Wextra -Wno-unterminated-string-initialization test.c
  $ gcc -fsyntax-only -Wno-unterminated-string-initialization -Wextra test.c

Move -Wextra up right below -Wall in Makefile.extrawarn to ensure these
flags are at the beginning of the warning options list. Move the couple
of warning options that have been added to the main Makefile since
commit e88ca24319e4 ("kbuild: consolidate warning flags in
scripts/Makefile.extrawarn") to scripts/Makefile.extrawarn after -Wall /
-Wextra to ensure they get properly disabled for all compilers.

Fixes: 9d7a0577c9db ("gcc-15: disable '-Wunterminated-string-initialization' entirely for now")
Link: https://github.com/llvm/llvm-project/issues/10359
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 months agoMerge tag 'for-6.15-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Wed, 30 Apr 2025 15:56:50 +0000 (08:56 -0700)] 
Merge tag 'for-6.15-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - fix potential inode leak in iget() after memory allocation failure

 - in subpage mode, fix extent buffer bitmap iteration when writing out
   dirty sectors

 - fix range calculation when falling back to COW for a NOCOW file

* tag 'for-6.15-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: adjust subpage bit start based on sectorsize
  btrfs: fix the inode leak in btrfs_iget()
  btrfs: fix COW handling in run_delalloc_nocow()

5 months agoMerge tag 'modules-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/modules...
Linus Torvalds [Wed, 30 Apr 2025 15:37:52 +0000 (08:37 -0700)] 
Merge tag 'modules-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux

Pull modules fixes from Petr Pavlu:
 "A single series to properly handle the module_kobject creation.

  This fixes a problem with missing /sys/module/<module>/drivers for
  built-in modules"

* tag 'modules-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux:
  drivers: base: handle module_kobject creation
  kernel: globalize lookup_or_create_module_kobject()
  kernel: refactor lookup_or_create_module_kobject()
  kernel: param: rename locate_module_kobject

5 months agoMerge branch 'bnxt_en-fixes'
David S. Miller [Wed, 30 Apr 2025 12:03:22 +0000 (13:03 +0100)] 
Merge branch 'bnxt_en-fixes'

Michael Chan says:

====================
bnxt_en: Misc. bug fixes

This series fixes a bug in the driver initialization path, MSIX
setup sequencing issue in the FW error and AER paths, a missing
skb_mark_for_recycle() in the VLAN error path, some ethtool coredump
fixes, an ethtool selftest fix, and an ethtool register dump byte order
fix.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agobnxt_en: Fix ethtool -d byte order for 32-bit values
Michael Chan [Mon, 28 Apr 2025 22:59:03 +0000 (15:59 -0700)] 
bnxt_en: Fix ethtool -d byte order for 32-bit values

For version 1 register dump that includes the PCIe stats, the existing
code incorrectly assumes that all PCIe stats are 64-bit values.  Fix it
by using an array containing the starting and ending index of the 32-bit
values.  The loop in bnxt_get_regs() will use the array to do proper
endian swap for the 32-bit values.

Fixes: b5d600b027eb ("bnxt_en: Add support for 'ethtool -d'")
Reviewed-by: Shruti Parab <shruti.parab@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agobnxt_en: Fix out-of-bound memcpy() during ethtool -w
Shruti Parab [Mon, 28 Apr 2025 22:59:02 +0000 (15:59 -0700)] 
bnxt_en: Fix out-of-bound memcpy() during ethtool -w

When retrieving the FW coredump using ethtool, it can sometimes cause
memory corruption:

BUG: KFENCE: memory corruption in __bnxt_get_coredump+0x3ef/0x670 [bnxt_en]
Corrupted memory at 0x000000008f0f30e8 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#45):
__bnxt_get_coredump+0x3ef/0x670 [bnxt_en]
ethtool_get_dump_data+0xdc/0x1a0
__dev_ethtool+0xa1e/0x1af0
dev_ethtool+0xa8/0x170
dev_ioctl+0x1b5/0x580
sock_do_ioctl+0xab/0xf0
sock_ioctl+0x1ce/0x2e0
__x64_sys_ioctl+0x87/0xc0
do_syscall_64+0x5c/0xf0
entry_SYSCALL_64_after_hwframe+0x78/0x80

...

This happens when copying the coredump segment list in
bnxt_hwrm_dbg_dma_data() with the HWRM_DBG_COREDUMP_LIST FW command.
The info->dest_buf buffer is allocated based on the number of coredump
segments returned by the FW.  The segment list is then DMA'ed by
the FW and the length of the DMA is returned by FW.  The driver then
copies this DMA'ed segment list to info->dest_buf.

In some cases, this DMA length may exceed the info->dest_buf length
and cause the above BUG condition.  Fix it by capping the copy
length to not exceed the length of info->dest_buf.  The extra
DMA data contains no useful information.

This code path is shared for the HWRM_DBG_COREDUMP_LIST and the
HWRM_DBG_COREDUMP_RETRIEVE FW commands.  The buffering is different
for these 2 FW commands.  To simplify the logic, we need to move
the line to adjust the buffer length for HWRM_DBG_COREDUMP_RETRIEVE
up, so that the new check to cap the copy length will work for both
commands.

Fixes: c74751f4c392 ("bnxt_en: Return error if FW returns more data than dump length")
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agobnxt_en: Fix coredump logic to free allocated buffer
Shruti Parab [Mon, 28 Apr 2025 22:59:01 +0000 (15:59 -0700)] 
bnxt_en: Fix coredump logic to free allocated buffer

When handling HWRM_DBG_COREDUMP_LIST FW command in
bnxt_hwrm_dbg_dma_data(), the allocated buffer info->dest_buf is
not freed in the error path.  In the normal path, info->dest_buf
is assigned to coredump->data and it will eventually be freed after
the coredump is collected.

Free info->dest_buf immediately inside bnxt_hwrm_dbg_dma_data() in
the error path.

Fixes: c74751f4c392 ("bnxt_en: Return error if FW returns more data than dump length")
Reported-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agobnxt_en: delay pci_alloc_irq_vectors() in the AER path
Kashyap Desai [Mon, 28 Apr 2025 22:59:00 +0000 (15:59 -0700)] 
bnxt_en: delay pci_alloc_irq_vectors() in the AER path

This patch is similar to the last patch to delay the
pci_alloc_irq_vectors() call in the AER path until after calling
bnxt_reserve_rings().  bnxt_reserve_rings() needs to properly map
the MSIX table first before we call pci_alloc_irq_vectors() which
may immediately write to the MSIX table in some architectures.

Move the bnxt_init_int_mode() call from bnxt_io_slot_reset() to
bnxt_io_resume() after calling bnxt_reserve_rings().

With this change, the AER path may call bnxt_open() ->
bnxt_hwrm_if_change() with bp->irq_tbl set to NULL.  bp->irq_tbl is
cleared when we call bnxt_clear_int_mode() in bnxt_io_slot_reset().
So we cannot use !bp->irq_tbl to detect aborted FW reset.  Add a
new BNXT_FW_RESET_STATE_ABORT to detect aborted FW reset in
bnxt_hwrm_if_change().

Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agobnxt_en: call pci_alloc_irq_vectors() after bnxt_reserve_rings()
Kashyap Desai [Mon, 28 Apr 2025 22:58:59 +0000 (15:58 -0700)] 
bnxt_en: call pci_alloc_irq_vectors() after bnxt_reserve_rings()

On some architectures (e.g. ARM), calling pci_alloc_irq_vectors()
will immediately cause the MSIX table to be written.  This will not
work if we haven't called bnxt_reserve_rings() to properly map
the MSIX table to the MSIX vectors reserved by FW.

Fix the FW error recovery path to delay the bnxt_init_int_mode() ->
pci_alloc_irq_vectors() call by removing it from bnxt_hwrm_if_change().
bnxt_request_irq() later in the code path will call it and by then the
MSIX table is properly mapped.

Fixes: 4343838ca5eb ("bnxt_en: Replace deprecated PCI MSIX APIs")
Suggested-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agobnxt_en: Add missing skb_mark_for_recycle() in bnxt_rx_vlan()
Somnath Kotur [Mon, 28 Apr 2025 22:58:58 +0000 (15:58 -0700)] 
bnxt_en: Add missing skb_mark_for_recycle() in bnxt_rx_vlan()

If bnxt_rx_vlan() fails because the VLAN protocol ID is invalid,
the SKB is freed but we're missing the call to recycle it.  This
may cause the warning:

"page_pool_release_retry() stalled pool shutdown"

Add the missing skb_mark_for_recycle() in bnxt_rx_vlan().

Fixes: 86b05508f775 ("bnxt_en: Use the unified RX page pool buffers for XDP and non-XDP")
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agobnxt_en: Fix ethtool selftest output in one of the failure cases
Kalesh AP [Mon, 28 Apr 2025 22:58:57 +0000 (15:58 -0700)] 
bnxt_en: Fix ethtool selftest output in one of the failure cases

When RDMA driver is loaded, running offline self test is not
supported and driver returns failure early. But it is not clearing
the input buffer and hence the application prints some junk
characters for individual test results.

Fix it by clearing the buffer before returning.

Fixes: 895621f1c816 ("bnxt_en: Don't support offline self test when RoCE driver is loaded")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agobnxt_en: Fix error handling path in bnxt_init_chip()
Shravya KN [Mon, 28 Apr 2025 22:58:56 +0000 (15:58 -0700)] 
bnxt_en: Fix error handling path in bnxt_init_chip()

WARN_ON() is triggered in __flush_work() if bnxt_init_chip() fails
because we call cancel_work_sync() on dim work that has not been
initialized.

WARNING: CPU: 37 PID: 5223 at kernel/workqueue.c:4201 __flush_work.isra.0+0x212/0x230

The driver relies on the BNXT_STATE_NAPI_DISABLED bit to check if dim
work has already been cancelled.  But in the bnxt_open() path,
BNXT_STATE_NAPI_DISABLED is not set and this causes the error
path to think that it needs to cancel the uninitalized dim work.
Fix it by setting BNXT_STATE_NAPI_DISABLED during initialization.
The bit will be cleared when we enable NAPI and initialize dim work.

Fixes: 40452969a506 ("bnxt_en: Fix DIM shutdown")
Suggested-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Shravya KN <shravya.k-n@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agoMerge branch 'pds_core-cleanups'
David S. Miller [Wed, 30 Apr 2025 11:56:00 +0000 (12:56 +0100)] 
Merge branch 'pds_core-cleanups'

Shannon Nelson says:

====================
pds_core: small code updates

These are a few little code touch ups for a kdoc complaint,
quicker error detection, and a cleaner initialization.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agopds_core: init viftype default in declaration
Shannon Nelson [Fri, 25 Apr 2025 20:46:18 +0000 (13:46 -0700)] 
pds_core: init viftype default in declaration

Initialize the .enabled field of the FWCTL viftype default in
the declaration rather than as a bit of code as it is always
to be enabled and needs no logic around it.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agopds_core: smaller adminq poll starting interval
Shannon Nelson [Fri, 25 Apr 2025 20:46:17 +0000 (13:46 -0700)] 
pds_core: smaller adminq poll starting interval

Shorten the adminq poll starting interval in order to notice
any transaction errors more quickly.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agopds_core: remove extra name description
Shannon Nelson [Fri, 25 Apr 2025 20:46:16 +0000 (13:46 -0700)] 
pds_core: remove extra name description

Fix the kernel-doc complaint
include/linux/pds/pds_adminq.h:481: warning: Excess struct member 'name' description in 'pds_core_lif_getattr_comp'

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agoMerge tag 'v6.15-p6' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Wed, 30 Apr 2025 03:59:42 +0000 (20:59 -0700)] 
Merge tag 'v6.15-p6' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
 "This fixes a regression in scompress"

* tag 'v6.15-p6' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: scompress - increment scomp_scratch_users when already allocated

5 months agoMerge tag 'nf-next-25-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilt...
Jakub Kicinski [Tue, 29 Apr 2025 23:31:09 +0000 (16:31 -0700)] 
Merge tag 'nf-next-25-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following batch contains Netfilter updates for net-next:

1) Replace msecs_to_jiffies() by secs_to_jiffies(), from Easwar Hariharan.

2) Allow to compile xt_cgroup with cgroupsv2 support only,
   from Michal Koutny.

3) Prepare for sock_cgroup_classid() removal by wrapping it around
   ifdef, also from Michal Koutny.

4) Remove redundant pointer fetch on conntrack template, from Xuanqiang Luo.

5) Re-format one block in the tproxy documentation for consistency,
   from Chen Linxuan.

6) Expose set element count and type via netlink attributes,
   from Florian Westphal.

* tag 'nf-next-25-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_tables: export set count and backend name to userspace
  docs: tproxy: fix formatting for nft code block
  netfilter: conntrack: Remove redundant NFCT_ALIGN call
  net: cgroup: Guard users of sock_cgroup_classid()
  netfilter: xt_cgroup: Make it independent from net_cls
  netfilter: xt_IDLETIMER: convert timeouts to secs_to_jiffies()
====================

Link: https://patch.msgid.link/20250428221254.3853-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoidpf: remove unreachable code from setting mailbox
Michal Swiatkowski [Wed, 9 Apr 2025 06:29:45 +0000 (08:29 +0200)] 
idpf: remove unreachable code from setting mailbox

Remove code that isn't reached. There is no need to check for
adapter->req_vec_chunks, because if it isn't set idpf_set_mb_vec_id()
won't be called.

Only one path when idpf_set_mb_vec_id() is called:
idpf_intr_req()
 -> idpf_send_alloc_vectors_msg() -> adapter->req_vec_chunk is allocated
 here, otherwise an error is returned and idpf_intr_req() exits with an
 error.

The idpf_set_mb_vec_id() becomes one-liner and it is called only once.
Remove it and set mailbox vector index directly.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Reviewed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 months agotools: ynl: fix typo in info string
Ruben Wauters [Mon, 28 Apr 2025 21:51:09 +0000 (22:51 +0100)] 
tools: ynl: fix typo in info string

replaces formmated with formatted
also corrects grammar by replacing a with an, and capitalises RST

Signed-off-by: Ruben Wauters <rubenru09@aol.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250428215541.6029-1-rubenru09@aol.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoidpf: assign extracted ptype to struct libeth_rqe_info field
Mateusz Polchlopek [Sat, 1 Mar 2025 19:02:44 +0000 (20:02 +0100)] 
idpf: assign extracted ptype to struct libeth_rqe_info field

Assign the ptype extracted from qword to the ptype field of struct
libeth_rqe_info.
Remove the now excess ptype param of idpf_rx_singleq_extract_fields(),
idpf_rx_singleq_extract_base_fields() and
idpf_rx_singleq_extract_flex_fields().

Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 months agoixgbe: devlink: add devlink region support for E610
Slawomir Mrozowicz [Fri, 11 Apr 2025 13:06:26 +0000 (15:06 +0200)] 
ixgbe: devlink: add devlink region support for E610

Provide support for the following devlink cmds:
 -DEVLINK_CMD_REGION_GET
 -DEVLINK_CMD_REGION_NEW
 -DEVLINK_CMD_REGION_DEL
 -DEVLINK_CMD_REGION_READ

ixgbe devlink region implementation, similarly to the ice one,
lets user to create snapshots of content of Non Volatile Memory,
content of Shadow RAM, and capabilities of the device.

For both NVM and SRAM regions provide .read() handler to let user
read their contents without the need to create full snapshots.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com>
Co-developed-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Bharath R <bharath.r@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 months agoixgbe: add E610 .set_phys_id() callback implementation
Jedrzej Jagielski [Mon, 3 Mar 2025 12:06:30 +0000 (13:06 +0100)] 
ixgbe: add E610 .set_phys_id() callback implementation

Legacy implementation of .set_phys_id() ethtool callback is not
applicable for E610 device.

Add new implementation which uses 0x06E9 command by calling
ixgbe_aci_set_port_id_led().

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Bharath R <bharath.r@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 months agoixgbe: apply different rules for setting FC on E610
Jedrzej Jagielski [Mon, 3 Mar 2025 12:06:29 +0000 (13:06 +0100)] 
ixgbe: apply different rules for setting FC on E610

E610 device doesn't support disabling FC autonegotiation.

Create dedicated E610 .set_pauseparam() implementation and assign
it to ixgbe_ethtool_ops_e610.

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Bharath R <bharath.r@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 months agoixgbe: add support for ACPI WOL for E610
Jedrzej Jagielski [Mon, 3 Mar 2025 12:06:28 +0000 (13:06 +0100)] 
ixgbe: add support for ACPI WOL for E610

Currently only APM (Advanced Power Management) is supported by
the ixgbe driver. It works for magic packets only, as for different
sources of wake-up E610 adapter utilizes different feature.

Add E610 specific implementation of ixgbe_set_wol() callback. When
any of broadcast/multicast/unicast wake-up is set, disable APM and
configure ACPI (Advanced Configuration and Power Interface).

Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Bharath R <bharath.r@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 months agoixgbe: create E610 specific ethtool_ops structure
Jedrzej Jagielski [Mon, 3 Mar 2025 12:06:27 +0000 (13:06 +0100)] 
ixgbe: create E610 specific ethtool_ops structure

E610's implementation of various ethtool ops is different than
the ones corresponding to ixgbe legacy products. Therefore create
separate E610 ethtool_ops struct which will be filled out in the
forthcoming patches.

Add adequate ops struct basing on MAC type. This step requires
changing a bit the flow of probing by placing ixgbe_set_ethtool_ops
after hw.mac.type is assigned. So move the whole netdev assignment
block after hw.mac.type is known. This step doesn't have any additional
impact on probing sequence.

Suggested-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Bharath R <bharath.r@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 months agoigc: Change Tx mode for MQPRIO offloading
Kurt Kanzenbach [Fri, 21 Mar 2025 13:52:39 +0000 (14:52 +0100)] 
igc: Change Tx mode for MQPRIO offloading

The current MQPRIO offload implementation uses the legacy TSN Tx mode. In
this mode the hardware uses four packet buffers and considers queue
priorities.

In order to harmonize the TAPRIO implementation with MQPRIO, switch to the
regular TSN Tx mode. This mode also uses four packet buffers and considers
queue priorities. In addition to the legacy mode, transmission is always
coupled to Qbv. The driver already has mechanisms to use a dummy schedule
of 1 second with all gates open for ETF. Simply use this for MQPRIO too.

This reduces code and makes it easier to add support for frame preemption
later.

Tested on i225 with real time application using high priority queue, iperf3
using low priority queue and network TAP device.

Acked-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com>
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 months agoigc: Limit netdev_tc calls to MQPRIO
Kurt Kanzenbach [Fri, 21 Mar 2025 13:52:38 +0000 (14:52 +0100)] 
igc: Limit netdev_tc calls to MQPRIO

Limit netdev_tc calls to MQPRIO. Currently these calls are made in
igc_tsn_enable_offload() and igc_tsn_disable_offload() which are used by
TAPRIO and ETF as well. However, these are only required for MQPRIO.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 months agoigb: Get rid of spurious interrupts
Kurt Kanzenbach [Wed, 19 Mar 2025 10:26:42 +0000 (11:26 +0100)] 
igb: Get rid of spurious interrupts

When running the igc with XDP/ZC in busy polling mode with deferral of hard
interrupts, interrupts still happen from time to time. That is caused by
the igb task watchdog which triggers Rx interrupts periodically.

That mechanism has been introduced to overcome skb/memory allocation
failures [1]. So the Rx clean functions stop processing the Rx ring in case
of such failure. The task watchdog triggers Rx interrupts periodically in
the hope that memory became available in the mean time.

The current behavior is undesirable for real time applications, because the
driver induced Rx interrupts trigger also the softirq processing. However,
all real time packets should be processed by the application which uses the
busy polling method.

Therefore, only trigger the Rx interrupts in case of real allocation
failures. Introduce a new flag for signaling that condition.

Follow the same logic as in commit 8dcf2c212078 ("igc: Get rid of spurious
interrupts").

[1] - https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=3be507547e6177e5c808544bd6a2efa2c7f1d436

Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Sweta Kumari <sweta.kumari@intel.com>
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 months agoigb: Add support for persistent NAPI config
Kurt Kanzenbach [Wed, 19 Mar 2025 10:26:41 +0000 (11:26 +0100)] 
igb: Add support for persistent NAPI config

Use netif_napi_add_config() to assign persistent per-NAPI config.

This is useful for preserving NAPI settings when changing queue counts or
for user space programs using SO_INCOMING_NAPI_ID.

Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 months agoigb: Link queues to NAPI instances
Kurt Kanzenbach [Wed, 19 Mar 2025 10:26:40 +0000 (11:26 +0100)] 
igb: Link queues to NAPI instances

Link queues to NAPI instances via netdev-genl API. This is required to use
XDP/ZC busy polling. See commit 5ef44b3cb43b ("xsk: Bring back busy polling
support") for details.

This also allows users to query the info with netlink:

|$ ./tools/net/ynl/pyynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
|                               --dump queue-get --json='{"ifindex": 2}'
|[{'id': 0, 'ifindex': 2, 'napi-id': 8201, 'type': 'rx'},
| {'id': 1, 'ifindex': 2, 'napi-id': 8202, 'type': 'rx'},
| {'id': 2, 'ifindex': 2, 'napi-id': 8203, 'type': 'rx'},
| {'id': 3, 'ifindex': 2, 'napi-id': 8204, 'type': 'rx'},
| {'id': 0, 'ifindex': 2, 'napi-id': 8201, 'type': 'tx'},
| {'id': 1, 'ifindex': 2, 'napi-id': 8202, 'type': 'tx'},
| {'id': 2, 'ifindex': 2, 'napi-id': 8203, 'type': 'tx'},
| {'id': 3, 'ifindex': 2, 'napi-id': 8204, 'type': 'tx'}]

Add rtnl locking to PCI error handlers, because netif_queue_set_napi()
requires the lock held.

While at __igb_open() use RCT coding style.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Tested-by: Sweta Kumari <sweta.kumari@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 months agoigb: Link IRQs to NAPI instances
Kurt Kanzenbach [Wed, 19 Mar 2025 10:26:39 +0000 (11:26 +0100)] 
igb: Link IRQs to NAPI instances

Link IRQs to NAPI instances via netdev-genl API. This allows users to query
that information via netlink:

|$ ./tools/net/ynl/pyynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
|                               --dump napi-get --json='{"ifindex": 2}'
|[{'defer-hard-irqs': 0,
|  'gro-flush-timeout': 0,
|  'id': 8204,
|  'ifindex': 2,
|  'irq': 127,
|  'irq-suspend-timeout': 0},
| {'defer-hard-irqs': 0,
|  'gro-flush-timeout': 0,
|  'id': 8203,
|  'ifindex': 2,
|  'irq': 126,
|  'irq-suspend-timeout': 0},
| {'defer-hard-irqs': 0,
|  'gro-flush-timeout': 0,
|  'id': 8202,
|  'ifindex': 2,
|  'irq': 125,
|  'irq-suspend-timeout': 0},
| {'defer-hard-irqs': 0,
|  'gro-flush-timeout': 0,
|  'id': 8201,
|  'ifindex': 2,
|  'irq': 124,
|  'irq-suspend-timeout': 0}]
|$ cat /proc/interrupts | grep enp2s0
|123:          0          1 IR-PCI-MSIX-0000:02:00.0   0-edge      enp2s0
|124:          0          7 IR-PCI-MSIX-0000:02:00.0   1-edge      enp2s0-TxRx-0
|125:          0          0 IR-PCI-MSIX-0000:02:00.0   2-edge      enp2s0-TxRx-1
|126:          0          5 IR-PCI-MSIX-0000:02:00.0   3-edge      enp2s0-TxRx-2
|127:          0          0 IR-PCI-MSIX-0000:02:00.0   4-edge      enp2s0-TxRx-3

Reviewed-by: Joe Damato <jdamato@fastly.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 months agonet: phy: aquantia: fix commenting format
Aryan Srivastava [Mon, 28 Apr 2025 21:49:20 +0000 (09:49 +1200)] 
net: phy: aquantia: fix commenting format

Comment was erroneously added with /**, amend this to use /* as it is
not a kernel-doc.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202504262247.1UBrDBVN-lkp@intel.com/
Signed-off-by: Aryan Srivastava <aryan.srivastava@alliedtelesis.co.nz>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250428214920.813038-1-aryan.srivastava@alliedtelesis.co.nz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: ipv6: fix UDPv6 GSO segmentation with NAT
Felix Fietkau [Sat, 26 Apr 2025 15:32:09 +0000 (17:32 +0200)] 
net: ipv6: fix UDPv6 GSO segmentation with NAT

If any address or port is changed, update it in all packets and recalculate
checksum.

Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250426153210.14044-1-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge branch 'fix-felix-dsa-taprio-gates-after-clock-jump'
Jakub Kicinski [Tue, 29 Apr 2025 21:44:39 +0000 (14:44 -0700)] 
Merge branch 'fix-felix-dsa-taprio-gates-after-clock-jump'

Vladimir Oltean says:

====================
Fix Felix DSA taprio gates after clock jump

Richie Pearn presented a reproducible situation where traffic would get
blocked on the NXP LS1028A switch if a certain taprio schedule was
applied, and stepping the PTP clock would take place. The latter event
is an expected initial occurrence, but also at runtime, for example when
transitioning from one grandmaster to another.

The issue is completely described in patch 1/4, which also contains
the fix, but it has left me with some doubts regarding the need for
vsc9959_tas_clock_adjust() in general.

In order to prove to myself that vsc9959_tas_clock_adjust() is needed in
general, I have written a selftest for the tc-taprio data path in patch
4/4. On the LS1028A, we can clearly see the following failures without
that function:

INFO: Forcing a backward clock jump
TEST: ping                                                          [FAIL]
INFO: Setting up taprio after PTP
TEST: In band with gate                                             [FAIL]
        Reception of 100 packets failed
TEST: Out of band with gate                                         [FAIL]
        Reception of 100 packets failed

As for testing my fix from patch 1/4, that was quite a bit more complex
to do automatically. In fact, I couldn't find any other schedule that
would fail to be updated by vsc9959_tas_clock_adjust() as cleanly as
the schedule from Richie, so I've added that specific schedule as the
test_clock_jump_backward() test.

The test ordering is also (unfortunately) very strategic. Running the
selftest to the end dirties the GCL RAM, and when running
test_clock_jump_backward() once again, the GCL entries won't be all
zeroes as they were the first time around. They will contain bits and
pieces of old schedules, making it very challenging to make it fail.

Thus, test_clock_jump_backward() is the first in the test suite, and
without patch 1/4, it is only supposed to fail the _first_ time when
running after a clean boot.
====================

Link: https://patch.msgid.link/20250426144859.3128352-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: net: tc_taprio: new test
Vladimir Oltean [Sat, 26 Apr 2025 14:48:58 +0000 (17:48 +0300)] 
selftests: net: tc_taprio: new test

Add a forwarding path test for tc-taprio, based on isochron. This is
specifically intended for NICs with an offloaded data path (switchdev/DSA)
and requires taprio 'flags 2'. Also, $h1 and $h2 must support hardware
timestamping, and $h1 tc-etf offload, for isochron to work.

Packets received by a switch while the egress port has a taprio schedule
with an open gate for the traffic class must be sent right away.

Packets received by the switch while the traffic class gate must be
delayed until it opens.

Packets received by the switch must be dropped if the gate for the
traffic class never opens.

Packets should pass if the maximum SDU for the traffic class allows it,
and should be dropped otherwise.

The schedule should auto-update itself if clock jumps take place while
taprio is installed. Repeat most of the above tests after forcing two
clock jumps, one backwards (in Jan 1970) and one back into the present.

Symlink it from tools/testing/selftests/drivers/net/dsa, because usually
DSA ports have the same MAC address, and we need STABLE_MAC_ADDRS=yes
from its forwarding.config for the test to run successfully.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250426144859.3128352-5-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: net: tsn_lib: add window_size argument to isochron_do()
Vladimir Oltean [Sat, 26 Apr 2025 14:48:57 +0000 (17:48 +0300)] 
selftests: net: tsn_lib: add window_size argument to isochron_do()

Make out-of-band testing (send a packet when its traffic class gate is
closed, expecting it to be delayed) more predictable by allowing the
window size to be customized by isochron_do().

From man isochron-send, the window size alters the advance time (the
delta between the transmission time of the packet, and its expected TX
time when using SO_TXTIME or tc-taprio on the sender). In absence of the
argument, isochron-send defaults to maximizing the advance time (making
it equal to the cycle length).

The default behavior is exactly what is problematic. An advance time
that is too large will make packets intended to be out-of-band still be
potentially in-band with an open gate from the schedule's previous cycle.
We need to allow that advance time to be reduced.

Perhaps a bit confusingly, isochron_do() has a shift_time argument
currently, but that does not help here. The shift time shifts both the
user space wakeup time and the expected TX time by equal amounts, it is
unable of bringing them closer to one another.

Set the window size properly for the Ocelot PSFP selftest as well.
That used to work due to a very carefully chosen SHIFT_TIME_NS.
I've re-tested that the test still works properly.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250426144859.3128352-4-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: net: tsn_lib: create common helper for counting received packets
Vladimir Oltean [Sat, 26 Apr 2025 14:48:56 +0000 (17:48 +0300)] 
selftests: net: tsn_lib: create common helper for counting received packets

This snippet will be necessary for a future isochron-based test, so
provide a simpler high-level interface for counting the received
packets.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250426144859.3128352-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: dsa: felix: fix broken taprio gate states after clock jump
Vladimir Oltean [Sat, 26 Apr 2025 14:48:55 +0000 (17:48 +0300)] 
net: dsa: felix: fix broken taprio gate states after clock jump

Simplest setup to reproduce the issue: connect 2 ports of the
LS1028A-RDB together (eno0 with swp0) and run:

$ ip link set eno0 up && ip link set swp0 up
$ tc qdisc replace dev swp0 parent root handle 100 taprio num_tc 8 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 map 0 1 2 3 4 5 6 7 \
base-time 0 sched-entry S 20 300000 sched-entry S 10 200000 \
sched-entry S 20 300000 sched-entry S 48 200000 \
sched-entry S 20 300000 sched-entry S 83 200000 \
sched-entry S 40 300000 sched-entry S 00 200000 flags 2
$ ptp4l -i eno0 -f /etc/linuxptp/configs/gPTP.cfg -m &
$ ptp4l -i swp0 -f /etc/linuxptp/configs/gPTP.cfg -m

One will observe that the PTP state machine on swp0 starts
synchronizing, then it attempts to do a clock step, and after that, it
never fails to recover from the condition below.

ptp4l[82.427]: selected best master clock 00049f.fffe.05f627
ptp4l[82.428]: port 1 (swp0): MASTER to UNCALIBRATED on RS_SLAVE
ptp4l[83.252]: port 1 (swp0): UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[83.886]: rms 4537731277 max 9075462553 freq -18518 +/- 11467 delay   818 +/-   0
ptp4l[84.170]: timed out while polling for tx timestamp
ptp4l[84.171]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[84.172]: port 1 (swp0): send peer delay request failed
ptp4l[84.173]: port 1 (swp0): clearing fault immediately
ptp4l[84.269]: port 1 (swp0): SLAVE to LISTENING on INIT_COMPLETE
ptp4l[85.303]: timed out while polling for tx timestamp
ptp4l[84.171]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[84.172]: port 1 (swp0): send peer delay request failed
ptp4l[84.173]: port 1 (swp0): clearing fault immediately
ptp4l[84.269]: port 1 (swp0): SLAVE to LISTENING on INIT_COMPLETE
ptp4l[85.303]: timed out while polling for tx timestamp
ptp4l[85.304]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[85.305]: port 1 (swp0): send peer delay response failed
ptp4l[85.306]: port 1 (swp0): clearing fault immediately
ptp4l[86.304]: timed out while polling for tx timestamp

A hint is given by the non-zero statistics for dropped packets which
were expecting hardware TX timestamps:

$ ethtool --include-statistics -T swp0
(...)
Statistics:
  tx_pkts: 30
  tx_lost: 11
  tx_err: 0

We know that when PTP clock stepping takes place (from ocelot_ptp_settime64()
or from ocelot_ptp_adjtime()), vsc9959_tas_clock_adjust() is called.

Another interesting hint is that placing an early return in
vsc9959_tas_clock_adjust(), so as to neutralize this function, fixes the
issue and TX timestamps are no longer dropped.

The debugging function written by me and included below is intended to
read the GCL RAM, after the admin schedule became operational, through
the two status registers available for this purpose:
QSYS_GCL_STATUS_REG_1 and QSYS_GCL_STATUS_REG_2.

static void vsc9959_print_tas_gcl(struct ocelot *ocelot)
{
u32 val, list_length, interval, gate_state;
int i, err;

err = read_poll_timeout(ocelot_read, val,
!(val & QSYS_PARAM_STATUS_REG_8_CONFIG_PENDING),
10, 100000, false, ocelot, QSYS_PARAM_STATUS_REG_8);
if (err) {
dev_err(ocelot->dev,
"Failed to wait for TAS config pending bit to clear: %pe\n",
ERR_PTR(err));
return;
}

val = ocelot_read(ocelot, QSYS_PARAM_STATUS_REG_3);
list_length = QSYS_PARAM_STATUS_REG_3_LIST_LENGTH_X(val);

dev_info(ocelot->dev, "GCL length: %u\n", list_length);

for (i = 0; i < list_length; i++) {
ocelot_rmw(ocelot,
   QSYS_GCL_STATUS_REG_1_GCL_ENTRY_NUM(i),
   QSYS_GCL_STATUS_REG_1_GCL_ENTRY_NUM_M,
   QSYS_GCL_STATUS_REG_1);
interval = ocelot_read(ocelot, QSYS_GCL_STATUS_REG_2);
val = ocelot_read(ocelot, QSYS_GCL_STATUS_REG_1);
gate_state = QSYS_GCL_STATUS_REG_1_GATE_STATE_X(val);

dev_info(ocelot->dev, "GCL entry %d: states 0x%x interval %u\n",
 i, gate_state, interval);
}
}

Calling it from two places: after the initial QSYS_TAS_PARAM_CFG_CTRL_CONFIG_CHANGE
performed by vsc9959_qos_port_tas_set(), and after the one done by
vsc9959_tas_clock_adjust(), I notice the following difference.

From the tc-taprio process context, where the schedule was initially
configured, the GCL looks like this:

mscc_felix 0000:00:00.5: GCL length: 8
mscc_felix 0000:00:00.5: GCL entry 0: states 0x20 interval 300000
mscc_felix 0000:00:00.5: GCL entry 1: states 0x10 interval 200000
mscc_felix 0000:00:00.5: GCL entry 2: states 0x20 interval 300000
mscc_felix 0000:00:00.5: GCL entry 3: states 0x48 interval 200000
mscc_felix 0000:00:00.5: GCL entry 4: states 0x20 interval 300000
mscc_felix 0000:00:00.5: GCL entry 5: states 0x83 interval 200000
mscc_felix 0000:00:00.5: GCL entry 6: states 0x40 interval 300000
mscc_felix 0000:00:00.5: GCL entry 7: states 0x0 interval 200000

But from the ptp4l clock stepping process context, when the
vsc9959_tas_clock_adjust() hook is called, the GCL RAM of the
operational schedule now looks like this:

mscc_felix 0000:00:00.5: GCL length: 8
mscc_felix 0000:00:00.5: GCL entry 0: states 0x0 interval 0
mscc_felix 0000:00:00.5: GCL entry 1: states 0x0 interval 0
mscc_felix 0000:00:00.5: GCL entry 2: states 0x0 interval 0
mscc_felix 0000:00:00.5: GCL entry 3: states 0x0 interval 0
mscc_felix 0000:00:00.5: GCL entry 4: states 0x0 interval 0
mscc_felix 0000:00:00.5: GCL entry 5: states 0x0 interval 0
mscc_felix 0000:00:00.5: GCL entry 6: states 0x0 interval 0
mscc_felix 0000:00:00.5: GCL entry 7: states 0x0 interval 0

I do not have a formal explanation, just experimental conclusions.
It appears that after triggering QSYS_TAS_PARAM_CFG_CTRL_CONFIG_CHANGE
for a port's TAS, the GCL entry RAM is updated anyway, despite what the
documentation claims: "Specify the time interval in
QSYS::GCL_CFG_REG_2.TIME_INTERVAL. This triggers the actual RAM
write with the gate state and the time interval for the entry number
specified". We don't touch that register (through vsc9959_tas_gcl_set())
from vsc9959_tas_clock_adjust(), yet the GCL RAM is updated anyway.

It seems to be updated with effectively stale memory, which in my
testing can hold a variety of things, including even pieces of the
previously applied schedule, for particular schedule lengths.

As such, in most circumstances it is very difficult to pinpoint this
issue, because the newly updated schedule would "behave strangely",
but ultimately might still pass traffic to some extent, due to some
gate entries still being present in the stale GCL entry RAM. It is easy
to miss.

With the particular schedule given at the beginning, the GCL RAM
"happens" to be reproducibly rewritten with all zeroes, and this is
consistent with what we see: when the time-aware shaper has gate entries
with all gates closed, traffic is dropped on TX, no wonder we can't
retrieve TX timestamps.

Rewriting the GCL entry RAM when reapplying the new base time fixes the
observed issue.

Fixes: 8670dc33f48b ("net: dsa: felix: update base time of time-aware shaper when adjusting PTP time")
Reported-by: Richie Pearn <richard.pearn@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250426144859.3128352-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: ethernet: mtk_eth_soc: fix SER panic with 4GB+ RAM
Chad Monroe [Sun, 27 Apr 2025 01:05:44 +0000 (02:05 +0100)] 
net: ethernet: mtk_eth_soc: fix SER panic with 4GB+ RAM

If the mtk_poll_rx() function detects the MTK_RESETTING flag, it will
jump to release_desc and refill the high word of the SDP on the 4GB RFB.
Subsequently, mtk_rx_clean will process an incorrect SDP, leading to a
panic.

Add patch from MediaTek's SDK to resolve this.

Fixes: 2d75891ebc09 ("net: ethernet: mtk_eth_soc: support 36-bit DMA addressing on MT7988")
Link: https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/71f47ea785699c6aa3b922d66c2bdc1a43da25b1
Signed-off-by: Chad Monroe <chad@monroe.io>
Link: https://patch.msgid.link/4adc2aaeb0fb1b9cdc56bf21cf8e7fa328daa345.1745715843.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoigc: fix lock order in igc_ptp_reset
Jacob Keller [Tue, 22 Apr 2025 21:03:09 +0000 (14:03 -0700)] 
igc: fix lock order in igc_ptp_reset

Commit 1a931c4f5e68 ("igc: add lock preventing multiple simultaneous PTM
transactions") added a new mutex to protect concurrent PTM transactions.
This lock is acquired in igc_ptp_reset() in order to ensure the PTM
registers are properly disabled after a device reset.

The flow where the lock is acquired already holds a spinlock, so acquiring
a mutex leads to a sleep-while-locking bug, reported both by smatch,
and the kernel test robot.

The critical section in igc_ptp_reset() does correctly use the
readx_poll_timeout_atomic variants, but the standard PTM flow uses regular
sleeping variants. This makes converting the mutex to a spinlock a bit
tricky.

Instead, re-order the locking in igc_ptp_reset. Acquire the mutex first,
and then the tmreg_lock spinlock. This is safe because there is no other
ordering dependency on these locks, as this is the only place where both
locks were acquired simultaneously. Indeed, any other flow acquiring locks
in that order would be wrong regardless.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Fixes: 1a931c4f5e68 ("igc: add lock preventing multiple simultaneous PTM transactions")
Link: https://lore.kernel.org/intel-wired-lan/Z_-P-Hc1yxcw0lTB@stanley.mountain/
Link: https://lore.kernel.org/intel-wired-lan/202504211511.f7738f5d-lkp@intel.com/T/#u
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 months agoidpf: protect shutdown from reset
Larysa Zaremba [Thu, 10 Apr 2025 11:52:23 +0000 (13:52 +0200)] 
idpf: protect shutdown from reset

Before the referenced commit, the shutdown just called idpf_remove(),
this way IDPF_REMOVE_IN_PROG was protecting us from the serv_task
rescheduling reset. Without this flag set the shutdown process is
vulnerable to HW reset or any other triggering conditions (such as
default mailbox being destroyed).

When one of conditions checked in idpf_service_task becomes true,
vc_event_task can be rescheduled during shutdown, this leads to accessing
freed memory e.g. idpf_req_rel_vector_indexes() trying to read
vport->q_vector_idxs. This in turn causes the system to become defunct
during e.g. systemctl kexec.

Considering using IDPF_REMOVE_IN_PROG would lead to more heavy shutdown
process, instead just cancel the serv_task before cancelling
adapter->serv_task before cancelling adapter->vc_event_task to ensure that
reset will not be scheduled while we are doing a shutdown.

Fixes: 4c9106f4906a ("idpf: fix adapter NULL pointer dereference on reboot")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 months agoidpf: fix potential memory leak on kcalloc() failure
Michal Swiatkowski [Fri, 4 Apr 2025 10:54:21 +0000 (12:54 +0200)] 
idpf: fix potential memory leak on kcalloc() failure

In case of failing on rss_data->rss_key allocation the function is
freeing vport without freeing earlier allocated q_vector_idxs. Fix it.

Move from freeing in error branch to goto scheme.

Fixes: d4d558718266 ("idpf: initialize interrupts and enable vport")
Reviewed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Suggested-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 months agoMerge branch 'xsk-respect-the-offsets-when-copying-frags'
Jakub Kicinski [Tue, 29 Apr 2025 21:26:29 +0000 (14:26 -0700)] 
Merge branch 'xsk-respect-the-offsets-when-copying-frags'

Bui Quang Minh says:

====================
xsk: respect the offsets when copying frags

In commit 560d958c6c68 ("xsk: add generic XSk &xdp_buff -> skb
conversion"), we introduce a helper to convert zerocopy xdp_buff to skb.
However, in the frag copy, we mistakenly ignore the frag's offset. This
series adds the missing offset when copying frags in
xdp_copy_frags_from_zc(). This function is not used anywhere so no
backport is needed.

This series also makes xdp_copy_frags_from_zc() use page allocation API
page_pool_dev_alloc() instead of page_pool_dev_alloc_netmem() to avoid
possible confusion of the returned value.
====================

Link: https://patch.msgid.link/20250426081220.40689-1-minhquangbui99@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoxsk: convert xdp_copy_frags_from_zc() to use page_pool_dev_alloc()
Bui Quang Minh [Sat, 26 Apr 2025 08:12:20 +0000 (15:12 +0700)] 
xsk: convert xdp_copy_frags_from_zc() to use page_pool_dev_alloc()

This commit makes xdp_copy_frags_from_zc() use page allocation API
page_pool_dev_alloc() instead of page_pool_dev_alloc_netmem() to avoid
possible confusion of the returned value.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Link: https://patch.msgid.link/20250426081220.40689-3-minhquangbui99@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoxsk: respect the offsets when copying frags
Bui Quang Minh [Sat, 26 Apr 2025 08:12:19 +0000 (15:12 +0700)] 
xsk: respect the offsets when copying frags

In commit 560d958c6c68 ("xsk: add generic XSk &xdp_buff -> skb
conversion"), we introduce a helper to convert zerocopy xdp_buff to skb.
However, in the frag copy, we mistakenly ignore the frag's offset. This
commit adds the missing offset when copying frags in
xdp_copy_frags_from_zc(). This function is not used anywhere so no
backport is needed.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Link: https://patch.msgid.link/20250426081220.40689-2-minhquangbui99@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge tag 'mmc-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Linus Torvalds [Tue, 29 Apr 2025 21:23:36 +0000 (14:23 -0700)] 
Merge tag 'mmc-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
 "Renesas SDHI fixes:

   - Fix error-paths in probe

   - Fix build-error when CONFIG_REGULATOR is unset"

* tag 'mmc-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: renesas_sdhi: disable clocks if registering regulator failed
  mmc: renesas_sdhi: add regulator dependency
  mmc: renesas_sdhi: Fix error handling in renesas_sdhi_probe

5 months agonet: mdio: mux-meson-gxl: set reversed bit when using internal phy
Da Xue [Fri, 25 Apr 2025 19:20:09 +0000 (15:20 -0400)] 
net: mdio: mux-meson-gxl: set reversed bit when using internal phy

This bit is necessary to receive packets from the internal PHY.
Without this bit set, no activity occurs on the interface.

Normally u-boot sets this bit, but if u-boot is compiled without
net support, the interface will be up but without any activity.
If bit is set once, it will work until the IP is powered down or reset.

The vendor SDK sets this bit along with the PHY_ID bits.

Signed-off-by: Da Xue <da@libre.computer>
Fixes: 9a24e1ff4326 ("net: mdio: add amlogic gxl mdio mux support")
Link: https://patch.msgid.link/20250425192009.1439508-1-da@libre.computer
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: dlink: Correct endianness handling of led_mode
Simon Horman [Fri, 25 Apr 2025 15:50:47 +0000 (16:50 +0100)] 
net: dlink: Correct endianness handling of led_mode

As it's name suggests, parse_eeprom() parses EEPROM data.

This is done by reading data, 16 bits at a time as follows:

for (i = 0; i < 128; i++)
                ((__le16 *) sromdata)[i] = cpu_to_le16(read_eeprom(np, i));

sromdata is at the same memory location as psrom.
And the type of psrom is a pointer to struct t_SROM.

As can be seen in the loop above, data is stored in sromdata, and thus psrom,
as 16-bit little-endian values.

However, the integer fields of t_SROM are host byte order integers.
And in the case of led_mode this leads to a little endian value
being incorrectly treated as host byte order.

Looking at rio_set_led_mode, this does appear to be a bug as that code
masks led_mode with 0x1, 0x2 and 0x8. Logic that would be effected by a
reversed byte order.

This problem would only manifest on big endian hosts.

Found by inspection while investigating a sparse warning
regarding the crc field of t_SROM.

I believe that warning is a false positive. And although I plan
to send a follow-up to use little-endian types for other the integer
fields of PSROM_t I do not believe that will involve any bug fixes.

Compile tested only.

Fixes: c3f45d322cbd ("dl2k: Add support for IP1000A-based cards")
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250425-dlink-led-mode-v1-1-6bae3c36e736@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agocrypto/krb5: Fix change to use SG miter to use offset
David Howells [Mon, 28 Apr 2025 10:22:06 +0000 (11:22 +0100)] 
crypto/krb5: Fix change to use SG miter to use offset

The recent patch to make the rfc3961 simplified code use sg_miter rather
than manually walking the scatterlist to hash the contents of a buffer
described by that scatterlist failed to take the starting offset into
account.

This is indicated by the selftests reporting:

    krb5: Running aes128-cts-hmac-sha256-128 mic
    krb5: !!! TESTFAIL crypto/krb5/selftest.c:446
    krb5: MIC mismatch

Fix this by calling sg_miter_skip() before doing the loop to advance
by the offset.

This only affects packet signing modes and not full encryption in RxGK
because, for full encryption, the message digest is handled inside the
authenc and krb5enc drivers.

Note: Nothing in linus/master uses the krb5lib, though the bug is there.
It is used by AF_RXRPC's RxGK implementation in -next, no need to backport.

Fixes: da6f9bf40ac2 ("crypto: krb5 - Use SG miter instead of doing it by hand")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://patch.msgid.link/3824017.1745835726@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: phylink: Drop unused defines for SUPPORTED/ADVERTISED_INTERFACES
Alexander Duyck [Sun, 27 Apr 2025 19:59:49 +0000 (12:59 -0700)] 
net: phylink: Drop unused defines for SUPPORTED/ADVERTISED_INTERFACES

The defines for SUPPORTED_INTERFACES and ADVERTISED_INTERFACES both appear
to be unused. I couldn't find anything that actually references them in the
original diff that added them and it seems like they have persisted despite
using deprecated defines that aren't supposed to be used as per the
ethtool.h header that defines the bits they are composed of.

Since they are unused, and not supposed to be used anymore I am just
dropping the lines of code since they seem to just be occupying space.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/174578398922.1580647.9720643128205980455.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge tag 'fsnotify_for_v6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Tue, 29 Apr 2025 18:23:53 +0000 (11:23 -0700)] 
Merge tag 'fsnotify_for_v6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull fsnotify fix from Jan Kara:
 "A fix for the recently merged mount notification support"

* tag 'fsnotify_for_v6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  selftests/fs/mount-notify: test also remove/flush of mntns marks
  fanotify: fix flush of mntns marks

5 months agoMerge tag 'platform-drivers-x86-v6.15-4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Tue, 29 Apr 2025 18:18:45 +0000 (11:18 -0700)] 
Merge tag 'platform-drivers-x86-v6.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform drivers fixes from Ilpo Järvinen:
 "Fixes and new HW support

   - amd/pmc: Require at least 2.5 seconds between HW sleep cycles

   - alienware-wmi-wmax:
       - Add support for Alienware m15 R7
       - Fix error handling to avoid uninitialized variable

   - asus-wmi: Disable OOBE state also on resume

   - ideapad-laptop: Support a few new buttons

   - intel/hid: Add Panther Lake support

   - intel-uncore-freq: Fix missing uncore sysfs during CPU hotplug"

* tag 'platform-drivers-x86-v6.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: ideapad-laptop: add support for some new buttons
  platform/x86: asus-wmi: Disable OOBE state after resume from hibernation
  platform/x86: alienware-wmi-wmax: Add support for Alienware m15 R7
  platform/x86/intel: hid: Add Pantherlake support
  platform/x86: alienware-wmi-wmax: Fix uninitialized variable due to bad error handling
  platform/x86/intel-uncore-freq: Fix missing uncore sysfs during CPU hotplug
  platform/x86/amd: pmc: Require at least 2.5 seconds between HW sleep cycles

5 months agoMerge tag 'fixes-2025-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt...
Linus Torvalds [Tue, 29 Apr 2025 18:10:46 +0000 (11:10 -0700)] 
Merge tag 'fixes-2025-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock

Pull memblock fixes from Mike Rapoport:
 "Fixes for nid setting in memmap_init_reserved_pages():

   - pass 'size' rather than 'end' to memblock_set_node() as that
     function expects

   - fix a corner case when memblock.reserved is doubled at
     memmap_init_reserved_pages() and the newly reserved block
     won't have nid assigned"

* tag 'fixes-2025-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  memblock tests: add test for memblock_set_node
  mm/memblock: repeat setting reserved region nid if array is doubled
  mm/memblock: pass size instead of end to memblock_set_node()

5 months agoMerge branch 'io_uring-zcrx-selftests-more-cleanups'
Jakub Kicinski [Tue, 29 Apr 2025 18:09:54 +0000 (11:09 -0700)] 
Merge branch 'io_uring-zcrx-selftests-more-cleanups'

David Wei says:

====================
io_uring/zcrx: selftests: more cleanups

Patch 1 use rand_port() instead of hard coding port 9999. Patch 2 parses
JSON from ethtool -g instead of string.
====================

Link: https://patch.msgid.link/20250426195525.1906774-1-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoio_uring/zcrx: selftests: parse json from ethtool -g
David Wei [Sat, 26 Apr 2025 19:55:25 +0000 (12:55 -0700)] 
io_uring/zcrx: selftests: parse json from ethtool -g

Parse JSON from ethtool -g instead of parsing text output.

Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250426195525.1906774-3-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoio_uring/zcrx: selftests: use rand_port()
David Wei [Sat, 26 Apr 2025 19:55:24 +0000 (12:55 -0700)] 
io_uring/zcrx: selftests: use rand_port()

Use rand_port() and stop hard coding port 9999.

Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250426195525.1906774-2-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoptp: ocp: Add const to bp->attr_group allocation type
Kees Cook [Sat, 26 Apr 2025 06:18:59 +0000 (23:18 -0700)] 
ptp: ocp: Add const to bp->attr_group allocation type

In preparation for making the kmalloc family of allocators type aware,
we need to make sure that the returned type from the allocation matches
the type of the variable being assigned. (Before, the allocator would
always return "void *", which can be implicitly cast to any pointer type.)

The assigned type is "const struct attribute_group **", but the returned
type, while technically matching, will be not const qualified. As there is
no general way to safely add const qualifiers, adjust the allocation type
to match the assignment.

Signed-off-by: Kees Cook <kees@kernel.org>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250426061858.work.470-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonfp: xsk: Adjust allocation type for nn->dp.xsk_pools
Kees Cook [Sat, 26 Apr 2025 06:08:42 +0000 (23:08 -0700)] 
nfp: xsk: Adjust allocation type for nn->dp.xsk_pools

In preparation for making the kmalloc family of allocators type aware,
we need to make sure that the returned type from the allocation matches
the type of the variable being assigned. (Before, the allocator would
always return "void *", which can be implicitly cast to any pointer type.)

The assigned type "struct xsk_buff_pool **", but the returned type will be
"struct xsk_buff_pool ***". These are the same allocation size (pointer
size), but the types don't match. Adjust the allocation type to match
the assignment.

Signed-off-by: Kees Cook <kees@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250426060841.work.016-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet/mlx4_core: Adjust allocation type for buddy->bits
Kees Cook [Sat, 26 Apr 2025 06:07:58 +0000 (23:07 -0700)] 
net/mlx4_core: Adjust allocation type for buddy->bits

In preparation for making the kmalloc family of allocators type aware,
we need to make sure that the returned type from the allocation matches
the type of the variable being assigned. (Before, the allocator would
always return "void *", which can be implicitly cast to any pointer type.)

The assigned type is "unsigned long **", but the returned type will be
"long **". These are the same size allocation (pointer size) but the
types do not match. Adjust the allocation type to match the assignment.

Signed-off-by: Kees Cook <kees@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250426060757.work.865-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agopds_core: Allocate pdsc_viftype_defaults copy with ARRAY_SIZE()
Kees Cook [Sat, 26 Apr 2025 06:07:13 +0000 (23:07 -0700)] 
pds_core: Allocate pdsc_viftype_defaults copy with ARRAY_SIZE()

In preparation for making the kmalloc family of allocators type aware,
we need to make sure that the returned type from the allocation matches
the type of the variable being assigned. (Before, the allocator would
always return "void *", which can be implicitly cast to any pointer type.)

This is allocating a copy of pdsc_viftype_defaults, which is an array of
struct pdsc_viftype. To correctly return "struct pdsc_viftype *" in the
future, adjust the allocation to allocating ARRAY_SIZE-many entries. The
resulting allocation size is the same.

Signed-off-by: Kees Cook <kees@kernel.org>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://patch.msgid.link/20250426060712.work.575-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoipv4: fib: Fix fib_info_hash_alloc() allocation type
Kees Cook [Sat, 26 Apr 2025 06:05:30 +0000 (23:05 -0700)] 
ipv4: fib: Fix fib_info_hash_alloc() allocation type

In preparation for making the kmalloc family of allocators type aware,
we need to make sure that the returned type from the allocation matches
the type of the variable being assigned. (Before, the allocator would
always return "void *", which can be implicitly cast to any pointer type.)

This was allocating many sizeof(struct hlist_head *) when it actually
wanted sizeof(struct hlist_head). Luckily these are the same size.
Adjust the allocation type to match the assignment.

Signed-off-by: Kees Cook <kees@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250426060529.work.873-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge branch 'ip-improve-tcp-sock-multipath-routing'
Paolo Abeni [Tue, 29 Apr 2025 14:22:26 +0000 (16:22 +0200)] 
Merge branch 'ip-improve-tcp-sock-multipath-routing'

Willem de Bruijn says:

====================
ip: improve tcp sock multipath routing

From: Willem de Bruijn <willemb@google.com>

Improve layer 4 multipath hash policy for local tcp connections:

patch 1: Select a source address that matches the nexthop device.
         Due to tcp_v4_connect making separate route lookups for saddr
         and route, the two can currently be inconsistent.

patch 2: Use all paths when opening multiple local tcp connections to
         the same ip address and port.

patch 3: Test the behavior. Extend the fib_tests.sh testsuite with one
         opening many connections, and count SYNs on both egress
         devices, for packets matching the source address of the dev.

Changelog in the individual patches
====================

Link: https://patch.msgid.link/20250424143549.669426-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agoselftests/net: test tcp connection load balancing
Willem de Bruijn [Thu, 24 Apr 2025 14:35:20 +0000 (10:35 -0400)] 
selftests/net: test tcp connection load balancing

Verify that TCP connections use both routes when connecting multiple
times to a remote service over a two nexthop multipath route.

Use socat to create the connections. Use tc prio + tc filter to
count routes taken, counting SYN packets across the two egress
devices. Also verify that the saddr matches that of the device.

To avoid flaky tests when testing inherently randomized behavior,
set a low bar and pass if even a single SYN is observed on each
device.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250424143549.669426-4-willemdebruijn.kernel@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agoip: load balance tcp connections to single dst addr and port
Willem de Bruijn [Thu, 24 Apr 2025 14:35:19 +0000 (10:35 -0400)] 
ip: load balance tcp connections to single dst addr and port

Load balance new TCP connections across nexthops also when they
connect to the same service at a single remote address and port.

This affects only port-based multipath hashing:
fib_multipath_hash_policy 1 or 3.

Local connections must choose both a source address and port when
connecting to a remote service, in ip_route_connect. This
"chicken-and-egg problem" (commit 2d7192d6cbab ("ipv4: Sanitize and
simplify ip_route_{connect,newports}()")) is resolved by first
selecting a source address, by looking up a route using the zero
wildcard source port and address.

As a result multiple connections to the same destination address and
port have no entropy in fib_multipath_hash.

This is not a problem when forwarding, as skb-based hashing has a
4-tuple. Nor when establishing UDP connections, as autobind there
selects a port before reaching ip_route_connect.

Load balance also TCP, by using a random port in fib_multipath_hash.
Port assignment in inet_hash_connect is not atomic with
ip_route_connect. Thus ports are unpredictable, effectively random.

Implementation details:

Do not actually pass a random fl4_sport, as that affects not only
hashing, but routing more broadly, and can match a source port based
policy route, which existing wildcard port 0 will not. Instead,
define a new wildcard flowi flag that is used only for hashing.

Selecting a random source is equivalent to just selecting a random
hash entirely. But for code clarity, follow the normal 4-tuple hash
process and only update this field.

fib_multipath_hash can be reached with zero sport from other code
paths, so explicitly pass this flowi flag, rather than trying to infer
this case in the function itself.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250424143549.669426-3-willemdebruijn.kernel@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agoipv4: prefer multipath nexthop that matches source address
Willem de Bruijn [Thu, 24 Apr 2025 14:35:18 +0000 (10:35 -0400)] 
ipv4: prefer multipath nexthop that matches source address

With multipath routes, try to ensure that packets leave on the device
that is associated with the source address.

Avoid the following tcpdump example:

    veth0 Out IP 10.1.0.2.38640 > 10.2.0.3.8000: Flags [S]
    veth1 Out IP 10.1.0.2.38648 > 10.2.0.3.8000: Flags [S]

Which can happen easily with the most straightforward setup:

    ip addr add 10.0.0.1/24 dev veth0
    ip addr add 10.1.0.1/24 dev veth1

    ip route add 10.2.0.3 nexthop via 10.0.0.2 dev veth0 \
       nexthop via 10.1.0.2 dev veth1

This is apparently considered WAI, based on the comment in
ip_route_output_key_hash_rcu:

    * 2. Moreover, we are allowed to send packets with saddr
    *    of another iface. --ANK

It may be ok for some uses of multipath, but not all. For instance,
when using two ISPs, a router may drop packets with unknown source.

The behavior occurs because tcp_v4_connect makes three route
lookups when establishing a connection:

1. ip_route_connect calls to select a source address, with saddr zero.
2. ip_route_connect calls again now that saddr and daddr are known.
3. ip_route_newports calls again after a source port is also chosen.

With a route with multiple nexthops, each lookup may make a different
choice depending on available entropy to fib_select_multipath. So it
is possible for 1 to select the saddr from the first entry, but 3 to
select the second entry. Leading to the above situation.

Address this by preferring a match that matches the flowi4 saddr. This
will make 2 and 3 make the same choice as 1. Continue to update the
backup choice until a choice that matches saddr is found.

Do this in fib_select_multipath itself, rather than passing an fl4_oif
constraint, to avoid changing non-multipath route selection. Commit
e6b45241c57a ("ipv4: reset flowi parameters on route connect") shows
how that may cause regressions.

Also read ipv4.sysctl_fib_multipath_use_neigh only once. No need to
refresh in the loop.

This does not happen in IPv6, which performs only one lookup.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250424143549.669426-2-willemdebruijn.kernel@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agonet: ti: icssg-prueth: Add ICSSG FW Stats
MD Danish Anwar [Thu, 24 Apr 2025 09:53:16 +0000 (15:23 +0530)] 
net: ti: icssg-prueth: Add ICSSG FW Stats

The ICSSG firmware maintains set of stats called PA_STATS.
Currently the driver only dumps 4 stats. Add support for dumping more
stats.

The offset for different stats are defined as MACROs in icssg_switch_map.h
file. All the offsets are for Slice0. Slice1 offsets are slice0 + 4.
The offset calculation is taken care while reading the stats in
emac_update_hardware_stats().

The statistics are documented in
Documentation/networking/device_drivers/icssg_prueth.rst

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Link: https://patch.msgid.link/20250424095316.2643573-1-danishanwar@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agotools/Makefile: Add ynl target
Joe Damato [Wed, 23 Apr 2025 20:46:44 +0000 (20:46 +0000)] 
tools/Makefile: Add ynl target

Add targets to build, clean, and install ynl headers, libynl.a, and
python tooling.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250423204647.190784-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agortase: Modify the format specifier in snprintf to %u
Justin Lai [Fri, 25 Apr 2025 06:40:57 +0000 (14:40 +0800)] 
rtase: Modify the format specifier in snprintf to %u

Modify the format specifier in snprintf to %u.

Signed-off-by: Justin Lai <justinlai0215@realtek.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250425064057.30035-1-justinlai0215@realtek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge tag 'v6.15-rc4-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Linus Torvalds [Mon, 28 Apr 2025 23:56:01 +0000 (16:56 -0700)] 
Merge tag 'v6.15-rc4-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

 - Fix three potential use after frees: in session logoff, in krb5 auth,
   and in RPC open

 - Fix missing rc check in session setup authentication

* tag 'v6.15-rc4-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: fix use-after-free in session logoff
  ksmbd: fix use-after-free in kerberos authentication
  ksmbd: fix use-after-free in ksmbd_session_rpc_open
  smb: server: smb2pdu: check return value of xa_store()

5 months agoMerge branch 'phase-out-hybrid-pci-devres-api'
Jakub Kicinski [Mon, 28 Apr 2025 23:19:19 +0000 (16:19 -0700)] 
Merge branch 'phase-out-hybrid-pci-devres-api'

Philipp Stanner says:

====================
Phase out hybrid PCI devres API

Fixes a number of minor issues with the usage of the PCI API in net.
Notbaly, it replaces calls to the sometimes-managed
pci_request_regions() to the always-managed pcim_request_all_regions(),
enabling us to remove that hybrid functionality from PCI.
====================

Link: https://patch.msgid.link/20250425085740.65304-2-phasta@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: thunder_bgx: Don't disable PCI device manually
Philipp Stanner [Fri, 25 Apr 2025 08:57:41 +0000 (10:57 +0200)] 
net: thunder_bgx: Don't disable PCI device manually

thunder_bgx's PCI device is enabled with pcim_enable_device(), a managed
devres function which ensures that the device gets enabled on driver
detach automatically.

Remove the calls to pci_disable_device().

Signed-off-by: Philipp Stanner <phasta@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250425085740.65304-10-phasta@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: thunder_bgx: Use pure PCI devres API
Philipp Stanner [Fri, 25 Apr 2025 08:57:40 +0000 (10:57 +0200)] 
net: thunder_bgx: Use pure PCI devres API

The currently used function pci_request_regions() is one of the
problematic "hybrid devres" PCI functions, which are sometimes managed
through devres, and sometimes not (depending on whether
pci_enable_device() or pcim_enable_device() has been called before).

The PCI subsystem wants to remove this behavior and, therefore, needs to
port all users to functions that don't have this problem.

Furthermore, the PCI function being managed implies that it's not
necessary to call pci_release_regions() manually.

Remove the calls to pci_release_regions().

Replace pci_request_regions() with pcim_request_all_regions().

Signed-off-by: Philipp Stanner <phasta@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250425085740.65304-9-phasta@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: mdio: thunder: Use pure PCI devres API
Philipp Stanner [Fri, 25 Apr 2025 08:57:39 +0000 (10:57 +0200)] 
net: mdio: thunder: Use pure PCI devres API

The currently used function pci_request_regions() is one of the
problematic "hybrid devres" PCI functions, which are sometimes managed
through devres, and sometimes not (depending on whether
pci_enable_device() or pcim_enable_device() has been called before).

The PCI subsystem wants to remove this behavior and, therefore, needs to
port all users to functions that don't have this problem.

Furthermore, the PCI function being managed implies that it's not
necessary to call pci_release_regions() manually.

Remove the calls to pci_release_regions().

Replace pci_request_regions() with pcim_request_all_regions().

Signed-off-by: Philipp Stanner <phasta@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250425085740.65304-8-phasta@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: ethernet: sis900: Use pure PCI devres API
Philipp Stanner [Fri, 25 Apr 2025 08:57:38 +0000 (10:57 +0200)] 
net: ethernet: sis900: Use pure PCI devres API

The currently used function pci_request_regions() is one of the
problematic "hybrid devres" PCI functions, which are sometimes managed
through devres, and sometimes not (depending on whether
pci_enable_device() or pcim_enable_device() has been called before).

The PCI subsystem wants to remove this behavior and, therefore, needs to
port all users to functions that don't have this problem.

Replace pci_request_regions() with pcim_request_all_regions().

Signed-off-by: Philipp Stanner <phasta@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Daniele Venzano <venza@brownhat.org>
Link: https://patch.msgid.link/20250425085740.65304-7-phasta@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: ethernet: natsemi: Use pure PCI devres API
Philipp Stanner [Fri, 25 Apr 2025 08:57:37 +0000 (10:57 +0200)] 
net: ethernet: natsemi: Use pure PCI devres API

The currently used function pci_request_regions() is one of the
problematic "hybrid devres" PCI functions, which are sometimes managed
through devres, and sometimes not (depending on whether
pci_enable_device() or pcim_enable_device() has been called before).

The PCI subsystem wants to remove this behavior and, therefore, needs to
port all users to functions that don't have this problem.

Replace pci_request_regions() with pcim_request_all_regions().

Signed-off-by: Philipp Stanner <phasta@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250425085740.65304-6-phasta@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>