]> git.ipfire.org Git - thirdparty/linux.git/log
thirdparty/linux.git
2 weeks agonet: phy: add support for disabling PHY-autonomous EEE
Nicolai Buchwitz [Mon, 6 Apr 2026 07:13:07 +0000 (09:13 +0200)] 
net: phy: add support for disabling PHY-autonomous EEE

Some PHYs (e.g. Broadcom BCM54xx, Realtek RTL8211F) implement
autonomous EEE where the PHY manages LPI signaling without forwarding
it to the MAC. This conflicts with MAC drivers that implement their own
LPI control.

Add a .disable_autonomous_eee callback to struct phy_driver and call it
from phy_support_eee(). When a MAC driver indicates it supports EEE via
phy_support_eee(), the PHY's autonomous EEE is automatically disabled so
the MAC can manage LPI entry/exit.

Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
Link: https://patch.msgid.link/20260406-devel-autonomous-eee-v1-1-b335e7143711@tipi-net.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: airoha: Add missing RX_CPU_IDX() configuration in airoha_qdma_cleanup_rx_queue()
Lorenzo Bianconi [Wed, 8 Apr 2026 18:26:56 +0000 (20:26 +0200)] 
net: airoha: Add missing RX_CPU_IDX() configuration in airoha_qdma_cleanup_rx_queue()

When the descriptor index written in REG_RX_CPU_IDX() is equal to the one
stored in REG_RX_DMA_IDX(), the hw will stop since the QDMA RX ring is
empty.
Add missing REG_RX_CPU_IDX() configuration in airoha_qdma_cleanup_rx_queue
routine during QDMA RX ring cleanup.

Fixes: 514aac359987 ("net: airoha: Add missing cleanup bits in airoha_qdma_cleanup_rx_queue()")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260408-airoha-cpu-idx-airoha_qdma_cleanup_rx_queue-v1-1-8efa64844308@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoMerge branch 'ynl-ethtool-netlink-fix-nla_len-overflow-for-large-string-sets'
Jakub Kicinski [Sun, 12 Apr 2026 18:23:52 +0000 (11:23 -0700)] 
Merge branch 'ynl-ethtool-netlink-fix-nla_len-overflow-for-large-string-sets'

Hangbin Liu says:

====================
ynl/ethtool/netlink: fix nla_len overflow for large string sets

This series addresses a silent data corruption issue triggered when ynl
retrieves string sets from NICs with a large number of statistics entries
(e.g. mlx5_core with thousands of ETH_SS_STATS strings).

The root cause is that struct nlattr.nla_len is a __u16 (max 65535
bytes). When a NIC exports enough statistics strings, the
ETHTOOL_A_STRINGSET_STRINGS nest built by strset_fill_set() exceeds
this limit. nla_nest_end() silently truncates the length on assignment,
producing a corrupted netlink message.

Patch 1 moves ethtool.py to selftest.

Patch 2 improves the ethtool tool: rename the doit/dumpit helpers
to do_set/do_get and convert do_get to use ynl.do() with an
explicit device header instead of a full dump with client-side filtering.

Patch 3 adds a --dbg-small-recv option to the YNL ethtool tool,
matching the same option already present in cli.py, to help debug netlink
message size issues

Patch 4 adds a new helper nla_nest_end_safe() to check whether the nla_len
is overflow and return -EMSGSIZE early if so.

Patch 5 uses the new helper in ethtool to make sure the ethtool doesn't
reply a corrupted netlink message.
====================

Link: https://patch.msgid.link/20260408-b4-ynl_ethtool-v2-0-7623a5e8f70b@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoethtool: strset: check nla_len overflow
Hangbin Liu [Wed, 8 Apr 2026 07:08:53 +0000 (15:08 +0800)] 
ethtool: strset: check nla_len overflow

The netlink attribute length field nla_len is a __u16, which can only
represent values up to 65535 bytes. NICs with a large number of
statistics strings (e.g. mlx5_core with thousands of ETH_SS_STATS
entries) can produce a ETHTOOL_A_STRINGSET_STRINGS nest that exceeds
this limit.

When nla_nest_end() writes the actual nest size back to nla_len, the
value is silently truncated. This results in a corrupted netlink message
being sent to userspace: the parser reads a wrong (truncated) attribute
length and misaligns all subsequent attribute boundaries, causing decode
errors.

Fix this by using the new helper nla_nest_end_safe and error out if
the size exceeds U16_MAX.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20260408-b4-ynl_ethtool-v2-5-7623a5e8f70b@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonetlink: add a nla_nest_end_safe() helper
Hangbin Liu [Wed, 8 Apr 2026 07:08:52 +0000 (15:08 +0800)] 
netlink: add a nla_nest_end_safe() helper

The nla_len field in struct nlattr is a __u16, which can only hold
values up to 65535. If a nested attribute grows beyond this limit,
nla_nest_end() silently truncates the length, producing a corrupted
netlink message with no indication of the problem.

Since nla_nest_end() is used everywhere and this issue rarely happens,
let's add a new helper to check the length.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20260408-b4-ynl_ethtool-v2-4-7623a5e8f70b@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agotools: ynl: ethtool: add --dbg-small-recv option
Hangbin Liu [Wed, 8 Apr 2026 07:08:51 +0000 (15:08 +0800)] 
tools: ynl: ethtool: add --dbg-small-recv option

Add a --dbg-small-recv debug option to control the recv() buffer size
used by YNL, matching the same option already present in cli.py. This
is useful if user need to get large netlink message.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20260408-b4-ynl_ethtool-v2-3-7623a5e8f70b@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agotools: ynl: ethtool: use doit instead of dumpit for per-device GET
Hangbin Liu [Wed, 8 Apr 2026 07:08:50 +0000 (15:08 +0800)] 
tools: ynl: ethtool: use doit instead of dumpit for per-device GET

Rename the local helper doit() to do_set() and dumpit() to do_get() to
better reflect their purpose.

Convert do_get() to use ynl.do() with an explicit device header instead
of ynl.dump() followed by client-side filtering. This is more efficient
as the kernel only processes and returns data for the requested device,
rather than dumping all devices across the netns.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20260408-b4-ynl_ethtool-v2-2-7623a5e8f70b@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agotools: ynl: move ethtool.py to selftest
Hangbin Liu [Wed, 8 Apr 2026 07:08:49 +0000 (15:08 +0800)] 
tools: ynl: move ethtool.py to selftest

We have converted all the samples to selftests. This script is
the last piece of random "PoC" code we still have lying around.
Let's move it to tests.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20260408-b4-ynl_ethtool-v2-1-7623a5e8f70b@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoMerge branch 'net-mana-fix-debugfs-directory-naming-and-file-lifecycle'
Jakub Kicinski [Sun, 12 Apr 2026 18:22:56 +0000 (11:22 -0700)] 
Merge branch 'net-mana-fix-debugfs-directory-naming-and-file-lifecycle'

Erni Sri Satya Vennela says:

====================
net: mana: Fix debugfs directory naming and file lifecycle

This series fixes two pre-existing debugfs issues in the MANA driver.

Patch 1 fixes the per-device debugfs directory naming to use the unique
PCI BDF address via pci_name(), avoiding a potential NULL pointer
dereference when pdev->slot is NULL (e.g. VFIO passthrough, nested KVM)
and preventing name collisions across multiple PFs or VFs.

Patch 2 moves the current_speed debugfs file creation from
mana_probe_port() to mana_init_port() so it survives detach/attach
cycles triggered by MTU changes or XDP program changes.
====================

Link: https://patch.msgid.link/20260408081224.302308-1-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: mana: Move current_speed debugfs file to mana_init_port()
Erni Sri Satya Vennela [Wed, 8 Apr 2026 08:12:20 +0000 (01:12 -0700)] 
net: mana: Move current_speed debugfs file to mana_init_port()

Move the current_speed debugfs file creation from mana_probe_port() to
mana_init_port(). The file was previously created only during initial
probe, but mana_cleanup_port_context() removes the entire vPort debugfs
directory during detach/attach cycles. Since mana_init_port() recreates
the directory on re-attach, moving current_speed here ensures it survives
these cycles.

Fixes: 75cabb46935b ("net: mana: Add support for net_shaper_ops")
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260408081224.302308-3-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: mana: Use pci_name() for debugfs directory naming
Erni Sri Satya Vennela [Wed, 8 Apr 2026 08:12:19 +0000 (01:12 -0700)] 
net: mana: Use pci_name() for debugfs directory naming

Use pci_name(pdev) for the per-device debugfs directory instead of
hardcoded "0" for PFs and pci_slot_name(pdev->slot) for VFs. The
previous approach had two issues:

1. pci_slot_name() dereferences pdev->slot, which can be NULL for VFs
   in environments like generic VFIO passthrough or nested KVM,
   causing a NULL pointer dereference.

2. Multiple PFs would all use "0", and VFs across different PCI
   domains or buses could share the same slot name, leading to
   -EEXIST errors from debugfs_create_dir().

pci_name(pdev) returns the unique BDF address, is always valid, and is
unique across the system.

Fixes: 6607c17c6c5e ("net: mana: Enable debugfs files for MANA device")
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260408081224.302308-2-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonfc: llcp: add missing return after LLCP_CLOSED checks
Junxi Qian [Wed, 8 Apr 2026 08:10:06 +0000 (16:10 +0800)] 
nfc: llcp: add missing return after LLCP_CLOSED checks

In nfc_llcp_recv_hdlc() and nfc_llcp_recv_disc(), when the socket
state is LLCP_CLOSED, the code correctly calls release_sock() and
nfc_llcp_sock_put() but fails to return. Execution falls through to
the remainder of the function, which calls release_sock() and
nfc_llcp_sock_put() again. This results in a double release_sock()
and a refcount underflow via double nfc_llcp_sock_put(), leading to
a use-after-free.

Add the missing return statements after the LLCP_CLOSED branches
in both functions to prevent the fall-through.

Fixes: d646960f7986 ("NFC: Initial LLCP support")
Signed-off-by: Junxi Qian <qjx1298677004@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260408081006.3723-1-qjx1298677004@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoMerge branch 'bng_en-add-link-management-and-statistics-support'
Jakub Kicinski [Sun, 12 Apr 2026 18:09:39 +0000 (11:09 -0700)] 
Merge branch 'bng_en-add-link-management-and-statistics-support'

Bhargava Marreddy says:

====================
bng_en: add link management and statistics support

This series enhances the bng_en driver by adding:
1. Link/PHY support
   a. Link query
   b. Async Link events
   c. Ethtool link set/get functionality
2. Hardware statistics reporting via ethtool -S

This version incorporates feedback received prior to splitting the
original series into two parts.
====================

Link: https://patch.msgid.link/20260406180420.279470-1-bhargava.marreddy@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agobng_en: add support for ethtool -S stats display
Bhargava Marreddy [Mon, 6 Apr 2026 18:04:20 +0000 (23:34 +0530)] 
bng_en: add support for ethtool -S stats display

Implement the legacy ethtool statistics interface (get_sset_count,
get_strings, get_ethtool_stats) to expose hardware counters not
available through standard kernel stats APIs.

Ex:
a) Per-queue ring stats
     rxq0_ucast_packets: 2
     rxq0_mcast_packets: 0
     rxq0_bcast_packets: 15
     rxq0_ucast_bytes: 120
     rxq0_mcast_bytes: 0
     rxq0_bcast_bytes: 900
     txq0_ucast_packets: 0
     txq0_mcast_packets: 0
     txq0_bcast_packets: 0
     txq0_ucast_bytes: 0
     txq0_mcast_bytes: 0
     txq0_bcast_bytes: 0

b) Per-queue TPA(LRO/GRO) stats
     rxq4_tpa_eligible_pkt: 0
     rxq4_tpa_eligible_bytes: 0
     rxq4_tpa_pkt: 0
     rxq4_tpa_bytes: 0
     rxq4_tpa_errors: 0
     rxq4_tpa_events: 0

c) Port level stats
     rxp_good_vlan_frames: 0
     rxp_mtu_err_frames: 0
     rxp_tagged_frames: 0
     rxp_double_tagged_frames: 0
     rxp_pfc_ena_frames_pri0: 0
     rxp_pfc_ena_frames_pri1: 0
     rxp_pfc_ena_frames_pri2: 0
     rxp_pfc_ena_frames_pri3: 0
     rxp_pfc_ena_frames_pri4: 0
     rxp_pfc_ena_frames_pri5: 0
     rxp_pfc_ena_frames_pri6: 0
     rxp_pfc_ena_frames_pri7: 0
     rxp_eee_lpi_events: 0
     rxp_eee_lpi_duration: 0
     rxp_runt_bytes: 0
     rxp_runt_frames: 0
     txp_good_vlan_frames: 0
     txp_jabber_frames: 0
     txp_fcs_err_frames: 0
     txp_pfc_ena_frames_pri0: 0
     txp_pfc_ena_frames_pri1: 0
     txp_pfc_ena_frames_pri2: 0
     txp_pfc_ena_frames_pri3: 0
     txp_pfc_ena_frames_pri4: 0
     txp_pfc_ena_frames_pri5: 0
     txp_pfc_ena_frames_pri6: 0
     txp_pfc_ena_frames_pri7: 0
     txp_eee_lpi_events: 0
     txp_eee_lpi_duration: 0
     txp_xthol_frames: 0

d) Per-priority stats
     rx_bytes_pri0: 4182650
     rx_bytes_pri1: 4182650
     rx_bytes_pri2: 4182650
     rx_bytes_pri3: 4182650
     rx_bytes_pri4: 4182650
     rx_bytes_pri5: 4182650
     rx_bytes_pri6: 4182650
     rx_bytes_pri7: 4182650

Signed-off-by: Bhargava Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Link: https://patch.msgid.link/20260406180420.279470-11-bhargava.marreddy@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agobng_en: implement netdev_stat_ops
Bhargava Marreddy [Mon, 6 Apr 2026 18:04:19 +0000 (23:34 +0530)] 
bng_en: implement netdev_stat_ops

Implement netdev_stat_ops to provide standardized per-queue
statistics via the Netlink API.

Below is the description of the hardware drop counters:

rx-hw-drop-overruns: Packets dropped by HW due to resource limitations
(e.g., no BDs available in the host ring).
rx-hw-drops: Total packets dropped by HW (sum of overruns and error
drops).
tx-hw-drop-errors: Packets dropped by HW because they were invalid or
malformed.
tx-hw-drops: Total packets dropped by HW (sum of resource limitations
and error drops).

The implementation was verified using the ynl tool:

./tools/net/ynl/pyynl/cli.py --spec \
Documentation/netlink/specs/netdev.yaml --dump qstats-get --json \
'{"ifindex":14, "scope":"queue"}'

[{'ifindex': 14, 'queue-id': 0, 'queue-type': 'rx', 'rx-bytes': 758,
'rx-hw-drop-overruns': 0, 'rx-hw-drops': 0, 'rx-packets': 11},
 {'ifindex': 14, 'queue-id': 1, 'queue-type': 'rx', 'rx-bytes': 0,
'rx-hw-drop-overruns': 0, 'rx-hw-drops': 0, 'rx-packets': 0},
{'ifindex': 14, 'queue-id': 0, 'queue-type': 'tx', 'tx-bytes': 0,
'tx-hw-drop-errors': 0, 'tx-hw-drops': 0, 'tx-packets': 0},
 {'ifindex': 14, 'queue-id': 1, 'queue-type': 'tx', 'tx-bytes': 0,
'tx-hw-drop-errors': 0, 'tx-hw-drops': 0, 'tx-packets': 0},
 {'ifindex': 14, 'queue-id': 2, 'queue-type': 'tx', 'tx-bytes': 810,
'tx-hw-drop-errors': 0, 'tx-hw-drops': 0, 'tx-packets': 10},]

Signed-off-by: Bhargava Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Link: https://patch.msgid.link/20260406180420.279470-10-bhargava.marreddy@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agobng_en: implement ndo_get_stats64
Bhargava Marreddy [Mon, 6 Apr 2026 18:04:18 +0000 (23:34 +0530)] 
bng_en: implement ndo_get_stats64

Implement the ndo_get_stats64 callback to report aggregate network
statistics. The driver gathers these by accumulating the per-ring
counters into the provided rtnl_link_stats64 structure.

Signed-off-by: Bhargava Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Link: https://patch.msgid.link/20260406180420.279470-9-bhargava.marreddy@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agobng_en: periodically fetch and accumulate hardware statistics
Bhargava Marreddy [Mon, 6 Apr 2026 18:04:17 +0000 (23:34 +0530)] 
bng_en: periodically fetch and accumulate hardware statistics

Use the timer to schedule periodic stats collection via
the workqueue when the link is up. Fetch fresh counters from
hardware via DMA and accumulate them into 64-bit software
shadows, handling wrap-around for counters narrower than
64 bits.

Signed-off-by: Bhargava Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Rahul Gupta <rahul-rg.gupta@broadcom.com>
Reviewed-by: Ajit Kumar Khaparde <ajit.khaparde@broadcom.com>
Link: https://patch.msgid.link/20260406180420.279470-8-bhargava.marreddy@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agobng_en: add HW stats infra and structured ethtool ops
Bhargava Marreddy [Mon, 6 Apr 2026 18:04:16 +0000 (23:34 +0530)] 
bng_en: add HW stats infra and structured ethtool ops

Implement the hardware-level statistics foundation and modern structured
ethtool operations.

1. Infrastructure: Add HWRM firmware wrappers (FUNC_QSTATS_EXT,
   PORT_QSTATS_EXT, and PORT_QSTATS) to query ring and port counters.
2. Structured ops: Implement .get_eth_phy_stats, .get_eth_mac_stats,
   .get_eth_ctrl_stats, .get_pause_stats, and .get_rmon_stats.

Stats are initially reported as 0; accumulation logic is added
in a subsequent patch.

Signed-off-by: Bhargava Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Link: https://patch.msgid.link/20260406180420.279470-7-bhargava.marreddy@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agobng_en: add support for link async events
Bhargava Marreddy [Mon, 6 Apr 2026 18:04:15 +0000 (23:34 +0530)] 
bng_en: add support for link async events

Register for firmware asynchronous events, including link-status,
link-speed, and PHY configuration changes. Upon event reception,
re-query the PHY and update ethtool settings accordingly.

Signed-off-by: Bhargava Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Rajashekar Hudumula <rajashekar.hudumula@broadcom.com>
Reviewed-by: Ajit Kumar Khaparde <ajit.khaparde@broadcom.com>
Link: https://patch.msgid.link/20260406180420.279470-6-bhargava.marreddy@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agobng_en: implement ethtool pauseparam operations
Bhargava Marreddy [Mon, 6 Apr 2026 18:04:14 +0000 (23:34 +0530)] 
bng_en: implement ethtool pauseparam operations

Implement .get_pauseparam and .set_pauseparam to support flow control
configuration. This allows reporting and setting of autoneg, RX pause,
and TX pause states.

Signed-off-by: Bhargava Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Link: https://patch.msgid.link/20260406180420.279470-5-bhargava.marreddy@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agobng_en: add ethtool link settings, get_link, and nway_reset
Bhargava Marreddy [Mon, 6 Apr 2026 18:04:13 +0000 (23:34 +0530)] 
bng_en: add ethtool link settings, get_link, and nway_reset

Add get/set_link_ksettings, get_link, and nway_reset support.
Report supported, advertised, and link-partner speeds across NRZ,
PAM4, and PAM4-112 signaling modes. Enable lane count reporting.

Signed-off-by: Bhargava Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Rajashekar Hudumula <rajashekar.hudumula@broadcom.com>
Reviewed-by: Ajit Kumar Khaparde <ajit.khaparde@broadcom.com>
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Link: https://patch.msgid.link/20260406180420.279470-4-bhargava.marreddy@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agobng_en: query PHY capabilities and report link status
Bhargava Marreddy [Mon, 6 Apr 2026 18:04:12 +0000 (23:34 +0530)] 
bng_en: query PHY capabilities and report link status

Query PHY capabilities and supported speeds from firmware,
retrieve current link state (speed, duplex, pause, FEC),
and log the information. Seed initial link state during probe.

Signed-off-by: Bhargava Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Rajashekar Hudumula <rajashekar.hudumula@broadcom.com>
Reviewed-by: Ajit Kumar Khaparde <ajit.khaparde@broadcom.com>
Link: https://patch.msgid.link/20260406180420.279470-3-bhargava.marreddy@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agobng_en: add per-PF workqueue, timer, and slow-path task
Bhargava Marreddy [Mon, 6 Apr 2026 18:04:11 +0000 (23:34 +0530)] 
bng_en: add per-PF workqueue, timer, and slow-path task

Add a dedicated single-thread workqueue and a timer for each PF
to drive deferred slow-path work such as link event handling and
stats collection. The timer is stopped via timer_delete_sync()
when interrupts are disabled and restarted on open.

While the close path stops the timer to prevent new tasks from
being scheduled, the sp_task and workqueue are preserved to
maintain state continuity. Final draining and destruction of
the workqueue are handled during PCI remove.

Signed-off-by: Bhargava Marreddy <bhargava.marreddy@broadcom.com>
Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Ajit Kumar Khaparde <ajit.khaparde@broadcom.com>
Link: https://patch.msgid.link/20260406180420.279470-2-bhargava.marreddy@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoata: libata-scsi: fix requeue of deferred ATA PASS-THROUGH commands
Igor Pylypiv [Sun, 12 Apr 2026 15:36:37 +0000 (08:36 -0700)] 
ata: libata-scsi: fix requeue of deferred ATA PASS-THROUGH commands

Commit 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation")
introduced ata_scsi_requeue_deferred_qc() to handle commands deferred
during resets or NCQ failures. This deferral logic completed commands
with DID_SOFT_ERROR to trigger a retry in the SCSI mid-layer.

However, DID_SOFT_ERROR is subject to scsi_cmd_retry_allowed() checks.
ATA PASS-THROUGH commands sent via SG_IO ioctl have scmd->allowed set
to zero. This causes the mid-layer to fail the command immediately
instead of retrying, even though the command was never actually issued
to the hardware.

Switch to DID_REQUEUE to ensure these commands are inserted back into
the request queue regardless of retry limits.

Fixes: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation")
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
2 weeks agoMerge branch 'add-tso-map-once-dma-helpers-and-bnxt-sw-uso-support'
Jakub Kicinski [Sun, 12 Apr 2026 17:54:35 +0000 (10:54 -0700)] 
Merge branch 'add-tso-map-once-dma-helpers-and-bnxt-sw-uso-support'

Joe Damato says:

====================
Add TSO map-once DMA helpers and bnxt SW USO support

Greetings:

This series extends net/tso to add a data structure and some helpers allowing
drivers to DMA map headers and packet payloads a single time. The helpers can
then be used to reference slices of shared mapping for each segment. This
helps to avoid the cost of repeated DMA mappings, especially on systems which
use an IOMMU. N per-packet DMA maps are replaced with a single map for the
entire GSO skb. As of v3, the series uses the DMA IOVA API (as suggested by
Leon [1]) and provides a fallback path when an IOMMU is not in use. The DMA
IOVA API provides even better efficiency than the v2; see below.

The added helpers are then used in bnxt to add support for software UDP
Segmentation Offloading (SW USO) for older bnxt devices which do not have
support for USO in hardware. Since the helpers are generic, other drivers
can be extended similarly.

The v2 showed a ~4x reduction in DMA mapping calls at the same wire packet
rate on production traffic with a bnxt device. The v3, however, shows a larger
reduction of about ~6x at the same wire packet rate. This is thanks to Leon's
suggestion of using the DMA IOVA API [1].

Special care is taken to make bnxt ethtool operations work correctly: the ring
size cannot be reduced below a minimum threshold while USO is enabled and
growing the ring automatically re-enables USO if it was previously blocked.

This v10 contains some cosmetic changes (wrapping long lines), moves the test
to the correct directory, and attempts to fix the slot availability check
added in the v9.

I re-ran the python test and the test passed on my bnxt system. I also ran
this on a production system.
====================

Link: https://patch.msgid.link/20260408230607.2019402-1-joe@dama.to
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoselftests: drv-net: Add USO test
Joe Damato [Wed, 8 Apr 2026 23:05:59 +0000 (16:05 -0700)] 
selftests: drv-net: Add USO test

Add a simple test for USO. Tests both ipv4 and ipv6 with several full
segments and a partial segment.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260408230607.2019402-11-joe@dama.to
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: bnxt: Dispatch to SW USO
Joe Damato [Wed, 8 Apr 2026 23:05:58 +0000 (16:05 -0700)] 
net: bnxt: Dispatch to SW USO

Wire in the SW USO path added in preceding commits when hardware USO is
not possible.

When a GSO skb with SKB_GSO_UDP_L4 arrives and the NIC lacks HW USO
capability, redirect to bnxt_sw_udp_gso_xmit() which handles software
segmentation into individual UDP frames submitted directly to the TX
ring.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260408230607.2019402-10-joe@dama.to
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: bnxt: Add SW GSO completion and teardown support
Joe Damato [Wed, 8 Apr 2026 23:05:57 +0000 (16:05 -0700)] 
net: bnxt: Add SW GSO completion and teardown support

Update __bnxt_tx_int and bnxt_free_one_tx_ring_skbs to handle SW GSO
segments:

- MID segments: adjust tx_pkts/tx_bytes accounting and skip skb free
  (the skb is shared across all segments and freed only once)

- LAST segments: call tso_dma_map_complete() to tear down the IOVA
  mapping if one was used. On the fallback path, payload DMA unmapping
  is handled by the existing per-BD dma_unmap_len walk.

Both MID and LAST completions advance tx_inline_cons to release the
segment's inline header slot back to the ring.

is_sw_gso is initialized to zero, so the new code paths are not run.

Add logic for feature advertisement and guardrails for ring sizing.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260408230607.2019402-9-joe@dama.to
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: bnxt: Implement software USO
Joe Damato [Wed, 8 Apr 2026 23:05:56 +0000 (16:05 -0700)] 
net: bnxt: Implement software USO

Implement bnxt_sw_udp_gso_xmit() using the core tso_dma_map API and
the pre-allocated TX inline buffer for per-segment headers.

The xmit path:
1. Calls tso_start() to initialize TSO state
2. Stack-allocates a tso_dma_map and calls tso_dma_map_init() to
   DMA-map the linear payload and all frags upfront.
3. For each segment:
   - Copies and patches headers via tso_build_hdr() into the
     pre-allocated tx_inline_buf (DMA-synced per segment)
   - Counts payload BDs via tso_dma_map_count()
   - Emits long BD (header) + ext BD + payload BDs
   - Payload BDs use tso_dma_map_next() which yields (dma_addr,
     chunk_len, mapping_len) tuples.

Header BDs set dma_unmap_len=0 since the inline buffer is pre-allocated
and unmapped only at ring teardown.

Completion state is updated by calling tso_dma_map_completion_save() for
the last segment.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260408230607.2019402-8-joe@dama.to
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: bnxt: Add boilerplate GSO code
Joe Damato [Wed, 8 Apr 2026 23:05:55 +0000 (16:05 -0700)] 
net: bnxt: Add boilerplate GSO code

Add bnxt_gso.c and bnxt_gso.h with a stub bnxt_sw_udp_gso_xmit()
function, SW USO constants (BNXT_SW_USO_MAX_SEGS,
BNXT_SW_USO_MAX_DESCS), and the is_sw_gso field in bnxt_sw_tx_bd
with BNXT_SW_GSO_MID/LAST markers.

The full SW USO implementation will be added in a future commit.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260408230607.2019402-7-joe@dama.to
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: bnxt: Add TX inline buffer infrastructure
Joe Damato [Wed, 8 Apr 2026 23:05:54 +0000 (16:05 -0700)] 
net: bnxt: Add TX inline buffer infrastructure

Add per-ring pre-allocated inline buffer fields (tx_inline_buf,
tx_inline_dma, tx_inline_size) to bnxt_tx_ring_info and helpers to
allocate and free them. A producer and consumer (tx_inline_prod,
tx_inline_cons) are added to track which slot(s) of the inline buffer
are in-use.

The inline buffer will be used by the SW USO path for pre-allocated,
pre-DMA-mapped per-segment header copies. In the future, this
could be extended to support TX copybreak.

Allocation helper is marked __maybe_unused in this commit because it
will be wired in later.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260408230607.2019402-6-joe@dama.to
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: bnxt: Use dma_unmap_len for TX completion unmapping
Joe Damato [Wed, 8 Apr 2026 23:05:53 +0000 (16:05 -0700)] 
net: bnxt: Use dma_unmap_len for TX completion unmapping

Store the DMA mapping length in each TX buffer descriptor via
dma_unmap_len_set at submit time, and use dma_unmap_len at completion
time.

This is a no-op for normal packets but prepares for software USO,
where header BDs set dma_unmap_len to 0 because the header buffer
is unmapped collectively rather than per-segment.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260408230607.2019402-5-joe@dama.to
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: bnxt: Add a helper for tx_bd_ext
Joe Damato [Wed, 8 Apr 2026 23:05:52 +0000 (16:05 -0700)] 
net: bnxt: Add a helper for tx_bd_ext

Factor out some code to setup tx_bd_exts into a helper function. This
helper will be used by SW USO implementation in the following commits.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260408230607.2019402-4-joe@dama.to
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: bnxt: Export bnxt_xmit_get_cfa_action
Joe Damato [Wed, 8 Apr 2026 23:05:51 +0000 (16:05 -0700)] 
net: bnxt: Export bnxt_xmit_get_cfa_action

Export bnxt_xmit_get_cfa_action so that it can be used in future commits
which add software USO support to bnxt.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260408230607.2019402-3-joe@dama.to
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: tso: Introduce tso_dma_map and helpers
Joe Damato [Wed, 8 Apr 2026 23:05:50 +0000 (16:05 -0700)] 
net: tso: Introduce tso_dma_map and helpers

Add struct tso_dma_map to tso.h for tracking DMA addresses of mapped
GSO payload data and tso_dma_map_completion_state.

The tso_dma_map combines DMA mapping storage with iterator state, allowing
drivers to walk pre-mapped DMA regions linearly. Includes fields for
the DMA IOVA path (iova_state, iova_offset, total_len) and a fallback
per-region path (linear_dma, frags[], frag_idx, offset).

The tso_dma_map_completion_state makes the IOVA completion state opaque
for drivers. Drivers are expected to allocate this and use the added
helpers to update the completion state.

Adds skb_frag_phys() to skbuff.h, returning the physical address
of a paged fragment's data, which is used by the tso_dma_map helpers
introduced in this commit described below.

The added TSO DMA map helpers are:

tso_dma_map_init(): DMA-maps the linear payload region and all frags
upfront. Prefers the DMA IOVA API for a single contiguous mapping with
one IOTLB sync; falls back to per-region dma_map_phys() otherwise.
Returns 0 on success, cleans up partial mappings on failure.

tso_dma_map_cleanup(): Handles both IOVA and fallback teardown paths.

tso_dma_map_count(): counts how many descriptors the next N bytes of
payload will need. Returns 1 if IOVA is used since the mapping is
contiguous.

tso_dma_map_next(): yields the next (dma_addr, chunk_len) pair.
On the IOVA path, each segment is a single contiguous chunk. On the
fallback path, indicates when a chunk starts a new DMA mapping so the
driver can set dma_unmap_len on that descriptor for completion-time
unmapping.

tso_dma_map_completion_save(): updates the completion state. Drivers
will call this at xmit time.

tso_dma_map_complete(): tears down the mapping at completion time and
returns true if the IOVA path was used. If it was not used, this is a
no-op and returns false.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260408230607.2019402-2-joe@dama.to
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoMerge tag 'wq-for-7.0-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 12 Apr 2026 17:42:40 +0000 (10:42 -0700)] 
Merge tag 'wq-for-7.0-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

Pull workqueue fix from Tejun Heo:
 "This is a fix for a stall which triggers on ordered workqueues when
  there are multiple inactive work items during workqueue property
  changes through sysfs, which doesn't happen that frequently.

  While really late, the fix is very low risk as it just repeats an
  operation which is already being performed:

   - Fix incomplete activation of multiple inactive works when
     unplugging a pool_workqueue, where the pending_pwqs list
     wasn't being updated for subsequent works"

* tag 'wq-for-7.0-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Add pool_workqueue to pending_pwqs list when unplugging multiple inactive works

2 weeks agoalpha: Define pgprot_modify to silence tautological comparison warnings
Matt Turner [Fri, 3 Apr 2026 15:01:28 +0000 (11:01 -0400)] 
alpha: Define pgprot_modify to silence tautological comparison warnings

Alpha's pgprot_noncached, pgprot_writecombine, and pgprot_device are
all identity macros, so the generic pgprot_modify() produces
tautological self-comparisons that GCC warns about:

  include/linux/pgtable.h:1701:25: warning: self-comparison always
  evaluates to true [-Wtautological-compare]

Since all caching attributes are no-ops on Alpha, define
pgprot_modify() to simply return newprot.

Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Magnus Lindholm <linmag7@gmail.com>
Link: https://lore.kernel.org/r/20260403150128.488513-1-mattst88@gmail.com
Signed-off-by: Magnus Lindholm <linmag7@gmail.com>
2 weeks agoalpha: add support for SECCOMP and SECCOMP_FILTER
Magnus Lindholm [Thu, 9 Apr 2026 17:10:15 +0000 (19:10 +0200)] 
alpha: add support for SECCOMP and SECCOMP_FILTER

Add SECCOMP and SECCOMP_FILTER support to the Alpha architecture and fix
syscall entry and ptrace issues uncovered by the seccomp-bpf selftests.

The syscall entry path is reworked to consistently track syscall state
using r0, r1 and r2:
  - r1 holds the active syscall number
  - r2 preserves the original syscall number for restart
  - r0 carries the return value, with r19 (a3) indicating success/error

This allows syscall restarts to be permitted only for valid ERESTART*
return codes and prevents kernel-internal restart values from leaking to
userspace. The syscall tracing error marker is corrected to use the saved
syscall number slot, matching the Alpha ABI.

Additionally, implement minimal PTRACE_GETREGSET and PTRACE_SETREGSET
support for NT_PRSTATUS, exporting struct pt_regs directly. This fixes
ptrace-based seccomp tests that previously failed with -EIO.

With these changes, seccomp-bpf and ptrace syscall tests pass reliably on
Alpha.

Tested-by: Michael Cree <mcree@orcon.net.nz>
Signed-off-by: Magnus Lindholm <linmag7@gmail.com>
Link: https://lore.kernel.org/r/20260409171439.8759-2-linmag7@gmail.com
Signed-off-by: Magnus Lindholm <linmag7@gmail.com>
2 weeks agoMerge tag 'timers-urgent-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 12 Apr 2026 17:01:55 +0000 (10:01 -0700)] 
Merge tag 'timers-urgent-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Thomas Gleixner:
 "Two fixes for the time/timers subsystem:

   - Invert the inverted fastpath decision in check_tick_dependency(),
     which prevents NOHZ full to stop the tick. That's a regression
     introduced in the 7.0 merge window.

   - Prevent a unpriviledged DoS in the clockevents code, where user
     space can starve the timer interrupt by arming a timerfd or posix
     interval timer in a tight loop with an absolute expiry time in the
     past. The fix turned out to be incomplete and was was amended
     yesterday to make it work on some 20 years old AMD machines as
     well. All issues with it have been confirmed to be resolved by
     various reporters"

* tag 'timers-urgent-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clockevents: Prevent timer interrupt starvation
  tick/nohz: Fix inverted return value in check_tick_dependency() fast path

2 weeks agovsock/virtio: remove unnecessary call to `virtio_transport_get_ops`
Luigi Leonardi [Wed, 8 Apr 2026 15:21:02 +0000 (17:21 +0200)] 
vsock/virtio: remove unnecessary call to `virtio_transport_get_ops`

`virtio_transport_send_pkt_info` gets all the transport information
from the parameter `t_ops`. There is no need to call
`virtio_transport_get_ops()`.

Remove it.

Acked-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Luigi Leonardi <leonardi@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260408-remove_parameter-v2-1-e00f31cf7a17@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: skb: clean up dead code after skb_kfree_head() simplification
Jiayuan Chen [Fri, 10 Apr 2026 03:47:32 +0000 (11:47 +0800)] 
net: skb: clean up dead code after skb_kfree_head() simplification

Since commit 0f42e3f4fe2a ("net: skb: fix cross-cache free of
KFENCE-allocated skb head"), skb_kfree_head() always calls kfree()
and no longer uses end_offset to distinguish between skb_small_head_cache
and generic kmalloc caches.

Clean up the leftovers:

- Remove the unused end_offset parameter from skb_kfree_head() and
  update all callers.
- Remove the SKB_SMALL_HEAD_HEADROOM guard in __skb_unclone_keeptruesize()
  which was protecting the old skb_kfree_head() logic.
- Update the SKB_SMALL_HEAD_CACHE_SIZE comment to reflect that the
  non-power-of-2 sizing is no longer used for free-path disambiguation.

No functional change.

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260410034736.297900-1-jiayuan.chen@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonetkit: Don't emit scrub attribute for single device mode
Daniel Borkmann [Fri, 10 Apr 2026 07:23:34 +0000 (09:23 +0200)] 
netkit: Don't emit scrub attribute for single device mode

When userspace reads a single mode netkit device via RTM_GETLINK,
it receives IFLA_NETKIT_SCRUB=NETKIT_SCRUB_DEFAULT attribute from
netkit_fill_info(). If that attribute is echoed back to recreate
the device, the seen_scrub presence check in netkit_new_link()
causes creation to fail with -EOPNOTSUPP. Since it has no meaning
for single devices at this point, just don't dump it.

Fixes: 481038960538 ("netkit: Add single device mode for netkit")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260410072334.548232-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: lan743x: rename chip_rev to fpga_rev
Thangaraj Samynathan [Fri, 10 Apr 2026 08:57:10 +0000 (14:27 +0530)] 
net: lan743x: rename chip_rev to fpga_rev

The variable chip_rev stores the value read from the FPGA_REV
register and represents the FPGA revision. Rename it to fpga_rev
to better reflect its meaning.

No functional change intended.

Signed-off-by: Thangaraj Samynathan <thangaraj.s@microchip.com>
Link: https://patch.msgid.link/20260410085710.9246-1-thangaraj.s@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoselftests: netfilter: nft_tproxy.sh: adjust to socat changes
Florian Westphal [Thu, 9 Apr 2026 22:45:02 +0000 (00:45 +0200)] 
selftests: netfilter: nft_tproxy.sh: adjust to socat changes

Like e65d8b6f3092 ("selftests: drv-net: adjust to socat changes") we
need to add shut-none for this test too.

The extra 0-packet can trigger a second (unexpected) reply from the server.

Fixes: 7e37e0eacd22 ("selftests: netfilter: nft_tproxy.sh: add tcp tests")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20260408152432.24b8ad0d@kernel.org/
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20260409224506.27072-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoMerge tag 'nf-next-26-04-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfi...
Jakub Kicinski [Sun, 12 Apr 2026 16:39:20 +0000 (09:39 -0700)] 
Merge tag 'nf-next-26-04-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Florian Westphal says:

====================
netfilter: updates for net-next

1-3) IPVS updates from Julian Anastasov to enhance visibility into
     IPVS internal state by exposing hash size, load factor etc and
     allows userspace to tune the load factor used for resizing hash
     tables.

4) reject empty/not nul terminated device names from xt_physdev.
   This isn't a bug fix; existing code doesn't require a c-string.
   But clean this up anyway because conceptually the interface name
   definitely should be a c-string.

5) Switch nfnetlink to skb_mac_header helpers that didn't exist back
   when this code was written.  This gives us additional debug checks
   but is not intended to change functionality.

6) Let the xt ttl/hoplimit match reject unknown operator modes.
   This is a cleanup, the evaluation function simply returns false when
   the mode is out of range.  From Marino Dzalto.

7) xt_socket match should enable defrag after all other checks. This
   bug is harmless, historically defrag could not be disabled either
   except by rmmod.

8) remove UDP-Lite conntrack support, from Fernando Fernandez Mancera.

9) Avoid a couple -Wflex-array-member-not-at-end warnings in the old
   xtables 32bit compat code, from Gustavo A. R. Silva.

10) nftables fwd expression should drop packets when their ttl/hl has
    expired.  This is a bug fix deferred, its not deemed important
    enough for -rc8.
11) Add additional checks before assuming the mac header is an ethernet
    header, from Zhengchuan Liang.

* tag 'nf-next-26-04-10' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: require Ethernet MAC header before using eth_hdr()
  netfilter: nft_fwd_netdev: check ttl/hl before forwarding
  netfilter: x_tables: Avoid a couple -Wflex-array-member-not-at-end warnings
  netfilter: conntrack: remove UDP-Lite conntrack support
  netfilter: xt_socket: enable defrag after all other checks
  netfilter: xt_HL: add pr_fmt and checkentry validation
  netfilter: nfnetlink: prefer skb_mac_header helpers
  netfilter: x_physdev: reject empty or not-nul terminated device names
  ipvs: add conn_lfactor and svc_lfactor sysctl vars
  ipvs: add ip_vs_status info
  ipvs: show the current conn_tab size to users
====================

Link: https://patch.msgid.link/20260410112352.23599-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet/sched: act_ct: Only release RCU read lock after ct_ft
Jamal Hadi Salim [Fri, 10 Apr 2026 11:16:27 +0000 (07:16 -0400)] 
net/sched: act_ct: Only release RCU read lock after ct_ft

When looking up a flow table in act_ct in tcf_ct_flow_table_get(),
rhashtable_lookup_fast() internally opens and closes an RCU read critical
section before returning ct_ft.
The tcf_ct_flow_table_cleanup_work() can complete before refcount_inc_not_zero()
is invoked on the returned ct_ft resulting in a UAF on the already freed ct_ft
object. This vulnerability can lead to privilege escalation.

Analysis from zdi-disclosures@trendmicro.com:
When initializing act_ct, tcf_ct_init() is called, which internally triggers
tcf_ct_flow_table_get().

static int tcf_ct_flow_table_get(struct net *net, struct tcf_ct_params *params)

{
                struct zones_ht_key key = { .net = net, .zone = params->zone };
                struct tcf_ct_flow_table *ct_ft;
                int err = -ENOMEM;

                mutex_lock(&zones_mutex);
                ct_ft = rhashtable_lookup_fast(&zones_ht, &key, zones_params); // [1]
                if (ct_ft && refcount_inc_not_zero(&ct_ft->ref)) // [2]
                                goto out_unlock;
                ...
}

static __always_inline void *rhashtable_lookup_fast(
                struct rhashtable *ht, const void *key,
                const struct rhashtable_params params)
{
                void *obj;

                rcu_read_lock();
                obj = rhashtable_lookup(ht, key, params);
                rcu_read_unlock();

                return obj;
}

At [1], rhashtable_lookup_fast() looks up and returns the corresponding ct_ft
from zones_ht . The lookup is performed within an RCU read critical section
through rcu_read_lock() / rcu_read_unlock(), which prevents the object from
being freed. However, at the point of function return, rcu_read_unlock() has
already been called, and there is nothing preventing ct_ft from being freed
before reaching refcount_inc_not_zero(&ct_ft->ref) at [2]. This interval becomes
the race window, during which ct_ft can be freed.

Free Process:

tcf_ct_flow_table_put() is executed through the path tcf_ct_cleanup() call_rcu()
tcf_ct_params_free_rcu() tcf_ct_params_free() tcf_ct_flow_table_put().

static void tcf_ct_flow_table_put(struct tcf_ct_flow_table *ct_ft)
{
                if (refcount_dec_and_test(&ct_ft->ref)) {
                                rhashtable_remove_fast(&zones_ht, &ct_ft->node, zones_params);
                                INIT_RCU_WORK(&ct_ft->rwork, tcf_ct_flow_table_cleanup_work); // [3]
                                queue_rcu_work(act_ct_wq, &ct_ft->rwork);
                }
}

At [3], tcf_ct_flow_table_cleanup_work() is scheduled as RCU work

static void tcf_ct_flow_table_cleanup_work(struct work_struct *work)

{
                struct tcf_ct_flow_table *ct_ft;
                struct flow_block *block;

                ct_ft = container_of(to_rcu_work(work), struct tcf_ct_flow_table,
                                                                rwork);
                nf_flow_table_free(&ct_ft->nf_ft);
                block = &ct_ft->nf_ft.flow_block;
                down_write(&ct_ft->nf_ft.flow_block_lock);
                WARN_ON(!list_empty(&block->cb_list));
                up_write(&ct_ft->nf_ft.flow_block_lock);
                kfree(ct_ft); // [4]

                module_put(THIS_MODULE);
}

tcf_ct_flow_table_cleanup_work() frees ct_ft at [4]. When this function executes
between [1] and [2], UAF occurs.

This race condition has a very short race window, making it generally
difficult to trigger. Therefore, to trigger the vulnerability an msleep(100) was
inserted after[1]

Fixes: 138470a9b2cc2 ("net/sched: act_ct: fix lockdep splat in tcf_ct_flow_table_get")
Reported-by: zdi-disclosures@trendmicro.com
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20260410111627.46611-1-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoASoC: dt-bindings: rockchip: convert rk3399-gru-sound to DT Schema
Anushka Badhe [Fri, 10 Apr 2026 05:55:32 +0000 (11:25 +0530)] 
ASoC: dt-bindings: rockchip: convert rk3399-gru-sound to DT Schema

Convert the rockchip,rk3399-gru-sound.txt DT binding to DT Schema
format.

Update rockchip,cpu from a single I2S controller phandle to a
phandle-array. Add an optional second entry for the SPDIF controller,
as seen in rk3399-gru.dtsi, required by boards with DisplayPort audio.

Signed-off-by: Anushka Badhe <anushkabadhe@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/20260410055532.60868-1-anushkabadhe@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2 weeks agoMerge tag 'linux-can-fixes-for-7.0-20260409' of git://git.kernel.org/pub/scm/linux...
Jakub Kicinski [Sun, 12 Apr 2026 16:24:15 +0000 (09:24 -0700)] 
Merge tag 'linux-can-fixes-for-7.0-20260409' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2026-04-09

Johan Hovold's patch fixes the a devres lifetime in the ucan driver.

The last patch is by Samuel Page and fixes a use-after-free in
raw_rcv() in the CAN_RAW protocol.

* tag 'linux-can-fixes-for-7.0-20260409' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: raw: fix ro->uniq use-after-free in raw_rcv()
  can: ucan: fix devres lifetime
====================

Link: https://patch.msgid.link/20260409165942.588421-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoMerge tag 'wireless-next-2026-04-10' of https://git.kernel.org/pub/scm/linux/kernel...
Jakub Kicinski [Sun, 12 Apr 2026 16:17:42 +0000 (09:17 -0700)] 
Merge tag 'wireless-next-2026-04-10' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Johannes Berg says:

====================
Final updates, notably:
 - crypto: move Michael MIC code into wireless (only)
 - mac80211:
   - multi-link 4-addr support
   - NAN data support (but no drivers yet)
 - ath10k: DT quirk to make it work on some devices
 - ath12k: IPQ5424 support
 - rtw89: USB improvements for performance

* tag 'wireless-next-2026-04-10' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (124 commits)
  wifi: cfg80211: Explicitly include <linux/export.h> in michael-mic.c
  wifi: ath10k: Add device-tree quirk to skip host cap QMI requests
  dt-bindings: wireless: ath10k: Add quirk to skip host cap QMI requests
  crypto: Remove michael_mic from crypto_shash API
  wifi: ipw2x00: Use michael_mic() from cfg80211
  wifi: ath12k: Use michael_mic() from cfg80211
  wifi: ath11k: Use michael_mic() from cfg80211
  wifi: mac80211, cfg80211: Export michael_mic() and move it to cfg80211
  wifi: ipw2x00: Rename michael_mic() to libipw_michael_mic()
  wifi: libertas_tf: refactor endpoint lookup
  wifi: libertas: refactor endpoint lookup
  wifi: at76c50x: refactor endpoint lookup
  wifi: ath12k: Enable IPQ5424 WiFi device support
  wifi: ath12k: Add CE remap hardware parameters for IPQ5424
  wifi: ath12k: add ath12k_hw_regs for IPQ5424
  wifi: ath12k: add ath12k_hw_version_map entry for IPQ5424
  wifi: ath12k: Add ath12k_hw_params for IPQ5424
  dt-bindings: net: wireless: add ath12k wifi device IPQ5424
  wifi: ath10k: fix station lookup failure during disconnect
  wifi: ath12k: Create symlink for each radio in a wiphy
  ...
====================

Link: https://patch.msgid.link/20260410064703.735099-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agotcp: add indirect call wrapper in tcp_conn_request()
Eric Dumazet [Fri, 10 Apr 2026 17:49:50 +0000 (17:49 +0000)] 
tcp: add indirect call wrapper in tcp_conn_request()

Small improvement in SYN processing, to directly call
tcp_v6_init_seq_and_ts_off() or tcp_v4_init_seq_and_ts_off().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260410174950.745670-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: Rename ifq_idx to rxq_idx in netif_mp_* helpers
Daniel Borkmann [Fri, 10 Apr 2026 13:06:02 +0000 (15:06 +0200)] 
net: Rename ifq_idx to rxq_idx in netif_mp_* helpers

Rename the leftover ifq_idx parameter naming to rxq_idx to be
consistent with the rest of the file and the header declaration.
Back then this was taken out of the queue leasing series given
the cleanup is independent. No functional change.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/netdev/20260131160237.07789674@kernel.org
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20260410130602.552600-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoselftests: net: py: add test case filtering and listing
Jakub Kicinski [Fri, 10 Apr 2026 01:39:21 +0000 (18:39 -0700)] 
selftests: net: py: add test case filtering and listing

When developing new test cases and reproducing failures in
existing ones we currently have to run the entire test which
can take minutes to finish.

Add command line options for test selection, modeled after
kselftest_harness.h:

  -l       list tests (filtered, if filters were specified)
  -t name  include test
  -T name  exclude test

Since we don't have as clean separation into fixture / variant /
test as kselftest_harness this is not really a 1 to 1 match.
We have to lean on glob patterns instead.

Like in kselftest_harness filters are evaluated in order, first
match wins. If only exclusions are specified everything else is
included and vice versa.

Glob patterns (*, ?, [) are supported in addition to exact
matching.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Tested-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260410013921.1710295-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: fix reference tracker mismanagement in netdev_put_lock()
Jakub Kicinski [Fri, 10 Apr 2026 15:36:00 +0000 (08:36 -0700)] 
net: fix reference tracker mismanagement in netdev_put_lock()

dev_put() releases a reference which didn't have a tracker.
References without a tracker are accounted in the tracking
code as "no_tracker". We can't free the tracker and then
call dev_put(). The references themselves will be fine
but the tracking code will think it's a double-release:

  refcount_t: decrement hit 0; leaking memory.

IOW commit under fixes confused dev_put() (release never tracked
reference) with __dev_put() (just release the reference, skipping
the reference tracking infra).

Since __netdev_put_lock() uses dev_put() we can't feed a previously
tracked netdev ref into it. Let's flip things around.
netdev_put(dev, NULL) is the same as dev_put(dev) so make
netdev_put_lock() the real function and have __netdev_put_lock()
feed it a NULL tracker for all the cases that were untracked.

Fixes: d04686d9bc86 ("net: Implement netdev_nl_queue_create_doit")
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/20260410153600.1984522-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agotcp: return a drop_reason from tcp_add_backlog()
Eric Dumazet [Thu, 9 Apr 2026 10:11:47 +0000 (10:11 +0000)] 
tcp: return a drop_reason from tcp_add_backlog()

Part of a stack canary removal from tcp_v{4,6}_rcv().

Return a drop_reason instead of a boolean, so that we no longer
have to pass the address of a local variable.

$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-37 (-37)
Function                                     old     new   delta
tcp_v6_rcv                                  3133    3129      -4
tcp_v4_rcv                                  3206    3202      -4
tcp_add_backlog                             1281    1252     -29
Total: Before=25567186, After=25567149, chg -0.00%

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260409101147.1642967-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoMerge branch 'ipvlan-multicast-delivery-changes'
Jakub Kicinski [Sun, 12 Apr 2026 16:05:59 +0000 (09:05 -0700)] 
Merge branch 'ipvlan-multicast-delivery-changes'

Eric Dumazet says:

====================
ipvlan: multicast delivery changes

As we did recently for macvlan, this series adds some relief
when ipvlan is under multicast storms.
====================

Link: https://patch.msgid.link/20260409085238.1122947-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoipvlan: avoid spinlock contention in ipvlan_multicast_enqueue()
Eric Dumazet [Thu, 9 Apr 2026 08:52:38 +0000 (08:52 +0000)] 
ipvlan: avoid spinlock contention in ipvlan_multicast_enqueue()

Under high stress, we spend a lot of time cloning skbs,
then acquiring a spinlock, then freeing the clone because
the queue is full.

Add a shortcut to avoid these costs under pressure, as we did
in macvlan with commit 0d5dc1d7aad1 ("macvlan: avoid spinlock
contention in macvlan_broadcast_enqueue()")

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260409085238.1122947-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoipvlan: ipvlan_handle_mode_l2() refactoring
Eric Dumazet [Thu, 9 Apr 2026 08:52:37 +0000 (08:52 +0000)] 
ipvlan: ipvlan_handle_mode_l2() refactoring

Reduce indentation level, and add a likely() clause
as we expect to process more unicast packets than multicast ones.

No functional change, this eases the following patch review.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260409085238.1122947-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: Add net_cookie to Dead loop messages
Chris J Arges [Wed, 8 Apr 2026 19:10:47 +0000 (14:10 -0500)] 
net: Add net_cookie to Dead loop messages

Network devices can have the same name within different network namespaces.
To help distinguish these devices, add the net_cookie value which can be
used to identify the netns.

Signed-off-by: Chris J Arges <carges@cloudflare.com>
Link: https://patch.msgid.link/20260408191056.1036330-1-carges@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoMerge branch 'net-dsa-tag_rtl8_4-fixes-doc-and-set-keep'
Jakub Kicinski [Sun, 12 Apr 2026 16:03:58 +0000 (09:03 -0700)] 
Merge branch 'net-dsa-tag_rtl8_4-fixes-doc-and-set-keep'

Luiz Angelo Daros de Luca says:

====================
net: dsa: tag_rtl8_4: fixes doc and set keep

This small series addresses two points in the rtl8_4 tagger used by the
realtel rtl8365mb driver.

The first patch updates the documentation of the tag format while the
second patch sets the KEEP flag bit, ensuring that the switch
respects the frame's VLAN format as provided by the kernel.

These patches were previously part of a larger series but are being
submitted independently as they are self-contained and already
received review.

Link: https://patch.msgid.link/CAD++jLmX31KfhGXA6SMAPXb14dHSC1t4JQZ=PQvjh-3hUcnzJA@mail.gmail.com
====================

Link: https://patch.msgid.link/20260408-realtek_fixes-v1-0-915ff1404d56@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: dsa: tag_rtl8_4: set KEEP flag
Luiz Angelo Daros de Luca [Wed, 8 Apr 2026 20:31:02 +0000 (17:31 -0300)] 
net: dsa: tag_rtl8_4: set KEEP flag

KEEP=1 is needed because we should respect the format of the packet as
the kernel sends it to us. Unless tx forward offloading is used, the
kernel is giving us the packet exactly as it should leave the specified
port on the wire. Until now this was not needed because the ports were
always functioning in a standalone mode in a VLAN-unaware way, so the
switch would not tag or untag frames anyway. But arguably it should have
been KEEP=1 all along.

Co-developed-by: Alvin Å ipraga <alsi@bang-olufsen.dk>
Signed-off-by: Alvin Å ipraga <alsi@bang-olufsen.dk>
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Link: https://patch.msgid.link/20260408-realtek_fixes-v1-2-915ff1404d56@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: dsa: tag_rtl8_4: update format description
Alvin Å ipraga [Wed, 8 Apr 2026 20:31:01 +0000 (17:31 -0300)] 
net: dsa: tag_rtl8_4: update format description

Document the updated tag layout fields (EFID, VSEL/VIDX) and clarify
which bits are set/cleared when emitting tags.

Co-developed-by: Alvin Å ipraga <alsi@bang-olufsen.dk>
Signed-off-by: Alvin Å ipraga <alsi@bang-olufsen.dk>
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Link: https://patch.msgid.link/20260408-realtek_fixes-v1-1-915ff1404d56@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet/sched: cls_fw: fix NULL dereference of "old" filters before change()
Davide Caratti [Wed, 8 Apr 2026 15:24:36 +0000 (17:24 +0200)] 
net/sched: cls_fw: fix NULL dereference of "old" filters before change()

Like pointed out by Sashiko [1], since commit ed76f5edccc9 ("net: sched:
protect filter_chain list with filter_chain_lock mutex") TC filters are
added to a shared block and published to datapath before their ->change()
function is called. This is a problem for cls_fw: an invalid filter
created with the "old" method can still classify some packets before it
is destroyed by the validation logic added by Xiang.
Therefore, insisting with repeated runs of the following script:

 # ip link add dev crash0 type dummy
 # ip link set dev crash0 up
 # mausezahn  crash0 -c 100000 -P 10 \
 > -A 4.3.2.1 -B 1.2.3.4 -t udp "dp=1234" -q &
 # sleep 1
 # tc qdisc add dev crash0 egress_block 1 clsact
 # tc filter add block 1 protocol ip prio 1 matchall \
 > action skbedit mark 65536 continue
 # tc filter add block 1 protocol ip prio 2 fw
 # ip link del dev crash0

can still make fw_classify() hit the WARN_ON() in [2]:

 WARNING: ./include/net/pkt_cls.h:88 at fw_classify+0x244/0x250 [cls_fw], CPU#18: mausezahn/1399
 Modules linked in: cls_fw(E) act_skbedit(E)
 CPU: 18 UID: 0 PID: 1399 Comm: mausezahn Tainted: G            E       7.0.0-rc6-virtme #17 PREEMPT(full)
 Tainted: [E]=UNSIGNED_MODULE
 Hardware name: Red Hat KVM, BIOS 1.16.3-2.el9 04/01/2014
 RIP: 0010:fw_classify+0x244/0x250 [cls_fw]
 Code: 5c 49 c7 45 00 00 00 00 00 41 5d 41 5e 41 5f 5d c3 cc cc cc cc 5b b8 ff ff ff ff 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc 90 <0f> 0b 90 eb a0 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90
 RSP: 0018:ffffd1b7026bf8a8 EFLAGS: 00010202
 RAX: ffff8c5ac9c60800 RBX: ffff8c5ac99322c0 RCX: 0000000000000004
 RDX: 0000000000000001 RSI: ffff8c5b74d7a000 RDI: ffff8c5ac8284f40
 RBP: ffffd1b7026bf8d0 R08: 0000000000000000 R09: ffffd1b7026bf9b0
 R10: 00000000ffffffff R11: 0000000000000000 R12: 0000000000010000
 R13: ffffd1b7026bf930 R14: ffff8c5ac8284f40 R15: 0000000000000000
 FS:  00007fca40c37740(0000) GS:ffff8c5b74d7a000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fca40e822a0 CR3: 0000000005ca0001 CR4: 0000000000172ef0
 Call Trace:
  <TASK>
  tcf_classify+0x17d/0x5c0
  tc_run+0x9d/0x150
  __dev_queue_xmit+0x2ab/0x14d0
  ip_finish_output2+0x340/0x8f0
  ip_output+0xa4/0x250
  raw_sendmsg+0x147d/0x14b0
  __sys_sendto+0x1cc/0x1f0
  __x64_sys_sendto+0x24/0x30
  do_syscall_64+0x126/0xf80
  entry_SYSCALL_64_after_hwframe+0x77/0x7f
 RIP: 0033:0x7fca40e822ba
 Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
 RSP: 002b:00007ffc248a42c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
 RAX: ffffffffffffffda RBX: 000055ef233289d0 RCX: 00007fca40e822ba
 RDX: 000000000000001e RSI: 000055ef23328c30 RDI: 0000000000000003
 RBP: 000055ef233289d0 R08: 00007ffc248a42d0 R09: 0000000000000010
 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000001e
 R13: 00000000000186a0 R14: 0000000000000000 R15: 00007fca41043000
  </TASK>
 irq event stamp: 1045778
 hardirqs last  enabled at (1045784): [<ffffffff864ec042>] __up_console_sem+0x52/0x60
 hardirqs last disabled at (1045789): [<ffffffff864ec027>] __up_console_sem+0x37/0x60
 softirqs last  enabled at (1045426): [<ffffffff874d48c7>] __alloc_skb+0x207/0x260
 softirqs last disabled at (1045434): [<ffffffff874fe8f8>] __dev_queue_xmit+0x78/0x14d0

Then, because of the value in the packet's mark, dereference on 'q->handle'
with NULL 'q' occurs:

 BUG: kernel NULL  pointer dereference, address: 0000000000000038
 [...]
 RIP: 0010:fw_classify+0x1fe/0x250 [cls_fw]
 [...]

Skip "old-style" classification on shared blocks, so that the NULL
dereference is fixed and WARN_ON() is not hit anymore in the short
lifetime of invalid cls_fw "old-style" filters.

[1] https://sashiko.dev/#/patchset/20260331050217.504278-1-xmei5%40asu.edu
[2] https://elixir.bootlin.com/linux/v7.0-rc6/source/include/net/pkt_cls.h#L86

Fixes: faeea8bbf6e9 ("net/sched: cls_fw: fix NULL pointer dereference on shared blocks")
Fixes: ed76f5edccc9 ("net: sched: protect filter_chain list with filter_chain_lock mutex")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://patch.msgid.link/e39cbd3103a337f1e515d186fe697b4459d24757.1775661704.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoMerge branch 'wangxun-improvement'
Jakub Kicinski [Sun, 12 Apr 2026 15:42:30 +0000 (08:42 -0700)] 
Merge branch 'wangxun-improvement'

Jiawen Wu says:

====================
Wangxun improvement

This patch series cleans up the code and enhances the implementation.
====================

Link: https://patch.msgid.link/20260407025616.33652-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: libwx: improve flow control setting
Jiawen Wu [Tue, 7 Apr 2026 02:56:16 +0000 (10:56 +0800)] 
net: libwx: improve flow control setting

Save the current mode of flow control, and enhance the statistics of
pause frames.

The received pause frames are divided into XON and XOFF to be counted.
And due to the hardware defect of SP devices, XON packets cannot be
trasmitted correctly, so Tx XON pause is disabled by default for those
devices.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20260407025616.33652-10-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: libwx: wrap-around and reset qmprc counter
Jiawen Wu [Tue, 7 Apr 2026 02:56:15 +0000 (10:56 +0800)] 
net: libwx: wrap-around and reset qmprc counter

The WX_PX_MPRC registers are not clear-on-read hardware counters. The
previous implementation directly read and accumulated these 32-bit values
into a 64-bit software counter. Now implement a rd32_wrap() helper
function to calculate the delta counter to correct the statistic.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20260407025616.33652-9-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: wangxun: schedule hardware stats update in watchdog
Jiawen Wu [Tue, 7 Apr 2026 02:56:14 +0000 (10:56 +0800)] 
net: wangxun: schedule hardware stats update in watchdog

Hardware statistics should be updated periodically in the watchdog to
prevent 32-bit registers from overflowing. This is also required for the
upcoming pause frame accounting logic, which relies on regular statistics
sampling.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20260407025616.33652-8-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: wangxun: reorder timer and work sync cancellations
Jiawen Wu [Tue, 7 Apr 2026 02:56:13 +0000 (10:56 +0800)] 
net: wangxun: reorder timer and work sync cancellations

When removing the device, timer_delete_sync(&wx->service_timer) is
called in .ndo_stop() after cancel_work_sync(&wx->service_task). This
may cause new work to be queued after device down.

Move unregister_netdev() before cancel_work_sync(), and use
timer_shutdown_sync() to prevent the timer from being re-armed.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20260407025616.33652-7-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: wangxun: move ethtool_ops.set_channels into libwx
Jiawen Wu [Tue, 7 Apr 2026 02:56:12 +0000 (10:56 +0800)] 
net: wangxun: move ethtool_ops.set_channels into libwx

Since function ops wx->setup_tc() is set in txgbe and ngbe,
ethtool_ops.set_channels can be implemented in libwx to reduce
duplicated code.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20260407025616.33652-6-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: wangxun: replace busy-wait reset flag with kernel mutex
Jiawen Wu [Tue, 7 Apr 2026 02:56:11 +0000 (10:56 +0800)] 
net: wangxun: replace busy-wait reset flag with kernel mutex

Replace the busy-wait loop using test_and_set_bit(WX_STATE_RESETTING)
with a proper per-device mutex to serialize reset operations.

The reset flag is reserved for other code paths (like watchdog), which
need tocheck if a reset is in process.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20260407025616.33652-5-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: ngbe: remove redundant macros
Jiawen Wu [Tue, 7 Apr 2026 02:56:10 +0000 (10:56 +0800)] 
net: ngbe: remove redundant macros

NGBE_NCSI_SUP and NGBE_NCSI_MASK are duplicate-defined, they can be
replaced by the macros defined in libwx. Just remove them.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260407025616.33652-4-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: ngbe: move the WOL functions to libwx
Jiawen Wu [Tue, 7 Apr 2026 02:56:09 +0000 (10:56 +0800)] 
net: ngbe: move the WOL functions to libwx

Remove duplicate-defined register macros, move the WOL implementation to
the library module. So that the WOL functions can be reused in txgbe
later.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20260407025616.33652-3-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agonet: ngbe: remove netdev->ethtool->wol_enabled setting
Jiawen Wu [Tue, 7 Apr 2026 02:56:08 +0000 (10:56 +0800)] 
net: ngbe: remove netdev->ethtool->wol_enabled setting

netdev->ethtool->wol_enabled is set in ethtool core code, so remove the
redundant setting in ngbe_set_wol().

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Joe Damato <joe@dama.to>
Link: https://patch.msgid.link/20260407025616.33652-2-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoMerge tag 'sched-urgent-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 12 Apr 2026 15:30:20 +0000 (08:30 -0700)] 
Merge tag 'sched-urgent-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fix from Ingo Molnar:
 "Fix DL server related slowdown to deferred fair tasks"

* tag 'sched-urgent-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/deadline: Use revised wakeup rule for dl_server

2 weeks agonet: ethernet: ti: am65-cpsw: add support for J722S SoC family
Nora Schiffer [Tue, 7 Apr 2026 10:48:02 +0000 (12:48 +0200)] 
net: ethernet: ti: am65-cpsw: add support for J722S SoC family

The J722S CPSW3G is mostly identical to the AM64's, but additionally
supports SGMII.

Signed-off-by: Nora Schiffer <nora.schiffer@ew.tq-group.com>
Link: https://patch.msgid.link/6118e81358d47f455e4c1dbddf3ece6f3329e184.1775558273.git.nora.schiffer@ew.tq-group.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agodt-bindings: net: ti: k3-am654-cpsw-nuss: Add ti,j722s-cpsw-nuss compatible
Nora Schiffer [Tue, 7 Apr 2026 10:48:01 +0000 (12:48 +0200)] 
dt-bindings: net: ti: k3-am654-cpsw-nuss: Add ti,j722s-cpsw-nuss compatible

The J722S CPSW3G is mostly identical to the AM64's, but additionally
supports SGMII. The AM64 compatible ti,am642-cpsw-nuss is used as a
fallback.

Signed-off-by: Nora Schiffer <nora.schiffer@ew.tq-group.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/191e9f7e3a6c14eabe891a98c5fb646766479c0a.1775558273.git.nora.schiffer@ew.tq-group.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoMerge branch 'dpll-zl3073x-add-ref-sync-pair-support'
Jakub Kicinski [Sun, 12 Apr 2026 15:27:43 +0000 (08:27 -0700)] 
Merge branch 'dpll-zl3073x-add-ref-sync-pair-support'

Ivan Vecera says:

====================
dpll: zl3073x: add ref-sync pair support

This series adds Reference-Sync pair support to the ZL3073x DPLL driver.
A Ref-Sync pair consists of a clock reference and a low-frequency sync
signal (e.g. 1 PPS) where the DPLL locks to the clock reference but
phase-aligns to the sync reference.

Patches 1-3 are preparatory cleanups and helper additions:
- Clean up esync get/set callbacks with early returns and use the
  zl3073x_out_is_ndiv() helper
- Convert open-coded clear-and-set bitfield patterns to FIELD_MODIFY()
- Add ref sync control and output clock type accessor helpers

Patch 4 adds the 'ref-sync-sources' phandle-array property to the
dpll-pin device tree binding schema and updates the ZL3073x binding
examples.

Patch 5 implements the driver support:
- ref_sync_get/set callbacks with frequency validation
- Automatic sync source exclusion from reference selection
- Device tree based ref-sync pair registration

Tested and verified on Microchip EDS2 (pcb8385) development board.
====================

Link: https://patch.msgid.link/20260408102716.443099-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agodpll: zl3073x: add ref-sync pair support
Ivan Vecera [Wed, 8 Apr 2026 10:27:16 +0000 (12:27 +0200)] 
dpll: zl3073x: add ref-sync pair support

Add support for ref-sync pair registration using the 'ref-sync-sources'
phandle property from device tree. A ref-sync pair consists of a clock
reference and a low-frequency sync signal where the DPLL locks to the
clock reference but phase-aligns to the sync reference.

The implementation:
- Stores fwnode handle in zl3073x_dpll_pin during pin registration
- Adds ref_sync_get/set callbacks to read and write the sync control
  mode and pair registers
- Validates ref-sync frequency constraints: sync signal must be 8 kHz
  or less, clock reference must be 1 kHz or more and higher than sync
- Excludes sync source from automatic reference selection by setting
  its priority to NONE on connect; on disconnect the priority is left
  as NONE and the user must explicitly make the pin selectable again
- Iterates ref-sync-sources phandles to register declared pairings
  via dpll_pin_ref_sync_pair_add()

Reviewed-by: Petr Oros <poros@redhat.com>
Reviewed-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20260408102716.443099-6-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agodt-bindings: dpll: add ref-sync-sources property
Ivan Vecera [Wed, 8 Apr 2026 10:27:15 +0000 (12:27 +0200)] 
dt-bindings: dpll: add ref-sync-sources property

Add ref-sync-sources phandle-array property to the dpll-pin schema
allowing board designers to declare which input pins can serve as
sync sources in a Reference-Sync pair.  A Ref-Sync pair consists of
a clock reference and a low-frequency sync signal where the DPLL locks
to the clock but phase-aligns to the sync reference.

Update both examples in the Microchip ZL3073x binding to demonstrate
the new property with a 1 PPS sync source paired to a clock source.

Reviewed-by: Petr Oros <poros@redhat.com>
Reviewed-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20260408102716.443099-5-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agodpll: zl3073x: add ref sync and output clock type helpers
Ivan Vecera [Wed, 8 Apr 2026 10:27:14 +0000 (12:27 +0200)] 
dpll: zl3073x: add ref sync and output clock type helpers

Add ZL_REF_SYNC_CTRL_MODE_REFSYNC_PAIR and ZL_REF_SYNC_CTRL_PAIR
register definitions.

Add inline helpers to get and set the sync control mode and sync pair
fields of the reference sync control register:

  zl3073x_ref_sync_mode_get/set() - ZL_REF_SYNC_CTRL_MODE field
  zl3073x_ref_sync_pair_get/set() - ZL_REF_SYNC_CTRL_PAIR field

Add inline helpers to get and set the clock type field of the output
mode register:

  zl3073x_out_clock_type_get/set() - ZL_OUTPUT_MODE_CLOCK_TYPE field

Convert existing esync callbacks to use the new helpers.

Reviewed-by: Petr Oros <poros@redhat.com>
Reviewed-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20260408102716.443099-4-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agodpll: zl3073x: use FIELD_MODIFY() for clear-and-set patterns
Ivan Vecera [Wed, 8 Apr 2026 10:27:13 +0000 (12:27 +0200)] 
dpll: zl3073x: use FIELD_MODIFY() for clear-and-set patterns

Replace open-coded clear-and-set bitfield operations with
FIELD_MODIFY().

Reviewed-by: Petr Oros <poros@redhat.com>
Reviewed-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20260408102716.443099-3-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agodpll: zl3073x: clean up esync get/set and use zl3073x_out_is_ndiv()
Ivan Vecera [Wed, 8 Apr 2026 10:27:12 +0000 (12:27 +0200)] 
dpll: zl3073x: clean up esync get/set and use zl3073x_out_is_ndiv()

Return -EOPNOTSUPP early in esync_get callbacks when esync is not
supported instead of conditionally populating the range at the end.
This simplifies the control flow by removing the finish label/goto
in the output variant and the conditional range assignment in both
input and output variants.

Replace open-coded N-div signal format switch statements with
zl3073x_out_is_ndiv() helper in esync_get, esync_set and
frequency_set callbacks.

Reviewed-by: Petr Oros <poros@redhat.com>
Reviewed-by: Prathosh Satish <Prathosh.Satish@microchip.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Link: https://patch.msgid.link/20260408102716.443099-2-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 weeks agoMerge tag 'ras-urgent-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 12 Apr 2026 15:27:09 +0000 (08:27 -0700)] 
Merge tag 'ras-urgent-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 MCE fix from Ingo Molnar:
 "Fix incorrect hardware errors reported on Zen3 CPUs, such as bogus
  L3 cache deferred errors (Yazen Ghannam)"

* tag 'ras-urgent-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce/amd: Filter bogus hardware errors on Zen3 clients

2 weeks agoMerge tag 'perf-urgent-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 12 Apr 2026 15:17:52 +0000 (08:17 -0700)] 
Merge tag 'perf-urgent-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
 "Four Intel uncore PMU driver fixes by Zide Chen"

* tag 'perf-urgent-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Remove extra double quote mark
  perf/x86/intel/uncore: Fix die ID init and look up bugs
  perf/x86/intel/uncore: Skip discovery table for offline dies
  perf/x86/intel/uncore: Fix iounmap() leak on global_init failure

2 weeks agoMerge tag 'v7.0-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Sun, 12 Apr 2026 15:11:02 +0000 (08:11 -0700)] 
Merge tag 'v7.0-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:

 - Enforce rx socket buffer limit in af_alg

 - Fix array overflow in af_alg_pull_tsgl

 - Fix out-of-bounds access when parsing extensions in X.509

 - Fix minimum rx size check in algif_aead

* tag 'v7.0-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: algif_aead - Fix minimum RX size check for decryption
  X.509: Fix out-of-bounds access when parsing extensions
  crypto: af_alg - Fix page reassignment overflow in af_alg_pull_tsgl
  crypto: af_alg - limit RX SG extraction by receive buffer budget

2 weeks agoi3c: master: adi: Fix error propagation for CCCs
Jorge Marques [Mon, 23 Mar 2026 16:11:33 +0000 (17:11 +0100)] 
i3c: master: adi: Fix error propagation for CCCs

adi_i3c_master_send_ccc_cmd() always returned 0, ignoring the transfer
result populated in the completion path. As a consequence, CCC command
errors were silently dropped, including the default -ETIMEDOUT and
later overwritten by adi_i3c_master_end_xfer_locked().

Fix this by returning xfer->ret so that callers correctly receive any
transfer error codes.

Fixes: a79ac2cdc91d ("i3c: master: Add driver for Analog Devices I3C Controller IP")
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Jorge Marques <jorge.marques@analog.com>
Link: https://patch.msgid.link/20260323-ad4062-positive-error-fix-v3-5-30bdc68004be@analog.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 weeks agoi3c: master: Fix error codes at send_ccc_cmd
Jorge Marques [Mon, 23 Mar 2026 16:11:32 +0000 (17:11 +0100)] 
i3c: master: Fix error codes at send_ccc_cmd

i3c_master_send_ccc_cmd_locked() would propagate cmd->err (positive,
Mx codes) to the ret variable, cascading down multiple methods until
reaching methods that explicitly stated they would return 0 on success
or negative error code. For example, the call chain:

  i3c_device_enable_ibi <- i3c_dev_enable_ibi_locked <-
  master->ops.enable_ibi <- i3c_master_enec_locked <-
  i3c_master_enec_disec_locked <- i3c_master_send_ccc_cmd_locked

Fix this by returning the ret value, callers can still read the cmd->err
value if ret is negative.

All corner cases where the Mx codes do need to be handled individually,
are resolved in previous commits. Those corner cases are all scenarios
when I3C_ERROR_M2 is expected and acceptable.
The prerequisite patches for the fix are:

  i3c: master: Move rstdaa error suppression
  i3c: master: Move entdaa error suppression
  i3c: master: Move bus_init error suppression

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/linux-iio/aYXvT5FW0hXQwhm_@stanley.mountain/
Fixes: 3a379bbcea0a ("i3c: Add core I3C infrastructure")
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Jorge Marques <jorge.marques@analog.com>
Link: https://patch.msgid.link/20260323-ad4062-positive-error-fix-v3-4-30bdc68004be@analog.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 weeks agoi3c: master: Move bus_init error suppression
Jorge Marques [Mon, 23 Mar 2026 16:11:31 +0000 (17:11 +0100)] 
i3c: master: Move bus_init error suppression

Prepare to fix improper Mx positive error propagation in later commits
by handling Mx error codes where the i3c_ccc_cmd command is allocated.
The CCC DISEC to broadcast address is invoked with
i3c_master_enec_disec_locked() and yields error I3C_ERROR_M2 if there
are no devices active on the bus. This is expected at the bus
initialization stage, where it is not known yet that there are no active
devices on the bus. Add bool suppress_m2 argument to
i3c_master_enec_disec_locked() and update the call site at
i3c_master_bus_init() with the exact corner case to not require
propagating positive Mx error codes. Other call site should not suppress
the error code, for example, if a driver requests to peripheral to
disable events and the transfer is not acknowledged, this is an error
and should not proceed.

Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Jorge Marques <jorge.marques@analog.com>
Link: https://patch.msgid.link/20260323-ad4062-positive-error-fix-v3-3-30bdc68004be@analog.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 weeks agoi3c: master: Move entdaa error suppression
Jorge Marques [Mon, 23 Mar 2026 16:11:30 +0000 (17:11 +0100)] 
i3c: master: Move entdaa error suppression

Prepare to fix improper Mx positive error propagation in later commits
by handling Mx error codes where the i3c_ccc_cmd command is allocated.
The CCC ENTDAA is invoked with i3c_master_entdaa_locked() and yields
error I3C_ERROR_M2 if there are no devices active on the bus. Some
controllers may also yield if there are no more devices need an dynamic
address, since the sequence do always end in a NACK. Handle inside
i3c_master_entdaa_locked(), checking cmd->err directly. Both call sites
are updated, adi_i3c_master_do_daa() and cdns_i3c_master_do_daa().

Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Jorge Marques <jorge.marques@analog.com>
Link: https://patch.msgid.link/20260323-ad4062-positive-error-fix-v3-2-30bdc68004be@analog.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 weeks agoi3c: master: Move rstdaa error suppression
Jorge Marques [Mon, 23 Mar 2026 16:11:29 +0000 (17:11 +0100)] 
i3c: master: Move rstdaa error suppression

Prepare to fix improper Mx positive error propagation in later
commits by handling Mx error codes where the i3c_ccc_cmd command
is allocated. Two of the four i3c_master_rstdaa_locked() are error
paths that already suppressed the return value, the remaining two
are changed to handle the I3C_ERROR_M2 Mx error code inside
i3c_master_rstdaa_locked(), checking cmd->err directly.

Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Jorge Marques <jorge.marques@analog.com>
Link: https://patch.msgid.link/20260323-ad4062-positive-error-fix-v3-1-30bdc68004be@analog.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 weeks agoi3c: dw: Simplify xfer cleanup with __free(kfree)
Felix Gu [Sat, 4 Apr 2026 10:32:31 +0000 (18:32 +0800)] 
i3c: dw: Simplify xfer cleanup with __free(kfree)

Convert dw-i3c-master to use __free(kfree) guards for struct dw_i3c_xfer
allocations. This frees xfer objects automatically on scope exit, and
removes the now-unused dw_i3c_master_free_xfer() helper.

Signed-off-by: Felix Gu <ustc.gu@gmail.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260404-dw-i3c-2-v3-2-8f7d146549c1@gmail.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 weeks agoi3c: dw: Fix memory leak in dw_i3c_master_i3c_xfers()
Felix Gu [Sat, 4 Apr 2026 10:32:30 +0000 (18:32 +0800)] 
i3c: dw: Fix memory leak in dw_i3c_master_i3c_xfers()

The dw_i3c_master_i3c_xfers() function allocates memory for the xfer
structure using dw_i3c_master_alloc_xfer(). If pm_runtime_resume_and_get()
fails, the function returns without freeing the allocated xfer, resulting
in a memory leak.

Since dw_i3c_master_free_xfer() is a thin wrapper around kfree(), use
the __free(kfree) cleanup attribute to handle the free automatically on
all exit paths.

Fixes: 62fe9d06f570 ("i3c: dw: Add power management support")
Signed-off-by: Felix Gu <ustc.gu@gmail.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260404-dw-i3c-2-v3-1-8f7d146549c1@gmail.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 weeks agoi3c: master: renesas: Use __free(kfree) for xfer cleanup in renesas_i3c_send_ccc_cmd()
Felix Gu [Mon, 6 Apr 2026 12:43:17 +0000 (20:43 +0800)] 
i3c: master: renesas: Use __free(kfree) for xfer cleanup in renesas_i3c_send_ccc_cmd()

Use __free(kfree) for automatic cleanup, matching the pattern already
used in other functions in this driver.

Signed-off-by: Felix Gu <ustc.gu@gmail.com>
Tested-by: Tommaso Merciai <tommaso.merciai.xr@bp.renesas.com>
Reviewed-by: Tommaso Merciai <tommaso.merciai.xr@bp.renesas.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260406-renesas-v3-2-4b724d7708f4@gmail.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 weeks agoi3c: master: renesas: Fix memory leak in renesas_i3c_i3c_xfers()
Felix Gu [Mon, 6 Apr 2026 12:43:16 +0000 (20:43 +0800)] 
i3c: master: renesas: Fix memory leak in renesas_i3c_i3c_xfers()

The xfer structure allocated by renesas_i3c_alloc_xfer() was never freed
in the renesas_i3c_i3c_xfers() function. Use the __free(kfree) cleanup
attribute to automatically free the memory when the variable goes out of
scope.

Fixes: d028219a9f14 ("i3c: master: Add basic driver for the Renesas I3C controller")
Tested-by: Tommaso Merciai <tommaso.merciai.xr@bp.renesas.com>
Reviewed-by: Tommaso Merciai <tommaso.merciai.xr@bp.renesas.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Felix Gu <ustc.gu@gmail.com>
Link: https://patch.msgid.link/20260406-renesas-v3-1-4b724d7708f4@gmail.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 weeks agoi3c: master: dw-i3c: Balance PM runtime usage count on probe failure
Felix Gu [Sat, 21 Mar 2026 09:04:43 +0000 (17:04 +0800)] 
i3c: master: dw-i3c: Balance PM runtime usage count on probe failure

When DW_I3C_DISABLE_RUNTIME_PM_QUIRK is set, the probe function calls
pm_runtime_get_noresume() to prevent runtime suspend. However, if
i3c_master_register() fails, the error path does not balance this
call, leaving the usage count incremented.

Add pm_runtime_put_noidle() in the error cleanup path to properly
balance the usage count.

Fixes: fba0e56ee752 ("i3c: dw: Disable runtime PM on Agilex5 to avoid bus hang on IBI")
Signed-off-by: Felix Gu <ustc.gu@gmail.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260321-dw-i3c-1-v1-1-821623aac7bb@gmail.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 weeks agoi3c: master: dw-i3c: Fix missing reset assertion in remove() callback
Felix Gu [Fri, 20 Mar 2026 14:18:02 +0000 (22:18 +0800)] 
i3c: master: dw-i3c: Fix missing reset assertion in remove() callback

The reset line acquired during probe is currently left deasserted when
the driver is unbound.

Switch to devm_reset_control_get_optional_exclusive_deasserted() to
ensure the reset is automatically re-asserted by the devres core when
the driver is removed.

Fixes: 62fe9d06f570 ("i3c: dw: Add power management support")
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Felix Gu <ustc.gu@gmail.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260320-dw-i3c-v3-1-477040c2e3f5@gmail.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 weeks agoi3c: mipi-i3c-hci-pci: Enable IBI while runtime suspended for Intel controllers
Adrian Hunter [Fri, 6 Mar 2026 08:53:38 +0000 (10:53 +0200)] 
i3c: mipi-i3c-hci-pci: Enable IBI while runtime suspended for Intel controllers

Intel LPSS I3C controllers can wake from runtime suspend to receive
in-band interrupts (IBIs), and they also implement the MIPI I3C HCI
Multi-Bus Instance capability.  When multiple I3C bus instances share the
same PCI wakeup, the PCI parent must coordinate runtime PM so that all
instances suspend together and their mipi-i3c-hci runtime suspend
callbacks are invoked in a consistent manner.

Enable IBI-based wakeup by setting HCI_QUIRK_RPM_IBI_ALLOWED for the
intel-lpss-i3c platform device.  Also set HCI_QUIRK_RPM_PARENT_MANAGED so
that the mipi-i3c-hci core driver expects runtime PM to be controlled by
the PCI parent rather than by individual instances.  For all Intel HCI PCI
configurations, enable the corresponding control_instance_pm flag in the
PCI driver.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260306085338.62955-6-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 weeks agoi3c: mipi-i3c-hci-pci: Add optional ability to manage child runtime PM
Adrian Hunter [Fri, 6 Mar 2026 08:53:37 +0000 (10:53 +0200)] 
i3c: mipi-i3c-hci-pci: Add optional ability to manage child runtime PM

Some platforms implement the MIPI I3C HCI Multi-Bus Instance capability,
where a single parent device hosts multiple I3C controller instances.  In
such designs, the parent - not the individual child instances - may need to
coordinate runtime PM so that all controllers runtime PM callbacks are
invoked in a controlled and synchronized manner.

For example, if the parent enables IBI-wakeup when transitioning into a
low-power state, every bus instance must remain able to receive IBIs up
until that point.  This requires deferring the individual controllers'
runtime suspend callbacks (which disable bus activity) until the parent
decides it is safe for all instances to suspend together.

To support this usage model:

  * Add runtime PM and system PM callbacks in the PCI driver to invoke
    the mipi-i3c-hci driver's runtime PM callbacks for each instance.

  * Introduce a driver-data flag, control_instance_pm, which opts into
    the new parent-managed PM behaviour.

  * Ensure the callbacks are only used when the corresponding instance is
    operational at suspend time.  This is reliable because the operational
    state cannot change while the parent device is undergoing a PM
    transition, and PCI always performs a runtime resume before system
    suspend on current configurations, so that suspend and resume alternate
    irrespective of whether it is runtime or system PM.

By that means, parent-managed runtime PM coordination for multi-instance
MIPI I3C HCI PCI devices is provided without altering existing behaviour on
platforms that do not require it.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260306085338.62955-5-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 weeks agoi3c: mipi-i3c-hci: Allow parent to manage runtime PM
Adrian Hunter [Fri, 6 Mar 2026 08:53:36 +0000 (10:53 +0200)] 
i3c: mipi-i3c-hci: Allow parent to manage runtime PM

Some platforms implement the MIPI I3C HCI Multi-Bus Instance capability,
where a single parent device hosts multiple I3C controller instances.  In
such designs, the parent - not the individual child instances - may need to
coordinate runtime PM so that all controllers runtime PM callbacks are
invoked in a controlled and synchronized manner.

For example, if the parent enables IBI-wakeup when transitioning into a
low-power state, every bus instance must remain able to receive IBIs up
until that point.  This requires deferring the individual controllers'
runtime suspend callbacks (which disable bus activity) until the parent
decides it is safe for all instances to suspend together.

To support this usage model:

  * Export the low-level runtime PM suspend and resume helpers so that
    the parent can explicitly invoke them.

  * Add a new quirk, HCI_QUIRK_RPM_PARENT_MANAGED, allowing platforms to
    bypass per-instance runtime PM callbacks and delegate control to the
    parent device.

  * Move DEFAULT_AUTOSUSPEND_DELAY_MS into the header so it can be shared
    by parent-managed PM implementations.

The new quirk allows platforms with multi-bus parent-managed PM
infrastructure to correctly coordinate runtime PM across all I3C HCI
instances.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260306085338.62955-4-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 weeks agoi3c: mipi-i3c-hci: Add quirk to allow IBI while runtime suspended
Adrian Hunter [Fri, 6 Mar 2026 08:53:35 +0000 (10:53 +0200)] 
i3c: mipi-i3c-hci: Add quirk to allow IBI while runtime suspended

Some I3C controllers can be automatically runtime-resumed in order to
handle in-band interrupts (IBIs), meaning that runtime suspend does not
need to be blocked when IBIs are enabled.

For example, a PCI-attached controller in a low-power state may generate
a Power Management Event (PME) when the SDA line is pulled low to signal
the START condition of an IBI. The PCI subsystem will then runtime-resume
the device, allowing the IBI to be received without requiring the
controller to remain active.

Introduce a new quirk, HCI_QUIRK_RPM_IBI_ALLOWED, so that drivers can
opt-in to this capability via driver data.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260306085338.62955-3-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2 weeks agoi3c: mipi-i3c-hci-pci: Set d3hot_delay to 0 for Intel controllers
Adrian Hunter [Fri, 6 Mar 2026 08:53:34 +0000 (10:53 +0200)] 
i3c: mipi-i3c-hci-pci: Set d3hot_delay to 0 for Intel controllers

Set d3hot_delay to 0 for Intel controllers because a delay is not needed.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260306085338.62955-2-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>